CN105468680A - Data retrieval method and device - Google Patents
Data retrieval method and device
- Publication number: CN105468680A
- Application number: CN201510783040.2A
- Authority: CN (China)
- Prior art keywords: classification, probability, inquiry, word, inquiry string
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/243—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2425—Iterative querying; Query formulation based on the results of a preceding query
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a data retrieval method and device. The method comprises: receiving a query string currently input by a user; looking up, in the user's query-category probability table, a query string corresponding to the currently input query string; and taking each category, among the categories corresponding to the found query string, whose probability is greater than a first predetermined threshold as a category of the currently input query string. The query-category probability table comprises query strings, the categories the user clicked for each query string, and the probability of clicking each of those categories. The method increases the category-level correlation between the query string and the retrieval results and improves the accuracy of the retrieval results.
Description
Technical field
The present invention relates to the field of data processing and, in particular, to a data retrieval method and device.
Background
In e-commerce site search, results are generally ranked by the relevance between the keywords the user enters and the products: the higher the match, the higher a product ranks. Relevance here means the degree of match between the search keywords and the product name. Two classic algorithms are currently used to compute it: the cosine algorithm and the BM algorithm. The cosine algorithm is a vector model. It represents both the user's query keywords and the product as vectors and computes the relevance between the two vectors with the cosine formula, that is, it computes the angle between the query keyword vector and the product vector; the smaller the angle, the more similar the two are. The main idea of the BM algorithm is to segment the query Query into words qi; then, for each search result D, to compute the relevance score of each word qi with respect to D; and finally to take a weighted sum of these scores to obtain the relevance score between Query and D.
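The cosine model described above can be sketched in a few lines. This is a minimal illustration only, assuming a plain bag-of-words term-count representation; the function name `cosine_similarity` and the sample terms are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def cosine_similarity(query_terms, title_terms):
    """Cosine of the angle between two bag-of-words term-count vectors."""
    q, d = Counter(query_terms), Counter(title_terms)
    # Dot product over the query's terms; Counter returns 0 for missing terms.
    dot = sum(q[t] * d[t] for t in q)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # an empty vector has no direction
    return dot / (norm_q * norm_d)


# Hypothetical query keywords and product-name terms.
score = cosine_similarity(["apple", "phone"], ["apple", "phone", "case"])
```

A larger value means a smaller angle between the two vectors, which is the sense in which the text calls two vectors "more similar".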
Both existing schemes for computing the relevance of retrieval results, the cosine algorithm and the BM algorithm, focus on the lexical relevance between the query keywords and the products. They mainly consider the textual relevance between the keywords and the product name or product description, and give little consideration to the user's query keywords themselves.
Summary of the invention
To solve the above technical problem, the invention provides a data retrieval method and device.
According to a first aspect of embodiments of the present invention, a data retrieval method is provided. The method comprises: receiving a query string currently input by a user; looking up, in the user's query-category probability table, a query string corresponding to the currently input query string; and taking each category, among the categories corresponding to the found query string, whose probability is greater than a first predetermined threshold as a category of the currently input query string. The query-category probability table comprises query strings, the categories the user clicked for each query string, and the probability of clicking each of those categories.
In some embodiments of the present invention, when no query string corresponding to the currently input query string is found in the user's query-category probability table, the currently input query string is segmented into a first group of words; words corresponding to the first group of words are looked up in the user's word-category probability table; the probability of each category to which the currently input query string may belong is determined from the probabilities of the categories corresponding to the first group of words; and each category whose probability is greater than a second predetermined threshold is taken as a category of the currently input query string. The word-category probability table comprises words, the categories the user clicked for each word, and the probability of clicking each of those categories.
In some embodiments of the present invention, determining the probability of the category to which the currently input query string belongs from the probabilities of the categories corresponding to the first group of words comprises: determining that probability with a Bayesian probability model from the probabilities of the categories corresponding to the first group of words.
In some embodiments of the present invention, the probability of clicking each category in the query-category probability table is determined from the number of times the user clicked that category for the query string and the number of times the user clicked all categories for the query string.
In some embodiments of the present invention, the query-category probability table and the word-category probability table are updated periodically according to the user's query log and click log.
In some embodiments of the present invention, the first group of words comprises a tail word and non-tail words.
In some embodiments of the present invention, the method further comprises: after the category of the currently input query string is determined, weighting the ranking of the retrieval results of the currently input query string according to the determined category.
According to a second aspect of embodiments of the present invention, a data retrieval device is provided. The device comprises: a receiving module, configured to receive a query string currently input by a user; and a lookup module, configured to look up, in the user's query-category probability table, a query string corresponding to the currently input query string, and to take each category, among the categories corresponding to the found query string, whose probability is greater than a first predetermined threshold as a category of the currently input query string. The query-category probability table comprises query strings, the categories the user clicked for each query string, and the probability of clicking each of those categories.
In some embodiments of the present invention, the lookup module is further configured, when no query string corresponding to the currently input query string is found in the user's query-category probability table, to segment the currently input query string into a first group of words; to look up words corresponding to the first group of words in the user's word-category probability table; to determine the probability of each category to which the currently input query string may belong from the probabilities of the categories corresponding to the first group of words; and to take each category whose probability is greater than a second predetermined threshold as a category of the currently input query string. The word-category probability table comprises words, the categories the user clicked for each word, and the probability of clicking each of those categories.
In some embodiments of the present invention, the lookup module determining the probability of the category to which the currently input query string belongs from the probabilities of the categories corresponding to the first group of words comprises: determining that probability with a Bayesian probability model from the probabilities of the categories corresponding to the first group of words.
In some embodiments of the present invention, the probability of clicking each category in the query-category probability table is determined from the number of times the user clicked that category for the query string and the number of times the user clicked all categories for the query string.
In some embodiments of the present invention, the query-category probability table and the word-category probability table are updated periodically according to the user's query log and click log.
In some embodiments of the present invention, the first group of words comprises a tail word and non-tail words.
In some embodiments of the present invention, the device further comprises: a weighting module, configured to weight, after the category of the currently input query string is determined, the ranking of the retrieval results of the currently input query string according to the determined category.
The data retrieval method and device provided by embodiments of the present invention characterize the user's search intent through the query-category probability table and the word-category probability table derived from the user's query log and click log. This increases the category-level correlation between the query string and the retrieval results and improves the accuracy and specificity of the retrieval results.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a data retrieval method according to one embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a data retrieval device according to one embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data retrieval device according to one embodiment of the present invention.
Detailed description of the embodiments
Various aspects of the present invention are described in detail below with reference to the drawings and specific embodiments. Well-known modules, units, and the connections, links, communications, or operations between them are not shown or not described in detail. The described features, architectures, or functions may be combined in any manner in one or more embodiments. Those skilled in the art will appreciate that the following embodiments are illustrative only and do not limit the scope of the invention. It will also be readily understood that the modules, units, or processing steps in the embodiments described herein and shown in the drawings may be combined and designed in various different configurations.
Referring to Fig. 1, which is a schematic flowchart of a data retrieval method according to one embodiment of the present invention, the method may comprise:
S101: receiving a query string input by a user;
S102: looking up, in the user's query-category probability table, a query string corresponding to the currently input query string, and taking each category, among the categories corresponding to the found query string, whose probability is greater than a first predetermined threshold as a category of the currently input query string.
The query-category probability table comprises query strings, the categories the user clicked for each query string, and the probability of clicking each of those categories.
The data retrieval method of the present invention is applicable to product search on all kinds of e-commerce websites and can run as an additional data processing flow in the website's application server. Specifically, in step S101 the query string input by the user is received. For example, the user may enter the keywords to be queried as a query string in the product search box of the e-commerce website's web page or client; the query string may contain Chinese characters, English words, digits, and so on. The application server that receives the query request receives the query string input by the user.
Next, step S102 is performed: a query string corresponding to the currently input query string is looked up in the user's query-category probability table, and each category, among the categories corresponding to the found query string, whose probability is greater than the first predetermined threshold is taken as a category of the currently input query string. The user's query-category probability table comprises query strings, the categories this user clicked for each query string, and the probability of clicking each of those categories. The table is obtained by compiling statistics over the user's query log and click log. In some specific embodiments, a user's query-category probability table may take the form <query1, <category i, probability P11>, <category j, probability P12>, <category n, probability P1n>>, <query2, <category i, probability P21>, <category j, probability P22>, <category n, probability P2n>>, and so on. Here query1 and query2 are query strings input by the user, which can be obtained from the user's query log; category i and category j are categories this user clicked for query1 and query2; and P11, P12, P21, P22, etc. are the probabilities that this user clicks category i and category j for query1 and query2, which can be calculated from the user's click log.
For one query string the user may click multiple categories (for example, product categories). The probability that the user clicks a category is determined from the number of times the user clicked that category for the query string and the number of times the user clicked all categories for the query string; specifically, it may be the click count of that category divided by the total click count over all categories. These click counts can be obtained from the user's query log and the corresponding click log. In other words, the user's query-category probability table contains the user's historical queries and the categories clicked for them. Computing the probability with which the user clicks each category characterizes the user's search intent fairly clearly and improves the accuracy of retrieval for the current query. In addition, the user's search log and click log can be processed periodically to update the user's query-category probability table and word-category probability table, which keeps both tables timely in characterizing user intent.
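The table construction just described (click count of a category divided by the click count over all categories for the same query) can be sketched as follows. This is only an illustration under stated assumptions: the click log is assumed to be available as (query, clicked category) pairs for a single user, and the name `build_query_category_table` is hypothetical.

```python
from collections import defaultdict


def build_query_category_table(click_log):
    """Build {query: {category: click probability}} for one user.

    click_log is an iterable of (query, clicked_category) pairs; this
    input format is an assumption made for the sketch.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for query, category in click_log:
        counts[query][category] += 1

    table = {}
    for query, per_category in counts.items():
        total = sum(per_category.values())  # clicks over all categories
        table[query] = {c: n / total for c, n in per_category.items()}
    return table


# Hypothetical log: three clicks on category 1 and one on category 2.
log = [("apple mobile phone", "category 1")] * 3 + [
    ("apple mobile phone", "category 2")
]
table = build_query_category_table(log)
```

Rebuilding the table from fresh logs on a schedule would correspond to the periodic update mentioned in the text.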
In step S102, looking up the query string corresponding to the query string input in step S101 means checking whether the user's query-category probability table holds a historical query string that is highly similar to the currently input query string. If it does, that historical query string has associated clicked categories and a probability for each, and the one or more categories (for example, 2 or 3) whose click probability is greater than the first predetermined threshold can be taken as the categories of the currently input query string. The first predetermined threshold may be determined from repeated statistics or in other ways. In some specific embodiments, the category with the highest click probability may be selected as the category of the currently input query string.
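A minimal sketch of this lookup step is shown below. It simplifies "highly similar" to an exact string match, which is an assumption of the sketch and not something the text prescribes; the function name and the threshold value are likewise hypothetical.

```python
def categories_for_query(table, query, threshold):
    """Return the categories whose click probability exceeds threshold.

    Categories are ordered by decreasing probability. None is returned
    when the query is absent so that a caller can fall back to another
    strategy. Exact matching stands in for the similarity test here.
    """
    probs = table.get(query)
    if probs is None:
        return None
    hits = [(c, p) for c, p in probs.items() if p > threshold]
    hits.sort(key=lambda cp: cp[1], reverse=True)
    return [c for c, _ in hits]


# Hypothetical table and threshold.
query_table = {
    "apple mobile phone": {"category 1": 0.6, "category 2": 0.3, "category 3": 0.1}
}
found = categories_for_query(query_table, "apple mobile phone", 0.2)
missing = categories_for_query(query_table, "apple 5s", 0.2)
```

Taking only `found[0]` would correspond to the variant in which the single most probable category is chosen.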
After the category of the currently input query string is determined, the ranking of its retrieval results (which may span multiple categories) can be weighted according to the determined category. For example, the retrieval results that belong to the determined category can be given extra weight so that they rank earlier and are shown to the user first. In other words, the data retrieval method of the present invention can be combined with prior-art retrieval based on textual relevance to improve the accuracy and specificity of the retrieval results.
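One way to picture the weighting is a multiplicative boost applied to the text-relevance score of results in the predicted category. The text does not specify the weighting formula, so the boost factor, the result tuple layout, and the name `rerank` below are all assumptions made for illustration.

```python
def rerank(results, predicted_categories, boost=1.5):
    """Re-rank results so that predicted categories are favoured.

    results is a list of (item, category, text_score) tuples. The
    multiplicative boost is a hypothetical weighting scheme.
    """
    preferred = set(predicted_categories)

    def weighted_score(result):
        _item, category, text_score = result
        return text_score * boost if category in preferred else text_score

    return sorted(results, key=weighted_score, reverse=True)


# Hypothetical text-relevance scores from an existing retrieval step.
results = [
    ("phone case", "accessories", 0.9),
    ("smartphone", "phones", 0.7),
]
ranked = rerank(results, ["phones"])
```

Here the text score still matters: a result outside the predicted category can stay on top if its text relevance is high enough to outweigh the boost.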
The data retrieval method of the present invention may further comprise: when no query string corresponding to the currently input query string is found in the user's query-category probability table, segmenting the currently input query string into a first group of words; looking up the words corresponding to the first group of words in the user's word-category probability table; determining the probability of each category to which the currently input query string may belong from the probabilities of the categories corresponding to the first group of words; and taking each category whose probability is greater than a second predetermined threshold as a category of the currently input query string. The word-category probability table comprises words, the categories the user clicked for each word, and the probability of clicking each of those categories.
The words in the word-category probability table are obtained by segmenting query strings. The categories the user clicked for a word are the categories the user clicked for the query strings containing that word; a word may appear in several query strings, as the word "mobile phone" in the example below appears in two. The probability of clicking each category is likewise computed over the query strings containing the word. For a word a, the click count of category C1 is the sum sumc1 of the number of times the user clicked category C1 across the one or more query strings containing a, and the probability that the user clicks category C1 for word a is sumc1 divided by the number of times the user clicked all categories for that word.
After the currently input query string is segmented into a group of words, the categories corresponding to these words can be looked up in the user's word-category probability table. A Bayesian model can then determine, from the probabilities of the categories found, the probability of each category to which the currently input query string may belong, and the one or more categories whose probability is greater than the second predetermined threshold are taken as the categories of the currently input query string. In some specific embodiments, the category with the highest probability may be selected as the category of the currently input query string.
In embodiments of the present invention, segmenting a query string, whether the currently input one or one from the query log, may include dividing it into a tail word and non-tail words. That is, the group of words obtained by segmenting a query string comprises a tail word and non-tail words. The tail word is the word at the end of the query string, denoted tail here. The non-tail words are the words other than the tail, that is, the words at the head of the query string, denoted head here. The query string is divided this way because in Chinese the central term generally sits at the end, so the tail word expresses the search intent of the query string better.
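The head/tail division can be sketched as a split of an already segmented word list. Word segmentation itself is out of scope here and is assumed to have been done by some segmenter; the function name `split_head_tail` is hypothetical.

```python
def split_head_tail(words):
    """Split segmented words into (head_words, tail_word).

    The tail is the last word of the query string; every other word
    belongs to the head.
    """
    if not words:
        raise ValueError("query string produced no words")
    return words[:-1], words[-1]


# Hypothetical segmentation result for "iphone5s mobile phone".
head, tail = split_head_tail(["iphone", "5s", "mobile phone"])
```

A single-word query yields an empty head and that word as the tail.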
According to Bayes' formula, the probability P(c|query) that a query string query belongs to category c is:

$$P(c \mid query) = \frac{P(query \mid c)\,P(c)}{P(query)} \tag{1}$$

Here P(query) is the probability that the query string query occurs. It does not depend on the category of the query string and can be ignored, so formula (1) can be expressed as:

$$P(c \mid query) = P(query \mid c)\,P(c) = P(head, tail \mid c)\,P(c) \tag{2}$$

Assuming that head and tail in the query string query are independent, formula (2) can be expressed as:

$$P(c \mid query) = P(tail \mid c)\,\prod_{i} P(h_i \mid c)\,P(c) = \frac{P(c \mid tail)\,P(tail)}{P(c)}\,\prod_{i=1}^{n} \frac{P(c \mid h_i)\,P(h_i)}{P(c)}\,P(c) \tag{3}$$

In formula (3), h_i is a word obtained by segmenting the head part head, and n is the number of such words. P(tail) and P(h_i) do not depend on the category of the query string and can be ignored, so formula (3) can be expressed as:

$$P(c \mid query) = \frac{P(c \mid tail)\,\prod_{i=1}^{n} P(c \mid h_i)}{P(c)^{n}} \tag{4}$$

In formula (4), P(c) is the probability of queries of each category among all queries. Assuming this probability is the same for every category, formula (4) can be expressed as:

$$P(c \mid query) = P(c \mid tail)\,\prod_{i=1}^{n} P(c \mid h_i) \tag{5}$$

In practice, the logarithm of the right-hand side of formula (5) can be taken, which turns the product into a sum:

$$\log P(c \mid query) = \log P(c \mid tail) + \sum_{i=1}^{n} \log P(c \mid h_i) \tag{6}$$
With formula (6), once a query string has been segmented, the probability of each category to which it may belong can be calculated, and the one or more categories whose probability is greater than the second predetermined threshold (which may, for example, be determined from statistics) are selected as the categories of the currently input query string. In some specific embodiments, the category with the highest probability may be selected as the category of the currently input query string.
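The log-form scoring can be sketched as below, reading it as a sum of the logarithms of the per-word category probabilities taken from the word-category probability table. Two points are assumptions of this sketch rather than statements from the text: a category that a word has never been clicked for is given a small floor probability so the logarithm stays defined, and the names `score_categories` and `floor` are hypothetical.

```python
import math


def score_categories(head_words, tail_word, word_table, floor=1e-6):
    """Return {category: log score} for a segmented query.

    word_table maps word -> {category: P(category | word)}. The score is
    log P(c|tail) plus the sum of log P(c|h_i) over the head words. The
    floor value for unseen (word, category) pairs is an assumption.
    """
    words = list(head_words) + [tail_word]
    candidates = set()
    for w in words:
        candidates.update(word_table.get(w, {}))

    scores = {}
    for c in candidates:
        scores[c] = sum(
            math.log(word_table.get(w, {}).get(c, floor)) for w in words
        )
    return scores


# Hypothetical word-category probabilities.
word_table = {
    "apple": {"category 1": 0.5, "category 2": 0.5},
    "5s": {"category 2": 0.5, "category 3": 0.5},
}
scores = score_categories(["apple"], "5s", word_table)
best = max(scores, key=scores.get)
```

Summing logarithms avoids the numerical underflow that multiplying many small probabilities can cause, and it preserves the ranking of categories because the logarithm is monotonic.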
After the currently input query string has been segmented and its category determined, the ranking of its retrieval results can be weighted according to the determined category, so that results in that category are shown first. This can be combined with prior-art retrieval techniques based on textual relevance to increase the category-level correlation between the retrieval results and the query string, thereby improving the accuracy and specificity of the retrieval results.
The querying process that uses the query-category probability table and the word-category probability table is illustrated below with a concrete example. Suppose the query log contains two queries, "apple mobile phone" and "iphone5s mobile phone", and the click log contains the click counts for these two queries:
<apple mobile phone, <category 1, click count i>, <category 2, click count j>>;
<iphone5s mobile phone, <category 2, click count x>, <category 3, click count y>>.
The query-category probability table calculated from them is:
<apple mobile phone, <category 1, click probability Pi>, <category 2, click probability Pj>>;
<iphone5s mobile phone, <category 2, click probability Px>, <category 3, click probability Py>>.
Segmenting the query "apple mobile phone" into "apple" and "mobile phone" gives the word-category frequency table:
<apple, <category 1, click count i>, <category 2, click count j>>,
<mobile phone, <category 1, click count i>, <category 2, click count j>>;
Segmenting the query "iphone5s mobile phone" into "iphone", "5s" and "mobile phone" gives:
<iphone, <category 2, click count x>, <category 3, click count y>>,
<5s, <category 2, click count x>, <category 3, click count y>>,
<mobile phone, <category 2, click count x>, <category 3, click count y>>.
Superimposing the segmentation results of the two queries gives:
<apple, <category 1, click count i>, <category 2, click count j>>;
<iphone, <category 2, click count x>, <category 3, click count y>>;
<5s, <category 2, click count x>, <category 3, click count y>>;
<mobile phone, <category 1, click count i>, <category 2, click count j + click count x>, <category 3, click count y>>.
The word-category probability table calculated from this is:
<apple, <category 1, click probability P11>, <category 2, click probability P12>>;
<iphone, <category 2, click probability P21>, <category 3, click probability P22>>;
<5s, <category 2, click probability P31>, <category 3, click probability P32>>;
<mobile phone, <category 1, click probability P41>, <category 2, click probability P42>, <category 3, click probability P43>>.
Here P11 = i/(i+j), P12 = j/(i+j), P21 = x/(x+y), P22 = y/(x+y), P31 = x/(x+y), P32 = y/(x+y), P41 = i/(i+j+x+y), P42 = (j+x)/(i+j+x+y), P43 = y/(i+j+x+y).
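The aggregation in this example can be reproduced with concrete numbers. The sketch below plugs in hypothetical click counts i = 6, j = 2, x = 3, y = 1 and takes the segmentation as given; the function name `build_word_category_table` and the input layout are assumptions made for illustration.

```python
from collections import defaultdict


def build_word_category_table(query_counts, segmentation):
    """Aggregate per-query click counts into {word: {category: probability}}.

    query_counts maps query -> {category: click count}; segmentation maps
    query -> list of words. Every word of a query inherits that query's
    click counts, and counts are summed across queries sharing a word.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for query, per_category in query_counts.items():
        for word in segmentation[query]:
            for category, n in per_category.items():
                counts[word][category] += n

    table = {}
    for word, per_category in counts.items():
        total = sum(per_category.values())
        table[word] = {c: n / total for c, n in per_category.items()}
    return table


# Hypothetical values: i = 6, j = 2, x = 3, y = 1.
query_counts = {
    "apple mobile phone": {"category 1": 6, "category 2": 2},
    "iphone5s mobile phone": {"category 2": 3, "category 3": 1},
}
segmentation = {
    "apple mobile phone": ["apple", "mobile phone"],
    "iphone5s mobile phone": ["iphone", "5s", "mobile phone"],
}
word_table = build_word_category_table(query_counts, segmentation)
```

With these numbers the entry for "mobile phone" combines both queries: category 2 receives j + x = 5 clicks out of i + j + x + y = 12.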
When the user currently inputs the query string "apple 5s" and "apple 5s" is not found in the query-category probability table, it is segmented into "apple" and "5s". The category probabilities of each word are then looked up in the word-category probability table, and the category of the currently input query string is determined with the Bayes formula above.
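The overall control flow for this example (query table first, word table as fallback) can be sketched as follows. Several parts are assumptions made for illustration: exact matching in the query table, whitespace splitting as a stand-in segmenter, a floor probability for unseen word/category pairs, selection of the single best category in the fallback branch, and all names and numeric values.

```python
import math


def classify(query, query_table, word_table, segment, threshold=0.3, floor=1e-6):
    """Return the predicted categories for query.

    Step 1: exact lookup in the query-category table, keeping categories
    above threshold. Step 2 (fallback): segment the query and pick the
    category with the highest summed log probability from the word table.
    """
    probs = query_table.get(query)
    if probs is not None:
        return [c for c, p in probs.items() if p > threshold]

    words = segment(query)
    candidates = {c for w in words for c in word_table.get(w, {})}
    if not candidates:
        return []

    def log_score(category):
        return sum(
            math.log(word_table.get(w, {}).get(category, floor)) for w in words
        )

    return [max(candidates, key=log_score)]


# Hypothetical tables for one user.
query_table = {"apple mobile phone": {"category 1": 0.75, "category 2": 0.25}}
word_table = {
    "apple": {"category 1": 0.75, "category 2": 0.25},
    "5s": {"category 2": 0.75, "category 3": 0.25},
}
predicted = classify("apple 5s", query_table, word_table, segment=str.split)
```

In this run "apple 5s" is absent from the query table, so the fallback branch is taken; category 2 is the only category supported by both words and therefore scores highest.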
By characterizing the search intent behind the user's own query strings through the query-category probability table and the word-category probability table derived from the user's query log and click log, the data retrieval method of the present invention improves the precision of the retrieval results and makes retrieval more specific to the user.
The data retrieval method of the present invention has been described above with reference to embodiments; a data retrieval device that applies the method is described below with reference to embodiments.
Referring to Fig. 2, which is a schematic structural diagram of a data retrieval device according to one embodiment of the present invention, the device 200 may comprise:
a receiving module 201, configured to receive a query string currently input by a user;
a lookup module 202, configured to look up, in the user's query-category probability table, a query string corresponding to the currently input query string, and to take each category, among the categories corresponding to the found query string, whose probability is greater than a first predetermined threshold as a category of the currently input query string. The query-category probability table comprises query strings, the categories the user clicked for each query string, and the probability of clicking each of those categories.
The data retrieval device 200 of the present invention is applicable to product search on all kinds of e-commerce websites. It can serve as an additional data processing module in the website's application server and can communicate with the other data processing modules in the application server to obtain their data. Each module is described in detail below.
The receiving module 201 can receive the query string input by the user. The user may enter the keywords to be queried as a query string in the product search box of the e-commerce website's web page or client; the query string may contain Chinese characters, English words, digits, and so on. The receiving module 201, located in the application server, can receive the query string input by the user as the user's currently input query string.
The lookup module 202 searches the user's query-category probability table for a query string corresponding to the current query string, and takes each category whose probability, for the query string found, is greater than a first predetermined threshold as a category of the current query string. The user's query-category probability table comprises query strings, the categories the user clicked for each query string, and the probability of clicking each of those categories. The table is obtained by compiling statistics over the user's query log and click log. In some specific embodiments, a user's query-category probability table may take the form <query1, <category i, probability p11>, <category j, probability p12>, <category n, probability p1n>>, <query2, <category i, probability p21>, <category j, probability p22>, <category n, probability p2n>>, and so on, where query1 and query2 are query strings input by the user, category i and category j are categories the user clicked for query1 and query2, and p11, p21, p12, p22, etc. are the probabilities that the user clicks category i and category j for query1 and query2. For one query string the user may click multiple categories (for example, commodity categories). The probability that the user clicks a given category is determined by the number of clicks on that category for the query string and the number of clicks on all categories for the query string; specifically, it can be the former divided by the latter. These click counts can be obtained from the user's query log and the corresponding click log. In other words, the query-category probability table of the present invention contains the user's historical queries and the categories clicked for them; computing the probability of clicking each category characterizes the user's search intent fairly clearly and improves the accuracy of retrieval for the current query. The user's search log and click log can also be processed statistically at regular intervals so that the query-category probability table is updated periodically.
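The click-ratio computation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_query_category_table`, the log format (a list of `(query, clicked_category)` pairs for one user), and the sample data are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def build_query_category_table(click_log):
    """Build {query: {category: probability}} for one user.

    click_log is assumed to be an iterable of (query, clicked_category)
    pairs taken from that user's query log and click log.
    """
    counts = defaultdict(Counter)
    for query, category in click_log:
        counts[query][category] += 1

    table = {}
    for query, per_category in counts.items():
        total = sum(per_category.values())  # clicks on all categories for this query
        # probability = clicks on this category / clicks on all categories
        table[query] = {c: n / total for c, n in per_category.items()}
    return table

# Hypothetical log entries, for illustration only.
log = [
    ("phone case", "accessories"),
    ("phone case", "accessories"),
    ("phone case", "phones"),
    ("apple", "fruit"),
]
table = build_query_category_table(log)
```

Rebuilding the table from fresh logs on a schedule would correspond to the periodic update mentioned in the text.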
The lookup module 202 can check whether the user's query-category probability table holds a historical query string that is highly similar to the current query string. If such a query string is stored, it has corresponding clicked categories and a probability for each; the one or more categories (for example, 2 or 3) whose click probability is greater than the first predetermined threshold can be taken as the categories of the current query string. The first predetermined threshold can be determined from repeated statistics or in other ways. In some specific embodiments, the category with the highest click probability can be selected as the category of the current query string.
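A rough sketch of this lookup step is given below. It simplifies "highly similar" to an exact dictionary match, which is an assumption; the function name `categories_for_query`, the default threshold of 0.3, and the sample table are likewise hypothetical.

```python
def categories_for_query(table, query, threshold=0.3, top_only=False):
    """Return the categories chosen for `query`, or None if it is not in the table.

    table is assumed to map query -> {category: click probability}.
    Exact matching stands in for the similarity check described in the text.
    """
    probs = table.get(query)
    if not probs:
        return None  # caller would fall back to the word-level lookup
    if top_only:
        # embodiment that keeps only the most probable category
        return [max(probs, key=probs.get)]
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return [category for category, p in ranked if p > threshold]

# Hypothetical table, for illustration only.
sample_table = {"phone case": {"accessories": 0.6, "phones": 0.35, "toys": 0.05}}
above_threshold = categories_for_query(sample_table, "phone case")
best_only = categories_for_query(sample_table, "phone case", top_only=True)
missing = categories_for_query(sample_table, "laptop")
```

Returning `None` for an unknown query keeps the "not found" case distinct from "found, but no category passed the threshold", which returns an empty list.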
After determining the category of the current query string, the lookup module 202 can send the determined category to the weighting module 203, as shown in Figure 3. The weighting module 203 can weight the ranking of the search results for the current query string according to the determined category. For example, among the search results for the current query string, the results belonging to the determined category can be given extra weight, so that they rank higher and are shown to the user first.
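One way to picture the weighting step is a score boost for results in the determined categories. The text does not specify the weighting scheme, so the multiplicative `boost` factor, the result fields (`id`, `category`, `score`), and the function name `rerank` are assumptions for illustration.

```python
def rerank(results, boosted_categories, boost=2.0):
    """Sort results so that items in boosted_categories move forward.

    results is assumed to be a list of dicts with 'id', 'category' and a
    base relevance 'score'; the multiplicative boost is a hypothetical choice.
    """
    def weighted_score(result):
        factor = boost if result["category"] in boosted_categories else 1.0
        return result["score"] * factor

    return sorted(results, key=weighted_score, reverse=True)

# Hypothetical search results for the query "apple".
results = [
    {"id": 1, "category": "fruit", "score": 0.9},
    {"id": 2, "category": "phones", "score": 0.6},
    {"id": 3, "category": "phones", "score": 0.5},
]
reranked_ids = [r["id"] for r in rerank(results, {"phones"})]
```

Because the base score is kept as a factor, a text-relevance score from an existing retrieval system could be supplied as `score` and still influence the order within a category.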
The lookup module 202 of the present invention can also be used when no query string corresponding to the current query string is found in the user's query-category probability table. In that case it segments the current query string to obtain a first group of words, looks up the words corresponding to the first group of words in the user's word-category probability table, determines the probability of each category to which the current query string may belong from the probabilities of the categories corresponding to the first group of words, and takes each category whose probability is greater than a second predetermined threshold as a category of the current query string. The word-category probability table comprises words, the categories the user clicked for each word, and the probability of clicking each of those categories. The words in the table are obtained by segmenting query strings. The categories clicked for a word are the categories the user clicked for the query strings containing that word (a word may appear in several query strings), and the click probabilities are likewise computed over those query strings. That is, the number of clicks on category C1 for a word a is the sum sumc1 of the user's clicks on C1 over the one or more query strings containing a, and the probability that the user clicks C1 for word a is sumc1 divided by the number of clicks on all categories the user clicked for word a. After the current query string is segmented into a group of words, the categories corresponding to those words can be looked up in the user's word-category probability table; a Bayesian model can then determine, from the probabilities of the categories found, the probability of each category to which the current query string may belong, and the one or more categories whose probability is greater than the second predetermined threshold are taken as the categories of the current query string. In some specific embodiments, the category with the highest probability can be selected as the category of the current query string.
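The word-level fallback can be sketched as below. The per-word counts follow the description (clicks summed over every query containing the word). The combination step is only an assumption: it takes a normalized product of per-word category probabilities, which matches a naive Bayes posterior if the category prior is uniform. Formulas (1) to (6) of the method section are not reproduced here and may differ. Whitespace splitting stands in for a real word segmenter, and all names and data are hypothetical.

```python
from collections import Counter, defaultdict

def build_word_category_table(click_log, segment=str.split):
    """Build {word: {category: probability}} from (query, category) clicks."""
    counts = defaultdict(Counter)
    for query, category in click_log:
        for word in set(segment(query)):  # count each query once per word
            counts[word][category] += 1
    return {
        word: {c: n / sum(per.values()) for c, n in per.items()}
        for word, per in counts.items()
    }

def classify(words, word_table, floor=1e-6):
    """Combine per-word category probabilities into one distribution.

    Unknown words are skipped; a category never clicked for a known word
    gets the small `floor` probability instead of zero (an assumption).
    """
    known = [w for w in words if w in word_table]
    categories = {c for w in known for c in word_table[w]}
    scores = {}
    for c in categories:
        score = 1.0
        for w in known:
            score *= word_table[w].get(c, floor)
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else {}

# Hypothetical click log, for illustration only.
log = [
    ("red apple", "fruit"),
    ("apple phone", "phones"),
    ("apple phone", "phones"),
    ("phone case", "accessories"),
]
word_table = build_word_category_table(log)
posterior = classify("apple phone charger".split(), word_table)
best = max(posterior, key=posterior.get)
```

Categories whose value in `posterior` exceeds the second predetermined threshold, or simply `best`, would then be passed on to the weighting step.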
Segmentation of a query string by the lookup module 202 can comprise dividing the query string into a tail word and non-tail words; that is, the group of words obtained by segmenting a query string comprises a tail word and non-tail words. The tail word is the word at the end of the query string, denoted tail here. A non-tail word is a word outside the tail of the query string, in other words a word at the head of the query string, denoted head here. The present invention divides the query string this way because in Chinese the main word generally comes last, so the tail word better expresses the search intent of the query string.
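The head/tail division can be illustrated on an already segmented query. The sketch assumes the segmenter has produced a list of words; the function name `split_head_tail` and the sample words are hypothetical, and how the two groups enter the probability formulas is not shown here.

```python
def split_head_tail(words):
    """Split a segmented query into (head_words, tail_word).

    The tail is the last word; every earlier word counts as head.
    An empty query yields ([], None).
    """
    if not words:
        return [], None
    return list(words[:-1]), words[-1]

# "红色 连衣裙" (red dress): the tail word names the item being searched for.
head, tail = split_head_tail(["红色", "连衣裙"])
```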
The lookup module 202 can determine the probability of the category to which the query string belongs with the Bayes formula according to formulas (1) to (6) in the method section above; the details are not repeated here.
After segmenting the current query string and determining the category to which it belongs, the lookup module 202 can send the determined category to the weighting module 203. The weighting module 203 can weight the ranking of the search results for the query string according to the determined category, so that results in that category are shown first. This can also be combined with existing text-relevance retrieval techniques to increase the relevance between the query string and the results within the category, improving the accuracy and specificity of the search results.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software combined with a hardware platform. On that understanding, all or part of the contribution that the technical solution of the present invention makes over the background art can be embodied as a software product. This computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and comprises a number of instructions that cause a computer device (which may be a personal computer, a server, a smartphone, a network device, etc.) to execute the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The terms and wording used in the specification of the present invention are for illustration only and are not meant to be limiting. Those skilled in the art will appreciate that various changes can be made to the details of the above embodiments without departing from the basic principles of the disclosed embodiments. Therefore, the scope of the present invention is determined only by the claims, in which, unless otherwise indicated, all terms should be understood in their broadest reasonable sense.
Claims (14)
1. A data retrieval method, characterized in that the method comprises:
receiving a query string currently input by a user; and
searching a query-category probability table of the user for a query string corresponding to the currently input query string, and taking each category whose probability, for the query string found, is greater than a first predetermined threshold as a category of the currently input query string,
wherein the query-category probability table comprises query strings, the categories clicked by the user for each query string, and the probability of clicking each of those categories.
2. The method according to claim 1, characterized in that the method further comprises:
when no query string corresponding to the currently input query string is found in the query-category probability table of the user, segmenting the currently input query string to obtain a first group of words, searching a word-category probability table of the user for words corresponding to the first group of words, determining the probability of each category to which the currently input query string belongs according to the probabilities of the categories corresponding to the first group of words, and taking each category whose probability is greater than a second predetermined threshold as a category of the currently input query string,
wherein the word-category probability table comprises words, the categories clicked by the user for each word, and the probability of clicking each of those categories.
3. The method according to claim 2, characterized in that determining the probability of each category to which the currently input query string belongs according to the probabilities of the categories corresponding to the first group of words comprises: determining that probability by a Bayesian probability model from the probabilities of the categories corresponding to the first group of words.
4. The method according to any one of claims 1 to 3, characterized in that the probability of clicking each category in the query-category probability table is determined by the number of clicks on that category by the user for the query string and the number of clicks on all categories by the user for the query string.
5. The method according to claim 2 or 3, characterized in that the query-category probability table and the word-category probability table are updated periodically according to the query log and click log of the user.
6. The method according to claim 2, characterized in that the first group of words comprises a tail word and non-tail words.
7. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
after the category of the currently input query string is determined, weighting the ranking of the search results for the currently input query string according to the determined category.
8. A data retrieval device, characterized in that the device comprises:
a receiving module, configured to receive a query string currently input by a user; and
a lookup module, configured to search a query-category probability table of the user for a query string corresponding to the currently input query string, and to take each category whose probability, for the query string found, is greater than a first predetermined threshold as a category of the currently input query string, wherein the query-category probability table comprises query strings, the categories clicked by the user for each query string, and the probability of clicking each of those categories.
9. The device according to claim 8, characterized in that the lookup module is further configured to, when no query string corresponding to the currently input query string is found in the query-category probability table of the user, segment the currently input query string to obtain a first group of words, search a word-category probability table of the user for words corresponding to the first group of words, determine the probability of each category to which the currently input query string belongs according to the probabilities of the categories corresponding to the first group of words, and take each category whose probability is greater than a second predetermined threshold as a category of the currently input query string, wherein the word-category probability table comprises words, the categories clicked by the user for each word, and the probability of clicking each of those categories.
10. The device according to claim 9, characterized in that the lookup module determining the probability of each category to which the currently input query string belongs according to the probabilities of the categories corresponding to the first group of words comprises: determining that probability by a Bayesian probability model from the probabilities of the categories corresponding to the first group of words.
11. The device according to any one of claims 8 to 10, characterized in that the probability of clicking each category in the query-category probability table is determined by the number of clicks on that category by the user for the query string and the number of clicks on all categories by the user for the query string.
12. The device according to claim 9 or 10, characterized in that the query-category probability table and the word-category probability table are updated periodically according to the query log and click log of the user.
13. The device according to claim 9, characterized in that the first group of words comprises a tail word and non-tail words.
14. The device according to any one of claims 8 to 10, characterized in that the device further comprises:
a weighting module, configured to, after the category of the currently input query string is determined, weight the ranking of the search results for the currently input query string according to the determined category.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510783040.2A CN105468680A (en) | 2015-11-16 | 2015-11-16 | Data retrieval method and device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510783040.2A CN105468680A (en) | 2015-11-16 | 2015-11-16 | Data retrieval method and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN105468680A (en) | 2016-04-06 |
Family
ID=55606381
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510783040.2A Pending CN105468680A (en) | 2015-11-16 | 2015-11-16 | Data retrieval method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN105468680A (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106909642A (en) * | 2017-02-20 | 2017-06-30 | 中国银行股份有限公司 | Database index method and system |
| CN111159552A (en) * | 2019-12-30 | 2020-05-15 | 北京每日优鲜电子商务有限公司 | Commodity searching method, commodity searching device, server and storage medium |
| TWI753267B (en) * | 2019-06-14 | 2022-01-21 | 劉國良 | System and implementation method thereof for optimizing consumption recommendation information and purchasing decisions |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP3012482B2 (en) * | 1995-05-24 | 2000-02-21 | 日本電気株式会社 | String data management system |
| CN102033877A (en) * | 2009-09-27 | 2011-04-27 | 阿里巴巴集团控股有限公司 | Search method and device |
| CN103034665A (en) * | 2011-10-10 | 2013-04-10 | 阿里巴巴集团控股有限公司 | Information searching method and device |
| CN103310343A (en) * | 2012-03-15 | 2013-09-18 | 阿里巴巴集团控股有限公司 | Commodity information issuing method and device |
| CN104424296A (en) * | 2013-09-02 | 2015-03-18 | 阿里巴巴集团控股有限公司 | Query word classifying method and query word classifying device |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US11651286B2 (en) | Method and system for distributed machine learning | |
| US20210224286A1 (en) | Search result processing method and apparatus, and storage medium | |
| US10685185B2 (en) | Keyword recommendation method and system based on latent Dirichlet allocation model | |
| US10482875B2 (en) | Word hash language model | |
| US20200074301A1 (en) | End-to-end structure-aware convolutional networks for knowledge base completion | |
| US9767183B2 (en) | Method and system for enhanced query term suggestion | |
| US10289957B2 (en) | Method and system for entity linking | |
| CN104899322A (en) | Search engine and implementation method thereof | |
| CN115168537B (en) | Training method and device for semantic retrieval model, electronic equipment and storage medium | |
| CN108170650B (en) | Text comparison method and text comparison device | |
| CN103870505A (en) | Query term recommending method and query term recommending system | |
| US8923655B1 (en) | Using senses of a query to rank images associated with the query | |
| CN102682001A (en) | Method and device for determining suggest word | |
| US20220414144A1 (en) | Multi-task deep hash learning-based retrieval method for massive logistics product images | |
| US10990626B2 (en) | Data storage and retrieval system using online supervised hashing | |
| CN110674635B (en) | Method and device for dividing text paragraphs | |
| CN110795527A (en) | Candidate entity ordering method, training method and related device | |
| WO2019133206A1 (en) | Search engine for identifying analogies | |
| EP3278238A1 (en) | Fast orthogonal projection | |
| CN104615723A (en) | Determining method and device of search term weight value | |
| CN105468680A (en) | Data retrieval method and device | |
| CN109657060A (en) | safety production accident case pushing method and system | |
| Kedia et al. | Keep learning: Self-supervised meta-learning for learning from inference | |
| CN114064929B (en) | Search ordering method and device | |
| CN104462347A (en) | Keyword classifying method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20160406 |