CN110766486A - Method and device for determining item category - Google Patents
- Publication number
- CN110766486A (application CN201810743678.7A)
- Authority
- CN
- China
- Prior art keywords
- category
- article
- keywords
- determining
- keyword
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]

- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0623—Electronic shopping [e-shopping] by investigating goods or services
- G06Q30/0625—Electronic shopping [e-shopping] by investigating goods or services by formulating product or service queries, e.g. using keywords or predefined options

- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Recommending goods or services
Landscapes
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and a device for determining item categories, and relates to the technical field of computers. One embodiment of the method comprises: determining keywords of each category according to transaction behavior data of users and preset category data, and determining keywords of an item according to description information of the item; and determining the category to which the item belongs according to the keywords of each category and the keywords of the item. By analyzing existing data, this embodiment determines the category of an item objectively and accurately. It thereby avoids the errors and inconvenience in item classification management that arise when merchants choose item categories themselves, as well as the poor shopping experience that results when customers cannot find the items they want or the items they find do not meet their needs.
    Description
Technical Field
      The invention relates to the technical field of computers, and in particular to a method and a device for determining item categories.
    Background
      At present, with the development of the e-commerce industry, more and more merchants sell their products on various e-commerce platforms, so product data is growing rapidly and poses a major challenge to the platforms' product data management and application systems. When managing product data, an e-commerce platform generally needs to classify products, that is: suitable basic characteristics of the products are chosen as classification criteria, and the products are successively divided into smaller subsets (namely, categories) whose members share more consistent characteristics, such as major categories, intermediate categories, minor categories, and fine categories, down to varieties, so that all products are clearly distinguished and systematically organized.
      The category of a product is an important product attribute. In a search system, the category serves as key filtering information and largely determines whether a customer can find the product through search. At present, product categories are mainly selected by merchants, and the accuracy of the selected category is neither measured nor objectively evaluated. When a large number of products is involved, categories chosen solely by merchants contain substantial errors, so customers may be unable to find a merchant's products, or the products they find may not meet their needs, which seriously degrades the shopping experience. Therefore, accurately determining product categories is an urgent problem for every system of an e-commerce platform.
    Disclosure of Invention
      In view of this, embodiments of the present invention provide a method and an apparatus for determining an item category that determine the category of an item objectively and accurately by analyzing existing data. This avoids the errors and inconvenience in item classification management caused by merchants choosing item categories themselves, and the poor shopping experience caused when customers cannot find the desired item or the items found do not meet their requirements.
      To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of determining a category of an article.
      A method of determining a category of an item comprises: determining keywords of each category according to transaction behavior data of users and preset category data, and determining keywords of an item according to description information of the item; and determining the category to which the item belongs according to the keywords of each category and the keywords of the item.
      Optionally, the step of determining the keywords of each category according to the transaction behavior data of users and the preset category data includes: obtaining the search terms entered by users from the users' transaction behavior data; for each search term, counting, according to the preset category data, the categories to which the items purchased via that search term belong and the deal data of each category; and using the search term as a keyword of the category whose deal data satisfies a predetermined rule, thereby determining the keywords of each category.
      Optionally, the step of determining the keywords of the item according to its description information includes: segmenting the description information of the item into words, filtering the resulting words to delete designated words, and using the remaining words as the keywords of the item.
      Optionally, the step of determining the category to which the item belongs according to the keywords of each category and the keywords of the item includes: calculating the matching degree between the keywords of the item and the keywords of each category; and determining any category whose keywords have a matching degree not less than a predetermined threshold as a category to which the item belongs.
      Optionally, the matching degree between the keywords of the item and the keywords of a category is obtained by calculating the Jaccard similarity coefficient between the two keyword sets.
      Optionally, the method further comprises: calculating an association degree score between the item and its category from the matching degree between the keywords of the item and the keywords of the category to which it belongs and from the item's sales ratio within that category, so as to evaluate the degree of association between the item and the category.
      According to another aspect of embodiments of the present invention, there is provided an apparatus for determining a category of an article.
      An apparatus for determining a category of an item comprises: a keyword determining module, configured to determine keywords of each category according to transaction behavior data of users and preset category data, and to determine keywords of an item according to description information of the item; and a category determining module, configured to determine the category to which the item belongs according to the keywords of each category and the keywords of the item.
      Optionally, the keyword determining module is further configured to: obtain the search terms entered by users from the users' transaction behavior data; for each search term, count, according to the preset category data, the categories to which the items purchased via that search term belong and the deal data of each category; and use the search term as a keyword of the category whose deal data satisfies a predetermined rule, thereby determining the keywords of each category.
      Optionally, the keyword determining module is further configured to: segment the description information of the item into words, filter the resulting words to delete designated words, and use the remaining words as the keywords of the item.
      Optionally, the category determining module is further configured to: calculate the matching degree between the keywords of the item and the keywords of each category; and determine any category whose keywords have a matching degree not less than a predetermined threshold as a category to which the item belongs.
      Optionally, the matching degree between the keywords of the item and the keywords of a category is obtained by calculating the Jaccard similarity coefficient between the two keyword sets.
      Optionally, the apparatus further comprises an association degree evaluation module, configured to: calculate an association degree score between the item and its category from the matching degree between the keywords of the item and the keywords of the category to which it belongs and from the item's sales ratio within that category, so as to evaluate the degree of association between the item and the category.
      According to yet another aspect of an embodiment of the present invention, there is provided an electronic device for determining a category of an article.
      An electronic device for determining a category of an item comprises: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for determining an item category provided by the embodiments of the invention.
      According to yet another aspect of embodiments of the present invention, a computer-readable medium is provided.
      A computer-readable medium stores a computer program which, when executed by a processor, implements the method for determining an item category provided by the embodiments of the invention.
      One embodiment of the above invention has the following advantages or benefits: the keywords of each category are determined by analyzing users' transaction behavior data and preset category data, the keywords of each item are determined by analyzing its description information, and the category of each item is then determined from these two sets of keywords. The category of an item is thus determined objectively and accurately from existing data, which avoids the errors and inconvenience in item classification management caused by merchants choosing categories themselves, and the poor shopping experience caused when customers cannot find items or the items found do not meet their requirements. Meanwhile, evaluating the degree of association between items and categories makes it possible to optimize the category-determination algorithm and further improve the accuracy of the determined categories; moreover, applying the evaluation results in the search system can improve customers' search and shopping experience.
      Further effects of the above optional implementations will be described below in connection with the embodiments.
    Drawings
      The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. In the drawings:
      FIG. 1 is a schematic diagram of the main steps of a method for determining an item category according to an embodiment of the present invention;
      FIG. 2 is a schematic diagram of the implementation principle of one embodiment of the present invention;
      FIG. 3 is a schematic diagram of the main modules of an apparatus for determining an item category according to an embodiment of the present invention;
      FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
      FIG. 5 is a schematic block diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the invention.
    Detailed Description
      Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
      To solve the problems in the prior art, the invention provides a method for determining the category of an item, which determines the category by analyzing users' transaction behavior data, preset category data, and the description information of the item. The method can also objectively evaluate the degree of association between an existing item and the category to which it belongs, which makes it easier to correct item categories.
      Fig. 1 is a schematic diagram of the main steps of a method for determining an item category according to an embodiment of the present invention. As shown in Fig. 1, the method mainly includes the following steps S101 and S102.
      Step S101: determine the keywords of each category according to the transaction behavior data of users and preset category data, and determine the keywords of an item according to the item's description information.
      In this embodiment, determining the categories of products on an e-commerce platform is taken as an example. The transaction behavior data of users includes, for example, data on the products users purchased (such as product name, price, brand, and attributes) and data on the searching, browsing, and other behaviors users performed in order to purchase those products. In the invention, the transaction behavior data is obtained by collecting, cleaning, and filtering the log data recorded by the logging system about users' searching, browsing, purchasing, and commenting on products.
      Specifically, the following rules may be applied when cleaning users' behavior logs:
      1. delete the data of users whose page-view volume ranks in the top 1% (the percentage can be adjusted as needed), since most of this traffic is not generated by humans and may come from operations such as order brushing;
      2. delete data without a user identifier;
      3. delete data whose source cannot be traced, for example, data that does not contain an IP (Internet Protocol) address;
      4. delete access data from IP addresses included in a blacklist.
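      The four cleaning rules above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the `LogRecord` fields, the function name `clean_logs`, and the choice to count each log record as one page view are assumptions. Rule 1 ranks identified users by that count and drops the top `top_fraction` of them (1% by default, rounded up).

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass(frozen=True)
class LogRecord:               # hypothetical log schema
    user_id: Optional[str]     # None when the record has no user identifier
    ip: Optional[str]          # None when the source IP address is missing
    action: str                # e.g. "search", "browse", "purchase", "comment"

def clean_logs(records: List[LogRecord], ip_blacklist: Set[str],
               top_fraction: float = 0.01) -> List[LogRecord]:
    """Apply the four cleaning rules and return the surviving records."""
    # Rule 1 (assumed interpretation): one record counts as one page view;
    # find the top `top_fraction` of identified users by volume, rounded up.
    volume = Counter(r.user_id for r in records if r.user_id is not None)
    n_drop = math.ceil(len(volume) * top_fraction)
    heavy_users = {user for user, _ in volume.most_common(n_drop)}
    return [
        r for r in records
        if r.user_id is not None            # Rule 2: user identifier required
        and r.user_id not in heavy_users    # Rule 1: drop likely non-human traffic
        and r.ip is not None                # Rule 3: source must be traceable
        and r.ip not in ip_blacklist        # Rule 4: blacklisted IPs removed
    ]
```

With only a handful of users, rounding up means the single heaviest user is always dropped; on real traffic the fraction would be tuned as the text suggests.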
      Cleaning the behavior logs filters out illegitimate data, so that only legitimate, genuine user behavior data is processed during analysis. After cleaning, the genuine user behavior data is further filtered to obtain the users' transaction behavior data.
      After the transaction behavior data of users is obtained, the keywords of each category can be determined from it and from the preset category data. The preset category data refers to the categories planned by the system to facilitate classification management of items; the invention assigns items to these preset categories.
      Specifically, in determining the keyword of each category, the following steps may be performed:
      obtain the search terms entered by users from the users' transaction behavior data;
      for each search term, count, according to the preset category data, the categories to which the items purchased via that search term belong and the deal data of each category;
      use the search term as a keyword of the category whose deal data satisfies a predetermined rule, thereby determining the keywords of each category.
      In one embodiment of the invention, the transaction behavior data of all users stored in the system over the past year is selected. Each transaction record yields the search term the user entered before purchasing a product of a certain category, so analyzing all the records yields the search terms entered when users purchased products of each category.
      For each search term in the resulting set, the deal data of each category corresponding to that search term can be counted. Here, the categories corresponding to a search term are the categories to which the products purchased via that search term belong, and the deal data is, for example, the number of deals and/or the deal proportion. Suppose that after users search with search term 1 and purchase products, the purchased products fall into categories A, B, and C; counting the products purchased via search term 1 in each of categories A, B, and C then gives the number of deals and the deal proportion for each of the three categories.
      Then, whether the search term should be a keyword of a certain category is decided from the deal data of each category and the predetermined rule. For example, if the rule is that the deal proportion must be at least 80%, the search term is determined to be a keyword of a category when that category's deal count accounts for 80% or more of the total deal count over all categories corresponding to the search term; if the rule is that the number of deals must be the largest, the search term is determined to be a keyword of the category with the most deals among all categories corresponding to it. In practice, the predetermined rule can be set flexibly as needed and is not limited to these examples.
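      A minimal sketch of the keyword-per-category step, assuming the cleaned transaction data has already been reduced to one `(search_term, category)` pair per deal. The function name `category_keywords` and this input shape are hypothetical; the default rule is the "deal proportion of at least 80%" example from the text.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

def category_keywords(deals: Iterable[Tuple[str, str]],
                      min_ratio: float = 0.8) -> Dict[str, Set[str]]:
    """Map each category to the search terms that qualify as its keywords.

    `deals` holds one (search_term, category) pair per deal, where `category`
    is the preset category of the item bought after searching for the term.
    """
    # Count deals per category for every search term.
    per_term: Dict[str, Counter] = defaultdict(Counter)
    for term, category in deals:
        per_term[term][category] += 1

    keywords: Dict[str, Set[str]] = defaultdict(set)
    for term, counts in per_term.items():
        top_category, top_count = counts.most_common(1)[0]
        # Ratio rule: the category with the most deals must hold at least
        # `min_ratio` of this term's deals. For min_ratio > 0.5 at most one
        # category can qualify; min_ratio=0 reduces to the "most deals" rule.
        if top_count / sum(counts.values()) >= min_ratio:
            keywords[top_category].add(term)
    return dict(keywords)
```

For instance, if "phone" led to 9 deals in "Mobile" and 1 in "Phone Case", it becomes a "Mobile" keyword (9/10 = 0.9), while a term split evenly between two categories becomes a keyword of neither.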
      In this way, the keywords of each category are determined from the users' transaction behavior data and the preset category data.
      According to another embodiment of the present invention, the keywords of an item may be determined from its description information as follows:
      segment the description information of the item into words, filter the resulting words to delete designated words, and use the remaining words as the keywords of the item.
      Specifically, in the e-commerce field, segmentation is applied mainly to description information such as the item's title and attributes. The description information of a product generally includes its name, brand, place of origin, specification, and so on. Segmentation rules can be tailored to the usage scenario; for example, the dictionary of a common segmentation method can be supplemented or modified to obtain a domain-specific dictionary, which is then used for segmentation. Common word segmentation methods, such as jieba segmentation or KCWS segmentation, can be used to implement the segmentation function of the invention.
      After segmentation, the resulting words are filtered: designated words are deleted according to a pre-stored designated word list, and the remaining words are used as the keywords of the item. Designated words are generally meaningless words such as auxiliary words, interjections, and punctuation marks. For example, segmenting the item description "hot-selling 1W units 32G XX mobile phone red" yields the words: hot-selling, 1W, units, 32G, XX, mobile phone, and red. "Hot-selling" is a meaningless word stored in the designated word list and is deleted, so the keywords of the item are: 1W, units, 32G, XX, mobile phone, red.
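      The filtering step can be sketched as below. Chinese segmentation itself (for example with a tool such as jieba) is out of scope, so the sketch assumes the description has already been segmented into a token list; the function name `item_keywords` and the tiny designated-word list are illustrative assumptions. Duplicates are dropped while keeping first-seen order, since the keywords are later compared as a set.

```python
from typing import Iterable, List, Set

def item_keywords(tokens: Iterable[str], designated_words: Set[str]) -> List[str]:
    """Drop designated (meaningless) words and blank tokens; keep the rest once, in order."""
    kept = (t for t in tokens if t.strip() and t not in designated_words)
    return list(dict.fromkeys(kept))  # dict keeps insertion order (Python 3.7+)

# Tokens of the example description, assumed to come from a segmenter.
tokens = ["hot-selling", "1W", "units", "32G", "XX", "mobile phone", "red"]
print(item_keywords(tokens, {"hot-selling", ",", "!"}))
# ['1W', 'units', '32G', 'XX', 'mobile phone', 'red']
```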
      Through the above steps, the keywords of an item are determined from its description information.
      After obtaining the keywords of each category and the keywords of the item, the category to which the item belongs may be determined by performing step S102.
      Step S102: determine the category to which the item belongs according to the keywords of each category and the keywords of the item.
      Specifically, step S102 may be performed in the following manner:
      calculate the matching degree between the keywords of the item and the keywords of each category;
      determine each category whose keywords have a matching degree with the item's keywords not less than a predetermined threshold as a category to which the item belongs.
      In one embodiment of the invention, the matching degree between the keywords of an item and the keywords of a category is obtained by calculating the Jaccard similarity coefficient of the two keyword sets.
      The Jaccard similarity coefficient, also known as the Jaccard coefficient, is mainly used to compare the similarity and difference between finite sample sets: the larger the coefficient, the more similar the samples. In the embodiment of the present invention, if A is the keyword set of an item and B is the keyword set of a category, the matching degree $\mathrm{Score}_{jaccard}$ between the item's keywords and the category's keywords can be calculated according to the following equation (1):
      $$\mathrm{Score}_{jaccard} = \frac{|A \cap B|}{|A \cup B|} \qquad (1)$$
      Because the keywords of an item and the keywords of a category are both finite sets, using the Jaccard similarity coefficient to calculate their matching degree keeps the processing simple, intuitive, and efficient, and the result is accurate.
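      Equation (1) translates directly into code. A minimal sketch with the hypothetical function name `jaccard_score`; returning 0.0 when both keyword sets are empty is an assumption, since the ratio is undefined in that case.

```python
from typing import Iterable

def jaccard_score(item_keywords: Iterable[str], category_keywords: Iterable[str]) -> float:
    """|A ∩ B| / |A ∪ B| for item keyword set A and category keyword set B."""
    a, b = set(item_keywords), set(category_keywords)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty keyword sets count as no match
    return len(a & b) / len(union)

print(jaccard_score(["32G", "mobile phone", "red"], ["32G", "mobile phone", "5G"]))  # 2 / 4 = 0.5
```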
      In addition, other similarity measures may be chosen as needed to calculate the matching degree between the keywords of an item and the keywords of a category, for example cosine similarity, Manhattan distance, or Euclidean distance. Taking cosine similarity as an example, features are extracted from the item's keywords and from the category's keywords to obtain the corresponding word vectors, and the cosine similarity of these vectors gives the matching degree. In a specific application, the similarity measure can be chosen flexibly as needed, and the present invention does not limit this choice.
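      As a sketch of the cosine alternative, the example below substitutes the simplest possible feature extraction, bag-of-words term counts, for the word vectors mentioned in the text; a real system might use learned embeddings instead. The function name `cosine_score` is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable

def cosine_score(item_keywords: Iterable[str], category_keywords: Iterable[str]) -> float:
    """Cosine similarity of bag-of-words count vectors built from two keyword lists."""
    a, b = Counter(item_keywords), Counter(category_keywords)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty keyword list matches nothing
    return dot / (norm_a * norm_b)
```

Unlike the Jaccard coefficient, this version is sensitive to repeated keywords, which only matters if the keyword lists are not deduplicated.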
      After the matching degrees are obtained, a predetermined threshold is set, and every category whose keywords have a matching degree with the item's keywords not less than the threshold is determined as a category to which the item belongs. For example, if the predetermined threshold is 0.8, any category whose keywords match the item's keywords with a degree of at least 0.8 is determined as a category to which the item belongs. The threshold is a value between 0 and 1 chosen to meet the requirements by analyzing a large amount of data.
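      Putting the matching degree and the threshold together, category assignment might look like the sketch below. It recomputes the Jaccard coefficient of equation (1) inline so the block is self-contained; `assign_categories` and its arguments are hypothetical names, and 0.8 is the example threshold from the text. Because the rule is "not less than the threshold", an item can receive several categories or none.

```python
from typing import Dict, List, Set

def assign_categories(item_kw: Set[str], category_kw: Dict[str, Set[str]],
                      threshold: float = 0.8) -> List[str]:
    """Return every category whose keyword set matches the item's with Jaccard >= threshold."""
    matched = []
    for category, kw in category_kw.items():
        union = item_kw | kw
        score = len(item_kw & kw) / len(union) if union else 0.0
        if score >= threshold:
            matched.append(category)
    return matched

categories = {
    "Mobile phone": {"32G", "mobile phone", "red"},
    "Phone case": {"mobile phone", "case", "shatter-proof"},
}
print(assign_categories({"32G", "mobile phone", "red"}, categories))  # ['Mobile phone']
```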
      Through steps S101 and S102, the keywords of each category are obtained by analyzing users' transaction behavior data and the preset category data, the keywords of an item are obtained by analyzing its description information, and the category of the item is then determined from the category keywords and the item keywords, so that the category of the item is determined objectively from existing data.
      In addition, the present invention can also evaluate the accuracy of the determined categories, that is, evaluate the degree of association between an item and its category. Example application scenarios include: after a period of time (e.g., one month), determining whether it is still appropriate or accurate to classify an item into its category; judging whether the category a merchant selected for a product is accurate; and verifying the algorithm of steps S101 and S102 for determining item categories and providing a basis for modifying it. The invention can therefore not only determine the category of an item, but also improve the accuracy of item categories by updating them. Further, the category of an item may be updated periodically: when the accuracy with which an item belongs to a certain category is detected to be below an accuracy threshold, the item can be automatically moved to a better-matching category, or a notification recommending a category can be sent to the merchant or administrator so that the item's category is updated.
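      The periodic update described above could be sketched as follows. Everything here is an assumption for illustration: the function name `recommend_category`, the accuracy threshold of 0.5, and the use of the Jaccard coefficient to pick the "better-matching" category. The caller decides whether to apply the returned category automatically or send it to the merchant or administrator as a recommendation.

```python
from typing import Dict, Optional, Set

def recommend_category(current: str, accuracy: float, item_kw: Set[str],
                       category_kw: Dict[str, Set[str]],
                       accuracy_threshold: float = 0.5) -> Optional[str]:
    """Return a better-matching category, or None if the current one should stay."""
    if accuracy >= accuracy_threshold or not category_kw:
        return None  # current assignment is considered accurate enough

    def jaccard(a: Set[str], b: Set[str]) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    # Pick the category whose keywords best match the item's keywords.
    best = max(category_kw, key=lambda c: jaccard(item_kw, category_kw[c]))
    return best if best != current else None
```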
      Specifically, in embodiments of the present invention, the accuracy with which an item belongs to a certain category is assessed from two perspectives. First, from user behavior: generally, if an item is purchased after being found through search, its category is credible, so whether the item has deals indicates, to a certain extent, whether its category is accurate. Second, from the matching degree between the item's keywords and the category's keywords. Combining the two allows the accuracy with which an item belongs to a category to be evaluated accurately and objectively.
      In the embodiment of the present invention, the degree of association between an item and a category is evaluated by calculating an association degree score between the item and the category to which it belongs from the matching degree between the item's keywords and that category's keywords and from the users' transaction behavior data, thereby evaluating the accuracy of the determined category.
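      In one embodiment of the present invention, the association degree score $\mathrm{Score}_{i\_j}$ of an item i and the category j to which it belongs is calculated by the following equation (2):
      $$\mathrm{Score}_{i\_j} = \alpha \cdot \frac{\mathrm{MonthGmv}_{i\_j}}{\mathrm{MonthGmv}_{j}} + \beta \cdot \mathrm{Score}_{jaccard} \qquad (2)$$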
      where $\mathrm{MonthGmv}_{i\_j}$ is the sales data (e.g., transaction amount or number of sales) of item i under category j over a recent period (e.g., one month); $\mathrm{MonthGmv}_{j}$ is the sales data (e.g., transaction amount or number of sales) of the entire category j over the same period; $\mathrm{Score}_{jaccard}$ is the matching degree between the keywords of item i and the keywords of the category j to which it belongs, calculated according to equation (1); and α and β are weight coefficients between 0 and 1 with α + β = 1.
      In one embodiment of the present invention, the influence of item i's sales ratio within its specified category j on the association score $\mathrm{Score}_{i\_j}$ is weighted more heavily, so α is set to 0.7 in the system. Analysis of deal data shows that the higher an item's deal share within a specified category, the higher the probability that it belongs to that category. For example, comparing the sales ratios of two items within the "Mobile phone" category, the sales ratio of item 1, titled "hot-selling 1W units 32G Huawei mobile phone red", is much higher than that of item 2, titled "full-coverage shatter-proof hard-shell mobile phone case for women", so item 1 is far more likely than item 2 to belong to the "Mobile phone" category. The textual relevance, that is, the matching degree between the item keywords and the category keywords, is weighted with β = 0.3. Finally, $\mathrm{Score}_{i\_j}$ is output, i.e., the degree of association between item i and the category j to which it belongs, or equivalently the accuracy with which item i belongs to category j, where category j was selected by the merchant or previously determined by the system.
      By setting the weight coefficients α and β, the relevancy score $Score_{i\_j}$ between item i and the category j to which it belongs can be obtained accurately. The larger $Score_{i\_j}$ is, the more closely item i is associated with category j, and the more accurately item i is assigned to category j.
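The score calculation can be sketched in Python as follows. This is a minimal sketch, assuming that formula (2) combines the two factors as a weighted sum, $Score_{i\_j} = \alpha \cdot MonthGmv_{i\_j} / MonthGmv_j + \beta \cdot score_{jaccard}$, which is what the definitions of α and β and the stated 0.7/0.3 weighting suggest. The function name `association_score` and the choice to treat a category with no sales as a sales ratio of 0 are illustrative assumptions.

```python
def association_score(item_sales_in_category: float,
                      category_total_sales: float,
                      jaccard_score: float,
                      alpha: float = 0.7,
                      beta: float = 0.3) -> float:
    """Association score of item i and its category j.

    Assumed form of formula (2):
        alpha * MonthGmv_{i_j} / MonthGmv_j + beta * score_jaccard
    """
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must lie in [0, 1]")
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha + beta must equal 1")
    # Sales ratio of item i within category j; a category with no sales
    # contributes nothing (an assumption; the description does not cover it).
    if category_total_sales > 0:
        sales_ratio = item_sales_in_category / category_total_sales
    else:
        sales_ratio = 0.0
    return alpha * sales_ratio + beta * jaccard_score


# An item with 400 of its category's 1,000 monthly sales and a Jaccard
# matching degree of 0.5 scores 0.7 * 0.4 + 0.3 * 0.5, approximately 0.43.
print(association_score(400, 1000, 0.5))
```

Because α dominates, two items with identical titles can still receive very different scores if one of them accounts for most of the category's sales.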
      Once the association score between item i and category j has been calculated, it can be used in several ways: the category to which the item belongs can be adjusted according to the score; the accuracy of the category selected by the merchant can be evaluated; and, in search ranking, the score can serve as an adjustment factor so that items are ranked by their association scores and items with a high purchase probability are recommended to the user.
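As a rough illustration of two of these uses, the following hypothetical sketch ranks items by their association score and flags items whose assigned category looks doubtful. The `ScoredItem` type, the function names, and the review threshold of 0.2 are assumptions for illustration; a real search system would more likely combine the score with other ranking signals than sort on it alone.

```python
from typing import NamedTuple


class ScoredItem(NamedTuple):
    item_id: str
    category: str  # category chosen by the merchant or assigned by the system
    score: float   # association score Score_{i_j} of the item and that category


def rank_by_score(items: list[ScoredItem]) -> list[ScoredItem]:
    """Order items so that those most strongly associated with their category come first."""
    return sorted(items, key=lambda it: it.score, reverse=True)


def flag_for_review(items: list[ScoredItem], threshold: float = 0.2) -> list[str]:
    """Return ids of items whose category assignment scores below the (hypothetical) threshold."""
    return [it.item_id for it in items if it.score < threshold]


items = [
    ScoredItem("case", "Mobile phone", 0.05),
    ScoredItem("phone", "Mobile phone", 0.43),
    ScoredItem("charger", "Mobile phone", 0.2),
]
print([it.item_id for it in rank_by_score(items)])  # ['phone', 'charger', 'case']
print(flag_for_review(items))                       # ['case']
```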
      Fig. 2 is a schematic diagram of the implementation principle of one embodiment of the present invention. As shown in Fig. 2, the user behavior log is first cleaned to delete invalid data, yielding valid user behavior data, from which the users' transaction behavior data are then selected. The keywords of each category are obtained by analyzing the users' transaction behavior data together with the preset category data, and the keywords of each item are obtained by segmenting the item's description information and deleting specified meaningless words. Next, the matching degree between the keywords of each category and the keywords of each item is calculated to obtain the category to which each item belongs. Finally, the association score between each item and its category is calculated from that matching degree and the users' transaction behavior data, so as to evaluate the accuracy of the category determined for the item.
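The first two steps in Fig. 2, cleaning the behavior log and selecting the transaction records, might look like the sketch below. The record fields, the action vocabulary, and the validity rule are all hypothetical, since the description does not specify the log format or what makes a record invalid.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical action vocabulary of the behavior log.
KNOWN_ACTIONS = {"search", "click", "cart", "order"}


@dataclass
class LogRecord:
    user_id: str
    action: str              # e.g. "search" or "order"
    query: Optional[str]     # search term that led to the action, if any
    item_id: Optional[str]
    category: Optional[str]  # preset category of the item, if any


def is_valid(rec: LogRecord) -> bool:
    """Hypothetical validity rule: a user id and a known action are required,
    and an order must also name the item and its category."""
    if not rec.user_id or rec.action not in KNOWN_ACTIONS:
        return False
    if rec.action == "order":
        return bool(rec.item_id and rec.category)
    return True


def clean_log(log: list[LogRecord]) -> list[LogRecord]:
    """Delete invalid records, keeping the valid user behavior data."""
    return [r for r in log if is_valid(r)]


def transaction_records(log: list[LogRecord]) -> list[LogRecord]:
    """Select the users' transaction (order) records from the cleaned log."""
    return [r for r in clean_log(log) if r.action == "order"]
```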
      Fig. 3 is a schematic diagram of the main modules of an apparatus for determining the category of an item according to an embodiment of the present invention. As shown in Fig. 3, the apparatus 300 for determining the category of an item mainly includes a keyword determining module 301 and a category determining module 302.
      The keyword determining module 301 is configured to determine the keywords of each category according to the users' transaction behavior data and preset category data, and to determine the keywords of an item according to the item's description information; and
      the category determining module 302 is configured to determine the category to which the item belongs according to the keywords of each category and the keywords of the item.
      According to an embodiment of the present invention, the keyword determining module 301 may be further configured to:
      obtain the search terms entered by users from the users' transaction behavior data;
      for each search term, count, according to the preset category data, the categories to which the items purchased via that search term belong and the transaction data of each of those categories; and
      take the search term as a keyword of each category whose transaction data satisfy a predetermined rule, thereby determining the keywords of each category.
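The per-search-term counting and selection performed by the keyword determining module 301 can be sketched as follows. The description leaves the "predetermined rule" open; this sketch assumes, purely for illustration, that a category qualifies when it received at least `min_count` deals for the search term and at least `min_share` of all deals for that term. The input format, a list of (search term, category) pairs with one pair per completed transaction, is also an assumption.

```python
from collections import Counter, defaultdict


def category_keywords(deals, min_share=0.3, min_count=2):
    """Map each category to the search terms selected as its keywords.

    deals: iterable of (search_term, category) pairs, one per completed
    transaction in which the purchased item was found via that search term.
    """
    # Count, per search term, how many deals fell into each category.
    per_term = defaultdict(Counter)
    for term, category in deals:
        per_term[term][category] += 1

    keywords = defaultdict(set)
    for term, counts in per_term.items():
        total = sum(counts.values())
        for category, n in counts.items():
            # Hypothetical "predetermined rule": enough deals and a large enough share.
            if n >= min_count and n / total >= min_share:
                keywords[category].add(term)
    return dict(keywords)


deals = ([("phone", "Mobile phone")] * 8 + [("phone", "Phone case")] * 2
         + [("phone case", "Phone case")] * 5 + [("phone case", "Mobile phone")])
print(category_keywords(deals))
# {'Mobile phone': {'phone'}, 'Phone case': {'phone case'}}
```

Requiring both a minimum count and a minimum share keeps a rare, accidental purchase (here, one phone bought after searching "phone case") from turning a search term into a keyword of the wrong category.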
      According to another embodiment of the present invention, the keyword determining module 301 may be further configured to:
      segment the description information of the item into words, filter the resulting words to delete specified words, and take the remaining words as the keywords of the item.
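A minimal sketch of item keyword extraction, assuming space-delimited text: it lowercases the description, splits it into word tokens with a regular expression, and removes a stop-word list standing in for the "specified words". Chinese titles would need a real word segmenter instead of the regex split, and the stop-word list here is hypothetical.

```python
import re

# Hypothetical list of "specified" meaningless words to delete.
STOP_WORDS = frozenset({"hot", "sale", "new", "free", "shipping", "the", "for", "and"})


def item_keywords(description: str, stop_words=STOP_WORDS) -> set[str]:
    """Segment an item description into words and drop the specified words."""
    # Segmentation stand-in: lowercase, then take runs of word characters.
    tokens = re.findall(r"\w+", description.lower())
    # Screening: whatever survives the stop-word filter becomes the item's keywords.
    return {t for t in tokens if t not in stop_words}


print(sorted(item_keywords("Hot sale 32G mobile phone, red - free shipping")))
# ['32g', 'mobile', 'phone', 'red']
```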
      According to yet another embodiment of the invention, the category determining module 302 may be further configured to:
      calculate the matching degree between the keywords of the item and the keywords of each category; and
      determine each category whose keywords match the keywords of the item with a matching degree not less than a preset threshold as the category to which the item belongs.
      According to the technical solution of the embodiment of the invention, the matching degree between the keywords of an item and the keywords of a category is obtained by calculating the Jaccard similarity coefficient between the two keyword sets.
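The matching and threshold step can be sketched as below, using the standard Jaccard similarity coefficient |A ∩ B| / |A ∪ B| on the two keyword sets. The threshold value of 0.2, the convention that two empty sets score 0, and the function names are illustrative assumptions, not values given in the description.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_categories(item_kw: set[str],
                      category_kw: dict[str, set[str]],
                      threshold: float = 0.2) -> list[str]:
    """Return every category whose keywords match the item's with a Jaccard
    coefficient not less than the threshold, sorted by name."""
    return sorted(
        category
        for category, kws in category_kw.items()
        if jaccard(item_kw, kws) >= threshold
    )


cats = {"Mobile phone": {"mobile", "phone", "smartphone"},
        "Phone case": {"case", "shell", "phone"}}
# Mobile phone: 2 shared / 5 total = 0.4; Phone case: 1 shared / 6 total ≈ 0.17
print(assign_categories({"32g", "mobile", "phone", "red"}, cats))  # ['Mobile phone']
```

Since every category at or above the threshold is returned, an item can match more than one category; the association score described earlier is one way to judge which assignment is most reliable.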
      In addition, the apparatus 300 for determining the category of an item according to the embodiment of the present invention may further include a correlation degree evaluation module (not shown in the figure) configured to:
      calculate the association score between the item and its category according to the matching degree between the keywords of the item and the keywords of the category to which it belongs and the sales ratio of the item in that category, so as to evaluate the degree of association between the item and the category.
      According to the technical solution of the embodiment of the invention, the users' transaction behavior data and the preset category data are analyzed to determine the keywords of each category, the description information of each item is analyzed to determine its keywords, and the category to which each item belongs is then determined from the keywords of each category and the keywords of the item. The category of an item is thus judged objectively and accurately from existing data. This solves problems such as classification errors and inconvenient item management caused by merchants choosing item categories themselves, and a poor customer shopping experience when items cannot be found by search or the items found do not meet customer needs. Meanwhile, evaluating the degree of association between items and categories makes it possible to optimize the algorithm for determining item categories and further improve the accuracy of the determined categories; moreover, using the evaluation results in the search system can improve the customer's search and shopping experience.
      Fig. 4 illustrates an exemplary system architecture 400 to which the method or the apparatus for determining an item category according to embodiments of the present invention may be applied.
      As shown in Fig. 4, the system architecture 400 may include terminal devices 401, 402, and 403, a network 404, and a server 405. The network 404 serves as a medium providing communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various types of connections, such as wired or wireless communication links, or fiber-optic cables.
      A user may use the terminal devices 401, 402, 403 to interact with the server 405 over the network 404 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 401, 402, 403, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, and social platform software (by way of example only).
      The terminal devices 401, 402, 403 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
      The server 405 may be a server providing various services, for example a back-end management server that supports the shopping websites browsed by users on the terminal devices 401, 402, 403. The back-end management server may analyze and otherwise process received data such as product information query requests, and feed the processing results (for example, targeted push information or product information) back to the terminal devices.
      It should be noted that the method for determining the category of an item provided by the embodiments of the present invention is generally executed by the server 405, and accordingly, the apparatus for determining the category of an item is generally disposed in the server 405.
      It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
      Referring now to Fig. 5, a block diagram of a computer system 500 suitable for implementing a terminal device or server of an embodiment of the invention is shown. The terminal device or server shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
      As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
      The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
      In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. When executed by the central processing unit (CPU) 501, the computer program performs the above-described functions defined in the system of the present invention.
      It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
      The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
      The units or modules described in the embodiments of the present invention may be implemented by software, or may be implemented by hardware. The described units or modules may also be provided in a processor, and may be described as: a processor includes a keyword determination module and a category determination module. The names of the units or modules do not form a limitation on the units or modules, for example, the keyword determination module may also be described as "a module for determining a keyword of each category according to transaction behavior data of a user and preset category data, and determining the keyword of an article according to description information of the article".
      As another aspect, the present invention also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: determine the keywords of each category according to users' transaction behavior data and preset category data, and determine the keywords of an article according to the description information of the article; and determine the category to which the article belongs according to the keywords of each category and the keywords of the article.
      The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
    Claims (14)
1. A method of determining a category of an item, comprising:
      determining a keyword of each category according to transaction behavior data of a user and preset category data, and determining the keyword of an article according to description information of the article;
      and determining the category to which the article belongs according to the keywords of each category and the keywords of the article.
    2. The method of claim 1, wherein the step of determining the keyword for each category according to the transaction behavior data of the user and the preset category data comprises:
      obtaining search terms input by a user according to transaction behavior data of the user;
      for each search term, counting, according to preset category data, the categories to which the articles purchased via the search term belong and the transaction data of each of those categories; and
      using the search term as a keyword of each category whose transaction data meet a preset rule, thereby determining the keywords of each category.
    3. The method of claim 1, wherein the step of determining the keyword of the item according to the description information of the item comprises:
      segmenting the description information of the article, screening the words obtained by segmentation to delete specified words, and taking the remaining words as the keywords of the article.
    4. The method according to claim 1, wherein the step of determining the category to which the item belongs according to the keywords of each category and the keywords of the item comprises:
      respectively calculating the matching degree of the keywords of the article and the keywords of each category;
      and determining each category whose keywords have a matching degree with the keywords of the article not less than a preset threshold as the category to which the article belongs.
    5. The method according to claim 4, wherein the matching degree between the keywords of the article and the keywords of the category is obtained by calculating the Jaccard similarity coefficient between the keywords of the article and the keywords of the category.
    6. The method according to claim 4 or 5, characterized in that the method further comprises:
      calculating an association degree score of the article and the category according to the matching degree between the keywords of the article and the keywords of the category to which the article belongs and the sales ratio of the article in the category, so as to evaluate the degree of association between the article and the category.
    7. An apparatus for determining a category of an item, comprising:
      the keyword determining module is used for determining keywords of each category according to transaction behavior data of a user and preset category data, and determining the keywords of the article according to description information of the article;
      and the category determining module is used for determining the category to which the article belongs according to the key words of each category and the key words of the article.
    8. The apparatus of claim 7, wherein the keyword determination module is further configured to:
      obtaining search terms input by a user according to transaction behavior data of the user;
      for each search term, counting, according to preset category data, the categories to which the articles purchased via the search term belong and the transaction data of each of those categories; and
      using the search term as a keyword of each category whose transaction data meet a preset rule, thereby determining the keywords of each category.
    9. The apparatus of claim 7, wherein the keyword determination module is further configured to:
      segmenting the description information of the article, screening the words obtained by segmentation to delete specified words, and taking the remaining words as the keywords of the article.
    10. The apparatus of claim 7, wherein the category determination module is further configured to:
      respectively calculating the matching degree of the keywords of the article and the keywords of each category;
      and determining each category whose keywords have a matching degree with the keywords of the article not less than a preset threshold as the category to which the article belongs.
    11. The apparatus of claim 10, wherein the matching degree between the keywords of the article and the keywords of the category is obtained by calculating the Jaccard similarity coefficient between the keywords of the article and the keywords of the category.
    12. The apparatus according to claim 10 or 11, further comprising a correlation degree evaluation module configured to:
      calculate an association degree score of the article and the category according to the matching degree between the keywords of the article and the keywords of the category to which the article belongs and the sales ratio of the article in the category, so as to evaluate the degree of association between the article and the category.
    13. An electronic device for determining a category of an item, comprising:
      one or more processors;
      a storage device for storing one or more programs,
      wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
    14. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-6.
    Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201810743678.7A CN110766486B (en) | 2018-07-09 | 2018-07-09 | Method and device for determining item category | 
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201810743678.7A CN110766486B (en) | 2018-07-09 | 2018-07-09 | Method and device for determining item category | 
Publications (2)
| Publication Number | Publication Date | 
|---|---|
| CN110766486A (en) | 2020-02-07 | 
| CN110766486B (en) | 2024-10-22 | 
Family
ID=69327914
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| CN201810743678.7A Active CN110766486B (en) | 2018-07-09 | 2018-07-09 | Method and device for determining item category | 
Country Status (1)
| Country | Link | 
|---|---|
| CN (1) | CN110766486B (en) | 
Cited By (12)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN111553762A (en) * | 2020-04-24 | 2020-08-18 | 广州探途网络技术有限公司 | Method, system and terminal equipment for improving search quality | 
| CN112463971A (en) * | 2020-09-15 | 2021-03-09 | 杭州商情智能有限公司 | E-commerce commodity classification method and system based on hierarchical combination model | 
| CN112767081A (en) * | 2021-01-19 | 2021-05-07 | 广州新丝路信息科技有限公司 | Cross-border bonded bin commodity classification method and device | 
| CN113157851A (en) * | 2021-02-23 | 2021-07-23 | 北京三快在线科技有限公司 | Category information generation method and device, electronic equipment and computer readable medium | 
| CN113706257A (en) * | 2021-09-01 | 2021-11-26 | 北京京东振世信息技术有限公司 | Article information processing method, searching method and device | 
| CN113743973A (en) * | 2020-11-30 | 2021-12-03 | 北京沃东天骏信息技术有限公司 | Method and device for analyzing market hotspot trend | 
| CN113779243A (en) * | 2021-08-16 | 2021-12-10 | 深圳市世强元件网络有限公司 | A kind of commodity automatic classification method, device and computer equipment | 
| CN114155586A (en) * | 2021-11-30 | 2022-03-08 | 佛山安赛夫信息科技有限公司 | An intelligent interrogation control system and its control method | 
| CN114529337A (en) * | 2022-02-08 | 2022-05-24 | 北京电解智科技有限公司 | Information detection method and device | 
| CN115114994A (en) * | 2022-07-15 | 2022-09-27 | 北京沃东天骏信息技术有限公司 | Method and device for determining commodity category information | 
| CN115345700A (en) * | 2022-08-03 | 2022-11-15 | 鑫源易网(大连)电力科技有限公司 | A method for standardizing electric power customized product categories | 
| CN116628556A (en) * | 2023-06-14 | 2023-08-22 | 上海桥创科技有限公司 | Product label establishing method and system | 
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| JP2008299839A (en) * | 2007-05-31 | 2008-12-11 | Nhn Corp | Keyword recommendation method, computer-readable recording medium, keyword recommendation system | 
| CN103310343A (en) * | 2012-03-15 | 2013-09-18 | 阿里巴巴集团控股有限公司 | Commodity information issuing method and device | 
| CN103605815A (en) * | 2013-12-11 | 2014-02-26 | 焦点科技股份有限公司 | Automatic commodity information classifying and recommending method applicable to B2B (Business to Business) e-commerce platform | 
| WO2016107455A1 (en) * | 2014-12-29 | 2016-07-07 | 阿里巴巴集团控股有限公司 | Information matching processing method and apparatus | 
| CN105893349A (en) * | 2016-03-31 | 2016-08-24 | 新浪网技术(中国)有限公司 | Category label matching and mapping method and device | 
| CN106919576A (en) * | 2015-12-24 | 2017-07-04 | 北京奇虎科技有限公司 | Using the method and device of two grades of classes keywords database search for application now | 
| CN107767172A (en) * | 2017-10-12 | 2018-03-06 | 百度在线网络技术(北京)有限公司 | Information-pushing method, device, server and medium | 
Also Published As
| Publication number | Publication date | 
|---|---|
| CN110766486B (en) | 2024-10-22 | 
Similar Documents
| Publication | Title |
|---|---|
| CN110766486B (en) | Method and device for determining item category |
| CN111444304B (en) | Search ordering method and device |
| US9934293B2 (en) | Generating search results |
| US11127063B2 (en) | Product and content association |
| CN107832338B (en) | Method and system for recognizing core product words |
| CN110111167A (en) | A kind of method and apparatus of determining recommended |
| CN107908616B (en) | Method and device for predicting trend words |
| WO2016107455A1 (en) | Information matching processing method and apparatus |
| CN107679916A (en) | For obtaining the method and device of user interest degree |
| CN113742564B (en) | Method and device for pushing target resources |
| CN113722593B (en) | Event data processing methods, devices, electronic equipment and media |
| CN110827101B (en) | Shop recommending method and device |
| CN114445179A (en) | Business recommendation method, apparatus, electronic device, and computer-readable medium |
| CN110232581B (en) | Method and device for providing coupons for users |
| CN110020131B (en) | Method and device for arranging commodities |
| CN111625619B (en) | Query omission method, device, computer readable medium and electronic equipment |
| CN112529646A (en) | Commodity classification method and device |
| CN112667770A (en) | Method and device for classifying articles |
| CN111723201A (en) | A method and apparatus for text data clustering |
| CN110858231A (en) | Item recommendation method and device |
| CN112579896A (en) | Information recommendation method and device, electronic equipment and storage medium |
| CN113327145A (en) | Article recommendation method and device |
| CN110110267B (en) | Method and device for extracting object characteristics and searching objects |
| CN111563107A (en) | Method, apparatus, electronic device and storage medium for information recommendation |
| CN112784861A (en) | Similarity determination method and device, electronic equipment and storage medium |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |