WO2018187949A1 - Perspective analysis method for machine learning model - Google Patents
Perspective analysis method for machine learning model
- Publication number
- WO2018187949A1 (application PCT/CN2017/080173)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- query
- result
- learning model
- user
- machine learning
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
Definitions
- The present invention relates to a perspective analysis method for a machine learning model and belongs to the field of Internet search.
- Search engines have become an important tool for using Internet information resources.
- Examples include search engines such as Google, Yahoo!, Bing, and Baidu.
- The relevance of query results has attracted increasing attention.
- The quality of the ranking of query results has also become a main indicator for evaluating a search engine.
- The user gives keywords as a query request.
- The search engine queries the index database according to the user query and returns the ranked, relevance-analyzed retrieval results to the user, helping the user discard and ignore a large amount of irrelevant information and thereby playing the role of information navigation.
- The massive amount of information data means massive search results.
- Most search engine users browse only the first few pages of the returned results and rarely look at lower-ranked pages. Search results with strong relevance should be ranked higher, while weakly relevant results should be ranked lower. Therefore, sorting query results by relevance is one of the core problems of search engines, and the relevance ranking of search results has become an important indicator for evaluating search engine performance.
- A multidimensional feature vector is used to represent the relevant attributes and information of each data pair (user query, query result). Some data pairs are extracted from the dataset, and the relevance of the query result to the user query in each pair is labeled manually.
- The machine learning model is trained using the labeled data as a training data set, and the resulting model is used to predict the relevance between unseen queries and query results.
- Machine learning models can make prediction errors in application, for reasons such as noisy or extreme training data, an unstable data distribution, and defects in the machine learning model itself.
- Step 1: Collect the error data fed back by users, extract the basic information, and extract the relevant information in the feedback data to generate a feature space vector;
- Step 2: Calculate the score of each query result: use the original model and the sub-models to classify the user query results and obtain the classification result, that is, the evaluation score;
- Step 3: For each user query, calculate the nDCG value of the query results. The actual ranking is obtained from the training result of the machine learning model, and the ideal ranking is obtained from the query results and the user query.
- The nDCG value of the user query is then obtained from the actual ranking and the ideal ranking;
- Step 4: Clustering: according to the trend of the nDCG values, obtain the optimal sub-model for each query, and cluster the user queries according to the similarity of their sub-models;
- Step 5: Attribute extraction: analyze the information of all members in each class and extract some attributes as the feature space vector of the class;
- Step 6: Learning an unknown user query: when an unknown user query is given, analyze its attributes and classify the user query, thereby obtaining the optimal sub-model corresponding to that user query.
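Step 6 can be illustrated with a minimal sketch, assuming each class is summarized by a numeric feature vector and an unknown query is assigned to the nearest class by Euclidean distance. The distance measure, the names `assign_class`, `class_vectors`, and `optimal_submodel`, the class labels, and the toy values are all illustrative assumptions, not details given in the source.

```python
import math
from typing import Dict, Sequence


def assign_class(query_features: Sequence[float],
                 class_vectors: Dict[str, Sequence[float]]) -> str:
    """Return the name of the class whose feature vector is closest to the query."""
    return min(class_vectors,
               key=lambda name: math.dist(query_features, class_vectors[name]))


# Hypothetical class feature vectors (Step 5) and the optimal sub-model
# recorded for each class (Step 4); both are made-up illustration data.
class_vectors = {"navigational": [0.9, 0.1], "informational": [0.2, 0.8]}
optimal_submodel = {"navigational": 12, "informational": 30}

unknown_query = [0.3, 0.7]
cls = assign_class(unknown_query, class_vectors)
print(cls, optimal_submodel[cls])
```

Any classifier could stand in for the nearest-vector rule; it is used here only because it needs no training step.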
- the user feedback data collected in step 1 above includes attribute information of a series of query results.
- the original learning model is used to learn each query result, and the result of the classification on each decision tree is obtained to calculate the score of the query result.
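As a rough sketch of how per-tree outputs might be combined into a score, the example below assumes the learning model is an additive ensemble in which each decision tree maps a feature vector to a number and the score is the sum over trees. The `Tree` type, the function names, and the toy stumps are hypothetical; the source does not specify how the tree outputs are combined.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a "tree" maps a feature vector to a numeric output.
Tree = Callable[[Sequence[float]], float]


def score_result(trees: Sequence[Tree], features: Sequence[float]) -> float:
    """Sum the per-tree outputs to get the evaluation score of one query result."""
    return sum(tree(features) for tree in trees)


def rank_results(trees: Sequence[Tree],
                 results: Dict[str, Sequence[float]]) -> List[str]:
    """Return result ids sorted by descending score."""
    return sorted(results,
                  key=lambda doc_id: score_result(trees, results[doc_id]),
                  reverse=True)


# Toy decision stumps standing in for trained trees.
trees = [
    lambda x: 1.0 if x[0] > 0.5 else 0.0,
    lambda x: 0.5 if x[1] > 0.2 else -0.5,
]
results = {"doc_a": [0.9, 0.1], "doc_b": [0.4, 0.1], "doc_c": [0.8, 0.6]}
ranking = rank_results(trees, results)
print(ranking)
```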
- The ranking of the query results is obtained from the prediction results of the decision trees, and the ideal order of the query results is obtained from the relevance between each query result and the user query.
- For each user query, the nDCG value on every sub-model can be obtained, and from these values the nDCG variation curve.
- Extract the sub-model, constructed from decision trees, for which nDCG is largest.
- The perspective analysis method for a machine learning model provided by the present invention analyzes the sub-models inside the learning model, filters out the sub-models with poor classification results, selects the sub-models with better classification results, and recombines the selected sub-models into a new learning model that has higher prediction accuracy.
- the present invention provides a perspective analysis method for a machine learning model.
- The present invention is described in further detail below in order to clarify its objects, technical solutions, and effects. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
- the query result with high relevance is more applicable to the user, and should be ranked in the front position in the query list.
- the query result has a better correlation with the user query, indicating that the query result has more application value.
- $i$ indicates the position of each query result in the query list; $rel_i$ indicates the relevance of the query result at the $i$-th position.
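The formula that these symbols belong to did not survive in this text. As an assumption based on a common definition of DCG, with the position index written as $i$, the relevance at that position as $rel_i$, and $p$ (the number of ranked results considered) introduced here for illustration, it can be written as:

```latex
\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}
```

A linear-gain variant replaces the numerator with $rel_i$; the source does not say which form is intended.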
- This embodiment first collects user feedback data.
- the user feedback data includes the relevance of the current query result to the user query.
- the feedback data is learned and trained using the established machine learning model.
- The query results are sorted by score, and the result of this sort is the actual ranking.
- Sorting the query results according to the relevance marked by the user gives the ideal ranking.
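- From the actual ranking, the relevance of each query result to the query, and the ideal ranking, the value of nDCG can be calculated, namely: $nDCG = DCG / IDCG$, where $DCG$ is computed on the actual ranking and $IDCG$ on the ideal ranking.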
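The calculation described above can be sketched as follows, assuming the common normalization in which the DCG of the actual ranking is divided by the DCG of the ideal ranking, and assuming an exponential gain $2^{rel} - 1$ with a $\log_2$ position discount. The function names and the gain form are assumptions; the source does not spell them out.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """DCG of relevance labels listed in ranked order (position 1 first)."""
    return sum((2 ** rel - 1) / math.log2(pos + 1)
               for pos, rel in enumerate(relevances, start=1))


def ndcg(actual_order: Sequence[float]) -> float:
    """nDCG: DCG of the actual ranking divided by DCG of the ideal ranking."""
    ideal = dcg(sorted(actual_order, reverse=True))
    return dcg(actual_order) / ideal if ideal > 0 else 0.0


# Relevance labels in the order the model ranked the results.
print(ndcg([3, 2, 1, 0]))  # ranking already matches the ideal order
print(ndcg([0, 1, 2, 3]))  # worst possible order for these labels
```

The ideal ranking here is simply the labels sorted in descending order, which matches the description of sorting by user-marked relevance.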
- the embodiment includes the following steps:
- Attribute extraction: analyze the information of all members in each class and extract some attributes as the feature space vector of the class;
- The collected user feedback data includes attribute information for a series of query results, mainly: the user query (query); the query result set D, in which each query result doc corresponds to a search web page url; the degree of relevance of each query result to the user query; the tag information id; and other attribute information used for classification in the learning model.
- The feature space vector of the user feedback data can be expressed as <query, doc, rating, features>.
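One way to hold such a record in code is a small data class. This is a sketch only: the field types, the class name `FeedbackRecord`, and the example values are assumptions chosen for illustration, with the four fields mirroring the query, doc, rating, and features components named above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeedbackRecord:
    """One user-feedback data pair (user query, query result)."""
    query: str                      # the user query text
    doc: str                        # URL of the returned result page
    rating: int                     # user-marked relevance of doc to query
    features: List[float] = field(default_factory=list)  # attributes used by the model


# Made-up example record.
record = FeedbackRecord(
    query="machine learning",
    doc="https://example.com/page",
    rating=2,
    features=[0.7, 0.1, 3.0],
)
print(record)
```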
- $i$ represents the position of each query result in the query list, and $rel_i$ represents the relevance of the search result at the $i$-th position.
- The feature vector space model can then be expressed as: $\langle query, nDCG_1, nDCG_2, \ldots, nDCG_n \rangle$.
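To make the per-query nDCG vector concrete, the sketch below assumes that sub-model $k$ consists of the first $k$ trees of an additive ensemble, so that one nDCG value is produced per prefix. That prefix interpretation, the function names, and the toy trees and documents are assumptions; the source does not define how sub-models are formed.

```python
import math
from typing import Callable, List, Sequence, Tuple

Tree = Callable[[Sequence[float]], float]  # hypothetical tree type


def dcg(rels: Sequence[float]) -> float:
    return sum((2 ** r - 1) / math.log2(i + 1) for i, r in enumerate(rels, start=1))


def ndcg(rels: Sequence[float]) -> float:
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0


def ndcg_vector(trees: Sequence[Tree],
                docs: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Return [nDCG_1, ..., nDCG_n]; nDCG_k ranks docs with the first k trees."""
    scores = [0.0] * len(docs)
    vector = []
    for tree in trees:
        # Add this tree's output to the running score of every document.
        scores = [s + tree(feats) for s, (feats, _) in zip(scores, docs)]
        order = sorted(range(len(docs)), key=lambda j: scores[j], reverse=True)
        vector.append(ndcg([docs[j][1] for j in order]))
    return vector


# Toy data: (features, user rating) for the results of one query.
docs = [([0.2, 0.0], 2), ([0.9, 0.5], 0), ([0.5, 0.1], 1)]
trees = [
    lambda x: x[0],
    lambda x: -2.0 * x[1],
    lambda x: 0.5 if x[0] < 0.3 else 0.0,
]
vec = ndcg_vector(trees, docs)
print(vec)
```

In this toy case the ranking improves as trees are added, so the vector traces a rising curve of the kind the text refers to.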
- Clustering: for each user query, the nDCG value on all sub-models can be obtained, and from these values the nDCG curve.
- The sub-model $m = [t_{r1}, t_{r2}, \ldots, t_{ra}]$, constructed from decision trees, for which nDCG has its maximum value can be extracted; ranking with this sub-model yields the largest nDCG value.
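Selecting the best sub-model per query and grouping queries can be sketched as below. The sketch assumes each query already has a list of nDCG values, one per candidate sub-model, and it uses the simplest possible notion of similarity: two queries fall in the same cluster when they share the same best sub-model. That grouping rule and all names and values are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def best_submodel(ndcg_values: Sequence[float]) -> int:
    """Return the 1-based index of the sub-model with the largest nDCG.

    On ties the smallest index wins, because max() keeps the first maximum.
    """
    return max(range(len(ndcg_values)), key=lambda k: ndcg_values[k]) + 1


def cluster_by_best_submodel(ndcg_by_query: Dict[str, Sequence[float]]) -> Dict[int, List[str]]:
    """Group queries by the index of their best-performing sub-model."""
    clusters: Dict[int, List[str]] = defaultdict(list)
    for query, values in ndcg_by_query.items():
        clusters[best_submodel(values)].append(query)
    return dict(clusters)


# Made-up nDCG vectors for three queries over three sub-models.
ndcg_by_query = {
    "q1": [0.5, 0.9, 0.7],
    "q2": [0.6, 0.8, 0.8],
    "q3": [0.4, 0.5, 1.0],
}
clusters = cluster_by_best_submodel(ndcg_by_query)
print(clusters)
```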
- the evaluation score of each user query result can be obtained. Sort user query results based on evaluation scores.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a perspective analysis method for a machine learning model, comprising: collecting error data fed back by a user, extracting basic information, and extracting relevant information from the feedback data to generate a feature space vector; calculating a score for the query result, using an original model and sub-models to learn and classify the user query result to obtain the classification result, that is, an evaluation score; and, for each user query, calculating an nDCG value of the query result, obtaining the actual ranking from the training result of the machine learning model, and obtaining an ideal ranking from the query result and the user query. The method comprises analyzing the sub-models within a learning model, filtering out the sub-models with poor classification results, selecting the sub-models with good classification results, and regrouping the selected sub-models to generate a new learning model. The new learning model generated has higher prediction accuracy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/080173 WO2018187949A1 (fr) | 2017-04-12 | 2017-04-12 | Perspective analysis method for machine learning model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/080173 WO2018187949A1 (fr) | 2017-04-12 | 2017-04-12 | Perspective analysis method for machine learning model |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018187949A1 (fr) | 2018-10-18 |
Family
ID=63792211
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/080173 WO2018187949A1 (fr) | Perspective analysis method for machine learning model | 2017-04-12 | 2017-04-12 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2018187949A1 (fr) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101556603A (zh) * | 2009-05-06 | 2009-10-14 | 北京航空航天大学 | Collaborative retrieval method for re-ranking retrieval results |
US20100076949A1 (en) * | 2008-09-09 | 2010-03-25 | Microsoft Corporation | Information Retrieval System |
CN103530321A (zh) * | 2013-09-18 | 2014-01-22 | 上海交通大学 | Machine-learning-based ranking system |
CN103544307A (zh) * | 2013-11-04 | 2014-01-29 | 北京中搜网络技术股份有限公司 | Automated comparative evaluation method for multiple search engines that does not depend on a document library |
CN103646070A (zh) * | 2013-12-06 | 2014-03-19 | 北京趣拿软件科技有限公司 | Data processing method and device for a search engine |
CN106339383A (zh) * | 2015-07-07 | 2017-01-18 | 阿里巴巴集团控股有限公司 | Search ranking method and system |
- 2017-04-12 WO PCT/CN2017/080173 patent/WO2018187949A1/fr active Application Filing
Similar Documents
Publication | Publication Date | Title |
---|---|---|
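CN102760138B (zh) | | Method and device for classifying user network behavior, and corresponding search method and device |
CN103544663B (zh) | | Recommendation method, system, and mobile terminal for online open courses |
CN110825877A (zh) | | Semantic similarity analysis method based on text clustering |
CN108846029B (zh) | | Intelligence association analysis method based on a knowledge graph |
CN103593336B (zh) | | Knowledge push system and method based on semantic analysis |
CN118673126A (zh) | | Knowledge-graph-based RAG question answering method, system, and medium |
CN111708740A (zh) | | Cloud-platform-based system for computing and analyzing massive search query logs |
CN107066599A (zh) | | Method and system for retrieving and classifying similar listed companies based on knowledge base reasoning |
CN109471982B (zh) | | QoS-aware Web service recommendation method based on user and service clustering |
US20180341686A1 (en) | | System and method for data search based on top-to-bottom similarity analysis |
CN101944099A (zh) | | Method for automatic classification of text documents using an ontology |
CN101539930A (zh) | | Relevance feedback image retrieval method |
CN102693316B (zh) | | Cross-media retrieval method based on a linear generalized regression model |
CN110442736B (zh) | | Semantically enhanced subspace cross-media retrieval method based on quadratic discriminant analysis |
CN108647729B (zh) | | User profile acquisition method |
CN108959378A (zh) | | Visual analysis method for literature hotspots |
CN103268406A (zh) | | Data mining system and method based on a coal mine safety training game |
CN113657766A (zh) | | Method for measuring the happiness index of a tourist attraction based on multi-source tourist data |
CN112508363B (zh) | | Deep-learning-based method and device for analyzing the state of a power information system |
CN120163675A (zh) | | Social network influence prediction method and system |
CN103761286A (zh) | | Service resource retrieval method based on user interest |
CN112948544A (zh) | | Book retrieval method based on deep learning and quality influence |
CN104317853B (zh) | | Service cluster construction method based on the Semantic Web |
CN107103071B (zh) | | News information classification method based on a direct pAUC optimization algorithm |
CN103324707A (zh) | | Query expansion method based on semi-supervised clustering |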
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 17905604; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 32PN | Ep: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 200220) |
| 122 | Ep: PCT application non-entry in European phase | Ref document number: 17905604; Country of ref document: EP; Kind code of ref document: A1 |