
CN107291817B - Method and device for constructing longitudinal search engine - Google Patents

Publication number
CN107291817B
Authority
CN
China
Prior art keywords
webpage
weight value
web page
link
links
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710367823.1A
Other languages
Chinese (zh)
Other versions
CN107291817A (en)
Inventor
阮勇辉
俞侃
王丽君
詹玲
王方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Duomi Technology Co ltd
Wenhua College
Original Assignee
Wenhua College Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenhua College Huazhong University of Science and Technology
Priority to CN201710367823.1A
Publication of CN107291817A
Application granted
Publication of CN107291817B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for constructing a longitudinal search engine, belonging to the technical field of the Internet. The method comprises: acquiring search data generated when a user browses webpages, wherein the search data comprises at least a plurality of webpage link groups, each webpage link group comprising a plurality of webpage links; determining first weight values of the webpage links in each webpage link group; merging identical webpage links in the webpage link groups to determine a second weight value of each merged webpage link; obtaining a third weight value of each webpage link group according to the second weight values of its webpage links; and sorting the webpage link groups in descending order of the third weight value. The method improves search efficiency and thereby the user's search experience.

Description

A method and device for constructing a vertical search engine

Technical Field

The invention belongs to the field of Internet technology, and in particular relates to a method and device for constructing a vertical search engine.

Background

The core of existing web search engines, such as the PageRank algorithm used by Google, first finds all web pages related to the search keywords and then sorts the result set, with the ranking based on the references between web pages.

However, with such existing search engines, when a user enters several keywords, the result list returned by the search engine is a ranking of individual web pages.

As a result, the user completes a search by clicking the web page links in the result list one after another, which has the technical defect of low search efficiency.

Summary of the Invention

The present invention provides a method and device for constructing a vertical search engine, to solve the technical defect of low search efficiency in the prior art, which arises because the user completes a search by clicking the web page links in the result list one after another.

According to one aspect of the embodiments of the present invention, a method for constructing a vertical search engine is provided, comprising:

obtaining search data generated when a user browses web pages, the search data comprising at least several web page link groups, each web page link group comprising several web page links;

determining a first weight value for the web page links in each web page link group;

merging identical web page links in the web page link groups to determine a second weight value for each merged web page link;

obtaining a third weight value for each web page link group from the second weight values of its web page links;

sorting the web page link groups in descending order of the third weight value.

Further, the method also includes: outputting the sorted web page link groups in the sorted order.

Further, the search data also includes several keyword groups, each keyword group corresponding to one web page link group.

Further, the several web page link groups are three groups: a first web page link group, a second web page link group, and a third web page link group;

the number of web page links in the first web page link group is n1;

the number of web page links in the second web page link group is n2;

the number of web page links in the third web page link group is n3;

determining the first weight value of the web page links in each web page link group includes:

for the first web page link group, the first weight value of each web page link is 1/n1;

for the second web page link group, the first weight value of each web page link is 1/n2;

for the third web page link group, the first weight value of each web page link is 1/n3;

where n1, n2 and n3 are all positive integers.

Further, merging identical web page links in the web page link groups to determine the second weight value of each merged web page link includes:

for each web page link that appears more than once among the web page links, adding up its first weight values in the corresponding web page link groups, the sum being the second weight value of the merged web page link;

for each web page link that appears only once, taking its first weight value in its web page link group as its second weight value.

Further, obtaining the third weight value of each web page link group from the second weight values of its web page links includes:

for each web page link group, adding up the second weight values of the web page links in that group, the sum being the third weight value of the web page link group.
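The three weight computations and the final sort can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `rank_link_groups`, the use of exact fractions, and the choice to count a link repeated inside a single group only once are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def rank_link_groups(groups):
    """Return (group, third_weight) pairs, largest third weight first.

    `groups` is a list of link groups; each group is a list of link titles.
    """
    # Assumption (not stated in the source): a link repeated inside one
    # group counts once, and empty groups are ignored.
    groups = [list(dict.fromkeys(g)) for g in groups if g]

    # First weight: every link in a group of n links gets 1/n.
    # Second weight: the sum of a link's first weights over all groups
    # that contain it (a link seen in only one group keeps its 1/n).
    second = defaultdict(Fraction)
    for group in groups:
        for link in group:
            second[link] += Fraction(1, len(group))

    # Third weight: the sum of the second weights of the links in the group.
    scored = [(group, sum(second[link] for link in group)) for group in groups]

    # Sort the groups from the largest third weight to the smallest.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


if __name__ == "__main__":
    for group, weight in rank_link_groups([["a", "b"], ["a"], ["c", "d", "a"]]):
        print(group, weight)
```

With the three hypothetical groups in the usage example, link "a" has second weight 1/2 + 1 + 1/3 = 11/6, so the groups score 5/2, 7/3 and 11/6 respectively. Groups with equal third weights keep their input order, because Python's sort is stable.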

According to another aspect of the embodiments of the present invention, an apparatus for constructing a vertical search engine is also provided, the apparatus comprising: a search data acquisition module, configured to obtain search data generated when a user browses web pages, the search data comprising at least several web page link groups, each web page link group comprising several web page links; a first weight value determination module, configured to determine a first weight value for the web page links in each web page link group; a second weight value determination module, configured to merge identical web page links in the web page link groups to determine a second weight value for each merged web page link; a third weight value determination module, configured to obtain a third weight value for each web page link group from the second weight values of its web page links; and a sorting module, configured to sort the web page link groups in descending order of the third weight value.

Optionally, the apparatus further includes an output module, configured to output the sorted web page link groups in the sorted order.

The one or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:

The invention first obtains the web page link search data produced when users use a search engine. First, it calculates the first weight value of each web page link; second, it merges identical web page links and calculates the second weight value of each merged link; third, it calculates the third weight value of each web page link group from the second weight values of its links; fourth, it sorts the web page link groups in descending order of their third weight values. Thus, when a user enters keywords to search, the result list returned to the user has already been ranked: it can consist of the web page link groups corresponding to the entered keywords, sorted from largest to smallest. Sorting the web page link groups by the third weight value means that a web page link group containing more important web page links is more important, and a web page link contained in more web page link groups is more important. With these results the user does not need to click web page links one by one to complete the search. The results give the user experience information for reference, so the user can complete the search faster, which improves search efficiency and the user's search experience.

The above is only an overview of the technical solutions of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the description, and so that the above and other objects, features and advantages of the invention are more apparent, specific embodiments of the invention are set out below.

Brief Description of the Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a method for constructing a vertical search engine according to an embodiment of the present invention;

FIG. 2 is a structural block diagram of an apparatus for constructing a vertical search engine according to another embodiment of the present invention;

FIG. 3a is a schematic diagram of a complete search process under Definition 1 in a method for constructing a vertical search engine;

FIG. 3b is a schematic diagram of a complete search process under Definition 2 in a method for constructing a vertical search engine;

FIG. 4a is a schematic diagram of the search path under Definition 1, one of the two search processes of a method for constructing a vertical search engine;

FIG. 4b is a schematic diagram of the search path under Definition 2, the other of the two search processes of a method for constructing a vertical search engine;

FIG. 5 is a schematic diagram of an example of a user's search operation data in a method for constructing a vertical search engine.

Detailed Description

The embodiments of the present invention provide a method and device for constructing a vertical search engine, to solve the technical problem of low search efficiency in the prior art, which arises because the user completes a search by clicking the web page links in the result list one after another. They achieve the technical effect of improving search efficiency and the user's search experience.

The general idea of the technical solutions in the embodiments of the present invention is as follows:

A method for constructing a vertical search engine, the method comprising:

obtaining search data generated when a user browses web pages, the search data comprising at least several web page link groups, each web page link group comprising several web page links;

determining a first weight value for the web page links in each web page link group;

merging identical web page links in the web page link groups to determine a second weight value for each merged web page link;

obtaining a third weight value for each web page link group from the second weight values of its web page links;

sorting the web page link groups in descending order of the third weight value.

The embodiments of the present invention obtain the web page link search data produced when users use a search engine. First, the first weight value of each web page link is calculated; second, identical web page links are merged and the second weight value of each merged link is calculated; third, the third weight value of each web page link group is calculated from the second weight values of its links; fourth, the web page link groups are sorted in descending order of their third weight values. When a user enters keywords to search, the result list returned to the user has already been ranked: it can consist of the web page link groups corresponding to the entered keywords, sorted from largest to smallest. Sorting the web page link groups by the third weight value means that a web page link group containing more important web page links is more important, and a web page link contained in more web page link groups is more important. With these results the user does not need to click web page links one by one to complete the search. The results give the user experience information for reference, so the user can complete the search faster, which achieves the technical effect of improving search efficiency and the user's search experience.
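The three weight values can be written compactly as follows. This is a generalization for illustration: the source states the first weight only for three groups with sizes n1, n2 and n3, while the notation below assumes m groups G_1, ..., G_m with n_i distinct links in G_i. The symbols w_1, w_2, w_3 and G_i are introduced here and do not appear in the original text.

```latex
% Assumption: m link groups G_1, ..., G_m; n_i is the number of links in G_i.
\begin{align}
w_1(u, G_i) &= \frac{1}{n_i}, \qquad u \in G_i \\
w_2(u)      &= \sum_{i \,:\, u \in G_i} w_1(u, G_i) \\
w_3(G_i)    &= \sum_{u \in G_i} w_2(u)
\end{align}
```

For instance, with G_1 = {a, b} and G_2 = {a}: w_2(a) = 1/2 + 1 = 3/2 and w_2(b) = 1/2, so w_3(G_1) = 2 and w_3(G_2) = 3/2, and G_1 is ranked ahead of G_2.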

To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

The term "and/or" in this document only describes an association between objects and covers three cases. For example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" in this document generally indicates an "or" relationship between the objects before and after it.

An embodiment of the present invention provides a method for constructing a vertical search engine, which can be applied in the field of Internet technology. Referring to FIG. 1, the method includes but is not limited to the following steps:

Step S101: obtaining search data generated when a user browses web pages, the search data comprising at least several web page link groups, each web page link group comprising several web page links;

Further, the search data also includes several keyword groups, each keyword group corresponding to one web page link group.

Specifically, a web page link group consists of the web page links that a user clicks in turn during one complete search process, and it arises when the user searches by entering keywords. In other words, one web page link group is all the web page links clicked during one complete search process. The user may search by entering several keywords on the Google home page or on the Baidu home page, and then clicks web page links in turn in the result list returned by the search engine.

The search data generated when users browse web pages is obtained; it also includes several keyword groups, each corresponding to one web page link group. This data can be collected by the search engine. For example, the servers of the Baidu search site can record the user's operation data for each search process (everything from the user opening the search engine in a browser until the user closes the browser, including all keyword submissions and web page jumps, can be called one search process), including the keywords entered and/or the web page links clicked. The data is stored on the Baidu search engine servers as log files. The actual log files may contain more information, but only the required fields need to be selected, and the extraction can be done with an ETL tool. After extraction, the search data can take the form of key-value pairs (search keywords and clicked links), such as <"Health", ("NetEase Health", "Health News Network")>. Here the keyword group is "Health" and the web page link group is "NetEase Health", "Health News Network". That each keyword group corresponds to one web page link group means: for the keyword group "Health", after entering "Health" in the search engine, the user clicks two web page links in turn in the returned results, "NetEase Health" and "Health News Network"; these two links form the web page link group corresponding to the keyword group "Health".

Referring to FIG. 3a and FIG. 3b, a complete search process can be defined in two ways; FIG. 3a and FIG. 3b illustrate the search process under each definition.

Definition 1: a search process is the whole process in which a user submits one search keyword group to the search engine server and clicks web page links to jump to pages. When the user submits a new search keyword group, a new search process begins. Here, a web page link group is all the web page links the user clicks in turn during one complete search process.

As shown in FIG. 3a, a search process starts each time the user submits a search keyword group. Note that a web page jump in FIG. 3a means the user clicking a web page link in the result list returned by the search engine. This first kind of search process represents a user who states the search keyword group accurately and finds the desired web page links with a single submission.

For example, the user enters "outdoor sports equipment" as the keyword group and clicks two web page links in turn in the result list returned by the search engine server: "China Outdoor Equipment" and "Mountain Outdoor". (For brevity, this description omits the actual link addresses and uses the link title to stand for the link itself; for example, "China Outdoor Equipment" is a link title whose actual address is http://www.papbout.com/.) The user then enters "movie 2016" as a new keyword group and submits the search. The process of entering the keyword group "outdoor sports equipment" and clicking the two web page links is defined as one complete search process. Both clicked links belong to the keyword group "outdoor sports equipment", and the two links clicked in turn, "China Outdoor Equipment" and "Mountain Outdoor", form one web page link group.

Definition 2: a search process is everything from the user opening the search engine in a browser until the user closes the browser, including all search keyword group submissions and web page jumps. Here, too, a web page link group is all the web page links the user clicks in turn during one complete search process.

As shown in FIG. 3b, all the search keyword group submissions and the corresponding web page jump requests that the user sends to the search engine server, from opening the browser until closing it, make up one search process. This second kind of search process represents a user who finds the desired information only after revising the search keyword group several times.

For example, when the user opens the search engine in a browser, the first keyword group entered is "outdoor sports". The search engine server returns a result list, and the user clicks two web page links in turn: "Sina Sports" and "Phoenix Sports". The user then changes the keyword group to "health" and clicks two more web page links in turn, "NetEase Health" and "Health News Network", and then closes the browser. Entering the keyword group "outdoor sports" and clicking its two links, together with entering the keyword group "health" and clicking its two links, is defined as one complete search process. The four web page links clicked during this process, "Sina Sports", "Phoenix Sports", "NetEase Health" and "Health News Network", form one web page link group.

The two definitions are alternatives: a given set of input data is processed under either the first definition or the second. The choice is left to the implementer of the search engine.

Referring to FIG. 4a and FIG. 4b, a search path is the sequence of web page jumps (all the web page links clicked) that a user makes during one search process. The two kinds of search path correspond to the two kinds of search process above; FIG. 4a and FIG. 4b show the search path in each.

FIG. 4a corresponds to the search process of Definition 1. After submitting a search keyword group, the user performs web page jumps, that is, clicks several web page links in the result list returned by the search engine; the sequence of these links is the search path. Here, a web page link group is all the web page links contained in one search path.

FIG. 4b corresponds to the search process of Definition 2. The sequence of all web page links clicked across the user's multiple keyword group submissions and web page jumps is the search path. Here, too, a web page link group is all the web page links contained in one search path.

"Several" in "several web page link groups" is a positive integer: there may be one, two, three, four or more groups. When the search data includes three web page link groups, there are a first, a second, and a third web page link group. That the search data includes "at least" several web page link groups means it may also include other information, such as the keyword group corresponding to each web page link group. In the example above, where the corresponding keyword group is included, the search data looks like <"Health", ("NetEase Health", "Health News Network")>; that is, the keyword group corresponding to the web page link group formed by "NetEase Health" and "Health News Network" is "Health".

As noted, the search data may therefore also include search keyword groups. By the definition above, all the web page links contained in one search path form a web page link group.

The key-value pair of a search keyword group and the web page link group of its search path is written <K, SP>, where K is the keyword group and SP is the corresponding web page link group. Each search path thus corresponds to one input key-value pair.

Refer to Figure 5, which shows the operation data of one user search. After opening the browser, the user submits search keyword groups to the search engine server twice: the first time "Health", the second time "Exercise" and "Guidance". After the first submission the user clicks two web page links in turn, "NetEase Health" and "Health News Network"; after the second, the user clicks "Healthy Exercise Guidance". The user then closes the browser.

Under the first search-process definition above, the example in Figure 5 yields two pieces of input data: <{"Health"}, {"NetEase Health", "Health News Network"}> and <{"Exercise Guidance"}, {"Healthy Exercise Guidance"}>. Here <{"Health"}, {"NetEase Health", "Health News Network"}> is one input key-value pair, in which "NetEase Health" and "Health News Network" form one web page link group.

Under the second search-process definition above, the example in Figure 5 yields one piece of input data: <{"Health", "Exercise Guidance"}, {"NetEase Health", "Health News Network", "Healthy Exercise Guidance"}>. Here {"NetEase Health", "Health News Network", "Healthy Exercise Guidance"} is one web page link group.
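The two ways of turning a browsing session into <K, SP> input records can be sketched as follows. This is only an illustration: the session log structure and the function names `records_definition_1` and `records_definition_2` are assumptions, not part of the described method.

```python
from typing import List, Tuple

# One browsing session: each entry is (keywords submitted, links clicked afterwards).
# This log structure is an assumption made for illustration.
Session = List[Tuple[List[str], List[str]]]


def records_definition_1(session: Session):
    """Definition 1: one <K, SP> pair per keyword submission."""
    return [(tuple(keywords), tuple(links)) for keywords, links in session]


def records_definition_2(session: Session):
    """Definition 2: a single <K, SP> pair covering the whole session."""
    all_keywords = tuple(k for keywords, _ in session for k in keywords)
    all_links = tuple(link for _, links in session for link in links)
    return [(all_keywords, all_links)]


# The session of Figure 5, using the keyword groups as they appear in the records.
figure_5_session = [
    (["Health"], ["NetEase Health", "Health News Network"]),
    (["Exercise Guidance"], ["Healthy Exercise Guidance"]),
]

for record in records_definition_1(figure_5_session):
    print(record)
for record in records_definition_2(figure_5_session):
    print(record)
```

Tuples are used so that each record can later serve as a dictionary key when identical records are merged.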

Step S102: Determine the first weight value of the several web page links in each web page link group.

Further, the several web page link groups are three groups: a first web page link group, a second web page link group and a third web page link group.

The number of web page links in the first web page link group is n1.

The number of web page links in the second web page link group is n2.

The number of web page links in the third web page link group is n3.

Determining the first weight value of the several web page links in each web page link group includes:

In the first web page link group, the first weight value of each web page link is 1/n1.

In the second web page link group, the first weight value of each web page link is 1/n2.

In the third web page link group, the first weight value of each web page link is 1/n3.

Here n1, n2 and n3 are all positive integers.

Specifically, there may be two, three, four, five or more web page link groups. When there are three, they are the first, second and third web page link groups, containing n1, n2 and n3 web page links respectively, where n1, n2 and n3 are all positive integers. The initial weight value of every web page link group can be set to 1. For example (this example is also used in step S103): the first web page link group is ("NetEase Health", "Health News Network"), its keyword group is "Health", and it contains 2 web page links, "NetEase Health" and "Health News Network". The second web page link group is ("NetEase Health", "Sohu Health", "Baidu Health"), its keyword group is "Health", and it contains 3 web page links, "NetEase Health", "Sohu Health" and "Baidu Health". The third web page link group is ("NetEase Health", "Sohu Health", "Phoenix Health"), its keyword group is "Health", and it contains 3 web page links, "NetEase Health", "Sohu Health" and "Phoenix Health".

Determining the first weight value of the several web page links in each web page link group then means: in the first web page link group, the first weight value of each web page link is 1/n1;

in the second web page link group it is 1/n2; and in the third web page link group it is 1/n3. In the example above, n1 = 2 for the first web page link group. Because the initial weight value of each web page link group is set to 1, the first weight value of the web page link "NetEase Health" in the first group is 1/2. For the second web page link group n2 = 3, so the first weight value of "Sohu Health" in the second group is 1/3.
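A minimal sketch of the first-weight rule, assuming each group's initial weight of 1 is split evenly and that the links inside one group are distinct. The function name `first_weights` is hypothetical, and `fractions.Fraction` is used only so that values such as 1/3 stay exact.

```python
from fractions import Fraction


def first_weights(link_group):
    """Split a group's initial weight of 1 evenly across its n links (1/n each)."""
    n = len(link_group)
    if n == 0:
        raise ValueError("a web page link group must contain at least one link")
    return {link: Fraction(1, n) for link in link_group}


group_1 = ("NetEase Health", "Health News Network")
group_2 = ("NetEase Health", "Sohu Health", "Baidu Health")

print(first_weights(group_1)["NetEase Health"])  # 1/2
print(first_weights(group_2)["Sohu Health"])     # 1/3
```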

The steps of this embodiment can be implemented with Google's MapReduce computing framework, which is a highly parallelized computing framework. That is, several computers work on the same problem at the same time, each handling a small part of it; with thousands of machines computing simultaneously, the problem is in effect divided into thousands of parts, which speeds up the computation. A parallel framework that allows a large number of computers to compute in parallel is customarily called highly parallelized. MapReduce divides the computation of the whole problem into a Map phase and a Reduce phase, where the output of the Map phase is the input of the Reduce phase; a computation may also contain several Map phases and several Reduce phases. The steps of the invention are described below in terms of these phases; for example, Map phase 1 is the first Map phase and Reduce phase 2 is the second Reduce phase. Hadoop is open-source software that implements the MapReduce computing framework. Steps S102, S103, S104 and S105 of this embodiment can all use the MapReduce computing framework and be implemented directly with Hadoop.

For example, in the algorithm pseudocode shown below, the Map phase (which completes step S102) computes, for each piece of search data in the input data set, the first weight value of every web page link in each web page link group. The input data set of this phase is {<K_i, SP_i>}, where <K_i, SP_i> is the i-th piece of input data (i is a positive integer), K_i is the keyword group and SP_i is the web page link group. For instance, the keyword group of the first piece of input data is K_1 and SP_1 is its web page link group. In the algorithm, p_j (j is a positive integer) denotes a web page link; p_1, for example, is the first web page link. Lines 2 to 4 of the algorithm compute the first weight value w for each web page link p_j contained in the search path (that is, in the web page link group); the function length(SP_i) returns the number of web page links in SP_i. If a search path contains n web page links (n is a positive integer), the initial weight value of each of them is 1/n. The output of this phase is a set of key-value pairs carrying the first weight values, of the form <"Health"|"NetEase Health", 0.5>: the key is the keyword group joined with the web page link, and the value is the first weight value of that web page link. Here 0.5 is assumed to be the first weight value of "NetEase Health".

Map phase 1

Line 1: input: {<K_i, SP_i>}

Line 2: for all p_j in SP_i do

Line 3: $w = \frac{1}{\mathrm{length}(SP_i)}$

Line 4: output {<K_i|p_j, w>}

Line 5: end for

Reduce phase 1

Line 6: input: {<K_i|p_j, w>}

Line 7: $W_j = \sum w$, summed over all inputs with the same key K_i|p_j

Line 8: output {<K_i|p_j, W_j>}

Reduce phase 2

Line 9: input: {<K_i, SP_i>}, {<K_i|p_j, W_j>}

Line 10: $SPW_i = \sum_{p_j \in SP_i} W_j$

Line 11: output {<K_i, SP_i|SPW_i>}

Step S103: Merge the identical web page links across the web page link groups to determine the second weight value of each merged web page link.

Further, for each web page link that appears in more than one web page link group, its first weight values in the corresponding web page link groups are added together, and the sum is the second weight value of that web page link after merging.

For each web page link that appears in only one web page link group, its first weight value in that group is used as its second weight value.

Specifically, identical web page links are web page links that appear in more than one web page link group. In the example above (which is also used in step S104), "NetEase Health" in the first web page link group, "NetEase Health" in the second and "NetEase Health" in the third are the same web page link. Merging the identical web page links of the first, second and third groups means adding up the first weight values of "NetEase Health" in the three groups. Because the initial weight value of each web page link group is set to 1, the second weight value of "NetEase Health" is 1/2 + 1/3 + 1/3 = 7/6. "Sohu Health" in the second group and "Sohu Health" in the third group are likewise the same web page link, so after merging the second weight value of "Sohu Health" is 1/3 + 1/3 = 2/3. Different web page links are web page links that appear in only one web page link group. In the example above, "Health News Network" in the first group, "Baidu Health" in the second group and "Phoenix Health" in the third group are mutually different web page links. The second weight value of "Health News Network" equals its first weight value, that is, 1/2.

With the MapReduce computing framework, Reduce phase 1 completes step S103: it adds up the first weight values of all web page links under the same K_i|p_j, where "the same K_i|p_j" means that both K_i and p_j are identical. (See line 7 of the algorithm pseudocode above, where W_j is the second weight value of a web page link and j is a positive integer.) This yields the second weight value W_j of every web page link. The phase takes the output of the Map phase as its input, merges the identical web page links across all web page link groups, and adds up their first weight values to obtain the second weight value of each merged link; a web page link that appears in only one group keeps its first weight value as its second weight value. The output is again a set of key-value pairs, each of the form <"Health"|"NetEase Health", 7/6>, where "NetEase Health" is the web page link and 7/6 is its second weight value.

Because identical web page links are merged by adding up their first weight values across the corresponding web page link groups, a web page link with a high weight value is one that users pay close attention to when searching. The high-value information in users' historical experience can thus be fed back to the user, so that the user completes the search process faster, which achieves the technical effect of improving search efficiency.
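Map phase 1 and Reduce phase 1 can be simulated in a single process as below. This is a sketch, not a Hadoop job: the function names are hypothetical, the key K_i|p_j is modelled as a Python tuple `(keywords, link)`, and exact fractions are used so the results can be compared with the worked example.

```python
from collections import defaultdict
from fractions import Fraction


def map_phase_1(records):
    """Emit ((K, p), w) for every link p of every <K, SP> record, with w = 1/length(SP)."""
    for keywords, links in records:
        w = Fraction(1, len(links))
        for link in links:
            yield (keywords, link), w


def reduce_phase_1(pairs):
    """Sum the first weights w that share the same (K, p) key to get W."""
    second = defaultdict(Fraction)  # Fraction() is 0
    for key, w in pairs:
        second[key] += w
    return dict(second)


K = ("Health",)
records = [
    (K, ("NetEase Health", "Health News Network")),
    (K, ("NetEase Health", "Sohu Health", "Baidu Health")),
    (K, ("NetEase Health", "Sohu Health", "Phoenix Health")),
]

second_weights = reduce_phase_1(map_phase_1(records))
print(second_weights[(K, "NetEase Health")])       # 7/6
print(second_weights[(K, "Sohu Health")])          # 2/3
print(second_weights[(K, "Health News Network")])  # 1/2
```

In a real MapReduce job the grouping by key is done by the framework's shuffle step; the dictionary here merely stands in for it.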

Step S104: Obtain the third weight value of each web page link group from the second weight value of each web page link.

Further, for each web page link group, the second weight values of the web page links it contains are added together, and the sum is the third weight value of that web page link group.

Specifically, the third weight value of a web page link group is the sum of the second weight values of the web page links contained in that group. In the example above (which is also used in step S105), the second weight value of the web page link "NetEase Health" in the first web page link group is 7/6 and that of "Health News Network" is 1/2. The third weight value of the first web page link group is the sum of these two second weight values: 7/6 + 1/2 = 5/3.

With the MapReduce computing framework, Reduce phase 2 completes step S104: it adds up the second weight values of all web page links in each web page link group to obtain the third weight value of every web page link group. In line 10 of the algorithm pseudocode above, for keyword group K_i, the third weight value SPW_i of the corresponding web page link group SP_i is obtained by adding up the second weight values W_j of all web page links K_i|p_j belonging to SP_i. This phase takes the output of Reduce phase 1 as input, computes the third weight value of every web page link group from the second weight values of its web page links, and outputs a set of key-value pairs carrying the computed third weight values, of the form <"Health"|("NetEase Health", "Health News Network")|5/3>: the key is the web page link group and the value is its third weight value.
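Reduce phase 2 can be sketched the same way. The function name `reduce_phase_2` is hypothetical; the second weights are entered by hand from the worked example (with 1/3 for "Baidu Health" and "Phoenix Health", which follows from the rule that a link appearing in only one group keeps its first weight).

```python
from fractions import Fraction


def reduce_phase_2(records, second_weights):
    """For each <K, SP>, sum the second weights W of the links in SP to get SPW."""
    return [
        (keywords, links,
         sum((second_weights[(keywords, p)] for p in links), Fraction(0)))
        for keywords, links in records
    ]


K = ("Health",)
records = [
    (K, ("NetEase Health", "Health News Network")),
    (K, ("NetEase Health", "Sohu Health", "Baidu Health")),
    (K, ("NetEase Health", "Sohu Health", "Phoenix Health")),
]
second_weights = {
    (K, "NetEase Health"): Fraction(7, 6),
    (K, "Sohu Health"): Fraction(2, 3),
    (K, "Health News Network"): Fraction(1, 2),
    (K, "Baidu Health"): Fraction(1, 3),
    (K, "Phoenix Health"): Fraction(1, 3),
}

third_weights = reduce_phase_2(records, second_weights)
for keywords, links, spw in third_weights:
    print(keywords, links, spw)
```

For the first group this reproduces 7/6 + 1/2 = 5/3 from the text.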

Step S105: Sort the several web page link groups in descending order of their third weight values.

Further, output the sorted web page link groups in the sorted order.

Specifically, two web page link groups are identical when they contain exactly the same web page links and correspond to the same keyword group. For example, suppose a fourth web page link group contains the three web page links "NetEase Health", "Sohu Health" and "Phoenix Health". The third web page link group in the example above also contains exactly these three web page links, and both groups correspond to the keyword group "Health", that is, they are the web page links clicked in turn after the user entered the keyword group "Health". The fourth and third web page link groups are therefore identical. Suppose that among the first, second, third and fourth web page link groups only the third and fourth are identical. If the third weight value of the third group is 1.2, the third weight value of the fourth group is also 1.2; the two groups are merged into a single group for sorting, and the third weight value used for that group is still 1.2. The web page link groups are then sorted in descending order of their third weight values.

In other words, because the third and fourth web page link groups are identical, they take part in the sorting as one web page link group containing "NetEase Health", "Sohu Health" and "Phoenix Health", with a third weight value of 1.2. This value is compared with the third weight values of the other, distinct groups, so the merged group and the first and second web page link groups are arranged in descending order of third weight value.
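Merging identical groups before sorting can be sketched as below. The function name `rank_groups` is hypothetical, and the weights 0.9 and 1.5 for the first and second groups are made-up values for illustration; only the value 1.2 for the third and fourth groups comes from the text.

```python
def rank_groups(weighted_groups):
    """Merge identical (K, SP) entries, then sort by third weight, largest first."""
    merged = {}
    for keywords, links, weight in weighted_groups:
        # Identical groups carry the same third weight, so the first one is kept.
        merged.setdefault((keywords, links), weight)
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


K = ("Health",)
group_1 = ("NetEase Health", "Health News Network")
group_2 = ("NetEase Health", "Sohu Health", "Baidu Health")
group_3 = ("NetEase Health", "Sohu Health", "Phoenix Health")
group_4 = ("NetEase Health", "Sohu Health", "Phoenix Health")  # identical to group_3

ranked = rank_groups([
    (K, group_1, 0.9),  # hypothetical weight
    (K, group_2, 1.5),  # hypothetical weight
    (K, group_3, 1.2),
    (K, group_4, 1.2),
])
for (keywords, links), weight in ranked:
    print(weight, links)
```

Four input groups produce three ranked entries, because the third and fourth groups are counted once.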

On output, the result of arranging the web page link groups in descending order of third weight value may be, for example: ("NetEase Health", "Sohu Health", "Phoenix Health"), ("NetEase Health", "Health News Network"), ("NetEase Health", "Sohu Health", "Baidu Health"), all corresponding to the keyword group "Health". This result can be output on its own, or together with the result list that the search engine originally returns, for example the result list returned when the user enters keywords on the Google home page. When using this method, the search flow can be as follows: the user submits search keywords to the search engine as usual, and two results are returned, the search engine's own web page ranking and the ranking of web page link groups in descending order of third weight value produced by this method. The web page ranking returned by the search engine is the same as when the search engine is used alone. The ranking of web page link groups produced by this method offers the user recommended, optional search paths, each in the form of a sequence of web page links. Because the output is arranged in descending order of the third weight value of the web page link groups, users' experience information is fed back to the user, so that the user completes the search process faster, which achieves the technical effect of improving search efficiency.

With the MapReduce computing framework, line 11 of the algorithm pseudocode above outputs the set {<K_i, SP_i|SPW_i>}, which completes step S105. All web page link groups under the same search keyword group are sorted by their third weight values. When a user starts a new search, if the submitted search keyword group is identical to a keyword group in the output, the sorted web page link groups under that keyword group are output.

The method provided by the invention can be combined with big data methods to collect and analyze massive amounts of user search data, so the result of the analysis is that the more users choose a search path, the higher its web page link group ranks. The ranking output by the method in turn serves as reference information for a large number of users. The higher a web page link group ranks, the more reference value it has and the better it represents the choice of most users, so the ranking provided by the method can meet the needs of most users. The method can therefore improve the search experience of most users, letting them quickly find the series of web pages they need from the sorted web page link groups, which achieves the technical effects of improving search efficiency and improving the user search experience.
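Serving the sorted output for a new query can be sketched as a dictionary lookup. The names `build_index` and `recommend` are hypothetical, and treating a keyword group as an unordered set (so that keyword order does not matter when testing for "identical") is an assumption of this sketch.

```python
from collections import defaultdict
from fractions import Fraction


def build_index(weighted_groups):
    """Group <K, SP | SPW> entries by keyword group, each list sorted by SPW descending."""
    index = defaultdict(list)
    for keywords, links, weight in weighted_groups:
        index[frozenset(keywords)].append((links, weight))
    for groups in index.values():
        groups.sort(key=lambda g: g[1], reverse=True)  # stable: ties keep input order
    return dict(index)


def recommend(index, query_keywords):
    """Return the ranked link groups if the query matches a known keyword group."""
    return index.get(frozenset(query_keywords), [])


K = ("Health",)
index = build_index([
    (K, ("NetEase Health", "Health News Network"), Fraction(5, 3)),
    (K, ("NetEase Health", "Sohu Health", "Baidu Health"), Fraction(13, 6)),
    (K, ("NetEase Health", "Sohu Health", "Phoenix Health"), Fraction(13, 6)),
])

for links, weight in recommend(index, ["Health"]):
    print(weight, links)
print(recommend(index, ["Sports"]))  # no matching keyword group: empty list
```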

Based on the same inventive concept, an embodiment of the invention also provides an apparatus corresponding to the method embodiment above, as described below.

A further embodiment of the invention provides an apparatus. Referring to Figure 2, the apparatus includes:

a search data acquisition module 201, configured to acquire search data generated when a user browses web pages, the search data including at least several web page link groups, each of which includes several web page links;

a first weight value determination module 202, configured to determine the first weight value of the several web page links in each web page link group;

a second weight value determination module 203, configured to merge the identical web page links across the web page link groups to determine the second weight value of each merged web page link;

a third weight value determination module 204, configured to obtain the third weight value of each web page link group from the second weight value of each web page link;

a sorting module 205, configured to sort the several web page link groups in descending order of their third weight values.

In this embodiment of the invention, the apparatus further includes an output module, configured to output the sorted web page link groups in the sorted order.

In this embodiment of the invention, the several web page link groups are three groups: a first web page link group, a second web page link group and a third web page link group. The number of web page links in the first web page link group is n1, in the second n2, and in the third n3. Determining the first weight value of the several web page links in each web page link group includes: in the first web page link group, the first weight value of each web page link is 1/n1; in the second web page link group, it is 1/n2; in the third web page link group, it is 1/n3. Here n1, n2 and n3 are all positive integers.

In this embodiment of the invention, the second weight value determination module further includes:

a first submodule, configured to add together, for each web page link that appears in more than one web page link group, its first weight values in the corresponding web page link groups, the sum being the second weight value of that web page link after merging;

a second submodule, configured to use, for each web page link that appears in only one web page link group, its first weight value in that group as its second weight value.

In this embodiment of the invention, the third weight value determination module further includes:

a third submodule, configured to add together, for each web page link group, the second weight values of the web page links it contains, the sum being the third weight value of that web page link group.

Because the apparatus described in this further embodiment is the apparatus used to carry out the method embodiment of the invention, a person skilled in the art can, on the basis of the method described in the first embodiment, understand its specific structure and variants, so these are not described again here. Any apparatus used to carry out the method of the embodiments of the invention falls within the scope of protection sought by the invention.

The technical solutions provided in the embodiments of the invention have at least the following technical effects or advantages:

The web page link search data generated when users use a search engine is acquired. First, the first weight value of each web page link is calculated. Second, identical web page links are merged to calculate the second weight value of each web page link. Third, the third weight value of each web page link group is calculated from the second weight values of its web page links. Fourth, the web page link groups are sorted in descending order of third weight value, with duplicate web page link groups ranked only once. During an actual search, when the user enters a keyword group, the result list fed back is already sorted; that is, it can be the web page link groups corresponding to the entered keyword group, arranged in descending order. Sorting the web page link groups by third weight value means that a web page link group containing more important web page links is more important, and a web page link contained in more web page link groups is more important. With the fed-back result, the user does not need to click through web page links one by one to complete the search. The result provides experience information for reference, so the user completes the search process faster on the basis of that information, which improves search efficiency and the user's search experience.

本领域内的技术人员应明白,本发明的实施例可提供为方法、系统、或计算机程序产品。因此,本发明可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Although preferred embodiments of the present invention have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.

Obviously, those skilled in the art can make various changes and variations to the embodiments of the present invention without departing from the spirit and scope of the embodiments of the present invention. Thus, provided that these changes and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (6)

1. A method for constructing a vertical search engine, characterized in that the method comprises:
obtaining search data generated when users browse web pages, the search data comprising at least a plurality of web page link groups, each web page link group comprising a plurality of web page links, wherein the search data further comprises a plurality of keyword groups, each of the keyword groups corresponding to a respective one of the web page link groups;
determining a first weight value of each of the web page links in each of the web page link groups;
among the web page links, adding together the first weight values that each identical web page link has in its corresponding web page link groups, the sum being the second weight value of that web page link after merging;
obtaining a third weight value of each web page link group according to the second weight value of each of the web page links;
sorting the web page link groups in descending order of the third weight value; and
outputting the sorted web page link groups in the sorted order.

2. The method of claim 1, characterized in that:
the plurality of web page link groups are three groups, comprising a first web page link group, a second web page link group, and a third web page link group;
the number of web page links in the first web page link group is n1;
the number of web page links in the second web page link group is n2;
the number of web page links in the third web page link group is n3;
determining the first weight value of each of the web page links in each of the web page link groups comprises:
in the first web page link group, the first weight value of each web page link is 1/n1;
in the second web page link group, the first weight value of each web page link is 1/n2;
in the third web page link group, the first weight value of each web page link is 1/n3;
wherein n1, n2, and n3 are all positive integers.

3. The method of claim 2, characterized in that:
among the web page links, the first weight value that each distinct web page link has in its corresponding web page link group is used as the second weight value of that distinct web page link.

4. The method of claim 3, characterized in that obtaining the third weight value of each web page link group according to the second weight value of each of the web page links comprises:
adding together, for each web page link group, the second weight values of the web page links in that group, the sum being the third weight value of the web page link group.

5. A device for constructing a vertical search engine, characterized in that the device comprises:
a search data acquisition module, configured to obtain search data generated when users browse web pages, the search data comprising at least a plurality of web page link groups, each web page link group comprising a plurality of web page links, wherein the search data further comprises a plurality of keyword groups, each of the keyword groups corresponding to a respective one of the web page link groups;
a first weight value determination module, configured to determine a first weight value of each of the web page links in each of the web page link groups;
a second weight value determination module, configured to add together, among the web page links, the first weight values that each identical web page link has in its corresponding web page link groups, the sum being the second weight value of that web page link after merging;
a third weight value determination module, configured to obtain a third weight value of each web page link group according to the second weight value of each of the web page links; and
a sorting module, configured to sort the web page link groups in descending order of the third weight value.

6. The device of claim 5, characterized by further comprising:
an output module, configured to output the sorted web page link groups in the sorted order.
CN201710367823.1A 2017-05-23 2017-05-23 Method and device for constructing longitudinal search engine Expired - Fee Related CN107291817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710367823.1A CN107291817B (en) 2017-05-23 2017-05-23 Method and device for constructing longitudinal search engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710367823.1A CN107291817B (en) 2017-05-23 2017-05-23 Method and device for constructing longitudinal search engine

Publications (2)

Publication Number Publication Date
CN107291817A CN107291817A (en) 2017-10-24
CN107291817B true CN107291817B (en) 2020-07-03

Family

ID=60094556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710367823.1A Expired - Fee Related CN107291817B (en) 2017-05-23 2017-05-23 Method and device for constructing longitudinal search engine

Country Status (1)

Country Link
CN (1) CN107291817B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978416A (en) * 2015-06-26 2015-10-14 北京理工大学 Redis-based intelligent object retrieval method
EP2950226A1 (en) * 2014-05-30 2015-12-02 Linkedin Corporation New heuristic for optimizing non-convex function for learning to rank

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8725729B2 (en) * 2006-04-03 2014-05-13 Steven G. Lisa System, methods and applications for embedded internet searching and result display
CN102456016B (en) * 2010-10-18 2014-10-01 中国移动通信集团四川有限公司 A method and device for sorting search results
CN104035927B (en) * 2013-03-05 2020-03-03 百度在线网络技术(北京)有限公司 Search method and system based on user behaviors

Also Published As

Publication number Publication date
CN107291817A (en) 2017-10-24

Similar Documents

Publication Publication Date Title
CN110457581B (en) Information recommendation method and device, electronic equipment and storage medium
CN102054004B (en) Webpage recommendation method and device adopting same
US9396262B2 (en) System and method for enhancing search relevancy using semantic keys
JP5913736B2 (en) Keyword recommendation
US10534781B2 (en) Website traffic optimization
US8745039B2 (en) Method and system for user guided search navigation
US8983954B2 (en) Finding data in connected corpuses using examples
CN112084150B (en) Model training, data retrieval method, device, equipment and storage medium
CN113204621B (en) Document storage, document retrieval method, device, equipment and storage medium
TWI615723B (en) Network search method and device
US9262555B2 (en) Machine for recognizing or generating Jabba-type sequences
WO2008106667A1 (en) Searching heterogeneous interrelated entities
CN102043833A (en) Search method and device based on query word
Kowalczyk et al. Enhancing SEO in single-page web applications in contrast with multi-page applications
CN111159563A (en) Method, device and equipment for determining user interest point information and storage medium
CN105389328B (en) A large-scale open source software search ranking optimization method
CN104281619A (en) System and method for ordering search results
CN102999495B (en) A kind of synonym Semantic mapping relation determines method and device
Kumar Apache Solr search patterns
JP5165021B2 (en) Category processing apparatus and method
US20160092595A1 (en) Systems And Methods For Processing Graphs
CN105608183B (en) A kind of method and apparatus that polymeric type is provided and is answered
CN107291817B (en) Method and device for constructing longitudinal search engine
US9195940B2 (en) Jabba-type override for correcting or improving output of a model
CN103136223B (en) A method and device for mining queries with similar requirements

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: No.8, wenhuayuan Road, Donghu Development Zone, Wuhan City, Hubei Province

Patentee after: WENHUA College

Address before: 430074 No. 8 Wenhua Road, East Lake hi tech Development Zone, Hubei, Wuhan

Patentee before: HUAZHONG UNIVERSITY OF SCIENCE AND TECHNOLOGY WENHUA College

TR01 Transfer of patent right

Effective date of registration: 20201125

Address after: No.091, area C, 6 / F, building 7, Chuangye street, Guanggu, Donghu New Technology Development Zone, Wuhan, Hubei Province

Patentee after: Wuhan Duomi Technology Co.,Ltd.

Address before: No.8, wenhuayuan Road, Donghu Development Zone, Wuhan City, Hubei Province

Patentee before: WENHUA College

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200703
