CN102929984B

CN102929984B - Inefficacy address searching method and apparatus

Info

Publication number: CN102929984B
Application number: CN201210397984.2A
Authority: CN
Inventors: 赵飞
Original assignee: Beijing Qihoo Technology Co Ltd; Qizhi Software Beijing Co Ltd
Current assignee: Beijing Qizhi Business Consulting Co ltd; Beijing Qihoo Technology Co Ltd
Priority date: 2012-10-18
Filing date: 2012-10-18
Publication date: 2016-06-22
Anticipated expiration: 2032-10-18
Also published as: CN102929984A

Abstract

The invention discloses a method and a device for searching invalid URLs. The device includes a URL information collection module, a search request receiving module, an invalid URL judging module, and a webpage snapshot acquisition module. The search request receiving module includes: a search request sending submodule located in the browser, adapted to receive a search request and send it to the server; a search result return submodule located at the server, adapted to fetch webpages related to the search request from the database to form search results and return them to the browser; and a search result display submodule located in the browser, adapted to display the search results. The invention ensures that the user can still browse the content of a webpage when the clicked search result cannot be accessed.

Description

Invalid URL search method and device

Technical field

The invention relates to the technical field of Internet access, and in particular to an invalid URL search method and an invalid URL search device.

Background

With the spread of the Internet and the explosive growth of online information, search engines have attracted increasing attention. Search engine technology has become the second core technology of the Internet, after portals.

When a search engine is used to search for webpages, clicking a search result may lead to a page that cannot be accessed. This is because webpages on the Internet change frequently: when a retrieved webpage has been deleted or its link is dead, clicking the link directly cannot show the content of the webpage.

In this case, if the user still needs to view the content of the inaccessible webpage, the user has to look up the corresponding URL again or search for related content again. Search efficiency is low, the user experience is very poor, and resource consumption on both the client and the server increases.

Therefore, the technical problem to be solved by those skilled in the art is to provide a search mechanism that ensures the user can still browse the content of a webpage when clicking its search result fails.

Summary of the invention

In view of the above problems, the present invention is proposed to provide an invalid URL search method and a corresponding search device that overcome the above problems or at least partially solve them.

According to one aspect of the present invention, an invalid URL search method is provided, including:

collecting URL information from the browser favorites of multiple user devices and saving the URL information to a database, the URL information including a webpage snapshot of each URL;

the browser receiving a search request and sending the search request to the server;

the server fetching webpages related to the search request from the database to form search results and returning them to the browser;

the browser displaying the search results;

judging whether the URL of an accessed search result is an invalid URL;

if the URL of the search result is an invalid URL, the server looking up the matching webpage snapshot in the database and returning it to the browser.

Optionally, the webpage snapshot is generated by the server obtaining and saving the code of the webpage, or, when the server fails to obtain and save the code of the webpage, by notifying the browser to upload the code of the corresponding webpage.

Optionally, the step of judging whether the URL of an accessed search result is an invalid URL includes:

the browser sending the URL of the search result to the server;

the server parsing the URL of the search result to generate a response message and returning it to the browser;

the browser parsing the response message and extracting the HTTP status code of the corresponding URL;

the browser judging, according to the HTTP status code, whether the URL access request is an access request for an invalid URL.

Optionally, the step of judging whether the URL of an accessed search result is an invalid URL includes:

the browser sending the URL of the search result to the server;

the server parsing the URL of the search result and extracting the HTTP status code of the corresponding URL;

the server judging, according to the HTTP status code, whether the URL access request is an access request for an invalid URL.

According to another aspect of the present invention, an invalid URL search device is provided, including:

a URL information collection module, adapted to collect URL information from the browser favorites of multiple user devices and save the URL information to a database, the URL information including a webpage snapshot of each URL;

a search request receiving module, adapted to receive a search request and return search results according to the search request;

an invalid URL judging module, adapted to judge whether the URL of an accessed search result is an invalid URL;

a webpage snapshot acquisition module, adapted to have the server look up the matching webpage snapshot in the database and return it to the browser when the URL of the search result is an invalid URL;

wherein the search request receiving module includes:

a search request sending submodule located in the browser, adapted to receive a search request and send the search request to the server;

a search result return submodule located at the server, adapted to fetch webpages related to the search request from the database to form search results and return them to the browser;

a search result display submodule located in the browser, adapted to display the search results.

Optionally, the invalid URL judging module includes:

a first URL sending submodule located in the browser, adapted to send the URL of the search result to the server;

a response message return submodule located at the server, adapted to parse the URL of the search result to generate a response message and return it to the browser;

an HTTP status code acquisition submodule located in the browser, adapted to parse the response message and extract the HTTP status code of the corresponding URL;

a URL judging submodule located in the browser, adapted to judge, according to the HTTP status code, whether the URL access request is an access request for an invalid URL.

Optionally, the invalid URL judging module includes:

a second URL sending submodule located in the browser, adapted to send the URL of the search result to the server;

an HTTP status code acquisition submodule located at the server, adapted to parse the URL of the search result and extract the HTTP status code of the corresponding URL;

a URL judging submodule located at the server, adapted to judge, according to the HTTP status code, whether the URL access request is an access request for an invalid URL.

The favorites-based search method according to the present invention provides a favorites-based collection mechanism. It thereby solves the problem that search results returned for a search request cannot be accessed normally, with the beneficial effects of ensuring that the user can still browse the webpage content of such inaccessible search results and of improving search efficiency.

The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.

Description of the drawings

Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:

FIG. 1 shows a flow chart of the steps of an invalid URL search method according to an embodiment of the present invention;

FIG. 2 shows a structural block diagram of an invalid URL search device according to an embodiment of the present invention.

Detailed description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.

One of the core ideas of the embodiments of the present invention is to collect the URL information in the browser favorites of multiple user devices, together with the webpage snapshots corresponding to those URLs, and to save both to a database. When search results are returned for a search request, it is judged whether a search result is an invalid URL; if so, the server returns the webpage snapshot corresponding to the URL to the browser.

Referring to FIG. 1, a flow chart of the steps of an invalid URL search method according to an embodiment of the present invention is shown. The method may include the following steps:

Step 101: collecting URL information from the browser favorites of multiple user devices and saving the URL information to a database, the URL information including a webpage snapshot of each URL;

A webpage snapshot is also known as a web cache (WebCache). When a search engine indexes a webpage, it backs the page up and stores the copy in its own server cache. When the user clicks the "webpage snapshot" link in the search engine, the search engine displays the page content that its spider system fetched and saved at that time; this is called a "webpage snapshot". In the present invention, the webpage snapshot may be generated by the server obtaining and saving the code of the webpage, or, when the server fails to obtain and save the code, by notifying the browser to upload the code of the corresponding webpage. In other words, on the server side a webpage snapshot takes the form of webpage code.

Webpage code refers to the special "languages" used in making webpages. Designers organize and arrange these languages to produce a webpage, and the browser then "translates" the code into the result that users finally see. Code commonly used for webpages today includes HTML, JavaScript, ASP, PHP, and CGI, of which HTML is the most basic. The webpage code may be obtained directly by the server when it parses the browser's request message; alternatively, the browser may obtain it when parsing the response message returned by the server and then upload it to the server. The advantage of having the server obtain the webpage code is that it saves the user's Internet traffic and uses as little of the user's bandwidth as possible. When the server fails to save the webpage code, it can notify the browser to obtain and upload the code, and the server then saves it. The browser may compress the webpage code before uploading it, which also reduces upload traffic and bandwidth use.
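
The compressed upload described above can be sketched as follows. This is only an illustration: the patent does not name a compression scheme, so zlib (DEFLATE) is an assumption, and the helper names `pack_page_code` and `unpack_page_code` are hypothetical.

```python
import zlib


def pack_page_code(page_code: str) -> bytes:
    """Browser side (hypothetical helper): compress page source before upload."""
    # zlib is an assumed choice; the patent only says the code may be compressed.
    return zlib.compress(page_code.encode("utf-8"), 9)


def unpack_page_code(payload: bytes) -> str:
    """Server side (hypothetical helper): restore the uploaded page source."""
    return zlib.decompress(payload).decode("utf-8")
```

Repetitive markup such as HTML usually compresses well, which is where the saving in upload traffic would come from.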

In a specific implementation, one case in which the server fails to save the webpage code is that some websites, to prevent their content from being maliciously copied, impose access restrictions on their own servers, for example limiting how often other machines may access them, so that the server cannot save the webpage code directly. The server may apply a hash algorithm to the webpage code to obtain a website content verification string and compare it with the website content verification strings in a preset save-check interface to judge whether saving succeeded: if the verification string exists in the preset save-check interface, the server has saved the webpage code successfully; otherwise, saving has failed. Those skilled in the art may use other approaches, and the present invention places no limitation on this.
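
A minimal sketch of the hash-based save check might look like the following. The patent names neither the hash algorithm nor the form of the save-check interface, so SHA-256 and a plain set of known verification strings are assumptions, as are the function names.

```python
import hashlib
from typing import Set


def content_check_string(page_code: str) -> str:
    """Hash the page code into a content verification string (SHA-256 assumed)."""
    return hashlib.sha256(page_code.encode("utf-8")).hexdigest()


def save_succeeded(page_code: str, known_check_strings: Set[str]) -> bool:
    """Treat the save as successful only if the string is already known.

    `known_check_strings` stands in for the preset save-check interface,
    whose real form the patent does not describe.
    """
    return content_check_string(page_code) in known_check_strings
```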

After collecting the URL information from the browser favorites of multiple user devices, the browser saves the URL information in a database for use in subsequent searches. In a specific implementation, the present invention may use two databases to store the URL information: a content database and a webpage snapshot database. The webpage snapshot database stores the webpage snapshots of the URLs, and the content database stores the URL information other than the webpage snapshots. Alternatively, the present invention may use a single database containing two tables, one storing the webpage snapshots and the other storing the content other than the webpage snapshots. Those skilled in the art will understand that these ways of storing URL information are only examples of the present invention; other storage methods may be used, and the present invention places no limitation on this.
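
The single-database, two-table variant could be laid out roughly as below. SQLite, the table names `url_content` and `url_snapshot`, and the columns are all illustrative assumptions; the patent specifies none of them.

```python
import sqlite3


def create_store(conn: sqlite3.Connection) -> None:
    """Create one table for URL content and one for snapshots (names assumed)."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS url_content (
            url   TEXT PRIMARY KEY,
            title TEXT
        );
        CREATE TABLE IF NOT EXISTS url_snapshot (
            url       TEXT PRIMARY KEY REFERENCES url_content(url),
            page_code TEXT NOT NULL
        );
        """
    )


def save_url_info(conn: sqlite3.Connection, url: str, title: str, page_code: str) -> None:
    """Store a favorite's metadata and its snapshot in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO url_content (url, title) VALUES (?, ?)",
            (url, title),
        )
        conn.execute(
            "INSERT OR REPLACE INTO url_snapshot (url, page_code) VALUES (?, ?)",
            (url, page_code),
        )
```

Keeping snapshots in their own table means keyword searches over `url_content` never have to read the much larger page code.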

Step 102: the browser receives a search request and sends the search request to the server;

Step 103: the server fetches webpages related to the search request from the database to form search results and returns them to the browser;

For example, when a user performs a keyword search in the browser, the browser receives the user's search keyword and sends it to the server. The server fetches the webpage content related to the keyword from the content database, forms the search results, and returns them to the browser. In a specific implementation, the search results may be sorted by webpage weight before being returned, or sorted by some other method; the present invention places no limitation on this.
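
As a rough illustration of step 103, the sketch below filters stored records by keyword and orders the hits by weight. The `PageRecord` type, the title-substring match, and the numeric `weight` field are assumptions made for the example; the patent does not say how relevance or weight is computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageRecord:
    url: str
    title: str
    weight: float  # hypothetical ranking weight


def search(records: List[PageRecord], keyword: str) -> List[PageRecord]:
    """Return records whose title contains the keyword, highest weight first."""
    needle = keyword.lower()
    hits = [r for r in records if needle in r.title.lower()]
    return sorted(hits, key=lambda r: r.weight, reverse=True)
```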

Step 104: the browser displays the search results.

Step 105: judging whether the URL of an accessed search result is an invalid URL;

When the user wants to view a search result, the browser or the server first judges whether the URL corresponding to that search result can be accessed normally. If it cannot, the webpage snapshot corresponding to the URL is shown to the user.

In general, the HTTP status code is used to judge the validity of a URL. An HTTP status code consists of three decimal digits and indicates whether a webpage access request succeeded or failed and, on failure, the reason. HTTP status codes fall into five classes, identified by their first digit:

Three-digit codes beginning with 1 include 100 (the client should continue sending the request), 101 (the server has understood the client's request and will use the Upgrade header to tell the client to switch to a different protocol to complete it), and 102 (a status code added by WebDAV (Web-based Distributed Authoring and Versioning, a communication protocol based on HTTP/1.1), meaning that processing will continue). They indicate that the request has been received and processing must continue. Such a response is provisional: it contains only the status line and some optional response headers and ends with an empty line. Because HTTP/1.0 defines no status codes beginning with 1, servers must not send such responses to HTTP/1.0 clients except under experimental conditions;

Three-digit codes beginning with 2 include 200 (the request has succeeded, and the response headers or body requested are returned with this response), 201 (the request has been fulfilled and a new resource has been created as requested), 202 (the server has accepted the request but has not yet processed it), 203 (the server has processed the request successfully, but the returned entity header meta-information is not the definitive set from the origin server; it comes from a local or third-party copy), 204 (the server has processed the request successfully and does not need to return any entity body, but may want to return updated meta-information), 205 (the server has processed the request successfully and returns no content), 206 (the server has successfully processed a partial GET request), and 207 (a status code added by WebDAV (RFC 2518), meaning that the message body that follows is an XML message). They indicate that the request has been successfully received, understood, and accepted by the server;

Three-digit codes beginning with 3 include 300 (the user or browser can choose a preferred address for redirection), 301 (the requested resource has been moved permanently to a new location, and any future reference to it should use one of the URIs (Uniform Resource Identifiers) returned in this response), 302 (the requested resource temporarily responds from a different URI), 303 (the response to the current request can be found at another URI, and the client should retrieve that resource with GET), 304 (returned when the client has sent a conditional GET request, the request is permitted, and the content of the document has not changed since the last access or according to the request's conditions), 305 (the requested resource must be accessed through the specified proxy), 306 (no longer used in the latest version of the specification), and 307 (the requested resource temporarily responds from a different URI). They indicate that the client must take further action to complete the request. These status codes are usually used for redirection, and the address for the follow-up request (the redirection target) is given in the Location field of this response;

Three-digit codes beginning with 4 include 400 (the request has a semantic error and cannot be understood by the server, or the request parameters are wrong), 401 (the current request requires user authentication), 402 (reserved for possible future use), 403 (the server has understood the request but refuses to carry it out), 404 (the request failed because the requested resource was not found on the server), 405 (the request method given in the request line cannot be used for the requested resource), 406 (the content characteristics of the requested resource cannot satisfy the conditions in the request headers, so no response entity can be generated), 407 (similar to 401, except that the client must authenticate with the proxy server), 408 (the request timed out), 409 (the request cannot be completed because it conflicts with the current state of the requested resource), 410 (the requested resource is no longer available on the server and no forwarding address is known), 411 (the server refuses to accept the request without a defined Content-Length header), 412 (the server failed to satisfy one or more of the preconditions given in the request header fields), 413 (the server refuses to process the request because the submitted entity is larger than the server is willing or able to handle), 414 (the request URI is longer than the server can interpret, so the server refuses to serve the request), 415 (for the current request method and the requested resource, the entity submitted in the request is in a format the server does not support, so the request is rejected), 416 (returned when the request contains a Range header, none of the ranges it specifies overlaps the available range of the resource, and the request defines no If-Range header), 417 (the expectation given in the Expect request header cannot be met by the server, or the server is a proxy with clear evidence that the expectation cannot be met at the next node on the current route), 421 (the number of connections from the client's IP address to the server exceeds the maximum the server permits), 422 (the request is well formed but cannot be processed because it contains semantic errors), 424 (the current request failed because of an error in an earlier request), 425 (defined in the WebDAV Advanced Collections draft but absent from the WebDAV Ordered Collections Protocol (RFC 3648)), 426 (the client should switch to TLS/1.0), and 449 (a Microsoft extension, meaning the request should be retried after the appropriate action has been performed). They indicate that the client appears to have made an error that prevents the server from processing the request;

Three-digit codes beginning with 5 include 500 (the server encountered an unexpected condition that prevented it from completing the request), 501 (the server does not support a feature required by the current request), 502 (a server acting as a gateway or proxy received an invalid response from the upstream server while trying to carry out the request), 503 (the server currently cannot handle the request because of temporary maintenance or overload), 504 (a server acting as a gateway or proxy did not receive a timely response from the upstream server while trying to carry out the request), 505 (the server does not support, or refuses to support, the HTTP version used in the request), 506 (added by the Transparent Content Negotiation protocol (RFC 2295), meaning the server has an internal configuration error), 507 (the server cannot store the content needed to complete the request), 509 (the server has reached its bandwidth limit), and 510 (the policy required to obtain the resource has not been satisfied). They indicate that an error or abnormal condition occurred while the server was processing the request, or that the server recognizes it cannot complete the request with its current software and hardware resources.
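
The five classes listed above are determined entirely by the first digit, which a small helper can make concrete. The function name `status_class` and the class labels are illustrative choices, not terms from the patent.

```python
def status_class(code: int) -> str:
    """Map a three-digit HTTP status code to its class by first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return classes[code // 100]
```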

In a preferred embodiment of the present invention, step 105 may include the following sub-steps:

Sub-step S21: the browser sends the URL of the search result to the server;

Sub-step S22: the server parses the URL of the search result, generates a response message, and returns it to the browser;

Sub-step S23: the browser parses the response message and extracts the HTTP status code of the corresponding URL;

Sub-step S24: the browser judges, according to the HTTP status code, whether the URL access request is an access request for an invalid URL.
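
Sub-step S23 amounts to reading the status line of the response message. The sketch below assumes the response is available as raw bytes in the standard `HTTP/<version> <code> <reason>` form; the function name `extract_status_code` is hypothetical, and a real browser would rely on its own HTTP stack rather than on parsing of this kind.

```python
def extract_status_code(response_message: bytes) -> int:
    """Pull the three-digit status code out of a raw HTTP response message."""
    # The status line is everything before the first CRLF.
    status_line = response_message.split(b"\r\n", 1)[0].decode("iso-8859-1")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    code = parts[1]
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"malformed status code: {code!r}")
    return int(code)
```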

In another preferred embodiment of the present invention, step 105 may include the following sub-steps:

Sub-step S31: the browser sends the URL of the search result to the server;

Sub-step S32: the server parses the URL of the search result and extracts the HTTP status code of the corresponding URL;

Sub-step S33: the server judges, according to the HTTP status code, whether the URL access request is an access request for an invalid URL.

As a preferred example of this embodiment, status codes 200, 301, 302, and 304 may be regarded as indicating that the link succeeded and the webpage opened normally, and all other status codes may be regarded as status codes of an invalid URL.
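
The rule in this preferred example reduces to a set-membership test. In the sketch below, treating a missing status code (`None`, for instance after a connection failure) as invalid is an added assumption; the text only covers codes that were actually received.

```python
from typing import Optional

# Status codes the preferred example treats as a normally opened page.
VALID_STATUS_CODES = frozenset({200, 301, 302, 304})


def is_invalid_url(status_code: Optional[int]) -> bool:
    """Return True when the status code marks the URL as invalid.

    Assumption: None (no response at all) is also treated as invalid.
    """
    return status_code not in VALID_STATUS_CODES
```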

In practice, the HTTP status code may be obtained on the browser side or the server side by creating a separate thread or process there to capture it. Those skilled in the art will understand that this way of obtaining the HTTP status code is only an example; other ways may be used, and the present invention places no limitation on this.
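
One way to capture status codes off the main thread is a small worker pool, sketched below. The `fetch_status` callable is a placeholder for whatever actually performs the request and is supplied by the caller; `probe_all` is a hypothetical name, and the thread pool is just one of the options the text allows.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def probe_all(
    urls: Iterable[str],
    fetch_status: Callable[[str], Optional[int]],
    max_workers: int = 4,
) -> Dict[str, Optional[int]]:
    """Fetch the status code of each URL on worker threads."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with url_list.
        statuses = list(pool.map(fetch_status, url_list))
    return dict(zip(url_list, statuses))
```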

步骤106：若所述搜索结果的网址为失效网址，服务器在数据库中查找匹配的网页快照，并返回至浏览器。Step 106: If the URL of the search result is an invalid URL, the server searches the database for a matching webpage snapshot, and returns it to the browser.

Specifically, if the browser side determines that the URL of the search result being accessed is an invalid URL, the browser sends the server a webpage snapshot acquisition request corresponding to that URL, and the server looks up the webpage snapshot matching the request in the webpage snapshot database and returns it to the browser.

If the server side determines that the URL of the search result being accessed is an invalid URL, the server directly looks up the matching webpage snapshot in the webpage snapshot database and returns it to the browser.
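The snapshot fallback of step 106 can be sketched as follows. This is a minimal illustration under stated assumptions: the webpage snapshot database is modeled as an in-memory dictionary keyed by URL, and the names `lookup_snapshot` and `resolve_click` are hypothetical. In the browser-side variant, the call to `lookup_snapshot` would stand for the snapshot acquisition request sent to the server; in the server-side variant it is a direct local lookup.

```python
from typing import Dict, Optional

# Status codes the preferred example treats as a normally opened webpage.
VALID_STATUS_CODES = frozenset({200, 301, 302, 304})


def lookup_snapshot(snapshot_db: Dict[str, str], url: str) -> Optional[str]:
    """Server-side lookup: return the stored snapshot for url, or None."""
    return snapshot_db.get(url)


def resolve_click(
    snapshot_db: Dict[str, str], url: str, status_code: int
) -> Optional[str]:
    """Return a snapshot only when the clicked URL is judged invalid."""
    if status_code in VALID_STATUS_CODES:
        return None  # the page is reachable, so no snapshot is needed
    return lookup_snapshot(snapshot_db, url)
```

A result of `None` for an invalid URL means that no snapshot was stored for it; how that case is presented to the user is not specified in the text.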

It should be noted that, for simplicity of description, the method embodiments are presented as a series of combined actions. Those skilled in the art should appreciate, however, that the present invention is not limited by the described order of actions, because according to the present invention certain steps may be performed in another order or simultaneously. Furthermore, those skilled in the art should also appreciate that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.

Referring to FIG. 2, a structural block diagram of an invalid website search device according to one embodiment of the present invention is shown, which may specifically include the following modules:

A website information collection module 201, adapted to collect website information from the browser favorites of multiple user devices and save the website information to a database, the website information including a webpage snapshot of each URL;

After the browser collects the website information from the browser favorites of multiple user devices, the website information is saved in the database for use in subsequent searches. In a specific implementation, the present invention may use two databases to save the website information: a content database and a webpage snapshot database. The webpage snapshot database is used to save the webpage snapshots of the URLs, and the content database is used to save the information about the URLs other than the webpage snapshots. Alternatively, the present invention may use a single database containing two tables, one for storing the webpage snapshots and one for storing the content other than the webpage snapshots. Those skilled in the art should understand that these storage schemes are merely examples; other storage methods may be used, and the present invention imposes no limitation in this respect.
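The single-database, two-table variant can be sketched with Python's standard `sqlite3` module. This is an illustrative sketch only: the table names `url_content` and `url_snapshot`, the `title` column, and the helper functions are assumptions, since the text does not specify a schema.

```python
import sqlite3
from typing import Optional


def create_store(conn: sqlite3.Connection) -> None:
    """Create one table for non-snapshot content and one for snapshots."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS url_content "
        "(url TEXT PRIMARY KEY, title TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS url_snapshot "
        "(url TEXT PRIMARY KEY, html TEXT)"
    )


def save_url_info(
    conn: sqlite3.Connection, url: str, title: str, snapshot_html: str
) -> None:
    """Store the snapshot and the other information in separate tables."""
    conn.execute(
        "INSERT OR REPLACE INTO url_content VALUES (?, ?)", (url, title)
    )
    conn.execute(
        "INSERT OR REPLACE INTO url_snapshot VALUES (?, ?)", (url, snapshot_html)
    )
    conn.commit()


def load_snapshot(conn: sqlite3.Connection, url: str) -> Optional[str]:
    """Return the stored snapshot for url, or None if there is none."""
    row = conn.execute(
        "SELECT html FROM url_snapshot WHERE url = ?", (url,)
    ).fetchone()
    return row[0] if row else None
```

Keeping the snapshots in their own table (or their own database, as in the two-database variant) lets the search path read the lightweight content records without loading full page copies.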

A search request receiving module 202, adapted to receive a search request and return search results according to the search request;

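In a preferred embodiment of the present invention, the search request receiving module 202 may include the following submodules:

a search request sending submodule located in the browser, adapted to receive the search request and send the search request to the server;

a search result returning submodule located in the server, adapted to retrieve webpages related to the search request from the database, form search results, and return them to the browser;

a search result display submodule located in the browser, adapted to display the search results.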

An invalid URL judging module 203, adapted to judge whether the URL of a search result being accessed is an invalid URL;

In general, the HTTP status code is used to judge the validity of a URL. An HTTP status code consists of three decimal digits and indicates whether a webpage access request succeeded or failed and, if it failed, the reason for the failure.
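As an illustration of how the three-digit code might be extracted from a response message, the sketch below parses an HTTP/1.x status line such as `HTTP/1.1 404 Not Found`. The function name `extract_status_code` is hypothetical, and the sketch assumes the caller already has the first line of the response as text; the specification does not prescribe a parsing method.

```python
def extract_status_code(status_line: str) -> int:
    """Extract the three-digit status code from an HTTP/1.x status line.

    Raises ValueError when the line does not look like a status line.
    """
    parts = status_line.strip().split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    code = parts[1]
    # The status code must be exactly three ASCII decimal digits.
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"malformed status code: {code!r}")
    return int(code)
```

The returned integer can then be checked against whichever set of codes the implementation treats as valid.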

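In a preferred embodiment of the present invention, the invalid URL judging module 203 may include the following submodules:

a first URL sending submodule located in the browser, adapted to send the URL of the search result to the server;

a response message returning submodule located in the server, adapted to parse the URL of the search result, generate a response message, and return it to the browser;

an HTTP status code acquisition submodule located in the browser, adapted to parse the response message and extract the HTTP status code of the corresponding URL;

a URL judging submodule located in the browser, adapted to judge, according to the HTTP status code, whether the URL access request is a request to access an invalid URL.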

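In another preferred embodiment of the present invention, the invalid URL judging module 203 may include the following submodules:

a second URL sending submodule located in the browser, adapted to send the URL of the search result to the server;

an HTTP status code acquisition submodule located in the server, adapted to parse the URL of the search result and extract the HTTP status code corresponding to that URL;

a URL judging submodule located in the server, adapted to judge, according to the HTTP status code, whether the URL access request is a request to access an invalid URL.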

A webpage snapshot obtaining module 204, adapted to have the server, when the URL of the search result is an invalid URL, look up a matching webpage snapshot in the database and return it to the browser.

Since the system embodiment of FIG. 2 is substantially similar to the method embodiment of FIG. 1, it is described only briefly; for related details, refer to the corresponding description of the method embodiment.

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Furthermore, the present invention is not directed at any particular programming language. It should be understood that the content of the present invention described herein may be implemented in various programming languages, and that the description of a specific language above is given in order to disclose the best mode of the present invention.

Numerous specific details are set forth in the description provided herein. It is understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that, in order to streamline this disclosure and aid the understanding of one or more of the various inventive aspects, the features of the present invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single previously disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present invention.

Those skilled in the art will understand that the modules of the device in an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into a single module, unit, or component, and may furthermore be divided into multiple submodules, subunits, or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.

Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments and not other features, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the favorites-based search device according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program or a computer program product) for performing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.

Claims

1. An invalid website search method, comprising:

collecting website information from the browser favorites of multiple user devices and saving the website information to a database, the website information including a webpage snapshot of each URL; wherein the database includes a content database and a webpage snapshot database, the webpage snapshot database being used to save the webpage snapshots of the URLs, and the content database being used to save the information about the URLs other than the webpage snapshots;

the browser receiving a search request and sending the search request to the server;

the server retrieving webpages related to the search request from the database, forming search results, and returning them to the browser;

the browser displaying the search results;

determining whether the URL of a search result being accessed is an invalid URL;

if the URL of the search result is an invalid URL, the server looking up a matching webpage snapshot in the database and returning it to the browser.

2. The method according to claim 1, wherein the webpage snapshot is generated by the server fetching and saving the code of the webpage, or, when the server fails to fetch and save the code of the webpage, is generated by the server notifying the browser to save the code of the corresponding webpage and upload it.

3. The method according to claim 1 or 2, wherein the step of determining whether the URL of a search result being accessed is an invalid URL comprises:

the browser sending the URL of the search result to the server;

the server parsing the URL of the search result, generating a response message, and returning it to the browser;

the browser parsing the response message and extracting the HTTP status code of the corresponding URL;

the browser determining, according to the HTTP status code, whether the URL access request is a request to access an invalid URL.

4. The method according to claim 1 or 2, wherein the step of determining whether the URL of a search result being accessed is an invalid URL comprises:

the browser sending the URL of the search result to the server;

the server parsing the URL of the search result and extracting the HTTP status code corresponding to that URL;

the server determining, according to the HTTP status code, whether the URL access request is a request to access an invalid URL.

5. An invalid website search device, comprising:

a website information collection module, adapted to collect website information from the browser favorites of multiple user devices and save the website information to a database, the website information including a webpage snapshot of each URL; wherein the database includes a content database and a webpage snapshot database, the webpage snapshot database being used to save the webpage snapshots of the URLs, and the content database being used to save the information about the URLs other than the webpage snapshots;

a search request receiving module, adapted to receive a search request and return search results according to the search request;

an invalid URL judging module, adapted to judge whether the URL of a search result being accessed is an invalid URL;

a webpage snapshot obtaining module, adapted to have the server, when the URL of the search result is an invalid URL, look up a matching webpage snapshot in the database and return it to the browser;

wherein the search request receiving module includes:

a search request sending submodule located in the browser, adapted to receive the search request and send the search request to the server;

a search result returning submodule located in the server, adapted to retrieve webpages related to the search request from the database, form search results, and return them to the browser;

a search result display submodule located in the browser, adapted to display the search results.

6. The device according to claim 5, wherein the webpage snapshot is generated by the server fetching and saving the code of the webpage, or, when the server fails to fetch and save the code of the webpage, is generated by the server notifying the browser to save the code of the corresponding webpage and upload it.

7. The device according to claim 5 or 6, wherein the invalid URL judging module comprises:

a first URL sending submodule located in the browser, adapted to send the URL of the search result to the server;

a response message returning submodule located in the server, adapted to parse the URL of the search result, generate a response message, and return it to the browser;

an HTTP status code acquisition submodule located in the browser, adapted to parse the response message and extract the HTTP status code of the corresponding URL;

a URL judging submodule located in the browser, adapted to judge, according to the HTTP status code, whether the URL access request is a request to access an invalid URL.

8. The device according to claim 5 or 6, wherein the invalid URL judging module comprises:

a second URL sending submodule located in the browser, adapted to send the URL of the search result to the server;

an HTTP status code acquisition submodule located in the server, adapted to parse the URL of the search result and extract the HTTP status code corresponding to that URL;

a URL judging submodule located in the server, adapted to judge, according to the HTTP status code, whether the URL access request is a request to access an invalid URL.