CN114385950A - Method and device, electronic device and storage medium for distinguishing internal and external links of target website - Google Patents
- Publication number
- CN114385950A (application CN202111674847.4A)
- Authority
- CN
- China
- Prior art keywords
- link
- target website
- distinguished
- website
- icp
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical Field
The present application relates to the technical field of distinguishing internal and external links, for example to a method and apparatus, an electronic device, and a storage medium for distinguishing the internal and external links of a target website.
Background
A website's links are divided into internal links and external links: internal links point to pages within the website itself, while external links point to pages of other websites. In some web-related applications, after the links under a website have been collected, each link may need to be identified as internal or external so that it can be handled accordingly. Cloud deployment of services and applications has become the main direction of website construction because it favors rapid deployment, lower cost, and distributed operation. Non-commercial websites in particular, such as those of governments and large enterprises, are being consolidated in order to strengthen management and improve data sharing: unified data centers and service platforms are built, and websites of various types and departments are deployed centrally in the cloud. This consolidation blurs the boundary between a website's internal and external links and changes what the two terms mean.
In the course of implementing the embodiments of the present disclosure, at least the following problem was found in the related art. The existing URL identification method compares the main domain name or IP address of the target website with the main domain name or IP address of a URL link to decide whether the link is an internal or external link of the target website. Its accuracy is low: an external link with the same IP address and the same main domain name is easily judged to be an internal link, and an internal link with a different IP address and a main domain name that is not identical is easily judged to be an external link.
Summary of the Invention
To provide a basic understanding of some aspects of the disclosed embodiments, a brief summary is given below. This summary is not an extensive overview, nor is it intended to identify key or critical elements or to delineate the scope of protection of these embodiments; it serves as a prelude to the detailed description that follows.
Embodiments of the present disclosure provide a method and apparatus, an electronic device, and a storage medium for distinguishing the internal and external links of a target website, so as to improve the accuracy of that distinction.
In some embodiments, the method for distinguishing the internal and external links of a target website includes: obtaining first ICP filing information of the target website; obtaining a URL link to be distinguished of the target website; and distinguishing, according to the first ICP filing information, whether the URL link to be distinguished is an internal link or an external link of the target website.
In some embodiments, the apparatus for distinguishing the internal and external links of a target website includes: a first acquisition module configured to obtain first ICP filing information of the target website; a second acquisition module configured to obtain a URL link to be distinguished of the target website; and a distinguishing module configured to distinguish, according to the first ICP filing information, whether the URL link to be distinguished is an internal link or an external link of the target website.
In some embodiments, the apparatus for distinguishing the internal and external links of a target website includes a processor and a memory storing program instructions, the processor being configured to execute the above method when running the program instructions.
In some embodiments, the electronic device includes the above apparatus for distinguishing the internal and external links of a target website.
In some embodiments, the storage medium stores program instructions which, when run, execute the above method for distinguishing the internal and external links of a target website.
The method and apparatus, electronic device, and storage medium provided by the embodiments of the present disclosure can achieve the following technical effect. The first ICP filing information of the target website and the URL link to be distinguished are obtained, and the link is then distinguished as an internal or external link of the target website according to the first ICP filing information. Because the decision relies on the ICP filing information, it avoids the misjudgments that arise from the relationship between the main domain names and IP addresses of the target website and of the URL link to be distinguished, which improves the accuracy of distinguishing the internal and external links of the target website.
The foregoing general description and the following description are exemplary and explanatory only and are not intended to limit the present application.
Brief Description of the Drawings
One or more embodiments are illustrated by way of example in the corresponding drawings. These illustrations and drawings do not limit the embodiments. Elements with the same reference numerals in the drawings denote similar elements, the drawings are not drawn to scale, and in them:
FIG. 1 is a schematic diagram of a method for distinguishing the internal and external links of a target website provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of another method for distinguishing the internal and external links of a target website provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of another method for distinguishing the internal and external links of a target website provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of another method for distinguishing the internal and external links of a target website provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an apparatus for distinguishing the internal and external links of a target website provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of another apparatus for distinguishing the internal and external links of a target website provided by an embodiment of the present disclosure.
Detailed Description
So that the features and technical content of the embodiments of the present disclosure can be understood in more detail, their implementation is described below with reference to the accompanying drawings, which are for reference and illustration only and are not intended to limit the embodiments. In the following technical description, numerous details are given for ease of explanation and to provide a thorough understanding of the disclosed embodiments. One or more embodiments may nevertheless be practiced without these details. In other cases, well-known structures and devices may be shown in simplified form to simplify the drawings.
The terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented. In addition, the terms "comprising" and "having", and any variations of them, are intended to cover non-exclusive inclusion.
Unless stated otherwise, the term "plurality" means two or more.
In the embodiments of the present disclosure, the character "/" indicates an "or" relationship between the objects before and after it. For example, A/B means: A or B.
The term "and/or" describes an association between objects and indicates that three relationships are possible. For example, "A and/or B" means: A, or B, or both A and B.
The term "corresponding" may refer to an association or binding relationship; "A corresponds to B" means that A and B are associated with or bound to each other.
The technical solutions of the embodiments of the present disclosure can be applied in a smart terminal or a server. In some embodiments, the smart terminal is a device capable of accessing websites, such as a smartphone, a tablet, or a computer.
In the embodiments of the present disclosure, a smart terminal or server distinguishes the URL links of a target website. When accessing the target website, it can use the target website's ICP filing information to determine whether a URL link to be distinguished is an internal link or an external link of the target website, which makes the link easier to process.
With reference to FIG. 1, an embodiment of the present disclosure provides a method for distinguishing the internal and external links of a target website, including:
Step S101: the electronic device obtains first ICP (Internet Content Provider) filing information of the target website.
Step S102: the electronic device obtains a URL link to be distinguished of the target website.
Step S103: the electronic device distinguishes, according to the first ICP filing information, whether the URL link to be distinguished is an internal link or an external link of the target website.
With this method, the first ICP filing information of the target website and the URL link to be distinguished are obtained, and the link is then distinguished as an internal or external link of the target website according to the first ICP filing information. Because the decision relies on the ICP filing information, it avoids the misjudgments that arise from the relationship between the main domain names and IP addresses of the target website and of the URL link to be distinguished, which improves the accuracy of distinguishing the internal and external links of the target website.
Optionally, the target website is a non-commercial website.
Optionally, the electronic device obtaining the first ICP filing information of the target website includes: the electronic device accesses the first website homepage of the target website; the electronic device extracts the content of the first website homepage; and the electronic device obtains the first ICP filing information of the target website from that content. Obtaining the first ICP filing information from the homepage in this way allows the URL link to be distinguished to be classified according to ICP filing information rather than by comparing main domain names and IP addresses, which avoids the associated misjudgments and improves accuracy.
Optionally, the electronic device accessing the first website homepage of the target website includes: the electronic device obtains the URL link of the target website; the electronic device obtains the homepage address of that URL link; and the electronic device accesses the first website homepage corresponding to the homepage address. Optionally, the homepage address of the target website's URL link is the website address corresponding to the host field of that URL link.
In some embodiments, the electronic device accesses the first website homepage of the target website, extracts the content of the first website homepage with a crawler, and obtains the first ICP filing information of the target website from that content.
Optionally, the first ICP filing information of the target website is the ICP record number of the target website's first website homepage.
In some embodiments, under the relevant regulations, a non-commercial Internet information service provider must file the website's information before the website goes live and obtain ICP filing information, namely an ICP record number. The ICP record number is then placed at the bottom of the website's homepage for the public to look up and verify, so ICP filing information can be regarded as a common, site-specific attribute field in the content of a website homepage. The ICP record number is therefore especially suitable for distinguishing the internal and external links of non-commercial websites: it avoids the misjudgments that arise from the relationship between the main domain names and IP addresses of the target website and of the URL link to be distinguished, and it improves the accuracy of the distinction. At the same time, because only one common attribute of the page content, the ICP filing information, has to be parsed, the distinction is simple, efficient, and accurate.
Optionally, the electronic device obtaining the first ICP filing information of the target website from the content of the first website homepage includes: the electronic device performs format matching on the content of the first website homepage and, when it matches a field with the same format as a preset format, determines that field to be the first ICP filing information of the target website.
Optionally, the preset format is the format of the ICP record number.
In some embodiments, the format of the ICP record number is: province abbreviation followed by "ICP备" + subject ICP record number + website serial number; or the format of the ICP record number is: "ICP备" + subject ICP record number + website serial number.
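The format matching described above can be sketched with a regular expression. This is a minimal illustration, not the disclosed implementation: the pattern assumes the commonly seen form `京ICP备12345678号-1` (the `号` marker and the hyphen before the serial number are assumptions, since the text only lists the components), and `find_icp_record` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumed pattern: optional one-character province abbreviation, the literal
# "ICP备", the subject record number, an optional "号", and an optional
# "-<serial>" suffix for the website serial number.
ICP_PATTERN = re.compile(r"([\u4e00-\u9fa5])?ICP备\s*(\d+)号?(?:\s*-\s*(\d+))?")


def find_icp_record(page_text: str) -> Optional[str]:
    """Return the first ICP record number found in the page text, or None."""
    match = ICP_PATTERN.search(page_text)
    if match is None:
        return None
    province, subject, serial = match.groups()
    # Normalise the pieces into one canonical string.
    record = f"{province or ''}ICP备{subject}号"
    if serial is not None:
        record += f"-{serial}"
    return record
```

Normalising the match into one canonical string makes later equality comparisons insensitive to stray whitespace in the page footer. A Chinese character that happens to precede "ICP备" would be captured as the province abbreviation, so a production pattern would likely restrict that group to the known abbreviations.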
Optionally, when the electronic device has not obtained the first ICP filing information of the target website, it extracts the URL link to be distinguished of the target website from the first website homepage and uses the URL identification method to distinguish whether that link is an internal link or an external link of the target website.
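A minimal sketch of such a fallback follows, assuming the URL identification method compares main domain names only (the background also mentions IP addresses, which are omitted here because they would need a DNS lookup). The function names are hypothetical, and taking the last two host labels as the main domain is a simplification that ignores multi-part suffixes such as `.gov.cn`.

```python
import ipaddress
from urllib.parse import urlsplit


def main_domain(url: str) -> str:
    """Naively take the last two labels of the host as the main domain."""
    if "://" not in url:
        url = "http://" + url  # urlsplit only recognises the host after a scheme
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return host  # an IP literal has no domain labels to trim
    except ValueError:
        pass
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def classify_by_url(target_url: str, link_url: str) -> str:
    """Fallback URL identification: same main domain means internal link."""
    same = main_domain(target_url) == main_domain(link_url)
    return "internal" if same else "external"
```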
Optionally, the electronic device obtaining the URL link to be distinguished of the target website includes: when the electronic device has obtained the first ICP filing information of the target website, it extracts the URL link to be distinguished of the target website from the first website homepage. The extracted link can then be classified according to the first ICP filing information rather than by comparing main domain names and IP addresses, which avoids the associated misjudgments and improves the accuracy of distinguishing the internal and external links of the target website.
Optionally, the electronic device extracting the URL link to be distinguished of the target website from the first website homepage includes: the electronic device performs format matching on the content of the first website homepage and, when it matches a field with the same format as a preset URL format, determines that field to be a URL link to be distinguished of the target website.
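One possible way to collect candidate links from homepage content is sketched below. It swaps the preset-format matching described above for a different technique: it reads `href` attributes of `<a>` tags with Python's standard `html.parser` and resolves them against the homepage address. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from the <a href="..."> tags of a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the homepage address.
                absolute = urljoin(self.base_url, value)
                if urlsplit(absolute).scheme in ("http", "https"):
                    self.links.append(absolute)


def extract_links(html: str, base_url: str) -> List[str]:
    """Return the links to be distinguished found in the homepage HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Resolving relative links first means every returned link carries a host field, which the later host comparison relies on.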
Optionally, the URL links to be distinguished include second-level links of the target website, third-level links of the target website, and so on.
Optionally, the electronic device distinguishing, according to the first ICP filing information, whether the URL link to be distinguished is an internal link or an external link of the target website includes: the electronic device obtains the first host field of the first website homepage; the electronic device obtains the second host field of the URL link to be distinguished; when the first host field and the second host field are the same, the electronic device determines the URL link to be distinguished to be an internal link of the target website; and when the first host field and the second host field are different, the electronic device distinguishes whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information. Links with an identical host field are thus settled directly, and the remaining links are classified by ICP filing information rather than by comparing main domain names and IP addresses, which avoids the associated misjudgments and improves the accuracy of distinguishing the internal and external links of the target website.
In some embodiments, a URL link includes a host field, a path field, and a file field. For example, in www.aaa.com/a/b/files/202111/index.html, www.aaa.com is the host field, a/b/files/202111 is the path field, index.html is the file field, and aaa.com is the domain name field.
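The field breakdown in this example can be reproduced with Python's `urllib.parse`. This is only a sketch: `split_url_fields` is a hypothetical helper, and the domain name field is taken naively as the last two host labels.

```python
from typing import Dict
from urllib.parse import urlsplit


def split_url_fields(url: str) -> Dict[str, str]:
    """Split a URL into the host, path, file, and domain name fields."""
    if "://" not in url:
        url = "http://" + url  # urlsplit only recognises the host after a scheme
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Everything before the last "/" is the path field, the rest the file field.
    directory, _, filename = parts.path.lstrip("/").rpartition("/")
    domain = ".".join(host.split(".")[-2:])
    return {"host": host, "path": directory, "file": filename, "domain": domain}
```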
With reference to FIG. 2, an embodiment of the present disclosure provides another method for distinguishing the internal and external links of a target website, including:
Step S201: the electronic device accesses the first website homepage of the target website.
Step S202: the electronic device extracts the content of the first website homepage.
Step S203: the electronic device obtains the first ICP filing information of the target website from the content of the first website homepage.
Step S204: the electronic device obtains a URL link to be distinguished of the target website.
Step S205: the electronic device obtains the first host field of the first website homepage.
Step S206: the electronic device obtains the second host field of the URL link to be distinguished.
Step S207: when the first host field and the second host field are the same, the electronic device determines the URL link to be distinguished to be an internal link of the target website.
Step S208: when the first host field and the second host field are different, the electronic device distinguishes whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information.
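Steps S205 to S208 can be sketched as a small decision function. This is an illustration under assumptions rather than the disclosed implementation: `classify_link` and `host_of` are hypothetical names, and the ICP-based decision is passed in as a callback so that the host comparison can be shown on its own.

```python
from typing import Callable
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the host field of a URL (lower-cased by urlsplit)."""
    if "://" not in url:
        url = "http://" + url
    return urlsplit(url).hostname or ""


def classify_link(homepage_url: str, link_url: str,
                  classify_by_icp: Callable[[str], str]) -> str:
    """Host comparison first, ICP-based decision only when the hosts differ."""
    first_host = host_of(homepage_url)   # S205
    second_host = host_of(link_url)      # S206
    if first_host == second_host:        # S207: same host, internal link
        return "internal"
    return classify_by_icp(link_url)     # S208: defer to ICP filing information
```

Checking the host field first means the ICP-based step, which may require another page fetch, runs only for links whose host differs from that of the homepage.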
With this method, the first ICP filing information of the target website is obtained from the target website's first website homepage, the URL link to be distinguished is obtained from the same homepage, and the link is then distinguished as an internal or external link of the target website according to the first host field, the second host field, and the first ICP filing information. Because the decision relies on the two host fields and the ICP filing information, it avoids the misjudgments that arise from the relationship between the main domain names and IP addresses of the target website and of the URL link to be distinguished, which improves the accuracy of distinguishing the internal and external links of the target website.
With reference to FIG. 3, an embodiment of the present disclosure provides another method for distinguishing the internal and external links of a target website, including:
Step S301: the electronic device accesses the first website homepage of the target website.
Step S302: the electronic device extracts the content of the first website homepage.
Step S303: the electronic device obtains the first ICP filing information of the target website from the content of the first website homepage.
Step S304: when the electronic device has obtained the first ICP filing information of the target website, it extracts the URL link to be distinguished of the target website from the first website homepage.
Step S305: the electronic device obtains the first host field of the first website homepage.
Step S306: the electronic device obtains the second host field of the URL link to be distinguished.
Step S307: when the first host field and the second host field are the same, the electronic device determines the URL link to be distinguished to be an internal link of the target website.
Step S308: when the first host field and the second host field are different, the electronic device distinguishes whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information.
With this method, the first ICP filing information of the target website is obtained from the target website's first website homepage; once it has been obtained, the URL link to be distinguished is extracted from the first website homepage and is then distinguished as an internal or external link of the target website according to the first host field, the second host field, and the first ICP filing information. Because the decision relies on the two host fields and the ICP filing information, it avoids the misjudgments that arise from the relationship between the main domain names and IP addresses of the target website and of the URL link to be distinguished, which improves the accuracy of distinguishing the internal and external links of the target website.
Optionally, the electronic device distinguishing, when the first host field differs from the second host field, whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information includes: when the first host field differs from the second host field, the electronic device accesses the second website homepage of the URL link to be distinguished according to the second host field; the electronic device extracts the content of the second website homepage; the electronic device obtains the second ICP filing information of the URL link to be distinguished from the content of the second website homepage; and the electronic device distinguishes whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information and the second ICP filing information. In this way, when the first host field differs from the second host field, the second ICP filing information of the URL link to be distinguished is obtained according to the second host field, and the first and second ICP filing information are used together to classify the link. The method therefore avoids the misjudgments that arise from relying on the relationship between the main domain name of the target website, the IP address of the target website, the main domain name of the URL link to be distinguished and the IP address of the URL link to be distinguished, which improves the accuracy of distinguishing internal and external links of the target website.
Optionally, the electronic device accessing the second website homepage of the URL link to be distinguished according to the second host field includes: the electronic device accesses the second website homepage, corresponding to the second host field, of the URL link to be distinguished.
Optionally, the electronic device obtaining the second ICP filing information of the URL link to be distinguished from the content of the second website homepage includes: the electronic device performs format matching on the content of the second website homepage; and when a field whose format is the same as a preset format is matched, the electronic device determines that field to be the second ICP filing information of the URL link to be distinguished.
In some embodiments, when the first host field differs from the second host field, the electronic device accesses the second website homepage, corresponding to the second host field, of the URL link to be distinguished. The electronic device then extracts the content of the second website homepage with a crawler and obtains the second ICP filing information of the URL link to be distinguished from that content through format matching.
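The format-matching step can be illustrated with a regular expression. The sketch below is an assumption-laden example rather than the preset format used by the disclosure: it assumes ICP filing numbers appear in page text in the common form of a one-character province abbreviation, the literal "ICP备", a numeric subject filing number, "号", and an optional "-N" website serial number. The names `ICP_PATTERN`, `IcpRecord` and `extract_icp` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class IcpRecord(NamedTuple):
    province: str               # province abbreviation, e.g. "京"
    subject_number: str         # subject ICP filing number
    site_serial: Optional[str]  # website serial number, None if absent


# Assumed preset format: <province>ICP备<digits>号[-<digits>], tolerating spaces.
ICP_PATTERN = re.compile(
    r"([\u4e00-\u9fa5])\s*ICP\s*备\s*(\d+)\s*号(?:\s*-\s*(\d+))?"
)


def extract_icp(page_text: str) -> Optional[IcpRecord]:
    """Return the first field matching the assumed ICP format, or None."""
    match = ICP_PATTERN.search(page_text)
    if match is None:
        return None
    return IcpRecord(match.group(1), match.group(2), match.group(3))


if __name__ == "__main__":
    html = "<footer>版权所有 京ICP备12345678号-1</footer>"
    print(extract_icp(html))
```

Returning `None` when nothing matches lets the caller decide how to treat pages that display no filing number, a case the description does not spell out.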
Optionally, the electronic device distinguishing whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information and the second ICP filing information includes: when the first ICP filing information and the second ICP filing information are the same, the electronic device determines that the URL link to be distinguished is an internal link of the target website; and/or, when the first ICP filing information and the second ICP filing information are different, the electronic device determines that the URL link to be distinguished is an external link of the target website. In this way, the classification depends only on whether the two sets of ICP filing information match, so the method avoids the misjudgments that arise from relying on the relationship between the main domain name of the target website, the IP address of the target website, the main domain name of the URL link to be distinguished and the IP address of the URL link to be distinguished, which improves the accuracy of distinguishing internal and external links of the target website.
In some embodiments, the first ICP filing information and the second ICP filing information are determined to be the same when their "province abbreviation" field, "subject ICP filing number" field and "website serial number" field are all the same.
In some embodiments, the first ICP filing information and the second ICP filing information are determined to be different when any one of their "province abbreviation" field, "subject ICP filing number" field and "website serial number" field differs.
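The field-by-field comparison described above can be sketched as follows. The record type and the names `IcpRecord`, `differing_fields` and `same_icp` are hypothetical, introduced only for illustration; the three attributes mirror the "province abbreviation", "subject ICP filing number" and "website serial number" fields named in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IcpRecord:
    province: str        # "province abbreviation" field
    subject_number: str  # "subject ICP filing number" field
    site_serial: str     # "website serial number" field


def differing_fields(first: IcpRecord, second: IcpRecord) -> List[str]:
    """List the names of the fields on which the two records disagree."""
    names = ("province", "subject_number", "site_serial")
    return [n for n in names if getattr(first, n) != getattr(second, n)]


def same_icp(first: IcpRecord, second: IcpRecord) -> bool:
    """The records are 'the same' only when all three fields are equal."""
    return not differing_fields(first, second)


if __name__ == "__main__":
    target = IcpRecord("京", "12345678", "1")
    other = IcpRecord("京", "12345678", "2")
    print(same_icp(target, target), differing_fields(target, other))
```

Reporting which field differs is an optional convenience that can help when auditing why a link was classified as external.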
Optionally, after obtaining the first ICP filing information and the second ICP filing information, the electronic device stores them.
With reference to FIG. 4, an embodiment of the present disclosure provides another method for distinguishing internal and external links of a target website, including:
Step S401: the electronic device accesses the first website homepage of the target website.
Step S402: the electronic device extracts the content of the first website homepage.
Step S403: the electronic device obtains the first ICP filing information of the target website from the content of the first website homepage.
Step S404: when the first ICP filing information of the target website has been obtained, the electronic device extracts the URL links to be distinguished from the first website homepage.
Step S405: the electronic device obtains the first host field of the first website homepage.
Step S406: the electronic device obtains the second host field of the URL link to be distinguished.
Step S407: the electronic device determines whether the first host field and the second host field are the same; if they are the same, step S412 is executed; if they are different, step S408 is executed.
Step S408: the electronic device accesses the second website homepage of the URL link to be distinguished according to the second host field.
Step S409: the electronic device extracts the content of the second website homepage.
Step S410: the electronic device obtains the second ICP filing information of the URL link to be distinguished from the content of the second website homepage.
Step S411: the electronic device determines whether the first ICP filing information and the second ICP filing information are the same; if they are the same, step S412 is executed; if they are different, step S413 is executed.
Step S412: the electronic device determines that the URL link to be distinguished is an internal link of the target website.
Step S413: the electronic device determines that the URL link to be distinguished is an external link of the target website.
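Steps S401 to S413 can be tied together in a short Python sketch. It is an illustration under stated assumptions, not the disclosed implementation: page retrieval is passed in as a `fetch` callable so that no particular crawler is implied, links are found with a simple `href` regular expression, the ICP format is the assumed "province + ICP备 + number + 号 + optional serial" pattern, and a second homepage without any filing number is treated as "not the same" (the description does not specify this case). All function and variable names are hypothetical.

```python
import re
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urljoin, urlsplit

Icp = Tuple[str, str, Optional[str]]  # (province, subject number, site serial)

ICP_PATTERN = re.compile(r"([\u4e00-\u9fa5])\s*ICP\s*备\s*(\d+)\s*号(?:\s*-\s*(\d+))?")
HREF_PATTERN = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def extract_icp(html: str) -> Optional[Icp]:
    match = ICP_PATTERN.search(html)
    return match.groups() if match else None


def classify_links(home_url: str, fetch: Callable[[str], str]) -> Dict[str, str]:
    """Classify every link on the homepage as 'internal' or 'external'."""
    home_html = fetch(home_url)                       # S401, S402
    first_icp = extract_icp(home_html)                # S403
    if first_icp is None:
        return {}                                     # S404 precondition not met
    first_host = urlsplit(home_url).hostname          # S405
    results: Dict[str, str] = {}
    for href in HREF_PATTERN.findall(home_html):      # S404
        link = urljoin(home_url, href)                # resolve relative links
        parts = urlsplit(link)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            continue                                  # skip mailto:, javascript:, etc.
        if parts.hostname == first_host:              # S406, S407
            results[link] = "internal"                # S412
            continue
        second_html = fetch(f"{parts.scheme}://{parts.hostname}/")  # S408, S409
        second_icp = extract_icp(second_html)         # S410
        # S411: assumption - a missing second filing number counts as different.
        same = second_icp is not None and second_icp == first_icp
        results[link] = "internal" if same else "external"  # S412 / S413
    return results


if __name__ == "__main__":
    # Hypothetical in-memory pages standing in for real HTTP responses.
    pages = {
        "http://www.a.gov.cn/": '<a href="/news/1.html">n</a>'
                                '<a href="http://b.a.gov.cn/x">b</a>'
                                '<a href="http://www.a.cn/y">c</a> 京ICP备12345678号-1',
        "http://b.a.gov.cn/": "京ICP备87654321号-1",
        "http://www.a.cn/": "京ICP备12345678号-1",
    }
    for url, kind in classify_links("http://www.a.gov.cn/", pages.__getitem__).items():
        print(kind, url)
```

Injecting `fetch` keeps the decision logic testable without network access; in practice it could wrap any HTTP client or crawler, and results for each second host could be cached to avoid fetching the same homepage repeatedly.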
With the method for distinguishing internal and external links of a target website provided by the embodiments of the present disclosure, the first ICP filing information of the target website is obtained from the first website homepage of the target website. Once the first ICP filing information has been obtained, the URL links to be distinguished are extracted from the first website homepage. When the first host field is the same as the second host field, the URL link is determined to be an internal link of the target website. When the first host field differs from the second host field, the second ICP filing information of the URL link is obtained from the second website homepage of that link, so that the link can be classified as an internal link or an external link of the target website according to the first ICP filing information and the second ICP filing information. In this way, the method avoids the misjudgments that arise from relying on the relationship between the main domain name of the target website, the IP address of the target website, the main domain name of the URL link to be distinguished and the IP address of the URL link to be distinguished. Because the classification uses the first host field, the second host field, the first ICP filing information and the second ICP filing information, the accuracy of distinguishing internal and external links of the target website is improved.
In some embodiments, internal and external links are determined for the target website www.xxx1.gov.cn, and the URL links to be distinguished obtained from the target website include URL1: "www.xxx1.gov.cn/hudong/hdjl/......" and URL2: "jw.xxx1.gov.cn/xxgk/zfxxgkml/......". Under the URL identification method, the main domain name of the target website is xxx1.gov.cn, the main domain name of URL1 is xxx1.gov.cn, and the main domain name of URL2 is xxx1.gov.cn. All three main domain names are the same, and domain name resolution shows that they share a common IP address, 111.13.x.x, so the URL identification method determines that both URL1 and URL2 are internal links of the website. However, the ICP filing information obtained for the target website is the same as that of URL1 and different from that of URL2. Therefore, when the method of this solution is applied to the URL links to be distinguished, URL1 is determined to be an internal link of the target website and URL2 is determined to be an external link of the target website. Checking the sponsoring or hosting organizations of the target website, URL1 and URL2 confirms this result: the sponsor of the target website and of URL1 is the xx2 government of xx1 city, whereas the sponsor of URL2 is the xx3 committee of xx1 city, so URL2 is indeed an external link of the target website. This solution thus uses the ICP filing information of the target website to prevent external links that share the same IP address and the same main domain name from being misjudged as internal links, which improves the accuracy of distinguishing internal and external links of the target website.
In some embodiments, internal and external links are determined for the target website www.xxx2.gov.cn, and the URL links to be distinguished obtained from the target website include URL3: "www.xxx2.gov.cn/col/col80524/index.html" and URL4: "www.xxx2.cn/col/col80524/index.html". Under the URL identification method, the main domain name of the target website is xxx2.gov.cn, the main domain name of URL3 is xxx2.gov.cn, and the main domain name of URL4 is xxx2.cn. The main domain name of the target website is the same as that of URL3, and domain name resolution shows that they also share the same IP address, 119.188.x.x, so URL3 is judged to be an internal link of the target website. The main domain name of the target website differs from that of URL4, and domain name resolution gives URL4 the IP address 202.110.x.x, which differs from the IP address of the target website, so URL4 is judged to be an external link of the target website. However, the ICP filing information obtained for the target website is the same as that of both URL3 and URL4. Therefore, when the method of this solution is applied to the URL links to be distinguished, both URL3 and URL4 are determined to be internal links of the target website. Checking the sponsoring or hosting organizations confirms this result: the sponsor of the target website, URL3 and URL4 is the xx4 government of xx3 city, so both URL3 and URL4 are indeed internal links of the target website. This solution thus uses the ICP filing information of the target website to prevent internal links with different IP addresses and not entirely identical main domain names from being misjudged as external links, which improves the accuracy of distinguishing internal and external links of the target website.
In some embodiments, when services are migrated to the cloud and websites are built on consolidated, shared platforms, the URL identification method cannot accurately distinguish internal links from external links. The embodiments of the present disclosure introduce ICP filing information to classify the URL links to be distinguished, so that whether a URL link is an internal link or an external link of the target website is decided from the perspective of the administrative domain, which improves the accuracy of the classification.
With reference to FIG. 5, an embodiment of the present disclosure provides an apparatus for distinguishing internal and external links of a target website, including a first acquisition module 1, a second acquisition module 2 and a distinguishing module 3. The first acquisition module 1 is configured to obtain the first ICP filing information of the target website; the second acquisition module 2 is configured to obtain the URL links to be distinguished of the target website; and the distinguishing module 3 is configured to distinguish whether a URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information.
With the apparatus for distinguishing internal and external links of a target website provided by the embodiments of the present disclosure, the first ICP filing information of the target website is obtained, the URL links to be distinguished of the target website are obtained, and each URL link to be distinguished is then classified as an internal link or an external link of the target website according to the first ICP filing information. In this way, the apparatus avoids the misjudgments that arise from relying on the relationship between the main domain name of the target website, the IP address of the target website, the main domain name of the URL link to be distinguished and the IP address of the URL link to be distinguished. Because the classification uses the first ICP filing information of the target website, the accuracy of distinguishing internal and external links of the target website is improved.
Optionally, the first acquisition module is configured to obtain the first ICP filing information of the target website by: accessing the first website homepage of the target website; extracting the content of the first website homepage; and obtaining the first ICP filing information of the target website from the content of the first website homepage.
Optionally, the second acquisition module is configured to obtain the URL links to be distinguished of the target website by: when the first ICP filing information of the target website has been obtained, extracting the URL links to be distinguished from the first website homepage.
Optionally, the distinguishing module is configured to distinguish whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information by: obtaining the first host field of the first website homepage; obtaining the second host field of the URL link to be distinguished; when the first host field is the same as the second host field, determining that the URL link to be distinguished is an internal link of the target website; and when the first host field differs from the second host field, distinguishing whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information.
Optionally, the distinguishing module is configured to perform the distinguishing according to the first ICP filing information, when the first host field differs from the second host field, by: accessing the second website homepage of the URL link to be distinguished according to the second host field; extracting the content of the second website homepage; obtaining the second ICP filing information of the URL link to be distinguished from the content of the second website homepage; and distinguishing whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information and the second ICP filing information.
Optionally, the distinguishing module is configured to perform the distinguishing according to the first ICP filing information and the second ICP filing information by: when the first ICP filing information and the second ICP filing information are the same, determining that the URL link to be distinguished is an internal link of the target website; and/or, when the first ICP filing information and the second ICP filing information are different, determining that the URL link to be distinguished is an external link of the target website.
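With reference to FIG. 6, an embodiment of the present disclosure provides an apparatus for distinguishing internal and external links of a target website, including a processor 100 and a memory 101. Optionally, the apparatus may further include a communication interface 102 and a bus 103. The processor 100, the communication interface 102 and the memory 101 may communicate with one another through the bus 103. The communication interface 102 may be used for information transmission. The processor 100 may invoke logic instructions in the memory 101 to execute the method for distinguishing internal and external links of a target website described in the above embodiments.
In addition, when the logic instructions in the memory 101 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium.
As a computer-readable storage medium, the memory 101 may be used to store software programs and computer-executable programs, such as the program instructions/modules corresponding to the methods in the embodiments of the present disclosure. By running the program instructions/modules stored in the memory 101, the processor 100 executes functional applications and data processing, that is, implements the method for distinguishing internal and external links of a target website in the above embodiments.
The memory 101 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the terminal device, and the like. In addition, the memory 101 may include high-speed random access memory and may also include non-volatile memory.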
With the apparatus for distinguishing internal and external links of a target website provided by the embodiments of the present disclosure, the first ICP filing information of the target website can be obtained, the URL links to be distinguished of the target website can be obtained, and each URL link to be distinguished can then be classified as an internal link or an external link of the target website according to the first ICP filing information. In this way, the apparatus avoids the misjudgments that arise from relying on the relationship between the main domain name of the target website, the IP address of the target website, the main domain name of the URL link to be distinguished and the IP address of the URL link to be distinguished. Because the classification uses the first ICP filing information of the target website, the accuracy of distinguishing internal and external links of the target website is improved.
An embodiment of the present disclosure provides an electronic device that includes the above apparatus for distinguishing internal and external links of a target website.
With the electronic device provided by the embodiments of the present disclosure, the first ICP filing information of the target website is obtained, the URL links to be distinguished of the target website are obtained, and each URL link to be distinguished is then classified as an internal link or an external link of the target website according to the first ICP filing information. In this way, the electronic device avoids the misjudgments that arise from relying on the relationship between the main domain name of the target website, the IP address of the target website, the main domain name of the URL link to be distinguished and the IP address of the URL link to be distinguished. Because the classification uses the first ICP filing information of the target website, the accuracy of distinguishing internal and external links of the target website is improved.
Optionally, the electronic device includes a smart terminal or a server. Optionally, the smart terminal includes a device capable of accessing websites, such as a smartphone, a tablet or a computer.
An embodiment of the present disclosure provides a storage medium storing computer-executable instructions, where the computer-executable instructions are configured to execute the above method for distinguishing internal and external links of a target website.
An embodiment of the present disclosure provides a computer program product that includes a computer program stored on a computer-readable storage medium. The computer program includes program instructions that, when executed by a computer, cause the computer to execute the above method for distinguishing internal and external links of a target website.
The above computer-readable storage medium may be a transitory computer-readable storage medium or a non-transitory computer-readable storage medium.
The technical solutions of the embodiments of the present disclosure may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes one or more instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The storage medium may be a non-transitory storage medium, including a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc or any other medium that can store program code; it may also be a transitory storage medium.
The above description and drawings sufficiently illustrate the embodiments of the present disclosure to enable those skilled in the art to practice them. Other embodiments may include structural, logical, electrical, process and other changes. The embodiments represent only possible variations. Unless expressly required, individual components and functions are optional, and the order of operations may vary. Parts and features of some embodiments may be included in, or substituted for, those of other embodiments. Moreover, the terms used in this application are used only to describe the embodiments and not to limit the claims. As used in the description of the embodiments and the claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in this application covers any and all possible combinations of one or more of the associated listed items. In addition, when used in this application, the term "comprise" and its variants "comprises" and/or "comprising" indicate the presence of the stated features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. Without further limitation, an element qualified by the phrase "comprising a..." does not exclude the presence of additional identical elements in the process, method or device that includes the element. Herein, each embodiment may focus on its differences from other embodiments, and identical or similar parts of the embodiments may be understood by cross-reference. For the methods, products and the like disclosed in the embodiments, where they correspond to the method part disclosed in the embodiments, reference may be made to the description of the method part for the relevant details.
本领域技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,可以取决于技术方案的特定应用和设计约束条件。所述技术人员可以对每个特定的应用来使用不同方法以实现所描述的功能,但是这种实现不应认为超出本公开实施例的范围。所述技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can realize that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software may depend on the specific application and design constraints of the technical solution. Skilled artisans may use different methods for implementing the described functionality for each particular application, but such implementations should not be considered beyond the scope of the disclosed embodiments. The skilled person can clearly understand that, for the convenience and brevity of description, the specific working process of the above-described systems, devices and units can refer to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the embodiments disclosed herein, the disclosed methods and products (including but not limited to apparatuses, devices, and the like) may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units may be only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to implement this embodiment. Furthermore, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. In the descriptions corresponding to the flowcharts and block diagrams in the drawings, the operations or steps corresponding to different blocks may likewise occur in an order different from that disclosed in the description, and sometimes no specific order exists between different operations or steps. For example, two consecutive operations or steps may in fact be performed substantially in parallel, or they may sometimes be performed in the reverse order, depending on the functionality involved. Each block of the block diagrams and/or flowcharts, and each combination of blocks in the block diagrams and/or flowcharts, can be implemented by a special-purpose hardware-based system that performs the specified functions or actions, or by a combination of special-purpose hardware and computer instructions.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111674847.4A CN114385950A (en) | 2021-12-31 | 2021-12-31 | Method and device, electronic device and storage medium for distinguishing internal and external links of target website |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111674847.4A CN114385950A (en) | 2021-12-31 | 2021-12-31 | Method and device, electronic device and storage medium for distinguishing internal and external links of target website |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN114385950A (en) | 2022-04-22 |
Family
ID=81199699
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111674847.4A Pending CN114385950A (en) | 2021-12-31 | 2021-12-31 | Method and device, electronic device and storage medium for distinguishing internal and external links of target website |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114385950A (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102902917A (en) * | 2011-07-29 | 2013-01-30 | 国际商业机器公司 | Method and system for preventing phishing attacks |
| CN108270754A (en) * | 2017-01-03 | 2018-07-10 | 中国移动通信有限公司研究院 | Method and device for detecting phishing website |
| CN111865886A (en) * | 2019-04-30 | 2020-10-30 | 深信服科技股份有限公司 | IP address information configuration method, system, device and storage medium |
| CN112217815A (en) * | 2020-10-10 | 2021-01-12 | 杭州安恒信息技术股份有限公司 | Phishing website identification method and device and computer equipment |
| CN113407802A (en) * | 2021-06-10 | 2021-09-17 | 杭州安恒信息技术股份有限公司 | Spider pool website identification method and device, electronic device and storage medium |
2021-12-31: CN application CN202111674847.4A (patent/CN114385950A/en), status active, Pending
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10216848B2 (en) | | Method and system for recommending cloud websites based on terminal access statistics |
| US9954895B2 (en) | | System and method for identifying phishing website |
| CN109274632B (en) | | Method and device for identifying a website |
| CN102801574B (en) | | Web page link detection method, device and system |
| ES2866723T3 (en) | | Online fraud detection dynamic score aggregation methods and systems |
| CN103491543A (en) | | Method for detecting malicious websites through wireless terminal, and wireless terminal |
| CN103888490A (en) | | Automatic WEB client man-machine identification method |
| WO2019109529A1 (en) | | Webpage identification method, device, computer apparatus, and computer storage medium |
| CN108270754B (en) | | Method and device for detecting phishing website |
| CN106127463A (en) | | Transfer control method and terminal device |
| CN108900554A (en) | | HTTP protocol asset detection method, system, equipment and computer medium |
| CN104426868A (en) | | Request processing method and processing apparatus |
| CN105138912A (en) | | Method and device for generating phishing website detection rules automatically |
| CN107135199B (en) | | Method and device for detecting webpage backdoor |
| CN105187439A (en) | | Phishing website detection method and device |
| CN102831232B (en) | | Character string matching method and device |
| CN114168945A (en) | | Method and device for detecting potential risk of sub-domain name |
| CN111212153A (en) | | IP address checking method, device, terminal equipment and storage medium |
| CN110929185A (en) | | Website directory detection method and device, computer equipment and computer storage medium |
| CN103618742A (en) | | Method and system for acquiring sub domain names and webmaster permission verification method |
| CN114710468A (en) | | Domain name generation and identification method, device, equipment and medium |
| CN106611022B (en) | | Method and device for improving search efficiency in website |
| CN114385950A (en) | | Method and device, electronic device and storage medium for distinguishing internal and external links of target website |
| CN115033599B (en) | | Graph query method, system and related device based on multi-party security |
| CN118174972A (en) | | A method, device and electronic device for feature expansion of threat intelligence data |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |