Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for determining a website root domain name, which can accurately obtain the website root domain name in a target URL without introducing an external file.
To achieve the above object, according to one aspect of the present invention, a method for determining a root domain name of a website is provided.
The method for determining the website root domain name in the embodiment of the invention comprises the following steps: extracting a domain name part in the target URL; selecting two elements at the rightmost end of the domain name part to construct a current domain name; executing operation of creating cookie files under the current domain name; after the operation is finished, judging whether the cookie file exists under the current domain name: if yes, determining the current domain name as a website root domain name corresponding to the target URL; otherwise, splicing the right-most end element which is not selected in the domain name part with the current domain name to form a new current domain name, and executing the steps of creating and judging again; and the arrangement sequence of the elements in each current domain name is consistent with the domain name part.
Optionally, extracting the domain name part in the target URL includes: extracting the domain name part using a document domain name attribute of a browser script.
Optionally, performing the operation of creating a cookie file under the current domain name includes: assigning customized cookie file identification data and the current domain name to a cookie-related attribute of the browser script.
Optionally, judging whether the cookie file exists under the current domain name includes: reading cookie file information under the current domain name based on the cookie-related attribute of the browser script, and judging whether the cookie file identification data exists in the returned data.
Optionally, the method further comprises: after determining the website root domain name corresponding to the target URL, assigning the cookie file identification data used when creating the cookie file, together with expiration time data with a value of zero, to the cookie-related attribute of the browser script, so as to delete the created cookie file.
Optionally, the browser script is JavaScript; the document domain name attribute is document.domain; the cookie-related attribute is document.cookie; and the cookie file identification data comprises: the name of the cookie file and the value of the cookie file.
To achieve the above object, according to another aspect of the present invention, there is provided an apparatus for determining a root domain name of a website.
The device for determining the website root domain name in the embodiment of the invention can comprise: an extracting unit configured to extract a domain name part in the target URL; the current domain name constructing unit is used for selecting two elements at the rightmost end of the domain name part to construct a current domain name; the cookie creating unit is used for executing the operation of creating the cookie file under the current domain name; a determination unit configured to: after the operation is finished, judging whether the cookie file exists under the current domain name: if yes, determining the current domain name as a website root domain name corresponding to the target URL; otherwise, splicing the right-most end element which is not selected in the domain name part with the current domain name to form a new current domain name, and executing the steps of creating and judging again; and the arrangement sequence of the elements in each current domain name is consistent with the domain name part.
Optionally, the cookie creating unit may be further configured to: assigning the customized cookie file identification data and the current domain name to cookie related attributes of the browser script; the determining unit may be further configured to: and reading cookie file information under the current domain name based on the cookie related attribute of the browser script, and judging whether cookie file identification data exists in returned data.
To achieve the above object, according to still another aspect of the present invention, there is provided an electronic apparatus.
An electronic device of the present invention includes: one or more processors; the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors implement the method for determining the root domain name of the website provided by the invention.
To achieve the above object, according to still another aspect of the present invention, there is provided a computer-readable storage medium.
A computer-readable storage medium of the present invention has stored thereon a computer program which, when executed by a processor, implements the method of determining a root domain name of a web site provided by the present invention.
According to the technical scheme of the invention, the embodiment of the invention has the following advantages or beneficial effects: firstly, extracting a domain name part in a target URL, then selecting two elements at the rightmost end of the domain name part to construct a current domain name, trying to create a cookie under the current domain name, and if the creation is successful, indicating that the current domain name is a website root domain name corresponding to the target URL; if the creation is failed, the current domain name is not legal, at this time, the right-most end elements which are not selected in the domain name part can be spliced with the current domain name to form a new current domain name, and the steps of creating the cookie file and judging whether the creation is successful or not are repeatedly executed until the corresponding website root domain name is obtained. According to the method, the website root domain name is obtained by utilizing the characteristic that the browser can only write the cookie under the legal domain name, and the website root domain name is accurately identified on the premise of not introducing an external library file.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments of the present invention and the technical features of the embodiments may be combined with each other without conflict.
Fig. 1 is a schematic diagram illustrating the main steps of a method for determining a root domain name of a website according to an embodiment of the present invention.
As shown in fig. 1, the method for determining a root domain name of a website according to the embodiment of the present invention may be specifically executed according to the following steps:
Step S101: Extract the domain name part of the target URL.
In the embodiment of the present invention, the website root domain name refers to the root domain name of one site, which is different from the "root domain name" in the phrase "13 root domain name servers all over the world". For example, the domain name of a site search service is "www.test.com", the domain name of a translation service is "fanyi. ...". It can be seen that any website domain name is composed of multiple elements (typically, one element is a word), and any two adjacent elements are separated by the separator ".". Generally, for any website domain name, the rightmost element may be referred to as the top-level domain name (also referred to as the first-level domain name), and the second-level domain name, the third-level domain name, the fourth-level domain name, and so on follow in sequence from right to left. At present, the domain name system on the Internet has three types of top-level domain names: category top-level domain names (such as com, net, org, and gov), geographic top-level domain names (such as cn and uk), and new top-level domain names (such as aero, biz, and coop).
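The element and level terminology above can be illustrated with a minimal JavaScript sketch. The function name describeLevels is hypothetical and is introduced here only for illustration; it simply splits a domain name on the separator "." and numbers the elements starting from the rightmost one.

```javascript
// Hypothetical helper: list the elements of a domain name with their level,
// where level 1 is the rightmost element (the top-level domain name).
function describeLevels(domainName) {
  const elements = domainName.split(".").reverse();
  return elements.map((element, index) => ({ level: index + 1, element }));
}

// For "www.test.com": level 1 is "com", level 2 is "test", level 3 is "www".
console.log(describeLevels("www.test.com"));
```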
In this step, the domain name part of the target URL is extracted first. It will be appreciated that a web page URL is generally composed of a protocol part (e.g., http:), a domain name part, a port part (e.g., 8080), a virtual directory part (e.g., /a/b), a parameter part, and the like. In some embodiments, the domain name part may be extracted using the document domain name attribute (i.e., document.domain) of a browser script (e.g., JavaScript); for example, the domain name part may be obtained by entering javascript:alert(document.domain) in the address bar of the page corresponding to the target URL. In front-end technology, alert is a method for displaying a specified message.
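As a minimal sketch of step S101 outside a live page, the domain name part can also be taken from a URL string with the standard URL class. This is an alternative to reading document.domain, not the approach named in the embodiment; the function name extractDomainPart and the sample URL are assumptions made for illustration.

```javascript
// Hypothetical helper: return the domain name part of a URL string,
// without the protocol, port, virtual directory, or parameter parts.
// Uses the standard URL class available in browsers and in Node.js.
function extractDomainPart(targetUrl) {
  return new URL(targetUrl).hostname;
}

// Inside a browser page, document.domain gives the same kind of value
// for the page that is currently loaded.
console.log(extractDomainPart("http://www.test.com.cn:8080/a/b?id=1"));
// prints "www.test.com.cn"
```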
Step S102: Select the two elements at the rightmost end of the domain name part to construct the current domain name.
After the domain name part of the target URL is obtained, the website root domain name can be obtained by utilizing the characteristic that a browser can only write cookies under a legal domain name. Because the website root domain name necessarily consists of two or more elements at the right end of the domain name part, the two rightmost elements of the domain name part are selected in this step to construct the current domain name. It will be appreciated that the current domain name must preserve the order of the elements in the domain name part and retain the separator. For example, if the domain name part extracted from the target URL is "www.test.com.cn", the current domain name constructed in this step is "com.cn".
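The construction of the current domain name can be sketched as follows. The function name buildCurrentDomain and its count parameter are hypothetical; count is the number of rightmost elements to keep and is assumed to be at least 2, as in this step.

```javascript
// Hypothetical helper: keep the `count` rightmost elements of the domain
// name part, preserving their order and the "." separator.
function buildCurrentDomain(domainPart, count) {
  const elements = domainPart.split(".");
  return elements.slice(-count).join(".");
}

console.log(buildCurrentDomain("www.test.com.cn", 2)); // prints "com.cn"
console.log(buildCurrentDomain("www.test.com.cn", 3)); // prints "test.com.cn"
```

Passing a larger count on a later attempt corresponds to splicing the next unselected element onto the left of the current domain name.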
Step S103: the operation of creating the cookie file is performed under the current domain name.
In the field of computer technology, a cookie refers to a text file stored in a user terminal and the data therein; a server can recognize a user state through the cookie information carried when the user terminal sends a request. Preferably, in the embodiment of the present invention, the cookie-related attribute in JavaScript (i.e., document.cookie) may be used to attempt to create and save a cookie file. Specifically, custom cookie file identification data (e.g., the name of the cookie file and the value of the cookie file) and the current domain name may be assigned to the document.cookie attribute. For example, the code document.cookie = "_cookie_test=1; domain=com.cn" performs the above assignment. It is to be understood that "_cookie_test" in the above code is the name of the cookie file to be created, "1" is the value of the cookie file to be created, "domain" is the domain name attribute of the cookie file, and "com.cn" is the current domain name.
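A minimal sketch of the assignment in step S103 is given below. The helper names are hypothetical, and the doc parameter is an assumption that lets the same code run against the browser's document or against a stand-in object; in a page, the caller would pass document.

```javascript
// Hypothetical helper: build the string that is assigned to document.cookie.
function buildCookieAssignment(name, value, domain) {
  return `${name}=${value}; domain=${domain}`;
}

// Attempt to create the cookie under the given domain. `doc` is expected to
// behave like the browser's `document` (it exposes a `cookie` property).
function tryCreateCookie(doc, name, value, domain) {
  doc.cookie = buildCookieAssignment(name, value, domain);
}

console.log(buildCookieAssignment("_cookie_test", "1", "com.cn"));
// prints "_cookie_test=1; domain=com.cn"
```

The assignment itself gives no feedback on whether the browser accepted the cookie, which is why the following step reads the attribute back.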
Step S104: After the operation is finished, judge whether the cookie file exists under the current domain name.
In this step, after the operation of creating the cookie file is completed, the cookie file information under the current domain name may be read based on the document.cookie attribute. If the cookie file identification data exists in the returned data, the cookie file was created successfully and the corresponding current domain name is legal, so the current domain name is determined as the website root domain name corresponding to the target URL (i.e., step S105). Generally, all cookie file information under the current domain name can be obtained by reading document.cookie.
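The check in step S104 can be sketched as a search of the string returned by document.cookie, which lists name=value pairs separated by semicolons. The function name cookieExists is hypothetical; the sample strings are illustrative only.

```javascript
// Hypothetical helper: report whether the identification data `name=value`
// appears as one of the pairs in a document.cookie style string.
function cookieExists(cookieString, name, value) {
  const expected = `${name}=${value}`;
  return cookieString.split(";").some((pair) => pair.trim() === expected);
}

console.log(cookieExists("lang=en; _cookie_test=1", "_cookie_test", "1")); // prints true
console.log(cookieExists("lang=en", "_cookie_test", "1")); // prints false
```

Comparing whole pairs rather than using a substring search avoids matching a different cookie whose name merely ends with the same characters.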
If the cookie file identification data does not exist in the returned data, indicating that the creation of the cookie file fails, at this time, the rightmost element that is not selected in the domain name part may be spliced to the current domain name to form a new current domain name, and the aforementioned steps of creating the cookie file and determining whether the creation of the cookie file is successful are performed again until the website root domain name is determined (i.e., step S106).
For example, under the current domain name "com.cn", document.cookie = "_cookie_test=1; domain=com.cn" is executed to try to create a cookie file, after which document.cookie may be read to obtain the cookie file information under the current domain name "com.cn". In practical application, it can be found that the corresponding cookie file identification data "_cookie_test=1" does not exist in the returned data, which indicates that the preceding attempt to create the cookie file failed. At this time, the rightmost unselected element "test" in the domain name part "www.test.com.cn" may be spliced onto the current domain name "com.cn" to form a new current domain name "test.com.cn", and the steps of creating a cookie file and judging whether the creation succeeded are executed again under the new current domain name "test.com.cn"; that is, document.cookie = "_cookie_test=1; domain=test.com.cn" is executed to try to create a cookie file, and document.cookie is then read to obtain the cookie file information under the current domain name "test.com.cn". In practical application, it can be found that the cookie file identification data "_cookie_test=1" exists in the returned string data, which indicates that the current domain name is legal, and the current domain name "test.com.cn" is determined as the website root domain name.
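The loop formed by steps S102 to S106 can be sketched as below, under stated assumptions. The names findRootDomain and createMockDocument are hypothetical. The mock object stands in for the browser's document so that the sketch runs outside a page; it silently ignores writes for the domains it is told to reject, which only approximates how a browser treats a domain it does not accept. In a page, the caller would pass document instead.

```javascript
// Hypothetical stand-in for `document`: stores cookies in memory and
// silently ignores any write whose domain is in `rejectedDomains`.
function createMockDocument(rejectedDomains) {
  const jar = new Map();
  return {
    set cookie(assignment) {
      const parts = assignment.split(";").map((p) => p.trim());
      const domainAttr = parts.find((p) => p.startsWith("domain="));
      const domain = domainAttr ? domainAttr.slice("domain=".length) : "";
      if (rejectedDomains.includes(domain)) return;
      const eq = parts[0].indexOf("=");
      jar.set(parts[0].slice(0, eq), parts[0].slice(eq + 1));
    },
    get cookie() {
      return [...jar].map(([k, v]) => `${k}=${v}`).join("; ");
    },
  };
}

// Try the two rightmost elements first, then add one element at a time
// until a test cookie can be written and read back.
function findRootDomain(domainPart, doc) {
  const marker = "_cookie_test=1";
  const elements = domainPart.split(".");
  for (let count = 2; count <= elements.length; count++) {
    const current = elements.slice(-count).join(".");
    doc.cookie = `${marker}; domain=${current}`;
    const created = doc.cookie.split(";").some((p) => p.trim() === marker);
    if (created) return current;
  }
  return null; // no candidate accepted (e.g. a single-element host name)
}

const mock = createMockDocument(["com.cn"]);
console.log(findRootDomain("www.test.com.cn", mock)); // prints "test.com.cn"
```

The sketch assumes that no cookie with the same name and value already exists before the first attempt; otherwise the first read-back would succeed regardless of the domain.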
In specific applications, if the current domain name is illegal, the cookie file cannot be created under the current domain name, and no cookie file exists under the current domain name; if the current domain name is legal, creating a cookie file under it generally succeeds, and other cookie files may exist under the current domain name in addition to the created one. Thus, in some embodiments, whether the cookie file was created successfully may be determined as follows: if the data returned after reading the document.cookie attribute is empty, the creation failed; if the returned data is not empty, the creation succeeded.
Preferably, since the cookie file created in the above process does not serve the typical purpose of a cookie file, the created cookie file may be deleted after the website root domain name corresponding to the target URL is determined. Specifically, the cookie file identification data used when creating the cookie file and expiration time data whose value is zero may be assigned to the document.cookie attribute, that is, an instruction in which maxAge is 0 is executed, so as to delete the created cookie file. It is understood that maxAge in the above instruction is the expiration time attribute of the cookie file (written as max-age in a document.cookie string).
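A minimal sketch of the deletion is shown below. The helper names are hypothetical, and the doc parameter is again an assumption standing in for the browser's document. The sketch spells the expiration attribute as max-age, which is how it is written inside a cookie string; the domain passed in should be the one under which the test cookie was created.

```javascript
// Hypothetical helper: build the string that expires the test cookie
// immediately by giving it an expiration time of zero.
function buildCookieDeletion(name, value, domain) {
  return `${name}=${value}; domain=${domain}; max-age=0`;
}

// Delete the test cookie. `doc` is expected to behave like `document`.
function deleteCookie(doc, name, value, domain) {
  doc.cookie = buildCookieDeletion(name, value, domain);
}

console.log(buildCookieDeletion("_cookie_test", "1", "test.com.cn"));
// prints "_cookie_test=1; domain=test.com.cn; max-age=0"
```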
Fig. 2 is a schematic diagram of a specific implementation of the method for determining a root domain name of a website in the embodiment of the present invention, which specifically executes the following steps:
Step S201: After the domain name part is extracted from the target URL, match a preset regular expression against the domain name part. In this step, a small number of regular expressions can be written based on the most common top-level domain names, such as com and cn.
Step S202: Judge whether the matching succeeds; if so, the website root domain name is obtained (i.e., step S205) and the process ends.
Step S203: If the matching fails, construct the current domain name according to the aforementioned method and try to create a cookie file under the current domain name.
Step S204: Determine whether the cookie file was created successfully using the aforementioned method.
Step S205: If the creation succeeds, the website root domain name is obtained and the process ends.
Step S206: If the creation fails, construct a new current domain name using the aforementioned method, and execute step S203 and step S204 again until the website root domain name is finally obtained.
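The flow of Fig. 2 can be sketched as a regular-expression fast path followed by a fallback. The pattern below is only an assumed example of such a preset expression: the suffixes com and cn come from the description, while com.cn is added here as an illustrative assumption. The names COMMON_SUFFIX_PATTERN and rootDomainWithFastPath are hypothetical, and the fallback parameter stands for the cookie-based procedure of steps S203 to S206.

```javascript
// Assumed example pattern: one element followed by a common suffix at the
// end of the domain name part. "com.cn" is an illustrative addition.
const COMMON_SUFFIX_PATTERN = /[^.]+\.(?:com\.cn|com|cn)$/;

// Steps S201/S202: try the pattern first. Steps S203 to S206: otherwise
// delegate to `fallback`, e.g. a cookie-based search.
function rootDomainWithFastPath(domainPart, fallback) {
  const match = COMMON_SUFFIX_PATTERN.exec(domainPart);
  if (match) return match[0];
  return fallback(domainPart);
}

const noFallback = () => null; // placeholder fallback for this demonstration
console.log(rootDomainWithFastPath("fanyi.test.com", noFallback)); // prints "test.com"
console.log(rootDomainWithFastPath("www.test.com.cn", noFallback)); // prints "test.com.cn"
```

A domain name part that the pattern does not cover, such as one ending in org, is passed to the fallback unchanged, so the fast path only saves cookie writes for the common cases.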
According to the technical scheme of the embodiment of the invention, firstly, a domain name part in a target URL is extracted, then two elements at the rightmost end of the domain name part are selected to construct a current domain name, cookie creation is tried under the current domain name, and if the cookie creation is successful, the current domain name is indicated to be a website root domain name corresponding to the target URL; if the creation is failed, the current domain name is not legal, at this time, the right-most end elements which are not selected in the domain name part can be spliced with the current domain name to form a new current domain name, and the steps of creating the cookie file and judging whether the creation is successful or not are repeatedly executed until the corresponding website root domain name is obtained. The method obtains the website root domain name by utilizing the characteristic that the browser can only write the cookie under the legal domain name, and realizes accurate identification of the website root domain name on the premise of not influencing the loading performance of the website.
It should be noted that, for convenience of description, the foregoing method embodiments are described as a series of acts, but those skilled in the art will appreciate that the present invention is not limited by the order of acts described, and that some steps may in fact be performed in other orders or concurrently. Moreover, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required to implement the invention.
To facilitate a better implementation of the above-described aspects of embodiments of the present invention, the following also provides relevant means for implementing the above-described aspects.
Referring to fig. 3, an apparatus 300 for determining a root domain name of a web site according to an embodiment of the present invention may include: an extracting unit 301, a current domain name constructing unit 302, a cookie creating unit 303, and a judging unit 304.
Wherein, the extracting unit 301 may be configured to extract a domain name part in the target URL; the current domain name constructing unit 302 may be configured to select two elements at the rightmost end of the domain name part to construct a current domain name; the cookie creating unit 303 may be configured to perform an operation of creating a cookie file under the current domain name; the determining unit 304 is operable to: after the operation is finished, judging whether the cookie file exists under the current domain name: if yes, determining the current domain name as a website root domain name corresponding to the target URL; otherwise, splicing the right-most end element which is not selected in the domain name part with the current domain name to form a new current domain name, and executing the steps of creating and judging again; and the arrangement sequence of the elements in each current domain name is consistent with the domain name part.
In an embodiment of the present invention, the cookie creating unit 303 may be further configured to: assign the customized cookie file identification data and the current domain name to the cookie-related attribute of the browser script; the determining unit 304 may be further configured to: read cookie file information under the current domain name based on the cookie-related attribute of the browser script, and judge whether the cookie file identification data exists in the returned data.
In a specific application, the extracting unit 301 may further be configured to: extract the domain name part using the document domain name attribute of the browser script.
As a preferred aspect, the apparatus 300 may further include a cookie deletion unit configured to: after the website root domain name corresponding to the target URL is determined, assign the cookie file identification data used when creating the cookie file, together with expiration time data with a value of zero, to the cookie-related attribute of the browser script, so as to delete the created cookie file.
In addition, in the embodiment of the present invention, the browser script is JavaScript; the document domain name attribute is document.domain; the cookie-related attribute is document.cookie; and the cookie file identification data comprises: the name of the cookie file and the value of the cookie file.
The present invention also provides an electronic device. The electronic device of the embodiment of the invention comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for determining the root domain name of a website provided by the invention.
Referring now to FIG. 4, a block diagram of a computer system 400 suitable for use with the electronic device implementing an embodiment of the invention is shown. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU)401 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM403, various programs and data necessary for the operation of the computer system 400 are also stored. The CPU401, ROM 402, and RAM403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display device such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read from it can be installed into the storage section 408 as necessary.
In particular, the processes described in the main step diagrams above may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the main step diagram. In the above-described embodiment, the computer program can be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the system of the present invention when executed by the central processing unit 401.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including an extracting unit, a current domain name constructing unit, a cookie creating unit, and a judging unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the extracting unit may also be described as a "unit providing a domain name part to the current domain name constructing unit".
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.