HK40000813A

HK40000813A - Systems and methods for detecting online fraud

Info

Publication number: HK40000813A
Application number: HK19124260.1A
Authority: HK
Inventors: A.-O. Damian
Original assignee: Bitdefender IPR Management Ltd.
Priority date: 2016-07-11
Filing date: 2017-07-10
Publication date: 2020-02-14

Description

System and method for detecting online fraud

Background

The present invention relates to computer security systems and methods, and in particular, to systems and methods for detecting online fraud (e.g., fraudulent web pages).

The rapid development of services such as electronic communication, online commerce, and online banking has been accompanied by a rise in electronic crime. Internet fraud, particularly in the form of phishing and identity theft, has become an increasing threat to internet users worldwide. Sensitive identity information and credit card details fraudulently obtained by international criminal networks operating on the internet are used to fund various online transactions and/or are further sold to third parties. In addition to direct economic losses to individuals, internet fraud also causes a range of undesirable side effects, such as increased security costs for companies, higher retail prices and banking fees, declining stock values, lower wages, and decreased tax revenue.

In an exemplary phishing attempt, a fake website masquerades as a genuine web page belonging to an online retailer or financial institution, inviting the user to enter some personal information (e.g., username, password) and/or financial information (e.g., credit card number, account number, security code). Once an unsuspecting user submits the information, it may be harvested by the fake website. In addition, the user may be directed to another web page that may install malware on the user's computer. Malware (e.g., viruses, Trojan horses) can continue to steal personal information by recording the keys pressed by the user while visiting certain web pages, and can turn the user's computer into a platform for launching other malicious attacks.

Software running on an internet user's computer system may be used to identify fraudulent online documents and to issue a warning and/or block access to such documents. Several approaches have been proposed for identifying fraudulent web pages. Exemplary strategies include matching the address of a web page against a list of known fraudulent and/or trusted addresses (techniques referred to as blacklisting and whitelisting, respectively). To avoid such detection, fraudsters frequently change the addresses of their websites.

There is a continuing interest in developing methods for detecting and preventing online fraud, particularly methods capable of proactive detection.

Disclosure of Invention

According to one aspect, a computer system includes at least one hardware processor configured to operate a reverse address mapper, a registration data filter connected to the reverse address mapper, and a content analyzer connected to the registration data filter. The reverse address mapper is configured to identify a set of co-hosted internet domains according to a known rogue internet domain, wherein the known rogue internet domain is located at a target Internet Protocol (IP) address, and wherein identifying the set of co-hosted internet domains comprises selecting the set of co-hosted internet domains such that all members of the set are located at the target IP address. The registration data filter is configured to filter the set of co-hosted internet domains to produce a subset of fraud candidate domains. Filtering the set of co-hosted internet domains comprises determining whether a selection condition is satisfied according to domain name registration data characterizing a domain of the set of co-hosted internet domains, and in response, selecting the domain into the subset of fraud candidate domains when the selection condition is satisfied. The content analyzer is configured to analyze an electronic document distributed by a candidate domain selected from the subset of fraud candidate domains to determine whether the electronic document is fraudulent and, in response, when the electronic document is fraudulent, to determine that the candidate domain is fraudulent.

According to another aspect, a method of identifying rogue internet domains includes employing at least one hardware processor to identify a set of co-hosted internet domains according to a known rogue internet domain, wherein the known rogue internet domain is located at a target Internet Protocol (IP) address, and wherein identifying the set of co-hosted internet domains comprises selecting the set of co-hosted internet domains such that all members of the set are located at the target IP address. The method further includes filtering, using the at least one hardware processor, the set of co-hosted internet domains to produce a subset of fraud candidate domains. Filtering the set of co-hosted internet domains comprises determining whether a selection condition is satisfied according to domain name registration data characterizing a domain of the set of co-hosted internet domains, and in response, selecting the domain into the subset of fraud candidate domains when the selection condition is satisfied. The method further includes analyzing, using the at least one hardware processor, an electronic document distributed by a candidate domain selected from the subset of fraud candidate domains to determine whether the electronic document is fraudulent. The method further includes, in response to analyzing the electronic document, determining that the candidate domain is fraudulent when the electronic document is fraudulent.

According to another aspect, a non-transitory computer-readable medium stores instructions that, when executed by at least one hardware processor, cause the hardware processor to form a reverse address mapper, a registration data filter connected to the reverse address mapper, and a content analyzer connected to the registration data filter. The reverse address mapper is configured to identify a set of co-hosted internet domains according to a known rogue internet domain, wherein the known rogue internet domain is located at a target Internet Protocol (IP) address, and wherein identifying the set of co-hosted internet domains comprises selecting the set of co-hosted internet domains such that all members of the set are located at the target IP address. The registration data filter is configured to filter the set of co-hosted internet domains to produce a subset of fraud candidate domains. Filtering the set of co-hosted internet domains comprises determining whether a selection condition is satisfied according to domain name registration data characterizing a domain of the set of co-hosted internet domains, and in response, selecting the domain into the subset of fraud candidate domains when the selection condition is satisfied. The content analyzer is configured to analyze an electronic document distributed by a candidate domain selected from the subset of fraud candidate domains to determine whether the electronic document is fraudulent and, in response, when the electronic document is fraudulent, to determine that the candidate domain is fraudulent.

Drawings

The foregoing aspects and advantages of the invention will become better understood upon reading the following detailed description and upon reference to the accompanying drawings, wherein:

FIG. 1 illustrates an exemplary set of client systems protected from online fraud according to some embodiments of the invention.

FIG. 2-A illustrates an exemplary hardware configuration of a client system according to some embodiments of the invention.

FIG. 2-B illustrates an exemplary hardware configuration of a server computer system according to some embodiments of the invention.

FIG. 3 illustrates exemplary software components executing on a client system according to some embodiments of the invention.

FIG. 4 illustrates an exemplary data exchange between a client system and a security server according to some embodiments of the invention.

FIG. 5 illustrates an exemplary sequence of steps performed by the fraud prevention module and the security server to protect the client system from e-fraud, according to some embodiments of the invention.

FIG. 6 illustrates exemplary components of a fraud identification server according to some embodiments of the invention.

FIG. 7 illustrates an exemplary sequence of steps performed by a fraud identification server according to some embodiments of the invention.

Detailed Description

In the following description, it is understood that all recited connections between structures may be either direct operative connections or indirect operative connections through intermediate structures. A set of elements includes one or more elements. Any reference to an element is to be understood as referring to at least one element. A plurality of elements includes at least two elements. Unless otherwise required, any described method steps need not be performed in the particular illustrated order. A first element (e.g., data) derived from a second element encompasses a first element equal to the second element, as well as a first element generated by processing the second element and optionally other data. Making a determination or decision according to a parameter encompasses making the determination or decision according to the parameter and optionally according to other data. Unless otherwise specified, an indicator of a certain quantity/data may be the quantity/data itself, or an indicator different from the quantity/data itself. A computer program is a sequence of processor instructions that carries out a task. Computer programs described in some embodiments of the invention may be stand-alone software entities or sub-entities (e.g., subroutines, libraries) of other computer programs. Unless otherwise specified, computer security encompasses protecting devices and data from illegitimate access, modification, and/or destruction. Unless otherwise specified, the term online fraud is not limited to fraudulent websites, but also encompasses other illegitimate or unsolicited commercial electronic communications, such as email, instant messages, and phone text and multimedia messages. An internet domain (or simply, a domain) is a subset of computing resources (real or virtual computer systems, network addresses) owned, controlled, or operated by a particular individual or organization. A fraudulent internet domain (also referred to herein as a rogue domain) is a domain that hosts and/or distributes fraudulent electronic documents. A domain name is an alphanumeric alias that represents the respective internet domain. A rogue domain name is a domain name of a rogue domain. Computer-readable media encompass non-transitory media such as magnetic, optical, and semiconductor storage media (e.g., hard disk drives, optical disks, flash memory, DRAM), as well as communication links such as conductive cables and fiber optic links. According to some embodiments, the invention provides, among other things, computer systems comprising hardware (e.g., one or more processors) programmed to perform the methods described herein, as well as computer-readable media encoding instructions to perform the methods described herein.

The following description illustrates embodiments of the invention by way of example and not necessarily by way of limitation.

FIG. 1 illustrates an exemplary fraud prevention system according to some embodiments of the invention. The security server 14 and the fraud identification server 12 protect the plurality of client systems 10a-d from online fraud. Client systems 10a-d generally represent any electronic device having a processor and memory and capable of connecting to a communication network. Exemplary client devices include personal computers, laptop computers, mobile computing devices (e.g., tablet computers), mobile phones, wearable devices (e.g., watches, fitness monitors), game consoles, TVs, and household appliances (e.g., refrigerators, media players), among others. Client systems 10a-d are interconnected via a communication network 13, such as a corporate network or the internet. Portions of network 13 may comprise a Local Area Network (LAN) and/or a telecommunications network (e.g., a 3G network).

Each server 12, 14 generally represents a set of communicatively coupled computer systems, which may not be in physical proximity to each other. In some embodiments, the security server 14 is configured to receive a query from a client system, the query indicating an electronic document such as a web page or an electronic message, and to respond with an evaluation indicator that indicates whether the respective document is likely to be fraudulent. In some embodiments, the likelihood of fraud is determined according to a location indicator of the respective document. Exemplary location indicators include domain names, host names, and Internet Protocol (IP) addresses of computer systems hosting or distributing the respective electronic document. Domain name is a term commonly used in the art to denote a unique sequence of characters that identifies a particular realm of the internet owned and/or controlled by an individual or organization. A domain name constitutes an abstraction (e.g., alias) of a set of network addresses (e.g., IP addresses) of computers hosting and/or distributing electronic documents. Domain names typically comprise a concatenated sequence of labels delimited by dots, such as www.bitdefender.com.

Fraud identification server 12 is configured to collect information about online fraud, including, for example, a list of location indicators (domain names, IP addresses, etc.) for fraudulent documents. In some embodiments, fraud identification server 12 stores fraud indication information in fraud domain database 15, which may be further used by security server 14 to determine the likelihood that an electronic document is fraudulent. Details of such functions are given below.

FIG. 2-A illustrates an exemplary hardware configuration of client system 10 (e.g., systems 10a-d in FIG. 1). For simplicity, the client system shown is a computer system; the hardware configuration of other client systems, such as mobile phones, smartwatches, etc., may differ somewhat from the configuration shown. Client system 10 includes a set of physical devices, including a hardware processor 20 and a memory unit 22. The processor 20 comprises a physical device (e.g., a microprocessor, a multi-core integrated circuit formed on a semiconductor substrate, etc.) configured to perform computational and/or logical operations on a set of signals and/or data. In some embodiments, such operations are indicated to the processor 20 in the form of a sequence of processor instructions (e.g., machine code or another type of encoding). The memory unit 22 may include a volatile computer-readable medium (e.g., DRAM, SRAM) that stores instructions and/or data accessed or generated by the processor 20.

Input device 24 may include a computer keyboard, mouse, microphone, etc., including respective hardware interfaces and/or adapters that allow a user to introduce data and/or instructions into client system 10. Output devices 26 may include display devices (e.g., display screens, liquid crystal displays) and speakers, as well as hardware interfaces/adapters, such as graphics cards, that allow client system 10 to transmit data to a user. In some embodiments, the input device 24 and the output device 26 may share common hardware, such as a touch screen device. Storage unit 28 includes a computer-readable medium capable of non-volatile storage, reading and writing of software instructions and/or data. Exemplary storage devices 28 include magnetic and optical disks and flash memory devices as well as removable media such as CD and/or DVD disks and drives. The set of network adapters 32 enables the client system 10 to connect to a computer network and/or other electronic devices. Controller hub 30 represents a plurality of system, peripheral, and/or chipset buses and/or all other circuits that enable communication between processor 20 and devices 22, 24, 26, 28, and 32. For example, controller hub 30 may include a memory controller, an input/output (I/O) controller, an interrupt controller, and the like. In another example, controller hub 30 may include a north bridge that connects processor 20 to memory 22 and/or a south bridge that connects processor 20 to devices 24, 26, 28, and 32.

FIG. 2-B illustrates an exemplary hardware configuration of fraud identification server 12 according to some embodiments of the invention. The security server 14 may have a similar configuration. The fraud identification server 12 includes at least one hardware processor 120 (e.g., a microprocessor, a multi-core integrated circuit), physical memory 122, server storage 128, and a set of server network adapters 132. Adapter 132 may include a network card or other communication interface that enables fraud identification server 12 to connect to communication network 13. Server storage 128 may store at least a subset of records from rogue-domain database 15. In an alternative embodiment, the server 12 may access the fraud record from the database 15 via the network 13. In some embodiments, server 12 further includes input and output devices that may function similarly to input/output devices 24 and 26, respectively, of client system 10.

FIG. 3 illustrates exemplary software executing on client system 10 according to some embodiments of the invention. An operating system (OS) 34 provides an interface between the hardware of client system 10 and a set of software applications. Exemplary operating systems include [...] and the like. Application programs 36 generally represent any user application, such as word processing, image processing, spreadsheet, calendar, online gaming, social media, web browser, and electronic communication applications.

The fraud prevention module 38 protects the client system 10 from electronic fraud, such as by preventing the client system 10 from accessing fraudulent electronic documents (e.g., fraudulent websites, email messages, etc.). In some embodiments, the operation of fraud prevention module 38 may be turned on and/or off by a user of client system 10. The fraud prevention module 38 may be a stand-alone application or may form part of a suite of computer programs that protect the client system 10 from computer security threats, such as malware (malicious code), spyware, and unauthorized intrusion. The module 38 may operate at various levels of processor privilege (e.g., user mode, kernel mode). In some embodiments, the module 38 is integrated with the application 36, for example as a plug-in, an attachment, or a toolbar.

In some embodiments, fraud prevention module 38 may include a network filter 39 configured to intercept requests by client system 10 to access remote documents and to selectively block the respective requests. Exemplary access requests detected by module 38 include Hypertext Transfer Protocol (HTTP) requests issued by client system 10. The network filter 39 may operate, for example, as a driver registered with the OS 34. In embodiments where OS 34 and application 36 execute within a virtual machine, fraud prevention module 38 (or at least network filter 39) may execute outside the respective virtual machine, for example at the processor privilege level of a hypervisor. Such a configuration may effectively protect module 38 and/or network filter 39 from malicious code that may infect the virtual machine. In yet another embodiment, fraud prevention module 38 may operate at least partially on an electronic device distinct from client system 10, such as a router, proxy server, or gateway device that connects client system 10 to an extended network such as the Internet.

FIG. 4 illustrates the operation of fraud prevention module 38 via an exemplary data exchange between client system 10 and security server 14. FIG. 5 further illustrates an exemplary sequence of steps performed by fraud prevention module 38 and/or security server 14 to protect client system 10 from electronic fraud, according to some embodiments of the invention. In an illustrative example in which application 36 comprises a web browser, when a user attempts to access a remote document (e.g., a website), application 36 may send a request to access the respective document to a service provider server over communication network 13. A typical request contains an encoding of the location of the respective resource. Exemplary location encodings include domain names, host names, Uniform Resource Identifiers (URIs), Uniform Resource Locators (URLs), Internet Protocol (IP) addresses, and the like.

Upon detecting an access request (e.g., an HTTP request issued by a web browser), some embodiments of the fraud prevention module 38 at least temporarily suspend transmission of the respective request to its intended destination, and instead transmit a document indicator 42 to the security server 14. In some embodiments, the document indicator 42 contains an encoding of the location of the requested document (e.g., domain name, URL, IP address), and may further contain other information obtained by the fraud prevention module 38 by analyzing the intercepted access request. Such information may include an indicator of the type of document requested, an indicator of the requesting application, and an identifier of the requesting user, among other things. In response to receiving document indicator 42, in a sequence of steps 208-210 (FIG. 5), some embodiments of security server 14 formulate an evaluation indicator 44 indicating whether the requested document is likely fraudulent, and transmit indicator 44 to client system 10. In some embodiments, the likelihood of fraud is quantified as a Boolean value (e.g., 0/1, yes/no), or as a number between a lower limit and an upper limit (e.g., between 0 and 100).

In some embodiments, in step 212, fraud prevention module 38 determines whether the requested document is likely fraudulent according to evaluation indicator 44. If not, step 214 allows client system 10 (e.g., application 36) to access the respective document, for example by transmitting the original access request to its intended destination. If so, step 216 may block access to the respective document. Some embodiments may further display a notification (e.g., a warning screen, an icon, an explanation, etc.) to the user and/or may notify a system administrator of client system 10.
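
The decision made in steps 212-216 can be illustrated with a minimal Python sketch. The names `EvaluationIndicator`, `is_likely_fraudulent`, `handle_request`, and the threshold of 50 are hypothetical and do not appear in the source; the sketch only assumes that indicator 44 carries either a Boolean value or a score between 0 and 100, as described above.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical cut-off: scores at or above this value count as "likely fraudulent".
FRAUD_THRESHOLD = 50


@dataclass
class EvaluationIndicator:
    """Stand-in for evaluation indicator 44: a Boolean or a 0-100 score."""
    value: Union[bool, int, float]


def is_likely_fraudulent(indicator, threshold=FRAUD_THRESHOLD):
    """Step 212: interpret the indicator returned by the security server."""
    if isinstance(indicator.value, bool):
        return indicator.value
    return indicator.value >= threshold


def handle_request(indicator):
    """Return "block" (step 216) for likely fraud, otherwise "allow" (step 214)."""
    return "block" if is_likely_fraudulent(indicator) else "allow"
```

In this sketch a suspended request would be released to its destination only when `handle_request` returns "allow".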

In an alternative embodiment, fraud prevention module 38 executing on client system 10 or on a router connecting client system 10 to the Internet may redirect all requests to access remote documents to security server 14 for analysis. Thus, the security server 14 may be located at a proxy server location between the client system 10 and a remote server providing access to the respective resource. In such embodiments, steps 212-214-216 may be performed by the security server 14.

In an exemplary embodiment in which a user of the client system 10 is protected from fraudulent electronic messages (e.g., emails), the fraud prevention module 38 may be installed as a plug-in or add-on to a message reader application. Upon receiving a message, module 38 may parse the header of the respective message to extract a document indicator including, for example, an electronic address of the sender of the respective message and/or a domain name of an email server that delivered the respective message. Module 38 may then transmit document indicator 42 to security server 14 and, in response, receive evaluation indicator 44 from server 14. The fraud prevention module 38 may determine according to indicator 44 whether the respective message is likely fraudulent and, if so, prevent the content of the respective message from being displayed to the user. In some embodiments, module 38 may place messages deemed likely to be fraudulent in a separate message folder.
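
As a rough illustration of the header parsing step, the following Python sketch extracts the sender address and its domain from a raw message using the standard `email` package. The function name `extract_document_indicator` and the dictionary layout are assumptions made for this example; the source does not specify how document indicator 42 is encoded.

```python
from email import message_from_string
from email.utils import parseaddr


def extract_document_indicator(raw_message):
    """Build a (hypothetical) document indicator from the From header."""
    msg = message_from_string(raw_message)
    # parseaddr splits 'Name <user@host>' into (display name, address).
    _, sender = parseaddr(msg.get("From", ""))
    # Domain names are case-insensitive, so normalize to lowercase.
    sender_domain = sender.rpartition("@")[2].lower() if "@" in sender else ""
    return {"sender": sender, "sender_domain": sender_domain}
```

A fuller version might also inspect the Received headers to find the delivering mail server, which this sketch omits.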

In an alternative embodiment, fraud prevention module 38 may execute on a server computer system (e.g., an email server) that manages electronic messaging on behalf of multiple client systems (e.g., client systems 10a-d in FIG. 1). In response to determining that a message is likely fraudulent, module 38 may prevent the respective message from being distributed to its intended recipient.

In determining the likelihood of fraud, security server 14 may query fraud domain database 15 (step 208 in FIG. 5). In some embodiments, the database 15 includes a set of records, each record corresponding to a rogue domain name; these record sets are sometimes referred to in the art as blacklists. In some embodiments, step 208 includes determining whether the domain name indicated by document indicator 42 matches any blacklisted records of database 15. If so, the security server 14 may determine that the requested document is likely fraudulent.
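
The blacklist lookup of step 208 might look like the following Python sketch. The helper names `domain_of` and `is_blacklisted` are hypothetical, and matching parent domains in addition to the exact host name is an assumption of this example rather than something the source states.

```python
from urllib.parse import urlsplit


def domain_of(location):
    """Extract a lowercase host name from a URL or a bare domain name."""
    # Prefix '//' so that urlsplit treats a bare name as a network location.
    target = location if "//" in location else "//" + location
    host = urlsplit(target).hostname
    return (host or "").lower().rstrip(".")


def is_blacklisted(location, blacklist):
    """True when the host, or any of its parent domains, is in the blacklist."""
    labels = domain_of(location).split(".")
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))
```

Here `blacklist` stands in for the records of database 15 and is assumed to be a set of lowercase domain names.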

Fraud domain database 15 may be populated and maintained by fraud identification server 12. In some embodiments, server 12 identifies a previously unknown set of rogue domains based on knowledge gained from analyzing known rogue internet domains (referred to herein as seed domains). The domain name of the newly discovered rogue domain may then be added to the database 15. FIG. 6 illustrates exemplary components of fraud identification server 12 according to some embodiments of the invention. The server 12 may include a reverse address mapper 52, a registration data filter 54 coupled to the reverse address mapper 52, and a content analyzer 56 coupled to the filter 54. FIG. 7 illustrates an exemplary sequence of steps performed by fraud identification server 12 to discover fraudulent Internet domains according to some embodiments of the present invention.

Some embodiments of the invention rely on the observation that physical computing resources belonging to one rogue domain typically also belong to other rogue domains. For example, the same server and/or IP address may host multiple fraudulent websites. Such a server or network address may be owned by fraudsters, or may be hijacked without the knowledge of its legitimate owner/operator, for example through the use of carefully crafted malware. The following description shows how knowledge of one rogue domain may be used to uncover other, previously unknown rogue domains.

In some embodiments, the reverse address mapper 52 is configured to receive an indicator of a seed domain (e.g., seed domain name 62 in FIG. 6) and to output a set of co-hosted domains 64 (step 234 in FIG. 7). The seed domain represents a known fraud domain, i.e., a domain known to host or distribute fraudulent documents. Examples of such domains include domains hosting fake banking sites, fake online betting sites, fake loan sites, and the like. The seed domain name may be detected, for example, by researchers of a computer security company, or may be reported by an internet user or by authorities investigating online fraud. Seed domain names may also be discovered automatically by a variety of tools known in the art (e.g., honeypots).

In some embodiments, the co-hosted domains 64 comprise a set of domains that share a common network address (e.g., a public IP address) with the seed domain. An exemplary set of co-hosted domains 64 uses the same physical server to distribute electronic documents. Since a single network/IP address may correspond to multiple distinct computer systems, the co-hosted domains 64 may not necessarily reside on the same physical machine as the seed domain. However, a domain name server would map the seed domain name 62 and the domain names of all of the co-hosted domains 64 to the same network address. To identify the co-hosted domains 64, the fraud identification server 12 may use any method known in the art of computer networking. Such operations are commonly referred to as reverse IP analysis, reverse Domain Name System (DNS) lookup, or reverse DNS resolution. In one exemplary approach, the server 12 operates a name server used to perform direct DNS lookups (i.e., determining an IP address according to a domain name) and uses the name server to construct a reverse DNS map. Another approach may look up pointer DNS records (PTR records) of a particular domain, such as in-addr.arpa.
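
The first approach, building a reverse map from the results of direct DNS lookups, can be sketched in Python as follows. The function names and the `forward_records` dictionary (domain name to list of IP addresses) are hypothetical; the sketch assumes the forward lookups have already been collected and performs no network access.

```python
from collections import defaultdict


def build_reverse_map(forward_records):
    """Invert {domain: [ip, ...]} into {ip: {domain, ...}}."""
    reverse = defaultdict(set)
    for domain, addresses in forward_records.items():
        for ip in addresses:
            reverse[ip].add(domain)
    return reverse


def co_hosted_domains(seed, forward_records):
    """Return the domains that share at least one IP address with the seed."""
    reverse = build_reverse_map(forward_records)
    result = set()
    for ip in forward_records.get(seed, ()):
        result |= reverse[ip]
    result.discard(seed)  # the seed itself is not a new finding
    return result
```

For example, with `{"seed.example": ["192.0.2.10"], "shop.example": ["192.0.2.10"], "other.example": ["192.0.2.99"]}`, calling `co_hosted_domains("seed.example", ...)` returns a set containing only `"shop.example"`.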

Not all of the co-hosted domains 64 need to be fraudulent. As described above, computer systems belonging to a legitimate domain are sometimes hijacked by fraudsters, who then use the respective machine to host a set of fraudulent domains. Sometimes, such rogue domains are hosted on the respective machine only for a short period of time, and are then moved to another server to avoid detection or countermeasures. In some embodiments, the registration data filter 54 of the fraud identification server 12 is configured to filter the set of co-hosted domains 64 to select a set of fraud candidate domains 66 (step 236 in FIG. 7), representing domains suspected of being fraudulent. The fraud candidate domains 66 may be subject to further scrutiny, as shown below.

Step 236 may be considered an optimization because fraud analysis as shown below may be computationally expensive. Pre-filtering the set of co-hosted domains 64 may reduce the computational burden by using relatively less expensive rules to select a subset of candidate domains for fraud analysis. Some embodiments of the registration data filter 54 select the fraud candidate domain 66 based on the domain name registration record for each co-hosted domain. The registration record is generated and/or maintained by a domain registration authority (e.g., an internet registrar). For each registered domain name, an exemplary registration record may include contact data (e.g., name, address, telephone number, email address, etc.) of the registrant, owner, or administrator of the respective domain name, as well as automatically generated data, such as an ID of the registrant, and various timestamps indicating the time at which the respective domain name was registered, the time at which the respective registration record was last modified, the time at which the respective registration record expired, etc.

Certain domain name registration data is public and can be queried via specific computer instructions and/or protocols (e.g., WHOIS). In some embodiments, the registration data filter 54 obtains domain name registration data related to the co-hosted domains 64 from the domain registration database 17, for example by using the WHOIS protocol. The filter 54 may then search the domain name registration data of each co-hosted domain for a set of fraud-indicative patterns, to determine whether the domain is likely fraudulent. Some embodiments rely on the observation that registrations of rogue domain names are typically clustered in time (bursts of domain registrations); such embodiments may compare the registration timestamp of the seed domain name 62 with the registration timestamp of a co-hosted domain 64 and select the respective co-hosted domain into the set of fraud candidate domains 66 according to the result of the comparison (e.g., when the two registrations are very close in time).
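
The timestamp comparison can be illustrated with a short Python sketch. The function names and the two-day window are assumptions chosen for the example; the source only states that the two registrations should be very close in time.

```python
from datetime import datetime, timedelta

# Hypothetical definition of "very close in time".
DEFAULT_WINDOW = timedelta(days=2)


def registered_close_in_time(seed_ts, candidate_ts, window=DEFAULT_WINDOW):
    """True when the two registration timestamps differ by at most `window`."""
    return abs(candidate_ts - seed_ts) <= window


def select_by_registration_time(seed_ts, registration_times, window=DEFAULT_WINDOW):
    """Keep co-hosted domains registered within `window` of the seed domain."""
    return [
        domain
        for domain, ts in registration_times.items()
        if registered_close_in_time(seed_ts, ts, window)
    ]
```

Here `registration_times` is assumed to map each co-hosted domain name to a `datetime` taken from its registration record.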

Another exemplary fraud-indicative feature is the registrant (e.g., owner, administrator, etc.) of the domain name. Some embodiments of the filter 54 may attempt to match the registrant's credentials against a list of known names, telephone numbers, addresses, emails, etc., collected from the domain name registration data of known rogue domains, such as the seed domain name 62. A match may indicate that the respective co-hosted domain is likely to be fraudulent, thus justifying the inclusion of the respective co-hosted domain in the set of fraud candidate domains 66.
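
A minimal Python sketch of such registrant matching is given below. The field names, the helper functions, and the normalization (lowercasing and collapsing whitespace) are assumptions of this example; real registration records vary in layout.

```python
# Hypothetical field names for a parsed registration record.
FIELDS = ("name", "phone", "address", "email")


def normalize(value):
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(str(value).lower().split())


def build_known_set(known_records, fields=FIELDS):
    """Collect normalized registrant details from records of known rogue domains."""
    return {normalize(r[f]) for r in known_records for f in fields if r.get(f)}


def matches_known_registrant(record, known_values, fields=FIELDS):
    """True when any registrant detail of `record` appears among the known values."""
    return any(normalize(record[f]) in known_values for f in fields if record.get(f))
```

A co-hosted domain whose record matches would then be added to the fraud candidate set.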

In some embodiments, the filter 54 may look for certain fraud-indicative features of the registrant's telephone number. In one example, some area or country codes may be considered fraud-indicative. In another example, certain combinations of digits within a telephone number correspond to an automatic call redirection service; the respective telephone number may appear to be a legitimate number, but calling it results in the call being redirected to another number, possibly in another country. Such call redirection patterns may be considered fraud-indicative. Some embodiments of the registration data filter 54 may perform a reverse telephone number lookup and compare the result of the lookup with other domain registration data, such as an address or a name. Any discrepancy may be considered fraud-indicative and may result in the inclusion of the respective co-hosted domain in the fraud candidate set.

Yet another exemplary criterion for selecting a domain into the set of fraud candidate domains 66 is the registrant's email address. Some embodiments of the filter 54 may attempt to match the respective email address against a blacklist of email addresses collected from known fraudulent documents (e.g., web pages, email messages). The blacklist may also include email addresses collected from domain registration data for known rogue domains. Some embodiments of the filter 54 may look for certain patterns in the registrant's email address, such as an apparently random sequence of characters, an unusually long email address, and the like. Such patterns may indicate that the respective address was automatically generated, which may be fraud-indicative. In some embodiments, the filter 54 may determine whether to include the co-hosted domain in the fraud candidate set according to the provider of the email address, e.g., according to whether the respective provider allows anonymous email accounts, whether the respective email address is provided free of charge, etc. Some embodiments may identify the email servers that handle emails addressed to and/or originating from the respective email address and determine whether to include a co-hosted domain in the fraud candidate set according to the identity of such servers.
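A minimal sketch of such email address heuristics follows. Character entropy is used here merely as one possible proxy for an apparently random sequence of characters; the thresholds and the provider list are illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical list of providers offering free and/or anonymous accounts.
FREE_PROVIDERS = {"freemail.example"}

def shannon_entropy(text):
    """Shannon entropy of the character distribution of `text`, in bits."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def email_fraud_signals(address, max_length=30, entropy_threshold=3.5):
    """Return illustrative fraud signals for a registrant email address."""
    local, _, domain = address.lower().rpartition("@")
    signals = []
    if len(address) > max_length:
        signals.append("long_address")
    if local and shannon_entropy(local) > entropy_threshold:
        signals.append("random_looking")
    if domain in FREE_PROVIDERS:
        signals.append("free_provider")
    return signals

suspicious = email_fraud_signals("x7q2kz9vb4wp1mtd@freemail.example")
ordinary = email_fraud_signals("ann@corp.example")
```

An embodiment could combine such signals with the blacklist match described above, for instance by selecting a domain when any signal is present.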

In response to selecting the set of fraud candidate domains 66, in some embodiments, the content analyzer 56 performs content analysis to determine whether any of the fraud candidate domains is actually fraudulent (step 238 in FIG. 7). Content analysis may include accessing a fraud candidate domain and analyzing the content of electronic documents hosted or distributed by the respective domain. When the content analysis determines that the electronic document is fraudulent, step 240 may determine that the respective fraud candidate domain is indeed fraudulent, and may save the newly identified fraudulent domain name to the fraudulent domain database 15.

Exemplary content analysis of hypertext markup language (HTML) documents includes, among other things, determining whether a respective document includes a user authentication (login) page. Such a determination may include determining whether the respective web page includes a form field and/or any of a plurality of user authentication keywords (e.g., "username," "password," the name and/or acronym of a financial institution).
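As an illustration, a login page check of this kind may be sketched with the Python standard library HTML parser. The keyword list and the particular way of combining form fields with keywords are assumptions of this example.

```python
from html.parser import HTMLParser

# Illustrative user authentication keywords; a real list would be broader.
AUTH_KEYWORDS = ("username", "password", "sign in", "log in")

class LoginPageDetector(HTMLParser):
    """Records form elements, password inputs, and visible text of a page."""

    def __init__(self):
        super().__init__()
        self.has_form = False
        self.has_password_input = False
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.has_form = True
        elif tag == "input":
            # Attribute values may be None when no value is given.
            input_type = (dict(attrs).get("type") or "").lower()
            if input_type == "password":
                self.has_password_input = True

    def handle_data(self, data):
        self.text_parts.append(data.lower())

def looks_like_login_page(html):
    """Return True for a password input, or a form plus an auth keyword."""
    detector = LoginPageDetector()
    detector.feed(html)
    detector.close()
    text = " ".join(detector.text_parts)
    has_keyword = any(keyword in text for keyword in AUTH_KEYWORDS)
    return detector.has_password_input or (detector.has_form and has_keyword)

login_page = ('<form action="/login"><label>Username</label><input name="u">'
              '<input type="password" name="p"></form>')
is_login = looks_like_login_page(login_page)
```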

The content analysis may further include comparing the respective HTML documents to a known set of fraudulent and/or legitimate documents. When the document is sufficiently similar to a known fraudulent document, some embodiments determine that the respective document is fraudulent. These methods rely on the observation that fraudsters often reuse successful document templates, and thus there are typically several fraudulent documents that use roughly the same design and/or format.

However, a document may also be fraudulent when it is sufficiently similar to a particular legitimate document. In one such example, the web page may attempt to deceive the user by masquerading as a legitimate web page of a financial institution (e.g., a bank, an insurance company, etc.). Thus, some embodiments of the content analyzer 56 use content analysis to determine whether an HTML document located at a fraud candidate domain is an illegitimate clone of a legitimate web page. Such a determination may include analyzing a set of graphical elements (e.g., images, logos, color schemes, fonts, font styles, font sizes, etc.) of the document under review and comparing these elements to graphical elements captured from a set of legitimate web pages.

The content analysis may further comprise analyzing text portions of the respective electronic documents. Such text analysis may include searching for certain keywords, calculating the frequency of occurrence of certain terms and/or word sequences, determining the relative positions of certain terms with respect to other terms, and so forth. Some embodiments determine an inter-document distance indicative of a degree of similarity between the target document and a reference document (either fraudulent or legitimate), and determine whether the target document is legitimate according to the calculated distance.
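The description does not prescribe a particular inter-document distance. The sketch below uses the Jaccard distance over word pairs as one simple possibility; the function names and the choice of metric are assumptions of this example.

```python
import re

def shingles(text, size=2):
    """Return the set of consecutive word tuples of length `size` in `text`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard_distance(doc_a, doc_b, size=2):
    """Distance in [0, 1]: 0 for identical shingle sets, 1 for disjoint sets."""
    a, b = shingles(doc_a, size), shingles(doc_b, size)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

reference = "Please enter your password"
near_clone = jaccard_distance(reference, "please enter your password!")
unrelated = jaccard_distance(reference, "weather forecast for tomorrow")
```

A target document could then be classified by comparing the distance to a threshold chosen for the respective reference set.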

Another example of text-based content analysis includes identifying and extracting contact information (e.g., address, contact phone number, contact email address, etc.) from an electronic document, such as an HTML document or an email message. The content analyzer 56 may then attempt to match the respective contact data against a blacklist of similar data extracted from known fraudulent documents. For example, when a web page lists a contact phone number that appears on a fraudulent website, some embodiments may infer that the web page is also fraudulent. Other embodiments look for fraud-indicative patterns in the contact data, such as telephone numbers with certain country and/or region codes, digit patterns indicative of a call redirection service, etc. (see the analysis of domain registration data above).
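For illustration, contact data extraction and blacklist matching may be sketched as follows. The regular expressions are deliberately simplistic assumptions and would miss many real-world formats.

```python
import re

# Simplified patterns, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")

def extract_contacts(text):
    """Return lower-cased email addresses and digit-only phone numbers."""
    emails = {match.lower() for match in EMAIL_RE.findall(text)}
    phones = {re.sub(r"\D", "", match) for match in PHONE_RE.findall(text)}
    return emails | phones

def blacklisted_contacts(text, blacklist):
    """Return the extracted contact items that also appear in `blacklist`."""
    return extract_contacts(text) & blacklist

page_text = "Call +1 (555) 010-9999 or write to Help@Bank.example."
# Hypothetical blacklist built from known fraudulent documents.
hits = blacklisted_contacts(page_text, {"15550109999"})
```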

Another set of exemplary content analysis methods identifies segments of code (e.g., traffic tracking code) placed within an electronic document. Web analytics services use instances of such code to calculate and report various data related to web page usage: number of visits, referrers, country of origin of the visits, etc. Such code typically includes a unique client ID (e.g., a tracking ID) that allows the respective analytics service to associate the respective electronic document with a particular client. Some embodiments of the content analyzer 56 may identify the tracking ID and attempt to match the respective ID against a blacklist of such IDs collected from known fraudulent documents. A match may indicate that the currently analyzed document is also fraudulent.
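The tracking ID match may be sketched as below. The `TRK-` identifier format is purely hypothetical and does not correspond to any particular analytics service; an embodiment would substitute the format used by the service of interest.

```python
import re

# Hypothetical tracking ID format, e.g. "TRK-123456-1".
TRACKING_ID_RE = re.compile(r"\bTRK-\d{4,10}-\d{1,4}\b")

def tracking_ids(html):
    """Return the set of tracking IDs found anywhere in the document source."""
    return set(TRACKING_ID_RE.findall(html))

def shares_tracking_id(html, blacklist):
    """Return True when the document contains a blacklisted tracking ID."""
    return bool(tracking_ids(html) & blacklist)

document = "<script>track('TRK-123456-1');</script>"
# Hypothetical blacklist collected from known fraudulent documents.
flagged = shares_tracking_id(document, {"TRK-123456-1"})
```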

The exemplary systems and methods described above allow for automatic detection of internet fraud, such as fraudulent web pages and electronic messages. Some embodiments automatically identify fraudulent internet domain names, i.e., the names of domains hosting or distributing fraudulent documents, and prevent users from accessing the respective fraudulent domains. Alternative embodiments display an alert and/or notify a system administrator when an attempt is made to access a known rogue domain.

Some embodiments automatically discover a set of previously unknown rogue domain names based on knowledge derived from analyzing known rogue domain names. Such automatic detection allows a quick response to newly emerging fraud attempts, and may even prevent fraud proactively by detecting domain names that have been registered but have not yet been used to carry out fraudulent activities.

Some embodiments select a fraud candidate domain from a set of domains hosted on the same machine as known fraud domains. The candidate set may be further refined according to the domain registration data. Content analysis can then be used to identify truly rogue domains within the candidate set.
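The three stages summarized above (co-hosting lookup, registration data filtering, content analysis) may be chained as in the following sketch. All callables are stand-ins supplied by the caller and are assumptions of this example, as are the sample domains and the documentation-range IP address.

```python
def discover_fraudulent_domains(seed_domain, resolve_ip, domains_at_ip,
                                passes_registration_filter,
                                hosts_fraudulent_content):
    """Chain the three stages and return the newly identified rogue domains."""
    # Stage 1: domains hosted at the same IP address as the known rogue seed.
    ip_address = resolve_ip(seed_domain)
    co_hosted = [d for d in domains_at_ip(ip_address) if d != seed_domain]
    # Stage 2: keep domains whose registration data looks fraud-indicative.
    candidates = [d for d in co_hosted
                  if passes_registration_filter(seed_domain, d)]
    # Stage 3: confirm candidates by analyzing the content they distribute.
    return [d for d in candidates if hosts_fraudulent_content(d)]

# Hypothetical data standing in for DNS, WHOIS, and content analysis results.
hosting = {"203.0.113.7": ["seed.example", "a.example", "b.example", "c.example"]}
found = discover_fraudulent_domains(
    "seed.example",
    resolve_ip=lambda domain: "203.0.113.7",
    domains_at_ip=lambda ip: hosting.get(ip, []),
    passes_registration_filter=lambda seed, d: d in {"a.example", "b.example"},
    hosts_fraudulent_content=lambda d: d == "a.example",
)
```

Because the comparatively expensive content analysis runs only on domains that pass the registration data filter, the filter effectively bounds the number of documents that must be fetched and analyzed.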

It will be clear to a person skilled in the art that the above-described embodiments may be varied in a number of ways without departing from the scope of the invention. Accordingly, the scope of the invention should be determined by the appended claims and their legal equivalents.

Claims

1. A computer system comprising at least one hardware processor configured to operate a reverse address mapper, a registration data filter connected to the reverse address mapper, and a content analyzer connected to the registration data filter, wherein:

the reverse address mapper is configured to identify a set of co-hosted internet domains from a known rogue internet domain, wherein the known rogue internet domain is located at a target internet protocol IP address, and wherein identifying the set of co-hosted internet domains comprises selecting the set of co-hosted internet domains such that all components of the set of co-hosted internet domains are located at the target IP address;

the registration data filter is configured to filter the set of co-hosted internet domains to produce a subset of fraud candidate domains, wherein filtering the set of co-hosted internet domains comprises:

determining whether a selection condition is satisfied based on domain name registration data characterizing a domain of the set of co-hosted internet domains, and

in response, when the selection condition is satisfied, selecting the domain into the subset of fraud candidate domains;

and

the content analyzer is configured to:

analyze an electronic document distributed by a candidate domain selected from the subset of fraud candidate domains to determine whether the electronic document is fraudulent, and

in response, when the electronic document is fraudulent, determine that the candidate domain is fraudulent.

2. The computer system of claim 1, wherein determining whether the selection condition is satisfied comprises: comparing the domain name registration data characterizing the domain with domain name registration data characterizing the known rogue internet domain.

3. The computer system of claim 2, wherein determining whether the selection condition is satisfied comprises: comparing the registration timestamp of the domain with the registration timestamp of the known rogue internet domain.

4. The computer system of claim 1, wherein the domain name registration data characterizing the domain comprises an email address, and wherein the registration data filter is configured to determine whether the selection condition is satisfied according to the email address.

5. The computer system of claim 4, wherein the registration data filter is configured to determine whether the selection condition is satisfied according to a length of the email address.

6. The computer system of claim 4, wherein the registration data filter is configured to determine whether the selection condition is satisfied according to an identification of a mail server that processes emails sent to the email address.

7. The computer system of claim 4, wherein the registration data filter is configured to determine whether the selection condition is satisfied in accordance with whether an anonymous email account is allowed by a provider of the email address.

8. The computer system of claim 4, wherein the registration data filter is configured to determine whether the selection condition is satisfied according to a likelihood that the email address was automatically generated.

9. The computer system of claim 1, wherein the domain name registration data characterizing the domain comprises a telephone number, and wherein the registration data filter is configured to determine whether the selection condition is satisfied according to the telephone number.

10. The computer system of claim 9, wherein determining whether the selection condition is satisfied comprises performing a reverse telephone number lookup to determine an entity that owns the telephone number, and wherein the registration data filter is configured to determine whether the selection condition is satisfied according to a result of the reverse telephone number lookup.

11. A method of identifying a rogue internet domain, the method comprising:

identifying, using at least one hardware processor, a set of co-hosted internet domains from a known rogue internet domain, wherein the known rogue internet domain is located at a target internet protocol IP address, and wherein identifying the set of co-hosted internet domains comprises selecting the set of co-hosted internet domains such that all components of the set of co-hosted internet domains are located at the target IP address;

filtering, using the at least one hardware processor, the set of co-hosted internet domains to produce a subset of fraud candidate domains, wherein filtering the set of co-hosted internet domains comprises:

analyzing, using the at least one hardware processor, an electronic document distributed by a candidate domain selected from the subset of fraudulent candidate domains to determine whether the electronic document is fraudulent; and

in response to analyzing the electronic document, determining that the candidate domain is fraudulent when the electronic document is fraudulent.

12. The method of claim 11, wherein determining whether the selection condition is satisfied comprises: comparing the domain name registration data characterizing the domain with domain name registration data characterizing the known rogue internet domain.

13. The method of claim 12, wherein determining whether the selection condition is satisfied comprises: comparing the registration timestamp of the domain with the registration timestamp of the known rogue internet domain.

14. The method of claim 11, wherein the domain name registration data characterizing the domain comprises an email address, and wherein the method comprises determining whether the selection condition is satisfied according to the email address.

15. The method of claim 14, comprising determining whether the selection condition is satisfied according to a length of the email address.

16. The method of claim 14, comprising determining whether the selection condition is satisfied based on an identification of a mail server that processes emails sent to the email address.

17. The method of claim 14, comprising determining whether the selection condition is satisfied based on whether a provider of the email address allows an anonymous email account.

18. The method of claim 14, comprising determining whether the selection condition is satisfied according to a likelihood that the email address was automatically generated.

19. The method of claim 11, wherein the domain name registration data characterizing the domain comprises a telephone number, and wherein the method comprises determining whether the selection condition is satisfied according to the telephone number.

20. The method of claim 19, wherein determining whether the selection condition is satisfied comprises performing a reverse telephone number lookup to determine an entity owning the telephone number, and wherein the method comprises determining whether the selection condition is satisfied based on a result of the reverse telephone number lookup.

21. A non-transitory computer-readable medium storing instructions that, when executed by at least one hardware processor, cause the hardware processor to form a reverse address mapper, a registration data filter connected to the reverse address mapper, and a content analyzer connected to the registration data filter, wherein:

and

the content analyzer is configured to: