HK1180410A - Entity based search and resolution - Google Patents
- Publication number
- HK1180410A
- Authority
- HK
- Hong Kong
- Prior art keywords
- search
- list
- search results
- entity
- sources
Description
Background
Given a search query string, Web search engines traditionally return a list of hyperlinks that, when selected, link to pages on the Web that are deemed relevant to the input search query. More recently, search engine result pages often also include richer content, frequently drawn from vertical information domains. As this trend continues, search will converge to a point where indexing and retrieval of information is performed not only with respect to Web pages, but also with respect to entities such as applications (e.g., from an application marketplace), movies, television shows, people, celebrities, events, cities, restaurants, theaters, companies, and the like. To surface an entity, the search engine must crawl multiple unstructured Web pages and/or subscribe to structured feeds about a particular entity type, resolve instances of the entity across this multi-source data, and surface a representation of the (merged) entity when the user's intent refers to the entity and/or its entity type. Indexing and searching entities is further complicated by the need to perform the following operations: retrieving an entity based on an approximate description; retrieving a broad set of entities, some of which may not be directly described by the query string; retrieving metadata about entities from popular sources based on descriptions of the entities in non-popular sources; combining features and rankings of indexed entities across multiple sources; performing a faceted search on entities; and performing integrated searches by combining information from multiple Web pages into a composite whole.
Prior art solutions to the entity search problem fall into one of two approaches, each of which suffers from its own drawbacks. First, Vertical Engine Results Pages (VERPs) dedicated to a single information vertical typically search a collection of entities of a single type (e.g., movie entities) according to an index that contains basic entity attributes. Such a solution may fail on queries that provide semantically related text or ambiguous descriptions that do not appear in the index (e.g., the query "movie with a sinking ship starring DiCaprio" may not return the movie "Titanic", or the query "Batman" may not return the movie "The Dark Knight"). A second general approach uses Web search, which has the advantage of a large index of relevant terms drawn from Web link structure and anchor text, powerful intent analysis, and automatic spelling correction. The drawback of this approach is that if the indexed pages are not resolved to entities, the rich content provided by a VERP may not be surfaced at all. Moreover, because indexed pages are not resolved against each other, even if rich content is retrieved, a large number of results linked to instances of the same underlying entity may be retrieved together, weakening the diversity of the results.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Embodiments of the present invention relate to systems, methods, and computer-readable storage media for, among other things, integrating the advantages of vertical searches with the advantages of Web searches to provide a rich search experience with entity type characterization. A search query is received and an entity type is determined for the query. A Web search for the query is then performed against a predetermined list of sources determined to be authoritative for the determined entity type. While a source may be authoritative for the entity type, it may also include information for other entity types and may therefore return results associated with multiple entity types. Thus, the results of the source-specific search are filtered based on the entity type, providing a filtered list of results that each relate to an entity of the entity type associated with the query. The filtered list is then compared to a resolved list of entities to determine equivalent entities identified by different searched sources, which are merged into a single potential search result. The merged search results are then ranked based on ranking values. The ranking value assigned to a merged entity is an aggregate ranking value calculated from the individual ranking values provided for the entity by the different sources. At least a portion of the results are then presented to the user.
Drawings
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which:
FIG. 1 is a block diagram of an exemplary computing environment suitable for use in implementing embodiments of the present invention;
FIG. 2 is a block diagram of an exemplary computing system in which embodiments of the present invention may be utilized;
FIG. 3 is a diagram illustrating an exemplary screen display of results of a source-specific search according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an exemplary screen display demonstrating why filtering is sometimes deemed necessary when an entity search is performed in accordance with an embodiment of the present invention;
FIG. 5 is a diagram illustrating an exemplary screen display of an entity search implemented in accordance with an embodiment of the present invention;
FIG. 6 is a diagram illustrating an exemplary screen display of another entity search implemented in accordance with an embodiment of the invention;
FIG. 7 is a diagram of a screen display illustrating an exemplary presentation of results of an integrated search implemented in accordance with an embodiment of the invention;
FIGS. 8A and 8B are schematic diagrams illustrating screen displays of an exemplary presentation of a faceted entity search, according to an embodiment of the present invention;
FIG. 9 sets forth a flow chart illustrating an exemplary method for targeting a Web search based on entity type and resolving its results according to embodiments of the present invention; and
FIG. 10 sets forth a flow chart illustrating a further exemplary method for targeting a Web search based on entity type and resolving its results according to embodiments of the present invention.
Detailed Description
The subject matter of the present invention is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms "step" and/or "block" may be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.
Various aspects of the technology described herein are generally directed to systems, methods, and computer-readable storage media for, among other things, integrating the advantages of vertical searches and the advantages of Web searches to provide a rich search experience with entity type characterization. An "entity" is a description of some real-world object or item, according to an embodiment of the invention. That is, an entity is a representation of a real-world concept that is different from a Web document. Rather than matching Web documents to terms that appear in a search query, embodiments of the present invention seek to match entities or real world items (tangible or intangible) to the query, so that a richer search experience can be provided, as described more fully below. Entities sharing a common attribute may be grouped into entity types.
According to an embodiment of the present invention, a search query is received and an entity type is determined for the query. A Web search for the query is then performed against a predetermined list of sources that are deemed authoritative for the determined entity type. While a source may be authoritative for the entity type, it may also include information for other entity types and may therefore return results related to multiple entity types. Thus, the results of the source-specific search are filtered based on the entity type, providing a filtered list of results that each relate to an entity of the entity type associated with the query. The filtered list is then compared to a resolved list of entities to determine equivalent entities identified by the different searched sources, which are merged into a single potential search result. The merged search results are then ranked based on ranking values. The ranking value assigned to a merged entity is an aggregate ranking value calculated from the individual ranking values provided for the entity by the different sources. At least a portion of the results are then presented to the user.
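The merge-and-rank portion of this flow can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the `RESOLVED` table, the source-internal identifiers, and the choice of a simple sum as the aggregate ranking value are all assumptions made for the example, since the text does not specify how individual ranking values are combined.

```python
from collections import defaultdict

# Hypothetical resolved entity list: maps (source, source-internal id) to a
# canonical entity id. All identifiers here are made up for illustration.
RESOLVED = {
    ("imdb", "tt-alien"): "movie:alien",
    ("netflix", "nf-alien"): "movie:alien",
    ("imdb", "tt-aliens"): "movie:aliens",
}


def merge_and_rank(filtered_results):
    """Merge results that resolve to the same entity, then rank them.

    filtered_results: iterable of (source, source_id, score) tuples that have
    already passed entity-type filtering.
    Returns a list of (entity_id, aggregate_score), highest score first.
    """
    per_entity = defaultdict(list)
    for source, source_id, score in filtered_results:
        # Results missing from the resolved list keep a source-scoped id,
        # so they remain distinct entries rather than being dropped.
        entity_id = RESOLVED.get((source, source_id), f"{source}:{source_id}")
        per_entity[entity_id].append(score)
    # Assumption: the aggregate ranking value is the sum of per-source scores.
    aggregated = {e: sum(scores) for e, scores in per_entity.items()}
    return sorted(aggregated.items(), key=lambda kv: kv[1], reverse=True)


results = [
    ("imdb", "tt-alien", 0.75),
    ("netflix", "nf-alien", 0.5),
    ("imdb", "tt-aliens", 1.0),
]
ranked = merge_and_rank(results)
```

With these sample scores, the two source-level hits for the same movie collapse into one entry whose combined score places it ahead of the single-source entry. Other aggregates (a weighted mean, a maximum) would fit the same structure.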
Accordingly, one embodiment of the present invention is directed to one or more computer-readable storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform a method for targeting a Web search based on an entity type and resolving its results. The method includes receiving a search query and determining at least one entity type for the received search query. The method further includes performing a Web search for the received search query, the Web search being limited to a plurality of sources that have been identified for the at least one entity type. Still further, the method includes filtering results of the Web search to create a filtered list of search results, each search result in the filtered list of search results relating to an entity of the at least one entity type. Further, the method includes consolidating equivalent entities identified by different ones of the plurality of sources to create a consolidated list of search results, each search result in the consolidated list of search results being related to a different entity of the at least one entity type. Finally, the method includes ordering the consolidated list of search results based on ranking values, wherein the ranking value assigned to at least one of the search results in the consolidated list of search results is an aggregate ranking value calculated from individual ranking values provided for entities associated with the at least one of the search results from at least a portion of the different ones of the plurality of sources.
In another embodiment, the present invention is directed to a method, performed by one or more computing devices comprising at least one processor, for targeting a Web search based on an entity type and resolving the results of the Web search. The method comprises the following steps: associating at least one entity type with a received search query; searching a plurality of predetermined Web sources identified for the at least one entity type to determine a list of search results; and filtering the list of search results with respect to the at least one entity type to create a filtered list of search results. Each search result in the filtered list of search results is related to an entity of the at least one entity type. The method further includes comparing the filtered list of search results to a resolved list of entities in order to determine equivalent entities identified by different ones of the plurality of predetermined sources; and creating a consolidated list of search results by consolidating the equivalent entities determined to have been identified by the different ones of the plurality of predetermined sources. Each search result in the consolidated list of search results is related to a different entity of the entity type. Still further, the method includes ordering the consolidated list of search results based on a ranking value, wherein the ranking value assigned to at least one of the search results in the consolidated list of search results is an aggregate ranking value calculated from individual ranking values provided for entities associated with the at least one of the search results, the individual ranking values provided by at least a portion of the different ones of the plurality of predetermined sources.
In yet another embodiment, the present invention is directed to a system for targeting a Web search based on an entity type. The system includes a computing device associated with a search engine, the computing device having one or more processors and one or more computer-readable storage media, and a data store coupled with the search engine. The search engine is configured to receive a search query, determine at least one entity type for the received search query, identify a plurality of authoritative sources associated with the at least one entity type, search the plurality of identified authoritative sources to determine a list of search results, and filter the list of search results to create a filtered list of search results. Each search result in the filtered list of search results is related to an entity of the at least one entity type. The search engine is further configured to compare the filtered list of search results to a resolved list of entities to determine equivalent entities identified by different ones of the plurality of authoritative sources, and create a consolidated list of search results by consolidating the equivalent entities determined to have been identified by the different ones of the plurality of authoritative sources. Each search result in the consolidated list of search results is associated with a different entity of the entity type. The search engine is further configured to rank the consolidated list of search results based on a ranking value, wherein the ranking value assigned to at least one of the search results in the consolidated list of search results is an aggregate ranking value calculated from individual ranking values provided for entities associated with the at least one of the search results, the individual ranking values provided by at least a portion of the different ones of the plurality of authoritative sources.
Having briefly described an overview of embodiments of the present invention, an exemplary operating environment in which embodiments of the present invention may be implemented is described below in order to provide a general context for various aspects of the present invention. Referring to the drawings in general and initially to FIG. 1 in particular, an exemplary operating environment for implementing embodiments of the present invention is shown and designated generally as computing device 100. The computing device 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the invention. Neither should the computing device 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated.
Embodiments of the invention may be described in the general context of computer code or machine-readable instructions, including computer-usable or computer-executable instructions, such as program modules, being executed by a computer or other machine, such as a personal digital assistant or other handheld device. Generally, program modules, including routines, programs, objects, components, data structures, etc., refer to code that performs particular tasks or implements particular abstract data types. Embodiments of the invention may be practiced in a variety of system configurations, including hand-held devices, consumer electronics, general-purpose computers, more specialized computing devices, and the like. Embodiments of the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
With continued reference to FIG. 1, computing device 100 includes a bus 110 that directly or indirectly couples the following devices: a memory 112; one or more processors 114; one or more presentation components 116; one or more input/output (I/O) ports 118; one or more I/O components 120; and an illustrative power supply 122. Bus 110 represents what may be one or more busses (such as an address bus, data bus, or combination thereof). Although the various blocks of FIG. 1 are shown with lines for the sake of clarity, in reality these blocks represent logical (not necessarily actual) components. For example, one may consider a presentation component such as a display device to be an I/O component. Likewise, processors have memory. The inventors hereof recognize that such is the nature of the art, and reiterate that the diagram of FIG. 1 is merely illustrative of an exemplary computing device that can be used in connection with one or more embodiments of the present invention. No distinction is made between such categories as "workstation," "server," "laptop," "handheld device," etc., as all are contemplated within the scope of FIG. 1 and are referred to as "computing devices."
Computing device 100 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computing device 100 and includes both volatile and nonvolatile media, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer-readable media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 100. Combinations of any of the above are also included within the scope of computer-readable media.
Memory 112 includes computer storage media in the form of volatile and/or nonvolatile memory. The memory may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Computing device 100 includes one or more processors that read data from various entities such as memory 112 or I/O components 120. Presentation component(s) 116 present data indications to a user or other device. Exemplary presentation components include a display device, speakers, printing components, vibrating components, and the like.
I/O ports 118 allow computing device 100 to be logically coupled to other devices including I/O components 120, some of which may be built in. Illustrative components include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, and the like.
As previously mentioned, embodiments of the present invention are generally directed to systems, methods, and computer-readable storage media for, among other things, integrating the advantages of vertical searches and the advantages of Web searches to provide a rich search experience with entity type characterization. A search query is received and an entity type is determined for the query. A Web search for the query is then performed against a predetermined list of sources determined to be authoritative for the determined entity type. While a source may be authoritative for the entity type, it may also include information for other entity types and may therefore return results related to multiple entity types. Thus, the results of the source-specific search are filtered based on the entity type, providing a filtered list of results, each result relating to an entity of the entity type associated with the query. The filtered list is then compared to a resolved list of entities to determine equivalent entities identified by different searched sources, which are merged into a single potential search result. The merged search results are then ranked based on ranking values. The ranking value assigned to a merged entity is an aggregate ranking value calculated from the individual ranking values provided for the entity by the different sources. At least a portion of the results are then presented to the user.
Referring now to FIG. 2, a block diagram is provided illustrating an exemplary computing system 200 in which embodiments of the present invention may be utilized. In general, computing system 200 illustrates an environment in which a Web search may be targeted, and its results resolved, based on an entity type determined for an input search query. Among other components not shown, computing system 200 generally includes a user computing device 210, a search engine 212, and a data store 214, all in communication with one another via a network 216. Network 216 may include, but is not limited to, one or more Local Area Networks (LANs) and/or Wide Area Networks (WANs). Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet. Accordingly, the network 216 is not further described herein.
It should be understood that any number of user computing devices and search engines may be utilized in computing system 200 within the scope of embodiments of the present invention. Each may comprise a single device/interface or multiple devices/interfaces operating cooperatively in a distributed environment. For example, the search engine 212 may include multiple devices and/or modules arranged in a distributed environment that collectively provide the functionality of the search engine 212 described herein. Additionally, other components/modules not shown may also be included within computing system 200.
In some embodiments, one or more of the illustrated components/modules may be implemented as a standalone application. In other embodiments, one or more of the illustrated components/modules may be implemented as an Internet-based service via the user computing device 210 or as a module within the search engine 212. It will be understood by those of ordinary skill in the art that the components/modules illustrated in FIG. 2 are exemplary in nature and in number, and should not be construed as limiting. Any number of components/modules may be utilized to achieve the desired functionality within the scope of embodiments of the present invention. Moreover, the components/modules may be located on any number of search engines or user computing devices. By way of example only, the search engine 212 may be provided as a single server (as shown), a cluster of servers, or a computing device remote from one or more of the remaining components.
It should be understood that this and other arrangements described herein are set forth only as examples. Other arrangements and elements (e.g., machines, interfaces, functions, orders, and groupings of functions, etc.) can be used in addition to or instead of those shown, and some elements may be omitted altogether. Moreover, many of the elements described herein are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, and in any suitable combination and location. Various functions described herein as being performed by one or more entities may be carried out by hardware, firmware, and/or software. For example, various functions may be performed by a processor executing instructions stored in a memory.
User computing device 210 may include any type of computing device, such as computing device 100 described with reference to FIG. 1, for example. In general, the user computing device 210 includes a browser 218 and a display 220. The browser 218 is configured to render, among other things, a search engine home page (or other online landing page), and to render a search engine results page in association with the display 220 of the user computing device 210. Browser 218 is further configured to receive user input requesting various Web pages, including search engine home pages; to receive user-entered search queries (typically entered via a user interface presented on display 220 that permits alphanumeric and/or text entry into a designated search box); and to receive content for presentation on display 220, e.g., from search engine 212. It should be noted that the functionality described herein as being performed by the browser 218 may be performed by any other application capable of rendering Web content. Any and all such variations, and any combination thereof, are contemplated to be within the scope of embodiments of the present invention.
The search engine 212 is configured to receive requests from components associated with user computing devices (e.g., the browser 218 associated with the user computing device 210) and to respond to those requests. Those skilled in the art will recognize that the present invention may be implemented with any number of search tools. For example, an Internet search engine or a database search engine may utilize the present invention. Such search engines are well known in the art, and commercially available engines share many similar processes, which are not further described herein.
As illustrated, the search engine 212 includes a query receiving component 222, an entity type determining component 224, an authoritative source determining component 226, a search component 228, a filtering component 230, an entity merging component 232, a ranking component 234, and a presentation component 236. The illustrated search engine 212 also has access to a data store 214. The data store 214 is configured to store information related to search queries, entities, and authoritative sources. In various embodiments, such information may include, but is not limited to, a search query log, an index of entity types and corresponding entities, an index or other listing of sources determined to be authoritative with respect to the indexed entity types, and a list of resolved entities. In an embodiment, the data store 214 is configured to be searchable for one or more of the items stored in association therewith. It will be understood and appreciated by those of ordinary skill in the art that the information stored in association with the data store 214 may be configurable and may include any information related to search queries, entity types and corresponding entities, and searchable sources. The content and volume of such information is not intended to limit the scope of embodiments of the present invention in any way. Moreover, although illustrated as a single, independent component, the data store 214 may, in fact, be a plurality of storage devices (e.g., a database cluster), portions of which may exist in association with the search engine 212, the user computing device 210, another external computing device (not shown), and/or any combination thereof.
The query receiving component 222 of the search engine 212 is configured to receive a request to present search results that satisfy an input search query. Typically, such a request is received via a browser associated with a user computing device (e.g., browser 218 associated with user computing device 210). In embodiments, the search query may also be invoked implicitly and received by the query receiving component 222, for example, through a user action such as pointing to something (e.g., on a screen, on a television, or in the physical world); moving the mouse pointer over an icon or text; talking to someone on the phone; sending an SMS, tweet, or status update; or by other means. That is, embodiments of the present invention are not limited to a user entering a search query into a conventional query entry area of a screen display.
The entity type determining component 224 is configured to determine at least one entity type that is relevant to the received search query. Standard query-understanding techniques can be used to map query intent to one or more entity types. For example, static and dynamic relevance scores for entity types may be obtained through static content within the data relevant to each type. By way of example only, such content may include text in a database of related entities, unstructured Web pages about the related entities and the link structure of the Web restricted to those Web pages, and trained classifiers for determining when features of a query string match features of an entity type and its corresponding entities. The entity type may also be determined implicitly from context, e.g., from a user action such as pointing to something (e.g., on a screen, on a television, or in the physical world); moving the mouse pointer over an icon or text; talking to someone on the phone; sending an SMS, tweet, or status update; or by other means. Any and all such variations, and any combination thereof, are contemplated to be within the scope of embodiments of the present invention.
If the entity type determining component 224 determines that more than one entity type is relevant to the input search query, the results of the subsequent entity searches may be flattened into a single search results page, with the entities of each type interleaved or blended in some manner. Alternatively, a faceted search interface may be presented in which the user may narrow the search to one or more of the suggested entity types, effectively filtering the results of some of the subsequent entity searches out of the final results. This is described more fully below with respect to FIGS. 8A and 8B. Any and all such variations, and any combination thereof, are contemplated to be within the scope of embodiments of the present invention.
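The two presentation options described above, blending results of several entity types into one list or letting the user narrow to selected types, can be illustrated with a small sketch. The round-robin interleaving policy and the function names `interleave` and `narrow_to_types` are assumptions chosen for the example; the text leaves the blending manner open.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking exhausted lists during interleaving


def interleave(*result_lists):
    """Blend several ranked lists round-robin into one flat list."""
    blended = []
    for group in zip_longest(*result_lists, fillvalue=_MISSING):
        blended.extend(item for item in group if item is not _MISSING)
    return blended


def narrow_to_types(results_by_type, selected_types):
    """Faceted view: keep only the results of the entity types the user chose."""
    return {t: r for t, r in results_by_type.items() if t in selected_types}


# Hypothetical per-type result lists for a query matching two entity types.
results_by_type = {
    "movie": ["movie-1", "movie-2", "movie-3"],
    "game": ["game-1"],
}
flat_page = interleave(*results_by_type.values())
movies_only = narrow_to_types(results_by_type, {"movie"})
```

Round-robin keeps the top result of every matched type near the top of the page; a score-based blend would be an equally plausible choice.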
The authoritative source determining component 226 is configured to identify a plurality of sources that are predetermined to be authoritative for a given entity type. Authoritative sources are all, or a subset, of the sources that are crawled or that are available through subscribed structured feeds for the entity type. These sources may be specific to a regional market, but they may also be global. As an illustrative example, movie sources specific to the U.S. market may include IMDB, ROVI/AMG, NETFLIX, HULU, YAHOO! REVIEWS, FLIXSTER, and METACRITIC. Of these, the authoritative sources may be identified as IMDB (i.e., www.imdb.com), ROVI (i.e., www.allrovi.com), and NETFLIX (i.e., www.netflix.com).
The choice of which sources are authoritative may depend on a number of factors. At one extreme, all sources may be authoritative. At the other extreme, only one is authoritative. Generally, good authoritative sources are those that will produce results for a source-specific Web search. For example, sources that are not indexed by a Web search engine may not be good candidates for authoritative sources. A very good authoritative source candidate may be a source that is linked from many other Web sites, with rich anchor text, and with rich metadata about the source site indexed by the Web search engine.
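One simple way to represent a predetermined list of authoritative sources is a lookup keyed by entity type and market. The sketch below is illustrative only: the `AUTHORITATIVE_SOURCES` table, the `"en-US"` market code, and the function name are assumptions, while the three movie sites are the ones named in the example above.

```python
# Hypothetical registry: (entity type, market) -> authoritative source sites.
AUTHORITATIVE_SOURCES = {
    ("movie", "en-US"): ["www.imdb.com", "www.allrovi.com", "www.netflix.com"],
}


def authoritative_sources(entity_type, market="en-US"):
    """Return the predetermined authoritative sites for an entity type.

    An unknown (entity type, market) pair yields an empty list, in which case
    a caller might fall back to an ordinary Web search.
    """
    # Return a copy so callers cannot mutate the registry by accident.
    return list(AUTHORITATIVE_SOURCES.get((entity_type, market), []))


movie_sites = authoritative_sources("movie")
```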
Given a user's query whose intent has been matched to a particular entity type, the search component 228 is configured to perform a source-specific search for the input search query by performing a Web search on each authoritative source site using a Web search engine (e.g., the well-known commercial search engine BING provided by Microsoft Corporation of Redmond, Washington). For example, when searching for a movie entity that matches the query "in space no one can hear you scream", the search component 228 may issue the query using BING as follows:
for IMDB:
http://www.bing.com/search?q=site%3Awww.imdb.com+in+space+no+one+can+hear+you+scream;
for NETFLIX:
http://www.bing.com/search?q=site%3Awww.netflix.com+in+space+no+one+can+hear+you+scream;
for ROVI:
http://www.bing.com/search?q=site%3Awww.allrovi.com+in+space+no+one+can+hear+you+scream.
FIG. 3 is a diagram illustrating an exemplary screen display 300 of this source-specific search. As expected, the movie "Alien" is surfaced because the quote from the movie (which constitutes the user query in this example) is contained in rich unstructured text on the ALLROVI website, or because a hyperlink to the ROVI "Alien" web page uses this quote as anchor text. Similar searches may be issued to any major search engine through similar HTTP POST requests, or via an alternate API surfaced by the Web search engine. The result of a source-specific Web search is a list of documents that the queried Web search engine deems relevant to the query in some way. Depending on what information the Web search engine makes available, the engine's internal relevance score may also be returned for each listed document.
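The query construction illustrated above can be sketched in a few lines. This is a minimal sketch, not the search component 228 itself: the function name `source_specific_urls`, the host list, and the `endpoint` parameter are assumptions made for illustration, and the sketch only builds the request URLs without issuing them.

```python
from urllib.parse import quote_plus

# Hypothetical list of authoritative source hosts for the entity type "movie".
AUTHORITATIVE_MOVIE_SOURCES = ["www.imdb.com", "www.netflix.com", "www.allrovi.com"]

def source_specific_urls(query, hosts, endpoint="http://www.bing.com/search"):
    """Return one site-restricted search URL per authoritative host."""
    urls = {}
    for host in hosts:
        # Prefix the user query with a site: restriction, then URL-encode it.
        restricted = f"site:{host} {query}"
        urls[host] = f"{endpoint}?q={quote_plus(restricted)}"
    return urls

urls = source_specific_urls(
    "in space no one can hear you scream", AUTHORITATIVE_MOVIE_SOURCES
)
```

Each value in `urls` is one source-specific query; the same user query is reused unchanged and only the `site:` restriction varies per source.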
The filtering component 230 is configured to filter each list of documents returned by an authoritative source-specific Web search and retain only URLs potentially relevant to the entity type of interest. FIG. 4 is a schematic diagram illustrating an exemplary screen display 400 demonstrating why filtering is sometimes necessary. In the illustrated example, the authoritative source web site contains many pages that are not directly related to the entity type (in this example, the entity type is "movie"). Any of a number of methods may be used to filter out irrelevant pages. The following examples are intended to be illustrative of the present invention, not limiting. A regular expression URL pattern may be determined manually for each entity type and authoritative source. Source-specific search results whose URLs match the pattern are retained, while URLs that do not match the pattern are filtered out. Further, the pattern can specify where the source's own internal entity ID appears in the URL, which can be used to identify the source's representation of an entity in the entity merge/resolution step, described more fully below with reference to entity merge component 232. Exemplary patterns for IMDB and NETFLIX include:
IMDB pattern: www.imdb.com/title/{ ID starting with tt };
NETFLIX pattern: www.netflix.com/Movie/{ string }/{ numeric id }.
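A URL pattern filter of this kind might look as follows. The regular expressions are one possible reading of the two exemplary patterns above and are assumptions, as are the names `URL_PATTERNS` and `filter_results`; the person-page URL is a hypothetical non-movie page used to show a rejected result.

```python
import re

# Illustrative regular expressions modeled on the IMDB and NETFLIX patterns.
# The named group "id" captures the source's internal entity ID.
URL_PATTERNS = {
    "imdb": re.compile(r"^(?:https?://)?www\.imdb\.com/title/(?P<id>tt\d+)/?$"),
    "netflix": re.compile(
        r"^(?:https?://)?www\.netflix\.com/Movie/[^/]+/(?P<id>\d+)/?$"
    ),
}

def filter_results(source, urls):
    """Keep URLs matching the source's pattern; return (url, internal_id) pairs."""
    pattern = URL_PATTERNS[source]
    kept = []
    for url in urls:
        match = pattern.match(url)
        if match:
            kept.append((url, match.group("id")))
    return kept

imdb_hits = filter_results("imdb", [
    "www.imdb.com/title/tt0096895",
    "www.imdb.com/name/nm0000123",  # hypothetical person page; filtered out
])
```

Returning the captured ID alongside the URL keeps the link to the source's internal entity ID available for the later resolution step.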
These URL patterns may also be extracted automatically from the authoritative source websites for a given entity type, given an exemplary set of documents. The entity search engine 212 may instead surface an API through which source web sites and feed providers submit URL patterns. Alternatively, the relevance of Web pages to particular entity types, along with internal source IDs, may be embedded on those pages using a predetermined standard. Any and all such variations, and any combination thereof, are contemplated to be within the scope of embodiments of the present invention.
If the internal data source ID is not available from the pages surfaced in the source-specific search, entity resolution may be used to match those page results with entities from the data source. Entity resolution is described more fully below with reference to entity merge component 232. This additional application of entity resolution can provide a mapping between each surfaced URL and an internal ID, so that the filtered results can be used by the entity merge component 232.
The entity merge component 232 is configured to run all sources for an entity type through the entity resolution pipeline and to compare the results of entity resolution with the filtered source-specific results. Given a set of entities from multiple sources, entity resolution is, at a high level, a two-step process that resolves equivalent entities across the sources and then merges each set of equivalent entities into a single, richer representation of the underlying entity. Initially, only the resolution step is necessary. Merging is required later, for the final step of the methods described herein, where a single merged representation of each query-relevant entity is surfaced through an entity search, as described more fully below. Resolution and merging utilize characteristics or attributes of entities. Examples for the entity type "movie" include title, year of release, director(s), cast, running time, studio, genre(s), etc. For the entity type "person," the characteristics or attributes may include name, birthday/age, gender, occupation, geographic location, home address, phone number, and other personal identification information. Entities of the same entity type from different sources are compared with respect to their attributes, ultimately resulting in sets of matching entities. Merging is then applied to these sets of matching entities, and a prototype entity is generated for each set by merging and combining the individual attributes of the member entities.
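The two steps can be illustrated with a deliberately simple sketch. The matching rule used here (case-folded title plus release year) is an assumption chosen for brevity and is not stated in the description; the attribute values in `records` are illustrative, and all function names are hypothetical.

```python
from collections import defaultdict

def resolution_key(entity):
    """Crude match key: case-folded title plus release year (an assumed rule)."""
    return (entity["title"].strip().casefold(), entity["year"])

def resolve(entities):
    """Step 1: group entities from different sources that appear equivalent."""
    groups = defaultdict(list)
    for entity in entities:
        groups[resolution_key(entity)].append(entity)
    return list(groups.values())

def merge(group):
    """Step 2: combine the attributes of matched entities into one record."""
    merged = {"source_ids": {}}
    for entity in group:
        # Remember each source's internal ID for the merged entity.
        merged["source_ids"][entity["source"]] = entity["id"]
        for attr, value in entity.items():
            if attr not in ("source", "id"):
                merged.setdefault(attr, value)  # first value seen wins
    return merged

records = [
    {"source": "imdb", "id": "tt0096895", "title": "Batman", "year": 1989,
     "director": "Tim Burton"},
    {"source": "netflix", "id": "287290", "title": "batman", "year": 1989,
     "genre": "Action"},
]
merged_entities = [merge(group) for group in resolve(records)]
```

The merged record carries attributes that no single source supplied on its own (here, both a director and a genre), which is the "richer representation" the text refers to. A production resolver would compare many attributes with fuzzy matching rather than an exact key.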
The results of the entity resolution are combined with the results of the filtered source-specific search. Because the source-specific search is run on sources of entity data that have been run through entity resolution, the search engine 212 can group documents retrieved through the search that are known to relate to entities of the relevant type (due to the filtering component 230) and that have been matched with all other known representations of the same entity (using the results of the entity resolution). In this manner, search engine 212 groups documents that relate to the same entity across the separate, filtered source-specific searches. Linking from the source-specific search to the entity resolution results is accomplished by using the source internal entity ID, as described above.
Each authoritative source is assigned a quality factor Q. The factor is based on the total number of entities in the source that can be surfaced during the Web search. The result of this step is a collection of sets of URLs relevant to the user-submitted query, where the URLs in each set represent the same entity of the entity type of interest (as indicated by the input search query). For example, the following two URLs may make up one of these result sets, for the 1989 movie Batman:
www.imdb.com/title/tt0096895;
www.netflix.com/Movie/Batman/287290.
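Grouping filtered URLs through the source internal IDs can be sketched as a dictionary lookup. The `RESOLVED` mapping stands in for the output of entity resolution, and the key "batman-1989" is a hypothetical identifier for the merged entity; neither appears in the description.

```python
# Hypothetical output of entity resolution:
# (source, internal ID) -> identifier of the merged entity.
RESOLVED = {
    ("imdb", "tt0096895"): "batman-1989",
    ("netflix", "287290"): "batman-1989",
}

def group_by_entity(filtered_hits, resolved):
    """Group (source, internal_id, url) hits under their resolved entity."""
    groups = {}
    for source, internal_id, url in filtered_hits:
        entity = resolved.get((source, internal_id))
        if entity is not None:  # hits with no resolved entity are skipped
            groups.setdefault(entity, []).append(url)
    return groups

hits = [
    ("imdb", "tt0096895", "www.imdb.com/title/tt0096895"),
    ("netflix", "287290", "www.netflix.com/Movie/Batman/287290"),
]
result_sets = group_by_entity(hits, RESOLVED)
```

Here both URLs land in a single result set because resolution has already established that the two source IDs denote the same movie.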
The ranking component 234 is configured to rank the sets of identical or equivalent entities obtained from the entity resolution and the filtered source-specific Web search. Several methods may be followed to generate such a ranking. Two main sources of signals are drawn upon for ranking. The first source is the ranking of documents from the source-specific Web search (and optionally the relevance scores returned by the Web search engine, if any). Even without relevance scores, the order of these search results yields a rank for each filtered document (e.g., the first document may be assigned a rank of one, the second a rank of two, etc.). Relevance scores, if available, may be attached to the returned documents. A second source of signals for ranking the relevant sets of identical entities may come from applying a standard database search for the query string against the data sources that were input to the entity resolution. For example, a query string may be parsed into keywords, and the keywords matched against attributes of the entity. Additional sources of signals may also be available, depending on the data available to the entity search engine. For example, there may be popularity information, such as the user ratings used in collaborative filtering for recommendations. Another example may be historical click-through rate data for merged entities that were surfaced by the entity search engine in the past. Along similar lines, toolbar data from authoritative sites may already be used for general Web search ranking, but may be more relevant for entity searches, such that by including such data as separate signals in entity searches, the data may receive a higher weighting than it receives through its contribution to the source-specific relevance score.
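The second signal source, matching query keywords against entity attributes, can be sketched as below. The scoring rule (the fraction of query words found among the attribute values) is an assumption made for illustration; the description does not specify how matches are scored.

```python
def attribute_score(query, entity):
    """Fraction of query words found among the entity's attribute values."""
    words = query.casefold().split()
    if not words:
        return 0.0
    # Build a vocabulary from every attribute value of the entity.
    vocabulary = set()
    for value in entity.values():
        vocabulary.update(str(value).casefold().split())
    return sum(word in vocabulary for word in words) / len(words)

movie = {"title": "Batman", "year": 1989, "director": "Tim Burton"}
score = attribute_score("batman burton", movie)
```

A score of this kind is independent of the Web search engine's ranking, so it can serve as a separate signal alongside the source-specific ranks.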
A ranking or relevance score combining step is then applied to the set of source-specific search rankings, source-specific search relevance scores, attribute-based search relevance scores, and other sources of related signals. Essentially, such a step combines the relevance scores from these many separate rankings as features of a meta-ranking over the grouped entities. The result of this step is a ranking over the distinct entities that have been grouped, by entity resolution, across the separate source-specific searches.
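One way to combine these signals is sketched below. The formula, a reciprocal-rank sum weighted by each source's quality factor Q plus an optional weighted attribute score, is an assumption chosen for illustration; the description does not prescribe a specific combining function. The names `aggregate_scores` and `attribute_weight` and all numeric values are hypothetical.

```python
def aggregate_scores(rankings, quality, attribute_scores=None, attribute_weight=1.0):
    """Combine per-source rankings into one ordered list of entities.

    rankings: {source: [entity, ...]} with the best result first.
    quality:  {source: Q}, the quality factor assigned to each source.
    """
    totals = {}
    for source, ordered in rankings.items():
        for position, entity in enumerate(ordered, start=1):
            # Higher-quality sources and better positions contribute more.
            totals[entity] = totals.get(entity, 0.0) + quality[source] / position
    for entity, score in (attribute_scores or {}).items():
        totals[entity] = totals.get(entity, 0.0) + attribute_weight * score
    return sorted(totals, key=lambda entity: totals[entity], reverse=True)

ranked = aggregate_scores(
    rankings={
        "imdb": ["alien", "aliens"],
        "netflix": ["aliens", "alien"],
        "rovi": ["alien"],
    },
    quality={"imdb": 1.0, "netflix": 0.5, "rovi": 0.5},
)
```

In this example "alien" accumulates 1.0 + 0.25 + 0.5 = 1.75 and "aliens" accumulates 0.5 + 0.5 = 1.0, so "alien" is ranked first. A learned ranker that treats each signal as a feature would be an alternative to this fixed formula.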
The presentation component 236 is configured to present at least a portion of the ranked list of entities that are relevant to the user-submitted query. The merge step of entity resolution can be used to merge attributes from the separate representations of an entity across the different crawled/indexed/subscribed sources. Illustrations/images, metadata, entity behaviors, and even entity-specific advertisements can be surfaced in a natural and consistent manner by the entity search system, as illustrated in the screen display 700 of FIG. 7, described more fully below.
In addition, search engine 212 may utilize faceted search to suggest to users how their search results may be narrowed or further refined. For example, if multiple entity types are surfaced in an integrated search results page, the user may be provided with an option to deselect types (e.g., retain restaurants in the results but remove cafes). Facets may also be presented within a single entity type at a more granular level. For example, if movie results are presented to the user, the search engine 212 may allow the user to specify a genre (e.g., science fiction) and a range of release years (e.g., movies released in the 2000s). One embodiment for presenting a faceted search is shown in the screen displays of FIGS. 8A and 8B, which are described more fully below.
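Facet narrowing amounts to counting and filtering on attribute values, as the following sketch suggests. The entity records and the field name `entity_type` are hypothetical, and exact-match filtering is an assumed simplification (a release-year range would need a range comparison instead).

```python
from collections import Counter

def facet_counts(entities, facet):
    """Count how many entities fall under each value of a facet."""
    return Counter(entity[facet] for entity in entities if facet in entity)

def apply_facets(entities, **selected):
    """Keep entities whose attributes equal every selected facet value."""
    return [
        entity for entity in entities
        if all(entity.get(name) == value for name, value in selected.items())
    ]

# Hypothetical blended results containing two entity types.
results = [
    {"name": "Restaurant A", "entity_type": "restaurant"},
    {"name": "Cafe B", "entity_type": "cafe"},
    {"name": "Restaurant C", "entity_type": "restaurant"},
]
counts = facet_counts(results, "entity_type")
restaurants_only = apply_facets(results, entity_type="restaurant")
```

The counts can drive the facet labels offered to the user, and `apply_facets` produces the narrowed list once a facet is selected.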
Turning now to FIG. 5, shown is an exemplary screen display 500 illustrating an example of an entity search for the input search query "James Bond" in accordance with an embodiment of the present invention. Initially, the entity type has been identified as "movie". Thus, in this illustrated embodiment, it may be assumed that the user desires to find a movie in the James Bond franchise (e.g., on NETFLIX), but does not know the movie title. Even though the input query string (i.e., "James Bond") is not included in any of the eventually surfaced movie titles (and may not appear in other attributes of the desired movie on NETFLIX), the resulting movie is surfaced because the query string may appear in anchor text linked to NETFLIX, or potentially in another source (e.g., blog text with the comment "Diamonds Are Forever is my favorite James Bond movie") that has been resolved to NETFLIX and indexed by the search engine. Moreover, by performing an integrated entity search in accordance with embodiments of the invention, a search engine (e.g., search engine 212 of FIG. 2) may surface metadata about related entities. Note that to the right of each surfaced movie entity are a plurality of selectable icons representing authoritative sources and/or entity behaviors (e.g., buying tickets, streaming, rentals, etc.). The user may select, for example, a NETFLIX icon and be navigated to the NETFLIX page for the corresponding James Bond movie, consistent with the intent assumed in this example.
Referring to FIG. 6, shown is an exemplary screen display 600 illustrating another entity search example, this time for the input search query "Neo and Trinity," in accordance with an embodiment of the present invention. Again, the entity type has been identified as "movie". As illustrated, the user is looking for one of the movies in the Matrix trilogy but does not remember the name of the movie, remembering only the names of the two main characters. A search engine performing an entity search according to an embodiment of the present invention is able to retrieve the desired entities by integrating source-specific searches on sites that include cast lists with the characters played, critic and fan reviews, plots, etc. Note that the character names "Neo" and "Trinity" are unlikely to appear in the movie summary held by a conventionally searched source.
Referring to FIG. 7, a screen display 700 illustrating an exemplary presentation of results of an integrated search implemented according to an embodiment of the invention is shown. In the illustrated example, the user query is the movie quote "there can be only one" from the movie Highlander. Note how the merged results of the entity resolution and the source-specific search are displayed in a consistent manner. Each movie entity is displayed with an illustration, a title, and metadata. The different data sources providing entity behaviors (such as rent, stream, read reviews, purchase, etc.) are surfaced through a consistent interface of vertically stacked icons on the right-hand side. Note also that only documents of the desired entity type are surfaced (due to source-specific searching and filtering), and that there is only one result per distinct entity (due to entity resolution and merging).
Referring to FIGS. 8A and 8B, screen displays illustrating a faceted entity search according to an embodiment of the present invention are shown. For the initial search (FIG. 8A), multiple types of entities potentially map to the input search query (i.e., "Superman"). In the illustrated example, "Superman" may refer to any of the entertainment entity types "Movies", "TV Series", or "Songs". In the illustrated screen display of FIG. 8A, the user is able to select a facet and, by doing so, narrow down the displayed entities. For example, FIG. 8B shows the result of a user selecting the entity type "TV Series".
Turning now to FIG. 9, a flow diagram is illustrated showing an exemplary method 900 for targeting a Web search and resolving its results based on entity type, in accordance with embodiments of the present invention. Initially, as indicated at block 910, a search query is received, for instance, by query receiving component 222 of search engine 212 of FIG. 2. As indicated at block 912, at least one entity type is determined for the received search query. This can be accomplished, for example, by utilizing the entity type determination component 224 of the search engine 212 of FIG. 2. A Web search is performed for the received search query, as indicated at block 914 (e.g., by utilizing search component 228 of FIG. 2). The Web search is limited to a plurality of sources that have been identified as authoritative for the at least one entity type determined for the received search query. As indicated at block 916, the results of the Web search are filtered to create a filtered list of search results (e.g., by utilizing filtering component 230 of FIG. 2). Each search result in the filtered list of search results relates to an entity of the at least one entity type determined for the received search query. Equivalent entities identified by different ones of the multiple sources are merged (as indicated at block 918) to create a merged list of search results. This may be accomplished, for example, by utilizing the entity merge component 232 of FIG. 2. Each search result in the merged list of search results relates to a different entity of the at least one entity type determined for the received search query. As indicated at block 920, the merged list of search results is ranked for presentation based on ranking values, for example, by utilizing ranking component 234 of FIG. 2. The ranking values assigned to the search results in the merged list of search results that represent equivalent entities are aggregate ranking values calculated from the individual ranking values provided for the entities from at least a portion of the different ones of the plurality of sources.
Referring to FIG. 10, a flow diagram is illustrated showing another exemplary method 1000, according to an embodiment of the present invention, for targeting a Web search and resolving its results based on entity type. Initially, as indicated at block 1010, at least one entity type is associated with a received search query, for example, by utilizing entity type determination component 224 of search engine 212 of FIG. 2. As indicated at block 1012, a plurality of predetermined Web sources identified as authoritative for the at least one entity type are searched to determine a list of search results (e.g., by utilizing the search component 228 of FIG. 2). As indicated at block 1014, the results of the search are filtered with respect to the at least one entity type to create a filtered list of search results (e.g., by utilizing filtering component 230 of FIG. 2). Each search result in the filtered list of search results relates to an entity of the at least one entity type determined for the received search query. The filtered list of search results is then compared to the resolved entity list, as indicated at block 1016, to determine equivalent entities identified by different ones of the plurality of predetermined sources. The equivalent entities determined to have been identified by different ones of the multiple sources are merged (as indicated at block 1018) to create a merged list of search results. This may be accomplished, for example, by utilizing the entity merge component 232 of FIG. 2. Each search result in the merged list of search results relates to a different entity of the at least one entity type determined for the received search query. As indicated at block 1020, the merged list of search results is ranked for presentation based on ranking values, for example, by utilizing ranking component 234 of FIG. 2. The ranking values assigned to the search results in the merged list of search results that represent equivalent entities are aggregate ranking values calculated from the individual ranking values provided for the entities from at least a portion of the different ones of the plurality of sources.
It will be appreciated that embodiments of the present invention provide systems and methods for integrating the advantages of vertical searches with the advantages of Web searches to provide a rich search experience with entity type characterization. The present invention has been described in relation to particular embodiments, which are intended in all respects to be illustrative rather than restrictive. Alternative embodiments will become apparent to those of ordinary skill in the art to which the present invention pertains without departing from its scope.
While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.
Those of ordinary skill in the art will understand that the order of the steps shown in method 900 of FIG. 9 and method 1000 of FIG. 10 is not meant to limit the scope of the present invention in any way, and that, in fact, the steps may occur in a wide variety of different orders in embodiments of the present invention. Any and all such variations, and any combination thereof, are contemplated to be within the scope of embodiments of the present invention.
Claims (15)
1. One or more computer-readable storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform a method (900) for targeting a Web search and resolving results of the Web search based on an entity type, the method comprising:
receiving (910) a search query;
determining (912) at least one entity type for the received search query;
performing (914) a Web search for the received search query, the Web search being limited to a plurality of sources that have been identified for the at least one entity type;
filtering (916) results of the Web search to create a filtered list of search results, each search result in the filtered list of search results relating to an entity of the at least one entity type;
merging (918) equivalent entities identified by different ones of the plurality of sources to create a merged list of search results, each search result in the merged list of search results being related to a different entity of the at least one entity type; and
ranking (920) the merged list of search results based on ranking values, wherein the ranking value assigned to at least one of the search results in the merged list of search results is an aggregate ranking value calculated from individual ranking values provided for entities associated with the at least one of the search results from at least a portion of the different ones of the plurality of sources.
2. The one or more computer-readable storage media of claim 1, wherein filtering results of the Web search to create the filtered list of search results comprises filtering the results of the Web search at least in part by utilizing a uniform resource locator pattern.
3. The one or more computer-readable storage media of claim 1, wherein merging the equivalent entities identified by the different ones of the plurality of sources to create the merged list of search results comprises comparing the filtered list of search results to a resolved list of entities in order to determine the equivalent entities identified by the different ones of the plurality of sources.
4. The one or more computer-readable storage media of claim 1, wherein the method further comprises presenting at least a portion of the ranked list of search results.
5. The one or more computer-readable storage media of claim 4, wherein information associated with at least one search result in the presented portion of the ranked list of search results is derived from multiple sources of the plurality of sources.
6. The one or more computer-readable storage media of claim 5, wherein the information derived from at least one of the plurality of sources is presented without being associated with the at least one search result.
7. The one or more computer-readable storage media of claim 5, wherein the method further comprises presenting an indication of at least a portion of the plurality of sources from which the information associated with the at least one search result was derived.
8. The one or more computer-readable storage media of claim 7, wherein at least a portion of the indications of the ones of the plurality of sources are selectable.
9. A method (1000), performed by one or more computing devices comprising at least one processor, for targeting a Web search and resolving results of the Web search based on an entity type, the method comprising:
associating (1010) at least one entity type with the received search query;
searching (1012) a plurality of predetermined Web sources identified for the at least one entity type to determine a list of search results;
filtering (1014) the list of search results with respect to the at least one entity type to create a filtered list of search results, each search result in the filtered list of search results relating to an entity of the at least one entity type;
comparing (1016) the filtered list of search results to a resolved entity list in order to determine equivalent entities identified by different ones of the plurality of predetermined sources;
creating (1018) a consolidated list of search results by consolidating the equivalent entities determined to have been identified by the different ones of the plurality of predetermined sources, each search result in the consolidated list of search results relating to a different entity of the entity type; and
ordering (1020) the consolidated list of search results based on ranking values, wherein a ranking value assigned to at least one of the search results in the consolidated list of search results is an aggregate ranking value calculated from individual ranking values provided for entities associated with the at least one of the search results, the individual ranking values provided by at least a portion of the different ones of the plurality of predetermined sources.
10. The method of claim 9, wherein filtering the list of search results with respect to the at least one entity type to create the filtered list of search results comprises filtering the list of search results at least in part by utilizing a uniform resource locator pattern.
11. The method of claim 9, further comprising presenting at least a portion of the ordered list of search results.
12. The method of claim 11, wherein information associated with at least one search result in the presented portion of the ordered list of search results is derived from a plurality of sources in the plurality of predetermined sources.
13. The method of claim 12, wherein said information derived from at least one of said plurality of predetermined sources is not presented.
14. The method of claim 12, further comprising presenting an indication of at least a portion of said plurality of predetermined sources from which said information associated with said at least one search result was derived.
15. The method of claim 14, wherein at least a portion of said indications of said plurality of predetermined sources are selectable.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/341,284 | 2011-12-30 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1180410A (en) | 2013-10-18 |
| HK1180410B (en) | 2018-02-02 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9443021B2 (en) | Entity based search and resolution | |
| US20230205828A1 (en) | Related entities | |
| US10019495B2 (en) | Knowledge panel | |
| US8819006B1 (en) | Rich content for query answers | |
| KR102001647B1 (en) | Contextualizing knowledge panels | |
| US9916384B2 (en) | Related entities | |
| CN101305390A (en) | Media object metadata association and ranking | |
| JP5555809B2 (en) | System and method for television search assistant | |
| US20140181070A1 (en) | People searches using images | |
| US10055463B1 (en) | Feature based ranking adjustment | |
| HK1180410A (en) | Entity based search and resolution | |
| HK1180410B (en) | Entity based search and resolution | |
| CN103064954B (en) | Search based on entity and parsing |