WO2003032199A2 - Classification of information sources using graph structures

Info

Publication number
WO2003032199A2
Authority
WO
WIPO (PCT)
Prior art keywords
information source
query
graph structure
knowledge representation
structures
Prior art date
Application number
PCT/US2001/042479
Other languages
English (en)
Other versions
WO2003032199A3 (fr
Inventor
Kenneth P. Baclawski
Original Assignee
Jarg Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jarg Corporation
Priority to PCT/US2001/042479
Publication of WO2003032199A2
Publication of WO2003032199A3

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation

Definitions

  • the invention relates to methods and apparatus for the classification of information sources and the display of information to a user.
  • the search process itself has been studied at least since the 1930s, and a standard model was developed by the mid-1960s.
  • the searcher has an "information need" which the searcher tries to satisfy using a large collection or "corpus" of information sources.
  • the information sources that satisfy the searcher's needs are the "relevant" information sources.
  • the searcher expresses an information need using a formal statement called a "query." Queries may be expressed using topics, categories and/or words.
  • the query is then given to a search intermediary.
  • In the past, the intermediary was a person who specialized in searching. It is more common today for the intermediary to be a computer system. Such systems are called information retrieval systems or online search engines.
  • the search intermediary tries to match the topics, categories and/or words from the query with information sources in the corpus.
  • the intermediary responds with a set of information sources that, so it is hoped, satisfies the searcher's needs.
  • another very commonly used technique to find information in a corpus is to start with a document and then follow citations or references within the document to find other documents in the corpus. References in these documents are then used to find further documents. This technique is called "browsing," and online browsing tools are now becoming very popular. Such tools allow a searcher to quickly follow references contained in information sources, often by simply "clicking" on a word or picture within the information source.
  • Non-word based techniques currently employ approaches to extracting relevant information that are different and distinct from those used in word based systems and generally involve extracting data "features" from the raw data.
  • Features of images, sound and video streams can be represented in a computer system as a set of data structures stored in a database.
  • Features can be as simple as the value of an attribute such as brightness of an image, but many features are more complicated and are thus represented using a complex data structure.
  • features can be extracted from structured documents by parsing the document to produce data structures, and can be extracted from unstructured documents by using one of the many feature extraction algorithms that have been developed for implementation on a computer. As in the case of structured documents, feature extraction from an unstructured document produces data structures.
  • the data structures that represent features typically conform to a "data model" for the database that determines the kinds of components and attribute values that are allowed.
  • Each feature can have one or more values associated with components of the data structure that represents the feature.
  • the data structure can have a single component with an associated value, and the feature can be represented by one attribute of the object.
  • Features that are more complex can be represented by several inter-related components, each of which may have attribute values.
  • the data model for features at the domain level is often called an "ontology."
  • An ontology models knowledge within a particular domain, such as, for example, medicine.
  • An ontology can include a concept network, specialized vocabulary, syntactic forms and inference rules.
  • an ontology specifies the features that objects can possess as well as how to extract features from objects.
  • the data structure is called a "knowledge representation" of the information source.
  • the quality of a search is measured using two numbers.
  • the first number represents how thorough the search was. It is the fraction of the total number of relevant information sources that are presented to the searcher. This number is called the "recall." If the recall is less than 100%, then some relevant information sources have been missed.
  • the second number represents the fraction of the total number of information sources that are presented to the searcher that are judged to be relevant. This number is called the "precision." If the precision is less than 100%, then some irrelevant information sources were presented to the searcher.
  • the recall can always be increased by adding many more information sources to those already presented, which can decrease the precision. Similarly, the precision can be increased by reducing the number of references retrieved and presented to the searcher, which can decrease the recall.
  • the recall and precision should be balanced so as to achieve a search that is as careful and thorough as possible.
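As a minimal sketch of the two measures defined above, the following Python computes recall and precision from sets of source identifiers. The function name, the sample identifiers and the convention of returning 1.0 for an empty set are illustrative assumptions, not part of the disclosure.

```python
def recall_and_precision(relevant, presented):
    """Return (recall, precision) for relevant sources and presented sources."""
    relevant, presented = set(relevant), set(presented)
    hits = relevant & presented  # relevant sources that were actually presented
    # Assumption: an empty denominator is treated as a perfect score.
    recall = len(hits) / len(relevant) if relevant else 1.0
    precision = len(hits) / len(presented) if presented else 1.0
    return recall, precision


# Hypothetical corpus: four relevant sources; the search presents three of
# them plus one irrelevant source.
relevant = {"doc1", "doc2", "doc3", "doc4"}
presented = {"doc1", "doc2", "doc3", "doc9"}
recall, precision = recall_and_precision(relevant, presented)
print(recall, precision)  # 0.75 0.75
```

Presenting every source in the corpus would raise the recall to 1.0 while lowering the precision, which is the trade-off described above.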
  • both the information sources and queries are processed to generate knowledge representations that consist of graph structures.
  • the knowledge representation graph structures are converted into graph structure views and the graph structure views for both the query and the information sources are then displayed to a searcher.
  • the searcher can examine the source for relevance.
  • available information sources are classified by comparing the knowledge representation of a query with the knowledge representations of the information sources by matching the graph structures with graph matching algorithms. Those information sources that have a substructure that matches the query in full, or in part, are classified by the largest matching substructure of the query.
  • It is possible for a searcher to request the "next occurrence" of a knowledge representation graph structure in an information source.
  • the computer system searches the current information source knowledge representation for another substructure that matches the query graph structure occurring at a subsequent point in the information source.
  • requesting a "previous occurrence" causes the system to search for a matching substructure occurring at a previous point in the information source.
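The "next occurrence" and "previous occurrence" requests can be sketched as a lookup over the positions at which matching substructures were found. The sketch below assumes that the matches have already been computed and are available as a sorted list of character offsets; the function names and the offsets are hypothetical.

```python
from bisect import bisect_left, bisect_right


def next_occurrence(positions, current):
    """Return the first match position after `current`, or None if there is none."""
    i = bisect_right(positions, current)
    return positions[i] if i < len(positions) else None


def previous_occurrence(positions, current):
    """Return the last match position before `current`, or None if there is none."""
    i = bisect_left(positions, current)
    return positions[i - 1] if i > 0 else None


# Hypothetical sorted offsets of matching substructures within one source.
match_positions = [120, 480, 910]
print(next_occurrence(match_positions, 480))      # 910
print(previous_occurrence(match_positions, 480))  # 120
print(next_occurrence(match_positions, 910))      # None
```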
  • information sources are classified by constructing hierarchies of knowledge representations.
  • the simplest construction is obtained by using the knowledge representation of a query as the top of the hierarchy.
  • the structures in the hierarchy are then substructures of the query.
  • the hierarchy of structures may also be constructed by using the knowledge representation of the query as the bottom of the hierarchy. Structures in the hierarchy, in this case, are structures that contain the query. Views of this hierarchy can be displayed to a searcher with a substructure view being displayed adjacent to the information source from which it was derived.
  • the graph structure corresponding to a knowledge representation consists of vertices joined by directed edges.
  • Each vertex represents a concept that can be visually portrayed as a word, phrase and/or icon.
  • a vertex may also contain a category that is visually portrayed either textually or by a distinct shape, color and/or icon.
  • An edge may be labeled by an edge type. Different types of edges can be distinguished by using a textual label or by using a distinct shape, color and/or icon.
  • Two vertices that are joined by an edge are called adjacent vertices.
  • the categories, concepts and edge types used to construct the graph structure are specified by an ontology for the knowledge domain.
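One possible in-memory form of such a graph structure is sketched below. The class names, field names and the medical example are assumptions made for illustration; the text does not prescribe a particular data layout.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vertex:
    concept: str        # portrayed as a word, phrase and/or icon
    category: str = ""  # optional category specified by the ontology


@dataclass(frozen=True)
class Edge:
    source: Vertex      # the directed edge leaves this vertex
    target: Vertex      # and enters this one
    edge_type: str = "" # optional label specified by the ontology


@dataclass
class GraphStructure:
    vertices: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, source, target, edge_type=""):
        self.vertices.update((source, target))
        self.edges.add(Edge(source, target, edge_type))

    def adjacent(self, vertex):
        """Other vertices joined to `vertex` by an edge in either direction."""
        outgoing = {e.target for e in self.edges if e.source == vertex}
        incoming = {e.source for e in self.edges if e.target == vertex}
        return (outgoing | incoming) - {vertex}


# Hypothetical medical-domain example.
drug = Vertex("aspirin", "drug")
symptom = Vertex("headache", "symptom")
g = GraphStructure()
g.add_edge(drug, symptom, "treats")
print(g.adjacent(drug) == {symptom})  # True
```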
  • the vertices of a graph structure view can be displayed on a computer screen next to the corresponding items, such as words, phrases and visual features, of an information source view. Selecting a vertex in the graph structure view causes the selected vertex and vertices adjacent to the selected vertex to be "highlighted." In addition, the corresponding items in the information source view are highlighted. Similarly, selecting a feature in the information source view causes the corresponding vertex in the graph structure to be highlighted. Highlighting can be accomplished by using the same feature (such as the same color or the same location on the screen) for corresponding parts of the two views.
  • By selecting a succession of vertices in the graph structure view, a searcher can perform knowledge navigation of the information source. By successively selecting items in the information source view, a searcher can perform knowledge exploration of the information source.
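The linked highlighting described above can be sketched as follows. The sketch assumes that edges are (source, target) pairs and that a mapping from each vertex to the character spans of its corresponding items is available; the function name, the mapping and the sample data are hypothetical.

```python
def highlight_for_vertex(selected, edges, items_by_vertex):
    """Return the vertices and information source items to highlight.

    The selected vertex and its adjacent vertices are highlighted, together
    with the items that correspond to those vertices.
    """
    vertices = {selected}
    for source, target in edges:
        if source == selected:
            vertices.add(target)
        elif target == selected:
            vertices.add(source)
    items = set()
    for vertex in vertices:
        items.update(items_by_vertex.get(vertex, ()))
    return vertices, items


# Hypothetical graph structure and (start, end) spans in the source text.
edges = [("aspirin", "headache"), ("aspirin", "dose")]
items_by_vertex = {"aspirin": [(10, 17)], "headache": [(42, 50)]}
vertices, items = highlight_for_vertex("headache", edges, items_by_vertex)
print(sorted(vertices))  # ['aspirin', 'headache']
print(sorted(items))     # [(10, 17), (42, 50)]
```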
  • Figure 1 is a schematic block diagram that illustrates the creation and display of a graph structure from a query or an information source.
  • Figure 2 is a schematic block diagram that illustrates the processing of a query to locate and classify information sources that respond to the query using graph structures.
  • Figure 3 is a flowchart that illustrates the steps performed in the query processing shown in Figure 2.
  • Figures 4A and 4B, when placed together, form a flowchart that illustrates a process for matching a query graph structure to an information source graph structure using subgraph structures.
  • Figure 5 is a flowchart that illustrates a process for matching a query graph structure to an information source graph structure using supergraph structures.
  • Figure 6 is a screen shot of a sample display illustrating the processing of a query by means of graph structures which shows the query entered in a natural language.
  • Figure 7 is a screen shot of a sample display illustrating the processing of a query by means of graph structures which shows the query converted into a graph structure.
  • Figure 8 is a screen shot of a sample display illustrating the processing of a query by means of graph structures, which shows how vertex definitions of the graph structure are displayed.
  • Figure 9 is a screen shot of a sample display illustrating the processing of a query by means of graph structures, which shows how edge definitions of the graph structure are displayed.
  • Figure 10 is a screen shot of a sample display illustrating the processing of a query by means of graph structures which shows how processing of the query is initiated.
  • Figure 11 is a screen shot of a sample display illustrating the processing of a query by means of graph structures which shows the results of the processing including the graph substructures discovered in the search and the documents in which the substructures were discovered.
  • Figure 12 is a screen shot of a sample display illustrating the processing of a query by means of graph structures which shows how additional information concerning the results of the processing is displayed.
  • Figure 13 is a screen shot of a sample display illustrating the processing of a query by means of graph structures which shows how relevance navigation and exploration is initiated.
  • Figure 14 is a screen shot of a sample display illustrating the processing of a query by means of graph structures, which shows an expanded view of a selected information source.
  • Figure 15 is a screen shot of a sample display illustrating the processing of a query by means of graph structures in which items in the selected information source are highlighted to show correspondence with graph structure features.
  • Figure 16 is a screen shot of a sample display illustrating the processing of a query by means of graph structures, which shows how knowledge exploration is initiated.
  • Figure 17 is a screen shot of a sample display illustrating the processing of a query by means of graph structures which shows how corresponding vertices in the graph structure are highlighted when items are selected in the information source document.
  • Figure 18 is a screen shot of a sample display illustrating the processing of a query by means of graph structures which shows knowledge exploration in which corresponding vertices in the information source are highlighted when vertices are selected in the graph structure.
  • Figure 19 is a block schematic diagram of an illustrative hardware implementation of the inventive classification system.
  • Figure 1 illustrates the basic process by which a query or information source is converted into a graph structure that can then be visually displayed.
  • This process begins when a query or information source 100 is provided to a knowledge extractor 102.
  • the knowledge extractor 102 is a known processor or engine that uses a knowledge extraction algorithm to process the information in the query or information source to generate a knowledge representation of the input.
  • the knowledge extractor 102 may also use an ontology 104 to assist in the knowledge extraction process.
  • a large variety of knowledge extraction algorithms has been developed for media such as sound, images and video streams. For example, medical images typically use edge detection algorithms to extract the data objects, while domain-specific knowledge is used to classify the data objects as medically significant objects, such as blood vessels, lesions and tumors.
  • wavelet analysis has been used to characterize the texture of a region and to determine a shape (such as a letter) no matter where the shape is located in, or what orientation the shape has, within the image.
  • An example of a knowledge extraction process is described in detail in an article entitled "An Abstract Model for Semantically Rich Information Retrieval", Kenneth P. Baclawski, Northeastern University, March 30 1994, the disclosure of which is incorporated by reference in its entirety.
  • the result of the knowledge extraction process is a knowledge representation 106 that, in the aforementioned article, is implemented by a graph structure called a "keynet".
  • the keynet structure is described using the terminology of graph theory from mathematics.
  • the structure consists of vertices and edges, where each edge connects one vertex to another (possibly the same) vertex.
  • An edge can be labeled to indicate its purpose, and this label is called the relationship represented by the edge.
  • In the terminology of the Resource Description Framework (RDF), vertices are called resources, and an edge is called a statement.
  • the label on an edge is called the property represented by the edge.
  • the graph structures that represent the knowledge representations conform to an ontological data model that determines the kinds of components and attribute values that are allowed.
  • Many current systems that perform knowledge extraction from information objects use very simple ontologies, but other more complicated systems can be designed.
  • the keynet graph structure can be converted into a graph structure view by means of a graphic converter 108.
  • the graph structure view is a visual structure that is easy to read.
  • the graphic converter is a simple algorithm that examines each vertex in the keynet and determines whether the directed edges that are connected to the vertex leave the vertex or enter it. The vertices are then rearranged into a more or less hierarchical structure so that vertices with edges that only leave the vertex are located at the top of the structure and vertices with edges that only enter the vertex are located at the bottom. The remaining vertices are located between the top and bottom levels as dictated by the edge connections.
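The arrangement performed by the graphic converter can be sketched as a split of the vertices into three levels. This is a simplified illustration that places all remaining vertices in a single middle level (including any isolated vertices) instead of ordering them further by their edge connections; the function name and sample data are assumptions.

```python
def arrange_levels(vertices, edges):
    """Split vertices into top, middle and bottom levels by edge direction.

    `edges` is an iterable of (source, target) pairs.
    """
    edges = list(edges)
    has_outgoing = {source for source, _ in edges}
    has_incoming = {target for _, target in edges}
    # Top: edges only leave the vertex. Bottom: edges only enter the vertex.
    top = [v for v in vertices if v in has_outgoing and v not in has_incoming]
    bottom = [v for v in vertices if v in has_incoming and v not in has_outgoing]
    middle = [v for v in vertices if v not in top and v not in bottom]
    return top, middle, bottom


# Hypothetical keynet vertices and directed edges.
vertices = ["study", "drug", "patient", "outcome"]
edges = [("study", "drug"), ("drug", "patient"), ("study", "outcome")]
print(arrange_levels(vertices, edges))
# (['study'], ['drug'], ['patient', 'outcome'])
```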
  • graph structure matching can also be used to classify information sources in their order of relevance as perceived by a human searcher.
  • information sources can be classified according to their relevance to a query by matching the graph structures of the information sources to the graph structure of the query.
  • the classification process is illustrated schematically in Figure 2 and the steps of the process are shown in the flowchart of Figure 3. This process starts in step 300 and proceeds to step 302 where a new query 200 is received.
  • In step 304, a determination is made whether the query is acceptable for use with the knowledge extractor 202.
  • the query must be formulated using the ontology 204 in order for it to operate successfully with the knowledge extractor 202. Thus, a check must be made to ensure that the terms and relationships described by the query are in fact compatible with the ontology 204.
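The compatibility check can be sketched as a comparison of the query's terms and relationships against the ontology. The representation of the ontology as two sets, the function name and the sample vocabulary are all assumptions made for illustration.

```python
def incompatible_terms(query_concepts, query_edge_types, ontology):
    """Return the query concepts and edge types that the ontology does not define."""
    unknown_concepts = set(query_concepts) - ontology["concepts"]
    unknown_edge_types = set(query_edge_types) - ontology["edge_types"]
    return unknown_concepts, unknown_edge_types


# Hypothetical ontology for a small medical domain.
ontology = {
    "concepts": {"aspirin", "headache", "dose"},
    "edge_types": {"treats", "has_dose"},
}
unknown = incompatible_terms({"aspirin", "migraine"}, {"treats"}, ontology)
print(unknown)  # ({'migraine'}, set())
```

A query would be treated as acceptable in this sketch only when both returned sets are empty.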
  • In step 306, the query may be reformatted in order to make it compatible with the search engine that will later be used to retrieve information source documents from the information source collection or corpus.
  • In step 308, the knowledge representation embodied by the query is extracted by the knowledge extractor 202.
  • the result is a knowledge representation 206 which, as previously discussed in the preferred embodiment of the invention, is a keynet.
  • the knowledge representation 206 may be presented to the user for editing and modification. Alternatively, the knowledge representation 206 can be generated by the user directly without the knowledge extractor 202.
  • the knowledge representation 206 is provided to a high recall retrieval engine 208.
  • This retrieval engine compares the knowledge representation that corresponds to the query with knowledge representations that have been previously stored for the information sources.
  • Retrieval engines of this type are known and operate by indexing either a single database or distributed databases to retrieve relevant documents. For example, a retrieval engine that is suitable for use with the present invention is disclosed in detail in U.S. Patent No.
  • the retrieval engine produces a plurality of information source knowledge representations 210 and, in step 312, these knowledge representations are presented to a graph matching processor 212 along with the knowledge representation 206 of the query.
  • the graph matching processor 212 organizes the collection of information source knowledge representations by their relevance to a human searcher. By progressing down the ordered list, the searcher can review the information source knowledge representations in order of their relevance. The resulting search therefore has not only high recall, but also high precision and relevance.
  • the result is an ordered list of references 214, which, in step 314, are transmitted to the user.
  • the user may then display the list in step 316 as discussed below.
  • the graph matching processor 212 can make use of the ontology 204 to define any appropriate inference rules during the matching process.
  • the graph matching processor 212 compares the query graph structure with the knowledge representations of each of the information sources and classifies the sources by constructing a hierarchy of graph structures.
  • This hierarchy is an ordered set for which each pair of elements has a least upper bound and a greatest lower bound.
  • the concepts in the hierarchy can be ordered by generality, i.e., a concept A is less than a concept B if A is less general (more specific) than B.
  • the hierarchy of structures may be constructed in several ways. The simplest construction is obtained by using the knowledge representation of the query as the top of the hierarchy.
  • the structures in the hierarchy are then substructures of the query. Such structures are called subgraphs of the query.
  • the subgraphs of the query are arranged by containment of one subgraph in another. This construction method is best suited for highly specific queries.
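A hierarchy of query subgraphs ordered by containment can be sketched as follows. The sketch makes two simplifying assumptions: a subgraph is identified with a non-empty subset of the query's edges, written as (source, edge type, target) triples, and all subsets are enumerated, which is practical only for small queries. The function name and sample query are hypothetical.

```python
from itertools import combinations


def subgraph_hierarchy(query_edges):
    """Return levels of edge subsets of the query, largest first.

    Level 0 holds the query itself (the top of the hierarchy); each lower
    level holds subgraphs with one edge fewer.
    """
    edges = sorted(query_edges)
    levels = []
    for size in range(len(edges), 0, -1):
        levels.append([frozenset(c) for c in combinations(edges, size)])
    return levels


# Hypothetical query with two labeled edges.
query_edges = {
    ("aspirin", "treats", "headache"),
    ("aspirin", "has_dose", "dose"),
}
levels = subgraph_hierarchy(query_edges)
top = levels[0][0]
print(all(sub <= top for level in levels for sub in level))  # True
```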
  • the strategy for unspecific queries is to classify information sources using structures (called supergraphs) that contain more features than the original query.
  • supergraphs are constructed by starting with the query and adding new vertices to those already in the supergraph. The vertices are added so that each added vertex is adjacent to another vertex already in the supergraph.
  • each supergraph must occur in at least one information source as part, or all, of its knowledge representation. The supergraphs of the query are then arranged by containment of one supergraph in another.
  • the hierarchy of structures is constructed by using both subgraphs and supergraphs of the query.
  • Each information source is classified by the largest structures in the hierarchy that are contained in the knowledge representation of the information source.
  • a single information source can belong to more than one classification.
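Classification by the largest contained structures can be sketched as a search for the maximal elements, under containment, among the hierarchy structures that occur in a source. The sketch assumes structures are given as frozensets of (source, edge type, target) triples; the function name and sample data are hypothetical.

```python
def classify(source_edges, structures):
    """Return the maximal structures (by containment) that occur in the source."""
    source_edges = set(source_edges)
    contained = [s for s in structures if s <= source_edges]
    # Keep a structure only if no other contained structure strictly includes it.
    return [s for s in contained if not any(s < other for other in contained)]


# Hypothetical edges and hierarchy structures.
a = ("aspirin", "treats", "headache")
b = ("aspirin", "has_dose", "dose")
c = ("headache", "symptom_of", "migraine")
structures = [frozenset({a, b}), frozenset({a}), frozenset({b}), frozenset({a, c})]

print(classify({a, c}, structures) == [frozenset({a, c})])  # True
print(len(classify({a, b, c}, structures)))                 # 2
```

The second call shows a source that falls under two classifications, because neither of the two largest contained structures includes the other.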
  • the large set of relevant information sources is subclassified into smaller sets of information sources.
  • the user is presented a list of relevant supergraphs and subgraphs rather than a set of information sources.
  • the classifications and subclassifications form the hierarchical structure, called a taxonomy or classification hierarchy.
  • The process of comparing a query to information source documents by graphical analysis of subgraphs is illustrated in Figures 4A and 4B, which, when placed together, form a flowchart of the process.
  • This process starts in step 400 and proceeds to step 402 in which a graph structure corresponding to the query knowledge representation is selected.
  • the process then proceeds to step 404 where a vertex is selected in the query graph structure.
  • In step 406, the graph structure of the information source is examined to determine whether the same vertex appears in the information source graph structure. If the vertex does not appear in the information source graph structure, as determined in step 406, then the process proceeds to step 410 in which the query graph structure is examined to determine whether more vertices are present that have not yet been processed. If there are more vertices present, the process proceeds back to step 404 and the next vertex in the query graph structure is selected for processing.
  • If, in step 406, it is determined that a selected vertex in the query graph structure appears in the information source graph structure, then the routine proceeds to step 405 where information identifying the selected vertex and the corresponding information source vertex is placed in a candidate group of vertices.
  • This information might consist, for example, of information identifying the concept and associated edges in the query graph structure and information identifying the location and content of the document features that constitute the vertices in the information source document.
  • The process then proceeds to step 410 to determine whether more unprocessed vertices are present. If so, the process then returns to step 404 where the next unprocessed vertex is selected from the query graph structure.
  • When no unprocessed vertices remain, the process proceeds to step 416, in which the candidate vertex group is examined to find vertices that have corresponding edges in the query graph structure and information source graph structure.
  • In step 416, one of the pairs of vertices previously identified from the query and information source graph structures is selected from the candidate group.
  • In step 418, the edges that appear in the query graph structure are examined. Each edge is compared to the edges in the corresponding vertex in the information source graph structure. This comparison is made in step 420. If the selected edges do not appear in the information source graph structure, then the process proceeds to step 424 in which the candidate group is examined to determine whether any vertex pairs remain that have not been processed. If so, the routine proceeds back to step 416, where the next pair of vertices in the candidate group is selected.
  • If, in step 420, the selected edges appear in the information source graph structure, the information identifying the pair of vertices in the candidate group is placed into an intersection group in step 422.
  • If, in step 424, unprocessed vertex pairs remain, the process consisting of steps 416, 418, 420 and 422 is repeated. If not, the process finishes in step 426.
  • the result of this process is a subgraph structure of a knowledge representation that appears in the information source document that matches the query source graph structure.
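The two passes of Figures 4A and 4B can be sketched as follows. The sketch assumes that vertices are compared by concept label, that edges are (source, edge type, target) triples, and that a candidate vertex is kept when at least one of its query edges also appears in the information source; the text does not state whether one or all edges must appear, so that choice is an assumption, as are the function name and sample data.

```python
def match_subgraph(query_vertices, query_edges, source_vertices, source_edges):
    """Return (intersection group, matched edges) for a query and one source."""
    source_vertices = set(source_vertices)
    source_edges = set(source_edges)

    # Steps 404-410: collect query vertices that also appear in the source.
    candidates = [v for v in query_vertices if v in source_vertices]

    # Steps 416-424: keep candidates whose query edges appear in the source.
    intersection = []
    for v in candidates:
        edges_at_v = [e for e in query_edges if v in (e[0], e[2])]
        if any(e in source_edges for e in edges_at_v):
            intersection.append(v)

    matched_edges = {e for e in query_edges if e in source_edges}
    return intersection, matched_edges


# Hypothetical query and information source graph structures.
query_vertices = ["aspirin", "headache", "dose"]
query_edges = [("aspirin", "treats", "headache"), ("aspirin", "has_dose", "dose")]
source_vertices = ["aspirin", "headache", "patient", "dose"]
source_edges = [("aspirin", "treats", "headache"), ("patient", "takes", "aspirin")]

intersection, matched = match_subgraph(
    query_vertices, query_edges, source_vertices, source_edges
)
print(intersection)  # ['aspirin', 'headache']
print(matched)       # {('aspirin', 'treats', 'headache')}
```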
  • the process illustrated in Figure 5 can be used to construct supergraphs of the query graph structure from the information source graph structures.
  • This process starts in step 500 and proceeds to step 502 where a vertex in the information source graph structure is selected.
  • In step 504, this selected vertex is compared to the query graph structure to determine if the vertex is in the query graph structure. If it is, the process proceeds to step 510 where it is determined whether more vertices exist in the information source graph structure that have yet to be examined. If more vertices exist, the process proceeds back to step 502 in which the next vertex in the information source graph structure is selected.
  • Alternatively, if, in step 504, it is determined that the vertex selected in the information source graph structure is not in the query graph structure, then the routine proceeds to step 506 in which a determination is made whether the selected vertex is connected to a vertex in the query graph structure.
  • If the selected vertex is not connected to a vertex in the query graph structure, the process proceeds to step 510, which determines whether unprocessed vertices exist. If the selected vertex is connected to a vertex in the query graph structure, information identifying the vertex is placed in the supergraph list in step 508 and the process proceeds to step 510. If additional vertices remain to be processed, then steps 502, 504, 506 and 508 are repeated. If no additional vertices remain to be processed, then the process finishes in step 512.
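The process of Figure 5 can be sketched as a single pass over the vertices of the information source. The sketch assumes that vertices are compared by concept label and that edges are (source, edge type, target) triples; the function name and sample data are hypothetical.

```python
def supergraph_vertices(query_vertices, source_vertices, source_edges):
    """Return source vertices that are outside the query but connected to it."""
    query_vertices = set(query_vertices)
    supergraph_list = []
    for v in source_vertices:            # step 502: select the next vertex
        if v in query_vertices:          # step 504: already in the query
            continue
        connected = any(                 # step 506: joined to a query vertex?
            (s == v and t in query_vertices) or (t == v and s in query_vertices)
            for s, _, t in source_edges
        )
        if connected:
            supergraph_list.append(v)    # step 508: add to the supergraph list
    return supergraph_list


# Hypothetical query vertices and information source graph structure.
query_vertices = ["aspirin", "headache"]
source_vertices = ["aspirin", "headache", "patient", "clinic"]
source_edges = [
    ("aspirin", "treats", "headache"),
    ("patient", "takes", "aspirin"),
    ("clinic", "admits", "patient"),
]
print(supergraph_vertices(query_vertices, source_vertices, source_edges))
# ['patient']
```

In this example "clinic" is not added because it is connected only to "patient", which is not a vertex of the query.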
  • lists that result from the information source classification process illustrated in Figures 2 and 3 can be visually displayed to a user.
  • the visual display facilitates relevance exploration and relevance testing of the retrieved information source documents.
  • An illustrative graphic user interface is shown in Figure 6.
  • the graphic user interface consists of a window, or frame, 600 which contains a conventional menu 602 with menu selections such as "File" 604 that activates a drop down menu with selections that allow a user to open, close and save search files in a conventional manner.
  • the "Edit" menu selection 606 displays a dropdown menu with selections that allow the query to be modified.
  • the "History" menu selection 608 displays previous versions of the query and a "Help" menu selection 610 allows the user to select various help options in a conventional fashion.
  • a query is entered into text edit box 612 in a natural language.
  • a push button 614 may be provided, which can be used to start the search and classification process as will hereinafter be described.
  • Figure 7 illustrates the display of a graph structure that has been generated from the query that has been entered into text edit box 612.
  • elements that correspond to elements in Figure 6 have been given corresponding numerals.
  • window 600 in Figure 6 corresponds to window 700 in Figure 7.
  • the description of the elements in Figure 6 also applies to corresponding elements in Figure 7.
  • the query in box 712 has been used to generate a graph structure 718, which is displayed at graphics display area 716 of the window 700.
  • the graph structure 718 consists of four vertices 720, 722, 724 and 726. These vertices correspond to concepts, words and phrases that have been selected from the query by means of the knowledge extractor as described previously.
  • the vertices 720-726 are connected together by edges 728, 730 and 732, which represent actions and/or results that are expressed in the query.
  • the structure has been folded to fit it into the graphics display area 716.
  • the graph structure 718 not only illustrates the major concepts expressed in the query, but also their relationships as indicated by the edges 728-732.
  • the user may examine the definitions that are part of the ontology that was used to generate the graph structure. For example, as shown in Figure 8, selecting vertex 726 by means of the cursor 840 causes a pop-up text box 842 to appear.
  • the text box 842 contains the definition for the term in the vertex 826.
  • the user may examine the edge definitions that are part of the ontology that was used to generate the graph structure. For example, as shown in Figure 9, selecting edge 930 by means of the cursor 940 causes a pop-up text box 944 to appear.
  • the text box 944 contains the definition for the term represented by edge 930.
  • the classification process is started by pressing a pushbutton on the interface. As shown in Figure 10, the classification process is started by selecting button 1014 with cursor 1040.
  • Figure 11 illustrates how the result of the search and classification are displayed to the user. The results may be displayed in a variety of manners that would be obvious to those skilled in the art.
  • a scrolling list of the hierarchical list structure described above is displayed in the graphics area 1116. Each "line" in this display corresponds to one source reference. The supergraph or subgraph structures associated with that reference are shown on the left side of the display and the information source title or identifying information is shown on the right.
  • a subgraph structure 1150 is shown on the first line and the title of the source article 1152 from which the subgraph structure was derived is shown adjacent to the subgraph structure 1150.
  • additional subgraph structures 1152-1156 and titles 1160-1164 are displayed with the most relevant source article located at the top of the list.
  • the titles can be selected by means of the cursor. Additional information concerning each information source can also be displayed. For example, as shown in Figure 12, this additional information might be displayed as a pop-up window 1266 when the cursor 1240 is moved over the line associated with an information source.
  • information source titles can be selected in order to expand the content of the information source. This operation is illustrated in Figure 13 in which title 1358 has been selected with the cursor 1340. The result is shown in Figure 14 in which the content of the document has been expanded in scrolling area 1470.
  • the display shown in Figure 13 in which title 1358 has been selected with the cursor 1340.
  • Figure 15 illustrates that the document content in area 1470 has been displayed with items 1572, 1574, 1576 and 1578 corresponding to graph structure vertices highlighted.
  • this highlighting is shown as a color different from the background color, but those skilled in the art will realize that highlighting can be accomplished in other manners such as by using the same location on the screen for corresponding parts of the two views. The manner of highlighting is not important to the operation of the present invention.
  • a related item 1579 is also highlighted. Item 1579 does not have a corresponding vertex in graph structure 1550, but is related to item 1572 which does have a corresponding vertex. In this manner, the system highlights not only those items that have corresponding vertices, but also related items.
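The highlighting rule just described (items with a corresponding vertex, plus items related to them) could be computed along the following lines. The two mappings and the function name are hypothetical; how the system actually determines that two items are related is not stated here.

```python
def items_to_highlight(vertex_to_item, related_items):
    """Split highlighted items into direct matches and related items.

    vertex_to_item: maps a graph vertex to its item in the source content.
    related_items:  maps an item to the set of items related to it
                    (an assumed, precomputed relation).
    """
    direct = set(vertex_to_item.values())
    related = set()
    for item in direct:
        related |= related_items.get(item, set())
    # Items that already have a vertex are reported only as direct matches.
    return direct, related - direct
```

Using the reference numerals of Figure 15 as stand-in item identifiers, `items_to_highlight({"v1": 1572, "v2": 1574}, {1572: {1579}})` places 1572 and 1574 in the direct set and 1579 in the related set.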
  • the user can successively select items of the information source to perform knowledge exploration of the information source.
  • item 1672 has been selected with the cursor 1640, causing the item to indicate the selection, for example by changing color.
  • the selection of an item 1772 in the information source document causes not only the item to be highlighted, but also related items to be highlighted.
  • the related items 1779 and 1781 are also highlighted.
  • the new corresponding graph structure 1780 is displayed above the content portion 1770 with the corresponding vertex 1782 highlighted.
  • the new graph structure replaces the query 1612 and the article title 1658 (Figure 16) with a new graph area 1783.
  • this highlighting can be accomplished in a variety of ways known to those skilled in the art.
  • the highlighting of related items allows the user to better understand the relationship of the items in the information source content.
  • a searcher can perform knowledge navigation of the information source. This is shown in Figure 18, in which a vertex 1892 has been selected in the graph structure 1650 (Figure 16), in turn causing the corresponding item 1874 to be highlighted in the document content section 1870.
  • the selection of a vertex causes related vertices to also be selected in graph structure 1890; this new graph structure 1890, reflecting the related items, is displayed in the graphic area 1883.
  • the corresponding items 1894 and 1896 are also highlighted in the document content 1870.
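Knowledge navigation as described above, where selecting a vertex also selects its related vertices and highlights the corresponding items in the content, might look like the sketch below. It assumes that "related" means adjacent in the graph structure, which matches the abstract; the adjacency map and the vertex-to-items map are hypothetical data structures.

```python
def select_vertex(adjacency, vertex_to_items, vertex):
    """Return (selected vertices, content items to highlight).

    adjacency:       maps a vertex to an iterable of adjacent vertices.
    vertex_to_items: maps a vertex to the content items it corresponds to.
    """
    # The chosen vertex plus every vertex adjacent to it is selected.
    selected = {vertex} | set(adjacency.get(vertex, ()))
    items = set()
    for v in selected:
        items |= set(vertex_to_items.get(v, ()))
    return selected, items
```

Calling the function again on one of the newly selected vertices moves the selection along the graph, which is one way to realize selecting a succession of vertices.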
  • a searcher can request the "next occurrence" of a graph structure in the information source.
  • the computer system searches the current information source knowledge representation for another substructure that matches the query graph structure occurring at a subsequent point in the information source. If such a substructure is found, then the corresponding vertices of the information source are highlighted. Similarly, requesting a "previous occurrence" causes the system to search for a matching substructure occurring at a previous point in the information source.
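The "next occurrence" and "previous occurrence" requests can be illustrated with a simplified sketch. It assumes the graph matching step has already produced a sorted list of positions in the information source where a matching substructure begins; the matching itself is not shown, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def next_occurrence(match_positions, current):
    """Return the first match position strictly after `current`, or None."""
    i = bisect_right(match_positions, current)
    return match_positions[i] if i < len(match_positions) else None


def previous_occurrence(match_positions, current):
    """Return the last match position strictly before `current`, or None."""
    i = bisect_left(match_positions, current)
    return match_positions[i - 1] if i > 0 else None
```

A return value of `None` corresponds to the case in which no further matching substructure is found, so the highlighting would stay where it is.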
  • one embodiment of a system of the invention includes a user computer 1900 which communicates with a classification engine comprised of computer nodes 1902, 1904 and 1906 through a network 1908.
  • the individual computer nodes 1902-1906 may include local disks, or may, alternatively or additionally, obtain data from a network disk server (not shown).
  • the computer nodes 1902-1906 of the classification engine may be of several types, including home node 1902 and index nodes 1904 and 1906.
  • the nodes 1902-1906 of the classification engine need not represent distinct computers.
  • the classification engine consists of a single computer that takes on the roles of all home nodes 1902 and index nodes 1904-1906.
  • the classification engine consists of separate computers for each home node 1902 and index node 1904-1906. Those skilled in the art will realize many variations are possible which will still be within the scope and spirit of the present invention.
  • a user transmits the query to the classification engine and home node 1902 receives the query.
  • the home node 1902 is responsible for establishing the connection with the user computer 1900 to enable the user to transmit a query and to receive a response in an appropriate format.
  • the home node 1902 may also be responsible for any authentication and administrative functionality, for example the acceptance function performed in step 304 of Figure 3.
  • the home node 1902 is a World Wide Web server communicating with the user computer 1900 using the HTTP protocol.
  • after verifying that the query is acceptable, the home node 1902 performs any reformatting necessary to make the query compatible with the requirements of the search engine, as set forth in step 306 of Figure 3. The home node 1902 then transmits the query to the classification engine consisting of nodes 1904-1906 that, as previously discussed, performs a search and classification of the information sources. This processing may involve the query being presented to a knowledge extractor that utilizes an ontology to extract a knowledge representation from the query.
  • the user may transmit a knowledge representation directly to the classification engine without the step of knowledge extraction.
  • upon receiving confirmation from the user that the knowledge representation is correct, the home node 1902 provides the query knowledge representation to a high recall retrieval engine, which produces a collection of information source knowledge representations. This collection is then transmitted to the graph matching processor along with the query knowledge representation. The results are then conveyed back to the home node 1902 and from there to the user computer 1900 for display as previously discussed.
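The query flow described above (knowledge extraction, user confirmation, high recall retrieval, then graph matching) can be sketched as a small pipeline. Everything concrete here is an assumption made for illustration: knowledge representations are encoded as sets of (subject, relation, object) edges, the high recall step keeps any source sharing at least one edge with the query, and the graph matching processor is replaced by a simple stand-in that ranks sources by the size of the shared edge set. The actual graph matching algorithm is not reproduced.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str, str]  # (subject, relation, object); an assumed encoding
Graph = FrozenSet[Edge]


def high_recall_retrieve(query_kr: Graph, index: Dict[str, Graph]) -> Dict[str, Graph]:
    """Permissive first pass: keep every source sharing an edge with the query."""
    return {title: kr for title, kr in index.items() if kr & query_kr}


def graph_match(query_kr: Graph, candidates: Dict[str, Graph]) -> List[Tuple[str, Graph]]:
    """Stand-in for the graph matching processor: rank by shared edges."""
    shared = [(title, kr & query_kr) for title, kr in candidates.items()]
    return sorted(shared, key=lambda pair: len(pair[1]), reverse=True)


def handle_query(
    query: str,
    extract: Callable[[str], Graph],   # hypothetical knowledge extractor
    confirm: Callable[[Graph], bool],  # hypothetical user confirmation step
    index: Dict[str, Graph],
) -> List[Tuple[str, Graph]]:
    """Run the query through extraction, confirmation, retrieval and matching."""
    query_kr = extract(query)
    if not confirm(query_kr):
        return []  # the user rejected the extracted representation
    candidates = high_recall_retrieve(query_kr, index)
    return graph_match(query_kr, candidates)
```

Splitting retrieval from matching mirrors the two-stage design in the text: the inexpensive first stage narrows the collection so that the more costly graph comparison runs only on plausible candidates.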

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In a knowledge classification system according to the invention, information sources as well as queries are processed to generate knowledge representation graph structures. The graph structures for both the query and the information sources are then converted to visual representations and presented to a searcher. By manipulating the visual representations of the graph structures for each information source, the searcher can examine the source for relevance. A search can be performed by comparing the graph structure of the query to the graph structure of each information source using a graph matching computer algorithm. Information sources are classified by constructing hierarchies of knowledge representations. The simplest construction is obtained by using the knowledge representation of a query as the top of the hierarchy. The structures in the hierarchy are substructures of the query. The hierarchy of structures can also be constructed by using the knowledge representation of the query as the bottom of the hierarchy. The structures in the hierarchy, in this case, are structures that contain the query. The vertices of a visual representation of a graph structure can be displayed on a computer monitor adjacent to the corresponding objects, such as words, phrases or visual features, of a visual representation of an information source. Selecting a vertex in the graph structure highlights the selected vertex and the vertices adjacent to it. Selecting a succession of vertices in the graph structure allows a searcher to perform knowledge navigation of the information source. Successively selecting objects of the information source allows a searcher to perform knowledge exploration of the information source.
PCT/US2001/042479 2001-10-05 2001-10-05 Classification de sources d'information effectuee a l'aide de structures de graphe WO2003032199A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2001/042479 WO2003032199A2 (fr) 2001-10-05 2001-10-05 Classification de sources d'information effectuee a l'aide de structures de graphe

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2001/042479 WO2003032199A2 (fr) 2001-10-05 2001-10-05 Classification de sources d'information effectuee a l'aide de structures de graphe

Publications (2)

Publication Number Publication Date
WO2003032199A2 true WO2003032199A2 (fr) 2003-04-17
WO2003032199A3 WO2003032199A3 (fr) 2003-08-28

Family

ID=21742964

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/042479 WO2003032199A2 (fr) 2001-10-05 2001-10-05 Classification de sources d'information effectuee a l'aide de structures de graphe

Country Status (1)

Country Link
WO (1) WO2003032199A2 (fr)

Also Published As

Publication number Publication date
WO2003032199A3 (fr) 2003-08-28

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BY BZ CA CH CN CO CR CU CZ DE DM DZ EE ES FI GB GD GE GH GM HU ID IL IN IS JP KE KG KP KR KZ LK LR LS LT LU LV MA MD MG MK MW MX MZ NO NZ PH PL PT RO RU SE SG SI SK SL TJ TM TR TT TZ UA UZ VN YU ZA

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ UG ZW AM AZ BY KG KZ MD TJ TM AT BE CH CY DE DK ES FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP