
WO2000039713A1 - Method and system for performing electronic data-gathering across multiple data sources - Google Patents

Method and system for performing electronic data-gathering across multiple data sources

Info

Publication number
WO2000039713A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
gem
user
data object
metadata
Prior art date
Application number
PCT/US1999/030965
Other languages
English (en)
Inventor
Ashwin Gulati
William J. Blackburn
Original Assignee
Gemteq Software, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gemteq Software, Inc.
Priority to AU22170/00A (AU2217000A)
Publication of WO2000039713A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/93: Document management systems

Definitions

  • Patent Application Serial No. 60/160,639 "Method And System For Performing Electronic Data-Gathering Across Multiple Data Sources", by Ashwin Gulati and William J. Blackburn, filed October 20, 1999, which subject matter is incorporated herein by reference.
  • the present invention relates generally to electronic data-gathering, and more particularly, to the automatic capture, storage and classification of selected portions of electronic data.
  • Digital information may be collected from such varied sources as text and image files, network sources, and the Internet.
  • the current process of collecting such data involves a complex series of interactions with the source data itself, in addition to computer-based or manual filing systems, word processors, and other authoring tools and applications that make the process non-linear and difficult to manage.
  • a user might want to save a selected portion of information from a web page viewed using a web browser.
  • the user could save the complete web page; however, this would not only save the relevant information but would store the entire web page. Storing the entire web page would distract future users from focusing upon the relevant information within the web page. Such storing of extraneous information is undesirable in a research environment.
  • creation information refers to information about the actions relating to saving the original data, such as the identity of the system user, the date and time of storage, and the source document from which the data was taken. Attribution information refers to bibliographic information such as the original author, date of publication, etc.
  • No conventional system allows for the automatic capture, storage, and classification of less than an entire source document, including the acquisition and storage of creation and attribution metadata, packaged into a single, routable self-attributing format.
  • a system that will allow a researcher to acquire and use important pieces of data gleaned from electronic files of various types without stopping his or her work to interact with an additional application user-interface.
  • Such a system would provide a conduit for inserting research data into a system model directly while negating or delaying the need to interact with a view of the system.
  • the electronic data-gathering system of the present invention allows a user to easily capture and archive electronic data without the need to interact with an additional application user-interface. This streamlines the workflow of performing research, and also ensures that information is easily traceable to its original source.
  • the described embodiments of the present invention automatically encapsulate user-selected sets of electronic data with a set of attribution, creation, and user-defined metadata.
  • the system uses the captured data and metadata to create gem data objects. These gem data objects are then routed within the electronic data-gathering system.
  • the gem data objects may be stored on a persistent storage mechanism, and viewed or edited by a system user.
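  • The encapsulation described above can be pictured as a small record type. The following is a minimal sketch only: the class and field names (`GemDataObject`, `CreationInfo`, `AttributionInfo`) and the sample values are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class CreationInfo:
    """Who saved the data, when, and from which source document."""
    user: str
    saved_at: datetime
    source_document: str


@dataclass
class AttributionInfo:
    """Bibliographic details of the original work; blank when unknown."""
    author: str = ""
    publication_date: str = ""
    title: str = ""


@dataclass
class GemDataObject:
    """Captured data packaged together with its metadata (hypothetical layout)."""
    data: bytes
    data_type: str  # assumed labels such as "text", "graphics", "file", "url"
    creation: CreationInfo
    attribution: AttributionInfo = field(default_factory=AttributionInfo)
    user_metadata: Dict[str, str] = field(default_factory=dict)
    keywords: List[str] = field(default_factory=list)


# Illustrative values only.
gem = GemDataObject(
    data=b"Selected paragraph from a web page",
    data_type="text",
    creation=CreationInfo("researcher", datetime(1999, 12, 28, 9, 30),
                          "http://example.com/page"),
)
```

  • Keeping the creation and attribution metadata inside the same object as the data is what lets the package travel through the system as one unit.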
  • a user captures a set of electronic data via the system clipboard, drag and drop activity, manual type-in or a scanner.
  • the data is input into a data target, which does not necessitate opening an additional program view.
  • the user may optionally choose to input data using a system data viewer.
  • the research system automatically captures a set of metadata and creates an interim object by appending the set of metadata to the original captured data. This interim object is then passed to the main routine, where a gem data object is created.
  • the research system also includes a data viewer that allows a user to perform actions upon the gem data objects.
  • a user may perform a search for gem data objects using keywords or other attributes of gem data objects.
  • a user may view the gem object database in a hierarchical manner, and create a storage hierarchy by placing gem data objects in various containers within the database. Additionally, a user may edit gem data objects using the data viewer.
  • Fig. 1 is an example of an electronic data-gathering system in accordance with the present invention.
  • Fig. 2 is a flowchart of a research workflow method of the present invention.
  • Fig. 3 is a flowchart showing an example of a gathering phase for a research workflow method.
  • Fig. 4A is an illustration of an acquire data step using a data target as used in an embodiment of the present invention.
  • Fig. 4B is an illustration of an acquire data step using a gem data object graphical user interface view as used in an embodiment of the present invention.
  • Fig. 5 is a flowchart showing an example of a method for processing data into a gem data object.
  • Fig. 6 is a flowchart of an example of a method for creating an interim object.
  • Fig. 7 is a flowchart of an example of a method for creating a gem data object.
  • Fig. 8 is a diagram showing an example of a user interface window for a gem data object viewer with the "General" tab of the popup menu of a gem data object expanded.
  • Fig. 9A is a diagram showing an example of a user interface window for a gem data object viewer with the "Bibliography" tab of the popup menu of a gem data object expanded.
  • Fig. 9B is a diagram showing an example of the "Bibliography" tab of the popup menu of a gem data object expanded, with a scrolling menu displayed for the "Bibliography Format" option.
  • Fig. 10 is a diagram showing an example of a user interface window for a gem data object viewer with the "View" tab of the popup menu of a gem data object expanded.
  • Fig. 11 is a flowchart showing an example of a storage phase for a research workflow method.
  • Fig. 12 is a flowchart showing an example of a performing data actions phase for a research workflow method.
  • Fig. 13 is a flowchart showing an example of a data analysis process.
  • Fig. 14 is an example of a root node as used in an embodiment of the present invention.
  • Fig. 15 is an example of a container node as used in an embodiment of the present invention.
  • Fig. 16 is an example of a container node as used in an embodiment of the present invention.
  • Fig. 17 is an example of a gem data object as used in an embodiment of the present invention.
  • Fig. 18 is a diagram showing an example of a gem data object viewer with a "View" popup menu expanded to show a text type gem data object.
  • Fig. 19 is a diagram showing an example of a gem data object viewer with a "View" popup menu expanded to show a text type gem data object.
  • Fig. 20 is a diagram showing an example of a gem data object viewer with a "View" popup menu expanded to show a file type gem data object.
  • Fig. 21 is a diagram showing an example of a gem data object viewer with a "View" popup menu expanded to show a graphics type gem data object.
  • Fig. 22 is a diagram showing an example of a gem data object viewer with a "View" popup menu expanded to show a Uniform Resource Locator (URL) type gem data object.
  • Fig. 1 is an illustration of an embodiment of an electronic data-gathering system for use in research operations.
  • System 100 is used for creating, viewing, storing and using encapsulated packages of electronic data and metadata. These encapsulated packages of data and metadata are referred to herein as gem data objects.
  • gem data objects refers not only to a Gem from the eGems™ software by Gemteq Software, Inc. but also to other types of objects. Different formats for creating, storing, and using gem data objects will be evident to one of skill in the art.
  • The system of Fig. 1 is suitable for use with both data acquisition and data retrieval. First, the data acquisition functions will be discussed, followed by the data retrieval functions.
  • the data acquisition functions of the electronic data-gathering system of Fig. 1 will be presented by broadly tracing the path of a set of data that a user wishes to enter into the system.
  • electronic data must have a point of entry into a system 100.
  • the system 100 contains a data target 10 through which electronic data may be captured by the system 100.
  • electronic data may be captured using a data scanner 20.
  • the captured data is then passed to an analyzer/conduit 30, which receives the captured data, captures additional metadata, and passes the data and metadata to a client module 40.
  • the metadata captured by the analyzer/conduit 30 may be any data describing or referring to the original data.
  • the client module 40 now contains both the original data and the captured metadata.
  • different processing modules operate on the data and metadata to create a gem data object, an encapsulated package of data and metadata, capable of being routed throughout the system 100.
  • Specialized modules within the client module 40 operate on the gem data object and further enhance data acquisition, storage and usage.
  • a plain formatted text module 44, a graphics module 46, and an import/export module 56 make up the core functionality for acquiring and using basic image and text data, as well as packaging that data for transfer between databases.
  • the text module 44 is responsible for processing acquired text data sets.
  • Text data sets may be defined using a system clipboard, drag and drop activity, manual typing-in of data, speech recognition of audio data, or scanning-in of data combined with optical character recognition (OCR).
  • Processing text data includes recognizing the type of text, displaying it in a visual component, and performing editing operations upon it.
  • the graphics module 46 is responsible for processing acquired image data sets.
  • Image data sets may be defined using a system clipboard, drag and drop activity, or scanning-in of data. Processing includes recognizing the type of image, displaying it in a visual component, and performing editing operations upon it.
  • the import/export module 56 handles the transfer of data between databases. In one embodiment, this module resides in the client module 40, but it may also reside in the server module 70.
  • the import/export module 56 provides for the transfer of all or part of the current system database into a new or existing database or export file, as well as the transfer of some or all of a database or import file into the current system database.
  • a language module 42 is capable of performing in-line language translation upon the captured data to convert from text in one language to text in another language.
  • a speech-to-text module 48 allows for the translation of text data into audio data and vice versa.
  • a palm-top device synchronizer module 50 allows for the acquisition of data from palm-top devices during routine synchronization procedures, which such palm-top devices are capable of performing in conjunction with another computer.
  • An audio/visual module 52 allows for the capture, storage and retransmission of audio and video data.
  • An optical character recognition (OCR) module 54 allows for the conversion of scanned text into editable text data.
  • the specialized client modules shown in Fig. 1 are merely representative. Additional modules to perform specialized manipulations for gem data objects may be added to the client module 40, as will be evident to one of skill in the art.
  • a data viewer 62 allows the user to interact with the system 100 to manipulate raw electronic data and gem data objects in several ways. Such user interactions include, but are not limited to: capturing original data; gathering, refining, and editing gem data objects; classifying and storing gem data objects; and retrieving and using gem data objects.
  • the data viewer 62 provides a graphical user interface that displays all gem data object database information in a hierarchical format.
  • a data object encoder/decoder 58 packs information from the client module 40 for transporting to a server module 70. During data acquisition, the data object encoder/decoder 58 will package the newly created gem data object into a format suitable for transport. The data object encoder/decoder 58 also unpacks information received back from the server module 70.
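  • One way to picture the packing and unpacking performed by the encoder/decoder is a reversible serialization. The sketch below assumes a JSON wire format with base64-encoded payload bytes; the disclosure does not specify a format, and the function names `encode_gem` and `decode_gem` are hypothetical.

```python
import base64
import json


def encode_gem(gem: dict) -> bytes:
    """Pack a gem (a dict whose 'data' field holds bytes) for transport."""
    wire = dict(gem)
    # JSON cannot carry raw bytes, so the payload is base64-encoded first.
    wire["data"] = base64.b64encode(gem["data"]).decode("ascii")
    return json.dumps(wire, sort_keys=True).encode("utf-8")


def decode_gem(payload: bytes) -> dict:
    """Unpack a transported gem back into its original dict form."""
    wire = json.loads(payload.decode("utf-8"))
    wire["data"] = base64.b64decode(wire["data"])
    return wire


# Round trip with illustrative values.
original = {"name": "Future paper", "data_type": "text", "data": b"draft notes"}
restored = decode_gem(encode_gem(original))
```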
  • a system clipboard data manager 60 allows a user to cut and paste data. The clipboard data manager 60 handles interpreting a selected gem data object during copy and paste or drag and drop actions, and makes the selected data available to the requesting application.
  • encoded gem data object 64 is sent to a data transmission module 66 for transport to the server module 70.
  • the specific implementation of the data transmission module 66 will vary based upon the type of client-server architecture within the system 100. For example, for a local, single user installation, no transmission protocol is required, as data transmission may be done using standard in-process communication mechanisms. For a remote server module, some means of requesting data, or operations on data, and for receiving a response is required.
  • the data transmission module 66 can therefore use any number of different data transmission protocols, which are well known in the art.
  • the server module 70 receives the encoded gem data objects from the data transmission module 66.
  • the server module 70 encapsulates all data access to an underlying persistent storage mechanism 80.
  • the server module 70 can run in or out of process with the client module 40, and can run either locally, as would be the case with a single user version of the system, or remotely, as in a client-server implementation or over the Internet, via standard protocols.
  • a data object encoder/decoder 72 unpacks the transmission, in this example a newly created gem data object, received from the client module 40.
  • the server module 70 passes the transmission to a data object router 74, which processes client requests and routes data to the appropriate gem object database location.
  • the data object router 74 may route data based upon routing information received with a transmission, or may route according to pre-set rules.
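  • The two routing modes described above (explicit routing information versus pre-set rules) might be combined as follows. This is a sketch under stated assumptions: the rule table keyed by keyword, the fallback container name, and the function `route` are all hypothetical.

```python
from typing import Dict, List, Optional

# Assumed fallback location when neither routing info nor a rule applies.
DEFAULT_CONTAINER = "Unfiled"


def route(keywords: List[str],
          routing_info: Optional[str],
          rules: Dict[str, str]) -> str:
    """Return the name of the container a gem should be stored in."""
    # Routing information received with the transmission takes precedence.
    if routing_info:
        return routing_info
    # Otherwise apply the pre-set rules, here a keyword-to-container table.
    for keyword in keywords:
        container = rules.get(keyword.lower())
        if container is not None:
            return container
    return DEFAULT_CONTAINER


rules = {"patent": "PTO Website"}
```

  • For example, `route(["Trademark", "Patent"], None, rules)` selects the container named by the first matching rule, while a gem with no matching keyword falls through to the default.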
  • the server module 70 interacts with the persistent storage 80 to persist the data.
  • the newly created gem data object will thus be stored in the persistent storage 80 in a gem object database.
  • Various different embodiments of a gem object database will be evident to one of skill in the art.
  • the system 100 of Fig. 1 may also be used for data retrieval.
  • Data retrieval may include, but is not limited to, retrieving individual gem data objects and performing queries for information upon the gem object database.
  • the data viewer 62 provides a graphical user interface for viewing the contents of the gem object database and performing searches of the gem object database.
  • the data viewer 62 provides standard graphical means (such as drag and drop or copy and paste) for organizing data via the movement, addition, and deletion of categories and gem data objects.
  • the client module 40 receives the request from the data viewer 62.
  • the data object encoder/decoder 58 packages the request for transmission to the server module 70.
  • the encoded request is transmitted to the server module 70 via the data transmission module 66.
  • the server module 70 receives the request from the data transmission module 66, decodes it using the data object encoder/decoder 72, and submits it to a query processor 76.
  • the query processor 76 forms database queries based upon requests received from the client module 40 or generated locally within the server module 70. These queries are used to search the gem object database and respond to requests for database searches. The response is then encoded via the data object encoder/decoder 72 and retransmitted back to the client module 40. The client module 40 will display the request results to the user via the data viewer 62.
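  • Forming a database query from a client request could look like the sketch below. It assumes SQLite as the persistent store and a hypothetical `gems` table; the disclosure leaves the database open, and the sample rows are illustrative only.

```python
import sqlite3


def build_query(keyword=None, data_type=None):
    """Translate a search request into a parameterized SQL query."""
    clauses, params = [], []
    if keyword:
        clauses.append("(name LIKE ? OR description LIKE ?)")
        params += [f"%{keyword}%"] * 2
    if data_type:
        clauses.append("data_type = ?")
        params.append(data_type)
    sql = "SELECT name FROM gems"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY name", params


# Hypothetical schema and sample rows, held in memory for the example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE gems (name TEXT, description TEXT, data_type TEXT)")
db.executemany("INSERT INTO gems VALUES (?, ?, ?)", [
    ("Functions of Patent and Trademark Office", "text from PTO Website", "text"),
    ("Future paper", "text from notes", "text"),
    ("Office logo", "graphics from PTO Website", "graphics"),
])

sql, params = build_query(keyword="PTO", data_type="text")
results = [row[0] for row in db.execute(sql, params)]
```

  • Passing the search terms as parameters rather than splicing them into the SQL string keeps user input from altering the query itself.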
  • the query processor 76 and a data analysis processor 78 perform additional server module 70 tasks.
  • the data analysis processor 78 analyzes the gem object database stored within the system 100 to create general profiles of system 100 users, and to create specific profiles of topics of interest within the gem object database. The data derived from these profiles may then be formed into queries. These queries may be submitted to Internet search engines as well as to the query processor 76 to provide a user with Internet sites or local files of potential interest.
  • the various modules disclosed with respect to Fig. 1 are merely representative. The functionality of the various different modules may be combined into a non-modular electronic data-gathering system without departing from the inventive concepts disclosed herein.
  • the embodiment disclosed above is implemented in software.
  • the software is embodied in a computer readable medium, suitable for use with a computer system.
  • a computer system may comprise a plurality of processors combined with a data viewscreen; however, other embodiments will be evident to one of skill in the art.
  • Fig. 2 shows a flowchart of a research workflow method of the present invention.
  • the method of Fig. 2 is used with the electronic data-gathering system 100 of Fig. 1 to perform research operations.
  • In a gathering phase 210, important and relevant information is chosen and collected.
  • the information is stored in a storage phase 220.
  • the information may be retrieved, used, and manipulated in a performing data actions phase 230.
  • Although the process shown here is a linear one, the steps can be performed in any order. For example, a user could perform data actions upon the data before storage.
  • Fig. 3 shows further details of an embodiment of the gathering phase of the research workflow method for use with an electronic data-gathering system.
  • Data is first acquired in step 310, and then data is processed into a gem data object in step 320.
  • Figs. 4A and 4B show two different embodiments of the acquire data step.
  • Four different methods of selecting data are shown: interaction with the system clipboard 410; drag and drop activity 420; manual typing-in of data 430; and electronic scanning-in of data 440.
  • a system user may capture data electronically using any of these different methods.
  • a user may also alternate between the different methods for selecting data as convenience dictates.
  • a user may capture any kind of electronic data in the acquire data step.
  • the list of potential types of data that may be selected includes, but is not limited to: text type data, including Rich Text Format (RTF) and American Standard Code for Information Interchange (ASCII) text; graphics data, including Joint Photographic Expert Group (JPEG) and Graphics Interchange Format (GIF) type files; URL links, including World Wide Web (WWW) and Hyper Text Markup Language (HTML) links; and file links, including Object Linking and Embedding (OLE) type object links.
  • gem data objects may be created from the data types listed above.
  • gem data objects may be created from other types of data as will be evident to one of skill in the art.
  • Support for additional gem data object formats may be included by coding and installing the appropriate module. There is no limit on the type and number of potentially supported formats except for the disk space and memory of the hardware platform chosen by the user.
  • In Fig. 4A, these four different methods of selecting data 410, 420, 430 and 440 are used to input data into a data target 400.
  • the data target 400 is a free-floating, movable icon symbolizing the active research system.
  • the data target 400 eliminates the need for a user to open and interact with additional application windows, while still allowing the user to access key features of the system.
  • electronic data may be input directly into the data target without the need for an additional view window. In this case, the data being input is either pasted or dragged onto the data target 400.
  • the data target 400 provides a target for inputting data into the research system, and also provides a visual indication that the electronic data-gathering system is running. The user can add data to any level of the tree structure of the research system. If data is added to data target 400, the data is placed at a default location in the tree.
  • In Fig. 4B, the aforementioned methods of selecting data 410, 420, 430 and 440 are used to input data into a data viewer window 450.
  • the data target 400 has been expanded to allow a view into the research system.
  • the data viewer window 450 provides a user with an expanded view of the research system, while still allowing a user to work concurrently with other program applications.
  • Fig. 5 shows an embodiment of the method for processing the selected raw data into a gem data object.
  • the steps of acquiring metadata 510 and creating an interim object 520 are performed by the analyzer/conduit 30.
  • the metadata could be passed directly to the main routine without creating an interim object.
  • the main processing thread of either the client module 40 or the server module 70 is referred to herein as the main routine.
  • metadata is acquired in step 510.
  • This metadata is information about the original data obtained in the gathering phase.
  • metadata may include, but is not limited to, information about the source of the original data, such as: a Web page address where the original data was found; the filename of the source of the data; or the date the data was copied.
  • the metadata may also include attribution data required to format a proper bibliography, such as: the original author; date of publication; or specific location or page number where the original data was located within the source document.
  • the metadata is used to create an interim object in step 520.
  • This interim object is then passed to the main routine via a capture event 530.
  • the main routine is then used in step 540 to create a gem data object from the interim object.
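  • The hand-off from the analyzer/conduit to the main routine can be sketched with an in-process queue standing in for the capture event. The queue, the dictionary layout of the interim object, and the field names of the empty gem are assumptions made for illustration.

```python
import queue

# Stand-in for the capture event channel between the two components.
capture_events = queue.Queue()


def analyzer_conduit(captured_data, metadata):
    """Append metadata to the captured data and signal the main routine."""
    interim = {"data": captured_data, "metadata": dict(metadata)}
    capture_events.put(interim)


def main_routine():
    """Retrieve the interim object and populate an empty gem from it."""
    interim = capture_events.get_nowait()
    gem = {"data": None, "name": "", "source": ""}  # hypothetical empty gem
    gem["data"] = interim["data"]
    for key, value in interim["metadata"].items():
        if key in gem and key != "data":  # fill only fields the gem defines
            gem[key] = value
    return gem


analyzer_conduit("captured text",
                 {"name": "captured text", "source": "http://example.com"})
gem = main_routine()
```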
  • Fig. 6 shows an embodiment of the method of operation for the analyzer/conduit 30 to create an interim object.
  • the steps shown in Fig. 6 encompass steps 510 and 520 from Fig. 5.
  • Steps 610-660 all relate to acquiring metadata and may be performed in any order. Not all steps will be performed in all embodiments.
  • In step 610, user data is added to the set of metadata being collected.
  • the operating system (OS) Application Program Interface (API) is used to acquire the user data, which may include: information about the system user who is collecting data; the name or designation of the machine being used; and the system date and time.
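  • In Python, the user data listed above can be gathered with standard-library calls that wrap the operating system interfaces. The function name and dictionary keys below are hypothetical; the sketch leaves the user name blank if the platform cannot supply one.

```python
import getpass
import platform
from datetime import datetime


def collect_user_metadata() -> dict:
    """Gather the system user, machine name, and current date and time."""
    try:
        user = getpass.getuser()
    except Exception:
        user = ""  # no user name available in this environment
    return {
        "user": user,
        "machine": platform.node(),
        "captured_at": datetime.now().isoformat(timespec="seconds"),
    }


user_metadata = collect_user_metadata()
```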
  • In step 620, source data is added to the metadata.
  • the OS API is used to determine the source application for the original data as well as the designation of the document providing the original data. For example, if the original data was a web page, the source application might be a web browser while the document designation would be a URL. However, in certain situations this information may not be available. In such cases, a best-guess routine is used which attempts to determine the application name and document name providing the data. The routine makes a guess based upon the application window that was last active before the user input data into the electronic data-gathering system.
  • the captured data is scanned to determine such information as the author, the publication date, the original document title, and other attribution data. When this data is not available, a best-guess routine is used which suggests a value from the captured data, for example "meta" tags in HTML files, or embedded tags in RTF files.
  • the original source data may also be scanned to determine a best guess for this information. If no data is available to make a best guess, the field is left blank.
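  • For HTML input, a best guess at attribution data can be read from "meta" tags and the page title. The following sketch uses Python's standard `html.parser`; the choice of the `author` and `date` meta names is an assumption, since pages vary in which tags they provide, and missing fields are left blank as described above.

```python
from html.parser import HTMLParser


class MetaTagScanner(HTMLParser):
    """Collect <meta name=... content=...> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def guess_attribution(html: str) -> dict:
    """Suggest attribution values; fields with no evidence stay blank."""
    scanner = MetaTagScanner()
    scanner.feed(html)
    return {
        "author": scanner.meta.get("author", ""),
        "publication_date": scanner.meta.get("date", ""),
        "title": scanner.title.strip(),
    }


page = ('<html><head><title>Sample Page</title>'
        '<meta name="Author" content="J. Doe"></head>'
        '<body>text</body></html>')
attribution = guess_attribution(page)
```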
  • a default name and description are added in step 640.
  • the captured data is scanned to suggest a default name for the gem data object that will be created.
  • This name is typically derived from the first 3-4 words of captured text, but other implementations will be evident to one of skill in the art.
  • the description will default to "<data type> from <source document>".
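  • The default name and description can be derived in a few lines. This sketch assumes a four-word limit for the name (the text says three to four words) and uses hypothetical function names.

```python
def default_name(text: str, max_words: int = 4) -> str:
    """Suggest a name from the first few words of the captured text."""
    return " ".join(text.split()[:max_words])


def default_description(data_type: str, source_document: str) -> str:
    """Build the default description from the data type and source."""
    return f"{data_type} from {source_document}"


name = default_name("The Patent and Trademark Office examines applications")
description = default_description("text", "PTO Website")
```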
  • a keyword scan is performed in step 650. Any text associated with the captured data is scanned and separated into individual words. A dictionary of "small words" is applied to discard words that do not make meaningful keywords. The resulting collection of keywords may be used for multiple purposes, such as suggesting related search terms or routing the resulting gem data object after it is created.
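  • A keyword scan of this kind amounts to tokenizing the text and filtering it against a stop list. The sketch below is one possible reading: the contents of `SMALL_WORDS` are an illustrative sample, not the dictionary used by the described system, and duplicates are dropped while preserving first-seen order.

```python
import re

# Illustrative sample of "small words"; a real dictionary would be larger.
SMALL_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is",
               "for", "on", "with", "by", "or", "that"}


def scan_keywords(text: str) -> list:
    """Split text into words and discard small words and duplicates."""
    seen, keywords = set(), []
    for word in re.findall(r"[A-Za-z0-9']+", text.lower()):
        if word not in SMALL_WORDS and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords


keywords = scan_keywords("The functions of the Patent and Trademark Office")
```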
  • the data is scanned for embedded URLs in step 660. Any embedded URLs found are stored and may be used for multiple purposes, such as suggesting related web links or routing the resulting gem data object after it is created.
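  • Embedded URLs can be located with a regular expression. The pattern below is a deliberately simple assumption covering http, https, and ftp links; it is a sketch, not a full URL grammar, and trailing sentence punctuation is trimmed from each match.

```python
import re

# Simplified pattern: scheme, "://", then anything up to whitespace,
# angle brackets, or quotes.
URL_PATTERN = re.compile(r"""(?:https?|ftp)://[^\s<>"']+""", re.IGNORECASE)


def find_embedded_urls(text: str) -> list:
    """Return the distinct URLs found in text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        if url not in urls:
            urls.append(url)
    return urls


sample = ('See http://www.uspto.gov/ and '
          '<a href="http://example.com/a.html">link</a>. '
          'Also http://www.uspto.gov/.')
urls = find_embedded_urls(sample)
```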
  • Interim object packing is performed in step 670.
  • the collection of captured data and metadata is packaged into an interim object that may be passed to the main routine for further processing. Not all embodiments perform this step.
  • Fig. 6 shows only one embodiment of the various types of metadata that may be captured by the electronic data-gathering system.
  • metadata is initially captured automatically by the electronic data-gathering system.
  • a user may also add metadata to a gem data object.
  • the user of the data-gathering system may also decide not to capture certain types of metadata disclosed herein without departing from the scope of the present invention.
  • Fig. 7 is a flowchart of a method to create a gem data object from captured data and metadata.
  • the steps shown in Fig. 7 encompass step 540 of Fig. 5.
  • the steps of Fig. 7 are performed by the main routine.
  • the analyzer/conduit 30 notifies the main routine of available new data via an event notice. This prompts the main routine to retrieve the interim object in step 720.
  • the main routine also retrieves an empty gem data object from the server module in step 730.
  • the main routine examines the captured data in order to recognize the data type, and applies the client module or modules corresponding to that data type to the captured data in step 740.
  • the main routine populates the fields of the empty gem data object with the data from the interim object.
  • the resulting gem data object is an implementation of the encapsulated original data and metadata along with the functionality required to use, view, and manipulate the captured data.
  • the gem data object is a single shareable package of data and metadata.
  • This gem data object package is a "self-attributing" unit - it contains bibliographic information as well as information about how an appropriate bibliography, footnote, or caption should be formatted. This enables the gem data object to generate its own bibliography or other identifying caption about itself.
  • Figs. 8, 9, and 10 are diagrams showing an embodiment of a user interface for the electronic data-gathering research system.
  • the user interface provides one or more views into the research system model and displays information about the gem data objects within the system.
  • the views also provide a user interface for adding, modifying, or deleting the original data or metadata encapsulated by the gem data object.
  • the windows shown in Figs. 8, 9, and 10 are merely representative of the type of information that may be displayed to describe a gem data object.
  • a window 810 is a "Gem View" window providing a hierarchical view of the stored gem data objects within the system.
  • the individual gem data objects may be organized in a hierarchical manner by grouping them into containers.
  • Each container is a grouping of individual gem data objects, or a grouping of containers.
  • This hierarchical storage method provides a convenient way for a user to organize the gem data objects within the research system.
  • "PTO Website" constitutes a container 812.
  • "Unfiled" constitutes another container 816.
  • a gem data object “Functions of Patent and Trademark Office” 814 is stored in the container 812, while a gem data object “Future paper” 818 is stored in the container 816.
  • the gem data object "Functions of Patent and Trademark Office" 814 has a data type of "text".
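  • The container hierarchy described above is a tree whose inner nodes are containers and whose leaves are gem data objects. The sketch below models it with two hypothetical classes and renders an indented view; the root name "Gem View" is borrowed from the window title purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Gem:
    """Leaf node: a single gem data object, identified here by name only."""
    name: str


@dataclass
class Container:
    """Inner node: a grouping of gems and/or other containers."""
    name: str
    children: List[Union["Container", Gem]] = field(default_factory=list)


def render(node, depth=0):
    """Return the tree as a list of lines indented by nesting depth."""
    lines = ["  " * depth + node.name]
    if isinstance(node, Container):
        for child in node.children:
            lines.extend(render(child, depth + 1))
    return lines


tree = Container("Gem View", [
    Container("PTO Website", [Gem("Functions of Patent and Trademark Office")]),
    Container("Unfiled", [Gem("Future paper")]),
])
outline = render(tree)
```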
  • a window 820 shows an expanded view of a popup window for the gem data object "Functions of Patent and Trademark Office" 814.
  • the tab 822 displays information regarding the gem data object's name and descriptive notes regarding the gem data object.
  • the tab 822 also displays the container in which the gem data object is located, as well as information about the original source application for the data contained in the gem data object.
  • Tab 822 provides a user interface for viewing, adding, deleting, or modifying the displayed information regarding a gem data object.
  • Fig. 9A contains the same "Gem View" window 810 shown in Fig. 8, which provides a hierarchical view of the stored gem data objects within the system.
  • a popup window 920 for the gem data object 814 has a "Bibliography" tab 922 selected.
  • the tab 922 displays bibliographic information regarding the original source data, such as: the author's name; publication; the place published; the edition or volume; and the page numbers.
  • the tab 922 also displays the original article title and date of publication, as well as the URL of the source document.
  • Tab 922 provides a user interface for adding, deleting, or modifying the displayed information regarding a gem data object.
  • Fig. 9B shows a second view of the popup window 920 with the "Bibliography" tab 922 selected.
  • a scrolling menu 924 is displayed for the "Bibliography Format" option.
  • the scrolling menu 924 allows a user to select from various different types of standard bibliographic formats for citing printed and electronic resources.
  • the bibliography format field of the gem data object contains an integer value code representing the chosen type of bibliography format.
  • the bibliography entry itself is composed "on-the-fly" when requested by the user.
  • the user can request the insertion of a bibliographic entry, footnote, or other type of caption by using drag and drop or copy and paste interactions.
  • a method of the gem data object is invoked wherein the format code is evaluated.
  • the data fields corresponding to source information (such as author, publication, etc.) are then strung together using the appropriate formatting (such as italics, quotation marks, etc.) and made available to the target application via the system clipboard.
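  • The on-the-fly composition described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the integer codes, the `SourceInfo` field names, and the two output layouts are assumptions made for the example, and asterisks stand in for italics.

```python
from dataclasses import dataclass

# Hypothetical codes: the text says only that the format field holds an integer.
FORMAT_BOOK_STYLE = 1
FORMAT_PLAIN = 2


@dataclass
class SourceInfo:
    """Attribution fields of a gem data object (names are assumptions)."""
    author: str
    title: str
    publication: str
    place: str
    date: str


def compose_bibliography(format_code: int, src: SourceInfo) -> str:
    """Evaluate the format code and string the source fields together.

    Nothing is cached: the entry is rebuilt each time it is requested.
    """
    if format_code == FORMAT_BOOK_STYLE:
        # Asterisks mark the span a real target application would italicize.
        return f"{src.author}. *{src.title}*. {src.place}: {src.publication}, {src.date}."
    if format_code == FORMAT_PLAIN:
        return f'{src.author}, "{src.title}", {src.publication} ({src.date})'
    raise ValueError(f"unknown bibliography format code: {format_code}")
```

  • Because only the code is stored, changing the selected format changes every entry composed afterwards without rewriting any stored text.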
  • Fig. 10 contains the same window 810 showing a "Gem View” window providing a hierarchical view of the stored gem data objects within the system as shown in Figs. 8 and 9A.
  • a popup window 1020 for the gem data object 814 has a "View" tab 1022 selected.
  • the tab 1022 provides a view of the original captured data.
  • Tab 1022 also provides a user interface for adding, deleting, or modifying the data contained in the gem data object.
  • the user can perform standard editing operations such as changing the font, cutting and pasting, justifying the text, etc.
  • After a gem data object is created in the gathering phase, it is stored so that it can be used in future research. Multiple different types of processes may be performed using the accumulated stored data. For example, individual gem data objects may be retrieved, the storage database can be reordered or reorganized, and the storage database can be searched. The actual type of persistent storage system used is masked from the underlying workings of the system by the server component. The role of persistent storage may thus be implemented in any of a number of commercially available database systems, as will be evident to one of skill in the art.
  • Fig. 11 is a flowchart of a storage phase for a research workflow method.
  • Fig. 11 relates to the initial storage of a new gem data object through interaction between the client module 40, the server module 70, and the persistent storage module 80 as shown in Fig. 1.
  • A client/server architecture may be used in the research workflow method of the present invention.
  • The server module could be local to the client module.
  • Alternatively, a more distributed client/server system could be used, wherein the server module is remote from the client module.
  • Although particular client/server architectures are presented herein, the present invention encompasses all of the various types of client/server architectures suitable for use with the present invention.
  • the gem data object is packaged by the client module 40 for transmission to the server module 70.
  • the raw data comprising the gem data object may first be compressed in step 1112.
  • the gem data object is encoded in step 1114, where it is converted into a form suitable for transfer over existing network protocols.
  • extensible Markup Language (XML) is used to encode the gem data
  • the XML-based implementation would contain a method call requesting that the gem data object be saved, information about the database in which to save it, and other required parameters.
  • the encoded data is then transmitted in step 1120 from the client module 40 to the server module 70.
  • the client module 40 contacts the server module 70 on a well-known port using Hyper Text Transfer Protocol (HTTP) and Java
  • Upon establishing a successful connection, the client module 40 transmits the XML-RPC method call containing the XML-encoded gem data object.
  • the server module 70 then decodes the gem data object in step 1132, and decompresses the gem data object in step 1134 if the gem data object was compressed in step
  • the XML tags are parsed and the transmitted gem data object is rebuilt and forwarded for routing, along with passed parameters such as the name of the requested storage database.
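  • The compress, encode, decode, and decompress sequence can be sketched with Python's standard `xmlrpc.client` and `zlib` modules. This is only an illustration under stated assumptions: the method name `gem.save`, the struct keys, and the use of zlib are hypothetical, and the HTTP transmission step is omitted.

```python
import xmlrpc.client
import zlib


def package_gem(name: str, raw: bytes, database: str, compress: bool = True) -> str:
    """Client side: optionally compress the raw data, then encode a method call."""
    payload = zlib.compress(raw) if compress else raw
    gem = {
        "name": name,
        "compressed": compress,                 # lets the server know whether to inflate
        "data": xmlrpc.client.Binary(payload),  # base64-encoded inside the XML
    }
    # "gem.save" is a hypothetical method name; the database name is passed
    # as a second parameter alongside the gem itself.
    return xmlrpc.client.dumps((gem, database), methodname="gem.save")


def unpack_gem(request_xml: str):
    """Server side: parse the XML, rebuild the gem, and decompress if needed."""
    params, method = xmlrpc.client.loads(request_xml, use_builtin_types=True)
    gem, database = params
    data = gem["data"]
    if gem["compressed"]:
        data = zlib.decompress(data)
    return method, database, gem["name"], data
```

  • In a full client, the string returned by `package_gem` would be the body of the HTTP request; the rebuilt gem and the database name would then be handed to the routing step.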
  • the gem data object is routed to a storage location.
  • the transmitted gem data object may already have a pre-assigned database location. If so, this information is passed along with the gem data object and used for routing purposes.
  • the server module 70 can apply pre-set rules that have been configured on the server module 70 to the gem data object.
  • pre-set rules can be developed for routing based upon any type of data or metadata contained within the gem data object. For example, if a destination location within the database is not supplied, a rule might exist which routes all data containing a particular phrase to a particular location. Such a rule could instruct all gem data objects containing the phrase "trademark" to be placed into the "Patent and Trademark Office" container. In another example, if the gem data object comes from a source, such as a URL, that is known to provide offensive material, the data could be discarded instead of being stored.
  • the pre-set rules on the server module may also relate to adding or modifying certain metadata associated with gem data objects. Such rules would allow a user to perform large amounts of data gathering without having to interact with each individual data set as it is collected. For example, a user could create a rule to apply a default bibliography to a gem data object for a certain source author.
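  • A rough sketch of such pre-set rules follows. The `Gem` fields, the blocked host, the author name, and the order in which the rules are tried are all assumptions chosen for illustration; only the "trademark" example and the idea of a default bibliography per author come from the description, and the "Unfiled" fallback is the default container the description uses when nothing else applies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Gem:
    name: str
    text: str
    author: str = ""
    source_url: str = ""
    container: Optional[str] = None      # pre-assigned location, if any
    metadata: dict = field(default_factory=dict)


BLOCKED_HOSTS = {"offensive.example"}                    # hypothetical block list
DEFAULT_BIBLIOGRAPHY = {"Example Author": "MLABook"}     # hypothetical per-author default


def route(gem: Gem) -> Optional[str]:
    """Return the destination container, or None if the gem is discarded."""
    if any(host in gem.source_url for host in BLOCKED_HOSTS):
        return None                       # discard instead of storing
    if gem.container:
        return gem.container              # a pre-assigned location wins
    if "trademark" in gem.text.lower():
        return "Patent and Trademark Office"
    return "Unfiled"


def apply_metadata_rules(gem: Gem) -> None:
    """Fill in metadata the user did not supply, based on the source author."""
    default = DEFAULT_BIBLIOGRAPHY.get(gem.author)
    if default and "bibliography_format" not in gem.metadata:
        gem.metadata["bibliography_format"] = default
```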
  • the gem data object is stored in step 1140. As noted above, this may be accomplished using a variety of different types of persistent storage 80.
  • individual gem data objects may be retrieved, modified, deleted, or reorganized.
  • Individual gem data objects may also be copied or cross-referenced to other locations within the database.
  • a cross-reference is a pointer to a gem data object that is located elsewhere within the database.
  • the gem object database may be searched, sorted, or reorganized.
  • the contents of the gem object database may be displayed, for example in a hierarchical format showing the location of gem data objects within different containers.
  • Fig. 12 is a flowchart of a performing data actions phase for a research workflow method.
  • these actions involve interactions between the client module 40 and the server module 70.
  • the client module 40 is responsible for identifying data sets, performing a first-pass analysis of the data, presenting a user interface to the user, and requesting single gem data objects or groups of objects according to search and browse interactions with the user.
  • the server module 70 responds to search and browse requests, modifies data, processes detailed data analysis, and performs automated data routing where applicable.
  • For example, if a system user drags a gem data object from the data viewer 62 to a word processing program, the client module 40 generates a request to the server module 70 to retrieve the requested gem data object. The server module 70 responds by providing the gem data object or returning an error message. The client module 40, via the system clipboard data manager 60, takes the gem data object and evaluates its data. The client module 40 then makes the data available to the word processing application via the system clipboard.
  • In step 1210, a request is received from the client module 40.
  • the request is decoded in step 1220 so that it can be analyzed and processed.
  • the request is checked to see if the instructions are to add a new gem data object to the gem object database (step 1230). If yes, the request is then checked to determine if a filing preference has been included to route the new gem data object within the gem object database (step 1232). If no filing preference is included, the server module 70 checks to see if there are pre-set rules which determine the routing of the gem data object (step 1236). The gem data object is then routed according to the included filing preferences or pre-set rules (step 1234).
  • the new gem data object will be routed to the "Unfiled" category or gem data object container (step 1238).
  • the request is checked to see if a data update is requested (step 1240). If yes, the server module 70 processes the request to update data by either editing or deleting data (step 1242). Examples of a data update request include modifying an existing gem data object, or cross-referencing or copying a gem data object to another location within the gem object database.
  • the request is an information-only query, which is processed by the server module 70 as shown in step 1250.
  • Examples of an information-only query include expanding a node in a hierarchical depiction of the gem object database to view the contents of a particular gem data object container, or requesting a report on the database contents.
  • a user could request a report based upon a search of the gem object database for a particular keyword or gem data object field.
  • Upon completion of the request, the server module 70 encodes the response in step 1260 and returns the response to the client module 40 in step 1270.
  • A variety of different implementations may be used to encode and process client module 40 requests.
  • One embodiment is an XML-RPC method call, but other embodiments will be evident to one of skill in the art.
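  • The branching of Fig. 12 can be sketched as a single dispatch function. The request layout (`kind`, `filing_preference`, and so on) and the dictionary used as the gem object database are assumptions for illustration; decoding (step 1220) and encoding (step 1260) are left out so that only the decision flow is shown.

```python
from typing import Callable, Iterable, Optional

# A rule inspects a gem and returns a container name, or None if it does not apply.
Rule = Callable[[dict], Optional[str]]


def handle_request(request: dict, db: dict, rules: Iterable[Rule] = ()) -> dict:
    """Process one decoded client request against an in-memory database."""
    kind = request.get("kind")

    if kind == "add":                                        # step 1230
        gem = request["gem"]
        container = request.get("filing_preference")         # step 1232
        if container is None:                                # step 1236
            container = next((c for c in (rule(gem) for rule in rules) if c), None)
        if container is None:                                # step 1238
            container = "Unfiled"
        db.setdefault(container, []).append(gem)             # step 1234
        return {"status": "ok", "container": container}

    if kind == "update":                                     # steps 1240-1242
        items = db[request["container"]]
        if request.get("delete"):
            items.pop(request["index"])
        else:
            items[request["index"]].update(request["changes"])
        return {"status": "ok"}

    # Anything else is treated as an information-only query (step 1250):
    # here, listing the contents of one container.
    names = [g["name"] for g in db.get(request.get("container", ""), [])]
    return {"status": "ok", "items": names}
```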
  • the server module 70 responds to requests for information from the client module 40 by querying the gem object database. Additionally, the server module 70 may perform background tasks to aid the user of the electronic data-gathering system in identifying other potential sources of research information.
  • a data analysis process is performed by the data analysis processor 78 as a background server module 70 task. The purpose of the data analysis process is to analyze the user of the electronic data-gathering system, as well as the user's topics of interest, to make suggestions for further data gathering which may be forwarded to the client module 40.
  • the data analysis process performs two types of analysis passes. First, a general profile of the user of the electronic data-gathering system is created. Second, specific profiles are created for one or more topics of interest. If multiple users exist for the gem object database, the data analysis process will create individual user-specific profiles, and may additionally create profiles for topics of interest that have multiple users.
  • Creating a user profile is performed by deriving generalizations about the type of data stored by the user.
  • Such an analysis may include, but is not limited to: the naming and depth of storage containers and sub-containers; the keywords associated with gem data objects; and the sources for the gem data objects.
  • the data so obtained may then be compared against internally maintained demographics tables to determine the system user's most likely occupation, level of computer sophistication, areas of interest, most valued information sources, and various other attributes.
  • the data may also be compared against databases provided by external sources.
  • the data analysis process uses a similar technique to focus in on a specific topic of interest within the gem object database.
  • the data analysis process compares the text of the gem data objects against stored tables of word roots. This process is most effective for areas of expertise that are well defined. For example, medical, legal, and technology professionals share vocabularies that can be distilled into word roots. The presence of these roots within the captured data can reveal primary and secondary occupations or areas of interest.
  • the data analysis process uses the user profile and topical profiles to generate additional potential areas of research for the system user. Queries based upon the profiles are submitted to Internet search engines to generate Web site addresses of potential interest. Queries are also submitted to the query processor 76 of the server module 70, which queries the local gem object database and any other local files systems for additional gem data objects or files of potential interest. The queries may also suggest other users of the system with similar interests or research topics, thereby fostering collaboration between users who may not know each other. For example, the data analysis process may reveal that a particular user has created a gem data object collection with the following hierarchy of containers:
  • the data analysis process would infer that: the system user is a lawyer or legal scholar who is currently interested in Constitutional and Patent law; the user has a medium to high computer sophistication; the user may be shopping for a new DVD player; and the user may take a vacation to a tropical island in the near future. Based upon this information, the data analysis process would submit a query such as "tropical island vacation" to one or more Internet search engines and return a list of the top ten resulting sites as a suggestion to the system user for future research. The data analysis processor 78 would also submit a query to the local query processor 76 to check local files and gem object databases for information regarding tropical island vacations. The data analysis process would perform similar queries for the other topical areas of research found.
  • In step 1300, a pass through all data containers is made to search for keywords and phrases. Text labels are parsed, and small words are discarded.
  • In step 1310, a data pass is made through all individual gem data objects to search for keywords and phrases. The data is parsed and small words are again discarded.
  • In step 1320, a count is made of the most used data sources, such as URLs, local files, and network or system files. Steps 1300-1320 may be performed in any order.
  • In step 1340, a comparison is made between the information obtained in steps 1300-1320 and a stored database of word roots 1330.
  • the stored database of word roots provides an association between common data found in steps 1300-1320 and particular professions or fields of study.
  • the root "ombro" is shown in the database of word roots 1330 as being associated with medicine and psychology.
  • The comparison of step 1340 generates additional information.
  • a user profile 1350 is generated, containing: an occupation and/or topics of research; favored data sources; and suggested search queries.
  • In step 1360, these suggested types of search queries are submitted and the results evaluated.
  • In step 1370, additional gem data objects are created out of the results of the search query.
  • In step 1380, the newly-created gem data objects are placed in a container within the user's gem object database.
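  • The keyword passes, source count, and word-root comparison can be sketched as below. The root table, the four-letter cutoff for "small words", and the shape of the returned profile are assumptions for illustration; only the "ombro" entry mirrors the example in the text. Submitting the suggested queries to a search engine (step 1360) is deliberately left out.

```python
import re
from collections import Counter

# Hypothetical table of word roots 1330; only "ombro" comes from the description.
WORD_ROOTS = {
    "ombro": ["medicine", "psychology"],
    "patent": ["law"],
    "constitution": ["law"],
}
MIN_WORD_LENGTH = 4  # assumed threshold below which a word counts as "small"


def keywords(texts) -> Counter:
    """Count the words in the given texts, discarding small words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if len(word) >= MIN_WORD_LENGTH:
                counts[word] += 1
    return counts


def build_profile(container_labels, gem_texts, sources) -> dict:
    """Combine the passes of steps 1300-1320 and compare against the roots (step 1340)."""
    words = keywords(container_labels) + keywords(gem_texts)
    fields = Counter()
    for word, n in words.items():
        for root, areas in WORD_ROOTS.items():
            if root in word:
                for area in areas:
                    fields[area] += n
    return {
        "topics": [area for area, _ in fields.most_common()],
        "favored_sources": [s for s, _ in Counter(sources).most_common(3)],
        "suggested_queries": [w for w, _ in words.most_common(3)],
    }
```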
  • Figs. 14-17 show one embodiment of a gem data object formatting scheme.
  • This gem data object scheme may be used to implement a client module and a server module that operate upon the gem data objects and allow users to interact with a gem object database.
  • the coding and formatting scheme for implementing the electronic data-gathering system is not limited to the embodiments disclosed herein. Other formatting methods will be evident to one of skill in the art.
  • the gem data objects in the scheme shown in Figs. 14-17 are organized in a hierarchical "tree" structure.
  • There is a "root" node that contains the tree structure within the gem object database. All members of the tree structure are part of this root node.
  • There are container objects that represent inner nodes and empty nodes of the tree structure.
  • The "leaf" nodes of the tree structure are represented by individual items, corresponding to individual gem data objects.
  • the example structure of Figs. 14-17 is directed towards a data tree that contains information about scary books.
  • Fig. 14 shows an example of a root node.
  • a root node may contain only other containers.
  • the root created in Fig. 14 is named "Examples", and becomes the root node of a tree containing examples of scary books.
  • The Uniform Resource Identifier (URI) for this root node is "gemserver.gemteq.com." This URI allows the client and server modules to locate the root node.
  • Fig. 15 shows an example of a container node.
  • a container node may contain other containers or individual gem data objects, or both.
  • the container of Fig. 15 is named "ScaryBooks.”
  • the "ScaryBooks" container has a URI of "gemserver.gemteq.com/Examples.” This URI indicates the "ScaryBooks" container is located in the "Examples" root node.
  • Fig. 16 shows another example of a container.
  • the container of Fig. 16 is named “StephenKing.”
  • This container has a URI of "gemserver.gemteq.com/Examples/ScaryBooks", as given in line 1609.
  • This URI indicates the "StephenKing" container is located in the "ScaryBooks" container in the "Examples” root node.
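  • The relationship between a node's name and its URI in Figs. 14-16 can be sketched as a small tree in which each node records the location of its parent. The `Node` class and its `add` method are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A root or container node; `uri` is where the node is located."""
    name: str
    uri: str
    children: list = field(default_factory=list)

    def add(self, name: str) -> "Node":
        """Create a child whose URI is this node's URI plus this node's name."""
        child = Node(name, f"{self.uri}/{self.name}")
        self.children.append(child)
        return child


# Rebuild the example hierarchy from the figures.
root = Node("Examples", "gemserver.gemteq.com")
scary_books = root.add("ScaryBooks")
stephen_king = scary_books.add("StephenKing")
```

  • With this convention, `scary_books.uri` is "gemserver.gemteq.com/Examples" and `stephen_king.uri` is "gemserver.gemteq.com/Examples/ScaryBooks", matching the values quoted above.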
  • Fig. 17 shows an example of an individual item.
  • the item of Fig. 17 is named
  • a client would use the above example of a tree structure to find and route gem data objects.
  • the client module 40 would parse the URI of the "Carrie” item shown in Fig. 17 in the following way.
  • the server module 70 would be indicated as "gemserver.gemteq.com”.
  • the data tree would be given by the root node “Examples”.
  • the path would be given by the two container nodes "ScaryBooks/StephenKing”.
  • the actual gem data object would be the item "Carrie.”
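  • The parsing just described can be sketched as follows. The example assumes, for illustration only, that the item's full location is written as a single slash-separated string ending in the item name; `GemLocator` and `parse_gem_uri` are hypothetical names.

```python
from typing import List, NamedTuple


class GemLocator(NamedTuple):
    server: str      # host of the server module
    tree: str        # root node naming the data tree
    path: List[str]  # container nodes between the root and the item
    item: str        # the gem data object itself


def parse_gem_uri(uri: str) -> GemLocator:
    """Split a slash-separated location into server, tree, path, and item."""
    parts = [p for p in uri.split("/") if p]
    # A root may contain only containers, so an item needs at least one
    # container between the root and itself: server/tree/container/item.
    if len(parts) < 4:
        raise ValueError(f"not a complete item location: {uri!r}")
    return GemLocator(parts[0], parts[1], parts[2:-1], parts[-1])


locator = parse_gem_uri("gemserver.gemteq.com/Examples/ScaryBooks/StephenKing/Carrie")
```

  • Here `locator.server` is "gemserver.gemteq.com", `locator.tree` is "Examples", `locator.path` is `["ScaryBooks", "StephenKing"]`, and `locator.item` is "Carrie".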
  • the client module 40 would make a remote procedure call (RPC) to the server module 70 over standard HTTP to request the "Carrie" gem data object.
  • the server module 70 would then reply with a standard HTTP response that contains the XML-RPC response that in turn contains the "Carrie” gem data object requested.
  • the XML portion of the response may be of many different types to represent the success or failure of the RPC call.
  • a gem data object "Carrie” as shown in Fig. 17 demonstrates that a gem data object is an encapsulation of the original electronic data combined with creation, attribution, and user-selected metadata.
  • Line 1703 of the "Carrie” gem data object contains user-selected comments about the gem data object.
  • Lines 1704-1710 of the "Carrie” gem data object contain metadata about the creation of the original gem data object: when the gem data object was created (line 1704); who created the gem data object (lines 1705-1708); and when the gem data object was modified (line 1710).
  • Lines 1711-1722 contain attribution metadata as well as information about how a bibliography for the "Carrie" gem data object should be formatted. For example, line 1711 gives the bibliography format "MLABook", which corresponds to one of the bibliography format choices from the popup menu 924 of Fig. 9B.
  • Several different types of gem data objects are shown in Figs. 18-22.
  • the overall structure for a gem data object will remain substantially the same regardless of the type of original data contained in the gem data object. Examples of data types that may comprise the original data have been discussed previously with regard to Figs. 4A and 4B.
  • Each type of gem data object will contain the original data encapsulated with creation, attribution, and user-selected metadata. This allows for consistency in the use of gem data objects, and also allows new types of gem data objects to be easily incorporated into the gem object database.
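  • One way to picture this common structure is a single record type whose payload varies while the metadata groups stay fixed. The class and field names below are assumptions for illustration and do not reproduce the XML layout of Fig. 17.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class CreationMetadata:
    created: datetime
    created_by: str
    modified: Optional[datetime] = None


@dataclass
class AttributionMetadata:
    bibliography_format: str = ""   # e.g. "MLABook" in the example of Fig. 17
    author: str = ""
    title: str = ""
    source_url: str = ""


@dataclass
class GemDataObject:
    """Original data wrapped with creation, attribution, and user-selected metadata."""
    name: str
    data_type: str                  # e.g. "text", "file", "picture", "url"
    original_data: bytes
    creation: CreationMetadata
    attribution: AttributionMetadata = field(default_factory=AttributionMetadata)
    comments: str = ""              # user-selected notes
```

  • Under this sketch, supporting a new kind of gem data object means accepting a new `data_type` value and payload; the surrounding metadata, and any code that reads it, stays unchanged.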
  • the gem data object types shown in Figs. 18-22 are merely examples of specific embodiments. It will be evident to one of skill in the art that various other types of gem data objects may be created, stored, and manipulated within the electronic data-gathering system.
  • Figs. 18-22 show a user interface window for a gem data object viewer.
  • the "View" tab of the popup menu for a different gem data object is expanded. This creates a view of the original data encapsulated within the gem data object.
  • Fig. 18 shows in a window 1870 a view of the gem data object "Functions of Patent and Trademark Office" which is a text type of gem data object.
  • Fig. 19 shows in a window 1970 another text type of gem data object "Text of plant patent form.”
  • text type gem data objects may contain different text formatting styles as well as plain text. Both ASCII and RTF text formats are supported, as well as a variety of word processing applications' native formats.
  • Fig. 20 shows in a window 2070 a view of a file type of gem data object.
  • the file gem data object "My briefcase - link" provides a link to the file "plantpatdoc.”
  • a related type of gem data object is an OLE object.
  • an OLE type gem data object contains: the data required to present the source data embedded in a target document; data about the providing application; and either a path to the data file or the raw data itself.
  • Fig. 21 shows in a window 2170 a pictorial type of gem data object.
  • the window 2170 contains a view of the picture file "PTO header picture”.
  • Graphical file types, such as JPEG and GIF formatted files, may be stored as pictorial gem data objects.
  • Fig. 22 shows in a window 2270 a view of a URL type of gem data object.
  • the URL gem data object "Patent and Trademark Office Home Page" contains a URL that provides an
  • Such URL type gem data objects may be created where the user does not wish to store the entire web page as a separate gem data object.
  • File types may be stored in clipboard format or converted from native formats.
  • Standard clipboard formats are available for a variety of data types, including: plain text (ASCII); rich text format; HTML text; Bitmap Image (BMP); Device Independent Bitmap Image (DIB); Metafile Image (WMF); Enhanced Metafile Image (EMF); and Icon Image (ICO).
  • Other formats, such as GIF or JPEG, may be converted by the system clipboard or the electronic data-gathering system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to an electronic data-gathering method and system that enables a user to easily acquire and archive electronic data without having to interact with an additional application user interface. The present invention streamlines the workflow for performing research and ensures that the original source of information is easily identifiable. Embodiments of the present invention provide for the automatic encapsulation of user-selected electronic data sets with a set of attribution metadata and user-defined creation metadata. The system uses the acquired data and metadata to create gem data objects. These gem data objects are then routed within the electronic data-gathering system. They may be stored in a persistent storage mechanism. The research system also includes a data viewer that allows a user to view the gem data objects and perform actions upon them.
PCT/US1999/030965 1998-12-28 1999-12-23 Method and system for performing electronic data-gathering across multiple data sources WO2000039713A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU22170/00A AU2217000A (en) 1998-12-28 1999-12-23 A method and system for performing electronic data-gathering across multiple data sources

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US11406598P 1998-12-28 1998-12-28
US60/114,065 1998-12-28
US16063999P 1999-10-20 1999-10-20
US60/160,639 1999-10-20

Publications (1)

Publication Number Publication Date
WO2000039713A1 (fr) 2000-07-06

Family

ID=26811793

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/030965 WO2000039713A1 (fr) 1998-12-28 1999-12-23 Method and system for performing electronic data-gathering across multiple data sources

Country Status (2)

Country Link
AU (1) AU2217000A (fr)
WO (1) WO2000039713A1 (fr)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0580536A2 (fr) * 1992-07-22 1994-01-26 International Business Machines Corporation Method and apparatus for automatically constructing a bibliography in a multimedia environment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Accessing Remote Data Services Through an Object-Oriented Language.", IBM TECHNICAL DISCLOSURE BULLETIN, vol. 34, no. 6, 1 November 1991 (1991-11-01), New York, US, pages 423 - 425, XP002137313 *
ANONYMOUS: "Automated Bibliographic Information Gathering and Processing", IBM TECHNICAL DISCLOSURE BULLETIN, vol. 37, no. 4A, 1 April 1994 (1994-04-01), New York, US, pages 253 - 254, XP002137312 *
DATABASE INSPEC INSTITUTE OF ELECTRICAL ENGINEERS, STEVENAGE, GB; XP002137314 *
ZANIOLO C.: "The Database Language GEM", SIGMOD RECORD, ISSN 0163-5808, vol. 13, no. 4, May 1983 (1983-05-01), pages 207 - 218, XP058090662, DOI: doi:10.1145/971695.582226 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10552509B2 (en) 2000-11-13 2020-02-04 Talsk Research, Inc. Method and system for archiving and retrieving bibliography information and reference material
US7370083B2 (en) 2001-11-21 2008-05-06 Clearcube Technology, Inc. System and method for providing virtual network attached storage using excess distributed storage capacity
WO2004012103A3 (fr) * 2002-07-25 2004-04-08 Comm Synergy Technologies Llc Document preservation
US7171433B2 (en) 2002-07-25 2007-01-30 Wolfe Gene J Document preservation
WO2004025466A3 (fr) * 2002-09-16 2004-12-16 Clearcube Technology Inc Distributed computing infrastructure
US7370336B2 (en) 2002-09-16 2008-05-06 Clearcube Technology, Inc. Distributed computing infrastructure including small peer-to-peer applications
US7430616B2 (en) 2002-09-16 2008-09-30 Clearcube Technology, Inc. System and method for reducing user-application interactions to archivable form
US7434220B2 (en) 2002-09-16 2008-10-07 Clearcube Technology, Inc. Distributed computing infrastructure including autonomous intelligent management system
US7606824B2 (en) 2005-11-14 2009-10-20 Microsoft Corporation Databinding workflow data to a user interface layer
WO2009010989A3 (fr) * 2007-07-13 2010-06-24 Anuradha Vaidyanathan Method and system for storing and retrieving data
US8924395B2 (en) 2010-10-06 2014-12-30 Planet Data Solutions System and method for indexing electronic discovery data
US9519794B2 (en) 2013-12-10 2016-12-13 International Business Machines Corporation Desktop redaction and masking

Also Published As

Publication number Publication date
AU2217000A (en) 2000-07-31

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase