WO2018186599A1 - Automatic extraction and structuring, by topic, of query subtopics
- Publication number
- WO2018186599A1 (PCT/KR2018/002834)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- topic
- query
- search
- tree
- subtopic
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
Definitions
- The description below relates to a technique for automatically extracting and structuring subtopics suitable for queries.
- When providing search results for a query input by a user, a search system provides, in addition to documents matching the search condition, various functions that help the user search further. Representative examples include related search terms, related tags, and search term autocompletion. These functions identify search terms or tags that frequently appear with the query, based on the co-occurrence of word pairs.
- Korean Patent Application Publication No. 10-2012-0096806 discloses a search term recommendation system and method that select a search term based on location information of a user terminal and provide the search term to the user.
- In the case of a shopping intent, a shopping search system may provide a function that helps product search by using hierarchical information such as the brand, color, and price of the product.
- A computer-implemented topic structuring method comprising: extracting, for each topic, a subtopic associated with the topic; generating a topic tree for the subtopic using hierarchical information of the topic; and, when a query for searching is given, providing the subtopic hierarchically as a related search term for the query according to the topic tree of the topic to which the query belongs.
- The extracting may include extracting the subtopic by analyzing words related to the core object that determines the topic.
- The method may further include filtering the subtopic according to at least one of a document appearance frequency and a search frequency.
- The method may further include clustering the subtopics according to a synonym or substring relationship and selecting a representative of each cluster.
- The generating may include generating the topic tree by labeling the subtopic with each class name of the hierarchical information.
- The generating may include: extracting similar words from word embedding data for the subtopic; clustering the similar words according to a synonym or substring relationship; and labeling the clustered words by mapping them to respective classes in a linguistic taxonomy.
- The method may further include rebalancing the topic tree by reducing at least one of the breadth and depth of the topic tree.
- The providing may include filtering the subtopics according to at least one condition among a subject score indicating a correlation between the query and the subtopic, the number of documents corresponding to the subtopic, and whether the subtopic calls for correct-answer information for the query.
- A computer-implemented search result providing method comprising: when a query for searching is given, providing a search result corresponding to the query; providing subtopics associated with the topic to which the query belongs, in a hierarchical form with a plurality of depths, as related search terms for the query according to the hierarchical information of that topic; and, when at least one search term is selected from the subtopics, providing a search result corresponding to the query including the selected search term.
- A computer-implemented topic structuring system comprising at least one processor configured to execute computer-readable instructions, the at least one processor comprising: an extractor configured to extract, for each topic, subtopics related to the topic; a generator configured to generate a topic tree for the subtopics using hierarchical information of the topic; and a provider configured to provide, when a query for searching is given, the subtopics hierarchically as related search terms for the query according to the topic tree of the topic to which the query belongs.
- According to embodiments, when a specific topic is given, only the subtopics suitable for the topic are extracted, hierarchical information is automatically constructed for that topic, and the subtopics are structured and presented at the level of granularity (segmentation) desired by the user. This helps the user efficiently identify detailed attributes relevant to the query and actually carry out further searching.
- FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention.
- FIG. 2 is a block diagram illustrating an internal configuration of an electronic device and a server according to an embodiment of the present invention.
- FIG. 3 illustrates an example of a process of layering a patterned query according to an embodiment of the present invention.
- FIG. 4 is a diagram illustrating an example of components that may be included in a processor of a server according to an embodiment of the present invention.
- FIG. 5 is a flowchart illustrating an example of a method that a server may perform according to an embodiment of the present invention.
- FIG. 6 shows an example of a process of filtering and grouping subtopic candidates for queries 'Guam' and 'potato' according to an embodiment of the present invention.
- FIG. 7 is a flowchart illustrating an example of a process of constructing hierarchical information according to an embodiment of the present invention.
- FIG. 8 shows an example of hierarchical information constructed by using clustering and language taxonomy for a travel subject.
- FIG. 9 illustrates an example of a process of converting a topic network constructed according to an embodiment of the present invention into a tree having a depth of 2 (2-depth tree).
- FIG. 10 is a flowchart illustrating an example of a tree rebalancing process according to an embodiment of the present invention.
- FIGS. 11 and 12 are diagrams illustrating examples of a tree rebalancing process according to an embodiment of the present invention.
- FIGS. 13 and 14 illustrate examples of a search result screen in which a 2-depth topic structure is reflected according to an embodiment of the present invention.
- Embodiments of the present invention relate to techniques for automatically extracting and structuring subtopics suitable for queries.
- Embodiments, including those specifically disclosed herein, provide topics suited to a query and organize those topics for efficient information retrieval, thereby achieving significant advantages in terms of accuracy, efficiency, scalability, cost savings, and the like.
- FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention.
- The network environment of FIG. 1 illustrates an example including a plurality of electronic devices 110, 120, 130, and 140, a plurality of servers 150 and 160, and a network 170.
- FIG. 1 is an example for describing the present invention, and the number of electronic devices or servers is not limited to that shown in FIG. 1.
- The plurality of electronic devices 110, 120, 130, and 140 may be fixed terminals or mobile terminals implemented as computer devices. Examples of the plurality of electronic devices 110, 120, 130, and 140 include smart phones, mobile phones, tablet PCs, navigation systems, computers, notebook computers, digital broadcasting terminals, personal digital assistants (PDAs), and portable multimedia players (PMPs).
- The first electronic device 110 may communicate with the other electronic devices 120, 130, and 140 and/or the servers 150 and 160 through the network 170 using a wireless or wired communication scheme.
- The communication method is not limited, and may include not only a communication method using a communication network that the network 170 may include (for example, a mobile communication network, wired Internet, wireless Internet, or a broadcasting network), but also short-range wireless communication between devices.
- The network 170 may include one or more of a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), a broadband network (BBN), and the Internet.
- The network 170 may also include any one or more network topologies, including but not limited to bus networks, star networks, ring networks, mesh networks, star-bus networks, and tree or hierarchical networks.
- Each of the servers 150 and 160 may be implemented as one or more computer devices that communicate with the plurality of electronic devices 110, 120, 130, and 140 through the network 170 to provide commands, code, files, content, services, and the like.
- The server 160 may provide a file for installing an application to the first electronic device 110 connected through the network 170.
- The first electronic device 110 may install the application using the file provided from the server 160.
- The first electronic device 110 may access the server 150 under the control of an operating system (OS) included in the first electronic device 110 or at least one program (for example, a browser or the installed application), and may be provided with services or content offered by the server 150.
- The server 150 may transmit code corresponding to a service request message to the first electronic device 110, and the first electronic device 110 may provide content to the user by configuring and displaying a screen according to the code under the control of the application.
- FIG. 2 is a block diagram illustrating an internal configuration of an electronic device and a server according to an embodiment of the present invention. FIG. 2 illustrates the internal configuration of the first electronic device 110 as an example of one electronic device and of the server 150 as an example of one server. The other electronic devices 120, 130, and 140 or the server 160 may have the same or a similar internal configuration.
- The first electronic device 110 and the server 150 may include memories 211 and 221, processors 212 and 222, communication modules 213 and 223, and input/output interfaces 214 and 224.
- The memories 211 and 221 are computer-readable recording media, and may include random access memory (RAM), read only memory (ROM), and a permanent mass storage device such as a disk drive.
- The memories 211 and 221 may store an operating system or at least one program code (for example, code for an application installed and run on the first electronic device 110). These software components may be loaded from a computer-readable recording medium separate from the memories 211 and 221.
- Such a separate computer-readable recording medium may include a floppy drive, a disk, a tape, a DVD/CD-ROM drive, a memory card, and the like.
- Software components may also be loaded into the memories 211 and 221 through the communication modules 213 and 223 rather than from a computer-readable recording medium.
- For example, at least one program may be loaded into the memories 211 and 221 based on a program (for example, the above-described application) installed from files provided through the network 170 by a file distribution system (for example, the server 160 described above) that distributes installation files of developers or applications.
- The processors 212 and 222 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. Instructions may be provided to the processors 212 and 222 by the memories 211 and 221 or the communication modules 213 and 223. For example, the processors 212 and 222 may be configured to execute instructions received according to program code stored in a recording device such as the memories 211 and 221.
- The communication modules 213 and 223 may provide a function for the first electronic device 110 and the server 150 to communicate with each other through the network 170, and a function for communicating with another electronic device (e.g., the second electronic device 120) or another server (e.g., the server 160).
- For example, a request (e.g., a search request) generated by the processor 212 of the first electronic device 110 according to program code stored in a recording device such as the memory 211 may be delivered to the server 150 through the network 170 under the control of the communication module 213.
- Conversely, control signals, commands, content, files, and the like provided under the control of the processor 222 of the server 150 may be transmitted to the communication module 213 of the first electronic device 110 via the communication module 223 and the network 170.
- For example, a control signal or command of the server 150 received through the communication module 213 may be transmitted to the processor 212 or the memory 211, and content or files may be stored in a storage medium that the first electronic device 110 may further include.
- The input/output interface 214 may be a means for interfacing with the input/output device 215.
- For example, the input device may include a device such as a keyboard or mouse, and the output device may include a device such as a display for displaying a communication session of an application.
- As another example, the input/output interface 214 may be a means for interfacing with a device in which input and output functions are integrated into one, such as a touch screen.
- More specifically, when the processor 212 of the first electronic device 110 processes a command of a computer program loaded in the memory 211, a service screen or content configured using data provided by the server 150 or the second electronic device 120 may be displayed on the display through the input/output interface 214.
- Likewise, when the processor 222 of the server 150 processes a command of a computer program loaded in the memory 221, the input/output interface 224 may output information configured using data provided by the server 150.
- The first electronic device 110 and the server 150 may include more components than those shown in FIG. 2. However, most prior-art components need not be clearly illustrated.
- For example, the first electronic device 110 may be implemented to include at least a part of the above-described input/output device 215, or may further include other components such as a transceiver, a Global Positioning System (GPS) module, a camera, various sensors, a database, and the like.
- As a more specific example, when the first electronic device 110 is a smartphone, it can be appreciated that various components generally included in a smartphone, such as an acceleration sensor, a gyro sensor, a camera, various physical buttons, buttons using a touch panel, input/output ports, and a vibrator for vibration, may be further included in the first electronic device 110.
- Topic structuring (grouping and layering) is required to provide as many search results as possible from one query and to enable efficient information retrieval.
- When a search system provides search results for a query input by a user, it provides, in addition to the documents matching the search condition, various functions that help the user search further. Typical examples include related search terms, related tags, and automatic completion of search terms. These features have the following benefit and limitations.
- The user can indirectly grasp the detailed attributes/subtopics related to the query.
- However, the association is ambiguous, so the specific relationship with the query (e.g., parent/child concept, synonym, or sibling concept) cannot be known. Therefore, as the number of search terms or tags provided increases, they become difficult to arrange structurally, and the number provided to the user has to be reduced for the sake of usability.
- In addition, the related search terms or tags are provided without being organized by meaning, so they are not very helpful for further searching.
- The present invention therefore proposes an automatic subtopic extraction and structuring technique that enables the user to efficiently grasp the detailed attributes/subtopics related to the query, helps the user search further, and addresses the above-mentioned limitations.
- The key features of the topic structuring system according to the present invention are as follows.
- The topic structuring system patterns the main queries of each topic into "main object + subtopic".
- The main object refers to the core object that determines a topic.
- The subtopic includes at least one of a sub object and an attribute.
- The sub object refers to an object that makes the topic more specific.
- The attribute refers to a word, such as a suffix or prefix, that represents an attribute of the topic.
- The topic structuring system layers the patterned queries according to the semantic relationships of sub objects and attributes.
- FIG. 3 illustrates an example of a process of layering a patterned query according to an embodiment of the present invention.
- The main object (MainObj) and the subtopics (SubObj and Suffix) may be layered based on a query of a specific topic, that is, around the main object MainObj.
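- As a rough illustration of the "main object + subtopic" pattern, the following minimal Python sketch splits a query into MainObj, SubObj, and attribute parts. The vocabularies `MAIN_OBJECTS` and `ATTRIBUTES`, the `PatternedQuery` type, and the whitespace tokenization are all assumptions made for this example; the description itself only says that natural language understanding technology may be used.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical vocabularies; a real system would derive these from its
# extractor rather than from hard-coded sets.
MAIN_OBJECTS = {"guam", "potato"}
ATTRIBUTES = {"price", "weather", "recipe"}


@dataclass
class PatternedQuery:
    main_obj: Optional[str] = None
    sub_objs: List[str] = field(default_factory=list)
    attributes: List[str] = field(default_factory=list)


def pattern_query(query: str) -> PatternedQuery:
    """Split a whitespace-tokenized query into MainObj + SubObj + attribute."""
    result = PatternedQuery()
    for token in query.lower().split():
        if result.main_obj is None and token in MAIN_OBJECTS:
            result.main_obj = token          # core object that determines the topic
        elif token in ATTRIBUTES:
            result.attributes.append(token)  # word describing an attribute
        else:
            result.sub_objs.append(token)    # object that narrows the topic
    return result
```

- With these toy vocabularies, `pattern_query("Guam hotel price")` yields the main object `guam`, the sub object `hotel`, and the attribute `price`.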
- The topic structuring system can provide the layered queries and subtopics to the user together with the search results (documents). By structuring and presenting the subtopics at the level of granularity (segmentation) desired by the user, the topic structuring system helps the user efficiently identify detailed subtopics that suit the topic and relate to the query, and supports the actual additional search.
- FIG. 4 is a diagram illustrating an example of components that may be included in a processor of a server according to an embodiment of the present invention, and FIG. 5 is a flowchart illustrating an example of a method that may be performed by a server according to an embodiment of the present invention.
- The processor 222 of the server 150 may include, as components, the extractor 410, the refiner 420, the generator 430, the adjuster 440, and the provider 450.
- The processor 222 and the components of the processor 222 may control the server 150 to perform steps S510 to S550 included in the method of FIG. 5.
- The processor 222 and the components of the processor 222 may be implemented to execute instructions according to the code of an operating system included in the memory 221 and the code of at least one program.
- The components of the processor 222 may be representations of different functions performed by the processor 222 according to control commands provided by the operating system or the at least one program.
- For example, the extractor 410 may be used as a functional expression for the processor 222 extracting the main object and the subtopic according to the above-described control command.
- The components of the processor 222 are first described as follows.
- The MainObj + Suffix extraction module of the extractor 410 extracts main objects and attributes by topic. Natural language understanding technology can be used to extract main objects and attributes.
- The SubObj extraction module of the extractor 410 extracts sub objects when a main object (+ attribute) exists for each topic. Various statistical information (e.g., clicks, likes, comments, authors) and dictionary information may be used for the extraction.
- The ranker modules (Suffix Ranker, SubObj Ranker) of the refiner 420 rank sub objects and/or attributes in order of importance with respect to the main object. Various information such as the number of clicks, likes, comments, and authors may be used to determine the importance.
- The Post-Processor module of the refiner 420 receives the ranked sub objects and/or attributes, clusters them according to synonym or substring relationships, and selects a representative value for each cluster.
- The TopicGraphToTree module of the generator 430 collects the clustered sub objects and/or attributes, measures the relationship strength from how often they appear together in a query or document, creates a network (graph) structure, and then converts it into a tree (search/cluster-based tree). The search/cluster-based tree is described in detail below.
- The TreeConstructor module of the generator 430 integrates the dictionary-based tree and the search/cluster-based tree to form the final topic tree (e.g., a 2-depth tree structure).
- The Topic Reranker module of the adjuster 440 further filters the topic tree according to filtering conditions (e.g., the number of documents, subject suitability, correct-answer type).
- The New Object Assigner module of the adjuster 440 extracts new items related to the main object and assigns them to the existing tree structure. If there are many new items that do not fit into the tree structure, the process is restarted from the beginning to reconstruct the tree.
- The Document Finding API module of the provider 450 constructs a query based on the final topic tree to retrieve suitable documents.
- A filtering function may also be included.
- The Auto-Tagger module of the provider 450 constructs topic tags based on the final topic tree and attaches them to suitable documents.
- Steps S510 to S550 included in the method of FIG. 5 may be performed through the processor 222 including the above components.
- For each topic, the extractor 410 may extract a main object, which is the core object that determines the topic, and subtopics that make the topic more specific.
- The extractor 410 may extract sub object and/or attribute candidates by analyzing words that frequently appear with the main object in documents, or words that are frequently used with the main object in the search system.
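- The co-occurrence analysis described above can be sketched as a simple count. The function name `extract_candidates`, the whitespace tokenization, and the `min_count` threshold are assumptions for illustration only; the description does not specify how co-occurrence is measured.

```python
from collections import Counter


def extract_candidates(main_obj, texts, min_count=2):
    """Count words that co-occur with main_obj in queries or documents.

    texts: iterable of query strings or document snippets.
    Returns (word, count) pairs, most frequent first, keeping only words
    that co-occur at least min_count times (an assumed threshold).
    """
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        if main_obj in tokens:
            # Count each co-occurring word once per text.
            counts.update({t for t in tokens if t != main_obj})
    return [(word, n) for word, n in counts.most_common() if n >= min_count]
```

- For the toy inputs `["guam hotel", "guam hotel price", "guam weather", "potato recipe"]` and the main object `guam`, only `hotel` co-occurs twice and survives the threshold.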
- The refiner 420 may filter the subtopics according to their appearance frequency in documents or their search frequency, and then group them based on the relationships between words.
- In the candidate filtering process, at least some of the sub object and/or attribute candidates may be filtered out.
- The refiner 420 may filter the sub objects and/or attributes according to at least one of the frequency of appearance in documents and the search frequency of users.
- Filtering by appearance frequency may be restricted to data from a specific period.
- The filtering method may vary according to the characteristics of the topic. For example, if the topic is highly time-sensitive, only data from a recent period (for example, the week before the present) may be used for filtering.
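- A minimal sketch of frequency filtering with an optional recency window follows. The event format (one `(subtopic, date)` pair per search or document hit), the function name `filter_candidates`, and the `min_freq` threshold are hypothetical choices for this example.

```python
from datetime import date, timedelta


def filter_candidates(events, today, window_days=None, min_freq=2):
    """Keep subtopics seen at least min_freq times, optionally only recently.

    events: iterable of (subtopic, date) pairs, one per search or document hit.
    window_days: if given, ignore events older than this many days
                 (useful for time-sensitive topics).
    """
    cutoff = today - timedelta(days=window_days) if window_days is not None else None
    freq = {}
    for subtopic, day in events:
        if cutoff is not None and day < cutoff:
            continue  # outside the recency window
        freq[subtopic] = freq.get(subtopic, 0) + 1
    return {s: n for s, n in freq.items() if n >= min_freq}
```

- Passing `window_days=7` mirrors the "one week before the present" example; leaving it as `None` counts all available data.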
- The refiner 420 may group the sub object and/or attribute candidates selected through the candidate filtering process in consideration of substring relations and the like, and may select a representative for each group after grouping.
- The representative may be selected in various ways.
- For example, the candidate with the highest search frequency may be selected as the representative.
- The refiner 420 may rank subtopics in order of importance (e.g., frequency of appearance in documents, search frequency), cluster the ranked subtopics according to a synonym or substring relationship, and select a representative of each cluster. Extraction and refinement of the subtopics thus consist of candidate selection followed by grouping and representative selection.
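- The grouping and representative selection can be sketched as follows. This example handles only the substring relation (synonym handling would need a synonym dictionary, which is not modeled here) and picks the member with the highest search frequency as the representative. The function name `group_by_substring` and the greedy shortest-first strategy are assumptions, not the method stated in the description.

```python
def group_by_substring(freqs):
    """Group subtopics related by substring and pick a representative.

    freqs: dict mapping subtopic -> search frequency.
    Returns a dict mapping each group's representative (the member with
    the highest search frequency) to the sorted list of group members.
    """
    groups = []
    # Visit shorter terms first so that longer terms can attach to them.
    for term in sorted(freqs, key=len):
        for group in groups:
            if any(member in term for member in group):
                group.append(term)
                break
        else:
            groups.append([term])
    return {max(g, key=freqs.get): sorted(g) for g in groups}
```

- For example, `hotel`, `resort hotel`, and `hotel recommendation` fall into one group, and the member with the highest frequency becomes its representative.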
- WTRIP and FOOD are classification codes (category classification codes) indicating the subject of the query, and the number next to the words indicates the frequencies retrieved associated with the query.
- The generator 430 may generate a topic tree for the grouped subtopics using hierarchical information on the corresponding topic.
- The generator 430 may generate the topic tree by using the hierarchical information to label each grouped subtopic with the matching class name of the hierarchy.
- For example, dictionary information constructed from a database containing various kinds of content is one source that can be usefully employed as hierarchical information.
- The generator 430 may generate a topic tree based on existing hierarchical information such as dictionary information. For example, cooking or recipe topics have rich hierarchical information based on a cooking encyclopedia.
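- Labeling subtopics with class names from existing hierarchical information might look like the sketch below. The `CLASS_OF` mapping stands in for dictionary-derived hierarchy data and is invented for this example, as are the function name `build_topic_tree` and the fallback label `other`.

```python
from collections import defaultdict

# Hypothetical dictionary-derived hierarchy: subtopic -> class name.
CLASS_OF = {
    "boiled": "cooking method",
    "fried": "cooking method",
    "salad": "dish",
    "soup": "dish",
}


def build_topic_tree(main_obj, subtopics, class_of, other_label="other"):
    """Build a 2-depth tree: main object -> class name -> subtopics."""
    tree = defaultdict(list)
    for subtopic in subtopics:
        # Subtopics missing from the hierarchy go under a fallback class.
        tree[class_of.get(subtopic, other_label)].append(subtopic)
    return {main_obj: dict(tree)}
```

- Calling `build_topic_tree("potato", ["fried", "salad", "boiled", "storage"], CLASS_OF)` places `fried` and `boiled` under `cooking method`, `salad` under `dish`, and the unmapped `storage` under `other`.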
- As another example, the generator 430 may generate a topic tree by constructing the hierarchical information itself, based on a word embedding-based clustering technique and a taxonomy.
- The present invention thus has the advantage that a topic can be automatically layered even when no hierarchical information exists.
- FIG. 7 is a flowchart illustrating an example of a process of constructing hierarchical information using a word embedding-based clustering technique and language taxonomy according to an embodiment of the present invention.
- The generator 430 extracts similar words from word embedding data for a subtopic (S701), clusters the extracted words according to a synonym or substring relation (S702), and then labels the clustered words based on a linguistic taxonomy (S703).
- FIG. 8 shows an example of hierarchical information constructed by using clustering and language taxonomy for a travel topic.
- In the word embedding-based clustering process (S702), word embeddings are trained on topic-specific documents (e.g., blog posts), the word vectors of the subtopics to be clustered are obtained from the trained embeddings, and clustering is performed on those word vectors.
- Clustering may use various methods such as hierarchical clustering, the K-means algorithm, density-based clustering, and the like.
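- As a small stand-in for the clustering step, the sketch below links any two words whose cosine similarity reaches a threshold and returns the connected groups (single-link threshold clustering with a union-find structure). It is not hierarchical clustering, K-means, or density-based clustering as listed above; the threshold, the function names, and the toy 2-dimensional vectors are assumptions, and real word vectors would come from trained embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_by_similarity(vectors, threshold=0.9):
    """Single-link clustering: join words whose similarity >= threshold.

    vectors: dict mapping word -> embedding vector (sequence of floats).
    Returns a sorted list of sorted clusters.
    """
    words = list(vectors)
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for w in words:
        clusters.setdefault(find(w), []).append(w)
    return sorted(sorted(c) for c in clusters.values())
```

- With toy vectors in which `hotel` and `resort` point in nearly the same direction and `beach` and `snorkeling` in another, the function returns two clusters, one per direction.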
- The clustered result may be labeled by mapping it to the classes of the linguistic taxonomy. Because a language taxonomy is general, it contains many unnecessary classes compared with hierarchical information specialized for a topic. These unnecessary classes need to be deleted, which is described later in the rebalancing process of the adjuster 440.
- The clustered subtopics are gathered, how often they appear together in a query or document is measured, a network (graph) structure is created, and the network is converted into a topic tree (cluster-based tree). The dictionary-based topic tree and the cluster-based topic tree can then be integrated to form the final tree structure.
- The adjuster 440 may rebalance the topic tree constructed in the topic layering step of the generator 430 according to the purpose of the user or the system.
- The adjuster 440 may prune the topic tree in consideration of subject fitness, search intent, the amount of search results, and the like.
- Table 1 shows the definition of the topic network according to an embodiment of the present invention.
- The generator 430 generates a topic tree using information constructed from search frequency and clustering.
- A topic network G is constructed by representing each word as a node and the relationships between words as edges.
- The nodes V and the edges E of the topic network G may be defined as shown in Table 1.
- The generator 430 converts the topic network into a topic tree in consideration of search frequency.
- FIG. 9 illustrates an example of a process of converting a topic network constructed according to an embodiment of the present invention into a tree having a depth of 2 (2-depth tree).
- Various algorithms may be used to convert the network into a tree; for example, a minimum spanning tree construction algorithm for weighted graphs may be applied.
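- One way to picture the network-to-tree conversion is Kruskal's minimum spanning tree algorithm, sketched below. Since Table 1 is not reproduced here, the edge cost is assumed to be the reciprocal of the co-occurrence strength, so that strongly related words are connected first; this cost function and the function name `spanning_tree` are illustrative assumptions.

```python
def spanning_tree(nodes, cooccurrence):
    """Kruskal's algorithm on a topic network.

    nodes: iterable of words (graph nodes).
    cooccurrence: dict mapping (word_a, word_b) -> positive strength.
    Edge cost is assumed to be 1 / strength, so the minimum spanning
    tree keeps the strongest relationships.
    Returns the chosen edges in the order they were added.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    tree = []
    for (a, b), strength in sorted(cooccurrence.items(), key=lambda kv: 1.0 / kv[1]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # adding this edge creates no cycle
            parent[root_a] = root_b
            tree.append((a, b))
    return tree
```

- On a toy network around `guam`, weak edges such as `guam`-`burger` are dropped because the stronger path through `restaurant` already connects those nodes.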
- The adjuster 440 may then combine the tree based on linguistic taxonomy with the tree based on search frequency/clustering, and perform rebalancing according to the purpose of the user or the system.
- FIG. 10 is a flowchart illustrating an example of a tree rebalancing process according to an embodiment of the present invention. Referring to FIG. 10, the adjuster 440 may insert a cluster corresponding to a leaf node of the clustering-based tree into the corresponding class of the dictionary-based tree (S1001). Because the breadth and depth of the topic tree differ for each query and are generally large, they need to be reduced (S1002 to S1003).
- FIG. 11 illustrates some methods for reducing the width during the tree rebalancing process.
- The width of the topic tree may be reduced by bottom-up node movement and/or top-down node movement.
- FIG. 12 illustrates some methods for reducing the depth during the tree rebalancing process.
- The depth of the topic tree may be reduced by replacing some nodes with their child nodes.
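- The depth reduction by replacing a node with its child nodes can be sketched on a nested-dict tree. The tree representation, the function names, and the choice of which node to replace are assumptions for this example; the details of FIG. 12 are not reproduced here.

```python
def depth(tree):
    """Depth of a nested-dict tree ({} is a leaf's empty child map)."""
    return 0 if not tree else 1 + max(depth(children) for children in tree.values())


def replace_with_children(tree, target):
    """Return a copy of the tree in which `target` is replaced by its children.

    tree: nested dict mapping node name -> dict of children.
    If a promoted child has the same name as an existing sibling, the
    later one overwrites the earlier one in this simple sketch.
    """
    result = {}
    for name, children in tree.items():
        subtree = replace_with_children(children, target)
        if name == target:
            result.update(subtree)  # promote the children one level up
        else:
            result[name] = subtree
    return result
```

- For a tree `guam -> place -> accommodation -> {hotel, resort}`, replacing `accommodation` with its children moves `hotel` and `resort` directly under `place` and lowers the depth from 4 to 3.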
- The provider 450 may provide subtopics along with the search result corresponding to the query by using the topic tree of the topic to which the query belongs.
- The provider 450 may filter the subtopics, which serve as related search terms for the query, according to various conditions and provide them together with the search result.
- For example, the provider 450 may filter the subtopics according to their subject fitness for the query. Given a query for searching, the topic to which the query belongs can be identified, and if the query belongs to several topics, subtopics that do not fit the given topic can be filtered out. To this end, the subject score of 'query + subtopic', a score indicating the correlation between the query and the subtopic, may be used. Text categorization techniques (e.g., support vector machine (SVM), k-Nearest Neighbor (kNN), Convolutional Neural Networks (CNN)) may be used to compute the subject score.
- As another example, the provider 450 may filter the subtopics using the number of documents corresponding to each subtopic. If the number of documents included in the search result is less than a certain number, the usefulness may be reduced, so the corresponding subtopic may be excluded. As yet another example, the provider 450 may filter the subtopics based on whether they call for correct-answer information. A subtopic for which it is more appropriate to provide correct-answer information than multiple documents as the search result (for example, Guam weather) may be included as a related search term for the query.
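- The subject-score and document-count conditions can be combined in a simple filter, sketched below. The candidate record layout, the thresholds `min_score` and `min_docs`, and the function name are hypothetical; the subject scores are assumed to come from a separate text categorization model, and the correct-answer condition is not modeled.

```python
def filter_subtopics(candidates, min_score=0.5, min_docs=10):
    """Keep subtopics that fit the topic and have enough documents.

    candidates: list of dicts with the (assumed) keys
        'subtopic'      - the subtopic text
        'subject_score' - correlation between 'query + subtopic' and the topic
        'doc_count'     - number of documents matching the subtopic
    """
    return [
        c["subtopic"]
        for c in candidates
        if c["subject_score"] >= min_score and c["doc_count"] >= min_docs
    ]
```

- A subtopic with a low subject score or with only a handful of matching documents is dropped, leaving only those that are both on-topic and well covered.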
- the provider 450 may hierarchically expose detailed subtopics (sub objects and / or attributes) related to the query as a related search word for the query input by the user.
- the topic tree for each topic may be updated in units of a certain period, and the update cycle of the topic tree may be determined in consideration of the characteristics of the corresponding topic according to the topic.
- FIGS. 13 and 14 illustrate examples of a search result screen reflecting a two-depth topic structure according to an exemplary embodiment of the present invention.
- Queries of depth1 and queries of depth2 may be provided as related search terms of the input query according to hierarchical information of the subject corresponding to the input query.
- For example, along with a search result corresponding to the input query "Guam", the queries 1310 of depth1 and the queries 1320 of depth2 may be provided as related search terms of the input query according to hierarchical information of the corresponding subject "Guam".
- Each of the queries provided as a related search term is selectable by the user, and the query selected by the user is automatically added to the search box 1301.
- As shown in FIG. 13, when the user selects 'restaurant' from the queries 1310 of depth1 provided as related queries of the initial query 'Guam', 'restaurant' is additionally input to the search box 1301.
- A search result 1302 of depth1 may then be exposed using the query 'Guam restaurant'.
- When the user next selects the query 'handmade burger' from the queries 1320 of depth2, as shown in FIG. 14, 'handmade burger' is additionally input to the search box 1301.
- A search result 1402 of depth2 may then be exposed using the query 'Guam restaurant handmade burger'.
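The two-depth interaction in FIGS. 13 and 14 can be illustrated with a small nested mapping. This is a sketch under stated assumptions: the `TOPIC_TREE` contents other than the 'Guam', 'restaurant', and 'handmade burger' entries are invented sample data, the function names are hypothetical, and the depth2 queries are assumed to be those under the currently selected depth1 query.

```python
# Hypothetical two-depth topic tree: subject -> depth1 query -> depth2 queries.
TOPIC_TREE = {
    "Guam": {
        "restaurant": ["handmade burger", "steak"],
        "hotel": ["pool", "resort"],
    }
}


def related_queries(tree, subject, selected_depth1=None):
    """Return (depth1 queries, depth2 queries) for a subject.

    depth2 is empty until a depth1 query has been selected (an assumption).
    """
    branches = tree.get(subject, {})
    depth1 = list(branches)
    depth2 = list(branches.get(selected_depth1, [])) if selected_depth1 else []
    return depth1, depth2


def add_to_search_box(current, selected):
    """Append the selected related query to the text in the search box."""
    return f"{current} {selected}".strip()


box = "Guam"
depth1, _ = related_queries(TOPIC_TREE, "Guam")
box = add_to_search_box(box, "restaurant")        # depth1 selection
_, depth2 = related_queries(TOPIC_TREE, "Guam", "restaurant")
box = add_to_search_box(box, "handmade burger")   # depth2 selection
```

After both selections the search box holds the combined query, which would then be submitted to retrieve the depth2 search result.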
- A search result may thus be provided along with hierarchical topics, which supports efficient additional searching and makes a variety of search results reachable from a single query.
- In addition, the hierarchical topic structure may be used for search ranking.
- Documents containing sub-objects and attributes are likely to be relatively high-quality documents, so these documents may be boosted in search ranking.
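A ranking boost of this kind could, for instance, scale a document's base score by the number of topic-tree terms it mentions. The description does not specify a formula, so the multiplicative form, the `boost_per_term` value, and the `boosted_score` function below are purely illustrative assumptions.

```python
def boosted_score(base_score, text, subtopic_terms, boost_per_term=0.1):
    """Raise a document's score for each sub-object or attribute it contains.

    Matching is a simple case-insensitive substring test (an assumption).
    """
    lowered = text.lower()
    matches = sum(1 for term in subtopic_terms if term.lower() in lowered)
    return base_score * (1.0 + boost_per_term * matches)


terms = ["restaurant", "handmade burger", "hotel"]  # hypothetical topic-tree terms
rich = boosted_score(1.0, "Best handmade burger restaurant in Guam", terms)
plain = boosted_score(1.0, "Flights to Guam", terms)
```

Here `rich` receives a boost for two matched terms, while `plain` keeps its base score.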
- The apparatus described above may be implemented as a hardware component, a software component, and/or a combination of hardware and software components.
- The devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions.
- the processing device may execute an operating system (OS) and one or more software applications running on the operating system.
- the processing device may also access, store, manipulate, process, and generate data in response to the execution of the software.
- It will be appreciated that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements.
- the processing device may include a plurality of processors or one processor and one controller.
- other processing configurations are possible, such as parallel processors.
- The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively.
- The software and/or data may be embodied in any type of machine, component, physical device, or computer storage medium or device in order to be interpreted by the processing device or to provide instructions or data to the processing device.
- the software may be distributed over networked computer systems so that they may be stored or executed in a distributed manner. Software and data may be stored on one or more computer readable recording media.
- the method according to the embodiment may be embodied in the form of program instructions that can be executed by various computer means and recorded in a computer readable medium.
- The medium may continuously store a computer-executable program, or may temporarily store it for execution or download.
- The medium may be any of a variety of recording or storage means in the form of a single piece of hardware or several pieces of hardware combined. It is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to store program instructions.
- examples of another medium may include a recording medium or a storage medium managed by an app store that distributes an application, a site that supplies or distributes various software, a server, or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a technique for automatically extracting and structuring a suitable subtopic for a query. A topic structuring method may comprise the steps of: extracting, for each topic, a subtopic associated with the topic; generating a topic tree for the subtopic using hierarchical information of the topic; and, when a query for a search is submitted, providing the subtopic hierarchically as a related search term for the query according to the topic tree of the topic to which the query belongs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019554996A JP6808851B2 (ja) | 2017-04-06 | 2018-03-09 | トピック構造化方法、検索結果提供方法、コンピュータプログラムおよびトピック構造化システム |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2017-0044683 | 2017-04-06 | ||
KR20170044683 | 2017-04-06 | ||
KR10-2017-0085316 | 2017-07-05 | ||
KR1020170085316A KR101958729B1 (ko) | 2017-04-06 | 2017-07-05 | 주제별 질의의 서브토픽 자동 추출 및 구조화 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018186599A1 (fr) | 2018-10-11 |
Family
ID=63713479
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2018/002834 WO2018186599A1 (fr) | 2017-04-06 | 2018-03-09 | Extraction et structuration automatiques, par sujet, d'un sous-thème d'interrogation |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2018186599A1 (fr) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2020113048A (ja) * | 2019-01-11 | 2020-07-27 | 富士ゼロックス株式会社 | 情報処理装置及びプログラム |
JP2020119254A (ja) * | 2019-01-23 | 2020-08-06 | 株式会社日立製作所 | テキストデータ収集装置及び方法 |
CN112100360A (zh) * | 2020-10-30 | 2020-12-18 | 北京淇瑀信息科技有限公司 | 一种基于向量检索的对话应答方法、装置和系统 |
JP2021170309A (ja) * | 2020-04-15 | 2021-10-28 | 北京百度網訊科技有限公司 | トピック概念マイニング方法、装置、電子機器、記憶媒体及びプログラム |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100818357B1 (ko) * | 2006-05-10 | 2008-04-02 | (주)윕스 | 연관모델을 이용한 검색어 제시 시스템 및 그 제시 방법 |
KR20100080099A (ko) * | 2008-12-31 | 2010-07-08 | 주식회사 솔트룩스 | 정보 검색 방법 및 이를 수행할 수 있는 프로그램이 수록된컴퓨터로 읽을 수 있는 기록 매체 |
KR20110099574A (ko) * | 2010-03-02 | 2011-09-08 | 연세대학교 산학협력단 | 이동통신 단말기를 이용한 검색어 추천 방법 및 장치 |
KR101485940B1 (ko) * | 2013-08-23 | 2015-01-27 | 네이버 주식회사 | 시멘틱 뎁스 구조 기반의 검색어 제시 시스템 및 방법 |
US20170061485A1 (en) * | 2011-03-22 | 2017-03-02 | Excalibur Ip, Llc | Search assistant system and method |
-
2018
- 2018-03-09 WO PCT/KR2018/002834 patent/WO2018186599A1/fr active Application Filing
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2020113048A (ja) * | 2019-01-11 | 2020-07-27 | 富士ゼロックス株式会社 | 情報処理装置及びプログラム |
JP7238411B2 (ja) | 2019-01-11 | 2023-03-14 | 富士フイルムビジネスイノベーション株式会社 | 情報処理装置及びプログラム |
JP2020119254A (ja) * | 2019-01-23 | 2020-08-06 | 株式会社日立製作所 | テキストデータ収集装置及び方法 |
JP7085499B2 (ja) | 2019-01-23 | 2022-06-16 | 株式会社日立製作所 | テキストデータ収集装置及び方法 |
JP2021170309A (ja) * | 2020-04-15 | 2021-10-28 | 北京百度網訊科技有限公司 | トピック概念マイニング方法、装置、電子機器、記憶媒体及びプログラム |
JP7072034B2 (ja) | 2020-04-15 | 2022-05-19 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド | トピック概念マイニング方法、装置、電子機器、記憶媒体及びプログラム |
US11651164B2 (en) | 2020-04-15 | 2023-05-16 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method, device, equipment, and storage medium for mining topic concept |
CN112100360A (zh) * | 2020-10-30 | 2020-12-18 | 北京淇瑀信息科技有限公司 | 一种基于向量检索的对话应答方法、装置和系统 |
CN112100360B (zh) * | 2020-10-30 | 2024-02-02 | 北京淇瑀信息科技有限公司 | 一种基于向量检索的对话应答方法、装置和系统 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18780404; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2019554996; Country of ref document: JP; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: PCT application non-entry in European phase | Ref document number: 18780404; Country of ref document: EP; Kind code of ref document: A1 |