CN120578746A

CN120578746A - Industry question-answer enhancement method and system based on multi-dimensional label automatic extraction

Info

Publication number: CN120578746A
Application number: CN202511075834.3A
Authority: CN
Inventors: 于海军; 严晨; 陈家树
Original assignee: Shanghai Anshuo Enterprise Credit Reporting Service Co ltd
Current assignee: Shanghai Anshuo Enterprise Credit Reporting Service Co ltd
Priority date: 2025-08-01
Filing date: 2025-08-01
Publication date: 2025-09-02
Anticipated expiration: 2045-08-01

Abstract

The application relates to the technical field of natural language processing and provides an industry question-answer enhancement method and system based on automatic extraction of multi-dimensional labels. The method comprises: labeling a user question and performing multi-scale expansion of the labels to obtain a hierarchical demand label group; structuring the label group into a question-answer demand label map according to the cross-level association relationships among the labels; performing hierarchical extraction on a question-answer information base along the K layers of labels in the map, based on a preset retrieval convergence vector, to output a minimum matching domain answer set; extracting a K-1 layer backtracking answer set along the convergence vector and outputting an association expansion answer set through association expansion aggregation; and assembling the two answer sets into an enhanced industry question-answer response. The application addresses the low answer-matching precision and narrow coverage of traditional industry question-answering systems, which understand user requirements at only a single level and make insufficient use of label associations, and it improves question-answering accuracy, richness, and response speed through hierarchical expansion and association expansion of multi-dimensional labels.

Description

Industry question-answer enhancement method and system based on multi-dimensional label automatic extraction

Technical Field

The application relates to the technical field of natural language processing, in particular to an industry question-answer enhancement method and system based on automatic extraction of multidimensional labels.

Background

In industry applications where digital and intelligent technologies are deeply integrated, traditional industry question-answering systems based on keyword matching or shallow semantic analysis are increasingly reaching a technical bottleneck. As the knowledge graphs of vertical fields grow exponentially, user queries exhibit multi-dimensional associations, combining explicit fact retrieval with implicit cross-level knowledge inference. Existing systems share three general defects. First, a flat label system can hardly describe the hierarchical structure of industry knowledge, so requirements are understood along a single dimension. Second, a static label association mechanism cannot adapt to the evolving context of dynamic query scenarios, which creates a conflict between answer-matching precision and recall. Third, a single retrieval path cannot expand and aggregate associated knowledge, so the knowledge-service needs of complex decision-making scenarios are hard to meet. A scheme is therefore needed that dynamically resolves multi-scale requirements and extracts answers hierarchically, so as to overcome the limitations of traditional question-answering systems in deep understanding of industry knowledge, cross-domain association reasoning, and dynamic knowledge fusion.

Disclosure of Invention

The application provides an industry question-answer enhancement method and system based on automatic extraction of multi-dimensional labels, aiming to solve the technical problems of low answer-matching precision and narrow coverage that arise because traditional industry question-answering systems understand user requirements at only a single level and make insufficient use of label associations.

In a first aspect, the application provides an industry question-answer enhancement method based on automatic extraction of multi-dimensional labels. The method comprises: labeling a user question and obtaining a hierarchical demand label group through multi-scale expansion of the demand labels; structuring the hierarchical demand label group into a question-answer demand label map according to the cross-level association relationships among its labels, the question-answer demand label map comprising K layers of expansion demand labels; based on a preset retrieval convergence vector, performing industry question-answer hierarchical label extraction on a user question-answer information base along the K layers of expansion demand labels of the map and outputting a minimum matching domain answer set; extracting a K-1 layer backtracking answer set along the retrieval convergence vector and outputting an association expansion answer set through association expansion aggregation; and assembling the minimum matching domain answer set and the association expansion answer set into an enhanced industry question-answer response output.

In another aspect, the application provides an industry question-answer enhancement system based on automatic extraction of multi-dimensional labels, comprising a multi-scale expansion module, a label structuring module, a label extraction module, an expansion aggregation module, and a response output module. The multi-scale expansion module labels a user question and performs multi-scale expansion on the demand labels to obtain a hierarchical demand label group. The label structuring module structures the hierarchical demand label group into a question-answer demand label map according to the cross-level association relationships among its labels, the map comprising K layers of expansion demand labels. The label extraction module performs industry question-answer hierarchical label extraction on a user question-answer information base along the K layers of expansion demand labels, based on a preset retrieval convergence vector, and outputs a minimum matching domain answer set. The expansion aggregation module extracts a K-1 layer backtracking answer set along the retrieval convergence vector and outputs an association expansion answer set through association expansion aggregation. The response output module assembles the minimum matching domain answer set and the association expansion answer set into an enhanced industry question-answer response output.

One or more technical solutions provided by the application have at least the following technical effects or advantages:

In the industry question-answer enhancement method based on automatic extraction of multi-dimensional labels, a user question is first decomposed into basic labels, which are expanded over multiple rounds along different dimensions to form a multi-layered label system. The labels are then organized, according to the hierarchical association rules among them, into a tree-like map with K layers, each layer representing knowledge at a different granularity. In the retrieval stage, the search descends through the map layer by layer along a preset retrieval convergence vector, precisely locates the most relevant core label layer, and screens a directly matching answer set from the knowledge base. At the same time, related answers from the upper layers are extracted automatically as the path backtracks, and further peripheral information is gathered through the knowledge association network. Finally, the precise answers and the associated knowledge are combined into an enhanced answer that contains both a core conclusion and background information, improving the breadth and depth of knowledge coverage.

The foregoing is only an overview of the technical solutions of the present application. Specific embodiments are described below so that the technical means of the application can be understood more clearly and implemented according to the description, and so that the above and other objects, features, and advantages of the application become more readily apparent.

Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the application, and a person skilled in the art may derive other drawings from them without inventive effort.

Fig. 1 is a flow diagram of an industry question-answer enhancement method based on automatic extraction of multi-dimensional labels in one embodiment.

Fig. 2 is an architecture diagram of an industry question-answer enhancement system based on automatic extraction of multi-dimensional labels in one embodiment.

Reference numerals: multi-scale expansion module 11, label structuring module 12, label extraction module 13, expansion aggregation module 14, and response output module 15.

Detailed Description

By providing an industry question-answer enhancement method and system based on automatic extraction of multi-dimensional labels, the embodiments of the application solve the technical problems of low answer-matching precision and narrow coverage that arise because traditional industry question-answering systems understand user requirements at only a single level and make insufficient use of label associations.

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

It should be noted that the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to the steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, system, article, or server.

In a first embodiment, as shown in fig. 1, the present application provides an industry question-answer enhancement method based on automatic extraction of multidimensional labels, the method comprising:

After the user question is labeled, a hierarchical demand label group is obtained through multi-scale expansion of the demand labels.

In this embodiment, when a user question is processed, the user input is first standardized (for example, by removing redundant information and extracting industry keywords) to generate a basic label group. The basic labels are then expanded layer by layer through a preset multi-level expansion container group (for example, a semantic expansion container, a hierarchy expansion container, and a scene expansion container). Each expansion level is assigned a different weight according to the label source (for example, core words receive the highest weight and cross-domain associated words progressively lower weights), forming a hierarchical label group that comprises a plurality of subsets. This hierarchical structure preserves the core intent of the original question while covering potential associated requirements through multi-scale expansion, and it provides a structural basis for the subsequent map construction.

Further, the application provides a method for labeling the user question and obtaining the hierarchical demand label group through multi-scale expansion of the demand labels, the method comprising:

Performing domain-adaptive text processing on the user question to generate a standardized query text; dynamically extracting the core semantic units of the standardized query text through a preset industry keyword library to generate a question-answer demand label group; and performing multi-scale expansion on the question-answer demand label group to obtain the hierarchical demand label group.

Preferably, the text of the question entered by the user first undergoes basic language cleaning, in which redundant symbols, punctuation, stop words, and the like are removed. A dictionary or term library specific to the industry or domain of the question is then matched and used to convert everyday words and expressions in the question into standard domain terms, forming the standardized query text. For example, in the medical domain "pain" may be mapped to "pain sensation" or "pain score", and in the financial domain "investment risk" may be mapped to "market volatility risk". Key semantic units are then extracted dynamically from the standardized query text based on the preset industry keyword library. Key semantic units are the parts of the question that carry its core information, such as proper nouns, important concepts, or industry-specific technical terms. This extraction captures the most salient parts of the user question and organizes them into the question-answer demand label group; these labels reflect the main focus of the question and give the subsequent processing of the question-answering system a clear direction. The question-answer demand label group is then expanded at multiple scales using the semantic expansion container, hierarchy expansion container, scene expansion container, and cross-domain expansion container of the hierarchical expansion container group, generating a hierarchical demand label group that can progress from broad industry-level labels to more specific sub-domain, technology, or product labels.
The resulting hierarchical label group covers the different dimensions of the user question comprehensively, so that the question-answering system can match the user's requirements more accurately when responding.
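The cleaning, term-mapping, and keyword-extraction steps above can be sketched as follows. This is a minimal illustration, not the application's implementation: the stop-word list, the term map (reusing the "investment risk" example), the keyword library, and the function names `standardize` and `extract_labels` are all hypothetical, and plain string/regex matching stands in for whatever domain-adaptive processing is actually used.

```python
import re

# Hypothetical resources: the application does not publish its stop-word list,
# domain term dictionary, or industry keyword library.
STOP_WORDS = {"the", "a", "an", "of", "is", "what", "how", "my", "to", "in"}
TERM_MAP = {"investment risk": "market volatility risk"}  # everyday -> standard term
INDUSTRY_KEYWORDS = {"market volatility risk", "bond", "portfolio"}


def standardize(question: str, term_map: dict) -> str:
    """Basic cleaning, then mapping of everyday phrases to standard domain terms."""
    text = re.sub(r"[^\w\s]", " ", question.lower())  # strip punctuation/symbols
    text = " ".join(text.split())                      # collapse whitespace
    for phrase, standard in term_map.items():
        text = text.replace(phrase, standard)
    return " ".join(t for t in text.split() if t not in STOP_WORDS)


def extract_labels(text: str, keywords: set) -> list:
    """Return keyword-library entries found in the standardized text,
    checking longer (multi-word) terms first."""
    labels = []
    for kw in sorted(keywords, key=len, reverse=True):
        if re.search(r"\b" + re.escape(kw) + r"\b", text):
            labels.append(kw)
    return labels


query = standardize("What is the investment risk of my bond portfolio?", TERM_MAP)
labels = extract_labels(query, INDUSTRY_KEYWORDS)  # question-answer demand label group
```

Checking multi-word terms first is a simple way to prefer the most specific standard term; a production system would more likely use a tokenizer and a trie-based matcher for large keyword libraries.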

Further, the application provides a method for performing multi-scale expansion on the question-answer demand label group to obtain the hierarchical demand label group, the method comprising:

Pre-constructing a hierarchical expansion container group; loading the N question-answer demand labels of the question-answer demand label group into the hierarchical expansion container group; performing multi-scale label expansion through the semantic expansion container, the hierarchy expansion container, the scene expansion container, and the cross-domain expansion container; assigning differentiated weights according to the source container of each expansion result; and outputting N hierarchical demand label subsets, which together form the hierarchical demand label group.

Optionally, to perform multi-scale expansion on the question-answer demand label group, a hierarchical expansion container group is constructed in advance. It comprises a semantic expansion container, a hierarchy expansion container, a scene expansion container, and a cross-domain expansion container: the semantic expansion container expands synonyms and related words, the hierarchy expansion container performs upper-lower level differentiation, the scene expansion container performs horizontal scene association, and the cross-domain expansion container extends labels from one domain to other related domains. The N question-answer demand labels are then loaded into the container group, where they undergo multi-scale expansion in cascade: the semantic expansion container first generates synonymous or near-synonymous labels for each question-answer demand label, the hierarchy expansion container differentiates the semantic expansion results into upper and lower levels, the scene expansion container associates actual application-scenario labels with the upper-lower differentiation results, and the cross-domain expansion container binds cross-domain knowledge labels to the scenario association results.
After expansion by the container group, the expanded labels are assigned differentiated weights according to their source containers; that is, labels generated by different containers receive different weight values that represent their importance or applicability. For example, semantically expanded labels may receive higher weights because they directly affect semantic accuracy, while cross-domain expanded labels may receive lower weights as the situation requires. Finally, the expansion results of each question-answer demand label are aggregated into one of N hierarchical demand label subsets, which are more concrete in content and hierarchy and together form the hierarchical demand label group. These subsets provide more accurate, multi-dimensional label information for subsequent question-answer matching and help ensure the accuracy of the core answers.
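The cascade and the source-based weighting described above can be sketched as a small pipeline in which each container consumes the previous container's output. This is a hedged sketch under stated assumptions: the numeric weights in `SOURCE_WEIGHTS`, the lookup-table containers, and all function names are hypothetical; the application only states that semantic expansions may weigh more and cross-domain expansions less.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical weights per source container (not specified in the application).
SOURCE_WEIGHTS = {"core": 1.0, "semantic": 0.9, "hierarchy": 0.75,
                  "scene": 0.6, "cross_domain": 0.4}

Expander = Callable[[List[str]], List[str]]


@dataclass
class LabelSubset:
    """One hierarchical demand label subset rooted at a question-answer demand label."""
    root: str
    weights: Dict[str, float] = field(default_factory=dict)        # label -> weight
    by_source: Dict[str, List[str]] = field(default_factory=dict)  # container -> labels


def lookup_container(table: Dict[str, List[str]]) -> Expander:
    """Toy container: expands each input label through a lookup table."""
    def run(labels: List[str]) -> List[str]:
        out: List[str] = []
        for label in labels:
            for new in table.get(label, []):
                if new not in out:
                    out.append(new)
        return out
    return run


def expand_label(root: str, cascade: List[Tuple[str, Expander]]) -> LabelSubset:
    """Run the containers in cascade order; each consumes the previous output."""
    subset = LabelSubset(root, {root: SOURCE_WEIGHTS["core"]}, {"core": [root]})
    frontier = [root]
    for source, container in cascade:
        produced = container(frontier)
        subset.by_source[source] = produced
        for label in produced:
            # A label produced by several containers keeps its highest weight.
            subset.weights[label] = max(subset.weights.get(label, 0.0),
                                        SOURCE_WEIGHTS[source])
        frontier = produced
    return subset


def expand_group(roots: List[str], cascade) -> List[LabelSubset]:
    """N question-answer demand labels -> N hierarchical demand label subsets."""
    return [expand_label(r, cascade) for r in roots]


# Toy tables loosely following the cutter-wear example in this description.
demo_cascade = [
    ("semantic", lookup_container({"cutter wear": ["cutter wear", "tool failure"]})),
    ("hierarchy", lookup_container(
        {"cutter wear": ["machine tool cutter failure", "flank wear"]})),
    ("scene", lookup_container(
        {"machine tool cutter failure": ["batch processing cutter management"],
         "flank wear": ["superalloy milling"]})),
    ("cross_domain", lookup_container(
        {"batch processing cutter management": ["cutter life prediction"]})),
]
subsets = expand_group(["cutter wear"], demo_cascade)
```

Keeping the per-container lists in `by_source` preserves the cascade order, which the later tree-structuring step relies on.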

Further, the method comprises:

Loading a first question-answer demand label into the hierarchical expansion container group; performing synonym deduction expansion on the first question-answer demand label through the semantic expansion container to obtain a first synonymous expansion label group; loading the first synonymous expansion label group into the hierarchy expansion container for upper-lower level differentiation expansion to obtain a first upper expansion label group and a first lower expansion label group; loading the first upper and first lower expansion label groups into the scene expansion container for multi-dimensional associated scene expansion to obtain a first upper multi-dimensional label group and a first lower multi-dimensional label group; identifying a first knowledge association node of the first question-answer demand label and, with the first knowledge association node as a domain constraint, performing cross-domain label binding on the first upper and first lower multi-dimensional label groups through the cross-domain expansion container to obtain a first upper cross-domain label group and a first lower cross-domain label group; and assigning weights to the first synonymous expansion label group, the first upper and lower expansion label groups, the first upper and lower multi-dimensional label groups, and the first upper and lower cross-domain label groups according to their source containers, to obtain a first hierarchical demand label subset.

Alternatively, any one of the N question-answer demand labels is first taken as the first question-answer demand label and loaded into the hierarchical expansion container group. After the group receives the first question-answer demand label, the semantic expansion container performs synonym deduction expansion on it. Specifically, the synonym library, semantic dictionary, or domain-specific knowledge base bound to the semantic expansion container is queried, and words with the same meaning are obtained by direct matching. In addition, a pre-trained industry word vector model is loaded, near-synonyms are found by computing cosine similarity, and the co-occurrence probability of each near-synonym with the first question-answer demand label in an industry corpus is counted; near-synonyms whose co-occurrence probability is less than or equal to the industry minimum co-occurrence frequency are removed. The remaining near-synonyms and the directly matched words are combined into the first synonymous expansion label group, enriching the semantic range of the label.
The first synonymous expansion label group is then loaded into the hierarchy expansion container for upper-lower level differentiation expansion. The hierarchy expansion container defines the hierarchical structure of labels through an industry ontology library (for example, industry standards, technical documents, and dictionaries for wind power generation), forming a tree-shaped upper-lower level structure. Each label in the first synonymous expansion label group is matched against this structure, yielding a first upper expansion label group, which represents broader or higher-level classifications, and a first lower expansion label group, which represents more specific labels or subclasses. The first upper and first lower expansion label groups are then loaded into the scene expansion container, analyzed with a pre-trained scene classification model (such as TextCNN), and matched to typical business scenarios. This refines the labels further and generates the first upper multi-dimensional label group and the first lower multi-dimensional label group, bringing the labels closer to the requirements of specific scenarios and improving their applicability.
Next, the first knowledge association node of the first question-answer demand label is identified and used as a domain constraint, and the cross-domain expansion container performs cross-domain label binding on the first upper and first lower multi-dimensional label groups, extending the labels from the current domain to other related domains and generating the first upper cross-domain label group and the first lower cross-domain label group. Finally, each label group (the first synonymous expansion label group, the first upper and lower expansion label groups, the first upper and lower multi-dimensional label groups, and the first upper and lower cross-domain label groups) is assigned a weight according to its source container. The weights are differentiated by the source and importance of the labels; for example, semantically expanded labels may carry a higher weight because they directly affect semantic accuracy, while cross-domain expanded labels may carry a lower weight. The expanded labels are then combined into a complete first hierarchical demand label subset, which forms part of the hierarchical demand label group and provides a structured input for the subsequent map construction and answer extraction.
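The semantic expansion filter (direct dictionary matches plus word-vector neighbours that pass a co-occurrence check) and the ontology-based upper/lower split can be sketched as below. This is an illustrative sketch only: the two-dimensional vectors, co-occurrence values, thresholds, and the child-to-parent ontology map are made-up toy data, and a real system would take vectors from a pre-trained industry word-vector model rather than a dict.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two word vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def semantic_expand(label: str, synonyms: Dict[str, List[str]],
                    vectors: Dict[str, List[float]],
                    cooccur: Dict[Tuple[str, str], float],
                    sim_threshold: float, min_cooccur: float) -> List[str]:
    """First synonymous expansion label group: direct dictionary matches plus
    word-vector neighbours whose industry co-occurrence exceeds the minimum."""
    group = [label] + [s for s in synonyms.get(label, []) if s != label]
    if label not in vectors:
        return group
    for word, vec in vectors.items():
        if word in group:
            continue
        similar = cosine(vectors[label], vec) >= sim_threshold
        # Near-synonyms co-occurring at or below the minimum are removed.
        frequent = cooccur.get((label, word), 0.0) > min_cooccur
        if similar and frequent:
            group.append(word)
    return group


def split_by_ontology(group: List[str], parent_of: Dict[str, str]):
    """Upper group = ontology parents of the labels, lower group = their children."""
    children: Dict[str, List[str]] = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    upper: List[str] = []
    lower: List[str] = []
    for label in group:
        parent = parent_of.get(label)
        if parent and parent not in upper and parent not in group:
            upper.append(parent)
        for child in children.get(label, []):
            if child not in lower and child not in group:
                lower.append(child)
    return upper, lower


# Toy data (hypothetical) loosely following the cutter-wear example.
vectors = {"cutter wear": [1.0, 0.0], "tool failure": [0.9, 0.1],
           "edge blunting": [0.95, 0.05], "spindle speed": [0.0, 1.0]}
cooccur = {("cutter wear", "tool failure"): 0.05,
           ("cutter wear", "edge blunting"): 0.001}
ontology = {"cutter wear": "machine tool cutter failure", "flank wear": "cutter wear",
            "crater wear": "cutter wear", "edge chipping": "cutter wear"}
group = semantic_expand("cutter wear", {"cutter wear": ["cutting-edge dulling"]},
                        vectors, cooccur, sim_threshold=0.8, min_cooccur=0.01)
upper, lower = split_by_ontology(group, ontology)
```

In this toy run, "edge blunting" is similar enough by cosine but is dropped because its co-occurrence with the root label does not exceed the minimum, which mirrors the removal rule described above.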

Taking the label "cutter wear" as an example: the first synonymous expansion label group produced by the semantic expansion container is [cutter wear, cutting-edge dulling, tool failure]; the hierarchy expansion container produces the first upper expansion label group [machine tool cutter failure] and the first lower expansion label group [flank wear, rake-face crater wear, cutting-edge chipping]; the scene expansion container produces the first upper multi-dimensional label group [batch processing cutter management, cutter monitoring in intelligent manufacturing systems] and the first lower multi-dimensional label group [superalloy milling conditions, high-speed cutting chatter scenarios, interrupted machining of composite materials]; and the cross-domain expansion container produces the first upper cross-domain label group [cutter life prediction, spare-part inventory optimization] and the first lower cross-domain label group [nickel-based alloy phase-transition temperature parameters, cutting vibration spectrum features, carbon-fiber delamination damage threshold].

Further, with the first knowledge association node as the domain constraint, the application performs cross-domain label binding on the first upper multi-dimensional label group and the first lower multi-dimensional label group through the cross-domain expansion container to obtain the first upper cross-domain label group and the first lower cross-domain label group, the method further comprising:

Extracting industry attribute features of the first knowledge association node; matching a preset cross-domain binding rule base based on the industry attribute features; performing macro cross-domain binding on the first upper multi-dimensional label group using the cross-domain binding rule base and outputting the first upper cross-domain label group; and performing micro cross-domain binding on the first lower multi-dimensional label group using the cross-domain binding rule base and outputting the first lower cross-domain label group.

Optionally, the industry attribute features of the first knowledge association node are extracted first. These features are industry-specific information about the node, such as its technical field, application scenarios, and applicable standards; they clarify and constrain the node's industry context so that subsequent label expansion and binding better fit actual industry requirements. A preset cross-domain binding rule base is then matched according to the extracted features, and the most suitable cross-domain rules are selected. The rule base contains label association rules between different industries and defines how labels of one domain are bound to labels of related domains. Next, the matched rules are used to perform macro cross-domain binding on the first upper multi-dimensional label group. Macro binding extends labels from the current domain (such as a specific industry) to broader related domains, usually across industries or over a wider cross-domain scope; it maps the labels of the first upper multi-dimensional label group to labels in other industries and generates the first upper cross-domain label group. The same rule base is then used to perform micro cross-domain binding on the first lower multi-dimensional label group, which extends labels from the current domain to more specific, finer-grained levels of related domains. Unlike macro binding, micro binding focuses on precisely matching label details to actual application scenarios, and it outputs the first lower cross-domain label group.
Finally, the resulting first upper and first lower cross-domain label groups span different domains and provide broader, more diverse label information, enhancing the adaptability and accuracy of the question-answering system.
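Rule matching followed by macro and micro binding can be sketched as follows. The rule-base structure is an assumption made for illustration: each hypothetical `BindingRule` carries a set of industry attributes, a macro map for broad (upper) labels, and a micro map for specific (lower) labels, and the rule sharing the most attributes with the knowledge association node is selected. The application does not specify its rule format or matching criterion.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass
class BindingRule:
    """One hypothetical entry of the cross-domain binding rule base."""
    attributes: FrozenSet[str]                                  # industry attribute features
    macro: Dict[str, List[str]] = field(default_factory=dict)  # broad label -> other-domain labels
    micro: Dict[str, List[str]] = field(default_factory=dict)  # specific label -> detail labels


def match_rule(node_attributes: Set[str],
               rule_base: List[BindingRule]) -> Optional[BindingRule]:
    """Choose the rule sharing the most industry attributes with the node."""
    best, best_overlap = None, 0
    for rule in rule_base:
        overlap = len(rule.attributes & node_attributes)
        if overlap > best_overlap:
            best, best_overlap = rule, overlap
    return best


def bind(labels: List[str], mapping: Dict[str, List[str]]) -> List[str]:
    """Map each label to its bound labels, without duplicates, in input order."""
    out: List[str] = []
    for label in labels:
        for target in mapping.get(label, []):
            if target not in out:
                out.append(target)
    return out


def cross_domain_bind(node_attributes: Set[str], upper_md: List[str],
                      lower_md: List[str],
                      rule_base: List[BindingRule]) -> Tuple[List[str], List[str]]:
    """Macro binding for the upper multi-dimensional group, micro for the lower one."""
    rule = match_rule(node_attributes, rule_base)
    if rule is None:
        return [], []
    return bind(upper_md, rule.macro), bind(lower_md, rule.micro)


# Toy rule base (hypothetical) built from the cutter-wear example.
rule_base = [
    BindingRule(frozenset({"machining", "tooling"}),
                macro={"batch processing cutter management":
                       ["cutter life prediction", "spare-part inventory optimization"]},
                micro={"superalloy milling conditions":
                       ["nickel-based alloy phase-transition temperature parameters"],
                       "high-speed cutting chatter": ["cutting vibration spectrum features"]}),
    BindingRule(frozenset({"finance"})),
]
upper_cd, lower_cd = cross_domain_bind(
    {"machining", "tooling", "manufacturing"},
    ["batch processing cutter management", "intelligent manufacturing cutter monitoring"],
    ["superalloy milling conditions", "high-speed cutting chatter"],
    rule_base)
```

Using one rule object with separate macro and micro maps reflects the description's point that the same rule base drives both bindings at different granularity.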

The hierarchical demand label group is structured into a question-answer demand label map according to the cross-level association relationships among its labels, the question-answer demand label map comprising K layers of expansion demand labels.

In one embodiment, after the hierarchical demand label group is obtained, its labels are structured into a question-answer demand label map according to the cross-level associations among them. Specifically, the selected question-answer demand labels are taken as root nodes, and the labels are organized into a series of demand label trees according to the relationships among labels at different levels; each tree expands layer by layer from its root and shows the hierarchical relationships among its labels. A semantic association matrix for each question-answer demand label is then invoked locally; these matrices represent the semantic similarity or relatedness among labels. On this basis, a cross-tree connector links the demand label trees, using their semantic association degrees and hierarchical relationships for precise matching and connection, and outputs a structured question-answer demand label map. The map comprises K layers of expansion demand labels, each layer representing label expansion at a different level, which allows the system to locate the labels at the corresponding level for a user question, perform answer matching and retrieval more effectively, and improve answer accuracy and adaptability.

Further, the application provides a method for structuring the hierarchical demand label group into the question-answer demand label map according to the cross-level association relationships among its labels, the question-answer demand label map comprising K layers of expansion demand labels, the method comprising:

Taking the N question-answer demand labels as root nodes and structuring the N hierarchical demand label subsets according to the cascade execution order of the containers in the hierarchical expansion container group to obtain N question-answer demand label trees; locally invoking N initial semantic association matrices of the N question-answer demand labels; and driving a cross-tree connector to connect the N question-answer demand label trees according to the N initial semantic association matrices, thereby outputting the question-answer demand label map.

Optionally, the N question-answer demand labels, which represent the core demands of the user question, are selected as root nodes. The hierarchical demand label subsets corresponding to the N root nodes are then structured according to the multi-container cascade execution order defined in the hierarchical expansion container group, namely the semantic expansion container, the hierarchical expansion container, the scene expansion container and the cross-domain expansion container, so that each subset is expanded layer by layer along the expansion levels of the container group into a tree with K layers of expansion demand labels. For example, the hierarchical expansion container splits the root node "cutter wear" into an upper label "machine tool cutter fault" and a lower label "rear cutter wear" (second layer); the scene expansion container associates these labels with "batch processing cutter management" and "superalloy milling" (third layer); and the cross-domain expansion container finally binds them to "cutter life prediction" (fourth layer), forming a question-answer demand label tree of depth K = 4. After the N question-answer demand label trees are structured, the initial semantic association matrix corresponding to each question-answer demand label is called locally. Each matrix measures the semantic relatedness between labels, reflecting their similarity and connection at the semantic level; it is usually computed with a natural language processing algorithm and represents the semantic distance between a label and the other labels.
Then, based on the initial semantic association matrices, the cross-tree connector is driven to connect the question-answer demand label trees. The cross-tree connector identifies related labels in different trees and links them, establishing associations between the trees, and the semantic similarity information in the association matrices guides each connection so that the links are accurate and effective. The result is a tightly connected, multi-layer, structured question-answer demand label map that covers the multidimensional demands of the user question, captures the user's actual intent, and supports the corresponding industry question-answer results. In this way, simple labels are mapped accurately onto complex demands, improving the efficiency and intelligence of the question-answering system.
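The tree structuring and cross-tree connection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: each label tree is supplied as a hypothetical list of (parent, child) edges, the N initial semantic association matrices are flattened into a single hypothetical `sim` lookup of pairwise scores, labels are assumed unique across trees, and the connector simply links labels from different trees whose score reaches an assumed threshold of 0.8. The labels "spindle vibration" and "chatter marks" are invented for the example.

```python
from itertools import combinations


def build_label_map(trees, sim, threshold=0.8):
    """Assemble N label trees into one label map with cross-tree links.

    trees: {root_label: [(parent, child), ...]} intra-tree edges (hypothetical format)
    sim:   {(label_a, label_b): score} stand-in for the initial semantic
           association matrices (assumed symmetric; missing pairs score 0)
    Returns (edges, tree_of): edges are (src, dst, kind) triples with kind
    "tree" or "cross"; tree_of maps each label to the root of its tree.
    """
    edges, tree_of = [], {}
    for root, tree_edges in trees.items():
        tree_of[root] = root
        for parent, child in tree_edges:
            edges.append((parent, child, "tree"))
            tree_of[parent] = root
            tree_of[child] = root

    # Cross-tree connector (assumed rule): link labels that belong to
    # different trees when their association score reaches the threshold.
    for a, b in combinations(sorted(tree_of), 2):
        if tree_of[a] == tree_of[b]:
            continue
        score = sim.get((a, b), sim.get((b, a), 0.0))
        if score >= threshold:
            edges.append((a, b, "cross"))
    return edges, tree_of


trees = {
    "cutter wear": [("cutter wear", "rear cutter wear"),
                    ("rear cutter wear", "superalloy milling")],
    "spindle vibration": [("spindle vibration", "chatter marks")],
}
sim = {("superalloy milling", "chatter marks"): 0.85,
       ("cutter wear", "spindle vibration"): 0.4}
edges, tree_of = build_label_map(trees, sim)
print([e for e in edges if e[2] == "cross"])
# → [('chatter marks', 'superalloy milling', 'cross')]
```

The pairwise scan is quadratic in the number of labels; for large label sets a nearest-neighbour index over label embeddings would be the usual replacement.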

Further, the method comprises:

Driving the cross-tree connector, according to the N initial semantic association matrices, to connect the N question-answer demand label trees by association to obtain an initial association map; presetting topology optimization rules, which comprise a plurality of differentiated optimization strategies for a plurality of association levels; and, taking the plurality of association levels as hierarchical topology optimization trigger conditions, optimizing the hierarchical network nodes of the initial association map according to the differentiated optimization strategies and outputting the question-answer demand label map.

Optionally, the cross-tree connector is first driven, according to the N initial semantic association matrices, to connect the N question-answer demand label trees. Each matrix reflects the semantic similarity between labels; highly related label combinations are identified from the matrices and the corresponding nodes in different trees are connected, forming an initial association map. This map contains multiple layers of labels and the semantic relations between them, so that every demand dimension of the user question is reflected in it. To optimize the initial association map, the preset topology optimization rules are read locally. These rules define a set of differentiated optimization strategies that perform operations such as node recombination, merging and splitting according to the hierarchical structure of the map, improving the reachability and accuracy of its nodes. Because the strategies are differentiated, the map can be adjusted flexibly for complex questions while the associations at each level remain reasonable. The association levels are then used as hierarchical topology optimization trigger conditions to start the optimization of the map.
At this stage, the connections between nodes are adjusted with a strategy suited to the semantic and structural characteristics of each level. For example, highly similar nodes are merged to reduce redundancy, which keeps the map compact and speeds up queries; edges between frequently queried nodes receive higher connection weights so that relevant answers are found faster during search; nodes whose out-degree and in-degree are both 0 are marked as isolated and removed from the map; and branches whose support is below a preset support threshold are pruned, where support is an index of label association strength that measures how often two or more labels co-occur in historical query data. This series of optimizations yields the question-answer demand label map, which reflects user demands accurately, improves question-answering efficiency in practice, and allows industry question-answer services to respond quickly and accurately to complex user questions.
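The four example strategies above (merge similar nodes, strengthen frequently queried connections, prune low-support branches, remove isolated nodes) can be sketched on a simple weighted edge dictionary. This is an illustration only: the map representation, the order in which the strategies run, every threshold (`merge_threshold`, `hot_threshold`, `boost`, `min_support`) and the labels "coolant choice" and "spindle noise" are hypothetical assumptions, not values taken from the source.

```python
from collections import defaultdict


def optimize_map(nodes, edges, sim, access_count, support,
                 merge_threshold=0.95, hot_threshold=100, boost=1.5,
                 min_support=0.05):
    """Apply the four example strategies to an initial association map.

    nodes: set of labels; edges: {(src, dst): weight}
    sim: {(a, b): similarity}; access_count: {label: query hits}
    support: {(src, dst): co-occurrence support in historical queries}
    All thresholds are hypothetical defaults.
    """
    # 1) Merge highly similar nodes into one canonical label (union-find style).
    alias = {}

    def canon(x):
        while x in alias:
            x = alias[x]
        return x

    for (a, b), s in sorted(sim.items()):
        if s >= merge_threshold and a in nodes and b in nodes:
            ra, rb = canon(a), canon(b)
            if ra != rb:
                alias[rb] = ra

    merged, merged_support = {}, {}
    for (u, v), w in edges.items():
        key = (canon(u), canon(v))
        if key[0] == key[1]:
            continue  # the edge collapsed inside a merged node
        merged[key] = max(merged.get(key, 0.0), w)
        merged_support[key] = max(merged_support.get(key, 0.0),
                                  support.get((u, v), 0.0))

    # 2) Strengthen edges that touch frequently queried nodes.
    hits = defaultdict(int)
    for n in nodes:
        hits[canon(n)] += access_count.get(n, 0)
    for u, v in merged:
        if hits[u] >= hot_threshold or hits[v] >= hot_threshold:
            merged[(u, v)] *= boost

    # 3) Prune branches whose support is below the preset minimum.
    kept = {k: w for k, w in merged.items() if merged_support[k] >= min_support}

    # 4) Drop isolated nodes (in-degree and out-degree both 0).
    connected = {n for edge in kept for n in edge}
    return connected, kept


nodes = {"cutter wear", "tool wear", "superalloy milling", "coolant choice", "spindle noise"}
edges = {("cutter wear", "superalloy milling"): 1.0,
         ("tool wear", "coolant choice"): 0.5,
         ("superalloy milling", "coolant choice"): 0.8}
support = {("cutter wear", "superalloy milling"): 0.2,
           ("tool wear", "coolant choice"): 0.1,
           ("superalloy milling", "coolant choice"): 0.01}
nodes_out, kept = optimize_map(nodes, edges, {("cutter wear", "tool wear"): 0.97},
                               {"superalloy milling": 150}, support)
```

In this run "tool wear" is folded into "cutter wear", the low-support branch between "superalloy milling" and "coolant choice" is pruned, and "spindle noise" is dropped as an isolated node.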

Based on a preset retrieval convergence vector, hierarchical label extraction of industry questions and answers is performed on the user question-answer information base along the K layers of expansion demand labels in the question-answer demand label map, and a minimum matching domain answer set is output.

In one embodiment,

Further, in the application, each node of the scientific research information double-spiral path comprises an index module. The index module comprises a sequential index pointing to the next node, a reverse sequential index pointing to the previous node, and a cross index jumping to the other spiral path, and at the spiral entry node the index module carries out parallel spiral retrieval from a plurality of entries.

Optionally, each node in the scientific research information double-spiral path comprises an index module that provides navigation for the node and guides the direction of the retrieval process. The index module contains three types of index: a sequential index, a reverse index and a cross index. The sequential index points to the next node in the path and is used for searching in order; the reverse index points to the previous node in the path so that retrieval can trace back and revisit it; and the cross index points to a node on the other spiral path, connecting the scientific research content path with the scientific research achievement path so that information can flow between the two. Once the spiral entry node has been located, the index module assists in multi-directional parallel retrieval: it can search forward to the next node (sequential search), trace back to the previous node (reverse search), and jump to a related node on the other spiral path through the cross index, so that related information is searched in several dimensions and directions at once. Guided by the spiral entry node, these three indexes allow retrieval to switch flexibly between scientific research content and scientific research achievements according to the user's intent, and the parallel search ensures that the information obtained is comprehensive and deep, from which the search return result is determined.
Through the cooperation of the index modules, retrieval efficiency is improved, scientific research information is covered from multiple angles, and the user can quickly find the most relevant scientific research content and achievements.
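One way to picture the index module is as a small record attached to each node, holding the sequential, reverse and cross indexes, with several entry nodes expanded concurrently. The sketch below assumes node ids such as "c1" (content path) and "r1" (achievement path), a hop limit `max_hops`, and a thread pool as the stand-in for "parallel" search; all of these are hypothetical choices for illustration, not the applicant's design.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IndexModule:
    next_id: Optional[str] = None   # sequential index: next node on the same path
    prev_id: Optional[str] = None   # reverse index: previous node on the same path
    cross_ids: list = field(default_factory=list)  # cross index: nodes on the other path


def build_double_spiral(content_path, result_path, cross_links):
    """Give every node of both paths an index module.

    content_path, result_path: ordered node ids of the two spiral paths
    cross_links: (content_node, result_node) pairs (hypothetical input)
    """
    index = {}
    for path in (content_path, result_path):
        for i, node in enumerate(path):
            index[node] = IndexModule(
                next_id=path[i + 1] if i + 1 < len(path) else None,
                prev_id=path[i - 1] if i > 0 else None)
    for a, b in cross_links:
        index[a].cross_ids.append(b)
        index[b].cross_ids.append(a)
    return index


def neighbours(index, node):
    m = index[node]
    return [n for n in (m.next_id, m.prev_id, *m.cross_ids) if n is not None]


def walk(index, entry, max_hops):
    """Expand outward from one entry node along all three index types."""
    seen, frontier = {entry}, [entry]
    for _ in range(max_hops):
        nxt = []
        for node in frontier:
            for n in neighbours(index, node):
                if n not in seen:
                    seen.add(n)
                    nxt.append(n)
        frontier = nxt
    return seen


def parallel_spiral_search(index, entries, max_hops=2):
    """Search from several entry nodes concurrently and merge the hits."""
    with ThreadPoolExecutor() as pool:
        return set().union(*pool.map(lambda e: walk(index, e, max_hops), entries))


index = build_double_spiral(["c1", "c2", "c3"], ["r1", "r2"], [("c2", "r1")])
print(sorted(parallel_spiral_search(index, ["c1", "r2"], max_hops=1)))
# → ['c1', 'c2', 'r1', 'r2']
```

In CPython, threads give concurrency rather than CPU parallelism; that is adequate when each hop involves I/O such as a database lookup, while purely in-memory walks would gain little from it.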

Further, carrying out parallel spiral retrieval from a plurality of entries through the index module at the spiral entry node comprises:

Obtaining the candidate retrieval points of the spiral entry node through the index module, calculating the similarity between the user intention vector and each candidate retrieval point, and determining, by the index module, the next retrieval direction from the candidate retrieval points according to the similarity results.

Optionally, starting from the spiral entry node, the candidate retrieval points of the node are first obtained through the index module. They comprise the next node, the previous node and the cross node: the next node follows the current node in the path and is used to continue searching forward; the previous node precedes it in the path and is used to trace back to earlier parts of the path; and the cross node lies on the other spiral path, allowing retrieval to cross between the scientific research content path and the scientific research achievement path and thus widening the retrieval range. For each candidate retrieval point, the cosine similarity between the user intention vector and the semantic vector encoding of the node is calculated to measure how well they match; a higher value indicates that the node is more relevant to the user's retrieval demand. The index module then selects the node with the highest similarity as the next retrieval direction, which together with the spiral entry node forms the optimal retrieval path and guides the subsequent retrieval steps, improving the accuracy and efficiency of retrieval and ensuring that the most relevant scientific research information is found.
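The direction choice described above reduces to an argmax of cosine similarity over the candidate retrieval points. The sketch below is a minimal version assuming plain Python lists as vectors and a hypothetical `(kind, node_id, vector)` tuple per candidate; a real system would use model-produced embeddings of much higher dimension.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def next_direction(intent_vec, candidates):
    """Pick the candidate retrieval point most similar to the user intent.

    candidates: list of (kind, node_id, vector), where kind is "next",
    "prev" or "cross" (hypothetical encoding). Returns (kind, node_id, score),
    or None when there are no candidates.
    """
    scored = [(kind, node_id, cosine(intent_vec, vec))
              for kind, node_id, vec in candidates]
    return max(scored, key=lambda c: c[2], default=None)


intent = [1.0, 0.0, 0.0]
cands = [("next", "c3", [0.9, 0.1, 0.0]),
         ("prev", "c1", [0.0, 1.0, 0.0]),
         ("cross", "r1", [1.0, 0.0, 0.0])]
print(next_direction(intent, cands))  # → ('cross', 'r1', 1.0)
```

When two candidates tie, `max` keeps the first one listed, so the candidate order acts as an implicit tie-breaker.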

In summary, the embodiment of the application has at least the following technical effects:

The method first labels the user question and performs multi-scale expansion of the demand labels to obtain a hierarchical demand label group; structures that group into a question-answer demand label map containing K layers of expansion demand labels according to the cross-level association relationship of its labels; performs hierarchical label extraction on the user question-answer information base along the K layers of expansion demand labels based on a preset retrieval convergence vector to output a minimum matching domain answer set; extracts K-1 layers of backtracking answer sets along the retrieval convergence vector and outputs an associated expansion answer set through association expansion aggregation; and finally assembles the minimum matching domain answer set and the associated expansion answer set into an enhanced industry question-answer response output. Together, these steps solve the technical problems of low answer matching precision and narrow coverage caused by the single level at which traditional industry question-answering systems understand user demands and by insufficient label association, and improve question-answering accuracy, richness and response speed through hierarchical expansion and association expansion of multidimensional labels.
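The backtracking, aggregation and assembly steps are not specified in detail here, so the following is only one plausible reading, stated as assumptions: the nested answer sets surviving each of the K extraction layers are retained, the K-1 backtracking sets are those of layers K-1 down to 1, answers outside the minimum matching domain set are ranked by the deepest layer they survived, and the response is a hypothetical dictionary with "primary" and "related" entries.

```python
def aggregate_and_assemble(layer_sets, max_related=5):
    """Build a response from the K answer sets of hierarchical extraction.

    layer_sets[i]: answer ids that survived layers 1..i+1 (assumed nested);
    layer_sets[-1] is the minimum matching domain answer set.
    """
    minimum = layer_sets[-1]
    depth = {}
    # Backtrack from layer K-1 down to layer 1 (the K-1 backtracking sets);
    # the first time an answer is seen records the deepest layer it reached.
    for layer in range(len(layer_sets) - 2, -1, -1):
        for ans in layer_sets[layer]:
            if ans not in minimum and ans not in depth:
                depth[ans] = layer + 1
    # Associated expansion set: deeper survivors first, ties broken by id.
    related = sorted(depth, key=lambda a: (-depth[a], a))[:max_related]
    return {"primary": sorted(minimum), "related": related}


layer_sets = [{"a1", "a2", "a3", "a4"}, {"a1", "a2", "a3"}, {"a1"}]
print(aggregate_and_assemble(layer_sets))
# → {'primary': ['a1'], 'related': ['a2', 'a3', 'a4']}
```

Ranking by survival depth keeps the answers that came closest to the minimum matching domain at the top of the associated expansion set.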

In a second embodiment, based on the same inventive concept as the industry question-answer enhancement method based on automatic extraction of multi-dimensional labels in the previous embodiment, and as shown in fig. 2, the application provides an industry question-answer enhancement system based on automatic extraction of multi-dimensional labels. The system comprises a multi-scale expansion module 11, a label structuring module 12, a label extraction module 13, an extension aggregation module 14 and a response output module 15. The multi-scale expansion module 11 obtains a hierarchical demand label group by multi-scale expansion of demand labels after labeling the user question; the label structuring module 12 structures the hierarchical demand label group into a question-answer demand label map, which comprises K layers of expansion demand labels, according to the cross-level association relationship of its labels; the label extraction module 13 performs hierarchical label extraction of industry questions and answers on the user question-answer information base along the K layers of expansion demand labels based on a preset retrieval convergence vector and outputs a minimum matching domain answer set; the extension aggregation module 14 extracts K-1 layers of backtracking answer sets along the retrieval convergence vector and outputs an associated expansion answer set through association expansion aggregation; and the response output module 15 assembles the minimum matching domain answer set and the associated expansion answer set into an enhanced industry question-answer response output.

Further, the multi-scale expansion module 11 is configured to perform the following method:

Further, the label structuring module 12 is configured to perform the following method:

Further, the label extraction module 13 is configured to perform the following method:

Extracting core dimension labels from each newly added user answer to obtain initial multi-dimensional labels; performing standardized multi-scale expansion on the initial multi-dimensional labels with the hierarchical expansion container group and storing the resulting label group in discretized form to obtain a discrete matching label group; and, after establishing a bidirectional index association between the discrete matching label group and the newly added user answer, adding the newly added user answer to the user question-answer information base.
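The ingestion step can be pictured as a store with a forward index (answer to labels) and a reverse index (label to answers), which together form the bidirectional association. In this sketch, core label extraction is not implemented (core labels are passed in), the container group is replaced by a caller-supplied `expand` function, "discretization" is taken to mean normalizing each label to a lower-case token with a weight, and the class name `QAInfoBase` and the toy expansion are hypothetical.

```python
from collections import defaultdict


class QAInfoBase:
    """Toy user question-answer information base with a bidirectional
    label index (hypothetical structure, not the applicant's schema)."""

    def __init__(self, expand):
        self.expand = expand                  # stand-in for the container group
        self.answers = {}                     # answer_id -> answer text
        self.labels_of = {}                   # forward index: answer_id -> {label: weight}
        self.answers_with = defaultdict(set)  # reverse index: label -> answer ids

    def add_answer(self, answer_id, text, core_labels):
        """Expand core labels, discretize them, index both ways, then store."""
        discrete = {}
        for label, weight in self.expand(core_labels).items():
            key = " ".join(label.lower().split())  # normalize to a discrete token
            discrete[key] = max(discrete.get(key, 0.0), weight)
        self.labels_of[answer_id] = discrete
        for key in discrete:
            self.answers_with[key].add(answer_id)
        self.answers[answer_id] = text


def toy_expand(labels):
    # Hypothetical expansion: keep each label and add a weaker "fault" variant.
    out = {label: 1.0 for label in labels}
    out.update({label + " fault": 0.5 for label in labels})
    return out


kb = QAInfoBase(toy_expand)
kb.add_answer("ans1", "Replace the worn insert.", ["Cutter  Wear"])
print(kb.labels_of["ans1"])  # → {'cutter wear': 1.0, 'cutter wear fault': 0.5}
```

Keeping both directions lets retrieval go from query labels to candidate answers through `answers_with`, while `labels_of` supplies each answer's weighted label group for scoring.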

Extracting the first layer of expansion demand labels from the question-answer demand label map along the retrieval convergence vector; using the first layer of expansion demand labels to traverse the discrete matching label group of each historical user question-answer in the user question-answer information base and perform a weighted similarity calculation, so as to screen out an initial matching domain answer set that meets a predefined similarity threshold; and performing hierarchical label extraction on the initial matching domain answer set along the remaining K-1 layers of expansion demand labels in the question-answer demand label map until the minimum matching domain answer set is output.
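The layer-by-layer narrowing described above can be sketched as repeated threshold filtering. The source does not fix the weighted similarity formula, so this sketch assumes a simple "weighted coverage" (the share of a layer's label weight found in an answer's label group), an assumed threshold of 0.3, and the rule that extraction stops at the last non-empty set rather than returning nothing; the labels and answer ids are hypothetical.

```python
def weighted_coverage(layer_labels, answer_labels):
    """Share of the layer's label weight covered by the answer's label group
    (an assumed weighted similarity; the source does not fix the formula)."""
    total = sum(layer_labels.values())
    hit = sum(w for label, w in layer_labels.items() if label in answer_labels)
    return hit / total if total else 0.0


def hierarchical_extract(layers, answer_labels, threshold=0.3):
    """Filter answers layer by layer.

    layers: K dicts {label: weight}, ordered from layer 1 to layer K
    answer_labels: {answer_id: {label: weight}} discrete matching label groups
    Returns (minimum_set, layer_sets); layer_sets[i] holds the survivors of layer i+1.
    """
    candidates = set(answer_labels)
    layer_sets = []
    for layer in layers:
        kept = {aid for aid in candidates
                if weighted_coverage(layer, answer_labels[aid]) >= threshold}
        if not kept:
            # Assumption: keep the last non-empty set instead of emptying it;
            # if even layer 1 matches nothing, nothing is returned.
            if not layer_sets:
                return set(), []
            break
        layer_sets.append(kept)
        candidates = kept
    return candidates, layer_sets


answer_labels = {
    "ans1": {"cutter wear": 1.0, "superalloy milling": 0.6},
    "ans2": {"cutter wear": 1.0},
    "ans3": {"spindle vibration": 1.0},
}
layers = [{"cutter wear": 1.0, "tool wear": 0.8},
          {"superalloy milling": 0.6, "batch processing cutter management": 0.6}]
minimum, layer_sets = hierarchical_extract(layers, answer_labels)
print(minimum)  # → {'ans1'}
```

Layer 1 keeps "ans1" and "ans2" (coverage 1.0 / 1.8 each) and layer 2 keeps only "ans1" (0.6 / 1.2), which becomes the minimum matching domain answer set.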

It should be noted that the order of the embodiments of the present application is for description only and does not indicate their relative merits. The foregoing describes specific embodiments of this specification. The processes depicted in the accompanying drawings do not necessarily require the particular or sequential order shown to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The foregoing describes only preferred embodiments of the application and is not intended to limit the application to the precise form disclosed; any modifications, equivalents and alternatives falling within the spirit and scope of the application are intended to be included within its scope.

The specification and figures are merely exemplary illustrations of the present application and are considered to cover any and all modifications, variations, combinations, or equivalents that fall within the scope of the application. It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the scope of the application. Thus, the present application is intended to include such modifications and alterations insofar as they come within the scope of the application or the equivalents thereof.

Claims

1. An industry question-answer enhancement method based on automatic extraction of multi-dimensional labels, characterized in that the method comprises:

After labeling a user question, obtaining a hierarchical demand label group by multi-scale expansion of demand labels;

Structuring the hierarchical demand label group into a question-answer demand label graph according to the cross-level association relationship of its labels, wherein the question-answer demand label graph comprises K layers of extended demand labels;

Based on a preset retrieval convergence vector, performing hierarchical label extraction of industry questions and answers on a user question-answer information database along the K layers of extended demand labels in the question-answer demand label graph, and outputting a minimum matching domain answer set;

After extracting K-1 layers of backtracking answer sets along the retrieval convergence vector, outputting an associated extended answer set through association extension aggregation;

Assembling the minimum matching domain answer set and the associated extended answer set into an enhanced industry question-answer response output.

2. The industry question-answer enhancement method based on automatic extraction of multi-dimensional labels according to claim 1, characterized in that obtaining a hierarchical demand label group by multi-scale expansion of demand labels after labeling the user question comprises:

Performing domain-adaptive text processing on the user question to generate a standardized query text;

Dynamically extracting core semantic units of the standardized query text through a preset industry keyword library to generate a question-answer demand label group;

Performing multi-scale expansion on the question-answer demand label group to obtain the hierarchical demand label group.

3. The industry question-answer enhancement method based on automatic extraction of multi-dimensional labels according to claim 2, characterized in that performing multi-scale expansion on the question-answer demand label group to obtain the hierarchical demand label group comprises:

Pre-constructing a hierarchical extension container group, wherein the hierarchical extension container group comprises a cascaded semantic extension container, hierarchical extension container, scenario extension container and cross-domain extension container;

Loading the N question-answer demand labels in the question-answer demand label group into the hierarchical extension container group, performing multi-scale label expansion through the semantic extension container, the hierarchical extension container, the scenario extension container and the cross-domain extension container, weighting the expansion results differentially according to their source containers, and outputting N hierarchical demand label subsets;

The N hierarchical demand label subsets constitute the hierarchical demand label group.

4. The industry question-answer enhancement method based on automatic extraction of multi-dimensional labels according to claim 3, characterized in that structuring the hierarchical demand label group into a question-answer demand label graph comprising K layers of extended demand labels according to the cross-level association relationship of its labels comprises:

Taking the N question-answer demand labels as root nodes, and structuring the N hierarchical demand label subsets according to the multi-container cascade execution order of the hierarchical extension container group to obtain N question-answer demand label trees;

Locally calling N initial semantic association matrices of the N question-answer demand labels;

Driving a cross-tree connector, based on the N initial semantic association matrices, to connect the N question-answer demand label trees by association, and outputting the question-answer demand label graph.

5. The industry question-answer enhancement method based on automatic extraction of multi-dimensional labels according to claim 4, characterized in that the method further comprises:

Driving the cross-tree connector, according to the N initial semantic association matrices, to connect the N question-answer demand label trees by association to obtain an initial association graph;

Presetting a topology optimization rule, wherein the topology optimization rule comprises a plurality of differentiated optimization strategies for a plurality of association levels;

Taking the plurality of association levels as trigger conditions for hierarchical topology optimization, optimizing the hierarchical network nodes of the initial association graph according to the plurality of differentiated optimization strategies, and outputting the question-answer demand label graph.

6. The industry question-answer enhancement method based on automatic extraction of multi-dimensional labels according to claim 3, characterized in that the method further comprises:

Extracting core dimension labels from a newly added user answer to obtain initial multi-dimensional labels;

Performing standardized multi-scale expansion on the initial multi-dimensional labels using the hierarchical extension container group, and discretizing and storing the obtained label group to obtain a discrete matching label group;

After establishing a bidirectional index association between the discrete matching label group and the newly added user answer, adding the newly added user answer to the user question-answer information database.

7. The industry question-answer enhancement method based on automatic extraction of multi-dimensional labels according to claim 6, characterized in that performing, based on a preset retrieval convergence vector, hierarchical label extraction of industry questions and answers on the user question-answer information database along the K layers of extended demand labels in the question-answer demand label graph and outputting a minimum matching domain answer set comprises:

Extracting the first layer of extended demand labels in the question-answer demand label graph along the retrieval convergence vector;

Using the first layer of extended demand labels to traverse the discrete matching label group of each historical user question-answer in the user question-answer information database and perform a weighted similarity calculation, so as to screen out an initial matching domain answer set that meets a predefined similarity threshold;

Iteratively performing hierarchical label extraction on the initial matching domain answer set along the remaining K-1 layers of extended demand labels in the question-answer demand label graph until the minimum matching domain answer set is output.

8. The industry question-answer enhancement method based on automatic extraction of multi-dimensional labels according to claim 3, characterized in that the method further comprises:

After loading a first question-answer demand label into the hierarchical extension container group, performing synonym derivation and expansion on the first question-answer demand label through the semantic extension container to obtain a first synonymous extension label group;

Loading the first synonymous extension label group into the hierarchical extension container for differentiated upper-level and lower-level expansion to obtain a first upper extension label group and a first lower extension label group;

Loading the first upper extension label group and the first lower extension label group into the scenario extension container for multi-dimensional associated scenario expansion to obtain a first upper multi-dimensional label group and a first lower multi-dimensional label group;

After identifying a first knowledge association node of the first question-answer demand label, performing cross-domain label binding on the first upper multi-dimensional label group and the first lower multi-dimensional label group in the cross-domain extension container, with the first knowledge association node as a domain constraint, to obtain a first upper cross-domain label group and a first lower cross-domain label group;

Assigning weights to the first synonymous extension label group, the first upper extension label group, the first lower extension label group, the first upper multi-dimensional label group, the first lower multi-dimensional label group, the first upper cross-domain label group and the first lower cross-domain label group according to their source containers, and outputting a first hierarchical demand label subset.
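The container cascade of claim 8 can be illustrated with toy lookup tables standing in for the four containers. Everything concrete here is a hypothetical assumption: the lexicon dictionaries, the source weights in `SOURCE_WEIGHTS`, the synonym "tool wear", the domain "machining" used as the knowledge association node, and the assignment of the third-layer labels from the "cutter wear" example to the upper and lower branches. The macro and micro binding of claim 9 is collapsed into a single `(domain, label)` lookup for brevity.

```python
# Hypothetical weights per source container (not specified in the source).
SOURCE_WEIGHTS = {"semantic": 0.9, "hierarchical": 0.8, "scene": 0.6, "cross_domain": 0.5}


def expand_label(root, synonyms, hierarchy, scenes, bindings, domain):
    """Cascade one demand label through four toy containers.

    synonyms:  {label: [synonym, ...]}            semantic container
    hierarchy: {label: ([upper...], [lower...])}  hierarchical container
    scenes:    {label: [scene label, ...]}        scenario container
    bindings:  {(domain, label): [bound label]}   cross-domain container
    Returns the hierarchical demand label subset as {label: weight}.
    """
    synonym_group = synonyms.get(root, [])
    terms = [root] + synonym_group
    upper = [u for t in terms for u in hierarchy.get(t, ([], []))[0]]
    lower = [l for t in terms for l in hierarchy.get(t, ([], []))[1]]
    upper_scene = [s for u in upper for s in scenes.get(u, [])]
    lower_scene = [s for l in lower for s in scenes.get(l, [])]
    # The domain (knowledge association node) constrains which bindings apply.
    upper_cross = [b for s in upper_scene for b in bindings.get((domain, s), [])]
    lower_cross = [b for s in lower_scene for b in bindings.get((domain, s), [])]

    groups = [(synonym_group, "semantic"), (upper, "hierarchical"),
              (lower, "hierarchical"), (upper_scene, "scene"),
              (lower_scene, "scene"), (upper_cross, "cross_domain"),
              (lower_cross, "cross_domain")]
    subset = {}
    for labels, source in groups:
        for label in labels:
            # Keep the strongest weight when a label arrives from several containers.
            subset[label] = max(subset.get(label, 0.0), SOURCE_WEIGHTS[source])
    return subset


subset = expand_label(
    "cutter wear",
    synonyms={"cutter wear": ["tool wear"]},
    hierarchy={"cutter wear": (["machine tool cutter fault"], ["rear cutter wear"])},
    scenes={"machine tool cutter fault": ["batch processing cutter management"],
            "rear cutter wear": ["superalloy milling"]},
    bindings={("machining", "superalloy milling"): ["cutter life prediction"]},
    domain="machining",
)
print(subset["cutter life prediction"])  # → 0.5
```

Weighting by source container lets later matching favour close synonyms over loosely related cross-domain labels.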

9. The industry question-answer enhancement method based on automatic extraction of multi-dimensional labels according to claim 8, characterized in that performing cross-domain label binding on the first upper multi-dimensional label group and the first lower multi-dimensional label group in the cross-domain extension container, with the first knowledge association node as a domain constraint, to obtain the first upper cross-domain label group and the first lower cross-domain label group comprises:

Extracting industry attribute features of the first knowledge association node, and matching a preset cross-domain binding rule library based on the industry attribute features;

Using the cross-domain binding rule library to perform macro cross-domain binding of the first upper multi-dimensional label group, and outputting the first upper cross-domain label group;

Using the cross-domain binding rule library to perform micro cross-domain binding of the first lower multi-dimensional label group, and outputting the first lower cross-domain label group.

10. An industry question-answer enhancement system based on automatic extraction of multi-dimensional labels, characterized in that the system is used to execute the industry question-answer enhancement method based on automatic extraction of multi-dimensional labels according to any one of claims 1 to 9, and the system comprises:

Multi-scale expansion module: after labeling a user question, performs multi-scale expansion of demand labels to obtain a hierarchical demand label group;

Label structuring module: structures the hierarchical demand label group into a question-answer demand label graph according to the cross-level association relationship of its labels, wherein the question-answer demand label graph comprises K layers of extended demand labels;

Label extraction module: based on a preset retrieval convergence vector, performs hierarchical label extraction of industry questions and answers on the user question-answer information database along the K layers of extended demand labels in the question-answer demand label graph, and outputs a minimum matching domain answer set;

Extension aggregation module: after extracting K-1 layers of backtracking answer sets along the retrieval convergence vector, outputs an associated extended answer set through association extension aggregation;

Response output module: assembles the minimum matching domain answer set and the associated extended answer set into an enhanced industry question-answer response output.