CN113254612B

CN113254612B - Knowledge question answering processing method, device, equipment and storage medium

Info

Publication number: CN113254612B
Application number: CN202110565939.2A
Authority: CN
Inventors: 孙泽烨; 李炫�; 陈思姣
Original assignee: Ping An Life Insurance Company of China Ltd
Current assignee: Ping An Life Insurance Company of China Ltd
Priority date: 2021-05-24
Filing date: 2021-05-24
Publication date: 2024-12-31
Anticipated expiration: 2041-05-24
Also published as: CN113254612A

Abstract

The present invention discloses a knowledge question-answering processing method, device, equipment and storage medium. It relates to the field of language processing technology, and its main purpose is to solve the low efficiency of existing processing of unconventional knowledge questions and answers. The method includes: obtaining first-class sentence information to be processed for knowledge question answering, and parsing the data source of the first-class sentence information; matching a trained unified language model according to the parsed data source, where the unified language model is trained on second-class sentence information in different data sources, and the second-class sentence information has a replacement relationship with the first-class sentence information; and performing question-answer processing on the first-class sentence information with the matched unified language model to generate question-answer information for the first-class sentence information. The invention is mainly used for knowledge question-answering processing.

Description

Knowledge question-answering processing method, device, equipment and storage medium

Technical Field

The present invention relates to the field of language processing technologies, and in particular, to a knowledge question-answering processing method, apparatus, device, and storage medium.

Background

With the rapid development of natural language technology, solutions for frequently asked questions (FAQ) have become increasingly intelligent, and more and more enterprises use FAQ question-answering systems to resolve the various questions of online users without human agents. For common questions, that is, questions that users raise frequently, FAQ question-answering systems already answer completely and accurately by means such as big data processing and machine learning. For uncommon questions, however, the accuracy of the answers remains low.

At present, non-frequent question-answer sentences are generally handled by collecting them and feeding them back to the system back end, where answers are written manually. Because the volume of such questions is very large, this consumes substantial human resources, and manual writing can hardly cover every question and answer, which limits the processing efficiency of knowledge question answering.

Disclosure of Invention

In view of the above, the present invention provides a knowledge question-answering processing method, apparatus, device and storage medium, whose main purpose is to solve the low efficiency of existing processing of unconventional knowledge questions and answers.

According to one aspect of the present invention, there is provided a knowledge question-answering processing method, including:

acquiring first-class sentence information to be subjected to knowledge question-answering processing, and parsing the data source of the first-class sentence information;

matching a trained unified language model according to the parsed data source, wherein the unified language model is obtained by training on second-class sentence information in different data sources, and the second-class sentence information has a replacement relationship with the first-class sentence information;

and performing question-answer processing on the first-class sentence information according to the matched unified language model, to generate question-answer information of the first-class sentence information.

According to another aspect of the present invention, there is provided a knowledge question-answering processing apparatus including:

an acquisition module, configured to acquire first-class sentence information to be subjected to knowledge question-answering processing and to parse the data source of the first-class sentence information;

a matching module, configured to match a trained unified language model according to the parsed data source, wherein the unified language model is obtained by training on second-class sentence information in different data sources, and the second-class sentence information has a replacement relationship with the first-class sentence information;

and a processing module, configured to perform question-answer processing on the first-class sentence information according to the matched unified language model and to generate question-answer information of the first-class sentence information.

According to still another aspect of the present invention, there is provided a storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the knowledge question-answering processing method as described above.

According to yet another aspect of the present invention, there is provided a computer device comprising a processor, a memory, a communication interface and a communication bus, the processor, the memory and the communication interface completing communication with each other through the communication bus;

the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the knowledge question-answering processing method.

The technical solution provided by the embodiments of the invention has at least the following advantages:

Compared with the prior art, the embodiment of the invention acquires first-class sentence information to be subjected to knowledge question-answering processing and parses the data source of the first-class sentence information; matches a trained unified language model according to the parsed data source, the unified language model being trained on second-class sentence information in different data sources, and the second-class sentence information having a replacement relationship with the first-class sentence information; and performs question-answer processing on the first-class sentence information according to the matched unified language model to generate question-answer information of the first-class sentence information. This achieves full question-answer coverage for unconventional sentence information, greatly reduces the manpower and material resources needed to recognize question-answer sentences, and improves the processing efficiency of knowledge question answering.

The foregoing is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the invention become more readily apparent, specific embodiments of the invention are set out below.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

FIG. 1 shows a flow chart of a knowledge question-answering processing method provided by an embodiment of the invention;

FIG. 2 shows a UNILM model network architecture diagram provided by an embodiment of the present invention;

FIG. 3 shows a block diagram of a knowledge question-answering processing apparatus provided by an embodiment of the present invention;

FIG. 4 shows a schematic structural diagram of a computer device provided by an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The embodiment of the invention provides a knowledge question-answering processing method which, as shown in FIG. 1, comprises the following steps:

101. Acquire first-class sentence information to be subjected to knowledge question-answering processing, and parse the data source of the first-class sentence information.

In the embodiment of the invention, the first-class sentence information to be subjected to knowledge question answering is a sentence that has already gone through conventional question-answer recognition once; that is, the first-class sentence information is sentence information whose question-answer information was not found in the preset question-answer library, while the second-class sentence information is sentence information whose question-answer information was found in the preset question-answer library. The preset question-answer library stores a large amount of question-answer information for different sentence information, obtained by running the trained unified language model. Specifically, a sentence whose (question, answer) pair can be recognized directly from the established knowledge question-answer library is a conventional question-answer sentence, and a sentence for which this is not possible is an unconventional question-answer sentence; the unconventional question-answer sentence is therefore the first-class sentence information, and the conventional question-answer sentence is the second-class sentence information. The data source denotes where the first-class sentence information to be recognized is stored, including but not limited to a question-answer knowledge base, a knowledge graph base, a product list database and a product clause database. The specific parsing method may be determined from the storage path of the first-class sentence information.

It should be noted that the knowledge question-answering processing of the embodiment of the invention may be applied only to sentences that cannot be recognized from the knowledge question-answer library, and that it may be used in application scenarios with different requirements, such as product transaction applications and web page question answering. Conventional question-answer sentences are the large number of sentences for which corresponding (question, answer) pairs have already been obtained through big data analysis; unconventional question-answer sentences are the large number of sentences that have not yet been processed accurately. In general, both kinds are consultation questions that users habitually raise about different businesses and products, and the embodiment of the invention places no specific limit on them.

102. Match a trained unified language model according to the parsed data source.

Because different data sources correspond to different data storage structures, in the embodiment of the invention a unified language model is trained for each data source from the training sample set in that data source. The unified language models are trained on the second-class sentence information in the different data sources, meaning that different data sources correspond to different unified language models, so the corresponding model can be matched directly once the specific data source has been parsed. In addition, the second-class sentence information consists of conventional question-answer sentences, so the unified language model trained on conventional question-answer sentences serves as a substitute model for question-answer recognition of unconventional question-answer sentences. Specifically, the second-class sentence information has a replacement relationship with the first-class sentence information: the first-class sentence information replaces the second-class sentence information as the input parameter of the unified language model. That is, the first-class sentence information is processed as the input parameter of the unified language model trained on the second-class sentence information, which yields the question-answer information of the first-class sentence information.

It should be noted that the first-class sentence information to be subjected to knowledge question-answering processing may already be stored in the databases of different data sources, or may be intended for storage in the databases of the corresponding data sources. When the data source is parsed, it may therefore be identified from either the path where the information is already stored or the path where it is expected to be stored; the embodiment of the invention places no specific limit on this.

103. Perform question-answer processing on the first-class sentence information according to the matched unified language model, and generate question-answer information of the first-class sentence information.

In the embodiment of the invention, once the unified language model has been matched from the data source of the first-class sentence information, the first-class sentence information is used as the input parameter of the unified language model and model processing is performed, yielding the question-answer information corresponding to the first-class sentence information. Using the unified language model trained on the second-class sentence information as the recognition model for the first-class sentence information expands the knowledge question-answer library, greatly improves the coverage of unconventional questions and answers, and saves a large amount of manpower and material resources.

It should be noted that, in the embodiment of the present invention, a question-answer sentence is sentence information for which a question and an answer need to be identified, so the question-answer information obtained by processing with the unified language model includes the (question, answer) pair corresponding to the sentence information.
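Steps 101 to 103 can be pictured with the minimal Python sketch below. It is an illustration under stated assumptions, not the implementation of the embodiment: the source names, the path layout `/<source>/` and the stub models are all hypothetical, and a real system would load one trained unified language model per data source.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for a trained per-source unified language model:
# it maps a first-class sentence to a (question, answer) pair.
Model = Callable[[str], Tuple[str, str]]

def make_stub_model(source: str) -> Model:
    def model(sentence: str) -> Tuple[str, str]:
        return (sentence, f"<answer generated by the {source} model>")
    return model

# One model per data source (the four source names are assumed labels).
MODELS: Dict[str, Model] = {
    source: make_stub_model(source)
    for source in ("faq", "knowledge_graph", "product_list", "product_clause")
}

def parse_data_source(storage_path: str) -> str:
    """Step 101: derive the data source from the storage path (assumed layout)."""
    for source in MODELS:
        if f"/{source}/" in storage_path:
            return source
    raise ValueError(f"no known data source in path: {storage_path}")

def answer(sentence: str, storage_path: str) -> Tuple[str, str]:
    source = parse_data_source(storage_path)  # step 101: parse the data source
    model = MODELS[source]                    # step 102: match the trained model
    return model(sentence)                    # step 103: generate the (question, answer) pair
```

Keeping the models in a dictionary keyed by data source mirrors the statement that different data sources correspond to different unified language models.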

In an embodiment of the invention, in order to cover unconventional questions and answers on the basis of the conventional question-answer databases, the data sources comprise a question-answer knowledge base, a knowledge graph base, a product list database and a product clause database. Before the trained unified language model is matched according to the parsed data source, the method further comprises: obtaining the second-class sentence information training sample sets in the question-answer knowledge base, the knowledge graph base, the product list database and the product clause database respectively; training unified language models whose language network has been constructed with these training sample sets, to obtain trained unified language models suited respectively to the question-answer knowledge base, the knowledge graph base, the product list database and the product clause database; and establishing a replacement link between the input parameters of the unified language model and the first-class sentence information.

Specifically, in the embodiment of the present invention the data sources include at least a question-answer knowledge base, a knowledge graph base, a product list database and a product clause database. For the first-class sentence information to be substituted directly as the input parameter of the trained unified language model corresponding to the second-class sentence information, the unified language model must be trained in advance. The question-answer knowledge base, the knowledge graph base, the product list database and the product clause database each store a training sample set matching the second-class sentence information, namely the second-class sentence information training sample set, and during model training the corresponding training samples are obtained for each data source. A unified language model (Unified pre-trained Language Model, UNILM) is then trained with the second-class sentence information training sample sets of the different data sources: the language network is constructed first, as shown in FIG. 2, and the model is then trained with the training sample set. After training, so that the first-class sentence information can be substituted as the input parameter, a replacement link between the input parameters of the unified language model and the first-class sentence information is established, allowing the trained model to be used directly when the first-class sentence information is recognized.

The training process of the UNILM model includes: (1) defining the loss function of the model, for example cross entropy; (2) updating the model parameters by gradient descent on the loss function; and (3) completing model training when the loss function falls below a threshold. Structurally, the model produces its final output by passing, from bottom to top, through an embedding layer, a Transformer layer and an output layer. All inputs are text sentences: the text is segmented into tokens (each token corresponds to a word or a punctuation mark), the embedding layer maps the tokens to vectors, the Transformer layer performs the computation, and the output layer finally computes the output text as a (question, answer) pair, where each token of the output text is the token with the maximum probability value. The model input X is a text sequence, which may be a single text segment or a pair of text segments. During training, the main network consists of 24 Transformer layers: the input vectors {x_i} are packed into H^0 = [x_1, ..., x_|x|] and passed through the 24-layer Transformer network to train the model.

In addition, because the output question-answer information takes the form of a (question, answer) pair, the output form is preset to a (question, answer) pair when the UNILM model is trained. The resulting question-answer information therefore exists as a (question, answer) pair whether the input parameter is a single word, a single sentence or a text paragraph, which improves the recognition coverage of the question-answer information.
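The three training steps (define a cross-entropy loss, update parameters by gradient descent, stop once the loss falls below a threshold) can be sketched on a deliberately tiny model. The example below trains a plain softmax classifier in pure Python rather than a 24-layer Transformer; the learning rate, threshold and step limit are illustrative assumptions and are not taken from the description.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(samples, num_classes, lr=0.5, threshold=0.05, max_steps=10_000):
    """Train a linear softmax classifier; samples is a list of (features, target_class)."""
    dim = len(samples[0][0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    loss = float("inf")
    for _ in range(max_steps):
        grads = [[0.0] * dim for _ in range(num_classes)]
        loss = 0.0
        for x, y in samples:
            logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
            probs = softmax(logits)
            loss += -math.log(probs[y])            # (1) cross-entropy loss
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                for i in range(dim):
                    grads[c][i] += err * x[i]
        loss /= len(samples)
        if loss < threshold:                       # (3) stop once the loss is below the threshold
            break
        for c in range(num_classes):               # (2) gradient-descent parameter update
            for i in range(dim):
                weights[c][i] -= lr * grads[c][i] / len(samples)
    return weights, loss
```

The loss is checked before each update, so the returned loss always corresponds to the returned weights.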

In an embodiment of the invention, to further define how first-class sentence information from different data sources is fed to the corresponding trained unified language model, and so achieve wide coverage of unconventional knowledge question-answer recognition, performing question-answer processing on the first-class sentence information according to the matched unified language model comprises: if the data source is the question-answer knowledge base, using the first-class sentence information as the input parameter of the unified language model according to the replacement link and performing model operation processing; if the data source is the knowledge graph base, splitting the first-class sentence information according to the triple form of the knowledge graph, and using the resulting sentence subject and sentence predicate as input parameters of the unified language model according to the replacement link and performing model operation processing; if the data source is the product list database, extracting list structured data of the first-class sentence information from the product list database, and using the extracted list structured data as input parameters of the unified language model according to the replacement link and performing model operation processing; and if the data source is the product clause database, extracting clause structured data of the first-class sentence information from the product clause database, and using the extracted clause structured data as input parameters of the unified language model according to the replacement link and performing model operation processing.

In the embodiment of the invention, for a specific question-answer knowledge base, such as an FAQ question-answer knowledge base: if the data source is the question-answer knowledge base, the first-class sentence information is used as the input parameter of the unified language model according to the replacement link, and question-answer processing is performed. Specifically, the FAQ question-answer knowledge base is a database that developers have built by completing knowledge question-answer recognition for conventional question-answer sentences; it stores a large number of input entity names and questions together with the corresponding output (question, answer) pairs. During recognition, the first-class sentence information is therefore passed, according to the replacement link, as the input parameter of the unified language model trained on the training sample set of second-class sentence information, and operation processing yields the question-answer information. For example, a UNILM model is trained on sentence content such as "Xinsheng insurance feature" from the training sample set of second-class sentence information; once the data source of the first-class sentence information is determined to be the FAQ question-answer knowledge base, the first-class sentence "what is the selling point of Jin Rui life" is substituted as the input parameter of the UNILM model, and question-answer information is obtained. It should be noted that most of the sentence information stored in the FAQ question-answer knowledge base concerns general questions in application scenarios such as insurance products and the selection of insurance products, for example how a claim on a product is settled and what the features of a product are.

In the embodiment of the invention, for a specific knowledge graph base: data in the knowledge graph base are stored as (S, P, O) triples, where S denotes the subject, P the predicate and O the object. For example, "the waiting period of peaceful blech is 90 days" is stored in the knowledge graph base as (peaceful blech, waiting period, 90 days). The UNILM model is therefore trained with triples as input samples. Correspondingly, if the data source is the knowledge graph base, the first-class sentence information is split according to the triple form of the knowledge graph, and the resulting sentence subject and sentence predicate are used as input parameters of the unified language model according to the replacement link for question-answer processing. Specifically, the first-class sentence information is split according to the triple storage structure of the knowledge graph; the subject obtained is an entity and the predicate is an attribute, and the entity and attribute are then used as input parameters of the UNILM model, completing question-answer recognition of the first-class sentence information. It should be noted that most of the sentence information stored in the knowledge graph base is suited to questions about existing products and to question-answer recognition over existing triples, including but not limited to product information and insurance questions, for example recognizing a question-answer pair such as the grace period of peace and happiness.
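The splitting step can be illustrated as follows. This is a rough sketch that assumes the subject and predicate can be found by simple substring matching against known triples; the description does not say how the split is performed, and the `[SEP]` separator used to join the two parts is likewise an assumed input format.

```python
from typing import Optional, Tuple

# The single triple reproduces the example in the text; a real knowledge
# graph base would hold many (subject, predicate, object) triples.
TRIPLES = [("peaceful blech", "waiting period", "90 days")]

def split_question(question: str) -> Optional[Tuple[str, str]]:
    """Return the (subject, predicate) mentioned in the question, if any."""
    text = question.lower()
    for subject, predicate, _obj in TRIPLES:
        if subject in text and predicate in text:
            return subject, predicate
    return None

def build_model_input(question: str) -> Optional[str]:
    """Join subject and predicate into one model input string (assumed format)."""
    parts = split_question(question)
    if parts is None:
        return None
    subject, predicate = parts
    return f"{subject} [SEP] {predicate}"
```

For instance, `build_model_input("What is the waiting period of Peaceful Blech?")` produces the entity and attribute joined by the separator, which would then be passed to the model trained on triples.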

In the embodiment of the invention, for a specific product list database: since the data stored in the product list database are lists in PDF form and the stored sentences are structured, if the data source is the product list database, list structured data of the first-class sentence information are extracted from the product list database, and the extracted list structured data are used as input parameters of the unified language model according to the replacement link for question-answer processing. Specifically, the structured data in the PDF list are extracted with a pipeline technique, that is, the table content is converted into structured sentence content; once structuring is complete, the resulting list structured data are used as input parameters of the UNILM model, completing question-answer recognition of the first-class sentence information. Because the UNILM model was trained on structured data extracted from the second-class sentence information, the first-class sentence information can be substituted for it. When substituting, the first list of the structured data, namely the list header, must be traversed and used as the input parameter, and the trained model outputs the list content under the corresponding conditions.
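Converting table content into structured sentence content might look like the sketch below. It assumes the table has already been read out of the PDF into a header row and data rows (the extraction itself is not shown), and the "column: value" sentence format is an illustrative choice rather than the format used by the embodiment. The sample row reuses figures from the premium example in this description.

```python
def rows_to_sentences(header, rows):
    """Turn each table row into one sentence of 'column: value' pairs keyed by the header."""
    sentences = []
    for row in rows:
        pairs = [f"{name}: {value}" for name, value in zip(header, row)]
        sentences.append("; ".join(pairs))
    return sentences

# Hypothetical column names for a premium table.
HEADER = ["age", "sex", "coverage to age", "premium (yuan)"]
ROWS = [[16, "male", 80, 144], [16, "male", 100, 160]]
STRUCTURED = rows_to_sentences(HEADER, ROWS)
```

Pairing every cell with its header keeps the conditions (age, sex, coverage) attached to the value they qualify, which is what lets a later lookup filter by those conditions.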

Since the table content and the header in the list structured data may each be a single sentence or several sentences (a text paragraph), the first-class sentence information is processed by a semantic dependency analyzer in order to extract an entity and an intention as model input parameters. For example, suppose the first-class sentence information, asked by a user, is "how much does a 16-year-old male pay for safe man". The entity extraction module in the semantic dependency analyzer extracts "safe man" as the entity and identifies the intention "premium consultation"; the analyzer can further parse out the two constraint conditions "16 years old" and "male". The entity and the intention are then used as model input parameters and, combined with the two constraint conditions, the output is determined to be "144 yuan for coverage to age 80 and 160 yuan for coverage to age 100".
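The roles of entity, intention and constraint conditions can be shown with a toy stand-in for the semantic dependency analyzer. The regular expressions and the in-memory table below are assumptions made for illustration; the description does not specify how the analyzer works, and in the embodiment the answer comes from the trained model rather than a direct table filter. The two table rows reproduce the figures given in the example.

```python
import re

# Hypothetical premium table; both rows reproduce the figures in the example.
PREMIUMS = [
    {"product": "safe man", "age": 16, "sex": "male", "coverage_to": 80, "premium": 144},
    {"product": "safe man", "age": 16, "sex": "male", "coverage_to": 100, "premium": 160},
]

def analyse(question):
    """Toy analyzer: return (entity, intention, constraints) found in the question."""
    text = question.lower()
    entity = next((r["product"] for r in PREMIUMS if r["product"] in text), None)
    intent = "premium consultation" if re.search(r"how much|premium", text) else None
    constraints = {}
    age = re.search(r"(\d+)[- ]year", text)
    if age:
        constraints["age"] = int(age.group(1))
    sex = re.search(r"\b(male|female)\b", text)
    if sex:
        constraints["sex"] = sex.group(1)
    return entity, intent, constraints

def lookup(question):
    """Return (coverage_to, premium) rows matching the entity and every constraint."""
    entity, intent, constraints = analyse(question)
    if entity is None or intent != "premium consultation":
        return []
    return [
        (r["coverage_to"], r["premium"])
        for r in PREMIUMS
        if r["product"] == entity and all(r[k] == v for k, v in constraints.items())
    ]
```

Separating the constraints from the entity and intention means the same entity and intention can be reused while only the filter conditions change from one user question to the next.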

In the embodiment of the invention, for a specific product clause database: since the data stored in the product clause database are in PDF form and the stored sentences are structured, if the data source is the product clause database, clause structured data of the first-class sentence information are extracted from the product clause database, and the extracted clause structured data are used as input parameters of the unified language model according to the replacement link for question-answer processing. Specifically, the structured data in the product clause PDF file are extracted with a pipeline technique, that is, the clause content is converted into structured sentence content; once structuring is complete, the resulting clause structured data are used as input parameters of the UNILM model, completing question-answer recognition of the first-class sentence information. Because the UNILM model was trained on structured data extracted from the second-class sentence information, the first-class sentence information can be substituted for it. When substituting via the replacement link, each paragraph of the structured data must be traversed, and the model operation is performed with the identified intention as an additional input parameter, completing question-answer recognition of the first-class sentence information. In addition, because the first-class sentence information here is product clause data, and product clause data typically consist of large paragraphs of text, once the clause data have been structured the model operation extracts the relevant content from the paragraph text according to the identified intention and generates an answer. For example, the large text content in a product clause is "we have the value of the insurance policy after receiving the above-mentioned related proving material of the insurance policy payment application" ...; after structuring, formatted text such as "the loan amount is not more than the term of the loan", "the longest loan term is not more than 6 months each time" and "the hesitation period is 20 days" is obtained, and this formatted text is then used as model input for the model operation, yielding question-answer information.
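Traversing the structured clause paragraphs with an identified intention could be sketched as below. Keyword matching is used here purely as a placeholder for the model operation, and the mapping from an intention to keywords is an assumption; the two sample sentences are taken from the formatted text in the example above.

```python
# Structured clause sentences taken from the example in the text.
CLAUSES = [
    "The longest loan term is not more than 6 months each time.",
    "The hesitation period is 20 days.",
]

def select_paragraphs(paragraphs, intent_keywords):
    """Traverse every structured paragraph and keep those that mention the intention."""
    hits = []
    for text in paragraphs:
        lowered = text.lower()
        if any(keyword in lowered for keyword in intent_keywords):
            hits.append(text)
    return hits
```

A call such as `select_paragraphs(CLAUSES, ["hesitation period"])` narrows the large clause text to the paragraph relevant to the intention before an answer is generated from it.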

In one embodiment of the present invention, as a further definition, parsing the data source of the first-class sentence information includes: obtaining a storage path of the first-class sentence information, parsing the storage location in the storage path, and determining the data source of the first-class sentence information according to the database corresponding to at least one data source matched with the storage location.

In the embodiment of the invention, because different data sources store data in different forms, a storage path of the first-class sentence information is obtained when the data source is parsed. The storage path denotes the database in which the first-class sentence information is stored, or the system path of the database in which it is to be stored. Specifically, parsing the storage location means filtering out the parts of the storage path that identify the storage location, such as character strings and code identifiers, so that the data source can be determined from the storage location.

In addition, since the different data sources include at least one of a question-answer knowledge base, a knowledge graph base, a product list database and a product clause database, one item of data may be stored in the question-answer knowledge base and also in the knowledge graph base; that is, there may be several storage paths. If parsing the storage paths yields several data sources, for example the question-answer knowledge base, the knowledge graph base, the product list database and the product clause database, the data source is determined according to the preset priority: question-answer knowledge base > knowledge graph base > product list database and product clause database.
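Resolving several candidate storage paths to one data source by priority can be sketched as follows. The directory names are hypothetical identifiers for the four databases, and placing the product list database ahead of the product clause database is an assumption, since the stated priority does not order those two relative to each other.

```python
# Highest priority first. The relative order of the last two entries is assumed.
PRIORITY = ["faq", "knowledge_graph", "product_list", "product_clause"]

def sources_in_path(storage_path):
    """Return the known data sources named by a segment of the storage path."""
    segments = set(storage_path.strip("/").split("/"))
    return [source for source in PRIORITY if source in segments]

def resolve_data_source(storage_paths):
    """Pick the highest-priority data source found across all storage paths."""
    found = set()
    for path in storage_paths:
        found.update(sources_in_path(path))
    for source in PRIORITY:
        if source in found:
            return source
    return None
```

Matching whole path segments rather than substrings avoids a directory such as `faq_backup` being mistaken for the question-answer knowledge base.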

In an embodiment of the invention, in order to effectively cover all question-answer sentences and improve the answering efficiency of knowledge question answering, before the first-category sentence information to be subjected to knowledge question-answer processing is obtained, the method further includes: collecting at least one piece of sentence information requesting a knowledge question and answer; searching a question-answer library corresponding to the second-category sentence information for question-answer information of the sentence information; if none exists, determining the sentence information to be first-category sentence information and performing knowledge question-answer processing; and if it exists, determining the found question-answer information to be the question-answer information of the sentence information.

In the embodiment of the invention, a conventional question-answer sentence is one for which a question-answer pair with a standard answer has been obtained through a large amount of data processing and sentence recognition. Therefore, when a user requests a knowledge question and answer, it is first judged whether the request is a conventional question-answer sentence: at least one piece of sentence information requesting a knowledge question and answer is collected, where the sentence information represents a template question-answer sentence input or selected by the user. The question-answer library established from the second-category sentence information is then searched for question-answer information matching the sentence information. If such information exists, the sentence information is a conventional question-answer sentence; if not, it is an unconventional question-answer sentence, and the sentence information is determined to be first-category sentence information, on which the recognition method of steps 101 to 103 is performed.
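The routing decision can be sketched in Python as follows. This is a simplified illustration that assumes an exact-match lookup on normalized text, whereas the embodiment matches by similarity. The names `route_sentence`, `qa_library` and `model_answer` are hypothetical.

```python
from typing import Callable, Dict, Tuple


def route_sentence(
    sentence: str,
    qa_library: Dict[str, str],
    model_answer: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (category, answer) for one incoming sentence.

    qa_library maps normalized conventional questions to stored answers.
    model_answer stands in for the unified language model of steps 101 to 103.
    """
    key = sentence.strip().lower()  # assumed normalization, for illustration only
    if key in qa_library:
        # A stored question-answer pair exists: conventional (second-category) sentence.
        return "second_category", qa_library[key]
    # No stored pair: unconventional (first-category) sentence, answered by the model.
    return "first_category", model_answer(sentence)
```

Only sentences that miss the question-answer library reach the model, which is what keeps the conventional path cheap.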

In an embodiment of the invention, for further definition and description, searching the question-answer library corresponding to the second-category sentence information for question-answer information of the first-category sentence information includes: extracting semantic terms from the first-category sentence and the second-category sentence respectively, calculating the similarity between the semantic terms, and judging, according to the similarity, whether the question-answer information of the second-category sentence information is applicable to the knowledge question answering of the first-category sentence information.

Because the second-category sentence information is a conventional question-answer sentence, the corresponding question-answer library stores (question, answer) pairs matched with conventional question-answer sentences, so the question-answer pair corresponding to the first-category sentence information can be searched for in the question-answer library on the basis of similarity. Specifically, the semantic terms of the first-category sentence information and of the second-category sentence information, including subject terms, predicate terms, object terms and the like, can each be parsed using natural language processing technology. The similarities between the semantic terms are then calculated, namely the similarity between the subject term of the first-category sentence information and the subject term of the second-category sentence information, the similarity between the predicate terms of the two, and the similarity between the object terms of the two. Whether the question-answer information of the second-category sentence information is applicable to the knowledge question answering of the first-category sentence information is judged on the basis of the three calculated similarities. That is, a similarity threshold is preset according to the degree of similarity of word meanings; if any two of the three similarities exceed the similarity threshold, the question-answer information of the second-category sentence information is determined to be applicable to the knowledge question answering of the first-category sentence information, and the question-answer information in the question-answer library corresponding to the second-category sentence information determined by similarity matching is taken as the question-answer information of the first-category sentence information, completing the recognition.
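The two-out-of-three rule can be sketched in Python as follows. The embodiment does not specify the similarity measure or the threshold value, so the character-level ratio from the standard library's `difflib` and the threshold of 0.8 are placeholder assumptions. The names `SemanticTerms` and `answer_is_applicable` are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


@dataclass(frozen=True)
class SemanticTerms:
    """Subject, predicate and object terms extracted from one sentence."""
    subject: str
    predicate: str
    obj: str


def term_similarity(a: str, b: str) -> float:
    # Placeholder measure: character-level ratio in [0, 1]. A real system
    # would more likely compare word meanings, e.g. with embeddings.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def answer_is_applicable(
    first: SemanticTerms,
    second: SemanticTerms,
    threshold: float = 0.8,  # assumed value; the text only says it is preset
    similarity: Callable[[str, str], float] = term_similarity,
) -> bool:
    """True if at least two of the three term similarities exceed the threshold."""
    scores = [
        similarity(first.subject, second.subject),
        similarity(first.predicate, second.predicate),
        similarity(first.obj, second.obj),
    ]
    return sum(score > threshold for score in scores) >= 2
```

Passing the similarity function as a parameter makes it possible to swap in a semantic measure without changing the decision rule.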

In order to optimize the unified language model and improve the recognition accuracy for knowledge question-answer sentences, the method further includes: receiving a question-answer feedback result obtained after the question-answer information is output, where the question-answer feedback result represents the degree of satisfaction with the answer in the question-answer information; and determining, according to the question-answer feedback result, whether to convert the first-category sentence information into second-category sentence information and whether to add the question-answer information to the second-category sentence information training sample set so as to update the model.

Specifically, after recognition of the first-category sentence information has been completed with the unified language model, the obtained question-answer information is fed back to the user, and the user determines from it whether the answer is the one sought. The question-answer feedback result returned by the user is information indicating a degree of satisfaction, such as satisfied, unsatisfied, acceptable or unacceptable; that is, the question-answer feedback result represents the degree of satisfaction with the answer in the question-answer information. After the current execution end receives the question-answer feedback result, it determines according to that result whether to convert the first-category sentence information into second-category sentence information. If the feedback result is "satisfied", the question-answer information recognized for the first-category sentence information is correct, and the first-category sentence information of the unconventional question and answer is converted into second-category sentence information of the conventional question and answer. In addition, because the question-answer information is the answer corresponding to the first-category sentence information, the question-answer information is added to the second-category sentence information training sample set at the time of conversion, so that when the model is trained again, training efficiency is improved by using the updated training set.
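The feedback loop can be sketched in Python as follows. The class name `QAStore`, the feedback labels and the in-memory containers are hypothetical. Treating "acceptable" as a positive result alongside "satisfied" is also an assumption, since the description only states the outcome for "satisfied".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed set of feedback labels that count as a positive result.
POSITIVE_FEEDBACK = {"satisfied", "acceptable"}


@dataclass
class QAStore:
    """In-memory stand-in for the question-answer library and the training set."""
    qa_library: Dict[str, str] = field(default_factory=dict)
    training_samples: List[Tuple[str, str]] = field(default_factory=list)

    def apply_feedback(self, sentence: str, answer: str, feedback: str) -> bool:
        """Promote a first-category sentence when the user accepts its answer.

        Returns True if the sentence was converted to second-category.
        """
        if feedback.strip().lower() not in POSITIVE_FEEDBACK:
            return False  # leave the sentence as first-category
        # Conversion: the pair now lives in the conventional question-answer library.
        self.qa_library[sentence] = answer
        # The same pair joins the sample set used the next time the model is trained.
        self.training_samples.append((sentence, answer))
        return True
```

Retraining itself is left out of the sketch; the point is that conversion and the sample-set update happen together.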

Compared with the prior art, the embodiment of the invention obtains first-category sentence information to be subjected to knowledge question-answer processing and parses the data source of the first-category sentence information; matches a trained unified language model according to the parsed data source, where the unified language model is trained on second-category sentence information in different data sources and the second-category sentence information has a replacement relationship with the first-category sentence information; and performs question-answer processing on the first-category sentence information according to the matched unified language model to generate question-answer information of the first-category sentence information. This achieves full question-answer coverage for unconventional sentence information, greatly reduces the manpower and material resources needed to recognize question-answer sentences, and improves the processing efficiency of knowledge question answering.

Further, as an implementation of the method shown in FIG. 1, an embodiment of the present invention provides a knowledge question-answering processing apparatus. As shown in FIG. 3, the apparatus includes:

an obtaining module 21, configured to obtain first-category sentence information to be subjected to knowledge question-answer processing, and to parse the data source of the first-category sentence information;

a matching module 22, configured to match a trained unified language model according to the parsed data source, where the unified language model is trained according to second-category sentence information in different data sources, and the second-category sentence information has a replacement relationship with the first-category sentence information;

a processing module 23, configured to perform question-answer processing on the first-category sentence information according to the matched unified language model, and to generate question-answer information of the first-category sentence information.

Further, the first-category sentence information is sentence information for which no question-answer information has been found in a preset question-answer library, the second-category sentence information is sentence information for which question-answer information has been found in the preset question-answer library, and the data sources include a question-answer knowledge base, a knowledge graph base, a product list database and a product clause database. The apparatus further includes a training module and an establishing module, wherein:

the obtaining module is further configured to respectively obtain second-category sentence information training sample sets from the question-answer knowledge base, the knowledge graph base, the product list database and the product clause database;

the training module is configured to train the unified language model whose language network has been constructed, using the second-category sentence information training sample sets, to obtain trained unified language models respectively applicable to the question-answer knowledge base, the knowledge graph base, the product list database and the product clause database;

the establishing module is configured to establish a replacement link between the input parameters of the unified language model and the first-category sentence information, so as to determine the replacement relationship between the first-category sentence information and the second-category sentence information, where the replacement relationship represents the relationship in which the first-category sentence information, as an input parameter of the unified language model, replaces the second-category sentence information.

Further,

the processing module is specifically configured to, if the data source is the question-answer knowledge base, use the first-category sentence information as an input parameter of the unified language model and perform model operation processing according to the replacement link;

the processing module is specifically configured to, if the data source is the knowledge graph base, split the first-category sentence information according to the triplet form of the knowledge graph, use the split sentence subject and sentence predicate as input parameters of the unified language model, and perform model operation processing according to the replacement link;

the processing module is specifically configured to, if the data source is the product list database, extract list structured data of the first-category sentence from the product list database, use the extracted list structured data as an input parameter of the unified language model, and perform model operation processing according to the replacement link;

the processing module is specifically configured to, if the data source is the product clause database, extract clause structured data of the first-category sentence from the product clause database, use the extracted clause structured data as an input parameter of the unified language model, and perform model operation processing according to the replacement link.
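The four branches of the processing module can be sketched as a single dispatch function in Python. This is only an illustration: the triplet splitter and the structured-data lookup are passed in as hypothetical callables because the embodiment does not specify how they are implemented, and the data-source labels are assumed names.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def build_model_input(
    sentence: str,
    data_source: str,
    split_triple: Callable[[str], Triple],
    lookup_structured: Callable[[str, str], List[str]],
) -> List[str]:
    """Prepare the input parameters of the unified language model per data source."""
    if data_source == "qa_knowledge_base":
        # The sentence itself is the model input.
        return [sentence]
    if data_source == "knowledge_graph":
        # Split into a triplet; only subject and predicate are fed to the model.
        subject, predicate, _object = split_triple(sentence)
        return [subject, predicate]
    if data_source in ("product_list", "product_clause"):
        # Structured list or clause data extracted from the matching database.
        return lookup_structured(data_source, sentence)
    raise ValueError(f"unknown data source: {data_source}")
```

Keeping the splitter and the lookup as parameters means each data source can use its own extraction logic while the dispatch stays the same.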

Further, the obtaining module includes:

an obtaining unit, configured to obtain a storage path of the first-category sentence information and to parse a storage location in the storage path;

a determining unit, configured to determine the data source of the first-category sentence information according to the database corresponding to at least one data source matched with the storage location.

Further, the apparatus further includes:

a collecting module, configured to collect at least one piece of sentence information requesting a knowledge question and answer;

a searching module, configured to search a question-answer library corresponding to the second-category sentence information for question-answer information of the sentence information;

a first determining module, configured to, if no such question-answer information exists, determine the sentence information to be first-category sentence information and perform knowledge question-answer processing;

a second determining module, configured to, if such question-answer information exists, determine the found question-answer information to be the question-answer information of the sentence information.

Further, the searching module includes:

an extracting unit, configured to extract semantic terms from the first-category sentence and the second-category sentence respectively, and to calculate the similarity between the semantic terms;

a judging unit, configured to judge, according to the similarity, whether the question-answer information of the second-category sentence information is applicable to the knowledge question answering of the first-category sentence information.

Further, the apparatus further includes:

a receiving module, configured to receive a question-answer feedback result obtained after the question-answer information is output, where the question-answer feedback result represents the degree of satisfaction with the answer in the question-answer information;

a judging module, configured to determine, according to the question-answer feedback result, whether to convert the first-category sentence information into second-category sentence information and whether to add the question-answer information to the second-category sentence information training sample set so as to update the model.

According to an embodiment of the present invention, there is provided a storage medium storing at least one executable instruction which, when executed, performs the knowledge question-answering processing method of any of the above method embodiments.

FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention; the specific embodiments of the present invention do not limit the specific implementation of the computer device.

As shown in FIG. 4, the computer device may include a processor 302, a communication interface (Communications Interface) 304, a memory 306, and a communication bus 308.

Wherein the processor 302, the communication interface 304, and the memory 306 communicate with each other via a communication bus 308.

The communication interface 304 is used for communicating with network elements of other devices, such as clients or other servers.

The processor 302 is configured to execute the program 310, and may specifically execute relevant steps in the above-described embodiment of the knowledge question-answering processing method.

In particular, program 310 may include program code including computer-operating instructions.

The processor 302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computer device may include one or more processors of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 306 is used for storing the program 310. The memory 306 may comprise high-speed RAM and may also include non-volatile memory, such as at least one disk memory.

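The program 310 may specifically be used to cause the processor 302 to perform the operations of the knowledge question-answering processing method described in the above embodiments.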

It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented on a general-purpose computing device, and may be concentrated on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented in program code executable by computing devices, so that they may be stored in a memory device and executed by the computing devices. In some cases, the steps shown or described may be performed in an order different from that shown or described, or the modules or steps may be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description covers only the preferred embodiments of the present invention and is not intended to limit the present invention; those skilled in the art can make various modifications and variations to the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A knowledge question answering method, characterized by comprising:

obtaining first-category sentence information to be subjected to knowledge question-answer processing, and parsing the data source of the first-category sentence information;

matching a trained unified language model according to the parsed data source, wherein the unified language model is trained according to second-category sentence information in different data sources, and the second-category sentence information has a replacement relationship with the first-category sentence information;

performing question-answer processing on the first-category sentence information according to the matched unified language model to generate question-answer information of the first-category sentence information;

wherein the first-category sentence information is sentence information for which no question-answer information has been found in a preset question-answer library, the second-category sentence information is sentence information for which question-answer information has been found in the preset question-answer library, and the data sources include a question-answer knowledge base, a knowledge graph base, a product list database and a product clause database; before the matching of the trained unified language model according to the parsed data source, the method further comprises:

respectively obtaining second-category sentence information training sample sets from the question-answer knowledge base, the knowledge graph base, the product list database and the product clause database;

training the unified language model whose language network has been constructed, using the second-category sentence information training sample sets, to obtain trained unified language models respectively applicable to the question-answer knowledge base, the knowledge graph base, the product list database and the product clause database;

establishing a replacement link between the input parameters of the unified language model and the first-category sentence information to determine the replacement relationship between the first-category sentence information and the second-category sentence information, wherein the replacement relationship is used to characterize the relationship in which the first-category sentence information, as an input parameter of the unified language model, replaces the second-category sentence information;

the performing question-answer processing on the first-category sentence information according to the matched unified language model comprises:

if the data source is the question-answer knowledge base, using the first-category sentence information as an input parameter of the unified language model to perform model operation processing according to the replacement link;

if the data source is the knowledge graph base, splitting the first-category sentence information according to the triplet form of the knowledge graph, and using the split sentence subject and sentence predicate as input parameters of the unified language model to perform model operation processing according to the replacement link;

if the data source is the product list database, extracting list structured data of the first-category sentence from the product list database, and using the extracted list structured data as input parameters of the unified language model to perform model operation processing according to the replacement link;

if the data source is the product clause database, extracting clause structured data of the first-category sentence from the product clause database, and using the extracted clause structured data as input parameters of the unified language model to perform model operation processing according to the replacement link.

2. The method according to claim 1, characterized in that parsing the data source of the first-category sentence information comprises:

obtaining a storage path of the first-category sentence information, and parsing a storage location in the storage path;

determining the data source of the first-category sentence information according to the database corresponding to at least one data source matched with the storage location.

3. The method according to claim 1, characterized in that before obtaining the first category of sentence information to be processed for knowledge question answering, the method further comprises:

collecting at least one piece of sentence information requesting a knowledge question and answer;

searching the question-answer library corresponding to the second-category sentence information for question-answer information of the sentence information;

if none exists, determining the sentence information to be first-category sentence information, and performing knowledge question-answer processing;

if it exists, determining the found question-answer information to be the question-answer information of the sentence information.

4. The method according to claim 3, wherein searching the question-answer database corresponding to the second category statement information for question-answer information of the first category statement information comprises:

extracting semantic terms from the first-category sentence and the second-category sentence respectively, and calculating the similarity between the semantic terms;

determining, according to the similarity, whether the question-answer information of the second-category sentence information is applicable to the knowledge question answering of the first-category sentence information.

5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:

receiving a question-and-answer feedback result obtained according to outputting the question-and-answer information, wherein the question-and-answer feedback result is used to indicate a satisfaction degree of an answer to the question-and-answer information;

determining, according to the question-answer feedback result, to convert the first-category sentence information into second-category sentence information, and determining to add the question-answer information to the second-category sentence information training sample set so as to update the model.

6. A knowledge question and answer processing device, characterized by comprising:

an obtaining module, configured to obtain first-category sentence information to be subjected to knowledge question-answer processing, and to parse the data source of the first-category sentence information;

a matching module, configured to match a trained unified language model according to the parsed data source, wherein the unified language model is trained according to second-category sentence information in different data sources, and the second-category sentence information has a replacement relationship with the first-category sentence information;

a processing module, configured to perform question-answer processing on the first-category sentence information according to the matched unified language model, and to generate question-answer information of the first-category sentence information;

wherein the first-category sentence information is sentence information for which no question-answer information has been found in a preset question-answer library, the second-category sentence information is sentence information for which question-answer information has been found in the preset question-answer library, and the data sources include a question-answer knowledge base, a knowledge graph base, a product list database and a product clause database; the device further comprises a training module and an establishing module;

the obtaining module is further configured to respectively obtain second-category sentence information training sample sets from the question-answer knowledge base, the knowledge graph base, the product list database and the product clause database;

the training module is configured to train the unified language model whose language network has been constructed, using the second-category sentence information training sample sets, to obtain trained unified language models respectively applicable to the question-answer knowledge base, the knowledge graph base, the product list database and the product clause database;

the establishing module is configured to establish a replacement link between the input parameters of the unified language model and the first-category sentence information to determine the replacement relationship between the first-category sentence information and the second-category sentence information, wherein the replacement relationship is used to characterize the relationship in which the first-category sentence information, as an input parameter of the unified language model, replaces the second-category sentence information;

the processing module is specifically configured to, if the data source is the question-answer knowledge base, use the first-category sentence information as an input parameter of the unified language model to perform model operation processing according to the replacement link;

the processing module is specifically configured to, if the data source is the knowledge graph base, split the first-category sentence information according to the triplet form of the knowledge graph, and use the split sentence subject and sentence predicate as input parameters of the unified language model to perform model operation processing according to the replacement link;

the processing module is specifically configured to, if the data source is the product list database, extract list structured data of the first-category sentence from the product list database, and use the extracted list structured data as input parameters of the unified language model to perform model operation processing according to the replacement link;

the processing module is specifically configured to, if the data source is the product clause database, extract clause structured data of the first-category sentence from the product clause database, and use the extracted clause structured data as input parameters of the unified language model to perform model operation processing according to the replacement link.

7. A storage medium storing at least one executable instruction, wherein the executable instruction enables a processor to execute an operation corresponding to the knowledge question and answer processing method as described in any one of claims 1-5.

8. A computer device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with each other via the communication bus;

The memory is used to store at least one executable instruction, and the executable instruction enables the processor to perform operations corresponding to the knowledge question and answer processing method as described in any one of claims 1-5.