Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
First, several terms involved in the present application are explained:
Artificial Intelligence (AI) is a branch of computer science that explores and develops the theory, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. It attempts to understand the essence of intelligence and to produce new intelligent machines that react in a way similar to human intelligence; its fields include robotics, speech recognition, image recognition, natural language processing, and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. Artificial intelligence is also a theory, method, technique, and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Natural language processing (NLP) is a branch of artificial intelligence and an interdisciplinary field of computer science and linguistics, often referred to as computational linguistics; NLP processes, understands, and applies human languages (e.g., Chinese, English). Natural language processing includes syntactic parsing, semantic analysis, discourse understanding, and the like. It is commonly used in the technical fields of machine translation, handwritten and printed character recognition, speech recognition and text-to-speech conversion, intent recognition, information extraction and filtering, text classification and clustering, public opinion analysis and opinion mining, and the like, and involves data mining, machine learning, knowledge acquisition, knowledge engineering, artificial intelligence research, linguistic research related to language computation, and so on.
Information Extraction (IE) is a text processing technique that extracts factual information about specified types of entities, relations, events, and the like from natural language text and outputs it as structured data. Information extraction is a technique for extracting specific information from text data. Text data is made up of specific units, such as sentences, paragraphs, and chapters, and text information is made up of smaller specific units, such as words, phrases, sentences, and paragraphs, or combinations of these units. Extracting noun phrases, person names, place names, and the like from text data are all examples of text information extraction; of course, the information extracted by text information extraction techniques can be of various types.
An encoder converts an input sequence into a fixed-length vector.
Embedding: an Embedding Layer is a word embedding that is jointly learned with a neural network model on a specific natural language processing task. The embedding method first applies one-hot encoding to the words in the cleaned text; the size or dimension of the vector space is specified as part of the model, for example 50, 100, or 300 dimensions, and the vectors are initialized with small random numbers. The Embedding Layer is used at the front end of the neural network and is fitted in a supervised manner using the back-propagation algorithm. The encoded words are mapped to word vectors; if a multi-layer perceptron (MLP) is used, the word vectors are concatenated before being input into the model, and if a recurrent neural network (RNN) is used, each word can be taken as one input of the sequence. This method of learning the embedding layer requires a lot of training data and can be slow, but it can learn an embedding model tailored both to the specific text data and to the NLP task. An embedding is a vector representation in which an object, which may be a word, a commodity, a movie, and so on, is represented by a low-dimensional vector; the characteristic of embedding vectors is that objects whose vectors are similar are similar in meaning. For example, Embedding("The Avengers") and Embedding("Iron Man") are close together, whereas Embedding("The Avengers") and Embedding("The Wizard of Oz") are far apart. An embedding is essentially a mapping from a semantic space to a vector space that preserves, as far as possible, the relationships that the original samples have in semantic space: two words that are semantically close are also located close together in the vector space. Embedding can encode an object with a low-dimensional vector while preserving its meaning. It is often applied in machine learning: during model construction, objects are encoded into low-dimensional dense vectors, which are then fed to a DNN, improving efficiency.
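The embedding idea described above, that semantically similar objects map to nearby low-dimensional vectors, can be sketched with a toy lookup table and cosine similarity. The vectors below are invented for illustration only, not learned by any model:

```python
import math

# Toy embedding table: each object is represented by a low-dimensional
# dense vector. The vectors are illustrative assumptions, not trained.
embedding = {
    "The Avengers":     [0.90, 0.80, 0.10],
    "Iron Man":         [0.85, 0.75, 0.15],
    "The Wizard of Oz": [0.10, 0.20, 0.90],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Similar movies have nearby embedding vectors; dissimilar ones do not.
sim_close = cosine(embedding["The Avengers"], embedding["Iron Man"])
sim_far = cosine(embedding["The Avengers"], embedding["The Wizard of Oz"])
```

In a trained model the table would come from an embedding layer fitted by back-propagation, but the distance comparison works the same way.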
A Long Short-Term Memory (LSTM) network is a recurrent neural network specially designed to solve the long-term dependency problem of ordinary RNNs (recurrent neural networks); all RNNs take the form of a chain of repeated neural network modules. In a standard RNN, this repeated module has a very simple structure, such as a single tanh layer. An LSTM is a type of neural network containing LSTM blocks, which are sometimes described in the literature or other materials as intelligent network units because they can memorize values for indefinite lengths of time; a gate in a block can determine whether an input is important enough to be memorized and whether it should be output.
A Bi-directional Long Short-Term Memory (Bi-LSTM) network is composed of a forward LSTM and a backward LSTM, and is commonly used in natural language processing tasks to model context information. Building on the LSTM, a Bi-LSTM combines information of the input sequence in both the forward and backward directions. For the output at time t, the forward LSTM layer carries information from time t and earlier times in the input sequence, while the backward LSTM layer carries information from time t and later times. If the output of the forward LSTM layer at time t is denoted X and the output of the backward LSTM layer at time t is denoted Y, the vectors output by the two LSTM layers can be combined by addition, averaging, or concatenation.
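The three ways of combining the forward output X and backward output Y mentioned above can be sketched directly; the vectors below are placeholders standing in for real LSTM hidden states:

```python
# Assumed forward and backward Bi-LSTM outputs at time t (illustrative values).
x = [0.2, -0.5, 0.7]  # forward LSTM output at time t
y = [0.1, 0.4, -0.3]  # backward LSTM output at time t

added = [a + b for a, b in zip(x, y)]            # element-wise addition
averaged = [(a + b) / 2 for a, b in zip(x, y)]   # element-wise averaging
concatenated = x + y                             # concatenation (doubles the dimension)
```

Addition and averaging keep the output dimension equal to the hidden size, while concatenation doubles it; which combination is used is a design choice of the downstream layer.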
A tokenizer (segmenter) divides text into individual units (typically words).
The softmax function is a normalized exponential function that "compresses" a K-dimensional vector z of arbitrary real numbers into another K-dimensional real vector σ(z) such that each element lies in the range (0, 1) and all elements sum to 1; it is commonly used in multi-classification problems.
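The softmax function just described can be written as a short sketch; subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result:

```python
import math

def softmax(z):
    """Map a K-dimensional real vector to a probability vector:
    each entry lies in (0, 1) and all entries sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
# larger inputs map to larger probabilities, and the outputs sum to 1
```

Because the exponential is monotone, the largest input always receives the largest probability, which is why softmax outputs can be used directly for class selection in multi-classification problems.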
Normalization is performed in two ways: one changes a number into a fraction between (0, 1), and the other changes a dimensional expression into a dimensionless expression. It is mainly used to simplify data processing by mapping data into the range 0-1, which is more convenient and faster, and it belongs to the field of digital signal processing.
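The first normalization method above, mapping data into the range 0-1, is commonly realized as min-max scaling; a minimal sketch:

```python
def min_max_normalize(values):
    """Min-max scaling: map each value into [0, 1], with the smallest
    value mapping to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_normalize([10.0, 20.0, 30.0])
```

This assumes the values are not all equal (otherwise the denominator is zero); production code would guard against that case.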
In the field of intelligent prediction, answers can generally be predicted from questions. At present, this is usually implemented by extracting matching answers from a database according to the questions. In a specific question-answer scenario, this approach usually predicts each question independently, and the degree of matching between the predicted answers and the questions is not high enough. Therefore, how to improve the matching accuracy of questions and answers has become an urgent technical problem to be solved.
Based on the above, the embodiments of the application provide a question-answer matching method, a question-answer matching apparatus, an electronic device, and a storage medium, aiming at improving the matching accuracy of questions and answers.
The question-answer matching method, the question-answer matching device, the electronic equipment and the storage medium provided by the embodiment of the application are specifically described through the following embodiments, and the question-answer matching method in the embodiment of the application is described first.
The embodiments of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
The application provides a question-answer matching method and relates to the technical field of artificial intelligence. The question-answer matching method provided by the embodiments of the application can be applied to a terminal, a server, or software running in the terminal or the server. In some embodiments, the terminal may be a smartphone, a tablet computer, a notebook computer, a desktop computer, or the like; the server may be configured as an independent physical server, as a server cluster or distributed system formed by a plurality of physical servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data, and artificial intelligence platforms; and the software may be an application implementing the question-answer matching method, although it is not limited to the above forms.
The application is operational with numerous general purpose or special purpose computer system environments or configurations. Such as a personal computer, a server computer, a hand-held or portable device, a tablet device, a multiprocessor system, a microprocessor-based system, a set top box, a programmable consumer electronics, a network PC, a minicomputer, a mainframe computer, a distributed computing environment that includes any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Fig. 1 is an optional flowchart of a question-answer matching method provided in an embodiment of the present application, where the method in fig. 1 may include, but is not limited to, steps S101 to S106.
Step S101, obtaining original sentence data to be processed;
Step S102, extracting context characteristics of original sentence data to obtain initial sentence data;
Step S103, performing sequence labeling on the initial sentence data to obtain target question data and candidate answer data, wherein the target question data comprises at least one target question, the candidate answer data comprises at least one candidate answer, and each candidate answer is used for solving one of the target questions;
step S104, constructing an initial question-answer pair according to the target question and the candidate answer;
step S105, extracting features of the initial question-answer pair through a preset question-answer matching model to obtain question-answer semantic features;
Step S106, carrying out matching probability calculation on the question-answer semantic features through the question-answer matching model to obtain question-answer matching values, and screening the initial question-answer pairs according to the question-answer matching values to obtain a target question-answer pair, wherein the target question-answer pair comprises a target question and the target answer corresponding to the target question.
Through steps S101 to S106 of the embodiments of the application, the original sentence data to be processed is obtained, and the context features of the original sentence data are extracted to obtain the initial sentence data, so that the context semantic information of the original sentence is well preserved and the semantic integrity of the original sentence is improved. Further, sequence labeling is performed on the initial sentence data to obtain target question data and candidate answer data, where the target question data comprises at least one target question, the candidate answer data comprises at least one candidate answer, and each candidate answer is used for answering one of the target questions; the target questions and candidate answers in the initial sentence data can thus be split into independent pieces of text data according to the labeling information, and the target questions and candidate answers are paired to construct initial question-answer pairs, each initial question-answer pair comprising one target question and one candidate answer. Finally, feature extraction is performed on the initial question-answer pairs through a preset question-answer matching model to obtain question-answer semantic features, thereby extracting the characteristic information of the initial question-answer pairs; matching probability calculation is performed on the question-answer semantic features to obtain question-answer matching values, and the initial question-answer pairs are screened according to the question-answer matching values to obtain a target question-answer pair, where the target question-answer pair comprises a target question and the target answer corresponding to the target question. In this way, the method can better capture the complete semantic information of the target questions and candidate answers, improve the matching effect of the target questions and target answers, and improve the matching accuracy of questions and answers.
In step S101 of some embodiments, a web crawler may be written and, after a data source is set, the data may be crawled in a targeted manner to obtain the original sentence data to be processed. The original sentence data may also be acquired by other means, without limitation. The original sentence data may be text data and mainly includes two kinds of sentences: question sentences and answer sentences.
Referring to fig. 2, in some embodiments, step S102 may include, but is not limited to, steps S201 to S202:
Step S201, performing feature embedding processing on the original sentence data to obtain sentence embedded vectors;
Step S202, extracting context characteristics of the sentence embedded vector through a preset attention mechanism model to obtain initial sentence data.
In step S201 of some embodiments, feature embedding processing is performed on the original sentence data by a preset pre-training model, where the feature embedding processing covers both the question sentence portion and the answer sentence portion of the original sentence data, and the preset pre-training model may be a RoBERTa model or the like. For example, feature embedding processing is performed on the original sentence data through the RoBERTa model to obtain initial word embedding data and initial sentence embedding data corresponding to the original sentence data; multi-head attention calculation is performed on the initial word embedding data to obtain candidate sentence embedding data; and the candidate sentence embedding data and the initial sentence embedding data are spliced to obtain the sentence embedding vector.
In step S202 of some embodiments, the preset attention mechanism model may be constructed based on an LSTM algorithm or a Bi-LSTM algorithm. For example, using the Bi-LSTM algorithm, the attention mechanism model may encode the sentence embedding vector in left-to-right order to obtain a first feature vector a, and then encode the sentence embedding vector in right-to-left order to obtain a second feature vector b; the first feature vector and the second feature vector are then combined by vector addition, vector averaging, or the like, so that the context semantic information of the sentence embedding vector can be better captured, thereby obtaining the initial sentence data, which is mainly CLS feature data.
It should be noted that, the CLS feature is mainly used to obtain the representation of the sentence-level information through the self-attention mechanism, and in different tasks, the CLS feature may be used to represent the context information in a specific environment.
The above steps S201 to S202 can better preserve the context semantic information of the original sentence, thereby improving the semantic integrity of the original sentence.
Referring to fig. 3, in some embodiments, step S201 may include, but is not limited to, steps S301 to S303:
Step S301, word segmentation processing is carried out on the original sentence data to obtain an original word segment;
Step S302, carrying out word embedding processing on the original word segments to obtain original word embedding vectors;
Step S303, the original word embedded vector is spliced to obtain a sentence embedded vector.
In step S301 of some embodiments, word segmentation processing is performed on the original sentence data by a preset Jieba tokenizer. Specifically, the dictionary file of the Jieba tokenizer is loaded first to obtain each word in the dictionary and its number of occurrences; the dictionary file is then traversed, a directed acyclic graph of all possible segmentations of the original sentence data is constructed by string matching, and for each character node the maximum probability over all paths from that node to the end of the sentence is calculated, recording at the same time the end position in the directed acyclic graph of the corresponding word segment at the maximum probability. Finally, the original sentence data is segmented into word segments according to the node paths, obtaining the original word segments. In addition, for portions of the original sentence data that have no corresponding word in the dictionary file, the original word segments can be obtained by a statistical method.
In step S302 of some embodiments, word embedding processing is performed on the original word segment, so as to map the original word segment from the semantic space to the vector space, and obtain an original word embedded vector.
In step S303 of some embodiments, the original word-embedded vectors are spliced according to the sequence of the original word segments in the whole original sentence data, so as to obtain a complete long vector, which is the sentence-embedded vector.
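Steps S301 to S303 can be sketched end to end as a small pipeline: tokenize the sentence, look up one embedding vector per token, then concatenate the vectors in token order into one long sentence embedding vector. The whitespace tokenizer and the 3-dimensional randomly initialized embeddings below are simplifying stand-ins for the Jieba tokenizer and a trained embedding table:

```python
import random

random.seed(0)
EMB_DIM = 3  # assumed toy embedding dimension

def tokenize(sentence):
    # placeholder for a real tokenizer such as Jieba (step S301)
    return sentence.split()

embedding_table = {}

def embed(token):
    # assign each new token a small random vector, mimicking
    # embedding-layer initialization (step S302)
    if token not in embedding_table:
        embedding_table[token] = [random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)]
    return embedding_table[token]

def sentence_embedding(sentence):
    tokens = tokenize(sentence)                  # step S301: word segmentation
    vectors = [embed(t) for t in tokens]         # step S302: word embedding
    return [x for vec in vectors for x in vec]   # step S303: concatenation

vec = sentence_embedding("what is the core idea")
# 5 tokens, each EMB_DIM wide, spliced into one long vector
```

In the method itself the vectors would come from a learned embedding, but the splicing in token order is the same.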
Referring to fig. 4, in some embodiments, step S103 may include, but is not limited to, steps S401 to S402:
Step S401, carrying out position prediction on the initial sentence data according to a preset first function and BIO labels to obtain a question position label and an answer position label of the initial sentence data;
Step S402, the initial sentence data is segmented according to the question position label to obtain target question data, and the initial sentence data is segmented according to the answer position label to obtain candidate answer data.
In step S401 of some embodiments, when position prediction is performed on the initial sentence data according to the preset first function, the first function may be a softmax classification function. Meanwhile, a joint labeling manner may be introduced, that is, BIO labels are used to label the different sentence portions of each initial sentence with different tags. In BIO labeling, B-X generally means that the segment belongs to X and is located at its beginning, I-X generally means that the segment belongs to X and is located in its middle, and O generally means that the segment does not belong to any labeled portion.
Specifically, based on the BIO scheme, preset position tags are set as question position tags and answer position tags, where the question position tags comprise a question start tag Bq, a question middle tag Iq, and a question end tag Oq, and the answer position tags comprise an answer start tag Ba, an answer middle tag Ia, and an answer end tag Oa. Position probability calculation is performed on the initial sentence data through the softmax function to obtain the probability distribution (i.e., a position probability vector) of the initial sentence data over each preset position tag. The probability distribution reflects the degree of matching between each sentence fragment of the initial sentence data and the preset position tags: the larger the position probability vector, the greater the probability that the sentence fragment belongs to the corresponding preset position tag. Therefore, the preset position tag corresponding to the largest position probability value is selected as the position tag of each sentence fragment, thereby obtaining the question position tags and answer position tags of the initial sentence data.
It should be noted that, each target question includes a question content and a question location tag, the question location tag is used for marking a location of the question content, each candidate answer includes an answer content and an answer location tag, and the answer location tag is used for marking a location of the answer content.
For example, consider the initial sentence: "All articles have advantages and disadvantages. Based on the content and results of the article, please point out the advantages and disadvantages of this text. Ask another question: what is the core idea of the article?" Each sentence fragment and its corresponding position tag form a pair <sentence fragment n, position tag n>, i.e., <All articles have advantages and disadvantages, Bq1>, <based on the content and results of the article, Iq1>, <please point out the advantages and disadvantages of this text, Iq2>, <ask another question, Oq1>, <what is the core idea of the article?, ...>. The target questions in the initial sentence are therefore "based on the content and results of the article, please point out the advantages and disadvantages of this text" and "what is the core idea of the article?".
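The tag prediction in step S401, softmax over the preset position tags followed by selecting the tag with the largest probability, can be sketched as follows. The tag set and raw score vectors are illustrative assumptions, not model outputs:

```python
import math

# Preset position tags from the description above.
LABELS = ["Bq", "Iq", "Oq", "Ba", "Ia", "Oa"]

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict_labels(fragment_scores):
    """For each fragment, compute the position probability vector via
    softmax and pick the preset tag with the largest probability."""
    predicted = []
    for scores in fragment_scores:
        probs = softmax(scores)
        predicted.append(LABELS[probs.index(max(probs))])
    return predicted

# Assumed raw scores: one vector per sentence fragment.
scores = [
    [3.0, 0.1, 0.1, 0.1, 0.1, 0.1],  # fragment 1: most likely Bq
    [0.1, 2.5, 0.1, 0.1, 0.1, 0.1],  # fragment 2: most likely Iq
    [0.1, 0.1, 2.0, 0.1, 0.1, 0.1],  # fragment 3: most likely Oq
]
predicted = predict_labels(scores)
```

Since softmax is monotone, the argmax over probabilities equals the argmax over raw scores; the softmax step matters when the probabilities themselves are needed, as in the matching values later.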
Referring to fig. 5, in some embodiments, step S402 may include, but is not limited to, steps S501 to S502:
Step S501, extracting a question start tag and a question end tag from the question position tags, and dividing the initial sentence data according to the question start tag and the question end tag to obtain the target question data;
Step S502, extracting an answer start tag and an answer end tag from the answer position tags, and dividing the initial sentence data according to the answer start tag and the answer end tag to obtain the candidate answer data.
In step S501 of some embodiments, the question location tag includes a question start tag Bq, a question intermediate tag Iq, and a question end tag Oq, and since a target question can be formed between the question start tag Bq and the question end tag Oq, the initial sentence data is divided according to the question start tag and the question end tag, and sentence fragments between the question start tag Bq and the question end tag Oq are intercepted, so as to obtain target question data.
In step S502 of some embodiments, the answer position tags include an answer start tag Ba, an answer middle tag Ia, and an answer end tag Oa. Since a candidate answer can be formed between the answer start tag Ba and the answer end tag Oa, the initial sentence data is divided according to the answer start tag and the answer end tag to intercept the sentence fragments between the answer start tag Ba and the answer end tag Oa, obtaining the candidate answer data.
Through the steps S501 to S502, the initial sentence data can be split into a plurality of individual target questions and candidate answers according to the question position tags and the answer position tags while the context semantic information of the original sentence is maintained, so that the matching accuracy of the questions and the answers is improved.
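The splitting in steps S501 and S502 can be sketched as a walk over the (fragment, tag) sequence that cuts out each span from a start tag (Bq/Ba) through the corresponding end tag (Oq/Oa). The fragments below are illustrative, and including the end-tagged fragment in the span is one plausible reading of the description, not the patent's definitive rule:

```python
def extract_spans(tagged, start_prefix, mid_prefix, end_prefix):
    """Collect spans of fragments running from a start tag, through any
    middle tags, to the matching end tag."""
    spans, current = [], None
    for fragment, tag in tagged:
        if tag.startswith(start_prefix):
            current = [fragment]                 # open a new span
        elif current is not None and tag.startswith(mid_prefix):
            current.append(fragment)             # extend the open span
        elif current is not None and tag.startswith(end_prefix):
            current.append(fragment)             # close the span
            spans.append(" ".join(current))
            current = None
    return spans

# Assumed tagged fragments, following the Bq/Iq/Oq and Ba/Ia/Oa scheme.
tagged = [
    ("please point out", "Bq1"),
    ("the advantages and disadvantages", "Iq1"),
    ("of this text", "Oq1"),
    ("the advantages are", "Ba1"),
    ("clear structure", "Oa1"),
]
questions = extract_spans(tagged, "Bq", "Iq", "Oq")
answers = extract_spans(tagged, "Ba", "Ia", "Oa")
```

Each call ignores tags belonging to the other type, so questions and answers are split out of the same sequence independently.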
In step S104 of some embodiments, initial question-answer pairs are constructed according to the target questions and the candidate answers: the target question is paired with each candidate answer to form a one-to-many mapping relationship between the target question and the candidate answers, so as to obtain the initial question-answer pairs, where each initial question-answer pair comprises one target question and one candidate answer.
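The pairing in step S104 is a cartesian product of target questions and candidate answers; a minimal sketch with illustrative strings:

```python
from itertools import product

# Assumed outputs of the sequence-labeling step.
questions = ["what is the core idea of the article?"]
answers = ["the structure is clear", "the argument is weak", "it summarizes well"]

# Pair the target question with every candidate answer: a one-to-many
# mapping realized as individual (question, answer) pairs.
initial_pairs = list(product(questions, answers))
```

Each pair is then scored independently by the question-answer matching model in the later steps.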
Before step S105 of some embodiments, the question-answer matching method further includes pre-training the question-answer matching model. The question-answer matching model may be constructed based on the RoBERTa model and includes a coding layer and a linear layer, where the coding layer is mainly used to encode the input question-answer pairs and capture their CLS features, and the linear layer is mainly used to perform probability calculation on the CLS features and determine the degree of correlation between the question and the answer in each question-answer pair. When training the question-answer matching model, sample question-answer pairs are input into the model, and the model loss is calculated through the loss function of the question-answer matching model, which may be the commonly used cross-entropy loss function; meanwhile, gradient descent may be used to back-propagate the model loss, and the model parameters of the question-answer matching model are adjusted according to the model loss so as to train the question-answer matching model.
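The cross-entropy loss mentioned above, applied to the binary related/unrelated decision, can be written as a small sketch; the predicted probability and labels are invented for illustration:

```python
import math

def binary_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy loss for a binary label y in {0, 1} given the
    predicted matching probability p; clamped for numerical safety."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

loss_good = binary_cross_entropy(0.9, 1)  # confident and correct: small loss
loss_bad = binary_cross_entropy(0.1, 1)   # confident and wrong: large loss
```

During training, this loss (averaged over sample question-answer pairs) is back-propagated and the model parameters are updated by gradient descent.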
Referring to fig. 6, in some embodiments, step S105 includes, but is not limited to, steps S601 to S602:
Step S601, carrying out coding processing on an initial question-answer pair through a question-answer matching model to obtain a question-answer coding vector;
Step S602, carrying out normalization processing on the question-answer coding vector to obtain question-answer semantic features, wherein the question-answer semantic features are features used for characterizing the context semantic information of sentences.
In step S601 of some embodiments, word embedding processing may be performed on the initial question-answer pair through the Transformer structure of the question-answer matching model, so that the mapping of the initial question-answer pair from semantic space to vector space is achieved while the relationships of the initial question-answer pairs in semantic space are preserved in vector space, thereby obtaining a question-answer coding vector in embedded form.
In step S602 of some embodiments, the question-answer coding vector is normalized by the question-answer matching model and converted from a dimensional form into question-answer semantic features in a dimensionless representation, where the question-answer semantic features are CLS features that can be used to characterize the semantic context information of a question-answer pair in a question-answer matching scenario.
In some embodiments, the specific process of performing the encoding process and the normalization process on the initial question-answer pair through the question-answer matching model may be expressed as shown in formula (1):
h_cls = BERT(X)    Formula (1)
where h_cls is the question-answer semantic feature, X is the input initial question-answer pair, and BERT denotes the encoding and normalization operations.
Referring to fig. 7, in some embodiments, step S106 may include, but is not limited to, steps S701 to S702:
Step S701, carrying out matching probability calculation on the question-answer semantic features through a second function of the question-answer matching model to obtain a question-answer matching value;
Step S702, taking the initial question-answer pair with the largest question-answer matching value as a target question-answer pair.
In step S701 of some embodiments, the second function may be a classification function such as the softmax function or the sigmoid function. Taking the softmax function as an example, matching probability calculation is performed on the question-answer semantic features through the softmax function to obtain the probability distribution of each question-answer semantic feature over the preset classification labels; this probability distribution is the question-answer matching value. If the question-answer matching value is greater than a preset matching threshold, the target question and the candidate answer of the initial question-answer pair corresponding to the question-answer semantic feature are related; if the question-answer matching value is less than or equal to the preset matching threshold, they are not related. In this way, the degree of correlation between the target question and the candidate answer of each initial question-answer pair can be obtained more conveniently.
Further, in order to represent whether the target questions and the candidate answers of the initial question-answer pair are related, the initial question-answer pair can be classified according to the question-answer matching value and a preset matching threshold, classification labels of the initial question-answer pair with the question-answer matching value larger than the preset matching threshold are marked as related and are represented by the number 1, and classification labels of the initial question-answer pair with the question-answer matching value smaller than or equal to the preset matching threshold are marked as uncorrelated and are represented by the number 0.
In step S702 of some embodiments, since a larger question-answer matching value indicates a higher degree of correlation between the target question and the candidate answer of the initial question-answer pair, the initial question-answer pair with the largest question-answer matching value is selected as the target question-answer pair, where the target question-answer pair includes the target question and the target answer corresponding to the target question.
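The softmax-based matching probability calculation, threshold classification and largest-value screening described above can be sketched as follows. The logits, the number of pairs and the 0.5 threshold are illustrative assumptions for this sketch, not values fixed by the embodiments:

```python
import numpy as np

# Hypothetical scores for three initial question-answer pairs; each row holds
# the pair's scores for the two classification labels (0 = unrelated, 1 = related).
logits = np.array([
    [2.0, 0.5],   # pair 0
    [0.3, 1.8],   # pair 1
    [0.1, 2.6],   # pair 2
])

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

probs = softmax(logits)
match_values = probs[:, 1]          # probability of the "related" label

threshold = 0.5                     # preset matching threshold (assumed)
labels = (match_values > threshold).astype(int)   # 1 = related, 0 = unrelated

# Screening: the pair with the largest question-answer matching value becomes
# the target question-answer pair.
target_index = int(np.argmax(match_values))
```

With these assumed logits, pair 0 is labeled unrelated, pairs 1 and 2 related, and pair 2 is screened out as the target question-answer pair.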
According to the question-answer matching method, the original sentence data to be processed is obtained, and contextual feature extraction is performed on the original sentence data to obtain initial sentence data, so that the contextual semantic information of the original sentence can be well preserved and the semantic integrity of the original sentence is improved. Further, sequence labeling is performed on the initial sentence data to obtain target question data and candidate answer data, where the target question data includes at least one target question, the candidate answer data includes at least one candidate answer, and each candidate answer is used for solving one of the target questions. The target questions and the candidate answers in the initial sentence data can thus be split into independent text data according to the labeling information, and the target questions and the candidate answers are then paired to construct initial question-answer pairs, each initial question-answer pair including one target question and one candidate answer. Finally, feature extraction is performed on the initial question-answer pairs through a preset question-answer matching model to obtain question-answer semantic features, thereby extracting the question-answer characteristic information of the initial question-answer pairs; matching probability calculation is performed on the question-answer semantic features to obtain question-answer matching values; and the initial question-answer pairs are screened according to the question-answer matching values to obtain a target question-answer pair, where the target question-answer pair includes a target question and a target answer corresponding to the target question. The method can better capture the complete semantic information of the target questions and the candidate answers, improve the matching effect between the target question and the target answer, and improve the matching accuracy of questions and answers.
Referring to fig. 8, the embodiment of the present application further provides a question-answer matching device, which can implement the question-answer matching method, where the device includes:
an obtaining module 801, configured to obtain original sentence data to be processed;
a first feature extraction module 802, configured to perform contextual feature extraction on the original sentence data to obtain initial sentence data;
a sequence labeling module 803, configured to perform sequence labeling on the initial sentence data to obtain target question data and candidate answer data, where the target question data includes at least one target question, the candidate answer data includes at least one candidate answer, and each candidate answer is used for solving one of the target questions;
a construction module 804, configured to construct an initial question-answer pair according to the target question and the candidate answer;
a second feature extraction module 805, configured to perform feature extraction on the initial question-answer pair through a preset question-answer matching model to obtain question-answer semantic features;
and a calculating module 806, configured to perform matching probability calculation on the question-answer semantic features through the question-answer matching model to obtain a question-answer matching value, and to screen the initial question-answer pairs according to the question-answer matching value to obtain a target question-answer pair, where the target question-answer pair includes a target question and a target answer corresponding to the target question.
In some embodiments, the first feature extraction module 802 includes:
the embedding unit is used for carrying out feature embedding processing on the original sentence data to obtain a sentence embedding vector;
and the extraction unit is used for extracting the context characteristics of the sentence embedding vector through a preset attention mechanism model to obtain initial sentence data.
In some embodiments, the embedding unit comprises:
the word segmentation subunit is used for carrying out word segmentation processing on the original sentence data to obtain original word segments;
the word embedding subunit is used for carrying out word embedding processing on the original word segments to obtain original word embedding vectors;
and the splicing subunit is used for carrying out splicing processing on the original word embedding vectors to obtain a sentence embedding vector.
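The word segmentation, word embedding and splicing subunits above can be illustrated with a toy sketch. The vocabulary, the whitespace segmenter and the random embedding table are hypothetical stand-ins; a real implementation would use a pretrained tokenizer and embedding matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary and embedding table (assumed for this sketch).
vocab = {"what": 0, "is": 1, "ai": 2, "[UNK]": 3}
emb_dim = 4
embedding_table = rng.normal(size=(len(vocab), emb_dim))

def segment_words(sentence):
    # Word segmentation: a simple whitespace split stands in for a
    # language-specific segmenter.
    return sentence.lower().split()

def embed_words(words):
    # Word embedding: map each word segment to its embedding vector.
    ids = [vocab.get(w, vocab["[UNK]"]) for w in words]
    return embedding_table[ids]          # shape (num_words, emb_dim)

def splice(word_vectors):
    # Splicing: concatenate the word embedding vectors into one
    # sentence embedding vector.
    return word_vectors.reshape(-1)

sentence_vec = splice(embed_words(segment_words("What is AI")))
# 3 words x 4 dimensions gives a sentence embedding vector of length 12.
```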
In some embodiments, the sequence annotation module 803 includes:
the position prediction unit is used for performing position prediction on the initial sentence data according to a preset first function and the BIO label to obtain a question position label and an answer position label of the initial sentence data;
the segmentation unit is used for carrying out segmentation processing on the initial sentence data according to the question position label to obtain target question data, and carrying out segmentation processing on the initial sentence data according to the answer position label to obtain candidate answer data.
In some embodiments, the partitioning unit comprises:
the first segmentation subunit is used for extracting a question start tag and a question end tag from the question position labels, and carrying out segmentation processing on the initial sentence data according to the question start tag and the question end tag to obtain target question data;
and the second segmentation subunit is used for extracting an answer start tag and an answer end tag from the answer position labels, and carrying out segmentation processing on the initial sentence data according to the answer start tag and the answer end tag to obtain candidate answer data.
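The tag-based segmentation can be sketched with BIO labels as follows. The token sequence, the B-Q/I-Q and B-A/I-A tag names, and the helper `extract_spans` are hypothetical choices for this illustration, not the embodiments' exact labeling scheme:

```python
# Hypothetical BIO labels predicted for each token of the initial sentence
# data: B-Q/I-Q mark question tokens, B-A/I-A mark answer tokens, O is other.
tokens = ["what", "is", "ai", "?", "ai", "is", "machine", "intelligence", "."]
bio =    ["B-Q", "I-Q", "I-Q", "I-Q", "B-A", "I-A", "I-A", "I-A", "O"]

def extract_spans(tokens, bio, kind):
    # A span starts at a B-<kind> tag (the start position) and ends just
    # before the first tag that leaves the span (the end position).
    spans, start = [], None
    for i, tag in enumerate(bio + ["O"]):      # sentinel closes a final span
        if tag == f"B-{kind}":
            if start is not None:
                spans.append(" ".join(tokens[start:i]))
            start = i
        elif tag != f"I-{kind}" and start is not None:
            spans.append(" ".join(tokens[start:i]))
            start = None
    return spans

questions = extract_spans(tokens, bio, "Q")   # target question data
answers = extract_spans(tokens, bio, "A")     # candidate answer data
```

Here the question span "what is ai ?" and the answer span "ai is machine intelligence" are split out of the initial sentence data as independent text data.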
In some embodiments, the second feature extraction module 805 includes:
the coding unit is used for carrying out coding processing on the initial question-answer pair through the question-answer matching model to obtain a question-answer encoding vector;
and the normalization unit is used for carrying out normalization processing on the question-answer encoding vector to obtain question-answer semantic features, where the question-answer semantic features are characterization features used for characterizing the context semantic information of sentences.
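The embodiments do not fix a particular normalization; as one sketch, L2 normalization of a hypothetical question-answer encoding vector yields a unit-length semantic feature, so features from different pairs become directly comparable:

```python
import numpy as np

# Hypothetical question-answer encoding vector produced by the matching
# model's encoder for one initial question-answer pair (assumed values).
qa_encoding = np.array([3.0, -1.0, 2.0, 0.0])

def normalize(v, eps=1e-8):
    # L2 normalization: scale the encoding to (approximately) unit length;
    # eps guards against division by zero for an all-zero vector.
    return v / (np.linalg.norm(v) + eps)

qa_feature = normalize(qa_encoding)   # the question-answer semantic feature
```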
In some embodiments, the computing module 806 includes:
the probability calculation unit is used for carrying out matching probability calculation on the question-answer semantic features through a second function of the question-answer matching model to obtain question-answer matching values;
and the screening unit is used for taking the initial question-answer pair with the largest question-answer matching value as the target question-answer pair.
The specific implementation of the question-answer matching device is basically the same as the specific embodiment of the question-answer matching method, and will not be described herein.
The embodiment of the application also provides an electronic device, which includes a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for implementing connection communication between the processor and the memory, where the program, when executed by the processor, implements the question-answer matching method. The electronic device may be any intelligent terminal, including a tablet computer, a vehicle-mounted computer, and the like.
Referring to fig. 9, fig. 9 illustrates a hardware structure of an electronic device according to another embodiment, the electronic device includes:
The processor 901 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs so as to implement the technical solutions provided by the embodiments of the present application;
The memory 902 may be implemented in the form of a Read-Only Memory (ROM), a static storage device, a dynamic storage device, or a Random Access Memory (RAM), among others. The memory 902 may store an operating system and other application programs. When the technical solutions provided in the embodiments of the present application are implemented by software or firmware, the relevant program codes are stored in the memory 902, and the processor 901 invokes them to execute the question-answer matching method of the embodiments of the present application;
an input/output interface 903 for inputting and outputting information;
the communication interface 904 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (e.g. USB, network cable, etc.), or may implement communication in a wireless manner (e.g. mobile network, WIFI, bluetooth, etc.);
A bus 905 that transfers information between the various components of the device (e.g., the processor 901, the memory 902, the input/output interface 903, and the communication interface 904);
Wherein the processor 901, the memory 902, the input/output interface 903 and the communication interface 904 are communicatively coupled to each other within the device via a bus 905.
The embodiment of the application also provides a storage medium, which is a computer-readable storage medium storing one or more programs, where the one or more programs are executable by one or more processors to implement the question-answer matching method.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
According to the question-answer matching method, the question-answer matching device, the electronic device and the storage medium provided by the embodiments of the present application, the original sentence data to be processed is obtained, and contextual feature extraction is performed on the original sentence data to obtain initial sentence data, so that the contextual semantic information of the original sentence can be well preserved and the semantic integrity of the original sentence is improved. Further, sequence labeling is performed on the initial sentence data to obtain target question data and candidate answer data, where the target question data includes at least one target question, the candidate answer data includes at least one candidate answer, and each candidate answer is used for solving one of the target questions. The target questions and the candidate answers in the initial sentence data can thus be split into independent text data according to the labeling information, and the target questions and the candidate answers are then paired to construct initial question-answer pairs, each initial question-answer pair including one target question and one candidate answer.
Finally, feature extraction is performed on the initial question-answer pairs through a preset question-answer matching model to obtain question-answer semantic features, thereby extracting the question-answer characteristic information of the initial question-answer pairs; matching probability calculation is performed on the question-answer semantic features to obtain question-answer matching values; and the initial question-answer pairs are screened according to the question-answer matching values to obtain a target question-answer pair, where the target question-answer pair includes a target question and a target answer corresponding to the target question. The method can better capture the complete semantic information of the target questions and the candidate answers, improve the matching effect between the target question and the target answer, and improve the matching accuracy of questions and answers.
The embodiments described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by those skilled in the art that the solutions shown in fig. 1-7 do not limit the embodiments of the application, which may include more or fewer steps than shown, combine certain steps, or adopt different steps.
The apparatus embodiments described above are merely illustrative, and the units illustrated as separate components may or may not be physically separate; that is, they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship. "At least one of" and similar expressions mean any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b or c may represent a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may be single or plural.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is merely a logical function division, and there may be another division manner in actual implementation, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes multiple instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present application. The storage medium includes various media capable of storing programs, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, which does not thereby limit the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.