CN114154517B - Dialogue quality assessment method and system based on deep learning - Google Patents

Dialogue quality assessment method and system based on deep learning

Info

Publication number
CN114154517B
Authority
CN
China
Prior art keywords
reply
dialogue
user
candidate
replies
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111442436.2A
Other languages
Chinese (zh)
Other versions
CN114154517A (en
Inventor
何婷婷
王逾凡
范瑞
阿布都乃比江·库尔班
戴汝峰
洪婕
章哲铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University
Priority to CN202111442436.2A
Publication of CN114154517A
Application granted
Publication of CN114154517B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

A dialogue quality assessment method based on deep learning, characterized in that: first, a dialogue corpus is constructed, the corpus comprising a user dialogue D and a plurality of candidate replies R; a dialogue quality assessment deep learning model M is trained using the dialogue corpus, wherein the deep learning model assesses candidate reply quality from four aspects: the fluency of the reply, the semantic relevance of the reply to the user dialogue D, the positive emotion guidance that the reply provides for the user dialogue D, and the contextual logical consistency between the reply and the user dialogue D; finally, the fluency result $P_A$, the relevance result $P_B$, the positive emotion guidance result $P_C$ and the contextual logical consistency result $P_D$ are linearly fused, and the candidate reply with the highest score is selected as the final reply to the user dialogue D. The invention can effectively evaluate a plurality of candidate replies given by a dialogue system and can provide a standard for comparing dialogues in dialogue generation. The method can also be extended to text quality assessment tasks in other fields of natural language processing.

Description

Dialogue quality assessment method and system based on deep learning
Technical Field
The invention belongs to the technical field of man-machine conversation systems, and particularly relates to a conversation quality assessment method and system based on deep learning.
Background
In recent years, Internet technology has developed rapidly and Internet products have become part of people's everyday lives, generating large amounts of data at every moment and ushering in the era of big data. Supported by big data, deep learning has advanced quickly and is widely applied in the field of artificial intelligence, which in turn faces new development opportunities. Man-machine dialogue is an important component of artificial intelligence, and various man-machine dialogue systems have entered daily life, such as Microsoft's chatbot XiaoIce, Alibaba's intelligent customer service assistant AliMe, and Apple's voice assistant Siri. XiaoIce analyzes changes in the user's emotion to give emotionally aware replies and can act as a close emotional companion for the user; AliMe serves as a personal shopping assistant that provides consultation throughout the shopping process and improves the user's shopping experience. An intelligent man-machine dialogue system should fully understand the semantic information of a dialogue and generate meaningful replies that fit the dialogue scenario, so as to serve users better.
Dialogue generation is a key element of man-machine dialogue: it produces a machine response from the dialogue content and converts that response into natural language that is returned to the user. The quality of dialogue generation directly affects the user experience and largely reflects how intelligent the man-machine dialogue system is. At present, dialogue systems based on deep learning can learn and generalize the semantic information of dialogues from massive dialogue corpora and generate replies automatically. However, not all candidate replies generated by the system are appropriate for the current dialogue. Selecting an appropriate reply from a plurality of candidate replies is therefore important for improving the performance of the dialogue system and the user experience. In addition, establishing an accurate and comprehensive dialogue quality assessment model is of great significance for improving user satisfaction with consultations and for improving the dialogue skills of the machine. Currently, in open-domain dialogue systems, automatic assessment metrics typically focus on the quality of dialogue generation, such as the consistency and fluency of the dialogue context, but such automatic assessment does not cover dialogue quality comprehensively. Manual assessment, while more accurate, is inefficient and costly. It is therefore important to explore an efficient automatic way to comprehensively evaluate the quality of generated dialogue.
Disclosure of Invention
The invention aims to improve the accuracy of dialogue quality assessment by means of deep learning technology.
The invention provides a dialogue quality assessment method based on deep learning, which comprises: first, constructing a dialogue corpus, the corpus comprising a user dialogue D and a plurality of candidate replies R; training a dialogue quality assessment deep learning model M using the dialogue corpus, wherein the deep learning model assesses candidate reply quality from four aspects: the fluency of the reply, the semantic relevance of the reply to the user dialogue D, the positive emotion guidance that the reply provides for the user dialogue D, and the contextual logical consistency between the reply and the user dialogue D; finally, linearly fusing the fluency result $P_A$ of the reply, the relevance result $P_B$ of the reply and the user dialogue D, the positive emotion guidance result $P_C$ of the reply for the user dialogue D, and the contextual logical consistency result $P_D$ of the reply and the user dialogue D, and selecting the candidate reply with the highest score as the final reply to the user dialogue D.
Moreover, evaluating candidate reply quality from the fluency of the reply includes, for N candidate replies $R = \{r_1, r_2, \ldots, r_N\}$, where $r_u$ denotes each candidate reply ($u = 1, 2, \ldots, N$), calculating the perplexity of each candidate reply $r_u$ and scoring the fluency of the candidate reply, in the following manner,
The candidate reply $r_u$ is segmented into words to obtain an input text $r_u = [x_1, x_2, \ldots, x_m]$ of length m, and the perplexity $PPL(r_u)$ of the current dialogue is calculated through an n-gram language model as follows:
$$PPL(r_u) = \left( \prod_{i=1}^{m} \frac{1}{q(x_i \mid x_{i-1}, x_{i-2}, \ldots, x_1)} \right)^{1/m}$$
Wherein $x_1, x_2, \ldots, x_m$ are the words constituting the sentence;
Wherein $q(x_i \mid x_{i-1}, x_{i-2}, \ldots, x_1)$ is given by the language model, namely the probability distribution of the i-th word given the preceding $i-1$ words of the sentence;
To map the perplexity value to approximately the range 0 to 1, a normalization is applied:
$$P_A = -(PPL(r_u) - \mu)/\sigma$$
Wherein $\mu$ and $\sigma$ are fixed values, and $P_A$ is the fluency score of the reply.
Moreover, evaluating the semantic relevance of the reply to the user dialogue D includes scoring the semantic relevance of each candidate reply to the user dialogue D by calculating the semantic similarity between the user dialogue D and each candidate reply $r_u$, in the following manner,
The user dialogue D and the candidate reply $r_u$ are segmented into n words $w_1, w_2, \ldots, w_n$ and m words $x_1, x_2, \ldots, x_m$ respectively and preprocessed by adding a [CLS] tag at the start position, giving $D = [\text{[CLS]}, w_1, w_2, \ldots, w_n]$ and $r_u = [\text{[CLS]}, x_1, x_2, \ldots, x_m]$; word embedding is then used to obtain the word vector sequences $D_e = [e^{D}_{cls}, e_{w_1}, \ldots, e_{w_n}]$ and $r_{ue} = [e^{r_u}_{cls}, e_{x_1}, \ldots, e_{x_m}]$, where $e^{D}_{cls}$ and $e^{r_u}_{cls}$ are the vectors corresponding to the two [CLS] flag bits in the input texts, $e_{w_j}$ is the word vector of the j-th word in the user dialogue D, and $e_{x_i}$ is the word vector of the i-th word in the candidate reply $r_u$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$; $D_e$ and $r_{ue}$ are input separately into the BERT model, and the outputs $h^{D}_{cls}$ and $h^{r_u}_{cls}$ corresponding to the two [CLS] flag bits are obtained, representing the dialogue semantic information of the user dialogue and of the candidate reply; the cosine similarity of $h^{D}_{cls}$ and $h^{r_u}_{cls}$ scores the semantic similarity between the user dialogue D and the candidate reply $r_u$ as follows,
$$P_B = \cos(h^{D}_{cls}, h^{r_u}_{cls})$$
Where cos() denotes cosine similarity and $P_B$ is the semantic relevance score of the user dialogue D and the candidate reply $r_u$.
Further, evaluating the positive emotion guidance that the reply provides for the user dialogue D includes, for N candidate replies R, formulating rules for the emotion guidance of a candidate reply with respect to the user dialogue D and scoring each candidate reply by the rules, in the following manner,
The user dialogue D and each candidate reply $r_u$ are scored by a dialogue emotion recognition model to obtain the probability distribution [P(neutral), P(positive), P(negative)], where P(neutral), P(positive) and P(negative) are the probabilities that the emotion is neutral, positive and negative respectively, and the emotion scores are calculated as follows:
$$Score_D = [0.05 \times P(\text{neutral}) + 1 \times P(\text{positive}) - 1 \times P(\text{negative})] \times 100$$
Wherein $Score_D$ is the emotion score of the user dialogue, and $Score_{r_u}$, the emotion score of the u-th candidate reply, is calculated in the same way from the distribution obtained for $r_u$;
The scoring rules for calculating the positive emotion guidance of the reply for the user dialogue D are as follows,
When the user dialogue emotion score $Score_D < 0$ and [condition on $Score_{r_u}$ missing in the source text]:
$$P_C = -(Score_t - num)/\sigma$$
Wherein abs() denotes the absolute value, $Score_t$ denotes the emotion distance between $Score_D$ and $Score_{r_u}$, computed with abs(), num and $\sigma$ are constants, and $P_C$ is the positive emotion guidance score of the reply for the user dialogue D.
When the user dialogue emotion score $Score_D < 0$ and [condition on $Score_{r_u}$ missing in the source text]:
$$P_C = 0$$
When the user dialogue emotion score $Score_D \geq 0$: [formula missing in the source text]
Moreover, evaluating the contextual logical consistency between the reply and the user dialogue D includes scoring the logical consistency of each candidate reply by calculating the contextual logical consistency between the user dialogue D and each candidate reply $r_u$, as follows,
The user dialogue D and the candidate reply $r_u$ are segmented to obtain $D = [w_1, w_2, \ldots, w_n]$ and $r_u = [x_1, x_2, \ldots, x_m]$, and the text is preprocessed to obtain the input text $I = [\text{[CLS]}, w_1, w_2, \ldots, w_n, \text{[SEP]}, x_1, x_2, \ldots, x_m]$, where [CLS] is the start mark and [SEP] is the separator mark that separates the user dialogue D from the candidate reply $r_u$;
Then, word embedding is used to obtain the word vector sequence $I_e = [e_{cls}, e_{w_1}, e_{w_2}, \ldots, e_{w_n}, e_{sep}, e_{x_1}, e_{x_2}, \ldots, e_{x_m}]$, where $e_{w_j}$ is the word vector of the j-th word in the user dialogue and $e_{x_i}$ is the word vector of the i-th word in the candidate reply $r_u$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$;
The sequence $I_e$ is input into the BERT model, and $h_{cls}$, the output corresponding to the [CLS] flag bit, is used to determine whether the two sentences are contextually logically consistent, as follows:
$$y = \mathrm{softmax}(W_l h_{cls})$$
Where $W_l$ denotes the trainable parameters of a fully connected network layer and softmax is the activation function; $h_{cls}$ is passed through the fully connected layer and the softmax activation function to obtain the probability distribution y over whether D and $r_u$ are contextually logically consistent. Finally, the probability that the user dialogue and the candidate reply are logically consistent is taken as $P_D$.
Moreover, the fluency result $P_A$ of the candidate reply, the relevance result $P_B$ of the reply and the user dialogue D, the positive emotion guidance result $P_C$ of the reply for the user dialogue D, and the contextual logical consistency result $P_D$ of the reply and the user dialogue D are linearly fused, and the candidate reply with the highest score among the plurality of candidate replies is selected as the final reply to the dialogue D.
In another aspect, the invention also provides a dialogue quality assessment system based on deep learning, which is used to implement the dialogue quality assessment method based on deep learning described above.
Further, the system includes a processor and a memory, the memory being used to store program instructions and the processor being used to call the instructions stored in the memory to perform the dialogue quality assessment method based on deep learning described above.
Alternatively, the system comprises a readable storage medium on which a computer program is stored, the computer program implementing the dialogue quality assessment method based on deep learning described above when executed.
The invention provides a technical scheme for dialogue quality assessment based on deep learning, namely a comprehensive and automatic assessment method for the quality of open-domain dialogue generation. Compared with the prior art, the assessment method not only evaluates the fluency and semantic relevance of the generated dialogue, but also considers the logical consistency and emotional relevance of the replies. For fluency and semantic relevance, the invention evaluates the fluency of replies with an n-gram language model and uses the deep learning BERT model to obtain the semantic relevance of the user dialogue and the candidate replies. For logical consistency, the invention uses the deep learning BERT model with an entailment-based approach. For emotional relevance, the invention provides a dialogue emotion calculation method and designs an evaluation criterion according to the emotion distance between the user dialogue and the candidate replies. Experimental results show that the proposed dialogue quality assessment method can effectively assess a plurality of candidate replies given by a dialogue system, and that the results of the automatic assessment correlate strongly with the results of manual assessment. The proposed assessment method can provide a standard for comparing dialogues in dialogue generation, and it can also be extended to text quality assessment tasks in other fields of natural language processing.
Drawings
FIG. 1 is a flow chart of a dialog quality assessment in an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is specifically described below with reference to the accompanying drawings and examples.
Open-domain dialogue generation is receiving increasing attention in the field of man-machine dialogue, and reply generation requires a comprehensive evaluation criterion. Manual assessment is considered the gold standard, but because it is inefficient and costly, an automatic alternative is highly desirable. Furthermore, some automatic assessment methods consider only the consistency and fluency of the dialogue context, which makes the assessment of dialogue quality overly narrow. To evaluate dialogues more comprehensively, the dialogue quality assessment method provided by the invention not only evaluates the fluency and semantic relevance of the generated dialogue, but also considers the logical consistency and emotional relevance of the replies. The invention evaluates a plurality of candidate replies from four aspects: the fluency of the reply, the relevance of the reply to the user dialogue, the emotional consistency of the reply with the user dialogue, and the contextual logic of the reply with respect to the user dialogue. Specifically: (1) dialogue fluency: fluency based on an n-gram language model; (2) semantic relevance: contextual semantic relevance based on the BERT model; (3) contextual logic: logical self-consistency based on textual entailment reasoning; (4) emotion contained in the dialogue: emotion guidance based on dialogue emotion recognition. In particular, the invention takes into account the emotional association between the user dialogue and the candidate replies, and thus considers emotional transfer between dialogue turns in greater depth. A dialogue emotion calculation method is provided, and an evaluation criterion is designed according to the emotion distance between the user dialogue and the candidate replies.
Experiments show that the method effectively evaluates a plurality of candidate replies given by a dialogue system, and that the results of the automatic assessment correlate strongly with the results of manual assessment.
In the dialogue quality assessment method based on deep learning, a dialogue corpus is used to train a dialogue quality assessment deep learning model, and the deep learning model scores candidate replies from four aspects: the fluency of the reply, the relevance of the reply to the user dialogue, the emotional consistency of the reply with the user dialogue, and the contextual logic of the reply with respect to the user dialogue; finally, the fluency result of the reply, the relevance result of the reply and the user dialogue D, the emotional consistency result of the reply and the user dialogue, and the contextual logic result of the reply and the user dialogue are linearly fused, and the candidate reply with the highest score is selected as the final reply to the user dialogue.
Referring to FIG. 1, the dialogue quality assessment method based on deep learning provided by an embodiment of the invention comprises: first, constructing a dialogue corpus, the corpus comprising a user dialogue D and a plurality of candidate replies R; training a dialogue quality assessment deep learning model M using the dialogue corpus, wherein the deep learning model assesses candidate reply quality from four aspects: the fluency of the reply, the relevance of the reply to the user dialogue D, the positive emotion guidance that the reply provides for the user dialogue D, and the contextual logical consistency between the reply and the user dialogue D; finally, linearly fusing the fluency result $P_A$ of the reply, the relevance result $P_B$ of the reply and the user dialogue D, the positive emotion guidance result $P_C$ of the reply for the user dialogue D, and the contextual logical consistency result $P_D$ of the reply and the user dialogue D, and selecting the candidate reply with the highest score as the final reply to the user dialogue D.
The implementation process mainly comprises four sub-parts: 1. for N candidate replies R, where $r_u$ denotes each candidate reply, calculating the perplexity of $r_u$ and scoring the fluency of the candidate reply; 2. for N candidate replies R, scoring the semantic relevance of each candidate reply to the user dialogue by calculating the semantic similarity between the user dialogue D and each candidate reply $r_u$; 3. for N candidate replies R, scoring the emotional consistency of each candidate reply by calculating the consistency of the emotion contained in the user dialogue D and in each candidate reply $r_u$; 4. for N candidate replies R, scoring the logic of each candidate reply by calculating the contextual logic between the user dialogue D and each candidate reply $r_u$. The specific implementation process is as follows:
Step 1: for N candidate replies $R = \{r_1, r_2, \ldots, r_N\}$, where $r_u$ denotes each candidate reply ($u = 1, 2, \ldots, N$), calculate the perplexity of the candidate reply $r_u$ and score the fluency of the candidate reply, implemented as follows:
Step 1.1: the candidate reply $r_u$ is segmented into words to obtain an input text $r_u = [x_1, x_2, \ldots, x_m]$ of length m, and the perplexity $PPL(r_u)$ of the current dialogue can be calculated through an n-gram language model as follows:
$$PPL(r_u) = \left( \prod_{i=1}^{m} \frac{1}{q(x_i \mid x_{i-1}, x_{i-2}, \ldots, x_1)} \right)^{1/m}$$
where $x_1, x_2, \ldots, x_m$ are the words that make up the sentence.
Where $q(x_i \mid x_{i-1}, x_{i-2}, \ldots, x_1)$ is given by a language model, $i = 1, 2, \ldots, m$. Given the preceding $i-1$ words of a sentence, the language model predicts the i-th word, i.e. it gives the probability distribution $q(x_i \mid x_{i-1}, x_{i-2}, \ldots, x_1)$ of the i-th word.
Step 1.2: to map the perplexity value to approximately the range 0 to 1, a normalization is applied, calculated as follows:
$$P_A = -(PPL(r_u) - \mu)/\sigma$$
Wherein $\mu$ and $\sigma$ are fixed values, and $P_A$ is the fluency score of the reply.
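Step 1 can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: it uses a bigram model with add-one smoothing as one possible n-gram language model, a toy tokenized corpus, and hypothetical values for mu and sigma; the source specifies none of these choices, and the function names are invented for the example.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over tokenized sentences (assumed bigram LM)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence  # "<s>" is a hypothetical start-of-sentence token
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def perplexity(tokens, unigrams, bigrams):
    """PPL = exp(-(1/m) * sum_i log q(x_i | x_{i-1})), with add-one smoothing.

    Assumes a non-empty token list.
    """
    vocab_size = len(unigrams)
    prev, log_prob = "<s>", 0.0
    for tok in tokens:
        q = (bigrams[(prev, tok)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(q)
        prev = tok
    return math.exp(-log_prob / len(tokens))

def fluency_score(ppl, mu, sigma):
    """P_A = -(PPL - mu) / sigma, as in Step 1.2; mu and sigma are fixed values."""
    return -(ppl - mu) / sigma

# Toy usage: a reply in corpus word order gets a lower perplexity than a shuffled one.
toy_corpus = [["i", "like", "tea"], ["i", "like", "coffee"], ["you", "like", "tea"]]
uni, bi = train_bigram(toy_corpus)
ppl_fluent = perplexity(["i", "like", "tea"], uni, bi)
ppl_shuffled = perplexity(["tea", "like", "i"], uni, bi)
```

Because $P_A$ negates the normalized perplexity, a lower perplexity (a more fluent reply) yields a higher fluency score.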
Step 2: for N candidate replies R, score the semantic relevance of each candidate reply to the user dialogue by calculating the semantic similarity between the user dialogue D and each candidate reply $r_u$, implemented as follows:
Step 2.1: the user dialogue D and the candidate reply $r_u$ are segmented into n words $w_1, w_2, \ldots, w_n$ and m words $x_1, x_2, \ldots, x_m$ respectively and preprocessed by adding a [CLS] tag at the start position, giving $D = [\text{[CLS]}, w_1, w_2, \ldots, w_n]$ and $r_u = [\text{[CLS]}, x_1, x_2, \ldots, x_m]$; a word embedding technique is then used to obtain the word vector sequences $D_e = [e^{D}_{cls}, e_{w_1}, \ldots, e_{w_n}]$ and $r_{ue} = [e^{r_u}_{cls}, e_{x_1}, \ldots, e_{x_m}]$, where $e^{D}_{cls}$ and $e^{r_u}_{cls}$ are the vectors corresponding to the two [CLS] flag bits in the input texts, $e_{w_j}$ is the word vector of the j-th word in the user dialogue, and $e_{x_i}$ is the word vector of the i-th word in the candidate reply $r_u$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$; $D_e$ and $r_{ue}$ are input separately into the BERT model, and the outputs $h^{D}_{cls}$ and $h^{r_u}_{cls}$ corresponding to the two [CLS] flag bits are obtained, representing the semantic information of the dialogue sentences.
Note: BERT stands for Bidirectional Encoder Representations from Transformers. For a specific implementation of BERT, reference may be made to the prior art: Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. A detailed description is omitted here.
Step 2.2: the cosine similarity of $h^{D}_{cls}$ and $h^{r_u}_{cls}$ scores the semantic similarity between the user dialogue D and the candidate reply $r_u$, calculated as follows:
$$P_B = \cos(h^{D}_{cls}, h^{r_u}_{cls})$$
Where cos() denotes cosine similarity and $P_B$ is the semantic relevance score of the user dialogue D and the candidate reply $r_u$.
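The scoring in Step 2.2 reduces to a cosine similarity between two vectors. A minimal sketch follows, assuming the two [CLS] output vectors have already been obtained from a BERT encoder; the short lists used here are hypothetical stand-ins for those vectors, not real BERT outputs, and the zero-vector fallback is a choice made for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat an all-zero vector as unrelated
    return dot / (norm_a * norm_b)

# Hypothetical stand-ins for the [CLS] outputs of the user dialogue and one reply.
h_cls_dialogue = [0.2, -0.1, 0.7]
h_cls_reply = [0.1, 0.0, 0.6]
p_b = cosine_similarity(h_cls_dialogue, h_cls_reply)  # semantic relevance score P_B
```

Cosine similarity lies in [-1, 1], so $P_B$ is on a different scale from the perplexity-based $P_A$ before the weighted fusion in Step 5.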
Step 3: for N candidate replies R, formulate rules for the emotion guidance of a candidate reply with respect to the user dialogue D, and score each candidate reply by the rules, implemented as follows:
Step 3.1: the user dialogue D and each candidate reply $r_u$ are scored by a dialogue emotion recognition model to obtain the probability distribution [P(neutral), P(positive), P(negative)] of dialogue emotion polarity, where P(neutral), P(positive) and P(negative) are the probabilities that the emotion is neutral, positive and negative respectively, and the emotion score is calculated as follows:
$$Score_D = [0.05 \times P(\text{neutral}) + 1 \times P(\text{positive}) - 1 \times P(\text{negative})] \times 100$$
Wherein $Score_D$ is the emotion score of the user dialogue, $Score_{r_u}$, the emotion score of the u-th candidate reply, is calculated in the same way from the distribution obtained for $r_u$, and $\times$ denotes multiplication.
In practice, the dialogue emotion recognition model may use existing techniques.
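As a small worked illustration of the emotion score in Step 3.1, the following Python sketch applies the stated weights (0.05, 1 and -1, scaled by 100) to a probability distribution. The function name and the example distribution are hypothetical; in practice the probabilities would come from a dialogue emotion recognition model.

```python
def emotion_score(p_neutral, p_positive, p_negative):
    """Score = [0.05*P(neutral) + 1*P(positive) - 1*P(negative)] * 100."""
    return (0.05 * p_neutral + 1.0 * p_positive - 1.0 * p_negative) * 100

# Hypothetical model output for a mostly positive utterance.
score = emotion_score(p_neutral=0.2, p_positive=0.7, p_negative=0.1)
```

With these weights the score ranges from -100 (certainly negative) to 100 (certainly positive), and a certainly neutral utterance scores 5.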
Step 3.2: the scoring rules for calculating the positive emotion guidance of the reply for the user dialogue D are as follows:
When the user dialogue emotion score $Score_D < 0$ and [condition on $Score_{r_u}$ missing in the source text]:
$$P_C = -(Score_t - num)/\sigma$$
Wherein abs() denotes the absolute value, $Score_t$ denotes the emotion distance between $Score_D$ and $Score_{r_u}$, computed with abs(), num and $\sigma$ are constants with num < $\sigma$ (the suggested value of num is 200 and the suggested value of $\sigma$ is 300), and $P_C$ is the positive emotion guidance score of the reply for the user dialogue D.
When the user dialogue emotion score $Score_D < 0$ and [condition on $Score_{r_u}$ missing in the source text]:
$$P_C = 0$$
When the user dialogue emotion score $Score_D \geq 0$: [formula missing in the source text]
Step 4: for N candidate replies R, score the logical consistency of each candidate reply by calculating the contextual logical consistency between the user dialogue D and each candidate reply $r_u$, implemented as follows:
Step 4.1: the user dialogue D and the candidate reply $r_u$ are segmented to obtain $D = [w_1, w_2, \ldots, w_n]$ and $r_u = [x_1, x_2, \ldots, x_m]$, and the text is preprocessed to obtain the input text $I = [\text{[CLS]}, w_1, w_2, \ldots, w_n, \text{[SEP]}, x_1, x_2, \ldots, x_m]$, where [CLS] is the start mark and [SEP] is the separator mark that separates the user dialogue D from the candidate reply $r_u$. Word embedding is then used to obtain the word vector sequence $I_e = [e_{cls}, e_{w_1}, e_{w_2}, \ldots, e_{w_n}, e_{sep}, e_{x_1}, e_{x_2}, \ldots, e_{x_m}]$, where $e_{w_j}$ is the word vector of the j-th word in the user dialogue and $e_{x_i}$ is the word vector of the i-th word in the candidate reply $r_u$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$.
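The input construction of Step 4.1 can be illustrated as follows. This is a sketch over already-segmented tokens; the function name is hypothetical and no tokenizer or embedding layer is involved.

```python
def build_pair_input(dialogue_tokens, reply_tokens):
    """Build I = [[CLS], w_1..w_n, [SEP], x_1..x_m] from two token lists."""
    return ["[CLS]"] + list(dialogue_tokens) + ["[SEP]"] + list(reply_tokens)

pair_input = build_pair_input(["how", "are", "you"], ["fine", "thanks"])
```

Common BERT tokenizers also append a trailing [SEP] after the second segment; the sequence above follows the layout given in the text, which does not include one.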
Step 4.2: the sequence $I_e$ is input into the BERT model, and $h_{cls}$, the output corresponding to the [CLS] flag bit, is used to determine whether the two sentences are contextually logically consistent, implemented as follows:
$$y = \mathrm{softmax}(W_l h_{cls})$$
Where $W_l$ denotes the trainable parameters of a fully connected network layer and softmax is the activation function; $h_{cls}$ is passed through the fully connected layer and the softmax activation function to obtain the probability distribution y over whether D and $r_u$ are contextually logically consistent. Finally, the probability that the user dialogue and the candidate reply are logically consistent is taken as $P_D$.
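The classification head of Step 4.2 can be sketched without a deep learning framework. The example assumes a two-class output (inconsistent, consistent) with the consistent class at index 1, omits any bias term as in the formula, and uses hypothetical weights and a hypothetical $h_{cls}$ vector in place of trained parameters and real BERT output.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def consistency_probability(w_l, h_cls, consistent_index=1):
    """Compute y = softmax(W_l h_cls) and return the 'consistent' class probability."""
    logits = [sum(w * h for w, h in zip(row, h_cls)) for row in w_l]
    y = softmax(logits)
    return y[consistent_index]

# Hypothetical 2 x 3 weight matrix and 3-dimensional [CLS] output.
w_l = [[0.1, -0.2, 0.0],
       [0.4, 0.3, -0.1]]
h_cls = [1.0, 0.5, -1.0]
p_d = consistency_probability(w_l, h_cls)  # logical consistency score P_D
```

Which index corresponds to the consistent class depends on how the training labels are encoded, so `consistent_index` is left as a parameter.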
Step 5: for the dialogue D on which intent recognition is to be performed, the fluency result $P_A$ of the candidate reply, the relevance result $P_B$ of the reply and the user dialogue D, the emotional consistency result $P_C$ of the reply and the user dialogue D, and the contextual logic result $P_D$ of the reply and the user dialogue D are linearly fused in the following manner,
$$P = \delta_1 \cdot P_A + \delta_2 \cdot P_B + \delta_3 \cdot P_C + \delta_4 \cdot P_D$$
Wherein $\delta_1, \delta_2, \delta_3, \delta_4$ are fixed values in the interval [0, 1], with recommended values of 0.8, 0.6, 0.9 and 0.8 respectively.
Finally, the candidate reply with the highest score among the plurality of candidate replies is selected as the final reply to the dialogue D.
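The fusion and selection in Step 5 amount to a weighted sum followed by an argmax. The sketch below uses the recommended weights 0.8, 0.6, 0.9 and 0.8; the function names, the candidate texts and their four sub-scores are made up for illustration.

```python
DELTAS = (0.8, 0.6, 0.9, 0.8)  # recommended delta_1..delta_4 from the text

def fused_score(p_a, p_b, p_c, p_d, deltas=DELTAS):
    """P = delta_1*P_A + delta_2*P_B + delta_3*P_C + delta_4*P_D."""
    d1, d2, d3, d4 = deltas
    return d1 * p_a + d2 * p_b + d3 * p_c + d4 * p_d

def select_reply(candidates):
    """Return the text of the candidate with the highest fused score.

    `candidates` is a list of (reply_text, (P_A, P_B, P_C, P_D)) pairs.
    """
    best_text, _ = max(candidates, key=lambda c: fused_score(*c[1]))
    return best_text

# Hypothetical candidates with made-up sub-scores.
example_candidates = [
    ("I see.", (0.6, 0.2, 0.0, 0.4)),
    ("That sounds hard, do you want to talk about it?", (0.5, 0.7, 0.5, 0.9)),
]
final_reply = select_reply(example_candidates)
```

Python's `max` returns the first maximal element, so ties are resolved in favor of the earlier candidate; the source does not say how ties should be handled.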
In this way, the dialogue quality assessment model based on deep learning can be implemented as an automated workflow.
In a specific implementation, a person skilled in the art can use computer software technology to run the dialogue quality assessment method based on deep learning of the present technical scheme as an automated workflow. System devices that implement the method, such as a computer-readable storage medium storing the corresponding computer program, and computer devices and servers running the corresponding computer program, also fall within the protection scope of the invention.
In some possible embodiments, a dialogue quality assessment system based on deep learning is provided, which comprises a processor and a memory, the memory being used to store program instructions and the processor being used to call the instructions stored in the memory to execute the dialogue quality assessment method based on deep learning described above.
In some possible embodiments, a dialogue quality assessment system based on deep learning is provided, which comprises a readable storage medium on which a computer program is stored, the computer program implementing the dialogue quality assessment method based on deep learning described above when executed.
The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitutions thereof without departing from the spirit of the invention or exceeding the scope of the invention as defined in the accompanying claims.

Claims (5)

1. A dialogue quality assessment method based on deep learning, characterized in that: first, a dialogue corpus is constructed, the corpus comprising a user dialogue D and a plurality of candidate replies R; a dialogue quality assessment deep learning model M is trained using the dialogue corpus, wherein the deep learning model assesses candidate reply quality from four aspects: the fluency of the reply, the semantic relevance of the reply to the user dialogue D, the positive emotion guidance that the reply provides for the user dialogue D, and the contextual logical consistency between the reply and the user dialogue D; finally, the fluency result $P_A$ of the reply, the relevance result $P_B$ of the reply and the user dialogue D, the positive emotion guidance result $P_C$ of the reply for the user dialogue D, and the contextual logical consistency result $P_D$ of the reply and the user dialogue D are linearly fused, and the candidate reply with the highest score is selected as the final reply to the user dialogue D;
Evaluating candidate reply quality in terms of reply smoothness includes: given N candidate replies R = {r_1, r_2, ..., r_N}, where r_u (u = 1, 2, ..., N) denotes each candidate reply, calculating the perplexity of each candidate reply r_u and scoring the smoothness of the candidate reply, implemented as follows,
The candidate reply r_u is segmented into words to obtain an input text r_u = [x_1, x_2, ..., x_m] of length m, and the perplexity PPL(r_u) of the current dialogue is calculated through an N-gram language model as follows:
Wherein x_1, x_2, ..., x_m are the words constituting the sentence;
Wherein q(x_i | x_{i-1}, x_{i-2}, ..., x_1) is given by a language model, which provides the probability distribution of the next word given the preceding words of a sentence, so that q(x_i | x_{i-1}, x_{i-2}, ..., x_1) is the probability of the i-th word given the first i-1 words;
To map the perplexity value to approximately the range 0 to 1, a normalization is applied:
P_A = -(PPL(r_u) - mu) / sigma
Wherein mu and sigma are fixed values, and P_A is the reply smoothness score;
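The smoothness score can be illustrated with a minimal Python sketch. It assumes the standard definition of perplexity, PPL = exp(-(1/m) * sum of log q(x_i | x_{i-1}, ..., x_1)), and takes the per-word conditional probabilities as given instead of computing them from an N-gram model. The function names, the example probabilities, and the values of mu and sigma are hypothetical.

```python
import math


def perplexity(cond_probs):
    """Perplexity of a sentence from its per-word conditional probabilities
    q(x_i | x_{i-1}, ..., x_1), using the standard definition (an assumption)."""
    m = len(cond_probs)
    if m == 0:
        raise ValueError("sentence must contain at least one word")
    log_sum = sum(math.log(q) for q in cond_probs)
    return math.exp(-log_sum / m)


def smoothness_score(ppl, mu, sigma):
    """P_A = -(PPL(r_u) - mu) / sigma, with mu and sigma fixed constants."""
    return -(ppl - mu) / sigma


# Hypothetical conditional probabilities for a four-word candidate reply.
probs = [0.25, 0.5, 0.5, 0.25]
ppl = perplexity(probs)
# mu and sigma are illustrative values, not taken from the patent.
p_a = smoothness_score(ppl, mu=10.0, sigma=5.0)
```

Because of the leading minus sign, a reply with lower perplexity receives a higher P_A.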
Evaluating candidate reply quality in terms of the semantic relevance between the reply and the user dialogue D includes scoring the semantic relevance of each candidate reply to the user dialogue D by computing the semantic similarity between the user dialogue D and each candidate reply r_u, implemented as follows,
The user dialogue D and the candidate reply r_u are segmented into words to obtain n words w_1, w_2, ..., w_n and m words x_1, x_2, ..., x_m respectively, and are preprocessed by adding a [CLS] tag at the start position to obtain D = [[CLS], w_1, w_2, ..., w_n] and r_u = [[CLS], x_1, x_2, ..., x_m]; word embedding is then used to obtain the word vector sequences D_e and r_ue, each of which begins with the vector corresponding to the [CLS] flag of its input text, where e_wj represents the word vector of the j-th word in the user dialogue D and e_xi represents the word vector of the i-th word in the candidate reply r_u, i = 1, 2, ..., m, j = 1, 2, ..., n; D_e and r_ue are respectively input into the BERT model, and the BERT outputs corresponding to the two [CLS] flags are obtained, which represent the dialogue semantic information of the user dialogue and of the candidate reply respectively; the cosine similarity of these two outputs is calculated to score the semantic similarity between the user dialogue D and the candidate reply r_u as follows,
Wherein cos() represents cosine similarity and P_B is the semantic relevance score between the user dialogue D and the candidate reply r_u;
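The relevance score can be sketched in Python as the cosine similarity of two vectors. This is only an illustration: the vectors below are hypothetical stand-ins for the BERT outputs at the two [CLS] positions, and the names h_dialogue and h_reply are not from the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical [CLS] output vectors; a real system would obtain them from BERT.
h_dialogue = [0.2, 0.1, 0.7]
h_reply = [0.4, 0.2, 1.4]  # same direction as h_dialogue

p_b = cosine_similarity(h_dialogue, h_reply)
```

Vectors that point in the same direction score close to 1, and orthogonal vectors score 0.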
Evaluating candidate reply quality in terms of the positive emotion guidance of the reply toward the user dialogue D includes formulating emotion guidance rules for the user dialogue D and scoring each of the N candidate replies R by these rules, implemented as follows,
The user dialogue D and each candidate reply r_i are scored by a dialogue emotion recognition model to obtain a probability distribution [P(neutral), P(positive), P(negative)], where P(neutral), P(positive) and P(negative) respectively represent the probabilities that the emotion is neutral, positive and negative, and the emotion score is calculated as follows:
Score_D = [0.05 * P(neutral) + 1 * P(positive) - 1 * P(negative)] * 100
Wherein Score_D represents the emotion score of the user dialogue, and the corresponding score of the i-th candidate reply represents the emotion score of that reply;
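The emotion score formula above can be written as a short Python function. This is a sketch only: the probabilities would come from a dialogue emotion recognition model, and the example values and function name here are hypothetical.

```python
def emotion_score(p_neutral, p_positive, p_negative):
    """Emotion score: [0.05 * P(neutral) + 1 * P(positive) - 1 * P(negative)] * 100."""
    return (0.05 * p_neutral + 1.0 * p_positive - 1.0 * p_negative) * 100


# Hypothetical model outputs for a mostly negative user dialogue
# and a mostly positive candidate reply.
score_dialogue = emotion_score(p_neutral=0.2, p_positive=0.1, p_negative=0.7)
score_reply = emotion_score(p_neutral=0.2, p_positive=0.7, p_negative=0.1)
```

With these inputs the user dialogue scores below zero and the reply scores above zero, which is the situation the guidance rules are meant to distinguish.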
The scoring rules for the positive emotion guidance of the reply toward the user dialogue D are as follows,
When the user dialogue emotion score Score_D < 0 and a further condition holds:
P_C = -(Score_t - num) / sigma
Wherein abs() represents the absolute value, Score_t is derived from Score_D and the emotion score of the candidate reply, and P_C is the positive emotion guidance score of the reply toward the user dialogue D;
When the user dialogue emotion score Score_D < 0 and the alternative condition holds:
P_C = 0
When the user dialogue emotion score Score_D ≥ 0:
Evaluating candidate reply quality in terms of the context logical consistency between the reply and the user dialogue D includes scoring each candidate reply by computing the context logical consistency between the user dialogue D and each candidate reply r_u, implemented as follows,
The user dialogue D and the candidate reply r_u are segmented into words to obtain D = [w_1, w_2, ..., w_n] and r_u = [x_1, x_2, ..., x_m], and the text is preprocessed to obtain an input text I = [[CLS], w_1, w_2, ..., w_n, [SEP], x_1, x_2, ..., x_m], where [CLS] is a start mark and [SEP] is a segment separator mark used to separate the user dialogue D from the candidate reply r_u;
Then, word embedding is used to obtain a word vector sequence I_e = [e_cls, e_w1, e_w2, ..., e_wn, e_sep, e_x1, e_x2, ..., e_xm], where e_wj represents the word vector of the j-th word in the user dialogue, e_xi represents the word vector of the i-th word in the candidate reply r_u, i = 1, 2, ..., m, j = 1, 2, ..., n;
The sequence I_e is input into the BERT model, and h_cls, the output corresponding to the [CLS] flag, is used to determine whether the two sentences are contextually logically consistent, implemented as follows:
y = softmax(W_l h_cls)
Wherein W_l represents the trainable parameters of a fully connected network layer and softmax represents the activation function; h_cls is passed through the fully connected layer and the softmax activation function to obtain the probability distribution y over whether D and r_u are contextually logically consistent; finally, the probability that the user dialogue and the candidate reply are logically consistent is taken as P_D.
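The classification step y = softmax(W_l h_cls) can be sketched in plain Python. The sketch assumes two output classes with index 1 meaning "consistent"; that class layout, the toy weight matrix, and the toy h_cls vector are all hypothetical, since a real system would take h_cls from BERT and learn W_l during training.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_probability(w_l, h_cls, consistent_index=1):
    """Return P_D: the softmax(W_l h_cls) probability of the 'consistent' class.

    w_l is a list of rows (one row per class); consistent_index is an
    assumed class layout, not something the patent specifies.
    """
    logits = [sum(w * h for w, h in zip(row, h_cls)) for row in w_l]
    y = softmax(logits)
    return y[consistent_index]


# Toy parameters: identity weights and a two-dimensional h_cls.
w_l = [[1.0, 0.0], [0.0, 1.0]]
h_cls = [0.0, math.log(3.0)]

p_d = consistency_probability(w_l, h_cls)
```

Here the logits are 0 and log 3, so the softmax assigns three times as much probability to the "consistent" class as to the other class.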
2. The deep learning-based dialogue quality assessment method of claim 1, wherein: the smoothness result P_A of the candidate reply, the semantic relevance result P_B between the reply and the user dialogue D, the positive emotion guidance result P_C of the reply toward the user dialogue D, and the context logical consistency result P_D between the reply and the user dialogue D are linearly fused, and the candidate reply with the highest score among the plurality of candidate replies is selected as the final reply corresponding to the dialogue D.
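The linear fusion and selection step can be sketched as a weighted sum followed by an argmax over the candidates. The claims state only that the four results are linearly fused, so the equal weights, the function names, and the example scores below are assumptions made for illustration.

```python
def fuse_scores(p_a, p_b, p_c, p_d, weights=(0.25, 0.25, 0.25, 0.25)):
    """Linear fusion of the four scores; the weights are hypothetical."""
    w_a, w_b, w_c, w_d = weights
    return w_a * p_a + w_b * p_b + w_c * p_c + w_d * p_d


def select_best_reply(candidates, weights=(0.25, 0.25, 0.25, 0.25)):
    """Pick the reply with the highest fused score.

    candidates is a list of (reply_text, (p_a, p_b, p_c, p_d)) pairs.
    """
    if not candidates:
        raise ValueError("at least one candidate reply is required")
    best = max(candidates, key=lambda item: fuse_scores(*item[1], weights=weights))
    return best[0]


# Hypothetical candidate replies with made-up scores (P_A, P_B, P_C, P_D).
candidates = [
    ("reply_a", (0.2, 0.9, 0.0, 0.6)),
    ("reply_b", (0.8, 0.7, 0.5, 0.9)),
    ("reply_c", (0.5, 0.4, 0.3, 0.2)),
]

final_reply = select_best_reply(candidates)
```

Changing the weights shifts the emphasis among smoothness, relevance, emotion guidance and consistency without changing the selection logic.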
3. A dialogue quality assessment system based on deep learning, characterized in that: the system is used for implementing the deep learning-based dialogue quality assessment method according to any one of claims 1-2.
4. The deep learning-based dialogue quality assessment system according to claim 3, characterized in that: the system comprises a processor and a memory for storing program instructions, the processor being configured to invoke the instructions stored in the memory to perform the deep learning-based dialogue quality assessment method according to any one of claims 1-2.
5. The deep learning-based dialogue quality assessment system according to claim 3, characterized in that: the system comprises a readable storage medium having stored thereon a computer program which, when executed, implements the deep learning-based dialogue quality assessment method according to any one of claims 1-2.
CN202111442436.2A 2021-11-30 2021-11-30 Dialogue quality assessment method and system based on deep learning Active CN114154517B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111442436.2A CN114154517B (en) 2021-11-30 2021-11-30 Dialogue quality assessment method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111442436.2A CN114154517B (en) 2021-11-30 2021-11-30 Dialogue quality assessment method and system based on deep learning

Publications (2)

Publication Number Publication Date
CN114154517A (en) 2022-03-08
CN114154517B (en) 2024-09-27

Family

ID=80455118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111442436.2A Active CN114154517B (en) 2021-11-30 2021-11-30 Dialogue quality assessment method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN114154517B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115017288A (en) * 2022-06-17 2022-09-06 平安科技(深圳)有限公司 Model training method, model training device, equipment and storage medium
CN117112744B (en) * 2023-08-02 2024-07-12 北京聆心智能科技有限公司 Assessment method and device for large language model and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241250A (en) * 2020-01-22 2020-06-05 中国人民大学 A system and method for generating emotional dialogue
CN111897933A (en) * 2020-07-27 2020-11-06 腾讯科技(深圳)有限公司 Emotional dialogue generation method and device and emotional dialogue model training method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9213687B2 (en) * 2009-03-23 2015-12-15 Lawrence Au Compassion, variety and cohesion for methods of text analytics, writing, search, user interfaces

Also Published As

Publication number Publication date
CN114154517A (en) 2022-03-08

Similar Documents

Publication Publication Date Title
CN109918673B (en) Semantic arbitration method and device, electronic equipment and computer-readable storage medium
CN108763510B (en) Intention recognition method, device, equipment and storage medium
CN108874972A (en) A kind of more wheel emotion dialogue methods based on deep learning
CN111062220B (en) End-to-end intention recognition system and method based on memory forgetting device
KR20230171234A (en) Method for Providing Question-and-Answer Service Based on User Participation And Apparatus Therefor
CN113408287B (en) Entity identification method and device, electronic equipment and storage medium
CN117689963B (en) A visual entity linking method based on multimodal pre-training model
CN114417880B (en) An interactive intelligent question-answering method based on power grid practical training question-answering knowledge base
CN117010387A (en) Roberta-BiLSTM-CRF voice dialogue text naming entity recognition system integrating attention mechanism
CN112364148B (en) A generative chatbot based on deep learning method
CN116991982B (en) Interactive dialogue method, device, equipment and storage medium based on artificial intelligence
CN114154517B (en) Dialogue quality assessment method and system based on deep learning
CN116910220A (en) Multi-turn dialogue interactive processing methods, devices, equipment and storage media
CN115274086B (en) Intelligent diagnosis guiding method and system
Liu et al. Cross-domain slot filling as machine reading comprehension: A new perspective
CN110597968A (en) Reply selection method and device
CN112182159A (en) Personalized retrieval type conversation method and system based on semantic representation
CN112528654A (en) Natural language processing method and device and electronic equipment
WO2023087935A1 (en) Coreference resolution method, and training method and apparatus for coreference resolution model
CN115617974B (en) Dialogue processing method, device, equipment and storage medium
CN116795970A (en) Dialog generation method and application thereof in emotion accompanying
Sarker et al. Anglo-Bangla language-based AI chatbot for Bangladeshi university admission system
CN112257432A (en) Self-adaptive intention identification method and device and electronic equipment
CN118377909B (en) Customer label determining method and device based on call content and storage medium
CN119150964A (en) Information processing method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant