CN115617997A - A dialog state tracking method, device, equipment and medium - Google Patents
- Publication number
- CN115617997A (application number CN202211286338.9A)
- Authority
- CN
- China
- Prior art keywords
- slot
- vector
- probability distribution
- preset
- domain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/353 — Information retrieval of unstructured textual data; clustering; classification into predefined classes
- G06F16/3329 — Information retrieval of unstructured textual data; querying; natural language query formulation
- G06F16/3346 — Information retrieval of unstructured textual data; querying; query execution using a probabilistic model
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
Abstract
The present application discloses a dialogue state tracking method, apparatus, device, and medium in the field of computer technology. The method includes: encoding the historical dialogue with a first BiGRU neural network to obtain a first encoding result, and performing domain-level feature extraction on the first encoding result based on an attention mechanism and domain vectors, to obtain a first extraction result covering at least one domain; encoding the first extraction result with a second BiGRU neural network to obtain a second encoding result, and performing slot-level feature extraction on the second encoding result based on attention and slot word vectors, to obtain a second extraction result; inputting the second extraction result, the domain-slot word vectors, and the vocabulary vectors into a decoder to obtain a decoded vector, computing the predicted probability distribution of the slot values from the decoded vector, and filling the slots under each domain according to that distribution. By using BiGRU networks, the application addresses the long-range dependency problem of multi-domain dialogue histories; by processing domains and slots separately and applying attention to each in turn, it enables information sharing across different domain-slot pairs.
Description
Technical Field
The present invention relates to the field of computer technology, and in particular to a dialogue state tracking method, apparatus, device, and medium.
Background
With the continuous development of artificial intelligence, human-machine dialogue technology has been widely applied in navigation, entertainment, communication, and other areas, in forms such as personal assistants, customer service bots, and voice control systems. Dialogue systems fall into two categories: task-oriented and non-task-oriented. For task-oriented dialogue systems there are currently two main approaches, the pipeline approach and the end-to-end approach. The pipeline approach divides the whole human-machine dialogue process into five modules: Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Dialogue Management (DM), Natural Language Generation (NLG), and Text To Speech (TTS). Among these, the dialogue management module acts as the brain of the dialogue system: it plays a crucial role, deciding which actions the system should take, including follow-up questions, clarification, and confirmation. Its main subtasks are Dialogue State Tracking (DST) and Dialogue Policy (DP) generation.
The pipeline approach can model and optimize each part separately, but in multi-turn dialogue systems errors tend to propagate and accumulate, degrading overall performance. The end-to-end approach reduces the number of modules, simplifies the model structure, and facilitates global optimization. Because in a task-oriented dialogue system users can rarely express their needs fully and clearly in a single turn, they need multiple dialogue turns to state their needs gradually.
One existing technique is early dialogue tracking technology, which mainly relies on hand-crafted rules. Dialogue management based on finite state machines, slot filling, or information-state updates all requires manually defining a large number of rules. Rule-based methods must tailor rules to each task scenario, and there is no guarantee that all possible situations and dialogue rules can be enumerated. In general, dialogue state tracking based on rule templates is only suitable for simple tasks; it does not scale to complex tasks, and the rules cannot be reused when the task changes.
Given the limitations of rule-based methods, researchers have largely turned to data-driven approaches, yielding generative and discriminative methods, which constitute the second existing technique. These have the following shortcomings: (1) features of the dialogue history are not fully extracted. In deep learning, a Recurrent Neural Network (RNN) can model the time series of a dialogue and extract features of the dialogue history, but RNNs suffer from vanishing and exploding gradients. Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU) mitigate long-range dependency through gating mechanisms, yet they capture only forward context and ignore backward context. (2) In multi-domain scenarios, domain-slot pair information is not sufficiently shared.
Therefore, how to solve the long-range dependency problem of dialogue histories in multi-domain scenarios without discarding forward or backward context, and how to share information across different domain-slot pairs, are pressing problems in this field.
Summary of the Invention
In view of this, the object of the present invention is to provide a dialogue state tracking method, apparatus, device, and medium that solve the long-range dependency problem of dialogue histories in multi-domain scenarios without discarding forward or backward context, and that enable information sharing across different domain-slot pairs. The specific scheme is as follows:
In a first aspect, the present application discloses a dialogue state tracking method, including:
encoding the historical dialogue records with a first BiGRU neural network to obtain a first encoding result, and performing domain-level feature extraction on the first encoding result based on an attention mechanism and preset domain vectors, to obtain a first feature extraction result covering at least one domain;
encoding the first feature extraction result with a second BiGRU neural network to obtain a second encoding result, and performing slot-level feature extraction on the second encoding result based on an attention mechanism and preset slot word vectors, to obtain a second feature extraction result containing the slots;
inputting the second feature extraction result, preset domain-slot word vectors, and a preset vocabulary dictionary vector into a decoder to obtain a decoded vector, computing the predicted probability distribution of the slot values to be filled based on the decoded vector, and then filling the slots under the at least one domain according to that predicted probability distribution, so as to realize dialogue state tracking; wherein each preset domain-slot word vector is formed by concatenating a preset domain vector and a preset slot word vector.
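The claimed method can be summarized as the following pseudocode sketch; all function and variable names here are illustrative stand-ins, not identifiers from the patent:

```
# Pseudocode overview of the claimed pipeline (names are hypothetical):
track_dialog_state(history, domain_vecs, slot_vecs, vocab_vec):
    h1         = BiGRU_1(history)                  # first encoding result
    dom_feats  = attention(h1, domain_vecs)        # domain-level extraction
    h2         = BiGRU_2(dom_feats)                # second encoding result
    slot_feats = attention(h2, slot_vecs)          # slot-level extraction
    decoded    = decoder(slot_feats,
                         concat(domain_vecs, slot_vecs),
                         vocab_vec)                # decoded vector
    return predict_slot_distribution(decoded)      # fill slots per domain
```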
Optionally, before encoding the historical dialogue records with the first BiGRU neural network to obtain the first encoding result, the method further includes:
acquiring several turns of dialogue records and concatenating them to obtain the historical dialogue records.
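This preprocessing step can be sketched as follows; the function name and whitespace-joined representation are illustrative assumptions, not details from the patent:

```python
# Hypothetical sketch: T turns of (user, system) utterances are spliced
# in order into one history x = [u1, s1, u2, s2, ..., uT, sT].
def splice_history(turns):
    """turns: list of (user_utterance, system_utterance) pairs."""
    history = []
    for user_utt, sys_utt in turns:
        history.append(user_utt)
        history.append(sys_utt)
    return " ".join(history)

turns = [("book a taxi to the station", "when do you want to leave?"),
         ("at 8 am", "your taxi is booked")]
print(splice_history(turns))
```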
Optionally, after inputting the second feature extraction result, the preset domain-slot word vectors, and the preset vocabulary dictionary vector into the decoder to obtain the decoded vector, the method further includes:
mapping the decoded vector to a target probability distribution based on a dialogue state tracking strategy;
when the target probability distribution is a first probability distribution, the user has not mentioned the slots under the at least one domain; when it is a second probability distribution, the user is indifferent ("don't care") about the slots under the at least one domain; and when it is a third probability distribution, the user has mentioned the slots under the at least one domain.
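A minimal sketch of such a three-way mapping, assuming a linear layer followed by softmax (the patent does not specify the layers; `W_gate` and the random toy values are hypothetical):

```python
import numpy as np

# Illustrative three-way "slot gate": the decoded vector is mapped to a
# distribution over {not-mentioned, dontcare, mentioned}.
rng = np.random.default_rng(0)
d = 8
decoded = rng.standard_normal(d)        # stand-in for the decoded vector
W_gate = rng.standard_normal((3, d))    # hypothetical trainable parameter

logits = W_gate @ decoded
gate = np.exp(logits - logits.max())
gate /= gate.sum()                      # softmax over the three outcomes

labels = ["not-mentioned", "dontcare", "mentioned"]
print(labels[int(gate.argmax())], float(gate.sum()))
```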
Optionally, computing the predicted probability distribution of the slot values to be filled based on the decoded vector includes:
when the target probability distribution is the third probability distribution, computing, based on the decoded vector, the probability distribution over slot values in the preset vocabulary dictionary and the probability distribution over slot values in the historical dialogue records;
computing the predicted probability distribution of the slot values to be filled from the probability distribution over slot values in the preset vocabulary dictionary and the probability distribution over slot values in the historical dialogue records.
Optionally, the formulas for computing, based on the decoded vector, the probability distribution over slot values in the preset vocabulary dictionary and the probability distribution over slot values in the historical dialogue records are:
p_vocab = Softmax(V · H_decode);
p_history = Softmax(h · H_decode);
where V denotes the preset vocabulary dictionary vector, H_decode denotes the decoded vector, h denotes the first encoding result, p_vocab denotes the probability distribution over slot values in the preset vocabulary dictionary, and p_history denotes the probability distribution over slot values in the historical dialogue records.
Optionally, the formula for computing the predicted probability distribution of the slot values to be filled from the probability distribution over slot values in the preset vocabulary dictionary and the probability distribution over slot values in the historical dialogue records is:
p_value = p_gen × p_vocab + (1 − p_gen) × p_history;
where p_gen denotes the weight of generating the slot value to be filled from the preset vocabulary dictionary.
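The copy/generate mixture above can be sketched numerically; the toy dimensions and random vectors are illustrative assumptions (in the real model, history-token probabilities would be scattered onto their vocabulary positions, so here the two distributions are simply given the same length):

```python
import numpy as np

# Minimal sketch of p_value = p_gen * p_vocab + (1 - p_gen) * p_history.
rng = np.random.default_rng(1)
d, vocab_size, hist_len = 4, 6, 6   # hist_len == vocab_size for simplicity

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

H_decode = rng.standard_normal(d)             # decoded vector
V = rng.standard_normal((vocab_size, d))      # vocabulary dictionary vectors
h = rng.standard_normal((hist_len, d))        # encoder hidden states

p_vocab = softmax(V @ H_decode)
p_history = softmax(h @ H_decode)
p_gen = 0.7                                   # generate-vs-copy weight

p_value = p_gen * p_vocab + (1 - p_gen) * p_history
print(round(float(p_value.sum()), 6))         # prints 1.0 (a valid distribution)
```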
Optionally, inputting the second feature extraction result, the preset domain-slot word vectors, and the preset vocabulary dictionary vector into the decoder to obtain the decoded vector includes:
inputting the second feature extraction result, the preset domain-slot word vectors, and the preset vocabulary dictionary vector into a decoder built from a third BiGRU neural network, to obtain the decoded vector.
In a second aspect, the present application discloses a dialogue state tracking apparatus, including:
a first encoding module, configured to encode the historical dialogue records with a first BiGRU neural network to obtain a first encoding result;
a domain-level feature extraction module, configured to perform domain-level feature extraction on the first encoding result based on an attention mechanism and preset domain vectors, to obtain a first feature extraction result covering at least one domain;
a second encoding module, configured to encode the first feature extraction result with a second BiGRU neural network to obtain a second encoding result;
a slot-level feature extraction module, configured to perform slot-level feature extraction on the second encoding result based on an attention mechanism and preset slot word vectors, to obtain a second feature extraction result containing the slots;
a decoding module, configured to input the second feature extraction result, the preset domain-slot word vectors, and the preset vocabulary dictionary vector into a decoder to obtain a decoded vector;
a dialogue state tracking module, configured to compute the predicted probability distribution of the slot values to be filled based on the decoded vector, and then fill the slots under the at least one domain according to that distribution, so as to realize dialogue state tracking; wherein each preset domain-slot word vector is formed by concatenating a preset domain vector and a preset slot word vector.
In a third aspect, the present application discloses an electronic device, including:
a memory for storing a computer program; and
a processor for executing the computer program to implement the dialogue state tracking method disclosed above.
In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the dialogue state tracking method disclosed above.
As can be seen, the present application proposes a dialogue state tracking method: a first BiGRU neural network encodes the historical dialogue records into a first encoding result, on which domain-level features are extracted using attention and preset domain vectors, yielding a first feature extraction result covering at least one domain; a second BiGRU neural network encodes the first feature extraction result into a second encoding result, on which slot-level features are extracted using attention and preset slot word vectors, yielding a second feature extraction result containing the slots; the second feature extraction result, preset domain-slot word vectors, and a preset vocabulary dictionary vector are fed into a decoder to obtain a decoded vector, from which the predicted probability distribution of the slot values to be filled is computed, and the slots under the at least one domain are filled accordingly, realizing dialogue state tracking; each preset domain-slot word vector is formed by concatenating a preset domain vector and a preset slot word vector. In summary, because a BiGRU (bidirectional gated recurrent unit) network extracts context with both forward and backward GRU structures, the application solves the long-range dependency problem of multi-domain dialogue histories without discarding forward or backward context. Moreover, traditional dialogue state tracking extracts features from each domain-slot pair as a whole, so different slots within the same domain, and identical slots across different domains, remain unrelated; the present application instead processes domains and slots separately and applies attention to each in turn, so that the information extracted for different domain-slot pairs can be shared.
Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the drawings required for describing the embodiments or the prior art. Apparently, the drawings described below show only embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a dialogue state tracking method disclosed in the present application;
Fig. 2 is a flowchart of a specific dialogue state tracking method disclosed in the present application;
Fig. 3 is a schematic structural diagram of a dialogue state tracking apparatus disclosed in the present application;
Fig. 4 is a structural diagram of an electronic device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Generative and discriminative methods have the following shortcomings: (1) features of the dialogue history are not fully extracted. In deep learning, recurrent neural networks can model the time series of a dialogue and extract features of the dialogue history, but RNNs suffer from vanishing and exploding gradients; long short-term memory networks and gated recurrent units mitigate long-range dependency through gating, yet they capture only forward context and ignore backward context. (2) In multi-domain scenarios, domain-slot pair information is not sufficiently shared.
To this end, the embodiments of the present application propose a dialogue state tracking scheme that solves the long-range dependency problem of dialogue histories in multi-domain scenarios without discarding forward or backward context, and that enables information sharing across different domain-slot pairs.
An embodiment of the present application discloses a dialogue state tracking method. As shown in Fig. 1, the method includes:
Step S11: encode the historical dialogue records with a first BiGRU neural network to obtain a first encoding result, and perform domain-level feature extraction on the first encoding result based on an attention mechanism and preset domain vectors, to obtain a first feature extraction result covering at least one domain.
It should be noted that multi-turn dialogue tasks often contain meaningless exchanges and topic shifts, which blur domain discrimination in multi-domain scenarios and let contexts interfere with one another. To capture the dialogue history of each domain so that the model focuses on information from that domain without interference from other domains, the present application adopts a domain attention mechanism to overcome this weakness of traditional techniques. Because attention can ignore the distance between elements of the input sequence and directly capture the key information for the task, the model can better attend to each domain's own information without interference from other domains. The specific process is as follows:
In this embodiment, several turns of dialogue records are first acquired and concatenated to obtain the historical dialogue records. Specifically, the T turns of the dialogue D_T are spliced in order into x = {u_1, s_1, u_2, s_2, ..., u_T, s_T}, where u_t denotes the user utterance at time t and s_t denotes the system utterance at time t. The first BiGRU neural network then encodes x = {u_1, s_1, u_2, s_2, ..., u_T, s_T} into the first encoding result, denoted h = {h_1, h_2, ..., h_T}, which is the hidden-layer output of the BiGRU neural network; this encoding serves the subsequent extraction of domain-level features. The BiGRU neural network extracts context with forward and backward GRU structures and thus fully captures the semantic information contained in the dialogue utterances. After the concatenated dialogue records are encoded, the domain words in the preset domain word database are encoded to obtain the preset domain vectors; specifically, a particular domain word d_i in the preset domain word database is encoded into a domain vector e_{d_i}. Attention is then computed between the encoded dialogue records and the encoded domain vectors to complete the domain feature extraction. This computation can be expressed by the following formulas:
y_i = (e_{d_i})^T h;
y_i^softmax = Softmax(y_i);
y_i^context = y_i^softmax · h;
where e_{d_i} denotes the encoded vector of domain word d_i; y_i denotes the correlation between the i-th domain in the preset domain word vectors and the historical dialogue records; y_i^softmax denotes the normalized y_i, i.e. the weight of the historical dialogue records for each domain i; y_i^context denotes the history vector h weighted by the domain weights (the history vector is the context vector, i.e. the first encoding result obtained by encoding the historical dialogue records); and T denotes transposition.
Since y_i^context contains the context information of the entire historical dialogue record, the present application uses y_i^context as the representation of the specific domain d_i. The attention mechanism thus computes the context and extracts the hidden representation output by the neural network by assigning weights; in this embodiment, the first feature extraction result is y_i^context.
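The domain-level attention step can be sketched as follows; the dimensions and random values are illustrative assumptions, not parameters from the patent:

```python
import numpy as np

# Sketch of the domain attention: score each hidden state h_t against an
# encoded domain vector e_d, softmax-normalize, then take the weighted sum
# of the history as the domain-specific context.
rng = np.random.default_rng(2)
T_steps, d = 5, 4
h = rng.standard_normal((T_steps, d))   # BiGRU hidden states h_1..h_T
e_d = rng.standard_normal(d)            # encoded domain word vector

scores = h @ e_d                        # y_i: correlation per time step
weights = np.exp(scores - scores.max())
weights /= weights.sum()                # y_i^softmax
context = weights @ h                   # y_i^context: domain-weighted history

print(context.shape, round(float(weights.sum()), 6))
```

The slot-level attention in the next step has the same structure, with slot word vectors in place of the domain vector and the second encoding result in place of h.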
Step S12: encode the first feature extraction result with a second BiGRU neural network to obtain a second encoding result, and perform slot-level feature extraction on the second encoding result based on an attention mechanism and preset slot word vectors, to obtain a second feature extraction result containing the slots.
In this embodiment, after the first feature extraction result carrying the key domain information is obtained, the second BiGRU neural network encodes it into the second encoding result, denoted h'; this encoding serves the subsequent extraction of slot-level features. After the first feature extraction result is encoded, the slot words in the preset slot word database are encoded to obtain the preset slot word vectors; specifically, a particular slot word s_j in the preset slot word database is encoded into a slot word vector e_{s_j}. Attention is then computed on the second encoding result and the encoded slot word vectors to complete the slot word feature extraction. This computation can be expressed by the following formulas:
z_j = (e_{s_j})^T h';
z_j^softmax = Softmax(z_j);
z_j^context = z_j^softmax · h';
where e_{s_j} denotes the encoded vector of slot word s_j; z_j denotes the correlation between the j-th slot in the preset slot word vectors and the historical dialogue records; z_j^softmax denotes the normalized z_j, i.e. the weight of the historical dialogue records for each slot word; and z_j^context denotes the history vector h' weighted by the slot word weights.
Following the treatment of the domain features: since z_j^context contains the context information of the entire dialogue history, the present application uses z_j^context as the representation of the specific slot s_j. The attention mechanism then computes the context and extracts the hidden representation output by the neural network by assigning weights; in this embodiment, the second feature extraction result is z_j^context. In this way, the application can give prominence to the slot-related information in the dialogue history.
It should be pointed out that in dialogue state tracking, the traditional practice is to encode each domain-slot pair as a whole, so that different slots within the same domain are treated as unrelated, as are identical slots across different domains. In multi-domain dialogue, however, the domain-slot pairs are not completely independent; for example, a location slot appears in both the taxi and the food-ordering domains. Moreover, in a multi-domain dataset the amount of data per domain is often unbalanced: some domains may have too little data for the model to train adequately, i.e., the traditional practice does not sufficiently share information across domain-slot pairs. Compared with the traditional domain-slot-pair processing, this application processes domains and slots separately and applies attention to the domains and then to the slots, thereby achieving information sharing across different domain-slot pairs. For example, suppose there are domains A and B, a certain slot under domain A takes the values A1 and A2, and the same slot under domain B takes the values B1 and B2. The traditional practice encodes the domain-slot pairs as wholes, i.e., A-A1, A-A2, B-B1, B-B2, so the encoding leaves no relationship between the same slot under domain A and under domain B; but in multi-domain dialogue the domain-slot pairs are not completely independent. This application therefore processes domains and slots separately and applies attention in turn: first attention over the domains, extracting domains A and B, then attention over the slots, after which the candidate slot values to be filled may be A1, A2, B1, B2. The slot under either domain can then take not only A1 and A2 but also B1 and B2, and in this way information sharing across different domain-slot pairs is realized.
Step S13: input the second feature extraction result, the preset domain-slot word vector, and the preset vocabulary-dictionary vector into the decoder to obtain a decoded vector; compute the predicted probability distribution of the slot values to be filled based on the decoded vector; and then fill the slots under the at least one domain according to that predicted probability distribution, so as to realize dialogue state tracking. The preset domain-slot word vector is formed by concatenating the preset domain vector and the preset slot-word vector.
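The concatenation in step S13 is straightforward; a hedged numpy sketch follows, where the four-dimensional vectors are placeholder values rather than the actual embeddings:

```python
import numpy as np

# Preset domain vector and preset slot-word vector (placeholder values).
domain_vec = np.array([0.1, 0.2, 0.3, 0.4])
slot_vec = np.array([0.5, 0.6, 0.7, 0.8])

# The preset domain-slot word vector is their concatenation.
domain_slot_vec = np.concatenate([domain_vec, slot_vec])
assert domain_slot_vec.shape == (8,)
```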
Specifically, the second feature extraction result, the preset domain-slot word vector, and the preset vocabulary-dictionary vector are input into a decoder built from a third BiGRU neural network to obtain the decoded vector.
In this embodiment, after the second feature extraction result, the preset domain-slot word vector, and the preset vocabulary-dictionary vector are input into the decoder to obtain the decoded vector, the method further includes: mapping the decoded vector to a target probability distribution according to a dialogue-state-tracking strategy. When the target probability distribution is the first probability distribution, the user has not mentioned the slot under the at least one domain; when it is the second probability distribution, the user is indifferent to the slot; and when it is the third probability distribution, the user has mentioned the slot. The dialogue-state-tracking strategy is specifically the current mainstream DST (Dialogue State Tracking) update strategy: a classifier maps the decoded vector onto a probability distribution over NONE, DONTCARE, and MENTIONED, where mapping to NONE (the first probability distribution) means the user has not mentioned the slot under the at least one domain, mapping to DONTCARE (the second probability distribution) means the user is indifferent to the slot, and mapping to MENTIONED (the third probability distribution) means the user has mentioned the slot.
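The NONE/DONTCARE/MENTIONED mapping can be pictured as a small classifier head on the decoded vector. The sketch below is an illustration only (random placeholder weights, a plain linear-plus-softmax head); the patent does not specify the classifier's internals:

```python
import numpy as np

LABELS = ["NONE", "DONTCARE", "MENTIONED"]  # first/second/third probability distributions

def slot_gate(h_decode, W, b):
    # Linear layer followed by softmax over the three DST update labels.
    logits = W @ h_decode + b
    e = np.exp(logits - logits.max())
    probs = e / e.sum()              # the target probability distribution
    return LABELS[int(np.argmax(probs))], probs

rng = np.random.default_rng(1)
d = 8
W, b = rng.normal(size=(3, d)), np.zeros(3)   # placeholder (untrained) parameters
label, probs = slot_gate(rng.normal(size=d), W, b)

assert label in LABELS               # only MENTIONED slots proceed to value prediction
assert np.isclose(probs.sum(), 1.0)
```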
In this embodiment, because the user's wording in the dialogue history may be ambiguous, the slot values to be filled cannot be computed directly from the user's utterances; this application therefore introduces a preset vocabulary dictionary containing a number of slot values. Further, since the vocabulary dictionary cannot cover all slot values, this application computes the probability distribution of the slot values to be filled jointly from the vocabulary dictionary and the dialogue history. Specifically, when the decoded vector maps to the third probability distribution, the probability distribution of slot values in the preset vocabulary dictionary and the probability distribution of slot values in the dialogue history are computed from the decoded vector, and the predicted probability distribution of the slot values to be filled is then computed from those two distributions. The formulas for computing the probability distribution of slot values in the preset vocabulary dictionary and in the dialogue history from the decoded vector are: p_vocab = softmax(V · H_decode); p_history = softmax(h · H_decode);
where V denotes the preset vocabulary-dictionary vector, H_decode denotes the decoded vector, h denotes the first encoding result, p_vocab denotes the probability distribution of slot values in the preset vocabulary dictionary, and p_history denotes the probability distribution of slot values in the dialogue history.
The formula for computing the predicted probability distribution of the slot values to be filled from the probability distribution of slot values in the preset vocabulary dictionary and that in the dialogue history is:
p_value = p_gen × p_vocab + (1 − p_gen) × p_history;
where p_gen denotes the weight of generating the slot value to be filled from the preset vocabulary dictionary.
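The mixture above is easy to check numerically. Here is a small sketch of p_value = p_gen × p_vocab + (1 − p_gen) × p_history with made-up three-value distributions; the real distributions range over the whole vocabulary dictionary and the history tokens:

```python
import numpy as np

def final_value_distribution(p_vocab, p_history, p_gen):
    # Generation gate p_gen mixes the dictionary distribution with the
    # distribution copied from the dialogue history.
    return p_gen * p_vocab + (1.0 - p_gen) * p_history

p_vocab = np.array([0.7, 0.2, 0.1])    # from the preset vocabulary dictionary
p_history = np.array([0.1, 0.1, 0.8])  # from the dialogue history
p_value = final_value_distribution(p_vocab, p_history, p_gen=0.25)

assert np.isclose(p_value.sum(), 1.0)  # a convex mixture of distributions is a distribution
assert int(np.argmax(p_value)) == 2    # with a low p_gen, the history term dominates here
```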
In this way, the present application determines the corresponding slot value from the predicted probability distribution of the slot values to be filled and uses it to fill the slots under the at least one domain.
It can be seen that this application proposes a dialogue state tracking method, including: encoding the dialogue history with a first BiGRU neural network to obtain a first encoding result, and performing domain-level feature extraction on the first encoding result based on an attention mechanism and preset domain vectors to obtain a first feature extraction result containing at least one domain; encoding the first feature extraction result with a second BiGRU neural network to obtain a second encoding result, and performing slot-level feature extraction on the second encoding result based on the attention mechanism and preset slot-word vectors to obtain a second feature extraction result containing slots; and inputting the second feature extraction result, the preset domain-slot word vector, and the preset vocabulary-dictionary vector into the decoder to obtain a decoded vector, computing the predicted probability distribution of the slot values to be filled based on the decoded vector, and then filling the slots under the at least one domain based on that distribution so as to realize dialogue state tracking, where the preset domain-slot word vector is formed by concatenating the preset domain vector and the preset slot-word vector. In summary, since a BiGRU (bidirectional gated recurrent unit) neural network can extract context with forward and backward gated recurrent unit (GRU) structures, this application, built on BiGRU networks, resolves the long-range dependency problem of dialogue history in multi-domain scenarios without discarding forward or backward context. Moreover, traditional dialogue state tracking extracts features from domain-slot pairs as wholes, so that different slots within the same domain, and the same slot across domains, are treated as unrelated; this application instead processes domains and slots separately and applies attention to each in turn, so that information can be shared among the extracted domain-slot pairs.
Fig. 2 is a flowchart of a specific dialogue state tracking method disclosed in this application. As shown in Fig. 2, this application (1) first encodes the concatenated dialogue history with the first BiGRU neural network to obtain the first encoding result; (2) performs domain-level feature extraction on the first encoding result based on the attention mechanism and the preset domain vectors to obtain the first feature extraction result; (3) encodes the first feature extraction result with the second BiGRU network structure to obtain the second encoding result; (4) performs slot-level feature extraction on the second encoding result based on the attention mechanism and the preset slot-word vectors to obtain the second feature extraction result; (5) inputs the second feature extraction result, the preset vocabulary-dictionary vector, and the domain-slot word vector into the decoder, which adopts a third BiGRU network structure, to obtain the decoded vector; and (6) computes the probabilities of slot values in the preset vocabulary dictionary and in the dialogue history, computes the final predicted probability distribution of the slot values from those probabilities together with the decoded vector, determines the corresponding slot value from the final distribution, and then completes the filling.
It can be seen that, since a bidirectional gated unit can extract context with forward and backward gated recurrent unit structures, this application, built on the BiGRU neural network, resolves the long-range dependency problem of dialogue history in multi-domain scenarios without discarding forward or backward context; and by processing domains and slots separately and applying attention to each in turn, it realizes information sharing across different domain-slot pairs.
Correspondingly, an embodiment of this application also discloses a dialogue state tracking device. As shown in Fig. 3, the device includes:
a first encoding module 11, configured to encode the dialogue history with the first BiGRU neural network to obtain the first encoding result;
a domain-level feature extraction module 12, configured to perform domain-level feature extraction on the first encoding result based on the attention mechanism and the preset domain vectors, obtaining a first feature extraction result containing at least one domain;
a second encoding module 13, configured to encode the first feature extraction result with the second BiGRU neural network to obtain the second encoding result;
a slot-level feature extraction module 14, configured to perform slot-level feature extraction on the second encoding result based on the attention mechanism and the preset slot-word vectors, obtaining a second feature extraction result containing slots;
a decoding module 15, configured to input the second feature extraction result, the preset domain-slot word vector, and the preset vocabulary-dictionary vector into the decoder to obtain the decoded vector; and
a dialogue state tracking module 16, configured to compute the predicted probability distribution of the slot values to be filled based on the decoded vector and then fill the slots under the at least one domain based on that distribution, so as to realize dialogue state tracking; where the preset domain-slot word vector is formed by concatenating the preset domain vector and the preset slot-word vector.
For the more specific working process of each module above, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
It can be seen that this application proposes a dialogue state tracking method, including: encoding the dialogue history with a first BiGRU neural network to obtain a first encoding result, and performing domain-level feature extraction on the first encoding result based on an attention mechanism and preset domain vectors to obtain a first feature extraction result containing at least one domain; encoding the first feature extraction result with a second BiGRU neural network to obtain a second encoding result, and performing slot-level feature extraction on the second encoding result based on the attention mechanism and preset slot-word vectors to obtain a second feature extraction result containing slots; and inputting the second feature extraction result, the preset domain-slot word vector, and the preset vocabulary-dictionary vector into the decoder to obtain a decoded vector, computing the predicted probability distribution of the slot values to be filled based on the decoded vector, and then filling the slots under the at least one domain based on that distribution so as to realize dialogue state tracking, where the preset domain-slot word vector is formed by concatenating the preset domain vector and the preset slot-word vector. In summary, since a BiGRU (bidirectional gated recurrent unit) neural network can extract context with forward and backward gated recurrent unit (GRU) structures, this application, built on BiGRU networks, resolves the long-range dependency problem of dialogue history in multi-domain scenarios without discarding forward or backward context. Moreover, traditional dialogue state tracking extracts features from domain-slot pairs as wholes, so that different slots within the same domain, and the same slot across domains, are treated as unrelated; this application instead processes domains and slots separately and applies attention to each in turn, so that information can be shared among the extracted domain-slot pairs.
Further, an embodiment of this application also provides an electronic device. Fig. 4 is a structural diagram of an electronic device 20 according to an exemplary embodiment; nothing in the figure should be taken as limiting the scope of application of this application in any way.
Fig. 4 is a schematic structural diagram of an electronic device 20 provided by an embodiment of this application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a display screen 23, an input/output interface 24, a communication interface 25, a power supply 26, and a communication bus 27. The memory 22 stores a computer program that is loaded and executed by the processor 21 to implement the relevant steps of the dialogue state tracking method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in this embodiment may specifically be an electronic computer.
In this embodiment, the power supply 26 provides the operating voltage for the hardware devices on the electronic device 20; the communication interface 25 creates data-transmission channels between the electronic device 20 and external devices, following any communication protocol applicable to the technical solution of this application, which is not specifically limited here; and the input/output interface 24 obtains data from, or outputs data to, the outside world, its specific interface type being selectable according to the needs of the specific application and likewise not specifically limited here.
In addition, the memory 22, as a carrier for resource storage, may be a read-only memory, a random-access memory, a magnetic disk, an optical disc, or the like; the resources stored on it may include a computer program 221, and the storage may be temporary or permanent. Besides a computer program capable of performing the dialogue state tracking method executed by the electronic device 20 disclosed in any of the foregoing embodiments, the computer program 221 may further include computer programs for completing other specific tasks.
Further, an embodiment of this application also discloses a computer-readable storage medium for storing a computer program which, when executed by a processor, implements the dialogue state tracking method disclosed above.
For the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
The embodiments in this application are described in a progressive manner; each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant parts may be found in the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate this interchangeability of hardware and software clearly, the composition and steps of each example have been described above generally in terms of function. Whether these functions are carried out in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled practitioners may implement the described functions differently for each particular application, but such implementations should not be considered beyond the scope of this application.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in software modules executed by a processor, or in a combination of the two. A software module may reside in random-access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprises", "includes", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Absent further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises it.
The dialogue state tracking method, device, equipment, and storage medium provided by this application have been introduced in detail above. Specific examples have been used herein to explain the principles and implementations of this application, and the description of the above embodiments is only meant to help in understanding the method of this application and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application in accordance with the idea of this application. In summary, the content of this specification should not be understood as limiting this application.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211286338.9A CN115617997A (en) | 2022-10-20 | 2022-10-20 | A dialog state tracking method, device, equipment and medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115617997A true CN115617997A (en) | 2023-01-17 |
Family
ID=84864058
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202211286338.9A Pending CN115617997A (en) | 2022-10-20 | 2022-10-20 | A dialog state tracking method, device, equipment and medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115617997A (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | CB02 | Change of applicant information | |
Country or region after: China Address after: No. 9 Mozhou East Road, Nanjing City, Jiangsu Province, 211111 Applicant after: Zijinshan Laboratory Address before: No. 9 Mozhou East Road, Jiangning Economic Development Zone, Jiangning District, Nanjing City, Jiangsu Province Applicant before: Purple Mountain Laboratories Country or region before: China |