CN114254085A - Semantic annotation method and device and vehicle-mounted terminal equipment - Google Patents

Info

Publication number
CN114254085A
Authority
CN
China
Prior art keywords
corpus
semantic
labeled
labeling
annotation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011017386.9A
Other languages
Chinese (zh)
Inventor
张文瑜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Volkswagen Mobvoi Beijing Information Technology Co Ltd
Original Assignee
Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority to CN202011017386.9A
Publication of CN114254085A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the invention disclose a semantic annotation method, a semantic annotation device, and vehicle-mounted terminal equipment. The method comprises: obtaining a corpus to be annotated; marking the corpus to be annotated according to its optional annotation count, that is, the number of semantic annotation results it could plausibly receive; and selecting, according to the marking result, a matching semantic annotation strategy to perform semantic annotation on the corpus to be annotated. Because different semantic annotation strategies are applied according to the complexity of the corpus to be annotated, the human resource cost of semantic annotation is reduced and the efficiency of semantic annotation is improved.

Description

Semantic annotation method and device and vehicle-mounted terminal equipment

Technical field

Embodiments of the present invention relate to the technical field of information processing, and in particular to a semantic annotation method, a semantic annotation device, and vehicle-mounted terminal equipment.

Background

In the field of intelligent dialogue, NLU (Natural Language Understanding) semantic understanding usually refers to representing unstructured text information as structured semantics. The structured semantics comprise a domain, an intent, and attribute slots (slots).
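As an illustration of the structured representation described above, the following is a minimal sketch of one way to hold a domain, an intent, and slots in Python. The class name `SemanticAnnotation`, the intent name `navigate_to`, and the slot name `destination` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SemanticAnnotation:
    """Structured semantics of one utterance: domain, intent, and slots."""
    domain: str                                          # e.g. "navigation"
    intent: str                                          # operation within the domain
    slots: Dict[str, str] = field(default_factory=dict)  # attribute name -> value


# Hypothetical annotation of the utterance "I want to navigate to XX Square".
# The intent and slot names are illustrative assumptions.
example = SemanticAnnotation(
    domain="navigation",
    intent="navigate_to",
    slots={"destination": "XX Square"},
)
```

The unstructured sentence is thereby reduced to three fields that downstream dialogue components can read directly.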

At present, NLU semantic annotation is performed either manually or by machine. Because corpora differ in scenario complexity, the complexity of their semantic annotation also differs, which may lead to inaccurate semantic annotation. How to improve the quality of NLU semantic annotation while reducing the cost of manual annotation is therefore a problem to be solved urgently.

Summary of the invention

Embodiments of the present invention provide a semantic annotation method, a semantic annotation device, and vehicle-mounted terminal equipment, so as to improve the efficiency of NLU semantic annotation and reduce the cost of manual annotation.

In a first aspect, an embodiment of the present invention provides a semantic annotation method, comprising:

obtaining a corpus to be annotated;

marking the corpus to be annotated according to the optional annotation count of the corpus to be annotated;

selecting, according to the marking result, a matching semantic annotation strategy to perform semantic annotation on the corpus to be annotated.

In a second aspect, an embodiment of the present invention further provides a semantic annotation device, comprising:

a corpus acquisition module, configured to obtain a corpus to be annotated;

a corpus marking module, configured to mark the corpus to be annotated according to the optional annotation count of the corpus to be annotated;

a semantic annotation module, configured to select, according to the marking result, a matching semantic annotation strategy to perform semantic annotation on the corpus to be annotated.

In a third aspect, an embodiment of the present invention further provides vehicle-mounted terminal equipment, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the semantic annotation method according to any embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the semantic annotation method according to any embodiment of the present invention.

In the technical solution provided by the embodiments of the present invention, when a corpus to be annotated is obtained, it is marked according to its optional annotation count, and a semantic annotation strategy matching the marking result is then selected to perform semantic annotation on it. Because different semantic annotation strategies are applied according to the optional annotation count of the corpus to be annotated, the human resource cost of semantic annotation is reduced and the efficiency of semantic annotation is improved.

Description of the drawings

FIG. 1 is a flowchart of a semantic annotation method in Embodiment 1 of the present invention;

FIG. 2 is a flowchart of a semantic annotation method in Embodiment 2 of the present invention;

FIG. 3 is a flowchart of a semantic annotation method in Embodiment 3 of the present invention;

FIG. 4 is a schematic structural diagram of a semantic annotation device in Embodiment 4 of the present invention;

FIG. 5 is a schematic diagram of the hardware structure of vehicle-mounted terminal equipment in Embodiment 5 of the present invention.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.

Before the exemplary embodiments are discussed in greater detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as sequential processing, many of the operations may be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be rearranged. A process may be terminated when its operations are completed, but it may also have additional steps not included in the drawings. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.

Embodiment 1

FIG. 1 is a flowchart of a semantic annotation method provided in Embodiment 1 of the present invention. The method is applicable to cases in which different semantic annotation strategies are used depending on how difficult the corpus to be annotated is to annotate semantically. It may be executed by the semantic annotation device provided in the embodiments of the present invention; the device may be implemented in software and/or hardware and can generally be integrated in vehicle-mounted terminal equipment.

As shown in FIG. 1, the method of this embodiment specifically comprises:

S110. Obtain a corpus to be annotated.

A corpus refers to language material, for example "I want to navigate to XX Square" or "I want to see the nearest KFC".

Annotation refers to processing a corpus by attaching annotation information that represents features of the language material to the corresponding language components, so that a computer can recognize and read it. For example, an NLU model may be used to annotate a corpus with its domain, intent, and slot information.

A corpus to be annotated is a corpus that needs to be annotated. Optionally, the corpus to be annotated is a corpus used in a dialogue scenario that needs to be annotated.

S120. Mark the corpus to be annotated according to the optional annotation count of the corpus to be annotated.

The optional annotation count is the number of possible semantic annotation results. Some corpora are clearly expressed and easy to annotate; their annotation complexity is low, and usually only one definite semantic annotation result can be obtained, so the optional annotation count of such a corpus is one. Other corpora are polysemous: different people may understand the intended meaning differently, so several semantic annotation results may arise during annotation. Their annotation complexity is high, and the optional annotation count of such a corpus is greater than one.

Marking the corpus to be annotated according to the optional annotation count means classifying the corpus according to its obtained optional annotation count; a classification identifier may be used to mark the corpus so that its optional annotation count can be distinguished. For example, a corpus to be annotated whose optional annotation count is one is a simple corpus and may be marked as a first-class corpus, specifically with the identifier "0". A corpus to be annotated whose optional annotation count is greater than one is a complex corpus and may be marked as a second-class corpus, specifically with the identifier "1".

Exemplarily, marking the corpus to be annotated according to its optional annotation count comprises:

inputting the corpus to be annotated into a pre-trained corpus classification model to obtain the domain classification results of the corpus to be annotated;

if the number of domain classification results is one, marking the corpus to be annotated as a first-class corpus;

if the number of domain classification results is greater than one, marking the corpus to be annotated as a second-class corpus.

The annotation complexity of a second-class corpus is greater than that of a first-class corpus.

The corpus classification model is a model that classifies an input corpus to be annotated by domain. When a corpus to be annotated is input into the corpus classification model, the model outputs the domain classification results of that corpus.

Inputting the corpus to be annotated into the pre-trained corpus classification model yields its domain classification results. The output of the corpus classification model may contain only the domain classification results. For example, inputting the corpus to be annotated "Find the shortest route from Tianjin Station to Tianjin West Station" into the pre-trained corpus classification model yields the domain classification result "map". The output may also contain both the domain classification results and their number. For example, inputting the corpus to be annotated "Look at the scenic spots in Chongqing" yields the domain classification results "map", "navigation" and "scenic spots", together with a count of three.

After the corpus to be annotated is obtained, its domain information is analyzed and taken as its domain classification result. Because a corpus to be annotated may have a single meaning or several meanings, the number of its domain classification results may be one or more. The number of domain classification results of the corpus to be annotated is counted, and the corpus is marked according to the count.

A corpus to be annotated with exactly one domain classification result can be annotated directly and is unlikely to be ambiguous. It can be regarded as a simple corpus with low annotation complexity and may therefore be marked as a first-class corpus, specifically with the identifier "0". A corpus to be annotated with more than one domain classification result is polysemous, and different people may understand it differently. It can be regarded as a complex corpus with high annotation complexity and may therefore be marked as a second-class corpus, specifically with the identifier "1".

For example, when the corpus to be annotated is "I want to navigate to XX Square", semantic analysis determines that its domain information is "navigation". Its domain classification result is therefore "navigation", the number of domain classification results is one, and the corpus can be marked as a first-class corpus. As another example, the corpus to be annotated "I want to see the nearest KFC" can be understood as "I want to navigate to the nearest KFC", "I want to see the exact location of the nearest KFC", or "I want to see which KFCs are nearby". Semantic analysis therefore determines that its domain classification result may be "navigation", "map", or "restaurant". The number of domain classification results is greater than one, so the corpus can be marked as a second-class corpus.
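The marking rule of S120 can be sketched as follows. This is a minimal sketch rather than the patent's implementation: the pre-trained corpus classification model is replaced by a hypothetical lookup table (`stub_classifier`), and raising an error when no domain is returned is an assumption, since the text does not cover that case.

```python
from typing import Callable, List

FIRST_CLASS = "0"   # simple corpus: exactly one domain classification result
SECOND_CLASS = "1"  # complex corpus: more than one domain classification result


def mark_corpus(text: str, classify: Callable[[str], List[str]]) -> str:
    """Return the class identifier based on the number of distinct domains."""
    domains = set(classify(text))
    if not domains:
        # Assumption: the source does not say what to do with zero results.
        raise ValueError("classifier returned no domain for: " + text)
    return FIRST_CLASS if len(domains) == 1 else SECOND_CLASS


# Hypothetical stand-in for the pre-trained corpus classification model,
# using the two example utterances from the text.
_STUB_RESULTS = {
    "I want to navigate to XX Square": ["navigation"],
    "I want to see the nearest KFC": ["navigation", "map", "restaurant"],
}


def stub_classifier(text: str) -> List[str]:
    return _STUB_RESULTS[text]


simple_mark = mark_corpus("I want to navigate to XX Square", stub_classifier)
complex_mark = mark_corpus("I want to see the nearest KFC", stub_classifier)
```

Passing the classifier in as a callable keeps the marking rule independent of whichever classification model is actually used.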

S130. According to the marking result, select a matching semantic annotation strategy to perform semantic annotation on the corpus to be annotated.

A semantic annotation strategy is the specific scheme used when semantically annotating the corpus to be annotated. Corpora to be annotated with different optional annotation counts may be annotated with different semantic annotation strategies.

Optionally, the labor cost of the semantic annotation strategy applicable to an optional annotation count of one is lower than that of the strategy applicable to an optional annotation count greater than one.

A simple corpus whose optional annotation count is one may be annotated by machine, because annotating it manually would waste human resource cost and lower the efficiency of semantic annotation. A complex corpus whose optional annotation count is greater than one may be annotated manually, so as to improve the quality and accuracy of semantic annotation.

According to the marking result of the corpus to be annotated, its optional annotation count is distinguished, and the semantic annotation strategy matching that count is selected to perform semantic annotation on the corpus.

The semantic annotation of a simple corpus whose optional annotation count is one can meet the accuracy requirement without a high investment of human resources, whereas a complex corpus whose optional annotation count is greater than one is polysemous and requires a higher investment of human resources. Semantic annotation strategies can therefore be designed in a targeted way and human resources allocated sensibly, with more human resources directed to the semantic annotation of complex corpora, which reduces the overall human resource cost.
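The strategy selection of S130 amounts to a dispatch on the class identifier. The sketch below assumes the identifiers "0" and "1" from S120; the functions `machine_annotate` and `manual_annotate` are hypothetical placeholders that only tag their input, standing in for an NLU model and a human annotation queue respectively.

```python
from typing import Callable, Dict


def machine_annotate(text: str) -> str:
    # Placeholder for annotation by an NLU model (low labor cost).
    return "machine:" + text


def manual_annotate(text: str) -> str:
    # Placeholder for routing the text to human annotators (higher labor cost).
    return "manual:" + text


# Class identifier -> annotation strategy.
STRATEGIES: Dict[str, Callable[[str], str]] = {
    "0": machine_annotate,  # first-class (simple) corpus
    "1": manual_annotate,   # second-class (complex) corpus
}


def annotate(text: str, mark: str) -> str:
    """Apply the annotation strategy that matches the class identifier."""
    try:
        strategy = STRATEGIES[mark]
    except KeyError:
        raise ValueError("unknown class identifier: " + repr(mark))
    return strategy(text)
```

Keeping the mapping in a table makes it straightforward to add further classes or strategies without changing `annotate`.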

Exemplarily, selecting a matching semantic annotation strategy according to the marking result to perform semantic annotation on the corpus to be annotated comprises:

if the corpus to be annotated is marked as a first-class corpus, using a preset NLU model to semantically annotate the corpus to be annotated, obtaining the target semantic annotation result of the corpus to be annotated.

The NLU model is a natural language understanding model. It covers fields such as automatic Chinese word segmentation, part-of-speech tagging, syntactic analysis, natural language generation, information retrieval, information extraction, machine translation, and textual entailment. Through machine learning, it can process a corpus entered by a user or a speech recognition result and extract the user's dialogue intent and the information the user conveys. The output of the NLU model provides one semantic annotation result.

Optionally, the NLU model may be used to obtain a semantic annotation result for the corpus to be annotated expressed as a domain, an intent, and attribute slots. A domain refers to data or resources of the same type and the services provided around them, such as "restaurant", "hotel", "air ticket", "map", and "navigation". An intent refers to an operation on the domain data; in the air ticket domain, for example, there may be intents such as "purchase ticket" and "refund ticket". Attribute slots hold the attributes of the domain; in the air ticket domain, for example, there may be attributes such as "time", "origin", and "destination". This embodiment does not specifically limit the preset NLU model used.

In this embodiment, the preset NLU model may be used to semantically annotate a corpus to be annotated that is marked as a first-class corpus, that is, a lower-complexity corpus whose optional annotation count is one, thereby obtaining the target semantic annotation result of the corpus to be annotated.

In an optional implementation, if the corpus to be annotated is marked as a first-class corpus, the preset NLU model is used to semantically annotate it, obtaining a semantic pre-annotation result; if the semantic pre-annotation result passes manual review, it is taken as the target semantic annotation result of the corpus to be annotated.

In this implementation, the preset NLU model may be used to semantically annotate a corpus to be annotated that is marked as a first-class corpus, that is, a lower-complexity corpus whose optional annotation count is one, obtaining a semantic pre-annotation result. A quality inspector may then manually review the semantic pre-annotation result. If it is correct, it is confirmed as the target semantic annotation result of the corpus to be annotated. If it is wrong, the quality inspector may correct it manually, and the corrected semantic pre-annotation result is taken as the target semantic annotation result. Using the NLU model in place of manual semantic annotation saves considerable human resource cost and improves the efficiency of semantic annotation, while manual review of the pre-annotation results obtained from the NLU model ensures the accuracy and quality of the semantic annotation results.

For example, when the corpus to be annotated that is marked as a first-class corpus is "I want to navigate to XX Square", the preset NLU model may be used to semantically annotate it, obtaining its semantic pre-annotation result, which a quality inspector then reviews manually. If the domain information in the semantic pre-annotation result is "navigation", the result passes manual review and is confirmed as the target semantic annotation result of the corpus to be annotated. If the domain information in the semantic pre-annotation result is not "navigation" but, say, "scenic spots", the result fails manual review; the quality inspector may manually change the domain information to "navigation", and the corrected semantic pre-annotation result is taken as the target semantic annotation result of the corpus to be annotated.
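The pre-annotation and manual review flow for first-class corpora can be sketched as below. The function name `finalize_annotation`, the use of plain dictionaries, and the intent name `navigate_to` are assumptions for illustration; the reviewer's decision is modeled as an optional dictionary of corrected fields, where `None` means the pre-annotation passed review.

```python
from typing import Dict, Optional


def finalize_annotation(
    pre_annotation: Dict[str, str],
    correction: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return the target annotation after manual review.

    correction is None  -> the pre-annotation passed review and is kept as is.
    correction is a dict -> the reviewer's corrected fields override the model's.
    """
    if correction is None:
        return dict(pre_annotation)
    return {**pre_annotation, **correction}


# Hypothetical NLU output for "I want to navigate to XX Square" with a wrong domain.
pre = {"domain": "scenic spots", "intent": "navigate_to"}

# The quality inspector changes the domain to "navigation".
target = finalize_annotation(pre, {"domain": "navigation"})
```

Only the fields the reviewer touches are replaced, so a correct intent or slot produced by the model is preserved.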

Exemplarily, selecting a matching semantic annotation strategy according to the marking result to perform semantic annotation on the corpus to be annotated comprises:

if the corpus to be annotated is marked as a second-class corpus, obtaining multiple candidate semantic annotation results for the corpus to be annotated; and determining the target semantic annotation result of the corpus to be annotated from among the multiple candidate semantic annotation results.

A candidate semantic annotation result may be a semantic annotation result for the corpus to be annotated obtained by computer technology. For example, the multiple results may be obtained with corpus annotation software, determined from historical semantic annotation data, determined by big data analysis, or obtained from several annotation models or software tools at the same time. A candidate semantic annotation result may also be a semantic annotation result for the corpus to be annotated obtained by manual annotation, for example multiple results produced by several experienced annotators annotating the corpus at the same time.

When the corpus to be annotated is marked as a second-class corpus, semantic annotation is performed on a corpus with high annotation complexity. Such a corpus may have two meanings or even three, so at least four annotators may be chosen to annotate it at the same time, which avoids the situation in which all the candidate semantic annotation results obtained for the same corpus differ from one another.
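The choice of at least four annotators follows from the pigeonhole principle: if an utterance has at most three readings, four annotators cannot all give different results. The sketch below checks this exhaustively for small numbers. It assumes each annotator picks one of a fixed set of readings, which is a simplification of the scenario in the text, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def some_pair_always_agrees(num_annotators: int, num_readings: int) -> bool:
    """True if every possible assignment of readings contains a repeated reading."""
    return all(
        max(Counter(assignment).values()) >= 2
        for assignment in product(range(num_readings), repeat=num_annotators)
    )


four_on_three = some_pair_always_agrees(4, 3)   # four annotators, three readings
three_on_three = some_pair_always_agrees(3, 3)  # three annotators, three readings
```

With four annotators and three readings the check holds, while with only three annotators each could choose a different reading. Agreement between two annotators does not by itself guarantee a unique majority, which is why a tie-breaking rule is still needed.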

From the multiple candidate semantic annotation results obtained, one candidate is selected according to a set rule as the target semantic annotation result of the corpus to be annotated. For example, the target semantic annotation result may be determined from the candidates by majority rule.
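Majority rule over the candidate results can be sketched with a simple counter. The function name `majority_vote` is hypothetical, and returning `None` on a tie or an empty input is an assumption made so that the caller can apply a further tie-breaking rule.

```python
from collections import Counter
from typing import List, Optional


def majority_vote(candidates: List[str]) -> Optional[str]:
    """Return the value with strictly the most votes, or None if there is a tie."""
    counts = Counter(candidates)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # two values share the top count
    return ranked[0][0]


# Four annotators' domain choices for "I want to see the nearest KFC" (illustrative).
winner = majority_vote(["navigation", "navigation", "map", "restaurant"])
tied = majority_vote(["navigation", "navigation", "map", "map"])
```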

进一步的,在多个语义待定标注结果中,确定待标注语料的语义目标标注 结果,包括:Further, among the multiple semantic pending annotation results, determine the semantic target annotation results of the corpus to be annotated, including:

统计每个语义待定标注结果中的领域信息;如果数量占比最大的领域信息 仅存在一个,则将数量占比最大的领域信息作为待标注语料的目标领域信息, 并将与目标领域信息匹配的一条语义待定标注结果作为待标注语料的语义目标 标注结果;Count the domain information in each semantic pending annotation result; if there is only one domain information with the largest proportion, the domain information with the largest proportion will be used as the target domain information of the corpus to be annotated, and will match the target domain information. A semantic pending annotation result is used as the semantic target annotation result of the corpus to be annotated;

if more than one item of domain information has the largest share, counting the intent information in each pending semantic annotation result;

if exactly one item of intent information has the largest share, taking one pending semantic annotation result that matches both the domain information with the largest share and the intent information with the largest share as the target semantic annotation result of the corpus to be annotated;

if more than one item of intent information has the largest share, counting the attribute slot information in each pending semantic annotation result, and taking one pending semantic annotation result that matches the domain information with the largest share, the intent information with the largest share, and the attribute slot information with the largest share as the target semantic annotation result of the corpus to be annotated.

In other words, the domain information in each pending semantic annotation result is counted first. If exactly one item of domain information has the largest share, it is selected by majority rule as the target domain information of the corpus to be annotated, and one pending result that matches it becomes the target semantic annotation result. If more than one item of domain information has the largest share, the intent information in each pending result is counted next: if exactly one item of intent information has the largest share, one pending result that matches both the top domain information and the top intent information becomes the target result. If more than one item of intent information has the largest share, the attribute slot information in each pending result is counted, and one pending result that matches the top domain information, the top intent information, and the top attribute slot information becomes the target semantic annotation result of the corpus to be annotated.
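The tiered selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `Annotation` record, `top_values`, and `resolve` are hypothetical names, attribute slot information is reduced to a single string, and each tier is assumed to be counted over the results that survived the previous tier. Where the text allows any matching result, the sketch simply returns the first one.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One pending semantic annotation result (hypothetical record layout)."""
    domain: str
    intent: str
    slots: str  # attribute slot information, flattened to a string for simplicity


def top_values(values):
    """Return every value that shares the highest count."""
    counts = Counter(values)
    best = max(counts.values())
    return [value for value, count in counts.items() if count == best]


def resolve(candidates, rng=random):
    """Choose one target result by majority vote on domain, then intent, then slots."""
    pool = list(candidates)
    if not pool:
        raise ValueError("no pending annotation results")
    for field in ("domain", "intent", "slots"):
        winners = top_values([getattr(a, field) for a in pool])
        if field == "slots" and len(winners) > 1:
            # Last tier still tied: fall back to a random pick among the tied values.
            winners = [rng.choice(winners)]
        # Keep only the results whose value is among the current winners.
        pool = [a for a in pool if getattr(a, field) in winners]
        if len(winners) == 1:
            break  # a unique majority was found at this tier
    return pool[0]


# Hypothetical data: domain "A" has a unique majority, so the first tier decides.
results = [
    Annotation("A", "x", "s1"),
    Annotation("A", "y", "s2"),
    Annotation("B", "x", "s1"),
    Annotation("C", "z", "s3"),
]
print(resolve(results).domain)
```

Passing a seeded `random.Random` instance as `rng` makes the final random tie-break reproducible, which may be useful when annotation decisions need to be audited.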

Moreover, when several annotators annotate at the same time, disputes and discussions caused by the polysemy of the corpus to be annotated are avoided, which saves annotation time; obtaining multiple pending semantic annotation results at once also avoids repeated modification of the annotation result. For example, suppose an annotator labels the domain information of a corpus as A, and the quality inspector changes it to B during manual review. If the annotation accuracy later falls below the required standard, the annotator, when checking and revising, may change the domain information back to A or to C. Domain information that the quality inspector had corrected is thus made wrong again by the annotator, which leads to repeated modification of the corpus and to disagreement over the annotation result. By having several annotators annotate simultaneously, determining the result by majority rule, and letting the machine learning model freeze the finally determined result, the semantic annotation result can be determined more conveniently, and the accuracy and efficiency of semantic annotation are improved.
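The idea of freezing a determined result so that it cannot be flipped back and forth can be illustrated with a small sketch. The source does not say how freezing is implemented; the `LabelStore` class, its methods, and `FrozenLabelError` below are purely hypothetical, and the store is a plain in-memory dictionary.

```python
class FrozenLabelError(Exception):
    """Raised when a frozen label is about to be modified."""


class LabelStore:
    """In-memory store of domain labels that can be frozen once finalized."""

    def __init__(self):
        self._labels = {}
        self._frozen = set()

    def set_label(self, utterance, domain):
        """Record or overwrite a label unless it has been frozen."""
        if utterance in self._frozen:
            raise FrozenLabelError(f"label for {utterance!r} is frozen")
        self._labels[utterance] = domain

    def freeze(self, utterance):
        """Lock the current label so later edits are rejected."""
        if utterance not in self._labels:
            raise KeyError(utterance)
        self._frozen.add(utterance)

    def get_label(self, utterance):
        return self._labels[utterance]


store = LabelStore()
store.set_label("utterance-1", "A")   # annotator's label
store.set_label("utterance-1", "B")   # quality inspector's correction
store.freeze("utterance-1")           # result is final from here on
try:
    store.set_label("utterance-1", "A")  # attempt to revert is rejected
except FrozenLabelError:
    pass
print(store.get_label("utterance-1"))
```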

For a polysemous corpus to be annotated, the domain information of the multiple pending semantic annotation results is counted. If exactly one item of domain information has the largest share, the target domain information of the corpus is determined by majority rule, and one pending result that matches the target domain information is taken as the target semantic annotation result. This avoids the situation in which all the pending results differ, and it allows the annotation result to be determined and frozen quickly.

For example, when the corpus marked as the second type of corpus is "I want to find KFC", suppose the following four pending semantic annotation results are obtained:

[Table images BDA0002699515310000111 and BDA0002699515310000121: the four pending semantic annotation results for "I want to find KFC"; not reproduced in this text.]

The domain information in the four pending semantic annotation results is counted. Two of the results give the domain information "map", and the other two give "navigation" and "restaurant" respectively. By majority rule, "map" has the largest share, and it is the only domain information with the largest share, so "map" is selected as the target domain information of the corpus "I want to find KFC". One pending result that matches the target domain information "map" is then taken as the target semantic annotation result of the corpus. Any pending result that matches "map" may be chosen; alternatively, the result may be chosen among those matching "map" according to the shares of the intent information and attribute slot information in the pending results.

It is worth noting that several items of domain information may share the same, largest share among the pending semantic annotation results, so that no single domain information with the largest share can be determined. In this case, one of the tied items may be selected at random as the target domain information of the corpus to be annotated, and one pending result that matches it taken as the target semantic annotation result. Alternatively, the intent information and attribute slot information in the pending results may be examined further: the target domain information is determined among the tied items according to the shares of the intent information and attribute slot information, and one pending result that matches it is taken as the target semantic annotation result of the corpus to be annotated.

For example, when the corpus marked as the second type of corpus is "check the nearby train station", suppose the following five pending semantic annotation results are obtained:

Annotation No.   Corpus to be annotated            Domain information in the annotation result
1                Check the nearby train station    Map
2                Check the nearby train station    Navigation
3                Check the nearby train station    Map
4                Check the nearby train station    Train station
5                Check the nearby train station    Train station
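The domain column of the table above can be tallied with a short sketch. The function names `domain_shares` and `top_domains` are hypothetical; only the five domain labels come from the table.

```python
from collections import Counter
from fractions import Fraction


def domain_shares(domains):
    """Map each domain label to its share of all pending results."""
    counts = Counter(domains)
    total = len(domains)
    return {domain: Fraction(count, total) for domain, count in counts.items()}


def top_domains(domains):
    """Return, in sorted order, every domain that has the largest share."""
    shares = domain_shares(domains)
    best = max(shares.values())
    return sorted(domain for domain, share in shares.items() if share == best)


# Domain labels of annotations 1 to 5 from the table.
labels = ["map", "navigation", "map", "train station", "train station"]
print(top_domains(labels))  # ['map', 'train station']
```

Because two domains are returned, no unique majority exists at the domain level, which is the tie case that the following paragraphs resolve.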

The domain information in the five pending semantic annotation results is counted. Two annotators labeled the corpus "check the nearby train station" as "map" and two labeled it as "train station", so two items of domain information have the largest share. In this case, one option is to select one of "map" and "train station" at random as the target domain information of the corpus, and then to take any pending result that matches the target domain information as the target semantic annotation result. Another option is to select further according to the intent information in the pending results: if further analysis shows that exactly one item of intent information has the largest share, one pending result that matches both the domain information with the largest share and the intent information with the largest share may be taken as the target semantic annotation result. Suppose the results are as follows:

[Table images BDA0002699515310000131 and BDA0002699515310000141: the pending semantic annotation results for "check the nearby train station", including intent information; not reproduced in this text.]

Here, two annotation results give the intent information "ride", and the results of the other two annotators give "plan route" and "display location" respectively. By majority rule, "ride" has the largest share and is the only intent information with the largest share. A pending result that matches the domain information "train station", which has the largest share, and the intent information "ride", which has the largest share, is therefore selected as the target semantic annotation result of the corpus "check the nearby train station".

It is worth noting that more than one item of intent information may have the largest share. For example, where the corpus has been labeled with the two domains "map" and "train station", two annotators may each give the intent information "ride" and two may give "route planning". The attribute slot information in each pending semantic annotation result is then counted. If the attribute slot information corresponding to "ride" and to "route planning" has the same, largest share, one item of attribute slot information with the largest share may be selected at random, and one pending result that matches it taken as the target semantic annotation result of the corpus. If the shares differ, one pending result that matches the domain information with the largest share, the intent information with the largest share, and the attribute slot information with the largest share may be taken as the target semantic annotation result of the corpus to be annotated.

In the technical solution provided by this embodiment of the present invention, when a corpus to be annotated is acquired, it can be marked according to its number of optional annotations, and a semantic annotation strategy matching the marking result is then selected to annotate it. Matching different annotation strategies to corpora with different numbers of optional annotations saves human-resource cost and improves the efficiency of semantic annotation as well as the accuracy and quality of the annotation results.

Embodiment 2

FIG. 2 is a flowchart of a semantic annotation method provided in Embodiment 2 of the present invention. This embodiment elaborates on the foregoing embodiment: before the corpus to be annotated is input into the pre-trained corpus classification model to obtain its domain classification result, the method may further include:

acquiring at least two sample corpora that carry domain classification results;

taking each sample corpus together with the domain classification result it carries as one set of training sample data;

training a machine learning model with at least two sets of training sample data to generate the corpus classification model.

As shown in FIG. 2, the method of this embodiment specifically includes:

S210. Acquire at least two sample corpora that carry domain classification results.

For example, the sample corpus is "I want to listen to Zhou XX's song", and the domain classification result it carries is "music"; as another example, the sample corpus is "I want to see KFC", and the domain classification result it carries is "map, navigation, restaurant"; and so on.

S220. Take each sample corpus together with the domain classification result it carries as one set of training sample data.

Following the example in step S210, the sample corpus "I want to listen to Zhou XX's song" and the domain classification result "music" form one set of training sample data, and the sample corpus "I want to see KFC" and the domain classification result "map, navigation, restaurant" form another.
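Steps S210 and S220 amount to pairing each sample utterance with its domain labels. A minimal sketch of that pairing is shown below; the `TrainingSample` type and `build_training_samples` function are hypothetical, and the requirement of at least two samples mirrors the wording of S210.

```python
from typing import NamedTuple, Tuple


class TrainingSample(NamedTuple):
    """One set of training sample data: an utterance and its domain labels."""
    text: str
    domains: Tuple[str, ...]


def build_training_samples(labeled_corpus):
    """Pair each sample utterance with the domain classification result it carries."""
    samples = [TrainingSample(text, tuple(domains)) for text, domains in labeled_corpus]
    if len(samples) < 2:
        raise ValueError("at least two sample utterances are required")
    return samples


# The two examples given in step S210.
labeled_corpus = [
    ("I want to listen to Zhou XX's song", ["music"]),
    ("I want to see KFC", ["map", "navigation", "restaurant"]),
]
samples = build_training_samples(labeled_corpus)
print(len(samples[1].domains))  # 3
```

Keeping the domains as a tuple rather than a single string lets a multi-label classifier, and the later count of domain classification results, use them directly.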

S230. Train a machine learning model with at least two sets of training sample data to generate the corpus classification model.

A machine learning model is trained with at least two sets of training sample data to generate the corpus classification model. The machine learning model may be an NLU model, in which case the corpus classification model is generated by training the NLU model. It may also be another model capable of corpus classification, for example an RCNN (Recurrent Convolutional Neural Networks) model or a BERT (Bidirectional Encoder Representations from Transformers) model; this embodiment places no specific limitation on the choice.

The machine learning model is trained so that it acquires the ability to classify the domain of a corpus to be annotated. Each sample corpus and the domain classification result it carries form one set of training sample data, and at least two such sets are used to train the model and generate the corpus classification model.

For example, when training the corpus classification model, two sample corpora with domain classification results may be input: "I want to listen to Zhou XX's song" (category: music) and "I want to navigate to Wudaokou" (category: navigation). These two sample corpora and their domain classification results serve as two sets of training sample data. In addition, polysemous sample corpora with domain classification results may be input: "Take a look at Zhou XX" (category: music, video, encyclopedia) and "I want to see KFC" (category: map, navigation, restaurant). These serve as another two sets of training sample data. The machine learning model is then trained with these sets. Once training is complete, the model can classify a corpus of the form "I want to listen to + [a singer] + 's song" as "music" and a corpus of the form "I want to navigate to + [a place name]" as "navigation"; it can classify a corpus of the form "Take a look at + [a singer]" as "music", "video", and "encyclopedia", and a corpus of the form "I want to see + [a restaurant name]" as "map", "navigation", and "restaurant".

S240. Acquire the corpus to be annotated.

S250. Input the corpus to be annotated into the trained corpus classification model to obtain the domain classification result of the corpus to be annotated.

The trained corpus classification model determines the domain classification result of the corpus to be annotated, and the corpus is then marked according to the number of domain classification results, which distinguishes corpora by their number of optional annotations. Different semantic annotation strategies can thus be targeted at different corpora, concentrating more human resources on the annotation of complex samples that have more than one optional annotation. This saves human-resource cost and improves the accuracy and quality of semantic annotation. Moreover, in this technical solution the way of distinguishing corpora is simple to implement and highly accurate, and it requires no manual labor.

S260. Determine whether the number of domain classification results is one; if so, execute S270; otherwise, execute S2100.

S270. Mark the corpus to be annotated as the first type of corpus.

When the corpus to be annotated has one domain classification result, its number of optional annotations is one and its complexity is low, so it is marked as the first type of corpus.

S280. Use a preset NLU model to semantically annotate the corpus to be annotated and obtain its semantic pre-annotation result.

S290. If the semantic pre-annotation result passes manual review, take the semantic pre-annotation result as the target semantic annotation result of the corpus to be annotated.

After the preset NLU model has annotated the corpus and produced the semantic pre-annotation result, the domain information in that result can be reviewed manually. Compared with annotating the corpus entirely by hand, manually reviewing the pre-annotation result produced by the NLU model saves more time and human-resource cost and makes it easier to ensure the accuracy and quality of the annotation result. Optionally, if the semantic pre-annotation result does not pass manual review, it is modified manually, and the modified result is taken as the target semantic annotation result of the corpus to be annotated.
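The first-type flow (pre-annotate with a model, then review by hand) can be sketched as a function that takes both steps as callables. Everything here is illustrative: `annotate_first_type`, `pre_annotate`, and `review` are hypothetical names, the NLU model is replaced by a stub, and the reviewer is assumed to return `None` on approval or a corrected annotation otherwise.

```python
def annotate_first_type(utterance, pre_annotate, review):
    """Return the target annotation for a first-type utterance.

    pre_annotate(utterance) stands in for the preset NLU model.
    review(utterance, pre_label) returns None if the pre-annotation passes
    manual review, or the manually corrected annotation if it does not.
    """
    pre_label = pre_annotate(utterance)
    correction = review(utterance, pre_label)
    return pre_label if correction is None else correction


# Stub model and reviewers, for illustration only.
def stub_model(utterance):
    return {"domain": "music"}


def approving_reviewer(utterance, pre_label):
    return None  # pre-annotation accepted as-is


def correcting_reviewer(utterance, pre_label):
    return {"domain": "video"}  # reviewer overrides the pre-annotation


print(annotate_first_type("I want to listen to Zhou XX's song", stub_model, approving_reviewer))
```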

S2100. Mark the corpus to be annotated as the second type of corpus.

When the corpus to be annotated has multiple domain classification results, its number of optional annotations is greater than one and its complexity is high, so it is marked as the second type of corpus.
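The marking rule of S260, S270, and S2100 reduces to a check on how many distinct domains the classifier returned. The sketch below assumes the classification result is available as a list of domain names; `CorpusType` and `mark_corpus` are hypothetical names.

```python
from enum import Enum


class CorpusType(Enum):
    FIRST = "first"    # one optional annotation: model pre-annotation plus manual review
    SECOND = "second"  # several optional annotations: multiple annotators plus majority vote


def mark_corpus(domain_results):
    """Mark an utterance by the number of distinct domain classification results."""
    distinct = set(domain_results)
    if not distinct:
        raise ValueError("classifier returned no domain")
    return CorpusType.FIRST if len(distinct) == 1 else CorpusType.SECOND


print(mark_corpus(["music"]))
print(mark_corpus(["map", "navigation", "restaurant"]))
```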

S2110. Acquire multiple pending semantic annotation results for the corpus to be annotated.

S2120. Among the multiple pending semantic annotation results, determine the target semantic annotation result of the corpus to be annotated.

For details not described in this embodiment, refer to the foregoing embodiment; they are not repeated here.

In the above technical solution, training the model yields a corpus classification model capable of classifying the domain of a corpus to be annotated. The corpus is input into the trained model to obtain its domain classification result, the corpus is marked according to the number of domain classification results, and a matching semantic annotation strategy is then selected according to the marking result. Corpora are thus divided into two types by the number of domain classification results, and each type is annotated with its own strategy. This avoids annotating every corpus entirely by hand, saves human-resource cost, improves the efficiency of semantic annotation and the accuracy of its results, and ensures the quality of the annotation results.

Embodiment 3

FIG. 3 is a flowchart of a semantic annotation method provided in Embodiment 3 of the present invention. This embodiment provides a specific implementation.

As shown in FIG. 3, the method of this embodiment specifically includes:

S310. Acquire the corpus to be annotated.

S320. Input the corpus to be annotated into the pre-trained corpus classification model to obtain the domain classification result of the corpus to be annotated.

The corpus to be annotated is input into the pre-trained corpus classification model, which outputs its domain classification result. The number of domain classification results indicates the annotation complexity of the corpus. Optionally, when there is one domain classification result, the annotation complexity of the corpus is considered low; when there are several, the annotation complexity is considered high.

S330. Determine whether the number of domain classification results is one; if so, execute S340; otherwise, execute S390.

S340. Mark the corpus to be annotated as 0.

A corpus marked 0 belongs to the first type of corpus, that is, a corpus of low complexity whose number of optional annotations is one.

S350. Use a preset NLU model to semantically annotate the corpus to be annotated and obtain its semantic pre-annotation result.

Because a corpus marked 0 has low annotation complexity, the model can complete the semantic annotation, and manual review afterwards is enough to bring the accuracy and quality of the result up to standard. A corpus marked 0 is therefore annotated with the preset NLU model to obtain its semantic pre-annotation result, which is then reviewed manually.

S360. Determine whether the semantic pre-annotation result passes manual review; if so, execute S370; if not, execute S380.

S370. Take the semantic pre-annotation result as the target semantic annotation result of the corpus to be annotated.

S380. Acquire the manually modified semantic annotation result and take it as the target semantic annotation result of the corpus to be annotated.

S390. Mark the corpus to be annotated as 1.

A corpus marked 1 belongs to the second type of corpus, that is, a corpus of high complexity whose number of optional annotations is more than one.

S3100. Acquire multiple pending semantic annotation results for the corpus to be annotated.

Because a corpus marked 1 has high annotation complexity, it may be polysemous; ambiguity arises easily during semantic annotation, and several valid annotation results may be obtained. A strategy of simultaneous annotation by several annotators can therefore be used: a corpus marked 1 is annotated by several annotators, yielding multiple pending semantic annotation results for the corpus.

S3110. Count the domain information in each pending semantic annotation result.

For the domain information in each pending semantic annotation result, its share among all the pending results obtained is computed.

S3120. Determine whether exactly one item of domain information has the largest share; if so, execute S3130; otherwise, execute S3140.

S3130. Take the domain information with the largest share as the target domain information of the corpus to be annotated, and take one pending semantic annotation result that matches the target domain information as the target semantic annotation result of the corpus to be annotated.

The domain information in each pending semantic annotation result is counted. If exactly one item of domain information has the largest share, it is selected by majority rule as the target domain information of the corpus, and the determined target domain information is frozen. This avoids disagreement when determining the annotation result and prevents the annotator and the quality inspector from repeatedly modifying the same corpus.

Taking one pending result that matches the determined target domain information as the target semantic annotation result makes it possible to settle quickly on the correct annotation result for the corpus to be annotated and improves the efficiency of semantic annotation.

S3140、统计每个语义待定标注结果中的意图信息。S3140. Count the intent information in each semantic pending annotation result.

如果数量占比最大的领域信息为多个,则进一步地统计每个语义待定标注 结果中的意图信息。If there are more than one domain information with the largest proportion, the intent information in each semantic pending annotation result is further counted.

S3150: Determine whether exactly one intent has the largest share of the count. If so, go to S3160; otherwise, go to S3170.

S3160: Take one pending semantic annotation result that matches both the domain information with the largest share and the intent information with the largest share as the target semantic annotation result of the corpus to be annotated.

If exactly one intent has the largest share, then by majority rule one pending semantic annotation result that matches both the domain information with the largest share and the intent information with the largest share is taken as the target semantic annotation result of the corpus to be annotated.

S3170: Count the attribute slot information in each pending semantic annotation result, and take one pending semantic annotation result that matches the domain information, the intent information, and the attribute slot information with the largest shares as the target semantic annotation result of the corpus to be annotated.

If more than one intent is tied for the largest share, the attribute slot information in each pending semantic annotation result is counted as well, and one pending semantic annotation result that matches the domain information, the intent information, and the attribute slot information with the largest shares is taken as the target semantic annotation result of the corpus to be annotated. If more than one attribute slot value is also tied for the largest share, one of the tied values may be chosen at random, and one pending semantic annotation result matching that attribute slot information is taken as the target semantic annotation result of the corpus to be annotated.
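The cascade in steps S3120 to S3170 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary keys `domain`, `intent`, and `slots` and the function names `top_values` and `pick_target` are hypothetical. The text does not say whether the later counts run over all pending results or only over those still in contention; the sketch assumes the latter, so that a matching result always exists.

```python
import random
from collections import Counter


def top_values(values):
    """Return every value tied for the highest count."""
    counts = Counter(values)
    best = max(counts.values())
    return {value for value, count in counts.items() if count == best}


def pick_target(candidates, rng=random):
    """Pick one pending annotation by majority vote on domain, then intent, then slots.

    Each candidate is a dict with the hypothetical keys "domain", "intent",
    and "slots"; slot values must be hashable (for example a tuple of pairs).
    """
    pool = list(candidates)
    if not pool:
        raise ValueError("at least one pending annotation is required")
    for key in ("domain", "intent", "slots"):
        winners = top_values(c[key] for c in pool)
        # Assumption: keep only the results still in contention for the next count.
        pool = [c for c in pool if c[key] in winners]
        if len(winners) == 1:
            # A unique majority settles the vote; any matching result will do.
            return pool[0]
    # Still tied after the slot count: choose one of the tied slot values at random.
    chosen = rng.choice(sorted(winners, key=repr))
    return next(c for c in pool if c["slots"] == chosen)
```

Passing a seeded `random.Random` instance as `rng` makes the final random choice reproducible, which can help when annotation decisions need to be audited.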

For details not covered in this embodiment, refer to the preceding embodiments; they are not repeated here.

In the above technical solution, the corpus to be annotated is input into a pre-trained corpus classification model, which determines the number of domain classification results for that corpus. The input is thereby split into simple corpus, marked 0, and complex corpus, marked 1, so that the marks 0 and 1 distinguish how complex the corpus is to annotate. Simple corpus marked 0 is semantically annotated by a preset NLU model, while complex corpus marked 1 is semantically annotated by multiple human annotators. This allocates manpower sensibly, reduces labor cost, and improves the accuracy of the semantic annotation results.
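The marking and routing described above can be sketched in Python as follows. This is only an illustration: `mark_corpus`, `annotate`, and the callables `classify`, `nlu_annotate`, `human_annotators`, and `resolve` are hypothetical names standing in for the corpus classification model, the preset NLU model, the human annotators, and the step that picks the target result from the pending ones.

```python
SIMPLE, COMPLEX = 0, 1


def mark_corpus(domain_results):
    """Return 0 for a single domain classification result, 1 for several."""
    if not domain_results:
        raise ValueError("the classifier returned no domain")
    return SIMPLE if len(domain_results) == 1 else COMPLEX


def annotate(utterance, classify, nlu_annotate, human_annotators, resolve):
    """Route one utterance to the strategy that matches its mark."""
    mark = mark_corpus(classify(utterance))
    if mark == SIMPLE:
        # Simple corpus: the preset NLU model produces the target result directly.
        return nlu_annotate(utterance)
    # Complex corpus: collect one pending result per human annotator,
    # then let the resolver choose the target result among them.
    pending = [annotator(utterance) for annotator in human_annotators]
    return resolve(pending)
```

Keeping the classifier, the NLU model, and the resolver as injected callables means each one can be replaced without touching the routing logic.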

Embodiment 4

FIG. 4 is a schematic structural diagram of a semantic annotation device provided in Embodiment 4 of the present invention. The device is applicable where different semantic annotation strategies are used depending on how difficult the corpus is to annotate. It may be implemented in software and/or hardware, and can generally be integrated in a processor of, for example, a vehicle-mounted terminal device.

As shown in FIG. 4, the semantic annotation device specifically includes a corpus acquisition module 410, a corpus marking module 420, and a semantic annotation module 430, wherein:

the corpus acquisition module 410 is configured to acquire the corpus to be annotated;

the corpus marking module 420 is configured to mark the corpus to be annotated according to its number of optional annotations;

the semantic annotation module 430 is configured to select, according to the marking result, a matching semantic annotation strategy and use it to semantically annotate the corpus to be annotated.

The semantic annotation device provided in this embodiment matches different semantic annotation strategies to corpus of different complexity, which reduces labor cost and improves both the efficiency of semantic annotation and the accuracy and quality of its results.

Further, the corpus marking module includes:

a domain classification result determination unit, configured to input the corpus to be annotated into a pre-trained corpus classification model to obtain the domain classification result of the corpus to be annotated;

a first corpus marking unit, configured to mark the corpus to be annotated as first-type corpus if there is one domain classification result;

a second corpus marking unit, configured to mark the corpus to be annotated as second-type corpus if there are multiple domain classification results.

Further, the above device also includes a corpus classification model generation module, configured to perform the following before the corpus to be annotated is input into the pre-trained corpus classification model to obtain its domain classification result: acquire at least two sample corpus items, each accompanied by a domain classification result;

treat each sample corpus item together with its accompanying domain classification result as one group of training sample data;

train a machine learning model with at least two groups of training sample data to generate the corpus classification model.
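The training step can be illustrated with a deliberately simple stand-in. The text does not specify the machine learning model, so the sketch below swaps in a plain word-count scorer; `train_classifier`, `classify`, the `ratio` threshold, and the whitespace tokenization are all assumptions made for illustration (Chinese text would need a proper word segmenter).

```python
from collections import Counter, defaultdict


def train_classifier(samples):
    """Build per-domain token counts from (text, domain) training pairs."""
    if len(samples) < 2:
        raise ValueError("at least two training samples are required")
    token_counts = defaultdict(Counter)
    for text, domain in samples:
        token_counts[domain].update(text.lower().split())
    return dict(token_counts)


def classify(model, text, ratio=0.8):
    """Return every domain whose score is at least `ratio` times the best score."""
    tokens = text.lower().split()
    scores = {
        domain: sum(counts[token] for token in tokens)
        for domain, counts in model.items()
    }
    best = max(scores.values())
    if best == 0:
        # No evidence for any domain: every domain remains possible.
        return sorted(scores)
    return sorted(d for d, s in scores.items() if s >= ratio * best)
```

A single returned domain corresponds to first-type corpus and several returned domains to second-type corpus; a real system would replace the scorer with a trained classifier that can output more than one domain.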

In an optional implementation, the semantic annotation module is specifically configured to:

if the corpus to be annotated is marked as first-type corpus, semantically annotate it with a preset natural language understanding (NLU) model to obtain the target semantic annotation result of the corpus to be annotated.

In another optional implementation, the semantic annotation module is specifically configured to:

if the corpus to be annotated is marked as second-type corpus, acquire multiple pending semantic annotation results for the corpus to be annotated;

determine, among the multiple pending semantic annotation results, the target semantic annotation result of the corpus to be annotated.

Further, the semantic annotation module is specifically configured to:

count the domain information in each pending semantic annotation result;

if exactly one domain has the largest share, take that domain information as the target domain information of the corpus to be annotated, and take one pending semantic annotation result that matches the target domain information as the target semantic annotation result of the corpus to be annotated;

if more than one domain is tied for the largest share, count the intent information in each pending semantic annotation result;

if exactly one intent has the largest share, take one pending semantic annotation result that matches both the domain information with the largest share and the intent information with the largest share as the target semantic annotation result of the corpus to be annotated;

if more than one intent is tied for the largest share, count the attribute slot information in each pending semantic annotation result, and take one pending semantic annotation result that matches the domain information, the intent information, and the attribute slot information with the largest shares as the target semantic annotation result of the corpus to be annotated.

The above semantic annotation device can execute the semantic annotation method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.

Embodiment 5

FIG. 5 is a schematic diagram of the hardware structure of a vehicle-mounted terminal device provided in Embodiment 5 of the present invention. As shown in FIG. 5, the vehicle-mounted terminal device includes:

one or more processors 510, with one processor 510 taken as an example in FIG. 5; and

a memory 520.

The vehicle-mounted terminal device may further include an input device 530 and an output device 540.

The processor 510, memory 520, input device 530, and output device 540 of the vehicle-mounted terminal device may be connected by a bus or by other means; connection by a bus is taken as an example in FIG. 5.

As a non-transitory computer-readable storage medium, the memory 520 can store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the semantic annotation method in the embodiments of the present invention (for example, the corpus acquisition module 410, corpus marking module 420, and semantic annotation module 430 shown in FIG. 4). By running the software programs, instructions, and modules stored in the memory 520, the processor 510 executes the various functional applications and data processing of the vehicle-mounted terminal device, that is, it implements the semantic annotation method of the above method embodiments.

The memory 520 may include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required for at least one function; the data storage area may store data created through use of the computer device, and the like. In addition, the memory 520 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 520 may optionally include memory located remotely from the processor 510, and such remote memory may be connected to the terminal device via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 530 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the vehicle-mounted terminal device. The output device 540 may include a display device such as a display screen.

Embodiment 6

Embodiment 6 of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a semantic annotation method. The method includes: acquiring a corpus to be annotated; marking the corpus to be annotated according to its number of optional annotations; and, according to the marking result, selecting a matching semantic annotation strategy to semantically annotate the corpus to be annotated.

Optionally, when executed by a computer processor, the computer-executable instructions may also be used to carry out the technical solution of the semantic annotation method provided by any embodiment of the present invention.

From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware, or of course by hardware alone, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.

It is worth noting that in the above embodiments of the semantic annotation device, the units and modules are divided only according to functional logic; the division is not limited to the one described, as long as the corresponding functions can be realized. In addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the present invention.

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that they can make various obvious changes, readjustments, and substitutions without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include further equivalent embodiments without departing from its concept; the scope of the present invention is determined by the scope of the appended claims.

Claims (11)

1. A semantic annotation method, characterized by comprising:
acquiring a corpus to be annotated;
marking the corpus to be annotated according to the number of optional annotations of the corpus to be annotated;
selecting, according to the marking result, a matching semantic annotation strategy to semantically annotate the corpus to be annotated.

2. The method according to claim 1, characterized in that marking the corpus to be annotated according to the number of optional annotations of the corpus to be annotated comprises:
inputting the corpus to be annotated into a pre-trained corpus classification model to obtain a domain classification result of the corpus to be annotated;
if there is one domain classification result, marking the corpus to be annotated as first-type corpus;
if there are multiple domain classification results, marking the corpus to be annotated as second-type corpus.

3. The method according to claim 2, characterized in that, before inputting the corpus to be annotated into the pre-trained corpus classification model to obtain the domain classification result of the corpus to be annotated, the method further comprises:
acquiring at least two sample corpus items, each accompanied by a domain classification result;
treating each sample corpus item together with its accompanying domain classification result as one group of training sample data;
training a machine learning model with at least two groups of training sample data to generate the corpus classification model.

4. The method according to claim 2, characterized in that selecting, according to the marking result, a matching semantic annotation strategy to semantically annotate the corpus to be annotated comprises:
if the corpus to be annotated is marked as first-type corpus, semantically annotating the corpus to be annotated with a preset natural language understanding (NLU) model to obtain a target semantic annotation result of the corpus to be annotated.

5. The method according to claim 2, characterized in that selecting, according to the marking result, a matching semantic annotation strategy to semantically annotate the corpus to be annotated comprises:
if the corpus to be annotated is marked as second-type corpus, acquiring multiple pending semantic annotation results of the corpus to be annotated;
determining, among the multiple pending semantic annotation results, the target semantic annotation result of the corpus to be annotated.

6. The method according to claim 5, characterized in that determining, among the multiple pending semantic annotation results, the target semantic annotation result of the corpus to be annotated comprises:
counting the domain information in each pending semantic annotation result;
if exactly one domain has the largest share, taking that domain information as the target domain information of the corpus to be annotated, and taking one pending semantic annotation result that matches the target domain information as the target semantic annotation result of the corpus to be annotated;
if more than one domain is tied for the largest share, counting the intent information in each pending semantic annotation result;
if exactly one intent has the largest share, taking one pending semantic annotation result that matches both the domain information with the largest share and the intent information with the largest share as the target semantic annotation result of the corpus to be annotated;
if more than one intent is tied for the largest share, counting the attribute slot information in each pending semantic annotation result, and taking one pending semantic annotation result that matches the domain information, the intent information, and the attribute slot information with the largest shares as the target semantic annotation result of the corpus to be annotated.

7. A semantic annotation device, characterized by comprising:
a corpus acquisition module, configured to acquire a corpus to be annotated;
a corpus marking module, configured to mark the corpus to be annotated according to the number of optional annotations of the corpus to be annotated;
a semantic annotation module, configured to select, according to the marking result, a matching semantic annotation strategy to semantically annotate the corpus to be annotated.

8. The device according to claim 7, characterized in that the corpus marking module comprises:
a domain classification result determination unit, configured to input the corpus to be annotated into a pre-trained corpus classification model to obtain a domain classification result of the corpus to be annotated;
a first corpus marking unit, configured to mark the corpus to be annotated as first-type corpus if there is one domain classification result;
a second corpus marking unit, configured to mark the corpus to be annotated as second-type corpus if there are multiple domain classification results; wherein the second-type corpus is more complex to annotate than the first-type corpus.

9. The device according to claim 8, characterized in that the semantic annotation module is specifically configured to:
if the corpus to be annotated is marked as first-type corpus, semantically annotate the corpus to be annotated with a preset natural language understanding (NLU) model to obtain a target semantic annotation result of the corpus to be annotated.

10. The device according to claim 8, characterized in that the semantic annotation module is specifically configured to:
if the corpus to be annotated is marked as second-type corpus, acquire multiple pending semantic annotation results of the corpus to be annotated;
determine, among the multiple pending semantic annotation results, the target semantic annotation result of the corpus to be annotated.

11. A vehicle-mounted terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 6.
CN202011017386.9A 2020-09-24 2020-09-24 Semantic annotation method and device and vehicle-mounted terminal equipment Pending CN114254085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011017386.9A CN114254085A (en) 2020-09-24 2020-09-24 Semantic annotation method and device and vehicle-mounted terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011017386.9A CN114254085A (en) 2020-09-24 2020-09-24 Semantic annotation method and device and vehicle-mounted terminal equipment

Publications (1)

Publication Number Publication Date
CN114254085A true CN114254085A (en) 2022-03-29

Family

ID=80788817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011017386.9A Pending CN114254085A (en) 2020-09-24 2020-09-24 Semantic annotation method and device and vehicle-mounted terminal equipment

Country Status (1)

Country Link
CN (1) CN114254085A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100063799A1 (en) * 2003-06-12 2010-03-11 Patrick William Jamieson Process for Constructing a Semantic Knowledge Base Using a Document Corpus
CN108959412A (en) * 2018-06-07 2018-12-07 出门问问信息科技有限公司 Generation method, device, equipment and the storage medium of labeled data
CN109918489A (en) * 2019-02-28 2019-06-21 上海乐言信息科技有限公司 A kind of knowledge question answering method and system of more strategy fusions
CN111695053A (en) * 2020-06-12 2020-09-22 上海智臻智能网络科技股份有限公司 Sequence labeling method, data processing device and readable storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
AMIT DEOKAR: "On semantic annotation of decision models", INFORMATION SYSTEMS AND E-BUSINESS MANAGEMENT, 31 March 2013 (2013-03-31), pages 1 - 11 *
习翔宇: "语义角色标注(Semantic Role Labelling)", pages 1 - 2, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/35789254> *
付博;陈毅恒;邵艳秋;刘挺;: "基于用户自然标注的微博文本的消费意图识别", 中文信息学报, no. 04, 15 July 2017 (2017-07-15), pages 1 - 11 *

Similar Documents

Publication Publication Date Title
CN111695345B (en) Method and device for identifying entity in text
CN107291783B (en) Semantic matching method and intelligent equipment
CN101840406B (en) Place name searching device and system
CN102262634B (en) An automatic question answering method and system
CN111292751B (en) Semantic analysis method and device, voice interaction method and device, and electronic equipment
CN108763510A (en) Intension recognizing method, device, equipment and storage medium
CN112749265B (en) Intelligent question-answering system based on multiple information sources
WO2018000272A1 (en) Corpus generation device and method
CN110737770B (en) Text data sensitivity identification method and device, electronic equipment and storage medium
CN102662923A (en) Entity instance leading method based on machine learning
CN111737990B (en) Word slot filling method, device, equipment and storage medium
CN110309277A (en) Man-machine dialog semantic analysis method and system
CN113268615A (en) Resource label generation method and device, electronic equipment and storage medium
CN111914539A (en) Channel announcement information extraction method and system based on BilSTM-CRF model
CN110738055A (en) Text entity identification method, text entity identification equipment and storage medium
CN114817576A (en) Model training and patent knowledge graph completion method, device and storage medium
CN116976321A (en) Text processing method, apparatus, computer device, storage medium, and program product
CN106897274B (en) Cross-language comment replying method
CN110851572A (en) Session marking method, device, storage medium and electronic device
CN114254085A (en) Semantic annotation method and device and vehicle-mounted terminal equipment
US8666987B2 (en) Apparatus and method for processing documents to extract expressions and descriptions
CN110232160B (en) Method and device for detecting interest point transition event and storage medium
CN112883735A (en) Form image structured processing method, device, equipment and storage medium
CN110728982A (en) Information interaction method and system based on voice touch screen, storage medium and vehicle-mounted terminal
CN110738041B (en) Statement labeling method, device, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination