CN115688808A

CN115688808A - Translation method, translation device, readable medium and electronic equipment

Info

Publication number: CN115688808A
Application number: CN202211430812.0A
Authority: CN
Inventors: 孙泽维; 王移帆; 程善伯; 王明轩
Original assignee: Beijing Youzhuju Network Technology Co Ltd; Lemon Inc Cayman Island
Current assignee: Beijing Youzhuju Network Technology Co Ltd; Lemon Inc Cayman Island
Priority date: 2022-11-15
Filing date: 2022-11-15
Publication date: 2023-02-03

Abstract

The embodiments of the disclosure relate to a translation method, a translation device, a readable medium and an electronic device. The method comprises: determining a source text to be translated and a target language style; determining, according to the source text, a pending style text from a plurality of style texts corresponding to the target language style; and inputting the source text and the pending style text into a pre-generated text translation model to obtain a target translation text output by the text translation model, where the language style of the target translation text is the target language style. The pending style text therefore assists the translation process of the text translation model. As a result, the model can achieve accurate stylized translation without being trained on a large amount of stylized bilingual text. This reduces the training complexity of the text translation model and also improves the accuracy of its stylized translation.

Description

Translation method, device, readable medium and electronic equipment

Technical Field

The present disclosure relates to the field of computer technology, and in particular to a translation method, a translation device, a readable medium and an electronic device.

Background

With the advancement of computer technology, machine translation has become an important research topic in natural language processing. Machine translation is the process of using a computer or other electronic device to translate a text in a source language into a semantically equivalent text in a target language. Natural language text can be written in various styles using different vocabulary and syntax, while its semantics remain the same across styles. Language style plays an important communicative role in many languages. For example, English has American and British styles, and Korean has honorific and non-honorific styles.

However, in the related art, machine translation cannot produce output in a specified language style. The translated text has to be adjusted manually after translation, which reduces both the quality and the efficiency of machine translation.

Summary

This Summary is provided to introduce, in simplified form, concepts that are described in detail later in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to be used to limit the scope of the claimed technical solution.

According to a first aspect of the embodiments of the present disclosure, a translation method is provided, the method comprising:

determining a source text to be translated and a target language style;

determining, according to the source text, a pending style text from a plurality of style texts corresponding to the target language style; and

inputting the source text and the pending style text into a pre-generated text translation model to obtain a target translation text output by the text translation model, where the language style of the target translation text is the target language style.

According to a second aspect of the embodiments of the present disclosure, a translation device is provided, the device comprising:

a first determining module, configured to determine a source text to be translated and a target language style;

a second determining module, configured to determine, according to the source text, a pending style text from a plurality of style texts corresponding to the target language style; and

a translation module, configured to input the source text and the pending style text into a pre-generated text translation model to obtain a target translation text output by the text translation model, where the language style of the target translation text is the target language style.

According to a third aspect of the embodiments of the present disclosure, a computer-readable medium is provided. A computer program is stored on the medium. When the computer program is executed by a processing device, it implements the steps of the method according to the first aspect of the present disclosure.

According to a fourth aspect of the embodiments of the present disclosure, an electronic device is provided, including:

a storage device on which a computer program is stored; and

a processing device configured to execute the computer program in the storage device to implement the steps of the method according to the first aspect of the present disclosure.

With the above technical solution, a source text to be translated and a target language style are determined. Next, a pending style text is determined, according to the source text, from a plurality of style texts corresponding to the target language style. Then the source text and the pending style text are input into a pre-generated text translation model to obtain a target translation text whose language style is the target language style. The pending style text therefore assists the translation process of the text translation model. As a result, the model can achieve accurate stylized translation without being trained on a large amount of stylized bilingual text. This reduces the training complexity of the text translation model and also improves the accuracy of its stylized translation.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.

Brief Description of the Drawings

The above and other features, advantages and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale. In the drawings:

Fig. 1 is a flowchart of a translation method according to an exemplary embodiment.

Fig. 2 is a flowchart of step S102 according to the embodiment shown in Fig. 1.

Fig. 3 is a schematic diagram of a translation method according to an exemplary embodiment.

Fig. 4 is a flowchart of a method for generating a target style text set according to an exemplary embodiment.

Fig. 5 is a block diagram of a translation device according to an exemplary embodiment.

Fig. 6 is a block diagram of another translation device according to an exemplary embodiment.

Fig. 7 is a block diagram of an electronic device according to an exemplary embodiment.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the protection scope of the present disclosure.

It should be understood that the steps described in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. In addition, method embodiments may include additional steps and/or omit some of the steps shown. The scope of the present disclosure is not limited in this regard.

As used herein, the term "include" and its variations are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms will be given in the description below.

It should be noted that concepts such as "first" and "second" in the present disclosure are only used to distinguish different devices, modules or units. They are not used to limit the order or interdependence of the functions performed by these devices, modules or units.

It should be noted that the modifiers "a/one" and "a plurality of" in the present disclosure are illustrative rather than restrictive. Those skilled in the art should understand that, unless the context clearly indicates otherwise, these modifiers should be read as "one or more". In the description of the present disclosure, unless otherwise stated, "a plurality of" means two or more, and other quantifiers are to be understood similarly. "At least one", "one or more" and similar expressions refer to any combination of the listed items, including a single item or any combination of multiple items. For example, "at least one a" may represent any number of a. Similarly, "one or more of a, b and c" may represent a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b and c may be single or multiple. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean A alone, both A and B, or B alone, where A and B may be singular or plural.

In the embodiments of the present disclosure, operations or steps may be depicted in the drawings in a particular order. This should not be understood as requiring that these operations or steps be performed in the particular order shown or in sequential order, or that all of the illustrated operations or steps be performed, to achieve the desired result. In the embodiments of the present disclosure, these operations or steps may be performed serially or in parallel, or only some of them may be performed.

The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.

It can be understood that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved in the present disclosure. The user's authorization should also be obtained.

For example, in response to receiving an active request from the user, prompt information is sent to the user. The prompt explicitly informs the user that the requested operation will require acquiring and using the user's personal information. Based on the prompt information, the user can then decide whether to provide personal information to the software or hardware that performs the operations of the technical solution of the present disclosure, such as an electronic device, application, server or storage medium.

As an optional but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user in a pop-up window, for example, in which the prompt information is presented as text. In addition, the pop-up window may carry a selection control that lets the user choose "agree" or "disagree" to providing personal information to the electronic device.

It can be understood that the above process of notifying the user and obtaining the user's authorization is merely illustrative and does not limit the implementations of the present disclosure. Other approaches that comply with relevant laws and regulations may also be applied to the implementations of the present disclosure.

It can also be understood that the data involved in this technical solution, including but not limited to the data itself and the acquisition or use of the data, should comply with the requirements of the applicable laws, regulations and related provisions.

The present disclosure will be described below with reference to specific embodiments.

First, the application scenarios of the present disclosure are described. The present disclosure can be applied to language translation scenarios, especially stylized translation scenarios. Examples include translating Chinese into American English and translating English into the Korean honorific style.

To enable a machine translation model to perform stylized translation, one approach is to collect a large amount of stylized bilingual text and train the machine translation model on it to obtain a stylized translation model. Such text includes, for example, pairs of Chinese text and American English text, or pairs of English text and Korean honorific-style text. However, this approach requires a large amount of stylized bilingual text. Because such corpora are limited, the translation accuracy of the trained model is low.

Fig. 1 is a flowchart of a translation method according to an exemplary embodiment. The method can be applied to an electronic device. The electronic device may be a terminal device, such as a smartphone, a smart wearable device, a smart speaker, a smart tablet, a PDA (Personal Digital Assistant), a CPE (Customer Premises Equipment), a personal computer or an in-vehicle terminal. It may also be a server, such as a local server or a cloud server. As shown in Fig. 1, the method may include:

S101. Determine the source text to be translated and the target language style.

The target language style is the language style expected for the translation of the source text. The source text may be a word, a sentence, a paragraph or an article, which is not limited in the present disclosure.

The language of the source text may be called the source language, and the language into which it is to be translated may be called the target language. The language style expected after translation may be called the target language style. For example, if the source text input by the user is Chinese text, the source language is Chinese, the expected target language may be English, and the expected target language style may be American English.

The target language and the target language style may be specified by the user, or may be detected and determined automatically by the electronic device.

In some embodiments, the text to be translated that the user inputs may be used as the source text, and the language style that the user inputs may be used as the target language style.

For example, the user may be given a language style selection box and may use it to select a language style as the target language style.

In some other embodiments, the text to be translated that the user inputs may be used as the source text. The target language style may then be determined automatically from state parameters of the electronic device on which the user inputs the source text.

The state parameters may include, for example, the current time zone, country and language parameters of the electronic device.
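A hedged illustration of the automatic case follows, mapping device parameters to a target language style. The table `STYLE_BY_LOCALE` and the function `infer_target_style` are hypothetical. The disclosure names time zone, country and language as possible inputs but does not define any mapping. This sketch therefore uses only country and language, and it assumes that an explicit user selection takes priority.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from (country code, language code) to a target language style.
# The disclosure does not define any such table; these entries are illustrative only.
STYLE_BY_LOCALE: Dict[Tuple[str, str], str] = {
    ("US", "en"): "American English",
    ("GB", "en"): "British English",
}


def infer_target_style(country: str, language: str, user_choice: Optional[str] = None) -> Optional[str]:
    """Return the target language style, or None if it cannot be determined.

    An explicit user selection (e.g. from the language style selection box) wins;
    otherwise the device's country and language parameters are looked up.
    """
    if user_choice:
        return user_choice
    return STYLE_BY_LOCALE.get((country.upper(), language.lower()))
```

Returning `None` leaves it to the caller to decide what to do when no style can be inferred, for example asking the user. The disclosure does not specify this fallback.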

S102. Determine, according to the source text, a pending style text from a plurality of style texts corresponding to the target language style.

The plurality of style texts corresponding to the target language style may be pre-generated monolingual style texts. For example, if the target language style is the American English style, all of these style texts are American English-style texts. If the target language style is the Korean honorific style, all of them are Korean honorific-style texts.

In some embodiments, the style text closest to the source text among the plurality of style texts may be used as the pending style text. For example, a pre-generated text encoding model may encode the source text and each style text. The similarity between the source text and each style text is then computed, and the style text with the highest similarity is used as the pending style text.

In other embodiments, one style text may be randomly selected from the plurality of style texts as the pending style text.
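The two selection strategies above can be sketched as follows. The sketch assumes that each style text already has a vector from some text encoding model. Cosine similarity is an assumption, since the disclosure only says "similarity". The helper names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_style_text(
    source_vec: Sequence[float],
    style_texts: List[str],
    style_vecs: List[Sequence[float]],
    strategy: str = "most_similar",
    rng: Optional[random.Random] = None,
) -> str:
    """Select the pending style text by highest similarity or at random (hypothetical helper)."""
    if not style_texts:
        raise ValueError("no style texts available for this style")
    if strategy == "random":
        # Random embodiment: any style text of the target style may be used.
        rng = rng or random.Random()
        return rng.choice(style_texts)
    if strategy == "most_similar":
        # Similarity embodiment: pick the style text whose vector is most similar.
        scores = [cosine_similarity(source_vec, v) for v in style_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        return style_texts[best]
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Passing a seeded `random.Random` makes the random strategy reproducible, which can help when comparing translations.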

S103. Input the source text and the pending style text into the pre-generated text translation model to obtain the target translation text output by the text translation model.

The language style of the target translation text may be the target language style.

In some embodiments, the source text and the pending style text may be spliced (concatenated) to obtain a target spliced text. The target spliced text is then input into the text translation model to obtain the target translation text.

For example, the source text and the pending style text may be spliced around a preset keyword. The order may be pending style text + preset keyword + source text, or source text + preset keyword + pending style text.

For example, suppose the preset keyword is <token>, the source text is "我不知道，他也不会告诉我" ("I don't know, and he won't tell me"), and the pending style text is "I tell thee, Kate, 'twas burnt and dried away". The target spliced text may then be "I tell thee, Kate, 'twas burnt and dried away<token>我不知道，他也不会告诉我" or "我不知道，他也不会告诉我<token>I tell thee, Kate, 'twas burnt and dried away".
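The splicing step can be sketched as plain string concatenation. The function `splice` is hypothetical. The sketch assumes that the preset keyword is inserted verbatim with no extra whitespace, as in the example above.

```python
def splice(source_text: str, style_text: str, keyword: str = "<token>", style_first: bool = True) -> str:
    """Join the pending style text and the source text around a preset keyword."""
    if style_first:
        return f"{style_text}{keyword}{source_text}"  # style text + keyword + source text
    return f"{source_text}{keyword}{style_text}"      # source text + keyword + style text
```

With the default `style_first=True`, the example above yields the first spliced form. Setting `style_first=False` yields the second.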

In some other embodiments, the source text and the pending style text may be input into the text translation model separately to obtain the target translation text.

In some embodiments, the pending style text may serve as a monolingual prompt for the source text. It prompts the text translation model so that the translated target translation text retains the style text, or the target language style corresponding to it. In other words, the prompt makes the language style of the output target translation text the target language style.

In other embodiments, the text translation model may determine the target language style from the pending style text, so that the language style of the translated target translation text is the same as or similar to the target language style.

It should be noted that the text translation model may be a machine translation model from the related art, such as a Transformer model or a BERT model, which is not limited in the present disclosure.

With the above method, a source text to be translated and a target language style are determined. Next, a pending style text is determined, according to the source text, from a plurality of style texts corresponding to the target language style. Then the source text and the pending style text are input into a pre-generated text translation model to obtain a target translation text whose language style is the target language style. The pending style text therefore assists the translation process of the text translation model. As a result, the model can achieve accurate stylized translation without being trained on a large amount of stylized bilingual text. This reduces the training complexity of the text translation model and also improves the accuracy of its stylized translation.

Fig. 2 is a flowchart of step S102 according to the embodiment shown in Fig. 1. As shown in Fig. 2, step S102 may include the following sub-steps:

S1021. Determine a target style text set corresponding to the target language style.

The target style text set may be a pre-generated text set. It includes a plurality of style texts and a first vector corresponding to each style text. Different language styles may correspond to different style text sets, and the style texts in the same style text set may share the same language style.

For example, when the target language style is American English, the target style text set may be an American English style text set.

S1022. Input the source text into a pre-generated multilingual encoding model to obtain a second vector output by the multilingual encoding model.

It should be noted that the multilingual encoding model may be developed on the basis of XLM-RoBERTa. However, this embodiment is not limited to the XLM-RoBERTa network structure, and other neural network structures may also be used. XLM-RoBERTa is a typical multilingual pre-trained model. It is a Transformer-based language model, pre-trained with a masked language modeling objective, and can process text in about 100 languages.

In some embodiments, the second vector may be a 512-dimensional or 768-dimensional vector.
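The disclosure does not say how a single fixed-length second vector is obtained from a Transformer encoder such as XLM-RoBERTa, which outputs one hidden vector per token. One common choice, assumed here rather than taken from the source, is mean pooling over the non-padding tokens. With the base-size XLM-RoBERTa, whose hidden size is 768, this yields a 768-dimensional vector. The helper `mean_pool` is hypothetical and works on plain lists so that the sketch stays self-contained.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]], attention_mask: Sequence[int]) -> List[float]:
    """Average the per-token vectors whose mask value is 1; padded positions (mask 0) are skipped."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("the attention mask selects no tokens")
    dim = len(kept[0])
    # Component-wise mean over the kept tokens gives one vector of the encoder's hidden size.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
```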

In some embodiments, the first vectors may likewise be obtained by encoding the style texts with the same multilingual encoding model.
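The target style text sets described in S1021 could be held in memory as sketched below, with each style text stored next to its first vector from the same encoder used for source texts. `StyleTextSet`, `build_style_text_sets` and the `encode` callable are hypothetical. In practice `encode` would wrap the multilingual encoding model, and a large corpus might call for a dedicated vector index rather than plain lists.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Vector = List[float]


@dataclass
class StyleTextSet:
    """Style texts of one language style, each paired with its first vector."""
    style: str
    texts: List[str] = field(default_factory=list)
    vectors: List[Vector] = field(default_factory=list)


def build_style_text_sets(
    corpus: Dict[str, Sequence[str]],
    encode: Callable[[str], Vector],
) -> Dict[str, StyleTextSet]:
    """Build one style text set per language style, encoding every style text."""
    sets: Dict[str, StyleTextSet] = {}
    for style, texts in corpus.items():
        entry = StyleTextSet(style=style)
        for text in texts:
            entry.texts.append(text)
            entry.vectors.append(encode(text))  # same encoder as for source texts
        sets[style] = entry
    return sets
```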

S1023. Determine the pending style text from the target style text set according to the second vector and the first vectors.

In some embodiments, a third vector may be determined from the first vectors of the target style text set. The third vector is the one with the smallest vector distance to the second vector, that is, the highest similarity. The style text corresponding to the third vector is then used as the pending style text.

In other embodiments, the third vector with the smallest vector distance to the second vector is likewise determined from the first vectors of the target style text set. However, the style text corresponding to the third vector is used as the pending style text only if the vector distance between the third vector and the second vector is less than or equal to a preset distance threshold.
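A minimal sketch of the thresholded nearest-neighbor lookup follows. Euclidean distance is an assumed metric, since the disclosure only says "vector distance", and `retrieve_pending_style_text` is a hypothetical name. The disclosure does not say what happens when the nearest vector exceeds the threshold. Returning `None` here is an assumption that leaves the decision to the caller.

```python
import math
from typing import List, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_pending_style_text(
    second_vector: Sequence[float],
    style_texts: List[str],
    first_vectors: List[Sequence[float]],
    max_distance: Optional[float] = None,
) -> Optional[str]:
    """Return the style text whose first vector (the "third vector") is nearest.

    With max_distance set, return None when even the nearest vector is farther
    than the threshold; without it, always return the nearest style text.
    """
    if not style_texts:
        return None
    distances = [euclidean(second_vector, v) for v in first_vectors]
    best = min(range(len(distances)), key=distances.__getitem__)
    if max_distance is not None and distances[best] > max_distance:
        return None
    return style_texts[best]
```

Calling it without `max_distance` corresponds to the previous embodiment, and passing a threshold corresponds to this one.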

In this way, the pending style text can be determined, according to the source text, from the plurality of style texts corresponding to the target language style.

Thus, the multilingual encoding model enables cross-language retrieval of the pending style text corresponding to the source text. The retrieved pending style text can then be spliced with the source text to steer the language style of the target translation text output by the text translation model.

Fig. 3 is a schematic diagram of a translation method according to an exemplary embodiment. Fig. 3 illustrates the data flow of the translation method using three source texts as examples.

First, the source texts to be translated are determined.

Taking Fig. 3 as an example, the three source texts may include:

- source text Case1, "我不知道，他也不会告诉我。" ("I don't know, and he won't tell me.");
- source text Case2, "一个比我爱的人更美丽的女人？" ("A woman more beautiful than the one I love?");
- source text Case3, "现在我告诉你，这样你就不用问了。" ("I'll tell you now, so you don't have to ask.").

Second, the three source texts are input into the pre-generated multilingual encoding model to obtain the second vector output by the model for each source text.

For example, the second vector for source text Case1 is [0.01, 0.02, -0.03, …, 0.05, 0.37]; for Case2 it is [0.09, 0.04, -0.01, …, 0.17, 0.07]; and for Case3 it is [-0.01, 0.07, 0.05, …, 0.23, 0.02].

Third, for each source text, the corresponding pending style text is determined from the target style text set according to that source text's second vector.

The first vector corresponding to the pending style text has the smallest vector distance to the second vector, that is, the highest similarity with it.

For example:

- the vector distance between source text Case1 and its top-1 nearest-neighbor vector is 20, and pending style text 1 for Case1 is "I tell thee, Kate, 'twas burnt and dried away.";
- the vector distance between Case2 and its top-1 nearest-neighbor vector is 40, and pending style text 2 for Case2 is "And I the king shall love thee.";
- the vector distance between Case3 and its top-1 nearest-neighbor vector is 30, and pending style text 3 for Case3 is "Now let me see if I can conster it.".

Next, the undetermined style text is spliced with the source text to obtain the target spliced text.

For example, target spliced text 1 for Case1 is "I tell thee, Kate, 'twas burnt and dried away.<token>我不知道，他也不会告诉我。"; target spliced text 2 for Case2 is "And I the king shall love thee.<token>一个比我爱的人更美丽的女人？"; and target spliced text 3 for Case3 is "Now let me see if I can conster it.<token>现在我告诉你，这样你就不用问了。". In each case the style text comes first, followed by the separator <token> and then the original source text.
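A minimal sketch of the splicing step, assuming the separator is the literal string `<token>` as it appears in the examples above (in a real system it would more likely be a reserved special token in the model's vocabulary). The function name `build_target_spliced_text` is hypothetical.

```python
SEPARATOR = "<token>"  # separator as written in the examples above


def build_target_spliced_text(pending_style_text: str, source_text: str,
                              separator: str = SEPARATOR) -> str:
    """Prepend the undetermined style text to the source text, joined by a separator."""
    return f"{pending_style_text}{separator}{source_text}"


spliced = build_target_spliced_text(
    "Now let me see if I can conster it.",
    "现在我告诉你，这样你就不用问了。",
)
print(spliced)
# Now let me see if I can conster it.<token>现在我告诉你，这样你就不用问了。
```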

Finally, the target spliced text is input into the text translation model to obtain the target translation text output by the model.

Splicing the undetermined style text onto the source text in this way steers the language style of the target translation text output by the text translation model, enabling accurate stylized translation.

In some embodiments of the present disclosure, the target style text set may be a pre-generated text set. Fig. 4 is a flowchart of a method for generating a target style text set according to an exemplary embodiment. As shown in Fig. 4, the target style text set may be pre-generated as follows:

S301. Acquire multiple style texts corresponding to the target language style.

In this step, the style texts of the target language style may be collected manually, or obtained by searching the Internet.

S302. For each style text, input the style text into the multilingual encoding model to obtain the first vector output by the model.

Likewise, the multilingual encoding model may be a model developed on the basis of XLM-RoBERTa, and the first vector may be a 512-dimensional or 768-dimensional vector.

In some embodiments, the first vector and the second vector may have the same dimension, for example, both 768-dimensional.

S303. Generate the target style text set from the style texts and their first vectors.

In some embodiments, the target style text set may be generated from the style texts and their first vectors using a preset retrieval algorithm. Any algorithm capable of retrieving text vectors may be used, for example an ANN (Approximate Nearest Neighbor) algorithm or a text vector retrieval engine from the related art; the present disclosure does not limit this.

For example, a retrieval pair may be generated from each style text and its corresponding first vector, either as (style text, first vector) or as (first vector, style text). The retrieval pairs may be fed into a text vector retrieval engine and trained (indexed) with an ANN (Approximate Nearest Neighbor) algorithm to generate the target style text set. The resulting target style text set may be a retrieval library in which the style text serves as the data (Value) and the first vector serves as the retrieval key (Key).

In this way, the target style text set can be generated in advance.
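Steps S301 to S303 can be sketched as a small key/value store. This sketch rests on stated assumptions: `toy_encode` is a hypothetical stand-in for the XLM-RoBERTa-based multilingual encoding model (it only buckets character codes so the example runs), and exact brute-force search replaces the ANN algorithm or vector retrieval engine mentioned above, which is adequate only for small style text sets. `StyleTextStore` is an illustrative name.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def toy_encode(text: str, dim: int = 8) -> Vector:
    """Hypothetical stand-in for the multilingual encoding model.

    Counts characters into `dim` buckets by code point and L2-normalises
    the result. It has no real semantics; it only makes the sketch runnable.
    """
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class StyleTextStore:
    """Retrieval library: first vector as Key, style text as Value.

    Uses exact brute-force search instead of an ANN index.
    """

    def __init__(self, encode: Callable[[str], Vector]) -> None:
        self.encode = encode
        self.keys: List[Vector] = []   # first vectors
        self.values: List[str] = []    # style texts

    def add(self, style_text: str) -> None:
        # S302: encode the style text into its first vector.
        first_vector = self.encode(style_text)
        # S303: store the retrieval pair (Key = first vector, Value = style text).
        self.keys.append(first_vector)
        self.values.append(style_text)

    def nearest(self, query_text: str) -> Tuple[str, float]:
        """Return the style text closest to query_text and its distance."""
        query = self.encode(query_text)
        best_index, best_distance = -1, math.inf
        for index, key in enumerate(self.keys):
            distance = math.dist(key, query)
            if distance < best_distance:
                best_index, best_distance = index, distance
        if best_index < 0:
            raise LookupError("style text store is empty")
        return self.values[best_index], best_distance


# S301: style texts of the target language style (from the examples above).
store = StyleTextStore(toy_encode)
for text in [
    "I tell thee, Kate, 'twas burnt and dried away.",
    "And I the king shall love thee.",
    "Now let me see if I can conster it.",
]:
    store.add(text)
```

Swapping `toy_encode` for a real sentence encoder and the linear scan for an ANN index would leave the Key/Value layout unchanged.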

In some embodiments of the present disclosure, the target style text set may also be used to train the text translation model. For example, the text translation model may be pre-generated as follows:

First, obtain translation training samples.

The translation training samples include a plurality of source sample texts and a target sample text corresponding to each source sample text.

Next, for each source sample text, determine the corresponding style sample text from the target style text set.

In some embodiments, the source sample text may be input into the multilingual encoding model to obtain a third vector output by the model, and the style sample text is then determined from the target style text set according to the third vector and the first vectors in that set.

Finally, train a preset translation model on the style sample texts, source sample texts and target sample texts to obtain the text translation model.

For example, the style sample text and the source sample text may be spliced into a spliced source sample text; the spliced source sample text and the target sample text then form a new translation training sample, on which the preset translation model is trained to obtain the text translation model.
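The construction of the new training samples can be sketched as follows. The sketch assumes the same `<token>` separator as at inference time, and `lookup_style_text` is a hypothetical callable standing in for the encode-and-retrieve step based on the third vector. The sample strings are placeholders, and training the preset translation model itself is out of scope here.

```python
from typing import Callable, Iterable, List, Tuple


def build_styled_training_samples(
    samples: Iterable[Tuple[str, str]],
    lookup_style_text: Callable[[str], str],
    separator: str = "<token>",
) -> List[Tuple[str, str]]:
    """Turn (source sample, target sample) pairs into (spliced source, target) pairs.

    lookup_style_text stands in for: encode the source sample text into a
    third vector and return the nearest style text from the target style set.
    """
    new_samples = []
    for source, target in samples:
        style_sample_text = lookup_style_text(source)
        spliced_source = f"{style_sample_text}{separator}{source}"
        new_samples.append((spliced_source, target))
    return new_samples


# Placeholder samples and a trivial lookup, for illustration only.
pairs = [("source sentence A", "target sentence A"),
         ("source sentence B", "target sentence B")]
for row in build_styled_training_samples(pairs, lambda s: "And I the king shall love thee."):
    print(row)
```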

Training the text translation model with the target style text set in this way further improves its accuracy.

The translation method provided in some embodiments of the present disclosure can be divided into an offline stage and an online stage. For example, the target style text set and the text translation model may first be generated offline, and online translation is then performed with them. This reduces latency in the online translation stage and improves translation efficiency.

Furthermore, with the approach of the embodiments of the present disclosure, when a new language style emerges, a style text set for that style can be generated offline and used to steer the translation style of the text translation model, without collecting a large number of new samples to retrain the model. This gives the method a zero-shot learning capability.
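The offline/online split and the zero-shot addition of a new style can be illustrated with a small wrapper class. Everything here is hypothetical: `StylizedTranslator`, the style name, the length-based encoder and the echoing `model` are placeholders so the sketch runs, and Euclidean distance is an assumed metric. A real deployment would plug in the multilingual encoding model and the trained text translation model. Registering a new style only builds a new style text set; the model is left untouched.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


class StylizedTranslator:
    """Offline: build one style text set per language style.
    Online: retrieve the nearest style text, splice, translate."""

    def __init__(self, encode: Callable[[str], Vector],
                 model: Callable[[str], str], separator: str = "<token>") -> None:
        self.encode = encode
        self.model = model
        self.separator = separator
        self.style_sets: Dict[str, List[Tuple[Vector, str]]] = {}

    # ---- offline stage: adding a style never touches the model ----
    def register_style(self, style: str, style_texts: Sequence[str]) -> None:
        self.style_sets[style] = [(self.encode(t), t) for t in style_texts]

    # ---- online stage: only encoding, a lookup and one model call ----
    def translate(self, source_text: str, style: str) -> str:
        entries = self.style_sets.get(style)
        if not entries:
            raise KeyError(f"no style text set registered for {style!r}")
        query = self.encode(source_text)
        _, pending = min(entries, key=lambda e: math.dist(e[0], query))
        return self.model(f"{pending}{self.separator}{source_text}")


# Placeholder encoder (text length) and an echo "model", for illustration only.
translator = StylizedTranslator(encode=lambda s: [float(len(s))], model=lambda s: s)
translator.register_style("early_modern_english",
                          ["Hark.", "And I the king shall love thee."])
print(translator.translate("hello", "early_modern_english"))
# Hark.<token>hello
```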

Fig. 4 is a block diagram of a translation device 400 according to an exemplary embodiment. As shown in Fig. 4, the device 400 may include:

a first determination module 401, configured to determine the source text to be translated and the target language style;

a second determination module 402, configured to determine, according to the source text, an undetermined style text from a plurality of style texts corresponding to the target language style; and

a translation module 403, configured to input the source text and the undetermined style text into a pre-generated text translation model to obtain a target translation text output by the text translation model, the language style of the target translation text being the target language style.

According to one or more embodiments of the present disclosure, the second determination module 402 is configured to: determine a target style text set corresponding to the target language style, where the target style text set is a pre-generated text set including a plurality of the style texts and a first vector corresponding to each style text, different language styles correspond to different style text sets, and the style texts within one style text set share the same language style; input the source text into a pre-generated multilingual encoding model to obtain a second vector output by the multilingual encoding model; and determine the undetermined style text from the target style text set according to the second vector and the first vectors.

According to one or more embodiments of the present disclosure, the second determination module 402 is configured to determine, among the first vectors of the target style text set, a third vector having the smallest vector distance to the second vector, and to take the style text corresponding to the third vector as the undetermined style text.

Fig. 5 is a block diagram of another translation device according to an exemplary embodiment. As shown in Fig. 5, the device 400 may further include:

a generation module 404, configured to acquire a plurality of style texts corresponding to the target language style; for each style text, input the style text into the multilingual encoding model to obtain a first vector output by the multilingual encoding model; and generate the target style text set from the style texts and the first vectors.

According to one or more embodiments of the present disclosure, the generation module 404 is configured to generate the target style text set from the style texts and the first vectors based on a preset retrieval algorithm.

According to one or more embodiments of the present disclosure, the generation module 404 is further configured to: obtain translation training samples, which include a plurality of source sample texts and a target sample text corresponding to each source sample text; determine, according to each source sample text, the corresponding style sample text from the target style text set; and train a preset translation model on the style sample texts, the source sample texts and the target sample texts to obtain the text translation model.

According to one or more embodiments of the present disclosure, the translation module 403 is configured to splice the source text and the undetermined style text to obtain a target spliced text, and to input the target spliced text into the text translation model to obtain the target translation text.

For the devices in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.

Referring now to Fig. 6, it shows a schematic structural diagram of an electronic device 2000 (for example, a terminal device or a server) suitable for implementing the embodiments of the present disclosure. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (for example, vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. Servers in the embodiments of the present disclosure may include, but are not limited to, local servers, cloud servers, single servers and distributed servers. The electronic device shown in Fig. 6 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.

As shown in Fig. 6, the electronic device 2000 may include a processing device 2001 (for example, a central processing unit or a graphics processing unit), which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 2002 or a program loaded from a storage device 2008 into a random access memory (RAM) 2003. The RAM 2003 also stores various programs and data required for the operation of the electronic device 2000. The processing device 2001, the ROM 2002 and the RAM 2003 are connected to one another via a bus 2004. An input/output (I/O) interface 2005 is also connected to the bus 2004.

Generally, the following devices may be connected to the I/O interface 2005: input devices 2006 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer and a gyroscope; output devices 2007 including, for example, a liquid crystal display (LCD), a speaker and a vibrator; storage devices 2008 including, for example, a magnetic tape and a hard disk; and a communication device 2009. The communication device 2009 may allow the electronic device 2000 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 6 shows the electronic device 2000 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 2009, installed from the storage device 2008, or installed from the ROM 2002. When the computer program is executed by the processing device 2001, the above functions defined in the methods of the embodiments of the present disclosure are performed.

It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), or any suitable combination of the above.

In some implementations, the client and the server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (for example, the Internet), peer-to-peer networks (for example, ad hoc peer-to-peer networks), and any currently known or future-developed network.

The computer-readable medium may be included in the electronic device described above, or it may exist separately without being assembled into the electronic device.

The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determine a source text to be translated and a target language style; determine, according to the source text, an undetermined style text from a plurality of style texts corresponding to the target language style; and input the source text and the undetermined style text into a pre-generated text translation model to obtain a target translation text output by the text translation model, the language style of the target translation text being the target language style.

Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as "C" or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. It should further be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks therein, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a module does not, in some cases, constitute a limitation on the module itself; for example, the first determination module may also be described as "a module for determining the source text to be translated and the target language style".

The functions described above herein may be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs) and so on.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium, and may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the above. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

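According to one or more embodiments of the present disclosure, a translation method is provided, the method including: determining a source text to be translated and a target language style; determining, according to the source text, an undetermined style text from a plurality of style texts corresponding to the target language style; and inputting the source text and the undetermined style text into a pre-generated text translation model to obtain a target translation text output by the text translation model, the language style of the target translation text being the target language style.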

According to one or more embodiments of the present disclosure, determining, according to the source text, the undetermined style text from the plurality of style texts corresponding to the target language style includes:

determining a target style text set corresponding to the target language style, where the target style text set is a pre-generated text set including a plurality of the style texts and a first vector corresponding to each style text, different language styles correspond to different style text sets, and the style texts within one style text set share the same language style;

inputting the source text into a pre-generated multilingual encoding model to obtain a second vector output by the multilingual encoding model; and

determining the undetermined style text from the target style text set according to the second vector and the first vectors.

According to one or more embodiments of the present disclosure, determining the undetermined style text from the target style text set according to the second vector and the first vectors includes:

determining, among the first vectors of the target style text set, a third vector having the smallest vector distance to the second vector; and

taking the style text corresponding to the third vector as the undetermined style text.

According to one or more embodiments of the present disclosure, the target style text set is pre-generated as follows:

acquiring a plurality of the style texts corresponding to the target language style;

for each style text, inputting the style text into the multilingual encoding model to obtain a first vector output by the multilingual encoding model; and

generating the target style text set from the style texts and the first vectors.

According to one or more embodiments of the present disclosure, generating the target style text set from the style texts and the first vectors includes:

generating the target style text set from the style texts and the first vectors based on a preset retrieval algorithm.

According to one or more embodiments of the present disclosure, the text translation model is pre-generated as follows:

obtaining translation training samples, which include a plurality of source sample texts and a target sample text corresponding to each source sample text;

determining, according to each source sample text, the corresponding style sample text from the target style text set; and

training a preset translation model on the style sample texts, the source sample texts and the target sample texts to obtain the text translation model.

According to one or more embodiments of the present disclosure, inputting the source text and the undetermined style text into the pre-generated text translation model to obtain the target translation text output by the text translation model includes:

splicing the source text and the undetermined style text to obtain a target spliced text; and

inputting the target spliced text into the text translation model to obtain the target translation text.

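According to one or more embodiments of the present disclosure, a translation device is provided, the device including: a first determination module configured to determine a source text to be translated and a target language style; a second determination module configured to determine, according to the source text, an undetermined style text from a plurality of style texts corresponding to the target language style; and a translation module configured to input the source text and the undetermined style text into a pre-generated text translation model to obtain a target translation text output by the text translation model, the language style of the target translation text being the target language style.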

According to one or more embodiments of the present disclosure, the second determination module is configured to: determine a target style text set corresponding to the target language style, where the target style text set is a pre-generated text set including a plurality of the style texts and a first vector corresponding to each style text, different language styles correspond to different style text sets, and the style texts within one style text set share the same language style; input the source text into a pre-generated multilingual encoding model to obtain a second vector output by the multilingual encoding model; and determine the undetermined style text from the target style text set according to the second vector and the first vectors.

According to one or more embodiments of the present disclosure, the second determination module is configured to determine, among the first vectors of the target style text set, a third vector having the smallest vector distance to the second vector, and to take the style text corresponding to the third vector as the undetermined style text.

According to one or more embodiments of the present disclosure, the device further includes:

a generation module, configured to acquire a plurality of the style texts corresponding to the target language style; for each style text, input the style text into the multilingual encoding model to obtain a first vector output by the multilingual encoding model; and generate the target style text set from the style texts and the first vectors.

According to one or more embodiments of the present disclosure, the generating module is configured to generate the target style text set from the style texts and the first vectors based on a preset retrieval algorithm.
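The disclosure leaves the "preset retrieval algorithm" unspecified. One plausible reading is a nearest-neighbour index built over the first vectors; the sketch below is an exact, brute-force index in plain Python (a hypothetical `FlatStyleIndex` class), standing in for a dedicated similarity-search library such as Faiss that a real system might use instead. Euclidean distance via `math.dist` is an assumed metric.

```python
import heapq
import math
from typing import List, Sequence, Tuple


class FlatStyleIndex:
    """Brute-force nearest-neighbour index over style-text vectors.
    A hypothetical stand-in for the unspecified preset retrieval algorithm."""

    def __init__(self) -> None:
        self._texts: List[str] = []
        self._vectors: List[List[float]] = []

    def add(self, text: str, vector: Sequence[float]) -> None:
        """Register one style text together with its first vector."""
        self._texts.append(text)
        self._vectors.append(list(vector))

    def search(self, query: Sequence[float], k: int = 1) -> List[Tuple[float, str]]:
        """Return up to k (distance, text) pairs closest to `query`, nearest first."""
        scored = (
            (math.dist(query, vec), text)
            for vec, text in zip(self._vectors, self._texts)
        )
        return heapq.nsmallest(k, scored)
```

With `k=1`, `search` returns exactly the minimum-distance style text used as the pending style text; larger `k` values would allow several style texts to be considered, which the disclosure does not describe.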

According to one or more embodiments of the present disclosure, the generating module is further configured to: obtain translation training samples, which include a plurality of source sample texts and a target sample text corresponding to each source sample text; for each source sample text, determine a corresponding style sample text from the target style text set; and train a preset translation model according to the style sample texts, the source sample texts, and the target sample texts to obtain the text translation model.
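Preparing the training data described above amounts to attaching a retrieved style sample text to every (source sample text, target sample text) pair. The sketch below shows only that data-preparation step; the training of the preset translation model itself is not shown. `StyledExample`, `build_training_examples`, and the `retrieve` callable (assumed to encode a source sample text and return the nearest style text from the target style text set) are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class StyledExample:
    style_sample: str   # style text retrieved from the target style text set
    source_sample: str  # source sample text
    target_sample: str  # target sample text (reference translation)


def build_training_examples(
    pairs: Iterable[Tuple[str, str]],
    retrieve: Callable[[str], str],
) -> List[StyledExample]:
    """Attach a retrieved style sample text to each (source, target) pair.
    `retrieve` is an assumed callable wrapping encoding plus nearest-vector lookup."""
    return [StyledExample(retrieve(src), src, tgt) for src, tgt in pairs]
```

Because the style sample text is retrieved by the same procedure at training time and at inference time, the model sees consistently constructed inputs in both phases.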

According to one or more embodiments of the present disclosure, the translation module is configured to splice the source text and the pending style text to obtain a target spliced text, and to input the target spliced text into the text translation model to obtain the target translation text.
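The splicing step can be illustrated with a one-line helper. The disclosure states neither the order of the two texts nor any separator, so the source-first order and the `<sep>` token below are assumptions; `splice` and `SEP_TOKEN` are hypothetical names. Whatever format is chosen, the same one would need to be applied when splicing training inputs.

```python
SEP_TOKEN = "<sep>"  # hypothetical separator; the disclosure does not name one


def splice(source_text: str, pending_style_text: str, sep: str = SEP_TOKEN) -> str:
    """Concatenate the source text and the pending style text (assumed
    source-first order) into the target spliced text fed to the model."""
    return f"{source_text} {sep} {pending_style_text}"
```

For instance, `splice("How are you?", "Would you be so kind as to help?")` yields `"How are you? <sep> Would you be so kind as to help?"`.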

The above description is merely a description of preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

In addition, although operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims. For the device in the above embodiments, the specific manner in which each module performs operations has been described in detail in the method embodiments and is not repeated here.

Claims

1. A translation method, characterized in that the method comprises:

determining a source text to be translated and a target language style;

determining, according to the source text, a pending style text from a plurality of style texts corresponding to the target language style; and

inputting the source text and the pending style text into a pre-generated text translation model to obtain a target translation text output by the text translation model, wherein the language style of the target translation text is the target language style.

2. The method according to claim 1, wherein determining, according to the source text, the pending style text from the plurality of style texts corresponding to the target language style comprises:

determining a target style text set corresponding to the target language style, wherein the target style text set is a pre-generated text set that includes a plurality of the style texts and a first vector corresponding to each style text, different language styles correspond to different style text sets, and the style texts in the same style text set correspond to the same language style;

inputting the source text into a pre-generated multilingual encoding model to obtain a second vector output by the multilingual encoding model; and

determining the pending style text from the target style text set according to the second vector and the first vectors.

3. The method according to claim 2, wherein determining the pending style text from the target style text set according to the second vector and the first vectors comprises:

determining, from the first vectors of the target style text set, a third vector having the smallest vector distance to the second vector; and

using the style text corresponding to the third vector as the pending style text.

4. The method according to claim 2, wherein the target style text set is pre-generated in the following manner:

obtaining a plurality of style texts corresponding to the target language style;

for each style text, inputting the style text into the multilingual encoding model to obtain a first vector output by the multilingual encoding model; and

generating the target style text set according to the style texts and the first vectors.

5. The method according to claim 4, wherein generating the target style text set according to the style texts and the first vectors comprises:

generating the target style text set from the style texts and the first vectors based on a preset retrieval algorithm.

6. The method according to claim 2, wherein the text translation model is pre-generated in the following manner:

obtaining translation training samples, wherein the translation training samples include a plurality of source sample texts and a target sample text corresponding to each source sample text;

determining, according to each source sample text, a style sample text corresponding to the source sample text from the target style text set; and

training a preset translation model according to the style sample texts, the source sample texts, and the target sample texts to obtain the text translation model.

7. The method according to any one of claims 1 to 6, wherein inputting the source text and the pending style text into the pre-generated text translation model to obtain the target translation text output by the text translation model comprises:

splicing the source text and the pending style text to obtain a target spliced text; and

inputting the target spliced text into the text translation model to obtain the target translation text.

8. A translation device, characterized in that the device comprises:

a first determining module, configured to determine a source text to be translated and a target language style;

a second determining module, configured to determine, according to the source text, a pending style text from a plurality of style texts corresponding to the target language style; and

a translation module, configured to input the source text and the pending style text into a pre-generated text translation model to obtain a target translation text output by the text translation model, wherein the language style of the target translation text is the target language style.

9. A computer-readable medium, on which a computer program is stored, characterized in that, when the computer program is executed by a processing device, the steps of the method according to any one of claims 1 to 7 are implemented.

10. An electronic device, characterized in that it comprises:

a storage device on which a computer program is stored;

a processing device, configured to execute the computer program in the storage device to implement the steps of the method according to any one of claims 1 to 7.