CN106228986A

CN106228986A - Automated testing method, device and system for a speech recognition engine

Info

Publication number: CN106228986A
Application number: CN201610597319.6A
Authority: CN
Inventors: 李伟; 李龙; 杜冰
Original assignee: Beijing Qihoo Technology Co Ltd; Qizhi Software Beijing Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd; Qizhi Software Beijing Co Ltd
Priority date: 2016-07-26
Filing date: 2016-07-26
Publication date: 2016-12-14

Abstract

The invention discloses an automated testing method, device and system for a speech recognition engine. The method includes: playing the current test voice signal, so that a designated terminal device acting as the receiving end receives the voice signal, sends it to the speech recognition engine, and obtains a speech recognition result from the speech recognition engine; receiving the speech recognition result information, corresponding to the current test voice signal, that is output by the designated terminal device; and verifying the speech recognition result information to implement a speech recognition test of the speech recognition engine. In summary, the invention plays the voice signals in the voice library to be tested; the designated device acting as the receiving end sends each voice signal to the speech recognition engine and receives the returned recognition result; the recognition result is then verified, and the verification result determines automatically whether to proceed to the next test. This realizes automated testing of the speech recognition engine and eliminates time-consuming, labor-intensive manual testing.

Description

An automated testing method, device and system for a speech recognition engine

Technical field

The invention relates to the technical field of testing, and in particular to an automated testing method, device and system for a speech recognition engine.

Background

With advances in science and technology, machines and electronic devices are becoming increasingly intelligent. In particular, the development of speech recognition technology lets people communicate with machines and devices by voice, so that the machines and devices understand what is said. Over the past two decades speech recognition technology has made remarkable progress and has begun to move from the laboratory to the market. It is expected that within the next 10 years speech recognition technology will enter fields such as industry, home appliances, communications, automotive electronics, medical care, home services, and consumer electronics. The spread of this technology, however, depends on speech recognition engines with excellent performance. Such an engine must recognize different utterances, different semantics, and even different accents, with recognition results that meet the recognition standard. It also needs the support of a huge voice database that can accommodate a wide variety of speech. A high-performance speech recognition engine backed by such a huge voice database cannot be tested by traditional manual methods, so how to automate the testing of a speech recognition engine has become a problem in urgent need of a solution.

Summary of the invention

In view of the above problems, the present invention is proposed to provide an automated testing method, device and system for a speech recognition engine that overcome the above problems or at least partially solve them.

According to one aspect of the present invention, an automated testing method for a speech recognition engine is provided, wherein the method comprises:

playing the current test voice signal, so that a designated terminal device acting as the receiving end receives the voice signal, sends it to the speech recognition engine, and obtains a speech recognition result from the speech recognition engine;

receiving speech recognition result information that is output by the designated terminal device and corresponds to the current test voice signal;

verifying the speech recognition result information, so as to implement a speech recognition test of the speech recognition engine.

Optionally, the method further includes:

judging, according to the verification result, whether the recognition of the current test voice signal reaches a preset standard; if so, playing the next test voice signal; otherwise, playing the current test voice signal again.

Optionally, the method further includes:

playing the next test voice signal when the number of times the recognition of the current test voice signal fails to reach the preset standard reaches a preset value.

Optionally, playing the current test voice signal includes:

traversing the voice signals in a test voice library in turn, and playing the currently traversed voice signal as the current test voice signal;

or,

traversing the text data in a text database in turn, converting the currently traversed text data into a voice signal using text-to-speech (TTS) technology, and playing that voice signal as the current test voice signal.

Optionally, the method further includes:

generating a speech recognition test report for the speech recognition engine according to the verification results of the speech recognition result information corresponding to each test voice signal.

Optionally, receiving the speech recognition result information that is output by the designated terminal device and corresponds to the current test voice signal includes:

receiving, through a sound pickup module, a voice signal that is output by the designated terminal device and contains the recognition result information;

and/or,

receiving, through a wireless communication module or a network module, a wireless signal that is output by the designated terminal device and contains the recognition result information.

Optionally, verifying the speech recognition result information includes:

saving expected recognition information for the test voice signals in advance;

comparing the speech recognition result information with the expected recognition information corresponding to the current test voice signal.

Optionally, the designated terminal device sends the test voice signal to the speech recognition engine through a user interface of the speech recognition engine that provides the speech recognition service;

or,

the designated terminal device sends the test voice signal to the speech recognition engine through a specific programming interface provided by the speech recognition engine.

Optionally, the method further includes:

receiving semantic recognition result information that is output by the designated terminal device and corresponds to the test voice signal, wherein the speech recognition engine, after semantically recognizing the test voice signal, performs an operation on the designated terminal device, and the designated terminal device, on receiving that operation, generates the semantic recognition result information corresponding to it;

verifying the semantic recognition result information, so as to implement a semantic recognition test of the speech recognition engine.

According to another aspect of the present invention, an automated testing device for a speech recognition engine is provided, wherein the device comprises:

a playback unit, adapted to play the current test voice signal, so that a designated terminal device acting as the receiving end receives the voice signal, sends it to the speech recognition engine, and obtains a speech recognition result from the speech recognition engine;

a receiving unit, adapted to receive speech recognition result information that is output by the designated terminal device and corresponds to the current test voice signal;

a test verification unit, adapted to verify the speech recognition result information, so as to implement a speech recognition test of the speech recognition engine.

Optionally, the device further includes:

a judging unit, adapted to judge, according to the verification result of the test verification unit, whether the recognition of the current test voice signal reaches a preset standard; if so, to notify the playback unit to play the next test voice signal; otherwise, to notify the playback unit to play the current test voice signal again.

Optionally, the judging unit is further adapted to notify the playback unit to play the next test voice signal when the number of times the recognition of the current test voice signal fails to reach the preset standard reaches a preset value.

Optionally, the playback unit is adapted to traverse the voice signals in a test voice library in turn and play the currently traversed voice signal as the current test voice signal; or is adapted to traverse the text data in a text database in turn, convert the currently traversed text data into a voice signal using text-to-speech (TTS) technology, and play that voice signal as the current test voice signal.

Optionally, the device further includes:

a report generation unit, adapted to generate a speech recognition test report for the speech recognition engine according to the verification results of the speech recognition result information corresponding to each test voice signal.

Optionally, the receiving unit is adapted to receive, through a sound pickup module, a voice signal that is output by the designated terminal device and contains the recognition result information; and/or to receive, through a wireless communication module or a network module, a wireless signal that is output by the designated terminal device and contains the recognition result information.

Optionally, the test verification unit is adapted to save expected recognition information for the test voice signals in advance, and to compare the speech recognition result information with the expected recognition information corresponding to the current test voice signal.

Optionally, the receiving unit is further adapted to receive semantic recognition result information that is output by the designated terminal device and corresponds to the test voice signal, wherein the speech recognition engine, after semantically recognizing the test voice signal, performs an operation on the designated terminal device, and the designated terminal device, on receiving that operation, generates the semantic recognition result information corresponding to it;

the test verification unit is further adapted to verify the semantic recognition result information, so as to implement a semantic recognition test of the speech recognition engine.

According to yet another aspect of the present invention, an automated testing system for a speech recognition engine is provided, wherein the system includes a terminal device acting as the sending end and a terminal device acting as the receiving end;

the terminal device acting as the sending end includes the automated testing device for a speech recognition engine described in any one of the foregoing.

Optionally, the terminal device acting as the receiving end includes:

a receiving unit, adapted to receive the test voice signal sent by the automated testing device for the speech recognition engine;

a speech recognition processing unit, adapted to send the test voice signal to the speech recognition engine and to obtain a speech recognition result from the speech recognition engine;

an output unit, adapted to output speech recognition result information corresponding to the current test voice signal.

Optionally, the speech recognition processing unit is adapted to send the test voice signal to the speech recognition engine through a user interface of the speech recognition engine that provides the speech recognition service; or through a specific programming interface provided by the speech recognition engine.

Optionally, the speech recognition processing unit is adapted to generate semantic recognition result information corresponding to an operation after the terminal device acting as the receiving end receives that operation, which the speech recognition engine performs on the terminal device after semantically recognizing the test voice signal;

the output unit is further adapted to output the semantic recognition result information corresponding to the current test voice signal.

It can be seen that the terminal device acting as the sending end generates the voice library to be tested and plays the currently traversed voice in that library through the voice playback module of a third-party application. After the designated terminal device acting as the receiving end receives the current voice signal, it sends the signal to the speech recognition engine; the engine recognizes the signal and returns the recognition result to the designated terminal device. The speech recognition result output by the designated terminal device is then verified to judge its correctness or its accuracy rate. Because the correctness or accuracy rate so judged reflects the current performance of the speech recognition engine, this realizes the test of the engine. At the same time, the sending end decides by itself, according to the verification result, whether to play the next voice to be tested or to replay the current voice, and finally produces a test report that lets developers judge whether the speech recognition engine under test needs further optimization. This realizes automated testing of the speech recognition engine. In summary, by introducing a terminal device acting as the sending end and a terminal device acting as the receiving end to test the speech recognition engine, the technical solution of the invention realizes automated testing of the engine, eliminates time-consuming and labor-intensive manual testing, and makes speech recognition engine testing more intelligent.

The above description is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the contents of the description, and so that the above and other objects, features and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.

Description of drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered as limiting the invention. Throughout the drawings, the same reference numerals designate the same components. In the drawings:

Fig. 1 shows a flow chart of an automated testing method for a speech recognition engine according to an embodiment of the present invention;

Fig. 2 shows a schematic diagram of an automated testing device for a speech recognition engine according to an embodiment of the present invention;

Fig. 3 shows a schematic diagram of an automated testing device for a speech recognition engine according to another embodiment of the present invention;

Fig. 4 shows a schematic diagram of a designated device acting as the receiving end according to an embodiment of the present invention;

Fig. 5 shows a schematic diagram of an automated testing system for a speech recognition engine according to an embodiment of the present invention.

Detailed description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.

Fig. 1 shows a flow chart of an automated testing method for a speech recognition engine according to an embodiment of the present invention, and describes the implementation of the solution from the side of the terminal device acting as the sending end. As shown in Fig. 1, the method includes:

Step S110: playing the current test voice signal, so that a designated terminal device acting as the receiving end receives the voice signal, sends it to the speech recognition engine, and obtains a speech recognition result from the speech recognition engine.

In this embodiment of the invention, the terminal device that acts as the sending end and plays the current test voice signal includes, but is not limited to, a mobile phone, a tablet computer, a desktop computer, or any other device that can generate and play speech. The sending end first generates the voice library to be tested and plays the currently traversed voice in that library through the voice playback module of a third-party application. After the designated terminal device acting as the receiving end receives the current voice signal, it sends the signal to the speech recognition engine; the engine recognizes the signal and returns the recognition result to the designated terminal device. The designated terminal device acting as the receiving end likewise includes, but is not limited to, a mobile phone, a tablet computer, a desktop computer, or any other device that can receive a voice signal, send it to the speech recognition engine, and output the recognition result. When a speech recognition engine is tested, a handful of voices is not enough: the test should cover, as far as possible, the speech of every situation the engine will encounter in its later application scenarios. The voices generated for testing therefore number in the thousands or even tens of thousands and form a voice library to be tested.

Step S120: receiving the speech recognition result information that is output by the designated terminal device and corresponds to the current test voice signal.

In this embodiment of the invention, after the current test voice is played, the sending end receives from the designated terminal device the speech recognition result corresponding to the voice currently being played. This result is the result information returned by the speech recognition engine, and the next step of logical verification is performed on it.

Step S130: verifying the speech recognition result information, so as to implement a speech recognition test of the speech recognition engine.

In this embodiment of the invention, the received speech recognition result is verified to judge its correctness or its accuracy rate, and the next step is carried out according to that judgment. Because the speech recognition result is the result returned by the speech recognition engine, the correctness or accuracy rate so judged reflects the current performance of the engine, which realizes the test of the speech recognition engine. Because what is generated is a whole voice library to be tested, after each received recognition result is verified, the sending end decides by itself, according to the verification result, whether to play the next voice to be tested or to replay the current voice. This realizes automated testing of the speech recognition engine.
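Steps S110 to S130 can be sketched as a simple loop on the sending end. The following is a minimal illustration only, not the patented implementation: the names `run_test_pass`, `play` and `receive_result` are hypothetical, the playback and result channels are passed in as plain callables, and verification is reduced to an exact string comparison.

```python
from typing import Callable, Iterable, List, Tuple


def run_test_pass(
    samples: Iterable[Tuple[str, str]],
    play: Callable[[str], None],
    receive_result: Callable[[], str],
) -> List[Tuple[str, bool]]:
    """Run one pass over (signal_id, expected_text) pairs.

    `play` and `receive_result` are hypothetical stand-ins for the
    sending end's playback module and its result-receiving channel.
    """
    outcomes = []
    for signal_id, expected_text in samples:
        play(signal_id)                    # S110: play the current test signal
        recognized = receive_result()      # S120: result output by the receiving end
        passed = recognized == expected_text  # S130: verify (exact match assumed here)
        outcomes.append((signal_id, passed))
    return outcomes
```

Passing the two channels in as callables keeps the loop independent of how the signal is actually played and how the result travels back, which the description leaves open.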

It can be seen that the terminal device acting as the sending end generates the voice library to be tested and plays the currently traversed voice in that library through the voice playback module of a third-party application. After the designated terminal device acting as the receiving end receives the current voice signal, it sends the signal to the speech recognition engine; the engine recognizes the signal and returns the recognition result to the designated terminal device. The speech recognition result output by the designated terminal device is then verified to judge its correctness or its accuracy rate. Because the correctness or accuracy rate so judged reflects the current performance of the speech recognition engine, this realizes the test of the engine. At the same time, the sending end decides by itself, according to the verification result, whether to play the next voice to be tested or to replay the current voice, and finally produces a test report that lets developers judge whether the speech recognition engine under test needs further optimization. This realizes automated testing of the speech recognition engine. By introducing a terminal device acting as the sending end and a terminal device acting as the receiving end to test the speech recognition engine, the technical solution of the invention realizes automated testing of the engine, eliminates time-consuming and labor-intensive manual testing, and makes speech recognition engine testing more intelligent.

In one embodiment of the present invention, the method shown in Fig. 1 further includes: judging, according to the verification result, whether the recognition of the current test voice signal reaches a preset standard; if so, playing the next test voice signal; otherwise, playing the current test voice signal again.

During testing of the speech recognition engine, when the voice library to be tested is generated, a preset correct recognition result is also generated for each voice, and a preset standard is set. The recognition result information output by the designated terminal device acting as the receiving end is compared with the preset correct result. If the recognition result matches the correct result, or reaches the preset standard, the next voice is played automatically; if the two differ substantially and the preset standard is not reached, the current voice is replayed and recognized again. For example, suppose the correct result generated for a voice is ten characters long and the preset standard is set to 90%. The next voice is played only if the returned result has ten characters and nine of them match the correct result. If the returned result has ten characters but only eight match, the voice is replayed; if the returned result has only nine characters, the voice is also replayed.
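The ten-character example can be written as a small check. This is a sketch under stated assumptions: the description does not say how characters are matched, so the code assumes a position-by-position comparison and treats any length mismatch as a failure, which is consistent with the nine-character case above. The function name `meets_standard` and the parameter `threshold_percent` are illustrative.

```python
def meets_standard(recognized: str, expected: str, threshold_percent: int = 90) -> bool:
    """Return True if `recognized` reaches the preset standard against `expected`.

    Assumptions (not stated in the source): characters are compared
    position by position, and a result of the wrong length always fails.
    """
    if len(recognized) != len(expected):
        return False
    if not expected:
        return True  # two empty strings trivially match
    matches = sum(1 for got, want in zip(recognized, expected) if got == want)
    # Integer arithmetic avoids floating-point edge cases at exactly 90%.
    return matches * 100 >= threshold_percent * len(expected)
```

With a ten-character expected result, nine matching characters pass, eight fail, and a nine-character result fails regardless of its content.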

In one embodiment of the present invention, the method shown in Fig. 1 further includes: playing the next test voice signal when the number of times the recognition of the current test voice signal fails to reach the preset standard reaches a preset value. During testing, the engine may encounter a voice that it cannot recognize, that is, one that is still not recognized correctly after many replays. If the number of replays is not limited, the program loops forever and the test never moves on. Therefore, when the correct recognition results are generated and the preset standard is set, a preset value is also set to limit the number of replays, so that the other voices can still be recognized when the speech recognition engine cannot recognize the current one. For example, suppose a ten-character voice is generated, the preset standard is 90%, and the preset value is 3. If, when the recognition result returned by the speech recognition engine is verified, the result keeps coming back with only nine characters, then after 3 replays the recognition result still cannot be matched correctly. The current voice is not replayed again; the next voice is played, and the current voice is marked as unrecognizable and recorded in the final test report.
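The replay limit can be sketched as a bounded loop. The names below are hypothetical, and one reading of the text is assumed: `max_failures` counts every failed attempt, including the first playback, so a preset value of 3 means the voice is played at most three times before it is marked unrecognizable.

```python
from typing import Callable, Dict


def run_with_retry_limit(
    play_and_recognize: Callable[[], str],
    passes: Callable[[str], bool],
    max_failures: int = 3,
) -> Dict[str, object]:
    """Replay one test signal until it passes or fails `max_failures` times.

    `play_and_recognize` is a hypothetical callable that plays the current
    signal and returns the recognition result; `passes` applies the preset
    standard to that result.
    """
    failures = 0
    while failures < max_failures:
        recognized = play_and_recognize()
        if passes(recognized):
            return {"recognized": True, "failures": failures}
        failures += 1
    # The limit was reached: the caller moves on to the next signal and
    # records this one as unrecognizable in the test report.
    return {"recognized": False, "failures": failures}
```

Returning the failure count alongside the flag lets the caller record how many attempts each signal needed.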

Because testing a speech recognition engine requires far more than a handful of voices, and should cover as far as possible the speech of every situation the engine will encounter in later application scenarios, the voices generated for testing may number in the thousands or tens of thousands and form a voice library to be tested. In one embodiment of the present invention, playing the current test voice signal in step S110 shown in Fig. 1 therefore includes: traversing the voice signals in the test voice library in turn, and playing the currently traversed voice signal as the current test voice signal.

Alternatively, the generated library may hold text rather than audio, that is, it is a text database. In that case a third-party application is introduced, namely text-to-speech (TTS) synthesis software, which converts the current text data in the text database into a voice signal that is then played through the voice playback module. In one embodiment of the present invention, playing the current test voice signal in step S110 shown in Fig. 1 includes: traversing the text data in the text database in turn, converting the currently traversed text data into a voice signal using TTS technology, and playing that voice signal as the current test voice signal.
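The two sources of test signals can be sketched as two generators that yield audio for the playback step. This is an illustration only: the description names no specific TTS software, so `synthesize` is a hypothetical callable supplied by the caller, and audio is represented simply as `bytes`.

```python
from typing import Callable, Iterable, Iterator


def signals_from_library(audio_clips: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each pre-recorded clip of the test voice library in turn."""
    yield from audio_clips


def signals_from_text(
    texts: Iterable[str],
    synthesize: Callable[[str], bytes],
) -> Iterator[bytes]:
    """Yield one synthesized clip per text entry.

    `synthesize` stands in for whatever third-party TTS software is used;
    its name and signature are assumptions for this sketch.
    """
    for text in texts:
        yield synthesize(text)
```

Because both generators yield the same kind of item, the playback loop does not need to know whether a signal came from a recording or from TTS.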

In one embodiment of the present invention, the method shown in FIG. 1 further includes: generating a speech recognition test report for the speech recognition engine according to the verification results of the speech recognition information corresponding to each test voice signal. After testing finishes, a report is generated that shows the overall recognition accuracy for the utterances in the test voice library, the accuracy of each individual recognition result, and a record of the utterances that could not be recognized. The report reflects the current performance of the speech recognition engine and makes it easy for developers to find unrecognized utterances so that the engine can be optimized.
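A minimal sketch of such a report follows. The field names and the `CaseResult` record are hypothetical, and the overall accuracy is assumed to be the mean of the per-utterance accuracies, since the text does not define how it is computed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CaseResult:
    case_id: str
    accuracy: float     # accuracy of the final attempt, 0.0 to 1.0
    recognized: bool    # False if the utterance hit the replay limit


def build_report(results: List[CaseResult]) -> Dict[str, object]:
    """Collect the three items the report is said to contain."""
    total = len(results)
    # Assumption: overall accuracy is the mean of per-utterance accuracies.
    overall = sum(r.accuracy for r in results) / total if total else 0.0
    return {
        "overall_accuracy": overall,
        "per_case_accuracy": {r.case_id: r.accuracy for r in results},
        "unrecognized": [r.case_id for r in results if not r.recognized],
    }
```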

In embodiments of the present invention, the speech recognition result information that the designated terminal device outputs for the current test voice signal may be a voice signal or a wireless signal. Accordingly, in one embodiment of the present invention, receiving the speech recognition result information in step S120 of the method shown in FIG. 1 includes: receiving, through a sound pickup module such as a microphone, a voice signal output by the designated terminal device that contains the recognition result information; and/or receiving, through a wireless communication module or network module such as a Bluetooth module or WiFi module, a wireless signal output by the designated terminal device that contains the recognition result information.

In one embodiment of the present invention, verifying the speech recognition result information in step S130 of the method shown in FIG. 1 includes: saving the expected recognition information of each test voice signal in advance, and comparing the speech recognition result information with the expected recognition information corresponding to the current test voice signal. The expected recognition information is the correct result of the utterance mentioned above; every utterance or text entry in the test voice library has its own expected recognition result.
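The comparison against the expected recognition information can be sketched as a character-level accuracy score. The text does not say how accuracy is measured, so the metric below is an assumption: the share of expected characters that appear, in order, in the recognized text, computed with Python's standard `difflib.SequenceMatcher`. Extra characters in the recognized text are not penalized by this simple measure.

```python
from difflib import SequenceMatcher


def char_accuracy(expected: str, recognized: str) -> float:
    """Fraction of the expected characters matched in the recognized text."""
    if not expected:
        return 1.0 if not recognized else 0.0
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    # Sum the sizes of all matching blocks (the final dummy block has size 0).
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(expected)
```

The score can then be compared with the preset standard to decide whether the current utterance passes.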

In one embodiment of the present invention, the designated terminal device acting as the receiving end has service software corresponding to the speech recognition engine installed in advance in order to supply voice signals to the engine. Specifically, the test voice signal is sent to the speech recognition engine through the engine's interface that provides the speech recognition service.

Alternatively, a test build of the speech recognition engine provides a dedicated interface, and the designated terminal device acting as the receiving end sends the test voice signal to the speech recognition engine through that interface.

A voice dialogue with a device or machine involves more than the speech itself. It also involves speech that carries a command, which the machine or device must act on by performing the corresponding operation; this is semantic recognition. Accordingly, in one embodiment of the present invention, the method shown in FIG. 1 further includes: receiving semantic recognition result information that the designated terminal device outputs for the test voice signal, wherein the designated terminal device, after the speech recognition engine has performed semantic recognition on the test voice signal and carried out an operation on the device, generates and outputs semantic recognition result information corresponding to that operation; and verifying the semantic recognition result information so as to implement a semantic recognition test of the speech recognition engine.

In a specific example, the current test voice signal is "turn on Bluetooth". The recognition result information that the designated terminal device outputs for this signal then covers two things: whether the recognized text is the phrase "turn on Bluetooth", and whether the semantic recognition engine actually performed the "turn on Bluetooth" operation on the designated terminal device. Verification must therefore both match the returned text and confirm that the operation was performed. If both match, the next utterance is played. If the returned result is "turn on Bluetooth" but carries no information that the operation was performed, the utterance is replayed. When the "turn on Bluetooth" test utterance is generated, its expected recognition result and its expected operation result are both set in advance, and both are checked during verification.

In another specific example, the current test voice signal is "How are you?". When this test utterance is generated, its expected recognition result is set in advance as "I'm fine" or "I'm not good". The recognition result information that the designated terminal device outputs for "How are you?" is then checked for whether the returned reply is "I'm fine" or "I'm not good". If it matches, the next utterance is played; if not, the utterance is replayed.
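The two examples above, a command that must be both recognized and executed and a question with more than one acceptable reply, can be covered by one check. The sketch below is illustrative only: the `Expectation` and `Outcome` records and the action label `"bluetooth_on"` are hypothetical names, not part of the described method.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Expectation:
    accepted_texts: FrozenSet[str]          # any one of these texts is a match
    expected_action: Optional[str] = None   # None: no device operation expected


@dataclass(frozen=True)
class Outcome:
    text: str                               # text returned for the utterance
    performed_action: Optional[str] = None  # operation the device reports, if any


def verify(expectation: Expectation, outcome: Outcome) -> bool:
    """Pass only if the text matches and any expected operation was performed."""
    text_ok = outcome.text in expectation.accepted_texts
    action_ok = (expectation.expected_action is None
                 or outcome.performed_action == expectation.expected_action)
    return text_ok and action_ok
```

A `False` result corresponds to replaying the utterance; a `True` result corresponds to moving on to the next one.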

图2示出了根据本发明一个实施例的一种语音识别引擎的自动化测试装置的示意图。如图2所示，该语音识别引擎的自动化测试装置200包括：Fig. 2 shows a schematic diagram of an automatic testing device for a speech recognition engine according to an embodiment of the present invention. As shown in Figure 2, the automatic testing device 200 of this speech recognition engine comprises:

播放单元210，适于播放当前的测试用语音信号，使得作为接收端的指定终端设备接收该语音信号并发送至语音识别引擎以及从语音识别引擎获得语音识别结果。The playing unit 210 is adapted to play the current test voice signal, so that the designated terminal device as the receiving end receives the voice signal and sends it to the voice recognition engine and obtains a voice recognition result from the voice recognition engine.

接收单元220，适于接收指定终端设备输出的与当前的测试用语音信号对应的语音识别结果信息。The receiving unit 220 is adapted to receive speech recognition result information corresponding to the current test speech signal output by the designated terminal device.

测试验证单元230，适于对语音识别结果信息进行验证，以实现对语音识别引擎的语音识别测试。The test verification unit 230 is adapted to verify the speech recognition result information, so as to implement the speech recognition test of the speech recognition engine.

It can be seen that the terminal device acting as the sending end generates the voice library to be tested and plays the currently traversed utterance through the voice playback module of a third-party application. After receiving the current voice signal, the designated terminal device acting as the receiving end sends it to the speech recognition engine, which recognizes the signal and returns the recognition result to the designated terminal device. The result output by the designated terminal device is then verified to determine whether it is correct, or how accurate it is, and this correctness or accuracy reflects the current performance of the speech recognition engine, which accomplishes the test. Based on the verification result, the sending end decides on its own whether to play the next utterance or replay the current one, and finally produces a test report that developers can use to decide whether the speech recognition engine under test needs further optimization. This achieves automated testing of the speech recognition engine. In summary, by introducing a terminal device acting as the sending end and a terminal device acting as the receiving end, the technical solution of the present invention automates the testing of the speech recognition engine, removes the time and labor of manual testing, and makes speech recognition engine testing more intelligent.

FIG. 3 shows a schematic diagram of an automated testing device for a speech recognition engine according to another embodiment of the present invention. As shown in FIG. 3, the automated testing device 300 includes: a playing unit 310, a receiving unit 320, a test verification unit 330, a judging unit 340 and a report generation unit 350. The playing unit 310, receiving unit 320 and test verification unit 330 have the same functions as the playing unit 210, receiving unit 220 and test verification unit 230 of the device shown in FIG. 2, respectively, and the common parts are not repeated here.

The judging unit 340 is adapted to judge, according to the verification result of the test verification unit, whether the recognition of the current test voice signal meets the preset standard. If so, it notifies the playing unit to play the next test voice signal; otherwise, it notifies the playing unit to play the current test voice signal again.

The report generation unit 350 is adapted to generate the speech recognition test report of the speech recognition engine according to the verification results of the speech recognition information corresponding to each test voice signal. The test report shows the overall recognition accuracy for the utterances in the test voice library, the accuracy of each individual recognition result, and a record of the utterances that could not be recognized. The report reflects the current performance of the speech recognition engine and makes it easy for developers to find unrecognized utterances so that the engine can be optimized.

在本发明的一个实施例中，判断单元340，进一步适于在对当前的测试用语音信号的识别未达到预设标准的次数达到预设值时，通知播放单元播放下一条测试用语音信号。In an embodiment of the present invention, the judging unit 340 is further adapted to notify the playback unit to play the next test voice signal when the number of times the recognition of the current test voice signal fails to meet the preset standard reaches a preset value.

In one embodiment of the present invention, the playing unit 310 is adapted to traverse the voice signals in the test voice library in turn and play the currently traversed voice signal as the current test voice signal; or it is adapted to traverse the text data in the text database in turn, convert the currently traversed text data into a voice signal using text-to-speech (TTS) technology, and play it as the current test voice signal.

In one embodiment of the present invention, the receiving unit 320 is adapted to receive, through a sound pickup module such as a microphone, a voice signal output by the designated terminal device that contains the recognition result information; and/or to receive, through a wireless communication module or network module such as a Bluetooth module or WiFi module, a wireless signal output by the designated terminal device that contains the recognition result information.

在本发明的一个实施例中，测试验证单元330，适于预先保存测试用语音信号的预期识别信息，将语音识别结果信息与当前测试用语音信号对应的预期识别信息进行对比。In one embodiment of the present invention, the test verification unit 330 is adapted to pre-save the expected recognition information of the test voice signal, and compare the voice recognition result information with the expected recognition information corresponding to the current test voice signal.

In one embodiment of the present invention, the receiving unit 320 is further adapted to receive semantic recognition result information that the designated terminal device outputs for the test voice signal, wherein the designated terminal device, after the speech recognition engine has performed semantic recognition on the test voice signal and carried out an operation on the device, generates semantic recognition result information corresponding to that operation.

测试验证单元330，进一步适于对语义识别结果信息进行验证，以实现对语音识别引擎的语义识别测试。The test verification unit 330 is further adapted to verify the semantic recognition result information, so as to implement the semantic recognition test of the speech recognition engine.
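How the units of device 300 might cooperate can be sketched as one class. This is a hypothetical arrangement for illustration, not the structure the embodiment prescribes: the class name `AutomatedTester`, its callables `play`, `receive` and `verify`, and the shape of the returned report are all assumptions.

```python
from typing import Callable, Dict, List


class AutomatedTester:
    """Sketch of device 300; each callable stands in for one unit."""

    def __init__(self,
                 play: Callable[[str], None],        # playing unit 310
                 receive: Callable[[], str],         # receiving unit 320
                 verify: Callable[[str, str], bool], # test verification unit 330
                 max_failures: int = 3) -> None:
        self.play = play
        self.receive = receive
        self.verify = verify
        self.max_failures = max_failures

    def run(self, cases: Dict[str, str]) -> Dict[str, List[str]]:
        """cases maps each test signal to its expected recognition result."""
        passed: List[str] = []
        unrecognized: List[str] = []
        for signal, expected in cases.items():
            failures = 0
            while failures < self.max_failures:
                self.play(signal)
                result = self.receive()
                if self.verify(expected, result):
                    passed.append(signal)
                    break                  # judging unit 340: play the next one
                failures += 1              # judging unit 340: replay
            else:
                unrecognized.append(signal)  # failure limit reached
        # report generation unit 350 (simplified)
        return {"passed": passed, "unrecognized": unrecognized}
```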

需要说明的是，图2-图3所示的装置的各实施例与图1所示方法的各实施例对应相同，上文已有详细说明，在此不再赘述。It should be noted that the embodiments of the apparatus shown in FIGS. 2-3 are correspondingly the same as the embodiments of the method shown in FIG. 1 , which have been described in detail above and will not be repeated here.

图4示出了根据本发明一个实施例的一种作为接收端的指定设备的示意图。如图4所示，该作为接收端的指定设备400包括：Fig. 4 shows a schematic diagram of a designated device as a receiving end according to an embodiment of the present invention. As shown in Figure 4, the specified device 400 as the receiving end includes:

The receiving unit 410 is adapted to receive the test voice signal sent by the automated testing device of the speech recognition engine.

语音识别处理单元420，适于将测试用语音信号发送至语音识别引擎以及从语音识别引擎获得语音识别结果。The voice recognition processing unit 420 is adapted to send the test voice signal to the voice recognition engine and obtain a voice recognition result from the voice recognition engine.

The output unit 430 is adapted to output the speech recognition result information corresponding to the current test voice signal. The result information may be a voice signal or a wireless signal, and is output to the terminal device acting as the sending end through a voice playback module, or through a Bluetooth module, WiFi module or the like.

在本发明的一个实施例中，语音识别处理单元420，适于通过语音识别引擎的提供语音识别业务的界面将测试用语音信号发送给语音识别引擎；或者，通过语音识别引擎提供的特定接口将测试用语音信号发送给语音识别引擎。In one embodiment of the present invention, the voice recognition processing unit 420 is adapted to send the test voice signal to the voice recognition engine through the interface of the voice recognition engine that provides voice recognition services; The test voice signal is sent to the speech recognition engine.

在本发明的一个实施例中，语音识别处理单元420，适于在作为接收端的终端设备接受语音识别引擎对测试用语音信号进行语义识别后对该终端设备进行的操作后，生成与该操作对应的语义识别结果信息。In one embodiment of the present invention, the speech recognition processing unit 420 is adapted to generate a corresponding response to the operation after the terminal device as the receiving end accepts the operation performed on the terminal device after the speech recognition engine performs semantic recognition on the test speech signal. Semantic recognition result information.

则输出单元430，进一步适于输出与当前的测试用语音信号对应的语义识别结果信息。Then the output unit 430 is further adapted to output semantic recognition result information corresponding to the current test speech signal.
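The receiving-end device 400 can be sketched in the same spirit. The names are hypothetical: `recognize` stands in for the round trip to the speech recognition engine, and `execute` stands in for whatever operation the engine triggers on the device after semantic recognition, returning a label for that operation or `None` if nothing was done.

```python
from typing import Callable, Dict, Optional


class ReceivingDevice:
    """Sketch of device 400: receive a signal, query the engine, output results."""

    def __init__(self,
                 recognize: Callable[[bytes], str],
                 execute: Callable[[str], Optional[str]]) -> None:
        self.recognize = recognize
        self.execute = execute

    def handle(self, signal: bytes) -> Dict[str, Optional[str]]:
        # Receiving unit 410 has delivered `signal`.
        text = self.recognize(signal)   # speech recognition processing unit 420
        action = self.execute(text)     # operation following semantic recognition
        # Output unit 430: both results go back to the sending end.
        return {"text": text, "action": action}
```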

需要说明的是，图4所示的装置的各实施例与图1所示方法的各实施例对应相同，上文已有详细说明，在此不再赘述。It should be noted that the embodiments of the apparatus shown in FIG. 4 are correspondingly the same as the embodiments of the method shown in FIG. 1 , which have been described in detail above and will not be repeated here.

图5示出了根据本发明一个实施例的一种语音识别引擎的自动化测试系统的示意图。如图5所示，该语音识别引擎的自动化测试系统500包括作为发送端的终端设备510和作为接收端的终端设备520。Fig. 5 shows a schematic diagram of an automated test system for a speech recognition engine according to an embodiment of the present invention. As shown in FIG. 5 , the automatic test system 500 of the speech recognition engine includes a terminal device 510 as a sending end and a terminal device 520 as a receiving end.

The terminal device 510 acting as the sending end has the same functions as the automated testing device 200 or 300 described above, and the terminal device 520 acting as the receiving end has the same functions as the designated device 400 described above; the common parts are not repeated here.

In embodiments of the present invention, the automated test system for the speech recognition engine introduces two devices: a terminal device acting as the sending end, which sends the speech and verifies the speech recognition results, and a terminal device acting as the receiving end, which receives the speech and outputs the speech recognition results. Together they ensure that the various recognition results the speech recognition engine may produce are covered, and they automate the testing of the speech recognition engine.

It can be seen that the sending-end device generates the voice library to be tested and plays the currently traversed utterance through the voice playback module of a third-party application. After receiving the current voice signal, the designated terminal device acting as the receiving end sends it to the speech recognition engine, which recognizes the signal and returns the recognition result to the designated terminal device. The result output by the designated terminal device is then verified to determine whether it is correct, or how accurate it is, and this correctness or accuracy reflects the current performance of the speech recognition engine, which accomplishes the test. Based on the verification result, the sending end decides on its own whether to play the next utterance or replay the current one, and finally produces a test report that developers can use to decide whether the speech recognition engine under test needs further optimization. This achieves automated testing of the speech recognition engine. By introducing a terminal device acting as the sending end and a terminal device acting as the receiving end, the technical solution of the present invention automates the testing of the speech recognition engine, removes the time and labor of manual testing, and makes speech recognition engine testing more intelligent.

需要说明的是：It should be noted:

在此提供的算法和显示不与任何特定计算机、虚拟装置或者其它设备固有相关。各种通用装置也可以与基于在此的示教一起使用。根据上面的描述，构造这类装置所要求的结构是显而易见的。此外，本发明也不针对任何特定编程语言。应当明白，可以利用各种编程语言实现在此描述的本发明的内容，并且上面对特定语言所做的描述是为了披露本发明的最佳实施方式。The algorithms and displays presented herein are not inherently related to any particular computer, virtual appliance, or other device. Various general purpose devices can also be used with the teachings based on this. The structure required to construct such an apparatus will be apparent from the foregoing description. Furthermore, the present invention is not specific to any particular programming language. It should be understood that various programming languages can be used to implement the contents of the present invention described herein, and the above description of specific languages is for disclosing the best mode of the present invention.

在此处所提供的说明书中，说明了大量具体细节。然而，能够理解，本发明的实施例可以在没有这些具体细节的情况下实践。在一些实例中，并未详细示出公知的方法、结构和技术，以便不模糊对本说明书的理解。In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure the understanding of this description.

类似地，应当理解，为了精简本公开并帮助理解各个发明方面中的一个或多个，在上面对本发明的示例性实施例的描述中，本发明的各个特征有时被一起分组到单个实施例、图、或者对其的描述中。然而，并不应将该公开的方法解释成反映如下意图：即所要求保护的本发明要求比在每个权利要求中所明确记载的特征更多的特征。更确切地说，如下面的权利要求书所反映的那样，发明方面在于少于前面公开的单个实施例的所有特征。因此，遵循具体实施方式的权利要求书由此明确地并入该具体实施方式，其中每个权利要求本身都作为本发明的单独实施例。Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, in order to streamline this disclosure and to facilitate an understanding of one or more of the various inventive aspects, various features of the invention are sometimes grouped together in a single embodiment, figure, or its description. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.

本领域那些技术人员可以理解，可以对实施例中的设备中的模块进行自适应性地改变并且把它们设置在与该实施例不同的一个或多个设备中。可以把实施例中的模块或单元或组件组合成一个模块或单元或组件，以及此外可以把它们分成多个子模块或子单元或子组件。除了这样的特征和/或过程或者单元中的至少一些是相互排斥之外，可以采用任何组合对本说明书(包括伴随的权利要求、摘要和附图)中公开的所有特征以及如此公开的任何方法或者设备的所有过程或单元进行组合。除非另外明确陈述，本说明书(包括伴随的权利要求、摘要和附图)中公开的每个特征可以由提供相同、等同或相似目的的替代特征来代替。Those skilled in the art can understand that the modules in the device in the embodiment can be adaptively changed and arranged in one or more devices different from the embodiment. Modules or units or components in the embodiments may be combined into one module or unit or component, and furthermore may be divided into a plurality of sub-modules or sub-units or sub-assemblies. All features disclosed in this specification (including accompanying claims, abstract and drawings) and any method or method so disclosed may be used in any combination, except that at least some of such features and/or processes or units are mutually exclusive. All processes or units of equipment are combined. Each feature disclosed in this specification (including accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

此外，本领域的技术人员能够理解，尽管在此所述的一些实施例包括其它实施例中所包括的某些特征而不是其它特征，但是不同实施例的特征的组合意味着处于本发明的范围之内并且形成不同的实施例。例如，在下面的权利要求书中，所要求保护的实施例的任意之一都可以以任意的组合方式来使用。Furthermore, those skilled in the art will understand that although some embodiments described herein include some features included in other embodiments but not others, combinations of features from different embodiments are meant to be within the scope of the invention. and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

本发明的各个部件实施例可以以硬件实现，或者以在一个或者多个处理器上运行的软件模块实现，或者以它们的组合实现。本领域的技术人员应当理解，可以在实践中使用微处理器或者数字信号处理器(DSP)来实现根据本发明实施例的语音识别引擎的自动化测试装置和系统中的一些或者全部部件的一些或者全部功能。本发明还可以实现为用于执行这里所描述的方法的一部分或者全部的设备或者装置程序(例如，计算机程序和计算机程序产品)。这样的实现本发明的程序可以存储在计算机可读介质上，或者可以具有一个或者多个信号的形式。这样的信号可以从因特网网站上下载得到，或者在载体信号上提供，或者以任何其他形式提供。The various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should be understood that can use microprocessor or digital signal processor (DSP) to realize some or some or all parts in the automatic testing device and system of speech recognition engine according to the embodiment of the present invention in practice Full functionality. The present invention can also be implemented as an apparatus or an apparatus program (for example, a computer program and a computer program product) for performing a part or all of the methods described herein. Such a program for realizing the present invention may be stored on a computer-readable medium, or may be in the form of one or more signals. Such a signal may be downloaded from an Internet site, or provided on a carrier signal, or provided in any other form.

应该注意的是上述实施例对本发明进行说明而不是对本发明进行限制，并且本领域技术人员在不脱离所附权利要求的范围的情况下可设计出替换实施例。在权利要求中，不应将位于括号之间的任何参考符号构造成对权利要求的限制。单词“包含”不排除存在未列在权利要求中的元件或步骤。位于元件之前的单词“一”或“一个”不排除存在多个这样的元件。本发明可以借助于包括有若干不同元件的硬件以及借助于适当编程的计算机来实现。在列举了若干装置的单元权利要求中，这些装置中的若干个可以是通过同一个硬件项来具体体现。单词第一、第二、以及第三等的使用不表示任何顺序。可将这些单词解释为名称。It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, and third, etc. does not indicate any order. These words can be interpreted as names.

本发明公开了A1、一种语音识别引擎的自动化测试方法，其中，该方法包括：The invention discloses A1, an automated testing method for a speech recognition engine, wherein the method includes:

对所述语音识别结果信息进行验证，以实现对所述语音识别引擎的语音识别测试。The speech recognition result information is verified to implement the speech recognition test of the speech recognition engine.

A2、如A1所述的方法，其中，该方法进一步包括：A2. The method as described in A1, wherein the method further comprises:

A3、如A2所述的方法，其中，该方法进一步包括：A3, the method as described in A2, wherein, the method further comprises:

A4、如A1所述的方法，其中，所述播放当前的测试用语音信号包括：A4, the method as described in A1, wherein, described playing current test voice signal comprises:

或者，or,

A5、如A4所述的方法，其中，该方法进一步包括：A5. The method as described in A4, wherein the method further comprises:

A6、如A1所述的方法，其中，所述接收所述指定终端设备输出的与所述当前的测试用语音信号对应的语音识别结果信息包括：A6. The method as described in A1, wherein said receiving the speech recognition result information corresponding to the current test speech signal output by the designated terminal device includes:

和/或，and / or,

A7、如A1所述的方法，其中，所述对所述语音识别结果信息进行验证包括：A7. The method as described in A1, wherein the verifying the speech recognition result information includes:

A8、如A1-A7中任一项所述的方法，其中，A8. The method of any one of A1-A7, wherein,

所述指定终端设备通过所述语音识别引擎的提供语音识别业务的界面将测试用语音信号发送给语音识别引擎；The designated terminal device sends the test voice signal to the voice recognition engine through the voice recognition service interface of the voice recognition engine;

或者，or,

A9、如A1-A7中任一项所述的方法，其中，该方法进一步包括：A9. The method according to any one of A1-A7, wherein the method further comprises:

本发明还公开了B10、一种语音识别引擎的自动化测试装置，其中，该装置包括：The present invention also discloses B10, an automated testing device for a speech recognition engine, wherein the device includes:

B11、如B10所述的装置，其中，该装置进一步包括：B11. The device as described in B10, wherein the device further comprises:

B12、如B11所述的装置，其中，B12. The device of B11, wherein,

所述判断单元，进一步适于在对当前的测试用语音信号的识别未达到预设标准的次数达到预设值时，通知所述播放单元播放下一条测试用语音信号。The judging unit is further adapted to notify the playback unit to play the next test voice signal when the number of times the recognition of the current test voice signal fails to meet the preset standard reaches a preset value.

B13、如B10所述的装置，其中，B13. The device of B10, wherein,

所述播放单元，适于依次遍历测试语音库中的各个语音信号，将当前遍历到的语音信号作为当前的测试用语音信号进行播放；或者，适于依次遍历文本数据库中的各个文本数据，将当前遍历到的文本数据利用文本到语音TTS技术转换成语音信后作为当前的测试用语音信号进行播放。The playback unit is suitable for sequentially traversing each voice signal in the test voice database, playing the currently traversed voice signal as a current test voice signal; or, being suitable for sequentially traversing each text data in the text database, and The currently traversed text data is converted into a voice message using the text-to-speech TTS technology and then played as the current test voice signal.

B14、如B13所述的装置，其中，该装置进一步包括：B14. The device as described in B13, wherein the device further comprises:

B15、如B10所述的装置，其中，B15. The device of B10, wherein,

所述接收单元，适于通过收音模块接收所述指定终端设备输出的包含识别结果信息的语音信号；和/或，通过无线通信模块或网络模块接收所述指定终端设备输出的包含识别结果信息的无线信号。The receiving unit is adapted to receive the voice signal containing the recognition result information output by the specified terminal device through the sound receiving module; and/or receive the voice signal containing the recognition result information output by the specified terminal device through the wireless communication module or the network module. wireless signal.

B16、如B10所述的装置，其中，B16. The device of B10, wherein,

所述测试验证单元，适于预先保存测试用语音信号的预期识别信息，将所述语音识别结果信息与当前测试用语音信号对应的预期识别信息进行对比。The test verification unit is adapted to store expected recognition information of the test voice signal in advance, and compare the voice recognition result information with the expected recognition information corresponding to the current test voice signal.

B17、如B10-B16中任一项所述的装置，其中，B17. The device of any one of B10-B16, wherein,

所述接收单元，进一步适于接收所述指定终端设备输出的与所述测试用语音信号对应的语义识别结果信息；其中，所述指定终端设备接收所述语音识别引擎对测试用语音信号进行语义识别后对指定终端设备进行的操作后，生成与该操作对应的语义识别结果信息；The receiving unit is further adapted to receive semantic recognition result information corresponding to the test voice signal output by the specified terminal device; wherein, the specified terminal device receives the voice recognition engine to perform semantic recognition on the test voice signal After identifying the operation performed on the designated terminal device, generate semantic recognition result information corresponding to the operation;

The invention also discloses C18, an automated testing system for a speech recognition engine, wherein the system comprises a terminal device serving as a sending end and a terminal device serving as a receiving end;

The terminal device serving as the sending end comprises the automated testing device for a speech recognition engine according to any one of B10-B17.

C19. The system of C18, wherein the terminal device serving as the receiving end comprises:

C20. The system of C19, wherein,

The speech recognition processing unit is adapted to send the test voice signal to the speech recognition engine through the user interface through which the speech recognition engine provides its speech recognition service; or to send the test voice signal to the speech recognition engine through a specific interface provided by the speech recognition engine.

C21. The system of C19, wherein,

The speech recognition processing unit is adapted to generate semantic recognition result information corresponding to an operation after the terminal device serving as the receiving end receives that operation, which the speech recognition engine performs on the terminal device after semantically recognizing the test voice signal;

Claims

1. A method for automated testing of a speech recognition engine, wherein the method comprises:

playing a current test voice signal, so that a designated terminal device serving as a receiving end receives the voice signal, sends it to the speech recognition engine, and obtains a speech recognition result from the speech recognition engine;

receiving speech recognition result information, corresponding to the current test voice signal, output by the designated terminal device;

verifying the speech recognition result information, so as to implement a speech recognition test of the speech recognition engine.

2. The method of claim 1, wherein the method further comprises:

determining, according to the verification result, whether recognition of the current test voice signal meets a preset standard; if so, playing the next test voice signal; otherwise, playing the current test voice signal again.

3. The method of claim 2, wherein the method further comprises:

when the number of times that recognition of the current test voice signal fails to meet the preset standard reaches a preset value, playing the next test voice signal.
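The replay logic of claims 2 and 3 can be illustrated with the following sketch. It is one possible reading under stated assumptions, not the claimed implementation: `run_tests`, `play_and_receive`, `meets_standard`, and `max_failures` are hypothetical names, with `max_failures` standing in for the "preset value" and `play_and_receive` standing in for playing a signal and receiving the result from the designated terminal device.

```python
from typing import Callable, Dict, Iterable, Tuple


def run_tests(
    signals: Iterable[Tuple[str, bytes]],
    play_and_receive: Callable[[bytes], str],    # hypothetical: play, then get result
    meets_standard: Callable[[str, str], bool],  # hypothetical verification check
    max_failures: int = 3,                       # the "preset value" of claim 3
) -> Dict[str, bool]:
    """Play each signal, replaying on failure until the failure limit is hit."""
    results: Dict[str, bool] = {}
    for signal_id, audio in signals:
        failures = 0
        passed = False
        while failures < max_failures:
            recognized = play_and_receive(audio)
            if meets_standard(signal_id, recognized):
                passed = True  # standard met: move on to the next signal
                break
            failures += 1      # standard not met: replay the same signal
        results[signal_id] = passed
    return results
```

Capping the number of replays keeps a single unrecognizable signal from stalling the whole run, which matches the purpose of moving on once the failure count reaches the preset value.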

4. The method of claim 1, wherein playing the current test voice signal comprises:

traversing each voice signal in a test voice library in turn, and playing the currently traversed voice signal as the current test voice signal;

or,

traversing each item of text data in a text database in turn, converting the currently traversed text data into a voice signal using text-to-speech (TTS) technology, and playing it as the current test voice signal.

5. An automated testing device for a speech recognition engine, wherein the device comprises:

The playback unit is adapted to play a current test voice signal, so that a designated terminal device serving as a receiving end receives the voice signal, sends it to the speech recognition engine, and obtains a speech recognition result from the speech recognition engine;

The receiving unit is adapted to receive the speech recognition result information corresponding to the current test speech signal output by the designated terminal device;

The test verification unit is adapted to verify the speech recognition result information, so as to implement the speech recognition test of the speech recognition engine.

6. The device of claim 5, wherein the device further comprises:

The judging unit is adapted to determine, according to the verification result of the test verification unit, whether recognition of the current test voice signal meets a preset standard; if so, to notify the playback unit to play the next test voice signal; otherwise, to notify the playback unit to play the current test voice signal again.

7. The device of claim 6, wherein,

The judging unit is further adapted to notify the playback unit to play the next test voice signal when the number of times the recognition of the current test voice signal fails to meet the preset standard reaches a preset value.

8. An automated test system for a speech recognition engine, wherein the system includes a terminal device as a sending end and a terminal device as a receiving end;

The terminal device as the sending end includes the automatic testing device for the speech recognition engine according to any one of claims 5-7.

9. The system according to claim 8, wherein the terminal device as the receiving end comprises:

The receiving unit is adapted to receive the test voice signal sent by the automatic test device of the voice recognition engine;

The speech recognition processing unit is adapted to send the test voice signal to the speech recognition engine and obtain the speech recognition result from the speech recognition engine;

The output unit is adapted to output speech recognition result information corresponding to the current test speech signal.
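As a rough illustration of how the three units of claim 9 fit together on the receiving end, the sketch below models them as one small class. All names are hypothetical: `ReceivingTerminal`, `recognize` (standing in for the call to the speech recognition engine), and `output` (standing in for whatever channel returns the result to the sending end).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReceivingTerminal:
    recognize: Callable[[bytes], str]  # hypothetical call into the engine
    output: Callable[[str], None]      # hypothetical output channel

    def handle(self, audio: bytes) -> str:
        """Receive a test signal, have it recognized, and output the result."""
        result = self.recognize(audio)  # speech recognition processing unit
        self.output(result)             # output unit
        return result
```

Passing the engine call and the output channel in as callables keeps the sketch neutral about how the engine is reached and how the result is sent back.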

10. The system of claim 9, wherein,

The speech recognition processing unit is adapted to send the test voice signal to the speech recognition engine through the user interface through which the speech recognition engine provides its speech recognition service; or to send the test voice signal to the speech recognition engine through a specific interface provided by the speech recognition engine.