
CN113889102B - Instruction receiving method, system, electronic device, cloud server and storage medium

Info

Publication number
CN113889102B
Authority
CN
China
Prior art keywords
voice recognition
voice
recognition result
speech recognition
user
Prior art date
Legal status
Active
Application number
CN202111115408.XA
Other languages
Chinese (zh)
Other versions
CN113889102A
Inventor
高斌
Current Assignee
Cloudminds Beijing Technologies Co Ltd
Original Assignee
Cloudminds Beijing Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Cloudminds Beijing Technologies Co Ltd
Priority to CN202111115408.XA
Publication of CN113889102A
Application granted
Publication of CN113889102B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26 - Speech to text systems
    • G10L 2015/223 - Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiments of the present application relate to the field of artificial intelligence technology and disclose an instruction receiving method, system, electronic device, cloud server and storage medium. The method includes: picking up voice information of a user; performing voice recognition on the voice information to generate a first voice recognition result; obtaining a second voice recognition result generated by a second device, where the second device is a device with both a sound pickup function and a voice recognition function, and the second voice recognition result is generated by the second device performing voice recognition on the user's voice information picked up by the second device; and generating an instruction for the first device to execute according to the first voice recognition result, the second voice recognition result and the preset voice recognition credibility of each device. The instruction receiving method provided in the embodiments of the present application can break through the limitation of the sound pickup range and receive the voice instructions issued by the user accurately, completely and quickly, thereby improving the user experience.

Description

Instruction receiving method, system, electronic device, cloud server and storage medium
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to an instruction receiving method, an instruction receiving system, electronic equipment, a cloud server and a storage medium.
Background
Along with the rapid development of artificial intelligence technology, more and more intelligent devices have entered people's lives, such as intelligent home robots of various types and functions. An intelligent home robot is a special-purpose robot that provides services for users, mainly engaging in household services, maintenance, repair, transportation, supervision, children's education and other work. It is equipped with flexible multi-joint arms, can understand the user's voice instructions, and can recognize three-dimensional objects by means of various sensors; it can perform different tasks at different positions in the user's home and receive voice instructions issued by the user in real time.
However, the inventors of the present application found that the pickup range of the robot is limited, and it is difficult for the robot to accurately acquire a voice instruction issued by the user when the user and the robot are located in different rooms.
Disclosure of Invention
The embodiments of the application aim to provide an instruction receiving method, an instruction receiving system, an electronic device, a cloud server and a storage medium, which can break through the limitation of the sound pickup range and receive voice instructions issued by a user accurately, completely and quickly, thereby improving the user experience.
In order to solve the above technical problem, an embodiment of the application provides an instruction receiving method, which includes the following steps: picking up voice information of a user; performing voice recognition on the voice information to generate a first voice recognition result; obtaining a second voice recognition result generated by a second device, where the second device is a device with a sound pickup function and a voice recognition function, and the second voice recognition result is generated by the second device performing voice recognition on the voice information of the user picked up by the second device; and generating an instruction for the first device to execute according to the first voice recognition result, the second voice recognition result and the preset voice recognition credibility of each device.
The embodiment of the application also provides an instruction receiving system, which includes a first device and a second device, where the second device is a device with a sound pickup function and a voice recognition function. The first device is configured to pick up voice information of a user and perform voice recognition on the voice information to generate a first voice recognition result; the second device is configured to pick up the voice information of the user and perform voice recognition on it to generate a second voice recognition result; and the first device is further configured to obtain the second voice recognition result, generate an instruction according to the first voice recognition result, the second voice recognition result and the preset voice recognition credibility of each device, and execute the instruction.
The embodiment of the application also provides electronic equipment, which comprises at least one processor and a memory in communication connection with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the instruction receiving method.
The embodiment of the application also provides a cloud server which comprises at least one processor and a memory in communication connection with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the instruction receiving method.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described instruction receiving method.
According to the instruction receiving method, system, electronic device, cloud server and storage medium provided by the embodiments of the present application, the first device picks up the voice information of a user and performs voice recognition on the picked-up voice information to generate a first voice recognition result; it then obtains a second voice recognition result generated by a second device, where the second voice recognition result is generated by the second device performing voice recognition on the voice information it picks up itself; finally, the first device generates an instruction according to the first voice recognition result, the second voice recognition result and the preset voice recognition credibility of each device, and executes the generated instruction itself. In this way, the limitation of a single device's pickup range is overcome, and the first device can accurately acquire the voice instruction issued by the user even when the user and the first device are located in different rooms.
In addition, generating the instruction according to the first voice recognition result, the second voice recognition result and the preset voice recognition credibility of each device includes: performing word segmentation on the first voice recognition result to obtain a plurality of first word segmentation fragments; performing word segmentation on the second voice recognition result to obtain a plurality of second word segmentation fragments; determining the voice recognition credibility of the first word segmentation fragments and of the second word segmentation fragments according to the preset voice recognition credibility of each device; and generating the instruction according to the first word segmentation fragments, the second word segmentation fragments and their voice recognition credibility. This further improves the accuracy of the obtained instruction and thus the user experience.
In addition, before generating the instruction according to the first voice recognition result, the second voice recognition result and the preset voice recognition credibility of each device, the method includes: obtaining the position of the user, the position of the first device and the positions of a plurality of second devices; determining the relative position between the first device and the user according to the position of the user and the position of the first device; and determining the relative positions between the plurality of second devices and the user according to the position of the user and the positions of the plurality of second devices. Generating the instruction then includes generating the instruction according to the first voice recognition result, the second voice recognition result, the relative position between the first device and the user, the relative positions between the plurality of second devices and the user, and the preset voice recognition credibility of each device. Considering that in practice the first device and the second devices are distributed over a certain range in the home, with some devices relatively close to the user and others relatively far away, the relative position between each device and the user has an important influence on the quality of that device's voice recognition result; taking these relative positions into account when generating the instruction further improves the accuracy of the obtained instruction.
In addition, the preset voice recognition credibility of each device includes the voice recognition credibility of the first device and the voice recognition credibility of the second device, where the voice recognition credibility of the second device is obtained by the first device through the following steps: conducting several dialogues with the second device by voice and obtaining the second device's voice recognition results for these dialogues; and comparing the second device's voice recognition results with the text content of the dialogues to determine the voice recognition credibility of the second device. Since the voice recognition credibility of the second device is obtained by the first device in advance through several dialogues, it is objective and reliable, which further improves the accuracy of the obtained instruction.
In addition, the voice recognition credibility of each device may include the credibility of each device for dialects, for different languages, for different volume levels, in environments with different noise levels, and for users with different identity information. The first device comprehensively considers these aspects when generating the instruction, which can further improve the accuracy of the acquired instruction.
In addition, before generating the instruction according to the first voice recognition result, the second voice recognition result and the preset voice recognition credibility of each device, the method includes: obtaining voice information of the user picked up by a third device, where the third device is a device with a sound pickup function but without a voice recognition function; and performing voice recognition on the voice information of the user picked up by the third device to generate a third voice recognition result. Generating the instruction then includes generating the instruction according to the first voice recognition result, the second voice recognition result, the third voice recognition result and the preset voice recognition credibility of each device. By also referring to the voice information of the user picked up by the third device, the first device further extends its sound pickup range, so that it can obtain a more accurate and complete instruction.
In addition, after the instruction is generated, the method further includes: replying to the instruction by voice directly, or replying to the instruction by voice through the second device. The first device can choose to reply to the user's instruction by voice itself or through any second device, which better meets the actual needs of the user and further improves the user experience.
In addition, obtaining the second voice recognition result generated by the second device includes: obtaining screening information, where the screening information includes any combination of the identity information of the user, the position information of the user and the position information of each second device; determining a target second device according to the screening information and a preset correspondence, where the correspondence is between the screening information and the target second device; and obtaining the second voice recognition result generated by the target second device. The target second device is determined among the second devices according to one or more of the user's identity information, the user's position information and each second device's position information, and only the second voice recognition result generated by the target second device is obtained; that is, only the target second device is taken into account when receiving the instruction, which can effectively improve the efficiency of instruction reception.
Drawings
One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings.
FIG. 1 is a first flowchart of an instruction receiving method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the distribution of a user, a first device and second devices in a home according to an embodiment of the present application;
FIG. 3 is a flowchart of generating an instruction according to a first voice recognition result, second voice recognition results and the preset voice recognition credibility of each device according to an embodiment of the present application;
FIG. 4 is a second flowchart of an instruction receiving method according to another embodiment of the present application;
FIG. 5 is a flowchart of obtaining the voice recognition credibility of a second device according to an embodiment of the present application;
FIG. 6 is a third flowchart of an instruction receiving method according to another embodiment of the present application;
FIG. 7 is a flowchart of obtaining a second voice recognition result generated by a second device according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an instruction receiving system according to another embodiment of the present application;
FIG. 9 is a schematic structural diagram of an electronic device according to another embodiment of the present application;
Fig. 10 is a schematic structural diagram of a cloud server according to another embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in detail below with reference to the accompanying drawings. However, it will be understood by those of ordinary skill in the art that in various embodiments of the present application, numerous specific details are set forth in order to provide a thorough understanding of the present application. The claimed application may be practiced without these specific details and with various changes and modifications based on the following embodiments. The following embodiments are divided for convenience of description, and should not be construed as limiting the specific implementation of the present application, and the embodiments can be mutually combined and referred to without contradiction.
An embodiment of the present application relates to an instruction receiving method applied to a first device, and implementation details of the instruction receiving method of the present embodiment are specifically described below, where the following description is merely provided for understanding implementation details, and is not necessary to implement the present embodiment.
The specific flow of the instruction receiving method of this embodiment may be as shown in fig. 1, including:
Step 101, picking up the voice information of the user.
Step 102, performing voice recognition on the voice information to generate a first voice recognition result.
Specifically, the first device is a device capable of receiving a voice command issued by a user, understanding the voice command issued by the user, and executing the voice command issued by the user, such as an intelligent home robot, an intelligent sound box, an intelligent refrigerator, an intelligent washing machine, an intelligent air conditioner and the like.
In a specific implementation, the first device has a sound pickup function and a voice recognition function. The first device may monitor voice information in the environment in real time; when a preset wake-up word is recognized, it picks up the user's voice information and performs voice recognition on the picked-up voice information to generate a first voice recognition result. The preset wake-up word is used to wake up the first device, that is, to instruct the first device to start picking up sound, and may be set by those skilled in the art according to actual needs; the embodiments of the present application do not specifically limit it.
In one example, the first device is an intelligent home robot, the preset wake-up word is "small a", when the first device recognizes that the voice information in the environment has "small a", the first device immediately starts to pick up the voice information of the user, and performs voice recognition on the picked-up voice information to obtain a first voice recognition result.
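For illustration only (not part of the patent's claimed implementation), the wake-word-gated pickup described in this example can be sketched roughly as follows; pick_up_audio and recognize are hypothetical stand-ins for the first device's microphone pipeline and on-device voice recognition.

```python
# Minimal sketch of wake-word-gated pickup. pick_up_audio() and recognize()
# are assumed helpers for the device's microphone and on-device voice
# recognition; this is not the patent's actual implementation.

WAKE_WORD = "small A"  # preset wake-up word from the example above

def listen_for_instruction(pick_up_audio, recognize):
    """Monitor ambient audio; once the wake word is heard, pick up the
    user's voice information and return the first voice recognition result."""
    while True:
        ambient = pick_up_audio()            # short ambient audio frame
        if WAKE_WORD.lower() in recognize(ambient).lower():
            command_audio = pick_up_audio()  # pick up the actual command
            return recognize(command_audio)  # first voice recognition result
```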
Step 103, obtaining a second voice recognition result generated by the second device.
Specifically, after the first device picks up the voice information of the user and performs voice recognition on the picked-up voice information of the user and generates a first voice recognition result, a second voice recognition result generated by the second device can be obtained, wherein the second device is a device with a sound pickup function and a voice recognition function, and the second voice recognition result is generated by the second device by performing voice recognition on the voice information of the user picked up by the second device.
In a specific implementation, the second device may also monitor the voice information in the environment in real time, pick up the voice information of the user, and perform voice recognition on the picked up voice information of the user to generate a second voice recognition result.
In one example, the second device is a device with a sound pickup function and a voice recognition function, such as a smart speaker, a home background music system, or a smart television.
In an example, the roles of the first device and the second device are interchangeable. For example, there are multiple intelligent home robots in the user's home, including robot A and robot B, where the preset wake-up word of robot A is "small A" and the preset wake-up word of robot B is "small B". Both robots monitor voice information in the environment in real time. If robot A recognizes the preset wake-up word "small A", robot A acts as the first device and robot B as the second device; if robot B recognizes the preset wake-up word "small B", robot B acts as the first device and robot A as the second device.
In one example, the wireless connection between the first device and the second device may be any one or any combination of a Wi-Fi connection, an IEEE 802.11b/g/n connection, a 2.4 GHz connection, an intelligent gateway connection, a Bluetooth Mesh gateway connection, a ZigBee protocol gateway connection, a multimode network connection, an intelligent socket connection, an intelligent wireless switch connection, and the like.
In one example, the number of second devices is several, and a distribution schematic diagram of the user, the first device and the second device in the home may be shown in fig. 2, where a wireless connection is established between the first device and the second device a, the second device b, the second device c and the second device d.
Step 104, generating an instruction for the first device to execute according to the first voice recognition result, the second voice recognition result and the preset voice recognition credibility of each device.
In a specific implementation, after the first device obtains the first voice recognition result and the second voice recognition result, an instruction may be generated according to the first voice recognition result, the second voice recognition result and a preset voice recognition reliability of each device, so as to be executed by the first device, where the preset voice recognition reliability of each device may be set by a person skilled in the art according to actual needs.
In one example, the number of second devices is several, and the number of second voice recognition results is also several. The preset voice recognition credibility of each device represents the voice recognition capability and quality of that device, which are generally proportional to its performance: the higher a device's performance, the higher its voice recognition capability and quality, the higher its voice recognition credibility, and the more trustworthy its voice recognition result. The first device can select the most credible voice recognition result from the first voice recognition result and the several second voice recognition results and take that result as the instruction for the first device to execute.
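As a rough, non-authoritative sketch of this selection strategy (the names and data shapes below are assumptions, not taken from the patent), the most credible result could be chosen as follows:

```python
# Illustrative sketch only: choosing the single most credible result among
# the first device and several second devices, assuming each device's preset
# credibility is a plain float.

def select_most_credible(results):
    """results: list of (device_credibility, recognition_text) tuples.
    Returns the text recognized by the device with the highest credibility."""
    credibility, text = max(results, key=lambda item: item[0])
    return text

# Example: the first device (0.9) and two second devices (0.6, 0.8).
instruction = select_most_credible([
    (0.9, "result from the first device"),
    (0.6, "result from second device a"),
    (0.8, "result from second device b"),
])  # -> "result from the first device"
```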
In one example, the preset voice recognition credibility of each device may be credibility set by the user, including credibility values that the user actively enters in the background and credibility that the user sets unintentionally. For example, if the user asks a certain second device a question, the device repeatedly answers that it does not know, and the user then complains to the second device ("how bad you are"), this indicates that the user does not trust that second device, and the first device automatically sets the voice recognition credibility of that second device to a low value.
In the embodiments of the present application, compared with the technical solution in which the first device simply picks up the voice instruction issued by the user and executes the picked-up voice instruction, the first device picks up the voice information of the user and performs voice recognition on the picked-up voice information to generate a first voice recognition result, and then obtains a second voice recognition result generated by a second device, where the second device is a device with a sound pickup function and a voice recognition function and the second voice recognition result is generated by the second device performing voice recognition on the voice information it picks up itself. The first device then generates an instruction according to the first voice recognition result, the second voice recognition result and the preset voice recognition credibility of each device, and executes the generated instruction. This overcomes the limitation of a single device's pickup range, so that the voice instruction issued by the user can be received accurately, completely and quickly.
In one embodiment, the number of second devices is several, and the number of second voice recognition results is also several. In this case, the first device generates the instruction according to the first voice recognition result, the second voice recognition results and the preset voice recognition credibility of each device through the steps shown in fig. 3, which specifically include:
step 201, word segmentation is performed on the first voice recognition result, so as to obtain a plurality of first word segmentation fragments.
Step 202, word segmentation is performed on a plurality of second voice recognition results to obtain a plurality of second word segmentation fragments.
In a specific implementation, after a first device generates a first voice recognition result and obtains a plurality of second voice recognition results, the first device may perform word segmentation on the first voice recognition result according to a preset word segmentation dictionary to obtain a plurality of first word segmentation fragments, and perform word segmentation on the plurality of second voice recognition results to obtain a plurality of second word segmentation fragments, where the preset word segmentation dictionary may be set by a person skilled in the art according to actual needs, and embodiments of the present application are not limited in this specific way.
In one example, the first voice recognition result is "answer bank leaflet". The first device performs word segmentation on the first voice recognition result according to the preset word segmentation dictionary to obtain three first word segmentation fragments: "answer", "bank" and "leaflet". The second voice recognition result J is "print bank leaflet", the second voice recognition result K is "print bank leaflet", the second voice recognition result L is "print English-Chinese sheet", the second voice recognition result M is "print hidden sheet", and the second voice recognition result N is "print bank leaflet". The first device performs word segmentation on the second voice recognition results J, K, L, M and N according to the preset word segmentation dictionary to obtain the second word segmentation fragments "print", "bank", "English-Chinese", "hidden", "leaflet" and "sheet".
Step 203, determining the voice recognition credibility of a plurality of first word segmentation fragments and the voice recognition credibility of a plurality of second word segmentation fragments according to the preset voice recognition credibility of each device.
In a specific implementation, after the first device obtains a plurality of first word segmentation fragments and a plurality of second word segmentation fragments, the voice recognition credibility of the plurality of first word segmentation fragments and the voice recognition credibility of the plurality of second word segmentation fragments can be determined according to the preset voice recognition credibility of each device.
In one example, the voice recognition credibility of the first device is 0.9, that of second device J is 0.6, second device K is 0.6, second device L is 0.3, second device M is 0.8, and second device N is 0.4. The first device determines the voice recognition credibility of each word segmentation fragment as the sum of the credibility of every device whose recognition result contains that fragment, as follows: the credibility of "answer" is 0.9; the credibility of "print" is 0.6 + 0.6 + 0.3 + 0.8 + 0.4 = 2.7; the credibility of "bank" is 0.9 + 0.6 + 0.6 + 0.4 = 2.5; the credibility of "English-Chinese" is 0.3; the credibility of "hidden" is 0.8; the credibility of "leaflet" is 0.9 + 0.6 + 0.6 + 0.4 = 2.5; and the credibility of "sheet" is 0.3 + 0.8 = 1.1.
Step 204, generating the instruction according to the first word segmentation fragments, the second word segmentation fragments and the voice recognition credibility of the first and second word segmentation fragments.
In a specific implementation, after determining the voice recognition credibility of the plurality of first word segmentation fragments and the plurality of second word segmentation fragments, the first device may generate the instruction according to the first word segmentation fragments, the second word segmentation fragments and their voice recognition credibility.
In one example, the first device determines that the voice recognition credibility of "answer" is 0.9, of "print" is 2.7, of "bank" is 2.5, of "English-Chinese" is 0.3, of "hidden" is 0.8, of "leaflet" is 2.5, and of "sheet" is 1.1. The first device may compose a sentence from the word segmentation fragments with the highest voice recognition credibility, i.e., compose the sentence "print bank leaflet" from "print", "bank" and "leaflet"; this is the trusted sentence, and the first device takes "print bank leaflet" as the instruction.
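The segment-level credibility tally from this example can be sketched as follows. This is a minimal illustration only: it assumes word segmentation has already been performed (the patent uses a preset word segmentation dictionary) and aligns fragments naively by their position in each result, which the patent does not specify.

```python
from collections import defaultdict

# Sketch of the credibility tally in the example above. Segments are aligned
# naively by their position in each result; this alignment is an assumption.

def fuse_results(segmented_results):
    """segmented_results: list of (device_credibility, [fragments]) pairs,
    covering the first device and every second device.
    Returns the instruction composed of the most credible fragment per position."""
    scores = defaultdict(lambda: defaultdict(float))  # position -> fragment -> credibility
    for credibility, fragments in segmented_results:
        for position, fragment in enumerate(fragments):
            scores[position][fragment] += credibility
    best = [max(candidates, key=candidates.get)
            for _, candidates in sorted(scores.items())]
    return " ".join(best)

# Values from the example: first device 0.9, second devices J/K/L/M/N.
instruction = fuse_results([
    (0.9, ["answer", "bank", "leaflet"]),
    (0.6, ["print", "bank", "leaflet"]),           # second device J
    (0.6, ["print", "bank", "leaflet"]),           # second device K
    (0.3, ["print", "English-Chinese", "sheet"]),  # second device L
    (0.8, ["print", "hidden", "sheet"]),           # second device M
    (0.4, ["print", "bank", "leaflet"]),           # second device N
])
# instruction == "print bank leaflet"
```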
In this embodiment, word segmentation is performed on the first voice recognition result to obtain a plurality of first word segmentation fragments, and on the second voice recognition results to obtain a plurality of second word segmentation fragments; the voice recognition credibility of the first word segmentation fragments and of the second word segmentation fragments is determined according to the preset voice recognition credibility of each device; and the instruction is generated according to the first word segmentation fragments, the second word segmentation fragments and their voice recognition credibility. In the process of generating the instruction, the first device performs word segmentation on the first and second voice recognition results separately and combines the resulting fragments with the preset credibility of each device, so that the generated instruction is more accurate and the user experience is further improved.
Another embodiment of the present application relates to an instruction receiving method. In this embodiment, the number of second devices is several, and the number of second voice recognition results is also several. Implementation details of the instruction receiving method of this embodiment are described below; the following details are provided only for ease of understanding and are not necessary for implementing this embodiment. The specific flow of the instruction receiving method of this embodiment may be as shown in fig. 4 and includes:
step 301, picking up voice information of a user.
Step 302, performing voice recognition on the voice information to generate a first voice recognition result.
Step 303, obtaining a second speech recognition result generated by the second device.
Step 301 to step 303 are substantially the same as step 101 to step 103, and are not described herein.
Step 304, obtaining a position of a user, a position of a first device, and positions of a plurality of second devices.
Step 305, determining a relative position between the first device and the user based on the position of the user and the position of the first device.
Step 306, determining the relative positions between the plurality of second devices and the user according to the positions of the user and the positions of the plurality of second devices.
In a specific implementation, the first device may acquire the position of the user, the position of the first device, and the positions of the plurality of second devices, and after obtaining the position of the user, the position of the first device, and the positions of the plurality of second devices, the first device may determine the relative position between the first device and the user according to the position of the user and the position of the first device, and determine the relative position between the plurality of second devices and the user according to the position of the user and the positions of the plurality of second devices.
In an example, the first device may determine the position of the user, the position of the first device and the positions of the plurality of second devices through components of the first device such as a camera or Bluetooth positioning, and may also determine these positions through functions such as the Global Positioning System (GPS) or a 3D semantic map.
In one example, the relative position between a device and the user may include, but is not limited to, the distance between the device and the user, the number of walls between the device and the user, the number of rooms between the device and the user, the number of floors between the device and the user, and so on.
Step 307, generating an instruction according to the first voice recognition result, the second voice recognition result, the relative positions between the first device and the user, the relative positions between the plurality of second devices and the user and the preset voice recognition credibility of each device.
In a specific implementation, after determining the relative positions between the first device and the user and the relative positions between the plurality of second devices and the user, the first device may generate the instruction according to the first voice recognition result, the second voice recognition result, the relative positions between the first device and the user, the relative positions between the plurality of second devices and the user, and the preset voice recognition credibility of each device.
In one example, the first device assigns weights to the first speech recognition result and the plurality of second speech recognition results according to the distance between each device and the user, and the first device considers the speech recognition result of the device closer to the user to be more reliable and the speech recognition result of the device farther from the user to be less reliable.
In one example, the first device assigns weights to the first speech recognition result and the number of second speech recognition results based on the number of walls spaced between each device and the user, the first device considering that the speech recognition result of a device with a small number of walls spaced between the user is more trusted and the speech recognition result of a device with a large number of walls spaced between the user is less trusted.
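A minimal sketch of such position-based weighting is given below. The specific weighting formula (inverse distance, halving per wall) is an assumption chosen for illustration; the patent does not prescribe a formula.

```python
# Illustrative sketch only: weighting each device's preset credibility by its
# relative position to the user (here, the distance and the number of walls in
# between). The weighting scheme is an assumption, not taken from the patent.

def position_weighted_credibility(base_credibility, distance_m, walls_between):
    """Reduce a device's credibility the farther it is from the user and the
    more walls separate them."""
    distance_factor = 1.0 / (1.0 + distance_m)  # closer -> larger weight
    wall_factor = 0.5 ** walls_between          # each wall halves the weight
    return base_credibility * distance_factor * wall_factor

# A second device 1 m from the user with no wall in between outweighs a
# nominally more credible device 6 m away behind two walls.
near = position_weighted_credibility(0.6, distance_m=1.0, walls_between=0)  # 0.30
far = position_weighted_credibility(0.9, distance_m=6.0, walls_between=2)   # ~0.032
```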
In this embodiment, before generating the instruction according to the first voice recognition result, the second voice recognition result and the preset voice recognition credibility of each device, the method includes obtaining the position of the user, the position of the first device and the positions of the plurality of second devices; determining the relative position between the first device and the user according to the position of the user and the position of the first device; and determining the relative positions between the plurality of second devices and the user according to the position of the user and the positions of the plurality of second devices. Generating the instruction then includes generating it according to the first voice recognition result, the second voice recognition result, the relative position between the first device and the user, the relative positions between the plurality of second devices and the user, and the preset voice recognition credibility of each device. In practice, the first device and the second devices are distributed over a certain range in the home, with some devices relatively close to the user and others relatively far away; the relative position between each device and the user therefore has an important influence on the quality of that device's voice recognition result, and taking these relative positions into account when generating the instruction further improves the accuracy of the obtained instruction.
In one embodiment, the preset speech recognition reliability of each device includes the speech recognition reliability of the first device and the speech recognition reliability of the second device, where the speech recognition reliability of the second device is obtained by the first device through the steps as shown in fig. 5, and specifically includes:
Step 401, performing a plurality of dialogs with the second device through the voice, and obtaining a voice recognition result of the second device on the plurality of dialogs.
Step 402, comparing the speech recognition result of the second device for the several dialogues with the text content of the several dialogues, and determining the speech recognition reliability of the second device.
Specifically, the first device may conduct several dialogues with the second device by voice, obtain the second device's voice recognition results for these dialogues, and compare those results with the text content of the dialogues, so as to determine the voice recognition credibility of the second device.
In a specific implementation, the roles of the first device and the second device are interchangeable, and any device in the home that can act as the first device may conduct several dialogues with other devices by voice, so as to obtain the voice recognition credibility of those other devices.
In one example, the first device is a robot. When a second device is newly added in the home, the robot may start its camera to determine the position of the newly added second device, move to the vicinity of that device, and conduct several dialogues with it, so as to determine the voice recognition credibility of the newly added second device.
In one example, a first device may obtain a voice recognition confidence of the first device from a second device.
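The dialogue-based calibration of steps 401 and 402 might be sketched as follows; using difflib from the Python standard library as the similarity measure is an illustrative assumption, not something specified by the patent.

```python
# Minimal sketch, assuming the first device knows the exact text of each test
# dialogue it speaks to the second device; similarity is measured with difflib
# from the standard library as an illustrative choice only.

from difflib import SequenceMatcher

def calibrate_credibility(spoken_texts, recognized_texts):
    """Compare the second device's voice recognition results for several
    dialogues with the known text content and average the similarity as the
    second device's voice recognition credibility."""
    ratios = [
        SequenceMatcher(None, spoken, recognized).ratio()
        for spoken, recognized in zip(spoken_texts, recognized_texts)
    ]
    return sum(ratios) / len(ratios)

credibility = calibrate_credibility(
    ["please turn on the light", "what is the weather today"],
    ["please turn on the light", "what is the weather to play"],
)
```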
In this embodiment, the preset voice recognition credibility of each device includes the voice recognition credibility of the first device and of the second device. The voice recognition credibility of the second device is obtained by the first device by conducting several dialogues with the second device by voice, obtaining the second device's voice recognition results for those dialogues, and comparing the results with the text content of the dialogues to determine the credibility. Because the credibility of the second device is obtained in advance through several dialogues, it is objective and reliable, which further improves the accuracy of the obtained instruction.
In one embodiment, the voice recognition credibility of each device includes any combination of the following: the credibility of each device for dialects, for different languages, for different volume levels, in environments with different noise levels, and for users with different identity information. Different users have different living habits: some frequently use dialects, some frequently speak English, some speak quietly, and some households include children. The actual environments of the second devices also differ: some are near a window facing a road, while others are in a quiet study. By comprehensively considering these aspects of voice recognition credibility when generating the instruction, the first device can further improve the accuracy of the acquired instruction.
Another embodiment of the present application relates to an instruction receiving method. Implementation details of the instruction receiving method of this embodiment are described below; the following details are provided only for ease of understanding and are not necessary for implementing this embodiment. The specific flow of the instruction receiving method of this embodiment may be as shown in fig. 6 and includes:
Step 501, the voice information of the user is picked up.
Step 502, performing voice recognition on the voice information to generate a first voice recognition result.
Step 503, obtaining a second speech recognition result generated by the second device.
Step 501 to step 503 are substantially the same as step 101 to step 103, and are not described herein.
Step 504, obtaining voice information of the user picked up by the third device.
Step 505, performing voice recognition on the voice information of the user picked up by the third device, and generating a third voice recognition result.
In a specific implementation, after the first device generates the first voice recognition result and obtains the second voice recognition result, it may further obtain voice information of the user picked up by a third device and perform voice recognition on that voice information to generate a third voice recognition result. The third device is a device with a sound pickup function but without a voice recognition function; by also including such devices in its reference range, the first device effectively extends its sound pickup range.
In one example, the first device may execute step 504 and step 505 first and then execute step 503, or may execute step 502, step 503 and step 504 simultaneously.
Step 506, generating an instruction according to the first voice recognition result, the second voice recognition result, the third voice recognition result and the preset voice recognition credibility of each device.
In a specific implementation, after the first device generates the first voice recognition result and the third voice recognition result and obtains the second voice recognition result, the first device can generate the instruction according to the first voice recognition result, the second voice recognition result, the third voice recognition result and the preset voice recognition credibility of each device, so that the first device can acquire more accurate and complete voice instructions.
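As an illustrative sketch only, handling a third device (pickup but no recognition) could look like the following; recognize is a hypothetical stand-in for the first device's own voice recognition, and the naive whitespace split stands in for the word segmentation step.

```python
# Illustrative sketch only: a third device has pickup but no recognition, so
# the first device runs its own recognizer on the third device's raw audio and
# then treats the result like any other device's result, using a credibility
# value preset for that third device's pickup conditions.

def third_device_result(raw_audio, recognize, third_device_credibility):
    """recognize: the first device's own ASR function (hypothetical stand-in).
    Returns a (credibility, fragments) pair in the same shape used in the
    fusion sketch above; str.split() is a naive placeholder for word
    segmentation with the preset dictionary."""
    text = recognize(raw_audio)
    return (third_device_credibility, text.split())
```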
In this embodiment, the method includes obtaining voice information of the user picked up by the third device, where the third device is a device with a sound pickup function but without a voice recognition function, and performing voice recognition on that voice information to generate a third voice recognition result; the instruction is then generated according to the first voice recognition result, the second voice recognition result, the third voice recognition result and the preset voice recognition credibility of each device. By also referring to the voice information of the user picked up by the third device, the first device further extends its sound pickup range and can obtain a more accurate and complete instruction.
In one embodiment, after the first device generates the instruction, it can reply to the instruction by voice directly through the first device itself, or reply to the instruction by voice through a second device, so that the actual needs of the user can be better met and the user experience is further improved.
In one example, the first device may determine whether to reply to the instruction by voice via the first device itself or select to reply to the instruction by voice via a second device nearest to the user based on the relative location between the first device and the user and the relative locations between the plurality of second devices and the user.
In one example, the first device may reply to the instruction by voice through the first device itself while simultaneously replying to the instruction by voice through all of the second devices.
In one embodiment, the first device obtains the second speech recognition result generated by the second device, which may be implemented by the steps shown in fig. 7, and specifically includes:
step 601, obtaining screening information, wherein the screening information comprises any combination of identity information of a user, position information of the user and position information of each second device.
Step 602, determining a target second device according to the screening information and a preset corresponding relation.
In a specific implementation, when the first device generates the first voice recognition result, it may first obtain screening information, where the screening information includes one or more of the identity information of the user, the position information of the user and the position information of each second device. After obtaining the screening information, the first device may determine the target second device according to the screening information and a preset correspondence, where the preset correspondence includes the correspondence between the user's identity information and the target second device, between the user's position information and the target second device, and between each second device's position information and the target second device. In this way the first device determines the target second device, that is, the second device it needs to take into account.
In one example, the first device determines that the location information of the user is a kitchen, and the first device may determine that each second device in the kitchen is a target device according to the location information of the kitchen.
In one example, the first device determines that the identity information of the user is a child, and the first device may use a second device corresponding to the child in each second device as a target second device.
Step 603, obtaining a second speech recognition result generated by the target second device.
In a specific implementation, the first device only acquires the second voice recognition result generated by the target second device, so that the efficiency of receiving the instruction can be effectively improved.
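A minimal sketch of this screening step is shown below; the field names ("room", "owner") and the matching rules are assumptions for illustration, not the patent's preset correspondence.

```python
# Illustrative sketch only: filtering second devices down to the target second
# devices using screening information, mirroring the kitchen and child
# examples above. Field names "room" and "owner" are assumptions.

def select_target_devices(second_devices, user_identity=None, user_room=None):
    """second_devices: list of dicts like {"name": ..., "room": ..., "owner": ...}.
    Returns only the devices matching the screening information."""
    targets = []
    for device in second_devices:
        if user_room is not None and device.get("room") != user_room:
            continue
        if user_identity is not None and device.get("owner") not in (None, user_identity):
            continue
        targets.append(device)
    return targets

kitchen_targets = select_target_devices(
    [{"name": "smart speaker", "room": "kitchen", "owner": None},
     {"name": "smart TV", "room": "living room", "owner": None}],
    user_room="kitchen",
)  # -> only the kitchen smart speaker
```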
In this embodiment, obtaining the second voice recognition result generated by the second device includes: obtaining screening information, where the screening information includes any combination of the identity information of the user, the position information of the user and the position information of each second device; determining a target second device according to the screening information and a preset correspondence between the screening information and the target second device; and obtaining the second voice recognition result generated by the target second device. The target second device is determined among the second devices according to one or more of the user's identity information, the user's position information and each second device's position information, and only the voice recognition result generated by the target second device is obtained; that is, only the target second device is taken into account when receiving the instruction, which effectively improves the efficiency of instruction reception.
The above division of the method steps is only for clarity of description; when implemented, the steps may be combined into one step or split into multiple steps, and as long as they include the same logical relationship, they fall within the protection scope of this patent. Adding insignificant modifications to an algorithm or process, or introducing insignificant designs, without changing the core design of the algorithm and process, also falls within the protection scope of this patent.
Another embodiment of the present application relates to an instruction receiving system. Implementation details of the instruction receiving system of this embodiment are described below; the following details are provided only for ease of understanding and are not necessary for implementing this embodiment. Fig. 8 is a schematic diagram of the instruction receiving system of this embodiment. The instruction receiving system includes a first device 701 and a second device 702, where the second device 702 is a device having a sound pickup function and a voice recognition function.
The first device 701 is configured to obtain voice information of a user, perform voice recognition on the voice information, and generate a first voice recognition result.
The second device 702 is configured to pick up voice information of a user, perform voice recognition on the voice information, and generate a second voice recognition result.
The first device 701 is further configured to obtain a second speech recognition result, generate an instruction according to the first speech recognition result, the second speech recognition result, and a preset speech recognition reliability of each device, and execute the generated instruction.
It is to be noted that this embodiment is a system embodiment corresponding to the above-described method embodiment, and this embodiment may be implemented in cooperation with the above-described method embodiment. The related technical details and technical effects mentioned in the above embodiments are still valid in this embodiment, and in order to reduce repetition, they are not described here again. Accordingly, the related technical details mentioned in the present embodiment can also be applied to the above-described embodiments.
It should be noted that, each module involved in this embodiment is a logic module, and in practical application, one logic unit may be one physical unit, or may be a part of one physical unit, or may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present application, units less closely related to solving the technical problem presented by the present application are not introduced in the present embodiment, but it does not indicate that other units are not present in the present embodiment.
Another embodiment of the application is directed to an electronic device, as shown in fig. 9, comprising at least one processor 801, and a memory 802 communicatively coupled to the at least one processor 801, wherein the memory 802 stores instructions executable by the at least one processor 801, the instructions being executable by the at least one processor 801 to enable the at least one processor 801 to perform the instruction receiving method of the embodiments described above.
Where the memory and the processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, and the bus connects various circuits of the one or more processors and the memory together. The bus may also connect various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and therefore are not described further herein. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over a wireless medium via an antenna, and the antenna also receives data and forwards it to the processor.
The processor is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management and other control functions. The memory may be used to store data used by the processor when performing operations.
Another embodiment of the present application relates to a cloud server, as shown in fig. 10, including at least one processor 901, and a memory 902 communicatively connected to the at least one processor 901, wherein the memory 902 stores instructions executable by the at least one processor 901, and the instructions are executed by the at least one processor 901 to enable the at least one processor 901 to perform the instruction receiving method in the above embodiments.
Where the memory and the processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, and the bus connects various circuits of the one or more processors and the memory together. The bus may also connect various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and therefore are not described further herein. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over a wireless medium via an antenna, and the antenna also receives data and forwards it to the processor.
The processor is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management and other control functions. The memory may be used to store data used by the processor when performing operations.
Another embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, it will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a device (which may be a single-chip microcomputer, a chip or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the application and that various changes in form and details may be made therein without departing from the spirit and scope of the application.

Claims (11)

1. An instruction receiving method applied to a first device, comprising:
Picking up voice information of a user;
Performing voice recognition on the voice information to generate a first voice recognition result;
Obtaining a second voice recognition result generated by a second device, wherein the second device is a device having a sound pickup function and a voice recognition function, and the second voice recognition result is generated by the second device performing voice recognition on voice information of the user picked up by the second device;
Generating an instruction for the first device to execute according to the first voice recognition result, the second voice recognition result and preset voice recognition credibility of each device;
Wherein the generating an instruction according to the first voice recognition result, the second voice recognition result and the preset voice recognition credibility of each device comprises: performing word segmentation on the first voice recognition result to obtain a plurality of first word segmentation fragments; performing word segmentation on the second voice recognition result to obtain a plurality of second word segmentation fragments; determining the voice recognition credibility of the plurality of first word segmentation fragments and the voice recognition credibility of the plurality of second word segmentation fragments according to the preset voice recognition credibility of each device; and generating the instruction according to the plurality of first word segmentation fragments, the voice recognition credibility of the plurality of first word segmentation fragments, the plurality of second word segmentation fragments and the voice recognition credibility of the plurality of second word segmentation fragments.
2. The instruction receiving method according to claim 1, wherein before the generating an instruction according to the first voice recognition result, the second voice recognition result and the preset voice recognition credibility of each device, the method comprises:
Acquiring the position of the user, the position of the first device and the positions of a plurality of second devices;
Determining a relative position between the first device and the user according to the position of the user and the position of the first device;
Determining relative positions between the plurality of second devices and the user according to the position of the user and the positions of the plurality of second devices;
The generating an instruction according to the first voice recognition result, the second voice recognition result and the preset voice recognition credibility of each device comprises the following steps:
Generating an instruction according to the first voice recognition result, the second voice recognition result, the relative position between the first device and the user, the relative positions between the plurality of second devices and the user, and the preset voice recognition credibility of each device.
3. The instruction receiving method according to claim 1, wherein the preset voice recognition credibility of each device includes the voice recognition credibility of the first device and the voice recognition credibility of the second device, and the voice recognition credibility of the second device is obtained by the first device through the following steps:
Performing a plurality of voice conversations with the second device, and acquiring the voice recognition results of the second device for the plurality of conversations;
Comparing the voice recognition results of the second device for the plurality of conversations with the text contents of the plurality of conversations, to determine the voice recognition credibility of the second device.
4. The instruction receiving method according to claim 1 or 3, wherein the voice recognition credibility of each device includes any combination of: the voice recognition credibility of each device for dialects, the voice recognition credibility of each device for different languages, the voice recognition credibility of each device for different volume levels, the voice recognition credibility of each device in environments with different noise levels, and the voice recognition credibility of each device for users with different identity information.
5. The instruction receiving method according to claim 1, wherein before the generating an instruction according to the first voice recognition result, the second voice recognition result and the preset voice recognition credibility of each device, the method comprises:
Acquiring voice information of the user picked up by a third device, wherein the third device is a device having a sound pickup function but no voice recognition function;
Performing voice recognition on the voice information of the user picked up by the third device to generate a third voice recognition result;
The generating an instruction according to the first voice recognition result, the second voice recognition result and the preset voice recognition credibility of each device comprises the following steps:
Generating an instruction according to the first voice recognition result, the second voice recognition result, the third voice recognition result and preset voice recognition credibility of each device.
6. The instruction receiving method according to claim 1, wherein after the instruction is generated, the method further comprises:
Replying to the instruction by voice directly;
Or replying to the instruction by voice through the second device.
7. The instruction receiving method according to claim 1, wherein the obtaining the second voice recognition result generated by the second device comprises:
Acquiring screening information, wherein the screening information comprises any combination of identity information of the user, position information of the user and position information of each second device;
Determining a target second device according to the screening information and a preset correspondence, wherein the correspondence is a correspondence between the screening information and the target second device;
Obtaining the second voice recognition result generated by the target second device.
8. An instruction receiving system, characterized by comprising a first device and a second device, wherein the second device is a device having a sound pickup function and a voice recognition function;
The first device is configured to acquire voice information of a user, perform voice recognition on the voice information and generate a first voice recognition result;
The second device is configured to pick up voice information of the user, perform voice recognition on the voice information and generate a second voice recognition result, wherein there are a plurality of the second devices;
The first device is further configured to obtain the second voice recognition results, generate an instruction according to the first voice recognition result, the second voice recognition results and preset voice recognition credibility of each device, and execute the instruction; wherein the generating an instruction according to the first voice recognition result, the second voice recognition results and the preset voice recognition credibility of each device comprises: performing word segmentation on the first voice recognition result to obtain a plurality of first word segmentation fragments; performing word segmentation on the plurality of second voice recognition results to obtain a plurality of second word segmentation fragments; determining the voice recognition credibility of the plurality of first word segmentation fragments and the voice recognition credibility of the plurality of second word segmentation fragments according to the preset voice recognition credibility of each device; and generating the instruction according to the plurality of first word segmentation fragments, the voice recognition credibility of the plurality of first word segmentation fragments, the plurality of second word segmentation fragments and the voice recognition credibility of the plurality of second word segmentation fragments.
9. An electronic device, comprising:
At least one processor, and
A memory communicatively coupled to the at least one processor, wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the instruction receiving method of any one of claims 1 to 7.
10. A cloud server, comprising:
At least one processor, and
A memory communicatively coupled to the at least one processor, wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the instruction receiving method of any one of claims 1 to 7.
11. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the instruction receiving method of any one of claims 1 to 7.
CN202111115408.XA 2021-09-23 2021-09-23 Instruction receiving method, system, electronic device, cloud server and storage medium Active CN113889102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111115408.XA CN113889102B (en) 2021-09-23 2021-09-23 Instruction receiving method, system, electronic device, cloud server and storage medium

Publications (2)

Publication Number Publication Date
CN113889102A CN113889102A (en) 2022-01-04
CN113889102B true CN113889102B (en) 2025-05-02

Family

ID=79010340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111115408.XA Active CN113889102B (en) 2021-09-23 2021-09-23 Instruction receiving method, system, electronic device, cloud server and storage medium

Country Status (1)

Country Link
CN (1) CN113889102B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288997A (en) * 2019-07-22 2019-09-27 苏州思必驰信息科技有限公司 Device wake-up method and system for acoustic networking
CN111696562A (en) * 2020-04-29 2020-09-22 华为技术有限公司 Voice wake-up method, device and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6898567B2 (en) * 2001-12-29 2005-05-24 Motorola, Inc. Method and apparatus for multi-level distributed speech recognition
US10049669B2 (en) * 2011-01-07 2018-08-14 Nuance Communications, Inc. Configurable speech recognition system using multiple recognizers
EP2678861B1 (en) * 2011-02-22 2018-07-11 Speak With Me, Inc. Hybridized client-server speech recognition
WO2014064324A1 (en) * 2012-10-26 2014-05-01 Nokia Corporation Multi-device speech recognition
CN108461084A (en) * 2018-03-01 2018-08-28 广东美的制冷设备有限公司 Speech recognition system control method, control device and computer readable storage medium
CN110377716B (en) * 2019-07-23 2022-07-12 百度在线网络技术(北京)有限公司 Interaction method and device for conversation and computer readable storage medium

Also Published As

Publication number Publication date
CN113889102A (en) 2022-01-04

Similar Documents

Publication Publication Date Title
JP6828001B2 (en) Voice wakeup method and equipment
CN107644638B (en) Audio recognition method, device, terminal and computer readable storage medium
CN108694940B (en) Voice recognition method and device and electronic equipment
JP6400129B2 (en) Speech synthesis method and apparatus
CN111261151B (en) Voice processing method and device, electronic equipment and storage medium
JP2020505643A (en) Voice recognition method, electronic device, and computer storage medium
JP6276400B2 (en) Control device and message output control system
CN108269567A (en) For generating the method, apparatus of far field voice data, computing device and computer readable storage medium
KR20190046631A (en) System and method for natural language processing
CN110349575A (en) Method, apparatus, electronic equipment and the storage medium of speech recognition
US12002451B1 (en) Automatic speech recognition
US11531789B1 (en) Floor plan generation for device visualization and use
CN110473542B (en) Awakening method and device for voice instruction execution function and electronic equipment
US12190883B2 (en) Speaker recognition adaptation
US11900921B1 (en) Multi-device speech processing
CN111414760B (en) Natural language processing method, related equipment, system and storage device
US10952075B2 (en) Electronic apparatus and WiFi connecting method thereof
CN111968643A (en) Intelligent recognition method, robot and computer readable storage medium
US20220161131A1 (en) Systems and devices for controlling network applications
CN113889102B (en) Instruction receiving method, system, electronic device, cloud server and storage medium
CN117238275B (en) Speech synthesis model training method, device and synthesis method based on common sense reasoning
CN109712606A (en) A kind of information acquisition method, device, equipment and storage medium
CN113597641A (en) Voice processing method, device and system
US11670294B2 (en) Method of generating wakeup model and electronic device therefor
JP7055327B2 (en) Conversation collection device, conversation collection system and conversation collection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant