CN114236469A - A method and system for robot speech recognition and localization - Google Patents
- Publication number
- CN114236469A (application CN202111361624.2A)
- Authority
- CN
- China
- Prior art keywords
- microphone
- robot
- time delay
- speech signal
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
- G01S5/186—Determination of attitude
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J13/00—Controls for manipulators
- B25J13/003—Controls for manipulators by means of an audio-responsive input
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Multimedia (AREA)
- Robotics (AREA)
- Mechanical Engineering (AREA)
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
Abstract
The invention discloses a robot speech recognition and localization method, which comprises the steps of: S1, constructing a microphone array model; S2, collecting the voice signals of the microphone elements and preprocessing the voice signals; S3, carrying out matching recognition on the preprocessed voice signals; S4, respectively calculating the relative time delays of the matched voice signals between the microphone elements; and S5, estimating the position and posture of the robot according to the calculated relative time delays. The invention can acquire the voice signals in the environment in real time and accurately determine the position of the robot through sound source localization, thereby ensuring localization accuracy while improving the computational efficiency of the robot localization algorithm.
Description
Technical Field
The invention relates to the technical field of robot self-perception and localization, and in particular to a robot speech recognition and localization method and system.
Background
In daily life, the interaction modes among people mainly comprise voice, vision, gestures and other forms, wherein the voice is the simplest and most efficient interaction mode and is also most consistent with the communication habits of people. The voice recognition technology is a research hotspot in recent years, has made great progress, and is applied to many fields, such as vehicle-mounted equipment, games, intelligent household appliances, and the like. The voice recognition technology enables the machine to understand the content spoken by the user, frees both hands of the user, and improves human-computer interaction experience.
The emphasis of speech recognition differs across applications. Some cases need only certain keywords to be recognized, such as motion control based on speech keywords; some scenes require that all the characters contained in the speech be recognized as accurately as possible, such as voice input; still other situations require not only complete recognition of the text but also insight into the speaker's emotional state. For a good human-computer interaction experience, sound source localization technology is as indispensable as speech recognition: only when the machine knows the direction of the speaker can it respond with targeted actions, and combining localization information with vision and other modalities opens up further functional scenarios. Although speech technology has been widely applied in many fields, it has not yet been fully adopted in the robot industry, and some technical problems remain to be solved.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a robot voice recognition positioning method and system.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
in a first aspect, the present invention provides a robot voice recognition positioning method, including the following steps:
S1, constructing a microphone array model;
S2, collecting voice signals of the microphone elements and preprocessing the voice signals;
S3, carrying out matching recognition on the preprocessed voice signals;
S4, respectively calculating the relative time delay of the voice signals matched between the microphone elements;
and S5, estimating the position and the posture of the robot according to the calculated relative time delay.
Further, the step S1 specifically includes:
Four microphones in a world coordinate system form a quaternary cross-shaped microphone array: the center of the array is located at the origin of the world coordinate system, each microphone lies on a coordinate axis, and all microphones are equidistant from the origin.
Further, the step S2 specifically includes the following sub-steps:
S2-1, collecting voice signals of the microphone elements;
S2-2, sampling the collected voice signals by adopting a set sampling frequency;
S2-3, carrying out high-frequency boosting processing on the sampled voice signal;
S2-4, framing the processed voice signal;
S2-5, windowing the processed voice signal;
and S2-6, carrying out endpoint detection on the processed voice signal by adopting a short-time energy and short-time average zero-crossing rate method.
Further, the step S3 specifically includes the following sub-steps:
S3-1, respectively extracting linear prediction coefficient characteristics and frequency cepstrum coefficient characteristics from the preprocessed voice signals, and establishing a speech recognition feature vector sequence;
S3-2, calculating a frame matching distance matrix of each speech recognition feature vector sequence and a known speech recognition feature vector sequence;
and S3-3, recursively searching for the speech signal with the minimum matching distance in the frame matching distance matrix as the recognition result.
Further, the step S3-3 of recursively searching for the speech signal with the minimum matching distance in the frame matching distance matrix specifically includes the following sub-steps:
S3-1-1, constructing a search objective function:
D(i,j)=|t(i)-r(j)|+min{D(i-1,j),D(i-1,j-1),D(i,j-1)}
in the formula, D(i,j) represents the matching distance between the ith feature in the speech recognition feature vector sequence and the jth feature in the known speech recognition feature vector sequence, t(i) represents the ith feature value in the speech recognition feature vector sequence, and r(j) represents the jth feature value in the known speech recognition feature vector sequence;
the constraint condition is:
D(1,1)=|t(1)-r(1)|
and S3-1-2, starting from D(1,1), calculating the values of D(i,j) row by row or column by column, and finally comparing all the calculated values to select the speech signal with the minimum matching distance.
Further, the calculation formula of the relative time delay in step S4 is as follows:
τ₁₂ = argmax E(α·s(n−τ₁)·s(n−τ₁−τ₂))
in the formula, τ₁₂ represents the relative time delay of the speech signal between microphone 1 and microphone 2, E represents the mathematical expectation, α represents the attenuation coefficient of the speech signal, s(n) represents the speech signal, and τ₁, τ₂ represent the times of arrival of the sound signal at microphone 1 and microphone 2.
Further, in step S5, the expression for estimating the distance between the robot and the sound source according to the calculated relative time delay is as follows:
wherein c represents the speed of sound, τ₁₂ represents the relative time delay of the speech signal between microphone 1 and microphone 2, τ₁₃ the relative time delay between microphone 1 and microphone 3, and τ₁₄ the relative time delay between microphone 1 and microphone 4.
Further, the expression for estimating the robot azimuth angle according to the calculated relative time delay in step S5 is as follows:
in the formula, τ₁₂ represents the relative time delay of the speech signal between microphone 1 and microphone 2, τ₁₃ the relative time delay between microphone 1 and microphone 3, and τ₁₄ the relative time delay between microphone 1 and microphone 4.
Further, the expression for estimating the pitch angle of the robot according to the calculated relative time delay in step S5 is as follows:
wherein c represents the speed of sound, τ₁₂ represents the relative time delay of the speech signal between microphone 1 and microphone 2, τ₁₃ the relative time delay between microphone 1 and microphone 3, τ₁₄ the relative time delay between microphone 1 and microphone 4, and d represents the distance from each microphone element to the origin of the world coordinate system.
In a second aspect, the present invention further provides a robot voice recognition positioning system, including:
the constructing module is used for constructing a microphone array model;
the acquisition module is used for collecting the voice signals of the microphone elements and preprocessing them;
the recognition module is used for carrying out matching recognition on the preprocessed voice signals;
and the estimation module is used for respectively calculating the relative time delay of the voice signals matched between the microphone elements and estimating the pose of the robot according to the calculated relative time delay.
The invention has the following beneficial effects:
according to the method, the microphone array model is built, the voice signals of the microphone elements are preprocessed and then are matched and identified, and finally the relative time delay of the matched voice signals among the microphone elements is calculated respectively to estimate the pose of the robot, so that the voice signals in the environment are acquired in real time, the accurate position of the robot is accurately determined through sound source positioning, and the calculation efficiency of a robot positioning algorithm can be improved while the positioning accuracy is ensured.
Drawings
FIG. 1 is a schematic flow chart of a robot speech recognition positioning method according to the present invention;
fig. 2 is a schematic structural diagram of a robot speech recognition positioning system of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to help those skilled in the art understand the invention, but it should be understood that the invention is not limited to the scope of these embodiments. For those skilled in the art, various changes are possible without departing from the spirit and scope of the invention as defined in the appended claims, and all matter produced using the inventive concept is protected.
As shown in fig. 1, an embodiment of the present invention provides a robot speech recognition positioning method, including the following steps S1 to S5:
S1, constructing a microphone array model;
in an optional embodiment of the present invention, step S1 specifically includes:
Four microphones in a world coordinate system form a quaternary cross-shaped microphone array: the center of the array is located at the origin of the world coordinate system, each microphone lies on a coordinate axis, and all microphones are equidistant from the origin.
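To make the array geometry of step S1 concrete, the four element positions can be written out directly. This is a sketch under stated assumptions: the assignment of microphone indices to axes (1 and 2 on the x axis, 3 and 4 on the y axis) and the 5 cm arm length are illustrative choices, not taken from the patent text.

```python
import numpy as np

# Sketch of the quaternary cross-shaped microphone array of step S1.
# The index-to-axis assignment (mics 1/2 on x, mics 3/4 on y) is an assumption.
def cross_array(d):
    """Positions (x, y) of the four microphones, each at distance d from the origin."""
    return np.array([
        [ d,  0.0],   # microphone 1
        [-d,  0.0],   # microphone 2
        [0.0,   d],   # microphone 3
        [0.0,  -d],   # microphone 4
    ])

mics = cross_array(0.05)  # 5 cm arm length (illustrative)
```

Any arm length d works; only the equidistance of the elements and the centering of the array at the world-frame origin matter for the construction described above.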
S2, collecting voice signals of the microphone elements and preprocessing the voice signals;
in an optional embodiment of the present invention, step S2 specifically includes the following sub-steps:
S2-1, collecting the voice signals received by the microphone elements;
Specifically, each microphone element in the quaternary cross-shaped microphone array constructed in step S1 collects and receives the voice signal emitted by the sound source.
S2-2, sampling the collected voice signals by adopting a set sampling frequency;
specifically, the present invention samples the collected voice signal with a sampling frequency equal to or higher than twice the voice signal frequency, and converts the sampled voice signal from an analog signal to a digital signal.
S2-3, carrying out high-frequency lifting processing on the sampled voice signal;
specifically, the invention firstly performs pre-filtering on the sampled voice signal by adopting an anti-aliasing filter, and then performs high-frequency boosting processing on the voice signal, thereby smoothing the frequency spectrum of the voice signal.
The high-frequency boosting formula adopted by the invention takes the standard first-order pre-emphasis form:
y(n) = x(n) − a·x(n−1)
in the formula, y(n) is the output signal after high-frequency boosting, a is the boosting coefficient, and x(n), x(n−1) are the input sample values at the current moment and the previous moment respectively.
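The pre-emphasis step above can be sketched as a one-line filter. This assumes the standard first-order form y(n) = x(n) − a·x(n−1); the coefficient value a = 0.95 used as a default here is a typical choice and is not taken from the patent.

```python
import numpy as np

# Hedged sketch of the high-frequency boosting (pre-emphasis) step S2-3.
# The default coefficient a = 0.95 is a common value, assumed for illustration.
def pre_emphasis(x, a=0.95):
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                  # no previous sample exists at n = 0
    y[1:] = x[1:] - a * x[:-1]   # y(n) = x(n) - a*x(n-1)
    return y
```

Subtracting a scaled copy of the previous sample attenuates slowly varying (low-frequency) content, which boosts the relative level of the high frequencies as the step intends.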
S2-4, framing the processed voice signal;
specifically, the invention divides the voice signal into a plurality of time segments, namely voice signal frames, by adopting a set time interval.
S2-5, windowing the processed voice signal;
specifically, the invention adopts a Hamming window to carry out windowing processing on the voice signal frame, thereby enabling the voice signal frame to be more stable.
The Hamming window expression adopted by the invention is the standard form:
w(n) = 0.54 − 0.46·cos(2πn/(N−1)), 0 ≤ n ≤ N−1
where N is the frame length.
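Framing (S2-4) and Hamming windowing (S2-5) can be sketched together. The frame length and frame shift below are illustrative assumptions, not values from the patent; the window uses the standard Hamming coefficients 0.54 and 0.46.

```python
import numpy as np

# Sketch of framing and Hamming windowing (steps S2-4 and S2-5).
# frame_len and frame_shift are illustrative assumptions.
def frame_and_window(x, frame_len=256, frame_shift=128):
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    # Standard Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(frame_len) / (frame_len - 1))
    frames = np.stack([x[i * frame_shift : i * frame_shift + frame_len]
                       for i in range(n_frames)])
    return frames * w   # taper each frame toward its edges
```

The 50% overlap between consecutive frames (shift = half the frame length) is a common choice that keeps the tapered edges of one frame covered by the center of the next.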
and S2-6, carrying out end point detection on the processed voice signal by adopting a short-time energy and short-time average zero crossing rate method.
Specifically, the invention first determines a high threshold parameter and a low threshold parameter from the short-time energy and the short-time average zero-crossing rate. An initial endpoint of the voice signal is then judged according to the high threshold parameter. Near the determined initial endpoint, a secondary endpoint is searched for at the point where the short-time average amplitude falls to the low threshold parameter. Finally, a segmentation threshold is set according to the mean of the short-time average zero-crossing rate, and the point near the secondary endpoint where the short-time average zero-crossing rate falls to a set multiple of the segmentation threshold is taken as the finally detected endpoint.
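A minimal sketch of the double-threshold idea behind step S2-6, using short-time energy only; the zero-crossing-rate refinement described above is omitted, and the threshold ratios are simplified illustrative assumptions.

```python
import numpy as np

# Simplified sketch of double-threshold endpoint detection (step S2-6):
# a high energy threshold finds the initial endpoints, and a low threshold
# extends them outward. Threshold ratios are illustrative assumptions.
def short_time_energy(frames):
    return np.sum(frames ** 2, axis=1)

def detect_endpoints(frames, high_ratio=0.5, low_ratio=0.1):
    e = short_time_energy(frames)
    high, low = high_ratio * e.max(), low_ratio * e.max()
    active = np.where(e > high)[0]          # frames above the high threshold
    start, end = active[0], active[-1]      # initial endpoints
    while start > 0 and e[start - 1] > low:  # extend back to the low threshold
        start -= 1
    while end < len(e) - 1 and e[end + 1] > low:
        end += 1
    return start, end
```

In the full method of the text, the zero-crossing rate is used to extend the detected segment further into low-energy but high-frequency regions such as unvoiced consonants.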
S3, carrying out matching recognition on the preprocessed voice signals;
in an optional embodiment of the present invention, step S3 specifically includes the following sub-steps:
s3-1, respectively extracting linear prediction coefficient characteristics and frequency cepstrum coefficient characteristics from the preprocessed voice signals, and establishing a voice recognition characteristic vector sequence;
s3-2, calculating a frame matching distance matrix of each speech recognition feature vector sequence and a known speech recognition feature vector sequence;
and S3-3, recursively searching the speech signal with the minimum matching distance in the frame matching distance matrix as a recognition result.
Specifically, the recursive search for the speech signal with the minimum matching distance in the frame matching distance matrix comprises the following sub-steps:
S3-1-1, constructing a search objective function:
D(i,j)=|t(i)-r(j)|+min{D(i-1,j),D(i-1,j-1),D(i,j-1)}
in the formula, D (i, j) represents the matching distance between the ith feature in the speech recognition feature vector sequence and the jth feature in the known speech recognition feature vector sequence, t (i) represents the ith feature value in the speech recognition feature vector sequence, and r (j) represents the jth feature value in the known speech recognition feature vector sequence;
the constraint conditions are as follows:
D(1,1)=|t(1)-r(1)|
and S3-1-2, starting from D(1,1), calculating the values of D(i,j) row by row or column by column, and finally comparing all the calculated values to select the speech signal with the minimum matching distance.
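The recursion of steps S3-1-1 and S3-1-2 is a dynamic-time-warping fill of the frame matching distance matrix. The following sketch works over one-dimensional feature sequences for brevity; real feature vectors would replace the scalar absolute difference with a vector distance.

```python
import numpy as np

# DTW frame-matching distance of steps S3-1-1 / S3-1-2:
# D(i,j) = |t(i)-r(j)| + min(D(i-1,j), D(i-1,j-1), D(i,j-1)),
# with the constraint D(1,1) = |t(1)-r(1)| (0-based indices below).
def dtw_distance(t, r):
    n, m = len(t), len(r)
    D = np.full((n, m), np.inf)
    D[0, 0] = abs(t[0] - r[0])              # the constraint condition
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(D[i - 1, j] if i > 0 else np.inf,
                       D[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
                       D[i, j - 1] if j > 0 else np.inf)
            D[i, j] = abs(t[i] - r[j]) + prev
    return D[n - 1, m - 1]

# Recognition: the known template with the smallest matching distance wins.
def recognize(t, templates):
    return min(templates, key=lambda name: dtw_distance(t, templates[name]))
```

Filling the matrix row by row (or column by column) from D(1,1), as S3-1-2 states, guarantees that the three neighbors needed at each cell have already been computed.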
S4, respectively calculating the relative time delay of the voice signals matched between the microphone elements;
in an alternative embodiment of the present invention, the calculation formula of the relative time delay in step S4 is:
τ₁₂ = argmax E(α·s(n−τ₁)·s(n−τ₁−τ₂))
in the formula, τ₁₂ represents the relative time delay of the speech signal between microphone 1 and microphone 2, E represents the mathematical expectation, α represents the attenuation coefficient of the speech signal, s(n) represents the speech signal, and τ₁, τ₂ represent the times of arrival of the sound signal at microphone 1 and microphone 2.
In the present invention, microphone 1 is used as the reference microphone; the relative time delays τ₁₃ and τ₁₄ of microphone 3 and microphone 4 with respect to microphone 1 can be calculated in the same way from the above formula, which is not repeated here.
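In practice the mathematical expectation in step S4 is replaced by a sample cross-correlation over a finite recording, a common practical substitute; the sketch below uses that substitution and is not claimed to be the patent's exact estimator. The sign convention (positive delay means the second signal arrives later) is an assumption.

```python
import numpy as np

# Cross-correlation estimate of the relative delay of step S4: the lag that
# maximizes the sample cross-correlation stands in for the expectation
# E(alpha * s(n - tau1) * s(n - tau1 - tau2)) of the text.
def relative_delay(s1, s2):
    """Return the delay of s2 relative to s1, in samples (positive = s2 later)."""
    corr = np.correlate(s2, s1, mode="full")   # lags -(N-1) .. N-1
    return np.argmax(corr) - (len(s1) - 1)
```

With microphone 1 as the reference, calling `relative_delay` on the pairs (mic 1, mic 2), (mic 1, mic 3), and (mic 1, mic 4) yields the sample-domain counterparts of τ₁₂, τ₁₃, and τ₁₄; dividing by the sampling frequency converts them to seconds.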
And S5, estimating the position and the posture of the robot according to the calculated relative time delay.
In an alternative embodiment of the invention, the robot pose to be taken into account by the invention comprises the robot-to-sound source distance, the robot azimuth angle and the robot pitch angle.
The expression of estimating the distance between the robot and the sound source according to the calculated relative time delay is as follows:
wherein c represents the speed of sound, τ₁₂ represents the relative time delay of the speech signal between microphone 1 and microphone 2, τ₁₃ the relative time delay between microphone 1 and microphone 3, and τ₁₄ the relative time delay between microphone 1 and microphone 4.
The expression of estimating the azimuth angle of the robot according to the calculated relative time delay is as follows:
in the formula, τ₁₂ represents the relative time delay of the speech signal between microphone 1 and microphone 2, τ₁₃ the relative time delay between microphone 1 and microphone 3, and τ₁₄ the relative time delay between microphone 1 and microphone 4.
The expression of estimating the pitch angle of the robot according to the calculated relative time delay is as follows:
wherein c represents the speed of sound, τ₁₂ represents the relative time delay of the speech signal between microphone 1 and microphone 2, τ₁₃ the relative time delay between microphone 1 and microphone 3, τ₁₄ the relative time delay between microphone 1 and microphone 4, and d represents the distance from each microphone element to the origin of the world coordinate system.
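The patent's closed-form pose expressions are images not reproduced in this text, so the following is not the patent's formula. It is the standard far-field azimuth relation for an assumed cross-array layout (microphone 1 at (d, 0), 2 at (−d, 0), 3 at (0, d), 4 at (0, −d)) with delays τ₁ₖ defined as arrival time at microphone 1 minus arrival time at microphone k; under those assumptions the unknowns d and c cancel in the ratio.

```python
import numpy as np

# Hedged far-field azimuth sketch for step S5 (NOT the patent's expression).
# Assumed layout: mic 1 at (d,0), mic 2 at (-d,0), mic 3 at (0,d), mic 4 at (0,-d);
# tau_1k = (arrival time at mic 1) - (arrival time at mic k).
# Far field: tau12 = -2*d*cos(phi)/c and tau13 - tau14 = 2*d*sin(phi)/c,
# so d and c cancel in the ratio.
def azimuth_from_delays(tau12, tau13, tau14):
    """Source azimuth phi (radians) from the three delays relative to mic 1."""
    return np.arctan2(tau13 - tau14, -tau12)
```

Using `arctan2` rather than `arctan` of the plain ratio keeps the correct quadrant over the full circle of possible azimuths.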
As shown in fig. 2, the present invention further provides a robot voice recognition positioning system, including:
the constructing module is used for constructing a microphone array model;
the acquisition module is used for collecting the voice signals of the microphone elements and preprocessing them;
the recognition module is used for carrying out matching recognition on the preprocessed voice signals;
and the estimation module is used for respectively calculating the relative time delay of the voice signals matched between the microphone elements and estimating the pose of the robot according to the calculated relative time delay.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principle and implementation of the invention have been explained herein through specific embodiments; the description of the embodiments is only intended to help in understanding the method and its core idea. Meanwhile, those skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111361624.2A CN114236469A (en) | 2021-11-17 | 2021-11-17 | A method and system for robot speech recognition and localization |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111361624.2A CN114236469A (en) | 2021-11-17 | 2021-11-17 | A method and system for robot speech recognition and localization |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN114236469A true CN114236469A (en) | 2022-03-25 |
Family
ID=80749832
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111361624.2A Pending CN114236469A (en) | 2021-11-17 | 2021-11-17 | A method and system for robot speech recognition and localization |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114236469A (en) |
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102982803A (en) * | 2012-12-11 | 2013-03-20 | 华南师范大学 | Isolated word speech recognition method based on HRSF and improved DTW algorithm |
| CN103308889A (en) * | 2013-05-13 | 2013-09-18 | 辽宁工业大学 | Passive sound source two-dimensional DOA (direction of arrival) estimation method under complex environment |
| CN106162431A (en) * | 2015-04-02 | 2016-11-23 | 钰太芯微电子科技(上海)有限公司 | The beam positioning system of giant-screen mobile terminal |
| WO2016183791A1 (en) * | 2015-05-19 | 2016-11-24 | 华为技术有限公司 | Voice signal processing method and device |
| CN104991573A (en) * | 2015-06-25 | 2015-10-21 | 北京品创汇通科技有限公司 | Locating and tracking method and apparatus based on sound source array |
Non-Patent Citations (6)
| Title |
|---|
| 杜鹃: "基于支持向量机的说话人识别", 中国优秀硕士学位论文全文数据库信息科技辑, no. 03, 15 September 2007 (2007-09-15), pages 11 - 15 * |
| 段生全: "分频匹配在语音识别与控制中的应用", 声学与电子工程, no. 3, 31 December 2005 (2005-12-31), pages 25 - 27 * |
| 谢迎春, 于湘珍, 刘建平, 张卫华: "基于多特征有效组合的说话人识别", 现代电子技术, no. 09, 1 September 2005 (2005-09-01) * |
| 陈克兴 等: "设备状态监测与故障诊断技术", 31 August 1991, 科学技术文献出版社, pages: 53 * |
| 靳晓强 等: "移动机器人语音定向算法及其实现", 计算机仿真, vol. 29, no. 11, 30 November 2012 (2012-11-30), pages 223 - 226 * |
| 韩鸿鸾 等: "工业机器人的组成一体化教程", 31 December 2020, 西安电子科技大学出版社, pages: 390 * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115972231A (en) * | 2023-01-06 | 2023-04-18 | 西北工业大学 | A swarm robot system and method based on acoustic perception processing |
| CN115972231B (en) * | 2023-01-06 | 2024-08-30 | 西北工业大学 | Cluster robot system and method based on acoustic perception processing |
| CN117768816A (en) * | 2023-11-15 | 2024-03-26 | 兴科迪科技(泰州)有限公司 | Method and device for realizing sound collection based on small-size PCBA |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11508366B2 (en) | Whispering voice recovery method, apparatus and device, and readable storage medium | |
| CN110310623B (en) | Sample generation method, model training method, device, medium, and electronic apparatus | |
| CN103943107B (en) | A kind of audio frequency and video keyword recognition method based on Decision-level fusion | |
| CN110910891B (en) | Speaker segmentation labeling method based on long-time and short-time memory deep neural network | |
| CN106653056B (en) | Fundamental frequency extraction model and training method based on LSTM recurrent neural network | |
| CN110838289A (en) | Awakening word detection method, device, equipment and medium based on artificial intelligence | |
| CN114387997B (en) | Voice emotion recognition method based on deep learning | |
| CN110600017A (en) | Training method of voice processing model, voice recognition method, system and device | |
| CN106601230B (en) | Logistics sorting place name voice recognition method and system based on continuous Gaussian mixture HMM model and logistics sorting system | |
| CN102707806B (en) | Motion recognition method based on acceleration sensor | |
| EP4310838B1 (en) | Speech wakeup method and apparatus, and storage medium and system | |
| CN103258533B (en) | Novel model domain compensation method in remote voice recognition | |
| CN108766459A (en) | Target speaker method of estimation and system in a kind of mixing of multi-person speech | |
| US9799333B2 (en) | System and method for processing speech to identify keywords or other information | |
| JP2011186351A (en) | Information processor, information processing method, and program | |
| CN110534133A (en) | A kind of speech emotion recognition system and speech-emotion recognition method | |
| CN101452529A (en) | Information processing apparatus and information processing method, and computer program | |
| CN109697978B (en) | Method and apparatus for generating a model | |
| CN105161092A (en) | Speech recognition method and device | |
| CN114236469A (en) | A method and system for robot speech recognition and localization | |
| CN113450771A (en) | Awakening method, model training method and device | |
| CN109688271A (en) | The method, apparatus and terminal device of contact information input | |
| CN104103280A (en) | Dynamic time warping algorithm based voice activity detection method and device | |
| Marti et al. | Real time speaker localization and detection system for camera steering in multiparticipant videoconferencing environments | |
| CN117995187A (en) | Customer service robot and dialogue processing system and method based on deep learning |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||