
CN114236469A - A method and system for robot speech recognition and localization - Google Patents


Info

Publication number
CN114236469A
CN114236469A (application CN202111361624.2A)
Authority
CN
China
Prior art keywords
microphone
robot
time delay
speech signal
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111361624.2A
Other languages
Chinese (zh)
Inventor
张九华
陈兴元
罗国平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leshan Normal University
Original Assignee
Leshan Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Leshan Normal University filed Critical Leshan Normal University
Priority to CN202111361624.2A priority Critical patent/CN114236469A/en
Publication of CN114236469A publication Critical patent/CN114236469A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00Controls for manipulators
    • B25J13/003Controls for manipulators by means of an audio-responsive input
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/186Determination of attitude

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a robot voice recognition and localization method, which comprises the following steps: S1, constructing a microphone array model; S2, collecting the voice signals of the microphone elements and preprocessing them; S3, performing matching recognition on the preprocessed voice signals; S4, respectively calculating the relative time delays of the matched voice signals between the microphone elements; and S5, estimating the position and posture of the robot according to the calculated relative time delays. The invention acquires voice signals in the environment in real time and accurately determines the position of the robot through sound source localization, guaranteeing localization accuracy while improving the computational efficiency of the robot localization algorithm.

Description

Robot voice recognition positioning method and system
Technical Field
The invention relates to the technical field of robot self-perception and positioning, in particular to a robot voice recognition positioning method and system.
Background
In daily life, people interact mainly through voice, vision, gestures and other forms, of which voice is the simplest and most efficient mode and also the one most consistent with people's communication habits. Voice recognition technology has been a research hotspot in recent years, has made great progress, and has been applied in many fields such as vehicle-mounted equipment, games and smart household appliances. Voice recognition enables a machine to understand what the user says, frees the user's hands, and improves the human-computer interaction experience.
The emphasis of speech recognition differs across applications. Some cases require only certain keywords to be recognized, such as motion control based on speech keywords; some scenes require every Chinese character contained in the speech to be recognized as accurately as possible, such as voice input; still other situations require not only complete recognition of the text but also insight into the speaker's emotional state. For a good human-computer interaction experience, sound source localization technology is indispensable alongside voice recognition: only when the machine knows the direction of the speaker can it respond with targeted actions, and combining localization information with vision and other information opens up more functional scenarios. Although voice technology is widely used in many fields, it has not yet been fully adopted in the robot industry, and some technical problems remain to be solved.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a robot voice recognition positioning method and system.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
in a first aspect, the present invention provides a robot voice recognition positioning method, including the following steps:
s1, constructing a microphone array model;
s2, collecting voice signals of the microphone elements and preprocessing the voice signals;
s3, carrying out matching recognition on the preprocessed voice signals;
s4, respectively calculating the relative time delay of the voice signals matched between the microphone elements;
and S5, estimating the position and the posture of the robot according to the calculated relative time delay.
Further, the step S1 specifically includes:
four microphones are arranged in a world coordinate system to form a four-element cross-shaped microphone array, with the center of the array located at the origin of the world coordinate system and each microphone located on a coordinate axis at an equal distance from the origin.
Further, the step S2 specifically includes the following sub-steps:
s2-1, collecting voice signals of microphone elements;
s2-2, sampling the collected voice signals by adopting a set sampling frequency;
s2-3, carrying out high-frequency boosting (pre-emphasis) on the sampled voice signal;
s2-4, framing the processed voice signal;
s2-5, windowing the processed voice signal;
and S2-6, carrying out end point detection on the processed voice signal by adopting a short-time energy and short-time average zero crossing rate method.
Further, the step S3 specifically includes the following sub-steps:
s3-1, respectively extracting linear prediction coefficient characteristics and frequency cepstrum coefficient characteristics from the preprocessed voice signals, and establishing a voice recognition characteristic vector sequence;
s3-2, calculating a frame matching distance matrix of each speech recognition feature vector sequence and a known speech recognition feature vector sequence;
and S3-3, recursively searching the speech signal with the minimum matching distance in the frame matching distance matrix as a recognition result.
Further, the step S3-3 of recursively searching for the speech signal with the minimum matching distance in the frame matching distance matrix specifically includes the following sub-steps:
s3-1-1, constructing a search objective function:
D(i,j)=|t(i)-r(j)|+min{D(i-1,j),D(i-1,j-1),D(i,j-1)}
in the formula, D (i, j) represents the matching distance between the ith feature in the speech recognition feature vector sequence and the jth feature in the known speech recognition feature vector sequence, t (i) represents the ith feature value in the speech recognition feature vector sequence, and r (j) represents the jth feature value in the known speech recognition feature vector sequence;
the constraint conditions are as follows:
D(1,1)=|t(1)-r(1)|
and S3-1-2, starting from D (1,1), calculating the values of D (i, j) row by row or column by column, and finally comparing all the calculated values to screen the voice signal with the minimum matching distance.
Further, the calculation formula of the relative time delay in step S4 is as follows:
τ12 = arg max E(α·s(n−τ1)·s(n−τ2))
where τ12 denotes the relative time delay of the speech signal between microphone 1 and microphone 2, E denotes the mathematical expectation, α denotes the attenuation coefficient of the speech signal, s(n) denotes the speech signal, and τ1 and τ2 denote the times of arrival of the sound signal at microphone 1 and microphone 2, respectively.
Further, in step S5, the expression for estimating the distance between the robot and the sound source according to the calculated relative time delay is as follows:
[Equation image BDA0003359489590000041 — distance estimate in terms of c, τ12, τ13 and τ14; not reproduced in this text]
where c denotes the speed of sound, τ12 denotes the relative time delay of the speech signal between microphone 1 and microphone 2, τ13 that between microphone 1 and microphone 3, and τ14 that between microphone 1 and microphone 4.
Further, the expression for estimating the robot azimuth angle according to the calculated relative time delay in step S5 is as follows:
[Equation image BDA0003359489590000042 — azimuth-angle estimate in terms of τ12, τ13 and τ14; not reproduced in this text]
where τ12 denotes the relative time delay of the speech signal between microphone 1 and microphone 2, τ13 that between microphone 1 and microphone 3, and τ14 that between microphone 1 and microphone 4.
Further, the expression for estimating the pitch angle of the robot according to the calculated relative time delay in step S5 is as follows:
[Equation image BDA0003359489590000043 — pitch-angle estimate in terms of c, τ12, τ13, τ14 and d; not reproduced in this text]
where c denotes the speed of sound, τ12 denotes the relative time delay of the speech signal between microphone 1 and microphone 2, τ13 that between microphone 1 and microphone 3, τ14 that between microphone 1 and microphone 4, and d denotes the distance from each microphone element to the origin of the world coordinate system.
In a second aspect, the present invention further provides a robot voice recognition positioning system, including:
the constructing module is used for constructing a microphone array model;
the acquisition module is used for acquiring the voice signals of the microphone elements and carrying out pretreatment;
the recognition module is used for carrying out matching recognition on the preprocessed voice signals;
and the estimation module is used for respectively calculating the relative time delay of the voice signals matched between the microphone elements and estimating the pose of the robot according to the calculated relative time delay.
The invention has the following beneficial effects:
according to the method, the microphone array model is built, the voice signals of the microphone elements are preprocessed and then are matched and identified, and finally the relative time delay of the matched voice signals among the microphone elements is calculated respectively to estimate the pose of the robot, so that the voice signals in the environment are acquired in real time, the accurate position of the robot is accurately determined through sound source positioning, and the calculation efficiency of a robot positioning algorithm can be improved while the positioning accuracy is ensured.
Drawings
FIG. 1 is a schematic flow chart of a robot speech recognition positioning method according to the present invention;
fig. 2 is a schematic structural diagram of a robot speech recognition positioning system of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to help those skilled in the art understand the invention. It should be understood, however, that the invention is not limited to the scope of these embodiments: to those skilled in the art, various changes that remain within the spirit and scope of the invention as defined in the appended claims will be apparent, and all subject matter produced using the inventive concept is protected.
As shown in fig. 1, an embodiment of the present invention provides a robot speech recognition positioning method, including the following steps S1 to S5:
s1, constructing a microphone array model;
in an optional embodiment of the present invention, step S1 specifically includes:
four microphones are arranged in a world coordinate system to form a four-element cross-shaped microphone array, with the center of the array located at the origin of the world coordinate system and each microphone located on a coordinate axis at an equal distance from the origin.
S2, collecting voice signals of the microphone elements and preprocessing the voice signals;
in an optional embodiment of the present invention, step S2 specifically includes the following sub-steps:
s2-1, collecting voice signals sent by a microphone element;
specifically, the present invention collects and receives the voice signal emitted from the sound source for each microphone element in the four-element cross-shaped microphone array constructed in step S1.
S2-2, sampling the collected voice signals by adopting a set sampling frequency;
specifically, the present invention samples the collected voice signal with a sampling frequency equal to or higher than twice the voice signal frequency, and converts the sampled voice signal from an analog signal to a digital signal.
S2-3, carrying out high-frequency boosting (pre-emphasis) on the sampled voice signal;
Specifically, the invention first pre-filters the sampled voice signal with an anti-aliasing filter, and then applies high-frequency boosting to flatten the frequency spectrum of the voice signal.
The high-frequency boosting formula adopted by the invention is as follows:
y(n) = x(n) − a·x(n−1)
where y(n) is the output signal after high-frequency boosting, x(n) and x(n−1) are the input sample values at the current and previous instants respectively, and a is the pre-emphasis coefficient.
S2-4, framing the processed voice signal;
specifically, the invention divides the voice signal into a plurality of time segments, namely voice signal frames, by adopting a set time interval.
S2-5, windowing the processed voice signal;
specifically, the invention adopts a Hamming window to carry out windowing processing on the voice signal frame, thereby enabling the voice signal frame to be more stable.
The Hamming window expression adopted by the invention is as follows:
w(n) = 0.54 − 0.46·cos(2πn/(N−1)), 0 ≤ n ≤ N−1
where N is the frame length.
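A minimal sketch of steps S2-3 to S2-5 (pre-emphasis, framing, Hamming windowing); the frame length, hop size, pre-emphasis coefficient and test tone are illustrative assumptions, not values specified by the patent:

```python
import numpy as np

def preprocess(x, frame_len=256, hop=128, a=0.95):
    """Pre-emphasis, framing, and Hamming windowing of a sampled signal."""
    # High-frequency boost (pre-emphasis): y(n) = x(n) - a * x(n-1)
    y = np.append(x[0], x[1:] - a * x[:-1])
    # Cut into overlapping frames of frame_len samples, advancing by hop
    n_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([y[i * hop : i * hop + frame_len] for i in range(n_frames)])
    # Window each frame to reduce spectral leakage at the frame edges
    return frames * np.hamming(frame_len)

x = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)  # 1 s of a 440 Hz tone at 8 kHz
frames = preprocess(x)
print(frames.shape)  # (61, 256)
```

Each row of the result is one windowed frame, ready for feature extraction in step S3.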
and S2-6, carrying out end point detection on the processed voice signal by adopting a short-time energy and short-time average zero crossing rate method.
Specifically, the method first determines a high threshold parameter and a low threshold parameter from the short-time energy and the short-time average zero-crossing rate. It then locates an initial endpoint of the voice signal using the high threshold parameter, and searches near this initial endpoint for a secondary endpoint where the short-time average amplitude falls to the low threshold parameter. Finally, it sets a segmentation threshold from the mean of the short-time average zero-crossing rate and searches near the secondary endpoint for the point where the short-time average zero-crossing rate falls to a set multiple of the segmentation threshold, which is taken as the finally detected endpoint.
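The dual-threshold idea above can be sketched as follows, using short-time energy only; the zero-crossing-rate refinement and the patent's exact threshold rules are simplified away, and the threshold ratios are assumptions:

```python
import numpy as np

def detect_endpoints(frames, high_ratio=0.5, low_ratio=0.1):
    """Dual-threshold endpoint detection on framed audio: find where the
    short-time energy exceeds a high threshold, then widen the segment
    outward while the energy stays above a low threshold."""
    energy = np.sum(frames ** 2, axis=1)          # short-time energy per frame
    high, low = high_ratio * energy.max(), low_ratio * energy.max()
    active = np.where(energy > high)[0]
    start, end = int(active[0]), int(active[-1])  # region above the high threshold
    while start > 0 and energy[start - 1] > low:  # extend left to the low threshold
        start -= 1
    while end < len(energy) - 1 and energy[end + 1] > low:
        end += 1
    return start, end

# Toy framed signal: silence, a burst in frames 4-6, silence again
frames = np.zeros((10, 4))
frames[4:7] = 1.0
print(detect_endpoints(frames))  # (4, 6)
```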
S3, carrying out matching recognition on the preprocessed voice signals;
in an optional embodiment of the present invention, step S3 specifically includes the following sub-steps:
s3-1, respectively extracting linear prediction coefficient characteristics and frequency cepstrum coefficient characteristics from the preprocessed voice signals, and establishing a voice recognition characteristic vector sequence;
s3-2, calculating a frame matching distance matrix of each speech recognition feature vector sequence and a known speech recognition feature vector sequence;
and S3-3, recursively searching the speech signal with the minimum matching distance in the frame matching distance matrix as a recognition result.
Specifically, the recursive search for the speech signal with the minimum matching distance in the frame matching distance matrix comprises the following sub-steps:
s3-1-1, constructing a search objective function:
D(i,j)=|t(i)-r(j)|+min{D(i-1,j),D(i-1,j-1),D(i,j-1)}
in the formula, D (i, j) represents the matching distance between the ith feature in the speech recognition feature vector sequence and the jth feature in the known speech recognition feature vector sequence, t (i) represents the ith feature value in the speech recognition feature vector sequence, and r (j) represents the jth feature value in the known speech recognition feature vector sequence;
the constraint conditions are as follows:
D(1,1)=|t(1)-r(1)|
and S3-1-2, starting from D (1,1), calculating the values of D (i, j) row by row or column by column, and finally comparing all the calculated values to screen the voice signal with the minimum matching distance.
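The recursion in steps S3-1-1 and S3-1-2 is a dynamic-programming accumulation of frame distances; a direct sketch using scalar feature values as in the formula (the toy templates and query are assumptions):

```python
import numpy as np

def match_distance(t, r):
    """D(i,j) = |t(i)-r(j)| + min{D(i-1,j), D(i-1,j-1), D(i,j-1)},
    with the constraint D(1,1) = |t(1)-r(1)|; returns the final distance."""
    n, m = len(t), len(r)
    D = np.full((n, m), np.inf)
    D[0, 0] = abs(t[0] - r[0])
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(D[i - 1, j] if i > 0 else np.inf,
                       D[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
                       D[i, j - 1] if j > 0 else np.inf)
            D[i, j] = abs(t[i] - r[j]) + prev
    return D[n - 1, m - 1]

# The known sequence with the smallest total distance is the recognition result
templates = {"one": [1.0, 2.0, 3.0], "two": [1.0, 1.0, 1.0]}
query = [1.0, 2.0, 3.1]
best = min(templates, key=lambda k: match_distance(query, templates[k]))
print(best)  # one
```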
S4, respectively calculating the relative time delay of the voice signals matched between the microphone elements;
in an alternative embodiment of the present invention, the calculation formula of the relative time delay in step S4 is:
τ12 = arg max E(α·s(n−τ1)·s(n−τ2))
where τ12 denotes the relative time delay of the speech signal between microphone 1 and microphone 2, E denotes the mathematical expectation, α denotes the attenuation coefficient of the speech signal, s(n) denotes the speech signal, and τ1 and τ2 denote the times of arrival of the sound signal at microphone 1 and microphone 2, respectively.
In the present invention, microphone 1 serves as the reference microphone; the relative time delays of microphone 3 and microphone 4 with respect to microphone 1 are obtained analogously from the above formula and are not repeated here.
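In discrete time, maximizing the expectation above amounts to taking the lag of the cross-correlation peak between two microphone channels; a sketch, where the sampling rate, delay and attenuation are assumed example values:

```python
import numpy as np

def relative_delay(x1, x2, fs):
    """Estimate the delay of x2 relative to x1 (in seconds) as the
    lag that maximizes their cross-correlation."""
    corr = np.correlate(x2, x1, mode="full")      # correlation over all lags
    lag = int(np.argmax(corr)) - (len(x1) - 1)    # shift 'full' index to a signed lag
    return lag / fs

fs = 8000
s = np.random.default_rng(0).standard_normal(1024)
x1 = s                                            # signal at microphone 1
x2 = np.concatenate([np.zeros(5), 0.8 * s[:-5]])  # mic 2: 5 samples later, attenuated
print(relative_delay(x1, x2, fs))  # 0.000625 (i.e. 5 samples at 8 kHz)
```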
And S5, estimating the position and the posture of the robot according to the calculated relative time delay.
In an alternative embodiment of the invention, the robot pose considered by the invention comprises the robot-to-sound-source distance, the robot azimuth angle and the robot pitch angle.
The expression of estimating the distance between the robot and the sound source according to the calculated relative time delay is as follows:
[Equation image BDA0003359489590000081 — distance estimate in terms of c, τ12, τ13 and τ14; not reproduced in this text]
where c denotes the speed of sound, τ12 denotes the relative time delay of the speech signal between microphone 1 and microphone 2, τ13 that between microphone 1 and microphone 3, and τ14 that between microphone 1 and microphone 4.
The expression of estimating the azimuth angle of the robot according to the calculated relative time delay is as follows:
[Equation image BDA0003359489590000091 — azimuth-angle estimate in terms of τ12, τ13 and τ14; not reproduced in this text]
where τ12 denotes the relative time delay of the speech signal between microphone 1 and microphone 2, τ13 that between microphone 1 and microphone 3, and τ14 that between microphone 1 and microphone 4.
The expression of estimating the pitch angle of the robot according to the calculated relative time delay is as follows:
[Equation image BDA0003359489590000092 — pitch-angle estimate in terms of c, τ12, τ13, τ14 and d; not reproduced in this text]
where c denotes the speed of sound, τ12 denotes the relative time delay of the speech signal between microphone 1 and microphone 2, τ13 that between microphone 1 and microphone 3, τ14 that between microphone 1 and microphone 4, and d denotes the distance from each microphone element to the origin of the world coordinate system.
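The patent's closed-form expressions for distance, azimuth and pitch angle appear only as equation images in this text. As an illustration of how direction can be recovered from the same relative delays, here is a standard far-field estimate for a cross-shaped array; the microphone coordinates, sign conventions, and the far-field assumption itself are assumptions of this sketch, not the patent's formulas:

```python
import numpy as np

def direction_from_delays(t12, t13, t14, d, c=343.0):
    """Far-field azimuth/pitch from delays measured relative to microphone 1,
    assuming mic1=(d,0,0), mic2=(0,d,0), mic3=(-d,0,0), mic4=(0,-d,0)."""
    t24 = t14 - t12                          # delay across the opposite pair (mics 2 and 4)
    azimuth = np.arctan2(t24, t13)           # source direction in the array plane
    cos_pitch = c * np.hypot(t13, t24) / (2 * d)
    pitch = np.arccos(np.clip(cos_pitch, -1.0, 1.0))
    return np.degrees(azimuth), np.degrees(pitch)

# Synthetic check: delays generated for a source at azimuth 30 deg in the array plane
d, c, phi = 0.1, 343.0, np.radians(30.0)
t12 = d * (np.cos(phi) - np.sin(phi)) / c
t13 = 2 * d * np.cos(phi) / c
t14 = d * (np.cos(phi) + np.sin(phi)) / c
az, pitch = direction_from_delays(t12, t13, t14, d, c)
print(round(az, 6), round(pitch, 6))
```

The distance to a near-field source requires curvature information beyond this far-field model, which is why the patent's own distance expression also involves the array spacing d.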
As shown in fig. 2, the present invention further provides a robot voice recognition positioning system, including:
the constructing module is used for constructing a microphone array model;
the acquisition module is used for acquiring the voice signals of the microphone elements and carrying out pretreatment;
the recognition module is used for carrying out matching recognition on the preprocessed voice signals;
and the estimation module is used for respectively calculating the relative time delay of the voice signals matched between the microphone elements and estimating the pose of the robot according to the calculated relative time delay.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principle and the implementation mode of the invention are explained by applying specific embodiments in the invention, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims (10)

1.一种机器人语音识别定位方法,其特征在于,包括以下步骤:1. a robot voice recognition positioning method, is characterized in that, comprises the following steps: S1、构建麦克风阵列模型;S1. Build a microphone array model; S2、采集麦克风阵元的语音信号,并进行预处理;S2, collect the voice signal of the microphone array element, and perform preprocessing; S3、对预处理后的语音信号进行匹配识别;S3, matching and recognizing the preprocessed speech signal; S4、分别计算麦克风阵元之间匹配的语音信号的相对时延;S4, respectively calculating the relative time delay of the matched voice signal between the microphone array elements; S5、根据计算的相对时延对机器人位姿进行估计。S5. Estimate the robot pose according to the calculated relative time delay. 2.根据权利要求1所述的机器人语音识别定位方法,其特征在于,所述步骤S1具体包括:2. The robot voice recognition positioning method according to claim 1, wherein the step S1 specifically comprises: 在世界坐标系中采用四个麦克风组成四元十字形麦克风阵列,上述四元十字形麦克风阵列的中心位于世界坐标系原点,每个麦克风均位于坐标轴上,且距离世界坐标系原点的距离相等。In the world coordinate system, four microphones are used to form a quaternary cross-shaped microphone array. The center of the above-mentioned quaternary cross-shaped microphone array is located at the origin of the world coordinate system, and each microphone is located on the coordinate axis, and the distance from the origin of the world coordinate system is equal. . 3.根据权利要求1所述的机器人语音识别定位方法,其特征在于,所述步骤S2具体包括以下分步骤:3. robot voice recognition positioning method according to claim 1, is characterized in that, described step S2 specifically comprises the following sub-steps: S2-1、采集麦克风阵元的语音信号;S2-1, collect the voice signal of the microphone array element; S2-2、采用设定采样频率对采集的语音信号进行采样;S2-2. 
Use the set sampling frequency to sample the collected voice signal; S2-3、对采样的语音信号进行高频提升处理;S2-3, perform high-frequency boost processing on the sampled voice signal; S2-4、对处理后的语音信号进行分帧处理;S2-4, performing frame-by-frame processing on the processed voice signal; S2-5、对处理后的语音信号进行加窗处理;S2-5, performing window processing on the processed voice signal; S2-6、采用短时能量和短时平均过零率方法对处理后的语音信号进行端点检测。S2-6, using short-time energy and short-time average zero-crossing rate method to perform endpoint detection on the processed speech signal. 4.根据权利要求1所述的机器人语音识别定位方法,其特征在于,所述步骤S3具体包括以下分步骤:4. The robot voice recognition positioning method according to claim 1, wherein the step S3 specifically comprises the following steps: S3-1、对预处理后的语音信号分别提取线性预测系数特征和频率倒谱系数特征,并建立语音识别特征矢量序列;S3-1. Extract the linear prediction coefficient feature and the frequency cepstral coefficient feature from the preprocessed speech signal respectively, and establish a speech recognition feature vector sequence; S3-2、计算每一个语音识别特征矢量序列与已知语音识别特征矢量序列的帧匹配距离矩阵;S3-2, calculate the frame matching distance matrix between each speech recognition feature vector sequence and the known speech recognition feature vector sequence; S3-3、在帧匹配距离矩阵中递归搜索匹配距离最小的语音信号作为识别结果。S3-3, recursively search for the speech signal with the smallest matching distance in the frame matching distance matrix as the recognition result. 5.根据权利要求4所述的机器人语音识别定位方法,其特征在于,所述步骤S3-3中在帧匹配距离矩阵中递归搜索匹配距离最小的语音信号具体包括以下分步骤:5. robot voice recognition positioning method according to claim 4, is characterized in that, in described step S3-3, in described step S3-3, in the frame matching distance matrix, recursively search for the voice signal with the minimum matching distance specifically comprises the following substeps: S3-1-1、构建搜索目标函数:S3-1-1. 
Build the search objective function: D(i,j)=|t(i)-r(j)|+min{D(i-1,j),D(i-1,j-1),D(i,j-1)}D(i,j)=|t(i)-r(j)|+min{D(i-1,j),D(i-1,j-1),D(i,j-1)} 式中,D(i,j)表示语音识别特征矢量序列中第i个特征与已知语音识别特征矢量序列中第j个特征的匹配距离,t(i)表示语音识别特征矢量序列中第i个特征值,r(j)表示已知语音识别特征矢量序列中第j个特征值;In the formula, D(i,j) represents the matching distance between the ith feature in the speech recognition feature vector sequence and the jth feature in the known speech recognition feature vector sequence, and t(i) represents the ith feature in the speech recognition feature vector sequence. eigenvalues, r(j) represents the jth eigenvalue in the sequence of known speech recognition feature vectors; 约束条件为:The constraints are: D(1,1)=|t(1)-r(1)|D(1,1)=|t(1)-r(1)| S3-1-2、从D(1,1)开始,逐行或逐列计算D(i,j)的值,最终比较所有计算的值筛选得到匹配距离最小的语音信号。S3-1-2. Starting from D(1,1), calculate the value of D(i,j) row by row or column by column, and finally compare all the calculated values and filter to obtain the speech signal with the smallest matching distance. 6.根据权利要求1所述的机器人语音识别定位方法,其特征在于,所述步骤S4中相对时延的计算公式为:6. robot voice recognition positioning method according to claim 1, is characterized in that, the calculation formula of relative time delay in described step S4 is: τ12=argmaxE(αs(n-τ1)s(n-τ12))τ 12 =argmaxE(αs(n-τ 1 )s(n-τ 12 )) 式中,τ12表示语音信号到麦克风1和麦克风2的相对时延,E表示数学期望值,α表示语音信号的衰减系数,s(n)表示语音信号,τ12表示声音信号到麦克风1和麦克风2的时间。In the formula, τ 12 represents the relative delay from the speech signal to microphone 1 and microphone 2, E represents the mathematical expectation, α represents the attenuation coefficient of the speech signal, s(n) represents the speech signal, τ 1 , τ 2 represent the sound signal to the microphone 1 and mic 2 time. 7.根据权利要求1所述的机器人语音识别定位方法,其特征在于,所述步骤S5中根据计算的相对时延估计机器人与声源距离的表达式为:7. robot speech recognition positioning method according to claim 1, is characterized in that, in described step S5, the expression of estimated robot and sound source distance according to the relative time delay of calculation is:
Figure FDA0003359489580000031
Figure FDA0003359489580000031
式中,c表示声速,τ12表示语音信号到麦克风1和麦克风2的相对时延,τ13表示语音信号到麦克风1和麦克风3的相对时延,τ14表示语音信号到麦克风1和麦克风4的相对时延。In the formula, c represents the speed of sound, τ 12 represents the relative time delay from the speech signal to microphone 1 and microphone 2, τ 13 represents the relative time delay from the speech signal to microphone 1 and microphone 3, and τ 14 represents the speech signal to microphone 1 and microphone 4. relative delay.
8.根据权利要求1所述的机器人语音识别定位方法,其特征在于,所述步骤S5中根据计算的相对时延估计机器人方位角的表达式为:8. robot voice recognition positioning method according to claim 1, is characterized in that, in described step S5, the expression of estimated robot azimuth angle according to the relative time delay of calculation is:
Figure FDA0003359489580000032
Figure FDA0003359489580000032
式中,τ12表示语音信号到麦克风1和麦克风2的相对时延,τ13表示语音信号到麦克风1和麦克风3的相对时延,τ14表示语音信号到麦克风1和麦克风4的相对时延。In the formula, τ 12 represents the relative delay from the speech signal to microphone 1 and microphone 2, τ 13 represents the relative delay from the speech signal to microphone 1 and microphone 3, and τ 14 represents the relative delay from the speech signal to microphone 1 and microphone 4 .
9.根据权利要求1所述的机器人语音识别定位方法,其特征在于,所述步骤S5中根据计算的相对时延估计机器人俯仰角的表达式为:9. robot speech recognition positioning method according to claim 1, is characterized in that, the expression that estimates robot pitch angle according to the relative time delay of calculation in described step S5 is:
(Equation presented as image FDA0003359489580000033: the pitch-angle expression in terms of c, τ12, τ13, τ14 and d.)
In the formula, c represents the speed of sound, τ12 represents the relative time delay of the speech signal between microphone 1 and microphone 2, τ13 the relative time delay between microphone 1 and microphone 3, τ14 the relative time delay between microphone 1 and microphone 4, and d represents the distance from a microphone array element to the origin of the world coordinate system.
10. A robot speech recognition and localization system, comprising: a construction module for building a microphone array model; an acquisition module for acquiring the speech signals of the microphone array elements and preprocessing them; a recognition module for matching and recognizing the preprocessed speech signals; and an estimation module for separately calculating the relative time delays of the matched speech signals between microphone array elements, and estimating the robot pose from the calculated relative time delays.
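The recurrence built in claim 5 is the standard dynamic time warping (DTW) matching distance used by the recognition module of claim 10. A minimal sketch of computing it row by row from D(1,1) and picking the template with the smallest distance (template names and feature values are illustrative):

```python
def dtw_distance(t, r):
    """Matching distance via the recurrence from claim 5:
    D(i,j) = |t(i) - r(j)| + min(D(i-1,j), D(i-1,j-1), D(i,j-1)),
    with boundary condition D(1,1) = |t(1) - r(1)|."""
    n, m = len(t), len(r)
    INF = float("inf")
    D = [[INF] * m for _ in range(n)]
    D[0][0] = abs(t[0] - r[0])  # boundary condition
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                D[i - 1][j] if i > 0 else INF,
                D[i - 1][j - 1] if i > 0 and j > 0 else INF,
                D[i][j - 1] if j > 0 else INF,
            )
            D[i][j] = abs(t[i] - r[j]) + best
    return D[n - 1][m - 1]

# Pick the known template with the smallest matching distance (step S3-1-2).
templates = {"stop": [1.0, 2.0, 3.0], "go": [3.0, 1.0, 1.0]}
query = [1.1, 2.1, 2.9]
print(min(templates, key=lambda k: dtw_distance(query, templates[k])))  # -> stop
```

In practice t and r would be per-frame feature vectors (e.g. MFCCs) rather than scalars, with |t(i) − r(j)| replaced by a vector distance; the recurrence itself is unchanged.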
CN202111361624.2A 2021-11-17 2021-11-17 A method and system for robot speech recognition and localization Pending CN114236469A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111361624.2A CN114236469A (en) 2021-11-17 2021-11-17 A method and system for robot speech recognition and localization


Publications (1)

Publication Number Publication Date
CN114236469A true CN114236469A (en) 2022-03-25

Family

ID=80749832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111361624.2A Pending CN114236469A (en) 2021-11-17 2021-11-17 A method and system for robot speech recognition and localization

Country Status (1)

Country Link
CN (1) CN114236469A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982803A (en) * 2012-12-11 2013-03-20 华南师范大学 Isolated word speech recognition method based on HRSF and improved DTW algorithm
CN103308889A (en) * 2013-05-13 2013-09-18 辽宁工业大学 Passive sound source two-dimensional DOA (direction of arrival) estimation method under complex environment
CN104991573A (en) * 2015-06-25 2015-10-21 北京品创汇通科技有限公司 Locating and tracking method and apparatus based on sound source array
CN106162431A (en) * 2015-04-02 2016-11-23 钰太芯微电子科技(上海)有限公司 The beam positioning system of giant-screen mobile terminal
WO2016183791A1 (en) * 2015-05-19 2016-11-24 华为技术有限公司 Voice signal processing method and device


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
DU JUAN: "Speaker recognition based on support vector machines", China Master's Theses Full-text Database, Information Science and Technology, no. 03, 15 September 2007 (2007-09-15), pages 11-15 *
DUAN SHENGQUAN: "Application of frequency-division matching in speech recognition and control", Acoustics and Electronics Engineering, no. 3, 31 December 2005 (2005-12-31), pages 25-27 *
XIE YINGCHUN, YU XIANGZHEN, LIU JIANPING, ZHANG WEIHUA: "Speaker recognition based on an effective combination of multiple features", Modern Electronics Technique, no. 09, 1 September 2005 (2005-09-01) *
CHEN KEXING ET AL.: "Equipment Condition Monitoring and Fault Diagnosis Technology", 31 August 1991, Scientific and Technical Documentation Press, page 53 *
JIN XIAOQIANG ET AL.: "Speech orientation algorithm for mobile robots and its implementation", Computer Simulation, vol. 29, no. 11, 30 November 2012 (2012-11-30), pages 223-226 *
HAN HONGLUAN ET AL.: "An Integrated Course on the Composition of Industrial Robots", 31 December 2020, Xidian University Press, page 390 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115972231A (en) * 2023-01-06 2023-04-18 西北工业大学 A swarm robot system and method based on acoustic perception processing
CN115972231B (en) * 2023-01-06 2024-08-30 西北工业大学 Cluster robot system and method based on acoustic perception processing
CN117768816A (en) * 2023-11-15 2024-03-26 兴科迪科技(泰州)有限公司 Method and device for realizing sound collection based on small-size PCBA

Similar Documents

Publication Publication Date Title
US11508366B2 (en) Whispering voice recovery method, apparatus and device, and readable storage medium
CN110310623B (en) Sample generation method, model training method, device, medium, and electronic apparatus
CN103943107B (en) A kind of audio frequency and video keyword recognition method based on Decision-level fusion
CN110910891B (en) Speaker segmentation labeling method based on long-time and short-time memory deep neural network
CN106653056B (en) Fundamental frequency extraction model and training method based on LSTM recurrent neural network
CN110838289A (en) Awakening word detection method, device, equipment and medium based on artificial intelligence
CN114387997B (en) Voice emotion recognition method based on deep learning
CN110600017A (en) Training method of voice processing model, voice recognition method, system and device
CN106601230B (en) Logistics sorting place name voice recognition method and system based on continuous Gaussian mixture HMM model and logistics sorting system
CN102707806B (en) Motion recognition method based on acceleration sensor
EP4310838B1 (en) Speech wakeup method and apparatus, and storage medium and system
CN103258533B (en) Novel model domain compensation method in remote voice recognition
CN108766459A (en) Target speaker method of estimation and system in a kind of mixing of multi-person speech
US9799333B2 (en) System and method for processing speech to identify keywords or other information
JP2011186351A (en) Information processor, information processing method, and program
CN110534133A (en) A kind of speech emotion recognition system and speech-emotion recognition method
CN101452529A (en) Information processing apparatus and information processing method, and computer program
CN109697978B (en) Method and apparatus for generating a model
CN105161092A (en) Speech recognition method and device
CN114236469A (en) A method and system for robot speech recognition and localization
CN113450771A (en) Awakening method, model training method and device
CN109688271A (en) The method, apparatus and terminal device of contact information input
CN104103280A (en) Dynamic time warping algorithm based voice activity detection method and device
Marti et al. Real time speaker localization and detection system for camera steering in multiparticipant videoconferencing environments
CN117995187A (en) Customer service robot and dialogue processing system and method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination