
CN120220649B - Smart home voice interaction testing method and device - Google Patents

Smart home voice interaction testing method and device

Info

Publication number: CN120220649B
Application number: CN202510370640.XA
Authority: CN (China)
Prior art keywords: voice interaction, coefficient, voice, data, determining
Legal status: Active (the legal status is an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN120220649A (en)
Inventors: 曲宗峰, 李红伟, 焦利敏, 胡清华, 胡亚欣, 金轮, 刘泽超, 顾子谦
Current Assignee: Cheari Beijing Certification & Testing Co ltd; Tianjin University
Original Assignee: Cheari Beijing Certification & Testing Co ltd; Tianjin University
Application filed by Cheari Beijing Certification & Testing Co ltd and Tianjin University
Priority to CN202510370640.XA
Publication of application CN120220649A; application granted and published as CN120220649B

Classifications

    • G10L15/01 — Assessment or evaluation of speech recognition systems (G: Physics; G10: Musical instruments; acoustics; G10L: Speech analysis, synthesis, recognition, and processing techniques)
    • G10L15/02 — Feature extraction for speech recognition; selection of recognition unit
    • G10L15/20 — Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress-induced speech
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L21/0208 — Noise filtering (under G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation)
    • H04L12/282 — Controlling appliance services of a home automation network, by calling their functionalities, based on user interaction within the home (under H04L12/2803: Home automation networks)
    • G10L2015/223 — Execution procedure of a spoken command
    • G10L2021/02082 — Noise filtering where the noise is echo or reverberation of the speech
    • Y02P90/02 — Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]


Abstract


The present invention relates to the technical field of voice interaction testing, and discloses a method and device for testing voice interaction in a smart home. The method comprises: determining a specific scenario to be tested, determining voice interaction parameters based on scenario feature data; obtaining a test score based on the voice interaction data in the specific scenario to be tested; comparing the test score with a test score threshold, and determining whether to modify the voice interaction parameters based on the comparison result; after determining the modified voice interaction parameters, determining an interaction influence index between the user and the smart home to be tested based on distance data, behavior data, and device status data; determining whether to compensate for the modified voice interaction parameters based on the interaction influence index; and when it is determined that the modified voice interaction parameters are to be compensated, compensating the modified voice interaction parameters based on a compensation coefficient to obtain compensated voice interaction parameters. The present invention improves the stability and accuracy of voice interaction testing.

Description

Smart home voice interaction testing method and device
Technical Field
The invention relates to the technical field of voice interaction testing, and in particular to a smart home voice interaction testing method and device.
Background
With the continuous progress and wide adoption of smart home technology, voice interaction has become one of the key ways for users to communicate with smart home devices. Existing voice interaction testing methods perform well in general application scenarios and can largely meet users' basic requirements. In specialized, complex environments, however, these methods show limitations, and it is difficult to obtain stable and accurate test results. Such specialized scenarios cover complex home space layouts, diverse equipment-type environments, and varied settings such as multi-room environments, open and closed spaces, and personalized space configurations.
Given the diversity and complexity of these environments, voice interaction testing in specialized scenarios faces multiple challenges. For example, in a multi-room environment a user may need to interact with devices in different rooms, so the voice interaction test must accurately identify and respond to instructions issued from different rooms. Likewise, in open and closed spaces, sound propagation characteristics, echo, and similar factors can affect the accuracy of voice interaction.
Therefore, it is necessary to design a method and a device for intelligent home voice interaction test for specific scene recognition to solve the problems in the prior art.
Disclosure of Invention
In view of the above, the invention provides a method and a device for testing voice interaction of an intelligent home, which aim to improve the stability and the accuracy of the voice interaction test.
In one aspect, the invention provides an intelligent home voice interaction testing method, which comprises the following steps:
S100, determining a specialized scene to be tested, extracting scene feature data of the specialized scene to be tested, and determining voice interaction parameters according to the scene feature data;
S200, under the specialized scene to be tested, performing voice interaction test on the intelligent home to be tested by adopting real-time voice data to obtain voice interaction data, and obtaining a test score according to the voice interaction data;
S300, comparing the test score with a test score threshold value, and judging whether to correct the voice interaction parameter according to the comparison result;
S400, when the voice interaction parameters are determined to be corrected, extracting the characteristics of the real-time voice data to obtain real-time voice characteristic values, comparing the real-time voice characteristic values with the historical voice data, determining correction coefficients according to comparison results, and correcting the voice interaction parameters according to the correction coefficients to obtain corrected voice interaction parameters;
S500, after the corrected voice interaction parameters are determined, acquiring distance data between a user and the intelligent home to be tested, behavior data of the user and equipment state data of the intelligent home to be tested, and determining interaction influence indexes between the user and the intelligent home to be tested according to the distance data, the behavior data and the equipment state data;
and S600, when the corrected voice interaction parameter is judged to be compensated, determining a compensation coefficient according to the interaction influence index, and compensating the corrected voice interaction parameter according to the compensation coefficient to obtain the compensated voice interaction parameter.
Further, when determining the voice interaction parameter according to the scene feature data, the method comprises the following steps:
The scene characteristic data comprise noise characteristic data and space echo characteristic data;
the voice interaction parameters comprise a recognition threshold, a noise suppression parameter and a voice enhancement coefficient;
The recognition threshold is obtained by a formula of the following form (reconstructed to be consistent with the variable definitions below):

$T = T_0 \left( 1 + a_1 \frac{X_n - B_n}{B_n} + a_2 \frac{X_r - B_r}{B_r} \right)$

The noise suppression parameter is obtained by the following formula:

$N = N_0 \left( 1 + b_1 \frac{X_n - B_n}{B_n} + b_2 \frac{X_r - B_r}{B_r} \right)$

The speech enhancement coefficient is obtained by the following formula:

$C = C_0 \left( 1 + k_1 \frac{X_n - B_n}{B_n} + k_2 \frac{X_r - B_r}{B_r} \right)$

Wherein $T$ represents the recognition threshold, $T_0$ the basic recognition threshold, $a_1$ the noise recognition weight coefficient, $X_n$ the noise characteristic value, $B_n$ the noise standard value, $a_2$ the spatial recognition weight coefficient, $X_r$ the current spatial echo characteristic value, $B_r$ the current spatial echo standard value, $N$ the noise suppression parameter, $N_0$ the basic noise suppression parameter, $b_1$ the noise suppression weight coefficient, $b_2$ the spatial suppression weight coefficient, $C$ the voice enhancement coefficient, $C_0$ the basic voice enhancement coefficient, $k_1$ the voice enhancement weight coefficient, and $k_2$ the spatial enhancement weight coefficient.
Further, comparing the test score with a test score threshold, and judging whether to correct the voice interaction parameter according to the comparison result, wherein the method comprises the following steps:
The test score is obtained by a formula of the following form (reconstructed to be consistent with the variable definitions below):

$S = c_1 \frac{R_1}{R_{\max}} + c_2 \frac{T_{\min}}{T_2} + c_3 \frac{A_3}{A_{\max}}$

Wherein $S$ represents the test score, $c_1$ the weight coefficient of the speech recognition accuracy, $R_{\max}$ the maximum value of the speech recognition accuracy, $R_1$ the speech recognition accuracy, $c_2$ the weight coefficient of the response time, $T_2$ the response time, $T_{\min}$ the minimum value of the response time, $c_3$ the weight coefficient of the response accuracy, $A_{\max}$ the maximum value of the response accuracy, and $A_3$ the response accuracy.
Further, comparing the test score with a test score threshold, and judging whether to correct the voice interaction parameter according to the comparison result, wherein the method further comprises the following steps:
when the test score is larger than the test score threshold, judging that the voice interaction parameter is not corrected;
and when the test score is smaller than or equal to the test score threshold, the voice interaction parameter is judged to be corrected.
Further, comparing the real-time voice characteristic value with the historical voice data, determining a correction coefficient according to the comparison result, and correcting the voice interaction parameter according to the correction coefficient, wherein the method comprises the following steps of:
Calculating the maximum similarity between the real-time voice characteristic value and the historical voice data, comparing the maximum similarity with a maximum similarity threshold value, and determining the correction coefficient according to the comparison result;
the maximum similarity is obtained by a formula of the following form (assuming cosine similarity between feature vectors, consistent with the variable definitions below):

$M_{\max} = \max\limits_{1 \le i \le m,\ 1 \le j \le n} \dfrac{X_i \cdot Y_j}{\lVert X_i \rVert \, \lVert Y_j \rVert}$

Wherein $M_{\max}$ represents the maximum similarity, $X_i$ the $i$-th real-time speech feature vector, $Y_j$ the $j$-th historical speech feature vector, $m$ the number of real-time speech feature vectors, and $n$ the number of historical speech feature vectors.
Further, comparing the maximum similarity with a maximum similarity threshold, and determining the correction coefficient according to the comparison result includes:
Comparing the maximum similarity with a first maximum similarity threshold and a second maximum similarity threshold, and determining the correction coefficient according to the comparison result, wherein the first maximum similarity threshold is smaller than the second maximum similarity threshold;
Setting a correction coefficient interval, wherein the correction coefficient interval comprises a first correction coefficient, a second correction coefficient and a third correction coefficient;
When the maximum similarity is smaller than or equal to the first maximum similarity threshold, determining a correction coefficient of the voice interaction parameter as a first correction coefficient, and taking a product value of the first correction coefficient and the voice interaction parameter as the corrected voice interaction parameter;
When the maximum similarity is larger than the first maximum similarity threshold and smaller than or equal to the second maximum similarity threshold, determining a correction coefficient of the voice interaction parameter as a second correction coefficient, and taking the product value of the second correction coefficient and the voice interaction parameter as the corrected voice interaction parameter;
And when the maximum similarity is greater than the second maximum similarity threshold, determining a correction coefficient of the voice interaction parameter as a third correction coefficient, and taking the product value of the third correction coefficient and the voice interaction parameter as the corrected voice interaction parameter.
Further, when determining the interaction impact index between the user and the smart home to be tested according to the distance data, the behavior data and the equipment state data, the method comprises the following steps:
the interaction impact index is obtained by a formula of the following form (reconstructed to be consistent with the variable definitions below):

$I = \alpha_1 D^{\beta_1} + \alpha_2 B^{\beta_2} + \alpha_3 E^{\beta_3}$

Wherein $I$ represents the interaction impact index, $D$ the distance data, $B$ the behavior data, $E$ the equipment state data, $\alpha_1$ the distance weight coefficient, $\alpha_2$ the behavior weight coefficient, $\alpha_3$ the equipment state weight coefficient, $\beta_1$ the first impact coefficient, $\beta_2$ the second impact coefficient, and $\beta_3$ the third impact coefficient.
Further, when judging whether to compensate the corrected voice interaction parameter according to the interaction impact index, the method includes:
subtracting an interaction impact index threshold from the interaction impact index to obtain an index difference value;
comparing the index difference value with an index difference value threshold value, and judging whether to compensate the corrected voice interaction parameter according to the comparison result;
when the index difference value is larger than or equal to the index difference value threshold, it is determined to compensate the corrected voice interaction parameter;
and when the index difference value is smaller than the index difference value threshold, it is determined not to compensate the corrected voice interaction parameter.
Further, determining a compensation coefficient according to the interaction impact index, and compensating the corrected voice interaction parameter according to the compensation coefficient includes:
Comparing the index difference value with a first index difference value threshold and a second index difference value threshold, and determining the compensation coefficient according to the comparison result, wherein the first index difference value threshold is smaller than the second index difference value threshold;
setting a compensation coefficient interval, wherein the compensation coefficient interval comprises a first compensation coefficient, a second compensation coefficient and a third compensation coefficient;
When the index difference value is smaller than or equal to the first index difference value threshold, determining the compensation coefficient of the corrected voice interaction parameter as the first compensation coefficient, and taking the product value of the first compensation coefficient and the corrected voice interaction parameter as the compensated voice interaction parameter;
When the index difference value is larger than the first index difference value threshold and smaller than or equal to the second index difference value threshold, determining the compensation coefficient of the corrected voice interaction parameter as the second compensation coefficient, and taking the product value of the second compensation coefficient and the corrected voice interaction parameter as the compensated voice interaction parameter;
and when the index difference value is larger than the second index difference value threshold, determining the compensation coefficient of the corrected voice interaction parameter as the third compensation coefficient, and taking the product value of the third compensation coefficient and the corrected voice interaction parameter as the compensated voice interaction parameter.
Compared with the prior art, the smart home voice interaction testing method for specialized scene recognition provided by the invention can improve the stability and accuracy of voice interaction testing. Determining the specialized scene to be tested, extracting its scene feature data, and deriving the voice interaction parameters from that data improves the adaptability and accuracy of the test and helps ensure good performance across scenarios. Performing the voice interaction test on the smart home to be tested with real-time voice data in the specialized scene yields voice interaction data and a test score, so that the test situation can be clearly grasped. Comparing the test score with a test score threshold and judging from the result whether to correct the voice interaction parameters facilitates continuous optimization and adjustment in practical applications, adapting to different users' requirements and to environmental changes. Extracting features from the real-time voice data and comparing them with historical voice data further ensures that a high recognition rate can be maintained in various noise environments. Finally, collecting the distance data between the user and the smart home, the user's behavior data, and the device state data makes it possible to determine the interaction impact index and, where necessary, compensate the corrected voice interaction parameters, so that the interaction situation is evaluated more comprehensively and flexibly and user requirements are better met.
In another aspect, the invention further provides an intelligent home voice interaction testing device for specific scene recognition, which comprises:
The determining module is configured to determine a specialized scene to be tested, extract scene feature data of the specialized scene to be tested, and determine voice interaction parameters according to the scene feature data; the test module is configured to perform, in the specialized scene to be tested, a voice interaction test on the smart home to be tested using real-time voice data, obtain voice interaction data, and obtain a test score according to the voice interaction data;
the first judging module is configured to compare the test score with a test score threshold value and judge whether to correct the voice interaction parameter according to the comparison result;
The correction module is configured to perform feature extraction on the real-time voice data when the voice interaction parameters are determined to be corrected, so as to obtain real-time voice feature values; comparing the real-time voice characteristic value with historical voice data, determining a correction coefficient according to a comparison result, and correcting the voice interaction parameter according to the correction coefficient to obtain a corrected voice interaction parameter;
the second judging module is configured to acquire distance data between a user and the intelligent home to be tested, behavior data of the user and equipment state data of the intelligent home to be tested after the corrected voice interaction parameters are determined, and determine interaction influence indexes between the user and the intelligent home to be tested according to the distance data, the behavior data and the equipment state data;
And the compensation module is configured to, when it is determined to compensate the corrected voice interaction parameter, determine a compensation coefficient according to the interaction impact index, and compensate the corrected voice interaction parameter according to the compensation coefficient to obtain the compensated voice interaction parameter.
It can be appreciated that the intelligent home voice interaction testing method and device for specific scene recognition have the same beneficial effects and are not described in detail herein.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
Fig. 1 is a flowchart of an intelligent home voice interaction testing method for specific scene recognition, which is provided by the embodiment of the invention;
fig. 2 is a block diagram of an intelligent home voice interaction testing device for specific scene recognition, which is provided by the embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
Referring to fig. 1, in some embodiments of the present application, the present embodiment provides a smart home voice interaction testing method for specific scene recognition, including the following steps:
S100, determining a specialized scene to be tested, extracting scene feature data of the specialized scene to be tested, and determining voice interaction parameters according to the scene feature data;
S200, under the specialized scene to be tested, performing voice interaction test on the intelligent home to be tested by adopting real-time voice data to obtain voice interaction data, and obtaining a test score according to the voice interaction data;
S300, comparing the test score with a test score threshold value, and judging whether to correct the voice interaction parameter according to the comparison result;
S400, when the voice interaction parameters are determined to be corrected, extracting the characteristics of the real-time voice data to obtain real-time voice characteristic values, comparing the real-time voice characteristic values with the historical voice data, determining correction coefficients according to comparison results, and correcting the voice interaction parameters according to the correction coefficients to obtain corrected voice interaction parameters;
S500, after the corrected voice interaction parameters are determined, acquiring distance data between a user and the intelligent home to be tested, behavior data of the user and equipment state data of the intelligent home to be tested, and determining interaction influence indexes between the user and the intelligent home to be tested according to the distance data, the behavior data and the equipment state data;
and S600, when the corrected voice interaction parameter is judged to be compensated, determining a compensation coefficient according to the interaction influence index, and compensating the corrected voice interaction parameter according to the compensation coefficient to obtain the compensated voice interaction parameter.
In this embodiment, the specialized scene to be tested is preferably a high noise environment. For example, in a high noise environment, it is desirable to be able to distinguish and ignore background noise and accurately identify the user's voice instructions.
It can be appreciated that the smart home voice interaction testing method for specialized scene recognition provided by this embodiment can improve the stability and accuracy of voice interaction testing. Determining the specialized scene to be tested, extracting its scene feature data, and deriving the voice interaction parameters from that data improves the adaptability and accuracy of the test and ensures good performance across scenarios. Performing the voice interaction test on the smart home to be tested with real-time voice data in the specialized scene yields voice interaction data and a test score, so that the test situation can be clearly grasped. Comparing the test score with a test score threshold and judging from the result whether to correct the voice interaction parameters supports continuous optimization and adjustment in practical applications, adapting to different users' requirements and to environmental changes. Extracting features from the real-time voice data and comparing them with historical voice data further refines the voice interaction parameters, ensuring that a high recognition rate can be maintained in various noise environments. By additionally acquiring the distance data between the user and the smart home, the user's behavior data, and the device state data, the interaction impact index can be determined and, where necessary, the corrected voice interaction parameters can be compensated, enabling a more comprehensive evaluation of the interaction and a more satisfactory interaction experience.
Specifically, when determining the voice interaction parameter according to the scene feature data, the method comprises the following steps:
The scene characteristic data comprise noise characteristic data and space echo characteristic data;
the voice interaction parameters comprise a recognition threshold, a noise suppression parameter and a voice enhancement coefficient;
The recognition threshold is obtained by the following formula:

$T = T_0 \left( 1 + a_1 \frac{X_n - B_n}{B_n} + a_2 \frac{X_r - B_r}{B_r} \right)$

The noise suppression parameter is obtained by the following formula:

$N = N_0 \left( 1 + b_1 \frac{X_n - B_n}{B_n} + b_2 \frac{X_r - B_r}{B_r} \right)$

The speech enhancement coefficient is obtained by the following formula:

$C = C_0 \left( 1 + k_1 \frac{X_n - B_n}{B_n} + k_2 \frac{X_r - B_r}{B_r} \right)$

Wherein $T$ represents the recognition threshold, $T_0$ the basic recognition threshold, $a_1$ the noise recognition weight coefficient, $X_n$ the noise characteristic value, $B_n$ the noise standard value, $a_2$ the spatial recognition weight coefficient, $X_r$ the current spatial echo characteristic value, $B_r$ the current spatial echo standard value, $N$ the noise suppression parameter, $N_0$ the basic noise suppression parameter, $b_1$ the noise suppression weight coefficient, $b_2$ the spatial suppression weight coefficient, $C$ the voice enhancement coefficient, $C_0$ the basic voice enhancement coefficient, $k_1$ the voice enhancement weight coefficient, and $k_2$ the spatial enhancement weight coefficient.
In this embodiment, the recognition threshold is a voice interaction parameter for determining whether an input voice signal should be responded, the noise suppression parameter is used to reduce the influence of background noise on voice recognition, and the voice enhancement coefficient is used to improve the quality of the voice signal and ensure that a user command can still be clearly recognized in a noisy environment. By adjusting these parameters, the user's voice command can be recognized more accurately, and good interactive performance can be maintained even in a high-noise specific scene.
In this embodiment, the basic recognition threshold refers to a threshold set in advance in an ideal environment without noise and echo. The basic noise suppression parameters and basic speech enhancement coefficients are also set under similar ideal conditions, and they serve as a reference point for adjustment according to scene feature data in practical applications. By the method, a plurality of different environments can be adapted, so that more stable and reliable voice interaction experience is provided in the smart home scene.
In this embodiment, a1 represents the noise recognition weight coefficient, preferably between 0.1 and 0.3; a larger value indicates a greater influence of the noise characteristics on the recognition threshold. a2 represents the spatial recognition weight coefficient, also preferably between 0.1 and 0.3; a larger value indicates a greater influence of the spatial echo characteristics on the recognition threshold. Similarly, b1 and b2 represent the noise suppression weight coefficient and the spatial suppression weight coefficient, respectively, also preferably in the range of 0.1 to 0.3, ensuring that the noise suppression parameter can be adjusted effectively to accommodate different noise and echo environments. k1 and k2 represent the voice enhancement weight coefficient and the spatial enhancement weight coefficient, respectively, with a preferred range of 0.1 to 0.3, ensuring that the speech enhancement coefficient can be adjusted appropriately to the actual noise and echo conditions, thereby improving the clarity and recognizability of the voice signal. Through reasonable selection and adjustment of these weight coefficients, various specialized scenes can be handled flexibly, achieving more accurate voice recognition and a better interactive experience.
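As a concrete illustration, the sketch below computes the three voice interaction parameters under the deviation-based form reconstructed above. The baseline values and the 0.2 weights are arbitrary sample values within the preferred 0.1–0.3 range, and the function name is a placeholder, none of which is prescribed by the patent.

```python
# Minimal sketch: adjusting the three voice interaction parameters from
# scene feature data. Baselines (t0, n0, c0) and the 0.2 weights are
# illustrative sample values, not values fixed by the patent.

def adjust_voice_params(x_n, b_n, x_r, b_r,
                        t0=0.5, n0=1.0, c0=1.0,
                        a=(0.2, 0.2), b=(0.2, 0.2), k=(0.2, 0.2)):
    noise_dev = (x_n - b_n) / b_n   # relative deviation of noise from its standard value
    echo_dev = (x_r - b_r) / b_r    # relative deviation of spatial echo from its standard value
    t = t0 * (1 + a[0] * noise_dev + a[1] * echo_dev)   # recognition threshold T
    n = n0 * (1 + b[0] * noise_dev + b[1] * echo_dev)   # noise suppression parameter N
    c = c0 * (1 + k[0] * noise_dev + k[1] * echo_dev)   # speech enhancement coefficient C
    return t, n, c

# Example: noise 20% above standard, echo 10% above standard.
print(adjust_voice_params(x_n=60.0, b_n=50.0, x_r=0.55, b_r=0.5))
```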
Specifically, comparing the test score with a test score threshold, and judging whether to correct the voice interaction parameter according to the comparison result, wherein the method comprises the following steps:
The test score is obtained by the following formula:

$S = c_1 \frac{R_1}{R_{\max}} + c_2 \frac{T_{\min}}{T_2} + c_3 \frac{A_3}{A_{\max}}$

Wherein $S$ represents the test score, $c_1$ the weight coefficient of the speech recognition accuracy, $R_{\max}$ the maximum value of the speech recognition accuracy, $R_1$ the speech recognition accuracy, $c_2$ the weight coefficient of the response time, $T_2$ the response time, $T_{\min}$ the minimum value of the response time, $c_3$ the weight coefficient of the response accuracy, $A_{\max}$ the maximum value of the response accuracy, and $A_3$ the response accuracy.
In the present embodiment, c1 represents a weight coefficient of the speech recognition accuracy, preferably 0.4, c2 represents a weight coefficient of the response time, preferably 0.3, and c3 represents a weight coefficient of the response accuracy, preferably 0.3. Such weight assignment ensures that speech recognition accuracy is a greater percentage of the test score, while also not ignoring the importance of response time and response accuracy.
It will be appreciated that determining the test score based on speech recognition accuracy, response time, and response accuracy enables a comprehensive assessment of the performance of a speech interaction test. The speech recognition accuracy reflects the understanding capability of the user speech instruction, the response time reflects the speed of processing the instruction, and the response accuracy pays attention to the accuracy of executing the instruction. By integrating these three indices, a comprehensive test score can be obtained, thereby more accurately reflecting the overall performance.
In this embodiment, the test score threshold is set based on historical test data and user feedback. By analyzing the historical test data, a reasonable threshold range can be determined, and the test score can be ensured to be real and reliable. At the same time, user feedback is also an important reference factor, which can help to adjust the threshold value to better conform to the actual experience and expectations of the user.
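A minimal sketch of the scoring rule follows, assuming the normalized form reconstructed above and the preferred weights c1 = 0.4, c2 = 0.3, c3 = 0.3. The reference extrema (r_max, t_min, a_max) are illustrative; in practice they would come from the historical test data mentioned above.

```python
# Minimal sketch: composite test score from recognition accuracy,
# response time, and response accuracy, with the preferred 0.4/0.3/0.3
# weights. Reference extrema are illustrative sample values.

def test_score(r1, t2, a3, r_max=1.0, t_min=0.2, a_max=1.0,
               c1=0.4, c2=0.3, c3=0.3):
    return (c1 * r1 / r_max      # normalized speech recognition accuracy
            + c2 * t_min / t2    # faster responses score higher
            + c3 * a3 / a_max)   # normalized response accuracy

print(round(test_score(r1=0.92, t2=0.5, a3=0.88), 3))  # 0.752
```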
Specifically, the test score is compared with a test score threshold, and when judging whether to correct the voice interaction parameter according to the comparison result, the method further comprises the following steps:
when the test score is larger than the test score threshold, judging that the voice interaction parameter is not corrected;
and when the test score is smaller than or equal to the test score threshold, the voice interaction parameter is judged to be corrected.
It can be understood that by setting a reasonable test score threshold, the voice interaction parameters can be flexibly adjusted according to the actual test conditions. When the test score exceeds the threshold, the current voice interaction parameters can meet the requirements of users and environments, and adjustment is not needed. Conversely, if the test score is less than or equal to the threshold, it is indicated that the desired interaction effect cannot be achieved under the current parameter setting, and at this time, the voice interaction parameter needs to be modified.
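The decision rule itself is a single comparison; the sketch below spells it out, with the threshold value as an illustrative assumption.

```python
# Minimal sketch of the correction decision: a score at or below the
# threshold triggers correction. The threshold value is a sample.

def needs_correction(score, score_threshold=0.85):
    return score <= score_threshold   # equality also triggers correction

print(needs_correction(0.752))  # True: the voice interaction parameters are corrected
```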
Specifically, comparing the real-time voice characteristic value with the historical voice data, determining a correction coefficient according to the comparison result, and correcting the voice interaction parameter according to the correction coefficient, wherein when the corrected voice interaction parameter is obtained, the method comprises the following steps:
Calculating the maximum similarity between the real-time voice characteristic value and the historical voice data, comparing the maximum similarity with a maximum similarity threshold value, and determining the correction coefficient according to the comparison result;
the maximum similarity is obtained by the following formula:

$M_{\max} = \max\limits_{1 \le i \le m,\ 1 \le j \le n} \dfrac{X_i \cdot Y_j}{\lVert X_i \rVert \, \lVert Y_j \rVert}$

Wherein $M_{\max}$ represents the maximum similarity, $X_i$ the $i$-th real-time speech feature vector, $Y_j$ the $j$-th historical speech feature vector, $m$ the number of real-time speech feature vectors, and $n$ the number of historical speech feature vectors.
It can be seen that the real-time speech feature vector refers to the feature vector extracted from the current speech data, and the historical speech feature vector refers to the feature vector collected and stored in the previous speech interaction test. The matching degree of the current voice data and the historical data can be evaluated by calculating the maximum similarity between the real-time voice feature vector and the historical voice feature vector.
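The sketch below implements this pairwise scan, assuming cosine similarity between feature vectors as in the reconstruction above; the patent itself does not name a specific similarity measure.

```python
import numpy as np

# Minimal sketch: maximum pairwise similarity between real-time and
# historical speech feature vectors, assuming cosine similarity.

def max_similarity(realtime_vecs, history_vecs):
    x = np.asarray(realtime_vecs, dtype=float)           # shape (m, d)
    y = np.asarray(history_vecs, dtype=float)            # shape (n, d)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)     # unit-normalize rows
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return float((x @ y.T).max())                        # max over all (i, j) pairs

m_max = max_similarity([[0.2, 0.9], [0.7, 0.1]],
                       [[0.3, 0.8], [0.9, 0.0], [0.5, 0.5]])
print(round(m_max, 3))
```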
Specifically, comparing the maximum similarity with a maximum similarity threshold, and determining the correction coefficient according to the comparison result includes:
Comparing the maximum similarity with a first maximum similarity threshold and a second maximum similarity threshold, and determining the correction coefficient according to the comparison result, wherein the first maximum similarity threshold is smaller than the second maximum similarity threshold;
Setting a correction coefficient interval, wherein the correction coefficient interval comprises a first correction coefficient, a second correction coefficient and a third correction coefficient;
When the maximum similarity is smaller than or equal to the first maximum similarity threshold, determining a correction coefficient of the voice interaction parameter as a first correction coefficient, and taking a product value of the first correction coefficient and the voice interaction parameter as the corrected voice interaction parameter;
When the maximum similarity is larger than the first maximum similarity threshold and smaller than or equal to the second maximum similarity threshold, determining a correction coefficient of the voice interaction parameter as a second correction coefficient, and taking the product value of the second correction coefficient and the voice interaction parameter as the corrected voice interaction parameter;
And when the maximum similarity is greater than the second maximum similarity threshold, determining a correction coefficient of the voice interaction parameter as a third correction coefficient, and taking the product value of the third correction coefficient and the voice interaction parameter as the corrected voice interaction parameter.
It can be understood that by setting different similarity threshold values and correction coefficient intervals, the voice interaction parameters are flexibly adjusted according to the matching degree of the real-time voice data and the historical data. When the similarity between the real-time voice data and the historical data is low, a larger correction coefficient is adopted to obviously adjust the voice interaction parameters, so that the adaptability and the accuracy of the test are improved. On the contrary, if the similarity is higher, the current voice interaction parameters are indicated to be suitable, and at the moment, smaller correction coefficients are adopted for fine adjustment so as to maintain the stability and continuity of the test. In this way, it can be ensured that a high quality speech interaction experience can be provided in a variety of specific scenarios.
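The tiered selection reads naturally as a piecewise rule, sketched below. The thresholds (0.6, 0.85) and coefficient values are illustrative assumptions; the method only fixes their ordering (first threshold below the second, larger corrections for lower similarity).

```python
# Minimal sketch: tiered correction coefficient keyed to the maximum
# similarity, applied multiplicatively. All numeric values are samples.

def correct_param(param, m_max, th1=0.6, th2=0.85,
                  coeffs=(1.3, 1.15, 1.05)):
    if m_max <= th1:        # low similarity: strongest correction
        k = coeffs[0]
    elif m_max <= th2:      # moderate similarity: medium correction
        k = coeffs[1]
    else:                   # high similarity: fine adjustment only
        k = coeffs[2]
    return k * param        # corrected voice interaction parameter

print(correct_param(param=0.53, m_max=0.72))  # 0.53 * 1.15
```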
Specifically, when determining the interaction impact index between the user and the smart home to be tested according to the distance data, the behavior data and the equipment state data, the method comprises the following steps:
the interaction impact index is obtained by the following formula:

$I = \alpha_1 D^{\beta_1} + \alpha_2 B^{\beta_2} + \alpha_3 E^{\beta_3}$

Wherein $I$ represents the interaction impact index, $D$ the distance data, $B$ the behavior data, $E$ the equipment state data, $\alpha_1$ the distance weight coefficient, $\alpha_2$ the behavior weight coefficient, $\alpha_3$ the equipment state weight coefficient, $\beta_1$ the first impact coefficient, $\beta_2$ the second impact coefficient, and $\beta_3$ the third impact coefficient.
It will be appreciated that in the present embodiment, α1, α2, and α3 represent the weight coefficients of the distance, behavior, and device state data, respectively, and their selection is based on a close study of how strongly these factors affect the voice interaction process. For example, the weight coefficient α1 of the distance data is preferably 0.3, because the distance between the user and the device directly affects the strength and clarity of the voice signal: the closer the distance, the more easily the signal is captured accurately. The weight coefficient α2 of the behavior data is preferably 0.4, because the user's behavior patterns, such as speaking speed and volume, have a significant influence on the accuracy of voice recognition. The weight coefficient α3 of the device state data is preferably 0.3, because the operating state of the smart home device, such as whether it is in mute mode or currently playing music, also affects the voice interaction. β1, β2, and β3 represent the first, second, and third influence coefficients, related to the distance, behavior, and device state data respectively, and are set to further refine the calculation of the interaction impact index. For example, β1 may be set to 0.5 to emphasize the importance of distance data in the interaction, β2 may be set to 0.3 to reflect the moderate influence of behavior data, and β3 may be set to 0.2 to reflect the smaller influence of device state data. With these weight and influence coefficients, the interaction impact index can accurately reflect the interaction situation between the user and the smart home device, providing a scientific basis for adjusting the voice interaction parameters. This data-driven adjustment mechanism not only improves the adaptability and personalization of the voice interaction test, but also helps better meet users' specific needs in actual use.
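Under the power-weighted form reconstructed above, and using this embodiment's preferred weights (0.3, 0.4, 0.3) and sample influence coefficients (0.5, 0.3, 0.2), the index can be sketched as follows; the assumption that inputs are normalized to (0, 1] is mine, not the patent's.

```python
# Minimal sketch: interaction impact index from distance, behavior,
# and device-state data, assumed normalized to (0, 1]. Uses the
# preferred weights and sample influence coefficients discussed above.

def interaction_impact(d, b, e,
                       alpha=(0.3, 0.4, 0.3),
                       beta=(0.5, 0.3, 0.2)):
    return (alpha[0] * d ** beta[0]     # distance contribution
            + alpha[1] * b ** beta[1]   # behavior contribution
            + alpha[2] * e ** beta[2])  # device-state contribution

print(round(interaction_impact(d=0.8, b=0.6, e=0.9), 3))
```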
Specifically, when judging whether to compensate the corrected voice interaction parameter according to the interaction impact index, the method includes:
subtracting an interaction impact index threshold from the interaction impact index to obtain an index difference value;
comparing the index difference value with an index difference value threshold value, and judging whether to compensate the corrected voice interaction parameter according to the comparison result;
when the index difference value is larger than or equal to the index difference value threshold, it is determined to compensate the corrected voice interaction parameter;
And when the index difference value is smaller than the index difference value threshold, it is determined not to compensate the corrected voice interaction parameter.
Specifically, determining a compensation coefficient according to the interaction impact index, and compensating the corrected voice interaction parameter according to the compensation coefficient includes:
Comparing the index difference value with a first index difference value threshold and a second index difference value threshold, and determining the compensation coefficient according to the comparison result, wherein the first index difference value threshold is smaller than the second index difference value threshold;
setting a compensation coefficient interval, wherein the compensation coefficient interval comprises a first compensation coefficient, a second compensation coefficient and a third compensation coefficient;
When the index difference value is smaller than or equal to the first index difference value threshold, determining the compensation coefficient of the corrected voice interaction parameter as the first compensation coefficient, and taking the product value of the first compensation coefficient and the corrected voice interaction parameter as the compensated voice interaction parameter;
When the index difference value is larger than the first index difference value threshold and smaller than or equal to the second index difference value threshold, determining the compensation coefficient of the corrected voice interaction parameter as the second compensation coefficient, and taking the product value of the second compensation coefficient and the corrected voice interaction parameter as the compensated voice interaction parameter;
and when the index difference value is larger than the second index difference value threshold, determining the compensation coefficient of the corrected voice interaction parameter as the third compensation coefficient, and taking the product value of the third compensation coefficient and the corrected voice interaction parameter as the compensated voice interaction parameter.
It can be understood that by setting different index difference thresholds and compensation coefficient intervals, the corrected voice interaction parameters can be flexibly compensated according to the difference between the interaction impact index and its threshold. When the difference is large, a larger compensation coefficient is used to adjust the corrected parameters markedly, improving the adaptability and personalization of the voice interaction. Conversely, a small difference indicates that the current corrected parameters are suitable, and a smaller compensation coefficient is used for fine-tuning, maintaining the stability and continuity of the voice interaction. In this way, a high-quality voice interaction experience can be provided in a variety of specialized scenarios.
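Mirroring the correction step, the compensation decision and tier selection can be sketched as below. The impact index threshold and the tier values are illustrative assumptions; the method fixes only their ordering.

```python
# Minimal sketch: compensation decision plus tiered compensation
# coefficient. All numeric values are sample assumptions.

def compensate_param(param, impact, impact_index_threshold=0.7,
                     decide_threshold=0.05, th1=0.1, th2=0.2,
                     coeffs=(1.02, 1.1, 1.25)):
    diff = impact - impact_index_threshold   # index difference value
    if diff < decide_threshold:              # below threshold: no compensation
        return param
    if diff <= th1:                          # first tier: fine-tune
        k = coeffs[0]
    elif diff <= th2:                        # second tier: moderate compensation
        k = coeffs[1]
    else:                                    # third tier: strongest compensation
        k = coeffs[2]
    return k * param                         # compensated voice interaction parameter

print(compensate_param(param=0.61, impact=0.92))  # diff = 0.22 -> third tier
```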
Referring to fig. 2, in some embodiments of the present application, an intelligent home voice interaction testing device for specific scene recognition is provided, including:
The determining module is configured to determine a specialized scene to be tested, extract scene feature data of the specialized scene to be tested, and determine voice interaction parameters according to the scene feature data; the test module is configured to perform, in the specialized scene to be tested, a voice interaction test on the smart home to be tested using real-time voice data, obtain voice interaction data, and obtain a test score according to the voice interaction data;
the first judging module is configured to compare the test score with a test score threshold value and judge whether to correct the voice interaction parameter according to the comparison result;
The correction module is configured to perform feature extraction on the real-time voice data when the voice interaction parameters are determined to be corrected, so as to obtain real-time voice feature values; comparing the real-time voice characteristic value with historical voice data, determining a correction coefficient according to a comparison result, and correcting the voice interaction parameter according to the correction coefficient to obtain a corrected voice interaction parameter;
the second judging module is configured to acquire distance data between a user and the intelligent home to be tested, behavior data of the user and equipment state data of the intelligent home to be tested after the corrected voice interaction parameters are determined, and determine interaction influence indexes between the user and the intelligent home to be tested according to the distance data, the behavior data and the equipment state data;
And the compensation module is configured to, when it is determined to compensate the corrected voice interaction parameter, determine a compensation coefficient according to the interaction impact index, and compensate the corrected voice interaction parameter according to the compensation coefficient to obtain the compensated voice interaction parameter.
It will be appreciated by those skilled in the art that embodiments of the application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flowchart and/or block of the flowchart illustrations and/or block diagrams, and combinations of flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the specific embodiments of the present invention without departing from the spirit and scope of the present invention, and any modifications and equivalents are intended to be included in the scope of the claims of the present invention.

Claims (9)

1. The intelligent home voice interaction testing method is characterized by comprising the following steps of:
Determining a specialized scene to be tested, extracting scene feature data of the specialized scene to be tested, and determining voice interaction parameters according to the scene feature data, wherein the scene feature data comprises noise feature data and space echo feature data;
Under the specialized scene to be tested, performing voice interaction test on the intelligent home to be tested by adopting real-time voice data to obtain voice interaction data, and obtaining a test score according to the voice interaction data;
Comparing the test score with a test score threshold, and judging whether to correct the voice interaction parameter according to the comparison result;
When it is determined to correct the voice interaction parameters, performing feature extraction on the real-time voice data to obtain real-time voice characteristic values, comparing the real-time voice characteristic values with historical voice data, determining a correction coefficient according to the comparison result, and correcting the voice interaction parameters according to the correction coefficient to obtain corrected voice interaction parameters;
After the corrected voice interaction parameters are determined, acquiring distance data between a user and the intelligent home to be tested, behavior data of the user and equipment state data of the intelligent home to be tested, and determining interaction influence indexes between the user and the intelligent home to be tested according to the distance data, the behavior data and the equipment state data;
When it is determined to compensate the corrected voice interaction parameters, determining a compensation coefficient according to the interaction impact index, and compensating the corrected voice interaction parameters according to the compensation coefficient to obtain compensated voice interaction parameters;
When determining the interaction influence index between the user and the smart home to be tested according to the distance data, the behavior data and the equipment state data, the method comprises the following steps:
the interaction impact index is obtained by the following formula:

$I = \alpha_1 D^{\beta_1} + \alpha_2 B^{\beta_2} + \alpha_3 E^{\beta_3}$

Wherein $I$ represents the interaction impact index, $D$ the distance data, $B$ the behavior data, $E$ the equipment state data, $\alpha_1$ the distance weight coefficient, $\alpha_2$ the behavior weight coefficient, $\alpha_3$ the equipment state weight coefficient, $\beta_1$ the first impact coefficient, $\beta_2$ the second impact coefficient, and $\beta_3$ the third impact coefficient.
2. The smart home voice interaction test method according to claim 1, wherein when determining voice interaction parameters according to the scene feature data, the method comprises:
The recognition threshold is obtained by the following formula:

$T = T_0 \left( 1 + a_1 \frac{X_n - B_n}{B_n} + a_2 \frac{X_r - B_r}{B_r} \right)$

The noise suppression parameter is obtained by the following formula:

$N = N_0 \left( 1 + b_1 \frac{X_n - B_n}{B_n} + b_2 \frac{X_r - B_r}{B_r} \right)$

The speech enhancement coefficient is obtained by the following formula:

$C = C_0 \left( 1 + k_1 \frac{X_n - B_n}{B_n} + k_2 \frac{X_r - B_r}{B_r} \right)$

Wherein $T$ represents the recognition threshold, $T_0$ the basic recognition threshold, $a_1$ the noise recognition weight coefficient, $X_n$ the noise characteristic value, $B_n$ the noise standard value, $a_2$ the spatial recognition weight coefficient, $X_r$ the current spatial echo characteristic value, $B_r$ the current spatial echo standard value, $N$ the noise suppression parameter, $N_0$ the basic noise suppression parameter, $b_1$ the noise suppression weight coefficient, $b_2$ the spatial suppression weight coefficient, $C$ the voice enhancement coefficient, $C_0$ the basic voice enhancement coefficient, $k_1$ the voice enhancement weight coefficient, and $k_2$ the spatial enhancement weight coefficient.
3. The smart home voice interaction test method according to claim 1, wherein comparing the test score with a test score threshold, and determining whether to correct the voice interaction parameter according to the comparison result, comprises:
The test score is obtained by the following formula:
Wherein S represents a test score, c1 represents a weight coefficient of the speech recognition accuracy, rmax represents a maximum value of the speech recognition accuracy, R1 represents the speech recognition accuracy, c2 represents a weight coefficient of the response time, T2 represents the response time, tmin represents a minimum value of the response time, c3 represents a weight coefficient of the response accuracy, amax represents a maximum value of the response accuracy, and A3 represents the response accuracy.
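The scoring formula is likewise not reproduced in the published text. A normalized weighted sum consistent with the symbol definitions (an assumed form) would be:

```latex
S = c_1\,\frac{R_1}{R_{\max}} + c_2\,\frac{T_{\min}}{T_2} + c_3\,\frac{A_3}{A_{\max}}
```

In this form each term lies in [0, 1]: the two accuracy terms are normalized against their maxima, while response time enters as a reciprocal ratio so that faster responses raise the score.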
4. The smart home voice interaction test method according to claim 3, wherein comparing the test score with a test score threshold, and determining whether to correct the voice interaction parameter according to the comparison result, further comprises:
When the test score is greater than the test score threshold, determining not to correct the voice interaction parameter;
And when the test score is less than or equal to the test score threshold, determining to correct the voice interaction parameter.
5. The smart home voice interaction test method according to claim 1, wherein comparing the real-time voice characteristic value with the historical voice data, determining a correction coefficient according to the comparison result, and correcting the voice interaction parameter according to the correction coefficient to obtain the corrected voice interaction parameter comprises:
Calculating the maximum similarity between the real-time voice characteristic value and the historical voice data, comparing the maximum similarity with a maximum similarity threshold value, and determining the correction coefficient according to the comparison result;
The maximum similarity is obtained by the following formula:
Wherein Mmax represents the maximum similarity, Xi represents the i-th real-time speech feature vector, Yj represents the j-th historical speech feature vector, m represents the number of real-time speech feature vectors, and n represents the number of historical speech feature vectors.
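The similarity formula itself is not reproduced in the published text. A cosine-similarity reading consistent with the vector symbols (an assumed form) would take the maximum pairwise similarity over all real-time and historical feature-vector pairs:

```latex
M_{\max} = \max_{1 \le i \le m,\ 1 \le j \le n} \frac{X_i \cdot Y_j}{\lVert X_i \rVert\,\lVert Y_j \rVert}
```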
6. The smart home voice interaction test method according to claim 5, wherein comparing the maximum similarity with a maximum similarity threshold, and determining the correction coefficient according to the comparison result, comprises:
Comparing the maximum similarity with a first maximum similarity threshold and a second maximum similarity threshold, and determining the correction coefficient according to the comparison result, wherein the first maximum similarity threshold is smaller than the second maximum similarity threshold;
Setting a correction coefficient interval, wherein the correction coefficient interval comprises a first correction coefficient, a second correction coefficient and a third correction coefficient;
When the maximum similarity is less than or equal to the first maximum similarity threshold, determining the correction coefficient of the voice interaction parameter as the first correction coefficient, and taking the product of the first correction coefficient and the voice interaction parameter as the corrected voice interaction parameter;
When the maximum similarity is greater than the first maximum similarity threshold and less than or equal to the second maximum similarity threshold, determining the correction coefficient of the voice interaction parameter as the second correction coefficient, and taking the product of the second correction coefficient and the voice interaction parameter as the corrected voice interaction parameter;
And when the maximum similarity is greater than the second maximum similarity threshold, determining the correction coefficient of the voice interaction parameter as the third correction coefficient, and taking the product of the third correction coefficient and the voice interaction parameter as the corrected voice interaction parameter.
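As a reading aid, a minimal Python sketch of this three-band correction rule follows. The function name, threshold values, and coefficient values are hypothetical: the claim fixes the banding logic but not any concrete numbers.

```python
def apply_correction(max_similarity: float, voice_param: float) -> float:
    """Pick a correction coefficient by similarity band and scale the parameter.

    Thresholds and coefficients below are hypothetical placeholders; the claim
    specifies only the banding logic, not the numeric values.
    """
    FIRST_THRESHOLD = 0.6    # first maximum similarity threshold (< second)
    SECOND_THRESHOLD = 0.85  # second maximum similarity threshold

    if max_similarity <= FIRST_THRESHOLD:
        coefficient = 1.2    # first correction coefficient
    elif max_similarity <= SECOND_THRESHOLD:
        coefficient = 1.1    # second correction coefficient
    else:
        coefficient = 1.0    # third correction coefficient
    return coefficient * voice_param


# Example: low similarity to historical voice data falls into the first band.
print(apply_correction(0.5, voice_param=0.8))  # 0.96
```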
7. The smart home voice interaction test method according to claim 1, wherein when determining whether to compensate the corrected voice interaction parameter according to the interaction impact index, the method comprises:
Calculating the difference between the interaction impact index and an interaction impact index threshold to obtain an index difference value;
Comparing the index difference value with an index difference value threshold, and determining whether to compensate the corrected voice interaction parameter according to the comparison result;
When the index difference value is greater than or equal to the index difference value threshold, determining to compensate the corrected voice interaction parameter;
And when the index difference value is less than the index difference value threshold, determining not to compensate the corrected voice interaction parameter.
8. The smart home voice interaction test method according to claim 7, wherein determining a compensation coefficient according to the interaction impact index and compensating the corrected voice interaction parameter according to the compensation coefficient comprises:
Comparing the index difference value with a first index difference value threshold and a second index difference value threshold, and determining the compensation coefficient according to the comparison result, wherein the first index difference value threshold is smaller than the second index difference value threshold;
setting a compensation coefficient interval, wherein the compensation coefficient interval comprises a first compensation coefficient, a second compensation coefficient and a third compensation coefficient;
When the index difference value is less than or equal to the first index difference value threshold, determining the compensation coefficient of the corrected voice interaction parameter as the first compensation coefficient, and taking the product of the first compensation coefficient and the corrected voice interaction parameter as the compensated voice interaction parameter;
When the index difference value is greater than the first index difference value threshold and less than or equal to the second index difference value threshold, determining the compensation coefficient of the corrected voice interaction parameter as the second compensation coefficient, and taking the product of the second compensation coefficient and the corrected voice interaction parameter as the compensated voice interaction parameter;
And when the index difference value is greater than the second index difference value threshold, determining the compensation coefficient of the corrected voice interaction parameter as the third compensation coefficient, and taking the product of the third compensation coefficient and the corrected voice interaction parameter as the compensated voice interaction parameter.
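The compensation rule here mirrors the correction rule of claim 6, banded on the index difference value instead of the maximum similarity. A minimal Python sketch, again with hypothetical names and values, might read:

```python
def apply_compensation(index_diff: float, corrected_param: float) -> float:
    """Pick a compensation coefficient by index-difference band and scale the
    corrected parameter. Thresholds and coefficients are hypothetical values;
    the claim fixes only the banding logic.
    """
    FIRST_DIFF_THRESHOLD = 0.1   # first index difference value threshold (< second)
    SECOND_DIFF_THRESHOLD = 0.3  # second index difference value threshold

    if index_diff <= FIRST_DIFF_THRESHOLD:
        coefficient = 1.05       # first compensation coefficient
    elif index_diff <= SECOND_DIFF_THRESHOLD:
        coefficient = 1.10       # second compensation coefficient
    else:
        coefficient = 1.15       # third compensation coefficient
    return coefficient * corrected_param
```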
9. A smart home voice interaction testing device, applying the smart home voice interaction testing method according to any one of claims 1 to 8, comprising:
A determining module, configured to determine a specialized scene to be tested, extract scene feature data of the specialized scene to be tested, and determine voice interaction parameters according to the scene feature data;
A testing module, configured to perform, under the specialized scene to be tested, a voice interaction test on the smart home to be tested using real-time voice data to obtain voice interaction data, and to obtain a test score according to the voice interaction data;
A first judging module, configured to compare the test score with a test score threshold, and determine whether to correct the voice interaction parameters according to the comparison result;
A correction module, configured to, when it is determined to correct the voice interaction parameters, perform feature extraction on the real-time voice data to obtain a real-time voice characteristic value, compare the real-time voice characteristic value with historical voice data, determine a correction coefficient according to the comparison result, and correct the voice interaction parameters according to the correction coefficient to obtain corrected voice interaction parameters;
A second judging module, configured to, after the corrected voice interaction parameters are determined, acquire distance data between the user and the smart home to be tested, behavior data of the user, and equipment state data of the smart home to be tested, and determine an interaction impact index between the user and the smart home to be tested according to the distance data, the behavior data, and the equipment state data;
And a compensation module, configured to, when it is determined that the corrected voice interaction parameters are to be compensated, determine a compensation coefficient according to the interaction impact index, and compensate the corrected voice interaction parameters according to the compensation coefficient to obtain compensated voice interaction parameters.