CN114401384A - Intelligent device audio working mode prompting method and device - Google Patents
Intelligent device audio working mode prompting method and device
- Publication number
- CN114401384A; CN202111486295.4A
- Authority
- CN
- China
- Prior art keywords
- information
- sound
- mode
- image
- volume
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- 
        - H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
 
- 
        - G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B21/00—Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
- G08B21/18—Status alarms
- G08B21/24—Reminder alarms, e.g. anti-loss alarms
 
- 
        - G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B7/00—Signalling systems according to more than one of groups G08B3/00 - G08B6/00; Personal calling systems according to more than one of groups G08B3/00 - G08B6/00
- G08B7/06—Signalling systems according to more than one of groups G08B3/00 - G08B6/00; Personal calling systems according to more than one of groups G08B3/00 - G08B6/00 using electric transmission, e.g. involving audible and visible signalling through the use of sound and light sources
 
- 
        - H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72448—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
- H04M1/72454—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions
 
- 
        - H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72448—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
- H04M1/72463—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions to restrict the functionality of the device
 
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Computer Networks & Wireless Communication (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Environmental & Geological Engineering (AREA)
- Business, Economics & Management (AREA)
- Emergency Management (AREA)
- Telephone Function (AREA)
Abstract
The present application discloses a method for prompting the audio working mode of a smart device, comprising: acquiring an image of the user's speech organ; acquiring volume information picked up by a sound pickup device; judging, from the speech-organ image or from the volume information, whether there is sound information that needs to be transmitted; if so, judging whether the working mode of the sound pickup device is mute mode; and if so, issuing a mute-mode prompt in a preset manner. The application judges whether the user has sound information to transmit either by applying image recognition and comparison to the speech-organ image, or by applying ambient volume monitoring to the volume information. It then issues a mute-mode prompt in a preset manner to remind the user that the sound pickup device of the smart device in use is muted, so that mute mode can be cancelled promptly, which improves the fluency and efficiency of communication.
Description
Technical Field
The present application relates to the technical field of sound processing, and in particular to a method and an apparatus for prompting the audio working mode of a smart device, as well as a further method for prompting the audio working mode of a smart device, an electronic device, and a computer storage medium.
Background
Smart devices such as mobile phones and laptops are increasingly part of daily life and work, and using them for remote communication, for example to hold remote meetings, has become a common way of working.
In remote meetings held on smart terminal devices, situations often arise in which the user needs a reminder. For example, during an audio-video conference (or an audio-only conference), one person is often asked to speak while the other participants mute themselves to avoid introducing interfering sound sources. A muted terminal may later need to join the discussion or take the floor, yet its user may well have forgotten that the terminal is in mute mode, that is, the audio working mode in which sound transmission is cut off. The user then keeps talking while the other participants hear nothing, which makes the remote meeting less smooth and slows its progress.
Summary of the Invention
Embodiments of the present application provide a method for prompting the working mode of a sound pickup device, to solve the prior-art problem that a terminal device operates in mute mode when it needs to transmit sound information. Embodiments of the present application further provide an apparatus for prompting the audio working mode of a smart device, a further method for prompting the audio working mode of a smart device, an electronic device, and a computer storage medium.
The method for prompting the audio working mode of a smart device provided by the embodiments of the present application comprises:
acquiring an image of the user's speech organ;
acquiring volume information picked up by a sound pickup device;
judging, from the speech-organ image or from the volume information, whether there is sound information that needs to be transmitted, and if so, proceeding to the next step;
judging whether the working mode of the sound pickup device is mute mode, and if so, proceeding to the next step;
issuing a mute-mode prompt in a preset manner.
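The steps above can be sketched as a small decision function. This is only an illustrative sketch: the `Observation` type, its field names, and `should_prompt_mute` are hypothetical names introduced here, and the two speech cues are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Inputs to one decision cycle (all field names are hypothetical)."""
    mouth_moving: bool   # outcome of the image-based check
    loud_enough: bool    # outcome of the volume-based check
    mic_muted: bool      # current working mode of the sound pickup device


def should_prompt_mute(obs: Observation) -> bool:
    """Return True when a mute-mode prompt should be issued."""
    # Sound to transmit is assumed to exist if EITHER cue indicates speech.
    speech_detected = obs.mouth_moving or obs.loud_enough
    if not speech_detected:
        return False
    # A prompt is only useful when the pickup device is actually muted.
    return obs.mic_muted
```

For example, `should_prompt_mute(Observation(True, False, True))` yields a prompt because the user appears to be speaking while muted, whereas the same cues with `mic_muted=False` do not.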
Optionally, acquiring the image of the user's speech organ comprises:
obtaining the speech-organ image of the user captured by a camera device;
extracting from the speech-organ image first image information and second image information of the user's speech organ, the first and second image information being acquired a predetermined time interval apart;
and judging from the speech-organ image whether there is sound information that needs to be transmitted, and if so proceeding to the next step, comprises:
comparing multiple pairs of first and second image information of the user's speech organ captured within a specified period, and judging whether the speech organ is changing continuously; if so, determining that there is sound information that needs to be transmitted and proceeding to the next step.
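One possible way to compare pairs of image information is a mean absolute pixel difference over the mouth region. The sketch below assumes each frame is a flattened list of grayscale values; the thresholds and the rule "a sufficient share of pairs must differ" are assumptions made for illustration, since the text does not specify how continuous change is measured.

```python
from typing import Iterable, Sequence, Tuple

Frame = Sequence[float]  # flattened grayscale mouth-region crop (assumed representation)


def frame_difference(first: Frame, second: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    if len(first) != len(second) or not first:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(first, second)) / len(first)


def mouth_is_moving(
    pairs: Iterable[Tuple[Frame, Frame]],
    diff_threshold: float = 8.0,      # assumed value, would need tuning
    min_changed_ratio: float = 0.6,   # assumed value, would need tuning
) -> bool:
    """Treat the mouth as continuously changing when enough pairs differ."""
    pair_list = list(pairs)
    if not pair_list:
        return False
    changed = sum(
        1 for first, second in pair_list
        if frame_difference(first, second) > diff_threshold
    )
    return changed / len(pair_list) >= min_changed_ratio
```

Requiring a ratio of changed pairs rather than a single difference is one way to avoid reacting to an isolated movement such as a yawn; a production system would more likely track mouth landmarks than raw pixels.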
Optionally, acquiring the volume information picked up by the sound pickup device comprises:
acquiring the sound signal captured by the sound pickup device;
preprocessing the sound signal to extract vocalization information;
estimating the volume of the vocalization information to obtain vocalization volume information;
taking the vocalization volume information as the volume information picked up by the sound pickup device.
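A minimal sketch of the volume estimate, assuming samples normalized to the range [-1, 1]. The preprocessing here is only DC-offset removal, which stands in for whatever vocalization extraction a real system would use; the function names and the dBFS scale are choices made for this example, not taken from the text.

```python
import math
from typing import List, Sequence


def remove_dc_offset(samples: Sequence[float]) -> List[float]:
    """Very simple preprocessing: subtract the mean so silence sits at zero."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def rms_dbfs(samples: Sequence[float], floor_db: float = -120.0) -> float:
    """Root-mean-square level in dB relative to full scale (1.0)."""
    if not samples:
        return floor_db
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(rms))


def pickup_volume(samples: Sequence[float]) -> float:
    """Volume information for one block of samples, in dBFS."""
    return rms_dbfs(remove_dc_offset(samples))
```

Separating a speaker's voice from other ambient sound would need more than this, for example a voice activity detector; the sketch only shows where a volume figure could come from.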
Optionally, judging from the volume information whether there is sound information that needs to be transmitted comprises:
obtaining, from the vocalization volume information, the volume value and the duration of the sound at that volume value;
comparing the volume value with a preset volume threshold and judging whether the volume value meets the threshold; if it does, further judging whether the sound duration meets a preset duration threshold; if so, judging that there is sound information that needs to be transmitted.
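The two-stage threshold check can be sketched as follows, assuming the volume information arrives as one level per fixed-length audio frame. The frame length and both thresholds are placeholder values, and measuring duration as the longest uninterrupted loud run is an assumption of this sketch.

```python
from typing import Sequence


def longest_loud_run_seconds(
    frame_levels_db: Sequence[float],
    frame_seconds: float,
    volume_threshold_db: float,
) -> float:
    """Duration of the longest run of consecutive frames at or above the threshold."""
    longest = current = 0
    for level in frame_levels_db:
        if level >= volume_threshold_db:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest * frame_seconds


def speech_detected_by_volume(
    frame_levels_db: Sequence[float],
    frame_seconds: float = 0.02,          # assumed 20 ms frames
    volume_threshold_db: float = -35.0,   # assumed volume threshold
    min_duration_seconds: float = 0.3,    # assumed duration threshold
) -> bool:
    """Volume must meet its threshold AND stay there long enough."""
    loud_seconds = longest_loud_run_seconds(
        frame_levels_db, frame_seconds, volume_threshold_db
    )
    return loud_seconds >= min_duration_seconds
```

The duration condition is what keeps a short noise such as a door slam from being treated as speech.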
Optionally, the method further comprises:
obtaining voiceprint feature information from the vocalization information;
comparing the voiceprint feature information with a preset voiceprint feature standard to determine whether there is sound information that needs to be transmitted.
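The text does not say how voiceprint features are represented or compared. As one hedged illustration, suppose the features are fixed-length numeric vectors produced by some external extractor, and the preset standard is a reference vector of the same length; cosine similarity against an assumed threshold then gives a simple match test.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matches_voiceprint(
    features: Sequence[float],
    reference: Sequence[float],
    min_similarity: float = 0.8,  # assumed threshold
) -> bool:
    """True when the captured features are close enough to the reference."""
    return cosine_similarity(features, reference) >= min_similarity
```

A check of this kind could help ignore sounds that are loud but do not come from the enrolled speaker, such as a cough from someone nearby.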
Optionally, judging whether the working mode of the sound pickup device is mute mode comprises:
obtaining the state setting of the sound pickup device in the relevant application; if the microphone is set to off, the working mode is mute mode.
Optionally, the mute-mode prompt comprises at least one of the following: flashing of an indicator light, an announcement through the loudspeaker, vibration of the sound pickup device, and a mute icon corresponding to the mute mode.
Optionally, the method further comprises: if no trigger operation on the mute icon is received within a preset time, enlarging the mute icon and displaying the enlarged icon at the center of the interactive interface of the sound pickup device.
Optionally, after the mute-mode prompt is issued in the preset manner, the method further comprises: obtaining a cancel trigger operation given in response to the mute-mode prompt, and cancelling the mute mode according to that operation; or,
if no cancel trigger operation is received in response to the mute-mode prompt within a preset time, triggering generation of a cancel signal for the mute mode, and cancelling the mute mode according to the cancel signal.
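The two ways of leaving mute mode can be sketched as a pure decision function over timestamps. `UnmuteAction`, `decide_unmute` and the five-second default are hypothetical, introduced only to make the branching concrete.

```python
from enum import Enum


class UnmuteAction(Enum):
    WAIT = "wait"                              # keep showing the prompt
    UNMUTE_BY_USER = "unmute_by_user"          # user responded to the prompt
    UNMUTE_AUTOMATICALLY = "unmute_automatically"  # timeout expired


def decide_unmute(
    prompt_time: float,
    now: float,
    user_confirmed: bool,
    timeout_seconds: float = 5.0,  # assumed preset time
) -> UnmuteAction:
    """Decide what to do after a mute-mode prompt has been issued."""
    if user_confirmed:
        return UnmuteAction.UNMUTE_BY_USER
    if now - prompt_time >= timeout_seconds:
        # No response within the preset time: generate the cancel signal.
        return UnmuteAction.UNMUTE_AUTOMATICALLY
    return UnmuteAction.WAIT
```

Passing the clock values in as arguments keeps the function easy to test; a caller would typically supply readings from a monotonic clock.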
Optionally, the method includes a preliminary step: judging whether the sound pickup device is in mute mode, and if so, acquiring the image of the user's speech organ and the volume information picked up by the sound pickup device.
The present application also provides a method for prompting the audio working mode of a smart device, comprising:
determining that the working mode of the sound pickup device is speaking mode;
acquiring an image of the user's speech organ;
acquiring volume information picked up by the sound pickup device;
judging, from the speech-organ image or from the volume information, whether there is sound information that needs to be transmitted, and if not, proceeding to the next step;
switching the speaking mode to mute mode in a preset manner.
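This second method is the mirror image of the first: it starts from speaking mode and mutes when no speech is found. A minimal sketch follows, with a hypothetical `Mode` enum and with the assumption that speech counts as present if either cue fires.

```python
from enum import Enum


class Mode(Enum):
    SPEAKING = "speaking"
    MUTE = "mute"


def next_mode(current: Mode, mouth_moving: bool, loud_enough: bool) -> Mode:
    """Switch from speaking mode to mute mode when no speech is detected."""
    if current is not Mode.SPEAKING:
        # The method only applies while the pickup device is in speaking mode.
        return current
    speech_detected = mouth_moving or loud_enough
    return current if speech_detected else Mode.MUTE
```

In practice a system would probably wait for a sustained quiet period before muting, so that a pause between sentences does not cut the speaker off; that delay is not shown here.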
The present application also provides an apparatus for prompting the audio working mode of a smart device, comprising:
an image acquisition unit, configured to acquire an image of the user's speech organ;
a volume information acquisition unit, configured to acquire volume information picked up by a sound pickup device;
a sound information judgment unit, configured to judge, from the speech-organ image or from the volume information, whether there is sound information that needs to be transmitted, and if so, to proceed to the next step;
a mute mode judgment unit, configured to judge whether the working mode of the sound pickup device is mute mode, and if so, to proceed to the next step;
a mute-mode prompt issuing unit, configured to issue a mute-mode prompt in a preset manner.
The present application also provides an apparatus for prompting the audio working mode of a smart device, comprising:
a speaking mode acquisition unit, configured to determine that the working mode of the sound pickup device is speaking mode;
an image acquisition unit, configured to acquire an image of the user's speech organ;
a volume information acquisition unit, configured to acquire volume information picked up by the sound pickup device;
a sound information judgment unit, configured to judge, from the speech-organ image or from the volume information, whether there is sound information that needs to be transmitted, and if not, to proceed to the next step;
a switching unit, configured to switch the speaking mode to mute mode in a preset manner.
The present application further provides an electronic device comprising: a processor; and a memory storing a computer program which, when run by the processor, performs any one of the methods described above.
The present application further provides a computer storage medium storing a computer program which, when run by a processor, performs any one of the methods described above.
Embodiments of the present application further provide a computer storage medium storing a computer program which, when run by a processor, performs the method described above.
Compared with the prior art, the present application has the following advantages:
Embodiments of the present application provide a method for prompting the audio working mode of a smart device, comprising: acquiring an image of the user's speech organ; acquiring volume information picked up by a sound pickup device; judging, from the speech-organ image or from the volume information, whether there is sound information that needs to be transmitted, and if so, proceeding to the next step; judging whether the working mode of the sound pickup device is mute mode, and if so, proceeding to the next step; and issuing a mute-mode prompt in a preset manner. The embodiments judge whether the user has sound information to transmit either from the speech-organ image or by monitoring the ambient volume, and issue a mute-mode prompt in a preset manner to remind the user that the sound pickup device in use is muted, so that the user can cancel mute mode promptly. This improves the fluency and efficiency of remote communication on smart terminals, such as teleconferences.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an application scenario of the audio working mode of a smart device, provided by the first embodiment of the present application;
FIG. 2 is a flowchart of the method for prompting the audio working mode of a smart device provided by the first embodiment of the present application;
FIG. 3 is a schematic diagram of the apparatus for prompting the audio working mode of a smart device provided by the second embodiment of the present application;
FIG. 4 is a flowchart of the method for prompting the audio working mode of a smart device provided by the third embodiment of the present application;
FIG. 5 is a schematic diagram of the apparatus for prompting the audio working mode of a smart device provided by the fourth embodiment of the present application;
FIG. 6 is a schematic diagram of the electronic device provided by the fifth embodiment of the present application.
Detailed Description
Many specific details are set out in the following description to give a full understanding of the embodiments of the present application. The embodiments can, however, be implemented in many ways other than those described here, and those skilled in the art can generalize them without departing from their substance, so the embodiments are not limited by the specific implementations disclosed below.
To help those skilled in the art understand the solution, a specific application scenario is first described in detail, based on the method for prompting the working mode of a sound pickup device provided by the present application. FIG. 1 is a schematic diagram of the application scenario provided by the first embodiment of the present application.
This scenario is a video conference. While the user is in a video conference on a terminal device, the terminal device receives a trigger operation on the mute icon, made either by mistake or simply to avoid disturbing other speakers, and switches the mode corresponding to the mute icon to mute mode. The user may be unaware of the mute mode, or may forget to restore normal mode when starting to speak. In this scenario the terminal device contains the sound pickup device, and may take the form of a mobile phone, tablet, laptop, desktop computer or the like.
During a video conference the smart device is in either an audio-video working environment or an audio working environment. (The former could also be called an audio-video working mode; this application calls it an audio-video working environment to distinguish it from the audio working mode named in the title.) In the audio-video working environment both audio and video are transmitted, which is usually called a video conference, and the video capture device (camera device) of the smart device is active. In the audio working environment the smart device transmits audio only, which is usually called a teleconference. The application can use different ways of judging whether sound information needs to be transmitted in these two environments.
In the audio-video working environment, the camera device of the smart device is active, and the video it captures usually includes an image of the user's speech organ. The speech organ mainly means the mouth, and the speech-organ image accordingly mainly means the mouth region of the video. After the camera device captures the speech-organ image, it sends the image to the image processor of the smart device, which processes it. Specifically, the image processor extracts first image information and second image information of the user's speech organ, captured a suitable time interval apart, and compares them to judge whether the speech organ has changed, that is, whether there is any image difference between the two. If there is an image difference, it determines that there is sound information that needs to be transmitted. The first and second image information can be captured and compared repeatedly as a continuous process, and a suitable decision procedure can be set so that, by comparing image information taken from the speech-organ image over a period of time, an accurate judgment is reached as to whether there is sound information that needs to be transmitted. The image processor here generally means a software program in the smart device that analyzes images; dedicated hardware implementing the same analysis is of course not excluded.
In the audio working environment the camera device is not active, but the sound pickup device still is. How the captured signal is processed and transmitted, however, depends on the current audio working mode. Specifically, if the sound pickup device of the smart device is in mute mode, the signal it captures is not passed on for codec processing, but the pickup device itself keeps working. Note that in many contexts the silent mode of a smart device means a mode in which the device is not disturbed by incoming calls. The mute mode of this application, by contrast, is the mute mode of the sound pickup device in scenarios such as remote meetings: the microphone marker on the operating interface is switched off (if the interface has such a marker), and the user's voice is not transmitted to the other smart devices taking part in the meeting. In the audio working environment, whether or not mute mode is on, the sound pickup device can still pick up volume information. After obtaining the volume value and the duration of the sound at that volume, the speech processor applies predetermined criteria: it checks whether the volume value reaches the preset volume threshold and, if so, whether the sound duration reaches the preset duration threshold. If both conditions are met, it judges that sound information currently needs to be transmitted.
In either working environment, once it is determined that sound information needs to be transmitted, the method further judges whether the sound pickup device is operating in mute mode, and if so reminds the user in a predetermined manner to end mute mode.
Specifically, the smart device may flash its light element, play a voice message through its loudspeaker, vibrate using its vibration device, or display the mute icon corresponding to mute mode on its interactive interface. These mute-mode prompts tell the user that the working mode in the current video conference is mute mode. On noticing a prompt, the user realizes promptly that the conference is muted and cancels mute mode.
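Because the prompt may take any of several forms, a device needs some rule for picking among them. The sketch below is one hypothetical rule: keep the preferred prompt types that the device actually supports, in preference order. The `Prompt` enum and `select_prompts` are illustrative names, not part of the described method.

```python
from enum import Enum
from typing import Iterable, List, Set


class Prompt(Enum):
    LIGHT_FLASH = "light_flash"
    SPEAKER_MESSAGE = "speaker_message"
    VIBRATION = "vibration"
    MUTE_ICON = "mute_icon"


def select_prompts(available: Set[Prompt], preferred: Iterable[Prompt]) -> List[Prompt]:
    """Return the preferred prompt types the device supports, in preference order."""
    chosen = [prompt for prompt in preferred if prompt in available]
    if not chosen:
        # At least one prompt type is required for the reminder to reach the user.
        raise ValueError("no supported prompt type among the preferred ones")
    return chosen
```

A laptop without a vibration motor, for instance, could be described by `available={Prompt.MUTE_ICON, Prompt.SPEAKER_MESSAGE}` and would simply skip `Prompt.VIBRATION` if it were listed as preferred.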
This scenario uses image recognition and comparison together with ambient volume monitoring to judge whether the user is speaking. It reminds the user through the software interface of the sound pickup device, and also through hardware sound, light and vibration, that the current video conference is in mute mode, so that the user cancels mute mode or the sound pickup device cancels it automatically, which improves the fluency and efficiency of communication.
Corresponding to the above scenario, the first embodiment of the present application provides a method for prompting the audio working mode of a smart device. FIG. 2 is a flowchart of this method. The method comprises the following steps:
Step S201: acquire an image of the user's speech organ.
In this step, the audio-video working environment includes at least video conference and video chat environments. The audio working environment includes at least voice conference, voice chat and telephone call environments.
In the audio-video working environment the camera device is switched on and generally captures the user's face, which naturally includes the speech organ. The speech organ mainly means the mouth, and the speech-organ image accordingly mainly means the mouth portion of the video of the face. After the camera device captures the speech-organ image, it sends the image to the image processor provided in the smart device, which processes it further. In this embodiment the image processor specifically means a software program or process that analyzes the speech-organ image to establish whether there is sound that needs to be transmitted; implementation in dedicated hardware is of course not excluded.
Step S202: acquire the volume information picked up by the sound pickup device.
In the audio working environment, the volume information picked up by the sound pickup device is acquired. A sound pickup device, also called a pickup head, generally consists of a microphone (commonly called a mic capsule) and an audio amplifier circuit. The microphone converts sound vibration into an electrical signal through electromagnetic induction and is generally a passive device, that is, it works without a power supply. The audio amplifier circuit shapes and amplifies the electrical signal produced by the microphone and passes it to downstream components for encoding/decoding and transmission. As noted earlier, muting the microphone actually means that the encoding/decoding and transmission stages downstream of the sound pickup device are stopped; the sound pickup device itself continues to capture the volume information of the surrounding environment.
In this step, acquiring the volume information picked up by the sound pickup device specifically includes: acquiring the sound signal captured by the sound pickup device; preprocessing the sound signal and extracting vocalization information; estimating the volume of the vocalization information to obtain vocalization volume information; and taking the vocalization volume information as the volume information picked up by the sound pickup device. The purpose of this process is to separate the target object's vocalization from other sounds in the environment. Many implementations are known in the art; in particular, when the target object is a natural person, distinguishing that person's voice from other environmental sound is achievable with existing techniques.
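The volume estimation part of this step can be sketched as follows. This is a minimal illustration, not the method of the application: the function name `estimate_volume_dbfs`, the choice of RMS level in dB relative to full scale (dBFS), and the mean subtraction used as stand-in preprocessing are all assumptions. Separating the target voice from other environmental sound is not shown.

```python
import math


def estimate_volume_dbfs(samples, eps=1e-12):
    """Estimate the level of a block of audio samples in dBFS.

    `samples` is a sequence of floats in [-1.0, 1.0] (an assumed format).
    The mean is subtracted first as a very simple preprocessing step that
    removes any DC offset from the amplifier circuit.
    """
    if not samples:
        return float("-inf")
    mean = sum(samples) / len(samples)
    mean_square = sum((s - mean) ** 2 for s in samples) / len(samples)
    # 10*log10 of the mean square equals 20*log10 of the RMS amplitude.
    # eps keeps the logarithm finite for pure silence.
    return 10.0 * math.log10(mean_square + eps)
```

A level in dBFS is relative to the converter's full scale, so comparing it with a threshold expressed in sound pressure decibels would require a calibration step that is outside this sketch.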
In this step, the volume information may contain not only the user's voice but also sound from other users or other devices. To determine accurately whether the sound is the current user's voice, the above step further includes obtaining voiceprint feature information from the vocalization information. Voiceprint feature information characterizes a standard specific to one kind of sound: for example, the voiceprint features of a sound made by the user correspond to a user standard, and the voiceprint features of a sound emitted by an electronic device correspond to an electronic device standard. After the voiceprint feature information is obtained, it is compared with a preset voiceprint feature standard. This comparison serves as the criterion for deciding whether there is sound information to be transmitted, and establishes that the current volume information was produced by the specific user. For example, a smartphone user can store his or her own voiceprint information on the phone in advance, and in this step the stored voiceprint information is used as the standard for judging whether the sound was made by that user.
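One common way to compare voiceprint features with a stored standard is cosine similarity between fixed-length feature vectors. The sketch below assumes that approach; the application does not specify the comparison metric, and the function names, the vector representation, and the 0.8 threshold are hypothetical. How the feature vectors are extracted from audio is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def matches_enrolled_voiceprint(features, enrolled, threshold=0.8):
    """True if `features` is close enough to the pre-stored voiceprint.

    `threshold` is an illustrative value, not one given in the source.
    """
    return cosine_similarity(features, enrolled) >= threshold
```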
Step S203: determine, according to the vocal part image or according to the volume information, whether there is sound information to be transmitted; if so, proceed to the next step.
After the vocal part image and the volume information are obtained, whether there is sound information to be transmitted can be determined according to the vocal part image or according to the volume information. In the first embodiment of the present application, the sound information to be transmitted is the sound made by the current user. In other embodiments it may also be the sound emitted by a target object, where the target object includes electronic devices, animals, and anything else that can produce sound. The determination based on the vocal part image and the determination based on the volume information are described separately below.
Determining from the vocal part image whether there is sound information to be transmitted, and if so proceeding to the next step, specifically includes: extracting from the vocal part image first image information and second image information of the user's vocal part, and comparing the two to determine whether the vocal part has changed. Specifically, the first image information and the second image information are checked for image difference information; if image difference information exists, it is determined that the vocal part has changed and that there is sound information to be transmitted, and the method proceeds to the next step.
Specifically, the above processing splits the vocal part image into frames in time order to extract the first image information and the second image information of the user's vocal part. The two are acquired at different times, and the interval between them is set appropriately according to the movement frequency of the observed vocal part. For example, the first and second image information may come from adjacent video frames, or the first image information may come from the first frame and the second image information from the tenth frame. Widening the frame interval makes the difference between the first and second image information more pronounced.
Of course, many different schemes can be used to determine from the vocal part image whether there is sound information to be transmitted, for example training a dedicated recognition model with machine learning. In the present application, directly comparing the first and second image information is preferred because it uses relatively little computing resource: it is enough to find a difference in the vocal part between the two. The single comparison described above is only a simplified scheme. In practice, images can be captured at a fixed interval throughout a period of time, for example 10 to 30 seconds, adjacent captures compared, and the existence of sound information to be transmitted decided from the multiple pairs of first and second image information within that period. For example, if all comparison results show movement of the vocal part, it can be determined that the user is vocalizing continuously.
In making this determination, great care must be taken to avoid misjudgment. If only one pair of first and second image information shows a change, the user may have moved the vocal part incidentally and is not necessarily speaking. For an accurate determination, multiple pairs of first and second image information of the user's vocal part over a period of time must be compared to decide whether the vocal part is changing continuously. For example, the observation period can be set to 30 seconds with a 1-second interval between the first and second image information of each pair; if all 30 comparisons within the 30 seconds show a change between the first and second image information, it can be determined that the user is speaking, that is, there is sound information to be transmitted.
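The pairwise comparison over an observation window can be sketched as below. It assumes the mouth region has already been cropped from each sampled frame and is given as a 2D list of grayscale values from 0 to 255; the names `region_changed` and `is_speaking` and the thresholds `pixel_delta`, `changed_fraction`, and `required_ratio` are hypothetical and would need tuning on real video.

```python
def region_changed(first, second, pixel_delta=10, changed_fraction=0.05):
    """Compare two equal-sized mouth crops (2D lists of 0-255 values).

    Returns True when at least `changed_fraction` of the pixels differ
    by more than `pixel_delta`, i.e. image difference information exists.
    """
    total = 0
    changed = 0
    for row_a, row_b in zip(first, second):
        for a, b in zip(row_a, row_b):
            total += 1
            if abs(a - b) > pixel_delta:
                changed += 1
    return total > 0 and changed / total >= changed_fraction


def is_speaking(mouth_frames, required_ratio=1.0):
    """Decide whether the vocal part changes continuously.

    `mouth_frames` holds crops sampled at a fixed interval (for example
    31 crops taken 1 s apart give 30 comparisons over 30 s). With the
    default `required_ratio` every adjacent pair must show a change.
    """
    pairs = list(zip(mouth_frames, mouth_frames[1:]))
    if not pairs:
        return False
    hits = sum(region_changed(a, b) for a, b in pairs)
    return hits / len(pairs) >= required_ratio
```

Requiring every pair to change follows the 30-out-of-30 example in the text; a lower `required_ratio` would tolerate short pauses at the cost of more false positives.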
Determining from the volume information whether there is sound information to be transmitted, and if so proceeding to the next step, specifically requires: acquiring the sound signal captured by the sound pickup device; preprocessing the sound signal and extracting vocalization information; estimating the volume of the vocalization information to obtain vocalization volume information; and taking the vocalization volume information as the volume information picked up by the sound pickup device.
In this step, the volume information may contain not only the user's voice but also sound from other users or other devices. To determine accurately whether the sound is the current user's voice, the above step further includes obtaining voiceprint feature information from the vocalization information. Voiceprint feature information characterizes a standard specific to one kind of sound: for example, the voiceprint features of a sound made by the user correspond to a user standard, and the voiceprint features of a sound emitted by an electronic device correspond to an electronic device standard. After the voiceprint feature information is obtained, it is compared with a preset voiceprint feature standard. This comparison serves as the criterion for deciding whether there is sound information to be transmitted, and establishes that the current volume information was produced by the user. Next, the volume value in the vocalization volume information and the duration of the sound at that volume value are obtained. The volume value is compared with a preset volume threshold to determine whether it satisfies the threshold; if it does, it is further determined whether the sound duration satisfies a preset duration threshold; if so, it is determined that there is sound information to be transmitted.
For example, if the volume value in the vocalization volume information is 40 decibels and the preset volume threshold is 40 decibels, the volume value satisfies the threshold. If, in addition, the sound at that volume lasts 5 seconds and the preset duration threshold is 5 seconds, it is determined that there is sound information to be transmitted.
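The two-stage check (volume first, then duration) can be sketched as follows. It assumes the volume has already been estimated per frame and that a value equal to the threshold counts as satisfying it, as in the 40-decibel and 5-second example; the function name and the frame-based representation are hypothetical.

```python
def has_sound_to_transmit(levels_db, frame_seconds,
                          volume_threshold_db=40.0,
                          duration_threshold_s=5.0):
    """Return True if speech-level sound is sustained long enough.

    `levels_db` is a time-ordered list of per-frame volume estimates and
    `frame_seconds` is the length of each frame. The function looks for
    an unbroken run of frames at or above the volume threshold whose
    total length reaches the duration threshold.
    """
    run = 0
    for level in levels_db:
        # A frame below the threshold resets the run.
        run = run + 1 if level >= volume_threshold_db else 0
        if run * frame_seconds >= duration_threshold_s:
            return True
    return False
```

With 1-second frames, `has_sound_to_transmit([40] * 5, 1.0)` is true, while a single loud frame followed by quiet frames is not.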
The purpose of the above determination is likewise to avoid misjudgment. For example, a speaker who occasionally whispers should not be judged to be speaking, and this case can be ruled out by checking whether the volume exceeds the predetermined threshold. Or the speaker may cough occasionally, which produces a relatively loud sound but lasts only briefly. Volume and duration must therefore be combined: only when the volume exceeds a certain threshold and persists for a period of time can it be confirmed that the user is speaking in the meeting.
Step S204: determine whether the working mode of the sound pickup device is the mute mode; if so, proceed to the next step.
After it is determined that there is sound information to be transmitted, it is determined whether the working mode of the sound pickup device is the mute mode. Specifically, the state set for the sound pickup device in the relevant running application is checked; if it is set to microphone off, the working mode is the mute mode.
In other implementations, determining whether the working mode of the sound pickup device is the mute mode may also include the following steps. The sound pickup device obtains the audio region in which the sound information to be transmitted would be played in the audio-video working environment, and determines whether the sound information to be transmitted is played in that audio region. If it is, the sound information to be transmitted has been recorded in the audio-video working environment, and the working mode of the sound pickup device in the video conference environment is determined to be the non-mute mode. Conversely, if the sound information to be transmitted is not played in the audio region, it has not been recorded in the audio-video working environment, and the working mode of the sound pickup device in the video conference environment is determined to be the mute mode. Here, the audio region refers to the processing channel through which the smart device encodes, decodes, and transmits audio information.
Step S205: issue mute mode prompt information in a preset manner.
After it is determined that the working mode of the sound pickup device is the mute mode, the sound pickup device issues mute mode prompt information in a preset manner to tell the user that the device is currently muted, so that the user can cancel the mute mode promptly. In the first embodiment of the present application, the mute mode prompt information includes at least the following kinds: indicator light blinking information, speaker announcement information, sound pickup device vibration information, and mute icon information corresponding to the mute mode. The implementation of each kind is described below.
For indicator light blinking information: after it is determined that the working mode of the sound pickup device is the mute mode, the processor of the sound pickup device sends blinking information to the lighting device, and the lighting device controls a light mounted on the outside of the sound pickup device to blink accordingly. The frequency and intensity of the blinking follow the frequency and intensity signals contained in the blinking information. Further, if the sound pickup device receives no trigger operation on the mute icon within a preset time, the blinking frequency and intensity are increased to remind the user that the sound pickup device is currently in the mute mode.
For speaker announcement information: after it is determined that the working mode of the sound pickup device is the mute mode, the processor of the sound pickup device sends announcement information to the speaker. The announcement may be the phrase "mute mode" or some other voice message, and the speaker plays the voice message according to the announcement information. Further, if the sound pickup device receives no trigger operation on the mute icon within a preset time, the playback volume of the speaker is raised to remind the user that the sound pickup device is currently in the mute mode.
For sound pickup device vibration information: after it is determined that the working mode of the sound pickup device is the mute mode, the processor of the sound pickup device sends vibration information to the vibration device, and the vibration device drives its vibrating body so that the sound pickup device vibrates. The vibration frequency and intensity of the vibrating body follow the frequency and intensity signals contained in the vibration information. Further, if the sound pickup device receives no trigger operation on the mute icon within a preset time, the vibration frequency and intensity are increased to remind the user that the sound pickup device is currently in the mute mode.
For mute icon information corresponding to the mute mode: after it is determined that the working mode of the sound pickup device is the mute mode, the processor of the sound pickup device obtains the mute icon corresponding to the mute mode and sends it to the interactive interface of the sound pickup device for display. Further, if the sound pickup device receives no trigger operation on the mute icon within a preset time, the mute icon is enlarged and the enlarged icon is displayed at the center of the interactive interface to remind the user that the sound pickup device is currently in the mute mode.
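The escalation that the four prompt types share (stronger blinking, louder announcement, stronger vibration, larger centered icon once the preset time passes without a trigger operation) can be modelled as one state update. The `Prompt` fields, the normalized 0.0 to 1.0 ranges, and the scaling factor of 1.5 are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prompt:
    blink_hz: float         # indicator light blinking frequency
    speaker_volume: float   # announcement volume, 0.0 to 1.0
    vibration_level: float  # vibration intensity, 0.0 to 1.0
    icon_scale: float       # mute icon size relative to normal
    icon_centered: bool     # whether the icon sits at the screen center


def escalate(prompt, factor=1.5):
    """Return a stronger prompt for use after the preset time elapses
    without a trigger operation on the mute icon."""
    return Prompt(
        blink_hz=prompt.blink_hz * factor,
        speaker_volume=min(1.0, prompt.speaker_volume * factor),
        vibration_level=min(1.0, prompt.vibration_level * factor),
        icon_scale=prompt.icon_scale * factor,
        icon_centered=True,
    )
```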
In the first embodiment of the present application, after the mute mode prompt information is issued in the preset manner, the method further includes: obtaining a release trigger operation given in response to the mute mode prompt information, and canceling the mute mode according to that operation. Alternatively, if no release trigger operation is obtained within a preset time after the prompt is issued, a release signal for canceling the mute mode is generated automatically, and the mute mode is canceled according to that signal so that the device switches to the speaking mode. During the preset time, dynamic timing information is also presented on the interactive interface; it includes at least a numeric countdown and a fading light bar timer, and prompts the user to cancel the mute mode. Thus, in the first embodiment, the user can cancel the mute mode manually in response to the prompt provided by the sound pickup device, or the sound pickup device can cancel the mute mode automatically under its own control, so that the mute mode can be released in more than one way.
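The two release paths and the numeric countdown can be sketched as below. Times are plain seconds on an arbitrary clock, and `resolve_mute` and `countdown_values` are hypothetical helper names; a real implementation would be driven by UI events and timers rather than called with known timestamps.

```python
def resolve_mute(user_release_at, prompt_at, timeout_s):
    """Decide when and why the mute mode is canceled.

    `user_release_at` is the time of the user's release trigger
    operation, or None if there was none. Returns (time, reason) where
    reason is "manual" or "automatic".
    """
    deadline = prompt_at + timeout_s
    if user_release_at is not None and prompt_at <= user_release_at <= deadline:
        return user_release_at, "manual"
    # No operation within the preset time: a release signal is generated.
    return deadline, "automatic"


def countdown_values(timeout_s):
    """Whole-second values for a numeric countdown display."""
    return list(range(int(timeout_s), 0, -1))
```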
This embodiment may also include a preliminary step: determining whether the sound pickup device is in the mute mode and, if so, performing the steps of acquiring the user's vocal part image and acquiring the volume information picked up by the sound pickup device. In other words, it is first determined whether the sound pickup device is in the mute mode and therefore needs monitoring, and the relevant steps are then performed according to the mode.
The embodiment of the present application provides a method for prompting the audio working mode of a smart device, including: acquiring an image of the user's vocal part; acquiring the volume information picked up by the sound pickup device; determining, according to the vocal part image or according to the volume information, whether there is sound information to be transmitted, and if so proceeding to the next step; determining whether the working mode of the sound pickup device is the mute mode, and if so proceeding to the next step; and issuing mute mode prompt information in a preset manner. The embodiment uses the vocal part image, or monitoring of the environmental volume, to determine whether there is user sound information to be transmitted, and issues mute mode prompt information in a preset manner to remind the user that the sound pickup device in use is muted. The user can then cancel the mute mode promptly, which improves the fluency and efficiency of remote communication on smart terminals, such as conference calls.
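The decision flow of steps S203 to S205 reduces to a small predicate. The sketch below assumes the image-based and volume-based results have already been computed as booleans; the function name `should_prompt` is hypothetical.

```python
def should_prompt(mouth_moving, voice_detected, mic_muted):
    """Return True when a mute mode prompt should be issued.

    S203: either the vocal part image or the volume information may
    indicate sound to be transmitted. S204: the prompt is only relevant
    if the sound pickup device is in the mute mode. S205 (issuing the
    prompt) is left to the caller.
    """
    if not (mouth_moving or voice_detected):
        return False
    return mic_muted
```

Because the two cues are combined with "or", a device without a camera can still run the flow on volume information alone.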
Corresponding to the sound pickup device working mode prompting method provided by the first embodiment of the present application, the second embodiment provides an apparatus for prompting the audio working mode of a smart device. Because the apparatus embodiment is essentially similar to the first embodiment, it is described only briefly; for related details, refer to the corresponding description of the first embodiment. The apparatus embodiment described below is merely illustrative.
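Please refer to FIG. 3, which is a schematic diagram of an apparatus for prompting the audio working mode of a smart device provided by the second embodiment of the present application. The apparatus includes: an image acquisition unit 301, configured to acquire an image of the user's vocal part; a volume information acquisition unit 302, configured to acquire the volume information picked up by the sound pickup device; a sound information judgment unit 303, configured to determine, according to the vocal part image or according to the volume information, whether there is sound information to be transmitted, and if so proceed to the next step; a mute mode judgment unit 304, configured to determine whether the working mode of the sound pickup device is the mute mode, and if so proceed to the next step; and a mute mode prompt issuing unit 305, configured to issue mute mode prompt information in a preset manner.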
Optionally, the image acquisition unit 301 is specifically configured to obtain the vocal part image of the user captured by the camera device;
and to extract from the vocal part image first image information and second image information of the user's vocal part, the first image information and the second image information being acquired a predetermined time interval apart. Correspondingly, the sound information judgment unit 303 includes a first sound information judgment subunit, configured to compare multiple pairs of first and second image information of the user's vocal part within a specified period of time and determine whether the vocal part is changing continuously; if so, it determines that there is sound information to be transmitted and proceeds to the next step.
Optionally, the volume information acquisition unit 302 is specifically configured to acquire the sound signal captured by the sound pickup device;
preprocess the sound signal and extract vocalization information;
estimate the volume of the vocalization information to obtain vocalization volume information;
and take the vocalization volume information as the volume information picked up by the sound pickup device.
Optionally, the sound information judgment unit 303 includes a second sound information judgment subunit, configured to obtain the volume value in the vocalization volume information and the duration of the sound at that volume value;
and to compare the volume value with a preset volume threshold to determine whether the volume value satisfies the threshold; if it does, further determine whether the sound duration satisfies a preset duration threshold; if so, determine that there is sound information to be transmitted.
Optionally, the sound information judgment unit 303 further includes a judgment criterion unit, configured to obtain voiceprint feature information from the vocalization information;
and to compare the voiceprint feature information with a preset voiceprint feature standard to determine whether there is sound information to be transmitted.
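Optionally, the mute mode judgment unit 304 is specifically configured to obtain the state set for the sound pickup device in the relevant application; if it is set to microphone off, the working mode is the mute mode.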
Optionally, the mute mode prompt information includes at least one of the following: indicator light blinking information, speaker announcement information, sound pickup device vibration information, and mute icon information corresponding to the mute mode.
Optionally, the apparatus further includes a mute icon processing unit, configured to enlarge the mute icon if no trigger operation on the mute icon is obtained within a preset time, and to display the enlarged mute icon at the center of the interactive interface of the sound pickup device.
Optionally, the mute icon processing unit is further configured, after the mute mode prompt information is issued in the preset manner, to obtain a release trigger operation given in response to the mute mode prompt information and cancel the mute mode according to that operation; or,
if no release trigger operation in response to the mute mode prompt information is obtained within a preset time, to trigger generation of a release signal for canceling the mute mode and cancel the mute mode according to the release signal.
Optionally, the apparatus further includes a processing unit, configured to determine whether the sound pickup device is in the mute mode and, if so, acquire the user's vocal part image and acquire the volume information picked up by the sound pickup device.
The third embodiment of the present application also provides a method for prompting the audio working mode of a smart device. FIG. 4 is a flowchart of this method. The method includes the following steps:
Step S401: determine that the working mode of the sound pickup device is the speaking mode.
In this step, determining that the working mode of the sound pickup device is the speaking mode includes: checking the state set for the sound pickup device in the relevant running application; if it is set to microphone on, the working mode is the speaking mode, and the working mode of the sound pickup device is thereby determined to be the speaking mode.
Step S402: acquire an image of the user's vocal part.
In this step, the audio-video working environment includes at least a video conference environment and a video chat environment. The audio working environment includes at least a voice conference environment, a voice chat environment, and a telephone communication environment.
In the audio-video working environment, the camera device is switched on and normally captures the user's face, which naturally includes the vocal part image. The user's vocal part mainly means the mouth, and the vocal part image correspondingly means the mouth region of the facial video. After the camera device obtains the user's vocal part image, it sends the image to an image processor provided in the smart device, which processes the image further. In this embodiment, the image processor specifically refers to a software program or process that analyzes the vocal part image to determine whether there is sound to be transmitted; implementation in dedicated hardware is not excluded.
Step S403: acquire the volume information picked up by the sound pickup device.
In the audio working environment, the volume information picked up by the sound pickup device is acquired. A sound pickup device, also called a pickup head, generally consists of a microphone (commonly called a mic capsule) and an audio amplifier circuit. The microphone converts audio vibrations into electrical signals through electromagnetic induction and is generally a passive device, that is, it works without a power supply. The audio amplifier circuit shapes and amplifies the electrical signal produced by the microphone and passes it to subsequent components for encoding, decoding, and transmission. As noted above, muting the microphone really means that the encoding, decoding, and transmission stages downstream of the sound pickup device are stopped; the sound pickup device itself still collects volume information from the surrounding environment.
In this step, acquiring the volume information picked up by the sound pickup device specifically includes: acquiring the sound signal collected by the sound pickup device; preprocessing the sound signal and extracting the vocalization information; estimating the volume of the vocalization information to obtain vocalization volume information; and determining the vocalization volume information as the volume information picked up by the sound pickup device. The purpose of this process is to distinguish the target object's vocalization from other sounds in the environment. Many implementations are known in the art; in particular, when the target object is a natural person, separating that person's voice from other ambient sound is an achievable technique.
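The volume estimation described above can be illustrated with a minimal Python sketch. It is not the application's implementation: the function names `rms_dbfs` and `estimate_vocal_volume`, the normalized sample range of -1.0 to 1.0, and the simple noise-floor gate standing in for the unspecified "preprocessing" and "vocalization extraction" are all assumptions made for illustration. Levels are in dB relative to full scale; mapping them to the decibel figures used elsewhere in this description would require calibration of the actual microphone.

```python
import math
from typing import Optional, Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of normalized samples (-1.0..1.0) in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    # 10*log10(mean square) equals 20*log10(RMS).
    return 10.0 * math.log10(mean_square)


def estimate_vocal_volume(
    frames: Sequence[Sequence[float]],
    noise_floor_db: float = -50.0,  # hypothetical gate, not taken from the source
) -> Optional[float]:
    """Average level of the frames that rise above the noise floor.

    Returns None when no frame exceeds the gate, i.e. no vocalization
    candidate was found in the signal.
    """
    levels = [rms_dbfs(frame) for frame in frames]
    voiced = [level for level in levels if level > noise_floor_db]
    if not voiced:
        return None
    return sum(voiced) / len(voiced)
```

A real system would replace the noise-floor gate with a proper voice activity detector; the gate is used here only to keep the sketch self-contained.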
In this step, the volume information may include not only the user's voice but also sounds from other users or other devices. To determine accurately whether a sound is the current user's voice, the above step further includes obtaining voiceprint feature information from the vocalization information; voiceprint feature information characterizes a standard that is specific to a given kind of sound. For example, a sound made by the user has voiceprint features that correspond to a user standard, and a sound emitted by an electronic device has voiceprint features that correspond to an electronic-device standard. After the voiceprint feature information is obtained, it is compared with a preset voiceprint feature standard, which serves as the criterion for deciding whether there is sound information that needs to be transmitted, and this establishes that the current volume information was produced by a specific user. For example, a smartphone user can store his or her own voiceprint information on the phone in advance; in this step, the pre-stored voiceprint can then be used as the standard for judging whether the sound was made by that user.
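The comparison against a pre-stored voiceprint can be sketched as follows. The application does not say how voiceprint features are represented or compared, so this example assumes they are fixed-length numeric vectors compared by cosine similarity; the names `cosine_similarity` and `matches_enrolled_voiceprint` and the 0.8 threshold are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no voiceprint information
    return dot / (norm_a * norm_b)


def matches_enrolled_voiceprint(
    candidate: Sequence[float],
    enrolled: Sequence[float],
    threshold: float = 0.8,  # hypothetical acceptance threshold
) -> bool:
    """True when the candidate features are close enough to the stored standard."""
    return cosine_similarity(candidate, enrolled) >= threshold
```

Under these assumptions, only sound whose features match the enrolled user would go on to the volume and duration checks.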
Step S404: determine, according to the image of the pronunciation part or according to the volume information, whether there is sound information that needs to be transmitted; if not, proceed to the next step.
After the image of the pronunciation part and the volume information are obtained, whether there is sound information that needs to be transmitted can be determined according to the image of the pronunciation part or according to the volume information. In the third embodiment of the present application, the sound information that needs to be transmitted is the sound made by the current user; in other embodiments it may instead be the sound made by a target object, where target objects include electronic devices capable of making sound, animals, and so on. The two determinations, one based on the image of the pronunciation part and one based on the volume information, are described separately below.
Determining, according to the image of the pronunciation part, whether there is sound information that needs to be transmitted, and proceeding to the next step if not, specifically includes: corresponding to the above steps, extracting first image information and second image information of the user's pronunciation part from the image of the pronunciation part, and comparing the two to determine whether the pronunciation part has changed. Specifically, the first image information and the second image information are checked for image difference information; if none exists, it is determined that the pronunciation part has not changed and that there is no sound information that needs to be transmitted, and the method proceeds to the next step.
Specifically, the above processing splits the image of the pronunciation part into frames in chronological order in order to extract the first image information and the second image information of the user's pronunciation part. The first and second image information are acquired at different times, and the interval between the two acquisition times is set appropriately according to the movement frequency of the observed pronunciation part. For example, the first and second image information may correspond to adjacent video frames, or the first image information may correspond to the first frame and the second image information to the tenth frame. The frame interval between the two is increased so that the first and second image information differ clearly. Of course, many different schemes can be used to determine from the image of the pronunciation part whether there is sound information that needs to be transmitted, for example training a dedicated recognition model with machine learning. The present application preferably uses the direct comparison of first and second image information described above, which consumes relatively few computing resources: it is enough to find that the pronunciation part does not differ between the first image information and the second image information. The single comparison of first and second image information is, of course, only a simplified description. In practice, images can be captured at a fixed interval throughout a certain time period, for example 10 to 30 seconds, and each pair of adjacent captured images compared; whether there is sound information that needs to be transmitted is finally determined from the multiple comparisons of first and second image information within that period. For example, if every comparison shows no movement of the pronunciation part, it can be determined that the user is not speaking continuously.
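The direct comparison of first and second image information can be sketched as a simple frame difference. This is an illustrative assumption rather than the application's stated algorithm: the mouth region is taken to be already cropped into small grayscale images (rows of pixel values 0 to 255), the difference measure is the mean absolute pixel difference, and the names `mean_abs_difference` and `mouth_changed` and the threshold of 8.0 are hypothetical.

```python
from typing import Sequence

Image = Sequence[Sequence[int]]  # rows of grayscale pixel values, 0..255


def mean_abs_difference(first: Image, second: Image) -> float:
    """Mean absolute per-pixel difference between two equally sized images."""
    if len(first) != len(second) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(first, second)
    ):
        raise ValueError("images must have the same dimensions")
    total = 0
    count = 0
    for row_a, row_b in zip(first, second):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count if count else 0.0


def mouth_changed(first: Image, second: Image, threshold: float = 8.0) -> bool:
    """True when the two mouth-region images differ by more than the threshold.

    The threshold absorbs sensor noise and small lighting changes so that
    they are not mistaken for movement of the pronunciation part.
    """
    return mean_abs_difference(first, second) > threshold
```

A nonzero threshold is a design choice of this sketch: two captures of a motionless mouth are rarely pixel-identical, so treating any difference at all as "image difference information" would report movement constantly.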
It should be noted that, in making the above determination, great care must be taken to avoid misjudgment. For example, if only one pair of first and second image information shows a change, the user's pronunciation part may merely have moved incidentally, and the user is not necessarily speaking. For an accurate determination, multiple pairs of first and second image information of the user's pronunciation part over a period of time need to be compared to determine whether the pronunciation part changes continuously. For example, the observation period can be set to 30 seconds with an interval of 1 second between the first and second image information of each pair; if all 30 comparisons within the 30 seconds show a change between the first and second image information, it can be determined that the user is speaking, that is, there is sound information that needs to be transmitted.
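The multi-pair judgment over an observation window can be sketched as below. The two clear cases follow the text: every comparison shows a change (speaking) or no comparison shows a change (not speaking). The text does not say how a mixed window should be treated, so the `UNDETERMINED` outcome, like the names `MouthActivity` and `classify_window`, is an assumption of this sketch.

```python
from enum import Enum
from typing import Sequence


class MouthActivity(Enum):
    SPEAKING = "speaking"
    SILENT = "silent"
    UNDETERMINED = "undetermined"  # mixed window; handling is not specified in the source


def classify_window(pair_changed: Sequence[bool]) -> MouthActivity:
    """Classify one observation window from per-pair comparison results.

    pair_changed holds one boolean per (first image, second image) pair,
    True when that pair showed a change in the pronunciation part.
    """
    if not pair_changed:
        return MouthActivity.UNDETERMINED
    if all(pair_changed):
        return MouthActivity.SPEAKING
    if not any(pair_changed):
        return MouthActivity.SILENT
    return MouthActivity.UNDETERMINED
```

For the 30-second example with one comparison per second, `classify_window([True] * 30)` yields `MouthActivity.SPEAKING`, while a single incidental movement in an otherwise still window does not.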
Whether there is sound information that needs to be transmitted is determined according to the volume information; if not, the method proceeds to the next step. Specifically, the sound signal collected by the sound pickup device is acquired and preprocessed, the vocalization information is extracted, the volume of the vocalization information is estimated to obtain vocalization volume information, and the vocalization volume information is determined as the volume information picked up by the sound pickup device.
In this step, the volume information may include not only the user's voice but also sounds from other users or other devices. To determine accurately whether a sound is the current user's voice, the above step further includes obtaining voiceprint feature information from the vocalization information; voiceprint feature information characterizes a standard that is specific to a given kind of sound. For example, a sound made by the user has voiceprint features that correspond to a user standard, and a sound emitted by an electronic device has voiceprint features that correspond to an electronic-device standard. After the voiceprint feature information is obtained, it is compared with a preset voiceprint feature standard, which serves as the criterion for deciding whether there is sound information that needs to be transmitted, and this establishes that the current volume information was produced by the user. The volume value in the vocalization volume information and the duration of the sound at that volume value are then obtained, and the volume value is compared with a preset volume threshold to determine whether it satisfies the threshold; if it does not, it is further determined whether the sound duration satisfies a preset duration threshold; if that is also not satisfied, it is determined that there is no sound information that needs to be transmitted. For example, if the volume value in the vocalization volume information is 40 decibels and the preset volume threshold is 50 decibels, the volume value does not satisfy the preset volume threshold; if, in addition, the sound at that volume lasts 5 seconds and the preset duration threshold is 10 seconds, it is determined that there is no sound information that needs to be transmitted.
Conversely, the volume value in the vocalization volume information and the duration of the sound at that volume value are obtained, and the volume value is compared with the preset volume threshold to determine whether it satisfies the threshold; if it does, it is further determined whether the sound duration satisfies the preset duration threshold; if so, it is determined that there is sound information that needs to be transmitted. For example, if the volume value in the vocalization volume information is 40 decibels and the preset volume threshold is 40 decibels, the volume value satisfies the preset volume threshold; if, in addition, the sound at that volume lasts 5 seconds and the preset duration threshold is 5 seconds, it is determined that there is sound information that needs to be transmitted.
The purpose of the above determination is likewise to avoid misjudgment. For example, a speaker who occasionally whispers should not be judged to be speaking, and this case can be identified by checking whether the volume exceeds a predetermined threshold. Alternatively, the speaker may cough occasionally; this produces a relatively high ambient volume, but only for a very short time. Volume and duration must therefore be considered together: only when the volume exceeds a certain threshold and persists for a period of time can it be confirmed that the user is speaking in the conference.
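The combined volume and duration check can be written as a single predicate. This is a minimal sketch: the function name `has_sound_to_transmit` is hypothetical, and "satisfies the threshold" is read as "greater than or equal to", which is the reading implied by the example in which 40 decibels satisfies a 40-decibel threshold and 5 seconds satisfies a 5-second threshold.

```python
def has_sound_to_transmit(
    volume_db: float,
    duration_s: float,
    volume_threshold_db: float,
    duration_threshold_s: float,
) -> bool:
    """True only when the sound is both loud enough and sustained long enough.

    A quiet whisper fails the volume test and a short cough fails the
    duration test, so neither is treated as the user speaking.
    """
    loud_enough = volume_db >= volume_threshold_db
    long_enough = duration_s >= duration_threshold_s
    return loud_enough and long_enough
```

With the figures used in the description, `has_sound_to_transmit(40, 5, 50, 10)` is `False` and `has_sound_to_transmit(40, 5, 40, 5)` is `True`.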
Step S405: switch the speaking mode to the mute mode in a preset manner.
After it is determined that the user is not speaking in the conference, the sound pickup device switches the speaking mode to the mute mode in a preset manner. Specifically, a switching operation on the speaking mode is received, and the speaking mode is switched to the mute mode according to that operation. Alternatively, if no switching operation on the speaking mode is received within a preset time, generation of a switching signal is triggered, and the speaking mode is switched to the mute mode according to that signal. During the preset time, dynamic timing information is also presented on the interactive interface; it includes at least a numeric countdown and a dimming light-bar timer, and it prompts the user that the speaking mode currently needs to be switched to the mute mode. Thus, in the third embodiment of the present application, the user can switch the speaking mode to the mute mode manually, or the sound pickup device can perform the switch under its own control, so that switching of the speaking mode is supported in more than one way.
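The two switching paths, a manual operation or an automatic switch once the preset time runs out, can be sketched as a pure function of elapsed time. The names `PromptState` and `step_prompt`, the string mode labels, and the rounding of the countdown to whole seconds are assumptions of this sketch; rendering the countdown or the dimming light bar is left to the interactive interface.

```python
import math
from dataclasses import dataclass


@dataclass
class PromptState:
    mode: str         # "speaking" or "mute"
    countdown_s: int  # whole seconds to show on the interface; 0 once muted


def step_prompt(elapsed_s: float, preset_s: float, user_switched: bool) -> PromptState:
    """Return the mode and remaining countdown at a given moment.

    elapsed_s is the time since the device decided the user is not speaking;
    preset_s is the preset waiting time before the automatic switch.
    """
    if user_switched:
        # Manual path: the user's switching operation takes effect at once.
        return PromptState("mute", 0)
    remaining = preset_s - elapsed_s
    if remaining <= 0:
        # Automatic path: no operation arrived in time, so the switch is triggered.
        return PromptState("mute", 0)
    return PromptState("speaking", math.ceil(remaining))
```

Calling `step_prompt` once per interface refresh gives the countdown value to display and tells the caller when to apply the mute.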
The third embodiment of the present application provides a method for prompting the working mode of a sound pickup device, including: obtaining that the working mode of the sound pickup device is the speaking mode; acquiring an image of the user's pronunciation part; acquiring the volume information picked up by the sound pickup device; determining, according to the image of the pronunciation part or according to the volume information, whether there is sound information that needs to be transmitted, and if not, proceeding to the next step; and switching the speaking mode to the mute mode in a preset manner. This embodiment determines whether there is user sound information that needs to be transmitted either from the image of the pronunciation part or by monitoring the ambient volume, so that the user can switch to the corresponding mode in time, which improves the fluency and efficiency of remote communication on smart terminals, for example in teleconferences.
Corresponding to the method for prompting the working mode of a sound pickup device provided by the third embodiment of the present application, the fourth embodiment of the present application provides an apparatus for prompting the audio working mode of a smart device. Because the apparatus embodiment is substantially similar to the third embodiment, it is described only briefly; for related details, refer to the corresponding description of the third embodiment. The apparatus embodiment described below is merely illustrative.
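Please refer to FIG. 5, which is a schematic diagram of an apparatus for prompting the audio working mode of a smart device provided by the fourth embodiment of the present application. The apparatus includes: a speaking mode obtaining unit 501, configured to obtain that the working mode of the sound pickup device is the speaking mode; an image acquiring unit 502, configured to acquire an image of the user's pronunciation part; a volume information acquiring unit 503, configured to acquire the volume information picked up by the sound pickup device; a sound information determining unit 504, configured to determine, according to the image of the pronunciation part or according to the volume information, whether there is sound information that needs to be transmitted, and if not, to proceed to the next step; and a switching unit 505, configured to switch the speaking mode to the mute mode in a preset manner.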
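How units 501 to 505 fit together can be illustrated with a small sketch in which each unit is supplied as a callable. The class name `ModePromptDevice`, its field names, the `use_video` flag, and the string mode labels are hypothetical; acquisition and judgment are folded into one callable per modality purely to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModePromptDevice:
    get_mode: Callable[[], str]           # unit 501: "speaking" or "mute"
    mouth_is_moving: Callable[[], bool]   # unit 502 plus the image branch of unit 504
    voice_is_present: Callable[[], bool]  # unit 503 plus the volume branch of unit 504
    switch_to_mute: Callable[[], None]    # unit 505

    def run_once(self, use_video: bool) -> str:
        """Run one pass of the flow and return the resulting working mode."""
        mode = self.get_mode()
        if mode != "speaking":
            return mode  # nothing to do unless the microphone is open
        # Either the image or the volume information drives the decision.
        has_sound = self.mouth_is_moving() if use_video else self.voice_is_present()
        if not has_sound:
            self.switch_to_mute()
            return "mute"
        return "speaking"
```

Passing the units in as callables keeps the flow testable without a camera or microphone, which is the only reason for this structure.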
Corresponding to the methods for prompting the working mode of a sound pickup device in the first and third embodiments of the present application, the fifth embodiment of the present application further provides an electronic device. FIG. 6 is a schematic diagram of the electronic device provided in the fifth embodiment. The electronic device includes: a processor 601; and a memory 602 for storing a computer program which, when run by the processor, performs the methods for prompting the working mode of a sound pickup device of the first and third embodiments.
Corresponding to the methods for prompting the working mode of a sound pickup device in the first and third embodiments of the present application, the sixth embodiment of the present application further provides a computer storage medium storing a computer program which, when run by a processor, performs the methods for prompting the working mode of a sound pickup device of the first and third embodiments.
Although the present application is disclosed above by way of preferred embodiments, these are not intended to limit it. Any person skilled in the art can make possible changes and modifications without departing from the spirit and scope of the present application; the scope of protection of the present application shall therefore be as defined by its claims.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
Claims (14)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN202111486295.4A CN114401384A (en) | 2021-12-07 | 2021-12-07 | Intelligent device audio working mode prompting method and device | 
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN202111486295.4A CN114401384A (en) | 2021-12-07 | 2021-12-07 | Intelligent device audio working mode prompting method and device | 
Publications (1)
| Publication Number | Publication Date | 
|---|---|
| CN114401384A (en) | 2022-04-26 |
Family
ID=81226371
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111486295.4A Pending CN114401384A (en) | 2021-12-07 | 2021-12-07 | Intelligent device audio working mode prompting method and device | 
Country Status (1)
| Country | Link | 
|---|---|
| CN (1) | CN114401384A (en) | 
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US20030185371A1 (en) * | 2002-03-29 | 2003-10-02 | Dobler Steve R. | Mute status reminder for a communication device | 
| CN106791109A (en) * | 2016-12-23 | 2017-05-31 | 维沃移动通信有限公司 | A kind of sound prompting method and mobile terminal | 
| CN108111701A (en) * | 2016-11-24 | 2018-06-01 | 北京中创视讯科技有限公司 | Silence processing method and device | 
| CN111510662A (en) * | 2020-04-27 | 2020-08-07 | 深圳米唐科技有限公司 | Network call microphone state prompting method and system based on audio and video analysis | 
| CN111694479A (en) * | 2020-06-11 | 2020-09-22 | 北京百度网讯科技有限公司 | Mute processing method and device in teleconference, electronic device and storage medium | 
| CN111753769A (en) * | 2020-06-29 | 2020-10-09 | 歌尔科技有限公司 | Terminal audio collection control method, electronic device and readable storage medium | 
| CN112135213A (en) * | 2020-08-27 | 2020-12-25 | 深圳市妙严科技有限公司 | Information processing method based on audio wearable device | 
| CN113055774A (en) * | 2021-04-30 | 2021-06-29 | 珠海市特乐雅有限公司 | Wireless microphone, control method and storage medium | 
Similar Documents
| Publication | Publication Date | Title | 
|---|---|---|
| US11570223B2 (en) | Intelligent detection and automatic correction of erroneous audio settings in a video conference | |
| US10516788B2 (en) | Method and apparatus for adjusting volume of user terminal, and terminal | |
| CN105814913B (en) | Name sensitive listening device | |
| JP6489563B2 (en) | Volume control method, system, device and program | |
| US8878678B2 (en) | Method and apparatus for providing an intelligent mute status reminder for an active speaker in a conference | |
| KR101626438B1 (en) | Method, device, and system for audio data processing | |
| US9560208B2 (en) | System and method for providing intelligent and automatic mute notification | |
| US8036375B2 (en) | Automated near-end distortion detection for voice communication systems | |
| US20130211826A1 (en) | Audio Signals as Buffered Streams of Audio Signals and Metadata | |
| US11650790B2 (en) | Centrally controlling communication at a venue | |
| CN107995360A (en) | Call processing method and related products | |
| CN111753769B (en) | Terminal audio acquisition control method, electronic equipment and readable storage medium | |
| KR101559364B1 (en) | Mobile apparatus executing face to face interaction monitoring, method of monitoring face to face interaction using the same, interaction monitoring system including the same and interaction monitoring mobile application executed on the same | |
| CN111988704B (en) | Sound signal processing method, device and storage medium | |
| WO2019174492A1 (en) | Voice call data detection method, device, storage medium and mobile terminal | |
| CN108172237A (en) | Voice call data processing method, device, storage medium and mobile terminal | |
| CN114710730A (en) | Volume prompting method, device, earphone and storage medium | |
| CN114401384A (en) | Intelligent device audio working mode prompting method and device | |
| CN117496988A (en) | Audio playback methods, devices, equipment and storage media | |
| CN120544562A (en) | Microphone control based on speech direction | |
| CN110767229A (en) | Voiceprint-based audio output method, device, device and readable storage medium | |
| US20190333517A1 (en) | Transcription of communications | |
| HK40071597A (en) | Method and device for prompting audio working mode of intelligent equipment | |
| TW201336290A (en) | Communication device and method thereof | |
| US20230290356A1 (en) | Hearing aid for cognitive help using speaker recognition | 
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40071597; Country of ref document: HK |