KR102268245B1

KR102268245B1 - Device, method and server for providing voice recognition service

Info

Publication number: KR102268245B1
Application number: KR1020190077521A
Authority: KR
Inventors: 노재근; 박종세; 정대성
Original assignee: 주식회사 카카오엔터프라이즈; 주식회사 카카오
Priority date: 2019-06-28
Filing date: 2019-06-28
Publication date: 2021-06-23
Anticipated expiration: 2039-06-28
Also published as: KR20210001434A

Abstract

A terminal providing a voice recognition service according to an embodiment of the present invention includes four units. A sound input unit receives sound signals through a first microphone disposed on one side of the terminal and a second microphone disposed on the other side of the terminal. A sound pressure level measurement unit measures the sound pressure level of each of a first sound signal input through the first microphone and a second sound signal input through the second microphone. A voice command reception microphone selection unit selects, based on the measured sound pressure levels, one of the first and second microphones as the voice command reception microphone for receiving a voice command signal. A voice recognition performing unit performs voice recognition based on the voice command signal input through the selected voice command reception microphone.

Description

Terminal, method and server for providing voice recognition service {DEVICE, METHOD AND SERVER FOR PROVIDING VOICE RECOGNITION SERVICE}

The present invention relates to a terminal, a method and a server for providing a voice recognition service.

A voice recognition apparatus may start a voice recognition service based on a voice wake-up method. For example, when a voice command signal containing a wake-up word is input, the voice recognition apparatus prepares for voice recognition in response to the wake-up word and then provides the voice recognition service according to the voice command signal input through a microphone.

However, if noise enters the microphone of the voice recognition apparatus together with the wake-up word (or voice command), the noise makes the wake-up word (or voice command) difficult to recognize. For example, if the apparatus receives a voice command signal containing the wake-up word while a sound source is being played through its own speaker, or while its surroundings are noisy, it is likely to fail to recognize the wake-up word.

To address this problem, two kinds of techniques are used. An echo canceller, such as an acoustic echo canceller (AEC), removes echo, which is the apparatus's own playback re-entering its microphone. A noise suppression (NS) function reduces noise.

However, a voice recognition apparatus by nature has both a microphone and a speaker. In some devices, such as smartphones, the microphone and the speaker are placed close together. In such cases the sound output from the speaker acts as very loud noise, so the techniques above are not sufficient to solve the problem.

For example, suppose the microphone and speaker of the voice recognition apparatus are adjacent and a sound source is being output through the speaker. If the user utters the wake-up word, the sound source and the wake-up word enter the microphone together, so the apparatus often fails to respond to the wake-up word.

Japanese Registered Patent Publication No. 5862318 (registered January 8, 2016)

A voice recognition apparatus generally has a plurality of microphones to account for the directionality of spoken voice. The present invention addresses the problem above in such an apparatus. When the user's voice command signal is received, the invention selects the microphone with the lower sound pressure level as the voice command reception microphone and performs voice recognition with it.

However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a technical means of solving the above problem, a terminal providing a voice recognition service includes the following units:
- a sound input unit that receives sound signals through each of a first microphone disposed on one side of the terminal and a second microphone disposed on the other side of the terminal;
- a sound pressure level measurement unit that measures the sound pressure level of each of a first sound signal input through the first microphone and a second sound signal input through the second microphone;
- a voice command reception microphone selection unit that selects, based on the measured sound pressure levels, one of the first microphone and the second microphone as the voice command reception microphone for receiving a voice command signal; and
- a voice recognition performing unit that performs voice recognition based on the voice command signal input through the selected voice command reception microphone.

A method of providing a voice recognition service according to another embodiment of the present invention includes the following steps:
- receiving sound signals through each of a first microphone disposed on one side of a terminal and a second microphone disposed on the other side of the terminal;
- measuring the sound pressure level of each of a first sound signal input through the first microphone and a second sound signal input through the second microphone;
- selecting, based on the measured sound pressure levels, one of the first microphone and the second microphone as the voice command reception microphone for receiving a voice command signal; and
- performing voice recognition based on the voice command signal input through the selected voice command reception microphone.
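The four method steps can be illustrated with a minimal Python sketch. It is a hypothetical illustration, not the patented implementation. The level is assumed to be an RMS value in dB relative to full scale, and `recognize` is a placeholder for an unspecified recognition engine. Step 1, receiving the signals, is represented by the two sample sequences passed in.

```python
import math
from typing import Callable, Sequence, Tuple


def sound_pressure_level_db(samples: Sequence[float]) -> float:
    """Relative level of a signal in dB (re full scale), from its RMS value."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Guard against log10(0) for a perfectly silent signal.
    return 20.0 * math.log10(max(rms, 1e-12))


def provide_voice_recognition(
    first_signal: Sequence[float],
    second_signal: Sequence[float],
    recognize: Callable[[Sequence[float]], str],  # hypothetical recognizer
) -> Tuple[int, str]:
    # Step 2: measure the level of each microphone's signal.
    first_level = sound_pressure_level_db(first_signal)
    second_level = sound_pressure_level_db(second_signal)
    # Step 3: choose the microphone whose signal has the lower level.
    selected = 1 if first_level <= second_level else 2
    # Step 4: run recognition only on the selected microphone's signal.
    command_signal = first_signal if selected == 1 else second_signal
    return selected, recognize(command_signal)
```

For example, `provide_voice_recognition([0.01, -0.01] * 800, [0.5, -0.5] * 800, lambda sig: "ok")` selects microphone 1, because its signal is much quieter.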

A server providing a voice recognition service according to yet another embodiment of the present invention includes the following units:
- a sound receiving unit that receives, from a terminal, the sound signals input through each of a first microphone and a second microphone of the terminal;
- a sound pressure level measurement unit that measures the sound pressure level of each of a first sound signal input through the first microphone and a second sound signal input through the second microphone;
- a voice command reception microphone selection unit that selects, based on the measured sound pressure levels, one of the first microphone and the second microphone as the voice command reception microphone for receiving a voice command signal; and
- a voice recognition performing unit that performs voice recognition based on the voice command signal input through the selected voice command reception microphone.

According to any one of the means described above, the present invention selects the voice command reception microphone from among a plurality of microphones. The selection is based on the sound pressure level of the sound signal input through each microphone. The invention then performs voice recognition on the voice command signal input through the selected microphone, so the voice recognition apparatus can respond readily to the wake-up word.

In addition, the present invention improves the voice recognition performance of the voice recognition apparatus, so a stable voice recognition service can be provided even in noisy environments.

FIG. 1 is a configuration diagram of a system for providing a voice recognition service according to an embodiment of the present invention.
FIG. 2 is a block diagram of the voice recognition service providing terminal shown in FIG. 1, according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining a method of selecting the voice command reception microphone for receiving a voice command signal, according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method of providing a voice recognition service according to an embodiment of the present invention.
FIG. 5 is a block diagram of the voice recognition server shown in FIG. 1, according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings, so that those of ordinary skill in the art can easily implement them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts irrelevant to the description are omitted from the drawings. Similar reference numerals denote similar parts throughout the specification.

Throughout the specification, a part that is "connected" to another part may be "directly connected" to it. It may also be "electrically connected" to it with another element interposed between them. In addition, when a part "includes" a component, it may further include other components unless specifically stated otherwise.

In this specification, a "unit" may be realized by hardware, by software, or by both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware. A "unit" is not limited to software or hardware. It may be configured to reside in an addressable storage medium, or configured to execute on one or more processors. Thus, as an example, a "unit" includes the following:
- components such as software components, object-oriented software components, class components and task components;
- processes, functions, attributes, procedures and subroutines;
- segments of program code, drivers, firmware, microcode and circuitry;
- data, databases, data structures, tables, arrays and variables.
The functionality provided by components and "units" may be combined into fewer components and "units" or separated into additional components and "units". In addition, components and "units" may be implemented to execute on one or more CPUs in a device.

In the present invention, a sound signal is a signal input to a microphone of the voice recognition service providing terminal 100. It may be a signal of ambient noise only (hereinafter, an ambient noise signal), or a signal of sound containing both ambient noise and a voice command.

Also, in the present invention, a voice command signal is the signal of a voice command that a user speaks in order to use the voice recognition service. It may include two kinds of signal:
- a signal of an activation command that activates the voice recognition service (hereinafter, an activation command signal); and
- a signal of a control command for using the activated voice recognition service (hereinafter, a control command signal).

FIG. 1 is a configuration diagram of a system for providing a voice recognition service according to an embodiment of the present invention.

Referring to FIG. 1, the voice recognition service providing system may include a voice recognition service providing terminal 100 and a voice recognition server 110. The system of FIG. 1 is only one embodiment of the present invention. The present invention is therefore not limited by FIG. 1, and the system may be configured differently according to various embodiments.

In general, the components of the voice recognition service providing system of FIG. 1 are connected through a network (not shown). A network is a connection structure that enables nodes such as terminals and servers to exchange information. It includes a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, wired and wireless television networks, and the like. Examples of wireless data communication networks include 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), WiMAX (Worldwide Interoperability for Microwave Access), Wi-Fi, Bluetooth, infrared communication, ultrasonic communication, visible light communication (VLC) and Li-Fi, but are not limited to these.

The voice recognition service providing terminal 100 may be any terminal that provides a voice recognition service. Examples include a voice recognition speaker, a smartphone and an in-vehicle speaker.

The voice recognition service providing terminal 100 may receive speech from a user and perform voice recognition on it. Specifically, the terminal 100 may perform voice recognition after detecting an activation command signal (wake-up word, WUW). The activation command signal is a trigger voice command that activates the voice recognition function.

For example, the voice recognition service providing terminal 100 may have a first microphone 10 installed on one side and a second microphone 20 installed on the other side. A speaker 30 may be installed close to one of the microphones.

The voice recognition service providing terminal 100 may also have additional microphones (not shown) at positions other than those of the first microphone 10 and the second microphone 20.

For example, consider a structure in which the second microphone 20 and the speaker 30 are adjacent. Suppose the user utters a voice command containing an activation command signal while a sound source is being output through the speaker 30. The output of the speaker 30 then enters the first microphone 10 and the second microphone 20 at the same time as the voice command, which makes voice recognition difficult.

To solve this problem, the voice recognition service providing terminal 100 may receive a sound signal through each of the first microphone 10 and the second microphone 20. It then measures the sound pressure level of each signal. The signal input through the first microphone 10 is hereinafter called the first sound signal, and the signal input through the second microphone 20 is called the second sound signal. The sound pressure level is a physical quantity that represents the magnitude of the sound pressure, from which the intensity of the sound can be determined. It is expressed in decibels (dB).
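The standard definition of sound pressure level in air is L_p = 20 log10(p_rms / p_0) dB, where p_0 = 20 µPa. The sketch below assumes calibrated samples in pascals. With raw, uncalibrated microphone samples, the same formula yields only a relative level such as dBFS. A relative level is presumably enough here, because the terminal only needs to know which of the two levels is lower.

```python
import math
from typing import Sequence

REFERENCE_PRESSURE_PA = 20e-6  # 20 µPa, the standard reference for SPL in air


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square value of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def spl_db(pressure_pa: Sequence[float]) -> float:
    """Sound pressure level in dB SPL, assuming calibrated samples in pascals."""
    return 20.0 * math.log10(rms(pressure_pa) / REFERENCE_PRESSURE_PA)
```

A steady signal of 1 Pa RMS corresponds to roughly 94 dB SPL, and doubling the pressure adds about 6 dB.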

According to an embodiment of the present invention, when sound signals are input to the first microphone 10 and the second microphone 20, the voice recognition service providing terminal 100 may transmit the first sound signal and the second sound signal to the voice recognition server 110.

In this case, the voice recognition server 110 may measure the sound pressure levels of the two signals received from the terminal 100: the first sound signal, which corresponds to the first microphone 10, and the second sound signal, which corresponds to the second microphone 20.

Based on the measured sound pressure levels, the voice recognition service providing terminal 100 may select one of the first microphone 10 and the second microphone 20 as the voice command reception microphone for receiving a voice command signal.

According to an embodiment of the present invention, the voice recognition server 110 may also select one of the first microphone 10 and the second microphone 20 as the voice command reception microphone, based on the sound pressure levels of the first and second sound signals.

For example, when the terminal 100 measures the sound pressure levels, it transmits the levels of the first and second sound signals to the voice recognition server 110. The server 110 may then select the voice command reception microphone based on those levels.

Alternatively, the voice recognition server 110 may measure the sound pressure levels of the first and second sound signals itself. In that case the voice command reception microphone may be selected based on the levels measured by the server 110.
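The two terminal/server splits described above can be handled by a single server-side selection routine. The sketch is hypothetical. The message layout (`SelectionRequest`) and the function names are assumptions for illustration, since the document does not specify a protocol. The request carries either the levels measured by the terminal or the raw signals, and the server measures the levels itself only in the latter case.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SelectionRequest:
    """Hypothetical message from terminal 100 to server 110."""
    first_level_db: Optional[float] = None   # set when the terminal measured levels
    second_level_db: Optional[float] = None
    first_signal: Optional[Sequence[float]] = None   # set when the server must measure
    second_signal: Optional[Sequence[float]] = None


def level_db(samples: Sequence[float]) -> float:
    """Relative level in dB from the RMS value of the samples."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))


def server_select_microphone(req: SelectionRequest) -> int:
    """Return 1 or 2: the microphone whose signal has the lower level."""
    if req.first_level_db is not None and req.second_level_db is not None:
        first, second = req.first_level_db, req.second_level_db
    elif req.first_signal is not None and req.second_signal is not None:
        first, second = level_db(req.first_signal), level_db(req.second_signal)
    else:
        raise ValueError("request carries neither levels nor signals")
    return 1 if first <= second else 2
```

The server would then send the selected microphone back to the terminal as the "information on the voice command reception microphone".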

The voice recognition service providing terminal 100 may perform voice recognition in cooperation with the voice recognition server 110. Recognition is based on the voice command signal input through the selected voice command reception microphone.

When the voice recognition server 110 selects the voice command reception microphone, the terminal 100 receives information identifying that microphone from the server 110. The terminal may then perform voice recognition on the voice command signal input through that microphone.

For example, the voice recognition service providing terminal 100 may activate the voice recognition service when the user's activation command signal arrives through the selected voice command reception microphone, which is one of the first microphone 10 and the second microphone 20. At this point the voice recognition performing unit 230 may transmit the activation command signal to the voice recognition server 110. The server returns voice recognition information corresponding to the activation command signal, and the unit prepares for voice recognition based on that information.
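The activation path can be pictured as a two-state machine, sketched below. The wake-word detector (`is_wake_word`) and the server call (`send_to_server`) are hypothetical callables, because the document names neither. The states only mirror the idle and activated phases described in the text.

```python
from enum import Enum, auto
from typing import Callable, Dict, Sequence


class ServiceState(Enum):
    IDLE = auto()   # listening only for the activation command (wake-up word)
    READY = auto()  # activated; later input is treated as control commands


class VoiceRecognitionPerformer:
    """Sketch of the activation path of unit 230; all callables are hypothetical."""

    def __init__(
        self,
        is_wake_word: Callable[[Sequence[float]], bool],
        send_to_server: Callable[[Sequence[float]], Dict],
    ) -> None:
        self.is_wake_word = is_wake_word
        self.send_to_server = send_to_server
        self.state = ServiceState.IDLE
        self.recognition_info: Dict = {}

    def on_selected_mic_input(self, signal: Sequence[float]) -> ServiceState:
        """Feed audio from the selected microphone only."""
        if self.state is ServiceState.IDLE and self.is_wake_word(signal):
            # Forward the activation command to the server, then prepare
            # recognition using the information it returns.
            self.recognition_info = self.send_to_server(signal)
            self.state = ServiceState.READY
        return self.state
```

Only the selected microphone's audio is passed to `on_selected_mic_input`. This keeps the louder, speaker-contaminated channel out of wake-word detection.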

As a result, the present invention can detect the activation command signal easily even in noisy environments, such as while a sound source is being output through the speaker 30, and can therefore provide a more stable voice recognition service.

The operation of each component of the voice recognition service providing system of FIG. 1 is described in more detail below.

FIG. 2 is a block diagram of the voice recognition service providing terminal 100 shown in FIG. 1, according to an embodiment of the present invention.

Referring to FIG. 2, the voice recognition service providing terminal 100 according to an embodiment of the present invention may include the following units:
- a sound input unit 200;
- a sound pressure level measurement unit 210;
- a voice command reception microphone selection unit 220;
- a voice recognition performing unit 230; and
- a preprocessing unit 240.
However, the configuration of the terminal 100 according to an embodiment of the present invention may differ from that shown in FIG. 2.

The sound input unit 200 may receive sound signals through two microphones: the first microphone 10, disposed on one side of the voice recognition service providing terminal 100, and the second microphone 20, disposed on the other side. The first microphone 10 may be the main microphone and the second microphone 20 a sub-microphone. The first microphone 10 and the second microphone 20 may be, for example, stereo microphones that use audio channels.
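If the two microphones are delivered as the left and right channels of one stereo stream, the sound input unit must separate them before any per-microphone measurement. The sketch below is one way to do that. It assumes interleaved 16-bit little-endian PCM, with the first microphone on channel 0 and the second on channel 1. The document specifies neither the sample format nor the channel mapping.

```python
import struct
from typing import List, Tuple


def split_stereo_pcm16(frame_bytes: bytes) -> Tuple[List[float], List[float]]:
    """Split interleaved 16-bit little-endian stereo PCM into two channels.

    Assumption: channel 0 carries the first microphone and channel 1 the
    second. Samples are scaled to the range [-1.0, 1.0).
    """
    if len(frame_bytes) % 4:
        raise ValueError("stereo PCM16 data must be a multiple of 4 bytes")
    count = len(frame_bytes) // 2
    samples = struct.unpack(f"<{count}h", frame_bytes)
    first = [s / 32768.0 for s in samples[0::2]]   # even positions: channel 0
    second = [s / 32768.0 for s in samples[1::2]]  # odd positions: channel 1
    return first, second
```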

For example, when the user of the terminal 100 utters a voice command, the sound input unit 200 may receive a sound signal through each of the first microphone 10 and the second microphone 20. Each signal contains the user's voice command signal and an ambient noise signal. The ambient noise covers sounds unrelated to the voice command, such as a sound source being output from the speaker 30 or the airflow of an air conditioner.

The sound pressure level measurement unit 210 may measure the sound pressure level of the first sound signal, input through the first microphone 10, and of the second sound signal, input through the second microphone 20.

The sound pressure level measurement unit 210 may measure the sound pressure levels of the first and second sound signals once every first unit time. The first unit time may vary with, for example, the specifications of the terminal 100 or the voice recognition algorithm. It may be set, for example, within the range of 80 ms to 120 ms, and is preferably 100 ms.
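Measuring once per first unit time amounts to cutting both signals into frames of that duration and computing one level pair per frame. The sketch uses the preferred 100 ms. The 16 kHz sampling rate is an assumption, because the document gives none; at that rate one frame is 1600 samples. A trailing partial frame is simply ignored in this sketch.

```python
import math
from typing import List, Sequence, Tuple

SAMPLE_RATE_HZ = 16_000      # assumption: a common sampling rate for speech input
DEFAULT_UNIT_TIME_MS = 100   # the preferred first unit time given in the text


def level_db(frame: Sequence[float]) -> float:
    """Relative level of one frame in dB, from its RMS value."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-12))


def frame_levels(
    first: Sequence[float],
    second: Sequence[float],
    unit_time_ms: int = DEFAULT_UNIT_TIME_MS,
    sample_rate_hz: int = SAMPLE_RATE_HZ,
) -> List[Tuple[float, float]]:
    """Return (first_db, second_db) for every complete unit-time frame."""
    frame_len = sample_rate_hz * unit_time_ms // 1000  # 1600 samples for 100 ms at 16 kHz
    usable = min(len(first), len(second)) // frame_len * frame_len
    return [
        (level_db(first[i:i + frame_len]), level_db(second[i:i + frame_len]))
        for i in range(0, usable, frame_len)
    ]
```

Because the unit time is an ordinary parameter, a preset value can be replaced later, for example by a user setting.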

For example, a terminal 100 with higher voice recognition specifications may already achieve a high recognition rate, so its first unit time may be lengthened. A terminal with lower specifications needs its recognition rate raised, so its first unit time may be shortened. The first unit time may be preset and may later be changed by the user.

The voice command reception microphone selection unit 220 may select the voice command reception microphone by comparing the sound pressure levels of the first and second sound signals measured in each first unit time. The selection unit 220 may store information on the microphone selected in each first unit time in a buffer of the terminal 100.

For example, in each first unit time the microphone selection unit 220 may compare the sound pressure level of the first sound signal 301 from the first microphone 10 with that of the second sound signal 303 from the second microphone 20, select as the voice command receiving microphone the microphone whose signal has the lower sound pressure level, and then store information on the selected microphone sequentially in the buffer.
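A minimal sketch of this per-unit selection and buffering, assuming levels arrive once per 100 ms unit. The class name, the integer microphone identifiers, and the buffer length (6000 units, which equals 10 minutes at 100 ms per unit) are illustrative assumptions rather than details from the text. Exact ties go to the second microphone here; the text handles near-equal levels separately.

```python
from collections import deque

MIC1, MIC2 = 1, 2  # hypothetical identifiers for the first and second microphones


class MicSelector:
    """Select the lower-level microphone each unit time and buffer the choices."""

    def __init__(self, buffer_size: int = 6000):  # 6000 x 100 ms = 10 minutes (assumed)
        self.history: deque = deque(maxlen=buffer_size)  # oldest entries drop when full

    def select(self, level1_db: float, level2_db: float) -> int:
        # Rule from the text: the microphone whose signal has the lower level is selected.
        mic = MIC1 if level1_db < level2_db else MIC2
        self.history.append(mic)  # stored sequentially, as the text describes
        return mic
```

The bounded `deque` keeps only the most recent selections, which is what the history-based rules that follow need.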

The microphone selection unit 220 may use the microphone information stored in the buffer when selecting the voice command receiving microphone. For example, as described below, when the sound pressure level of the first sound signal input through the first microphone 10 and that of the second sound signal input through the second microphone 20 are equal or similar, the microphone selection unit 220 may select the voice command receiving microphone using the microphone information stored in the buffer.

As also described below, the microphone selection unit 220 may use the stored microphone information both when selecting the voice command receiving microphone and when selecting the microphone that is to receive an additional voice command signal. This is described in detail later.

Based on the measured sound pressure levels, the microphone selection unit 220 may select one of the first microphone 10 and the second microphone 20 as the voice command receiving microphone, that is, the microphone that receives the voice command signal.

For example, the microphone selection unit 220 may compare the sound pressure level of the first sound signal input through the first microphone 10 with that of the second sound signal input through the second microphone 20, and select as the voice command receiving microphone the microphone whose sound signal has the lower sound pressure level.

According to an embodiment of the present invention, the microphone selection unit 220 may apply different weights to the sound pressure levels of the first microphone 10 and the second microphone 20. The weights are based on the number of times each microphone was selected as the voice command receiving microphone during a preset past period (for example, from 10 minutes ago to the present).

That is, the microphone selection unit 220 may apply a first weight to the sound pressure level of the sound signal input through the microphone that was selected relatively more often during the preset past period. It may apply a second weight, lower than the first weight, to the sound pressure level of the sound signal input through the other microphone. It may then select as the voice command receiving microphone the microphone whose weighted sound pressure level is lower.
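The history-weighted rule can be sketched as below. The text does not say how a weight is combined with a level, so this sketch assumes multiplication. The weight values 1.2 and 1.0 and the function name are hypothetical, and `history` stands for the buffered selections from the preset past period.

```python
from collections import Counter
from typing import Iterable

MIC1, MIC2 = 1, 2  # hypothetical identifiers


def select_with_history_weights(level1: float, level2: float,
                                history: Iterable[int],
                                w_first: float = 1.2,
                                w_second: float = 1.0) -> int:
    """Weight each level by past selection frequency, then pick the lower weighted level.

    Multiplying weight by level is an assumption; the weights are illustrative.
    """
    counts = Counter(history)
    if counts[MIC1] >= counts[MIC2]:  # ties treat MIC1 as the more-selected mic (arbitrary)
        weighted1, weighted2 = level1 * w_first, level2 * w_second
    else:
        weighted1, weighted2 = level1 * w_second, level2 * w_first
    return MIC1 if weighted1 < weighted2 else MIC2
```

Because the combination is multiplicative, the effect depends on the level scale. With negative dBFS values, a weight above 1 makes a level more negative and therefore favors the frequently selected microphone. With positive dB SPL values, the same weight would push the other way.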

As another example, the microphone selection unit 220 may apply different weights to the sound pressure levels of the first microphone 10 and the second microphone 20 based on the utterance position of the user's voice command, that is, the user's position relative to the voice recognition service providing terminal 100. To this end, the sound pressure level measuring unit 210 may measure the sound pressure level of the voice command signal input to each of the first microphone 10 and the second microphone 20, and use these levels to track the utterance position of the user's voice command.

That is, the microphone selection unit 220 may apply the first weight to the sound pressure level of the sound signal input through the microphone closer to the utterance position. The higher a microphone's sound pressure level, the closer it is to the user. It may apply the second weight, lower than the first, to the sound pressure level of the sound signal input through the microphone farther from the utterance position, since a lower sound pressure level indicates a microphone farther from the user. It may then select as the voice command receiving microphone the microphone whose weighted sound pressure level is lower.
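A sketch of the position-based variant, assuming that the microphone with the higher voice command level is treated as the one nearer the talker, as the text states. As before, multiplying weights by levels and the specific weight values are assumptions, and the function names are hypothetical.

```python
MIC1, MIC2 = 1, 2  # hypothetical identifiers


def nearer_mic(cmd_level1: float, cmd_level2: float) -> int:
    """Estimate the microphone nearer the talker: the higher command level is taken as nearer."""
    return MIC1 if cmd_level1 >= cmd_level2 else MIC2


def select_with_position_weights(level1: float, level2: float,
                                 cmd_level1: float, cmd_level2: float,
                                 w_first: float = 1.2, w_second: float = 1.0) -> int:
    """Give the nearer mic the first (higher) weight, then pick the lower weighted level."""
    near = nearer_mic(cmd_level1, cmd_level2)
    w1, w2 = (w_first, w_second) if near == MIC1 else (w_second, w_first)
    return MIC1 if level1 * w1 < level2 * w2 else MIC2
```

The same shape fits the tilt-based variant described next. Only the way of deciding which microphone is "near" changes.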

As yet another example, the microphone selection unit 220 may apply different weights to the sound pressure levels of the first microphone 10 and the second microphone 20 based on tilt information of the voice recognition service providing terminal 100. The tilt information is derived from sensing data from at least one sensor provided in the terminal 100.

That is, taking the tilt information of the terminal 100 into account, the microphone selection unit 220 may apply the first weight to the sound pressure level of the sound signal input through the microphone closer to the utterance position of the user's voice command. It may apply the second weight, lower than the first, to the sound pressure level of the sound signal input through the microphone farther from the utterance position. It may then select as the voice command receiving microphone the microphone whose weighted sound pressure level is lower.

With this configuration, the microphone selection unit 220 selects, as the voice command receiving microphone, the microphone that can actually pick up the user's voice command most easily, which further improves voice recognition performance.

According to another embodiment of the present invention, when the sound pressure levels of the first and second sound signals are equal, or differ by no more than a preset value, the microphone selection unit 220 may select the voice command receiving microphone based on at least one other piece of information.

For example, the microphone selection unit 220 may select the voice command receiving microphone based on the number of times the first microphone 10 and the second microphone 20 were each selected as the voice command receiving microphone during a preset past period (for example, from 10 minutes ago to the present). That is, it may select whichever microphone was selected more often during that period.
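The tie-breaking rule can be sketched as follows. The 1 dB margin standing in for the "preset value" is a hypothetical choice, and so is resolving an equal-count history in favor of the first microphone.

```python
from collections import Counter
from typing import Iterable

MIC1, MIC2 = 1, 2  # hypothetical identifiers


def select_with_tiebreak(level1: float, level2: float, history: Iterable[int],
                         tie_margin_db: float = 1.0) -> int:
    """Lower level wins unless the levels are within tie_margin_db of each other.

    Near-ties fall back to whichever mic was selected more often in the past window.
    """
    if abs(level1 - level2) > tie_margin_db:
        return MIC1 if level1 < level2 else MIC2  # normal rule: lower level wins
    counts = Counter(history)
    return MIC1 if counts[MIC1] >= counts[MIC2] else MIC2  # equal counts -> MIC1 (assumed)
```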

As another example, the microphone selection unit 220 may select the voice command receiving microphone based on the utterance position of the user's voice command, that is, the user's position relative to the voice recognition service providing terminal 100.

That is, based on the tracked utterance position of the user's voice command, the microphone selection unit 220 may select the corresponding microphone, namely the microphone located on the side of the utterance position, as the voice command receiving microphone.

As yet another example, the microphone selection unit 220 may select the voice command receiving microphone based on the placement of the voice recognition service providing terminal 100 and the utterance position of the user's voice command.

To this end, the microphone selection unit 220 may extract tilt information of the terminal 100 from sensing data from at least one sensor provided in the terminal 100. For example, the direction and magnitude of the terminal's tilt may be derived from the angular velocity measured by a gyro sensor.
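One way to turn gyro angular velocity into a tilt direction and value is to integrate it over time, as sketched below for a single axis. The sign convention (positive means tipped toward the first microphone), the direction labels, and the function name are hypothetical. Pure integration accumulates gyro drift, so practical devices usually fuse it with accelerometer data, for example with a complementary filter.

```python
from typing import Sequence, Tuple


def tilt_from_gyro(angular_velocity_dps: Sequence[float], dt_s: float,
                   initial_angle_deg: float = 0.0) -> Tuple[str, float]:
    """Integrate one-axis angular velocity (deg/s) into a tilt angle (deg).

    Sign convention is hypothetical: positive = tipped toward microphone 1.
    """
    angle = initial_angle_deg
    for omega in angular_velocity_dps:
        angle += omega * dt_s  # rectangular (Euler) integration step
    if angle > 0:
        direction = "toward_mic1"
    elif angle < 0:
        direction = "toward_mic2"
    else:
        direction = "level"
    return direction, angle
```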

Using this tilt information, the microphone selection unit 220 may then select the microphone corresponding to the utterance position of the user's voice command as the voice command receiving microphone.

Thus, according to the present invention, the microphone best suited to voice recognition can be selected even when the sound pressure levels of the first and second sound signals are equal or similar.

Referring to FIG. 3, reference numeral 305 shows the microphone selected as the voice command receiving microphone in each 20 ms unit time, and reference numeral 307 shows the microphone selected in each 100 ms unit time.

Comparing 305 and 307 shows that when the sound pressure level is measured every 20 ms to select the voice command receiving microphone, the selected microphone changes frequently.

By contrast, when the sound pressure level is measured every 100 ms, the selected microphone does not change frequently.
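The effect shown in FIG. 3 can be illustrated with invented numbers; these are not data from the figure. In the sketch, 20 ms levels from the first microphone jitter around the second microphone's steady level. Pooling five 20 ms frames into one 100 ms unit smooths the jitter, so the lower-level rule stops flipping. Averaging dB values directly is a simplification. Averaging energies and then converting back to dB would be more exact.

```python
from statistics import fmean
from typing import List, Sequence

MIC1, MIC2 = 1, 2  # hypothetical identifiers


def decide(levels1: Sequence[float], levels2: Sequence[float]) -> List[int]:
    """Per-unit decision: the microphone with the lower level wins."""
    return [MIC1 if a < b else MIC2 for a, b in zip(levels1, levels2)]


def pool(levels: Sequence[float], factor: int) -> List[float]:
    """Merge `factor` consecutive short frames into one longer unit (mean of dB values)."""
    return [fmean(levels[i:i + factor])
            for i in range(0, len(levels) - factor + 1, factor)]


def count_switches(selection: Sequence[int]) -> int:
    """Count how often the selected microphone changes between consecutive units."""
    return sum(1 for prev, cur in zip(selection, selection[1:]) if prev != cur)


# Invented 20 ms levels in dBFS.
mic1_20ms = [-20, -24, -20, -24, -20, -20, -24, -20, -24, -20]
mic2_20ms = [-22] * 10

print(count_switches(decide(mic1_20ms, mic2_20ms)))                    # 8 (20 ms units)
print(count_switches(decide(pool(mic1_20ms, 5), pool(mic2_20ms, 5))))  # 0 (100 ms units)
```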

As noted above, however, the first unit time may vary within 80 ms to 120 ms depending on the specifications of the voice recognition service providing terminal 100 or the voice recognition algorithm.

According to an embodiment of the present invention, the sound pressure level measuring unit 210 may compare the sound pressure levels of the first and second sound signals measured in each first unit time, and assign a count value to a counter corresponding to one of the first microphone 10 and the second microphone 20.

For example, in each first unit time the sound pressure level measuring unit 210 compares the two sound signals. If the sound pressure level of the first sound signal input to the first microphone 10 is lower than that of the second sound signal input to the second microphone 20, it may add a first count value (for example, +1) to the counter corresponding to the first microphone 10. If the sound pressure level of the second sound signal is lower than that of the first sound signal, it may add a second count value (for example, -1) to the same counter corresponding to the first microphone 10.

The microphone selection unit 220 may select one of the first microphone 10 and the second microphone 20 as the voice command receiving microphone by comparing the total count value of the counter corresponding to one of the microphones with a preset threshold.

For example, if the total count value of the counter corresponding to the first microphone 10 exceeds a preset threshold (for example, 0), the microphone selection unit 220 may select the first microphone 10 as the voice command receiving microphone. If the total count value is below the threshold, it may select the second microphone 20. This prevents the selected voice command receiving microphone from changing frequently.
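The counter scheme can be sketched as a small stateful selector. It uses the +1/-1 count values and the threshold of 0 given as examples in the text. Two details are assumptions because the text leaves them open: the initial selection, and keeping the previous selection when the count equals the threshold. The counter here is unbounded; a real implementation might clamp it so that switching back does not take arbitrarily long.

```python
MIC1, MIC2 = 1, 2  # hypothetical identifiers


class CounterSelector:
    """Counter-based microphone selection that damps frequent switching."""

    def __init__(self, threshold: int = 0):
        self.count = 0            # total count value of the first microphone's counter
        self.threshold = threshold
        self.current = MIC1       # initial selection is an assumption

    def update(self, level1: float, level2: float) -> int:
        if level1 < level2:
            self.count += 1       # first count value (+1)
        elif level2 < level1:
            self.count -= 1       # second count value (-1)
        if self.count > self.threshold:
            self.current = MIC1
        elif self.count < self.threshold:
            self.current = MIC2
        # count == threshold: keep the previous selection (assumed behavior)
        return self.current
```

For the level pairs (-20, -22), (-25, -22), (-25, -22), (-20, -22), the plain lower-level rule yields 2, 1, 1, 2, while the counter yields 2, 2, 1, 1. The counter switches only once the evidence accumulates.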

Returning to FIG. 2, the preprocessing unit 240 may preprocess the voice command signal input through the selected voice command receiving microphone. For example, the preprocessing unit 240 may pass the sound signal input through the selected microphone through a noise filter to remove the noise signal and extract only the voice command signal.
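The text does not specify the noise filter. As a stand-in, the sketch below uses a simple frame-energy noise gate, which is a swapped-in technique rather than the patent's filter. It estimates the noise floor from the quietest 10 ms frame (160 samples at an assumed 16 kHz) and mutes frames that are not at least `margin_db` above that floor. The function name and parameters are hypothetical.

```python
import math
from typing import List, Sequence


def noise_gate(samples: Sequence[float], frame_len: int = 160,
               margin_db: float = 6.0) -> List[float]:
    """Mute frames whose level is within margin_db of the estimated noise floor."""
    if not samples:
        return []
    frames = [list(samples[i:i + frame_len]) for i in range(0, len(samples), frame_len)]

    def level_db(frame: List[float]) -> float:
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        return 20 * math.log10(max(rms, 1e-12))  # floor avoids log10(0)

    levels = [level_db(f) for f in frames]
    noise_floor = min(levels)  # crude estimate: the quietest frame is treated as noise
    out: List[float] = []
    for frame, lvl in zip(frames, levels):
        out.extend(frame if lvl > noise_floor + margin_db else [0.0] * len(frame))
    return out
```

Spectral-domain methods such as spectral subtraction or Wiener filtering remove noise inside speech frames too, which a gate cannot do. The gate is used here only because it fits in a few lines.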

The preprocessing unit 240 may also convert the voice command signal extracted from the sound signal into text, and pass the text to the voice recognition performing unit 230.

The voice recognition performing unit 230 may then determine whether the text received from the preprocessing unit 240 corresponds to an activation command signal.

The voice recognition performing unit 230 may perform voice recognition in conjunction with the voice recognition server 110, based on the voice command signal input through the selected voice command receiving microphone.

The voice recognition performing unit 230 may determine whether the voice command signal extracted from the sound signal input through the selected voice command receiving microphone is an activation command signal for voice recognition.

For example, the voice recognition performing unit 230 may determine whether the extracted voice command signal is registered in an activation command list. The list holds preset activation command signals or activation command signals registered in advance for each user ID.

The voice recognition performing unit 230 may also determine whether the extracted voice command signal is an activation command signal for activating the voice recognition service (for example, 'Hey Kakao') or a control command signal for using the activated voice recognition service (for example, 'Tell me the weather').
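Because the text describes comparing converted text against an activation command list, that check can be sketched as a normalized set lookup. 'hey kakao' comes from the text. The per-user entry in the usage example ("Hi Mini") and all names here are hypothetical. Treating every non-activation utterance as a control command is a simplification of the two-way classification described above.

```python
from typing import Dict, Optional, Set

# Preset activation phrase from the text; per-user entries are supplied by the caller.
PRESET_ACTIVATIONS: Set[str] = {"hey kakao"}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so minor speech-to-text formatting differences match."""
    return " ".join(text.lower().split())


def classify_command(text: str, user_id: Optional[str] = None,
                     user_activations: Optional[Dict[str, Set[str]]] = None) -> str:
    """Return 'activation' if text is in the activation command list, else 'control'."""
    allowed = {normalize(p) for p in PRESET_ACTIVATIONS}
    if user_id is not None and user_activations:
        allowed |= {normalize(p) for p in user_activations.get(user_id, set())}
    return "activation" if normalize(text) in allowed else "control"
```

For example, `classify_command("Hey  Kakao")` returns `'activation'`, and `classify_command("Tell me the weather")` returns `'control'`.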

When the voice command signal is determined to be an activation command signal for voice recognition, the voice recognition performing unit 230 may activate the voice recognition service providing terminal 100 to receive an additional voice command.

When the voice recognition performing unit 230 determines that the voice command signal is an activation command signal, the microphone selection unit 220 may select at least one of the first microphone 10 and the second microphone 20 as the additional voice command receiving microphone. This microphone receives the additional voice command signal, that is, a voice command other than the activation command.

For example, the microphone selection unit 220 may select the microphone that received the activation command signal as the additional voice command receiving microphone.

As another example, the microphone selection unit 220 may select the first microphone 10, which is set as the main microphone, as the additional voice command receiving microphone.

As yet another example, the microphone selection unit 220 may reselect one of the first microphone 10 and the second microphone 20 as the additional voice command receiving microphone. The reselection is based on the sound pressure levels, measured by the sound pressure level measuring unit 210, of the voice command signal or the additional voice command signal input through each microphone.

For example, the microphone selection unit 220 may choose between the first microphone 10 and the second microphone 20 for the additional voice command receiving microphone based on the number of times each was selected for that role during a preset past period.

As another example, the microphone selection unit 220 may select either the first microphone 10 or the second microphone 20 as the additional voice command receiving microphone based on the utterance position of the user's voice command. In this case, as described above, the tilt information of the voice recognition service providing terminal 100 may also be taken into account.

Alternatively, the microphone most recently selected as the additional voice command receiving microphone may be selected again.
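The alternatives listed above for choosing the additional voice command receiving microphone can be gathered into one dispatch function. This is an illustrative sketch: the enum, the function name, and the fallback to the activation microphone when no history exists are hypothetical. The position/tilt-based option is omitted here because it reuses the logic sketched earlier.

```python
from collections import Counter
from enum import Enum
from typing import Optional, Sequence

MIC1, MIC2 = 1, 2
MAIN_MIC = MIC1  # the text sets the first microphone 10 as the main microphone


class Strategy(Enum):
    SAME_AS_ACTIVATION = 1  # mic that received the activation command
    MAIN_MIC = 2            # fixed main microphone
    RESELECT_BY_LEVEL = 3   # re-run the lower-level rule on new measurements
    MOST_SELECTED = 4       # most often chosen for follow-ups in the past window
    MOST_RECENT = 5         # reuse the last follow-up microphone


def select_followup_mic(strategy: Strategy, activation_mic: int,
                        level1: Optional[float] = None,
                        level2: Optional[float] = None,
                        followup_history: Sequence[int] = ()) -> int:
    if strategy is Strategy.SAME_AS_ACTIVATION:
        return activation_mic
    if strategy is Strategy.MAIN_MIC:
        return MAIN_MIC
    if strategy is Strategy.RESELECT_BY_LEVEL:
        if level1 is None or level2 is None:
            raise ValueError("both levels are required for re-selection")
        return MIC1 if level1 < level2 else MIC2
    if strategy is Strategy.MOST_SELECTED:
        counts = Counter(followup_history)
        return MIC1 if counts[MIC1] >= counts[MIC2] else MIC2  # ties -> MIC1 (assumed)
    if strategy is Strategy.MOST_RECENT:
        # Falling back to the activation mic when there is no history is an assumption.
        return followup_history[-1] if followup_history else activation_mic
    raise ValueError(f"unknown strategy: {strategy}")
```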

The voice recognition performing unit 230 may perform voice recognition on the user's additional voice command signal (for example, 'Play music') input through the at least one microphone selected as the additional voice command receiving microphone from among the first microphone 10 and the second microphone 20. To do so, it may transmit the additional voice command signal to the voice recognition server 110. It then provides the voice recognition service based on the voice recognition information, corresponding to that signal, received back from the server 110. The voice recognition information may include response information to the additional voice command signal (for example, an answer when the command is a question), control information for the voice recognition service providing terminal 100 (for example, power on/off or volume adjustment), and control information for various terminals linked with the terminal 100.

FIG. 4 is a flowchart illustrating a method of providing a voice recognition service according to an embodiment of the present invention. Referring to FIG. 4, in step S401 the voice recognition service providing terminal 100 may receive a sound signal through each of the first microphone 10, disposed on one side of the terminal 100, and the second microphone 20, disposed on the other side of the terminal 100.

In step S403, the voice recognition service providing terminal 100 may measure the sound pressure level of each of the first sound signal input through the first microphone 10 and the second sound signal input through the second microphone 20.

In step S405, the voice recognition service providing terminal 100 may select, based on the measured sound pressure levels, one of the first microphone 10 and the second microphone 20 as the voice command receiving microphone that receives the voice command signal.

In step S407, the voice recognition service providing terminal 100 may perform voice recognition based on the voice command signal input through the selected voice command receiving microphone.
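Steps S401 to S407 can be read as one pipeline per unit time. In the sketch below, `measure` and `recognize` are hypothetical callables injected by the caller, standing in for the level measurement and for recognition performed with the server. The selection step applies the lower-level rule from the text.

```python
from typing import Callable, Sequence


def provide_voice_recognition(frame1: Sequence[float], frame2: Sequence[float],
                              measure: Callable[[Sequence[float]], float],
                              recognize: Callable[[Sequence[float]], str]) -> str:
    """One pass of the FIG. 4 flow for a single unit time (illustrative sketch)."""
    # S401: frame1 and frame2 are the sound signals from the first and second microphones.
    # S403: measure the sound pressure level of each signal.
    level1, level2 = measure(frame1), measure(frame2)
    # S405: select the voice command receiving microphone (lower level, per the text).
    selected = frame1 if level1 < level2 else frame2
    # S407: perform voice recognition on the selected microphone's signal.
    return recognize(selected)
```

In use, `measure` could be a level function such as an RMS-in-dB helper, and `recognize` a client call to the voice recognition server; both are left abstract here.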

FIG. 5 is a block diagram of the voice recognition server 110 shown in FIG. 1, according to an embodiment of the present invention.

Referring to FIG. 5, the voice recognition server 110 according to an embodiment of the present invention may include a sound receiving unit 500, a sound pressure level measuring unit 510, a voice command receiving microphone selection unit 520, a voice recognition performing unit 530, and a preprocessing unit 540. However, the configuration of the voice recognition server 110 according to an embodiment of the present invention may differ from that shown in FIG. 1.

The sound receiving unit 500 may receive, from the voice recognition service providing terminal 100, the first sound signal input through the first microphone 10 of the terminal 100 and the second sound signal input through its second microphone 20. The first and second sound signals are the same signal, and each may contain the user's voice command and ambient noise.

The sound receiving unit 500 may receive the first and second sound signals from the voice recognition service providing terminal 100 every first unit time. The first unit time may vary with, for example, the specifications of the terminal 100 or the voice recognition algorithm. For instance, it may be set within 80 ms to 120 ms, and preferably it is 100 ms.

The sound pressure level measuring unit 510 may measure the sound pressure level of each of the first and second sound signals.

The sound pressure level measuring unit 510 may measure, every first unit time, the sound pressure level of each of the first sound signal input through the first microphone 10 and the second sound signal input through the second microphone 20.

The voice command receiving microphone selection unit 520 may select, based on the measured sound pressure levels, one of the first microphone 10 and the second microphone 20 as the voice command receiving microphone that receives the voice command signal. For example, it may select as the voice command receiving microphone the microphone whose sound signal has the lower sound pressure level, comparing the first sound signal against the second.

The microphone selection unit 520 may select the voice command receiving microphone by comparing the sound pressure levels of the first and second sound signals measured in each first unit time.

The microphone selection unit 520 may select the voice command receiving microphone using any of the methods, described above, by which the voice recognition service providing terminal 100 selects the voice command receiving microphone.

When a voice command signal input through the selected voice command microphone is received from the voice recognition service providing terminal 100, the pre-processing unit 540 may perform pre-processing on that voice command signal.

For example, the pre-processing unit 540 may pass the sound signal input through the selected voice command microphone through a noise filter to remove noise from it, so that only the voice command signal is extracted.
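The description does not specify the noise filter. As a deliberately simple stand-in, the sketch below uses an energy-based noise gate (the function noise_gate, its 160-sample frame and its 6 dB margin are all hypothetical): it treats the quietest frame as the noise floor and silences frames that are not clearly louder than it, keeping only the segments likely to carry the voice command. Practical systems more commonly suppress noise in the frequency domain; this gate removes noise-only segments but leaves noise that overlaps speech untouched.

```python
import math
from typing import List, Sequence


def noise_gate(samples: Sequence[float], frame_len: int = 160,
               margin_db: float = 6.0) -> List[float]:
    """Zero out frames whose RMS is within margin_db of the quietest frame,
    which is taken as the noise-floor estimate. A trailing partial frame is
    kept unchanged."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    rms = [math.sqrt(sum(x * x for x in samples[s:s + frame_len]) / frame_len)
           for s in starts]
    if not rms:
        return list(samples)  # shorter than one frame: nothing to gate
    floor = min(rms)
    threshold = floor * 10 ** (margin_db / 20.0)  # floor raised by margin_db
    out = list(samples)
    for s, r in zip(starts, rms):
        if r <= threshold:  # frame judged to be noise only
            out[s:s + frame_len] = [0.0] * frame_len
    return out
```

Because the floor estimate is simply the quietest frame, an input containing no noise-only frame will have its quietest speech frame gated; a more robust version would track the noise floor over time.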

The pre-processing unit 540 may also convert the voice command signal extracted from the sound signal into text and pass the text to the voice recognition performing unit 530. The voice recognition performing unit 530 may then determine whether the text received from the pre-processing unit 540 corresponds to the activation command signal.

Based on the voice command signal input through the selected voice command microphone, the voice recognition performing unit 530 may provide the voice recognition service in cooperation with the voice recognition service providing terminal 100.

The voice recognition performing unit 530 may extract the voice command signal from the sound signal input through the selected voice command microphone and determine whether the extracted signal is an activation command signal for activating the voice recognition service or a control command signal for using the service.
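Once the pre-processing unit has converted the voice command signal to text, the activation/control decision can be sketched as a normalized string match. The wake phrase "hey assistant", the CommandType enum and classify_command are hypothetical; the source names neither the activation command nor the matching method.

```python
from enum import Enum


class CommandType(Enum):
    ACTIVATION = "activation"  # wakes the voice recognition service
    CONTROL = "control"        # uses the service (question, device control, ...)


# Hypothetical wake phrase(s); the source does not name one.
WAKE_PHRASES = frozenset({"hey assistant"})


def classify_command(text: str) -> CommandType:
    """Classify speech-to-text output as an activation or a control command."""
    normalized = " ".join(text.lower().split())  # ignore case and extra whitespace
    if normalized in WAKE_PHRASES:
        return CommandType.ACTIVATION
    return CommandType.CONTROL
```

An exact match keeps the sketch short; a deployed system would more likely score the wake phrase acoustically or tolerate recognition errors.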

When the voice command signal is determined to be an activation command signal, the voice recognition performing unit 530 may send the voice recognition service providing terminal 100 a message that activates the terminal to receive an additional voice command signal.

When the voice command signal is determined to be an activation command signal, the voice command microphone selection unit 520 may select at least one of the first microphone 10 and the second microphone 20 as the additional voice command microphone, which receives the additional voice command signal (a voice command other than the activation command).

The voice command microphone selection unit 520 may select the additional voice command microphone using any of the various methods, described above, by which the voice recognition service providing terminal 100 selects the additional voice command microphone.
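In the claims, the choice of the additional voice command microphone is tied to how often each microphone was selected during a preset past time. The sketch keeps a sliding window of timestamped selections and returns the most frequently selected microphone, or both microphones on a tie, so that at least one is always chosen. The class SelectionHistory, the function select_additional_mics, the one-hour default window and the most-frequent rule are all assumptions; the claims say only that the selection is based on the count.

```python
import time
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class SelectionHistory:
    """Timestamped record of which microphone (1 or 2) was selected, kept only
    for the last window_s seconds (the 'preset past time')."""

    def __init__(self, window_s: float = 3600.0):
        self.window_s = window_s
        self._events: Deque[Tuple[float, int]] = deque()

    def record(self, mic: int, now: Optional[float] = None) -> None:
        if mic not in (1, 2):
            raise ValueError("mic must be 1 or 2")
        now = time.time() if now is None else now
        self._events.append((now, mic))  # timestamps are assumed non-decreasing
        self._prune(now)

    def counts(self, now: Optional[float] = None) -> Dict[int, int]:
        now = time.time() if now is None else now
        self._prune(now)
        c = {1: 0, 2: 0}
        for _, mic in self._events:
            c[mic] += 1
        return c

    def _prune(self, now: float) -> None:
        # Drop selections older than the window.
        while self._events and self._events[0][0] < now - self.window_s:
            self._events.popleft()


def select_additional_mics(history: SelectionHistory,
                           now: Optional[float] = None) -> List[int]:
    """Return the microphone(s) selected most often within the window;
    both on a tie (including an empty history)."""
    c = history.counts(now)
    best = max(c.values())
    return [mic for mic in (1, 2) if c[mic] == best]
```

Returning both microphones on a tie is one way to honor "at least one of the first microphone and the second microphone"; a stricter variant could fall back to the lower-level rule instead.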

When the voice recognition performing unit 530 receives, from the voice recognition service providing terminal 100, the user's additional voice command signal input through the additional voice command microphone, it may perform voice recognition on that signal and transmit the resulting voice recognition information to the voice recognition service providing terminal 100. The voice recognition information may include response information for the additional voice command signal (for example, an answer when the command is a question), control information for the voice recognition service providing terminal 100 (for example, power on, power off, or volume adjustment), and control information for various other terminals linked with the voice recognition service providing terminal 100.
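The three kinds of voice recognition information listed above can be modeled as a small tagged record. The names InfoKind, VoiceRecognitionInfo, payload and target_device_id, and the JSON encoding, are hypothetical; the description does not define a message format between the server and the terminal 100.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Optional


class InfoKind(Enum):
    RESPONSE = "response"                            # e.g. an answer to a question
    TERMINAL_CONTROL = "terminal_control"            # e.g. on, off, volume of terminal 100
    LINKED_DEVICE_CONTROL = "linked_device_control"  # devices linked with terminal 100


@dataclass
class VoiceRecognitionInfo:
    kind: InfoKind
    payload: dict = field(default_factory=dict)
    target_device_id: Optional[str] = None  # set only for LINKED_DEVICE_CONTROL

    def __post_init__(self) -> None:
        is_linked = self.kind is InfoKind.LINKED_DEVICE_CONTROL
        if is_linked != (self.target_device_id is not None):
            raise ValueError("target_device_id is required only for linked-device control")

    def to_json(self) -> str:
        d = asdict(self)
        d["kind"] = self.kind.value  # serialize the enum by its string value
        return json.dumps(d, sort_keys=True)
```

Validating target_device_id in __post_init__ keeps a linked-device command from being sent without a destination.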

Embodiments of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. Computer-readable media may be any available media that can be accessed by a computer and include volatile and nonvolatile media as well as removable and non-removable media. Computer-readable media may also include computer storage media, which include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

Although the method and system of the present invention have been described in connection with specific embodiments, some or all of their components or operations may be implemented using a computer system having a general-purpose hardware architecture.

The foregoing description of the present invention is illustrative, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is defined by the following claims rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

A terminal providing a voice recognition service, the terminal comprising:
a sound input unit that receives sound signals through each of a first microphone disposed on one side of the terminal and a second microphone disposed on the other side of the terminal;
a sound pressure level measuring unit that measures the sound pressure level of each of a first sound signal input through the first microphone and a second sound signal input through the second microphone;
a voice command microphone selection unit that selects, based on the measured sound pressure levels, one of the first microphone and the second microphone as a voice command microphone for receiving a voice command signal; and
a voice recognition performing unit that performs voice recognition based on the voice command signal input through the selected voice command microphone,
wherein the voice command microphone selection unit applies different weights to the sound pressure levels of the first microphone and the second microphone, respectively, based on the number of times one of the first microphone and the second microphone was selected as the voice command microphone during a preset past time, and
selects, as the voice command microphone, the microphone that received the sound signal having the lower of the weighted sound pressure levels.

(Deleted)

The terminal of claim 1,
wherein the sound pressure level measuring unit measures the sound pressure levels of the sound signals every first unit time, and
the voice command microphone selection unit selects the voice command microphone by comparing the sound pressure level of the first sound signal and the sound pressure level of the second sound signal measured in each first unit time.

(Deleted)

The terminal of claim 1,
wherein the sound pressure level measuring unit measures the sound pressure level of a first voice command signal input to the first microphone and the sound pressure level of a second voice command signal input to the second microphone, and tracks the utterance position of the user's voice command based on the sound pressure levels of the first and second voice command signals.

The terminal of claim 5,
wherein the voice command microphone selection unit applies a first weight to the first sound signal and a second weight to the second sound signal based on the utterance position of the user's voice command.

(Deleted)

The terminal of claim 5,
wherein the voice command microphone selection unit selects the voice command microphone based on the utterance position of the user's voice command.

The terminal of claim 1,
wherein, when the voice command signal is an activation command signal for activating the voice recognition service, the voice command microphone selection unit selects at least one of the first microphone and the second microphone as an additional voice command microphone for receiving an additional voice command signal.

The terminal of claim 9,
wherein, when the voice command signal is an activation command signal for activating the voice recognition service, the voice command microphone selection unit selects the additional voice command microphone based on the number of times each of the first microphone and the second microphone was selected as the additional voice command microphone during a preset past time.

The terminal of claim 1,
further comprising a pre-processing unit that performs pre-processing on the voice command signal input through the selected voice command microphone.

A method of providing a voice recognition service, the method comprising:
receiving sound signals through each of a first microphone disposed on one side of a terminal and a second microphone disposed on the other side of the terminal;
measuring the sound pressure level of each of a first sound signal input through the first microphone and a second sound signal input through the second microphone;
selecting, based on the measured sound pressure levels, one of the first microphone and the second microphone as a voice command microphone for receiving a voice command signal; and
performing voice recognition based on the voice command signal input through the selected voice command microphone,
wherein the selecting comprises:
applying different weights to the sound pressure levels of the first microphone and the second microphone, respectively, based on the number of times one of the first microphone and the second microphone was selected as the voice command microphone during a preset past time; and
selecting, as the voice command microphone, the microphone that received the sound signal having the lower of the weighted sound pressure levels.

(Deleted)

The method of claim 12,
wherein the measuring of the sound pressure level comprises
measuring the sound pressure levels of the sound signals every first unit time, and
the selecting comprises
selecting the voice command microphone by comparing the sound pressure level of the first sound signal and the sound pressure level of the second sound signal measured in each first unit time.

The method of claim 12,
further comprising performing pre-processing on the voice command signal input through the selected voice command microphone.

A server providing a voice recognition service, the server comprising:
a sound receiving unit that receives, from a terminal, sound signals input through each of a first microphone and a second microphone of the terminal;
a sound pressure level measuring unit that measures the sound pressure level of each of a first sound signal input through the first microphone and a second sound signal input through the second microphone;
a voice command microphone selection unit that selects, based on the measured sound pressure levels, one of the first microphone and the second microphone as a voice command microphone for receiving a voice command signal; and
a voice recognition performing unit that performs voice recognition based on the voice command signal input through the selected voice command microphone,
wherein the voice command microphone selection unit applies different weights to the sound pressure levels of the first microphone and the second microphone, respectively, based on the number of times one of the first microphone and the second microphone was selected as the voice command microphone during a preset past time, and
selects, as the voice command microphone, the microphone that received the sound signal having the lower of the weighted sound pressure levels.

The server of claim 16,
wherein the voice recognition performing unit determines whether the voice command signal is an activation command signal for activating the voice recognition service and, when it determines that it is, transmits to the terminal a message that activates the terminal to receive an additional voice command signal.

The server of claim 17,
wherein the voice recognition performing unit receives the additional voice command signal from the terminal, performs voice recognition based on the received additional voice command signal, and transmits the resulting voice recognition information to the terminal.
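The independent claims recite weighting each microphone's sound pressure level by how often that microphone was selected during a preset past time and then choosing the lower weighted level, but they do not give the weighting function. The sketch below assumes one plausible choice: a microphone chosen more often receives a proportionally smaller weight (down to 1 - bias, with a hypothetical bias of 0.2), which would tend to favor recent choices and damp switching between microphones. The names history_weights and select_weighted are hypothetical. The weights are applied to linear levels such as frame RMS, because scaling a negative dBFS value by a weight below 1 would raise it rather than lower it.

```python
from typing import Dict, Tuple


def history_weights(counts: Dict[int, int], bias: float = 0.2) -> Dict[int, float]:
    """Hypothetical weighting: the more often a microphone was selected in the
    preset past window, the smaller its weight (minimum 1 - bias).
    With no history every weight is 1.0."""
    total = sum(counts.values())
    if total == 0:
        return {mic: 1.0 for mic in counts}
    return {mic: 1.0 - bias * n / total for mic, n in counts.items()}


def select_weighted(level1: float, level2: float, counts: Dict[int, int],
                    bias: float = 0.2) -> Tuple[int, Dict[int, float]]:
    """level1 and level2 are linear, non-negative levels (e.g. frame RMS).
    Returns the microphone with the lower weighted level (tie -> 1)
    together with both weighted levels."""
    if level1 < 0 or level2 < 0:
        raise ValueError("levels must be linear and non-negative")
    full_counts = {1: counts.get(1, 0), 2: counts.get(2, 0)}
    w = history_weights(full_counts, bias)
    weighted = {1: level1 * w[1], 2: level2 * w[2]}
    chosen = 1 if weighted[1] <= weighted[2] else 2
    return chosen, weighted
```

With counts {1: 0, 2: 10}, microphone 2 receives weight 0.8, so select_weighted(0.10, 0.11, {1: 0, 2: 10}) picks microphone 2 even though its raw level is slightly higher; with no history the same levels select microphone 1.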