KR100233532B1

KR100233532B1 - Audio Codec of Voice Communication System

Info

Publication number: KR100233532B1
Application number: KR1019970005111A
Authority: KR
Inventors: 김남시
Original assignee: 윤종용; 삼성전자주식회사
Priority date: 1997-02-20
Filing date: 1997-02-20
Publication date: 1999-12-01
Anticipated expiration: 2017-02-20
Also published as: KR19980068496A

Abstract

Disclosed is an audio codec (CODEC) for a voice communication system. The codec comprises an encoder that recognizes an input voice signal, converts it into text data, and outputs the text data, and a decoder that receives the text data transmitted from the encoder and generates and outputs an artificial voice signal with a predetermined tone from the received text data. Such an audio codec enables voice communication over a communication line with a small bandwidth.

Description

Audio Codec of Voice Communication System

The present invention relates to an audio codec (CODEC) of a voice communication system and, more particularly, to an apparatus that uses voice recognition and artificial voice generation to enable voice communication even over a low-bandwidth communication channel.

In recent years, multimedia products based on video communication have appeared that let users converse while seeing the face of a person who is geographically far away. For video communication systems connected over a single line, standards for encoding and decoding video and audio signals, and for multiplexing those signals, have been recommended for each type of line.

Consider a user of a video communication system who says, for example, "안녕하십니까?" ("Hello?") over two seconds. If the voice signal from the user is sampled at 8 kHz and digitized with 2 bytes per sample, the amount of data is 32,000 bytes. Transmitting this much data requires a communication line with a large bandwidth. Recently, voice compression coding standards such as G.723, G.728, and G.729 have been recommended so that voice signals can be transmitted over low-bandwidth communication lines. Among these, G.723, which was proposed for digital communication over the public switched telephone network (PSTN), is known to perform best. In its low-rate mode, G.723 compresses data to 5.3 kbps. Even when the voice signal above is compressed this way, the amount of data is still about 1333 bytes.
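The arithmetic in the paragraph above can be checked with a short sketch. The function and constant names are illustrative and do not come from the patent. Note that a nominal 5.3 kbit/s over two seconds works out to 1325 bytes, close to the approximate figure of 1333 bytes quoted in the text (which corresponds to roughly 5.33 kbit/s).

```python
# Data volume of a 2-second utterance, using the figures quoted in the text.
# All names below are illustrative, not taken from the patent.

SAMPLE_RATE_HZ = 8_000      # 8 kHz sampling
BYTES_PER_SAMPLE = 2        # 2 bytes allocated per sample
DURATION_S = 2              # utterance length in seconds
LOW_RATE_BPS = 5_300        # low-rate mode quoted for G.723 (5.3 kbps)


def pcm_bytes(sample_rate_hz: int, bytes_per_sample: int, duration_s: int) -> int:
    """Size of uncompressed sampled audio in bytes."""
    return sample_rate_hz * bytes_per_sample * duration_s


def coded_bytes(bitrate_bps: int, duration_s: int) -> float:
    """Size of a constant-bitrate coded stream in bytes."""
    return bitrate_bps * duration_s / 8


raw = pcm_bytes(SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, DURATION_S)
compressed = coded_bytes(LOW_RATE_BPS, DURATION_S)

print(raw)         # 32000
print(compressed)  # 1325.0
```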

If the participants in a communication session clearly know who is speaking, so that it is enough to convey what was said regardless of the speaker's tone, there is no need to use the compression coding methods described above. In other words, if only the content "안녕하십니까?" is recognized and converted into text, regardless of the speaker's tone, that content can be represented using only 12 bytes.
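The 12-byte figure can be reproduced under one assumption: the patent does not name a character encoding, so the sketch below assumes a 2-byte-per-syllable Korean encoding (EUC-KR) and leaves out the question mark. The comparison values 32,000 and 1333 are the byte counts quoted earlier in the text.

```python
# Size of the recognized text versus the audio representations quoted in the text.
# Assumption: EUC-KR encoding, question mark omitted; the patent states only "12 bytes".

utterance = "안녕하십니까"                    # six Hangul syllables
text_payload = utterance.encode("euc_kr")     # 2 bytes per syllable in EUC-KR

RAW_PCM_BYTES = 32_000    # uncompressed figure from the text
G723_BYTES = 1_333        # approximate low-rate G.723 figure from the text

print(len(text_payload))                        # 12
print(len(utterance.encode("utf-8")))           # 18 (3 bytes per syllable in UTF-8)
print(round(RAW_PCM_BYTES / len(text_payload))) # 2667
print(round(G723_BYTES / len(text_payload)))    # 111
```

With a different encoding the absolute count changes (18 bytes in UTF-8), but the text payload stays roughly two orders of magnitude smaller than the compressed audio.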

An object of the present invention is to provide an audio codec of a voice communication system that recognizes a speaker's voice, converts it into text data with a small data volume, transmits the text data, and converts the transmitted text data into an artificial voice for output, thereby enabling voice communication over a low-bandwidth communication channel.

FIG. 1 is a block diagram of a voice communication system according to the present invention.

<Description of reference numerals for the main parts of the drawing>

11: microphone, 12: voice recognizer

13: voice generator, 14: speaker

To achieve this object, the audio codec according to the present invention comprises, in a voice communication system, an encoder that recognizes an input voice signal, converts it into text data, and outputs the text data, and a decoder that receives the text data transmitted from the encoder and generates and outputs an artificial voice signal with a predetermined tone from the received text data.

Hereinafter, the present invention is described in detail with reference to the accompanying drawing.

FIG. 1 shows the configuration of a voice communication system according to the present invention.

In FIG. 1, the microphone 11 receives a voice signal from the user. The voice recognizer 12 receives the voice signal from the microphone 11 and outputs text data. The text data is transmitted over the public switched telephone network (PSTN). The voice generator 13 on the receiving side generates a voice signal from the received text data and outputs the generated voice signal through the speaker 14.

A user of the system of FIG. 1 speaks into the microphone 11 to give his or her opinion during a video conference. The user's voice signal input through the microphone 11 is applied to the voice recognizer 12. The voice recognizer 12 converts the input signal into a code that the system can recognize. The voice data converted into code, that is, the voice data converted into text, is transmitted over the PSTN to the video terminal of the other party. Because the amount of text data is much smaller than the amount of data obtained by compressing the voice signal before text conversion, the text data can be transmitted over a low-bandwidth communication channel.

Meanwhile, the decoder receives the text data transmitted over the PSTN and restores the received data to a voice signal. That is, the voice generator 13 converts the received text data into an artificial voice signal with a predetermined tone and outputs it. The voice signal generated by the voice generator 13 is output through the speaker 14. The tone input through the microphone 11 and the tone output through the speaker 14 differ from each other, but the content of what the speaker said is conveyed intact to the other users.
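The signal path described above (microphone 11, voice recognizer 12, PSTN, voice generator 13, speaker 14) can be sketched as follows. This is only an illustration of the data flow: the `Encoder` and `Decoder` classes, the stub recognizer and synthesizer, and the UTF-8 wire format are all assumptions, since the patent specifies neither an algorithm nor a text encoding. Real speech recognition and synthesis would replace the two stub functions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Encoder:
    """Sending side: voice signal in, text payload out (role of recognizer 12)."""
    recognize: Callable[[bytes], str]  # hypothetical speech-to-text function

    def encode(self, pcm: bytes) -> bytes:
        return self.recognize(pcm).encode("utf-8")


@dataclass
class Decoder:
    """Receiving side: text payload in, artificial voice out (role of generator 13)."""
    synthesize: Callable[[str], bytes]  # hypothetical text-to-speech function

    def decode(self, payload: bytes) -> bytes:
        return self.synthesize(payload.decode("utf-8"))


def stub_recognizer(pcm: bytes) -> str:
    # Placeholder: a real recognizer would analyze the samples.
    return "hello"


def stub_synthesizer(text: str) -> bytes:
    # Placeholder: emits 800 silent 16-bit samples per character.
    return b"\x00\x00" * (len(text) * 800)


encoder = Encoder(stub_recognizer)
decoder = Decoder(stub_synthesizer)

mic_pcm = bytes(32_000)               # 2 s of 8 kHz, 2-byte samples from microphone 11
payload = encoder.encode(mic_pcm)     # what actually crosses the network
speaker_pcm = decoder.decode(payload) # what speaker 14 would play

print(len(mic_pcm), len(payload), len(speaker_pcm))  # 32000 5 8000
```

Only `payload` crosses the channel, which is why the speaker's original tone is lost while the content survives.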

The voice recognizer 12 on the encoding side and the voice generator 13 on the decoding side can be implemented by building into the system a speech recognition IC and a speech synthesis IC, both of which have seen considerable development progress in recent years.

The audio codec according to the present invention thus enables voice communication over a communication line with a small bandwidth.

Claims

An audio codec of a voice communication system, comprising:

an encoder which recognizes an input voice signal, converts the input voice signal into text data, and outputs the text data;

a decoder which receives the text data transmitted from the encoder and generates and outputs an artificial voice signal having a predetermined tone from the received text data; and

a network for establishing a communication connection between the encoder and the decoder.

The audio codec according to claim 1, wherein the encoder comprises a speech recognition IC.

The audio codec according to claim 1, wherein the decoder comprises a voice synthesis IC.