JPS5823097A

JPS5823097A - Voice recognition apparatus

Info

Publication number: JPS5823097A
Application number: JP12163181A
Authority: JP
Inventors: 直樹石井; 良平中津; 小島　順治
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1981-08-03
Filing date: 1981-08-03
Publication date: 1983-02-10

Abstract

(57) [Abstract] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] This invention relates to a speech recognition apparatus that recognizes words, syllables, and the like input by voice.

Speech recognition apparatuses that recognize spoken words such as numerals and place names, or spoken syllables, are in practical use in fields such as parcel sorting and data entry from terminals. FIG. 1 shows the configuration of a conventional speech recognition apparatus.

In FIG. 1, speech input at input terminal 1 is band-limited in speech analysis section 2 and then converted to digital speech by A/D conversion, and the speech interval is extracted using power information and the like. Feature parameters are computed over the extracted speech interval, so that the input speech is converted into feature parameters. Recognition section 3 matches the standard patterns stored in standard pattern storage section 4 against the input speech that analysis section 2 has converted into feature parameters, and computes the distance (or the similarity, or an equivalent quantity) between the input speech and each standard pattern. The distance to every standard pattern is computed, and the word corresponding to the standard pattern with the minimum distance (the maximum similarity, when similarity is used) is output from output terminal 5 as the recognition result. The description below is given in terms of distance, but it applies equally to similarity if "minimum" is read as "maximum".
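The matching step described above can be sketched as a nearest-pattern search. This is only an illustration under assumptions that are not in the patent: each utterance is reduced to a single fixed-length feature vector, the distance is Euclidean, and the function name `nearest_pattern`, the word "iie" and all coordinates are hypothetical.

```python
import math

def nearest_pattern(features, standard_patterns):
    """Return (word, distance) for the standard pattern closest to the input.

    features: feature vector of the input speech (assumed fixed length).
    standard_patterns: dict mapping each word to its standard pattern vector.
    """
    best_word, best_distance = None, math.inf
    for word, pattern in standard_patterns.items():
        d = math.dist(features, pattern)  # Euclidean distance (assumed metric)
        if d < best_distance:
            best_word, best_distance = word, d
    return best_word, best_distance

# Hypothetical two-word vocabulary in a 2-D feature space.
patterns = {"hai": (0.0, 0.0), "iie": (4.0, 0.0)}
word, distance = nearest_pattern((0.5, 0.2), patterns)
print(word, distance)
```

With a similarity measure instead of a distance, the comparison would pick the largest value rather than the smallest, as the text notes.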

Conventionally, a speech recognition apparatus of this kind has usually stored only the standard patterns of the words to be recognized in standard pattern storage section 4. However, users of a speech recognition apparatus do not always input only the words to be recognized, nor always input them correctly. Coughs and sneezes, and filler sounds such as "uh" and "um", are sometimes input and cannot be avoided. Voices, footsteps, and the sound of doors opening and closing in the background may also be picked up. Furthermore, when speech is input over a telephone line, a busy tone, ringback tone, or the like may be input depending on the circumstances. When such speech or sounds are input, are judged to be close to one of the stored standard patterns, and the corresponding word is output from output terminal 5, the result is undesirable. This causes serious trouble, for example, when entering data into a computer by voice or when using voice-operated services such as banking and seat reservation.

One method conventionally used to avoid this situation is to allow an output called "rejection". A threshold D is set, and if the distance d obtained together with the recognition result is larger than D, the input is judged to be rejected, which is intended to prevent recognition errors of the kind described above. Even with this method, however, a recognition error is unavoidable when speech or a sound that resembles a word to be recognized is input.
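The threshold test can be sketched as follows. Again this is an illustration rather than the patent's implementation: Euclidean distance on fixed-length vectors is assumed, and the names `recognize_with_threshold` and `REJECT` and the numeric values are hypothetical.

```python
import math

REJECT = "REJECT"  # hypothetical marker for the "rejection" output

def recognize_with_threshold(features, standard_patterns, threshold):
    """Return the nearest word, or REJECT if its distance d exceeds threshold D."""
    word, d = min(
        ((w, math.dist(features, p)) for w, p in standard_patterns.items()),
        key=lambda pair: pair[1],
    )
    return REJECT if d > threshold else word

patterns = {"hai": (0.0, 0.0)}
print(recognize_with_threshold((5.0, 0.0), patterns, threshold=2.0))  # far away: rejected
print(recognize_with_threshold((1.5, 0.0), patterns, threshold=2.0))  # within D: accepted
```

The second call is accepted purely because it lies within D of the pattern, whatever was actually spoken, which is the weakness the text goes on to describe.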

FIG. 2 illustrates this situation, schematically showing the feature parameters of speech as points in a space. Region 6 is the distribution of the word to be recognized, "hai" ("yes"), and point 7 is the point of its standard pattern.

So that correct utterances are not rejected, the threshold D must be large enough to cover region 6. Every point inside sphere 8, of radius D centered on point 7, is then recognized as "hai". Region 9 shows the distribution of "tani", a word that is not to be recognized, and it overlaps sphere 8. Point 10 shows the position of one utterance of that word and lies in the overlapping portion.

In this case, point 10 is erroneously recognized as "hai". This has been an unavoidable problem of conventional speech recognition apparatuses, regardless of the recognition method used.

To eliminate this drawback, this invention provides a speech recognition apparatus characterized in that standard patterns corresponding to speech or sounds to be rejected are created in advance and stored in the apparatus, and a rejection is output when the recognition result is one of these rejection standard patterns. Its purpose is to correctly reject speech or sounds whose input cannot be avoided and which, being easily misrecognized by conventional apparatuses, have caused serious trouble.

FIG. 3 shows an embodiment of this invention, in which parts corresponding to those in FIG. 1 bear the same reference numerals. In this invention, standard pattern storage section 4 is provided with storage section 41, which stores the standard patterns corresponding to the words to be recognized, and additionally with storage section 42, which stores standard patterns corresponding to speech or sounds to be rejected.

The operation of this apparatus is almost the same as that of the speech recognition apparatus shown in FIG. 1. In the apparatus of this invention, however, recognition section 3 matches the input speech not only against the standard patterns of the words to be recognized stored in storage section 41 but also against the standard patterns of speech or sounds to be rejected stored in storage section 42, and finds the standard pattern with the minimum distance among all of them. If the standard pattern found is one stored in storage section 41, exactly the same processing as in the apparatus of FIG. 1 is performed and the result is output. If a standard pattern stored in storage section 42 is selected as the one with the minimum distance, a rejection is output to output terminal 5.
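A minimal sketch of this decision rule is given below, assuming Euclidean distance on fixed-length feature vectors. The function name `recognize`, the `REJECT` marker, the tie-breaking in favor of the word to be recognized, and the sample vectors are assumptions made for illustration and do not come from the patent.

```python
import math

REJECT = "REJECT"  # hypothetical marker for the "rejection" output

def recognize(features, target_patterns, rejection_patterns):
    """Search both stores (41 and 42 in FIG. 3) for the nearest standard pattern.

    target_patterns: standard patterns of the words to be recognized (store 41).
    rejection_patterns: standard patterns of sounds to be rejected (store 42).
    """
    # Each candidate is (distance, is_rejection, label). Tuples compare element
    # by element, so on an exact distance tie a target pattern (False) wins.
    candidates = [(math.dist(features, p), False, w) for w, p in target_patterns.items()]
    candidates += [(math.dist(features, p), True, w) for w, p in rejection_patterns.items()]
    _, is_rejection, word = min(candidates)
    return REJECT if is_rejection else word

targets = {"hai": (0.0, 0.0)}
rejections = {"cough": (3.0, 3.0), "busy_tone": (-4.0, 1.0)}
print(recognize((0.3, -0.2), targets, rejections))  # nearest is "hai"
print(recognize((2.5, 2.8), targets, rejections))   # nearest is "cough": rejected
```

When a pattern from store 41 wins, the text says processing continues exactly as in FIG. 1, so a threshold test on the winning distance could still be applied afterwards; it is omitted here for brevity.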

FIG. 4 illustrates the principle by which the apparatus of this invention operates effectively, and parts corresponding to those in FIG. 2 bear the same reference numerals. Point 10 of the utterance is compared with point 11, the point in the space of the standard pattern prepared for the speech "tani" that is to be rejected. Because standard pattern 11 is the one closest to point 10, the correct output, a rejection, is obtained for the speech at point 10.
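The contrast between FIG. 2 and FIG. 4 can be checked with a small numerical example. The coordinates and the value of D below are invented purely for illustration (the patent gives no numbers), and Euclidean distance is an assumed metric.

```python
import math

# Hypothetical positions in a 2-D feature space (not taken from the patent).
point_7 = (0.0, 0.0)    # standard pattern of "hai"
point_11 = (2.5, 0.0)   # standard pattern prepared for "tani"
point_10 = (1.5, 0.0)   # one utterance of "tani", inside sphere 8
D = 2.0                 # threshold, i.e. radius of sphere 8 around point 7

d_hai = math.dist(point_10, point_7)    # 1.5
d_tani = math.dist(point_10, point_11)  # 1.0

# FIG. 2: only point 7 is stored, so point 10 is accepted because d_hai <= D.
conventional = "hai" if d_hai <= D else "REJECT"

# FIG. 4: point 11 is stored as well and is closer, so the input is rejected.
proposed = "REJECT" if d_tani < d_hai else conventional

print(conventional, proposed)  # hai REJECT
```

The threshold alone cannot separate point 10 from "hai" because it lies inside sphere 8, whereas the comparison with point 11 does.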

As explained above, the feature of this invention is that standard patterns of speech or sounds to be rejected are prepared in advance, and a rejection is output when the input speech is judged to be closest to one of those standard patterns. This has the advantage that speech or sounds whose input to a speech recognition apparatus cannot be avoided, such as coughs, sneezes, busy tones, and ringback tones, are correctly rejected, so that undesirable recognition results are avoided.

4. Brief Description of the Drawings
FIG. 1 is a block diagram showing the configuration of a conventional speech recognition apparatus. FIG. 2 shows an example in which a word that should be rejected is erroneously recognized by a conventional speech recognition apparatus. FIG. 3 is a block diagram showing the configuration of one embodiment of the apparatus of this invention. FIG. 4 shows that, with this invention, a word that should be rejected is correctly rejected.

1: input terminal; 2: analysis section; 3: recognition section; 4: standard pattern storage section; 41: storage section for standard patterns of words to be recognized; 42: storage section for standard patterns of inputs to be rejected; 5: output terminal; 6: distribution of the word to be recognized, "hai"; 7: standard pattern of "hai"; 8: range of inputs recognized as "hai"; 9: distribution of the word to be rejected, "tani"; 10: position of a specific utterance of "tani"; 11: standard pattern of "tani".

Patent applicant: Nippon Telegraph and Telephone Public Corporation. Agent: Kusano.
[FIG. 1 to FIG. 4: drawings not reproduced]

Claims

[Claims]

(1) A speech recognition apparatus comprising: a speech analysis section that analyzes input speech and extracts feature parameters; a standard pattern storage section that stores standard patterns; and a speech recognition section that matches the feature parameters extracted from the input speech by the speech analysis section against the standard patterns stored in the standard pattern storage section, characterized in that standard patterns for speech or sounds to be rejected are stored in the standard pattern storage section.