WO1992006469A1 - Boundary relaxation for speech pattern recognition - Google Patents

Boundary relaxation for speech pattern recognition

Info

Publication number
WO1992006469A1
Authority
WO
WIPO (PCT)
Prior art keywords
point
pattern
path
score
feasible
Prior art date
Application number
PCT/US1991/007165
Other languages
English (en)
Inventor
Ilan D. Shallom
Raziel Haimi-Cohen
Original Assignee
The Dsp Group, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from IL9586990A external-priority patent/IL95869A/en
Priority claimed from IL98092A external-priority patent/IL98092A0/xx
Application filed by The Dsp Group, Inc.
Publication of WO1992006469A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/12 Speech classification or search using dynamic programming techniques, e.g. dynamic time warping [DTW]

Definitions

  • the present invention relates to pattern recognition processing generally and more particularly to speech recognition using a dynamic programming algorithm, typically a modification of a standard Dynamic Time Warping (DTW) algorithm or a similar algorithm (for example, a Hidden Markov Model based on Viterbi's algorithm).
  • the degradation in recognition accuracy due to mismatch in boundary determination can be reduced by various approaches.
  • the method of Wilpon et al uses the approach of improving the accuracy in boundary determination to a certain degree of uncertainty.
  • a procedure be developed that is immune to small endpoint errors.
  • Rabiner et al attempts to improve speech recognition by relaxation of the boundary constraints and modification of the standard dynamic time warping algorithm, allowing the warping path to begin and end within a specified range with respect to the estimated boundaries.
  • the accumulated distance of the final path is normalized by its length.
  • the method of Rabiner et al is enhanced by the algorithm described in "Dynamic Time Warping with Boundaries Constraint Relaxation", by I.D. Shallom, R. Haimi-Cohen and T. Golan, published in Proc. Conf. IEEE Israel, 1989, paper 3.1.3.
  • the algorithm of Shallom et al also uses relaxation of boundary constraints. Their method uses a dynamic time warping algorithm in which a path length normalization factor is applied in the dynamic equation at each grid point. This improves the path optimization process.
  • the present invention provides a method of improved pattern recognition which may be used for speech recognition by relaxation of boundary constraints so as to account for boundary detection errors.
  • the dynamic programming algorithm is modified so that the known and predicted path lengths are taken into account when determining the optimal path to each gridpoint. Additionally, the present invention provides a method for improving the accuracy of the estimated boundaries of a tested pattern.
  • a method for determining the predicted path length and for utilizing it in a dynamic programming algorithm is outlined below.
  • apparatus for pattern recognition including apparatus for providing a digital pattern to be inspected which contains a plurality of feature vectors, apparatus for providing at least one digital reference pattern containing a different plurality of parameter vectors and apparatus for comparing the digital pattern to be inspected with the at least one digital reference pattern.
  • the apparatus for comparing includes apparatus for providing a search area including a grid with the feature vectors on a first axis and the parameter vectors on a second axis and apparatus for calculating a final normalized score which is the estimated minimum of a plurality of optimal normalized scores each associated with a corresponding feasible path, wherein each of the feasible paths is located in the search area.
  • the apparatus for calculating includes, for each point in the search area, apparatus for computing an accumulated score for a plurality of feasible paths which contain the point, apparatus for computing an overall weight for each of the plurality of feasible paths which contain the point, apparatus for computing a normalized score, whereby the normalized score is the accumulated score for the point divided by the overall weight for the point, for each of the plurality of feasible paths which contain the point, and apparatus for selecting the normalized score which is least, from the plurality of normalized scores, as an optimal normalized score for the point.
  • the search area includes a plurality of path beginning points and a plurality of path ending points.
  • the apparatus for pattern recognition also includes an apparatus for determining beginning and ending points of that feasible path which is associated with the final normalized score thereby to determine beginning and ending points of the digital pattern.
  • the overall weight includes an accumulated weight and a predicted weight.
  • the pattern to be inspected is a speech utterance and the reference pattern is based on a Hidden Markov Model.
  • the pattern to be inspected is a speech utterance
  • the reference pattern is a reference template
  • the feasible paths are calculated according to a Dynamic Time Warping algorithm.
  • the beginning and ending points of the feasible path which is associated with the final normalized score are used to estimate beginning and ending points of the pattern to be inspected. Additionally, in accordance with a preferred embodiment of the present invention, the digital pattern is derived from a speech signal.
  • a method for producing a final normalized score which is the minimum of a plurality of optimal normalized scores each associated with a corresponding feasible path, wherein each of the feasible paths is located in a search area and wherein the search area includes a set of points characterized by a plurality of path beginning points and a plurality of path ending points.
  • For each point in the search area, the method includes the steps of computing an accumulated score for a plurality of feasible paths which contain the point, computing an overall weight for each of the plurality of feasible paths which contain the point, computing a normalized score, whereby the normalized score is the accumulated score for the point divided by the overall weight for the point, for each of the plurality of feasible paths which contain the point, and selecting the normalized score which is least, from the plurality of normalized scores, as an optimal normalized score for the point.
  • the method also includes the step of determining beginning and ending points of that feasible path which is associated with the final normalized score.
  • the overall weight includes an accumulated weight and a predicted weight.
  • the normalized score indicates the similarity between a reference form and a pattern to be inspected.
  • the pattern to be inspected is a speech utterance and the reference form is based on a Hidden Markov Model.
  • the pattern to be inspected is a speech utterance
  • the reference form is a reference template
  • the feasible paths are calculated according to a Dynamic Time Warping algorithm.
  • the beginning and ending points of the feasible path which is associated with the final normalized score are used to estimate beginning and ending points of the pattern to be inspected.
  • a method for pattern recognition including the steps of providing a digital pattern to be inspected which contains a plurality of feature vectors, providing at least one digital reference pattern containing a different plurality of parameter vectors, and comparing the digital pattern to be inspected with the at least one digital reference pattern.
  • the step of comparing includes the steps of providing a search area including a grid with the feature vectors on a first axis and the parameter vectors on a second axis, and calculating a final normalized score which is the minimum of a plurality of optimal normalized scores each associated with a corresponding feasible path, wherein each of the feasible paths is located in the search area.
  • the step of calculating includes, for each point in the search area, the steps of computing an accumulated score for a plurality of feasible paths which contain the point, computing an overall weight for each of the plurality of feasible paths which contain the point, computing a normalized score, whereby the normalized score is the accumulated score for the point divided by the overall weight for the point, for each of the plurality of feasible paths which contain the point, and selecting the normalized score which is least, from the plurality of normalized scores, as an optimal normalized score for the point.
  • Fig. 1 is a schematic block diagram illustration of the architecture of a preferred embodiment of speech recognition apparatus constructed and operated in accordance with a preferred embodiment of the present invention
  • Fig. 2 is a schematic block diagram illustration of a speech recognition system constructed and operated in accordance with the principles of a preferred embodiment of the present invention
  • Fig. 3 is a graphical representation illustration of an optimization procedure of a preferred embodiment of the invention.
  • Fig. 4 is a pseudo-code illustration of a scoring algorithm for pattern recognition in the speech recognition system of Fig. 2 in accordance with a dynamic programming technique of the invention.
  • Fig. 1 shows a schematic block diagram of the architecture of a microprocessor-based speech recognition system operated in accordance with the principles of the present invention.
  • a user codec 2, such as an Intel 2913 from Intel Corporation, interfaces with digital signal processing circuitry 4, typically a TMS 320C25 from Texas Instruments Corporation.
  • a memory, which comprises a static random-access memory such as a 32K by 8 bit device with an access time of 100 nsec, is connected to the digital signal processing circuitry by means of a standard address, data and read-write control bus.
  • Fig. 2 shows a schematic block diagram of a microprocessor-based speech recognition system operated in accordance with the principles of the present invention.
  • The algorithms of Fig. 2 are typically carried out by software run on digital signal processing circuitry 4, such as the digital signal processing circuitry of Fig. 1.
  • An analog signal 12 which may be obtained from a microphone or similar device, is typically provided to a standard sampling device 14.
  • the output of the sampling device, the digital signal 16, is then supplied to a voice activated detection device 18 which may be a device as described in U.S. Patent Application 07/151,740 to the same assignee, which is incorporated herein by reference.
  • the output of the voice activated detection device 18 is a digital speech signal 20.
  • the voice activated detection device may be incorporated in digital signal processing circuitry 4 (Fig. 1).
  • After the digital speech signal 20 has been extracted from the input signal, it is provided to a boundary detector 22 which typically determines the beginning and end points of an utterance found in the digital speech signal. The determination may be carried out by a standard boundary detector algorithm such as the type described by Wilpon et al.
  • the utterance is then conveyed to a feature extraction device 26 where spectral or other features are extracted, typically through LPC analysis.
  • the feature extraction procedure transforms the utterance into a sequence of test feature vectors 28.
  • each test vector contains the features of a speech frame of approximately 30 msec.
  • An overlap of typically 50% may be applied between adjacent speech frames.
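The framing described above (frames of roughly 30 msec with about 50% overlap) can be sketched as follows. This is only an illustration: the function name `split_into_frames`, its parameters, and the 8000 Hz sampling rate used in the usage note are assumptions, since the source does not specify a sampling rate or a framing routine.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=30.0, overlap=0.5):
    """Split a sample sequence into fixed-length, overlapping frames.

    frame_ms and overlap default to the typical values quoted in the text;
    everything else here is a hypothetical helper, not the patent's code.
    """
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Hop size: how far the window advances between adjacent frames.
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    frames = []
    start = 0
    # Only full frames are kept; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop
    return frames
```

With an assumed 8000 Hz rate, a 30 msec frame holds 240 samples and a 50% overlap advances the window by 120 samples. Each resulting frame would then be passed to feature extraction to produce one test feature vector.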
  • the sequence of test feature vectors 28 supplied by the feature extraction 26 is provided to a pattern recognition algorithm 30.
  • the pattern recognition algorithm consists of two primary parts: a scoring algorithm 31 and a decision procedure 36.
  • a set of reference templates 32 from a memory 34 is passed to the scoring algorithm 31 to serve as a reference.
  • the memory storage area 34 is typically of the type depicted in Fig. 1.
  • reference templates consisting of sequences of parameter vectors, are stored in the memory 34 during a process called training (not shown) .
  • Training typically consists of inputting signals of a certain class to the system according to the steps of voice detection through feature extraction described above. Following these steps, the input signals are processed, and reference templates 32 are generated and stored in the memory area 34.
  • the parameter vectors of the template provided by the training procedure represent characteristic features of the class of input signals.
  • a template may represent utterances of a particular word or of a particular subword unit such as a syllable or a phoneme.
  • the template may represent the voice of a particular person.
  • each parameter vector is a feature vector of a reference utterance.
  • the parameter vectors may include parameters defining a model for a feature sequence of a test utterance.
  • a novel approach to pattern recognition, using a modification of the dynamic programming method for the scoring procedure, is achieved based on a method of path estimation and normalization of an accumulated similarity score as described in detail hereinbelow.
  • the novel approach to pattern recognition uses a modified Dynamic Time Warping algorithm or alternatively, a Hidden Markov Model algorithm for the scoring algorithm 31.
  • any other suitable dynamic programming based algorithm may be used instead of the examples offered herein.
  • the output of the scoring algorithm 31 is a set of final similarity scores (as defined hereinbelow) , with each score indicating the similarity between the sequence of test vectors 28 and each of the reference templates 32.
  • the scoring algorithm output is typically provided to decision procedure 36 which may comprise a k-NN (k-Nearest Neighbor) rule for determination of the class of inputs to which the pattern between the beginning and endpoints in input signal 12 belongs.
  • the overall output of the pattern recognition procedure provides a code or index 40, which describes the class of inputs to which the pattern between the beginning and the endpoints in input signal 12 belongs.
  • this code or index indicates the verbal contents of input signal 12.
  • the code or index indicates the identity of the speaker who uttered the speech embodied in the input signal 12.
  • FIG. 3 shows a graphical representation of a preferred embodiment of a part of the sequence of the pattern recognition procedure of Fig. 2 in accordance with a preferred embodiment of the invention.
  • the graph representation shows a non-linear time warping function which may be used for scoring the
  • the time warping function maps the time axis of a test feature sequence 50 to the time axis of a reference template 52.
  • the mapping provides a time registration between the reference template 52, which is preferably provided by the memory storage area 34 (Fig. 2) and the test feature 50, which may be provided by the feature extraction device 26 (Fig. 2) .
  • the reference template 52 comprises a sequence of M parameter vectors representing a word from a vocabulary recognizable by a speech recognition system such as the speech recognition system of Fig. 2. M may vary according to the particular reference template.
  • the test feature sequence 50 comprises a sequence of N test feature vectors.
  • the graph comprises a grid in which each point (n,m) is associated with a local similarity score, where m denotes the m-th parameter vector of the reference template and n denotes the n-th test feature vector in the sequence of test feature vectors.
  • the skilled professional may determine the local similarity score associated with each pair of test feature vectors and reference parameter vectors according to his considerations.
  • the local similarity scores may be determined by computing standard Euclidean or
  • the local similarity score may be determined by a speech specific distortion measure such as the likelihood ratio distortion measure proposed by Itakura in the article, "Minimum Prediction Residual Principle Applied to Speech
  • the local similarity score may be probabilistic.
  • the probabilistic local similarity score could be computed using a parametric function of the test feature vector, which depends on the reference parameter vector. The function value provides a statistical estimate of the minus log of the likelihood of observing the test feature vector in a particular segment of the reference word.
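As a minimal sketch of the grid of local similarity scores, the following computes a Euclidean distance for every pair of test feature vector and reference parameter vector. The names `euclidean_score` and `local_score_grid` are hypothetical, indices are 0-based here (the text counts from 1), and any other distortion measure could be passed in place of the Euclidean one.

```python
import math


def euclidean_score(test_vec, ref_vec):
    """Euclidean distance between two equal-length vectors (lower = more similar)."""
    return math.sqrt(sum((t - r) ** 2 for t, r in zip(test_vec, ref_vec)))


def local_score_grid(test_seq, ref_seq, score=euclidean_score):
    """Return grid[n][m], the local similarity score between test feature
    vector n and reference parameter vector m (0-based indices)."""
    return [[score(t, r) for r in ref_seq] for t in test_seq]
```

For a test sequence of N vectors and a reference of M vectors the result is an N-by-M grid, which is the input a dynamic programming search over warping paths would consume.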
  • a feasible warping path 54 is a sequence of grid points which satisfies certain constraints. Specific constraints are determined by the skilled professional. A typical constraint requires the feasible warping path to map the beginning and ending feature vectors of the test to the beginning and ending parameter vectors of the reference, respectively. Another typical constraint is that the slope of the warping path will be within a specified limit, typically between 1:2 and 2:1.
  • Fig. 4 shows a pseudo-code description of a scoring algorithm as part of the pattern recognition in the speech recognition system of Fig. 2 in accordance with a preferred embodiment of a dynamic programming
  • the algorithm of Fig. 4 can be implemented by the digital processing circuitry 4 of Fig. 1.
  • the algorithm can be implemented using other suitable computing hardware in accordance with state-of-the-art electronic design and programming techniques.
  • the scoring procedure which is typically based on a Dynamic Time Warping algorithm, or alternatively, on a Hidden Markov Model algorithm, is preferably used to determine the similarity between a test utterance and reference word in speech recognition procedures.
  • initial values are assigned to each point in search area 56, where the search area is as defined above.
  • This step is independent of the content of the sequence of test feature vectors, and depends only on the number N of test feature vectors in a certain sequence and the number M of parameter vectors in a reference template.
  • a set of path beginning grid points and a set of path ending grid points are defined.
  • a typical definition of the beginning set is:
  • x., x. are the maximum expected beginning and end errors of the boundary detector at the beginning and at the end of the test word (assuming that the reference boundaries are sufficiently accurate) .
  • (2) For each grid point in the search area, as defined hereinabove, a list of "access paths" is defined.
  • An access path is a short path leading from a neighboring grid point to a given grid point.
  • the access paths should be defined in such a way that a concatenation of access paths leading from a path beginning grid point to a path ending grid point constitutes a feasible path (as defined above) . Additionally, any feasible path must be representable as a concatenation of access paths from a path beginning grid point to a path ending grid point.
  • the rule is described in the article, incorporated herein by reference, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition", published in IEEE Trans. Acoustics, Speech and Signal Processing, Vol. ASSP-26, Feb. 1978, pp. 43-49.
  • an access path may be defined by a left to right finite state automaton where each reference parameter vector is represented by a state and each grid point (n,m) indicates that at time n, the automaton has reached state m.
  • An access path to a grid point (n,m) is a two-point path of the form [(n-1,k), (n,m)] where there exists a transition leading from the state representing the k-th reference parameter vector to the state representing the m-th reference parameter vector.
  • Such a definition is common in Hidden Markov Models.
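The automaton-based definition of access paths can be illustrated with a short sketch. The helper names and the `max_jump=2` topology (each state may stay put or advance by one or two states) are assumptions chosen for illustration; the source only requires a left-to-right automaton and does not fix its transitions.

```python
def left_to_right_transitions(num_states, max_jump=2):
    """Hypothetical left-to-right topology: state k (1-based) may move to
    states k, k+1, ..., k+max_jump, capped at num_states."""
    return {k: set(range(k, min(num_states, k + max_jump) + 1))
            for k in range(1, num_states + 1)}


def access_paths(n, m, transitions):
    """List the two-point access paths [(n-1, k), (n, m)] into grid point
    (n, m): one for every state k that has a transition into state m."""
    if n <= 1:
        return []  # the first time index has no predecessor
    return [((n - 1, k), (n, m))
            for k in sorted(transitions) if m in transitions[k]]
```

For example, with four states and the assumed topology, grid point (5, 3) can be reached from states 1, 2 and 3 at time 4.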
  • STEP 2 LOOP ON GRID POINTS IN SEARCH AREA:
  • a local weight may be defined indicating the significance of the local similarity score at that point.
  • a bias at the point (n,m) may be defined to indicate the apriori likelihood of the feasible path passing through that point.
  • the accumulated similarity score, D(n,m) of a feasible path containing the grid point (n,m) is the sum of all biases along the path from the path beginning to the point (n,m) , plus the sum of all local similarity scores from the path beginning to the point (n,m) , where each local score is multiplied by a corresponding local weight.
  • the local similarity score is calculated according to the methods outlined above and the bias and local weight are calculated as defined below.
  • the overall weight, W(n,m) of a path containing the point (n,m) is the sum of all local weights along that path from its beginning to its ending.
  • the accumulated weight, B(n,m) of a path containing the point (n,m) is the sum of all local weights along the path, from the path beginning till the point (n,m) .
  • the future weight, F(n,m) of a path containing the point (n,m) is the sum of all local weights along the path, from the point following (n,m) till the path end.
  • the overall weight is the sum of the accumulated weight and the future weight.
  • the optimal normalized similarity score, A*(n,m) is the minimum of the normalized similarity scores A(n,m) , taken over all feasible paths containing (n,m) .
  • the optimal feasible path through (n,m) is the path for which A(n,m) was minimal. If there are more than one such paths, the choice of the optimal one is
  • the optimal overall weight W*(n,m), the optimal accumulated weight B*(n,m), the optimal future weight F*(n,m) and the optimal accumulated similarity score D*(n,m) are the overall weight W(n,m) , the accumulated weight B(n,m) , the future weight F(n,m) and the accumulated similarity score D(n,m) respectively, associated with the optimal feasible path through (n,m) .
  • the optimal path beginning grid point b*(n,m), and the optimal path ending grid point e*(n,m) are the beginning and ending points, respectively, of the optimal feasible path through (n,m) (in the original, e and b are underlined to indicate that each represents a pair of coordinates).
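The definitions above can be collected in one place. The following is a restatement under assumed notation: the symbols β(i,j) for the bias, w(i,j) for the local weight and d(i,j) for the local similarity score are introduced here for illustration, and P denotes a feasible path through (n,m), split into the part up to and including (n,m) and the part after it. The ratio defining A(n,m) is the one the scoring steps use (accumulated similarity score divided by overall weight).

```latex
\begin{align*}
D(n,m) &= \sum_{(i,j) \in P_{\le (n,m)}} \bigl[\beta(i,j) + w(i,j)\, d(i,j)\bigr] \\
B(n,m) &= \sum_{(i,j) \in P_{\le (n,m)}} w(i,j) \\
F(n,m) &= \sum_{(i,j) \in P_{> (n,m)}} w(i,j) \\
W(n,m) &= B(n,m) + F(n,m) \\
A(n,m) &= \frac{D(n,m)}{W(n,m)} \\
A^{*}(n,m) &= \min_{P \ni (n,m)} A(n,m)
\end{align*}
```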
  • the local similarity score d(n,m) at point (n,m) is computed according to the methods outlined above.
  • STEP 2.2 ESTIMATING THE FUTURE WEIGHT.
  • the optimal future weight, F*(n,m), is predicted.
  • F*(n,m) is the average of the future weights from (n,m) to each of the path ending grid points which are accessible from (n,m) by a feasible path.
  • F*(n,m) may be the median of those future weights.
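The two estimators mentioned above (average or median of the future weights to the accessible path ending grid points) can be sketched as below. The function name `predict_future_weight` is hypothetical, and the caller is assumed to have already computed the future weight to each accessible ending point; how those individual weights are obtained is not shown here.

```python
from statistics import mean, median


def predict_future_weight(future_weights, use_median=False):
    """Predict F*(n,m) from the future weights to every path ending grid
    point that is accessible from (n,m).

    future_weights: non-empty list of numbers, one per accessible ending point.
    """
    if not future_weights:
        raise ValueError("no path ending grid point is accessible")
    estimate = median(future_weights) if use_median else mean(future_weights)
    return float(estimate)
```

The median variant is less sensitive to a single distant ending point, which may matter when the set of path ending grid points is wide.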
  • initial estimates for the optimal scores of a grid point (n,m) are established, based on the assumption that the optimal path begins at that point.
  • If (n,m) is in the set of path beginning grid points (as defined in step 1), the initial estimates are computed according to the following steps.
  • a typical value for the bias is 0 and a typical value for the local weight is 2.
  • a typical value for the bias may be minus log of the likelihood that the path begins at the given point (n,m) and the local weight may be set equal to 1.
  • the value of the bias is estimated during the training procedure.
  • the optimal beginning point is set to be the same point: b*(n,m) = (n,m).
  • the optimal accumulated weight, B*(n,m) gets the value of the local weight.
  • the optimal overall weight W*(n,m) is the sum of optimal accumulated and future weights, B*(n,m)+F*(n,m) .
  • the optimal accumulated similarity score, D*(n,m), is the bias for the point (n,m) plus the local similarity score of that same point multiplied by the local weight of the point.
  • the optimal normalized similarity score, A*(n,m), is the optimal accumulated similarity score divided by the optimal overall weight D*(n,m)/W*(n,m) .
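The initialization just described can be written out as a small sketch. The function name and the dictionary layout are assumptions made for illustration; the default bias of 0 and local weight of 2 are the typical values quoted in the text.

```python
def initial_estimates(point, local_score, future_weight, bias=0.0, local_weight=2.0):
    """Initial optimal estimates for a path beginning grid point, assuming
    the optimal path begins at that point.

    Returns a dict with keys b (beginning point), B (accumulated weight),
    W (overall weight), D (accumulated score) and A (normalized score).
    """
    B = local_weight                       # accumulated weight = local weight
    W = B + future_weight                  # overall = accumulated + predicted future
    D = bias + local_score * local_weight  # accumulated similarity score
    return {"b": point, "B": B, "W": W, "D": D, "A": D / W}
```

For instance, a beginning point with local score 3.0 and predicted future weight 8.0 yields B = 2, W = 10, D = 6 and a normalized score of 0.6.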
  • one of the access paths leading to a point (n,m) is checked for the hypothesis that the optimal path through (n,m) contains that particular access path. This is done by computing the normalized similarity score for a particular access path under this hypothesis and then comparing it to the current estimated value of the optimal normalized similarity score. If the computed value is smaller than the current estimate, all current estimates of optimal scores for that point (n,m) are replaced by the computed value.
  • the bias may be minus log of the likelihood of moving to the current grid point from the preceding one (this likelihood may typically be determined during training) and the local weight is 1. This is the common case in Hidden Markov Model devices.
  • the accumulated similarity score D(n,m) is computed for a path which comprises the concatenation of the optimal path to (p,q) and the given access path. Therefore D(n,m) is calculated as D*(p,q) plus the sum of all biases along the given access path (except for the first point (p,q)) plus the sum of all local similarity scores along the access path (except for the first point (p,q)), each multiplied by the corresponding local weight.
  • the overall weight W(n,m) is computed by adding the accumulated weight B(n,m) to the estimated optimal future weight F*(n,m).
  • STEP 2.4.4 COMPUTE NORMALIZED SIMILARITY SCORE FOR GIVEN ACCESS PATH
  • A(n,m) is computed for a path which contains the concatenation of the optimal path to (p,q) and the given access path. Therefore A(n,m) is calculated as D(n,m) divided by W(n,m).
  • if the normalized similarity score for the given access path, A(n,m), is less than the current estimate of the optimal normalized similarity score, A*(n,m), the following step is performed:
  • STEP 2.4.5.1 ASSIGN NEW OPTIMAL VALUES
  • the current estimate for the optimal path through (n,m) is updated to be a path which contains the concatenation of the optimal path to point (p,q) and the given access path.
  • D*(n,m), B*(n,m), W*(n,m), and A*(n,m) are replaced by the values corresponding to the updated optimal path, that is, D(n,m) , B(n,m) , W(n,m) , and A(n,m) , respectively.
  • the optimal path beginning grid point b*(n,m) is set to be equal to b*(p,q), the optimal path beginning grid point of the beginning point of the given access path.
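The test-and-replace step for a single access path can be sketched as follows. The function name, argument names and dictionary layout are hypothetical. The accumulated weight B(n,m) is taken here to be B*(p,q) plus the local weights along the access path, which follows from the definition of the accumulated weight rather than from an explicit step in the text.

```python
def try_access_path(current, predecessor, path_bias, path_weighted_score,
                    path_weight, future_weight):
    """Test the hypothesis that the optimal path through (n,m) arrives via
    one given access path starting at (p,q).

    current:     current optimal estimates for (n,m), or None if there are none
    predecessor: optimal estimates at (p,q), with keys b, B, D
    path_*:      sums along the access path, excluding its first point (p,q)
    Returns the estimates to keep for (n,m).
    """
    D = predecessor["D"] + path_bias + path_weighted_score  # accumulated score
    B = predecessor["B"] + path_weight                      # accumulated weight
    W = B + future_weight                                   # overall weight
    A = D / W                                               # normalized score
    if current is None or A < current["A"]:
        # The hypothesis wins: inherit the beginning point of the path to (p,q).
        return {"b": predecessor["b"], "B": B, "W": W, "D": D, "A": A}
    return current
```

Calling this once per access path into (n,m), and feeding each result back in as `current`, leaves the estimates of the best hypothesis in place.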
  • the minimal value of A*(n,m), over all the points in the set of path ending grid points (as defined in step 1) is the final normalized similarity score.
  • the feasible path associated with the final normalized score is the final path.
  • the path ending grid point (n,m) of the final path is the final path ending grid point.
  • the optimal path beginning grid point of the final path, b*(n,m) is the final path beginning grid point.
  • STEP 3.2 DETERMINE FINAL BEGIN AND END ESTIMATES
  • the first coordinates of the final path beginning grid point and of the path ending grid point are the final estimates for the beginning and ending of a test utterance, respectively.
  • the second coordinate of these grid points indicates the beginning and ending, respectively, of the part of a reference template
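Putting the steps together, the following is a compact end-to-end sketch of a boundary-relaxed, normalized dynamic programming search. It is a simplified illustration under several assumptions that are not taken from the source: 0-based indices, zero bias, unit local weights, access paths from (n-1, m), (n-1, m-1) and (n-1, m-2), beginning points (n, 0) with n ≤ x_b, ending points (n, M-1) with n ≥ N-1-x_e, Euclidean local scores, and a future weight predicted as the mean number of remaining frames. The name `relaxed_dtw` is hypothetical.

```python
import math


def relaxed_dtw(test_seq, ref_seq, x_b=1, x_e=1):
    """Return (final normalized score, begin point, end point), 0-based."""
    N, M = len(test_seq), len(ref_seq)
    if N == 0 or M == 0:
        raise ValueError("both sequences must be non-empty")
    first_end = max(0, N - 1 - x_e)  # earliest allowed ending frame

    def future(n):
        # Predicted future weight: mean remaining frame count to each
        # allowed ending frame at or after n (unit local weights assumed).
        ends = range(max(n, first_end), N)
        return sum(e - n for e in ends) / len(ends)

    # best[n][m] = (A*, D*, B*, b*), or None when the point is unreachable.
    best = [[None] * M for _ in range(N)]
    for n in range(N):
        F = future(n)
        for m in range(M):
            d = math.dist(test_seq[n], ref_seq[m])  # local similarity score
            cands = []
            if m == 0 and n <= x_b:                 # a path may begin here
                cands.append((d, 1.0, (n, m)))
            for k in (0, 1, 2):                     # access paths from (n-1, m-k)
                if n >= 1 and m - k >= 0 and best[n - 1][m - k] is not None:
                    _, D_prev, B_prev, b_prev = best[n - 1][m - k]
                    cands.append((D_prev + d, B_prev + 1.0, b_prev))
            for D, B, b in cands:
                A = D / (B + F)                     # normalized score
                if best[n][m] is None or A < best[n][m][0]:
                    best[n][m] = (A, D, B, b)

    # Final score: the smallest A* over the reachable path ending grid points.
    finals = [(best[n][M - 1][0], best[n][M - 1][3], (n, M - 1))
              for n in range(first_end, N) if best[n][M - 1] is not None]
    if not finals:
        raise ValueError("no feasible path reaches a path ending grid point")
    return min(finals, key=lambda f: f[0])
```

As a usage example, a test sequence that is the reference preceded by one spurious frame should be matched from its second frame onward, so the returned beginning point doubles as a corrected boundary estimate. Note that this sketch keeps the predicted future weight in the score even at ending points that are not the last frame; whether it should be zeroed there is not stated in the text.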

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The speech recognition algorithm is implemented in a computer program by feeding a speech input signal to a codec (2) and processing it in a standard computer (4) using reference patterns stored in memory (6). The algorithm applies the well-known dynamic programming technique, extended to include weighting and normalization functions.
PCT/US1991/007165 1990-10-02 1991-10-02 Boundary relaxation for speech pattern recognition WO1992006469A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
IL9586990A IL95869A (en) 1990-10-02 1990-10-02 Boundary relaxation for speech pattern recognition
IL95869 1990-10-02
IL98092 1991-05-09
IL98092A IL98092A0 (en) 1991-05-09 1991-05-09 Boundary relaxation for speech pattern recognition

Publications (1)

Publication Number Publication Date
WO1992006469A1 true WO1992006469A1 (fr) 1992-04-16

Family

ID=26322136

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1991/007165 WO1992006469A1 (fr) 1990-10-02 1991-10-02 Boundary relaxation for speech pattern recognition

Country Status (2)

Country Link
EP (1) EP0551374A4 (fr)
WO (1) WO1992006469A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19930522A1 (de) * 1999-07-05 2001-02-01 Univ Ilmenau Tech Verfahren zur Erkennung von Lautsignalen

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4256924A (en) * 1978-11-22 1981-03-17 Nippon Electric Co., Ltd. Device for recognizing an input pattern with approximate patterns used for reference patterns on mapping
US4400788A (en) * 1981-03-27 1983-08-23 Bell Telephone Laboratories, Incorporated Continuous speech pattern recognizer
US4400828A (en) * 1981-03-27 1983-08-23 Bell Telephone Laboratories, Incorporated Word recognizer
US4467437A (en) * 1981-03-06 1984-08-21 Nippon Electric Co., Ltd. Pattern matching device with a DP technique applied to feature vectors of two information compressed patterns
US4570232A (en) * 1981-12-21 1986-02-11 Nippon Telegraph & Telephone Public Corporation Speech recognition apparatus
US4624008A (en) * 1983-03-09 1986-11-18 International Telephone And Telegraph Corporation Apparatus for automatic speech recognition
US4751737A (en) * 1985-11-06 1988-06-14 Motorola Inc. Template generation method in a speech recognition system

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ICASSP'86, Tokyo, April 1986, QUENOT et al., "A Dynamic time warp VLSI processor for continuous speech recognition", see esp. fig. 2. *
IEEE Trans. on ASSP, Vol. 32, No. 2, April 1984, NEY, "The use of a One-Stage Dynamic Programming Algorithm for connected word recognition", pages 263-271, see esp. pages 265, 269. *
IEEE Trans. on ASSP, Vol. 26, No. 1, February 1978, SAKOE et al., "Dynamic Programming Algorithm Optimization for Spoken Word Recognition", pages 43-49, see esp. page 44. *
IEEE Trans. on ASSP, Vol. 36, No. 9, September 1988, IRWIN, "A Digit Pipelined Dynamic Time Warp Processor", pages 1412-1422, see esp. pages 1413, 1415 (Fig. 4), 1418 (Figs. 9, 11) and page 1420. *
See also references of EP0551374A4 *

Also Published As

Publication number Publication date
EP0551374A4 (en) 1995-02-15
EP0551374A1 (fr) 1993-07-21

Similar Documents

Publication Publication Date Title
US6125345A (en) Method and apparatus for discriminative utterance verification using multiple confidence measures
US8532991B2 (en) Speech models generated using competitive training, asymmetric training, and data boosting
US4918732A (en) Frame comparison method for word recognition in high noise environments
JP3549681B2 (ja) Utterance verification for connected digit recognition
US7447634B2 (en) Speech recognizing apparatus having optimal phoneme series comparing unit and speech recognizing method
US6226612B1 (en) Method of evaluating an utterance in a speech recognition system
US7027985B2 (en) Speech recognition method with a replace command
US6029124A (en) Sequential, nonparametric speech recognition and speaker identification
US6317711B1 (en) Speech segment detection and word recognition
US7318032B1 (en) Speaker recognition method based on structured speaker modeling and a “Pickmax” scoring technique
US5459815A (en) Speech recognition method using time-frequency masking mechanism
EP0601778A1 (fr) Keyword/non-keyword classification in isolated word speech recognition
US20060190259A1 (en) Method and apparatus for recognizing speech by measuring confidence levels of respective frames
US20020049593A1 (en) Speech processing apparatus and method
JPH07334184A (ja) Acoustic category mean value calculation device and adaptation device
McDermott et al. Prototype-based minimum classification error/generalized probabilistic descent training for various speech units
US4937870A (en) Speech recognition arrangement
WO1987004294A1 (fr) Frame comparison method for word recognition in high noise environments
Sanchís et al. Improving utterance verification using a smoothed naive bayes model
EP0177854B1 (fr) Keyword recognition system using strings of language elements
WO1992006469A1 (fr) Boundary relaxation for speech pattern recognition
JP2853418B2 (ja) Speech recognition method
IL95869A (en) Boundary relaxation for speech pattern recognition
Sharma et al. Speech recognition of Punjabi numerals using synergic HMM and DTW approach
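JP2003271185A (ja) Speech recognition information creation apparatus and method, speech recognition apparatus and method, speech recognition information creation program and recording medium storing the program, and speech recognition program and recording medium storing the program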

Legal Events

Date Code Title Description
AK Designated states Kind code of ref document: A1; Designated state(s): JP SU
AL Designated countries for regional patents Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE
WWE WIPO information: entry into national phase Ref document number: 1991917937; Country of ref document: EP
WWP WIPO information: published in national office Ref document number: 1991917937; Country of ref document: EP
WWW WIPO information: withdrawn in national office Ref document number: 1991917937; Country of ref document: EP