
CN100367347C - Speech encoder and speech decoder - Google Patents


Info

Publication number
CN100367347C
Authority
CN
China
Prior art keywords
vector
diffusion
unit
pulse
code
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CNB988015560A
Other languages
Chinese (zh)
Other versions
CN1242860A (en)
Inventor
安永和敏 (Kazutoshi Yasunaga)
森井利幸 (Toshiyuki Morii)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Godo Kaisha IP Bridge 1
Original Assignee
Matsushita Electric Industrial Co Ltd
Application filed by Matsushita Electric Industrial Co Ltd
Publication of CN1242860A
Application granted
Publication of CN100367347C
Anticipated expiration
Expired - Lifetime (current legal status)


Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)
  • Semiconductor Integrated Circuits (AREA)

Abstract

An excitation vector generation apparatus comprising: a pulse vector generation unit having N channels (N ≥ 1) for generating pulse vectors; a storage unit that stores M diffusion patterns (M ≥ 1) for each of the N channels; a selection unit that selectively retrieves a diffusion pattern from the storage unit for each channel; a diffusion unit that, for each channel, convolves the retrieved diffusion pattern with the generated pulse vector to produce N diffusion vectors; and an excitation vector generation unit that generates an excitation vector from the N diffusion vectors.

Description

Speech signal encoder and speech signal decoder

Technical field

The present invention relates to a speech signal encoder and a speech signal decoder for efficiently encoding and decoding speech information.

Background art

Speech coding techniques for efficiently encoding and decoding speech information are currently being developed. A CELP (code-excited linear prediction) speech signal encoder based on such a technique is described in "Code Excited Linear Prediction: High Quality Speech at Low Bit Rate" (M. R. Schroeder, ICASSP '85, pp. 937-940). This encoder performs linear prediction on each frame obtained by dividing the input speech into fixed time intervals, obtains the prediction residual (excitation signal) of each frame from the linear prediction, and encodes that residual using an adaptive codebook, which stores past excitation, and a random codebook, which stores multiple random code vectors.

FIG. 1 shows a functional block diagram of a conventional CELP speech signal encoder (hereinafter simply "speech encoder").

The linear prediction analysis unit 12 performs linear prediction analysis on the speech signal 11 input to the CELP speech encoder, yielding linear prediction coefficients: parameters that represent the spectral envelope of the speech signal 11. The coefficients obtained by unit 12 are quantized in the linear prediction coefficient encoding unit 13, and the quantized coefficients are sent to the linear prediction coefficient decoding unit 14. The quantization index is also passed to the code output unit 24 as the linear prediction code. The linear prediction coefficient decoding unit 14 decodes the quantized coefficients from unit 13 to obtain the synthesis filter coefficients, which it outputs to the synthesis filter 15.
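The description does not say how unit 12 computes the linear prediction coefficients. A common choice, assumed here, is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below uses the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, omits windowing, and uses illustrative function names that are not taken from the patent.

```python
def autocorrelation(x, order):
    """Autocorrelation r[0..order] of one (already windowed) frame."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns (a, err) with a[0] == 1.0 and err the final prediction error energy.
    Assumes r[0] > 0 (a non-silent frame).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient of stage i
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

For a 10th-order analysis one would call `levinson_durbin(autocorrelation(frame, 10), 10)`; quantization of the resulting coefficients (unit 13) is a separate step not shown.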

The adaptive codebook 17 outputs multiple candidate adaptive code vectors and consists of a buffer storing the excitation of several past frames. An adaptive code vector is a time-series vector representing the periodic component of the input speech.
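The adaptive codebook can be viewed as a window into the buffer of past excitation samples. A minimal sketch, assuming an integer pitch lag and the common convention of repeating the last `lag` samples when the lag is shorter than the vector length (fractional lags and interpolation are omitted; the function name is hypothetical):

```python
def adaptive_code_vector(past_excitation, lag, length):
    """Candidate adaptive code vector of `length` samples for integer pitch lag `lag`."""
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag out of range")
    # the last `lag` samples of the past excitation
    segment = past_excitation[len(past_excitation) - lag:]
    # v(n) = v(n - lag): periodic extension when lag < length
    return [segment[n % lag] for n in range(length)]
```

Each admissible lag yields one candidate, so the adaptive code number can simply encode the lag.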

The random codebook 18 stores multiple candidate random code vectors, the number of which corresponds to the number of allocated bits. A random code vector is a time-series vector representing the aperiodic component of the input speech.

The adaptive code gain weighting unit 19 and the random code gain weighting unit 20 multiply the candidate vectors output by the adaptive codebook 17 and the random codebook 18 by the adaptive code gain and the random code gain, respectively, read from the weighted (gain) codebook 21, and output the results to the adder 22.

The weighted codebook is a memory that stores multiple weights to be multiplied by candidate adaptive code vectors and multiple weights to be multiplied by candidate random code vectors; the number of entries corresponds to the number of allocated bits.

The adder 22 adds the candidate adaptive code vector and the candidate random code vector, weighted by units 19 and 20 respectively, to generate a candidate excitation vector, which it outputs to the synthesis filter 15.

The synthesis filter 15 is an all-pole filter built from the synthesis filter coefficients obtained by the linear prediction coefficient decoding unit 14. When a candidate excitation vector from the adder 22 is input, the synthesis filter 15 outputs a candidate synthesized speech vector.
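An all-pole synthesis filter 1/A(z) computes each output sample from the current excitation sample and past output samples. A sketch assuming the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, with the filter state carried from one frame to the next (the function name and state layout are illustrative):

```python
def synthesis_filter(excitation, a, memory=None):
    """All-pole filter y(n) = x(n) - sum_{j=1..p} a[j] * y(n - j).

    `a` holds [1, a1, ..., ap]; `memory` is [y(n-1), ..., y(n-p)] from the previous frame.
    Returns (output, new_memory).
    """
    p = len(a) - 1
    state = list(memory) if memory is not None else [0.0] * p
    out = []
    for x in excitation:
        y = x - sum(a[j] * state[j - 1] for j in range(1, p + 1))
        out.append(y)
        if p:
            state = [y] + state[:-1]  # shift in the newest output
    return out, state
```

In the encoder's search loop the state is usually reset or copied per candidate, while the decoder keeps it continuous across frames.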

The distortion calculation unit 16 calculates the distortion between the output of the synthesis filter 15 (the candidate synthesized speech vector) and the input speech 11 and outputs the distortion value to the code number specifying unit 23. For each of the three codebooks (adaptive, random, and weighted), unit 23 specifies the code number (adaptive code number, random code number, and weighted code number) that minimizes the distortion calculated by unit 16, and outputs these three code numbers to the code output unit 24. The code output unit 24 combines the linear prediction code number from unit 13 with the adaptive, random, and weighted code numbers from unit 23 and outputs them to the transmission line.

FIG. 2 shows a functional block diagram of a CELP speech signal decoder (hereinafter simply "speech decoder") that decodes signals encoded by the above encoder. In this decoder, the code input unit 31 receives the code sent from the speech encoder (FIG. 1), separates it into the linear prediction code number, adaptive code number, random code number, and weighted code number, and outputs these to the linear prediction coefficient decoding unit 32, the adaptive codebook 33, the random codebook 34, and the weighted codebook 35, respectively.

Next, the linear prediction coefficient decoding unit 32 decodes the linear prediction code number from the code input unit 31 to obtain the synthesis filter coefficients, which it outputs to the synthesis filter 39. The adaptive code vector is then read from the position in the adaptive codebook corresponding to the adaptive code number, the random code vector corresponding to the random code number is read from the random codebook, and the adaptive code gain and random code gain corresponding to the weighted code number are read from the weighted codebook. The adaptive code weighting unit 36 multiplies the adaptive code vector by the adaptive code gain and sends the result to the adder 38; likewise, the random code vector weighting unit 37 multiplies the random code vector by the random code gain and sends the result to the adder 38.

The adder 38 adds the two weighted code vectors to produce the excitation vector, which is fed back to the adaptive codebook 33 to update its buffer and is also fed to the synthesis filter 39 to drive it. The synthesis filter 39, driven by the excitation vector from the adder 38 and using the coefficients output by the linear prediction coefficient decoding unit 32, reproduces the synthesized speech.
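The excitation reconstruction and buffer update performed around the adder 38 can be sketched as follows. This is an illustration under the assumption that the adaptive codebook buffer has a fixed length and simply drops its oldest samples; the function name is hypothetical.

```python
def decode_excitation(p, c, ga, gc, past_excitation):
    """Return (u, updated_buffer) with u = ga * p + gc * c.

    The new excitation u is appended to the adaptive-codebook buffer, and the
    oldest samples are discarded so the buffer length stays constant.
    """
    u = [ga * pn + gc * cn for pn, cn in zip(p, c)]
    combined = list(past_excitation) + u
    updated = combined[len(combined) - len(past_excitation):]
    return u, updated
```

The returned `u` would then drive the synthesis filter, and `updated` becomes the adaptive codebook contents for the next frame.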

The distortion calculation unit 16 of a CELP speech encoder generally computes the distortion E using Eq. (1):

$$E = \lVert V - (g_a H P + g_c H C) \rVert^2 \qquad (1)$$

V: input speech signal (vector)

H: synthesis filter impulse response convolution matrix

$$H = \begin{pmatrix} h(0) & 0 & \cdots & 0 \\ h(1) & h(0) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h(L-1) & h(L-2) & \cdots & h(0) \end{pmatrix}$$

where h is the impulse response (vector) of the synthesis filter and L is the frame length,

P: adaptive code vector

C: random code vector

ga: adaptive code gain

gc: random code gain
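Eq. (1) can be evaluated directly once H is built from the impulse response h. The sketch below assumes the lower-triangular Toeplitz form of the convolution matrix (so that H P is the zero-state filter response to P, truncated to L samples) and uses plain lists; the helper names are illustrative.

```python
def convolution_matrix(h):
    """L x L lower-triangular Toeplitz matrix with H[r][k] = h(r - k) for r >= k."""
    L = len(h)
    return [[h[r - k] if r >= k else 0.0 for k in range(L)] for r in range(L)]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def distortion(V, H, P, C, ga, gc):
    """Eq. (1): E = ||V - (ga H P + gc H C)||^2."""
    HP = matvec(H, P)
    HC = matvec(H, C)
    return sum((v - (ga * hp + gc * hc)) ** 2 for v, hp, hc in zip(V, HP, HC))
```

Evaluating this for every (adaptive, random, weighted) code number triple is the exhaustive closed-loop search that the next paragraphs describe as too costly.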

To minimize the distortion E of Eq. (1) exactly, the distortion would have to be computed in closed loop for every combination of adaptive, random, and weighted code numbers in order to specify each code number.

However, a closed-loop search of Eq. (1) over all combinations requires too much computation. Therefore, the adaptive code number is usually determined first by vector quantization with the adaptive codebook, then the random code number by vector quantization with the random codebook, and finally the weighted code number by vector quantization with the weighted codebook. The vector quantization using the random codebook in this scheme is described in more detail below.

When the adaptive code number and adaptive code gain have been determined in advance or provisionally, the distortion criterion of Eq. (1) reduces to Eq. (2):

$$E_c = \lVert X - g_c H C \rVert^2 \qquad (2)$$

Here the vector X in Eq. (2) is the noise source information (the target vector for specifying the random code number), obtained from Eq. (3) using the adaptive code number and adaptive code gain determined in advance or provisionally:

$$X = V - g_a H P \qquad (3)$$

ga: adaptive code gain

V: speech signal (vector)

H: synthesis filter impulse response convolution matrix

P: adaptive code vector

When the random code gain gc is specified after the random code number, gc in Eq. (2) can be assumed to take any value. It is well known that, under this assumption, the search for the random code vector number that minimizes Eq. (2) (vector quantization of the noise source information) can be replaced by a search for the random code vector number that maximizes the fraction in Eq. (4):

$$\frac{(X^{\top} H C)^2}{\lVert H C \rVert^2} \qquad (4)$$
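Expanding the squared norm shows why Eq. (4) can replace Eq. (2) when gc is unconstrained (this reads Eq. (2) as a squared error, consistent with Eq. (1)):

$$E_c(g_c) = \lVert X \rVert^2 - 2 g_c\, X^{\top} H C + g_c^2 \lVert H C \rVert^2$$

$$\frac{\partial E_c}{\partial g_c} = 0 \;\Rightarrow\; g_c^{*} = \frac{X^{\top} H C}{\lVert H C \rVert^2}$$

$$E_c(g_c^{*}) = \lVert X \rVert^2 - \frac{(X^{\top} H C)^2}{\lVert H C \rVert^2}$$

Because ‖X‖² does not depend on the candidate C, the candidate that minimizes E_c is exactly the one that maximizes the fraction in Eq. (4).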

That is, once the adaptive code number and adaptive code gain are known or provisionally specified, vector quantization of the noise source information becomes the search for the candidate random code vector number that maximizes the fraction of Eq. (4) computed by the distortion calculation unit 16.
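A minimal sketch of this search for a small, explicitly stored random codebook: it forms the target X of Eq. (3), scores every candidate with Eq. (4), and also returns the optimal unconstrained gain X^T HC / ‖HC‖². The function names and the list-of-lists codebook are assumptions for illustration.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def target_vector(V, H, P, ga):
    """Eq. (3): X = V - ga * H P."""
    HP = matvec(H, P)
    return [v - ga * hp for v, hp in zip(V, HP)]


def search_random_codebook(X, H, codebook):
    """Return (index, score, gain) maximizing Eq. (4) over the codebook entries."""
    best_index, best_score, best_gain = -1, float("-inf"), 0.0
    for index, c in enumerate(codebook):
        hc = matvec(H, c)
        energy = sum(v * v for v in hc)
        if energy == 0.0:
            continue  # an all-zero filtered candidate cannot reduce the error
        corr = sum(x * y for x, y in zip(X, hc))
        score = corr * corr / energy
        if score > best_score:
            best_index, best_score, best_gain = index, score, corr / energy
    return best_index, best_score, best_gain
```

Real codecs avoid the per-candidate matrix product with precomputed correlations; this version favors readability over speed.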

Early CELP encoders/decoders used as the random codebook a memory storing random number sequences, the number of which corresponds to the number of allocated bits. However, this approach requires a very large storage capacity, and computing the distortion of Eq. (4) for every candidate random code vector involves an enormous amount of computation.

One solution to this problem is a CELP speech encoder/decoder that uses an algebraic excitation vector generator, which generates excitation vectors algebraically, as described in "8 KBIT/S ACELP CODING OF SPEECH WITH 10 MS SPEECH-FRAME: A CANDIDATE FOR CCITT STANDARDIZATION" (R. Salami, C. Laflamme, and J-P. Adoul, ICASSP '94, pp. II-97 to II-100, 1994) and elsewhere.

However, a CELP encoder/decoder that uses such an algebraic excitation generator as its random codebook approximates the noise source information obtained from Eq. (3) (the target vector for specifying the random code number) with only a small number of pulses, which limits the achievable improvement in speech quality. Indeed, inspection of the elements of the noise source information X in Eq. (3) shows that X is almost never composed of only a few pulses, which explains this limitation.

Summary of the invention

An object of the present invention is to provide a new excitation vector generator capable of producing excitation vectors whose shape and statistics closely resemble those of excitation vectors obtained by analyzing actual speech signals.

Another object of the present invention is to provide a CELP speech signal encoder/decoder, a speech signal communication system, and a speech signal recording system that, by using the above excitation vector generator as the random codebook, produce synthesized speech of higher quality than when an algebraic excitation generator is used as the random codebook.

A first aspect of the present invention is an excitation vector generator comprising: a pulse vector generation unit having N channels (N ≥ 1), each of which generates a pulse vector in which a unit pulse with polarity is placed at one element on the vector axis; a diffusion pattern storage and selection unit that stores M diffusion patterns (M ≥ 1) for each of the N channels and selects one of the stored M diffusion patterns; a pulse vector diffusion unit that, for each channel, convolves the pulse vector output by the pulse vector generation unit with the diffusion pattern selected by the diffusion pattern storage and selection unit, producing N diffusion vectors; and a diffusion vector adder that adds the N diffusion vectors produced by the pulse vector diffusion unit to generate an excitation vector. Because the pulse vector generation unit generates the N pulse vectors algebraically and the diffusion pattern storage and selection unit stores diffusion patterns obtained in advance by learning the shape (characteristics) of actual speech vectors, the generator can produce excitation vectors whose shape is closer to that of actual excitation vectors than a conventional algebraic excitation generator can.

A second aspect of the present invention is a CELP speech signal encoder/decoder that uses the above excitation vector generator as its random codebook. Compared with a conventional speech encoder/decoder that uses an algebraic excitation generator as its random codebook, it can generate excitation vectors closer to the actual shape, yielding a speech signal encoder/decoder, a speech signal communication system, and a speech signal recording system capable of outputting higher-quality synthesized speech.

Brief description of the drawings

FIG. 1 is a functional block diagram of a conventional CELP speech signal encoder.

FIG. 2 is a functional block diagram of a conventional CELP speech signal decoder.

FIG. 3 is a functional block diagram of the excitation vector generator according to the first embodiment of the present invention.

FIG. 4 is a functional block diagram of the CELP speech signal encoder according to the second embodiment of the present invention.

FIG. 5 is a functional block diagram of the CELP speech signal decoder according to the second embodiment of the present invention.

FIG. 6 is a functional block diagram of the CELP speech signal encoder according to the third embodiment of the present invention.

FIG. 7 is a functional block diagram of the CELP speech signal encoder according to the fourth embodiment of the present invention.

FIG. 8 is a functional block diagram of the CELP speech signal encoder according to the fifth embodiment of the present invention.

FIG. 9 is a block diagram of the vector quantization function in the fifth embodiment.

FIG. 10 is a diagram explaining the target extraction algorithm in the fifth embodiment.

FIG. 11 is a functional block diagram of predictive quantization in the fifth embodiment.

FIG. 12 is a functional block diagram of predictive quantization in the sixth embodiment.

FIG. 13 is a functional block diagram of the CELP speech signal encoder in the seventh embodiment.

FIG. 14 is a functional block diagram of the distortion calculation unit in the seventh embodiment.

Detailed description of the embodiments

Embodiments of the present invention are described below with reference to the drawings.

(First embodiment)

FIG. 3 shows a functional block diagram of the excitation vector generator according to this embodiment. The generator comprises: a pulse vector generation unit 101 having multiple channels; a diffusion pattern storage and selection unit 102 having diffusion pattern memories and switches; a pulse vector diffusion unit 103 that diffuses the pulse vectors; and a diffusion vector adder 104 that adds the diffused pulse vectors of the multiple channels.

The pulse vector generation unit 101 has N channels (N = 3 in this embodiment), each of which generates a vector in which a unit pulse with polarity is placed at one element on the vector axis (hereinafter, a pulse vector).

The diffusion pattern storage and selection unit 102 has memories M1 to M3, which store M diffusion patterns for each channel (M = 2 in this embodiment), and switches SW1 to SW3, which select one of the M diffusion patterns from memories M1 to M3, respectively.

For each channel, the pulse vector diffusion unit 103 convolves the pulse vector output by the pulse vector generation unit 101 with the diffusion pattern output by the diffusion pattern storage and selection unit 102, producing N diffusion vectors.

The diffusion vector adder 104 adds the N diffusion vectors generated by the pulse vector diffusion unit 103 to generate the excitation vector 105.

This embodiment describes the case where the pulse vector generation unit 101 algebraically generates N = 3 pulse vectors according to the rules in Table 1 below.

Table 1

[Table 1: pulse generation rules for the three channels (image not reproduced in this text)]

The operation of the excitation vector generator configured as above is as follows. The diffusion pattern storage and selection unit 102 selects one of the two diffusion patterns stored for each channel and outputs it to the pulse vector diffusion unit 103. A unique number is assigned to each combination of selected diffusion patterns (M^N = 8 combinations in total).
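The description only states that each diffusion pattern combination receives a unique number; it does not give the mapping. One simple possibility, assumed here, is a mixed-radix (base-M) encoding of the per-channel pattern indices, which yields exactly M^N distinct numbers. Indices in this sketch are 0-based, whereas the text numbers patterns j = 1 to M; the function names are hypothetical.

```python
def combination_number(selection, M):
    """Map per-channel pattern indices (0-based, each < M) to a number in 0..M**N - 1."""
    number = 0
    for i, j in enumerate(selection):
        if not 0 <= j < M:
            raise ValueError("pattern index out of range")
        number += j * M ** i  # channel i is digit i in base M
    return number


def selection_from_number(number, M, N):
    """Inverse mapping: recover the pattern index chosen for each of the N channels."""
    return [(number // M ** i) % M for i in range(N)]
```

With M = 2 and N = 3, the eight combinations map one-to-one onto 0 to 7, so the diffusion pattern combination fits in 3 bits.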

Next, the pulse vector generation unit 101 algebraically generates one pulse vector per channel (three in this embodiment) according to the rules in Table 1.

The pulse vector diffusion unit 103 convolves the diffusion pattern selected by the diffusion pattern storage and selection unit 102 with the pulse generated by the pulse vector generation unit 101 according to Eq. (5), generating a diffusion vector for each channel.

$$c_i(n) = \sum_{k=0}^{L-1} w_{ij}(n-k)\, d_i(k) \qquad (5)$$

where n = 0 to L-1,

L: diffusion vector length

i: channel number

j: diffusion pattern number (j = 1 to M)

c_i: diffusion vector of channel i

w_ij: j-th diffusion pattern of channel i. The vector w_ij(m) has length 2L-1 (m = -(L-1) to L-1), but only L_ij of its 2L-1 elements take specified values; the remaining elements are zero.

d_i: pulse vector of channel i

d_i(n) = ±δ(n - p_i), n = 0 to L-1

p_i: candidate pulse position of channel i

The diffusion vector adder 104 adds the three diffusion vectors generated by the pulse vector diffusion unit 103 according to Eq. (6) to produce the excitation vector 105.

$$c(n) = \sum_{i=1}^{N} c_i(n) \qquad (6)$$

c: excitation vector

c_i: diffusion vector of channel i

i: channel number (i = 1 to N)

n: vector element number (n = 0 to L-1, where L is the excitation vector length)
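Eqs. (5) and (6) can be sketched directly. The assumption here is that the L_ij nonzero elements of each diffusion pattern w_ij(m) sit at m = 0 to L_ij - 1 (a causal pattern); Eq. (5) itself also allows nonzero elements at negative m. The pattern values in the usage example are made up, since the trained patterns are not given in this text.

```python
def pulse_vector(position, sign, L):
    """d_i(n) = sign * delta(n - p_i) for n = 0..L-1."""
    d = [0.0] * L
    d[position] = float(sign)
    return d


def diffuse(pattern, d):
    """Eq. (5): c_i(n) = sum_k w_ij(n - k) d_i(k), n = 0..L-1.

    Assumption: the nonzero elements of w_ij(m) are at m = 0..len(pattern) - 1.
    """
    L = len(d)

    def w(m):
        return pattern[m] if 0 <= m < len(pattern) else 0.0

    return [sum(w(n - k) * d[k] for k in range(L)) for n in range(L)]


def excitation_vector(pulses, chosen_patterns, L):
    """Eq. (6): sum of the N per-channel diffusion vectors.

    pulses: [(position, sign), ...], one entry per channel
    chosen_patterns: the diffusion pattern selected for each channel
    """
    c = [0.0] * L
    for (position, sign), pattern in zip(pulses, chosen_patterns):
        ci = diffuse(pattern, pulse_vector(position, sign, L))
        c = [a + b for a, b in zip(c, ci)]
    return c
```

For example, `excitation_vector([(0, 1), (3, -1)], [[1.0, 0.5], [1.0, 0.25]], 6)` places a positive pulse diffused by [1.0, 0.5] at element 0 and a negative pulse diffused by [1.0, 0.25] at element 3. A pattern tail that would run past element L-1 is truncated.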

By varying the combination of diffusion patterns selected by the diffusion pattern storage and selection unit 102 and the pulse positions and polarities of the pulse vectors generated by the pulse vector generation unit 101, the excitation vector generator configured in this way can produce a wide variety of excitation vectors.

Accordingly, in the excitation vector generator configured as above, a one-to-one number can be assigned in advance to each of two kinds of information: the combination of diffusion patterns selected by the diffusion pattern storage and selection unit 102, and the combination of pulse vector shapes (pulse positions and polarities) generated by the pulse vector generation unit 101. The diffusion patterns stored in unit 102 can also be obtained in advance by training on actual excitation information.

If the above excitation vector generator is used in the excitation information generation unit of a speech signal encoder/decoder, the noise source information can be conveyed by transmitting two numbers: the combination number of the diffusion patterns selected by the diffusion pattern storage and selection unit, and the combination number of the pulse vectors generated by the pulse vector generation unit (which specifies the pulse positions and polarities).

Moreover, the excitation vector generator configured as above can produce excitation vectors whose shape (characteristics) more closely resembles actual excitation information than an algebraically generated pulse excitation does.

This embodiment describes the case where the diffusion pattern storage and selection unit 102 stores two diffusion patterns per channel, but the same effects are obtained when a number of diffusion patterns other than two is assigned to each channel.

This embodiment describes the case where the pulse vector generation unit 101 has three channels and follows the pulse generation rules in Table 1, but the same effects are obtained with a different number of channels or with pulse generation rules other than those in Table 1.

Furthermore, a speech signal communication system or speech signal recording system that includes the above excitation vector generator or speech signal encoder/decoder obtains the same effects as the excitation vector generator.

(Second embodiment)

FIG. 4 shows a functional block diagram of the CELP speech signal encoder of this embodiment, and FIG. 5 shows a functional block diagram of the CELP speech signal decoder of this embodiment.

The CELP speech signal encoder of this embodiment applies the excitation vector generator described in the first embodiment to the random codebook of the CELP speech encoder of FIG. 1. Likewise, the CELP speech signal decoder of this embodiment applies the excitation vector generator of the first embodiment to the random codebook of the CELP speech decoder of FIG. 2. All processing other than the vector quantization of the noise source information is therefore the same as in the devices of FIGS. 1 and 2, and this embodiment describes the speech encoder and decoder with a focus on that vector quantization. As in the first embodiment, the number of channels is N = 3, the number of diffusion patterns per channel is M = 2, and the pulse vectors are generated according to Table 1.

图4话音信号编码器中的噪声源矢量量化处理是规定使式(4)基准值最大的2种号码(扩散模式组合号、脉冲位置和脉冲极性组合号)的处理。The noise source vector quantization processing in the voice signal encoder of Fig. 4 is a process of specifying two types of numbers (diffusion pattern combination number, pulse position and pulse polarity combination number) that maximize the reference value of formula (4).

将图3音源矢量生成装置用作噪声码本时,用闭环规定扩散模式组合号(8种)和脉冲矢量组合号(考虑极性时为16384种)。When the sound source vector generating device in Fig. 3 is used as a noise codebook, the combination numbers of diffusion patterns (8 types) and pulse vector combinations (16384 types when polarity is considered) are specified in a closed loop.

To do this, diffusion pattern storage selection unit 215 first selects, for each channel, one of the two diffusion patterns it stores and outputs the selection to pulse vector diffusion unit 217. Pulse vector generation unit 216 then algebraically generates one pulse vector per channel (three in this embodiment) according to the rules of Table 1, and outputs these pulse vectors to pulse vector diffusion unit 217.

Pulse vector diffusion unit 217 generates a diffusion vector for each channel. It does so by convolving, as in Equation (5), the diffusion pattern selected by diffusion pattern storage selection unit 215 with the pulse vector generated by pulse vector generation unit 216.

Diffusion vector adder 218 adds the diffusion vectors obtained by pulse vector diffusion unit 217 to generate an excitation vector, which becomes a candidate noise code vector.
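The per-channel convolution and the summation by diffusion vector adder 218 can be sketched as follows. This is a minimal sketch that assumes Equation (5) has the truncated form c_i(n) = Σ_k W_ij(n − k) d_i(k), n = 0 … L − 1, which is the form appearing inside Equation (7). The function names `diffuse` and `excitation_vector` are hypothetical.

```python
def diffuse(pulse, pattern, length):
    """Diffusion vector c_i(n) = sum_k W_ij(n - k) * d_i(k), truncated to length L.

    pulse   -- pulse vector d_i of channel i (mostly zeros)
    pattern -- selected diffusion pattern W_ij (may be shorter than L)
    """
    out = [0.0] * length
    for k, amp in enumerate(pulse):
        if amp == 0:
            continue  # only nonzero pulse positions contribute
        for m, w in enumerate(pattern):
            n = k + m
            if n >= length:
                break  # samples past the excitation length are discarded
            out[n] += amp * w
    return out


def excitation_vector(pulses, patterns, length):
    """Sum the per-channel diffusion vectors into one candidate noise code vector."""
    total = [0.0] * length
    for d_i, w_ij in zip(pulses, patterns):
        c_i = diffuse(d_i, w_ij, length)
        total = [t + c for t, c in zip(total, c_i)]
    return total
```

With one pulse per channel, each diffusion vector is a signed, shifted copy of the chosen pattern. Changing the pattern changes the shape of the candidate vector, while the pulse still sets its position and sign.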

Distortion calculation unit 206 then computes the value of Equation (4) for the candidate noise code vector obtained from diffusion vector adder 218. It does this for every combination of pulse vectors generated according to the rules of Table 1. It then outputs three things to code number specifying unit 213: the diffusion pattern combination number that maximizes Equation (4), the pulse vector combination number (combination of pulse positions and polarities) that maximizes it, and the maximum value itself.

Next, diffusion pattern storage selection unit 215 selects, from its stored diffusion patterns, a combination different from the one just used. For this new combination, the value of Equation (4) is again computed, as above, for all pulse vector combinations generated by pulse vector generation unit 216 according to Table 1. The diffusion pattern combination number, pulse vector combination number, and maximum value are again output to code number specifying unit 213.

This process is repeated for every combination that can be selected from the diffusion patterns stored in diffusion pattern storage selection unit 215 (8 combinations in this embodiment).

Code number specifying unit 213 compares the 8 maximum values calculated by distortion calculation unit 206 and selects the largest. It determines the two combination numbers that produced that value (diffusion pattern combination number and pulse vector combination number) and outputs them to code output unit 214 as the noise code number.
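The closed-loop search of the last few paragraphs can be sketched in code. Equation (4) is not reproduced in this excerpt, so the sketch assumes the usual CELP criterion (xᵀy)² / (yᵀy) with y = Hc, where H is the synthesis filter convolution matrix. The pulse position table, the combination numbering order, and all function names are hypothetical; they are not Table 1 of the patent.

```python
import itertools


def diffuse(pos, sign, pattern, length):
    """Signed copy of the diffusion pattern starting at the pulse position."""
    out = [0.0] * length
    for m, w in enumerate(pattern):
        if pos + m >= length:
            break
        out[pos + m] = sign * w
    return out


def synthesis_filtered(c, h):
    """y = H c, with H the lower-triangular convolution matrix of impulse response h."""
    return [sum(h[n - k] * c[k] for k in range(n + 1) if n - k < len(h))
            for n in range(len(c))]


def criterion(x, y):
    """Assumed form of Equation (4): (x . y)^2 / (y . y)."""
    num = sum(a * b for a, b in zip(x, y))
    den = sum(b * b for b in y)
    return num * num / den if den > 0.0 else 0.0


def closed_loop_search(x, h, patterns, positions):
    """Return (max value, diffusion combination number, pulse combination number).

    patterns[i][j] -- j-th diffusion pattern of channel i
    positions[i]   -- candidate pulse positions of channel i (hypothetical table)
    """
    n_ch, length = len(patterns), len(x)
    diff_combos = list(itertools.product(*[range(len(p)) for p in patterns]))
    pulse_combos = list(itertools.product(
        *[[(pos, s) for pos in positions[i] for s in (1.0, -1.0)] for i in range(n_ch)]))
    maxima = []  # one (value, pulse combination number) per diffusion combination
    for dcombo in diff_combos:
        best_val, best_p = -1.0, None
        for p_num, pcombo in enumerate(pulse_combos):
            c = [0.0] * length
            for i, (pos, s) in enumerate(pcombo):
                ci = diffuse(pos, s, patterns[i][dcombo[i]], length)
                c = [a + b for a, b in zip(c, ci)]
            val = criterion(x, synthesis_filtered(c, h))
            if val > best_val:
                best_val, best_p = val, p_num
        maxima.append((best_val, best_p))
    # Role of code number specifying unit 213: keep the largest per-combination maximum.
    d_num = max(range(len(maxima)), key=lambda k: maxima[k][0])
    return maxima[d_num][0], d_num, maxima[d_num][1]
```

The loop follows the order in the text: an inner maximization over pulse combinations for each diffusion combination, then a comparison of the per-combination maxima. A full search therefore costs (number of diffusion combinations) × (number of pulse combinations) evaluations of Equation (4).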

In the speech signal decoder of FIG. 5, code input unit 301 receives the code sent from the speech signal encoder (FIG. 4) and splits it into its component numbers. These are the linear prediction code number, the adaptive code number, the noise code number (made up of the diffusion pattern combination number and the pulse vector combination number), and the weighting code number. The numbers are output to linear prediction coefficient decoding unit 302, adaptive codebook 303, noise codebook 304, and weighting codebook 305, respectively.

Of the noise code number, the diffusion pattern combination number is output to diffusion pattern storage selection unit 311, and the pulse vector combination number is output to pulse vector generation unit 312.
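The text does not specify how the two combination numbers are packed or how a diffusion pattern combination number maps to per-channel pattern indices. The sketch below assumes a simple mixed-radix layout: noise code = diffusion combination × 16384 + pulse combination, with channel 1 as the most significant base-M digit. With N = 3 and M = 2, this gives 8 × 16384 = 2^17 possible noise code numbers.

```python
N_CHANNELS = 3        # N in the text
M_PATTERNS = 2        # M in the text
N_PULSE_COMBOS = 16384  # pulse vector combinations, polarity included (from the text)


def split_noise_code(noise_code):
    """Hypothetical packing: noise_code = diffusion_combo * N_PULSE_COMBOS + pulse_combo."""
    return divmod(noise_code, N_PULSE_COMBOS)


def pattern_indices(diffusion_combo):
    """Decode a diffusion pattern combination number into one pattern index per channel.

    Assumed ordering: base-M digits, channel 1 most significant.
    """
    idx = []
    for _ in range(N_CHANNELS):
        diffusion_combo, j = divmod(diffusion_combo, M_PATTERNS)
        idx.append(j)
    return idx[::-1]
```

Any other one-to-one numbering works equally well, provided the encoder and the decoder use the same one.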

Linear prediction coefficient decoding unit 302 decodes the linear prediction code number to obtain the synthesis filter coefficients and outputs them to synthesis filter 309. Adaptive codebook 303 reads out the adaptive code vector from the position corresponding to the adaptive code number.

In noise codebook 304, three units work in sequence:

- Diffusion pattern storage selection unit 311 reads out, for each channel, the diffusion pattern corresponding to the diffusion pattern combination number and outputs it to pulse vector diffusion unit 313.
- Pulse vector generation unit 312 generates, for each channel, the pulse vector corresponding to the pulse vector combination number and outputs it to pulse vector diffusion unit 313.
- Pulse vector diffusion unit 313 convolves, as in Equation (5), the diffusion patterns received from unit 311 with the pulse vectors received from unit 312. The resulting diffusion vectors are output to diffusion vector adder 314.

Diffusion vector adder 314 adds the per-channel diffusion vectors produced by pulse vector diffusion unit 313 to generate the noise code vector.

The adaptive code gain and noise code gain corresponding to the weighting code number are read from weighting codebook 305. Adaptive code vector weighting unit 306 multiplies the adaptive code vector by the adaptive code gain. Likewise, noise code weighting unit 307 multiplies the noise code vector by the noise code gain. Both products are sent to adder 308.

Adder 308 adds the two gain-scaled code vectors to generate the driving excitation vector. It outputs this vector to adaptive codebook 303 to update its buffer, and to synthesis filter 309 to drive the filter.

Synthesis filter 309, driven by the driving excitation vector from adder 308, reproduces synthesized speech 310. Adaptive codebook 303 updates its buffer with the driving excitation vector received from adder 308.
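The decoder steps from gain scaling through synthesis and adaptive codebook update can be sketched for one subframe. The sketch rests on several assumptions that are not stated in the text. The synthesis filter is all-pole, 1/A(z) with A(z) = 1 + a₁z⁻¹ + … + a_pz⁻ᵖ. The adaptive code vector is read from the buffer at a pitch lag no shorter than the subframe. The buffer keeps a fixed length by dropping its oldest samples. All names are hypothetical.

```python
def synthesize(excitation, a, memory):
    """All-pole filter 1/A(z): y(n) = e(n) - sum_k a_k * y(n - k).

    a      -- [a_1, ..., a_p] (sign convention assumed), p >= 1
    memory -- last p output samples, most recent last
    """
    hist = list(memory)
    out = []
    for e in excitation:
        y = e - sum(a[k] * hist[-1 - k] for k in range(len(a)))
        out.append(y)
        hist.append(y)
    return out, hist[-len(a):]


def decode_subframe(adaptive_buf, lag, noise_vec, g_a, g_c, a, mem):
    """Gain-scale, sum, synthesize, and update the adaptive codebook buffer."""
    L = len(noise_vec)
    assert lag >= L, "this sketch does not handle lags shorter than the subframe"
    start = len(adaptive_buf) - lag
    adaptive_vec = adaptive_buf[start:start + L]
    # Adder 308: driving excitation = g_a * adaptive vector + g_c * noise vector.
    exc = [g_a * p + g_c * c for p, c in zip(adaptive_vec, noise_vec)]
    speech, mem = synthesize(exc, a, mem)
    # Adaptive codebook update: append the new excitation, drop the oldest samples.
    new_buf = (adaptive_buf + exc)[-len(adaptive_buf):]
    return speech, new_buf, mem
```

The same driving excitation feeds both the filter and the buffer. This is what keeps the decoder's adaptive codebook in step with the encoder's.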

The diffusion patterns that the diffusion pattern storage selection units of FIGS. 4 and 5 store for each channel are obtained by prior training. The cost function is the distortion criterion of Equation (7), which results from substituting the excitation vector of Equation (6) for C in Equation (2). The patterns are trained in advance so that this cost becomes small.

These operations generate excitation vectors whose shape resembles that of the actual noise source information (vector X in Equation (4)). Higher-quality synthesized speech is therefore obtained than with a CELP speech signal encoder/decoder that uses an algebraic excitation vector generation unit as its noise codebook.

$$
\begin{aligned}
E_c &= \Big\| X - g_c H \sum_{i=1}^{N} C_i \Big\|^2 \\
&= \sum_{n=0}^{L-1} \Big( X(n) - g_c H \sum_{i=1}^{N} C_i(n) \Big)^2 \\
&= \sum_{n=0}^{L-1} \Big( X(n) - g_c H \sum_{i=1}^{N} \sum_{k=0}^{L-1} W_{ij}(n-k)\, d_i(k) \Big)^2 \qquad (7)
\end{aligned}
$$

X: target vector used to determine the noise code number

gc: noise code gain

H: convolution matrix of the synthesis filter impulse response

C: noise code vector

i: channel number (i = 1 to N)

j: diffusion pattern number (j = 1 to M)

ci: diffusion vector of channel i

Wij: j-th diffusion pattern of channel i

di: pulse vector of channel i

L: excitation vector length (n = 0 to L-1)
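The cost of Equation (7) can be evaluated for a given set of pulse vectors and diffusion patterns. The sketch below does this, treating H as convolution with a truncated impulse response h. It only scores patterns. The training procedure that adjusts W_ij over a speech corpus to make this cost small is not described in this excerpt and is not shown. The function names are hypothetical.

```python
def filter_h(c, h):
    """H c: convolution with the synthesis filter impulse response h, truncated to len(c)."""
    return [sum(h[n - k] * c[k] for k in range(n + 1) if n - k < len(h))
            for n in range(len(c))]


def diffusion_vector(pulse, pattern):
    """c_i(n) = sum_k W_ij(n - k) d_i(k) for n = 0..L-1."""
    return [sum(pattern[n - k] * pulse[k] for k in range(n + 1) if n - k < len(pattern))
            for n in range(len(pulse))]


def cost_eq7(x, g_c, h, pulses, patterns):
    """E_c = sum_n (X(n) - g_c * (H * sum_i c_i)(n))^2, as in Equation (7)."""
    c = [0.0] * len(x)
    for d_i, w_ij in zip(pulses, patterns):
        c = [a + b for a, b in zip(c, diffusion_vector(d_i, w_ij))]
    y = filter_h(c, h)
    return sum((xn - g_c * yn) ** 2 for xn, yn in zip(x, y))
```

For a fixed target, a pattern whose filtered shape matches X more closely yields a smaller E_c. Training pushes the stored patterns in this direction on average.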

This embodiment describes the case where each channel stores M diffusion patterns, all obtained by prior training that reduces the cost function of Equation (7). In practice, not all M patterns need to be trained. If each channel stores at least one trained diffusion pattern, synthesized speech quality still improves.

This embodiment describes a closed-loop search for the combination numbers that maximize the reference value of Equation (4). The search covers every combination of diffusion patterns stored in the diffusion pattern storage selection unit and every combination of candidate pulse positions produced by the pulse vector generation unit. The same effect can also be obtained in two other ways: by preselecting based on parameters obtained before the noise codebook number is determined (such as the ideal gain of the adaptive code vector), or by using an open-loop search.

Furthermore, a speech signal communication system or speech signal recording system built with the above speech signal encoder/decoder obtains the effects of the excitation vector generator described in the first embodiment.

(Third Embodiment)

FIG. 6 shows a functional block diagram of the CELP-type speech signal encoder of this embodiment. The encoder uses the excitation vector generator of the first embodiment as its noise codebook. The diffusion patterns stored in the diffusion pattern storage selection unit are preselected using the ideal adaptive code gain obtained before the noise codebook search. Apart from the components surrounding the noise codebook, this encoder is identical to the CELP-type speech signal encoder of FIG. 4. This embodiment therefore describes only the noise source information vector quantization process of the encoder of FIG. 6.

This CELP-type speech signal encoder comprises:

- adaptive codebook 407
- adaptive code gain weighting unit 409
- noise codebook 408, made up of the excitation vector generator described in Embodiment 1
- noise code gain weighting unit 410
- synthesis filter 405
- distortion calculation unit 406
- code number specifying unit 413
- diffusion pattern storage selection unit 415
- pulse vector generation unit 416
- pulse vector diffusion unit 417
- diffusion vector adder 418
- adaptive gain determination unit 419

In this embodiment, at least one of the M diffusion patterns (M ≥ 2) stored in diffusion pattern storage selection unit 415 is a trained pattern. It is obtained by prior training that reduces the quantization distortion produced when the noise source information is vector-quantized.

For simplicity, this embodiment assumes that the pulse vector generation unit has N = 3 channels and that the diffusion pattern storage selection unit stores M = 2 diffusion patterns per channel. One of the two is the trained diffusion pattern described above. The other is a random vector sequence produced by a random number vector generator, hereinafter called a random pattern. The trained diffusion pattern turns out to be a relatively short, pulse-like pattern, like W11 in FIG. 3.

In the CELP-type speech signal encoder of FIG. 6, the adaptive codebook number is determined before the noise source information is vector-quantized. When noise source information vector quantization begins, two values are therefore already available: the adaptive codebook vector number (adaptive code number) and the provisionally determined ideal adaptive code gain. This embodiment uses the ideal adaptive code gain to preselect the diffusion patterns.

Specifically, immediately after the adaptive codebook search is completed, code number specifying unit 413 outputs the ideal adaptive code gain it holds to distortion calculation unit 406. Distortion calculation unit 406 then passes this gain to adaptive gain determination unit 419.

Adaptive gain determination unit 419 compares the ideal adaptive code gain received from distortion calculation unit 406 with a preset threshold. Based on the result, it sends a preselection control signal to diffusion pattern storage selection unit 415. If the adaptive code gain is large, the control signal selects the trained diffusion pattern, i.e. the pattern trained to reduce the quantization distortion of noise source information vector quantization. If the gain is not large, the signal preselects a diffusion pattern other than the trained one.
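The preselection rule can be sketched as a function that returns the candidate pattern indices left for each channel. The threshold value and the `trained_flags` layout are hypothetical; the text only says "a preset threshold". With M = 2 and one trained pattern per channel, the 8 diffusion pattern combinations shrink to 1. The closed-loop search then evaluates Equation (4) 16384 times instead of 8 × 16384 = 131072 times.

```python
GAIN_THRESHOLD = 0.5  # hypothetical value; the text only says "a preset threshold"


def preselect_by_gain(ideal_adaptive_gain, trained_flags, threshold=GAIN_THRESHOLD):
    """Return, per channel, the indices of diffusion patterns kept as candidates.

    trained_flags[i][j] is True if pattern j of channel i was obtained by training.
    Large gain (strongly voiced): keep trained, pulse-like patterns.
    Otherwise: keep the other (e.g. random) patterns.
    """
    want_trained = ideal_adaptive_gain > threshold
    return [[j for j, t in enumerate(flags) if t == want_trained]
            for flags in trained_flags]
```

Using lists instead of a single index keeps the rule valid when M > 2. Several trained or several random patterns may then remain per channel.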

As a result, diffusion pattern storage selection unit 415 can preselect among the M (= 2) diffusion patterns stored for each channel according to the adaptive gain. This greatly reduces the number of diffusion pattern combinations. Distortion need not be computed for every diffusion pattern combination number, so the noise source information is vector-quantized efficiently with little computation.

In addition, the noise code vector is pulse-like when the adaptive gain is large (strongly voiced) and random-like when it is small (weakly voiced). Noise code vectors of suitable shape are thus used for voiced and unvoiced segments of the speech signal, which improves synthesized speech quality.

For simplicity, this embodiment is limited to N = 3 channels in the pulse vector generation unit and M = 2 diffusion patterns per channel in the diffusion pattern storage selection unit. The same effects are obtained with other numbers of channels or of diffusion patterns per channel.

Also for simplicity, this embodiment assumes that of the M (= 2) diffusion patterns stored per channel, one is the trained pattern and the other is a random pattern. As long as each channel stores at least one trained diffusion pattern, the same effects can be expected in other configurations.

This embodiment uses only the magnitude of the adaptive code gain to preselect diffusion patterns. Further improvement can be expected if other parameters that represent short-term characteristics of the speech signal are used as well.

Furthermore, a speech signal communication system or speech signal recording system built with the above speech signal encoder obtains the effects of the excitation vector generator described in Embodiment 1.

This embodiment preselects diffusion patterns using the ideal adaptive excitation gain of the current frame, which is available when the noise source information is quantized. The same structure can instead use the decoded adaptive excitation gain of the immediately preceding frame, with the same effect.

(Fourth Embodiment)

FIG. 7 is a functional block diagram of the CELP-type speech signal encoder of this embodiment. The encoder uses the excitation vector generator of the first embodiment as its noise codebook. The diffusion patterns stored in the diffusion pattern storage selection unit are preselected using information available at the time of noise source information vector quantization. The preselection criterion is the magnitude of the coding distortion, expressed as an S/N ratio, produced when the adaptive codebook number is determined.

Apart from the components surrounding the noise codebook, the encoder is identical to the CELP-type speech signal encoder of FIG. 4. This embodiment therefore describes the noise source information vector quantization process in detail.

As shown in FIG. 7, the CELP-type speech signal encoder of this embodiment comprises:

- adaptive codebook 507
- adaptive code gain weighting unit 509
- noise codebook 508, made up of the excitation vector generator described in the first embodiment
- noise code gain weighting unit 510
- synthesis filter 505
- distortion calculation unit 506
- code number specifying unit 513
- diffusion pattern storage selection unit 515
- pulse vector generation unit 516
- pulse vector diffusion unit 517
- diffusion vector adder 518
- distortion power determination unit 519

In this embodiment, at least one of the M diffusion patterns (M ≥ 2) stored in diffusion pattern storage selection unit 515 is a random pattern.

For simplicity, this embodiment assumes N = 3 channels in the pulse vector generation unit and M = 2 diffusion patterns per channel in the diffusion pattern storage selection unit. One of the two patterns is a random pattern. The other is a diffusion pattern obtained by prior training that reduces the quantization distortion of noise source information vector quantization.

In the CELP-type speech signal encoder of FIG. 7, the adaptive codebook number is determined before noise source information vector quantization. When that quantization begins, three pieces of information are therefore already available: the adaptive codebook vector number (adaptive code number), the provisionally determined ideal adaptive code gain, and the target vector for the adaptive codebook search. This embodiment preselects diffusion patterns using the adaptive codebook coding distortion (expressed as an S/N ratio), which can be computed from these three.

Specifically, immediately after the adaptive codebook search is completed, code number specifying unit 513 outputs the adaptive code number and the adaptive code gain (ideal gain) it holds to distortion calculation unit 506. Distortion calculation unit 506 combines these with the target vector for the adaptive codebook search to calculate the coding distortion (S/N ratio) that results from the chosen adaptive codebook number. It outputs the S/N ratio to distortion power determination unit 519.

Distortion power determination unit 519 first compares the S/N ratio received from distortion calculation unit 506 with a preset threshold. Based on the result, it sends a preselection control signal to diffusion pattern storage selection unit 515. If the S/N ratio is large, the control signal selects the trained diffusion pattern, i.e. the pattern trained to reduce the coding distortion produced when encoding the target vector for the noise codebook search. If the S/N ratio is small, the signal selects the random pattern.
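Here is a sketch of the S/N computation and the threshold decision. It makes three assumptions that the text does not spell out. The S/N is taken as 10·log10(‖x‖² / ‖x − g·y‖²), where x is the adaptive codebook search target and y is the synthesis-filtered adaptive code vector. The threshold value is a placeholder. The function names and the `trained_flags` layout are hypothetical.

```python
import math

SNR_THRESHOLD_DB = 10.0  # hypothetical; the text only says "a preset threshold"


def adaptive_cb_snr_db(target, filtered_adaptive, gain):
    """S/N (dB) of the adaptive codebook stage: signal = target, noise = residual."""
    sig = sum(t * t for t in target)
    err = sum((t - gain * y) ** 2 for t, y in zip(target, filtered_adaptive))
    if err == 0.0:
        return math.inf   # perfect match
    if sig == 0.0:
        return -math.inf  # silent target but nonzero residual
    return 10.0 * math.log10(sig / err)


def preselect_by_snr(snr_db, trained_flags, threshold_db=SNR_THRESHOLD_DB):
    """High S/N: keep trained (pulse-like) patterns; low S/N: keep random patterns."""
    want_trained = snr_db > threshold_db
    return [[j for j, t in enumerate(flags) if t == want_trained]
            for flags in trained_flags]
```

A high S/N means the adaptive codebook already explains most of the target, which usually indicates a periodic, voiced segment. A low S/N points to noise-like content, where a random pattern fits better.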

As a result, diffusion pattern storage selection unit 515 preselects only one of the M (= 2) diffusion patterns stored for each channel, which greatly reduces the number of diffusion pattern combinations. Distortion need not be computed for every diffusion pattern combination number, so the noise code number is determined efficiently with little computation. Moreover, the noise code vector is pulse-like when the S/N ratio is large and random-like when it is small. Its shape therefore follows the short-term characteristics of the speech signal, which improves synthesized speech quality.

For simplicity, this embodiment is limited to N = 3 channels in the pulse vector generation unit and M = 2 diffusion patterns per channel in the diffusion pattern storage selection unit. The same effects are obtained with other numbers of channels or other kinds of diffusion patterns per channel.

Also for simplicity, this embodiment assumes that of the M (= 2) diffusion patterns stored per channel, one is the trained pattern and the other is a random pattern. As long as each channel stores at least one random diffusion pattern, the same effects can be expected in other configurations.

This embodiment preselects diffusion patterns using only the magnitude of the coding distortion (expressed as an S/N ratio) produced by determining the adaptive code number. Further improvement can be expected if information that more accurately represents the short-term characteristics of the speech signal is used as well.

Furthermore, a speech signal communication system or speech signal recording system built with the above speech signal encoder obtains the effects of the excitation vector generator described in the first embodiment.

(Fifth Embodiment)

FIG. 8 shows a functional block diagram of a CELP-type speech signal encoder according to the fifth embodiment of the present invention. In this encoder, LPC analysis unit 600 obtains LPC coefficients by performing autocorrelation analysis and LPC analysis on input speech data 601. It encodes the LPC coefficients to produce an LPC code, and it also decodes that code to obtain decoded LPC coefficients.
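Autocorrelation analysis followed by LPC analysis is commonly done with the Levinson-Durbin recursion. The sketch below shows that standard method. The text does not say which solver, analysis window, or prediction order the encoder uses, so windowing and lag windowing are omitted here. The sign convention A(z) = 1 + a₁z⁻¹ + … + a_pz⁻ᵖ is also an assumption.

```python
def autocorrelation(x, max_lag):
    """r(k) = sum_n x(n) x(n + k) for k = 0..max_lag (no analysis window applied)."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.

    Returns (a, err): a[k] holds a_{k+1}; err is the final prediction error power.
    """
    a, err = [], r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - 1 - k] for k in range(m - 1))
        k_m = -acc / err  # reflection coefficient of stage m
        a = [a[k] + k_m * a[m - 2 - k] for k in range(m - 1)] + [k_m]
        err *= 1.0 - k_m * k_m
    return a, err
```

For example, the autocorrelation sequence r = [1, 0.5, 0.25] of a first-order process yields a ≈ [-0.5, 0] with error power 0.75. The second-order coefficient correctly comes out as zero.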

Next, excitation generation unit 602 takes out the excitation samples stored in adaptive codebook 603 and in noise codebook 604. These are called the adaptive code vector (adaptive excitation) and the noise code vector (noise excitation), respectively. Both are sent to LPC synthesis unit 605.

LPC synthesis unit 605 filters the two excitations from excitation generation unit 602 with the decoded LPC coefficients from LPC analysis unit 600, producing two synthesized speech signals.

Comparison unit 606 analyzes the relationship between input speech 601 and the two synthesized speech signals from LPC synthesis unit 605, and determines the optimum gain for each synthesized signal. It scales each signal by its optimum gain and adds them to form the total synthesized speech. It then calculates the distance between this total synthesized speech and the input speech.
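The text does not say how comparison unit 606 computes the optimum gains. One common choice is the joint least-squares solution. It minimizes ‖x − g₁y₁ − g₂y₂‖² over both gains by solving a 2 × 2 system of normal equations. The sketch below assumes that approach, with hypothetical function names.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def optimal_gains(x, y1, y2):
    """Joint least-squares gains (g1, g2) minimizing ||x - g1*y1 - g2*y2||^2.

    y1 -- synthesized speech from the adaptive code vector
    y2 -- synthesized speech from the noise code vector
    """
    r11, r12, r22 = _dot(y1, y1), _dot(y1, y2), _dot(y2, y2)
    c1, c2 = _dot(x, y1), _dot(x, y2)
    det = r11 * r22 - r12 * r12
    if det == 0.0:
        raise ValueError("synthesized signals are linearly dependent")
    return (c1 * r22 - c2 * r12) / det, (c2 * r11 - c1 * r12) / det


def distance(x, y1, y2, g1, g2):
    """Squared distance between input speech and the gain-scaled total synthesized speech."""
    return sum((a - g1 * b - g2 * c) ** 2 for a, b, c in zip(x, y1, y2))
```

Solving for both gains jointly accounts for any correlation between the two synthesized signals. Computing each gain separately would only be optimal if y₁ and y₂ were orthogonal.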

This computation is repeated for every excitation sample in adaptive codebook 603 and noise codebook 604. For each one, excitation generation unit 602 and LPC synthesis unit 605 produce synthesized speech, and its distance to input speech 601 is calculated. The index numbers of the excitation samples giving the smallest distance are then determined.

The resulting optimum gains, the excitation sample index numbers, and the two excitations corresponding to those indices are sent to parameter encoding unit 607. Parameter encoding unit 607 encodes the optimum gains to obtain a gain code. It then sends the gain code, the LPC code, and the excitation sample index numbers together to transmission line 608.

An actual excitation signal is generated from the two excitations corresponding to the gain code and the index numbers. This signal is stored in adaptive codebook 603, and the oldest excitation samples are discarded.

LPC synthesis unit 605 usually also applies a perceptual weighting filter. That filter uses the linear prediction coefficients, a high-frequency emphasis filter, and long-term prediction coefficients obtained by long-term prediction analysis of the input speech. The excitation searches of the adaptive and noise codebooks are generally performed on subdivisions of the analysis interval, called subframes.

The LPC coefficient vector quantization performed in LPC analysis unit 600 is described in detail below.

FIG. 9 shows a functional block diagram of the vector quantization algorithm executed in LPC analysis unit 600. The vector quantization block in FIG. 9 comprises target extraction unit 702, quantization unit 703, distortion calculation unit 704, comparison unit 705, decoded vector storage unit 707, and vector smoothing unit 708.

Target extraction unit 702 calculates the quantization target from input vector 701. The target extraction method is described in detail below.

In this embodiment, the input vector consists of two vectors. The first is the parameter vector obtained by analyzing the frame to be encoded. The second is the parameter vector obtained by the same analysis of one future frame. Target extraction unit 702 calculates the quantization target from these input vectors and the previous frame's decoded vector, which is stored in decoded vector storage unit 707. Equation (8) shows an example of the calculation.

$$X(i) = \frac{S_t(i) + p\,\dfrac{d(i) + S_{t+1}(i)}{2}}{1 + p} \qquad (8)$$

X(i): quantization target vector

i: vector element index

S_t(i), S_{t+1}(i): input vectors

t: time (frame number)

p: weighting coefficient (fixed)

d(i): decoded vector of the previous frame

The reasoning behind this target extraction method is as follows. In conventional vector quantization, the current-frame parameter vector S_t(i) is used directly as the target X(i). The codebook is then searched with Equation (9).

$$E_n = \sum_{i=0}^{I} \left( X(i) - C_n(i) \right)^2 \qquad (9)$$

E_n: distance to the n-th code vector

X(i): quantization target

C_n(i): code vector

n: code vector number

i: vector element index

I: vector length

As a result, in conventional vector quantization, coding distortion translates directly into degraded sound quality. This becomes a serious problem in ultra-low-bit-rate coding. At such rates some coding distortion is unavoidable, even with countermeasures such as predictive vector quantization.

This embodiment therefore treats the midpoint between the decoded vectors before and after the current frame as a direction in which errors are hard to hear. It steers the decoded vector toward that midpoint, which improves perceived quality. The approach exploits a property of parameter vectors that interpolate well: degradation of their temporal continuity is difficult to hear. This is explained below with reference to FIG. 10, which shows the vector space.

Let d(i) be the decoded vector of the previous frame, and let S_{t+1}(i) be the parameter vector of the future frame. Ideally the future decoded vector would be used, but that frame cannot yet be encoded, so its parameter vector is used instead.

Code vector C_n(i):(1) is closer to the parameter vector S_t(i) than code vector C_n(i):(2) is. However, C_n(i):(2) lies close to the line joining d(i) and S_{t+1}(i), so its degradation is less audible than that of C_n(i):(1).

This property can be exploited by choosing the target X(i) as a vector moved from S_t(i) partway toward the midpoint of d(i) and S_{t+1}(i). The decoded vector is then steered in a direction of smaller perceived distortion.

In this embodiment, this shift of the target is achieved by introducing the evaluation function of Equation (10).

$$E = \sum_{i} \left( X(i) - S_t(i) \right)^2 + p \sum_{i} \left( X(i) - \frac{d(i) + S_{t+1}(i)}{2} \right)^2 \qquad (10)$$

E: evaluation value to be minimized

X(i): quantization target vector

i: vector element index

S_t(i), S_{t+1}(i): input vectors

t: time (frame number)

p: weighting coefficient (fixed)

d(i): decoded vector of the previous frame

The first term of Equation (10) is the ordinary vector quantization evaluation, and the second term is the perceptual weighting component. To quantize with this evaluation, differentiate it with respect to each X(i) and set the result to zero. This yields Equation (8).

The weighting coefficient p is a positive constant:

- When p = 0, the method reduces to ordinary vector quantization.
- As p tends to infinity, the target lies exactly at the midpoint.
- If p is very large, the target deviates greatly from the current-frame parameter vector S_t(i), and perceived clarity drops.

Listening tests on decoded speech confirmed good performance for 0.5 < p < 1.0.
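As a rough numerical check of Equation (8), the sketch below computes the target for a given weighting coefficient p. It also confirms that this target minimizes the two-part evaluation, which is the squared error to S_t(i) plus p times the squared error to the midpoint of d(i) and S_{t+1}(i). The function names are hypothetical, and vectors are plain Python lists.

```python
def quantization_target(s_t, s_next, d_prev, p):
    """Equation (8): X(i) = {S_t(i) + p * (d(i) + S_{t+1}(i)) / 2} / (1 + p)."""
    return [(s + p * (d + s1) / 2.0) / (1.0 + p)
            for s, s1, d in zip(s_t, s_next, d_prev)]


def evaluation(x, s_t, s_next, d_prev, p):
    """Squared error to S_t plus p times squared error to the midpoint of d and S_{t+1}."""
    return sum((xi - s) ** 2 + p * (xi - (d + s1) / 2.0) ** 2
               for xi, s, s1, d in zip(x, s_t, s_next, d_prev))


if __name__ == "__main__":
    s_t, s_next, d_prev = [1.0], [3.0], [1.0]   # midpoint of d and S_{t+1} is 2.0
    for p in (0.0, 0.75, 100.0):
        print(p, quantization_target(s_t, s_next, d_prev, p))
```

With p = 0 the target equals S_t(i). As p grows, the target moves toward the midpoint, which matches the behaviour described above.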

The quantization unit 703 quantizes the target obtained by the target extraction unit 702. This produces a vector code and the corresponding decoded vector, and the unit sends both to the distortion calculation unit 704.

This embodiment uses predictive vector quantization as the quantization method, which is described next.

FIG. 11 is a functional block diagram of predictive vector quantization. Predictive vector quantization predicts the current vector from vectors obtained by past encoding and decoding (synthesized vectors). It then vector-quantizes the prediction error.

A vector codebook 800 is generated in advance. It stores centroids (code vectors) of a large set of prediction error vectors. The codebook is usually generated from many vectors obtained by analyzing speech data, using the LBG algorithm (IEEE Transactions on Communications, Vol. COM-28, No. 1, pp. 84-95, January 1980).
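The cited LBG algorithm can be illustrated with a minimal sketch of its splitting variant:

1. Start from the centroid of the training set.
2. Split every code vector into two perturbed copies.
3. Refine the codebook with nearest-neighbour and centroid (Lloyd) iterations.

The perturbation size, the iteration count and the power-of-two codebook size are assumptions. This is not necessarily the exact training procedure used for codebook 800.

```python
def _dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def lbg(training, size, eps=0.01, iters=20):
    """Design a codebook of `size` vectors (assumed a power of two) from `training`."""
    codebook = [_centroid(training)]
    while len(codebook) < size:
        # Split step: replace each code vector by two slightly perturbed copies.
        codebook = [[c + s for c in cv] for cv in codebook for s in (eps, -eps)]
        for _ in range(iters):
            # Nearest-neighbour partition of the training set.
            cells = [[] for _ in codebook]
            for v in training:
                k = min(range(len(codebook)), key=lambda j: _dist2(v, codebook[j]))
                cells[k].append(v)
            # Centroid update; an empty cell keeps its previous code vector.
            codebook = [_centroid(cell) if cell else cv
                        for cell, cv in zip(cells, codebook)]
    return codebook
```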

The prediction unit 802 predicts the quantization target vector 801 from the past synthesized vectors held in the state storage unit 803. It sends the resulting prediction error vector to the distance calculation unit 804. Here, first-order prediction with a fixed coefficient is used as an example of the prediction scheme. Equation (11) gives the prediction error vector for this case.

$$Y(i) = X(i) - \beta D(i) \qquad (11)$$

Y(i): prediction error vector

X(i): quantization target

β: prediction coefficient (scalar)

D(i): synthesized vector of the previous frame

i: vector element index

The prediction coefficient β in the above equation generally satisfies 0 < β < 1.

The distance calculation unit 804 uses Equation (12) to calculate the distance between the prediction error vector from the prediction unit 802 and each code vector stored in the vector codebook 800.

$$E_n = \sum_{i=0}^{I} \left( T(i) - C_n(i) \right)^2 \qquad (12)$$

E_n: distance to the n-th code vector

T(i): prediction error vector

C_n(i): code vector

n: code vector number

i: vector element index

I: vector length

The search unit 805 compares the distances to all the code vectors. It controls the vector codebook 800 and the distance calculation unit 804 to find the code vector with the smallest distance among all code vectors stored in the vector codebook 800. The number of that code vector is output as the vector code 806.

The vector is then decoded from three inputs: the final vector code, the corresponding code vector from the vector codebook 800, and the past decoded vector in the state storage unit 803. The resulting synthesized vector overwrites the contents of the state storage unit 803, so that it can be used for prediction at the next encoding.

For the prediction scheme above (first order, fixed coefficient), decoding follows Equation (13).

$$Z(i) = C_N(i) + \beta D(i) \qquad (13)$$

Z(i): decoded vector (used as D(i) at the next encoding)

N: vector code

C_N(i): code vector

β: prediction coefficient (scalar)

D(i): synthesized vector of the previous frame

i: vector element index
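The encoding and decoding steps of Equations (11) to (13) can be sketched as follows. The sketch assumes a zero initial state D(i), a codebook given as a list of lists, and a hypothetical `PredictiveVQ` class. The encoder and decoder each keep their own copy of the state and update it with the same decoded vector, so they stay synchronized as the text requires.

```python
def _dist2(a, b):
    """Squared Euclidean distance, as in Equation (12)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class PredictiveVQ:
    """First-order predictive VQ with a fixed scalar coefficient beta (Eqs. 11-13)."""

    def __init__(self, codebook, beta, dim):
        self.codebook = codebook      # code vectors C_n(i) for prediction errors
        self.beta = beta              # typically 0 < beta < 1
        self.state = [0.0] * dim      # D(i): previous synthesized vector (assumed zero at start)

    def encode(self, x):
        # Eq. (11): prediction error Y(i) = X(i) - beta * D(i)
        y = [xi - self.beta * di for xi, di in zip(x, self.state)]
        # Eq. (12): choose the code vector with the minimum squared distance
        n = min(range(len(self.codebook)), key=lambda k: _dist2(y, self.codebook[k]))
        return n, self.decode(n)

    def decode(self, n):
        # Eq. (13): Z(i) = C_N(i) + beta * D(i); Z becomes the next D
        z = [c + self.beta * di for c, di in zip(self.codebook[n], self.state)]
        self.state = z
        return z
```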

The decoder performs decoding by looking up the code vector for the transmitted vector code. It holds the same vector codebook and state storage unit as the encoder. It decodes with the same algorithm as the decoding function of the search unit in the encoding algorithm above. This completes the description of the vector quantization performed in the quantization unit 703.

The distortion calculation unit 704 uses Equation (14) to calculate the perceptually weighted coding distortion. Its inputs are the decoded vector from the quantization unit 703, the input vector 701, and the previous-frame decoded vector stored in the decoded vector storage unit 707.

$$E_w = \sum_{i} \left( V(i) - S_t(i) \right)^2 + p \sum_{i} \left( V(i) - \frac{d(i) + S_{t+1}(i)}{2} \right)^2 \qquad (14)$$

E_w: weighted coding distortion

S_t(i), S_{t+1}(i): input vectors

t: time (frame number)

i: vector element index

V(i): decoded vector

p: weighting coefficient (fixed)

d(i): decoded vector of the previous frame

In Equation (14), the weighting coefficient p is the same as in the target equation used by the target extraction unit 702. The weighted coding distortion, the decoded vector and the vector code are sent to the comparison unit 705.
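Expanding the weighted distortion in V(i) gives two parts. The first is (1 + p) times the squared distance between V(i) and the target X(i) of Equation (8). The second is a term that does not depend on V(i). Minimizing E_w over candidate decoded vectors is therefore equivalent to approaching the target, which is why the same p appears in both places. The short check below is a sketch under that reading, with hypothetical function names.

```python
def quantization_target(s_t, s_next, d_prev, p):
    """Equation (8): target X(i)."""
    return [(s + p * (d + s1) / 2.0) / (1.0 + p)
            for s, s1, d in zip(s_t, s_next, d_prev)]


def weighted_distortion(v, s_t, s_next, d_prev, p):
    """Equation (14): sum_i (V - S_t)^2 + p * (V - (d + S_{t+1}) / 2)^2."""
    return sum((vi - s) ** 2 + p * (vi - (d + s1) / 2.0) ** 2
               for vi, s, s1, d in zip(v, s_t, s_next, d_prev))


def offset(v, s_t, s_next, d_prev, p):
    """E_w minus (1 + p) * ||V - X||^2; this should not depend on V."""
    x = quantization_target(s_t, s_next, d_prev, p)
    dist = sum((vi - xi) ** 2 for vi, xi in zip(v, x))
    return weighted_distortion(v, s_t, s_next, d_prev, p) - (1.0 + p) * dist
```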

The comparison unit 705 sends the vector code received from the distortion calculation unit 704 to the transmission line 608. It also updates the contents of the decoded vector storage unit 707 with the decoded vector received from the distortion calculation unit 704.

In the embodiment above, the target extraction unit 702 moves the target vector from S_t(i) partway toward the midpoint of d(i) and S_{t+1}(i). A weighted search can therefore be performed without perceived degradation.

The description so far has covered the application of the present invention to low-bit-rate speech coding as used in mobile telephones and similar devices. The invention is not limited to speech coding, however. It can also be applied to the quantization of parameter vectors with good interpolation properties in music coders and image coders.

In the algorithm above, the LPC analysis unit usually converts the LPC coefficients into parameter vectors that are convenient to encode, such as the common LSP (line spectrum pair) parameters. Vector quantization (VQ) is then performed using the Euclidean distance or a weighted Euclidean distance.

In this embodiment, the target extraction unit 702 operates under the control of the comparison unit 705. It sends the input vector 701 to the vector smoothing unit 708, receives back the input vector as modified by that unit, and then extracts the target again.

The comparison unit 705 compares the weighted coding distortion received from the distortion calculation unit 704 with a reference value held inside the comparison unit. Processing then takes one of two paths, depending on the result.

If the distortion is below the reference value, the vector code from the distortion calculation unit 704 is sent to the transmission line 606. The contents of the decoded vector storage unit 707 are then overwritten with the decoded vector from the distortion calculation unit 704. Processing moves on to parameter encoding of the next frame.

If the distortion is at or above the reference value, the comparison unit directs the vector smoothing unit 708 to modify the input vector. The target extraction unit 702, the quantization unit 703 and the distortion calculation unit 704 then operate again to re-encode.

Encoding is repeated until the comparison unit 705 finds the distortion below the reference value. The distortion may still be at or above the reference value after several repetitions. For this reason the comparison unit 705 has an internal counter of how many times the distortion has been judged at or above the reference value. When the count reaches a set limit, repeated encoding stops, the processing for the below-reference case is performed, and the counter is cleared.

The vector smoothing unit 708 also operates under the control of the comparison unit 705. It uses Equation (15) to modify the current-frame parameter vector S_t(i), which is one of the input vectors. The inputs to this step are the input vector obtained from the target extraction unit 702 and the previous-frame decoded vector obtained from the decoded vector storage unit 707. The modified input vector is sent to the target extraction unit 702.

$$S_t(i) \leftarrow (1-q)\,S_t(i) + q\,\frac{d(i) + S_{t+1}(i)}{2} \qquad (15)$$

Here q is a smoothing coefficient. It sets how far the current-frame parameter vector moves toward the midpoint between the previous-frame decoded vector and the future-frame parameter vector. Coding experiments confirmed good performance with 0.2 < q < 0.4 and an upper limit of 5 to 8 iterations in the comparison unit 705.
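The comparison-and-smoothing loop can be sketched as follows. This minimal illustration makes several assumptions:

- `quantize(target) -> (code, decoded)` is a caller-supplied function standing in for the quantization unit 703.
- `threshold` stands in for the comparison unit's reference value.
- E_w is evaluated against the smoothed S_t(i).
- p, q and the iteration limit default to values inside the ranges reported above.

```python
def _target(s_t, s_next, d_prev, p):
    """Equation (8): quantization target."""
    return [(s + p * (d + s1) / 2.0) / (1.0 + p)
            for s, s1, d in zip(s_t, s_next, d_prev)]


def _weighted_distortion(v, s_t, s_next, d_prev, p):
    """Equation (14), evaluated here against the (possibly smoothed) S_t."""
    return sum((vi - s) ** 2 + p * (vi - (d + s1) / 2.0) ** 2
               for vi, s, s1, d in zip(v, s_t, s_next, d_prev))


def encode_with_smoothing(s_t, s_next, d_prev, quantize,
                          p=0.75, q=0.3, threshold=0.01, max_iter=6):
    """Repeat target extraction and quantization, smoothing S_t with Eq. (15)
    whenever E_w is at or above `threshold`. `quantize(target)` must return
    (code, decoded_vector); it is a hypothetical stand-in for unit 703.
    Assumes max_iter >= 1."""
    s = list(s_t)
    for _ in range(max_iter):
        code, v = quantize(_target(s, s_next, d_prev, p))
        if _weighted_distortion(v, s, s_next, d_prev, p) < threshold:
            break  # below the reference value: accept this result
        # Eq. (15): move S_t toward the midpoint of d and S_{t+1}
        s = [(1 - q) * si + q * (d + s1) / 2.0
             for si, s1, d in zip(s, s_next, d_prev)]
    # If the iteration limit is reached, the last result is used.
    return code, v
```

A toy quantizer that rounds each element to the nearest 0.5 is enough to exercise the loop. For example, `quant = lambda x: (tuple(round(xi * 2) for xi in x), [round(xi * 2) / 2 for xi in x])`.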

The quantization unit 703 in this embodiment uses predictive vector quantization. Even so, the smoothing above makes it likely that the weighted coding distortion obtained by the distortion calculation unit 704 decreases. This is because smoothing brings the quantization target closer to the previous-frame decoded vector. Repeating the encoding under the control of the comparison unit 705 therefore raises the likelihood that the distortion falls below the reference value.

The decoder contains a decoding unit that corresponds to the quantization unit of the encoder. This unit decodes the vector code received from the transmission line.

This embodiment was also applied to the quantization of LSP parameters in CELP coding, with predictive VQ in the quantization unit, and speech coding and decoding experiments were carried out. The results confirmed that both perceived quality and the objective measure (S/N ratio) improve. The reason is that iterative encoding with vector smoothing suppresses predictive VQ coding distortion even when the spectrum changes abruptly.

Conventional predictive VQ predicts from past synthesized vectors. As a result, spectral distortion actually grows where the spectrum changes abruptly, for example at speech onsets. With this embodiment, smoothing is applied whenever the distortion is large, until it becomes small. The target therefore deviates somewhat from the actual parameter vector, but the coding distortion decreases and the overall degradation of the decoded speech is reduced. This embodiment thus improves both perceived sound quality and the objective measure.

In this embodiment, the comparison unit and the vector smoothing unit serve two purposes. When the vector quantization distortion is large, they steer the degradation in a direction that is less noticeable to the ear. When the quantization unit uses predictive vector quantization, they improve the objective measure by repeating smoothing and encoding until the coding distortion becomes small.

The description so far has covered the application of the present invention to low-bit-rate speech coding as used in mobile telephones and similar devices. The invention is not limited to speech coding, however. It can also be applied to the quantization of parameter vectors with good interpolation properties in music coders and image coders.

(Sixth embodiment)

Next, a CELP speech encoder according to the sixth embodiment of the present invention is described. This embodiment has the same configuration as the fifth embodiment except for the quantization algorithm of the quantization unit, which uses multi-stage predictive vector quantization. As in the fifth embodiment, the excitation vector generation apparatus of the first embodiment is used as the random codebook. The quantization algorithm of the quantization unit is described in detail below.

FIG. 12 is a functional block diagram of the quantization unit. Multi-stage vector quantization proceeds as follows:

1. The target vector is quantized.
2. The resulting code is decoded with its codebook.
3. The difference between the coded vector and the original target, called the coding distortion vector, is computed.
4. This coding distortion vector is quantized in turn.

Vector codebooks 899 and 900 are generated in advance. Each stores centroids (code vectors) of prediction error vectors. They are produced by applying the usual multi-stage VQ codebook design method to a large set of training prediction error vectors. In practice this means the LBG algorithm (IEEE Transactions on Communications, Vol. COM-28, No. 1, pp. 84-95, January 1980), applied to many vectors obtained by analyzing speech data.

The two codebooks are trained on different sets:

- Codebook 899 is trained on a large set of quantization targets.
- Codebook 900 is trained on the coding distortion vectors produced when those quantization targets are encoded with codebook 899.

First, the prediction unit 902 predicts the quantization target vector 901 from the past synthesized vectors stored in the state storage unit 903. It sends the resulting prediction error vector to the distance calculation units 904 and 905.

In this embodiment, first-order prediction with a fixed coefficient is again used as the prediction scheme. Equation (16) gives the prediction error vector for this case.

$$Y(i) = X(i) - \beta D(i) \qquad (16)$$

Y(i): prediction error vector

X(i): quantization target

β: prediction coefficient (scalar)

D(i): synthesized vector of the previous frame

i: vector element index

In the above equation, the prediction coefficient β usually satisfies 0 < β < 1.

The distance calculation unit 904 uses Equation (17) to calculate the distance between the prediction error vector from the prediction unit 902 and each code vector A stored in the vector codebook 899.

$$E_n = \sum_{i=0}^{I} \left( X(i) - C_{1n}(i) \right)^2 \qquad (17)$$

E_n: distance to the n-th code vector A

X(i): prediction error vector

C_{1n}(i): code vector A

n: number of code vector A

i: vector element index

I: vector length

The search unit 906 compares the distances to all code vectors A. It controls the vector codebook 899 and the distance calculation unit 904 to find the code vector A with the smallest distance among all code vectors stored in the vector codebook 899. The number of that code vector is used as the code of code vector A. The search unit then sends two items onward:

- the code of code vector A, together with the decoded vector A looked up from the vector codebook 899, goes to the distance calculation unit 905;
- the code of code vector A also goes to the transmission line and to the search unit 907.

The distance calculation unit 905 obtains the coding distortion vector from the prediction error vector and the decoded vector A received from the search unit 906. It also uses the code of code vector A received from the search unit 906 to look up an amplitude in the amplitude storage unit 908. It then computes the distance between the coding distortion vector and each code vector B stored in the vector codebook 900, scaled by that amplitude. The distances are sent to the search unit 907. Equation (18) gives the calculation.

$$Z(i) = Y(i) - C_{1N}(i)$$

$$E_m = \sum_{i=0}^{I} \left( Z(i) - a_N\,C_{2m}(i) \right)^2 \qquad (18)$$

Z(i): coding distortion vector

Y(i): prediction error vector

C_{1N}(i): decoded vector A

N: code of code vector A

E_m: distance to the m-th code vector B

a_N: amplitude corresponding to the code of code vector A

C_{2m}(i): code vector B

m: number of code vector B

i: vector element index

I: vector length

The search unit 907 compares the distances to all code vectors B. It controls the vector codebook 900 and the distance calculation unit 905 to find the code vector B with the smallest distance among all code vectors B stored in the vector codebook 900. The number of that code vector is used as the code of code vector B. The codes of code vectors A and B are then combined to form vector 909.

The search unit 907 also decodes the vector from the codes of code vectors A and B. This uses decoded vectors A and B from vector codebooks 899 and 900, the amplitude from the amplitude storage unit 908, and the past decoded vector stored in the state storage unit 903. The resulting synthesized vector overwrites the contents of the state storage unit 903, so the vector decoded here is used for prediction at the next encoding. For the prediction scheme of this embodiment (first order, fixed coefficient), decoding follows Equation (19).

$$Z(i) = C_{1N}(i) + a_N\,C_{2M}(i) + \beta D(i) \qquad (19)$$

Z(i): decoded vector (used as D(i) at the next encoding)

N: code of code vector A

M: code of code vector B

C_{1N}(i): decoded vector A

C_{2M}(i): decoded vector B

a_N: amplitude corresponding to the code of code vector A

β: prediction coefficient (scalar)

D(i): synthesized vector of the previous frame

i: vector element index
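Equations (16) to (19) can be combined into the sketch below, which shows one encoding step of two-stage predictive VQ with a per-code amplitude a_N. Plain lists, a caller-supplied state D(i) and the function names are assumptions. As in Equation (18), the second-stage search scales each code vector B by the amplitude selected by the first-stage code.

```python
def _dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_stage_encode(x, state, cb_a, cb_b, amps, beta):
    """Return (N, M, Z): stage-1 code, stage-2 code and decoded vector (next D)."""
    y = [xi - beta * di for xi, di in zip(x, state)]                 # Eq. (16)
    n = min(range(len(cb_a)), key=lambda k: _dist2(y, cb_a[k]))     # Eq. (17)
    z_err = [yi - ci for yi, ci in zip(y, cb_a[n])]                  # coding distortion vector
    a = amps[n]                                                      # amplitude a_N
    m = min(range(len(cb_b)),
            key=lambda k: _dist2(z_err, [a * c for c in cb_b[k]]))  # Eq. (18)
    return n, m, two_stage_decode(n, m, state, cb_a, cb_b, amps, beta)


def two_stage_decode(n, m, state, cb_a, cb_b, amps, beta):
    """Eq. (19): Z(i) = C1N(i) + aN * C2M(i) + beta * D(i); also usable by the decoder."""
    return [c1 + amps[n] * c2 + beta * di
            for c1, c2, di in zip(cb_a[n], cb_b[m], state)]
```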

The amplitudes stored in the amplitude storage unit 908 are set in advance as follows. A large amount of speech data is encoded, and the total coding distortion of Equation (20) is computed for each code of the first-stage code vectors. The amplitudes are then trained to minimize this distortion.

$$E_N = \sum_{t} \sum_{i=0}^{I} \left( Y_t(i) - C_{1N}(i) - a_N\,C_{2m_t}(i) \right)^2 \qquad (20)$$

E_N: coding distortion when the code of code vector A is N

N: code of code vector A

t: times (frames) at which the code of code vector A is N

Y_t(i): prediction error vector at time t

C_{1N}(i): decoded vector A

a_N: amplitude corresponding to the code of code vector A

C_{2m_t}(i): code vector B

m_t: number of code vector B at time t

i: vector element index

I: vector length

In the training step, which follows encoding, each amplitude is updated so that the derivative of the distortion in Equation (20) with respect to that amplitude is zero. Encoding and training are then repeated alternately to obtain the optimum amplitudes.
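Holding the stage-2 choices m_t fixed, setting the derivative of Equation (20) with respect to a_N to zero gives a closed-form least-squares amplitude:

a_N = Σ_t Σ_i (Y_t(i) − C_1N(i)) C_2m_t(i) / Σ_t Σ_i C_2m_t(i)²

The sketch below computes this update for one code N. Alternating it with re-encoding corresponds to the encoding and training loop described above. The function name and the fallback value for a zero denominator are assumptions.

```python
def update_amplitude(pred_errors, c1n, stage2_vectors):
    """Least-squares a_N minimizing Eq. (20) for fixed stage-2 choices.

    pred_errors:    Y_t for every frame t whose stage-1 code was N
    c1n:            the stage-1 code vector C_1N
    stage2_vectors: C_2m_t chosen in each of those frames (same order)
    """
    num = 0.0
    den = 0.0
    for y, c2 in zip(pred_errors, stage2_vectors):
        for yi, c1i, c2i in zip(y, c1n, c2):
            num += (yi - c1i) * c2i   # correlation of stage-1 residual with stage-2 vector
            den += c2i * c2i          # energy of the stage-2 vectors
    return num / den if den > 0 else 1.0   # fallback of 1.0 is an assumption
```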

The decoder performs decoding by looking up the code vectors for the transmitted vector code. It has the same vector codebooks (for code vectors A and B), amplitude storage unit and state storage unit as the encoder. It decodes with the same algorithm as the decoding function of the search unit (for code vector B) in the encoding algorithm above.

Thus, in this embodiment, the amplitude storage unit and the distance calculation unit adapt the second-stage code vectors to the first stage with little additional computation, which keeps the coding distortion small.

The description so far has covered the application of the present invention to low-bit-rate speech coding as used in mobile telephones and similar devices. The invention is not limited to speech coding, however. It can also be applied to the quantization of parameter vectors with good interpolation properties in music coders, image coders and similar coders.

(Seventh embodiment)

Next, a CELP speech encoder according to the seventh embodiment of the present invention is described. This embodiment is an example of an encoder that reduces the computation required for the code search when an ACELP-type random codebook is used.

FIG. 13 is a functional block diagram of the CELP speech encoder of this embodiment. In this encoder, the filter coefficient analysis unit 1002 performs linear prediction analysis on the input speech signal 1001 to obtain synthesis filter coefficients. It outputs these coefficients to the filter coefficient quantization unit 1003, which quantizes them and passes the result to the synthesis filter 1004.

The synthesis filter 1004 is built from the filter coefficients supplied by the filter coefficient quantization unit 1003 and is driven by the excitation signal 1011. The excitation signal 1011 is the sum of two products:

- the adaptive vector 1006 output by the adaptive codebook 1005, multiplied by the adaptive gain 1007;
- the noise vector 1009 output by the random codebook 1008, multiplied by the noise gain 1010.
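The excitation construction and the synthesis filter 1004 can be sketched as follows. The sign convention A(z) = 1 + Σ a_k z^(-k), the direct-form all-pole recursion and the function names are assumptions for illustration. The text does not specify them here.

```python
def excitation(adaptive_vec, noise_vec, g_adaptive, g_noise):
    """Excitation 1011 = adaptive gain * adaptive vector + noise gain * noise vector."""
    return [g_adaptive * a + g_noise * c for a, c in zip(adaptive_vec, noise_vec)]


def synthesize(exc, lpc, memory=None):
    """All-pole filter 1/A(z) with A(z) = 1 + sum_k lpc[k-1] * z^-k (assumed convention).
    `memory` holds the last len(lpc) outputs, most recent first; returns (output, memory)."""
    mem = list(memory) if memory else [0.0] * len(lpc)
    out = []
    for e in exc:
        s = e - sum(a * m for a, m in zip(lpc, mem))
        out.append(s)
        mem = [s] + mem[:-1]   # shift the filter state
    return out, mem
```

Returning the filter memory lets consecutive subframes be filtered without discontinuities.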

Here, the adaptive codebook 1005 stores a plurality of adaptive vectors, each extracted from the past excitation signal of the synthesis filter at a pitch-period lag. The random codebook 1008 stores a plurality of noise vectors. The excitation vector generation apparatus of the first embodiment can be used as the random codebook 1008.

The distortion calculation unit 1013 calculates the distortion between the input speech signal 1001 and the synthesized speech signal 1012, which is the output of the synthesis filter 1004 driven by the excitation signal 1011. It also performs the code search. The code search determines the numbers of the adaptive vector 1006 and the noise vector 1009 that minimize this distortion. It also computes the optimum adaptive gain 1007 and noise gain 1010 that multiply these vectors.

Code output unit 1014 encodes and outputs the following:
- the quantized filter coefficients obtained from filter coefficient quantization unit 1003;
- the indices of adaptive vector 1006 and noise vector 1009 selected by distortion calculation unit 1013;
- adaptive gain 1007 and noise gain 1010, by which those vectors are respectively multiplied.

The information output from code output unit 1014 is transmitted or stored.

In the code search of distortion calculation unit 1013, the adaptive codebook component of the excitation signal is normally searched first, followed by the noise codebook component.

The noise component search uses the orthogonal search described below.

The orthogonal search identifies the noise vector c that maximizes the search criterion Eort (= Nort/Dort) of Equation (21).

$$E_{ort}=\frac{N_{ort}}{D_{ort}}=\frac{\left[\{(p^{t}H^{t}Hp)\,x-(x^{t}Hp)\,Hp\}^{t}Hc\right]^{2}}{(c^{t}H^{t}Hc)(p^{t}H^{t}Hp)-(p^{t}H^{t}Hc)^{2}} \qquad (21)$$

Nort: numerator of Eort

Dort: denominator of Eort

p: the adaptive vector already determined

H: synthesis filter coefficient matrix

H^t: transpose of H

x: target signal (the difference between the input speech signal and the zero-input response of the synthesis filter)

c: noise vector

In the orthogonal search, each candidate noise vector is orthogonalized with respect to the previously determined adaptive vector, and the orthogonalized candidate giving the least distortion is selected. Compared with a non-orthogonal search, this method identifies the noise vector more accurately and therefore improves the quality of the synthesized speech signal.
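The sketch below evaluates the criterion of Equation (21) directly. It checks the result numerically against an explicit orthogonalization of Hc against Hp. With p fixed, Eort equals (p^t H^t H p) times (x^t z)^2 / (z^t z), where z is Hc with its component along Hp removed. That constant factor does not affect which noise vector is selected. Names are hypothetical, and this is an illustration, not the patent's code.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def filtered(h, c):
    """Zero-state synthesis filtering y = H c for a truncated impulse response h."""
    return [sum(h[n - k] * c[k] for k in range(n + 1) if n - k < len(h)) for n in range(len(c))]


def eort(x, h, p, c):
    """Orthogonal-search criterion of Equation (21) for one candidate noise vector c."""
    hp, hc = filtered(h, p), filtered(h, c)
    php = dot(hp, hp)  # p^t H^t H p
    xhp = dot(x, hp)   # x^t H p
    phc = dot(hp, hc)  # p^t H^t H c
    num = (php * dot(x, hc) - xhp * phc) ** 2  # Nort
    den = dot(hc, hc) * php - phc ** 2         # Dort (>= 0 by Cauchy-Schwarz)
    return num / den if den > 0.0 else 0.0     # Hc parallel to Hp adds nothing new


def orthogonal_search(x, h, p, codebook):
    """Index of the candidate noise vector maximizing Eort."""
    return max(range(len(codebook)), key=lambda i: eort(x, h, p, codebook[i]))
```

With `x = [1.0, 2.0, 0.5]`, `h = [1.0, 0.5]` and `p = [1.0, 0.0, 0.0]`, the candidate `[0.0, 1.0, 0.0]` scores highest among `[0.0, 1.0, 0.0]`, `[0.0, 0.0, 1.0]` and `[1.0, 0.0, -1.0]`.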

In the ACELP scheme, the noise vector consists of only a few signed pulses. Exploiting this, the numerator term (Nort) of the search criterion in Equation (21) is rewritten as Equation (22), which reduces the computation for the numerator.

$$N_{ort}=\{a_0\psi(l_0)+a_1\psi(l_1)+\cdots+a_{n-1}\psi(l_{n-1})\}^2 \qquad (22)$$

a_i: polarity of the i-th pulse (+1/-1)

l_i: position of the i-th pulse

n: number of pulses

ψ: {(p^t H^t Hp)x - (x^t Hp)Hp}^t H

If the values of ψ in Equation (22) are precomputed in a preprocessing step and stored in an array, the numerator of Equation (21) is obtained by adding the n signed elements ψ(l_i) of the array and squaring the result.
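Below is a minimal sketch of the ψ precomputation and of Equation (22). It assumes H is the lower-triangular convolution matrix of a truncated impulse response `h`. Multiplying by H^t is done as time-reversed filtering, so ψ costs one backward filtering pass per subframe. After that, each candidate's numerator needs only n lookups and additions. Function names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def filtered(h, c):
    """y = H c (zero-state synthesis filtering with impulse response h)."""
    return [sum(h[n - k] * c[k] for k in range(n + 1) if n - k < len(h)) for n in range(len(c))]


def backward_filtered(h, y):
    """H^t y, i.e. filtering in time-reversed order: z[k] = sum_{n>=k} h[n-k] * y[n]."""
    size = len(y)
    return [sum(h[n - k] * y[n] for n in range(k, size) if n - k < len(h)) for k in range(size)]


def psi_array(x, h, p):
    """Precompute psi = {(p^t H^t H p) x - (x^t H p) H p}^t H as a length-L array."""
    hp = filtered(h, p)
    php = dot(hp, hp)  # p^t H^t H p
    xhp = dot(x, hp)   # x^t H p
    d = [php * xi - xhp * hpi for xi, hpi in zip(x, hp)]
    return backward_filtered(h, d)


def nort_from_pulses(psi, positions, signs):
    """Equation (22): square of the signed sum of psi at the pulse positions."""
    return sum(a * psi[l] for a, l in zip(signs, positions)) ** 2
```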

Next, distortion calculation unit 1013, which reduces the computation required for the denominator term, is described in detail.

Fig. 14 shows a functional block diagram of distortion calculation unit 1013. The speech signal encoder of this embodiment has the configuration of Fig. 13, with adaptive vector 1006 and noise vector 1009 also input to distortion calculation unit 1013.

In Fig. 14, the following three kinds of processing are performed as preprocessing before the distortion is calculated for each input noise vector.

(1) Computing the first matrix (N): compute the power (p^t H^t Hp) of the vector obtained by passing the adaptive vector through the synthesis filter. Also compute the autocorrelation matrix (H^t H) of the synthesis filter coefficients. Multiply each element of the autocorrelation matrix by this power to obtain N (= (p^t H^t Hp) H^t H).

(2) Computing the second matrix (M): pass the vector obtained by synthesizing the adaptive vector with the synthesis filter through the filter once more, in time-reversed order. Take the outer product of the resulting signal (p^t H^t H) with itself to obtain M.

(3) Generating the third matrix (L): subtract matrix M computed in (2) from matrix N computed in (1) to obtain L (= N - M).
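The three preprocessing steps can be sketched with explicit matrices as follows. This is an unoptimized illustration. It assumes H is the lower-triangular Toeplitz matrix of a truncated impulse response `h`, and it uses r = H^t H p so that M = r r^t. The function `quad_form` is included only to check c^t L c against the direct form of the denominator. All names are hypothetical.

```python
def conv_matrix(h, n):
    """n x n lower-triangular Toeplitz matrix H of the impulse response h."""
    return [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(n)] for i in range(n)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def preprocess(h, p):
    """Return L = N - M for adaptive vector p (preprocessing steps (1) to (3))."""
    n = len(p)
    H = conv_matrix(h, n)
    Ht = transpose(H)
    hp = matvec(H, p)
    power = sum(v * v for v in hp)                            # p^t H^t H p
    N = [[power * v for v in row] for row in matmul(Ht, H)]   # (1) N = power * H^t H
    r = matvec(Ht, hp)                                        # r = H^t H p (time-reversed filtering)
    M = [[ri * rj for rj in r] for ri in r]                   # (2) M = r r^t
    return [[nv - mv for nv, mv in zip(nrow, mrow)] for nrow, mrow in zip(N, M)]  # (3) L = N - M


def quad_form(a, c):
    """c^t A c."""
    return sum(ci * sum(aij * cj for aij, cj in zip(row, c)) for ci, row in zip(c, a))
```

Since L depends only on p and H, it is computed once per subframe and reused for every candidate noise vector.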

The denominator term (Dort) of Equation (21) can be expanded as in Equation (23).

$$
\begin{aligned}
D_{ort} &= (c^{t}H^{t}Hc)(p^{t}H^{t}Hp)-(p^{t}H^{t}Hc)^{2} \qquad (23)\\
&= c^{t}Nc-(r^{t}c)^{2}\\
&= c^{t}Nc-(r^{t}c)^{t}(r^{t}c)\\
&= c^{t}Nc-c^{t}rr^{t}c\\
&= c^{t}Nc-c^{t}Mc\\
&= c^{t}(N-M)c\\
&= c^{t}Lc
\end{aligned}
$$

N: (p^t H^t Hp) H^t H ← preprocessing (1) above

r: H^t H p, so that r^t = p^t H^t H ← preprocessing (2) above

M: r r^t ← preprocessing (2) above

L: N - M ← preprocessing (3) above

c: noise vector

Thus, by computing the denominator term (Dort) of the search criterion (Eort) in Equation (21) with Equation (23) instead, the noise codebook component can be identified with less computation.

The denominator term is computed from matrix L, obtained in the preprocessing above, and noise vector 1009.

For simplicity, the computation of the denominator term based on Equation (23) is described for the following case:
- the input speech signal is sampled at 8000 Hz;
- the unit time width (frame length) of the algebraic-structure noise codebook search is 10 ms;
- the noise vector is generated as a regular combination of 5 unit pulses (+1/-1) per 10 ms.

Each of the five unit pulses that make up the noise vector is placed at one of the positions specified for its group, Group 0 through Group 4, in Table 2. A candidate noise vector can then be written as Equation (24).

$$c(k)=a_0\delta(k-l_0)+a_1\delta(k-l_1)+\cdots+a_4\delta(k-l_4) \qquad (24)$$

(k = 0, 1, …, 79)

a_i: polarity of the pulse belonging to group i (+1/-1)

l_i: position of the pulse belonging to group i

Table 2

Group No.   Sign   Candidate pulse positions
0           ±1     0, 10, 20, 30, …, 60, 70
1           ±1     2, 12, 22, 32, …, 62, 72
2           ±1     6, 16, 26, 36, …, 66, 76
3           ±1     4, 14, 24, 34, …, 64, 74
4           ±1     8, 18, 28, 38, …, 68, 78
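The track structure of Table 2 can be sketched as follows for an 80-sample frame (10 ms at 8000 Hz). The sketch assumes that Group 2 starts at position 6 (6, 16, ..., 76), which follows the 10-sample spacing of the other groups. The helper names are hypothetical.

```python
FRAME_LEN = 80                    # 10 ms at 8000 Hz
TRACK_OFFSETS = (0, 2, 6, 4, 8)   # first candidate position of Groups 0..4 (Group 2 assumed to be 6)


def track_positions(group):
    """Candidate pulse positions of one group: every 10th sample from its offset."""
    return list(range(TRACK_OFFSETS[group], FRAME_LEN, 10))


def pulse_vector(positions, signs):
    """Equation (24): one signed unit pulse per group at the chosen positions."""
    c = [0] * FRAME_LEN
    for l, a in zip(positions, signs):
        c[l] += a
    return c
```

Each group offers 8 positions and 2 signs, so exhaustive enumeration would cover (8 × 2)^5 = 1,048,576 candidate noise vectors. This is why the per-candidate cost of the numerator and denominator matters.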

In this case, the denominator term (Dort) of Equation (23) can be obtained by Equation (25).

$$D_{ort}=\sum_{i=0}^{4}\sum_{j=0}^{4}a_i a_j L(l_i,l_j) \qquad (25)$$

a_i: polarity of the pulse belonging to group i

l_i: position of the pulse belonging to group i

L(l_i, l_j): element in row l_i, column l_j of matrix L
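Equation (25) uses only the 5 × 5 entries of L at the pulse positions, instead of all 80 × 80 entries of the quadratic form c^t L c. The minimal sketch below includes a brute-force check against the full quadratic form. The names are hypothetical.

```python
def dort_from_pulses(L, positions, signs):
    """Equation (25): sum over pulse pairs of a_i * a_j * L[l_i][l_j]."""
    return sum(ai * aj * L[li][lj]
               for ai, li in zip(signs, positions)
               for aj, lj in zip(signs, positions))


def quad_form(L, c):
    """Full quadratic form c^t L c, used only as a reference."""
    n = len(c)
    return sum(c[i] * L[i][j] * c[j] for i in range(n) for j in range(n))
```

Because N and M are both symmetric, L is symmetric as well. An implementation could therefore evaluate only the diagonal and the i < j terms, doubling the latter. The sketch keeps the full double sum for clarity.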

The above shows that, with an ACELP-type noise codebook, the numerator term (Nort) of the code search criterion in Equation (21) can be computed by Equation (22), and its denominator term (Dort) by Equation (25). Therefore, rather than evaluating the criterion of Equation (21) as written, the numerator and denominator are computed separately with Equations (22) and (25). This greatly reduces the computation required for the code search.

The embodiment above describes a noise codebook search without preselection. The same effect is obtained when the present invention is applied to a search with preselection. Such a search preselects noise vectors that give large values of Equation (22), evaluates Equation (21) only for the candidates narrowed down by preselection, and selects the noise vector that maximizes it.
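The preselection variant can be sketched as a two-pass search:
1. Rank every candidate by the inexpensive numerator of Equation (22).
2. Keep a short list of the top candidates.
3. Evaluate the full ratio only on that list.

The sketch assumes each pulse sign is fixed to the sign of ψ at its position. This is a common ACELP simplification that the text does not specify. Exhaustive enumeration is used only because the example tracks are tiny. All names are hypothetical.

```python
import heapq
import itertools


def candidates(psi, tracks):
    """Yield (positions, signs) with one position per track; each sign follows psi (assumed)."""
    for positions in itertools.product(*tracks):
        signs = [1 if psi[l] >= 0 else -1 for l in positions]
        yield list(positions), signs


def nort(psi, positions, signs):
    """Equation (22)."""
    return sum(a * psi[l] for a, l in zip(signs, positions)) ** 2


def dort(L, positions, signs):
    """Equation (25)."""
    return sum(ai * aj * L[li][lj]
               for ai, li in zip(signs, positions)
               for aj, lj in zip(signs, positions))


def search_with_preselection(psi, L, tracks, keep):
    """Shortlist the `keep` largest Nort values, then maximize Nort / Dort among them."""
    shortlist = heapq.nlargest(keep, candidates(psi, tracks),
                               key=lambda cand: nort(psi, *cand))

    def score(cand):
        d = dort(L, *cand)
        return nort(psi, *cand) / d if d > 0 else 0.0

    return max(shortlist, key=score)
```

With a small `keep`, the preselection can miss the candidate that an exhaustive search of Equation (21) would pick. The parameter trades accuracy against computation.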

Claims (28)

1. A diffusion vector generator for use in an excitation vector generator for a CELP-type speech signal encoder/speech signal decoder, comprising:
a pulse vector generating unit that generates a pulse vector having a unit pulse with polarity at a certain element of the vector axis;
a diffusion pattern storage unit storing a plurality of diffusion patterns;
a selection unit that selects one diffusion pattern from the plurality of diffusion patterns stored in the diffusion pattern storage unit; and
a pulse vector diffusion unit that convolves the selected diffusion pattern with the pulse vector to generate a diffusion vector.
2. The diffusion vector generation apparatus according to claim 1, wherein said excitation vector generation apparatus generates excitation vectors from N diffusion vectors according to the following equation
Figure C988015560002C1
c: excitation vector
ci: diffusion vector
i: channel number (i = 1-N)
n: vector element numbers (n =0 to L-1, where L is the length of the excitation vector).
3. The diffusion vector generation apparatus of claim 1, wherein the pulse vector is generated by an algebraic structure noise codebook.
4. A CELP-type speech signal encoder, the CELP-type speech signal encoder comprising:
a diffusion vector generation device for the excitation vector generation device;
a noise codebook for vector-quantizing the noise source information;
a synthesis filter for generating a synthesized speech using the excitation vector outputted from the excitation vector generation device as a noise code vector;
a distortion calculator that calculates the quantization distortion between the generated synthesized speech and the input speech;
a switching unit that switches the combination of the diffusion pattern and the pulse positions and pulse polarities constituting the pulse vector;
a decision unit that decides the combination of pulse positions, pulse polarities, and diffusion pattern that minimizes the quantization distortion calculated by the distortion calculator, and determines a noise codebook number; and
an adaptive codebook storing adaptive code vectors representing the pitch components of the input speech;
the diffusion vector generation device includes:
a pulse vector generating unit that generates a pulse vector having a unit pulse with polarity at a certain element of the vector axis;
a diffusion pattern storage unit storing a plurality of diffusion patterns;
a selection unit that selects one diffusion pattern from the plurality of diffusion patterns stored in the diffusion pattern storage unit; and
a pulse vector diffusion unit that convolves the selected diffusion pattern with the pulse vector to generate a diffusion vector.
5. The CELP-type speech signal encoder according to claim 4, wherein a diffusion pattern is stored in a storage unit in the excitation vector generation device, the diffusion pattern being obtained in advance by learning so as to reduce the quantization distortion produced when the noise source information vector is quantized.
6. The CELP-type speech signal encoder of claim 5, wherein the storage unit in the excitation vector generation apparatus stores at least one diffusion pattern obtained by learning for each channel.
7. The CELP-type speech signal encoder of claim 6, wherein the diffusion pattern obtained by learning is selected when the gain value of the adaptive codebook is greater than a predetermined threshold value.
8. The CELP-type speech signal encoder of claim 4, wherein at least one of the diffusion patterns stored in the storage unit of the excitation vector generation apparatus in each channel is a random pattern.
9. The CELP-type speech signal encoder according to claim 4, wherein, in each channel, at least one of the diffusion patterns stored in the storage unit in the excitation vector generator is a diffusion pattern obtained in advance by learning so as to reduce the quantization distortion produced when the noise source information vector is quantized, and at least one of the diffusion patterns is a random pattern.
10. The CELP-type speech signal encoder of claim 9, wherein the diffusion vector of the random pattern is selected in case the coding distortion generated in the decision of the adaptive codebook number is larger than a predetermined threshold.
11. The CELP-type speech signal encoder of claim 4, wherein a combination number, representing the combination of diffusion patterns selected for the channels, is determined from all M^N combinations of the stored diffusion patterns so as to minimize the quantization distortion produced when the noise source information vector is quantized.
12. The CELP-type speech signal encoder according to claim 11, wherein combinations of the diffusion patterns are preselected using speech parameters obtained in advance, and the combination number indicating the diffusion patterns selected for each channel is determined from the preselected combinations so as to minimize the quantization distortion produced when the noise source information is vector-quantized.
13. The CELP-type speech signal encoder of claim 12, wherein the preselected combinations of diffusion patterns are switched according to the result of speech segment analysis.
14. The CELP-type speech signal encoder of claim 4, further comprising:
a target extraction unit for calculating a quantization target vector using the following: the parameter vector of the speech parameters obtained by analyzing the current encoding target frame; the parameter vector obtained by analyzing the frame following the encoding target frame; and the decoded vector of the frame preceding the current encoding target frame; and
a vector quantization unit for encoding the calculated quantization target vector to obtain the noise codebook number of the current encoding target frame.
15. The CELP-type speech signal encoder of claim 14, wherein the target extraction unit computes a quantization target vector according to
$$X(i)=\{S_t(i)+p\,(d(i)+S_{t+1}(i))/2\}/(1+p)$$
X(i): quantization target vector
i: vector element number
S_t(i), S_{t+1}(i): parameter vectors
t: time (frame number)
p: weighting factor (fixed)
d(i): decoded vector of the previous frame.
16. The CELP-type speech signal encoder of claim 14, further comprising:
a unit for decoding the current encoding target frame to generate a decoding vector;
a second distortion calculator for calculating coding distortion based on the decoding vector and the parameter vector of the coding object frame;
a vector smoothing unit configured to smooth the parameter vector of the current encoding target frame supplied to the target extraction unit when the coding distortion is smaller than a reference value.
17. The CELP-type speech signal encoder of claim 16, wherein the second distortion calculator calculates the perceptually weighted coding distortion according to
$$E_w=\sum_i\left[(V(i)-S_t(i))^2+p\,\{V(i)-(d(i)+S_{t+1}(i))/2\}^2\right]$$
Ew: perceptually weighted coding distortion
S_t(i), S_{t+1}(i): input vectors
t: time (frame number)
i: vector element number
V(i): decoded vector
p: weighting factor
d(i): decoded vector of the previous frame.
18. The CELP-type speech signal encoder of claim 14, wherein the vector quantization unit comprises:
a plurality of codebooks correspondingly arranged to each stage of the multi-stage vector quantization and storing a plurality of code vectors;
a unit for calculating the distance between the quantization target vector (or its prediction error vector) and the code vectors stored in the first-stage codebook to obtain the first-stage code;
an amplitude storage unit for storing scalar amplitudes, each corresponding to a code vector stored in the first-stage codebook;
a multiplication unit that, before the second-stage encoding, takes out the amplitude corresponding to the first-stage code from the amplitude storage unit and multiplies the code vectors stored in the second-stage codebook by that amplitude; and
a unit for calculating the difference between the decoded vector obtained by decoding the first-stage code and the amplitude-multiplied code vectors of the second-stage codebook, to obtain the second-stage code.
19. The CELP-type speech signal encoder of claim 4, wherein the distortion calculator comprises:
means for calculating the power of the signal obtained by synthesizing the adaptive code vector with the synthesis filter, calculating the autocorrelation matrix of the filter coefficients of the synthesis filter, and calculating a first matrix by multiplying each element of the autocorrelation matrix by the power;
means for synthesizing, in time-reversed order, the signal obtained by synthesizing the adaptive vector with the synthesis filter, and calculating a second matrix by taking the outer product of the time-reversed synthesized signal with itself; and
means for generating a third matrix by subtracting the second matrix from the first matrix and calculating the distortion.
20. A communication device comprising a CELP-type speech signal encoder of claim 4.
21. A CELP-type speech signal decoder, the CELP-type speech signal decoder comprising:
a diffusion vector generation device for the excitation vector generation device;
a noise codebook that selects a diffusion pattern and generates a pulse vector according to a noise code number, the noise code number determining a diffusion pattern combination number and a pulse vector combination number;
a synthesis filter for generating a synthesized speech using the excitation vector outputted from the excitation vector generation device as a noise code vector; and
an adaptive codebook storing adaptive codevectors;
the diffusion vector generation device includes:
a pulse vector generating unit that generates a pulse vector having a unit pulse with polarity at a certain element of the vector axis;
a diffusion pattern storage unit storing a plurality of diffusion patterns;
a selection unit that selects one diffusion pattern from the plurality of diffusion patterns stored in the diffusion pattern storage unit; and
a pulse vector diffusion unit that convolves the selected diffusion pattern with the pulse vector to generate a diffusion vector.
22. The CELP-type speech signal decoder according to claim 21, wherein a diffusion pattern is stored in a storage unit in the excitation vector generation apparatus, the diffusion pattern being obtained in advance by learning so as to reduce the quantization distortion produced when the noise source information vector is quantized.
23. The CELP-type speech signal decoder of claim 22, wherein the storage unit in the excitation vector generation means stores at least one diffusion pattern derived by learning for each channel.
24. The CELP-type speech signal decoder of claim 21, wherein, in each channel, at least one of the diffusion patterns stored in the storage unit in the excitation vector generation means is a random pattern.
25. The CELP-type speech signal decoder according to claim 21, wherein, in each channel, at least one of the diffusion patterns stored in the storage unit in the excitation vector generation apparatus is a diffusion pattern obtained in advance by learning so as to reduce the quantization distortion produced when a noise source information vector is quantized, and at least one of the diffusion patterns is a random pattern.
26. A communication device comprising the CELP-type voice signal decoder of claim 21.
27. A diffusion vector generation method used in an excitation vector generation device for a CELP-type speech signal encoding device/speech signal decoding device, comprising the steps of:
providing a pulse vector having a unit pulse with polarity at an element of the vector axis;
providing a diffusion pattern selected from a plurality of diffusion patterns; and
convolving the selected diffusion pattern with the provided pulse vector to generate a diffusion vector.
28. The diffusion vector generation method of claim 27, wherein
the step of providing the pulse vector provides a pulse vector to each of N channels; and
the step of providing a diffusion pattern provides a plurality of diffusion patterns for each of the N channels.
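Claims 1, 2, 27 and 28 describe convolving a per-channel pulse vector with a selected diffusion pattern and combining the channels. A minimal sketch follows. The equation of claim 2 appears only as an image, so the sketch assumes the excitation vector is the element-wise sum of the N diffusion vectors. It also truncates the convolution to the vector length. Both are assumptions, and the names are hypothetical.

```python
def diffuse(pulse_vector, pattern):
    """Diffusion vector: pulse vector convolved with a diffusion pattern,
    truncated to the vector length (truncation is an assumption)."""
    n = len(pulse_vector)
    out = [0.0] * n
    for k, pulse in enumerate(pulse_vector):
        if pulse == 0:
            continue  # only the few signed pulses contribute
        for j, weight in enumerate(pattern):
            if k + j >= n:
                break
            out[k + j] += pulse * weight
    return out


def excitation_from_channels(pulse_vectors, patterns, choice):
    """Assumed claim-2 form: c(n) = sum over channels i of ci(n).

    pulse_vectors[i]: pulse vector of channel i
    patterns[i]: the M stored diffusion patterns of channel i
    choice[i]: index of the pattern selected for channel i
    """
    diffused = [diffuse(pv, patterns[i][choice[i]]) for i, pv in enumerate(pulse_vectors)]
    return [sum(samples) for samples in zip(*diffused)]
```

For example, `diffuse([0, 1, 0, 0, -1], [1.0, 0.5, -0.25])` spreads the two pulses into `[0.0, 1.0, 0.5, -0.25, -1.0]`. The tail of the second pulse falls outside the vector and is dropped.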
CNB988015560A 1997-02-13 1998-10-22 Speech encoder and speech decoder Expired - Lifetime CN100367347C (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
JP9028941A JPH10228491A (en) 1997-02-13 1997-02-13 Logic verification device
JP289412/1997 1997-10-22
JP289412/97 1997-10-22
JP295130/1997 1997-10-28
JP295130/97 1997-10-28
JP85717/1998 1998-03-31
JP85717/98 1998-03-31

Related Child Applications (9)

Application Number Title Priority Date Filing Date
CNB2005100062028A Division CN100349208C (en) 1997-10-22 1998-10-22 Diffusion vector generation method and diffusion vector generation device
CN2007101529972A Division CN101174412B (en) 1997-10-22 1998-10-22 Sound encoder and sound decoder
CN2007103073165A Division CN101202045B (en) 1997-10-22 1998-10-22 Sound encoder and sound decoder
CN2007103073150A Division CN101202044B (en) 1997-10-22 1998-10-22 Sound encoder and sound decoder
CN200710307317XA Division CN101202046B (en) 1997-10-22 1998-10-22 Sound encoder and sound decoder
CN2007103073184A Division CN101202047B (en) 1997-10-22 1998-10-22 Voice signal encoder and voice signal decoder
CN2006100048275A Division CN1808569B (en) 1997-10-22 1998-10-22 Speech Coder, Orthogonal Retrieval Method and CELP Speech Coding Method
CN2007101529987A Division CN101174413B (en) 1997-10-22 1998-10-22 Speech encoder and speech decoder
CN2007103073381A Division CN101221764B (en) 1997-10-22 1998-10-22 Sound encoder and sound decoder

Publications (2)

Publication Number Publication Date
CN1242860A CN1242860A (en) 2000-01-26
CN100367347C true CN100367347C (en) 2008-02-06

Family

ID=12262442

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB988015560A Expired - Lifetime CN100367347C (en) 1997-02-13 1998-10-22 Speech encoder and speech decoder

Country Status (2)

Country Link
JP (1) JPH10228491A (en)
CN (1) CN100367347C (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7792670B2 (en) * 2003-12-19 2010-09-07 Motorola, Inc. Method and apparatus for speech coding
US7761822B2 (en) * 2007-03-19 2010-07-20 Fujitsu Limited File information generating method, file information generating apparatus, and storage medium storing file information generation program
JP5229119B2 (en) 2009-06-10 2013-07-03 富士通株式会社 Model generation program, method and apparatus
CN102598124B (en) * 2009-10-30 2013-08-28 松下电器产业株式会社 Encoder, decoder and methods thereof
CN116781180B (en) * 2023-06-05 2023-11-10 广州市高科通信技术股份有限公司 PCM channel capacity expansion method and capacity expansion system
CN117577121B (en) * 2024-01-17 2024-04-05 清华大学 Audio encoding and decoding method and device, storage medium and equipment based on diffusion model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02280200A (en) * 1989-04-21 1990-11-16 Nec Corp Voice coding and decoding system
JPH02282800A (en) * 1989-04-25 1990-11-20 Nec Corp Sound encoding system
JPH06202699A (en) * 1992-09-29 1994-07-22 Mitsubishi Electric Corp Speech coding apparatus, speech decoding apparatus, and speech coding / decoding method
JPH088753A (en) * 1994-06-16 1996-01-12 Nippon Telegr & Teleph Corp <Ntt> Vector code decoding method
JPH09160536A (en) * 1995-12-08 1997-06-20 Columbia Onkyo Kogyo Kk Keyed instrument

Also Published As

Publication number Publication date
JPH10228491A (en) 1998-08-25
CN1242860A (en) 2000-01-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1025417

Country of ref document: HK

ASS Succession or assignment of patent right

Owner name: INTELLECTUAL PROPERTY BRIDGE NO. 1 CO., LTD.

Free format text: FORMER OWNER: MATSUSHITA ELECTRIC INDUSTRIAL CO, LTD.

Effective date: 20140606

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20140606

Address after: Tokyo, Japan

Patentee after: GODO KAISHA IP BRIDGE 1

Address before: Kadoma City, Osaka, Japan

Patentee before: Matsushita Electric Industrial Co., Ltd.

CX01 Expiry of patent term

Granted publication date: 20080206