
HK1212087B - Systems and methods for mitigating potential frame instability - Google Patents


Info

Publication number
HK1212087B
Authority
HK
Hong Kong
Prior art keywords
frame
line spectral
spectral frequency
vector
frequency vector
Prior art date
Application number
HK15112648.4A
Other languages
Chinese (zh)
Other versions
HK1212087A1 (en)
Inventor
Subasingha Shaminda Subasingha
Venkatesh Krishnan
Vivek Rajendran
Original Assignee
QUALCOMM Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 14/016,004 (granted as US 9,842,598 B2)
Application filed by QUALCOMM Incorporated
Publication of HK1212087A1
Publication of HK1212087B

Description

Systems and methods for mitigating potential frame instability
Related application
The present application is related to and claims priority from U.S. Provisional Patent Application No. 61/767,431, "SYSTEMS AND METHODS FOR CORRECTING POTENTIAL SPECTRAL FREQUENCY INSTABILITY," filed February 21, 2013.
Technical Field
The present invention relates generally to electronic devices. More specifically, the present invention relates to systems and methods for mitigating potential frame instability.
Background
In recent decades, the use of electronic devices has become widespread. In particular, advances in electronic technology have reduced the cost of increasingly complex and useful electronic devices. Cost reduction and consumer demand have caused the use of electronic devices to proliferate such that they are practically ubiquitous in modern society. As the use of electronic devices has expanded, so has demand for new and improved features. More specifically, electronic devices that perform new functions and/or that perform functions faster, more efficiently, or with higher quality are often sought after.
Some electronic devices (e.g., mobile phones, smart phones, audio recorders, camcorders, computers, etc.) utilize audio signals. These electronic devices may encode, store, and/or transmit audio signals. For example, a smart phone may obtain, encode, and transmit voice signals for a telephone call, while another smart phone may receive and decode the voice signals.
However, there are particular challenges in the encoding, transmission and decoding of audio signals. For example, an audio signal may be encoded in order to reduce the amount of bandwidth required to transmit the audio signal. When a portion of an audio signal is lost in transmission, it may be difficult to render an accurately decoded audio signal. As can be appreciated from this discussion, systems and methods that improve decoding may be beneficial.
Disclosure of Invention
A method for mitigating potential frame instability by an electronic device is described. The method includes obtaining a frame temporally subsequent to an erased frame. The method also includes determining whether the frame is potentially unstable. The method further includes applying a substitute weighting value to generate a stable frame parameter if the frame is potentially unstable. The frame parameter may be a frame mid line spectral frequency vector. The method may include applying the received weighting vector to generate a current frame mid line spectral frequency vector.
The substitute weighting value may be between 0 and 1. Generating the stable frame parameter may include applying the substitute weighting value to a current frame end line spectral frequency vector and a previous frame end line spectral frequency vector. For example, generating the stable frame parameter may include determining a substitute current frame mid line spectral frequency vector equal to the product of the current frame end line spectral frequency vector and the substitute weighting value, plus the product of the previous frame end line spectral frequency vector and the difference of one minus the substitute weighting value. The substitute weighting value may be selected based on at least one of a classification of two frames and a line spectral frequency difference between the two frames.
Determining whether the frame is potentially unstable may be based on whether the current frame mid line spectral frequency is ordered according to a rule prior to any reordering. Determining whether the frame is potentially unstable may be based on whether the frame is within a threshold number of frames after the erased frame. Determining whether the frame is potentially unstable may be based on whether any frame between the frame and the erased frame utilizes non-predictive quantization.
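The three stability criteria and the substitute-weighting step described above can be illustrated with a short sketch. This is a minimal illustration under assumed names and an assumed substitute weighting value of 0.6 (the text only requires a value between 0 and 1), not the claimed implementation:

```python
import numpy as np

def is_potentially_unstable(mid_lsf_before_reorder, frames_since_erasure,
                            threshold, nonpredictive_seen, delta):
    """A frame is treated as potentially unstable when it falls within a
    threshold number of frames after an erasure, no intervening frame used
    non-predictive quantization, and the received mid LSF vector was not
    already in order (minimum separation delta) before any reordering."""
    if frames_since_erasure > threshold or nonpredictive_seen:
        return False
    return bool(np.any(np.diff(mid_lsf_before_reorder) < delta))

def stabilized_mid_lsf(end_curr, end_prev, substitute_w=0.6):
    """Apply a substitute weighting value in [0, 1]: the stable mid LSF
    vector equals substitute_w times the current frame end LSF vector plus
    (1 - substitute_w) times the previous frame end LSF vector, so every
    dimension stays bounded by the two end vectors (no extrapolation)."""
    return substitute_w * end_curr + (1.0 - substitute_w) * end_prev
```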
An electronic device for mitigating potential frame instability is also described. The electronic device includes a frame parameter determination circuit that obtains a frame that temporally follows an erased frame. The electronic device also includes a stability determination circuit coupled to the frame parameter determination circuit. The stability determination circuit determines whether the frame is potentially unstable. The electronic device further includes a weighting value substitution circuit coupled to the stability determination circuit. The weighting value substitution circuit applies a substitute weighting value to generate a stable frame parameter if the frame is potentially unstable.
A computer-program product for mitigating potential frame instability is also described. The computer-program product includes a non-transitory tangible computer-readable medium having instructions thereon. The instructions include code for causing an electronic device to obtain a frame temporally subsequent to an erased frame. The instructions also include code for causing the electronic device to determine whether the frame is potentially unstable. The instructions further include code for causing the electronic device to apply a substitute weighting value to generate a stable frame parameter if the frame is potentially unstable.
An apparatus for mitigating potential frame instability is also described. The apparatus includes means for obtaining a frame temporally subsequent to an erased frame. The apparatus also includes means for determining whether the frame is potentially unstable. The apparatus further includes means for applying a substitute weighting value to generate a stable frame parameter if the frame is potentially unstable.
Drawings
FIG. 1 is a block diagram illustrating a general example of an encoder and decoder;
FIG. 2 is a block diagram illustrating an example of a basic implementation of an encoder and decoder;
FIG. 3 is a block diagram illustrating an example of a wideband speech encoder and a wideband speech decoder;
FIG. 4 is a block diagram illustrating a more specific example of an encoder;
FIG. 5 is a diagram illustrating an example of a frame over time;
FIG. 6 is a flow diagram illustrating one configuration of a method for encoding speech signals by an encoder;
FIG. 7 is a diagram illustrating an example of Line Spectral Frequency (LSF) vector determination;
FIG. 8 includes two diagrams illustrating examples of LSF interpolation and extrapolation;
FIG. 9 is a flow diagram illustrating one configuration of a method for decoding an encoded speech signal by a decoder;
FIG. 10 is a diagram illustrating one example of clustering LSF dimensions;
FIG. 11 is a graph illustrating an example of artifacts due to clustered LSF dimensions;
FIG. 12 is a block diagram illustrating one configuration of an electronic device configured for mitigating potential frame instability;
FIG. 13 is a flow diagram illustrating one configuration of a method for mitigating potential frame instability;
FIG. 14 is a flow diagram illustrating a more specific configuration of a method for mitigating potential frame instability;
FIG. 15 is a flow diagram illustrating another more particular configuration of a method for mitigating potential frame instability;
FIG. 16 is a flow diagram illustrating another more particular configuration of a method for mitigating potential frame instability;
FIG. 17 is a graph illustrating an example of synthesizing a speech signal;
FIG. 18 is a block diagram illustrating one configuration of a wireless communication device in which systems and methods for mitigating potential frame instability may be implemented; and
FIG. 19 illustrates various components that may be used in an electronic device.
Detailed Description
Various configurations are now described with reference to the figures, where like reference numbers may indicate functionally similar elements. The systems and methods as generally described and illustrated in the figures may be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of several configurations, such as those represented in the figures, is not intended to limit the scope, as claimed, but is merely representative of systems and methods.
Fig. 1 is a block diagram illustrating a general example of an encoder 104 and a decoder 108. The encoder 104 receives a speech signal 102. The speech signal 102 may be a speech signal in any frequency range. For example, the speech signal 102 may be a full-band signal with an approximate frequency range of 0 kilohertz (kHz) to 24 kHz, an ultra-wideband signal with an approximate frequency range of 0 kHz to 16 kHz, a wideband signal with an approximate frequency range of 0 kHz to 8 kHz, a narrowband signal with an approximate frequency range of 0 kHz to 4 kHz, a low frequency signal with an approximate frequency range of 50 hertz (Hz) to 300 Hz, or a high frequency signal with an approximate frequency range of 4 kHz to 8 kHz. Other possible frequency ranges for the speech signal 102 include 300 Hz to 3400 Hz (e.g., the frequency range of the public switched telephone network (PSTN)), 14 kHz to 20 kHz, 16 kHz to 20 kHz, and 16 kHz to 32 kHz. In some configurations, the speech signal 102 may be sampled at 16 kHz and may have an approximate frequency range of 0 kHz to 8 kHz.
The encoder 104 encodes the speech signal 102 to produce an encoded speech signal 106. In general, the encoded speech signal 106 includes one or more parameters representative of the speech signal 102. One or more of the parameters may be quantized. Examples of the one or more parameters include filtering parameters (e.g., weighting factors, Line Spectral Frequencies (LSFs), Line Spectral Pairs (LSPs), Immittance Spectral Frequencies (ISFs), Immittance Spectral Pairs (ISPs), partial correlation (PARCOR) coefficients, reflection coefficients, and/or log-area-ratio values (log-area-ratio values), etc.), and parameters included in the encoded excitation signal (e.g., gain factors, adaptive codebook indices, adaptive codebook gains, fixed codebook indices, and/or fixed codebook gains, etc.). The parameters may correspond to one or more frequency bands. Decoder 108 decodes encoded speech signal 106 to produce decoded speech signal 110. For example, decoder 108 constructs decoded speech signal 110 based on one or more parameters included in encoded speech signal 106. Decoded speech signal 110 may be an approximate reproduction of original speech signal 102.
The encoder 104 may be implemented in hardware (e.g., circuitry), software, or a combination of both. For example, the encoder 104 may be implemented as an Application Specific Integrated Circuit (ASIC) or a processor with instructions. Similarly, the decoder 108 may be implemented in hardware (e.g., circuitry), software, or a combination of both. For example, the decoder 108 may be implemented as an Application Specific Integrated Circuit (ASIC) or a processor with instructions. The encoder 104 and the decoder 108 may be implemented on separate electronic devices or on the same electronic device.
Fig. 2 is a block diagram illustrating an example of a basic implementation of an encoder 204 and a decoder 208. The encoder 204 may be one example of the encoder 104 described in connection with fig. 1. The encoder 204 may include an analysis module 212, a coefficient transform 214, a quantizer A 216, an inverse quantizer A 218, an inverse coefficient transform A 220, an analysis filter 222, and a quantizer B 224. One or more of the components of the encoder 204 and/or the decoder 208 may be implemented in hardware (e.g., circuitry), software, or a combination of both.
The encoder 204 receives the speech signal 202. It should be noted that the speech signal 202 may include any frequency range (e.g., an entire band of speech frequencies or a sub-band of speech frequencies) as described above in connection with fig. 1.
In this example, the analysis module 212 encodes the spectral envelope of the speech signal 202 as a set of linear prediction (LP) coefficients (e.g., coefficients of an analysis filter A(z), which may be applied to produce an all-pole synthesis filter 1/A(z), where z is a complex variable). The analysis module 212 typically processes the input signal as a series of non-overlapping frames of the speech signal 202, with a new set of coefficients being calculated for each frame or sub-frame. In some configurations, the frame period may be a period over which the speech signal 202 may be expected to be locally stationary. One common example of a frame period is 20 milliseconds (ms) (e.g., equivalent to 160 samples at a sampling rate of 8 kHz). In one example, the analysis module 212 is configured to calculate a set of ten linear prediction coefficients to characterize the formant structure of each 20 ms frame. It is also possible to implement the analysis module 212 to process the speech signal 202 as a series of overlapping frames.
The analysis module 212 may be configured to analyze the samples of each frame directly, or the samples may first be weighted according to a windowing function (e.g., a Hamming window). The analysis may also be performed over a window that is larger than the frame, such as a 30 ms window. This window may be symmetric (e.g., 5-20-5, such that it includes the 5 milliseconds immediately before and after the 20-millisecond frame) or asymmetric (e.g., 10-20, such that it includes the last 10 milliseconds of the preceding frame). The analysis module 212 is typically configured to calculate the linear prediction coefficients using the Levinson-Durbin recursion or the Leroux-Gueguen algorithm. In another implementation, the analysis module may be configured to calculate a set of cepstral coefficients instead of a set of linear prediction coefficients for each frame.
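As a point of reference, the Levinson-Durbin recursion mentioned above can be sketched in a few lines. This is a simplified illustration (NumPy; the windowing and autocorrelation front end in the usage example is an assumption, not the codec's exact processing):

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion: autocorrelation values r[0..order] ->
    prediction-error filter a = [1, a1, ..., a_order] and residual energy
    e. The intermediate k values are the reflection (PARCOR) coefficients
    mentioned earlier in the text."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = float(r[0])
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i-1:0:-1])
        k = -acc / e                 # reflection coefficient for stage i
        a[1:i] += k * a[i-1:0:-1]
        a[i] = k
        e *= 1.0 - k * k             # prediction error shrinks each stage
    return a, e

# Example: one 20 ms frame at 8 kHz (160 samples), Hamming-windowed and
# analyzed with ten coefficients, as in the text.
frame = np.random.randn(160) * np.hamming(160)
r = np.correlate(frame, frame, mode="full")[159:159 + 11]  # lags 0..10
a, e = levinson_durbin(r, order=10)
```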
By quantizing the coefficients, the output rate of the encoder 204 may be reduced significantly, with relatively little impact on reproduction quality. Linear prediction coefficients are difficult to quantize efficiently and are typically mapped to another representation, such as LSFs, for quantization and/or entropy coding. In the example of fig. 2, the coefficient transform 214 transforms the set of coefficients into a corresponding LSF vector (e.g., a set of LSF dimensions). Other one-to-one representations of the coefficients include LSPs, PARCOR coefficients, reflection coefficients, log-area-ratio values, ISPs, and ISFs. For example, ISFs are used in the GSM (Global System for Mobile Communications) AMR-WB (Adaptive Multi-Rate Wideband) codec. For convenience, the terms "line spectral frequency," "LSF dimension," "LSF vector," and related terms may be used to refer to one or more of LSFs, LSPs, ISFs, ISPs, PARCOR coefficients, reflection coefficients, and log-area-ratio values. Typically, the transform between a set of coefficients and a corresponding LSF vector is reversible, but some configurations may include implementations of the encoder 204 in which the transform is not reversible without error.
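The mapping from coefficients to LSFs can be sketched via the standard sum and difference polynomials. This is a rough illustration under the conventional definitions (root finding by np.roots; production codecs use cheaper Chebyshev-domain searches):

```python
import numpy as np

def lpc_to_lsf(a):
    """Map A(z) = [1, a1, ..., aM] to M line spectral frequencies, i.e.
    angles in (0, pi). P(z) = A(z) + z^-(M+1) A(1/z) and
    Q(z) = A(z) - z^-(M+1) A(1/z) have interleaved roots on the unit
    circle; the trivial roots at z = 1 and z = -1 are discarded."""
    a = np.asarray(a, dtype=float)
    m = len(a) - 1
    p = np.concatenate([a, [0.0]]) + np.concatenate([[0.0], a[::-1]])
    q = np.concatenate([a, [0.0]]) - np.concatenate([[0.0], a[::-1]])
    angles = []
    for poly in (p, q):
        ang = np.angle(np.roots(poly))
        angles.extend(t for t in ang if 1e-6 < t < np.pi - 1e-6)
    return np.sort(angles)[:m]

lpc_to_lsf([1.0, -1.2, 0.6])   # ~[0.6435, 1.1593] radians for this A(z)
```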
Quantizer A 216 is configured to quantize the LSF vector (or another coefficient representation). The encoder 204 may output the result of this quantization as the filter parameters 228. Quantizer A 216 typically comprises a vector quantizer that encodes an input vector (e.g., an LSF vector) as an index into a corresponding vector entry in a table or codebook.
As seen in fig. 2, the encoder 204 also generates a residual signal by passing the speech signal 202 through an analysis filter 222 (also referred to as a whitening or prediction-error filter) that is configured according to the set of coefficients. The analysis filter 222 may be implemented as a finite impulse response (FIR) filter or an infinite impulse response (IIR) filter. This residual signal typically contains perceptually important information of the speech frame that is not represented in the filter parameters 228, such as long-term structure related to pitch. Quantizer B 224 is configured to calculate a quantized representation of this residual signal for output as an encoded excitation signal 226. In some configurations, quantizer B 224 includes a vector quantizer that encodes the input vector as an index into a corresponding vector entry in a table or codebook. Additionally or alternatively, quantizer B 224 may be configured to send one or more parameters from which the vector may be generated dynamically at the decoder, rather than retrieved from storage, as in a sparse codebook method. Such methods are used in coding schemes such as algebraic CELP (code-excited linear prediction) and in codecs such as the 3GPP2 (Third Generation Partnership Project 2) EVRC (Enhanced Variable Rate Codec). In some configurations, the encoded excitation signal 226 and the filter parameters 228 may be included in the encoded speech signal 106.
It may be beneficial for the encoder 204 to generate the encoded excitation signal 226 according to the same filter parameter values that will be available at the corresponding decoder 208. In this manner, the resulting encoded excitation signal 226 may account to some extent for non-idealities in those parameter values, such as quantization error. Accordingly, it may be beneficial to configure the analysis filter 222 using the same coefficient values that will be available at the decoder 208. In the basic example of the encoder 204 as illustrated in fig. 2, inverse quantizer A 218 dequantizes the filter parameters 228. Inverse coefficient transform A 220 maps the resulting values back to a corresponding set of coefficients. This set of coefficients is used to configure the analysis filter 222 to produce the residual signal that is quantized by quantizer B 224.
Some implementations of the encoder 204 are configured to calculate the encoded excitation signal 226 by identifying one codebook vector that best matches the residual signal among a set of codebook vectors. It should be noted, however, that the encoder 204 may also be implemented to calculate a quantized representation of the residual signal without actually generating the residual signal. For example, the encoder 204 may be configured to generate a corresponding synthesized signal using a number of codebook vectors (e.g., according to a current set of filtering parameters) and select the codebook vector associated with the generated signal that best matches the original speech signal 202 in the perceptual weighting domain.
Decoder 208 may include inverse quantizer B 230, inverse quantizer C 236, inverse coefficient transform B 238, and synthesis filter 234. Inverse quantizer C 236 dequantizes the filter parameters 228 (e.g., LSF vectors), and inverse coefficient transform B 238 transforms the LSF vectors into a set of coefficients (e.g., as described above with reference to inverse quantizer A 218 and inverse coefficient transform A 220 of the encoder 204). Inverse quantizer B 230 dequantizes the encoded excitation signal 226 to produce an excitation signal 232. Based on the coefficients and the excitation signal 232, the synthesis filter 234 synthesizes the decoded speech signal 210. In other words, the synthesis filter 234 is configured to spectrally shape the excitation signal 232 according to the dequantized coefficients to generate the decoded speech signal 210. In some configurations, the decoder 208 may also provide the excitation signal 232 to another decoder, which may use the excitation signal 232 to derive an excitation signal of another frequency band (e.g., a high frequency band). In some implementations, the decoder 208 may be configured to provide additional information about the excitation signal 232, such as spectral tilt, pitch gain and lag, and speech mode, to another decoder.
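In code, the analysis/synthesis relationship amounts to a pair of complementary filters. A minimal sketch (SciPy; the helper names are illustrative only):

```python
import numpy as np
from scipy.signal import lfilter

def analyze(speech, a):
    """Whitening (prediction-error) filter A(z): speech -> residual."""
    return lfilter(a, [1.0], speech)

def synthesize(excitation, a):
    """All-pole filter 1/A(z): spectrally shapes the excitation to
    reconstruct speech, as the synthesis filter 234 does."""
    return lfilter([1.0], a, excitation)

a = np.array([1.0, -0.9])         # toy first-order prediction-error filter
speech = np.random.randn(160)
residual = analyze(speech, a)
assert np.allclose(synthesize(residual, a), speech)   # exact round trip
```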
The system of the encoder 204 and the decoder 208 is a basic example of an analysis-by-synthesis speech codec. Codebook-excited linear prediction (CELP) coding is one popular family of analysis-by-synthesis coding. Implementations of such coders may perform waveform coding of the residual, including operations such as selection of entries from fixed and adaptive codebooks, error minimization operations, and/or perceptual weighting operations. Other implementations of analysis-by-synthesis coding include mixed excitation linear prediction (MELP), algebraic CELP (ACELP), relaxed CELP (RCELP), regular pulse excitation (RPE), multi-pulse excitation (MPE), multi-pulse CELP (MP-CELP), and vector sum excited linear prediction (VSELP) coding. Related coding methods include multi-band excitation (MBE) and prototype waveform interpolation (PWI) coding. Examples of standardized analysis-by-synthesis speech codecs include the ETSI (European Telecommunications Standards Institute)-GSM full rate codec (GSM 06.10), which uses residual excited linear prediction (RELP); the GSM enhanced full rate codec (ETSI-GSM 06.60); the ITU (International Telecommunication Union) standard 11.8 kilobits per second (kbps) G.729 Annex E coder; the IS (Interim Standard)-641 codec for IS-136 (a time division multiple access scheme); the GSM adaptive multi-rate (GSM-AMR) codec; and the 4GV™ (Fourth-Generation Vocoder™) codec (QUALCOMM Incorporated, San Diego, Calif.). The encoder 204 and corresponding decoder 208 may be implemented according to any of these technologies, or any other speech coding technology (whether known or to be developed) that represents a speech signal as (A) a set of parameters that describe a filter and (B) an excitation signal used to drive the described filter to reproduce the speech signal.
Even after the analysis filter 222 has removed the coarse spectral envelope from the speech signal 202, a large amount of fine harmonic structure may remain, especially for voiced speech. The periodic structure is related to pitch, and different voiced sounds spoken by the same speaker may have different formant structures but similar pitch structures.
Coding efficiency and/or speech quality may be improved by encoding characteristics of the pitch structure using one or more parameter values. An important characteristic of tonal structures is the frequency of the first harmonic (also called the fundamental frequency), which is typically in the range of 60 hertz (Hz) to 400 Hz. This characteristic is typically encoded as the inverse of the fundamental frequency, also known as pitch lag. The pitch lag indicates the number of samples in one pitch period and may be encoded as one or more codebook indices. Speech signals from male speakers tend to have a greater pitch lag than speech signals from female speakers.
Another signal characteristic related to the pitch structure is periodicity, which indicates the strength of the harmonic structure, or in other words, the degree to which the signal is harmonic or non-harmonic. Two typical indicators of periodicity are zero crossings and the normalized autocorrelation function (NACF). Periodicity may also be indicated by the pitch gain, which is typically encoded as a codebook gain (e.g., a quantized adaptive codebook gain).
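A minimal sketch of the normalized autocorrelation measure of periodicity (simplified; a real codec searches lags over a range and smooths the result):

```python
import numpy as np

def nacf(frame, lag):
    """Normalized autocorrelation at a candidate pitch lag: values near 1
    indicate a strongly harmonic (voiced) frame; values near 0 indicate a
    noise-like (unvoiced) frame."""
    x, y = frame[lag:], frame[:len(frame) - lag]
    denom = np.sqrt(np.dot(x, x) * np.dot(y, y))
    return float(np.dot(x, y) / denom) if denom > 0.0 else 0.0

def estimate_pitch_lag(frame, lo=20, hi=133):
    """Pick the lag with the highest NACF; 20..133 samples corresponds to
    roughly 400 Hz..60 Hz at an 8 kHz sampling rate."""
    return max(range(lo, hi + 1), key=lambda lag: nacf(frame, lag))
```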
Encoder 204 may include one or more modules configured to encode long-term harmonic structures of speech signal 202. In some methods of CELP coding, the encoder 204 includes an open-loop Linear Predictive Coding (LPC) analysis module that encodes short-term characteristics or coarse spectral envelopes, followed by a closed-loop long-term prediction analysis stage that encodes fine pitch or harmonic structures. The short-term characteristics are encoded as coefficients (e.g., filter parameters 228) and the long-term characteristics are encoded as values of parameters such as pitch lag and pitch gain. For example, the encoder 204 may be configured to output the encoded excitation signal 226 in a form that includes one or more codebook indices (e.g., a fixed codebook index and an adaptive codebook index) and corresponding gain values. The calculation of this quantized representation of the residual signal (e.g., by quantizer B224) may include selecting these indices and calculating these values. The encoding of the pitch structure may also include interpolation of the pitch prototype waveform, the operation of which may include calculating the difference between successive pitch pulses. For frames corresponding to unvoiced speech, which are typically noise-like and unstructured, modeling of long-term structures may be disabled.
Some implementations of the decoder 208 may be configured to output the excitation signal 232 to another decoder (e.g., a high-band decoder) after the long-term structure (pitch or harmonic structure) has been restored. For example, such a decoder may be configured to output the excitation signal 232 as a dequantized version of the encoded excitation signal 226. Of course, it is also possible to implement the decoder 208 such that another decoder performs inverse quantization of the encoded excitation signal 226 to obtain the excitation signal 232.
FIG. 3 is a block diagram illustrating an example of wideband speech encoder 342 and wideband speech decoder 358. One or more components of wideband speech encoder 342 and/or wideband speech decoder 358 may be implemented in hardware (e.g., circuitry), software, or a combination of both. Wideband speech encoder 342 and wideband speech decoder 358 may be implemented on separate electronic devices or on the same electronic device.
The wideband speech encoder 342 includes a filter bank A 344, a first band encoder 348, and a second band encoder 350. Filter bank A 344 is configured to filter the wideband speech signal 340 to generate a first frequency band signal 346a (e.g., a narrowband signal) and a second frequency band signal 346b (e.g., a high frequency signal).
First frequency band encoder 348 is configured to encode first frequency band signal 346a to generate filter parameters 352 (e.g., Narrowband (NB) filter parameters) and an encoded excitation signal 354 (e.g., an encoded narrowband excitation signal). In some configurations, the first band encoder 348 may generate the filter parameters 352 and the encoded excitation signal 354 as a codebook index or in another quantized form. In some configurations, the first band encoder 348 may be implemented in accordance with the encoder 204 described in conjunction with fig. 2.
The second band encoder 350 is configured to encode a second band signal 346b (e.g., a high frequency signal) according to information in the encoded excitation signal 354 to generate second band coding parameters 356 (e.g., high frequency coding parameters). The second band encoder 350 may be configured to generate the second band coding parameters 356 as codebook indices or in another quantized form. One particular example of wideband speech encoder 342 is configured to encode wideband speech signal 340 at a rate of approximately 8.55kbps, with approximately 7.55kbps for filter parameters 352 and encoded excitation signal 354, and approximately 1kbps for second band coding parameters 356. In some implementations, the filter parameters 352, the encoded excitation signal 354, and the second band coding parameters 356 may be included in the encoded speech signal 106.
In some configurations, the second band encoder 350 may be implemented similar to the encoder 204 described in connection with fig. 2. For example, the second band encoder 350 may generate the second band filter parameters (e.g., as part of the second band coding parameters 356), as described in connection with the encoder 204 (described in connection with fig. 2). However, the second band encoder 350 may be different in some aspects. For example, the second band encoder 350 may include a second band excitation generator that may generate a second band excitation signal based on the encoded excitation signal 354. Second band encoder 350 may utilize the second band excitation signal to generate a synthesized second band signal and determine a second band gain factor. In some configurations, the second band encoder 350 may quantize the second band gain factors. Thus, examples of second band coding parameters 356 include second band filter parameters and quantized second band gain factors.
It may be beneficial to combine the filter parameters 352, the encoded excitation signal 354, and the second band coding parameters 356 in a single bitstream. For example, it may be beneficial to multiplex the encoded signals together for transmission (e.g., over a wired, optical, or wireless transmission channel) or storage (as encoded wideband speech signals). In some configurations, wideband speech encoder 342 includes a multiplexer configured to combine filter parameters 352, encoded excitation signal 354, and second band coding parameters 356 into a multiplexed signal. The filter parameters 352, the encoded excitation signal 354, and the second band coding parameters 356 may be examples of parameters included in the encoded speech signal 106 as described in connection with FIG. 1.
In some implementations, an electronic device that includes wideband speech encoder 342 may also include circuitry configured to transmit the multiplexed signal in a transmission channel, such as a wired, optical, or wireless channel. Such electronic devices may also be configured to perform one or more channel coding operations on the signal, such as error correction coding (e.g., rate-compatible convolutional coding) and/or error detection coding (e.g., cyclic redundancy coding), and/or one or more layers of network protocol coding (e.g., ethernet, transmission control protocol/internet protocol (TCP/IP), cdma2000, etc.).
It may be beneficial for the multiplexer to be configured to embed the filter parameters 352 and the encoded excitation signal 354 as a separable substream of the multiplexed signal, such that the filter parameters 352 and the encoded excitation signal 354 can be recovered and decoded independently of another portion of the multiplexed signal (e.g., high frequency and/or low frequency signals). For example, the multiplexed signal may be arranged such that the filter parameters 352 and the encoded excitation signal 354 can be recovered by stripping out the second band coding parameters 356. One potential benefit of such a feature is to avoid the need to transcode the second band coding parameters 356 before passing them to a system that supports decoding of the filter parameters 352 and the encoded excitation signal 354 but does not support decoding of the second band coding parameters 356.
The wideband speech decoder 358 may include a first band decoder 360, a second band decoder 366, and a filter bank B 368. The first band decoder 360 (e.g., a narrowband decoder) is configured to decode the filter parameters 352 and the encoded excitation signal 354 to generate a decoded first frequency band signal 362a (e.g., a decoded narrowband signal). The second band decoder 366 is configured to decode the second band coding parameters 356 according to an excitation signal 364 (e.g., a narrowband excitation signal) that is based on the encoded excitation signal 354, to generate a decoded second frequency band signal 362b (e.g., a decoded high frequency signal). In this example, the first band decoder 360 is configured to provide the excitation signal 364 to the second band decoder 366. Filter bank B 368 is configured to combine the decoded first frequency band signal 362a and the decoded second frequency band signal 362b to generate a decoded wideband speech signal 370.
Some implementations of wideband speech decoder 358 may include a demultiplexer (not shown) configured to generate filter parameters 352, encoded excitation signal 354, and second band coding parameters 356 from the multiplexed signal. An electronic device that includes wideband speech decoder 358 may include circuitry configured to receive a multiplexed signal from a transmission channel, such as a wired, optical, or wireless channel. Such electronic devices may also be configured to perform one or more channel decoding operations on the signal, such as error correction decoding (e.g., rate-compatible convolutional decoding) and/or error detection decoding (e.g., cyclic redundancy decoding), and/or one or more layers of network protocol decoding (e.g., ethernet, TCP/IP, cdma 2000).
The filter bank a 344 in the wideband speech encoder 342 is configured to filter the input signal according to a split-band scheme to generate a first frequency band signal 346a (e.g., a narrow frequency or low frequency sub-band signal) and a second frequency band signal 346b (e.g., a high frequency or high frequency sub-band signal). The output sub-bands may have equal or unequal bandwidths, and may or may not overlap, depending on the design criteria of a particular application. A configuration of filter bank a 344 that produces more than two sub-bands is also possible. For example, filter bank a 344 may be configured to generate one or more low frequency signals that include components having a frequency range below the frequency range of the first frequency band signal 346a, such as a range of 50 hertz (Hz) to 300 Hz. It is also possible that filter bank a 344 is configured to generate one or more additional high frequency signals that include components having a frequency range higher than the frequency range of second frequency band signal 346b, such as a range of 14 kilohertz (kHz) to 20kHz, 16kHz to 20kHz, or 16kHz to 32 kHz. In such configurations, the wideband speech encoder 342 may be implemented to encode the signals separately, and the multiplexer may be configured to include the additional encoded signals in the multiplexed signal (e.g., as one or more separable portions).
Fig. 4 is a block diagram illustrating a more specific example of encoder 404. In particular, FIG. 4 illustrates a CELP analysis-by-synthesis architecture for low-bit-rate speech coding. In this example, the encoder 404 includes a framing and pre-processing module 472, an analysis module 476, a coefficient transform 478, a quantizer 480, a synthesis filter 484, a summer 488, a perceptual weighted filtering and error minimization module 492, and an excitation estimation module 494. It should be noted that the encoder 404 and/or one or more of the components (e.g., modules) of the encoder 404 may be implemented in hardware (e.g., circuitry), software, or a combination of both.
The speech signal 402 (e.g., input speech s) may be an electronic signal containing speech information. For example, an acoustic speech signal may be captured by a microphone and sampled to produce the speech signal 402. In some configurations, the speech signal 402 may be sampled at 16 kHz. The speech signal 402 may include a range of frequencies as described above in connection with fig. 1.
The speech signal 402 may be provided to a framing and pre-processing module 472. The framing and pre-processing module 472 may divide the speech signal 402 into a series of frames. Each frame may be a particular time period. For example, each frame may correspond to 20 ms of the speech signal 402. The framing and pre-processing module 472 may perform other operations on the speech signal, such as filtering (e.g., one or more of low-pass, high-pass, and band-pass filtering). Accordingly, the framing and pre-processing module 472 may generate a pre-processed speech signal 474 (e.g., s(l), where l is a sample number) based on the speech signal 402.
The analysis module 476 may determine a set of coefficients (e.g., coefficients of a linear prediction analysis filter A(z)). For example, the analysis module 476 may encode the spectral envelope of the pre-processed speech signal 474 as a set of coefficients, as described in connection with fig. 2.
The coefficients may be provided to a coefficient transform 478. Coefficient transform 478 transforms the set of coefficients into corresponding LSF vectors (e.g., LSFs, LSPs, ISFs, ISPs, etc.) as described above in connection with fig. 2.
The LSF vector is provided to the quantizer 480. The quantizer 480 quantizes the LSF vector into a quantized LSF vector 482. For example, the quantizer 480 may perform vector quantization on the LSF vector to generate the quantized LSF vector 482. In some configurations, LSF vectors may be generated and/or quantized on a sub-frame basis. In these configurations, only the quantized LSF vectors corresponding to certain sub-frames (e.g., the last or end sub-frame of each frame) may be sent to the speech decoder. In these configurations, the quantizer 480 may also determine a quantized weighting vector 441. A weighting vector may be used to quantize an LSF vector (e.g., an intermediate LSF vector) that lies between the LSF vectors corresponding to the transmitted sub-frames. For example, the quantizer 480 may determine the index of a codebook or lookup table entry corresponding to the weighting vector that best matches the actual weighting vector. The quantized weighting vector 441 (e.g., the index) may be sent to the speech decoder. The quantized weighting vector 441 and the quantized LSF vector 482 may be examples of the filter parameters 228 described above in connection with fig. 2.
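The weighting-vector quantization described here reduces to a nearest-neighbor codebook search in which only the winning index is transmitted. A sketch under assumed names, with a toy codebook and a mean-squared-error criterion:

```python
import numpy as np

def quantize_weighting_vector(w, codebook):
    """Return the index of the codebook row closest (in mean squared
    error) to the actual weighting vector w; the decoder recovers the
    weighting vector by indexing the same codebook."""
    idx = int(np.argmin(np.sum((codebook - w) ** 2, axis=1)))
    return idx, codebook[idx]

codebook = np.array([[0.3] * 16, [0.5] * 16, [0.7] * 16])   # toy entries
index, w_hat = quantize_weighting_vector(np.full(16, 0.55), codebook)  # index 1
```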
The quantizer 480 may generate a prediction mode indicator 481 that indicates the prediction mode of each frame. The prediction mode indicator 481 may be sent to a decoder. In some configurations, the prediction mode indicator 481 may indicate one of two prediction modes for a frame (e.g., utilizing predictive quantization or non-predictive quantization). For example, the prediction mode indicator 481 may indicate whether a frame was quantized based on a previous frame (e.g., predictive) or not based on a previous frame (e.g., non-predictive). The prediction mode indicator 481 may indicate a prediction mode of a current frame. In some configurations, the prediction mode indicator 481 can be a bit sent to a decoder indicating whether a frame is quantized by predictive quantization or non-predictive quantization.
The quantized LSF vector 482 is provided to a synthesis filter 484. The synthesis filter 484 generates a synthesized speech signal 486 (e.g., reconstructed speech $\hat{s}(l)$, where l is the sample number) based on the quantized LSF vector 482 (e.g., quantized coefficients) and an excitation signal 496. For example, the synthesis filter 484 filters the excitation signal 496 based on the quantized LSF vector 482 (e.g., with 1/A(z)).
The synthesized speech signal 486 is subtracted from the preprocessed speech signal 474 by a summer 488 to produce an error signal 490 (also referred to as a prediction error signal). The error signal 490 is provided to a perceptual weighted filtering and error minimization module 492.
Perceptually weighted filtering and error minimization module 492 generates a weighted error signal 493 based on the error signal 490. For example, not all components (e.g., frequency components) of the error signal 490 equally affect the perceptual quality of the synthesized speech signal. Errors in some bands have a greater impact on speech quality than errors in other bands. Perceptual weighted filtering and error minimization module 492 may generate a weighted error signal 493, the weighted error signal 493 reducing errors in frequency components having a greater impact on speech quality and distributing more errors in other frequency components having a lesser impact on speech quality.
The excitation estimation module 494 generates an excitation signal 496 and an encoded excitation signal 498 based on the output of the perceptually weighted filtering and error minimization module 492. For example, the excitation estimation module 494 estimates one or more parameters characterizing the error signal 490 (e.g., the weighted error signal 493). The encoded excitation signal 498 may include the one or more parameters and may be sent to a decoder. For example, in the CELP method, the excitation estimation module 494 may determine parameters characterizing the error signal 490 (e.g., the weighted error signal 493), such as an adaptive (or pitch) codebook index, an adaptive (or pitch) codebook gain, a fixed codebook index, and a fixed codebook gain. Based on these parameters, excitation estimation module 494 can generate excitation signal 496, with excitation signal 496 provided to synthesis filter 484. In this approach, the adaptive codebook index, the adaptive codebook gain (e.g., quantized adaptive codebook gain), the fixed codebook index, and the fixed codebook gain (e.g., quantized fixed codebook gain) may be sent to the decoder as the encoded excitation signal 498.
The encoded excitation signal 498 may be an example of the encoded excitation signal 226 described above in connection with fig. 2. Thus, the quantized weighted vector 441, the quantized LSF vector 482, the encoded excitation signal 498, and/or the prediction mode indicator 481 may be included in the encoded speech signal 106 as described above in connection with fig. 1.
Fig. 5 is a diagram illustrating an example of frames 503 over time 501. Each frame 503 is divided into a number of sub-frames 505. In the example illustrated in fig. 5, previous frame A 503a includes four sub-frames 505a-505d, previous frame B 503b includes four sub-frames 505e-505h, and current frame C 503c includes four sub-frames 505i-505l. A typical frame 503 may occupy a time period of 20 ms and may include four sub-frames, although frames of different lengths and/or different numbers of sub-frames may be used. Each frame may be denoted by a corresponding frame number n, where n denotes the current frame (e.g., current frame C 503c). Further, each sub-frame may be denoted by a corresponding sub-frame number k.
Fig. 5 may be used to illustrate one example of LSF quantization in an encoder. Each sub-frame k in frame n has a corresponding LSF vector $x_n^k$ for use in the analysis and synthesis filters. The current frame end LSF vector 527 (e.g., the LSF vector of the last sub-frame of the n-th frame) is denoted $x_n^e$. The current frame intermediate LSF vector 525 is denoted $x_n^m$. An "intermediate LSF vector" is an LSF vector that lies between other LSF vectors in time 501 (e.g., between $x_{n-1}^e$ and $x_n^e$). One example of a previous frame end LSF vector 523 is illustrated in fig. 5 and is denoted $x_{n-1}^e$. As used herein, the term "previous frame" may refer to any frame before the current frame (e.g., n-1, n-2, n-3, etc.). Accordingly, a "previous frame end LSF vector" may be an end LSF vector corresponding to any frame before the current frame. In the example illustrated in fig. 5, the previous frame end LSF vector 523 corresponds to the last sub-frame 505h of previous frame B 503b (e.g., frame n-1), which immediately precedes current frame C 503c (e.g., frame n).
Each LSF vector is M-dimensional, where each dimension of the LSF vector corresponds to a single LSF dimension or value. For example, M is typically 16 for wideband speech (e.g., speech sampled at 16 kHz). The i-th LSF dimension of the k-th sub-frame of frame n is denoted $x_{i,n}^k$, where $i = 1, 2, \ldots, M$.
In the quantization process for frame n, the end LSF vector $x_n^e$ may be quantized first. This quantization may be non-predictive (e.g., the previous quantized end LSF vector $\hat{x}_{n-1}^e$ is not used in the quantization process) or predictive (e.g., $\hat{x}_{n-1}^e$ is used in the quantization process). The intermediate LSF vector $x_n^m$ may then be quantized. For example, the encoder may select a weighting vector $w_n$ such that the quantized intermediate LSF vector satisfies equation (1):

$\hat{x}_{i,n}^m = w_{i,n}\,\hat{x}_{i,n}^e + (1 - w_{i,n})\,\hat{x}_{i,n-1}^e, \quad i = 1, 2, \ldots, M \qquad (1)$
Each dimension of the weighting vector $w_n$ corresponds to a single weight and is denoted $w_{i,n}$, where $i = 1, 2, \ldots, M$. It should also be noted that $w_{i,n}$ is not constrained. In particular, if $0 \le w_{i,n} \le 1$, the resulting intermediate LSF dimension $\hat{x}_{i,n}^m$ is bounded by $\hat{x}_{i,n-1}^e$ and $\hat{x}_{i,n}^e$, whereas if $w_{i,n} < 0$ or $w_{i,n} > 1$, the resulting intermediate LSF dimension can lie outside the range $[\hat{x}_{i,n-1}^e, \hat{x}_{i,n}^e]$. The encoder may determine (e.g., select) the weighting vector $w_n$ such that the quantized intermediate LSF vector is closest to the actual intermediate LSF vector at the encoder, based on some distortion measure such as mean squared error (MSE) or log spectral distortion (LSD). During quantization, the encoder transmits the quantization index of the end LSF vector $\hat{x}_n^e$ and the weighting vector $w_n$, enabling the decoder to reconstruct $\hat{x}_n^e$ and $\hat{x}_n^m$.
The sub-frame LSF vectors $\hat{x}_n^k$ may be interpolated based on $\hat{x}_n^m$, $\hat{x}_n^e$, and $\hat{x}_{n-1}^e$ using interpolation factors $\alpha_k$ and $\beta_k$, as given by equation (2):

$\hat{x}_n^k = \alpha_k\,\hat{x}_n^m + \beta_k\,\hat{x}_n^e + (1 - \alpha_k - \beta_k)\,\hat{x}_{n-1}^e \qquad (2)$

Note that $\alpha_k$ and $\beta_k$ satisfy $0 \le (\alpha_k + \beta_k) \le 1$. The interpolation factors $\alpha_k$ and $\beta_k$ may be predetermined values known to both the encoder and the decoder.
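Equations (1) and (2) as reconstructed above translate directly into code. In this sketch the assignment of $\alpha_k$ and $\beta_k$ to the intermediate and end vectors follows that reconstruction and should be read as an assumption:

```python
import numpy as np

def mid_lsf(w, end_curr, end_prev):
    """Equation (1), per dimension: w in [0, 1] interpolates between the
    previous and current frame end LSF vectors; w outside [0, 1]
    extrapolates beyond them."""
    return w * end_curr + (1.0 - w) * end_prev

def subframe_lsf(alpha_k, beta_k, mid_curr, end_curr, end_prev):
    """Equation (2): combination of the current frame intermediate and end
    LSF vectors and the previous frame end LSF vector, with the three
    factors summing to one."""
    return (alpha_k * mid_curr + beta_k * end_curr
            + (1.0 - alpha_k - beta_k) * end_prev)
```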
Fig. 6 is a flow diagram illustrating one configuration of a method 600 for encoding speech signals by the encoder 404. For example, an electronic device including the encoder 404 may perform the method 600. Fig. 6 illustrates the LSF quantization procedure for the current frame n.
The encoder 404 may obtain a previous frame quantized end LSF vector (602). For example, the encoder 404 may quantize the end LSF vector corresponding to the previous frame n-1 (e.g., $\hat{x}_{n-1}^e$) by selecting the codebook vector closest to it.
The encoder 404 may quantize a current frame end LSF vector (e.g., $x_n^e$) (604). If predictive LSF quantization is used, the encoder 404 quantizes the current frame end LSF vector based on the previous frame end LSF vector (604). However, if non-predictive quantization is used for the current frame end LSF vector, quantizing the current frame end LSF vector (604) is not based on the previous frame end LSF vector.
The encoder 404 may determine a weighting vector (e.g., $w_n$) to quantize a current frame intermediate LSF vector (e.g., to obtain $\hat{x}_n^m$) (606). For example, the encoder 404 may select the weighting vector that results in the quantized intermediate LSF vector that is closest to the actual intermediate LSF vector. As illustrated in equation (1), the quantized intermediate LSF vector may be based on the weighting vector, the previous frame end LSF vector, and the current frame end LSF vector.
The encoder 404 may send the quantized current frame end LSF vector and the weighting vector to the decoder (608). For example, the encoder 404 may provide the current frame end LSF vector and the weighting vector to a transmitter on the electronic device, which may transmit the current frame end LSF vector and the weighting vector to a decoder on another electronic device.
Fig. 7 is a diagram illustrating an example of LSF vector determination. Fig. 7 illustrates a previous frame A 703a (e.g., frame n-1) and a current frame B 703b (e.g., frame n) over time 701. In this example, the speech samples are weighted using a weighting filter and then used for LSF vector determination (e.g., calculation). First, the weighting filter at the encoder 404 is used to determine the previous frame end LSF vector (e.g., $x_{n-1}^e$) (707). Second, the weighting filter at the encoder 404 is used to determine the current frame end LSF vector (e.g., $x_n^e$) (709). Third, the weighting filter at the encoder 404 is used to determine (e.g., calculate) the current frame intermediate LSF vector (e.g., $x_n^m$) (711).
Fig. 8 includes two diagrams illustrating examples of LSF interpolation and extrapolation. The horizontal axis in example A 821a illustrates frequency (Hz) 819a, and the horizontal axis in example B 821b likewise illustrates frequency (Hz) 819b. In particular, several LSF dimensions are represented in the frequency domain in fig. 8. It should be noted, however, that there are multiple ways to represent LSF dimensions (e.g., frequency, angle, value, etc.). Thus, the horizontal axes 819a and 819b in examples A 821a and B 821b could be described in other units.
Example A 821a illustrates an interpolation case, considering the first dimension of the LSF vectors. As described above, an LSF dimension refers to a single dimension or value of an LSF vector. Specifically, example A 821a illustrates a previous frame end LSF dimension 813a at 500 Hz (e.g., $\hat{x}_{1,n-1}^e$) and a current frame end LSF dimension 817a at 800 Hz (e.g., $\hat{x}_{1,n}^e$). In example A 821a, a first weight (e.g., $w_{1,n}$ of the weighting vector $w_n$) may be used to quantize and indicate an intermediate LSF dimension 815a of the current frame intermediate LSF vector (e.g., $\hat{x}_{1,n}^m$) that lies between the previous frame end LSF dimension 813a and the current frame end LSF dimension 817a in frequency 819a. For example, if $w_{1,n} = 0.5$, $\hat{x}_{1,n-1}^e = 500$ Hz and $\hat{x}_{1,n}^e = 800$ Hz, then $\hat{x}_{1,n}^m = 0.5 \cdot 800 + (1 - 0.5) \cdot 500 = 650$ Hz, as illustrated in example A 821a.
Example B 821b illustrates an extrapolation case, considering the first dimension of the LSF vectors. Specifically, example B 821b illustrates a previous frame end LSF dimension 813b at 500 Hz (e.g., $\hat{x}_{1,n-1}^e$) and a current frame end LSF dimension 817b at 800 Hz (e.g., $\hat{x}_{1,n}^e$). In example B 821b, a first weight (e.g., $w_{1,n}$ of the weighting vector $w_n$) may be used to quantize and indicate an intermediate LSF dimension 815b of the current frame intermediate LSF vector (e.g., $\hat{x}_{1,n}^m$) that does not lie between the previous frame end LSF dimension 813b and the current frame end LSF dimension 817b in frequency 819b. For example, as illustrated in example B 821b, if $w_{1,n} = 2$, $\hat{x}_{1,n-1}^e = 500$ Hz and $\hat{x}_{1,n}^e = 800$ Hz, then $\hat{x}_{1,n}^m = 2 \cdot 800 + (1 - 2) \cdot 500 = 1100$ Hz.
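Both worked examples follow directly from equation (1) and can be checked numerically:

```python
prev_end, curr_end = 500.0, 800.0
w = 0.5
mid = w * curr_end + (1.0 - w) * prev_end   # 650.0 Hz: interpolation (example A)
w = 2.0
mid = w * curr_end + (1.0 - w) * prev_end   # 1100.0 Hz: extrapolation (example B)
```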
FIG. 9 is a flow diagram illustrating one configuration of a method 900 for decoding an encoded speech signal by a decoder. For example, an electronic device including a decoder may perform method 900.
The decoder may obtain a previous frame dequantized end LSF vector (e.g., $\hat{x}_{n-1}^e$) (902). For example, the decoder may retrieve the dequantized end LSF vector corresponding to the previous frame, which was previously decoded (or estimated, in the case of frame erasure).
The decoder may dequantize the current frame end LSF vector (e.g., $\hat{x}_n^e$) (904). For example, the decoder may dequantize the current frame end LSF vector by looking it up in a codebook or table based on a received LSF vector index (904).
The decoder may determine a current frame intermediate LSF vector (e.g., $\hat{x}_n^m$) based on a weighting vector (e.g., $w_n$) (906). For example, the decoder may receive the weighting vector from the encoder. The decoder may then determine the current frame intermediate LSF vector based on the previous frame end LSF vector, the current frame end LSF vector, and the weighting vector, as illustrated in equation (1) (906). As described above, each LSF vector may have M dimensions or LSF dimensions (e.g., 16 LSF dimensions). A minimum separation should be maintained between consecutive LSF dimensions in an LSF vector in order for the LSF vector to be stable. If multiple LSF dimensions are separated by only the minimum separation (i.e., are clustered), there is a substantial likelihood of an unstable LSF vector. Accordingly, the decoder may reorder the LSF vector if the separation between two or more of its LSF dimensions is less than the minimum value.
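The reordering rule can be sketched as a single pass that pushes each dimension up to at least the minimum separation from its predecessor. Note how one erroneously large dimension drags its successors into a cluster, which is exactly the failure mode analyzed below:

```python
import numpy as np

def reorder_lsf(lsf, delta=100.0):
    """Enforce the minimum separation delta (here in Hz) between
    consecutive LSF dimensions."""
    out = np.array(lsf, dtype=float)
    for i in range(1, len(out)):
        out[i] = max(out[i], out[i - 1] + delta)
    return out

# A badly extrapolated first dimension clusters the following ones:
reorder_lsf([1500.0, 700.0, 900.0])   # -> [1500., 1600., 1700.]
```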
The methods for weighting and for interpolating and/or extrapolating LSF vectors described in connection with figs. 4 through 9 work well under clean channel conditions (e.g., no frame erasures and/or transmission errors). However, this approach may exhibit serious problems when one or more frame erasures occur. An erased frame is a frame that is not received by the decoder or that is received with errors. For example, if the encoded speech signal corresponding to a frame is not received, or is received with errors, that frame is an erased frame.
An example of frame erasure is given below with reference to fig. 5. Assume that previous frame B 503b is an erased frame (e.g., frame n-1 is lost). In this case, the decoder estimates the missing end LSF vector (denoted $\hat{x}_{n-1}^e$) and the missing intermediate LSF vector (denoted $\hat{x}_{n-1}^m$) based on previous frame A 503a (e.g., frame n-2). It is also assumed that frame n is received correctly. The decoder can then calculate the current frame intermediate LSF vector 525 based on $\hat{x}_{n-1}^e$ and $\hat{x}_n^e$ according to equation (1). In the extrapolation case, for a particular LSF dimension j, there is a possibility that the extrapolated LSF dimension lies far outside the frequency of the LSF dimensions used in the extrapolation process in the encoder (e.g., far above $\hat{x}_{j,n}^e$).
The LSF dimensions in each LSF vector may be ordered such that $\hat{x}_{i+1,n}^m \ge \hat{x}_{i,n}^m + \Delta$, where $\Delta$ is the minimum separation (e.g., frequency separation) between two consecutive LSF dimensions. As described above, if some LSF dimension j (e.g., denoted $\hat{x}_{j,n}^m$) is erroneously extrapolated such that it is significantly larger than its correct value, then the subsequent LSF dimensions $\hat{x}_{j+1,n}^m, \hat{x}_{j+2,n}^m, \ldots$ may be recalculated as $\hat{x}_{j+1,n}^m = \hat{x}_{j,n}^m + \Delta$, and so on, even if they were calculated in the decoder as smaller values. In other words, because of the ordering rule applied, the LSF dimensions j+1, j+2, etc., may be recalculated to sit at the minimum allowed distance above the erroneous dimension j. This results in an LSF vector having two or more LSF dimensions located adjacent to each other at the minimum allowed distance. Two or more LSF dimensions separated by only the minimum separation may be referred to as "clustered LSF dimensions." Clustered LSF dimensions may result in unstable LSF dimensions (e.g., unstable sub-frame LSF dimensions) and/or unstable LSF vectors. An unstable LSF dimension corresponds to synthesis filter coefficients that may cause speech artifacts.
In a strict sense, a filter may be unstable if it has at least one pole on or outside the unit circle. In the context of speech coding, and as used herein, the terms "unstable" and "instability" are used in a broader sense. For example, an "unstable LSF dimension" is any LSF dimension that corresponds to synthesis filter coefficients that may cause speech artifacts. An unstable LSF dimension may not necessarily correspond to a pole on or outside the unit circle; rather, LSF dimensions may be "unstable" if their values are too close to each other. This is because LSF dimensions located too close to each other may specify poles of a synthesis filter whose response is highly resonant at some frequencies, producing speech artifacts. For example, unstable quantized LSF dimensions may specify a pole placement of the synthesis filter that results in an undesirable increase in energy. Typically, for LSF dimensions expressed as angles between 0 and $\pi$, an LSF dimension separation of approximately $0.01\pi$ may be maintained. As used herein, an "unstable LSF vector" is a vector that includes one or more unstable LSF dimensions. Further, an "unstable synthesis filter" is a synthesis filter having one or more coefficients (e.g., poles) corresponding to one or more unstable LSF dimensions.
FIG. 10 is a diagram illustrating one example of clustered LSF dimensions 1029. The LSF dimensions are illustrated in frequency (Hz) 1019, but it should be noted that LSF dimensions may alternatively be characterized in other units. The LSF dimensions (e.g., $\hat{x}_{1,n}^m$ 1031a, $\hat{x}_{2,n}^m$ 1031b and $\hat{x}_{3,n}^m$ 1031c) are examples of LSF dimensions included in a current frame intermediate LSF vector after estimation and reordering. For example, following an erased frame, the decoder estimates a previous frame end LSF vector (e.g., $\hat{x}_{n-1}^e$) that may be incorrect. In this case, the first dimension of the current frame intermediate LSF vector (e.g., $\hat{x}_{1,n}^m$ 1031a) may also be incorrect.
The decoder may attempt to reorder the next LSF dimension of the current frame intermediate LSF vector (e.g., $\hat{x}_{2,n}^m$ 1031b). As described above, each successive LSF dimension in an LSF vector may be required to be larger than the preceding one. For example, $\hat{x}_{2,n}^m$ 1031b must be larger than $\hat{x}_{1,n}^m$ 1031a. Thus, the decoder may place it at the minimum separation (e.g., $\Delta$) from $\hat{x}_{1,n}^m$ 1031a. More specifically, $\hat{x}_{2,n}^m = \hat{x}_{1,n}^m + \Delta$. As a result, there may be multiple LSF dimensions separated by only the minimum separation (e.g., $\Delta$ = 100 Hz), such as $\hat{x}_{1,n}^m$ 1031a, $\hat{x}_{2,n}^m$ 1031b and $\hat{x}_{3,n}^m$ 1031c, as illustrated in FIG. 10. Therefore, 1031a, 1031b and 1031c are an example of clustered LSF dimensions 1029. Clustered LSF dimensions can result in an unstable synthesis filter, which in turn can produce speech artifacts in the synthesized speech.
Figure 11 is a graph illustrating an example of artifacts 1135 due to clustered LSF dimensions. More specifically, the graph illustrates an example of artifacts 1135 in a decoded speech signal (e.g., synthesized speech) caused by clustered LSF dimensions applied to a synthesis filter. The horizontal axis of the graph is given in time 1101 (e.g., seconds) and the vertical axis in amplitude 1133 (e.g., a number or value). The amplitude 1133 may be a number represented with a certain number of bits. In some configurations, 16 bits may be utilized to represent a sample of a speech signal, giving a range of values between -32768 and 32767, which corresponds to a normalized range (e.g., values between -1 and +1 in floating point). It should be noted that the amplitude 1133 may be represented differently based on the implementation. In some examples, the value of the amplitude 1133 may correspond to an electromagnetic signal characterized by a voltage (in volts) and/or a current (in amperes).
Interpolating and/or extrapolating LSF vectors between the current and previous frame LSF vectors on a subframe basis is known in speech coding systems. Under erased frame conditions as described in connection with FIGS. 10 and 11, the LSF interpolation and/or extrapolation scheme may generate unstable LSF vectors for certain subframes, which may lead to annoying artifacts in the synthesized speech. When predictive quantization techniques are used in addition to non-predictive techniques for LSF quantization, such artifacts occur more frequently.
Using an increased number of bits to guard against errors and using non-predictive quantization to avoid error propagation are common ways to address the problem. However, introducing additional bits is not possible in a bit-constrained coder, and using non-predictive quantization may reduce speech quality in clean channel conditions (e.g., no erased frames).
The systems and methods disclosed herein may be used to reduce potential frame instability. For example, some configurations of the systems and methods disclosed herein may be applied to reduce speech coding artifacts due to frame instability (generated by predictive quantization of LSF vectors under impaired channels and frame-to-frame interpolation and extrapolation).
Fig. 12 is a block diagram illustrating one configuration of an electronic device 1237 configured for mitigating potential frame instability. The electronics 1237 include a decoder 1208. One or more of the decoders described above may be implemented according to decoder 1208 described in connection with fig. 12. Electronics 1237 also include erased frame detector 1243. Erased frame detector 1243 may be implemented separately from decoder 1208 or may be implemented in decoder 1208. Erased frame detector 1243 detects erased frames (e.g., frames that are not received or received in error), and may provide erased frame indicator 1267 when an erased frame is detected. For example, erased frame detector 1243 may detect erased frames based on one or more of a hash function, a checksum, a repetition code, check bits, a Cyclic Redundancy Check (CRC), etc. It should be noted that one or more of the components included in the electronic device 1237 and/or the decoder 1208 may be implemented in hardware (e.g., circuitry), software, or a combination of both. One or more of the lines or arrows illustrated in the block diagrams herein may indicate coupling (e.g., connection) between components or elements.
Decoder 1208 generates a decoded speech signal 1259 (e.g., a synthesized speech signal) based on the received parameters. Examples of received parameters include quantized LSF vector 1282, quantized weighting vector 1241, prediction mode indicator 1281, and encoded excitation signal 1298. The decoder 1208 includes one or more of an inverse quantizer A 1245, an interpolation module 1249, an inverse coefficient transform 1253, a synthesis filter 1257, a frame parameter determination module 1261, a weighting value substitution module 1265, a stability determination module 1269, and an inverse quantizer B 1273.
The decoder 1208 receives a quantized LSF vector 1282 (e.g., quantized LSF, LSP, ISF, ISP, PARCOR coefficients, reflection coefficients, or log area ratio values) and a quantized weighting vector 1241. The received quantized LSF vector 1282 may correspond to a subset of subframes. For example, quantized LSF vector 1282 may include only the quantized end LSF vector corresponding to the last subframe of each frame. In some configurations, quantized LSF vector 1282 may be an index corresponding to a lookup table or codebook. Additionally or alternatively, quantized weighting vector 1241 may be an index corresponding to a look-up table or codebook.
The electronics 1237 and/or the decoder 1208 may receive the prediction mode indicator 1281 from the encoder. As described above, the prediction mode indicator 1281 indicates a prediction mode of each frame. For example, prediction mode indicator 1281 may indicate one of two or more prediction modes for a frame. More specifically, the prediction mode indicator 1281 may indicate whether predictive quantization or non-predictive quantization is utilized.
When a frame is correctly received, inverse quantizer A 1245 dequantizes the received quantized LSF vector 1282 to produce a dequantized LSF vector 1247. For example, inverse quantizer A 1245 may look up the dequantized LSF vector 1247 based on an index (e.g., quantized LSF vector 1282) corresponding to a lookup table or codebook. Dequantizing the quantized LSF vector 1282 may also be based on the prediction mode indicator 1281. The dequantized LSF vector 1247 may correspond to a subset of subframes (e.g., an end LSF vector corresponding to the last subframe of each frame). Furthermore, inverse quantizer A 1245 dequantizes the quantized weighting vector 1241 to generate a dequantized weighting vector 1239. For example, inverse quantizer A 1245 may look up the dequantized weighting vector 1239 based on an index (e.g., quantized weighting vector 1241) corresponding to a lookup table or codebook.
When the frame is an erased frame, erased frame detector 1243 may provide the erased frame indicator 1267 to inverse quantizer A 1245. When an erased frame occurs, one or more quantized LSF vectors 1282 and/or one or more quantized weighting vectors 1241 may not be received or may contain errors. In this case, inverse quantizer A 1245 may estimate one or more dequantized LSF vectors 1247 (e.g., the end LSF vector of the erased frame) based on one or more LSF vectors from the previous frame (e.g., the frame before the erased frame). Additionally or alternatively, inverse quantizer A 1245 may estimate one or more dequantized weighting vectors 1239 in the presence of erased frames.
The dequantized LSF vector 1247 (e.g., the end LSF vector) may be provided to a frame parameter determination module 1261 and an interpolation module 1249. Further, one or more dequantized weighting vectors 1239 may be provided to the frame parameter determination module 1261. The frame parameter determination module 1261 obtains a frame. For example, the frame parameter determination module 1261 may obtain erased frames (e.g., estimated dequantized weighting vector 1239 and estimated dequantized LSF vector 1247 corresponding to erased frames). The frame parameter determination module 1261 may also obtain a frame following the erased frame (e.g., a correctly received frame). For example, frame parameter determination module 1261 may obtain dequantized weighting vector 1239 and dequantized LSF vector 1247 corresponding to the correctly received frame after the erased frame.
Frame parameter determination module 1261 determines frame parameter A 1263a based on the dequantized LSF vector 1247 and the dequantized weighting vector 1239. One example of frame parameter A 1263a is an intermediate LSF vector (e.g., $\hat{x}_{mid}^n$). For example, the frame parameter determination module may apply the received weighting vector (e.g., dequantized weighting vector 1239) to generate a current frame intermediate LSF vector. More specifically, frame parameter determination module 1261 may determine the current frame intermediate LSF vector according to equation (1), based on the current frame end LSF vector $\hat{x}_{end}^n$, the previous frame end LSF vector $\hat{x}_{end}^{n-1}$ and the current frame weighting vector $w_n$: $\hat{x}_{mid}^n(i) = w_n(i)\,\hat{x}_{end}^n(i) + (1 - w_n(i))\,\hat{x}_{end}^{n-1}(i)$ (1). Other examples of frame parameter A 1263a include LSP vectors and ISP vectors. For example, frame parameter A 1263a may be any parameter estimated based on two end subframe parameters.
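To make equation (1) concrete, the following is a minimal sketch of the weighted interpolation, assuming the LSF vectors are NumPy arrays and the weighting may be a scalar or a per-dimension vector; the function name is illustrative and not drawn from any particular codec implementation.

```python
import numpy as np

def current_frame_mid_lsf(end_lsf_curr, end_lsf_prev, weight):
    """Equation (1): combine the current and previous frame end LSF
    vectors using the received weighting (scalar or per-dimension)."""
    return weight * np.asarray(end_lsf_curr) + (1.0 - weight) * np.asarray(end_lsf_prev)
```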
In some configurations, frame parameter determination module 1261 may determine whether a frame parameter (e.g., the current frame intermediate LSF vector $\hat{x}_{mid}^n$) is ordered according to a rule prior to any reordering. In one example, the frame parameter is the current frame intermediate LSF vector $\hat{x}_{mid}^n$, and the rule may be that the LSF dimensions of the intermediate LSF vector are in increasing order with at least a minimum separation between each pair of LSF dimensions. In this example, frame parameter determination module 1261 may determine whether each LSF dimension in the intermediate LSF vector is in increasing order with at least the minimum separation between each pair. For example, frame parameter determination module 1261 may determine whether $\hat{x}_{mid}^n(i) + \Delta \le \hat{x}_{mid}^n(i+1)$ is true for each LSF dimension $i$.
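A sketch of this ordering check might look like the following, where `min_sep` corresponds to Δ; the name is illustrative.

```python
def is_ordered(lsf, min_sep):
    """Ordering rule: each LSF dimension must exceed the previous one
    by at least the minimum separation (delta)."""
    return all(lsf[i] + min_sep <= lsf[i + 1] for i in range(len(lsf) - 1))
```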
In some configurations, the frame parameter determination module 1261 may provide an ordering indicator 1262 to the stability determination module 1269. The ordering indicator 1262 indicates whether any LSF dimensions (e.g., of the intermediate LSF vector $\hat{x}_{mid}^n$) were out of order and/or not separated by at least the minimum separation Δ prior to any reordering.
In some cases, frame parameter determination module 1261 may reorder the LSF vectors. For example, if frame parameter determination module 1261 determines that the LSF dimensions in the current frame intermediate LSF vector $\hat{x}_{mid}^n$ are not in increasing order and/or do not have at least the minimum separation between each pair of LSF dimensions, then frame parameter determination module 1261 may reorder the LSF dimensions. For example, frame parameter determination module 1261 may reorder the LSF dimensions of the current frame intermediate LSF vector such that, for each LSF dimension that does not meet the criterion, $\hat{x}_{mid}^n(i+1) = \hat{x}_{mid}^n(i) + \Delta$. In other words, frame parameter determination module 1261 may add Δ to an LSF dimension to obtain the location of the next LSF dimension (if the next LSF dimension is not separated by at least Δ). Furthermore, this may be done only for LSF dimensions that are not separated by the minimum separation Δ. As described above, this reordering may result in clustered LSF dimensions in the intermediate LSF vector. Thus, in some cases (e.g., for one or more frames following an erased frame), frame parameter A 1263a may be a reordered LSF vector (e.g., a reordered intermediate LSF vector).
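The reordering described above might be sketched as follows, under the assumption that only dimensions violating the minimum separation are pushed up; note how consecutively pushed-up dimensions end up exactly Δ apart, producing the clustering illustrated in FIG. 10.

```python
def reorder_lsf(lsf, min_sep):
    """Push each offending LSF dimension up to (previous + delta);
    dimensions already separated by at least delta are untouched."""
    out = list(lsf)
    for i in range(1, len(out)):
        if out[i] < out[i - 1] + min_sep:
            out[i] = out[i - 1] + min_sep
    return out
```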
In some configurations, the frame parameter determination module 1261 may be implemented as part of inverse quantizer A 1245. For example, determining the intermediate LSF vector based on the dequantized LSF vector 1247 and the dequantized weighting vector 1239 may be considered part of the dequantization procedure. Frame parameter A 1263a may be provided to the weighting value substitution module 1265 and optionally to the stability determination module 1269.
The stability determination module 1269 may determine whether the frame is potentially unstable. When the stability determination module 1269 determines that the current frame is potentially unstable, the stability determination module 1269 may provide an instability indicator 1271 to the weighted value substitution module 1265. In other words, the instability indicator 1271 indicates that the current frame is potentially unstable.
A potentially unstable frame is a frame having one or more characteristics indicative of a risk of producing speech artifacts. Examples of characteristics indicative of a risk of producing speech artifacts may include whether the frame is within one or more frames following an erased frame, whether any frame between the frame and the erased frame utilizes predictive (or non-predictive) quantization, and/or whether frame parameters are ordered according to a rule prior to any reordering. A potentially unstable frame may correspond to (e.g., may include) one or more unstable LSF vectors. It should be noted that in some cases, a potentially unstable frame may actually be stable. However, it may be difficult to determine whether a frame is definitely stable or definitely unstable without synthesizing the entire frame. Accordingly, the systems and methods disclosed herein may take corrective action to mitigate potentially unstable frames. One benefit of the systems and methods disclosed herein is the detection of potentially unstable frames without synthesizing an entire frame. This may reduce the amount of processing and/or delay required to detect and/or reduce speech artifacts.
In a first approach, the stability determination module 1269 determines whether the current frame (e.g., frame n) is potentially unstable based on whether the current frame is within a threshold number of frames after the erased frame and whether any frames between the erased frame and the current frame utilize predictive (or non-predictive) quantization. The current frame may be correctly received. In this approach, the stability determination module 1269 determines that a frame is potentially unstable if the current frame is received within a threshold number of frames after the erased frame and if no frames utilize non-predictive quantization between the current frame and the erased frame (if present).
The number of frames between the erased frame and the current frame may be determined based on the erased frame indicator 1267. For example, the stability determination module 1269 may maintain a counter that is incremented for each frame after the erased frame. In one configuration, the threshold number of frames after the erased frame may be 1. In this configuration, the next frame after the erased frame is always considered potentially unstable. For example, if the current frame is the next frame after the erased frame (thus, no frame between the current frame and the erased frame utilizes non-predictive quantization), the stability determination module 1269 determines that the current frame is potentially unstable. In this case, the stability determination module 1269 provides an instability indicator 1271 that indicates that the current frame is potentially unstable.
In other configurations, the threshold number of frames after the erased frame may be greater than 1. In these configurations, the stability determination module 1269 may determine whether there is a frame between the current frame and the erased frame that utilizes non-predictive quantization based on the prediction mode indicator 1281. For example, the prediction mode indicator 1281 may indicate whether predictive or non-predictive quantization is used for each frame. If there is a frame between the current frame and the erased frame that uses non-predictive quantization, the stability determination module 1269 may determine that the current frame is stable (e.g., not potentially unstable). In this case, the stability determination module 1269 may not indicate that the current frame is potentially unstable.
In the second approach, the stability determination module 1269 determines whether the current frame (e.g., frame n) is potentially unstable based on whether the current frame is received after an erased frame, whether frame parameter A 1263a is ordered according to the rule prior to any reordering, and whether any frame between the erased frame and the current frame utilizes non-predictive quantization. In this approach, the stability determination module 1269 determines that a frame is potentially unstable if the current frame is obtained after an erased frame, if frame parameter A 1263a is not ordered according to the rule prior to any reordering, and if no frame between the current frame and the erased frame (if any) utilizes non-predictive quantization.
Whether the current frame is received after an erased frame may be determined based on the erased frame indicator 1267. Whether any frame between the erased frame and the current frame utilizes non-predictive quantization may be determined based on the prediction mode indicator, as described above. For example, the stability determination module 1269 determines that the current frame is potentially unstable if it is any number of frames after the erased frame, if no frame between the current frame and the erased frame utilizes non-predictive quantization, and if frame parameter A 1263a is not ordered according to the rule prior to any reordering. In this case, the stability determination module 1269 provides an instability indicator 1271 indicating that the current frame is potentially unstable.
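The two approaches can be summarized schematically as follows; the caller is assumed to track the number of frames since the last erasure, whether any intervening frame used non-predictive quantization, and the ordering indicator, and all names are illustrative.

```python
def is_potentially_unstable(frames_since_erasure, threshold,
                            non_predictive_seen, ordered_before_reorder):
    """Schematic stability determination combining both approaches."""
    if frames_since_erasure is None:
        return False  # no erasure has been observed yet
    if non_predictive_seen:
        return False  # a non-predictive frame stops error propagation
    if frames_since_erasure <= threshold:
        return True   # first approach: within the threshold of an erasure
    # second approach: after an erasure, flag any frame whose mid LSF
    # vector violated the ordering rule prior to reordering
    return not ordered_before_reorder
```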
In some configurations, stability determination module 1269 may obtain the ordering indicator 1262 from frame parameter determination module 1261, which indicates whether frame parameter A 1263a (e.g., the current frame intermediate LSF vector $\hat{x}_{mid}^n$) was ordered according to the rule prior to any reordering. For example, the ordering indicator 1262 may indicate whether any LSF dimensions (e.g., of the intermediate LSF vector $\hat{x}_{mid}^n$) were out of order and/or not separated by at least the minimum separation Δ prior to any reordering.
In some configurations, a combination of the first and second approaches may be implemented. For example, the first approach may be applied to the first frame following an erased frame, and the second approach may be applied to subsequent frames. In this configuration, one or more of the subsequent frames may be indicated as potentially unstable based on the second approach. Other methods of determining potential instability may be based on the energy variation of the impulse response of the synthesis filter derived from the LSF vector, and/or on the energy variation corresponding to different frequency bands of the synthesis filter derived from the LSF vector.
When potential instability is not indicated (e.g., when the current frame is stable), weighting value substitution module 1265 provides or passes frame parameter A 1263a as frame parameter B 1263b to the interpolation module 1249. In one example, frame parameter A 1263a is the current frame intermediate LSF vector $\hat{x}_{mid}^n$, based on the current frame end LSF vector $\hat{x}_{end}^n$, the previous frame end LSF vector $\hat{x}_{end}^{n-1}$ and the received current frame weighting vector $w_n$. When potential instability is not indicated, the current frame intermediate LSF vector may be assumed stable and may be provided to the interpolation module 1249.
If the current frame is potentially unstable, weighting value substitution module 1265 applies a substitute weighting value to generate a stable frame parameter (e.g., a substitute current frame intermediate LSF vector). A "stable frame parameter" is a parameter that will not cause speech artifacts. The substitute weighting value may be a predetermined value that ensures a stable frame parameter (e.g., frame parameter B 1263b). The substitute weighting value may be applied in place of the dequantized weighting vector 1239 (received and/or estimated). More specifically, when the instability indicator 1271 indicates that the current frame is potentially unstable, the weighting value substitution module 1265 applies the substitute weighting value to the dequantized LSF vector 1247 to generate stable frame parameter B 1263b. In this case, frame parameter A 1263a and/or the dequantized weighting vector 1239 of the current frame may be discarded. Thus, when the current frame is potentially unstable, weighting value substitution module 1265 generates frame parameter B 1263b, which substitutes for frame parameter A 1263a.
For example, the weighting value substitution module 1265 may apply a substitute weighting value $w_{substitute}$ to generate a (stable) substitute current frame intermediate LSF vector. Specifically, weighting value substitution module 1265 may apply the substitute weighting value to the current frame end LSF vector and the previous frame end LSF vector. In some configurations, the substitute weighting value $w_{substitute}$ may be a scalar value between 0 and 1. For example, the substitute weighting value $w_{substitute}$ may operate as a substitute weighting vector (e.g., having M dimensions) in which all values are equal to $w_{substitute}$, where $0 \le w_{substitute} \le 1$ (or $0 < w_{substitute} < 1$). Thus, a (stable) substitute current frame intermediate LSF vector may be generated or determined according to equation (3): $\hat{x}_{mid}^n = w_{substitute}\,\hat{x}_{end}^n + (1 - w_{substitute})\,\hat{x}_{end}^{n-1}$ (3).
Using a $w_{substitute}$ between 0 and 1 ensures that, in the case where the end LSF vectors $\hat{x}_{end}^{n-1}$ and $\hat{x}_{end}^n$ are stable, the resulting substitute current frame intermediate LSF vector is stable. In this case, the substitute current frame intermediate LSF vector is one example of a stable frame parameter, since applying coefficients 1255 corresponding to the substitute current frame intermediate LSF vector to the synthesis filter 1257 will not cause speech artifacts in the decoded speech signal 1259. In some configurations, $w_{substitute}$ may be 0.6, which gives slightly larger weight to the current frame end LSF vector (e.g., $\hat{x}_{end}^n$) than to the end LSF vector of the previous (erased) frame (e.g., $\hat{x}_{end}^{n-1}$).
In an alternative configuration, the substitute weighting value may include a substitute weighting vector $w_{substitute}^n$ with individual weights $w_{substitute}^n(i)$, where $i = \{1, 2, \dots, M\}$ and $n$ indicates the current frame. In these configurations, each weight may be between 0 and 1, and the weights may all be different. In these configurations, the substitute weighting value (e.g., substitute weighting vector $w_{substitute}^n$) may be applied as provided in equation (4): $\hat{x}_{mid}^n(i) = w_{substitute}^n(i)\,\hat{x}_{end}^n(i) + (1 - w_{substitute}^n(i))\,\hat{x}_{end}^{n-1}(i)$ (4).
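Because equations (3) and (4) share the form of equation (1), the mitigation step can reuse the same interpolation with the substitute weighting; a sketch, assuming NumPy arrays and the example scalar value of 0.6 mentioned above:

```python
import numpy as np

W_SUBSTITUTE = 0.6  # example value from the text; slightly favors the current frame

def mitigated_mid_lsf(end_lsf_curr, end_lsf_prev, received_weight, unstable):
    """Apply the received weighting normally (equation (1)), but discard
    it in favor of the substitute weighting when the frame is flagged as
    potentially unstable (equations (3)/(4))."""
    w = W_SUBSTITUTE if unstable else received_weight
    return w * np.asarray(end_lsf_curr) + (1.0 - w) * np.asarray(end_lsf_prev)
```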
In some configurations, the substitute weighting values may be static. In other configurations, weight substitution module 1265 may select a substitution weight based on the previous frame and the current frame. For example, different substitute weighting values may be selected based on the classification (e.g., voiced, unvoiced, etc.) of the two frames (e.g., previous and current frames). Additionally or alternatively, different substitute weighting values may be selected based on one or more LSF differences (e.g., differences in LSF filter impulse response energy) between two frames.
The dequantized LSF vector 1247 and frame parameter B 1263b may be provided to the interpolation module 1249. Interpolation module 1249 interpolates the dequantized LSF vector 1247 and frame parameter B 1263b in order to generate subframe LSF vectors (e.g., subframe LSF vectors $\hat{x}_k^n$ for the current frame).
In one example, frame parameter B 1263b is the current frame intermediate LSF vector $\hat{x}_{mid}^n$, and the dequantized LSF vector 1247 includes the previous frame end LSF vector $\hat{x}_{end}^{n-1}$ and the current frame end LSF vector $\hat{x}_{end}^n$. For example, interpolation module 1249 can interpolate the subframe LSF vectors $\hat{x}_k^n$ based on $\hat{x}_{end}^{n-1}$, $\hat{x}_{end}^n$ and $\hat{x}_{mid}^n$ using interpolation factors $\alpha_k$ and $\beta_k$ according to equation (2): $\hat{x}_k^n = \alpha_k\,\hat{x}_{end}^n + \beta_k\,\hat{x}_{mid}^n + (1 - \alpha_k - \beta_k)\,\hat{x}_{end}^{n-1}$ (2). The interpolation factors $\alpha_k$ and $\beta_k$ may be predetermined values such that $0 \le (\alpha_k + \beta_k) \le 1$. Here, $k$ is an integer subframe index, where $1 \le k \le K-1$ and $K$ is the total number of subframes in the current frame. Interpolation module 1249 accordingly interpolates an LSF vector corresponding to each subframe in the current frame. In some configurations, for the current frame end LSF vector, $\alpha_k = 1$ and $\beta_k = 0$.
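A sketch of this subframe interpolation, under the reconstruction of equation (2) given above and with illustrative names; `alphas` and `betas` hold the predetermined factors for each subframe.

```python
import numpy as np

def subframe_lsfs(end_lsf_curr, end_lsf_prev, mid_lsf, alphas, betas):
    """Equation (2): each subframe LSF vector is a convex combination of
    the current end, mid and previous end LSF vectors."""
    return [a * np.asarray(end_lsf_curr) + b * np.asarray(mid_lsf)
            + (1.0 - a - b) * np.asarray(end_lsf_prev)
            for a, b in zip(alphas, betas)]
```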
Interpolation module 1249 provides LSF vector 1251 to inverse coefficient transform 1253. Inverse coefficient transform 1253 transforms LSF vector 1251 into coefficients 1255 (e.g., filter coefficients 1/a (z) for a synthesis filter). The coefficients 1255 are provided to a synthesis filter 1257.
Inverse quantizer B 1273 receives and dequantizes the encoded excitation signal 1298 to produce an excitation signal 1275. In one example, the encoded excitation signal 1298 may include a fixed codebook index, a quantized fixed codebook gain, an adaptive codebook index, and a quantized adaptive codebook gain. In this example, inverse quantizer B 1273 looks up a fixed codebook entry (e.g., a vector) based on the fixed codebook index and applies the dequantized fixed codebook gain to the fixed codebook entry to obtain a fixed codebook contribution. Further, inverse quantizer B 1273 looks up the adaptive codebook entry based on the adaptive codebook index and applies the dequantized adaptive codebook gain to the adaptive codebook entry to obtain the adaptive codebook contribution. Inverse quantizer B 1273 may then sum the fixed codebook contribution and the adaptive codebook contribution to generate the excitation signal 1275.
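A minimal sketch of this excitation reconstruction, assuming the codebook entries have already been looked up and dequantized into NumPy arrays:

```python
def build_excitation(fcb_vector, fcb_gain, acb_vector, acb_gain):
    """Sum the scaled fixed-codebook and adaptive-codebook contributions
    to form the excitation signal for one subframe."""
    return fcb_gain * fcb_vector + acb_gain * acb_vector
```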
The synthesis filter 1257 filters the excitation signal 1275 according to the coefficients 1255 to produce a decoded speech signal 1259. For example, the poles of the synthesis filter 1257 may be configured according to the coefficients 1255. The excitation signal 1275 is then passed through a synthesis filter 1257 to generate a decoded speech signal 1259 (e.g., a synthesized speech signal).
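The synthesis step is all-pole filtering by 1/A(z); a sketch using SciPy, assuming `lpc_coeffs` are the direct-form denominator coefficients [1, a_1, ..., a_M] produced by the inverse coefficient transform:

```python
from scipy.signal import lfilter

def synthesize(excitation, lpc_coeffs):
    """Pass the excitation through the all-pole synthesis filter 1/A(z)."""
    return lfilter([1.0], lpc_coeffs, excitation)
```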
Fig. 13 is a flow diagram illustrating one configuration of a method 1300 for mitigating potential frame instability. The electronics 1237 can obtain 1302 a frame that follows (e.g., temporally follows) an erased frame. For example, the electronics 1237 can detect erased frames based on one or more of a hash function, a checksum, a repetition code, check bits, a Cyclic Redundancy Check (CRC), and the like. The electronics 1237 can then obtain 1302 a frame following the erased frame. The frame obtained 1302 may be the next frame after the erased frame or may be any number of frames after the erased frame. The obtained 1302 frame may be a correctly received frame.
The electronics 1237 can determine whether the frame is potentially unstable (1304). In some configurations, determining whether the frame is potentially unstable (1304) is based on whether a frame parameter (e.g., the current frame intermediate LSF vector) is ordered according to a rule prior to any reordering (e.g., prior to reordering, if any). Additionally or alternatively, determining whether the frame is potentially unstable (1304) may be based on whether the frame (e.g., the current frame) is within a threshold number of frames from the erased frame. Additionally or alternatively, determining whether the frame is potentially unstable (1304) may be based on whether any frame between the frame (e.g., the current frame) and the erased frame utilizes non-predictive quantization.
In the first method as described above, the electronic device 1237 determines that a frame is potentially unstable (1304) if the frame is received within a threshold number of frames after an erased frame and if no frame between the frame and the erased frame (if any) utilizes non-predictive quantization. In the second method as described above, the electronic device 1237 determines that a frame is potentially unstable (1304) if the current frame is obtained after an erased frame, if the frame parameter (e.g., the current frame intermediate LSF vector $\hat{x}_{mid}^n$) was not ordered according to the rule prior to any reordering, and if no frame between the current frame and the erased frame (if any) utilizes non-predictive quantization. Additional or alternative methods may be used. For example, the first method may be applied to the first frame following an erased frame, and the second method may be applied to subsequent frames.
If the frame is potentially unstable, electronic device 1237 may apply a substitute weighting value to generate a stable frame parameter (1306). For example, electronics 1237 may apply the substitute weighting value to the dequantized LSF vector 1247 (e.g., to the current frame end LSF vector $\hat{x}_{end}^n$ and the previous frame end LSF vector $\hat{x}_{end}^{n-1}$) to generate a stable frame parameter (e.g., a substitute current frame intermediate LSF vector $\hat{x}_{mid}^n$). For example, generating the stable frame parameter may include determining a substitute current frame intermediate LSF vector equal to the product of the current frame end LSF vector and the substitute weighting value (e.g., $w_{substitute}$) plus the product of the previous frame end LSF vector and 1 minus the substitute weighting value (e.g., $(1 - w_{substitute})$). This may be accomplished as illustrated in equation (3) or equation (4).
FIG. 14 is a flow diagram illustrating a more particular configuration of a method 1400 for mitigating potential frame instability. The electronics 1237 can obtain the current frame 1402. For example, the electronics 1237 can obtain parameters for a period corresponding to the current frame.
Electronic device 1237 can determine whether the current frame is an erased frame (1404). For example, the electronics 1237 can detect erased frames based on one or more of a hash function, a checksum, a repetition code, check bits, a Cyclic Redundancy Check (CRC), and the like.
If the current frame is an erased frame, electronics 1237 may obtain an estimated current frame end LSF vector and an estimated current frame mid LSF vector based on the previous frame (1406). For example, decoder 1208 may use error concealment for erased frames. In error concealment, the decoder 1208 may copy the previous frame end LSF vector and the previous frame mid LSF vector into the estimated current frame end LSF vector and the estimated current frame mid LSF vector, respectively. This procedure may be followed for consecutive erased frames.
For example, in the case of two consecutive erased frames, the second erased frame may contain a copy of the end LSF vector from the first erased frame and all interpolated LSF vectors, e.g., the middle LSF vector and the subframe LSF vector. Therefore, the LSF vector in the second erased frame may be substantially the same as the LSF vector in the first erased frame. For example, the first erased frame end LSF vector may be copied from the previous frame. Thus, all LSF vectors in consecutive erased frames can be derived from the last correctly received frame. The last correctly received frame may have a very high probability of being stable. Therefore, the probability of consecutive erased frames having unstable LSF vectors is extremely small. This is essentially because in the case of consecutive erased frames, there may not be interpolation between two distinct LSF vectors. Thus, in some configurations, no substitute weighting values may be applied to consecutively erased frames.
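A minimal sketch of this concealment behavior, assuming the decoder simply carries the previous frame's vectors forward:

```python
def conceal_erased_frame(prev_end_lsf, prev_mid_lsf):
    """Error concealment for an erased frame: reuse the previous frame's
    end and mid LSF vectors, so consecutive erasures all inherit the
    LSFs of the last correctly received frame."""
    return prev_end_lsf.copy(), prev_mid_lsf.copy()
```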
The electronics 1237 can determine a subframe LSF vector for the current frame (1416). For example, electronics 1237 may interpolate a current frame end LSF vector, a current frame middle LSF vector, and a previous frame end LSF vector based on the interpolation factors to generate a subframe LSF vector for the current frame. In some configurations, this may be accomplished according to equation (2).
The electronics 1237 can synthesize the decoded speech signal 1259(1418) for the current frame. For example, the electronics 1237 may pass the excitation signal 1275 through a synthesis filter 1257 specified by coefficients 1255 based on the subframe LSF vector 1251 to generate the decoded speech signal 1259.
If the current frame is not an erased frame, then electronics 1237 can apply the received weighting vector to generate a current frame intermediate LSF vector (1408). For example, the electronic device 1237 may multiply the current frame end LSF vector with the received weighting vector and may multiply the previous frame end LSF vector with 1 minus the received weighting vector. Electronics 1237 may then sum the resulting products to generate a current frame intermediate LSF vector. This may be accomplished as provided in equation (1).
Electronic device 1237 can determine whether the current frame is within a threshold number of frames from the last erased frame (1410). For example, the electronics 1237 can utilize a counter that counts each frame after the erased frame indicator 1267 indicates an erased frame. The counter may be reset each time an erased frame occurs. The electronics 1237 can determine whether the counter is within the threshold number of frames. The threshold number may be one or more frames. If the current frame is not within the threshold number of frames from the last erased frame, electronics 1237 may determine a subframe LSF vector for the current frame as described above (1416) and synthesize the decoded speech signal 1259 (1418). Determining whether the current frame is within a threshold number of frames from the last erased frame (1410) may reduce unnecessary processing of frames that have a low probability of instability (e.g., frames that occur after one or more potentially unstable frames whose potential instability has already been mitigated).
If the current frame is within the threshold number of frames from the last erased frame, electronics 1237 can determine whether any frame between the current frame and the last erased frame utilizes non-predictive quantization (1412). For example, the electronic device 1237 may receive a prediction mode indicator 1281 indicating whether each frame utilizes predictive or non-predictive quantization. Electronic device 1237 may utilize the prediction mode indicator 1281 to track the prediction mode of each frame. If any frame between the current frame and the last erased frame utilizes non-predictive quantization, electronics 1237 may determine a subframe LSF vector for the current frame as described above (1416) and synthesize the decoded speech signal 1259 (1418). Determining whether any frame between the current frame and the last erased frame utilizes non-predictive quantization (1412) may reduce unnecessary processing of frames that have a low probability of instability (e.g., frames that occur after a frame that should contain an accurate end LSF vector, because that end LSF vector is not quantized based on any previous frame).
If no frame between the current frame and the last erased frame utilizes non-predictive quantization (e.g., if all frames between the current frame and the last erased frame utilize predictive quantization), then electronics 1237 may apply a substitute weighting value to generate a substitute current frame intermediate LSF vector (1414). In this case, electronics 1237 may determine that the current frame is potentially unstable and may apply the substitute weighting value to generate a stable frame parameter (e.g., a substitute current frame intermediate LSF vector). For example, the electronics 1237 can multiply the current frame end LSF vector by the substitute weighting value and can multiply the previous frame end LSF vector by 1 minus the substitute weighting value. Electronics 1237 may then sum the resulting products to generate the substitute current frame intermediate LSF vector. This may be accomplished as provided in equation (3) or equation (4).
The electronics 1237 may then determine a subframe LSF vector for the current frame as described above (1416). For example, the electronics 1237 may interpolate the subframe LSF vector based on the current frame end LSF vector, the previous frame end LSF vector, the substitute current frame intermediate LSF vector, and the interpolation factor. This may be accomplished according to equation (2). The electronic device 1237 may also synthesize the decoded speech signal 1259(1418) as described above. For example, the electronics 1237 may pass the excitation signal 1275 through a synthesis filter 1257 specified by the coefficients 1255 based on the subframe LSF vector 1251 (which is based on the substitute current intermediate LSF vector) to generate the decoded speech signal 1259.
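Putting steps 1408-1414 together, a schematic per-frame flow for a correctly received frame might look like the following; the erasure counter and prediction-mode flag are assumed to be tracked by the caller, and all names are illustrative rather than taken from an actual codec.

```python
import numpy as np

def decode_mid_lsf(end_lsf_curr, end_lsf_prev, received_weight,
                   frames_since_erasure, non_predictive_seen,
                   threshold=1, w_substitute=0.6):
    """Compute the mid LSF vector with the received weighting (1408) and
    replace it via the substitute weighting (1414) when the frame is
    within the threshold of the last erasure (1410) and no intervening
    frame used non-predictive quantization (1412)."""
    end_c = np.asarray(end_lsf_curr)
    end_p = np.asarray(end_lsf_prev)
    mid = received_weight * end_c + (1.0 - received_weight) * end_p
    if (frames_since_erasure is not None
            and frames_since_erasure <= threshold
            and not non_predictive_seen):
        mid = w_substitute * end_c + (1.0 - w_substitute) * end_p
    return mid
```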
Fig. 15 is a flow diagram illustrating another more particular configuration of a method 1500 for mitigating potential frame instability. The electronics 1237 can obtain the current frame 1502. For example, the electronics 1237 can obtain parameters for a period corresponding to the current frame.
The electronics 1237 can determine whether the current frame is an erased frame (1504). For example, the electronics 1237 can detect erased frames based on one or more of a hash function, a checksum, a repetition code, check bits, a Cyclic Redundancy Check (CRC), and the like.
If the current frame is an erased frame, electronics 1237 may obtain an estimated current frame end LSF vector and an estimated current frame mid LSF vector based on the previous frame (1506). This may be accomplished as described above in connection with fig. 14.
Electronics 1237 may determine a subframe LSF vector for the current frame (1516). This may be accomplished as described above in connection with fig. 14. The electronics 1237 may synthesize the decoded speech signal 1259(1518) for the current frame. This may be accomplished as described above in connection with fig. 14.
If the current frame is not an erased frame, then electronics 1237 can apply the received weighting vector to generate a current frame intermediate LSF vector (1508). This may be accomplished as described above in connection with fig. 14.
The electronics 1237 can determine whether any frame between the current frame and the last erased frame utilizes non-predictive quantization (1510). This may be accomplished as described above in connection with fig. 14. If any frame between the current frame and the last erased frame utilizes non-predictive quantization, electronics 1237 may determine a subframe LSF vector for the current frame as described above (1516) and synthesize decoded speech signal 1259 (1518).
If no frame between the current frame and the last erased frame utilizes non-predictive quantization (e.g., if all frames between the current frame and the last erased frame utilize predictive quantization), electronics 1237 can determine whether the current frame intermediate LSF vector was ordered according to the rule prior to any reordering (1512). For example, electronics 1237 may determine whether each LSF dimension in the intermediate LSF vector $\hat{x}_{mid}^n$ was in increasing order prior to any reordering, with at least the minimum separation between each pair of LSF dimensions, as described above in connection with fig. 12. If the current frame intermediate LSF vector was ordered according to the rule prior to any reordering, electronics 1237 may determine a subframe LSF vector for the current frame as described above (1516) and synthesize the decoded speech signal 1259 (1518).
If the current frame intermediate LSF vector was not ordered according to the rule prior to any reordering, electronics 1237 may apply a substitute weighting value to generate a substitute current frame intermediate LSF vector (1514). In this case, electronics 1237 may determine that the current frame is potentially unstable and may apply the substitute weighting value to generate a stable frame parameter (e.g., a substitute current frame intermediate LSF vector). This may be accomplished as described above in connection with fig. 14.
The electronics 1237 may then determine a subframe LSF vector for the current frame (1516) and synthesize the decoded speech signal 1259(1518), as described above in connection with fig. 14. For example, the electronics 1237 may pass the excitation signal 1275 through a synthesis filter 1257 specified by the coefficients 1255 based on the subframe LSF vector 1251 (which is based on the substitute current intermediate LSF vector) to generate the decoded speech signal 1259.
Fig. 16 is a flow diagram illustrating another more particular configuration of a method 1600 for mitigating potential frame instability. For example, some configurations of the systems and methods disclosed herein may be applied in two procedures: detecting potential LSF instability and mitigating it.
The electronics 1237 can receive 1602 the frame after the erased frame. For example, electronics 1237 can detect an erased frame and receive one or more frames subsequent to the erased frame. More specifically, electronics 1237 can receive parameters corresponding to a frame subsequent to the erased frame.
Electronics 1237 may determine whether there is a likelihood that the current frame intermediate LSF vector is unstable. In some implementations, electronic device 1237 may assume that one or more frames following the erased frame are potentially unstable (e.g., that include an intermediate LSF vector that is potentially unstable).
If potential instability is detected, the received weighting vector $w_n$ used by the encoder for interpolation/extrapolation (e.g., transmitted as an index to decoder 1208) may be discarded. For example, the electronics 1237 (e.g., decoder 1208) can discard the weighting vector.
Electronic device 1237 may apply the substitute weighting value to generate a (stable) substitute current frame intermediate LSF vector (1604). For example, decoder 1208 applies the substitute weighting value $w_{substitute}$, as described above in connection with fig. 12.
Instability of the LSF vector may propagate if subsequent frames (e.g., n+1, n+2, etc.) quantize the end LSF vector using a predictive quantization technique. Thus, for the current frame and for subsequent frames received before electronics 1237 determines (1606, 1614) that a non-predictive LSF quantization technique is used for a frame, decoder 1208 may determine whether the current frame intermediate LSF vector was ordered according to the rule prior to any reordering (1612). More specifically, electronics 1237 may determine whether the current frame utilizes predictive LSF quantization (1606). If the current frame utilizes predictive LSF quantization, electronics 1237 may determine whether a new frame (e.g., the next frame) was correctly received (1608). If the new frame was received incorrectly (e.g., the new frame is an erased frame), operation may continue with receiving the frame following the erased frame (1602). If electronics 1237 determines that the new frame was received correctly (1608), electronics 1237 may apply the received weighting vector to generate a current frame intermediate LSF vector (1610). For example, electronics 1237 can use the received weighting vector for the current frame intermediate LSF vector (which is not initially replaced). Thus, for all subsequent correctly received frames prior to a frame using a non-predictive LSF quantization technique, the decoder may apply the received weighting vector to generate a current frame intermediate LSF vector (1610) and determine whether the current frame intermediate LSF vector was ordered according to the rule prior to any reordering (1612). For example, electronics 1237 may apply the weighting vector for intermediate LSF vector interpolation based on the index transmitted from the encoder (1610). Electronics 1237 may then determine whether the current frame intermediate LSF vector corresponding to the frame was ordered prior to any reordering (1612), e.g., such that $\hat{x}_{mid}^n(i) + \Delta \le \hat{x}_{mid}^n(i+1)$ for each LSF dimension.
If a violation of a rule is detected, the intermediate LSF vector is potentially unstable. For example, if electronics 1237 determines that the intermediate LSF vector corresponding to the frame was not ordered according to the rule prior to any reordering (1612), electronics 1237 thus determines that the LSF dimensions in the intermediate LSF vector are potentially unstable. Decoder 1208 may reduce potential instability by applying the substitute weighting values 1604 as described above.
If the current frame intermediate LSF vector was ordered according to the rule, electronics 1237 may determine whether the current frame utilizes predictive quantization (1614). If the current frame utilizes predictive quantization, electronics 1237 may apply the substitute weighting value (1604), as described above. If electronics 1237 determines that the current frame does not utilize predictive quantization (e.g., the current frame utilizes non-predictive quantization) (1614), electronics 1237 may determine whether the new frame was received correctly (1616). If the new frame was received incorrectly (e.g., if the new frame is an erased frame), operation may continue with receiving the frame following the erased frame (1602).
If the current frame utilizes non-predictive quantization and if electronics 1237 determines that the new frame was received correctly (1616), decoder 1208 continues to operate normally, using the received weighting vector in regular operation mode. In other words, electronics 1237 may apply the received weighting vector for intermediate LSF vector interpolation for each correctly received frame based on the index transmitted from the encoder (1618). In particular, electronics 1237 may apply the received weighting vector for each subsequent frame (e.g., $n + n_{np} + 1$, $n + n_{np} + 2$, etc., where $n_{np}$ is the number of the frame using non-predictive quantization) (1618) until an erased frame appears.
The systems and methods disclosed herein may be implemented in a decoder 1208. In some configurations, additional bits need not be transmitted from the encoder to the decoder 1208 to enable detection and mitigation of potential frame instability. Furthermore, the systems and methods disclosed herein do not degrade quality in clean channel conditions.
Fig. 17 is a graph illustrating an example of a synthesized speech signal. The horizontal axis of the graph is illustrated in time 1701 (e.g., seconds) and the vertical axis of the graph is illustrated in amplitude 1733 (e.g., a number or value). The amplitude 1733 may be a number expressed in bits. In some configurations, 16 bits may be utilized to represent samples of a speech signal having a range of values between -32768 and 32767, which corresponds to a normalized range (e.g., values between -1 and +1 in floating point). It should be noted that the amplitude 1733 may be represented differently based on the implementation. In some examples, the value of the amplitude 1733 may correspond to an electromagnetic signal characterized by a voltage (in volts) and/or a current (in amperes).
The systems and methods disclosed herein may be implemented to generate a synthesized speech signal as given in FIG. 17. In other words, FIG. 17 is a graph illustrating one example of a synthesized speech signal resulting from application of the systems and methods disclosed herein. Corresponding waveforms without application of the systems and methods disclosed herein are shown in fig. 11. As can be observed, the systems and methods disclosed herein provide artifact reduction 1777. In other words, by applying the systems and methods disclosed herein, artifacts 1135 illustrated in figure 11 are reduced or removed, as illustrated in figure 17.
Fig. 18 is a block diagram illustrating one configuration of a wireless communication device 1837 in which systems and methods for reducing potential frame instability may be implemented. The wireless communication device 1837 illustrated in fig. 18 may be an example of at least one of the electronic devices described herein. The wireless communication device 1837 may include an application processor 1893. The application processor 1893 typically processes instructions (e.g., runs programs) to perform functions on the wireless communication device 1837. The applications processor 1893 may be coupled to an audio coder/decoder (codec) 1891.
The audio codec 1891 may be used to code and/or decode an audio signal. The audio codec 1891 may be coupled to at least one speaker 1883, an earpiece 1885, an output jack 1887, and/or at least one microphone 1889. The speaker 1883 may include one or more electro-acoustic transducers that convert electrical or electronic signals into acoustic wave signals. For example, the speaker 1883 may be used to play music or output speakerphone conversations or the like. The earpiece 1885 may be another speaker or an electro-acoustic transducer that may be used to output a sound wave signal (e.g., a voice signal) to a user. For example, the earpiece 1885 may be used so that only one user may reliably hear the acoustic signal. Output jack 1887 may be used to couple other devices, such as headphones, to wireless communication device 1837 for outputting audio. The speaker 1883, earpiece 1885, and/or output jack 1887 may generally be used to output audio signals from the audio codec 1891. The at least one microphone 1889 may be an acoustic-to-electrical converter that converts acoustic signals (e.g., a user's voice) into electrical or electronic signals that are provided to the audio codec 1891.
The audio codec 1891 (e.g., a decoder) may include a frame parameter determination module 1861, a stability determination module 1869, and/or a weight replacement module 1865. Frame parameter determination module 1861, stability determination module 1869, and/or weight replacement module 1865 may function as described above in connection with fig. 12.
An applications processor 1893 may also be coupled to the power management circuitry 1804. One example of the power management circuit 1804 is a Power Management Integrated Circuit (PMIC), which may be used to manage power consumption of the wireless communication device 1837. The power management circuitry 1804 may be coupled to the battery pack 1806. The battery pack 1806 may generally provide power to the wireless communication device 1837. For example, the battery pack 1806 and/or the power management circuitry 1804 may be coupled to at least one of the elements included in the wireless communication device 1837.
The application processor 1893 may be coupled to at least one input device 1808 for receiving input. Examples of input devices 1808 include infrared sensors, image sensors, accelerometers, touch sensors, keypads, and the like. The input devices 1808 may allow a user to interact with the wireless communication device 1837. The applications processor 1893 may also be coupled to one or more output devices 1810. Examples of output devices 1810 include printers, projectors, screens, haptic devices, and the like. The output devices 1810 may allow the wireless communication device 1837 to produce output that may be experienced by a user.
An application processor 1893 may be coupled to the application memory 1812. The application memory 1812 may be any electronic device capable of storing electronic information. Examples of application memory 1812 include double data rate synchronous dynamic random access memory (DDRAM), Synchronous Dynamic Random Access Memory (SDRAM), flash memory, and the like. The application memory 1812 may provide storage for the application processor 1893. For example, the application memory 1812 may store data and/or instructions for causing a program running on the application processor 1893 to function.
The application processor 1893 may be coupled to a display controller 1814, which display controller 1814 may in turn be coupled to a display 1816. The display controller 1814 may be a hardware block used to generate images on the display 1816. For example, the display controller 1814 may translate instructions and/or data from the application processor 1893 into images that may be presented on the display 1816. Examples of the display 1816 include a Liquid Crystal Display (LCD) panel, a Light Emitting Diode (LED) panel, a Cathode Ray Tube (CRT) display, a plasma display, and the like.
An applications processor 1893 may be coupled to the baseband processor 1895. The baseband processor 1895 generally processes communication signals. For example, the baseband processor 1895 may demodulate and/or decode the received signal. Additionally or alternatively, the baseband processor 1895 may encode and/or modulate signals in preparation for transmission.
The baseband processor 1895 may be coupled to a baseband memory 1818. The baseband memory 1818 may be any electronic device capable of storing electronic information, such as SDRAM, DDRAM, flash memory, or the like. The baseband processor 1895 may read information (e.g., instructions and/or data) from the baseband memory 1818 and/or write information to the baseband memory 1818. Additionally or alternatively, the baseband processor 1895 may perform communication operations using instructions and/or data stored in the baseband memory 1818.
The baseband processor 1895 may be coupled to a Radio Frequency (RF) transceiver 1897. RF transceiver 1897 may be coupled to a power amplifier 1899 and one or more antennas 1802. The RF transceiver 1897 may transmit and/or receive radio frequency signals. For example, RF transceiver 1897 may transmit RF signals using power amplifier 1899 and at least one antenna 1802. RF transceiver 1897 may also receive RF signals using the one or more antennas 1802. It should be noted that one or more of the elements included in the wireless communication device 1837 may be coupled to a general purpose bus that may enable communication between the elements.
Fig. 19 illustrates various components that may be used in an electronic device 1937. The illustrated components may be located within the same physical structure or in separate housings or structures. The electronic device 1937 described in connection with fig. 19 may be implemented in accordance with one or more of the devices described herein. The electronics 1937 include a processor 1926. The processor 1926 may be a general purpose single-or multi-chip microprocessor (e.g., an ARM), a special purpose microprocessor (e.g., a Digital Signal Processor (DSP)), a microcontroller, a programmable gate array, or the like. Processor 1926 may be referred to as a Central Processing Unit (CPU). Although only a single processor 1926 is shown in the electronic device 1937 of FIG. 19, in alternative configurations, a combination of processors (e.g., an ARM and DSP) may be used.
The electronics 1937 also include memory 1920 in electronic communication with the processor 1926. That is, processor 1926 may read information from memory 1920 and/or write information to memory 1920. The memory 1920 may be any electronic component capable of storing electronic information. The memory 1920 may be Random Access Memory (RAM), Read Only Memory (ROM), magnetic disk storage media, optical storage media, flash memory devices in RAM, on-board memory included with the processor, Programmable Read Only Memory (PROM), Erasable Programmable Read Only Memory (EPROM), electrically erasable PROM (eeprom), registers, and the like, including combinations thereof.
Data 1924a and instructions 1922a may be stored in memory 1920. The instructions 1922a may include one or more programs, routines, subroutines, functions, procedures, and the like. The instructions 1922a may comprise a single computer-readable statement or many computer-readable statements. The instructions 1922a are executable by the processor 1926 to implement one or more of the methods, functions, and procedures described above. Executing the instructions 1922a may involve the use of the data 1924a stored in memory 1920. FIG. 19 shows some instructions 1922b and data 1924b (which may come from instructions 1922a and data 1924a) loaded in the processor 1926.
The electronic device 1937 may also include one or more communication interfaces 1930 for communicating with other electronic devices. Communication interface 1930 may be based on wired communication techniques, wireless communication techniques, or both. Examples of different types of communication interfaces 1930 include a serial port, a parallel port, a Universal Serial Bus (USB), an ethernet adapter, an IEEE 1394 bus interface, a Small Computer System Interface (SCSI) bus interface, an Infrared (IR) communication port, a bluetooth wireless communication adapter, and so forth.
The electronics 1937 may also include one or more input devices 1932 and one or more output devices 1936. Examples of different kinds of input devices 1932 include a keyboard, mouse, microphone, remote control device, button, joystick, trackball, touchpad, light pen, and the like. For example, the electronics 1937 may include one or more microphones 1934 for capturing acoustic wave signals. In one configuration, the microphone 1934 can be a converter that converts acoustic wave signals (e.g., voice, speech) into electrical or electronic signals. Examples of different kinds of output devices 1936 include speakers, printers, and so forth. For example, the electronics 1937 may include one or more speakers 1938. In one configuration, the speaker 1938 can be a transducer that converts electrical or electronic signals into acoustic signals. One particular type of output device that may be commonly included in electronic devices 1937 is a display device 1940. Display device 1940 used with the configurations disclosed herein may utilize any suitable image projection technology, such as Cathode Ray Tube (CRT), Liquid Crystal Display (LCD), Light Emitting Diode (LED), gas plasma, electroluminescence, or the like. A display controller 1942 may also be provided for converting data stored in memory 1920 into text, graphics, and/or moving images (where appropriate) shown on the display device 1940.
Various components of the electronic device 1937 may be coupled together by one or more buses, which may include a power bus, a control signal bus, a status signal bus, a data bus, and so forth. For simplicity, the various buses are illustrated in FIG. 19 as bus system 1928. It should be noted that fig. 19 illustrates only one possible configuration of electronic device 1937. Various other architectures and components may be utilized.
In the above description, reference numerals have sometimes been used in conjunction with various terms. Where a term is used in conjunction with a reference number, this may be intended to refer to a particular element shown in one or more of the figures. Where a term is used without a reference number, this may be intended to refer broadly to that term without limitation to any particular figure.
The term "determining" encompasses a variety of actions, and thus "determining" can include calculating, processing, deriving, studying, looking up (e.g., looking up in a table, a database, or another data structure), determining, and the like. Also, "determining" can include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory), and the like. Also, "determining" may include resolving, selecting, choosing, establishing, and the like.
The phrase "based on" does not mean "based only on," unless expressly specified otherwise. In other words, the phrase "based on" describes that "is based only on" and "is based at least on" both.
It should be noted that, where compatible, one or more of the features, functions, procedures, components, elements, structures, etc., described in connection with any of the configurations described herein may be combined with one or more of the features, functions, procedures, components, elements, structures, etc., described in connection with any of the other configurations described herein. In other words, any compatible combination of the functions, procedures, components, elements, etc., described herein can be implemented in accordance with the systems and methods disclosed herein.
The functions described herein may be stored as one or more instructions on a processor-readable or computer-readable medium. The term "computer-readable medium" refers to any available medium that can be accessed by a computer or a processor. By way of example, and not limitation, such media can comprise RAM, ROM, EEPROM, flash memory, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray® disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. It should be noted that a computer-readable medium may be tangible and non-transitory. The term "computer-program product" refers to a computing device or processor in combination with code or instructions (e.g., a "program") that may be executed, processed, or computed by the computing device or processor. As used herein, the term "code" may refer to software, instructions, code, or data that is executable by a computing device or processor.
Software or instructions may also be transmitted over a transmission medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of transmission medium.
The methods disclosed herein comprise one or more steps or actions for achieving the described method. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is required for proper operation of the described method, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims.
It is to be understood that the claims are not limited to the precise configuration and components described above. Various modifications, changes, and variations may be made in the arrangement, operation, and details of the systems, methods, and apparatus described herein without departing from the scope of the claims.
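By way of illustration only, the weighting recited in the claims below can be read as forming a mid line spectral frequency vector from a weighted combination of the first frame end vector and the previous frame end vector, with a substitute weighting value taking over when the result looks unstable. The following C sketch is a hypothetical decoder-side helper, not the claimed implementation; the vector dimension, the function names, and the use of a single scalar substitute weight are assumptions made for this example.

    #include <stddef.h>

    #define LSF_ORDER 16 /* assumed vector dimension; codec-specific */

    /* Form the mid line spectral frequency vector as a per-dimension
     * weighted combination of the first (current) frame end LSF vector and
     * the previous frame end LSF vector:
     *   mid[k] = w[k] * curr_end[k] + (1 - w[k]) * prev_end[k].
     * Returns 1 if the result is monotonically increasing prior to any
     * reordering, 0 if an ordering violation suggests potential instability. */
    static int mid_lsf_from_weights(const double *curr_end,
                                    const double *prev_end,
                                    const double *w, double *mid)
    {
        int in_order = 1;
        for (size_t k = 0; k < LSF_ORDER; k++) {
            mid[k] = w[k] * curr_end[k] + (1.0 - w[k]) * prev_end[k];
            if (k > 0 && mid[k] <= mid[k - 1]) {
                in_order = 0; /* out of order: frame is potentially unstable */
            }
        }
        return in_order;
    }

    /* Recompute the mid vector with a scalar substitute weighting value
     * (0 <= w_sub <= 1) applied uniformly, yielding a stable frame
     * parameter lying between the two end vectors. */
    static void mid_lsf_substitute(const double *curr_end,
                                   const double *prev_end,
                                   double w_sub, double *mid)
    {
        for (size_t k = 0; k < LSF_ORDER; k++) {
            mid[k] = w_sub * curr_end[k] + (1.0 - w_sub) * prev_end[k];
        }
    }

In use, a decoder of this kind would first try mid_lsf_from_weights() with the received per-dimension weights and fall back to mid_lsf_substitute() when the ordering check fails for a frame that follows an erasure. A single scalar substitute weight is a natural fallback because a convex combination of two vectors that are each in ascending order remains in ascending order for any weight between 0 and 1, whereas independent per-dimension weights offer no such guarantee.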

Claims (36)

1. A method for mitigating potential frame instability by an electronic device, comprising:
obtaining a first frame of a speech signal temporally subsequent to an erased frame, wherein the first frame is a correctly received frame;
generating a previous frame end line spectral frequency vector using frame erasure concealment;
applying a received weighting vector to a first frame end line spectral frequency vector and the previous frame end line spectral frequency vector to generate a first frame mid line spectral frequency vector, wherein the received weighting vector corresponds to the first frame and is received from an encoder;
determining whether the first frame is potentially unstable;
applying a substitute weighting value instead of the received weighting vector to the first frame end line spectral frequency vector and the previous frame end line spectral frequency vector to generate a stable frame parameter in response to determining that the first frame is potentially unstable, wherein the stable frame parameter is a mid line spectral frequency vector between the first frame end line spectral frequency vector and the previous frame end line spectral frequency vector; and
synthesizing a decoded speech signal based on the stable frame parameter.
2. The method of claim 1, further comprising interpolating a plurality of subframe line spectral frequency vectors based on the mid line spectral frequency vector.
3. The method of claim 1, further comprising:
receiving an encoded excitation signal; and
dequantizing the encoded excitation signal to generate an excitation signal, wherein synthesizing the decoded speech signal comprises filtering the excitation signal based on the stable frame parameter.
4. The method of claim 1, wherein the substitute weighting value is between 0 and 1.
5. The method of claim 1, wherein generating the stable frame parameter comprises determining the mid line spectral frequency vector equal to a product of the first frame end line spectral frequency vector and the substitute weighting value plus a product of the previous frame end line spectral frequency vector and a difference of 1 minus the substitute weighting value.
6. The method of claim 1, wherein the substitute weighting value is selected based on at least one of a classification of two frames and a line spectral frequency difference between the two frames.
7. The method of claim 1, wherein determining whether the first frame is potentially unstable is based on whether the first frame mid line spectral frequencies are in order prior to any reordering.
8. The method of claim 1, wherein determining whether the first frame is potentially unstable is based on whether the first frame is within a threshold number of frames after the erased frame.
9. The method of claim 1, wherein determining whether the first frame is potentially unstable is based on whether any frame between the first frame and the erased frame utilizes non-predictive quantization.
10. An electronic device for mitigating potential frame instability, comprising:
a decoder circuit configured to generate a previous frame end line spectral frequency vector using frame erasure concealment;
a frame parameter determination circuit configured to obtain a first frame of a speech signal that temporally follows an erased frame, wherein the first frame is a correctly received frame, and configured to apply a received weighting vector to a first frame end line spectral frequency vector and the previous frame end line spectral frequency vector to generate a first frame mid line spectral frequency vector, wherein the received weighting vector corresponds to the first frame and is received from an encoder;
a stability determination circuit coupled to the frame parameter determination circuit, wherein the stability determination circuit is configured to determine whether the first frame is potentially unstable;
a weight substitution circuit coupled to the stability determination circuit, wherein the weight substitution circuit is configured to apply a substitute weighting value instead of the received weighting vector to the first frame end line spectral frequency vector and the previous frame end line spectral frequency vector to generate a stable frame parameter in response to determining that the first frame is potentially unstable, wherein the stable frame parameter is a mid line spectral frequency vector between the first frame end line spectral frequency vector and the previous frame end line spectral frequency vector; and
a synthesis filter configured to synthesize a decoded speech signal based on the stable frame parameter.
11. The electronic device of claim 10, further comprising interpolation circuitry configured to interpolate a plurality of subframe line spectral frequency vectors based on the mid line spectral frequency vector.
12. The electronic device of claim 10, further comprising an inverse quantizer circuit configured to receive and dequantize an encoded excitation signal to generate an excitation signal, wherein the synthesis filter is configured to synthesize the decoded speech signal by filtering the excitation signal based on the stable frame parameter.
13. The electronic device of claim 10, wherein the substitute weighting value is between 0 and 1.
14. The electronic device of claim 10, wherein the weight substitution circuit is configured to determine the mid line spectral frequency vector equal to a product of the first frame end line spectral frequency vector and the substitute weighting value plus a product of the previous frame end line spectral frequency vector and a difference of 1 minus the substitute weighting value.
15. The electronic device of claim 10, wherein the weight substitution circuit is configured to select the substitute weighting value based on at least one of a classification of two frames and a line spectral frequency difference between the two frames.
16. The electronic device of claim 10, wherein the stability determination circuit is configured to determine whether the first frame is potentially unstable based on whether the first frame mid line spectral frequencies are in order prior to any reordering.
17. The electronic device of claim 10, wherein the stability determination circuit is configured to determine whether the first frame is potentially unstable based on whether the first frame is within a threshold number of frames after the erased frame.
18. The electronic device of claim 10, wherein the stability determination circuit is configured to determine whether the first frame is potentially unstable based on whether any frame between the first frame and the erased frame utilizes non-predictive quantization.
19. A computer-program product for mitigating potential frame instability, comprising a non-transitory tangible computer-readable medium having instructions thereon, the instructions comprising:
code for causing an electronic device to obtain a first frame of a speech signal that temporally follows an erased frame, wherein the first frame is a correctly received frame;
code for causing the electronic device to generate a previous frame end line spectral frequency vector using frame erasure concealment;
code for causing the electronic device to apply a received weighting vector to a first frame end line spectral frequency vector and the previous frame end line spectral frequency vector to generate a first frame mid line spectral frequency vector, wherein the received weighting vector corresponds to the first frame and is received from an encoder;
code for causing the electronic device to determine whether the first frame is potentially unstable;
code for causing the electronic device to apply a substitute weighting value instead of the received weighting vector to the first frame end line spectral frequency vector and the previous frame end line spectral frequency vector to generate a stable frame parameter in response to determining that the first frame is potentially unstable, wherein the stable frame parameter is a mid line spectral frequency vector between the first frame end line spectral frequency vector and the previous frame end line spectral frequency vector; and
code for causing the electronic device to synthesize a decoded speech signal based on the stable frame parameter.
20. The computer-program product of claim 19, further comprising code for causing the electronic device to interpolate a plurality of subframe line spectral frequency vectors based on the mid line spectral frequency vector.
21. The computer-program product of claim 19, further comprising code for causing the electronic device to receive an encoded excitation signal; and
code for causing the electronic device to dequantize the encoded excitation signal to generate an excitation signal, wherein the code for causing the electronic device to synthesize the decoded speech signal comprises code for causing the electronic device to filter the excitation signal based on the stable frame parameter.
22. The computer-program product of claim 19, wherein the substitute weighting value is between 0 and 1.
23. The computer-program product of claim 19, wherein generating the stable frame parameter comprises determining the mid line spectral frequency vector equal to a product of the first frame end line spectral frequency vector and the substitute weighting value plus a product of the previous frame end line spectral frequency vector and a difference of 1 minus the substitute weighting value.
24. The computer-program product of claim 19, wherein the substitute weighting value is selected based on at least one of a classification of two frames and a line spectral frequency difference between the two frames.
25. The computer-program product of claim 19, wherein determining whether the first frame is potentially unstable is based on whether the first frame mid line spectral frequencies are in order prior to any reordering.
26. The computer-program product of claim 19, wherein determining whether the first frame is potentially unstable is based on whether the first frame is within a threshold number of frames after the erased frame.
27. The computer-program product of claim 19, wherein determining whether the first frame is potentially unstable is based on whether any frame between the first frame and the erased frame utilizes non-predictive quantization.
28. An apparatus for mitigating potential frame instability, comprising:
means for obtaining a first frame of a speech signal temporally subsequent to an erased frame, wherein the first frame is a correctly received frame;
means for generating a previous frame end line spectral frequency vector using frame erasure concealment;
means for applying a received weighting vector to a first frame end line spectral frequency vector and the previous frame end line spectral frequency vector to generate a first frame mid line spectral frequency vector, wherein the received weighting vector corresponds to the first frame and is received from an encoder;
means for determining whether the first frame is potentially unstable;
means for applying a substitute weighting value instead of the received weighting vector to the first frame end line spectral frequency vector and the previous frame end line spectral frequency vector to generate a stable frame parameter in response to determining that the first frame is potentially unstable, wherein the stable frame parameter is a mid line spectral frequency vector between the first frame end line spectral frequency vector and the previous frame end line spectral frequency vector; and
means for synthesizing a decoded speech signal based on the stable frame parameter.
29. The apparatus of claim 28, further comprising means for interpolating a plurality of subframe line spectral frequency vectors based on the mid line spectral frequency vector.
30. The apparatus of claim 28, further comprising:
means for receiving an encoded excitation signal; and
means for dequantizing the encoded excitation signal to generate an excitation signal, wherein the means for synthesizing the decoded speech signal comprises means for filtering the excitation signal based on the stable frame parameter.
31. The apparatus of claim 28, wherein the substitute weighting value is between 0 and 1.
32. The apparatus of claim 28, wherein generating the stable frame parameter comprises determining the mid line spectral frequency vector equal to a product of the first frame end line spectral frequency vector and the substitute weighting value plus a product of the previous frame end line spectral frequency vector and a difference of 1 minus the substitute weighting value.
33. The apparatus of claim 28, wherein the substitute weighting value is selected based on at least one of a classification of two frames and a line spectral frequency difference between the two frames.
34. The apparatus of claim 28, wherein determining whether the first frame is potentially unstable is based on whether the first frame mid line spectral frequencies are in order prior to any reordering.
35. The apparatus of claim 28, wherein determining whether the first frame is potentially unstable is based on whether the first frame is within a threshold number of frames after the erased frame.
36. The apparatus of claim 28, wherein determining whether the first frame is potentially unstable is based on whether any frame between the first frame and the erased frame utilizes non-predictive quantization.
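For readers tracing claims 7 through 9 above (and their counterparts in the apparatus, computer-program product, and means-plus-function claims), the recited bases for flagging a potentially unstable frame can be gathered into a single decoder-side test. The sketch below is a hypothetical combination only; the claims recite each basis independently, and the function names, flag conventions, and window size are assumptions made for this example.

    /* Hypothetical combination of the instability tests recited in the
     * claims: proximity to an erasure, absence of an intervening
     * non-predictively quantized frame, and mid LSF ordering. */
    #define STABILITY_WINDOW_FRAMES 10 /* assumed threshold; codec-specific */

    static int frame_potentially_unstable(int frames_since_erasure,
                                          int nonpredictive_frame_seen,
                                          int mid_lsfs_in_order)
    {
        if (frames_since_erasure > STABILITY_WINDOW_FRAMES) {
            return 0; /* far enough from the erasure to trust prediction */
        }
        if (nonpredictive_frame_seen) {
            return 0; /* a non-predictive frame has reset the predictor state */
        }
        return !mid_lsfs_in_order; /* ordering violation flags instability */
    }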
HK15112648.4A (priority date 2013-02-21, filing date 2013-09-03): Systems and methods for mitigating potential frame instability, published as HK1212087B (en)

Applications Claiming Priority (5)

Application Number | Priority Date | Filing Date | Title
US201361767431P | 2013-02-21 | 2013-02-21 |
US61/767,431 | 2013-02-21 | |
US14/016,004 (US9842598B2) | 2013-02-21 | 2013-08-30 | Systems and methods for mitigating potential frame instability
US14/016,004 | 2013-08-30 | |
PCT/US2013/057873 (WO2014130087A1) | 2013-02-21 | 2013-09-03 | Systems and methods for mitigating potential frame instability

Publications (2)

Publication Number | Publication Date
HK1212087A1 (en) | 2016-06-03
HK1212087B (en) | 2018-12-14

