CN105957533B - Voice compression method, voice decompression method, audio encoder and audio decoder - Google Patents
- Publication number
- CN105957533B (application number CN201610260757.3A)
- Authority
- CN
- China
- Prior art keywords
- bit
- frequency domain
- bit allocation
- quantization
- bits
- Legal status
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/032—Quantisation or dequantisation of spectral components
- G10L19/038—Vector quantisation, e.g. TwinVQ audio
 
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a voice compression method, a voice decompression method, an audio encoder and an audio decoder. The MLT (modulated lapped transform) converts the time-domain signal into a frequency-domain signal, an RMS (root mean square) weight analysis refines the quantization levels of the frequency-domain signal, and techniques such as vector quantization and Huffman coding compress the quantization parameters (quantization weights and numbers of allocated bits) and the frequency-domain data respectively, maximizing the compression ratio while keeping the spectral characteristics approximately lossless.
    Description
Technical Field
      The invention belongs to the field of wireless voice signal compression, and particularly relates to a voice compression method and a voice decompression method based on the MLT (modulated lapped transform) and vector entropy coding, and to an audio encoder and an audio decoder.
    Background
      Voice signal compression saves hardware memory and makes storage and transmission easier. Unlike a conventional wired audio system, a wireless digital voice system transmits voice signals over the air instead of using wires as the transmission medium, which improves the user experience.
      A wireless digital audio system built on embedded technology combines embedded, audio codec and wireless transmission technologies, and offers small size, portability, specialized functionality, low cost, high stability and good real-time performance. However, it is constrained in bandwidth, delay and power consumption. Compression algorithms for wireless voice transmission must therefore simultaneously offer high sound quality, a high compression ratio, low delay and low computational complexity.
      The Bluetooth SBC codec, a frequency-domain compression codec, has relatively low sound quality, while time-domain compression algorithms such as ADPCM and G.711 generally have low compression ratios. It is therefore worthwhile to design a higher-quality speech codec with a high compression ratio, low delay and low computational complexity for wireless transmission, and to apply it in embedded wireless audio systems.
      Voice data compression exploits the redundancy of voice signals and the particular perceptual properties of the human auditory system. The redundancy of voice signals takes two main forms, time-domain redundancy and frequency-domain redundancy, and known voice compression methods can accordingly be divided into two types by coding mode. The first type is time-domain compression, in which the coder compresses speech data by analyzing its correlation in the time domain; the second type is frequency-domain compression, in which the coder compresses speech data by analyzing its correlation in the frequency domain.
      The first type of compression method mainly removes the time-domain redundancy of the voice signal: it computes the difference between each audio sample and its predicted value, uses this difference to set the quantization level of an adaptive quantizer, and updates the prediction of the next sample. Time-domain prediction has difficulty improving subjective sound quality at a given compression ratio, so it is characterized by low delay, low computation, medium sound quality and a low compression ratio. Mainstream time-domain prediction methods include ADPCM and G.711, with compression ratios generally between 2:1 and 4:1.
      The second type of compression method removes the frequency-domain redundancy of the voice signal, generally by combining a transform with a psychoacoustic model: the transform converts the time-domain voice data into frequency-domain data, and the psychoacoustic model then quantizes the frequency-domain signal in levels according to the auditory characteristics of the human ear, quantizing the frequency regions to which the ear is most sensitive more finely to keep higher precision, and the regions to which it is less sensitive more coarsely. Thanks to the psychoacoustic analysis, transform-domain methods can compress the audio stream as far as possible while preserving the listener's subjective impression, so they are characterized by high delay, high complexity, high sound quality and a low bit rate. Mainstream transform-domain methods include subband coding with a cosine-modulated filter bank, such as SBC (compression ratio of about 5:1, with only moderate sound quality), and coding based on the Modified Discrete Cosine Transform (MDCT), such as CELT and Speex (high sound quality, but a delay of 50 ms to 100 ms).
      Because voice streams for wireless voice transmission require high sound quality, a high compression ratio, low delay and low computational complexity, the mainstream time-domain predictive coders of the first type cannot meet these requirements owing to their low compression ratio and sound quality, and the mainstream transform-domain coders of the second type cannot meet them owing to their high delay and heavy computation.
    Disclosure of Invention
      In view of the problems in the prior art, an object of the present invention is to provide a speech compression method based on the MLT and vector entropy coding that simultaneously satisfies the requirements of wireless speech transmission for high sound quality, low delay, a high compression ratio and low computational complexity. Another object of the present invention is to provide a corresponding speech decompression method based on the MLT and vector entropy coding.
      In order to achieve the above object, the speech compression method based on MLT transform and vector entropy coding of the present invention specifically comprises:
      1) MLT frequency-domain transformation: converting the time-domain digital voice signal collected by a digital microphone into frequency-domain spectral coefficients;
      2) RMS quantization weight calculation: grouping the frequency-domain spectral coefficients, calculating the root mean square (RMS) of each group, and deriving the weight of each group's frequency-domain components from its RMS;
      3) optimal group bit allocation: determining the optimal bit allocation of each group from the group weights and the configured bit rate parameter;
      4) carrying out vector quantization on the grouped frequency-domain voice signal to generate grouped vector quantization coefficients;
      5) carrying out Huffman coding on the grouped vector quantization coefficients to complete the data compression.
      Further, step 1) uses the modulated lapped transform (MLT): short-time frames of PCM time-domain audio data are converted into MLT frequency-domain spectral coefficients, which are grouped according to frequency-domain correlation.
      Further, the PCM time-domain audio data is first framed with 50% overlap, then windowed (anti-aliasing filtering) to limit spectral leakage, and then transformed by a DCT-IV into frequency-domain spectral coefficients.
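      The overlap, window and DCT-IV chain above can be sketched as a sine-windowed MDCT with orthonormal scaling. This is a minimal sketch under that assumption: the patent's own MLT formula is not reproduced in this text, so the sine window, the (n + 1/2 + N/2) phase term and the sqrt(2/N) scale are the textbook MLT choices, not values confirmed by the source. The helper names are hypothetical, and the direct O(N²) sums favor clarity over speed.

```python
import math


def _window(n, N):
    # Sine window; satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1.
    return math.sin(math.pi * (n + 0.5) / (2 * N))


def _basis(n, k, N):
    # Cosine kernel of the MDCT form of the MLT.
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))


def mlt(block, N):
    """Forward MLT: 2N input samples (50% overlapped frame) -> N spectral coefficients."""
    scale = math.sqrt(2.0 / N)
    return [
        scale * sum(_window(n, N) * block[n] * _basis(n, k, N) for n in range(2 * N))
        for k in range(N)
    ]


def imlt(coeffs, N):
    """Inverse MLT: N coefficients -> 2N windowed samples to be overlap-added."""
    scale = math.sqrt(2.0 / N)
    return [
        scale * _window(n, N) * sum(coeffs[k] * _basis(n, k, N) for k in range(N))
        for n in range(2 * N)
    ]


def analyze_synthesize(signal, N):
    """Frame with hop N (50% overlap), apply MLT then IMLT, and overlap-add."""
    tail = (-len(signal)) % N  # pad so the signal length is a multiple of N
    padded = [0.0] * N + list(signal) + [0.0] * (tail + N)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        frame = imlt(mlt(padded[start:start + 2 * N], N), N)
        for n, v in enumerate(frame):
            out[start + n] += v
    return out[N:N + len(signal)]
```

Because the sine window satisfies w²(n) + w²(n+N) = 1 and is symmetric, the time-domain aliasing of adjacent frames cancels on overlap-add, which is the perfect-reconstruction property the description attributes to the MLT.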
      Further, the formula of the MLT frequency domain transform is as follows:
      
      
      Further, in step 2), the quantization weights are calculated from the root mean square (RMS) of the frequency-domain spectral coefficients obtained by the time-frequency transform; the RMS is calculated as follows:
      
      The quantization weight of each group is then calculated from its RMS value:
      
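      Step 2) can be illustrated by computing one RMS per group of 20 coefficients (the group size implied by the mlt(20r+i) indexing used later) and mapping it to an integer weight index. Since the weight formula itself is not reproduced here, the log2 mapping with a sqrt(2) (about 3 dB) step and the clipping range [-8, 31] are hypothetical choices for illustration only.

```python
import math

REGION_SIZE = 20  # coefficients per group, as implied by mlt(20r + i)


def region_rms(coeffs, region_size=REGION_SIZE):
    """Root mean square of each consecutive group of spectral coefficients."""
    out = []
    for start in range(0, len(coeffs), region_size):
        group = coeffs[start:start + region_size]
        out.append(math.sqrt(sum(c * c for c in group) / len(group)))
    return out


def rms_index(rms, lo=-8, hi=31):
    """Hypothetical log-domain weight index: one step = a factor of sqrt(2) in RMS."""
    if rms <= 0.0:
        return lo
    return max(lo, min(hi, round(2.0 * math.log2(rms))))
```

A finer index grid gives more quantization levels for the weight, which is the advantage the description claims for RMS over an absolute-value weight.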
      Further, in step 3), the optimal group bit allocation is calculated as follows: a maximum and a minimum bit count are calculated from the quantization weights, and the group bits are optimized according to the bit rate parameter so that the optimized bits meet the requirements of each group of spectral coefficients within the bit limit.
      Further, the bit allocation category of each group is calculated from its quantization weight:
      category(r)=MAX{0,MIN{7,(offset-rms_index(r))/2}};
      (0≤r<number_of_regions; -32≤offset≤31);
      The number of bits required for the predicted quantization is then calculated from the bit allocation parameters:
      
      Then, the number of available bits is estimated from the configured bit rate parameter:
      estimated_number_of_available_bits=320+((number_of_available_bits-320)*5/8);
      Finally, the bit allocation parameters of each group are adjusted so that the bits used are maximized within the number of available bits, which determines the optimal group bit allocation.
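      The category rule, the available-bit estimate and the final adjustment of step 3) can be sketched together. This sketch assumes integer (floor) division in the category formula and uses a hypothetical per-category bit table, EXPECTED_BITS, in place of the predicted-bits formula that is not reproduced here. The binary search over offset is one simple way to perform the adjustment, not necessarily the patent's.

```python
# Hypothetical expected bits per group for categories 0..7 (finest to coarsest).
EXPECTED_BITS = [52, 47, 43, 37, 29, 22, 16, 0]


def categories(rms_indices, offset):
    # category(r) = MAX{0, MIN{7, (offset - rms_index(r)) / 2}}, floor division assumed
    return [max(0, min(7, (offset - idx) // 2)) for idx in rms_indices]


def expected_bits(cats, table=EXPECTED_BITS):
    return sum(table[c] for c in cats)


def estimated_available_bits(number_of_available_bits):
    # estimated_number_of_available_bits = 320 + ((number_of_available_bits - 320) * 5 / 8)
    return 320 + ((number_of_available_bits - 320) * 5) // 8


def choose_offset(rms_indices, budget, lo=-32, hi=31):
    """Smallest offset in [lo, hi] whose expected bit count fits within budget.

    A larger offset can only make categories coarser, so the expected bit count
    never increases with offset; that monotonicity makes binary search valid.
    """
    if expected_bits(categories(rms_indices, hi)) > budget:
        return hi  # nothing fits; fall back to the coarsest setting
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_bits(categories(rms_indices, mid)) <= budget:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

With rms indices [10, 8, 6, 4] and a 150-bit budget, offset 13 would cost 156 bits while offset 14 costs 131, so the search settles on 14 and categories [2, 3, 4, 5].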
      Further, the processing procedures of the step 4) and the step 5) are as follows:
      A) each frequency-domain spectral coefficient is split into a sign bit and a magnitude, and the normalized index of each magnitude in the group is calculated:
      k(i)=MIN{(x*magnitude_of(mlt(20r+i))+deadzone_rounding),kmax};
      (0≤i<20; x=1/(stepsize*magnitude_of_rms(r)));
      B) the normalized indices are combined into vector groups to form the vector-group bit stream:
      
      
      C) Huffman coding is applied to each vector group and each sign-bit group to form the compressed bit stream.
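      Steps A) and B) can be sketched as a dead-zone scalar quantizer followed by packing several indices into one vector index. The values of stepsize, deadzone_rounding and kmax, the vector dimension, the base-(kmax+1) packing, and sending sign bits only for nonzero indices are all assumptions, since the vector-group formula is not reproduced in this text.

```python
def quantize_region(coeffs, rms, stepsize, deadzone_rounding, kmax):
    """k(i) = MIN{int(x * |mlt| + deadzone_rounding), kmax}, x = 1 / (stepsize * rms)."""
    x = 1.0 / (stepsize * rms)
    k = [min(int(x * abs(c) + deadzone_rounding), kmax) for c in coeffs]
    # Assumption: a sign bit (1 = negative) is kept only where the index is nonzero.
    signs = [1 if c < 0 else 0 for c, ki in zip(coeffs, k) if ki != 0]
    return k, signs


def pack_vectors(k, kmax, dim):
    """Combine each run of `dim` indices into one base-(kmax + 1) vector index."""
    base = kmax + 1
    vectors = []
    for start in range(0, len(k), dim):
        v = 0
        for ki in k[start:start + dim]:
            v = v * base + ki
        vectors.append(v)
    return vectors
```

For example, coefficients [0.9, -2.6, 0.1, 3.4] with rms = stepsize = 1, deadzone_rounding = 0.3 and kmax = 3 give indices [1, 2, 0, 3], sign bits [0, 1, 0], and two-dimensional vector indices [6, 3].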
      A speech decompression method based on the MLT and vector entropy coding, corresponding to the above speech compression method, decompresses the compressed speech data using inverse vector quantization and the inverse MLT, and specifically comprises the following steps:
      1) parsing the compressed bit stream and Huffman-decoding it to obtain the vector groups and sign-bit groups;
      2) performing inverse normalization on the vector groups to obtain the magnitudes of the frequency-domain spectral coefficients, which are combined with the corresponding sign bits to obtain the spectral coefficients;
      3) applying the inverse modulated lapped transform (IMLT) to the frequency-domain spectral coefficients to obtain time-domain voice data and complete the decoding.
      Further, in step 1), the encoded code stream is parsed to obtain the PCM stream information: sampling rate, bit rate and frame length.
      Further, the inverse normalization operation formula in step 2) is as follows:
      
      
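      On the decoder side, inverse normalization can be sketched as unpacking each vector index (assumed here to be a base-(kmax+1) number holding dim indices) and rescaling the indices. The reconstruction rule magnitude = k * stepsize * rms is a hypothetical stand-in for the inverse normalization formula, which is not reproduced in this text, and the sign bits are assumed to be present only for nonzero indices.

```python
def unpack_vectors(vectors, kmax, dim):
    """Split each base-(kmax + 1) vector index back into `dim` scalar indices."""
    base = kmax + 1
    k = []
    for v in vectors:
        digits = []
        for _ in range(dim):
            v, d = divmod(v, base)
            digits.append(d)
        k.extend(reversed(digits))  # most significant digit first
    return k


def dequantize_region(k, signs, rms, stepsize):
    """Hypothetical reconstruction: magnitude = k * stepsize * rms, sign from sign bits."""
    sign_iter = iter(signs)
    out = []
    for ki in k:
        if ki == 0:
            out.append(0.0)  # no sign bit was sent for a zero index
        else:
            s = -1.0 if next(sign_iter) else 1.0
            out.append(s * ki * stepsize * rms)
    return out
```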
      Further, the IMLT formula in step 3) is as follows:
      
      
      
      
      An audio encoder implementing the above voice compression method comprises an MLT frequency-domain converter, an RMS quantization weight calculator, an optimal group bit allocator and a Huffman encoder. The MLT converter transforms the time-domain signal into a frequency-domain signal, the RMS quantization weight calculator refines the quantization levels of the frequency-domain signal, and the optimal group bit allocator and the Huffman encoder compress the quantization parameters and the frequency-domain data respectively, maximizing the compression ratio of the voice data while keeping the spectral characteristics approximately lossless.
      An audio decoder implementing the above speech decompression method comprises a code stream analyzer, a Huffman decoder, an inverse vector quantizer and an inverse MLT transform filter, wherein:
      the code stream analyzer reads and parses the encoded code stream and obtains the PCM stream information, such as the sampling rate, bit rate and frame length;
      the Huffman decoder decodes the RMS weights, the bit allocation parameters and the quantized MLT spectral vectors;
      the inverse vector quantizer dequantizes the quantized MLT spectral vectors using the RMS weights and bit allocation parameters to obtain the MLT spectral coefficients;
      the inverse MLT transform filter applies inverse MLT filtering to the MLT spectral coefficients to obtain time-domain PCM data;
      finally, the PCM data is arranged according to the stream information parsed from the code stream and reassembled into the PCM voice stream.
      The invention has the following beneficial effects: it achieves a high compression ratio, low delay and moderate computational complexity while preserving high sound quality of the voice data, making it well suited to wireless voice applications.
    Drawings
      FIG. 1 is a compression flow diagram;
      FIG. 2 is a decompression flow diagram;
      FIG. 3 is a schematic diagram of an MLT transform;
      FIG. 4 is a flow chart of optimal bit allocation;
      FIG. 5 is a diagram of raw PCM waveform data in the time domain;
      FIG. 6 is a spectrum plot of the raw PCM waveform data;
      FIG. 7 is a time-domain plot of the PCM waveform data after the MLT;
      FIG. 8 is a spectrum plot of the PCM waveform data after the MLT.
    Detailed Description
      The present invention now will be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the exemplary embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
      The invention relates to a voice compression method based on MLT transformation and vector entropy coding, which specifically comprises the following steps:
      (1) MLT (modulated lapped transform) frequency-domain converter: the MLT converts short-time frames of time-domain data into independent frequency-domain frames, uses a 50% frame overlap so that the spectrum of the critically sampled data is not distorted, and is linear with perfect signal reconstruction. The MLT formula is as follows:
      
      (2) RMS quantization weight calculator: computes the root mean square (RMS) of each group of frequency-domain spectral coefficients to represent its quantization weight. Compared with a weight represented by an absolute value, the RMS value provides more quantization levels and higher quantization precision. The RMS formula is as follows:
      
      The quantization weight of each group is then calculated from its RMS value:
      
      (3) Optimal group bit allocator: calculates the bit allocation category of each group from its quantization weight:
      category(r)=MAX{0,MIN{7,(offset-rms_index(r))/2}},
      (0≤r<number_of_regions; -32≤offset≤31);
      the number of bits required for the predicted quantization is calculated from the bit allocation parameters:
      
      then, the number of available bits is estimated from the configured bit rate parameter:
      estimated_number_of_available_bits=320+((number_of_available_bits-320)*5/8),
      and the bit allocation parameters of each group are adjusted so that the bits used are maximized within the number of available bits, which determines the optimal group bit allocation;
      (4) vector quantization is applied to the frequency-domain spectral coefficients to generate grouped vector quantization coefficients:
      each frequency-domain spectral coefficient is split into a sign bit and a magnitude, and the normalized index of each magnitude in the group is calculated:
      k(i)=MIN{(x*magnitude_of(mlt(20r+i))+deadzone_rounding),kmax},
      (0≤i<20; x=1/(stepsize*magnitude_of_rms(r))),
      the normalized indices are combined into vector groups to form the vector-group bit stream:
      
      (5) Huffman coding is applied to each vector group and each sign-bit group to form the compressed bit stream.
      A speech decompression method based on the MLT and vector entropy coding, corresponding to the above speech compression method, decompresses the compressed speech data using inverse vector quantization and the inverse MLT, and specifically comprises the following steps:
      (1) a Huffman decoder parses and decodes the compressed code stream to obtain the quantized MLT spectral coefficient data;
      (2) an inverse vector quantizer dequantizes the quantized MLT spectral coefficient data: inverse normalization of the vector groups yields the magnitudes of the frequency-domain spectral coefficients, which are combined with the corresponding sign bits to obtain the spectral coefficients;
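      Step (5) can be illustrated with a textbook Huffman code built from symbol counts. This is a generic sketch: the text does not say whether the codec uses fixed code tables shared by encoder and decoder or codes derived per frame, and the function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_code(symbols):
    """Return {symbol: bitstring} built from the symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}  # a lone symbol still needs one bit
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(n, next(tie), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    reverse = {b: s for s, b in code.items()}
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in reverse:  # prefix-free, so the first match is the symbol
            out.append(reverse[cur])
            cur = ""
    return out
```

For the vector indices [6, 3, 6, 6, 0, 3, 6, 1], the frequent index 6 gets a 1-bit code and the whole sequence fits in 14 bits instead of the 16 needed by a fixed 2-bit code.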
      
      
      (3) the IMLT (inverse modulated lapped transform) is applied to the frequency-domain spectral coefficients to obtain time-domain voice data and complete the decoding; the IMLT formula is as follows:
      
      
      
      
      The compression part of the embodiment is shown in FIG. 1:
      (1) Voice is sampled with a digital microphone to obtain raw PCM digital voice data, which is divided into short frames of 5 ms (80 samples), 10 ms (160 samples) or 20 ms (320 samples); the PCM configuration, such as bit rate and sampling rate, is written into the code stream.
      (2) The time-domain PCM data of each short frame is converted into MLT frequency-domain spectral coefficients by the MLT. The MLT spectral coefficients are grouped according to frequency-domain correlation into 20 groups of MLT frequency-domain spectral vectors.
      (3) The RMS weight calculator computes the RMS of each group of MLT spectral vectors to obtain the quantization weight of each group, and the quantization weights are written directly into the code stream.
      (4) The optimal bit allocator uses the quantization weights (RMS) of the grouped spectral coefficients to compute the bit allocation of each group of MLT spectral vectors, yielding the optimal number of allocated bits, which is also written directly into the code stream.
      (5) In the vector quantizer, the spectral coefficients are quantized using the quantization weights and the optimal bit allocation, and the grouped MLT spectral vectors are vector-quantized.
      (6) The Huffman encoder encodes the quantization weights, the bit allocation parameters and the quantized grouped MLT spectral vectors to produce the final compressed code stream.
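      The framing in step (1) follows from simple arithmetic: 80, 160 and 320 samples per 5, 10 and 20 ms frame imply a 16 kHz sampling rate. The sketch below computes frame lengths, splits PCM into frames and writes the stream parameters into a header. The header layout (field order, widths and byte order) is a hypothetical choice, since the text only says that this information is written into the code stream.

```python
import struct

# Hypothetical header: sample rate and bit rate as uint32, frame length in
# samples as uint16, big-endian (10 bytes in total).
HEADER = struct.Struct(">IIH")


def frame_samples(sample_rate_hz, frame_ms):
    # e.g. 16 kHz with 5/10/20 ms frames -> 80/160/320 samples
    return sample_rate_hz * frame_ms // 1000


def write_header(sample_rate_hz, bit_rate_bps, frame_len):
    return HEADER.pack(sample_rate_hz, bit_rate_bps, frame_len)


def read_header(data):
    return HEADER.unpack_from(data, 0)


def split_frames(pcm, frame_len):
    """Split PCM samples into frames, zero-padding the last frame."""
    frames = []
    for start in range(0, len(pcm), frame_len):
        frame = list(pcm[start:start + frame_len])
        frame += [0] * (frame_len - len(frame))
        frames.append(frame)
    return frames
```

A shorter frame lowers the algorithmic delay at the cost of coarser frequency resolution, which is the delay trade-off the description mentions for the MLT length.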
      The decoding part of the embodiment is shown in FIG. 2:
      (1) the code stream analyzer parses the encoded code stream to obtain the PCM stream information, such as the sampling rate, bit rate and frame length;
      (2) the Huffman decoder decodes the RMS weights, the bit allocation parameters and the quantized MLT spectral vectors;
      (3) the inverse vector quantizer dequantizes the quantized MLT spectral vectors using the RMS weights and bit allocation parameters to obtain the MLT spectral coefficients;
      (4) the inverse MLT transform filter applies inverse MLT filtering to the MLT spectral coefficients to obtain time-domain PCM data;
      (5) the PCM data is arranged according to the stream information parsed from the code stream and reassembled into the PCM voice stream.
      FIG. 3 is a schematic diagram of the MLT: the PCM time-domain audio data is first framed with 50% overlap, then windowed (anti-aliasing filtering) to limit spectral leakage, and then transformed by a DCT-IV into frequency-domain spectral coefficients. FIGS. 5, 6, 7 and 8 show the PCM data before and after the MLT: the transformed data preserves both the time-domain and the frequency-domain information of the original PCM data without loss.
      As shown in FIG. 4, the optimal bit allocation process allocates bits group by group over the frequency-domain spectral coefficients:
      (1) first, the RMS quantization weight of the group is analyzed, the bit allocation parameters are set, and the bit allocation is calculated;
      (2) then, the number of bits consumed by the predicted allocation is calculated, and the allocation is checked against the preset signal-to-noise-ratio limit and the number of remaining bits. If the check fails, the bit allocation parameters are reset and the bits are reallocated; if it passes, allocation proceeds to the next group. The number of remaining bits is updated for the next group's allocation.
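      The per-group loop of FIG. 4 can be sketched as follows, assuming a hypothetical per-category bit table and assuming that "resetting the bit allocation parameter" means moving to the next coarser category until the group fits the remaining bits. The signal-to-noise-ratio check is omitted because its threshold is not specified in the text.

```python
# Hypothetical bits per group for categories 0..7 (finest to coarsest).
EXPECTED_BITS = [52, 47, 43, 37, 29, 22, 16, 0]


def allocate_per_group(initial_categories, total_bits, table=EXPECTED_BITS):
    """Greedy group-by-group allocation that tracks the remaining bit count."""
    remaining = total_bits
    result = []
    for cat in initial_categories:
        # Coarsen until the group fits; the coarsest category costs 0 bits here.
        while cat < len(table) - 1 and table[cat] > remaining:
            cat += 1
        result.append(cat)
        remaining -= table[cat]  # update the remaining bits for the next group
    return result, remaining
```

Processing groups in order means low-frequency groups are served first; a group reached after the budget is exhausted drops to the coarsest category.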
      The psychoacoustic model, bit allocation and quantization of this embodiment are optimized to reduce the computational complexity of the psychoacoustic model: verified frequency-domain hearing thresholds and masking thresholds are applied directly to analyze the subband data. Because the bit allocation unit uses a symmetric quantization scheme, the bit allocation result is not transmitted to the decoder in the code stream; instead, the decoder recomputes the allocation from the quantization factors using the same bit allocation mechanism. This saves a large part of the code stream, which can carry quantized audio data instead, and a code stream length adjustment parameter allows the bit allocation to be adjusted at any time according to the wireless transmission environment.
      As described above, for wireless voice transmission the invention uses the perfectly reconstructing MLT for time-to-frequency conversion, which ensures high speech quality; the MLT length can be changed directly to meet the system's delay requirement, which ensures low delay; optimal bit allocation maximizes the compression ratio without affecting speech quality; and finally Huffman coding further compresses the quantized data.
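      The symmetric scheme works because the allocation is a deterministic function of data the decoder already receives. In this minimal sketch, the encoder sends only the quantized RMS indices and an offset (a hypothetical stand-in for the code stream length adjustment parameter), and the decoder rebuilds identical categories with the same rule; the payload layout is likewise hypothetical.

```python
def derive_categories(rms_indices, offset):
    # Same deterministic rule on both sides:
    # category(r) = MAX{0, MIN{7, (offset - rms_index(r)) / 2}}, floor division assumed
    return [max(0, min(7, (offset - idx) // 2)) for idx in rms_indices]


def encoder_side(rms_indices, offset):
    cats = derive_categories(rms_indices, offset)
    # Only the quantization factors and the adjustment parameter are sent;
    # the categories themselves never enter the payload.
    payload = {"rms_indices": list(rms_indices), "offset": offset}
    return cats, payload


def decoder_side(payload):
    return derive_categories(payload["rms_indices"], payload["offset"])
```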
    Claims (9)
1. A method of speech compression, the method comprising:
      1) MLT frequency domain transformation: converting a time domain digital voice signal collected by a digital microphone into a frequency domain spectral coefficient;
      2) RMS quantization weight calculation: grouping the frequency-domain spectral coefficients, calculating the root mean square (RMS) of each group, and deriving the weight of each group's frequency-domain components from its RMS;
      3) optimal group bit allocation: determining the optimal bit allocation of each group from the group weights and the configured bit rate parameter;
      4) carrying out vector quantization on the grouped frequency domain voice signals to generate grouped vector quantization coefficients;
      5) performing Huffman coding on the grouped vector quantization coefficients to complete data compression;
      the bit allocation unit adopts a symmetrical quantization scheme: the decoding end calculates the bit allocation from the quantization factors using the same bit allocation mechanism, and a code stream length adjustment parameter is set so that the bit allocation can be adjusted at any time according to the wireless transmission environment;
      bit allocation is carried out group by group over the frequency-domain spectral coefficients as follows:
      (1) firstly, analyzing RMS quantization weight information of the group of frequency domain spectral coefficients, setting bit distribution parameters, and carrying out bit distribution calculation;
      (2) then, calculating the bit number consumed by the predicted bit allocation according to the bit allocation result, and analyzing whether the current predicted bit allocation number meets the limit or not under the limit of the preset signal-to-noise ratio and the residual bit number; if not, resetting bit allocation parameters, and performing bit allocation again, and if so, performing bit allocation calculation of the next group of frequency domain spectral coefficients; simultaneously updating the residual bit number for the next group of bit allocation operation;
      in step 3), the optimal group bits are calculated as follows: the maximum and minimum numbers of bits are calculated from the quantization weights, and the group bits are optimized according to the bit rate parameter so that the optimized bits meet the requirements of every group of spectral coefficients within the bit limit; the bit allocation category of each group is calculated from its quantization weight:
      category(r) = MAX{0, MIN{7, (offset - rms_index(r))/2}};
      0≤r≤number_of_regions;-32≤offset≤31;
      the number of bits required by the predicted quantization is calculated from the bit allocation parameters:
      
      then, the number of available bits is calculated from the set bit rate parameter:
      estimated_number_of_available_bits = 320 + ((number_of_available_bits - 320)*5/8);
      and the bit allocation parameters of each group are adjusted to maximize the bits used within the number of available bits, which determines the optimal group bits.
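The category formula, the available-bit estimate and the final adjustment step can be combined into a single search over `offset`, sketched below in Python. The sketch makes three assumptions. First, `/2` and `*5/8` use integer floor division. Second, the hypothetical `EXPECTED_BITS` table stands in for the predicted-bits formula, which is missing from this text. Third, "maximizing the bits used" means choosing the smallest `offset` (the finest overall allocation) whose predicted cost still fits the estimate. It illustrates the idea rather than reproducing the patented procedure.

```python
# Hypothetical expected bit cost per category (0 = finest, 7 = not coded);
# the claim's own predicted-bits formula is not reproduced in this text.
EXPECTED_BITS = [52, 47, 43, 37, 29, 22, 16, 0]


def category(offset, rms_index):
    """category(r) = MAX{0, MIN{7, (offset - rms_index(r))/2}}, using floor division."""
    return max(0, min(7, (offset - rms_index) // 2))


def estimated_available_bits(number_of_available_bits):
    """The claim's estimate, evaluated with integer arithmetic."""
    return 320 + (number_of_available_bits - 320) * 5 // 8


def predicted_bits(offset, rms_indices):
    """Total expected bits over all regions for one offset value."""
    return sum(EXPECTED_BITS[category(offset, idx)] for idx in rms_indices)


def choose_offset(rms_indices, number_of_available_bits):
    """Smallest offset in [-32, 31] whose predicted bits fit the estimated budget.

    Raising the offset never lowers any category, so predicted bits never rise
    with the offset; the first fitting offset spends the most bits within budget.
    """
    budget = estimated_available_bits(number_of_available_bits)
    for offset in range(-32, 32):
        if predicted_bits(offset, rms_indices) <= budget:
            return offset
    return 31  # nothing fits: fall back to the coarsest offset


if __name__ == "__main__":
    rms = [0] * 10
    best = choose_offset(rms, 480)
    print(best, [category(best, i) for i in rms][:3], predicted_bits(best, rms))
    # -> 6 [3, 3, 3] 370
```

Predicted bits are monotonic in `offset`, so the linear scan could be replaced by a binary search over the 64 candidate offsets.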
    2. The speech compression method as recited in claim 1, wherein step 1) converts the PCM time domain audio data of each short frame into MLT frequency domain spectral coefficients using the modulated lapped transform (MLT), and the MLT frequency domain spectral coefficients are grouped by frequency domain correlation; the PCM time domain audio data first undergoes 50% overlap processing, then anti-aliasing filtering to prevent spectral leakage, and finally a DCT-IV transform that converts the time domain data into frequency domain spectral coefficients.
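Claim 2 describes the MLT as 50% overlap, window filtering and a DCT-IV. As a hedged illustration, the Python sketch below evaluates the standard modulated lapped transform (an MDCT with a sine window) directly from its definition. This direct route is equivalent to fold-then-DCT-IV but costs O(N²). The sketch also checks that overlap-adding the inverse transforms of half-overlapped frames reconstructs the input. The sine window and the sqrt(2/N) scaling on both transforms are assumptions; this text does not give the patent's exact window or normalization.

```python
import math


def sine_window(n_total):
    """w(n) = sin(pi*(n+0.5)/(2N)) over 2N samples; satisfies w(n)^2 + w(n+N)^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / n_total) for n in range(n_total)]


def _kernel(n, m, n_coeffs):
    """MLT cosine modulation term for time index n and frequency index m."""
    return math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (m + 0.5))


def mlt(block, n_coeffs):
    """Forward MLT: 2N windowed time samples -> N spectral coefficients."""
    w = sine_window(2 * n_coeffs)
    scale = math.sqrt(2.0 / n_coeffs)
    return [
        scale * sum(w[n] * block[n] * _kernel(n, m, n_coeffs) for n in range(2 * n_coeffs))
        for m in range(n_coeffs)
    ]


def imlt(coeffs):
    """Inverse MLT: N coefficients -> 2N windowed samples, to be overlap-added."""
    n_coeffs = len(coeffs)
    w = sine_window(2 * n_coeffs)
    scale = math.sqrt(2.0 / n_coeffs)
    return [
        scale * w[n] * sum(coeffs[m] * _kernel(n, m, n_coeffs) for m in range(n_coeffs))
        for n in range(2 * n_coeffs)
    ]


def analyze_and_resynthesize(signal, n_coeffs):
    """Frame with hop N (50% overlap), transform, invert and overlap-add.

    Samples covered by two frames are reconstructed exactly, because the
    time-domain aliasing of neighbouring frames cancels.
    """
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_coeffs + 1, n_coeffs):
        frame = signal[start:start + 2 * n_coeffs]
        for n, y in enumerate(imlt(mlt(frame, n_coeffs))):
            out[start + n] += y
    return out
```

A production encoder would replace the double loops with the windowing and folding step plus a fast DCT-IV, as claim 2 describes.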
    3. The speech compression method of claim 1, wherein the MLT frequency-domain transform is formulated as follows:
      
      0≤m<N, 0≤n<2N, N∈{80,160,320};
      in step 2), the quantization weight of the frequency domain spectral coefficients obtained by the time-frequency conversion is calculated as their root mean square (RMS), using the following formula:
      
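The RMS formula itself is missing from this text, so the Python sketch below uses the ordinary definition: rms(r) is the square root of the mean of the squared coefficients in region r. The region size of 20 coefficients is inferred from the index mlt(20r+i) in claim 4. The log2-based `rms_index` quantizer is hypothetical. It only illustrates how an RMS value could become the integer rms_index(r) that the category formula uses.

```python
import math

REGION_SIZE = 20  # coefficients per region, inferred from mlt(20r + i) in claim 4


def region_rms(coeffs, region_size=REGION_SIZE):
    """Root mean square of each consecutive group of spectral coefficients."""
    rms = []
    for start in range(0, len(coeffs), region_size):
        group = coeffs[start:start + region_size]
        rms.append(math.sqrt(sum(c * c for c in group) / len(group)))
    return rms


def rms_index(rms_value, eps=1e-12):
    """Hypothetical quantizer: integer index in steps of sqrt(2) (about 3 dB)."""
    return round(2 * math.log2(max(rms_value, eps)))
```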
    4. The voice compression method according to claim 1, wherein steps 4) and 5) are performed as follows:
      A) each frequency domain spectral coefficient is split into a sign bit and a magnitude, and the normalized index of each group's magnitudes is calculated:
      k(i) = MIN{(x*|mlt(20r+i)| + deadzone_rounding), kmax};
      0≤i<20; x = 1/(stepsize*magnitude_of_rms(r));
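Step A can be sketched in Python as follows. The sketch assumes that k(i) keeps only the integer part of the scaled magnitude, since k is used as an index. It also assumes that a sign bit of 1 marks a negative coefficient. `stepsize`, `deadzone_rounding` and `kmax` are treated as per-category constants whose values the claim does not give, so the numbers in the test are hypothetical.

```python
def normalize_region(region, rms_value, stepsize, deadzone_rounding, kmax):
    """Split coefficients into sign bits and clipped, normalized magnitude indices.

    k(i) = MIN{int(x * |mlt(20r + i)| + deadzone_rounding), kmax},
    with x = 1 / (stepsize * rms(r)).
    """
    x = 1.0 / (stepsize * rms_value)
    signs = [1 if c < 0 else 0 for c in region]  # assumed: 1 marks a negative value
    indices = [min(int(x * abs(c) + deadzone_rounding), kmax) for c in region]
    return indices, signs
```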
      B) the normalized indices are combined into vector indices, forming the vector group bit stream:
      
      where j indexes the j-th value of k() within a vector, and vd is the vector dimension;
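The formula for step B is missing from this text. One common way to combine vd normalized indices into a single vector index is mixed-radix packing with base kmax+1, where the last element of each vector is the least significant digit. That ordering appears consistent with the decoder index mapping i=(n+1)vd-j-1 in claim 6. The Python sketch below assumes this scheme and is not confirmed as the patent's exact formula.

```python
def pack_vectors(indices, vd, kmax):
    """Combine k(n*vd), ..., k(n*vd + vd - 1) into one integer per vector.

    Assumes len(indices) is a multiple of vd; base kmax + 1 is an assumption.
    """
    base = kmax + 1
    vectors = []
    for n in range(len(indices) // vd):
        value = 0
        for j in range(vd):
            value = value * base + indices[n * vd + j]
        vectors.append(value)
    return vectors


def unpack_vectors(vectors, vd, kmax):
    """Inverse: peel digits off least-significant first into k((n+1)*vd - j - 1)."""
    base = kmax + 1
    indices = [0] * (len(vectors) * vd)
    for n, value in enumerate(vectors):
        for j in range(vd):
            indices[(n + 1) * vd - j - 1] = value % base
            value //= base
    return indices
```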
      C) Huffman coding is applied to each group's vector indices and sign bits to form the compressed bit stream.
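The claim does not give the Huffman tables. The Python sketch below builds a generic Huffman code from symbol frequencies with the standard `heapq` module and uses it to encode and decode a list of vector indices. A practical codec would more likely use fixed tables shared by encoder and decoder, so this sketch only illustrates the coding principle.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_code(symbols):
    """Return {symbol: bitstring} for a Huffman code built from symbol counts."""
    freq = Counter(symbols)
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so heap entries never compare dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def huffman_encode(symbols, code):
    """Concatenate the codewords of all symbols."""
    return "".join(code[s] for s in symbols)


def huffman_decode(bits, code):
    """Decode by prefix matching; valid because Huffman codes are prefix-free."""
    reverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return out
```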
    5. A speech decompression method, characterized in that inverse vector quantization and the inverse MLT are used to decompress the speech data after compression, comprising the following steps:
      1) parsing the compressed bit stream and Huffman-decoding it to obtain the vector groups and sign bit groups;
      2) performing inverse normalization on the vector groups to obtain the magnitudes of the frequency domain spectral coefficients, and combining them with the corresponding sign bits to obtain the frequency domain spectral coefficients;
      3) applying the inverse modulated lapped transform (IMLT) to the frequency domain spectral coefficients to obtain the time domain speech data and complete decoding;
      the bit allocation unit adopts a symmetric quantization scheme: the decoder recomputes the number of allocated bits with the same bit allocation mechanism as the encoder, based on the allocation result conveyed by the quantization factors, and a code stream length adjustment parameter is provided so that the number of allocated bits can be adjusted at any time to suit the wireless transmission environment;
      bit allocation is carried out per group of frequency domain spectral coefficients, as follows:
      (1) first, the RMS quantization weight information of the current group of frequency domain spectral coefficients is analyzed, the bit allocation parameters are set, and the bit allocation is calculated;
      (2) then, the number of bits consumed by the predicted bit allocation is calculated from the allocation result, and it is checked whether the predicted allocation satisfies the limits set by the preset signal-to-noise ratio and the number of remaining bits; if not, the bit allocation parameters are reset and the allocation is repeated; if so, the bit allocation of the next group of frequency domain spectral coefficients is calculated, and the number of remaining bits is updated for that next group's allocation;
      the optimal group bits are calculated as follows: the maximum and minimum numbers of bits are calculated from the quantization weights, and the group bits are optimized according to the bit rate parameter so that the optimized bits meet the requirements of every group of spectral coefficients within the bit limit; the bit allocation category of each group is calculated from its quantization weight:
      category(r) = MAX{0, MIN{7, (offset - rms_index(r))/2}};
      0≤r≤number_of_regions;-32≤offset≤31;
      the number of bits required by the predicted quantization is calculated from the bit allocation parameters:
      
      then, the number of available bits is calculated from the set bit rate parameter:
      estimated_number_of_available_bits = 320 + ((number_of_available_bits - 320)*5/8);
      and the bit allocation parameters of each group are adjusted to maximize the bits used within the number of available bits, which determines the optimal group bits.
    6. The speech decompression method according to claim 5, wherein in step 1), the encoded and compressed code stream is parsed to obtain the time domain PCM stream information, namely the sampling rate, the bit rate and the frame length; the inverse normalization in step 2) is given by:
      
      
      i=(n+1)vd-j-1; 0≤j≤vd-1; 0≤n≤vpr-1.
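The inverse normalization formulas are missing from this text. The minimal decoder-side sketch below assumes the simplest reconstruction: each magnitude index k maps back to k·stepsize·rms(r), and the sign bit restores the polarity. Practical decoders often use tabulated per-index reconstruction (centroid) values instead. `stepsize` and the convention that a sign bit of 1 marks a negative value are assumptions.

```python
def inverse_normalize(indices, signs, rms_value, stepsize):
    """Rebuild spectral coefficients from magnitude indices and sign bits.

    Assumed reconstruction: |mlt| ~= k * stepsize * rms(r); a sign bit of 1
    marks a negative coefficient (hypothetical convention).
    """
    return [
        (-1.0 if s else 1.0) * k * stepsize * rms_value
        for k, s in zip(indices, signs)
    ]
```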
    8. An audio encoder, characterized by comprising an MLT frequency domain converter, an RMS quantization weight calculator, an optimal group bit allocator and a Huffman encoder, wherein the time domain signal is converted into a frequency domain signal by the MLT converter, the RMS quantization weight calculator refines the quantization levels of the frequency domain signal, and the optimal group bit allocator and the Huffman encoder compress the quantization parameters and the frequency domain data, respectively, so that the compression ratio of the speech data is maximized while the spectral characteristics remain approximately lossless; the bit allocation unit adopts a symmetric quantization scheme: the decoder recomputes the number of allocated bits with the same bit allocation mechanism as the encoder, based on the allocation result conveyed by the quantization factors, and a code stream length adjustment parameter is provided so that the number of allocated bits can be adjusted at any time to suit the wireless transmission environment;
      bit allocation is carried out per group of frequency domain spectral coefficients, as follows:
      (1) first, the RMS quantization weight information of the current group of frequency domain spectral coefficients is analyzed, the bit allocation parameters are set, and the bit allocation is calculated;
      (2) then, the number of bits consumed by the predicted bit allocation is calculated from the allocation result, and it is checked whether the predicted allocation satisfies the limits set by the preset signal-to-noise ratio and the number of remaining bits; if not, the bit allocation parameters are reset and the allocation is repeated; if so, the bit allocation of the next group of frequency domain spectral coefficients is calculated, and the number of remaining bits is updated for that next group's allocation;
      the optimal group bits are calculated as follows: the maximum and minimum numbers of bits are calculated from the quantization weights, and the group bits are optimized according to the bit rate parameter so that the optimized bits meet the requirements of every group of spectral coefficients within the bit limit; the bit allocation category of each group is calculated from its quantization weight:
      category(r) = MAX{0, MIN{7, (offset - rms_index(r))/2}};
      0≤r≤number_of_regions;-32≤offset≤31;
      the number of bits required by the predicted quantization is calculated from the bit allocation parameters:
      
      then, the number of available bits is calculated from the set bit rate parameter:
      estimated_number_of_available_bits = 320 + ((number_of_available_bits - 320)*5/8);
      and the bit allocation parameters of each group are adjusted to maximize the bits used within the number of available bits, which determines the optimal group bits.
    9. An audio decoder, comprising a code stream analyzer, a Huffman decoder, an inverse vector quantizer and an inverse MLT transform filter, wherein:
      the code stream analyzer reads and parses the encoded, compressed code stream and obtains the time domain PCM stream information, such as the sampling rate, the bit rate and the frame length;
      the Huffman decoder decodes the stream to obtain the RMS weights, the bit allocation parameters and the quantized MLT frequency domain spectrum vectors;
      the inverse vector quantizer inverse-quantizes the quantized MLT frequency domain spectrum vectors using the RMS weights and bit allocation parameters to obtain the MLT frequency domain spectral coefficients;
      the inverse MLT transform filter applies inverse MLT filtering to the MLT frequency domain spectral coefficients to obtain time domain PCM data;
      the PCM data is controlled according to the PCM stream information parsed from the code stream, and the PCM speech stream is reconstructed and assembled;
      the bit allocation unit adopts a symmetric quantization scheme: the decoder recomputes the number of allocated bits with the same bit allocation mechanism as the encoder, based on the allocation result conveyed by the quantization factors, and a code stream length adjustment parameter is provided so that the number of allocated bits can be adjusted at any time to suit the wireless transmission environment;
      bit allocation is carried out per group of frequency domain spectral coefficients, as follows:
      (1) first, the RMS quantization weight information of the current group of frequency domain spectral coefficients is analyzed, the bit allocation parameters are set, and the bit allocation is calculated;
      (2) then, the number of bits consumed by the predicted bit allocation is calculated from the allocation result, and it is checked whether the predicted allocation satisfies the limits set by the preset signal-to-noise ratio and the number of remaining bits; if not, the bit allocation parameters are reset and the allocation is repeated; if so, the bit allocation of the next group of frequency domain spectral coefficients is calculated, and the number of remaining bits is updated for that next group's allocation;
      the optimal group bits are calculated as follows: the maximum and minimum numbers of bits are calculated from the quantization weights, and the group bits are optimized according to the bit rate parameter so that the optimized bits meet the requirements of every group of spectral coefficients within the bit limit; the bit allocation category of each group is calculated from its quantization weight:
      category(r) = MAX{0, MIN{7, (offset - rms_index(r))/2}};
      0≤r≤number_of_regions;-32≤offset≤31;
      the number of bits required by the predicted quantization is calculated from the bit allocation parameters:
      
      then, the number of available bits is calculated from the set bit rate parameter:
      estimated_number_of_available_bits = 320 + ((number_of_available_bits - 320)*5/8);
      and the bit allocation parameters of each group are adjusted to maximize the bits used within the number of available bits, which determines the optimal group bits.
    Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201610260757.3A CN105957533B (en) | 2016-04-22 | 2016-04-22 | Voice compression method, voice decompression method, audio encoder and audio decoder | 
Publications (2)
| Publication Number | Publication Date | 
|---|---|
| CN105957533A CN105957533A (en) | 2016-09-21 | 
| CN105957533B true CN105957533B (en) | 2020-11-10 | 
Family ID: 56915027
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| CN201610260757.3A Active CN105957533B (en) | 2016-04-22 | 2016-04-22 | Voice compression method, voice decompression method, audio encoder and audio decoder | 
Country Status (1)
| Country | Link | 
|---|---|
| CN (1) | CN105957533B (en) | 
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN109583056A (en) * | 2018-11-16 | 2019-04-05 | 中国科学院信息工程研究所 | A kind of network-combination yarn tool performance appraisal procedure and system based on emulation platform | 
| CN111402907B (en) * | 2020-03-13 | 2023-04-18 | 大连理工大学 | G.722.1-based multi-description speech coding method | 
| CN113612672A (en) * | 2021-08-04 | 2021-11-05 | 杭州微纳科技股份有限公司 | Asynchronous single-wire audio transmission circuit and audio transmission method | 
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| EP0684705A2 (en) * | 1994-05-06 | 1995-11-29 | Nippon Telegraph And Telephone Corporation | Multichannel signal coding using weighted vector quantization | 
| CN101165778A (en) * | 2006-10-18 | 2008-04-23 | 宝利通公司 | Dual-transform coding of audio signals | 
| CN101206860A (en) * | 2006-12-20 | 2008-06-25 | 华为技术有限公司 | A layered audio codec method and device | 
| CN101572087A (en) * | 2008-04-30 | 2009-11-04 | 北京工业大学 | Method and device for encoding and decoding embedded voice or voice-frequency signal | 
| CN101572586A (en) * | 2008-04-30 | 2009-11-04 | 北京工业大学 | Method, device and system for encoding and decoding | 
| CN102081926A (en) * | 2009-11-27 | 2011-06-01 | 中兴通讯股份有限公司 | Method and system for encoding and decoding lattice vector quantization audio | 
| CN102150202A (en) * | 2008-07-14 | 2011-08-10 | 三星电子株式会社 | Method and device for encoding and decoding audio/speech signals | 
| CN102801427A (en) * | 2012-08-08 | 2012-11-28 | 深圳广晟信源技术有限公司 | Encoding and decoding method and system for variable-rate lattice vector quantization of source signal | 
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| EP3385949A1 (en) * | 2011-05-13 | 2018-10-10 | Samsung Electronics Co., Ltd. | Bit allocating method for encoding an audio signal spectrum | 
| CN102436819B (en) * | 2011-10-25 | 2013-02-13 | 杭州微纳科技有限公司 | Wireless audio compression and decompression methods, audio coder and audio decoder | 
Also Published As
| Publication number | Publication date | 
|---|---|
| CN105957533A (en) | 2016-09-21 | 
Similar Documents
| Publication | Title |
|---|---|
| JP5539203B2 (en) | Improved transform coding of speech and audio signals |
| CN101140759B (en) | Bandwidth extension method and system for voice or audio signal |
| US8135583B2 (en) | Encoder, decoder, encoding method, and decoding method |
| US9754601B2 (en) | Information signal encoding using a forward-adaptive prediction and a backwards-adaptive quantization |
| CN103069484B (en) | Time/frequency two dimension post-processing |
| EP1080579B1 (en) | Scalable audio coder and decoder |
| CN102511062B (en) | Bit allocation in enhanced encoding/decoding for improved hierarchical encoding/decoding of digital audio signals |
| CN102436819B (en) | Wireless audio compression and decompression methods, audio coder and audio decoder |
| CN101421780B (en) | Method and device for encoding and decoding time-varying signals |
| JPWO2005004113A1 (en) | Audio encoding device |
| RU2505921C2 (en) | Method and apparatus for encoding and decoding audio signals (versions) |
| CN107591157B (en) | Transform coding/decoding of harmonic audio signals |
| CN103187065A (en) | Voice frequency data processing method, device and system |
| JPH08278799A (en) | Noise load filtering method |
| TW201724087A (en) | Apparatus for coding envelope of signal and apparatus for decoding thereof |
| CN102522092B (en) | Apparatus and method for voice bandwidth extension based on G.711.1 |
| CN104392726B (en) | Encoding equipment and decoding equipment |
| JP2018205766A (en) | Method, encoder, decoder, and mobile device |
| KR20070070189A (en) | Speech Coder and Speech Coder |
| CN105957533B (en) | Voice compression method, voice decompression method, audio encoder and audio decoder |
| WO2024051412A1 (en) | Speech encoding method and apparatus, speech decoding method and apparatus, computer device and storage medium |
| CN101192410B (en) | A method and device for adjusting quantization quality in codec |
| KR20080059657A (en) | Signal coding and decoding based on spectral changes |
| CN114863942B (en) | Model training method for voice quality conversion, method and device for improving voice quality |
| CN101562015A (en) | Audio-frequency processing method and device |
Legal Events
| Code | Title |
|---|---|
| C06 | Publication |
| PB01 | Publication |
| C10 | Entry into substantive examination |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |
| GR01 | Patent grant |