CN102543086B - A device and method for voice bandwidth extension based on audio watermark

Info

Publication number: CN102543086B
Application number: CN2011104223927A
Authority: CN
Inventors: 陈喆; 殷福亮; 赵承勇
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2011-12-16
Filing date: 2011-12-16
Publication date: 2013-08-14
Anticipated expiration: 2031-12-16
Also published as: CN102543086A

Abstract

The present invention discloses a device and method for voice bandwidth extension based on audio watermarking. The speech uttered by a person is a wideband signal. Before transmission over a telephone line, high-frequency parameters are embedded into the narrowband code stream, and the narrowband speech signal is transmitted over the telephone line. At the receiving end, A-law decoding is performed, the high-frequency parameters are extracted and used to restore the high-frequency part of the wideband speech, and finally the high-frequency and low-frequency speech are synthesized into wideband speech. The device and method use the characteristics of audio watermarking to establish a hidden channel in the narrowband speech and use this channel to transmit the parameters of the high-frequency speech, thereby extending the frequency band of the speech signal without changing the existing network protocol.

Description

A device and method for voice bandwidth extension based on audio watermark

Technical field

The invention relates to speech processing technology, and in particular to a device and method for voice bandwidth extension based on audio watermarking.

Background

The main energy of the human speech signal is concentrated in the range of 0.3 to 3.4 kHz, and a bandwidth of 4 kHz is enough to guarantee sufficient intelligibility. For this reason, G.711 (A-law and μ-law), the coding standard for the public switched telephone network (PSTN) formulated by the International Telecommunication Union (ITU), uses a sampling frequency of 8 kHz and remains in use today.

Narrowband speech reduces the demand for communication bandwidth while keeping a certain level of intelligibility, but it does so at the expense of naturalness. Narrowband speech has lost the high-frequency components of the original speech, so it does not sound natural. To improve speech quality, ITU-T proposed G.722, the first wideband speech codec for teleconferencing. Wideband speech communication can be achieved by redesigning the transmission link, but for the huge PSTN fixed-line network the cost of doing so is too high.

A traditional watermark is the mark seen when paper is held up to the light, and it is generally used to check the authenticity of important notes and documents. Digital watermarking exploits the redundancy and randomness found in digital multimedia works to embed digital information into them, so that the information is transmitted in hidden form. Digital watermarking is mainly used to protect the copyright and integrity of digital works. Since human hearing is more sensitive than vision, embedding a watermark into audio is much more difficult than embedding one into an image.

Audio watermarking based on the least significant bit (LSB): LSB-based voice bandwidth extension embeds the high-frequency parameters into the lowest bit of the encoded code stream. This method embeds a large amount of watermark data, uses a simple algorithm, and is suited to communication channels with a low bit error rate.

Audio watermarking based on time-domain echo hiding: this technique exploits temporal masking in human hearing, whereby a sound that has already ended still affects the audibility of another sound. The method embeds only a small amount of watermark data, and the embedding has some effect on the original sound.

Audio watermarking based on the frequency-domain discrete Fourier transform (DFT): this method first applies the DFT to the audio, then selects the DFT coefficients in the range of 2.4 to 6.4 kHz for embedding and replaces them with spectral components that represent the watermark sequence. Although the method is robust, the original speech is noticeably affected when the embedded watermark differs too much from the original DFT coefficients.

Audio watermarking based on the frequency-domain discrete cosine transform: this method first applies a discrete cosine transform to the time-domain signal, then applies the modified discrete cosine transform (MDCT) to the sequence, and embeds the watermark by changing the MDCT coefficients. The method is robust, but the amount of watermark data it can embed is small.

Disadvantages of the prior art: none of the above methods achieves a good balance between robustness, imperceptibility and embedding capacity. Each has its own shortcomings, so none is well suited to voice bandwidth extension.

Summary of the invention

In view of the shortcomings of existing audio-watermark approaches to bandwidth extension, the present invention provides a device and method for voice bandwidth extension based on audio watermarking.

To achieve the above purpose, the method for voice bandwidth extension based on audio watermarking provided by the present invention comprises the following steps:

Step A. A QMF analysis filter bank module divides the wideband speech into two parts: narrowband speech of 0 to 8000 Hz and a high-frequency component of 8000 to 16000 Hz. The sampling frequency of both output signals is reduced to 8 kHz, giving the low-frequency signal s_L(n) and the high-frequency signal s_H(n).

Step B. A high-frequency parameter extraction module extracts 30 high-frequency parameters: 16 time-domain envelope parameters, 12 frequency-domain envelope parameters, the average time-domain envelope parameter and the average frequency-domain envelope parameter. This part follows the approach of the document "Research and Implementation of DTX/CNG Algorithm Based on Layered Wideband Speech Codec System". The parameters are extracted as follows:

Step B1. Extract the 16 time-domain envelope parameters and the average time-domain envelope parameter:

Each 20 ms of the high-frequency component s_H(n) is divided equally into 16 segments of 10 samples each. The 16 time-domain envelope parameters are:

[Formula not reproduced in the source text.]

Compute the average time-domain envelope:

[Formula not reproduced in the source text.]

The time-domain envelope parameters T(i) are normalized by subtracting the average value:

[Formula not reproduced in the source text.]
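The segmentation and normalization in step B1 can be sketched in Python as follows. The formula for T(i) is not legible in this text, so the per-segment measure used here (half the base-2 logarithm of the segment energy) is an assumption, and the names time_envelope, FRAME_LEN, NUM_SEGMENTS, SEG_LEN and EPS are hypothetical. Only the split into 16 segments of 10 samples and the subtraction of the average come from the description.

```python
import math

FRAME_LEN = 160      # 20 ms at 8 kHz
NUM_SEGMENTS = 16
SEG_LEN = 10
EPS = 1e-12          # assumption: guards log2(0) on silent segments


def time_envelope(frame):
    """Return (normalized envelope, average envelope) for one 160-sample frame."""
    if len(frame) != FRAME_LEN:
        raise ValueError("expected a 160-sample frame")
    env = []
    for i in range(NUM_SEGMENTS):
        seg = frame[i * SEG_LEN:(i + 1) * SEG_LEN]
        energy = sum(v * v for v in seg)
        # Assumed envelope measure; the patent's own formula is not available.
        env.append(0.5 * math.log2(energy + EPS))
    mean = sum(env) / NUM_SEGMENTS
    # Normalization by difference, as described in step B1.
    normalized = [t - mean for t in env]
    return normalized, mean
```

Under these assumptions a frame of constant amplitude yields 16 normalized values of zero, and all of the level information is carried by the average.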

Step B2. Extract the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter:

The 160 samples of the current frame of the high-frequency component s_H(n), together with the last 48 samples of the previous frame, are windowed. The window function window(n) has a length of 208 samples:

[Formula not reproduced in the source text.]

where N = 208.

The windowed signal is zero-padded to 256 points, and a 256-point FFT is then applied to obtain S_F(k):

[Formula not reproduced in the source text.]

where L = 256. The frequency domain is divided into 12 uniform intervals; the frequency-domain envelope parameter of each interval is calculated and converted into a logarithmically weighted sub-band energy parameter.

Compute the average frequency-domain envelope:

[Formula not reproduced in the source text.]

The frequency-domain envelope parameters F(i) are normalized by subtracting the average value:

[Formula not reproduced in the source text.]

Step C. The G.711 codec module encodes the narrowband speech signal s_L(n) with an A-law encoder, giving a code stream of 8 bits per sample. The watermark information is embedded into the code stream, which is sent into the network over the telephone line. The receiving end extracts the watermark information from the code stream and decodes the stream with an A-law decoder to obtain the narrowband speech signal.

Step D. The watermark embedding module embeds the watermark into the code stream in one of the following two ways:

D1. The watermark is embedded evenly into the code stream: since one frame contains 160 samples and the watermark is 66 bits long, 1 bit is embedded in every other sample.

Or D2. The watermark is embedded selectively into samples of small amplitude. C0 to C7 denote the lowest to the highest bit of each code word. According to the G.711 protocol, the highest bit C7 is the sign bit of the sample, C6 to C4 are the segment code, and C3 to C0 are the intra-segment code. The smaller the segment code, the smaller the amplitude of the sample that the code word represents. The method uses bit C6 to classify samples as large signals (C6 = 1) or small signals (C6 = 0), and embeds the watermark where C6 is 0. If a frame offers fewer than 66 such positions, the remaining bits are embedded at other positions.
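A minimal Python sketch of the two embedding modes in step D, assuming each code word is an 8-bit integer with C0 as bit 0 and C6 as bit 6. The function names, the choice of starting the every-other-sample pattern at sample 0, and the order in which mode D2 falls back to the C6 = 1 positions (taken here from the extraction rule in step E2) are assumptions, not details confirmed by the text.

```python
def set_lsb(code, bit):
    """Overwrite bit C0 of an 8-bit code word with one watermark bit."""
    return (code & 0xFE) | (bit & 1)


def embed_uniform(codes, bits):
    """Mode D1: one bit in every other sample (samples 0, 2, 4, ...)."""
    out = list(codes)
    for k, bit in enumerate(bits):
        out[2 * k] = set_lsb(out[2 * k], bit)
    return out


def selective_positions(codes, count):
    """Mode D2 positions: samples with C6 = 0 first, then samples with C6 = 1."""
    small = [i for i, c in enumerate(codes) if (c >> 6) & 1 == 0]
    large = [i for i, c in enumerate(codes) if (c >> 6) & 1 == 1]
    return (small + large)[:count]


def embed_selective(codes, bits):
    """Mode D2: prefer small-amplitude samples, fall back to the others."""
    out = list(codes)
    for pos, bit in zip(selective_positions(codes, len(bits)), bits):
        out[pos] = set_lsb(out[pos], bit)
    return out
```

Because only C0 is rewritten, C6 is unchanged by embedding, so a receiver that applies the same C6 test sees the same set of positions.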

Step E. The watermark extraction module extracts the watermark in the way that corresponds to step D:

E1. The watermark is extracted from the positions at which it was embedded.

Or E2. Whether a code word carries a watermark bit is determined from the code stream itself. Starting from the beginning of a frame, a watermark bit is extracted from the lowest bit when C6 is 0, and nothing is extracted when C6 is 1. If fewer than 66 bits have been extracted on reaching the end of the frame, extraction returns to the start of the frame and continues at the positions where C6 is 1 until 66 bits have been extracted.
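The two extraction modes in step E can be sketched in Python as below, under the same assumptions as for embedding: code words are 8-bit integers with C0 as bit 0 and C6 as bit 6, and the every-other-sample pattern starts at sample 0. The function names are hypothetical.

```python
WATERMARK_BITS = 66


def extract_uniform(codes, count=WATERMARK_BITS):
    """Mode E1: read C0 from the known positions (samples 0, 2, 4, ...)."""
    return [codes[2 * k] & 1 for k in range(count)]


def extract_selective(codes, count=WATERMARK_BITS):
    """Mode E2: read C0 where C6 = 0; if short, restart and read where C6 = 1."""
    bits = [c & 1 for c in codes if (c >> 6) & 1 == 0]
    if len(bits) < count:
        bits += [c & 1 for c in codes if (c >> 6) & 1 == 1]
    return bits[:count]
```

Mode E2 needs no side information about positions, since the receiver recomputes them from C6 alone.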

Step F. The high-frequency speech recovery module uses white noise to recover the high-frequency speech:

A generated white-noise sequence is first passed through an AR model constructed from the low-frequency speech. The extracted high-frequency parameters are then used to shape its time-domain and frequency-domain envelopes, giving the high-frequency speech signal.

Step F1. Recover high-frequency speech using white noise:

Since high-frequency and low-frequency speech are correlated to some degree, an AR model is constructed from the decoded low-frequency speech. A white-noise sequence is generated at the decoder and shaped by passing it through this AR model, so that the noise takes on the characteristics of high-frequency speech.
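One way to realize step F1 is sketched below in Python: the AR coefficients are estimated from the decoded low band with the autocorrelation method and the Levinson-Durbin recursion, and Gaussian white noise is passed through the resulting all-pole filter. The text does not say how the AR model is fitted, so the estimation method, the model order of 10, the Gaussian noise source and all function names are assumptions.

```python
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def shape_noise(low_band, n_samples, order=10, seed=0):
    """Filter white noise through 1/A(z) fitted to the decoded low band."""
    a = levinson_durbin(autocorrelation(low_band, order), order)
    rng = random.Random(seed)
    out = []
    for n in range(n_samples):
        e = rng.gauss(0.0, 1.0)
        past = sum(a[k] * out[n - k] for k in range(1, order + 1) if n - k >= 0)
        out.append(e - past)
    return out
```

The shaped noise only supplies spectral fine structure; its level is corrected afterwards by the envelope adjustments in steps F2 to F5.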

Step F2. Local adjustment of the time-domain envelope. This part follows the approach of the document "Research and Implementation of DTX/CNG Algorithm Based on Layered Wideband Speech Codec System":

The time-domain envelope parameters of the high-frequency signal are calculated from the normalized time-domain envelope parameters and the average time-domain envelope recovered from the watermark:

[Formula not reproduced in the source text.]

The time-domain local gain factors are calculated from the time-domain envelope parameters of the noise and of the high-frequency signal:

[Formula not reproduced in the source text.]

The time-domain envelope of the noise is adjusted with the time-domain local gain factors:

[Formula not reproduced in the source text.]

The gain factor between two segments is obtained by linear interpolation:

[Formula not reproduced in the source text.]
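The flow of step F2 can be illustrated with the Python sketch below. Since the formulas are not legible in this text, several details are assumptions: the envelopes are taken to be base-2 logarithmic, so the gain is 2 raised to the envelope difference, and the gain is ramped linearly from the previous segment's value to the current one across each 10-sample segment. The function names are hypothetical.

```python
def local_gains(target_norm, target_mean, noise_env):
    """Rebuild the target envelope and derive one gain per segment."""
    # Undo the normalization of step B1: add the average back.
    target_env = [t + target_mean for t in target_norm]
    # Assumed log2-domain envelopes, so the linear gain is a power of two.
    return [2.0 ** (t - n) for t, n in zip(target_env, noise_env)]


def apply_gains(noise, gains, seg_len=10):
    """Scale the noise, interpolating linearly between segment gains."""
    out = []
    for i, g in enumerate(gains):
        prev = gains[i - 1] if i > 0 else g
        for j in range(seg_len):
            w = (j + 1) / seg_len          # 0 < w <= 1 across the segment
            out.append(noise[i * seg_len + j] * ((1.0 - w) * prev + w * g))
    return out
```

Interpolating the gain avoids a step change in level at each segment boundary.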

Step F3. Local adjustment of the frequency-domain envelope. This part follows the approach of the document "Research and Implementation of DTX/CNG Algorithm Based on Layered Wideband Speech Codec System":

The time-domain-adjusted signal is processed in the same way as in the extraction of the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter, giving the logarithmically weighted sub-band energy parameters and the average frequency-domain envelope of the noise. The frequency-domain envelope of the noise is then locally adjusted in the same way as the time-domain envelope of the noise in step F2.

Step F4. Global adjustment of the frequency-domain envelope:

The frequency-domain global gain factor of each frame is calculated from the average frequency-domain envelopes of the noise and of the high-frequency signal:

[Formula not reproduced in the source text.]

The frequency-domain envelope of each frame is globally adjusted with the frequency-domain global gain factor:

[Formula not reproduced in the source text.]

An IFFT is applied to the adjusted spectrum. The resulting time-domain signal is windowed with the window function window(n) and stored in a buffer of length 208:

[Formula not reproduced in the source text.]

where L = 256 and n = 0, 1, ..., 207.

The values of the last 48 points in the previous frame's buffer are added to the first 48 points in the current frame's buffer. Together with the values at n = 48 to 159 in the current frame's buffer, they form the time-domain signal recovered for the current frame.
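The overlap-add of the 208-point buffers can be sketched in Python as follows. The constant and function names are hypothetical; the lengths 208, 160 and 48 are those given in the text.

```python
BUF_LEN = 208
FRAME_LEN = 160
OVERLAP = 48


def overlap_add(prev_buf, cur_buf):
    """Combine two consecutive 208-point buffers into one 160-sample frame."""
    if len(prev_buf) != BUF_LEN or len(cur_buf) != BUF_LEN:
        raise ValueError("both buffers must hold 208 points")
    # Last 48 points of the previous buffer plus first 48 of the current one.
    head = [prev_buf[FRAME_LEN + n] + cur_buf[n] for n in range(OVERLAP)]
    # Points n = 48..159 of the current buffer are used unchanged.
    return head + list(cur_buf[OVERLAP:FRAME_LEN])
```

The last 48 points of the current buffer are not output here; they are held back and added to the start of the next frame.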

Step F5. Global adjustment of the time-domain envelope:

The time-domain envelope is globally adjusted following the same steps as the global adjustment of the frequency-domain envelope. The adjusted signal is the high-frequency signal estimated from the noise.

Step G. The QMF synthesis filter bank module raises the sampling frequency of the 8 kHz low-frequency signal and of the estimated high-frequency signal to 16 kHz, and then passes them through a low-pass and a high-pass FIR filter respectively. The filter coefficients are the same as those of the QMF analysis filters.

Adding the two filtered signals gives the final wideband signal at a sampling frequency of 16 kHz:

[Formula not reproduced in the source text.]

The present invention further provides a device for voice bandwidth extension based on audio watermarking. The device comprises: a QMF analysis filter bank module, a high-frequency parameter extraction module, a G.711 codec module, a watermark embedding module, a watermark extraction module, a high-frequency speech recovery module and a QMF synthesis filter bank module.

The QMF analysis filter bank module divides the wideband speech into two parts: narrowband speech of 0 to 8000 Hz and a high-frequency component of 8000 to 16000 Hz. The sampling frequency of both output signals is reduced to 8 kHz, giving the low-frequency signal s_L(n) and the high-frequency signal s_H(n).

The high-frequency parameter extraction module extracts 30 high-frequency parameters: 16 time-domain envelope parameters, 12 frequency-domain envelope parameters, the average time-domain envelope parameter and the average frequency-domain envelope parameter. This part follows the approach of the document "Research and Implementation of DTX/CNG Algorithm Based on Layered Wideband Speech Codec System". The parameters are extracted as follows:

Extract the 16 time-domain envelope parameters and the average time-domain envelope parameter:

[Formula not reproduced in the source text.]

Compute the average time-domain envelope:

[Formula not reproduced in the source text.]

The time-domain envelope parameters T(i) are normalized by subtracting the average value:

[Formula not reproduced in the source text.]

Extract the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter:

The 160 samples of the current frame of the high-frequency component s_H(n), together with the last 48 samples of the previous frame, are windowed. The window function window(n) has a length of 208 samples:

[Formula not reproduced in the source text.]

where N = 208.

[Formula not reproduced in the source text.]

Compute the average frequency-domain envelope:

[Formula not reproduced in the source text.]

The frequency-domain envelope parameters F(i) are normalized by subtracting the average value:

[Formula not reproduced in the source text.]

The G.711 codec module encodes the narrowband speech signal s_L(n) with an A-law encoder, giving a code stream of 8 bits per sample. The watermark information is embedded into the code stream, which is sent into the network over the telephone line. The receiving end extracts the watermark information from the code stream and decodes the stream with an A-law decoder to obtain the narrowband speech signal.

The watermark embedding module embeds the watermark into the code stream in one of the following two ways:

Mode 1: The watermark is embedded evenly into the code stream: since one frame contains 160 samples and the watermark is 66 bits long, 1 bit is embedded in every other sample.

Mode 2: The watermark is embedded selectively into samples of small amplitude. C0 to C7 denote the lowest to the highest bit of each code word. According to the G.711 protocol, the highest bit C7 is the sign bit of the sample, C6 to C4 are the segment code, and C3 to C0 are the intra-segment code. The smaller the segment code, the smaller the amplitude of the sample that the code word represents. The method uses bit C6 to classify samples as large signals (C6 = 1) or small signals (C6 = 0), and embeds the watermark where C6 is 0. If a frame offers fewer than 66 such positions, the remaining bits are embedded at other positions.

The watermark extraction module extracts the watermark in the way that corresponds to the watermark embedding module:

Mode 1: The watermark is extracted from the positions at which it was embedded.

Mode 2: Whether a code word carries a watermark bit is determined from the code stream itself. Starting from the beginning of a frame, a watermark bit is extracted from the lowest bit when C6 is 0, and nothing is extracted when C6 is 1. If fewer than 66 bits have been extracted on reaching the end of the frame, extraction returns to the start of the frame and continues at the positions where C6 is 1 until 66 bits have been extracted.

The high-frequency speech recovery module uses white noise to recover the high-frequency speech:

Recover high-frequency speech using white noise:

Since high-frequency and low-frequency speech are correlated to some degree, an AR model is constructed from the decoded low-frequency speech. A white-noise sequence is generated at the decoder and shaped by passing it through the constructed AR model module, so that the noise takes on the characteristics of high-frequency speech.

Local adjustment of the time-domain envelope. This part follows the approach of the document "Research and Implementation of DTX/CNG Algorithm Based on Layered Wideband Speech Codec System":

[Formula not reproduced in the source text.]

[Formula not reproduced in the source text.]

[Formula not reproduced in the source text.]

[Formula not reproduced in the source text.]

Local adjustment of the frequency-domain envelope. This part follows the approach of the document "Research and Implementation of DTX/CNG Algorithm Based on Layered Wideband Speech Codec System":

The time-domain-adjusted signal is processed in the same way as in the extraction of the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter, giving the logarithmically weighted sub-band energy parameters and the average frequency-domain envelope of the noise. The frequency-domain envelope of the noise is then locally adjusted in the same way as the time-domain envelope of the noise.

Global adjustment of the frequency-domain envelope:

[Formula not reproduced in the source text.]

[Formula not reproduced in the source text.]

An IFFT is applied to the adjusted spectrum. The resulting time-domain signal is windowed with the window function window(n) and stored in a buffer device of length 208:

[Formula not reproduced in the source text.]

where L = 256 and n = 0, 1, ..., 207.

The values of the last 48 points in the previous frame's buffer device are added to the first 48 points in the current frame's buffer device. Together with the values at n = 48 to 159 in the current frame's buffer device, they form the time-domain signal recovered for the current frame.

Global adjustment of the time-domain envelope:

The time-domain envelope is globally adjusted following the same steps as the global adjustment of the frequency-domain envelope. The adjusted signal is the high-frequency signal estimated from the noise.

The QMF synthesis filter bank module raises the sampling frequency of the 8 kHz low-frequency signal and of the estimated high-frequency signal to 16 kHz, and then passes them through a low-pass and a high-pass FIR filter respectively. The filter coefficients are the same as those of the QMF analysis filters.

Beneficial effects: the present invention provides a method for improving speech quality based on audio watermarking. The method uses the characteristics of audio watermarking to establish a hidden channel in the narrowband speech and uses this channel to transmit the parameters of the high-frequency speech, thereby extending the frequency band of the speech signal without changing the existing network protocol. The invention uses adaptive audio watermarking to achieve voice bandwidth extension. It has little effect on the original speech, embeds a large amount of high-frequency information, is robust, and suits all types of speech. The recovered wideband speech sounds better than the narrowband speech.

Description of drawings

Figure 1 is a block diagram of the principle of the present invention.

Figure 2 shows the window function window(n) of the present invention.

Figure 3 shows the G.711 code stream format used in the present invention.

Figure 4 is a block diagram of high-frequency speech recovery in the present invention.

Detailed description

The present invention is described in detail below with reference to the drawings and embodiments.

Figure 1 gives the complete block diagram of the present invention. The speech uttered by a person is a wideband signal. Before transmission over the telephone line, the high-frequency parameters are embedded into the narrowband code stream, and the narrowband speech signal is transmitted over the telephone line. At the receiving end, A-law decoding is performed, the high-frequency parameter extraction module extracts the high-frequency parameters, the high-frequency parameter synthesis module restores the high-frequency part of the wideband speech, and finally the high-frequency and low-frequency speech are synthesized into wideband speech.

The modules in the block diagram of the present invention are described below:

1. QMF analysis filter bank module

The speech a person utters is wideband, whereas the telephone line carries narrowband speech, so the present invention uses a QMF analysis filter bank to divide the wideband speech into two parts: narrowband speech of 0 to 8000 Hz and a high-frequency component of 8000 to 16000 Hz. The QMF analysis filters in the present invention are 64th-order FIR filters; the coefficients of the low-pass FIR filter h_L(n) are given in the appendix. The high-pass filter h_H(n) is obtained by frequency-shifting the low-pass filter h_L(n), that is, by modulating it with the complex sinusoidal sequence e^{jπn}:

h_H(n) = h_L(n) e^{jπn} = (-1)^n h_L(n).

Passing the wideband signal through the QMF analysis filter bank and reducing the sampling frequency of the two output signals to 8 kHz gives the low-frequency signal s_L(n) and the high-frequency signal s_H(n).
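The analysis stage can be sketched in Python as below. The modulating sequence is not legible in this text; the sketch assumes it is e^{jπn} = (-1)^n, which is the half-band shift that turns a low-pass prototype into its high-pass mirror. The appendix coefficients are not available here, so the caller supplies the prototype h_low, and the function names are hypothetical.

```python
def modulate_to_highpass(h_low):
    """Assumed modulation: h_H(n) = (-1)^n * h_L(n)."""
    return [((-1) ** n) * c for n, c in enumerate(h_low)]


def fir_filter(x, h):
    """Causal FIR filtering with zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, c in enumerate(h):
            if n - k >= 0:
                acc += c * x[n - k]
        y.append(acc)
    return y


def qmf_analysis(x, h_low):
    """Split x into low and high bands and keep every second sample."""
    h_high = modulate_to_highpass(h_low)
    low = fir_filter(x, h_low)[::2]
    high = fir_filter(x, h_high)[::2]
    return low, high
```

With a two-tap averaging prototype, a constant input ends up in the low band and an alternating input ends up in the high band, apart from the first sample, where the filter state is still empty.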

2. High-frequency parameter extraction module

The present invention extracts 30 high-frequency parameters: 16 time-domain envelope parameters, 12 frequency-domain envelope parameters, the average time-domain envelope parameter and the average frequency-domain envelope parameter. The parameters are extracted as follows.

(1) Extract the 16 time-domain envelope parameters and the average time-domain envelope parameter

Each 20 ms of the high-frequency component s_H(n) is divided equally into 16 segments of 10 samples each. The 16 time-domain envelope parameters are:

[Formula not reproduced in the source text.]

Compute the average time-domain envelope:

[Formula not reproduced in the source text.]

The time-domain envelope parameters T(i) are normalized by subtracting the average value:

[Formula not reproduced in the source text.]

(2) Extract the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter

The 160 samples of the current frame of the high-frequency component s_H(n), together with the last 48 samples of the previous frame, are windowed. The window function window(n) has a length of 208 samples:

[Formula not reproduced in the source text.]

where N = 208. The window function is shown in Figure 2.

The windowed signal is zero-padded to 256 points, and a 256-point FFT is then applied to obtain S_F(k):

[Formula not reproduced in the source text.]

where L = 256. The frequency domain is divided into 12 uniform intervals; the frequency-domain envelope parameter of each interval is calculated and converted into a logarithmically weighted sub-band energy parameter. The sub-band division of the frequency-domain envelope and the calculation of the logarithmically weighted energy F(i) of each sub-band are given in the appendix.

Compute the average frequency-domain envelope:

[Formula not reproduced in the source text.]

The frequency-domain envelope parameters F(i) are normalized by subtracting the average value:

[Formula not reproduced in the source text.]

3、G.711编解码模块 3. G.711 codec module

将窄带语音信号s_L(n) 通过A律编码器编码，得到每个点8bit数据长度的码流，将水印信息嵌入到码流中，通过电话线传送到网络中。接收端从码流中提取出水印信息，并通过A律解码器解码，得到窄带语音信号。 The narrowband voice signal s _L (n) is encoded by an A-law encoder to obtain a code stream with a data length of 8 bits for each point, and the watermark information is embedded into the code stream, which is transmitted to the network through the telephone line. The receiving end extracts the watermark information from the code stream, and decodes it through an A-law decoder to obtain a narrowband voice signal.

4. Watermark embedding module

The existing least-significant-bit (LSB) embedding algorithm simply embeds the watermark information in the lowest bit of the narrowband code stream. Taking into account the characteristics of the transmission protocol and the subjective perception of the human ear, two improved LSB embedding algorithms are proposed here.

The first method embeds the watermark fairly uniformly across the code stream: since a frame contains 160 samples and the watermark occupies 66 bits, 1 bit can be embedded in every other sample. This prevents the perceived quality from fluctuating because of excessive local distortion, and keeps the overall perceived quality at a high level.
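A minimal sketch of the first method, assuming each frame is given as a list of 160 integer codewords (0 to 255) and that embedding starts at the first sample of the frame and uses samples 0, 2, 4, ..., 130. The starting offset and the function names `embed_uniform` and `extract_uniform` are assumptions for illustration; the source states only that one bit goes into every other sample.

```python
FRAME_LEN = 160        # codewords per 20 ms frame
WATERMARK_BITS = 66    # watermark bits carried per frame

def embed_uniform(codewords, bits):
    """Write the watermark bits into the lowest bit (C0) of every other codeword."""
    if len(codewords) != FRAME_LEN or len(bits) != WATERMARK_BITS:
        raise ValueError("expected 160 codewords and 66 watermark bits")
    out = list(codewords)
    for k, bit in enumerate(bits):
        pos = 2 * k                               # samples 0, 2, ..., 130 (assumed offset)
        out[pos] = (out[pos] & 0xFE) | (bit & 1)  # replace only the lowest bit
    return out

def extract_uniform(codewords):
    """Read the watermark bits back from the same fixed positions."""
    return [codewords[2 * k] & 1 for k in range(WATERMARK_BITS)]
```

Because the positions are fixed, the receiver needs no side information: it reads the lowest bit of the same 66 codewords.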

The second method is a selective LSB embedding algorithm based on the characteristics of the transmission protocol and the auditory properties of the human ear. G.711 uses non-uniform quantization: the quantization step is small for small sample values and large for large sample values. Consequently, altering the codeword of a small sample changes the sample value only slightly, while altering the codeword of a large sample changes it by a larger amount. In theory, therefore, the resulting signal-to-noise ratio varies little whether the watermark is embedded in small or in large samples. However, because of temporal masking in the human ear, a large signal masks the smaller signal that follows it, so modifications to small signals are hard to perceive. The watermark information can therefore be embedded selectively in samples of small amplitude, which hides the watermark better. Let C0 to C7 denote the lowest to the highest bit of a codeword, as shown in Figure 3. According to G.711, the highest bit C7 is the sign bit of the sample, C6 to C4 are the segment code, and C3 to C0 are the intra-segment code. The smaller the segment code, the smaller the amplitude of the sample represented by the codeword. Here the C6 bit is used to classify samples as large-signal (C6 = 1) or small-signal (C6 = 0), and the watermark is embedded where C6 is 0. If a frame offers fewer than 66 such positions, the remaining watermark bits are embedded at other positions.
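The selective embedding rule can be sketched as follows. The sketch assumes that codewords are integers in the C7...C0 bit layout described above, and that the "other positions" used when fewer than 66 small-signal samples exist are the C6 = 1 samples taken in order from the start of the frame, which is the order the extraction module is described as using. The names `embedding_positions` and `embed_selective` are hypothetical.

```python
FRAME_LEN = 160
WATERMARK_BITS = 66
C6_MASK = 0x40   # bit C6: 0 marks a small-signal sample, 1 a large-signal sample

def embedding_positions(codewords):
    """Positions used for the watermark: C6 = 0 samples first, then C6 = 1 samples."""
    small = [i for i, c in enumerate(codewords) if not c & C6_MASK]
    large = [i for i, c in enumerate(codewords) if c & C6_MASK]
    return (small + large)[:WATERMARK_BITS]

def embed_selective(codewords, bits):
    """Embed the watermark bits in the lowest bit (C0) of the selected codewords."""
    if len(codewords) != FRAME_LEN or len(bits) != WATERMARK_BITS:
        raise ValueError("expected 160 codewords and 66 watermark bits")
    out = list(codewords)
    for pos, bit in zip(embedding_positions(codewords), bits):
        out[pos] = (out[pos] & 0xFE) | (bit & 1)
    return out
```

Only C0 is modified, so C6 is the same before and after embedding. This is what lets the receiver recompute the same positions from the received code stream alone.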

5. Watermark extraction module

The watermark extraction method corresponds to the embedding algorithm used. For the first algorithm, the watermark is extracted from the known embedding positions. For the second algorithm, the code stream itself indicates where the watermark is embedded. Scanning from the start of a frame, a watermark bit is extracted from the lowest bit when C6 is 0, and nothing is extracted when C6 is 1. If fewer than 66 bits have been extracted on reaching the end of the frame, extraction returns to the start of the frame and continues at the positions where C6 is 1 until 66 watermark bits have been extracted.
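The two-pass extraction for the second algorithm can be sketched as below, again assuming integer codewords in the C7...C0 layout. The function name `extract_selective` is hypothetical.

```python
WATERMARK_BITS = 66
C6_MASK = 0x40   # bit C6 of the codeword

def extract_selective(codewords):
    """Recover the 66 watermark bits of one frame.

    First pass: lowest bit of every codeword with C6 = 0, in frame order.
    Second pass (only if needed): lowest bit of codewords with C6 = 1.
    """
    bits = [c & 1 for c in codewords if not c & C6_MASK]
    if len(bits) < WATERMARK_BITS:
        bits += [c & 1 for c in codewords if c & C6_MASK]
    return bits[:WATERMARK_BITS]
```

A receiver without this module simply decodes the codewords as ordinary A-law samples; the altered lowest bits then act as a small amount of additional quantization noise.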

6. High-frequency speech recovery module

Since the characteristics of high-frequency speech resemble those of noise, this module uses white noise to recover the high-frequency speech. The generated white noise sequence is first passed through an AR model constructed from the low-frequency speech; time-domain and frequency-domain envelope shaping is then applied using the extracted high-frequency parameters to obtain the high-frequency speech signal. The block diagram of high-frequency speech recovery is shown in Figure 4.

(1) Recovering high-frequency speech from white noise

Since high-frequency speech and low-frequency speech are correlated to some degree, the decoded low-frequency speech is used to construct an AR model. A white noise sequence is generated at the decoder and shaped by the constructed AR model so that the noise acquires the characteristics of high-frequency speech.
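The noise-shaping step can be sketched as follows. The source does not give the AR model order, the estimation method, or the generator constants, so everything specific here is an assumption: the AR coefficients are estimated by solving the autocorrelation normal equations, the order defaults to 10, and the mixed congruential generator uses commonly cited illustrative constants. The function names are hypothetical.

```python
import numpy as np

def lcg_noise(n, state=12345):
    """White noise in [-1, 1) from a mixed congruential generator (illustrative constants)."""
    a, c, m = 1664525, 1013904223, 2 ** 32
    out = np.empty(n)
    for k in range(n):
        state = (a * state + c) % m
        out[k] = state / (m / 2) - 1.0
    return out

def ar_coefficients(x, order=10):
    """Estimate a[k] in x[n] ~ sum_k a[k] * x[n-1-k] from the autocorrelation of x."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    r[0] *= 1.0 + 1e-9   # slight regularization of the normal equations (assumption)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def shape_noise(noise, a):
    """All-pole filtering: y[n] = noise[n] + sum_k a[k] * y[n-1-k]."""
    y = np.zeros(len(noise))
    for n in range(len(noise)):
        acc = noise[n]
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc += ak * y[n - 1 - k]
        y[n] = acc
    return y
```

In use, `ar_coefficients` would be applied to a frame of decoded low-frequency speech and `shape_noise` to a 160-sample output of `lcg_noise`; the envelope adjustments that follow then impose the transmitted high-frequency parameters on the shaped noise.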

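(2) Local adjustment of the time-domain envelope

The time-domain envelope parameters of the high-frequency signal are computed from the normalized time-domain envelope parameters and the average time-domain envelope parameter recovered from the watermark:

$$\hat{T}(i) = T_M(i) + M_T, \quad i = 0, 1, \ldots, 15$$

The time-domain local gain factor is computed from the time-domain envelope parameters of the noise and of the high-frequency signal:

$$\mathrm{gain\_t}(i) = 2^{\hat{T}(i) - \tilde{T}(i)}$$

where $\hat{T}(i)$ is the time-domain envelope parameter of the high-frequency signal and $\tilde{T}(i)$ is the time-domain envelope parameter of the white noise after processing by the AR model.

The time-domain envelope of the noise is adjusted with the time-domain local gain factor:

$$\mathrm{seed}_t(n+10i) = \mathrm{seed}(n+10i)\,\mathrm{gain\_t}(n+10i), \quad n = 0, 1, \ldots, 9, \quad i = 0, 1, \ldots, 15$$

where seed is the generated white noise sequence, produced by the mixed congruential method, and seed_t is the white noise sequence after modulation by the time-domain local gain factor. The gain factor between two segments is obtained by linear interpolation.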
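The local time-domain adjustment can be sketched with the gain formula given in the claims, gain_t(i) = 2^(T̂(i) − T̃(i)), where T̂(i) = T_M(i) + M_T comes from the watermark and T̃(i) is measured on the AR-shaped noise. The sketch applies one constant gain per 10-sample segment; the claims also mention linear interpolation of the gain between segments, which is omitted here because its formula is not available in the text. The function names and the `eps` guard are assumptions.

```python
import numpy as np

SEGMENTS, SEG_LEN = 16, 10

def segment_envelope(x, eps=1e-12):
    """T(i) = 0.5 * log2(energy of segment i) for a 160-sample frame."""
    energy = np.sum(np.asarray(x, dtype=float).reshape(SEGMENTS, SEG_LEN) ** 2, axis=1)
    return 0.5 * np.log2(energy + eps)

def adjust_time_envelope(noise, t_m, m_t):
    """Scale each segment of the shaped noise so its envelope matches T_M(i) + M_T."""
    t_hat = np.asarray(t_m, dtype=float) + m_t   # target envelope from the watermark
    t_tilde = segment_envelope(noise)            # envelope of the AR-shaped noise
    gain_t = 2.0 ** (t_hat - t_tilde)            # one gain per segment (no interpolation)
    return np.asarray(noise, dtype=float) * np.repeat(gain_t, SEG_LEN)
```

Since T(i) is half the base-2 logarithm of the segment energy, multiplying a segment by 2^d raises its T(i) by d, so the adjusted noise has exactly the target envelope in every segment.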

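(3) Local adjustment of the frequency-domain envelope

The signal adjusted in the time domain is processed in the same way as in the extraction of the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter, giving the log-weighted subband energy parameters of the noise and its average frequency-domain envelope $\tilde{M}_F$. The frequency-domain envelope of the noise is then adjusted locally in the same way as in the local adjustment of the time-domain envelope.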

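(4) Global adjustment of the frequency-domain envelope

The frequency-domain global gain factor of each frame is computed from the average frequency-domain envelopes of the noise and of the high-frequency signal:

$$\mathrm{gain\_mf} = 2^{\hat{M}_F - \tilde{M}_F}$$

where $\hat{M}_F$ is the average frequency-domain envelope parameter of the high-frequency signal and $\tilde{M}_F$ is the average frequency-domain envelope parameter of the processed noise.

The frequency-domain envelope of each frame is adjusted globally with the frequency-domain global gain factor:

$$S_g(i) = \tilde{S}_F(i)\,\mathrm{gain\_mf}, \quad i = 0, 1, \ldots, 255$$

where $\tilde{S}_F(i)$ is the frequency-domain envelope of each frame before adjustment.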

The adjusted spectrum is transformed by an IFFT, and the resulting time-domain signal is windowed with the window function of Figure 2 and stored in a buffer of length 208:

$$\mathrm{buf}(n) = \mathrm{window}(n) \cdot \mathrm{IFFT}\{S_g(k)\} = \mathrm{window}(n) \cdot \frac{1}{L}\sum_{k=0}^{L-1} S_g(k)\, e^{j\frac{2\pi}{L}nk}$$

where L = 256 and n = 0, 1, …, 207.
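The buffer handling can be sketched as follows, combining the windowed IFFT with the overlap-add described in the claims: the last 48 points of the previous buffer are added to the first 48 points of the current buffer, and points 48 to 159 of the current buffer complete the recovered frame. The window of Figure 2 is not available in the text, so the window is passed in as an argument; the function name `synthesize_frame` is hypothetical, and taking the real part assumes a conjugate-symmetric spectrum.

```python
import numpy as np

FFT_LEN, BUF_LEN, FRAME_LEN, OVERLAP = 256, 208, 160, 48

def synthesize_frame(spectrum, window, prev_tail):
    """Return (frame, tail): 160 recovered samples and the 48-sample tail for the next frame.

    spectrum  : 256-point adjusted spectrum S_g(k)
    window    : 208-point synthesis window (its shape is not specified here)
    prev_tail : last 48 points of the previous frame's buffer
    """
    time_signal = np.fft.ifft(spectrum, n=FFT_LEN).real[:BUF_LEN]
    buf = np.asarray(window, dtype=float) * time_signal
    frame = buf[:FRAME_LEN].copy()
    frame[:OVERLAP] += prev_tail          # overlap-add with the previous frame
    return frame, buf[FRAME_LEN:].copy()  # points 160..207 carry over to the next frame
```

For the first frame of a call, `prev_tail` would be 48 zeros.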

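(5) Global adjustment of the time-domain envelope

The time-domain envelope is adjusted globally following the same steps as the global adjustment of the frequency-domain envelope. The adjusted signal is the high-frequency signal estimated from the noise.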

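7. QMF synthesis filter bank module

The low-frequency signal sampled at 8 kHz and the estimated high-frequency signal are upsampled to 16 kHz and then passed through low-pass and high-pass FIR filters, respectively, giving the signals s_16L(n) and s_16H(n). The filter coefficients are the same as those of the QMF analysis filters.

The two signals are added to obtain the final wideband signal at a 16 kHz sampling frequency:

$$\tilde{S}_{wb}(n) = s_{16L}(n) + s_{16H}(n)$$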
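The synthesis step (upsample both bands to 16 kHz, filter with the low-pass and high-pass filters, add) can be sketched with a two-band filter bank. The actual QMF coefficients are not given in this section, so the sketch substitutes a 2-tap placeholder pair; the sign applied to the high-pass branch is the usual alias-cancelling choice and is an assumption, as are the function names. The analysis function is included only so the round trip can be checked.

```python
import numpy as np

H0 = np.array([1.0, 1.0]) / np.sqrt(2.0)    # placeholder low-pass prototype (assumption)
H1 = np.array([1.0, -1.0]) / np.sqrt(2.0)   # high-pass mirror filter, H1(z) = H0(-z)

def upsample2(x):
    """Insert a zero after every sample (8 kHz -> 16 kHz)."""
    y = np.zeros(2 * len(x))
    y[::2] = x
    return y

def qmf_analysis(x):
    """Split x into low and high bands and downsample each by 2."""
    low = np.convolve(x, H0)[: len(x)][::2]
    high = np.convolve(x, H1)[: len(x)][::2]
    return low, high

def qmf_synthesis(s_low, s_high):
    """Upsample both bands, filter, and add to form the wideband signal."""
    s16l = np.convolve(upsample2(s_low), H0)[: 2 * len(s_low)]
    s16h = np.convolve(upsample2(s_high), -H1)[: 2 * len(s_high)]  # sign cancels aliasing
    return s16l + s16h
```

With this placeholder pair the analysis and synthesis stages together reproduce the input delayed by one sample. Longer QMF prototypes trade a longer delay for better band separation.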

Summary: this embodiment proposes two improved LSB watermark embedding algorithms. The first embeds 1 bit of information in every other sample, which prevents the perceived quality from fluctuating because of excessive local distortion and keeps the overall perceived quality at a high level. The second is a selective LSB embedding algorithm based on the characteristics of the transmission protocol and the auditory properties of the human ear. Because of temporal masking, a large signal masks the smaller signal that follows it, so modifications to small signals are hard to perceive. The watermark information can therefore be embedded selectively in samples of small amplitude, which hides the watermark better.

Based on the above watermarking algorithms, the system embeds the high-frequency information of the speech signal into the narrowband code stream and transmits it over the wired telephone network. The receiver extracts the high-frequency parameters of the speech and synthesizes wideband speech, thereby extending the bandwidth of the speech signal. Because the watermarking algorithm hides the watermark well, normal call quality is unaffected even if the receiver has no module for extracting the watermark and synthesizing wideband speech. A telephone terminal that does have this function plays back the bandwidth-extended speech, and call quality is greatly improved.

The above is a further detailed description of the present invention in combination with preferred technical solutions, and the specific implementation of the invention is not to be regarded as limited to this description. For those of ordinary skill in the technical field of the invention, simple deductions and substitutions made without departing from the concept of the invention shall be regarded as falling within its scope of protection.

Appendix

Subband division of the frequency-domain envelope:

Calculation of the log-weighted energy F(i) of each subband:

[Formulas for subband 0, subbands 1 to 10, and subband 11 appeared as images in the source and are not recoverable from the text.]

Claims

1. A method for voice bandwidth expansion based on audio watermarking, comprising the following steps:

Step A. Use a QMF analysis filter bank module to divide the wideband speech into two parts, the narrowband speech of 0-8000 Hz and the high-frequency component of 8000-16000 Hz, and pass the two output signals through a down-sampling module that reduces the sampling frequency to 8 kHz, giving the low-frequency signal s_L(n) and the high-frequency signal s_H(n);

Step B. Use a high-frequency parameter extraction module to extract 30 high-frequency parameters: 16 time-domain envelope parameters, 12 frequency-domain envelope parameters, the average time-domain envelope parameter, and the average frequency-domain envelope parameter; the parameters are extracted as follows:

Step B1. Extract the 16 time-domain envelope parameters and the average time-domain envelope parameter:

Each 20 ms of the high-frequency component s_H(n) is divided into 16 segments of 10 samples each; the 16 time-domain envelope parameters are:

$$T(i) = \frac{1}{2}\log_2\left[\sum_{n=0}^{9} s_H^2(n+10i)\right], \quad i = 0, 1, \ldots, 15$$

Compute the average time-domain envelope:

$$M_T = \frac{1}{16}\sum_{i=0}^{15} T(i)$$

The time-domain envelope parameters T(i) are normalized by subtracting the mean M_T:

$$T_M(i) = T(i) - M_T, \quad i = 0, 1, \ldots, 15$$

Step B2. Extract the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter:

The 160 samples of the current frame of the high-frequency component s_H(n), together with the last 48 samples of the previous frame, are windowed to obtain s_H^w(n), using a window function window(n) with a length of 208 samples:

$$s_H^w(n) = s_H(n)\,\mathrm{window}(n), \quad n = 0, 1, \ldots, N-1$$

where N = 208; the windowed signal is zero-padded to 256 points, and a 256-point FFT is applied to obtain S_F(k):

$$S_F(k) = \mathrm{FFT}[s_H^w(n)] = \sum_{n=0}^{L-1} s_H^w(n)\, e^{-j\frac{2\pi}{L}kn}, \quad k = 0, 1, \ldots, L-1$$

where L = 256; the frequency domain is divided into 12 uniformly spaced subbands, and the frequency-domain envelope parameter of each subband is computed and converted to a log-weighted subband energy parameter;

Compute the average frequency-domain envelope:

$$M_F = \frac{1}{12}\sum_{i=0}^{11} F(i)$$

The frequency-domain envelope parameters F(i) are normalized by subtracting the mean M_F:

$$F_M(i) = F(i) - M_F, \quad i = 0, 1, \ldots, 11$$

Step C. Use the G.711 codec module to encode the narrowband speech signal s_L(n) with an A-law encoder, producing a code stream with 8 bits per sample; embed the watermark information into the code stream and send it over the telephone line into the network; the receiver extracts the watermark information from the code stream and decodes the stream with an A-law decoder to obtain the narrowband speech signal;

Step D. Embed the watermark into the code stream with the watermark embedding module, using one of the following two methods:

D1. Embed the watermark uniformly into the code stream: since a frame contains 160 samples and the watermark occupies 66 bits, 1 bit of information is embedded in every other sample;

or D2. Embed the watermark information selectively in samples of small amplitude: let C0 to C7 denote the lowest to the highest bit of a codeword; according to G.711, the highest bit C7 is the sign bit of the sample, C6 to C4 are the segment code, and C3 to C0 are the intra-segment code; the smaller the segment code, the smaller the sample value represented by the codeword; the C6 bit is used to classify samples as large-signal (C6 = 1) or small-signal (C6 = 0), and the watermark is embedded where C6 is 0;

if a frame offers fewer than 66 such positions, the remaining watermark bits are embedded at other positions;

Step E. Extract the watermark with the watermark extraction module, in correspondence with step D, using one of the following two methods:

E1. Extract the watermark from the positions at which it was embedded;

or E2. Determine from the code stream where the watermark is embedded: scanning from the start of a frame, extract a watermark bit from the lowest bit when C6 is 0, and extract nothing when C6 is 1; if fewer than 66 bits have been extracted on reaching the end of the frame, return to the start of the frame and extract at the positions where C6 is 1 until 66 watermark bits have been extracted;

Step F. Recover the high-frequency speech from white noise with the high-frequency speech recovery module:

First pass the generated white noise sequence through the AR model constructed from the low-frequency speech, then apply time-domain and frequency-domain envelope shaping using the extracted high-frequency parameters to obtain the high-frequency speech signal;

Step F1. Recover high-frequency speech from white noise:

Since high-frequency speech and low-frequency speech are correlated to some degree, the decoded low-frequency speech is used to construct an AR model; a white noise sequence is generated at the decoder and shaped by the constructed AR model so that the noise acquires the characteristics of high-frequency speech;

Step F2. Local adjustment of the time-domain envelope:

Compute the time-domain envelope parameters of the high-frequency signal from the normalized time-domain envelope parameters and the average time-domain envelope parameter recovered from the watermark:

$$\hat{T}(i) = T_M(i) + M_T, \quad i = 0, 1, \ldots, 15$$

Compute the time-domain local gain factor from the time-domain envelope parameters of the noise and of the high-frequency signal:

$$\mathrm{gain\_t}(i) = 2^{\hat{T}(i) - \tilde{T}(i)}$$

where $\hat{T}(i)$ is the time-domain envelope parameter of the high-frequency signal and $\tilde{T}(i)$ is the time-domain envelope parameter of the white noise after processing by the AR model;

Adjust the time-domain envelope of the noise with the time-domain local gain factor:

$$\mathrm{seed}_t(n+10i) = \mathrm{seed}(n+10i)\,\mathrm{gain\_t}(n+10i), \quad n = 0, 1, \ldots, 9, \quad i = 0, 1, \ldots, 15$$

where seed is the generated white noise sequence, produced by the mixed congruential method, and seed_t is the white noise sequence after modulation by the time-domain local gain factor;

The gain factor between two segments is obtained by linear interpolation;

Step F3. Local adjustment of the frequency-domain envelope:

The signal adjusted in the time domain is processed in the same way as in the extraction of the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter, giving the log-weighted subband energy parameters of the noise and its average frequency-domain envelope $\tilde{M}_F$; the frequency-domain envelope of the noise is then adjusted locally in the same way as in the local adjustment of the time-domain envelope;

Step F4. Global adjustment of the frequency-domain envelope:

Compute the frequency-domain global gain factor of each frame from the average frequency-domain envelopes of the noise and of the high-frequency signal:

$$\mathrm{gain\_mf} = 2^{\hat{M}_F - \tilde{M}_F}$$

where $\hat{M}_F$ is the average frequency-domain envelope parameter of the high-frequency signal and $\tilde{M}_F$ is the average frequency-domain envelope parameter of the processed noise;

Adjust the frequency-domain envelope of each frame globally with the frequency-domain global gain factor:

$$S_g(i) = \tilde{S}_F(i)\,\mathrm{gain\_mf}, \quad i = 0, 1, \ldots, 255$$

Apply an IFFT to the adjusted spectrum, window the resulting time-domain signal with the window function, and store it in a buffer of length 208:

$$\mathrm{buf}(n) = \mathrm{window}(n) \cdot \mathrm{IFFT}\{S_g(k)\} = \mathrm{window}(n) \cdot \frac{1}{L}\sum_{k=0}^{L-1} S_g(k)\, e^{j\frac{2\pi}{L}nk}$$

where L = 256 and n = 0, 1, …, 207;

Add the last 48 points of the previous frame's buffer to the first 48 points of the current frame's buffer; together with the values at n = 48 to 159 of the current frame's buffer, these form the time-domain signal recovered for the current frame;

Step F5. Global adjustment of the time-domain envelope:

Adjust the time-domain envelope globally following the steps of the global adjustment of the frequency-domain envelope; the adjusted signal is the high-frequency signal estimated from the noise;

Step G. Using the QMF synthesis filter bank module, upsample the low-frequency signal sampled at 8 kHz and the estimated high-frequency signal to 16 kHz, then pass them through the low-pass and high-pass FIR filters, respectively, giving the signals s_16L(n) and s_16H(n); the filter coefficients are the same as those of the QMF analysis filters;

Add the two signals to obtain the final wideband signal at a 16 kHz sampling frequency:

$$\tilde{S}_{wb}(n) = s_{16L}(n) + s_{16H}(n).$$

2. A device for voice bandwidth expansion based on audio watermarking, characterized in that the device comprises: a QMF analysis filter bank module, a high-frequency parameter extraction module, a G.711 codec module, a watermark embedding module, a watermark extraction module, a high-frequency speech recovery module, and a QMF synthesis filter bank module;

The QMF analysis filter bank module divides the wideband speech into two parts, the narrowband speech of 0-8000 Hz and the high-frequency component of 8000-16000 Hz, and reduces the sampling frequency of the two output signals to 8 kHz, giving the low-frequency signal s_L(n) and the high-frequency signal s_H(n);

The high-frequency parameter extraction module extracts 30 high-frequency parameters: 16 time-domain envelope parameters, 12 frequency-domain envelope parameters, the average time-domain envelope parameter, and the average frequency-domain envelope parameter; the parameters are extracted as follows:

Extract the 16 time-domain envelope parameters and the average time-domain envelope parameter:

$$T(i) = \frac{1}{2}\log_2\left[\sum_{n=0}^{9} s_H^2(n+10i)\right], \quad i = 0, 1, \ldots, 15$$

Compute the average time-domain envelope:

$$M_T = \frac{1}{16}\sum_{i=0}^{15} T(i)$$

$$T_M(i) = T(i) - M_T, \quad i = 0, 1, \ldots, 15$$

Extract the 12 frequency-domain envelope parameters and the average frequency-domain envelope parameter:

A window function window(n) with a length of 208 samples is used:

$$s_H^w(n) = s_H(n)\,\mathrm{window}(n), \quad n = 0, 1, \ldots, N-1$$

where N = 208;

The windowed signal is zero-padded to 256 points, and a 256-point FFT is applied to obtain S_F(k):

$$S_F(k) = \mathrm{FFT}[s_H^w(n)] = \sum_{n=0}^{L-1} s_H^w(n)\, e^{-j\frac{2\pi}{L}kn}, \quad k = 0, 1, \ldots, L-1$$

Compute the average frequency-domain envelope:

$$M_F = \frac{1}{12}\sum_{i=0}^{11} F(i)$$

$$F_M(i) = F(i) - M_F, \quad i = 0, 1, \ldots, 11$$

The G.711 codec module encodes the narrowband speech signal s_L(n) with an A-law encoder, producing a code stream with 8 bits per sample, embeds the watermark information into the code stream, and sends it over the telephone line into the network; the receiver extracts the watermark information from the code stream and decodes the stream with an A-law decoder to obtain the narrowband speech signal;

The watermark embedding module embeds the watermark into the code stream using one of the following two methods:

Method 1: embed the watermark uniformly into the code stream: since a frame contains 160 samples and the watermark occupies 66 bits, 1 bit of information is embedded in every other sample;

Method 2: embed the watermark information selectively in samples of small amplitude: let C0 to C7 denote the lowest to the highest bit of a codeword; according to G.711, the highest bit C7 is the sign bit of the sample, C6 to C4 are the segment code, and C3 to C0 are the intra-segment code; the smaller the segment code, the smaller the sample value represented by the codeword; the C6 bit is used to classify samples as large-signal (C6 = 1) or small-signal (C6 = 0), and the watermark is embedded where C6 is 0; if a frame offers fewer than 66 such positions, the remaining watermark bits are embedded at other positions;

The watermark extraction module extracts the watermark in correspondence with the watermark embedding module, using one of the following two methods:

Method 1: extract the watermark from the positions at which it was embedded;

Method 2: determine from the code stream where the watermark is embedded: scanning from the start of a frame, extract a watermark bit from the lowest bit when C6 is 0, and extract nothing when C6 is 1; if fewer than 66 bits have been extracted on reaching the end of the frame, return to the start of the frame and extract at the positions where C6 is 1 until 66 watermark bits have been extracted;

The high-frequency speech recovery module recovers the high-frequency speech from white noise:

Recover high-frequency speech from white noise:

Local adjustment of the time-domain envelope:

$$\hat{T}(i) = T_M(i) + M_T, \quad i = 0, 1, \ldots, 15$$

Compute the time-domain local gain factor from the time-domain envelope parameters of the noise and of the high-frequency signal:

$$\mathrm{gain\_t}(i) = 2^{\hat{T}(i) - \tilde{T}(i)}$$

where $\hat{T}(i)$ is the time-domain envelope parameter of the high-frequency signal;

Adjust the time-domain envelope of the noise with the time-domain local gain factor:

$$\mathrm{seed}_t(n+10i) = \mathrm{seed}(n+10i)\,\mathrm{gain\_t}(n+10i), \quad n = 0, 1, \ldots, 9, \quad i = 0, 1, \ldots, 15$$

Local adjustment of the frequency-domain envelope:

The log-weighted subband energy parameters of the noise and its average frequency-domain envelope $\tilde{M}_F$ are obtained, and the frequency-domain envelope of the noise is adjusted locally in the same way as in the local adjustment of the time-domain envelope;

Global adjustment of the frequency-domain envelope:

$$\mathrm{gain\_mf} = 2^{\hat{M}_F - \tilde{M}_F}$$

where $\tilde{M}_F$ is the average frequency-domain envelope parameter of the processed noise;

$$S_g(i) = \tilde{S}_F(i)\,\mathrm{gain\_mf}, \quad i = 0, 1, \ldots, 255$$

where $\tilde{S}_F(i)$ is the frequency-domain envelope of each frame of speech before adjustment;

$$\mathrm{buf}(n) = \mathrm{window}(n) \cdot \mathrm{IFFT}\{S_g(k)\} = \mathrm{window}(n) \cdot \frac{1}{L}\sum_{k=0}^{L-1} S_g(k)\, e^{j\frac{2\pi}{L}nk}$$

where L = 256 and n = 0, 1, …, 207;

Global adjustment of the time-domain envelope:

The adjusted signal is the high-frequency signal estimated from the noise;

The QMF synthesis filter bank module upsamples the low-frequency signal sampled at 8 kHz and the estimated high-frequency signal to 16 kHz and then passes them through the low-pass and high-pass FIR filters, respectively; the processed signals are s_16L(n) and s_16H(n), and the filter coefficients are the same as those of the QMF analysis filters;

$$\tilde{S}_{wb}(n) = s_{16L}(n) + s_{16H}(n).$$