Echo cancellation method based on adaptive decorrelation and variable step length proportional M estimation
Technical Field
The invention belongs to the technical field of adaptive echo cancellation for voice communication, and in particular relates to an echo cancellation method based on adaptive decorrelation and variable step length proportional M estimation.
Background
Acoustic echo is unavoidable in voice communication in which both a microphone and a loudspeaker are required. The voice of the far-end speaker is played through the near-end loudspeaker, is picked up directly or indirectly by the near-end microphone, and is transmitted back to the far end, so that the far-end speaker hears a delayed copy of their own voice, namely the acoustic echo. The path along which sound propagates from the loudspeaker to the microphone is called the echo path, and its impulse response vector is denoted w_o. The impulse response of an acoustic echo path tends to be sparse, i.e. most of the elements of w_o are zero or near zero and only a few have a large amplitude. Acoustic echo is the dominant factor affecting voice call quality.
Currently, adaptive echo cancellation is internationally recognized as the most effective technique for cancelling echo. In essence, adaptive echo cancellation is a problem of identifying the impulse response of the echo path: the adaptive filter adjusts its weights (which are also the estimate of the echo path impulse response) as the environment changes, produces an estimate of the echo (the output signal of the adaptive filter), and subtracts this estimate from the signal received by the near-end microphone to obtain a clean signal, which is transmitted to the far end, thereby cancelling the echo. Designing an adaptive filtering algorithm with excellent performance is therefore a critical issue. Because the echo path impulse response is sparse, the proportionate normalized least mean square (PNLMS) algorithm converges faster than the normalized least mean square (NLMS) algorithm. This is because the PNLMS algorithm exploits the prior knowledge of channel sparseness and allocates larger gains to the filter coefficients with large amplitude, which improves the overall convergence speed of the algorithm.
In actual conversation, however, the speech signal is often affected by impulse noise. In this case, the performance of the PNLMS algorithm may degrade, or the algorithm may even diverge. Starting from the normalized least mean M-estimate (NLMM) algorithm, Huang Zhangliang combined it with the PNLMS algorithm and adopted a modified Huber norm to obtain a proportionate normalized least mean M-estimate (PNLMM) algorithm, which is more resistant to impulse noise. However, the speech signal is highly correlated and non-stationary, and the PNLMM algorithm has no decorrelation capability, so its convergence speed in speech echo cancellation is not ideal. In addition, the PNLMM algorithm uses a fixed step length, so there is a trade-off between convergence speed and steady-state level: with a large step length the algorithm converges quickly but its steady state is poor, whereas with a small step length the algorithm converges slowly but its steady state is good.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an echo cancellation method based on self-adaptive decorrelation and variable step length proportional M estimation, which has strong decorrelation capability on voice signals, high convergence speed, good robustness and good echo cancellation effect.
The object of the invention is achieved by the following technical solution: an echo cancellation method based on adaptive decorrelation and variable step length proportional M estimation, comprising the following steps:
A. Signal acquisition
Sample the voice signal transmitted from the far end and played through the near-end loudspeaker to obtain the discrete far-end signal value x(n) at the current time n; at the same time, sample the signal picked up by the near-end microphone to obtain the discrete desired signal value d(n) at the current time n, where d(n) contains the echo signal and interference noise;
B. Calculating the decorrelation coefficient vector
B1. Form the adaptive decorrelation input vector u(n) at the current time n from the values of the far-end signal x(n) obtained in step A at times n-1 to n-K, u(n) = [x(n-1) x(n-2) … x(n-K)]^T, where the superscript T denotes vector transposition and K is the decorrelation order, K ≥ 1. As K increases, the convergence speed of the algorithm increases, but the steady state becomes worse;
B2. Calculate the adaptive decorrelation error signal e_xd(n) at the current time n using the tap weight vector of the adaptive decorrelator at time n-1, whose length is equal to K and whose initial value is the zero vector;
B3. Update the adaptive decorrelation coefficient vector at time n, where μ_a is the decorrelation step length, with value range 0.001 ≤ μ_a ≤ 0.05;
C. Calculating the decorrelated input vector x_D(n) and the decorrelated desired signal d_D(n)
C1. Calculate the decorrelated input signal x_D(n) at the current time n, and use its values from time n to time n-L+1 to form the decorrelated input vector at the current time n, x_D(n) = [x_D(n) x_D(n-1) … x_D(n-L+1)]^T, where L is the number of adaptive filter taps; in acoustic echo cancellation, L = 512 or 1024 is often used;
C2. Form the decorrelation desired vector d(n) at the current time n from the values of the desired signal d(n) obtained in step A at times n-1 to n-K, d(n) = [d(n-1) d(n-2) … d(n-K)]^T, and calculate the decorrelated desired signal d_D(n) at the current time n;
D. Calculating the adaptive filter output y(n) and the decorrelated adaptive filter output y_D(n)
D1. Form the input vector x(n) at the current time n from the values of the far-end signal x(n) obtained in step A at times n to n-L+1, x(n) = [x(n) x(n-1) … x(n-L+1)]^T, and calculate the output signal of the adaptive filter at the current time n, y(n) = w^T(n)x(n), where w(n) = [w_1(n) w_2(n) … w_L(n)]^T is the tap weight vector of the adaptive filter at time n, whose length is equal to L and whose initial value is the zero vector;
D2. Using the decorrelated input vector x_D(n) at the current time n from step C1, calculate the decorrelated adaptive filter output at the current time n, y_D(n) = w^T(n)x_D(n);
E. Calculating the error signal e(n) and the decorrelated error signal e_D(n)
E1. Subtract the adaptive filter output signal y(n) obtained in step D1 from the near-end desired signal d(n) at the current time n obtained in step A to obtain the error signal e(n) at time n, i.e. e(n) = d(n) - y(n), and transmit e(n) to the far end as the clean signal after echo cancellation, so that the far-end speaker does not hear their own earlier voice and the echo is cancelled;
E2. Subtract the decorrelated adaptive filter output y_D(n) obtained in step D2 from the decorrelated desired signal d_D(n) at the current time n obtained in step C2 to obtain the decorrelated error signal e_D(n) at time n, i.e. e_D(n) = d_D(n) - y_D(n);
F. Calculating the proportional matrix G(n)
The proportional matrix G(n) is a diagonal matrix, G(n) = diag[g_1(n) g_2(n) … g_L(n)], whose p-th diagonal element g_p(n) at the current time n is calculated for 1 ≤ p ≤ L, where κ is usually taken as 0, -0.5 or -0.75 and ||·||_1 denotes the 1-norm of a vector;
G. Calculating the robust decorrelated error signal used for iteration in the M-estimation
G1. The values of the square of the decorrelated error signal e_D(n) obtained in step E2, from time n to time n-N_w+1, form the decorrelated error estimation sample at the current time, A_e(n) = [e_D²(n) e_D²(n-1) … e_D²(n-N_w+1)], where N_w is the selected length of the decorrelated error estimation sample;
Calculate the decorrelated error variance σ²(n) free of impulse interference, σ²(n) = λσ²(n-1) + C(1-λ)med(A_e(n)), where the initial value of σ²(n) is zero, λ is a forgetting factor with 0 << λ < 1, C = 2.2[1+5/(N_w+1)]², and med(·) is the median operator;
The threshold parameter ξ of the decorrelated error signal is calculated as ξ = 2.576σ²(n);
G2. Calculate the robust decorrelated error signal from the decorrelated error signal e_D(n) and the threshold parameter ξ;
H. Calculating the step size parameter μ(n)
H1. From the robust decorrelated error signal of step G2, calculate the mean square robust error at time n, where χ is a forgetting factor with 0 << χ < 1;
H2. From the decorrelated input signal x_D(n) obtained in step C1, calculate the mean square value of the decorrelated input signal at time n;
Calculate the correlation parameter r(n) between the decorrelated input signal vector x_D(n) and the robust error signal at time n;
Calculate the excess mean square error at time n;
H3. Calculate the value of the step size parameter μ(n) at time n;
I. Updating the tap weight vector of the adaptive filter and proceeding to the next time instant
Calculate the tap weight vector w(n+1) of the adaptive filter at the next time n+1;
Let n = n+1 and repeat steps A, B, C, D, E, F, G, H and I until the call ends.
The invention has the following beneficial effects. First, the far-end sound and the desired signal are decorrelated by an adaptive decorrelation method, and the decorrelated sound and desired signal are then sent to the adaptive filter for adaptive processing. By replacing direct decorrelation with adaptive decorrelation, the invention greatly reduces the computational complexity of high-order decorrelation, shortens the time needed to decorrelate the acquired signals, and shortens the overall running time of the algorithm. Second, the invention derives the optimal step length formula by minimizing the a posteriori error with the steepest gradient descent method, so that the step length is large in the initial iterations and the algorithm converges rapidly, and the step length then gradually decreases as the iteration proceeds, giving a better steady state.
Drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 is the impulse response of the echo path in the simulation experiment.
Fig. 3 shows the far-end speech signal and the near-end desired signal in the simulation experiment.
Fig. 4 shows the misalignment curves of the PNLMS algorithm, the PNLMM algorithm and the present invention for a single run.
Fig. 5 shows the estimated desired signal, obtained by filtering the far-end speech signal with the estimated echo path impulse response, and the clean signal, obtained by subtracting the estimated desired signal from the near-end desired signal.
Detailed Description
The technical solution of the present invention will be described in further detail with reference to the accompanying drawings, but the scope of the present invention is not limited to the following description.
As shown in fig. 1, the echo cancellation method based on adaptive decorrelation and variable step size proportional M estimation comprises the following steps:
A. Signal acquisition
Sample the voice signal transmitted from the far end and played through the near-end loudspeaker to obtain the discrete far-end signal value x(n) at the current time n; at the same time, sample the signal picked up by the near-end microphone to obtain the discrete desired signal value d(n) at the current time n, where d(n) contains the echo signal and interference noise;
B. Calculating the decorrelation coefficient vector
B1. Form the adaptive decorrelation input vector u(n) at the current time n from the values of the far-end signal x(n) obtained in step A at times n-1 to n-K, u(n) = [x(n-1) x(n-2) … x(n-K)]^T, where the superscript T denotes vector transposition and K is the decorrelation order, K ≥ 1. As K increases, the convergence speed of the algorithm increases, but the steady state becomes worse;
B2. Calculate the adaptive decorrelation error signal e_xd(n) at the current time n using the tap weight vector of the adaptive decorrelator at time n-1, whose length is equal to K and whose initial value is the zero vector;
B3. Update the adaptive decorrelation coefficient vector at time n, where μ_a is the decorrelation step length, with value range 0.001 ≤ μ_a ≤ 0.05;
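By way of non-limiting illustration, step B may be sketched in Python as follows. The formulas of steps B2 and B3 are not reproduced in the text, so the sketch assumes a linear-prediction error e_xd(n) = x(n) - a^T(n-1)u(n) and a plain LMS update a(n) = a(n-1) + μ_a e_xd(n)u(n). The symbol a and the function name decorrelator_step are hypothetical.

```python
import random


def decorrelator_step(a_prev, u, x_n, mu_a=0.01):
    """One update of a K-tap adaptive decorrelator (assumed LMS form)."""
    # Assumed B2: error of predicting x(n) from its K previous samples u(n).
    e_xd = x_n - sum(a_k * u_k for a_k, u_k in zip(a_prev, u))
    # Assumed B3: plain LMS correction with decorrelation step length mu_a.
    a_new = [a_k + mu_a * e_xd * u_k for a_k, u_k in zip(a_prev, u)]
    return a_new, e_xd


# Usage with K = 1 on a synthetic first-order autoregressive input
# x(n) = 0.9 x(n-1) + white noise (test data, not from the simulation).
rng = random.Random(0)
a = [0.0]          # initial value is the zero vector
x_prev = 0.0
for _ in range(20000):
    x_n = 0.9 * x_prev + rng.gauss(0.0, 1.0)
    a, _ = decorrelator_step(a, [x_prev], x_n, mu_a=0.005)
    x_prev = x_n
```

With this synthetic input the single coefficient should settle near the autoregressive coefficient 0.9, which is what allows the correlated part of the signal to be subtracted in step C.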
C. Calculating the decorrelated input vector x_D(n) and the decorrelated desired signal d_D(n)
C1. Calculate the decorrelated input signal x_D(n) at the current time n, and use its values from time n to time n-L+1 to form the decorrelated input vector at the current time n, x_D(n) = [x_D(n) x_D(n-1) … x_D(n-L+1)]^T, where L is the number of adaptive filter taps; in acoustic echo cancellation, L = 512 or 1024 is often used;
C2. Form the decorrelation desired vector d(n) at the current time n from the values of the desired signal d(n) obtained in step A at times n-1 to n-K, d(n) = [d(n-1) d(n-2) … d(n-K)]^T, and calculate the decorrelated desired signal d_D(n) at the current time n;
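As an illustrative sketch only, step C may be written as below. The formulas for x_D(n) and d_D(n) are not reproduced in the text; the sketch assumes that each is the current sample minus its prediction from the K previous samples, using the same decorrelation coefficients for both signals. For brevity the coefficients are held fixed, whereas in the method they are updated at every time instant. All names and numbers are hypothetical.

```python
def decorrelate(sample, past, a):
    """Subtract from `sample` the part predicted from its K past values."""
    return sample - sum(a_k * p_k for a_k, p_k in zip(a, past))


a = [0.5]                    # hypothetical decorrelation coefficients, K = 1
x = [1.0, 2.0, 3.0]          # far-end samples x(0), x(1), x(2)
d = [0.5, 1.5, 2.0]          # desired samples d(0), d(1), d(2)
n = 2

x_D = decorrelate(x[n], [x[n - 1]], a)   # 3.0 - 0.5 * 2.0
d_D = decorrelate(d[n], [d[n - 1]], a)   # 2.0 - 0.5 * 1.5

# Decorrelated input vector of length L = 2: [x_D(n), x_D(n-1)].
L = 2
xD_hist = [decorrelate(x[k], [x[k - 1]], a) for k in range(1, len(x))]
xD_vec = xD_hist[::-1][:L]
```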
D. Calculating the adaptive filter output y(n) and the decorrelated adaptive filter output y_D(n)
D1. Form the input vector x(n) at the current time n from the values of the far-end signal x(n) obtained in step A at times n to n-L+1, x(n) = [x(n) x(n-1) … x(n-L+1)]^T, and calculate the output signal of the adaptive filter at the current time n, y(n) = w^T(n)x(n), where w(n) = [w_1(n) w_2(n) … w_L(n)]^T is the tap weight vector of the adaptive filter at time n, whose length is equal to L and whose initial value is the zero vector;
D2. Using the decorrelated input vector x_D(n) at the current time n from step C1, calculate the decorrelated adaptive filter output at the current time n, y_D(n) = w^T(n)x_D(n);
E. Calculating the error signal e(n) and the decorrelated error signal e_D(n)
E1. Subtract the adaptive filter output signal y(n) obtained in step D1 from the near-end desired signal d(n) at the current time n obtained in step A to obtain the error signal e(n) at time n, i.e. e(n) = d(n) - y(n), and transmit e(n) to the far end as the clean signal after echo cancellation, so that the far-end speaker does not hear their own earlier voice and the echo is cancelled;
E2. Subtract the decorrelated adaptive filter output y_D(n) obtained in step D2 from the decorrelated desired signal d_D(n) at the current time n obtained in step C2 to obtain the decorrelated error signal e_D(n) at time n, i.e. e_D(n) = d_D(n) - y_D(n);
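Steps D and E use only the formulas stated above, y(n) = w^T(n)x(n), y_D(n) = w^T(n)x_D(n), e(n) = d(n) - y(n) and e_D(n) = d_D(n) - y_D(n). A minimal sketch with L = 2 and hypothetical numbers follows; it shows that the same weight vector filters both the original and the decorrelated input.

```python
def dot(w, v):
    """Inner product w^T v of two equal-length vectors."""
    return sum(w_i * v_i for w_i, v_i in zip(w, v))


w = [0.5, -0.25]        # tap weight vector w(n), hypothetical values
x_vec = [2.0, 4.0]      # input vector [x(n), x(n-1)]
xD_vec = [1.0, 2.0]     # decorrelated input vector [x_D(n), x_D(n-1)]
d_n, dD_n = 1.0, 0.5    # desired sample d(n) and decorrelated d_D(n)

y = dot(w, x_vec)       # D1: adaptive filter output
y_D = dot(w, xD_vec)    # D2: decorrelated adaptive filter output
e = d_n - y             # E1: signal sent to the far end
e_D = dD_n - y_D        # E2: error that drives the adaptation
```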
F. Calculating the proportional matrix G(n)
The proportional matrix G(n) is a diagonal matrix, G(n) = diag[g_1(n) g_2(n) … g_L(n)], whose p-th diagonal element g_p(n) at the current time n is calculated for 1 ≤ p ≤ L, where κ is usually taken as 0, -0.5 or -0.75 and ||·||_1 denotes the 1-norm of a vector;
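The formula for g_p(n) is not reproduced in the text. The sketch below assumes the widely used improved-proportionate rule g_p(n) = (1-κ)/(2L) + (1+κ)|w_p(n)|/(2||w(n)||_1 + ε), which is consistent with the quoted values of κ and with the use of the 1-norm but is an assumption nonetheless. The small constant ε and the function name are hypothetical.

```python
def proportionate_gains(w, kappa=-0.75, eps=1e-8):
    """Diagonal of G(n) under the assumed improved-proportionate rule."""
    L = len(w)
    l1_norm = sum(abs(w_p) for w_p in w)
    return [
        # Uniform share plus a share proportional to the tap magnitude.
        (1 - kappa) / (2 * L) + (1 + kappa) * abs(w_p) / (2 * l1_norm + eps)
        for w_p in w
    ]


g_zero = proportionate_gains([0.0, 0.0, 0.0, 0.0])    # all taps equal
g_sparse = proportionate_gains([1.0, 0.0, 0.0, 0.0])  # one dominant tap
```

Under this assumed rule the gains sum to approximately one once the filter is non-zero, and the dominant tap receives the largest gain, which is the behaviour the proportional approach relies on for sparse echo paths.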
G. Calculating the robust decorrelated error signal used for iteration in the M-estimation
G1. The values of the square of the decorrelated error signal e_D(n) obtained in step E2, from time n to time n-N_w+1, form the decorrelated error estimation sample at the current time, A_e(n) = [e_D²(n) e_D²(n-1) … e_D²(n-N_w+1)], where N_w is the selected length of the decorrelated error estimation sample;
Calculate the decorrelated error variance σ²(n) free of impulse interference, σ²(n) = λσ²(n-1) + C(1-λ)med(A_e(n)), where the initial value of σ²(n) is zero, λ is a forgetting factor with 0 << λ < 1, C = 2.2[1+5/(N_w+1)]², and med(·) is the median operator;
The threshold parameter ξ of the decorrelated error signal is calculated as ξ = 2.576σ²(n);
G2. Calculate the robust decorrelated error signal from the decorrelated error signal e_D(n) and the threshold parameter ξ;
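Step G may be illustrated by the following sketch. The variance recursion and the constant C follow the text. Two points are assumptions: the threshold is taken as 2.576 times the square root of the variance estimate so that it has the same unit as the error, and the robust error, whose formula is not reproduced in the text, is taken to be e_D(n) when |e_D(n)| is below the threshold and zero otherwise (a modified Huber rule). The class name is hypothetical.

```python
from collections import deque
from statistics import median


class RobustError:
    """Median-based variance tracking with an assumed hard-limiting rule."""

    def __init__(self, n_w=20, lam=0.98):
        self.window = deque(maxlen=n_w)            # A_e(n): last N_w squared errors
        self.lam = lam                             # forgetting factor lambda
        self.c = 2.2 * (1 + 5 / (n_w + 1)) ** 2    # constant C from the text
        self.var = 0.0                             # sigma^2(n), initial value zero

    def __call__(self, e_d):
        self.window.append(e_d * e_d)
        # sigma^2(n) = lam * sigma^2(n-1) + C * (1 - lam) * med(A_e(n))
        self.var = self.lam * self.var + self.c * (1 - self.lam) * median(self.window)
        xi = 2.576 * self.var ** 0.5               # assumed threshold scaling
        # Assumed rule: keep ordinary errors, discard impulsive ones.
        return e_d if abs(e_d) < xi else 0.0


robust = RobustError()
for k in range(500):                               # warm-up on small errors
    robust(0.1 if k % 2 == 0 else -0.1)
passed = robust(0.1)                               # ordinary error is kept
clipped = robust(50.0)                             # impulsive error is discarded
```

Because the variance estimate starts at zero, the first few errors fall above the threshold in this sketch and are discarded until the estimate has built up.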
H. Calculating the step size parameter μ(n)
H1. From the robust decorrelated error signal of step G2, calculate the mean square robust error at time n, where χ is a forgetting factor with 0 << χ < 1;
H2. From the decorrelated input signal x_D(n) obtained in step C1, calculate the mean square value of the decorrelated input signal at time n;
Calculate the correlation parameter r(n) between the decorrelated input signal vector x_D(n) and the robust error signal at time n;
Calculate the excess mean square error at time n;
H3. Calculate the value of the step size parameter μ(n) at time n;
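None of the formulas of step H is reproduced in the text, so the following is only one plausible reading of the quantities it names, and every expression in it is an assumption. The mean square robust error, the mean square input and the correlation vector r(n) are tracked by exponential averaging with forgetting factor χ, the excess mean square error is taken as r^T(n)r(n) divided by the mean square input, and μ(n) is the ratio of the excess to the total mean square error, limited to at most 1. The class name and the constant eps are hypothetical.

```python
class VariableStep:
    """Correlation-based variable step size (assumed form, see lead-in)."""

    def __init__(self, L, chi=0.999, eps=1e-12):
        self.chi, self.eps = chi, eps
        self.var_e = 0.0          # mean square robust error
        self.var_x = 0.0          # mean square decorrelated input
        self.r = [0.0] * L        # correlation between x_D vector and error

    def update(self, xD_vec, e_rob):
        chi = self.chi
        self.var_e = chi * self.var_e + (1 - chi) * e_rob * e_rob
        self.var_x = chi * self.var_x + (1 - chi) * xD_vec[0] * xD_vec[0]
        self.r = [chi * r_i + (1 - chi) * x_i * e_rob
                  for r_i, x_i in zip(self.r, xD_vec)]
        # Assumed excess mean square error: part of the error explained by the input.
        excess = sum(r_i * r_i for r_i in self.r) / (self.var_x + self.eps)
        return min(1.0, excess / (self.var_e + self.eps))


# Error fully explained by the input (filter far from converged).
vs_far = VariableStep(L=1)
for k in range(5000):
    x = 1.0 if k % 2 == 0 else -1.0
    mu_far = vs_far.update([x], 0.5 * x)

# Error orthogonal to the input (filter converged, only noise left).
vs_near = VariableStep(L=1)
noise = [1.0, 1.0, -1.0, -1.0]
for k in range(5000):
    x = 1.0 if k % 2 == 0 else -1.0
    mu_near = vs_near.update([x], noise[k % 4])
```

In this reading the step size stays close to 1 while the error is dominated by filter misadjustment and falls towards zero once the error is uncorrelated with the input, which matches the described behaviour of fast initial convergence followed by a gradually decreasing step.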
I. Updating the tap weight vector of the adaptive filter and proceeding to the next time instant
Calculate the tap weight vector w(n+1) of the adaptive filter at the next time n+1;
Let n = n+1 and repeat steps A, B, C, D, E, F, G, H and I until the call ends.
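The update formula of step I is not reproduced in the text. As a hedged sketch, a normalized proportionate update driven by the decorrelated quantities may be assumed: w(n+1) = w(n) + μ(n)G(n)x_D(n)ē_D(n) / (x_D^T(n)G(n)x_D(n) + δ), where ē_D(n) denotes the robust decorrelated error of step G2. The regularization constant δ, the symbol ē_D and the function name are hypothetical.

```python
def update_weights(w, gains, xD_vec, e_rob, mu, delta=1e-8):
    """One assumed proportionate normalized update of the tap weights."""
    # G(n) x_D(n): each input sample scaled by its proportional gain.
    gx = [g_p * x_p for g_p, x_p in zip(gains, xD_vec)]
    # x_D^T(n) G(n) x_D(n) + delta: normalization term.
    norm = sum(x_p * gx_p for x_p, gx_p in zip(xD_vec, gx)) + delta
    return [w_p + mu * e_rob * gx_p / norm for w_p, gx_p in zip(w, gx)]


w_next = update_weights([0.0, 0.0], [0.5, 0.5], [1.0, 2.0], e_rob=1.0, mu=1.0)
w_same = update_weights([0.2, 0.4], [0.5, 0.5], [1.0, 2.0], e_rob=0.0, mu=1.0)
```

When the robust error is zero, for example because an impulsive sample was discarded, the weights are left unchanged, which is how an update of this form avoids being disturbed by impulse noise.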
In an embodiment of the present application, to verify the effectiveness of the present application, simulation experiments were performed and compared with algorithms PNLMS and PNLMM in voice calls under impulse noise conditions.
1. Simulation conditions
The impulse response w_o of the echo path was collected in a quiet, closed room with a height of 2.5 m, a width of 3.75 m, a length of 6.25 m, a temperature of 20 ℃ and a humidity of 50%; its length L is 512, as shown in Fig. 2. The far-end speech signal x(n) and the near-end desired signal are shown in Fig. 3. The desired signal d(n) received by the near-end microphone is obtained from d(n) = x^T(n)w_o + v(n), where the observation noise v(n) is α-stable noise with characteristic function φ(t) = exp(-γ|t|^α), in which α ∈ (0, 2) is the characteristic exponent that controls the impulsiveness of the noise and γ > 0 represents the dispersion of the noise. The misalignment (in decibels) is used to evaluate the performance of each method. For a fair comparison, the parameter values of the algorithms are listed in Table 1.
Table 1. Parameter values for each algorithm
| Algorithm | Parameter values |
|---|---|
| PNLMS | μ = 0.1, κ = -0.75 |
| PNLMM | μ = 0.1 and μ = 0.5, N_w = 16, λ = 0.98, κ = -0.75 |
| The invention | N_w = 20, λ = 0.98, χ = 0.999, μ_a = 0.01, κ = -0.75, K = 1 |
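The misalignment measure used in the simulation is not written out in the text. A common definition, assumed here, is the normalized squared distance between the true echo path w_o and the current estimate w(n), expressed in decibels: 10·log10(||w_o - w(n)||² / ||w_o||²). The function name and the numbers are hypothetical.

```python
import math


def misalignment_db(w_o, w):
    """Normalized misalignment in dB (assumed definition, see lead-in)."""
    num = sum((a - b) ** 2 for a, b in zip(w_o, w))
    den = sum(a * a for a in w_o)
    # Note: an exact match (num == 0) has no finite value in decibels.
    return 10 * math.log10(num / den)


w_o = [1.0, 0.0, 0.0, 0.0]                              # hypothetical sparse echo path
start = misalignment_db(w_o, [0.0, 0.0, 0.0, 0.0])      # zero-initialized filter
later = misalignment_db(w_o, [0.9, 0.0, 0.0, 0.0])      # 10 % error on the main tap
```

With a zero-initialized filter this measure starts at 0 dB and becomes more negative as the estimate approaches the true echo path.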
Fig. 2 shows the impulse response of the echo path, and Fig. 3 shows the collected signals: (a) the far-end speech signal and (b) the echo signal in the near-end desired signal. Fig. 4 shows the misalignment curves of the PNLMS algorithm, the PNLMM algorithm and the present invention for a single run. As can be seen from Fig. 4, the curve of the PNLMS algorithm has a large number of peaks under impulse noise, and in particular a maximum peak appears around time n = 80,000, whereas the curves of the PNLMM algorithm and of the present invention have no peaks. This shows that the present invention and the PNLMM algorithm are robust to impulse noise. In addition, the figure shows that the present invention has a faster convergence speed and a better steady-state level than the PNLMS and PNLMM algorithms. In Fig. 5, (a) is the estimated desired signal obtained by filtering the far-end speech signal with the estimated echo path impulse response, and (b) is the clean signal obtained by subtracting the estimated desired signal from the echo signal in the near-end desired signal.
While the foregoing description illustrates and describes a preferred embodiment of the present invention, it is to be understood that the invention is not limited to the form disclosed herein and should not be construed as excluding other embodiments; it is capable of use in various other combinations, modifications and environments, and is capable of changes or modifications within the scope of the inventive concept described herein, whether through the foregoing teachings or through the knowledge or skill of the relevant art. Modifications and variations that do not depart from the spirit and scope of the invention are intended to fall within the scope of the appended claims.