CN108831497B - Echo compression method and device, storage medium and electronic equipment

Info

Publication number: CN108831497B
Application number: CN201810495505.8A
Authority: CN
Inventors: 周舒然; 李志飞
Original assignee: Mobvoi Information Technology Co Ltd
Current assignee: Volkswagen China Investment Co Ltd; Mobvoi Innovation Technology Co Ltd
Priority date: 2018-05-22
Filing date: 2018-05-22
Publication date: 2020-06-09
Anticipated expiration: 2038-05-22
Also published as: CN108831497A

Abstract

The invention provides an echo compression method and device, a storage medium and electronic equipment, wherein the echo compression method comprises the following steps: determining near-end voice information according to the near-end signal, the reference signal and the residual echo signal; the near-end signal, the reference signal and the residual echo signal are all signals in a set frequency band; determining gain information according to the near-end voice information; and performing echo compression processing on the residual echo signal by using the gain information. Therefore, the scheme provided by the invention can improve the signal-to-noise ratio.

Description

Echo compression method and device, storage medium and electronic equipment

Technical Field

The embodiment of the invention relates to the technical field of voice processing, in particular to an echo compression method and device, a storage medium and electronic equipment.

Background

The intelligent voice technology is applied more and more widely at present, and each intelligent voice device can interact with a user by utilizing the intelligent voice technology. The voice signal received by the smart voice device may include a near-end signal and a reference signal. After a reference signal received by the voice terminal is sounded through a loudspeaker, the reference signal can form an echo.

Currently, echo compression is usually applied to reduce the echo. However, reducing the residual echo during echo compression often distorts the near-end speech, so that the resulting sound is not smooth and is harsh.

In the course of making the invention, the inventors found that the echo compression processing in the prior art results in a low signal-to-noise ratio.

Disclosure of Invention

In view of this, embodiments of the present invention provide an echo compression method and apparatus, a storage medium, and an electronic device, and mainly aim to improve a signal-to-noise ratio.

In a first aspect, an embodiment of the present invention provides an echo compression method, where the echo compression method includes:

determining near-end voice information according to the near-end signal, the reference signal and the residual echo signal; the near-end signal, the reference signal and the residual echo signal are all signals in a set frequency band;

determining gain information according to the near-end voice information;

and performing echo compression processing on the residual echo signal by using the gain information.

Optionally,

the gain information comprises at least one frequency point gain value;

the performing echo compression processing on the residual echo signal by using the gain information includes:

suppressing the at least one frequency point gain value by using a set overload amount to obtain at least one overload gain value;

performing smoothing processing on the at least one overload gain value to obtain an adjustment gain value corresponding to the frequency band;

and determining echo compression output corresponding to the residual echo signal by using the adjustment gain value.

Optionally,

the suppressing the at least one frequency point gain value by using the set overload amount to obtain at least one overload gain value includes:

suppressing the at least one frequency point gain value by using formula (1) to obtain an overload gain value corresponding to each frequency point;

$$\mathrm{gain2}_i = \left(\mathrm{gain1}_i\right)^{t} \tag{1}$$

wherein $\mathrm{gain1}_i$ characterizes the frequency point gain value corresponding to the i-th frequency point; $\mathrm{gain2}_i$ characterizes the overload gain value corresponding to the i-th frequency point; and $t$ characterizes the overload amount.

Optionally,

the smoothing of the at least one overload gain value to obtain an adjustment gain value corresponding to the frequency band includes:

determining a smooth gain value corresponding to each frequency point according to formula (2);

$$\mathrm{gain3}_i = K1 \times \mathrm{gain2}_{i-1} + K2 \times \mathrm{gain2}_i + K3 \times \mathrm{gain2}_{i+1} \tag{2}$$

wherein $\mathrm{gain3}_i$ characterizes the smooth gain value corresponding to the i-th frequency point; $\mathrm{gain2}_{i-1}$ characterizes the overload gain value corresponding to the (i-1)-th frequency point; $\mathrm{gain2}_{i+1}$ characterizes the overload gain value corresponding to the (i+1)-th frequency point; K1 characterizes a first constant; K2 characterizes a second constant; and K3 characterizes a third constant;

determining the adjustment gain value through formula (3) according to the smooth gain value corresponding to each frequency point;

$$M = [\mathrm{gain3}_1, \ldots, \mathrm{gain3}_i] \tag{3}$$

wherein $M$ characterizes the adjustment gain value.

Optionally,

the determining the echo compression output corresponding to the residual echo signal by using the adjustment gain value includes:

determining the echo compression output by formula (4) by using the adjustment gain value;

$$\mathrm{out} = M \times E \tag{4}$$

wherein $\mathrm{out}$ characterizes the echo compression output; $M$ characterizes the adjustment gain value; and $E$ characterizes the residual echo signal.

Optionally,

the determining near-end speech information according to the near-end signal, the reference signal and the residual echo signal includes:

determining a first average correlation coefficient between the near-end signal and the reference signal;

determining a second average correlation coefficient between the near-end signal and the residual echo signal;

and determining near-end voice information according to the first average correlation coefficient and the second average correlation coefficient.

Optionally,

the determining a first average correlation coefficient between the near-end signal and the reference signal comprises:

determining at least one frequency point included in the frequency band;

determining at least one first correlation coefficient corresponding to the at least one frequency point according to the near-end signal and the reference signal;

and determining the first average correlation coefficient according to the at least one first correlation coefficient.

Optionally,

said determining a second average correlation coefficient between said near-end signal and said residual echo signal comprises:

determining at least one frequency point included in the frequency band;

determining at least one second correlation coefficient corresponding to the at least one frequency point according to the near-end signal and the residual echo signal;

and determining the second average correlation coefficient according to the at least one second correlation coefficient.

Optionally,

the determining near-end speech information according to the first average correlation coefficient and the second average correlation coefficient includes:

determining a near-end speech value through equation set (1) according to the first average correlation coefficient and the second average correlation coefficient;

wherein T characterizes the near-end speech value; the remaining symbols of equation set (1) characterize the first average correlation coefficient and the second average correlation coefficient; K1 characterizes a first constant; K2 characterizes a second constant; K3 characterizes a third constant; and K4 characterizes a fourth constant;

when the near-end voice value is 1, the near-end voice information is that near-end voice exists;

and when the near-end voice value is 0, the near-end voice information is that near-end voice does not exist.

In a second aspect, the present invention provides an echo compression device, comprising:

the near-end voice determining module is used for determining near-end voice information according to the near-end signal, the reference signal and the residual echo signal; the near-end signal, the reference signal and the residual echo signal are all signals in a set frequency band;

a gain information determining module, configured to determine gain information according to the near-end speech information determined by the near-end speech determining module;

and the processing module is used for performing echo compression processing on the residual echo signal by using the gain information determined by the gain information determining module.

In a third aspect, the present invention provides a storage medium, where the storage medium includes a stored program, and where the apparatus in which the storage medium is located is controlled to execute any one of the above echo compression methods when the program runs.

In a fourth aspect, the present invention provides an electronic device, including a processor, a memory, and a bus; the processor and the memory complete mutual communication through the bus; the processor is configured to call program instructions in the memory to perform the echo compression method of any of the above.

The embodiments of the present invention provide an echo compression method and device, a storage medium and electronic equipment, which determine near-end voice information (that is, whether near-end voice exists or does not exist) according to a near-end signal, a reference signal and a residual echo signal in a set frequency band. Gain information is then determined according to the near-end voice information, and echo compression processing is performed on the residual echo signal by using the gain information, so as to reduce the residual echo signal. Because the gain information used in the echo compression processing is determined from the near-end voice information, the residual echo signal can be reduced to the greatest extent. Therefore, the scheme provided by the embodiments of the present invention can improve the signal-to-noise ratio.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a flow chart illustrating an echo compression method according to an embodiment of the present invention;

fig. 2 is a flow chart of an echo compression method according to another embodiment of the present invention;

fig. 3 is a schematic structural diagram of an echo compressing device according to an embodiment of the present invention;

fig. 4 is a schematic structural diagram of an echo compressing device according to another embodiment of the present invention;

fig. 5 is a schematic structural diagram of an echo compressing device according to yet another embodiment of the present invention;

fig. 6 is a schematic structural diagram illustrating an echo compressing device according to yet another embodiment of the present invention;

fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

As shown in fig. 1, an embodiment of the present invention provides an echo compression method, which may include the following steps:

step 101: determining near-end voice information according to the near-end signal, the reference signal and the residual echo signal; the near-end signal, the reference signal and the residual echo signal are all signals in a set frequency band;

step 102: determining gain information according to the near-end voice information;

step 103: and performing echo compression processing on the residual echo signal by using the gain information.

According to the embodiment shown in fig. 1, near-end speech information (that is, whether near-end speech exists or does not exist) is determined according to a near-end signal, a reference signal and a residual echo signal within a set frequency band. Gain information is then determined according to the near-end speech information, and echo compression processing is performed on the residual echo signal by using the gain information, so as to reduce the residual echo signal. Because the gain information used in the echo compression processing is determined from the near-end speech information, the residual echo signal can be reduced to the greatest extent. Therefore, the scheme provided by the embodiment of the present invention can improve the signal-to-noise ratio.

In an embodiment of the present invention, the near-end signal, the reference signal, and the residual echo signal involved in the flowchart shown in fig. 1 are all located in a set frequency band, which may be determined according to service requirements. For example, the frequency band may be 500Hz to 3000 Hz.

In an embodiment of the present invention, the step 101 in the flowchart shown in fig. 1 determines the near-end speech information according to the near-end signal, the reference signal and the residual echo signal, which may include:

a1: determining a first average correlation coefficient between the near-end signal and the reference signal;

a2: determining a second average correlation coefficient between the near-end signal and the residual echo signal;

a3: and determining near-end voice information according to the first average correlation coefficient and the second average correlation coefficient.

In an embodiment of the present invention, the step a1 involved in the foregoing embodiment of determining the first average correlation coefficient between the near-end signal and the reference signal may include:

determining at least one frequency point included in the frequency band;

In this embodiment, at least one frequency point corresponding to the frequency band may be determined according to a service requirement, where each frequency point corresponds to a part of the reference signal and a part of the near-end signal, respectively.
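As an illustration only, the following Python sketch shows one way the frequency points of a set band could be enumerated. It assumes the signals are analysed with an FFT, which the embodiment does not state; the sample rate, the FFT size and the function name `band_bins` are hypothetical. The 500 Hz to 3000 Hz defaults follow the example band mentioned for fig. 1.

```python
def band_bins(sample_rate_hz, fft_size, low_hz=500.0, high_hz=3000.0):
    """Return the FFT bin indices whose centre frequency lies in [low_hz, high_hz].

    Assumption: bin k of an fft_size-point FFT is centred at
    k * sample_rate_hz / fft_size.
    """
    bin_width = sample_rate_hz / fft_size
    # Only bins up to the Nyquist frequency are considered.
    return [k for k in range(fft_size // 2 + 1)
            if low_hz <= k * bin_width <= high_hz]


# Hypothetical analysis settings: 16 kHz audio, 128-point FFT (125 Hz per bin).
bins = band_bins(16000, 128)
print(bins[0], bins[-1], len(bins))  # prints: 4 24 21
```

With these assumed settings the band covers bins 4 to 24, that is, 21 frequency points; other settings would give a different set.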

In this embodiment, the near-end signal and the reference signal are used to determine a first near-end smooth value, a reference smooth value and a first integrated smooth value, and each of the smooth values can be determined according to equation set (2).

wherein $R_{ab,i}$ characterizes the first comprehensive smooth value corresponding to the i-th frequency point; $R_{aa,i}$ characterizes the first near-end smooth value corresponding to the i-th frequency point; $R_{bb,i}$ characterizes the reference smooth value corresponding to the i-th frequency point; $\beta$ characterizes the first smoothing coefficient; $R_{ab,i-1}$ characterizes the first comprehensive smooth value corresponding to the near-end signal and the reference signal at the (i-1)-th frequency point; $R_{aa,i-1}$ characterizes the first near-end smooth value corresponding to the near-end signal at the (i-1)-th frequency point; $R_{bb,i-1}$ characterizes the reference smooth value corresponding to the reference signal at the (i-1)-th frequency point; $a_i$ characterizes the near-end signal corresponding to the i-th frequency point; and $b_i$ characterizes the reference signal corresponding to the i-th frequency point.

Then, the first correlation coefficient corresponding to each frequency point can be calculated according to the following formula (5).

wherein $M_{ab,i}$ characterizes the first correlation coefficient corresponding to the i-th frequency point.

In this embodiment, after the first correlation coefficients corresponding to the frequency points are determined, the first correlation coefficients are summed, and the sum is divided by the total number of frequency points to obtain the first average correlation coefficient.

According to the embodiment, the first average correlation coefficient is determined by using the first correlation coefficient corresponding to each frequency point and the total amount of the frequency points. Therefore, the determined first average correlation coefficient can reflect the current states of the near-end signal and the reference signal.
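The averaging described above can be sketched in Python as follows. The function name `average_correlation` and the sample coefficient values are hypothetical and serve only to illustrate the sum-and-divide step; the same helper would apply to the second correlation coefficients.

```python
def average_correlation(coefficients):
    """Sum the per-frequency-point coefficients and divide by their count."""
    if not coefficients:
        raise ValueError("at least one frequency point is required")
    return sum(coefficients) / len(coefficients)


# Made-up first correlation coefficients for four frequency points.
first_coeffs = [0.5, 0.75, 1.0, 0.25]
first_avg = average_correlation(first_coeffs)
print(first_avg)  # prints: 0.625
```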

In an embodiment of the present invention, the step a2 involved in the foregoing embodiment of determining the second average correlation coefficient between the near-end signal and the residual echo signal may include:

determining at least one frequency point included in the frequency band;

In this embodiment, at least one frequency point corresponding to the frequency band may be determined according to a service requirement, where each frequency point corresponds to a part of the reference signal and a part of the residual echo signal, respectively. It should be noted that each frequency point may be the same as the frequency point determined when the first average correlation coefficient is determined.

In this embodiment, the near-end signal and the residual echo signal are used to determine a second near-end smoothing value, a residual echo smoothing value, and a second combined smoothing value, and the above smoothing values can be determined by equation set (3).

wherein $R_{ac,i}$ characterizes the second comprehensive smooth value corresponding to the i-th frequency point; $R_{aa,i}$ characterizes the second near-end smooth value corresponding to the i-th frequency point; $R_{cc,i}$ characterizes the residual echo smooth value corresponding to the i-th frequency point; $\gamma$ characterizes the second smoothing coefficient; $R_{ac,i-1}$ characterizes the second comprehensive smooth value corresponding to the near-end signal and the residual echo signal at the (i-1)-th frequency point; $R_{aa,i-1}$ characterizes the second near-end smooth value corresponding to the near-end signal at the (i-1)-th frequency point; $R_{cc,i-1}$ characterizes the residual echo smooth value corresponding to the residual echo signal at the (i-1)-th frequency point; $a_i$ characterizes the near-end signal corresponding to the i-th frequency point; and $c_i$ characterizes the residual echo signal corresponding to the i-th frequency point.

Then, the second correlation coefficient corresponding to each frequency point can be calculated according to the following formula (6).

wherein $M_{ac,i}$ characterizes the second correlation coefficient corresponding to the i-th frequency point; $R_{ac,i}$ characterizes the second comprehensive smooth value corresponding to the i-th frequency point; $R_{aa,i}$ characterizes the second near-end smooth value corresponding to the i-th frequency point; and $R_{cc,i}$ characterizes the residual echo smooth value corresponding to the i-th frequency point.

In this embodiment, after the second correlation coefficients corresponding to the frequency points are determined, the second correlation coefficients are summed, and the sum is divided by the total number of frequency points to obtain the second average correlation coefficient.

According to the embodiment, the second average correlation coefficient is determined by using the second correlation coefficient corresponding to each frequency point and the total number of frequency points. Therefore, the determined second average correlation coefficient can reflect the current states of the near-end signal and the residual echo signal.

In an embodiment of the present invention, the step a3 related in the foregoing embodiment of determining the near-end speech information according to the first average correlation coefficient and the second average correlation coefficient may include:

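determining a near-end speech value through equation set (1) according to the first average correlation coefficient and the second average correlation coefficient; wherein T characterizes the near-end speech value, and the remaining symbols of equation set (1) characterize the first average correlation coefficient and the second average correlation coefficient.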

In the embodiment, K1-K4 can be determined according to the service requirement. For example, the first constant K1 is preferably 0.95; the second constant K2 is 0.15; the third constant is 0.9; the fourth constant is 0.2.

In the present embodiment, for example: when the quantity compared against K1 is determined to be greater than 0.95 and the quantity compared against K2 is less than 0.15, the near-end speech value is determined to be 1, and it can then be judged that near-end speech exists. When the quantity compared against K3 is less than 0.9 and the quantity compared against K4 is greater than 0.2, the near-end speech value is determined to be 0, and it can then be judged that near-end speech does not exist.

According to the embodiment, whether the near-end speech exists can be determined according to the relationship between the first average correlation coefficient and the set constant and the second average correlation coefficient, so that the determined near-end speech information is more accurate.
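The threshold comparison in the example above can be sketched in Python as follows. This is a sketch under stated assumptions, not the patent's equation set (1): the arguments `x` and `y` stand for the two average correlation coefficients, but this text does not preserve which of them is compared against K1/K3 and which against K2/K4, so the caller must decide that mapping. Keeping the previous value when neither condition holds is also an assumption, since the text does not describe that case. The function name is hypothetical.

```python
def near_end_speech_value(x, y, previous=0,
                          k1=0.95, k2=0.15, k3=0.9, k4=0.2):
    """Return 1 if near-end speech is judged to exist, 0 if it is not.

    x is compared against K1 and K3, y against K2 and K4, using the
    example constants from the text.
    Assumption: if neither condition holds, the previous value is kept.
    """
    if x > k1 and y < k2:
        return 1
    if x < k3 and y > k4:
        return 0
    return previous


print(near_end_speech_value(0.97, 0.10))              # prints: 1
print(near_end_speech_value(0.50, 0.40, previous=1))  # prints: 0
print(near_end_speech_value(0.92, 0.18, previous=1))  # prints: 1
```

The third call falls between the two pairs of thresholds, so the assumed fallback returns the previous value.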

In an embodiment of the present invention, the gain information related in the flowchart shown in fig. 1 includes at least one frequency point gain value;

when the near-end speech information referred to in the above flowchart shown in fig. 1 is that no near-end speech exists,

then, the step 102 in the flowchart shown in fig. 1 determines gain information according to the near-end speech information, which may include:

determining at least one frequency point included in the frequency band;

and determining a frequency point gain value corresponding to each frequency point according to the at least one first correlation coefficient and the at least one second correlation coefficient.

In this embodiment, the determining, according to the at least one first correlation coefficient and the at least one second correlation coefficient, a frequency point gain value corresponding to each frequency point may include:

determining a frequency point gain value corresponding to each frequency point according to a formula (7);

wherein $\mathrm{gain1}_i$ characterizes the frequency point gain value corresponding to the i-th frequency point; $C_{ac,i}$ characterizes the second correlation coefficient corresponding to the i-th frequency point; and $C_{ab,i}$ characterizes the first correlation coefficient corresponding to the i-th frequency point.

when the near-end speech information referred to in the above flowchart shown in fig. 1 is the presence of near-end speech,

determining at least one frequency point included in the frequency band;

executing for each frequency point: and determining the frequency point gain value of the frequency point as a second correlation coefficient corresponding to the frequency point.

According to the embodiment, the frequency point gain value corresponding to each frequency point can be determined in a targeted manner, regardless of whether near-end speech exists or does not exist.

the step 103 in the flowchart shown in fig. 1 performs echo compression processing on the residual echo signal by using the gain information, which may include:

B1: suppressing the at least one frequency point gain value by using a set overload amount to obtain at least one overload gain value;

B2: performing smoothing processing on the at least one overload gain value to obtain an adjustment gain value corresponding to the frequency band;

B3: determining the echo compression output corresponding to the residual echo signal by using the adjustment gain value.

In an embodiment of the present invention, the step B1 related in the foregoing embodiment performs suppression processing on the at least one frequency point gain value by using the set overload amount to obtain at least one overload gain value, where the method includes:

$$\mathrm{gain2}_i = \left(\mathrm{gain1}_i\right)^{t} \tag{1}$$

In this embodiment, the overload amount may be determined according to service requirements. For example, the overload amount may be any value between -1 and 1.

In this embodiment, the overload amount is used to suppress the frequency point gain value of each frequency point, and the suppression effect is particularly strong in the high frequency band.

According to the embodiment, the overload amount is used to suppress the frequency point gain value of each frequency point, so that the overload gain value corresponding to each frequency point is obtained. Therefore, the overload gain value corresponding to each frequency point can be used to suppress that frequency point.
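Formula (1) can be sketched in Python as follows. The function name `apply_overload` and the sample gain values are hypothetical. The check on the input is an added assumption, because raising a negative number to a fractional power, or zero to a negative power, is not defined for real-valued gains.

```python
def apply_overload(bin_gains, t):
    """Formula (1): gain2_i = gain1_i ** t for every frequency point."""
    overload_gains = []
    for g in bin_gains:
        # Assumption: gains are non-negative, and strictly positive when t < 0.
        if g < 0 or (g == 0 and t < 0):
            raise ValueError("gain value outside the assumed range")
        overload_gains.append(g ** t)
    return overload_gains


gain1 = [1.0, 0.25, 0.04]          # made-up frequency point gain values
gain2 = apply_overload(gain1, 0.5)  # overload amount t = 0.5
print(gain2)
```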

In an embodiment of the present invention, the step B2 of the above embodiment of smoothing the at least one overload gain value to obtain an adjustment gain value corresponding to the frequency band may include:

determining a smooth gain value corresponding to each frequency point according to formula (2);

$$\mathrm{gain3}_i = K1 \times \mathrm{gain2}_{i-1} + K2 \times \mathrm{gain2}_i + K3 \times \mathrm{gain2}_{i+1} \tag{2}$$

$$M = [\mathrm{gain3}_1, \ldots, \mathrm{gain3}_i] \tag{3}$$

wherein $M$ characterizes the adjustment gain value.

In this embodiment, the smooth gain value corresponding to each frequency point is determined according to the overload gain value of the frequency point, the overload gain value of the previous frequency point adjacent to the frequency point, and the overload gain value corresponding to the next frequency point adjacent to the frequency point. Therefore, the smooth gain values corresponding to the frequency points can reflect the change condition of the signal as a whole.

In this embodiment, the adjustment gain value is determined according to the smooth gain value corresponding to each frequency point, so that the adjustment gain value can reduce the echo to the maximum.
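Formulas (2) and (3) can be sketched in Python as follows. The constants 0.25, 0.5 and 0.25 are hypothetical, since the text gives no values for K1 to K3 in formula (2). The handling of the first and last frequency points, where one neighbour is missing, is also an assumption: the end value itself stands in for the missing neighbour.

```python
def smooth_gains(gain2, k1, k2, k3):
    """Apply formula (2) to each frequency point and return M of formula (3).

    Assumption: at either end of the band, the missing neighbour is
    replaced by the end value itself; the text does not specify this.
    """
    n = len(gain2)
    adjust = []
    for i in range(n):
        prev_g = gain2[i - 1] if i > 0 else gain2[i]
        next_g = gain2[i + 1] if i < n - 1 else gain2[i]
        adjust.append(k1 * prev_g + k2 * gain2[i] + k3 * next_g)
    return adjust


# Made-up overload gain values and hypothetical constants K1, K2, K3.
M = smooth_gains([1.0, 0.5, 0.0, 0.5], 0.25, 0.5, 0.25)
print(M)  # prints: [0.875, 0.5, 0.25, 0.375]
```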

In an embodiment of the present invention, the step B3 in the foregoing embodiment, determining an echo compression output corresponding to the residual echo signal by using the adjusted gain value, may include:

$$\mathrm{out} = M \times E \tag{4}$$

In this embodiment, the adjustment gain value is obtained after being subjected to the overload processing and the smoothing processing. Therefore, when the residual echo signal is compressed by using the adjustment gain value, the noise can be reduced to the maximum extent, so that the voice is more gentle and not harsh.
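Formula (4) can be sketched in Python as follows. The sketch assumes that M × E is a per-frequency-point product of each adjustment gain with the corresponding component of the residual echo signal, and that the residual echo signal is held as complex spectral values. Neither assumption is stated in the text, and the function name and sample values are hypothetical.

```python
def compress_residual_echo(adjust_gains, residual_spectrum):
    """Formula (4): out = M x E, taken here as a per-frequency-point product."""
    if len(adjust_gains) != len(residual_spectrum):
        raise ValueError("M and E must cover the same frequency points")
    return [m * e for m, e in zip(adjust_gains, residual_spectrum)]


E = [2 + 2j, -4j, 1 + 0j]                          # made-up residual echo values
out = compress_residual_echo([0.5, 0.25, 0.0], E)  # made-up adjustment gains
print(out)
```

A gain of 0 removes the component at that frequency point entirely, while a gain of 1 would leave it unchanged.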

The echo compression method will be described below by taking the example of obtaining the near-end signal, the reference signal and the residual echo signal in the frequency band of "500 Hz to 3000 Hz". As shown in fig. 2, the echo compression method includes:

Step 201: acquiring a near-end signal, a reference signal and a residual echo signal in the frequency band of 500 Hz to 3000 Hz.

Step 202: determining at least one frequency point included in the frequency band.

Step 203: determining at least one first correlation coefficient corresponding to the at least one frequency point according to the near-end signal and the reference signal.

In this step, the first correlation coefficient corresponding to each frequency point can be determined according to formula (5).

Step 204: determining a first average correlation coefficient according to the at least one first correlation coefficient.

In this step, the first correlation coefficients are summed, and the sum is divided by the total number of frequency points to obtain the first average correlation coefficient.

Step 205: determining at least one second correlation coefficient corresponding to the at least one frequency point according to the near-end signal and the residual echo signal.

In this step, the second correlation coefficient corresponding to each frequency point can be determined according to formula (6).

Step 206: determining a second average correlation coefficient according to the at least one second correlation coefficient.

In this step, the second correlation coefficients are summed, and the sum is divided by the total number of frequency points to obtain the second average correlation coefficient.

Step 207: determining near-end voice information according to the first average correlation coefficient and the second average correlation coefficient; when the near-end voice information is that near-end voice exists, executing step 208; when the near-end voice information is that no near-end voice exists, step 209 is executed.

In this step, the determination of the near-end speech information is made according to equation set (1). Wherein the first constant K1 in equation set (1) is 0.95; the second constant K2 is 0.15; the third constant is 0.9; the fourth constant is 0.2.

In this step, when the quantity compared against K3 is determined to be less than 0.9 and the quantity compared against K4 is greater than 0.2, the near-end speech value is determined to be 0, it can be judged that near-end speech does not exist, and step 209 is executed.

Step 208: determining the frequency point gain value corresponding to each frequency point according to each first correlation coefficient and each second correlation coefficient, and executing step 210.

Step 209: determining the frequency point gain value of each frequency point as the corresponding second correlation coefficient, and executing step 210.

In this step, the frequency point gain value corresponding to each frequency point is determined according to each second correlation coefficient determined in step 205.

Step 210: suppressing the frequency point gain value of each frequency point to obtain the overload gain value corresponding to each frequency point.

In this step, the suppression processing is performed on each frequency point gain value according to formula (1). For example, the selected overload amount t is 0.5.

Step 211: smoothing each overload gain value to determine the smooth gain value corresponding to each frequency point.

In this step, formula (2) is used to smooth each overload gain value, and the smooth gain value corresponding to each frequency point is obtained.

Step 212: determining an adjustment gain value according to the determined smooth gain values.

In this step, the adjustment gain value is determined from the smooth gain values according to formula (3).

Step 213: determining the echo compression output by using the adjustment gain value.

In this step, the residual echo signal is subjected to echo compression processing by using the formula (4), so as to obtain echo compression output. Because the adjustment gain value is utilized to compress the residual echo signal, the noise can be reduced to the maximum extent, and the voice is more smooth and not harsh.

As shown in fig. 3, an embodiment of the present invention provides an echo compression device, which may include:

a near-end speech determining module 301, configured to determine near-end speech information according to the near-end signal, the reference signal, and the residual echo signal; the near-end signal, the reference signal and the residual echo signal are all signals in a set frequency band;

a gain information determining module 302, configured to determine gain information according to the near-end speech information determined by the near-end speech determining module 301;

a processing module 303, configured to perform echo compression processing on the residual echo signal by using the gain information determined by the gain information determining module 302.

According to the embodiment shown in fig. 3, the processing module may perform echo compression processing on the residual echo signal according to the gain information determined from the near-end speech information, so as to reduce the residual echo signal to the maximum extent. Therefore, the scheme provided by the embodiment of the invention can improve the signal-to-noise ratio.

In an embodiment of the present invention, when the gain information includes at least one frequency point gain value and when the near-end speech information is that near-end speech does not exist, as shown in fig. 4,

the gain information determination module 302 may include: a coefficient determining submodule 3021 and a frequency point gain value determining submodule 3022;

the coefficient determining submodule 3021 is configured to determine at least one frequency point included in the frequency band; determining at least one first correlation coefficient corresponding to the at least one frequency point according to the near-end signal and the reference signal; determining at least one second correlation coefficient corresponding to the at least one frequency point according to the near-end signal and the residual echo signal;

the frequency point gain value determining sub-module 3022 is configured to determine, according to the at least one first correlation coefficient and the at least one second correlation coefficient, a frequency point gain value corresponding to each frequency point.

In an embodiment of the present invention, the frequency point gain value determining sub-module 3022 is configured to determine, according to formula (7), a frequency point gain value corresponding to each frequency point;

In an embodiment of the present invention, when the gain information includes at least one frequency point gain value, and when the near-end speech information is that near-end speech exists, the gain information determining module 302 is configured to: determine at least one frequency point included in the frequency band; determine at least one second correlation coefficient corresponding to the at least one frequency point according to the near-end signal and the residual echo signal; and, for each frequency point, determine the frequency point gain value of the frequency point as the second correlation coefficient corresponding to the frequency point.
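The two device embodiments above differ only in how the frequency point gain value is chosen, which can be sketched as a single branch. This is a hedged illustration: the names `select_bin_gains` and `illustrative_combine` are hypothetical, and because formula (7) is not reproduced in this text, the way the first and second correlation coefficients are combined is passed in as a caller-supplied function. The `illustrative_combine` shown is a stand-in chosen only so the example runs; it is not formula (7).

```python
def select_bin_gains(first_coeffs, second_coeffs, near_end_present, combine):
    """Pick one gain value per frequency point.

    first_coeffs:     first correlation coefficients (near-end vs. reference)
    second_coeffs:    second correlation coefficients (near-end vs. residual echo)
    near_end_present: True when the near-end speech information says speech exists
    combine:          caller-supplied stand-in for formula (7), taking (c1, c2)
    """
    if near_end_present:
        # Near-end speech exists: the gain is the second correlation coefficient.
        return list(second_coeffs)
    # Near-end speech does not exist: both coefficients contribute.
    return [combine(c1, c2) for c1, c2 in zip(first_coeffs, second_coeffs)]


def illustrative_combine(c1, c2):
    # Hypothetical placeholder, NOT formula (7) from the patent.
    return min(c2, 1.0 - c1)


gains = select_bin_gains([0.75, 0.25], [0.5, 0.5], False, illustrative_combine)
```

Passing the combination rule as a parameter keeps the branch structure, which the text does state, separate from the formula, which it does not.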

In one embodiment of the present invention, as shown in fig. 5, when the gain information includes at least one bin gain value,

the processing module 303 may include: a suppression submodule 3031, a smoothing submodule 3032 and a determination submodule 3033;

the suppression submodule 3031 is configured to perform suppression processing on the at least one frequency point gain value by using a set overload amount to obtain at least one overload gain value;

the smoothing submodule 3032 is configured to smooth the at least one overload gain value to obtain an adjustment gain value corresponding to the frequency band;

the determining submodule 3033 is configured to determine, by using the adjustment gain value, an echo compression output corresponding to the residual echo signal.

In an embodiment of the present invention, the suppression submodule 3031 is configured to perform suppression processing on the at least one frequency point gain value according to formula (1) to obtain an overload gain value corresponding to each frequency point;

$$\mathrm{gain2}_i = \left(\mathrm{gain1}_i\right)^{t} \tag{1}$$

In an embodiment of the present invention, the smoothing sub-module 3032 is configured to determine a smoothing gain value corresponding to each frequency point according to formula (2); determining the adjustment gain value through a formula (3) according to the smooth gain value corresponding to each frequency point;

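$$\mathrm{gain3}_i = K_1 \times \mathrm{gain2}_{i-1} + K_2 \times \mathrm{gain2}_i + K_3 \times \mathrm{gain2}_{i+1} \tag{2}$$

$$M = [\mathrm{gain3}_1, \ldots, \mathrm{gain3}_i] \tag{3}$$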

wherein the M characterizes the adjustment gain value.

In an embodiment of the present invention, the determining submodule 3033 is configured to determine the echo compression output according to formula (4) by using the adjustment gain value;

$$\mathrm{out} = M \times E \tag{4}$$

In an embodiment of the present invention, as shown in fig. 6, the near-end speech determination module 301 may include: a first coefficient determination sub-module 3011, a second coefficient determination sub-module 3012, and a near-end speech determination sub-module 3013;

the first coefficient determination sub-module 3011, configured to determine a first average correlation coefficient between the near-end signal and the reference signal;

the second coefficient determining sub-module 3012, configured to determine a second average correlation coefficient between the near-end signal and the residual echo signal;

the near-end speech determining sub-module 3013 is configured to determine near-end speech information according to the first average correlation coefficient and the second average correlation coefficient.

In an embodiment of the present invention, the first coefficient determining sub-module 3011 is configured to determine at least one frequency point included in the frequency band; determining at least one first correlation coefficient corresponding to the at least one frequency point according to the near-end signal and the reference signal; and determining the first average correlation coefficient according to the at least one first correlation coefficient.

In an embodiment of the present invention, the second coefficient determining sub-module 3012 is configured to determine at least one frequency point included in the frequency band; determining at least one second correlation coefficient corresponding to the at least one frequency point according to the near-end signal and the residual echo signal; and determining the second average correlation coefficient according to the at least one second correlation coefficient.
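The averaging performed by the two coefficient determining sub-modules can be illustrated with a short sketch. It assumes the per-frequency-point correlation coefficients have already been computed (their formula is not reproduced in this text), and the function name `average_correlation` is hypothetical. The average is taken as the sum of the coefficients divided by the total number of frequency points.

```python
def average_correlation(coeffs):
    """Average of the per-frequency-point correlation coefficients.

    coeffs: one correlation coefficient per frequency point in the band.
    The same helper serves for the first and the second average coefficient.
    """
    if not coeffs:
        raise ValueError("the frequency band must contain at least one frequency point")
    return sum(coeffs) / len(coeffs)


# Example values are made up for illustration only.
first_average = average_correlation([0.5, 1.0, 0.0])
second_average = average_correlation([0.25, 0.75])
```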

In an embodiment of the present invention, the near-end speech determining sub-module 3013 is configured to determine a near-end speech value according to the first average correlation coefficient and the second average correlation coefficient through an equation set (1); when the near-end voice value is 1, the near-end voice information is that near-end voice exists; when the near-end voice value is 0, the near-end voice information is that near-end voice does not exist;

wherein T characterizes the near-end speech value; a further symbol of the equation set characterizes the first average correlation coefficient;

In one embodiment of the present invention, the first constant K1 is 0.95; the second constant K2 is 0.15; the third constant is 0.9; the fourth constant is 0.2.

An embodiment of the present invention provides a storage medium, where the storage medium includes a stored program, where when the program runs, a device in which the storage medium is located is controlled to execute the echo compression method described in any one of the above.

In an embodiment of the present invention, an electronic device is provided, as shown in fig. 7, which includes a processor 401, a memory 402, and a bus 403; the processor 401 and the memory 402 complete communication with each other through the bus 403; the processor 401 is configured to call program instructions in the memory 402 to execute the echo compression method described in any of the above.

Because the information interaction, execution process, and other contents between the units in the device are based on the same concept as the method embodiment of the present invention, specific contents may refer to the description in the method embodiment of the present invention, and are not described herein again.

The embodiments of the invention have at least the following beneficial effects:

1. In the embodiment of the present invention, near-end speech information (which may be either that near-end speech exists or that near-end speech does not exist) is determined according to a near-end signal, a reference signal and a residual echo signal within a set frequency band. Gain information is then determined according to the near-end speech information, and echo compression processing is performed on the residual echo signal by using the gain information so as to reduce the residual echo signal. Because the gain information is derived from the near-end speech information, the residual echo signal can be reduced to the maximum extent. Therefore, the scheme provided by the embodiment of the invention can improve the signal-to-noise ratio.

2. In the embodiment of the invention, the first average correlation coefficient is determined by using the first correlation coefficient corresponding to each frequency point and the total amount of the frequency points. Therefore, the determined first average correlation coefficient can reflect the current states of the near-end signal and the reference signal.

3. In the embodiment of the invention, the second average correlation coefficient is determined by using the second correlation coefficient corresponding to each frequency point and the total number of frequency points. Therefore, the determined second average correlation coefficient can reflect the current states of the near-end signal and the residual echo signal.

4. In the embodiment of the present invention, since whether the near-end speech exists can be determined according to the relationship between the first average correlation coefficient and the set constant, the determined near-end speech information is more accurate.

5. In the embodiment of the invention, the frequency point gain value corresponding to each frequency point can be determined in a targeted manner regardless of whether the near-end speech information indicates that near-end speech exists or does not exist.

6. In the embodiment of the invention, the overload amount is used to perform suppression processing on each frequency point gain value, so that the overload gain value corresponding to each frequency point is obtained. Therefore, the overload gain values corresponding to the frequency points can be used to suppress the frequency points.

7. In the embodiment of the invention, the adjustment gain value is determined according to the smooth gain value corresponding to each frequency point, so that the adjustment gain value can reduce the echo to the maximum extent.

8. In the embodiment of the invention, the adjustment gain value is obtained after overload processing and smoothing processing. Therefore, when the residual echo signal is compressed by using the adjustment gain value, the noise can be reduced to the maximum extent, so that the voice is more gentle and not harsh.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.

Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

Finally, it is to be noted that: the above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method of echo compression, comprising:

determining gain information according to the near-end voice information;

performing echo compression processing on the residual echo signal by using the gain information;

determining near-end voice information according to the first average correlation coefficient and the second average correlation coefficient;

determining a first near-end smoothing value, a reference smoothing value and a first comprehensive smoothing value by using the near-end signal and the reference signal through the following equation set;

wherein $R_{ab,i}$ represents the first comprehensive smooth value corresponding to the i-th frequency point; $R_{aa,i}$ represents the first near-end smooth value corresponding to the i-th frequency point; $R_{bb,i}$ represents the reference smooth value corresponding to the i-th frequency point; $\beta$ represents the first smoothing coefficient; $R_{ab,(i-1)}$ represents the first comprehensive smooth value corresponding to the near-end signal and the reference signal corresponding to the (i-1)-th frequency point; $R'_{aa,(i-1)}$ represents the first near-end smooth value corresponding to the near-end signal corresponding to the (i-1)-th frequency point; $R'_{bb,(i-1)}$ represents the reference smooth value corresponding to the reference signal corresponding to the (i-1)-th frequency point; $a_i$ represents the near-end signal corresponding to the i-th frequency point; $b_i$ represents the reference signal corresponding to the i-th frequency point;

determining a first correlation coefficient of each frequency point according to the first near-end smooth value, the reference smooth value and the first comprehensive smooth value of each frequency point;

and summing the first correlation coefficients of the frequency points, and determining the quotient of the summation result and the total number of the frequency points as a first average correlation coefficient.

2. The echo compression method of claim 1,

the gain information comprises at least one frequency point gain value;

3. The echo compression method of claim 2,

the gain value of at least one frequency point is suppressed through a first formula, and an overload gain value corresponding to each frequency point is obtained;

the first formula includes:

$$\mathrm{gain2}_i = \left(\mathrm{gain1}_i\right)^{t}$$

wherein $\mathrm{gain1}_i$ represents the frequency point gain value corresponding to the i-th frequency point; $\mathrm{gain2}_i$ represents the overload gain value corresponding to the i-th frequency point; and $t$ represents the overload amount;

and/or,

determining a smooth gain value corresponding to each frequency point according to a second formula;

the second formula includes:

$$\mathrm{gain3}_i = K_1 \times \mathrm{gain2}_{i-1} + K_2 \times \mathrm{gain2}_i + K_3 \times \mathrm{gain2}_{i+1}$$

wherein $\mathrm{gain3}_i$ represents the smooth gain value corresponding to the i-th frequency point; $\mathrm{gain2}_{i-1}$ represents the overload gain value corresponding to the (i-1)-th frequency point; $\mathrm{gain2}_{i+1}$ represents the overload gain value corresponding to the (i+1)-th frequency point; $\mathrm{gain2}_i$ represents the overload gain value corresponding to the i-th frequency point; $K_1$ characterizes a first constant; $K_2$ characterizes a second constant; $K_3$ characterizes a third constant;

determining the adjustment gain value through a third formula according to the smooth gain value corresponding to each frequency point;

the third formula includes:

$$M = [\mathrm{gain3}_1, \ldots, \mathrm{gain3}_i]$$

wherein the M characterizes the adjustment gain value.

4. The echo compression method of claim 3,

determining the echo compression output by a fourth formula by using the adjustment gain value;

the fourth formula includes:

$$\mathrm{out} = M \times E$$

5. The echo compression method of claim 1,

determining at least one frequency point included in the frequency band;

determining the first average correlation coefficient according to the at least one first correlation coefficient;

and/or,

determining at least one frequency point included in the frequency band;

6. The echo compression method of claim 5,

determining a near-end speech value through a first equation set according to the first average correlation coefficient and the second average correlation coefficient;

the first set of equations includes:

wherein T characterizes the near-end speech value; a further symbol of the first equation set characterizes the first average correlation coefficient;

7. An echo compression device, comprising:

the processing module is used for performing echo compression processing on the residual echo signal by using the gain information determined by the gain information determining module;

the near-end speech determination module comprises:

a first coefficient determination submodule for determining a first average correlation coefficient between the near-end signal and the reference signal;

a second coefficient determination submodule for determining a second average correlation coefficient between the near-end signal and the residual echo signal;

the near-end voice determining submodule is used for determining near-end voice information according to the first average correlation coefficient and the second average correlation coefficient;

a first coefficient determination submodule for determining a first near-end smoothing value, a reference smoothing value and a first integrated smoothing value by using the near-end signal and the reference signal through the following equation set; determining a first correlation coefficient of each frequency point according to the first near-end smooth value, the reference smooth value and the first comprehensive smooth value of each frequency point; summing the first correlation coefficients of all the frequency points, and determining the quotient of the sum result and the total number of the frequency points as a first average correlation coefficient;

wherein $R_{ab,i}$ represents the first comprehensive smooth value corresponding to the i-th frequency point; $R_{aa,i}$ represents the first near-end smooth value corresponding to the i-th frequency point; $R_{bb,i}$ represents the reference smooth value corresponding to the i-th frequency point; $\beta$ represents the first smoothing coefficient; $R_{ab,(i-1)}$ represents the first comprehensive smooth value corresponding to the near-end signal and the reference signal corresponding to the (i-1)-th frequency point; $R'_{aa,(i-1)}$ represents the first near-end smooth value corresponding to the near-end signal corresponding to the (i-1)-th frequency point; $R'_{bb,(i-1)}$ represents the reference smooth value corresponding to the reference signal corresponding to the (i-1)-th frequency point; $a_i$ represents the near-end signal corresponding to the i-th frequency point; and $b_i$ represents the reference signal corresponding to the i-th frequency point.

8. A storage medium comprising a stored program, wherein the program, when executed, controls an apparatus in which the storage medium is located to perform the echo compression method according to any one of claims 1 to 6.

9. An electronic device, wherein the electronic device comprises a processor, a memory and a bus; the processor and the memory complete mutual communication through the bus; the processor is configured to call program instructions in the memory to perform the echo compression method of any one of claims 1 to 6.