
HK1135241A - Neural network filtering techniques for compensating linear and non-linear distortion of an audio transducer - Google Patents


Info

Publication number
HK1135241A
HK1135241A (application HK09112166.4A)
Authority
HK
Hong Kong
Prior art keywords
linear
transfer function
signal
inverse
neural network
Prior art date
Application number
HK09112166.4A
Other languages
Chinese (zh)
Inventor
Dmitri V. Shmunk
Original Assignee
DTS (BVI) Limited
Priority date
Filing date
Publication date
Application filed by DTS (BVI) Limited
Publication of HK1135241A


Description

Neural network filtering techniques for compensating linear and non-linear distortions of an audio transducer
Technical Field
The present invention relates to audio transducer compensation, and more particularly to a method of compensating for linear and non-linear distortions of audio transducers such as speakers, microphones or power amplifiers and broadcast antennas.
Background
An audio speaker should ideally exhibit a balanced and predictable input/output (I/O) response: the analog audio signal applied at the speaker input would reach the listener's ear unaltered. In practice, the audio signal reaching the listener's ear is the original audio signal plus distortion caused by the speaker itself (e.g., its construction and the interaction of its internal components) and by the listening environment (e.g., the location of the listener, the acoustic properties of the room, etc.) through which the audio signal must propagate to reach the listener's ear. Various techniques are applied during speaker manufacturing to minimize distortion caused by the speaker itself and provide a desired speaker response. In addition, there are techniques for mechanically and manually adjusting the speaker to further reduce distortion.
U.S. Patent No. 6,766,025 to Levy describes a programmable speaker that digitally performs a transform function on an input audio signal, using characteristic data stored in a memory and digital signal processing (DSP), to compensate for speaker-related and listening-environment distortion. A non-intrusive system and method adjusts the speaker in a manufacturing environment by applying a reference signal and a control signal to the input of the programmable speaker. A microphone detects the audio signal at the speaker output corresponding to the input reference signal and feeds it back to a tester, which analyzes the frequency response of the speaker by comparing the input reference signal with the audio output signal from the speaker. Depending on the comparison result, the tester provides an updated digital control signal with new characteristic data to the loudspeaker; the new characteristic data is stored in the loudspeaker memory and used to perform the transform function again on the input reference signal. This adjustment feedback loop continues until the input reference signal and the audio output signal from the speaker exhibit the desired frequency response as determined by the tester. In a consumer environment, the microphone is placed in a selected listening environment and the adjusting device is again used to update the characteristic data to compensate for distortion effects detected by the microphone there. Levy relies on techniques well known in the signal processing art to provide an inverse transform compensating for loudspeaker and listening-environment distortions.
Distortion includes both linear and non-linear components. Non-linear distortion, such as "clipping," is a function of the amplitude of the input audio signal, whereas linear distortion is not. Known compensation techniques either solve the linear part of the problem and ignore the non-linear components, or vice versa. While linear distortion may be the dominant component, non-linear distortion produces additional spectral components that are not present in the input signal. Therefore, the compensation is not accurate and thus not suitable for some high-end audio applications.
There are many ways to solve the linear part of the problem. The simplest is an equalizer: a bank of bandpass filters with independent gain controls. More elaborate techniques correct both phase and amplitude. For example, Norcross et al., "Adaptive Strategies for Inverse Filtering," 119th Audio Engineering Society Convention, October 7-10, 2005, introduced a frequency-domain inverse filtering method that allows weighting and regularization terms to offset errors at certain frequencies. While this approach is useful for obtaining the desired frequency characteristics, it does not control the time-domain characteristics of the inverse response; for example, frequency-domain calculations cannot reduce pre-echoes in the final (corrected and played through the speaker) signal.
Techniques for compensating non-linear distortion are less developed. Klippel et al., "Loudspeaker Nonlinearities - Causes, Parameters, Symptoms," 119th AES Convention, October 7-10, 2005, relate non-linear distortion measurements to the nonlinearities that are the physical causes of signal distortion in speakers and other transducers. Bard et al., in "Compensation of Nonlinearities" at the same convention, used an inverse transform based on frequency-domain Volterra kernels to estimate the non-linearity of the speaker. The transform is obtained by analytically computing the inverse Volterra kernel from the forward frequency-domain kernel. This method works well for stationary signals (e.g., a set of sinusoids), but significant non-linearity can occur in the transient, non-stationary regions of an audio signal.
Disclosure of Invention
The following is a summary of the invention in order to provide a basic understanding of some aspects of the invention. This summary is not intended to identify key or critical elements of the invention or to delineate the scope of the invention. Its sole purpose is to present some concepts of the invention in a simplified form as a prelude to the more detailed description and the claims that are presented later.
The present invention provides efficient, robust and accurate filtering techniques to compensate for the linear and non-linear distortions of audio transducers, such as loudspeakers. These techniques include both methods of characterizing an audio transducer to calculate inverse transfer functions and methods of implementing those inverse transfer functions for reproduction. In a preferred embodiment, the inverse transfer functions are extracted using time-domain calculations, such as those provided by linear and non-linear neural networks, which represent the properties of the audio signal and the transducer more accurately than conventional frequency-domain or model-based methods. While the preferred method compensates for both linear and non-linear distortion, the neural network filtering techniques can also be applied independently. The same techniques can also be adapted to compensate for the combined distortion of the transducer and the listening, recording or broadcasting environment.
In an exemplary embodiment, a linear test signal is played through the audio transducer and synchronously recorded. The original and recorded test signals are processed to extract a forward linear transfer function and to reduce its noise, preferably using time-domain, frequency-domain and time/frequency-domain techniques. In particular, the parallel application of a wavelet transform to "slices" (snapshots) of the forward transfer function exploits the time-scale properties of the transform, which are well suited to the properties of the transducer impulse response. The inverse linear transfer function is calculated and mapped to the coefficients of a linear filter. In a preferred embodiment, a linear neural network is trained to invert the linear transfer function, whereby the network weights map directly to the filter coefficients. Both time and frequency domain constraints can be placed on the transfer function via the error function to address issues such as pre-echo and excessive amplification.
A non-linear test signal is applied to the audio transducer and synchronously recorded. The recorded signal is preferably passed through the linear filter to remove the device's linear distortion, and noise reduction techniques may also be applied to it. The non-linear test signal is then subtracted from the recorded signal to provide an estimate of the non-linear distortion, from which forward and inverse non-linear transfer functions are calculated. In a preferred embodiment, a non-linear neural network is trained on the test signal and the non-linear distortion to estimate the forward non-linear transfer function. The inverse transform is obtained by recursively passing the test signal through the non-linear neural network and subtracting the weighted response from the test signal. The weighting coefficients of the recursive formula are optimized by, for example, a minimum mean-square-error method. The time-domain representation used in this method is well suited for handling non-linearities in transient regions of an audio signal.
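The recursive inverse can be sketched as follows. This is a minimal Python/numpy illustration, not the patent's implementation: a simple cubic nonlinearity stands in for the trained non-linear neural network, and the recursion weights are fixed at 1 rather than MMSE-optimized.

```python
import numpy as np

def precompensate(x, distortion, weights):
    """Recursive inverse sketch: repeatedly subtract a weighted estimate of
    the non-linear distortion from the input signal, so that applying the
    forward model to the result approximately reproduces the input."""
    y = x.copy()
    for w in weights:
        y = x - w * distortion(y)  # subtract weighted response from the signal
    return y

# Stand-in (hypothetical) for the trained forward non-linear model: the
# distortion component added by a mild cubic non-linearity.
distort = lambda y: y ** 3

x = np.linspace(-0.5, 0.5, 11)           # input audio samples
y = precompensate(x, distort, [1.0] * 8)
restored = y + distort(y)                # forward model: signal + distortion
```

Distorting the pre-compensated signal should approximately restore the original; here the recursion converges because the cubic term is small relative to the signal.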
At reproduction, the audio signal is applied to a linear filter (the transfer function of which is an estimate of the inverse linear transfer function of the audio reproduction device) to provide a linear pre-compensated audio signal. The linear pre-compensated audio signal is then applied to a non-linear filter whose transfer function is an estimate of the inverse non-linear transfer function. The non-linear filter is suitably implemented by a trained non-linear neural network and an optimized recursive formula that recursively passes the audio signal. To improve efficiency, non-linear neural networks and recursive formulas may be used as models to train single-pass broadcast neural networks. For output transducers such as speakers or broadcast amplifying antennas, linear and non-linear pre-compensated signals are delivered to the transducer. For input transducers such as microphones, linear and non-linear compensation is applied to the output of the transducer.
Drawings
These and other features and advantages of the present invention will become apparent to those skilled in the art from the following detailed description of the preferred embodiments, when read in light of the accompanying drawings. Wherein:
FIGS. 1a and 1b are a block diagram and a flow diagram for calculating inverse linear and non-linear transfer functions for precompensating an audio signal for playback on an audio reproduction device;
FIG. 2 is a flow chart of extracting a forward linear transfer function and reducing its noise and computing an inverse linear transfer function using a linear neural network;
FIGS. 3a and 3b are diagrams illustrating frequency-domain filtering and segment reconstruction, and FIG. 3c is a frequency plot of the resulting forward linear transfer function;
FIGS. 4a-4d are diagrams illustrating the parallel application of wavelet transforms to segments of a forward linear transfer function;
FIGS. 5a and 5b are plots of the noise-reduced forward linear transfer function;
FIG. 6 is a diagram of a single-layer, single-neuron neural network for inverting the forward linear transfer function;
FIG. 7 is a flow diagram of extracting a forward non-linear transfer function using a non-linear neural network and calculating an inverse non-linear transfer function using a recursive subtraction formula;
FIG. 8 is a diagram of a non-linear neural network;
FIGS. 9a and 9b are block diagrams of audio systems configured to compensate for linear and non-linear distortion of speakers;
FIGS. 10a and 10b are flow diagrams for compensating linear and non-linear distortions of an audio signal during playback;
FIG. 11 is a plot of raw and compensated frequency responses of a loudspeaker; and
FIGS. 12a and 12b are plots of the impulse response of the loudspeaker before and after compensation, respectively.
Detailed Description
The present invention provides efficient, robust and accurate filtering techniques to compensate for the linear and non-linear distortions of audio transducers such as speakers, broadcast antennas or microphones. These techniques include both methods of characterizing an audio transducer to calculate inverse transfer functions and methods of implementing those inverse transfer functions during playback, broadcast or recording. In a preferred embodiment, the inverse transfer functions are extracted using time-domain calculations, such as those provided by linear and non-linear neural networks, which represent the properties of the audio signal and the audio transducer more accurately than traditional frequency-domain or model-based methods. The neural network filtering techniques can be applied independently, although the preferred method compensates for both linear and non-linear distortion. The same techniques are also suitable for compensating the combined distortion of the loudspeaker and the listening, broadcasting or recording environment.
As used herein, the term "audio transducer" refers to any device that is actuated by power from one system and supplies power in another form to a second system, where one form of power is electrical and the other is acoustic or electrical, and where the device carries an audio signal. The transducer may be an output transducer, such as a speaker or broadcast antenna, or an input transducer, such as a microphone. An exemplary embodiment of the invention will now be described for a loudspeaker, which converts an electrical input audio signal into an acoustic signal at audio frequencies.
A test setup for characterizing the distortion properties of a loudspeaker, and a method of calculating the inverse transfer functions, are shown in FIGS. 1a and 1b. The test setup suitably includes a computer 10, a sound card 12, a speaker under test 14 and a microphone 16. The computer generates an audio test signal 18 and passes it to the sound card 12, which in turn drives the speaker. The microphone 16 picks up the acoustic signal and converts it back into an electrical signal, and the sound card passes the recorded audio signal 20 back to the computer for analysis. A full-duplex sound card is suitably used so that playback and recording of the test signal reference a shared clock signal; the signals are then time-aligned and fully synchronized to within a single sampling period.
The inventive technique will characterize, and compensate for, any distortion source in the signal path from playback to recording. A high-quality microphone is therefore used so that any distortion contributed by the microphone is negligible. Note that if the transducer under test is a microphone, a high-quality speaker is used instead to exclude unwanted distortion sources. To characterize the speaker alone, the "listening environment" should be configured to minimize reflections and other sources of distortion. Alternatively, the same technique can be used to characterize the speakers in, for example, a consumer's home theater; in that case the consumer's receiver or speaker system must be configured to perform the test, analyze the data, and configure the speakers for playback.
The same test setup is used to characterize both the linear and non-linear distortion properties of the loudspeaker; the computer generates different audio test signals 18 and performs different analyses on the recorded audio signal 20. The spectral content of the linear test signal should cover the full analysis frequency range and the full amplitude range of the loudspeaker. An exemplary test signal contains two series of linear, full-frequency chirps: (a) frequency increasing linearly from 0 Hz to 24 kHz over 700 ms, then decreasing linearly back to 0 Hz over 700 ms, repeating; and (b) frequency increasing linearly from 0 Hz to 24 kHz over 300 ms, then decreasing linearly back to 0 Hz over 300 ms, repeating. Both chirp series are present throughout the full duration of the signal. The chirps are amplitude-modulated so as to produce a sharp attack and a slow decay in the time domain; the length of each amplitude-modulation period is arbitrary, ranging from about 0 ms to 150 ms. The non-linear test signal should preferably contain tones and noise of different amplitudes, as well as silent periods; there should be enough variability in the signal for successful training of the neural network. The exemplary non-linear test signal is constructed similarly but with different time parameters: (a) frequency increasing linearly from 0 Hz to 24 kHz over 4 seconds with no falling sweep, the next chirp period starting again at 0 Hz; and (b) frequency increasing linearly from 0 Hz to 24 kHz over 250 ms, then decreasing linearly back to 0 Hz over 250 ms. The chirps in the signal are modulated by arbitrary amplitude changes, which can be as fast as zero to full scale in 8 ms. Both the linear and non-linear test signals preferably contain a distinctive mark (e.g., a single full-scale peak) that can be used for synchronization, although this is not mandatory.
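As a concrete illustration, the 700 ms up/down chirp of series (a), with a sharp-attack/slow-decay amplitude envelope, might be generated as in this Python/numpy sketch. The 150 ms modulation period and the decay constant are illustrative assumptions, not values from the specification.

```python
import numpy as np

def linear_chirp(f0, f1, duration_s, fs):
    """Linear frequency sweep from f0 to f1 Hz over duration_s seconds."""
    t = np.arange(int(duration_s * fs)) / fs
    # Instantaneous phase of a linear sweep: 2*pi*(f0*t + (f1 - f0)*t^2/(2*T))
    phase = 2 * np.pi * (f0 * t + (f1 - f0) * t ** 2 / (2 * duration_s))
    return np.sin(phase)

fs = 48000                              # 48 kHz sampling -> 24 kHz Nyquist
up = linear_chirp(0, 24000, 0.7, fs)    # 700 ms rising sweep
down = linear_chirp(24000, 0, 0.7, fs)  # 700 ms falling sweep
cycle = np.concatenate([up, down])      # one period of series (a)

# Sharp-attack / slow-decay amplitude modulation (period and decay constant
# are illustrative choices)
period = int(0.15 * fs)
env = np.exp(-np.arange(period) / (0.3 * period))
envelope = np.tile(env, len(cycle) // period + 1)[:len(cycle)]
test_signal = cycle * envelope
```

Each modulation period starts at full scale (the sharp attack) and decays exponentially until the next period begins.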
As shown in FIG. 1b, to extract the inverse transfer functions the computer performs synchronized playback and recording of the linear test signal (step 30). The computer processes both the test and recorded signals to extract the linear transfer function (step 32). The linear transfer function, also referred to as the "impulse response," characterizes the loudspeaker's response to an applied delta function, or impulse. The computer calculates the inverse linear transfer function and maps it to the coefficients of a linear filter, such as an FIR filter (step 34). The inverse linear transfer function may be obtained in any way but, as described in more detail below, the properties of the audio signal and the loudspeaker are most accurately represented using time-domain calculations such as those provided by a linear neural network.
The computer also performs synchronized playback and recording of the non-linear test signal (step 36). This step may be performed after the linear transfer function has been extracted, or the recording may be made together with the linear test signal and processed offline. In a preferred embodiment, FIR filtering is applied to the recorded signal to remove the linear distortion components (step 38). Extensive testing has shown that removing the linear distortion greatly improves the characterization of the non-linear distortion, and thus its inverse transfer function, although this is not always necessary. The computer subtracts the test signal from the filtered signal to provide an estimate of the non-linear distortion component alone (step 40). The computer then processes the non-linear distortion signal to extract the forward non-linear transfer function (step 42) and calculates the inverse non-linear transfer function (step 44). Both transfer functions are preferably computed using time-domain calculations.
Simulations and tests of the present invention have shown that extracting inverse transfer functions for both the linear and non-linear distortion components improves the characterization of the loudspeaker and its distortion compensation. In addition, the performance of the non-linear part of the solution can be greatly improved by removing the typically dominant linear distortion prior to characterization. Finally, using time-domain calculations to compute the inverse transfer functions also improves performance.
Linear distortion characterization
Exemplary embodiments of extracting the forward and inverse linear transfer functions are shown in FIGS. 2-6. The first part of the problem is providing a good estimate of the forward linear transfer function. This can be done in a number of ways, including simply applying a pulse to the loudspeaker and measuring the response, or taking the inverse FFT of the ratio of the recorded signal's spectrum to the test signal's spectrum. However, we have found that augmenting the latter approach with a combination of time, frequency and/or time/frequency noise reduction techniques provides a cleaner forward linear transfer function. In the exemplary embodiment all three noise reduction techniques are employed, but any one or two of them may be used for a given application.
The computer averages the recorded test signal over multiple cycles to reduce noise from random sources (step 50). The computer then divides the period of the test and recorded signals into as many sections M as possible, subject to the constraint that each section must exceed the duration of the loudspeaker's impulse response (step 52); if this constraint is not met, partial impulse responses of the loudspeaker will overlap and cannot be separated. The computer calculates the frequency spectra of the test and recorded sections, e.g. by performing an FFT (step 54), and then forms the ratio of each recorded spectrum to the corresponding test spectrum, yielding M frequency-domain "segments" of the loudspeaker impulse response (step 56). The computer filters each spectral line across the M segments to select, for that line, a subset of N < M segments with similar amplitude responses (step 58). This "best-N averaging" is based on the observation that, even in a noisy environment, a typical audio signal usually yields a set of segments in which a given spectral line is hardly affected by "tonal" noise. The process thus effectively avoids noise rather than merely reducing it. In an exemplary embodiment, the best-N averaging algorithm (applied per spectral line) is:
1. Calculate the mean of the spectral line over the available segments.
2. If only N segments remain, stop.
3. If more than N segments remain, find the segment whose value for this spectral line is farthest from the calculated mean and remove it from further calculation.
4. Continue from step 1.
The output of this processing, for each spectral line, is the subset of N segments with the best values for that line. The computer then gathers the spectral-line values from the segments listed in each subset to reconstruct N "slices" (step 60).
A simple example illustrating the best-N averaging and slice reconstruction steps is provided in FIGS. 3a and 3b. On the left of the figure are the spectra 70 of M = 10 segments. In this example the spectrum 72 of each segment is represented by 5 spectral lines 74, and N = 4 for the averaging algorithm. The output of the best-4 averaging is, for each line (line 1, line 2, ..., line 5), a subset of segments (step 76). The first slice "snap 1" 78 is reconstructed by taking, for each line, the spectral-line value from the segment listed as the first entry in that line's subset. The second slice "snap 2" is reconstructed by taking, for each line, the value from the segment listed as the second entry, and so on (step 80).
The process can be represented algorithmically as follows:
S(i, j) = FFT(recorded section(i, j)) / FFT(test section(i, j)), where S() is a segment 70, i = 1..M sections and j = 1..P spectral lines;
Line(j, k) = F(S(i, j)), where F() is the best-4 averaging algorithm and k = 1..N; and
RS(k, j) = Line(j, k), where RS() is a reconstructed slice.
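The S/Line/RS recipe above can be sketched compactly in Python/numpy; per step 58, the best-N selection here operates on the amplitude response of each spectral line:

```python
import numpy as np

def reconstruct_slices(test_sections, rec_sections, n):
    """S(i, j) = FFT(recorded section i) / FFT(test section i);
    Line(j, k) = best-n averaging over the amplitude response of line j;
    RS(k, j) = Line(j, k).  Inputs are (M, L) arrays of time-domain
    sections; output is an (n, L//2 + 1) array of reconstructed slices."""
    s = np.fft.rfft(rec_sections, axis=1) / np.fft.rfft(test_sections, axis=1)
    m, p = s.shape
    rs = np.empty((n, p), dtype=complex)
    for j in range(p):
        idx = list(range(m))
        mags = np.abs(s[:, j])            # amplitude response of line j
        while len(idx) > n:               # drop the worst outlier, re-average
            mean = np.mean([mags[i] for i in idx])
            idx.remove(max(idx, key=lambda i: abs(mags[i] - mean)))
        for k, i in enumerate(idx):
            rs[k, j] = s[i, j]            # RS(k, j) = Line(j, k)
    return rs
```

With a distortion-free recording that is simply a scaled copy of the test sections, every reconstructed slice reduces to the constant scale factor.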
The results of the best-4 averaging are shown in FIG. 3c. As shown, the spectrum 82 resulting from a simple average over all segments for each spectral line is very noisy; the "tonal" noise is very strong in some segments. By comparison, the spectrum 84 produced by the best-4 averaging has very little noise. It is important to note that this smoothed frequency response is not simply the result of averaging more segments, which would smear and adversely affect the underlying transfer function. Rather, it results from intelligently avoiding noise sources in the frequency domain, so that the noise level is reduced while the underlying information is preserved.
The computer performs an inverse FFT on each of the N frequency-domain slices to provide N time-domain slices (step 90). At this point the N time-domain slices could simply be averaged together to output the forward linear transfer function. In the exemplary embodiment, however, an additional wavelet filtering process (step 92) is performed on the N slices to remove noise that can be localized at multiple time scales using the time/frequency representation of the wavelet transform. Wavelet filtering also produces a minimal amount of "ringing" in the filtered result.
One approach is to perform a single wavelet transform on the averaged time-domain slices, pass the "approximation" coefficients, threshold the "detail" coefficients to zero according to a predetermined energy level, and then inverse-transform to extract the forward linear transfer function. This removes the noise normally found in the "detail" coefficients at the different decomposition levels of the wavelet transform.
A better approach, shown in FIGS. 4a-4d, is to take each of the N slices 94, perform a "parallel" wavelet transform that forms a 2D coefficient map 96 for each slice, and use statistics of the coefficients across the transformed slices to determine which coefficients in the output map 98 are set to zero. If a coefficient is relatively consistent across the N slices, its noise level is probably low, so the coefficient values should be averaged and passed. Conversely, if the coefficient varies or deviates significantly, that is a good indicator of noise. One approach is therefore to compare a measure of the deviation against a threshold: if the deviation exceeds the threshold, the coefficient is set to zero. This rationale may be applied to all coefficients, in which case some "detail" coefficients that would otherwise have been assumed to be noise and set to zero may be retained, while some "approximation" coefficients that would otherwise have passed are set to zero, thereby reducing noise in the final forward linear transfer function 100. Alternatively, all "detail" coefficients may be set to zero while the statistics are used to catch noisy approximation coefficients. In another embodiment, the statistic may be a measure of the variation of the neighbors around each coefficient.
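A minimal sketch of the parallel wavelet filtering follows. Two simplifying assumptions: a single-level Haar transform stands in for the multi-level wavelet decomposition, and a plain standard-deviation threshold across slices serves as the coefficient statistic.

```python
import numpy as np

def haar_fwd(x):
    """Single-level Haar transform: (approximation, detail) coefficients."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_inv(a, d):
    """Inverse of haar_fwd."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def parallel_wavelet_denoise(slices, dev_thresh):
    """Transform each slice in parallel, zero every coefficient whose
    standard deviation across the N slices exceeds dev_thresh (coefficients
    that vary strongly between slices are taken to be noise), average the
    survivors, and inverse-transform the averaged coefficient map."""
    coeffs = np.array([np.concatenate(haar_fwd(s)) for s in slices])
    mean = coeffs.mean(axis=0)
    mean[coeffs.std(axis=0) > dev_thresh] = 0.0
    half = len(mean) // 2
    return haar_inv(mean[:half], mean[half:])
```

Coefficients that agree across slices pass through unchanged (the clean-signal case reconstructs exactly); coefficients that scatter between slices are zeroed.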
The effect of the noise reduction technique is illustrated in fig. 5a and 5b, which show the frequency response 102 of the final forward linear transfer function 100 for a typical loudspeaker. As shown, the frequency response is highly detailed and clear.
To maintain the accuracy of the forward linear transfer function, we need a method of inverting the transfer function into a synthesized FIR filter that can flexibly accommodate the time and frequency domain properties of the loudspeaker and its impulse response. To achieve this we have chosen a neural network. Using a linear activation function constrains the selected neural network structure to be linear. The weights of the linear neural network are trained using the forward linear transfer function 100 as the input and a target pulse signal as the target, providing an estimate of the loudspeaker's inverse linear transfer function (step 104). The error function may be constrained to provide desired time-domain or frequency-domain characteristics. Once trained, the weights of the nodes are mapped to the coefficients of a linear FIR filter (step 106).
Many known types of neural networks are suitable. The current state of the art of neural network architecture and training algorithms makes feed-forward networks (hierarchical networks where each layer receives only inputs from previous layers) good candidates. Existing training algorithms provide stable results and good generalization.
As shown in FIG. 6, a single-layer, single-neuron neural network 117 is sufficient to determine the inverse linear transfer function. The time-domain forward linear transfer function 100 is applied to the neuron through a delay line 118 of N delay elements, synthesizing an FIR filter with N taps. Each delay element simply passes its delayed input; the neuron 120 computes a weighted sum of the delay elements. The activation function 122 is linear, so the weighted sum passes directly as the output of the neural network. In an exemplary embodiment, a 1024-1 feed-forward structure (1024 delay elements and 1 neuron) performs well for a 512-point time-domain forward transfer function and a 1024-tap FIR filter. More complex networks comprising one or more hidden layers may be used. This may add some flexibility, but would require modifying the training algorithm to back-propagate the weights from the hidden layer (or layers) to the input layer in order to map them to the FIR coefficients.
An off-line, supervised, resilient backpropagation training algorithm adjusts the weights with which the time-domain forward linear transfer function is passed to the neuron. In supervised learning, the neuron output is compared to target values in order to measure neural network performance during training. For inverting the forward transfer function, the target sequence contains a single "pulse": all target values T_i are zero except one, which is set to 1 (unity gain). The comparison is performed with a mathematical metric such as the mean square error (MSE). The standard MSE equation is:

MSE = (1/N) * sum_{i=1..N} (O_i - T_i)^2

where N is the number of outputs, O_i is the neuron output value and T_i is the target value sequence. The training algorithm "back-propagates" the error through the network to adjust all the weights. This process is repeated until the MSE is minimized and the weights have converged to a solution. The weights are then mapped to the FIR filter.
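The conversion of the forward transfer function into FIR coefficients can be sketched as follows. Two assumptions for brevity: plain gradient descent stands in for resilient backpropagation, and the single linear "neuron" is represented directly as the weight vector convolved with the input.

```python
import numpy as np

def train_inverse_fir(h, n_taps, epochs=2000, lr=0.1):
    """Find FIR weights w minimizing the MSE between conv(h, w) and a target
    sequence that is zero everywhere except a single unit pulse.  Plain
    gradient descent stands in for resilient backpropagation; the constant
    factor of the MSE gradient is folded into lr."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal(n_taps) * 0.01
    target = np.zeros(len(h) + n_taps - 1)
    target[len(target) // 2] = 1.0            # delayed unit pulse
    for _ in range(epochs):
        err = np.convolve(h, w) - target      # O_i - T_i
        # d(sum err^2)/dw_k ~ sum_i err_i * h_{i-k}: correlate err with h
        grad = np.correlate(err, h, mode='full')[len(h) - 1:len(h) - 1 + n_taps]
        w -= lr * grad
    return w

# Example: invert a toy 2-tap "loudspeaker response" h = [1.0, 0.5]
h = np.array([1.0, 0.5])
w = train_inverse_fir(h, 32)
equalized = np.convolve(h, w)   # should closely approximate the target pulse
```

The trained weights map one-to-one onto the taps of the compensating FIR filter, exactly as the network weights do in the text.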
Since the neural network performs time-domain calculations, i.e. the output and target values are in the time domain, a time-domain constraint can be applied to the error function to improve the properties of the inverse transfer function. For example, pre-echo is a psychoacoustic phenomenon in which artifacts, caused by energy from time-domain transients smearing backwards in time, become audible in a sound recording. By controlling the pre-echo's duration and amplitude we can reduce its audibility, or render it completely inaudible owing to "forward temporal masking".
One way to compensate for pre-echo is to weight the error function as a function of time. The constrained MSE is given by:

MSEw = (1/N) * sum_{i=1..N} D_i * (O_i - T_i)^2

where D_i is a time-dependent weight. Taking time t < 0 to correspond to pre-echo, the errors at t < 0 should be weighted more heavily, for example D(-inf:-1) = 100 and D(0:inf) = 1. The backpropagation algorithm then optimizes the neuron weights W_i to minimize the weighted MSEw function. The weights D_i can be shaped to follow the temporal masking curve, and there are other ways of applying constraints to the error metric besides individual error weighting (e.g., constraining the combined error within a selected range).
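The time-weighted error described above (D = 100 in the pre-echo region, D = 1 from the target pulse onward) is direct to compute:

```python
import numpy as np

def weighted_mse(output, target, pulse_index, pre_echo_weight=100.0):
    """MSEw = (1/N) * sum_i D_i * (O_i - T_i)^2, with D_i = pre_echo_weight
    for samples before the target pulse (the pre-echo region, t < 0) and
    D_i = 1 from the pulse onward."""
    d = np.ones(len(output))
    d[:pulse_index] = pre_echo_weight
    return float(np.mean(d * (output - target) ** 2))
```

An error of a given size before the pulse now costs 100 times as much as the same error after it, pushing the optimizer to suppress pre-echo first.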
An alternative example, constraining the combined error within a selected range A:B, is given by:
SSEAB = Σi=A..B (Oi − Ti)²
err = 0 if SSEAB ≤ lim, otherwise err = SSEAB − lim
here:
SSEAB — the sum of the squared errors over the range A:B;
Oi — network output value;
Ti — target value;
lim — some predetermined limit;
err — final error (or metric) value.
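The exact limiting formula did not survive reproduction of the document, so the sketch below implements one plausible reading: the summed squared error over the selected range is free of penalty until it exceeds a predetermined limit. The signal values and limits are illustrative.

```python
import numpy as np

def range_constrained_error(O, T, a, b, lim):
    """Sum of squared errors over samples a..b, penalized only above `lim`.

    One plausible reading of the range constraint: errors inside the selected
    range A:B are tolerated as long as their combined energy stays under a
    predetermined limit; only the excess contributes to the final metric.
    """
    sse_ab = np.sum((O[a:b + 1] - T[a:b + 1]) ** 2)
    return max(0.0, sse_ab - lim)

# Illustrative output/target vectors.
O = np.array([0.0, 0.1, 0.9, 0.05])
T = np.array([0.0, 0.0, 1.0, 0.0])

small = range_constrained_error(O, T, 0, 3, lim=0.1)    # under the limit
large = range_constrained_error(O, T, 0, 3, lim=0.001)  # over the limit
```

With a generous limit the metric is exactly zero (no gradient pressure inside the range); with a tight limit only the excess energy is penalized.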
While the neural network performs time-domain computations, frequency-domain constraints can also be placed on the network to ensure desired frequency characteristics. For example, "over-amplification" occurs in the inverse transfer function at frequencies where the loudspeaker response has a deep dip. Excessive amplification causes ringing in the time-domain response. To prevent excessive amplification, the frequency envelope of the target pulse (which is initially equal to 1 at all frequencies) is attenuated at the frequencies where the original loudspeaker response has a deep dip, so that the maximum amplitude difference between the original response and the target is below some dB limit. The constrained MSE is given by MSE = (1/N)·Σi (Oi − T′i)², with the constrained target vector
T′ = F⁻¹[Af · F(T)]
here:
T′ — constrained target vector;
T — original target vector;
O — network output vector;
F() — the Fourier transform;
F⁻¹() — the inverse Fourier transform;
Af — target attenuation coefficients;
N — the number of samples in the target vector.
This will avoid excessive amplification and persistent ringing in the time domain.
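Constructing the constrained target T′ = F⁻¹[Af·F(T)] can be sketched as below. The speaker response H, the location and depth of the dip, and the 20 dB boost limit are all illustrative assumptions.

```python
import numpy as np

n = 64
T = np.zeros(n)
T[0] = 1.0                               # ideal target pulse: flat unity spectrum

# Hypothetical speaker magnitude response with a deep dip at one frequency
# bin (mirrored to keep the real-signal spectrum conjugate-symmetric).
H = np.ones(n)
H[20] = H[n - 20] = 0.01                 # a -40 dB dip

# Limit the inverse filter's boost to max_boost_db: wherever fully inverting
# the dip would require more gain than that, attenuate the target instead.
max_boost_db = 20.0
Af = np.minimum(1.0, H * 10 ** (max_boost_db / 20))

# Constrained target T' = F^-1[Af . F(T)]
T_prime = np.real(np.fft.ifft(Af * np.fft.fft(T)))

spectrum = np.abs(np.fft.fft(T_prime))   # envelope of the constrained target
```

The constrained target keeps unity gain everywhere except at the dip, where it asks for only the allowed 20 dB of boost instead of the full 40 dB, avoiding the ringing that full inversion would cause.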
Alternatively, the contribution of individual errors to the error function can be spectrally weighted. One way to impose such a constraint is to compute the individual errors, perform an FFT on them, and then compare the result to zero using some metric, e.g., applying more weight to the high-frequency components. For example, the constrained error function is given by
err = (1/N)·Σf Sf · |F(O − T)|f²
here:
Sf — spectral weighting for spectral line f;
O — network output vector;
T — original target vector;
F() — the Fourier transform;
err — final error (or metric) value;
N — the number of spectral lines.
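A minimal sketch of such a spectrally weighted error follows. The exact metric in the original is an assumption here (a weighted mean of the squared error spectrum), and the weighting profile (4x on the upper half of the spectrum) is illustrative.

```python
import numpy as np

def spectral_weighted_error(O, T, Sf):
    """Assumed form: err = (1/N) * sum_f Sf * |FFT(O - T)|_f^2."""
    E = np.fft.fft(O - T)                 # spectrum of the individual errors
    return np.mean(Sf * np.abs(E) ** 2)

n = 32
T = np.zeros(n)
T[0] = 1.0
O = T.copy()
O[1] += 0.05                              # a small time-domain error

Sf = np.ones(n)
Sf[n // 4: 3 * n // 4] = 4.0              # weight the high-frequency lines more

err      = spectral_weighted_error(O, T, Sf)
err_flat = spectral_weighted_error(O, T, np.ones(n))   # unweighted baseline
```

The impulsive error spreads evenly across the spectrum, so the high-frequency weighting raises the metric relative to the flat baseline, pushing the optimizer to clean up high-frequency error components first.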
Both time and frequency domain constraints can be incorporated by modifying the error function, or applied simultaneously by simply adding the error functions together and minimizing the sum.
The combination of noise-reduction techniques for extracting the forward linear transfer function with a time-domain linear neural network that supports both time-domain and frequency-domain constraints provides a robust and accurate technique for synthesizing an FIR filter that applies the inverse linear transfer function to pre-compensate the loudspeaker's linear distortion during playback.
Non-linear distortion characteristic
An exemplary embodiment for extracting forward and inverse non-linear transfer functions is shown in fig. 7. As described above, the FIR filter is preferably applied to the recorded non-linear test signal to effectively remove the linear distortion component. Although this is not strictly necessary, we have found that this approach significantly improves the performance of the inverse non-linear filtering. Conventional noise reduction techniques (step 130) may be applied to reduce random noise and other noise sources, but are often not necessary.
To solve the non-linear part of the problem, we use a neural network to estimate the forward non-linear transfer function (step 132). As shown in fig. 8, the feed-forward network 110 generally includes an input layer 112, one or more hidden layers 114, and an output layer 116. The activation function is suitably the standard non-linear tanh() function. The weights of the non-linear neural network are trained using the original non-linear test signal I 115 as the input to a delay line 118 and the non-linear distortion signal as the target in the output layer, to provide an estimate of the forward non-linear transfer function F(). Time- and/or frequency-domain constraints may also be applied to the error function as required for the particular type of transducer. In an exemplary embodiment, a 64-16-1 feed-forward network is trained with an 8-second test signal. The time-domain neural network computation performs well for the significant non-linearities that may occur in transient regions of an audio signal, better than frequency-domain Volterra kernels.
To invert the non-linear transfer function, we use a formula that recursively applies the forward non-linear transfer function F(), computed by the non-linear neural network, to the test signal I and subtracts a first-order approximation Cj*F(I) from the test signal I to estimate an inverse non-linear transfer function RF() for the loudspeaker (step 134), where Cj is the weighting coefficient for the j-th recursive iteration. The weighting coefficients Cj are optimized using, for example, a conventional least-squares minimization algorithm.
For a single (non-recursive) iteration, the formula for the inverse transfer function is simply Y = I − C1*F(I). In other words, the input audio signal I (from which linear distortion has already been removed) is passed through the forward transform F(), and the result is subtracted from the audio signal I to produce a signal Y that has been "pre-compensated" for the non-linear distortion of the loudspeaker. When the signal Y passes through the loudspeaker, the distortion should be cancelled. Unfortunately, in practice the cancellation is imperfect and a non-linear residual signal usually remains. The formula can bring the non-linear residual closer to zero by iterating recursively two or more times, and thus having more weighting coefficients Cj to optimize. It has been shown that only two or three iterations are needed to improve performance.
For example, the third-order (three-iteration) formula is given by:
Y = I − C3*F(I − C2*F(I − C1*F(I))).
Assuming that I has already been pre-compensated for linear distortion, the actual speaker output is Y + F(Y). To effectively remove the non-linear distortion, we solve Y + F(Y) − I = 0 for the coefficients C1, C2, and C3.
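The recursive pre-compensation formula can be sketched numerically. The quadratic F() below is a toy stand-in for the trained neural network, and the coefficients Cj are left at 1 rather than optimized, purely to show the mechanics.

```python
import numpy as np

def F(x):
    """Toy forward non-linear distortion of the speaker (illustrative stand-in
    for the trained non-linear neural network)."""
    return 0.1 * x ** 2

def precompensate(I, C):
    """Y = I - C3*F(I - C2*F(I - C1*F(I))) for coefficients C = [C1, C2, C3]."""
    y = I
    for c in C:                # each pass subtracts a weighted first-order term
        y = I - c * F(y)
    return y

I = np.linspace(-1.0, 1.0, 201)       # test signal (linear part assumed removed)
C = [1.0, 1.0, 1.0]                   # unoptimized coefficients, for illustration

Y = precompensate(I, C)
# Speaker output is Y + F(Y); ideally it equals the desired signal I.
residual_comp   = np.max(np.abs((Y + F(Y)) - I))
residual_uncomp = np.max(np.abs((I + F(I)) - I))
```

Even with unoptimized coefficients, three iterations drive the non-linear residual roughly two orders of magnitude below the uncompensated distortion for this toy F(); optimizing the Cj by least squares would shrink it further.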
For playback, there are two options. The weighting coefficients Cj, the trained neural network, and the recursive formula can be provided to a speaker or receiver, which simply reproduces the non-linear neural network and the iterative formula. A more computationally efficient method is to use the trained neural network and the recursive formula to train a "play neural network" (PNN) that computes the inverse non-linear transfer function directly (step 136). The PNN is also a feed-forward network and may have the same structure (e.g., layers and neurons) as the original network. The PNN may be trained using the same input signal used to train the original network, with the output of the recursive formula as the target. Alternatively, a different input signal may be passed through the network and the recursive formula, with that input signal and the resulting output used to train the PNN. A significant advantage is that the inverse transfer function can then be performed in a single pass through the neural network, without requiring multiple (e.g., 3) passes.
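The distillation idea behind the PNN can be sketched with the same kind of toy quadratic non-linearity: run inputs through the recursive formula to generate targets, then fit a direct one-pass model to those targets. A polynomial least-squares fit stands in here for training the play neural network; the non-linearity, coefficients, and fit degree are illustrative assumptions.

```python
import numpy as np

def F(x):
    return 0.1 * x ** 2               # toy forward non-linearity (illustrative)

def precompensate(I, C=(1.0, 1.0, 1.0)):
    """Three-pass recursive formula Y = I - C3*F(I - C2*F(I - C1*F(I)))."""
    y = I
    for c in C:
        y = I - c * F(y)
    return y

# "Training" data: inputs through the recursive formula give the targets.
I_train = np.linspace(-1.0, 1.0, 401)
Y_train = precompensate(I_train)

# Distill the three-pass recursion into a direct one-pass model.  A degree-8
# polynomial fit stands in for the PNN (8 is the exact polynomial degree of
# the quadratic F() nested three times, so the fit can be essentially exact).
coeffs = np.polyfit(I_train, Y_train, deg=8)

I_new = np.linspace(-0.9, 0.9, 50)
Y_direct    = np.polyval(coeffs, I_new)   # single pass, as the PNN would run
Y_recursive = precompensate(I_new)        # three passes through the network
max_dev = np.max(np.abs(Y_direct - Y_recursive))
```

At playback time only the one-pass model is evaluated, mirroring the patent's point that the PNN avoids multiple passes through the network.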
Distortion compensation and reproduction
In order to compensate for the linear and non-linear distortion characteristics of the loudspeaker, the inverse linear and non-linear transfer functions must be applied to the audio signal before it is played through the loudspeaker. This can be achieved with a number of different hardware configurations and different applications of the inverse transfer functions, two of which are shown in figs. 9a-9b and 10a-10b.
As shown in fig. 9a, a loudspeaker 150 having three amplifier 152 and transducer 154 arrangements, for low, mid, and high frequencies, is also provided with processing capability 156 and memory 158 to pre-compensate the input audio signal so as to eliminate, or at least reduce, loudspeaker distortion. In a standard loudspeaker, a crossover network applied to the audio signal maps it to the low-, mid-, and high-frequency output transducers. In this exemplary embodiment, each low-, mid-, and high-frequency component of the speaker is uniquely characterized for its linear and non-linear distortion properties. Filter coefficients 160 and neural network weights 162 are stored in memory 158 for each speaker component. These coefficients and weights may be stored in memory at the time of manufacture, as a service performed to characterize a particular speaker, or by the end user downloading them from a website and transferring them to memory. A single processor (or multiple processors) 156 loads the filter coefficients into an FIR filter 164 and the weights into a PNN 166. As shown in fig. 10a, the processor applies the FIR filter to the audio input to pre-compensate it for linear distortion (step 168), and then applies the signal to the PNN to pre-compensate it for non-linear distortion (step 170). Alternatively, the network weights and the recursive-formula coefficients may be stored and loaded into the processor. As shown in fig. 10b, the processor applies the FIR filter to the audio input to pre-compensate it for linear distortion (step 172), and then applies the signal to the NN (step 174) and the recursive formula (step 176) to pre-compensate it for non-linear distortion.
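The playback chain of fig. 10a (FIR filter for linear pre-compensation, then the PNN for non-linear pre-compensation) can be sketched as below. The FIR coefficients and the closed-form stand-in for the PNN are illustrative values, not data from the patent.

```python
import numpy as np

# Hypothetical pre-computed corrections, as would be loaded from memory 158.
fir_coeffs = np.array([1.2, -0.3, 0.1])   # inverse linear transfer function taps

def pnn(x):
    """Stand-in for the play neural network: a direct inverse non-linearity
    (here a simple quadratic correction, purely illustrative)."""
    return x - 0.1 * x ** 2

def precompensate_playback(audio):
    # Step 1 (fig. 10a, step 168): FIR filter pre-compensates linear distortion.
    linear_corrected = np.convolve(audio, fir_coeffs, mode='full')[:len(audio)]
    # Step 2 (fig. 10a, step 170): PNN pre-compensates non-linear distortion.
    return pnn(linear_corrected)

audio = np.sin(2 * np.pi * 440 * np.arange(256) / 48000)   # a 440 Hz test tone
out = precompensate_playback(audio)
```

In the fig. 10b variant, step 2 would instead run the forward network plus the recursive formula; the chain's structure is otherwise the same.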
As shown in fig. 9b, an audio receiver 180 may be configured to perform the pre-compensation for a conventional speaker 182 having a crossover network 184 and amplifier/transducer components 186 for low, mid, and high frequencies. Although the memory 188 for storing the filter coefficients 190 and network weights 192 and the processor 194 for implementing the FIR filter 196 and the PNN 198 are shown as components separate from, or added to, the audio decoder 200, it is quite feasible to design this functionality into the audio decoder. The audio decoder receives an encoded audio signal from a TV broadcast or DVD, decodes it, and separates it into stereo (L, R) or multi-channel (L, R, C, Ls, Rs, LFE) audio channels, which are directed to the respective loudspeakers. As shown, for each channel the processor applies the FIR filter and PNN to the audio signal and directs the pre-compensated signal to the speaker 182.
As previously mentioned, the speaker itself or the audio receiver may be provided with a microphone input and the processing and algorithmic capability to characterize the speaker and train the neural networks, providing the coefficients and weights required for playback. In addition to compensating for the distortion characteristics of each individual speaker, this offers the advantage of compensating for the linear and non-linear distortion of that speaker's particular listening environment.
Pre-compensation using the inverse transfer functions will work for any output audio transducer, such as the described speaker or a broadcast antenna. In the case of an input transducer such as a microphone, however, any compensation must be performed "after" the conversion from, for example, an acoustic signal to an electrical signal. The analysis used to train the neural networks, etc., does not change. The compensation applied for reproduction or playback is very similar, except that it occurs after the conversion.
Test and results
The general approach of separately characterizing and compensating the linear and non-linear distortion components, and the validity of the time-domain neural-network-based solution, were verified by frequency- and time-domain impulse responses measured for a typical loudspeaker. A pulse was applied to the speaker, with and without correction, and the impulse responses were recorded. As shown in fig. 11, the spectrum 210 of the uncorrected impulse response is very unbalanced across the audio bandwidth from 0 Hz to about 22 kHz. By comparison, the spectrum 212 of the corrected impulse response is very flat across the entire bandwidth. As shown in fig. 12a, the uncorrected time-domain impulse response 220 includes substantial ringing. If the ringing time is long or its amplitude high, it can be perceived by the human ear as reverberation added to the signal or as a coloration (a change in spectral characteristics) of the signal. As shown in fig. 12b, the corrected time-domain impulse response 222 is very clean. The clean pulse demonstrates that the frequency characteristic of the system is close to unity gain, as shown in fig. 11. This is desirable because it adds no coloration, reverberation, or other distortion to the signal.
While several illustrative embodiments of the invention have been shown and described, numerous variations and alternative embodiments will occur to those skilled in the art. Such variations and alternative embodiments are contemplated and may be made without departing from the spirit and scope of the present invention as defined in the appended claims.
Claims (amended under Article 19 of the Treaty)
1. A method of determining inverse linear and non-linear transfer functions of an audio transducer for precompensating an audio signal for reproduction on the transducer, comprising the steps of:
a) synchronously playing and recording a linear test signal through the audio transducer;
b) extracting a forward linear transfer function for the audio transducer from the linear test signal and its recorded version;
c) converting the forward linear transfer function to provide an estimate of an inverse linear transfer function A() for the transducer;
d) mapping the inverse linear transfer function to respective coefficients of a linear filter;
e) synchronously playing and recording a non-linear test signal I through the transducer;
f) applying the linear filter to the recorded non-linear test signal and subtracting the result from the original non-linear test signal to estimate the non-linear distortion of the transducer;
g) extracting a forward non-linear transfer function F() from the non-linear distortion; and
h) converting the forward non-linear transfer function to provide an estimate of an inverse non-linear transfer function RF() for the transducer.
2. The method of claim 1, wherein the steps of playing and recording the linear test signal are performed by reference to a shared clock signal such that the signals are time aligned within a single sampling period.
3. The method of claim 1, wherein the test signal is periodic and the forward linear transfer function is extracted by:
averaging the recording signals of a plurality of periods to obtain an average recording signal;
dividing the averaged recorded signal and the linear test signal into M similar time segments;
performing a frequency transformation and ratioing the similar recording and testing segments to form a plurality of similar segments, each segment having a plurality of spectral lines;
filtering each spectral line to select a subset of all N < M segments with similar amplitude responses to the spectral line;
drawing spectral lines from the segments listed in each subset to reconstruct the N segments;
inverse transforming the reconstructed segments to provide N time-domain segments of the forward linear transfer function; and
wavelet filtering the N time-domain slices to extract the forward linear transfer function.
4. A method according to claim 3, wherein the average recorded signal is divided into as many sections as possible subject to the constraint that each section must exceed the duration of the transducer impulse response.
5. A method as claimed in claim 3, wherein the wavelet filtering is applied in parallel by:
wavelet transforming each time-domain segment into a 2D coefficient map;
computing statistics of the coefficients across the 2D coefficient map;
selectively zeroing coefficients in the 2D coefficient map based on the statistics;
averaging the 2D coefficient map to obtain an average map; and
inverse wavelet transforming the mean map into a forward linear transfer function.
6. The method of claim 5, wherein the statistical data measures a deviation between coefficients at the same location from different maps, the coefficients being made zero if the deviation exceeds a threshold.
7. The method of claim 1, wherein the forward linear transfer function is converted to estimate the inverse linear transfer function A() by training the weights of a linear neural network using the forward linear transfer function as an input and a target pulse signal as a target.
8. The method of claim 7, wherein the weights are trained according to an error function, further comprising placing a time-domain constraint on the error function.
9. The method of claim 8, wherein the time domain constraint weights errors in the pre-echo portion more heavily.
10. The method of claim 7, wherein the weights are trained according to an error function, further comprising placing a frequency domain constraint on the error function.
11. The method of claim 10, wherein the frequency domain constraint attenuates the envelope of the target pulse signal to clip the maximum difference between the target pulse signal and the original impulse response at some predetermined limit.
12. The method of claim 10, wherein the frequency domain constraints weight spectral components of the error function differently.
13. The method of claim 7, wherein the linear neural network comprises N delay elements passing an input, N weights on each delay input, and a single neuron computing as an output a weighted sum of the delay inputs.
14. The method of claim 1, wherein the forward non-linear transfer function F () is extracted by weights that train a non-linear neural network using an original non-linear test signal I as an input and non-linear distortion as a target.
15. The method of claim 1, wherein the forward non-linear transfer function F() is recursively applied to the test signal I and Cj*F(I) is subtracted from the test signal I to estimate the inverse non-linear transfer function RF(), where Cj is the weighting coefficient for the j-th recursive iteration, j being greater than one.
16. A method of determining an inverse linear transfer function A() of a transducer for pre-compensating an audio signal for reproduction on said transducer, comprising the steps of:
a) synchronously playing and recording a linear test signal through the transducer;
b) extracting a forward linear transfer function for the transducer from the linear test signal and its recorded version;
c) training the weights of a linear neural network using the forward linear transfer function as an input and a target pulse signal as a target to provide an estimate of the inverse linear transfer function A() for the transducer; and
d) mapping the trained weights from the neural network to respective coefficients of a linear filter.
17. The method of claim 16, wherein the test signal is periodic and the forward linear transfer function is extracted by:
averaging the recording signals of a plurality of periods to obtain an average recording signal;
dividing the averaged recording signal and the linear test signal into a plurality of similar M time segments;
performing a frequency transformation and ratioing similar recording and testing segments to form similar segments, each segment having a plurality of spectral lines;
filtering each spectral line to select a subset of all N < M segments with similar amplitude responses to the spectral line;
drawing spectral lines from the segments listed in each subset to reconstruct the N segments;
inverse transforming the reconstructed segments to provide N time-domain segments of the forward linear transfer function; and
filtering the N time-domain segments to extract the forward linear transfer function.
18. The method of claim 17, wherein the time domain segment is filtered by:
wavelet transforming each time-domain segment into a 2D coefficient map;
computing statistics of the coefficients across the 2D coefficient map;
selectively zeroing coefficients in the 2D coefficient map based on the statistics;
averaging the 2D coefficient map to obtain an average map; and
inverse wavelet transforming the mean map into a forward linear transfer function.
19. The method of claim 16, wherein the forward linear transfer function is extracted by:
processing the test and record signals to provide N time domain segments of the forward linear transfer function;
wavelet transforming each time-domain segment into a 2D coefficient map;
computing statistics of the coefficients across the 2D coefficient map;
selectively zeroing coefficients in the 2D coefficient map based on the statistics;
averaging the 2D coefficient map to obtain an average map; and
inverse wavelet transforming the mean map into a forward linear transfer function.
20. The method of claim 19, wherein the statistical data measures a deviation between coefficients at the same location from different maps, and causes the coefficients to be zero if the deviation exceeds a threshold.
21. The method of claim 16, wherein the linear neural network comprises N delay elements passing an input, N weights on each delay input, and a single neuron computing as an output a weighted sum of the delay inputs.
22. The method of claim 16, wherein the weights are trained according to an error function, further comprising placing a time-domain constraint on the error function.
23. The method of claim 16, wherein the weights are trained according to an error function, further comprising placing a frequency domain constraint on the error function.
24. A method of determining an inverse non-linear transfer function of a transducer for pre-compensating an audio signal for reproduction on said transducer, comprising the steps of:
a) synchronously playing and recording a non-linear test signal I through the transducer;
b) estimating a non-linear distortion of the transducer from the recorded non-linear test signal;
c) training the weights of a non-linear neural network using the original non-linear test signal I as an input and the non-linear distortion as a target to provide an estimate of a forward non-linear transfer function F();
d) recursively applying the forward non-linear transfer function F() to the test signal I using the non-linear neural network and subtracting Cj*F(I) from the test signal I to estimate an inverse non-linear transfer function RF() for the transducer, where Cj is the weighting coefficient for the j-th recursive iteration; and
e) optimizing the weighting coefficients Cj.
25. The method of claim 24, wherein the non-linear distortion is estimated by removing linear distortion from the recorded non-linear test signal and subtracting the result from the original non-linear test signal.
26. The method of claim 24, further comprising the steps of:
training a non-linear Play Neural Network (PNN) using a non-linear input test signal applied to the non-linear neural network as an input and using an output of the recursive application as a target, whereby the PNN directly estimates the inverse non-linear transfer function RF ().
27. A method of pre-compensating an audio signal X for reproduction on an audio transducer, comprising the steps of:
a) applying the audio signal X to a linear filter to provide a linearly pre-compensated audio signal X′ = A(X), wherein the transfer function of the linear filter is an estimate of the inverse linear transfer function A() of the transducer;
b) applying said linearly pre-compensated audio signal X′ to a non-linear filter to provide a pre-compensated audio signal Y = RF(X′), wherein the transfer function of the non-linear filter is an estimate of the inverse non-linear transfer function RF() of the transducer; and
c) directing the pre-compensated audio signal Y to the transducer.
28. The method of claim 27, wherein the linear filter comprises an FIR filter whose coefficients are mapped from the weights of a linear neural network, the transfer function of which estimates the inverse linear transfer function of the transducer.
29. The method of claim 27, wherein the non-linear filter is implemented by:
applying X′ as an input to a neural network to output an estimate F(X′) of the non-linear distortion produced by the transducer, wherein the transfer function F() of the neural network is a representation of the forward non-linear transfer function of the transducer; and
recursively subtracting the weighted non-linear distortion Cj*F(X′) from the audio signal to produce said pre-compensated audio signal Y = RF(X′), where Cj is the weighting coefficient for the j-th recursive iteration.
30. The method of claim 27, wherein the non-linear filter is implemented by:
passing X′ through a non-linear play neural network to produce the pre-compensated audio signal Y = RF(X′), wherein the transfer function RF() of the non-linear play neural network is an estimate of the inverse non-linear transfer function, the transfer function RF() having been trained to emulate the recursive subtraction of Cj*F(I) from an audio signal I, where F() is the forward non-linear transfer function of the transducer and Cj is the weighting coefficient for the j-th recursive iteration.
31. A method of compensating an audio signal I for an audio transducer, comprising the steps of:
a) providing the audio signal as an input to a neural network to output an estimate F (I) of a non-linear distortion produced by the transducer for audio signal I, wherein a transfer function F () of the neural network is a representation of the forward non-linear transfer function of the transducer; and
b) recursively subtracting the weighted non-linear distortion Cj*F(I) from the audio signal I to produce a compensated audio signal Y, where Cj is the weighting coefficient for the j-th recursive iteration.
32. A method of compensating an audio signal I for an audio transducer, comprising the steps of: passing the audio signal I through a non-linear play neural network to produce a pre-compensated audio signal Y, wherein the transfer function RF() of the non-linear play neural network is an estimate of the inverse non-linear transfer function of the transducer, the transfer function RF() having been trained to emulate the recursive subtraction of Cj*F(I) from the audio signal I, where F() is the forward non-linear transfer function of the transducer and Cj is the weighting coefficient for the j-th recursive iteration.

Claims (32)

1. A method of determining inverse linear and non-linear transfer functions of an audio transducer for precompensating an audio signal for reproduction on the transducer, comprising the steps of:
a) synchronously playing and recording a linear test signal through the audio transducer;
b) extracting a forward linear transfer function for the audio transducer from the linear test signal and its recorded version;
c) converting the forward linear transfer function to provide an estimate of an inverse linear transfer function A () for the transformer;
d) mapping the inverse linear transfer function to respective coefficients of a linear filter;
e) synchronously playing and recording a non-linear test signal I through the converter;
f) applying the linear filter to the recorded non-linear test signal and subtracting the result from the original non-linear test signal to estimate the non-linear distortion of the transducer;
g) extracting a forward non-linear transfer function F (), from the non-linear distortion; and
h) converting the forward non-linear transfer function to provide an estimate of an inverse non-linear transfer function, RF (), for the transformer.
2. The method of claim 1, wherein the steps of playing and recording the linear test signal are performed by reference to a shared clock signal such that the signals are time aligned within a single sampling period.
3. The method of claim 1, wherein the test signal is periodic and the forward linear transfer function is extracted by:
averaging the recording signals of a plurality of periods to obtain an average recording signal;
dividing the averaged recording signal and the linear test signal into a plurality of similar M time segments;
performing a frequency transformation and ratioing the similar recording and testing segments to form a plurality of similar segments, each segment having a plurality of spectral lines;
filtering each spectral line to select a subset of all N < M segments with similar amplitude responses to the spectral line;
drawing spectral lines from the segments listed in each subset to reconstruct the N segments;
inverse transforming the reconstructed segments to provide N time-domain segments of the forward linear transfer function; and
wavelet filtering the N time-domain slices to extract the forward linear transfer function.
4. A method according to claim 3, wherein the average recorded signal is divided into as many sections as possible subject to the constraint that each section must exceed the duration of the transducer impulse response.
5. A method as claimed in claim 3, wherein the wavelet filtering is applied in parallel by:
transforming each time domain slice wavelet into a 2D coefficient map;
computing statistics of the coefficients across the 2D coefficient map;
selectively zeroing coefficients in the 2D coefficient map based on the statistics;
averaging the 2D coefficient map to obtain an average map; and
inverse wavelet transforming the mean map into a forward linear transfer function.
6. The method of claim 5, wherein the statistical data measures a deviation between coefficients at the same location from different maps, the coefficients being made zero if the deviation exceeds a threshold.
7. The method of claim 1, wherein the forward linear transformation is converted to estimate the inverse linear transfer function a () by using the forward linear transfer function as an input and weights for training a linear neural network using a target pulse signal as a target.
8. The method of claim 7, wherein the weights are trained according to an error function, further comprising placing a time-domain constraint on the error function.
9. The method of claim 8, wherein the time domain constraint weights errors in the pre-echo portion more heavily.
10. The method of claim 7, wherein the weights are trained according to an error function, further comprising placing a frequency domain constraint on the error function.
11. The method of claim 10, wherein the frequency domain constraint attenuates the envelope of the target impulse signal to clip a maximum difference between the target impulse signal and an original impulse response with some predetermined limit.
12. The method of claim 10, wherein the frequency domain constraints weight spectral components of the error function differently.
13. The method of claim 7, wherein the linear neural network comprises N delay elements passing an input, N weights on each delay input, and a single neuron computing as an output a weighted sum of the delay inputs.
14. The method of claim 1, wherein the forward non-linear transfer function F () is extracted by weights for training a non-linear neural network using an original non-linear test signal I as an input and using non-linear distortion as a target.
15. The method of claim 1, wherein the forward non-linear transfer function F () is recursively applied to the test signal I and Cj x F (I) is subtracted from the test signal I to estimate an inverse non-linear transfer function RF (), where Cj is a weighting coefficient for a recursive iteration of order j, j being greater than one.
16. A method of determining an inverse linear transfer function a () of a transformer for precompensating an audio signal for reproduction on said transformer, comprising the steps of:
a) synchronously playing and recording linear test signals through the converter;
b) extracting a forward linear transfer function for the transducer from the linear test signal and its recorded version;
c) training the weights of a linear neural network using the forward linear transfer function as an input and a target pulse signal as a target to provide an estimate of an inverse linear transfer function A () for the transformer; and
d) the trained weights from the NN are mapped to the corresponding coefficients of the linear filter.
17. The method of claim 16, wherein the test signal is periodic and the forward linear transfer function is extracted by:
averaging the recorded signal over a plurality of periods to obtain an averaged recorded signal;
dividing the averaged recorded signal and the linear test signal into M like time segments;
frequency transforming and ratioing like recorded and test segments to form M like segments, each having a plurality of spectral lines;
filtering each spectral line to select a subset of N < M segments having similar amplitude responses at that spectral line;
drawing spectral lines from the segments in each subset to reconstruct N segments;
inverse transforming the reconstructed segments to provide N time-domain segments of the forward linear transfer function; and
filtering the N time-domain segments to extract the forward linear transfer function.
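The averaging and per-segment spectral-ratio steps of claim 17 can be sketched as below. This is a simplified reading that assumes an integer number of test-signal periods and equal-length segments, and it omits the per-spectral-line subset selection and reconstruction; all names are illustrative:

```python
import numpy as np

def forward_tf_segments(test, rec, period, M):
    """Estimate per-segment transfer functions from a periodic test signal.

    Averages the recording over its periods, splits the test signal and
    the averaged recording into M like segments, and ratios their
    spectra to get M candidate per-line transfer functions.
    """
    n_periods = len(rec) // period
    avg_rec = rec[:n_periods * period].reshape(n_periods, period).mean(axis=0)
    test = test[:period]
    seg = period // M
    tfs = []
    for m in range(M):
        T = np.fft.rfft(test[m * seg:(m + 1) * seg])     # test spectrum
        R = np.fft.rfft(avg_rec[m * seg:(m + 1) * seg])  # recorded spectrum
        tfs.append(R / (T + 1e-12))   # per-line ratio; eps guards /0
    return np.array(tfs)
```

For a distortion-free loop (recording equals the test signal), every per-line ratio is approximately 1, as expected of an identity transfer function.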
18. The method of claim 17, wherein the time-domain segments are filtered by:
wavelet transforming each time-domain segment into a 2D coefficient map;
computing statistics of the coefficients across the 2D coefficient maps;
selectively zeroing coefficients in the 2D coefficient maps based on the statistics;
averaging the 2D coefficient maps to obtain an average map; and
inverse wavelet transforming the average map into the forward linear transfer function.
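The statistics, zeroing, and averaging steps of claim 18 (using the deviation criterion of claim 20) can be sketched as below, taking the N 2D coefficient maps as given; the wavelet transform and its inverse are omitted (a library such as PyWavelets could supply them), and the standard-deviation statistic is one plausible choice, not the patent's stated one:

```python
import numpy as np

def denoise_maps(maps, threshold):
    """Cross-map cleanup of N 2D wavelet-coefficient maps.

    Coefficients whose deviation across the maps (here: per-location
    standard deviation) exceeds a threshold are treated as noise and
    zeroed; the surviving maps are then averaged into a single map.
    """
    maps = np.asarray(maps, dtype=float)    # shape (N, H, W)
    std = maps.std(axis=0)                  # deviation at each location
    maps = np.where(std > threshold, 0.0, maps)
    return maps.mean(axis=0)                # averaged map
```

The intuition: coefficients that agree across the N independent estimates carry the transfer function, while coefficients that disagree are measurement noise.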
19. The method of claim 16, wherein the forward linear transfer function is extracted by:
processing the test and recorded signals to provide N time-domain segments of the forward linear transfer function;
wavelet transforming each time-domain segment into a 2D coefficient map;
computing statistics of the coefficients across the 2D coefficient maps;
selectively zeroing coefficients in the 2D coefficient maps based on the statistics;
averaging the 2D coefficient maps to obtain an average map; and
inverse wavelet transforming the average map into the forward linear transfer function.
20. The method of claim 19, wherein the statistics measure the deviation among coefficients at the same location across the different maps, and a coefficient is zeroed if that deviation exceeds a threshold.
21. The method of claim 16, wherein the linear neural network comprises N delay elements passing the input, N weights, one on each delayed input, and a single neuron computing as its output a weighted sum of the delayed inputs.
22. The method of claim 16, wherein the weights are trained according to an error function, further comprising placing a time-domain constraint on the error function.
23. The method of claim 16, wherein the weights are trained according to an error function, further comprising placing a frequency domain constraint on the error function.
24. A method of determining an inverse non-linear transfer function of a transducer for pre-compensating an audio signal for reproduction on said transducer, comprising the steps of:
a) synchronously playing and recording a non-linear test signal I through the transducer;
b) estimating the non-linear distortion of the transducer from the recorded non-linear test signal;
c) training the weights of a non-linear neural network using the original non-linear test signal I as an input and the non-linear distortion as a target to provide an estimate of a forward non-linear transfer function F();
d) recursively applying the forward non-linear transfer function F() to the test signal I using the non-linear neural network and subtracting Cj·F(I) from the test signal I to estimate an inverse non-linear transfer function RF() for the transducer, where Cj is a weighting coefficient for the recursive iteration of order j; and
e) optimizing the weighting coefficients Cj.
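One plausible reading of the recursion in step d) is a fixed-point iteration: the weighted forward distortion Cj·F(·) of the current estimate is repeatedly subtracted from the original signal, so that passing the result through the transducer's non-linearity approximately restores it. A hedged numpy sketch, where F stands in for the trained forward-distortion network and C for the optimized coefficients (names are illustrative):

```python
import numpy as np

def inverse_nonlinear(I, F, C):
    """Recursive estimate of RF(): y_{j} = I - Cj * F(y_{j-1}).

    I : input signal (array), F : forward-distortion model,
    C : weighting coefficients Cj, one per recursion order.
    """
    y = np.asarray(I, dtype=float)
    for c in C:
        y = I - c * F(y)   # subtract weighted distortion of current estimate
    return y
```

With a toy distortion F(x) = 0.1·x³ and all Cj = 1, a few iterations converge so that Y + F(Y) ≈ I, i.e. the pre-distorted signal cancels the modeled non-linearity.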
25. The method of claim 24, wherein the non-linear distortion is estimated by removing linear distortion from the recorded non-linear test signal and subtracting the result from the original non-linear test signal.
26. The method of claim 24, further comprising the steps of:
training a non-linear Play Neural Network (PNN) using the non-linear input test signal applied to the non-linear neural network as an input and the output of the recursive application as a target, whereby the PNN directly estimates the inverse non-linear transfer function RF().
27. A method of pre-compensating an audio signal X for reproduction on an audio transducer, comprising the steps of:
a) applying the audio signal X to a linear filter to provide a linearly pre-compensated audio signal X' = A(X), wherein the transfer function of the linear filter is an estimate of the inverse linear transfer function A() of the transducer;
b) applying said linearly pre-compensated audio signal X' to a non-linear filter to provide a pre-compensated audio signal Y = RF(X'), wherein the transfer function of the non-linear filter is an estimate of the inverse non-linear transfer function RF() of the transducer; and
c) directing the pre-compensated audio signal Y to the transducer.
28. The method of claim 27, wherein the linear filter comprises an FIR filter whose coefficients are mapped from the weights of a linear neural network whose transfer function estimates the inverse linear transfer function of the transducer.
29. The method of claim 27, wherein the non-linear filter is implemented by:
applying X' as an input to a neural network to output an estimate F(X') of the non-linear distortion produced by the transducer, wherein the transfer function F() of the neural network is a representation of the forward non-linear transfer function of the transducer; and
recursively subtracting the weighted non-linear distortion Cj·F(X') from the signal X' to produce said pre-compensated audio signal Y = RF(X'), where Cj is a weighting coefficient for the recursive iteration of order j.
30. The method of claim 27, wherein the non-linear filter is implemented by:
passing X' through a non-linear play neural network to produce the pre-compensated audio signal Y = RF(X'), wherein the transfer function RF() of the non-linear play neural network is an estimate of the inverse non-linear transfer function, the transfer function RF() having been trained to emulate the recursive subtraction of Cj·F(I) from a signal I, where F() is the forward non-linear transfer function of the transducer and Cj is the weighting coefficient for the recursive iteration of order j.
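Claims 27-29 together describe a two-stage pipeline: an FIR realization of A() followed by the recursive non-linear correction RF(). A compact sketch of that pipeline under those assumptions (illustrative names; the FIR coefficients and the distortion model F would come from the trained networks):

```python
import numpy as np

def precompensate(x, fir_coeffs, F, C):
    """Two-stage pre-compensation: Y = RF(A(X)).

    Stage 1: linear filter A() as a causal FIR whose coefficients are
    mapped from the trained linear neural network (claim 28).
    Stage 2: non-linear filter RF() by recursively subtracting the
    weighted forward distortion Cj * F(.) (claim 29).
    """
    x_lin = np.convolve(x, fir_coeffs)[:len(x)]   # X' = A(X)
    y = x_lin.copy()
    for c in C:                                   # Y = RF(X')
        y = x_lin - c * F(y)
    return y
```

With an identity FIR and a zero distortion model the pipeline passes the signal through unchanged, which is a useful sanity check before plugging in trained models.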
31. A method of compensating an audio signal I for an audio transducer, comprising the steps of:
a) providing the audio signal I as an input to a neural network to output an estimate F(I) of the non-linear distortion produced by the transducer for the audio signal I, wherein the transfer function F() of the neural network is a representation of the forward non-linear transfer function of the transducer; and
b) recursively subtracting the weighted non-linear distortion Cj·F(I) from the audio signal I to produce a compensated audio signal Y, where Cj is a weighting coefficient for the recursive iteration of order j.
32. A method of compensating an audio signal I for an audio transducer, comprising the step of: passing the audio signal I through a non-linear play neural network to produce a pre-compensated audio signal Y, wherein the transfer function RF() of the non-linear play neural network is an estimate of the inverse non-linear transfer function of the transducer, the transfer function RF() having been trained to emulate the recursive subtraction of Cj·F(I) from the audio signal I, where F() is the forward non-linear transfer function of the transducer and Cj is the weighting coefficient for the recursive iteration of order j.
HK09112166.4A 2006-08-01 2007-07-25 Neural network filtering techniques for compensating linear and non-linear distortion of an audio transducer HK1135241A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/497,484 2006-08-01

Publications (1)

Publication Number Publication Date
HK1135241A true HK1135241A (en) 2010-05-28

Similar Documents

Publication Publication Date Title
KR101342296B1 (en) Neural network filtering techniques for compensating linear and non-linear distortion of an audio transducer
US9800734B2 (en) Echo cancellation
RU2440692C2 (en) System and method for compensating for non-inertial nonlinear distortion in audio converter
CN102113346B (en) Method for adaptive control and equalization of electroacoustic channels
KR101250124B1 (en) Apparatus and Method for Computing Control Information for an Echo Suppression Filter and Apparatus and Method for Computing a Delay Value
KR101798120B1 (en) Apparatus and method for improving the perceived quality of sound reproduction by combining active noise cancellation and perceptual noise compensation
JP7639070B2 (en) Background noise estimation using gap confidence
US9948261B2 (en) Method and apparatus to equalize acoustic response of a speaker system using multi-rate FIR and all-pass IIR filters
US6697492B1 (en) Digital signal processing acoustic speaker system
Thomas et al. Application of channel shortening to acoustic channel equalization in the presence of noise and estimation error
HK1135241A (en) Neural network filtering techniques for compensating linear and non-linear distortion of an audio transducer
JP2012100117A (en) Acoustic processing apparatus and method
JPWO2009008068A1 (en) Automatic sound field correction device
FR3112017A1 (en) Electronic equipment including a distortion simulator
Massarani Transfer-Function Measurement with Sweeps
Axelson-Fisk Caring More About EQ Than IQ: Automatic Equalizing of Audio Signals
HK1133145B (en) System and method for compensating memoryless non-linear distortion of an audio transducer