Disclosure of Invention
In view of this, the present application provides a transform circuit, a method, an apparatus, and an encoder to solve the problem of high computational complexity of the conventional video encoder.
One aspect of the present application provides a transform circuit, including a processing unit corresponding to each operation transform element in an operation transform matrix; the operation transformation matrix is obtained by transformation according to an initial transformation matrix of a specific transformation algorithm;
the processing unit is used for obtaining the product between the corresponding operation transformation element and the input residual error element by setting a summation path.
Optionally, the particular transform algorithm comprises a plurality of residual transform algorithms; the operation transformation elements corresponding to the residual transformation algorithms are in a set value range;
the processing unit is further configured to perform parameter configuration according to the operation transformation element corresponding to each residual transformation algorithm to obtain a product between the operation transformation element and the corresponding residual element.
Optionally, the particular transform algorithm comprises at least one group of residual transform algorithms; among the initial transformation matrices corresponding to each group of residual transform algorithms, the initial transformation elements in corresponding rows of the initial transformation matrices have the following characteristics: the absolute values are the same and the order is reversed; in odd-numbered rows the corresponding initial transformation elements have the same sign, and in even-numbered rows they have opposite signs.
Optionally, the transform circuit is further configured to switch from one residual transform algorithm to another residual transform algorithm in each set of residual transform algorithms.
Optionally, the processing unit includes a plurality of shifters, a channel selector corresponding to each shifter, and multiple stages of adders, each adder having two input ends;
the input end of each shifter receives the corresponding residual element, and the output end of each shifter is connected to the shift end of the corresponding channel selector; the zero-setting end of each channel selector receives a zero-setting signal, its control end receives either a shift selection signal, which selects the shift end, or a zero-setting selection signal, which selects the zero-setting end, and its output end is connected to one input end of an adder; the two input ends of the first-stage adder are connected to the output ends of the first two channel selectors, respectively; in each subsequent stage, one input end of the adder is connected to the output end of the preceding-stage adder and the other input end is connected to the output end of the corresponding channel selector; the output end of the last-stage adder outputs the product between the operation transformation element and the residual element.
Optionally, the processing unit further comprises a sign selector; the input end of the sign selector is connected to the output end of the last-stage adder, and the sign selector is used for selecting the positive or negative sign of the product.
Optionally, the operation transformation matrix is determined according to an equivalent matrix of the initial transformation matrix; the equivalent matrix is obtained by reducing each initial transformation element of the initial transformation matrix by a factor of $2^n$ and then rounding each reduced initial transformation element.
Optionally, the value of n is 3; the processing unit comprises a first shifter, a second shifter, a third shifter, a first channel selector corresponding to the first shifter, a second channel selector corresponding to the second shifter, a third channel selector corresponding to the third shifter, a first-stage adder and a second-stage adder;
one input end of the first-stage adder is connected with the output end of the first channel selector, the other input end of the first-stage adder is connected with the output end of the second channel selector, the output end of the first-stage adder is connected with one input end of the second-stage adder, the other input end of the second-stage adder is connected with the output end of the third channel selector, and the output end of the second-stage adder is used for outputting the product between the operation transformation element and the residual error element.
Optionally, if the first matrix type of the initial transformation matrix is larger than the array type of the processing unit in the transformation circuit, the operation transformation matrix includes partial elements of the equivalent matrix.
Another aspect of the present application provides a transformation method, including:
transforming the initial transformation matrix of the specific transformation algorithm to obtain an operation transformation matrix;
and transforming the residual sequence to be transformed, according to the operation transformation matrix, by using any one of the transform circuits described above.
Optionally, the transforming the initial transformation matrix of the specific transformation algorithm to obtain the operation transformation matrix includes:
reducing each initial transformation element of the initial transformation matrix by a factor of $2^n$, and then rounding each reduced initial transformation element to obtain an equivalent matrix;
and determining the operation transformation matrix according to the equivalent matrix.
Optionally, the determining the operation transformation matrix according to the equivalent matrix includes: obtaining at least one operational transformation matrix matched with the transformation circuit according to the equivalent matrix;
the transforming of the residual sequence to be transformed according to the operation transformation matrix by using any one of the transform circuits described above includes: dividing the residual sequence to be transformed into at least one sub-residual sequence according to the equivalent matrix and each operation transformation matrix, and using the transform circuit to transform the corresponding sub-residual sequence according to each operation transformation matrix, so as to transform the residual sequence to be transformed.
Optionally, the transforming process of the sub residual sequence includes:
configuring corresponding processing units in the transformation circuit according to each operation transformation element of the operation transformation matrix so that each processing unit can obtain the product between the corresponding operation transformation element and the residual error element;
and inputting the sub residual sequence into the configured transformation circuit so as to transform the sub residual sequence.
Optionally, the second matrix type of the operation transformation matrix is smaller than or equal to the array type of the processing unit in the transformation circuit;
before configuring the corresponding processing unit in the transform circuit according to each operation transform element of the operation transform matrix, the transform process of the sub-residual sequence further includes:
if the second matrix type is smaller than the array type, selecting, from the transform circuit, a current operation circuit corresponding to the second matrix type, and determining the correspondence between each operation transformation element in the operation transformation matrix and each processing unit in the current operation circuit.
Optionally, the processing units of the transformation circuit form a square matrix array; the current operational circuit comprises an upper left corner array and/or a lower right corner array of the square array.
Optionally, the configuration process of the processing unit includes:
determining, according to the corresponding operation transformation element, the shift parameter of each shifter and the channel selection signal applied to the control end of each channel selector; the channel selection signal is either a shift selection signal, which selects the shift end, or a zero-setting selection signal, which selects the zero-setting end.
Optionally, the process of configuring the processing unit further includes:
the sign of the sign selector is configured according to the sign of the corresponding operation transformation element.
Optionally, the transformation method further comprises:
amplifying the initial transformation result output by the transform circuit by a factor of $2^n$.
Optionally, the transform circuit is a DCT-VIII circuit implementing a DCT-VIII algorithm; the process of obtaining the transformation result corresponding to the DST-VII algorithm by adopting the DCT-VIII circuit comprises the following steps:
inputting the residual elements of the residual sequence to be transformed into the DCT-VIII circuit in reverse order;
and negating the outputs of the DCT-VIII circuit at even-numbered positions to obtain the transformation result corresponding to the DST-VII algorithm.
Another aspect of the present application provides a transform apparatus including any one of the transform circuits described above.
Another aspect of the present application provides an encoder comprising any of the above-described transform circuits.
In the transform circuit, method, apparatus and encoder described above, each processing unit obtains the product between the corresponding operation transformation element and the input residual element by setting a summation path, thereby implementing the residual transformation corresponding to a specific transform algorithm; this reduces both the area and the power consumption of the residual transform circuit. The circuit can switch rapidly among a plurality of residual transform algorithms by configuring each processing unit and/or adjusting the input and output, giving high switching efficiency and utilization. It can also transform several relatively small residual sequences simultaneously and/or transform a relatively large residual sequence through multiple passes, giving high flexibility. As a result, the performance of the corresponding encoder can be improved and its development cost reduced.
Detailed Description
As described in the background, the introduction of the adaptive multi-transform technique yields better energy compaction and compression performance in the transform coding module, but the computational complexity of the overall encoder rises dramatically because two additional transform types, DST-VII and DCT-VIII, are introduced and asymmetrically sized transforms are allowed. A low-cost, high-performance hardware implementation of the transform circuit is therefore crucial for a real-time VVC encoder. A related, highly pipelined transform architecture replaces the original multiplications with shift-accumulate units and supports transform sizes up to 32x32. Although this saves some area, the architecture still uses independent circuits for the different sizes, with no multiplexing among them, so considerable room for area optimization remains.
To solve the above problems, in the transform circuit, method, apparatus and encoder provided by the present application, each processing unit obtains the product between the corresponding operation transformation element and the input residual element by setting a summation path, thereby implementing the residual transformation corresponding to a specific transform algorithm; this reduces both the area and the power consumption of the residual transform circuit. The circuit can switch rapidly among a plurality of residual transform algorithms by configuring each processing unit and/or adjusting the input and output, giving high switching efficiency and utilization. It can also transform several relatively small residual sequences simultaneously and/or transform a relatively large residual sequence through multiple passes, giving high flexibility. As a result, the performance of the corresponding encoder can be improved and its development cost reduced.
The technical solutions in the embodiments of the present application are clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. The following embodiments and their technical features may be combined with each other without conflict.
In a first aspect, the present application provides a transform circuit. Referring to fig. 1, the transform circuit comprises a processing unit corresponding to each operation transformation element in an operation transformation matrix; the operation transformation matrix is obtained by transforming an initial transformation matrix of a specific transform algorithm; the processing unit is used for obtaining the product between the corresponding operation transformation element and the input residual element by setting a summation path. Fig. 1 shows a transform circuit including 8 × 8 processing units, where the operation transformation element corresponding to each small box is $T_{ij}$, and i and j are integers from 0 to 7. As shown in fig. 1, the residual elements of the residual sequence to be transformed, $X^T = [X_0, X_1, X_2, X_3, X_4, X_5, X_6, X_7]^T$, are input into the respective rows of processing units of the transform circuit, so that the processing units can operate on the residual elements to transform the residual sequence. Each processing unit can be configured according to the corresponding operation transformation element in the operation transformation matrix of the specific transform algorithm, and the configured processing units can accurately transform the corresponding residual elements. The transform circuit thus implements a complex residual transformation process with a simple circuit that can be multiplexed by several residual transform algorithms, which reduces the area and power consumption of the residual transform circuit and hence the development cost of the corresponding encoder.
In one embodiment, the particular transform algorithm comprises a plurality of residual transform algorithms; the operation transformation elements corresponding to the residual transform algorithms lie within a set value range, so that, by configuring the parameters of the processing unit corresponding to each operation transformation element, the processing unit can accurately compute the product of the operation transformation element and the corresponding residual element, ensuring the transformation precision. The processing unit is further configured to perform parameter configuration according to the operation transformation element corresponding to each residual transform algorithm to obtain the product between the operation transformation element and the corresponding residual element. Specifically, in each transformation pass the processing unit is configured according to the operation transformation element of one residual transform algorithm and operates on that element and the corresponding residual element, so that the transform circuit transforms the residual sequence with one residual transform algorithm per pass. Every residual transform algorithm can therefore multiplex the same transform circuit, which improves the utilization of the circuit and extends its functionality, improving the encoding capability of the corresponding encoder while reducing its circuit area and development cost. The set value range is determined by the specific structure of each processing unit in the transform circuit and can be a relatively small numerical range, such as 0 to 11, so that the circuit structure of the processing unit is relatively simple.
Specifically, the particular transform algorithm comprises at least one group of residual transform algorithms; each group may comprise two residual transform algorithms, such as the DST-VII algorithm and the DCT-VIII algorithm. Among the initial transformation matrices corresponding to each group, the initial transformation elements in corresponding rows (rows with the same index) have the following characteristics: the absolute values are the same and the order is reversed; in odd-numbered rows the corresponding elements have the same sign, and in even-numbered rows they have opposite signs. In other words, corresponding rows of the two initial transformation matrices contain the same row elements, arranged in forward order in one matrix and in reverse order in the other; if the row is odd-numbered, corresponding elements have the same sign and the same absolute value, and if the row is even-numbered, corresponding elements are opposite numbers, that is, they have opposite signs and the same absolute value. Referring to fig. 2a, A1 is the initial transformation matrix of one residual transform algorithm in a group (the A1 algorithm for short), and A2 is the initial transformation matrix of the other residual transform algorithm in the group (the A2 algorithm for short). The first-row initial transformation elements of A1 are a00, a01, a02 and a03, and those of A2 are a03, a02, a01 and a00: the order is reversed relative to the first row of A1, while corresponding elements have the same sign and the same absolute value. The second-row initial transformation elements of A1 are a10, a11, a12 and a13, and those of A2 are -a13, -a12, -a11 and -a10: the order is reversed relative to the second row of A1, and corresponding elements have opposite signs and equal absolute values. Within corresponding rows, the first element corresponds to the last element, the second element to the second-to-last, the third to the third-to-last, and so on.
Optionally, the transform circuit is further configured to switch from one residual transform algorithm to another within each group of residual transform algorithms. The transform circuit can then implement every residual transform algorithm in a group merely by adjusting its input and output, without additional computing resources, which improves the multiplexing rate and working efficiency of the circuit and reduces the circuit area. For example, if the transform circuit is currently an A1 circuit implementing the A1 algorithm, the transformation result corresponding to the A2 algorithm can be obtained by performing the following two operations on the input and output of the A1 circuit: (1) inputting the residual elements of the residual sequence to be transformed into the A1 circuit in reverse order, and (2) negating the outputs of the A1 circuit at even-numbered positions.
In one example, the transformation principle of the above transform circuit is illustrated by taking the DST-VII algorithm and the DCT-VIII algorithm as examples. Transform coding in video data coding applies a two-dimensional transform to the residual block; because the two-dimensional transform is separable, it can be decomposed into two one-dimensional transforms, namely a row transform and a column transform. The order of the two is interchangeable, i.e. performing the row transform first or the column transform first does not affect the final result. The formula for the one-dimensional transform is $Y = T_N \cdot X^T$, where $T_N$ denotes the initial transformation matrix of the DST-VII or DCT-VIII algorithm, of size N × N, used to transform a residual sequence containing N residual elements, $X^T$ denotes the residual sequence to be transformed, and $Y$ denotes the transformation result. The relation between the DST-VII algorithm and the DCT-VIII algorithm is described here taking N = 4 as an example. The initial transformation matrix $\text{DCT-VIII}_{4\times4}$ of the DCT-VIII algorithm and the initial transformation matrix $\text{DST-VII}_{4\times4}$ of the DST-VII algorithm are shown in fig. 2b. Comparing the two initial transformation matrices shown in fig. 2b, it can be seen that: (1) the initial transformation elements of corresponding rows are in one-to-one correspondence, with the elements of a given row arranged in forward order in one initial transformation matrix and in reverse order in the other; (2) corresponding elements in odd-numbered rows have the same sign, and corresponding elements in even-numbered rows are opposite numbers.
Based on these two characteristics and the one-dimensional transformation process, the transform circuit provided by the present application can implement each residual transform algorithm in a group merely by configuring and/or adjusting its input and output. For example, if the transform circuit is currently a DCT-VIII circuit, the transformation result corresponding to the DST-VII algorithm can be obtained by performing the following two operations on the input and output of the DCT-VIII circuit: (1) inputting the residual elements of the residual sequence to be transformed into the DCT-VIII circuit in reverse order, and (2) negating the outputs of the DCT-VIII circuit at even-numbered positions. Therefore, without any additional computing resources and with only a minor adjustment of the input and output, the DCT-VIII circuit can implement the DST-VII algorithm, which improves the multiplexing rate of the circuit and reduces the circuit area.
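The two operations above can be checked numerically. The following is a minimal sketch, not the patented circuit itself: the 4-point DCT-VIII coefficients are assumed to be those of the VVC integer transform (fig. 2b is not reproduced in this text), and the helper names `derive_dst7`, `matvec` and `dst7_via_dct8_circuit` are hypothetical. Positions are counted from 1, so "even-numbered" corresponds to odd 0-based indices.

```python
# Assumed 4-point DCT-VIII integer matrix (VVC-style coefficients).
DCT8 = [
    [84,  74,  55,  29],
    [74,   0, -74, -74],
    [55, -74, -29,  84],
    [29, -74,  84, -55],
]


def derive_dst7(dct8):
    """Build DST-VII from DCT-VIII using the row relation described above:
    reverse every row, and negate the even-numbered rows (1-based)."""
    return [[-v if i % 2 else v for v in reversed(row)]
            for i, row in enumerate(dct8)]


def matvec(matrix, x):
    """Plain one-dimensional transform Y = T * X."""
    return [sum(c * v for c, v in zip(row, x)) for row in matrix]


def dst7_via_dct8_circuit(x, dct8=DCT8):
    """Reuse the DCT-VIII 'circuit' for DST-VII:
    (1) feed the residuals in reverse order,
    (2) negate the outputs at even-numbered positions (1-based)."""
    y = matvec(dct8, list(reversed(x)))
    return [-v if i % 2 else v for i, v in enumerate(y)]


x = [3, -1, 4, 2]                      # arbitrary residual sequence
direct = matvec(derive_dst7(DCT8), x)  # DST-VII computed with its own matrix
reused = dst7_via_dct8_circuit(x)      # DST-VII computed on the DCT-VIII matrix
print(direct == reused)
```

Only the input order and the output signs change between the two algorithms; the coefficient configuration is untouched, which is why no extra computing resources are needed.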
The purpose of residual transformation in video coding is to concentrate the residual energy in the upper-left corner so that, combined with quantization and entropy coding, higher compression efficiency is obtained. The integer transforms used in VVC (Versatile Video Coding, a new-generation video coding standard) are obtained by scaling and approximating standard transformation matrices. Based on this idea, the present example approximates the transformation matrix in VVC by the product of two matrices; the corresponding matrix transformation process is as follows:
$$T_N \approx \tilde{T}_N = C_N \cdot S_N$$
where $\tilde{T}_N$ denotes the transformation matrix that approximates the initial transformation matrix $T_N$; $C_N$ denotes the equivalent matrix obtained by reducing each initial transformation element of the initial transformation matrix by a factor of $2^n$ and then rounding each reduced element (for example, rounding to nearest or rounding up); and $S_N$ denotes a diagonal matrix whose diagonal elements are all $2^n$. The value of n can be determined according to circuit resources, operation precision and/or processing complexity, and may be, for example, 3. Taking the DCT-VIII algorithm as an example, the matrix transformation process may be:
[Equation: example decomposition of the DCT-VIII initial transformation matrix; not recoverable from the source text]
According to this formula, when the residual transformation corresponding to the DCT-VIII algorithm is implemented, $C_N$ is first used as the operation transformation matrix, and the result is then amplified by a factor of $2^n$ at the output. Because scaling by $2^n$ only requires rearranging wires in the hardware circuit, it introduces no additional area, and the circuit area required by the transform circuit is effectively reduced. From the operation transformation matrix $C_N$ corresponding to the DCT-VIII algorithm, it can be seen that the absolute value of each operation transformation element is between 0 and 11. Although the element values of initial transformation matrices of different sizes differ, after the same approximation each element of the corresponding operation transformation matrix also lies between 0 and 11. Each processing unit in the transform circuit can therefore implement every residual transform algorithm through a corresponding configuration, which makes a multiplexed design across circuits of different sizes possible.
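The approximation step can be illustrated with a short sketch for n = 3. Two things here are assumptions rather than statements of the source: the 4-point DCT-VIII coefficients (taken to be the VVC integer values) and the rounding rule (round half away from zero). The function names are hypothetical.

```python
N_SHIFT = 3  # n = 3, i.e. each element is reduced by a factor of 2**3 = 8

# Assumed 4-point DCT-VIII initial transformation matrix T_4.
T4 = [
    [84,  74,  55,  29],
    [74,   0, -74, -74],
    [55, -74, -29,  84],
    [29, -74,  84, -55],
]


def round_div_pow2(value, n):
    """Divide by 2**n and round half away from zero (assumed rounding rule)."""
    magnitude = (abs(value) + (1 << (n - 1))) >> n
    return -magnitude if value < 0 else magnitude


def equivalent_matrix(t, n=N_SHIFT):
    """C_N: every initial transformation element reduced by 2**n and rounded."""
    return [[round_div_pow2(v, n) for v in row] for row in t]


def reconstruct(c, n=N_SHIFT):
    """C_N * S_N with S_N = 2**n * I, i.e. every element scaled back by 2**n."""
    return [[v * (1 << n) for v in row] for row in c]


C4 = equivalent_matrix(T4)
approx = reconstruct(C4)
max_abs = max(abs(v) for row in C4 for v in row)
max_err = max(abs(a - t) for ra, rt in zip(approx, T4) for a, t in zip(ra, rt))

print(C4[0])     # first row of the operation transformation matrix
print(max_abs)   # largest coefficient magnitude a processing unit must handle
print(max_err)   # worst-case deviation from the initial matrix
```

Under these assumptions every element of `C4` has magnitude at most 11, which matches the value range stated above, and the per-element error of the approximation is bounded by half the scaling factor.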
In one embodiment, referring to fig. 3a, the processing unit includes a plurality of shifters 110, a channel selector 120 corresponding to each shifter 110, and multiple stages of adders 130, each adder 130 having two input ends;
the input end of each shifter 110 receives the corresponding residual element, and its output end is connected to the shift end of the corresponding channel selector 120; the zero-setting end of each channel selector 120 receives a zero-setting signal (the 0 shown in fig. 3a), its control end receives either a shift selection signal, which selects the shift end, or a zero-setting selection signal, which selects the zero-setting end, and its output end is connected to one input end of an adder. For example, in fig. 3a, the control end of each channel selector 120 receives a selection signal mux: when mux is 1, it acts as the shift selection signal and the shift end is selected; when mux is 0, it acts as the zero-setting selection signal and the zero-setting end is selected. The two input ends of the first-stage adder are connected to the output ends of the first two channel selectors, respectively; in each subsequent stage, one input end of the adder is connected to the output end of the preceding-stage adder and the other input end is connected to the output end of the corresponding channel selector; the output end of the last-stage adder outputs the product between the operation transformation element and the residual element.
Optionally, as shown in fig. 3b, the processing unit may further include a sign selector 141; the input end of the sign selector 141 is connected to the output end of the last-stage adder, and the sign selector 141 is used for selecting the sign of the product. Because the output end of the last-stage adder is connected to the sign selector 141, when the operation transformation elements of two residual transform algorithms have opposite signs, a circuit implementing one residual transform algorithm can be quickly switched to a circuit implementing the other, which improves the efficiency with which the transform circuit switches among residual transform algorithms.
Because the operation transformation element corresponding to each residual transform algorithm lies within the set value range, the processing unit can implement the multiplication between the operation transformation element and the residual element using several shifters, the corresponding channel selectors and the adder stages. This embodiment replaces the original multiplication with shifts and additions, which makes each processing unit smaller. In addition, by configuring the shift parameter of each shifter 110, the signal channel selected by each channel selector 120, and the sign channel selected by the sign selector 141, the transform circuit can switch quickly among the residual transform algorithms.
Specifically, the operation transformation matrix is determined according to an equivalent matrix of the initial transformation matrix; the equivalent matrix is the matrix obtained by reducing each initial transformation element of the initial transformation matrix by a factor of $2^n$ and then rounding each reduced initial transformation element; the value of n can be determined according to circuit resources, operation precision, processing complexity and the like. Optionally, the transform circuit may also amplify the initial transformation result by a factor of $2^n$ at the output, so as to ensure the accuracy of the final transformation result.
Optionally, if the first matrix type of the initial transformation matrix (i.e., the number of rows or columns of the corresponding square matrix) is larger than the array type of the processing units in the transform circuit (i.e., the number of rows or columns of the processing unit array), the equivalent matrix may be divided into a plurality of equivalent blocks, and the operation transformation matrix then includes the part of the equivalent matrix corresponding to one equivalent block. The transform circuit is used to transform the corresponding residual elements according to each equivalent block in turn, which ensures that the algorithm is implemented in full.
In one example, n takes the value of 3; referring to fig. 3c, the processing unit includes a first shifter 111, a second shifter 112, a third shifter 113, a first channel selector 121 corresponding to the first shifter 111, a second channel selector 122 corresponding to the second shifter 112, a third channel selector 123 corresponding to the third shifter 113, a first-stage adder 131, and a second-stage adder 132; the first-stage adder 131 has an input terminal connected to the output terminal of the first channel selector 121, another input terminal connected to the output terminal of the second channel selector 122, and an output terminal connected to an input terminal of the second-stage adder 132, another input terminal of the second-stage adder 132 connected to the output terminal of the third channel selector 123, and an output terminal for outputting the product between the operation transformation element and the residual element.
The operation transformation elements of the operation transformation matrix lie within a set value range, such as 0 to 11. In this case each processing unit can produce any output from 0x to 11x using three shifters and two adders, and the reconfigurability of the processing unit lies mainly in the shift control and the output sign control. For the processing unit shown in fig. 3c, shift signals configure the number of bits by which each shifter shifts to the left, and three selection signals mux select among the outputs of the three shifters, so that the product of the residual element and the operation transformation element is obtained through a fixed summation path; finally, the sign selector selects the positive or negative value of the product. For example, when the processing unit implements 10x (where 10 is the coefficient and x is the input), the shift parameters of the first, second and third shifters may be configured as 3, 1 and 0, respectively; after selection by the corresponding channel selectors, the three paths carry 8x, 2x and 0, the summation result is 8x + 2x + 0 = 10x, and when the sign selector selects the positive channel, the multiplication by 10 is implemented.
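The configuration of a single processing unit can be modeled in a few lines. The sketch below is a behavioral model under stated assumptions, not a description of the actual hardware: `configure` and `processing_unit` are hypothetical names, and deriving the shift parameters from the set bits of the coefficient is one possible configuration rule that reproduces the 10x example (shifts 3, 1, 0 with the third path zeroed).

```python
def configure(coeff):
    """Derive (shifts, mux, negate) for a coefficient with |coeff| <= 11.

    Each set bit of |coeff| occupies one shifter path; unused paths keep
    shift 0 and are zeroed by their channel selector (mux = 0)."""
    magnitude = abs(coeff)
    if magnitude > 11:
        raise ValueError("coefficient outside the set value range 0..11")
    bits = [b for b in range(3, -1, -1) if (magnitude >> b) & 1]
    shifts = bits + [0] * (3 - len(bits))
    mux = [1] * len(bits) + [0] * (3 - len(bits))
    return shifts, mux, coeff < 0


def processing_unit(x, shifts, mux, negate):
    """Three shifters, three channel selectors, two adders, one sign selector."""
    lanes = [(x << s) if m else 0 for s, m in zip(shifts, mux)]
    stage1 = lanes[0] + lanes[1]   # first-stage adder
    stage2 = stage1 + lanes[2]     # second-stage adder
    return -stage2 if negate else stage2


shifts, mux, negate = configure(10)
print(shifts, mux, negate)                     # configuration for the 10x example
print(processing_unit(7, shifts, mux, negate)) # 10 * 7 computed by shift and add
```

Every magnitude from 0 to 11 has at most three set bits in binary, which is why three shifter paths and two adders suffice for this range.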
Further, when the transform circuit includes 64 processing units, it may adopt a horizontal-input, vertical-output configuration as shown in fig. 1. That is, 8 residual elements are input from the horizontal direction, each passing sequentially through 8 processing units, so that 64 intermediate results are output; the intermediate results in the vertical direction are accumulated and shifted to obtain the initial transformation result of the corresponding residual element, so that 8 results are output in total. A transform circuit comprising 8 × 8 = 64 processing units may be used directly for 8-point transforms (i.e., a to-be-transformed residual sequence comprising 8 residual elements).
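A minimal sketch of this horizontal-input, vertical-output arrangement is given below. It assumes that residual element i is broadcast along row i and that column j is summed to give output j; the name `transform_array` and the indexing convention `coeffs[i][j]` are illustrative choices, and the post-accumulation shift is left out because its amount is not specified here.

```python
def transform_array(coeffs, residuals):
    """Model a grid of processing units with row inputs and column outputs.

    coeffs[i][j] -- coefficient configured in the unit at row i, column j
    residuals[i] -- residual element fed horizontally into row i
    Returns one accumulated value per column.
    """
    rows, cols = len(coeffs), len(coeffs[0])
    assert len(residuals) == rows
    # Each processing unit produces one intermediate product.
    products = [[coeffs[i][j] * residuals[i] for j in range(cols)]
                for i in range(rows)]
    # Intermediate results are accumulated in the vertical direction.
    return [sum(products[i][j] for i in range(rows)) for j in range(cols)]


# Small 2 x 2 illustration of the same dataflow
outputs = transform_array([[1, 2], [3, 4]], [10, 1])
```

Under this convention an 8 × 8 grid yields 64 intermediate products and 8 column sums, matching the counts given in the text.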
Alternatively, for a single 4-point transform (a to-be-transformed residual sequence including 4 residual elements), part of the processing units in the transform circuit may be selected for the transform in this example, such as the 4 × 4 processing unit array in the upper left corner or in the lower right corner. In this case, the selected processing unit array may be configured according to the corresponding operation transformation elements, and the other processing units output a value of 0. A single 4-point transform can thus be realized conveniently. A plurality of 4 × 4 processing unit arrays can also be used at the same time to realize a plurality of 4-point transforms; for example, two 4-point transforms can be performed simultaneously using the 4 × 4 areas at the upper left corner and the lower right corner, which further improves the transform efficiency.
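The corner-array arrangement can be illustrated with the sketch below, which places two 4 × 4 coefficient sets in the upper-left and lower-right corners of an 8 × 8 grid and sets every other unit to 0. The helper names `place_two_4pt` and `column_sums`, the row-input and column-output convention, and the sample coefficients are assumptions made for illustration only.

```python
def place_two_4pt(m_a, m_b):
    """Build an 8 x 8 coefficient grid: m_a upper-left, m_b lower-right."""
    grid = [[0] * 8 for _ in range(8)]   # unused units output 0
    for i in range(4):
        for j in range(4):
            grid[i][j] = m_a[i][j]
            grid[i + 4][j + 4] = m_b[i][j]
    return grid


def column_sums(grid, x):
    """Row i receives x[i]; each column is accumulated into one output."""
    n = len(grid)
    return [sum(grid[i][j] * x[i] for i in range(n)) for j in range(n)]


# Arbitrary sample coefficients within the 0..11 magnitude range
m_a = [[4, 7, 9, 11], [9, 9, 0, -9], [11, -4, -9, 7], [7, -11, 9, -4]]
m_b = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 0], [-1, -2, -3, -4]]
x_a, x_b = [1, 2, 3, 4], [5, 6, 7, 8]

both = column_sums(place_two_4pt(m_a, m_b), x_a + x_b)
first_only = column_sums(m_a, x_a)
second_only = column_sums(m_b, x_b)
```

Because the off-diagonal 4 × 4 regions are zero, the first four outputs depend only on the first sequence and the last four only on the second, so the two transforms do not interfere.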
Optionally, for residual transforms larger than 8 points, such as a 16-point transform (a to-be-transformed residual sequence including 16 residual elements) or a 32-point transform (a to-be-transformed residual sequence including 32 residual elements), the first matrix type of the initial transformation matrix is larger than the array type of the processing units in the transform circuit. In this case, the equivalent matrix may be divided into a plurality of equivalent blocks, each equivalent block being used in turn as an operation transformation matrix; the transform circuit then transforms the corresponding residual elements according to each equivalent block, so that the whole to-be-transformed residual sequence is transformed by using the transform circuit multiple times. For example, referring to fig. 4, when a 16-point transform is performed using a transform circuit including 8 × 8 = 64 processing units, the to-be-transformed residual sequence may be divided into two sub-residual sequences L0 and L1, each of which is an 8 × 1 input. The 16 × 16 equivalent matrix is divided into four 8 × 8 blocks A, B, C and D, which are the operation transformation matrices adopted in the respective transform processes. The implementation of the 16-point transform can be divided into 4 steps: A × L0, B × L1, C × L0 and D × L1, where each step is completed in one transform cycle; that is, multiplexing the transform circuit 4 times achieves the 16-point transform. Specifically, during the first transform cycle, L0 is input to the transform circuit, which is configured as A at this time, completing A × L0, and the intermediate results can be stored in 8 accumulators.
In the second transform cycle, L1 is input to the transform circuit, which is configured as B at this time, completing B × L1; the intermediate results obtained are accumulated with the intermediate results of A × L0 stored in the accumulators, giving the final transform result for the first 8 coefficients. Similarly, by controlling the input and the configuration of the processing units of the transform circuit, the accumulated result of C × L0 and D × L1, i.e., the final transform result of the last 8 coefficients, can be obtained. With the above block-partitioned matrix multiplication scheme, a 16-point transform can be completed in 4 transform cycles by multiplexing a transform circuit including 8 × 8 processing units; similarly, a 32-point transform can be completed in 16 transform cycles using the same transform circuit.
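The four-cycle schedule can be checked numerically with the sketch below. The 8 × 8 transform circuit is abstracted as a plain matrix-vector product (`matvec`), and the random test matrix stands in for a 16 × 16 equivalent matrix; both are assumptions for illustration rather than part of the described circuit.

```python
import random


def matvec(m, v):
    """Stand-in for one pass through the 8 x 8 transform circuit."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transform_16pt(m16, x16):
    """16-point transform using only 8 x 8 products, as in A, B, C, D."""
    A = [row[:8] for row in m16[:8]]
    B = [row[8:] for row in m16[:8]]
    C = [row[:8] for row in m16[8:]]
    D = [row[8:] for row in m16[8:]]
    L0, L1 = x16[:8], x16[8:]
    out = []
    for left, right in ((A, B), (C, D)):
        acc = matvec(left, L0)                       # cycle 1 (or 3): stored in 8 accumulators
        partial = matvec(right, L1)                  # cycle 2 (or 4)
        acc = [a + p for a, p in zip(acc, partial)]  # accumulate with the stored result
        out.extend(acc)
    return out


rng = random.Random(0)
m16 = [[rng.randint(-11, 11) for _ in range(16)] for _ in range(16)]
x16 = [rng.randint(-255, 255) for _ in range(16)]
blocked = transform_16pt(m16, x16)
direct = matvec(m16, x16)
```

The blocked result equals the direct 16 × 16 product, which is the property the split relies on: each output half is the sum of two 8 × 8 partial products.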
The transform circuit comprises processing units corresponding to the operation transformation elements in the operation transformation matrix, where each processing unit obtains the product between its corresponding operation transformation element and the input residual element through a set summation path, so as to transform the to-be-transformed residual sequence with a specific transformation algorithm; this reduces both the area and the power consumption of the residual transform circuit. By configuring each processing unit and/or adjusting the input and output, the transform circuit can switch rapidly among a plurality of residual transformation algorithms, giving high switching efficiency and utilization. It can also transform a plurality of relatively small residual sequences simultaneously and/or transform a relatively large residual sequence through multiple passes, giving high flexibility. The performance of the encoder can thus be improved and the development cost of the corresponding encoder reduced.
The present application also provides, in a second aspect, a transformation method, as shown with reference to fig. 5, the transformation method including:
S510, transforming the initial transformation matrix of the specific transformation algorithm to obtain an operation transformation matrix. In this step, the initial transformation matrix may be transformed, for example by reduction, to obtain an operation transformation matrix that simplifies the operation process. Optionally, if the initial transformation matrix is reduced to obtain the required operation transformation matrix, the preliminary operation result may be amplified at the output end of the transform circuit to ensure the accuracy of the obtained transformation result.
S520, transforming the to-be-transformed residual sequence using the transform circuit according to any one of the above embodiments.
In the above step, each residual element of the to-be-transformed residual sequence may be input into each row of processing units of the transform circuit, and each processing unit may perform an operation on each residual element to transform the to-be-transformed residual sequence. Specifically, each processing unit can perform parameter configuration according to each operational transformation element in the operational transformation matrix corresponding to the specific transformation algorithm, so that the configured processing unit can be used for accurately transforming the corresponding residual error element, and the transformation circuit can realize a complex residual error transformation process by using a simple circuit. By configuring each processing unit such that a plurality of residual transform algorithms can be multiplexed with the transform circuit, the utilization of the transform circuit can be improved.
In one embodiment, transforming the initial transformation matrix of the specific transformation algorithm to obtain the operation transformation matrix comprises: reducing each initial transformation element of the initial transformation matrix by a factor of 2^n, and rounding each reduced initial transformation element to obtain an equivalent matrix; and determining the operation transformation matrix according to the equivalent matrix. The value of n can be determined according to characteristics such as circuit resources, operation precision and processing complexity. The operation transformation matrix determined in this embodiment can effectively reduce the amount of calculation in the subsequent transform process and simplify the corresponding circuit structure.
Optionally, the above transformation method may further include: amplifying the initial transformation result output from the transform circuit by a factor of 2^n to obtain the transformation result corresponding to the to-be-transformed residual sequence, thereby ensuring the accuracy of the finally obtained transformation result.
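The reduce, round and re-amplify steps can be sketched as follows. The input used here is the integer 4-point DST-VII matrix from HEVC, taken purely as sample data, and n = 3 is an assumed value; the rounding mode (round half away from zero) and the function names are likewise assumptions, since the text does not fix them.

```python
def to_equivalent_matrix(initial, n):
    """Divide every element by 2**n and round half away from zero."""
    half = (1 << n) >> 1
    def reduce_one(v):
        mag = (abs(v) + half) >> n
        return -mag if v < 0 else mag
    return [[reduce_one(v) for v in row] for row in initial]


def transform_with_rescale(equiv, x, n):
    """Multiply by the reduced matrix, then amplify the result by 2**n."""
    return [sum(c * v for c, v in zip(row, x)) << n for row in equiv]


# Sample data: HEVC integer DST-VII 4 x 4 matrix (illustration only)
initial = [
    [29, 55, 74, 84],
    [74, 74, 0, -74],
    [84, -29, -74, 55],
    [55, -84, 74, -29],
]
n = 3  # assumed reduction exponent
equiv = to_equivalent_matrix(initial, n)
approx = transform_with_rescale(equiv, [1, 0, 0, 0], n)
```

For this sample the reduced elements all have magnitude at most 11, so each could be realized as a sum of at most three shifted copies of the input; the re-amplified output approximates, but does not exactly equal, the product with the initial matrix.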
Specifically, the determining the operation transformation matrix according to the equivalent matrix includes: acquiring at least one operational transformation matrix matched with the transformation circuit according to the equivalent matrix; and the second matrix type of the operation transformation matrix is smaller than or equal to the array type of the processing unit in the transformation circuit, which indicates that the operation transformation matrix is matched with the transformation circuit. For example, if the matrix type of the equivalent matrix is less than or equal to the array type of the processing unit in the conversion circuit, the equivalent matrix can be used as an operation conversion matrix; if the matrix type of the equivalent matrix is larger than the array type of the processing unit in the transformation circuit, the equivalent matrix can be divided into a plurality of sub-matrices, each sub-matrix is sequentially used as an operation transformation matrix, the residual sequence to be transformed can be divided into a plurality of sub-residual sequences, the corresponding relation between the sub-matrices and the sub-residual sequences is determined, and the transformation circuit is enabled to respectively adopt each operation transformation matrix to transform corresponding residual elements so as to transform the residual sequence to be transformed.
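The matching rule above can be expressed as a small scheduling helper. This is a sketch assuming square matrices whose size is a multiple of the array size; the name `tiling_schedule` and the (output block, input block) tuple encoding are hypothetical.

```python
def tiling_schedule(points, array_size=8):
    """List the (output_block, input_block) pair handled in each cycle.

    If the equivalent matrix fits the array, it is used directly as the
    single operation transformation matrix; otherwise it is cut into
    array_size x array_size sub-matrices, visited row block by row block.
    """
    if points <= array_size:
        return [(0, 0)]
    if points % array_size != 0:
        raise ValueError("points must be a multiple of array_size")
    blocks = points // array_size
    return [(r, c) for r in range(blocks) for c in range(blocks)]


schedule_16 = tiling_schedule(16)
schedule_32 = tiling_schedule(32)
```

Each tuple `(r, c)` pairs sub-matrix (r, c) with sub-residual sequence c and accumulates into output block r, which is one way of recording the correspondence between sub-matrices and sub-residual sequences.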
Transforming the to-be-transformed residual sequence by the transform circuit according to the operation transformation matrix comprises: dividing the to-be-transformed residual sequence into at least one sub-residual sequence according to the equivalent matrix and each operation transformation matrix, and using the transform circuit to transform the corresponding sub-residual sequence according to each operation transformation matrix, so as to transform the to-be-transformed residual sequence. For example, referring to fig. 4, a 16-point transform is performed using a transform circuit including 64 processing units, and the 16 × 16 equivalent matrix is divided into four 8 × 8 blocks A, B, C and D, which serve as the operation transformation matrices used in the respective transforms; the to-be-transformed residual sequence is divided into two sub-residual sequences L0 and L1, each of which is an 8 × 1 input, and the sub-residual sequences corresponding to A, B, C and D are determined. The implementation of the 16-point transform can be divided into 4 steps: A × L0, B × L1, C × L0 and D × L1, where each step is completed in one transform cycle; that is, multiplexing the transform circuit 4 times achieves the 16-point transform. Specifically, during the first transform cycle, L0 is input to the transform circuit, which is configured as A at this time, completing A × L0, and the intermediate results can be stored in 8 accumulators. In the second transform cycle, L1 is input to the transform circuit, which is configured as B at this time, completing B × L1; the intermediate results obtained are accumulated with the intermediate results of A × L0 stored in the accumulators, giving the final transform result for the first 8 coefficients.
Similarly, by controlling the configuration of the processing unit of the input and transform circuit, the accumulated result of C × L0 and D × L1, i.e., the final transform result of the latter 8 coefficients, can be obtained. With the above-described split matrix multiplication algorithm scheme, 16-point transform can be completed in 4 cycles with multiplexing of transform circuits including 8 × 8 processing units.
In one example, the transformation process of the sub-residual sequence includes: configuring corresponding processing units in the transformation circuit according to each operation transformation element of the operation transformation matrix so that each processing unit can obtain the product between the corresponding operation transformation element and the residual error element; and inputting the sub residual sequence into the configured transformation circuit so as to transform the sub residual sequence.
Specifically, the second matrix type of the operation transformation matrix is less than or equal to the array type of the processing units in the transform circuit; before configuring the corresponding processing units in the transform circuit according to each operation transformation element of the operation transformation matrix, the transform process of the sub-residual sequence further includes: if the second matrix type is smaller than the array type, selecting a current operation circuit corresponding to the second matrix type from the transform circuit, and determining the correspondence between each operation transformation element in the operation transformation matrix and each processing unit in the current operation circuit. Optionally, the other processing units output a value of 0 or another invalid parameter.
Optionally, the processing units of the transformation circuit form a square matrix array; the current operation circuit comprises an upper left corner array and/or a lower right corner array of the square array so as to facilitate the acquisition and processing of the input and/or corresponding output of the residual error elements.
Optionally, if the second matrix type is smaller than the array type, a plurality of current operation circuits corresponding to the second matrix type may be selected from the transform circuit; the correspondence between each operation transformation element in each operation transformation matrix and the processing units is determined, the processing units of all current operation circuits are configured at the same time, and the current operation circuits simultaneously transform their corresponding to-be-transformed residual sequences, so as to improve the transform efficiency. For example, for a 4-point transform, a transform circuit including 8 × 8 = 64 processing units may select the upper-left 4 × 4 processing units and the lower-right 4 × 4 processing units as 2 current operation circuits, each of which implements one 4-point transform, so that 2 transforms are performed simultaneously.
In one example, the process of configuring the processing unit includes: determining the shift parameters of each shifter and the channel selection signals accessed by the control end of each channel selector according to the corresponding operation transformation elements so that the processing unit obtains the products of the corresponding operation transformation elements and the residual error elements; the channel selection signal comprises a shift selection signal for representing an optional shift end or a zero setting selection signal for representing an optional zero setting end.
Further, the process of configuring the processing unit further comprises: and configuring the positive and negative signs of the sign selector according to the signs of the corresponding operation transformation elements so as to more accurately represent the characteristics of the corresponding operation transformation elements.
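One possible way to derive such a configuration from a coefficient is sketched below. It assumes three shifters with 3-bit shift amounts and decomposes the coefficient magnitude into its binary digits; the function names, the tuple layout of the configuration, and the use of False to denote the zero-setting selection are all hypothetical.

```python
def configure_unit(coeff, num_shifters=3, max_shift=7):
    """Return (shifts, selects, negative) for an integer coefficient.

    The magnitude is written as a sum of powers of two; each set bit
    uses one shifter, and unused channels select the zero-setting end.
    """
    magnitude = abs(coeff)
    shifts = [s for s in range(max_shift, -1, -1) if (magnitude >> s) & 1]
    if magnitude >> (max_shift + 1) or len(shifts) > num_shifters:
        raise ValueError(f"{coeff} needs more than {num_shifters} shifted terms")
    unused = num_shifters - len(shifts)
    selects = [True] * len(shifts) + [False] * unused
    shifts = shifts + [0] * unused
    return tuple(shifts), tuple(selects), coeff < 0


def apply_config(x, config):
    """Evaluate the configured unit on input x (shift, select, add, sign)."""
    shifts, selects, negative = config
    total = sum((x << s) for s, sel in zip(shifts, selects) if sel)
    return -total if negative else total


config_10 = configure_unit(10)
product = apply_config(7, configure_unit(-11))
```

Every integer from -11 to 11 has at most three set bits in its magnitude, so all coefficients in that range can be configured this way; a value such as 15 would need a fourth shifted term and is rejected.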
In one embodiment, the transform circuit is a DCT-VIII circuit implementing the DCT-VIII algorithm; the process of obtaining the transformation result corresponding to the DST-VII algorithm using the DCT-VIII circuit comprises: inputting the residual elements of the to-be-transformed residual sequence into the DCT-VIII circuit in reverse order; and negating the output results of the DCT-VIII circuit at even-numbered positions to obtain the transformation result corresponding to the DST-VII algorithm.
According to the embodiment, under the condition that no extra computing resource is added, only the input and output conditions need to be finely adjusted, the function of the DST-VII algorithm can be realized by using the DCT-VIII circuit, the multiplexing rate of the circuit is improved, and the circuit area is reduced.
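The input-reversal and sign-flip relationship can be checked numerically with the sketch below. It uses the textbook floating-point basis functions of DCT-VIII and DST-VII rather than any integer matrices of this application, which is an assumption; positions are counted from 1, so the even-numbered positions correspond to odd 0-based indices in the code.

```python
import math


def dct8_matrix(n):
    """Floating-point DCT-VIII basis (rows are basis functions)."""
    s = math.sqrt(4.0 / (2 * n + 1))
    return [[s * math.cos(math.pi * (2 * k + 1) * (2 * j + 1) / (4 * n + 2))
             for j in range(n)] for k in range(n)]


def dst7_matrix(n):
    """Floating-point DST-VII basis (rows are basis functions)."""
    s = math.sqrt(4.0 / (2 * n + 1))
    return [[s * math.sin(math.pi * (2 * k + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for k in range(n)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dst7_via_dct8(x):
    """DST-VII result obtained from a DCT-VIII transform only."""
    y = matvec(dct8_matrix(len(x)), x[::-1])       # feed input in reverse order
    # Negate the 2nd, 4th, ... outputs (odd 0-based indices).
    return [-v if k % 2 == 1 else v for k, v in enumerate(y)]


x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
reused = dst7_via_dct8(x)
direct = matvec(dst7_matrix(len(x)), x)
```

The two results agree to floating-point precision because row k of the DCT-VIII basis, read in reverse order, equals row k of the DST-VII basis multiplied by (-1)^k.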
The transformation method is implemented using the transform circuit provided in any of the above embodiments. For a specific description of the transformation method, reference may be made to the corresponding description of the transform circuit in the foregoing embodiments; the transformation method has all the beneficial effects of the transform circuit, which are not described herein again.
The present application also provides, in a third aspect, a conversion apparatus comprising at least one conversion circuit according to any one of the above embodiments.
The transformation device comprises at least one transform circuit, can realize the residual transforms corresponding to a plurality of residual transformation algorithms, and reduces the area and power consumption of the residual transform circuitry; when the transformation device comprises a plurality of transform circuits, the transform circuits can execute residual transforms simultaneously, which increases the transform speed and improves the encoding efficiency of the corresponding encoder.
The present application also provides, in a fourth aspect, an encoder comprising a transform circuit as described in any of the above embodiments.
The encoder uses the transform circuit provided by any one of the above embodiments to perform the residual transform, so the circuit used for the residual transform has a small area and low power consumption; by configuring each processing unit and/or adjusting the input and output, the encoder can switch rapidly among a plurality of residual transformation algorithms, giving high flexibility; the performance of the encoder is improved and the development cost is reduced.
Although the application has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art based upon a reading and understanding of this specification and the annexed drawings. This application is intended to embrace all such modifications and variations and is limited only by the scope of the appended claims. In particular regard to the various functions performed by the above described components, the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., that is functionally equivalent), even though not structurally equivalent to the disclosed structure which performs the function in the herein illustrated exemplary implementations of the specification.
That is, the above description is only an embodiment of the present application, and not intended to limit the scope of the present application, and all equivalent structures or equivalent flow transformations made by using the contents of the specification and the drawings, such as mutual combination of technical features between various embodiments, or direct or indirect application to other related technical fields, are included in the scope of the present application.
In addition, structural elements having the same or similar characteristics may be identified by the same or different reference numerals. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more features. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In this application, the word "exemplary" is used to mean "serving as an example, instance, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. The previous description is provided to enable any person skilled in the art to make and use the present application. In the foregoing description, various details have been set forth for the purpose of explanation. It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known structures and processes are not shown in detail to avoid obscuring the description of the present application with unnecessary detail. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.