WO2018171596A1 - Video encoding method, video decoding method and corresponding device - Google Patents
Video encoding method, video decoding method and corresponding device
- Publication number
- WO2018171596A1 WO2018171596A1 PCT/CN2018/079699 CN2018079699W WO2018171596A1 WO 2018171596 A1 WO2018171596 A1 WO 2018171596A1 CN 2018079699 W CN2018079699 W CN 2018079699W WO 2018171596 A1 WO2018171596 A1 WO 2018171596A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frame
- video
- scene
- feature
- residual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/179—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scene or a shot
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/593—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/625—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
Definitions
- the present invention relates to the field of video frame processing, and in particular, to a video encoding method, a video decoding method, a video encoding device, a video decoding device, and a video encoding and decoding device.
- HEVC (High Efficiency Video Coding) predictive coding uses both intra-frame compression and inter-frame compression.
- the GOP (Group of Pictures) is a group composed of a plurality of frames. To limit the effect of motion changes, the number of frames in a GOP should not be set too large.
- HEVC divides all frames into three types of frames: I, P, and B, as shown in Figure 1. The numbers above the frames in the figure indicate the number of the corresponding frame in the original video sequence.
- the I frame, the P frame, and the B frame are encoded in units of GOP.
- an I frame (Intra-frame), also known as an intra-coded frame, is an independent frame containing all of its own information, and can be independently encoded and decoded without reference to other images.
- the existing I frame of the HEVC standard is encoded and decoded using only the intra-frame image information of the current I frame, and I frames are selected along the video time axis by a fixed strategy.
- as a result, the amount of compressed data of independently encoded I frames is high, and there is a large amount of information redundancy between I frames.
- the embodiments of the present invention provide a video encoding method, a video decoding method, a video encoding device, a video decoding device, and a video encoding and decoding device, which are used to improve the compression efficiency of a video frame.
- a first aspect of the embodiments of the present invention provides a video encoding method, the method comprising: acquiring a plurality of video frames, where each of the plurality of video frames includes redundant data on the picture content. Then, the multiple video frames are reconstructed to obtain scene information and a reconstruction residual of each video frame, where the scene information includes data obtained by reducing the redundancy of the redundant data, and the reconstructed residual is used to represent the difference between the video frame and the scene information, such that the redundant data of the plurality of video frames is reduced by the reconstruction. Subsequently, the scene information is predictively coded to obtain scene feature prediction coded data, and the reconstructed residual is predictively coded to obtain residual prediction coded data.
- the redundancy of the video frames can therefore be reduced, so that in the encoding operation, the total amount of compressed data for the obtained scene features and reconstructed residuals is smaller than the amount of compressed data for the original video frames, reducing the amount of data obtained after compression.
- Each video frame is reconstructed into a scene feature and a reconstructed residual. Since the reconstructed residual contains only the residual information beyond the scene information, its amount of information is small and it is sparse, so it can be predictively coded with relatively few codewords; the amount of encoded data is small and the compression ratio is high.
- the method of the embodiment of the present invention can effectively improve the compression efficiency of a video frame.
- In a possible implementation, each of the multiple video frames includes the same picture content, and this same picture content is the redundant data of the plurality of video frames.
- Reconstructing a plurality of video frames to obtain scene information and a reconstruction residual of each video frame comprising: reconstructing a plurality of video frames to obtain scene features and reconstruction residuals of each video frame, The scene feature is used to represent the same picture content between each video frame, and the reconstructed residual is used to represent the difference between the video frame and the scene feature.
- the scene feature is one of the specific forms of scene information.
- the scene information is predictively encoded, and the scene feature prediction encoded data is obtained, including: predicting and encoding the scene features, and obtaining scene feature prediction encoded data.
- the method of the embodiment of the present invention can effectively improve the compression efficiency of a video frame.
- In a possible implementation, reconstructing multiple video frames to obtain a scene feature and a reconstruction residual of each video frame includes: converting the multiple video frames into an observation matrix, where the observation matrix represents the multiple video frames in matrix form. Then, the observation matrix is reconstructed according to a first constraint condition to obtain a scene feature matrix and a reconstructed residual matrix.
- the scene feature matrix represents the scene features in matrix form, the reconstructed residual matrix represents the reconstructed residuals of the plurality of video frames in matrix form, and the first constraint condition requires the scene feature matrix to be low rank and the reconstructed residual matrix to be sparse.
- In this way, the reconstruction operation on the plurality of video frames is performed in matrix form, and under the first constraint condition the reconstructed residual and the scene feature meet the preset requirements, which is advantageous for reducing the coding amount in the subsequent encoding operation and increasing the compression ratio.
- In a possible implementation, reconstructing the observation matrix according to the first constraint condition to obtain the scene feature matrix and the reconstructed residual matrix includes: calculating the scene feature matrix and the reconstructed residual matrix according to a first preset formula, wherein the obtained scene feature matrix is a low-rank matrix and the reconstructed residual matrix is a sparse matrix.
- the first preset formula is (the robust principal component analysis form implied by the definitions below): min rank(F) + λ‖E‖₀ subject to D = F + E, relaxed to min ‖F‖∗ + λ‖E‖₁ subject to D = F + E.
- Both groups of formulas include a target constraint function and a reconstruction constraint. Because the former group (the rank/ℓ₀ problem) is NP-hard, a relaxation operation is performed to obtain the latter group, which is convex and convenient to solve.
- D is the observation matrix
- F is the scene feature matrix
- E is the reconstructed residual matrix
- ⁇ is the weight parameter
- ⁇ is used to balance the relationship between the scene feature matrix F and the reconstructed residual matrix E.
- ‖·‖₁ is the matrix L1 norm, and ‖·‖∗ is the matrix nuclear norm.
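- As a concrete illustration of this first constraint, the following is a minimal sketch (not taken from the patent; all parameter choices are assumptions) of a robust-PCA style decomposition: the observation matrix D, with one vectorized video frame per column, is split into a low-rank scene feature matrix F and a sparse reconstructed residual matrix E by a simple inexact augmented-Lagrangian iteration.

```python
import numpy as np

def soft_threshold(X, tau):
    """Element-wise shrinkage operator for the L1 term."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular-value shrinkage operator for the nuclear-norm term."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def rpca(D, lam=None, mu=None, n_iter=200, tol=1e-7):
    """Split D into a low-rank part F (scene feature) and a sparse part E (residual)."""
    D = np.asarray(D, dtype=float)
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 0.25 * m * n / (np.sum(np.abs(D)) + 1e-12)
    F = np.zeros_like(D)
    E = np.zeros_like(D)
    Y = np.zeros_like(D)                       # Lagrange multipliers
    norm_D = np.linalg.norm(D, 'fro') + 1e-12
    for _ in range(n_iter):
        F = svd_threshold(D - E + Y / mu, 1.0 / mu)
        E = soft_threshold(D - F + Y / mu, lam / mu)
        residual = D - F - E
        Y = Y + mu * residual
        if np.linalg.norm(residual, 'fro') / norm_D < tol:
            break
    return F, E
```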
- In a possible implementation, before the multiple video frames are reconstructed, the method further includes: extracting picture feature information of each of the plurality of video frames; and then calculating content metric information according to the picture feature information, where the content metric information is used to measure the difference in picture content of the plurality of video frames. Then, when the content metric information is not greater than a preset metric threshold, the step of reconstructing the plurality of video frames to obtain a scene feature and a reconstruction residual of each video frame is performed.
- In this way, the reconstruction operations of the first to third implementations of the first aspect are performed only on video frames that meet the requirements, ensuring the normal execution of the reconstruction operation.
- In a possible implementation, the picture feature information is a global GIST feature, the preset metric threshold is a preset variance threshold, and calculating the content metric information according to the picture feature information includes: calculating the GIST feature variance according to the global GIST feature.
- In this way, the GIST feature variance of the plurality of video frames is calculated to measure the content consistency of the plurality of video frames, and the reconstruction of the first to third implementations of the first aspect is then performed accordingly.
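- A minimal sketch of the content-metric test just described, assuming each frame has already been described by a GIST-like global feature vector (the descriptor computation itself is not shown, and the threshold value is an illustrative assumption): a small per-dimension variance across frames indicates that the frames share the same global picture content, so the whole-frame reconstruction path is taken.

```python
import numpy as np

def content_metric(features):
    """features: (num_frames, feature_dim) array of global descriptors, one row per frame."""
    return float(np.mean(np.var(features, axis=0)))

def use_whole_frame_reconstruction(features, variance_threshold=0.05):
    # Content metric not greater than the threshold -> frames are globally consistent.
    return content_metric(features) <= variance_threshold
```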
- In a possible implementation, acquiring multiple video frames includes: obtaining a video stream, where the video frames of the video stream include I frames, B frames, and P frames; and then extracting the I frames from the video stream, where the I frames are used to perform the step of reconstructing a plurality of video frames to obtain scene features and reconstruction residuals of each video frame.
- the method of the implementation manner further includes: reconstructing according to the scene feature and the reconstructed residual to obtain a reference frame.
- the B frame and the P frame are inter-predictive-coded to obtain B-frame predictive coded data and P-frame predictive coded data.
- the predictive coded data is subjected to transform coding, quantization coding, and entropy coding to obtain video compressed data;
- the predictive coded data includes scene feature prediction coded data, residual prediction coded data, B frame predictive coded data, and P frame predictive coded data.
- In this way, the I frames of the video stream can be reconstructed and encoded using the method of this implementation, reducing the amount of encoded data of the I frames and the redundant data of the I frames.
- In a possible implementation, each of the multiple video frames includes redundant data at a local location, and the corresponding reconstruction operation differs from the foregoing implementation. That is, reconstructing multiple video frames to obtain scene information and reconstruction residuals of each video frame includes: splitting each video frame of the multiple video frames to obtain a plurality of frame sub-blocks, where the frame sub-blocks obtained after splitting include redundant data, and some frame sub-blocks can be obtained based on other frame sub-blocks.
- the so-called frame sub-block is the frame content of a partial area of the video frame.
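- The splitting step can be pictured with the following sketch (the block size and the assumption that the frame dimensions are divisible by it are illustrative, not from the patent): each frame is cut into fixed-size tiles and every tile is vectorized into one column of an observation matrix, so that the later reconstruction operates on sub-blocks rather than whole frames.

```python
import numpy as np

def split_into_subblocks(frame, block=16):
    """frame: (H, W) array with H and W divisible by `block`. Returns (block*block, n_blocks)."""
    h, w = frame.shape
    cols = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            cols.append(frame[y:y + block, x:x + block].reshape(-1))
    return np.stack(cols, axis=1)

def observation_matrix(frames, block=16):
    """Stack the sub-blocks of every frame column-wise into one observation matrix."""
    return np.concatenate([split_into_subblocks(f, block) for f in frames], axis=1)
```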
- Then, the plurality of frame sub-blocks are reconstructed to obtain a scene feature, a representation coefficient of each of the plurality of frame sub-blocks, and a reconstruction residual of each frame sub-block, where the scene feature includes multiple independent scene feature bases.
- the independent scene feature bases within the scene feature cannot be reconstructed from each other.
- the scene feature base is used to describe the picture content feature of a frame sub-block.
- the representation coefficient represents the correspondence between the scene feature base and the frame sub-block.
- the reconstructed residual represents the difference between the frame sub-block and the scene feature base.
- the scene feature of the implementation manner is one of the specific forms of the scene information, which can reduce the redundancy between the partially redundant video frames.
- the scene information is predictively encoded, and the scene feature prediction encoded data is obtained, including: predicting and encoding the scene features, and obtaining scene feature prediction encoded data.
- In a possible implementation, reconstructing the multiple frame sub-blocks to obtain a scene feature, a representation coefficient of each frame sub-block, and a reconstruction residual of each frame sub-block includes: reconstructing the plurality of frame sub-blocks to obtain the representation coefficient of each of the plurality of frame sub-blocks and the reconstruction residual of each frame sub-block.
- the representation coefficient represents a correspondence between a frame sub-block and a target frame sub-block
- the target frame sub-block is an independent frame sub-block among the plurality of frame sub-blocks
- the independent frame sub-block is a frame sub-block that is not reconstructed based on other frame sub-blocks of the plurality of frame sub-blocks, and the reconstruction residual is used to represent the difference between the target frame sub-block and the frame sub-block.
- Then, the plurality of target frame sub-blocks indicated by the representation coefficients are combined to obtain the scene feature.
- the target frame sub-block is a scene feature base.
- In this way, the target frame sub-blocks that can be independently represented are selected, and the frame sub-blocks that cannot be independently represented are expressed by the target sub-blocks and the reconstructed residuals, thereby reducing the redundant data between the non-independently represented sub-blocks and the target sub-blocks. During encoding, only the target frame sub-blocks and the reconstructed residuals need to be encoded, reducing the amount of coding.
- In a possible implementation, reconstructing the multiple frame sub-blocks to obtain the representation coefficient of each frame sub-block and the reconstruction residual of each frame sub-block includes: converting the plurality of frame sub-blocks into an observation matrix, where the observation matrix represents the plurality of frame sub-blocks in matrix form. Then, the observation matrix is reconstructed according to a second constraint condition to obtain the representation coefficient matrix and the reconstructed residual matrix.
- the representation coefficient matrix is a matrix containing the representation coefficients of each of the plurality of frame sub-blocks, where the non-zero coefficients indicate the target frame sub-blocks; the reconstructed residual matrix represents the reconstructed residual of each frame sub-block in matrix form; and the second constraint condition requires the low rank and the sparsity of the representation coefficients to meet preset requirements.
- Combining the plurality of target frame sub-blocks indicated by the representation coefficients to obtain the scene feature includes: combining the target frame sub-blocks indicated by the non-zero coefficients of the representation coefficient matrix to obtain the scene feature.
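- A hedged sketch of that selection step, assuming a representation coefficient matrix C in which row i holds the coefficients with which sub-block i helps represent the other sub-blocks (the tolerance is an illustrative assumption): sub-blocks whose rows carry non-zero coefficients act as target frame sub-blocks, and the corresponding columns of the observation matrix are gathered as the scene feature.

```python
import numpy as np

def select_scene_feature(D, C, eps=1e-6):
    """D: observation matrix (one sub-block per column); C: representation coefficient matrix."""
    is_target = np.any(np.abs(C) > eps, axis=1)   # rows with non-zero coefficients
    return D[:, is_target]                        # target sub-blocks form the scene feature
```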
- In a possible implementation, reconstructing the observation matrix according to the second constraint condition to obtain the representation coefficient matrix and the reconstructed residual matrix includes: calculating the representation coefficient matrix and the reconstructed residual matrix according to a second preset formula, where the second preset formula is:
- In a possible implementation, reconstructing the multiple frame sub-blocks to obtain a scene feature, the representation coefficient of each frame sub-block, and the reconstruction residual of each frame sub-block includes: reconstructing the plurality of frame sub-blocks to obtain a scene feature and the representation coefficient of each of the plurality of frame sub-blocks, where the scene feature includes scene feature bases that are independent feature blocks in the feature space, and an independent feature block is a feature block that cannot be reconstructed from other feature blocks in the scene feature.
- Then, the reconstructed residual of each frame sub-block is calculated according to each frame sub-block and the data reconstructed from the scene feature and the representation coefficient of that frame sub-block.
- In this way, a scene feature that can represent the plurality of frame sub-blocks as a whole is obtained by reconstruction; the scene feature is composed of scene feature bases, each of which is an independent feature block in the feature space. If the reconstruction of different frame sub-blocks yields the same feature block, that feature block need not be saved repeatedly in the scene feature, thereby reducing redundant data.
- In a possible implementation, reconstructing the multiple frame sub-blocks to obtain the scene feature and the representation coefficient of each frame sub-block includes: converting the plurality of frame sub-blocks into an observation matrix, where the observation matrix represents the plurality of frame sub-blocks in matrix form; and then reconstructing the observation matrix according to a third constraint condition to obtain the representation coefficient matrix and the scene feature matrix.
- the representation coefficient matrix is a matrix containing the representation coefficients of each frame sub-block, where the non-zero coefficients indicate the scene feature bases; the scene feature matrix represents the scene feature in matrix form; and the third constraint condition requires that the similarity between the picture reconstructed from the representation coefficient matrix and the scene feature matrix and the picture of the frame sub-block meets a preset similarity threshold, that the sparsity of the representation coefficient matrix meets a preset sparsity threshold, and that the amount of data of the scene feature matrix is less than a preset data amount threshold.
- Calculating the reconstructed residual of each frame sub-block includes: calculating a reconstructed residual matrix according to the observation matrix and the data reconstructed from the representation coefficient matrix and the scene feature matrix, where the reconstructed residual matrix represents the reconstructed residuals in matrix form.
- the reconstruction operation can be performed in the form of a matrix, and the representation coefficients and scene features that meet the requirements for reducing the coding amount are calculated by using the third constraint condition.
- In a possible implementation, reconstructing the observation matrix according to the third constraint condition to obtain the representation coefficient matrix and the scene feature matrix includes: calculating the representation coefficient matrix and the scene feature matrix according to a third preset formula, where the third preset formula is:
- D is the observation matrix
- C is the coefficient matrix
- F is the scene feature
- ⁇ and ⁇ are the weight parameters, which are used to adjust the coefficient sparsity and low rank. Represents the optimal value of F and C, ie the formula The value of F and C when the value is minimum.
- In a possible implementation, before each of the multiple video frames is split to obtain a plurality of frame sub-blocks, the method further includes: extracting picture feature information of each of the plurality of video frames; then calculating content metric information based on the picture feature information, where the content metric information is used to measure the difference in picture content of the plurality of video frames; and, when the content metric information is greater than the preset metric threshold, performing the step of splitting each of the plurality of video frames to obtain a plurality of frame sub-blocks. In this way, content metric information greater than the preset metric threshold indicates that the pictures of the plurality of video frames have redundant data locally, so the method of splitting the video frames and reconstructing the frame sub-blocks is used.
- In a possible implementation, the picture feature information is a global GIST feature, the preset metric threshold is a preset variance threshold, and calculating the content metric information according to the picture feature information includes: calculating the GIST feature variance according to the global GIST feature.
- In this way, the variance of the GIST features of the plurality of video frames is calculated to measure their content consistency, thereby determining whether the pictures of the plurality of video frames have redundant data locally, so as to decide whether to apply the method of splitting the video frames and reconstructing the frame sub-blocks.
- In a possible implementation, acquiring multiple video frames includes: obtaining a video stream, where the video frames of the video stream include I frames, B frames, and P frames; and extracting the I frames from the video stream, where the I frames are used to perform the step of splitting each of the multiple video frames to obtain a plurality of frame sub-blocks.
- In addition, the method of this implementation further includes: performing reconstruction according to the scene feature, the representation coefficients, and the reconstructed residual to obtain a reference frame; using the reference frame as a reference, performing inter-frame predictive coding on the B frames and the P frames to obtain B frame predictive coded data and P frame predictive coded data; and performing transform coding, quantization coding, and entropy coding on the predictive coded data to obtain video compressed data.
- the predictive coded data includes scene feature predictive coded data, residual predictive coded data, B frame predictive coded data, and P Frame prediction encoded data.
- the method of the present implementation can be applied to key frames of a video stream, reducing redundant data and coding amount of key frames.
- In a possible implementation, the method further includes: classifying the plurality of video frames based on the correlation of the picture content to obtain video frames of one or more classification clusters, where the video frames of the same classification cluster are used to perform the step of reconstructing multiple video frames to obtain scene information and a reconstruction residual of each video frame.
- In a possible implementation, classifying the multiple video frames according to the correlation of the picture content to obtain video frames of one or more clusters includes: extracting feature information of each of the plurality of video frames; determining the clustering distance between any two video frames according to the feature information, where the clustering distance represents the similarity between the two video frames; and clustering the video frames according to the clustering distance to obtain the video frames of one or more clusters. In this way, the classification operation on the multiple video frames is realized by clustering.
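- A hedged sketch of such a clustering pass (Euclidean distance, running-mean cluster centres and the threshold value are illustrative assumptions): a frame joins an existing cluster when its feature vector is within the clustering-distance threshold of that cluster's centre, and otherwise starts a new cluster.

```python
import numpy as np

def cluster_frames(features, distance_threshold=1.0):
    """features: (num_frames, dim) ndarray. Returns a list of clusters, each a list of frame indices."""
    centers, clusters = [], []
    for idx, f in enumerate(features):
        if centers:
            dists = [np.linalg.norm(f - c) for c in centers]
            best = int(np.argmin(dists))
        if centers and dists[best] <= distance_threshold:
            clusters[best].append(idx)
            centers[best] = np.mean(features[clusters[best]], axis=0)  # update cluster centre
        else:
            centers.append(f.astype(float).copy())
            clusters.append([idx])
    return clusters
```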
- In a possible implementation, acquiring a plurality of video frames includes: acquiring a video stream, where the video stream includes multiple video frames; extracting feature information of a first video frame and a second video frame, where the feature information describes the picture content of a video frame and the first and second video frames are video frames in the video stream; calculating the lens distance between the first video frame and the second video frame; and determining whether the lens distance is greater than a preset lens threshold. If the lens distance is greater than the preset lens threshold, a target lens is segmented from the video stream, where the starting frame of the target lens is the first video frame and the end frame of the target lens is the video frame preceding the second video frame; if the lens distance is less than the preset lens threshold, the first video frame and the second video frame are attributed to the same lens. The target lens is one of the lenses of the video stream, and a lens is a segment of temporally continuous video frames.
- Then, key frames are extracted from each lens such that the frame distance between two adjacent key frames is greater than a preset frame distance threshold, where the frame distance indicates the degree of difference between two video frames, and the key frames of each shot are used to perform the step of reconstructing a plurality of video frames to obtain scene information and a reconstructed residual of each video frame. In this way, after the lens segmentation, the key frames are extracted from the respective shots according to the frame distance; such an extraction method uses the context information of the video stream, and the method of this implementation can be applied to a video stream.
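- A hedged sketch of the lens segmentation logic just described (the feature extraction is assumed to be done elsewhere, and the lens threshold is an illustrative value): each frame's feature vector is compared against the first frame of the current lens, and a new lens starts once the lens distance exceeds the preset threshold.

```python
import numpy as np

def segment_shots(features, lens_threshold=2.0):
    """features: (num_frames, dim) ndarray. Returns (start, end) index pairs, end exclusive."""
    shots, start = [], 0
    for i in range(1, len(features)):
        if np.linalg.norm(features[i] - features[start]) > lens_threshold:
            shots.append((start, i))   # the previous frame closes the current lens
            start = i
    shots.append((start, len(features)))
    return shots
```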
- In a possible implementation, the method further comprises: performing discriminant training according to each shot segmented from the video stream to obtain a plurality of classifiers, one corresponding to each shot; discriminating a target video frame by using a target classifier to obtain a discriminant score, where the target classifier is one of the plurality of classifiers, the target video frame is one of the key frames, and the discriminant score indicates the extent to which the target video frame belongs to the scene of the shot to which the target classifier corresponds; when the discriminant score is greater than a preset score threshold, determining that the target video frame belongs to the same scene as the shot to which the target classifier corresponds; and determining the video frames of one or more clusters according to the video frames that belong to the same scene as the shots.
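- A hedged sketch of the per-shot discriminant step, using scikit-learn's LinearSVC purely for illustration (the classifier choice, labels and score threshold are assumptions): one classifier is trained per shot with that shot's frames as positives and the frames of the other shots as negatives, and a key frame is assigned to the same scene as a shot when the classifier's discriminant score clears the threshold.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_shot_classifiers(features_by_shot):
    """features_by_shot: list of (n_i, dim) arrays, one per shot (at least two shots assumed)."""
    classifiers = []
    for i, pos in enumerate(features_by_shot):
        neg = np.concatenate([f for j, f in enumerate(features_by_shot) if j != i])
        X = np.concatenate([pos, neg])
        y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
        classifiers.append(LinearSVC().fit(X, y))
    return classifiers

def belongs_to_shot_scene(classifier, key_frame_feature, score_threshold=0.0):
    """Discriminant score above the threshold -> same scene as the classifier's shot."""
    return classifier.decision_function(key_frame_feature.reshape(1, -1))[0] > score_threshold
```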
- In a possible implementation, acquiring a plurality of video frames includes: acquiring a compressed video stream, where the compressed video stream includes compressed video frames; determining a plurality of target video frames from the compressed video stream, where a target video frame is an independently compressed and encoded video frame in the compressed video stream; and decoding the target video frames to obtain decoded target video frames, where the decoded target video frames are used to perform the step of splitting each of the plurality of video frames to obtain a plurality of frame sub-blocks.
- a second aspect of the embodiments of the present invention provides a video decoding method, which includes: acquiring scene feature prediction encoded data and residual prediction encoded data. Then, the scene feature prediction encoded data is decoded to obtain scene information, where the scene information includes data obtained by reducing the redundancy of redundant data, and the redundant data is redundant data on the picture content between each of the plurality of video frames.
- the residual prediction encoded data is decoded to obtain a reconstructed residual, and the reconstructed residual is used to represent the difference between the video frame and the scene information.
- the reconstruction is performed according to the scene information and the reconstructed residual, and multiple video frames are obtained. In this way, the scene feature prediction coded data and the residual prediction coded data obtained by the video coding method provided by the first aspect can be decoded by the video decoding method of the implementation manner.
- In a possible implementation, each of the multiple video frames includes the same picture content, and decoding the scene feature prediction encoded data to obtain scene information includes: decoding the scene feature prediction encoded data to obtain scene features, where the scene features represent the same picture content shared by each video frame.
- Reconstructing according to the scene information and the reconstructed residual obtaining multiple video frames, including: reconstructing according to the scene feature and the reconstructed residual, to obtain multiple video frames.
- the scene feature information can be decoded by this implementation.
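- For the whole-frame case, the decoder-side reconstruction can be sketched as follows (the frame geometry is an assumption): each decoded video frame is the corresponding column of the scene feature matrix plus the corresponding column of the reconstructed residual matrix, reshaped back to the frame size.

```python
import numpy as np

def reconstruct_frames(F, E, height, width):
    """F, E: (height*width, num_frames) matrices recovered from the decoded data."""
    return [(F[:, i] + E[:, i]).reshape(height, width) for i in range(F.shape[1])]
```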
- In a possible implementation, acquiring the scene feature prediction encoded data and the residual prediction encoded data includes: acquiring video compressed data; and performing entropy decoding, inverse quantization, and inverse DCT transformation on the video compressed data to obtain predictive encoded data, where the predictive encoded data includes scene feature predictive encoded data, residual predictive encoded data, B frame predictive encoded data, and P frame predictive encoded data.
- Reconstructing according to the scene feature and the reconstructed residual, obtaining multiple video frames including: reconstructing according to the scene feature and the reconstruction residual, and obtaining multiple I frames;
- the method of this implementation further includes: performing inter-frame decoding on the B frame predictive encoded data and the P frame predictive encoded data by using the I frames as reference frames to obtain B frames and P frames; and arranging the I frames, B frames, and P frames in chronological order to obtain the video stream.
- the video stream can be decoded by the present implementation.
- the method of the implementation manner further includes: acquiring a representation coefficient.
- Decoding the scene feature prediction encoded data to obtain scene information includes: decoding the scene feature prediction encoded data to obtain a scene feature, where the scene feature includes multiple independent scene feature bases that cannot be reconstructed from each other within the scene feature, the scene feature base is used to describe the picture content feature of a frame sub-block, the representation coefficient represents the correspondence between the scene feature base and the frame sub-block, and the reconstructed residual represents the difference between the frame sub-block and the scene feature base.
- the reconstruction is performed according to the scene information and the reconstructed residual to obtain a plurality of video frames.
- the method includes: reconstructing according to a scene feature, a representation coefficient, and a reconstruction residual to obtain a plurality of frame sub-blocks; combining the plurality of frame sub-blocks to obtain a plurality of video frames.
- In this way, the video decoding method of this implementation can decode the scene feature and the reconstructed residual, reconstruct a plurality of frame sub-blocks, and obtain the video frames by recombining the frame sub-blocks.
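- A hedged sketch of that sub-block decoding path (block size and frame geometry are illustrative assumptions): every frame sub-block is rebuilt as scene feature base times representation coefficient plus reconstructed residual, and the sub-blocks are then tiled back into full frames, inverting the earlier split.

```python
import numpy as np

def reconstruct_subblocks(F, C, E):
    """Columns of the returned matrix are the vectorised frame sub-blocks."""
    return F @ C + E

def reassemble_frame(block_columns, height, width, block=16):
    frame = np.zeros((height, width))
    idx = 0
    for y in range(0, height, block):
        for x in range(0, width, block):
            frame[y:y + block, x:x + block] = block_columns[:, idx].reshape(block, block)
            idx += 1
    return frame
```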
- In a possible implementation, acquiring the scene feature prediction encoded data and the residual prediction encoded data includes: acquiring video compressed data; and performing entropy decoding, inverse quantization, and inverse DCT transformation on the video compressed data to obtain predictive encoded data, where the predictive encoded data includes scene feature predictive encoded data, residual predictive encoded data, B frame predictive encoded data, and P frame predictive encoded data.
- the method of this implementation manner further includes:
- the I frame is used as a reference frame, and the B frame predictive coded data and the P frame predictive coded data are inter-frame decoded to obtain a B frame and a P frame; and the I frame, the B frame, and the P frame are arranged in chronological order to obtain a video stream.
- In this way, a video stream whose frame sub-blocks were reconstructed into reconstructed residuals, a scene feature, and representation coefficients can be decoded and restored by the video decoding method of this implementation.
- a third aspect of the embodiments of the present invention provides a video encoding apparatus having a function of performing the above video encoding method.
- This function can be implemented in hardware or in hardware by executing the corresponding software.
- the hardware or software includes one or more modules corresponding to the functions described above.
- the video encoding device includes:
- An acquiring module configured to acquire multiple video frames, and each of the plurality of video frames includes redundant data on the screen content
- the reconstruction module is configured to reconstruct multiple video frames to obtain scene information and reconstruction residuals of each video frame, where the scene information includes data obtained by reducing redundancy of redundant data, and reconstructing residuals Deducing the difference between the video frame and the scene information;
- a prediction encoding module configured to predictively encode scene information, and obtain scene feature prediction encoded data
- the prediction encoding module is further configured to perform predictive coding on the reconstructed residual to obtain residual prediction encoded data.
- the video encoding device includes:
- the video encoder performs the following actions: acquiring a plurality of video frames, and each of the plurality of video frames includes redundant data on the screen content;
- the video encoder further performs the following actions: reconstructing a plurality of video frames to obtain scene information and reconstruction residuals of each video frame, and the scene information includes data obtained by reducing redundancy of redundant data, and reconstructing The residual is used to represent the difference between the video frame and the scene information;
- the video encoder further performs the following actions: performing predictive coding on the scene information to obtain scene feature prediction encoded data;
- the video encoder also performs an operation of predictive coding the reconstructed residual to obtain residual prediction encoded data.
- a fourth aspect of the embodiments of the present invention provides a video decoding apparatus having a function of performing the above video decoding method.
- This function can be implemented in hardware or in hardware by executing the corresponding software.
- the hardware or software includes one or more modules corresponding to the functions described above.
- the video decoding device includes:
- An obtaining module configured to acquire scene feature prediction encoded data and residual prediction encoded data
- a scene information decoding module configured to decode scene feature prediction encoded data to obtain scene information, where the scene information includes data obtained by reducing redundancy of redundant data, and the redundant data is each video frame of multiple video frames. Redundant data between screen contents;
- the video frame reconstruction module is configured to reconstruct according to the scene information and the reconstructed residual to obtain a plurality of video frames.
- the video decoding device includes:
- the video decoder performs the following actions: acquiring scene feature prediction encoded data and residual prediction encoded data;
- the video decoder further performs the following operations: decoding scene feature prediction encoded data to obtain scene information, the scene information including data obtained by reducing redundancy of redundant data, and the redundant data is each of a plurality of video frames Redundant data on the content of the picture between video frames;
- the video decoder further performs the following operations: decoding the residual prediction encoded data to obtain a reconstructed residual, where the reconstructed residual is used to represent a difference between the video frame and the scene information;
- the video decoder also performs an action of reconstructing based on the scene information and the reconstructed residual to obtain a plurality of video frames.
- a fifth aspect of the embodiments of the present invention provides a video codec device, where the video codec device includes a video encoding device and a video decoding device.
- the video encoding device is the video encoding device provided by the foregoing third aspect
- the video decoding device is the video decoding device provided by the fourth aspect above.
- a seventh aspect of the embodiments of the present invention provides a computer storage medium storing program code for indicating execution of the method of the second aspect described above.
- Yet another aspect of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the methods described in the various aspects above.
- In the embodiments of the present invention, each of the plurality of video frames includes redundant data on the picture content. Then, the plurality of video frames are reconstructed to obtain scene information and a reconstruction residual of each video frame, where the scene information includes data obtained by reducing the redundancy of the redundant data, and the reconstructed residual is used to represent the difference between the video frame and the scene information. Then, the scene information is predictively coded to obtain scene feature prediction coded data, and the reconstructed residual is predictively coded to obtain residual prediction coded data.
- the redundancy of the video frames can therefore be reduced, so that in the encoding operation, the total amount of compressed data for the obtained scene features and reconstructed residuals is smaller than the amount of compressed data for the original video frames, reducing the amount of data obtained after compression.
- Each video frame is reconstructed into a scene feature and a reconstructed residual. Since the reconstructed residual contains only the residual information beyond the scene information, its amount of information is small and it is sparse, so it can be predictively coded with relatively few codewords; the amount of encoded data is small and the compression ratio is high.
- the method of the embodiment of the present invention can effectively improve the compression efficiency of a video frame.
- FIG. 1 is a schematic diagram of a conventional HEVC coding
- FIG. 2 is a flowchart of a video frame encoding and decoding method according to an embodiment of the present invention
- FIG. 3a is a schematic diagram of a flow of a video encoding method and a flow of an existing HEVC encoding method according to another embodiment of the present invention
- FIG. 4a is a schematic diagram of a flow of a video decoding method and a flow of an existing HEVC decoding method according to another embodiment of the present invention
- FIG. 4b is a schematic diagram of a scenario involved in a video decoding method according to another embodiment of the present invention.
- FIG. 5 is a flowchart of a method for video encoding according to another embodiment of the present invention.
- FIG. 6 is a flowchart of a method for decoding a video according to another embodiment of the present invention.
- FIG. 7 is a flowchart of a method of a lens segmentation method of the video encoding method shown in FIG. 5;
- FIG. 8 is a flowchart of a method for extracting a key frame of the video encoding method shown in FIG. 5;
- FIG. 9 is a flowchart of a method for scene classification of the video encoding method shown in FIG. 5;
- FIG. 10 is a flowchart of a method based on an SVM classification method of the video encoding method shown in FIG. 5;
- FIG. 11 is a flowchart of a method for reconstructing an RPCA based scene of the video encoding method shown in FIG. 5;
- FIG. 12 is a flowchart of a method for a video encoding method according to another embodiment of the present invention.
- FIG. 13 is a schematic diagram of a scenario of the video encoding method shown in FIG. 12;
- FIG. 14 is a schematic diagram of a scenario of one of the specific methods of the video encoding method shown in FIG. 12;
- FIG. 15 is a schematic diagram of a scenario of one of the specific methods of the video encoding method shown in FIG. 12;
- FIG. 16 is a schematic diagram of a scenario of one of the specific methods of the video encoding method shown in FIG. 12;
- FIG. 17 is a flowchart of a method for decoding a video according to another embodiment of the present invention.
- FIG. 18 is a schematic structural diagram of a video encoding apparatus according to another embodiment of the present invention.
- FIG. 18b is a partial structural diagram of the video encoding apparatus shown in FIG. 18a;
- FIG. 19 is a schematic structural diagram of a video decoding device according to another embodiment of the present invention.
- FIG. 20 is a schematic structural diagram of a video codec device according to another embodiment of the present invention.
- FIG. 21 is a schematic block diagram of a video codec system 10 in accordance with an embodiment of the present invention;
- FIG. 22 is a block diagram illustrating an example video encoder 20 that is configured to implement the techniques of the present invention;
- FIG. 23 is a block diagram illustrating an example video decoder 30 that is configured to implement the techniques of the present invention.
- In the embodiment of the present invention, each of the plurality of video frames includes redundant data on the picture content, and the plurality of video frames are reconstructed to obtain scene information and a reconstruction residual of each video frame, where the scene information includes data obtained by reducing the redundancy of the redundant data, and the reconstructed residual is used to represent the difference between the video frame and the scene information.
- the scene information is predictively coded
- the scene feature prediction coded data is obtained
- the reconstructed residual is predictively coded to obtain residual prediction coded data.
- the redundancy of the video frames can therefore be reduced, so that in the encoding operation, the total amount of compressed data for the obtained scene features and reconstructed residuals is smaller than the amount of compressed data for the original video frames, reducing the amount of data obtained after compression.
- Each video frame is reconstructed into a scene feature and a reconstructed residual. Since the reconstructed residual contains only the residual information beyond the scene information, its amount of information is small and it is sparse, so it can be predictively coded with relatively few codewords; the amount of encoded data is small and the compression ratio is high.
- the method of the embodiment of the present invention can effectively improve the compression efficiency of a video frame.
- the embodiment of the present invention further provides a video decoding method, which is used to decode the scene feature prediction encoded data and the residual prediction encoded data obtained by the video encoding device, obtain the scene information and the reconstructed residual, and reconstruct the video frames according to the scene information and the reconstructed residual.
- key frames are independently coded, wherein key frames are also referred to as I frames.
- After compression, the I frames account for a high proportion of the compressed data, and there is a large amount of information redundancy between I frames.
- If the video coding method of the embodiment of the present invention is used for the I frames at the time of encoding, the coding efficiency of the I frames can be improved.
- HEVC (High Efficiency Video Coding) is a widely used and successful video codec standard.
- HEVC is a block-based hybrid coding method, which includes several modules such as prediction, transform, quantization, entropy coding, and loop filtering.
- the prediction module is a core module of the HEVC codec method, and may be specifically classified into an intra prediction and an inter prediction module.
- Intra prediction, that is, generating prediction values using pixels already encoded within the current image.
- Inter prediction, that is, generating prediction values for the current image using previously encoded and reconstructed images. Since inter-frame prediction encodes residuals, its compression ratio is relatively high.
- the existing intra prediction module of the HEVC standard only uses the intra-frame information of the current image for encoding and decoding, adopts a fixed strategy along the video time axis, and does not take the contextual information of the video into consideration, so the encoding and decoding efficiency is low and the compression ratio is not high. For example:
- Scene 1: In a movie, characters A and B hold a dialogue, and the director frequently switches between A and B to express the characters' inner feelings. In this case, it is suitable to segment and cluster all the lenses related to A and perform inter-frame and intra-frame predictive encoding uniformly.
- Scene 2: A TV drama shooting venue mainly comprises grass, beach, and office scenes. In this case, it is suitable to identify and classify all grass, beach, and office scenes, extract the scene feature information uniformly, and use it to represent and predict the key frames.
- HEVC predictive coding uses both intra-frame compression and inter-frame compression.
- the GOP step size, that is, the number of frames included in the GOP, is set before encoding. To limit the effect of motion changes, the number of frames should not be set too large.
- HEVC divides all frames into three types of frames: I, P, and B, as shown in Figure 1.
- the numbers above the frames in Figure 1 indicate the number of the corresponding frame in the original video sequence.
- the I frame, the P frame, and the B frame are encoded in units of GOP.
- An I frame (Intra-frame), also known as an intra-coded frame, is an independent frame containing all of its own information. It can be independently encoded and decoded without reference to other images, and can be simply understood as a static picture.
- the first frame in each GOP is set to an I frame, and the length of the GOP also represents the interval between two adjacent I frames.
- the I frame provides the most critical information in the GOP, and the amount of information in the data is relatively large, so the compression is relatively poor, generally around 7:1.
- a P frame (Predictive frame) is also called an inter-predictive coded frame. It needs to refer to a previous frame (an I frame or a P frame) to be encoded, and it indicates the difference between the current frame picture and the previous frame. When decoding, the difference defined by this frame is superimposed onto the previously buffered picture to generate the final picture.
- P frames typically occupy fewer bits of data than I frames, but the disadvantage is that P frames are very sensitive to transmission errors because of their complex dependence on the previous P and I reference frames. Since the residual is used for encoding, the amount of coded information required for a P frame is greatly reduced relative to an I frame, and the compression ratio is relatively high, generally around 20:1.
- a bi-directional frame is also called a bidirectional predictive coding frame, that is, a B frame records the difference between the current frame and the previous and subsequent frames.
- Decoding a B frame requires not only the previously buffered picture but also the decoded picture that follows it; the final picture is obtained by combining the previous and subsequent pictures with the data of the current frame.
- The B frame compression rate is high, but the decoding complexity is also high.
- the B frame is not a reference frame and does not cause a spread of decoding errors.
- B frames have the highest encoding compression ratio, and the general compression ratio is around 50:1.
- Entropy coding: if the mode is inter coding, the motion vector is also encoded.
- the decoding process of HEVC is the reverse process of the encoding process, and will not be described here.
- the HEVC codec method relies too much on I frame coding and has the following drawbacks:
- the amount of I frame compressed data is high. I frame coding only performs spatial compression on intraframe data without considering redundant information between adjacent frames. The amount of compressed data is large, usually about 10 times that of P frames.
- the GOP step size needs to be preset before encoding.
- the I frame ratio is determined by the setting of the GOP step size. As shown in FIG. 1, when the GOP step size is set to 13, the ratio of the I frame to the BP frame is 1:12. According to the respective compression ratios of the IBP frames, the ratio of the final I frame to the BP frame compressed data is approximately 2:5. Generally, a larger GOP step size can be set to reduce the I frame ratio to improve the overall compression ratio of the video, but this also causes a decrease in the quality of the compressed video.
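- A rough check of that 2:5 figure, assuming the 12 B/P frames of the GOP split into 4 P frames and 8 B frames (this split is an assumption; the per-type compression ratios are those cited above): the I frame contributes about 1/7 ≈ 0.14 of a raw frame of compressed data, the B/P frames contribute about 4/20 + 8/50 = 0.36 of a raw frame, and 0.14 : 0.36 ≈ 2 : 5.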
- the I frames are extracted sequentially along the time axis, and the interval between adjacent I frames is the GOP step size.
- the selection strategy does not take into account the contextual context information of the video. For example, for two video segments that are not consecutive in time but highly correlated in the picture content, if the I frame is extracted according to the GOP step size and the individual intra coding is performed, a large amount of information redundancy is caused.
- the embodiment of the present invention proposes a video encoding and decoding algorithm based on intelligent video scene classification, in view of the problem that the original HEVC relies too much on I frame coding and the compression efficiency is not high.
- the method identifies and classifies the video shots and scenes, performs key data analysis and reconstruction on the key frames (I frames), and encodes the scene information and the representation residuals. It effectively avoids the problem of inefficient compression in a single key frame, and introduces video context information to improve the compression ratio.
- the video frame encoding and decoding method includes an encoding method part and a decoding method part.
- the video frame coding and decoding method includes:
- each of the plurality of video frames includes redundant data on the picture content.
- the multiple video frames may be obtained from a video stream according to a preset rule after the video stream is acquired, or the video codec may acquire the multiple video frames from other devices; this is not specifically limited in the embodiment of the present invention. In the embodiments of the present invention, "a plurality of" refers to at least two.
- the redundant data is data related to the picture content that is shared among the plurality of video frames and for which information redundancy exists.
- the redundant data may be redundant data on the overall picture of the video frame, such as the description of the embodiment shown in Figure 5 below. It may also be redundant data on a partial picture of a video frame, such as the description of the embodiment shown in FIG.
- the plurality of video frames are obtained from a video stream.
- Optionally, the codec device segments the video into lenses by scene-transition detection over the whole video data stream and determines whether each lens is a static lens. Video frame extraction is then performed for each lens according to the lens type.
- the original video stream is segmented into short lens units by a scene change detection technique.
- each shot is composed of video frames that are continuous in time, and represents a temporally and spatially continuous motion in a scene.
- the specific lens segmentation method can perform boundary segmentation and discrimination processing on the lens according to the change of the content of the video frame. For example, by locating the lens boundary and finding the position or time point of the boundary frame, the video can be segmented accordingly.
- the video frame of the lens is extracted on the basis of the lens segmentation, and the extracted video frame is the video frame to be acquired in step 201.
- the extraction of the video frame is adaptively selected according to the length of the lens and the content change, and may be one or more frames of images capable of reflecting the main information content of the lens.
- the codec device may directly extract a plurality of video frames that perform the following encoding method from the video stream, for example, extract the video frames according to a preset step size, and the like.
- Step 202 Perform reconstruction on multiple video frames to obtain scene information and reconstruction residuals of each video frame.
- the scene information includes data obtained by reducing redundancy of redundant data, and the reconstructed residual is used to represent a difference between the video frame and the scene information.
- the redundancy of the multiple video frames can be reduced by the reconstruction.
- the obtained scene information can also be in various forms.
- the scene information includes data obtained by reducing the redundancy of the redundant data between frames, and the reconstructed residual represents the difference between a video frame and the scene information. Thus, compared with the original video frames, the scene information and the reconstructed residuals obtained by reconstructing the plurality of video frames have reduced redundancy and a smaller overall amount of data, while the complete amount of information is maintained.
- the purpose of scene reconstruction is to reduce the redundancy of key frames in the scene.
- the scene feature extraction principle is that the scene feature representation succinctly occupies a small amount of data, and the data reconstructed according to the scene information matches the original image as much as possible, so that the reconstructed residual amount is small.
- the scene reconstruction operation directly affects the compression effect of the video encoding.
- optionally, the method of the embodiment of the present invention further includes an operation of classifying the plurality of video frames, for example, classifying the plurality of video frames based on the correlation of the picture content to obtain video frames of one or more clusters, and subsequently performing step 202 on the video frames of the same cluster.
- the redundancy of redundant data between multiple video frames belonging to the same cluster is in accordance with a preset requirement, for example, greater than a threshold.
- specific classification methods include various methods, such as clustering-based methods and classifier-based methods; for example, feature extraction and description are performed on the key frames, and the key frames are clustered in the feature space.
- the specific implementation process is described in detail in the following embodiments, which are not specifically limited in this embodiment of the present invention.
- a video frame on which the method of the embodiment of the present invention is performed is extracted from each shot.
- the video frames extracted from one shot can reflect the characteristics of that shot, and thus the classification of the extracted video frames can also be referred to as scene classification of the shots.
- the purpose of scene classification is to combine video frames extracted from the lens that are strongly related in content, so that the entire scene content can be analyzed later.
- the specific strategy of scene classification is realized by analyzing and clustering key frames of each lens.
- the principle of scene classification is that the video frames in each cluster are highly correlated in picture content and contain a large amount of information redundancy. This operation plays a decisive role in the subsequent scene reconstruction operation: the better the classification effect, the more highly aggregated the intra-class information, and the larger the information redundancy, the higher the coding efficiency.
- Step 203 Perform predictive coding on the scene information to obtain scene feature prediction encoded data.
- after the scene information is obtained, it can be predictively encoded to obtain scene feature prediction encoded data.
- Step 204 Perform predictive coding on the reconstructed residual to obtain residual prediction encoded data.
- after the reconstructed residual is obtained, it can be predictively encoded to obtain residual prediction encoded data.
- intra prediction coding or inter prediction coding may be employed.
- because it does not include the scene features, the reconstructed residual has sparse characteristics; for example, when the reconstructed residual is represented by a matrix, most of its values are 0 and only a few are non-zero, so the amount of encoded information is small.
- the redundancy of the redundant data is reduced, so that the amount of data to be encoded is reduced, and the amount of data of the scene feature prediction encoded data and the residual prediction encoded data obtained after encoding is reduced; moreover, since the video frame is represented by the scene information and the reconstructed residual, and the reconstructed residual represents the difference between the video frame and the scene feature, the obtained reconstructed residual has a sparse characteristic, so that the amount of coded information of the reconstructed residual is reduced.
- the above steps 201 to 204 are video encoding methods, and the following are the steps of the video decoding method.
- Step 205 Acquire scene feature prediction encoded data and residual prediction encoded data.
- the video codec device acquires the encoded scene feature prediction encoded data and residual prediction encoded data.
- Step 206 Decode the scene feature prediction encoded data to obtain scene information.
- the video codec device decodes the scene feature prediction encoded data to obtain the scene information.
- the scene information includes data obtained by reducing the redundancy of redundant data, which is redundant data on the screen content between each of the plurality of video frames.
- Step 207 Decode the residual prediction encoded data to obtain a reconstructed residual.
- the video codec also decodes the residual prediction encoded data to obtain a reconstructed residual.
- the reconstructed residual is used to represent the difference between the video frame and the scene information.
- the order of execution of step 206 and step 207 is not specifically limited in the embodiment of the present invention.
- Step 208 Perform reconstruction according to the scene information and the reconstructed residual to obtain a plurality of video frames.
- the scene feature prediction encoded data and the reconstructed residual include the information of the video frames, and reconstruction is performed from the scene information and the reconstructed residuals to obtain the plurality of video frames.
- the redundancy of the video frames can be reduced, so that in the encoding operation, the total compressed data amount of the obtained scene features and reconstructed residuals is reduced relative to the compressed data amount of the original video frames, reducing the amount of data obtained after compression.
- each video frame is represented by a scene feature and a reconstructed residual. Since the reconstructed residual only includes the residual information other than the scene information, its amount of information is small and sparse, so that it can be predictively encoded with fewer codewords; the amount of encoded data is small and the compression ratio is high.
- the method of the embodiment of the present invention can effectively improve the compression efficiency of a video frame.
- the embodiments of the present invention can be used in various scenarios, for example, the video frame encoding and decoding method of the foregoing embodiment of the present invention is used in an HEVC scenario.
- the video frame obtained in step 201 of the foregoing embodiment is a key frame (I frame) in the HEVC scenario.
- the method in the embodiment of the present invention further includes: reconstructing the key frame (I frame) and, using it as a reference, performing conventional B/P frame inter-prediction coding on the remaining frames.
- the method of the embodiment of the present invention further includes performing transform coding, quantization coding, and entropy coding on the predictive coded data according to the HEVC coding process to obtain video compression data.
- the predictive coding data includes scene feature prediction encoded data, residual predictive encoded data, B-frame predictive encoded data, and P-frame predictive encoded data.
- FIG. 3a is a schematic diagram comparing the flow of a video encoding method according to an embodiment of the present invention with the flow of an existing HEVC encoding method.
- FIG. 3b is a schematic diagram of a scenario related to a video encoding method according to an embodiment of the present invention.
- according to the HEVC decoding process, the video compression data is subjected to entropy decoding, inverse quantization processing, and inverse DCT (discrete cosine transform) to obtain the corresponding prediction encoded data. The above-described operations of steps 205 to 208 are then performed using the scene feature prediction encoded data and the residual prediction encoded data in the prediction encoded data.
- the video frame reconstructed in step 208 is a key frame.
- the method in the embodiment of the present invention further includes performing BP frame decoding according to the decoded key frame data, and arranging the decoded data frames in time sequence to obtain a complete sequence of the original video. .
- FIG. 4a is a schematic diagram of a comparison between a flow of a video decoding method and a flow of an existing HEVC decoding method according to an embodiment of the present invention.
- FIG. 4b is a schematic diagram of a scenario of a video decoding method according to an embodiment of the present invention.
- the original HEVC is too dependent on the I frame coding and the compression efficiency is not high.
- the method of the embodiment of the present invention is used for the key frame.
- in existing HEVC, each I frame is independently coded, so the amount of compressed I frame data is high, and there is a large amount of information redundancy between I frames.
- the redundant information of the I frame is reduced, and the amount of encoded data of the I frame is reduced.
- the method of the embodiment of the present invention identifies and classifies a video shot and a scene, performs overall data analysis and reconstruction on a key frame (I frame) in the scene, and encodes the scene feature and the representation residual. It effectively avoids the problem of inefficient compression in a single key frame, and introduces video context information to improve the compression ratio.
- the method in the embodiment of the present invention can also be used in other video frames that need to be independently coded, and reconstructed by using a video frame that needs to be independently coded to obtain scene information and reconstructed residuals, and separately coded. Reduce the amount of compressed data that would otherwise require a separately encoded video frame.
- the method of the embodiment of the present invention is described in the context of the HEVC standard. It should be understood that the video frame encoding and decoding method provided by the embodiment of the present invention can also be applied to other scenarios. The specific usage scenarios are not limited in the embodiment of the present invention.
- the overall frame picture of the reconstructed video frame has redundant data
- the partial frame picture of the reconstructed video frame has redundant data
- the overall frame picture of the video frame has redundant data
- FIG. 5 is a flowchart of a method for a video encoding method according to an embodiment of the present invention.
- a video encoding method according to an embodiment of the present invention includes:
- Step 501 Acquire a video stream.
- the encoding device acquires a video stream that includes a plurality of video frames.
- Step 502 Perform lens segmentation on the video stream to obtain multiple shots.
- the lens segmentation module of the encoding device may segment the video stream into multiple shots to extract a video frame to be reconstructed according to the lens.
- the lens includes temporally consecutive video frames, and the lens represents a temporally and spatially continuous motion in a scene.
- step 502 can be implemented by the following steps:
- Step A1 Acquire a video stream.
- Step A1 is step 501, wherein the video stream includes a plurality of video frames.
- Step A2 Extract feature information of the first video frame and the second video frame, respectively.
- the feature information is used to describe the picture content of the video frame.
- shot segmentation may be performed by analyzing feature information of the video stream; the feature information is information used to describe characteristics of the video frames, for example, image color, shape, edge contour, or texture features.
- the first video frame and the second video frame are video frames in the video stream, and the first video frame and the second video frame are not currently assigned to any of the shots.
- Step A3 Calculate the lens distance between the first video frame and the second video frame according to the feature information.
- the lens distance is used to indicate the degree of difference between the first video frame and the second video frame.
- Step A4 Determine whether the lens distance is greater than a preset lens threshold.
- the preset lens threshold can be set manually.
- Step A5 If the lens distance is greater than the preset lens threshold, the target lens is segmented from the video stream, and if the lens distance is less than the preset lens threshold, the first video frame and the second video frame are attributed to the same lens.
- the start frame of the target lens is the first video frame
- the end frame of the target lens is the previous video frame of the second video frame
- the target lens belongs to one of the lenses of the video stream
- a shot is a segment of temporally continuous video frames.
- when the lens distance between the first video frame and the second video frame is greater than the preset lens threshold, the difference between the first video frame and the second video frame reaches the preset requirement, while the difference between the first video frame and each video frame located between the first video frame and the second video frame does not reach the preset requirement, that is, is less than the preset lens threshold; therefore, in the video stream, the video frames from the first video frame to the previous video frame of the second video frame belong to the target shot. Otherwise, when the first video frame is located before the second video frame, the lens distance is calculated between the first video frame and the next frame after the second video frame, and steps A4 and A5 are repeated. Thus, by repeatedly performing the above steps, multiple shots can be obtained from the video stream.
- the feature information of the video frame is first extracted, and the content is measured based on the feature.
- a more common method is to extract image color, shape, edge contour or texture features, or extract multiple features and normalize them.
- the method of the embodiment of the present invention describes the image by using a block color histogram.
- the video image frame is first scaled to a fixed size (e.g., 320×240) and downsampled to reduce the effect of noise on the image. Then the image is divided into 4×4 blocks, and an RGB color histogram is extracted from each block. To reduce the impact of illumination on the image, the histograms are equalized. Then, the distance between video frames is calculated based on the feature information of the video frames.
- the distance between video frames can be measured by a measure such as Mahalanobis distance and Euclidean distance.
- this example uses the normalized histogram intersection method to measure.
- the preset lens threshold is set in advance. When the lens distance is greater than the preset lens threshold, the earlier of the two video frames for which the lens distance is calculated is determined as the boundary start frame of the shot, and the frame preceding the later of the two video frames is determined as the boundary end frame of that shot; otherwise, the two video frames belong to the same shot. Finally, a complete video can be split into multiple separate shots.
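- as an illustrative sketch (not part of the original filing), the shot segmentation based on block color histograms and the normalized histogram intersection can be outlined as follows in Python, assuming OpenCV-style frames as numpy arrays; the resolution, block grid, bin count, and threshold are example values, and the histogram equalization step is omitted for brevity:

```python
import cv2
import numpy as np

def block_color_histogram(frame, grid=4, bins=8):
    """Block color histogram: scale the frame to a fixed size, split it into
    grid x grid blocks and concatenate the normalized per-block color histograms."""
    img = cv2.resize(frame, (320, 240))
    h, w = img.shape[:2]
    bh, bw = h // grid, w // grid
    feats = []
    for by in range(grid):
        for bx in range(grid):
            block = img[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw]
            hist = cv2.calcHist([block], [0, 1, 2], None,
                                [bins] * 3, [0, 256] * 3).flatten()
            feats.append(hist / (hist.sum() + 1e-9))
    return np.concatenate(feats)

def histogram_intersection_distance(f1, f2):
    """Normalized histogram intersection turned into a distance in [0, 1]."""
    return 1.0 - np.minimum(f1, f2).sum() / min(f1.sum(), f2.sum())

def segment_shots(frames, threshold=0.4):
    """Steps A2-A5: compare each frame with the start frame of the current shot
    and open a new shot when the lens distance exceeds the threshold."""
    shots, start = [], 0
    start_feat = block_color_histogram(frames[0])
    for i in range(1, len(frames)):
        feat = block_color_histogram(frames[i])
        if histogram_intersection_distance(start_feat, feat) > threshold:
            shots.append((start, i - 1))   # previous frame ends the current shot
            start, start_feat = i, feat    # this frame starts a new shot
        # otherwise frame i stays in the current shot and the next frame is
        # compared against the same start frame
    shots.append((start, len(frames) - 1))
    return shots
```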
- Step 503 Extract key frames from the obtained shots.
- a key frame is extracted from each lens, and the reconstruction operation of the method of the embodiment of the present invention is performed with the key frame.
- step 503 can be implemented by performing the following step A5.
- Step A5 For each shot in the video stream, the key frame is extracted according to the frame distance between the video frames in the shot.
- the frame distance between any two adjacent key frames in each shot is greater than a preset frame distance threshold, and the frame distance is used to indicate the degree of difference between the two video frames. Then, the reconstruction of the plurality of video frames is performed with key frames of each shot to obtain scene information and a reconstruction residual of each video frame.
- current key frame extraction algorithms mainly include sampling-based methods, color feature-based methods, content-based analysis methods, motion analysis-based methods, cluster-based methods, and compression-based methods.
- the starting frame of each shot is set as a key frame.
- Feature description and distance metric are used for each frame by using block color histogram feature and histogram intersection method.
- the method of the embodiment of the present invention adds a judgment of the type of each shot: it first determines whether the shot is a static picture according to the feature-space distance between adjacent frames; if the frame distance between all frames in the shot is 0, the shot is determined to be a static picture and no further key frames are extracted, otherwise it is a dynamic picture.
- for a dynamic picture, the content distance of each frame from the previous key frame is measured in chronological order, and if the distance is greater than the set threshold, the frame is set as a key frame.
- Figure 8 shows the key frame extraction process.
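- an illustrative Python sketch of the key frame extraction strategy above, reusing the block_color_histogram and histogram_intersection_distance helpers from the shot segmentation sketch; the distance threshold is an example value:

```python
def extract_key_frames(frames, shot, threshold=0.3):
    """Extract key frames from one shot given as (start, end) frame indices."""
    start, end = shot
    key_frames = [start]                       # the start frame is always a key frame
    feats = [block_color_histogram(frames[i]) for i in range(start, end + 1)]

    # static shot: all adjacent-frame distances are (near) zero, keep only the start frame
    if all(histogram_intersection_distance(feats[i], feats[i + 1]) < 1e-6
           for i in range(len(feats) - 1)):
        return key_frames

    # dynamic shot: walk the frames in chronological order and add a key frame
    # whenever the distance from the previous key frame exceeds the threshold
    last_key_feat = feats[0]
    for offset in range(1, len(feats)):
        if histogram_intersection_distance(last_key_feat, feats[offset]) > threshold:
            key_frames.append(start + offset)
            last_key_feat = feats[offset]
    return key_frames
```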
- the method of the embodiment of the present invention is described in the HEVC scenario.
- each shot obtained in the above steps can be used as a GOP, that is, one shot corresponds to one GOP.
- the start frame of the lens is a key frame
- the video frames extracted from the shot through step A5 are also key frames, and the other video frames of the shot can be used as B frames and P frames.
- the key frame extraction operation of the embodiment of the present invention takes the context information of the video into account, so that when the key frames are subsequently classified, the classification effect is better, which contributes to improving the compression ratio of the subsequent coding.
- the key frame sequence is quickly generated, and can respond to the user's fast forward and switch scene requirements in time.
- the user can preview the video scene according to the sequence of key frames, and accurately locate the video scene segments that are of interest to the user, thereby improving the user experience.
- a video stream is acquired, wherein the video frames of the video stream include an I frame, a B frame, and a P frame. Then, an I frame is extracted from the video stream, and step 504 or step 505 is performed with the I frame.
- the encoding device acquires a plurality of key frames, which are video frames to be reconstructed to reduce redundancy.
- the method of the embodiment of the present invention further includes the step of classifying the key frame, that is, step 504.
- Step 504 Classify a plurality of key frames based on the correlation of the picture content to obtain key frames of one or more classification clusters.
- step 505 may then be performed on the key frames of the same classification cluster.
- within a cluster, the picture content of the key frames is highly correlated and there is a large amount of redundant data. The better the classification effect, that is, the more highly aggregated the information of the multiple key frames in the same cluster and the greater their redundancy, the more significant the effect of the subsequent reconstruction operation on reducing that redundancy.
- one or more classification clusters are obtained after the classification operation; the multiple key frames in the same classification cluster share more of the same picture content, so that the redundancy of the redundant data between these key frames is larger.
- if different key frames are classified on the basis of the shots, the classification may also be referred to as scene classification; of course, the classification operation may also classify different key frames directly, without being based on the shots.
- the classification operation of the method provided by the embodiment of the present invention is referred to as a scene classification operation.
- in the clustering-based classification method, classifying the plurality of key frames based on the correlation of the picture content to obtain the key frames of one or more classification clusters includes:
- Step B1 Extract feature information of each key frame of the plurality of key frames.
- the feature information of the key frame may be an underlying feature or a middle layer semantic feature.
- Step B2 Determine the cluster distance between any two key frames according to the feature information.
- the cluster distance is used to represent the similarity between two key frames.
- Any two key frames here include all the key frames extracted in the above steps, which may be key frames belonging to different shots, or key frames belonging to the same shot.
- the difference between frames in the lens is smaller than the difference between frames in different lenses.
- different feature spaces may be selected, and different feature spaces correspond to different metrics, so that the cluster distance and the lens distance may be different.
- Step B3 Cluster the video frames according to the cluster distance to obtain video frames of one or more clusters.
- scene classification is achieved by analyzing and clustering key frames of each lens.
- Scene classification is closely related to scene reconstruction.
- the first principle of scene classification is that the key frames in each cluster are highly correlated at the content level of the screen, and there is a large amount of information redundancy.
- the existing scene classification algorithms are mainly divided into two categories: a) based on the underlying feature scene classification algorithm; b) based on the middle layer semantic feature modeling scene classification algorithm. These methods are based on feature detection and description, and reflect the description of the scene content at different levels.
- the underlying image features may include features such as color, edge, texture, SIFT (Scale-invariant feature transform), HOG (Histogram of Oriented Gradient), and GIST.
- Middle-level semantic features include Bag of Words, deep learning network features, and more.
- the embodiment of the present invention selects a relatively simple GIST global feature to describe the overall content of the key frame.
- the distance measure function uses the Euclidean distance to measure the similarity of the two images.
- the clustering algorithm can adopt traditional K-means, graph cutting, hierarchical clustering and other methods.
- a condensed (agglomerative) hierarchical clustering algorithm is used to cluster the key frames. The number of clusters depends on the similarity threshold setting: the higher the threshold, the greater the key frame information redundancy within each class and the larger the corresponding number of clusters.
- the specific flow chart of the scene classification is shown in the following figure.
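- an illustrative Python sketch of the clustering-based scene classification, assuming the GIST descriptors are computed by an external routine and passed in as rows of `features`; scipy's agglomerative clustering with a distance threshold stands in for the condensed hierarchical clustering described above, and the threshold is an example value:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_key_frames(features, distance_threshold=0.8):
    """Scene classification by agglomerative clustering.
    `features` is an (n_key_frames, dim) array of per-frame GIST descriptors."""
    if len(features) < 2:
        return {1: list(range(len(features)))}
    Z = linkage(features, method='average', metric='euclidean')  # hierarchical clustering
    labels = fcluster(Z, t=distance_threshold, criterion='distance')
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(int(label), []).append(idx)
    return clusters   # {cluster id: indices of the key frames in that cluster}
```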
- the above clustering-based scene classification strategy is beneficial to the improvement of coding speed.
- the following classification mechanism based on the classifier model is beneficial to the improvement of coding precision.
- the main idea of the scene classification strategy based on the classifier model is to perform discriminant training on each shot according to the shot segmentation result to obtain a plurality of discriminant classifiers.
- Each key frame is discriminated by the classifier, and the key frame with a high score is considered to be the same scene as the lens.
- the specific process is as follows:
- the classification method of the video coding method in the embodiment of the present invention includes:
- Step C1 Perform discrimination training according to each shot segmented from the video stream to obtain a plurality of classifiers corresponding to the shots.
- the optional classifier models are: decision tree, Adaboost, Support Vector Machine (SVM), deep learning and other models.
- Step C2 Using the target classifier to discriminate the target key frame to obtain a discriminant score.
- the target classifier is one of the plurality of classifiers obtained in step C1
- the target video frame is one of the key frames
- the discriminant score is used to indicate the extent to which the target video frame belongs to the scene to which the target classifier belongs.
- Step C3 When the discriminant score is greater than the preset score threshold, it is determined that the target video frame belongs to the same scene as the shot to which the target classifier belongs.
- when the discriminant score is greater than the preset score threshold, the target key frame may be considered to belong to the same scene as the shot to which the target classifier belongs; otherwise, the target key frame is not considered to belong to the same scene as that shot.
- Step C4 Determine video frames of one or more clusters according to video frames belonging to the same scene as the shot.
- the operation of classifying using a classifier includes two main phases, as follows:
- y i is the label corresponding to the i-th training sample
- the positive sample corresponds to the label 1 and the negative sample is -1.
- φ(·) is the feature mapping function
- n is the total number of training samples
- w is the classifier parameter
- I is the training sample.
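- a standard soft-margin SVM training objective consistent with the symbols above (an illustrative form, not necessarily the exact filed formula) is:

\min_{w,\,b}\ \tfrac{1}{2}\|w\|^{2} \;+\; \lambda \sum_{i=1}^{n} \max\!\bigl(0,\; 1 - y_{i}\,(w^{\top}\phi(I_{i}) + b)\bigr)

- the hinge-loss term penalizes training samples that fall on the wrong side of the margin; presumably the positive samples are frames of the shot being trained and the negative samples come from other shots.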
- the keyframes are discriminated by the classifier model trained by each lens.
- the specific formula is as follows:
- w j and b j are the classifier parameters corresponding to the jth lens, and the denominator is the normalization factor.
- if the probability is greater than the set threshold, it is considered that the key frame i and the shot j belong to the same scene.
- i and j are positive integers.
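- a normalized discrimination score consistent with the normalization factor mentioned above (again an illustrative form rather than a verbatim reproduction) is:

p(j \mid I_{i}) \;=\; \frac{\exp\!\bigl(w_{j}^{\top}\phi(I_{i}) + b_{j}\bigr)}{\sum_{k}\exp\!\bigl(w_{k}^{\top}\phi(I_{i}) + b_{k}\bigr)}

- key frame i is assigned to the scene of shot j when this probability exceeds the preset threshold.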
- in this way, the correspondences between multiple key frames and shots can be obtained. These correspondences indicate which key frames and shots belong to the same scene, and the encoding device can then determine the video frames of one or more clusters according to these correspondences.
- in some embodiments, step 504 may not be included.
- when step 504 is included, the redundancy of the redundant data between the key frames in the same classification cluster is large, so that when the key frames of the same cluster are subsequently reconstructed, the redundancy of the redundant data can be further reduced, further reducing the amount of encoded data.
- in addition, the video is compressed scene by scene, which facilitates later content clipping and video highlight generation (that is, generating a highlights video according to popularity analysis).
- Step 505 Perform reconstruction on multiple key frames of the same cluster to obtain scene features and reconstruction residuals of each video frame.
- Each of the plurality of key frames includes the same picture content, that is, redundant data included on the picture content between each key frame. If these key frames are not reconstructed, the encoding device will repeatedly encode the same picture content between these key frames.
- the reconstructed scene features are used to represent the same picture content between each video frame, such that the scene information includes data resulting from reducing redundancy of redundant data.
- the reconstructed residual is used to represent the difference between the key frame and the scene feature.
- the scene feature thus obtained may represent the overall information of the frame, so that the reconstruction operation of step 505 is directed to a scene in which the entire screen of the plurality of video frames has the same picture content.
- the specific implementation of step 505 is as follows:
- first, the key frames of the same classification cluster are converted into an observation matrix.
- the observation matrix is used to represent the plurality of key frames in a matrix form. Then, the observation matrix is reconstructed according to the first constraint condition to obtain a scene feature matrix and a reconstructed residual matrix.
- the scene feature matrix is used to represent the scene features in a matrix form
- the reconstructed residual matrix is used to represent the reconstructed residuals of the plurality of key frames in a matrix form.
- the first constraint is used to define the scene feature matrix low rank and the reconstructed residual matrix is sparse.
- reconstructing the observation matrix according to the first constraint condition to obtain the scene feature matrix and the reconstructed residual matrix includes: calculating the scene feature matrix and the reconstructed residual matrix according to a first preset formula, where the scene feature matrix is a low-rank matrix and the reconstructed residual matrix is a sparse matrix;
- the first preset formula is:
- D is the observation matrix
- F is the scene feature matrix
- E is the reconstructed residual matrix
- λ is a weight parameter
- λ is used to balance the relationship between the scene feature matrix F and the reconstructed residual matrix E.
- Rank(·) is the matrix rank function
- ‖·‖_1 is the matrix L1 norm
- ‖·‖_* is the matrix nuclear norm.
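- assembled from the symbol definitions above, the first preset formula corresponds to the standard RPCA decomposition (given here as an illustrative form, with the convex nuclear-norm relaxation commonly used in practice shown second):

\min_{F,\,E}\ \mathrm{Rank}(F) + \lambda\,\|E\|_{1} \quad \text{s.t.} \quad D = F + E

\min_{F,\,E}\ \|F\|_{*} + \lambda\,\|E\|_{1} \quad \text{s.t.} \quad D = F + E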
- the scene reconstruction is to perform content analysis on the scene of each cluster cluster obtained by the scene classification, and extract scene features and representation coefficients suitable for reconstructing all key frames in the scene.
- Models that can be used for scene reconstruction include RPCA (Robust Principle Component Analysis), LRR (low rank representation), SR (sparse representation), SC (sparse coding), and SDAE (Sparse Self-Coded Deep Learning Model). , CNN (convolution neural network) and so on.
- in this embodiment, the representation coefficient may be taken as an identity matrix, so that multiplying the scene feature by the representation coefficient still yields the scene feature. In other words, in some embodiments of the present invention the representation coefficient can be ignored, which is equivalent to using an identity matrix.
- in that case, the representation coefficient may or may not be used; in the decoding and reconstruction stage, only the scene feature and the reconstruction residual are required to represent the original video frame.
- the video coding method in this embodiment uses RPCA to reconstruct key frames in the scene.
- the scene reconstruction strategy based on RPCA method reconstructs the overall content data of key frames, which can reduce the square phenomenon caused by block prediction.
- a scene S contains N key frames, that is, a certain cluster includes N key frames, and N is a natural number.
- D is the observation matrix
- I_i is the i-th key frame
- λ is a weight parameter used to balance the relationship between the scene feature matrix F and the reconstructed residual matrix E
- rank(·) is the matrix rank function
- ‖·‖_1 is the matrix L1 norm.
- Figure 11 shows an example diagram based on RPCA scene reconstruction, where key frames 1 to 3 belong to different shot segments of the same video.
- the scene feature matrix F rank is 1, so only one column of the matrix needs to be data compressed.
- the residual matrix E has a value of 0 in most of the regions, so only a small amount of information is needed to represent E.
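- an illustrative numpy sketch of the RPCA decomposition via the inexact augmented Lagrange multiplier method, splitting the observation matrix D into a low-rank scene feature matrix F and a sparse residual matrix E; the parameter heuristics are common defaults rather than values from the filing:

```python
import numpy as np

def soft_threshold(X, tau):
    """Entry-wise soft thresholding (proximal operator of the L1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(soft_threshold(s, tau)) @ Vt

def rpca(D, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Split D into a low-rank part F and a sparse part E with inexact ALM."""
    D = np.asarray(D, dtype=float)
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))       # common default
    mu = mu if mu is not None else 0.25 * m * n / (np.abs(D).sum() + 1e-9)
    F = np.zeros_like(D)
    E = np.zeros_like(D)
    Y = np.zeros_like(D)                       # Lagrange multipliers
    d_norm = np.linalg.norm(D, 'fro') + 1e-9
    for _ in range(max_iter):
        F = svd_threshold(D - E + Y / mu, 1.0 / mu)
        E = soft_threshold(D - F + Y / mu, lam / mu)
        R = D - F - E
        Y = Y + mu * R
        if np.linalg.norm(R, 'fro') / d_norm < tol:
            break
    return F, E

# usage sketch: grayscale key frames of one cluster, all the same size,
# stacked as the columns of the observation matrix D
# D = np.stack([kf.astype(float).ravel() for kf in key_frames], axis=1)
# F, E = rpca(D)
```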
- the scene feature in the embodiment of the present invention is one specific implementation of the scene information, and step 505 is one specific implementation of reconstructing multiple video frames to obtain the scene information and the reconstructed residual of each video frame.
- the method of the embodiment of the present invention may perform a reconstruction operation on a key frame having redundant data of the frame overall information.
- the key frames need to be detected first to determine whether the currently selected key frames are suitable for the reconstruction operation of the method of the embodiment of the present invention, so that adaptive coding can be performed according to the content of the video scene.
- before the plurality of video frames are reconstructed to obtain the scene features and the reconstructed residuals, the method of the embodiment of the present invention further includes: extracting picture feature information of each of the multiple video frames, where the extracted picture feature information may be a global feature or a local feature of the video frame, specifically a GIST global feature, a HOG global feature, a SIFT local feature, or the like, which is not specifically limited in this embodiment of the present invention.
- the encoding device then calculates content metric information according to the picture feature information, where the content metric information is used to measure the difference in picture content among the multiple video frames, that is, a content consistency measurement of the key frames; the content consistency of the key frames can be measured by feature variance, Euclidean distance, and the like. The step of reconstructing the plurality of video frames to obtain the scene feature and the reconstruction residual of each video frame is performed when the content metric information is not greater than a preset metric threshold.
- the method of the embodiment of the present invention further includes:
- Step D1 Extract global GIST features of each of the plurality of video frames.
- This global GIST feature is used to describe the characteristics of keyframes.
- Step D2 Calculate the variance of the scene GIST feature according to the global GIST feature.
- the scene GIST feature variance is used to measure the content consistency of multiple video frames.
- the scene GIST feature variance is used to measure the content consistency of multiple key frames of the same cluster.
- Step D3 When the scene GIST feature variance is not greater than the preset variance threshold, step 505 is performed.
- the video coding and decoding device may perform intra prediction coding on the scene feature and the reconstructed residual, respectively.
- steps D1 to D3 are specific methods for determining whether the key frame of the same cluster is applicable to step 505.
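- steps D1 to D3 can be sketched as follows (illustrative only; the GIST descriptors are assumed to be computed by an external routine and passed in as rows of `features`, and the variance threshold is an example value):

```python
import numpy as np

def overall_redundancy_check(features, variance_threshold=0.05):
    """Steps D1-D3: `features` holds one global GIST descriptor per key frame
    of a cluster. Returns True when the scene GIST feature variance is small
    enough for whole-frame reconstruction (step 505)."""
    feats = np.asarray(features, dtype=float)
    scene_variance = feats.var(axis=0).mean()   # per-dimension variance, averaged
    return scene_variance <= variance_threshold
```

- when the check fails (variance above the threshold), the sub-block path of the embodiment shown in FIG. 12 (steps E1 to E3 and step 1205) can be taken instead.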
- Step 506 Perform predictive coding on the scene features to obtain scene feature prediction encoded data.
- Step 507 Perform predictive coding on the reconstructed residual to obtain residual prediction encoded data.
- the predictive coding portion of the encoding device includes two parts of intra prediction coding and inter prediction coding.
- the scene features and reconstruction errors are encoded by intra prediction, and the remaining frames of the shot, that is, the non-key frames of the shot, are inter-predictive encoded.
- the specific process of intra prediction coding is similar to the HEVC intra coding module. Since the scene feature matrix has a low rank, only the key columns of the scene feature matrix need to be encoded.
- the reconstruction error belongs to residual coding, and the amount of coded data is small and the compression ratio is high.
- Step 508 Perform reconstruction according to the scene feature and the reconstructed residual to obtain a reference frame.
- in order to perform inter-frame predictive coding on the B and P frames, a reference frame needs to be obtained.
- the key frame is used as the reference frame.
- the reverse reconstruction scheme is adopted to prevent the error from spreading between the BP frames. The reconstruction is performed according to the scene feature and the reconstruction residual, and the following step 509 is performed with reference to the obtained reference frame.
- the BP frame inter prediction can be directly performed through the key frame extracted in step 504.
- Step 509 Perform inter-prediction encoding on the B frame and the P frame with reference to the reference frame, and obtain B frame predictive encoded data and P frame predictive encoded data.
- Inter-frame coding first reconstructs the key frame (I frame) according to the scene features and reconstruction error, and then performs motion compensation prediction and coding on the BP frame content.
- the specific inter prediction encoding process is the same as HEVC.
- Step 510 Perform transform coding, quantization coding, and entropy coding on the prediction encoded data to obtain video compressed data.
- the predictive coded data includes scene feature predictive coded data, residual predictive coded data, B frame predictive coded data, and P frame predictive coded data.
- on the basis of the predictive coding, the data is subjected to transform coding, quantization coding, and entropy coding, which is the same as in HEVC.
- the video coding method in the embodiment of the present invention can improve the video compression ratio.
- the entire scene information can be represented by a small amount of information, and the code rate is lowered, and the video quality is guaranteed.
- the reduction of the compression volume is more suitable for the transmission and storage of images in a low bit rate environment.
- the existing on-demand (VOD), personal video recording (NPVR), and Catch-up TV video services account for 70% of the server's storage resources and network bandwidth.
- the technical solution of the embodiment of the invention can reduce the pressure of the storage server and improve the network transmission efficiency.
- the CDN edge node can store more videos, the user hit rate will be greatly increased, the return rate is reduced, the user experience is improved, and the network device consumption is reduced.
- the method in the embodiment of the present invention can generate different code rate videos by performing feature extraction on different levels of the scene.
- the same picture content is de-duplicated and represented by scene features, which can reduce the redundancy of redundant information of the multiple video frames. Therefore, in the encoding operation, the obtained scene feature and the compressed data amount of the reconstructed residual total are reduced relative to the compressed data amount of the original video frame, and the amount of data obtained after compression is reduced.
- each video frame is represented by a scene feature and a reconstructed residual. Since the reconstructed residual only includes the residual information other than the scene information, its amount of information is small and sparse, so that it can be predictively encoded with fewer codewords; the amount of encoded data is small and the compression ratio is high.
- the method of the embodiment of the present invention can effectively improve the compression efficiency of a video frame.
- the video codec device may perform a decompression operation on the compressed encoded data.
- FIG. 6 is a flowchart of a method for decoding a video according to an embodiment of the present invention.
- a video decoding method according to an embodiment of the present invention includes:
- Step 601 Acquire video compression data.
- the decoding device acquires video compressed data, which may be video compressed data obtained by the video encoding method of the embodiment shown in FIG. 5.
- Step 602 Perform entropy decoding, inverse quantization processing, and inverse DCT transform on the video compressed data to obtain predictive encoded data.
- the prediction encoded data includes scene feature prediction encoded data, residual prediction encoded data, B-frame predictive encoded data, and P-frame predictive encoded data.
- the video compression data needs to be entropy decoded, inverse quantized, and inverse DCT transformed according to the HEVC decoding process to obtain the corresponding predictive encoded data.
- Step 603 Decode the scene feature prediction encoded data to obtain a scene feature.
- the scene feature is used to represent the same picture content shared by the video frames; the scene feature obtained by decoding the scene feature prediction encoded data represents the same picture content shared among the plurality of video frames.
- Step 604 Decode the residual prediction encoded data to obtain a reconstructed residual.
- the reconstructed residual is used to represent the difference between the video frame and the scene information.
- the scene feature prediction encoded data and the key frame error prediction encoded data are respectively decoded to obtain a scene feature matrix F and a reconstructed residual e i .
- Step 605 Perform reconstruction according to the scene feature and the reconstructed residual to obtain multiple I frames.
- in the encoding method, the key frames were reconstructed to obtain the scene feature and the reconstruction residuals; therefore, in the decoding method, reconstructing from the scene feature and the reconstruction residuals yields the multiple key frames.
- Step 606 Perform inter-frame decoding on the B frame predictive coded data and the P frame predictive coded data by using the I frame as a reference frame to obtain a B frame and a P frame.
- Step 607 Arranging the I frame, the B frame, and the P frame in chronological order to obtain a video stream.
- the video streams are obtained by arranging the three types of video frames in chronological order.
- the original data reconstruction is performed in combination with the decoded scene feature F and the key frame error e i to obtain key frame decoded data.
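- an illustrative decoder-side sketch, assuming as in the encoding sketch that the key frames were stacked as columns of D = F + E:

```python
import numpy as np

def rebuild_key_frames(F, E, frame_shape):
    """Decoder side: each key frame is one column of D = F + E,
    reshaped back to its original frame size."""
    D = F + E
    return [D[:, i].reshape(frame_shape) for i in range(D.shape[1])]
```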
- BP frame decoding is performed according to the decoded key frame data, and the decoded data frames are arranged in chronological order to obtain a complete sequence of the original video.
- after the scene feature prediction coded data and the residual prediction coded data are obtained, the data can be decoded by the video decoding method shown in FIG. 6 to obtain the video frames.
- the embodiment shown in FIG. 5 is mainly applied to perform efficient compression in a redundant scenario in which overall information between key frames exists.
- the embodiment shown in FIG. 12 is applied to perform efficient compression in a redundant scene where local information of key frames exists, and the local information may be, for example, a texture image, a lens gradation, or the like.
- FIG. 12 is a flowchart of a method for a video encoding method according to an embodiment of the present invention.
- referring to FIG. 12, a video encoding method provided by an embodiment of the present invention includes:
- Step 1201 Acquire a video stream.
- step 1201 For details of the implementation of step 1201, reference may be made to step 501.
- Step 1202 Perform lens segmentation on the video stream to obtain multiple shots.
- step 1202 The implementation details of step 1202 can be referred to step 502.
- Step 1203 Extract key frames from the obtained shots.
- step 1203 For details of the implementation of step 1203, refer to step 503.
- the video frames to be reconstructed may also be acquired by other methods; for example, the video frames of the video stream include I frames, B frames, and P frames, the I frames are extracted from the video stream, and the subsequent step of splitting each of the plurality of video frames to obtain a plurality of frame sub-blocks is performed on the I frames.
- Step 1204 classify a plurality of key frames based on the correlation of the picture content to obtain key frames of one or more classification clusters.
- step 1204 The implementation details of step 1204 can be referred to step 504.
- the method of the embodiment of the present invention may perform a reconstruction operation on a key frame in which frame local information has redundant data.
- the key frames need to be detected first to determine whether the currently selected key frames are suitable for the reconstruction operation of the method of the embodiment of the present invention. That is, before each of the plurality of video frames is split to obtain a plurality of frame sub-blocks, the method of the embodiment of the present invention further includes: extracting picture feature information of each of the plurality of video frames, where the extracted picture feature information may be a global feature or a local feature of the video frame, specifically a GIST global feature, a HOG global feature, a SIFT local feature, or the like, which is not specifically limited in the embodiment of the present invention.
- the encoding device then calculates content metric information according to the picture feature information, where the content metric information is used to measure the difference in picture content among the multiple video frames, that is, a content consistency measurement of the key frames; the content consistency of the key frames can be measured by feature variance, Euclidean distance, and the like.
- when the content metric information is greater than the preset metric threshold, the step of splitting each of the plurality of video frames to obtain a plurality of frame sub-blocks is performed.
- the method of the embodiment of the present invention further includes:
- Step E1 Extract global GIST features of each of the plurality of video frames.
- step E1 is to extract global GIST features for each of a plurality of key frames of the same cluster. This global GIST feature is used to describe the characteristics of keyframes.
- Step E2 Calculate the variance of the scene GIST feature according to the global GIST feature.
- the scene GIST feature variance is used to measure the content consistency of multiple video frames
- the scene GIST feature variance is used to measure the content consistency of multiple key frames of the same cluster.
- Step E3 When the scene GIST feature variance is greater than the preset variance threshold, step 1205 is performed.
- the video frames in the steps E1 to E3 are key frames in the HEVC scenario.
- the key frames are key frames belonging to the same cluster.
- steps E1 to E3 are specific methods for determining whether the key frame of the same cluster is applicable to step 1205. If the variance of the scene GIST feature of the plurality of key frames is greater than the preset variance threshold, it indicates that the local part of the frame picture of the multiple key frames has redundant data, so that step 1205 or step 1206 can be performed on the multiple key frames to Reduce the redundancy of these local redundant data.
- Step 1205 Split each video frame in multiple video frames to obtain multiple frame sub-blocks.
- the encoding device splits multiple key frames of the same cluster to obtain a plurality of frame sub-blocks.
- each of the plurality of video frames includes redundant data at a local location, that is, redundant data exists between different video frames and within a video frame, and this redundant data lies in a local location of the frame picture.
- one video frame has a window image in the lower part of the frame, and the other video frame has the same window image in the upper part of the frame.
- the window image constitutes redundant data.
- therefore, the video frames can be split first; the frame picture then becomes a set of frame sub-block pictures, and the granularity of the redundant data relative to the frame picture is reduced, which facilitates the acquisition of the scene feature bases. For the scene feature base, see the description of step 1206.
- the plurality of frame sub-blocks obtained by the splitting may be equal in size or unequal.
- the frame sub-blocks may be pre-processed, such as zooming in or out.
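- an illustrative Python sketch of the splitting step and of assembling the observation matrix used in step 1206; the block size is an example value and frames are assumed to divide evenly into blocks:

```python
import numpy as np

def split_into_subblocks(frame, block_h=64, block_w=64):
    """Split one (grayscale) key frame into equally sized sub-blocks,
    numbered in row-major order; the frame is assumed to divide evenly."""
    h, w = frame.shape
    return [frame[y:y + block_h, x:x + block_w]
            for y in range(0, h, block_h)
            for x in range(0, w, block_w)]

def build_observation_matrix(key_frames, block_h=64, block_w=64):
    """Pull every sub-block of every key frame into a column vector and
    stack the columns into the observation matrix D used in step 1206."""
    columns = [block.astype(float).ravel()
               for kf in key_frames
               for block in split_into_subblocks(kf, block_h, block_w)]
    return np.stack(columns, axis=1)
```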
- Step 1206 Perform reconstruction on multiple frame sub-blocks to obtain a scene feature, a representation coefficient of each frame sub-block in the plurality of frame sub-blocks, and a reconstruction residual of each frame sub-block.
- the scene feature includes multiple independent scene feature bases, and the independent scene feature bases in the scene feature cannot be reconstructed from each other.
- the scene feature base is used to describe the picture content features of a frame sub-block, the representation coefficients represent the correspondence between the scene feature bases and the frame sub-blocks, and the reconstructed residual represents the difference between a frame sub-block and its scene feature base.
- the reconstructed residual may be a specific value or zero.
- the representation coefficients may be stored in separate fields and conveyed as encoded auxiliary information, for example by adding corresponding fields in the picture header, slice header, or macroblock information.
- the scene feature base can be configured in various forms, for example, it can be a certain frame sub-block, or a feature block in a specific space.
- Multiple scene feature bases may constitute scene features.
- different scene feature bases cannot be reconstructed from each other, and thus these scene feature bases constitute a basic image unit.
- the basic image unit and the corresponding reconstructed residual combination can obtain a certain frame sub-block. Since there are multiple basic image units, it is necessary to represent the coefficients to match the scene feature base and the reconstructed residual corresponding to the same frame sub-block.
- one frame sub-block may correspond to one scene feature base, or may correspond to multiple scene feature bases. When multiple scene feature bases correspond to one frame sub-block, the scene feature bases are superimposed on each other and reconstructed residuals are performed. The reconstructed frame sub-block is obtained.
- the scene features are composed of scene feature bases that cannot be reconstructed from one another, and the additional reconstruction residual parameters represent the difference between each frame sub-block and its scene feature base; in this way, for the same picture content shared by multiple frames, the scene feature may record only one of the identical scene feature bases, so that the scene information includes data obtained by reducing the redundancy of the redundant data.
- the data of the frame sub-blocks is thus converted into data composed of the reconstructed residuals and the scene features, and the redundancy of the redundant data is reduced.
- the video coding method of the embodiment of the present invention may refer to FIG. 3b.
- compared with that figure, the method here further includes the representation coefficient C. For example, after scene reconstruction is performed on the key frames of scene 1, the representation coefficients are obtained, where C1, C3, and C5 are the representation coefficients of the key frames I1, I3, and I5, respectively.
- Step 1205 and step 1206 described above are one of the specific forms of the steps of reconstructing a plurality of video frames to obtain scene information and reconstruction residuals of each video frame.
- the encoding apparatus reconstructs a plurality of frame sub-blocks to obtain a representation coefficient of each frame sub-block of the plurality of frame sub-blocks and a reconstruction residual of each frame sub-block.
- the representation coefficient represents a correspondence between a frame sub-block and a target frame sub-block
- the target frame sub-block is an independent frame sub-block among the plurality of frame sub-blocks
- an independent frame sub-block is a frame sub-block that cannot be reconstructed from the other frame sub-blocks among the plurality of frame sub-blocks
- the reconstructed residual represents the difference between the frame sub-block and the target frame sub-block.
- the encoding device combines a plurality of target frame sub-blocks indicating the coefficient indication to obtain a scene feature, and the target frame sub-block is a scene feature base.
- the independently represented frame sub-blocks are determined by the reconstruction operation, and these independently represented frame sub-blocks are referred to here as target frame sub-blocks.
- the obtained multiple frame sub-blocks include target frame sub-blocks and non-target frame sub-blocks; a target frame sub-block cannot be reconstructed based on other target frame sub-blocks, while a non-target frame sub-block can be obtained based on the target frame sub-blocks.
- the scene features are composed of target frame sub-blocks, which can reduce the redundancy of redundant data. Because the scene feature base is the original frame sub-block, the scene feature base constituting the scene feature can be determined according to the indication of the representation coefficient.
- for example, one of two frame sub-blocks includes a window pattern 1301, and adding the door image 1303 to that frame sub-block yields the other frame sub-block, so the former frame sub-block is the target frame sub-block 1302 and the latter frame sub-block is a non-target frame sub-block 1304.
- the non-target frame sub-block can be reconstructed from the target frame sub-block and the reconstruction residual corresponding to the door pattern, so that in the scene including these two frame sub-blocks, the window pattern shared by the two frame sub-blocks is redundant data.
- after encoding, what is obtained is the target frame sub-block, the reconstructed residual of the door, and two representation coefficients: one representation coefficient indicates the target frame sub-block itself, and the other indicates the correspondence between the target frame sub-block and the reconstructed residual of the door.
- the target frame sub-block is a scene feature base.
- at decoding, one frame sub-block is obtained as the target frame sub-block according to the representation coefficient indicating the target frame sub-block itself, and the other frame sub-block is obtained by reconstructing the target frame sub-block with the reconstructed residual of the door according to the representation coefficient indicating their correspondence.
- reconstructing a plurality of frame sub-blocks to obtain a representation coefficient of each frame sub-block of the plurality of frame sub-blocks and a reconstruction residual of each frame sub-block including:
- the representation coefficient matrix is a matrix including the representation coefficients of each of the plurality of frame sub-blocks, where a non-zero coefficient among the representation coefficients indicates a target frame sub-block
- the reconstructed residual matrix is used to represent the reconstructed residual of each frame sub-block in a matrix form
- the second constraint is used to define a low rank and sparsity of the representation coefficient.
- the target frame sub-blocks indicated by the non-zero representation coefficients in the coefficient matrix are combined to obtain the scene feature.
- reconstructing the observation matrix according to the second constraint condition to obtain the representation coefficient matrix and the reconstructed residual matrix includes: calculating the representation coefficient matrix and the reconstructed residual matrix according to a second preset formula, described by the optimization problem given below.
- a scene S contains N key frames, that is, the same cluster includes N key frames, and N is a natural number.
- each sub-block is pulled into a column vector to form the observation matrix D. Since there is a large amount of redundancy in the information content between key frames, the matrix can be regarded as a union of a plurality of subspaces.
- the goal of scene reconstruction is to find these independent subspaces and solve the representation coefficients of the observation matrix D in these independent subspaces.
- Space refers to a collection with some specific properties.
- the observation matrix D contains a plurality of image feature vectors, and the representation space formed by these vectors is a full space.
- a subspace is a partial space whose dimension is smaller than that of the full space. This subspace is the space formed by independent frame sub-blocks.
- the scene reconstruction problem can be transformed into the following optimization problem to describe:
- C is the coefficient of representation.
- the scene features corresponding to each subspace can be obtained.
- the number of non-zero coefficients in C corresponds one-to-one to the number of scene feature bases.
- the representation coefficient of this embodiment refers to a coefficient matrix (or vector) represented by each scene feature base in the scene feature in the key frame reconstruction process, that is, a correspondence relationship between the frame sub-block and the scene feature base.
- the representation coefficient between different independent frame sub-blocks is usually 0.
- the grass image does not contain the lake scene feature, so the coefficient of the image block represented by the lake scene feature is usually zero.
- each frame sub-block in the observation matrix D can be represented by the other frame sub-blocks in the matrix D; an independent frame sub-block is represented by itself.
- Each column in the representation coefficient matrix C is a representation coefficient of a frame sub-block
- the two weight parameters adjust the coefficient sparsity and low rank.
- the above optimization problem can be solved by matrix optimization algorithms such as APG and IALM.
- the final scene feature consists of the feature bases corresponding to the non-zero coefficients in C.
- the representation coefficients need to be sparsely constrained; that is, the representation coefficients of frame sub-blocks belonging to the same type of scene (for example, both grassland) are not only strongly correlated, but also mostly 0, with only a small portion being non-zero.
- the image sub-blocks corresponding to these non-zero representation coefficients are the scene features that eventually need to be encoded.
- For example, if the representation coefficient c1_2 is not 0, the scene feature base is d2; that is, the frame sub-block d1 can be represented based on the frame sub-block d2, the frame sub-block d2 is an independent frame sub-block, and the reconstruction residual between the frame sub-block d1 and the frame sub-block d2 is obtained by reconstruction.
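- Since the second preset formula is not reproduced in this text, the following is a minimal Python sketch of the kind of sparse self-representation described above: each sub-block column of D is represented by the other columns, and whatever cannot be represented is kept as the reconstruction residual. Only the sparsity term is modeled here; the low-rank constraint, which the text says can be handled with solvers such as APG or IALM, is omitted. The use of scikit-learn's Lasso and all names below are illustrative assumptions, not the patent's exact algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_self_representation(D, alpha=0.1):
    """Represent each frame sub-block (column of D) by the other columns.

    Returns the representation coefficient matrix C and the reconstruction
    residual matrix E, with D ~= D @ C + E.
    """
    n = D.shape[1]
    C = np.zeros((n, n))
    for j in range(n):
        mask = np.arange(n) != j                  # exclude the column itself
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(D[:, mask], D[:, j])
        C[mask, j] = lasso.coef_
    E = D - D @ C                                 # reconstruction residuals
    return C, E

def scene_feature_bases(D, C, tol=1e-6):
    """Columns of D referenced by non-zero coefficients act as scene feature bases."""
    used = np.abs(C).max(axis=1) > tol
    return np.where(used)[0], D[:, used]

# toy usage: 6 sub-blocks of dimension 16, one of them nearly duplicating another
rng = np.random.default_rng(0)
D = rng.normal(size=(16, 6))
D[:, 5] = D[:, 0] + 0.01 * rng.normal(size=16)    # redundant sub-block
C, E = sparse_self_representation(D, alpha=0.05)
idx, F = scene_feature_bases(D, C)
```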
- the embodiment of the present invention converts the information amount of the I frame into the scene feature base information and the residual matrix information.
- the redundancy of the I frame information amount is reflected in the scene feature base and the residual matrix information.
- Multiple I frames have the same scene feature base, so only the scene feature base needs to be encoded once to greatly reduce the amount of encoded data.
- each sub-block is first reconstructed according to the decoded scene features, representation coefficients and reconstruction errors, and then the sub-blocks are combined by number to obtain the final key frame content.
- Figure 14 shows an example of scene reconstruction based on local information representation.
- the frame sub-blocks may also be arranged in a preset order without using sub-block numbers, and the reconstructed frame sub-blocks are combined according to the preset rule.
- the video sub-blocks can be combined to obtain a video frame.
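- As a rough illustration of the sub-block combination just described, the sketch below reassembles already-reconstructed sub-block column vectors into a key frame by their raster-order numbers; the frame and block dimensions are assumed inputs for the example, not values taken from the patent.

```python
import numpy as np

def assemble_key_frame(subblocks, frame_shape, block_shape):
    """Place reconstructed sub-blocks back into a key frame by their number.

    subblocks: array of shape (block_h * block_w, M), column j being the
               sub-block numbered j in raster order. Assumes the frame is an
               exact multiple of the block size.
    """
    bh, bw = block_shape
    H, W = frame_shape
    frame = np.zeros((H, W))
    blocks_per_row = W // bw
    for j in range(subblocks.shape[1]):
        r, c = divmod(j, blocks_per_row)
        frame[r*bh:(r+1)*bh, c*bw:(c+1)*bw] = subblocks[:, j].reshape(bh, bw)
    return frame
```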
- This implementation can mine the texture structure existing in the key frame. If there are a large number of texture features in the scene, the representation coefficient C obtained by the above formula will be low rank and sparse.
- the feature base corresponding to the sparse coefficient is the basic unit of the scene texture structure.
- Figure 15 shows an example diagram of local feature reconstruction under a texture scene.
- the scene content is represented and reconstructed according to the underlying data features of the image.
- the following implementations will use higher-level semantic features to describe and reconstruct the content of the scene to achieve data compression.
- Specific models include Sparse Coding (SC), Deep Neural Network (DNN), Convolutional Neural Network (CNN), Stacked Auto Encoder (SAE), and so on.
- the encoding device reconstructs a plurality of frame sub-blocks to obtain a scene feature and a representation coefficient of each frame sub-block of the plurality of frame sub-blocks.
- the scene feature includes a scene feature set as an independent feature block in the feature space, and the independent feature block is a feature block that cannot be reconstructed by other feature blocks in the scene feature.
- the encoding device calculates the reconstruction residual of each frame sub-block according to each frame sub-block and the data reconstructed from the scene feature and the representation coefficient of that frame sub-block.
- the scene feature base is an independent feature block in the feature space.
- the feature space may be an RGB color space, an HSI color space, a YUV color space, etc. Different frame sub-blocks may not appear to have the same picture, but after the high-level mapping the same feature blocks are formed. These same feature blocks constitute redundant data, and the scene features record each of these same feature blocks only once, thereby reducing the redundancy between the frame sub-blocks.
- such a scene feature is similar to a dictionary consisting of feature blocks; reconstructing a frame sub-block amounts to selecting the needed feature blocks from the dictionary and adding the corresponding reconstruction residual.
- one frame sub-block can correspond to multiple feature blocks, and the multiple feature blocks are superimposed and reconstructed by the reconstructed residual to obtain a frame sub-block.
- the plurality of frame sub-blocks are reconstructed to obtain a scene feature and a representation coefficient of each frame sub-block of the plurality of frame sub-blocks, including:
- the coefficient matrix is a matrix including the representation coefficients of each frame sub-block, and a non-zero representation coefficient indicates a scene feature base
- the scene feature matrix is used to represent the scene features in a matrix form
- the third constraint condition is used to constrain, according to a preset similarity threshold, the similarity between the picture reconstructed from the representation coefficient matrix and the scene feature matrix and the original frame sub-block, to constrain the sparsity of the representation coefficient matrix to conform to a preset sparsity threshold, and to constrain the data amount of the scene feature matrix to be smaller than a preset data volume threshold;
- calculating the reconstruction residual of each frame sub-block according to each frame sub-block and the data reconstructed from the scene feature and the representation coefficient of each frame sub-block includes:
- the reconstruction residual matrix is calculated according to the observation matrix and the data reconstructed from the coefficient matrix and the scene feature matrix, and the reconstruction residual matrix is used to represent the reconstruction residuals in a matrix form.
- the observation matrix is reconstructed according to the third constraint condition, and the representation coefficient matrix and the scene feature matrix are obtained, including:
- the representation coefficient matrix and the scene feature matrix are calculated according to the third preset formula.
- D is the observation matrix
- C is the coefficient matrix
- F is the scene feature
- the weight parameters are used to adjust the coefficient sparsity and low rank.
- a sparse coding model is used for modeling and analysis for description.
- a scene S contains N key frames, and each key frame is evenly split into M equal-sized frame sub-blocks. Each frame sub-block is pulled into a column vector, and these column vectors form an observation matrix D.
- in the formula, D is the observation matrix, the two weight parameters are the regularization weights, and the variables to be optimized are the scene feature F and the representation coefficients C.
- the first item in the objective function is to constrain the reconstruction error, so that the picture reconstructed by the scene feature and the representation coefficient is as similar as possible to the original picture.
- the second term is the sparse constraint on the coefficient C, which means that each picture can be reconstructed with a small number of feature bases.
- the last term constrains the scene feature F to prevent the amount of F data from being too large; that is, the first term of the formula is the error term, and the last two terms are regularization terms that constrain the representation coefficients and the scene features.
- the specific optimization algorithm can adopt the methods of conjugate gradient method, OMP (Orthogonal Matching Pursuit), LASSO and the like.
- the scene features obtained by the final solution are shown in Fig. 16.
- the dimension and number of the F matrix are consistent with the dimensions of the frame sub-block.
- each small frame of FIG. 16 is a scene feature base
- the scene feature matrix F is a matrix composed of small frames (scene feature bases)
- reconstruction can be written as D ≈ F·C = F·[c1, c2, c3, ...], where F·c1 means that the representation coefficient c1 linearly combines the scene feature bases in the feature space, and adding the reconstruction residual e1 restores the original frame sub-block image I1.
- the scene feature base is directly determined by the observation sample D. That is, the scene feature base is selected from the observation sample D.
- the scene features in this example are learned according to the algorithm.
- in the optimization process of the parameter F, an iterative solution is performed according to the objective function, and the optimization result minimizes the reconstruction error.
- the amount of coded information is concentrated on F, E.
- the dimension of F is consistent with the dimension of the frame sub-block, and the number of scene feature bases in F can be set in advance. The fewer the bases, the less the coding information, but the larger the reconstruction residual E; the more the bases, the more the coding information, but the smaller the reconstruction residual E. The number of bases therefore needs to be traded off via the weight parameter.
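- Since the third preset formula is likewise described only in words above (an error term plus regularization terms on C and F, solvable with conjugate gradients, OMP or LASSO), the following sketch uses scikit-learn's DictionaryLearning as one possible stand-in for learning the scene feature matrix F and the representation coefficients C. The parameter n_bases corresponds to the preset number of scene feature bases discussed above; everything else is an illustrative assumption rather than the patent's algorithm.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

def learn_scene_features(D, n_bases=64, sparsity=5):
    """Learn a scene feature matrix F and coefficients C so that D ~= F @ C + E."""
    # DictionaryLearning expects samples as rows, so columns of D are transposed.
    dl = DictionaryLearning(n_components=n_bases,
                            fit_algorithm="lars",
                            transform_algorithm="omp",
                            transform_n_nonzero_coefs=sparsity,
                            max_iter=50)
    C_rows = dl.fit_transform(D.T)        # one row of coefficients per sub-block
    F = dl.components_.T                  # scene feature bases as columns
    C = C_rows.T                          # coefficient columns, one per sub-block
    E = D - F @ C                         # reconstruction residual matrix
    return F, C, E

# Fewer bases (n_bases) means less scene feature data to encode but a larger
# residual E; more bases means the opposite, matching the trade-off above.
```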
- Step 1207 Perform predictive coding on the feature of the scene to obtain scene feature prediction encoded data.
- Step 1208 Perform predictive coding on the reconstructed residual to obtain residual prediction encoded data.
- the predictive coding portion of the encoding device includes two parts of intra prediction coding and inter prediction coding.
- the scene features and reconstruction errors are encoded by intra prediction, and the remaining frames of the shot, that is, the non-key frames of the shot, are inter-predictive encoded.
- the specific process of intra prediction coding is similar to the HEVC intra coding module. Since the scene feature matrix has a low rank, only the key columns of the scene feature matrix need to be encoded.
- the reconstruction error belongs to residual coding, and the amount of coded data is small and the compression ratio is high.
- Step 1209 Perform reconstruction according to the scene feature, the representation coefficient, and the reconstruction residual to obtain a reference frame.
- For the specific implementation of step 1209, reference may be made to step 508.
- Step 1210 Perform reference frame prediction on the B frame and the P frame by using the reference frame as a reference, and obtain B frame predictive coded data and P frame predictive coded data.
- For the specific implementation of step 1210, reference may be made to step 509.
- Step 1211 Perform transform coding, quantization coding, and entropy coding on the predictive coded data to obtain video compressed data.
- the predictive coded data includes scene feature predictive coded data, residual predictive coded data, B frame predictive coded data, and P frame predictive coded data.
- For the specific implementation of step 1211, reference may be made to step 510.
- the embodiment shown in FIG. 12 is described based on the HEVC scenario, but the video encoding method shown in FIG. 12 can also be applied to other scenarios.
- the encoding device acquires a plurality of video frames, and each of the plurality of video frames includes redundant data on the picture content; in particular, the plurality of video frames include redundant data at mutually corresponding local locations.
- the encoding device splits each video frame in the plurality of video frames to obtain a plurality of frame sub-blocks, and then reconstructs the plurality of frame sub-blocks to obtain scene features as well as representation coefficients and reconstruction residuals of the plurality of frame sub-blocks.
- the scene feature includes multiple independent scene feature bases, and the independent scene feature bases in the scene feature cannot be reconstructed from each other.
- the scene feature base is used to describe the picture content features of the frame sub-blocks, the representation coefficients represent the correspondence between the scene feature bases and the frame sub-blocks, and the reconstruction residual represents the difference between the frame sub-block and the scene feature base.
- the scene features are predictively coded to obtain scene feature prediction coded data, and the reconstruction residual is predictively coded to obtain residual prediction coded data.
- the redundancy of the redundant data included in the local locations is reduced; therefore, in the encoding operation, the total compressed data amount of the obtained scene features and reconstruction residuals is reduced relative to the compressed data amount of the original video frames, reducing the amount of data obtained after compression.
- each video frame is reconstructed into a scene feature and a reconstruction residual. Since the reconstruction residual includes only the residual information other than the scene information, its amount of information is small and sparse, and it can be predictively encoded with fewer codewords, so the amount of encoded data is small and the compression ratio is high.
- the method of the embodiment of the present invention can effectively improve the compression efficiency of a video frame.
- FIG. 17 shows a video decoding method.
- the video decoding method in the embodiment of the present invention includes:
- Step 1701 Acquire scene feature prediction encoded data, residual prediction encoded data, and representation coefficients.
- the decoding device acquires video compressed data, which may be video compressed data obtained by the video encoding method of the embodiment shown in FIG.
- acquiring the scene feature prediction encoded data and the residual prediction encoded data includes: acquiring video compressed data, and then performing entropy decoding, inverse quantization, and an inverse DCT transform on the video compressed data to obtain the prediction encoded data.
- the prediction encoded data includes scene feature prediction encoded data, residual prediction encoded data, B frame predictive encoded data, and P frame predictive encoded data;
- Step 1702 Decode the scene feature prediction encoded data to obtain a scene feature.
- the scene feature includes multiple independent scene feature bases, and the independent scene feature bases in the scene feature cannot be reconstructed from each other.
- the scene feature base is used to describe the picture content features of the frame sub-blocks, the representation coefficients represent the correspondence between the scene feature bases and the frame sub-blocks, and the reconstruction residual represents the difference between the frame sub-block and the scene feature base.
- Step 1703 Decode the residual prediction encoded data to obtain a reconstructed residual.
- the reconstructed residual is used to represent the difference between the video frame and the scene information.
- Step 1704 Perform reconstruction according to the scene feature, the representation coefficient, and the reconstruction residual to obtain a plurality of frame sub-blocks.
- in the video decoding method of the embodiment of the present invention, a plurality of frame sub-blocks may be reconstructed according to the scene feature, the representation coefficients, and the reconstruction residual.
- the method of the embodiment of the present invention may refer to FIG. 4b; after decoding the scene feature, the representation coefficients are used to determine the required scene feature bases in the scene feature. For example, using F1*[C1, C3, C5]^T and then adding the reconstruction residuals E1, E3, and E5 respectively, the key frames I1, I3, and I5 are obtained.
- C1, C3, and C5 are the representation coefficients of the key frames I1, I3, and I5, respectively.
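- A minimal sketch of this reconstruction step, assuming F1 is the decoded scene feature matrix and the coefficient vectors and residuals have already been decoded; the dictionary-of-frames structure below is purely illustrative.

```python
import numpy as np

def reconstruct_key_frames(F1, coeffs, residuals):
    """Reconstruct key frames I_k ~= F1 @ c_k + e_k for each key frame k."""
    return {k: F1 @ c + residuals[k] for k, c in coeffs.items()}

# e.g. keys "I1", "I3", "I5" with coefficient vectors C1, C3, C5 and residuals
# E1, E3, E5 recovered from the residual prediction encoded data.
```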
- Step 1705 Combining a plurality of frame sub-blocks to obtain a plurality of video frames.
- Step 1704 and step 1705 are specific implementations of the steps of reconstructing the video information according to the scene information and the reconstructed residual.
- a plurality of frame sub-blocks are combined to obtain a plurality of video frames, and a plurality of frame sub-blocks are combined to obtain a plurality of I frames.
- each sub-block is first reconstructed according to the decoded scene features, representation coefficients, and reconstruction errors, and then the sub-blocks are combined by number to obtain the final key frame content.
- the method of the embodiment of the present invention further includes: inter-frame decoding the B frame predictive encoded data and the P frame predictive encoded data by using the I frame as a reference frame to obtain a B frame and a P frame. Then, the decoding device arranges the I frame, the B frame, and the P frame in chronological order to obtain a video stream.
- the video frame can be decoded by the video decoding method of the embodiment shown in FIG.
- the video frames on which the reconstruction operation is performed may be obtained by extracting video frames from an acquired video stream, or may be obtained directly.
- the video frames may also be obtained by acquiring compressed video frames and then decompressing them.
- step 201 can be implemented by the following steps:
- Step F1 Acquire a compressed video stream.
- the compressed video stream includes a compressed video frame.
- the compressed video stream can be, for example, a HEVC compressed video stream.
- Step F2 Determine a plurality of target video frames from the compressed video stream.
- the target video frame is an independently compression-encoded video frame in the compressed video stream.
- Step F3 Decoding the target video frame to obtain a decoded target video frame.
- the decoded target video frame is used to perform step 202.
- the video frames may be classified. For details, refer to step 504.
- the compression efficiency of these video frames can be improved, and the compressed data amount of these video frames can be reduced.
- the embodiment of the present invention may perform secondary compression on the HEVC compressed video stream. Specifically, after compressed video discrimination, I frame extraction, and intra-frame decoding, an I frame to be used to perform the method of the embodiment of the present invention is obtained.
- the method of the embodiment of the present invention may be implemented by adding a compressed video discrimination module, an I frame extraction module, and an intra-frame decoding module to the original video encoding device.
- an I frame extraction operation is performed. Since HEVC compressed video adopts a hierarchical code stream structure, independent GOP data is extracted according to the group-of-pictures header in the group-of-pictures layer of the code stream hierarchy. Then each frame of the GOP is extracted according to the picture header; the first frame of the GOP is an I frame, so the I frame can be extracted.
- the decoding device performs intra-frame decoding on the extracted I-frame encoded data to obtain the decoded I frame.
- the subsequent I frame and residual encoding and decoding steps can refer to the encoding and decoding operations above. In this way, secondary encoding and decoding of the compressed video can be performed on the basis of the original video encoded data.
- the method of the invention can perform secondary encoding and decoding on existing compressed video data, and is consistent with the traditional HEVC method in the processes of transform coding, quantization coding, entropy coding, and so on; therefore, when the functional modules of the present invention are deployed, they can be compatible with legacy video compression devices.
- the method of the embodiment of the present invention can also be applied to other encoded data: the compressed video frames are extracted and decoded according to the above steps, and then the steps of the video encoding methods of FIG. 2, FIG. 5 and FIG. 12 described above are performed.
- the I frame can be determined according to the size of the compressed image data, and usually the I frame encoded data is much larger than the P frame and the B frame encoded data.
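- As a sketch of the size-based heuristic mentioned here, the function below flags frames whose encoded size is much larger than that of their neighbours; the per-frame sizes are assumed to come from an existing demuxer, and the threshold ratio is an arbitrary illustrative choice rather than a value from the patent.

```python
def find_likely_i_frames(frame_sizes, ratio=3.0):
    """Heuristically flag I frames: frames whose encoded size greatly exceeds
    the average size of surrounding frames (usually P and B frames)."""
    flagged = []
    for i, size in enumerate(frame_sizes):
        window = frame_sizes[max(0, i - 8):i] + frame_sizes[i + 1:i + 9]
        avg = sum(window) / len(window) if window else size
        if size > ratio * avg:
            flagged.append(i)
    return flagged

# frame_sizes would be a list of per-frame packet sizes from an existing
# demuxer; flagged indices are candidates for the I frame extraction step above.
```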
- FIG. 18 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present invention.
- FIG. 18b is a schematic diagram showing a partial structure of a video encoding apparatus according to the embodiment shown in FIG. 18a.
- the video encoding apparatus can be used to perform the video encoding method in the foregoing embodiments.
- the video encoding apparatus includes: an obtaining module 1801, a reconstruction module 1802, and a prediction encoding module 1803.
- the obtaining module 1801 is configured to perform a process of acquiring a video frame in an embodiment of each of the foregoing video encoding methods.
- the reconstruction module 1802 is configured to perform a process related to the reconfiguration operation to reduce the redundancy of the redundant data in the embodiments of the foregoing video coding methods, for example, step 202, step 505, and step 1206.
- the prediction encoding module 1803 is configured to perform steps of predictive encoding, such as step 203 and step 204, in an embodiment of each of the above video encoding methods.
- the reconstruction module 1802 obtains the scene information and the reconstruction residual after performing the reconstruction operation on the plurality of video frames acquired by the obtaining module 1801, so that the prediction encoding module 1803 predictively encodes the scene information and the reconstructed residual.
- the video encoding device further includes a feature extraction module 1804 and a metric information calculation module 1805 between the obtaining module 1801 and the reconstruction module 1802.
- the feature extraction module 1804 is configured to perform a process of extracting picture feature information of a video frame in an embodiment of each of the above video coding methods, for example, steps D1 and E1.
- the metric information calculation module 1805 is configured to perform a process of calculating the content metric information in the embodiment of each of the above video coding methods, for example, steps D2 and E2.
- the video encoding device further includes:
- a reference frame reconstruction module 1806, configured to perform a process of reconstructing a reference frame in an embodiment of each of the foregoing video coding methods
- the inter prediction encoding module 1807 is configured to perform a process related to inter prediction encoding in the embodiments of the foregoing video encoding methods.
- the encoding module 1808 is configured to perform a process of transform coding, quantization coding, and entropy coding in the embodiments of the foregoing video coding methods.
- the reconstruction module 1802 further includes a splitting unit 1809 and a reconstruction unit 1810.
- the reconstruction unit 1810 may reconstruct the frame sub-blocks obtained by the splitting unit 1809.
- the splitting unit 1809 is configured to perform a process of splitting a video frame in an embodiment of each of the video encoding methods described above, for example, step 1206.
- the reconstruction unit 1810 is configured to perform a process of reconstructing a frame sub-block in an embodiment of each of the foregoing video coding methods, for example, step 1206;
- the reconstruction unit 1810 includes a reconstruction subunit 1811 and a combination subunit 1812.
- the reconstruction sub-unit 1811 is configured to perform a process of reconstructing a frame sub-block to obtain a representation coefficient and a reconstruction residual in an embodiment of each of the above video coding methods.
- the combining sub-unit 1812 is configured to perform a process of combining the target frame sub-blocks in the embodiment of each of the video encoding methods described above.
- the reconstruction unit 1810 may further include a sub-block reconstruction sub-unit 1813 and a sub-block calculation sub-unit 1814.
- the sub-block reconstruction sub-unit 1813 is configured to perform a process of reconstructing frame sub-blocks to obtain a scene feature and representation coefficients in an embodiment of each of the foregoing video coding methods, where the scene feature includes scene feature bases that are independent feature blocks in the feature space.
- the sub-block calculation sub-unit 1814 is for performing a computational reconstruction residual processing procedure in an embodiment for performing the above-described respective video coding methods.
- the video encoding device further includes a classification module 1815 for performing a process involving classification in an embodiment of each of the video encoding methods described above.
- the classification module 1815 includes a feature extraction unit 1816, a distance calculation unit 1817, and a clustering unit 1818.
- the feature extraction unit 1816 is configured to extract feature information of each of the plurality of video frames, and the distance calculation unit 1817 is configured to perform a process related to the cluster distance in the embodiments of each of the video coding methods.
- the clustering unit 1818 is configured to perform a process involving clustering in an embodiment of each of the above video coding methods.
- the obtaining module 1801 includes the following units:
- a video stream obtaining unit 1819 configured to acquire a video stream
- a frame feature extraction unit 1820 configured to perform a process of extracting feature information of the first video frame and the second video frame in an embodiment of each of the foregoing video encoding methods
- a lens distance calculation unit 1821 configured to perform a process related to lens distance calculation in an embodiment of each of the above video coding methods
- the lens distance determining unit 1822 is configured to determine whether the lens distance is greater than a preset lens threshold
- a lens dividing unit 1823 configured to perform a process of dividing a target lens in an embodiment of each of the above video encoding methods
- the key frame extracting unit 1824 is configured to perform a process of extracting a key frame according to a frame distance in an embodiment of each of the above video encoding methods.
- the video encoding device further includes:
- the training module 1825 is configured to perform discriminant training according to each shot segmented from the video stream, to obtain a plurality of classifiers corresponding to the shots;
- a discriminating module 1826 configured to determine a target video frame by using a target classifier to obtain a discriminant score
- the scene determining module 1827 is configured to: when the discriminant score is greater than the preset score threshold, determine that the target video frame belongs to the same scene as the shot to which the target classifier belongs;
- the cluster determination module 1828 is configured to determine video frames of one or more clusters according to video frames belonging to the same scene as the shot.
- the obtaining module 1801 includes:
- a compressed video obtaining unit 1829 configured to acquire a compressed video stream, where the compressed video stream includes a compressed video frame
- a frame determining unit 1830 configured to determine, from the compressed video stream, a target video frame, where the target video frame is an independently compressed encoded video frame;
- the decoding unit 1831 is configured to decode the target video frame to obtain a decoded target video frame, where the decoded target video frame is used to perform the step of splitting each of the plurality of video frames to obtain a plurality of frame sub-blocks.
- the obtaining module 1801 acquires a plurality of video frames, and each of the plurality of video frames includes redundant data on the screen content. Then, the reconstruction module 1802 reconstructs the plurality of video frames to obtain scene information and a reconstruction residual of each video frame, where the scene information includes data obtained by reducing redundancy of redundant data, and reconstructing the residual The difference is used to represent the difference between the video frame and the scene information.
- the prediction encoding module 1803 performs predictive coding on the scene information to obtain scene feature prediction encoded data. The prediction encoding module 1803 performs predictive coding on the reconstructed residual to obtain residual prediction encoded data.
- in this way, the redundancy of the video frames can be reduced, so that in the encoding operation, the total compressed data amount of the obtained scene features and reconstruction residuals is reduced relative to the compressed data amount of the original video frames, reducing the amount of data obtained after compression.
- each video frame is reconstructed into a scene feature and a reconstruction residual. Since the reconstruction residual includes only the residual information other than the scene information, its amount of information is small and sparse, and it can be predictively encoded with fewer codewords, so the amount of encoded data is small and the compression ratio is high.
- the method of the embodiment of the present invention can effectively improve the compression efficiency of a video frame.
- FIG. 19 is a schematic structural diagram of a video decoding device according to an embodiment of the present invention.
- the video decoding device can be used to perform the video decoding method in the foregoing embodiments.
- the video decoding device includes: an obtaining module 1901, a scene information decoding module 1902, a reconstructed residual decoding module 1903, and a video frame reconstruction module. 1904.
- the scene information decoding module 1902 and the reconstructed residual decoding module 1903 respectively perform decoding operations on the scene feature prediction encoded data and the residual prediction encoded data acquired by the obtaining module 1901, so that the video frame reconstruction module 1904 can reconstruct the video frames using the decoded data.
- the obtaining module 1901 is configured to perform a process of acquiring encoded data in an embodiment of each of the foregoing video decoding methods, for example, step 205;
- the scene information decoding module 1902 is configured to perform a process related to decoding scene information in the embodiments of the foregoing video decoding methods, for example, step 206, step 603;
- the reconstructed residual decoding module 1903 is configured to perform a process of decoding the reconstructed residual in the embodiment of each of the foregoing video decoding methods, for example, step 207;
- the video frame reconstruction module 1904 is configured to perform a process of reconstructing a plurality of video frames in an embodiment of each of the video decoding methods, for example, step 208 and step 604.
- the obtaining module 1901 includes an obtaining unit 1905 and a decoding unit 1906.
- the obtaining unit 1905 is configured to perform a process of acquiring video compression data in an embodiment of each of the foregoing video decoding methods, for example, step 601.
- the decoding unit 1906 is configured to perform a process related to obtaining the predicted encoded data in the embodiment of each of the video decoding methods described above, for example, step 602.
- the video decoding apparatus further includes: an inter-frame decoding module 1907, configured to perform a process related to inter-frame decoding in an embodiment of each of the above video decoding methods, for example, step 606;
- the arranging module 1908 is configured to perform a process involving frame arrangement in the embodiment of each of the video decoding methods described above, for example, step 607.
- the obtaining module 1901 is further configured to acquire a representation coefficient.
- the video frame reconstruction module 1904 includes a reconstruction unit 1909 and a combination unit 1910.
- the reconstruction unit 1909 is configured to perform a process of reconstructing a plurality of frame sub-blocks in an embodiment of each of the video decoding methods, for example, step 1704.
- the combining unit 1910 is configured to perform a process of combining frame sub-blocks in an embodiment of each of the above video decoding methods, for example, step 1705.
- the scene information decoding module 1902 decodes the scene feature prediction encoded data to obtain scene information, where the scene information includes data obtained by reducing the redundancy of redundant data, and the redundant data is redundant data on the picture content between each of the plurality of video frames.
- the reconstructed residual decoding module 1903 decodes the residual prediction encoded data to obtain a reconstructed residual, and the reconstructed residual is used to represent the difference between the video frame and the scene information.
- a video frame reconstruction module 1904 configured to perform reconstruction according to the scene information and the reconstructed residual to obtain a plurality of video frames.
- FIG. 20 is a schematic structural diagram of a video codec device according to an embodiment of the present invention.
- the video encoding and decoding device can be used to perform the video encoding method and the video decoding method in the foregoing embodiments.
- the video encoding and decoding device 2000 includes a video encoding device 2001 and a video decoding device 2002.
- the video encoding device 2001 is the video encoding device of the embodiment shown in FIG. 18a and FIG. 18b above;
- the video decoding device 2002 is the video decoding device of the embodiment shown in Fig. 19 described above.
- the video encoding method and the video decoding method provided by the embodiments of the present invention are described below in the hardware architecture.
- a video encoding and decoding system is provided.
- the video frame encoding and decoding system includes a video encoder and a video decoder.
- video codec system 10 includes source device 12 and destination device 14.
- Source device 12 produces encoded video data.
- source device 12 may be referred to as a video encoding device or a video encoding device.
- Destination device 14 may decode the encoded video data produced by source device 12.
- destination device 14 may be referred to as a video decoding device or a video decoding device.
- Source device 12 and destination device 14 may be examples of video codec devices or video codec devices.
- Source device 12 and destination device 14 may include a wide range of devices, including desktop computers, mobile computing devices, notebook (eg, laptop) computers, tablet computers, set top boxes, telephone handsets such as smart phones, televisions, cameras, display devices, digital media players, video game consoles, on-board computers, or the like.
- Channel 16 may include one or more media and/or devices capable of moving encoded video data from source device 12 to destination device 14.
- channel 16 may include one or more communication media that enable source device 12 to transmit encoded video data directly to destination device 14 in real time.
- source device 12 may modulate the encoded video data in accordance with a communication standard (eg, a wireless communication protocol) and may transmit the modulated video data to destination device 14.
- the one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
- the one or more communication media may form part of a packet-based network (eg, a local area network, a wide area network, or a global network (eg, the Internet).
- the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from source device 12 to destination device 14.
- channel 16 can include a storage medium that stores encoded video data generated by source device 12.
- destination device 14 can access the storage medium via disk access or card access.
- the storage medium may include a variety of locally accessible data storage media, such as Blu-ray Disc, DVD, CD-ROM, flash memory, or other suitable digital storage medium for storing encoded video data.
- channel 16 can include a file server or another intermediate storage device that stores encoded video data generated by source device 12.
- destination device 14 may access the encoded video data stored at a file server or other intermediate storage device via streaming or download.
- the file server may be a type of server capable of storing encoded video data and transmitting the encoded video data to the destination device 14.
- the instance file server includes a web server (eg, for a website), a file transfer protocol (FTP) server, a network attached storage (NAS) device, and a local disk drive.
- Destination device 14 can access the encoded video data via a standard data connection (e.g., an internet connection).
- Example types of data connections include wireless channels (eg, a Wi-Fi connection), wired connections (eg, DSL, cable modem, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server.
- the transmission of the encoded video data from the file server may be streaming, downloading, or a combination of both.
- the technology of the present invention is not limited to a wireless application scenario.
- the techniques can be applied to video codecs supporting a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (eg, via the Internet), encoding of video data stored on a data storage medium, decoding of video data stored on a data storage medium, or other applications.
- video codec system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
- source device 12 includes video source 18, video encoder 20, and output interface 22.
- output interface 22 can include a modulator/demodulator (modem) and/or a transmitter.
- Video source 18 may include a video capture device (eg, a video camera), a video archive containing previously captured video data, a video input interface to receive video data from a video content provider, and/or a computer for generating video data.
- Video encoder 20 may encode video data from video source 18.
- source device 12 transmits the encoded video data directly to destination device 14 via output interface 22.
- the encoded video data may also be stored on a storage medium or file server for later access by the destination device 14 for decoding and/or playback.
- destination device 14 includes an input interface 28, a video decoder 30, and a display device 32.
- input interface 28 includes a receiver and/or a modem.
- Input interface 28 can receive the encoded video data via channel 16.
- Display device 32 may be integral with destination device 14 or may be external to destination device 14. In general, display device 32 displays the decoded video data.
- Display device 32 may include a variety of display devices such as liquid crystal displays (LCDs), plasma displays, organic light emitting diode (OLED) displays, or other types of display devices.
- Video encoder 20 and video decoder 30 may operate in accordance with a video compression standard (eg, the High Efficiency Video Codec H.265 standard) and may conform to the HEVC Test Model (HM).
- a textual description of the H.265 standard was published on April 29, 2015 as ITU-T H.265 (V3) (04/2015), available for download from http://handle.itu.int/11.1002/1000/12455; the entire contents of that document are incorporated herein by reference.
- video encoder 20 and video decoder 30 may also operate in accordance with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video codec (SVC) and multiview video codec (MVC) extensions.
- FIG. 21 is merely an example and the techniques of the present invention are applicable to video codec applications (eg, single-sided video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device.
- data is retrieved from local memory, data is streamed over a network, or manipulated in a similar manner.
- the encoding device may encode the data and store the data to a memory, and/or the decoding device may retrieve the data from the memory and decode the data.
- encoding and decoding may be performed by devices that do not communicate with each other, but simply encode data to memory and/or retrieve data from memory and decode the data.
- Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable circuits, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable Gate array (FPGA), discrete logic, hardware, or any combination thereof. If the technology is implemented partially or wholly in software, the device may store the instructions of the software in a suitable non-transitory computer readable storage medium, and the instructions in the hardware may be executed using one or more processors to perform the techniques of the present invention. . Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) can be considered as one or more processors. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, any of which may be integrated into a combined encoder/decoder (codec) in other devices Part of the (CODEC).
- the invention may generally refer to video encoder 20 "signaling" certain information to another device (e.g., video decoder 30).
- the term "signaling" may generally refer to the communication of syntax elements and/or other data used to convey the encoded video data. This communication can occur in real time or near real time. Alternatively, this communication may occur over a span of time, such as when syntax elements are stored to a computer readable storage medium at encoding time; the syntax elements may then be retrieved by the decoding device at any time after being stored to the medium.
- the video encoder 20 encodes video data.
- Video data may include one or more pictures.
- Video encoder 20 may generate a code stream that contains encoded information for the video data in the form of a bitstream.
- the encoded information may include encoded picture data and associated data.
- Associated data can include sequence parameter sets (SPS), picture parameter sets (PPS), and other syntax structures.
- An SPS can contain parameters that are applied to zero or more sequences.
- the PPS can contain parameters that are applied to zero or more pictures.
- a syntax structure refers to a collection of zero or more syntax elements arranged in a specified order in a code stream.
- video encoder 20 may partition the picture into a raster of coded tree blocks (CTBs).
- a CTB may be referred to as a "tree block", a "maximum coding unit" (LCU), or a "coding tree unit".
- the CTB is not limited to a particular size and may include one or more coding units (CUs).
- Each CTB can be associated with a block of pixels of equal size within the picture.
- Each pixel can correspond to one luminance (luminance or luma) sample and two chrominance or chroma samples.
- each CTB can be associated with one luma sample block and two chroma sample blocks.
- the CTB of a picture can be divided into one or more stripes.
- each stripe contains an integer number of CTBs.
- video encoder 20 may generate encoded information for each strip of the picture, i.e., encode the CTB within the strip.
- video encoder 20 may recursively perform quadtree partitioning on the block of pixels associated with the CTB to partition the block of pixels into decreasing blocks of pixels. The smaller block of pixels can be associated with a CU.
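- A rough sketch of this recursive quadtree splitting, with a simple variance threshold standing in for the encoder's real split decision (which would be based on rate-distortion cost); block sizes and thresholds below are illustrative assumptions.

```python
import numpy as np

def quadtree_partition(block, top, left, min_size=8, var_threshold=100.0):
    """Recursively split a square pixel block into four equal sub-blocks.

    Returns a list of (top, left, size) leaf blocks. The variance test is only
    a stand-in for the encoder's actual split decision. Assumes power-of-two sizes."""
    size = block.shape[0]
    if size <= min_size or np.var(block) < var_threshold:
        return [(top, left, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            sub = block[dy:dy + half, dx:dx + half]
            leaves += quadtree_partition(sub, top + dy, left + dx,
                                         min_size, var_threshold)
    return leaves

# Example: partition a 64x64 CTB pixel block into CU-sized leaves.
ctb = np.random.randint(0, 256, (64, 64)).astype(float)
print(quadtree_partition(ctb, 0, 0))
```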
- Video encoder 20 may generate one or more prediction units (PUs) that each no longer partition the CU. Each PU of a CU may be associated with a different block of pixels within a block of pixels of the CU. Video encoder 20 may generate predictive pixel blocks for each PU of the CU. Video encoder 20 may use intra prediction or inter prediction to generate predictive pixel blocks for the PU. If video encoder 20 uses intra prediction to generate a predictive pixel block for a PU, video encoder 20 may generate a predictive pixel block for the PU based on the decoded pixels of the picture associated with the PU.
- PUs prediction units
- if video encoder 20 uses inter prediction to generate a predictive pixel block of the PU, video encoder 20 may generate the predictive pixel block of the PU based on decoded pixels of one or more pictures different from the picture associated with the PU. Video encoder 20 may generate residual pixel blocks of the CU based on the predictive pixel blocks of the PUs of the CU. The residual pixel block of the CU may indicate the difference between the sample values in the predictive pixel blocks of the PUs of the CU and the corresponding sample values in the initial pixel block of the CU.
- Video encoder 20 may perform recursive quadtree partitioning on the residual pixel blocks of the CU to partition the residual pixel blocks of the CU into one or more smaller residual pixel blocks associated with the transform units (TUs) of the CU. Because the pixels in the pixel block associated with the TU each correspond to one luma sample and two chroma samples, each TU can be associated with one luma residual sample block and two chroma residual sample blocks. Video encoder 20 may apply one or more transforms to the residual sample block associated with the TU to generate a coefficient block (ie, a block of coefficients). The transform can be a DCT transform or a variant thereof.
- the coefficient block is obtained by applying a one-dimensional transform in the horizontal and vertical directions to calculate a two-dimensional transform.
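- A minimal sketch of that separable property: a 2-D transform computed by applying a 1-D DCT along rows and then along columns (using scipy's floating-point DCT with orthonormal scaling). HEVC actually uses integer approximations of the DCT, so this is illustrative only.

```python
import numpy as np
from scipy.fft import dct, idct

def forward_2d_dct(residual_block):
    """2-D DCT via two 1-D DCTs: first along one axis, then the other."""
    return dct(dct(residual_block, axis=0, norm="ortho"), axis=1, norm="ortho")

def inverse_2d_dct(coeff_block):
    return idct(idct(coeff_block, axis=1, norm="ortho"), axis=0, norm="ortho")

block = np.random.randn(8, 8)
coeffs = forward_2d_dct(block)
assert np.allclose(inverse_2d_dct(coeffs), block)   # perfect reconstruction in float
```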
- Video encoder 20 may perform a quantization procedure for each of the coefficients in the coefficient block. Quantization generally refers to the process by which the coefficients are quantized to reduce the amount of data used to represent the coefficients, thereby providing further compression.
- Video encoder 20 may generate a set of syntax elements that represent coefficients in the quantized coefficient block. Video encoder 20 may apply an entropy encoding operation (eg, a context adaptive binary arithmetic coding (CABAC) operation) to some or all of the above syntax elements. To apply CABAC encoding to syntax elements, video encoder 20 may binarize the syntax elements to form a binary sequence that includes one or more bits (referred to as "binary"). Video encoder 20 may encode a portion of the binary using regular encoding, and may use bypass encoding to encode other portions of the binary.
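- As one concrete example of binarization before arithmetic coding, the sketch below produces a 0-th order Exp-Golomb bin string for a non-negative syntax element value; which binarization is actually applied depends on the particular syntax element, so this is illustrative only.

```python
def exp_golomb_binarize(value):
    """0-th order Exp-Golomb binarization of a non-negative syntax element value."""
    code = bin(value + 1)[2:]             # binary representation of value + 1
    return "0" * (len(code) - 1) + code   # prefix of leading zeros, then the code

# 0 -> "1", 1 -> "010", 2 -> "011", 3 -> "00100", ...
for v in range(5):
    print(v, exp_golomb_binarize(v))
```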
- video encoder 20 may apply an inverse quantization and an inverse transform to the transformed coefficient block to reconstruct the residual sample block from the transformed coefficient block.
- Video encoder 20 may add the reconstructed residual sample block to a corresponding sample block of one or more predictive sample blocks to produce a reconstructed sample block.
- video encoder 20 may reconstruct the block of pixels associated with the TU. The pixel block of each TU of the CU is reconstructed in this way until the entire pixel block reconstruction of the CU is completed.
- video encoder 20 may perform a deblocking filtering operation to reduce the blockiness of the block of pixels associated with the CU.
- video encoder 20 may use sample adaptive offset (SAO) to modify the reconstructed block of pixels of the CTB of the picture.
- video encoder 20 may store the reconstructed blocks of pixels of the CU in a decoded picture buffer for use in generating predictive blocks of pixels for other CUs.
- Video decoder 30 can receive the code stream.
- the code stream contains encoded information of video data encoded by video encoder 20 in the form of a bitstream.
- Video decoder 30 may parse the code stream to extract syntax elements from the code stream.
- video decoder 30 may perform regular decoding on some bins and bypass decoding on other bins; the bins in the code stream have mapping relationships with the syntax elements, and the syntax elements are obtained by parsing the bins.
- Video decoder 30 may reconstruct a picture of the video data based on the syntax elements extracted from the code stream.
- the process of reconstructing video data based on syntax elements is generally reciprocal to the process performed by video encoder 20 to generate syntax elements.
- video decoder 30 may generate a predictive pixel block of a PU of a CU based on syntax elements associated with the CU.
- video decoder 30 may inverse quantize the coefficient blocks associated with the TUs of the CU.
- Video decoder 30 may perform an inverse transform on the inverse quantized coefficient block to reconstruct a residual pixel block associated with the TU of the CU.
- Video decoder 30 may reconstruct a block of pixels of the CU based on the predictive pixel block and the residual pixel block.
- video decoder 30 may perform a deblocking filtering operation to reduce the blockiness of the block of pixels associated with the CU. Additionally, video decoder 30 may perform the same SAO operations as video encoder 20 based on one or more SAO syntax elements. After video decoder 30 performs these operations, video decoder 30 may store the block of pixels of the CU in a decoded picture buffer.
- the decoded picture buffer can provide reference pictures for subsequent motion compensation, intra prediction, and presentation by the display device.
- video encoder 20 includes prediction processing unit 100, residual generation unit 102, transform processing unit 104, quantization unit 106, inverse quantization unit 108, inverse transform processing unit 110, reconstruction unit 112, filter unit 113, decoded picture buffer 114, and entropy encoding unit 116.
- Entropy encoding unit 116 includes a regular CABAC codec engine 118 and a bypass codec engine 120.
- the prediction processing unit 100 includes an inter prediction processing unit 121 and an intra prediction processing unit 126.
- the inter prediction processing unit 121 includes a motion estimation unit 122 and a motion compensation unit 124.
- video encoder 20 may include more, fewer, or different functional components.
- Video encoder 20 receives the video data.
- video encoder 20 may encode each strip of each picture of the video data.
- video encoder 20 may encode each CTB in the strip.
- prediction processing unit 100 may perform quadtree partitioning on the pixel blocks associated with the CTB to divide the block of pixels into decreasing blocks of pixels. For example, prediction processing unit 100 may partition a block of pixels of a CTB into four equally sized sub-blocks, split one or more of the sub-blocks into four equally sized sub-sub-blocks, and the like.
- Video encoder 20 may encode the CU of the CTB in the picture to generate coded information for the CU.
- Video encoder 20 may encode the CU of the CTB according to the fold scan order. In other words, video encoder 20 may encode the CU by the upper left CU, the upper right CU, the lower left CU, and then the lower right CU.
- video encoder 20 may encode the CU associated with the sub-block of the pixel block of the partitioned CU according to the fold scan order.
- prediction processing unit 100 can partition the pixel blocks of the CU in one or more PUs of the CU.
- Video encoder 20 and video decoder 30 can support a variety of PU sizes. Assuming that the size of a particular CU is 2N ⁇ 2N, video encoder 20 and video decoder 30 may support a PU size of 2N ⁇ 2N or N ⁇ N for intra prediction, and support 2N ⁇ 2N, 2N ⁇ N, N ⁇ 2N, N x N or similarly sized symmetric PUs for inter prediction. Video encoder 20 and video decoder 30 may also support asymmetric PUs of 2N x nU, 2N x nD, nL x 2N, and nR x 2N for inter prediction.
- the inter prediction processing unit 121 may generate predictive data of the PU by performing inter prediction on each PU of the CU.
- the predictive data of the PU may include motion information corresponding to the predictive pixel block of the PU and the PU.
- the strip can be an I strip, a P strip or a B strip.
- the inter prediction unit 121 may perform different operations on the PU of the CU depending on whether the PU is in an I slice, a P slice, or a B slice. In the I slice, all PUs perform intra prediction.
- motion estimation unit 122 may search for a reference picture in a list of reference pictures (eg, "List 0") to find a reference block for the PU.
- the reference block of the PU may be the pixel block that most closely corresponds to the pixel block of the PU.
- Motion estimation unit 122 may generate a reference picture index that indicates a reference picture of the PU-containing reference block in list 0, and a motion vector that indicates a spatial displacement between the pixel block of the PU and the reference block.
- the motion estimation unit 122 may output the reference picture index and the motion vector as motion information of the PU.
- Motion compensation unit 124 may generate a predictive pixel block of the PU based on the reference block indicated by the motion information of the PU.
- motion estimation unit 122 may perform uni-directional inter prediction or bi-directional inter prediction on the PU.
- motion estimation unit 122 may search for a reference picture of a first reference picture list ("List 0") or a second reference picture list ("List 1") to find a reference block for the PU.
- the motion estimation unit 122 may output the following as the motion information of the PU: a reference picture index indicating a position in the list 0 or the list 1 of the reference picture containing the reference block, a space between the pixel block indicating the PU and the reference block The motion vector of the displacement, and the prediction direction indicator indicating whether the reference picture is in list 0 or in list 1.
- motion estimation unit 122 may search for reference pictures in list 0 to find reference blocks for the PU, and may also search for reference pictures in list 1 to find another reference block for the PU.
- Motion estimation unit 122 may generate a reference picture index indicating the list 0 of the reference picture containing the reference block and the location in list 1. Additionally, motion estimation unit 122 may generate a motion vector that indicates a spatial displacement between the reference block and the pixel block of the PU.
- the motion information of the PU may include a reference picture index of the PU and a motion vector.
- Motion compensation unit 124 may generate a predictive pixel block of the PU based on the reference block indicated by the motion information of the PU.
- Intra prediction processing unit 126 may generate predictive data for the PU by performing intra prediction on the PU.
- the predictive data of the PU may include predictive pixel blocks of the PU and various syntax elements.
- Intra prediction processing unit 126 may perform intra prediction on PUs within I slices, P slices, and B slices.
- intra-prediction processing unit 126 may use multiple intra-prediction modes to generate multiple sets of predictive data for the PU.
- intra-prediction processing unit 126 may extend samples from the sample blocks of neighboring PUs across the sample block of the PU in the direction associated with the intra-prediction mode. Assuming a left-to-right, top-to-bottom coding order for PUs, CUs, and CTBs, the neighboring PU may be above the PU, above and to the right of the PU, above and to the left of the PU, or to the left of the PU.
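- The directional propagation of neighboring samples can be pictured with the following simplified sketch, which implements only vertical and horizontal modes (real codecs define many more angular modes); `top_row` and `left_col` stand for the reconstructed samples of the neighboring PUs and are assumptions of this illustration:

```python
import numpy as np

def intra_predict(top_row, left_col, size, mode):
    """Toy directional intra prediction for a size x size block.

    "vertical" copies the row of samples above the block downwards;
    "horizontal" copies the column of samples to the left across the block.
    """
    if mode == "vertical":
        return np.tile(top_row[:size], (size, 1))
    if mode == "horizontal":
        return np.tile(left_col[:size].reshape(size, 1), (1, size))
    raise ValueError("only the two illustrative modes are sketched here")
```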
- Intra prediction processing unit 126 may use a different number of intra prediction modes, for example, 33 directional intra prediction modes. In some examples, the number of intra prediction modes may depend on the size of the pixel block of the PU.
- the prediction processing unit 100 may select the predictive data of the PUs of the CU from among the predictive data generated by the inter prediction processing unit 121 for the PUs and the predictive data generated by the intra prediction processing unit 126 for the PUs. In some examples, prediction processing unit 100 selects the predictive data for the PUs of the CU based on a rate/distortion metric of the sets of predictive data. For example, a Lagrangian cost function may be used to select between encoding modes and their parameter values, such as motion vectors, reference indices, and intra prediction directions, as sketched below.
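- The Lagrangian selection mentioned above typically minimizes J = D + λ·R over the candidate modes; the sketch below (illustrative only, with a made-up candidate structure) picks the candidate with the smallest cost:

```python
def select_mode(candidates, lam):
    """Pick the candidate minimizing the rate-distortion cost J = D + lambda * R.

    `candidates` is assumed to be a list of dicts with keys 'mode',
    'distortion' (e.g. SSE against the original block) and 'rate'
    (estimated bits for the mode and its parameters).
    """
    return min(candidates, key=lambda c: c["distortion"] + lam * c["rate"])

best = select_mode(
    [{"mode": "intra_DC", "distortion": 1500.0, "rate": 42.0},
     {"mode": "inter_2Nx2N", "distortion": 900.0, "rate": 95.0}],
    lam=10.0,
)
print(best["mode"])  # inter_2Nx2N: 900 + 10*95 = 1850 < 1500 + 10*42 = 1920
```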
- a predictive pixel block that selects predictive data may be referred to herein as a selected predictive pixel block.
- Residual generation unit 102 may generate a residual pixel block of the CU based on the pixel block of the CU and the selected predictive pixel blocks of the PUs of the CU. For example, the residual generation unit 102 may generate the residual pixel block of the CU such that each sample in the residual pixel block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in the selected predictive pixel block of a PU of the CU.
- the prediction processing unit 100 may perform quadtree partitioning to partition the residual pixel block of the CU into sub-blocks. Each residual pixel block that is no longer partitioned may be associated with a different TU of the CU. The size and location of the residual pixel block associated with a TU of the CU is not necessarily related to the size and location of the pixel blocks associated with the PUs of the CU.
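- In plainer terms, each residual sample is the original sample minus the co-located predicted sample; a minimal NumPy sketch of this subtraction:

```python
import numpy as np

def residual_block(cu_pixels, predicted_pixels):
    """Residual = original CU samples minus the selected predictive samples."""
    return cu_pixels.astype(np.int16) - predicted_pixels.astype(np.int16)
```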
- Transform processing unit 104 may generate a coefficient block for each TU of the CU by applying one or more transforms to the residual sample block associated with the TU. For example, transform processing unit 104 may apply a discrete cosine transform (DCT), a directional transform, or a conceptually similar transform to the residual sample block.
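- For example, an orthonormal 2-D DCT of a residual sub-block can be sketched as follows (using SciPy's separable floating-point DCT-II for clarity; a real codec uses integer approximations of the transform):

```python
import numpy as np
from scipy.fft import dctn, idctn

def forward_transform(residual_block):
    """2-D DCT-II of a residual sample block, orthonormal scaling."""
    return dctn(residual_block.astype(np.float64), norm="ortho")

def inverse_transform(coeff_block):
    """Inverse 2-D DCT, recovering the residual samples."""
    return idctn(coeff_block, norm="ortho")
```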
- Quantization unit 106 may quantize the coefficients in the coefficient block. For example, an n-bit coefficient can be truncated to an m-bit coefficient during quantization, where n is greater than m. Quantization unit 106 may quantize the coefficient block associated with the TU of the CU based on a quantization parameter (QP) value associated with the CU. Video encoder 20 may adjust the degree of quantization applied to the coefficient block associated with the CU by adjusting the QP value associated with the CU.
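- As an illustrative, HEVC-like (but simplified) mapping from QP to quantization step size, Qstep ≈ 2^((QP−4)/6); the sketch below divides each transform coefficient by Qstep and rounds, and the matching dequantizer scales the levels back:

```python
import numpy as np

def quantize(coeffs, qp):
    """Uniform scalar quantization with an assumed HEVC-style step size."""
    qstep = 2.0 ** ((qp - 4) / 6.0)
    return np.round(coeffs / qstep).astype(np.int32)

def dequantize(levels, qp):
    """Inverse quantization: scale the quantized levels back by the same step."""
    qstep = 2.0 ** ((qp - 4) / 6.0)
    return levels * qstep
```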
- Inverse quantization unit 108 and inverse transform processing unit 110 may apply inverse quantization and inverse transform, respectively, to the transformed coefficient block to reconstruct the residual sample block from the coefficient block.
- Reconstruction unit 112 may add samples of the reconstructed residual sample block to corresponding samples of one or more predictive sample blocks generated by prediction processing unit 100 to generate a reconstructed sample block associated with the TU. By reconstructing the sample block of each TU of the CU in this manner, video encoder 20 may reconstruct the block of pixels of the CU.
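- Putting the previous pieces together, a minimal and purely illustrative reconstruction of a TU dequantizes and inverse-transforms the levels, adds the result onto the prediction, and clips to the sample range (reusing the `dequantize()` and `inverse_transform()` sketches above):

```python
import numpy as np

def reconstruct_tu(levels, prediction, qp, bit_depth=8):
    """Dequantize, inverse-transform and add to the prediction (illustrative)."""
    residual = inverse_transform(dequantize(levels, qp))
    recon = prediction.astype(np.float64) + residual
    return np.clip(np.round(recon), 0, (1 << bit_depth) - 1).astype(np.uint8)
```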
- Filter unit 113 may perform a deblocking filtering operation to reduce blockiness of pixel blocks associated with the CU. Further, the filter unit 113 may apply the SAO offset determined by the prediction processing unit 100 to the reconstructed sample block to restore the pixel block. Filter unit 113 may generate encoding information for the SAO syntax elements of the CTB.
- the decoded picture buffer 114 can store the reconstructed block of pixels.
- Inter prediction unit 121 may perform inter prediction on PUs of other pictures using reference pictures containing the reconstructed pixel blocks.
- intra-prediction processing unit 126 can use the reconstructed block of pixels in decoded picture buffer 114 to perform intra-prediction on other PUs in the same picture as the CU.
- Entropy encoding unit 116 may receive data from other functional components of video encoder 20. For example, entropy encoding unit 116 may receive coefficient blocks from quantization unit 106 and may receive syntax elements from prediction processing unit 100. Entropy encoding unit 116 may perform one or more entropy encoding operations on the data to generate entropy encoded data. For example, entropy encoding unit 116 may perform context-adaptive variable-length coding (CAVLC) operations, CABAC operations, variable-to-variable (V2V) length coding operations, syntax-based context-adaptive binary arithmetic coding (SBAC) operations, probability interval partitioning entropy (PIPE) coding operations, or other types of entropy coding operations on the data. In a particular example, entropy encoding unit 116 may encode the regular CABAC coded bins of the syntax elements using regular CABAC engine 118, and may encode the bypass coded bins using bypass codec engine 120.
- video decoder 30 includes an entropy decoding unit 150, a prediction processing unit 152, an inverse quantization unit 154, an inverse transform processing unit 156, a reconstruction unit 158, a filter unit 159, and a decoded picture buffer 160.
- the prediction processing unit 152 includes a motion compensation unit 162 and an intra prediction processing unit 164.
- Entropy decoding unit 150 includes a regular CABAC codec engine 166 and a bypass codec engine 168. In other examples, video decoder 30 may include more, fewer, or different functional components.
- Video decoder 30 can receive the code stream.
- Entropy decoding unit 150 may parse the code stream to extract syntax elements from the code stream. As part of parsing the code stream, entropy decoding unit 150 may parse the entropy encoded syntax elements in the code stream.
- the prediction processing unit 152, the inverse quantization unit 154, the inverse transform processing unit 156, the reconstruction unit 158, and the filter unit 159 may decode the video data according to the syntax elements extracted from the code stream, that is, generate the decoded video data.
- the syntax elements may include a regular CABAC codec binary and a bypass codec binary.
- Entropy decoding unit 150 may use a regular CABAC codec engine 166 to decode the regular CABAC codec bins, and may use the bypass codec engine 168 to decode the bypass codec bins.
- intra prediction processing unit 164 may perform intra prediction to generate a predictive sample block for the PU.
- Intra-prediction processing unit 164 may use an intra-prediction mode to generate a predictive pixel block of a PU based on a block of pixels of a spatially neighboring PU.
- Intra prediction processing unit 164 may determine an intra prediction mode for the PU based on one or more syntax elements parsed from the code stream.
- Motion compensation unit 162 may construct a first reference picture list (List 0) and a second reference picture list (List 1) based on syntax elements parsed from the code stream. Furthermore, if the PU uses inter prediction coding, the entropy decoding unit 150 may parse the motion information of the PU. Motion compensation unit 162 can determine one or more reference blocks of the PU based on the motion information of the PU. Motion compensation unit 162 can generate a predictive pixel block of the PU from one or more reference blocks of the PU.
- video decoder 30 may perform a reconstruction operation on a CU that is no longer split. To perform a reconstruction operation on a CU that is no longer split, video decoder 30 may perform a reconstruction operation on each TU of the CU. By performing a reconstruction operation on each TU of the CU, video decoder 30 may reconstruct the residual pixel blocks associated with the CU.
- inverse quantization unit 154 may inverse quantize (i.e., dequantize) the coefficient block associated with the TU. Inverse quantization unit 154 may use the QP value associated with the CU of the TU to determine the degree of quantization and, likewise, the degree of inverse quantization that inverse quantization unit 154 will apply.
- inverse transform processing unit 156 may apply one or more inverse transforms to the coefficient block to generate a residual sample block associated with the TU.
- inverse transform processing unit 156 may apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), an inverse rotation transform, an inverse directional transform, or an inverse transform corresponding to the transform used at the encoding end to the coefficient block.
- Reconstruction unit 158 may use the residual pixel blocks associated with the TUs of the CU and the predictive pixel blocks of the PUs of the CU (i.e., intra-prediction data or inter-prediction data, as applicable) to reconstruct the pixel block of the CU.
- reconstruction unit 158 can add samples of the residual pixel block to corresponding samples of the predictive pixel block to reconstruct the pixel block of the CU.
- Filter unit 159 may perform a deblocking filtering operation to reduce the blockiness of the block of pixels associated with the CU of the CTB. Additionally, filter unit 159 can modify the pixel values of the CTB based on the SAO syntax elements parsed from the code stream. For example, filter unit 159 can determine the correction value based on the SAO syntax element of the CTB and add the determined correction value to the sample value in the reconstructed pixel block of the CTB. By modifying some or all of the pixel values of the CTB of the picture, the filter unit 159 can modify the reconstructed picture of the video data according to the SAO syntax element.
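- As a rough idea of the SAO correction, the band-offset variant classifies each reconstructed sample into one of 32 equal intensity bands and adds a signalled offset to samples falling into the selected bands (a simplified sketch; the edge-offset mode and the exact syntax are omitted, and `band_offsets` is an assumed structure standing in for the parsed SAO syntax elements):

```python
import numpy as np

def sao_band_offset(recon, band_offsets, bit_depth=8):
    """Add a per-band offset to each sample; band index = sample >> (bit_depth - 5)."""
    shift = bit_depth - 5                      # 32 bands over the sample range
    out = recon.astype(np.int32)
    bands = out >> shift
    for band, offset in band_offsets.items():  # only a few bands carry offsets
        out[bands == band] += offset
    return np.clip(out, 0, (1 << bit_depth) - 1).astype(recon.dtype)
```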
- Video decoder 30 may store the block of pixels of the CU in decoded picture buffer 160.
- the decoded picture buffer 160 may provide reference pictures for subsequent motion compensation, intra prediction, and presentation by a display device (eg, display device 32 of FIG. 21).
- video decoder 30 may perform intra-prediction operations or inter-prediction operations on PUs of other CUs according to the blocks of pixels in decoded picture buffer 160.
- the video encoder of the embodiment of the present invention may be used to perform the video encoding method of the foregoing embodiments, and the functional modules of the video encoding apparatus shown in FIG. 18a and FIG. 18b may be integrated into the video encoder 20 of the embodiment of the present invention.
- the video encoder can be used to perform the video encoding method of the embodiment shown in FIG. 2, FIG. 5 or FIG. 12 described above.
- video encoder 20 acquires a plurality of video frames, between which redundant data on the picture content exists. Video encoder 20 then reconstructs the plurality of video frames to obtain scene information and a reconstruction residual for each video frame, where the scene information includes data obtained by reducing the redundancy of the redundant data, and the reconstruction residual is used to indicate the difference between the video frame and the scene information. Next, video encoder 20 predictively encodes the scene information to obtain scene feature prediction encoded data, and predictively encodes the reconstruction residual to obtain residual prediction encoded data.
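- One simple way to picture this scene/residual decomposition (an illustrative sketch only; the embodiments may extract scene information differently, for example with learned scene features) is to treat the temporal median of the frames as the shared scene information and the per-frame difference as the reconstruction residual:

```python
import numpy as np

def decompose_into_scene_and_residuals(frames):
    """Split frames into shared scene information plus sparse per-frame residuals.

    frames: array of shape (num_frames, H, W). The temporal median captures the
    redundant picture content shared by the frames; each residual is small
    wherever the frame matches the scene.
    """
    scene = np.median(frames, axis=0)
    residuals = frames.astype(np.int16) - scene.astype(np.int16)
    return scene, residuals
```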
- the redundancy between the video frames can be reduced, so that in the encoding operation the total amount of compressed data for the obtained scene features and reconstruction residuals is smaller than the amount of compressed data for the original video frames, reducing the amount of data obtained after compression.
- each video frame is reconstructed into scene features and a reconstruction residual. Since the reconstruction residual contains only the residual information other than the scene information, its information content is small and sparse; compared with predictively encoding the full features, predictively encoding the residual requires fewer codewords, produces a small amount of encoded data, and achieves a high compression ratio.
- the method of the embodiment of the present invention can effectively improve the compression efficiency of a video frame.
- a video decoder is further provided, where the video decoder can be used to perform the video decoding method of the foregoing embodiments, and the functional modules of the video decoding device shown in FIG. 19 can also be integrated into the video decoder 30 of the embodiment of the present invention.
- the video decoder 30 of an embodiment of the invention can be used to perform the video decoding method of the embodiment shown in FIG. 2, FIG. 6, or FIG.
- the video decoder 30 decodes the scene feature prediction encoded data to obtain scene information, where the scene information includes data obtained by reducing the redundancy of the redundant data, and the redundant data is redundant data on the picture content between each of the plurality of video frames.
- video decoder 30 decodes the residual prediction encoded data to obtain a reconstructed residual, which is used to represent the difference between the video frame and the scene information.
- a video decoder 30, configured to perform reconstruction according to the scene information and the reconstructed residual to obtain a plurality of video frames.
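- On the decoding side the illustrative decomposition shown earlier is simply inverted: each frame is recovered by adding its decoded reconstruction residual back onto the decoded scene information (continuing the sketch above, with the same assumed array shapes):

```python
import numpy as np

def reconstruct_frames(scene, residuals, bit_depth=8):
    """Recover each video frame as scene information plus its residual."""
    frames = scene.astype(np.int16) + residuals
    return np.clip(frames, 0, (1 << bit_depth) - 1).astype(np.uint8)
```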
- the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted as one or more instructions or code via a computer-readable medium and executed by a hardware-based processing unit.
- the computer readable medium may comprise a computer readable storage medium (which corresponds to a tangible medium such as a data storage medium) or a communication medium including, for example, any medium that facilitates transfer of a computer program from one place to another in accordance with a communication protocol.
- computer readable media generally may correspond to (1) a non-transitory tangible computer readable storage medium, or (2) a communication medium such as a signal or carrier wave.
- Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for use in carrying out the techniques described herein.
- the computer program product can comprise a computer readable medium.
- By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies (e.g., infrared, radio, and microwave), then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies (e.g., infrared, radio, and microwave) are included in the definition of medium.
- disks and discs, as used herein, include compact discs (CD), laser discs, optical discs, digital versatile discs (DVD), floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- the instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
- the term "processor", as used herein, may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein.
- the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec.
- the techniques can be fully implemented in one or more circuits or logic elements.
- the techniques of the present invention can be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
- Various components, modules, or units are described in this disclosure to emphasize functional aspects of the apparatus configured to perform the disclosed techniques, but they are not necessarily required to be implemented by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
- the terms "system" and "network" are used interchangeably herein. It should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may indicate three cases: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
- B corresponding to A means that B is associated with A, and B can be determined from A.
- determining B from A does not mean that B is determined based only on A; B may also be determined based on A and/or other information.
- the disclosed systems, devices, and methods may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- the division of units is merely a logical function division; in actual implementation there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
- the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
- the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
- the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
- a number of instructions are included to cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.
- the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
- the computer program product includes one or more computer instructions.
- the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
- the computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another computer readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, or microwave).
- the computer readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, integrating one or more available media.
- the usable medium may be a magnetic medium (eg, a floppy disk, a hard disk, a magnetic tape), an optical medium (eg, a DVD), or a semiconductor medium (such as a solid state disk (SSD)).
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Discrete Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Embodiments of the present invention relate to a video encoding method, a video decoding method, and a corresponding device for improving the compression efficiency of video frames. The method according to the embodiments of the present invention comprises: acquiring a plurality of video frames, redundant picture-content data existing between the respective video frames of the plurality of video frames; reconstructing the plurality of video frames to obtain scene information and a reconstruction residual of each video frame, the scene information comprising data obtained by reducing the redundancy of the redundant data, and the reconstruction residual being used to represent a difference between the video frame and the scene information; and performing predictive coding of the scene information and of the reconstruction residuals, respectively, to obtain scene feature prediction encoded data and residual prediction encoded data. According to the invention, redundancy between the video frames is reduced and the amount of data obtained after compression is reduced. In addition, each video frame is reconstructed into scene features and a reconstruction residual. The reconstruction residuals are encoded by residual coding, producing a small amount of encoded data and a high compression ratio. The method according to the embodiments of the present invention can thus effectively improve the compression efficiency of video frames.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710169486.5 | 2017-03-21 | ||
CN201710169486.5A CN108632625B (zh) | 2017-03-21 | 2017-03-21 | 一种视频编码方法、视频解码方法和相关设备 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018171596A1 true WO2018171596A1 (fr) | 2018-09-27 |
Family
ID=63584112
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/079699 Ceased WO2018171596A1 (fr) | 2017-03-21 | 2018-03-21 | Procédé de codage vidéo, procédé de décodage vidéo et dispositif correspondant |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108632625B (fr) |
WO (1) | WO2018171596A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11711449B2 (en) | 2021-12-07 | 2023-07-25 | Capital One Services, Llc | Compressing websites for fast data transfers |
Families Citing this family (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111383245B (zh) * | 2018-12-29 | 2023-09-22 | 北京地平线机器人技术研发有限公司 | 视频检测方法、视频检测装置和电子设备 |
CN109714602B (zh) * | 2018-12-29 | 2022-11-01 | 武汉大学 | 一种基于背景模板和稀疏编码的无人机视频压缩方法 |
KR20250117726A (ko) * | 2019-03-18 | 2025-08-05 | 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. | 신경망의 매개변수를 압축하기 위한 방법 및 장치 |
CN110263650B (zh) * | 2019-05-22 | 2022-02-22 | 北京奇艺世纪科技有限公司 | 行为类别检测方法、装置、电子设备和计算机可读介质 |
CN110427517B (zh) * | 2019-07-18 | 2023-04-25 | 华戎信息产业有限公司 | 一种基于场景词典树的图搜视频方法,装置及计算机可读存储介质 |
CN110554405B (zh) * | 2019-08-27 | 2021-07-30 | 华中科技大学 | 一种基于组合聚类的正态扫描配准方法和系统 |
CN110572675B (zh) * | 2019-09-27 | 2023-11-14 | 腾讯科技(深圳)有限公司 | 视频解码、编码方法和装置、存储介质与解码器、编码器 |
CN113196779B (zh) * | 2019-10-10 | 2022-05-20 | 无锡安科迪智能技术有限公司 | 视频片段压缩的方法与装置 |
CN111083498B (zh) * | 2019-12-18 | 2021-12-21 | 杭州师范大学 | 用于视频编码帧间环路滤波的模型训练方法和使用方法 |
CN111083499A (zh) * | 2019-12-31 | 2020-04-28 | 合肥图鸭信息科技有限公司 | 一种视频帧重构方法、装置及终端设备 |
CN111212288B (zh) * | 2020-01-09 | 2022-10-04 | 广州虎牙科技有限公司 | 视频数据的编解码方法、装置、计算机设备和存储介质 |
CN111181568A (zh) * | 2020-01-10 | 2020-05-19 | 深圳花果公社商业服务有限公司 | 数据压缩装置及方法、数据解压装置及方法 |
CN111223438B (zh) * | 2020-03-11 | 2022-11-04 | Tcl华星光电技术有限公司 | 像素补偿表的压缩方法及装置 |
CN111654724B (zh) * | 2020-06-08 | 2021-04-06 | 上海纽菲斯信息科技有限公司 | 一种视频会议系统的低码率编码传输方法 |
CN112004085B (zh) * | 2020-08-14 | 2023-07-07 | 北京航空航天大学 | 一种场景语义分割结果指导下的视频编码方法 |
CN111953973B (zh) * | 2020-08-31 | 2022-10-28 | 中国科学技术大学 | 支持机器智能的通用视频压缩编码方法 |
CN112084949B (zh) * | 2020-09-10 | 2022-07-19 | 上海交通大学 | 视频实时识别分割和检测方法及装置 |
US11494700B2 (en) | 2020-09-16 | 2022-11-08 | International Business Machines Corporation | Semantic learning in a federated learning system |
CN114257818B (zh) * | 2020-09-22 | 2024-09-24 | 阿里巴巴达摩院(杭州)科技有限公司 | 视频的编、解码方法、装置、设备和存储介质 |
CN112184843B (zh) * | 2020-11-09 | 2021-06-29 | 新相微电子(上海)有限公司 | 图像数据压缩的冗余数据去除系统及方法 |
CN113852850B (zh) * | 2020-11-24 | 2024-01-09 | 广东朝歌智慧互联科技有限公司 | 音视频流播放装置 |
CN112770116B (zh) * | 2020-12-31 | 2021-12-07 | 西安邮电大学 | 用视频压缩编码信息提取视频关键帧的方法 |
CN112802485B (zh) * | 2021-04-12 | 2021-07-02 | 腾讯科技(深圳)有限公司 | 语音数据处理方法、装置、计算机设备及存储介质 |
CN113784108B (zh) * | 2021-08-25 | 2022-04-15 | 盐城香农智能科技有限公司 | 一种基于5g传输技术的vr旅游观光方法及系统 |
CN114222133B (zh) * | 2021-12-10 | 2024-08-20 | 上海大学 | 一种基于分类的内容自适应vvc帧内编码快速划分方法 |
CN114374845B (zh) * | 2021-12-21 | 2022-08-02 | 北京中科智易科技有限公司 | 自动压缩加密的存储系统和设备 |
CN114390314B (zh) * | 2021-12-30 | 2024-06-18 | 咪咕文化科技有限公司 | 可变帧率音视频处理方法、设备及存储介质 |
CN114449241B (zh) * | 2022-02-18 | 2024-04-02 | 复旦大学 | 一种适用于图像压缩的色彩空间转化算法 |
CN114422803B (zh) * | 2022-03-30 | 2022-08-05 | 浙江智慧视频安防创新中心有限公司 | 一种视频处理方法、装置及设备 |
CN116527912A (zh) * | 2023-03-28 | 2023-08-01 | 阿里巴巴(中国)有限公司 | 编码视频数据处理方法和视频编码处理器 |
CN116437102B (zh) * | 2023-06-14 | 2023-10-20 | 中国科学技术大学 | 可学习通用视频编码方法、系统、设备及存储介质 |
CN117651148B (zh) * | 2023-11-01 | 2024-07-19 | 广东联通通信建设有限公司 | 一种物联网终端管控方法 |
CN118368423B (zh) * | 2024-06-19 | 2024-10-15 | 摩尔线程智能科技(北京)有限责任公司 | 一种视频编码方法及视频编码器、电子设备和存储介质 |
CN118972590B (zh) * | 2024-10-15 | 2024-12-17 | 中科方寸知微(南京)科技有限公司 | 基于自然语言引导的场景自适应视频压缩方法及系统 |
CN120281905B (zh) * | 2025-06-06 | 2025-09-16 | 深圳金三立视频科技股份有限公司 | 视频编码方法、装置、电子设备及存储介质 |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101742319A (zh) * | 2010-01-15 | 2010-06-16 | 北京大学 | 基于背景建模的静态摄像机视频压缩方法与系统 |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10542265B2 (en) * | 2014-09-09 | 2020-01-21 | Dolby Laboratories Licensing Corporation | Self-adaptive prediction method for multi-layer codec |
-
2017
- 2017-03-21 CN CN201710169486.5A patent/CN108632625B/zh active Active
-
2018
- 2018-03-21 WO PCT/CN2018/079699 patent/WO2018171596A1/fr not_active Ceased
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101742319A (zh) * | 2010-01-15 | 2010-06-16 | 北京大学 | 基于背景建模的静态摄像机视频压缩方法与系统 |
Non-Patent Citations (1)
Title |
---|
ZHANG, XIAOYUN ET AL.: "Research On Hevc Coding Based On Alternating Background Model", COMPUTER APPLICATIONS AND SOFTWARE, vol. 34, no. 3, 15 March 2017 (2017-03-15), pages 131 - 135, ISSN: 1000-386X * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11711449B2 (en) | 2021-12-07 | 2023-07-25 | Capital One Services, Llc | Compressing websites for fast data transfers |
Also Published As
Publication number | Publication date |
---|---|
CN108632625B (zh) | 2020-02-21 |
CN108632625A (zh) | 2018-10-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108632625B (zh) | 一种视频编码方法、视频解码方法和相关设备 | |
US11638007B2 (en) | Codebook generation for cloud-based video applications | |
Wang et al. | Towards analysis-friendly face representation with scalable feature and texture compression | |
TWI830107B (zh) | 通過指示特徵圖資料進行編碼 | |
WO2017071480A1 (fr) | Procédé de décodage de trame de référence | |
WO2018001207A1 (fr) | Procédé et appareil de codage et de décodage | |
Makar et al. | Interframe coding of feature descriptors for mobile augmented reality | |
CN103202017A (zh) | 使用基于样本的数据修剪的视频解码 | |
US20180376151A1 (en) | Method and device for picture encoding and decoding | |
US20240414363A1 (en) | Npu and decoder capable of transceiving feature map | |
US20130163671A1 (en) | Reduction of spatial predictors in video compression | |
JP2024056596A (ja) | 多次元データの符号化におけるエンドツーエンド特徴圧縮のためのシステム及び方法 | |
Megala et al. | State-of-the-art in video processing: compression, optimization and retrieval | |
Baroffio et al. | Hybrid coding of visual content and local image features | |
WO2023279968A1 (fr) | Appareil et procédé de codage et de décodage d'une image vidéo | |
WO2024005659A1 (fr) | Sélection adaptative de paramètres de codage entropique | |
KR102072576B1 (ko) | 데이터 인코딩 및 디코딩 장치와 방법 | |
Rabie et al. | PixoComp: a novel video compression scheme utilizing temporal pixograms | |
Anandan et al. | Nonsubsampled contourlet transform based video compression using Huffman and run length encoding for multimedia applications | |
Kufa et al. | Quality comparison of 360° 8K images compressed by conventional and deep learning algorithms | |
KR102804777B1 (ko) | 영상의 압축 영역에서 딥러닝 행동 인식 방법 및 이를 위한 장치 | |
Ye et al. | A novel image compression framework at edges | |
US11831887B1 (en) | Scalable video coding for machine | |
US20240364890A1 (en) | Compression of bitstream indexes for wide scale parallel entropy coding in neural-based video codecs | |
WO2025148762A1 (fr) | Procédé et infrastructure de compression avec post-traitement pour la vision artificielle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18772429 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18772429 Country of ref document: EP Kind code of ref document: A1 |