CN113068029A - Video decoding method and system for mobile terminal, storage medium and electronic equipment - Google Patents
- Publication number: CN113068029A
- Application number: CN202110230999.9A
- Authority: CN (China)
- Prior art keywords: data, super, processing, resolution, coding
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/117—Filters, e.g. for pre-processing or post-processing
- H04N19/119—Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
- H04N19/124—Quantisation
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/96—Tree coding, e.g. quad-tree coding
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention discloses a video decoding method and system for a mobile terminal, a storage medium, and electronic equipment. The method comprises: acquiring a video and performing HEVC decoding to obtain a plurality of coding blocks; optimizing each coding block according to its type to obtain optimized frame data; and performing loop filtering on the optimized frame data and playing it. The system implements this method. By reducing the super-resolution computation on two fronts, the super-resolution model and the decoding process, the invention achieves real-time video super-resolution on the mobile terminal and thereby improves the user's viewing experience.
    Description
Technical Field
      The present invention relates to the field of image processing technologies, and in particular, to a video decoding method and system for a mobile terminal, a storage medium, and an electronic device.
    Background
      Super-resolution is a technique for recovering a high-resolution picture from a low-resolution picture or a sequence of low-resolution pictures. In recent years, methods based on deep convolutional neural networks (CNNs) have made remarkable progress in image super-resolution, and many excellent methods have appeared. However, most existing super-resolution methods focus only on peak signal-to-noise ratio (PSNR) and subjective image quality and do not consider model complexity or computational cost. As a result, the computation is heavy, the models run slowly even on some high-end machine-learning hosts, and the existing methods are difficult to deploy on mobile devices.
      For example, the network of the SRCNN super-resolution method comprises three convolutional layers. It first enlarges the low-resolution image to the target size using bicubic interpolation, then fits a nonlinear mapping through the three convolutional layers, and finally outputs the high-resolution image. The method comprises three steps: image-patch extraction and feature representation, nonlinear feature mapping, and reconstruction. However, SRCNN uses large convolution kernels and processes data with large image dimensions, so the computation is complex; in addition, the network is shallow and the super-resolution quality is limited.
      As another example, the input of the ESPCN super-resolution method is the original low-resolution image. After three convolutional layers, a feature image with r² channels and the same spatial size as the input image is obtained. The r² channels of each pixel of the feature image are then rearranged into an r × r region, which corresponds to an r × r sub-block of the high-resolution image, so that a feature image of size H × W × r² is rearranged into a high-resolution image of size rH × rW × 1. However, the ESPCN network is not deep enough and its super-resolution quality is limited; moreover, it super-resolves a complete picture each time and does not cooperate with the decoder, so its efficiency is relatively low.
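      The channel-to-space rearrangement described for ESPCN can be illustrated with a few lines of NumPy. This is a minimal sketch, not the ESPCN reference implementation: the function name pixel_shuffle and the assumption that channel index dy*r + dx maps to offset (dy, dx) inside each r × r sub-block are illustrative choices.

```python
import numpy as np

def pixel_shuffle(features: np.ndarray, r: int) -> np.ndarray:
    """Rearrange an (H, W, r*r) feature map into an (r*H, r*W) image."""
    h, w, c = features.shape
    if c != r * r:
        raise ValueError("channel count must equal r*r")
    # Split the channel axis into an r x r sub-block for every pixel.
    blocks = features.reshape(h, w, r, r)
    # Interleave so that output[y*r + dy, x*r + dx] = features[y, x, dy*r + dx].
    return blocks.transpose(0, 2, 1, 3).reshape(h * r, w * r)

# A 2 x 3 feature map with r*r = 4 channels becomes a 4 x 6 image.
feat = np.arange(2 * 3 * 4).reshape(2, 3, 4)
hr = pixel_shuffle(feat, r=2)
```

      Each input pixel contributes one r × r patch of the output, so no arithmetic is performed at the high resolution; the convolutions all run on the H × W grid.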
      Therefore, how to cooperate with the decoder and reduce the video super-resolution computation on a lower-performance mobile terminal, so that the video stream can be optimized in real time without significantly increasing the terminal's power consumption, has become a major problem.
    Disclosure of Invention
      The invention aims to provide a video decoding method and system for a mobile terminal, a storage medium, and electronic equipment, in which the decoding process is combined with super-resolution technology. Real-time optimization of the video stream is thereby achieved on a lower-performance mobile terminal, the super-resolution computation for the video is reduced, and the power consumption of the mobile terminal is not significantly increased, which improves the user's viewing experience.
      In order to achieve the above purpose, the invention provides the following technical scheme:
      a video decoding method for a mobile terminal, comprising:
      acquiring a video and carrying out HEVC decoding to obtain a plurality of coding blocks;
      optimizing the coding blocks according to their types to obtain optimized coding blocks, wherein the optimization comprises: performing super-resolution processing, performing bicubic upscaling, or copying the super-resolved data of a reference frame;
      and integrating the optimized coding blocks into a plurality of pieces of frame data, performing loop filtering on the frame data, and playing the frame data.
      Specifically, the method for optimizing the coding blocks according to their types includes:
      determining the type of the coding block, the type being one of an inter-frame block, an intra-frame block, and a SKIP block;
      if the coding block is an inter-frame block, performing super-resolution processing on it using a super-resolution model;
      if the coding block is an intra-frame block, further checking its quantization parameter (QP): if the QP is not less than 30, performing bicubic upscaling on the coding block; if the QP is less than 30, performing super-resolution processing on it using the super-resolution model;
      and if the coding block is a SKIP block, copying the super-resolved data of the reference frame.
      Preferably, when the coding blocks are optimized according to their types, the residual of any coding block that has a residual is enlarged by the corresponding super-resolution scale factor.
      Further, the method by which the super-resolution model super-resolves a coding block includes:
      S1, obtaining feature data of the coding block by a convolution operation and marking it as the original feature data;
      S2, splitting the feature data by channel in the ratio 1:(m-1), extracting the feature data of 1/m of the channels, and marking and storing it as one part of the super-resolution feature data;
      S3, determining whether the number of stored parts of super-resolution feature data equals m-1; if it does not, going to step S4; if it does, going to step S5;
      S4, expanding the feature data of the remaining (m-1)/m of the channels from S2 back to the original number of channels by a convolution operation, and returning to step S2;
      S5, compressing the feature data of the (m-1)/m of the channels remaining from the last execution of S2 to 1/m of the channels by a convolution operation, and marking and storing it as one part of the super-resolution feature data;
      and S6, fusing all parts of the super-resolution feature data and restoring them into one channel by a convolution operation to obtain the feature data of the coding block after super-resolution processing.
      Further, the method by which the super-resolution model super-resolves a coding block further includes:
      adding the feature data of the coding block after super-resolution processing to the original feature data as a residual connection to obtain the optimized coding block.
      Specifically, the training method of the super-resolution model includes:
      processing sample data to obtain training data;
      down-sampling the training data and inputting the down-sampled training data into the super-resolution model;
      up-sampling the training data output by the super-resolution model;
      and adding the up-sampled training data to the original training data and performing normalization.
      Further, the method of processing sample data to obtain training data includes:
      reading image data from a sample image, dividing the image data into a plurality of channels, and randomly cropping sub-picture data from any one channel as the sample data;
      and performing randomization and then normalization on the sample data to obtain the training data.
      A video decoding system for a mobile terminal, comprising a decoding module, an optimization module and an integration module, wherein:
      the decoding module is configured to perform HEVC decoding on the video to obtain a plurality of coding blocks;
      the optimization module is configured to optimize the coding blocks according to their types to obtain optimized frame data, the optimization comprising: performing super-resolution processing, performing bicubic upscaling, or copying the super-resolved data of a reference frame;
      the integration module is configured to integrate the optimized coding blocks into a plurality of pieces of frame data, perform loop filtering on the frame data, and play the frame data.
      A computer-readable storage medium having stored thereon computer-readable program instructions for executing the video decoding method for a mobile terminal of any one of claims 1 to 7.
      An electronic device, the electronic device comprising:
      at least one processor; and
      a memory communicatively coupled to the at least one processor; wherein,
      the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video decoding method for a mobile terminal of any one of claims 1 to 7.
      Compared with the prior art, the video decoding method and system for a mobile terminal, the storage medium, and the electronic equipment provided by the invention have the following beneficial effects:
      the video decoding method for the mobile terminal combines the decoding process with super-resolution technology and applies a corresponding optimization to each type of coding block obtained after video decoding. This reduces the time and computation of video super-resolution during decoding while preserving quality, and improves both the user's viewing experience and compatibility across application scenarios.
      The video decoding system for the mobile terminal provided by the invention achieves real-time optimization of the video stream on a lower-performance mobile terminal and reduces the super-resolution computation for the video without significantly increasing the power consumption of the mobile terminal, thereby improving the user's viewing experience.
    Drawings
      The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
      FIG. 1 is a schematic diagram of a video decoding method for a mobile terminal according to an embodiment of the present invention;
      FIG. 2 is a schematic diagram of the relationship between a coding tree unit (CTU) and a coding unit (CU) in HEVC according to an embodiment of the present invention;
      FIG. 3 is a schematic diagram of optimizing a coding block according to its type according to an embodiment of the present invention;
      FIG. 4 is a schematic diagram of an example in which the super-resolution model super-resolves a coding block according to an embodiment of the present invention;
      FIG. 5 is a schematic diagram of a training method of the super-resolution model according to an embodiment of the present invention;
      FIG. 6 is a schematic diagram of sample image processing according to an embodiment of the present invention;
      FIG. 7 is a schematic diagram of the training criteria for the super-resolution model in an embodiment of the present invention;
      FIG. 8 is a schematic block diagram of an electronic device according to an embodiment of the present invention.
    Detailed Description
      In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
      Example one
      Referring to fig. 1, a video decoding method for a mobile terminal includes:
      acquiring a video and carrying out HEVC decoding to obtain a plurality of coding blocks;
      optimizing the coding blocks according to their types to obtain optimized coding blocks, wherein the optimization comprises: performing super-resolution processing, performing bicubic upscaling, or copying the super-resolved data of a reference frame;
      and integrating the optimized coding blocks into a plurality of pieces of frame data, performing loop filtering on the frame data, and playing the frame data.
      The video decoding method for the mobile terminal combines the decoding process with super-resolution technology and applies a corresponding optimization to each type of coding block obtained after video decoding. This reduces the time and computation of video super-resolution during decoding while preserving quality, and improves both the user's viewing experience and compatibility across application scenarios.
      The video may be decoded into a plurality of coding blocks in accordance with the High Efficiency Video Coding (HEVC) video compression standard. The basic coding unit in this standard is the coding tree unit (CTU), and each CTU includes one luma coding tree block (CTB) and two chroma CTBs.
      Referring to FIG. 2, during encoding a frame is first divided into slice segments (SS), which are then partitioned further as follows: each slice segment is first divided into CTUs of equal size, and each CTU is divided by quadtree partitioning into coding units (CUs) of different types. The CU is the basic unit at which the inter-frame, intra-frame, or Skip/Merge mode is decided.
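      The quadtree division of a CTU into CUs can be sketched as a short recursion. This is only an illustration under stated assumptions: the 64 × 64 CTU and 8 × 8 minimum CU are common HEVC settings rather than values given in this document, and the should_split callback is a hypothetical stand-in for the encoder's decision (a decoder instead reads the split decisions from the bitstream).

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square coding unit

def split_ctu(x: int, y: int, size: int,
              should_split: Callable[[int, int, int], bool],
              min_size: int = 8) -> List[Block]:
    """Recursively divide a square block into four quadrants (quadtree)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        blocks: List[Block] = []
        for dy in (0, half):
            for dx in (0, half):
                blocks += split_ctu(x + dx, y + dy, half, should_split, min_size)
        return blocks
    return [(x, y, size)]  # leaf: this block becomes one CU

# Hypothetical rule: keep refining only the block touching the top-left corner.
cus = split_ctu(0, 0, 64, lambda x, y, s: x == 0 and y == 0)
```

      With this rule the CTU yields ten CUs of mixed sizes (three 32 × 32, three 16 × 16, four 8 × 8) that together tile the full 64 × 64 area.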
      Referring to FIG. 3, a coding block is obtained once a CU has been decoded, and the method for optimizing the coding block according to its type includes:
      determining the type of the coding block, the type being one of an inter-frame block, an intra-frame block, and a SKIP block;
      if the coding block is an inter-frame block, performing super-resolution processing on it using the super-resolution model;
      if the coding block is an intra-frame block, further checking its quantization parameter (QP). The QP reflects how strongly spatial detail is compressed: the smaller the QP, the finer the quantization and the higher the image quality. If the QP is not less than 30, bicubic upscaling is performed on the coding block; if the QP is less than 30, super-resolution processing is performed on it using the super-resolution model;
      if the coding block is a SKIP block, the super-resolved data of the reference frame is copied, where the reference frame is the frame referred to during encoding and generally contains information similar to that of the SKIP block.
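      The per-block decision above can be summarized as a small dispatch function. The sketch below only selects which operation applies; the class and function names are hypothetical, and the actual super-resolution, bicubic, and copy routines are left out.

```python
from dataclasses import dataclass
from enum import Enum

class BlockType(Enum):
    INTER = "inter"
    INTRA = "intra"
    SKIP = "skip"

QP_THRESHOLD = 30  # threshold stated in the text

@dataclass
class CodingBlock:
    block_type: BlockType
    qp: int  # quantization parameter of the block

def choose_optimization(block: CodingBlock) -> str:
    """Return the name of the optimization to apply to a decoded block."""
    if block.block_type is BlockType.INTER:
        return "super_resolution"
    if block.block_type is BlockType.INTRA:
        # Coarsely quantized intra blocks (QP >= 30) get the cheap bicubic path.
        return "bicubic" if block.qp >= QP_THRESHOLD else "super_resolution"
    return "copy_reference"  # SKIP block: reuse the reference frame's result

print(choose_optimization(CodingBlock(BlockType.INTRA, qp=34)))
```

      Only inter-frame blocks and finely quantized intra-frame blocks reach the neural network, which is where the reduction in computation comes from.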
      Inter-frame data gives rise to residuals. For a block that has a residual, the residual must be enlarged by the corresponding super-resolution scale factor so that it matches the super-resolved inter-frame block.
      The optimized coding blocks are integrated into a plurality of pieces of frame data, and loop filtering is performed on the frame data to remove blocking artifacts and ringing.
      The method by which the super-resolution model super-resolves a coding block includes:
      S1, obtaining feature data of the coding block by a convolution operation and marking it as the original feature data;
      S2, splitting the feature data by channel in the ratio 1:(m-1), extracting the feature data of 1/m of the channels, and marking and storing it as one part of the super-resolution feature data;
      S3, determining whether the number of stored parts of super-resolution feature data equals m-1; if it does not, going to step S4; if it does, going to step S5;
      S4, expanding the feature data of the remaining (m-1)/m of the channels from S2 back to the original number of channels by a convolution operation, and returning to step S2;
      S5, compressing the feature data of the (m-1)/m of the channels remaining from the last execution of S2 to 1/m of the channels by a convolution operation, and marking and storing it as one part of the super-resolution feature data;
      and S6, fusing all m parts of the super-resolution feature data and restoring them into one channel by a convolution operation to obtain the feature data of the coding block after super-resolution processing.
      The backbone of the super-resolution model mainly adopts an information extraction approach: features are extracted layer by layer, and the extracted information is finally fused. Further, the feature data of the coding block after super-resolution processing is added to the original feature data as a residual connection to obtain the optimized coding block.
      Referring to FIG. 4, in this embodiment the super-resolution model may super-resolve a coding block as follows:
      the super-resolution model first applies a convolution with a 3 × 3 kernel to the coding block to obtain the original feature data;
      the feature data is then split by channel in the ratio 1:3 (that is, m = 4) into two portions: one portion, 1/4 of the channels, is marked and stored as one part of the super-resolution feature data; the remaining portion is 3/4 of the channels;
      a convolution with a 3 × 3 kernel is applied to the remaining 3/4 of the channels to expand them back to the original number of channels, and the result is again split by channel in the ratio 1:3: one portion, 1/4 of the channels, is marked and stored as one part of the super-resolution feature data (2 parts are now stored); the remaining portion is 3/4 of the channels;
      a convolution with a 3 × 3 kernel is again applied to the remaining 3/4 of the channels to expand them back to the original number of channels, and the result is again split by channel in the ratio 1:3: one portion, 1/4 of the channels, is marked and stored as one part of the super-resolution feature data (3 parts are now stored); the remaining portion is 3/4 of the channels;
      at this point the number of stored parts is 3, which equals m - 1 = 4 - 1, so a convolution with a 3 × 3 kernel is applied directly to the last 3/4 of the channels to obtain feature data with 1/4 of the channels, which is marked and stored as one part of the super-resolution feature data (4 parts are now stored);
      all 4 parts of the super-resolution feature data are fused and restored to one channel by a convolution operation to obtain the feature data of the coding block after super-resolution processing;
      finally, the feature data after super-resolution processing is added to the original feature data as a residual connection to obtain the optimized coding block.
      It should be clear to those skilled in the art that although a convolution with a 3 × 3 kernel is applied several times in the above process, the kernels only share the same size; their specific values are not necessarily the same.
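      The split-and-refine loop of steps S1 to S6 can be traced with a small NumPy sketch. Several things here are assumptions made for illustration only: the channel count of 16, the random (untrained) weights, the absence of activation functions, and the helper names conv3x3 and distill_block. The final residual addition is omitted because the text does not state how the one-channel result and the multi-channel original features are matched in shape.

```python
import numpy as np

def conv3x3(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Zero-padded 3x3 convolution. x: (C_in, H, W); w: (C_out, C_in, 3, 3)."""
    _, h, width = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, width))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j],
                             padded[:, i:i + h, j:j + width])
    return out

def distill_block(block: np.ndarray, channels: int = 16, m: int = 4, seed: int = 0):
    """Follow steps S1-S6 on a single-channel (H, W) block."""
    assert channels % m == 0
    rng = np.random.default_rng(seed)

    def weights(c_out: int, c_in: int) -> np.ndarray:
        # Random stand-ins for trained kernels; each call gives different values.
        return rng.normal(scale=0.1, size=(c_out, c_in, 3, 3))

    keep = channels // m                                   # 1/m of the channels
    current = conv3x3(block[None], weights(channels, 1))   # S1: original features
    parts = []
    while True:
        parts.append(current[:keep])                       # S2: store 1/m
        rest = current[keep:]                              # remaining (m-1)/m
        if len(parts) == m - 1:                            # S3
            break
        current = conv3x3(rest, weights(channels, channels - keep))  # S4: expand
    parts.append(conv3x3(rest, weights(keep, channels - keep)))      # S5: compress
    fused = np.concatenate(parts)                          # S6: fuse the m parts
    return conv3x3(fused, weights(1, fused.shape[0])), parts

out, parts = distill_block(np.ones((8, 8)))
```

      With m = 4 the loop stores three quarter-width parts, the compression step adds the fourth, and the final convolution maps the 16 fused channels back to one.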
      Referring to FIG. 5, the super-resolution model used in this embodiment is obtained by neural network training, and the training method includes:
      processing sample data to obtain training data. Referring to FIG. 6, this specifically includes: reading image data from a sample image, dividing the image data into a plurality of channels, and randomly cropping sub-picture data from any one channel as the sample data; performing randomization and then normalization on the sample data to obtain the training data;
      in a specific implementation, 2000 4K high-definition pictures may be used as sample images, read and trained in batches. Only one channel of image data is read from each picture (for example, the Y channel in FIG. 6), a 192 × 192 sub-picture is randomly cropped from it, and random operations such as rotation and mirroring are applied to the sub-picture to increase the randomness of the pictures. Finally, the image data is normalized to obtain the training data.
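      The sample preparation just described (crop, random rotation and mirroring, normalization) might look as follows in NumPy. This is a sketch under assumptions: the input is taken to be an 8-bit Y channel, normalization is taken to mean division by 255, and the function name make_training_patch is illustrative.

```python
import numpy as np

def make_training_patch(channel: np.ndarray, rng: np.random.Generator,
                        patch: int = 192) -> np.ndarray:
    """Crop a random patch from one image channel, augment it, and normalize."""
    h, w = channel.shape
    top = int(rng.integers(0, h - patch + 1))
    left = int(rng.integers(0, w - patch + 1))
    sub = channel[top:top + patch, left:left + patch]
    sub = np.rot90(sub, k=int(rng.integers(0, 4)))  # random multiple of 90 degrees
    if rng.random() < 0.5:
        sub = np.fliplr(sub)                        # random horizontal mirror
    return sub.astype(np.float32) / 255.0           # assumed 8-bit input -> [0, 1]

rng = np.random.default_rng(0)
# Synthetic stand-in for the Y channel of one 4K picture.
y_channel = rng.integers(0, 256, size=(2160, 3840), dtype=np.uint8)
patch = make_training_patch(y_channel, rng)
```

      Because the patch is square, rotation by any multiple of 90 degrees keeps its 192 × 192 shape.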
      Then, the training data is divided into groups that are input into the model in sequence for training. Specifically: a group of training data is selected and down-sampled, the down-sampled training data is input into the super-resolution model for processing, and the training data output by the super-resolution model is up-sampled;
      down-sampling an image of size M × N by a factor of s yields an image of size (M/s) × (N/s); up-sampling, also called interpolation, is the inverse of down-sampling and inserts new elements between the pixels of the original image by means of an interpolation algorithm.
      In this embodiment, the down-sampling uses the sub-pixel convolution layer from the ESPCN super-resolution technique: sub-pixel convolution increases the number of channels of the image while reducing the height and width of the pixel matrix. Although the number of channels increases, essentially no image information is lost, and the subsequent processing reduces the super-resolution computation;
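      Trading spatial size for channels without losing information can be shown with a plain rearrangement. The sketch below is an assumption about the mechanism (a fixed space-to-depth reshuffle and its inverse), not the learned sub-pixel layer itself; the function names are illustrative.

```python
import numpy as np

def space_to_depth(image: np.ndarray, s: int) -> np.ndarray:
    """Pack each s x s pixel block of an (H, W) image into s*s channels."""
    h, w = image.shape
    if h % s or w % s:
        raise ValueError("image size must be divisible by s")
    blocks = image.reshape(h // s, s, w // s, s)
    return blocks.transpose(0, 2, 1, 3).reshape(h // s, w // s, s * s)

def depth_to_space(features: np.ndarray, s: int) -> np.ndarray:
    """Inverse of space_to_depth: unpack s*s channels into s x s pixel blocks."""
    h, w, _ = features.shape
    return features.reshape(h, w, s, s).transpose(0, 2, 1, 3).reshape(h * s, w * s)

img = np.arange(16 * 16).reshape(16, 16)
packed = space_to_depth(img, 4)        # shape (4, 4, 16)
restored = depth_to_space(packed, 4)   # shape (16, 16)
```

      The round trip reproduces the image exactly, which is the sense in which the height and width shrink while the pixel information is retained in the extra channels.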
      in general, 2× down-sampling and 2× up-sampling are performed, achieving 2× super-resolution. To further reduce the computation in this embodiment, for 2× super-resolution, 4× down-sampling and 2× up-sampling are performed respectively, which effectively reduces the computation by a factor of 4 while leaving the super-resolution quality of the image essentially unaffected.
      The up-sampled training data is added to the original training data, which preserves the original information in the output image and keeps the super-resolved image from being distorted; normalization is then performed to keep the output image data within [0, 1].
      Referring to FIG. 7, the loss function of the current group of training data is calculated and optimized with an optimizer, and the model trained on this group is then validated: if validation passes, the current model is saved and training ends; if validation fails, the current model is not saved and the above training procedure is repeated with the next group of training data until validation passes and training ends. After training is finished, the saved model is converted.
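      The stopping rule in this procedure (train on one group, validate, stop at the first pass) reduces to a short loop. The sketch below shows only that control flow; train_step and validate are hypothetical callbacks, and the toy usage replaces the real model with a single number.

```python
from typing import Callable, Iterable, Optional, TypeVar

Model = TypeVar("Model")
Group = TypeVar("Group")

def train_until_valid(model: Model,
                      groups: Iterable[Group],
                      train_step: Callable[[Model, Group], Model],
                      validate: Callable[[Model], bool]) -> Optional[Model]:
    """Train group by group and return the first model that passes validation."""
    for group in groups:
        model = train_step(model, group)  # loss computation plus optimizer update
        if validate(model):
            return model                  # this is the model that would be saved
    return None                           # data exhausted without passing

# Toy usage: the "model" is one number that moves halfway to a target each step.
target = 1.0
saved = train_until_valid(
    0.0,
    range(10),
    train_step=lambda m, _group: m + 0.5 * (target - m),
    validate=lambda m: abs(target - m) < 0.1,
)
```

      In the toy run the fourth group is the first to pass validation, so training stops there and the remaining groups are never used.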
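      In this embodiment, the loss function of the training data may be calculated as the sum of the MSE loss function and the VGG loss function, and the loss function in this embodiment is:
      [The loss-function formula is an image in the original publication and is not reproduced in this text.]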
      In this embodiment, the Adam optimizer may be selected as the optimizer of the loss function; Adam adaptively adjusts the learning rate and optimizes the loss function smoothly. MNN may be used as the inference engine for the trained model. MNN is a lightweight deep neural network inference engine that is efficient, easy to use and small, which allows the super-resolution model to be deployed and run on a mobile terminal. Therefore, after the training ends, model conversion is performed on the saved super-resolution model: the model may first be converted into a TensorFlow PB model, and the PB model is then converted into the MNN model used by the MNN inference engine.
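As a rough illustration of a summed pixel and feature loss minimized with Adam, the sketch below uses plain NumPy. It is not the loss of this embodiment: the weighting `weight`, the stand-in feature extractor `edge_features` (a real VGG loss would compare activations of a pretrained VGG network) and all hyperparameter values are assumptions made only for the example.

```python
import numpy as np


def mse_loss(pred, target):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((pred - target) ** 2))


def total_loss(pred, target, features, weight=1.0):
    """Pixel MSE plus an MSE computed in a feature space (perceptual-style term)."""
    return mse_loss(pred, target) + weight * mse_loss(features(pred), features(target))


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


def edge_features(x):
    """Hypothetical stand-in for VGG activations: horizontal differences."""
    return np.diff(x, axis=1)


target = np.zeros((4, 4))
pred = np.ones((4, 4))
initial = total_loss(pred, target, edge_features)
m, v = np.zeros_like(pred), np.zeros_like(pred)
for t in range(1, 101):
    grad = 2.0 * (pred - target) / pred.size    # gradient of the pixel MSE term only
    pred, m, v = adam_step(pred, grad, m, v, t, lr=0.05)
final = total_loss(pred, target, edge_features)
```

The demo differentiates only the pixel term by hand; a real training setup would obtain gradients of the full loss from an automatic differentiation framework.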
      In this embodiment, the super-resolution model performs 2x or 4x image super-resolution at any resolution. A complete frame may be divided into blocks that are super-resolved separately; after every block has been super-resolved, the blocks are combined into a complete super-resolved image with smooth transitions at the block boundaries. On a mobile phone with lower performance, a low-resolution video, for example with a resolution of 320 × 240, can be played in real time at 2x super-resolution (640 × 480) without stuttering and without a significant increase in the power consumption of the mobile terminal (the power consumption of the mobile phone increases by no more than 30%). The subjective definition is obviously improved, and the peak signal-to-noise ratio (PSNR) is about 2 dB higher than that of a pure bilinear stretching method.
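Block-wise processing of a frame and reassembly of the results can be sketched as below. The per-block model is replaced by a nearest-neighbour stand-in (`upscale_nearest`), and the tile size of 64 is an arbitrary assumption; the smoothing of block boundaries mentioned above is not modelled here.

```python
import numpy as np


def upscale_nearest(block: np.ndarray, scale: int) -> np.ndarray:
    """Stand-in for the super-resolution model: repeat each pixel scale x scale times."""
    return np.repeat(np.repeat(block, scale, axis=0), scale, axis=1)


def super_resolve_by_blocks(frame, block_size, scale, sr_fn):
    """Split a frame into tiles, upscale each tile, and write it into the output frame."""
    h, w = frame.shape[:2]
    out = np.zeros((h * scale, w * scale) + frame.shape[2:], dtype=frame.dtype)
    for y in range(0, h, block_size):
        for x in range(0, w, block_size):
            tile = frame[y:y + block_size, x:x + block_size]   # edge tiles may be smaller
            th, tw = tile.shape[:2]
            out[y * scale:(y + th) * scale, x * scale:(x + tw) * scale] = sr_fn(tile, scale)
    return out


frame = np.random.default_rng(0).random((240, 320))            # a 320 x 240 luma plane
upscaled = super_resolve_by_blocks(frame, 64, 2, upscale_nearest)
```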
      Example two
      A video decoding system for a mobile terminal comprises a decoding module, an optimization module and an integration module, wherein: the decoding module is used for performing HEVC decoding on the video to obtain a plurality of coding blocks; the optimization module is used for respectively performing optimization processing on the coding blocks according to the types of the coding blocks to obtain optimized frame data, the optimization processing comprising: performing super-resolution processing, performing bicubic stretching processing, or copying the super-resolution data of a reference frame; the integration module is used for integrating the optimized coding blocks into a plurality of pieces of frame data, and performing loop filtering processing on the frame data and playing the frame data.
      By adopting the video decoding method for the mobile terminal in the first embodiment, the video decoding system for the mobile terminal provided by the invention realizes real-time optimization of video streams, reduces the super-resolution computation for videos, and does not significantly increase the power consumption of the mobile terminal, thereby improving the viewing experience of users. Compared with the prior art, the beneficial effects of the video decoding system for the mobile terminal provided by the embodiment of the present invention are the same as the beneficial effects of the video decoding method for the mobile terminal provided by the first embodiment of the present invention, and other technical features of the video decoding system for the mobile terminal are the same as those disclosed in the method of the previous embodiment, which are not repeated herein.
      Example three
      An embodiment of the present invention provides a computer-readable storage medium having computer-readable program instructions stored thereon, where the computer-readable program instructions are used to execute the video decoding method for a mobile terminal in the first embodiment.
      The computer readable storage medium provided by the embodiments of the present invention may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or device, or any combination thereof. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present embodiment, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
      The computer-readable storage medium may be embodied in an electronic device; or may be present alone without being incorporated into the electronic device.
      The computer readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects the internet protocol addresses from the at least two internet protocol addresses and returns the internet protocol addresses; receiving an internet protocol address returned by the node evaluation equipment; wherein the obtained internet protocol address indicates an edge node in the content distribution network.
      Alternatively, the computer readable storage medium carries one or more programs which, when executed by an electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
      Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
      The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
      The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The names of the modules do not in some cases constitute a limitation to the unit itself, and for example, the optimization module may also be described as a "module that performs optimization processing according to the type of the coding block".
      The computer-readable storage medium provided by the invention stores computer-readable program instructions for executing the method for decoding the video of the mobile terminal, and combines the decoding process with the super-resolution technology to perform corresponding optimization processing on different coding blocks obtained after the video decoding. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the embodiment of the present invention are the same as the beneficial effects of the method for video decoding at the mobile terminal provided by the first embodiment, and are not described herein again.
      Example four
      An embodiment of the present invention provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the method for video decoding on a mobile terminal according to the first embodiment.
      Referring now to FIG. 8, shown is a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
      As shown in fig. 8, the electronic device may include a processing means (e.g., a central processing unit, a graphic processor, etc.) that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) or a program loaded from a storage means into a Random Access Memory (RAM). In the RAM, various programs and data necessary for the operation of the electronic apparatus are also stored. The processing device, the ROM, and the RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.
      Generally, the following systems may be connected to the I/O interface: input devices including, for example, touch screens, touch pads, keyboards, mice, image sensors, microphones, accelerometers, gyroscopes, and the like; output devices including, for example, Liquid Crystal Displays (LCDs), speakers, vibrators, and the like; storage devices including, for example, magnetic tape, hard disk, etc.; and a communication device. The communication means may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. While the figures illustrate an electronic device with various systems, it is to be understood that not all illustrated systems are required to be implemented or provided. More or fewer systems may alternatively be implemented or provided.
      In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means, or installed from a storage means, or installed from a ROM. The computer program, when executed by a processing device, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
      By adopting the video decoding method for the mobile terminal in the first embodiment, the electronic device provided by the invention reduces the super-resolution calculation amount of the video from the two aspects of the super-resolution model and the decoding on the mobile terminal with lower performance, realizes the real-time super-resolution of the video stream on the mobile terminal, and does not significantly increase the power consumption of the mobile terminal, thereby improving the watching experience of a user. Compared with the prior art, the beneficial effects of the electronic device provided by the embodiment of the present invention are the same as the beneficial effects of the video decoding method for the mobile terminal provided by the first embodiment of the present invention, and other technical features of the electronic device are the same as those disclosed in the method of the previous embodiment, which are not described herein again.
      It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the foregoing description of embodiments, the particular features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.
      The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
    Claims (10)
1. A video decoding method for a mobile terminal, comprising:
      acquiring a video and carrying out HEVC decoding to obtain a plurality of coding blocks;
      respectively optimizing the coding blocks according to the types of the coding blocks to obtain optimized coding blocks; wherein the optimization processing comprises: performing super-resolution processing, performing bicubic stretching processing, or copying the super-resolution data of a reference frame;
      and integrating the optimized coding blocks into a plurality of pieces of frame data, and performing loop filtering processing on the frame data and playing the frame data.
    2. The video decoding method for the mobile terminal according to claim 1, wherein the method for optimizing the coding blocks according to the types of the coding blocks comprises:
      judging the type of the coding block, wherein the type comprises an inter-frame block, an intra-frame block and an SKIP block;
      if the type of the coding block is an inter-frame block, performing super-resolution processing on the coding block by using a super-resolution model;
      if the type of the coding block is an intra-frame block, further judging a quantization parameter QP of the coding block; if the quantization parameter QP is not less than 30, performing bicubic stretching processing on the coding block, and if the quantization parameter QP is less than 30, performing super-resolution processing on the coding block by using the super-resolution model;
      and if the coding block is a SKIP block, copying the super-resolution data of the reference frame.
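The per-block decision in claim 2 can be sketched as a small dispatch function. The type names, the `CodingBlock` structure and the returned labels are illustrative assumptions; only the three block types and the QP threshold of 30 come from the claim.

```python
from dataclasses import dataclass
from enum import Enum


class BlockType(Enum):
    INTER = "inter"
    INTRA = "intra"
    SKIP = "skip"


QP_THRESHOLD = 30  # threshold stated in claim 2


@dataclass
class CodingBlock:
    block_type: BlockType
    qp: int


def choose_processing(block: CodingBlock) -> str:
    """Return which optimization a coding block receives."""
    if block.block_type is BlockType.SKIP:
        return "copy_reference"            # reuse the super-resolution data of the reference frame
    if block.block_type is BlockType.INTER:
        return "super_resolution"          # run the super-resolution model
    # Intra block: heavily quantized blocks (QP >= 30) get the cheaper bicubic stretch.
    return "bicubic" if block.qp >= QP_THRESHOLD else "super_resolution"
```

For example, `choose_processing(CodingBlock(BlockType.INTRA, 30))` selects the bicubic path, while the same block with a QP of 29 goes through the model.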
    3. The method as claimed in claim 2, wherein when the coding blocks are optimized according to the types of the coding blocks, for a coding block that has a residual, the residual is also required to undergo the corresponding super-resolution processing.
    4. The video decoding method for the mobile terminal according to claim 2, wherein the method by which the super-resolution model performs super-resolution processing on the coding blocks comprises:
      S1, obtaining the feature data of the coding block by a convolution operation, and marking the feature data as original feature data;
      S2, performing channel splitting on the feature data in the ratio 1:(m-1), extracting the feature data of the 1/m channels, and marking it as super-resolution feature data;
      S3, judging whether the number of pieces of super-resolution feature data is equal to m-1; if it is not equal to m-1, executing step S4; if it is equal to m-1, executing step S5;
      S4, expanding the feature data of the remaining (m-1)/m channels from S2 to the original channel size by a convolution operation, and executing step S2;
      S5, compressing the feature data of the (m-1)/m channels remaining from the last execution of S2 to the size of the 1/m channels by a convolution operation, and marking it as super-resolution feature data;
      and S6, performing data fusion on all the super-resolution feature data, and recovering the fused data into one channel by a convolution operation to obtain the feature data of the coding block after super-resolution processing.
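One possible reading of steps S2 to S6 is sketched below with pointwise (1x1) convolutions and random, untrained weights. Several points are assumptions rather than statements of the claim: the input is taken to be the S1 feature data with a channel count divisible by m, the convolutions are 1x1, and the fused output is given the original channel count so that it can later be added to the original feature data.

```python
import numpy as np


def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Pointwise convolution: (H, W, C_in) times (C_in, C_out) gives (H, W, C_out)."""
    return x @ weight


def distill_features(features: np.ndarray, m: int, rng: np.random.Generator) -> np.ndarray:
    """Repeatedly split off 1/m of the channels, then fuse all the split-off parts."""
    c = features.shape[-1]
    if m < 2 or c % m != 0:
        raise ValueError("m must be at least 2 and divide the channel count")
    keep = c // m
    distilled = []
    current = features
    while True:
        distilled.append(current[..., :keep])                # S2: keep 1/m of the channels
        rest = current[..., keep:]                           # the remaining (m-1)/m channels
        if len(distilled) == m - 1:                          # S3: enough pieces collected
            break
        current = conv1x1(rest, rng.standard_normal((c - keep, c)))         # S4: expand to c
    distilled.append(conv1x1(rest, rng.standard_normal((c - keep, keep))))  # S5: compress
    fused = np.concatenate(distilled, axis=-1)               # S6: m pieces, c channels in total
    return conv1x1(fused, rng.standard_normal((c, c)))       # S6: fuse by convolution


rng = np.random.default_rng(0)
block_features = rng.standard_normal((8, 8, 16))
output = distill_features(block_features, 4, rng)
```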
    5. The video decoding method for the mobile terminal according to claim 4, wherein the method by which the super-resolution model performs super-resolution processing on the coding blocks further comprises:
      performing a residual addition of the feature data of the coding block after super-resolution processing and the original feature data to obtain an optimized coding block.
    6. The video decoding method for the mobile terminal according to claim 4 or 5, wherein the training method of the super-resolution model comprises:
      processing the sample data to obtain training data;
      performing down-sampling processing on the training data, and inputting the down-sampled training data into the super-resolution model;
      performing up-sampling processing on the training data output by the super-resolution model;
      and adding the up-sampled training data to the original training data, and performing normalization processing.
    7. The video decoding method for the mobile terminal according to claim 6, wherein the method of processing the sample data to obtain the training data comprises:
      reading image data from a sample image, dividing the image data into a plurality of channels, and randomly intercepting sub-picture data in any channel as sample data;
      and performing randomization processing and normalization processing on the sample data in sequence to obtain the training data.
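The sample preparation in claim 7 can be sketched as follows. The claim does not say what the randomization processing is, so a random horizontal flip is used here purely as an assumption; the function name `make_training_patch`, the patch size and the 8-bit input range are likewise illustrative.

```python
import numpy as np


def make_training_patch(image: np.ndarray, patch: int, rng: np.random.Generator) -> np.ndarray:
    """Crop a random patch from one randomly chosen channel and scale it to [0, 1]."""
    channel = image[..., rng.integers(image.shape[-1])]   # pick any one channel
    h, w = channel.shape
    y = rng.integers(0, h - patch + 1)                    # random top-left corner
    x = rng.integers(0, w - patch + 1)
    sample = channel[y:y + patch, x:x + patch]
    if rng.random() < 0.5:                                # assumed randomization: horizontal flip
        sample = sample[:, ::-1]
    return sample.astype(np.float32) / 255.0              # normalization for 8-bit data


rng = np.random.default_rng(0)
sample_image = rng.integers(0, 256, size=(64, 48, 3), dtype=np.uint8)
training_patch = make_training_patch(sample_image, 32, rng)
```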
    8. A video decoding system for a mobile terminal, comprising a decoding module, an optimization module and an integration module, wherein:
      the decoding module is used for performing HEVC decoding on the video to obtain a plurality of coding blocks;
      the optimization module is configured to perform optimization processing on the coding blocks according to the types of the coding blocks to obtain optimized frame data, where the optimization processing includes: performing super-resolution processing, performing bicubic stretching processing, or copying the super-resolution data of a reference frame;
      the integration module is used for integrating the optimized coding blocks into a plurality of pieces of frame data, and performing loop filtering processing and playing on the frame data.
    9. A computer-readable storage medium having computer-readable program instructions stored thereon for performing the video decoding method for a mobile terminal of any one of claims 1 to 7.
    10. An electronic device, characterized in that the electronic device comprises:
      at least one processor; and
      a memory communicatively coupled to the at least one processor; wherein,
      the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video decoding method for a mobile terminal of any one of claims 1 to 7.
    Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN202110230999.9A CN113068029A (en) | 2021-03-02 | 2021-03-02 | Video decoding method and system for mobile terminal, storage medium and electronic equipment | 
Publications (1)
| Publication Number | Publication Date | 
|---|---|
| CN113068029A | 2021-07-02 |
Family
ID=76559472
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| CN202110230999.9A Pending CN113068029A (en) | 2021-03-02 | 2021-03-02 | Video decoding method and system for mobile terminal, storage medium and electronic equipment | 
Country Status (1)
| Country | Link | 
|---|---|
| CN (1) | CN113068029A (en) | 
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US20100026685A1 (en) * | 2008-08-04 | 2010-02-04 | Kabushiki Kaisha Toshiba | Image Processing Apparatus | 
| CN106960416A (en) * | 2017-03-20 | 2017-07-18 | 武汉大学 | A kind of video satellite compression image super-resolution method of content complexity self adaptation | 
| CN110136056A (en) * | 2018-02-08 | 2019-08-16 | 华为技术有限公司 | Method and device for image super-resolution reconstruction | 
| CN111539874A (en) * | 2020-04-15 | 2020-08-14 | 山东神舟信息技术有限公司 | Method and device for accelerating video super-resolution reconstruction | 
Similar Documents
| Publication | Publication Date | Title | 
|---|---|---|
| CN110662034B (en) | Adaptive filtering method and video encoding and decoding device using the same | |
| US12328509B2 (en) | Adaptive panoramic video streaming using composite pictures | |
| CN113747242B (en) | Image processing method, image processing device, electronic equipment and storage medium | |
| CN108063976B (en) | Video processing method and device | |
| US20190364205A1 (en) | Adaptive panoramic video streaming using overlapping partitioned sections | |
| CN111510739B (en) | A video transmission method and device | |
| CN115361582A (en) | A video real-time super-resolution processing method, device, terminal and storage medium | |
| CN113068029A (en) | Video decoding method and system for mobile terminal, storage medium and electronic equipment | |
| CN110662071A (en) | Video decoding method and apparatus, storage medium, and electronic apparatus | |
| CN110572677A (en) | video encoding and decoding method and device, storage medium and electronic device | |
| CN110677676A (en) | Video encoding method and apparatus, video decoding method and apparatus, and storage medium | |
| CN110677690B (en) | Video processing method and device and storage medium | |
| EP3170306B1 (en) | Multilevel video compression, decompression, and display for 4k and 8k applications | |
| CN116016958A (en) | Image processing method, device, device and storage medium | |
| CN112954360B (en) | Decoding method, decoding device, storage medium, and electronic apparatus | |
| CN118714406A (en) | Remote sensing image quick-view processing method, system and electronic equipment based on real-time video stream | |
| HK40020132B (en) | Video processing method, device and storage medium | |
| HK40020132A (en) | Video processing method, device and storage medium | |
| WO2023197717A1 (en) | Image decoding method and apparatus, and image coding method and apparatus | |
| CN119515672A (en) | Image quality enhancement method, device, electronic device and storage medium | |
| HK40019475A (en) | Method and apparatus for video decoding, storage medium and electronic device | |
| CN119520835A (en) | Video image quality enhancement method, device, electronic device and storage medium | |
| HK40020214A (en) | Video encoding method and device, video decoding method and device, and storage medium | |
| HK40018862A (en) | Method for encoding and decoding video, apparatus, storage medium and electronic device | |
| JP4615042B2 (en) | Image processing device | 
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210702 |