Disclosure of Invention
In order to meet challenges such as real-time performance and efficient compression in a cloud desktop scene, the invention aims to provide an intelligent encoding and decoding method and system based on video content in the cloud desktop scene, and the adopted technical scheme is as follows:
in a first aspect, the application discloses an intelligent encoding and decoding method based on video content in a cloud desktop scene, and the method is applied to a server and comprises the following steps:
S1, for a current frame image of a desktop of a cloud host system, determining a change area image block and a content type corresponding to the change area image block by using an image difference and content identification technology;
S2, determining the coding mode of the current frame image based on the size, distribution and content type of the change area image blocks within the coding and decoding capability range of the cloud host system;
S3, coding the current frame image based on the coding mode to obtain coded data;
S4, encrypting the coded data and transmitting the obtained encrypted data to a client, so that the client correspondingly decrypts and decodes the obtained encrypted data and displays the current frame image.
Further, in step S1, for a current frame image of a desktop of the cloud host system, determining a change region image block and a content type corresponding to the change region image block by using an image difference and content identification technology, including:
S11, acquiring a previous frame image and a current frame image of a desktop of a cloud host system;
S12, determining an image change area through pixel difference analysis based on the previous frame image and the current frame image;
S13, identifying a target image area to be encoded based on the image change area;
S14, image segmentation and image content identification are carried out on the target image area so as to determine a change area image block and a content type corresponding to the change area image block.
Further, in step S2, the encoding mode determined based on the size, distribution, and content type of the change region image block includes:
For a first frame image, a preset key frame image, or a current frame image having a change area image block whose size is greater than or equal to a preset threshold value, adopting, according to the application scene requirement, a first coding mode or a second coding mode in which global incremental coding is performed in a streaming coding mode or a picture coding mode, respectively;
And for a current frame image having a plurality of second change area image blocks whose sizes are smaller than the preset threshold value, adopting a third coding mode in which region incremental coding is performed in an adaptive hybrid coding mode according to the content type corresponding to each image block in the image, wherein the hybrid coding mode comprises at least one of a PNG lossy coding mode, an LZ4 lossless coding mode, an H265 lossless coding mode, an H264 lossy coding mode and a JPEG lossy coding mode.
Further, in step S2, for a current frame image having a plurality of second change area image blocks whose sizes are smaller than a preset threshold value: if the content type corresponding to an image block is text office content, region incremental encoding is performed on the image block in a PNG lossy encoding mode; if the content type corresponding to an image block is design drawing content and the cloud host system supports lossless hardware acceleration, region incremental encoding is performed on the image block in an H265 lossless encoding mode; if the content type corresponding to an image block is design drawing content and the cloud host system does not support lossless hardware acceleration, region incremental encoding is performed on the image block in an LZ4 lossless encoding mode; if the content type corresponding to an image block is rendering entertainment content and the cloud host system supports lossy hardware acceleration, region incremental encoding is performed on the image block in an H264 lossy encoding mode; and if the content type corresponding to an image block is rendering entertainment content and the cloud host system does not support lossy hardware acceleration, region incremental encoding is performed on the image block in a JPEG lossy encoding mode.
Further, for a current frame image having a plurality of second change area image blocks whose sizes are smaller than the preset threshold value, if the plurality of image blocks are spatially adjacent and have identical content types, the plurality of image blocks are combined to obtain a new merged change area, and for the current frame image having the merged change area, the encoding mode determined in step S2 further includes:
when it is determined that the size of the merged change area is greater than or equal to the preset threshold value, adopting a fourth coding mode of performing global incremental coding on the current frame image by using the first coding mode or the second coding mode according to the application scene requirement;
and when it is determined that there are a plurality of merged change areas whose sizes are smaller than the preset threshold value, adopting a fifth coding mode of performing region incremental coding on the current frame image by using the third coding mode.
Further, in step S3, the encoding the current frame image based on the encoding mode to obtain encoded data includes:
S31, for the current frame image adopting the global increment coding mode, adopting a global increment coder to code the current frame image to obtain corresponding coded data, wherein the coded data is attached with image coding mode information.
Further, in step S3, the encoding the current frame image based on the encoding mode to obtain encoded data includes:
S31, independently encoding each change region image block in the current frame image by adopting a region increment encoding mode to generate an encoding segment corresponding to the current frame image, wherein the encoding segment is attached with encoding mode information used by the corresponding change region image block and Rect information representing the region position and the size of the image block;
S32, for each coding segment, carrying out serialization processing based on the coding segment, corresponding coding mode information and Rect information respectively to obtain a corresponding structured data segment;
S33, combining the structured data segments to obtain corresponding coded data.
Further, in step S4, during displaying the current frame image, the method includes:
If the client decodes and recovers a complete image consistent with the original picture, directly taking the complete image as the currently displayed frame image, and caching the complete image for subsequent use;
If the client decodes and recovers a plurality of change area image blocks, merging the change area image blocks with the previous frame image according to the position information of the change area image blocks so as to update the currently displayed frame image.
In a second aspect, the application discloses an intelligent encoding and decoding system based on video content in a cloud desktop scene, wherein the system comprises a server side and a client side:
The server side is used for determining a change area image block and a content type corresponding to the change area image block by using an image difference and content identification technology for a current frame image of a desktop of the cloud host system, determining an encoding mode of the current frame image based on the size, distribution and content type of the change area image blocks within the encoding and decoding capability range of the cloud host system, encoding the current frame image based on the encoding mode to obtain encoded data, and encrypting the encoded data and transmitting the obtained encrypted data to the client side;
The client is used for correspondingly decrypting the obtained encrypted data and displaying the current frame image after decoding.
The invention has the following beneficial effects:
1) Through the image difference and content identification technology, the change area image blocks in the current frame image of the desktop of the cloud host system can be accurately identified, and indiscriminate encoding of the whole frame image is avoided, so that unnecessary encoding and decoding consumption and data transmission are reduced;
2) The most suitable coding mode is dynamically selected according to the size, distribution and content type of the change area image blocks, so that the coding efficiency and the transmission speed can be further improved;
3) The coded data is encrypted, so that the data can be effectively prevented from being stolen or tampered in the transmission process, and the safety of the cloud desktop system and the privacy of user data are ensured. After the encrypted data is transmitted to the client, the client performs corresponding decryption and decoding processing, so that the end-to-end data protection is realized, and the security of the system is further enhanced.
Detailed Description
In order to further describe the technical means and effects adopted by the invention to achieve the intended purpose, a detailed description of the specific implementation, structure, characteristics and effects of the intelligent encoding and decoding method and system based on video content in a cloud desktop scene according to the invention is given below in combination with the accompanying drawings and the preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of an intelligent encoding and decoding method and system based on video content in a cloud desktop scene.
Referring to fig. 1, a method flowchart of an intelligent encoding and decoding method based on video content in a cloud desktop scene according to an embodiment of the present invention is shown, where the method includes:
Step S1, determining a change area image block and a content type corresponding to the change area image block according to an image difference and content identification technology aiming at a current frame image of a desktop of a cloud host system.
Specifically, the present application compares the current frame image with the previous frame image, thereby obtaining an image change area. In view of the need to improve coding efficiency and decoding quality, the present application then performs image segmentation and content-based classification on the image change area. This process carefully analyzes image features within the image change area, such as color, texture and shape, thereby obtaining more accurate change area image blocks and the corresponding content types of those image blocks (e.g., text, image, video, etc.). This processing not only helps to reduce the redundancy of the encoded data and improve encoding efficiency, but also allows the original image to be recovered more accurately during decoding, improving decoding quality.
And S2, determining the coding mode of the current frame image based on the size, the distribution and the content type of the image block of the change area within the coding and decoding capability range of the cloud host system.
Specifically, when determining the coding mode of the current frame image, the server side may consider a plurality of factors. Wherein:
(1) For the first frame image, considering that it serves as the starting frame of the video sequence and has no previous frame image available for reference, the application encodes the image in a global incremental coding mode. Global incremental coding means that the whole image area of the current frame is coded, not just the changed area, because for the first frame image the whole image is new and needs to be transmitted. Therefore, global incremental coding ensures that the first frame image can be completely and accurately coded and transmitted to the client, and provides a basis for the incremental coding of subsequent frames;
(2) For preset key frame images in the video sequence (such as I frames, i.e., intra-frame coding frames), considering that a key frame image needs to be decoded independently and serves as a reference frame for subsequent frames, it must contain the complete image information, so the server side encodes such images in a global incremental coding mode. This processing ensures high quality and high fidelity of the key frame images, and provides a solid foundation for accurate decoding and display of the subsequent frames.
(3) The server may also analyze the size of the image blocks of the changed region, and for larger image blocks, considering that the large image blocks may contain more image details or complex texture information, if block-based region delta coding is adopted, the coding efficiency may be reduced or the image quality may be damaged. Therefore, the server encodes the image using global delta encoding (i.e., global encoding for the large image block). For the case where there are multiple smaller image blocks, it is contemplated that these small image blocks may contain different types of image content, and that the requirements for coding efficiency and image quality are also different for each content type. Therefore, the server side performs region incremental coding in an adaptive hybrid coding mode according to the content type corresponding to each image block. Such a processing method can ensure that the most appropriate encoding method is selected for different types of image contents, thereby maximizing the encoding efficiency and the data compression ratio while maintaining the image quality. The content of each small image block is intelligently analyzed, and the optimal coding strategy is selected according to the content, so that the server can optimize the transmission of the video stream, reduce the bandwidth occupation and improve the user experience.
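To make the above decision logic concrete, the following is a minimal Python sketch of the mode-selection step; the threshold value, the frame flags, and the helper names are illustrative assumptions rather than parameters specified by the application.

```python
# Illustrative sketch of the mode-selection logic described above.
# The threshold, frame flags, and return labels are assumptions for illustration.

def select_encoding_mode(is_first_frame, is_key_frame, changed_blocks,
                         area_threshold=0.25):
    """Return 'global' or 'region' for the current frame.

    changed_blocks: list of dicts like {"area_ratio": float, "content": str}
    area_threshold: fraction of the frame above which a block counts as "large".
    """
    # First frames and preset key frames always carry a full picture.
    if is_first_frame or is_key_frame:
        return "global"

    # A single large change region is also encoded globally.
    if any(b["area_ratio"] >= area_threshold for b in changed_blocks):
        return "global"

    # Several small regions: encode each block separately with a codec
    # chosen per content type (adaptive hybrid, region incremental coding).
    return "region"
```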
And step S3, coding the current frame image based on the coding mode to obtain coded data.
Specifically, the server side performs the actual encoding operation on the current frame image according to the previously determined encoding mode (either global incremental encoding or region incremental encoding). If the current frame image is a key frame image or contains a larger change area image block, the server side encodes the whole frame image in a global incremental encoding mode, so as to ensure the integrity of the image information and high-quality transmission. For a non-key frame image that contains a plurality of smaller image blocks, the server side encodes each image block in a region incremental encoding mode according to the content type of each image block (such as text, image, video, etc.). After the encoding process, a bit stream containing the encoded data of the current frame image is obtained. The encoded data will be transmitted to the client for subsequent decoding and display operations.
And S4, carrying out encryption processing on the encoded data, transmitting the obtained encrypted data to a client, and displaying the current frame image after the client carries out corresponding decryption and decoding processing on the obtained encrypted data.
Specifically, the server side first encrypts the encoded data to ensure the security and privacy protection of the data during transmission. The encryption process may employ advanced encryption algorithms and techniques, such as a symmetric encryption algorithm (e.g., AES), an asymmetric encryption algorithm (e.g., RSA), or a hybrid encryption scheme, and the specific choice depends on security requirements, computing resources, and transmission efficiency. After encryption is completed, the server side transmits the obtained encrypted data to the client through the network. After receiving the encrypted data, the client performs corresponding decryption processing; the decryption process must use the same algorithm and key (or the corresponding key pair) used by the server side for encryption to ensure correct restoration of the data. After decryption is completed, the client obtains the original encoded data. The client then decodes the obtained encoded data; the decoding process is the inverse of the encoding process and restores the encoded data to the original image data using the same techniques as in encoding. Finally, the client displays the decoded image data and presents the current frame image on the user interface.
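As a hedged illustration of the encryption and decryption round trip (the application leaves the cipher choice open among AES, RSA, or a hybrid scheme), the sketch below uses the Fernet recipe from the Python `cryptography` package as a symmetric stand-in; how the shared key reaches the client is assumed to be handled out of band.

```python
# Illustration only: the application permits AES, RSA, or a hybrid scheme.
# Fernet (AES-based authenticated encryption) stands in for the symmetric case;
# distribution of the shared key to the client is assumed out of band.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # shared secret, distributed separately
cipher = Fernet(key)

encoded_data = b"...bitstream produced in step S3..."
encrypted = cipher.encrypt(encoded_data)       # server side, before transmission

# Client side, after receiving the payload:
restored = Fernet(key).decrypt(encrypted)
assert restored == encoded_data                # ready for decoding and display
```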
According to the intelligent coding and decoding method based on video content in the cloud desktop scene, the image difference and content identification technology can accurately identify the change area image blocks in the current frame image of the cloud host desktop, so that indiscriminate coding of the whole frame image is avoided and unnecessary coding and decoding consumption and data transmission are reduced. The most suitable coding mode is dynamically selected according to the size, distribution and content type of the change area image blocks, which further improves coding efficiency and transmission speed. Selecting the coding mode according to the content type ensures proper processing of different types of video content during transmission and maintains the stability and definition of the video quality. Encrypting the coded data effectively prevents the data from being stolen or tampered with during transmission, guaranteeing the safety of the cloud desktop system and the privacy of user data. After the encrypted data is transmitted to the client, the client performs corresponding decryption and decoding processing, so that end-to-end data protection is realized and the security of the system is further enhanced.
In one embodiment, in step S1, for a current frame image of a desktop of a cloud host system, determining a change region image block and a content type corresponding to the change region image block by using an image difference and content identification technology includes:
step S11, a previous frame image and a current frame image of a desktop of the cloud host system are acquired.
Specifically, this step acquires the basic data and the changed data for comparison. Under the Windows system, higher-performance image capture based on content change is realized through DXGI.
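The embodiment captures frames through DXGI on Windows; purely as an illustrative, language-neutral stand-in, the sketch below grabs two consecutive desktop frames with the cross-platform `mss` library, which is an assumption for illustration and not the DXGI path itself.

```python
# Stand-in for the DXGI capture path: grab consecutive desktop frames with mss.
import numpy as np
import mss

def grab_frame(sct, monitor_index=1):
    """Grab one desktop frame as an H x W x 4 BGRA numpy array."""
    shot = sct.grab(sct.monitors[monitor_index])
    return np.array(shot)

with mss.mss() as sct:
    previous_frame = grab_frame(sct)   # frame at time t-1
    current_frame = grab_frame(sct)    # frame at time t (inputs of step S11)
```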
Step S12, determining an image change area by pixel difference analysis based on the previous frame image and the current frame image.
Specifically, by comparing the pixel differences between the previous frame image and the current frame image, it is identified which regions in the current frame image have changed. That is, image difference analysis is performed based on the previous frame image and the current frame image to determine the change areas in the current frame image. The change areas include various changes caused by user operations, such as newly appearing windows, moving icons, and changed text.
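A minimal sketch of this pixel-difference analysis is given below; the per-pixel threshold and the 16x16 tile size are illustrative assumptions, not values prescribed by the application.

```python
# Minimal sketch of step S12: pixel differencing between two frames.
# The per-pixel threshold and the 16x16 tile size are illustrative assumptions.
import numpy as np

def changed_tiles(prev, curr, tile=16, threshold=8):
    """Return a boolean tile grid marking where the two frames differ."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16)).max(axis=-1)
    h, w = diff.shape
    grid = np.zeros((h // tile, w // tile), dtype=bool)
    for ty in range(grid.shape[0]):
        for tx in range(grid.shape[1]):
            block = diff[ty * tile:(ty + 1) * tile, tx * tile:(tx + 1) * tile]
            grid[ty, tx] = block.max() > threshold   # any notable pixel change
    return grid
```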
Step S13, identifying the target image area to be encoded based on the image change area.
And step S14, performing image segmentation and image content identification on the target image area to determine a change area image block and a content type corresponding to the change area image block.
Regarding steps S13-S14, it should be noted that the application determines which areas are of interest, i.e., the target image areas that need to be encoded, by performing feature analysis (such as color, texture and shape feature analysis) and judgment on the content of the image change areas. Then, an image segmentation algorithm (such as an edge-based image segmentation algorithm, a threshold-based image segmentation algorithm, etc.) is used to accurately segment the target image area, so as to extract the corresponding change area image blocks (it should be noted that the result may be one change area image block or a plurality of change area image blocks, depending on the complexity of the image change area). Then, each segmented change area image block is classified by using an image content recognition technology to determine the corresponding content type (such as a text area, an icon area, an image area, a video window and the like).
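The sketch below illustrates steps S13-S14 under stated assumptions: the tile-wise extraction reuses the `changed_tiles` grid from the previous sketch, and the colour-count heuristic is only an invented stand-in for the content-recognition step, which in practice would use a trained classifier or the feature analysis described above.

```python
# Illustrative sketch of steps S13-S14: cut changed tiles into rectangular
# blocks and assign a coarse content type. The colour-count heuristic is an
# assumption standing in for a real content-recognition model.
import numpy as np

def classify_block(block):
    """Very rough content-type guess for one change-region image block."""
    pixels = block.reshape(-1, block.shape[-1])
    unique_colours = len(np.unique(pixels, axis=0))
    if unique_colours < 64:
        return "text_office"          # flat colours, sharp edges
    if unique_colours < 2048:
        return "design_drawing"       # lines and gradients
    return "rendering_entertainment"  # photographic / rendered content

def extract_blocks(curr, grid, tile=16):
    """Yield (rect, content_type) for every changed tile in the grid."""
    for ty, tx in zip(*np.nonzero(grid)):
        x, y = tx * tile, ty * tile
        block = curr[y:y + tile, x:x + tile]
        yield (x, y, tile, tile), classify_block(block)
```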
In one embodiment, the encoding mode determined in step S2 based on the size, distribution and content type of the change area image blocks includes: a first encoding mode or a second encoding mode of performing global incremental encoding, in a streaming encoding mode or a picture encoding mode according to the application scene requirement, on a first frame image, a preset key frame image, or a current frame image having a change area image block whose size is greater than or equal to a preset threshold value; and a third encoding mode of performing region incremental encoding, in an adaptive hybrid encoding mode according to the content type corresponding to each image block in the image, on a current frame image having a plurality of second change area image blocks whose sizes are smaller than the preset threshold value, where the hybrid encoding mode includes at least one of a PNG lossy encoding mode, an LZ4 lossless encoding mode, an H265 lossless encoding mode, an H264 lossy encoding mode, and a JPEG lossy encoding mode.
Specifically, first, for a first frame image, a preset key frame image, or a current frame image with a change area image block whose size is greater than or equal to the preset threshold value, the server side performs global incremental encoding in a streaming encoding mode. This coding mode is suitable for application scenes that need to transmit and play video streams in real time, such as remote desktop sharing, video conferences and the like. Through streaming encoding, the server side can send the coded data to the client in real time, and the client can receive, decode and display the data, thereby reducing delay and stuttering. Secondly, for the same types of images (i.e., the first frame image, key frame images, or frame images containing a large change area), the server side may instead choose to perform global incremental encoding in a picture encoding mode. This coding mode focuses more on maintaining image quality and improving compression efficiency, and is suitable for application scenes with higher requirements on image quality and lower requirements on real-time performance, such as image storage, image transmission and the like. Finally, for a current frame image with a plurality of second change area image blocks whose sizes are smaller than the preset threshold value, the server side performs region incremental encoding in an adaptive hybrid encoding mode according to the content type corresponding to each image block. This coding mode combines the advantages of multiple coding technologies, and the most suitable coding method can be selected according to the content type of each image block.
In one embodiment, in step S2, for a current frame image having a plurality of second change area image blocks whose sizes are smaller than a preset threshold value: if the content type corresponding to an image block is text office content, region incremental encoding is performed on the image block in a PNG lossy encoding mode; if the content type corresponding to an image block is design drawing content and the cloud host system supports lossless hardware acceleration, region incremental encoding is performed on the image block in an H265 lossless encoding mode; if the content type corresponding to an image block is design drawing content and the cloud host system does not support lossless hardware acceleration, region incremental encoding is performed on the image block in an LZ4 lossless encoding mode; if the content type corresponding to an image block is rendering entertainment content and the cloud host system supports lossy hardware acceleration, region incremental encoding is performed on the image block in an H264 lossy encoding mode; and if the content type corresponding to an image block is rendering entertainment content and the cloud host system does not support lossy hardware acceleration, region incremental encoding is performed on the image block in a JPEG lossy encoding mode.
Specifically, for an image block whose content type is text office content, since text information needs to maintain high definition and readability, the server side can select the PNG lossy coding form to perform region incremental coding. The PNG format supports lossless compression and provides transparency support, which is very effective for content such as text and simple graphics that needs to keep edges clear; even when used in a lossy configuration, better compression can be achieved by adjusting the compression level while ensuring text readability. For an image block whose content type is design drawing content, lossless compression can better preserve the details, since a design drawing typically contains complex lines, shapes and color gradations. If the cloud host system supports lossless hardware acceleration, the server side can select the H265 lossless coding form to perform region incremental coding. H265 (HEVC) is an advanced video compression standard that supports efficient lossless compression; while maintaining or improving video quality, it significantly reduces the bit rate of the video, thereby reducing storage space and transmission bandwidth requirements. With hardware acceleration, H265 can realize efficient lossless compression and real-time processing, which is very suitable for the transmission of design drawing content. If the cloud host system does not support lossless hardware acceleration, the server side can select the LZ4 lossless coding form to perform region incremental coding. LZ4 is an efficient compression algorithm that achieves fast compression and decompression speeds while maintaining data integrity, which is also very suitable for the transmission of design drawing content. For an image block whose content type is rendering entertainment content, the server side selects the coding form according to whether the cloud host system supports lossy hardware acceleration. If the cloud host system supports lossy hardware acceleration, the server side selects the H264 lossy coding form to perform region incremental coding. H264 is an advanced video compression standard that uses lossy compression to reduce the amount of data by removing redundant information in the image and exploiting temporal correlation. With hardware acceleration, H264 can realize efficient compression and real-time processing, which is very suitable for the transmission of rendering entertainment content. If the cloud host system does not support lossy hardware acceleration, the server side can select the JPEG lossy coding mode to perform region incremental coding. JPEG is a widely used image compression format that also uses lossy compression to reduce the amount of data; although some image details may be lost during JPEG compression, a better compression effect can be achieved by adjusting the compression quality parameters on the premise of ensuring acceptable image quality.
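The codec-selection rules above can be restated directly as a small lookup function; the sketch below does so in Python, with the capability flag names being illustrative assumptions.

```python
# Direct restatement of the codec-selection rules above as a lookup function.
# The capability flag names are assumptions for illustration.

def pick_codec(content_type, lossless_hw=False, lossy_hw=False):
    if content_type == "text_office":
        return "PNG"                      # PNG coding for text/office blocks
    if content_type == "design_drawing":
        return "H265-lossless" if lossless_hw else "LZ4"
    if content_type == "rendering_entertainment":
        return "H264" if lossy_hw else "JPEG"
    raise ValueError(f"unknown content type: {content_type}")
```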
In one embodiment, for a current frame image having a plurality of second change area image blocks whose sizes are smaller than a preset threshold value, if the plurality of image blocks are spatially adjacent and have identical content types, the plurality of image blocks are combined to obtain a new merged change area, and for the current frame image having the merged change area, the encoding mode determined in step S2 further includes: a fourth coding mode of performing global incremental coding on the current frame image by using the first coding mode or the second coding mode according to the application scene requirement, when it is determined that the size of the merged change area is greater than or equal to the preset threshold value; and a fifth coding mode of performing region incremental coding on the current frame image by using the third coding mode, when it is determined that there are a plurality of merged change areas whose sizes are smaller than the preset threshold value.
In one embodiment, in step S3, the encoding the current frame image based on the encoding mode to obtain encoded data includes:
Step S31, for the current frame image adopting the global increment coding mode, adopting a global increment coder to code the current frame image to obtain corresponding coding data, wherein the coding data is attached with image coding mode information.
In one embodiment, in step S3, the encoding the current frame image based on the encoding mode to obtain encoded data includes:
Step S31, for the current frame image adopting the area increment coding mode, each change area image block in the current frame image is independently coded, and a corresponding coding segment is generated, wherein the coding segment is attached with coding mode information used by the corresponding change area image block and Rect information representing the position and the size of the image block area.
Specifically, referring to fig. 2, for a current frame image adopting a region delta coding manner, a server side first detects a difference between the current frame image and a previous frame image, and identifies all the changed region image blocks. Then, an independent encoding process is performed for each of the change region image blocks. In the encoding process, an encoding segment corresponding to the image block is generated, and in addition to the actual encoding data, encoding mode information used by the image block and Rect information (generally including x and y coordinates of the upper left corner and height and width information of the image block) representing the position and size of the image block are attached to the encoding segment. This information is necessary for subsequent decoding, thereby enabling correct decoding and reconstruction of the image.
Step S32, for each coding segment, carrying out serialization processing based on the coding segment, the corresponding coding mode information and the Rect information to obtain a corresponding structured data segment.
Specifically, the server organizes the encoded data, the encoding mode information, and the Rect information according to a predetermined format, and converts the organized data into a format that is easy to store and transmit, such as the binary format illustrated in fig. 2. The purpose of such serialization is to keep the structure and content of the data unchanged during storage or transmission. After the serialization process, each encoded segment generates a corresponding structured data segment. These structured data segments contain all necessary decoding information, thereby facilitating subsequent decoding resolution and reconstruction of the image.
Step S33, combining each of the structured data segments to obtain corresponding encoded data.
Specifically, referring to fig. 2, the server side integrates all the structured data segments, and the integrated data is output as the final encoded data. The encoded data contains all the change area information of the current frame image and is organized in a structured manner, so that the decoder can parse and reconstruct the image.
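To illustrate steps S31-S33 end to end, the sketch below packs each encoded segment together with its coding-mode tag and Rect information into a binary record and concatenates the records; the field layout (one-byte mode id, four 16-bit Rect fields, 32-bit payload length) is an illustrative assumption, not the wire format of fig. 2.

```python
# Hedged sketch of steps S31-S33: serialize each coding segment with its mode
# and Rect information, then combine the structured data segments.
# The field layout is an illustrative assumption, not the actual wire format.
import struct

MODE_IDS = {"PNG": 0, "LZ4": 1, "H265-lossless": 2, "H264": 3, "JPEG": 4}

def pack_segment(mode, rect, payload):
    """Structured data segment: mode id, Rect (x, y, w, h), length, payload."""
    x, y, w, h = rect
    header = struct.pack("<B4HI", MODE_IDS[mode], x, y, w, h, len(payload))
    return header + payload

def pack_frame(segments):
    """Concatenate all structured data segments into the frame's coded data."""
    return b"".join(pack_segment(m, r, p) for m, r, p in segments)
```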
In one embodiment, in step S4, during displaying of the current frame image, the method includes: if the client decodes and recovers a complete image consistent with the original picture, directly taking the complete image as the currently displayed frame image and caching the complete image for subsequent use; and if the client decodes and recovers a plurality of change area image blocks, merging the change area image blocks with the previous frame image according to the position information of the change area image blocks so as to update the currently displayed frame image.
Specifically, if the client decodes and recovers the complete image consistent with the original picture, which indicates that the complete image contains complete picture information, the client does not need to perform image merging operation, and the decoded complete image is taken as the currently displayed frame image. And the client also caches the complete image for subsequent use. It should be noted that, the purpose of buffering is to reduce the need of repeated decoding and improve the playing efficiency. When the subsequent frame image again requires this image as a reference frame, it can be read directly from the buffer without re-decoding.
Specifically, if the client decodes and recovers to obtain a plurality of change area image blocks, at this time, the image blocks are accurately placed at corresponding positions of the previous frame image according to the position information (i.e., rect information) of each change area image block obtained by decoding, and after the merging of all the change area image blocks is completed, the client obtains an updated current frame image, wherein the updated current frame image contains all the latest change information and is ready to be displayed.
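A minimal sketch of this client-side merge is shown below: each decoded change area block is pasted back into the cached previous frame at the position given by its Rect information. The array shapes and the copy-then-update pattern are illustrative assumptions.

```python
# Minimal sketch of the client-side merge: paste each decoded change-region
# block back into the cached previous frame using its Rect information.
import numpy as np

def merge_blocks(previous_frame, decoded_blocks):
    """decoded_blocks: iterable of ((x, y, w, h), block_array) pairs."""
    frame = previous_frame.copy()          # keep the cached frame intact
    for (x, y, w, h), block in decoded_blocks:
        frame[y:y + h, x:x + w] = block    # Rect gives the paste position
    return frame                           # becomes the displayed current frame
```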
Referring to fig. 3, the intelligent encoding and decoding system based on video content in cloud desktop scene disclosed by the application comprises a server side and a client side, wherein:
The server side is used for determining a change area image block and a content type corresponding to the change area image block by using an image difference and content identification technology for a current frame image of a desktop of a cloud host system, determining an encoding mode of the current frame image based on the size, distribution and content type of the change area image blocks within the encoding and decoding capability range of the cloud host system, encoding the current frame image based on the encoding mode to obtain encoded data, and encrypting the encoded data and transmitting the obtained encrypted data to the client.
The client is used for correspondingly decrypting the obtained encrypted data and displaying the current frame image after decoding.
In one embodiment, the server side is further configured to implement the steps described in any one of the foregoing method embodiments, and the present application is not limited thereto.
According to the intelligent coding and decoding system based on video content in the cloud desktop scene, the image difference and content identification technology can accurately identify the change area image blocks in the current frame image of the cloud host system desktop, so that indiscriminate coding of the whole frame image is avoided and unnecessary coding consumption and data transmission are reduced. The most suitable coding mode is dynamically selected according to the size, distribution and content type of the change area image blocks, which further improves coding efficiency and transmission speed. Selecting the coding mode according to the content type ensures proper processing of different types of video content during transmission and maintains the stability and definition of the video quality. Encrypting the coded data effectively prevents the data from being stolen or tampered with during transmission, guaranteeing the safety of the cloud desktop system and the privacy of user data. After the encrypted data is transmitted to the client, the client performs corresponding decryption and decoding processing, so that end-to-end data protection is realized and the security of the system is further enhanced.
It should be noted that the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.