
CN119653089A - A video content-based intelligent encoding and decoding method and system in a cloud desktop scenario - Google Patents

Info

Publication number
CN119653089A
Authority
CN
China
Prior art keywords
image
encoding
frame image
current frame
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510177146.1A
Other languages
Chinese (zh)
Inventor
郭猛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Zeta Cloud Technology Co ltd
Original Assignee
Wuhan Zeta Cloud Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Zeta Cloud Technology Co ltd
Priority to CN202510177146.1A
Publication of CN119653089A

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention discloses an intelligent encoding and decoding method and system based on video content in a cloud desktop scenario. The method comprises: for a current frame image of a cloud host system desktop, determining a changed area image block and a content type corresponding to the changed area image block by using image difference and content recognition technology; determining an encoding method for the current frame image based on the size, distribution, and content type of the changed area image block within the encoding and decoding capability of the cloud host system; encoding the current frame image based on the encoding method to obtain encoded data; encrypting the encoded data and transmitting the obtained encrypted data to a client, and the client performs corresponding decryption and decoding processing on the obtained encrypted data to obtain an updated complete frame image.

Description

Intelligent encoding and decoding method and system based on video content in cloud desktop scene
Technical Field
The invention relates to the fields of cloud desktop applications in cloud computing, video encoding and decoding, and encrypted communication, and in particular to an intelligent encoding and decoding method and system based on video content in a cloud desktop scenario.
Background
Cloud desktop is a desktop virtualization technology based on cloud computing. It migrates the traditional desktop computing environment to a cloud server, so that a user can access applications and files on the cloud desktop through various terminal devices (such as a PC, a tablet, or a mobile phone). Cloud desktop technology offers centralized resource management, elastic scaling, high availability, and security, and is widely used in enterprise office work, remote education, graphic design, and similar fields. Video codec technology refers to technology that compresses and decompresses video data. In cloud desktop scenarios, transmitting and presenting video content requires efficient codec techniques to reduce the amount of data, improve transmission efficiency, and ensure video quality. Common video coding algorithms include H.264, H.265 (HEVC), VP8, VP9, and AV1, which compress video data using different compression strategies and optimizations. However, conventional video codec technology still faces many challenges in cloud desktop scenarios. For example, although algorithms such as H.264 and H.265 have significantly improved compression efficiency, they often struggle to achieve sufficiently low latency and a sufficiently high compression ratio while guaranteeing video quality, given the complex and varied video content of a cloud desktop. There is therefore a need for an intelligent, content-based encoding and decoding technology that makes full use of cloud computing and artificial intelligence, meets the real-time and compression-efficiency challenges of the cloud desktop scenario, and gives users a smoother, clearer, and more intelligent cloud desktop experience.
Disclosure of Invention
To meet challenges such as real-time performance and efficient compression in a cloud desktop scenario, the invention aims to provide an intelligent encoding and decoding method and system based on video content in a cloud desktop scenario. The technical solution adopted is as follows:
In a first aspect, the application discloses an intelligent encoding and decoding method based on video content in a cloud desktop scenario. The method is applied to a server and comprises the following steps:
S1, for a current frame image of a cloud host system desktop, determining change region image blocks and the content type corresponding to each change region image block by means of image difference and content recognition technology;
S2, within the encoding and decoding capability of the cloud host system, determining the encoding mode of the current frame image based on the size, distribution, and content type of the change region image blocks;
S3, encoding the current frame image based on the encoding mode to obtain encoded data;
S4, encrypting the encoded data and transmitting the resulting encrypted data to a client, where the client decrypts and decodes the received encrypted data and then displays the current frame image.
Further, in step S1, determining, for the current frame image of the cloud host system desktop, the change region image blocks and their corresponding content types by means of image difference and content recognition technology comprises:
S11, acquiring a previous frame image and the current frame image of the cloud host system desktop;
S12, determining an image change region through pixel difference analysis based on the previous frame image and the current frame image;
S13, identifying a target image region to be encoded based on the image change region;
S14, performing image segmentation and image content recognition on the target image region to determine the change region image blocks and the content type corresponding to each change region image block.
Further, in step S2, the encoding mode determined based on the size, distribution, and content type of the change region image blocks includes:
A first encoding mode or a second encoding mode, in which global incremental encoding is performed, in a streaming encoding form or a picture encoding form respectively and as required by the application scenario, on a first frame image, a preset key frame image, or a current frame image having a change region image block whose size is greater than or equal to a preset threshold;
A third encoding mode, in which, for a current frame image having a plurality of second change region image blocks whose sizes are smaller than the preset threshold, region incremental encoding is performed in an adaptive hybrid encoding form according to the content type corresponding to each image block in the image, wherein the hybrid encoding form includes at least one of a PNG lossy encoding form, an LZ4 lossless encoding form, an H265 lossless encoding form, an H264 lossy encoding form, and a JPEG lossy encoding form.
Further, in step S2, for a current frame image having a plurality of second change region image blocks whose sizes are smaller than the preset threshold: if the content type corresponding to an image block is text office content, region incremental encoding is performed on the image block in the PNG lossy encoding form; if the content type corresponding to an image block is design drawing content and the cloud host system supports lossless hardware acceleration, region incremental encoding is performed on the image block in the H265 lossless encoding form; if the content type corresponding to an image block is design drawing content and the cloud host system does not support lossless hardware acceleration, region incremental encoding is performed on the image block in the LZ4 lossless encoding form; if the content type corresponding to an image block is rendering entertainment content and the cloud host system supports lossy hardware acceleration, region incremental encoding is performed on the image block in the H264 lossy encoding form; and if the content type corresponding to an image block is rendering entertainment content and the cloud host system does not support lossy hardware acceleration, region incremental encoding is performed on the image block in the JPEG lossy encoding form.
Further, for a current frame image having a plurality of second change region image blocks whose sizes are smaller than the preset threshold, if several of the image blocks are spatially adjacent and have the same content type, those image blocks are merged to obtain a new merged change region, and for a current frame image having a merged change region, the encoding mode determined in step S2 further includes:
A fourth encoding mode, in which, when the size of the merged change region is determined to be greater than or equal to the preset threshold, global incremental encoding is performed on the current frame image using the first encoding mode or the second encoding mode as required by the application scenario;
A fifth encoding mode, in which, when a plurality of merged change regions whose sizes are smaller than the preset threshold are determined to exist, region incremental encoding is performed on the current frame image using the third encoding mode.
Further, in step S3, encoding the current frame image based on the encoding mode to obtain encoded data comprises:
S31, for a current frame image that uses global incremental encoding, encoding the current frame image with a global incremental encoder to obtain the corresponding encoded data, wherein image encoding mode information is attached to the encoded data.
Further, in step S3, encoding the current frame image based on the encoding mode to obtain encoded data comprises:
S31, for a current frame image that uses region incremental encoding, independently encoding each change region image block in the current frame image to generate the encoded segments corresponding to the current frame image, wherein each encoded segment is accompanied by the encoding mode information used for the corresponding change region image block and by Rect information representing the position and size of the image block's region;
S32, for each encoded segment, serializing the encoded segment together with its encoding mode information and Rect information to obtain a corresponding structured data segment;
S33, combining the structured data segments to obtain the corresponding encoded data.
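Steps S31 to S33 can be illustrated with a minimal Python sketch. The source does not specify a wire format, so the 13-byte header layout below (`HEADER`), the numeric mode ids, and the function names `pack_segments` and `unpack_segments` are all hypothetical choices made only for illustration.

```python
import struct
from typing import List, Tuple

# Hypothetical header layout (an assumption, not defined by the source):
# mode id (1 byte), Rect x, y, width, height (2 bytes each), payload length (4 bytes).
HEADER = struct.Struct(">BHHHHI")

Rect = Tuple[int, int, int, int]        # (x, y, width, height)
Segment = Tuple[int, Rect, bytes]       # (mode id, Rect, encoded payload)


def pack_segments(segments: List[Segment]) -> bytes:
    """Serialize each segment with its mode and Rect, then concatenate (S32, S33)."""
    parts = []
    for mode, rect, payload in segments:
        parts.append(HEADER.pack(mode, *rect, len(payload)) + payload)
    return b"".join(parts)


def unpack_segments(data: bytes) -> List[Segment]:
    """Inverse of pack_segments, as a client might apply after decryption."""
    segments = []
    offset = 0
    while offset < len(data):
        mode, x, y, w, h, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        segments.append((mode, (x, y, w, h), data[offset:offset + length]))
        offset += length
    return segments


# Two illustrative segments; mode ids 1 and 2 are placeholders.
example = [(1, (0, 0, 64, 32), b"text-block"), (2, (100, 200, 16, 16), b"\x00\x01")]
encoded = pack_segments(example)
decoded = unpack_segments(encoded)
```

Carrying the payload length in each header lets the receiver split the combined data back into segments without any delimiter, which is one simple way to keep every segment self-describing.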
Further, in step S4, displaying the current frame image comprises:
If the client decodes and recovers a complete image consistent with the original picture, directly using the complete image as the currently displayed frame image and caching it for subsequent use;
If the client decodes and recovers a plurality of change region image blocks, compositing the change region image blocks onto the previous frame image according to their position information, so as to update the currently displayed frame image.
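The second case, updating the cached previous frame with decoded change region image blocks, can be sketched as follows. This is a minimal illustration, assuming frames are held as lists of pixel rows and that each decoded block arrives with a Rect of the form (x, y, width, height); the name `apply_blocks` is hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]                 # rows of pixel values (illustrative representation)
Rect = Tuple[int, int, int, int]        # (x, y, width, height)


def apply_blocks(previous: Frame, blocks: List[Tuple[Rect, Frame]]) -> Frame:
    """Copy the cached previous frame and overwrite each changed region in place."""
    frame = [row[:] for row in previous]
    for (x, y, w, h), pixels in blocks:
        for dy in range(h):
            # Replace exactly w pixels of row y + dy, starting at column x.
            frame[y + dy][x:x + w] = pixels[dy][:w]
    return frame


cached = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
updated = apply_blocks(cached, [((1, 1, 2, 2), [[7, 7], [7, 7]])])
```

The function copies the cached frame rather than mutating it, so the previous frame stays available until the client decides to replace its cache with the updated one.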
In a second aspect, the application discloses an intelligent encoding and decoding system based on video content in a cloud desktop scenario. The system comprises a server and a client, wherein:
The server is configured to: determine, for a current frame image of a cloud host system desktop, change region image blocks and the content type corresponding to each change region image block by means of image difference and content recognition technology; determine, within the encoding and decoding capability of the cloud host system, the encoding mode of the current frame image based on the size, distribution, and content type of the change region image blocks; encode the current frame image based on the encoding mode to obtain encoded data; and encrypt the encoded data and transmit the resulting encrypted data to the client;
The client is configured to decrypt and decode the received encrypted data and then display the current frame image.
The invention has the following beneficial effects:
1) Image difference and content recognition technology accurately identifies the change region image blocks in the current frame image of the cloud host system desktop, which avoids indiscriminately encoding the whole frame image and thus reduces unnecessary encoding, decoding, and data transmission;
2) The most suitable encoding mode is selected dynamically according to the size, distribution, and content type of the change region image blocks, which further improves encoding efficiency and transmission speed;
3) The encoded data is encrypted, which effectively prevents the data from being stolen or tampered with during transmission and protects the security of the cloud desktop system and the privacy of user data. After the encrypted data is transmitted to the client, the client performs the corresponding decryption and decoding, which provides end-to-end data protection and further enhances the security of the system.
Drawings
To illustrate the embodiments of the invention or the technical solutions and advantages of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a method flowchart of an intelligent encoding and decoding method based on video content in a cloud desktop scene according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the overall flow of region incremental encoding using the adaptive hybrid encoding form;
Fig. 3 is a system structure diagram of an intelligent encoding and decoding system based on video content in a cloud desktop scene according to an embodiment of the present invention.
Detailed Description
To further describe the technical means adopted by the invention to achieve its intended purpose, and their effects, the specific implementation, structure, features, and effects of the intelligent encoding and decoding method and system based on video content in a cloud desktop scenario are described in detail below with reference to the accompanying drawings and the preferred embodiments. In the following description, references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The specific scheme of the intelligent encoding and decoding method and system based on video content in a cloud desktop scenario is described below.
Referring to Fig. 1, which shows a flowchart of an intelligent encoding and decoding method based on video content in a cloud desktop scenario according to an embodiment of the present invention, the method includes:
Step S1, determining a change area image block and a content type corresponding to the change area image block according to an image difference and content identification technology aiming at a current frame image of a desktop of a cloud host system.
Specifically, the application compares the current frame image with the previous frame image to obtain an image change region. To improve encoding efficiency and decoding quality, the application then segments the image change region and classifies it by image content. This process analyzes image features within the image change region, such as color, texture, and shape, to obtain more accurate change region image blocks and the content type corresponding to each of those image blocks (e.g., text, image, video). This processing not only reduces redundancy in the encoded data and improves encoding efficiency, but also allows the original image to be recovered more accurately during decoding, improving decoding quality.
Step S2, determining the encoding mode of the current frame image based on the size, distribution, and content type of the change region image blocks, within the encoding and decoding capability of the cloud host system.
Specifically, when determining the encoding mode of the current frame image, the server may consider several factors:
(1) The first frame image is the starting frame of the video sequence, so there is no previous frame image to compare it against. The application therefore encodes it using global incremental encoding. Global incremental encoding means that the whole image area of the current frame is encoded, not just the changed region, because for the first frame the whole image is new and must be transmitted. Global incremental encoding thus ensures that the first frame image is encoded and transmitted to the client completely and accurately, and provides the basis for incremental encoding of subsequent frames;
(2) A preset key frame image in the video sequence (such as an I frame, i.e., an intra-coded frame) must be independently decodable and serves as a reference frame for subsequent frames, so it needs to contain the complete image information. The server therefore encodes such images using global incremental encoding. This preserves the quality and fidelity of the key frame image and provides a solid foundation for accurate decoding and display of subsequent frames;
(3) The server may also analyze the size of the change region image blocks. A larger image block may contain more image detail or complex texture information, and block-based region incremental encoding could reduce encoding efficiency or damage image quality. The server therefore encodes such an image using global incremental encoding (i.e., global encoding for the large image block). Where there are multiple smaller image blocks, these blocks may contain different types of image content, and each content type has different requirements for encoding efficiency and image quality. The server therefore performs region incremental encoding in an adaptive hybrid encoding form according to the content type corresponding to each image block. This ensures that a suitable encoding form is selected for each type of image content, which improves encoding efficiency and the data compression ratio while maintaining image quality. By analyzing the content of each small image block and selecting the encoding strategy accordingly, the server can optimize the transmission of the video stream, reduce bandwidth usage, and improve the user experience.
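The three factors above amount to a small decision rule, sketched here in Python. This is only an illustration: the source does not say how block "size" is measured, so using the pixel area of each Rect is an assumption, and the function name `choose_frame_mode` and the returned labels are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def choose_frame_mode(is_first_frame: bool, is_key_frame: bool,
                      blocks: List[Rect], area_threshold: int) -> str:
    """Return "global" or "region" for the current frame.

    Assumption: a block's size is its pixel area (width * height).
    """
    # Factors (1) and (2): first frames and preset key frames are encoded whole.
    if is_first_frame or is_key_frame:
        return "global"
    # Factor (3): one sufficiently large change region triggers global encoding.
    if any(w * h >= area_threshold for _, _, w, h in blocks):
        return "global"
    # Otherwise only the (possibly empty) set of small blocks is encoded.
    return "region"


mode_first = choose_frame_mode(True, False, [], 10_000)
mode_large = choose_frame_mode(False, False, [(0, 0, 200, 100)], 10_000)
mode_small = choose_frame_mode(False, False, [(0, 0, 20, 10), (50, 50, 8, 8)], 10_000)
```

The threshold value of 10,000 pixels is a placeholder; in practice it would be tuned to the codec and hardware in use.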
Step S3, encoding the current frame image based on the encoding mode to obtain encoded data.
Specifically, the server performs the actual encoding of the current frame image according to the previously determined encoding mode (global incremental encoding or region incremental encoding). If the current frame image is determined to be a key frame image or to contain a larger image block, the server may encode the whole frame image using global incremental encoding, ensuring the integrity of the image information and high-quality transmission. For a non-key frame image that contains a plurality of smaller image blocks, the server encodes each image block using region incremental encoding according to the block's content type (such as text, image, or video). The encoding process yields a bit stream containing the encoded data of the current frame image, which is transmitted to the client for subsequent decoding and display.
Step S4, encrypting the encoded data and transmitting the resulting encrypted data to a client, where the client decrypts and decodes the received encrypted data and then displays the current frame image.
Specifically, the server first encrypts the encoded data to ensure security and privacy protection during transmission. The encryption may use a symmetric encryption algorithm (e.g., AES), an asymmetric encryption algorithm (e.g., RSA), or a hybrid encryption scheme; the specific choice depends on security requirements, computing resources, and transmission efficiency. After encryption, the server transmits the encrypted data to the client over the network. On receiving the encrypted data, the client decrypts it. Decryption must use the algorithm matching the server's encryption, together with the same key (for symmetric encryption) or the corresponding key of the key pair (for asymmetric encryption), so that the data is restored correctly. Decryption yields the original encoded data, which the client then decodes. Decoding is the inverse of encoding: it restores the encoded data to image data using the codec that was used for encoding. Finally, the client displays the decoded image data, presenting the current frame image on the user interface.
With the intelligent encoding and decoding method based on video content in a cloud desktop scenario described above, image difference and content recognition technology accurately identifies the change region image blocks in the current frame image of the cloud host desktop, which avoids indiscriminately encoding the whole frame image and thus reduces unnecessary encoding, decoding, and data transmission. Dynamically selecting the most suitable encoding mode according to the size, distribution, and content type of the change region image blocks further improves encoding efficiency and transmission speed. Selecting the encoding mode by content type ensures that different types of video content are handled appropriately during transmission, keeping video quality stable and clear. Encrypting the encoded data effectively prevents the data from being stolen or tampered with during transmission and protects the security of the cloud desktop system and the privacy of user data. After the encrypted data is transmitted to the client, the client performs the corresponding decryption and decoding, which provides end-to-end data protection and further enhances the security of the system.
In one embodiment, in step S1, determining, for the current frame image of the cloud host system desktop, the change region image blocks and their corresponding content types by means of image difference and content recognition technology includes:
Step S11, acquiring a previous frame image and the current frame image of the cloud host system desktop.
Specifically, this step acquires the baseline data and the changed data to be compared. On Windows systems, higher-performance screen capture driven by content changes is implemented with DXGI.
Step S12, determining an image change region by pixel difference analysis based on the previous frame image and the current frame image.
Specifically, the pixel differences between the previous frame image and the current frame image are compared to identify which regions of the current frame image have changed. In other words, image difference analysis is performed on the previous frame image and the current frame image to determine the change region in the current frame image. Change regions include the various changes caused by user operations, such as newly opened windows, moved icons, and changed text.
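The pixel difference analysis of step S12 can be sketched in Python as a tile-by-tile comparison of two frames. This is a simplified illustration rather than the method of the application: the fixed tile grid, the list-of-rows frame representation, and the name `changed_tiles` are all assumptions.

```python
from typing import List, Tuple

Frame = List[List[int]]                 # rows of pixel values (illustrative representation)
Rect = Tuple[int, int, int, int]        # (x, y, width, height)


def changed_tiles(prev: Frame, curr: Frame, tile: int = 2) -> List[Rect]:
    """Return the tiles of `curr` that differ from `prev` in at least one pixel."""
    height, width = len(curr), len(curr[0])
    rects = []
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            h = min(tile, height - ty)   # clip tiles at the bottom edge
            w = min(tile, width - tx)    # clip tiles at the right edge
            differs = any(
                prev[y][tx:tx + w] != curr[y][tx:tx + w]
                for y in range(ty, ty + h)
            )
            if differs:
                rects.append((tx, ty, w, h))
    return rects


prev_frame = [[0] * 4 for _ in range(4)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[3][3] = 255                   # simulate one changed pixel in the bottom-right corner
dirty = changed_tiles(prev_frame, curr_frame)
```

A real implementation would compare frames captured from the desktop and would likely use a larger tile size; the 2-pixel tile here only keeps the example small.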
Step S13, identifying the target image region to be encoded based on the image change region.
Step S14, performing image segmentation and image content recognition on the target image region to determine the change region image blocks and the content type corresponding to each change region image block.
Regarding steps S13 and S14, the application performs feature analysis (such as color, texture, and shape analysis) on the content of the image change region to determine which regions are of interest, that is, which are the target image regions that need to be encoded. An image segmentation algorithm (such as an edge-based or threshold-based segmentation algorithm) is then used to segment the target image region accurately and extract the corresponding change region image blocks (depending on the complexity of the image change region, the result may be a single change region image block or several). Each segmented change region image block is then classified using image content recognition technology to determine its content type (such as text region, icon region, image region, or video window).
In one embodiment, the encoding mode determined in step S2 based on the size, distribution, and content type of the change region image blocks includes: a first encoding mode or a second encoding mode, in which global incremental encoding is performed, in a streaming encoding form or a picture encoding form respectively and as required by the application scenario, on a first frame image, a preset key frame image, or a current frame image having a change region image block whose size is greater than or equal to a preset threshold; and a third encoding mode, in which, for a current frame image having a plurality of second change region image blocks whose sizes are smaller than the preset threshold, region incremental encoding is performed in an adaptive hybrid encoding form according to the content type corresponding to each image block in the image, where the hybrid encoding form includes at least one of a PNG lossy encoding form, an LZ4 lossless encoding form, an H265 lossless encoding form, an H264 lossy encoding form, and a JPEG lossy encoding form.
Specifically, first, for a first frame image, a preset key frame image, or a current frame image having a change region image block whose size is greater than or equal to the preset threshold, the server may perform global incremental encoding in the streaming encoding form. This form suits application scenarios that need to transmit and play video streams in real time, such as remote desktop sharing and video conferencing. With streaming encoding, the server sends the encoded data to the client in real time, and the client receives and decodes it for display, which reduces delay and stuttering. Second, for the same kinds of image (i.e., the first frame image, a key frame image, or a frame image containing a large change region), the server may instead perform global incremental encoding in the picture encoding form. This form focuses more on preserving image quality and improving compression efficiency, and suits application scenarios with higher image quality requirements and lower real-time requirements, such as image storage and image transmission. Finally, for a current frame image having a plurality of second change region image blocks whose sizes are smaller than the preset threshold, the server performs region incremental encoding in the adaptive hybrid encoding form according to the content type corresponding to each image block. This form combines the advantages of several encoding technologies, so that the most suitable encoding method can be selected for the content type of each image block.
In one embodiment, in step S2, for a current frame image having a plurality of second change region image blocks whose sizes are smaller than the preset threshold: if the content type corresponding to an image block is text office content, region incremental encoding is performed on the image block in the PNG lossy encoding form; if the content type corresponding to an image block is design drawing content and the cloud host system supports lossless hardware acceleration, region incremental encoding is performed on the image block in the H265 lossless encoding form; if the content type corresponding to an image block is design drawing content and the cloud host system does not support lossless hardware acceleration, region incremental encoding is performed on the image block in the LZ4 lossless encoding form; if the content type corresponding to an image block is rendering entertainment content and the cloud host system supports lossy hardware acceleration, region incremental encoding is performed on the image block in the H264 lossy encoding form; and if the content type corresponding to an image block is rendering entertainment content and the cloud host system does not support lossy hardware acceleration, region incremental encoding is performed on the image block in the JPEG lossy encoding form.
Specifically, for an image block with a content type of text office content, since the text information needs to maintain high definition and readability, the server side can select a PNG lossy coding form to perform region incremental coding. Among them, the PNG format supports lossless compression while providing transparency support, which is very effective for contents such as text and simple graphics, which need to keep edges clear. Although lossy coding, by adjusting the compression level, better compression can be achieved while ensuring text readability. For image blocks whose content type is the content of a design drawing, lossless compression can better preserve these details since the design drawing typically contains complex line, shape, and color gradations. If the cloud host system supports lossless hardware acceleration, the server side can select an H265 lossless coding form to perform region incremental coding. Among them, H265 (HEVC) is an advanced video compression standard that supports efficient lossless compression, while maintaining or improving video quality, significantly reduces the bit rate of video, thereby reducing storage space and transmission bandwidth requirements. With the support of hardware acceleration, the H265 can realize efficient lossless compression and real-time processing, and is very suitable for the transmission of design drawing contents. If the cloud host system does not support lossless hardware acceleration, the server side can select an LZ4 lossless coding form to perform region incremental coding. The LZ4 is an efficient compression algorithm, can realize rapid compression and decompression speed while maintaining data integrity, and is very suitable for the transmission of design drawing content. For image blocks with content types of rendering entertainment content, the server selects an encoding form according to whether the cloud host system supports lossy hardware acceleration. 
If the cloud host system supports lossy hardware acceleration, the server performs region incremental encoding in the H264 lossy encoding form. H264 is an advanced video compression standard that uses lossy compression, reducing the amount of data by removing redundant information within an image and by exploiting temporal correlation between frames. With hardware acceleration, H264 can achieve efficient compression and real-time processing, which suits the transmission of rendering entertainment content. If the cloud host system does not support lossy hardware acceleration, the server can perform region incremental encoding in the JPEG lossy encoding form. JPEG is a widely used image compression format that also relies on lossy compression to reduce the amount of data. Although JPEG compression discards some image detail, adjusting the compression quality parameter gives a better compression ratio while keeping the image quality acceptable.
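The selection rules described for the third encoding mode can be summarized as a small decision function. The following is a minimal sketch only: the names `ContentType` and `select_region_codec`, the string labels, and the two boolean capability flags are assumptions made for illustration and do not come from the application.

```python
from enum import Enum


class ContentType(Enum):
    # Hypothetical labels for the three content types named in the embodiment.
    TEXT_OFFICE = "text_office"
    DESIGN_DRAWING = "design_drawing"
    RENDER_ENTERTAINMENT = "render_entertainment"


def select_region_codec(content_type, lossless_hw, lossy_hw):
    """Return the encoding form the embodiment assigns to one small changed block.

    lossless_hw / lossy_hw: whether the cloud host reports lossless or lossy
    hardware acceleration (assumed to be known in advance).
    """
    if content_type is ContentType.TEXT_OFFICE:
        return "PNG"
    if content_type is ContentType.DESIGN_DRAWING:
        return "H265_LOSSLESS" if lossless_hw else "LZ4"
    if content_type is ContentType.RENDER_ENTERTAINMENT:
        return "H264" if lossy_hw else "JPEG"
    raise ValueError(f"unknown content type: {content_type!r}")
```

Keeping the capability flags as explicit arguments makes the fallback paths (LZ4 and JPEG) easy to exercise without real hardware.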
In one embodiment, for a current frame image having a plurality of second change area image blocks whose sizes are smaller than a preset threshold, if several of the image blocks are spatially adjacent and have the same content type, those image blocks are merged to obtain a new merged change area. For a current frame image that includes the merged change area, the encoding modes determined in step S2 further include: a fourth encoding mode, in which, when the size of the merged change area is determined to be greater than or equal to the preset threshold, global incremental encoding is performed on the current frame image using the first encoding mode or the second encoding mode according to application scenario requirements; and a fifth encoding mode, in which, when a plurality of merged change areas whose sizes are smaller than the preset threshold are determined to exist, region incremental encoding is performed on the current frame image using the third encoding mode.
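The merge-then-decide logic of this embodiment can be sketched as follows. The application does not define how "size" or "spatially adjacent" is measured, so this sketch assumes that size means pixel area and that two rectangles are adjacent when they overlap or touch; `Block`, `touches`, `merge_blocks`, and `choose_mode` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    x: int
    y: int
    w: int
    h: int
    content_type: str

    @property
    def area(self):
        return self.w * self.h


def touches(a, b):
    """True if the rectangles overlap or touch (assumed adjacency rule)."""
    return (a.x <= b.x + b.w and b.x <= a.x + a.w and
            a.y <= b.y + b.h and b.y <= a.y + a.h)


def union(a, b):
    """Bounding box covering both blocks; the content type is shared."""
    x0, y0 = min(a.x, b.x), min(a.y, b.y)
    x1 = max(a.x + a.w, b.x + b.w)
    y1 = max(a.y + a.h, b.y + b.h)
    return Block(x0, y0, x1 - x0, y1 - y0, a.content_type)


def merge_blocks(blocks):
    """Repeatedly merge adjacent blocks of the same content type."""
    merged = list(blocks)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                a, b = merged[i], merged[j]
                if a.content_type == b.content_type and touches(a, b):
                    merged[i] = union(a, b)
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


def choose_mode(blocks, threshold_area):
    """Return ("global" | "region", merged blocks) per the fourth/fifth modes."""
    merged = merge_blocks(blocks)
    if any(b.area >= threshold_area for b in merged):
        return "global", merged
    return "region", merged
```

For example, two 10x10 text blocks that share an edge merge into one 20x10 area; with a threshold of 150 pixels that frame falls back to global incremental encoding, while with a threshold of 500 it stays on region incremental encoding.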
In one embodiment, in step S3, encoding the current frame image based on the encoding mode to obtain encoded data includes:
Step S31, for a current frame image that uses the global incremental encoding mode, encoding the current frame image with a global incremental encoder to obtain the corresponding encoded data, wherein image encoding mode information is attached to the encoded data.
In one embodiment, in step S3, encoding the current frame image based on the encoding mode to obtain encoded data includes:
Step S31, for a current frame image that uses the region incremental encoding mode, independently encoding each change area image block in the current frame image to generate a corresponding encoded segment, wherein the encoded segment carries the encoding mode information used for the corresponding change area image block and Rect information representing the position and size of the image block area.
Specifically, referring to FIG. 2, for a current frame image that uses the region incremental encoding mode, the server first detects the differences between the current frame image and the previous frame image and identifies all change area image blocks. It then encodes each change area image block independently. Encoding produces an encoded segment for the image block; in addition to the actual encoded data, the segment carries the encoding mode information used for the image block and the Rect information representing the position and size of the image block (generally the x and y coordinates of the upper left corner plus the width and height of the image block). The decoder needs this information to decode the segment correctly and reconstruct the image.
Step S32, for each encoded segment, performing serialization based on the encoded segment, the corresponding encoding mode information, and the Rect information to obtain a corresponding structured data segment.
Specifically, the server organizes the encoded data, the encoding mode information, and the Rect information according to a predetermined format and converts the result into a form that is easy to store and transmit, such as the binary format illustrated in FIG. 2. Serialization keeps the structure and content of the data unchanged during storage and transmission. After serialization, each encoded segment yields a corresponding structured data segment that contains all the information needed for decoding, which simplifies later parsing and reconstruction of the image.
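Step S32 can be illustrated with a fixed binary header followed by the encoded payload. The binary format of FIG. 2 is not reproduced in the text, so the layout below (a one-byte codec identifier, four 32-bit Rect fields, and a 32-bit payload length, all big-endian) is purely an assumed layout, and `serialize_segment` and `deserialize_segment` are hypothetical helper names.

```python
import struct

# Assumed header layout: codec id (uint8), x, y, width, height (uint32 each),
# payload length (uint32). ">" selects big-endian with no padding.
HEADER = struct.Struct(">BIIIII")


def serialize_segment(codec_id, rect, payload):
    """Pack one encoded segment with its encoding mode and Rect information."""
    x, y, w, h = rect
    return HEADER.pack(codec_id, x, y, w, h, len(payload)) + payload


def deserialize_segment(buf, offset=0):
    """Unpack one segment; returns (codec_id, rect, payload, next_offset)."""
    codec_id, x, y, w, h, n = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    payload = bytes(buf[start:start + n])
    return codec_id, (x, y, w, h), payload, start + n
```

Carrying the payload length in the header is what lets a decoder walk a buffer that holds several segments back to back.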
Step S33, combining each of the structured data segments to obtain corresponding encoded data.
Specifically, referring to FIG. 2, the server integrates all the structured data segments and outputs the result as the final encoded data. The encoded data contains all the change area information of the current frame image, organized in a structured manner so that a decoder can parse it and reconstruct the image.
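Step S33 amounts to framing the structured data segments so the client can split them apart again. A minimal sketch, assuming a segment count followed by length-prefixed segments (this framing is an illustration, not the format of FIG. 2; `combine_segments` and `split_segments` are hypothetical names, and the segments are treated as opaque bytes):

```python
import struct


def combine_segments(segments):
    """Concatenate structured segments behind a count and per-segment lengths."""
    out = [struct.pack(">I", len(segments))]
    for seg in segments:
        out.append(struct.pack(">I", len(seg)))
        out.append(seg)
    return b"".join(out)


def split_segments(data):
    """Inverse of combine_segments; raises if the buffer has trailing bytes."""
    (count,) = struct.unpack_from(">I", data, 0)
    offset = 4
    segments = []
    for _ in range(count):
        (n,) = struct.unpack_from(">I", data, offset)
        offset += 4
        segments.append(data[offset:offset + n])
        offset += n
    if offset != len(data):
        raise ValueError("trailing bytes after last segment")
    return segments
```

The combined buffer is what would then be handed to the encryption step of S4.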
In one embodiment, in step S4, displaying the current frame image includes: if the client decodes and recovers a complete image consistent with the original picture, directly taking the complete image as the frame image to be displayed and caching it for subsequent use; and if the client decodes and recovers a plurality of change area image blocks, merging the change area image blocks with the previous frame image according to their position information so as to update the current frame image.
Specifically, if the client decodes and recovers a complete image consistent with the original picture, the image already contains the full picture information, so the client performs no merging and takes the decoded complete image as the currently displayed frame image. The client also caches the complete image for subsequent use. Caching reduces repeated decoding and improves playback efficiency: when a subsequent frame image needs this image as a reference frame, it can be read directly from the cache without being decoded again.
Specifically, if the client decodes and recovers a plurality of change area image blocks, it places each image block at the corresponding position of the previous frame image according to the decoded position information (i.e., the Rect information) of that block. After all change area image blocks have been merged, the client has an updated current frame image that contains all the latest changes and is ready to be displayed.
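The client-side merge can be sketched with frames represented as plain lists of pixel rows. This representation and the name `apply_blocks` are assumptions for illustration; a real client would more likely work on a pixel buffer or GPU texture.

```python
def apply_blocks(previous_frame, blocks):
    """Return a new frame with each decoded block pasted at its Rect position.

    previous_frame: list of rows, each a list of pixel values.
    blocks: iterable of ((x, y, w, h), pixels), where pixels is h rows of w values.
    """
    frame = [row[:] for row in previous_frame]  # copy so the cached frame is untouched
    for (x, y, w, h), pixels in blocks:
        if len(pixels) != h or any(len(row) != w for row in pixels):
            raise ValueError("block pixels do not match its Rect size")
        if y + h > len(frame) or x + w > len(frame[0]):
            raise ValueError("block falls outside the frame")
        for dy in range(h):
            # Overwrite exactly w pixels of the row, starting at column x.
            frame[y + dy][x:x + w] = pixels[dy]
    return frame
```

Returning a new frame rather than mutating the cached one keeps the previous frame available as a reference if a later update has to be retried.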
Referring to FIG. 3, the intelligent encoding and decoding system based on video content in a cloud desktop scenario disclosed by the application comprises a server and a client, wherein:
The server is configured to: for a current frame image of the cloud host system desktop, determine the change area image blocks and the content type corresponding to each change area image block through image difference and content recognition technology; determine the encoding mode of the current frame image based on the size, distribution, and content type of the change area image blocks, within the encoding and decoding capability of the cloud host system; encode the current frame image based on the encoding mode to obtain encoded data; and encrypt the encoded data and transmit the resulting encrypted data to the client.
The client is configured to perform the corresponding decryption and decoding on the received encrypted data and then display the current frame image.
In one embodiment, the server is further configured to implement the steps illustrated in any one of the foregoing method embodiments, which is not limited by the present application.
In the intelligent encoding and decoding system based on video content in a cloud desktop scenario described above, image difference and content recognition technology accurately identifies the change area image blocks in the current frame image of the cloud host system desktop. This avoids indiscriminately encoding the whole frame image and reduces unnecessary encoding cost and data transmission. Dynamically selecting the most suitable encoding mode according to the size, distribution, and content type of the change area image blocks further improves encoding efficiency and transmission speed. Selecting the encoding mode by content type ensures that different types of video content are handled appropriately during transmission, which keeps video quality stable and clear. Encrypting the encoded data effectively prevents theft of or tampering with the data during transmission, protecting the security of the cloud desktop system and the privacy of user data. After the encrypted data is transmitted to the client, the client performs the corresponding decryption and decoding, which provides end-to-end data protection and further strengthens the security of the system.
It should be noted that the order of the embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, the embodiments are described in a progressive manner: identical and similar parts of the embodiments can be understood by reference to one another, and each embodiment mainly describes its differences from the other embodiments.

Claims (9)

1. An intelligent encoding and decoding method based on video content in a cloud desktop scenario, characterized in that the method is applied to a server and includes:
S1. For the current frame image of the cloud host system desktop, determine the image block in the changed area and the content type corresponding to the image block in the changed area through image difference and content recognition technology;
S2. Determine the encoding method of the current frame image based on the size, distribution, and content type of the image blocks in the changed area within the encoding and decoding capabilities of the cloud host system;
S3. Encode the current frame image based on the encoding method to obtain encoded data;
S4. Encrypt the encoded data and transmit the obtained encrypted data to the client. The client performs corresponding decryption and decoding on the obtained encrypted data and then displays the current frame image.

2. The method according to claim 1, characterized in that in step S1, for the current frame image of the cloud host system desktop, the image block of the changed area and the content type corresponding to the image block of the changed area are determined by image difference and content recognition technology, including:
S11. Obtaining the previous frame image and the current frame image of the cloud host system desktop;
S12. Determining the image change area through pixel difference analysis based on the previous frame image and the current frame image;
S13. Identifying a target image area to be encoded based on the image change area;
S14. Performing image segmentation and image content recognition on the target image area to determine image blocks in the changed area and content types corresponding to the image blocks in the changed area.

3. The method according to claim 1, characterized in that, in step S2, the encoding method determined based on the size, distribution, and content type of the image blocks in the changed area comprises:
According to the application scenario requirements, a first encoding method or a second encoding method is used to perform global incremental encoding on a first frame image, a preset key frame image, or a current frame image having a change area image block whose size is greater than or equal to a preset threshold, using a streaming encoding form or a picture encoding form;
For a current frame image in which there are multiple second change area image blocks with sizes smaller than a preset threshold, a third encoding method is used to perform regional incremental encoding using an adaptive hybrid encoding form according to the content type corresponding to each image block in the image. The hybrid encoding form includes at least one of a PNG lossy encoding form, an LZ4 lossless encoding form, an H264 lossy encoding form, and a JPEG lossy encoding form.

4. The method according to claim 3 is characterized in that, in step S2, for the current frame image having multiple second change area image blocks whose sizes are smaller than a preset threshold, if the content type corresponding to one of the image blocks is text office content, the image block is subjected to regional incremental encoding using PNG lossy encoding; if the content type corresponding to one of the image blocks is design and drawing content and the cloud host system supports lossless hardware acceleration, the image is subjected to regional incremental encoding using H265 lossless encoding; if the content type corresponding to one of the image blocks is design and drawing content and the cloud host system does not support lossless hardware acceleration, the image block is subjected to regional incremental encoding using LZ4 lossless encoding; if the content type corresponding to one of the image blocks is rendering entertainment content and the cloud host system supports lossy hardware acceleration, the image block is subjected to regional incremental encoding using H264 lossy encoding; if the content type corresponding to one of the image blocks is rendering entertainment content and the cloud host system does not support lossy hardware acceleration, the image block is subjected to regional incremental encoding using JPEG lossy encoding.

5. The method according to claim 3, characterized in that, for a current frame image having multiple second change region image blocks whose sizes are smaller than a preset threshold, if the multiple image blocks are spatially adjacent and have the same content type, the multiple image blocks are merged to obtain a new merged change region; for the current frame image including the merged change region, the encoding method determined in step S2 further includes:
When it is determined that the size of the merged change area is greater than or equal to the preset threshold, according to the application scenario requirements, a fourth encoding method of performing global incremental encoding on the current frame image using the first encoding method or the second encoding method;
When it is determined that there are multiple merged change regions whose sizes are smaller than the preset threshold, a fifth encoding method of performing region incremental encoding on the current frame image using the third encoding method.

6. The method according to claim 1, characterized in that, in step S3, encoding the current frame image based on the encoding method to obtain encoded data comprises:
S31. For a current frame image using a global incremental encoding method, a global incremental encoder is used to encode the current frame image to obtain corresponding encoded data, wherein the encoded data is accompanied by image encoding method information.

7. The method according to claim 1, characterized in that, in step S3, encoding the current frame image based on the encoding method to obtain encoded data comprises:
S31. For a current frame image using a region increment coding method, independently encode each changed region image block in the current frame image to generate a corresponding coded segment, wherein the coded segment is accompanied by coding method information used for the corresponding changed region image block and Rect information indicating the region position and size of the image block;
S32. For each coding segment, serialization processing is performed based on the coding segment, the corresponding coding mode information, and the Rect information to obtain a corresponding structured data segment;
S33. Combine the structured data segments to obtain corresponding encoded data.

8. The method according to claim 1, characterized in that, in step S4, during the process of displaying the current frame image, the method comprises:
If the client decodes and recovers a complete image that is consistent with the original picture, the complete image is directly used as the currently displayed frame image and cached for subsequent use;
If the client decodes and recovers multiple image blocks in the changed region, the multiple image blocks in the changed region are merged with the previous frame image according to their position information to update the currently displayed frame image.

9. An intelligent encoding and decoding system based on video content in a cloud desktop scenario, characterized in that the system includes a server and a client, wherein:
The server is used to determine the image blocks in the changed area and the content types corresponding to the image blocks in the changed area by using image difference and content recognition technology for the current frame image of the cloud host system desktop; determine the encoding method of the current frame image based on the size, distribution, and content type of the image blocks in the changed area within the encoding and decoding capability of the cloud host system; encode the current frame image based on the encoding method to obtain encoded data; encrypt the encoded data and transmit the obtained encrypted data to the client;
The client is used to perform corresponding decryption and decoding processing on the acquired encrypted data and then display the current frame image.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510177146.1A CN119653089A (en) 2025-02-18 2025-02-18 A video content-based intelligent encoding and decoding method and system in a cloud desktop scenario

Publications (1)

Publication Number Publication Date
CN119653089A true CN119653089A (en) 2025-03-18

Family

ID=94938851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510177146.1A Pending CN119653089A (en) 2025-02-18 2025-02-18 A video content-based intelligent encoding and decoding method and system in a cloud desktop scenario

Country Status (1)

Country Link
CN (1) CN119653089A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106470345A (en) * 2015-08-21 2017-03-01 阿里巴巴集团控股有限公司 Video-encryption transmission method and decryption method, apparatus and system
CN115514956A (en) * 2022-09-16 2022-12-23 威创集团股份有限公司 An adaptive video encoding and decoding method, system, device and medium
CN116489132A (en) * 2023-04-27 2023-07-25 深圳市深信服信息安全有限公司 Virtual desktop data transmission method, server, client and storage medium
CN119299700A (en) * 2024-10-16 2025-01-10 中移(苏州)软件技术有限公司 Video encoding method, device, electronic device, chip and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination