
WO2025128480A1 - Warp intra block copy - Google Patents

Warp intra block copy

Info

Publication number
WO2025128480A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
warp
neighboring blocks
neighboring
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/059178
Other languages
French (fr)
Inventor
Jingning Han
Cheng Chen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC filed Critical Google LLC
Publication of WO2025128480A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/105Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167Position within a video image, e.g. region of interest [ROI]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/537Motion estimation other than block-based
    • H04N19/54Motion estimation other than block-based using feature points or meshes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • Digital video streams may represent video using a sequence of frames or still images.
  • Digital video can be used for various applications including, for example, video conferencing, high-definition video entertainment, video advertisements, or sharing of user-generated videos.
  • a digital video stream can contain a large amount of data and consume a significant amount of computing or communication resources of a computing device for processing, transmission, or storage of the video data.
  • Various approaches have been proposed to reduce the amount of data in video streams, including compression and other coding techniques. These techniques may include both lossy and lossless coding techniques.
  • This disclosure relates generally to encoding and decoding video data and more particularly relates to a warp intra block copy.
  • An aspect of the disclosed implementations is a method of coding a current block of a current frame using a warp intra-block copy mode.
  • the method includes selecting neighboring blocks that are coded using the warp intra-block copy mode; identifying respective pixel pairs for the neighboring blocks, where a pixel pair for a neighboring block includes a pixel of the neighboring block and a projected pixel within the current frame; obtaining, based on the pixel pairs, parameters of a warp model; and obtaining, based on the warp model, a prediction block for the current block.
  • Implementations may include one or more of the following features.
  • the method may include coding a syntax element indicating that the current block is coded using the warp intra-block copy mode.
  • the warp model can be a homographic warp model, and the parameters may include parameters defining translation, rotation, scaling, changes in aspect ratio, shearing, and perspective distortion.
  • the warp model can be an affine warp model having six parameters that project pixels of the current block to a parallelogram patch within the current frame.
  • the warp model can be a similarity motion model having four parameters that project pixels of the current block to a square patch within the current frame.
  • a number of the neighboring blocks can be based on a number of the parameters.
  • a number of the parameters can be based on a number of the neighboring blocks that are coded using the warp intra-block copy mode.
  • a number of the neighboring blocks can be at least 1 and not greater than 4.
  • Selecting the neighboring blocks may include selecting the neighboring blocks in a clockwise fashion starting with a bottom-most left neighboring block.
  • Selecting the neighboring blocks may include selecting the neighboring blocks in a counterclockwise fashion starting with a top-most right neighboring block.
  • Selecting the neighboring blocks can be based on characteristics of warp models associated with the neighboring blocks.
  • Selecting the neighboring blocks may include determining average values of horizontal and vertical components of translational parameters across available neighboring blocks coded using the warp intra-block copy mode; and selecting the neighboring blocks based on an error metric comparing translational parameters of individual neighboring blocks to the average values.
  • At least one of the neighboring blocks can be associated with a zero block vector.
  • the pixel of the neighboring block may include a center pixel of the neighboring block.
  • the projected pixel within the current frame can be obtained using a block vector associated with the neighboring block.
  • the projected pixel within the current block can be obtained using a warp model associated with the neighboring block.
  • aspects can be implemented in any convenient form.
  • aspects may be implemented by appropriate computer programs which may be carried on appropriate carrier media which may be tangible carrier media (e.g. disks) or intangible carrier media (e.g. communications signals).
  • aspects may also be implemented using suitable apparatus which may take the form of programmable computers running computer programs arranged to implement the methods and/or techniques disclosed herein.
  • a non-transitory computer-readable storage medium may include executable instructions that, when executed by a processor, facilitate performance of operations operable to cause the processor to carry out any of the methods described herein.
  • aspects can be combined such that features described in the context of one aspect may be implemented in another aspect.
  • FIG. 1 is a schematic of a video encoding and decoding system.
  • FIG. 2 is a block diagram of an example of a computing device that can implement a transmitting station or a receiving station.
  • FIG. 3 is a diagram of an example of a video stream to be encoded and subsequently decoded.
  • FIG. 4 is a block diagram of an encoder.
  • FIG. 5 is a block diagram of a decoder.
  • FIG. 6 is a block diagram illustrating the conventional IntraBC mode.
  • FIG. 7 is a flowchart of an example of a technique for coding a current block using the warp intra-block copy mode.
  • FIG. 8 illustrates neighboring blocks of a current block of a current frame.
  • FIGS. 9A-D depict different warp (or projection) models used to project pixels of a current block of a current frame to a warped (e.g., projected) patch within the same frame.
  • compression schemes related to coding video streams may include breaking images (i.e., original or source images) into blocks and generating a digital video output bitstream using one or more techniques to limit the information included in the output.
  • a received encoded bitstream can be decoded to re-create the blocks and the source images from the limited information.
  • Encoding a video stream, or a portion thereof, such as a frame or a block can include using temporal or spatial similarities in the video stream to improve coding efficiency. For example, a current block of a video stream may be encoded based on identifying a difference (residual) between previously coded pixel values and those in the current block.
  • Inter prediction uses a motion vector that represents the temporal displacement of a previously coded block relative to the current block.
  • the motion vector can be identified using a method of motion estimation, such as a motion search. In the motion search, a portion of a reference frame can be translated to a succession of locations to form a predictor block that can be subtracted from a portion of a current frame to form a series of residuals. The horizontal and/or vertical translations corresponding to the location having, e.g., the smallest residual can be selected as the motion vector.
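As a concrete illustration of the full search just described, the following minimal sketch (function names and the SAD residual metric are illustrative assumptions, not from any particular codec; frames are 2-D numpy arrays) tests every offset within a radius and keeps the one with the smallest residual:

```python
import numpy as np

def sad(a: np.ndarray, b: np.ndarray) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def motion_search(cur, ref, y, x, size, radius):
    """Return the (dy, dx) offset into `ref` with the smallest residual
    for the size x size block of `cur` at (y, x), plus its cost."""
    block = cur[y:y + size, x:x + size]
    best, best_cost = (0, 0), sad(block, ref[y:y + size, x:x + size])
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            py, px = y + dy, x + dx
            if 0 <= py and 0 <= px and \
               py + size <= ref.shape[0] and px + size <= ref.shape[1]:
                cost = sad(block, ref[py:py + size, px:px + size])
                if cost < best_cost:
                    best_cost, best = cost, (dy, dx)
    return best, best_cost
```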
  • the motion vector can be encoded in the encoded bitstream along with an indication of the reference frame.
  • intra prediction can attempt to predict the pixel values of a current block of a current frame of a video stream using pixels peripheral to the current block.
  • the pixels peripheral to the current block are pixels within the current frame but that are outside the current block.
  • the pixels peripheral to the block can be pixels adjacent to the current block. Which pixels peripheral to the block are used can depend on the intra-prediction mode and/or a scan order of the blocks of a frame. For example, in a raster scan order, peripheral pixels above a current block (i.e., the block being encoded or decoded) and/or peripheral pixels to the left of the current block may be used.
  • Intra Block Copy (IntraBC) is a technique that is useful in certain use cases, such as coding screen-captured content, and is particularly effective in scenarios involving repeated patterns (e.g., characters, sharp edges, etc.) within a video frame.
  • the operation of IntraBC, as elaborated in relation to FIG. 6, is based on a translational model.
  • Intra Block Copy may be somewhat limited when it comes to handling the variety of possible pattern transformations within a video frame and cannot capture more complex pattern projections. For example, using the IntraBC mode, it is not possible to identify and use rotations, where patterns undergo angular changes; zooming, where patterns experience changes in scale; and shearing transformations, involving distortions that skew the patterns.
  • Implementations according to this disclosure more accurately address the wide range of possible pattern transformations observed in screen-captured content.
  • Whereas IntraBC uses only translational models, the intra-frame mode described herein (i.e., the “warp intra-block copy mode,” or WIntraBC for short) uses warp models that can capture more complex pattern projections, such as the rotations, zooming, and shearing transformations noted above.
  • a block vector is encoded in a compressed bitstream.
  • the BV essentially codes the two parameters (i.e., a horizontal and a vertical offset) of the translational model.
  • the parameters of the warp model need not be coded in the compressed bitstream. Rather, when decoding (e.g., reconstructing) a current block, a decoder derives these parameters based on sample pairs derived from neighboring blocks of the current block. As such, compression efficiency can be improved.
  • an encoder may signal (i.e., encode in a compressed bitstream) and a decoder may decode from the compressed bitstream one or more syntax elements indicating that a current block is encoded using the WIntraBC mode. Both the encoder and the decoder implement the same process for deriving the parameters of a warp model used for coding the current block.
  • FIG. 1 is a schematic of a video encoding and decoding system 100.
  • a transmitting station 102 can be, for example, a computer having an internal configuration of hardware such as that described in FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices.
  • a network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream.
  • the video stream can be encoded in the transmitting station 102 and the encoded video stream can be decoded in the receiving station 106.
  • the network 104 can be, for example, the Internet.
  • the network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106.
  • the receiving station 106 in one example, can be a computer having an internal configuration of hardware such as that described in FIG. 2. However, other suitable implementations of the receiving station 106 are possible. For example, the processing of the receiving station 106 can be distributed among multiple devices.
  • an implementation can omit the network 104.
  • a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or any other device having memory.
  • the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream and stores the video stream for later decoding.
  • In one implementation, a real-time transport protocol (RTP) is used for transmission of the encoded video over the network 104. In another implementation, a transport protocol other than RTP may be used, e.g., a Hypertext Transfer Protocol (HTTP) video streaming protocol.
  • the transmitting station 102 and/or the receiving station 106 may include the ability to both encode and decode a video stream as described below.
  • the receiving station 106 could be a video conference participant who receives an encoded video bitstream from a video conference server (e.g., the transmitting station 102) to decode and view and further encodes and transmits its own video bitstream to the video conference server for decoding and viewing by other participants.
  • FIG. 2 is a block diagram of an example of a computing device 200 (e.g., an apparatus) that can implement a transmitting station or a receiving station.
  • the computing device 200 can implement one or both of the transmitting station 102 and the receiving station 106 of FIG. 1.
  • the computing device 200 can be in the form of a computing system including multiple computing devices, or in the form of one computing device, for example, a mobile phone, a tablet computer, a laptop computer, a notebook computer, a desktop computer, and the like.
  • a CPU 202 in the computing device 200 can be a conventional central processing unit.
  • the CPU 202 can be any other type of device, or multiple devices, capable of manipulating or processing information now existing or hereafter developed.
  • Although the disclosed implementations can be practiced with one processor as shown (e.g., the CPU 202), advantages in speed and efficiency can be achieved by using more than one processor.
  • a memory 204 in computing device 200 can be a read only memory (ROM) device or a random-access memory (RAM) device in an implementation. Any other suitable type of storage device can be used as the memory 204.
  • the memory 204 can include code and data 206 that is accessed by the CPU 202 using a bus 212.
  • the memory 204 can further include an operating system 208 and application programs 210, the application programs 210 including at least one program that permits the CPU 202 to perform the techniques described here.
  • the application programs 210 can include applications 1 through N, which further include a video coding application that performs the techniques described here.
  • Computing device 200 can also include a secondary storage 214, which can, for example, be a memory card used with a mobile computing device.
  • the computing device 200 can also include one or more output devices, such as a display 218.
  • the display 218 may be, in one example, a touch sensitive display that combines a display with a touch sensitive element that is operable to sense touch inputs.
  • the display 218 can be coupled to the CPU 202 via the bus 212.
  • Other output devices that permit a user to program or otherwise use the computing device 200 can be provided in addition to or as an alternative to the display 218.
  • the display can be implemented in various ways, including by a liquid crystal display (LCD), a cathode-ray tube (CRT) display or light emitting diode (LED) display, such as an organic LED (OLED) display.
  • the computing device 200 can also include or be in communication with an image-sensing device 220, for example a camera, or any other image-sensing device 220 now existing or hereafter developed that can sense an image such as the image of a user operating the computing device 200.
  • the image-sensing device 220 can be positioned such that it is directed toward the user operating the computing device 200.
  • the position and optical axis of the image-sensing device 220 can be configured such that the field of vision includes an area that is directly adjacent to the display 218 and from which the display 218 is visible.
  • the computing device 200 can also include or be in communication with a sound-sensing device 222, for example a microphone, or any other sound-sensing device now existing or hereafter developed that can sense sounds near the computing device 200.
  • the sound-sensing device 222 can be positioned such that it is directed toward the user operating the computing device 200 and can be configured to receive sounds, for example, speech or other utterances, made by the user while the user operates the computing device 200.
  • Although FIG. 2 depicts the CPU 202 and the memory 204 of the computing device 200 as being integrated into one unit, other configurations can be utilized.
  • the operations of the CPU 202 can be distributed across multiple machines (wherein individual machines can have one or more processors) that can be coupled directly or across a local area or other network.
  • the memory 204 can be distributed across multiple machines such as a network-based memory or memory in multiple machines performing the operations of the computing device 200.
  • the bus 212 of the computing device 200 can be composed of multiple buses.
  • the secondary storage 214 can be directly coupled to the other components of the computing device 200 or can be accessed via a network and can comprise an integrated unit such as a memory card or multiple units such as multiple memory cards.
  • the computing device 200 can thus be implemented in a wide variety of configurations.
  • FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded.
  • the video stream 300 includes a video sequence 302.
  • the video sequence 302 includes a number of adjacent frames 304. While three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304.
  • the adjacent frames 304 can then be further subdivided into individual frames, e.g., a frame 306.
  • the frame 306 can be divided into a series of planes or segments 308.
  • the segments 308 can be subsets of frames that permit parallel processing, for example.
  • the segments 308 can also be subsets of frames that can separate the video data into separate colors.
  • a frame 306 of color video data can include a luminance plane and two chrominance planes.
  • the segments 308 may be sampled at different resolutions.
  • the frame 306 may be further subdivided into blocks 310, which can contain data corresponding to, for example, 16x16 pixels in the frame 306.
  • the blocks 310 can also be arranged to include data from one or more segments 308 of pixel data.
  • the blocks 310 can also be of any other suitable size such as 4x4 pixels, 8x8 pixels, 16x8 pixels, 8x16 pixels, 16x16 pixels, or larger. Unless otherwise noted, the terms block and macro-block are used interchangeably herein.
  • FIG. 4 is a block diagram of an encoder 400.
  • the encoder 400 can be implemented, as described above, in the transmitting station 102 such as by providing a computer software program stored in memory, for example, the memory 204.
  • the computer software program can include machine instructions that, when executed by a processor such as the CPU 202, cause the transmitting station 102 to encode video data in the manner described in FIG. 4.
  • the encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102. In one particularly desirable implementation, the encoder 400 is a hardware encoder.
  • the encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408.
  • the encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks.
  • the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416.
  • Other structural variations of the encoder 400 can be used to encode the video stream 300.
  • respective frames 304 can be processed in units of blocks.
  • respective blocks can be encoded using intra-frame prediction (also called intra-prediction) or inter-frame prediction (also called inter-prediction).
  • a prediction block can be formed.
  • For intra-prediction, a prediction block may be formed from samples in the current frame that have been previously encoded and reconstructed.
  • For inter-prediction, a prediction block may be formed from samples in one or more previously constructed reference frames.
  • the prediction block can be subtracted from the current block at the intra/inter prediction stage 402 to produce a residual block (also called a residual).
  • the transform stage 404 transforms the residual into transform coefficients in, for example, the frequency domain using block-based transforms.
  • the quantization stage 406 converts the transform coefficients into discrete quantum values, which are referred to as quantized transform coefficients, using a quantizer value or a quantization level. For example, the transform coefficients may be divided by the quantizer value and truncated.
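As a toy illustration of the quantization step just described (the quantizer value of 16 and the coefficient are arbitrary examples, not from any codec):

```python
# Quantization: divide by the quantizer value and truncate.
coeff = 157
q = 16                       # quantizer value
quantized = coeff // q       # 9 (divided and truncated)
dequantized = quantized * q  # 144 != 157 -> quantization error (lossy)
```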
  • the quantized transform coefficients are then entropy encoded by the entropy encoding stage 408.
  • the entropy-encoded coefficients, together with other information used to decode the block, which may include, for example, the type of prediction used, transform type, motion vectors (MVs), and quantizer value, are then output to the compressed bitstream 420.
  • the compressed bitstream 420 can be formatted using various techniques, such as variable length coding (VLC) or arithmetic coding.
  • the compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream, and the terms will be used interchangeably herein.
  • the reconstruction path in FIG. 4 can be used to ensure that the encoder 400 and a decoder 500 (described below) use the same reference frames to decode the compressed bitstream 420.
  • the reconstruction path performs functions that are similar to functions that take place during the decoding process that are discussed in more detail below, including dequantizing the quantized transform coefficients at the dequantization stage 410 and inverse transforming the dequantized transform coefficients at the inverse transform stage 412 to produce a derivative residual block (also called a derivative residual).
  • the prediction block that was predicted at the intra/inter prediction stage 402 can be added to the derivative residual to create a reconstructed block.
  • the loop filtering stage 416 can be applied to the reconstructed block to reduce distortion such as blocking artifacts.
  • Other variations of the encoder 400 can be used to encode the compressed bitstream 420.
  • a non-transform-based encoder can quantize the residual signal directly without the transform stage 404 for certain blocks or frames.
  • an encoder can have the quantization stage 406 and the dequantization stage 410 combined in a common stage.
  • FIG. 5 is a block diagram of a decoder 500.
  • the decoder 500 can be implemented in the receiving station 106, for example, by providing a computer software program stored in the memory 204.
  • the computer software program can include machine instructions that, when executed by a processor such as the CPU 202, cause the receiving station 106 to decode video data in the manner described in FIG. 5.
  • the decoder 500 can also be implemented in hardware included in, for example, the transmitting station 102 or the receiving station 106.
  • the decoder 500, similar to the reconstruction path of the encoder 400 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 516 from the compressed bitstream 420: an entropy decoding stage 502, a dequantization stage 504, an inverse transform stage 506, an intra/inter prediction stage 508, a reconstruction stage 510, a loop filtering stage 512 and a post-loop filtering stage 514.
  • Other structural variations of the decoder 500 can be used to decode the compressed bitstream 420.
  • the data elements within the compressed bitstream 420 can be decoded by the entropy decoding stage 502 to produce a set of quantized transform coefficients.
  • the dequantization stage 504 dequantizes the quantized transform coefficients (e.g., by multiplying the quantized transform coefficients by the quantizer value), and the inverse transform stage 506 inverse transforms the dequantized transform coefficients to produce a derivative residual that can be identical to that created by the inverse transform stage 412 in the encoder 400.
  • the decoder 500 can use the intra/inter prediction stage 508 to create the same prediction block as was created in the encoder 400, e.g., at the intra/inter prediction stage 402.
  • the prediction block can be added to the derivative residual to create a reconstructed block.
  • the loop filtering stage 512 can be applied to the reconstructed block to reduce blocking artifacts.
  • Other filtering can be applied to the reconstructed block.
  • the post-loop filtering stage 514 is applied to the reconstructed block to reduce blocking distortion, and the result is output as the output video stream 516.
  • the output video stream 516 can also be referred to as a decoded video stream, and the terms will be used interchangeably herein.
  • Other variations of the decoder 500 can be used to decode the compressed bitstream 420.
  • the decoder 500 can produce the output video stream 516 without the post-loop filtering stage 514.
  • FIG. 6 is a block diagram illustrating the conventional IntraBC mode.
  • FIG. 6 illustrates a portion of a current frame 600 being coded and a current block 602 within the current frame 600.
  • the IntraBC mode may be limited to intra-coded frames.
  • an encoder (e.g., the encoder 400 of FIG. 4) searches at least a subset of a reconstructed area 603 of the current frame 600 to identify a reference block (e.g., a reference block 604) that best matches the current block 602.
  • the reference block 604 is used as a prediction block for the current block 602.
  • a difference (i.e., a residual) between the current block 602 and the reference block 604 can be encoded in a compressed bitstream (e.g., the compressed bitstream 420 of FIG. 4) along with a BV, such as a block vector 606, that indicates the displacement from the current block 602 to the reference block 604.
  • IntraBC can be considered “motion compensation” within the same frame (e.g., the current frame 600), essentially using the BV as a motion vector.
  • the encoder encodes a flag (e.g., an IntraBC flag) indicating that the current block 602 is encoded using the IntraBC mode and also encodes the BV (i.e., the horizontal, BVx, and the vertical, BVy, components thereof).
  • a decoder, such as the decoder 500 of FIG. 5, decodes the IntraBC flag to determine whether the current block 602 is to be decoded using the IntraBC mode. If so, then the decoder decodes the block vector 606 from the compressed bitstream to identify the reference block 604, thereby obtaining the prediction block for the current block 602. A residual block can then be decoded from the compressed bitstream and added to the prediction block, thereby reconstructing the current block 602.
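A minimal sketch of the decoder-side prediction step just described, assuming the reconstructed area of the current frame is available as a 2-D numpy array; the function and variable names are illustrative, not from any codec implementation:

```python
import numpy as np

def intra_bc_predict(recon: np.ndarray, y: int, x: int,
                     size: int, bv_y: int, bv_x: int) -> np.ndarray:
    """Copy the size x size reference block at (y + bv_y, x + bv_x) from the
    reconstructed portion of the *same* frame as the prediction block."""
    ry, rx = y + bv_y, x + bv_x
    # A real codec additionally checks that the reference block lies fully
    # inside the already reconstructed (decoded) area.
    return recon[ry:ry + size, rx:rx + size].copy()
```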
  • FIG. 7 is a flowchart of an example of a technique 700 for coding a current block using the warp intra-block copy mode.
  • the technique 700 is further explained with reference to FIGS. 8 and 9A-9D.
  • FIGS. 9A-9D provide generic descriptions of different warp models that may be encompassed (e.g., derived or used) when the current block is encoded using the WIntraBC mode.
  • the technique 700 can be implemented, for example, as a software program that may be executed by computing devices such as transmitting station 102 or receiving station 106.
  • the software program can include machine-readable instructions that may be stored in a memory such as the memory 204 or the secondary storage 214, and that, when executed by a processor, such as CPU 202, may cause the computing device to perform the technique 700.
  • the technique 700 may be implemented in whole or in part in the intra/inter prediction stage 402 of the encoder 400 or in the intra/inter prediction stage 508 of the decoder 500 of FIG. 5.
  • the technique 700 can be implemented using specialized hardware or firmware. Multiple processors, memories, or both, may be used.
  • the technique 700 of FIG. 7 is depicted and described as a series of steps or operations. However, the steps or operations in accordance with this disclosure can occur in various orders and/or concurrently. Additionally, other steps or operations not presented and described herein may be used. Furthermore, not all illustrated steps or operations may be required to implement a method in accordance with the disclosed subject matter.
  • the technique 700 derives parameters of a warp model that is used to derive a prediction block for the current block.
  • the warp model can have 2, 4, 6, 8, or some other number of parameters.
  • When implemented by an encoder, “coding” means encoding into a compressed bitstream; and when implemented by a decoder, “coding” means decoding from a compressed bitstream.
  • At 702, neighboring blocks of the current block that are coded using the WIntraBC mode are selected.
  • If the warp model requires 2N parameters, then the number of selected neighboring blocks is N.
  • FIG. 8 illustrates neighboring blocks of a current block 802 of a current frame 800.
  • the blocks 804 through 814 are all of the neighboring blocks of the current block 802. However, not all of these blocks may be selected at 702 of FIG. 7. Additionally, while FIG. 8 illustrates one arrangement of all of the neighboring blocks, other arrangements are possible depending on the block partitioning scheme determined by the encoder. To reiterate, those neighboring blocks that are used in deriving the warp model parameters are referred to as selected neighboring blocks.
  • FIG. 8 illustrates that only blocks 804, 806, 810, and 814 are coded using the WIntraBC mode.
  • the neighboring blocks selected at 702 of FIG. 7 may be a subset of the blocks 804, 806, 810, and 814.
  • the number of selected neighboring blocks can depend on the warp model.
  • the encoder and decoder may be configured to use a preselected warp model (i.e., a model type).
  • the preselected warp model may be the affine model. As the affine model requires six parameters, three neighboring blocks are selected.
  • the number (e.g., 3) of neighboring blocks can be half of the number (e.g., 6) of parameters of the preselected warp model.
  • the encoder may encode a syntax element (e.g., warp_index) indicating the preselected warp model (e.g., the model type). The decoder decodes the syntax element to determine the number of parameters to derive.
  • a selection process may be applied to select the subset of the neighboring blocks.
  • the selection process may be to select the neighboring blocks in clockwise fashion starting with a bottom-most left block.
  • the blocks 804, 806, and 810 would be selected.
  • Alternatively, a counterclockwise fashion starting with the top-most right block (e.g., the block 814) can be used.
  • the selection process may be based on the characteristics of warp models associated with the selected neighboring blocks.
  • the translational parameters of the warp models may be used as the selection metric.
  • the average (or mean) values of both the horizontal (i.e., the h13 parameter described below) and vertical (i.e., the h23 parameter described below) components of the translational parameters across all blocks are calculated.
  • the selection process can then be used to identify those neighboring blocks whose translational parameters - both horizontal and vertical components - most closely align with these average values based on an error metric, such as a mean square error (MSE), or some other error metric.
  • the selection process can be used to select three of the neighboring blocks 804, 806, 810, and 814.
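A sketch of this selection process, assuming each candidate neighbor is represented as a dict exposing its translational parameters h13 and h23 (squared error standing in for the MSE-style metric mentioned above):

```python
def select_neighbors(candidates, n):
    """Pick the n neighbors whose (h13, h23) are closest, in squared error,
    to the average translation across all WIntraBC-coded candidates."""
    avg_h13 = sum(c["h13"] for c in candidates) / len(candidates)
    avg_h23 = sum(c["h23"] for c in candidates) / len(candidates)

    def err(c):  # squared-error metric against the averages
        return (c["h13"] - avg_h13) ** 2 + (c["h23"] - avg_h23) ** 2

    return sorted(candidates, key=err)[:n]
```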
  • the preselected warp model may be the homographic warp model (requiring 8 parameters, with h33 set to 1 as described with respect to FIG. 9A) but only 3 blocks were coded using the warp intra-block copy mode. As such, some other blocks may be assumed to be associated with a zero block vector, as illustrated below with respect to the block 808.
  • At 704, respective pixel pairs for the neighboring blocks (i.e., the selected neighboring blocks) are identified.
  • a pixel pair for a neighboring block includes a pixel of the neighboring block and a projected pixel within the current frame.
  • the projected pixels can be identified using the warp models associated with the blocks. In an example, the projected pixels can be identified using only the translational components of the warp models. To identify a pair of pixels, one pixel at a location (x, y) from a neighboring block is selected and a corresponding projection pixel (x', y') based on the warp model (or a subset thereof) of the neighboring block is identified. In an example, the selected pixel can be the center pixel of the neighboring block.
  • a pixel 816A of the neighboring block 804 is projected using the warp model associated with the block 804 to obtain a pixel 816B; a pixel 818A of the neighboring block 806 is projected using the warp model associated with the block 806 to obtain a pixel 818B; and so on.
  • For a neighboring block associated with a zero block vector, the selected pixel and the projected pixel are considered the same, as illustrated with respect to a selected pixel 820A and projected pixel 820B of the block 808.
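A sketch of forming one pixel pair per selected neighbor, using the neighbor's center pixel and, as one of the options described above, only the translational components (h13, h23) of the neighbor's warp model; the dict fields are assumptions:

```python
def pixel_pair(block):
    """Return ((x, y), (x', y')) for one neighbor described by a dict with
    position/size fields and translational warp parameters h13, h23."""
    cx = block["x"] + block["w"] // 2  # center pixel of the neighbor
    cy = block["y"] + block["h"] // 2
    # A zero-BV neighbor (h13 == h23 == 0) projects onto itself.
    return (cx, cy), (cx + block["h13"], cy + block["h23"])
```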
  • At 706, parameters of a warp model are obtained based on the pixel pairs.
  • a set of linear equations can be formulated based on the pixel pairs and solved.
  • If the warp model is a six-parameter model (e.g., the affine model), each pixel pair contributes two equations, obtained using equation (4) below, to a system of linear equations that can be assembled into a matrix system and solved using standard techniques in linear algebra.
  • For three pixel pairs $(x_i, y_i) \to (x_i', y_i')$, the system of linear equations may be given by:

$$
\begin{bmatrix}
x_1 & y_1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & x_1 & y_1 & 1 \\
x_2 & y_2 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & x_2 & y_2 & 1 \\
x_3 & y_3 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & x_3 & y_3 & 1
\end{bmatrix}
\begin{bmatrix} h_{11} \\ h_{12} \\ h_{13} \\ h_{21} \\ h_{22} \\ h_{23} \end{bmatrix}
=
\begin{bmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \\ x_3' \\ y_3' \end{bmatrix}
$$
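A sketch of assembling and solving that system with numpy; the input format and the note on degenerate configurations are illustrative assumptions:

```python
import numpy as np

def solve_affine(pairs):
    """pairs: three ((x, y), (xp, yp)) tuples; returns
    [h11, h12, h13, h21, h22, h23] per equation (4)."""
    A, b = [], []
    for (x, y), (xp, yp) in pairs:
        A.append([x, y, 1, 0, 0, 0]); b.append(xp)  # x' = h11*x + h12*y + h13
        A.append([0, 0, 0, x, y, 1]); b.append(yp)  # y' = h21*x + h22*y + h23
    A = np.array(A, dtype=float)
    # Collinear source points make the system singular; a real implementation
    # would need to handle (or avoid selecting) such configurations.
    return np.linalg.solve(A, np.array(b, dtype=float))
```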
  • At 708, a prediction block for the current block is obtained using the obtained warp model. That is, each pixel location of the current block 802 is projected onto a patch 822 to obtain the corresponding predicted pixel.
  • the patch 822 illustrates the shape of the prediction “block” of the current block 802 using the warp model having the obtained parameters h11 through h23.
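A sketch of this prediction step: each pixel of the current block is projected with the derived affine parameters and sampled from the reconstructed frame. Nearest-neighbor rounding and clamping are simplifying assumptions; a real codec would use sub-pixel interpolation and restrict the warp to the reconstructed area:

```python
import numpy as np

def warp_predict(recon, y0, x0, size, h):
    """Build the prediction for the size x size block at (y0, x0) using
    affine parameters h = (h11, h12, h13, h21, h22, h23)."""
    h11, h12, h13, h21, h22, h23 = h
    pred = np.zeros((size, size), dtype=recon.dtype)
    for r in range(size):
        for c in range(size):
            x, y = x0 + c, y0 + r
            xp = h11 * x + h12 * y + h13  # equation (4)
            yp = h21 * x + h22 * y + h23
            # Clamp to the frame for this sketch.
            xi = min(max(int(round(xp)), 0), recon.shape[1] - 1)
            yi = min(max(int(round(yp)), 0), recon.shape[0] - 1)
            pred[r, c] = recon[yi, xi]
    return pred
```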
  • the technique 700 can include coding a syntax element (e.g., a flag) indicating that the current block is coded using the WIntraBC mode. That is, when implemented by an encoder, the syntax element is encoded in the compressed bitstream; and when implemented by a decoder, the syntax element is decoded from the compressed bitstream. In response to determining that the current block is coded using the warp intra-block copy mode, the encoder and the decoder perform 702-708 of the technique 700.
  • the encoder may determine that the current block is to be encoded using the WIntraBC mode based on rate-distortion optimization.
  • a rate-distortion (R-D) optimization represents the fundamental trade-off between compression efficiency (rate, measured in bits) and quality loss (distortion) in video encoding.
  • the optimization process occurs at multiple levels in the encoding pipeline, including coding unit decisions, transform selections, and quantization parameters.
  • the encoder When selecting between available coding modes (such as intra prediction, inter prediction, and WIntraBC), the encoder calculates distortion using metrics like Sum of Absolute Differences (SAD), determines the rate including both mode signaling and residual coding bits, and computes the combined R-D cost.
  • the encoder may employ various optimization strategies including early termination of mode searches, content-aware mode filtering, adaptive thresholds to skip testing unlikely modes, and statistical tracking of mode usage patterns, ultimately selecting the mode that minimizes the R-D cost function while respecting bitrate and quality constraints.
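A sketch of the final decision rule combining distortion and rate with a Lagrange multiplier; the mode names and the distortion/rate numbers are made up purely for illustration:

```python
def pick_mode(modes, lam):
    """modes: iterable of (name, distortion, rate_bits) triples.
    Return the mode minimizing the R-D cost J = D + lambda * R."""
    return min(modes, key=lambda m: m[1] + lam * m[2])

# Example: WIntraBC wins here despite a slightly higher rate,
# because its distortion is much lower.
best = pick_mode(
    [("intra", 900, 40), ("inter", 700, 70), ("wintrabc", 500, 80)],
    lam=2.0,
)
```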
  • the number of the selected neighboring blocks can be based on a number of the parameters of the warp model. As mentioned above, if the preselected warp model requires 2N parameters, then the number of selected neighboring blocks is N. In another example, the warp model selected can be based on the number of neighboring blocks that are coded using the WIntraBC mode. To illustrate, if M of the neighboring blocks are coded using the WIntraBC mode, then the warp model to obtain parameters for can be one where the number of parameters is min(2M, 8). In an example, the number of selected neighboring blocks can be at least 1 (in the case of a simple translational model) and not greater than 4 (in the case of a homographic projection).
  • FIGS. 9A-D depict different warp (or projection) models used to project pixels of a current block of a current frame to a warped (e.g., projected) patch within the same frame.
  • the warped patch can be used to generate a prediction block for encoding or decoding the current block.
  • a warp model indicates how the pixels of the current block are to be scaled, rotated, or otherwise moved when projected into the warped patch. The number and function of the parameters of a warp model depend upon the specific projection used.
  • pixels of a block 902A are projected to a warped patch 904A of a frame 900A using a homographic warp model.
  • a homographic warp model uses eight parameters to project the pixels of the block 902A to the warped patch 904A.
  • a homographic warp model is not bound by a linear transformation between the coordinates of two spaces.
  • the eight parameters that define a homographic warp model can be used to project pixels of the block 902A to a quadrilateral patch (e.g., the warped patch 904A) within the frame 900A.
  • Homographic motion models thus support translation, rotation, scaling, changes in aspect ratio, shearing, and other non-parallelogram warping.
  • a homographic warp model between two spaces is defined using equation (1), which can be rewritten as equation (2):

$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \sim
\begin{bmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{1}
$$

$$
x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad
y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}} \tag{2}
$$

  • (x', y') and (x, y) are coordinates of two spaces, namely, a projected position of a pixel within the frame 900A and an original position of a pixel within the block 902A, respectively.
  • h11 through h33 are the homographic parameters and can be real numbers representing a relationship between positions of respective pixels within the frame 900A and the block 902A.
  • the parameter h11 represents horizontal scaling - it scales the x-coordinates, but in the context of a general homography, it can also be influenced by rotation and shearing.
  • the parameter h12 typically represents horizontal shearing or a combination of rotation and scaling effects.
  • the parameter h13 is a translation along the x-axis.
  • the parameter h21 is similar to the parameter h12 and typically represents vertical shearing or a combination of rotation and scaling effects.
  • the parameter h22 relates to vertical scaling, affecting the y-coordinates and, like h11, it can also be influenced by rotation and shearing in a general homography.
  • the parameter h23 is a translation along the y-axis.
  • the parameters h31 and h32 introduce perspective distortion.
  • the parameters h31 and h32 are responsible for effects such as tilting or depth, where the transformation varies depending on the x and y positions, respectively.
  • the parameter h33 is often set to 1 in many practical applications to keep the matrix calculations consistent, especially in affine transformations.
  • the parameter h33 can affect the overall scale and perspective distortion and acts as a normalizing factor in the conversion from homogeneous coordinates to Cartesian coordinates.
  • the parameter h33 can vary, contributing to the perspective effect.
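A direct transcription of equation (2) into code, with the nine parameters passed row-major (h11 through h33); the function name is illustrative:

```python
def homography_project(h, x, y):
    """h: row-major sequence (h11, h12, h13, h21, h22, h23, h31, h32, h33).
    Return the projected point (x', y') per equation (2)."""
    w = h[6] * x + h[7] * y + h[8]          # perspective divide: h31*x + h32*y + h33
    xp = (h[0] * x + h[1] * y + h[2]) / w
    yp = (h[3] * x + h[4] * y + h[5]) / w
    return xp, yp
```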
  • pixels of a block 902B are projected to a warped patch 904B of a frame 900B using an affine motion model.
  • An affine warp model uses six parameters to project the pixels of the block 902B to the warped patch 904B.
  • An affine motion is a linear transformation between the coordinates of two spaces defined by the six parameters.
  • the six parameters that define an affine motion model can be used to project pixels of the block 902B to a parallelogram patch (e.g., the warped patch 904B) within the frame 900B.
  • Affine motion models thus support translation, rotation, scale, changes in aspect ratio, and shearing.
  • the affine projection between two spaces can be given by equation (3), which is a subset of equation (1), and which can be rewritten as equation (4):

$$
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} +
\begin{bmatrix} h_{13} \\ h_{23} \end{bmatrix} \tag{3}
$$

$$
x' = h_{11}x + h_{12}y + h_{13}, \qquad y' = h_{21}x + h_{22}y + h_{23} \tag{4}
$$

  • the coordinates (x', y') and (x, y) are as described above.
  • the parameter h11 performs scaling along the x-axis, where a value greater than 1 enlarges the object along x, while a value between 0 and 1 shrinks it.
  • the parameter h22 scales an object along the y-axis, where a value greater than 1 enlarges the object along y, and a value between 0 and 1 shrinks it.
  • the parameters h12 and h21 perform shearing, where h12 (or h21) performs horizontal (or vertical) shearing by changing the x-coordinate (or the y-coordinate) values, effectively tilting the object along the x-axis (or the y-axis).
  • the parameter h13 translates an object horizontally.
  • the parameter h23 translates an object vertically.
  • the tuple (h13, h23) corresponds to a conventional block (or motion) vector that can be used in a translational model; the parameters h11 and h22 can be used to control the scaling factors in the horizontal and vertical axes, and in conjunction with the parameters h12 and h21 decide (e.g., determine, set, etc.) a rotation angle.
  • pixels of a block 902C are projected to a warped patch 904C of a frame 900C using a similarity motion model.
  • a similarity motion model uses four parameters to project the pixels of the block 902C to the warped patch 904C.
  • a similarity motion is a linear transformation between the coordinates of two spaces defined by the four parameters.
  • the four parameters can be a translation along the x-axis, a translation along the y-axis, a rotation value, and a zoom value.
  • the four parameters that define a similarity motion model can be used to project pixels of the block 902C to a square patch (e.g., the warped patch 904C) within the frame 900C. Similarity motion models thus support square to square transformation with rotation and zoom.
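A sketch of expanding the four similarity parameters into equivalent affine coefficients; the convention (angle in radians, positive zoom factor) is an assumption:

```python
import math

def similarity_to_affine(tx, ty, theta, zoom):
    """Map (translation tx/ty, rotation theta, zoom) to affine coefficients
    (h11, h12, h13, h21, h22, h23). A pure rotation-plus-scale keeps
    squares square: h11 == h22 and h12 == -h21."""
    c = zoom * math.cos(theta)
    s = zoom * math.sin(theta)
    return (c, -s, tx, s, c, ty)
```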
  • pixels of a block 902D are projected to a warped patch 904D of a frame 900D using a translational motion model.
  • a translational motion model uses two parameters to project the pixels of the block 902D to the warped patch 904D.
  • a translational motion is a linear transformation between the coordinates of two spaces defined by the two parameters.
  • the two parameters can be a translation along the x-axis and a translation along the y-axis.
  • the two parameters that define a translational motion model can be used to project pixels of the block 902D to a square patch (e.g., the warped patch 904D) within the frame 900D.
  • Clause 1 A method for coding a current block of a current frame using a warp intra-block copy mode. The method includes selecting neighboring blocks that are coded using the warp intra-block copy mode; identifying respective pixel pairs for the neighboring blocks, where a pixel pair for a neighboring block includes a pixel of the neighboring block and a projected pixel within the current frame; obtaining, based on the pixel pairs, parameters of a warp model; and obtaining, based on the warp model, a prediction block for the current block.
  • Clause 2 The method of Clause 1, further including coding a syntax element indicating that the current block is coded using the warp intra-block copy mode.
  • Clause 3 The method of Clause 1, where the warp model is a homographic warp model, and where the parameters include parameters defining translation, rotation, scaling, changes in aspect ratio, shearing, and perspective distortion.
  • Clause 4 The method of Clause 1, where the warp model is an affine warp model having six parameters that project pixels of the current block to a parallelogram patch within the current frame.
  • Clause 5 The method of Clause 1, where the warp model is a similarity motion model having four parameters that project pixels of the current block to a square patch within the current frame.
  • Clause 6 The method of Clause 1, where a number of the neighboring blocks is based on a number of the parameters.
  • Clause 7 The method of Clause 1, where a number of the parameters is based on a number of the neighboring blocks that are coded using the warp intra-block copy mode.
  • Clause 8 The method of Clause 1, where a number of the neighboring blocks is at least 1 and not greater than 4.
  • Clause 9 The method of Clause 1, where selecting the neighboring blocks includes selecting the neighboring blocks in a clockwise fashion starting with a bottom-most left neighboring block.
  • Clause 10 The method of Clause 1, where selecting the neighboring blocks includes selecting the neighboring blocks in a counterclockwise fashion starting with a topmost right neighboring block.
  • Clause 11 The method of Clause 1, where selecting the neighboring blocks is based on characteristics of warp models associated with the neighboring blocks.
  • Clause 12 The method of Clause 1, where selecting the neighboring blocks includes determining average values of horizontal and vertical components of translational parameters across available neighboring blocks coded using the warp intra-block copy mode; and selecting the neighboring blocks based on an error metric comparing translational parameters of individual neighboring blocks to the average values.
  • Clause 13 The method of Clause 1, where at least one of the neighboring blocks is associated with a zero block vector.
  • Clause 14 The method of Clause 1, where for each neighboring block, the pixel of the neighboring block comprises a center pixel of the neighboring block.
  • Clause 15 The method of Clause 1, where the projected pixel within the current frame is obtained using a block vector associated with the neighboring block.
  • Clause 16 The method of Clause 1, where the projected pixel within the current block is obtained using a warp model associated with the neighboring block.
  • Clause 17 A device that includes a processor that is configured to perform the method of any of clauses 1-16.
  • Clause 18 A device that includes a memory and a processor.
  • the processor is configured to execute instructions stored in the memory to perform the method of any of clauses 1-16.
  • Clause 19 A non-transitory computer-readable storage medium that includes executable instructions that, when executed by a processor, facilitate performance of operations, including operations that perform the method of any of clauses 1-16.
  • Clause 20 A non-transitory computer-readable storage medium having stored thereon an encoded bitstream, wherein the encoded bitstream is configured for decoding by the method of any of clauses 1-16.
  • Clause 21 A non-transitory computer-readable storage medium having stored thereon an encoded bitstream, wherein the encoded bitstream is generated by an encoder performing the method of any of clauses 1-16.
  • The word “example” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “example” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word “example” is intended to present concepts in a concrete fashion.
  • the term “or” is intended to mean an inclusive “or” rather than an exclusive “or.” That is, unless specified otherwise, or clear from context, “X includes A or B” is intended to mean any of the natural inclusive permutations. That is, if X includes A; X includes B; or X includes both A and B, then “X includes A or B” is satisfied under any of the foregoing instances.
  • Implementations of the transmitting station 102 and/or the receiving station 106 can be realized in hardware, software, or any combination thereof.
  • the hardware can include, for example, computers, intellectual property (IP) cores, application- specific integrated circuits (ASICs), programmable logic arrays, optical processors, programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors or any other suitable circuit.
  • the term “processor” should be understood as encompassing any of the foregoing hardware, either singly or in combination.
  • the terms “signal” and “data” are used interchangeably. Further, portions of the transmitting station 102 and the receiving station 106 do not necessarily have to be implemented in the same manner.
  • the transmitting station 102 or the receiving station 106 can be implemented using a general-purpose computer or general-purpose processor with a computer program that, when executed, carries out any of the respective methods, algorithms and/or instructions described herein.
  • a special purpose computer/processor can be utilized which can contain other hardware for carrying out any of the methods, algorithms, or instructions described herein.
  • the transmitting station 102 and the receiving station 106 can, for example, be implemented on computers in a video conferencing system.
  • the transmitting station 102 can be implemented on a server and the receiving station 106 can be implemented on a device separate from the server, such as a hand-held communications device.
  • the transmitting station 102 can encode content using an encoder 400 into an encoded video signal and transmit the encoded video signal to the communications device.
  • the communications device can then decode the encoded video signal using a decoder 500.
  • the communications device can decode content stored locally on the communications device, for example, content that was not transmitted by the transmitting station 102.
  • the receiving station 106 can be a generally stationary personal computer rather than a portable communications device and/or a device including an encoder 400 may also include a decoder 500.
  • implementations of the present disclosure can take the form of a computer program product accessible from, for example, a computer-usable or computer-readable medium.
  • a computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor.
  • the medium can be, for example, an electronic, magnetic, optical, electromagnetic, or a semiconductor device. Other suitable mediums are also available.


Abstract

Coding a current block of a current frame using a warp intra-block copy mode is disclosed. Neighboring blocks that are coded using the warp intra-block copy mode are selected. Respective pixel pairs for the neighboring blocks are identified. A pixel pair for a neighboring block includes a pixel of the neighboring block and a projected pixel within the current frame. Parameters of a warp model are obtained based on the pixel pairs. Based on the warp model, a prediction block is obtained for the current block.

Description

WARP INTRA BLOCK COPY
BACKGROUND
[0001] Digital video streams may represent video using a sequence of frames or still images. Digital video can be used for various applications including, for example, video conferencing, high-definition video entertainment, video advertisements, or sharing of user-generated videos. A digital video stream can contain a large amount of data and consume a significant amount of computing or communication resources of a computing device for processing, transmission, or storage of the video data. Various approaches have been proposed to reduce the amount of data in video streams, including compression and other coding techniques. These techniques may include both lossy and lossless coding techniques.
SUMMARY
[0002] This disclosure relates generally to encoding and decoding video data and more particularly relates to a warp intra block copy.
[0003] An aspect of the disclosed implementations is a method of coding a current block of a current frame using a warp intra-block copy mode. The method includes selecting neighboring blocks that are coded using the warp intra-block copy mode; identifying respective pixel pairs for the neighboring blocks, where a pixel pair for a neighboring block includes a pixel of the neighboring block and a projected pixel within the current frame; obtaining, based on the pixel pairs, parameters of a warp model; and obtaining, based on the warp model, a prediction block for the current block. Implementations may include one or more of the following features.
[0004] The method may include coding a syntax element indicating that the current block is coded using the warp intra-block copy mode.
[0005] The warp model can be a homographic warp model, and the parameters may include parameters defining translation, rotation, scaling, changes in aspect ratio, shearing, and perspective distortion.
[0006] The warp model can be an affine warp model having six parameters that project pixels of the current block to a parallelogram patch within the current frame.
[0007] The warp model can be a similarity motion model having four parameters that project pixels of the current block to a square patch within the current frame.
[0008] A number of the neighboring blocks can be based on a number of the parameters.
[0009] A number of the parameters can be based on a number of the neighboring blocks that are coded using the warp intra-block copy mode.
[0010] A number of the neighboring blocks can be at least 1 and not greater than 4.
[0011] Selecting the neighboring blocks may include selecting the neighboring blocks in a clockwise fashion starting with a bottom-most left neighboring block.
[0012] Selecting the neighboring blocks may include selecting the neighboring blocks in a counterclockwise fashion starting with a top-most right neighboring block.
[0013] Selecting the neighboring blocks can be based on characteristics of warp models associated with the neighboring blocks.
[0014] Selecting the neighboring blocks may include determining average values of horizontal and vertical components of translational parameters across available neighboring blocks coded using the warp intra-block copy mode; and selecting the neighboring blocks based on an error metric comparing translational parameters of individual neighboring blocks to the average values.
[0015] At least one of the neighboring blocks can be associated with a zero block vector.
[0016] For each neighboring block, the pixel of the neighboring block may include a center pixel of the neighboring block.
[0017] The projected pixel within the current frame can be obtained using a block vector associated with the neighboring block.
[0018] The projected pixel within the current frame can be obtained using a warp model associated with the neighboring block.
[0019] It will be appreciated that aspects can be implemented in any convenient form. For example, aspects may be implemented by appropriate computer programs which may be carried on appropriate carrier media which may be tangible carrier media (e.g., disks) or intangible carrier media (e.g., communications signals). Aspects may also be implemented using suitable apparatus which may take the form of programmable computers running computer programs arranged to implement the methods and/or techniques disclosed herein. For example, a non-transitory computer-readable storage medium may include executable instructions that, when executed by a processor, facilitate performance of operations operable to cause the processor to carry out any of the methods described herein. Aspects can be combined such that features described in the context of one aspect may be implemented in another aspect.
[0020] These and other aspects of the present disclosure are disclosed in the following detailed description of the embodiments, the appended claims, and the accompanying figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] The description herein refers to the accompanying drawings described below wherein like reference numerals refer to like parts throughout the several views.
[0022] FIG. 1 is a schematic of a video encoding and decoding system.
[0023] FIG. 2 is a block diagram of an example of a computing device that can implement a transmitting station or a receiving station.
[0024] FIG. 3 is a diagram of an example of a video stream to be encoded and subsequently decoded.
[0025] FIG. 4 is a block diagram of an encoder.
[0026] FIG. 5 is a block diagram of a decoder.
[0027] FIG. 6 is a block diagram illustrating the conventional IntraBC mode.
[0028] FIG. 7 is a flowchart of an example of a technique for coding a current block using the warp intra-block copy mode.
[0029] FIG. 8 illustrates neighboring blocks of a current block of a current frame.
[0030] FIGS. 9A-D depict different warp (or projection) models used to project pixels of a current block of a current frame to a warped (e.g., projected) patch within the same frame.
DETAILED DESCRIPTION
[0031] As mentioned above, compression schemes related to coding video streams may include breaking images (i.e., original or source images) into blocks and generating a digital video output bitstream using one or more techniques to limit the information included in the output. A received encoded bitstream can be decoded to re-create the blocks and the source images from the limited information. Encoding a video stream, or a portion thereof, such as a frame or a block, can include using temporal or spatial similarities in the video stream to improve coding efficiency. For example, a current block of a video stream may be encoded based on identifying a difference (residual) between previously coded pixel values and those in the current block. In this way, only the residual and parameters used to generate the residual need be added to the encoded bitstream. The residual may be encoded using a lossy quantization step. Decoding (i.e., reconstructing) an encoded block from such a residual often results in a distortion between the original and the reconstructed block.
[0032] Encoding using temporal similarities is known as inter prediction. Inter prediction uses a motion vector that represents the temporal displacement of a previously coded block relative to the current block. The motion vector can be identified using a method of motion estimation, such as a motion search. In the motion search, a portion of a reference frame can be translated to a succession of locations to form a predictor block that can be subtracted from a portion of a current frame to form a series of residuals. The horizontal and/or vertical translations corresponding to the location having, e.g., the smallest residual can be selected as the motion vector. The motion vector can be encoded in the encoded bitstream along with an indication of the reference frame.
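By way of illustration only, the following Python sketch shows a full motion search of the kind described above; the function name, the grayscale frame layout, and the +/-8 search window are assumptions and not part of any codec specification.

    import numpy as np

    def motion_search(cur_block, ref_frame, bx, by, search_range=8):
        # Full search over a +/- search_range window around (bx, by); returns
        # the (dy, dx) offset with the smallest sum of absolute differences (SAD).
        h, w = cur_block.shape
        best_offset, best_sad = (0, 0), float("inf")
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                y, x = by + dy, bx + dx
                # Skip candidate locations that fall outside the reference frame.
                if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                    continue
                patch = ref_frame[y:y + h, x:x + w].astype(np.int32)
                sad = np.abs(cur_block.astype(np.int32) - patch).sum()
                if sad < best_sad:
                    best_sad, best_offset = sad, (dy, dx)
        return best_offset  # the motion vector, as (vertical, horizontal) components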
[0033] Encoding using spatial similarities is known as intra prediction. Using an intraprediction mode, intra prediction can attempt to predict the pixel values of a current block of a current frame of a video stream using pixels peripheral to the current block. The pixels peripheral to the current block are pixels within the current frame that are outside the current block. The pixels peripheral to the block can be pixels adjacent to the current block. Which pixels peripheral to the block are used can depend on the intra-prediction mode and/or a scan order of the blocks of a frame. For example, in a raster scan order, peripheral pixels above a current block (i.e., the block being encoded or decoded) and/or peripheral pixels to the left of the current block may be used.
[0034] Intra Block Copy (IntraBC) is a technique that is useful in certain use cases, such as in coding screen-captured content, and is particularly effective in scenarios involving repeated patterns (e.g., characters, sharp edges, etc.) within a video frame. The operation of IntraBC, as elaborated in relation to FIG. 6, is based on a translational model.
[0035] However, the scope of repeated patterns in a frame often extends beyond mere translational motion (or shift). As such, Intra Block Copy (IntraBC) may be somewhat limited when it comes to handling the variety of possible pattern transformations within a video frame and cannot capture more complex pattern projections. For example, using the IntraBC mode, it is not possible to identify and use rotations, where patterns undergo angular changes; zooming, where patterns experience changes in scale; and shearing transformations, involving distortions that skew the patterns.
[0036] Implementations according to this disclosure more accurately address the wide range of possible pattern transformations observed in screen-captured content. Whereas IntraBC uses only translational models, the intra-frame mode described herein (i.e., “warp intra-block copy mode” or WIntraBC, for short) improves prediction accuracy by identifying matches based on warp (i.e., projection or transformation) models that are more sophisticated than simple translation, thereby enabling a more comprehensive and versatile approach to pattern replication and manipulation in screen-captured content.
[0037] Additionally, and as further described with respect to FIG. 6, in the IntraBC mode, a block vector (BV) is encoded in a compressed bitstream. The BV essentially codes the two parameters (i.e., a horizontal and a vertical offset) of the translational model. However, in the WIntraBC mode, and as further described herein, the parameters of the warp model need not be coded in the compressed bitstream. Rather, when decoding (e.g., reconstructing) a current block, a decoder derives these parameters based on sample pairs derived from neighboring blocks of the current block. As such, compression efficiency can be improved.
[0038] At a high level, in the WIntraBC mode, an encoder may signal (i.e., encode in a compressed bitstream) and a decoder may decode from the compressed bitstream one or more syntax elements indicating that a current block is encoded using the WIntraBC mode. Both the encoder and the decoder implement the same process for deriving the parameters of a warp model used for coding the current block.
[0039] Further details of warp intra block copy are described herein with initial reference to a system in which it can be implemented. FIG. 1 is a schematic of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware such as that described in FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices.
[0040] A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102 and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106.
[0041] The receiving station 106, in one example, can be a computer having an internal configuration of hardware such as that described in FIG. 2. However, other suitable implementations of the receiving station 106 are possible. For example, the processing of the receiving station 106 can be distributed among multiple devices.
[0042] Other implementations of the video encoding and decoding system 100 are possible. For example, an implementation can omit the network 104. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or any other device having memory. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream and stores the video stream for later decoding. In an example implementation, a real-time transport protocol (RTP) is used for transmission of the encoded video over the network 104. In another implementation, a transport protocol other than RTP may be used, e.g., a Hypertext Transfer Protocol (HTTP) video streaming protocol.
[0043] When used in a video conferencing system, for example, the transmitting station 102 and/or the receiving station 106 may include the ability to both encode and decode a video stream as described below. For example, the receiving station 106 could be a video conference participant who receives an encoded video bitstream from a video conference server (e.g., the transmitting station 102) to decode and view and further encodes and transmits its own video bitstream to the video conference server for decoding and viewing by other participants.
[0044] FIG. 2 is a block diagram of an example of a computing device 200 (e.g., an apparatus) that can implement a transmitting station or a receiving station. For example, the computing device 200 can implement one or both of the transmitting station 102 and the receiving station 106 of FIG. 1. The computing device 200 can be in the form of a computing system including multiple computing devices, or in the form of one computing device, for example, a mobile phone, a tablet computer, a laptop computer, a notebook computer, a desktop computer, and the like.
[0045] A CPU 202 in the computing device 200 can be a conventional central processing unit. Alternatively, the CPU 202 can be any other type of device, or multiple devices, capable of manipulating or processing information now existing or hereafter developed. Although the disclosed implementations can be practiced with one processor as shown, e.g., the CPU 202, advantages in speed and efficiency can be achieved using more than one processor.
[0046] A memory 204 in computing device 200 can be a read only memory (ROM) device or a random-access memory (RAM) device in an implementation. Any other suitable type of storage device can be used as the memory 204. The memory 204 can include code and data 206 that is accessed by the CPU 202 using a bus 212. The memory 204 can further include an operating system 208 and application programs 210, the application programs 210 including at least one program that permits the CPU 202 to perform the techniques described here. For example, the application programs 210 can include applications 1 through N, which further include a video coding application that performs the techniques described here. Computing device 200 can also include a secondary storage 214, which can, for example, be a memory card used with a mobile computing device. Because the video communication sessions may contain a significant amount of information, they can be stored in whole or in part in the secondary storage 214 and loaded into the memory 204 as needed for processing.
[0047] The computing device 200 can also include one or more output devices, such as a display 218. The display 218 may be, in one example, a touch sensitive display that combines a display with a touch sensitive element that is operable to sense touch inputs. The display 218 can be coupled to the CPU 202 via the bus 212. Other output devices that permit a user to program or otherwise use the computing device 200 can be provided in addition to or as an alternative to the display 218. When the output device is or includes a display, the display can be implemented in various ways, including by a liquid crystal display (LCD), a cathode-ray tube (CRT) display or light emitting diode (LED) display, such as an organic LED (OLED) display.
[0048] The computing device 200 can also include or be in communication with an image-sensing device 220, for example a camera, or any other image-sensing device 220 now existing or hereafter developed that can sense an image such as the image of a user operating the computing device 200. The image-sensing device 220 can be positioned such that it is directed toward the user operating the computing device 200. In an example, the position and optical axis of the image-sensing device 220 can be configured such that the field of vision includes an area that is directly adjacent to the display 218 and from which the display 218 is visible.
[0049] The computing device 200 can also include or be in communication with a sound-sensing device 222, for example a microphone, or any other sound-sensing device now existing or hereafter developed that can sense sounds near the computing device 200. The sound-sensing device 222 can be positioned such that it is directed toward the user operating the computing device 200 and can be configured to receive sounds, for example, speech or other utterances, made by the user while the user operates the computing device 200.
[0050] Although FIG. 2 depicts the CPU 202 and the memory 204 of the computing device 200 as being integrated into one unit, other configurations can be utilized. The operations of the CPU 202 can be distributed across multiple machines (wherein individual machines can have one or more processors) that can be coupled directly or across a local area or other network. The memory 204 can be distributed across multiple machines such as a network-based memory or memory in multiple machines performing the operations of the computing device 200. Although depicted here as one bus, the bus 212 of the computing device 200 can be composed of multiple buses. Further, the secondary storage 214 can be directly coupled to the other components of the computing device 200 or can be accessed via a network and can comprise an integrated unit such as a memory card or multiple units such as multiple memory cards. The computing device 200 can thus be implemented in a wide variety of configurations.
[0051] FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes a number of adjacent frames 304. While three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304. The adjacent frames 304 can then be further subdivided into individual frames, e.g., a frame 306. At the next level, the frame 306 can be divided into a series of planes or segments 308. The segments 308 can be subsets of frames that permit parallel processing, for example. The segments 308 can also be subsets of frames that can separate the video data into separate colors. For example, a frame 306 of color video data can include a luminance plane and two chrominance planes. The segments 308 may be sampled at different resolutions.
[0052] Whether or not the frame 306 is divided into segments 308, the frame 306 may be further subdivided into blocks 310, which can contain data corresponding to, for example, 16x16 pixels in the frame 306. The blocks 310 can also be arranged to include data from one or more segments 308 of pixel data. The blocks 310 can also be of any other suitable size such as 4x4 pixels, 8x8 pixels, 16x8 pixels, 8x16 pixels, 16x16 pixels, or larger. Unless otherwise noted, the terms block and macro-block are used interchangeably herein.
[0053] FIG. 4 is a block diagram of an encoder 400. The encoder 400 can be implemented, as described above, in the transmitting station 102 such as by providing a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor such as the CPU 202, cause the transmitting station 102 to encode video data in the manner described in FIG. 4. The encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102. In one particularly desirable implementation, the encoder 400 is a hardware encoder.
[0054] The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408. The encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks. In FIG. 4, the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. Other structural variations of the encoder 400 can be used to encode the video stream 300.
[0055] When the video stream 300 is presented for encoding, respective frames 304, such as the frame 306, can be processed in units of blocks. At the intra/inter prediction stage 402, respective blocks can be encoded using intra-frame prediction (also called intra-prediction) or inter-frame prediction (also called inter-prediction). In any case, a prediction block can be formed. In the case of intra-prediction, a prediction block may be formed from samples in the current frame that have been previously encoded and reconstructed. In the case of inter-prediction, a prediction block may be formed from samples in one or more previously constructed reference frames.
[0056] Next, still referring to FIG. 4, the prediction block can be subtracted from the current block at the intra/inter prediction stage 402 to produce a residual block (also called a residual). The transform stage 404 transforms the residual into transform coefficients in, for example, the frequency domain using block-based transforms. The quantization stage 406 converts the transform coefficients into discrete quantum values, which are referred to as quantized transform coefficients, using a quantizer value or a quantization level. For example, the transform coefficients may be divided by the quantizer value and truncated. The quantized transform coefficients are then entropy encoded by the entropy encoding stage 408. The entropy-encoded coefficients, together with other information used to decode the block, which may include, for example, the type of prediction used, transform type, motion vectors (MVs), and quantizer value, are then output to the compressed bitstream 420. The compressed bitstream 420 can be formatted using various techniques, such as variable length coding (VLC) or arithmetic coding. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream, and the terms will be used interchangeably herein.
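A minimal Python sketch of the scalar quantization example above (divide by the quantizer value and truncate); the function names are assumptions, and real codecs use more elaborate rounding, dead zones, and rate control.

    import numpy as np

    def quantize(coeffs, q):
        # Divide by the quantizer value and truncate toward zero.
        return np.trunc(coeffs / q).astype(np.int32)

    def dequantize(qcoeffs, q):
        # Approximate inverse used by the reconstruction path and the decoder.
        return qcoeffs * q

Because of the truncation, dequantize(quantize(c, q), q) generally differs from c; this is the lossy step mentioned above.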
[0057] The reconstruction path in FIG. 4 (shown by the dotted connection lines) can be used to ensure that the encoder 400 and a decoder 500 (described below) use the same reference frames to decode the compressed bitstream 420. The reconstruction path performs functions that are similar to functions that take place during the decoding process that are discussed in more detail below, including dequantizing the quantized transform coefficients at the dequantization stage 410 and inverse transforming the dequantized transform coefficients at the inverse transform stage 412 to produce a derivative residual block (also called a derivative residual). At the reconstruction stage 414, the prediction block that was predicted at the intra/inter prediction stage 402 can be added to the derivative residual to create a reconstructed block. The loop filtering stage 416 can be applied to the reconstructed block to reduce distortion such as blocking artifacts.
[0058] Other variations of the encoder 400 can be used to encode the compressed bitstream 420. For example, a non-transform-based encoder can quantize the residual signal directly without the transform stage 404 for certain blocks or frames. In another implementation, an encoder can have the quantization stage 406 and the dequantization stage 410 combined in a common stage.
[0059] FIG. 5 is a block diagram of a decoder 500. The decoder 500 can be implemented in the receiving station 106, for example, by providing a computer software program stored in the memory 204. The computer software program can include machine instructions that, when executed by a processor such as the CPU 202, cause the receiving station 106 to decode video data in the manner described in FIG. 5. The decoder 500 can also be implemented in hardware included in, for example, the transmitting station 102 or the receiving station 106.
[0060] The decoder 500, similar to the reconstruction path of the encoder 400 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 516 from the compressed bitstream 420: an entropy decoding stage 502, a dequantization stage 504, an inverse transform stage 506, an intra/inter prediction stage 508, a reconstruction stage 510, a loop filtering stage 512 and a post-loop filtering stage 514. Other structural variations of the decoder 500 can be used to decode the compressed bitstream 420.
[0061] When the compressed bitstream 420 is presented for decoding, the data elements within the compressed bitstream 420 can be decoded by the entropy decoding stage 502 to produce a set of quantized transform coefficients. The dequantization stage 504 dequantizes the quantized transform coefficients (e.g., by multiplying the quantized transform coefficients by the quantizer value), and the inverse transform stage 506 inverse transforms the dequantized transform coefficients to produce a derivative residual that can be identical to that created by the inverse transform stage 412 in the encoder 400. Using header information decoded from the compressed bitstream 420, the decoder 500 can use the intra/inter prediction stage 508 to create the same prediction block as was created in the encoder 400, e.g., at the intra/inter prediction stage 402. At the reconstruction stage 510, the prediction block can be added to the derivative residual to create a reconstructed block. The loop filtering stage 512 can be applied to the reconstructed block to reduce blocking artifacts.
[0062] Other filtering can be applied to the reconstructed block. In this example, the post-loop filtering stage 514 is applied to the reconstructed block to reduce blocking distortion, and the result is output as the output video stream 516. The output video stream 516 can also be referred to as a decoded video stream, and the terms will be used interchangeably herein. Other variations of the decoder 500 can be used to decode the compressed bitstream 420. For example, the decoder 500 can produce the output video stream 516 without the post-loop filtering stage 514.
[0063] FIG. 6 is a block diagram illustrating the conventional IntraBC mode. FIG. 6 illustrates a portion of a current frame 600 being coded and a current block 602 within the current frame 600. The IntraBC mode may be limited to intra-coded frames.
[0064] In the IntraBC mode, an encoder (e.g., the encoder 400 of FIG. 4) searches at least a subset of a reconstructed area 603 of the current frame 600 to identify a reference block (e.g., a reference block 604) that best matches the current block 602. The reference block 604 is used as a prediction block for the current block 602. A difference (i.e., residual) between the current block 602 and the reference block 604 is encoded in a compressed bitstream (e.g., the compressed bitstream 420 of FIG. 4) along with a BV, such as a block vector 606. As such, IntraBC can be considered “motion compensation” within the same frame (e.g., the current frame 600), essentially using the BV as a motion vector. The encoder encodes a flag (e.g., an IntraBC flag) indicating that the current block 602 is encoded using the IntraBC mode and also encodes the BV (i.e., the horizontal, BVx, and the vertical, BVy, components thereof).
[0065] When decoding the current block 602, a decoder, such as the decoder 500 of FIG. 5, decodes the IntraBC flag to determine whether the current block 602 is to be decoded using the IntraBC mode. If so, then the decoder decodes the block vector 606 from the compressed bitstream to identify the reference block 604, thereby obtaining the prediction block for the current block 602. A residual block can then be decoded from the compressed bitstream to add to the prediction block, thereby reconstructing the current block 602.
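For illustration, a minimal Python sketch of this translational copy; the variable names are assumptions, and a conforming decoder must additionally validate that the block vector points into the already reconstructed area.

    def intrabc_predict(recon_frame, x, y, w, h, bvx, bvy):
        # Copy the w-by-h patch that the block vector points to within the
        # same (partially reconstructed) frame; this patch is the prediction.
        rx, ry = x + bvx, y + bvy
        return recon_frame[ry:ry + h, rx:rx + w].copy()

The residual decoded from the bitstream is then added to this prediction to reconstruct the block.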
[0066] FIG. 7 is a flowchart of an example of a technique 700 for coding a current block using the warp intra-block copy mode. The technique 700 is further explained with reference to FIGS. 8 and 9A-9D. FIGS. 9A-9D provide generic descriptions of different warp models that may be encompassed (e.g., derived or used) when the current block is encoded using the WIntraBC mode.
[0067] The technique 700 can be implemented, for example, as a software program that may be executed by computing devices such as transmitting station 102 or receiving station 106. The software program can include machine-readable instructions that may be stored in a memory such as the memory 204 or the secondary storage 214, and that, when executed by a processor, such as CPU 202, may cause the computing device to perform the technique 700. The technique 700 may be implemented in whole or in part in the intra/inter prediction stage 402 of the encoder 400 or in the intra/inter prediction stage 508 of the decoder 500 of FIG. 5. The technique 700 can be implemented using specialized hardware or firmware. Multiple processors, memories, or both, may be used.
[0068] For simplicity of explanation, the technique 700 of FIG. 7 is depicted and described as a series of steps or operations. However, the steps or operations in accordance with this disclosure can occur in various orders and/or concurrently. Additionally, other steps or operations not presented and described herein may be used. Furthermore, not all illustrated steps or operations may be required to implement a method in accordance with the disclosed subject matter.
[0069] The technique 700 derives parameters of a warp model that is used to derive a prediction block for the current block. The warp model can have 2, 4, 6, 8, or some other number of parameters. When the technique 700 is implemented in an encoder, “coding” means encoding into a compressed bitstream; and when implemented by a decoder, “coding” means decoding from a compressed bitstream.
[0070] At 702, neighboring blocks of the current block that are coded using the WIntraBC mode are selected. As a general rule, if the warp model requires 2N parameters, then the number of selected neighboring blocks is N.
[0071] FIG. 8 illustrates neighboring blocks of a current block 802 of a current frame 800. The blocks 804 through 814 are all of the neighboring blocks of the current block 802. However, not all of these blocks may be selected at 702 of FIG. 7. Additionally, while FIG. 8 illustrates one arrangement of all of the neighboring blocks, other arrangements are possible depending on the block partitioning scheme determined by the encoder. To reiterate, those neighboring blocks that are used in deriving the warp model parameters are referred to as selected neighboring blocks.
[0072] FIG. 8 illustrates that only blocks 804, 806, 810, and 814 are coded using the WIntraBC mode. As such, the neighboring blocks selected at 702 of FIG. 7 may be a subset of the blocks 804, 806, 810, and 814.
[0073] In an example, the number of selected neighboring blocks can depend on the warp model. In an example, the encoder and decoder may be configured to use a preselected warp model (i.e., a model type). To illustrate, and without limitations, the preselected warp model may be the affine model. As the affine model requires six parameters, three neighboring blocks are selected. More generally, the number (e.g., 3) of neighboring blocks can be half of the number (e.g., 6) of parameters of the preselected warp model. In an example, the encoder may encode a syntax element (e.g., warp_index) indicating the preselected warp model (e.g., the model type). The decoder decodes the syntax element to determine the number of parameters to derive.
[0074] In an example, if more neighboring blocks than required are available, then a selection process may be applied to select the subset of the neighboring blocks. In an example, the selection process may be to select the neighboring blocks in a clockwise fashion starting with a bottom-most left block. As such, in this case and assuming the warp model is the affine model, the blocks 804, 806, and 810 would be selected. In another example, a counterclockwise fashion starting with the top-most right block (e.g., the block 814) can be used.
[0075] The selection process may be based on the characteristics of warp models associated with the selected neighboring blocks. In an example, the translational parameters of the warp models may be used as the selection metric. As such, the average (or mean) values of both the horizontal (i.e., the parameter h13 described below) and vertical (i.e., the parameter h23 described below) components of the translational parameters across all blocks are calculated. With these average values established, the selection process can then be used to identify those neighboring blocks whose translational parameters - both horizontal and vertical components - most closely align with these average values based on an error metric, such as a mean square error (MSE), or some other error metric. To illustrate, and again assuming an affine model, the selection process can be used to select three of the neighboring blocks 804, 806, 810, and 814.
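The following Python sketch illustrates this selection; the candidate data layout and the use of squared error as the metric are assumptions.

    import numpy as np

    def select_neighbors(candidates, k):
        # candidates: list of (block_id, h13, h23) for the available neighbors
        # coded in the warp intra-block copy mode. Returns the k block ids
        # whose translations are closest (squared error) to the mean translation.
        t = np.array([[h13, h23] for _, h13, h23 in candidates], dtype=np.float64)
        err = ((t - t.mean(axis=0)) ** 2).sum(axis=1)
        order = np.argsort(err)
        return [candidates[i][0] for i in order[:k]]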
[0076] In some situations, there may not be a sufficient number of blocks coded using the WIntraBC mode such that the number of blocks is half of the number of parameters. To illustrate, the preselected warp model may be the homographic warp model (requiring 8 parameters, with h33 set to 1 as described with respect to FIG. 9A) but only 3 blocks were coded using the warp intra-block copy mode. As such, some other blocks may be assumed to be associated with a zero block vector, as illustrated below with respect to the block 808.
[0077] At 704, respective pixel pairs for the neighboring blocks (i.e., the selected neighboring blocks) are identified. A pixel pair for a neighboring block includes a pixel of the neighboring block and a projected pixel within the current frame. In an example, the projected pixels can be identified using the warp models associated with the blocks. In an example, the projected pixels can be identified using only the translational components of the warp models. To identify a pair of pixels, one pixel at a location (x, y) from a neighboring block is selected and a corresponding projection pixel (x', y') based on the warp model (or a subset thereof) of the neighboring block is identified. In an example, the selected pixel can be the center pixel of the neighboring block.
[0078] To illustrate, a pixel 816A of the neighboring block 804 is projected using the warp model associated with the block 804 to obtain a pixel 816B; a pixel 818A of the neighboring block 806 is projected using the warp model associated with the block 806 to obtain a pixel 818B; and so on. In the case that a zero BV is associated with a neighboring block, then the selected pixel and the projection pixel are considered the same, as illustrated with respect to a selected pixel 820A and projected pixel 820B of the block 808.
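A short Python sketch of this pixel-pair identification; the neighbor block structure and its attribute names are assumptions introduced for illustration.

    def pixel_pair(nb):
        # Returns ((x, y), (x', y')) for a neighboring block nb, using the
        # block's center pixel and the warp model (or block vector) already
        # associated with that neighbor. A zero block vector projects the
        # pixel onto itself, as with the block 808 in FIG. 8.
        x = nb.x0 + nb.width // 2
        y = nb.y0 + nb.height // 2
        if nb.block_vector is not None:                 # translational neighbor
            bvx, bvy = nb.block_vector
            return (x, y), (x + bvx, y + bvy)
        h11, h12, h13, h21, h22, h23 = nb.warp_params   # affine neighbor
        return (x, y), (h11 * x + h12 * y + h13, h21 * x + h22 * y + h23)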
[0079] Accordingly, a set of original pixel coordinates (x_i, y_i) and projected pixel coordinates (x_i', y_i') are obtained, where i = 1, ..., N and N is the number of selected neighboring blocks.
[0080] At 706, parameters of a warp model are obtained based on the pixel pairs. In one example, a set of linear equations can be formulated based on the pixel pairs and solved. To illustrate, and assuming that the warp model is a six parameter model (e.g., the affine model), then each pixel pair contributes 2 equations, obtained using equation (4) below, to a system of linear equations that can be assembled into a matrix system and solved using standard techniques in linear algebra. The system of linear equations may be given by:
$$
\begin{bmatrix}
x_1 & y_1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & x_1 & y_1 & 1 \\
x_2 & y_2 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & x_2 & y_2 & 1 \\
x_3 & y_3 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & x_3 & y_3 & 1
\end{bmatrix}
\begin{bmatrix} h_{11} \\ h_{12} \\ h_{13} \\ h_{21} \\ h_{22} \\ h_{23} \end{bmatrix}
=
\begin{bmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \\ x_3' \\ y_3' \end{bmatrix}
$$
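In code, assembling and solving this system is direct; a minimal numpy sketch follows, where least squares is used so that the same routine also handles an overdetermined system when more pairs than strictly necessary are available (an assumption for generality).

    import numpy as np

    def solve_affine(pairs):
        # pairs: list of ((x, y), (xp, yp)); each pair contributes the two
        # rows of equation (4). Three pairs give an exactly determined 6x6
        # system; more pairs give a least-squares fit.
        A, b = [], []
        for (x, y), (xp, yp) in pairs:
            A.append([x, y, 1, 0, 0, 0]); b.append(xp)
            A.append([0, 0, 0, x, y, 1]); b.append(yp)
        params, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
        return params  # (h11, h12, h13, h21, h22, h23)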
[0081] At 708, a prediction block for the current block is obtained using the obtained warp model. That is, each pixel location of the current block 802 is projected onto a patch 822 to obtain the corresponding predicted pixel. The patch 822 illustrates the shape of the prediction “block” of the current block 802 using the warp model having the obtained parameters h11 through h23.
[0082] The technique 700 can include coding a syntax element (e.g., a flag) indicating that the current block is coded using the WIntraBC mode. That is, when implemented by an encoder, the syntax element is encoded in the compressed bitstream; and when implemented by a decoder, the syntax element is decoded from the compressed bitstream. In response to determining that the current block is encoded using the warp intra-block copy mode, the encoder and the decoder perform 702-708 of the technique 700.
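Returning to the prediction step at 708, a Python sketch of generating the prediction block; this is illustrative only - nearest-neighbor sampling is an assumption, and a production codec would use subpixel interpolation filters.

    import numpy as np

    def warp_predict(recon_frame, x0, y0, w, h, p):
        # Project every pixel location of the w-by-h current block at (x0, y0)
        # with the affine parameters p = (h11, h12, h13, h21, h22, h23) and
        # sample the reconstructed area of the same frame.
        h11, h12, h13, h21, h22, h23 = p
        pred = np.empty((h, w), dtype=recon_frame.dtype)
        for j in range(h):
            for i in range(w):
                x, y = x0 + i, y0 + j
                xp = h11 * x + h12 * y + h13
                yp = h21 * x + h22 * y + h23
                # Nearest-neighbor sample, clamped to the frame bounds.
                xi = min(max(int(round(xp)), 0), recon_frame.shape[1] - 1)
                yi = min(max(int(round(yp)), 0), recon_frame.shape[0] - 1)
                pred[j, i] = recon_frame[yi, xi]
        return pred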
[0083] The encoder may determine that the current block is to be encoded using the WIntraBC mode based on rate-distortion optimization. A rate-distortion (R-D) optimization represents the fundamental trade-off between compression efficiency (rate, measured in bits) and quality loss (distortion) in video encoding. This optimization process is typically formulated using a Lagrangian cost function: J = D + λR, where J is the cost to minimize, D is the distortion (quality loss), R is the rate (bits), and λ (lambda) is the Lagrangian multiplier that weights the relative importance of rate versus distortion. The optimization process occurs at multiple levels in the encoding pipeline, including coding unit decisions, transform selections, and quantization parameters. When selecting between available coding modes (such as intra prediction, inter prediction, and WIntraBC), the encoder calculates distortion using metrics like Sum of Absolute Differences (SAD), determines the rate including both mode signaling and residual coding bits, and computes the combined R-D cost. For computational efficiency, the encoder may employ various optimization strategies including early termination of mode searches, content-aware mode filtering, adaptive thresholds to skip testing unlikely modes, and statistical tracking of mode usage patterns, ultimately selecting the mode that minimizes the R-D cost function while respecting bitrate and quality constraints.
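A toy Python mode-decision loop illustrating the Lagrangian cost; the candidate modes, distortion values, rates, and lambda are made-up numbers for illustration only.

    def pick_mode(candidates, lam):
        # candidates: iterable of (mode_name, distortion, rate_bits);
        # returns the one minimizing J = D + lambda * R.
        return min(candidates, key=lambda c: c[1] + lam * c[2])

    modes = [("intra_dc", 1200.0, 40), ("intrabc", 700.0, 65), ("wintrabc", 450.0, 58)]
    best = pick_mode(modes, lam=8.0)  # -> ("wintrabc", 450.0, 58)

Here WIntraBC wins because its distortion saving outweighs its slightly higher rate at the given lambda.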
[0084] In an example, the number of the selected neighboring blocks can be based on a number of the parameters of the warp model. As mentioned above, if the preselected warp model requires 2N parameters, then the number of selected neighboring blocks is N. In another example, the warp model selected can be based on the number of neighboring blocks that are coded using the WIntraBC mode. To illustrate, if M of the neighboring blocks are coded using the WIntraBC mode, then the warp model to obtain parameters for can be one where the number of parameters is min(2M, 8). In an example, the number of selected neighboring blocks can be at least 1 (in the case of a simple translational model) and not greater than 4 (in the case of a homographic projection).
[0085] FIGS. 9A-D depict different warp (or projection) models used to project pixels of a current block of a current frame to a warped (e.g., projected) patch within the same frame. The warped patch can be used to generate a prediction block for encoding or decoding the current block. A warp model indicates how the pixels of the current block are to be scaled, rotated, or otherwise moved when projected into the warped patch. The number and function of the parameters of a warp model depend upon the specific projection used.
[0086] In FIG. 9A, pixels of a block 902A are projected to a warped patch 904A of a frame 900A using a homographic warp model. A homographic warp model uses eight parameters to project the pixels of the block 902A to the warped patch 904A. A homographic warp model is not bound by a linear transformation between the coordinates of two spaces. As such, the eight parameters that define a homographic warp model can be used to project pixels of the block 902A to a quadrilateral patch (e.g., the warped patch 904A) within the frame 900A. Homographic motion models thus support translation, rotation, scaling, changes in aspect ratio, shearing, and other non-parallelogram warping. A homographic warp model between two spaces is defined using equation (1), which can be rewritten as equation (2):
$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
\sim
\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\quad (1)
$$

$$
x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad
y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}
\quad (2)
$$
[0087] In these equations, (x', y') and (x, y) are coordinates of two spaces, namely, a projected position of a pixel within the frame 900A and an original position of a pixel within the block 902A, respectively. Further, h11 through h33 are the homographic parameters and can be real numbers representing a relationship between positions of respective pixels within the frame 900A and the block 902A.
[0088] The parameter h11 represents horizontal scaling - it scales the x-coordinates, but in the context of a general homography, it can also be influenced by rotation and shearing. The parameter h12 typically represents horizontal shearing or a combination of rotation and scaling effects. The parameter h13 is a translation along the x-axis. The parameter h21 is similar to the parameter h12 and typically represents vertical shearing or a combination of rotation and scaling effects. The parameter h22 relates to vertical scaling, affecting the y-coordinates and, like h11, it can also be influenced by rotation and shearing in a general homography. The parameter h23 is a translation along the y-axis. The parameters h31 and h32 introduce perspective distortion. The parameters h31 and h32 are responsible for effects such as tilting or depth, where the transformation varies depending on the x and y positions, respectively. The parameter h33 is often set to 1 in many practical applications to keep the matrix calculations consistent, especially in affine transformations. In a full homography, the parameter h33 can affect the overall scale and perspective distortion and acts as a normalizing factor in the conversion from homogeneous coordinates to Cartesian coordinates. In the case of perspective transformations, the parameter h33 can vary, contributing to the perspective effect.
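A direct Python transcription of equation (2); this is a sketch, and the guard against a zero denominator is a simplifying assumption.

    def homography_project(x, y, H):
        # Project (x, y) with the 3x3 homography matrix H per equation (2).
        den = H[2][0] * x + H[2][1] * y + H[2][2]
        if den == 0:
            raise ValueError("degenerate projection")
        xp = (H[0][0] * x + H[0][1] * y + H[0][2]) / den
        yp = (H[1][0] * x + H[1][1] * y + H[1][2]) / den
        return xp, yp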
[0089] In FIG. 9B, pixels of a block 902B are projected to a warped patch 904B of a frame 900B using an affine motion model. An affine warp model uses six parameters to project the pixels of the block 902B to the warped patch 904B. An affine motion is a linear transformation between the coordinates of two spaces defined by the six parameters. As such, the six parameters that define an affine motion model can be used to project pixels of the block 902B to a parallelogram patch (e.g., the warped patch 904B) within the frame 900B. Affine motion models thus support translation, rotation, scale, changes in aspect ratio, and shearing. The affine projection between two spaces can be given by equation (3), which is a subset of equation (1), and can be rewritten as equation (4):
$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\quad (3)
$$

$$
x' = h_{11}x + h_{12}y + h_{13}, \qquad y' = h_{21}x + h_{22}y + h_{23}
\quad (4)
$$
[0090] The coordinates (x', y') and (x, y) are as described above. The parameter h11 performs scaling along the x-axis, where a value greater than 1 enlarges the object along x, while a value between 0 and 1 shrinks it. The parameter h22 scales an object along the y-axis, where a value greater than 1 enlarges the object along y, and a value between 0 and 1 shrinks it. The parameters h12 and h21 perform shearing, where h12 (or h21) performs horizontal (or vertical) shearing by changing the x-coordinate (or the y-coordinate) values, effectively tilting the object along the x-axis (or the y-axis). The parameter h13 translates an object horizontally. The parameter h23 translates an object vertically.
[0091] To summarize, the tuple (h13, h23) corresponds to a conventional block (or motion) vector that can be used in a translational model; the parameters h11 and h22 can be used to control the scaling factors in the horizontal and vertical axes, and in conjunction with the parameters h12 and h21 decide (e.g., determine, set, etc.) a rotation angle.
[0092] In FIG. 9C, pixels of a block 902C are projected to a warped patch 904C of a frame 900C using a similarity motion model. A similarity motion model uses four parameters to project the pixels of the block 902C to the warped patch 904C. A similarity motion is a linear transformation between the coordinates of two spaces defined by the four parameters. For example, the four parameters can be a translation along the x-axis, a translation along the y-axis, a rotation value, and a zoom value. As such, the four parameters that define a similarity motion model can be used to project pixels of the block 902C to a square patch (e.g., the warped patch 904C) within the frame 900C. Similarity motion models thus support square to square transformation with rotation and zoom.
[0093] In FIG. 9D, pixels of a block 902D are projected to a warped patch 904D of a frame 900D using a translational motion model. A translational motion model uses two parameters to project the pixels of the block 902D to the warped patch 904D. A translational motion is a linear transformation between the coordinates of two spaces defined by the two parameters. For example, the two parameters can be a translation along the x-axis and a translation along the y-axis. As such, the two parameters that define a translational motion model can be used to project pixels of the block 902D to a square patch (e.g., the warped patch 904D) within the frame 900D.
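Tying FIGS. 9A-9D together, a small Python sketch mapping the number M of available WIntraBC-coded neighbors to a model, per the min(2M, 8) rule discussed above; the function and model names are illustrative assumptions.

    def choose_model(num_wintrabc_neighbors):
        # Map M >= 1 available neighbors to a warp model with min(2M, 8) parameters.
        n_params = min(2 * num_wintrabc_neighbors, 8)
        return {2: "translational",   # FIG. 9D
                4: "similarity",      # FIG. 9C
                6: "affine",          # FIG. 9B
                8: "homographic"}[n_params]  # FIG. 9A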
[0094] Some implementations are described below as numbered clauses (Clause 1, Clause 2, etc.). These clauses are provided as examples only and do not limit the other implementations disclosed herein.
[0095] Clause 1 : A method for coding a current block of a current frame using a warp intra-block copy mode. The method includes selecting neighboring blocks that are coded using the warp intra-block copy mode; identifying respective pixel pairs for the neighboring blocks, where a pixel pair for a neighboring block includes a pixel of the neighboring block and a projected pixel within the current frame; obtaining, based on the pixel pairs, parameters of a warp model; and obtaining, based on the warp model, a prediction block for the current block.
[0096] Clause 2: The method of Clause 1, further including coding a syntax element indicating that the current block is coded using the warp intra-block copy mode.
[0097] Clause 3: The method of Clause 1, where the warp model is a homographic warp model, and where the parameters include parameters defining translation, rotation, scaling, changes in aspect ratio, shearing, and perspective distortion.
[0098] Clause 4: The method of Clause 1, where the warp model is an affine warp model having six parameters that project pixels of the current block to a parallelogram patch within the current frame.
[0099] Clause 5: The method of Clause 1, where the warp model is a similarity motion model having four parameters that project pixels of the current block to a square patch within the current frame.
[0100] Clause 6: The method of Clause 1, where a number of the neighboring blocks is based on a number of the parameters.
[0101] Clause 7: The method of Clause 1, where a number of the parameters is based on a number of the neighboring blocks that are coded using the warp intra-block copy mode.
[0102] Clause 8: The method of Clause 1, where a number of the neighboring blocks is at least 1 and not greater than 4.
[0103] Clause 9: The method of Clause 1, where selecting the neighboring blocks includes selecting the neighboring blocks in a clockwise fashion starting with a bottom-most left neighboring block.
[0104] Clause 10: The method of Clause 1, where selecting the neighboring blocks includes selecting the neighboring blocks in a counterclockwise fashion starting with a top-most right neighboring block.
[0105] Clause 11: The method of Clause 1, where selecting the neighboring blocks is based on characteristics of warp models associated with the neighboring blocks.
[0106] Clause 12: The method of Clause 1, where selecting the neighboring blocks includes determining average values of horizontal and vertical components of translational parameters across available neighboring blocks coded using the warp intra-block copy mode; and selecting the neighboring blocks based on an error metric comparing translational parameters of individual neighboring blocks to the average values.
[0107] Clause 13: The method of Clause 1, where at least one of the neighboring blocks is associated with a zero block vector.
[0108] Clause 14: The method of Clause 1, where for each neighboring block, the pixel of the neighboring block comprises a center pixel of the neighboring block.
[0109] Clause 15: The method of Clause 1, where the projected pixel within the current frame is obtained using a block vector associated with the neighboring block.
[0110] Clause 16: The method of Clause 1, where the projected pixel within the current frame is obtained using a warp model associated with the neighboring block.
[0111] Clause 17: A device that includes a processor that is configured to perform the method of any of clauses 1-16.
[0112] Clause 18: A device that includes a memory and a processor. The processor is configured to execute instructions stored in the memory to perform the method of any of clauses 1-16.
[0113] Clause 19: A non-transitory computer-readable storage medium that includes executable instructions that, when executed by a processor, facilitate performance of operations, including operations that perform the method of any of clauses 1-16.
[0114] Clause 20: A non-transitory computer-readable storage medium having stored thereon an encoded bitstream, wherein the encoded bitstream is configured for decoding by the method of any of clauses 1-16.
[0115] Clause 21: A non-transitory computer-readable storage medium having stored thereon an encoded bitstream, wherein the encoded bitstream is generated by an encoder performing the method of any of clauses 1-16.
[0116] The aspects of encoding and decoding described above illustrate some examples of encoding and decoding techniques. However, it is to be understood that encoding and decoding, as those terms are used in the claims, could mean compression, decompression, transformation, or any other processing or change of data.
[0117] The word “example” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “example” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word “example” is intended to present concepts in a concrete fashion. As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or.” That is, unless specified otherwise, or clear from context, “X includes A or B” is intended to mean any of the natural inclusive permutations. That is, if X includes A; X includes B; or X includes both A and B, then “X includes A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form. Moreover, use of the term “an implementation” or “one implementation” throughout is not intended to mean the same embodiment or implementation unless described as such.
[0118] Implementations of the transmitting station 102 and/or the receiving station 106 (and the algorithms, methods, instructions, etc., stored thereon and/or executed thereby, including by the encoder 400 and the decoder 500) can be realized in hardware, software, or any combination thereof. The hardware can include, for example, computers, intellectual property (IP) cores, application-specific integrated circuits (ASICs), programmable logic arrays, optical processors, programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors or any other suitable circuit. In the claims, the term “processor” should be understood as encompassing any of the foregoing hardware, either singly or in combination. The terms “signal” and “data” are used interchangeably. Further, portions of the transmitting station 102 and the receiving station 106 do not necessarily have to be implemented in the same manner.
[0119] Further, in one aspect, for example, the transmitting station 102 or the receiving station 106 can be implemented using a general-purpose computer or general-purpose processor with a computer program that, when executed, carries out any of the respective methods, algorithms and/or instructions described herein. In addition, or alternatively, for example, a special purpose computer/processor can be utilized which can contain other hardware for carrying out any of the methods, algorithms, or instructions described herein.
[0120] The transmitting station 102 and the receiving station 106 can, for example, be implemented on computers in a video conferencing system. Alternatively, the transmitting station 102 can be implemented on a server and the receiving station 106 can be implemented on a device separate from the server, such as a hand-held communications device. In this instance, the transmitting station 102 can encode content using an encoder 400 into an encoded video signal and transmit the encoded video signal to the communications device. In turn, the communications device can then decode the encoded video signal using a decoder 500. Alternatively, the communications device can decode content stored locally on the communications device, for example, content that was not transmitted by the transmitting station 102. Other suitable transmitting and receiving implementation schemes are available. For example, the receiving station 106 can be a generally stationary personal computer rather than a portable communications device and/or a device including an encoder 400 may also include a decoder 500.
[0121] Further, all or a portion of implementations of the present disclosure can take the form of a computer program product accessible from, for example, a computer-usable or computer-readable medium. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or a semiconductor device. Other suitable mediums are also available.
[0122] The above-described embodiments, implementations and aspects have been described in order to allow easy understanding of the present invention and do not limit the present invention. On the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims, which scope is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structure as is permitted under the law.

Claims

What is claimed is:
1. A method for coding a current block of a current frame using a warp intra-block copy mode, comprising: selecting neighboring blocks that are coded using the warp intra-block copy mode; identifying respective pixel pairs for the neighboring blocks, wherein a pixel pair for a neighboring block includes a pixel of the neighboring block and a projected pixel within the current frame; obtaining, based on the pixel pairs, parameters of a warp model; and obtaining, based on the warp model, a prediction block for the current block.
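
For illustration only (this sketch is not part of the claims), the parameter derivation recited in claim 1 can be expressed as a least-squares fit over the pixel pairs, here assuming a six-parameter affine warp model; the sample pair values, the helper name, and the use of NumPy are assumptions made for the example:

    # Illustrative sketch only: fit a six-parameter affine warp from pixel pairs.
    # The neighbor selection and pairing of claim 1 are assumed to have produced
    # `pairs`; each pair maps a neighbor pixel to its projected position within
    # the current frame.
    import numpy as np

    def fit_affine(pairs):
        """pairs: list of ((x, y), (xp, yp)) source/projected positions."""
        A, b = [], []
        for (x, y), (xp, yp) in pairs:
            A.append([x, y, 1, 0, 0, 0]); b.append(xp)
            A.append([0, 0, 0, x, y, 1]); b.append(yp)
        # Least-squares solve for (a, b, c, d, e, f) in
        # xp = a*x + b*y + c and yp = d*x + e*y + f.
        params, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
        return params

    # Three pixel pairs (for example, the center pixels of three neighboring
    # blocks and their projections) determine the six parameters exactly.
    pairs = [((0, 0), (10, 5)), ((8, 0), (18, 5)), ((0, 8), (10, 13))]
    print(fit_affine(pairs))  # [ 1.  0. 10.  0.  1.  5.] : pure translation by (10, 5)
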
2. The method of claim 1, further comprising: coding a syntax element indicating that the current block is coded using the warp intra-block copy mode.
3. The method of any one of claims 1 to 2, wherein the warp model is a homographic warp model, and wherein the parameters comprise: parameters defining translation, rotation, scaling, changes in aspect ratio, shearing, and perspective distortion.
4. The method of any one of claims 1 to 2, wherein the warp model is an affine warp model having six parameters that project pixels of the current block to a parallelogram patch within the current frame.
5. The method of any one of claims 1 to 2, wherein the warp model is a similarity motion model having four parameters that project pixels of the current block to a square patch within the current frame.
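
As general background (standard notation, not language from the claims), the three model classes recited in claims 3-5 are conventionally written, for a source pixel $(x, y)$ and its projection $(x', y')$, as:

$$\text{homographic (8 parameters):}\quad x' = \frac{a x + b y + c}{g x + h y + 1},\qquad y' = \frac{d x + e y + f}{g x + h y + 1}$$

$$\text{affine (6 parameters):}\quad x' = a x + b y + c,\qquad y' = d x + e y + f$$

$$\text{similarity (4 parameters):}\quad x' = s(\cos\theta\,x - \sin\theta\,y) + t_x,\qquad y' = s(\sin\theta\,x + \cos\theta\,y) + t_y$$

An affine map sends a square block to a parallelogram, while a similarity map (uniform scale $s$, rotation $\theta$, translation $(t_x, t_y)$) sends it to a rotated square, matching claims 4 and 5. Each pixel pair supplies two linear constraints, so three pairs determine the six affine parameters and two pairs the four similarity parameters, which is consistent with the coupling of neighbor count and parameter count in the claims that follow.
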
6. The method of any one of claims 1 to 5, wherein a number of the neighboring blocks is based on a number of the parameters.
7. The method of any one of claims 1 to 5, wherein a number of the parameters is based on a number of the neighboring blocks that are coded using the warp intra-block copy mode.
8. The method of any one of claims 1 to 5, wherein a number of the neighboring blocks is at least 1 and not greater than 4.
9. The method of any one of claims 1 to 8, wherein selecting the neighboring blocks comprises: selecting the neighboring blocks in a clockwise fashion starting with a bottom-most left neighboring block.
10. The method of any one of claims 1 to 8, wherein selecting the neighboring blocks comprises: selecting the neighboring blocks in a counter-clockwise fashion starting with a top-most right neighboring block.
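
A minimal sketch of the two scan orders in claims 9 and 10; the coordinate convention, the 4-sample grid granularity, and the helper name are all assumptions, not from the claims:

    # Illustrative only: enumerate neighbor positions around a block at (x, y)
    # of size w x h, clockwise from the bottom-most left neighbor (claim 9).
    # The reversed list gives the counter-clockwise order of claim 10, starting
    # from the top-most right neighbor.
    def clockwise_neighbors(x, y, w, h, unit=4):
        # Up the left column (bottom-most left first), then the above-left corner.
        left = [(x - unit, y + dy) for dy in range(h - unit, -unit - 1, -unit)]
        # Then left-to-right along the row above, ending at the top-most right.
        top = [(x + dx, y - unit) for dx in range(0, w + unit, unit)]
        return left + top

    print(clockwise_neighbors(32, 32, 8, 8))
    # [(28, 36), (28, 32), (28, 28), (32, 28), (36, 28), (40, 28)]
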
11. The method of any one of claims 1 to 8, wherein selecting the neighboring blocks is based on characteristics of warp models associated with the neighboring blocks.
12. The method of any one of claims 1 to 8, wherein selecting the neighboring blocks comprises: determining average values of horizontal and vertical components of translational parameters across available neighboring blocks coded using the warp intra-block copy mode; and selecting the neighboring blocks based on an error metric comparing translational parameters of individual neighboring blocks to the average values.
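
A sketch of the screening in claim 12, assuming each neighboring block carries the translational components (tx, ty) of its warp parameters; the dictionary layout and the specific error metric (L1 distance to the mean) are assumptions:

    # Illustrative only: average the translational components across available
    # warp intra-block copy neighbors, then keep the neighbors whose translation
    # is closest to that average (an L1 error metric is assumed here).
    def select_by_translation(neighbors, keep):
        mean_tx = sum(n["tx"] for n in neighbors) / len(neighbors)
        mean_ty = sum(n["ty"] for n in neighbors) / len(neighbors)
        def err(n):
            return abs(n["tx"] - mean_tx) + abs(n["ty"] - mean_ty)
        return sorted(neighbors, key=err)[:keep]

    candidates = [{"tx": 12, "ty": -3}, {"tx": 11, "ty": -4}, {"tx": 40, "ty": 9}]
    print(select_by_translation(candidates, keep=2))  # drops the (40, 9) outlier
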
13. The method of any one of claims 1 to 12, wherein at least one of the neighboring blocks is associated with a zero block vector.
14. The method of any one of claims 1 to 13, wherein for each neighboring block, the pixel of the neighboring block comprises a center pixel of the neighboring block.
15. The method of any one of claims 1 to 14, wherein the projected pixel within the current frame is obtained using a block vector associated with the neighboring block.
16. The method of any one of claims 1 to 14, wherein the projected pixel within the current frame is obtained using a warp model associated with the neighboring block.
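
For claims 15 and 16, two assumed ways of obtaining a neighbor's projected pixel from its center pixel: a translational block vector (claim 15) versus the neighbor's own warp model, here taken to be affine with an assumed parameter layout (a, b, c, d, e, f):

    # Illustrative only. Claim 15: project the neighbor's center pixel with its
    # block vector. Claim 16: project it through the neighbor's own warp model.
    def project_with_block_vector(center, block_vector):
        (x, y), (bvx, bvy) = center, block_vector
        return (x + bvx, y + bvy)

    def project_with_warp_model(center, params):
        x, y = center
        a, b, c, d, e, f = params
        return (a * x + b * y + c, d * x + e * y + f)

    # A block vector of (10, 5) and the pure-translation affine fit from the
    # earlier sketch agree on the projection of center pixel (4, 4):
    print(project_with_block_vector((4, 4), (10, 5)))             # (14, 9)
    print(project_with_warp_model((4, 4), (1, 0, 10, 0, 1, 5)))   # (14, 9)
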
17. A device, comprising: a processor that is configured to perform the method of any of claims 1-16.
18. A device, comprising: a memory; and a processor, the processor configured to execute instructions stored in the memory to perform the method of any of claims 1-16.
19. A non-transitory computer-readable storage medium, comprising executable instructions that, when executed by a processor, facilitate performance of operations, comprising operations that perform the method of any of claims 1-16.
20. A non-transitory computer-readable storage medium having stored thereon an encoded bitstream, wherein the encoded bitstream is configured for decoding by the method of any of claims 1-16.
21. A non-transitory computer-readable storage medium having stored thereon an encoded bitstream, wherein the encoded bitstream is generated by an encoder performing the method of any of claims 1-16.

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
US202363610631P | 2023-12-15 | 2023-12-15 |
US63/610,631 | 2023-12-15 | |

Publications (1)

Publication Number | Publication Date
WO2025128480A1 (en) | 2025-06-19

Family

ID=93924640

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
PCT/US2024/059178 (WO2025128480A1, pending) | Warp intra block copy | 2023-12-15 | 2024-12-09

Country Status (1)

Country Link
WO (1) WO2025128480A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US10110914B1 * | 2016-09-15 | 2018-10-23 | Google Llc | Locally adaptive warped motion compensation in video coding
US20180270497A1 * | 2017-03-15 | 2018-09-20 | Google Llc | Segmentation-based parameterized motion models
US11202079B2 * | 2018-02-05 | 2021-12-14 | Tencent America LLC | Method and apparatus for video decoding of an affine model in an intra block copy mode
US20200021840A1 * | 2018-07-16 | 2020-01-16 | Tencent America LLC | Determination of parameters of an affine model
US11818384B2 * | 2020-09-24 | 2023-11-14 | Ofinno, Llc | Affine intra block copy refinement

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24827662

Country of ref document: EP

Kind code of ref document: A1