
WO2017030865A1 - Method and systems for displaying a portion of a video stream - Google Patents

Method and systems for displaying a portion of a video stream

Info

Publication number
WO2017030865A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
interest
sub-streams
manifest
Application number
PCT/US2016/046317
Other languages
English (en)
Inventor
Kumar Ramaswamy
Jeffrey Allen Cooper
Original Assignee
Vid Scale, Inc.
Application filed by Vid Scale, Inc.
Publication of WO2017030865A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/262 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
    • H04N21/26258 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists for generating a list of items to be played back in a given order, e.g. playlist, or scheduling item distribution according to such list
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/637 Control signals issued by the client directed to the server or network components
    • H04N21/6373 Control signals issued by the client directed to the server or network components for rate control, e.g. request to the server to modify its transmission rate
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics

Definitions

  • Digital video signals are commonly characterized by the parameters of i) resolution (luma and chroma resolution, or horizontal and vertical pixel dimensions), ii) frame rate, and iii) dynamic range or bit depth (bits per pixel).
  • the resolution of the digital video signals has increased from Standard Definition (SD) through 8K-Ultra High Definition (UHD).
  • the other digital video signal parameters have also increased: the frame rate from 30 frames per second (fps) up to 240 fps, and the bit depth from 8 bits to 10 bits.
  • MPEG/ITU standardized video compression has undergone several generations of successive improvements in compression efficiency, including MPEG2, MPEG4/H.264, and HEVC/H.265.
  • the technology to display the digital video signals on a consumer device, such as a television or mobile phone, has also increased correspondingly.
  • Video content is initially captured at a higher resolution, frame rate, and dynamic range. For example, a 4:2:2, 10 bit HD video content is often down-resolved to 4:2:0, 8 bit for distribution.
  • the digital video is encoded and stored at multiple resolutions at a server, and these versions at varying resolutions are made available for retrieval, decoding and rendering by clients with possibly varying capabilities.
  • the digital video gets encoded and stored at multiple resolutions at a server.
  • Adaptive bit rate (ABR) further addresses network congestion.
  • in ABR, a digital video is encoded at multiple bit rates (e.g. choosing the same or multiple lower resolutions, lower frame rates, etc.) and is made available at a server.
  • the client device requests a different bit rate for consumption at periodic intervals based on its calculated available network bandwidth or local computing resources.
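As a concrete illustration of this client-side rate decision, the following sketch picks the highest-bitrate representation that fits under the measured bandwidth. The representation ladder, the 80% safety margin, and all values are illustrative assumptions, not parameters from the disclosure.

```python
# Illustrative sketch of client-side ABR selection (assumed values throughout).
# Representations are listed from lowest to highest bitrate (bits per second).
REPRESENTATIONS = [
    ("ABR-3 (low)", 1_000_000),
    ("ABR-2 (SD)", 2_500_000),
    ("ABR-1 (1080p HD)", 6_000_000),
]

def select_representation(measured_bandwidth_bps: float, margin: float = 0.8) -> str:
    """Pick the highest-bitrate representation that fits within a safety
    margin of the bandwidth measured over the last interval."""
    budget = measured_bandwidth_bps * margin
    best = REPRESENTATIONS[0][0]  # fall back to the lowest bitrate
    for name, bitrate in REPRESENTATIONS:
        if bitrate <= budget:
            best = name
    return best

# At each periodic interval the client would re-run the selection:
print(select_representation(5_000_000))  # -> "ABR-2 (SD)" with this ladder
```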
  • Described herein are systems and methods related to displaying a portion of a digital video stream.
  • the digital video is encoded and stored at multiple resolutions at a server, and these versions at varying resolutions are made available for retrieval, decoding and rendering by clients with possibly varying capabilities.
  • the server may make available additional metadata so that clients may request and receive data sufficient to decode and render one or more objects or areas of interest at a high resolution and/or a zoomed scale, where the spatial support for the objects or areas of interest may vary in time.
  • a server transmits a manifest, such as a DASH MPD, to a client device.
  • the manifest identifies at least one unzoomed stream representing an unzoomed version of a source video.
  • the manifest further identifies a plurality of sub-streams, where each sub-stream represents a respective spatial portion of the source video.
  • the server also transmits, to the client device, information associating at least one object of interest with a plurality of the spatial portions. This information may be provided in the manifest.
  • the server receives, from the client device, a request for at least one of the sub-streams. In response, the server transmits the requested sub-streams to the client device.
  • the sub-streams may be encoded at a higher resolution than the unzoomed stream, allowing for higher-quality video when a client device zooms in on an object of interest represented in the sub-streams.
  • the information that associates the at least one object of interest with a plurality of the spatial portions may be provided by including, in the manifest, a syntax element for each sub- stream that identifies at least one object of interest associated with the respective sub-stream.
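The disclosure leaves the exact manifest syntax open, so the sketch below assumes a hypothetical SupplementalProperty scheme whose value names the object of interest carried by each sub-stream's AdaptationSet. (MPEG-DASH's standardized Spatial Relationship Description, urn:mpeg:dash:srd:2014, is a related mechanism for spatial sub-streams, but the text does not mandate it.)

```python
# Hypothetical sketch of reading an object-of-interest association from a
# DASH MPD. The disclosure does not fix a concrete syntax: the schemeIdUri
# and the use of the "value" attribute to carry an object name are assumed.
import xml.etree.ElementTree as ET

MPD_NS = "{urn:mpeg:dash:schema:mpd:2011}"
OBJECT_SCHEME = "urn:example:zoomcoding:object"  # assumed scheme URI

MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet id="sub-stream-314">
      <SupplementalProperty schemeIdUri="urn:example:zoomcoding:object" value="ball"/>
    </AdaptationSet>
    <AdaptationSet id="sub-stream-316">
      <SupplementalProperty schemeIdUri="urn:example:zoomcoding:object" value="ball"/>
    </AdaptationSet>
  </Period>
</MPD>"""

def substreams_for_object(mpd_xml: str, object_id: str) -> list[str]:
    """Return ids of AdaptationSets whose supplemental property names object_id."""
    root = ET.fromstring(mpd_xml)
    matches = []
    for aset in root.iter(MPD_NS + "AdaptationSet"):
        for prop in aset.iter(MPD_NS + "SupplementalProperty"):
            if (prop.get("schemeIdUri") == OBJECT_SCHEME
                    and prop.get("value") == object_id):
                matches.append(aset.get("id"))
    return matches

print(substreams_for_object(MPD_XML, "ball"))  # ['sub-stream-314', 'sub-stream-316']
```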
  • the server also transmits to the client a render point for the object of interest.
  • the render point may be used to indicate which portions of the sub-streams are to be displayed.
  • the render point may represent coordinates of one or more corners of a rectangular region of interest, where the rectangular region of interest is smaller than a complete region represented by all of the sub-streams. The rectangular region of interest is displayed, while portions of the sub-streams that are outside of the rectangular region of interest may not be displayed.
  • the rendering reference points may be communicated to the client device.
  • rendering reference points may be transmitted in-band as part of the video streams or video segments, or as side information sent along with the video streams or video segments.
  • One or more rendering reference points may be transmitted in-band in a video stream, such as in an unzoomed stream or in one or more sub-streams.
  • the rendering reference points may be specified in an out-of-band communication (e.g. as metadata in a manifest such as a DASH MPD).
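One way to read the render point described above is as the corner of a crop window applied to the region the client has stitched together from the retrieved sub-streams. The sketch below assumes a top-left-corner convention and a numpy frame buffer; both are illustrative choices, since the disclosure leaves the exact coordinate convention open.

```python
# Minimal sketch of applying a render point, assuming the client has already
# decoded and stitched the requested sub-streams into one frame buffer
# (a numpy array of rows x cols x channels). The top-left-corner convention
# is an assumption; the disclosure leaves the coordinate convention open.
import numpy as np

def crop_to_render_point(stitched: np.ndarray,
                         render_point: tuple[int, int],
                         roi_size: tuple[int, int]) -> np.ndarray:
    """Extract the rectangular region of interest anchored at the render
    point; pixels outside the rectangle are simply not displayed."""
    x, y = render_point
    w, h = roi_size
    return stitched[y:y + h, x:x + w]

# Example: a 2x2 mosaic of four 1920x1080 sub-streams (3840x2160 total),
# from which a 1920x1080 window anchored at the render point is displayed.
mosaic = np.zeros((2160, 3840, 3), dtype=np.uint8)
window = crop_to_render_point(mosaic, render_point=(960, 540), roi_size=(1920, 1080))
print(window.shape)  # (1080, 1920, 3)
```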
  • the sub-streams are encoded for adaptive bit rate (ABR) streaming, for example with at least two sub-streams with different bitrates being available for at least some of the spatial portions.
  • the client may select which sub-stream to request based on network conditions.
  • a video client receives a manifest, where the manifest identifies an unzoomed stream representing an unzoomed version of a source video.
  • the manifest also identifies a plurality of sub-streams, where each sub-stream represents a respective spatial portion of the source video.
  • the client further receives information associating at least one object of interest with a plurality of the spatial portions.
  • the client device receives a selection (e.g. a user selection entered through a user interface device such as a remote control) of one of the objects of interest.
  • the client device identifies the spatial portions associated with the selected object of interest and retrieves a representative sub-stream for each of the spatial portions.
  • the client device may select which of the representative sub-streams to retrieve based on network conditions.
  • the client device then causes display of a zoomed version of the object of interest by rendering the retrieved sub-streams.
  • the display of the zoomed version may be provided by the client device itself (e.g. on a built-in screen), or the client device may transmit uncompressed video to an external display device (such as a television or monitor).
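A minimal end-to-end trace of the client-side method just described might look like the sketch below. The manifest structure, stream names, and two-bitrate ladder are stand-in assumptions used only to show the sequence: identify the spatial portions associated with the selected object, pick one representative sub-stream per portion based on network conditions, then retrieve and render.

```python
# End-to-end trace of the client-side method, with stubbed-in data in place
# of real manifest parsing and networking. All names and values here are
# hypothetical; they only illustrate the claimed sequence of steps.

MANIFEST = {
    "unzoomed": "stream-308",
    "sub_streams": {  # spatial portion -> representations at two bitrates
        "top-left": ["314-hi", "314-lo"],
        "top-right": ["316-hi", "316-lo"],
    },
    "objects": {"ball": ["top-left", "top-right"]},  # association information
}

def display_zoomed_object(manifest: dict, selected_object: str,
                          bandwidth_ok: bool) -> list[str]:
    """Identify the spatial portions associated with the selected object and
    pick one representative sub-stream per portion based on network state."""
    portions = manifest["objects"][selected_object]
    chosen = []
    for portion in portions:
        hi, lo = manifest["sub_streams"][portion]
        chosen.append(hi if bandwidth_ok else lo)
    return chosen  # these would then be retrieved, decoded, and rendered zoomed

print(display_zoomed_object(MANIFEST, "ball", bandwidth_ok=True))
# -> ['314-hi', '316-hi']
```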
  • FIG. 1 A depicts an example communications system in which one or more disclosed embodiments may be implemented.
  • FIG. 1B depicts an example client device that may be used within the communications system of FIG. 1A.
  • FIG. 1C depicts an example video encoding and distribution system.
  • FIG. 1D depicts example screen resolutions.
  • FIG. 1E schematically depicts ABR encoding.
  • FIG. 1F depicts an example network entity 190 that may be used within the communication system 100 of FIG. 1A.
  • FIG. 2 depicts an example video coding system and distribution system, according to an embodiment.
  • FIG. 3 depicts example coding resolutions, in accordance with an embodiment.
  • FIG. 4A depicts an example of a video zoom operation, in accordance with an embodiment.
  • FIG. 4B depicts a second example of video zoom operation, in accordance with an embodiment.
  • FIG. 5 depicts an example of a digital video with an object of interest, in accordance with an embodiment.
  • FIG. 6 is a message sequence diagram depicting encoding and delivery of content to a client in accordance with an embodiment.
  • FIG. 7 is a message sequence diagram depicting a second example of encoding and delivery of content to a client in accordance with an embodiment.
  • FIG. 8 is a message sequence diagram depicting an example communications process, in accordance with an embodiment.
  • FIG. 9 illustrates a video having a plurality of spatial portions, at least some of the spatial portions having associated sub-streams to enable zoomed display of an object of interest.
  • FIG. 1A is a diagram of an example communications system 100 in which one or more disclosed embodiments may be implemented.
  • the communications system 100 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, and the like, to multiple wireless users.
  • the communications system 100 may enable multiple wired and wireless users to access such content through the sharing of system resources, including wired and wireless bandwidth.
  • the communications systems 100 may employ one or more channel-access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), and the like.
  • the communications system 100 may also employ one or more wired communications standards (e.g. Ethernet, DSL, radio frequency (RF) over coaxial cable, fiber optics, and the like).
  • the communications system 100 may include client devices 102a, 102b, 102c, and/or 102d, Radio Access Networks (RAN) 103/104/105, a core network 106/107/109, a public switched telephone network (PSTN) 108, the Internet 110, and other networks 112, and communication links 115/116/117, and 119, though it will be appreciated that the disclosed embodiments contemplate any number of client devices, base stations, networks, and/or network elements.
  • Each of the client devices 102a, 102b, 102c, 102d may be any type of device configured to operate and/or communicate in a wired or wireless environment.
  • the client device 102a is depicted as a tablet computer
  • the client device 102b is depicted as a smart phone
  • the client device 102c is depicted as a computer
  • the client device 102d is depicted as a television.
  • the communications systems 100 may also include a base station 114a and a base station 114b.
  • Each of the base stations 114a, 114b may be any type of device configured to wirelessly interface with at least one of the WTRUs 102a, 102b, 102c, 102d to facilitate access to one or more communication networks, such as the core network 106/107/109, the Internet 110, and/or the networks 112.
  • the base stations 114a, 114b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a site controller, an access point (AP), a wireless router, and the like. While the base stations 114a, 114b are each depicted as a single element, it will be appreciated that the base stations 114a, 114b may include any number of interconnected base stations and/or network elements.
  • the base station 114a may be part of the RAN 103/104/105, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, and the like.
  • the base station 114a and/or the base station 114b may be configured to transmit and/or receive wireless signals within a particular geographic region, which may be referred to as a cell (not shown).
  • the cell may further be divided into sectors.
  • the cell associated with the base station 114a may be divided into three sectors.
  • the base station 114a may include three transceivers, i.e., one for each sector of the cell.
  • the base station 114a may employ multiple- input multiple output (MIMO) technology and, therefore, may utilize multiple transceivers for each sector of the cell.
  • the base stations 114a, 114b may communicate with one or more of the client devices 102a, 102b, 102c, and 102d over an air interface 115/116/117, or communication link 119, which may be any suitable wired or wireless communication link (e.g., radio frequency (RF), microwave, infrared (IR), ultraviolet (UV), visible light, and the like).
  • the air interface 115/116/117 may be established using any suitable radio access technology (RAT).
  • the communications system 100 may be a multiple access system and may employ one or more channel-access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like.
  • the base station 114a in the RAN 103/104/105 and the client devices 102a, 102b, 102c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 115/116/117 using wideband CDMA (WCDMA).
  • WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+).
  • HSPA may include High-Speed Downlink Packet Access (HSDPA) and/or High-Speed Uplink Packet Access (HSUPA).
  • the base station 114a and the client devices 102a, 102b, 102c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 115/116/117 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A).
  • the base station 114a and the client devices 102a, 102b, 102c may implement radio technologies such as IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
  • the base station 114b in FIG. 1A may be a wired router, a wireless router, a Home Node B, a Home eNode B, or an access point, as examples, and may utilize any suitable wired transmission standard or RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, and the like.
  • the base station 114b and the client devices 102c, 102d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN).
  • the base station 114b and the client devices 102c, 102d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN).
  • the base station 114b and the client devices 102c, 102d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, and the like) to establish a picocell or femtocell.
  • the base station 114b communicates with client devices 102a, 102b, 102c, and 102d through communication links 119. As shown in FIG. 1A, the base station 114b may have a direct connection to the Internet 110. Thus, the base station 114b may not be required to access the Internet 110 via the core network 106/107/109.
  • the RAN 103/104/105 may be in communication with the core network 106/107/109, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the client devices 102a, 102b, 102c, 102d.
  • the core network 106/107/109 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, and the like, and/or perform high-level security functions, such as user authentication.
  • the RAN 103/104/105 and/or the core network 106/107/109 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 103/104/105 or a different RAT.
  • the core network 106/107/109 may also be in communication with another RAN (not shown) employing a GSM radio technology.
  • the core network 106/107/109 may also serve as a gateway for the client devices 102a, 102b, 102c, 102d to access the PSTN 108, the Internet 110, and/or other networks 112.
  • the PSTN 108 may include circuit-switched telephone networks that provide plain old telephone service (POTS).
  • the Internet 110 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and IP in the TCP/IP Internet protocol suite.
  • the networks 112 may include wired and/or wireless communications networks owned and/or operated by other service providers.
  • the networks 112 may include another core network connected to one or more RANs, which may employ the same RAT as the RAN 103/104/105 or a different RAT.
  • Some or all of the client devices 102a, 102b, 102c, 102d in the communications system 100 may include multi-mode capabilities, i.e., the client devices 102a, 102b, 102c, 102d may include multiple transceivers for communicating with different wired or wireless networks over different communication links.
  • the WTRU 102c shown in FIG. 1A may be configured to communicate with the base station 114a, which may employ a cellular-based radio technology, and with the base station 114b, which may employ an IEEE 802 radio technology.
  • FIG. 1B is a system diagram of an example client device 102.
  • the client device 102 may include a processor 118, a transceiver 120, a transmit/receive element 122, a speaker/microphone 124, a keypad 126, a display/touchpad 128, a non-removable memory 130, a removable memory 132, a power source 134, a global positioning system (GPS) chipset 136, and other peripherals 138.
  • the client device 102 may represent any of the client devices 102a, 102b, 102c, and 102d, and include any subcombination of the foregoing elements while remaining consistent with an embodiment.
  • the base stations 114a and 114b, and/or the nodes that base stations 114a and 114b may represent, such as but not limited to a base transceiver station (BTS), a Node-B, a site controller, an access point (AP), a home node-B, an evolved home node-B (eNodeB), a home evolved node-B (HeNB), a home evolved node-B gateway, and proxy nodes, among others, may include some or all of the elements depicted in FIG. 1B and described herein.
  • the processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), a state machine, and the like.
  • the processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the client device 102 to operate in a wired or wireless environment.
  • the processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While FIG. 1B depicts the processor 118 and the transceiver 120 as separate components, it will be appreciated that the processor 118 and the transceiver 120 may be integrated together in an electronic package or chip.
  • the transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 114a) over the air interface 115/116/117 or communication link 119.
  • the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals.
  • the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, as examples.
  • the transmit/receive element 122 may be configured to transmit and receive both RF and light signals.
  • the transmit/receive element may be a wired communication port, such as an Ethernet port. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wired or wireless signals.
  • although the transmit/receive element 122 is depicted in FIG. 1B as a single element, the client device 102 may include any number of transmit/receive elements 122. More specifically, the client device 102 may employ MIMO technology. Thus, in one embodiment, the client device 102 may include two or more transmit/receive elements 122 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 115/116/117.
  • the transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122.
  • the client device 102 may have multi-mode capabilities.
  • the transceiver 120 may include multiple transceivers for enabling the client device 102 to communicate via multiple RATs, such as UTRA and IEEE 802.11, as examples.
  • the processor 118 of the client device 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit).
  • the processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128.
  • the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132.
  • the non-removable memory 130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device.
  • the removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like.
  • the processor 118 may access information from, and store data in, memory that is not physically located on the client device 102, such as on a server or a home computer (not shown).
  • the processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the client device 102.
  • the power source 134 may be any suitable device for powering the client device 102.
  • the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), and the like), solar cells, fuel cells, a wall outlet, and the like.
  • the processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the client device 102.
  • the client device 102 may receive location information over the air interface 115/116/117 from a base station (e.g., base stations 114a, 114b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations.
  • the client device 102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
  • the client device 102 does not comprise a GPS chipset and does not acquire location information.
  • the processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity.
  • the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.
  • FIG. 1C depicts an example video encoding system 160, in accordance with an embodiment.
  • the example system 160 includes a full resolution input video source 162, an adaptive bitrate encoder 164, a streaming server 166, a network 168, and client devices 169.
  • the example system 160 may be implemented in the context of the example communication system 100 depicted in FIG. 1A.
  • both the adaptive bitrate encoder 164 and the streaming server 166 may be entities in any of the networks depicted in the communication system 100.
  • the client devices 169 may be the client devices 102a-d depicted in the communication system 100.
  • the adaptive bitrate encoder or transcoder 164 receives an uncompressed or compressed input video stream from source 162 and encodes or transcodes the video stream into a plurality of representations 165. Each of the representations may differ from the others in a property such as resolution, frame rate, bit rate, and the like.
  • the adaptive bitrate encoder 164 communicates the encoded video streams 165 to the streaming server 166.
  • the streaming server 166 transmits an encoded video stream via the network to the client devices. The transmission may take place over any of the communication interfaces, such as the communication link 115/116/117 or 119.
  • FIG. 1D provides an illustration 170 of different image resolutions.
  • the example image resolutions, listed from lowest resolution to highest resolution, include standard definition (SD), full high definition (FHD), 4K Ultra High Definition (UHD), and 8K UHD, although other resolutions may also be available.
  • FIG. 1E provides a schematic illustration of ABR encoding.
  • a 4K UHD source video is converted to three other encodings with three different resolutions.
  • the source video may be downconverted to a stream ABR-1 (182), which may be, for example, a 1080p HD video; a stream ABR-2 (184), which may be, for example, a standard definition (SD) stream; and a stream ABR-3 (186), which may be a still lower-resolution stream (e.g. for use under conditions of network congestion).
  • Each of the ABR encoded versions of the source video is transmitted to the streaming server for further transmission to client devices based in part on client device capability and network congestion.
  • the highest spatial resolution that is available is not always delivered to the client devices.
  • FIG. 1F depicts an example network entity 190 that may be used within the communication system 100 of FIG. 1A.
  • network entity 190 includes a communication interface 192, a processor 194, and non-transitory data storage 196, all of which are communicatively linked by a bus, network, or another communication path 198.
  • Communication interface 192 may include one or more wired communication interfaces and/or one or more wireless-communication interfaces. With respect to wired communication, communication interface 192 may include one or more interfaces such as Ethernet interfaces, as an example. With respect to wireless communication, communication interface 192 may include components such as one or more antennae, one or more transceivers/chipsets designed and configured for one or more types of wireless (e.g., LTE) communication, and/or any other components deemed suitable by those of skill in the relevant art. And further with respect to wireless communication, communication interface 192 may be equipped at a scale and with a configuration appropriate for acting on the network side (as opposed to the client side) of wireless communications (e.g., LTE communications, Wi-Fi communications, and the like). Thus, communication interface 192 may include the appropriate equipment and circuitry (perhaps including multiple transceivers) for serving multiple mobile stations, UEs, or other access terminals in a coverage area.
  • Processor 194 may include one or more processors of any type deemed suitable by those of skill in the relevant art, some examples including a general-purpose microprocessor and a dedicated DSP.
  • Data storage 196 may take the form of any non-transitory computer-readable medium or combination of such media, some examples including flash memory, read-only memory (ROM), and random-access memory (RAM) to name but a few, as any one or more types of non- transitory data storage deemed suitable by those of skill in the relevant art could be used.
  • data storage 196 contains program instructions 197 executable by processor 194 for carrying out various combinations of the various network-entity functions described herein.
  • the network-entity functions described herein are carried out by a network entity having a structure similar to that of network entity 190 of FIG. 1F. In some embodiments, one or more of such functions are carried out by a set of multiple network entities in combination, where each network entity has a structure similar to that of network entity 190 of FIG. 1F.
  • network entity 190 is (or at least includes) one or more of the encoders, (one or more entities in) RAN 103, (one or more entities in) RAN 104, (one or more entities in) RAN 105, (one or more entities in) core network 106, (one or more entities in) core network 107, (one or more entities in) core network 109, base station 114a, base station 114b, Node-B 140a, Node-B 140b, Node-B 140c, RNC 142a, RNC 142b, MGW 144, MSC 146, SGSN 148, GGSN 150, eNode-B 160a, eNode-B 160b, eNode-B 160c, MME 162, serving gateway 164, PDN gateway 166, base station 180a, base station 180b, base station 180c, ASN gateway 182, MIP-HA 184, AAA 186, and gateway 188; certainly other network entities and/or combinations of network entities may be used in various embodiments.
  • FIG. 2 depicts an example video coding and distribution system, according to an embodiment.
  • FIG. 2 depicts the example system 200.
  • the example system 200 includes components analogous to those depicted in the example ABR system 160 of FIG. 1C, such as a full resolution input video source 262, an adaptive bitrate encoder 264 generating traditional ABR streams 265, a streaming server 266, a network 268, and client devices 269.
  • system 200 further includes a zoom coding encoder 204.
  • the zoom coding encoder 204 receives a source video stream from the full resolution video source 262, either in uncompressed or a previously compressed format.
  • Zoom coding encoder 204 encodes or transcodes the source video stream into a plurality of zoom coded sub-streams, wherein each of the zoom coded sub-streams encodes a spatial portion (e.g. a segment, a slice, a quadrant, or other division) representing an area smaller than the complete area of the overall source video.
  • in the case of transcoding, a decoding process is performed that brings the video back to the uncompressed domain at its full resolution, followed by a re-encoding process that creates new compressed video streams representing different resolutions, bit rates, or frame rates.
  • the zoom coded sub-streams 206 may be i) encoded at the resolution and quality of the source video stream, and/or ii) encoded into a plurality of resolutions, similar to the ABR encoding.
  • the zoom coded sub-streams 206 are transmitted to the streaming server 266 for further transmission to the client devices 269.
  • the ABR encoder and the zoom coding encoder are the same encoder, configured to encode the source video into the ABR streams and the zoom coded sub-streams.
  • FIG. 3 depicts example coding resolutions, in accordance with an embodiment.
  • FIG. 3 depicts an overview of coding 300.
  • the overview includes a digital source video 302, an ABR encoder 304, a zoom coding encoder 306, ABR streams 308-312, and zoom coded sub-streams 314-320.
  • the digital source video 302 is depicted as having four quadrants, the top left has diagonal cross hatching, the top right has vertical and horizontal lines, the bottom left has diagonal lines, and the bottom right is dotted.
  • the full resolution of the source digital video 302 is 3840 horizontally by 2160 vertically (4K x 2K).
  • the four quadrants are shown by way of example as the digital video source may be divided into any number of areas in any arrangement, including segments of different sizes and shapes.
  • the digital source video 302 is received by the ABR encoder 304 and the zoom coding encoder 306.
  • the ABR encoder 304 processes the digital source video into three different ABR streams 308, 310 and 312.
  • Each ABR stream is of a different resolution.
  • the ABR stream 308 is encoded in 2K x 1K (specifically 1920 x 1080), has the highest resolution, and is depicted as the largest area.
  • the ABR stream 312 is encoded in approximately 500 x 250 (specifically 480 x 270), has the lowest resolution, and is depicted as the smallest area.
  • zoom coded sub-streams 314, 316, 318, and 320 are each encoded in a 2K x 1K resolution (specifically 1920 x 1080), matching the resolution of the corresponding regions in the digital source video 302.
  • a client device is streaming digital video via the system 200, and a source video is being encoded (or has previously been encoded and stored at a streaming server) as depicted in the FIG. 3.
  • the client device can receive, can decode, and can display any of the ABR streams 308, 310, 312 that depict the source video encoded at varying digital video parameters.
  • the client device can zoom in on a portion of the (e.g. decoded) traditional ABR streams 308, 310, or 312. However, the whole ABR stream is transmitted over the network, including portions which the client device may not display (e.g. portions that are outside the boundary of the display when the video is zoomed in).
  • the client device can zoom in on a portion of the video stream by requesting one or more of the zoom coded sub-streams 314, 316, 318, and 320 corresponding to the portion of the video stream requested by the client device.
  • the client device can, for example, request to see the top left portion of the digital video corresponding to the diagonally cross-hatched area.
  • the streaming server transmits the zoom coded sub-stream 314 over the network to the client device.
  • in this way, only the portion of the video display requested by the client device is transmitted over the network, and the resulting display is of a higher quality than a zoomed-in version of an ABR stream.
  • a separate video camera or source video is not required to provide a high quality video stream to the client device.
  • the streaming server may be configured to notify the client device via a profile communication file of available streams.
  • the profile communication file may be a manifest file, a session description file, a media presentation description (MPD) file, a DASH MPD, or another suitable representation for describing the available streams.
  • the source video is a sports event, a replay of a sports event, an action sequence, a surveillance security video, a movie, or a television broadcast.
  • FIG. 4A depicts examples of zooming video streams, in accordance with an embodiment.
  • FIG. 4A depicts the process to display an area of interest 402 of the source video 302.
  • the area of interest 402 may include an object of interest in the video scene, which may be a stationary object or a moving object. (Henceforth, the terms object of interest and area of interest are used interchangeably in the present disclosure.)
  • the source video 302 is encoded with the ABR encoder 304 producing video streams 308, 310, 312 and is further encoded with the zoom coding encoder 306 producing video streams 314, 316, 318, 320.
  • the regions 404 and 408 represent the portions of the encoded streams associated with the area of interest of the source video 302.
  • the displays 406 and 410 represent the video displayed on the client device by zooming in on a traditional ABR stream (406) as compared to the use of zoom coding (410).
  • the area of interest 402 overlaps areas encoded in four different zoom coded sub-streams and has resolution dimensions of 2K x 1K within the original 4K x 2K source video.
  • the highest resolution available to be displayed on a client device that represents the area of interest is therefore 2K x 1K.
  • the ABR encoder 304 is able to provide a zoomed-in view of the area of interest.
  • the ABR encoder 304 produces three ABR encoded streams 308, 310, and 312.
  • the ABR stream 308 has a resolution of 2K x 1K and includes the region 404 that corresponds to the area of interest 402.
  • the portion of the video stream corresponding to the area of interest has resolution dimensions of approximately 1K x 500 (specifically 960 x 540).
  • the final displayed video 406, with resolution dimensions of approximately 1K x 500 (specifically 960 x 540), is at a lower resolution than the region of interest 402 in the source video 302, and the displayed video 406 must be scaled for display on the client device.
  • the zoom coding encoder 306 is also able to provide for a zoomed-in view of the area of interest.
  • the zoom coding encoder 306 produces four zoom coded sub- streams 314, 316, 318, and 320.
  • Each zoom coded sub-stream has a resolution of 2K x 1K.
  • the region 408 overlaps all four zoom coded sub-streams and has a maximum available resolution of 2K x 1K, the same resolution dimensions available for the region of interest 402 in the source video 302.
  • the source video 302 may be further divided into smaller portions, or slices, than the quadrants depicted.
  • the source video 302 may be divided using a grid of 8 portions horizontally and 8 portions vertically, or using a different grid of 32 portions horizontally and 16 portions vertically, or some other partitioning into portions.
  • Slice encoding is supported by video encoding standards such as H.264 and H.265/HEVC.
  • not all available zoom coded video sub-streams are transmitted via the network to the client device.
  • the client device or a network entity, such as the streaming server, can determine the appropriate subset of the available zoom coded video sub-streams to transmit to the client device to cover the area of interest (see the sketch following this list item).
  • for example, if the area of interest of FIG. 4A overlaps only the areas encoded by zoom coded video sub-streams 314 and 316, the streams 314 and 316 are transmitted to the client device, and streams 318 and 320 are not transmitted, in order to allow the client to decode and display the region of interest.
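Under the assumption of a uniform slice grid, this subset determination reduces to finding which grid cells a rectangular area of interest overlaps. The grid layout and pixel coordinate convention below are assumptions chosen to match the quadrant example of FIG. 4A.

```python
# Sketch of the coverage computation: given a uniform grid of independently
# decodable slices, find which slices a rectangular area of interest overlaps,
# so only those sub-streams need to be requested. Grid layout and pixel
# coordinates are assumptions matching the quadrant example of FIG. 4A.

def overlapped_slices(roi, frame_size, grid):
    """roi = (x, y, w, h) in source pixels; frame_size = (W, H);
    grid = (cols, rows). Returns (col, row) indices of overlapped slices."""
    x, y, w, h = roi
    W, H = frame_size
    cols, rows = grid
    slice_w, slice_h = W / cols, H / rows
    first_col, last_col = int(x // slice_w), int((x + w - 1) // slice_w)
    first_row, last_row = int(y // slice_h), int((y + h - 1) // slice_h)
    return [(c, r) for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]

# A 2K x 1K area of interest in the top half of a 4K x 2K source split into
# quadrants overlaps only the two top sub-streams (cf. streams 314 and 316):
print(overlapped_slices((960, 0, 1920, 1080), (3840, 2160), (2, 2)))
# -> [(0, 0), (1, 0)]
```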
  • FIG. 4B depicts a second example of video zoom, in accordance with an embodiment.
  • FIG. 4B depicts zooming in circumstances where segments of video are divided into a plurality of slices, tiles, or other divisions of area.
  • Each slice, tile, or other division of area (collectively referred to herein as a slice) is independently decodable.
  • Each slice may be individually requested, retrieved, and decoded by a client, in the same manner as described for the alternative zoom coded sub-streams in the preceding examples.
  • a source video 412 with resolution dimensions of 4K x 2K has an area of interest 414 that has resolution dimensions of 2K x 1K.
  • the source video is encoded into twelve slices, six on the left side and six on the right side.
  • the division into slices may be performed using any video codec (e.g. any video codec supporting independently decodable portions or slices) as known to those of skill in the art.
  • the area of interest overlaps eight slices, and does not include the top two and bottom two video slices.
  • the client device can request to only receive eight of the total twelve video segments from (e.g. encoded by) the zoom coding encoder.
  • the client device can display the area of interest in the full resolution of 2K x 1K without scaling the video and without the need to receive all of the available zoom coded segments.
  • FIG. 5 depicts an example of a digital video with an object of interest, in accordance with an embodiment.
  • FIG. 5 depicts an example digital video 500.
  • the digital video 500 includes a plurality of video slices 502a, 502b, etc.
  • as illustrated in FIG. 5, the use of zoom coded sub-streams allows a user to view a zoomed version of an object or area of interest that moves such that it overlaps different slices at different times.
  • the source video 500 has resolution dimensions of 3840 horizontally by 2160 vertically.
  • each of the video slices 502a, 502b, etc. has approximate resolution dimensions of 800 horizontally by 333 vertically.
  • the source video 500 may be encoded by various ABR encoders and zoom coding encoders, which provide the encoded video streams to a streaming server for further transmission over a network to a client device.
  • An object of interest, depicted by a soccer ball, is located at position 504a (inside slice 502c) at a first time T1.
  • the position of the ball may be represented by a data structure (P1, T1), where P1 represents the position 504a.
  • the object of interest is located further up and to the right (in slice 502d) at position 504b, which may be represented by (P2,T2).
  • the object of interest is located further up and to the right (in slice 502e) at position 504c, which may be represented by (P3,T3).
  • a client device may initially (for viewing of time period T1) request slice 502c (and, in some embodiments, immediately neighboring slices).
  • the client device, on receiving the requested slices, causes display of a zoomed-in stream that includes the object of interest.
  • the client device may subsequently (for viewing of time period T2) request and display slice 502d (and, in some embodiments, immediately neighboring slices).
  • the client device may subsequently (for viewing of time period T3) request and display slice 502e (and, in some embodiments, immediately neighboring slices).
  • Selection of the appropriate slices to show the object of interest in context may be performed on the client device or at the streaming server.
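The per-period slice selection for a moving object can be sketched as a mapping from the object's position to its containing slice, optionally padded with the immediate neighbors mentioned above. The grid dimensions and ball positions below are illustrative assumptions, not values from the disclosure.

```python
# Sketch of per-period slice selection for a moving object of interest,
# following the FIG. 5 example: at each time period, request the slice that
# contains the object plus (optionally) its immediate neighbors. The grid
# dimensions and ball positions are illustrative assumptions.

def slice_of(point, frame_size, grid):
    """Map an (x, y) object position to the (col, row) of its containing slice."""
    x, y = point
    W, H = frame_size
    cols, rows = grid
    return int(x // (W / cols)), int(y // (H / rows))

def slices_with_neighbors(point, frame_size, grid):
    """Containing slice plus its 8-neighborhood, clipped to the grid edges."""
    cols, rows = grid
    c0, r0 = slice_of(point, frame_size, grid)
    return [(c, r)
            for r in range(max(r0 - 1, 0), min(r0 + 2, rows))
            for c in range(max(c0 - 1, 0), min(c0 + 2, cols))]

# Assumed positions (P1, T1), (P2, T2), (P3, T3), with y measured from the
# top, so a ball moving "up and to the right" has decreasing y:
for label, pos in [("T1", (900, 1500)), ("T2", (1900, 1000)), ("T3", (2900, 500))]:
    print(label, slice_of(pos, (3840, 2160), (4, 3)),
          slices_with_neighbors(pos, (3840, 2160), (4, 3)))
```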
  • the concepts in this disclosure may apply to larger objects, objects which span multiple neighboring slices, objects traversing slices at different speeds, multiple objects, source video streams segmented into smaller segments, and the like.
  • a rendering reference point or "render point” may be used to indicate a rendering position associated with one or more positions of the object/area of interest.
  • the rendering reference point may, for example, indicate a position (e.g. a corner or an origin point) of a renderable region which contains the object of interest at some point in time.
  • the rendering reference point may indicate a size or extent of the renderable region.
  • the rendering reference point may define a bounding box which defines the location and extent of the object/area of interest or of the renderable region containing the object/area of interest.
  • the client may use the rendering reference point information to extract the renderable region from one or multiple zoom coded sub- streams or segments, and may render the region as a zoomed region of interest on the client display.
  • the rendering reference point (0, 0) for the first set of video segments is depicted in the bottom left corner of the source video 500.
  • the second set of video segments has a rendering reference point of (a, b), and is depicted in the bottom left corner of slice 502f.
  • a discrete jump in the rendering reference point from (0, 0) to (a, b) as the object transitions from (P1, T1) to (P3, T3) will cause an abrupt change in the location of the object of interest as displayed on the client device.
  • the rendering reference point as communicated to the client may be updated on a frame-by-frame basis, which may allow the client to continuously vary the location of the extracted renderable region, so that the object of interest may be smoothly tracked on the client display. Alternatively, the rendering reference point may be updated more coarsely in time, in which case the client may interpolate the rendering position between updates in order to smoothly track the object of interest when displaying the renderable region on the client display.
  • the rendering reference point may include two parameters, a horizontal distance and a vertical distance, represented by (x, y).
  • the rendering reference point may be carried in-band, for example in a Supplemental Enhancement Information (SEI) message.
  • the render reference point may be updated to reflect the global object motion between each frame.
  • if the render reference adjustment is equal to the global motion of the object of interest, the object will appear motionless (e.g. having a relatively constant position relative to the displayed region), as if the camera were panning to keep the object at the same point on the screen.
  • if the motion of the object of interest is underestimated, the object skips backwards on the screen.
  • if the motion of the object of interest is overestimated, the object skips forwards between frames. Minimizing the error of the object motion estimate results in smooth rendering.
  • the video display transitions from the first set of video segments (and, in some embodiments, video segments which contain slices in the spatial neighborhood of the first set of video segments) to the second set of video segments (and, in some embodiments, video segments which contain slices in the spatial neighborhood of the second set of video segments) when the object of interest is at (P2, T2). Therefore, in this embodiment, the render reference point for each frame transmitted is adjusted (e.g. interpolated) to smoothly transition from (0, 0) to (a, b) over the time from T1 to T2.
  • the smooth transition may be linear (e.g. moving the rendering reference point a set distance equally each frame) or non-linear (e.g. accelerating and then decelerating the movement of the rendering reference point over the course of the transition).
  • the rendering reference point is transmitted as two coordinates, such as (x, y), and in other embodiments, the rendering reference point is transmitted as a differential from the previous frame.
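A minimal sketch of this interpolation, assuming a render point that moves linearly from (0, 0) at T1 to a hypothetical (a, b) = (800, 400) at T2 over two seconds of 30 fps playback:

```python
# Sketch of render point interpolation: the render point moves from (0, 0)
# at time T1 to (a, b) at time T2, and the client linearly interpolates a
# per-frame render point so the displayed window tracks the object smoothly.
# Frame rate and endpoint values are assumptions for illustration.

def interpolate_render_point(p_start, p_end, t_start, t_end, t):
    """Linear interpolation of the render point for display time t."""
    if t <= t_start:
        return p_start
    if t >= t_end:
        return p_end
    alpha = (t - t_start) / (t_end - t_start)
    return (p_start[0] + alpha * (p_end[0] - p_start[0]),
            p_start[1] + alpha * (p_end[1] - p_start[1]))

# Example: (a, b) = (800, 400), T1 = 0 s, T2 = 2 s, 30 fps playback.
for frame in range(0, 61, 15):
    t = frame / 30.0
    print(frame, interpolate_render_point((0, 0), (800, 400), 0.0, 2.0, t))
# frame 30 (t = 1.0 s) yields the halfway point (400.0, 200.0)
```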
  • FIG. 6 depicts an example process for encoding and delivery of content to a client using adaptive bitrate coding.
  • source content is communicated from a content source 604 to an encoder 606.
  • the source content is a compressed or uncompressed stream of digital video.
  • the encoder 606 encodes the video into several representations 608 with different bitrates, different resolutions, and/or other different characteristics and transmits those representations 608 to a transport packager 610.
  • the transport packager 610 uses the representations 608 to generate segments of, e.g., a few seconds in duration.
  • the transport packager 610 further generates a manifest (e.g. a DASH MPD) describing the available segments.
  • the generated manifest and the segmented files are distributed to one or more edge streaming servers 614 through an origin server 612. Subsequent segments (collectively 617) are also distributed to the origin server 612 and/or the edge streaming server 614.
  • a client 620 visits a web server 618, e.g. by sending an HTTP GET request 622.
  • the web server 618 may send a response 624 that directs or redirects the client 620 to a streaming server such as the edge streaming server 614.
  • the client thus sends a request 626 to the edge streaming server.
  • the edge streaming server sends a manifest (e.g. a DASH MPD) 628 to the client.
  • the client selects an appropriate representation of the content and issues a request 630 for an appropriate segment (e.g. the first segment of recorded content, or the most recent segment of live content).
  • the edge streaming server responds by providing the requested segment 632 to the client.
  • the client may request a subsequent segment of the content (which may be at the same bitrate or a different bitrate from the segment 632), and the subsequent segment is sent to the client at 636.
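From the client side, the message sequence of FIG. 6 reduces to fetching the manifest and then requesting segments one after another. The sketch below uses the requests library with hypothetical URLs and a hypothetical segment-naming pattern; a real DASH client would derive segment URLs from the parsed MPD and could switch bitrate between requests.

```python
# Client-side sketch of the FIG. 6 message sequence: fetch the manifest
# (message 628), then request consecutive segments (630/634). The server URL
# and segment-naming pattern are hypothetical placeholders.
import requests

EDGE_SERVER = "https://edge.example.com"  # hypothetical edge streaming server

def decode_and_render(data: bytes) -> None:
    pass  # placeholder: decoding/rendering is outside the scope of this sketch

def stream_first_segments(content_id: str, count: int = 3) -> None:
    manifest = requests.get(f"{EDGE_SERVER}/{content_id}/manifest.mpd")
    manifest.raise_for_status()
    for i in range(1, count + 1):
        # Each request could target a different bitrate representation,
        # based on network conditions measured since the previous segment.
        seg = requests.get(f"{EDGE_SERVER}/{content_id}/rep_hd/segment_{i}.m4s")
        seg.raise_for_status()
        decode_and_render(seg.content)
```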
  • FIG. 7 depicts an example process of encoding and delivery of content to a client using zoom coding.
  • source content is communicated from a content source 704 to an encoder 706.
  • the source content is a compressed or uncompressed stream of digital video.
  • the encoder 706 encodes the video into several representations 708 of the complete screen area with different bitrates, different resolutions, and/or other different characteristics and transmits those representations 708 to a transport packager 710.
  • the zoom coding encoder encodes the video into several different slice streams (e.g. streams 712, 714) representing different areas of the complete video image.
  • Each of the streams 712 may represent a first encoded slice area of the content, with each of the streams being encoded at a different bitrate, and each of the streams 714 may represent a second encoded slice area of the content, again with each of the streams being encoded at a different bitrate.
  • additional slice streams representing various encoded bit rates for other content slices may be included, though not shown in the figure.
  • the transport packager 710 uses the representations 708, 712, 714 to generate segments of, e.g., a few seconds in duration.
  • the transport packager 710 further generates a manifest (e.g. a DASH MPD) describing the available segments, including the segments that represent the entire screen and segments that represent only a slice area of the screen.
  • the generated manifest and the segmented files are distributed to one or more streaming servers such as edge streaming server 720 through an origin server 718.
  • a client 724 visits a web server 722, e.g. by sending an HTTP GET request 726.
  • the web server 722 may send a response 728 that directs or redirects the client 724 to the edge streaming server 720.
  • the client thus sends a request 730 to the edge streaming server.
  • the edge streaming server sends a manifest (e.g. a DASH MPD) 732 to the client.
  • the client selects an appropriate representation of the normal (unzoomed) content and issues a request 734 for an appropriate segment (e.g. the first segment of recorded content, or the most recent segment of live content).
  • the edge streaming server responds by providing the requested unzoomed segment 736 to the client.
  • the client may request, receive, parse, decode and display additional unzoomed segments in addition to the segment 736 shown in the diagram.
  • the client device 724 may issue a request 738 for one or more sub-streams that are associated with an object or region of interest.
  • the client device identifies the streams to be requested based on, e.g. information such as render point information which may be provided in the manifest or in-band in the video streams.
  • the client device identifies the object or region of interest and forms a request based on the identified object or region of interest, and the identification of appropriate streams for that object or region of interest is made at the server side. Such server-identified streams or segments may then be returned by the server to the client in response to the request.
  • the appropriate slice stream or streams 740 are sent to the client device 724, and the client device decodes and combines the streams 740 to provide a zoomed version of the object or region of interest.
  • the client may request and receive the stream or streams 740 at a bitrate appropriate to the capabilities of the client device and the current network conditions using ABR techniques.
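  • The sub-stream request and combination steps (738/740) can be sketched as follows, assuming the manifest has already been parsed into a mapping from object IDs to the URLs of the slice sub-streams covering each object; the mapping and URL scheme are hypothetical.

    import urllib.request

    def fetch_object_slices(object_to_slice_urls, object_id):
        """Fetch every slice sub-stream associated with one object of
        interest (request 738); the client then decodes and combines the
        returned slices into the zoomed view (step 740)."""
        return [urllib.request.urlopen(url).read()
                for url in object_to_slice_urls[object_id]]

    # Hypothetical manifest-derived mapping: object 5 is covered by two slices.
    slice_urls = {5: ["https://edge.example.com/slice_16/seg_1.m4s",
                      "https://edge.example.com/slice_17/seg_1.m4s"]}
    encoded_slices = fetch_object_slices(slice_urls, 5)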
  • More than one object of interest can be tracked and displayed.
  • A first object may be associated with a first set of slices, such that a client must retrieve the slices of the first set in order to recover and render a view (e.g. a zoomed view) of the first object.
  • A second object may be associated with a second set of slices, such that the client must retrieve the slices of the second set in order to recover and render a view (e.g. a zoomed view) of the second object.
  • The first set of slices may be completely different from, partially overlapping with, or fully overlapping with the second set of slices.
  • The amount of overlap between the first and second sets of slices may change with time as the underlying objects move.
  • The render point information may be independently encoded for each such set and may be contained in different slices or the same slice.
  • The receiver must retrieve the appropriate rendering point (corresponding to a current zoom coded object) and apply the render point offset accordingly.
  • As the object, or objects, of interest move through the screen, there may be changes to the sets of slices that represent the new zoomed view.
  • The manifest may be updated to signal such changes, or a completely new manifest can be created.
  • The client device may use the updated manifest information to appropriately request the set of slices that represent the updated view.
  • The changes may also be signaled in-band in the video stream, or in side information such as a render point metadata file retrievable by the client.
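  • The overlap behavior described above can be made concrete with a small example; the slice indices are hypothetical. A client tracking both objects requests the union of the two slice sets, so slices shared by both objects are fetched and decoded only once:

    slices_for_first_object = {5, 6, 10, 11}
    slices_for_second_object = {6, 7, 11, 12}

    slices_to_request = slices_for_first_object | slices_for_second_object
    overlap = slices_for_first_object & slices_for_second_object   # {6, 11}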
  • The request for streams may correspond to a particular area of the video or to an object ID.
  • For example, suppose the video source is video of a soccer (a.k.a. football) game.
  • Examples of different objects may include a goal box, a ball, or a player.
  • The objects may be detected via any means, including image detection (e.g. detecting the rectangular dimensions of a goal, the round shape of a ball, or numbers on a uniform), spatial information encoded in the source video (e.g. correlation between camera position and a stationary object's position, or sensor information transmitted from a soccer ball), or any other similar method.
  • A client device can request to receive zoom coded sub-streams associated with an object of interest such as the ball.
  • The request may also include a magnitude of context to include with the ball, such that the ball comprises a certain percentage of the display, or the like.
  • The magnitude of context may be specified as a rendering area size, for example a horizontal dimension and a vertical dimension in pixels.
  • A network entity or the client device can determine the appropriate zoom coded sub-streams to frame the object of interest, and may communicate to the streaming server which zoom coded sub-streams to send to the client device.
  • The client device receives the zoom coded sub-streams and the appropriate rendering information to display the zoomed-in video stream.
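  • One possible shape for such a request, expressed as an HTTP query string, is sketched below; the parameter names and the convention that the server resolves an object name to sub-streams are assumptions, since no request syntax is fixed here.

    from urllib.parse import urlencode

    # Request the sub-streams covering the ball, with a 1280x720 rendering area.
    params = {"object": "ball", "render_w": 1280, "render_h": 720}
    request_url = "https://edge.example.com/zoom?" + urlencode(params)
    # e.g. https://edge.example.com/zoom?object=ball&render_w=1280&render_h=720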
  • The spatial regions of the object, or objects, of interest may be determined at the streaming server, at the client device, at a separate network entity, or by a combination of the above.
  • In some embodiments, the server side creates the arbitrary spatial region, for example by mapping the streams to slices for encoding.
  • In other embodiments, the client-device side creates or assembles the arbitrary spatial regions by, for example, decoding more than one spatial content portion (e.g. more than one slice or video segment) from the server and combining parts of the decoded spatial content portions in order to create or assemble the desired spatial region (see the sketch after this list).
  • In still other embodiments, a hybrid server-side/player-side approach creates the arbitrary spatial regions.
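  • A sketch of client-side assembly follows, under stated assumptions: decoded slices arrive as HxWx3 numpy arrays with known pixel origins (e.g. derived from SRD information), the full frame is 3840x2160, and the view is cropped at a render point with a given rendering area.

    import numpy as np

    def assemble_region(decoded_slices, render_point, area,
                        canvas_size=(2160, 3840)):
        """decoded_slices: iterable of (origin_x, origin_y, pixels) tuples;
        render_point: (x, y) of the view's upper-left corner;
        area: (width, height) of the rendering area in pixels."""
        canvas = np.zeros((canvas_size[0], canvas_size[1], 3), dtype=np.uint8)
        for ox, oy, pixels in decoded_slices:
            h, w = pixels.shape[:2]
            canvas[oy:oy + h, ox:ox + w] = pixels    # paste slice at its origin
        x, y = render_point
        w, h = area
        return canvas[y:y + h, x:x + w]              # crop the zoomed view

    # Usage: two hypothetical 768x360 slices, cropped to a 1280x720 view.
    slices = [(0, 1080, np.zeros((360, 768, 3), np.uint8)),
              (768, 1080, np.zeros((360, 768, 3), np.uint8))]
    view = assemble_region(slices, render_point=(100, 1100), area=(1280, 720))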
  • Zoom-coded regions can include variations of frame rate, chroma resolution, and bit depth characteristics.
  • The ABR Streams for Each Segment as shown in FIG. 4B may be encoded using such variations.
  • Zoom coded sub-streams or segments may be packaged using MPEG-2 transport stream segments, or using an ISO Base Media file format.
  • Zoom-coded sequences or segments may be created with additional bit depth for specific spatial regions.
  • The regions of enhanced bit depth may correspond to the areas or objects of interest, for example.
  • Two-way interaction may be used to optimize for client-side display capabilities.
  • Creation of special effects may be provided, such as slow motion and zoom.
  • FIG. 8 depicts an example communications process, in accordance with an embodiment.
  • FIG. 8 depicts a DASH-type exchange between a streaming server 802 and a client device 804 to receive a zoom coded sub-stream.
  • The client device 804 sends a request 808 to a web server 806 for streaming services, and the web server at 810 directs or redirects the client device 804 to the edge streaming server 802.
  • The edge streaming server sends an extended MPD 814, with zoom coded information, to the client device 804.
  • The client device parses the extended MPD in order to determine what objects/areas of interest are available and also to determine, in step 816, the slices to request for each object.
  • The client sends requests for the appropriate slices (e.g. requesting a first slice at 818 and requesting a second slice at 820).
  • The requested slices may be a subset of the available slices, and/or may be requested by requesting the video segments which contain the slices.
  • The edge streaming server sends each requested slice of the video stream to the client device (e.g. sending the first slice at 822 and the second slice at 824), and the client device renders the zoom coded frame for the specific object in step 826.
  • The client device then causes display of the zoom coded frame, e.g. by displaying the frame on a built-in screen or by transmitting information representing the frame to an external display.
  • Composition of the zoom coded frame by the client may comprise receiving, decoding, and/or rendering the requested slices for an object/area of interest.
  • The client may render a subset of the pixels of the requested slices, as determined by a current render point and/or a rendering area size or context magnitude indication for the object.
  • The DASH-type message may include additional extensions to support tracking of multiple objects with overlapping slices.
  • Zoom coding may be enabled using MPEG-DASH.
  • MPEG-DASH is specified in ISO/IEC 23009-1:2014.
  • An exemplary process for performing zoom coding using MPEG-DASH may be performed as follows. A determination is made that a zoom coded representation is available, along with how to access that content. This information is signaled to the DASH client using syntax in the MPD descriptor. Per Amendment 2 of the ISO DASH standard, the MPD may provide a "supplementary stream." This supplementary stream may be utilized for zoom coding.
  • A spatial relationship descriptor (SRD) syntax element can describe a spatial portion of an image (see Annex H of ISO 23009-1 AM2).
  • An object render point provided in the video bitstream is used to render the zoomed section for the object being tracked.
  • The zoomed section may, for example, be rendered with uniform motion or with interpolated motion as described herein.
  • The render point for the object (or objects) may be sent in user data for one or more slices as an SEI message.
  • The SEI message may be as defined in a video coding standard such as AVC/H.264 or HEVC/H.265. Zero or more objects may be signaled per slice.
  • Exemplary slice user data for object render points includes the following parameters:
  • Object_ID: Range 0-255. This syntax element provides a unique identifier for each object.
  • Object_x_position[n]: For each object ID n, the x position of the object bounding box.
  • Object_y_position[n]: For each object ID n, the y position of the object bounding box.
  • Object_x_size_in_slice[n]: For each object ID n, the x dimension of the object bounding box.
  • Object_y_size_in_slice[n]: For each object ID n, the y dimension of the object bounding box.
  • The object bounding box represents a rectangular region that encloses the object.
  • The object bounding box may also enclose some amount of surrounding context to be rendered with the object.
  • The x,y position may indicate, for example, the upper left corner position of the object bounding box.
  • The object position and size may pertain to the portion of the object contained in the slice that contains the user data.
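  • The document does not define a byte-level serialization for these parameters; purely for illustration, the following sketch assumes a one-byte object count followed, per object, by a one-byte Object_ID and four 16-bit big-endian position/size fields.

    import struct

    def parse_object_render_points(payload: bytes):
        """Parse the hypothetical slice user data layout described above."""
        count = payload[0]
        objects, offset = [], 1
        for _ in range(count):
            object_id = payload[offset]; offset += 1
            x, y, w, h = struct.unpack_from(">4H", payload, offset); offset += 8
            objects.append({"id": object_id, "x": x, "y": y, "w": w, "h": h})
        return objects

    # One object (ID 5) at (0, 1080) with a 768x360 bounding box:
    data = bytes([1, 5]) + struct.pack(">4H", 0, 1080, 768, 360)
    assert parse_object_render_points(data) == [
        {"id": 5, "x": 0, "y": 1080, "w": 768, "h": 360}]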
  • The video depicted in FIG. 5 may be used in the implementation of zoom coding of a video with a resolution of 4K, or 3840x2160.
  • The 4K video is encoded with H.264 compression into thirty independent H.264 slices. Each slice is 768x360 pixels (five slice columns by six slice rows).
  • The native full image is scaled down to HD 1920x1080 and provided as the normal unzoomed stream for a client device to display. Additionally, each of the thirty slices is encoded in the native 768x360 resolution.
  • The encoder tracks an object as shown in the figure moving across the scene.
  • The subset of slices covering the object is signaled to the client via the MPD SRD descriptor. For each slice, an Adaptation Set with an SRD descriptor is provided.
  • The SRD descriptor syntax may be extended to allow the client device to determine which slices are needed for rendering the given objects.
  • The Object_ID (consistent with the slice SEI information) is included in the SRD, added to the end of the "value" syntax for SRD. If multiple objects are associated with the slice, then multiple Object_IDs may be added to the end of the SRD value syntax.
  • The Spatial Set ID may also be present in the SRD value. After the Spatial Set ID parameter, up to 256 Object_IDs can be included.
  • Two example SupplementalProperty SRD elements are described below.
  • Example 1: SRD with 1 object (Object_ID 5).
  • Example 2: SRD with 5 objects (Object_IDs 2, 4, 7, 9, and 14).
  • The SupplementalProperty syntax element may be used to provide an association between particular spatial portions of a video (e.g. particular slices) and particular objects of interest.
  • Example 1 above provides an association between the slice numbered 16 and an object numbered 5.
  • Example 2 above provides an association between the slice numbered 16 and objects numbered 2, 4, 7, 9, and 14.
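  • The XML of the two examples is not reproduced here; a hypothetical reconstruction is sketched below, assuming the SRD value ordering of ISO/IEC 23009-1 Annex H (source_id, object_x, object_y, object_width, object_height, total_width, total_height, spatial_set_id) with the Object_IDs appended, the slice-16 geometry given below (origin 0,1080, size 768x360 within 3840x2160), and illustrative source_id and Spatial Set ID values.

    # Example 1: slice 16 associated with Object_ID 5.
    example_1 = ('<SupplementalProperty schemeIdUri="urn:mpeg:dash:srd:2014" '
                 'value="1, 0, 1080, 768, 360, 3840, 2160, 0, 5"/>')

    # Example 2: slice 16 associated with Object_IDs 2, 4, 7, 9, and 14.
    example_2 = ('<SupplementalProperty schemeIdUri="urn:mpeg:dash:srd:2014" '
                 'value="1, 0, 1080, 768, 360, 3840, 2160, 0, 2, 4, 7, 9, 14"/>')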
  • A client device may request all slices that are associated with the selected object.
  • xM,yM refers to the x,y position of the origin of slice M.
  • These may be pixel values.
  • For example, x16,y16 would be equal to 0, 1080.
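  • That value follows from the slice geometry: with 768x360 slices in a 3840x2160 frame there are five slice columns and six rows, so, assuming slices are numbered 1-30 in raster order, slice 16 begins the fourth row. A small worked check:

    def slice_origin(m, cols=5, slice_w=768, slice_h=360):
        """Pixel origin (x, y) of slice m, assuming raster-order numbering
        from 1; this numbering is an assumption consistent with the
        x16,y16 = 0,1080 example above."""
        row, col = divmod(m - 1, cols)
        return col * slice_w, row * slice_h

    assert slice_origin(16) == (0, 1080)
    assert slice_origin(1) == (0, 0)
    assert slice_origin(30) == (3072, 1800)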
  • As each frame is encoded (e.g. with H.264, HEVC, or the like), a user data entry may be inserted for each slice and may be used to provide the object position information.
  • The client uses this information to provide a smoothly rendered picture with the object being tracked on the client display.
  • As the tracked object moves, the MPD may be updated with a new list of slices for the client to access.
  • This MPD change may coincide with a sequence access point (SAP) in the DASH segments.
  • The MPD excerpt below contains one full-frame (unzoomed) HD representation followed by nine slice representations; the slice representations share identical attributes, so only the first is written out, and the child elements of each Representation are elided:

    <Title>Example of a DASH Media Presentation Description using Spatial Relationship Description to indicate that a video is a zoomed part of another</Title>

    <!-- Full-frame (unzoomed) HD representation -->
    <Representation mimeType="video/mp4" codecs="avc1.42c033"
        width="1920" height="1080" bandwidth="1055223" startWithSAP="1">
      ...
    </Representation>

    <!-- Slice representation (repeated, with identical attributes, for each
         of the nine slice representations in the listing) -->
    <Representation mimeType="video/mp4" codecs="avc1.42c033"
        width="768" height="360" bandwidth="1055223" startWithSAP="1">
      ...
    </Representation>
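  • Given such an MPD with the extended SRD values described earlier, a client can determine which slice representations to request for a chosen object; a sketch follows, assuming value strings in the Annex H ordering with Object_IDs appended.

    def slices_for_object(srd_values, object_id):
        """srd_values: mapping slice_index -> SRD 'value' string.
        Returns the slice indices whose SRD lists the given Object_ID."""
        wanted = []
        for slice_index, value in srd_values.items():
            fields = [int(f) for f in value.split(",")]
            if object_id in fields[8:]:      # fields 0-7 are the standard SRD tuple
                wanted.append(slice_index)
        return wanted

    # Hypothetical SRD values for two adjacent slices:
    srd = {16: "1, 0, 1080, 768, 360, 3840, 2160, 0, 5",
           17: "1, 768, 1080, 768, 360, 3840, 2160, 0, 5, 9"}
    assert slices_for_object(srd, 5) == [16, 17]
    assert slices_for_object(srd, 9) == [17]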
  • Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs).
  • A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Systems and methods are described that enable video clients to zoom in on a region or object of interest without a noticeable loss of resolution. In an exemplary method, a server transmits a manifest, such as a DASH MPD, to a client device. The manifest identifies a plurality of sub-streams, each sub-stream representing a respective spatial portion of a source video. The manifest also includes information associating an object of interest with a plurality of spatial portions. To view a high-quality zoomed version of the video, the client requests the sub-streams that are associated with the object of interest and renders the requested sub-streams. In some embodiments, a render point is transmitted to the client to accommodate movement of the object of interest.
PCT/US2016/046317 2015-08-14 2016-08-10 Method and systems for displaying a portion of a video stream WO2017030865A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562205492P 2015-08-14 2015-08-14
US62/205,492 2015-08-14

Publications (1)

Publication Number Publication Date
WO2017030865A1 true WO2017030865A1 (fr) 2017-02-23

Family

ID=56802680

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/046317 WO2017030865A1 (fr) Method and systems for displaying a portion of a video stream

Country Status (2)

Country Link
TW (1) TW201724868A (fr)
WO (1) WO2017030865A1 (fr)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140089990A1 (en) * 2011-06-08 2014-03-27 Nederlandse Organisatie Voor Toegepast- Natuurwetenschappelijk Onderzoek Tno Spatially-Segmented Content Delivery
WO2014057131A1 (fr) * 2012-10-12 2014-04-17 Canon Kabushiki Kaisha Method and corresponding device for streaming video data
EP2824885A1 (fr) * 2013-07-12 2015-01-14 Alcatel Lucent A manifest file format supporting panoramic video
WO2015014773A1 (fr) * 2013-07-29 2015-02-05 Koninklijke Kpn N.V. Providing tiled video streams to a client

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ADITYA MAVLANKAR ET AL: "An interactive region-of-interest video streaming system for online lecture viewing", PACKET VIDEO WORKSHOP (PV), 2010 18TH INTERNATIONAL, IEEE, 13 December 2010 (2010-12-13), pages 64 - 71, XP031899005, ISBN: 978-1-4244-9522-1, DOI: 10.1109/PV.2010.5706821 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115766679A (zh) * 2017-03-23 2023-03-07 Vid Scale, Inc. Metrics and messages to improve experience for 360-degree adaptive streaming
US20190014165A1 (en) * 2017-07-10 2019-01-10 Qualcomm Incorporated Processing media data using a generic descriptor for file format boxes
US11665219B2 (en) * 2017-07-10 2023-05-30 Qualcomm Incorporated Processing media data using a generic descriptor for file format boxes
WO2020121322A3 (fr) * 2018-12-12 2020-07-30 Sling Media Pvt Ltd. Systems, methods, and devices for optimizing streaming bitrate based on multi-client display profiles
US11463758B2 (en) 2018-12-12 2022-10-04 Sling Media Pvt. Ltd. Systems, methods, and devices for optimizing streaming bitrate based on multiclient display profiles

Also Published As

Publication number Publication date
TW201724868A (zh) 2017-07-01

Similar Documents

Publication Publication Date Title
WO2018049321A1 (fr) Method and systems for displaying a portion of a video stream with partial magnification ratios
US10841566B2 (en) Methods and apparatus of viewport adaptive 360 degree video delivery
CN109076239B (zh) Circular fisheye video in virtual reality
US10893256B2 (en) Apparatus, a method and a computer program for omnidirectional video
US10917564B2 (en) Systems and methods of generating and processing files for partial decoding and most interested regions
US20180176468A1 (en) Preferred rendering of signalled regions-of-interest or viewports in virtual reality video
US11109044B2 (en) Color space conversion
JP6466324B2 (ja) Devices and methods for multimedia communications with picture orientation information
JP6607414B2 (ja) Image encoding device and method
TW201840201A (zh) Advanced signaling of regions of interest in omnidirectional visual media
US9674499B2 (en) Compatible three-dimensional video communications
TW201720170A (zh) Methods and systems for client interpretation and presentation of zoom coded content
KR20170005366A (ko) Apparatus and method for extracting images from a high resolution image
WO2014203763A1 (fr) Decoding device, decoding method, encoding device, and encoding method
WO2017123474A1 (fr) System and method for operating a video player to play videos in an enriched mode
WO2017030865A1 (fr) Method and systems for displaying a portion of a video stream
EP3123714B1 (fr) Négociation d'orientation vidéo
WO2017180439A1 (fr) System and method for fast stream switching with cropping and upscaling in a client player
WO2021125185A1 (fr) Systèmes et procédés de signalisation d'informations de bouclage de point de vue dans du contenu multimédia omnidirectionnel

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16757771

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16757771

Country of ref document: EP

Kind code of ref document: A1