US20180332287A1 - Method, apparatus and computer program product for providing improved data compression - Google Patents

Publication number
US20180332287A1
Authority
US
United States
Prior art keywords
data
input data
high priority
priority data
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/951,781
Inventor
Jani K. Nurminen
Sakari Himanen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Conversant Wireless Licensing SARL
Original Assignee
Conversant Wireless Licensing SARL
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Conversant Wireless Licensing SARL
Priority to US15/951,781
Assigned to CPPIB CREDIT INVESTMENTS, INC.: AMENDED AND RESTATED U.S. PATENT SECURITY AGREEMENT (FOR NON-U.S. GRANTORS). Assignors: CONVERSANT WIRELESS LICENSING S.A R.L.
Publication of US20180332287A1
Assigned to CONVERSANT WIRELESS LICENSING S.A R.L.: RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: CPPIB CREDIT INVESTMENTS INC.
Legal status: Abandoned

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/24 Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/196 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032 Quantisation or dequantisation of spectral components

Definitions

  • Embodiments of the present invention relate generally to data compression technology and, more particularly, relate to a method, apparatus and computer program product for providing compression of data for use in various environments such as, for example, speech coding, video coding, audio coding, image coding, etc.
  • the data processing services may be provided from a network server or other network device, from the mobile terminal such as, for example, a mobile telephone, a mobile television, a mobile gaming system, etc., or even from a combination of the mobile terminal and the network device.
  • the data processing services may include transmission, reception, coding, decoding, storage, or other processing of, for example, image files, speech, video or audio signals, etc.
  • a user of a mobile terminal who wishes to wirelessly communicate data comprising images, speech, audio, video, etc., which has been captured by the mobile terminal may experience delays in transmitting the data to the network device.
  • the delays may result due to the relatively long transmission times required to transmit relatively large quantities of data (e.g., large image files captured by imaging devices on modern mobile terminals).
  • compression techniques have been developed in order to reduce transmission times and storage requirements.
  • compression techniques may introduce errors or reduce the quality of the compressed data when the data is decompressed for rendering.
  • spectral information related to a speech signal may be considered to include two portions.
  • a first portion, including a model of the human vocal tract, may be modeled using linear prediction and is referred to as the spectral envelope.
  • the spectral envelope is typically considered the most important part of the spectral information.
  • a second portion of the spectral information may include remaining parts of the spectral information which include detailed information that, while useful in adding to the speech quality, may not be necessary for clear comprehension of the speech.
  • the second portion may be referred to as the residual spectrum or excitation spectrum.
  • the residual spectrum typically has lower perceptual importance, but if the residual spectrum is neglected, speech quality usually suffers to some extent. Since residual information comprising the residual spectrum includes details that cannot be easily modeled by the spectral envelope, the amount of information to be quantized therein may be high.
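The split between a linear-prediction envelope and a residual can be illustrated with a minimal sketch. It assumes the common autocorrelation method with the Levinson-Durbin recursion, which the text does not specify; the function names and the synthetic test frame are hypothetical. The residual is computed here in the time domain, and its spectrum would play the role of the residual (excitation) spectrum described above.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order],
    where the prediction is x_hat[n] = sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]  # prediction error energy, shrinks with each order
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(x, coeffs):
    """Residual e[n] = x[n] - prediction: what the envelope model misses."""
    residual = []
    for n in range(len(x)):
        pred = sum(c * x[n - k]
                   for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(x[n] - pred)
    return residual


# Hypothetical test frame: a decaying exponential, which a first-order
# predictor models almost perfectly.
frame = [0.9 ** n for n in range(200)]
coeffs, _ = levinson_durbin(autocorrelation(frame, 1), 1)
residual = lpc_residual(frame, coeffs)
```

For this synthetic frame the single coefficient comes out very close to 0.9 and the residual is essentially zero after the first sample. Real speech leaves a much richer residual, which is why quantizing it can be costly.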
  • a method, apparatus and computer program product are therefore provided that may enable data compression of both high priority data and low priority data with relatively low bit rates.
  • a main quantizer may be trained for processing the high priority data and a side model may simultaneously be trained for processing the low priority data.
  • training for processing of both the high and low priority data may be completed in order to account for high and low priority data jointly.
  • processing of data may occur without negatively impacting quantizer performance. Accordingly, data compression efficiency may be achieved while maintaining relatively high speech quality for low bit rate speech coding.
  • a method for providing improved data compression may include partially encoding the input data by encoding the high priority data using a quantizer trained with respect to high priority data, and modeling low priority data among the input data using a side model trained jointly with the training of the quantizer.
  • a computer program product for providing improved data compression includes at least one computer-readable storage medium having computer-readable program code portions stored therein.
  • the computer-readable program code portions include first, second and third executable portions.
  • the first executable portion is for partially encoding the input data by encoding the high priority data using a quantizer trained with respect to high priority data.
  • the second executable portion is for modeling low priority data among the input data using a side model trained jointly with the training of the quantizer.
  • an apparatus for providing improved data compression may include an encoder comprising a quantizer for encoding input data and a side model.
  • the quantizer may be trained with respect to high priority data among the input data and may be configured to partially encode the input data by encoding the high priority data.
  • the side model may be trained jointly with the training of the quantizer and is configured to model low priority data among the input data.
  • an apparatus for decoding compressed data includes a side model and a decoder including a dequantizer.
  • the decoder may be configured to decode compressed data using the dequantizer to recover high priority data from a received signal.
  • the side model may be configured to recover low priority data from the received signal.
  • the side model may be trained jointly with the training of a quantizer used to encode the received signal.
  • the quantizer may be trained with respect to the high priority data.
  • a method for decoding compressed data may include decoding compressed data using a dequantizer to recover high priority data from a received signal, and recovering low priority data from the received signal using a side model trained jointly with the training of a quantizer used to encode the received signal.
  • the quantizer may be trained with respect to the high priority data.
  • Embodiments of the present invention may enable relatively high quality encoding of data in low bit rate environments. Accordingly, bits may be preserved for the primary purpose of encoding higher priority data, rather than devoting bits to encoding of less important data (e.g., lower priority data).
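The encode and decode paths summarized above can be sketched as follows. This is an illustrative toy and not the method claimed in the application: it assumes scalar data, a nearest-neighbour codebook as the quantizer, and a side model that is simply a per-codeword average of the low priority values, updated in the same training loop as the codebook. All names (`train_jointly`, `encode`, `decode`) and the sample data are hypothetical.

```python
def encode(high, codebook):
    """Quantize the high priority value: return the nearest codeword index."""
    return min(range(len(codebook)), key=lambda i: (high - codebook[i]) ** 2)


def train_jointly(pairs, initial_codebook, iterations=10):
    """Train the codebook on high priority values and, in the same loop,
    a side model holding one low priority estimate per codeword.

    pairs: list of (high_priority, low_priority) training scalars.
    """
    codebook = list(initial_codebook)
    side_model = [0.0] * len(codebook)
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for high, low in pairs:
            cells[encode(high, codebook)].append((high, low))
        for idx, cell in enumerate(cells):
            if cell:  # leave empty cells unchanged
                codebook[idx] = sum(h for h, _ in cell) / len(cell)
                side_model[idx] = sum(l for _, l in cell) / len(cell)
    return codebook, side_model


def decode(index, codebook, side_model):
    """Dequantize the high priority value and recover a low priority
    estimate from the side model; no extra bits were sent for the latter."""
    return codebook[index], side_model[index]


# Hypothetical training data: (high priority, low priority) pairs.
training_pairs = [(0.0, 10.0), (0.2, 12.0), (5.0, -3.0), (5.2, -5.0)]
codebook, side_model = train_jointly(training_pairs, [0.0, 5.0])

index = encode(0.15, codebook)          # only this index is transmitted
high_hat, low_hat = decode(index, codebook, side_model)
```

In this sketch the transmitted bits describe only the high priority value, and the decoder fills in the low priority value from the side model entry tied to the received index. That mirrors the idea of preserving bits for the higher priority data.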
  • FIG. 1 is a schematic block diagram of a mobile terminal according to an exemplary embodiment of the present invention.
  • FIG. 2 is a schematic block diagram of a wireless communications system according to an exemplary embodiment of the present invention.
  • FIG. 3 illustrates a block diagram showing an encoder and a decoder according to an exemplary embodiment of the present invention.
  • FIG. 4 is a block diagram according to an exemplary method of improving data compression according to an exemplary embodiment of the present invention.
  • FIG. 5 illustrates a method of training a quantizer and a side model in accordance with an embodiment of the present invention.
  • FIG. 1 illustrates a block diagram of a mobile terminal 10 that would benefit from embodiments of the present invention. It should be understood, however, that a mobile telephone as illustrated and hereinafter described is merely illustrative of one type of mobile terminal that would benefit from embodiments of the present invention and, therefore, should not be taken to limit the scope of embodiments of the present invention.
  • While several embodiments of the mobile terminal 10 are illustrated and will be hereinafter described for purposes of example, other types of mobile terminals, such as portable digital assistants (PDAs), pagers, mobile televisions, gaming devices, laptop computers, cameras, video recorders, audio/video players, radios, GPS devices, or any combination of the aforementioned, and other types of voice and text communications systems, can readily employ embodiments of the present invention.
  • the mobile terminal 10 includes an antenna 12 (or multiple antennae) in operable communication with a transmitter 14 and a receiver 16.
  • the mobile terminal 10 may further include an apparatus, such as a controller 20 or other processing element, that provides signals to and receives signals from the transmitter 14 and receiver 16, respectively.
  • the signals include signaling information in accordance with the air interface standard of the applicable cellular system, and also user speech, received data and/or user generated data.
  • the mobile terminal 10 is capable of operating with one or more air interface standards, communication protocols, modulation types, and access types.
  • the mobile terminal 10 is capable of operating in accordance with any of a number of first, second, third and/or fourth-generation communication protocols or the like.
  • the mobile terminal 10 may be capable of operating in accordance with second-generation (2G) wireless communication protocols IS-136 (time division multiple access (TDMA)), GSM (global system for mobile communication), and IS-95 (code division multiple access (CDMA)), or with third-generation (3G) wireless communication protocols, such as Universal Mobile Telecommunications System (UMTS), CDMA2000, wideband CDMA (WCDMA) and time division-synchronous CDMA (TD-SCDMA), with fourth-generation (4G) wireless communication protocols or the like.
  • the apparatus may include circuitry desirable for implementing audio and logic functions of the mobile terminal 10 .
  • the controller 20 may be comprised of a digital signal processor device, a microprocessor device, and various analog to digital converters, digital to analog converters, and other support circuits. Control and signal processing functions of the mobile terminal 10 are allocated between these devices according to their respective capabilities.
  • the controller 20 thus may also include the functionality to convolutionally encode and interleave message and data prior to modulation and transmission.
  • the controller 20 can additionally include an internal voice coder, and may include an internal data modem. Further, the controller 20 may include functionality to operate one or more software programs, which may be stored in memory.
  • the controller 20 may be capable of operating a connectivity program, such as a conventional Web browser.
  • the connectivity program may then allow the mobile terminal 10 to transmit and receive Web content, such as location-based content and/or other web page content, according to a Wireless Application Protocol (WAP), Hypertext Transfer Protocol (HTTP) and/or the like, for example.
  • the mobile terminal 10 may also comprise a user interface including an output device such as a conventional earphone or speaker 24, a ringer 22, a microphone 26, a display 28, and a user input interface, all of which are coupled to the controller 20.
  • the user input interface, which allows the mobile terminal 10 to receive data, may include any of a number of devices allowing the mobile terminal 10 to receive data, such as a keypad 30, a touch display (not shown) or other input device.
  • the keypad 30 may include the conventional numeric (0-9) and related keys (#, *), and other hard and soft keys used for operating the mobile terminal 10.
  • the keypad 30 may include a conventional QWERTY keypad arrangement.
  • the keypad 30 may also include various soft keys with associated functions.
  • the mobile terminal 10 may include an interface device such as a joystick or other user input interface.
  • the mobile terminal 10 further includes a battery 34, such as a vibrating battery pack, for powering various circuits that are required to operate the mobile terminal 10, as well as optionally providing mechanical vibration as a detectable output.
  • the mobile terminal 10 may include a positioning sensor 36 .
  • the positioning sensor 36 may include, for example, a global positioning system (GPS) sensor, an assisted global positioning system (Assisted-GPS) sensor, etc. However, in one exemplary embodiment, the positioning sensor 36 includes a pedometer or inertial sensor.
  • the positioning sensor 36 is capable of determining a location of the mobile terminal 10 , such as, for example, longitudinal and latitudinal directions of the mobile terminal 10 , or a position relative to a reference point such as a destination or start point. Information from the positioning sensor 36 may then be communicated to a memory of the mobile terminal 10 or to another memory device to be stored as a position history or location information.
  • the mobile terminal 10 may further include a user identity module (UIM) 38 .
  • the UIM 38 is typically a memory device having a processor built in.
  • the UIM 38 may include, for example, a subscriber identity module (SIM), a universal integrated circuit card (UICC), a universal subscriber identity module (USIM), a removable user identity module (R-UIM), etc.
  • the UIM 38 typically stores information elements related to a mobile subscriber.
  • the mobile terminal 10 may be equipped with memory.
  • the mobile terminal 10 may include volatile memory 40 , such as volatile Random Access Memory (RAM) including a cache area for the temporary storage of data.
  • the mobile terminal 10 may also include other non-volatile memory 42 , which can be embedded and/or may be removable.
  • the non-volatile memory 42 can additionally or alternatively comprise an electrically erasable programmable read only memory (EEPROM), flash memory or the like, such as that available from the SanDisk Corporation of Sunnyvale, Calif., or Lexar Media Inc. of Fremont, Calif.
  • the memories can store any of a number of pieces of information, and data, used by the mobile terminal 10 to implement the functions of the mobile terminal 10 .
  • the memories can include an identifier, such as an international mobile equipment identification (IMEI) code, capable of uniquely identifying the mobile terminal 10 .
  • the memories may store instructions for determining cell id information.
  • the memories may store an application program for execution by the controller 20 , which determines an identity of the current cell, i.e., cell id identity or cell id information, with which the mobile terminal 10 is in communication.
  • the cell id information may be used to more accurately determine a location of the mobile terminal 10 .
  • the mobile terminal 10 includes a media capturing module, such as a camera, video and/or audio module, in communication with the controller 20 .
  • the media capturing module may be any means for capturing an image, video and/or audio for storage, display or transmission.
  • the media capturing module is a camera module 37.
  • the camera module 37 may include a digital camera capable of forming a digital image file from a captured image, or a video file from a series of captured image frames with or without accompanying audio data.
  • the camera module 37 includes all hardware, such as a lens or other optical device, and software necessary for creating a digital image, video or audio file from captured image/audio data.
  • the camera module 37 may include only the hardware needed to capture an image, while a memory device of the mobile terminal 10 stores instructions for execution by the controller 20 in the form of software necessary to create a digital image file from a captured image.
  • the camera module 37 may further include a processing element such as a co-processor which assists the controller 20 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data.
  • the encoder and/or decoder may encode and/or decode according to, for example, a joint photographic experts group (JPEG) standard or other format.
  • FIG. 2 is a schematic block diagram of a wireless communications system according to an exemplary embodiment of the present invention.
  • the system includes a plurality of network devices.
  • one or more mobile terminals 10 may each include an antenna 12 for transmitting signals to and for receiving signals from a base site or base station (BS) 44 .
  • the base station 44 may be a part of one or more cellular or mobile networks each of which includes elements required to operate the network, such as a mobile switching center (MSC) 46 .
  • the mobile network may also be referred to as a Base Station/MSC/Interworking function (BMI).
  • the MSC 46 is capable of routing calls to and from the mobile terminal 10 when the mobile terminal 10 is making and receiving calls.
  • the MSC 46 can also provide a connection to landline trunks when the mobile terminal 10 is involved in a call.
  • the MSC 46 can be capable of controlling the forwarding of messages to and from the mobile terminal 10 , and can also control the forwarding of messages for the mobile terminal 10 to and from a messaging center. It should be noted that although the MSC 46 is shown in the system of FIG. 2 , the MSC 46 is merely an exemplary network device and embodiments of the present invention are not limited to use in a network employing an MSC.
  • the MSC 46 can be coupled to a data network, such as a local area network (LAN), a metropolitan area network (MAN), and/or a wide area network (WAN).
  • the MSC 46 can be directly coupled to the data network.
  • the MSC 46 is coupled to a gateway device (GTW) 48, and the GTW 48 is coupled to a WAN, such as the Internet 50.
  • devices such as processing elements (e.g., personal computers, server computers or the like) can be coupled to the mobile terminal 10 via the Internet 50 .
  • the processing elements can include one or more processing elements associated with a computing system 52 (two shown in FIG. 2 ), origin server 54 (one shown in FIG. 2 ) or the like, as described below.
  • the BS 44 can also be coupled to a serving GPRS (General Packet Radio Service) support node (SGSN) 56 .
  • the SGSN 56 is typically capable of performing functions similar to the MSC 46 for packet switched services.
  • the SGSN 56, like the MSC 46, can be coupled to a data network, such as the Internet 50.
  • the SGSN 56 can be directly coupled to the data network. In a more typical embodiment, however, the SGSN 56 is coupled to a packet-switched core network, such as a GPRS core network 58 .
  • the packet-switched core network is then coupled to another GTW 48 , such as a gateway GPRS support node (GGSN) 60 , and the GGSN 60 is coupled to the Internet 50 .
  • the packet-switched core network can also be coupled to a GTW 48 .
  • the GGSN 60 can be coupled to a messaging center.
  • the GGSN 60 and the SGSN 56, like the MSC 46, may be capable of controlling the forwarding of messages, such as MMS messages.
  • the GGSN 60 and SGSN 56 may also be capable of controlling the forwarding of messages for the mobile terminal 10 to and from the messaging center.
  • devices such as a computing system 52 and/or origin server 54 may be coupled to the mobile terminal 10 via the Internet 50 , SGSN 56 and GGSN 60 .
  • devices such as the computing system 52 and/or origin server 54 may communicate with the mobile terminal 10 across the SGSN 56 , GPRS core network 58 and the GGSN 60 .
  • the mobile terminals 10 may communicate with the other devices and with one another, such as according to the Hypertext Transfer Protocol (HTTP) and/or the like, to thereby carry out various functions of the mobile terminals 10 .
  • the mobile terminal 10 may be coupled to one or more of any of a number of different networks through the BS 44 .
  • the network(s) may be capable of supporting communication in accordance with any one or more of a number of first-generation (1G), second-generation (2G), 2.5G, third-generation (3G), 3.9G, fourth-generation (4G) mobile communication protocols or the like.
  • one or more of the network(s) can be capable of supporting communication in accordance with 2G wireless communication protocols IS-136 (TDMA), GSM, and IS-95 (CDMA).
  • one or more of the network(s) can be capable of supporting communication in accordance with 2.5G wireless communication protocols GPRS, Enhanced Data GSM Environment (EDGE), or the like. Further, for example, one or more of the network(s) can be capable of supporting communication in accordance with 3G wireless communication protocols such as a UMTS network employing WCDMA radio access technology.
  • Some narrow-band analog mobile phone service (NAMPS), as well as total access communication system (TACS), network(s) may also benefit from embodiments of the present invention, as should dual or higher mode mobile stations (e.g., digital/analog or TDMA/CDMA/analog phones).
  • the mobile terminal 10 can further be coupled to one or more wireless access points (APs) 62 .
  • the APs 62 may comprise access points configured to communicate with the mobile terminal 10 in accordance with techniques such as, for example, radio frequency (RF), infrared (IrDA) or any of a number of different wireless networking techniques, including WLAN techniques such as IEEE 802.11 (e.g., 802.11a, 802.11b, 802.11g, 802.11n, etc.), world interoperability for microwave access (WiMAX) techniques such as IEEE 802.16, and/or wireless Personal Area Network (WPAN) techniques such as IEEE 802.15, BlueTooth (BT), ultra wideband (UWB) and/or the like.
  • the APs 62 may be coupled to the Internet 50 .
  • the APs 62 can be directly coupled to the Internet 50 . In one embodiment, however, the APs 62 are indirectly coupled to the Internet 50 via a GTW 48 . Furthermore, in one embodiment, the BS 44 may be considered as another AP 62 . As will be appreciated, by directly or indirectly connecting the mobile terminals 10 and the computing system 52 , the origin server 54 , and/or any of a number of other devices, to the Internet 50 , the mobile terminals 10 can communicate with one another, the computing system, etc., to thereby carry out various functions of the mobile terminals 10 , such as to transmit data, content or the like to, and/or receive content, data or the like from, the computing system 52 .
  • As used herein, the terms “data,” “content,” “information” and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments of the present invention. Thus, use of any such terms should not be taken to limit the spirit and scope of embodiments of the present invention.
  • the mobile terminal 10 and computing system 52 may be coupled to one another and communicate in accordance with, for example, RF, BT, IrDA or any of a number of different wireline or wireless communication techniques, including LAN, WLAN, WiMAX, UWB techniques and/or the like.
  • One or more of the computing systems 52 can additionally, or alternatively, include a removable memory capable of storing content, which can thereafter be transferred to the mobile terminal 10 .
  • the mobile terminal 10 can be coupled to one or more electronic devices, such as printers, digital projectors and/or other multimedia capturing, producing and/or storing devices (e.g., other terminals).
  • the mobile terminal 10 may be configured to communicate with the portable electronic devices in accordance with techniques such as, for example, RF, BT, IrDA or any of a number of different wireline or wireless communication techniques, including universal serial bus (USB), LAN, WLAN, WiMAX, UWB techniques and/or the like.
  • content or data may be communicated over the system of FIG. 2 between a mobile terminal, which may be similar to the mobile terminal 10 of FIG. 1 , and a network device of the system of FIG. 2 in order to, for example, execute applications or establish communication (e.g., voice communications utilizing speech coding, image/video data captured by the camera module 37 , etc.) between the mobile terminal 10 and other mobile terminals.
  • FIG. 2 is merely provided for purposes of example.
  • embodiments of the present invention may be resident on a communication device such as the mobile terminal 10 , and/or may be resident on other devices, absent any communication with the system of FIG. 2 .
  • An exemplary embodiment of the invention will now be described with reference to FIG. 3, in which certain elements of an apparatus for providing improved data compression in, for example, a low bit rate speech coding environment are displayed.
  • the apparatus of FIG. 3 may be employed, for example, on the mobile terminal 10 of FIG. 1 and/or the computing system 52 or the origin server 54 of FIG. 2 .
  • the system of FIG. 3 may also be employed on a variety of other devices, both mobile and fixed, and therefore, the present invention should not be limited to application on devices such as the mobile terminal 10 of FIG. 1 .
  • While FIG. 3 illustrates one example of a configuration of an apparatus for providing improved data compression, numerous other configurations may also be used to implement the present invention.
  • Embodiments of the present invention are not limited to use in the context of speech coding, but apply more generally to data compression in other applications, such as video coding, image coding, audio coding, etc.
  • the apparatus includes an encoder 70 and a decoder 72 capable of communication with each other (although such communication is not required).
  • the encoder 70 may be employed to encode data (e.g., speech data, audio data, video data, image data, etc.) and the decoder 72 may be employed to decode encoded data.
  • the encoder 70 may be any means or device embodied in hardware, software or a combination of hardware and software that is capable of performing encoding operations as described below.
  • the encoder 70 is embodied in software as instructions that are stored on a memory of the mobile terminal 10 and executed by the controller 20 .
  • the decoder 72 may be any device or means embodied in either hardware, software, or a combination of hardware and software that is capable of performing decoding operations as described below.
  • the decoder 72 may be embodied in software as instructions that are stored on a memory of the mobile terminal 10 and executed by the controller 20 .
  • either or both of the encoder 70 and decoder 72 could also be embodied at a network device, such as, for example, the computing system 52 or the origin server 54 of FIG. 2 , and/or at a different mobile terminal.
  • the encoder 70 may include a main quantizer 74 and a side information model 76 .
  • the side information model 76 could alternatively be separate from the encoder 70 (such as in the decoder 72 or elsewhere) or shared between the encoder 70 and the decoder 72 .
  • the encoder 70 may not include the side information model 76 .
  • the main quantizer 74 and the side information model 76 may each be any circuitry, means or device embodied in hardware, software or a combination of hardware and software that are configured to perform the corresponding functions of the main quantizer 74 and the side information model 76 , respectively, as described below.
  • the main quantizer 74 may be configured to perform quantization operations on incoming or input data (e.g., speech data in this example). Quantization refers to a process of approximating a continuous set of values in the input data with a finite (preferably small) set of values.
  • the input to the main quantizer 74 may be the original data, and the output of the main quantizer 74 may be a set of output values forming an approximation of the input, preferably with minimal loss or distortion.
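  • As an illustration of quantization in general, the following minimal Python sketch maps an input vector to the nearest entry of a small, finite codebook. The codebook values and the function name `quantize` are hypothetical and are not taken from the described embodiments; plain squared error is assumed as the distortion.

```python
import numpy as np

def quantize(x, codebook):
    """Return (index, reproduction) of the codebook row closest to x."""
    # Squared Euclidean distance from x to every codebook row.
    dists = np.sum((codebook - x) ** 2, axis=1)
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

# Hypothetical 2-bit codebook (four reproduction vectors).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [0.5, 2.0]])

# Only the index needs to be stored or transmitted; the decoder looks it up.
idx, x_hat = quantize(np.array([0.9, 1.2]), codebook)
```

Only the index is stored or transmitted, so the bit cost is fixed by the codebook size rather than by the precision of the input.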
  • the main quantizer 74 may be trained, in accordance with embodiments of the present invention, in order to optimize, enhance or at least improve quantization accuracy for a portion of the input data.
  • the portion of the input data for which the main quantizer 74 may be trained to provide optimal or improved quantization accuracy may be a high priority portion of the input data.
  • the high priority portion of the input data may be the spectral envelope.
  • The terms "high priority" and "low priority", when used in the context of embodiments of the present invention, should be understood to be relative terms with respect to each other and do not represent high and low priorities in any absolute sense.
  • high priority data and low priority data can be predefined for content within any particular application.
  • For example, background data may be low priority data in an image, rapidly changing features may be low priority data in video, and data in certain frequency bands may be lower priority in audio.
  • one skilled in these respective arts may define high and low priority data as desired.
  • the side information model 76 may be trained simultaneously with the training of the main quantizer 74 .
  • the side information model 76 and the main quantizer 74 may both be trained simultaneously while optimizing or improving the quantization accuracy for the high priority portion (e.g., the spectral envelope).
  • the side information model 76 may be trained as a consequence of the training of the main quantizer 74 .
  • the side information model 76 may then model side information comprising the low priority portion of the input data (e.g., the residual spectrum).
  • the encoder may receive input data in the form of speech data (although as indicated above, other forms of data may also be employed) and may compress the speech data such that the spectral envelope, which is high priority data, is processed by a quantizer trained to provide an improved or optimal efficiency with respect to quantizing only high priority data. Meanwhile low priority data may be modeled by a side model trained jointly with the training of the quantizer.
  • the output of the encoder therefore includes partially encoded data (the portion encoded corresponding to the high priority data) and a model of the low priority portion (e.g., the residual spectrum).
  • the decoder 72 may recover the residual spectrum based on the coded spectral envelope and the side information model 76 stored at the decoder 72 .
  • the decoder 72 may receive encoded information and may decode the received information prior to further processing.
  • the decoder 72 may include a side information model (which could be the same as the side information model 76 used in the encoder or an identical version thereof) and a dequantizer 78 .
  • the decoder 72 may be configured to receive data communicated to the apparatus (e.g., the mobile terminal 10 ) that has been encoded in accordance with an embodiment of the present invention, such as by a device employing an encoder as described above.
  • the dequantizer 78 may be any device or means embodied in either hardware, software, or a combination of hardware and software that is capable of dequantizing the quantized high priority portion (e.g., the spectral envelope).
  • For example, if a low bit rate speech encoding operation is utilized for compression (e.g., in a high quality text-to-speech database) with twenty bits dedicated to quantization, all twenty of the bits may be utilized for quantizing the spectral envelope without devoting any bits to quantizing the residual spectrum and yet, at the decoder 72, the residual spectrum may be recovered from the modeled residual spectrum.
  • one bit could be used for indication with respect to model sufficiency.
  • one bit may be used to indicate whether the accuracy provided by the modeling described above is sufficient (e.g., based on modeling error for the residual spectrum). Accordingly, if accuracy is indicated as being insufficient, further details regarding the modeling error for the residual spectrum (or low priority data of any kind in other embodiments) may be provided and transmitted and/or stored in a location accessible for use in connection with practicing embodiments of the present invention.
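  • The indicator bit can be sketched in Python as follows. The mean squared error criterion, the threshold value, the frame layout and the names `sufficiency_bit` and `pack_frame` are all assumptions made for illustration; the text above only states that one bit may signal whether the modeling accuracy is sufficient and that further detail on the modeling error may follow when it is not.

```python
import numpy as np

def sufficiency_bit(b, b_model, threshold):
    """Return 1 if the modeled low priority data is accurate enough, else 0."""
    # Assumed criterion: mean squared modeling error against a fixed threshold.
    err = float(np.mean((b - b_model) ** 2))
    return 1 if err <= threshold else 0

def pack_frame(index, b, b_model, threshold=0.05):
    """Build a hypothetical frame: quantizer index, indicator bit, optional refinement."""
    bit = sufficiency_bit(b, b_model, threshold)
    frame = {"index": index, "model_ok": bit}
    if bit == 0:
        # Modeling error sent as refinement (left unquantized here for brevity).
        frame["refinement"] = b - b_model
    return frame

b = np.array([1.0, 0.0])
good = pack_frame(3, b, np.array([1.0, 0.0]))  # model matches, no refinement
bad = pack_frame(3, b, np.array([0.0, 1.0]))   # model is off, refinement added
```

In a real codec the refinement would itself be quantized; it is carried as a raw difference here only to keep the sketch short.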
  • the encoder 70 and decoder 72 may undergo training in order to, for example, optimize, enhance or improve performance of the main quantizer 74 (and dequantizer 78 ) with respect to high priority data as indicated at operation 80 .
  • Such training can also train the side information model 76 simultaneously.
  • the training described above may be performed offline or online.
  • An apparatus (e.g., the mobile terminal 10) having the encoder 70 and/or decoder 72 may be pre-trained and, when received by a user, may include a trained encoder 70 and/or decoder 72.
  • Although the encoder 70 and/or decoder 72 may each be embodied in software, in at least one embodiment the encoder 70 and/or decoder 72 may each be controlled by or otherwise embodied as a processing element such as the controller 20 of the mobile terminal 10 (or a processor of a server, computer or other device). Processing elements such as those described herein may be embodied in many ways.
  • the processing element may be embodied as a processor, a coprocessor, a controller or various other processing means or devices including integrated circuits such as, for example, an ASIC (application specific integrated circuit).
  • Some background on multistage vector quantization (MSVQ) is provided first.
  • Embodiments of the present invention may also be practiced in other quantization contexts; MSVQ is merely one type of quantization that is provided herein by way of example and not of limitation.
  • a typical characteristic of the training procedure may be that the encoding criterion and the centroid calculation and/or model training are decoupled.
  • the encoding step may be done only partially, e.g., only for the high-priority data, but the centroids and/or other models may be computed also for the low-priority data.
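  • The decoupling of the encoding criterion from the centroid calculation can be sketched with a single-stage, Lloyd-style training loop in Python. This is a simplified illustration under stated assumptions, not the multistage procedure of the embodiments: the function name `train_joint_codebook`, the unweighted squared error, the random initialization and the toy data are all hypothetical.

```python
import numpy as np

def train_joint_codebook(X, dim_a, n_codes, n_iter=20, seed=0):
    """Train joint codevectors [a | b]; encode on a only, average over a and b."""
    rng = np.random.default_rng(seed)
    # Initialize from randomly chosen training vectors.
    codebook = X[rng.choice(len(X), size=n_codes, replace=False)].copy()
    for _ in range(n_iter):
        # Encoding step: nearest neighbor using ONLY the high priority part a.
        d = ((X[:, None, :dim_a] - codebook[None, :, :dim_a]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        # Centroid step: average the FULL vectors, so b is learned as well.
        for k in range(n_codes):
            members = X[assign == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    return codebook

# Toy data: one-dimensional a with a correlated one-dimensional b.
a_vals = np.concatenate([np.arange(10) * 0.1, 10 + np.arange(10) * 0.1])
b_vals = np.concatenate([np.full(10, 5.0), np.full(10, -5.0)])
X = np.stack([a_vals, b_vals], axis=1)

codebook = train_joint_codebook(X, dim_a=1, n_codes=2)
codebook = codebook[np.argsort(codebook[:, 0])]  # sort rows by a for readability
```

Because the b columns never influence the assignment, they cost no bits at encoding time, yet each codevector still ends up holding the average b of the training vectors mapped to it.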
  • An MSVQ quantizes vectors in two or more additive stages.
  • the objective is to find a vector combination, in other words a sum of the selected vectors at different stages, that minimizes the resulting distortion.
  • the quantized vector can be defined as:
  $$\hat{x} = \sum_{j=1}^{S} y_{l_j}^{(j)},$$
  • where $y_l^{(j)}$ denotes the $l$th reproduction vector from the $j$th stage, $S$ is the number of stages, and $l_j$ is the index of the vector selected in the $j$th stage.
  • the quantized vector in multistage quantization can be expressed as:
  $$\hat{x} = \sum_{j=1}^{S} T_{l_j}^{(j)} c_j = T c,$$
  • where the vector $c_j$ is the codebook (e.g., stacked codebook) of the $j$th stage,
  $$c_j = [\, y_1^{(j)T} \; y_2^{(j)T} \; \ldots \; y_{N_j}^{(j)T} \,]^T,$$
  • $T_l^{(j)}$ is a sparse Toeplitz matrix ($p$ by $pN_j$, $N_j$ being the number of codevectors at the $j$th stage) that satisfies
  $$T_l^{(j)} c_j = y_l^{(j)},$$
  • the column vector $c$ is the combined stacked codebook
  $$c = [\, c_1^T \; c_2^T \; \ldots \; c_S^T \,]^T,$$
  • and the selection matrix is
  $$T = [\, T_{l_1}^{(1)} \; T_{l_2}^{(2)} \; \ldots \; T_{l_S}^{(S)} \,].$$
  • the quantization process can be viewed as the process of constructing the selection matrix T.
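  • The equivalence between summing the selected stage vectors and multiplying the stacked codebook c by the selection matrix T can be checked numerically. In the Python sketch below, the two-stage codebooks, the chosen indices (0-based) and the helper name `selection_block` are hypothetical.

```python
import numpy as np

def selection_block(l, n_codes, p):
    """Build the p x (p * n_codes) block that picks codevector l of one stage."""
    T = np.zeros((p, p * n_codes))
    T[:, l * p:(l + 1) * p] = np.eye(p)  # identity placed over the l-th slot
    return T

p = 2  # vector dimension
stage_codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]),  # stage 1, N_1 = 3
    np.array([[0.1, -0.1], [-0.2, 0.2]]),            # stage 2, N_2 = 2
]
selected = [1, 0]  # index chosen in each stage

# Combined stacked codebook c: all codevectors of all stages in one column vector.
c = np.concatenate([cb.reshape(-1) for cb in stage_codebooks])

# Selection matrix T: one block per stage, placed side by side.
T = np.hstack([selection_block(l, len(cb), p)
               for l, cb in zip(selected, stage_codebooks)])

x_hat_matrix = T @ c                                               # T c
x_hat_sum = sum(cb[l] for l, cb in zip(selected, stage_codebooks)) # direct sum
```

Both expressions give the same reconstruction, which is why choosing the stage indices and constructing T are two views of the same operation.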
  • Let x denote a complete spectral vector to be quantized, made up of a high priority part a (the spectral envelope) and a low priority part b (the spectral details).
  • The vector a may be assumed to have been obtained through linear prediction and represented using line spectral frequencies (LSF).
  • The vector b may have been obtained by modeling the spectral details (e.g., low priority data) by transforming the residual amplitude spectrum into a fixed-dimension mel cepstrum domain.
  • a new matrix Z may be included into the distortion measure to obtain the new measure:
  • the matrix Z is a diagonal matrix whose diagonal elements that correspond to vector a are set to 1 and others to 0. As such, the matrix Z may be considered a selector for partial encoding. In other words, matrix Z may be a tool used to enable the quantization to focus on the spectral envelope by selecting the high priority data (e.g., data corresponding to the spectral envelope).
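  • The effect of the selector matrix Z can be illustrated with a short Python sketch. The weighted quadratic form used below, the placement of a ahead of b within x, and the function names are assumptions made for illustration only; the text above specifies only that Z is diagonal, with ones on the elements corresponding to a and zeros elsewhere.

```python
import numpy as np

def selector_matrix(dim_a, dim_b):
    """Diagonal Z: ones for the high priority part a, zeros for the part b."""
    return np.diag(np.concatenate([np.ones(dim_a), np.zeros(dim_b)]))

def partial_distortion(x, x_hat, W, Z):
    """Assumed form: weighted squared error restricted by Z to the selected part."""
    e = Z @ (x - x_hat)
    return float(e @ W @ e)

x = np.array([1.0, 2.0, 3.0, 4.0])      # [a | b] with two elements each
x_hat = np.array([1.5, 2.0, 0.0, 0.0])  # poor match on b, close match on a
W = np.eye(4)                           # hypothetical weighting matrix
Z = selector_matrix(2, 2)

d = partial_distortion(x, x_hat, W, Z)  # the error on b does not contribute
```

With Z in place, a candidate is ranked only by how well it matches a, however far its b part lies from the input.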
  • The whole vector, including the spectral envelope part and the residual spectrum part, is recovered.
  • the residual spectrum part can be directly stored in the codebook.
  • the residual spectrum part can be recovered using the information available from the spectral envelope part and a separate side information model. MSVQ may be used together with an M-L tree search algorithm.
  • one bit per vector may be included for indicating the quality of the modeled vector b. If the bit indicates that the accuracy of the particular vector is not sufficient, separate refinement information may be included for the vector.
  • The quantizer for the refinement information can be used and trained in a conventional manner.
  • Quantizer training will now be described in accordance with an exemplary embodiment of the present invention.
  • the algorithm used for training the quantizer may be modified from that of a conventional quantizer in order to achieve the benefits described herein.
  • The partial encoding must be taken into account (e.g., by providing the selector matrix Z).
  • $W_n$ is a (diagonal) weighting matrix for the nth training vector.
  • the weighted squared distortion may be rewritten as:
  • vector c j is the stacked codebook c with the vector components corresponding to jth stage codebook replaced with zeros.
  • Embodiments of the present invention can utilize a simultaneous joint design algorithm that may be initialized with sequentially designed random codebooks.
  • Table 1 illustrates an example of the simultaneous joint design algorithm.
  • the distortion d p mentioned at step 2 below is computed using the new matrix Z described above.
  • data comprising a training set may be quantized using the codebooks, and the running sums Q, Y and d q may be updated correspondingly.
  • The quantization may be performed using a full search (i.e., all combinations in the codebooks are searched) or using an M-L tree search with a moderate value of M.
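  • An M-L tree search can be sketched in Python as keeping the M best partial paths after each stage instead of exploring every index combination. The sketch assumes plain squared error on the whole vector (the partial measure with Z would be substituted in the described embodiments), and the function name and codebooks are hypothetical.

```python
import numpy as np

def ml_tree_search(x, stage_codebooks, M):
    """Search stage by stage, keeping the M lowest-distortion partial paths."""
    # Each path is (indices chosen so far, partial reconstruction).
    paths = [((), np.zeros_like(x))]
    for codebook in stage_codebooks:
        candidates = []
        for indices, partial in paths:
            for l, y in enumerate(codebook):
                recon = partial + y
                err = float(np.sum((x - recon) ** 2))
                candidates.append((err, indices + (l,), recon))
        # Keep only the M best candidates before moving to the next stage.
        candidates.sort(key=lambda cand: cand[0])
        paths = [(idx, rec) for _, idx, rec in candidates[:M]]
    return paths[0]  # best complete path: (indices, reconstruction)

stages = [
    np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]),
    np.array([[0.1, -0.1], [-0.2, 0.2]]),
]
x = np.array([1.1, 0.9])
indices, recon = ml_tree_search(x, stages, M=2)
```

Setting M to the product of all stage sizes reproduces a full search, while M = 1 is a purely sequential (greedy) search; moderate values trade accuracy against computation.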
  • Operations 3 and 4 may define an inner loop which minimizes the total distortion and thus optimizes the stacked codebook using the partitioning performed at operation 2.
  • the codebooks may be reordered to enhance the performance of the M-L tree search before continuing with a new partitioning.
  • the exemplary implementation provided above produces a straightforward side information model that can be stored inside the codebook(s) of the main quantizer.
  • the resulting codebook(s) can be stored in such a way that the resulting codebook(s) include joint vectors x that include both the high-priority part to be encoded, a, and the low-priority side information part b.
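  • A joint codebook of this kind can be sketched in Python as follows: the encoder compares only the a part of the input against the codebook, while the decoder reads both a and b from the selected entry. The codebook contents, the dimensions and the names `encode` and `decode` are hypothetical, and a single-stage codebook is assumed for brevity.

```python
import numpy as np

# Hypothetical joint codebook: each row is [a (2 values) | b (2 values)].
JOINT_CODEBOOK = np.array([
    [0.1, 0.2, 1.0, 0.0],
    [0.8, 0.9, 0.0, 1.0],
    [0.4, 0.5, 0.5, 0.5],
])
DIM_A = 2

def encode(x):
    """Choose the index using only the high priority part a of x."""
    a = x[:DIM_A]
    dists = np.sum((JOINT_CODEBOOK[:, :DIM_A] - a) ** 2, axis=1)
    return int(np.argmin(dists))

def decode(index):
    """Recover a from the codebook entry and b from the side information in it."""
    row = JOINT_CODEBOOK[index]
    return row[:DIM_A], row[DIM_A:]

x = np.array([0.75, 0.95, 0.1, 0.8])  # input vector [a | b]
index = encode(x)                     # no bits are spent on b
a_hat, b_hat = decode(index)
```

The index is the only transmitted quantity, so the recovered b is whatever the training procedure stored alongside the selected a.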
  • FIGS. 4 and 5 are flowcharts of methods and program products according to exemplary embodiments of the invention. It will be understood that each block or step of the flowcharts, and combinations of blocks in the flowcharts, can be implemented by various means, such as hardware, firmware, and/or software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored by a memory device of the mobile terminal and executed by a built-in processor in the mobile terminal.
  • any such computer program instructions may be loaded onto a computer or other programmable apparatus (i.e., hardware) to produce a machine, such that the instructions which execute on the computer or other programmable apparatus create means for implementing the functions specified in the flowcharts block(s) or step(s).
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowcharts block(s) or step(s).
  • the computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowcharts block(s) or step(s).
  • blocks or steps of the flowcharts support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that one or more blocks or steps of the flowcharts, and combinations of blocks or steps in the flowcharts, can be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.
  • one embodiment of a method for providing improved data compression may provide that input data may be compressed without using any bits for quantization of the low priority data.
  • the method may include partially encoding input data by encoding high priority data among the input data using a quantizer trained with respect to the high priority data at operation 100 .
  • a selector matrix may be utilized to enable encoding only the input data that corresponds to the high priority data.
  • Low priority data may be modeled, at operation 110 , using a side model trained jointly with the training of the quantizer.
  • the method may further include decoding compressed data using a dequantizer and the side model to recover the high priority data and the low priority data, respectively, from a received transmission at an optional operation 120 .
  • compression may include quantizing the high priority data using the quantizer that is further trained to optimize or enhance quantization accuracy for the high priority data while simultaneously training the side model.
  • the method may further include providing an indicator bit indicative of a sufficiency of the accuracy of the modeling of the low priority data.
  • modeling error information for improving modeling accuracy may be provided in response to the indicator bit indicating insufficient modeling accuracy.
  • FIG. 5 illustrates a method of training the quantizer and the side model in accordance with an embodiment of the present invention.
  • the method may include gathering a data set including high priority data and low priority data jointly at operation 200 .
  • A quantizer may be trained for processing the high priority data in the data set at operation 210.
  • the side model may then be jointly trained with the training of the quantizer with respect to quantization accuracy of the quantizer for the high priority data at operation 220 .
  • operations 210 and 220 may be handled simultaneously and/or with many iterations for the training.
  • the side model may be configured for modeling the low priority data.
  • the training of the quantizer and the joint training of the side model may include simultaneously training the quantizer and the side model offline.
  • Embodiments of the present invention may provide for a reduction in the bit rate required to achieve a given quality level (or better quality for a given bit rate), without a significant increase in computational load. Furthermore, only moderate increases in memory consumption may be required (e.g., for storage of side information data).

Abstract

An apparatus for providing improved data compression may include an encoder comprising a quantizer for encoding input data and a side model. The quantizer may be trained with respect to high priority data among the input data and may be configured to partially encode the input data by encoding the high priority data. The side model may be trained jointly with the training of the quantizer and is configured to model low priority data among the input data.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of U.S. patent application Ser. No. 15/244,856, filed on Aug. 23, 2016, which is a continuation of U.S. patent application Ser. No. 14/529,718, filed on Oct. 31, 2014, now U.S. Pat. No. 9,451,265, which is a continuation of U.S. patent application Ser. No. 11/867,212, filed on Oct. 4, 2007, now U.S. Pat. No. 8,886,612. The above-identified application is herein incorporated by reference in its entirety.
  • FIELD OF THE INVENTION
  • Embodiments of the present invention relate generally to data compression technology and, more particularly, relate to a method, apparatus and computer program product for providing compression of data for use in various environments such as, for example, speech coding, video coding, audio coding, image coding, etc.
  • BACKGROUND OF THE INVENTION
  • The modern communications era has brought about a tremendous expansion of wireline and wireless networks. Computer networks, television networks, and telephony networks are experiencing an unprecedented technological expansion, fueled by consumer demand. Wireless and mobile networking technologies have addressed related consumer demands, while providing more flexibility and immediacy of information transfer.
  • Current and future networking technologies continue to facilitate ease of information transfer and convenience to users. One area in which there is a demand to increase ease of information transfer relates to data processing services for data being communicated to/from a mobile terminal. The data processing services may be provided from a network server or other network device, from the mobile terminal such as, for example, a mobile telephone, a mobile television, a mobile gaming system, etc., or even from a combination of the mobile terminal and the network device. The data processing services may include transmission, reception, coding, decoding, storage, or other processing of, for example, image files, speech, video or audio signals, etc.
  • In current applications, a user of a mobile terminal who wishes to wirelessly communicate data comprising images, speech, audio, video, etc., which has been captured by the mobile terminal may experience delays in transmitting the data to the network device. The delays may result due to the relatively long transmission times required to transmit relatively large quantities of data (e.g., large image files captured by imaging devices on modern mobile terminals). Accordingly, compression techniques have been developed in order to reduce transmission times and storage requirements. However, compression techniques may introduce errors or reduce the quality of the compressed data when the data is decompressed for rendering.
  • An exemplary field in which compression may introduce problems may be speech coding. In this regard, for example, in low bit rate speech coding, spectral information related to a speech signal may be considered to include two portions. A first portion, including a model of the human vocal tract, may be modeled using linear prediction and is referred to as the spectral envelope. The spectral envelope is typically considered the most important part of the spectral information. A second portion of the spectral information may include remaining parts of the spectral information which include detailed information that, while useful in adding to the speech quality, may not be necessary for clear comprehension of the speech. The second portion may be referred to as the residual spectrum or excitation spectrum. The residual spectrum typically has lower perceptual importance, but if the residual spectrum is neglected, speech quality usually suffers to some extent. Since residual information comprising the residual spectrum includes details that cannot be easily modeled by the spectral envelope, the amount of information to be quantized therein may be high.
  • Several frequency domain and time domain techniques have been developed for modeling and quantizing the residual spectrum. However, these techniques typically share common weaknesses: either the modeling is too coarse to achieve high quality, or quantizing the model parameters requires too many bits for relatively low bit rates. Moreover, conventional mechanisms have typically modeled and quantized the residual spectrum and the spectral envelope separately.
  • Other types of data may also include portions of the data to be compressed that are of unequal importance. Thus, given the above described problems, it may be advantageous to provide an improved data compression technique that may overcome at least some of the disadvantages described above to, for example, produce improved compression in low bit rate environments.
  • BRIEF SUMMARY OF THE INVENTION
  • A method, apparatus and computer program product are therefore provided that may enable data compression of both high priority data and low priority data with relatively low bit rates. In an exemplary embodiment, a main quantizer may be trained for processing the high priority data and a side model may simultaneously be trained for processing the low priority data. Training may thus account for the high and low priority data jointly, so that processing of data may occur without negatively impacting quantizer performance. Accordingly, data compression efficiency may be achieved while maintaining relatively high speech quality for low bit rate speech coding.
  • In one exemplary embodiment, a method for providing improved data compression is provided. The method may include partially encoding the input data by encoding the high priority data using a quantizer trained with respect to high priority data, and modeling low priority data among the input data using a side model trained jointly with the training of the quantizer.
  • In another exemplary embodiment, a computer program product for providing improved data compression is provided. The computer program product includes at least one computer-readable storage medium having computer-readable program code portions stored therein. The computer-readable program code portions include first and second executable portions. The first executable portion is for partially encoding the input data by encoding the high priority data using a quantizer trained with respect to high priority data. The second executable portion is for modeling low priority data among the input data using a side model trained jointly with the training of the quantizer.
  • In another exemplary embodiment, an apparatus for providing improved data compression is provided. The apparatus may include an encoder comprising a quantizer for encoding input data and a side model. The quantizer may be trained with respect to high priority data among the input data and may be configured to partially encode the input data by encoding the high priority data. The side model may be trained jointly with the training of the quantizer and is configured to model low priority data among the input data.
  • In another exemplary embodiment, an apparatus for decoding compressed data is provided. The apparatus includes a side model and a decoder including a dequantizer. The decoder may be configured to decode compressed data using the dequantizer to recover high priority data from a received signal. The side model may be configured to recover low priority data from the received signal. The side model may be trained jointly with the training of a quantizer used to encode the received signal. The quantizer may be trained with respect to the high priority data.
  • In yet another exemplary embodiment, a method for decoding compressed data is provided. The method may include decoding compressed data using a dequantizer to recover high priority data from a received signal, and recovering low priority data from the received signal using a side model trained jointly with the training of a quantizer used to encode the received signal. The quantizer may be trained with respect to the high priority data.
  • Embodiments of the present invention may enable relatively high quality encoding of data in low bit rate environments. Accordingly, bits may be preserved for the primary purpose of encoding higher priority data, rather than devoting bits to encoding of less important data (e.g., lower priority data).
  • BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING(S)
  • Having thus described the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
  • FIG. 1 is a schematic block diagram of a mobile terminal according to an exemplary embodiment of the present invention;
  • FIG. 2 is a schematic block diagram of a wireless communications system according to an exemplary embodiment of the present invention;
  • FIG. 3 illustrates a block diagram showing an encoder and a decoder according to an exemplary embodiment of the present invention;
  • FIG. 4 is a flowchart of an exemplary method of improving data compression according to an exemplary embodiment of the present invention; and
  • FIG. 5 illustrates a method of training a quantizer and a side model in accordance with an embodiment of the present invention.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the invention are shown. Indeed, the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout.
  • FIG. 1, one aspect of the invention, illustrates a block diagram of a mobile terminal 10 that would benefit from embodiments of the present invention. It should be understood, however, that a mobile telephone as illustrated and hereinafter described is merely illustrative of one type of mobile terminal that would benefit from embodiments of the present invention and, therefore, should not be taken to limit the scope of embodiments of the present invention. While several embodiments of the mobile terminal 10 are illustrated and will be hereinafter described for purposes of example, other types of mobile terminals, such as portable digital assistants (PDAs), pagers, mobile televisions, gaming devices, laptop computers, cameras, video recorders, audio/video player, radio, GPS devices, or any combination of the aforementioned, and other types of voice and text communications systems, can readily employ embodiments of the present invention.
  • In addition, while several embodiments of the method of the present invention are performed or used by a mobile terminal 10, the method may be employed by other than a mobile terminal. Moreover, the system and method of embodiments of the present invention will be primarily described in conjunction with mobile communications applications. It should be understood, however, that the system and method of embodiments of the present invention can be utilized in conjunction with a variety of other applications, both in the mobile communications industries and outside of the mobile communications industries.
  • The mobile terminal 10 includes an antenna 12 (or multiple antennae) in operable communication with a transmitter 14 and a receiver 16. The mobile terminal 10 may further include an apparatus, such as a controller 20 or other processing element, that provides signals to and receives signals from the transmitter 14 and receiver 16, respectively. The signals include signaling information in accordance with the air interface standard of the applicable cellular system, and also user speech, received data and/or user generated data. In this regard, the mobile terminal 10 is capable of operating with one or more air interface standards, communication protocols, modulation types, and access types. By way of illustration, the mobile terminal 10 is capable of operating in accordance with any of a number of first, second, third and/or fourth-generation communication protocols or the like. For example, the mobile terminal 10 may be capable of operating in accordance with second-generation (2G) wireless communication protocols IS-136 (time division multiple access (TDMA)), GSM (global system for mobile communication), and IS-95 (code division multiple access (CDMA)), or with third-generation (3G) wireless communication protocols, such as Universal Mobile Telecommunications System (UMTS), CDMA2000, wideband CDMA (WCDMA) and time division-synchronous CDMA (TD-SCDMA), with fourth-generation (4G) wireless communication protocols or the like. As an alternative (or additionally), the mobile terminal 10 may be capable of operating in accordance with non-cellular communication mechanisms. For example, the mobile terminal 10 may be capable of communication in a wireless local area network (WLAN) or other communication networks described below in connection with FIG. 2.
  • It is understood that the apparatus, such as the controller 20, may include circuitry desirable for implementing audio and logic functions of the mobile terminal 10. For example, the controller 20 may be comprised of a digital signal processor device, a microprocessor device, and various analog to digital converters, digital to analog converters, and other support circuits. Control and signal processing functions of the mobile terminal 10 are allocated between these devices according to their respective capabilities. The controller 20 thus may also include the functionality to convolutionally encode and interleave message and data prior to modulation and transmission. The controller 20 can additionally include an internal voice coder, and may include an internal data modem. Further, the controller 20 may include functionality to operate one or more software programs, which may be stored in memory. For example, the controller 20 may be capable of operating a connectivity program, such as a conventional Web browser. The connectivity program may then allow the mobile terminal 10 to transmit and receive Web content, such as location-based content and/or other web page content, according to a Wireless Application Protocol (WAP), Hypertext Transfer Protocol (HTTP) and/or the like, for example.
  • The mobile terminal 10 may also comprise a user interface including an output device such as a conventional earphone or speaker 24, a ringer 22, a microphone 26, a display 28, and a user input interface, all of which are coupled to the controller 20. The user input interface, which allows the mobile terminal 10 to receive data, may include any of a number of devices allowing the mobile terminal 10 to receive data, such as a keypad 30, a touch display (not shown) or other input device. In embodiments including the keypad 30, the keypad 30 may include the conventional numeric (0-9) and related keys (#, *), and other hard and soft keys used for operating the mobile terminal 10. Alternatively, the keypad 30 may include a conventional QWERTY keypad arrangement. The keypad 30 may also include various soft keys with associated functions. In addition, or alternatively, the mobile terminal 10 may include an interface device such as a joystick or other user input interface. The mobile terminal 10 further includes a battery 34, such as a vibrating battery pack, for powering various circuits that are required to operate the mobile terminal 10, as well as optionally providing mechanical vibration as a detectable output. In addition, the mobile terminal 10 may include a positioning sensor 36. The positioning sensor 36 may include, for example, a global positioning system (GPS) sensor, an assisted global positioning system (Assisted-GPS) sensor, etc. However, in one exemplary embodiment, the positioning sensor 36 includes a pedometer or inertial sensor. In this regard, the positioning sensor 36 is capable of determining a location of the mobile terminal 10, such as, for example, longitudinal and latitudinal directions of the mobile terminal 10, or a position relative to a reference point such as a destination or start point. Information from the positioning sensor 36 may then be communicated to a memory of the mobile terminal 10 or to another memory device to be stored as a position history or location information.
  • The mobile terminal 10 may further include a user identity module (UIM) 38. The UIM 38 is typically a memory device having a processor built in. The UIM 38 may include, for example, a subscriber identity module (SIM), a universal integrated circuit card (UICC), a universal subscriber identity module (USIM), a removable user identity module (R-UIM), etc. The UIM 38 typically stores information elements related to a mobile subscriber. In addition to the UIM 38, the mobile terminal 10 may be equipped with memory. For example, the mobile terminal 10 may include volatile memory 40, such as volatile Random Access Memory (RAM) including a cache area for the temporary storage of data. The mobile terminal 10 may also include other non-volatile memory 42, which can be embedded and/or may be removable. The non-volatile memory 42 can additionally or alternatively comprise an electrically erasable programmable read only memory (EEPROM), flash memory or the like, such as that available from the SanDisk Corporation of Sunnyvale, Calif., or Lexar Media Inc. of Fremont, Calif. The memories can store any of a number of pieces of information, and data, used by the mobile terminal 10 to implement the functions of the mobile terminal 10. For example, the memories can include an identifier, such as an international mobile equipment identification (IMEI) code, capable of uniquely identifying the mobile terminal 10. Furthermore, the memories may store instructions for determining cell id information. Specifically, the memories may store an application program for execution by the controller 20, which determines an identity of the current cell, i.e., cell id identity or cell id information, with which the mobile terminal 10 is in communication. In conjunction with the positioning sensor 36, the cell id information may be used to more accurately determine a location of the mobile terminal 10.
  • In an exemplary embodiment, the mobile terminal 10 includes a media capturing module, such as a camera, video and/or audio module, in communication with the controller 20. The media capturing module may be any means for capturing an image, video and/or audio for storage, display or transmission. For example, in an exemplary embodiment in which the media capturing module is a camera module 37, the camera module 37 may include a digital camera capable of forming a digital image file from a captured image, or a video file from a series of captured image frames with or without accompanying audio data. As such, the camera module 37 includes all hardware, such as a lens or other optical device, and software necessary for creating a digital image, video or audio file from captured image/audio data. Alternatively, the camera module 37 may include only the hardware needed to capture an image, while a memory device of the mobile terminal 10 stores instructions for execution by the controller 20 in the form of software necessary to create a digital image file from a captured image. In an exemplary embodiment, the camera module 37 may further include a processing element such as a co-processor which assists the controller 20 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data. The encoder and/or decoder may encode and/or decode according to, for example, a joint photographic experts group (JPEG) standard or other format.
  • FIG. 2 is a schematic block diagram of a wireless communications system according to an exemplary embodiment of the present invention. Referring now to FIG. 2, an illustration of one type of system that would benefit from embodiments of the present invention is provided. The system includes a plurality of network devices. As shown, one or more mobile terminals 10 may each include an antenna 12 for transmitting signals to and for receiving signals from a base site or base station (BS) 44. The base station 44 may be a part of one or more cellular or mobile networks each of which includes elements required to operate the network, such as a mobile switching center (MSC) 46. As well known to those skilled in the art, the mobile network may also be referred to as a Base Station/MSC/Interworking function (BMI). In operation, the MSC 46 is capable of routing calls to and from the mobile terminal 10 when the mobile terminal 10 is making and receiving calls. The MSC 46 can also provide a connection to landline trunks when the mobile terminal 10 is involved in a call. In addition, the MSC 46 can be capable of controlling the forwarding of messages to and from the mobile terminal 10, and can also control the forwarding of messages for the mobile terminal 10 to and from a messaging center. It should be noted that although the MSC 46 is shown in the system of FIG. 2, the MSC 46 is merely an exemplary network device and embodiments of the present invention are not limited to use in a network employing an MSC.
  • The MSC 46 can be coupled to a data network, such as a local area network (LAN), a metropolitan area network (MAN), and/or a wide area network (WAN). The MSC 46 can be directly coupled to the data network. In one typical embodiment, however, the MSC 46 is coupled to a gateway device (GTW) 48, and the GTW 48 is coupled to a WAN, such as the Internet 50. In turn, devices such as processing elements (e.g., personal computers, server computers or the like) can be coupled to the mobile terminal 10 via the Internet 50. For example, as explained below, the processing elements can include one or more processing elements associated with a computing system 52 (two shown in FIG. 2), origin server 54 (one shown in FIG. 2) or the like, as described below.
  • The BS 44 can also be coupled to a serving GPRS (General Packet Radio Service) support node (SGSN) 56. As known to those skilled in the art, the SGSN 56 is typically capable of performing functions similar to the MSC 46 for packet switched services. The SGSN 56, like the MSC 46, can be coupled to a data network, such as the Internet 50. The SGSN 56 can be directly coupled to the data network. In a more typical embodiment, however, the SGSN 56 is coupled to a packet-switched core network, such as a GPRS core network 58. The packet-switched core network is then coupled to another GTW 48, such as a gateway GPRS support node (GGSN) 60, and the GGSN 60 is coupled to the Internet 50. In addition to the GGSN 60, the packet-switched core network can also be coupled to a GTW 48. Also, the GGSN 60 can be coupled to a messaging center. In this regard, the GGSN 60 and the SGSN 56, like the MSC 46, may be capable of controlling the forwarding of messages, such as MMS messages. The GGSN 60 and SGSN 56 may also be capable of controlling the forwarding of messages for the mobile terminal 10 to and from the messaging center.
  • In addition, by coupling the SGSN 56 to the GPRS core network 58 and the GGSN 60, devices such as a computing system 52 and/or origin server 54 may be coupled to the mobile terminal 10 via the Internet 50, SGSN 56 and GGSN 60. In this regard, devices such as the computing system 52 and/or origin server 54 may communicate with the mobile terminal 10 across the SGSN 56, GPRS core network 58 and the GGSN 60. By directly or indirectly connecting mobile terminals 10 and the other devices (e.g., computing system 52, origin server 54, etc.) to the Internet 50, the mobile terminals 10 may communicate with the other devices and with one another, such as according to the Hypertext Transfer Protocol (HTTP) and/or the like, to thereby carry out various functions of the mobile terminals 10.
  • Although not every element of every possible mobile network is shown and described herein, it should be appreciated that the mobile terminal 10 may be coupled to one or more of any of a number of different networks through the BS 44. In this regard, the network(s) may be capable of supporting communication in accordance with any one or more of a number of first-generation (1G), second-generation (2G), 2.5G, third-generation (3G), 3.9G, fourth-generation (4G) mobile communication protocols or the like. For example, one or more of the network(s) can be capable of supporting communication in accordance with 2G wireless communication protocols IS-136 (TDMA), GSM, and IS-95 (CDMA). Also, for example, one or more of the network(s) can be capable of supporting communication in accordance with 2.5G wireless communication protocols GPRS, Enhanced Data GSM Environment (EDGE), or the like. Further, for example, one or more of the network(s) can be capable of supporting communication in accordance with 3G wireless communication protocols such as a UMTS network employing WCDMA radio access technology. Some narrow-band analog mobile phone service (NAMPS), as well as total access communication system (TACS), network(s) may also benefit from embodiments of the present invention, as should dual or higher mode mobile stations (e.g., digital/analog or TDMA/CDMA/analog phones).
  • The mobile terminal 10 can further be coupled to one or more wireless access points (APs) 62. The APs 62 may comprise access points configured to communicate with the mobile terminal 10 in accordance with techniques such as, for example, radio frequency (RF), infrared (IrDA) or any of a number of different wireless networking techniques, including WLAN techniques such as IEEE 802.11 (e.g., 802.11a, 802.11b, 802.11g, 802.11n, etc.), world interoperability for microwave access (WiMAX) techniques such as IEEE 802.16, and/or wireless Personal Area Network (WPAN) techniques such as IEEE 802.15, BlueTooth (BT), ultra wideband (UWB) and/or the like. The APs 62 may be coupled to the Internet 50. Like with the MSC 46, the APs 62 can be directly coupled to the Internet 50. In one embodiment, however, the APs 62 are indirectly coupled to the Internet 50 via a GTW 48. Furthermore, in one embodiment, the BS 44 may be considered as another AP 62. As will be appreciated, by directly or indirectly connecting the mobile terminals 10 and the computing system 52, the origin server 54, and/or any of a number of other devices, to the Internet 50, the mobile terminals 10 can communicate with one another, the computing system, etc., to thereby carry out various functions of the mobile terminals 10, such as to transmit data, content or the like to, and/or receive content, data or the like from, the computing system 52. As used herein, the terms “data,” “content,” “information” and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments of the present invention. Thus, use of any such terms should not be taken to limit the spirit and scope of embodiments of the present invention.
  • Although not shown in FIG. 2, in addition to or in lieu of coupling the mobile terminal 10 to computing systems 52 across the Internet 50, the mobile terminal 10 and computing system 52 may be coupled to one another and communicate in accordance with, for example, RF, BT, IrDA or any of a number of different wireline or wireless communication techniques, including LAN, WLAN, WiMAX, UWB techniques and/or the like. One or more of the computing systems 52 can additionally, or alternatively, include a removable memory capable of storing content, which can thereafter be transferred to the mobile terminal 10. Further, the mobile terminal 10 can be coupled to one or more electronic devices, such as printers, digital projectors and/or other multimedia capturing, producing and/or storing devices (e.g., other terminals). Like with the computing systems 52, the mobile terminal 10 may be configured to communicate with the portable electronic devices in accordance with techniques such as, for example, RF, BT, IrDA or any of a number of different wireline or wireless communication techniques, including universal serial bus (USB), LAN, WLAN, WiMAX, UWB techniques and/or the like.
  • In an exemplary embodiment, content or data may be communicated over the system of FIG. 2 between a mobile terminal, which may be similar to the mobile terminal 10 of FIG. 1, and a network device of the system of FIG. 2 in order to, for example, execute applications or establish communication (e.g., voice communications utilizing speech coding, image/video data captured by the camera module 37, etc.) between the mobile terminal 10 and other mobile terminals. As such, it should be understood that the system of FIG. 2 need not be employed for communication between mobile terminals or between a network device and the mobile terminal, but rather FIG. 2 is merely provided for purposes of example. Furthermore, it should be understood that embodiments of the present invention may be resident on a communication device such as the mobile terminal 10, and/or may be resident on other devices, absent any communication with the system of FIG. 2.
  • An exemplary embodiment of the invention will now be described with reference to FIG. 3, in which certain elements of an apparatus for providing improved data compression in, for example, a low bit rate speech coding environment are displayed. The apparatus of FIG. 3 may be employed, for example, on the mobile terminal 10 of FIG. 1 and/or the computing system 52 or the origin server 54 of FIG. 2. However, it should be noted that the system of FIG. 3 may also be employed on a variety of other devices, both mobile and fixed, and therefore, the present invention should not be limited to application on devices such as the mobile terminal 10 of FIG. 1. It should also be noted, however, that while FIG. 3 illustrates one example of a configuration of an apparatus for providing improved data compression, numerous other configurations may also be used to implement the present invention. Furthermore, although an exemplary embodiment is described below primarily in the context of speech coding, embodiments of the present invention are not limited to use in the context of speech coding, but also apply more generally to data compression for other compression applications, such as video coding, image coding, audio coding, etc.
  • Referring now to FIG. 3, an apparatus for providing improved data compression for speech coding in a low bit rate speech coding application is provided. However, as indicated above, other types of coding may also be practiced in a similar manner to the description provided below. The apparatus includes an encoder 70 and a decoder 72 capable of communication with each other (although such communication is not required). The encoder 70 may be employed to encode data (e.g., speech data, audio data, video data, image data, etc.) and the decoder 72 may be employed to decode encoded data. The encoder 70 may be any means or device embodied in hardware, software or a combination of hardware and software that is capable of performing encoding operations as described below. In an exemplary embodiment, the encoder 70 is embodied in software as instructions that are stored on a memory of the mobile terminal 10 and executed by the controller 20. The decoder 72 may be any device or means embodied in either hardware, software, or a combination of hardware and software that is capable of performing decoding operations as described below. In an exemplary embodiment, the decoder 72 may be embodied in software as instructions that are stored on a memory of the mobile terminal 10 and executed by the controller 20. However, either or both of the encoder 70 and decoder 72 could also be embodied at a network device, such as, for example, the computing system 52 or the origin server 54 of FIG. 2, and/or at a different mobile terminal.
  • The encoder 70 may include a main quantizer 74 and a side information model 76. However, the side information model 76 could alternatively be separate from the encoder 70 (such as in the decoder 72 or elsewhere) or shared between the encoder 70 and the decoder 72. As yet another alternative, the encoder 70 may not include the side information model 76. The main quantizer 74 and the side information model 76 may each be any circuitry, means or device embodied in hardware, software or a combination of hardware and software that are configured to perform the corresponding functions of the main quantizer 74 and the side information model 76, respectively, as described below. In this regard, the main quantizer 74 may be configured to perform quantization operations on incoming or input data (e.g., speech data in this example). Quantization refers to a process of approximating a continuous set of values in the input data with a finite (preferably small) set of values. The input to the main quantizer 74 may be the original data, and the output of the main quantizer 74 may be a set of output values forming an approximation of the input, preferably with minimal loss or distortion. The main quantizer 74 may be trained, in accordance with embodiments of the present invention, in order to optimize, enhance or at least improve quantization accuracy for a portion of the input data. The portion of the input data for which the main quantizer 74 may be trained to provide optimal or improved quantization accuracy may be a high priority portion of the input data.
  • In the case of speech coding, the high priority portion of the input data may be the spectral envelope. However, the terms high priority and low priority, when used in the context of embodiments of the present invention, should be understood to be relative terms with respect to each other and do not represent high and low priorities in any absolute sense. As such, high priority data and low priority data can be predefined for content within any particular application. For example, background data may be low priority data in an image, rapidly changing features may be low priority data in video, and data in certain frequency bands may be lower priority in audio. However, one skilled in these respective arts may define high and low priority data as desired.
  • The side information model 76 may be trained simultaneously with the training of the main quantizer 74. In this regard, the side information model 76 and the main quantizer 74 may both be trained simultaneously while optimizing or improving the quantization accuracy for the high priority portion (e.g., the spectral envelope). As such, the side information model 76 may be trained as a consequence of the training of the main quantizer 74. The side information model 76 may then model side information comprising the low priority portion of the input data (e.g., the residual spectrum). As such, when the input data is processed by the encoder 70, encoding may only partially be performed on the input data. In this regard, only the high priority portion (e.g., the spectral envelope) may be encoded. The encoded high priority portion (e.g., the spectral envelope) and modeled low priority portion (e.g., the residual spectrum) may then be communicated to another device (e.g., via the system of FIG. 2).
  • Accordingly, as provided in this example, the encoder may receive input data in the form of speech data (although as indicated above, other forms of data may also be employed) and may compress the speech data such that the spectral envelope, which is high priority data, is processed by a quantizer trained to provide an improved or optimal efficiency with respect to quantizing only high priority data. Meanwhile, low priority data may be modeled by a side model trained jointly with the training of the quantizer. In an exemplary embodiment, the output of the encoder therefore includes partially encoded data (the portion encoded corresponding to the high priority data) and a model of the low priority portion (e.g., the residual spectrum). However, there may not always be an explicit output related to the model of the residual spectrum. In this regard, for example, if all the bits are used for coding the high priority data, there would be no bit indicative of modeling accuracy and thus no output related to the model of the residual spectrum. In such a scenario, the decoder 72 may recover the residual spectrum based on the coded spectral envelope and the side information model 76 stored at the decoder 72.
  • The decoder 72 may receive encoded information and may decode the received information prior to further processing. In an exemplary embodiment, the decoder 72 may include a side information model (which could be the same as the side information model 76 used in the encoder or an identical version thereof) and a dequantizer 78. The decoder 72 may be configured to receive data communicated to the apparatus (e.g., the mobile terminal 10) that has been encoded in accordance with an embodiment of the present invention, such as by a device employing an encoder as described above. The dequantizer 78 may be any device or means embodied in either hardware, software, or a combination of hardware and software that is capable of dequantizing the quantized high priority portion (e.g., the spectral envelope). As such, at the decoder 72, the high priority portion (e.g., the spectral envelope) that was quantized by the main quantizer 74 may be recovered by the dequantizer 78 and the low priority portion (e.g., the residual spectrum) may be recovered by the side information model 76 using the information available for the high priority portion. Accordingly, with this approach, the low priority portion can be modeled as side information without using any bits for quantizing the low priority portion. As such, for example, if a low bit rate speech encoding operation is utilized for compression, e.g., in a high quality text-to-speech database, with twenty bits dedicated to quantization, all twenty of the bits may be utilized for quantizing the spectral envelope without devoting any bits to quantizing the residual spectrum and yet, at the decoder 72, the residual spectrum may be recovered from the modeled residual spectrum.
  • In an exemplary embodiment, one bit could be used for indication with respect to model sufficiency. In this regard, for example, one bit may be used to indicate whether the accuracy provided by the modeling described above is sufficient (e.g., based on modeling error for the residual spectrum). Accordingly, if accuracy is indicated as being insufficient, further details regarding the modeling error for the residual spectrum (or low priority data of any kind in other embodiments) may be provided and transmitted and/or stored in a location accessible for use in connection with practicing embodiments of the present invention.
  • As indicated in FIG. 3, the encoder 70 and decoder 72 may undergo training in order to, for example, optimize, enhance or improve performance of the main quantizer 74 (and dequantizer 78) with respect to high priority data as indicated at operation 80. Such training can also provide training the side information model 76 simultaneously. The training described above may be performed offline or online. As such, an apparatus (e.g., such as the mobile terminal 10) having the encoder 70 and/or decoder 72 may be pre-trained and, when received by a user, may include a trained encoder 70 and/or decoder 72. Moreover, since the encoder 70 and/or decoder 72 may each be embodied in software, in at least one embodiment, the encoder 70 and/or decoder 72 may each be controlled by or otherwise embodied as a processing element such as the controller 20 of the mobile terminal 10 (or a processor of a server, computer or other device). Processing elements such as those described herein may be embodied in many ways. For example, the processing element may be embodied as a processor, a coprocessor, a controller or various other processing means or devices including integrated circuits such as, for example, an ASIC (application specific integrated circuit).
  • An exemplary embodiment of the present invention will now be described in the context of vector quantization and, more specifically, in the context of multistage vector quantization (MSVQ). In this regard, some background information related to MSVQ will first be provided. However, embodiments of the present invention may also be practiced in other quantization contexts, and MSVQ is merely one type of quantization that is provided herein by way of example and not of limitation. In alternative embodiments of the invention, a typical characteristic of the training procedure may be that the encoding criterion and the centroid calculation and/or model training are decoupled. In this regard, the encoding step may be done only partially, e.g., only for the high-priority data, but the centroids and/or other models may also be computed for the low-priority data.
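  • The decoupling of the encoding criterion from the centroid calculation described above can be illustrated with a minimal sketch. It assumes a single-stage codebook and a plain Lloyd-style iteration with an unweighted squared error, none of which is prescribed by this description; the function name `train_partial_vq` and the parameters `n_env`, `n_codes` and `n_iter` are hypothetical. Training vectors are assigned to codevectors using only the first `n_env` (high priority) components, while each centroid is averaged over the full vector, so the low priority part of each centroid acts as a simple side information model.

```python
import numpy as np

def train_partial_vq(X, n_env, n_codes, n_iter=20, seed=0):
    """Lloyd-style training sketch (assumed, not the patented procedure).

    Assignment uses only the first n_env components (high priority),
    but centroids are computed over the full-length vectors.
    """
    rng = np.random.default_rng(seed)
    # Initialise the codebook from randomly chosen training vectors.
    codebook = X[rng.choice(len(X), size=n_codes, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Encoding step: nearest codevector on the high priority part only.
        diff = X[:, None, :n_env] - codebook[None, :, :n_env]
        labels = np.argmin((diff ** 2).sum(axis=2), axis=1)
        # Centroid step: average the full vectors, low priority part included.
        for k in range(n_codes):
            members = X[labels == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    return codebook, labels
```

  • In this sketch the low priority components never influence which codevector is chosen, yet every codevector still carries an estimate of them that a decoder could read back from the selected index.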
  • An MSVQ quantizes vectors in two or more additive stages. The objective is to find a vector combination, in other words a sum of the selected vectors at different stages, that minimizes the resulting distortion. The quantized vector can be defined as:
  • $$\hat{x} = \sum_{j=1}^{S} y_{l_j}^{(j)},$$
  • where $y_l^{(j)}$ denotes the $l$th reproduction vector from the $j$th stage, $S$ is the number of stages, and $l_j$ is the index of the selected vector in the $j$th stage. Alternatively, the quantized vector in multistage quantization can be expressed as:
  • $$\hat{x} = \sum_{j=1}^{S} T_{l_j}^{(j)} c_j = Tc,$$
  • where vector $c_j$ is the codebook (e.g., stacked codebook) of the $j$th stage

  • $$c_j = \left[\, y_1^{(j)T} \;\; y_2^{(j)T} \;\; \cdots \;\; y_{N_j}^{(j)T} \,\right]^T,$$
  • and the selection matrix for the $j$th stage, $T_l^{(j)}$, is a sparse Toeplitz matrix ($p$ by $pN_j$, $N_j$ being the number of codevectors at the $j$th stage) that satisfies

  • $$T_l^{(j)} c_j = y_l^{(j)}.$$
  • The column vector $c$ is the combined stacked codebook

  • $$c = \left[\, c_1^T \;\; c_2^T \;\; \cdots \;\; c_S^T \,\right]^T,$$
  • and the matrix $T$ is the multistage selection matrix

  • $$T = \left[\, T_{l_1}^{(1)} \;\; T_{l_2}^{(2)} \;\; \cdots \;\; T_{l_S}^{(S)} \,\right].$$
  • With this notation, the quantization process can be viewed as the process of constructing the selection matrix T.
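  • The two equivalent forms of the quantized vector given above (a sum of selected stage vectors, and the product of the multistage selection matrix with the combined stacked codebook) can be checked numerically with a short sketch. The codebooks here are random placeholders, the helper names are hypothetical, indices are zero-based rather than one-based, and the stage-by-stage greedy search is an assumption made only to keep the example small.

```python
import numpy as np

def selection_matrix(l, n_codes, p):
    """p-by-(p*n_codes) matrix that picks codevector l (zero-based)
    out of a stacked stage codebook."""
    T = np.zeros((p, p * n_codes))
    T[:, l * p:(l + 1) * p] = np.eye(p)
    return T

def msvq_encode(x, stages):
    """Greedy search (assumed): each stage quantizes what earlier stages left over."""
    indices, remainder = [], x.copy()
    for Y in stages:                      # Y has shape (N_j, p)
        l = int(np.argmin(((remainder - Y) ** 2).sum(axis=1)))
        indices.append(l)
        remainder = remainder - Y[l]
    return indices

def msvq_decode(indices, stages):
    """Quantized vector as the sum of the selected vector from every stage."""
    return sum(Y[l] for Y, l in zip(stages, indices))

rng = np.random.default_rng(0)
p = 4
stages = [rng.normal(size=(8, p)), 0.3 * rng.normal(size=(4, p))]
x = rng.normal(size=p)

idx = msvq_encode(x, stages)
x_hat = msvq_decode(idx, stages)

# Matrix form: stack every stage codebook into c and concatenate the
# per-stage selection matrices into T, so that x_hat = T c.
c = np.concatenate([Y.reshape(-1) for Y in stages])
T = np.hstack([selection_matrix(l, len(Y), p) for Y, l in zip(stages, idx)])
assert np.allclose(T @ c, x_hat)
```

  • Choosing the indices is the same as choosing which identity block appears in each per-stage selection matrix, which is why quantization can be viewed as constructing the selection matrix.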
  • Implementation of encoding and decoding in an exemplary use case is provided below. In this regard, for example, let $x$ denote a complete spectral vector to be quantized. The complete spectral vector may include the spectral envelope part $a$ and the residual spectrum part $b$, combined together as $x = [a^T \; b^T]^T$. In our implementation, $a$ may be assumed to have been obtained through linear prediction and represented using line spectral frequencies (LSF). The vector $b$ may have been obtained by modeling the spectral details (e.g., low priority data) by transforming the residual amplitude spectrum into a fixed-dimension mel cepstrum domain.
  • Using conventional techniques, we could jointly quantize a and b in the vector x using e.g. the weighted squared error distortion measure:

  • $$d(x, \hat{x}) = (x - \hat{x})^T W (x - \hat{x}).$$
  • Here, we assume that the matrix W is diagonal, with weights selected according to the subjective importance of the corresponding vector components. As a new addition related to an embodiment of the present invention, a new matrix Z may be included into the distortion measure to obtain the new measure:

  • $d_p(x,\hat{x}) = (x-\hat{x})^T (ZW) (x-\hat{x}),$
  • that can be used in the partial encoding. The matrix Z is a diagonal matrix whose diagonal elements that correspond to vector a are set to 1 and others to 0. As such, the matrix Z may be considered a selector for partial encoding. In other words, matrix Z may be a tool used to enable the quantization to focus on the spectral envelope by selecting the high priority data (e.g., data corresponding to the spectral envelope). At the decoder, the whole vector, including the spectral envelope part and the residual spectrum part, is recovered. In an exemplary embodiment, the residual spectrum part can be directly stored in the codebook. In another embodiment, the residual spectrum part can be recovered using the information available from the spectral envelope part and a separate side information model. MSVQ may be used together with an M-L tree search algorithm.
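  • The effect of the selector can be sketched in a few lines of Python. In this hedged example the joint vectors have a two-component envelope part followed by a two-component residual part; the codebook values, the weights and the function names `partial_distortion`, `encode` and `decode` are hypothetical and serve only to show that the search ignores the residual part while the decoder still returns the whole stored vector.

```python
def partial_distortion(x, x_hat, w_diag, z_diag):
    """(x - x_hat)^T (Z W) (x - x_hat) with diagonal W and Z given as lists."""
    return sum(z * w * (a - b) ** 2
               for a, b, w, z in zip(x, x_hat, w_diag, z_diag))


def encode(x, codebook, w_diag, z_diag):
    """Return the index of the codevector with the smallest distortion."""
    return min(range(len(codebook)),
               key=lambda i: partial_distortion(x, codebook[i], w_diag, z_diag))


def decode(index, codebook):
    """The decoder returns the complete joint vector [a, b] from the codebook."""
    return list(codebook[index])


# Joint codevectors: first two components are "a", last two are "b" (made up).
codebook = [
    [0.1, 0.5, 1.0, -1.0],
    [0.3, 0.6, 0.2, 0.4],
    [0.9, 0.8, 9.0, 9.0],
]
w_diag = [1.0, 1.0, 1.0, 1.0]
z_partial = [1.0, 1.0, 0.0, 0.0]  # Z selects only the envelope part
z_full = [1.0, 1.0, 1.0, 1.0]     # identity selector, i.e. the conventional measure

x = [0.3, 0.6, 9.0, 9.0]
index_partial = encode(x, codebook, w_diag, z_partial)
index_full = encode(x, codebook, w_diag, z_full)
recovered = decode(index_partial, codebook)
```

  • With these made-up numbers the partial search selects the codevector whose envelope part matches the input, whereas the conventional measure is pulled toward a different codevector by the residual part.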
  • As indicated above, if the resulting quality is not considered sufficient, one bit per vector may be included for indicating the quality of the modeled vector b. If the bit indicates that the accuracy of the particular vector is not sufficient, separate refinement information may be included for the vector. The quantizer for the refinement information can be used and trained in a conventional manner.
  • Quantizer training will now be described in accordance with an exemplary embodiment of the present invention. In an exemplary embodiment, the algorithm used for training the quantizer may be modified from that of a conventional quantizer in order to achieve the benefits described herein. In particular, the partial encoding must be taken into account (e.g., by providing the selector matrix Z). A summary of a conventional training scheme and the necessary modifications according to one example is described below. First, notations defined earlier in this section are used. In addition, $W_n$ is a (diagonal) weighting matrix for the $n$th training vector. The weighted squared distortion may be rewritten as:
  • $d_r = d_0 - 2c^T Y + c^T Q c$, where $d_0 = \sum_n x_n^T W_n x_n$, $Y = \sum_n T_n^T W_n x_n$, and $Q = \sum_n T_n^T W_n T_n$.
  • Furthermore, $S_j$ is a shifting matrix satisfying the condition $c = \bar{c}_j + S_j c_j$, and the vector $\bar{c}_j$ is the stacked codebook $c$ with the vector components corresponding to the $j$th stage codebook replaced with zeros.
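  • The rewritten distortion can be checked numerically. The sketch below, which assumes a toy two-stage codebook, three made-up training vectors with fixed index assignments and a hypothetical helper `stacked_selection`, accumulates the running sums d_0, Y and Q and compares the quadratic form against the distortion summed vector by vector.

```python
def stacked_selection(indices, sizes, p):
    """Build T = [T^(1) ... T^(S)] as a p x (p * sum(sizes)) list of lists."""
    total = p * sum(sizes)
    T = [[0.0] * total for _ in range(p)]
    offset = 0
    for l, n in zip(indices, sizes):
        for k in range(p):
            T[k][offset + l * p + k] = 1.0
        offset += p * n
    return T


p, sizes = 2, [2, 2]
# Stacked codebook c: stage 1 = {[1, 0], [0, 1]}, stage 2 = {[0.1, 0.1], [-0.1, -0.1]}.
c = [1.0, 0.0, 0.0, 1.0, 0.1, 0.1, -0.1, -0.1]
# (training vector x_n, diagonal of W_n, selected 0-based indices), all made up.
training = [
    ([0.9, 0.2], [1.0, 2.0], (0, 0)),
    ([0.1, 0.8], [0.5, 1.0], (1, 1)),
    ([1.2, 0.1], [1.0, 1.0], (0, 1)),
]

L = len(c)
d0 = 0.0
Y = [0.0] * L
Q = [[0.0] * L for _ in range(L)]
d_direct = 0.0
for x, w, idx in training:
    T = stacked_selection(idx, sizes, p)
    d0 += sum(w[k] * x[k] ** 2 for k in range(p))          # x^T W x
    for i in range(L):
        Y[i] += sum(T[k][i] * w[k] * x[k] for k in range(p))  # T^T W x
        for j in range(L):
            Q[i][j] += sum(T[k][i] * w[k] * T[k][j] for k in range(p))  # T^T W T
    x_hat = [sum(T[k][i] * c[i] for i in range(L)) for k in range(p)]
    d_direct += sum(w[k] * (x[k] - x_hat[k]) ** 2 for k in range(p))

c_dot_Y = sum(ci * yi for ci, yi in zip(c, Y))
c_Q_c = sum(c[i] * Q[i][j] * c[j] for i in range(L) for j in range(L))
d_quadratic = d0 - 2.0 * c_dot_Y + c_Q_c
```

  • The two values agree up to floating-point rounding, which is what allows the training procedure to work from the running sums rather than from the individual training vectors.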
  • Embodiments of the present invention can utilize a simultaneous joint design algorithm that may be initialized with sequentially designed random codebooks. Table 1 illustrates an example of the simultaneous joint design algorithm. Of note, in Table 1, the distortion dp mentioned at step 2 below is computed using the new matrix Z described above.
  • TABLE 1
    1. Initialize.
    Set q = 1. Create an S stage random codebook.
    2. Partition.
    Set all elements of Y and Q to zero. Then, for each vector in the
    training sequence, determine the codebook indices which minimize
    the distortion $d_p$. Compute both $T_n^T W_n T_n$ and $T_n^T W_n x_n$ and update
    the running sums Q, Y and $d_q$.
    3. Compute new codebooks.
    For j = 1 to S compute $c_j = Q_{jj}^{-1} Y_j$, where $Q_{jj} = S_j^T Q S_j$ and
    $Y_j = S_j^T (Y - Q\bar{c}_j)$.
    4. Convergence of c.
    Repeat Operation 3 until convergence.
    5. Reorder.
    Modify the order of the codebooks such that the energy in $c_j$ after
    subtracting the codebook mean is less than the corresponding energy
    in $c_i$ for all j > i. Ensure that the sum of all codevectors of each
    codebook (except the first codebook) is zero.
    6. Convergence test.
    If $|d_{q-1} - d_q| / d_q > \varepsilon_d$, set q = q + 1 and go to Operation 2, $\varepsilon_d$ being a
    threshold for determining convergence. Distances used to measure
    convergence may be measured using the partial approach above (e.g.,
    with application of the Z matrix) or via conventional approaches.
    7. Terminate.
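  • The operations of Table 1 can be sketched in Python under several simplifying assumptions that are not taken from the description: the weighting matrices are diagonal and given as lists, the partition uses a full search, the reorder operation is omitted, and the first stage is initialized from randomly drawn training vectors. Because the weights are diagonal, $Q_{jj}$ is itself diagonal, so $c_j = Q_{jj}^{-1} Y_j$ reduces to a weighted centroid of what the other stages leave unexplained; the code uses that reduced form. All function names and the toy data are hypothetical.

```python
import itertools
import random


def reconstruct(codebooks, idx):
    """Sum the selected codevector of every stage."""
    p = len(codebooks[0][0])
    return [sum(cb[l][k] for cb, l in zip(codebooks, idx)) for k in range(p)]


def distortion(x, x_hat, w, z):
    """Weighted squared error with diagonal W (list w) and selector Z (list z)."""
    return sum(zk * wk * (a - b) ** 2 for a, b, wk, zk in zip(x, x_hat, w, z))


def partition(data, weights, codebooks, z):
    """Operation 2: full search for the index tuple that minimizes d_p."""
    combos = list(itertools.product(*[range(len(cb)) for cb in codebooks]))
    return [min(combos, key=lambda idx: distortion(x, reconstruct(codebooks, idx), w, z))
            for x, w in zip(data, weights)]


def update_stage(data, weights, codebooks, assign, j):
    """Operation 3 for stage j, written as a weighted centroid of residuals."""
    p = len(data[0])
    for l in range(len(codebooks[j])):
        num, den = [0.0] * p, [0.0] * p
        for x, w, idx in zip(data, weights, assign):
            if idx[j] != l:
                continue
            full = reconstruct(codebooks, idx)
            for k in range(p):
                others = full[k] - codebooks[j][l][k]  # contribution of other stages
                num[k] += w[k] * (x[k] - others)       # uses W_n, not Z W_n
                den[k] += w[k]
        if all(d > 0 for d in den):  # codevectors with no training data stay as-is
            codebooks[j][l] = [n / d for n, d in zip(num, den)]


def train(data, weights, z, sizes, eps=1e-4, inner=5, max_iter=50, seed=0):
    rng = random.Random(seed)
    p = len(data[0])
    # Operation 1 (assumed initialization): stage 1 from training vectors,
    # later stages from small random values.
    codebooks = [[list(x) for x in rng.sample(data, sizes[0])]]
    codebooks += [[[rng.uniform(-0.1, 0.1) for _ in range(p)] for _ in range(n)]
                  for n in sizes[1:]]
    history = []
    for _ in range(max_iter):
        assign = partition(data, weights, codebooks, z)   # Operation 2
        for _ in range(inner):                            # Operations 3 and 4
            for j in range(len(codebooks)):
                update_stage(data, weights, codebooks, assign, j)
        d_q = sum(distortion(x, reconstruct(codebooks, idx), w, z)
                  for x, w, idx in zip(data, weights, assign))
        history.append(d_q)
        # Operation 6: stop once the relative change falls below the threshold.
        if len(history) > 1 and abs(history[-2] - d_q) <= eps * max(d_q, 1e-12):
            break
    return codebooks, history


# Toy data: first component plays the role of "a", second the role of "b".
toy_data = [[0.0, -1.0], [0.1, -1.0], [0.2, -1.0],
            [1.0, 1.0], [1.1, 1.0], [1.2, 1.0]]
toy_weights = [[1.0, 1.0] for _ in toy_data]
selector = [1.0, 0.0]
codebooks, history = train(toy_data, toy_weights, selector, sizes=[2, 2])
```

  • The point of the sketch is the asymmetry between the two steps: the partition sees only the components selected by Z, while the codebook update uses the full weights, so the low priority components stored in each codevector end up as the average of the low priority data that accompanied the matching high priority data.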
  • After initialization at operation 1, data comprising a training set may be quantized using the codebooks, and the running sums Q, Y and dq may be updated correspondingly. In theory, it is assumed that the quantization is performed using a full search (e.g., all combinations in the codebook are searched). However, it has been experimentally found that good performance can be achieved by employing the M-L tree search with a moderate value of M. Operations 3 and 4 may define an inner loop which minimizes the total distortion and thus optimizes the stacked codebook using the partitioning performed at operation 2. Finally, the codebooks may be reordered to enhance the performance of the M-L tree search before continuing with a new partitioning. Of note, the exemplary implementation provided above produces a straightforward side information model that can be stored inside the codebook(s) of the main quantizer. In other words, for example, the resulting codebook(s) can be stored in such a way that the resulting codebook(s) include joint vectors x that include both the high-priority part to be encoded, a, and the low-priority side information part b.
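  • For reference, an M-L tree search of the kind mentioned above can be sketched as follows. This is a generic illustration, not the specific search of any embodiment: after each stage only the M partial paths with the lowest distortion are kept, and the distortion is evaluated with the selector applied. The function name `ml_tree_search` and the one-dimensional codebooks are assumptions chosen to make the behavior easy to follow.

```python
def ml_tree_search(x, codebooks, w, z, M):
    """Stage-by-stage search that keeps the M best partial paths."""
    p = len(x)
    paths = [((), [0.0] * p)]  # (indices chosen so far, partial reconstruction)
    for cb in codebooks:
        candidates = []
        for idx, partial in paths:
            for l, y in enumerate(cb):
                recon = [a + b for a, b in zip(partial, y)]
                d = sum(zk * wk * (xk - rk) ** 2
                        for xk, rk, wk, zk in zip(x, recon, w, z))
                candidates.append((d, idx + (l,), recon))
        candidates.sort(key=lambda t: t[0])          # lowest distortion first
        paths = [(idx, recon) for _, idx, recon in candidates[:M]]
    return paths[0]  # best surviving path after the last stage


# One-dimensional toy codebooks (made-up values).
toy_codebooks = [
    [[0.0], [1.0]],     # stage 1
    [[0.6], [-0.05]],   # stage 2
]
target = [0.6]
greedy_idx, greedy_recon = ml_tree_search(target, toy_codebooks, [1.0], [1.0], M=1)
wider_idx, wider_recon = ml_tree_search(target, toy_codebooks, [1.0], [1.0], M=2)
```

  • With M = 1 the search commits to the first-stage codevector nearest the target and cannot recover, while M = 2 keeps both first-stage candidates alive and finds the exact combination. Setting M to the total number of index combinations makes the procedure equivalent to a full search.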
  • FIGS. 4 and 5 are flowcharts of methods and program products according to exemplary embodiments of the invention. It will be understood that each block or step of the flowcharts, and combinations of blocks in the flowcharts, can be implemented by various means, such as hardware, firmware, and/or software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored by a memory device of the mobile terminal and executed by a built-in processor in the mobile terminal. As will be appreciated, any such computer program instructions may be loaded onto a computer or other programmable apparatus (i.e., hardware) to produce a machine, such that the instructions which execute on the computer or other programmable apparatus create means for implementing the functions specified in the flowcharts block(s) or step(s). These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowcharts block(s) or step(s). The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowcharts block(s) or step(s).
  • Accordingly, blocks or steps of the flowcharts support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that one or more blocks or steps of the flowcharts, and combinations of blocks or steps in the flowcharts, can be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.
  • In this regard, one embodiment of a method for providing improved data compression, as illustrated in FIG. 4, may provide that input data may be compressed without using any bits for quantization of the low priority data. The method may include partially encoding input data by encoding high priority data among the input data using a quantizer trained with respect to the high priority data at operation 100. A selector matrix may be utilized to enable encoding only the input data that corresponds to the high priority data. Low priority data may be modeled, at operation 110, using a side model trained jointly with the training of the quantizer. In an exemplary embodiment, the method may further include decoding compressed data using a dequantizer and the side model to recover the high priority data and the low priority data, respectively, from a received transmission at an optional operation 120. In another exemplary embodiment, compression may include quantizing the high priority data using the quantizer that is further trained to optimize or enhance quantization accuracy for the high priority data while simultaneously training the side model.
  • In an optional embodiment, the method may further include providing an indicator bit indicative of a sufficiency of the accuracy of the modeling of the low priority data. In this regard, for example, modeling error information for improving modeling accuracy may be provided in response to the indicator bit indicating insufficient modeling accuracy.
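  • One way to picture the indicator bit is the following Python sketch. The mean squared error test, the threshold value and the uniform scalar quantizer used for the refinement information are all assumptions made for illustration; the description leaves the refinement quantizer to conventional techniques. The names `encode_frame` and `decode_b` are hypothetical.

```python
def encode_frame(index, b_true, b_model, threshold, step=0.25):
    """Attach an indicator bit and, only when needed, refinement information."""
    err = [t - m for t, m in zip(b_true, b_model)]
    mse = sum(e * e for e in err) / len(err)
    if mse <= threshold:
        return {"index": index, "flag": 0}           # modeled b is good enough
    # Stand-in refinement: uniform scalar quantization of the modeling error.
    refinement = [round(e / step) for e in err]
    return {"index": index, "flag": 1, "refinement": refinement}


def decode_b(frame, b_model, step=0.25):
    """Recover b from the side model, corrected when the flag is set."""
    if frame["flag"] == 0:
        return list(b_model)
    return [m + r * step for m, r in zip(b_model, frame["refinement"])]


b_model = [0.0, 0.0]  # what the side model predicts (made-up values)
frame_ok = encode_frame(3, [0.1, -0.1], b_model, threshold=0.05)
frame_bad = encode_frame(3, [1.0, -0.5], b_model, threshold=0.05)
b_ok = decode_b(frame_ok, b_model)
b_bad = decode_b(frame_bad, b_model)
```

  • In this sketch a frame whose modeled residual is close to the true one costs a single extra bit, and only frames that fail the test carry the additional refinement values.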
  • FIG. 5 illustrates a method of training the quantizer and the side model in accordance with an embodiment of the present invention. As indicated in FIG. 5, the method may include gathering a data set including high priority data and low priority data jointly at operation 200. At operation 210, a quantizer may be trained for processing the high priority data in a data set. The side model may then be jointly trained with the training of the quantizer with respect to quantization accuracy of the quantizer for the high priority data at operation 220. In an exemplary embodiment, operations 210 and 220 may be handled simultaneously and/or with many iterations for the training. The side model may be configured for modeling the low priority data. In an exemplary embodiment, the training of the quantizer and the joint training of the side model may include simultaneously training the quantizer and the side model offline.
  • Embodiments of the present invention may provide for a reduction in the bit rate required to achieve a given quality level (or better quality for a given bit rate), without a significant increase in computational load. Furthermore, only moderate increases in memory consumption may be required (e.g., for storage of side information data).
  • Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims (12)

1. A method comprising:
receiving input data comprising multimedia;
defining high priority data and low priority data in the input data based on the type of the input data;
partially encoding, in a processor, the input by encoding high priority data among the input data using a quantizer trained with respect to the high priority data; and
modeling, in a processor, low priority data among the input data using a side model trained jointly with the training of the quantizer, wherein the low priority data is associated with the residual spectrum data of the input data subtracted by the high priority data,
wherein partially encoding the input data comprises providing a selector matrix to enable encoding only the input data that corresponds to the high priority data.
2. The method of claim 1, wherein the defining comprises the high priority data is associated with spectral envelope data of input data when the type of the input data is speech.
3. The method of claim 1, wherein the defining comprises the high priority data is associated with foreground data of input data when the type of the input data is image.
4. The method of claim 1, wherein the defining comprises the low priority data is associated with rapidly changing features of input data when the type of the input data is video.
5. A computer program product comprising at least one computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
a first executable portion for receiving input data comprising multimedia;
a second executable portion for defining high priority data and low priority data in the input data based on the type of the input data;
a third executable portion for partially encoding, in a processor, the input by encoding high priority data among the input data using a quantizer trained with respect to the high priority data; and
a fourth executable portion for modeling, in a processor, low priority data among the input data using a side model trained jointly with the training of the quantizer, wherein the low priority data is associated with the residual spectrum data of the input data subtracted by the high priority data,
wherein partially encoding the input data comprises providing a selector matrix to enable encoding only the input data that corresponds to the high priority data.
6. The computer program product of claim 5, wherein the defining comprises the high priority data is associated with spectral envelope data of input data when the type of the input data is speech.
7. The computer program product of claim 5, wherein the defining comprises the high priority data is associated with foreground data of input data when the type of the input data is image.
8. The computer program product of claim 5, wherein the defining comprises the low priority data is associated with rapidly changing features of input data when the type of the input data is video.
9. An apparatus comprising:
a processor; and
a memory communicatively coupled to the processor, and configured to store computer program code for, when executed by the processor:
receiving input data comprising multimedia,
defining high priority data and low priority data in the input data based on the type of the input data,
partially encoding, in a processor, the input by encoding high priority data among the input data using a quantizer trained with respect to the high priority data, and
modeling, in a processor, low priority data among the input data using a side model trained jointly with the training of the quantizer, wherein the low priority data is associated with the residual spectrum data of the input data subtracted by the high priority data,
wherein partially encoding the input data comprises providing a selector matrix to enable encoding only the input data that corresponds to the high priority data.
10. The apparatus of claim 9, wherein the defining comprises the high priority data is associated with spectral envelope data of input data when the type of the input data is speech.
11. The apparatus of claim 9, wherein the defining comprises the high priority data is associated with foreground data of input data when the type of the input data is image.
12. The apparatus of claim 9, wherein the defining comprises the low priority data is associated with rapidly changing features of input data when the type of the input data is video.
US15/951,781 2007-10-04 2018-04-12 Method, apparatus and computer program product for providing improved data compression Abandoned US20180332287A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/951,781 US20180332287A1 (en) 2007-10-04 2018-04-12 Method, apparatus and computer program product for providing improved data compression

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US11/867,212 US8886612B2 (en) 2007-10-04 2007-10-04 Method, apparatus and computer program product for providing improved data compression
US14/529,718 US9451265B2 (en) 2007-10-04 2014-10-31 Method, apparatus and computer program product for providing improved data compression
US15/244,856 US9973755B2 (en) 2007-10-04 2016-08-23 Method, apparatus and computer program product for providing improved data compression
US15/951,781 US20180332287A1 (en) 2007-10-04 2018-04-12 Method, apparatus and computer program product for providing improved data compression

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US15/244,856 Continuation US9973755B2 (en) 2007-10-04 2016-08-23 Method, apparatus and computer program product for providing improved data compression

Publications (1)

Publication Number Publication Date
US20180332287A1 true US20180332287A1 (en) 2018-11-15

Family

ID=40524195

Family Applications (4)

Application Number Title Priority Date Filing Date
US11/867,212 Expired - Fee Related US8886612B2 (en) 2007-10-04 2007-10-04 Method, apparatus and computer program product for providing improved data compression
US14/529,718 Expired - Fee Related US9451265B2 (en) 2007-10-04 2014-10-31 Method, apparatus and computer program product for providing improved data compression
US15/244,856 Expired - Fee Related US9973755B2 (en) 2007-10-04 2016-08-23 Method, apparatus and computer program product for providing improved data compression
US15/951,781 Abandoned US20180332287A1 (en) 2007-10-04 2018-04-12 Method, apparatus and computer program product for providing improved data compression

Country Status (1)

Country Link
US (4) US8886612B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112533036A (en) * 2020-11-30 2021-03-19 北京大学 Neural self-adaptive video stream multi-path transmission result determination method and system

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8886612B2 (en) * 2007-10-04 2014-11-11 Core Wireless Licensing S.A.R.L. Method, apparatus and computer program product for providing improved data compression
US9767822B2 (en) 2011-02-07 2017-09-19 Qualcomm Incorporated Devices for encoding and decoding a watermarked signal
US9767823B2 (en) 2011-02-07 2017-09-19 Qualcomm Incorporated Devices for encoding and detecting a watermarked signal
US8880404B2 (en) 2011-02-07 2014-11-04 Qualcomm Incorporated Devices for adaptively encoding and decoding a watermarked signal
GB2582916A (en) * 2019-04-05 2020-10-14 Nokia Technologies Oy Spatial audio representation and associated rendering

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030152105A1 (en) * 1994-04-19 2003-08-14 Multi-Tech Systems, Inc. Advanced priority statistical multiplexer
US20030200092A1 (en) * 1999-09-22 2003-10-23 Yang Gao System of encoding and decoding speech signals
US20040008897A1 (en) * 2002-07-09 2004-01-15 Lightsurf Technologies, Inc. System and method for improved compression of DCT compressed images
US7013269B1 (en) * 2001-02-13 2006-03-14 Hughes Electronics Corporation Voicing measure for a speech CODEC system

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6137835A (en) * 1993-11-16 2000-10-24 Canon Kabushiki Kaisha Picture coding apparatus using orthogonal transforms
US5602764A (en) * 1993-12-22 1997-02-11 Storage Technology Corporation Comparing prioritizing memory for string searching in a data compression system
US7190284B1 (en) * 1994-11-16 2007-03-13 Dye Thomas A Selective lossless, lossy, or no compression of data based on address range, data type, and/or requesting agent
US6904404B1 (en) * 1996-07-01 2005-06-07 Matsushita Electric Industrial Co., Ltd. Multistage inverse quantization having the plurality of frequency bands
JP3246715B2 (en) * 1996-07-01 2002-01-15 松下電器産業株式会社 Audio signal compression method and audio signal compression device
US6072830A (en) * 1996-08-09 2000-06-06 U.S. Robotics Access Corp. Method for generating a compressed video signal
WO1998044637A1 (en) * 1997-03-28 1998-10-08 Sony Corporation Data coding method and device, data decoding method and device, and recording medium
CA2343751A1 (en) * 1998-07-27 2000-02-10 Webtv Networks, Inc. Remote computer access
EP1292036B1 (en) * 2001-08-23 2012-08-01 Nippon Telegraph And Telephone Corporation Digital signal decoding methods and apparatuses
US7599835B2 (en) * 2002-03-08 2009-10-06 Nippon Telegraph And Telephone Corporation Digital signal encoding method, decoding method, encoding device, decoding device, digital signal encoding program, and decoding program
AU2002307889A1 (en) * 2002-04-22 2003-11-03 Nokia Corporation Generating lsf vectors
US20030235338A1 (en) * 2002-06-19 2003-12-25 Meetrix Corporation Transmission of independently compressed video objects over internet protocol
US20040022322A1 (en) * 2002-07-19 2004-02-05 Meetrix Corporation Assigning prioritization during encode of independently compressed objects
US8886612B2 (en) * 2007-10-04 2014-11-11 Core Wireless Licensing S.A.R.L. Method, apparatus and computer program product for providing improved data compression

Also Published As

Publication number Publication date
US9451265B2 (en) 2016-09-20
US20150055883A1 (en) 2015-02-26
US20170085877A1 (en) 2017-03-23
US20090094264A1 (en) 2009-04-09
US8886612B2 (en) 2014-11-11
US9973755B2 (en) 2018-05-15

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

AS Assignment

Owner name: CPPIB CREDIT INVESTMENTS, INC., CANADA

Free format text: AMENDED AND RESTATED U.S. PATENT SECURITY AGREEMENT (FOR NON-U.S. GRANTORS);ASSIGNOR:CONVERSANT WIRELESS LICENSING S.A R.L.;REEL/FRAME:046897/0001

Effective date: 20180731

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE

AS Assignment

Owner name: CONVERSANT WIRELESS LICENSING S.A R.L., LUXEMBOURG

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CPPIB CREDIT INVESTMENTS INC.;REEL/FRAME:055910/0698

Effective date: 20210302