
WO1997036425A1 - Video processing - Google Patents


Info

Publication number: WO1997036425A1
Authority: WO (WIPO/PCT)
Prior art keywords: signals, input, composite, data, picture
Application number: PCT/GB1997/000641
Other languages: French (fr)
Inventor: Gary Dean Burgess
Original Assignee: British Telecommunications Public Limited Company
Priority claimed from: GBGB9606511.5A (GB9606511D0)
Application filed by: British Telecommunications Public Limited Company
Priority to: JP9534101A (JP2000507418A); AU21022/97A (AU2102297A)
Publication of: WO1997036425A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00: Television systems
    • H04N 7/14: Systems for two-way working
    • H04N 7/15: Conference systems

Abstract

Image processing apparatus comprising input means (51) for receiving input signals from n videoconferencing terminals, where n is an integer greater than or equal to 3, each input signal representing frames of a video signal, processing means for forming n composite signals each representing different combinations of at least two of the input signals, and means for transmitting the composite signals to the relevant videoconferencing terminal.

Description

VIDEO PROCESSING
This invention relates to teleconferencing and, in particular, to systems enabling videoconferencing between three or more locations. Videoconferencing can be regarded as a technological substitute for face-to-face meetings. For meetings between two locations, current technology allows one set of participants to see the other set of participants. Where more than two locations are interconnected (so-called multipoint videoconferencing), current systems generally provide a view of only one other location at a time, owing to cost and technology constraints.
A number of standards relating to the field of videoconferencing have been adopted, in particular ITU-T Recommendation H.261 "Video codec for audio-visual services at p x 64 kbit/s". H.261 proposed a common intermediate format (CIF). CIF is based on 288 non-interlaced lines per picture at 30 pictures per second. This format was found to solve the compatibility problems between the traditional formats used in Japan and North America and those used in Europe, and to provide a good quality picture for use in videoconferencing. A second picture format was also included having one half of the resolution of CIF in two dimensions. This format is known as quarter CIF (QCIF). Other relevant international standards are those set by the Moving Pictures Expert Group (MPEG), both ISO/IEC 11172-1 (commonly known as MPEG1) and ISO/IEC 13818 (commonly known as MPEG2). Both of these standards also utilise the Common Intermediate Format, and the individual pictures can be any size within the 352 pixels by 288 lines picture. A multipoint videoconference is generally controlled by a multipoint control unit (MCU) which processes the audio and video signals from each location separately. The MCU is usually provided as a separate piece of equipment but may form an integral part of one of the participating terminals. The MCU generally provides an open audio-mixing system, where all participants are able to hear all other participants but not themselves. However, each terminal is only able to see one other participating terminal, the MCU switching the video from selected terminals to be seen at the other terminals. Various methods for selecting who is seen at a particular terminal are known. Two of the most popular involve selecting the picture automatically from the terminal where someone is speaking, or having a chairperson controlling which picture is seen by whom.
European Patent Application No. 523629 relates to such a multipoint teleconferencing system. A chairperson is located at one of the terminals to control which pictures are viewed by the participants. Each participant receives the same video signal as the other participants for display. European patent application no. 642271 describes videoconferencing apparatus in which a multipoint control unit selects every nth field of the incoming video signals to derive a single output signal which is sent to the participants. Again all participants receive the same video signal.
These current systems suffer from the intrusion of the picture-switching process and the feeling of presence is lost since all participants may not be seen at any one time. An example of "loss of presence" occurs when a participant is particularly quiet or is merely listening; it is easy to forget that this participant is present in the teleconference.
A more desirable approach to multipoint videoconferencing would be to enable participants to be seen and heard at all times during the conference, making a videoconference closer to a real face-to-face meeting.
In accordance with the invention, image processing apparatus comprises input means for receiving input signals from n terminals, where n is an integer greater than or equal to 3, each input signal representing frames of a video signal, processing means for forming n composite signals each representing different combinations of at least two of the input signals, and means for transmitting the composite signals to the relevant terminal. Preferably the processing means comprises means for identifying control data in each input signal, means for redefining the control data for inclusion in the composite signals and means for inserting video data from the input signals into the composite signals.
Since the video data itself is not processed, the propagation delays through the apparatus are relatively low, so providing an acceptable degree of service to users.
Preferably, the frame rate of the composite signals may be equal to the highest frame rate of the input signals or equal to a predetermined fixed rate. Preferably the input signals conform to the quarter Common Intermediate Format and the composite signals conform to the Common Intermediate Format.
In accordance with a further aspect of the invention a method of processing image data from a plurality of terminals comprises receiving the input signals from n terminals, where n is an integer greater than or equal to 3, processing the input signals to form n composite signals representing combinations of at least two input signals, each composite signal being different, and transmitting the composite signals to the relevant terminals.
When n is greater than 5, the composite signals may represent combinations of four input signals, the input signals preferably being selected on the basis of at which terminal the most recent speakers are located.
The method preferably includes identifying control data in each input signal, redefining the control data for inclusion in the composite signals and inserting video data from the input signals into the composite signals. The invention will now be described by way of example only with reference to the accompanying drawings in which:
Figure 1 shows schematically a multipoint videoconference; Figure 2 shows an area of a video image divided into blocks; Figure 3a shows a macro block consisting of four luminance and two chrominance blocks;
Figure 3b shows a group of blocks (GOB);
Figure 3c shows the structure of a whole image consisting of twelve groups of blocks according to the common intermediate format and three groups of blocks according to quarter CIF; Figure 4 shows the framing structure for an H.261 encoded picture;
Figure 5 shows the functional elements of apparatus according to the invention;
Figure 6 shows schematically a CIF picture formed from four QCIF pictures, according to the invention; Figure 7 shows an example of a look-up table defining the new GOB numbering of video data for each output; and
Figure 8 shows the functional elements of an alternative embodiment of apparatus according to the invention. As shown in Figure 1, a multipoint videoconference involves at least three locations, a videoconferencing terminal 12 being provided at each location. The locations might be in the same country or spread over a number of countries. In the embodiment shown in Figure 1, a multipoint control unit (MCU) 14 controls the videoconference and performs all the required audio and video mixing and switching and the control signalling. Each terminal 12 is connected to the MCU 14 via broadband digital links such as Integrated Services Digital Network (ISDN) B-channels. In the UK each B-channel has a capacity of 64 kbit/second.
Each terminal 12 conforms to the H.261 standard and is capable of transmitting CIF or QCIF pictures. On commencement of a videoconference all participating terminals signal their capabilities to the MCU, which then signals to the terminals to request the data in QCIF format.
According to the H.261 standard, images are divided into blocks 22 as shown in Figure 2 for subsequent processing. The smallest block size is an 8x8 pixel block but other sized blocks may be employed. A group of four such luminance (Y) blocks, and the two corresponding chrominance (Cb and Cr) blocks, that cover the same area at half the luminance resolution, are collectively called a macro block (MB) as shown in Figure 3a. Thirty-three macro blocks, grouped and numbered as shown in Figure 3b, are known as a group of blocks (GOB). The GOBs, grouped and numbered as shown in Figure 3c, form a full CIF or QCIF picture.
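By way of illustration, the sketch below (Python; every identifier is my own, only the picture sizes and the counts of Figures 3a to 3c come from the text) derives the block, macroblock and GOB counts for CIF and QCIF.

    # Illustrative only: derive the H.261 spatial hierarchy from the picture sizes.
    FORMATS = {
        "CIF":  (352, 288),   # luminance pixels per line, lines per picture
        "QCIF": (176, 144),   # half the CIF resolution in each dimension
    }

    for name, (width, height) in FORMATS.items():
        luma_blocks = (width // 8) * (height // 8)   # 8x8 pixel blocks
        macroblocks = luma_blocks // 4               # 4 Y blocks (+ Cb + Cr) per MB
        gobs = macroblocks // 33                     # 33 macroblocks per GOB
        print(f"{name}: {luma_blocks} Y blocks, {macroblocks} MBs, {gobs} GOBs")

    # Prints 396 MBs and 12 GOBs for CIF, 99 MBs and 3 GOBs for QCIF,
    # matching the grouping shown in Figures 3b and 3c.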
The framing structure for a frame of H.261 encoded data is shown in Figure 4. The structure is organised in a series of layers, each containing information relevant to the succeeding layers. The layers are arranged as follows: Picture layer 401; GOB layer 403; MB layer 405; and Block layer 407. Each of the layers has a header. The picture header 402 includes information relating to the picture number of the encoded picture, the type of picture (e.g. whether the picture is intraframe coded or interframe coded) and Forward Error Correction (FEC) codes. The GOB header 404 includes information relating to the GOB number within the frame and the quantising step size used to code the GOB. The MB header 406 includes information relating to the MB number and the type of the MB (i.e. intra/inter, forward/backward predicted, luminance/chrominance etc.).
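A minimal sketch of this layering follows, with the field sets deliberately abridged; real H.261 headers are variable-length bit fields, so the fixed-record form here is a simplification for illustration only.

    # Abridged sketch of the four H.261 layers of Figure 4 (not a codec).
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Block:                  # Block layer 407
        coefficients: bytes = b""

    @dataclass
    class Macroblock:             # MB layer 405
        mb_number: int = 0
        mb_type: str = "inter"    # intra/inter, forward/backward predicted, ...
        blocks: List[Block] = field(default_factory=list)

    @dataclass
    class GroupOfBlocks:          # GOB layer 403
        group_number: int = 0     # GN: position of the GOB within the frame
        quant_step: int = 0       # quantising step size used to code the GOB
        macroblocks: List[Macroblock] = field(default_factory=list)

    @dataclass
    class Picture:                # Picture layer 401
        picture_number: int = 0   # temporal reference
        intra_coded: bool = False # picture type
        gobs: List[GroupOfBlocks] = field(default_factory=list)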
Figure 5 shows apparatus according to the invention for combining four QCIF coded pictures into a single full CIF picture. Such apparatus is provided within the MCU 14. Each individual terminal 12 participating in the videoconference supplies QCIF H.261 formatted video data to the MCU 14. The apparatus shown in Figure 5 receives five QCIF pictures from the participating terminals and produces CIF signals representing each combination of four QCIF pictures in a 2x2 array of QCIF coded pictures. The resulting CIF signals are then transmitted to the appropriate participating terminals 12 for display on a display capable of displaying CIF resolution pictures. The apparatus shown only operates upon the video signals from the terminals 12: the audio, user data information and signalling are controlled in a conventional manner by the host MCU 14 in which the apparatus is located.
The apparatus comprises five inputs 51a-e for receiving QCIF format signals from five participating terminals 12. Each input signal is input to a forward error correction (FEC) decoder 52a-e which decodes the FEC codes contained in the picture header 402 of each signal, error corrects the video data of the signal in a conventional manner and establishes framing locks on each input signal. Once framing is established for a particular signal, each FEC decoder 52 signals this to a control means 54. The control means 54 may be provided by a microprocessor. The error corrected QCIF signals are then input to first-in-first-out (FIFO) input buffers 53a-e.
The control means 54 then searches each contributing error-corrected QCIF signal to identify header codewords (such as the GOB header 404 and the MB header 406). This is achieved by a device 55 which decodes the attributed data in the FEC-corrected QCIF signals output from the input buffers 53. The device 55 comprises a series of comparators (not shown) and a shift register (not shown) of sufficient length to hold the longest code word. The comparators compare the data as it enters the shift register and, when a code word is identified, it is forwarded to the control means 54 via a bus 55a. The shift registers then perform a serial to parallel conversion to organise the input video data into bytes for output via bus 55b and convenient storage in random access memory (RAM) 56. A suitable device 55 to perform these operations is a Field Programmable Gate Array (FPGA) such as a Xilinx device. Each GOB will thus be reorganised into a number of words (16-bit or 32-bit) having newly assigned byte boundaries, since H.261 signals are not originally organised in bytes. Thus the bytes of data allocated to a particular GOB may inevitably contain data not relevant to that GOB; this data will form part of the first and last bytes of the GOBs concerned. These first and last bytes are marked to state the number of valid bits that they contain. The control means 54 monitors the status of the data content of the individual input buffers 53a-e via an input control device 60 (such as an FPGA) to ensure that there is no overflow or underflow of data in the buffers. Since intra- or inter-frame coding may be used in H.261, the amount of video data within a GOB may vary significantly. The video data of each GOB is therefore allocated to a portion of random access memory (RAM) 56 of sufficient capacity to hold the largest possible GOB allowed under H.261. The GOBs for a particular QCIF picture (which contains three GOBs) are logically grouped together in the RAM.
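The start-code search performed by device 55 has a simple software analogue: shift bits through a 16-bit register and compare against the sixteen-bit pattern (fifteen zeros then a one) that prefixes both the picture and GOB start codes in H.261. The sketch below is illustrative only and ignores the distinctions the real hardware draws between the different codewords.

    def find_start_codes(data: bytes):
        """Yield the bit offsets at which the 16-bit start-code prefix begins."""
        register = 0
        for bit_index in range(len(data) * 8):
            bit = (data[bit_index // 8] >> (7 - bit_index % 8)) & 1
            register = ((register << 1) | bit) & 0xFFFF  # 16-bit shift register
            if bit_index >= 15 and register == 0x0001:   # comparator match
                yield bit_index - 15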
Together with the video data, various codes associated with each GOB are also stored in the RAM. These codes relate to: the source of the data (i.e. from which terminal 12 the video originated); the picture number (PIC) of the current picture held in RAM from a particular source; the original group number (OGN) (1, 2 or 3) of the GOB in a particular PIC; the number of bytes (Nbyte) in the GOB; the valid data content (VFByte) of the first byte in a GOB; and the valid data content (VLByte) of the last byte in a GOB.
Also associated with each GOB are a number of pointers to locate the position of headers within the frame. These are used, for example, to locate the OGN codeword position for editing purposes prior to compilation of the video data to form a CIF format signal.
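Gathered into one record, the per-GOB bookkeeping just described might look as follows; this layout is hypothetical, since the patent does not prescribe field widths or an in-memory format.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class StoredGob:
        source: int                            # terminal 12 the video originated from
        pic: int                               # picture number of the current picture
        ogn: int                               # original group number (1, 2 or 3)
        nbyte: int                             # number of bytes in the GOB
        vfbyte: int                            # valid bits in the first byte
        vlbyte: int                            # valid bits in the last byte
        header_pointers: Tuple[int, ...] = ()  # header positions for later GN editing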
The following processes then take place to compose each new CIF picture-data sequence from the original individual constituent QCIF picture data stored in the RAM 56:
• Assign an appropriate CIF Picture Header for the output CIF frame; this is output ahead of the GOBs of data.
• Edit each GOB header code to conform to the new position of each GOB in the CIF structure required for the given output to which the data is to be sent.
• Transfer the needed GOB data from each constituent QCIF picture (held in RAM 56) in the correct sequence after the CIF Picture Header, to form the output CIF frame data sequence required for each output. An example of a required sequence is depicted in Figure 6. For example 'Output 3', which is the H.261 CIF sequence required for output 3, will require GOB data (after the new CIF picture header) from all of the other pictures (except input 3) in the following sequence, which is also sketched in code after this list:
<Pic 1, GOB 1> <Pic 2, GOB 1> <Pic 1, GOB 2> <Pic 2, GOB 2> <Pic 1, GOB 3> <Pic 2, GOB 3> <Pic 4, GOB 1> <Pic 5, GOB 1> <Pic 4, GOB 2> <Pic 5, GOB 2> <Pic 4, GOB 3> <Pic 5, GOB 3>
where <Pic x, GOB y> represents GOB number y from input number x. A look-up table of the required header editing (as shown in Figure 7) is used to guide the control module 54.
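The sketch below reproduces this interleaving. The GN mapping used (top-left picture to CIF GOBs 1, 3 and 5; top-right to 2, 4 and 6; bottom-left to 7, 9 and 11; bottom-right to 8, 10 and 12) is the conventional 2x2 continuous-presence layout; Figure 7 itself is not reproduced in this text, so treat the table values as an assumption.

    # (quadrant, original GOB number) -> new CIF group number; assumed layout.
    NEW_GN = {
        (0, 1): 1,  (0, 2): 3,  (0, 3): 5,    # top-left quadrant
        (1, 1): 2,  (1, 2): 4,  (1, 3): 6,    # top-right quadrant
        (2, 1): 7,  (2, 2): 9,  (2, 3): 11,   # bottom-left quadrant
        (3, 1): 8,  (3, 2): 10, (3, 3): 12,   # bottom-right quadrant
    }

    def cif_gob_sequence(inputs):
        """inputs: the four source pictures in quadrant order.
        Returns (source picture, original GOB, new GN) in transmission order."""
        plan = [(source, ogn, NEW_GN[(quadrant, ogn)])
                for quadrant, source in enumerate(inputs)
                for ogn in (1, 2, 3)]
        return sorted(plan, key=lambda item: item[2])  # GOBs go out as GN 1..12

    # cif_gob_sequence([1, 2, 4, 5]) yields Pic 1/GOB 1 -> GN 1,
    # Pic 2/GOB 1 -> GN 2, Pic 1/GOB 2 -> GN 3, ... exactly as listed above.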
The contents of each portion of the RAM are polled by the control means 54 at the highest allowed H.261 picture rate, approximately 30 Hertz. When a complete frame of data for an individual QCIF signal from a terminal 12 is available, it is transferred to an output data FIFO 57. If the required data for any QCIF segment of the CIF frame is not yet available from the RAM, then an empty GOB of data (i.e. just a header) is transferred instead. This allows the destination terminal to display an image until a new frame is ready to be sent by the MCU. The control means 54 monitors the status of the individual areas of the RAM to ensure that the above procedure is followed.
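In outline, the per-frame decision reads as below; this is a simplified sketch in which 'ram' stands in for the per-GOB areas of RAM 56, and a missing entry models a source whose next complete frame has not yet arrived.

    def empty_gob(new_gn: int) -> dict:
        """A header-only GOB: a group number and no macroblock data."""
        return {"gn": new_gn, "data": b""}

    def build_output_frame(ram: dict, plan) -> list:
        """ram maps (source, original GOB number) -> payload bytes when a
        complete frame from that source is available; plan is the GOB
        sequence for this output, e.g. from cif_gob_sequence() above."""
        frame = []
        for source, ogn, new_gn in plan:
            payload = ram.get((source, ogn))
            if payload is None:
                frame.append(empty_gob(new_gn))  # destination keeps its old image
            else:
                frame.append({"gn": new_gn, "data": payload})
        return frame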
All used outputs are loaded with data in a polled sequential cycle: i.e. each CIF frame output is built up by transferring one GOB at a time to each output buffer 57 in turn before returning to the first to start again. As can be seen from Figure 6, data for several outputs tends to require the same input picture data at any one time in the CIF compilation sequence, allowing a large degree of parallelism to be employed in the data transfer.
The RAM 56 is of sufficient capacity to store a sequence of several QCIF frames of data from any single source if required, although in normal operation on average only two QCIF frames of data are required. Once an area of RAM has been transferred to all of the required output buffers 57a-e, the area is made available for storing a new QCIF frame. New MB address stuffing codes are omitted or inserted to control the output data-rate to comply with H.261 for a CIF picture.
The output buffers 57 buffer the data being assembled from the original QCIF data GOBs prior to forward error correction coding. Once sufficient data to form a full FEC frame of data (492 bits) has been loaded into an output FIFO 57, the data is fed to a following FEC encoder (58a-e) for forward error correction framing.
The output buffers 57 are of sufficient capacity to allow loading of data to take place without overflow, whilst providing the FEC encoders 58 with data when requested without underflow. The flow of data into the buffers 57 and out of the buffers 59 and the FEC is controlled by an output control 62, which may also be an FPGA device. The forward error corrected signal output from the encoders 58 is input to CIF output buffers 59a-e which buffer the CIF signals for transmission to the relevant participating terminal 12.
CIF Output Frame Rate
Each of the individual terminals 12 participating in a conference is autonomous. This means that there will tend to be different and varying amounts of information within each individual QCIF-coded picture; each terminal 12 will be operating at slightly different picture-rate tolerances (±50 ppm); and each terminal 12 can produce a different picture-rate (through picture dropping). The last item potentially creates the biggest problem. The possible options and alternatives available when combining pictures at different frame-rates into a larger picture of one frame-rate are discussed below.
The combined CIF picture is compiled from a maximum of four contributing QCIF pictures. If different picture rates are used by the different QCIF picture feeds, then the combined CIF picture may be formed for instance by either using the highest QCIF picture rate present or using a fixed pre-determined rate.
If the QCIF source with the highest picture-rate is used to determine the CIF output frame-rate, this rate may vary dynamically with the changing scene contents which are encoded by each participating terminal 12. It is possible to keep track of the highest current picture-rate and to modify the CIF output frame-rate accordingly.
Alternatively, the highest picture-rate possible (29.97 Hz), or some other pre-determined rate, is used to set the CIF output frame-rate. In this case, individual QCIF data picture-rates would not be used to determine the output rate. This option is slightly more wasteful of data-capacity than the previous option, requiring a larger 'overhead', but simplifies the operation of the apparatus and potentially allows for the use of each individual Temporal Reference (TR) code of an H.261 format signal. The TR code can be used to determine the relative temporal position of each QCIF picture within a sequence of CIF frames, possibly leading to enhanced rendition of motion when displayed. It may well be that one or more of the terminals 12 can only receive pictures at a particular lower rate. In this case that lower rate will set the limit on the maximum allowable pre-determined CIF picture-rate for all participants, the controlling MCU 14 signalling this to all the participating terminals. The MCU can impose a maximum picture rate on the contributing incoming feeds if necessary.
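The two rate policies, and the cap imposed by a slow receiving terminal, can be summarised as follows; the function and policy names are my own, and only the 29.97 Hz ceiling comes from the text.

    def cif_output_rate(input_rates, policy="fixed", fixed_rate=29.97,
                        receiver_limit=None):
        """Choose the CIF output frame-rate in Hz from the QCIF feed rates."""
        if policy == "highest":
            rate = max(input_rates)   # track the fastest contributing QCIF feed
        else:
            rate = fixed_rate         # pre-determined rate, e.g. 29.97 Hz
        if receiver_limit is not None:
            rate = min(rate, receiver_limit)  # a slow terminal caps everyone
        return rate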
The newly formed CIF format signals have a mean data-rate that is the sum of the data rates of the constituent QCIF pictures, plus an additional 'overhead' capacity to cater for combining pictures with the different picture-rates (as discussed above). Each CIF frame must contain all of the constituent GOB headers, even for any omitted data. A proportionally higher data rate will be required on the output CIF channel, depending upon the picture-rate disparities between the incoming QCIF feeds. The following is an estimation of a 'worst-case' scenario, to determine the overhead required.
Worst Case Scenario
Say one QCIF source picture-rate is 30 Hz, whilst the other three are 1 Hz. This means that there will be 29 inserted pictures in every 30 where additional GOB headers without associated data need to be inserted to form the CIF output. Say 26 bits are allocated to each GOB header. The total number of additional GOB header bits for 3 QCIF pictures (each containing 3 GOBs) is therefore: 3 x 3 x 26 = 234 bits/CIF frame. These extra bits are added for 29 frames out of 30 in every second: 29 x 234 = 6,786 extra bits of overhead per second.
Thus a constant 'overhead' of 6.786 kbit/s is required. This quantity will be a greater fraction of the overall data rate at the lower data-rates.
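Restated in code (the 26-bit GOB header size is the assumption made above):

    gob_header_bits = 26   # assumed size of one GOB header
    gobs_per_qcif = 3
    idle_sources = 3       # three 1 Hz feeds padded into the 30 Hz CIF output
    padded_frames = 29     # frames per second needing empty GOB headers

    per_frame = idle_sources * gobs_per_qcif * gob_header_bits  # 234 bits
    per_second = padded_frames * per_frame                      # 6786 bits
    print(per_frame, per_second)   # -> 234 6786, i.e. about 6.8 kbit/s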
Each of the terminals 12 may allocate a different channel capacity (R) to video data for transmission to the MCU. The image processor of the invention in the MCU produces a combined CIF coded video signal for transmission at the highest allowed picture-rate for the call. If no constraints are set, this will be 30 Hz (in fact 29.97 Hz ± 50 ppm); constraints can be sent from the MCU 14 (using, for example, H.221 format signalling) to lower this to say 15, 10 or 7.5 Hz if desired or required. This allows the image processor of the invention to handle all incoming QCIF rates, allowing empty GOBs to be transmitted when there is insufficient video data from any source.
When empty GOBs are transmitted, additional information is required for the GOB header data, leading to an additional 'overhead' (discussed earlier) of data capacity required for the output to each terminal 12. Under 'worst case' conditions (one QCIF source of 30 Hz with the other three at 1 Hz to be combined into 30 Hz CIF frames) this overhead will be approximately an additional 6.8 kbit/s, independent of the overall channel capacities involved. When viewed on an H.221 time-slot basis, this overhead works out to be about 68 bits in every B-channel of 8 x 80 bits: the overhead will thus fit within a single 8 kbit/s sub-channel (80 bits).
The down-link (from MCU to terminal 12) channel capacities required are therefore the sum of the four QCIF capacities which will go to form the new CIF pictures, plus the overhead, audio, data, a frame alignment signal and a bit allocation signal.
Data Header Modification
As described earlier, modifications are made to the data header information associated with each new CIF frame which is to be compiled out of the original contributing QCIF data. These modifications are performed on the data held in the RAM prior to its sequenced transfer to the output buffers 57a-e.
As outlined earlier, each incoming H.261 coded QCIF picture is autonomous with its own unique data structure. The internal structure is organised in a series of layers, as shown in Figure 4, each containing information relevant to the succeeding layers. The modifications made to these layers to compile a CIF-format frame are outlined below:
Picture Layer: The individual constituent QCIF macroblocks are assigned a location in the new CIF array of macroblocks. A new Picture Layer Picture Start Code (PSC) is assigned to conform to the new CIF format, and a flag is set which defines the source format (0: QCIF, 1: CIF) to declare CIF for the coded pictures output. The Temporal Reference (TR) code could be taken as that of one of the contributing QCIF pictures, 'averaged' from all of the contributions, or used to temporally locate each QCIF segment of data into the new CIF frame.
GOB Layer:
Each individual QCIF GOB header Group Number (GN) (a 4-bit positional locator number code) is edited to be redefined for the new CIF structure, as shown in the table of Figure 7 and sketched below.
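Since the GN occupies the four bits immediately after the 16-bit GOB start code, renumbering a stored GOB amounts to overwriting those bits in place. The sketch below assumes the GOB is stored with its start code byte-aligned at offset 0, consistent with the re-packing into bytes described earlier rather than with the raw H.261 stream.

    def rewrite_gn(gob: bytearray, new_gn: int) -> None:
        """Overwrite the 4-bit group number of a byte-aligned GOB in place."""
        assert 1 <= new_gn <= 12
        # Bits 0-15 are the start code, so GN sits in the top four bits
        # of the third byte.
        gob[2] = (new_gn << 4) | (gob[2] & 0x0F)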
MB Layer: A Macro-Block stuffing (MBA stuffing) code word is available and may be employed for 'padding out' the data content if desired.
Figure 6 shows the resulting CIF pictures for a videoconference including five terminals. Each CIF picture is formed from four QCIF pictures. The last CIF picture in Figure 6 represents a combination of the QCIF signals from terminals number 1, 2, 3 and 4 and is transmitted from the MCU to terminal number 5. Thus terminal number 5 will display a composite image composed of images from all four other participating terminals 12. The image processor of the invention can produce a CIF picture from one, two, three or four QCIF pictures. This method could also be used to combine CIF formatted pictures into "Multiple-CIF" formats (e.g. to combine four CIF images into one composite signal) to produce higher resolution pictures. It could similarly also be used, with only minor changes, to combine MPEG (H.262) pictures into multiple pictures.
The location information contained in the H.261 data headers may be edited to position individual picture segments anywhere within the available display field as desired. This can be used to produce a subjectively more pleasing arrangement of contributing QCIF pictures when fewer than four participants are being displayed. For example, if the final CIF picture is compiled from only two contributing QCIF pictures, as would be the case in a three-way conference, then it may be subjectively better to arrange the two pictures side by side in the middle of the screen, rather than in any of the corners. This can easily be achieved by re-numbering the constituent GOBs for each QCIF picture to occupy, for example, positions 3, 5, 7 and 4, 6, 8 in the CIF array. Alternatively, the images may be placed on top of each other, at the top of the display, etc.
Although the above specific description has focussed on video signals conforming to the H.261 standard, there is no intention to limit the scope of the invention to video signals of this type. For instance, the invention is also applicable to video signals conforming to one of the MPEG standards. In this case, since the pictures are not confined to QCIF and CIF pictures, a composite signal may be generated which represents more than four QCIF pictures. For instance, say the resolution of a user's screen is 352 pixels by 288 lines and each participant terminal transmits to a central image processing apparatus according to the invention a full resolution (i.e. 352 x 288) picture. If the image processing apparatus is arranged to display four images, a pre-processor 80 (as shown in Figure 8) then pre-processes each incoming signal to reduce its resolution by 50% in each dimension. (In Figure 8, like elements are indicated by the same reference numerals as in Figure 5.)
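Such a pre-processor might, for example, halve the resolution by 2x2 averaging, as sketched below; the patent does not specify the downsampling filter, so the averaging choice is an assumption.

    def downscale_by_half(frame):
        """frame: a list of rows of pixel values (e.g. 288 rows of 352).
        Returns the image reduced by 50% in each dimension."""
        return [
            [(frame[y][x] + frame[y][x + 1]
              + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
             for x in range(0, len(frame[0]) - 1, 2)]
            for y in range(0, len(frame) - 1, 2)
        ]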

Claims

1. Image processing apparatus comprising input means for receiving input signals from n terminals, where n is an integer greater than or equal to 3, each input signal representing frames of a video signal, processing means for forming n composite signals representing combinations of at least two of the input signals, each composite signal being different, and means for transmitting the composite signals to the relevant terminals.
2. Apparatus according to claim 1, wherein the processing means comprises means for identifying control data in each input signal, means for redefining the control data for inclusion in the composite signals and means for inserting video data from the input signals into the composite signals.
3. Apparatus according to Claim 1 or 2 wherein the frame rate of the composite signals is equal to the highest frame rate of the input signals.
4. Apparatus according to Claim 1 or 2 wherein the frame rate of the composite signals is equal to a predetermined fixed rate.
5. Apparatus according to any preceding claim wherein the input signals conform to the quarter Common Intermediate Format and the composite signals conform to the Common Intermediate Format.
6. Apparatus according to any of claims 1 -4 wherein the input signals and the composite signals conform to the same format, the apparatus further including a pre-processor for pre-processing the input signals.
7. A method of processing the image data from a plurality of terminals, the method comprising receiving the input signals from n terminals, where n is an integer greater than or equal to 3, each input signal representing frames of a video signal, processing the input signals to form n composite signals representing combinations of at least two input signals, each composite signal being different, and transmitting the composite signals to the relevant terminals.
8. A method according to claim 7 wherein, when n is greater than 5, the composite signals represent combinations of four input signals, the input signals being selected on the basis of at which terminal the most recent speakers are located, or controlled by a conference chairperson.
9. A method according to claim 7 or 8 further comprising identifying control data in each input signal, redefining the control data for inclusion in the composite signals and inserting video data from the input signals into the composite signals.
10. A method according to any of claims 7 to 9 wherein the frame rate of the composite signals is equal to the highest frame rate of the input signals.
11. A method according to any of claims 7 to 9 wherein the frame rate of the composite signals is equal to a predetermined fixed rate.
PCT/GB1997/000641 1996-03-28 1997-03-07 Video processing WO1997036425A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP9534101A JP2000507418A (en) 1996-03-28 1997-03-07 Video processing
AU21022/97A AU2102297A (en) 1996-03-28 1997-03-07 Video processing

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB9606511.5 1996-03-28
EP96302148.0 1996-03-28
GBGB9606511.5A GB9606511D0 (en) 1996-03-28 1996-03-28 Video processing
EP96302148 1996-03-28

Publications (1)

Publication Number Publication Date
WO1997036425A1 true WO1997036425A1 (en) 1997-10-02

Family

ID=26143636

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB1997/000641 WO1997036425A1 (en) 1996-03-28 1997-03-07 Video processing

Country Status (3)

Country Link
JP (1) JP2000507418A (en)
AU (1) AU2102297A (en)
WO (1) WO1997036425A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2828055A1 (en) * 2001-07-27 2003-01-31 Thomson Licensing Sa Compressed video image mosaic coding/decoding process/mechanism having sub image forming mosaic and coding element/element sub assembly identifying/resynchronisation mark stream placed prior sub image.
WO2003026300A1 (en) * 2001-09-19 2003-03-27 Bellsouth Intellectual Property Corporation Minimal decoding method for spatially multiplexing digital video pictures

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100548383B1 (en) 2003-07-18 2006-02-02 엘지전자 주식회사 Digital video signal processing apparatus of mobile communication system and method thereof
JP2024120350A (en) * 2023-02-24 2024-09-05 Kddiアジャイル開発センター株式会社 Data processing device, data processing method and data processing system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0523629A1 (en) * 1991-07-15 1993-01-20 Hitachi, Ltd. Multipoint teleconference system employing H. 221 frames
EP0642271A1 (en) * 1993-09-03 1995-03-08 International Business Machines Corporation Video communication apparatus
EP0669765A2 (en) * 1994-02-25 1995-08-30 AT&T Corp. Multipoint digital video communication system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0523629A1 (en) * 1991-07-15 1993-01-20 Hitachi, Ltd. Multipoint teleconference system employing H. 221 frames
EP0642271A1 (en) * 1993-09-03 1995-03-08 International Business Machines Corporation Video communication apparatus
EP0669765A2 (en) * 1994-02-25 1995-08-30 AT&T Corp. Multipoint digital video communication system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2828055A1 (en) * 2001-07-27 2003-01-31 Thomson Licensing Sa Compressed video image mosaic coding/decoding process/mechanism having sub image forming mosaic and coding element/element sub assembly identifying/resynchronisation mark stream placed prior sub image.
WO2003013145A1 (en) * 2001-07-27 2003-02-13 Thomson Licensing S.A. Method and device for coding a mosaic
US8160159B2 (en) 2001-07-27 2012-04-17 Thomson Licensing Method and device for coding a mosaic
WO2003026300A1 (en) * 2001-09-19 2003-03-27 Bellsouth Intellectual Property Corporation Minimal decoding method for spatially multiplexing digital video pictures
US6956600B1 (en) 2001-09-19 2005-10-18 Bellsouth Intellectual Property Corporation Minimal decoding method for spatially multiplexing digital video pictures
US7518630B2 (en) 2001-09-19 2009-04-14 At&T Intellectual Property I, L.P. Minimal decoding method for spatially multiplexing digital video pictures
US8872881B2 (en) 2001-09-19 2014-10-28 At&T Intellectual Property I, L.P. Minimal decoding method for spatially multiplexing digital video pictures
US9554165B2 (en) 2001-09-19 2017-01-24 At&T Intellectual Property I, L.P. Minimal decoding method for spatially multiplexing digital video pictures

Also Published As

Publication number Publication date
JP2000507418A (en) 2000-06-13
AU2102297A (en) 1997-10-17

Similar Documents

Publication Publication Date Title
US5453780A (en) Continous presence video signal combiner
US5764277A (en) Group-of-block based video signal combining for multipoint continuous presence video conferencing
US7646736B2 (en) Video conferencing system
US5684527A (en) Adaptively controlled multipoint videoconferencing system
US6285661B1 (en) Low delay real time digital video mixing for multipoint video conferencing
US6535240B2 (en) Method and apparatus for continuously receiving frames from a plurality of video channels and for alternately continuously transmitting to each of a plurality of participants in a video conference individual frames containing information concerning each of said video channels
CA2159846C (en) Video transmission rate matching for multimedia communication systems
US5838664A (en) Video teleconferencing system with digital transcoding
US5600646A (en) Video teleconferencing system with digital transcoding
CA2159847C (en) Coded domain picture composition for multimedia communications systems
EP1683356B1 (en) Distributed real-time media composer
CA2140849C (en) Multipoint digital video communication system
US7245660B2 (en) Method and an apparatus for mixing compressed video
AU2002355089A1 (en) Method and apparatus for continuously receiving frames from a pluarlity of video channels and for alternatively continuously transmitting to each of a plurality of participants in a video conference individual frames containing information concerning each of said video channels
US7720157B2 (en) Arrangement and method for generating CP images
WO1997036425A1 (en) Video processing
KR100194976B1 (en) Bitstream editing device
Pao et al. Multipoint Videoconferencing
WO1997003522A1 (en) Videoconferencing

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG US UZ VN YU AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA