HK1016793B - Method and device for encoding seamless-connection of telecine-converted video data - Google Patents
- Publication number
- HK1016793B (application HK99101788.6A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- video
- data
- vob
- stream
- signal
Description
Technical Field
The present invention relates to a recording apparatus, a recording medium, and a playback apparatus which perform various processes on an information bitstream carrying the moving picture data, audio data, and sub-picture data of titles having a series of related contents, generate a new bitstream constituting a title with the contents desired by a user, and efficiently record the generated bitstream on a predetermined recording medium. The invention further relates to a method and an apparatus, for use in such an authoring system, for encoding and reproducing video data generated by converting the rate of film-shot movie material by the frame rate conversion method called telecine conversion.
Background
In recent years, in systems using laser discs, VCDs, and the like, authoring systems that digitally process multimedia data such as moving images, audio, and sub-images to constitute titles having a series of related contents have been put to practical use.
In particular, in systems using VCD, moving picture data is recorded, by a high-compression-rate moving picture coding method called MPEG, on a CD medium that has a storage capacity of about 600 Mbytes and was originally intended for recording digital audio signals. Titles of existing laser discs, karaoke being a representative example, are being transferred to VCD.
Users' requirements for the content and playback quality of each title grow more complex and more demanding year by year. To answer these requirements, each title must be composed of a bitstream with a deeper hierarchical structure than before. With multimedia data constituted as such a deeper-hierarchy bitstream, the data amount reaches more than ten times that of before. Further, since the details within a title must be edited very finely, the bitstream must be processed and controlled in data units of the lower layers.
It is desirable to establish a bitstream structure that allows such a large, multi-layered digital bitstream to be controlled efficiently at each level, together with an advanced digital processing method including playback. There is also a need for an apparatus that performs such digital processing, and for a recording medium that can efficiently record and store the bitstream information digitally processed by that apparatus and quickly play back the recorded information.
In view of this situation, much study has gone into increasing the storage capacity of the optical discs used as recording media. To increase the storage capacity of an optical disc, the spot diameter D of the light beam must be reduced; if the laser wavelength is λ and the numerical aperture of the objective lens is NA, the spot diameter D is proportional to λ/NA, so the smaller λ and the larger NA, the more advantageous for increasing the storage capacity.
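As a rough numeric illustration of the D ∝ λ/NA relation, the following sketch uses typical published CD and DVD laser values; these figures are assumptions for illustration, not taken from this description:

```python
# Illustrative sketch of the spot-diameter relation D ∝ λ/NA.
# The wavelength and NA values are typical CD/DVD figures (assumed).

def spot_diameter(wavelength_nm: float, na: float, k: float = 1.0) -> float:
    """Relative focused-spot diameter; k is a proportionality constant."""
    return k * wavelength_nm / na

cd = spot_diameter(780, 0.45)    # CD: 780 nm laser, NA 0.45
dvd = spot_diameter(650, 0.60)   # DVD: 650 nm laser, NA 0.60

print(f"CD spot  ~ {cd:.0f} (arbitrary units)")
print(f"DVD spot ~ {dvd:.0f} (arbitrary units)")
print(f"ratio CD/DVD ~ {cd / dvd:.2f}")  # ~1.6, matching the 1/1.6 spot size cited later
```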
However, when a lens with a large NA is used, as described in U.S. Patent 5,235,581, the coma aberration caused by a relative inclination between the disc surface and the optical axis of the light beam, called tilt, becomes large; to prevent this, the transparent substrate must be made thin. A thin transparent substrate, however, raises the problem of reduced mechanical strength.
With respect to data processing, MPEG2, which can transmit large volumes of data at higher speed than the conventional MPEG1, has been developed and put to practical use as a recording/playback scheme for signal data such as moving pictures, audio, and graphics. MPEG2 adopts compression methods and data formats somewhat different from MPEG1. The contents of MPEG1 and MPEG2 and their differences are described in detail in the MPEG specifications ISO11172 and ISO13818, so their description is omitted here. MPEG2 defines the structure of the video encoded stream, but it does not clarify the hierarchical structure of the system bitstream or the processing methods of the lower layers.
As described above, existing authoring systems cannot process data streams large enough to carry the information needed to fully satisfy the various requirements of users; and even if the processing techniques were established, no large-capacity recording medium capable of efficiently recording and playing back such large data streams is available, so the processed data cannot be used effectively and repeatedly.
In other words, processing a bitstream in units smaller than a title places excessive demands on hardware, for larger recording medium capacity and faster digital processing, and on software, for the design of advanced digital processing methods including elaborate data structures.
An object of the present invention is to provide an editing system that satisfies a user's demand by controlling a bitstream of multimedia data in one title.
A video player for playing back titles made of such multimedia data is typically connected to a television set so that the user can conveniently view the playback. Many titles use film-shot movie material; in such cases, to facilitate editing when the recording bitstream is generated, a digital VTR is used to supply the material to the recording signal generation apparatus. The movie material is therefore rate-converted by the frame rate conversion process called telecine conversion to generate the recording signal.
In telecine conversion, the frame rate is converted basically by periodically inserting redundant fields that are copies of fields of the same parity. Since the frame rate of film and the frame rate of video are not in a simple integer ratio, a conversion pattern different from the normal one must be inserted at the period of this processing. If the telecine-converted video obtained in this way is compression-encoded as it is, the copied redundant fields are encoded on a par with the original fields, so coding efficiency deteriorates.
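The following sketch illustrates the kind of periodic field repetition described above, using the common 2-3 pulldown as a concrete instance; the field naming and the exact cadence are illustrative assumptions, not definitions from this patent:

```python
# A minimal sketch of 2-3 pulldown telecine conversion: 4 film frames
# (24 fps) become 10 video fields = 5 video frames (30 fps) by
# periodically repeating a field of the same parity.

def telecine_2_3(film_frames):
    """Yield video fields; every other film frame contributes 3 fields."""
    fields = []
    parity = 0  # 0 = top field next, 1 = bottom field next
    for i, frame in enumerate(film_frames):
        n = 2 if i % 2 == 0 else 3          # alternate 2-field and 3-field frames
        for _ in range(n):
            fields.append(f"{frame}_{'top' if parity == 0 else 'bot'}")
            parity ^= 1                      # field parity alternates on output
    return fields

print(telecine_2_3(["A", "B", "C", "D"]))
# ['A_top', 'A_bot', 'B_top', 'B_bot', 'B_top',
#  'C_bot', 'C_top', 'D_bot', 'D_top', 'D_bot']
# The second B_top and D_bot are copied redundant fields; encoding them
# as ordinary fields is what wastes bits.
```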
However, when edit units (VOBs) of a plurality of titles that have undergone inverse telecine conversion are played back continuously, top fields may become consecutive at the points where the VOBs join. In general, the behavior of an MPEG decoder in such a case cannot be guaranteed: a DVD player may delete or insert one field to match the pictures before and after the junction, or in the worst case insert an unrelated field. Even in the former case, loss of synchronization with the audio occurs. Completely seamless playback therefore cannot be achieved.
The present invention is intended to provide a system stream generation method, a recording apparatus, a playback apparatus, an optical disc recording method, and a data structure for a medium on which such a system stream is recorded, with a data structure that enables seamless playback even at the boundaries of access units, without bottom fields or top fields becoming consecutive when VOBs are played back continuously. This application is based on Japanese Patent Application No. H7-252733 (filed September 29, 1995), the contents disclosed in whose specification form part of the disclosure of the present invention.
Disclosure of Invention
A signal conversion recording method of the present invention takes a video signal that has been converted, by repeating fields, to a frame rate higher than that of the signal source, removes the redundant fields, and converts the signal into an intermediate signal whose frame rate is nearly equal to that of the original signal source. The intermediate signal is compression-encoded to obtain a recording signal, which is recorded on a recording medium together with a flag indicating the removed field and a flag indicating which of the two fields in each resulting video frame is temporally first. When a plurality of logical recording sections are provided on the same recording medium, the conversion from the video signal to the recording signal is performed so that these flags take predetermined values at the beginning and end of each recording section.
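A minimal sketch of the boundary condition this method works toward, assuming MPEG-2-style flag names (repeat_first_field, top_field_first) as stand-ins for the two recorded flags; the parity rule shown is the intuition that displayed fields must keep alternating across a VOB boundary:

```python
# Sketch (assumed names, not the patent's syntax): inverse telecine removes
# the copied redundant fields and records, per coded frame, a flag for the
# removed field and a flag for which of the frame's two fields comes first.
# Seamless connection requires predetermined flag values at VOB edges.

from dataclasses import dataclass

@dataclass
class CodedFrame:
    repeat_first_field: bool   # a removed (redundant) field is re-displayed
    top_field_first: bool      # temporal order of the frame's two fields

def first_display_parity(f: CodedFrame) -> str:
    return "top" if f.top_field_first else "bottom"

def last_display_parity(f: CodedFrame) -> str:
    # With repeat_first_field set, the first field is displayed again last,
    # so the final displayed field has the first field's parity.
    if f.repeat_first_field:
        return first_display_parity(f)
    return "bottom" if f.top_field_first else "top"

def seamless_boundary(prev_last: CodedFrame, next_first: CodedFrame) -> bool:
    # Field parity must alternate across the VOB boundary.
    return last_display_parity(prev_last) != first_display_parity(next_first)

# A VOB ending on a bottom field connects seamlessly to one starting on top.
end = CodedFrame(repeat_first_field=False, top_field_first=True)
start = CodedFrame(repeat_first_field=False, top_field_first=True)
print(seamless_boundary(end, start))  # True
```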
Brief description of the drawings
Fig. 1 shows a data structure of a multimedia bitstream.
Fig. 2 shows an authoring encoder.
Fig. 3 shows an authoring decoder.
Fig. 4 is a cross-sectional view of a DVD recording medium having a single recording surface.
Fig. 5 is an enlarged cross-sectional view of part C1 of the DVD recording medium in fig. 4.
Fig. 6 is an enlarged cross-sectional view of part C2 in fig. 5.
Fig. 7 is a cross-sectional view of a DVD recording medium having a plurality of recording surfaces (single-sided dual-layer type).
Fig. 8 is a cross-sectional view of a DVD recording medium having a plurality of recording surfaces (double-sided single-layer type).
Fig. 9 is a plan view of the DVD recording medium.
Fig. 10 is a plan view of the DVD recording medium.
Fig. 11 is a development view of a single-sided dual-layer type DVD recording medium.
Fig. 12 is a development view of a single-sided dual-layer type DVD recording medium.
Fig. 13 is a development view of a double-sided single-layer type DVD recording medium.
Fig. 14 is a development view of a double-sided single-layer type DVD recording medium.
Fig. 15 shows an example of a multi-rated title stream.
Fig. 16 is a data structure diagram of the VTS.
Fig. 17 shows a data structure of a system stream.
Fig. 18 shows a data structure of a system stream.
Fig. 19 shows a data structure of a data group in the system stream.
Fig. 20 shows a data structure of the navigation group NV.
FIG. 21 shows an example of a multi-rated title for a DVD.
Fig. 22 shows a data structure of a DVD.
Fig. 23 shows the connection of system streams for multi-angle control.
Fig. 24 shows an example of VOBs corresponding to a multi-scene.
Fig. 25 shows a DVD authoring encoder.
Fig. 26 shows a DVD authoring decoder.
Fig. 27 shows a VOB set data string.
Fig. 28 shows a VOB data string.
Fig. 29 shows encoder parameters.
FIG. 30 shows an example of a DVD multi-scene program chain structure.
Fig. 31 shows an example of the structure of a DVD multi-scene VOB.
Fig. 32 shows the telecine conversion and the inverse telecine conversion.
Fig. 33 shows the concept of multi-angle control.
Fig. 34A shows an encoder control flowchart.
Fig. 34B shows an encoder control flowchart.
Fig. 35 is a flowchart showing the generation of encoding parameters for non-seamless switching multi-angle control.
Fig. 36 shows a common flowchart for generating encoding parameters.
Fig. 37 is a flowchart showing the generation of encoding parameters for seamless switching multi-angle control.
FIG. 38 is a flowchart illustrating the generation of encoding parameters for parental lock control.
Fig. 39 shows a block diagram of an inverse telecine converter.
FIG. 40 shows an example of a parental lock playback sequence.
Fig. 41 shows the telecine conversion and the inverse telecine conversion.
Fig. 42 shows a time chart of the inverse telecine converter.
Fig. 43 shows a connection example for parental lock control.
Fig. 44 shows the telecine conversion and the inverse telecine conversion.
Fig. 45 shows a block diagram of an inverse telecine converter.
Fig. 46 shows a timing chart of the inverse telecine converter.
Fig. 47 shows a decoder system table.
Fig. 48 shows a decoder table.
Fig. 49 shows a flow chart of the decoder.
Fig. 50 shows a flowchart of PGC playback.
Fig. 51 is a flowchart showing a process of decoding data in a bitstream buffer.
Fig. 52 is a flowchart showing synchronization processing of the decoders.
Fig. 53 shows a flowchart of encoder parameter generation for a single scene.
Fig. 54 shows an example of the interleaved data block structure.
Fig. 55 shows an example of the structure of a VOB data block of the VTS.
Fig. 56 shows a data structure in consecutive data blocks.
Fig. 57 shows a data structure in an interleaved data block.
Best mode for carrying out the invention
For a more detailed description of the invention, reference will now be made to the accompanying drawings.
Data structure of authoring system
First, the logical structure of the multimedia data bitstream processed by the recording apparatus, recording medium, playback apparatus, and the authoring system comprising these functions according to the present invention will be described with reference to fig. 1. A body of video and audio information whose content the user can recognize, understand, and enjoy is treated as one title. In terms of a movie, a title corresponds at most to the information of the complete movie and at least to the information of one of its scenes.
A video title set VTS is constituted by a bit stream containing information of a predetermined number of titles. For simplicity, the video title set is hereinafter referred to as VTS. The VTS includes playback data such as images and sounds representing the contents of the titles and control data for controlling the playback data.
A video zone VZ, the video data unit in the authoring system, is formed from a prescribed number of VTSs. For simplicity, the video zone is hereinafter referred to as VZ. In a VZ, K+1 video title sets VTS#0 to VTS#K (K is an integer of 0 or more) are arranged sequentially in a line. One of them, preferably the leading VTS#0, serves as a video management file indicating the content information of the titles contained in each VTS. A prescribed number of VZs constituted in this way form the multimedia bitstream MBS, the largest management unit of the multimedia data bitstream in the authoring system.
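A minimal sketch of this containment hierarchy follows; the class and attribute names are assumed for illustration and are not defined by the specification text:

```python
# Sketch of the MBS > VZ > VTS > title hierarchy described above.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Title:
    name: str                      # e.g. a whole movie or a single scene

@dataclass
class VTS:                         # video title set
    index: int
    titles: List[Title] = field(default_factory=list)

@dataclass
class VideoZone:                   # VZ: holds VTS#0..VTS#K
    title_sets: List[VTS] = field(default_factory=list)

    @property
    def management_vts(self) -> VTS:
        return self.title_sets[0]  # preferably the head, VTS#0

@dataclass
class MultimediaBitstream:         # MBS: the largest management unit
    zones: List[VideoZone] = field(default_factory=list)

vz = VideoZone([VTS(0), VTS(1, [Title("movie")])])
mbs = MultimediaBitstream([vz])
print(mbs.zones[0].management_vts.index)  # 0
```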
Authoring encoder EC
Fig. 2 shows an embodiment of the authoring encoder EC of the present invention, which generates a new multimedia bitstream MBS by encoding the original multimedia bitstream in accordance with an arbitrary scenario suited to the user's requirements. The original multimedia bitstream is composed of a video stream St1 carrying picture information, a sub-picture stream St3 carrying auxiliary picture information such as captions, and an audio stream St5 carrying sound information. The video stream and the audio stream contain the picture and sound information obtained from the subject over a given time. The sub-picture stream, in contrast, contains picture information for one plane, that is, momentary picture information. If necessary, one plane's worth of sub-picture can be captured in a video memory or the like, and the captured sub-picture can be displayed continuously.
These multimedia source data St1, St3, and St5 are supplied as live video and audio signals from a television camera or the like in the case of live broadcasting, or as non-live video and audio signals played back from a recording medium such as a video tape. In fig. 2, three kinds of source streams are shown for simplicity, but needless to say, more kinds of source data representing different title contents may be input. Multimedia source data carrying the audio, video, and auxiliary picture information of a plurality of titles in this way is called a multi-title stream.
The authoring encoder EC is composed of an editing information generation unit 100, an encoding system control unit 200, a video encoder 300, a video stream buffer 400, a sub-picture encoder 500, a sub-picture stream buffer 600, an audio encoder 700, an audio stream buffer 800, a system encoder 900, a video zone formatter 1300, a recording unit 1200, and a recording medium M.
In fig. 2, a bit stream encoded by the encoder of the present invention is recorded on an optical disc medium as an example.
The authoring encoder EC includes an editing information generation unit 100 that can output, as scenario data, instructions to edit the corresponding portions of the multimedia bitstream MBS in accordance with the user's requests regarding the video, sub-pictures, and audio of the original multimedia title. The editing information generation unit 100 is preferably composed of a display unit, a speaker unit, a keyboard, a CPU, a source stream buffer, and the like. The editing information generation unit 100 is connected to the external multimedia stream source described above and receives the multimedia source data St1, St3, and St5.
The user can confirm the content of the title by playing back the video and audio of the multimedia source data on the display unit and speaker. While confirming the playback content, the user operates the keyboard to input editing instructions matching the desired scenario. The editing instructions specify, for each of the source data containing a plurality of title contents, which contents to select at each prescribed time, and how the selected contents are to be connected and played back.
From the keyboard input, the CPU generates scenario data St7 encoding information such as the positions and lengths of the editing target portions of the respective streams St1, St3, and St5 of the multimedia source data and the temporal relationships between those portions.
The source stream buffer has a prescribed capacity, and outputs the multimedia source data St1, St3, and St5 after delaying them by a prescribed time Td.
This is because, when encoding is performed concurrently with the user's creation of the scenario data St7, that is, when editing and encoding are processed sequentially, a certain time Td is needed to determine the content of the editing processing of the multimedia source data from the scenario data St7, as described later; the multimedia source data must therefore be delayed by the time Td so that the actual editing and encoding remain synchronized.
When editing is processed sequentially in this way, the delay time Td is only as long as needed for synchronization adjustment among the elements of the system, so the source stream buffer is usually composed of a high-speed recording medium such as semiconductor memory.
However, in so-called batch editing, where the multimedia source data is encoded in one batch after the scenario data St7 has been completed for the entire title, the delay time Td must be as long as one title or more. In that case, the source stream buffer can be composed of a low-speed, large-capacity recording medium such as video tape, magnetic disk, or optical disk. That is, the source stream buffer may be implemented with a recording medium appropriate to the delay time Td and the manufacturing cost.
The encoding system control unit 200 is connected to the editing information generation unit 100 and receives the scenario data St7 from it. From the information on the temporal positions and lengths of the editing target portions contained in the scenario data St7, the encoding system control unit 200 generates the encoding parameter data and encoding start/stop timing signals St9, St11, and St13 for encoding the editing target portions of the multimedia source data. As described above, the multimedia source data St1, St3, and St5 are output after being delayed by the time Td by the source stream buffer, and are therefore synchronized with the timing signals St9, St11, and St13.
That is, the signal St9 is a video encode signal indicating the timing for encoding the video stream St1, in order to extract the encoding target portion from St1 and generate a video encode unit. Likewise, the signal St11 is a sub-picture encode signal indicating the timing for encoding the sub-picture stream St3 to generate a sub-picture encode unit, and the signal St13 is an audio encode signal indicating the timing for encoding the audio stream St5 to generate an audio encode unit.
The encoding system control unit 200 further generates the timing signals St21, St23, and St25 for arranging the encoded multimedia streams in a predetermined mutual relationship, based on information such as the temporal relationships among the encoding target portions of the streams St1, St3, and St5 contained in the scenario data St7.
For each title edit unit (VOB) of one video zone VZ, the encoding system control unit 200 generates the playback time information IT indicating its playback time, and the stream encode data St33 specifying the encoding parameters for the system encoding that multiplexes the video, audio, and sub-picture encoded streams.
The encoding system control unit 200 also generates the arrangement instruction signal St39, which defines the formatting parameters for connecting the title edit units (VOBs) of the respective streams in a predetermined mutual time relationship to form the title edit units (VOBs) of each title of the multimedia bitstream MBS, or for interleaving those title edit units (VOBs), so that they can be formatted as the multimedia bitstream MBS.
The video encoder 300 is connected to the source stream buffer of the editing information generation unit 100 and to the encoding system control unit 200, and receives the video stream St1 together with the encoding parameter data and encoding start/stop timing signal St9 for video encoding, covering such items as the encoding start/stop timing, the bit rate, the encoding conditions at the start and end of encoding, and the type of the material (NTSC signal, PAL signal, or telecine-converted material). Based on the video encode signal St9, the video encoder 300 encodes the prescribed portion of the video stream St1 and generates the video encoded stream St15.
Similarly, the sub-picture encoder 500 is connected to the source stream buffer of the editing information generation unit 100 and to the encoding system control unit 200, and receives the sub-picture stream St3 and the sub-picture encode signal St11, respectively. Based on the parameter signal St11 for sub-picture stream encoding, the sub-picture encoder 500 encodes the prescribed portion of the sub-picture stream St3 and generates the sub-picture encoded stream St17.
The audio encoder 700 is connected to the source stream buffer of the editing information generation unit 100 and to the encoding system control unit 200, and receives the audio stream St5 and the audio encode signal St13, respectively. Based on the parameter data for audio encoding and the encoding start/stop timing signal St13, the audio encoder 700 encodes the prescribed portion of the audio stream St5 and generates the audio encoded stream St19.
The video stream buffer 400 is connected to the video encoder 300, and stores the video encoding stream St15 output from the video encoder 300. The video stream buffer 400 is also connected to the encoding system control section 200, and outputs the stored video encoded stream St15 as the timed video encoded stream St27 in response to the input of the timing signal St 21.
Likewise, the sub-picture stream buffer 600 is connected to the sub-picture encoder 500 and stores the sub-picture encoded stream St17 output from the sub-picture encoder 500. The sub-picture stream buffer 600 is also connected to the encoding system control unit 200, and outputs the stored sub-picture encoded stream St17 as the timed sub-picture encoded stream St29 in response to the input of the timing signal St23.
The audio stream buffer 800 is connected to the audio encoder 700 and stores the audio encoded stream St19 output from the audio encoder 700. The audio stream buffer 800 is also connected to the encoding system control unit 200, and outputs the stored audio encoded stream St19 as the timed audio encoded stream St31 in response to the input of the timing signal St25.
The system encoder 900 is connected to the video stream buffer 400, the sub-picture stream buffer 600 and the audio buffer 800, and inputs the timed video encoding stream St27, the timed sub-picture encoding stream St29, and the timed audio encoding stream St 31. The system encoder 900 is also connected to the encoding system control section 200, and inputs the stream encoded data St 33.
Based on the encoding parameter data for system encoding and the encoding start/stop timing signal St33, the system encoder 900 multiplexes the timed streams St27, St29, and St31 and generates a title edit unit (VOB) St35.
The video zone formatter 1300 is connected to the system encoder 900 and receives the title edit units St35. The video zone formatter 1300 is also connected to the encoding system control unit 200 and receives the signal St39 carrying the formatting parameters and the formatting start/stop timing for formatting the multimedia bitstream MBS. In accordance with the signal St39, the video zone formatter 1300 rearranges the title edit units St35 of one video zone (VZ) into the order required by the user's scenario, and generates the edited multimedia bitstream St43.
The multimedia bitstream St43, edited to the content of the scenario requested by the user, is sent to the recording unit 1200. The recording unit 1200 processes the edited multimedia bitstream St43 into data of a format suited to the recording medium M and records it on the recording medium M. In this case, the volume file structure VFS, generated by the video zone formatter 1300 and indicating the physical addresses on the medium, is included in the multimedia bitstream MBS in advance.
The encoded multimedia bitstream St35 may be directly output to a decoder as will be described later, and the edited title content may be played back. In this case, of course, the volume file structure VFS is not included in the multimedia bitstream MBS.
Authoring decoder DC
An embodiment of the authoring decoder DC of the present invention, which decodes the edited multimedia bitstream MBS and expands the content of each title in accordance with the scenario requested by the user, will be described with reference to fig. 3. In the present embodiment, the multimedia bitstream St45 encoded by the authoring encoder EC of fig. 2 is recorded on the recording medium M.
The authoring decoder DC includes a multimedia bitstream playback unit 2000, a scenario selector 2100, a decoding system controller 2300, a stream buffer 2400, a system decoder 2500, a video buffer 2600, a sub-picture buffer 2700, an audio buffer 2800, a synchronization controller 2900, a video decoder 3800, a sub-picture decoder 3100, an audio decoder 3200, a synthesizer 3500, a video data output terminal 3600, and an audio data output terminal 3700.
The multimedia bitstream playback unit 2000 comprises a recording medium drive unit 2004 that drives the recording medium M, a read head unit 2006 that reads the information recorded on the recording medium M and generates the binary read signal St57, a signal processing unit 2008 that performs various processing on the read signal St57 to generate the playback bitstream St61, and a mechanism control unit 2002. The mechanism control unit 2002 is connected to the decoding system control unit 2300, receives the multimedia bitstream playback instruction signal St53, and generates the playback control signals St55 and St59 that control the recording medium drive unit (motor) 2004 and the signal processing unit 2008, respectively.
The authoring decoder DC includes a scenario selection unit 2100 that can output, as scenario data, instructions to the authoring decoder DC corresponding to the user's selection of a scenario, so that the desired portions of the video, sub-pictures, and audio of the multimedia title edited by the authoring encoder EC are played back.
The scenario selection unit 2100 is preferably composed of a keyboard, a CPU, and the like. The user operates the keyboard to input the desired scenario from among the scenarios input at the authoring encoder EC. From the keyboard input, the CPU generates the scenario selection data St51 indicating the selected scenario. The scenario selection unit 2100 is connected to the decoding system control unit 2300 by, for example, an infrared communication device. From St51, the decoding system control unit 2300 generates the playback instruction signal St53 that controls the operation of the multimedia bitstream playback unit 2000.
The stream buffer 2400 has a predetermined buffer capacity, temporarily stores the playback signal bit stream St61 input from the multimedia bit stream playback unit 2000, extracts address information and synchronization initial value data of each stream, and generates stream control data St 63. The stream buffer 2400 is connected to the decoding system control unit 2300, and supplies the generated stream control data St63 to the decoding system control unit 2300.
The synchronization control unit 2900 is connected to the decoding system control unit 2300, receives the synchronization initial value data (SCR) contained in the synchronization control data St81, sets the internal system clock (STC), and supplies the reset system clock St79 to the decoding system control unit 2300. Based on the system clock St79, the decoding system control unit 2300 generates the stream readout signal St65 at prescribed intervals and inputs it to the stream buffer 2400.
The stream buffer 2400 outputs a playback bit stream St61 at prescribed time intervals in accordance with the readout signal St 65.
Based on the scenario selection data St51, the decoding system control unit 2300 generates the decode stream instruction signal St69, indicating the IDs of the video, sub-picture, and audio streams corresponding to the selected scenario, and outputs it to the system decoder 2500.
The system decoder 2500 outputs the data streams of video, sub-picture, and audio input from the stream buffer 2400 as the video encoding stream St71 to the video buffer 2600, as the sub-picture encoding stream St73 to the sub-picture buffer 2700, and as the audio encoding stream St75 to the audio buffer 2800, in accordance with the instruction of the decoding instruction signal St 69.
The system decoder 2500 detects the playback start time (PTS) and decode start time (DTS) of each minimum control unit of each stream St67, and generates the time information signal St77. The time information signal St77 is input to the synchronization control unit 2900 as the synchronization control data St81 via the decoding system control unit 2300.
In response to the synchronization control data St81, the synchronization control unit 2900 determines a decoding start time for each stream so that each stream is decoded in a predetermined order. The synchronization controller 2900 generates a video stream decoding start signal St89 from the decoding time, and inputs the video stream decoding start signal St89 to the video decoder 3800. Similarly, the synchronization controller 2900 generates a sub-image decoding start signal St91 and an audio decoding start signal St93, and inputs them to the sub-image decoder 3100 and the audio decoder 3200, respectively.
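A minimal sketch of this decode-start scheduling follows, assuming a simplified 90 kHz STC and per-stream DTS values; the names and units are illustrative assumptions, not the patent's definitions:

```python
# Sketch of SCR/STC-based scheduling: the system clock STC is set from the
# SCR carried in the stream, and each elementary decoder is started when
# the STC reaches the DTS of its next access unit.

class SyncController:
    def __init__(self, scr_initial: int):
        self.stc = scr_initial                 # system time clock, 90 kHz ticks

    def tick(self, elapsed_ticks: int):
        self.stc += elapsed_ticks

    def should_start_decode(self, dts: int) -> bool:
        return self.stc >= dts

sync = SyncController(scr_initial=0)
pending = {"video": 3600, "sub_picture": 5400, "audio": 3600}   # DTS values

sync.tick(3600)                                # 40 ms of playback elapse
for stream, dts in pending.items():
    if sync.should_start_decode(dts):
        print(f"start {stream} decode")        # video and audio start; sub-picture waits
```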
The video decoder 3800 generates a video output request signal St84 from the video stream decoding start signal St89, and outputs the video output request signal to the video buffer 2600. The video buffer 2600 receives the video output request signal St84, and outputs a video stream St83 to the video decoder 3800. The video decoder 3800 detects the playback time information included in the video stream St83, and immediately after receiving an input of the video stream St83 having a length corresponding to the playback time, invalidates the video output request signal St 84. In this way, the video stream corresponding to the predetermined playback time is decoded by the video decoder 3800, and the played back video signal St104 is output to the combining unit 3500.
Likewise, the sub-picture decoder 3100 generates the sub-picture output request signal St86 from the sub-picture decoding start signal St91 and supplies it to the sub-picture buffer 2700. The sub-picture buffer 2700 receives the sub-picture output request signal St86 and outputs the sub-picture stream St85 to the sub-picture decoder 3100. Based on the playback time information contained in the sub-picture stream St85, the sub-picture decoder 3100 decodes the sub-picture stream St85 for the length corresponding to the prescribed playback time, reproduces the sub-picture signal St99, and outputs it to the synthesizer 3500.
The synthesizer 3500 superimposes the video signal St104 and the sub-picture signal St99 to generate the multi-picture video signal St105, and outputs it to the video output terminal 3600.
The audio decoder 3200 generates an audio output request signal St88 from the audio decoding start signal St93, and supplies the audio output request signal St88 to the audio buffer 2800. The audio buffer 2800 receives the audio output request signal St88, and outputs an audio stream St87 to the audio decoder 3200. The audio decoder 3200 decodes the audio stream St87 having a length corresponding to a predetermined playback time based on the playback time information included in the audio stream St87, and outputs the decoded audio stream to the audio output terminal 3700.
In this way, the multimedia bitstream MBS can be played back in real time in response to the user's scenario selection. That is, each time the user selects a different scenario, the authoring decoder DC plays back the multimedia bitstream MBS corresponding to the selected scenario, reproducing the title content the user desires.
As described above, in the authoring system of the present invention, multimedia source data is encoded in real time or in batch so that the substreams representing the minimum edit units of each content, which may branch in a plurality of ways, are arranged in a predetermined mutual time relationship with respect to the basic title content; a multimedia bitstream conforming to a plurality of arbitrary scenarios can thus be generated.
The multimedia bitstream encoded in this way can be played back according to any of the plurality of scenarios. Even if a scenario different from the one previously chosen is selected (switched to) during playback, the multimedia bitstream dynamically corresponding to the newly selected scenario can be played back smoothly. Likewise, while title content is played back according to an arbitrary scenario, any of a plurality of scenes can be selected and played back smoothly.
Thus, the authoring system of the present invention can not only edit and play back the multimedia bitstream MBS in real time but also repeatedly play back a multimedia bitstream MBS that has been edited once. Details of the authoring system are disclosed in a Japanese patent application filed on September 27, 1996 by the applicant of the present application.
DVD
Fig. 4 shows an example of a DVD with a single recording surface. The DVD recording medium RC1 in this example comprises an information recording surface RS1, written and read by irradiation with a laser beam LS, and a protective layer PL1 covering that recording surface. A reinforcing layer BL1 is provided on the back of the recording surface RS1. The surface on the protective layer PL1 side is thus the front surface SA, and the surface on the reinforcing layer BL1 side is the back surface SB. A DVD medium with a single recording layer RS1 on one side, like the medium RC1, is called a single-sided single-layer disc.
Fig. 5 shows details of part C1 in fig. 4. The recording surface RS1 is formed by the information layer 4109, to which a reflective film such as a metal thin film is applied. On top of it, the protective layer PL1 is formed by the 1st transparent substrate 4108 of prescribed thickness T1, and the reinforcing layer BL1 is formed by the 2nd transparent substrate 4111 of prescribed thickness T2. The 1st and 2nd transparent substrates 4108 and 4111 are bonded to each other by the adhesive layer 4110 lying between them.
A printing layer 4112 for label printing is further provided on the upper surface of the 2nd transparent substrate 4111 as needed. The printing layer 4112 need not cover the whole area of the substrate 4111 of the reinforcing layer BL1; it may be printed only where characters and pictures are to be displayed, leaving the transparent substrate 4111 exposed elsewhere. In that case, light reflected by the metal thin film 4109 forming the recording surface RS1 is seen directly from the back surface SB side in the unprinted portions; for example, when the metal thin film is aluminum, the background looks silver-white, with the printed characters and figures standing out on it. The printing layer 4112 thus need not be provided over the entire reinforcing layer BL1, and may be provided partially depending on the application.
Fig. 6 shows details of part C2 in fig. 5. On the surface where the 1st transparent substrate 4108 contacts the information layer 4109, concave-convex pits are formed by molding technology, and information is recorded by varying the lengths and intervals of these pits; the information is extracted by a light beam incident from the surface SA. That is, the pit shapes of the 1st transparent substrate 4108 are transferred to the information layer 4109. The pit lengths and intervals are smaller than those of a CD, and the pitch of the information tracks formed by the rows of pits is narrower. As a result, the areal recording density is greatly improved.
The surface of the 1st transparent substrate 4108 on which pits are not formed is flat. The 2nd transparent substrate 4111 is a reinforcing transparent substrate made of the same material as the 1st transparent substrate 4108, with both faces flat. The prescribed thicknesses T1 and T2 are equal, a desirable value being, for example, 0.6 mm, but they are not limited to this.
As in the case of the CD, information is extracted as changes in the reflectance of the light spot under irradiation by the light beam LS. In the DVD system, the numerical aperture NA of the objective lens can be made large and the wavelength λ of the light beam small, so the diameter of the light spot LS can be narrowed to about 1/1.6 of that in the CD case. This means the DVD system has about 1.6 times the resolution of the CD system.
For reading data from a DVD, a short-wavelength (650 nm) red semiconductor laser and an optical system whose objective lens has a numerical aperture NA as large as 0.6 are used. Together with thinning the transparent substrate to 0.6 mm, this enables more than 5 Gbytes of information to be recorded on one side of an optical disc 120 mm in diameter.
As described above, even the single-sided single-layer optical disc RC1 with the single recording surface RS1 can record roughly ten times as much information as a CD, so even moving pictures, with their very large data sizes, can be handled without sacrificing picture quality. Whereas the existing VCD system can record and play back only 74 minutes of video even at the cost of moving picture quality, the DVD can record and play back more than two hours of high-quality video. The DVD is thus well suited as a recording medium for moving pictures.
Figs. 7 and 8 show examples of DVD recording media having a plurality of recording surfaces RS. The DVD recording medium RC2 of fig. 7 has a 1st recording surface RS1 and a translucent 2nd recording surface RS2 arranged in two layers on the same side, the front side SA. By using different light beams LS1 and LS2 for the 1st recording surface RS1 and the 2nd recording surface RS2, recording and playback can be performed on the two surfaces simultaneously. It is also possible to record and play back both recording surfaces with one of the light beams LS1 or LS2. A DVD recording medium so constructed is called a single-sided dual-layer disc. Although two recording layers RS1 and RS2 are provided in this example, a DVD recording medium with more than two recording layers RS can be provided as needed. Such a medium is called a single-sided multi-layer disc.
The DVD recording medium RC3 of fig. 8, by contrast, has the 1st recording surface RS1 on the front side SA and the 2nd recording surface RS2 on the back side SB. Although these examples show two recording surfaces on one DVD, a multi-layer disc with more than two recording surfaces may of course be used. As in fig. 7, the light beams LS1 and LS2 may be provided separately, or one light beam may be used to record and play back both recording surfaces RS1 and RS2. A DVD recording medium so constructed is called a double-sided single-layer disc. Of course, a DVD recording medium with two or more recording layers RS on each side may also be used; such a disc is called a double-sided multi-layer disc.
Figs. 9 and 10 are plan views of the recording surface RS of a DVD recording medium RC as seen from the side irradiated by the light beam LS. On a DVD, a spiral track TR for recording information runs continuously from the inner circumference to the outer circumference. The information recording track TR is divided into a plurality of sectors, each holding a prescribed amount of data. In fig. 9, for convenience, each turn of the track is shown divided into three or more sectors.
The track TR is wound in a clockwise direction DrA from an end point IA of the inner circumference of the optical disk RCA to an end point OA of the outer circumference as shown in fig. 9. Such an optical disc RCA is called a clockwise rotation optical disc, and its track is called a clockwise rotation track TRA. Depending on the application, the track TRB is wound in the clockwise direction DrB from the end OB of the outer circumference of the optical disc RCB to the end IB of the inner circumference, as shown in fig. 10. The direction DrB is counterclockwise if viewed from the inner circumference to the outer circumference, and thus, for the sake of distinction from the optical disk RCA of fig. 9, is referred to as counterclockwise rotation of the optical disk RCB and counterclockwise rotation of the track. The track rotation directions DrA and DrB are the directions of movement of the light beam for scanning the track for recording and reproduction, i.e., the track paths. The reverse direction RdA of the track winding direction DrA is a direction in which the optical disk RCA is rotated. The opposite direction RdB of the track winding direction DrB is the direction in which the optical disc RCB rotates.
Fig. 11 schematically shows a development of the optical disc RC2o, an example of the single-sided dual-layer optical disc RC2 shown in fig. 7. On the lower 1st recording surface RS1, a clockwise track TRA is provided in the clockwise direction DrA as in fig. 9, and on the upper 2nd recording surface RS2, a counterclockwise track TRB is provided in the counterclockwise direction DrB as in fig. 10. The outer circumferential ends OB and OA of the upper and lower tracks lie on the same line parallel to the center line of the optical disc RC2o. The winding directions DrA and DrB of the tracks TR are also the directions in which data is read from and written to the disc. In other words, the winding directions of the upper and lower tracks are opposite; the track paths DrA and DrB of the upper and lower recording layers face each other.
The opposite track path type single-sided dual-layer disc RC2o rotates in the RdA direction corresponding to the 1st recording surface RS1. The light beam LS follows the track of the 1st recording surface RS1 along the track path DrA, and on reaching its outer circumferential end OA, the focus of the beam is adjusted onto the outer circumferential end OB of the 2nd recording surface RS2, so that the light beam LS continues to follow the track of the 2nd recording surface RS2 without interruption. In this way, the physical distance between the tracks TRA and TRB of the 1st and 2nd recording surfaces RS1 and RS2 is eliminated in an instant by adjusting the focus of the light beam LS. As a result, on the opposite track path type single-sided dual-layer disc RC2o, the tracks of the upper and lower layers are easily treated as one continuous track TR. The multimedia bitstream MBS, the largest management unit of multimedia data in the authoring system described with fig. 1, can thus be recorded continuously across the two recording layers RS1 and RS2 of one medium RC2o.
When the winding directions of the tracks on the recording surfaces RS1 and RS2 are reversed with respect to this example, that is, when the counterclockwise track TRB is provided on the 1st recording surface RS1 and the clockwise track TRA on the 2nd recording surface, the two recording surfaces are used as one recording surface with a single continuous track TR just as above, except that the rotation direction of the disc becomes RdB. For simplicity, an illustration of such an example is omitted. Constructed in this way, a DVD can record the multimedia bitstream MBS of a long title on one opposite track path type single-sided dual-layer disc RC2o. Such a DVD medium is called a single-sided dual-layer opposite track path type disc.
Fig. 12 is a development view schematically showing another example, RC2p, of the single-sided dual-layer optical disc RC2 shown in fig. 7. The 1st and 2nd recording surfaces RS1 and RS2 both carry clockwise tracks TRA as in fig. 9. In this case, the single-sided dual-layer disc RC2p rotates in the RdA direction, and the moving direction of the light beam is the same as the track winding direction; that is, the track paths of the upper and lower recording layers are parallel. Here too, the outer circumferential ends OA of the upper and lower tracks preferably lie on the same line parallel to the center line of the disc RC2p. Therefore, by adjusting the focus of the light beam LS at the outer circumferential end OA, the accessed address can be switched in an instant from the outer circumferential end OA of the track TRA of the 1st recording surface RS1 to the outer circumferential end OA of the track TRA of the 2nd recording surface RS2, as with the medium RC2o described in fig. 11.
However, for the light beam LS to access the track TRA of the 2nd recording surface RS2 continuously in time, the medium RC2p would have to be rotated in the opposite direction (the anti-RdA direction). Since changing the rotation direction of the medium according to the position of the light beam is inefficient, in practice, after the light beam LS reaches the outer circumferential end OA of the track of the 1st recording surface RS1, it is moved to the inner circumferential end IA of the track of the 2nd recording surface RS2, as shown by the arrow in the figure, so that the disc is used as a logically continuous one. If necessary, the tracks of the upper and lower recording surfaces may instead be handled as separate tracks rather than one continuous track, with a multimedia bitstream MBS recorded on each track. Such a DVD medium is called a single-sided dual-layer parallel track path type disc.
If the winding directions of the tracks on both recording surfaces RS1 and RS2 are reversed with respect to this example, that is, if counterclockwise tracks TRB are provided, everything is the same except that the rotation direction of the disc becomes RdB. This single-sided dual-layer parallel track path type disc is suited to applications such as encyclopedias, in which many titles are recorded on one medium RC2p and are accessed randomly and frequently.
Fig. 13 is a development view of an example, RC3s, of the double-sided single-layer DVD medium RC3 of fig. 8, which has one recording surface on each side (RS1 and RS2). One recording surface RS1 carries a clockwise track TRA, and the other recording surface RS2 a counterclockwise track TRB. Here too, the outer circumferential ends OA and OB of the tracks of the two recording surfaces preferably lie on the same line parallel to the center line of the disc RC3s. The tracks of the two recording surfaces RS1 and RS2 are wound in opposite directions, but their track paths are in plane symmetry with each other. Such a disc RC3s is called a double-sided single-layer symmetrical track path type disc. This disc RC3s rotates in the RdA direction corresponding to the 1st recording surface RS1. As a result, the track path of the 2nd recording surface RS2 on the opposite side runs opposite to its track winding direction DrB, that is, in the DrA direction. In this case, whether continuously or discontinuously, it is essentially not practical to access the two recording surfaces RS1 and RS2 with the same light beam LS. A multimedia bitstream is therefore recorded on each of the front and back recording surfaces separately.
FIG. 14 is a development view of yet another example, RC3a, of the double-sided single-layer DVD medium RC3 shown in fig. 8. Both recording surfaces RS1 and RS2 carry clockwise tracks TRA as in fig. 9. Here too, the outer circumferential ends OA of the tracks of the two recording surfaces RS1 and RS2 preferably lie on the same straight line parallel to the center line of the disc RC3a. However, in this example, unlike the double-sided single-layer symmetrical track path type disc RC3s described above, the tracks of the two recording surfaces RS1 and RS2 are in an asymmetrical relationship. Such a disc RC3a is called a double-sided single-layer asymmetrical track path type disc. This disc RC3a rotates in the RdA direction corresponding to the 1st recording surface RS1.
As a result, the track path of the 2nd recording surface RS2 on the opposite side runs opposite to its track winding direction DrA, that is, in the DrB direction. In this case, if a single light beam LS is moved continuously from the inner circumference to the outer circumference of the 1st recording surface RS1 and then from the outer circumference to the inner circumference of the 2nd recording surface RS2, recording and playback on both surfaces can be performed without preparing a separate beam source for each recording surface and without turning the medium RC3a over. In addition, on this double-sided single-layer asymmetrical track path type disc, the tracks of the two recording surfaces RS1 and RS2 are handled in the same way. Therefore, even a recording/playback apparatus that turns the medium RC3a over can perform recording and playback on both sides with a single light beam LS, without preparing separate beams for each recording surface, and can accordingly be manufactured economically. The same applies when tracks TRB are provided on both recording surfaces RS1 and RS2 instead of tracks TRA.
As described above, the DVD system, whose recording capacity is readily multiplied by layering the recording surfaces, will show its real value in multimedia applications in which the moving picture data, audio data, graphics data, and the like recorded on one optical disc are played back through interactive operation by the user. In particular, it becomes possible for software providers to record one movie, while maintaining its quality, on a single medium in forms serving audiences of different languages, regions, and generations.
Parental lock
Conventionally, to serve the many languages used around the world and the parental lock (rating) systems instituted in European and American countries, movie title software providers have had to produce, supply, and manage multiple versions of the same title, at considerable cost in effort. High picture quality matters here, but so does the ability to play back the content exactly as intended. The DVD is a recording medium that comes a step closer to fulfilling these desires.
Multi-angle
As a typical interactive operation, a "multi-angle" function is required that, during playback of one scene, switches to the same scene viewed from a different angle. For a baseball scene, for example, among several angles, such as an angle centered on the pitcher, catcher, and batter seen from the backstop side, an angle centered on the infield seen from the backstop side, and an angle centered on the pitcher, catcher, and batter seen from center field, the user should be able to select a preferred angle freely, as if switching cameras.
As a system that can record signal data such as moving pictures, audio, and graphics to meet such demands, the DVD uses MPEG, as does the VCD. Owing to differences in storage capacity, transfer rate, and the signal processing performance of the playback apparatus, however, the VCD uses MPEG1 while the DVD uses MPEG2, whose compression methods and data formats differ somewhat. Since the contents of MPEG1 and MPEG2 and their differences have no direct bearing on the substance of the present invention, their description is omitted (see, for example, the MPEG standards ISO11172 and ISO13818).
The data structure of the DVD system according to the present invention will be described below with reference to fig. 16, 17, 18, and 20.
Multi-scene
If a separate title were prepared for every desired variation of content to satisfy the requirements of parental lock playback and multi-angle playback described above, titles of largely identical content, differing only in small portions of scene data, would have to be prepared in the required number and recorded on the medium. This amounts to recording the same data repeatedly over most of the recording medium, so the utilization efficiency of the medium's storage capacity drops markedly. Moreover, even with a large-capacity medium such as a DVD, it would be impossible to record titles meeting all requirements. Although such a problem could in principle be solved by enlarging the capacity of the recording medium, that is highly undesirable from the standpoint of effective use of system resources.
In the DVD system, titles with a variety of variations are constructed from the minimum necessary data using the multi-scene control outlined below, enabling efficient use of system resources such as the recording medium. That is, titles with various variations are composed of basic scene sections formed by data common to the titles and multi-scene sections formed by groups of different scenes suited to the various requirements. Provision is made in advance so that during playback the user can freely select a specific scene in each multi-scene section at any time. Multi-scene control, including parental-lock playback and multi-view playback, is described below with reference to fig. 21.
Data structure of DVD system
Fig. 22 shows the data structure of authored data in the DVD system according to the present invention. For recording a multimedia bit stream MBS, the DVD system has a recording area roughly divided into three areas: a lead-in area LI, a volume area VS, and a lead-out area LO.
The lead-in area LI is located at the innermost circumference of the optical disc, for example at the inner circumferential ends IA and IB of the tracks of the discs illustrated in figs. 9 and 10. In the lead-in area LI, data for stabilizing the operation of the playback apparatus at the start of reading, and the like, are recorded.
The lead-out area LO is located at the outermost circumference of the optical disc, that is, at the outer circumferential ends OA and OB of the tracks illustrated in figs. 9 and 10. In the lead-out area LO, data indicating the end of the volume area VS and the like are recorded.
The volume area VS is located between the lead-in area LI and the lead-out area LO, and records n+1 (n is zero or a positive integer) logical sectors LS of 2048 bytes each as a one-dimensional array. Each logical sector LS is distinguished by a sector number (#0, #1, #2, ..., #n). The volume area VS is divided into a volume/file management area VFS formed of m+1 logical sectors LS#0 to LS#m (m is zero or a positive integer smaller than n) and a file data area FDS formed of n-m logical sectors LS#m+1 to LS#n. The file data area FDS corresponds to the multimedia bit stream MBS shown in fig. 1.
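Purely for illustration, the following Python sketch shows how a playback device might map the logical sector numbers of the volume area VS to byte offsets and classify them into the VFS and FDS described above. The function names and the assumption of linear sector addressing are ours, not taken from this document.

LOGICAL_SECTOR_SIZE = 2048  # bytes per logical sector LS

def sector_to_byte_offset(sector_number: int, volume_start: int) -> int:
    """Byte offset of logical sector LS#sector_number from the disc image start."""
    return volume_start + sector_number * LOGICAL_SECTOR_SIZE

def classify_sector(sector_number: int, m: int, n: int) -> str:
    """LS#0..LS#m form the VFS; LS#m+1..LS#n form the FDS."""
    if not 0 <= sector_number <= n:
        raise ValueError("sector outside volume area VS")
    return "VFS" if sector_number <= m else "FDS"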
The volume/file management area VFS is a file system for managing the data of the volume area VS as files, and is formed of logical sectors LS#0 to LS#m, where m (a natural number smaller than n) is the number of sectors needed to manage the data of the entire disc. In the volume/file management area, information on the files in the file data area FDS is recorded in accordance with standards such as ISO9660 and ISO13346.
The file data area FDS is formed of n-m logical sectors LS#m+1 to LS#n, and includes a video management file VMG, whose size is an integral multiple of the logical sector size (2048 x I, I a predetermined integer), and k video title sets VTS#1 to VTS#k (k is a natural number smaller than 100).
The video management file VMG holds title management information for the entire optical disc and includes information for the volume menu, a menu used for setting and changing playback control of the entire volume. A video title set VTS#k is also simply called a video file, and represents a title composed of data such as moving pictures, audio, and still pictures.
Fig. 16 shows the content structure of the video title set VTS of fig. 22. A video title set is roughly divided into VTS information (VTSI), the management information of the entire title set, and the VOBS for VTS titles (VTSTT_VOBS), the system stream of the multimedia bit stream. The VTS information is described first, followed by the VOBS for VTS titles.
The VTS information mainly includes a VTSI management table (VTSI_MAT) and a VTS PGC information table (VTS_PGCIT).
The VTSI management table describes the internal structure of the video title set VTS, the number of selectable audio streams, the number of sub-pictures, and the storage addresses of the video title set VTS.
The VTS PGC information table records I pieces (I is a natural number) of PGC information, VTS_PGCI#1 to VTS_PGCI#I, describing the program chains (PGC) that control playback order. Each piece of PGC information VTS_PGCI#I is composed of j (j is a natural number) pieces of access unit playback information, C_PBI#1 to C_PBI#j. Each piece of access unit playback information C_PBI#j contains control information on the playback order and playback of an access unit.
The program chain PGC is a concept that describes the story of a title: a title is formed by describing the playback order of the access units (described below). When the VTS information is menu information, for example, it is stored in a buffer in the playback apparatus at the start of playback, and the playback apparatus refers to it when the "menu" key of the remote controller is pressed during playback, displaying for example the top menu #1. In the case of a hierarchical menu, for example, program chain VTS_PGCI#1 may be the main menu displayed when the "menu" key is pressed, #2 to #9 may be submenus corresponding to the numeric keys of the remote controller, and #10 and later may be submenus at still lower layers. Alternatively, the configuration may be such that #1 is the top menu displayed when the "menu" key is pressed, and #2 and later are guidance audio played back in response to the corresponding numeric keys.
The menu itself can be represented by a plurality of program chains specified by the table, and can be configured into any form of menu, such as a hierarchical menu or a menu containing guidance sounds.
For example, in the case of a movie, the playback apparatus refers to the PGC information stored in its buffer at the start of playback and plays back the system streams in the access unit playback order described in the PGC.
The access unit referred to here is all or part of a system stream and is used as an access point during playback. In a movie, for example, it can serve as a chapter dividing the title partway through.
Each entry of PGC information C_PBI#j includes access unit playback processing information and an access unit information table. The playback processing information is composed of information necessary for playing back the access unit, such as playback time and number of repetitions. C_PBI#j is composed of an access unit block pattern (CBM), an access unit block type (CBT), a seamless playback flag (SPF), an interleaved allocation flag (IAF), an STC reset flag (STCDF), an access unit playback time (C_PBTM), a seamless angle change flag (SACF), an access unit head VOBU start address (C_FVOBU_SA), and an access unit end VOBU start address (C_LVOBU_SA).
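As a hedged illustration only, the C_PBI#j fields just listed could be modeled as a plain data structure like the Python sketch below; the field widths and encodings are assumptions, and only the field names follow the text.

from dataclasses import dataclass

@dataclass
class CellPlaybackInfo:  # C_PBI#j, one entry of PGC information
    cbm: int          # access unit block pattern: first / middle / last of a block
    cbt: int          # access unit block type, e.g. a value meaning "angle"
    spf: bool         # seamless playback flag: True = connect seamlessly
    iaf: bool         # interleaved allocation flag: True = placed in interleaved area
    stcdf: bool       # STC reset flag: True = STC must be reset
    sacf: bool        # seamless angle change flag
    c_pbtm: int       # access unit playback time, in video frames
    c_fvobu_sa: int   # start address of first VOBU, sectors from VTSTT_VOBS
    c_lvobu_sa: int   # start address of last VOBU, sectors from VTSTT_VOBS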
Seamless playback means playing back media data such as video, audio, and sub-pictures in the DVD system without interruption of data or information. It is described in detail below with reference to figs. 23 and 24.
The access unit block pattern CBM indicates whether a plurality of access units constitute one functional block. The access unit playback information of the access units constituting such a block is arranged consecutively in the PGC information: the CBM in the first piece of playback information shows a value meaning "first access unit of the block", the CBM in the last shows a value meaning "last access unit of the block", and the CBM in each piece in between shows a value meaning "access unit within the block".
The access unit block type CBT indicates the kind of the access unit block shown by the CBM. For example, when a multi-view function is set, access unit information corresponding to playback at each angle is set as the function block described above, and as the type of the function, a value indicating "angle" is set in the CBT of the access unit playback information of each access unit.
The seamless playback flag SPF indicates whether the access unit is to be connected to and played back seamlessly with the previously played access unit or access unit block. When it is, a flag value of 1 is set in the SPF of that access unit's playback information; otherwise, a flag value of 0 is set.
The interleaved allocation flag IAF indicates whether the access unit is allocated in an interleaved area. If it is, a flag value of 1 is set in the IAF of that access unit; otherwise, a flag value of 0 is set.
The STC reset flag STCDF indicates whether the STC used for synchronization must be reset when the access unit is played back; a flag value of 1 is set if resetting is necessary, and 0 otherwise.
The seamless angle change flag SACF is set to 1 in an access unit that belongs to an angle section and is to be switched seamlessly; otherwise, a flag value of 0 is set.
The access unit playback time (C_PBTM) represents the playback time of the access unit with video frame accuracy.
C_FVOBU_SA indicates the start address of the first VOBU of the access unit, expressed as a distance, in number of sectors, from the first logical sector of the VOBS for VTS titles (VTSTT_VOBS). C_LVOBU_SA indicates the start address of the last VOBU of the access unit, likewise expressed as a distance in sectors from the first logical sector of VTSTT_VOBS.
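Since both addresses are relative sector counts, a player can derive absolute positions as in this minimal sketch; the function name and the assumption that the start sector of VTSTT_VOBS is known from the VTS information are ours.

def absolute_vobu_sector(vtstt_vobs_start: int, c_vobu_sa: int) -> int:
    """Absolute sector of a VOBU given C_FVOBU_SA or C_LVOBU_SA."""
    return vtstt_vobs_start + c_vobu_sa

# e.g. first and last VOBU of an access unit:
# absolute_vobu_sector(vobs_start, cell.c_fvobu_sa)
# absolute_vobu_sector(vobs_start, cell.c_lvobu_sa)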
The VOBS for VTS titles, that is, the multimedia system stream data VTSTT_VOBS, is described next. The system stream data VTSTT_VOBS consists of i (i is a natural number) system streams SS called video objects (VOBs). Each of the video objects VOB#1 to VOB#i is composed of at least video data, and may in some cases be composed of that video data interleaved with up to 8 streams of audio data and up to 32 streams of sub-picture data.
Each video object VOB is composed of q (q is a natural number) access units C#1 to C#q. Each access unit C is composed of r (r is a natural number) video object units VOBU#1 to VOBU#r. Each VOBU is composed of one or more GOPs, the video encoding refresh period, together with the audio data and sub-pictures whose times correspond to that period. At the head of each VOBU is a navigation pack NV carrying the management information of that VOBU. The structure of NV is described below with reference to fig. 19.
Fig. 17 shows the internal structure of the video zone VZ (fig. 22). In the figure, the video encoded stream St15 is a compressed one-dimensional video data string encoded by the video encoder 300. The audio encoded stream St19 is likewise a one-dimensional audio data string in which the stereo left and right channel data encoded by the audio encoder 700 are compressed and combined. The audio data may also be multi-channel data such as surround sound.
The system stream St35 has a structure in which packs, each with a byte count equal to the 2048-byte capacity of a logical sector LS#n, are arranged one-dimensionally. At the head of the system stream St35, that is, at the head of each VOBU, a stream management pack called the navigation pack NV is placed, recording management information such as the data arrangement within the system stream.
The video encoded stream St15 and the audio encoded stream St19 are each divided into packets whose byte counts correspond to the packs of the system stream. These packets are shown in the figure as V1, V2, V3, V4 and A1, A2, .... They are interleaved in an appropriate order, taking into account the processing time of the decoders that expand the video and audio data and the buffer capacities of those decoders, to form the packet sequence of the system stream in the figure. In this example they are arranged in the order V1, V2, A1, V3, V4, A2.
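The following Python sketch illustrates one plausible way such an ordering could arise: packs are multiplexed earliest-decode-deadline first. The timestamps and the scheduling rule are our assumptions for illustration, not the encoder's actual algorithm.

def interleave(streams):
    """streams maps a stream name ('V', 'A', ...) to a list of
    (decode_deadline_seconds, pack_bytes) tuples in stream order."""
    entries = []
    for name, packs in streams.items():
        for index, (deadline, _data) in enumerate(packs):
            entries.append((deadline, name, index))
    entries.sort()  # the pack needed earliest by its decoder is multiplexed first
    return [f"{name}{index + 1}" for _deadline, name, index in entries]

# interleave({"V": [(0.10, b""), (0.20, b""), (0.30, b""), (0.40, b"")],
#             "A": [(0.25, b""), (0.45, b"")]})
# -> ['V1', 'V2', 'A1', 'V3', 'V4', 'A2']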
Fig. 17 shows an example in which one set of moving picture data and one set of audio data are interleaved. In the DVD system, however, thanks to the greatly increased recording/playback capacity, high-speed recording/playback, and the improved performance of signal processing LSIs, one set of moving picture data can be recorded as a single MPEG system stream interleaved with multiple audio data and multiple sub-picture (graphics) data, and one of the audio data and one of the sub-picture data can be selected at playback time. Fig. 18 shows the structure of the system stream used in such a DVD system.
In fig. 18, as in fig. 17, the packetized video encoded stream St15 is denoted V1, V2, V3, V4, .... In this example, however, there is not just one audio encoded stream St19: three audio data strings St19A, St19B, and St19C are input as sources. For the sub-picture encoded stream St17, the sub-picture data sequence, two strings St17A and St17B are likewise input as sources. These six strings of compressed data in total are interleaved into one system stream St35.
Video data is encoded in the MPEG system, in which the GOP is the unit of compression; as a standard, one GOP is composed of 15 frames in the case of NTSC, but the number of frames is variable. Stream management packs carrying management data, such as information on the interrelation of the interleaved data, are interleaved at intervals based on the GOPs of the video data; if the number of frames constituting a GOP changes, the interval changes too. In the DVD, the interval is specified in terms of playback time: it must lie in the range of 0.4 to 1.0 second, with its boundaries on GOP units. If the playback time of several consecutive GOPs is 1 second or less, one management pack can be interleaved in the stream for the video data of those GOPs.
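A minimal sketch of the 0.4-1.0 second rule just described follows; the NTSC frame rate constant is a standard value, and the helper function is our illustration, not part of the document.

NTSC_FRAME_RATE = 30000 / 1001  # frames per second, approx. 29.97

def vobu_duration_ok(frames_in_vobu: int) -> bool:
    """True if the data between navigation packs covers 0.4 to 1.0 s of playback."""
    duration = frames_in_vobu / NTSC_FRAME_RATE
    return 0.4 <= duration <= 1.0

# vobu_duration_ok(15) -> True (one 15-frame GOP is about 0.5 s)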
In the DVD, such a management pack is called a navigation pack; the data from one navigation pack NV to just before the next navigation pack is called a video object unit (hereinafter VOBU); and one continuous playback unit, which can generally be regarded as one scene, is called a video object (hereinafter VOB) and is composed of one or more VOBUs. A data set formed of a plurality of VOBs is called a VOB set (hereinafter VOBS). These are data formats first adopted in the DVD.
When a plurality of data strings are interleaved in this way, navigation packs NV carrying the management data that expresses the relation among the interleaved data must also be interleaved, in units of a predetermined number of packs. A GOP gathers roughly 0.5 second of video data, corresponding to a playback time of 12 to 15 frames, and one stream management pack is considered to be interleaved with the number of data packs required for that playback time.
Fig. 19 is an explanatory diagram showing the stream management information contained in the packs of interleaved video, audio, and sub-picture data constituting the system stream. As shown in the figure, each data item in the system stream is recorded in packetized and packed form in accordance with MPEG2. Video, audio, and sub-picture data share essentially the same packet structure. In the DVD system, one pack has a capacity of 2048 bytes as described above, contains one packet, called a PES packet, and is composed of a pack header PKH, a packet header PTH, and a data field.
The pack header PKH records the SCR, which indicates the time at which the pack should be transferred from the stream buffer 2400 to the system decoder 2500 in fig. 26, that is, the reference time information for AV-synchronized playback. MPEG assumes the SCR to be the reference clock of the entire decoder, but with a disc medium such as DVD, a clock serving as the time reference of the whole decoder is provided separately, so that time management can be closed within each recording/playback apparatus. In the packet header PTH are recorded a PTS, indicating the time at which the video or audio data contained in the packet should be output as playback output after decoding, and a DTS, indicating the time at which the video stream should be decoded, among others. The PTS and DTS are set when the head of an access unit, the unit of decoding, is present in the packet: the PTS indicates the presentation start time of the access unit, and the DTS its decoding start time. When the PTS and the DTS would be the same time, the DTS is omitted.
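Since SCR, PTS, and DTS are all counts of the 90 kHz system clock described later, their handling can be sketched as below; the helper names are ours, and only the 90 kHz clock and the "DTS omitted when equal to PTS" rule come from the text.

SYSTEM_CLOCK_HZ = 90_000  # SCR/PTS/DTS tick rate

def ticks_to_seconds(ticks: int) -> float:
    return ticks / SYSTEM_CLOCK_HZ

def needs_dts(pts_ticks: int, dts_ticks: int) -> bool:
    """The DTS field is only recorded when decoding and presentation times differ."""
    return dts_ticks != pts_ticks

# One NTSC frame is 3003 ticks: ticks_to_seconds(3003) is about 0.03337 s.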
The packet header PTH also includes an 8-bit stream ID field identifying the packet as a video packet, i.e. a video data stream, a private packet, or an MPEG audio packet.
Here, a private packet is data whose content may be freely defined within the MPEG2 standard. In this embodiment, audio data (other than MPEG audio data) and sub-picture data are carried using private packet 1, and PCI and DSI packets are carried using private packet 2.
Private packet 1 and private packet 2 are each composed of a packet header, a private data area, and a data area. The private data area contains a substream ID, an 8-bit field indicating whether the recorded data is audio data or sub-picture data. The audio data carried in private packet 1 may be defined in up to 8 kinds, #0 to #7, for each of the linear PCM and AC-3 modes, and the sub-picture data in up to 32 kinds, #0 to #31.
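A sketch of dispatching on such a substream ID follows. The ID ranges used below (0x20-0x3F for sub-pictures, 0x80-0x87 for AC-3, 0xA0-0xA7 for linear PCM) follow common DVD-Video practice and are cited here as assumptions, not quoted from this document.

def classify_substream(substream_id: int) -> str:
    """Map an 8-bit substream ID from private stream 1 to a stream kind."""
    if 0x20 <= substream_id <= 0x3F:
        return f"sub-picture #{substream_id - 0x20}"
    if 0x80 <= substream_id <= 0x87:
        return f"AC-3 audio #{substream_id - 0x80}"
    if 0xA0 <= substream_id <= 0xA7:
        return f"linear PCM audio #{substream_id - 0xA0}"
    return "unknown"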
The data area is the field in which the compressed data proper is recorded: MPEG2-format compressed data in the case of video data; linear PCM, AC-3, or MPEG data in the case of audio data; and graphics data compressed by run-length coding in the case of sub-picture data.
MPEG2 video data is compressed at a fixed bit rate (hereinafter also CBR) or a variable bit rate (hereinafter also VBR). In the fixed bit rate method, the video stream is input to the video buffer continuously at a constant rate. In the variable bit rate method, by contrast, the video stream is input to the video buffer intermittently, which makes it possible to suppress the generation of unnecessary code.
In the DVD, both the fixed and variable bit rate methods can be used. Since MPEG compresses moving picture data by variable-length coding, the data amount of a GOP is not constant. Moreover, the decoding times of video and audio differ, so the time relationship of the moving picture and audio data read from the optical disc does not coincide with the time relationship of the moving picture and audio data output from the decoder. The method of temporally synchronizing the moving picture and the audio is therefore described in detail later with reference to fig. 26; for simplicity, the fixed bit rate method is described first.
Fig. 20 shows the structure of the navigation pack NV. The navigation pack NV is composed of a PCI packet and a DSI packet, with a pack header PKH at its head. As described above, the PKH records the SCR, the reference time information for AV-synchronized playback, which indicates the time at which the pack should be transferred from the stream buffer 2400 of fig. 26 to the system decoder 2500.
The PCI packet has PCI information (PCI_GI) and non-seamless multi-angle information (NSML_AGLI). The PCI information (PCI_GI) describes the display time (VOBU_S_PTM) of the first video frame and the display time (VOBU_E_PTM) of the last video frame of the video data contained in the VOBU, with system clock accuracy (90 kHz).
In the non-seamless multi-angle information (NSML_AGLI), the read start addresses used when switching angles are described as sector counts from the head of the VOB. Since the number of angles is 9 or fewer, there are address description areas for 9 angles (NSML_AGL_C1_DSTA to NSML_AGL_C9_DSTA).
The DSI packet has DSI information (DSI_GI), seamless playback information (SML_PBI), and seamless multi-angle playback information (SML_AGLI). As DSI information (DSI_GI), the address (VOBU_EA) of the last pack in the VOBU is described as a sector count from the head of the VOBU.
Although seamless playback is described later, in order to play back divided or combined titles seamlessly it is necessary to interleave (multiplex) them at the system stream level, using the ILVU as the unit of continuous reading. A section in which a plurality of system streams are interleaved with the ILVU as the minimum unit is defined as an interleaved block.
Seamless playback information (SML_PBI) is described so that a system stream interleaved in ILVU units in this way can be played back without a break. In the seamless playback information (SML_PBI), an interleaved unit flag indicating whether the VOBU belongs to an interleaved block is described. The flag indicates whether the VOBU exists in an interleaved area (described later): when it does, the flag value is set to "1"; otherwise it is set to "0".
When the VOBU exists in an interleaved area, a unit end flag indicating whether the VOBU is the last VOBU of an ILVU is described. Since the ILVU is the unit of continuous reading, the flag is set to "1" if the VOBU currently being read is the last VOBU of the ILVU, and to "0" otherwise.
When the VOBU exists in an interleaved area, an ILVU end pack address (ILVU_EA) indicating the address of the last pack of the ILVU to which the VOBU belongs is described. The address is given as a sector count from the NV of the VOBU.
When the VOBU exists in an interleaved area, the start address (NT_ILVU_SA) of the next ILVU is also described, again as a sector count from the NV of the VOBU.
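A minimal sketch of how a player might use these two DSI fields while reading an interleaved block follows: read packs up to ILVU_EA, then jump by NT_ILVU_SA to the next ILVU of the same VOB. The function is illustrative; only the two relative addresses come from the text.

def next_read_position(nv_sector: int, ilvu_ea: int, nt_ilvu_sa: int):
    """Both ILVU_EA and NT_ILVU_SA are sector counts relative to this VOBU's NV pack."""
    last_sector_of_ilvu = nv_sector + ilvu_ea   # read contiguously up to here
    next_ilvu_start = nv_sector + nt_ilvu_sa    # then seek here and continue
    return last_sector_of_ilvu, next_ilvu_start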
When two system streams are joined seamlessly, and in particular when the audio before and after the joint is not continuous (for example, when it differs), it is necessary to pause the audio temporarily in order to synchronize the video and audio following the joint. In NTSC, for example, the video frame period is about 33.37 milliseconds, while the frame period of AC-3 audio is 32 milliseconds.
For this purpose, audio playback stop time 1 (VOBU_A_STP_PTM1), audio playback stop time 2 (VOBU_A_STP_PTM2), audio playback gap length 1 (VOB_A_GAP_LEN1), and audio playback gap length 2 (VOB_A_GAP_LEN2) are described. This time information is given with system clock accuracy (90 kHz).
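As a worked sketch of the frame-period mismatch just mentioned: the tick constants below follow from the NTSC and AC-3 frame periods, while the function and its use for computing a gap length are our assumption of the bookkeeping involved.

VIDEO_FRAME_TICKS = 3003   # 90_000 * 1001 / 30000 for one NTSC video frame
AUDIO_FRAME_TICKS = 2880   # 90_000 * 0.032 for one AC-3 audio frame

def audio_gap_ticks(video_frames: int, audio_frames: int) -> int:
    """Accumulated video/audio offset, in 90 kHz ticks, after the given frame counts."""
    return video_frames * VIDEO_FRAME_TICKS - audio_frames * AUDIO_FRAME_TICKS

# audio_gap_ticks(30, 31) -> 810 ticks, i.e. a 9 ms pause to absorb at the joint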
Further, as the seamless multi-angle playback information (SML_AGLI), the read start addresses used when switching angles are described. This field is valid in the case of seamless multi-angle playback. Each address is given as a sector count from the NV of the VOBU. Since the number of angles is 9 or fewer, there are address description areas for 9 angles (SML_AGL_C1_DSTA to SML_AGL_C9_DSTA).
DVD encoder
Fig. 25 shows an embodiment of the authoring encoder ECD in which the multimedia bit stream authoring system according to the present invention is applied to the DVD system. The authoring encoder ECD used in the DVD system (hereinafter, the DVD encoder) has a structure very similar to the authoring encoder EC shown in fig. 2. Basically, the DVD authoring encoder ECD replaces the video zone formatter 1300 of the authoring encoder EC with a VOB buffer 1000 and a formatter 1100. Naturally, the bit stream encoded by the encoder of the present invention is recorded on a DVD medium M. The operation of the DVD authoring encoder ECD is described below in comparison with the authoring encoder EC.
In the DVD authoring encoder ECD, as in the authoring encoder EC, the encoding system control unit 200 generates the control signals St9, St11, St13, St21, St23, St25, St33, and St39 from the scenario data St7, which represents the user's editing instructions input from the editing information generation unit 100, and controls the video encoder 300, the sub-picture encoder 500, and the audio encoder 700. As in the authoring system described with reference to fig. 2, the editing instructions in the DVD system include information that selects, from all or some of the source data containing a plurality of title contents, one or more contents per predetermined time unit, and connects and plays back the selected contents by a predetermined method. They further include the following information: the selection of streams from a source data stream divided into predetermined time sections, for example among the plurality of audio and sub-picture streams in each section, together with their display periods; the selection among streams prepared for parental lock and for multi-angle playback; and the method of switching connections between the scenes of the set multi-angle sections.
In the DVD system, the scenario data St7 includes the control contents in VOB units necessary for encoding the media source streams: whether multi-angle playback is to be performed; whether a multi-rated title allowing parental lock control is to be generated; and, for the multi-angle and parental lock control described below, matters such as the bit rate of each stream determined with interleaving and disc capacity in mind, the start and end times of each control, and whether seamless connection with the preceding and following streams is to be made. The encoding system control unit 200 extracts this information from the scenario data St7 and generates the encoding information tables and encoding parameters required for encoding control. The encoding information tables and encoding parameters are described in detail later with reference to figs. 27, 28, and 29.
The system stream encoding parameter data and the system encoding start/end timing signal St33 contain the above information adapted for use in the DVD system to generate the VOBs. The VOB generation information includes the conditions for connection with the preceding and following streams, the number of audio streams, the encoding information and IDs of the audio streams, the number of sub-pictures and their IDs, the time information at which video display starts (VPTS), the time information at which audio playback starts (APTS), and so on. The format parameter data of the multimedia bit stream MBS and the format start/end timing signal St39 include playback control information and interleave information.
The video encoder 300 encodes a predetermined portion of the video stream St1 based on St9, the signal carrying the encoding parameters and encoding start/end timing for video encoding, and generates an elementary stream conforming to the MPEG2 video standard specified in ISO13818. This elementary stream is output to the video stream buffer 400 as the video encoded stream St15.
Here, although the video encoder 300 generates an elementary stream of the MPEG2 video standard specified in ISO13818, parameters such as the encoding start/end timing, the bit rate, the encoding conditions at encoding start and end, and the type of material, namely whether it is an NTSC signal or a PAL signal and whether it is telecine-converted material, are input as encoding parameters from the signal St9 containing the video encoding parameter data, along with the setting of the encoding mode as open GOP or closed GOP.
The MPEG2 coding method fundamentally exploits the correlation between frames: a frame is encoded with reference to the frames before and after it. However, frames that refer to no other frame (intra frames) are inserted to limit the propagation of transmission errors and to provide access points into the stream. The encoding unit containing at least one intra frame is called a GOP.
Among GOPs, a GOP whose encoding is closed completely within itself is a closed GOP. When the GOP contains a frame that references a frame in the preceding GOP, it is called an open GOP.
Therefore, a closed GOP can be played back using that GOP alone, whereas playing back an open GOP generally also requires the preceding GOP.
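The open/closed distinction can be summarized in a minimal sketch like the following; the frame objects and the way references are represented are assumptions for illustration only.

def is_closed_gop(gop_frames, previous_gop_frames) -> bool:
    """A GOP is closed when no frame in it references a frame of the preceding GOP."""
    previous = {id(f) for f in previous_gop_frames}
    return all(
        id(ref) not in previous
        for frame in gop_frames
        for ref in getattr(frame, "references", ())
    )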
The GOP is also often used as the unit of access. For example, when starting playback partway through a title, at a playback start point or a video switching point, or in special playback such as fast-forward, only the intra-coded frames within GOPs are played back, GOP by GOP, thereby realizing high-speed playback.
The sub-picture encoder 500 encodes a predetermined portion of the sub-picture stream St3 based on the sub-picture stream encoding signal St11 and generates variable-length-coded bitmap data. This variable-length-coded data is output to the sub-picture stream buffer 600 as the sub-picture encoded stream St17.
The audio encoder 700 encodes a predetermined portion of the audio stream St5 based on the audio encoding signal St13 and generates encoded audio data. The encoded audio data may be based on the MPEG1 audio standard specified in ISO11172 or the MPEG2 audio standard specified in ISO13818, or may be AC-3 audio data, PCM (LPCM) data, or the like. Methods and devices for encoding such audio data are well known.
The video stream buffer 400 is connected to the video encoder 300, and stores the video encoding stream St15 output from the video encoder 300. The video stream buffer 400 is also connected to the encoding system control unit 200, and outputs the stored video encoded stream St15 as the timed video encoded stream St27 in response to the input of the timing signal St 21.
Also, the sub-picture stream buffer 600 is connected to the sub-picture encoder 500, and stores the sub-picture encoding stream St17 output from the sub-picture encoder 500. The sub-picture stream buffer 600 is also connected to the encoding system control section 200, and outputs the stored sub-picture encoded stream St17 as the timing sub-picture encoded stream St29 in response to the input of the timing signal St 23.
The audio stream buffer 800 is connected to the audio encoder 700, and stores the audio encoded stream St19 output from the audio encoder 700. The audio stream buffer 800 is also connected to the coding system control unit 200, and outputs the stored audio coded stream St19 as the timed audio coded stream St31 in response to the input of the timing signal St 25.
The system encoder 900 is connected to the video stream buffer 400, the sub-picture stream buffer 600, and the audio stream buffer 800, and inputs the timing video encoding stream St27, the timing sub-picture encoding stream St29, and the timing audio encoding stream St 31. The system encoder 900 is further connected to the encoding system control unit 200, and receives St33 including encoding parameter data for system encoding.
The system encoder 900 multiplexes the timed streams St27, St29, and St31 based on the encoding parameter data and the encoding start/end timing signal St33, and generates the minimum title editing unit (VOB) St35.
The VOB buffer 1000 is a buffer area that temporarily stores the VOBs generated by the system encoder 900. The formatter 1100 reads the required VOBs from the VOB buffer 1000 according to the timing of St39 and assembles one video zone VZ. The formatter 1100 also adds the file system (VFS) to generate St43.
The stream St43, edited to the content of the scenario requested by the user, is sent to the recording unit 1200. The recording unit 1200 processes the edited multimedia bit stream St43 into data of a format suited to the recording medium M and records it on the recording medium M.
DVD decoder
Referring to fig. 26, an embodiment of the authoring decoder DCD in which the multimedia bit stream authoring system according to the present invention is applied to the DVD system is described. The authoring decoder DCD applied to the DVD system (hereinafter, the DVD decoder) decodes the multimedia bit stream MBS edited by the DVD encoder ECD of the present invention and expands the content of each title according to the scenario desired by the user. In this embodiment, the multimedia bit stream St45 encoded by the DVD encoder ECD is recorded on the recording medium M.
The basic structure of the DVD authoring decoder DCD is the same as the authoring decoder DC shown in fig. 3, except that the video decoder 3800 is replaced by a video decoder 3801, and a reordering buffer 3300 and a switch 3400 are interposed between the video decoder 3801 and the synthesizer 3500. The switch 3400 is connected to the synchronization control unit 2900 and receives the switching instruction signal St103.
The DVD authoring decoder DCD comprises a multimedia bit stream playback unit 2000, a scenario selection unit 2100, a decoding system control unit 2300, a stream buffer 2400, a system decoder 2500, a video buffer 2600, a sub-picture buffer 2700, an audio buffer 2800, a synchronization control unit 2900, a video decoder 3801, a reordering buffer 3300, a sub-picture decoder 3100, an audio decoder 3200, a switch 3400, a synthesizer 3500, a video data output terminal 3600, and an audio data output terminal 3700.
The multimedia bit stream reproducing unit 2000 is composed of a recording medium driving unit 2004 for driving the recording medium M, a reading head unit 2006 for reading information recorded on the recording medium M to generate a binary read signal St57, a signal processing unit 2008 for applying various processes to the read signal St57 to generate a reproduced bit stream St61, and a mechanism control unit 2002. The mechanism controller 2002 is connected to the decoding system controller 2300, and receives the multimedia bitstream playback instruction signal St53 to generate playback control signals St55 and St59 for controlling the recording medium drive device (motor) 2004 and the signal processor 2008, respectively.
The authoring decoder DCD includes a scenario selection unit 2100 that can output, as scenario data, instructions to the authoring decoder DCD so that the desired portions of the video, sub-pictures, and audio of the multimedia title edited by the authoring encoder ECD are played back in accordance with the user's scenario selection request.
The scenario selection unit 2100 is preferably configured with a keyboard and a CPU. The user operates the keyboard to input the desired scenario from among the scenario contents input to the authoring encoder ECD. The CPU generates scenario selection data St51 indicating the selected scenario from the keyboard input. The scenario selection unit 2100 is connected to the decoding system control unit 2300 by, for example, an infrared communication device, and inputs the generated scenario selection signal St51 to the decoding system control unit 2300.
The stream buffer 2400 has a predetermined buffer capacity and temporarily stores the playback bit stream St61 input from the multimedia bit stream playback unit 2000; it extracts the volume file structure VFS, the synchronization initial value data (SCR) present in each pack, and the VOBU control information (DSI) present in the navigation pack NV, and generates the stream control data St63.
The decoding system control unit 2300 generates the playback instruction signal St53, which controls the operation of the multimedia bit stream playback unit 2000, based on the scenario selection data St51 generated by the scenario selection unit 2100. The decoding system control unit 2300 further extracts the user's playback instruction information from the scenario selection data St51 and generates the decoding information tables necessary for decoding control. The decoding information tables are described in detail later with reference to figs. 47 and 48. The decoding system control unit 2300 also extracts, from the file data area FDS information in the stream playback data St63, the title information recorded on the optical disc M, such as the video management file VMG, the VTS information VTSI, the PGC information C_PBI#j, and the access unit playback time (C_PBTM: cell playback time), and generates the title information St200.
The stream control data St63 is generated in pack units as shown in fig. 19. The stream buffer 2400 is connected to the decoding system control unit 2300 and supplies the generated stream control data St63 to it.
The synchronization control unit 2900 is connected to the decoding system control unit 2300, receives the synchronization initial value data (SCR) contained in the synchronous playback data St81, sets the internal system clock (STC), and supplies the system clock St79 to the decoding system control unit 2300. The decoding system control unit 2300 generates a stream read signal St65 at predetermined intervals based on the system clock St79 and inputs it to the stream buffer 2400; the unit of reading in this case is the pack. The stream read signal St65 is generated as follows: the decoding system control unit 2300 compares the SCR in the stream control data St63 extracted from the stream buffer 2400 with the system clock St79 from the synchronization control unit 2900, and generates a read request signal when the system clock St79 exceeds the SCR in St63. Performing this control in pack units controls the transfer of packs.
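The transfer rule just described can be sketched as follows; the queue representation and the function are our illustration of "transfer a pack once the system clock reaches its SCR", not the actual control logic of the apparatus.

from collections import deque

def transfer_ready_packs(stream_buffer: deque, stc_now: int) -> list:
    """Pop and return all packs whose SCR is at or below the current system clock."""
    out = []
    while stream_buffer and stream_buffer[0]["scr"] <= stc_now:
        out.append(stream_buffer.popleft())
    return out

# e.g. transfer_ready_packs(deque([{"scr": 100}, {"scr": 400}]), stc_now=250)
# -> [{"scr": 100}]; the second pack waits until the clock reaches 400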
The decoding system control unit 2300 further generates, based on the scenario selection data St51, a decoding instruction signal St69 indicating the IDs of the video, sub-picture, and audio streams corresponding to the selected scenario, and outputs it to the system decoder 2500.
When a title contains multiple audio data, such as audio in different languages, for example Japanese, English, and French, and multiple sub-picture data, such as Japanese, English, and French subtitles, an ID is assigned to each. That is, as described with reference to fig. 19, a stream ID is assigned to video data and MPEG audio data, and a substream ID is assigned to sub-picture data, AC-3 audio data, linear PCM, and navigation pack NV information. The user is not aware of the IDs; the scenario selection unit 2100 is used to select the desired language of audio or subtitles. If English audio is selected, the ID corresponding to English audio is conveyed to the decoding system control unit 2300 as scenario selection data St51. The decoding system control unit 2300 then places that ID on St69 and passes it to the system decoder 2500.
In accordance with the decoding instruction signal, the system decoder 2500 outputs the video, sub-picture, and audio streams input from the stream buffer 2400 to the video buffer 2600 as the video encoded stream St71, to the sub-picture buffer 2700 as the sub-picture encoded stream St73, and to the audio buffer 2800 as the audio encoded stream St75, respectively. That is, when the stream ID input from the scenario selection unit 2100 matches the ID of the pack transferred from the stream buffer 2400, the system decoder 2500 transfers the pack to the corresponding buffer (the video buffer 2600, the sub-picture buffer 2700, or the audio buffer 2800).
The system decoder 2500 detects the playback start time (PTS) and decoding start time (DTS) of each minimum control unit in each stream St67 and generates the time information signal St77. The time information signal St77 is input as St81 to the synchronization control unit 2900 via the decoding system control unit 2300.
Based on the time information signal St81, the synchronization control unit 2900 determines, for each stream, a decoding start timing such that after decoding the streams line up in the proper order. The synchronization control unit 2900 generates the video stream decoding start signal St89 based on this decoding timing and inputs it to the video decoder 3801. Similarly, it generates the sub-picture decoding start signal St91 and the audio decoding start signal St93 and inputs them to the sub-picture decoder 3100 and the audio decoder 3200, respectively.
The video decoder 3801 generates the video output request signal St84 based on the video stream decoding start signal St89 and outputs it to the video buffer 2600. The video buffer 2600 receives the video output request signal St84 and outputs the video stream St83 to the video decoder 3801. The video decoder 3801 detects the playback time information contained in the video stream St83 and invalidates the video output request signal St84 once it has received the amount of the video stream St83 corresponding to that playback time. The video stream corresponding to the predetermined playback time is thus decoded by the video decoder 3801, and the played-back video signal St95 is output to the reordering buffer 3300 and the switch 3400.
Since a video encoded stream is coded using inter-frame correlation, the display order does not match the order of the encoded stream when viewed frame by frame, so the frames cannot simply be displayed in decoding order. Frames whose decoding is completed are therefore temporarily stored in the reordering buffer 3300. The synchronization control unit 2900 controls the switch via St103 so as to produce the display order, switching between the output St95 of the video decoder 3801 and the reordering buffer output St97, and feeds the result to the synthesizer 3500.
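A sketch of this display reordering follows: with B-frames, decode order (e.g. I P B B) differs from display order (I B B P), and decoded reference frames wait until their display time. The frame labels and the pair representation are illustrative assumptions.

def to_display_order(decode_order: list) -> list:
    """decode_order holds (frame_label, display_index) pairs in decoding order."""
    return [label for label, _ in sorted(decode_order, key=lambda pair: pair[1])]

# to_display_order([("I0", 0), ("P3", 3), ("B1", 1), ("B2", 2)])
# -> ["I0", "B1", "B2", "P3"]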
Similarly, the sub-picture decoder 3100 generates the sub-picture output request signal St86 based on the sub-picture decoding start signal St91 and supplies it to the sub-picture buffer 2700. The sub-picture buffer 2700 receives the sub-picture output request signal St86 and outputs the sub-picture stream St85 to the sub-picture decoder 3100. Based on the playback time information contained in the sub-picture stream St85, the sub-picture decoder 3100 decodes the amount of the stream corresponding to the predetermined playback time, reproduces the sub-picture signal St99, and outputs it to the synthesizer 3500.
The synthesizer 3500 superimposes the output of the switch 3400 and the sub-picture signal St99, generates the video signal St105, and outputs it to the video output terminal 3600.
The audio decoder 3200 generates the audio output request signal St88 based on the audio decoding start signal St93 and supplies it to the audio buffer 2800. The audio buffer 2800 receives the audio output request signal St88 and outputs the audio stream St87 to the audio decoder 3200. Based on the playback time information contained in the audio stream St87, the audio decoder 3200 decodes the amount of the stream corresponding to the predetermined playback time and outputs it to the audio output terminal 3700.
In this way, the multimedia bit stream MBS desired by the user can be played back in real time in response to the user's scenario selection. That is, each time the user selects a different scenario, the authoring decoder DCD plays back the multimedia bit stream MBS corresponding to the selected scenario, thereby playing back the title content the user desires.
The decoding system control unit 2300 may also supply the title information signal St200 to the scenario selection unit 2100 via the aforementioned infrared communication device or the like. The scenario selection unit 2100 extracts the title information recorded on the optical disc M from the file data area FDS information contained in the stream playback data St63 carried by the title information signal St200 and displays it on a built-in display, enabling the user to select a scenario interactively.
In the example above, the stream buffer 2400, the video buffer 2600, the sub-picture buffer 2700, the audio buffer 2800, and the reordering buffer 3300 are functionally distinct and are shown as separate buffers. However, a single buffer whose operating speed is several times the write and read speeds required of these buffers could be time-shared to serve as all of them.
Multi-scene
The concept of multi-scene control in the present invention is described below with reference to fig. 21. As stated above, it is built from basic scene sections consisting of data common to the titles and multi-scene sections consisting of scenes adapted to the various requirements. In the figure, scene 1, scene 5, and scene 8 are common scenes. The angle scenes between common scene 1 and scene 5, and the parental-lock scenes between scene 5 and scene 8, are multi-scene sections. In the multi-angle section, one of the scenes shot from different angles, angle 1, angle 2, or angle 3, can be selected dynamically during playback. In the parental-lock section, one of scenes 6 and 7, which correspond to different content data, is selected statically in advance for playback.
The scenario content selecting which scenes in the multi-scene sections are to be played back is input by the user to the scenario selection unit 2100 and generated as the scenario selection data St51. In the figure, scenario 1 represents freely selecting any angle scene and playing back the pre-selected scene 6 in the parental-lock section. Similarly, scenario 2 represents freely selecting scenes in the angle section, with scene 7 pre-selected in the parental-lock section.
Next, for the multi-scene scenarios shown in fig. 21, the PGC information VTS_PGCI in the case of the DVD data structure is described with reference to figs. 30 and 31.
Fig. 30 shows the user-specified scenarios of fig. 21 described with the VTSI data structure, the internal structure of the video title set in the DVD data structure shown in fig. 16. In the figure, scenario 1 and scenario 2 of fig. 21 are described as the two program chains VTS_PGCI#1 and VTS_PGCI#2 in the program chain information VTS_PGCIT within the VTSI of fig. 16. That is, VTS_PGCI#1 describing scenario 1 is composed of access unit playback information C_PBI#1 corresponding to scene 1, access unit playback information C_PBI#2, C_PBI#3, and C_PBI#4 in a multi-angle access unit block corresponding to the multi-angle scenes, access unit playback information C_PBI#5 corresponding to scene 5, access unit playback information C_PBI#6 corresponding to scene 6, and access unit playback information C_PBI#7 corresponding to scene 8.
VTS_PGCI#2 describing scenario 2 is composed of access unit playback information C_PBI#1 corresponding to scene 1, access unit playback information C_PBI#2, C_PBI#3, and C_PBI#4 in the multi-angle access unit block corresponding to the multi-angle scenes, access unit playback information C_PBI#5 corresponding to scene 5, access unit playback information C_PBI#6 corresponding to scene 7, and access unit playback information C_PBI#7 corresponding to scene 8. In the DVD data structure, one playback control unit of a scenario, that is, one scene, is described as the unit called an access unit on the DVD data structure, thereby realizing on the DVD the scenario instructed by the user.
Fig. 31 shows the user-instructed scenarios of fig. 21 in the VOB data structure VTSTT_VOBS, the multimedia bit stream for the video title set in the DVD data structure of fig. 16.
In fig. 31, the two scenarios, scenario 1 and scenario 2 of fig. 21, share the VOB data of one title. For the individual scenes shared by the scenarios, VOB#1 corresponding to scene 1, VOB#5 corresponding to scene 5, and VOB#8 corresponding to scene 8 are arranged as individual VOBs in a non-interleaved portion, that is, in continuous blocks.
For the multi-angle scenes shared by scenario 1 and scenario 2, angle 1 is formed by VOB#2, angle 2 by VOB#3, and angle 3 by VOB#4; that is, one angle is formed by one VOB, and these are arranged as an interleaved block to allow switching between angles and seamless playback of each angle.
In addition, scenes 6 and 7, which are unique to scenario 1 and scenario 2 respectively, are arranged as an interleaved block so that each can be played back seamlessly, and so that each connects seamlessly with the common scenes before and after it.
As described above, the user-instructed scenario shown in fig. 21 can be realized in the DVD data structure using the playback control information of the video title set shown in fig. 30 and the VOB data structure for title playback shown in fig. 31.
Seamless playback
The seamless playback mentioned in the description of the DVD system data structure above is explained here. Seamless playback means that when multimedia data such as video, audio, and sub-pictures is played back across the connection between common scene sections, between a common scene section and a multi-scene section, or between multi-scene sections, each data item and its information are played back without interruption. A principal hardware-related cause of interruption of data and information playback is so-called decoder underflow, in which the speed at which source data is supplied to the decoder falls out of balance with the speed at which the decoder consumes the input source data.
A further factor concerns the nature of the data being played back. For data such as audio, the user can understand its content or information only if it is played back continuously for a fixed duration or longer; when the required continuous playback time cannot be secured, the continuity of the information is lost. Playing back such data with its continuity preserved is called continuous information playback, or seamless information playback. Playback in which the continuity of the information cannot be ensured is called non-continuous information playback, or non-seamless information playback. Continuous information playback and non-continuous information playback are, of course, seamless and non-seamless playback, respectively.
As stated above, seamless playback is thus defined to cover seamless data playback, which physically prevents blanks or interruptions in data playback by means such as avoiding buffer underflow, and seamless information playback, in which the data itself is not interrupted and the user perceives no break in the information when recognizing it from the played-back data.
Detailed description of seamless playback
A specific method for enabling such seamless playback is described in detail below with reference to figs. 23 and 24.
Interleaving
For the DVD data system streams described above, a title such as a movie is recorded on a DVD medium using the authoring encoder ECD. However, to offer the same movie in a form usable in different cultural spheres or countries, the dialogue must be recorded in the language of each country, and the content must be edited and recorded to meet the ethical requirements of each cultural sphere. To record several titles edited from one original title on a single medium in such cases, the bit rate would have to be lowered even on a large-capacity system like DVD, and the requirement for high image quality could not be met. Therefore, a method is adopted in which the portions common to all titles are shared and only the portions that differ are recorded for each title. This makes it possible to record, on a single optical disc without lowering the bit rate, a plurality of titles differing by country or cultural sphere.
As shown in fig. 21, a title recorded on an optical disc contains common parts (scenes) and non-common parts (multi-scene sections) in order to enable parental lock control and multi-angle control.
In the case of parental lock control, when a title includes so-called adult scenes unsuitable for children, such as sexual or violent scenes, the title is composed of common scenes, adult scenes, and scenes for minors. Such a title stream is realized by arranging the adult scenes and the scenes for minors as a multi-scene section provided between common scenes.
When multi-angle control is realized within an ordinarily single-angle title, a plurality of multimedia scenes obtained by shooting the subject at the respective desired camera angles are arranged between common scenes as multi-scene sections. Here, the scenes making up the different angles may be scenes shot at the same time from different camera angles, scenes shot at different times, or even data such as computer graphics.
When a plurality of titles share data, the light beam must be moved to different positions on the optical disc, that is, the optical pickup must move, in order to go from a shared portion of the data to a non-shared portion. Since this movement takes time, it is difficult to play back without interrupting the audio and video partway, that is, to achieve seamless playback. To solve this problem, in theory a track buffer (the stream buffer 2400) with a buffer time corresponding to the worst-case access time would suffice. In general, data recorded on an optical disc is read by the optical pickup, given predetermined signal processing, and temporarily accumulated in the track buffer; the accumulated data is subsequently decoded and played back as video or audio data.
Definition of interleaving
To make it possible to cut scenes out and to select among a plurality of scenes as described above, the data units belonging to each scene are recorded contiguously with one another on the tracks of the recording medium. It is then unavoidable that data of non-selected scenes is recorded inserted between the data of the common scenes and the data of the selected scene. If the data is read in the recorded order, the data of non-selected scenes must be accessed before the data of the selected scene can be accessed and decoded, which makes seamless connection to the selected scene difficult.
In the DVD system, however, the excellent random-access performance of the recording medium makes seamless connection among such a plurality of scenes possible. That is, the data belonging to each scene is divided into a plurality of units of a predetermined data amount, and these divided units belonging to the different scenes are arranged in a predetermined order within the range covered by the jump performance; by accessing and decoding the divided units belonging to the selected scene intermittently, unit by unit, the selected scene can be played back without interruption of the data. In other words, seamless data playback is ensured.
Structure of interleaved data blocks and interleave units
An interleaving method enabling seamless data playback is described below with reference to figs. 24 and 54. Fig. 24 shows a case in which playback branches from one VOB (VOB-A) into a plurality of VOBs (VOB-B, VOB-D, VOB-C) and then merges into one VOB (VOB-E). Fig. 54 shows how these data are actually arranged on the track TR on the optical disc.
In fig. 54, VOB-A and VOB-E are individual video objects at the start and end points of playback, and they are in principle arranged in continuous areas. As shown in fig. 24, the playback start and end points of VOB-B, VOB-C, and VOB-D are aligned and these VOBs are interleaved; the interleaved result is then arranged on the optical disc as a continuous interleaved area. The continuous areas and the interleaved area are further arranged in playback order, that is, along the direction of the track path Dr.
Fig. 54 shows areas in which data are arranged continuously as data blocks. The data blocks comprise continuous data blocks, in which a VOB with independent start and end points is arranged contiguously, and interleaved data blocks, in which a plurality of VOBs with aligned start and end points are interleaved. As shown in fig. 38, these data blocks are arranged in playback order as data block 1, data block 2, data block 3, ..., data block 7.
In fig. 55, the system stream data VTSTT_VOBS is made up of data blocks 1 through 7. In data block 1, VOB 1 is arranged by itself; similarly, in data blocks 2, 3, 5, and 7, VOBs 2, 3, 6, and 10 are arranged by themselves. That is, data blocks 1, 2, 3, 5, and 7 are continuous data blocks.
In data block 4, on the other hand, VOB 4 and VOB 5 are interleaved, and in data block 6 the three VOBs VOB 7, VOB 8, and VOB 9 are interleaved. That is, data blocks 4 and 6 are interleaved data blocks.
Fig. 56 shows the data structure within a continuous data block. In the figure, VOB-i and VOB-j are arranged in the VOBS as continuous data blocks. As described with reference to fig. 16, VOB-i and VOB-j within the continuous data blocks are divided into access units, which are logical playback units. Fig. 56 shows that each of VOB-i and VOB-j is composed of three access units CELL#1, CELL#2, and CELL#3. An access unit is composed of one or more VOBUs, and its boundaries are defined in units of VOBUs. As shown in fig. 16, the position information of each access unit is described in the program chain (hereinafter PGC), the playback control information of the DVD; that is, the addresses of the first and last VOBUs of the access unit are described. As illustrated in fig. 56, the VOBs and the access units defined within them are recorded in a continuous area so that a continuous data block can be played back continuously. Therefore, playback of continuous data blocks presents no problem.
Next, fig. 57 shows the data structure within an interleaved data block. In an interleaved data block, each VOB is divided into interleaved units (ILVUs), and the interleaved units belonging to the VOBs are interleaved. Access unit boundaries are defined independently of the interleaved units. In the figure, VOB-k is divided into four interleaved units ILVUk1, ILVUk2, ILVUk3, and ILVUk4, and two access units CELL#1k and CELL#2k are defined. Similarly, VOB-m is divided into ILVUm1, ILVUm2, ILVUm3, and ILVUm4, and two access units CELL#1m and CELL#2m are defined. Each interleaved unit ILVU contains both video data and audio data.
In the example of fig. 57, the interleaved units ILVUk1 to ILVUk4 and ILVUm1 to ILVUm4 of the two different VOBs VOB-k and VOB-m are interleaved within a single interleaved data block. By interleaving the ILVUs of two VOBs in this arrangement, seamless playback that branches from a single scene to one of a plurality of scenes, and from one of those scenes back to a single scene, can be realized.
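As an illustration of this arrangement, the following minimal Python sketch (all names are hypothetical, not from the patent) interleaves the ILVUs of two VOBs in playback order and shows that a player reading only the ILVUs of the selected VOB still obtains that VOB's data in order:

```python
# Minimal sketch of ILVU interleaving and selective readout.
# All names (Ilvu, interleave, read_selected) are illustrative.
from dataclasses import dataclass

@dataclass
class Ilvu:
    vob: str    # which VOB this interleaved unit belongs to, e.g. "VOB-k"
    index: int  # position of the unit within its VOB
    data: bytes

def interleave(*vobs):
    """Arrange the ILVUs of several VOBs alternately, in playback order."""
    block = []
    for units in zip(*vobs):   # -> ILVUk1, ILVUm1, ILVUk2, ILVUm2, ...
        block.extend(units)
    return block

def read_selected(block, vob):
    """A player jumps over foreign ILVUs and decodes only the selected VOB."""
    return b"".join(u.data for u in block if u.vob == vob)

vob_k = [Ilvu("VOB-k", i, f"k{i}".encode()) for i in range(1, 5)]
vob_m = [Ilvu("VOB-m", i, f"m{i}".encode()) for i in range(1, 5)]
block = interleave(vob_k, vob_m)   # one interleaved data block on the track
assert read_selected(block, "VOB-k") == b"k1k2k3k4"  # scene data stays in order
```

The assertion expresses the seamless-data property stated above: skipping over the other VOB's ILVUs never reorders or interrupts the selected scene's data.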
Multi-scene
The concept of multi-scene control on which the present invention is based is described below, together with the multi-scene section.
The example described below is composed of scenes photographed at different angles. However, each scene of a multi-scene section may equally be photographed at the same angle but at a different time, or may be data such as computer graphics. In other words, a multi-angle scene section is one kind of multi-scene section.
Parental lock
The concept of multiple titles, such as parental lock and director's cut, is described with reference to fig. 15. The figure shows an example of a multi-rated title stream based on parental lock. When a title contains so-called adult-oriented scenes, such as sexual or violent scenes, that are unsuitable for children, the title is composed of the common system streams SSa, SSb, and SSe, an adult-oriented system stream SSc containing the adult-oriented scenes, and a minor-oriented system stream SSd containing only scenes suitable for minors. In such a title stream, the adult-oriented system stream SSc and the minor-oriented system stream SSd are arranged as a multi-scene system stream in a multi-scene section provided between the common system streams SSb and SSe.
The relationship between the system streams and each title described in the program chains PGC of the title stream configured as described above is as follows. In the program chain PGC1 for the adult-oriented title, the common system streams SSa and SSb, the adult-oriented system stream SSc, and the common system stream SSe are described in this order. In the program chain PGC2 for the minor-oriented title, the common system streams SSa and SSb, the minor-oriented system stream SSd, and the common system stream SSe are described in this order.
By arranging the adult-oriented system stream SSc and the minor-oriented system stream SSd as a multi-scene section in this way, playback according to the description of each PGC by the decoding method described above reproduces the common system streams SSa and SSb, then SSc selected within the multi-scene section, and then the common system stream SSe, so that a title with adult-oriented content is played back. On the other hand, by selecting the minor-oriented system stream SSd in the multi-scene section, a minor-oriented title containing no adult-oriented scenes can be played back. This method of preparing, in a title stream, a multi-scene section composed of a plurality of alternative scenes, selecting in advance which scene in the section is to be played back, and thereby generating from the scenes of essentially one title a plurality of titles whose scenes differ, is called parental lock.
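Purely as an illustration (the stream names follow fig. 15; the dictionary structure is hypothetical), the two titles can be modeled as program chains over a shared pool of system streams, with title selection reduced to choosing a PGC:

```python
# Hypothetical sketch of parental lock as two PGCs over shared system streams.
system_streams = {
    "SSa": "common opening", "SSb": "common part",
    "SSc": "adult-oriented scene", "SSd": "minor-oriented scene",
    "SSe": "common ending",
}

pgc = {
    "PGC1_adult": ["SSa", "SSb", "SSc", "SSe"],
    "PGC2_minor": ["SSa", "SSb", "SSd", "SSe"],
}

def play(title: str) -> list:
    """Resolve the selected title to its playback sequence of system streams."""
    return [system_streams[ss] for ss in pgc[title]]

print(play("PGC2_minor"))  # the minor-oriented title never touches SSc
```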
Although this is called parental lock because of the demand for protecting minors, from the standpoint of system stream processing it is a technique in which the user selects a specific scene of the multi-scene section in advance, statically generating different titles as described above. Multi-angle, in contrast, is a technique in which the user can freely select scenes of a multi-scene section at any moment during title playback, dynamically changing the content of a single title.
The parental lock technique can also be used to edit what is called a director's cut title stream. A director's cut addresses the case where a title with a long playback time, such as a movie, is shown on an airplane: unlike in a theater, the title may not be playable to the end because of the flight time. To avoid this, the director responsible for title creation decides in advance which scenes may be cut to shorten the playback time, and a system stream containing those scenes and a system stream from which they are deleted are arranged in a multi-scene section. This lets scenes be cut and edited as the creator intends. In such parental lock control, the playback images must connect smoothly and consistently at the junction from one system stream to another; that is, seamless information playback is required, in which the video and audio buffers do not underflow, and playback of the audio and video is neither aurally nor visually unnatural nor interrupted.
Multi-angle
The concept of multi-angle control according to the present invention is explained with reference to fig. 33. In general, a multimedia title is obtained by recording the image and sound of a target object (hereinafter simply "photographing") over a time T. The blocks #SC1, #SM1, #SM2, #SM3, and #SC3 represent the multimedia scenes obtained in the photographing unit times T1, T2, and T3 by photographing the object at the respective predetermined camera angles. #SM1, #SM2, and #SM3 are scenes photographed at mutually different (first, second, and third) camera angles during photographing unit time T2, and are hereinafter called the first, second, and third multi-angle scenes.
Here the multi-angle scenes are described as scenes photographed at different angles. However, each scene of the multi-scene section may equally be a scene photographed at the same angle but at a different time, or data such as computer graphics. In other words, the multi-angle scene section is a multi-scene section: its data are not limited to scenes actually obtained at different camera angles, but form a section composed of a plurality of scenes with the same presentation time, any one of which can be selectively played back.
#SC1 and #SC3 are scenes photographed at the basic camera angle before and after the multi-angle scenes, in photographing unit times T1 and T3 respectively, and are hereinafter called the basic angle scenes. Usually one of the multiple angles is the same as the basic camera angle.
To make the relationship among these angle scenes easier to understand, a live baseball broadcast is taken as an example. The basic angle scenes #SC1 and #SC3 are photographed at the basic camera angle, centered on the pitcher, catcher, and batter as seen from center field. The first multi-angle scene #SM1 is photographed at the first multi-camera angle, centered on the pitcher, catcher, and batter as seen from the backstop side. The second multi-angle scene #SM2 is photographed at the second multi-camera angle, i.e., the basic camera angle, centered on the pitcher, catcher, and batter as seen from center field.
This means that the second multi-angle scene #SM2 is the basic angle scene #SC2 in photographing unit time T2. The third multi-angle scene #SM3 is photographed at the third multi-camera angle, centered on the infield as seen from the backstop side.
The presentation times of the multi-angle scenes #SM1, #SM2, and #SM3 overlap in photographing unit time T2; this period is called the multi-angle section. By freely selecting one of the multi-angle scenes #SM1, #SM2, and #SM3 within the multi-angle section, the viewer can enjoy, within the basic angle scenes, images from a favorite angle as if switching cameras. In the figure there appear to be time gaps between the basic angle scenes #SC1 and #SC3 and the multi-angle scenes #SM1, #SM2, and #SM3, but this is merely because arrows are used to show which playback paths pass through which multi-angle scene; in fact there is no time gap.
Multi-angle control of the system stream on which the present invention is based is explained below from the viewpoint of data connection, with reference to fig. 23. The multimedia data corresponding to the basic angle scene #SC is called basic angle data BA, and the basic angle data in photographing unit times T1 and T3 are called BA1 and BA3, respectively. The multi-angle data corresponding to the multi-angle scenes #SM1, #SM2, and #SM3 are called the first, second, and third multi-angle data MA1, MA2, and MA3. As described with reference to fig. 33, by selecting one of the multi-angle scene data MA1, MA2, and MA3, the viewer can switch to the image of the desired angle scene. There is likewise no time gap between the basic angle scene data BA1 and BA3 and the multi-angle scene data MA1, MA2, and MA3.
In the case of an MPEG system stream, however, when any one of the multi-angle data MA1, MA2, and MA3 is connected to the preceding basic angle data BA1 and/or to the following basic angle data BA3, discontinuity of playback information can arise between the connected data, depending on the content of the connected angle data, so that natural playback as a single title is not possible. That is, in this case seamless data playback is achieved, but seamless information playback is not.
Next, multi-angle switching in the DVD system — selectively playing back one of the plurality of scenes in a multi-scene section with seamless information connection to the preceding and following scenes — is described with reference to fig. 23.
Switching of the angle scene image, that is, selection of one of the multi-angle scene data MA1, MA2, and MA3, must be completed before playback of the preceding basic angle data BA1 ends. It is very difficult, for example, to switch to different multi-angle scene data MA2 while angle scene data MA1 is being played back. This is because the multimedia data has the variable-length-coded MPEG data structure, so breaks in the data are hard to find midway through the switching-target data, and because inter-frame correlation is used in the coding process, the image may be disturbed when the angle is switched. In MPEG, a GOP is defined as a processing unit containing at least one intra-coded frame. Within this processing unit called a GOP, closed processing that refers to no frames belonging to other GOPs is possible.
In other words, if some multi-angle data, say MA3, is selected by the time playback of the preceding basic angle data BA1 ends, at the latest before playback reaches the multi-angle section, the selected multi-angle data can be played back seamlessly. However, it is very difficult to switch seamlessly to other multi-angle scene data midway through playing back one multi-angle scene. Therefore, during a multi-angle section it is difficult to obtain a free viewpoint in the manner of switching cameras.
Encoder flow chart
Next, the encoding information table generated by the encoding system control unit 200 from the scenario data St7 described above is explained with reference to fig. 27. The encoding information table is composed of VOB set data strings, corresponding to scene sections delimited by scene branch and merge points and containing a plurality of VOBs, and VOB data strings corresponding to the individual scenes. The VOB set data string shown in fig. 27 is described below.
In step #100 of fig. 34, the encoding system control unit 200 generates the encoding information table in order to generate a DVD multimedia stream according to the title content designated by the user. The user-designated scenario contains branch points from common scenes into multiple scenes, and merge points back into common scenes. The VOB corresponding to a scene section delimited by such branch and merge points is called a VOB set, and the data generated for encoding a VOB set is called a VOB set data string. In the VOB set data string, the number of titles offered when a multi-scene section is included is indicated as the title number of the VOB set data string.
The VOB set data structure of fig. 27 shows the data content used for encoding one VOB set of the VOB set data string. The VOB set data structure consists of the VOB set number (VOBS_NO), the VOB numbers (VOB_NO) in the VOB set, the preceding VOB seamless connection flag (VOB_Fsb), the following VOB seamless connection flag (VOB_Fsf), the multi-scene flag (VOB_Fp), the interleave flag (VOB_Fi), the multi-angle flag (VOB_Fm), the multi-angle seamless switching flag (VOB_FsV), the maximum bit rate of the interleaved VOB (ILV_BR), the division number of the interleaved VOB (ILV_DIV), and the minimum interleaved unit playback time (ILVU_MT).
The VOB set number VOBS_NO identifies the VOB set, numbered for example in the playback order of the title scenario.
The VOB number VOB_NO in the VOB set identifies the VOB throughout the whole title scenario, numbered for example in the playback order of the title scenario.
The preceding VOB seamless connection flag VOB_Fsb indicates whether a seamless connection to the preceding VOB is required in scenario playback.
The following VOB seamless connection flag VOB_Fsf indicates whether a seamless connection to the following VOB is required in scenario playback.
The multi-scene flag VOB_Fp indicates whether the VOB set is composed of a plurality of VOBs.
The interleave flag VOB_Fi indicates whether the VOBs in the VOB set are interleaved.
The multi-angle flag VOB_Fm indicates whether the VOB set is a multi-angle set.
The multi-angle seamless switching flag VOB_FsV indicates whether switching within the multi-angle section is seamless.
The interleaved VOB maximum bit rate ILV_BR indicates the maximum bit rate of the VOBs subjected to interleaving.
The interleaved VOB division number ILV_DIV indicates the number of interleaved units into which the interleaved VOB is divided.
The minimum interleaved unit playback time ILVU_MT indicates the playback time of the smallest interleaved unit at which the tracking buffer does not underflow during interleaved data block playback when the VOB is played at the bit rate ILV_BR.
Next, the encoding information table for each VOB, generated by the encoding system control unit 200 from the scenario data St7, is explained with reference to fig. 28. Based on this table, the encoding parameter data for each VOB described below is generated and supplied to the video encoder 300, the sub-picture encoder 500, the audio encoder 700, and the system encoder 900. The VOB data string shown in fig. 28 is the per-VOB encoding information table generated in the encoding system control in step #100 of fig. 34 for generating a DVD multimedia stream according to the title content designated by the user. One encoding unit is a VOB, and the data generated for encoding it is a VOB data string. For example, a VOB set composed of three angle scenes is composed of three VOBs. The VOB data structure of fig. 28 shows the data content used for encoding one VOB of the VOB data string.
The VOB data structure contains the video material start time (VOB_VST), the video material end time (VOB_VEND), the video material type (VOB_V_KIND), the video encoding bit rate (V_BR), the audio material start time (VOB_AST), the audio material end time (VOB_AEND), the audio encoding scheme (VOB_A_KIND), and the audio bit rate (A_BR).
The video material start time VOB_VST is the video encoding start time, expressed in terms of the time of the video material.
The video material end time VOB_VEND is the video encoding end time, expressed in terms of the time of the video material.
The video material type VOB_V_KIND indicates whether the encoded material is NTSC or PAL, and whether the video material has undergone telecine conversion.
The video bit rate V_BR is the encoding bit rate of the video signal.
The audio material start time VOB_AST is the audio encoding start time, expressed in terms of the time of the audio material.
The audio material end time VOB_AEND is the audio encoding end time, expressed in terms of the time of the audio material.
The audio encoding scheme VOB_A_KIND indicates the encoding scheme of the audio signal, such as AC-3, MPEG, or linear PCM.
The audio bit rate A_BR is the encoding bit rate of the audio signal.
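As a reading aid only, the two tables can be transcribed as record types; the field names follow figs. 27 and 28, while the Python types and defaults are assumptions:

```python
# Sketch of the encoding information tables of figs. 27 and 28 as record types.
# Field names follow the text; types and defaults are assumptions.
from dataclasses import dataclass

@dataclass
class VobSetInfo:                 # one entry of the VOB set data string (fig. 27)
    VOBS_NO: int                  # VOB set number
    VOB_NO: list                  # VOB numbers in the set
    VOB_Fsb: bool = False         # seamless connection to preceding VOB
    VOB_Fsf: bool = False         # seamless connection to following VOB
    VOB_Fp: bool = False          # multi-scene section
    VOB_Fi: bool = False          # VOBs interleaved
    VOB_Fm: bool = False          # multi-angle
    VOB_FsV: bool = False         # seamless angle switching
    ILV_BR: int = 0               # max bit rate of interleaved VOB (bits/s)
    ILV_DIV: int = 0              # number of interleaved units
    ILVU_MT: float = 0.0          # minimum ILVU playback time (s)

@dataclass
class VobInfo:                    # one entry of the VOB data string (fig. 28)
    VOB_VST: float                # video material start time
    VOB_VEND: float               # video material end time
    VOB_V_KIND: str               # "NTSC", "PAL", or telecine-converted film
    V_BR: int                     # video encoding bit rate
    VOB_AST: float                # audio material start time
    VOB_AEND: float               # audio material end time
    VOB_A_KIND: str               # "AC-3", "MPEG", "LPCM"
    A_BR: int                     # audio encoding bit rate
```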
Fig. 29 shows the encoding parameters supplied to the video encoder 300, the audio encoder 700, and the system encoder 900 for encoding a VOB. The encoding parameters are: VOB number (VOB_NO), video encoding start time (V_STTM), video encoding end time (V_ENDTM), video encoding mode (V_ENCMD), video encoding bit rate (V_RATE), video encoding maximum bit rate (V_MRATE), GOP structure fixed flag (GOP_FXflag), video encoding GOP structure (GOPST), video encoding initial data (V_INST), video encoding end data (V_ENDST), audio encoding start time (A_STTM), audio encoding end time (A_ENDTM), audio encoding bit rate (A_RATE), audio encoding mode (A_ENCMD), audio start gap (A_STGAP), audio end gap (A_ENDGAP), preceding VOB number (B_VOB_NO), and following VOB number (F_VOB_NO).
The VOB number VOB_NO identifies the VOB, numbered across the whole title scenario, for example in its playback order.
The video encoding start time V_STTM is the start time of video encoding on the video material.
The video encoding end time V_ENDTM is the end time of video encoding on the video material.
The video encoding mode V_ENCMD sets, among other things, whether reverse telecine conversion is performed at video encoding time so that telecine-converted video material can be encoded efficiently.
The video encoding bit rate V_RATE is the average bit rate of video encoding.
The video encoding maximum bit rate V_MRATE is the maximum bit rate of video encoding.
The GOP structure fixed flag GOP_FXflag indicates whether encoding is performed with the GOP structure held fixed throughout. It is an effective parameter for enabling seamless switching within a multi-angle scene section.
The video encoding GOP structure GOPST is the GOP structure data used in encoding.
The video encoding initial data V_INST sets, for example, the initial value of the VBV buffer (decoding buffer) at the start of video encoding, and is an effective parameter for seamless playback with the preceding video encoded stream.
The video encoding end data V_ENDST sets, for example, the end value of the VBV buffer (decoding buffer) at the end of video encoding, and is an effective parameter for seamless playback with the following video encoded stream.
The audio encoding start time A_STTM is the start time of audio encoding on the audio material.
The audio encoding end time A_ENDTM is the end time of audio encoding on the audio material.
The audio encoding bit rate A_RATE is the bit rate of audio encoding.
The audio encoding mode A_ENCMD is the encoding scheme of the audio signal, such as AC-3, MPEG, or linear PCM.
The audio start gap A_STGAP is the time offset between the start of the video and the start of the audio at the head of the VOB, and is an effective parameter for seamless playback with the preceding system encoded stream.
The audio end gap A_ENDGAP is the time offset between the end of the audio and the end of the video at the tail of the VOB, and is an effective parameter for seamless playback with the following system encoded stream.
The preceding VOB number B_VOB_NO indicates the VOB number of the seamlessly connected preceding VOB, when one exists.
The following VOB number F_VOB_NO indicates the VOB number of the seamlessly connected following VOB, when one exists.
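Likewise, the fig. 29 parameter set can be sketched as a record; the names follow the text above, while the Python types, defaults, and the example GOPST string are assumptions:

```python
# Sketch of the fig. 29 per-VOB encoding parameter record. Field names follow
# the text; types, defaults, and the GOPST notation are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class EncodeParams:
    VOB_NO: int
    V_STTM: float                     # video encoding start time
    V_ENDTM: float                    # video encoding end time
    V_ENCMD: str = "normal"           # e.g. "reverse_telecine" for film material
    V_RATE: int = 0                   # average video bit rate
    V_MRATE: int = 0                  # maximum video bit rate
    GOP_FXflag: bool = False          # GOP structure held fixed
    GOPST: str = ""                   # e.g. "N=15 M=3 closed"
    V_INST: int = 0                   # VBV buffer initial value
    V_ENDST: int = 0                  # VBV buffer end value
    A_STTM: float = 0.0               # audio encoding start time
    A_ENDTM: float = 0.0              # audio encoding end time
    A_RATE: int = 0                   # audio bit rate
    A_ENCMD: str = "AC-3"             # audio encoding scheme
    A_STGAP: float = 0.0              # audio start gap
    A_ENDGAP: float = 0.0             # audio end gap
    B_VOB_NO: Optional[int] = None    # seamlessly connected preceding VOB
    F_VOB_NO: Optional[int] = None    # seamlessly connected following VOB
```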
The operation of the DVD encoder ECD of the present invention is described with reference to the flow chart shown in fig. 34. Blocks drawn with double lines in the figure each represent a subroutine. This embodiment describes the DVD system, but it goes without saying that the same structure applies to the authoring encoder EC.
In step #100, the user inputs editing instructions for the desired scenario content while checking the contents of the multimedia source data St1, St2, and St3 in the edit information generation unit 100.
In step #200, the edit information generation unit 100 generates scenario data St7 containing the editing instruction information according to the user's editing instructions.
In generating the scenario data St7 in step #200, when the user's editing instructions concern multi-scene sections for multi-angle or parental lock control in which interleaving is assumed, the instructions are input under the following conditions.
First, a VOB maximum bit rate sufficient for adequate picture quality is determined, and then the tracking buffer capacity, jump performance, jump time, and jump distance of the DVD decoder DCD assumed as the playback apparatus for the DVD-encoded data are determined. Based on these values, the playback time of the minimum interleaved unit is obtained from equations 3 and 4.
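Equations 3 and 4 are not reproduced in this excerpt, but the constraint they express can be sketched: while the pickup jumps to the next ILVU for a worst-case time Tj, the decoder keeps consuming from the tracking buffer at the VOB bit rate Vo, so each ILVU must hold enough data to cover its own readout plus the jump. A hedged back-of-the-envelope version, assuming a disc read rate Vr greater than Vo:

```python
# Hedged sketch: a lower bound on the minimum ILVU playback time.
# Derivation assumed, not quoted from the patent: an ILVU of size S is read in
# S/Vr seconds, followed by a jump of Tj seconds; the decoder consumes Vo
# bits/s throughout, so continuity needs S >= Vo*(S/Vr + Tj), i.e.
#   playback time S/Vo >= Tj * Vr / (Vr - Vo).

def min_ilvu_playback_time(read_rate: float, vob_rate: float, jump_time: float) -> float:
    assert read_rate > vob_rate, "readout must outrun consumption"
    return jump_time * read_rate / (read_rate - vob_rate)

# Illustrative numbers only: 11.08 Mbps read rate, 8 Mbps VOB, 0.4 s worst jump.
print(min_ilvu_playback_time(11.08e6, 8e6, 0.4))   # ~1.44 s per ILVU
```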
Then it is checked whether equations 5 and 6 are satisfied by the playback time of each scene contained in the multi-scene section. If they are not satisfied, the user changes the instructions, for example connecting part of a following scene to each scene of the multi-scene section, so that equations 5 and 6 are satisfied.
For multi-angle editing instructions with seamless switching, editing instructions are input that equalize the audio within the playback time of each angle scene while satisfying equation 7. For non-seamless switching, the user's editing instructions are input so as to satisfy equation 8.
In step #300, the encoding system control unit 200 first determines from the scenario data St7 whether the target scene is to be connected seamlessly to the preceding scene. Here, when the preceding scene section is a multi-scene section composed of a plurality of scenes, seamless connection means connecting any one of the scenes of that preceding multi-scene section to the common scene currently being connected. Likewise, when the current connection target is itself a multi-scene section, seamless connection means being able to connect any one scene of that multi-scene section. If step #300 determines "no", i.e. a non-seamless connection, the flow proceeds to step #400.
In step #400, the encoding system control unit 200 resets the preceding scene seamless connection flag VOB_Fsb, which indicates that the target scene is connected seamlessly to the preceding scene, and the flow proceeds to step #600.
If step #300 determines "yes", i.e. a seamless connection to the preceding scene, the flow proceeds to step #500.
In step #500, the preceding scene seamless connection flag VOB_Fsb is set, and the flow proceeds to step #600.
In step #600, the encoding system control unit 200 determines from the scenario data St7 whether the target scene is to be connected seamlessly to the following scene. If step #600 determines "no", i.e. a non-seamless connection, the flow proceeds to step #700.
In step #700, the encoding system control unit 200 resets the following scene seamless connection flag VOB_Fsf, which indicates that the scene is connected seamlessly to the following scene, and the flow proceeds to step #900.
If step #600 determines "yes", i.e. a seamless connection to the following scene, the flow proceeds to step #800.
In step #800, the encoding system control unit 200 sets the following scene seamless connection flag VOB_Fsf, and the flow proceeds to step #900.
In step #900, the encoding system control unit 200 determines from the scenario data St7 whether there is more than one connection-target scene, i.e. whether the connection is a multi-scene connection. In the multi-scene case there are parental lock control, in which only one of the plural playback paths that the multi-scene section can form is played back, and multi-angle control, in which the playback path can be switched within the multi-scene section. If step #900 determines "no", i.e. not a multi-scene connection, the flow proceeds to step #1000.
In step #1000, the multi-scene flag VOB_Fp, which indicates a multi-scene connection, is reset, and the flow proceeds to the encoding parameter generation step #1800, whose operation is described later.
If step #900 determines "yes", i.e. a multi-scene connection, the flow proceeds to step #1100.
In step #1100, the multi-scene flag VOB_Fp is set, and the flow proceeds to step #1200, where it is determined whether the connection is a multi-angle connection.
In step #1200, it is determined whether switching is performed among the plural scenes of the multi-scene section, i.e. whether it is a multi-angle section. If step #1200 determines "no", i.e. playback passes through only one playback path under parental lock control without switching midway, the flow proceeds to step #1300.
In step #1300, the multi-angle flag VOB_Fm, which indicates that the connection-target scene is multi-angle, is reset, and the flow proceeds to step #1302.
In step #1302, it is determined whether either the preceding scene seamless connection flag VOB_Fsb or the following scene seamless connection flag VOB_Fsf is set. If step #1302 determines "yes", i.e. the connection-target scene is connected seamlessly to the preceding scene, the following scene, or both, the flow proceeds to step #1304.
In step #1304, the interleave flag VOB_Fi, which indicates that the VOBs constituting the encoded data of the target scene are interleaved, is set, and the flow proceeds to step #1800.
If step #1302 determines "no", i.e. the target scene is connected seamlessly to neither the preceding nor the following scene, the flow proceeds to step #1306.
In step #1306, the interleave flag VOB_Fi is reset, and the flow proceeds to step #1800.
If step #1200 determines "yes", i.e. a multi-angle connection, the flow proceeds to step #1400.
In step #1400, the multi-angle flag VOB_Fm and the interleave flag VOB_Fi are set, and the flow proceeds to step #1500.
In step #1500, the encoding system control unit 200 determines from the scenario data St7 whether so-called seamless switching, in which image and sound are not interrupted, is to be performed within the multi-angle scene section, i.e. in playback units smaller than the VOB. If step #1500 determines "no", i.e. non-seamless switching, the flow proceeds to step #1600, where the seamless switching flag VOB_FsV, which indicates that switching in the target scene is seamless, is reset, and the flow proceeds to step #1800.
If step #1500 determines "yes", i.e. seamless switching, the flow proceeds to step #1700.
In step #1700, the seamless switching flag VOB_FsV is set, and the flow proceeds to step #1800. In this way, the editing intention is detected from the scenario data St7 reflecting it, in the form of the set states of the individual flags, before the flow reaches step #1800.
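The decision cascade of steps #300 through #1700 can be condensed into a few lines of Python; the Scene fields are hypothetical stand-ins for the queries the encoding system control unit 200 makes against the scenario data St7:

```python
# Sketch of the flag detection of steps #300-#1700. The Scene fields are
# hypothetical stand-ins for queries against the scenario data St7.
from dataclasses import dataclass

@dataclass
class Scene:
    seamless_prev: bool      # steps #300-#500
    seamless_next: bool      # steps #600-#800
    multi_scene: bool        # step #900
    multi_angle: bool        # step #1200
    seamless_switch: bool    # step #1500

def detect_flags(s: Scene) -> dict:
    f = {"VOB_Fsb": s.seamless_prev, "VOB_Fsf": s.seamless_next,
         "VOB_Fp": s.multi_scene, "VOB_Fm": False,
         "VOB_Fi": False, "VOB_FsV": False}
    if s.multi_scene:
        if s.multi_angle:                      # steps #1400-#1700
            f["VOB_Fm"] = True
            f["VOB_Fi"] = True
            f["VOB_FsV"] = s.seamless_switch
        else:                                  # parental lock: steps #1300-#1306
            f["VOB_Fi"] = s.seamless_prev or s.seamless_next
    return f

# A parental lock scene seamlessly joined on both sides is interleaved:
print(detect_flags(Scene(True, True, True, False, False)))
```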
In step #1800, based on the user's editing intention detected as the flag states described above, the encoding information tables for the VOB set units and VOB units shown in figs. 27 and 28 are generated, together with the encoding parameters in the VOB data units shown in fig. 29 that are used to encode the source streams. The flow then proceeds to step #1900.
This encoding parameter generation step is described in detail later with reference to figs. 35, 36, 37, and 38.
In step #1900, the video data and audio data are encoded according to the encoding parameters generated in step #1800, and the flow proceeds to step #2000. Sub-picture data is by nature inserted as needed during video playback, so continuity with preceding and following scenes is not required; moreover, a sub-picture is image information of roughly one still picture and, unlike video and audio data that extend along the time axis, it is displayed statically rather than played back continuously. Therefore, in this embodiment, which concerns seamless and non-seamless continuous playback, the description of sub-picture data encoding is omitted for simplicity.
In step #2000, the loop formed by steps #300 through #1900 is repeated as many times as there are VOB sets. The playback information of the title, such as the playback order of each VOB, is placed in the program chain (VTS_PGC#I) information of the data structure of fig. 16, the arrangement that interleaves the VOBs of the multi-scene sections is generated, and the VOB set data string and VOB data string needed for system encoding are completed. The flow then proceeds to step #2100.
In step #2100, the total number of VOB sets VOBS_NUM, obtained as a result of the loop through step #2000, is added to the VOB set data string; the number of scenario playback paths in the scenario data St7 is set as the number of titles TITLE_NO; and the VOB set data string is completed as the encoding information table. The flow then proceeds to step #2200.
In step #2200, system encoding for generating the VOB (VOB#i) data in VTSTT_VOBS of fig. 16 is performed, based on the video encoded stream and the audio encoded stream encoded in step #1900 and on the encoding parameters of fig. 29. The flow then proceeds to step #2300.
In step #2300, formatting is performed, including generation of the VTS information VTSI of fig. 16, the VTSI management table (VTSI_MAT) contained in it, the VTS PGC information table (VTS_PGCIT), and the program chain information (VTS_PGCI#I) controlling the playback order of the VOB data, as well as interleaving of the VOBs contained in the multi-scene sections.
Next, the encoding parameter generation for multi-angle control in the encoding parameter generation subroutine of step #1800 of the flow chart of fig. 34 is described with reference to figs. 35, 36, and 37.
First, with reference to fig. 35, the encoding parameter generation for a non-seamless switching stream in multi-angle control is described — the case where step #1500 of fig. 34 determines "no", i.e. the flags are VOB_Fsb = 1 or VOB_Fsf = 1, VOB_Fp = 1, VOB_Fi = 1, VOB_Fm = 1, and VOB_FsV = 0. The encoding information tables shown in figs. 27 and 28 and the encoding parameters shown in fig. 29 are generated by the following operations.
Step #1812 extracts the scenario playback order contained in the scenario data St7, sets the VOB set number VOBS_NO, and sets the VOB number VOB_NO for each of one or more VOBs in the VOB set.
Step #1814 extracts the maximum bit rate ILV_BR of the interleaved VOB from the scenario data St7 and, based on the interleave flag VOB_Fi = 1, sets it as the video encoding maximum bit rate V_MRATE of the encoding parameters.
Step #1816 extracts the minimum interleaved unit playback time ILVU_MT from the scenario data St7.
Step #1818, based on the multi-scene flag VOB_Fp = 1, sets the values N = 15 and M = 3 of the video encoding GOP structure GOPST and sets the GOP structure fixed flag GOP_FXflag = 1.
Step #1820 is the common VOB data setting subroutine, shown in fig. 36. The encoding information tables of figs. 27 and 28 and the encoding parameters of fig. 29 are generated by the following flow of operations.
Step #1822 extracts the video material start time VOB_VST and end time VOB_VEND of each VOB from the scenario data St7, and uses them as the video encoding start time V_STTM and encoding end time V_ENDTM, parameters of video encoding.
Step #1824 extracts the audio material start time VOB_AST of each VOB from the scenario data St7, and uses it as the audio encoding start time A_STTM, a parameter of audio encoding.
Step #1826 extracts the audio material end time VOB_AEND of each VOB from the scenario data St7 and, at a time not exceeding VOB_AEND, sets a time equal to a whole number of the audio access units (hereinafter AAU) determined by the audio encoding scheme, as the audio encoding end time A_ENDTM, a parameter of audio encoding.
Step #1828 obtains the audio start gap A_STGAP, a parameter of system encoding, from the difference between the video encoding start time V_STTM and the audio encoding start time A_STTM.
Step #1830 obtains the audio end gap A_ENDGAP, a parameter of system encoding, from the difference between the video encoding end time V_ENDTM and the audio encoding end time A_ENDTM.
Step #1832 extracts the video bit rate V_BR, the average bit rate of video encoding, from the scenario data St7, and uses it as the video encoding bit rate V_RATE, a parameter of video encoding.
Step #1834 extracts the audio bit rate A_BR from the scenario data St7, and uses it as the audio encoding bit rate A_RATE, a parameter of audio encoding.
Step #1836 extracts the video material type VOB_V_KIND from the scenario data St7; if the material is film, i.e. telecine-converted material, it sets reverse telecine conversion in the video encoding mode V_ENCMD as a parameter of video encoding.
Step #1838 extracts the audio encoding scheme VOB_A_KIND from the scenario data St7, and sets that scheme in the audio encoding mode A_ENCMD as a parameter of audio encoding.
Step #1840 sets the VBV buffer initial value of the video encoding initial data V_INST to be smaller than the VBV buffer end value of the video encoding end data V_ENDST, and uses them as parameters of video encoding.
Step #1842, based on the preceding VOB seamless connection flag VOB_Fsb = 1, sets the VOB number VOB_NO of the preceding connection as the preceding connection VOB number B_VOB_NO, a parameter of system encoding.
Step #1844, based on the following VOB seamless connection flag VOB_Fsf = 1, sets the VOB number VOB_NO of the following connection as the following connection VOB number F_VOB_NO, a parameter of system encoding.
In this way, the encoding information table and encoding parameters for non-seamless multi-angle switching control can be generated for a multi-angle VOB set.
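Steps #1826 through #1830 of the common subroutine amount to a small amount of arithmetic, sketched below in milliseconds; the 32 ms AAU duration is an assumption (1536 samples per AAU at 48 kHz, as in AC-3), and the function name is illustrative:

```python
# Sketch of steps #1826-#1830: align the audio encoding end to an AAU boundary
# and derive the start/end gaps. AAU_MS = 32 ms is an assumption (1536 samples
# per AAU at 48 kHz, as in AC-3); times are integers in milliseconds.
AAU_MS = 32

def audio_params(v_sttm_ms, v_endtm_ms, vob_ast_ms, vob_aend_ms):
    a_sttm = vob_ast_ms
    n_aau = (vob_aend_ms - a_sttm) // AAU_MS   # step #1826: whole AAUs only
    a_endtm = a_sttm + n_aau * AAU_MS
    a_stgap = v_sttm_ms - a_sttm               # step #1828
    a_endgap = v_endtm_ms - a_endtm            # step #1830
    return a_sttm, a_endtm, a_stgap, a_endgap

# Video spans 0..600000 ms; audio material starts 10 ms early, ends 5 ms early.
print(audio_params(0, 600_000, -10, 599_995))  # -> (-10, 599990, 10, 10)
```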
Next, with reference to fig. 37, the operation of generating the encoding parameters of a seamless switching stream in multi-angle control is described — the case where step #1500 of fig. 34 determines "yes", i.e. the flags are VOB_Fsb = 1 or VOB_Fsf = 1, VOB_Fp = 1, VOB_Fi = 1, VOB_Fm = 1, and VOB_FsV = 1.
The encoding information tables shown in figs. 27 and 28 and the encoding parameters shown in fig. 29 are generated by the following operations.
Step #1850 extracts the scenario playback order contained in the scenario data St7, sets the VOB set number VOBS_NO, and sets the VOB number VOB_NO for each of one or more VOBs in the VOB set.
Step #1852 extracts the maximum bit rate ILV_BR of the interleaved VOB from the scenario data St7 and, based on the interleave flag VOB_Fi = 1, sets it as the video encoding maximum bit rate V_MRATE.
Step #1854 extracts the minimum interleaved unit playback time ILVU_MT from the scenario data St7.
Step #1856, based on the multi-scene flag VOB_Fp = 1, sets the values N = 15 and M = 3 of the video encoding GOP structure GOPST and sets the GOP structure fixed flag GOP_FXflag = 1.
Step #1858, based on the seamless switching flag VOB_FsV = 1, sets closed GOPs in the video encoding GOP structure GOPST as a parameter of video encoding.
Step #1860 is the common VOB data setting subroutine. This is the subroutine shown in fig. 36 and already described, so it is omitted here.
In this way, the encoding parameters for seamless switching control can be generated for a multi-angle VOB set.
Next, with reference to fig. 38, the encoding parameter generating operation for parental lock control is described — the case where step #1200 of fig. 34 determines "no" and step #1302 determines "yes", i.e. the flags are VOB_Fsb = 1 or VOB_Fsf = 1, VOB_Fp = 1, VOB_Fi = 1, and VOB_Fm = 0. The encoding information tables shown in figs. 27 and 28 and the encoding parameters shown in fig. 29 are generated by the following operations.
In step #1870, the scenario playback order contained in the scenario data St7 is extracted, the VOB set number VOBS_NO is set, and the VOB number VOB_NO is set for each of one or more VOBs in the VOB set.
Step #1872 extracts the maximum bit rate ILV_BR of the interleaved VOB from the scenario data St7 and, based on the interleave flag VOB_Fi = 1, sets it as the video encoding maximum bit rate V_MRATE.
Step #1874 extracts the VOB interleave division number ILV_DIV from the scenario data St7.
Step #1876 is the common VOB data setting subroutine, the subroutine shown in fig. 36 and already described, so it is omitted here.
In this way, the encoding parameters for parental lock control can be generated for a multi-scene VOB set.
Next, the encoding parameter generating operation for a single scene — the case where step #900 of fig. 34 determines "no", i.e. the flag VOB_Fp = 0 — is described with reference to fig. 53. The encoding information tables shown in figs. 27 and 28 and the encoding parameters shown in fig. 29 are generated by the following operations.
Step #1880 extracts the scenario playback order contained in the scenario data St7, sets the VOB set number VOBS_NO, and sets the VOB number VOB_NO for each of one or more VOBs in the VOB set.
Step #1882 extracts the maximum bit rate ILV_BR of the interleaved VOB from the scenario data St7 and, based on the interleave flag VOB_Fi = 1, sets the video encoding maximum bit rate V_MRATE.
Step #1884 is the common VOB data setting subroutine, the subroutine shown in fig. 36, and is not described again here.
By the above procedures for generating the encoding information tables and the encoding parameters, the parameters for the DVD video, audio, and system encoders and for the DVD formatter can be generated.
Decoder flow chart
Transfer flow from optical disc to bit stream buffer
Next, the decoding information table generated by the decoding system control unit 2300 based on the scenario selection data St51 is described with reference to figs. 47 and 48. The decoding information table is composed of the decoding system table shown in fig. 47 and the decoding table shown in fig. 48.
As shown in fig. 47, the decoding system table consists of a scenario information register section and an access unit information register section. The scenario information register section extracts and records the playback scenario information, such as the title number selected by the user, contained in the scenario selection data St51. The access unit information register section extracts and records the information needed to play back each access unit making up the program chain, based on the user-selected scenario information extracted into the scenario information register section.
The scenario information register section includes an angle number register ANGLE_NO_reg, a VTS number register VTS_NO_reg, a PGC number register VTS_PGCI_NO_reg, an audio ID register AUDIO_ID_reg, a sub-picture ID register SP_ID_reg, and an SCR buffer SCR_buffer.
The angle number register ANGLE_NO_reg records which angle is to be played back when the PGC being played back contains multiple angles. The VTS number register VTS_NO_reg records the number of the VTS, among the plurality of VTSs on the optical disc, to be played back next. The PGC number register VTS_PGCI_NO_reg records which of the plurality of PGCs in the VTS is to be played back, for parental lock control and the like.
The audio ID register AUDIO_ID_reg records which of the plurality of audio streams in the VTS is to be played back. The sub-picture ID register SP_ID_reg records which sub-picture stream is to be played back when a plurality of sub-picture streams exist in the VTS. The SCR buffer SCR_buffer temporarily stores the SCR described at the head of the data group, as shown in fig. 19; the temporarily stored SCR is output to the decoding system control unit 2300 as stream playback data St63, as described with reference to fig. 26.
The access unit information register section includes an access unit block mode register CBM_reg, an access unit block type register CBT_reg, a seamless playback flag register SPF_reg, an interleave allocation flag register IAF_reg, an STC reset flag register STCDF_reg, a seamless angle switching flag register SACF_reg, an access unit head VOBU start address register C_FVOBU_SA_reg, and an access unit end VOBU start address register C_LVOBU_SA_reg.
The access unit block mode register CBM_reg indicates whether a plurality of access units form one functional block; if they do not, its value is recorded as "N_BLOCK". When access units do form one functional block, "F_CELL" is recorded for the first access unit of the block, "L_CELL" for the last, and "BLOCK" for the access units in between.
The access unit block type register CBT_reg records the type of the block indicated by the access unit block mode register CBM_reg: "A_BLOCK" in the multi-angle case, "N_BLOCK" otherwise.
The seamless playback flag register SPF_reg records whether the access unit is to be connected to the previously played access unit or block and played back seamlessly: the value "SML" is recorded for a seamless connection with the preceding unit or block, and "NSML" otherwise.
The interleave allocation flag register IAF_reg records whether the access unit is allocated in an interleaved area: "ILVB" if so, "N_ILVB" if not.
The STC reset flag register STCDF_reg records whether the STC (system time clock) used for synchronization must be reset when the access unit is played back: "STC_RESET" if resetting is necessary, "STC_NRESET" if not.
The seamless angle switching flag register SACF_reg records whether the access unit belongs to an angle section and is to be switched seamlessly: "SML" if so, "NSML" if not.
The access unit head VOBU start address register C_FVOBU_SA_reg records the start address of the first VOBU of the access unit. The value expresses the distance from the logical sector at the head of the VTS title VOBS (VTSTT_VOBS) as a number of sectors.
The access unit end VOBU start address register C_LVOBU_SA_reg records the start address of the last VOBU of the access unit. This value, too, expresses the distance from the logical sector at the head of the VTS title VOBS (VTSTT_VOBS) as a number of sectors.
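As a compact summary (a transcription aid, not a structure taken from the patent), the access unit information registers and their value vocabularies can be written as:

```python
# Sketch of the access unit information registers of fig. 47 and the value
# vocabulary described above. Names follow the text; the class is illustrative.
from dataclasses import dataclass

@dataclass
class CellInfoRegisters:
    CBM_reg: str      # "F_CELL" | "BLOCK" | "L_CELL" | "N_BLOCK"
    CBT_reg: str      # "A_BLOCK" (multi-angle) | "N_BLOCK"
    SPF_reg: str      # "SML" (seamless with predecessor) | "NSML"
    IAF_reg: str      # "ILVB" (in interleaved area) | "N_ILVB"
    STCDF_reg: str    # "STC_RESET" | "STC_NRESET"
    SACF_reg: str     # "SML" (seamless angle switching) | "NSML"
    C_FVOBU_SA_reg: int   # sectors from head of VTSTT_VOBS to first VOBU
    C_LVOBU_SA_reg: int   # sectors from head of VTSTT_VOBS to last VOBU

# e.g. the middle cell of an interleaved multi-angle block, switched seamlessly:
cell = CellInfoRegisters("BLOCK", "A_BLOCK", "SML", "ILVB",
                         "STC_NRESET", "SML", 1024, 2048)
```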
Next, the decoding table of fig. 48 is explained. As shown in the figure, the decoding table consists of a non-seamless multi-angle information register section, a seamless multi-angle information register section, a VOBU information register section, and a seamless playback register section.
The non-seamless multi-angle information register section includes NSML_AGL_C1_DSTA_reg through NSML_AGL_C9_DSTA_reg, which record NSML_AGL_C1_DSTA through NSML_AGL_C9_DSTA of the PCI packet shown in fig. 20.
The seamless multi-angle information register section includes SML_AGL_C1_DSTA_reg through SML_AGL_C9_DSTA_reg.
SML_AGL_C1_DSTA_reg through SML_AGL_C9_DSTA_reg record SML_AGL_C1_DSTA through SML_AGL_C9_DSTA of the DSI packet shown in fig. 20.
The VOBU information register section contains a VOBU end address register VOBU_EA_reg.
The VOBU end address register VOBU_EA_reg records VOBU_EA of the DSI packet shown in fig. 20.
The seamless playback register section includes an interleaved unit flag register ILVU_flag_reg, a unit end flag register UNIT_END_flag_reg, an ILVU end data group address register ILVU_EA_reg, a next interleaved unit start address register NT_ILVU_SA_reg, a VOB first video frame display start time register VOB_V_SPTM_reg, a VOB last video frame display end time register VOB_V_EPTM_reg, an audio playback stop time 1 register VOB_A_GAP_PTM1_reg, an audio playback stop time 2 register VOB_A_GAP_PTM2_reg, an audio playback stop length 1 register VOB_A_GAP_LEN1_reg, and an audio playback stop length 2 register VOB_A_GAP_LEN2_reg.
The interleaved unit flag register ILVU_flag_reg indicates whether the VOBU is in an interleaved area: "ILVU" is recorded if it is, "N_ILVU" if it is not.
The unit end flag register UNIT_END_flag_reg records, when the VOBU is in an interleaved area, whether the VOBU is the last VOBU of its ILVU. Since the ILVU is the unit of continuous readout, "END" is recorded if the VOBU currently being read is the last VOBU of the ILVU, and "N_END" otherwise.
The ILVU end data group address register ILVU_EA_reg records, when the VOBU is in an interleaved area, the address of the last data group of the ILVU to which the VOBU belongs. The address is given as the number of sectors from the NV of the VOBU.
The next ILVU start address register NT_ILVU_SA_reg records, when the VOBU is in an interleaved area, the start address of the next ILVU, likewise as the number of sectors from the NV of the VOBU.
The VOB first video frame display start time register VOB_V_SPTM_reg records the time at which display of the first video frame of the VOB starts.
The VOB last video frame display end time register VOB_V_EPTM_reg records the time at which display of the last video frame of the VOB ends.
The audio playback stop time 1 register VOB_A_GAP_PTM1_reg records the time at which audio playback is stopped, and the audio playback stop length 1 register VOB_A_GAP_LEN1_reg records the length of time for which it remains stopped.
The same applies to the audio playback stop time 2 register VOB_A_GAP_PTM2_reg and the audio playback stop length 2 register VOB_A_GAP_LEN2_reg.
The operation of the DVD decoder DCD of the present invention, whose block diagram is shown in fig. 26, is explained below with reference to the decoder flow chart shown in fig. 49.
Step #310202 determines whether an optical disc has been inserted; if it has, the flow proceeds to step #310204.
In step #310204, the volume file information VFS of fig. 22 is read, and the flow proceeds to step #310206.
Step #310206 reads out the video management file VMG shown in fig. 22, extracts the VTS to be played back, and proceeds to step #310208.
Step #310208 extracts the video title set menu address information VTSM_C_ADT from the management table VTSI of the VTS, and proceeds to step #310210.
Step #310210 reads out the video title set menu VTSM_VOBS from the optical disc based on the VTSM_C_ADT information, and displays the title selection menu. The user selects a title from this menu; for titles that include them, the audio number, sub-picture number, and angle are input as well as the title itself. When the user's input is finished, the flow proceeds to step #310214.
Step #310214 extracts the VTS_PGCI#i corresponding to the title number selected by the user from the management table, and the flow proceeds to step #310216.
Playback of the PGC starts in the next step, #310216. When playback of the PGC ends, the decoding process also ends. If another title is to be played back afterwards, this can be handled by control such as returning to the title menu display of step #310210 when the user makes an input at the scenario selection unit.
The PGC playback of step #310216 is explained in more detail with reference to fig. 50. As shown, PGC playback step #310216 consists of steps #31030, #31032, #31034, and #31035.
Step #31030 sets the decoding system table of fig. 47. The angle number register ANGLE_NO_reg, VTS number register VTS_NO_reg, PGC number register VTS_PGCI_NO_reg, audio ID register AUDIO_ID_reg, and sub-picture ID register SP_ID_reg are set according to the user's operation of the scenario selection unit 210.
When the PGC to be played back is uniquely decided by the user's selection, the corresponding access unit information (C_PBI) is extracted and set in the access unit information registers. The registers set are CBM_reg, CBT_reg, SPF_reg, IAF_reg, STCDF_reg, SACF_reg, C_FVOBU_SA_reg, and C_LVOBU_SA_reg.
After the decoding system table is set, the process of transferring data to the stream buffer in step #31032 and the process of decoding the data in the stream buffer in step #31034 are started in parallel.
The data transfer process to the stream buffer of step #31032 is the transfer of data from the optical disc M to the stream buffer 2400 in fig. 26: the necessary data are read from the optical disc M and transferred to the stream buffer 2400 according to the title information selected by the user and the playback control information (navigation group NV) described in the data stream.
Step #31034, on the other hand, is the process of decoding the data in the stream buffer 2400 and outputting it to the video output terminal 3600 and the audio output terminal 3700 in fig. 26, i.e. the process of decoding and playing back the data stored in the stream buffer 2400. Steps #31032 and #31034 run in parallel.
Step #31032 is described in more detail below. Its processing is performed in units of access units; when the processing of one access unit ends, step #31035 checks whether the processing of the whole PGC has ended. If it has not, the decoding system table for the next access unit is set in step #31030. This processing is repeated until the PGC ends.
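The parallelism of steps #31032 and #31034 is a producer/consumer arrangement around the stream buffer 2400. A hedged sketch, with a bounded queue standing in for the stream buffer and illustrative data units:

```python
# Sketch of steps #31030-#31035: per-access-unit transfer (producer) running
# in parallel with decoding (consumer). The queue stands in for stream buffer
# 2400; data units, sizes, and names are illustrative.
import queue
import threading

stream_buffer = queue.Queue(maxsize=64)   # bounded, like a real stream buffer
END = object()                            # sentinel marking end of the PGC

def transfer(access_units):               # step #31032, looped via #31035/#31030
    for unit in access_units:
        for pack in unit:                  # read the packs of one access unit
            stream_buffer.put(pack)        # blocks if the buffer is full
    stream_buffer.put(END)

def decode():                              # step #31034
    while (pack := stream_buffer.get()) is not END:
        pass                               # decode and output the pack here

pgc = [[f"AU{i}-pack{j}" for j in range(3)] for i in range(4)]
t = threading.Thread(target=transfer, args=(pgc,))
t.start()
decode()
t.join()
```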
Flow of decoding from stream buffer
Next, the decoding process in the stream buffer of step #31034 shown in fig. 50 will be described with reference to fig. 51.
Step #31034 is composed of step #31110, step #31112, step #31114 and step #31116 as shown.
Step #31110 is a process for transferring data in units of data groups from the stream buffer 2400 to the system decoder 2500 shown in fig. 26, and then the process proceeds to step #31112.
Step #31112 performs data transfer, distributing the data group data transferred from the stream buffer 2400 to each buffer, that is, the video buffer 2600, the sub-picture buffer 2700, and the audio buffer 2800.
Step #31112 compares the IDs of the audio and sub-picture streams selected by the user, that is, the audio ID register AUDIO_ID_reg and sub-picture ID register SP_ID_reg included in the scenario information register shown in fig. 47, with the stream ID and sub-stream ID in the packet header shown in fig. 19, distributes the matching packets to the corresponding buffers (video buffer 2600, sub-picture buffer 2700, and audio buffer 2800), and then proceeds to step #31114.
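The ID comparison of step #31112 amounts to a simple demultiplexer. In the sketch below the packet layout and the video stream ID value are assumptions (0xE0 is the usual MPEG program-stream video ID, not a value stated in the text); only the routing rule follows the description above.

```python
# An illustrative demultiplexer for step #31112; packet fields and the
# video stream ID constant are assumed for the sketch.
from collections import namedtuple

Packet = namedtuple("Packet", ["stream_id", "sub_stream_id", "payload"])

VIDEO_STREAM_ID = 0xE0   # assumed MPEG program-stream video ID

def route_packet(pkt, audio_id_reg, sp_id_reg, video_buf, sp_buf, audio_buf):
    """Distribute one packet to the buffer matching its stream IDs."""
    if pkt.stream_id == VIDEO_STREAM_ID:
        video_buf.append(pkt.payload)            # video buffer 2600
    elif pkt.sub_stream_id == sp_id_reg:
        sp_buf.append(pkt.payload)               # sub-picture buffer 2700
    elif pkt.sub_stream_id == audio_id_reg:
        audio_buf.append(pkt.payload)            # audio buffer 2800
    # packets matching neither selected ID are simply skipped
```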
Step #31114 controls the decoding timing of each decoder (video decoder, sub-picture decoder, audio decoder), that is, performs synchronization processing between the decoders, and proceeds to step # 31116.
The synchronization process of each decoder of step #31114 will be described in detail below.
Step #31116 performs various basic decoding processes. That is, the video decoder reads out data from the video buffer and performs decoding processing. The sub-picture decoder reads out data from the sub-picture buffer and performs decoding processing in the same manner. The audio decoder reads out data from the audio buffer and performs decoding processing in the same manner. The decoding process ends, and step #31034 ends.
The previously described step #31114 will be described in more detail with reference to fig. 52.
Step #31114 is shown as being composed of step #31120, step #31122, and step #31124.
Step #31120 checks whether or not the connection between the preceding access unit and the current access unit is a seamless connection; if so, the routine proceeds to step #31122, and if not, it proceeds to step #31124.
Step #31122 performs synchronization processing for seamless connection.
On the other hand, step #31124 performs synchronization processing for non-seamless connection.
Video encoder
The material of the video data St1 input to the video encoder 300 of fig. 25 includes images shot on film. However, a bit stream recorded on a DVD or the like is premised on connection to a television set for home use. To make video sources easy to edit when encoding multimedia bit streams, a digital VTR is typically used to supply material to the authoring encoder of fig. 25. Since the frame rate of film is 24 frames per second, while the frame rate of video in NTSC home televisions and digital VTRs is 29.97 frames per second, film material is rate-converted by a frame rate conversion technique called telecine conversion to generate the image signal recorded on the digital VTR.
Next, embodiment 1 of the inverse telecine conversion circuit according to the present invention will be described with reference to fig. 39. Fig. 39 shows a detailed configuration of a video encoder 300A in which the inverse telecine conversion circuit according to the present invention is incorporated into the video encoder 300 of fig. 25.
The video encoder 300A is constituted by an input controller 302, frame memories 304 and 306, an inter-field differentiator 308, a threshold determiner 310, a telecine period determination circuit 312, a selector 314, and an encoding device 316.
The input controller 302 is connected to the edit information generation section 100 and the encoding system control section 200 shown in fig. 25 and receives the video signal St1 and the timing signal St9, respectively; when the video signal St1 is a telecine-converted image, the timing signal St9 includes inverse telecine conversion instruction information used as video encoder control data.
The telecine image RT1 is held in the frame memory 304 for 1 frame and is then input to the frame memory 306, the selector 314, and the inter-field differentiator 308 as a 1-frame delayed telecine image RT2.
The inter-field differentiator 308 accumulates the difference between the same-parity fields of the 1-frame delayed telecine image RT2 and the telecine image RT1 of the current frame (input from the input controller 302), and inputs the accumulated result as a difference value RT3 to the threshold determiner 310.
The threshold determiner 310 compares the difference value RT3 with a predetermined threshold and inputs a comparison result signal RT5 indicating the result to the telecine period determination circuit 312.
The telecine period determination circuit 312 internally generates period information RT6 from the comparison result signal RT5 and, based on the period information RT6, outputs to the selector 314 a selector control signal RT7 that controls the selector 314 so as to output the image for the current telecine period. The telecine period determination circuit 312 also outputs to the encoding device 316 a repeat 1st field flag RFF indicating whether or not a redundant field has been deleted from each frame, a top field 1st flag TFF indicating the display order of the 2 fields in a frame, and an output picture valid flag IEF indicating whether or not a frame input to the encoding device 316 is to be encoded.
The 1-frame delayed telecine image RT2 output from the frame memory 304 is held for 1 more frame in the frame memory 306, generating a 2-frame delayed telecine image RT4. This image RT4 is input to the selector 314.
Based on the 1-frame delayed telecine image RT2 output from the frame memory 304, the 2-frame delayed telecine image RT4 input from the frame memory 306, and the selector control signal RT7 input from the telecine period determination circuit 312, the selector 314 selects a top field and a bottom field from either the 1-frame delayed telecine image RT2 or the 2-frame delayed telecine image RT4 and generates an inverse telecine image RT8. This inverse telecine image is output to the encoding device 316.
The encoding device 316 compression-encodes the inverse telecine image RT8 input from the selector 314, together with the flags TFF, RFF, and IEF input from the telecine period determination circuit 312.
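The detection path of this circuit, same-parity field differencing followed by a threshold decision, can be sketched as follows. The pixel-array representation, the numpy usage, and the threshold value are assumptions made for the sketch; only the RT1/RT2 comparison and the high/low decision follow the text.

```python
# A simplified sketch of the detection path of video encoder 300A: the
# accumulated same-parity field difference (differentiator 308) compared
# against a threshold (determiner 310). Frames are (top_field, bottom_field)
# tuples of pixel arrays; the threshold value is assumed.
import numpy as np

THRESHOLD = 1000.0   # assumed; the patent only says "predetermined threshold"

def field_difference(field_a, field_b):
    """Accumulated absolute difference between two same-parity fields (RT3)."""
    return float(np.abs(field_a.astype(np.int32)
                        - field_b.astype(np.int32)).sum())

def is_copy_field(prev_frame, cur_frame, parity):
    """Comparison result RT5: True (Hi) when the same-parity fields of the
    1-frame delayed image RT2 and the current image RT1 match."""
    idx = 0 if parity == "top" else 1
    return field_difference(prev_frame[idx], cur_frame[idx]) < THRESHOLD
```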
Fig. 32 shows a movie material, the NTSC video signal (i.e., the telecine image) generated from it by telecine conversion, the inverse telecine image produced by inverse telecine conversion and encoded by the video encoder 300A incorporating the above inverse telecine conversion circuit, and the reproduced image obtained by decoding the encoded data.
Line 1 shows a movie image IF of 24 frames per second.
Line 2 shows the NTSC signal (i.e., the telecine image RT1) after telecine conversion of the movie image of line 1.
Line 3 shows the inverse telecine image RT8 obtained by inverse telecine conversion, in which redundant fields are detected and deleted when the telecine image of line 2 is video-encoded, together with the repeat 1st field flag RFF and the top field 1st flag TFF as flag data in video encoding. RFF indicates that the preceding field is reused as one field of the next playback frame in the temporal construction of the frame. TFF indicates that the temporally first field of the frame is the top field.
Line 4 shows a reproduced image IR (NTSC signal) obtained when the encoded data of the inverse telecine image of line 3 is video decoded.
Basically, as shown in fig. 32, telecine conversion converts the frame rate by periodically inserting redundant fields that copy a field of the same parity. The movie image IF is a 24-frames-per-second movie image; its top field F1t of frame F1 is copied and its bottom field F3b of frame F3 is copied, thereby converting the 4 frames F1 to F4 into the 5 frames F1' to F5' of the telecine image RT1.
When the telecine-converted video signal (i.e., the telecine image RT1) thus obtained is compression-encoded, encoding at the converted frame rate means encoding the copied redundant fields as well, which degrades efficiency. Therefore, the copied redundant fields are usually detected and deleted, that is, inverse telecine conversion is performed, before compression encoding, and a repeat 1st field flag RFF indicating whether or not a redundant field was deleted in each frame and a top field 1st flag TFF indicating the display order of the 2 fields in the frame are added and recorded.
Since the frame rate of film and the frame rate of video are not in a simple integer ratio, telecine conversion converts a portion corresponding to 4 movie frames into 5 video frames as shown in the figure, turning the rate of 24 frames per second into a rate of about 30 frames per second; occasionally a conversion pattern different from the usual one is inserted between the periodically processed frames. The telecine image thus changes regularly with a period of essentially 5 telecine frames, and this period is taken as the telecine period. The processing for obtaining the inverse telecine image from the telecine image changes for each telecine period, as shown in the sketch below.
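As a concrete illustration of the 4-to-5 pattern just described, the following sketch (not from the patent) builds the telecine frames F1' to F5' from film frames F1 to F4, copying F1's top field and F3's bottom field as in fig. 32; frames are modeled as (top, bottom) field tuples.

```python
# An illustrative 2:3 pulldown (telecine) of 4 film frames into 5 video
# frames, following the copy pattern of fig. 32.
def telecine_4to5(f1, f2, f3, f4):
    return [
        (f1[0], f1[1]),   # F1' = F1t, F1b
        (f1[0], f2[1]),   # F2' = copy of F1t, F2b  (redundant top field)
        (f2[0], f3[1]),   # F3' = F2t, F3b
        (f3[0], f3[1]),   # F4' = F3t, copy of F3b  (redundant bottom field)
        (f4[0], f4[1]),   # F5' = F4t, F4b
    ]
```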
Next, the operation of the above-described video encoder 300A incorporating the inverse telecine conversion circuit will be described with reference to the timing chart shown in fig. 42.
In the figure, line 1 shows the telecine image input RT1, the 1-frame delayed telecine image RT2, the difference value RT3, and the 2-frame delayed telecine image RT4, in this order.
Line 2 shows the output timing of the comparison result signal RT5.
Line 3 shows the period information RT6 for the telecine image. In the figure, the period information is represented as a state (state).
Line 4 shows the selector control signal RT7. Line 5 shows the inverse telecine image RT8 output. Line 6 shows the top field 1st flag TFF, the repeat 1st field flag RFF, and the output picture valid flag IEF.
State 0, the first cycle, starts at the time point when the input of frames F1' and F2' of the telecine image RT1 to the frame memories 304 and 306 has ended. The inverse telecine image RT8 is formed from the fields F1t and F1b of frame F1' of the telecine image, so TFF is set to 1; and since the top field of F2' is the same as F1t and this copy field is used when the next frame is played back, RFF is set to 1.
State 1 starts at the time point when the input of frames F2' and F3' of the telecine image RT1 to the frame memories 304 and 306 has ended. Since the inverse telecine image RT8 is formed from the bottom field F2b of F2' and the top field F2t of F3', giving a frame in which the bottom field leads in time, TFF is set to 0; and since no field is copied, RFF is set to 0.
State 2 starts at the time point when the input of frames F3' and F4' of the telecine image RT1 to the frame memories 304 and 306 has ended. Since the inverse telecine image RT8 is formed from the bottom field F3b of the telecine frame F3' and the top field F3t of F4', TFF is set to 0; and since the bottom field of F4' is the same as F3b and this copy field is used when the next frame is played back, RFF is set to 1.
State 3 starts at the time point when the input of frames F4' and F5' of the telecine image RT1 to the frame memories 304 and 306 has ended. The inverse telecine image RT8 is formed from F4t and F4b of the telecine frame F5', a frame in which the top field leads in time, so TFF is set to 1; and since no field is copied, RFF is set to 0.
State 4 starts at the time point when the input of frame F5' and the next cycle's frame F1' of the telecine image RT1 to the frame memories 304 and 306 has ended, but no inverse telecine image RT8 is generated in this state.
The inverse telecine image RT8 is generated and encoded by repeating states 0 through 4 as described above.
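The five states can be summarized as a small lookup table. The sketch below tabulates, for each state, the fields forming RT8 and the TFF/RFF/IEF values, following the description of fig. 42 above; the table form and the names are assumptions for illustration.

```python
# An illustrative summary of the telecine period state machine; the flag
# values per state follow the text, the representation is assumed.
# state: (fields forming RT8, TFF, RFF, IEF)
TELECINE_STATES = {
    0: (("F1t", "F1b"), 1, 1, 1),  # top leads; top field repeated on playback
    1: (("F2b", "F2t"), 0, 0, 1),  # bottom leads; no repeat
    2: (("F3b", "F3t"), 0, 1, 1),  # bottom leads; bottom field repeated
    3: (("F4t", "F4b"), 1, 0, 1),  # top leads; no repeat
    4: (None, None, None, 0),      # rest period: no output frame (IEF invalid)
}

def next_state(state: int) -> int:
    """The telecine period advances cyclically through the 5 states."""
    return (state + 1) % 5
```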
The conversion from the telecine image RT1 to the inverse telecine image RT8 shown in fig. 32 is the inverse telecine conversion: when the inter-field difference between consecutive top fields or between consecutive bottom fields is smaller than a predetermined threshold, the field is judged to be a copy field and is deleted. At the same time, the aforementioned RFF and TFF flags are generated as shown.
Using these flags at playback time, the original telecine image can easily be reproduced, as shown in the reproduced image IR. That is, since frame F1 of the inverse telecine image RT8 has TFF = 1, the top field F1t of F1 is output first and the bottom field F1b of F1 is output next; and since RFF = 1, the 1st field F1t is output again.
In frame F2, TFF = 0, so the bottom field F2b of F2 is output first and the top field F2t of F2 is output next; the re-output F1t and F2b together constitute a new frame. In frame F3, since TFF = 0, the bottom field F3b is output first, then the top field F3t; and since RFF = 1, the bottom field F3b is output again. In frame F4, since TFF = 1, the top field F4t is output first and the bottom field F4b next. In this way, the telecine image RT1 can be played back using the flags.
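The playback rule just walked through can be condensed into a field-output routine driven by the two flags. This is a sketch of the decoder-side reconstruction only; the function and variable names are assumptions.

```python
# An illustrative decoder-side reconstruction of the telecine image from
# TFF/RFF; each frame is a (top, bottom) tuple, fields are yielded in
# display order.
def emit_fields(frame, tff, rff):
    top, bottom = frame
    first, second = (top, bottom) if tff else (bottom, top)
    yield first
    yield second
    if rff:            # repeat 1st field: display the first field again
        yield first

# Example: frame F1 with TFF = 1, RFF = 1 yields F1t, F1b, F1t; the
# repeated F1t then pairs with the next frame's F2b on screen.
```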
In fig. 42, the telecine image RT1 is compared with the 1-frame delayed telecine image RT2 output from the frame memory 304 of fig. 39. Since F1t and F1t' of fig. 32 are copy fields, the threshold determiner 310 outputs high (Hi); since F1b and F2b of fig. 32 are not copy fields, the comparison result signal RT5, the output of the threshold determiner 310, goes low (Lo). At this time, the telecine period determination circuit 312 assumes a certain telecine cycle state, here state 0. Having judged state 0, the circuit controls the selector control signal RT7 to Lo so that the selector 314 selects the 2-frame delayed telecine image RT4 output from the frame memory 306 of fig. 39, sequentially outputting F1t and F1b of fig. 32 as the inverse telecine image RT8, while TFF = 1 and RFF = 1 are output at the same time.
In the next frame, since F1t' and F2t shown in fig. 32, and F2b and F3b, are not copy fields, the telecine period determination circuit 312 moves to the next state, state 1, and switches the selector 314 with the selector control signal RT7 so as to output the frame composed of F2t and F2b of fig. 32. Since the bottom field leads in such a frame, TFF is output as 0, and since the first field is displayed only once, RFF is output as 0.
Similarly, after the output up to F4t and F4b of fig. 32, the difference in frame rates means that the inverse telecine conversion circuit 300A has nothing to output for 1 frame time, and it pauses its output. To indicate this rest period, the telecine period determination circuit 312 sets the output picture valid flag IEF to the invalid state during this period.
When an inverse telecine image without such rest periods is required, that is, when encoding is to be performed at the frame rate after inverse telecine conversion, a FIFO memory or the like is used in a subsequent frame rate conversion stage: the memory is read out at the post-conversion frame rate, and encoding is then performed.
However, when it is desired to continuously play back a plurality of VOBs each generated by inverse telecine conversion, seamless reproduction at the connection point becomes a problem. To explain this problem simply, the following description is based on an example of parental lock control.
The states of the telecine conversion, the encoded image, and the reproduced image under the above parental lock control will be described with reference to figs. 40 and 41. Fig. 40 shows an example of parental-lock connections among the 3 VOBs VOBa, VOBb, and VOBc. Line 1 of fig. 41 shows the telecine image RT1 input to the video encoder 300A. Similarly, line 2 shows the signal St15 obtained by encoding, with the video encoder 300A, the inverse telecine image RT8 obtained by inverse telecine converting the telecine image RT1 shown in line 1; in the figure, the inverse telecine image is shown. Line 3 shows the playback image IR decoded from the video coding stream St15.
In this example, VOBa ending at frame F18 of the original telecine image, VOBb running from frame F19 to frame F44, and VOBc starting at frame F45 are obtained by inverse telecine converting and compression-encoding the originally continuous telecine image RT1 of line 1. Depending on the intended viewer, it may be necessary to skip VOBb and play back seamlessly from VOBa to VOBc. In this case, VOBa ends with RFF = 0 and TFF = 1 and VOBc starts with RFF = 0 and TFF = 1, as recorded in the encoded stream; when they are played back continuously, top fields become consecutive at the connection point between VOBa and VOBc, as shown in line 3. The operation of an MPEG decoder in this case cannot be guaranteed in general. In a DVD decoder, this corresponds to a case where 1 field is inserted and another deleted so as to make the playback picture consistent before and after, or, in the worst case, an irrelevant field is inserted. Even in the former case, asynchrony with the sound occurs. Therefore, completely seamless playback cannot be achieved.
To address this problem, the present invention performs the inverse telecine conversion so that the values of RFF and TFF at the start and end of each VOB become predetermined values when a plurality of logical recording sections (that is, VOBs) are provided on the same recording medium. The method of this inverse telecine conversion will be described in detail later with reference to figs. 43 and 44; the concept is briefly explained below.
At the start of a VOB, the flags RFF and TFF are fixed to the predetermined values and deletion of redundant fields is prohibited while the inverse telecine conversion proceeds; when the flags RFF and TFF generated from the actual redundant field detection results reach the predetermined values, deletion of redundant fields is started and the detected flag values are output from then on. In this way, the inverse telecine conversion is performed so that the flags RFF and TFF at the start of the VOB have the predetermined values.
In addition, to make the flags RFF and TFF at the end of each VOB take the predetermined values, means is provided for detecting in advance the positions of the redundant fields of the telecine image RT1 corresponding to the VOB and generating the flags RFF and TFF from the detection results. When the inverse telecine conversion and compression encoding are actually performed, deletion of the copied redundant fields is stopped from the frame nearest the end of the VOB, among the inverse telecine converted frames, at which the flags RFF and TFF take the predetermined values, so that the flags RFF and TFF at the end of the VOB have the predetermined values.
Alternatively, means for detecting whether or not the end of the telecine image RT1 corresponding to the VOB is approaching is provided; when it is judged that the end of the VOB is near, deletion of redundant fields is restricted, and the inverse telecine conversion is performed so that the flags RFF and TFF at the end of the VOB have the predetermined values.
By performing the inverse telecine conversion by these means, the values of the flags RFF and TFF at the start and end of each VOB are unified to the predetermined values, so that even when VOBs are reproduced continuously, bottom fields or top fields never become consecutive. Therefore, when a plurality of VOBs are played back continuously, seamless playback can be realized even at the VOB boundaries.
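The boundary rule just described can be condensed into a single predicate. The sketch below assumes the predetermined values TFF = 1, RFF = 0 used in the embodiment described next; the status inputs are hypothetical names introduced for illustration.

```python
# An illustrative predicate for the VOB boundary rule: redundant-field
# deletion is permitted only while it keeps each VOB starting and ending
# with the predetermined flag values (assumed here: TFF = 1, RFF = 0).
def may_delete_redundant_field(detected_tff, detected_rff,
                               at_vob_start, near_vob_end):
    target = (1, 0)                    # predetermined (TFF, RFF) values
    if at_vob_start:
        # deletion stays prohibited until detection first yields the target
        return (detected_tff, detected_rff) == target
    if near_vob_end:
        # past the last target-valued frame, deletion stops so the VOB
        # ends on a top-field-first frame with no repeated field
        return False
    return True                        # mid-VOB: delete redundant fields
```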
Next, another embodiment of the inverse telecine conversion circuit of the present invention, whose concept was described above, will be described with reference to fig. 45. Fig. 45 shows a detailed configuration of a video encoder 300B in which the inverse telecine conversion circuit according to the present invention is incorporated into the video encoder 300 of fig. 25. The video encoder 300B of this embodiment has the same basic configuration as the video encoder 300A shown in fig. 39, comprising frame memories 304 and 306, an inter-field differentiator 308, a threshold determiner 310, a telecine period determination circuit 312, a selector 314, and an encoding device 316. However, as shown, a VOB end detector 318 and a control signal fixing circuit 322 are added relative to video encoder 300A.
The VOB end detector 318 is connected to the edit information generation section 100 of the DVD encoder ECD and receives a time code input in synchronization with the video signal St1. Based on the video encode end time V_ENDTM (see fig. 29), an encoding parameter generated by the encoding system control unit 200, the VOB end detector 318 outputs a VOB end signal RT9 that goes high (Hi) at least several frames before the time code of the VOB end.
In this embodiment, the time code of the last frame of state 3 of the telecine period in the VOB is set, and the VOB end signal RT9 is output at the time this frame is input. When the position within the telecine period is not known from the time code, the VOB end signal RT9 may instead be output one period, that is, 5 frames, before the VOB end time code.
The control signal fixing circuit 322 is connected to the VOB end detector 318 and receives the VOB end signal RT9; at the same time, it is connected to the telecine period determination circuit 312 and receives the top field 1st flag TFF, the repeat 1st field flag RFF, and the output picture valid flag IEF. Based on the VOB end signal RT9, the control signal fixing circuit 322 controls the selector control signal RT7, the top field 1st flag TFF, the repeat 1st field flag RFF, and the output picture valid flag IEF, and outputs a 2nd selector control signal RT7', a 2nd top field 1st flag TFF', a 2nd repeat 1st field flag RFF', and a 2nd output picture valid flag IEF'.
The selector 314 is connected to the control signal fixing circuit 322 and receives the 2nd selector control signal RT7'; similarly, the encoding device 316 is also connected to the control signal fixing circuit 322 and receives the 2nd top field 1st flag TFF', the 2nd repeat 1st field flag RFF', and the 2nd output picture valid flag IEF'.
After the VOB end signal goes high (Hi), when the control signal fixing circuit 322 detects TFF = 1 and RFF = 0, it holds the picture before encoding in the state TFF = 1, RFF = 0, and the frames of the input telecine image RT1 are encoded as they are. That is, TFF' = 1, RFF' = 0, IEF' = 1, and RT7' = 1 are fixed, and further deletion of redundant fields is prohibited. Since the changes of RT7 and IEF are synchronized with RFF and TFF, fixing only RFF and TFF would suffice.
That is, in the video encoder 300B of the present embodiment, unlike the video encoder 300A, the VOB end detector 318 and the control signal fixing circuit 322 detect the VOB end in the video stream St1 on the basis of the time code included in the video stream St1 and the encoding parameter St9 from the encoding system control section 200, and control the selector 314 and the encoding device 316 accordingly. Deletion of redundant fields can thus be controlled more precisely, so the inverse telecine conversion processing can be performed more efficiently and accurately.
Next, the inverse telecine conversion method of the video encoder 300B will be described with reference to figs. 43 and 44. Lines 1 to 3 of figs. 43 and 44 are the same as in figs. 40 and 41, which showed the inverse telecine conversion timing already described, so their description is omitted; in addition, line 5 shows the VOB end detection signal RT9. Box GF1 shows the end period of VOBa, box GF2 the start period of VOBb, and box GF3 the end period of VOBb.
First, consider the inverse telecine conversion of the telecine image RT1. The redundant fields near the end of VOBa, which ends at frame F18 of the original telecine image RT1, are detected in advance, so where redundant fields are contained is known. If the telecine image RT1 were inverse telecine converted as it is, the RFF' and TFF' shown in fig. 40 would be generated. Among the frames of VOBa with TFF' = 1 and RFF' = 0, the frame nearest the end is frame F12' of fig. 41; therefore, if deletion of redundant fields is prohibited from that frame onward, in the section indicated by box GF1, a VOBa that reliably ends in a bottom field is obtained, as shown in the reproduced image IR of fig. 44.
Next, consider the start of VOBb. At the start of VOBb, deletion of the actual redundant fields is prohibited and the flags are output fixed at TFF' = 1 and RFF' = 0; once the redundant field check yields TFF' = 1 and RFF' = 0, deletion of redundant fields is started. The section in box GF2 is this redundant field deletion prohibition period.
At the end of VOBb, the same processing as at the end of VOBa is performed; that is, redundant fields are not deleted during the period indicated by box GF3.
Since TFF' = 1 and RFF' = 0 hold from the start of VOBc, the mode of deleting redundant fields is entered immediately.
As described above, each VOB access unit is generated so that TFF' = 1 and RFF' = 0 at the end of VOBa, at the start and end of VOBb, and at the start of VOBc. Therefore, whether VOBa → VOBb → VOBc is played back continuously or VOBb is skipped and VOBa → VOBc is played back continuously, no field discontinuity occurs and seamless reproduction can be ensured.
Next, the operation of the video encoder 300B of embodiment 2 of the inverse telecine conversion circuit according to the present invention will be described in more detail with reference to the timing chart shown in fig. 46. This timing chart adds, to the timing chart of the video encoder 300A shown in fig. 42, the VOB end signal RT9, the 2nd selector control signal RT7', the 2nd top field 1st flag TFF', the 2nd repeat 1st field flag RFF', and the 2nd output picture valid flag IEF', and it clearly shows the relationship between the original flags and each 2nd flag based on the VOB end signal RT9 already described with reference to fig. 43.
The figure shows the case where the inverse telecine stop signal RT9, based on a time code or the like, is input at the timing of F4t in the telecine image input. The operation until the inverse telecine stop signal is input is the same as that described with reference to fig. 42.
The inverse telecine operation is stopped from the frame period of state 3 in which, after the inverse telecine stop signal RT9 is input, TFF' = 1 and RFF' = 0 are output for the first time; from then on, the input telecine image is output as it is. Thus, regardless of the position at which encoding stops, the access unit ends at a frame in which the top field leads, and seamless playback when a plurality of VOBs are played back continuously can be ensured.
Since those skilled in the art can readily realize the VOB end detector 318 and the control signal fixing circuit 322 described with reference to figs. 43, 44, 45, and 46, whether in software such as a program or in electric circuitry, so as to perform the operations described, a detailed description of their configuration is omitted here.
In the present embodiment, the end of the VOB is detected using the time code, but it can also be realized by a method such as counting the number of frames, and the effect of the present invention holds in that case as well. Also, although the VOB here ends in the state TFF' = 1 and RFF' = 0, another state may be used as long as it is common at the boundaries of the plurality of VOBs; no problem arises with respect to the telecine period.
The embodiment described above performs the inverse telecine conversion processing and the video encoding processing sequentially in a single pass over the telecine image input. As a 2-pass method, the telecine conversion period and the flags TFF and RFF may be detected in a 1st pass, and TFF' and RFF' generated and the video encoding performed, as shown in fig. 43, in a 2nd pass. The processing in this case can be realized by adding, to the telecine period determination circuit 312 of the video encoder shown in fig. 39, a memory that stores the telecine period information of the VOB to be encoded.
That is, in the 1st pass, the inverse telecine conversion processing is performed sequentially on the telecine image input, the resulting telecine period information is stored in the telecine period determination circuit 312, and the 1st pass ends. Next, the telecine period determination circuit 312 examines the TFF and RFF of each inverse telecine converted frame, tracing back frame by frame from the encoding end time; when a state with TFF = 1 and RFF = 0 is detected, it sets the TFF and RFF of the frames from that frame to the end of the VOB to TFF' = 1 and RFF' = 0.
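The back-tracing step of this 2-pass method can be sketched as follows, assuming the pass-1 result is kept as a list of per-frame (TFF, RFF) pairs, a representation chosen here purely for illustration.

```python
# An illustrative pass-2 flag fix-up: rewrite the flags from the last
# (TFF = 1, RFF = 0) frame to the VOB end so the VOB ends in that state.
def fix_flags_for_vob_end(flags):
    """flags: list of per-frame (tff, rff) pairs, the pass-1 result."""
    fixed = list(flags)
    # trace back from the encoding end time to the last (1, 0) frame
    for i in range(len(fixed) - 1, -1, -1):
        if fixed[i] == (1, 0):
            for j in range(i, len(fixed)):
                fixed[j] = (1, 0)   # TFF' = 1, RFF' = 0 up to the VOB end
            break
    return fixed
```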
As an embodiment different from the 2-pass embodiment above, when the telecine period information of the telecine-converted image to be encoded is already known, that period information may be input to the telecine period determination circuit 312 before the encoding process; the circuit then performs the TFF/RFF conversion processing described above and supplies the converted TFF' and RFF' to the processing corresponding to the 2nd pass, whereby the same effect is obtained.
The video encoder including the above inverse telecine conversion circuit corresponds to the video encoder 300 of fig. 25. The video encoding process of step #1900 of fig. 34B is performed based on the video encode start time V_STTM and the video encode end time V_ENDTM, and on whether inverse telecine conversion processing is set in the video encode mode V_ENCMD among the encoding parameters shown in fig. 29, which are set in step #1800, a subroutine of the encoder flowchart shown in fig. 34.
As described above, according to the present invention, even when access units are reproduced continuously, bottom fields or top fields never become consecutive, so seamless reproduction is possible even at access unit boundaries.
Industrial applicability of the invention
As described above, the method and apparatus for interleaved recording and playback of a bit stream on a medium according to the present invention are suitable for composing titles from bit streams carrying various kinds of information and for editing those bit streams according to user requests to compose new titles, and they are further suitable for the digital video disc system, the so-called DVD system, developed in recent years.
Claims (7)
1. A method of generating a recording bitstream (IR) from an input bitstream (RT1), wherein said input bitstream comprises a plurality of video objects, each of said plurality of Video Objects (VOBs) comprising video data representing a video signal, said video data comprising one or more video frames having at least a top field and a bottom field, said method of generating said recording bitstream comprising:
converting said input bitstream (RT1) into an intermediate bitstream (RT8) by deleting redundant fields repeated in said video data, and adding information comprising, for each video frame, a 1st flag (TFF) indicating whether the 1st field of the corresponding video frame is a top field or a bottom field, and a 2nd flag (RFF) indicating whether said 1st field is displayed a plurality of times during video display, and
supplying said recording bit stream (IR) by compression encoding of said intermediate bit stream (RT8),
characterized in that
in the step of converting the input bitstream (RT1) into the intermediate bitstream (RT8), deletion of the repeated redundant fields of the video data is selectively prohibited and performed in such a manner that the 1st field of each video object is a top field and the last field is a bottom field.
2. The bit stream generation method as claimed in claim 1,
in the step of converting said input bitstream (RT1) into said intermediate bitstream (RT8), the deletion of repeated redundant fields of said video data is performed in such a way that a first flag (TFF) corresponding to a first video frame and a last video frame of each video object indicates that a first field is a top field and a second flag (RFF) corresponding to a last video frame of each video object indicates that said first field is not displayed multiple times.
3. The bit stream generation method as claimed in claim 1,
in the step of converting said input bitstream (RT1) into said intermediate bitstream (RT8), the deletion of repeated redundant fields of said video data is performed in such a way that a first flag (TFF) corresponding to a first video frame of a video object indicates that the first field is a top field, a first flag (TFF) corresponding to a last video frame of said video object indicates that the first field is a bottom field, and a second flag (RFF) corresponding to a last video frame of said video object indicates that said first field is displayed a plurality of times during video display.
4. An apparatus for generating a recording bitstream from an input bitstream, said input bitstream comprising a plurality of video objects, said plurality of Video Objects (VOBs) comprising video data representing a video signal, said video data comprising 1 or more video frames having at least a top field and a bottom field,
characterized in that
the device comprises:
storage means (304, 306) for storing a plurality of fields of the video signal,
means (308) for comparing fields of the same parity in said storage means (304, 306),
means (310) for deleting redundant fields in accordance with the output of said comparing means (308),
means (314, 316) for generating an intermediate signal from the video signal representing the deleted redundant fields,
flag generating means (312) for generating a first flag (RFF) indicating whether a field is deleted and a second flag (TFF) indicating whether the first field is a top field in a resulting frame,
means (316) for generating a recording signal based on the intermediate signal, the first marks and the second marks,
a controller (318, 322) for controlling the recording signal generating means and the intermediate signal generating means such that the first field of each video object is a top field and the last field of each video object is a bottom field.
5. An apparatus for generating a recording bitstream from an input bitstream according to claim 4, further comprising means (312) for scanning in advance the positions of redundant fields of a video signal corresponding to a Video Object (VOB) and generating a flag according to the result,
characterized in that
during the actual signal conversion, removal of redundant fields is interrupted at frames near the end of the video object (VOB end), said frames being frames of the intermediate signal at the end of the Video Object (VOB) at which said flags take a specific value, and said video signal is converted into said recording signal so that said flags take the specific value at the end of the Video Object (VOB).
6. The apparatus for generating a recording bit stream from an input bit stream as claimed in claim 4,
further comprising detecting means (318) for detecting the end of the video signal corresponding to the video object,
characterized in that
during the actual signal conversion, removal of redundant fields is interrupted at frames near the end (VOB end) of the video object, said frames being frames of the intermediate signal at the end (VOB end) of the video object at which said flags take a specific value, and said video signal is converted into said recording signal so that said flags take the specific value at the end (VOB end) of the video object.
7. The apparatus for generating a recording bit stream from an input bit stream as claimed in claim 5,
when a video frame of the video signal contains a field to be removed and the first field is a bottom field, the control means (322) decides whether or not the video frame is the video frame closest to the end of the video signal, and, if it is, controls the video data generation means to stop removing removable fields from the remaining video signal.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP7/252733 | 1995-09-29 | | |
| JP25273395 | 1995-09-29 | | |
| PCT/JP1996/002799 WO1997013362A1 (en) | 1995-09-29 | 1996-09-27 | Method and device for encoding seamless-connection of telecine-converted video data |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1016793A1 (en) | 1999-11-05 |
| HK1016793B (en) | 2004-04-02 |