HU220823B1

HU220823B1 - Method and apparatus for identification of original picture cycles

Info

Publication number: HU220823B1
Application number: HU9801739A
Authority: HU
Inventors: Klaus Schulze
Original assignee: Klaus Schulze
Priority date: 1998-07-31
Filing date: 1998-07-31
Publication date: 2002-05-28
Also published as: HUP9801739A3; HU9801739D0; HUP9801739A2

Abstract

The identification method uses a video decoder (2) whose input is fed from a receiver (4) and whose output is coupled to a DCT transformer (5). The transformer is connected to an image rotator (7), which feeds a correlator (9) linked to a reference memory (10). The correlator is connected to a coarse identification stage (11). A high-resolution identification stage (13) is coupled to a FIFO memory (12). The system bases its comparison on quasi-stochastic characteristics of an image sequence.

Description

The description comprises 18 pages (including 11 sheets of drawings).

HU 220 823 B1

The invention relates to a method and an apparatus for recognizing original image sequences, such as advertisements, in which features relating to brightness are derived from individual images of the image sequences, digitized, and compared with a reference pattern.

Image sequences that consist of several consecutive individual images related in content have to be recognized, for example on television, at every new broadcast. These image sequences may be advertisements, paid announcements, old films or video clips, or political messages such as election advertisements. In each of these applications, the broadcast of the image sequences must be registered for legal or statistical reasons because of the fees payable. The term original (unique) image sequence here means that the image content is preserved in its original form, i.e. the brightness and colour of all pixels are unique and unchanged. When copying with high-quality technical equipment, no characteristic value of the image changes. The designation "original" therefore also applies to copied image sequences.

Television companies broadcast advertisements at times that are particularly important to the advertiser because of viewing figures. It is in the advertiser's interest to check that the advertisement is actually broadcast at the right time. An advertisement changes over its lifetime: it becomes shorter, and some images are altered or replaced entirely. The new image sequence must be distinguished from the original version.

A method for recognizing original image sequences is known from DE 43 09 957 C1. In that method, the individual picture elements, the so-called pixels, are combined into groups or blocks called clusters; values characterizing their brightness (luminance) are digitized and compared with the corresponding features of known images or reference patterns.

Because of the required data compression (data reduction), the probability of false image recognition is relatively high. To reduce the number of false recognitions, the known method scans several consecutive images. The drawback is that consecutive images are usually very similar, so the features found often produce chance similarities even though the images are not actually alike. Although these so-called random false recognitions do not in themselves necessarily impair the recognition of an advertisement, they greatly increase the resulting volume of data, which burdens the computer.

The aim of the invention is to reduce the volume of data generated and the number of false recognitions.

This object is achieved by the method according to the invention in that the features are produced in decorrelated form by a quasi-stochastic method applied to several images.

Because the features are generated not from adjacent images but from several images according to a quasi-stochastic procedure, chance similarities between successive images become negligible, since the links between successive images are broken.

As a result, the number of random false recognitions decreases, and with it the volume of data burdening the computer.

Preferably, the selected features of the individual images are written in a defined order into a memory of an image rotator organized as a shift register, and are read out with random access according to a quasi-stochastic procedure. The features are accessed with the largest possible jumps between images and the largest possible distances within images. These measures decorrelate the features, since the temporal and spatial links are broken.
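The write-in-order, read-by-pattern behaviour described above can be sketched as follows. This is a minimal illustration only: the class name `ImageRotator`, its parameters, and the use of a seeded shuffle as the read pattern are assumptions, not taken from the patent. In particular, a shuffle does not guarantee the largest possible jumps between and within images; it merely stands in for a fixed quasi-stochastic rule.

```python
import random
from collections import deque


class ImageRotator:
    """Hypothetical shift-register store for the feature bits of the last `depth` images."""

    def __init__(self, depth, bits_per_image, picks, seed=0):
        # Shift register: when full, pushing a new image drops the oldest one.
        self.frames = deque(maxlen=depth)
        # Fixed read pattern of (image slot, bit position) pairs.
        # Assumption: a seeded shuffle stands in for the quasi-stochastic rule.
        slots = [(f, b) for f in range(depth) for b in range(bits_per_image)]
        random.Random(seed).shuffle(slots)
        self.pattern = slots[:picks]

    def push(self, bits):
        """Write the feature bits of the next image in arrival order."""
        self.frames.append(list(bits))

    def read(self):
        """Read features by the fixed pattern; None until the register is full."""
        if len(self.frames) < self.frames.maxlen:
            return None
        return [self.frames[f][b] for f, b in self.pattern]
```

Because the pattern is fixed by the seed, the same positions are read for the broadcast signal and for the reference pattern, which is what makes the two feature vectors comparable.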

It is also advantageous to carry out two-stage processing, the first stage being coarse recognition and the second stage fine recognition. In coarse recognition, the volume of data is reduced to a level suitable for real-time processing. This makes it possible, for example, to recognize an advertisement, although any changes (mutations) cannot be determined at this stage. Fine recognition is performed only if coarse recognition has assigned an image sequence or advertisement to an available reference pattern. Fine recognition proves the match and reveals local and temporal changes.

Expediently, the features are formed from the variation of brightness within a cluster of spatially contiguous pixels.

The clusters are subjected to a discrete cosine transform. One of the low-frequency AC coefficients is reduced to its sign, and this sign is used as the feature. An image sequence is divided into time slices of constant length, each of which is correlated as an independent unit. The images of a time slice are thereby encoded with a particularly small amount of data. An image sequence is considered recognized when the individual time slices have been recognized in the correct order and at the correct temporal spacing.

The invention also relates to an apparatus suitable for carrying out the method.

DE 43 09 957 C1 also discloses an apparatus for carrying out a method for recognizing original image sequences. The drawback of that apparatus is that the features can be generated only from successive images, so the features are correlated, which causes many false recognitions.

The invention aims to provide an apparatus that generates the features quasi-stochastically.

In the apparatus according to the invention, an image rotator containing a memory organized as a shift register is connected, on the one hand, via a DCT transformer to a video decoder equipped with a raster/cluster converter and, on the other hand, via a correlator to a reference memory, such that an image sequence fed from a receiver into the video decoder can be passed to the correlator as a feature vector and compared with a reference pattern storable in the reference memory.

The image rotator provides a simple way of generating the features quasi-stochastically, in the course of which the features of the different images are decorrelated.

In a preferred embodiment of the apparatus, a branch is provided between the DCT transformer and the image rotator, leading to a fine-recognition memory implemented as a FIFO (first in, first out) shift register capable of storing all features of an image sequence.

Using the FIFO shift register makes it possible to perform fine recognition only after the coarse recognition carried out in real time.

The invention is described in more detail below with reference to exemplary embodiments and drawings. In the drawings:

Figure 1: block diagram of the apparatus for recognizing original image sequences;

Figure 2: schematic of advertising slots with an image sequence divided into time slices;

Figure 3: schematic of the images of a single time slice with the features;

Figure 4: the feature vector of the time slice of Figure 3, represented as a two-level signal;

Figure 6: example of a local change within an image, in which one letter of a caption has been changed;

Figure 7: representation of spectral reference patterns in the form of the basis images of the lowest-order coefficients;

Figure 8: variants of individual coefficients of a discrete cosine transform;

Figure 9: compression (data reduction) of a luminance signal according to the CCIR standard;

Figure 10: amplitude distribution density of the coefficients of an arbitrary image without downsampling;

Figure 11: amplitude distribution density of the C₀₁ coefficient of an arbitrary image with downsampling;

Figure 12: degree of chance agreement of image sequences with reference patterns, compared with the theoretically expected binomial distribution;

Figure 13: schematic of an image rotator;

Figure 14: distribution of the chance agreement of binary patterns for various decorrelation measures; and

Figure 15: enlarged detail of Figure 14, illustrating the effect of the decorrelation measures above the recognition threshold.

In Figure 1, the apparatus 1 for recognizing original image sequences contains a video decoder 2 equipped with a raster/cluster converter, whose input 3 is connected to the video output of a receiver 4. The output of the video decoder 2 is connected to the input of a DCT transformer 5 (DCT = discrete cosine transform). The output of the DCT transformer 5 is connected to the input 6 of an image rotator 7. The output 8 of the image rotator 7 is connected to the input of a correlator 9. A reference memory 10 and an evaluation unit 11 for coarse recognition are connected to the correlator 9. Between the DCT transformer 5 and the image rotator 7 there is a FIFO shift register 12, which is connected to an evaluation unit 13 for fine recognition. In the first step of signal processing, the evaluation unit 11 performs coarse recognition. If, during coarse recognition of an image sequence 14, the evaluation unit 11 finds such a high similarity with a reference pattern stored in the reference memory 10 that the sequence is, with high probability (approximately > 90%), the sought advertisement or image sequence 14, the evaluation unit 13 for fine recognition receives a recognition signal from the evaluation unit 11 and, in a second processing step, begins processing the data stored in the FIFO shift register 12. Fine recognition proves the match and reveals local and temporal changes (mutations).
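The hand-off from coarse to fine recognition described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names `agreement` and `recognize`, the `fine_check` callback, and the use of a bit-agreement fraction with a 0.9 threshold are hypothetical. The patent speaks of a probability of roughly 90% that the sought sequence is present, which is not necessarily the same quantity as the agreement fraction used here.

```python
def agreement(a, b):
    """Fraction of positions at which two equal-length bit lists agree."""
    if len(a) != len(b):
        raise ValueError("patterns must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def recognize(coarse_bits, references, fifo, fine_check, threshold=0.9):
    """Run the coarse stage against every reference; run the fine stage only on a hit.

    coarse_bits: reduced feature vector of the current time slice
    references:  dict mapping a reference name to its stored bit pattern
    fifo:        the complete feature record buffered for the fine stage
    fine_check:  hypothetical callback performing the slower fine comparison
    """
    for name, ref_bits in references.items():
        if agreement(coarse_bits, ref_bits) > threshold:
            # Coarse hit: only now is the buffered full record examined.
            return name, fine_check(name, fifo)
    return None, None
```

The point of the design is that the expensive comparison runs only on the small fraction of slices that pass the cheap real-time test.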

Because of temporal changes (shortening or altering a scene), the entire image sequence 14 must be encoded. For this purpose it is expedient to divide the image sequence shown in Figure 2 into time slices 15 of, for example, 2 seconds. The time slices 15 sketched in Figure 3 are independent units whose correlation is examined separately. An image sequence is recognized (coarsely recognized) when the evaluation unit 11, 13 recognizes the individual time slices in the correct order and at the correct temporal spacing. If there is a temporal change, some time slices 15 are missing, but the rest can be recognized in the expected order. The length of about 2 seconds has proved advantageous in numerous experiments, but it can be adjusted to the length of the advertisements or image sequences 14; that is, the length of the time slices 15 is chosen so that the length of the advertisements or image sequences 14 can be divided into time slices 15 without remainder. If, for example, the shortest advertisement is 7 seconds long, time slices 15 of 1.75 seconds are used. Otherwise, the remainder that cannot be divided further is disregarded in the correlation.
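The choice of slice length can be illustrated with a short calculation. The rounding rule below (take the smallest number of slices whose length does not exceed the 2-second target) is an assumption; the text gives only the single example of a 7-second advertisement and 1.75-second slices, which this rule reproduces. The function names are hypothetical.

```python
import math


def slice_count(duration_s, target_s=2.0):
    """Number of equal slices so that each slice is at most target_s long."""
    return max(1, math.ceil(duration_s / target_s))


def slice_length(duration_s, target_s=2.0):
    """Slice length that divides duration_s without remainder."""
    return duration_s / slice_count(duration_s, target_s)


print(slice_length(7.0))   # 1.75
print(slice_length(30.0))  # 2.0
```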

Local changes, i.e. changes within an image 16 of the image sequence 14, extend over a region that can be expressed in pixels. In the example shown in Figure 6, one letter of a caption has changed. The significant changes extend over an area of 32 x 32 pixels. For fine recognition, i.e. for detecting possible local changes, each image is divided into areas of approximately that size, which must be correlated (compared) as independent units with the corresponding areas of other images in the evaluation unit 13. To encode a group of pixels, i.e. a cluster, the varying component of the brightness signal (luminance signal) within the cluster is used. Put quite generally, this varying component indicates the change in brightness within the cluster, in contrast to the DC component, which corresponds to the mean brightness. The change in brightness can take many forms: it may be a simple change in a particular direction (gradient) or a multiple light-dark alternation. Expediently, the discrete cosine transform performed by the DCT transformer 5 is used to detect the light-dark alternation. The discrete cosine transform is also used in known compression algorithms (e.g. JPEG, MPEG), and integrated circuits suitable for performing it are manufactured (e.g. Zoran 36050).

In these image compression methods, the image is divided into clusters (blocks) of 8x8 pixels (raster-to-cluster conversion). A two-dimensional discrete cosine transform is then performed on the clusters, yielding a frequency-domain representation of the images. In principle, the essence of the compression is that the high-frequency components are greatly reduced or even omitted. The spectral reference patterns belonging to the individual DCT coefficients are the so-called basis images, the first few of which are shown in Figure 7. The top-left basis image (coefficient C₀₀), applied to an 8x8-pixel cluster, yields the DC component, i.e. the mean brightness of the cluster. The top-right basis image (coefficient C₀₁) indicates the extent to which the brightness profile shown is present in the cluster under examination. This basis image thus supplies the lowest spectral line of the image information after the DC component. The other basis images in Figure 7 supply the corresponding information in the other direction of the image. The basis images belonging to the higher-order DCT coefficients are not shown, as they are not relevant to the invention. The discrete cosine transform has the property that the essential information, which in the original domain was distributed over all sample values, is concentrated in a few components after the transform. This means that the essential energy components appear in the so-called DC coefficient and in the lower AC coefficients (low-frequency alternating components).
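To make the roles of C₀₀ and C₀₁ concrete, the following sketch computes a two-dimensional DCT-II of one cluster in plain Python. It is an illustration, not the circuit described in the patent: the orthonormal scaling and the convention that the second index counts horizontal frequency (matching the "top-right" basis image for C₀₁) are assumptions, and the function name `dct2` is hypothetical.

```python
import math


def dct2(block):
    """Orthonormal two-dimensional DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * s
    return out


# A uniformly grey cluster has only a DC component.
flat = dct2([[100] * 8 for _ in range(8)])
# A cluster that brightens from left to right puts energy into C01, none into C10.
ramp = dct2([[x for x in range(8)] for _ in range(8)])
```

With this scaling, C₀₀ of an 8x8 cluster equals eight times its mean brightness, and a purely horizontal brightness ramp leaves C₁₀ at zero up to rounding error.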

Figure 8 shows the decrease in energy at the higher-order AC coefficients. The energy share of, for example, the C₀₁ coefficient can advantageously be roughly doubled by applying horizontal and vertical decimation. By this we mean combining four pixels arranged in a square into one new pixel by averaging. This procedure is also called downsampling. In this way, an 8x8-pixel cluster is created from a cluster containing 16x16 pixels, and the discrete cosine transform is then performed on it again. The C₀₁ coefficients produced in this way generally contain twice as much power as coefficients obtained from an original 8x8-pixel cluster. In a hardware implementation, this procedure follows in part from the interlaced scanning customary in television, in which the images are transmitted field by field (interleaving of TV lines). If only one field is considered, this amounts to vertical downsampling. Horizontal downsampling can likewise be set in the integrated circuits used. Figure 10 shows the amplitude distribution density of the C₀₁ coefficient of an arbitrary image 16 without downsampling. Figure 11 shows the amplitude distribution density of the C₀₁ coefficient of Figure 10 with downsampling. The somewhat wider curve of Figure 11 yields numerically about twice the variance of the curve of Figure 10.
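The decimation step, in which four pixels arranged in a square are averaged into one, can be sketched in a few lines. The function name `downsample2x2` is hypothetical, and the sketch assumes a block with even side lengths.

```python
def downsample2x2(block):
    """Average each 2x2 square of pixels into one pixel (e.g. 16x16 -> 8x8)."""
    rows, cols = len(block) // 2, len(block[0]) // 2
    return [[(block[2 * y][2 * x] + block[2 * y][2 * x + 1]
              + block[2 * y + 1][2 * x] + block[2 * y + 1][2 * x + 1]) / 4
             for x in range(cols)]
            for y in range(rows)]


big = [[x for x in range(16)] for _ in range(16)]   # 16x16 horizontal ramp
small = downsample2x2(big)                           # 8x8 cluster
```

In the hardware described above, the vertical half of this operation comes from using a single field; the sketch performs both directions in software purely for illustration.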

Once all images 16 of a time slice 15 have been broken down, by decimating the 16 x 16 pixel blocks or clusters, into clusters of 8 x 8 pixels and transformed into the frequency domain, the lowest high-power AC coefficient (the C₀₁ or C₁₀ coefficient of each cluster) is subjected to further data reduction in that only its sign is evaluated from then on. An 8x8-pixel cluster is thus represented by a single bit giving the sign of the AC coefficient. With this measure, the resulting data sequence of a time slice 15 has the following decisive advantages: the entire image sequence is encoded with an exceptional reduction (compression) of the data, and each bit is a local feature that is independent of the modulation level of the image signals and of their signal-to-noise ratio.
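The reduction of each cluster to one sign bit can be sketched as follows. Only the sign of C₀₁ is needed, so the sketch evaluates that single coefficient without its (positive) scale factor. The function names, the choice of C₀₁ rather than C₁₀, and the mapping of a non-negative coefficient to bit 1 are assumptions for illustration.

```python
import math


def c01_sign_bit(cluster):
    """Return 1 if the horizontal low-frequency coefficient C01 is non-negative, else 0."""
    n = len(cluster)
    c01 = sum(cluster[y][x] * math.cos((2 * x + 1) * math.pi / (2 * n))
              for y in range(n) for x in range(n))
    return 1 if c01 >= 0 else 0


def field_bits(field, n=8):
    """One sign bit per n x n cluster of a (downsampled) field, row by row."""
    rows, cols = len(field), len(field[0])
    bits = []
    for top in range(0, rows - n + 1, n):
        for left in range(0, cols - n + 1, n):
            cluster = [row[left:left + n] for row in field[top:top + n]]
            bits.append(c01_sign_bit(cluster))
    return bits
```

Multiplying all pixel values by a positive gain or adding a constant offset leaves the bit unchanged for any cluster with a clear brightness gradient, which illustrates the stated independence from signal level. For a nearly flat cluster the coefficient is close to zero and the bit is effectively arbitrary.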

In the table of Figure 9, the degree of data reduction from the luminance signal of a 768x576-pixel CCIR format is roughly 2 x 10³. For a time slice of, for example, 2 seconds, this still leaves about 11 kbyte of data. For real-time processing of thousands of advertisements, this volume of data would be too large. Coarse recognition is therefore used as a method suitable for real-time processing to recognize image sequences without evaluating possible changes (mutations). Fine recognition follows coarse recognition as a non-real-time procedure, confirms it, and makes it possible to analyse the changes.
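The two figures quoted above can be reproduced with a short calculation. The inputs are assumptions chosen to be consistent with the text, since the table of Figure 9 is not reproduced here: 8 bits per luminance sample, one field per frame (halving the line count), 2:1 horizontal downsampling, one sign bit per 8x8 cluster, and 50 evaluated fields in a 2-second slice.

```python
width, height = 768, 576
bits_per_sample = 8            # assumption: 8-bit luminance samples

raw_bits_per_frame = width * height * bits_per_sample

# One field (half the lines), horizontally downsampled 2:1, then one bit per 8x8 cluster.
clusters_per_field = (width // 2 // 8) * (height // 2 // 8)

reduction = raw_bits_per_frame / clusters_per_field

fields_per_slice = 50          # assumption: 25 frames/s, one field per frame, 2 s
slice_bytes = clusters_per_field * fields_per_slice // 8

print(clusters_per_field)  # 1728
print(reduction)           # 2048.0
print(slice_bytes)         # 10800
```

Under these assumptions the reduction factor is 2048, i.e. roughly 2 x 10³, and a 2-second slice occupies 10800 bytes, i.e. about 11 kbyte.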

Coarse recognition is already possible with about sixteen features 17, 18 per image 16. The features 17, 18 are the sign bits of the DCT coefficients. From the 1728 sign bits of a (half) image, sixteen bits, i.e. two bytes, are therefore always taken according to a certain pattern. For a time slice 15 of 2 seconds this amounts to one hundred bytes. The extraction pattern is chosen so that there is no spatial relationship between the coefficients. Objects visible in the image usually extend over many clusters, and these clusters very probably yield the same DCT coefficients. The features 17, 18 taken from such clusters are therefore not independent of one another and do not improve the quality of recognition. These relationships disappear if the sixteen features 17, 18 are taken, by a quasi-stochastic method, from clusters that are as far apart as possible. The same procedure is applied to every image 16 of a time slice 15, which yields a data string with a total length of, for example, 16 × 50 = 800 bits. If the individual bits are assumed to be independent, the probability that k out of N possible bits coincide by chance in two such binary patterns can be calculated from the binomial distribution:

$$b(k, N, p) = \binom{N}{k}\, p^{k} (1-p)^{N-k}$$

where p is the probability of occurrence of a feature 17, 18.

In the cases examined here p = 0.5, i.e. 0 and 1 are equally likely as the sign bit of a coefficient. The binomial distribution thereby simplifies to the following relationship:

$$b(k, N, p) = \binom{N}{k}\, p^{N}$$

If a lower limit of 85% is specified for a recognition, i.e.

$$\frac{k}{N} \cdot 100\% > 85\%$$

then every similarity greater than 85% between two time-slice patterns is evaluated as a recognition. Whether this time-slice recognition actually belongs to a genuine image-sequence recognition is determined by a software plausibility check. If the result is negative, the match was a chance match, i.e. a false recognition.
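To give a feeling for how rare a chance match above the threshold is, the simplified binomial distribution with p = 0.5 can be summed over all k that exceed 85% of N. The sketch below assumes fully independent bits, as the text does; the function names are hypothetical and the threshold is passed as an integer percentage to avoid rounding problems.

```python
from math import comb

def chance_match_probability(n_bits, k_matches):
    """b(k, N, 0.5): probability that exactly k of N independent bits agree by chance."""
    return comb(n_bits, k_matches) / 2 ** n_bits

def false_recognition_probability(n_bits, threshold_percent=85):
    """Probability that more than threshold_percent of N independent bits agree by chance."""
    favourable = sum(
        comb(n_bits, k)
        for k in range(n_bits + 1)
        if 100 * k > threshold_percent * n_bits  # k / N * 100% > threshold
    )
    return favourable / 2 ** n_bits

# Sixteen bits from a single image versus 800 bits from a whole time slice.
print(false_recognition_probability(16))
print(false_recognition_probability(800))
```

With only sixteen bits a chance match above 85% occurs in roughly two out of a thousand comparisons, whereas with 800 independent bits it becomes vanishingly small. This only holds to the extent that the bits really are independent, which is why the decorrelating read-out matters.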

The image rotator shown in Figure 13 serves to detect false recognitions. The image rotator 7 is a store 19 organized as a shift register, into which the selected features 17, 18 of the images 16 of a time slice 15 are written in a prescribed order. The task of the image rotator 7 is to produce the patterns from several images 16 not in the order of the image sequence 14 but quasi-randomly. This is the so-called interframe coding. In Figure 14, curve 1 shows a so-called intraframe coding, in which the features are processed in the order of the images; curves 2, 3 and 4 illustrate the distribution of chance matches for interframe codings over 2, 10 and 50 images. Figure 15 shows an enlarged view of the range of Figure 14 above the 85% recognition threshold.

The features 17, 18 of a new image 20 are entered into the store 19 every 40 ms, while the oldest image 21 is shifted out at the end of the store 19. Between two updates of the store 19, the N features 17, 18 of a time slice are read out with random access according to a quasi-stochastic method and passed to the correlator 9 for comparison with the reference patterns. In doing so, no cluster of an image 16 is read out a second time while that image is in the image rotator 7. The features 17, 18 are accessed with the largest possible jumps 23 between the images 16 and, at the same time, with the greatest possible distances within the images 16. This removes the correlation between the features 17, 18, because the temporal and spatial relationships are broken up.
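The behaviour of the image rotator can be sketched as a fixed-depth shift register with a precomputed access table. This is an illustrative sketch only: the class and parameter names are hypothetical, and a seeded pseudo-random shuffle stands in for the access order with the largest possible jumps and distances, which the description does not spell out. What the sketch does reproduce is the rule that no cluster of an image is read twice: each register position is assigned its own disjoint set of clusters.

```python
import random
from collections import deque

class ImageRotator:
    """Hypothetical sketch of the shift-register store 19 of the image rotator 7."""

    def __init__(self, depth, clusters_per_image, reads_per_image, seed=0):
        if depth * reads_per_image > clusters_per_image:
            raise ValueError("not enough clusters for disjoint reads")
        rng = random.Random(seed)
        clusters = list(range(clusters_per_image))
        rng.shuffle(clusters)  # stand-in for a maximally spread access order
        # Each register position gets its own disjoint set of clusters, so an
        # image passing through all positions never has a cluster read twice.
        self.access = [
            (pos, clusters[pos * reads_per_image + i])
            for pos in range(depth)
            for i in range(reads_per_image)
        ]
        rng.shuffle(self.access)  # jump between images instead of reading in order
        self.store = deque(maxlen=depth)

    def push(self, sign_bits):
        """Enter the features of a new image; the oldest image drops out at the end."""
        self.store.appendleft(list(sign_bits))

    def read_pattern(self):
        """Read one test pattern (a list of bits) between two updates of the store."""
        if len(self.store) < self.store.maxlen:
            raise ValueError("store is not yet full")
        return [self.store[pos][cluster] for pos, cluster in self.access]

# 50 images of 1728 sign bits, 16 reads per image -> 800-bit pattern.
rotator = ImageRotator(depth=50, clusters_per_image=1728, reads_per_image=16, seed=1)
for i in range(50):
    rotator.push([i % 2] * 1728)
print(len(rotator.read_pattern()))
```

With a depth of 50 images and sixteen reads per position, 800 distinct clusters of each image are touched during its passage through the register, which is in line with the 800 to 1000 features per image that are loaded for coarse recognition.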

The method can be carried out in the following steps:

a) The image sequences 14 are divided into time slices 15 with a length of 1.5 to 2 seconds (adapted to the recognition task).

b) Each image 16 is divided into clusters, following the JPEG method used, and vertical and horizontal subsampling is preferably applied. The clusters of the images are subjected to a discrete cosine transformation, and the sign of a low-order AC coefficient (expediently C₀₁ or C₁₀) is used as feature 17, 18.

c) Of the 1728 features 17 produced for each image 16 according to item b), about 800 to 1000 features 18 are loaded into the shift register of the image rotator 7 for real-time correlation, i.e. coarse recognition.

From there, about 800 to 1000 features 18 are taken out in each time slice 15 by a decorrelating access method and stored as the vector 22 illustrated in Figure 4, which characterizes the entire time slice 15. This corresponds to about 16 to 32 bits per image.

d) The feature vector 22 produced in item c) is called the reference pattern. Later, during operation, it is compared with the test pattern that is produced continuously, in the same way, from the television program currently being broadcast; the correlation is determined with the exclusive NOR (EXNOR) logic operation. If the similarity exceeds a given threshold (for example 85%), the image is considered recognized. The recognitions are entered, together with the time and the degree of similarity, into a database that contains the corresponding fields for all reference patterns to be compared.

e) The database is continuously examined by suitable software for time slices 15 that belong together and have been recognized at the appropriate image spacing and with sufficient similarity. If the majority of the time slices 15 of an image sequence 14 have been recognized, the entire image sequence is considered recognized.

f) The recognition process according to items d) and e) runs in real time, so the recognition of an image sequence can be reported immediately after the image sequence, or commercial, has been broadcast. At that point all features 17 of the images, i.e. for example 86400 bits = 10.8 kbyte of information for a 2-second time slice, which were temporarily stored in the FIFO shift register 12, are used for fine recognition. Here the correlation of all images 16, divided into areas, is examined in all time slices 15 of an image sequence 14. This comparison can be carried out with preselected reference patterns, since the image sequence has in principle already been recognized.
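Steps d) and e) above can be sketched as follows. The sketch assumes that each feature vector is packed into a Python integer with one bit per feature; the function names are hypothetical. The plausibility check of step e) is reduced here to a simple majority vote and leaves out the check of order and image spacing.

```python
def similarity_percent(test_bits, reference_bits, n_bits):
    """Percentage of agreeing bits, obtained with an exclusive NOR (EXNOR)."""
    mask = (1 << n_bits) - 1
    agree = ~(test_bits ^ reference_bits) & mask  # 1 wherever the two bits are equal
    return 100.0 * bin(agree).count("1") / n_bits

def slice_recognized(test_bits, reference_bits, n_bits, threshold_percent=85.0):
    """Step d): a time slice counts as recognized above the similarity threshold."""
    return similarity_percent(test_bits, reference_bits, n_bits) > threshold_percent

def sequence_recognized(slice_hits):
    """Step e), simplified: recognized if the majority of time slices were recognized."""
    return 2 * sum(slice_hits) > len(slice_hits)

# 20-bit patterns: two differing bits give 90% similarity, three give exactly 85%.
reference = 0
print(slice_recognized(0b11, reference, 20))   # True
print(slice_recognized(0b111, reference, 20))  # False
print(sequence_recognized([True, True, False]))  # True
```

Because the comparison is a bitwise operation followed by a bit count, it is cheap enough to run against many reference patterns between two updates of the store, which fits the real-time requirement of coarse recognition.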

Claims

PATENT CLAIMS

1. A method for recognizing original image sequences, such as commercials, in which features relating to the luminance of the individual images of the image sequences are generated, digitized and compared with a reference pattern, characterized in that the features (17, 18) are applied to multiple images (16).

2. A method according to claim 1, characterized in that the selected features (17, 18) of each image are written in sequential order into a shift register (19) of an image rotator (7) and read out with random access according to a quasi-stochastic method.

3. A method according to claim 2, characterized in that the features (17, 18) are accessed with the largest possible jumps (23) between the images (16).

4. A method according to claim 2 or 3, characterized in that the features (17, 18) are accessed with the greatest possible distances within the images (16).

5. A method according to any one of claims 1 to 4, characterized in that approximately every 40 ms the features (17, 18) of a new image (20) are entered into the store (19) and the features (17, 18) of the oldest image (21) are shifted out at the end of the store (19).

6. A method according to any one of claims 1 to 5, characterized in that the features (17, 18) are compared with the reference pattern in a correlator (9).

7. A method according to any one of claims 1 to 3, characterized in that the image sequence (14) is divided into time slices (15) of constant length.

8. A method according to claim 7, characterized in that each time slice (15) is correlated as a single unit.

9. A method according to claim 7 or 8, characterized in that an image sequence (14) is considered recognized when each time slice (15) has been recognized in the correct order and at the correct time spacing.

10. A method according to any one of claims 1 to 3, characterized in that the image sequence (14) is divided into time slices (15) of about 1.5 to 2 seconds in length.

11. A method according to any one of claims 1 to 5, characterized in that the length of the time slices (15) is selected such that the image sequence (14) can be divided into time slices (15) without remainder.

12. A method according to any one of claims 1 to 4, characterized in that the images (16) are divided into clusters of spatially connected pixels.

13. A method according to claim 12, characterized in that the features (17, 18) are formed from a change in luminance within the cluster.

14. A method according to claim 12 or 13, characterized in that the clusters are subjected to a discrete cosine transformation.

15. A method according to claim 14, characterized in that the coefficients of the discrete cosine transformation are used as features (17, 18).

16. A method according to claim 15, characterized in that one of the low-frequency AC coefficients is used as the feature (17, 18).

17. A method according to claim 16, characterized in that the low-frequency AC coefficient is reduced to its sign and used as the feature (17, 18).

18. A method according to any one of claims 12 to 17, characterized in that clusters of 8 × 8 pixels are formed.

19. A method according to any one of claims 12 to 18, characterized in that the clusters are subsampled by horizontal and vertical decimation.

20. A method according to claim 19, characterized in that four pixels arranged in a square are merged into one new pixel.

21. A method according to claim 20, characterized in that a 16 × 16 pixel cluster is reduced to an 8 × 8 pixel cluster.

22. A method according to any one of claims 1 to 4, characterized in that processing takes place in two stages, the first stage being coarse recognition and the second stage fine recognition.

23. A method according to claim 22, characterized in that the amount of data produced for coarse recognition is reduced to a level suitable for real-time processing.

24. A method according to claim 22 or 23, characterized in that, for coarse recognition, features (17, 18) are generated from the individual images (16) of the time slices (15) and loaded for real-time correlation into the store (19) of the image rotator (7), then read out by a decorrelating access method and further processed as feature vectors (22), one per time slice (15).

25. A method according to claim 24, characterized in that the feature vector (22) of a time slice (15) is compared with a corresponding reference pattern by means of an exclusive NOR (EXNOR) logic operation and, in the event of a recognition, is entered into a database that contains the appropriate fields for all reference patterns.

26. A method according to claim 25, characterized in that the database is continuously examined by suitable software for time slices (15) that belong together and have been recognized at the correct image spacing and with sufficient similarity.

27. A method according to claim 26, characterized in that an image sequence (14) is considered recognized when the majority of its time slices (15) have been recognized.

28. A method according to any one of claims 22 to 27, characterized in that fine recognition is carried out only when coarse recognition has assigned an image sequence (14) to an available reference pattern.

29. A method according to claim 28, characterized in that, for fine recognition, all features (17, 18) are processed, aggregated per image area.

30. Apparatus for carrying out the method according to any one of claims 1 to 4, characterized in that the image rotator (7), which comprises a shift register (19), is connected on the one hand to a video decoder (2) provided with a raster/cluster converter and, on the other hand, via a correlator (9) to a reference store (10), such that an image sequence (14) fed from a receiver (4) into the video decoder (2) can be passed to the correlator (9) as a feature vector (22) and compared there with a stored reference pattern.

31. Apparatus according to claim 30, characterized in that a branch to a fine-recognition store is provided between the DCT transformer (5) and the image rotator (7).

32. Apparatus according to claim 31, characterized in that the fine-recognition store is designed as a FIFO shift register (12) capable of storing all features (17) of an image sequence (14).