[summary of the invention]
First, based on the result of video object segmentation, a scheme is disclosed for intra-frame coding of object-area flag bits based on region growing and for inter-frame coding of object-area flag bits based on motion estimation. A new code-stream format for describing object details is proposed, in which the semantic information extracted from the video objects is written into the code stream and stored together with it. The present invention moves the high-complexity video analysis to the monitoring front end, where the analysis describes and marks the video objects; the flag bits are then coded further on the basis of the H.264 intra-frame and inter-frame coding characteristics. By reducing the coding cost of the object flag bits, the storage cost of the surveillance video is reduced, and it becomes possible for the monitoring back end to obtain information on objects of interest efficiently from the flag bits.
The object flag bits store the related semantic information, such as the description of the object area, accurately and efficiently. The decoding end decodes and retrieves the video according to the object information the user is interested in, and the redundant content of the video is largely discarded, so that massive surveillance video can be browsed quickly on the basis of the user's interest. The object flag bits mainly describe the object area information and the object semantic information; the semantic information includes not only low-level semantics such as color, texture and shape, but also high-level semantics such as object class and behavior. The present invention is intended to illustrate a coding framework based on object flag bits applied to video retrieval, and is therefore explained by taking the color flag bits of an object as an example of the object semantic information.
To achieve the object of the present invention, according to one aspect of the present invention, the scan order of the sub-block partitions of the object area within a frame is changed, and inter-frame coding of the object-area flag bits based on motion estimation and motion compensation is further introduced.
1) Intra-frame coding of the area flag bits based on region growing:
According to claim 2, a moving object is marked by its bounding rectangle, and the macroblocks inside the rectangle are divided using the block-partition information of the compressed domain. The resulting sub-blocks are denoted Ri = {sb_1, sb_2, ..., sb_N}, and the center coordinates of the sub-blocks form the set Ce = {sbc_1, sbc_2, ..., sbc_N}. The horizontal and vertical coordinate axes are set with the center of the rectangle (the object center) as the origin. The normalized distance from each sub-block center to the rectangle center is used:
If a block partition contains the center point of the rectangle, then dis_n = 0. If a block partition lies on the horizontal or vertical center line of the rectangle, then one of d_x(*) and d_y(*) is 0. If a block partition intersects neither the horizontal nor the vertical center line, then neither d_x(*) nor d_y(*) is 0.
The blocks to be marked inside the rectangle are traversed in a growing order, i.e., in ascending order of the weighted distance dis_n (n = 1, 2, ..., N). Compared with conventional raster scanning, the algorithm disclosed by the invention concentrates the foreground blocks marked 1 toward the front of the traversal order and the background blocks marked 0 toward the back. The prefix and suffix of the binary flag sequence are run-length coded, while the middle part is transmitted directly without loss; on the basis of the original method, this lossless compression further reduces the coding overhead of the area flag bits.
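A minimal sketch of this intra-frame flag coding step is given below. It assumes a simple normalized weighted distance dis_n = |d_x|/W + |d_y|/H (the exact weighting of the invention may differ) and illustrative data structures; all function names are hypothetical.

```python
# Illustrative sketch (not the normative method): order the sub-blocks of the
# bounding rectangle by a normalized weighted distance to the rectangle center,
# then run-length code the prefix and suffix of the resulting flag sequence.

def normalized_distance(center, rect_center, rect_w, rect_h):
    dx = abs(center[0] - rect_center[0]) / rect_w   # d_x(*), normalized
    dy = abs(center[1] - rect_center[1]) / rect_h   # d_y(*), normalized
    return dx + dy                                   # dis_n (assumed weighting)

def encode_flags(sub_block_centers, flags, rect_center, rect_w, rect_h):
    # Sort sub-blocks by ascending distance; foreground blocks (flag 1) tend to
    # gather at the front, background blocks (flag 0) at the back.
    order = sorted(range(len(flags)),
                   key=lambda n: normalized_distance(sub_block_centers[n],
                                                     rect_center, rect_w, rect_h))
    seq = [flags[n] for n in order]

    # Run-length code the all-1 prefix and all-0 suffix; send the middle raw.
    p = 0
    while p < len(seq) and seq[p] == 1:
        p += 1
    s = len(seq)
    while s > p and seq[s - 1] == 0:
        s -= 1
    return {"prefix_ones": p, "suffix_zeros": len(seq) - s, "middle": seq[p:s]}
```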
2) Inter-frame coding of the area flag bits based on motion estimation:
The H.264 coding framework adopts different prediction strategies for different sub-blocks. Therefore, in inter-coded frames, in order to make better use of the temporal correlation, the object-area flag bits are inter-coded using the block-partition prediction modes, motion vectors (MV) and reference frames already present in the code stream.
All pixels in the current block smb to be marked are pre-coded between frames at quarter-pixel precision. First, the sub-blocks inside the bounding rectangle of the moving object in the reference frame are divided into three classes: foreground area (F), background area (B) and boundary area (C), where the boundary area is 1 pixel wide. Then the prediction is carried out according to the motion vector MV = (mv_x, mv_y) output by the motion-estimation process of the video encoder; the prediction strategy is as follows:
Here smb_x and smb_y are the horizontal and vertical coordinates of the top-left vertex of the current block to be coded, and x and y are the horizontal and vertical coordinates of the pixel to be predicted relative to that top-left vertex; the resulting value describes the flag state of the pixel after prediction. According to claim 4, the flag-bit state of the sub-block is then determined from the states of all pixels in the sub-block:
1) if all pixels in the current sub-block are marked as foreground area (F), the flag bit of the sub-block is set to 1;
2) if all pixels in the current sub-block are marked as background area (B), the flag bit of the sub-block is set to 0;
3) if the pixels in the current sub-block are marked partly as foreground area (F), partly as background area (B) and partly as boundary area (C), the flag bit is decided by the following rule:
Here the thresholds Thf and Thb are introduced to decide the flag-bit state of the current sub-block. In the present invention a flag-bit value of 2 is defined as undetermined, and such sub-blocks are coded with the intra-frame scan mode.
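The sketch below illustrates one possible reading of this prediction step, assuming that each pixel's flag is predicted from the F/B/C label at the motion-compensated position in the reference frame and that the threshold rule counts mismatching pixels. The counting rule, the default threshold values and the function names are assumptions for illustration only.

```python
# Illustrative sketch: predict the flag bit of one sub-block from the labeled
# reference frame using the motion vector. The per-pixel prediction rule and
# the threshold decision below are assumptions, not the normative method.

FOREGROUND, BACKGROUND, BOUNDARY = "F", "B", "C"

def predict_subblock_flag(ref_labels, smb_x, smb_y, size, mv_x, mv_y,
                          thf=2, thb=4):
    """ref_labels[y][x] holds the F/B/C label of the reference frame.
    The invention works at quarter-pixel precision; for simplicity the MV is
    applied at integer positions here."""
    labels = []
    for y in range(size):
        for x in range(size):
            # Motion-compensated position of the pixel in the reference frame
            rx = smb_x + x + mv_x
            ry = smb_y + y + mv_y
            labels.append(ref_labels[ry][rx])

    if all(l == FOREGROUND for l in labels):
        return 1                      # entirely foreground
    if all(l == BACKGROUND for l in labels):
        return 0                      # entirely background

    # Mixed case: use thresholds Thf / Thb (assumed counting rule).
    n_bg = sum(l == BACKGROUND for l in labels)
    n_fg = sum(l == FOREGROUND for l in labels)
    if n_bg <= thf:
        return 1
    if n_fg <= thb:
        return 0
    return 2                          # undetermined: re-coded with intra scan
```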
According to claim 5, the RGB color model of the moving object is first extracted and converted to the HSV color space by a linear transformation. To reduce the inconvenience that high-dimensional features bring to computation and to object-information marking, the algorithm quantizes the converted HSV model: the three components h, s and v are quantized with unequal intervals according to the color perception of the human eye. Based on an analysis and comparison of the components of the HSV color model, the hue h is divided into 7 parts, the saturation s into 3 parts and the value v into 3 parts, quantized over the different ranges of the color; the quantized hue, saturation and value are denoted H, S and V respectively. According to the quantization levels, the three color components are combined into a one-dimensional feature vector:

F = H·Q_s·Q_v + S·Q_v + V

In this way the three components H, S and V are spread along the one-dimensional vector; giving them different weights reduces the influence of the value (brightness) V and the saturation S on the retrieval result, so that objects with different color distributions can be retrieved effectively.
[embodiment]
To make the above objects, features and advantages of the present invention more apparent and easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
The object of the present invention is to provide an efficient coding framework for video object marking. Fig. 1 shows the system framework of the efficient object-flag-bit coding method of the present invention. Referring to Fig. 1, the method 100 is as follows. In step 102, the coding end obtains the object area information and the semantic information by video analysis, corresponding respectively to the object-area flag bits and the object semantic flag bits. A Gaussian mixture model is built to obtain the mask of the moving object, and the object-area flag bits are formed according to the block-partition mode of the video coding process. On the other hand, the RGB color model of the moving object is extracted:
The RGB color model of the moving object is first extracted and converted to the HSV color space by a linear transformation. The converted HSV model is quantized: the three components h, s and v are quantized with unequal intervals according to the color perception of the human eye. Based on an analysis and comparison of the components of the HSV color model, the hue h is divided into 7 parts, the saturation s into 3 parts and the value v into 3 parts, quantized over the different ranges of the color; the quantized hue, saturation and value are denoted H, S and V respectively.
According to the above quantization levels, the three color components are combined into a one-dimensional feature vector, that is:
F = H·Q_s·Q_v + S·Q_v + V

where Q_s and Q_v are the numbers of quantization levels of S and V respectively. Taking Q_s = 4 and Q_v = 2 here, the above formula can be expressed as:

F = 8H + 2S + V

In this way the three components H, S and V are spread along the one-dimensional vector, F takes values in [0, 1, 2, ..., 53], the weight of the hue H is 8, the weight of the saturation S is 2 and the weight of the value V is 1. This reduces the influence of the value V and the saturation S on the retrieval result, so that images with different color distributions can be retrieved well. With the above method the color space is divided into 54 colors; quantizing to these 54 representative colors compresses the color feature effectively while still matching the perception of color by the human eye.
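A minimal sketch of this color quantization is given below. The unequal-interval boundaries used for h, s and v are assumptions chosen only to match the stated 7/3/3 partition, not the exact thresholds of the invention, and the function names are hypothetical.

```python
import colorsys

# Illustrative sketch: quantize an HSV triple into the one-dimensional feature
# F = 8H + 2S + V as stated in the embodiment. The interval boundaries below
# are assumed; the invention only specifies 7 hue parts and 3 parts each for
# saturation and value.

def quantize_h(h):             # h in [0, 360)
    bounds = [20, 75, 155, 190, 270, 295, 330]   # assumed 7 unequal intervals
    for level, b in enumerate(bounds):
        if h < b:
            return level
    return 0                    # wrap-around back to the first hue bin

def quantize_sv(x):             # s or v in [0, 1]
    if x < 0.2:
        return 0
    if x < 0.7:
        return 1
    return 2                    # 3 unequal intervals

def color_feature(r, g, b):     # r, g, b in [0, 255]
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    H = quantize_h(h * 360.0)
    S = quantize_sv(s)
    V = quantize_sv(v)
    return 8 * H + 2 * S + V    # F = 8H + 2S + V

# Example: a saturated red pixel
print(color_feature(220, 30, 30))
```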
In step 104, the original video is encoded with an H.264 encoder while the video analysis is performed, and the block-partition modes, the MV information and the reference-frame information of the motion compensation are extracted from the video coding process.
In step 106, the object flag bits are coded within frames and between frames. The intra-frame area flag bits are coded with the region-growing method, and the inter-frame flag bits are predictively coded at quarter-pixel precision.
1) For objects within a frame, intra-frame coding of the area flag bits based on region growing is used:
A moving object is marked by its bounding rectangle, and the macroblocks inside the rectangle are divided using the block-partition information of the compressed domain. The resulting sub-blocks are denoted Ri = {sb_1, sb_2, ..., sb_N}, and the center coordinates of the sub-blocks form the set Ce = {sbc_1, sbc_2, ..., sbc_N}. The horizontal and vertical coordinate axes are set with the center of the rectangle (the object center) as the origin. The normalized distance from each sub-block center to the rectangle center is used:
If a block partition contains the center point of the rectangle, then dis_n = 0. If a block partition lies on the horizontal or vertical center line of the rectangle, then one of d_x(*) and d_y(*) is 0. If a block partition intersects neither the horizontal nor the vertical center line, then neither d_x(*) nor d_y(*) is 0.
The blocks to be marked inside the rectangle are traversed in a growing order, i.e., in ascending order of the weighted distance dis_n (n = 1, 2, ..., N); the prefix and suffix of the binary flag sequence are run-length coded, and the middle part is transmitted directly without loss.
2) For objects between frames, inter-frame pre-coding at quarter-pixel precision is adopted:
First, the sub-blocks inside the bounding rectangle of the moving object in the reference frame are divided into three classes: foreground area (F), background area (B) and boundary area (C). Then the prediction is carried out according to the motion vector MV = (mv_x, mv_y) output by the motion-estimation process of the video encoder; the prediction strategy is as follows:
Here smb_x and smb_y are the horizontal and vertical coordinates of the top-left vertex of the current block to be coded, and x and y are the horizontal and vertical coordinates of the pixel to be predicted relative to that top-left vertex; the resulting value describes the flag state of the pixel after prediction. The flag-bit state of the sub-block is then determined from the states of all pixels in the sub-block:
1) if all pixels in the current sub-block are marked as foreground area (F), the flag bit of the sub-block is set to 1;
2) if all pixels in the current sub-block are marked as background area (B), the flag bit of the sub-block is set to 0;
3) if the pixels in the current sub-block are marked partly as foreground area (F), partly as background area (B) and partly as boundary area (C), the flag bit is decided by the following rule:
Here the thresholds Thf = 2 and Thb = 4 are introduced to decide the flag-bit state of the current sub-block. In the present invention a flag-bit value of 2 is defined as undetermined, and such sub-blocks are coded with the intra-frame scan mode.
In step 108, the object-flag-bit information is merged with the code stream of the original video coding to obtain the video content database. The object-flag-bit information is written into the picture-parameter-set extension layer or the slice-header area, thereby forming a surveillance-video code stream that carries the description of the video object details.
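The following is a schematic sketch of how a per-frame flag payload could be serialized before being placed alongside the coded frame. It is not the H.264 picture-parameter-set or slice-header syntax; the field layout is an assumption used only to illustrate the kind of data (area flag bits plus the color semantic flag) that the invention stores.

```python
import struct

# Schematic payload layout (assumed, not normative H.264 syntax):
#   prefix run length | suffix run length | number of raw middle bits |
#   color feature F | packed middle flag bits
def pack_flag_payload(prefix_ones, suffix_zeros, middle_bits, color_feature):
    header = struct.pack(">HHHB", prefix_ones, suffix_zeros,
                         len(middle_bits), color_feature)
    body = bytearray()
    for i in range(0, len(middle_bits), 8):
        chunk = middle_bits[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        byte <<= (8 - len(chunk))        # pad the last byte on the right
        body.append(byte)
    return header + bytes(body)
```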
In step 110, the retrieval sample of the input is matched against the content of the video database. When a certain object feature is input at the decoding end, its one-dimensional HSV color feature vector is extracted according to:

F′ = 8H + 2S + V

F′ is then compared with F; if the matching criterion is satisfied, the video object is regarded as successfully retrieved and is decoded. For the background part, a main-background selection method is used: because the background of a surveillance scene changes very little, the background decoded once per period is used as the main background for the next period. With the above method the video objects of interest are retrieved, and fast browsing of massive video is realized.
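A minimal sketch of this retrieval step is shown below. The matching criterion (exact equality of the quantized features), the record layout and the period length are assumptions, since the comparison formula is not spelled out here.

```python
# Illustrative retrieval sketch: compare the query color feature F' with the
# stored feature F of every marked object, decode only the matches, and keep
# one main background per period. Exact-equality matching and the record
# layout are assumptions for illustration.

def retrieve_objects(query_feature, object_records, period=100):
    """object_records: list of dicts with keys 'frame', 'feature', 'flag_bits'."""
    matches = [rec for rec in object_records
               if rec["feature"] == query_feature]      # F' compared with F

    # Main-background selection: decode one background per period and reuse it
    # as the main background for the following period.
    background_frames = sorted({(rec["frame"] // period) * period
                                for rec in matches})
    return matches, background_frames
```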
The above description fully discloses the specific embodiments of the present invention. It should be pointed out that any change made to the specific embodiments of the present invention by a person skilled in the art does not depart from the scope of the claims of the present invention. Accordingly, the scope of the claims of the present invention is not limited to the described embodiments.