[summary of the invention]
First, based on the result of video object segmentation, a scheme is disclosed for intra-frame coding of object-area flag bits based on region growing and for inter-frame coding of object-area flag bits based on motion estimation. A new code-stream format for describing object details is proposed, in which the semantic information extracted from the video objects is written into the code stream and stored together with it. The present invention moves the high-complexity video analysis to the monitoring front end, where the analysis describes and marks the video objects; the flag bits are then coded further on the basis of the H.264 intra-frame and inter-frame coding characteristics. By reducing the coding cost of the object flag bits, the storage cost of the surveillance video is reduced, and it becomes possible for the monitoring back end to obtain information on objects of interest efficiently from the flag bits.
The object flag bits store the related semantic information, such as the description of the object area, accurately and efficiently. The decoding end decodes and retrieves the video according to the object information the user is interested in, and the redundant content of the video is largely discarded, so that massive surveillance video can be browsed quickly on the basis of the user's interest. The object flag bits mainly describe the object area information and the object semantic information; the semantic information includes not only low-level semantics such as color, texture and shape, but also high-level semantics such as object class and behavior. The present invention is intended to illustrate a coding framework based on object flag bits applied to video retrieval, and is therefore explained by taking the color flag bits of an object as an example of the object semantic information.
To achieve the object of the present invention, according to one aspect of the present invention, the scan order of the sub-block partitions of the object area within a frame is changed, and inter-frame coding of the object-area flag bits based on motion estimation and motion compensation is further introduced.
1) Intra-frame coding of the area flag bits based on region growing:
According to claim 2, a moving object is marked by its bounding rectangle, and the macroblocks inside the rectangle are divided using the block-partition information of the compressed domain. The resulting sub-blocks are denoted Ri = {sb_1, sb_2, ..., sb_N}, and the center coordinates of the sub-blocks form the set Ce = {sbc_1, sbc_2, ..., sbc_N}. The horizontal and vertical coordinate axes are set with the center of the rectangle (the object center) as the origin. The normalized distance from each sub-block center to the rectangle center is used:
If a block partition contains the center point of the rectangle, then dis_n = 0. If a block partition lies on the horizontal or vertical center line of the rectangle, then one of d_x(*) and d_y(*) is 0. If a block partition intersects neither the horizontal nor the vertical center line, then neither d_x(*) nor d_y(*) is 0.
The blocks to be marked inside the rectangle are traversed in a growing order, i.e., in ascending order of the weighted distance dis_n (n = 1, 2, ..., N). Compared with conventional raster scanning, the algorithm disclosed by the invention concentrates the foreground blocks marked 1 toward the front of the traversal order and the background blocks marked 0 toward the back. The prefix and suffix of the binary flag sequence are run-length coded, while the middle part is transmitted directly without loss; on the basis of the original method, this lossless compression further reduces the coding overhead of the area flag bits.
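A minimal sketch of this intra-frame flag coding step is given below. It assumes a simple normalized weighted distance dis_n = |d_x|/W + |d_y|/H (the exact weighting of the invention may differ) and illustrative data structures; all function names are hypothetical.

```python
# Illustrative sketch (not the normative method): order the sub-blocks of the
# bounding rectangle by a normalized weighted distance to the rectangle center,
# then run-length code the prefix and suffix of the resulting flag sequence.

def normalized_distance(center, rect_center, rect_w, rect_h):
    dx = abs(center[0] - rect_center[0]) / rect_w   # d_x(*), normalized
    dy = abs(center[1] - rect_center[1]) / rect_h   # d_y(*), normalized
    return dx + dy                                   # dis_n (assumed weighting)

def encode_flags(sub_block_centers, flags, rect_center, rect_w, rect_h):
    # Sort sub-blocks by ascending distance; foreground blocks (flag 1) tend to
    # gather at the front, background blocks (flag 0) at the back.
    order = sorted(range(len(flags)),
                   key=lambda n: normalized_distance(sub_block_centers[n],
                                                     rect_center, rect_w, rect_h))
    seq = [flags[n] for n in order]

    # Run-length code the all-1 prefix and all-0 suffix; send the middle raw.
    p = 0
    while p < len(seq) and seq[p] == 1:
        p += 1
    s = len(seq)
    while s > p and seq[s - 1] == 0:
        s -= 1
    return {"prefix_ones": p, "suffix_zeros": len(seq) - s, "middle": seq[p:s]}
```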
2) Inter-frame coding of the area flag bits based on motion estimation:
The H.264 coding framework adopts different prediction strategies for different sub-blocks. Therefore, in inter-coded frames, in order to make better use of the temporal correlation, the object-area flag bits are inter-coded using the block-partition prediction modes, motion vectors (MV) and reference frames already present in the code stream.
All pixels in the current block smb to be marked are pre-coded between frames at quarter-pixel precision. First, the sub-blocks inside the bounding rectangle of the moving object in the reference frame are divided into three classes: foreground area (F), background area (B) and boundary area (C), where the boundary area is 1 pixel wide. Then the prediction is carried out according to the motion vector MV = (mv_x, mv_y) output by the motion-estimation process of the video encoder; the prediction strategy is as follows:
Here smb_x and smb_y are the horizontal and vertical coordinates of the top-left vertex of the current block to be coded, and x and y are the horizontal and vertical coordinates of the pixel to be predicted relative to that top-left vertex; the resulting value describes the flag state of the pixel after prediction. According to claim 4, the flag-bit state of the sub-block is then determined from the states of all pixels in the sub-block:
1) if all pixels in the current sub-block are marked as foreground area (F), the flag bit of the sub-block is set to 1;
2) if all pixels in the current sub-block are marked as background area (B), the flag bit of the sub-block is set to 0;
3) if the pixels in the current sub-block are marked partly as foreground area (F), partly as background area (B) and partly as boundary area (C), the flag bit is decided by the following rule:
Here the thresholds Thf and Thb are introduced to decide the flag-bit state of the current sub-block. In the present invention a flag-bit value of 2 is defined as undetermined, and such sub-blocks are coded with the intra-frame scan mode.
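The sketch below illustrates one possible reading of this prediction step, assuming that each pixel's flag is predicted from the F/B/C label at the motion-compensated position in the reference frame and that the threshold rule counts mismatching pixels. The counting rule, the default threshold values and the function names are assumptions for illustration only.

```python
# Illustrative sketch: predict the flag bit of one sub-block from the labeled
# reference frame using the motion vector. The per-pixel prediction rule and
# the threshold decision below are assumptions, not the normative method.

FOREGROUND, BACKGROUND, BOUNDARY = "F", "B", "C"

def predict_subblock_flag(ref_labels, smb_x, smb_y, size, mv_x, mv_y,
                          thf=2, thb=4):
    """ref_labels[y][x] holds the F/B/C label of the reference frame.
    The invention works at quarter-pixel precision; for simplicity the MV is
    applied at integer positions here."""
    labels = []
    for y in range(size):
        for x in range(size):
            # Motion-compensated position of the pixel in the reference frame
            rx = smb_x + x + mv_x
            ry = smb_y + y + mv_y
            labels.append(ref_labels[ry][rx])

    if all(l == FOREGROUND for l in labels):
        return 1                      # entirely foreground
    if all(l == BACKGROUND for l in labels):
        return 0                      # entirely background

    # Mixed case: use thresholds Thf / Thb (assumed counting rule).
    n_bg = sum(l == BACKGROUND for l in labels)
    n_fg = sum(l == FOREGROUND for l in labels)
    if n_bg <= thf:
        return 1
    if n_fg <= thb:
        return 0
    return 2                          # undetermined: re-coded with intra scan
```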
According to claim 5, the RGB color model of the moving object is first extracted and converted to the HSV color space by a linear transformation. To reduce the inconvenience that high-dimensional features bring to computation and to object-information marking, the algorithm quantizes the converted HSV model: the three components h, s and v are quantized with unequal intervals according to the color perception of the human eye. Based on an analysis and comparison of the components of the HSV color model, the hue h is divided into 7 parts, the saturation s into 3 parts and the value v into 3 parts, quantized over the different ranges of the color; the quantized hue, saturation and value are denoted H, S and V respectively. According to the quantization levels, the three color components are combined into a one-dimensional feature vector:

F = H·Q_s·Q_v + S·Q_v + V

In this way the three components H, S and V are spread along the one-dimensional vector; giving them different weights reduces the influence of the value (brightness) V and the saturation S on the retrieval result, so that objects with different color distributions can be retrieved effectively.
[embodiment]
To make the above objects, features and advantages of the present invention more apparent and easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
The object of the present invention is to provide an efficient coding framework for video object marking. Fig. 1 shows the system framework of the efficient object-flag-bit coding method of the present invention. Referring to Fig. 1, the method 100 is as follows. In step 102, the coding end obtains the object area information and the semantic information by video analysis, corresponding respectively to the object-area flag bits and the object semantic flag bits. A Gaussian mixture model is built to obtain the mask of the moving object, and the object-area flag bits are formed according to the block-partition mode of the video coding process. On the other hand, the RGB color model of the moving object is extracted:
The RGB color model of the moving object is first extracted and converted to the HSV color space by a linear transformation. The converted HSV model is quantized: the three components h, s and v are quantized with unequal intervals according to the color perception of the human eye. Based on an analysis and comparison of the components of the HSV color model, the hue h is divided into 7 parts, the saturation s into 3 parts and the value v into 3 parts, quantized over the different ranges of the color; the quantized hue, saturation and value are denoted H, S and V respectively.
According to the above quantization levels, the three color components are combined into a one-dimensional feature vector, that is:
F = H·Q_s·Q_v + S·Q_v + V

where Q_s and Q_v are the numbers of quantization levels of S and V respectively. Taking Q_s = 4 and Q_v = 2 here, the above formula can be expressed as:

F = 8H + 2S + V

In this way the three components H, S and V are spread along the one-dimensional vector, F takes values in [0, 1, 2, ..., 53], the weight of the hue H is 8, the weight of the saturation S is 2 and the weight of the value V is 1. This reduces the influence of the value V and the saturation S on the retrieval result, so that images with different color distributions can be retrieved well. With the above method the color space is divided into 54 colors; quantizing to these 54 representative colors compresses the color feature effectively while still matching the perception of color by the human eye.
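A minimal sketch of this color quantization is given below. The unequal-interval boundaries used for h, s and v are assumptions chosen only to match the stated 7/3/3 partition, not the exact thresholds of the invention, and the function names are hypothetical.

```python
import colorsys

# Illustrative sketch: quantize an HSV triple into the one-dimensional feature
# F = 8H + 2S + V as stated in the embodiment. The interval boundaries below
# are assumed; the invention only specifies 7 hue parts and 3 parts each for
# saturation and value.

def quantize_h(h):             # h in [0, 360)
    bounds = [20, 75, 155, 190, 270, 295, 330]   # assumed 7 unequal intervals
    for level, b in enumerate(bounds):
        if h < b:
            return level
    return 0                    # wrap-around back to the first hue bin

def quantize_sv(x):             # s or v in [0, 1]
    if x < 0.2:
        return 0
    if x < 0.7:
        return 1
    return 2                    # 3 unequal intervals

def color_feature(r, g, b):     # r, g, b in [0, 255]
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    H = quantize_h(h * 360.0)
    S = quantize_sv(s)
    V = quantize_sv(v)
    return 8 * H + 2 * S + V    # F = 8H + 2S + V

# Example: a saturated red pixel
print(color_feature(220, 30, 30))
```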
In step 104, the original video is encoded with an H.264 encoder while the video analysis is performed, and the block-partition modes, the MV information and the reference-frame information of the motion compensation are extracted from the video coding process.
In step 106, the object flag bits are coded within frames and between frames. The intra-frame area flag bits are coded with the region-growing method, and the inter-frame flag bits are predictively coded at quarter-pixel precision.
1) For objects within a frame, intra-frame coding of the area flag bits based on region growing is used:
A moving object is marked by its bounding rectangle, and the macroblocks inside the rectangle are divided using the block-partition information of the compressed domain. The resulting sub-blocks are denoted Ri = {sb_1, sb_2, ..., sb_N}, and the center coordinates of the sub-blocks form the set Ce = {sbc_1, sbc_2, ..., sbc_N}. The horizontal and vertical coordinate axes are set with the center of the rectangle (the object center) as the origin. The normalized distance from each sub-block center to the rectangle center is used:
If a block partition contains the center point of the rectangle, then dis_n = 0. If a block partition lies on the horizontal or vertical center line of the rectangle, then one of d_x(*) and d_y(*) is 0. If a block partition intersects neither the horizontal nor the vertical center line, then neither d_x(*) nor d_y(*) is 0.
The blocks to be marked inside the rectangle are traversed in a growing order, i.e., in ascending order of the weighted distance dis_n (n = 1, 2, ..., N); the prefix and suffix of the binary flag sequence are run-length coded, and the middle part is transmitted directly without loss.
2) For objects between frames, inter-frame pre-coding at quarter-pixel precision is adopted:
First, the sub-blocks inside the bounding rectangle of the moving object in the reference frame are divided into three classes: foreground area (F), background area (B) and boundary area (C). Then the prediction is carried out according to the motion vector MV = (mv_x, mv_y) output by the motion-estimation process of the video encoder; the prediction strategy is as follows:
Here smb_x and smb_y are the horizontal and vertical coordinates of the top-left vertex of the current block to be coded, and x and y are the horizontal and vertical coordinates of the pixel to be predicted relative to that top-left vertex; the resulting value describes the flag state of the pixel after prediction. The flag-bit state of the sub-block is then determined from the states of all pixels in the sub-block:
1) if all pixels in the current sub-block are marked as foreground area (F), the flag bit of the sub-block is set to 1;
2) if all pixels in the current sub-block are marked as background area (B), the flag bit of the sub-block is set to 0;
3) if the pixels in the current sub-block are marked partly as foreground area (F), partly as background area (B) and partly as boundary area (C), the flag bit is decided by the following rule:
Here the thresholds Thf = 2 and Thb = 4 are introduced to decide the flag-bit state of the current sub-block. In the present invention a flag-bit value of 2 is defined as undetermined, and such sub-blocks are coded with the intra-frame scan mode.
In step 108, the object-flag-bit information is merged with the code stream of the original video coding to obtain the video content database. The object-flag-bit information is written into the picture-parameter-set extension layer or the slice-header area, thereby forming a surveillance-video code stream that carries the description of the video object details.
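The following is a schematic sketch of how a per-frame flag payload could be serialized before being placed alongside the coded frame. It is not the H.264 picture-parameter-set or slice-header syntax; the field layout is an assumption used only to illustrate the kind of data (area flag bits plus the color semantic flag) that the invention stores.

```python
import struct

# Schematic payload layout (assumed, not normative H.264 syntax):
#   prefix run length | suffix run length | number of raw middle bits |
#   color feature F | packed middle flag bits
def pack_flag_payload(prefix_ones, suffix_zeros, middle_bits, color_feature):
    header = struct.pack(">HHHB", prefix_ones, suffix_zeros,
                         len(middle_bits), color_feature)
    body = bytearray()
    for i in range(0, len(middle_bits), 8):
        chunk = middle_bits[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        byte <<= (8 - len(chunk))        # pad the last byte on the right
        body.append(byte)
    return header + bytes(body)
```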
In step 110, the retrieval sample of the input is matched against the content of the video database. When a certain object feature is input at the decoding end, its one-dimensional HSV color feature vector is extracted according to:

F′ = 8H + 2S + V

F′ is then compared with F; if the matching criterion is satisfied, the video object is regarded as successfully retrieved and is decoded. For the background part, a main-background selection method is used: because the background of a surveillance scene changes very little, the background decoded once per period is used as the main background for the next period. With the above method the video objects of interest are retrieved, and fast browsing of massive video is realized.
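A minimal sketch of this retrieval step is shown below. The matching criterion (exact equality of the quantized features), the record layout and the period length are assumptions, since the comparison formula is not spelled out here.

```python
# Illustrative retrieval sketch: compare the query color feature F' with the
# stored feature F of every marked object, decode only the matches, and keep
# one main background per period. Exact-equality matching and the record
# layout are assumptions for illustration.

def retrieve_objects(query_feature, object_records, period=100):
    """object_records: list of dicts with keys 'frame', 'feature', 'flag_bits'."""
    matches = [rec for rec in object_records
               if rec["feature"] == query_feature]      # F' compared with F

    # Main-background selection: decode one background per period and reuse it
    # as the main background for the following period.
    background_frames = sorted({(rec["frame"] // period) * period
                                for rec in matches})
    return matches, background_frames
```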
The above description fully discloses the specific embodiments of the present invention. It should be pointed out that any change made to the specific embodiments of the present invention by a person skilled in the art does not depart from the scope of the claims of the present invention. Accordingly, the scope of the claims of the present invention is not limited to the described embodiments.