WO2016117039A1 - Image search device, image search method, and information storage medium - Google Patents
Image search device, image search method, and information storage medium
- Publication number
- WO2016117039A1 (PCT/JP2015/051433)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- region
- search
- feature amount
- image
- scene
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/232—Content retrieval operation locally within server, e.g. reading video streams from disk arrays
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
Definitions
- The present invention relates to an image search device, an image search method, and an information recording medium storing a program.
- Patent Document 1 discloses an object detection method capable of detecting an object whose background is moving. Specifically, the background motion is approximated with a predetermined conversion model (for example, an affine transformation or a perspective transformation), the background motion is estimated by estimating the conversion coefficients of that model from the motion vectors of the video, and only the object is detected by taking the difference between the feature quantity related to the object and the feature quantity related to the background.
- In the technique of Patent Document 1 described above, a motion vector is first extracted for each macroblock.
- The motion vector itself carries a large error and, in addition to the motion to be detected, includes background motion caused by camera work. Patent Document 1 therefore estimates the background motion by approximating the camera-work motion with an affine transformation.
- The estimated background motion is then subtracted from the actual motion vectors, and macroblocks whose resulting motion vectors are similar are merged and detected as an object.
- In principle, each frame can be scanned with regions of various sizes, and all of the obtained partial regions, together with search data corresponding to them, can be registered in a database for use in searching.
- However, for surveillance video, broadcast video, and the like, the number of frames constituting the video is enormous, so the number of obtained regions also becomes enormous, and registration and search take a prohibitively long time.
- To solve this problem, an image search apparatus according to one aspect includes: an input unit to which a plurality of images are input; a first extraction unit that extracts a plurality of first regions from the plurality of images and extracts a first feature amount from each first region; a region determination unit that selects, from the distribution of the plurality of first feature amounts extracted from the plurality of images, first feature amounts having a low appearance frequency and identifies the first regions containing the selected first feature amounts as second regions; a storage unit that stores the first feature amount extracted from each second region, the second region itself, and the image from which the second region was extracted; and a search unit that performs a search using the first feature amount.
- An image search method according to another aspect includes: a first step in which a plurality of images are input; a second step in which a plurality of first regions are extracted from the plurality of images and a first feature amount is extracted from each first region; a third step in which first feature amounts having a low appearance frequency are selected from the distribution of the plurality of first feature amounts extracted from the plurality of images, and the first regions containing the selected first feature amounts are identified as second regions; a fourth step in which the first feature amount extracted from each second region, the second region, and the image from which the second region was extracted are stored in a storage unit; and a fifth step in which a search is performed using the first feature amount.
- An information recording medium according to another aspect records a program comprising: first means for receiving a plurality of images; second means for extracting a plurality of first regions from the plurality of images and extracting a first feature amount from each first region; third means for selecting, from the distribution of the plurality of first feature amounts extracted from the plurality of images, first feature amounts having a low appearance frequency and identifying the first regions containing them as second regions; fourth means for storing in a storage unit the first feature amount extracted from each second region, the second region, and the image from which the second region was extracted; and fifth means for performing a search using the first feature amount.
- According to the image search apparatus of the present invention, a search focusing on candidate regions in a video can be realized at high speed.
- FIG. 1: Block diagram showing the overall system configuration
- FIG. 2: Block diagram showing the hardware configuration
- FIG. 3: Configuration example of the video database
- FIG. 4: Diagram explaining the registration process of the video database
- FIG. 5: Flowchart showing the processing flow of the video database registration process
- FIG. 6: Diagram explaining the video search process
- FIG. 7: Flowchart showing the processing flow of the video search
- FIG. 8: Configuration example of the registration and search screen
- FIG. 9: System-wide processing sequence
- FIG. 10: Diagram explaining saliency determination based on the appearance frequency of a region
- FIG. 11: Flowchart showing the processing flow of saliency determination based on the appearance frequency of a region
- FIG. 12: Diagram explaining saliency determination based on region tracking
- FIG. 13: Flowchart showing the processing flow of saliency determination based on region tracking
- FIG. 14: Flowchart showing switching of the first feature amount by scene determination
- A saliency area, in which a search target appears prominently, is determined among the candidate areas in a scene composed of a plurality of frames (405).
- The saliency area is a candidate area in which the search target is highly likely to appear prominently. For example, if a candidate region's image feature amount is similar to those of few other candidate regions, the region is considered to capture some object rather than a frequent pattern such as wallpaper, and it is therefore determined to be a saliency area.
- Likewise, when the other candidate areas are all moving to the right within the frame, the single candidate area moving to the left is likely to be a candidate area worth noting.
- FIG. 1 is a functional block diagram showing the configuration of the video search system 100 according to the first embodiment of the present invention.
- The video search system detects candidate areas that may contain an object from each frame of the input video, identifies salient areas among the candidate areas, and builds a database, so that a video search focusing on a detection target can be executed efficiently even on large-scale video data.
- the video search system 100 includes a video storage device 101, an input device 102, a display device 103, and a video search device 104.
- The video storage device 101 is a storage medium for storing video data, and can be configured using a hard disk drive built into a computer or a storage system connected over a network, such as NAS (Network Attached Storage) or a SAN (Storage Area Network).
- the video storage device 101 may be a cache memory that temporarily holds video data continuously input from a camera, for example.
- the video data stored in the video storage device 101 may be data in any format as long as time series information between images can be acquired in some form.
- the stored video data may be moving image data shot by a video camera, or a series of still image data shot by a still camera at a predetermined interval.
- the input device 102 is an input interface for transmitting user operations to the video search device 104 such as a mouse, a keyboard, and a touch device.
- the display device 103 is an output interface such as a liquid crystal display, and is used for displaying the recognition result of the video search device 104, interactive operation with the user, and the like.
- The video search device 104 performs a registration process, which extracts the information necessary for search from the video stored in the video storage device 101 and builds a database, and a search process, which searches the database for videos similar to a search query specified by the user via the input device 102 and presents the results on the display device 103.
- To realize a search focusing on object areas within video frames, the video search device 104 detects candidate areas from each frame, identifies salient areas using the first feature amounts extracted from the candidate areas, and then extracts feature amounts suited to large-scale data retrieval from only the salient areas and registers them in the database.
- the video search device 104 includes a video input unit 105, a first feature quantity extraction unit 106, a saliency area determination unit 107, a second feature quantity extraction unit 108, a video database 109, and a video search unit 110.
- the video input unit 105 reads video data from the video storage device 101 and converts it into a data format used in the video search device 104. Specifically, the video input unit 105 performs a video decoding process that decomposes video (moving image data format) into frames (still image data format). The obtained frame is sent to the first feature amount extraction unit 106. Further, an image feature amount is extracted from each obtained frame.
- The image feature amount is, for example, fixed-length vector data that numerically represents appearance information such as the color and shape of the image. Information on the input video and on the obtained frames is registered in the video database 109.
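- As a concrete illustration, the following is a minimal sketch that computes such a fixed-length vector as a per-channel color histogram (the histogram choice, bin count, and L2 normalization are illustrative assumptions; the patent does not prescribe a particular feature):

```python
import numpy as np

def frame_feature(frame: np.ndarray, bins: int = 8) -> np.ndarray:
    """Fixed-length appearance feature: one color histogram per channel.

    frame: H x W x 3 uint8 RGB image. Returns an L2-normalized vector
    of length 3 * bins describing the frame's color distribution.
    """
    hists = []
    for channel in range(3):
        h, _ = np.histogram(frame[:, :, channel], bins=bins, range=(0, 256))
        hists.append(h.astype(np.float64))
    v = np.concatenate(hists)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```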
- the first feature quantity extraction unit 106 detects a candidate area included in the search target from each input frame.
- Candidate areas are detected by scanning each frame with windows of multiple sizes, shifted several pixels at a time, thereby obtaining a plurality of areas of various sizes. Rectangular area shapes make the subsequent image processing easier.
- the video search system 100 of the present invention is not limited to a specific type of object, and aims to realize a video search focusing on an arbitrary detection target designated by a user (including not only an object but also a symbol such as a mark). Therefore, a region having a large “objectness” index value is detected as a candidate region from the frame.
- a well-known technique can be used for detection of an object candidate region.
- As the index value, for example, the number of edges contained in the region, the color difference from surrounding regions, or the symmetry of the image can be used.
- Depending on the type of input video and on the algorithm of the known technique, several tens to several thousand candidate areas are output per frame when the type of object is not limited. By evaluating candidate areas by their likeness to a detection target in this way, the number of candidates can be narrowed down before saliency determination, reducing the processing load of the subsequent saliency determination process. A sketch of such candidate detection follows.
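- The sketch below scans windows of several sizes and keeps the windows with the highest edge density as an "objectness" proxy (window sizes, stride, edge threshold, and the number of kept candidates are all illustrative tuning values; color contrast or symmetry could be scored instead):

```python
import numpy as np

def candidate_regions(gray: np.ndarray, sizes=(64, 128), stride=32, top_k=100):
    """Scan windows of several sizes over a grayscale frame and keep the
    top_k windows ranked by edge density, a simple objectness proxy."""
    gy, gx = np.gradient(gray.astype(np.float64))
    edges = np.hypot(gx, gy) > 30.0  # crude edge map from gradient magnitude
    scored = []
    for s in sizes:
        for y in range(0, gray.shape[0] - s + 1, stride):
            for x in range(0, gray.shape[1] - s + 1, stride):
                score = edges[y:y + s, x:x + s].mean()
                scored.append((score, (x, y, x + s, y + s)))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [box for _, box in scored[:top_k]]  # (x1, y1, x2, y2) boxes
```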
- the first feature quantity extraction unit 106 extracts feature quantities from all these candidate areas and registers them in the video database 109.
- A search over a large database requires feature quantities carrying enough information that they differ between different data items; for example, a feature amount combining shape and color features, or one that accounts for position by dividing the region into a grid, can be used. However, such feature amounts not only take longer to compute but also increase the amount of registered data. Therefore, in the video search device 104 of the present invention, the candidate areas detected by the first feature quantity extraction unit 106 use feature quantities (first feature quantities) that need only distinguish between areas within a limited scene, and their database registration may consist of writing the data only, without any clustering process.
- A feature quantity carrying more information than the first feature quantity, such as a richer image feature quantity, may additionally be registered as a search feature quantity; this is described later as the processing of the second feature quantity extraction unit.
- As the first feature amount, for example, a simple edge frequency, a representative color, or coordinate data representing motion can be used.
- When the object to be searched moves little, such as a mark or a distinctive building, an appearance feature such as edge frequency or representative color is suitable.
- When the object to be searched moves, such as a person or a car, coordinate data representing motion, a vector quantity with direction, or the like is desirable.
- The region determination unit 107 selects, from the candidate regions detected by the first feature quantity extraction unit 106, saliency regions in which a search target appears prominently.
- FIG. 10 is a diagram for explaining the saliency determination in the region determination unit.
- As an index for obtaining the saliency of a candidate area, one method checks whether the pattern of the area appears stably (that is, whether its appearance frequency is high). Patterns that appear frequently in images of many different compositions, such as wallpaper or sky, are unlikely to be actually searched for even if registered as search data, and such data merely consumes the storage unit. On the other hand, patterns that appear only in specific images, such as a person's face or a predetermined symbol, are frequently used in searches once registered, so registering them is not wasteful.
- The present invention therefore treats an area with a low appearance frequency as an area in which data useful for searching appears prominently, and identifies it as a saliency area. Registering only the feature amounts extracted from saliency areas reduces the load of the registration process. Furthermore, since only carefully selected salient areas enter the database, the speed of the search process is also improved.
- When the registered images constitute a plurality of moving images, a scene change is detected for each moving image.
- Scene change detection can be achieved by using the frame feature amount calculated by the video input unit 105 and determining where the distance between the feature amount of the current frame and that of the previous frame (for example, the squared distance between the feature amount vectors) is greater than or equal to a predetermined value, as in the sketch below.
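- A minimal sketch of this boundary test, assuming the frame feature vectors have already been computed and using an illustrative threshold:

```python
import numpy as np

def detect_scene_changes(frame_features, threshold=0.5):
    """Return the indices of frames that start a new scene: frame i is a
    boundary when the squared distance between its feature vector and the
    previous frame's is at least `threshold` (an assumed tuning value)."""
    boundaries = []
    for i in range(1, len(frame_features)):
        d2 = float(np.sum((frame_features[i] - frame_features[i - 1]) ** 2))
        if d2 >= threshold:
            boundaries.append(i)
    return boundaries
```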
- Alternatively, each candidate area detected by the first feature amount extraction unit may be tracked by association between frames, and a frame in which the tracking of many candidate areas is interrupted may be detected as a scene change.
- A group of similar, temporally continuous images is treated as one scene; the appearance frequency outside the scene is obtained, and the saliency area is identified from the ratio between the in-scene frequency and the out-of-scene frequency.
- For example, the pattern of the candidate area 1001 in FIG. 10 appears more frequently within the scene than 1002, but since its frequency outside the scene is also high, it is judged to have low saliency.
- The area 1003, whose frequency outside the scene is low but whose frequency within the scene is high, is judged to carry information useful for searching this scene and hence to be salient. In this way, areas from which a specific scene can be properly retrieved are registered as search data.
- In one determination method, the first feature amounts are clustered based on their distribution in the feature amount space, and members of small clusters are judged to have a low appearance frequency and set as saliency regions. For example, a cluster whose data count is less than one several-tenth to one hundredth of the total number of data points in the feature amount space may be judged to have a low appearance frequency and be selected; several to several tens of saliency regions are extracted from one frame. A sketch of this rule follows.
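- A sketch of this small-cluster rule using k-means (scikit-learn is assumed here for brevity, and the cluster count and size ratio are assumed tuning values, not values fixed by the specification):

```python
import numpy as np
from sklearn.cluster import KMeans

def salient_by_cluster_size(features: np.ndarray, n_clusters=50, ratio=0.01):
    """Cluster the first feature amounts and flag members of small clusters
    (fewer than `ratio` of all points) as salient, following the rule that
    a low appearance frequency indicates a saliency region."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    counts = np.bincount(labels, minlength=n_clusters)
    small = counts < max(1, int(ratio * len(features)))
    return small[labels]  # boolean mask over the input regions
```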
- Alternatively, the first regions are divided into those inside and outside the scene, clustering is performed on the distribution in the feature amount space, and a region whose appearance frequency is low outside the scene but high within the scene is defined as a saliency region.
- As yet another method, the similarity of each first feature amount to the others is computed, and when the number of first feature amounts with high similarity falls below a threshold, the appearance frequency is judged to be low and the region is defined as a saliency region.
- other known techniques can be adopted as long as the method determines the appearance frequency.
- registration data can be reduced by narrowing down the saliency areas determined as described above by the following method.
- The saliency area may also be specified after tracking candidate areas across a plurality of temporally continuous images.
- the candidate area is first tracked. For example, another frame is searched using an image feature amount extracted from the candidate area, and a candidate area of another frame having a similarity equal to or higher than a threshold is specified as a tracking result of the same object.
- the movement amount of the candidate areas between the frames is obtained, and this is set as the first feature quantity.
- The appearance frequency is then determined from the distribution of movement amounts, and the saliency area is specified. For example, the candidate areas in the same frame are clustered by movement amount, and a region belonging to a small cluster (specifically, one whose movement amount is significantly larger or smaller than the surrounding movement amounts, or in the opposite direction) is identified as a saliency area, as sketched below.
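- The following sketch flags such motion outliers by their deviation from the per-frame mean motion, a simple stand-in for the clustering described above (the z-score cutoff is an assumption):

```python
import numpy as np

def salient_by_motion(motion_vectors: np.ndarray, z_thresh=2.0):
    """Flag candidate regions whose inter-frame motion deviates strongly
    from the dominant motion in the frame, e.g. one region moving left
    while the rest move right. motion_vectors: N x 2 array of (dx, dy)."""
    mean = motion_vectors.mean(axis=0)
    std = motion_vectors.std(axis=0) + 1e-9  # avoid division by zero
    z = np.linalg.norm((motion_vectors - mean) / std, axis=1)
    return z > z_thresh  # boolean mask of salient candidates
```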
- Candidate areas that are very similar within the scene are reduced to a single saliency area.
- As a result, several to several tens of saliency regions are obtained from the several thousand to several tens of thousands of candidate regions in a scene, and the registration data can be reduced accordingly.
- the second feature quantity extraction unit 108 extracts a second feature quantity for search suitable for a wide range of similar image searches from the saliency area obtained by the area determination unit 107 and registers it in the video database 109.
- The second feature amount is one that can distinguish between different regions even as the number of scenes and registered data items grows, for example by combining color and shape and dividing the composition; an image feature amount based on the luminance gradient distribution can also be considered.
- By clustering the second feature amounts at registration time, the search process can be restricted to similar clusters at search time.
- Two or more second feature values can be registered for one area so that the user can specify and switch during the search. For example, a feature value that emphasizes the shape and a feature value that emphasizes the color may be extracted and registered.
- A candidate area related to a saliency area in the scene (hereinafter referred to as a related area) can also be registered in association with that saliency area. For example, when saliency is determined using pattern frequency, only one saliency area is selected from a set of similar patterns, and the remaining candidate areas are registered as related areas of that saliency area.
- the video database 109 is a database for managing information on video, frames, scenes, candidate areas, and salient areas necessary for video search.
- the video database 109 can store image feature amounts and perform a similar image search using the image feature amounts.
- The similar image search is a function that sorts and outputs data in order of closeness between the stored image feature amounts and that of the query. For comparing image feature amounts, for example, the Euclidean distance between vectors can be used. Details of the structure of the video database 109 are described later with reference to FIG. 3.
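- A minimal sketch of this ranking, assuming the stored feature amounts are stacked in a NumPy array:

```python
import numpy as np

def similar_image_search(query: np.ndarray, database: np.ndarray, top_k=10):
    """Rank database rows by Euclidean distance to the query feature and
    return the indices (and distances) of the top_k closest entries."""
    dists = np.linalg.norm(database - query, axis=1)
    order = np.argsort(dists)[:top_k]
    return order, dists[order]
```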
- the video search unit 110 searches for video desired by the user from the video database.
- the user specifies a search query using the input device 102.
- the search query may be registration data in the video database or an image input from the outside.
- the first feature amount or the second feature amount is extracted from the image, and an image search is performed using the extracted feature amount.
- the search result is presented to the user via the display device 103.
- In the search, higher accuracy can be obtained by using the second feature amount (the search feature amount), which carries a large amount of information; however, when the target set has been narrowed down, a search using the first feature amount (the comparison feature amount) is entirely feasible.
- FIG. 2 is a block diagram illustrating a hardware configuration of the video search system 100 according to the first embodiment of the present invention.
- the video search device 104 can be realized by a general computer, for example.
- the video search device 104 may include a processor 201 and a storage device 202 that are connected to each other.
- the storage device 202 is configured by any type of storage medium.
- the storage device 202 may be configured by a combination of a semiconductor memory and a hard disk drive.
- The functional units shown in FIG. 1, namely the video input unit 105, the first feature quantity extraction unit 106, the saliency area determination unit 107, the second feature quantity extraction unit 108, the search function of the video database 109, and the video search unit 110, are realized by the processor 201 executing the processing program 203 stored in the storage device 202.
- The processing described as being executed by each functional unit is actually executed by the processor 201 based on the processing program 203.
- The data of the video database 109 is contained in the storage device 202.
- the video search device 104 further includes a network interface device (NIF) 204 connected to the processor.
- the video storage device 101 may be a NAS or a SAN connected to the video search device 104 via the network interface device 204. Alternatively, the video storage device 101 may be included in the storage device 202.
- FIG. 3 is an explanatory diagram showing a configuration and a data example of the video database 109 according to the first embodiment of the present invention.
- a configuration example of a table format is shown, but the data format of the video database 109 may be arbitrary.
- The video database 109 includes a video table 300, a scene table 310, a frame table 320, a candidate area table 330, and a saliency area table 340.
- the table configuration in FIG. 3 and the field configuration of each table are configurations necessary for implementing the present invention, and tables and fields may be added according to the application.
- the video table 300 has a video ID field 301, a file path field 302, and a frame ID list field 303.
- the video ID field 301 holds the identification number of each video data.
- the file path field 302 holds a location on the video storage device 101.
- the frame ID list field 303 is a field for managing a list of frames extracted from the video, and holds a list of IDs managed by the frame table 320.
- the scene table 310 has a scene ID field 311 and a frame ID list field 312.
- the scene ID field 311 holds the identification number of each scene data.
- the frame ID list field 312 is a field for managing continuous frames belonging to the scene, and holds a list of IDs managed by the frame table 320.
- the frame table 320 includes a frame ID field 321, a video ID field 322, a scene ID field 323, a candidate area ID list field 324, a salient area ID list field 325, and a frame feature amount field 326.
- the frame ID field 321 holds an identification number of each frame data.
- the video ID field 322 holds a video ID of a video from which a frame is extracted.
- the scene ID field 323 holds the scene ID of the scene to which the frame belongs.
- the candidate area ID list field is a field for managing candidate areas detected from the frame, and holds a list of IDs managed by the candidate area table 330.
- The saliency area ID list field 325 manages, among the candidate areas detected from the frame, the areas determined to be salient by the area determination unit 107, and holds a list of IDs managed by the saliency area table 340.
- the frame feature amount field 326 holds image feature amounts extracted from the entire region of the frame. The image feature amount is given by, for example, fixed-length vector data.
- the candidate area table 330 has a candidate area ID field 331, a frame ID field 332, a coordinate field 333, and a first feature quantity field 334.
- the candidate area ID field 331 holds the identification number of each candidate area data.
- the frame ID field 332 holds the ID of the frame from which the candidate area is detected.
- the coordinate field 333 holds the coordinates of the candidate area in the detection source frame. The coordinates are expressed, for example, in the form of “horizontal coordinates of the upper left corner, vertical coordinates of the upper left corner, horizontal coordinates of the lower right corner, and vertical coordinates of the lower right corner of the rectangle” of the area rectangle.
- Although the region is given as a rectangle here for ease of description, an arbitrary area shape may be used.
- the first feature quantity field 334 holds the feature quantity of the candidate area extracted by the first feature quantity extraction unit 106.
- the saliency area table 340 includes a saliency area ID field 341, a representative candidate area ID field 342, a related candidate area ID list field 343, and a second feature quantity field 344.
- the saliency area ID field 341 holds an identification number of each saliency area data.
- the representative candidate area ID field 342 holds the ID of the candidate area selected as the salient area.
- the related candidate area ID field 343 holds a list of IDs of candidate areas related to the salient area.
- the second feature quantity field 344 holds a feature quantity for searching a saliency area extracted by the second feature quantity extraction unit 108.
- As described above, the image search apparatus of the present embodiment includes: an input unit to which a plurality of images are input; a first extraction unit that extracts a plurality of first regions from the plurality of images and a first feature amount from each first region; a region determination unit that selects, from the distribution of the plurality of first feature amounts extracted from the plurality of images, first feature amounts having a low appearance frequency and identifies the first regions containing them as second regions; a storage unit that stores the first feature amount extracted from each second region, the second region, and the image from which the second region was extracted; and a search unit that performs a search using the first feature amount.
- By storing only the feature quantities extracted from the partial areas (second regions) identified in this way and using them for the search, the amount of registered data can be reduced and the search speed improved.
- FIG. 5 is a flowchart for explaining processing in which the video search device 104 according to the first embodiment of the present invention detects a region from the video input from the video storage device 101 and registers it in the video database 109. Hereinafter, each step of FIG. 5 will be described.
- Step S501 The video input unit 105 acquires a video from the video storage device 101 and converts it into a format usable inside the system. Specifically, the video input unit 105 decodes the video and extracts frames (still images).
- Step S502 The video input unit 105 extracts an image feature amount from the frame obtained in step S501.
- Step S503 The first feature quantity extraction unit 106 detects regions highly likely to contain an object from the frame obtained in step S501 and sets them as candidate regions.
- Step S504 The first feature amount extraction unit 106 extracts, from each candidate region obtained in step S503, a first feature amount intended for saliency determination.
- Step S505 The area determination unit 107 determines a scene change using the frame feature value extracted in step S502 or the first feature value of the candidate area extracted in step S504. If a scene change has occurred, step S506 and subsequent steps are executed for the data of the previous scene, and if not, the process moves to step S508.
- Step S506 The area determination unit 107 determines saliency areas among all the candidate areas included in the scene.
- Step S507 The second feature quantity extraction unit 108 extracts a second feature quantity that is intended to be used for search with respect to the saliency area specified in step S506.
- Step S508 The video search apparatus 104 registers information on the video, frames, scenes, candidate areas, and saliency areas in the video database 109 in association with each other. Registration may be performed in the video database 109 sequentially after each preceding functional unit's processing, or in a batch after the series of frame processing is complete.
- Step S509 If there is a next frame in the video storage device 101, the video search apparatus 104 returns to step S501 and repeats the series of registration processes described above. If not, the video search apparatus 104 ends the registration process.
- FIG. 6 is a diagram for explaining the processing in which the video search apparatus 104 searches the videos registered in the video database 109 using a query designated by the user, in the video search system 100 according to the first embodiment of the present invention.
- the user inputs information as a clue to search for a desired video from the video database 109.
- In a similar image search, an image having similar features can be found in the database using the features of an image given by the user.
- the user may specify a search target area (601). Further, for example, by managing text information representing a specific object and an image in association with each other, an image used for a similar image search can be given from text input by the user.
- the second feature value is extracted from the query image given by the user in this way (602).
- a similar image search is executed for the video database 109 (604).
- The similar image search is a process of finding images with similar features; the distance between feature quantity vectors can be treated as a dissimilarity. When exp(−d) × 100 is computed from the distance d, the result lies between 0 and 100 and may be used as the similarity.
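- The distance-to-similarity mapping in code; for d ≥ 0 the value falls in (0, 100], with identical vectors (d = 0) scoring 100:

```python
import math

def similarity_from_distance(d: float) -> float:
    """Map a feature-space distance d >= 0 to a similarity score,
    as described above: exp(-d) * 100."""
    return math.exp(-d) * 100.0
```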
- the search results 605 are rearranged in the descending order of similarity and presented to the user.
- The search processing described above uses only information on the saliency areas, but since the video search apparatus 104 of the present invention also retains information on the candidate areas, a further search can be performed using that information.
- Re-search over the candidate areas in the scene can be toggled by an option (610).
- the first feature quantity 612 is re-extracted from the query designated by the user (611).
- a search is performed for candidate regions related to the salient region of the search result 605 obtained using the second feature value (613).
- Although the clustering process that speeds up search is not applied to the first feature quantities, the number of candidate areas related to the search results 605 is limited, so the search using the first feature quantity can be executed without a heavy load.
- FIG. 7 is a flowchart for explaining processing in which the video search apparatus 104 according to the first embodiment of the present invention searches for videos registered in the video database 109 using a query designated by the user. Hereinafter, each step of FIG. 7 will be described.
- Step S701 The user designates a search query using the input device 102.
- Step S702 The video search unit 110 extracts the second feature amount from the image specified by the user.
- the second feature amount is extracted by the same processing procedure as that at the time of registration.
- Step S703 The video search unit 110 uses the second feature value obtained in step S702 to search the video database 109 for a saliency area with a close feature value.
- Step S704 If the user has instructed the re-search in the scene, the video search device 104 executes the processing from step S705 onward, and otherwise moves to step S707.
- Step S705 The video search unit 110 extracts the first feature amount from the image designated by the user in step S701.
- Step S706 The video search unit 110 uses the first feature amount obtained in step S705 to search for a region having a close feature amount, targeting candidate regions related to the salient region of the search result in step S703. This result is reflected in the search result.
- Step S707 The video search device 104 outputs the search result to the display device 103 and ends the process.
- FIG. 8 is a diagram illustrating a configuration example of an operation screen for registering video data and performing video search focusing on an object in a frame using the video search device 104 according to the first embodiment of the present invention.
- This screen is presented to the user on the display device 103.
- the user gives a processing instruction to the video search device 104 by operating the cursor 801 marked on the screen using the input device 102.
- The screen includes a data registration button 802, a registration option designation area 803, a query read button 804, a query image display area 805, a search option designation area 806, a search button 807, and a search result display area 808.
- When the data registration button 802 is operated, the video search device 104 reads the video stored in the video storage device 101 and registers it in the video database 109. All data may be registered, or the user may designate the video files to register. In the registration option designation area 803, the user may also choose to register all data without the saliency determination, as in conventional systems.
- When a query image is loaded with the query read button 804, it is displayed in the query image display area 805.
- In the search option designation area 806, for example, the user can switch the search target among the entire frame area, the saliency areas, and the candidate areas.
- the feature amount extracted from the query image and the search target change according to the region specified here.
- When the search button 807 is operated, the video search device 104 searches the video database 109 for similar videos.
- the search result is displayed in a search result display area 808.
- The search result display area 808 can further improve the usability of the search results by showing video thumbnails, similarity scores, and times within the video, and by providing operation buttons for playback and for data output to an external application.
- FIG. 9 is a diagram explaining the processing sequence of the video search system 100 according to the first embodiment of the present invention; specifically, it shows the processing sequence among the user 900, the video storage device 101, the computer 901, and the video database 109 in the video registration and video search processing described above. The computer 901 is a computer that implements the video search device 104. Each step of FIG. 9 is described below.
- S910 represents a video registration process
- S930 represents a video search process
- the computer 901 acquires video data from the video storage device 101 (S912, S913).
- the subsequent processing corresponds to the series of registration processing described above with reference to FIG.
- the computer 901 cuts out a frame from the video (S914), extracts the feature amount of the frame (S915), and then extracts a large number of candidate areas from the frame (S916).
- the computer 901 extracts the first feature amount from each obtained candidate region (S917).
- the computer 901 detects a scene change (S918), and performs a saliency area determination for a candidate area in the scene (S919).
- a second feature amount is extracted for the obtained saliency area (S920), and information on the video, scene, frame, candidate area, and saliency area is associated and registered in the video database 109 (S921).
- the computer 901 notifies the user 900 of the end of the registration process (S922).
- the video search process S930 corresponds to the series of search processes described above with reference to FIG.
- The computer 901 extracts a second feature amount from the given query image (S932).
- a similar image search is performed on the video database 109 using the extracted second feature amount (S933).
- The computer 901 extracts the first feature amount from the query image (S934).
- a similar image search is performed on candidate areas related to the saliency area obtained in step S933 (S935). These results are integrated to generate a search result screen (S936), and the search result is presented to the user 900 (S937).
- FIG. 11 is a flowchart for explaining a processing flow of saliency determination in the region determination unit. Hereinafter, each step of FIG. 11 will be described.
- Step S1101 The saliency area determination unit 107 performs clustering processing on the candidate areas in the scene.
- a known algorithm such as K-means clustering can be applied to the clustering process.
- Step S1102 The saliency area determination unit 107 calculates a representative vector for each cluster obtained in step S1101.
- the representative vector for example, an average value of the feature vector belonging to the cluster can be used.
- Step S1103 The saliency area determination unit 107 performs a similar image search over the registered data using each representative vector obtained in step S1102.
- Since the first feature amounts are not preprocessed for fast search, computing the similarity against all registered data may be impractical; therefore, for example, the registered data is randomly sampled and only a predetermined number of entries are compared.
- Since the information needed for this processing is only the number of similar registered data, another option is to divide the feature amount space in advance and, when registering the first feature amount of a candidate region, record which partition it belongs to and count the candidate areas in each partition. The frequency of registered data similar to a representative vector can then be obtained simply by referring to this count, as sketched below.
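- A sketch of this partition-count idea, assuming a coarse grid quantization of the feature space (the quantization scheme and step size are illustrative; any pre-partitioning with per-partition counts would serve):

```python
import numpy as np

class PartitionCounter:
    """Keep only a per-partition count of registered first feature amounts;
    the frequency of data similar to a vector is then a single lookup."""

    def __init__(self, step: float = 0.25):
        self.step = step
        self.counts = {}

    def _key(self, v):
        # quantize the vector into its grid cell
        return tuple(np.floor(np.asarray(v) / self.step).astype(int))

    def register(self, v):
        k = self._key(v)
        self.counts[k] = self.counts.get(k, 0) + 1

    def frequency(self, v) -> int:
        return self.counts.get(self._key(v), 0)
```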
- Step S1104 The saliency area determination unit 107 determines the saliency from the ratio between the in-scene frequency (the number of cluster members) obtained in step S1101 and the out-of-scene frequency (the number of search results with similarity equal to or greater than a predetermined value) obtained in step S1103. For each cluster whose saliency is equal to or greater than a predetermined value, the candidate area whose feature quantity is closest to the representative vector is output as a saliency area.
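- Putting steps S1101 through S1104 together, a sketch of the frequency-ratio determination (the cluster count, similarity radius, and ratio threshold are assumed tuning values):

```python
import numpy as np
from sklearn.cluster import KMeans

def saliency_by_frequency_ratio(scene_feats, out_of_scene_feats,
                                n_clusters=20, sim_dist=0.5, ratio=5.0):
    """Cluster in-scene candidate features (S1101), take each cluster's
    centroid as its representative vector (S1102), estimate out-of-scene
    frequency by counting stored features near the representative (S1103),
    and emit one salient candidate per cluster whose in/out frequency
    ratio is high enough (S1104). Inputs are 2-D NumPy arrays."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(scene_feats)
    salient = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        rep = km.cluster_centers_[c]
        out_freq = np.sum(
            np.linalg.norm(out_of_scene_feats - rep, axis=1) < sim_dist) + 1
        if len(members) / out_freq >= ratio:
            d = np.linalg.norm(scene_feats[members] - rep, axis=1)
            salient.append(int(members[np.argmin(d)]))  # closest to rep
    return salient
```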
- FIG. 12 is an explanatory diagram of saliency determination based on region tracking.
- FIG. 13 is a flowchart for explaining a processing flow of saliency determination based on region tracking. Hereinafter, each step of FIG. 13 will be described.
- Step S1301 The saliency area determination unit 107 associates the candidate areas in the scene across adjacent frames. For example, the similarity of the first feature amounts or the coordinate values may be used for the association. In addition, to handle cases where tracking is interrupted by occlusion or the like, the association may tolerate a predetermined number of missing frames.
- Step S1302 The saliency area determination unit 107 calculates the duration, movement path, and movement amount of the locus from the locus of the candidate region obtained in step S1301.
- Step S1303 The saliency area determination unit 107 calculates the average movement amount over all the trajectories obtained in step S1302, and obtains the saliency of each trajectory from its duration and its deviation from the average movement amount.
- From each trajectory whose saliency is equal to or greater than a predetermined value, one or more candidate areas are selected as saliency areas, for example using information such as region size, edge strength, and low blur (a frame with little change in movement amount).
- The first feature amount is used to classify candidate areas within a scene, so the first feature amount extraction algorithm may be switched according to the scene characteristics. For example, for video shot in a dark place, candidate areas in the scene can be classified effectively by using only shape and motion information rather than color. Alternatively, parameters such as luminance correction may be changed per scene without changing the feature extraction algorithm itself.
- FIG. 14 is a flowchart showing switching of the first feature amount by scene determination. Hereinafter, each step of FIG. 14 will be described.
- the video input unit 105 performs scene discrimination using the image feature amount extracted from the frame.
- the scene type, the corresponding parameter, and the first feature extraction method are set when the system is constructed. For example, the process branches to a first feature amount extraction process in which shape, color, and movement are emphasized as follows.
- the first feature amount extraction unit 106 extracts shape feature amounts from the candidate areas.
- the saliency area determination unit 107 performs a saliency area determination process focusing on the shape.
- the first feature amount extraction unit 106 extracts a color feature amount from the candidate area.
- the saliency area determination unit 107 performs saliency area determination processing focusing on color.
- the first feature quantity extraction unit 106 extracts a motion feature quantity from the candidate area.
- The saliency area determination unit 107 performs a saliency area determination process focusing on motion. Note that when the first feature amount is switched by scene discrimination, the out-of-scene candidate regions used in the frequency-based saliency determination described with reference to FIG. 10 are limited to regions extracted with the same extraction method.
- As described above, the image search method of the present embodiment includes: a first step in which a plurality of images are input; a second step in which a plurality of first regions are extracted from the plurality of images and a first feature amount is extracted from each first region; a third step in which first feature amounts having a low appearance frequency are selected from the distribution of the plurality of first feature amounts extracted from the plurality of images, and the first regions containing them are identified as second regions; a fourth step in which the first feature amount extracted from each second region, the second region, and the image from which the second region was extracted are stored in a storage unit; and a fifth step in which a search is performed using the first feature amount.
- By storing only the feature quantities extracted from the partial areas (second regions) identified in this way and using them for the search, the amount of registered data can be reduced and the search speed improved.
- 100: Video search system, 101: Video storage device, 102: Input device, 103: Display device, 104: Video search device, 105: Video input unit, 106: First feature amount extraction unit, 107: Area determination unit, 108: Second feature amount extraction unit, 109: Video database, 201: Processor, 202: Storage device, 203: Processing program, 204: Network interface device, 802: Data registration button, 803: Registration option designation area, 804: Query read button, 805: Query image display area, 806: Search option designation area, 807: Search button, 808: Search result display area
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
The invention concerns an image search device comprising: an input unit into which a plurality of images are input; a first extraction unit which extracts, from the plurality of images, a plurality of first regions (for example, candidate regions, partial areas) and which extracts a first feature value from each of the first regions; a region determination unit which selects, from a distribution of the plurality of first feature values extracted from the plurality of images, the first feature values with a low frequency of appearance, and which specifies the first regions containing those first feature values as second regions (for example, salient regions, search regions); a storage unit which stores the first feature values extracted from the second regions, the second regions, and the images from which the second regions were extracted; and a search unit which performs a search using the first feature value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2015/051433 WO2016117039A1 (fr) | 2015-01-21 | 2015-01-21 | Image search device, image search method, and information storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2015/051433 WO2016117039A1 (fr) | 2015-01-21 | 2015-01-21 | Image search device, image search method, and information storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016117039A1 true WO2016117039A1 (fr) | 2016-07-28 |
Family
ID=56416605
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/051433 WO2016117039A1 (fr) | 2015-01-21 | 2015-01-21 | Image search device, image search method, and information storage medium |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2016117039A1 (fr) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108304506A (zh) * | 2018-01-18 | 2018-07-20 | Tencent Technology (Shenzhen) Co., Ltd. | Retrieval method, apparatus and device |
WO2019156043A1 (fr) * | 2018-02-06 | 2019-08-15 | Nippon Telegraph and Telephone Corporation | Content determination device, content determination method, and program |
CN112685586A (zh) * | 2019-10-18 | 2021-04-20 | Fuji Xerox Co., Ltd. | Search condition determination system, search system, and storage medium |
WO2021131343A1 (fr) * | 2019-12-26 | 2021-07-01 | Dwango Co., Ltd. | Content distribution system, content distribution method, and content distribution program |
- 2015-01-21 WO PCT/JP2015/051433 patent/WO2016117039A1/fr active Application Filing
Non-Patent Citations (2)
Title |
---|
SIVIC, JOSEF ET AL.: "Video Google: Efficient Visual Search of Videos", TOWARD CATEGORY-LEVEL OBJECT RECOGNITION (LNCS 4170), 2006, pages 127-144 *
YUSUKE UCHIDA ET AL.: "Daikibo Tokutei Buttai Ninshiki Gijutsu Oyobi sono Saishin Kenkyu Jirei" [Large-scale specific object recognition technology and its latest research examples], IMAGE LAB, vol. 24, no. 12, 10 December 2013 (2013-12-10), pages 61-68 *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108304506A (zh) * | 2018-01-18 | 2018-07-20 | Tencent Technology (Shenzhen) Co., Ltd. | Retrieval method, apparatus and device |
CN108304506B (zh) * | 2018-01-18 | 2022-08-26 | Tencent Technology (Shenzhen) Co., Ltd. | Retrieval method, apparatus and device |
WO2019156043A1 (fr) * | 2018-02-06 | 2019-08-15 | Nippon Telegraph and Telephone Corporation | Content determination device, content determination method, and program |
JP2019139326A (ja) * | 2018-02-06 | 2019-08-22 | Nippon Telegraph and Telephone Corporation | Content determination device, content determination method, and program |
CN112685586A (zh) * | 2019-10-18 | 2021-04-20 | Fuji Xerox Co., Ltd. | Search condition determination system, search system, and storage medium |
WO2021131343A1 (fr) * | 2019-12-26 | 2021-07-01 | Dwango Co., Ltd. | Content distribution system, content distribution method, and content distribution program |
JP2021106324A (ja) * | 2019-12-26 | 2021-07-26 | Dwango Co., Ltd. | Content distribution system, content distribution method, and content distribution program |
JP2021106378A (ja) * | 2019-12-26 | 2021-07-26 | Dwango Co., Ltd. | Content distribution system, content distribution method, and content distribution program |
JP7408506B2 (ja) | 2019-12-26 | 2024-01-05 | Dwango Co., Ltd. | Content distribution system, content distribution method, and content distribution program |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102560308B1 (ko) | System and method for appearance search | |
Shen et al. | Multiobject tracking by submodular optimization | |
JP5358083B2 (ja) | Person image search device and image search device | |
Yu et al. | Trajectory-based ball detection and tracking in broadcast soccer video | |
JP4725690B2 (ja) | Video identifier extraction device | |
CN104520875B (zh) | Method and apparatus for extracting descriptors from video content, preferably for search and retrieval purposes | |
JP2022003831A (ja) | Apparatus, method, and program | |
JP5097280B2 (ja) | Method and apparatus for representing, comparing, and retrieving images and image groups, program, and computer-readable storage medium | |
US9934423B2 (en) | Computerized prominent character recognition in videos | |
CN105593855A (zh) | Image retrieval system, image retrieval device, and image retrieval method | |
CN102117313A (zh) | Video retrieval method and system | |
JP2016095849A (ja) | Foreground image segmentation method and apparatus, program, and recording medium | |
CN102609548A (zh) | Video content retrieval method and system based on moving objects | |
JP5180922B2 (ja) | Image search system and image search method | |
JP4907938B2 (ja) | Method for representing at least one image and a group of images, representation of an image or group of images, method for comparing images and/or groups of images, method for encoding an image or group of images, method for decoding an image or image sequence, use of encoded data, apparatus for representing an image or group of images, apparatus for comparing images and/or groups of images, computer program, system, and computer-readable storage medium | |
Li et al. | Structuring lecture videos by automatic projection screen localization and analysis | |
Omidyeganeh et al. | Video keyframe analysis using a segment-based statistical metric in a visually sensitive parametric space | |
US20170013230A1 (en) | Video processing system | |
WO2016117039A1 (fr) | Image search device, image search method, and information storage medium | |
Tippaya et al. | Video shot boundary detection based on candidate segment selection and transition pattern analysis | |
JP5356289B2 (ja) | Image search system | |
Xu et al. | Fast and accurate object detection using image cropping/resizing in multi-view 4K sports videos | |
JP5192437B2 (ja) | Object region detection device, object region detection method, and object region detection program | |
JP6948787B2 (ja) | Information processing apparatus, method, and program | |
Lee et al. | Hierarchical active shape model with motion prediction for real-time tracking of non-rigid objects |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15878734 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 15878734 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: JP |