
WO2018171234A1 - Method and apparatus for video processing - Google Patents

Method and apparatus for video processing

Info

Publication number
WO2018171234A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
target object
identification information
information
coordinate system
Prior art date
Application number
PCT/CN2017/112342
Other languages
English (en)
Chinese (zh)
Inventor
徐异凌
张文军
黄巍
胡颖
马展
吴钊
李明
吴平
Original Assignee
上海交通大学
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海交通大学, 中兴通讯股份有限公司
Publication of WO2018171234A1

Definitions

  • The present disclosure relates to the field of communications, and in particular, to a video processing method and apparatus.
  • The core of these application scenarios is often the study and processing of the region of interest (ROI), that is, the user's area of interest.
  • The user's area of interest is the video area on which the user's line of sight is mainly concentrated while watching the video media.
  • In the related art, the user needs to find the video areas of interest by identifying the video content in a large number of received videos, which requires a large amount of resources and time; there is currently no reasonable solution to this problem.
  • An embodiment of the present disclosure provides a video processing method and apparatus, so as to at least solve the problem in the related art that a user needs to find the video areas of interest by identifying the video content in a large number of received videos, which requires a large amount of resources and time.
  • In one embodiment, a video processing method includes: marking a target object in a video, and generating identification information of the target object according to the marking result, wherein the identification information is set to indicate at least one of the following: a type of the target object, a content of the target object, and spatial location information of the target object in the video; acquiring instruction information, and indexing the specified identification information of the specified target object according to the instruction information; and pushing or displaying the part or all of the video corresponding to the specified identification information in the video.
  • In one embodiment, the identification information is set to indicate at least one of the following: a mark type of the identification information, a mark content type of the identification information, a mark content of the identification information, length information of the identification information, a quality level of the part or all of the video in which the target object is located, the quantity of identification information included in the part or all of the video in which the target object is located, time information corresponding to the part or all of the video in which the target object is located, and spatial location information of the part or all of the video in the video.
  • The spatial location information of the part or all of the video in the video includes at least one of the following: the center point coordinates of the part or all of the video, the width of the part or all of the video, and the height of the part or all of the video.
  • The coordinate system in which the coordinates are located is one of the following: a two-dimensional space coordinate system and a three-dimensional space coordinate system.
  • In a two-dimensional space coordinate system, the coordinate value is at least one of the following: a value in a two-dimensional rectangular coordinate system and a value in a two-dimensional spherical coordinate system; in a three-dimensional space coordinate system, the coordinate value is at least one of the following: a value in a three-dimensional rectangular coordinate system and a value in a three-dimensional spherical coordinate system.
  • Marking the target object in the video and generating the identification information of the target object according to the marking result includes: marking the target object in the video during video capture or editing and generating the identification information of the target object according to the marking result; and/or marking the target object in the captured or edited video data and generating the identification information of the target object according to the marking result.
  • Acquiring the instruction information for indicating the at least one specified target object comprises: acquiring first instruction information preset by the user; and/or acquiring second instruction information obtained by analyzing the user's video viewing behavior.
  • In one embodiment, a video processing apparatus comprises: a marking module configured to mark a target object in a video; a generating module configured to generate identification information of the target object according to the marking result, wherein the identification information is set to indicate at least one of: a type of the target object, a content of the target object, and spatial location information of the target object in the video; an obtaining module configured to acquire instruction information; an indexing module configured to index the specified identification information of the specified target object according to the instruction information; and a processing module configured to push or display the part or all of the video corresponding to the specified identification information in the video.
  • In one embodiment, the identification information is set to indicate at least one of: a mark type of the identification information, a mark content type of the identification information, length information of the identification information, a mark content of the identification information, a quality level of the part or all of the video in which the target object is located, the quantity of identification information included in the part or all of the video in which the target object is located, time information corresponding to the part or all of the video, and spatial location information of the part or all of the video in the video.
  • The spatial location information of the part or all of the video in the video includes at least one of the following: the center point coordinates of the part or all of the video, the width of the part or all of the video, and the height of the part or all of the video.
  • The coordinate system in which the coordinates are located is one of the following: a two-dimensional space coordinate system and a three-dimensional space coordinate system.
  • In a two-dimensional space coordinate system, the coordinate value is at least one of the following: a value in a two-dimensional rectangular coordinate system and a value in a two-dimensional spherical coordinate system; in a three-dimensional space coordinate system, the coordinate value is at least one of the following: a value in a three-dimensional rectangular coordinate system and a value in a three-dimensional spherical coordinate system.
  • The marking module comprises: a first marking unit configured to mark the target object in the video during video capture or editing; and a second marking unit configured to mark the target object in the captured or edited video data.
  • the obtaining module includes: a first acquiring unit configured to acquire first instruction information preset by the user; and a second acquiring unit configured to acquire second instruction information obtained after analyzing the video viewing behavior of the user.
  • In one embodiment, a storage medium including a stored program is provided, wherein when the program runs, the video processing method in the above embodiment is performed.
  • In one embodiment, a processor configured to run a program is provided, wherein when the program runs, the video processing method in the above embodiment is performed.
  • In the embodiments of the present disclosure, the target object in the video is marked and the identification information of the target object is generated according to the marking result, where the identification information includes at least the spatial location information of the target object in the video; instruction information indicating the specified target object is then acquired, the specified identification information of the specified target object is indexed according to the instruction information, and the part or all of the video corresponding to the specified identification information is pushed or displayed according to the spatial location information in the identification information, where the part or all of the video is contained in the entire video.
  • The above method solves the problem in the related art that the user needs to find the video areas of interest by identifying the video content in a large number of received videos, which requires a large amount of resources and time: the user can quickly acquire the video of interest by indexing the identification information already present in the video, which greatly saves resources and time during video retrieval.
  • FIG. 1 is a schematic diagram of an application environment of an optional video processing method according to an embodiment of the present disclosure
  • FIG. 2 is a flowchart of a method of processing an optional video according to an embodiment of the present disclosure
  • FIG. 3 is a structural block diagram of an optional video processing apparatus according to an embodiment of the present disclosure.
  • FIG. 4 is a block diagram showing the structure of an optional video processing apparatus according to an embodiment of the present disclosure.
  • FIG. 5 is a structural block diagram of an optional video processing apparatus according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of the content of optional identification information in an embodiment of the present disclosure;
  • FIG. 7 is a schematic diagram of an optional video positioning method according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of an optional video retrieval method according to an embodiment of the present disclosure.
  • FIG. 1 is a schematic diagram of an application environment of an optional video processing method of the embodiment.
  • The video processing method may be, but is not limited to being, applied to the application environment shown in FIG. 1.
  • The terminal 102 is connected to the server 106, and the server 106 may push video files to the terminal 102.
  • An application client 104 that can receive and display video images runs on the terminal 102.
  • The server 106 marks the target object in the video image and generates identification information of the target object according to the marking result, wherein the identification information is set to indicate at least one of: the type of the target object, the content of the target object, and the spatial location information of the target object in the video image. The server 106 acquires instruction information and indexes the specified identification information of the specified target object according to the instruction information, wherein the instruction information is set to indicate at least one specified target object. The server 106 then pushes the video corresponding to the specified identification information, where the video includes part or all of the video. It should be noted that each of the foregoing steps performed by the server 106 can also be performed at the terminal 102; this embodiment of the disclosure does not limit this.
  • Embodiments of the present disclosure also provide a method of processing a video.
  • FIG. 2 is a flowchart of an optional video processing method according to an embodiment of the present disclosure. As shown in FIG. 2, an optional flow of the video processing method includes the following steps:
  • Step S202: marking the target object in the video, and generating identification information of the target object according to the marking result, wherein the identification information is set to indicate at least one of: the type of the target object, the content of the target object, and the spatial location information of the target object in the video;
  • Step S204: acquiring instruction information, and indexing the specified identification information of the specified target object according to the instruction information;
  • Step S206: pushing or displaying the part or all of the video corresponding to the specified identification information.
  • In this method, the target object in the video is marked and the identification information of the target object is generated according to the marking result, where the identification information includes at least the spatial location information of the target object in the video; instruction information indicating the specified target object is then acquired, the specified identification information of the specified target object is indexed according to the instruction information, and the part or all of the video corresponding to the specified identification information is pushed or displayed according to the spatial location information in the identification information, where the part or all of the video is contained in the entire video.
  • The above method solves the problem in the related art that the user needs to find the video areas of interest by identifying the video content in a large number of received videos, which requires a large amount of resources and time: the user can quickly acquire the video of interest by indexing the identification information already present in the video, which greatly saves resources and time during video retrieval.
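  • As a concrete illustration of steps S202 to S206, the following is a minimal Python sketch of the mark/index/push flow. All names here (Label, Video, mark_target, index_labels, push_region) are illustrative assumptions, not identifiers defined by the disclosure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Label:
    """Identification information generated for a marked target object (S202)."""
    target_type: str   # e.g. "face" or "license plate"
    content: str       # tag content, e.g. a name or a plate number
    center: tuple      # spatial location: center point coordinates of the region
    width: float       # width of the labeled video region
    height: float      # height of the labeled video region

@dataclass
class Video:
    frames: object                                     # the video data itself; opaque here
    labels: List[Label] = field(default_factory=list)  # identification information

def mark_target(video: Video, label: Label) -> None:
    """S202: mark a target object and attach its identification information."""
    video.labels.append(label)

def index_labels(video: Video, wanted_type: str) -> List[Label]:
    """S204: index the identification information named by the instruction information."""
    return [lab for lab in video.labels if lab.target_type == wanted_type]

def push_region(label: Label):
    """S206: push/display the part of the video given by the label's spatial info."""
    # A real implementation would crop or stream the region around label.center.
    return (label.center, label.width, label.height)

# Usage: the instruction information specifies that the user wants faces.
video = Video(frames=None)
mark_target(video, Label("face", "person A", center=(0.4, 0.3), width=0.1, height=0.2))
for lab in index_labels(video, "face"):
    print(push_region(lab))
```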
  • In an optional embodiment, the identification information is set to indicate at least one of the following: the tag type of the identification information, the tag content type of the identification information, the tag content of the identification information, the length information of the identification information, the quality level of the part or all of the video in which the target object is located, the quantity of identification information contained in the part or all of the video in which the target object is located, the time information corresponding to the part or all of the video, and the spatial location information of the part or all of the video in the video.
  • The spatial location information of the part or all of the video in the video includes at least one of the following: the center point coordinates of the part or all of the video, the width of the part or all of the video, and the height of the part or all of the video.
  • The coordinate system in which the coordinates are located is one of the following: a two-dimensional space coordinate system and a three-dimensional space coordinate system.
  • In a two-dimensional space coordinate system, the coordinate value includes at least one of the following: a value in a two-dimensional rectangular coordinate system and a value in a two-dimensional spherical coordinate system.
  • The value in a two-dimensional rectangular coordinate system can be expressed as (x, y), and the value in a two-dimensional spherical coordinate system can be expressed as (pitch angle coordinate value, yaw angle coordinate value).
  • In a three-dimensional space coordinate system, the coordinate value is at least one of the following: a value in a three-dimensional rectangular coordinate system and a value in a three-dimensional spherical coordinate system.
  • The value in a three-dimensional rectangular coordinate system can be expressed as (x, y, z), and the value in a three-dimensional spherical coordinate system can be expressed as (pitch angle coordinate value, yaw angle coordinate value, roll angle coordinate value).
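  • The four coordinate variants above can be captured in a small tagged structure. The sketch below shows one possible in-memory layout and is an assumption, not a format mandated by the disclosure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpatialLocation:
    """Center-point coordinates of a video region in one of the four systems above."""
    system: str                    # "2d-rect", "2d-sphere", "3d-rect", or "3d-sphere"
    x: Optional[float] = None      # rectangular coordinates
    y: Optional[float] = None
    z: Optional[float] = None      # three-dimensional rectangular only
    pitch: Optional[float] = None  # spherical coordinates, in degrees
    yaw: Optional[float] = None
    roll: Optional[float] = None   # three-dimensional spherical only

# (x, y) in a two-dimensional rectangular system:
p1 = SpatialLocation("2d-rect", x=120.0, y=80.0)
# (pitch, yaw, roll) in a three-dimensional spherical system:
p2 = SpatialLocation("3d-sphere", pitch=-15.0, yaw=42.5, roll=0.0)
```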
  • In an optional embodiment, the target object in the video is marked and the identification information of the target object is generated according to the marking result by: marking the target object in the video during video capture or editing and generating the identification information of the target object according to the marking result; and/or marking the target object in the captured or edited video data and generating the identification information of the target object according to the marking result.
  • Acquiring the instruction information set to indicate the at least one specified target object comprises: acquiring first instruction information preset by the user; and/or acquiring second instruction information obtained by analyzing the user's video viewing behavior.
  • The modules described below may be implemented by software, hardware, or a combination of the two that realizes a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 3 is a structural block diagram of an optional video processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 3, the apparatus comprises:
  • a marking module 302, configured to mark a target object in the video;
  • a generating module 304, configured to generate identification information of the target object according to the marking result, wherein the identification information is set to indicate at least one of: a type of the target object, a content of the target object, and spatial location information of the target object in the video;
  • an obtaining module 306, configured to acquire instruction information, where the instruction information is set to indicate at least one specified target object;
  • an indexing module 308, configured to index the specified identification information of the specified target object according to the instruction information; and
  • a processing module 310, configured to push or display the part or all of the video corresponding to the specified identification information in the video.
  • In this apparatus, the marking module marks the target object in the video;
  • the generating module generates the identification information of the target object according to the marking result, where the identification information includes at least the spatial location information of the target object in the video;
  • the obtaining module acquires the instruction information indicating the specified target object;
  • the indexing module indexes the specified identification information of the specified target object according to the instruction information; and
  • the processing module pushes or displays the part or all of the video corresponding to the specified identification information according to the spatial location information in the identification information.
  • This solves the problem that the user needs to find the video areas of interest by identifying the video content in a large number of received videos, which requires a large amount of resources and time; the user can quickly acquire the video push of interest by indexing the identification information already present in the video, greatly saving resources and time in the video retrieval process.
  • The identification information is set to indicate at least one of: the tag type of the identification information, the tag content type of the identification information, the length information of the identification information, the tag content of the identification information, the quality level of the part or all of the video in which the target object is located, the quantity of identification information contained in the part or all of the video, the time information corresponding to the part or all of the video, and the spatial location information of the part or all of the video in the video.
  • The spatial location information of the part or all of the video in the video includes at least one of the following: the center point coordinates of the part or all of the video, the width of the part or all of the video, and the height of the part or all of the video.
  • The coordinate system in which the coordinates are located is one of the following: a two-dimensional space coordinate system and a three-dimensional space coordinate system.
  • In a two-dimensional space coordinate system, the coordinate value includes at least one of the following: a value in a two-dimensional rectangular coordinate system and a value in a two-dimensional spherical coordinate system.
  • The value in a two-dimensional rectangular coordinate system can be expressed as (x, y), and the value in a two-dimensional spherical coordinate system can be expressed as (pitch angle coordinate value, yaw angle coordinate value).
  • In a three-dimensional space coordinate system, the coordinate value is at least one of the following: a value in a three-dimensional rectangular coordinate system and a value in a three-dimensional spherical coordinate system.
  • The value in a three-dimensional rectangular coordinate system can be expressed as (x, y, z), and the value in a three-dimensional spherical coordinate system can be expressed as (pitch angle coordinate value, yaw angle coordinate value, roll angle coordinate value).
  • The embodiments of the present disclosure also provide an optional video processing apparatus.
  • FIG. 4 is a structural block diagram of an optional video processing apparatus according to an embodiment of the present disclosure.
  • The marking module 302 includes: a first marking unit 3020 configured to mark the target object in the video during video capture or editing; and a second marking unit 3022 configured to mark the target object in the captured or edited video data.
  • The obtaining module 306 includes: a first obtaining unit 3060 configured to acquire first instruction information preset by the user; and a second obtaining unit 3062 configured to acquire second instruction information obtained by analyzing the user's video viewing behavior.
  • The foregoing apparatus may be applied to any hardware device having the foregoing functional modules in the server or the terminal, which is not limited in the embodiments of the present disclosure.
  • FIG. 5 is a structural block diagram of an optional video processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 5, the apparatus includes:
  • a memory 52 configured to store instructions executable by the processor 50; and the processor 50, configured, based on the instructions stored in the memory 52, to: mark a target object in the video and generate identification information of the target object according to the marking result, wherein the identification information is set to indicate at least one of: the type of the target object, the content of the target object, and the spatial location information of the target object in the video; acquire instruction information and index the specified identification information of the specified target object according to the instruction information; and push or display the part or all of the video corresponding to the specified identification information.
  • The processor 50 described above may also perform any of the video processing methods described above.
  • The processor marks the target object in the video and generates the identification information of the target object according to the marking result, where the identification information includes at least the spatial location information of the target object in the video; it then acquires instruction information indicating the specified target object, indexes the specified identification information of the specified target object according to the instruction information, and pushes the part or all of the video corresponding to the specified identification information according to the spatial location information in the identification information, where the part or all of the video is contained in the entire video.
  • This solves the problem in the related art that the user needs to find the video areas of interest by identifying the video content in a large number of received videos, which requires a large amount of resources and time; the user can quickly obtain the video push of interest by indexing the identification information already present in the video, which greatly saves resources and time in the video retrieval process.
  • Embodiments of the present disclosure also provide a storage medium including a stored program, wherein when the program runs, the video processing method in the above embodiments and their optional examples is performed.
  • This embodiment introduces the technical solutions of the embodiments of the present disclosure based on an exemplary application scenario.
  • An embodiment of the present disclosure provides an identification information marking method based on video content and its spatial location, which can attach corresponding identification information to specific content or to a video area at a specific spatial location in the video medium, so that content of interest to the user can be associated, through the identification information provided in the embodiments of the present disclosure, with the corresponding video area.
  • The video area here can be understood as a video image within a certain range around the target object with which the information is associated; the size and shape of the area can be customized, which is not limited in this embodiment.
  • Video positioning: based on information about the user's habits, preferences, and the like acquired in advance, and on the identification information based on specific video content and spatial location, the corresponding video area is directly located and pushed to the user.
  • The video positioning application of the present disclosure can implement panoramic video applications such as selecting the initial viewing angle, or giving priority to areas of interest to the user.
  • Video retrieval: directly retrieving the video content required by the user from a large number of videos. For example, in a video surveillance application scenario, it is necessary to process an area of interest to the user quickly and in a concentrated manner.
  • The present disclosure provides a marking method based on identification information of video content and its spatial location; therefore, the corresponding video area can be quickly retrieved by retrieving the identification information provided by the present disclosure, which greatly improves the efficiency of video retrieval.
  • The embodiments of the present disclosure adopt the following technical solutions. It should be noted that the video content attachment tag or label mentioned in the embodiments of the present disclosure may be understood as the identification information based on the video content and its spatial location.
  • An object of the present disclosure is to provide an identification information marking method based on video content and its spatial location. Exemplarily, for the video picture finally presented to the user, a video area containing specific content or located at a specific spatial position is uniquely associated with specific video label information.
  • The identification information based on the video content and its spatial location to be added may take various forms.
  • The following set of information may be used as an example:
  • Information one is set to indicate the label type of the video content attachment label of the area; information two is set to indicate the label content type of that label; information three is set to indicate the label content of that label; information four is set to indicate the quality level of the video content in the area; and information five is set to indicate the spatial location of the area in the overall video.
  • The disclosure marks identification information on specific content or a specific spatial location of the video medium, and the identification information indicates the content category, content information, content quality, and content location of that portion of the video.
  • The video label information provided by the present disclosure may, in one embodiment, be processed and presented by a client application or service.
  • The server can analyze the video content through image processing, pattern recognition, and similar techniques during the video capture and acquisition phase, and mark specific content or specific spatial locations of the video media based on the analysis results.
  • Alternatively, the server may mark specific content or specific spatial locations of the video media during the video editing process.
  • Alternatively, the server marks the specific content or specific spatial locations of the video media in the captured or edited video data.
  • The server may place the marked specific content or specific spatial location information in a reserved field of the video stream or codestream.
  • Alternatively, the server creates the tag data associated with the corresponding video data separately.
  • The client used by the user can also separately create tag data for the corresponding video according to the user's usage habits and feed it back to the server.
  • After receiving the video media, the user can learn the specific content and spatial location of the video from the identification information, and then perform further application processing.
  • The server may first obtain the video area matching the user information by matching preset user information against the identification information marked in the video, and then push it according to the user's preferences or settings.
  • Alternatively, the server dynamically matches the identification information according to the user's video viewing requirements for specific content and pushes the corresponding video area to the user.
  • Alternatively, the server pushes the complete video to the user, and the terminal obtains the video area matching the user information according to the preset user information and the identification information marked in the video, and displays it according to the user's preferences or settings.
  • Alternatively, the server pushes the complete video to the user, and the terminal dynamically matches the identification information according to the user's viewing requirements for specific content and displays the corresponding video area to the user.
  • The user information here may include, but is not limited to, at least one of: the user's viewing habits, the user's attention to specific content, the user's preferences, and the user's specific usage.
  • The identification information here may be set to indicate, but is not limited to, at least one of the following: the label type of the video content attachment label of the area, the label content type of that label, the information of the label content of that label, the quality level of the video content in the area, and the spatial location of the area in the overall video.
  • The server can directly locate the video area matching the user information and push it to the user, and the terminal can likewise directly locate the video area matching the user information and display it to the user.
  • The user information here may be collected in advance, before the video is pushed, or may be obtained by collecting the user's feedback while the user watches the video, which is not limited in this embodiment. If the user information is collected in advance, the matched video area can be pushed to the user at the initial stage of viewing; if the user information is collected while the user watches the video, the matched video area is pushed to the user during subsequent viewing, after the user information has been analyzed and matched against the identification information in the video.
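  • A minimal sketch of the matching step just described, reusing the Label and Video sketch above and assuming the user information is reduced to a set of preferred label types; the function name and the preference representation are illustrative assumptions.

```python
def match_regions(user_preferred_types, labels):
    """Return the labels whose type matches the preset or collected user information.

    user_preferred_types: label types the user is known to favor, either
    collected in advance or accumulated as feedback while the user watches.
    labels: the identification information already marked in the video.
    """
    return [lab for lab in labels if lab.target_type in user_preferred_types]

# User information collected in advance: push matched regions from the start.
matched = match_regions({"plant", "face"}, video.labels)
# User information collected during viewing: re-run match_regions as feedback
# accumulates, then push the matched regions for the rest of the session.
```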
  • The above marking process can be implemented by adding new identification information to the information related to the video media. This can be implemented in various ways, for example by the following set of fields.
  • quality_level: indicates the quality level of the video content in the area;
  • label_center_yaw: indicates the yaw angle coordinate value of the center point of the label area;
  • label_center_pitch: indicates the pitch angle coordinate value of the center point of the label area;
  • label_width: indicates the width of the label area;
  • label_height: indicates the height of the label area;
  • label_type: indicates the label type of the video content attachment label of the area;
  • label_info_type: indicates the label content type of the video content attachment label of the area;
  • label_info_content_length: indicates the length of the label content of the video content attachment label of the area;
  • content_byte: indicates the specific byte information of the label content of the video content attachment label of the area.
  • The identification information based on the video content and its spatial location, namely quality_level, label_center_yaw, label_center_pitch, label_width, label_height, label_type, label_info_type, label_info_content_length, and content_byte, is added as appropriate to form the identification of a video area with specific content at a specific spatial location.
  • label_number indicates the number of labels contained in this video area.
  • quality_level indicates the quality level of the video content in this area; the higher the value, the higher the video quality.
  • label_center_yaw indicates the yaw angle coordinate value of the center point of the label area, in units of 0.01 degrees, in the range [-18000, 18000).
  • label_center_pitch indicates the pitch angle coordinate value of the center point of the label area, in units of 0.01 degrees, in the range [-9000, 9000].
  • label_width indicates the width of the label area, in units of 0.01 degrees.
  • label_height indicates the height of the label area, in units of 0.01 degrees.
  • label_type indicates the label type of the video content attachment label of the area. The values and meanings of the label type are shown in Table 1.

    Table 1: label_type values
    Value    Meaning
    0        the attachment label of this video content is a face
    1        the attachment label of this video content is a license plate
    2        the attachment label of this video content is a general moving target
    3        the attachment label of this video content is a general static target
    4        the attachment label of this video content is a product
    5        the attachment label of this video content is a plant
    6-254    reserved
    255      the attachment label of this video content is a user-defined label
  • label_info_type indicates the label content type of the video content attachment label of the area. The values and meanings of the label content type are shown in Table 2.

    Table 2: label_info_type values
    Value    Meaning
    0        the label content is text
    1        the label content is a URL
    2-255    reserved
  • label_info_content_length indicates the length of the label content of the video content attachment label of the area.
  • content_byte indicates the specific byte information of the label content of the video content attachment label of the area.
  • The label group LabelBox corresponding to a video area contains label_number pieces of label information LabelInfoBox and one piece of label area information LabelRegionBox.
  • A label information LabelInfoBox contains a label type label_type, a label content type label_info_type, a label content length label_info_content_length, and label_info_content_length bytes of content information content_byte.
  • A label area information LabelRegionBox contains a quality level quality_level and spatial location information: the label area center point (label_center_yaw, label_center_pitch), the label area width label_width, and the label area height label_height.
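  • Putting the fields together, the following is a sketch of the LabelBox nesting described above, written with Python dataclasses rather than box syntax. The field names follow the disclosure; the container classes and the example values are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LabelInfoBox:
    label_type: int                 # Table 1 value, e.g. 0 = face
    label_info_type: int            # Table 2 value, e.g. 0 = text
    label_info_content_length: int  # number of content bytes
    content_byte: bytes             # the label content itself

@dataclass
class LabelRegionBox:
    quality_level: int       # higher value = higher video quality
    label_center_yaw: int    # units of 0.01 degrees, range [-18000, 18000)
    label_center_pitch: int  # units of 0.01 degrees, range [-9000, 9000]
    label_width: int         # units of 0.01 degrees
    label_height: int        # units of 0.01 degrees

@dataclass
class LabelBox:
    labels: List[LabelInfoBox] = field(default_factory=list)  # label_number entries
    region: Optional[LabelRegionBox] = None

    @property
    def label_number(self) -> int:
        return len(self.labels)

# A region centered at yaw 30.00 degrees, pitch -5.00 degrees, with one face label.
box = LabelBox(
    labels=[LabelInfoBox(label_type=0, label_info_type=0,
                         label_info_content_length=5, content_byte=b"Alice")],
    region=LabelRegionBox(quality_level=3, label_center_yaw=3000,
                          label_center_pitch=-500, label_width=2000,
                          label_height=1500),
)
```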
  • FIG. 6 is a schematic diagram of the content of optional identification information in an embodiment of the present disclosure.
  • Embodiment 1 Video Positioning Application
  • A panoramic video covers a 180-degree or 360-degree viewing range, but the human viewing angle is limited, so the entire panoramic video content cannot be viewed at the same time; only a part of the panoramic video is viewed at any moment. Therefore, users can view different area videos in the panorama in different browsing orders. It is worth noting that the areas a user views in the panoramic video are not completely random; rather, the video area is switched according to the user's personal preference.
  • The disclosure provides a label associated with the video, which is set to indicate the specific content and specific spatial location information of a partial video area; the corresponding video area is then directly located according to the user's preference, and that part of the video is presented to the user. Several examples follow.
  • The identification information of the corresponding video area is marked in the recorded panoramic video content, and during viewing the video area containing the label is preferentially pushed to the user according to the user's preference for the label type.
  • FIG. 7 is a schematic diagram of an optional video positioning method according to an embodiment of the present disclosure.
  • The label may indicate that the corresponding area contains a face, a plant, or other content. If the user likes to pay attention to the plants in the video, the video content of the corresponding region can be preferentially pushed to the user by locating the plant label and using the corresponding spatial location information and rotation information when the user views the panoramic video.
  • Since the user's viewing area is limited at any moment, the entire panoramic video will not be viewed at once. Therefore, under limited bandwidth, the user's region of interest can be encoded at high quality while the regions the user is not interested in are encoded at low quality.
  • For example, the face area the user is interested in is encoded in a high-quality mode, and the other parts are encoded at low quality.
  • The user can set multiple labels of interest, and the optimal area video is pushed to the user according to the various possible combinations of these labels.
  • For example, the user is interested in a certain person and a certain car, and sets these two types as the labels of interest.
  • The video area containing both the person label and the car label is preferentially displayed; when no area contains both labels at the same time, an area containing one of the labels is displayed. A sketch of this matching logic follows.
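  • One way to realize this combination logic, as a sketch that reuses the LabelBox structure sketched above: rank regions by how many of the wanted label types they contain, so that a region carrying both labels wins and partial matches follow. The scoring rule is an assumption.

```python
def best_region(label_boxes, wanted_types):
    """Pick the region whose labels cover the most of the wanted label types.

    label_boxes: a list of LabelBox (see the sketch above).
    wanted_types: label_type values the user marked as interesting,
    e.g. {PERSON_TYPE, CAR_TYPE}.
    """
    def coverage(box):
        present = {info.label_type for info in box.labels}
        return len(present & wanted_types)

    # A region containing every wanted label ranks first; partial matches follow.
    return max(label_boxes, key=coverage, default=None)
```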
  • The label types added to the panoramic video can be preset, or they can be labels customized by the user according to the user's own needs.
  • The user can also combine related labels to define the combined types needed.
  • For example, the user sets a custom label for an item in the video and feeds the label information back to the server, and the server subsequently pushes the relevant video area to the user according to the set label.
  • The content carried by different labels in the panoramic video may take different forms. The label content may be text; for example, for a person label, the text content describes the person's name and resume.
  • The label content can be a number; for example, for a product label, the numeric content describes the price information.
  • The label content can be a link; for example, for a plant label, the link content gives a URL address that describes the plant in detail.
  • A label in the same video area can be associated with multiple types of content information.
  • For example, for a product, text information of the product name, numeric information of the product price or production date, and link information for the purchase path can all be added.
  • A label set for a panoramic video area can be nested to contain multiple sub-labels.
  • For example, multiple person sub-labels can be nested under the same sports label for the user to watch, as in the sketch below.
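  • Nesting can be sketched by letting a label carry child labels. This recursive extension is an assumption, since the disclosure does not specify a nesting syntax.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class NestedLabel:
    """A label that may contain sub-labels, e.g. person labels under a sports label."""
    label_type: int
    content: bytes = b""
    children: List["NestedLabel"] = field(default_factory=list)

# A sports label with two person sub-labels nested under it.
sports = NestedLabel(label_type=2, content=b"match", children=[
    NestedLabel(label_type=0, content=b"player 1"),  # 0 = face, per Table 1
    NestedLabel(label_type=0, content=b"player 2"),
])
```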
  • Embodiment 2 Virtual Reality Video Application
  • The video area viewed by the user is not the complete virtual reality video area, so the video content of interest can be pushed to the user by adding different labels.
  • Labels are added to the videos of multiple viewpoints, the user sets labels for the regions of interest, and the best viewpoint video can be selected and pushed to the user according to the user's labels of interest.
  • Embodiment 4 Video Retrieval Application
  • Acquired surveillance video is usually used to track target vehicles, target people, and the like, but because such tracking often requires analyzing and processing a large number of surveillance videos through image processing and other technologies in a short time, it brings a heavy workload to video surveillance applications.
  • If the video areas of specific content such as faces and license plates are marked while the surveillance video is being shot, the labels in the video can be retrieved directly after the surveillance video is received, greatly reducing the workload of video retrieval.
  • Several examples follow.
  • FIG. 8 is a schematic diagram of an optional video retrieval method according to an embodiment of the present disclosure.
  • The specific content in the video is labeled during surveillance video capture, and the user can directly retrieve the labels after receiving the surveillance video. For example, the labels of all license plates can be retrieved, and through the information associated with the labels, the video information of all the license plates contained in the video and the plate number information can finally be obtained.
  • Multiple labels can be set for video retrieval, and all relevant video areas are searched based on the various combinations of these labels.
  • For example, a certain person and a certain car are searched as a combined label, and video information containing both labels is finally obtained, as in the sketch below.
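  • A sketch of this retrieval: scan the label boxes attached to the received surveillance video and collect every region whose label set includes all queried types. The function and variable names are illustrative assumptions.

```python
FACE = 0           # Table 1 value
LICENSE_PLATE = 1  # Table 1 value

def retrieve(label_boxes, query_types):
    """Return every labeled region whose label set includes all queried types."""
    hits = []
    for box in label_boxes:
        present = {info.label_type for info in box.labels}
        if query_types <= present:  # all queried types are present in this region
            hits.append(box)
    return hits

# Label boxes parsed from the received surveillance video (empty placeholder here).
video_label_boxes = []
# All license-plate regions, with plate numbers in their content bytes:
plates = retrieve(video_label_boxes, {LICENSE_PLATE})
# Combined query: regions containing both a face and a license plate:
combined = retrieve(video_label_boxes, {FACE, LICENSE_PLATE})
```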
  • Video retrieval directly retrieves the video content required by the user from a large number of videos.
  • The present disclosure provides an identification information marking method based on video content and its spatial location; therefore, the corresponding video region can be quickly retrieved by retrieving the identification information provided by the present disclosure, which greatly improves the efficiency of video retrieval.
  • Embodiments of the present disclosure also provide a storage medium.
  • The foregoing storage medium may be configured to store the program code for executing the video processing method provided in the above embodiment.
  • The foregoing storage medium may be located in any computer terminal of a computer terminal group in a computer network, or in any mobile terminal of a mobile terminal group.
  • The storage medium is arranged to store program code arranged to perform the following steps: marking the target object in the video, and generating identification information of the target object according to the marking result, wherein the identification information is set to indicate at least one of: the type of the target object, the content of the target object, and the spatial location information of the target object in the video; acquiring instruction information, and indexing the specified identification information of the specified target object according to the instruction information; and pushing or displaying the part or all of the video corresponding to the specified identification information.
  • The disclosed technical content may be implemented in other manners.
  • The device embodiments described above are merely illustrative.
  • The division of the units is only a logical functional division.
  • In actual implementation, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
  • The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present disclosure.
  • Each functional unit in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present disclosure, or all or part of it, may be embodied in the form of a software product stored in a storage medium, including a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
  • In the video processing method provided by the embodiments of the present disclosure, a target object in a video is marked, identification information of the target object is generated according to the marking result, instruction information indicating the specified target object is acquired, the specified identification information of the specified target object is indexed according to the instruction information, and the part or all of the video corresponding to the specified identification information is pushed or displayed according to the spatial location information in the identification information.
  • This solves the problem in the related art that the user needs to find the video areas of interest by identifying the video content in a large number of received videos, which requires a large amount of resources and time; the user can quickly acquire the video of interest through the identification information already present in the indexed video, which greatly saves resources and time in the video retrieval process.

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a video processing method and apparatus, comprising: marking a target object in a video and generating identification information of the target object according to the marking result, the identification information being set to indicate at least one of the following: the type of the target object, the content of the target object, and the spatial location information of the target object in the video (S202); acquiring instruction information and indexing specified identification information of a specified target object according to the instruction information (S204); and pushing or displaying part or all of the video corresponding to the specified identification information within the video (S206).
PCT/CN2017/112342 2017-03-24 2017-11-22 Procédé et appareil de traitement de vidéo WO2018171234A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710186180.0 2017-03-24
CN201710186180.0A CN108628913B (zh) 2017-03-24 2017-03-24 视频的处理方法及装置

Publications (1)

Publication Number Publication Date
WO2018171234A1 true WO2018171234A1 (fr) 2018-09-27

Family

ID=63584114

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/112342 WO2018171234A1 (fr) 2017-03-24 2017-11-22 Procédé et appareil de traitement de vidéo

Country Status (2)

Country Link
CN (1) CN108628913B (fr)
WO (1) WO2018171234A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110798736B (zh) * 2019-11-28 2021-04-20 百度在线网络技术(北京)有限公司 视频播放方法、装置、设备和介质
CN111487889A (zh) * 2020-05-08 2020-08-04 北京金山云网络技术有限公司 控制智能设备的方法、装置、设备、控制系统及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101068342A (zh) * 2007-06-05 2007-11-07 西安理工大学 基于双摄像头联动结构的视频运动目标特写跟踪监视方法
CN101207807A (zh) * 2007-12-18 2008-06-25 孟智平 一种处理视频的方法及其系统
US20080252723A1 (en) * 2007-02-23 2008-10-16 Johnson Controls Technology Company Video processing systems and methods
CN101420595A (zh) * 2007-10-23 2009-04-29 华为技术有限公司 一种描述和捕获视频对象的方法及设备

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003079952A (ja) * 2001-09-14 2003-03-18 Square Co Ltd ビデオゲームのプログラムを記録したコンピュータ読み取り可能な記録媒体及びビデオゲームのプログラム及びビデオゲーム処理方法及びビデオゲーム処理装置
CN101930779B (zh) * 2010-07-29 2012-02-29 华为终端有限公司 一种视频批注方法及视频播放器
CN104602128A (zh) * 2014-12-31 2015-05-06 北京百度网讯科技有限公司 视频处理方法和视频处理装置
CN104837034B (zh) * 2015-03-09 2019-04-12 腾讯科技(北京)有限公司 一种信息处理方法、客户端及服务器
CN106303401B (zh) * 2015-05-12 2019-12-06 杭州海康威视数字技术股份有限公司 视频监控方法、设备及其系统和基于商场的视频监控方法
CN105843541A (zh) * 2016-03-22 2016-08-10 乐视网信息技术(北京)股份有限公司 全景视频中的目标追踪显示方法和装置
CN105847998A (zh) * 2016-03-28 2016-08-10 乐视控股(北京)有限公司 一种视频播放方法、播放终端及媒体服务器
CN105933650A (zh) * 2016-04-25 2016-09-07 北京旷视科技有限公司 视频监控系统及方法
CN106023261B (zh) * 2016-06-01 2019-11-29 无锡天脉聚源传媒科技有限公司 一种电视视频目标跟踪的方法及装置
CN106254925A (zh) * 2016-08-01 2016-12-21 乐视控股(北京)有限公司 基于视频识别的目标对象提取方法、设备以及系统
CN106303726B (zh) * 2016-08-30 2021-04-16 北京奇艺世纪科技有限公司 一种视频标签的添加方法及装置
CN106504187A (zh) * 2016-11-17 2017-03-15 乐视控股(北京)有限公司 视频识别方法以及装置
CN106534944B (zh) * 2016-11-30 2020-01-14 北京字节跳动网络技术有限公司 视频展现方法和装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080252723A1 (en) * 2007-02-23 2008-10-16 Johnson Controls Technology Company Video processing systems and methods
CN101068342A (zh) * 2007-06-05 2007-11-07 西安理工大学 基于双摄像头联动结构的视频运动目标特写跟踪监视方法
CN101420595A (zh) * 2007-10-23 2009-04-29 华为技术有限公司 一种描述和捕获视频对象的方法及设备
CN101207807A (zh) * 2007-12-18 2008-06-25 孟智平 一种处理视频的方法及其系统

Also Published As

Publication number Publication date
CN108628913A (zh) 2018-10-09
CN108628913B (zh) 2024-06-25


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17901791

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13/01/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17901791

Country of ref document: EP

Kind code of ref document: A1