
CN110163837B - Video noise identification method, device, equipment and computer readable storage medium - Google Patents

Video noise identification method, device, equipment and computer readable storage medium

Info

Publication number
CN110163837B
Authority
CN
China
Prior art keywords
video
frame
neural network
noise
frame difference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910048947.2A
Other languages
Chinese (zh)
Other versions
CN110163837A (en)
Inventor
谯睿智
高永强
蓝玉海
张怀选
徐颖
賈佳亞
戴宇榮
沈小勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910048947.2A priority Critical patent/CN110163837B/en
Publication of CN110163837A publication Critical patent/CN110163837A/en
Application granted granted Critical
Publication of CN110163837B publication Critical patent/CN110163837B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/0002 - Inspection of images, e.g. flaw detection
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses a video noise identification method. The method samples a plurality of groups of consecutive video frames from a video to be identified, calculates the frame difference map corresponding to each group, inputs the calculated frame difference maps into a deep neural network for noise prediction, and outputs a video classification result, where the deep neural network is a neural network trained using frame difference maps of videos as the input of a training neural network and its different classification results correspond to different degrees of video noise severity; a noise identification result is then output according to the video classification result. With this method and apparatus, the deep neural network adaptively learns features that distinguish noise from other texture-dense regions, eliminating the interference of texture-dense regions and greatly improving the accuracy of classifying the severity of noisy videos.

Description

Video noise identification method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of computers, and in particular to a video noise identification method, apparatus, and device, and a computer-readable storage medium.
Background
In recent years, short video applications have grown rapidly. A short video platform can receive tens of thousands of video uploads from users every day, but many of these videos suffer from severe video noise caused by shooting conditions such as lighting and equipment, which seriously degrades the viewing experience of other users.
The prior art mainly divides the identification or detection of video noise into two steps: first, video frames are extracted and their noise is analyzed using manually extracted features; then the results over all extracted frames are integrated to estimate the noise level of the video. Such methods, which estimate the noise intensity of a video frame using only the frame's own information and an image noise detection algorithm, are typically based on the assumption that "noise is high-frequency information in the frequency domain" or "noisy regions have high variance in the spatial domain". However, observation of a large amount of video data shows that texture-dense video frames, such as frames of lawns, asphalt pavement, or oil and salt particles on food, also satisfy these assumptions. Prior-art detection techniques therefore cannot accurately determine whether texture-dense video frames have a noise problem, and their accuracy drops sharply.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a video noise identification method, apparatus, device, and computer-readable storage medium that can distinguish noise from dense texture, addressing the low accuracy of detection techniques in the prior art.
In order to solve the technical problems, an aspect of the embodiments of the present invention discloses a method for identifying video noise, including
Sampling a plurality of groups of continuous video frames from the video to be identified, and respectively calculating a frame difference map corresponding to each group of continuous video frames;
inputting the calculated frame difference maps into a deep neural network for noise prediction, and outputting a video classification result, wherein the deep neural network is a neural network trained using frame difference maps of videos as the input of a training neural network, and different classification results output by the deep neural network correspond to different degrees of video noise severity;
and outputting a noise identification result according to the video classification result.
In combination with this aspect, in one possible implementation manner, in the process of calculating the frame difference maps corresponding to each group of continuous video frames, for a group of continuous video frames, the method includes:
determining an intermediate video frame of the group of consecutive video frames and calculating an average video frame of the group of consecutive video frames;
and performing a difference operation on the intermediate video frame and the average video frame to obtain the frame difference map corresponding to the group of consecutive video frames.
In combination with this aspect, in one possible implementation manner, the performing a difference operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the set of continuous video frames includes:
Subtracting the average video frame from the intermediate video frame, and then performing truncation processing;
and translating the truncated frame difference pixel value to a target pixel interval to obtain a frame difference map corresponding to the group of continuous video frames.
In combination with this aspect, in one possible implementation manner, the performing a difference operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the set of continuous video frames includes:
and subtracting the average video frame from the intermediate video frame, and taking an absolute value to obtain a frame difference map corresponding to the group of continuous video frames.
In combination with this aspect, in one possible implementation manner, the set of continuous video frames includes n frames, the step of subtracting the average video frame from the intermediate video frame, performing truncation processing, and the step of translating the truncated frame difference pixel value to a target pixel interval to obtain a frame difference map corresponding to the set of continuous video frames includes:
calculating the frame difference map corresponding to the group of consecutive video frames by the formula
ξ_i = min(max(f_{i,0} - μ_i, -κ), κ) + κ;
where f_{i,0} is the intermediate video frame, negative and positive frame indices denote the video frames before and after the intermediate frame, μ_i is the average video frame, κ is the truncation threshold, 1 ≤ i ≤ n, and n is a positive integer.
In combination with this aspect, in one possible implementation manner, the inputting the calculated frame difference map into a deep neural network to perform noise prediction includes:
shrinking the frame difference map at a preset ratio, such that the number of pixels along the first side of the shrunk frame difference map falls within a first range of values and the number of pixels along the second side falls within a second range of values;
and inputting the shrunk frame difference map into the deep neural network for noise prediction.
In combination with this aspect, in one possible implementation manner, the number of pixels along the first side is 400 and the number of pixels along the second side is 280.
In combination with this aspect, in one possible implementation manner, the training using the frame difference map of the video as an input of the training neural network includes:
Sampling each video in the video training set, sampling a plurality of groups of continuous video frames of the video, and respectively calculating a frame difference map corresponding to each group of continuous video frames;
inputting the calculated frame difference maps and the manual labels corresponding to the videos into a pre-training neural network for training, wherein the manual labels indicate the noise severity of the videos;
and determining the neural network of the trained prediction model through the video verification set.
In combination with this aspect, in one possible implementation manner, after the determining, by the video verification set, the neural network of the trained prediction model, the method further includes:
performing noise prediction on the video test set through the determined neural network of the prediction model, and generating a confusion matrix of video classification according to the prediction results and the manual labels corresponding to the video test set.
In combination with this aspect, in one possible implementation manner, the deep neural network includes a deep neural network constructed with depthwise separable convolutions, and the output of the deep neural network is 3 video classification results.
Another aspect of the embodiments of the present invention discloses a video noise identification apparatus, which comprises:
The sampling calculation unit is used for sampling a plurality of groups of continuous video frames from the video to be identified and respectively calculating a frame difference map corresponding to each group of continuous video frames;
The prediction unit is used for inputting the calculated frame difference maps into a deep neural network for noise prediction and outputting a video classification result, wherein the deep neural network is a neural network trained using frame difference maps of videos as the input of a training neural network;
And the identification result output unit is used for outputting a noise identification result according to the video classification result.
The embodiment of the invention also discloses video noise identification equipment, comprising a processor and a memory connected to each other, wherein the memory is used to store data processing code and the processor is configured to call the program code to execute the video noise identification method described above.
Another aspect of the embodiments of the present invention discloses a computer-readable storage medium storing program instructions that, when executed by a processor, cause the processor to perform the video noise identification method described above.
In the embodiment of the invention, a plurality of groups of consecutive frames are sampled from the video to be identified, the frame difference map corresponding to each group of consecutive frames is calculated, the calculated frame difference maps are input into a deep neural network for noise prediction, and a video classification result is output, the deep neural network being a neural network trained using frame difference maps of videos as the input of a training neural network. In this way the temporal information across different frames of the video is exploited: because the noise in a noisy video floats from frame to frame, the residual image of the video frame difference is taken as the input of the deep neural network, which adaptively learns the features that distinguish noise from other texture-dense regions. The displacement information carried by the residual pixels in the frame difference makes it easier to tell whether they are noise or dense texture, so the interference of texture-dense regions is eliminated and the accuracy of classifying the severity of noisy videos is greatly improved. In addition, the video frame difference method greatly reduces the learning difficulty of the neural network, so that training converges more easily.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by a person of ordinary skill in the art without inventive effort.
Fig. 1 is an overall architecture diagram of a video noise identification method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of a video noise identification method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a deep neural network training process according to an embodiment of the present invention;
Fig. 4 is a diagram of the basic structure of a lightweight neural network according to an embodiment of the present invention;
Fig. 5 is a diagram of an improved network structure according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of an application scenario of a video noise identification method according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of an application scenario of a video noise identification method according to another embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a video noise identification apparatus according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a video noise identification apparatus according to another embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a video noise identification device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the protection scope of the present invention.
In order to better understand the video noise identification method, apparatus, and device provided by the embodiments of the present invention, the overall architecture of the method is described first. As shown in Fig. 1, a pre-trained deep neural network is trained with training videos from a video library: frame differences are extracted from the training videos, and the frame differences together with the manual labels are used as the input for learning and training of the deep neural network, yielding a trained deep neural network for noise prediction. Noise prediction is then performed on the video to be identified through the trained deep neural network: frame differences are likewise extracted from the video to be predicted (in the same way as during training) and input into the trained deep neural network, which outputs a video classification result. The noise severity of the video to be identified is obtained from the video classification result, and the noise identification result is output accordingly.
The device or equipment executing the video noise identification method in the embodiments of the present invention may include, but is not limited to, network devices such as servers, and terminal devices such as desktop computers, laptop computers, tablet computers, and intelligent terminals. The server may be an independent server or a cluster server. The embodiments of the present invention are not limited in this respect.
Fig. 2 is a schematic flow chart of the video noise identification method provided by an embodiment of the present invention. It specifically illustrates how noise prediction is performed on the video to be identified through the trained deep neural network, and may include the following steps:
Step S200: sampling a plurality of groups of consecutive video frames from the video to be identified, and respectively calculating the frame difference map corresponding to each group of consecutive video frames;
In particular, the groups of consecutive video frames may be sampled randomly from the video to be identified, or sampled according to a preset rule (for example, distributed uniformly over the total number of frames of the video). In one embodiment, considering the short-video application environment, in which the background of a short video changes little, a large number of data experiments show that it suffices to sample 10 groups of consecutive video frames and to calculate the frame difference map corresponding to each of the 10 groups.
In the process of calculating the frame difference map for each group of consecutive video frames, the following is performed for one group: an intermediate video frame of the group is determined, the average video frame of the group is calculated, and a difference operation is performed on the intermediate video frame and the average video frame to obtain the frame difference map corresponding to the group of consecutive video frames.
Specifically, the average video frame may be subtracted from the intermediate video frame and the result truncated, and the truncated frame difference pixel values may then be translated into a target pixel interval, for example the [0, 255] or [0, 101] pixel interval, to obtain the frame difference map corresponding to the group of consecutive video frames. Alternatively, the average video frame may be subtracted from the intermediate video frame and the absolute value taken, without truncation, to obtain the frame difference map corresponding to the group of consecutive video frames.
The case of subtracting the average video frame from the intermediate video frame, truncating, and translating the truncated frame difference pixel values to the target pixel interval is explained first as an example:
In one embodiment, the performing a difference operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the set of continuous video frames may include:
calculating, by Formula 1,
ξ_i = min(max(f_{i,0} - μ_i, -κ), κ) + κ    (Formula 1)
the frame difference map corresponding to the group of consecutive video frames, where f_{i,0} is the intermediate video frame, negative and positive frame indices f_{i,-j} and f_{i,+j} denote the video frames before and after the intermediate frame, μ_i is the average video frame, κ is the truncation threshold, 1 ≤ i ≤ n, and n is a positive integer.
For example, if n is 3, then μ_i = (f_{i,-1} + f_{i,0} + f_{i,1}) / 3, where f_{i,-1} is the first of the 3 frames, f_{i,0} is the second, and f_{i,1} is the third.
As another example, if n is 4, then μ_i may be (f_{i,-1} + f_{i,0} + f_{i,1} + f_{i,2}) / 4, where f_{i,-1} is the first of the 4 frames, f_{i,0} the second, f_{i,1} the third, and f_{i,2} the fourth; or μ_i may be (f_{i,-2} + f_{i,-1} + f_{i,0} + f_{i,1}) / 4, where f_{i,-2} is the first of the 4 frames, f_{i,-1} the second, f_{i,0} the third, and f_{i,1} the fourth.
As can be seen from Formula 1, the embodiment of the invention truncates the frame difference map. Truncation means that any pixel difference whose magnitude exceeds the truncation threshold is discarded and replaced by the threshold itself. The difference between a noisy pixel and the surrounding undamaged pixels is generally small and lies within the set threshold range, whereas larger differences in the frame difference map are usually caused by inter-frame content that fails to overlap due to motion. Truncating the frame difference map therefore improves the efficiency of frame difference extraction and of noise detection.
Formula 1 also shows that the frame difference map is increased by the truncation threshold κ, i.e., the truncated pixel differences are shifted from [-κ, κ] into the non-negative interval [0, 2κ] within [0, 255], so that the signed differences can be recorded in an ordinary pixel value space without rescaling.
The embodiment of the present invention is not limited to calculating the frame difference map corresponding to a group of consecutive video frames through Formula 1. In another implementation, the difference between the intermediate video frame and the average video frame may be taken directly and its absolute value used, as shown in Formula 2:
ξ_i = |f_{i,0} - μ_i|    (Formula 2)
The frame difference map corresponding to the group of consecutive video frames is calculated accordingly, where f_{i,0} is the intermediate video frame, negative and positive frame indices denote the video frames before and after the intermediate frame, μ_i is the average video frame, 1 ≤ i ≤ n, and n is a positive integer.
In one embodiment, n may take values such as 3, 4, or 5, and the truncation threshold κ may be 50, 86, 127, and so on. Comparison experiments on the effects of different frame difference extraction schemes during training show that averaging over three frames (n = 3) gives the best results, and that a truncation threshold κ of 50 gives the best translation effect.
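To make the two extraction schemes concrete, the following NumPy sketch computes Formula 1 and Formula 2 for one group of frames. It is an illustration under stated assumptions (grayscale frames, and the function and variable names are of this example only), not the exact patented implementation; n = 3 and κ = 50 are the settings reported above.

```python
import numpy as np

def frame_difference_map(frames, kappa=50, truncate=True):
    """Compute the frame difference map for one group of consecutive frames.

    frames: list of n grayscale frames (equal-shape arrays); n = 3 and
            kappa = 50 are the settings reported to work best above.
    truncate=True  -> Formula 1: clip the difference to [-kappa, kappa],
                      then shift by +kappa into the interval [0, 2*kappa].
    truncate=False -> Formula 2: absolute difference, no truncation.
    """
    stack = np.stack([np.asarray(f, dtype=np.float32) for f in frames])
    mu = stack.mean(axis=0)            # average video frame mu_i
    mid = stack[len(frames) // 2]      # intermediate video frame f_{i,0}
    diff = mid - mu
    if truncate:
        return np.clip(diff, -kappa, kappa) + kappa
    return np.abs(diff)
```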
Step S202: inputting the calculated frame difference maps into a deep neural network for noise prediction, and outputting a video classification result;
The deep neural network in the embodiment of the invention may be a neural network trained using frame difference maps of videos as the input of a training neural network; different classification results output by the deep neural network's noise prediction correspond to different degrees of video noise severity.
Step S204: outputting a noise identification result according to the video classification result.
In the embodiment of the invention, a plurality of groups of consecutive frames are sampled from the video to be identified, the frame difference map corresponding to each group of consecutive frames is calculated, the calculated frame difference maps are input into a deep neural network for noise prediction, and a video classification result is output, the deep neural network being a neural network trained using frame difference maps of videos as the input of a training neural network. In this way the temporal information across different frames of the video is exploited: because the noise in a noisy video floats from frame to frame, the residual image of the video frame difference is taken as the input of the deep neural network, which adaptively learns the features that distinguish noise from other texture-dense regions. The displacement information carried by the residual pixels in the frame difference makes it easier to tell whether they are noise or dense texture, so the interference of texture-dense regions is eliminated and the accuracy of classifying the severity of noisy videos is greatly improved. In addition, the video frame difference method greatly reduces the learning difficulty of the neural network, so that training converges more easily.
How the deep neural network is trained in the embodiment of the present invention is detailed below with reference to Figs. 3 to 5. The flow of deep neural network training shown in Fig. 3 may include the following steps:
Step S300: sampling each video in the video training set to obtain a plurality of groups of consecutive video frames of the video, and respectively calculating the frame difference map corresponding to each group of consecutive video frames;
Specifically, the video database may include a video training set, a video verification set, and a video test set. For example, all videos in the video database are divided according to preset proportions: a first proportion of the videos forms the video training set, a second proportion forms the video verification set, and a third proportion forms the video test set, the three proportions summing to 100%, for example 80%, 10%, and 10%.
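A minimal sketch of such a split follows; the 80%/10%/10% proportions, the shuffling, and the list-of-IDs representation are assumptions of this example rather than values fixed by the patent:

```python
import random

def split_video_database(video_ids, seed=0):
    """Shuffle the video list and split it into training, verification,
    and test sets whose proportions sum to 100%."""
    rng = random.Random(seed)
    ids = list(video_ids)
    rng.shuffle(ids)
    n_train = int(0.8 * len(ids))   # first proportion (assumed 80%)
    n_val = int(0.1 * len(ids))     # second proportion (assumed 10%)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]    # third proportion (remaining ~10%)
    return train, val, test
```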
The video training set in the video database may be used to train the pre-training neural network. Specifically, each video in the video training set is sampled to obtain a plurality of groups of consecutive video frames, and the frame difference map corresponding to each group is calculated. For the implementation of frame difference extraction in step S300, refer to step S200 in the embodiment of Fig. 2.
Step S302: inputting the calculated frame difference maps and the manual labels corresponding to the videos into a pre-training neural network for training;
Specifically, the manual label in the embodiment of the invention indicates the noise severity of the video: each video in the video database can be labeled manually to mark which of the three video classes (severe noise, slight noise, or clear) it belongs to. In the embodiment of the invention, the frame difference maps obtained in step S300 and the manual labels of the corresponding videos are input together into the pre-training neural network for training.
In one embodiment, to reduce the computational cost of the neural network, the deep neural network (including the pre-training neural network) in the embodiment of the invention is a deep neural network constructed with depthwise separable convolutions. For example, a lightweight neural network (MobileNet) may be selected as the network structure, with the classification output changed to three classes (i.e., 3 video classification results), and training performed against the manual labels (severe noise, slight noise, clear).
MobileNet is an efficient model proposed for mobile and embedded devices. It uses depthwise separable convolutions to construct a lightweight deep neural network; its basic building block is shown in Fig. 4. The first step is the depthwise convolution, which has only M 3x3 convolution kernels; the M kernels are convolved one-to-one with the M input feature maps to obtain M output maps, performing feature extraction. The second step is the pointwise convolution, which is a conventional convolution except that all kernels are 1x1; there are M x N 1x1 kernels in total, performing feature fusion.
To improve accuracy and speed up training, embodiments of the present invention may fine-tune a network pre-trained on ImageNet. Since image quality is low-level information and high-level semantic information plays only an auxiliary role, in one implementation MobileNet can be simplified: the last two depthwise separable convolution layers of the conventional MobileNet model are removed and the final classification output is changed to 3, corresponding to the three noise classes (severe noise, slight noise, clear). The network structure is shown in Fig. 5.
A depthwise separable convolution consists of two layers: a depthwise convolution and a pointwise convolution. The depthwise convolution applies a single convolution kernel to each input channel, and the pointwise convolution then linearly combines the outputs of the depthwise convolution by applying a simple 1x1 convolution. MobileNet applies batch normalization (BN) and a ReLU (Rectified Linear Unit) nonlinearity to each layer.
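A PyTorch sketch of the building block just described follows: a 3x3 depthwise convolution, then a 1x1 pointwise convolution, with batch normalization and ReLU after each. The class name and the way the block is parameterized are choices of this sketch, not the exact layers of Fig. 4 or Fig. 5:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution (feature extraction, one kernel per channel)
    followed by 1x1 pointwise convolution (feature fusion across channels),
    each followed by batch normalization and a ReLU nonlinearity."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))
```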
The depthwise convolution uses one convolution kernel per channel; its computational cost is D_K * D_K * M * D_F * D_F, where D_K is the kernel side length, D_F the feature map side length, and M the number of input channels.
The depthwise convolution is very efficient relative to the standard convolution; however, it only filters the input channels and does not combine them to produce new features. An additional layer that computes a linear combination of the depthwise outputs via a 1x1 convolution is therefore needed to produce new features.
The combination of a depthwise convolution followed by a 1x1 pointwise convolution is called a depthwise separable convolution, first set forth in "Rigid-motion scattering for image classification".
The computational cost of the depthwise separable convolution is D_K * D_K * M * D_F * D_F + M * N * D_F * D_F, i.e., the sum of the depthwise convolution and the 1x1 pointwise convolution, where N is the number of output channels.
Expressing the convolution as this filtering-then-combining process yields the reduction in computation:
(D_K * D_K * M * D_F * D_F + M * N * D_F * D_F) / (D_K * D_K * M * N * D_F * D_F) = 1/N + 1/D_K².
MobileNet uses 3x3 depthwise separable convolutions, which require 8 to 9 times less computation than standard convolutions.
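The reduction can be checked numerically; the helper below simply evaluates the two cost expressions (the dimensions in the example call are arbitrary):

```python
def separable_cost_ratio(dk, m, n, df):
    """Ratio of depthwise-separable cost to standard-convolution cost.

    Depthwise separable: dk*dk*m*df*df + m*n*df*df
    Standard:            dk*dk*m*n*df*df
    The ratio simplifies to 1/n + 1/dk**2.
    """
    separable = dk * dk * m * df * df + m * n * df * df
    standard = dk * dk * m * n * df * df
    return separable / standard

# With dk = 3 and n = 256 the ratio is about 1/256 + 1/9 ~ 0.115,
# i.e. roughly 8.7 times less computation than a standard convolution.
print(separable_cost_ratio(dk=3, m=128, n=256, df=14))
```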
Step S304: determining the neural network of the trained prediction model through the video verification set.
Specifically, to prevent overfitting, the optimal model is selected from the training results via the video verification set and saved as the prediction model. That is, each video in the video verification set is sampled to obtain a plurality of groups of consecutive video frames, the frame difference map corresponding to each group is calculated (for frame difference extraction, refer to step S200 in the embodiment of Fig. 2), and the calculated frame difference maps together with the manual labels of the corresponding videos are input into each trained neural network to determine the model with the best effect.
In one embodiment, when the frame difference maps produced from the video training set and the manual labels are input together into the pre-training neural network for training, the optimizer may be set to stochastic gradient descent (SGD) with a learning rate of 1e-3, a momentum of 0.9, and a batch size of 128. Validation through the video verification set may be performed once for every first quantity (for example, 1000) of processed video frames; the accuracy is calculated from the verification results and the manual labels, and the model with the highest accuracy is retained.
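A hedged PyTorch sketch of this configuration follows; the model, data loader, validation callback, loss choice, and checkpoint path are placeholders of this example, and only the SGD hyperparameters and the validate-per-1000-samples rhythm come from the description above:

```python
import torch

def train(model, train_loader, validate, num_epochs=10, device="cuda"):
    """Train with SGD (lr 1e-3, momentum 0.9); train_loader is assumed to
    yield (frame_diff, label) batches of size 128."""
    model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()  # 3 classes: severe / slight / clear
    best_acc, seen = 0.0, 0
    for _ in range(num_epochs):
        for frame_diffs, labels in train_loader:
            frame_diffs, labels = frame_diffs.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(frame_diffs), labels)
            loss.backward()
            optimizer.step()
            seen += labels.size(0)
            if seen >= 1000:                 # validate once per ~1000 samples
                seen = 0
                acc = validate(model)        # accuracy on the verification set
                if acc > best_acc:           # keep the most accurate model
                    best_acc = acc
                    torch.save(model.state_dict(), "best_model.pt")
```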
In one embodiment, step S304 may be followed by step S306: performing noise prediction on the video test set through the determined neural network of the best model, and generating a confusion matrix of video classification according to the prediction results and the manual labels corresponding to the video test set.
Specifically, after the neural network of the trained prediction model is obtained, prediction is performed on the video test set to obtain prediction results, and a confusion matrix of video classification is then generated by comparing the manual labels corresponding to the video test set with the prediction results, so as to judge how well the deep neural network classifies videos.
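One way to realize this step, sketched with scikit-learn as an assumed tooling choice; the 0/1/2 label encoding is likewise an assumption of this example:

```python
from sklearn.metrics import confusion_matrix

def classification_confusion(predicted_labels, manual_labels):
    """Confusion matrix over the video test set: rows are the manual labels,
    columns the predicted classes (0 = severe noise, 1 = slight noise, 2 = clear)."""
    return confusion_matrix(manual_labels, predicted_labels, labels=[0, 1, 2])
```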
In various embodiments of the present invention, inputting the calculated frame difference map into the deep neural network may further include:
shrinking the frame difference map at a preset ratio, such that the number of pixels along the first side of the shrunk frame difference map falls within a first range of values and the number of pixels along the second side falls within a second range of values; and inputting the shrunk frame difference map into the deep neural network for noise prediction.
Specifically, to obtain an input of fixed size and further reduce the amount of computation, the frame difference map may be shrunk according to the aspect ratio of the video itself, for example to 400 pixels on the first side (the number of pixels on the long side of the frame difference map) and 280 pixels on the second side (the number of pixels on the short side of the frame difference map).
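A sketch of this downscaling step with OpenCV; transposing portrait input so that the long side is horizontal is a choice of this example (note that cv2.resize takes the target size as (width, height)):

```python
import cv2

def shrink_frame_diff(frame_diff, long_side=400, short_side=280):
    """Downscale a frame difference map to 400 pixels on the long side
    and 280 pixels on the short side."""
    h, w = frame_diff.shape[:2]
    if h > w:                      # portrait input: transpose first
        frame_diff = cv2.transpose(frame_diff)
    return cv2.resize(frame_diff, (long_side, short_side),
                      interpolation=cv2.INTER_AREA)
```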
In one implementation, the video noise identification method of the embodiment of the invention may also use an attention mechanism, combined with a reinforcement learning reward (whether the final overall classification is correct), so that the deep neural network autonomously learns, from a large amount of data, the regions of a video frame that contain noise, and classifies the severity according to information such as the size of the noise regions, the size of the noise particles, and their position.
The video noise identification method provided by the embodiment of the invention can be applied to various technical scenes:
For example, consider a short video platform's checker that verifies the quality of videos uploaded by users, as shown in the application scenario diagram of Fig. 6. The video noise identification device side extracts frame difference maps from a video library in the manner described in this invention, sends them to a pre-training neural network for training to obtain the neural network of the trained prediction model, then extracts frame difference maps from the video to be predicted, uses the prediction model for prediction, and outputs a video classification result. When the video classification result is found not to meet the playback requirement, the video to be identified does not pass; when it meets the playback requirement, the video passes. That is, the video noise identification method can objectively judge the severity of video noise and, once deployed online, can well replace manual noise verification: whether the video to be identified passes identification or verification is obtained automatically and quickly, and the verification result can be displayed to the user.
For another example, consider a short video platform recommending videos to users according to video quality, as shown in Fig. 7. The video noise identification device side may extract frame difference maps from a video library in the manner described in this invention, send them to a pre-training neural network for training to obtain the neural network of the trained prediction model, then extract frame difference maps from the videos to be predicted, use the prediction model for prediction, and output video classification results. The videos are ranked by quality according to the classification results and recommended to users in order from good to bad.
In the embodiment of the invention, a plurality of groups of consecutive frames are sampled from the video to be identified, the frame difference map corresponding to each group of consecutive frames is calculated, the calculated frame difference maps are input into a deep neural network for noise prediction, and a video classification result is output, the deep neural network being a neural network trained using frame difference maps of videos as the input of a training neural network. In this way the temporal information across different frames of the video is exploited: because the noise in a noisy video floats from frame to frame, the residual image of the video frame difference is taken as the input of the deep neural network, which adaptively learns the features that distinguish noise from other texture-dense regions. The displacement information carried by the residual pixels in the frame difference makes it easier to tell whether they are noise or dense texture, so the interference of texture-dense regions is eliminated and the accuracy of classifying the severity of noisy videos is greatly improved. In addition, the video frame difference method greatly reduces the learning difficulty of the neural network, so that training converges more easily.
To facilitate better implementation of the foregoing solution, the embodiment of the present invention correspondingly provides a video noise identification apparatus. As shown in Fig. 8, the video noise identification apparatus 80 includes a sampling calculation unit 800, a prediction unit 802, and an identification result output unit 804, wherein:
The sampling calculation unit 800 is configured to sample a plurality of groups of continuous video frames from the video to be identified, and calculate a frame difference map corresponding to each group of continuous video frames respectively;
The prediction unit 802 is configured to input the calculated frame difference maps into a deep neural network for noise prediction and output a video classification result, where the deep neural network is a neural network trained using frame difference maps of videos as the input of a training neural network;
and when the output video classification result shows that the video to be identified does not meet the playback requirement, the video to be identified does not pass verification.
The sampling calculation unit 800 may include a determination calculation unit and a difference operation unit, wherein:
the determination calculation unit is configured to determine an intermediate video frame of a group of consecutive video frames and calculate the average video frame of the group of consecutive video frames;
and the difference operation unit is configured to perform a difference operation on the intermediate video frame and the average video frame to obtain the frame difference map corresponding to the group of consecutive video frames.
In one embodiment, the group of consecutive video frames includes n frames, and the difference operation unit may specifically be configured to:
calculate the frame difference map corresponding to the group of consecutive video frames by the formula
ξ_i = min(max(f_{i,0} - μ_i, -κ), κ) + κ;
where f_{i,0} is the intermediate video frame, negative and positive frame indices denote the video frames before and after the intermediate frame, μ_i is the average video frame, κ is the truncation threshold, 1 ≤ i ≤ n, and n is a positive integer.
In one embodiment, the prediction unit 802 may be configured to shrink the frame difference map at a preset ratio, with the number of pixels along the first side of the shrunk frame difference map within a first range of values and the number of pixels along the second side within a second range of values, and to input the shrunk frame difference map into the deep neural network for noise prediction.
In one embodiment, the number of pixels along the first side is 400 and the number of pixels along the second side is 280.
The identification result output unit 804 is configured to output a noise identification result according to the video classification result.
As shown in Fig. 9, in addition to the sampling calculation unit 800, the prediction unit 802, and the identification result output unit 804, the video noise identification apparatus 80 may further include a training unit 806, configured to sample each video in the video training set, obtain a plurality of groups of consecutive video frames of each video, and respectively calculate the frame difference map corresponding to each group of consecutive video frames;
input the calculated frame difference maps and the manual labels corresponding to the videos into a pre-training neural network for training, wherein the manual labels indicate the noise severity of the videos;
and determining the neural network of the trained prediction model through the video verification set.
After determining the neural network of the trained prediction model through the video verification set, the training unit 806 may further be configured to perform noise prediction on the video test set through the determined neural network of the prediction model, and generate a confusion matrix of video classification according to the prediction results and the manual labels corresponding to the video test set.
In one implementation, the deep neural network comprises a deep neural network constructed with depthwise separable convolutions, and the output of the deep neural network is 3 video classification results.
The units of the video noise identification apparatus 80 in the embodiment of the present invention are used to correspondingly execute the steps of the video noise identification method in the embodiments of Figs. 1 to 5 of the above method embodiments, which are not described here again.
In order to facilitate better implementation of the foregoing solutions of the embodiments of the present invention, the present invention further correspondingly provides a video noise identifying device, and the following details are described with reference to the accompanying drawings:
As shown in Fig. 10, which is a schematic structural diagram of the video noise identification device according to an embodiment of the present invention, the video noise identification device 100 may include a processor 101, a display screen 102, a memory 104, and a communication module 105, which may be connected to one another through a bus 106. The memory 104 may be a high-speed random access memory (RAM) or a non-volatile memory, such as at least one disk memory; in the embodiment of the present invention the memory 104 includes flash memory. Optionally, the memory 104 may also be at least one storage system located remotely from the processor 101. The memory 104 is used to store application program code, which may include an operating system, a network communication module, a user interface module, and a video noise identification program. The communication module 105 is used for information and data interaction with external devices, and the processor 101 is configured to call the program code to perform the following steps:
sampling a plurality of groups of continuous video frames from the video to be identified, and respectively calculating a frame difference map corresponding to each group of continuous video frames;
inputting the calculated frame difference maps into a deep neural network for noise prediction, and outputting a video classification result, wherein the deep neural network is a neural network trained using frame difference maps of videos as the input of a training neural network, and different classification results output by the deep neural network correspond to different degrees of video noise severity;
and outputting a noise identification result according to the video classification result.
In the process of calculating the frame difference map corresponding to each group of continuous video frames by the processor 101, for a group of continuous video frames, the method may include:
determining an intermediate video frame of the group of consecutive video frames and calculating an average video frame of the group of consecutive video frames;
and performing a difference operation on the intermediate video frame and the average video frame to obtain the frame difference map corresponding to the group of consecutive video frames.
The processor 101 performs a difference operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the group of continuous video frames, which may include:
Subtracting the average video frame from the intermediate video frame, and then performing truncation processing;
and translating the truncated frame difference pixel value to a target pixel interval to obtain a frame difference map corresponding to the group of continuous video frames.
The processor 101 performs a difference operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the group of continuous video frames, which may include:
and subtracting the average video frame from the intermediate video frame, and taking an absolute value to obtain a frame difference map corresponding to the group of continuous video frames.
The processor 101 subtracting the average video frame from the intermediate video frame, performing truncation processing, and translating the truncated frame difference pixel values to a target pixel interval to obtain the frame difference map corresponding to the group of consecutive video frames may include:
calculating the frame difference map corresponding to the group of consecutive video frames by the formula
ξ_i = min(max(f_{i,0} - μ_i, -κ), κ) + κ;
where f_{i,0} is the intermediate video frame, negative and positive frame indices denote the video frames before and after the intermediate frame, μ_i is the average video frame, κ is the truncation threshold, 1 ≤ i ≤ n, and n is a positive integer.
The processor 101 inputting the calculated frame difference maps into a deep neural network for noise prediction may include:
shrinking the frame difference map at a preset ratio, such that the number of pixels along the first side of the shrunk frame difference map falls within a first range of values and the number of pixels along the second side falls within a second range of values;
and inputting the shrunk frame difference map into the deep neural network for noise prediction.
The number of pixels along the first side is 400, and the number of pixels along the second side is 280.
The processor 101 training using frame difference maps of videos as the input of the training neural network may include:
Sampling each video in the video training set, sampling a plurality of groups of continuous video frames of the video, and respectively calculating a frame difference map corresponding to each group of continuous video frames;
inputting the calculated frame difference maps and the manual labels corresponding to the videos into a pre-training neural network for training, wherein the manual labels indicate the noise severity of the videos;
and determining the neural network of the trained prediction model through the video verification set.
After the processor 101 determines the neural network of the trained prediction model through the video verification set, the processor 101 may further be used to:
perform noise prediction on the video test set through the determined neural network of the prediction model, and generate a confusion matrix of video classification according to the prediction results and the manual labels corresponding to the video test set.
The deep neural network includes a deep neural network constructed with depthwise separable convolutions, and the output of the deep neural network is 3 video classification results.
It should be noted that, in the embodiment of the present invention, the execution steps of the processor 101 in the video noise identification device may refer to specific implementation manners of the video noise identification method in the embodiment of fig. 1 to 5 in the above method embodiments, and are not repeated here.
In the embodiment of the invention, a plurality of groups of consecutive frames are sampled from the video to be identified, the frame difference map corresponding to each group of consecutive frames is calculated, the calculated frame difference maps are input into a deep neural network for noise prediction, and a video classification result is output, the deep neural network being a neural network trained using frame difference maps of videos as the input of a training neural network. In this way the temporal information across different frames of the video is exploited: because the noise in a noisy video floats from frame to frame, the residual image of the video frame difference is taken as the input of the deep neural network, which adaptively learns the features that distinguish noise from other texture-dense regions. The displacement information carried by the residual pixels in the frame difference makes it easier to tell whether they are noise or dense texture, so the interference of texture-dense regions is eliminated and the accuracy of classifying the severity of noisy videos is greatly improved. In addition, the video frame difference method greatly reduces the learning difficulty of the neural network, so that training converges more easily.
Those skilled in the art will appreciate that all or part of the processes of the methods of the above embodiments may be implemented by a computer program stored on a computer-readable storage medium which, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.

Claims (9)

1.一种视频噪点识别方法,其特征在于,包括:1. A method for identifying video noise, comprising: 从待识别视频中采样多组连续视频帧,分别计算每组连续视频帧所对应的帧差图;其中,针对一组连续视频帧,确定所述一组连续视频帧中的中间视频帧,并计算所述一组连续视频帧的平均视频帧;Sampling multiple groups of continuous video frames from the video to be identified, and calculating the frame difference map corresponding to each group of continuous video frames respectively; wherein, for a group of continuous video frames, determining the middle video frame in the group of continuous video frames, and calculating the average video frame of the group of continuous video frames; 将所述中间视频帧减去所述平均视频帧后,进行截断处理;After subtracting the average video frame from the intermediate video frame, truncation is performed; 将截断处理后的帧差像素值平移到目标像素区间,得到所述一组连续视频帧所对应的帧差图;The frame difference pixel values after truncation processing are translated to the target pixel interval to obtain a frame difference image corresponding to the set of continuous video frames; 将计算得到的所述帧差图输入深度神经网络进行噪点预测,输出视频分类结果;其中,所述深度神经网络为利用视频的帧差图作为训练神经网络的输入而训练得到的神经网络;所述深度神经网络输出的不同的分类结果对应不同的视频噪点严重程度;Inputting the calculated frame difference image into a deep neural network for noise prediction, and outputting a video classification result; wherein the deep neural network is a neural network trained by using the frame difference image of the video as an input for training the neural network; different classification results output by the deep neural network correspond to different video noise severity levels; 根据所述视频分类结果输出噪点识别结果。Outputting a noise point recognition result according to the video classification result. 2.如权利要求1所述的方法,其特征在于,所述将所述中间视频帧与所述平均视频帧进行差值运算,得到所述一组连续视频帧所对应的帧差图,还包括:2. The method according to claim 1, wherein performing a difference operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the set of continuous video frames further comprises: 将所述中间视频帧减去所述平均视频帧后取绝对值,得到所述一组连续视频帧所对应的帧差图。The average video frame is subtracted from the intermediate video frame and an absolute value is taken to obtain a frame difference map corresponding to the set of continuous video frames. 3.如权利要求1所述的方法,其特征在于,所述一组连续视频帧包括n帧;所述将所述中间视频帧减去所述平均视频帧后,进行截断处理;将截断处理后的帧差像素值平移到目标像素区间,得到所述一组连续视频帧所对应的帧差图,包括:3. The method according to claim 1, wherein the group of continuous video frames includes n frames; the subtracting the average video frame from the intermediate video frame and then performing truncation processing; and translating the frame difference pixel values after truncation processing to a target pixel interval to obtain a frame difference map corresponding to the group of continuous video frames, comprising: 通过公式计算得到所述一组连续视频帧所对应的帧差图;By formula Calculate and obtain a frame difference map corresponding to the set of continuous video frames; 其中,为中间视频帧,,所述表示相对所述中间视频帧前的视频帧,所述表示相对所述中间视频帧后的视频帧,为平均视频帧,为截断阈值,,n为正整数。in, is the middle video frame, , Indicates that relative to the video frame before the intermediate video frame, the represents a video frame relative to the intermediate video frame, is the average video frame, is the cutoff threshold, , n is a positive integer. 4.如权利要求1所述的方法,其特征在于,所述将计算得到的所述帧差图输入深度神经网络进行噪点预测,包括:4. 
4. The method according to claim 1, wherein inputting the calculated frame difference map into the deep neural network for noise prediction comprises:
shrinking the frame difference map by a predetermined ratio, such that the number of pixels along the first side of the shrunken frame difference map falls within a first range of values and the number of pixels along the second side falls within a second range of values; and
inputting the shrunken frame difference map into the deep neural network for noise prediction.

5. The method according to any one of claims 1-4, wherein the training using frame difference maps of videos as the input for training the neural network comprises:
sampling each video in a video training set to obtain multiple groups of consecutive video frames of the video, and calculating the frame difference map corresponding to each group of consecutive video frames;
inputting the calculated frame difference maps, together with the manual label corresponding to the video, into a pre-trained neural network for training, the manual label indicating the noise severity of the video; and
determining the neural network of the trained prediction model by means of a video validation set.

6. The method according to claim 5, further comprising, after determining the neural network of the trained prediction model by means of the video validation set:
performing noise prediction on a video test set using the determined neural network of the prediction model; and
generating a confusion matrix for video classification according to the prediction results and the manual labels corresponding to the video test set.
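For the shrinking step of claim 4, a sketch along the following lines would apply, assuming OpenCV for resizing; the ratio and the two side-length ranges are placeholders, as the claims do not name concrete values:

```python
import cv2

def shrink_for_network(diff_map, ratio=0.5,
                       first_range=(224, 512), second_range=(224, 512)):
    """Shrink a frame difference map by a predetermined ratio and check
    that each side length lands inside the claimed pixel ranges."""
    h, w = diff_map.shape[:2]
    shrunk = cv2.resize(diff_map, (int(w * ratio), int(h * ratio)),
                        interpolation=cv2.INTER_AREA)
    sh, sw = shrunk.shape[:2]
    assert first_range[0] <= sh <= first_range[1]
    assert second_range[0] <= sw <= second_range[1]
    return shrunk  # ready to feed to the deep neural network
```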
7. A video noise identification apparatus, comprising:
a sampling calculation unit, configured to sample multiple groups of consecutive video frames from a video to be identified and to calculate the frame difference map corresponding to each group of consecutive video frames;
a prediction unit, configured to input the calculated frame difference maps into a deep neural network for noise prediction and to output a video classification result, wherein the deep neural network is a neural network trained using frame difference maps of videos as the input for training, and different classification results output by the deep neural network correspond to different degrees of video noise severity; and
a recognition result output unit, configured to output a noise identification result according to the video classification result;
wherein the sampling calculation unit comprises:
a determination calculation unit, configured to determine the middle video frame of a group of consecutive video frames and to calculate the average video frame of the group; and
a difference operation unit, configured to perform a difference operation on the middle video frame and the average video frame to obtain the frame difference map corresponding to the group of consecutive video frames, by subtracting the average video frame from the middle video frame, performing truncation, and shifting the truncated frame-difference pixel values into a target pixel interval.

8. A video noise identification device, comprising a processor and a memory connected to each other, wherein the memory is configured to store program code and the processor is configured to call the program code to perform the method according to any one of claims 1-6.

9. A computer-readable storage medium storing program instructions which, when executed by a processor, cause the processor to perform the method according to any one of claims 1-6.
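The evaluation step of claim 6 could be sketched as follows, assuming scikit-learn's confusion_matrix; predict_fn, the test-set maps, and the manual labels stand in for whatever trained model and data the method would actually use:

```python
from sklearn.metrics import confusion_matrix

def evaluate_on_test_set(predict_fn, test_maps, manual_labels):
    """Predict a noise-severity class for each test-set frame difference
    map and build the confusion matrix against the manual labels
    (rows: manual labels, columns: predicted classes)."""
    predicted = [predict_fn(m) for m in test_maps]
    return confusion_matrix(manual_labels, predicted)
```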
CN201910048947.2A 2019-01-18 2019-01-18 Video noise identification method, device, equipment and computer readable storage medium Active CN110163837B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910048947.2A CN110163837B (en) 2019-01-18 2019-01-18 Video noise identification method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910048947.2A CN110163837B (en) 2019-01-18 2019-01-18 Video noise identification method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110163837A CN110163837A (en) 2019-08-23
CN110163837B CN110163837B (en) 2024-12-17

Family

ID=67644806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910048947.2A Active CN110163837B (en) 2019-01-18 2019-01-18 Video noise identification method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110163837B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110703899B (en) * 2019-09-09 2020-09-25 创新奇智(南京)科技有限公司 Data center energy efficiency optimization method based on transfer learning
CN110910356A (en) * 2019-11-08 2020-03-24 北京华宇信息技术有限公司 Method for generating image noise detection model, image noise detection method and device
CN111882584A (en) * 2020-07-29 2020-11-03 广东智媒云图科技股份有限公司 A method and device for judging the amount of oil fume through grayscale images

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971342A (en) * 2014-05-21 2014-08-06 厦门美图之家科技有限公司 Image noisy point detection method based on convolution neural network
CN106254864A (en) * 2016-09-30 2016-12-21 杭州电子科技大学 Snowflake in monitor video and noise noise detecting method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170178309A1 (en) * 2014-05-15 2017-06-22 Wrnch Inc. Methods and systems for the estimation of different types of noise in image and video signals

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971342A (en) * 2014-05-21 2014-08-06 厦门美图之家科技有限公司 Image noisy point detection method based on convolution neural network
CN106254864A (en) * 2016-09-30 2016-12-21 杭州电子科技大学 Snowflake in monitor video and noise noise detecting method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RESIDUAL FRAME FOR NOISY VIDEO CLASSIFICATION ACCORDING TO PERCEPTUAL QUALITY IN CONVOLUTIONAL NEURAL NETWORKS; Huaixuan Zhang et al.; 2019 IEEE Xplore; 2019-08-05; pp. 242-247 *

Also Published As

Publication number Publication date
CN110163837A (en) 2019-08-23

Similar Documents

Publication Publication Date Title
US11176381B2 (en) Video object segmentation by reference-guided mask propagation
CN110807757B (en) Image quality evaluation method and device based on artificial intelligence and computer equipment
CN114549574B (en) An interactive video cutout system based on mask propagation network
US20200117906A1 (en) Space-time memory network for locating target object in video content
CN110751649B (en) Video quality evaluation method and device, electronic equipment and storage medium
CN114612832B (en) Real-time gesture detection method and device
CN110210551A (en) A kind of visual target tracking method based on adaptive main body sensitivity
KR102042168B1 (en) Methods and apparatuses for generating text to video based on time series adversarial neural network
CN110163837B (en) Video noise identification method, device, equipment and computer readable storage medium
CN115457015B (en) A method and device for image quality assessment without reference based on visual interactive perception dual-stream network
CN111027347A (en) Video identification method and device and computer equipment
CN113111716A (en) Remote sensing image semi-automatic labeling method and device based on deep learning
CN114358204B (en) No-reference image quality assessment method and system based on self-supervision
WO2018005565A1 (en) Automated selection of subjectively best images from burst captured image sequences
CN111639230B (en) Similar video screening method, device, equipment and storage medium
CN113269722A (en) Training method for generating countermeasure network and high-resolution image reconstruction method
CN115661163B (en) Small sample segmentation method based on generalization feature and region proposal loss function
TWI803243B (en) Method for expanding images, computer device and storage medium
CN112802076A (en) Reflection image generation model and training method of reflection removal model
CN110659641A (en) Character recognition method and device and electronic equipment
CN114445755A (en) A video quality evaluation method, device, equipment and storage medium
CN119314020A (en) Visual recognition method and device based on pulse neural network
Ertan et al. Enhancement of underwater images with artificial intelligence
CN118433446B (en) Video optimization processing method, system, device and storage medium
CN113570509B (en) Data processing method and computer device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TG01 Patent term adjustment