CN110163837B - Video noise identification method, device, equipment and computer readable storage medium - Google Patents
- Publication number
- CN110163837B (CN201910048947.2A)
- Authority
- CN
- China
- Prior art keywords
- video
- frame
- neural network
- noise
- frame difference
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
 
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
 
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
 
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
 
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
 
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
 
Abstract
An embodiment of the invention discloses a video noise identification method. The method comprises: sampling a plurality of groups of consecutive video frames from a video to be identified and calculating a frame difference map for each group; inputting the calculated frame difference maps into a deep neural network for noise prediction and outputting a video classification result, where the deep neural network is trained with frame difference maps of videos as its input and the different classification results it outputs correspond to different degrees of video noise severity; and outputting a noise identification result according to the video classification result. With this method and device, the deep neural network adaptively learns the features that distinguish noise from other texture-dense regions, which removes the interference of texture-dense regions and greatly improves the accuracy of noise severity classification for videos.
    Description
Technical Field
      The present invention relates to the field of computers, and in particular, to a method for identifying video noise, a device for identifying video noise, and a computer-readable storage medium.
    Background
      In recent years, short video applications have grown steadily, and an existing short video platform can receive tens of thousands of user video uploads every day. However, many of these videos contain severe video noise caused by the user's lighting conditions, recording equipment and the like, and such videos seriously degrade the viewing experience of other users.
      The prior art mainly divides the identification or detection of video noise into two steps: first, video frames are extracted and their noise is analyzed with hand-crafted features; then the results for all extracted frames are combined to estimate the noise level of the video. Estimating the noise intensity of a video frame from the information in that frame alone with an image noise detection algorithm usually relies on the assumption that "noise is high-frequency information in the frequency domain" or that "a noisy region has high variance in the spatial domain". However, observation of a large amount of video data shows that texture-dense video frames, such as frames of lawns, asphalt pavement, or oil and salt particles on food, also exhibit these assumed characteristics. Prior-art detection techniques therefore cannot reliably tell whether a texture-dense video frame actually has a noise problem, and their accuracy drops considerably.
    Disclosure of Invention
      The technical problem to be solved by the embodiments of the invention is to provide a video noise identification method, device, equipment and computer-readable storage medium that can distinguish noise from dense texture, thereby addressing the low accuracy of prior-art detection techniques.
      To solve this technical problem, one aspect of the embodiments of the invention discloses a video noise identification method, comprising:
      sampling a plurality of groups of consecutive video frames from the video to be identified, and calculating a frame difference map corresponding to each group of consecutive video frames;
      inputting the calculated frame difference maps into a deep neural network for noise prediction and outputting a video classification result, where the deep neural network is a neural network trained with frame difference maps of videos as its input, and the different classification results output by the deep neural network correspond to different degrees of video noise severity;
      and outputting a noise identification result according to the video classification result.
      In combination with this aspect, in one possible implementation, calculating the frame difference map for a group of consecutive video frames comprises:
      determining the intermediate video frame of the group of consecutive video frames and calculating the average video frame of the group;
      and performing a difference operation on the intermediate video frame and the average video frame to obtain the frame difference map corresponding to the group of consecutive video frames.
      In combination with this aspect, in one possible implementation, performing the difference operation on the intermediate video frame and the average video frame to obtain the frame difference map corresponding to the group of consecutive video frames comprises:
      subtracting the average video frame from the intermediate video frame and then truncating the result;
      and translating the truncated frame difference pixel values to a target pixel interval to obtain the frame difference map corresponding to the group of consecutive video frames.
      In combination with this aspect, in another possible implementation, performing the difference operation on the intermediate video frame and the average video frame to obtain the frame difference map corresponding to the group of consecutive video frames comprises:
      subtracting the average video frame from the intermediate video frame and taking the absolute value to obtain the frame difference map corresponding to the group of consecutive video frames.
      In combination with this aspect, in one possible implementation, the group of consecutive video frames comprises n frames, and subtracting the average video frame from the intermediate video frame, truncating the result, and translating the truncated frame difference pixel values to a target pixel interval to obtain the frame difference map corresponding to the group comprises:
      calculating the frame difference map $\xi_i$ corresponding to the group of consecutive video frames by the formula
      $$\xi_i = \min\bigl(\max(f_{i,0} - \mu_i,\ -\kappa),\ \kappa\bigr) + \kappa$$
      where $f_{i,0}$ is the intermediate video frame, $\mu_i$ is the average video frame, $\kappa$ is the truncation threshold, $1 \le i \le n$, and $n$ is a positive integer.
      In combination with this aspect, in one possible implementation, inputting the calculated frame difference map into the deep neural network for noise prediction comprises:
      scaling down the frame difference map by a preset ratio, so that the number of pixels along the first side of the scaled frame difference map falls within a first range and the number of pixels along the second side falls within a second range;
      and inputting the scaled frame difference map into the deep neural network for noise prediction.
      In combination with this aspect, in one possible implementation, the number of pixels along the first side is 400 and the number of pixels along the second side is 280.
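      The scaling step can be illustrated with a minimal Python sketch. The function name `shrink_nearest`, the use of nearest-neighbour sampling, the 800 x 560 source size, and the mapping of the first side to the width are all assumptions made for illustration; the text above does not specify the interpolation method.

```python
import numpy as np

def shrink_nearest(image, scale):
    """Scale an (H, W) or (H, W, C) image by `scale` using nearest-neighbour sampling."""
    h, w = image.shape[:2]
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    # Map every output row/column back to the nearest source row/column.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows][:, cols]

# Hypothetical 800 x 560 frame difference map scaled by a preset ratio of 0.5.
frame_diff = np.zeros((560, 800), dtype=np.uint8)
small = shrink_nearest(frame_diff, 0.5)  # 400 pixels wide, 280 pixels high
```

      In practice an image library with proper anti-aliasing would probably be preferred; the sketch only shows how a preset ratio determines the two side lengths.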
      In combination with this aspect, in one possible implementation, training with frame difference maps of videos as the input of the neural network comprises:
      for each video in the video training set, sampling a plurality of groups of consecutive video frames of the video and calculating a frame difference map corresponding to each group of consecutive video frames;
      inputting the calculated frame difference maps and the manual label corresponding to the video into a pre-trained neural network for training, where the manual label indicates the noise severity of the video;
      and determining the neural network of the trained prediction model by means of the video verification set.
      In combination with this aspect, in one possible implementation, after the neural network of the trained prediction model is determined by means of the video verification set, the method further comprises:
      performing noise prediction on the video test set with the determined neural network of the prediction model, and generating a confusion matrix of the video classification from the prediction results and the manual labels corresponding to the video test set.
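      As a minimal sketch of the confusion matrix mentioned above, the following Python code counts test-set predictions per class. The integer encoding of the three classes (0 = clear, 1 = slight noise, 2 = severe noise) and the sample labels are assumptions for illustration only.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes=3):
    """Rows are manual labels, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_class, predicted_class in zip(true_labels, predicted_labels):
        matrix[true_class][predicted_class] += 1
    return matrix

# Hypothetical manual labels and predictions for six test videos.
manual = [0, 0, 1, 2, 2, 1]
predicted = [0, 1, 1, 2, 1, 1]
matrix = confusion_matrix(manual, predicted)
```

      The diagonal entries count correctly classified videos, and each off-diagonal entry shows which severity level was confused with which.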
      In combination with this aspect, in one possible implementation, the deep neural network comprises a deep neural network built from depthwise separable convolutions, and the deep neural network outputs one of 3 video classification results.
      Another aspect of the embodiments of the invention discloses a video noise identification device, comprising:
      a sampling calculation unit, configured to sample a plurality of groups of consecutive video frames from the video to be identified and to calculate a frame difference map corresponding to each group of consecutive video frames;
      a prediction unit, configured to input the calculated frame difference maps into a deep neural network for noise prediction and to output a video classification result, where the deep neural network is a neural network trained with frame difference maps of videos as its input;
      and an identification result output unit, configured to output a noise identification result according to the video classification result.
      An embodiment of the invention also discloses video noise identification equipment comprising a processor and a memory that are connected to each other, where the memory is used for storing program code and the processor is configured to call the program code to execute the video noise identification method described above.
      Another aspect of the embodiments of the present invention discloses a computer-readable storage medium storing program instructions that, when executed by a processor, cause the processor to perform the video noise identification method described above.
      In the embodiments of the invention, a plurality of groups of consecutive frames are sampled from the video to be identified, a frame difference map is calculated for each group, and the calculated frame difference maps are input into a deep neural network for noise prediction to output a video classification result, where the deep neural network is trained with frame difference maps of videos as its input. This makes use of the temporal information across different frames of the video and takes into account that noise in a noisy video fluctuates over time. With the residual map of the video frame difference as the input of the deep neural network, the network adaptively learns the features that distinguish noise from other texture-dense regions and can better tell whether the residual pixels in the frame difference information come from noise or from the displacement of dense texture. The interference of texture-dense regions is thereby removed, and the accuracy of noise severity classification for videos is greatly improved. In addition, using video frame differences greatly reduces the learning difficulty for the neural network, so that training converges more easily.
    Drawings
      To illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
      Fig. 1 is an overall architecture diagram of a video noise identification method according to an embodiment of the present invention;
      Fig. 2 is a flow chart of a video noise identification method according to an embodiment of the present invention;
      Fig. 3 is a schematic diagram of a deep neural network training process according to an embodiment of the present invention;
      Fig. 4 is a basic structural diagram of a lightweight neural network according to an embodiment of the present invention;
      Fig. 5 is a diagram of the improved network architecture provided by the present invention;
      Fig. 6 is a schematic diagram of an application scenario of a video noise identification method according to an embodiment of the present invention;
      Fig. 7 is a schematic diagram of an application scenario of a video noise identification method according to another embodiment of the present invention;
      Fig. 8 is a schematic structural diagram of a video noise identification apparatus according to an embodiment of the present invention;
      Fig. 9 is a schematic structural diagram of a video noise identification device according to another embodiment of the present invention;
      Fig. 10 is a schematic structural diagram of video noise identification equipment according to an embodiment of the present invention.
    Detailed Description
      The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can obtain from these embodiments without inventive effort fall within the scope of the invention.
      To aid understanding of the video noise identification method, device and equipment provided by the embodiments of the invention, the overall architecture of the method is described first. As shown in Fig. 1, a pre-trained deep neural network is trained on the training videos in a video library: frame differences are extracted from the training videos, and the frame differences together with the manual labels are used as input for the learning and training of the deep neural network, which yields a trained deep neural network (the deep neural network used for noise prediction). Noise prediction is then performed on the video to be predicted (the video to be identified) with the trained deep neural network: frame differences are extracted from that video in the same way as during training and input into the trained deep neural network, which produces a video classification result. The noise severity of the video to be identified is obtained from the video classification result, and the noise identification result is output accordingly.
      The device or equipment for executing the video noise identification method in the embodiment of the invention can include, but is not limited to, network equipment such as a server, and terminal equipment such as a desktop computer, a laptop computer, a tablet computer, an intelligent terminal and the like. The server may be an independent server or a cluster server. The embodiments of the present invention are not limited.
      Fig. 2 is a schematic flow chart of the video noise identification method provided by an embodiment of the invention. It illustrates how noise prediction is performed on the video to be predicted (the video to be identified) with the trained deep neural network, and may include the following steps:
      Step S200, sampling a plurality of groups of consecutive video frames from the video to be identified, and calculating a frame difference map corresponding to each group of consecutive video frames;
      Specifically, the groups of consecutive video frames may be sampled from the video to be identified at random or according to a preset rule (for example, evenly distributed according to the total number of video frames). In one embodiment, considering that the background of a short video changes little, a large number of data experiments showed that 10 groups of consecutive video frames can be sampled, and the frame difference maps corresponding to these 10 groups are then calculated.
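      The two sampling options can be sketched in Python as follows. This is an illustrative sketch only: the function names, the `mode` parameter, and the 300-frame example video are assumptions, and the even spacing rule is one possible reading of the preset rule described above.

```python
import random

def sample_group_starts(total_frames, group_size=3, num_groups=10,
                        mode="uniform", seed=None):
    """Return the start index of each group of `group_size` consecutive frames."""
    last_start = total_frames - group_size
    if last_start < 0:
        raise ValueError("video is shorter than one group")
    if mode == "uniform":
        if num_groups == 1:
            return [last_start // 2]
        # Evenly spaced starts from the first to the last valid position.
        return [round(k * last_start / (num_groups - 1)) for k in range(num_groups)]
    if mode == "random":
        rng = random.Random(seed)
        return sorted(rng.randint(0, last_start) for _ in range(num_groups))
    raise ValueError(f"unknown mode: {mode}")

def group_indices(start, group_size=3):
    """Frame indices of one group of consecutive frames."""
    return list(range(start, start + group_size))

# Hypothetical 300-frame video, 10 groups of 3 consecutive frames each.
starts = sample_group_starts(total_frames=300, group_size=3, num_groups=10)
groups = [group_indices(s, 3) for s in starts]
```

      Each entry of `groups` lists the frame indices from which one frame difference map would be computed.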
      Calculating the frame difference map for one group of consecutive video frames comprises determining the intermediate video frame of the group, calculating the average video frame of the group, and performing a difference operation on the intermediate video frame and the average video frame to obtain the frame difference map corresponding to the group.
      Specifically, the average video frame may be subtracted from the intermediate video frame and the result truncated, after which the truncated frame difference pixel values are translated to a target pixel interval, for example the (0, 255) or the (0, 101) pixel interval, to obtain the frame difference map corresponding to the group of consecutive video frames. Alternatively, the average video frame may be subtracted from the intermediate video frame and the absolute value taken, which gives the frame difference map corresponding to the group without truncation.
      The first approach (subtracting the average video frame from the intermediate video frame, truncating, and translating the truncated frame difference pixel values to the target pixel interval) is explained below as an example:
      In one embodiment, performing the difference operation on the intermediate video frame and the average video frame to obtain the frame difference map corresponding to the group of consecutive video frames may include calculating the frame difference map $\xi_i$ by Formula 1:
      $$\xi_i = \min\bigl(\max(f_{i,0} - \mu_i,\ -\kappa),\ \kappa\bigr) + \kappa \qquad \text{(Formula 1)}$$
      where $f_{i,0}$ is the intermediate video frame, negative and positive second subscripts denote the video frames before and after the intermediate video frame, $\mu_i$ is the average video frame, $\kappa$ is the truncation threshold, $1 \le i \le n$, and $n$ is a positive integer.
      For example, when n is 3, $\mu_i = (f_{i,-1} + f_{i,0} + f_{i,1})/3$, where $f_{i,-1}$ is the first of the 3 frames, $f_{i,0}$ is the second, and $f_{i,1}$ is the third.
      As another example, when n is 4, $\mu_i$ may be $(f_{i,-1} + f_{i,0} + f_{i,1} + f_{i,2})/4$, where $f_{i,-1}$ is the first of the 4 frames, $f_{i,0}$ is the second, $f_{i,1}$ is the third, and $f_{i,2}$ is the fourth. Alternatively, $\mu_i$ may be $(f_{i,-2} + f_{i,-1} + f_{i,0} + f_{i,1})/4$, where $f_{i,-2}$ is the first of the 4 frames, $f_{i,-1}$ is the second, $f_{i,0}$ is the third, and $f_{i,1}$ is the fourth.
      As can be seen from Formula 1, the embodiment of the invention truncates the frame difference map. Truncation means that any pixel difference whose magnitude exceeds the truncation threshold is discarded and replaced by the truncation threshold. The difference between the pixel value at a noise position and the surrounding undamaged pixels is generally small and falls within the set threshold range, whereas larger differences in the frame difference map are usually caused by inter-frame content that does not overlap because of position changes. Truncating the frame difference map therefore improves the efficiency of frame difference map extraction and of noise detection.
      Formula 1 also shows that the truncation threshold κ is added to the frame difference map, which translates its pixel values into the interval (0, 255). This translation (rather than a scaling) lets the (0, 255) pixel value space record pixel differences that originally lie in (-255, 255).
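      The truncation and translation described above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the function name, the choice of `(n - 1) // 2` as the default intermediate frame index, and the rounding to 8-bit values are illustrative and not taken from the text.

```python
import numpy as np

def frame_difference_map(frames, kappa=50, middle=None):
    """Truncated, translated frame difference for one group of frames.

    frames: array of shape (n, H, W) or (n, H, W, C) with 8-bit pixel values.
    """
    stack = np.asarray(frames, dtype=np.float32)
    n = stack.shape[0]
    if middle is None:
        middle = (n - 1) // 2          # assumed default: second frame for n = 3 or 4
    mean_frame = stack.mean(axis=0)     # average video frame
    diff = stack[middle] - mean_frame   # intermediate frame minus average frame
    clipped = np.clip(diff, -kappa, kappa)   # truncation at +/- kappa
    shifted = clipped + kappa                # translation into [0, 2 * kappa]
    return np.rint(shifted).astype(np.uint8)

# Tiny hypothetical group of three 2 x 2 grayscale frames.
frames = np.array([
    [[100, 100], [100, 100]],
    [[100, 250], [0, 103]],
    [[100, 100], [100, 100]],
], dtype=np.uint8)
xi = frame_difference_map(frames, kappa=50)
```

      With κ = 50 the output values lie in [0, 100], which matches the (0, 101) target interval mentioned earlier; with κ = 127 they lie in [0, 254].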
      The embodiment of the invention is not limited to calculating the frame difference map corresponding to a group of consecutive video frames by Formula 1. In another implementation, the average video frame may be subtracted directly from the intermediate video frame and the absolute value taken, as shown in Formula 2:
      $$\xi_i = \lvert f_{i,0} - \mu_i \rvert \qquad \text{(Formula 2)}$$
      where $\xi_i$ is the frame difference map corresponding to the group of consecutive video frames, $f_{i,0}$ is the intermediate video frame, negative and positive second subscripts denote the video frames before and after the intermediate video frame, $\mu_i$ is the average video frame, $1 \le i \le n$, and $n$ is a positive integer.
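      A corresponding sketch of Formula 2 is given below. As before, the function name, the default intermediate frame index and the rounding to 8-bit values are assumptions for illustration.

```python
import numpy as np

def abs_frame_difference_map(frames, middle=None):
    """Absolute difference between the intermediate frame and the average frame."""
    stack = np.asarray(frames, dtype=np.float32)
    if middle is None:
        middle = (stack.shape[0] - 1) // 2   # assumed default intermediate frame
    diff = np.abs(stack[middle] - stack.mean(axis=0))
    return np.rint(diff).astype(np.uint8)

# Tiny hypothetical group of three 2 x 2 grayscale frames.
frames = np.array([
    [[100, 100], [100, 100]],
    [[100, 250], [0, 103]],
    [[100, 100], [100, 100]],
], dtype=np.uint8)
xi_abs = abs_frame_difference_map(frames)
```

      Unlike the truncated variant, this map discards the sign of the difference and keeps large differences unchanged.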
      In one embodiment, n may be 3, 4, 5, and so on, and the truncation threshold κ may be 50, 86, 127, and so on. Comparison tests in the embodiment of the invention, which evaluated different frame difference extraction schemes after they were used in training, showed that the best result is obtained by averaging three frames (n = 3) and by truncating and translating with κ = 50.
      Step S202, inputting the calculated frame difference maps into a deep neural network for noise prediction, and outputting a video classification result;
      The deep neural network in the embodiment of the invention may be a neural network trained with frame difference maps of videos as its input, and the different classification results output by its noise prediction correspond to different degrees of video noise severity.
      Step S204, outputting a noise identification result according to the video classification result.
      How the deep neural network of the embodiment of the invention is trained is described in detail below with reference to Figs. 3 to 5. As shown in Fig. 3, the deep neural network training flow provided by an embodiment of the invention may include the following steps:
      Step S300, for each video in the video training set, sampling a plurality of groups of consecutive video frames of the video, and calculating a frame difference map corresponding to each group of consecutive video frames;
      Specifically, the video database may include a video training set, a video verification set and a video test set. For example, all videos in the video database are divided according to preset proportions: a first proportion of all videos forms the video training set, a second proportion forms the video verification set, and a third proportion forms the video test set. The sum of the first, second and third proportions is 100%, for example 80%, 20% and 20% etc.
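      The division of the video database can be sketched in Python as follows. The function name, the shuffling with a fixed seed, and the 80% / 10% / 10% proportions are assumptions chosen for illustration so that the three parts sum to 100%.

```python
import random

def split_videos(video_ids, train=0.8, val=0.1, seed=0):
    """Shuffle video ids and split them into training, verification and test sets."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train)
    n_val = round(len(ids) * val)
    train_set = ids[:n_train]
    val_set = ids[n_train:n_train + n_val]
    test_set = ids[n_train + n_val:]   # the remaining videos
    return train_set, val_set, test_set

# Hypothetical database of 100 videos identified by integers.
train_set, val_set, test_set = split_videos(range(100))
```

      Splitting by video rather than by frame group keeps all frame difference maps of one video in the same set.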
      The video training set in the video database can be used to train the pre-trained neural network. Specifically, each video in the video training set is first sampled to obtain a plurality of groups of consecutive video frames, and the frame difference map corresponding to each group is then calculated. For the frame difference extraction in step S300, refer to the implementation of step S200 in the embodiment of Fig. 2.
      Step S302, inputting the calculated frame difference maps and the manual label corresponding to the video into a pre-trained neural network for training;
      Specifically, the manual label in the embodiment of the invention indicates the noise severity of a video. Each video in the video database may be labeled manually to mark which of three video classes (severe noise, slight noise or clear) it belongs to. The frame difference maps obtained in step S300 are input into the pre-trained neural network for training together with the manual label of the video from which each frame difference map was derived.
      In one embodiment, to reduce the computational cost of the neural network, the deep neural network (including the pre-trained neural network) is built from depthwise separable convolutions. For example, the lightweight neural network MobileNet may be chosen as the network structure, with the classification output changed to three classes (i.e., 3 video classification results) so that training follows the manual labels (severe noise, slight noise, no noise).
      MobileNet is an efficient model proposed for mobile and embedded devices. It uses depthwise separable convolutions to build lightweight deep neural networks; its basic structure is shown in Fig. 4. The first step is a depthwise convolution, which has only M 3x3 convolution kernels; the M kernels are convolved one-to-one with the M input feature maps to produce M feature maps, which performs feature extraction. The second step is a pointwise convolution, which is a conventional convolution whose kernels are all 1x1; there are M x N 1x1 kernels in total, and this step performs feature fusion.
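      The two steps can be illustrated with a small NumPy sketch. It is a simplified illustration, not the MobileNet implementation: the function names are hypothetical, and stride 1, no padding, and the absence of bias, batch normalization and activation are simplifying assumptions.

```python
import numpy as np

def depthwise_conv(x, kernels):
    """x: (H, W, M) input maps; kernels: (K, K, M), one K x K kernel per channel."""
    h, w, m = x.shape
    k = kernels.shape[0]
    out = np.zeros((h - k + 1, w - k + 1, m), dtype=x.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + k, j:j + k, :]                    # (K, K, M) window
            out[i, j, :] = (patch * kernels).sum(axis=(0, 1))  # per-channel filtering
    return out

def pointwise_conv(x, weights):
    """x: (H, W, M); weights: (M, N). A 1x1 convolution that mixes channels."""
    return x @ weights

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Feature extraction (depthwise) followed by feature fusion (pointwise)."""
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)

# Hypothetical sizes: 5 x 5 input, M = 2 input channels, N = 4 output channels.
x = np.ones((5, 5, 2))
out = depthwise_separable_conv(x, np.ones((3, 3, 2)), np.ones((2, 4)))
```

      The depthwise step never mixes channels, which is why the separate 1x1 step is needed to combine them into N output maps.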
      To improve accuracy and speed up training, the embodiment of the invention may fine-tune a network pre-trained on ImageNet. Since image quality is low-level information and high-level semantic information plays only an auxiliary role, in one implementation MobileNet may be simplified: the last two depthwise separable convolution layers of the conventional MobileNet model are removed and the final classification output is changed to 3, corresponding to the three noise classes (severe noise, slight noise and clear). The resulting network structure is shown in Fig. 5.
      The depth separable convolution consists of two layers, a depth convolution and a point-by-point convolution. The depth of the number of input channels is obtained by convolving each input channel with a single convolution kernel using a depth convolution, and then the outputs in the depth convolution are linearly combined using a point-wise convolution, i.e. applying a simple 1x1 convolution. MobileNets batch normalization layers (Batch Normalization or Batchnorm, BN) and nonlinear activation units (RECTIFIED LINEAR Unit) were used for each layer.
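      The depthwise convolution uses one convolution kernel per channel, and its computational cost is $D_K \cdot D_K \cdot M \cdot D_F \cdot D_F$, where $D_K$ is the kernel size, $M$ is the number of input channels, and $D_F$ is the spatial size of the feature map.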
      The depthwise convolution is very efficient relative to the standard convolution. However, it only filters the input channels and does not combine them to produce new features. An additional layer that computes a linear combination of the depthwise outputs through a 1x1 convolution is therefore needed to produce new features.
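      The two steps described above can be illustrated with a minimal pure-Python sketch. It assumes stride 1, no padding, and channels stored as nested lists; the function names are hypothetical and are not taken from the patent or from any library.

```python
def depthwise_conv(x, kernels):
    """x: M channels, each an HxW list of lists; kernels: M kernels, each KxK.
    Channel m is filtered only by kernel m (stride 1, no padding)."""
    out = []
    for chan, ker in zip(x, kernels):
        k = len(ker)
        h, w = len(chan), len(chan[0])
        out.append([
            [sum(chan[i + a][j + b] * ker[a][b]
                 for a in range(k) for b in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)
        ])
    return out


def pointwise_conv(x, weights):
    """weights: N rows of M coefficients. Each output channel is a linear
    combination of the M input channels at every pixel (a 1x1 convolution)."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wm * x[m][i][j] for m, wm in enumerate(row)) for j in range(w)]
         for i in range(h)]
        for row in weights
    ]


def depthwise_separable_conv(x, kernels, weights):
    # Filtering (depthwise) followed by combining (pointwise).
    return pointwise_conv(depthwise_conv(x, kernels), weights)


# Toy input: M = 2 channels of size 3x3, N = 3 output channels.
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
kernels = [[[1] * 3 for _ in range(3)] for _ in range(2)]  # 3x3 box filters
weights = [[1, 0], [0, 1], [1, 1]]
y = depthwise_separable_conv(x, kernels, weights)
```

      The depthwise step never mixes channels; only the 1x1 step does, which is why the two steps are described as feature extraction and feature fusion respectively.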
      The combination of the depthwise convolution and the 1x1 pointwise convolution is called the depthwise separable convolution, which was originally introduced in "Rigid-motion scattering for image classification".
      The computational cost of the depthwise separable convolution is $D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$, i.e., the sum of the costs of the depthwise convolution and the 1x1 pointwise convolution.
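      By splitting convolution into a filtering step and a combining step, the computation is reduced relative to a standard convolution, whose cost is $D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$, by the ratio $\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2}$. MobileNet uses 3x3 depthwise separable convolutions, which require 8 to 9 times less computation than standard convolutions.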
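      As a quick numerical check of the cost ratio, the following sketch evaluates both costs for one layer. The layer sizes (3x3 kernel, 64 input channels, 128 output channels, 56x56 feature map) are illustrative assumptions and are not taken from the patent.

```python
def standard_conv_cost(dk, m, n, df):
    # Multiply-accumulates of a standard convolution.
    return dk * dk * m * n * df * df


def depthwise_separable_cost(dk, m, n, df):
    depthwise = dk * dk * m * df * df   # one dk x dk kernel per input channel
    pointwise = m * n * df * df         # M x N kernels of size 1x1
    return depthwise + pointwise


dk, m, n, df = 3, 64, 128, 56           # assumed example layer
ratio = depthwise_separable_cost(dk, m, n, df) / standard_conv_cost(dk, m, n, df)
speedup = 1 / ratio
```

      For these sizes the ratio equals 1/128 + 1/9, so the saving falls in the 8 to 9 times range quoted above; the 1/N term is small whenever the number of output channels is large.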
      Step S304, determining the neural network of the trained prediction model through the video verification set.
      Specifically, to prevent overfitting, the best model is selected from the training results by means of the video verification set and stored as the prediction model. That is, each video in the video verification set is sampled to obtain a plurality of groups of continuous video frames, a frame difference map is calculated for each group (for the extraction of the frame difference, refer to the implementation of step S200 in the embodiment of fig. 2), and the calculated frame difference maps and the artificial label corresponding to the video are input to each trained neural network, so as to determine the model with the best performance.
      In one embodiment, when the frame difference maps obtained from the video training set are input together with the artificial labels into the pre-trained neural network for training, the optimization method can be set to stochastic gradient descent (SGD) with a learning rate of 1e-3, a momentum of 0.9, and a batch size of 128. Each time a first number (for example, 1000) of video frames has been processed, verification is performed once on the video verification set, the accuracy is calculated from the verification results and the artificial labels, and the model with the highest accuracy is retained.
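      The periodic verification and best-model selection can be sketched as follows. This is only an outline under stated assumptions: `train_batch` and `validate` are hypothetical callables standing in for the real training step and the accuracy computation on the video verification set, and the sketch records when the best accuracy occurred rather than saving actual model weights.

```python
# Hyperparameters quoted in the text.
SGD_CONFIG = {"optimizer": "SGD", "learning_rate": 1e-3,
              "momentum": 0.9, "batch_size": 128}


def train_and_select(train_batch, validate, num_batches,
                     batch_size=128, validate_every=1000):
    """Run training and verify once per `validate_every` processed frames.
    Returns (frames processed at the best checkpoint, best accuracy)."""
    best_acc, best_at = -1.0, None
    frames_seen, next_check = 0, validate_every
    for _ in range(num_batches):
        train_batch()
        frames_seen += batch_size
        if frames_seen >= next_check:
            acc = validate()
            if acc > best_acc:          # keep only the most accurate model
                best_acc, best_at = acc, frames_seen
            next_check += validate_every
    return best_at, best_acc


# Stand-in accuracies for three verification rounds.
accs = iter([0.70, 0.85, 0.80])
best_at, best_acc = train_and_select(lambda: None, lambda: next(accs),
                                     num_batches=24)
```

      In a real training run the branch that updates `best_acc` would also save the model weights, so that the retained prediction model is the one with the highest verification accuracy.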
      In one embodiment, step S304 may be followed by step S306, in which noise prediction is performed on the video test set by the neural network of the determined best model, and a confusion matrix of video classification is generated from the prediction results and the artificial labels corresponding to the video test set.
      Specifically, after the neural network of the trained prediction model is obtained, prediction is performed on the video test set to obtain prediction results, and a confusion matrix of video classification is then generated by comparing the artificial labels corresponding to the video test set with the prediction results, so as to evaluate how well the deep neural network classifies videos.
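      A confusion matrix for the three noise classes can be built as in the sketch below. The label strings follow the classes named in the text; the sample labels and predictions are made-up values for illustration only.

```python
LABELS = ["severe noise", "slight noise", "no noise"]


def confusion_matrix(true_labels, predicted_labels, labels=LABELS):
    """Rows are artificial (ground-truth) labels, columns are predictions."""
    index = {name: i for i, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Illustrative test-set labels and predictions.
truth = ["severe noise", "slight noise", "no noise", "no noise"]
preds = ["severe noise", "no noise", "no noise", "slight noise"]
cm = confusion_matrix(truth, preds)
acc = accuracy(cm)
```

      Off-diagonal entries show which severity levels are confused with each other, which is more informative than the overall accuracy alone.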
      In various embodiments of the present invention, inputting the calculated frame difference map into the deep neural network may further include:
       the frame difference map is scaled down according to a preset ratio, such that the number of pixels on the first side of the scaled frame difference map falls within a first range and the number of pixels on the second side falls within a second range; and the scaled frame difference map is input into the deep neural network for noise prediction. 
      Specifically, to obtain an input of fixed size and further reduce the amount of calculation, the frame difference map of the video may be scaled down according to the aspect ratio of the video itself; for example, it may be scaled to 400 pixels on the first side (i.e., the long side of the frame difference map) and 280 pixels on the second side (i.e., the short side of the frame difference map).
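      One possible way to realize this scaling is sketched below. The 400 and 280 values come from the text; choosing the target orientation from the video's own orientation and using nearest-neighbour sampling are assumptions made for this illustration, and the function names are hypothetical.

```python
def target_size(width, height, long_side=400, short_side=280):
    """Return (new_width, new_height), keeping the video's orientation."""
    if width >= height:
        return long_side, short_side
    return short_side, long_side


def resize_nearest(image, new_w, new_h):
    """Nearest-neighbour resize of a 2-D image stored as a list of rows."""
    old_h, old_w = len(image), len(image[0])
    return [[image[y * old_h // new_h][x * old_w // new_w]
             for x in range(new_w)]
            for y in range(new_h)]


landscape = target_size(1920, 1080)
portrait = target_size(720, 1280)
tiny = resize_nearest([[1, 2], [3, 4]], 4, 2)
```

      A production pipeline would more likely use an image library with area or bilinear interpolation; the sketch only shows how the fixed input size is derived.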
      In one implementation, the video noise identification method of the embodiment of the invention can also use an attention mechanism combined with a reinforcement learning reward (whether the final overall classification is correct), so that the deep neural network autonomously learns from a large amount of data which areas of a video frame contain noise, and classifies the severity according to information such as the size of the noise areas, the size of the noise particles, and their position.
      The video noise identification method provided by the embodiment of the invention can be applied to various technical scenes:
       For example, the method can be used by a short video platform to review the quality of videos uploaded by users. As shown in fig. 6, an application scenario diagram of the video noise identification method provided by the embodiment of the invention, the video noise identification device can extract frame difference maps from the videos in a video library in the manner described in the invention and feed them to a pre-trained neural network for training, thereby obtaining the neural network of the trained prediction model. It then extracts frame difference maps from the video to be predicted, performs prediction with the prediction model, and outputs a video classification result. When the video classification result is confirmed not to meet the playing requirement, the video to be identified does not pass; when the video classification result is confirmed to meet the playing requirement, the video to be identified passes. That is, the video noise identification method can objectively judge the severity of video noise; once deployed online, it can replace manual review of noise, automatically and quickly determine whether the video to be identified passes identification or review, and display the review result to the user. 
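      The review decision in this scenario can be sketched as follows. The text does not specify how the predictions of several frame groups are merged or which classes fail the playing requirement, so the majority vote and the rejection of only "severe noise" are assumptions of this illustration.

```python
from collections import Counter


def video_level_class(group_predictions):
    """Majority vote over per-group predictions (assumed aggregation rule)."""
    return Counter(group_predictions).most_common(1)[0][0]


def passes_review(video_class, rejected_classes=("severe noise",)):
    """True when the video meets the (hypothetical) playing requirement."""
    return video_class not in rejected_classes


group_preds = ["severe noise", "slight noise", "severe noise"]
label = video_level_class(group_preds)
passed = passes_review(label)
```

      A platform with stricter requirements could pass a larger `rejected_classes` tuple, for example one that also rejects slightly noisy videos.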
      As another example, a short video platform can recommend videos to users according to video quality. As shown in fig. 7, an application scenario diagram of the video noise identification method according to another embodiment of the present invention, the video noise identification device can extract frame difference maps from the videos in a video library in the manner described in the invention and feed them to a pre-trained neural network for training, thereby obtaining the neural network of the trained prediction model. It then extracts frame difference maps from the video to be predicted, performs prediction with the prediction model, and outputs a video classification result. The videos are ranked by quality according to their video classification results and recommended to the user in order from best to worst quality.
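      Ranking by predicted quality can be sketched in a few lines. The file names are made-up examples, and the ordering of the three classes from best to worst is the natural reading of the text rather than something it states explicitly.

```python
# Lower rank means better quality.
QUALITY_RANK = {"no noise": 0, "slight noise": 1, "severe noise": 2}


def rank_for_recommendation(classified_videos):
    """classified_videos: list of (video_id, class). Sorted best-first;
    the sort is stable, so ties keep their original order."""
    return sorted(classified_videos, key=lambda item: QUALITY_RANK[item[1]])


videos = [("a.mp4", "severe noise"), ("b.mp4", "no noise"),
          ("c.mp4", "slight noise"), ("d.mp4", "no noise")]
ranked = [name for name, _ in rank_for_recommendation(videos)]
```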
      According to the embodiment of the invention, a plurality of groups of continuous frames are sampled from the video to be identified, a frame difference map is calculated for each group, the calculated frame difference maps are input into the deep neural network for noise prediction, and a video classification result is output, where the deep neural network is obtained by training with frame difference maps of videos as the input of the training neural network. In this way, the temporal information across different frames of the video is exploited: considering that the noise in a noisy video fluctuates from frame to frame, the residual image of the video frame difference is used as the input of the deep neural network, and the deep neural network adaptively learns the characteristics that distinguish noise from other texture-dense regions. It can therefore better distinguish whether the residual pixels in the frame difference information are noise or displacement of dense texture, eliminate the interference of texture-dense regions, and greatly improve the accuracy of classifying the noise severity of a video. In addition, using the video frame difference greatly reduces the learning difficulty of the neural network, so that its training converges more easily.
      In order to facilitate better implementation of the foregoing solution of the embodiment of the present invention, the present invention correspondingly provides a video noise recognition device, as shown in fig. 8, where the video noise recognition device 80 includes a sampling calculation unit 800, a prediction unit 802, and a recognition result output unit 804, where
      The sampling calculation unit 800 is configured to sample a plurality of groups of continuous video frames from the video to be identified, and calculate a frame difference map corresponding to each group of continuous video frames respectively;
       The prediction unit 802 is configured to input the calculated frame difference image into a deep neural network for noise prediction, and output a video classification result, where the deep neural network is a neural network that is obtained by training by using the frame difference image of the video as an input of a training neural network; 
       and when it is confirmed, according to the output video classification result, that the video to be identified does not meet the playing requirement, the video to be identified does not pass identification. 
      The sampling calculation unit 800 may include a determination calculation unit and a difference calculation unit, wherein,
      A determining and calculating unit for determining an intermediate video frame in the group of continuous video frames and calculating an average frame of the group of continuous video frames;
       And the difference value operation unit is used for carrying out difference value operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the group of continuous video frames. 
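      The work of the two sub-units can be illustrated with the sketch below. It assumes grayscale frames stored as nested lists and takes the frame at index `len(frames) // 2` as the intermediate frame; both choices, and the function names, are assumptions of this illustration.

```python
def average_frame(frames):
    """Pixel-wise mean over a group of equally sized frames."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)]
            for y in range(h)]


def intermediate_frame(frames):
    # Middle frame of the group (assumed convention for even group sizes).
    return frames[len(frames) // 2]


def frame_difference(frames):
    """Intermediate frame minus average frame, pixel by pixel."""
    mid = intermediate_frame(frames)
    mean = average_frame(frames)
    return [[m - a for m, a in zip(mid_row, mean_row)]
            for mid_row, mean_row in zip(mid, mean)]


# Three 1x2 frames; only the first pixel of the middle frame changes.
group = [[[10, 10]], [[16, 10]], [[10, 10]]]
diff = frame_difference(group)
```

      Static content cancels out in the difference, so what remains is mostly frame-to-frame fluctuation such as noise or moving texture.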
      In one embodiment, the set of consecutive video frames includes n frames, and the difference operation unit may specifically:
       by the formula Calculating to obtain a frame difference map corresponding to the group of continuous video frames;
       where $f_{i,0}$ is the intermediate video frame, $\mu_i$ is the average video frame, $\kappa$ is the truncation threshold, $1 \le i \le n$, and $n$ is a positive integer.
      In one embodiment, the prediction unit 802 may be configured to scale down the frame difference map according to a preset ratio, such that the number of pixels on the first side of the scaled frame difference map falls within a first range and the number of pixels on the second side falls within a second range, and to input the scaled frame difference map into the deep neural network for noise prediction.
      In one embodiment, the number of pixels on the first side is 400, and the number of pixels on the second side is 280.
      And the recognition result output unit 804 is configured to output a noise recognition result according to the video classification result.
      As shown in fig. 9, in addition to the sampling calculation unit 800, the prediction unit 802, and the recognition result output unit 804, the video noise recognition device 80 may further include a training unit 806, configured to sample each video in the video training set, sample multiple groups of continuous video frames of the video, and calculate a frame difference map corresponding to each group of continuous video frames respectively;
       inputting the calculated frame difference image and the artificial label corresponding to the video into a pre-training neural network for training, wherein the artificial label is used for indicating the noise severity of the video; 
       and determining the neural network of the trained prediction model through the video verification set. 
      After determining the neural network of the trained prediction model through the video verification set, the training unit 806 may be further configured to perform noise prediction on the video test set through the determined neural network of the prediction model, and generate a confusion matrix of video classification according to the prediction result and the artificial tag corresponding to the video test set.
      In one implementation, the deep neural network comprises a deep neural network constructed by depth separable convolution, and the output of the deep neural network is 3 video classification results.
      The units of the video noise identifying apparatus 80 in the embodiment of the present invention are used for correspondingly executing the steps of the video noise identifying method in the embodiments of fig. 1 to 5 in the above method embodiments, and are not described herein again.
      In order to facilitate better implementation of the foregoing solutions of the embodiments of the present invention, the present invention further correspondingly provides a video noise identifying device, and the following details are described with reference to the accompanying drawings:
       Fig. 10 is a schematic structural diagram of the video noise identifying apparatus according to the embodiment of the present invention. The video noise identifying apparatus 100 may include a processor 101, a display screen 102, a memory 104, and a communication module 105, which may be connected to each other through a bus 106. The memory 104 may be a high-speed random access memory (RAM) or a non-volatile memory, such as at least one disk memory; in the embodiment of the present invention, the memory 104 includes flash memory. Optionally, the memory 104 may also be at least one storage system located remotely from the processor 101. The memory 104 is used for storing application program code and may include an operating system, a network communication module, a user interface module, and a video noise identification program. The communication module 105 is used for exchanging information and data with external devices, and the processor 101 is configured to call the program code to execute the following steps: 
       sampling a plurality of groups of continuous video frames from the video to be identified, and respectively calculating a frame difference map corresponding to each group of continuous video frames; 
       inputting the calculated frame difference map into a deep neural network for noise prediction and outputting a video classification result, where the deep neural network is obtained by training with frame difference maps of videos as the input of the training neural network, and different classification results output by the deep neural network correspond to different video noise severities; 
       And outputting a noise identification result according to the video classification result. 
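      The three steps executed by the processor can be tied together as in the following sketch. The even spacing of the sampled groups is an assumption (the text only says that a plurality of groups is sampled), and `frame_diff_fn` and `predict_fn` are hypothetical callables standing in for the frame difference computation and the trained deep neural network.

```python
def sample_groups(total_frames, num_groups, n):
    """Indices of `num_groups` evenly spaced groups of n consecutive frames."""
    if total_frames < n:
        raise ValueError("video is shorter than one group")
    max_start = total_frames - n
    if num_groups == 1:
        starts = [max_start // 2]
    else:
        starts = [k * max_start // (num_groups - 1) for k in range(num_groups)]
    return [list(range(s, s + n)) for s in starts]


def identify_noise(frames, num_groups, n, frame_diff_fn, predict_fn):
    """Sample groups, compute one frame difference map per group, and
    return the model's prediction for each map."""
    predictions = []
    for indices in sample_groups(len(frames), num_groups, n):
        diff = frame_diff_fn([frames[i] for i in indices])
        predictions.append(predict_fn(diff))
    return predictions


groups = sample_groups(100, 3, 5)

# Stand-ins: each "frame" is a single number, and the "model" is a threshold.
frames = list(range(10))
results = identify_noise(
    frames, num_groups=2, n=3,
    frame_diff_fn=lambda g: g[len(g) // 2] - sum(g) / len(g),
    predict_fn=lambda d: "no noise" if d == 0 else "slight noise",
)
```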
      In the process of calculating the frame difference map corresponding to each group of continuous video frames by the processor 101, for a group of continuous video frames, the method may include:
       determining an intermediate video frame of the set of consecutive video frames and calculating an average frame of the set of consecutive video frames; 
       And carrying out difference operation on the intermediate video frames and the average video frames to obtain a frame difference map corresponding to the group of continuous video frames. 
      The processor 101 performs a difference operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the group of continuous video frames, which may include:
       Subtracting the average video frame from the intermediate video frame, and then performing truncation processing; 
       and translating the truncated frame difference pixel value to a target pixel interval to obtain a frame difference map corresponding to the group of continuous video frames. 
      The processor 101 performs a difference operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the group of continuous video frames, which may include:
       and subtracting the average video frame from the intermediate video frame, and taking an absolute value to obtain a frame difference map corresponding to the group of continuous video frames. 
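      The two post-processing variants described above can be sketched as follows, operating on a signed difference (intermediate frame minus average frame). The text does not give the target pixel interval here, so shifting by the truncation threshold, which maps values into the range from 0 to twice the threshold, is an assumption of this illustration; the function names are hypothetical.

```python
def truncate_and_shift(diff, kappa, offset=None):
    """Clip each signed difference to [-kappa, kappa], then translate it.
    The default offset of kappa (giving values in [0, 2 * kappa]) is an
    assumption, not a value stated in the text."""
    if offset is None:
        offset = kappa
    return [[min(max(v, -kappa), kappa) + offset for v in row]
            for row in diff]


def absolute_difference(diff):
    """Alternative variant: keep only the magnitude of the difference."""
    return [[abs(v) for v in row] for row in diff]


signed = [[-50, -3, 0, 7, 90]]
shifted = truncate_and_shift(signed, kappa=20)
magnitude = absolute_difference([[-3, 4]])
```

      Truncation keeps large differences caused by scene motion from dominating the map, while the shift makes all values non-negative so the result can be stored as an ordinary image.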
      Subtracting, by the processor 101, the average video frame from the intermediate video frame, performing truncation processing, and translating the truncated frame difference pixel values to a target pixel interval to obtain the frame difference map corresponding to the group of continuous video frames may include:
       by the formula Calculating to obtain a frame difference map corresponding to the group of continuous video frames;
       where $f_{i,0}$ is the intermediate video frame, $\mu_i$ is the average video frame, $\kappa$ is the truncation threshold, $1 \le i \le n$, and $n$ is a positive integer.
      The processor 101 inputs the calculated frame difference map into a deep neural network to perform noise prediction, which may include:
       scaling down the frame difference map according to a preset ratio, such that the number of pixels on the first side of the scaled frame difference map falls within a first range and the number of pixels on the second side falls within a second range; 
       and inputting the scaled frame difference map into a deep neural network for noise prediction. 
      The number of pixels on the first side is 400, and the number of pixels on the second side is 280.
      The training performed by the processor 101, using the frame difference map of the video as the input of the training neural network, may include:
       Sampling each video in the video training set, sampling a plurality of groups of continuous video frames of the video, and respectively calculating a frame difference map corresponding to each group of continuous video frames; 
       inputting the calculated frame difference image and the artificial label corresponding to the video into a pre-training neural network for training, wherein the artificial label is used for indicating the noise severity of the video; 
       and determining the neural network of the trained prediction model through the video verification set. 
      Wherein after the processor 101 determines the neural network of the trained predictive model through the video validation set, it is further operable to:
       And performing noise prediction on the video test set through the determined neural network of the prediction model, and generating a confusion matrix of video classification according to a prediction result and the artificial label corresponding to the video test set. 
      The depth neural network comprises a depth neural network constructed through depth separable convolution, and the output of the depth neural network is 3 video classification results.
      It should be noted that, in the embodiment of the present invention, the execution steps of the processor 101 in the video noise identification device may refer to specific implementation manners of the video noise identification method in the embodiment of fig. 1 to 5 in the above method embodiments, and are not repeated here.
      According to the embodiment of the invention, a plurality of groups of continuous frames are sampled from the video to be identified, a frame difference map is calculated for each group, the calculated frame difference maps are input into the deep neural network for noise prediction, and a video classification result is output, where the deep neural network is obtained by training with frame difference maps of videos as the input of the training neural network. In this way, the temporal information across different frames of the video is exploited: considering that the noise in a noisy video fluctuates from frame to frame, the residual image of the video frame difference is used as the input of the deep neural network, and the deep neural network adaptively learns the characteristics that distinguish noise from other texture-dense regions. It can therefore better distinguish whether the residual pixels in the frame difference information are noise or displacement of dense texture, eliminate the interference of texture-dense regions, and greatly improve the accuracy of classifying the noise severity of a video. In addition, using the video frame difference greatly reduces the learning difficulty of the neural network, so that its training converges more easily.
      Those skilled in the art will appreciate that all or part of the processes of the above method embodiments may be implemented by a computer program stored on a computer readable storage medium; when executed, the program may carry out the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
      The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.
    Claims (9)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201910048947.2A CN110163837B (en) | 2019-01-18 | 2019-01-18 | Video noise identification method, device, equipment and computer readable storage medium | 
Publications (2)
| Publication Number | Publication Date | 
|---|---|
| CN110163837A | 2019-08-23 | 
| CN110163837B | 2024-12-17 | 
Family
ID=67644806
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| CN201910048947.2A (Active, CN110163837B) | Video noise identification method, device, equipment and computer readable storage medium | 2019-01-18 | 2019-01-18 | 
Country Status (1)
| Country | Link | 
|---|---|
| CN (1) | CN110163837B (en) | 
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN110703899B (en) * | 2019-09-09 | 2020-09-25 | 创新奇智(南京)科技有限公司 | Data center energy efficiency optimization method based on transfer learning | 
| CN110910356A (en) * | 2019-11-08 | 2020-03-24 | 北京华宇信息技术有限公司 | Method for generating image noise detection model, image noise detection method and device | 
| CN111882584A (en) * | 2020-07-29 | 2020-11-03 | 广东智媒云图科技股份有限公司 | A method and device for judging the amount of oil fume through grayscale images | 
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN103971342A (en) * | 2014-05-21 | 2014-08-06 | 厦门美图之家科技有限公司 | Image noisy point detection method based on convolution neural network | 
| CN106254864A (en) * | 2016-09-30 | 2016-12-21 | 杭州电子科技大学 | Snowflake in monitor video and noise noise detecting method | 
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US20170178309A1 (en) * | 2014-05-15 | 2017-06-22 | Wrnch Inc. | Methods and systems for the estimation of different types of noise in image and video signals | 
Non-Patent Citations (1)
| Title | 
|---|
| RESIDUAL FRAME FOR NOISY VIDEO CLASSIFICATION ACCORDING TO PERCEPTUAL QUALITY IN CONVOLUTIONAL NEURAL NETWORKS;Huaixuan Zhang et al.;《2019 IEEE Xplore》;20190805;第242-247页 * | 
Also Published As
| Publication number | Publication date | 
|---|---|
| CN110163837A (en) | 2019-08-23 | 
Similar Documents
| Publication | Publication Date | Title | 
|---|---|---|
| US11176381B2 (en) | Video object segmentation by reference-guided mask propagation | |
| CN110807757B (en) | Image quality evaluation method and device based on artificial intelligence and computer equipment | |
| CN114549574B (en) | An interactive video cutout system based on mask propagation network | |
| US20200117906A1 (en) | Space-time memory network for locating target object in video content | |
| CN110751649B (en) | Video quality evaluation method and device, electronic equipment and storage medium | |
| CN114612832B (en) | Real-time gesture detection method and device | |
| CN110210551A (en) | A kind of visual target tracking method based on adaptive main body sensitivity | |
| KR102042168B1 (en) | Methods and apparatuses for generating text to video based on time series adversarial neural network | |
| CN110163837B (en) | Video noise identification method, device, equipment and computer readable storage medium | |
| CN115457015B (en) | A method and device for image quality assessment without reference based on visual interactive perception dual-stream network | |
| CN111027347A (en) | Video identification method and device and computer equipment | |
| CN113111716A (en) | Remote sensing image semi-automatic labeling method and device based on deep learning | |
| CN114358204B (en) | No-reference image quality assessment method and system based on self-supervision | |
| WO2018005565A1 (en) | Automated selection of subjectively best images from burst captured image sequences | |
| CN111639230B (en) | Similar video screening method, device, equipment and storage medium | |
| CN113269722A (en) | Training method for generating countermeasure network and high-resolution image reconstruction method | |
| CN115661163B (en) | Small sample segmentation method based on generalization feature and region proposal loss function | |
| TWI803243B (en) | Method for expanding images, computer device and storage medium | |
| CN112802076A (en) | Reflection image generation model and training method of reflection removal model | |
| CN110659641A (en) | Character recognition method and device and electronic equipment | |
| CN114445755A (en) | A video quality evaluation method, device, equipment and storage medium | |
| CN119314020A (en) | Visual recognition method and device based on pulse neural network | |
| Ertan et al. | Enhancement of underwater images with artificial intelligence | |
| CN118433446B (en) | Video optimization processing method, system, device and storage medium | |
| CN113570509B (en) | Data processing method and computer device | 
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| TG01 | Patent term adjustment | ||