CN110163837B - Video noise identification method, device, equipment and computer readable storage medium - Google Patents
- Publication number: CN110163837B
- Application number: CN201910048947.2A
- Authority
- CN
- China
- Prior art keywords
- video
- frame
- neural network
- noise
- frame difference
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
Abstract
The embodiment of the invention discloses a video noise identification method. The method comprises: sampling a plurality of groups of continuous video frames from a video to be identified; respectively calculating a frame difference map corresponding to each group of continuous video frames; inputting the calculated frame difference maps into a deep neural network for noise prediction and outputting a video classification result, wherein the deep neural network is a neural network trained by taking frame difference maps of videos as its input, and different classification results output by the deep neural network correspond to different degrees of video noise severity; and outputting a noise identification result according to the video classification result. By adopting the method and the device, the deep neural network adaptively learns the characteristics that distinguish noise from other texture-dense areas, so that interference from texture-dense areas is eliminated and the accuracy of classifying noisy videos by severity is greatly improved.
Description
Technical Field
The present invention relates to the field of computers, and in particular to a video noise identification method, apparatus, device and computer-readable storage medium.
Background
In recent years, short video applications have grown rapidly. A short video platform may receive tens of thousands of video uploads from users every day, but many of these videos suffer from severe video noise caused by the users' shooting light, equipment and the like, and such videos seriously degrade the viewing experience of other users.
The prior art generally divides video noise identification or detection into two steps: first, video frames are extracted and their noise is analyzed using manually designed features; then the results over all extracted frames are aggregated to estimate the noise level of the video. Such methods, which estimate the noise intensity of individual frames with an image noise detection algorithm using frame information alone, typically rest on the assumptions that "noise is high-frequency information in the frequency domain" or that "noisy regions have high variance in the spatial domain". However, observation of a large amount of video data shows that texture-dense video frames, such as frames of lawns, asphalt pavement, or oil and salt particles on food, also exhibit these assumed characteristics. Detection techniques in the prior art therefore cannot accurately determine whether texture-dense video frames have a noise problem, and their accuracy drops greatly.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a video noise identification method, a video noise identification apparatus, a video noise identification device and a computer-readable storage medium that can distinguish noise from dense texture and solve the technical problem of low detection accuracy in the prior art.
In order to solve the above technical problems, an aspect of the embodiments of the present invention discloses a method for identifying video noise, including:
Sampling a plurality of groups of continuous video frames from the video to be identified, and respectively calculating a frame difference map corresponding to each group of continuous video frames;
inputting the calculated frame difference maps into a deep neural network for noise prediction and outputting a video classification result, wherein the deep neural network is a neural network trained by taking frame difference maps of videos as the input of the neural network to be trained, and different classification results output by the deep neural network correspond to different degrees of video noise severity;
And outputting a noise identification result according to the video classification result.
In combination with this aspect, in one possible implementation manner, in the process of calculating the frame difference maps corresponding to each group of continuous video frames, for a group of continuous video frames, the method includes:
determining an intermediate video frame of the set of consecutive video frames and calculating an average frame of the set of consecutive video frames;
and performing a difference operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the group of continuous video frames.
In combination with this aspect, in one possible implementation manner, the performing a difference operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the set of continuous video frames includes:
Subtracting the average video frame from the intermediate video frame, and then performing truncation processing;
and translating the truncated frame difference pixel value to a target pixel interval to obtain a frame difference map corresponding to the group of continuous video frames.
In combination with this aspect, in one possible implementation manner, the performing a difference operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the set of continuous video frames includes:
and subtracting the average video frame from the intermediate video frame, and taking an absolute value to obtain a frame difference map corresponding to the group of continuous video frames.
In combination with this aspect, in one possible implementation manner, the set of continuous video frames includes n frames, the step of subtracting the average video frame from the intermediate video frame, performing truncation processing, and the step of translating the truncated frame difference pixel value to a target pixel interval to obtain a frame difference map corresponding to the set of continuous video frames includes:
calculating, by the formula ξ_i = min(max(f_(i,0) - μ_i, -κ), κ) + κ, a frame difference map corresponding to the group of continuous video frames;
where f_(i,0) is the intermediate video frame, μ_i = (1/n)·Σ_j f_(i,j) is the average video frame, κ is the truncation threshold, 1 ≤ i ≤ n, and n is a positive integer.
In combination with this aspect, in one possible implementation manner, the inputting the calculated frame difference map into a deep neural network to perform noise prediction includes:
shrinking the frame difference map according to a preset proportion, wherein the number of pixels on the first side of the shrunk frame difference map is within a first range of values, and the number of pixels on the second side of the shrunk frame difference map is within a second range of values;
and inputting the shrunk frame difference map into the deep neural network for noise prediction.
In combination with this aspect, in one possible implementation manner, the number of pixels on the first side is 400 and the number of pixels on the second side is 280.
In combination with this aspect, in one possible implementation manner, the training using the frame difference map of the video as an input of the training neural network includes:
Sampling each video in the video training set, sampling a plurality of groups of continuous video frames of the video, and respectively calculating a frame difference map corresponding to each group of continuous video frames;
inputting the calculated frame difference image and the artificial label corresponding to the video into a pre-training neural network for training, wherein the artificial label is used for indicating the noise severity of the video;
and determining the neural network of the trained prediction model through the video verification set.
In combination with this aspect, in one possible implementation manner, after the determining, by the video verification set, the neural network of the trained prediction model, the method further includes:
And performing noise prediction on the video test set through the determined neural network of the prediction model, and generating a confusion matrix of video classification according to a prediction result and the artificial label corresponding to the video test set.
In combination with this aspect, in one possible implementation manner, the deep neural network includes a deep neural network constructed by depthwise separable convolution, and the output of the deep neural network is 3 video classification results.
Another aspect of the embodiment of the invention discloses a video noise point identification device, which comprises:
The sampling calculation unit is used for sampling a plurality of groups of continuous video frames from the video to be identified and respectively calculating a frame difference map corresponding to each group of continuous video frames;
The prediction unit is used for inputting the calculated frame difference maps into a deep neural network for noise prediction and outputting a video classification result, wherein the deep neural network is a neural network trained by taking frame difference maps of videos as the input of the neural network to be trained;
And the identification result output unit is used for outputting a noise identification result according to the video classification result.
The embodiment of the invention also discloses video noise identification equipment, which comprises a processor and a memory, wherein the processor and the memory are connected with each other, the memory is used for storing data processing codes, and the processor is configured to call the program codes to execute the video noise identification method.
Another aspect of the embodiments of the present invention discloses a computer-readable storage medium storing program instructions that, when executed by a processor, cause the processor to perform the video noise identification method described above.
According to the embodiment of the invention, a plurality of groups of continuous frames are sampled from the video to be identified, a frame difference map corresponding to each group of continuous frames is calculated respectively, the calculated frame difference maps are input into a deep neural network for noise prediction, and a video classification result is output, the deep neural network being a neural network trained by taking frame difference maps of videos as its input. In this way the temporal information of different video frames is exploited: because the noise in a noisy video floats from frame to frame, the residual map of the video frame difference is taken as the input of the deep neural network, which adaptively learns the characteristics that distinguish noise from other texture-dense areas. The displacement information carried by the residual pixels of the frame difference makes it possible to better distinguish whether those pixels are noise or dense texture, so that interference from texture-dense areas is eliminated and the accuracy of classifying noisy videos by severity is greatly improved. In addition, the video frame difference method greatly reduces the learning difficulty of the neural network, so that training converges more easily.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is an overall architecture diagram of a video noise identification method according to an embodiment of the present invention;
Fig. 2 is a flow chart of a method for identifying video noise according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a deep neural network training process according to an embodiment of the present invention;
Fig. 4 is a basic structural diagram of a lightweight neural network according to an embodiment of the present invention;
FIG. 5 is a diagram of the improved network architecture provided by the present invention;
fig. 6 is an application scenario schematic diagram of a video noise point identification method provided by an embodiment of the present invention;
Fig. 7 is a schematic view of an application scenario of a video noise identifying method according to another embodiment of the present invention;
fig. 8 is a schematic structural diagram of a video noise identifying apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of another embodiment of a video noise recognition device according to the present invention;
Fig. 10 is a schematic structural diagram of a video noise identifying apparatus according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order to better understand the video noise identification method, device and equipment provided by the embodiments of the present invention, the overall architecture of the method is described first. As shown in fig. 1, a pre-built deep neural network is trained on training videos from a video library: frame differences are extracted from the training videos, and the frame differences together with the artificial labels serve as the input for learning and training, yielding a trained deep neural network (the deep neural network used for noise prediction). Noise prediction is then performed on the video to be predicted (the video to be identified) through the trained deep neural network: frame differences are likewise extracted from the video to be predicted (in the same manner as during training) and input into the trained deep neural network, which finally outputs a video classification result. The noise severity of the video to be identified can be obtained from the video classification result, and the noise identification result is output accordingly.
The device or equipment for executing the video noise identification method in the embodiment of the invention can include, but is not limited to, network equipment such as a server, and terminal equipment such as a desktop computer, a laptop computer, a tablet computer, an intelligent terminal and the like. The server may be an independent server or a cluster server. The embodiments of the present invention are not limited.
Fig. 2 is a schematic flow chart of a video noise identification method provided by an embodiment of the present invention. It specifically illustrates how noise prediction is performed on the video to be predicted (the video to be identified) through the trained deep neural network, and may include the following steps:
Step S200: sampling a plurality of groups of continuous video frames from the video to be identified, and respectively calculating a frame difference map corresponding to each group of continuous video frames;
In particular, the groups of consecutive video frames may be randomly sampled from the video to be identified, or sampled according to a preset rule (e.g., distributed uniformly over the total number of video frames). In one embodiment, in view of the application environment of short video, where the background changes little, extensive data experiments show that sampling 10 groups of continuous video frames and calculating a frame difference map for each of the 10 groups is sufficient.
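By way of illustration only, the following is a minimal Python sketch of this sampling step, assuming OpenCV is available; the function name and the random-sampling policy are illustrative, not part of the original disclosure.

```python
import random
import cv2

def sample_frame_groups(video_path, num_groups=10, n=3):
    """Sample `num_groups` groups of `n` consecutive frames from a video."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    max_start = max(total - n, 1)
    starts = sorted(random.sample(range(max_start), min(num_groups, max_start)))
    groups = []
    for start in starts:
        cap.set(cv2.CAP_PROP_POS_FRAMES, start)  # jump to the group's first frame
        frames = []
        for _ in range(n):
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(frame)
        if len(frames) == n:
            groups.append(frames)
    cap.release()
    return groups
```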
In the process of respectively calculating the frame difference maps corresponding to each group of continuous video frames, the method includes, for a group of continuous video frames: determining an intermediate video frame of the group of continuous video frames, calculating an average frame of the group of continuous video frames, and performing a difference operation on the intermediate video frame and the average video frame to obtain the frame difference map corresponding to the group of continuous video frames.
Specifically, the average video frame may be subtracted from the intermediate video frame followed by truncation, with the truncated frame difference pixel values shifted to a target pixel interval, for example the (0, 255) pixel interval or the (0, 101) pixel interval, to obtain the frame difference map corresponding to a group of continuous video frames. Alternatively, the average video frame may be subtracted from the intermediate video frame and the absolute value taken, without truncation, to obtain the frame difference map corresponding to a group of continuous video frames.
The case of subtracting the average video frame from the intermediate video frame, truncating, and shifting the truncated frame difference pixel values to the target pixel interval to obtain the frame difference map corresponding to a group of continuous video frames is explained first by way of example:
In one embodiment, the performing a difference operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the set of continuous video frames may include:
by formula 1:

ξ_i = min(max(f_(i,0) - μ_i, -κ), κ) + κ    (formula 1)

a frame difference map corresponding to the group of continuous video frames is calculated, where f_(i,0) is the intermediate video frame, f_(i,-1), f_(i,1), etc. denote the video frames before and after the intermediate video frame, μ_i = (1/n)·Σ_j f_(i,j) is the average video frame, κ is the truncation threshold, 1 ≤ i ≤ n, and n is a positive integer.
For example, if n is 3, then μ_i = (f_(i,-1) + f_(i,0) + f_(i,1)) / 3, where f_(i,-1) is the first of the 3 frames, f_(i,0) the second, and f_(i,1) the third.
As another example, if n is 4, then μ_i may be (f_(i,-1) + f_(i,0) + f_(i,1) + f_(i,2)) / 4, where f_(i,-1) is the first of the 4 frames, f_(i,0) the second, f_(i,1) the third, and f_(i,2) the fourth. Alternatively, μ_i may be (f_(i,-2) + f_(i,-1) + f_(i,0) + f_(i,1)) / 4, where f_(i,-2) is the first of the 4 frames, f_(i,-1) the second, f_(i,0) the third, and f_(i,1) the fourth.
As can be seen from formula 1, the embodiment of the present invention truncates the frame difference map. Truncation means that frame difference pixel values greater than the truncation threshold are discarded and replaced by the threshold. The difference between a noise pixel and the surrounding undamaged pixels is generally not large and falls within the set threshold range, whereas larger difference values in the frame difference map are usually caused by inter-frame content failing to overlap due to positional change. The embodiment of the present invention therefore truncates the frame difference map, which improves the efficiency of frame difference extraction and of noise detection.
Formula 1 also shows that adding the truncation threshold κ to the frame difference map shifts its pixel values into a non-negative interval: difference values originally lying in (-κ, κ) are recorded, without any scaling, in the (0, 2κ) pixel value space; with κ = 127 this fills the standard (0, 255) range.
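A minimal NumPy sketch of formula 1 follows, for illustration only: the average frame is subtracted from the intermediate frame, the difference is truncated at ±κ, and the result is shifted by κ into a non-negative range. The function name is an assumption, not part of the disclosure.

```python
import numpy as np

def frame_diff_truncated(frames, kappa=50):
    """Formula 1: clip(f_(i,0) - mu_i, -kappa, kappa) + kappa."""
    stack = np.stack([f.astype(np.float32) for f in frames])
    mu = stack.mean(axis=0)                 # average video frame mu_i
    f0 = stack[len(frames) // 2]            # intermediate (middle) video frame f_(i,0)
    diff = np.clip(f0 - mu, -kappa, kappa)  # truncation at threshold kappa
    return diff + kappa                     # shift into the (0, 2*kappa) interval
```

With κ = 50, the output lies in the (0, 101) pixel interval mentioned above.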
The embodiment of the present invention is not limited to calculating the frame difference map corresponding to a group of continuous video frames through formula 1 above. In another implementation manner, the difference between the intermediate video frame and the average video frame may be taken directly and its absolute value used, as shown in formula 2:

ξ_i = |f_(i,0) - μ_i|    (formula 2)

by which a frame difference map corresponding to the group of continuous video frames is calculated, where f_(i,0) is the intermediate video frame, f_(i,-1), f_(i,1), etc. denote the video frames before and after the intermediate video frame, μ_i = (1/n)·Σ_j f_(i,j) is the average video frame, 1 ≤ i ≤ n, and n is a positive integer.
In one embodiment, n may take values of 3, 4, 5, etc., and the truncation threshold κ may be 50, 86, 127, etc. Comparison tests conducted for the embodiment of the invention on the effect of different frame difference extraction schemes in training show that averaging over three frames, i.e., n = 3, gives the best results, and that a truncation threshold κ of 50 gives the best shift effect.
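The absolute-value variant of formula 2 can be sketched in the same illustrative style; no truncation or shift is needed because the absolute difference is already non-negative. Called with a group of n = 3 frames (and, for formula 1, κ = 50), this reproduces the experimentally preferred configuration described above.

```python
import numpy as np

def frame_diff_abs(frames):
    """Formula 2: |f_(i,0) - mu_i|."""
    stack = np.stack([f.astype(np.float32) for f in frames])
    mu = stack.mean(axis=0)       # average video frame mu_i
    f0 = stack[len(frames) // 2]  # intermediate video frame f_(i,0)
    return np.abs(f0 - mu)
```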
Step S202: inputting the calculated frame difference maps into a deep neural network for noise prediction, and outputting a video classification result;
The deep neural network in the embodiment of the invention may be a neural network trained by taking frame difference maps of videos as the input of the neural network to be trained; the different classification results output by its noise prediction correspond to different degrees of video noise severity.
Step S204: outputting a noise identification result according to the video classification result.
According to the embodiment of the invention, a plurality of groups of continuous frames are sampled from the video to be identified, a frame difference map corresponding to each group of continuous frames is calculated respectively, the calculated frame difference maps are input into a deep neural network for noise prediction, and a video classification result is output, the deep neural network being a neural network trained by taking frame difference maps of videos as its input. In this way the temporal information of different video frames is exploited: because the noise in a noisy video floats from frame to frame, the residual map of the video frame difference is taken as the input of the deep neural network, which adaptively learns the characteristics that distinguish noise from other texture-dense areas. The displacement information carried by the residual pixels of the frame difference makes it possible to better distinguish whether those pixels are noise or dense texture, so that interference from texture-dense areas is eliminated and the accuracy of classifying noisy videos by severity is greatly improved. In addition, the video frame difference method greatly reduces the learning difficulty of the neural network, so that training converges more easily.
How the deep neural network of the embodiment of the present invention is trained is described in detail below with reference to fig. 3 to 5. The flow chart of deep neural network training shown in fig. 3 may include the following steps:
Step S300: sampling each video in the video training set, taking a plurality of groups of continuous video frames of the video, and respectively calculating a frame difference map corresponding to each group of continuous video frames;
Specifically, the video database may include a video training set, a video verification set and a video test set. For example, all videos in the video database are divided according to preset proportions: a first threshold proportion of all videos forms the video training set, a second threshold proportion forms the video verification set, and a third threshold proportion forms the video test set, the sum of the first, second and third thresholds being 100%, for example 80%, 10% and 10%.
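An illustrative split along these lines, assuming one artificial label per video; the proportions are the example values above and the helper name is hypothetical.

```python
import random

def split_videos(video_ids, train=0.8, val=0.1, seed=0):
    """Split video ids into training / verification / test sets."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * val)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]
```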
The video training set in the video database can be used for training the pre-training neural network, specifically, each video in the video training set is sampled first, a plurality of groups of continuous video frames of the video are sampled, and then the frame difference map corresponding to each group of continuous video frames is calculated respectively. The implementation process of extracting the frame difference in this step S300 may refer to the implementation process of step S200 in the embodiment of fig. 2.
Step S302, inputting the calculated frame difference image and the artificial label corresponding to the video into a pre-training neural network for training;
Specifically, the artificial label in the embodiment of the invention is used for indicating the noise severity of the video: each video in the video database may be manually marked with a label (i.e., an artificial label) indicating which of the three video classifications (severe noise, slight noise, or clear) it belongs to. In the embodiment of the invention, the frame difference maps obtained in step S300 and the artificial labels corresponding to their videos are input together into the pre-training neural network for training.
In one embodiment, in order to reduce the computational cost of the neural network, the deep neural network (including the pre-training neural network) in the embodiment of the present invention includes a deep neural network constructed by depthwise separable convolution. For example, a lightweight neural network (MobileNet) may be selected as the network structure, with the classification output changed to three classes (i.e., 3 video classification results), i.e., training is performed against the artificial labels (severe noise, slight noise, clear).
MobileNet is an efficient model proposed for mobile and embedded devices; it uses depthwise separable convolutions to construct a lightweight deep neural network. The basic structure on which it is based is shown in fig. 4. The first step is the depthwise convolution (depth-wise), which has only M 3x3 convolution kernels; the M kernels are convolved one by one with the M input maps to obtain M output maps, performing feature extraction. The second step is the point-wise convolution (point-wise), which is in fact a traditional convolution whose kernels are all 1x1, M x N kernels in total, performing feature fusion.
In order to improve accuracy and speed up the training process, embodiments of the present invention may fine-tune a network pre-trained on ImageNet. Considering that image quality belongs to low-level information and that high-level semantic information plays only an auxiliary role, in one implementation manner MobileNet can be simplified: the last two depthwise separable convolution layers of the traditional MobileNet network model are removed and the final classification output is changed to 3, corresponding to the three noise classifications (severe noise, slight noise, clear). The network structure is shown in fig. 5.
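A hedged PyTorch sketch of such a simplified network follows. torchvision does not ship the original MobileNet (v1) described here, so an ImageNet-pretrained MobileNet v2 is used as a stand-in: its last two stages are dropped and a 3-way classifier head is attached. Layer indices and the weights API depend on the torchvision version and are assumptions, not the patented network.

```python
import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained backbone (weights enum requires torchvision >= 0.13).
backbone = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
features = nn.Sequential(*list(backbone.features.children())[:-2])  # drop last two stages

# Infer the channel count of the truncated backbone with a dummy frame-difference map.
with torch.no_grad():
    channels = features(torch.zeros(1, 3, 280, 400)).shape[1]

head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(channels, 3),  # 3 classes: severe noise / slight noise / clear
)
model = nn.Sequential(features, head)
```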
The depthwise separable convolution consists of two layers: a depthwise convolution and a point-wise convolution. The depthwise convolution applies a single convolution kernel to each input channel, preserving the number of input channels; the point-wise convolution then linearly combines the outputs of the depthwise convolution by applying a simple 1x1 convolution. MobileNet uses a batch normalization layer (Batch Normalization or Batchnorm, BN) and a nonlinear activation unit (Rectified Linear Unit, ReLU) after each layer.
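For illustration, a minimal PyTorch version of this basic building block, matching the description above (per-channel 3x3 depthwise convolution for feature extraction, then 1x1 point-wise convolution for feature fusion, each followed by BN and ReLU); it is a sketch, not the patented structure.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise: one 3x3 kernel per input channel (groups=in_ch).
        self.depthwise = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                      groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
        )
        # Point-wise: in_ch x out_ch 1x1 kernels fuse the features.
        self.pointwise = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```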
The depthwise convolution uses one convolution kernel per channel, and its computational cost is D_K · D_K · M · D_F · D_F, where D_K is the side length of the convolution kernel, M is the number of input channels, and D_F is the side length of the feature map.
The depthwise convolution is very efficient relative to the standard convolution. However, it only filters the input channels and does not combine them to produce new features; the next layer therefore uses an additional 1x1 convolution to compute a linear combination of the depthwise outputs and produce new features.
This combination of a depthwise convolution followed by a 1x1 point-wise convolution is called the depthwise separable convolution, first set forth in "Rigid-motion scattering for image classification".
The computational cost of the depthwise separable convolution is D_K · D_K · M · D_F · D_F + M · N · D_F · D_F, i.e., the sum of the depthwise convolution and the 1x1 point-wise convolution, where N is the number of output channels.
Expressing convolution as the two steps of filtering and combining yields the following reduction in computation relative to a standard convolution:

(D_K · D_K · M · D_F · D_F + M · N · D_F · D_F) / (D_K · D_K · M · N · D_F · D_F) = 1/N + 1/D_K²

With 3x3 depthwise separable convolutions, MobileNet therefore uses 8 to 9 times less computation than standard convolutions.
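A quick numeric check of this ratio, with illustrative sizes (the specific values of D_K, M, N and D_F are assumptions, not taken from the disclosure):

```python
DK, M, N, DF = 3, 512, 512, 14  # kernel size, in/out channels, feature map side
standard = DK * DK * M * N * DF * DF
separable = DK * DK * M * DF * DF + M * N * DF * DF
print(separable / standard)  # ~0.113 = 1/N + 1/DK**2, i.e. roughly 8.9x fewer operations
```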
Step S304, determining the neural network of the trained prediction model through the video verification set.
Specifically, to prevent overfitting, the optimal model is selected from the training results by means of the video verification set and saved as the prediction model. That is, for the video verification set, each video is sampled, several groups of continuous video frames are taken, and the frame difference map corresponding to each group is calculated (the frame difference extraction process may refer to step S200 in the embodiment of fig. 2); the calculated frame difference maps and the artificial labels corresponding to the videos are then fed to each trained neural network so as to determine the model with the best effect.
In one embodiment, when the frame difference maps obtained from the video training set and the artificial labels are input together into the pre-training neural network for training, the optimization method may be set to stochastic gradient descent (Stochastic Gradient Descent, SGD), with a learning rate of 1e-3, a momentum of 0.9 and a batch size of 128. Validation through the video verification set may be performed once for every first quantity (for example, 1000) of processed video frames, the accuracy being calculated from the validation results and the artificial labels, and the model with the highest accuracy being retained.
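A hedged sketch of this training configuration; `model`, `train_loader`, `val_loader` and `evaluate` are assumed to exist and are not part of the disclosure.

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()
best_acc, seen = 0.0, 0

for diffs, labels in train_loader:  # frame-difference maps + artificial labels, batch size 128
    optimizer.zero_grad()
    loss = criterion(model(diffs), labels)
    loss.backward()
    optimizer.step()
    seen += diffs.size(0)
    if seen >= 1000:  # validate roughly once per 1000 processed samples
        seen = 0
        acc = evaluate(model, val_loader)  # accuracy against artificial labels
        if acc > best_acc:
            best_acc = acc
            torch.save(model.state_dict(), "best_model.pt")  # keep the best model
```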
In one embodiment, step S304 may be followed by step S306: performing noise prediction on the video test set through the determined neural network of the best model, and generating a confusion matrix of video classification according to the prediction results and the artificial labels corresponding to the video test set.
Specifically, after the neural network of the trained prediction model is obtained, prediction is performed on the video test set to obtain prediction results, and a confusion matrix of video classification is then generated by comparing the artificial labels corresponding to the video test set with the prediction results, so as to judge how well the deep neural network classifies videos.
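An illustrative 3x3 confusion matrix computed from test-set predictions and artificial labels; rows are true classes and columns are predicted classes. The class numbering (0 = severe noise, 1 = slight noise, 2 = clear) is an assumption.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=3):
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1  # row = true class, column = predicted class
    return cm

print(confusion_matrix([0, 1, 2, 2, 1], [0, 1, 2, 1, 1]))
```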
In various embodiments of the present invention, inputting the calculated frame difference map into the deep neural network may further include:
shrinking the frame difference map according to a preset proportion, wherein the number of pixels on the first side of the shrunk frame difference map is within a first range of values and the number of pixels on the second side is within a second range of values; and inputting the shrunk frame difference map into the deep neural network for noise prediction.
Specifically, in order to obtain an input of fixed size and further reduce the amount of computation, the frame difference map may be shrunk according to the aspect ratio of the video itself; for example, it may be shrunk so that the first side (the long side of the frame difference map) is 400 pixels and the second side (the short side) is 280 pixels.
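A minimal sketch of this shrink step, assuming OpenCV; the orientation handling is an assumption to keep the long side at 400 pixels and the short side at 280.

```python
import cv2

def shrink_diff(diff, long_side=400, short_side=280):
    h, w = diff.shape[:2]
    # cv2.resize expects (width, height); keep the long side at `long_side`.
    size = (long_side, short_side) if w >= h else (short_side, long_side)
    return cv2.resize(diff, size, interpolation=cv2.INTER_AREA)
```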
In one implementation manner, the video noise identification method of the embodiment of the invention may also use an attention mechanism combined with a reinforcement learning reward (whether the final overall classification is correct), so that the deep neural network autonomously learns, from a large amount of data, the noisy regions in a video frame and classifies the severity according to information such as the size of the noisy region, the size of the noise grain and its position.
The video noise identification method provided by the embodiment of the invention can be applied to various technical scenes:
For example, a detector of a short video platform may check the quality of videos uploaded by users. As shown in the application scenario diagram of fig. 6, the video noise identification device side can extract frame difference maps from a video library in the frame difference extraction manner of the present invention and send them to the pre-training neural network for training, obtaining the trained neural network of the prediction model. Frame difference maps are then extracted from the video to be predicted, prediction is performed with the prediction model, and a video classification result is output. When the video classification result is confirmed not to meet the playing requirement, the video to be identified does not pass verification; when it is confirmed to meet the playing requirement, the video passes. That is, the video noise identification method can objectively judge the severity of video noise, can well replace manual noise verification once deployed online, can automatically and rapidly determine whether the video to be identified passes verification, and can display the verification result to the user.
For another example, a short video platform may recommend videos to users according to video quality. As shown in fig. 7, an application scenario of the video noise identification method according to another embodiment of the present invention, the video noise identification device side may extract frame difference maps from a video library in the frame difference extraction manner of the present invention and send them to the pre-training neural network for training, obtaining the trained neural network of the prediction model. Frame difference maps are then extracted from the video to be predicted, prediction is performed with the prediction model, and a video classification result is output. The videos are ranked by quality according to the video classification results and recommended to users in order from good to poor.
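By way of illustration, the two scenarios map onto the classification result roughly as follows; the class ids and the pass/fail rule are assumptions for the sketch.

```python
def review_video(class_id):
    """Scenario 1 (fig. 6): verification. 0 = severe noise, 1 = slight noise, 2 = clear."""
    return "rejected" if class_id == 0 else "approved"

def rank_videos(videos_with_classes):
    """Scenario 2 (fig. 7): recommend from good to poor quality (higher class id = clearer)."""
    return sorted(videos_with_classes, key=lambda vc: vc[1], reverse=True)
```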
According to the embodiment of the invention, a plurality of groups of continuous frames are sampled from the video to be identified, a frame difference map corresponding to each group of continuous frames is calculated respectively, the calculated frame difference maps are input into a deep neural network for noise prediction, and a video classification result is output, the deep neural network being a neural network trained by taking frame difference maps of videos as its input. In this way the temporal information of different video frames is exploited: because the noise in a noisy video floats from frame to frame, the residual map of the video frame difference is taken as the input of the deep neural network, which adaptively learns the characteristics that distinguish noise from other texture-dense areas. The displacement information carried by the residual pixels of the frame difference makes it possible to better distinguish whether those pixels are noise or dense texture, so that interference from texture-dense areas is eliminated and the accuracy of classifying noisy videos by severity is greatly improved. In addition, the video frame difference method greatly reduces the learning difficulty of the neural network, so that training converges more easily.
In order to facilitate better implementation of the foregoing solution of the embodiment of the present invention, the present invention correspondingly provides a video noise identification device. As shown in fig. 8, the video noise identification device 80 includes a sampling calculation unit 800, a prediction unit 802 and a recognition result output unit 804, wherein:
The sampling calculation unit 800 is configured to sample a plurality of groups of continuous video frames from the video to be identified, and calculate a frame difference map corresponding to each group of continuous video frames respectively;
The prediction unit 802 is configured to input the calculated frame difference maps into a deep neural network for noise prediction and to output a video classification result, the deep neural network being a neural network trained by taking frame difference maps of videos as its input;
and when it is confirmed, according to the output video classification result, that the video to be identified does not meet the playing requirement, the video to be identified does not pass verification.
The sampling calculation unit 800 may include a determination calculation unit and a difference calculation unit, wherein,
A determining and calculating unit for determining an intermediate video frame in the group of continuous video frames and calculating an average frame of the group of continuous video frames;
And the difference value operation unit is used for carrying out difference value operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the group of continuous video frames.
In one embodiment, the set of continuous video frames includes n frames, and the difference operation unit may specifically be configured to:
calculate, by the formula ξ_i = min(max(f_(i,0) - μ_i, -κ), κ) + κ, a frame difference map corresponding to the group of continuous video frames;
where f_(i,0) is the intermediate video frame, μ_i = (1/n)·Σ_j f_(i,j) is the average video frame, κ is the truncation threshold, 1 ≤ i ≤ n, and n is a positive integer.
In one embodiment, the prediction unit 802 may be configured to shrink the frame difference map according to a predetermined ratio, where the number of first edge pixels of the frame difference map after shrinkage is in a first range value, the number of second edge pixels of the frame difference map after shrinkage is in a second range value, and input the frame difference map after shrinkage into the deep neural network to perform noise prediction.
In one embodiment, the number of the first side length pixels is 400, and the number of the second side length pixels is 280.
And the recognition result output unit 804 is configured to output a noise recognition result according to the video classification result.
As shown in fig. 9, in addition to the sampling calculation unit 800, the prediction unit 802, and the recognition result output unit 804, the video noise recognition device 80 may further include a training unit 806, configured to sample each video in the video training set, sample multiple groups of continuous video frames of the video, and calculate a frame difference map corresponding to each group of continuous video frames respectively;
inputting the calculated frame difference image and the artificial label corresponding to the video into a pre-training neural network for training, wherein the artificial label is used for indicating the noise severity of the video;
and determining the neural network of the trained prediction model through the video verification set.
After determining the neural network of the trained prediction model through the video verification set, the training unit 806 may be further configured to perform noise prediction on the video test set through the determined neural network of the prediction model, and generate a confusion matrix of video classification according to the prediction result and the artificial tag corresponding to the video test set.
In one implementation, the deep neural network comprises a deep neural network constructed by depth separable convolution, and the output of the deep neural network is 3 video classification results.
The units of the video noise identifying apparatus 80 in the embodiment of the present invention are used for correspondingly executing the steps of the video noise identifying method in the embodiments of fig. 1 to 5 in the above method embodiments, and are not described herein again.
In order to facilitate better implementation of the foregoing solutions of the embodiments of the present invention, the present invention further correspondingly provides a video noise identifying device, and the following details are described with reference to the accompanying drawings:
As shown in fig. 10, a schematic structural diagram of the video noise identification device according to an embodiment of the present invention, the video noise identification device 100 may include a processor 101, a display screen 102, a memory 104 and a communication module 105, which may be connected to each other through a bus 106. The memory 104 may be a high-speed random access memory (Random Access Memory, RAM) or a nonvolatile memory (non-volatile memory), for example at least one disk memory; in an embodiment of the present invention, the memory 104 includes flash memory. The memory 104 may optionally also be at least one storage device located remotely from the processor 101. The memory 104 is used for storing application program code, which may include an operating system, a network communication module, a user interface module and a video noise identification program; the communication module 105 is used for exchanging information and data with external devices; and the processor 101 is configured to call the program code to execute the following steps:
sampling a plurality of groups of continuous video frames from the video to be identified, and respectively calculating a frame difference map corresponding to each group of continuous video frames;
inputting the calculated frame difference maps into a deep neural network for noise prediction and outputting a video classification result, wherein the deep neural network is a neural network trained by taking frame difference maps of videos as the input of the neural network to be trained, and different classification results output by the deep neural network correspond to different degrees of video noise severity;
And outputting a noise identification result according to the video classification result.
In the process of calculating the frame difference map corresponding to each group of continuous video frames by the processor 101, for a group of continuous video frames, the method may include:
determining an intermediate video frame of the set of consecutive video frames and calculating an average frame of the set of consecutive video frames;
And carrying out difference operation on the intermediate video frames and the average video frames to obtain a frame difference map corresponding to the group of continuous video frames.
The processor 101 performs a difference operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the group of continuous video frames, which may include:
Subtracting the average video frame from the intermediate video frame, and then performing truncation processing;
and translating the truncated frame difference pixel value to a target pixel interval to obtain a frame difference map corresponding to the group of continuous video frames.
The processor 101 performs a difference operation on the intermediate video frame and the average video frame to obtain a frame difference map corresponding to the group of continuous video frames, which may include:
and subtracting the average video frame from the intermediate video frame, and taking an absolute value to obtain a frame difference map corresponding to the group of continuous video frames.
The method for obtaining the frame difference map corresponding to the group of continuous video frames includes that the processor 101 subtracts the average video frame from the intermediate video frame, then performs truncation processing, and translates the truncated frame difference pixel value to a target pixel interval to obtain the frame difference map corresponding to the group of continuous video frames, which may include:
calculating, by the formula ξ_i = min(max(f_(i,0) - μ_i, -κ), κ) + κ, a frame difference map corresponding to the group of continuous video frames;
where f_(i,0) is the intermediate video frame, μ_i = (1/n)·Σ_j f_(i,j) is the average video frame, κ is the truncation threshold, 1 ≤ i ≤ n, and n is a positive integer.
The processor 101 inputs the calculated frame difference map into a deep neural network to perform noise prediction, which may include:
shrinking the frame difference map according to a preset proportion, wherein the number of pixels on the first side of the shrunk frame difference map is within a first range of values, and the number of pixels on the second side is within a second range of values;
and inputting the shrunk frame difference map into the deep neural network for noise prediction.
The number of pixels on the first side is 400, and the number of pixels on the second side is 280.
Wherein the processor 101 trains using the frame difference map of the video as an input to train the neural network, may include:
Sampling each video in the video training set, sampling a plurality of groups of continuous video frames of the video, and respectively calculating a frame difference map corresponding to each group of continuous video frames;
inputting the calculated frame difference image and the artificial label corresponding to the video into a pre-training neural network for training, wherein the artificial label is used for indicating the noise severity of the video;
and determining the neural network of the trained prediction model through the video verification set.
Wherein after the processor 101 determines the neural network of the trained predictive model through the video validation set, it is further operable to:
And performing noise prediction on the video test set through the determined neural network of the prediction model, and generating a confusion matrix of video classification according to a prediction result and the artificial label corresponding to the video test set.
The deep neural network includes a deep neural network constructed by depthwise separable convolution, and the output of the deep neural network is 3 video classification results.
It should be noted that, in the embodiment of the present invention, the execution steps of the processor 101 in the video noise identification device may refer to specific implementation manners of the video noise identification method in the embodiment of fig. 1 to 5 in the above method embodiments, and are not repeated here.
According to the embodiment of the invention, a plurality of groups of continuous frames are sampled from the video to be identified, a frame difference map corresponding to each group of continuous frames is calculated respectively, the calculated frame difference maps are input into a deep neural network for noise prediction, and a video classification result is output, the deep neural network being a neural network trained by taking frame difference maps of videos as its input. In this way the temporal information of different video frames is exploited: because the noise in a noisy video floats from frame to frame, the residual map of the video frame difference is taken as the input of the deep neural network, which adaptively learns the characteristics that distinguish noise from other texture-dense areas. The displacement information carried by the residual pixels of the frame difference makes it possible to better distinguish whether those pixels are noise or dense texture, so that interference from texture-dense areas is eliminated and the accuracy of classifying noisy videos by severity is greatly improved. In addition, the video frame difference method greatly reduces the learning difficulty of the neural network, so that training converges more easily.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random-access Memory (Random Access Memory, RAM), or the like.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.
Claims (9)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910048947.2A CN110163837B (en) | 2019-01-18 | 2019-01-18 | Video noise identification method, device, equipment and computer readable storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910048947.2A CN110163837B (en) | 2019-01-18 | 2019-01-18 | Video noise identification method, device, equipment and computer readable storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN110163837A CN110163837A (en) | 2019-08-23 |
| CN110163837B true CN110163837B (en) | 2024-12-17 |
Family
ID=67644806
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910048947.2A Active CN110163837B (en) | 2019-01-18 | 2019-01-18 | Video noise identification method, device, equipment and computer readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN110163837B (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110703899B (en) * | 2019-09-09 | 2020-09-25 | 创新奇智(南京)科技有限公司 | Data center energy efficiency optimization method based on transfer learning |
| CN110910356A (en) * | 2019-11-08 | 2020-03-24 | 北京华宇信息技术有限公司 | Method for generating image noise detection model, image noise detection method and device |
| CN111882584A (en) * | 2020-07-29 | 2020-11-03 | 广东智媒云图科技股份有限公司 | A method and device for judging the amount of oil fume through grayscale images |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103971342A (en) * | 2014-05-21 | 2014-08-06 | 厦门美图之家科技有限公司 | Image noisy point detection method based on convolution neural network |
| CN106254864A (en) * | 2016-09-30 | 2016-12-21 | 杭州电子科技大学 | Snowflake in monitor video and noise noise detecting method |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170178309A1 (en) * | 2014-05-15 | 2017-06-22 | Wrnch Inc. | Methods and systems for the estimation of different types of noise in image and video signals |
- 2019-01-18: Application CN201910048947.2A filed; granted as patent CN110163837B (status: Active)
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103971342A (en) * | 2014-05-21 | 2014-08-06 | 厦门美图之家科技有限公司 | Image noisy point detection method based on convolution neural network |
| CN106254864A (en) * | 2016-09-30 | 2016-12-21 | 杭州电子科技大学 | Snowflake in monitor video and noise noise detecting method |
Non-Patent Citations (1)
| Title |
|---|
| Huaixuan Zhang et al., "Residual Frame for Noisy Video Classification According to Perceptual Quality in Convolutional Neural Networks," 2019 IEEE (IEEE Xplore), 2019-08-05, pp. 242-247 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN110163837A (en) | 2019-08-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11176381B2 (en) | Video object segmentation by reference-guided mask propagation | |
| CN110807757B (en) | Image quality evaluation method and device based on artificial intelligence and computer equipment | |
| CN114549574B (en) | An interactive video cutout system based on mask propagation network | |
| US20200117906A1 (en) | Space-time memory network for locating target object in video content | |
| CN110751649B (en) | Video quality evaluation method and device, electronic equipment and storage medium | |
| CN114612832B (en) | Real-time gesture detection method and device | |
| CN110210551A (en) | A kind of visual target tracking method based on adaptive main body sensitivity | |
| KR102042168B1 (en) | Methods and apparatuses for generating text to video based on time series adversarial neural network | |
| CN110163837B (en) | Video noise identification method, device, equipment and computer readable storage medium | |
| CN115457015B (en) | A method and device for image quality assessment without reference based on visual interactive perception dual-stream network | |
| CN111027347A (en) | Video identification method and device and computer equipment | |
| CN113111716A (en) | Remote sensing image semi-automatic labeling method and device based on deep learning | |
| CN114358204B (en) | No-reference image quality assessment method and system based on self-supervision | |
| WO2018005565A1 (en) | Automated selection of subjectively best images from burst captured image sequences | |
| CN111639230B (en) | Similar video screening method, device, equipment and storage medium | |
| CN113269722A (en) | Training method for generating countermeasure network and high-resolution image reconstruction method | |
| CN115661163B (en) | Small sample segmentation method based on generalization feature and region proposal loss function | |
| TWI803243B (en) | Method for expanding images, computer device and storage medium | |
| CN112802076A (en) | Reflection image generation model and training method of reflection removal model | |
| CN110659641A (en) | Character recognition method and device and electronic equipment | |
| CN114445755A (en) | A video quality evaluation method, device, equipment and storage medium | |
| CN119314020A (en) | Visual recognition method and device based on pulse neural network | |
| Ertan et al. | Enhancement of underwater images with artificial intelligence | |
| CN118433446B (en) | Video optimization processing method, system, device and storage medium | |
| CN113570509B (en) | Data processing method and computer device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | TG01 | Patent term adjustment | |