CN115035393B - Stroboscopic scene classification method, model training method, related device and electronic equipment - Google Patents

Publication number
CN115035393B
Authority
CN
China
Legal status
Active
Application number
CN202210760670.8A
Other languages
Chinese (zh)
Other versions
CN115035393A
Inventor
倪敏垚
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN202210760670.8A
Publication of CN115035393A
Application granted
Publication of CN115035393B

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/35 Categorising the entire scene, e.g. birthday party or wedding scene
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a stroboscopic scene classification method, a model training method, a related device and electronic equipment, and belongs to the technical field of artificial intelligence. The method includes: acquiring a first image pair in the same shooting scene, the first image pair including a first image and a second image; performing feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image; splicing the first image feature and the second image feature to obtain a first target image feature; and performing stroboscopic scene classification on the shooting scene based on the first target image feature to obtain a first classification result, which represents the stroboscopic intensity level of the first image pair.

Description

Stroboscopic scene classification method, model training method, related device and electronic equipment
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a stroboscopic scene classification method, a model training method, a related device and electronic equipment.
Background
When a camera shoots in a stroboscopic scene, the shutter speed usually has to be increased to improve image capture accuracy. However, shooting at a high shutter speed may produce obvious bright and dark stripes (called banding) in the captured image, which degrades the imaging experience.
At present, strobe is usually detected by computing the strobe intensity of each row of pixels in a single frame and using it to predict whether the image contains strobe. The accuracy of this kind of strobe detection is relatively low.
Disclosure of Invention
The embodiments of the present application aim to provide a stroboscopic scene classification method, a model training method, a related device and electronic equipment that can solve the problem of low stroboscopic detection accuracy.
In a first aspect, an embodiment of the present application provides a strobe scene classification method, including:
acquiring a first image pair in the same shooting scene, where the first image pair includes a first image and a second image;
performing feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image;
splicing the first image feature and the second image feature to obtain a first target image feature; and
performing stroboscopic scene classification on the shooting scene based on the first target image feature to obtain a first classification result, where the first classification result represents the stroboscopic intensity level of the first image pair.
In a second aspect, an embodiment of the present application provides a model training method, including:
acquiring training sample data, where the training sample data includes a fourth image pair in the same shooting scene and a stroboscopic scene classification label of the shooting scene, the fourth image pair including a fifth image and a sixth image;
performing a target operation on the fourth image pair to obtain a second classification result, where the target operation includes: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; splicing the sixth image feature and the seventh image feature to obtain a second target image feature; and classifying the shooting scene based on the second target image feature to obtain the second classification result, which represents the stroboscopic intensity level of the fourth image pair;
comparing the second classification result with the stroboscopic scene classification label to obtain a network loss value;
and updating network parameters in the target model based on the network loss value.
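The training steps above (obtain a classification result, compare it with the label to get a network loss value, update the parameters) can be sketched with a single linear softmax classifier. This is a minimal illustration under stated assumptions, not the patent's training code: cross-entropy is assumed as the network loss, plain SGD as the update rule, and a one-layer classifier on a precomputed feature vector stands in for the whole target model. `train_step` and its parameters are hypothetical names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(weights, bias, feature, label, lr=0.1):
    """One SGD update of a linear softmax classifier (hypothetical stand-in
    for the target model).

    weights: M x D nested list, bias: length-M list, feature: length-D list,
    label: index of the stroboscopic scene classification label.
    Returns (new_weights, new_bias, loss computed before the update).
    """
    logits = [sum(w * x for w, x in zip(row, feature)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    # Network loss value: cross-entropy between prediction and label (assumed).
    loss = -math.log(probs[label])
    # Gradient of cross-entropy w.r.t. each logit: p_k - 1[k == label].
    grads = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    # Update the parameters against the gradient.
    new_weights = [[w - lr * g * x for w, x in zip(row, feature)]
                   for row, g in zip(weights, grads)]
    new_bias = [b - lr * g for b, g in zip(bias, grads)]
    return new_weights, new_bias, loss
```

Starting from all-zero weights with three classes, the first loss is ln 3, and a second call on the same sample returns a smaller loss because the update raises the label's logit.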
In a third aspect, an embodiment of the present application provides a strobe scene classification apparatus, including:
a first acquisition module, configured to acquire a first image pair in the same shooting scene, where the first image pair includes a first image and a second image;
a feature processing module, configured to perform feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image;
a splicing module, configured to splice the first image feature and the second image feature to obtain a first target image feature; and
a classification module, configured to perform stroboscopic scene classification on the shooting scene based on the first target image feature to obtain a first classification result, where the first classification result represents the stroboscopic intensity level of the first image pair.
In a fourth aspect, an embodiment of the present application provides a model training apparatus, including:
a second acquisition module, configured to acquire training sample data, where the training sample data includes a fourth image pair in the same shooting scene and a stroboscopic scene classification label of the shooting scene, the fourth image pair including a fifth image and a sixth image;
a target operation module, configured to input the fourth image pair into a target model to perform a target operation and obtain a second classification result, where the target operation includes: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; splicing the sixth image feature and the seventh image feature to obtain a second target image feature; and classifying the shooting scene based on the second target image feature to obtain the second classification result, which represents the stroboscopic intensity level of the fourth image pair;
a comparison module, configured to compare the second classification result with the stroboscopic scene classification label to obtain a network loss value; and
an updating module, configured to update the network parameters in the target model based on the network loss value.
In a fifth aspect, an embodiment of the present application provides an electronic device including a processor, a memory, and a program or instructions stored in the memory and executable on the processor, where the program or instructions, when executed by the processor, implement the steps of the strobe scene classification method according to the first aspect or the steps of the model training method according to the second aspect.
In a sixth aspect, an embodiment of the present application provides a readable storage medium storing a program or instructions which, when executed by a processor, implement the steps of the strobe scene classification method according to the first aspect or the steps of the model training method according to the second aspect.
In a seventh aspect, an embodiment of the present application provides a chip including a processor and a communication interface coupled to the processor, where the processor is configured to run a program or instructions to implement the steps of the strobe scene classification method according to the first aspect or the steps of the model training method according to the second aspect.
In the embodiments of the application, a first image pair in the same shooting scene, comprising a first image and a second image, is acquired; feature processing is performed on the two images to obtain a first image feature of the first image and a second image feature of the second image; the two features are spliced to obtain a first target image feature; and stroboscopic scene classification is performed on the shooting scene based on the first target image feature to obtain a first classification result, which represents the stroboscopic intensity level of the first image pair. In this way the shooting scene can be classified and the stroboscopic intensity level of the image pair evaluated as a whole, which improves the accuracy of strobe detection.
Drawings
FIG. 1 is a flowchart of a strobe scene classification method according to an embodiment of the present application;
FIG. 2 is a schematic diagram showing banding in a first image pair;
FIG. 3 is a schematic diagram of an image format of a first image;
FIG. 4 is a schematic diagram of an exemplary target model;
FIG. 5 is a schematic structural diagram of an exemplary downsampling convolution module;
FIG. 6 is a schematic structural diagram of an exemplary double convolution module;
FIG. 7 is a schematic structural diagram of an exemplary cross-attention block;
FIG. 8 is a schematic diagram of a process of converting a RAW domain image into an RGGB image;
FIG. 9 is a flowchart of a model training method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a process of converting a three-channel image into a two-dimensional matrix;
FIG. 11 is a schematic diagram of a demosaicing process;
FIG. 12 is a block diagram of a strobe scene classification apparatus according to an embodiment of the present application;
FIG. 13 is a block diagram of a model training apparatus according to an embodiment of the present application;
FIG. 14 is a block diagram of an electronic device according to an embodiment of the present application;
FIG. 15 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application fall within the scope of protection of the present application.
The terms "first", "second" and the like in the description and claims are used to distinguish between similar objects, not to describe a particular sequence or chronological order. It should be understood that the terms so used are interchangeable where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described herein. Objects distinguished by "first", "second" and so on are usually of one type, and their number is not limited; for example, there may be one or more first objects. In addition, "and/or" in the description and claims means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The strobe scene classification method provided by the embodiments of the present application is described in detail below through specific embodiments and application scenarios with reference to the accompanying drawings.
FIG. 1 is a flowchart of a strobe scene classification method according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step 101, a first image pair in the same shooting scene is acquired, wherein the first image pair comprises a first image and a second image.
In this step, the first image pair includes two images, a first image and a second image, both captured in the same shooting scene; that is, the first image and the second image show the same content in front of the lens. For example, two scenery images shot in the same shooting scene contain the same scenery.
The first image pair in the shooting scene may or may not carry bright and dark stripes (referred to as banding); this is not specifically limited here. Banding refers to bright and dark stripes in an image captured in a stroboscopic scene. The strobe may be caused by a light source flickering at a fixed frequency, flicker being an unstable visual phenomenon caused by a light stimulus whose brightness or spectral distribution fluctuates over time.
The aim of this embodiment is to perform feature processing on the first image pair and classify the strobe scene of the shooting scene based on the resulting image features, obtaining a classification result that characterizes the strobe intensity level of the first image pair. That is, this embodiment can determine whether the shooting scene is a strobe scene (no strobe versus strobe) and, if it is, the strobe intensity level of the first image pair, such as mild strobe or severe strobe.
When the first image pair contains banding, the image content in front of the lens can be regarded as the background of the banding. Because the first image and the second image are captured in the same shooting scene, the strobe intensity of the banding is the same in both images, but the banding positions differ. As shown in FIG. 2, the banding region 201 in the first image (left of FIG. 2) and the banding region 202 in the second image (right of FIG. 2) are staggered relative to each other.
The first image and the second image have the same image format, which may be an RGB format or a preset image format such as the Bayer format of the RAW domain. The Bayer format contains pixels of four color types: red pixels (R), blue pixels (B), green pixels adjacent to red (GR) and green pixels adjacent to blue (GB). The Bayer format may be an RGGB format, a GRBG format, or the like.
In an optional implementation, a first image pair in the RAW domain is used for the strobe scene classification of this embodiment. Pixel values in a RAW domain image are linear in the ambient brightness, whereas an RGB image is obtained by processing the RAW image signal and its pixel values are no longer linear in the ambient brightness. Because banding is directly related to changes in ambient brightness, classifying the strobe scene with RAW domain images better matches the strobe detection scenario and improves the accuracy of strobe detection.
In an optional embodiment, a target model classifies the strobe scene of the shooting scene based on the first image pair. To suit the structure of the target model, each image of the first image pair may be a four-channel image in which every pixel carries four channel values, each corresponding to one of the four color types. The first image pair may be obtained by format conversion of RAW domain images.
FIG. 3 shows one image format of the first image or the second image. Taking the first image as an example, it may include four channels, corresponding to the R, GR, GB and B pixels respectively.
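One common way to obtain such a four-channel image from a Bayer RAW frame is to "pack" each 2x2 Bayer cell into one pixel with four values. This packing approach is an assumption for illustration (the conversion of FIG. 8 is not detailed in this section), and under it each channel has half the height and width of the RAW frame. `pack_bayer` and `BAYER_LAYOUTS` are hypothetical names; only the RGGB and GRBG layouts mentioned in the text are covered.

```python
# Offset (row, col) of each colour inside a 2x2 Bayer cell.
BAYER_LAYOUTS = {
    "RGGB": {"R": (0, 0), "GR": (0, 1), "GB": (1, 0), "B": (1, 1)},
    "GRBG": {"GR": (0, 0), "R": (0, 1), "B": (1, 0), "GB": (1, 1)},
}


def pack_bayer(raw, pattern="RGGB"):
    """Split an H x W Bayer mosaic (nested list, H and W even) into four
    half-resolution planes in the channel order R, GR, GB, B."""
    if len(raw) % 2 or len(raw[0]) % 2:
        raise ValueError("height and width must be even")
    if pattern not in BAYER_LAYOUTS:
        raise ValueError(f"unsupported Bayer pattern: {pattern}")
    offsets = BAYER_LAYOUTS[pattern]
    planes = []
    for name in ("R", "GR", "GB", "B"):
        dy, dx = offsets[name]
        # Every second row starting at dy, every second column starting at dx.
        planes.append([row[dx::2] for row in raw[dy::2]])
    return planes  # scale (4, H/2, W/2)
```

Fixing the output order to R, GR, GB, B regardless of the sensor's pattern keeps the channel meaning the same for both inputs of the image pair.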
The first image pair may be obtained in several ways. For example, two adjacent frames of RAW domain images in a real-time shooting scene may be taken as the first image pair; two adjacent video frames, or two frames separated by a preset frame interval, may be taken as the first image pair; or two acquired frames may be format-converted into two four-channel images that form the first image pair.
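Pairing frames from a sequence, either adjacent or a preset number of frames apart, can be sketched as below. The `gap` parameter is a hypothetical stand-in for the preset interval; the text does not give its value.

```python
def make_pairs(frames, gap=1):
    """Form (earlier, later) image pairs from a frame sequence.

    gap=1 pairs adjacent frames; a larger gap pairs frames that are `gap`
    positions apart (hypothetical preset interval).
    """
    if gap < 1:
        raise ValueError("gap must be at least 1")
    return [(frames[i], frames[i + gap]) for i in range(len(frames) - gap)]
```

For instance, `make_pairs(["f0", "f1", "f2"])` yields the two adjacent pairs `("f0", "f1")` and `("f1", "f2")`.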
Step 102, performing feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image.
In this embodiment, the target model may be used for strobe scene classification. Specifically, the first image pair may be input into the target model to perform a target operation, which includes performing feature processing on the input images to obtain their image features.
The target model may include a first branch network and a second branch network, which may have the same structure. The two branch networks can process their images independently: the first branch network performs feature processing on the first image to obtain the first image feature, and the second branch network performs feature processing on the second image to obtain the second image feature.
Because the banding regions of two images in the same shooting scene are at different positions, the first and second branch networks can also process image features under mutual constraint based on an attention mechanism: the image features extracted by the second branch network are adjusted, through the attention mechanism, based on those extracted by the first branch network, and the image features extracted by the first branch network are likewise adjusted based on those extracted by the second branch network. This increases the network's attention to the banding regions.
The first and second branch networks may each use a downsampling convolution module (Down Conv Module) to perform feature processing on their input image, obtaining the first image feature of the first image and the second image feature of the second image.
The first image feature may include color features, texture features, shape features, spatial relationship features and the like of the first image. When the first image carries banding, the color features may include the brightness features of the banding in the first image, and the spatial relationship features may include features characterizing the location of the banding regions in the first image. Likewise, the second image feature may include color, texture, shape and spatial relationship features of the second image; when the second image carries banding, the color features may include the brightness features of the banding in the second image, and the spatial relationship features may include features characterizing the location of the banding regions in the second image.
Because the banding regions of the first image and the second image are at different positions, the target model can perform strobe scene classification using the difference in banding region positions between the first and second image features together with the banding brightness features, and determine the strobe intensity level, such as no strobe, mild strobe or severe strobe, from these position differences and brightness features.
Step 103, stitching the first image feature and the second image feature to obtain a first target image feature.
In this step, the first image feature and the second image feature may be stitched by a stitching module, such as a concat module, to obtain the first target image feature.
The concat module stitches the first image feature and the second image feature along the channel dimension. For example, if the first and second image features are each feature maps of scale (512,14,14), concatenation yields a combined feature map of scale (1024,14,14).
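The channel-wise concatenation can be checked with plain Python nested lists standing in for tensors. This is a minimal sketch; the helper names are hypothetical, and a deep-learning framework would use its own concatenation call along the channel axis instead.

```python
def feature_map(c, h, w, value=0.0):
    """Build a (C, H, W) nested-list feature map filled with `value`."""
    return [[[value] * w for _ in range(h)] for _ in range(c)]


def shape(feature):
    """Return the (C, H, W) scale of a nested-list feature map."""
    return (len(feature), len(feature[0]), len(feature[0][0]))


def concat_channels(a, b):
    """Concatenate two (C, H, W) feature maps along the channel axis."""
    if shape(a)[1:] != shape(b)[1:]:
        raise ValueError("spatial sizes must match")
    return a + b  # new outer list: channels of a followed by channels of b
```

Concatenating two (512,14,14) maps gives scale (1024,14,14): channels 0 to 511 come from the first image feature and channels 512 to 1023 from the second, while H and W are unchanged.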
Step 104, based on the first target image feature, performing strobe scene classification on the shooting scene to obtain a first classification result, where the first classification result is used to characterize the strobe intensity level of the first image pair.
In this step, after the target model performs a series of feature extraction operations on the first target image feature, the input feature of a fully connected layer is obtained; the fully connected layer then performs strobe scene classification on the shooting scene based on this input feature to obtain the first classification result.
If the first target image feature is a feature map of scale (1024,14,14), the input feature $F_6$ of the fully connected layer, of scale (512,1,1), is obtained after a series of feature extraction operations, as shown in formula (1).
$$F_6 = G\bigl(G(\mathrm{concat}(F_{14}, F_{24}),\, 2),\, 7\bigr) \tag{1}$$
Here $\mathrm{concat}(F_{14}, F_{24})$ is the first target image feature obtained by feature stitching, $F_{14}$ and $F_{24}$ are the first image feature and the second image feature respectively, each of scale (512,14,14), and $G$ is the feature extraction operation, which can be implemented by a downsampling convolution module.
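Formula (2), given later, defines $G(F, S)$ as pooling with step $S$ followed by a convolution block, so each application of $G$ divides the height and width by $S$ while the convolution sets the channel count. Under that reading, the scales in formula (1) work out as follows; the channel count $c'$ after the first $G$ is not stated in the text and is left symbolic.

$$
\underbrace{(1024, 14, 14)}_{\mathrm{concat}(F_{14},\,F_{24})}
\xrightarrow{\;G(\cdot,\,2)\;} (c', 7, 7)
\xrightarrow{\;G(\cdot,\,7)\;} (512, 1, 1) = \text{scale of } F_6
$$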
The target model can then use the fully connected layer to classify the strobe scene of the shooting scene based on $F_6$, producing a network output of scale (M, 1), where M is the number of strobe scene classes (for example, two, three or four) and each output value represents the probability of the corresponding strobe intensity level.
In an optional implementation, the fully connected layer performs three-class classification, corresponding to scenes whose strobe intensity level is no strobe, mild strobe or severe strobe. The network output then has scale (3, 1), with the probabilities of no strobe, mild strobe and severe strobe arranged in that order, and the strobe intensity level with the largest probability is taken as the strobe scene classification result of the shooting scene. For example, if the network output is (0.2, 0.7, 0.1), the scene is classified as mild strobe.
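The three-class decision rule can be sketched as follows: flatten $F_6$, apply a fully connected layer, and pick the level with the largest score. The helper names and the toy weights are hypothetical. Because softmax is monotonic, taking the maximum over raw logits selects the same level as taking it over the probabilities, so this sketch skips the normalization step.

```python
LEVELS = ("no strobe", "mild strobe", "severe strobe")  # output order in the text


def flatten(feature):
    """Flatten a (C, H, W) nested list, e.g. F6 of scale (512, 1, 1), to length C*H*W."""
    return [v for channel in feature for row in channel for v in row]


def fc_head(vector, weights, bias):
    """Fully connected layer: one output per class (weights: M x D)."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def strobe_level(scores):
    """Pick the level whose score (probability or logit) is largest."""
    if len(scores) != len(LEVELS):
        raise ValueError("expected one score per strobe intensity level")
    best = max(range(len(scores)), key=lambda k: scores[k])
    return LEVELS[best]
```

With the example output from the text, `strobe_level([0.2, 0.7, 0.1])` returns `"mild strobe"`.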
In this embodiment, the first image pair acquired in the same shooting scene undergoes feature processing, the resulting first and second image features are spliced into the first target image feature, and strobe scene classification based on this feature yields the first classification result, which represents the strobe intensity level of the first image pair. The shooting scene can thus be classified and the strobe intensity level of the image pair evaluated as a whole, improving the accuracy of strobe detection.
In addition, because this embodiment performs strobe scene classification, it can distinguish scenes whose strobe intensity level is no strobe, mild strobe or severe strobe. In practice, for a camera without a de-banding algorithm, the result can help the camera adjust its shutter to avoid producing images with banding: for example, if a shooting scene at a certain shutter speed is detected as a mild strobe scene, the shutter speed can be reduced appropriately to avoid banding. For a camera with a de-banding algorithm, the result can help determine whether the algorithm can handle the strobe scene: for example, when the shooting scene is a mild strobe scene, the de-banding algorithm can be applied to remove the banding from the image. The de-banding algorithm is thus used in a targeted way, which improves its feasibility.
Optionally, step 102 specifically includes:
extracting features from the first image to obtain a third image feature;
extracting, based on an attention mechanism, first weight information of a fourth image feature in the spatial dimension, where the fourth image feature is obtained by feature extraction on the second image and the first weight information indicates the degree of attention paid to each pixel region of the fourth image feature; when the second image carries stripes, a first weight value in the first weight information is greater than a second weight value, the first weight value representing the attention paid to the stripe region in the fourth image feature and the second weight value representing the attention paid to regions other than the stripe region;
multiplying the first weight information by the third image feature to obtain a fifth image feature;
and carrying out feature processing on the fifth image feature to obtain the first image feature.
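The weighting steps above can be sketched with nested lists. The section does not specify how the spatial weights are computed, so this sketch assumes a very simple spatial attention: average the fourth image feature over its channels and pass the result through a sigmoid, giving one weight in (0, 1) per position. A trained attention block would learn to give stripe regions the larger weights; here strongly activated positions simply receive larger weights. All function names are hypothetical.

```python
import math


def spatial_weights(feature):
    """Hypothetical spatial attention: average a (C, H, W) feature over its
    channels and squash with a sigmoid, giving an (H, W) map in (0, 1)."""
    c = len(feature)
    h, w = len(feature[0]), len(feature[0][0])
    return [[1.0 / (1.0 + math.exp(-sum(feature[k][i][j] for k in range(c)) / c))
             for j in range(w)]
            for i in range(h)]


def apply_weights(feature, weights):
    """Multiply every channel of a (C, H, W) feature by an (H, W) weight map."""
    return [[[v * wt for v, wt in zip(row, wrow)]
             for row, wrow in zip(channel, weights)]
            for channel in feature]


def fifth_feature(third, fourth):
    """Weights come from the fourth image feature (second image) and are
    multiplied with the third image feature (first image)."""
    return apply_weights(third, spatial_weights(fourth))
```

If the fourth image feature is strongly activated in its top row (a stand-in for a stripe region), the fifth image feature keeps that row of the third image feature almost intact and suppresses the other row.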
In this embodiment, the first branch network and the second branch network may process image features under mutual constraint based on the attention mechanism; that is, the target model may be a network in which the two input images serve as references for each other and constrain each other.
FIG. 4 shows an exemplary structure of the target model. Its inputs are the first image and the second image, and it includes two branch networks, a first branch network 41 and a second branch network 42, which have the same structure.
Taking the first branch network as an example, it may include a feature processing module 411, which may include a feature extraction module 4111 and a cross-attention block 4112. There may be one, two or more feature processing modules 411; FIG. 4 shows three, which helps ensure that the feature information characterizing the image is fully extracted.
Specifically, the feature extraction module 4111 may include a downsampling convolution module 4113 (Down Conv Module) and a double convolution module 4114 (Double Conv Module). The feature extraction module may use a pooling layer and a convolution block to extract features, as expressed by formula (2).
$$G(F, S) = \mathrm{conv}\bigl(\mathrm{maxpool}(F)\downarrow_{S}\bigr) \tag{2}$$
where $F$ is the input feature, $G(F, S)$ is the output feature, maxpool is the max pooling layer, $S$ is the step size (stride) of the pooling layer, and conv is a convolution combination block consisting of a convolution layer, a regularization layer and an activation layer.
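Formula (2) can be sketched as follows, assuming an S x S max-pooling window with stride S, and simplifying the convolution combination block to a 1x1 convolution followed by ReLU with the regularization step omitted. These choices are assumptions for illustration; the section does not give kernel sizes or the exact layers. The function is named `G` only to mirror the formula.

```python
def maxpool(feature, s):
    """Max pooling with an s x s window and stride s on a (C, H, W) nested list."""
    h, w = len(feature[0]) // s, len(feature[0][0]) // s
    return [[[max(ch[i * s + di][j * s + dj] for di in range(s) for dj in range(s))
              for j in range(w)]
             for i in range(h)]
            for ch in feature]


def conv1x1_relu(feature, weights):
    """1x1 convolution (weights: C_out x C_in) followed by ReLU; a simplified
    stand-in for the convolution + regularization + activation block."""
    h, w = len(feature[0]), len(feature[0][0])
    return [[[max(0.0, sum(wk[c] * feature[c][i][j] for c in range(len(feature))))
              for j in range(w)]
             for i in range(h)]
            for wk in weights]


def G(feature, s, weights):
    """Formula (2): pool with step s, then apply the convolution block."""
    return conv1x1_relu(maxpool(feature, s), weights)
```

Applying `G` with s=2 and then s=7 to a 14x14 map yields a 1x1 map, which matches the spatial sizes in formula (1).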
FIG. 5 is a schematic structural diagram of an exemplary downsampling convolution module. As shown in FIG. 5, the module may include a max pooling layer 51 and two convolution combination blocks 52, each consisting of a convolution layer, a regularization layer and an activation layer. For an input image feature of scale (c_in, h_in, w_in), the pooling layer outputs a feature of scale (c_in, h_in/2, w_in/2), and the two convolution combination blocks in series then output a feature of scale (c_out, h_in/2, w_in/2), where c_out is greater than c_in.
Fig. 6 is a schematic structural diagram of an exemplary double convolution module. As shown in fig. 6, the double convolution module may include two convolution combination blocks 61, each consisting of a convolution layer, a regularization layer and an activation layer. The input image feature has a scale of (c_in, h_in, w_in), and the output after the two convolution combination blocks has a scale of (c_out, h_in, w_in), where c_out is greater than c_in.
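To make the tensor shapes of formula (2) and of the modules in figs. 5 and 6 concrete, a minimal NumPy sketch follows. It is an illustration under stated assumptions, not the patented implementation: the convolutions are 1x1, the regularization step is modelled as per-channel standardization, the activation is ReLU, and the spatial halving uses the max pooling of formula (2). The helper names `maxpool2d`, `conv_block`, `down_conv` and `double_conv` are hypothetical.

```python
import numpy as np


def maxpool2d(f: np.ndarray, s: int) -> np.ndarray:
    """Max pooling with kernel size = stride = s on a (C, H, W) array."""
    c, h, w = f.shape
    f = f[:, : h - h % s, : w - w % s]  # crop so H and W divide evenly by s
    return f.reshape(c, h // s, s, w // s, s).max(axis=(2, 4))


def conv_block(x: np.ndarray, weight: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Hypothetical conv + regularization + activation block.

    A 1x1 convolution (weight shape (C_out, C_in)) mixes channels, each output
    channel is standardized over H and W (assumed), and ReLU is applied.
    """
    y = np.einsum("oc,chw->ohw", weight, x)
    mean = y.mean(axis=(1, 2), keepdims=True)
    var = y.var(axis=(1, 2), keepdims=True)
    return np.maximum((y - mean) / np.sqrt(var + eps), 0.0)


def down_conv(f: np.ndarray, w1: np.ndarray, w2: np.ndarray, s: int = 2) -> np.ndarray:
    """G(F, S) = conv(maxpool(F) with stride S): (c_in, h, w) -> (c_out, h/S, w/S)."""
    return conv_block(conv_block(maxpool2d(f, s), w1), w2)


def double_conv(f: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Two conv blocks in series: (c_in, h, w) -> (c_out, h, w)."""
    return conv_block(conv_block(f, w1), w2)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 224, 224))
y = down_conv(x, rng.standard_normal((64, 4)), rng.standard_normal((64, 64)))
print(y.shape)  # (64, 112, 112)
```

With a (4, 224, 224) input this reproduces the (64, 112, 112) scale mentioned for the third and fourth image features; a real network would use learned k x k convolutions.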
After the feature extraction module 4111 performs feature extraction on the first image, a third image feature, denoted F11, is obtained. Correspondingly, after the feature extraction module 4211 in the feature processing module 421 performs feature extraction on the second image, a fourth image feature, denoted F21, is obtained.
In an alternative embodiment, the input image may have a scale of (4, 224, 224), and after feature extraction by the feature extraction module the third image feature and the fourth image feature may each have a scale of (64, 112, 112).
The target model may include the cross-attention block 4112, which, once the third and fourth image features are obtained, may extract first weight information of the fourth image feature in the spatial dimension based on an attention mechanism. When the first image pair carries bright and dark banding stripes, the third and fourth image features may contain banding position features, and the cross-attention block 4112 can adjust these banding position features based on the attention mechanism so as to highlight the positions that need attention.
Fig. 7 is a schematic structural diagram of an exemplary cross-attention block. As shown in fig. 7, the cross-attention block may extract weight information of one image feature in the spatial dimension based on an attention mechanism (for example, a spatial attention mechanism), and multiply the other image feature by this weight information to obtain an output image feature.
Taking the adjustment of the third image feature F11 as an example, a convolution combination block (convolution layer + regularization + activation layer) is used to extract the first weight information of the fourth image feature F21 in the spatial dimension, denoted SA21, with a scale of (1, h_in, w_in). SA21 assigns a weight to each pixel of F21 in the spatial dimension and indicates how much attention each pixel region of the fourth image feature should receive, thereby highlighting the positions that need particular attention.
When the second image carries stripes, i.e. banding, a first weight value in the first weight information is greater than a second weight value in the first weight information, where the first weight value represents the degree of attention paid to the banding region in the fourth image feature and the second weight value represents the degree of attention paid to the regions other than the banding region. In other words, SA21 mainly indicates the banding region in the fourth image feature F21. The specific operation is shown in the following formula (3).
SA21=conv(F21) (3)
Because the banding regions of the two images in the first image pair are located at different positions, and may be noticeably misaligned, the difference in banding positions is key to strobe scene classification. Specifically, the first weight information SA21 is multiplied by the third image feature F11, so that the banding region information in the fourth image feature F21 influences the feature information in F11; this mainly serves to distinguish brightness differences at the same position in F11 and F21. This avoids interference from the background regions shared by the two images and progressively and accurately extracts the variation features of the banding, increasing the network's attention to banding regions.
Correspondingly, when the fourth image feature F21 is adjusted, it is adjusted using the spatial weights calculated from the third image feature F11.
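The two-way re-weighting described above can be sketched in NumPy as follows. This is a hedged illustration: the 1x1 convolution producing the (1, H, W) map and the sigmoid activation that keeps weights in (0, 1) are assumptions, and `spatial_weights`, `cross_attention` and the weight vectors are hypothetical names.

```python
import numpy as np


def spatial_weights(f: np.ndarray, w: np.ndarray) -> np.ndarray:
    """SA = conv(F): collapse a (C, H, W) feature to a (1, H, W) weight map.

    A 1x1 convolution with weight vector w (shape (C,)) is followed by a
    sigmoid, an assumed activation that keeps the weights in (0, 1).
    """
    logits = np.einsum("c,chw->hw", w, f)
    return (1.0 / (1.0 + np.exp(-logits)))[None, :, :]


def cross_attention(f11, f21, w_from_f21, w_from_f11):
    """Re-weight each branch's feature with the spatial weights of the other branch."""
    sa21 = spatial_weights(f21, w_from_f21)  # first weight information, from F21
    sa11 = spatial_weights(f11, w_from_f11)  # spatial weights computed from F11
    # The (1, H, W) maps broadcast over the channel axis of the (C, H, W) features.
    return sa21 * f11, sa11 * f21


# Toy example: F21 responds strongly at pixel (0, 0), standing in for a banding region.
f11 = np.ones((2, 3, 3))
f21 = np.zeros((2, 3, 3))
f21[:, 0, 0] = 10.0
out11, out21 = cross_attention(f11, f21, np.ones(2), np.ones(2))
print(out11[0])  # close to 1.0 at (0, 0), 0.5 elsewhere
```

In the toy example the position where F21 is strongly activated keeps almost all of F11, while other positions are damped, which mirrors the intent of emphasizing the banding region.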
Thereafter, the fifth image feature obtained by multiplying SA21 by F11 may be processed by another feature processing module connected in series with the feature processing module 411 in the first branch network, finally yielding the first image feature F14. This feature processing may include feature extraction and feature adjustment, similar to the operations of the feature processing module 411, and is not described again here. Correspondingly, the second branch network may process the second image in the same manner to obtain the second image feature F24.
In an alternative embodiment, the input image may have a scale of (4, 224, 224). The first feature processing module produces an image feature of scale (64, 112, 112), the second produces (128, 56, 56), and the third produces (256, 28, 28); a further feature extraction module then yields the first image feature F14 and the second image feature F24, each of scale (512, 14, 14).
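The scale progression above follows one rule: every stage halves the spatial size and sets a new channel count. The short pure-Python check below (the function name `stage_shapes` is hypothetical) reproduces the listed scales from the (4, 224, 224) input.

```python
def stage_shapes(input_shape=(4, 224, 224), channels=(64, 128, 256, 512)):
    """Return the (C, H, W) scale after each stage; every stage halves H and W."""
    _, h, w = input_shape
    shapes = []
    for c in channels:
        h, w = h // 2, w // 2
        shapes.append((c, h, w))
    return shapes


print(stage_shapes())
# [(64, 112, 112), (128, 56, 56), (256, 28, 28), (512, 14, 14)]
```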
As shown in fig. 4, the target model may further include a stitching module 43, which stitches the first image feature F14 and the second image feature F24 to obtain a first target image feature. After a series of feature extraction steps, the first target image feature yields the input feature F6 of the fully connected layer, with a scale of (512, 1, 1). The target model may further include a fully connected layer 44, which outputs the classification result based on F6.
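A hedged sketch of the stitching and classification head follows. The text only says that "a series of feature extraction" reduces the stitched feature to F6 of scale (512, 1, 1); the 1x1 projection plus global average pooling used here is an assumed stand-in, and `classify_pair` and its weight arguments are hypothetical. The class order follows the label encoding used later in this description (0 no strobe, 1 light strobe, 2 heavy strobe).

```python
import numpy as np


def classify_pair(f14, f24, w_proj, w_fc, b_fc):
    """Stitch two (512, 14, 14) features and map them to three class probabilities."""
    fused = np.concatenate([f14, f24], axis=0)         # first target image feature (1024, 14, 14)
    reduced = np.einsum("oc,chw->ohw", w_proj, fused)  # assumed 1x1 projection to 512 channels
    f6 = reduced.mean(axis=(1, 2), keepdims=True)      # assumed global average pool -> (512, 1, 1)
    logits = w_fc @ f6.reshape(-1) + b_fc              # fully connected layer -> (3,)
    exp = np.exp(logits - logits.max())                # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(0)
probs = classify_pair(
    rng.standard_normal((512, 14, 14)),
    rng.standard_normal((512, 14, 14)),
    rng.standard_normal((512, 1024)) * 0.01,
    rng.standard_normal((3, 512)) * 0.01,
    np.zeros(3),
)
print(probs.argmax())  # 0 = no strobe, 1 = light strobe, 2 = heavy strobe
```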
In this embodiment, the model exploits the fact that the banding regions of the two images in the first image pair, captured in the same shooting scene, lie at different positions. Through the cross-attention mechanism, the image feature of one branch network is adjusted based on the image feature of the other branch network, and the banding position features are adjusted accordingly, so that the banding region information in one branch's image feature influences the feature information in the other branch's image feature. As a result, brightness differences at the same position in the image features of the two images can be distinguished, interference from the background regions shared by the two images is avoided, and the variation features of the banding can be extracted progressively and accurately. This increases the network's attention to banding regions and further improves the accuracy of strobe detection.
Optionally, the step 101 specifically includes:
acquiring a second image pair in the shooting scene, where the second image pair includes two third images that are single-channel images in a preset image format, and each third image includes pixels corresponding to four color types;
performing first image preprocessing on the two third images to obtain the first image and the second image, where the first image and the second image are four-channel images, each pixel in a four-channel image includes pixel values of four channels, and each pixel value corresponds to one of the four color types.
In this embodiment, the first image pair may be obtained by format conversion of images in the RAW domain Bayer format.
Specifically, the second image pair may consist of RAW domain images in the preset image format, namely the Bayer format, which includes pixels of four color types: red pixels (denoted R), blue pixels (denoted B), green pixels adjacent to red (denoted GR), and green pixels adjacent to blue (denoted GB). The Bayer format may be the RGGB format, the GRBG format, or the like.
The second image pair may be obtained in several ways. For example, two adjacent (i.e. consecutive) frames of RAW domain images in a real-time shooting scene may be taken as the second image pair; alternatively, two adjacent frames of a video, or two frames separated by a preset threshold number of frames, may be obtained and format-converted to give the second image pair.
Then, first image preprocessing is performed on each of the two third images to obtain the first image and the second image. The first image preprocessing may include format conversion, which converts the Bayer format into a four-channel image format. Fig. 8 is a schematic diagram of converting a RAW domain image into an RGGB image. As shown in fig. 8, first image preprocessing is performed on a RAW domain image (whose format may be RGGB, GRBG, etc.) of scale (H, W), converting it into an RGGB image of size (H/2, W/2, 4), in which each pixel includes pixel values of four channels, each corresponding to one of the four color types.
The first image and the second image can be obtained from the two RGGB images produced by format conversion; for example, the RGGB images can be normalized and then resized to (224, 224, 4).
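The Bayer-to-four-channel conversion of fig. 8 and the subsequent resize can be sketched as below. The channel order (R, GR, GB, B) and the nearest-neighbour resize are assumptions made for illustration; `BAYER_OFFSETS`, `pack_bayer` and `resize_nearest` are hypothetical names, and only the RGGB and GRBG layouts named in the text are covered.

```python
import numpy as np

# Offsets (row, col) of each colour inside the 2x2 Bayer tile.
BAYER_OFFSETS = {
    "RGGB": {"R": (0, 0), "GR": (0, 1), "GB": (1, 0), "B": (1, 1)},
    "GRBG": {"GR": (0, 0), "R": (0, 1), "B": (1, 0), "GB": (1, 1)},
}


def pack_bayer(raw: np.ndarray, pattern: str = "RGGB") -> np.ndarray:
    """Convert a single-channel (H, W) Bayer image into an (H/2, W/2, 4) R/GR/GB/B image."""
    h, w = raw.shape
    if h % 2 or w % 2:
        raise ValueError("Bayer image must have even height and width")
    offs = BAYER_OFFSETS[pattern]
    planes = [raw[offs[k][0]::2, offs[k][1]::2] for k in ("R", "GR", "GB", "B")]
    return np.stack(planes, axis=-1)


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, ...) array (interpolation method assumed)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]


raw = np.arange(16).reshape(4, 4)
print(pack_bayer(raw)[0, 0])          # [0 1 4 5]  (R, GR, GB, B)
print(pack_bayer(raw, "GRBG")[0, 0])  # [1 0 5 4]
```

Packing the two GRBG green samples into the same GR and GB slots as RGGB keeps the channel semantics identical across sensor layouts, which is one reason to convert to a fixed four-channel order before feeding the network.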
In this embodiment, because banding is directly related to changes in ambient brightness, and pixel values in a RAW domain image are linear in ambient brightness, using RAW domain images makes strobe scene classification better match the conditions of strobe detection and can improve its accuracy.
Optionally, the performing first image preprocessing on the two third images to obtain the first image and the second image includes:
performing format conversion on the two third images to obtain a third image pair, wherein the third image pair comprises two fourth images, and the fourth images are four-channel images;
And normalizing the two fourth images based on the maximum pixel values and the preset black level of the two fourth images to obtain the first image and the second image.
In this embodiment, the first image preprocessing may include format conversion and normalization. The normalization normalizes each fourth image, i.e. each RGGB image, according to the maximum pixel value of that RGGB image and a preset black level (e.g. 64 or 1024), as shown in the following formula (4).
where bl is the black level, max is the maximum pixel value, I is a pixel value in the RGGB image, and In is the normalized pixel value.
In this embodiment, the normalization is performed according to the pixel maximum value and the preset black level of the RGGB image, so that the accuracy of image normalization can be improved.
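Formula (4) itself is not reproduced in this text. The sketch below assumes the common black-level normalization In = (I - bl) / (max - bl), clipped to [0, 1]; that form, the clipping, and the helper name `normalize_rggb` are assumptions rather than the patent's stated formula.

```python
import numpy as np


def normalize_rggb(img: np.ndarray, black_level: float, max_value=None) -> np.ndarray:
    """Assumed form of formula (4): In = (I - bl) / (max - bl), clipped to [0, 1].

    By default max is the maximum pixel value of this RGGB image, as described
    in the text; pass max_value to override it.
    """
    img = img.astype(np.float64)
    if max_value is None:
        max_value = float(img.max())
    if max_value <= black_level:
        raise ValueError("maximum pixel value must exceed the black level")
    return np.clip((img - black_level) / (max_value - black_level), 0.0, 1.0)


print(normalize_rggb(np.array([32, 64, 576, 1088]), black_level=64))
# -> 0.0, 0.0, 0.5, 1.0
```

Subtracting the black level first keeps the normalized values proportional to the light actually collected, which preserves the linear relationship with ambient brightness that the RAW domain approach relies on.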
It should be noted that the target model needs to be trained before use so that its network parameters are fixed. The model training method for the target model used in strobe scene classification provided by the embodiments of the present application is described in detail below through specific embodiments and application scenarios with reference to the accompanying drawings.
Fig. 9 is a flowchart of a model training method according to an embodiment of the present application, as shown in fig. 9, including the following steps:
Step 901: obtaining training sample data, where the training sample data includes a fourth image pair in the same shooting scene and a strobe scene classification label of the shooting scene, and the fourth image pair includes a fifth image and a sixth image;
Step 902: inputting the fourth image pair into a target model to perform a target operation and obtain a second classification result, where the target operation includes: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; stitching the sixth image feature and the seventh image feature to obtain a second target image feature; and performing strobe scene classification on the shooting scene based on the second target image feature to obtain the second classification result, which represents the strobe intensity level of the fourth image pair;
Step 903: comparing the second classification result with the strobe scene classification label to obtain a network loss value;
Step 904: updating network parameters in the target model based on the network loss value.
This embodiment describes a training process for the target model.
In step 901, the training sample data may include at least one shooting scene, a fourth image pair for each shooting scene, and a strobe scene classification label for each shooting scene.
The fourth image pair is acquired in a manner similar to the first image pair, which is not described again here.
For better training of the target model, the training sample data needs to include image pairs for each strobe scene, such as no-strobe, light-strobe and heavy-strobe image pairs. In an alternative embodiment, the fourth image pair may be obtained by fusing a background image without banding with banding mask images of various intensities, which simplifies the acquisition of training sample data.
The strobe scene classification label of the shooting scene may be obtained by manual annotation, or by pixel statistics on the mask image carrying banding; this is not specifically limited here.
In an alternative embodiment, the strobe scene classification label may be a one-dimensional vector of scale (3, 1): the label may be [1, 0, 0] when the strobe scene is no strobe, [0, 1, 0] when it is a light strobe, and [0, 0, 1] when it is a heavy strobe.
In step 902, the fourth image pair may be input into the target model to perform the target operation, obtaining a second classification result. This is similar to the way strobe scene classification is performed on the shooting scene of the first image pair based on the target model in the above embodiment, and is not described again here. Accordingly, the second classification result is conceptually similar to the first classification result: the second classification result characterizes the strobe intensity level of the fourth image pair, and the first classification result characterizes that of the first image pair.
In step 903, the second classification result may be compared with the strobe scene classification label to obtain a network loss value. In an alternative embodiment, the comparison computes a distance between the two vectors, which serves as the network loss value.
In step 904, the network parameters of the target model may be updated by gradient descent, iterating in a loop until the difference between the second classification result and the strobe scene classification label, i.e. the network loss value, is less than a certain threshold and convergence is reached, at which point training of the target model is complete.
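Steps 903 and 904 can be illustrated with a deliberately tiny stand-in: a linear model trained by gradient descent on the squared distance between its output vector and the one-hot label, stopping once the loss falls below a threshold. This is only a sketch of the loop structure under those assumptions; the real target model is the two-branch network, whose gradients would come from backpropagation, and `train_until_converged` is a hypothetical name.

```python
import numpy as np


def train_until_converged(x, y, lr=0.1, threshold=1e-3, max_iters=10_000, seed=0):
    """Fit a linear stand-in model by gradient descent on a squared-distance loss.

    x: (N, D) input features, y: (N, 3) one-hot strobe labels. Iteration stops
    once the loss drops below `threshold` (convergence) or after `max_iters`.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[1], y.shape[1])) * 0.01
    loss, step = np.inf, 0
    for step in range(max_iters):
        diff = x @ w - y                           # prediction minus label vector
        loss = float(np.mean(np.sum(diff ** 2, axis=1)))
        if loss < threshold:
            break
        w -= lr * (2.0 * x.T @ diff / x.shape[0])  # gradient of the mean squared distance
    return w, loss, step


x = np.eye(3)  # three toy samples: no strobe, light strobe, heavy strobe
y = np.eye(3)  # their one-hot labels, following the (3, 1) label encoding
w, loss, steps = train_until_converged(x, y)
print(loss < 1e-3, (x @ w).argmax(axis=1))  # True [0 1 2]
```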
In this embodiment, training sample data is obtained, including a fourth image pair (a fifth image and a sixth image) in the same shooting scene and the strobe scene classification label of that scene. The fourth image pair is input into the target model, which performs feature processing on the fifth and sixth images to obtain the sixth and seventh image features, stitches these into a second target image feature, and classifies the strobe scene of the shooting scene based on it, producing a second classification result that represents the strobe intensity level of the fourth image pair. The second classification result is compared with the strobe scene classification label to obtain a network loss value, and the network parameters of the target model are updated based on that value. In this way the target model can be trained to classify the strobe scenes of shooting scenes, improving the accuracy of strobe detection.
Optionally, the target model includes a first branch network and a second branch network, where the first branch network performs feature processing on the fifth image and the sixth image to obtain the sixth image feature of the fifth image, and the second branch network performs feature processing on the fifth image and the sixth image to obtain the seventh image feature of the sixth image.
In this embodiment, the target model may include two branch networks. The first and second branch networks may process images independently of each other. Alternatively, since the banding regions of two images in the same shooting scene lie at different positions, the two branch networks may process images in a mutually constrained manner based on an attention mechanism: the attention mechanism adjusts the image feature extracted by the second branch network based on the image feature extracted by the first branch network, and adjusts the image feature extracted by the first branch network based on the image feature extracted by the second branch network, thereby increasing the network's attention to banding regions.
In this embodiment, the target model accepts two image inputs through its two branch networks, so strobe scene classification can exploit the differences in banding region positions and banding brightness features between the two images of the same shooting scene.
Optionally, the step 902 specifically includes:
performing feature extraction on the fifth image to obtain an eighth image feature;
Extracting second weight information of a ninth image feature in a space dimension based on an attention mechanism, wherein the second weight information is used for indicating attention degree of each pixel region in the ninth image feature, the ninth image feature is obtained by carrying out feature extraction based on the sixth image, and when the sixth image carries stripes, a third weight value in the second weight information is larger than a fourth weight value, the third weight value is used for representing attention degree of the stripe regions in the ninth image feature, and the fourth weight value is used for representing attention degree of other regions except the stripe regions in the ninth image feature;
multiplying the second weight information with the eighth image feature to obtain a tenth image feature;
and carrying out feature processing on the tenth image feature to obtain the sixth image feature.
In this embodiment, the first branch network and the second branch network may perform feature processing of images with mutual constraint based on the attention mechanism, that is, the target model may be a network model in which two input images have a mutual reference function and are constrained with each other.
The manner of the feature processing of the fifth image by the first branch network is similar to the manner of the feature processing of the first image, and will not be described here.
In this embodiment, the model exploits the fact that the banding regions of the two images in the first image pair, captured in the same shooting scene, lie at different positions. Through the cross-attention mechanism, the image feature of one branch network is adjusted based on the image feature of the other branch network, and the banding position features are adjusted accordingly, so that the banding region information in one branch's image feature influences the feature information in the other branch's image feature. As a result, brightness differences at the same position in the image features of the two images can be distinguished, interference from the background regions shared by the two images is avoided, and the variation features of the banding can be extracted progressively and accurately. This increases the network's attention to banding regions and further improves the accuracy of strobe detection.
Optionally, the step 901 specifically includes:
acquiring a fifth image pair in the shooting scene, where the fifth image pair includes two seventh images that are single-channel images in a preset image format, each including pixels corresponding to four color types; and acquiring a sixth image pair, which includes two single-channel mask images carrying stripes, the two mask images having the same strobe intensity but different stripe positions;
performing second image preprocessing on the two seventh images in the fifth image pair to obtain a seventh image pair, wherein the seventh image pair comprises two images with four channels, pixel points in the images with four channels comprise pixel values of four channels, each pixel value corresponds to one color type in the four color types;
multiplying the seventh image pair by the eighth image pair to obtain the fourth image pair, where the eighth image pair is obtained by performing third image preprocessing on the sixth image pair; and counting the pixel values in the eighth image pair to obtain the strobe scene classification label.
In this embodiment, the fourth image pair may be obtained by fusing a background image without banding with banding mask images of various intensities, which simplifies the acquisition of training sample data.
The fifth image pair has an image format similar to that of the second image pair: both are RAW domain images in the Bayer format, and the fifth image pair can be acquired in a manner similar to the second image pair, which is not described again here.
The sixth image pair carries stripes, i.e. banding, and may include two single-channel mask images, which may be grayscale images, with the same strobe intensity but different stripe positions. The sixth image pair may be acquired in various ways: for example, a prestored sixth image pair may be acquired, or a sixth image pair sent by another electronic device may be received.
Then, second image preprocessing may be performed on each image in the fifth image pair, in a manner similar to the first image preprocessing performed on the images in the second image pair, which is not described again here.
Correspondingly, a third image preprocessing can be performed on the sixth image pair to obtain an eighth image pair, wherein the eighth image pair comprises two mask images with four channels.
The third image preprocessing may include format conversion, which converts each (single-channel) image in the sixth image pair into a four-channel image. Before the format conversion, to improve the generalization of the target model, the third image preprocessing may further include randomly transforming the two mask images to adjust the banding strength. After the format conversion, to further simplify the acquisition of training sample data so that a large number of mask images with different banding intensities can be obtained, the third image preprocessing may further include processing that changes the degree of banding in the mask images.
Specifically, to improve the generalization of the target model, the two mask images are randomly transformed to adjust the banding strength. Both mask images are gray images; they are normalized and copied into four-channel images to match the four-channel Bayer format, and the four-channel mask images are then transformed. First, a random power operation may be applied to the mask, as shown in the following formula (5), to change the overall severity of the banding.
$M_1 = M^{r_t},\quad r_t \in (0.5, 5) \qquad (5)$
where $M$ is the banding severity before the intensity adjustment, $M_1$ is the banding severity after the intensity adjustment, and $r_t$ is the random exponent.
Then, each of the four channels is multiplied by a different random enhancement coefficient to change the color appearance of the banding, where the two G channels share the same coefficient. The adjustment is shown in the following formula (6).
$M_{2,c} = 1 - e_c\,(1 - M_{1,c}),\quad c \in \{R, G, B\},\ e_c \in (0.5, 1) \qquad (6)$
where $M_{1,c}$ is the banding severity of channel $c$ before the color adjustment, and $M_{2,c}$ is the banding severity of channel $c$ after the color adjustment.
Both mask images use the same transformation parameters, giving an eighth image pair that consists of the two enhanced mask images, denoted K1 and K2.
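Formulas (5) and (6) can be sketched in NumPy as follows, applying one shared set of random parameters to both masks. The masks are assumed to be already normalized to [0, 1], the channel order R, GR, GB, B is an assumption, and `augment_mask_pair` is a hypothetical name.

```python
import numpy as np


def augment_mask_pair(m_a, m_b, rng):
    """Apply the same random transforms of formulas (5) and (6) to two masks.

    m_a, m_b: gray banding masks already normalized to [0, 1], shape (H, W).
    Returns two (H, W, 4) masks with channels ordered R, GR, GB, B (assumed).
    """
    r_t = rng.uniform(0.5, 5.0)              # random exponent of formula (5)
    e_r, e_g, e_b = rng.uniform(0.5, 1.0, size=3)
    e = np.array([e_r, e_g, e_g, e_b])       # both G channels share one coefficient
    out = []
    for m in (m_a, m_b):
        m4 = np.repeat(m[:, :, None], 4, axis=-1)  # copy the gray mask into 4 channels
        m1 = m4 ** r_t                             # formula (5): overall banding strength
        m2 = 1.0 - e * (1.0 - m1)                  # formula (6): per-channel colour form
        out.append(m2)
    return out[0], out[1]


rng = np.random.default_rng(0)
k1, k2 = augment_mask_pair(np.zeros((4, 4)), np.full((4, 4), 0.5), rng)
print(k1.shape, np.allclose(k1[..., 1], k1[..., 2]))  # (4, 4, 4) True
```

Because 1 - e_c(1 - M) only scales the darkening (1 - M), a mask value of 1 (no banding) stays at 1, so the transform changes how dark and how coloured the stripes are without introducing banding where there was none.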
Next, the banding intensity may be calculated from the eighth image pair: the pixel values of the mask images in the eighth image pair are counted to find the minimum pixel value P_min, and the banding intensity, denoted gt, is determined from P_min; y denotes the one-hot code of gt.
In an alternative embodiment, the relationship between P_min and gt may be expressed by the following formula (7), and the relationship between gt and y by the following formula (8).
Wherein 0 represents no strobe, i.e. no banding in the image, 1 represents a light strobe, i.e. light banding in the image, and 2 represents a heavy strobe, i.e. heavy banding in the image.
Accordingly, based on the minimum pixel value of the mask image in the eighth image pair, and in combination with the above formulas (7) and (8), a strobe scene classification tag can be determined.
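Formulas (7) and (8) are not reproduced in this text, so the sketch below is only a hypothetical stand-in: it assumes that a lower minimum mask value means darker stripes and therefore stronger banding, and that two user-chosen thresholds separate the three classes. The function name `strobe_label` and both thresholds are assumptions; only the class meanings (0, 1, 2) and the one-hot form come from the description.

```python
import numpy as np


def strobe_label(k1, k2, light_threshold, heavy_threshold):
    """Hypothetical stand-in for formulas (7) and (8).

    Assumes a lower minimum mask value means stronger banding; the thresholds
    (light_threshold > heavy_threshold) are not given in the text.
    """
    p_min = float(min(np.min(k1), np.min(k2)))
    if p_min >= light_threshold:
        gt = 0  # no strobe
    elif p_min >= heavy_threshold:
        gt = 1  # light strobe
    else:
        gt = 2  # heavy strobe
    y = np.zeros(3)
    y[gt] = 1.0  # one-hot code, e.g. [0, 1, 0] for a light strobe
    return gt, y


gt, y = strobe_label(np.full((2, 2), 0.7), np.full((2, 2), 0.8), 0.9, 0.6)
print(gt, y)  # 1 [0. 1. 0.]
```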
The two mask images K1 and K2 are resized to the same size as the two images R1 and R2 in the seventh image pair. Banding is then added to R1 and R2 by multiplying K1 by R1 and K2 by R2, and the products are resized to the input size of the target model, for example (4, 224, 224), giving the fourth image pair, i.e. the input images of the target model, denoted KR1 and KR2.
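The fusion step can be sketched as element-wise multiplication after matching sizes. Nearest-neighbour interpolation and the channels-first output layout are assumptions chosen for brevity; `resize_nearest` and `fuse_pair` are hypothetical names.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array (interpolation method assumed)."""
    h, w = img.shape[:2]
    return img[np.arange(out_h) * h // out_h][:, np.arange(out_w) * w // out_w]


def fuse_pair(k1, k2, r1, r2, size=(224, 224)):
    """Add banding to R1/R2 by multiplying with K1/K2, then resize to the model input.

    All inputs are (H, W, 4) arrays; returns KR1, KR2 as (4, size[0], size[1]).
    """
    fused = []
    for k, r in ((k1, r1), (k2, r2)):
        k = resize_nearest(k, r.shape[0], r.shape[1])  # match the background image size
        kr = resize_nearest(k * r, *size)               # element-wise fusion, then resize
        fused.append(np.transpose(kr, (2, 0, 1)))       # channels-first, e.g. (4, 224, 224)
    return fused[0], fused[1]


kr1, kr2 = fuse_pair(np.full((10, 10, 4), 0.5), np.ones((10, 10, 4)),
                     np.full((300, 400, 4), 0.8), np.full((300, 400, 4), 0.8))
print(kr1.shape)  # (4, 224, 224)
```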
In this embodiment, because banding is directly related to changes in ambient brightness, and pixel values in a RAW domain image are linear in ambient brightness, using RAW domain images makes strobe scene classification better match the conditions of strobe detection and can improve its accuracy. In addition, the input images of the target model are obtained by fusing background images without banding with banding mask images of various intensities, and the strobe scene classification labels used for model training are derived from the banding mask images, which simplifies the acquisition of training sample data.
Optionally, the acquiring a fifth image pair in the shooting scene includes:
Acquiring a ninth image pair in the shooting scene from a video, wherein the ninth image pair comprises two frames of images of which the frames are separated by a preset threshold value in the video;
And performing format conversion on the two frames of images in the ninth image pair to obtain the fifth image pair.
In this embodiment, the ninth image pair may consist of three-channel RGB images. The preset threshold can be set according to the actual situation, but generally should not be too large, so that the two acquired images still belong to the same shooting scene.
In an alternative embodiment, 8 consecutive frames may be taken from the video every 5 seconds (at 30 frames per second, i.e. every 150 frames), and two random frames among these 8 are taken as the ninth image pair, denoted I1 and I2.
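The sampling schedule above can be expressed as a small index generator. The sketch only returns frame indices (decoding the video is out of scope), uses Python's standard `random` module, and the function name `sample_frame_pairs` is hypothetical.

```python
import random


def sample_frame_pairs(num_frames, fps=30, interval_s=5, clip_len=8, seed=None):
    """Pick one random pair of frame indices from each short clip of consecutive frames.

    A clip of `clip_len` consecutive frames starts every fps * interval_s frames
    (150 frames for 30 fps and 5 s); two distinct frames are drawn from each clip.
    """
    rng = random.Random(seed)
    step = fps * interval_s
    pairs = []
    for start in range(0, num_frames - clip_len + 1, step):
        i1, i2 = sorted(rng.sample(range(start, start + clip_len), 2))
        pairs.append((i1, i2))
    return pairs


print(sample_frame_pairs(450, seed=0))  # three (I1, I2) index pairs
```

Drawing both frames from within an 8-frame window keeps them at most 7 frames apart, which is consistent with the requirement that the two images stay in the same shooting scene.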
The ninth image pair is then format-converted into the fifth image pair, converting I1 and I2 from RGB images into RAW domain images.
The specific steps of the format conversion are as follows:
Step 1: I1 and I2 are normalized.
Step 2: when a RAW domain image is converted into an RGB image, a gamma operation is applied to better match human vision and enhance contrast, with the formula $I_{RGB} = I_{RAW}^{g}$, where $g$ is usually 1/2.2. To reverse it, the gamma operation here is $I_{gj} = I_j^{2.2},\ j \in \{1, 2\}$, which yields the gamma-processed images $I_{g1}$ and $I_{g2}$.
Step 3: inverse color correction. Because a RAW domain image is converted into an RGB image by multiplying it by a conversion matrix, converting an RGB image back into a RAW domain image requires multiplying by the inverse of that conversion matrix. In an alternative embodiment, this inverse matrix is denoted ccm. To facilitate the computation, each RGB image arranged in three dimensions (h, w, c) is converted into a two-dimensional (h×w, c) matrix, as shown in fig. 10.
$I_{g1}$ and $I_{g2}$ are converted into $I_{s1}$ and $I_{s2}$ as shown in fig. 10, multiplied by ccm using matrix multiplication, and converted back to (h, w, c) to obtain $I_{c1}$ and $I_{c2}$.
Through steps 2 and 3, the nonlinear relationship between the RGB image and the ambient brightness is converted into an approximately linear one. A RAW domain image in the Bayer format is then obtained by mosaicing, i.e. the inverse of demosaicing: as shown in fig. 11, $I_{c1}$ and $I_{c2}$ are converted into the Bayer format to obtain the fifth image pair.
Finally, each Bayer-format RAW-domain image is split by color channel into four channels, giving the two images R1 and R2 of the seventh image pair.
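Packing the Bayer image into four channels, one per position of the 2×2 Bayer cell, might look like the sketch below. It again assumes an RGGB layout with channel order R, G, G, B and even image dimensions; the function name pack_bayer is hypothetical.

```python
import numpy as np


def pack_bayer(bayer):
    """Split an (h, w) RGGB Bayer image into an (h/2, w/2, 4) image."""
    h, w = bayer.shape
    if h % 2 or w % 2:
        raise ValueError("pack_bayer expects even height and width")
    return np.stack(
        [
            bayer[0::2, 0::2],  # R
            bayer[0::2, 1::2],  # G (red rows)
            bayer[1::2, 0::2],  # G (blue rows)
            bayer[1::2, 1::2],  # B
        ],
        axis=-1,
    )
```

Each output pixel then carries four pixel values, one per color type, matching the four-channel images described in the text.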
Because RAW-domain images containing strobe stripes are difficult to collect, and manual classification is subjective and therefore hard to make accurate, this embodiment synthesizes the training data instead: two consecutive images taken from any position in a video are converted into RAW-domain images, and stripe mask images with an objectively determined classification are applied to them to form the input images of the target model. This further simplifies the acquisition of training sample data.
It should be noted that the strobe scene classification method provided in the embodiments of the present application may be executed by a strobe scene classification apparatus, or by a control module in that apparatus for executing the method. In the embodiments of the present application, the strobe scene classification apparatus is described by taking, as an example, the apparatus executing the strobe scene classification method.
Referring to fig. 12, fig. 12 is a block diagram of a strobe scene classification apparatus according to an embodiment of the present application, and as shown in fig. 12, a strobe scene classification apparatus 1200 includes:
A first obtaining module 1201, configured to obtain a first image pair in the same shooting scene, where the first image pair includes a first image and a second image;
A feature processing module 1202, configured to perform feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image;
The stitching module 1203 is configured to stitch the first image feature and the second image feature to obtain a first target image feature;
The classification module 1204 is configured to perform a strobe scene classification on the captured scene based on the first target image feature, to obtain a first classification result, where the first classification result is used to characterize a strobe intensity level of the first image pair.
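To illustrate how modules 1203 and 1204 fit together, the sketch below concatenates two feature maps along the channel axis and scores strobe intensity levels with a linear head. Global average pooling, the linear layer, and the names classify_pair, weights and bias are assumptions for illustration; the text does not specify the classifier structure.

```python
import numpy as np


def classify_pair(feat1, feat2, weights, bias):
    """Stitch two (h, w, c) feature maps and predict a strobe intensity level.

    weights: (2c, num_levels) matrix; bias: (num_levels,) vector (hypothetical head).
    """
    fused = np.concatenate([feat1, feat2], axis=-1)  # stitching along channels
    pooled = fused.mean(axis=(0, 1))                 # global average pooling -> (2c,)
    logits = pooled @ weights + bias                 # one score per intensity level
    return int(np.argmax(logits)), logits
```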
Optionally, the feature processing module 1202 is specifically configured to:
Extracting features of the first image to obtain third image features;
Extracting first weight information of a fourth image feature in a space dimension based on an attention mechanism, wherein the first weight information is used for indicating attention degree of each pixel region in the fourth image feature, the fourth image feature is obtained by carrying out feature extraction based on the second image, and when the second image carries stripes, a first weight value in the first weight information is larger than a second weight value, the first weight value is used for representing attention degree of the stripe regions in the fourth image feature, and the second weight value is used for representing attention degree of other regions except the stripe regions in the fourth image feature;
multiplying the first weight information with the third image feature to obtain a fifth image feature;
and carrying out feature processing on the fifth image feature to obtain the first image feature.
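The spatial attention step can be sketched in the style of common spatial attention modules: channel-wise mean and max maps of the fourth image feature are combined and passed through a sigmoid to produce a per-pixel weight, which then scales the third image feature. The linear combination with parameters a, b and bias stands in for the learned layers of a real network and, like the function names, is an assumption.

```python
import numpy as np


def spatial_attention(feat, a=1.0, b=1.0, bias=0.0):
    """Return an (h, w, 1) weight map in (0, 1) from an (h, w, c) feature map."""
    avg = feat.mean(axis=-1, keepdims=True)  # channel-average response per pixel
    mx = feat.max(axis=-1, keepdims=True)    # channel-max response per pixel
    return 1.0 / (1.0 + np.exp(-(a * avg + b * mx + bias)))  # sigmoid


def attend(third_feat, fourth_feat):
    """Scale the third image feature by attention weights from the fourth."""
    weights = spatial_attention(fourth_feat)  # first weight information
    return third_feat * weights               # fifth image feature (broadcast over channels)
```

Regions with strong responses in the fourth image feature, such as stripe regions, receive larger weights, consistent with the requirement that the first weight value exceed the second.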
Optionally, the first obtaining module 1201 includes:
The first acquisition unit is configured to acquire a second image pair in the shooting scene, where the second image pair includes two single-channel third images in a preset image format, and each third image includes pixels corresponding to four color types;
The first image preprocessing unit is used for performing first image preprocessing on the two third images to obtain the first image and the second image, the first image and the second image are four-channel images, pixel points in the four-channel images comprise pixel values of four channels, and each pixel value corresponds to one color type of the four color types.
Optionally, the first image preprocessing unit is specifically configured to:
performing format conversion on the two third images to obtain a third image pair, wherein the third image pair comprises two fourth images, and the fourth images are four-channel images;
And normalizing the two fourth images based on the maximum pixel values and the preset black level of the two fourth images to obtain the first image and the second image.
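Normalization with a black level is commonly written as (x - black_level) / (max - black_level). The sketch below applies that form, assuming the maximum is shared across the two fourth images and that results are clipped to [0, 1]; the text does not give the exact formula, and the function name is hypothetical.

```python
import numpy as np


def normalize_pair(img1, img2, black_level):
    """Map two four-channel RAW images to [0, 1] using a shared scale."""
    max_val = max(img1.max(), img2.max())  # assumed: maximum over the pair
    scale = float(max_val - black_level)
    if scale <= 0:
        raise ValueError("maximum pixel value must exceed the black level")

    def norm(x):
        # Subtract the black level, rescale, and clip values below black to 0.
        return np.clip((x.astype(np.float64) - black_level) / scale, 0.0, 1.0)

    return norm(img1), norm(img2)
```

Using one scale for both images keeps their relative brightness intact, which matters when stripe differences between the two frames are the signal of interest.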
In this embodiment, the first obtaining module 1201 acquires a first image pair in the same shooting scene, the first image pair including a first image and a second image; the feature processing module 1202 performs feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image; the stitching module 1203 stitches the first image feature and the second image feature to obtain a first target image feature; and the classification module 1204 performs strobe scene classification on the shooting scene based on the first target image feature to obtain a first classification result representing the strobe intensity level of the first image pair. In this way, the shooting scene can be classified and the strobe intensity level of the image pair evaluated as a whole, which improves the accuracy of strobe detection.
The strobe scene classification device in the embodiment of the application can be a device, and can also be a component, an integrated circuit or a chip in electronic equipment. The device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), etc., and the non-mobile electronic device may be a server, a network attached storage (Network Attached Storage, NAS), a personal computer (personal computer, PC), a Television (TV), a teller machine, a self-service machine, etc., and the embodiments of the present application are not limited in particular.
The strobe scene classification device in the embodiment of the application can be a device with an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The strobe scene classification device provided by the embodiment of the application can realize each process realized by the method embodiment of fig. 1, and in order to avoid repetition, the description is omitted here.
Referring to fig. 13, fig. 13 is a block diagram of a model training apparatus according to an embodiment of the present application, and as shown in fig. 13, a model training apparatus 1300 includes:
A second obtaining module 1301, configured to obtain training sample data, where the training sample data includes a fourth image pair in the same shooting scene, and a strobe scene classification tag of the shooting scene, where the fourth image pair includes a fifth image and a sixth image;
The target operation module 1302 is configured to input the fourth image pair into a target model to perform a target operation and obtain a second classification result, where the target operation includes: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; stitching the sixth image feature and the seventh image feature to obtain a second target image feature; and classifying the shooting scene based on the second target image feature to obtain the second classification result, where the second classification result is used for representing the strobe intensity level of the fourth image pair;
The comparison module 1303 is configured to compare the second classification result with the strobe scene classification tag to obtain a network loss value;
An updating module 1304 configured to update network parameters in the target model based on the network loss value.
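The compare-and-update cycle of modules 1303 and 1304 can be illustrated on a linear classification head. Softmax cross-entropy as the network loss and plain gradient descent as the update rule are assumptions, since the text names neither; the real target model is a two-branch network rather than a single linear layer, and the function names are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def train_step(weights, bias, pooled_feat, label, lr=0.1):
    """One gradient-descent step; updates float arrays weights and bias in place."""
    probs = softmax(pooled_feat @ weights + bias)  # second classification result
    loss = -np.log(probs[label])                   # network loss against the tag
    grad_logits = probs.copy()
    grad_logits[label] -= 1.0                      # d(loss)/d(logits) for softmax + CE
    weights -= lr * np.outer(pooled_feat, grad_logits)  # update network parameters
    bias -= lr * grad_logits
    return float(loss)
```

With all-zero initial parameters and three intensity levels, the first loss equals ln 3, and repeated steps on the same sample drive it down.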
Optionally, the target model includes a first branch network and a second branch network, where the first branch network is used to perform feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image, and the second branch network is used to perform feature processing on the fifth image and the sixth image to obtain a seventh image feature of the sixth image.
Optionally, the target operation module 1302 is specifically configured to:
Extracting features of the fifth image to obtain eighth image features;
Extracting second weight information of a ninth image feature in a space dimension based on an attention mechanism, wherein the second weight information is used for indicating attention degree of each pixel region in the ninth image feature, the ninth image feature is obtained by carrying out feature extraction based on the sixth image, and when the sixth image carries stripes, a third weight value in the second weight information is larger than a fourth weight value, the third weight value is used for representing attention degree of the stripe regions in the ninth image feature, and the fourth weight value is used for representing attention degree of other regions except the stripe regions in the ninth image feature;
multiplying the second weight information with the eighth image feature to obtain a tenth image feature;
and carrying out feature processing on the tenth image feature to obtain the sixth image feature.
Optionally, the second obtaining module 1301 includes:
The second acquisition unit is configured to acquire a fifth image pair in the shooting scene, where the fifth image pair includes two single-channel seventh images in a preset image format, and each seventh image includes pixels corresponding to four color types; the second acquisition unit is further configured to acquire a sixth image pair, where the sixth image pair includes two single-channel mask images carrying stripes, and the two mask images have the same strobe intensity but different stripe positions;
A second image preprocessing unit, configured to perform second image preprocessing on two seventh images in the fifth image pair to obtain a seventh image pair, where the seventh image pair includes two images with four channels, and a pixel point in the four-channel image includes pixel values of four channels, where each pixel value corresponds to one color type of the four color types;
A third image preprocessing unit, configured to perform third image preprocessing on the two mask images, to obtain an eighth image pair, where the eighth image pair includes two mask images with four channels;
And the multiplication processing unit is used for carrying out multiplication processing on the seventh image pair and the eighth image pair to obtain the fourth image pair, and carrying out statistics on pixel values in the eighth image pair to obtain the stroboscopic scene classification tag.
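The synthesis performed by the multiplication processing unit might look like the sketch below. The text states only that pixel values in the eighth image pair are counted to obtain the tag; the mean-attenuation statistic, the thresholds, and the function name used here are hypothetical.

```python
import numpy as np


def synthesize_sample(raw_pair, mask_pair, thresholds=(0.05, 0.15, 0.30)):
    """Multiply clean RAW images by stripe masks and derive a strobe level.

    raw_pair, mask_pair: tuples of two (h, w, 4) arrays; mask values in [0, 1].
    thresholds: hypothetical cut points on mean stripe attenuation.
    """
    # Fourth image pair: each clean image darkened by its stripe mask.
    fourth_pair = tuple(r * m for r, m in zip(raw_pair, mask_pair))
    # Hypothetical statistic: average darkening caused by the stripes.
    attenuation = float(np.mean([1.0 - m.mean() for m in mask_pair]))
    # Level = number of thresholds at or below the attenuation.
    level = int(np.searchsorted(thresholds, attenuation, side="right"))
    return fourth_pair, level
```

Because both masks share the same strobe intensity, averaging their statistics yields one label for the pair while the differing stripe positions remain visible to the model.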
Optionally, the second acquisition unit is specifically configured to:
Acquiring a ninth image pair in the shooting scene from a video, wherein the ninth image pair comprises two frames of images of which the frames are separated by a preset threshold value in the video;
And performing format conversion on the two frames of images in the ninth image pair to obtain the fifth image pair.
In this embodiment, the second obtaining module 1301 acquires training sample data, which includes a fourth image pair in the same shooting scene and a strobe scene classification tag of the shooting scene, the fourth image pair including a fifth image and a sixth image. The target operation module 1302 inputs the fourth image pair into a target model to perform a target operation, which includes performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image, stitching the sixth image feature and the seventh image feature to obtain a second target image feature, and performing strobe scene classification on the shooting scene based on the second target image feature to obtain a second classification result representing the strobe intensity level of the fourth image pair. The comparison module 1303 compares the second classification result with the strobe scene classification tag to obtain a network loss value, and the updating module 1304 updates the network parameters in the target model based on the network loss value. In this way, the target model can be trained and then used to perform strobe scene classification on shooting scenes, which improves the accuracy of strobe detection.
The model training apparatus in an embodiment of the present application may be an apparatus, but may also be a component, integrated circuit, or chip in an electronic device. The device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), etc., and the non-mobile electronic device may be a server, a network attached storage (Network Attached Storage, NAS), a personal computer (personal computer, PC), a Television (TV), a teller machine, a self-service machine, etc., and the embodiments of the present application are not limited in particular.
The model training device in the embodiment of the application can be a device with an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The model training device provided by the embodiment of the present application can implement each process implemented by the method embodiment of fig. 9, and in order to avoid repetition, a description is omitted here.
Optionally, as shown in fig. 14, an electronic device 1400 is further provided in the embodiment of the present application, which includes a processor 1401, a memory 1402, and a program or an instruction stored in the memory 1402 and capable of being executed on the processor 1401, where the program or the instruction implements each process of the above embodiment of the strobe scene classification method or implements each process of the above embodiment of the model training method when executed by the processor 1401, and the same technical effects are achieved, and are not repeated herein.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device.
Fig. 15 is a schematic hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1500 includes, but is not limited to, a radio frequency unit 1501, a network module 1502, an audio output unit 1503, an input unit 1504, a sensor 1505, a display unit 1506, a user input unit 1507, an interface unit 1508, a memory 1509, and a processor 1510.
Those skilled in the art can understand that the electronic device 1500 may further include a power supply (such as a battery) that supplies power to the components. The power supply may be logically connected to the processor 1510 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system. The structure of the electronic device shown in fig. 15 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than those shown, combine some components, or use a different arrangement of components, which is not described in detail here.
The electronic device may be configured to implement a strobe scene classification method, wherein the processor 1510 is configured to:
acquiring a first image pair under the same shooting scene, wherein the first image pair comprises a first image and a second image;
Performing feature processing on the first image and the second image to obtain first image features of the first image and second image features of the second image;
splicing the first image features and the second image features to obtain first target image features;
and carrying out stroboscopic scene classification on the shooting scene based on the first target image characteristics to obtain a first classification result, wherein the first classification result is used for representing the stroboscopic intensity level of the first image pair.
In this embodiment, the processor 1510 acquires a first image pair in the same shooting scene, the first image pair including a first image and a second image; performs feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image; stitches the first image feature and the second image feature to obtain a first target image feature; and performs strobe scene classification on the shooting scene based on the first target image feature to obtain a first classification result representing the strobe intensity level of the first image pair. In this way, the shooting scene can be classified and the strobe intensity level of the image pair evaluated as a whole, which improves the accuracy of strobe detection.
Optionally, the processor 1510 is further configured to:
Extracting features of the first image to obtain third image features;
Extracting first weight information of a fourth image feature in a space dimension based on an attention mechanism, wherein the first weight information is used for indicating attention degree of each pixel region in the fourth image feature, the fourth image feature is obtained by carrying out feature extraction based on the second image, and when the second image carries stripes, a first weight value in the first weight information is larger than a second weight value, the first weight value is used for representing attention degree of the stripe regions in the fourth image feature, and the second weight value is used for representing attention degree of other regions except the stripe regions in the fourth image feature;
multiplying the first weight information with the third image feature to obtain a fifth image feature;
and carrying out feature processing on the fifth image feature to obtain the first image feature.
Optionally, the processor 1510 is further configured to:
Acquiring a second image pair under the shooting scene, wherein the second image pair comprises two third images which are in a preset image format and are in a single channel, and the third images comprise pixel points corresponding to four color types;
And carrying out first image preprocessing on the two third images to obtain the first image and the second image, wherein the first image and the second image are four-channel images, pixel points in the four-channel images comprise pixel values of four channels, and each pixel value corresponds to one color type of the four color types.
Optionally, the processor 1510 is further configured to:
performing format conversion on the two third images to obtain a third image pair, wherein the third image pair comprises two fourth images, and the fourth images are four-channel images;
And normalizing the two fourth images based on the maximum pixel values and the preset black level of the two fourth images to obtain the first image and the second image.
The electronic device may also be used to implement a model training method, wherein the processor 1510 is configured to:
acquiring training sample data, wherein the training sample data comprises a fourth image pair under the same shooting scene and a stroboscopic scene classification label of the shooting scene, and the fourth image pair comprises a fifth image and a sixth image;
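Inputting the fourth image pair into a target model to perform a target operation and obtain a second classification result, wherein the target operation comprises: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; stitching the sixth image feature and the seventh image feature to obtain a second target image feature; and performing strobe scene classification on the shooting scene based on the second target image feature to obtain the second classification result, wherein the second classification result is used for representing the strobe intensity level of the fourth image pair;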
Comparing the second classification result with the stroboscopic scene classification label to obtain a network loss value;
and updating network parameters in the target model based on the network loss value.
Optionally, the target model includes a first branch network and a second branch network, where the first branch network is used to perform feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image, and the second branch network is used to perform feature processing on the fifth image and the sixth image to obtain a seventh image feature of the sixth image.
Optionally, the processor 1510 is further configured to:
Extracting features of the fifth image to obtain eighth image features;
Extracting second weight information of a ninth image feature in a space dimension based on an attention mechanism, wherein the second weight information is used for indicating attention degree of each pixel region in the ninth image feature, the ninth image feature is obtained by carrying out feature extraction based on the sixth image, and when the sixth image carries stripes, a third weight value in the second weight information is larger than a fourth weight value, the third weight value is used for representing attention degree of the stripe regions in the ninth image feature, and the fourth weight value is used for representing attention degree of other regions except the stripe regions in the ninth image feature;
multiplying the second weight information with the eighth image feature to obtain a tenth image feature;
and carrying out feature processing on the tenth image feature to obtain the sixth image feature.
Optionally, the processor 1510 is further configured to:
Acquiring a fifth image pair under the shooting scene, wherein the fifth image pair comprises two seventh images which are in a preset image format and are in a single channel, the seventh images comprise pixel points corresponding to four color types, and acquiring a sixth image pair which comprises two mask images which carry stripes and are in a single channel, and the two mask images have the same strobe intensity and different stripe positions;
performing second image preprocessing on the two seventh images in the fifth image pair to obtain a seventh image pair, wherein the seventh image pair comprises two images with four channels, pixel points in the images with four channels comprise pixel values of four channels, each pixel value corresponds to one color type in the four color types;
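performing third image preprocessing on the two mask images to obtain an eighth image pair, wherein the eighth image pair comprises two four-channel mask images;
multiplying the seventh image pair and the eighth image pair to obtain the fourth image pair, and counting the pixel values in the eighth image pair to obtain the strobe scene classification tag.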
Optionally, the processor 1510 is further configured to:
Acquiring a ninth image pair in the shooting scene from a video, wherein the ninth image pair comprises two frames of images of which the frames are separated by a preset threshold value in the video;
And performing format conversion on the two frames of images in the ninth image pair to obtain the fifth image pair.
It should be understood that, in the embodiments of the present application, the input unit 1504 may include a graphics processing unit (GPU) 15041 and a microphone 15042, where the graphics processor 15041 processes image data of still pictures or videos obtained by an image capture apparatus (such as a camera) in a video capture mode or an image capture mode. The display unit 1506 may include a display panel 15061, which may be configured as a liquid crystal display, an organic light-emitting diode display, or the like. The user input unit 1507 includes a touch panel 15071, also referred to as a touch screen, and other input devices 15072. The touch panel 15071 may include two parts: a touch detection apparatus and a touch controller. The other input devices 15072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, and a joystick, which are not described in detail here. The memory 1509 may be used to store software programs and various data, including but not limited to application programs and an operating system. The processor 1510 may integrate an application processor, which mainly handles the operating system, user interfaces, applications, and the like, and a modem processor, which mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 1510.
An embodiment of the present application further provides a readable storage medium storing a program or instructions that, when executed by a processor, implement each process of the above strobe scene classification method embodiment or of the above model training method embodiment, achieving the same technical effects. To avoid repetition, details are not described here again.
The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present application further provides a chip, including a processor and a communication interface coupled to the processor. The processor is configured to run programs or instructions to implement each process of the above strobe scene classification method embodiment or of the above model training method embodiment, achieving the same technical effects. To avoid repetition, details are not described here again.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
It should be noted that, in this document, the terms "comprise", "include", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; the functions may also be performed substantially simultaneously or in reverse order depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware, although in many cases the former is the preferred implementation. Based on such an understanding, the technical solutions of the present application, in essence or in the part contributing to the prior art, may be embodied as a computer software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing an electronic device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to these specific embodiments, which are merely illustrative rather than restrictive. Inspired by the present application, those of ordinary skill in the art may devise many other forms without departing from the spirit of the present application and the scope protected by the claims, all of which fall within the protection of the present application.

Claims (20)

1. A method for classifying strobe scenes, the method comprising:
acquiring a first image pair under the same shooting scene, wherein the first image pair comprises a first image and a second image;
Performing feature processing on the first image and the second image to obtain first image features of the first image and second image features of the second image;
splicing the first image features and the second image features to obtain first target image features;
and carrying out stroboscopic scene classification on the shooting scene based on the first target image characteristics to obtain a first classification result, wherein the first classification result is used for representing the stroboscopic intensity level of the first image pair.
2. The method of claim 1, wherein performing feature processing on the first image and the second image to obtain the first image feature of the first image comprises:
extracting features from the first image to obtain a third image feature;
extracting, based on an attention mechanism, first weight information of a fourth image feature in the spatial dimension, wherein the first weight information indicates the degree of attention paid to each pixel region in the fourth image feature, the fourth image feature is obtained by performing feature extraction on the second image, and, when the second image carries stripes, a first weight value in the first weight information is greater than a second weight value, the first weight value representing the degree of attention paid to stripe regions in the fourth image feature and the second weight value representing the degree of attention paid to regions other than the stripe regions in the fourth image feature;
multiplying the first weight information by the third image feature to obtain a fifth image feature; and
performing feature processing on the fifth image feature to obtain the first image feature.
3. The method of claim 1, wherein acquiring the first image pair in the same shooting scene comprises:
acquiring a second image pair in the shooting scene, wherein the second image pair comprises two single-channel third images in a preset image format, each third image comprising pixel points corresponding to four color types; and
performing first image preprocessing on the two third images to obtain the first image and the second image, wherein the first image and the second image are four-channel images, each pixel point in a four-channel image comprises pixel values for four channels, and each pixel value corresponds to one of the four color types.
4. The method of claim 3, wherein performing the first image preprocessing on the two third images to obtain the first image and the second image comprises:
performing format conversion on the two third images to obtain a third image pair, wherein the third image pair comprises two fourth images, and the fourth images are four-channel images; and
normalizing the two fourth images based on the maximum pixel values of the two fourth images and a preset black level to obtain the first image and the second image.
5. A method of model training, the method comprising:
acquiring training sample data, wherein the training sample data comprises a fourth image pair in the same shooting scene and a stroboscopic scene classification label of the shooting scene, and the fourth image pair comprises a fifth image and a sixth image;
inputting the fourth image pair into a target model to perform a target operation and obtain a second classification result, wherein the target operation comprises: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; splicing the sixth image feature and the seventh image feature to obtain a second target image feature; and classifying the shooting scene based on the second target image feature to obtain the second classification result, the second classification result representing the stroboscopic intensity level of the fourth image pair;
comparing the second classification result with the stroboscopic scene classification label to obtain a network loss value; and
updating network parameters of the target model based on the network loss value.
6. The method of claim 5, wherein the target model comprises a first branch network and a second branch network; the first branch network is used to perform feature processing on the fifth image and the sixth image to obtain the sixth image feature of the fifth image, and the second branch network is used to perform feature processing on the fifth image and the sixth image to obtain the seventh image feature of the sixth image.
7. The method according to claim 5 or 6, wherein performing feature processing on the fifth image and the sixth image to obtain the sixth image feature of the fifth image comprises:
extracting features from the fifth image to obtain an eighth image feature;
extracting, based on an attention mechanism, second weight information of a ninth image feature in the spatial dimension, wherein the second weight information indicates the degree of attention paid to each pixel region in the ninth image feature, the ninth image feature is obtained by performing feature extraction on the sixth image, and, when the sixth image carries stripes, a third weight value in the second weight information is greater than a fourth weight value, the third weight value representing the degree of attention paid to stripe regions in the ninth image feature and the fourth weight value representing the degree of attention paid to regions other than the stripe regions in the ninth image feature;
multiplying the second weight information by the eighth image feature to obtain a tenth image feature; and
performing feature processing on the tenth image feature to obtain the sixth image feature.
8. The method of claim 5, wherein the acquiring training sample data comprises:
acquiring a fifth image pair in the shooting scene, wherein the fifth image pair comprises two single-channel seventh images in a preset image format, each seventh image comprising pixel points corresponding to four color types, and acquiring a sixth image pair, wherein the sixth image pair comprises two single-channel mask images carrying stripes, the two mask images having the same strobe intensity but different stripe positions;
performing second image preprocessing on the two seventh images in the fifth image pair to obtain a seventh image pair, wherein the seventh image pair comprises two four-channel images, each pixel point in a four-channel image comprises pixel values for four channels, and each pixel value corresponds to one of the four color types;
performing third image preprocessing on the two mask images to obtain an eighth image pair, wherein the eighth image pair comprises two four-channel mask images; and
multiplying the seventh image pair by the eighth image pair to obtain the fourth image pair, and computing statistics over the pixel values in the eighth image pair to obtain the stroboscopic scene classification label.
9. The method of claim 8, wherein acquiring the fifth image pair in the shooting scene comprises:
acquiring a ninth image pair in the shooting scene from a video, wherein the ninth image pair comprises two frames of the video whose frame interval equals a preset threshold; and
performing format conversion on the two frames in the ninth image pair to obtain the fifth image pair.
10. A strobe scene classification apparatus, the apparatus comprising:
a first acquisition module, configured to acquire a first image pair in the same shooting scene, wherein the first image pair comprises a first image and a second image;
a feature processing module, configured to perform feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image;
a splicing module, configured to splice the first image feature and the second image feature to obtain a first target image feature; and
a classification module, configured to perform stroboscopic scene classification on the shooting scene based on the first target image feature to obtain a first classification result, wherein the first classification result represents the stroboscopic intensity level of the first image pair.
11. The apparatus according to claim 10, wherein the feature processing module is specifically configured to:
extract features from the first image to obtain a third image feature;
extract, based on an attention mechanism, first weight information of a fourth image feature in the spatial dimension, wherein the first weight information indicates the degree of attention paid to each pixel region in the fourth image feature, the fourth image feature is obtained by performing feature extraction on the second image, and, when the second image carries stripes, a first weight value in the first weight information is greater than a second weight value, the first weight value representing the degree of attention paid to stripe regions in the fourth image feature and the second weight value representing the degree of attention paid to regions other than the stripe regions in the fourth image feature;
multiply the first weight information by the third image feature to obtain a fifth image feature; and
perform feature processing on the fifth image feature to obtain the first image feature.
12. The apparatus of claim 10, wherein the first acquisition module comprises:
a first acquisition unit, configured to acquire a second image pair in the shooting scene, wherein the second image pair comprises two single-channel third images in a preset image format, each third image comprising pixel points corresponding to four color types; and
a first image preprocessing unit, configured to perform first image preprocessing on the two third images to obtain the first image and the second image, wherein the first image and the second image are four-channel images, each pixel point in a four-channel image comprises pixel values for four channels, and each pixel value corresponds to one of the four color types.
13. The apparatus according to claim 12, wherein the first image preprocessing unit is specifically configured to:
perform format conversion on the two third images to obtain a third image pair, wherein the third image pair comprises two fourth images, and the fourth images are four-channel images; and
normalize the two fourth images based on the maximum pixel values of the two fourth images and a preset black level to obtain the first image and the second image.
14. A model training apparatus, the apparatus comprising:
a second acquisition module, configured to acquire training sample data, wherein the training sample data comprises a fourth image pair in the same shooting scene and a stroboscopic scene classification label of the shooting scene, and the fourth image pair comprises a fifth image and a sixth image;
a target operation module, configured to input the fourth image pair into a target model to perform a target operation and obtain a second classification result, wherein the target operation comprises: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; splicing the sixth image feature and the seventh image feature to obtain a second target image feature; and classifying the shooting scene based on the second target image feature to obtain the second classification result, the second classification result representing the stroboscopic intensity level of the fourth image pair;
a comparison module, configured to compare the second classification result with the stroboscopic scene classification label to obtain a network loss value; and
an updating module, configured to update network parameters of the target model based on the network loss value.
15. The apparatus of claim 14, wherein the target model comprises a first branch network and a second branch network; the first branch network is configured to perform feature processing on the fifth image and the sixth image to obtain the sixth image feature of the fifth image, and the second branch network is configured to perform feature processing on the fifth image and the sixth image to obtain the seventh image feature of the sixth image.
16. The apparatus according to claim 14 or 15, wherein the target operation module is specifically configured to:
extract features from the fifth image to obtain an eighth image feature;
extract, based on an attention mechanism, second weight information of a ninth image feature in the spatial dimension, wherein the second weight information indicates the degree of attention paid to each pixel region in the ninth image feature, the ninth image feature is obtained by performing feature extraction on the sixth image, and, when the sixth image carries stripes, a third weight value in the second weight information is greater than a fourth weight value, the third weight value representing the degree of attention paid to stripe regions in the ninth image feature and the fourth weight value representing the degree of attention paid to regions other than the stripe regions in the ninth image feature;
multiply the second weight information by the eighth image feature to obtain a tenth image feature; and
perform feature processing on the tenth image feature to obtain the sixth image feature.
17. The apparatus of claim 14, wherein the second acquisition module comprises:
a second acquisition unit, configured to acquire a fifth image pair in the shooting scene, wherein the fifth image pair comprises two single-channel seventh images in a preset image format, each seventh image comprising pixel points corresponding to four color types, and to acquire a sixth image pair, wherein the sixth image pair comprises two single-channel mask images carrying stripes, the two mask images having the same strobe intensity but different stripe positions;
a second image preprocessing unit, configured to perform second image preprocessing on the two seventh images in the fifth image pair to obtain a seventh image pair, wherein the seventh image pair comprises two four-channel images, each pixel point in a four-channel image comprises pixel values for four channels, and each pixel value corresponds to one of the four color types;
a third image preprocessing unit, configured to perform third image preprocessing on the two mask images to obtain an eighth image pair, wherein the eighth image pair comprises two four-channel mask images; and
a multiplication processing unit, configured to multiply the seventh image pair by the eighth image pair to obtain the fourth image pair, and to compute statistics over the pixel values in the eighth image pair to obtain the stroboscopic scene classification label.
18. The apparatus according to claim 17, wherein the second acquisition unit is specifically configured to:
acquire a ninth image pair in the shooting scene from a video, wherein the ninth image pair comprises two frames of the video whose frame interval equals a preset threshold; and
perform format conversion on the two frames in the ninth image pair to obtain the fifth image pair.
19. An electronic device, comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the strobe scene classification method according to any one of claims 1 to 4, or the steps of the model training method according to any one of claims 5 to 9.
20. A readable storage medium storing a program or instructions which, when executed by a processor, implement the steps of the strobe scene classification method according to any one of claims 1 to 4, or the steps of the model training method according to any one of claims 5 to 9.
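
As a reading aid, the inference flow of claims 1, 2 and 6 can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the patented network: `extract` is a hypothetical identity placeholder for a learned backbone, the spatial weight map is computed from channel-wise mean and max pooling with no learned convolution, and the linear head parameters `W` and `b` are assumed inputs. Each branch gates one image's feature with a weight map derived from the other image, the pooled branch outputs are spliced (concatenated), and the head scores the strobe-intensity levels.

```python
import numpy as np


def _sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def extract(img: np.ndarray) -> np.ndarray:
    """Hypothetical backbone placeholder: returns the C x H x W input as float features."""
    return img.astype(np.float64)


def branch(primary: np.ndarray, guide: np.ndarray) -> np.ndarray:
    """Gate the primary image's feature with a spatial map computed from the guide image."""
    f_primary, f_guide = extract(primary), extract(guide)
    # H x W weights in (0, 1) from channel-wise mean and max pooling of the guide feature.
    weights = _sigmoid(f_guide.mean(axis=0) + f_guide.max(axis=0))
    gated = f_primary * weights[np.newaxis, :, :]  # broadcast the map over channels
    return gated.mean(axis=(1, 2))                  # global average pooling -> length-C vector


def classify_pair(img1: np.ndarray, img2: np.ndarray, W: np.ndarray, b: np.ndarray):
    """Return (level scores, predicted strobe-intensity level) for an image pair."""
    feat1 = branch(img1, img2)               # first branch: feature of the first image
    feat2 = branch(img2, img1)               # second branch: feature of the second image
    target = np.concatenate([feat1, feat2])  # "splicing": length-2C target feature
    logits = W @ target + b
    return logits, int(np.argmax(logits))
```

In a trained model the weight map would learn to emphasize stripe regions, as claims 2 and 7 require; in this untrained sketch it simply rises wherever the guide feature is strongly activated.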
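
The first image preprocessing of claims 3 and 4 (a single-channel mosaic converted to four channels, then normalized with a black level) might look like the following sketch. The claims do not fix these details, so the following are assumptions: the preset image format is a Bayer RAW mosaic with an RGGB layout and even dimensions, format conversion splits the 2x2 sub-lattices into channels, and normalization divides by the larger maximum of the two images so that the pair keeps its relative brightness.

```python
import numpy as np


def pack_bayer(raw: np.ndarray) -> np.ndarray:
    """Pack an H x W Bayer mosaic into an (H/2) x (W/2) x 4 array.

    Assumes an RGGB pattern with even H and W; channel order is R, G1, G2, B.
    """
    if raw.ndim != 2 or raw.shape[0] % 2 or raw.shape[1] % 2:
        raise ValueError("expected a 2-D mosaic with even height and width")
    return np.stack(
        [raw[0::2, 0::2],   # R
         raw[0::2, 1::2],   # G on red rows
         raw[1::2, 0::2],   # G on blue rows
         raw[1::2, 1::2]],  # B
        axis=-1,
    )


def normalize_pair(img_a: np.ndarray, img_b: np.ndarray, black_level: float):
    """Subtract the black level and scale both images by the pair's shared maximum."""
    a = img_a.astype(np.float32)
    b = img_b.astype(np.float32)
    peak = max(a.max(), b.max())
    scale = max(peak - black_level, 1e-6)  # guard against an all-black pair
    a = np.clip((a - black_level) / scale, 0.0, 1.0)
    b = np.clip((b - black_level) / scale, 0.0, 1.0)
    return a, b
```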
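
The loss comparison and parameter update of claim 5 can be illustrated with one stochastic-gradient step of softmax cross-entropy. The claims name neither the loss function nor the optimizer, so both are assumptions here, and only a linear classification head over an already spliced pair feature `x` is updated; in a full model the gradient would typically also be back-propagated into both branch networks.

```python
import numpy as np


def cross_entropy_step(W: np.ndarray, b: np.ndarray, x: np.ndarray, label: int, lr: float = 0.1):
    """One SGD step of softmax cross-entropy on a linear head.

    W: K x D weights, b: length-K bias, x: length-D spliced pair feature,
    label: integer strobe-intensity class. Returns (W_new, b_new, loss_before_step).
    """
    logits = W @ x + b
    exp = np.exp(logits - logits.max())        # shift for numerical stability
    probs = exp / exp.sum()
    loss = -np.log(probs[label])               # network loss value
    grad_logits = probs.copy()
    grad_logits[label] -= 1.0                  # dL/dlogits = softmax - one_hot(label)
    W_new = W - lr * np.outer(grad_logits, x)  # dL/dW = outer(dL/dlogits, x)
    b_new = b - lr * grad_logits
    return W_new, b_new, float(loss)
```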
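
The training-data synthesis of claim 8 can be sketched as follows. Every specific choice here is hypothetical: the masks are horizontal cosine bands (resembling the banding a rolling shutter records under flickering light) whose `depth` stands in for strobe intensity and whose `phase` sets the stripe positions; the third image preprocessing packs each mask with the same 2x2 sub-lattice split as the image; and the pixel-value statistic used for the label is the peak-to-trough modulation depth, bucketed by illustrative thresholds.

```python
import numpy as np


def make_stripe_mask(h: int, w: int, depth: float, period: float, phase: float) -> np.ndarray:
    """Hypothetical single-channel flicker mask with values in [1 - depth, 1]."""
    rows = np.arange(h)[:, None]
    band = 0.5 * (1.0 + np.cos(2.0 * np.pi * rows / period + phase))  # in [0, 1]
    return np.broadcast_to(1.0 - depth * band, (h, w)).copy()


def pack4(x: np.ndarray) -> np.ndarray:
    """Split an H x W array into its four 2x2 sub-lattices -> (H/2, W/2, 4)."""
    return np.stack([x[0::2, 0::2], x[0::2, 1::2], x[1::2, 0::2], x[1::2, 1::2]], axis=-1)


def synthesize(clean_pair, mask_pair, thresholds=(0.05, 0.2, 0.4)):
    """Multiply a clean four-channel pair by packed masks and derive a level label."""
    packed = [pack4(m) for m in mask_pair]
    strobed = tuple(img * m for img, m in zip(clean_pair, packed))
    # Label statistic (assumption): largest peak-to-trough depth over the two masks.
    depth = max(float(m.max() - m.min()) for m in packed)
    label = int(np.searchsorted(thresholds, depth, side="right"))
    return strobed, label
```

Using one depth with two phases mirrors the claim's requirement that the two masks share a strobe intensity while their stripes sit at different positions.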
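
Selecting the ninth image pair of claim 9 reduces to choosing frame indices a fixed distance apart. In this minimal sketch the preset threshold is treated as the exact frame gap, and the hypothetical `stride` parameter sets how often a new pair starts; decoding the frames and converting their format are left out.

```python
def frame_pairs(num_frames: int, gap: int, stride: int = 1):
    """Yield (i, i + gap) index pairs for frames exactly `gap` apart in a video."""
    if gap <= 0 or stride <= 0:
        raise ValueError("gap and stride must be positive")
    for i in range(0, num_frames - gap, stride):
        yield i, i + gap
```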

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210760670.8A CN115035393B 2022-06-29 2022-06-29 Stroboscopic scene classification method, model training method, related device and electronic equipment

Publications (2)

Publication Number Publication Date
CN115035393A 2022-09-09
CN115035393B 2025-05-02

Family

ID=83128968

Country Status (1)

Country Link
CN (1) CN115035393B

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant