CN115035393A - Stroboscopic scene classification method, model training method, related device and electronic equipment - Google Patents


Info

Publication number
CN115035393A
Authority
CN
China
Prior art keywords
image
feature
images
pair
stroboscopic
Prior art date
Legal status
Pending
Application number
CN202210760670.8A
Other languages
Chinese (zh)
Inventor
倪敏垚
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202210760670.8A
Publication of CN115035393A
Status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/35 - Categorising the entire scene, e.g. birthday party or wedding scene

Abstract

The application discloses a stroboscopic scene classification method, a model training method, a related device and electronic equipment, and belongs to the technical field of artificial intelligence. The method comprises the following steps: acquiring a first image pair under the same shooting scene, wherein the first image pair comprises a first image and a second image; performing feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image; splicing the first image characteristic and the second image characteristic to obtain a first target image characteristic; and based on the first target image characteristics, carrying out stroboscopic scene classification on the shooting scene to obtain a first classification result, wherein the first classification result is used for representing the stroboscopic intensity level of the first image pair.

Description

Stroboscopic scene classification method, model training method, related device and electronic equipment
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a stroboscopic scene classification method, a model training method, a related device and electronic equipment.
Background
In camera shooting, the shutter speed of the camera generally needs to be increased to improve image capture accuracy. However, shooting with a high shutter speed in a stroboscopic scene may produce obvious bright and dark stripes (called banding) in the obtained image, which degrades the imaging experience.
At present, strobe is usually predicted by calculating the stroboscopic intensity of each row of pixels in a single-frame image to determine whether the image exhibits strobe, and the resulting strobe detection accuracy is relatively low.
Disclosure of Invention
The embodiments of the application aim to provide a stroboscopic scene classification method, a model training method, a related device and electronic equipment, which can solve the problem of low stroboscopic detection accuracy.
In a first aspect, an embodiment of the present application provides a stroboscopic scene classification method, where the method includes:
acquiring a first image pair under the same shooting scene, wherein the first image pair comprises a first image and a second image;
performing feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image;
splicing the first image characteristic and the second image characteristic to obtain a first target image characteristic;
and based on the first target image characteristics, carrying out stroboscopic scene classification on the shooting scene to obtain a first classification result, wherein the first classification result is used for representing the stroboscopic intensity level of the first image pair.
In a second aspect, an embodiment of the present application provides a model training method, where the method includes:
acquiring training sample data, wherein the training sample data comprises a fourth image pair under the same shooting scene and a stroboscopic scene classification label of the shooting scene, and the fourth image pair comprises a fifth image and a sixth image;
inputting the fourth image pair into a target model to execute a target operation to obtain a second classification result, wherein the target operation comprises: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; splicing the sixth image characteristic and the seventh image characteristic to obtain a second target image characteristic; based on the second target image feature, performing stroboscopic scene classification on the shooting scene to obtain a second classification result, wherein the second classification result is used for representing the stroboscopic intensity level of the fourth image pair;
comparing the second classification result with the stroboscopic scene classification label to obtain a network loss value;
updating network parameters in the target model based on the network loss values.
In a third aspect, an embodiment of the present application provides a stroboscopic scene classification apparatus, including:
the first acquisition module is used for acquiring a first image pair in the same shooting scene, and the first image pair comprises a first image and a second image;
the characteristic processing module is used for carrying out characteristic processing on the first image and the second image to obtain a first image characteristic of the first image and a second image characteristic of the second image;
the splicing module is used for splicing the first image characteristic and the second image characteristic to obtain a first target image characteristic;
and the classification module is used for carrying out stroboscopic scene classification on the shooting scene based on the first target image characteristic to obtain a first classification result, and the first classification result is used for representing the stroboscopic intensity level of the first image pair.
In a fourth aspect, an embodiment of the present application provides a model training apparatus, including:
the second acquisition module is used for acquiring training sample data, wherein the training sample data comprises a fourth image pair under the same shooting scene and a stroboscopic scene classification label of the shooting scene, and the fourth image pair comprises a fifth image and a sixth image;
a target operation module, configured to input the fourth image pair to a target model to perform a target operation, so as to obtain a second classification result, where the target operation includes: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; splicing the sixth image characteristic and the seventh image characteristic to obtain a second target image characteristic; based on the second target image feature, performing stroboscopic scene classification on the shooting scene to obtain a second classification result, wherein the second classification result is used for representing the stroboscopic intensity level of the fourth image pair;
the comparison module is used for comparing the second classification result with the stroboscopic scene classification label to obtain a network loss value;
and the updating module is used for updating the network parameters in the target model based on the network loss value.
In a fifth aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the stroboscopic scene classification method according to the first aspect or the steps of the model training method according to the second aspect.
In a sixth aspect, the present application provides a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the stroboscopic scene classification method according to the first aspect or the steps of the model training method according to the second aspect.
In a seventh aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the steps of the stroboscopic scene classification method according to the first aspect or the steps of the model training method according to the second aspect.
In the embodiment of the application, a first image pair in the same shooting scene is obtained, wherein the first image pair comprises a first image and a second image; performing feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image; splicing the first image characteristic and the second image characteristic to obtain a first target image characteristic; and based on the first target image characteristics, carrying out stroboscopic scene classification on the shooting scene to obtain a first classification result, wherein the first classification result is used for representing the stroboscopic intensity level of the first image pair. Therefore, the shooting scene can be classified into the stroboscopic scenes, the stroboscopic intensity level of the image pair can be evaluated as a whole, and the accuracy of stroboscopic detection is improved.
Drawings
Fig. 1 is a flowchart of a strobe scene classification method provided in an embodiment of the present application;
FIG. 2 is a schematic illustration of a banding display in a first image pair;
FIG. 3 is a schematic diagram of an image format of a first image;
FIG. 4 is a schematic diagram of an exemplary object model;
FIG. 5 is a block diagram of an exemplary downsampling convolution module;
FIG. 6 is a schematic diagram of an exemplary double convolution module;
FIG. 7 is a schematic diagram of an exemplary cross-attention block configuration;
fig. 8 is a schematic diagram of a process of converting a RAW domain image into an RGGB image;
FIG. 9 is a flow chart of a model training method provided by an embodiment of the present application;
FIG. 10 is a schematic illustration of a process for converting a three-channel image into a two-dimensional matrix;
FIG. 11 is a process diagram of an anti-demosaicing method;
fig. 12 is a structural diagram of a strobe scene classification device according to an embodiment of the present application;
FIG. 13 is a block diagram of a model training apparatus according to an embodiment of the present application;
fig. 14 is a block diagram of an electronic device provided in an embodiment of the present application;
fig. 15 is a schematic hardware structure diagram of an electronic device implementing an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances so that embodiments of the application may be practiced in sequences other than those illustrated or described herein, and the terms "first", "second" and the like are generally used in a generic sense and do not limit the number of objects; for example, a first object may be one or more than one. In addition, "and/or" in the description and claims denotes at least one of the associated objects, and the character "/" generally indicates an "or" relationship between the objects before and after it.
The strobe scene classification method provided by the embodiments of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
Fig. 1 is a flowchart of a strobe scene classification method according to an embodiment of the present application, and as shown in fig. 1, the method includes the following steps:
step 101, a first image pair in the same shooting scene is obtained, wherein the first image pair comprises a first image and a second image.
In this step, the first image pair may include two images, a first image and a second image. The first image and the second image are both images of the same shooting scene, that is, the image content captured by the shooting lens is the same in both images. For example, two landscape photographs taken in the same shooting scene show the same scenery.
The first image pair in the shooting scene may or may not carry bright and dark stripes (which may be referred to as banding); this is not specifically limited herein. Banding refers to the bright and dark stripes present in an image captured in a stroboscopic scene. The strobe may be caused by a light source flickering at a fixed frequency, and flicker is an unstable visual phenomenon caused by light stimulation whose brightness or spectral distribution fluctuates over time.
The object of the present embodiment is to perform feature processing on the first image pair and to carry out stroboscopic scene classification of the shooting scene based on the image features of the first image pair, so as to obtain a classification result characterizing the strobe intensity level of the first image pair. That is, this embodiment can determine whether the shooting scene is a stroboscopic scene (i.e., no strobe or strobe present) and, when it is, the strobe intensity level of the first image pair, such as light strobe or heavy strobe.
In the case where banding is present in the first image pair, the image content under the taking lens can be regarded as the background of the banded image. Because the first image and the second image are obtained in the same shooting scene, the strobe intensity of the banding in the first image and the second image is the same, but the banding positions are different. As shown in fig. 2, the banding 201 present in the first image (the image shown in the left diagram of fig. 2) is displayed with a position offset from the banding 202 present in the second image (the image shown in the right diagram of fig. 2).
The image formats of the first image and the second image are the same, and may be an RGB format or a preset image format such as a bayer format of the RAW domain. The bayer format may include pixel points corresponding to four color types: pixel points representing red (denoted R), pixel points representing blue (denoted B), green pixel points adjacent to red (denoted GR), and green pixel points adjacent to blue (denoted GB). The bayer format may be an RGGB format, a GRBG format, or the like.
In an optional implementation manner, the first image pair in the RAW domain may be used for the stroboscopic scene classification in this embodiment. Pixels in a RAW-domain image are in a linear relationship with the ambient brightness, whereas an RGB image is obtained by processing the RAW-domain image signal and its pixels are already in a nonlinear relationship with the ambient brightness. Because the banding phenomenon is directly related to changes in ambient brightness, performing stroboscopic scene classification on RAW-domain images better matches the strobe detection scenario and improves the accuracy of strobe detection.
In an optional implementation manner, a target model may be adopted to perform stroboscopic scene classification of the shooting scene based on the first image pair. To match the structure of the target model, each image of the first image pair may be a four-channel image, in which a pixel point includes pixel values of four channels, each pixel value corresponding to one of the four color types. The first image pair may be obtained by format-converting RAW-domain images.
As shown in fig. 3, which is one of the image formats of the first image or the second image, taking the first image as an example, the first image may include four channels, where each channel corresponds to one type of pixel points, which are R, GR, GB, and B respectively.
The first image pair may be obtained in a variety of manners. For example, two adjacent frames of RAW-domain images in a real-time shooting scene may be obtained as the first image pair; two adjacent frames in a video, or two frames whose frame numbers are separated by a preset threshold, may be obtained as the first image pair; or format conversion may be performed on two obtained frames to convert them into two four-channel images serving as the first image pair.
And 102, performing feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image.
In this embodiment, a target model may be used for performing strobe scene classification, and specifically, a first image pair may be input to the target model to perform a target operation, where the target operation includes feature processing on an input image to obtain an image feature of the input image.
The object model may include a first branch network and a second branch network, and the first branch network and the second branch network may be identical in structure. The first branch network and the second branch network may perform feature processing on the images independently, that is, the first branch network may perform feature processing on the first image to obtain a first image feature, and the second branch network may perform the feature processing on the second image to obtain a second image feature.
Because the banding areas in two images in the same shooting scene are different in position, the first branch network and the second branch network can perform feature processing on the images in a mutually constrained mode based on an attention mechanism, namely the attention mechanism is adopted, the image features extracted by the second branch network are adjusted based on the image features extracted by the first branch network, and the attention mechanism is adopted, the image features extracted by the first branch network are adjusted based on the image features extracted by the second branch network, so that the attention of the network to the banding areas is promoted.
The first branch network and the second branch network may each perform feature processing on their input image using a down-sampling convolution module (Down Conv Module) to obtain the first image feature of the first image and the second image feature of the second image, respectively.
The first image feature may include a color feature, a texture feature, a shape feature, a spatial relationship feature, and the like of the first image, in the case that the first image carries a banding, the color feature may include a brightness feature of the banding in the first image, and the spatial relationship feature may include a feature for characterizing a position of the banding region in the first image. The second image feature may include a color feature, a texture feature, a shape feature, a spatial relationship feature, and the like of the second image, in the case that the second image carries a banding, the color feature may include a brightness feature of the banding in the second image, and the spatial relationship feature may also include a feature for characterizing a position of the banding region in the second image.
Since the position of the banding region of the first image is different from that of the second image, the object model can perform stroboscopic scene classification by using the difference of the position of the banding region in the first image feature and the position of the banding region in the second image feature and the banding brightness feature. And determining the stroboscopic intensity levels such as no stroboscopic effect, light stroboscopic effect and heavy stroboscopic effect according to the difference of the positions of the banding areas and the banding brightness characteristics.
And 103, splicing the first image characteristic and the second image characteristic to obtain a first target image characteristic.
In this step, the first image feature and the second image feature may be stitched based on a stitching module, such as a concat module, to obtain a first target image feature.
The concat module performs feature splicing along the channel dimension of the first image feature and the second image feature. For example, if the first image feature and the second image feature are each feature maps of scale (512, 14, 14), splicing them with the concat module yields a combined feature map of scale (1024, 14, 14).
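For illustration only (not part of the patent text), the channel-wise splicing described above can be sketched in PyTorch; the tensor names f14/f24 and the batch dimension are assumptions that follow the (512, 14, 14) example:

```python
import torch

# Hypothetical feature maps from the two branch networks (batch size 1),
# each with the (512, 14, 14) scale used in the example above.
f14 = torch.randn(1, 512, 14, 14)  # first image feature
f24 = torch.randn(1, 512, 14, 14)  # second image feature

# Channel-wise splicing, as a concat module would do.
first_target_feature = torch.cat([f14, f24], dim=1)
print(first_target_feature.shape)  # torch.Size([1, 1024, 14, 14])
```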
And 104, carrying out stroboscopic scene classification on the shooting scene based on the first target image feature to obtain a first classification result, wherein the first classification result is used for representing the stroboscopic intensity level of the first image pair.
In this step, after the target model performs a series of feature extraction operations on the first target image feature, an input feature of the fully connected layer may be obtained, and the fully connected layer is configured to perform stroboscopic scene classification on the shooting scene based on this input feature to obtain a first classification result.
If the first target image feature is a feature map with a scale of (1024, 14, 14), the input feature F6 of the fully connected layer is obtained after a series of feature extraction operations, with a scale of (512, 1, 1), as shown in the following formula (1).
F6 = G(G(concat(F14, F24), 2), 7)   (1)
Here, F14 and F24 are the first image feature and the second image feature, respectively, each with a scale of (512, 14, 14); the first target image feature is obtained by splicing them; and G is a feature extraction operation that can be implemented by a down-sampling convolution module.
Then, through the fully connected layer, the target model can classify the stroboscopic scene of the shooting scene based on the feature F6 and obtain a network output with a scale of (M, 1, 1), where M may be the number of stroboscopic scene classes, such as two, three, or four, and each value of the network output may represent the probability of the corresponding strobe intensity level.
In an optional implementation mode, the fully connected layer can perform three-class classification corresponding to the strobe intensity levels of no strobe, light strobe, and heavy strobe. The network output then has a scale of (3, 1, 1), the probability values of no strobe, light strobe, and heavy strobe are read in that order, and the strobe intensity level with the maximum probability value may be taken as the stroboscopic scene classification result of the shooting scene. For example, if the network output is (0.2, 0.7, 0.1), the strobe scene classification result is light strobe.
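A minimal sketch of how such a three-class output could be produced and read out, assuming a linear classifier head and a softmax over the logits; the layer, tensor names, and level strings are illustrative assumptions, not the patent's exact implementation:

```python
import torch
import torch.nn as nn

levels = ["no strobe", "light strobe", "heavy strobe"]

# Assumed classifier head: a 512-dim input feature F6 mapped to 3 logits.
fc = nn.Linear(512, 3)
f6 = torch.randn(1, 512)                      # input feature of scale (512, 1, 1), flattened

probs = torch.softmax(fc(f6), dim=1)          # e.g. tensor([[0.2, 0.7, 0.1]])
level = levels[probs.argmax(dim=1).item()]    # "light strobe" for the example output above
```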
In the embodiment, a first image pair in the same shooting scene is obtained, wherein the first image pair comprises a first image and a second image; performing feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image; splicing the first image characteristic and the second image characteristic to obtain a first target image characteristic; and based on the first target image characteristics, carrying out stroboscopic scene classification on the shooting scene to obtain a first classification result, wherein the first classification result is used for representing the stroboscopic intensity level of the first image pair. Therefore, the shooting scenes can be classified into stroboscopic scenes, the stroboscopic intensity level of the image pairs can be evaluated as a whole, and the accuracy of stroboscopic detection can be improved.
In addition, because the present embodiment can perform strobe scene classification, the strobe intensity levels of no strobe, light strobe, and heavy strobe can be distinguished. Therefore, in practical application, for a camera without a banding elimination algorithm, the classification can help the camera adjust its shutter and avoid producing banded images; if a shooting scene at a certain shutter speed is detected to be a light stroboscopic scene, the shutter speed can be appropriately reduced to avoid banded images. For a camera with a banding elimination algorithm, the classification can help the banding elimination algorithm determine whether the stroboscopic scene can be handled; if the shooting scene is a light stroboscopic scene, the banding elimination algorithm can be used to eliminate the banding in the image, so that the algorithm is applied in a targeted manner, improving its feasibility.
Optionally, the step 102 specifically includes:
performing feature extraction on the first image to obtain a third image feature;
extracting first weight information of a fourth image feature on a spatial dimension based on an attention mechanism, wherein the first weight information is used for indicating attention degree of each pixel region in the fourth image feature, the fourth image feature is obtained by performing the feature extraction based on the second image, and when the second image carries a stripe, a first weight value in the first weight information is larger than a second weight value, the first weight value is used for representing attention degree of the stripe region in the fourth image feature, and the second weight value is used for representing attention degree of other regions except the stripe region in the fourth image feature;
multiplying the first weight information and the third image characteristic to obtain a fifth image characteristic;
and performing feature processing on the fifth image feature to obtain the first image feature.
In this embodiment, the first branch network and the second branch network may perform feature processing of images in a mutually constrained manner based on an attention mechanism, that is, the target model in this embodiment may be a network model with double inputs, in which two input images have a mutual reference function and are mutually constrained.
As shown in fig. 4, the diagram is a schematic structural diagram of an exemplary object model, and its input may be a first image and a second image, which include two branch networks, a first branch network 41 and a second branch network 42, respectively, and the structures of the first branch network and the second branch network are the same.
Taking the first branch network as an example, the first branch network may include a feature processing module 411, which may include a feature extraction module 4111 and a cross attention block 4112. The number of the feature processing modules 411 may be one, two, or even multiple, as shown in fig. 4, the number of the feature processing modules 411 is three, so as to ensure that feature information representing an image can be sufficiently extracted.
In particular, the feature extraction Module 4111 may include a downsampling convolution Module 4113(Down Conv Module) and a doubling convolution Module 4114(Double Conv Module). The feature extraction module may use the pooling layer and the convolution block to perform feature extraction, and the process of feature extraction may be as shown in the following formula (2).
G(F, S) = conv(maxpool(F)↓S)   (2)
Where F represents the input features, G (F, S) represents the output features, maxpool represents the maximum pooling layer, S represents the step size of the pooling layer, conv represents the convolution combination block of convolutional layer + regularization + activation layer.
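A hedged PyTorch sketch of formula (2), assuming the "convolution combination block" is Conv2d + BatchNorm2d + ReLU; the kernel size, channel counts, and the inline layer construction are assumptions made only for illustration:

```python
import torch
import torch.nn as nn

def G(F: torch.Tensor, S: int, c_out: int) -> torch.Tensor:
    """Formula (2): conv(maxpool(F) downsampled with step size S). Layers are created
    inline only for illustration; a trained model would hold them as module members."""
    c_in = F.shape[1]
    pooled = nn.MaxPool2d(kernel_size=S, stride=S)(F)       # maxpool with step size S
    conv = nn.Sequential(                                    # conv + regularization + activation
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )
    return conv(pooled)

# Mirrors formula (1): two successive G operations on the (1024, 14, 14) spliced feature.
x = torch.randn(2, 1024, 14, 14)
f6 = G(G(x, 2, 512), 7, 512)    # shape (2, 512, 1, 1)
```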
Fig. 5 is a schematic structural diagram of an exemplary downsampling convolution module, which, as shown in fig. 5, may include a fully-connected layer 51 and convolution combination blocks 52 of convolutional layer + regularization + activation layer, where the number of convolution combination blocks is two. The scale of the input image feature is (c_in, h_in, w_in); after the fully-connected layer, the scale of the output image feature is (c_in, h_in/2, w_in/2); and after the two convolution combination blocks connected in series, the scale of the output image feature is (c_out, h_in/2, w_in/2), where c_out is greater than c_in.
Fig. 6 is a schematic structural diagram of an exemplary doubled convolution module. As shown in fig. 6, the doubled convolution module may include convolution combination blocks 61 of convolutional layer + regularization + activation layer, where the number of convolution combination blocks is two. The scale of the input image feature is (c_in, h_in, w_in), and after the convolution combination blocks the scale of the output image feature is (c_out, h_in, w_in), where c_out is greater than c_in.
After the first image is subjected to feature extraction by the feature extraction module 4111, a third image feature may be obtained, which is denoted by F11. Accordingly, after the second image is subjected to feature extraction by the feature extraction module 4211 in the feature processing module 421, a fourth image feature can be obtained, which is denoted by F21.
In an alternative embodiment, the scale of the input image may be (4,224,224), and after the feature extraction module performs feature extraction, the scales of the obtained third image feature and the fourth image feature may be both (64,112,112).
The target model may comprise a cross attention block 4112, and in case of deriving the third image feature and the fourth image feature, the cross attention block in the first branch network may extract first weight information of the fourth image feature in a spatial dimension based on an attention mechanism. When the first image pair carries bright and dark stripes, the third image feature and the fourth image feature may include a banding position feature, and the banding position feature in the image features may be adjusted by the cross attention block 4112 based on an attention mechanism, so that a position where important attention is needed is highlighted.
Fig. 7 is a schematic structural diagram of an exemplary cross attention block, and as shown in fig. 7, the cross attention mechanism may extract weight information of one image feature in a spatial dimension based on the attention mechanism (which may be a spatial attention mechanism), and multiply another image feature with the weight information to obtain an output image feature.
Taking the adjustment of the third image feature F11 as an example, the first weight information of the fourth image feature F21 in the spatial dimension is extracted by the convolution combination block of convolutional layer + regularization + activation layer, is denoted SA21, and has a scale of (1, h_in, w_in). The first weight information SA21 gives different weight values to the pixels of the fourth image feature F21 in the spatial dimension and is used for indicating the attention degree of each pixel region in the fourth image feature, so as to highlight the positions needing important attention.
When the second image carries stripes, that is, banding, a first weight value in the first weight information is greater than a second weight value in the first weight information, the first weight value is used for representing the attention degree of a banding region in the fourth image feature, and the second weight value is used for representing the attention degree of other regions except the banding region in the fourth image feature, that is, SA21 is mainly used for indicating the banding region in the fourth image feature F21, and the specific operation steps are as shown in the following formula (3).
SA21=conv(F21) (3)
Because the positions of the banding areas of the two images in the first image pair are different, if obvious dislocation exists, the difference of the positions of the banding areas is the key point of stroboscopic scene classification. Specifically, the first weight information SA21 and the third image feature F11 are multiplied, and the banding region information in the fourth image feature F21 affects the feature information in the third image feature F11, mainly to distinguish the brightness difference at the same position in the third image feature F11 and the fourth image feature F21. The method can avoid the interference of the background area between the two images, and gradually and accurately extract the change characteristics of the banding information, thereby improving the attention of the network to the banding area.
Accordingly, when the fourth image feature F21 is adjusted, the fourth image feature F21 is adjusted using the spatial weight calculated by the third image feature F11.
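The cross attention block of fig. 7 could be sketched as follows; the single-output-channel convolution and the sigmoid activation that keeps the spatial weights in (0, 1) are assumptions, since the patent only specifies a convolution combination block:

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Adjusts one branch's feature with spatial weights computed from the other branch."""
    def __init__(self, channels: int):
        super().__init__()
        # Convolution combination block producing single-channel spatial weights of scale (1, h, w);
        # the sigmoid activation is an assumption, keeping the weights in (0, 1).
        self.weight_conv = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.BatchNorm2d(1),
            nn.Sigmoid(),
        )

    def forward(self, f_self: torch.Tensor, f_other: torch.Tensor) -> torch.Tensor:
        sa_other = self.weight_conv(f_other)   # e.g. SA21 = conv(F21), formula (3)
        return f_self * sa_other               # e.g. F11 multiplied by SA21

# Symmetric use by the two branches:
attn1, attn2 = CrossAttentionBlock(64), CrossAttentionBlock(64)
f11 = torch.randn(2, 64, 112, 112)   # third image feature
f21 = torch.randn(2, 64, 112, 112)   # fourth image feature
f11_adjusted = attn1(f11, f21)       # first branch adjusted by weights from the second
f21_adjusted = attn2(f21, f11)       # second branch adjusted by weights from the first
```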
Then, the fifth image feature obtained by multiplying the first weight information SA21 and the third image feature F11 may be subjected to feature processing by another feature processing module connected in series with the feature processing module 411 in the first branch network, so as to obtain the first image feature F14. The feature processing may include feature extraction and feature adjustment, and the feature processing may be similar to the operation performed by the feature processing module 411, which is not described herein again. Correspondingly, the second branch network may also perform feature processing on the second image in the same manner, so as to obtain a second image feature F24, which is not described herein again.
In an alternative embodiment, the scale of the input image may be (4,224,224), the first feature processing module performs feature processing to obtain an image feature with the scale of (64,112,112), the second feature processing module performs feature processing to obtain an image feature with the scale of (128,56,56), the third feature processing module performs feature processing to obtain an image feature with the scale of (256,28,28), and the feature extraction module performs feature extraction to obtain the first image feature F14 and the second image feature F24 with the scales of (512,14, 14).
As shown in fig. 4, the target model may further include a stitching module 43, which stitches the first image feature F14 and the second image feature F24 to obtain the first target image feature. After a series of feature extraction operations are performed on the first target image feature, the input feature F6 of the fully connected layer is obtained with a scale of (512, 1, 1); the target model may further include a fully connected layer 44, which outputs the classification result based on the input feature F6.
In this embodiment, by using the difference characteristic of the position of the banding area in the first image pair in the same shooting scene, the image feature of one branch network is adjusted based on the image feature of the other branch network through the cross attention mechanism, and accordingly, the banding position feature in the image feature can be adjusted, so that the banding area information in the image feature of one branch network affects the feature information in the image feature of the other branch network. Therefore, the brightness difference of the same position in the image characteristics of the two images can be distinguished, so that the interference of a background area between the two images can be avoided, the change characteristics of the banding information are gradually and accurately extracted, the attention of a network to the banding area is promoted, and the accuracy of stroboscopic detection can be further improved.
Optionally, the step 101 specifically includes:
acquiring a second image pair in the shooting scene, wherein the second image pair comprises two third images with image formats of a preset image format and a single channel, and the third images comprise pixel points corresponding to four color types;
and performing first image preprocessing on the two third images to obtain the first image and the second image, wherein the first image and the second image are both images of four channels, pixel points in the images of the four channels comprise pixel values of the four channels, and each pixel value corresponds to one color type of the four color types.
In this embodiment, the first image pair may be obtained by format conversion of an image in a RAW domain bayer format.
Specifically, the second image pair may be RAW-domain images whose image format is the preset image format, that is, a bayer format. The bayer format may include pixel points corresponding to four color types: pixel points representing red (denoted R), pixel points representing blue (denoted B), green pixel points adjacent to red (denoted GR), and green pixel points adjacent to blue (denoted GB). The bayer format may be an RGGB format, a GRBG format, or the like.
The second image pair may be obtained in a plurality of manners, for example, two adjacent frames, that is, two consecutive frames of RAW domain images in a real-time shooting scene may be obtained as the second image pair, and for example, two adjacent frames in a video or two frames of images with a frame number separated by a preset threshold may be obtained, and format conversion is performed on the two frames of images to obtain the second image pair.
And then, respectively carrying out first image preprocessing on the two third images to obtain the first image and the second image. Wherein the first image pre-processing may comprise a format conversion, the purpose of the format conversion being to convert a bayer format into a four-channel image format. Fig. 8 is a schematic diagram of a process of converting a RAW domain image into an RGGB image, and as shown in fig. 8, a first image preprocessing is performed on the RAW domain image (an image format may be an RGGB format, a GRBG format, or the like), and the RAW domain image with a scale of (H, W) is converted into an RGGB image with a size of (H/2, W/2, 4), where a pixel point in the RGGB image includes pixel values of four channels, and each pixel value corresponds to one color type of the four color types.
The first image and the second image may be obtained based on the two RGGB images obtained by the format conversion, for example, the RGGB image may be normalized, and the size of the normalized RGGB image may be adjusted to (224, 224, 4).
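A minimal NumPy sketch of the format conversion of fig. 8, assuming an RGGB bayer layout in which even rows hold R/GR samples and odd rows hold GB/B samples (other bayer orders such as GRBG only change the indexing); the function name is illustrative:

```python
import numpy as np

def bayer_to_four_channel(raw: np.ndarray) -> np.ndarray:
    """Pack an (H, W) RGGB bayer RAW image into an (H/2, W/2, 4) image with channels (R, GR, GB, B)."""
    r  = raw[0::2, 0::2].astype(np.float32)   # red pixels
    gr = raw[0::2, 1::2].astype(np.float32)   # green pixels on the red rows
    gb = raw[1::2, 0::2].astype(np.float32)   # green pixels on the blue rows
    b  = raw[1::2, 1::2].astype(np.float32)   # blue pixels
    return np.stack([r, gr, gb, b], axis=-1)

raw = np.random.randint(64, 1023, size=(448, 448)).astype(np.uint16)
rggb = bayer_to_four_channel(raw)   # shape (224, 224, 4)
```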
In the embodiment, because the banding phenomenon is directly related to the change of the ambient brightness, and the pixels in the RAW domain image and the ambient brightness are in a linear relation, the RAW domain image is used for carrying out stroboscopic scene classification, so that the stroboscopic scene classification can be more consistent with the stroboscopic detection scene, and the accuracy of stroboscopic detection is improved.
Optionally, the performing the first image preprocessing on the two third images to obtain the first image and the second image includes:
performing format conversion on the two third images to obtain a third image pair, wherein the third image pair comprises two fourth images, and the fourth images are images of four channels;
and normalizing the two fourth images based on the maximum pixel values of the two fourth images and a preset black level to obtain the first image and the second image.
In this embodiment, the first image preprocessing may include format conversion and normalization processing. The normalization process is to normalize the RGGB image, which is the fourth image, based on the maximum pixel value of the RGGB image and a preset black level (e.g., 64 or 1024) for each RGGB image, and the formula is shown in the following expression (4).
In = (I - bl) / (max - bl)   (4)
where bl is the black level, max is the maximum pixel value, I is a pixel value in the RGGB image, and In is the pixel value obtained after normalization.
In this embodiment, normalization is performed based on the maximum pixel value of the RGGB image and a preset black level, and thus, the accuracy of image normalization can be improved.
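A short NumPy sketch of this normalization, under the assumption that expression (4) subtracts the black level and divides by (max - bl) as reconstructed above; the clipping step is an added safeguard, not taken from the patent:

```python
import numpy as np

def normalize_rggb(img: np.ndarray, black_level: float = 64.0) -> np.ndarray:
    """Normalize using the image's maximum pixel value and a preset black level, per expression (4)."""
    max_val = float(img.max())
    out = (img.astype(np.float32) - black_level) / (max_val - black_level)
    return np.clip(out, 0.0, 1.0)   # clip negatives from pixels below the black level
```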
It should be noted that the target model needs to be trained in advance before being used to fix the network parameters of the target model, and the training process will be described in detail in the following embodiments. The model training of the target model for strobe scene classification provided by the embodiments of the present application is described in detail below with reference to the accompanying drawings by specific embodiments and application scenarios thereof.
Fig. 9 is a flowchart of a model training method provided in an embodiment of the present application, and as shown in fig. 9, the method includes the following steps:
step 901, acquiring training sample data, wherein the training sample data comprises a fourth image pair under the same shooting scene and a stroboscopic scene classification label of the shooting scene, and the fourth image pair comprises a fifth image and a sixth image;
step 902, inputting the fourth image pair to a target model to perform a target operation, to obtain a second classification result, where the target operation includes: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; splicing the sixth image characteristic and the seventh image characteristic to obtain a second target image characteristic; performing stroboscopic scene classification on the shooting scene based on the second target image feature to obtain a second classification result, wherein the second classification result is used for representing the stroboscopic intensity level of the fourth image pair;
step 903, comparing the second classification result with the stroboscopic scene classification label to obtain a network loss value;
step 904, updating the network parameters in the target model based on the network loss value.
This embodiment describes the training process of the target model.
In step 901, the training sample data may include at least one shooting scene and a fourth image pair in each shooting scene, and the training sample data may further include a strobe scene classification label in each shooting scene.
The fourth image pair is obtained in a manner similar to that of the first image pair, which is not described herein again.
It should be noted that, in order to achieve a good training effect for the target model, the training sample data needs to include image pairs for each stroboscopic scene class, for example, image pairs without strobe, image pairs with light strobe, and image pairs with heavy strobe. In an optional embodiment, the fourth image pair may be obtained by fusing a background image that does not carry banding with banding mask images of various intensities, so as to simplify the acquisition of training sample data.
The stroboscopic scene classification label of the shooting scene can be obtained by manual labeling of a user, and can also be obtained by performing pixel statistics on a mask image carrying banding, which is not specifically limited herein.
In an alternative embodiment, the strobe scene classification label may be a one-dimensional vector with a scale of (3, 1, 1): in the case of no strobe the label may be [1, 0, 0], in the case of light strobe it may be [0, 1, 0], and in the case of heavy strobe it may be [0, 0, 1].
In step 902, a fourth image pair may be input to the target model to perform a target operation, resulting in a second classification result. The manner of inputting the fourth image pair to the target model to perform the target operation is similar to the manner of performing the stroboscopic scene classification of the shooting scene on the first image pair based on the target model in the above embodiment, and is not repeated here. Accordingly, the resulting second classification result is similar in concept to the first classification result, the second classification result characterizing the strobe intensity level of the fourth image pair, and the first classification result characterizing the strobe intensity level of the first image pair.
In step 903, the second classification result may be compared with the stroboscopic scene classification label to obtain a network loss value. In an optional embodiment, the second classification result may be compared with the stroboscopic scene classification label in a vector distance comparison manner, so as to obtain a network loss value.
In step 904, the network parameters of the target model may be updated by a gradient descent method, and the network parameters of the target model may be continuously updated by a loop iteration method until the difference between the second classification result and the stroboscopic scene classification label, that is, the network loss value is smaller than a certain threshold value and convergence is achieved, at which time the training of the target model may be completed.
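One training iteration could be sketched as below; target_model is assumed to be the dual-branch network taking the image pair and returning a per-sample three-dimensional classification vector, and using the squared vector distance as the network loss is one possible reading of the "vector distance comparison" above, not the patent's stated loss:

```python
import torch
import torch.nn as nn

def train_step(target_model: nn.Module, optimizer: torch.optim.Optimizer,
               fifth_image: torch.Tensor, sixth_image: torch.Tensor,
               label: torch.Tensor) -> float:
    """One iteration: forward the fourth image pair, compare with the label, update parameters."""
    optimizer.zero_grad()
    second_result = target_model(fifth_image, sixth_image)      # second classification result
    loss = ((second_result - label) ** 2).sum(dim=1).mean()     # squared vector distance to the label
    loss.backward()
    optimizer.step()                                            # gradient-descent parameter update
    return loss.item()
```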
In this embodiment, training sample data is obtained, where the training sample data includes a fourth image pair in the same shooting scene and a stroboscopic scene classification label of the shooting scene, and the fourth image pair includes a fifth image and a sixth image; inputting the fourth image pair into a target model to execute a target operation to obtain a second classification result, wherein the target operation comprises: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; splicing the sixth image characteristic and the seventh image characteristic to obtain a second target image characteristic; based on the second target image feature, performing stroboscopic scene classification on the shooting scene to obtain a second classification result, wherein the second classification result is used for representing the stroboscopic intensity level of the fourth image pair; comparing the second classification result with the stroboscopic scene classification label to obtain a network loss value; updating network parameters in the target model based on the network loss values. Therefore, training of the target model can be achieved, the target model can be used for stroboscopic scene classification of shooting scenes, and accuracy of stroboscopic detection is improved.
Optionally, the target model includes a first branch network and a second branch network, the first branch network is configured to perform feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image, and the second branch network is configured to perform feature processing on the fifth image and the sixth image to obtain a seventh image feature of the sixth image.
In this embodiment, the target model may include two branch networks. The first branch network and the second branch network may perform feature processing on the images independently of each other. Because the positions of the banding areas in the two images of the same shooting scene are different, the first branch network and the second branch network may perform feature processing in a mutually constrained manner based on an attention mechanism: the image features extracted by the second branch network are adjusted based on the image features extracted by the first branch network, and the image features extracted by the first branch network are adjusted based on the image features extracted by the second branch network, so as to improve the attention of the networks to the banding areas.
In the embodiment, the target model can accept the input of two images through two branch networks, so that the stroboscopic scene classification can be performed by using the difference of the position of the banding area in the two images in the same shooting scene and the banding brightness characteristic.
Optionally, the step 902 specifically includes:
performing feature extraction on the fifth image to obtain eighth image features;
extracting second weight information of a ninth image feature in a spatial dimension based on an attention mechanism, wherein the second weight information is used for indicating attention degree of each pixel region in the ninth image feature, the ninth image feature is obtained by performing the feature extraction based on the sixth image, and when the sixth image carries a stripe, a third weight value in the second weight information is larger than a fourth weight value, the third weight value is used for representing attention degree of the stripe region in the ninth image feature, and the fourth weight value is used for representing attention degree of other regions except the stripe region in the ninth image feature;
multiplying the second weight information and the eighth image characteristic to obtain a tenth image characteristic;
and performing feature processing on the tenth image feature to obtain a sixth image feature.
In this embodiment, the first branch network and the second branch network may perform feature processing of images in a mutually constrained manner based on an attention mechanism, that is, the target model may be a network model with double inputs, in which two input images have mutual reference and are mutually constrained.
The manner in which the first branch network performs feature processing on the fifth image is similar to the manner in which the first image performs feature processing, and details are not described here.
In this embodiment, by using the characteristic that the positions of the banding areas differ within an image pair of the same shooting scene, the image feature of one branch network is adjusted based on the image feature of the other branch network through the cross attention mechanism, and accordingly the banding position feature in the image feature can be adjusted, so that the banding area information in the image feature of one branch network affects the feature information in the image feature of the other branch network. Therefore, the brightness difference at the same position in the image features of the two images can be distinguished, the interference of the background area between the two images can be avoided, and the change features of the banding information are gradually and accurately extracted, which improves the attention of the network to the banding area and can further improve the accuracy of strobe detection.
Optionally, the step 901 specifically includes:
acquiring a fifth image pair in the shooting scene, wherein the fifth image pair comprises two seventh images with image formats of a preset image format and a single channel, and the seventh images comprise pixel points corresponding to four color types; acquiring a sixth image pair, wherein the sixth image pair comprises two mask images which carry stripes and are single-channel, the stroboscopic intensity of the two mask images is the same, but the positions of the stripes are different;
performing second image preprocessing on two seventh images in the fifth image pair to obtain a seventh image pair, wherein the seventh image pair comprises two images with four channels, pixel points in the images with four channels comprise pixel values of four channels, and each pixel value corresponds to one color type in the four color types; performing third image preprocessing on the two mask images to obtain an eighth image pair, wherein the eighth image pair comprises the two mask images with four channels;
multiplying the seventh image pair and the eighth image pair to obtain a fourth image pair; and counting the pixel values in the eighth image pair to obtain the stroboscopic scene classification label.
In this embodiment, the fourth image pair may be obtained by fusing a background image that does not carry banding with banding mask images of various intensities, so that the acquisition of training sample data can be simplified.
The image format of the fifth image pair is similar to that of the second image pair: both are RAW domain images in a bayer format. The fifth image pair may be obtained in a manner similar to that of the second image pair, which is not described here again.
The sixth image pair carries stripes, or banding, and may include two single-channel mask images, which may be grayscale images; the strobe intensities of the two mask images are the same, but the stripe positions are different. The sixth image pair may be obtained in multiple manners; for example, a prestored sixth image pair may be obtained, or a sixth image pair sent by another electronic device may be received.
Then, for each image in the fifth image pair, second image preprocessing may be performed on the image, and a manner of performing the second image preprocessing on the image may be similar to a manner of performing the first image preprocessing on the image in the second image pair, which is not described herein again.
Correspondingly, the third image preprocessing may be performed on the sixth image pair to obtain an eighth image pair, where the eighth image pair includes two mask images with four channels.
The third image preprocessing may include format conversion, to convert the images in the sixth image pair (which are single-channel images) into four-channel images. Before the format conversion, in order to improve the generalization of the target model, the third image preprocessing may further include randomly transforming the two mask images to adjust the banding strength. After the format conversion, in order to further simplify the acquisition of training sample data so that a large number of mask images with different banding strengths can be obtained, the third image preprocessing may further include a process for changing the degree of banding in the mask images.
Specifically, in order to improve the generalization of the target model, the two mask images are randomly transformed to adjust the banding strength. The two mask images are grayscale images; to adapt to the four-channel bayer format, the grayscale images are normalized and replicated into four-channel images, and the four-channel mask images are then transformed. First, a random exponentiation may be applied to the mask map to change the overall lightness degree of the banding, with the adjustment formula shown in formula (5) below.
M1 = M^rt, rt ∈ (0.5, 5)    (5)
Wherein, M is the lightness degree of the banding before the strength adjustment, and M1 is the lightness degree of the banding after the strength adjustment.
Then, the four channels are respectively multiplied by different random enhancement coefficients to change the color form of the banding, where the enhancement coefficients for the channel index c corresponding to the two G channels are the same; the adjustment formula is shown in formula (6) below.
M2_c = 1 - e(1 - M1_c), c ∈ {R, G, B}, e ∈ (0.5, 1)    (6)
Wherein, M1_c is the lightness degree of the banding of channel index c before the color-form adjustment, and M2_c is the lightness degree of the banding of channel index c after the color-form adjustment.
The transformation parameters of the two mask images are the same. An eighth image pair may then be obtained, comprising the two mask images after the enhancement transformation, denoted K1 and K2, respectively.
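As a rough sketch of the augmentation in formulas (5) and (6), the following NumPy snippet applies the random exponentiation and the per-colour enhancement coefficients; variable names, the (R, G, G, B) channel order and the (4, H, W) shape are illustrative assumptions.

import numpy as np

rng = np.random.default_rng()

def augment_mask(M: np.ndarray) -> np.ndarray:
    # M: normalized four-channel mask of shape (4, H, W), channel order (R, G, G, B)
    # Formula (5): random exponentiation changes the overall banding lightness.
    rt = rng.uniform(0.5, 5.0)
    M1 = M ** rt
    # Formula (6): per-colour random enhancement coefficients change the colour
    # form of the banding; the two G channels share the same coefficient.
    e_r, e_g, e_b = rng.uniform(0.5, 1.0, size=3)
    coeffs = np.array([e_r, e_g, e_g, e_b]).reshape(4, 1, 1)
    M2 = 1.0 - coeffs * (1.0 - M1)
    return M2

# The same rt and coefficients would be reused for both masks K1 and K2,
# since the transformation parameters of the two mask images are the same.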
Then, the banding intensity may be calculated based on the eighth image pair. Specifically, the pixel values of the mask images in the eighth image pair may be counted to determine the minimum pixel value P_min, and the banding strength is judged according to P_min. The banding strength is denoted gt, and y denotes the one-hot code of gt.
In an alternative embodiment, the relationship between the pixel value P_min and gt can be expressed by the following formula (7), and the relationship between gt and y can be expressed by the following formula (8).
[Formulas (7) and (8) are rendered as images in the source. Formula (7) maps P_min to the banding strength gt ∈ {0, 1, 2} by threshold comparison, and formula (8) maps gt to its one-hot code y; the specific thresholds are not reproduced in the text.]
Wherein 0 represents no strobe, i.e., no banding in the image; 1 represents light strobe, i.e., light banding in the image; and 2 represents heavy strobe, i.e., heavy banding in the image.
Accordingly, based on the minimum pixel value of the mask image in the eighth image pair, and in combination with equations (7) and (8) above, a stroboscopic scene classification label can be determined.
The two mask images K1 and K2 are respectively resized so that their sizes match those of the two images R1 and R2 in the seventh image pair. Banding is then added to R1 and R2: specifically, K1 is multiplied with R1 and K2 with R2, and the results are resized to the input size of the target model, for example (4, 224, 224), so as to obtain the fourth image pair, i.e., the input images of the target model, denoted KR1 and KR2, respectively.
In this embodiment, because the banding phenomenon is directly related to changes in ambient brightness, and the pixels of a RAW domain image are in a linear relationship with the ambient brightness, using RAW domain images for stroboscopic scene classification better matches the strobe detection scenario and improves the accuracy of strobe detection. Moreover, the input images of the target model are obtained by fusing a background image that does not carry banding with banding mask images of various intensities, and the stroboscopic scene classification label for model training is obtained based on the banding mask images, so that the acquisition of training sample data can be simplified.
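The following sketch illustrates the label construction and the mask/background fusion just described. The thresholds in formula (7) are not given in the text, so T_LIGHT and T_HEAVY below are placeholder assumptions, as are the use of OpenCV for resizing and the array shapes.

import numpy as np
import cv2  # used only for resizing; any resize routine would do

T_LIGHT, T_HEAVY = 0.9, 0.6   # hypothetical cut-offs on the minimum mask value

def strobe_label(K1: np.ndarray, K2: np.ndarray):
    p_min = min(K1.min(), K2.min())
    if p_min >= T_LIGHT:
        gt = 0            # no strobe: the mask barely darkens the image
    elif p_min >= T_HEAVY:
        gt = 1            # light strobe
    else:
        gt = 2            # heavy strobe
    y = np.eye(3)[gt]     # one-hot code of gt
    return gt, y

def fuse(K: np.ndarray, R: np.ndarray, size=(224, 224)) -> np.ndarray:
    # K, R: four-channel arrays of shape (4, H, W); K is assumed to already be
    # resized to match R. Multiply to add banding, then resize each channel to
    # the model input size, e.g. (4, 224, 224).
    KR = K * R
    return np.stack([cv2.resize(c, size) for c in KR])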
Optionally, the acquiring a fifth image pair in the shooting scene includes:
acquiring a ninth image pair under the shooting scene from a video, wherein the ninth image pair comprises two frame images with the frame number separated by a preset threshold value in the video;
and performing format conversion on the two frames of images in the ninth image pair to obtain the fifth image pair.
In this embodiment, the ninth image pair may consist of three-channel images, and the image format may be RGB. The preset threshold can be set according to actual conditions; generally it should not be set too large, otherwise the two acquired images may no longer be images of the same shooting scene.
In an alternative embodiment, 8 consecutive frames may be obtained every 5 seconds in the video (at 30 frames per second, i.e., every 150 frames), and two random frames may be taken from these 8 frames as the ninth image pair, denoted I1 and I2, respectively.
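A rough sketch of such frame sampling with OpenCV follows; the video path, stride and group length are illustrative assumptions.

import random
import cv2

def sample_pair(video_path: str, group_stride: int = 150, group_len: int = 8):
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % group_stride < group_len:   # 8 consecutive frames every ~5 s at 30 fps
            frames.append((idx // group_stride, frame))
        idx += 1
    cap.release()
    # group frames and pick two random ones from one group as the ninth image pair
    groups = {}
    for g, f in frames:
        groups.setdefault(g, []).append(f)
    group = random.choice([g for g in groups.values() if len(g) >= 2])
    return random.sample(group, 2)   # I1, I2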
Format conversion is then performed on the ninth image pair to obtain the fifth image pair, i.e., I1 and I2 are converted from RGB images into RAW domain images.
The format conversion method comprises the following specific steps:
step 1, passing formula
Figure BDA0003720966640000211
Will I 1 And I 2 And (6) normalizing.
Step 2: inverse gamma operation. When a RAW domain image is converted into an RGB image, a gamma operation is applied to the RAW domain image in order to better match human vision and enhance contrast, with the formula I_RGB = I_RAW^g, where g is typically 1/2.2. Therefore, the inverse gamma operation may be I_gj = I_j^2.2, j ∈ {1, 2}. Through this processing, the images after the inverse gamma operation, namely I_g1 and I_g2, are obtained.
Step 3: inverse color correction. To convert a RAW domain image into an RGB image, the RAW domain image is usually multiplied by a color conversion matrix; therefore, when converting an RGB image back into a RAW domain image, the image needs to be multiplied by the inverse of that conversion matrix, denoted ccm below. In an alternative embodiment, the inverse of the conversion matrix may be:
[The inverse conversion matrix is rendered as an image in the source and its values are not reproduced here.]
The RGB image, arranged as three channels of shape (h, w, c), is converted into a two-dimensional matrix of shape (h × w, c), as shown in fig. 10, which facilitates the computation.
Define I_s1 and I_s2 as the results of converting I_g1 and I_g2 as shown in fig. 10. These are multiplied by ccm through matrix multiplication and reshaped back to (h, w, c), obtaining I_c1 and I_c2.
Through steps 2 and 3, the non-linear relationship between the RGB image and the ambient brightness is converted into a near-linear relationship. Then, a RAW domain image in a bayer format is obtained by an inverse demosaicing method, shown in fig. 11: I_c1 and I_c2 are converted into the bayer format, resulting in the fifth image pair.
Then, the RAW domain images in the bayer format are split into four channels according to the color channels, resulting in the two images R1 and R2 in the seventh image pair.
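A compact sketch of steps 1-3, the inverse demosaic, and the split into four colour channels follows. The normalization constant, the ccm values and the RGGB bayer layout are illustrative assumptions, since the source gives these details only as figures.

import numpy as np

CCM_INV = np.eye(3, dtype=np.float32)   # placeholder for the inverse conversion matrix ccm

def rgb_to_raw4(I: np.ndarray) -> np.ndarray:
    # I: (h, w, 3) uint8 RGB frame; returns a (4, h//2, w//2) RAW-domain tensor
    I = I.astype(np.float32) / 255.0               # step 1: normalization (assumed /255)
    I = I ** 2.2                                   # step 2: inverse gamma
    h, w, c = I.shape
    I = (I.reshape(h * w, c) @ CCM_INV).reshape(h, w, c)   # step 3: inverse colour correction
    # inverse demosaic to a bayer mosaic (assumed RGGB layout, even h and w)
    bayer = np.empty((h, w), dtype=np.float32)
    bayer[0::2, 0::2] = I[0::2, 0::2, 0]           # R
    bayer[0::2, 1::2] = I[0::2, 1::2, 1]           # G
    bayer[1::2, 0::2] = I[1::2, 0::2, 1]           # G
    bayer[1::2, 1::2] = I[1::2, 1::2, 2]           # B
    # pack the bayer mosaic into four colour-plane channels (R, G, G, B)
    return np.stack([bayer[0::2, 0::2], bayer[0::2, 1::2],
                     bayer[1::2, 0::2], bayer[1::2, 1::2]])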
RAW domain images carrying banding are difficult to collect, and manual classification is subjective and hard to perform accurately. Therefore, in this embodiment, two frames of images taken at arbitrary positions of a video are converted into RAW domain images through data synthesis, and objectively classified banding mask images are added to the RAW domain images to form the input images of the target model, thereby further simplifying the acquisition of training sample data.
It should be noted that, for the stroboscopic scene classification method provided in the embodiment of the present application, the execution subject may be a stroboscopic scene classification device, or a control module in the stroboscopic scene classification device that is used for executing the stroboscopic scene classification method. In the embodiment of the present application, a stroboscopic scene classification device executing the stroboscopic scene classification method is taken as an example to describe the stroboscopic scene classification device provided in the embodiment of the present application.
Referring to fig. 12, fig. 12 is a structural diagram of a strobe scene classification device according to an embodiment of the present application, and as shown in fig. 12, a strobe scene classification device 1200 includes:
a first obtaining module 1201, configured to obtain a first image pair in the same shooting scene, where the first image pair includes a first image and a second image;
a feature processing module 1202, configured to perform feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image;
a stitching module 1203, configured to stitch the first image feature and the second image feature to obtain a first target image feature;
a classification module 1204, configured to perform stroboscopic scene classification on the shooting scene based on the first target image feature, to obtain a first classification result, where the first classification result is used to characterize a stroboscopic intensity level of the first image pair.
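For illustration only, the following sketch shows how the four modules above could map onto a dual-input model in PyTorch: per-image feature processing, concatenation (splicing) of the two features, and a strobe-level classifier. The backbone choice and layer sizes are assumptions, not details stated in this application.

import torch
import torch.nn as nn

class StrobeSceneClassifier(nn.Module):
    def __init__(self, num_levels: int = 3):
        super().__init__()
        self.branch1 = nn.Sequential(nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(1))   # feature processing, first image
        self.branch2 = nn.Sequential(nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(1))   # feature processing, second image
        self.classifier = nn.Linear(64, num_levels)              # classify the spliced features

    def forward(self, img1, img2):
        f1 = self.branch1(img1).flatten(1)        # first image feature
        f2 = self.branch2(img2).flatten(1)        # second image feature
        target_feat = torch.cat([f1, f2], dim=1)  # splice -> first target image feature
        return self.classifier(target_feat)       # strobe intensity level logits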
Optionally, the feature processing module 1202 is specifically configured to:
performing feature extraction on the first image to obtain a third image feature;
extracting first weight information of a fourth image feature in a spatial dimension based on an attention mechanism, wherein the first weight information is used for indicating attention degrees of pixel regions in the fourth image feature, the fourth image feature is obtained by performing feature extraction based on the second image, and when a stripe is carried in the second image, a first weight value in the first weight information is greater than a second weight value, the first weight value is used for representing the attention degrees of the stripe regions in the fourth image feature, and the second weight value is used for representing the attention degrees of other regions except the stripe regions in the fourth image feature;
multiplying the first weight information and the third image characteristic to obtain a fifth image characteristic;
and performing feature processing on the fifth image feature to obtain the first image feature.
Optionally, the first obtaining module 1201 includes:
the first acquisition unit is used for acquiring a second image pair under the shooting scene, wherein the second image pair comprises two third images with image formats of a preset image format and a single channel, and the third images comprise pixel points corresponding to four color types;
the first image preprocessing unit is configured to perform first image preprocessing on the two third images to obtain the first image and the second image, where the first image and the second image are both images of four channels, a pixel point in the image of the four channels includes pixel values of the four channels, and each pixel value corresponds to one color type of the four color types.
Optionally, the first image preprocessing unit is specifically configured to:
performing format conversion on the two third images to obtain a third image pair, wherein the third image pair comprises two fourth images, and the fourth images are images of four channels;
and normalizing the two fourth images based on the maximum pixel values of the two fourth images and a preset black level to obtain the first image and the second image.
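One plausible reading of this normalization step, sketched below as an assumption (the exact formula is not given in this passage), is to subtract the preset black level and scale by the per-image maximum minus the black level; the black level value is illustrative.

import numpy as np

def normalize_raw(img: np.ndarray, black_level: float = 64.0) -> np.ndarray:
    # img: four-channel RAW-domain array; black_level is an illustrative value
    return np.clip((img.astype(np.float32) - black_level) /
                   (img.max() - black_level + 1e-6), 0.0, 1.0)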
In this embodiment, a first obtaining module 1201 obtains a first image pair in the same shooting scene, where the first image pair includes a first image and a second image; performing feature processing on the first image and the second image through a feature processing module 1202 to obtain a first image feature of the first image and a second image feature of the second image; splicing the first image feature and the second image feature through a splicing module 1203 to obtain a first target image feature; based on the first target image feature, the classification module 1204 performs stroboscopic scene classification on the shooting scene to obtain a first classification result, where the first classification result is used to characterize a stroboscopic intensity level of the first image pair. Therefore, the shooting scenes can be classified into stroboscopic scenes, the stroboscopic intensity level of the image pairs can be evaluated as a whole, and the accuracy of stroboscopic detection can be improved.
The stroboscopic scene classification apparatus in the embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in an electronic device. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The strobe scene classification device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android (Android) operating system, an ios operating system, or other possible operating systems, and embodiments of the present application are not limited specifically.
The stroboscopic scene classification device provided in the embodiment of the present application can implement each process implemented in the method embodiment of fig. 1, and is not described here again to avoid repetition.
It should be noted that, for the model training method provided in the embodiment of the present application, the execution subject may be a model training device, or a control module in the model training device that is used for executing the model training method. In the embodiment of the present application, a model training device executing the model training method is taken as an example to describe the model training device provided in the embodiment of the present application.
Referring to fig. 13, fig. 13 is a block diagram of a model training apparatus according to an embodiment of the present application, and as shown in fig. 13, a model training apparatus 1300 includes:
a second obtaining module 1301, configured to obtain training sample data, where the training sample data includes a fourth image pair in the same shooting scene and a stroboscopic scene classification label of the shooting scene, and the fourth image pair includes a fifth image and a sixth image;
a target operation module 1302, configured to perform a target operation on the fourth image pair input to the target model to obtain a second classification result, where the target operation includes: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; splicing the sixth image characteristic and the seventh image characteristic to obtain a second target image characteristic; based on the second target image feature, performing stroboscopic scene classification on the shooting scene to obtain a second classification result, wherein the second classification result is used for representing the stroboscopic intensity level of the fourth image pair;
a comparing module 1303, configured to compare the second classification result with the stroboscopic scene classification label to obtain a network loss value;
an updating module 1304 for updating the network parameters in the target model based on the network loss value.
Optionally, the target model includes a first branch network and a second branch network, the first branch network is configured to perform feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image, and the second branch network is configured to perform feature processing on the fifth image and the sixth image to obtain a seventh image feature of the sixth image.
Optionally, the target operation module 1302 is specifically configured to:
performing feature extraction on the fifth image to obtain eighth image features;
extracting second weight information of a ninth image feature in a spatial dimension based on an attention mechanism, wherein the second weight information is used for indicating attention degrees of pixel regions in the ninth image feature, the ninth image feature is obtained by performing the feature extraction based on the sixth image, and in the case that the sixth image carries a stripe, a third weight value in the second weight information is greater than a fourth weight value, the third weight value is used for representing the attention degrees of the stripe regions in the ninth image feature, and the fourth weight value is used for representing the attention degrees of other regions except the stripe regions in the ninth image feature;
multiplying the second weight information and the eighth image characteristic to obtain a tenth image characteristic;
and performing feature processing on the tenth image feature to obtain a sixth image feature.
Optionally, the second obtaining module 1301 includes:
the second acquisition unit is used for acquiring a fifth image pair in the shooting scene, wherein the fifth image pair comprises two seventh images with image formats of a preset image format and a single channel, and the seventh images comprise pixel points corresponding to four color types; acquiring a sixth image pair, wherein the sixth image pair comprises two mask images which carry stripes and are single-channel, the stroboscopic intensity of the two mask images is the same, but the positions of the stripes are different;
a second image preprocessing unit, configured to perform second image preprocessing on two seventh images in the fifth image pair to obtain a seventh image pair, where the seventh image pair includes two images with four channels, a pixel point in the image with four channels includes pixel values of four channels, and each pixel value corresponds to one color type of the four color types;
the third image preprocessing unit is used for carrying out third image preprocessing on the two mask images to obtain an eighth image pair, and the eighth image pair comprises the two mask images with four channels;
a multiplication processing unit, configured to multiply the seventh image pair and the eighth image pair to obtain a fourth image pair; and counting the pixel values in the eighth image pair to obtain the stroboscopic scene classification label.
Optionally, the second obtaining unit is specifically configured to:
acquiring a ninth image pair under the shooting scene from a video, wherein the ninth image pair comprises two frame images with the frame number separated by a preset threshold value in the video;
and performing format conversion on the two frames of images in the ninth image pair to obtain the fifth image pair.
In this embodiment, training sample data is obtained through the second obtaining module 1301, where the training sample data includes a fourth image pair in the same shooting scene and a stroboscopic scene classification label of the shooting scene, and the fourth image pair includes a fifth image and a sixth image; the fourth image pair is input to the target model through the target operation module 1302 to perform a target operation, which includes: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; splicing the sixth image feature and the seventh image feature to obtain a second target image feature; and performing stroboscopic scene classification on the shooting scene based on the second target image feature to obtain a second classification result, where the second classification result is used to characterize the stroboscopic intensity level of the fourth image pair. The second classification result is compared with the stroboscopic scene classification label through the comparison module 1303 to obtain a network loss value, and the network parameters in the target model are updated based on the network loss value by the update module 1304. In this way, training of the target model can be realized, so that the target model can be used for stroboscopic scene classification of shooting scenes, improving the accuracy of strobe detection.
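A minimal training-step sketch for such a dual-input model follows; the model class, the choice of cross-entropy loss and the optimizer are assumptions not stated in this passage.

import torch
import torch.nn as nn

def train_step(model, optimizer, kr1, kr2, label):
    # kr1, kr2: (B, 4, 224, 224) fused input pair; label: (B,) strobe level in {0, 1, 2}, long dtype
    logits = model(kr1, kr2)                             # feature processing, splicing, classification
    loss = nn.functional.cross_entropy(logits, label)    # compare classification result with label
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                     # update network parameters
    return loss.item()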
The model training apparatus in the embodiment of the present application may be an apparatus, and may also be a component, an integrated circuit, or a chip in an electronic device. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The model training apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android (Android) operating system, an ios operating system, or other possible operating systems, and embodiments of the present application are not limited specifically.
The model training device provided in the embodiment of the present application can implement each process implemented in the method embodiment of fig. 9, and is not described here again to avoid repetition.
Optionally, as shown in fig. 14, an electronic device 1400 is further provided in this embodiment of the present application, and includes a processor 1401, a memory 1402, and a program or an instruction stored in the memory 1402 and executable on the processor 1401, where the program or the instruction is executed by the processor 1401 to implement each process of the above stroboscopic scene classification method embodiment, or to implement each process of the above model training method embodiment, and can achieve the same technical effect, and details are not repeated here to avoid repetition.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 15 is a schematic hardware structure diagram of an electronic device implementing an embodiment of the present application.
The electronic device 1500 includes, but is not limited to: a radio frequency unit 1501, a network module 1502, an audio output unit 1503, an input unit 1504, a sensor 1505, a display unit 1506, a user input unit 1507, an interface unit 1508, a memory 1509, and a processor 1510.
Those skilled in the art will appreciate that the electronic device 1500 may also include a power supply (e.g., a battery) for powering the various components, which may be logically coupled to the processor 1510 via a power management system to perform functions such as managing charging, discharging, and power consumption via the power management system. The electronic device structure shown in fig. 15 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than those shown, or combine some components, or arrange different components, and thus, the description is omitted here.
The electronic device may be configured to implement a stroboscopic scene classification method, wherein the processor 1510 is configured to:
acquiring a first image pair under the same shooting scene, wherein the first image pair comprises a first image and a second image;
performing feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image;
splicing the first image characteristic and the second image characteristic to obtain a first target image characteristic;
and based on the first target image characteristics, carrying out stroboscopic scene classification on the shooting scene to obtain a first classification result, wherein the first classification result is used for representing the stroboscopic intensity level of the first image pair.
In this embodiment, a first image pair in the same shooting scene is obtained by the processor 1510, where the first image pair includes a first image and a second image; performing feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image; splicing the first image characteristic and the second image characteristic to obtain a first target image characteristic; and based on the first target image characteristics, carrying out stroboscopic scene classification on the shooting scene to obtain a first classification result, wherein the first classification result is used for representing the stroboscopic intensity level of the first image pair. Therefore, the shooting scene can be classified into the stroboscopic scenes, the stroboscopic intensity level of the image pair can be evaluated as a whole, and the accuracy of stroboscopic detection is improved.
Optionally, the processor 1510 is further configured to:
performing feature extraction on the first image to obtain a third image feature;
extracting first weight information of a fourth image feature on a spatial dimension based on an attention mechanism, wherein the first weight information is used for indicating attention degree of each pixel region in the fourth image feature, the fourth image feature is obtained by performing the feature extraction based on the second image, and when the second image carries a stripe, a first weight value in the first weight information is larger than a second weight value, the first weight value is used for representing attention degree of the stripe region in the fourth image feature, and the second weight value is used for representing attention degree of other regions except the stripe region in the fourth image feature;
multiplying the first weight information and the third image characteristic to obtain a fifth image characteristic;
and performing feature processing on the fifth image feature to obtain the first image feature.
Optionally, the processor 1510 is further configured to:
acquiring a second image pair in the shooting scene, wherein the second image pair comprises two third images with image formats of a preset image format and a single channel, and the third images comprise pixel points corresponding to four color types;
and performing first image preprocessing on the two third images to obtain the first image and the second image, wherein the first image and the second image are both images of four channels, pixel points in the images of the four channels comprise pixel values of the four channels, and each pixel value corresponds to one color type of the four color types.
Optionally, the processor 1510 is further configured to:
performing format conversion on the two third images to obtain a third image pair, wherein the third image pair comprises two fourth images, and the fourth images are images of four channels;
and normalizing the two fourth images based on the maximum pixel values of the two fourth images and a preset black level to obtain the first image and the second image.
The electronic device may further be configured to implement a model training method, wherein the processor 1510 is configured to:
acquiring training sample data, wherein the training sample data comprises a fourth image pair under the same shooting scene and a stroboscopic scene classification label of the shooting scene, and the fourth image pair comprises a fifth image and a sixth image;
inputting the fourth image pair into a target model to execute a target operation to obtain a second classification result, wherein the target operation comprises: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; splicing the sixth image characteristic and the seventh image characteristic to obtain a second target image characteristic; performing stroboscopic scene classification on the shooting scene based on the second target image feature to obtain a second classification result, wherein the second classification result is used for representing the stroboscopic intensity level of the fourth image pair;
comparing the second classification result with the stroboscopic scene classification label to obtain a network loss value;
updating network parameters in the target model based on the network loss values.
Optionally, the target model includes a first branch network and a second branch network, the first branch network is configured to perform feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image, and the second branch network is configured to perform feature processing on the fifth image and the sixth image to obtain a seventh image feature of the sixth image.
Optionally, the processor 1510 is further configured to:
performing feature extraction on the fifth image to obtain eighth image features;
extracting second weight information of a ninth image feature in a spatial dimension based on an attention mechanism, wherein the second weight information is used for indicating attention degrees of pixel regions in the ninth image feature, the ninth image feature is obtained by performing the feature extraction based on the sixth image, and in the case that the sixth image carries a stripe, a third weight value in the second weight information is greater than a fourth weight value, the third weight value is used for representing the attention degrees of the stripe regions in the ninth image feature, and the fourth weight value is used for representing the attention degrees of other regions except the stripe regions in the ninth image feature;
multiplying the second weight information and the eighth image characteristic to obtain a tenth image characteristic;
and performing feature processing on the tenth image feature to obtain a sixth image feature.
Optionally, the processor 1510 is further configured to:
acquiring a fifth image pair in the shooting scene, wherein the fifth image pair comprises two seventh images which are in a single channel and have preset image formats, and the seventh images comprise pixel points corresponding to four color types; acquiring a sixth image pair, wherein the sixth image pair comprises two mask images which carry stripes and are single-channel, the stroboscopic intensity of the two mask images is the same, but the positions of the stripes are different;
performing second image preprocessing on two seventh images in the fifth image pair to obtain a seventh image pair, wherein the seventh image pair comprises two images with four channels, pixel points in the images with four channels comprise pixel values of four channels, and each pixel value corresponds to one color type in the four color types; performing third image preprocessing on the two mask images to obtain an eighth image pair, wherein the eighth image pair comprises the two mask images with four channels;
multiplying the seventh image pair and the eighth image pair to obtain a fourth image pair; and counting the pixel values in the eighth image pair to obtain the stroboscopic scene classification label.
Optionally, the processor 1510 is further configured to:
acquiring a ninth image pair under the shooting scene from a video, wherein the ninth image pair comprises two frame images with the frame number separated by a preset threshold value in the video;
and performing format conversion on the two frames of images in the ninth image pair to obtain the fifth image pair.
It should be understood that in the embodiment of the present application, the input Unit 1504 may include a Graphics Processing Unit (GPU) 15041 and a microphone 15042, and the Graphics processor 15041 processes image data of still pictures or videos obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 1506 may include a display panel 15061, and the display panel 15061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1507 includes a touch panel 15071 and other input devices 15072. A touch panel 15071, also referred to as a touch screen. The touch panel 15071 may include two parts of a touch detection device and a touch controller. Other input devices 15072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein. The memory 1509 may be used to store software programs as well as various data including, but not limited to, application programs and an operating system. The processor 1510 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 1510.
The embodiments of the present application further provide a readable storage medium, where a program or an instruction is stored, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the above-mentioned stroboscopic scene classification method embodiment, or implements each process of the above-mentioned model training method embodiment, and can achieve the same technical effect, and in order to avoid repetition, the detailed description is omitted here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the above stroboscopic scene classification method embodiment, or to implement each process of the above model training method embodiment, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling an electronic device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the present embodiments are not limited to those precise embodiments, which are intended to be illustrative rather than restrictive, and that various changes and modifications may be effected therein by one skilled in the art without departing from the scope of the appended claims.

Claims (20)

1. A method of strobe scene classification, the method comprising:
acquiring a first image pair under the same shooting scene, wherein the first image pair comprises a first image and a second image;
performing feature processing on the first image and the second image to obtain a first image feature of the first image and a second image feature of the second image;
splicing the first image characteristic and the second image characteristic to obtain a first target image characteristic;
and based on the first target image characteristics, carrying out stroboscopic scene classification on the shooting scene to obtain a first classification result, wherein the first classification result is used for representing the stroboscopic intensity level of the first image pair.
2. The method of claim 1, wherein performing feature processing on the first image and the second image to obtain a first image feature of the first image comprises:
performing feature extraction on the first image to obtain a third image feature;
extracting first weight information of a fourth image feature on a spatial dimension based on an attention mechanism, wherein the first weight information is used for indicating attention degree of each pixel region in the fourth image feature, the fourth image feature is obtained by performing the feature extraction based on the second image, and when the second image carries a stripe, a first weight value in the first weight information is larger than a second weight value, the first weight value is used for representing attention degree of the stripe region in the fourth image feature, and the second weight value is used for representing attention degree of other regions except the stripe region in the fourth image feature;
multiplying the first weight information and the third image characteristic to obtain a fifth image characteristic;
and performing feature processing on the fifth image feature to obtain the first image feature.
3. The method of claim 1, wherein the obtaining a first pair of images of a same capture scene comprises:
acquiring a second image pair in the shooting scene, wherein the second image pair comprises two third images with image formats of a preset image format and a single channel, and the third images comprise pixel points corresponding to four color types;
and performing first image preprocessing on the two third images to obtain a first image and a second image, wherein the first image and the second image are both images of four channels, pixel points in the images of the four channels comprise pixel values of the four channels, and each pixel value corresponds to one of the four color types.
4. The method according to claim 3, wherein the performing the first image preprocessing on the two third images to obtain the first image and the second image comprises:
performing format conversion on the two third images to obtain a third image pair, wherein the third image pair comprises two fourth images, and the fourth images are images of four channels;
and normalizing the two fourth images based on the maximum pixel values of the two fourth images and a preset black level to obtain the first image and the second image.
5. A method of model training, the method comprising:
acquiring training sample data, wherein the training sample data comprises a fourth image pair under the same shooting scene and a stroboscopic scene classification label of the shooting scene, and the fourth image pair comprises a fifth image and a sixth image;
inputting the fourth image pair into a target model to execute a target operation to obtain a second classification result, wherein the target operation comprises: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; splicing the sixth image characteristic and the seventh image characteristic to obtain a second target image characteristic; based on the second target image feature, performing stroboscopic scene classification on the shooting scene to obtain a second classification result, wherein the second classification result is used for representing the stroboscopic intensity level of the fourth image pair;
comparing the second classification result with the stroboscopic scene classification label to obtain a network loss value;
updating network parameters in the target model based on the network loss values.
6. The method of claim 5, wherein the target model comprises a first branch network and a second branch network, the first branch network is configured to perform feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image, and the second branch network is configured to perform feature processing on the fifth image and the sixth image to obtain a seventh image feature of the sixth image.
7. The method according to claim 5 or 6, wherein performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image comprises:
performing feature extraction on the fifth image to obtain eighth image features;
extracting second weight information of a ninth image feature in a spatial dimension based on an attention mechanism, wherein the second weight information is used for indicating attention degree of each pixel region in the ninth image feature, the ninth image feature is obtained by performing the feature extraction based on the sixth image, and when the sixth image carries a stripe, a third weight value in the second weight information is larger than a fourth weight value, the third weight value is used for representing attention degree of the stripe region in the ninth image feature, and the fourth weight value is used for representing attention degree of other regions except the stripe region in the ninth image feature;
multiplying the second weight information and the eighth image characteristic to obtain a tenth image characteristic;
and performing feature processing on the tenth image feature to obtain a sixth image feature.
8. The method of claim 5, wherein the obtaining training sample data comprises:
acquiring a fifth image pair in the shooting scene, wherein the fifth image pair comprises two seventh images with image formats of a preset image format and a single channel, and the seventh images comprise pixel points corresponding to four color types; acquiring a sixth image pair, wherein the sixth image pair comprises two mask images which carry stripes and are single-channel, and the stroboscopic intensities of the two mask images are the same, but the positions of the stripes are different;
performing second image preprocessing on two seventh images in the fifth image pair to obtain a seventh image pair, wherein the seventh image pair comprises two images with four channels, pixel points in the images with four channels comprise pixel values of four channels, and each pixel value corresponds to one color type in the four color types; performing third image preprocessing on the two mask images to obtain an eighth image pair, wherein the eighth image pair comprises the two mask images with four channels;
multiplying the seventh image pair and the eighth image pair to obtain a fourth image pair; and counting the pixel values in the eighth image pair to obtain the stroboscopic scene classification label.
9. The method of claim 8, wherein said obtaining a fifth pair of images in the capture scene comprises:
acquiring a ninth image pair under the shooting scene from a video, wherein the ninth image pair comprises two frame images with the frame number separated by a preset threshold value in the video;
and performing format conversion on two frames of images in the ninth image pair to obtain the fifth image pair.
10. A stroboscopic scene classification apparatus, comprising:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a first image pair in the same shooting scene, and the first image pair comprises a first image and a second image;
the characteristic processing module is used for carrying out characteristic processing on the first image and the second image to obtain a first image characteristic of the first image and a second image characteristic of the second image;
the splicing module is used for splicing the first image characteristic and the second image characteristic to obtain a first target image characteristic;
and the classification module is used for carrying out stroboscopic scene classification on the shooting scene based on the first target image characteristic to obtain a first classification result, and the first classification result is used for representing the stroboscopic intensity level of the first image pair.
11. The apparatus of claim 10, wherein the feature processing module is specifically configured to:
performing feature extraction on the first image to obtain a third image feature;
extracting first weight information of a fourth image feature on a spatial dimension based on an attention mechanism, wherein the first weight information is used for indicating attention degree of each pixel region in the fourth image feature, the fourth image feature is obtained by performing the feature extraction based on the second image, and when the second image carries a stripe, a first weight value in the first weight information is larger than a second weight value, the first weight value is used for representing attention degree of the stripe region in the fourth image feature, and the second weight value is used for representing attention degree of other regions except the stripe region in the fourth image feature;
multiplying the first weight information and the third image characteristic to obtain a fifth image characteristic;
and performing feature processing on the fifth image feature to obtain the first image feature.
12. The apparatus of claim 10, wherein the first obtaining module comprises:
the first acquisition unit is used for acquiring a second image pair under the shooting scene, wherein the second image pair comprises two third images with image formats of a preset image format and a single channel, and the third images comprise pixel points corresponding to four color types;
the first image preprocessing unit is configured to perform first image preprocessing on the two third images to obtain the first image and the second image, where the first image and the second image are both images of four channels, pixel points in the images of four channels include pixel values of four channels, and each pixel value corresponds to one color type of the four color types.
13. The apparatus according to claim 12, wherein the first image pre-processing unit is specifically configured to:
performing format conversion on the two third images to obtain a third image pair, wherein the third image pair comprises two fourth images, and the fourth images are images of four channels;
and normalizing the two fourth images based on the maximum pixel values of the two fourth images and a preset black level to obtain the first image and the second image.
14. A model training apparatus, the apparatus comprising:
the second acquisition module is used for acquiring training sample data, wherein the training sample data comprises a fourth image pair under the same shooting scene and a stroboscopic scene classification label of the shooting scene, and the fourth image pair comprises a fifth image and a sixth image;
a target operation module, configured to input the fourth image pair to a target model to perform a target operation, so as to obtain a second classification result, where the target operation includes: performing feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image and a seventh image feature of the sixth image; splicing the sixth image characteristic and the seventh image characteristic to obtain a second target image characteristic; based on the second target image feature, performing stroboscopic scene classification on the shooting scene to obtain a second classification result, wherein the second classification result is used for representing the stroboscopic intensity level of the fourth image pair;
the comparison module is used for comparing the second classification result with the stroboscopic scene classification label to obtain a network loss value;
and the updating module is used for updating the network parameters in the target model based on the network loss value.
15. The apparatus of claim 14, wherein the target model comprises a first branch network and a second branch network, the first branch network is configured to perform feature processing on the fifth image and the sixth image to obtain a sixth image feature of the fifth image, and the second branch network is configured to perform feature processing on the fifth image and the sixth image to obtain a seventh image feature of the sixth image.
16. The apparatus according to claim 14 or 15, wherein the target operation module is specifically configured to:
performing feature extraction on the fifth image to obtain eighth image features;
extracting second weight information of a ninth image feature in a spatial dimension based on an attention mechanism, wherein the second weight information is used for indicating attention degree of each pixel region in the ninth image feature, the ninth image feature is obtained by performing the feature extraction based on the sixth image, and when the sixth image carries a stripe, a third weight value in the second weight information is larger than a fourth weight value, the third weight value is used for representing attention degree of the stripe region in the ninth image feature, and the fourth weight value is used for representing attention degree of other regions except the stripe region in the ninth image feature;
multiplying the second weight information and the eighth image characteristic to obtain a tenth image characteristic;
and performing feature processing on the tenth image feature to obtain a sixth image feature.
17. The apparatus of claim 14, wherein the second obtaining module comprises:
the second acquisition unit is used for acquiring a fifth image pair in the shooting scene, wherein the fifth image pair comprises two seventh images with image formats of a preset image format and a single channel, and the seventh images comprise pixel points corresponding to four color types; acquiring a sixth image pair, wherein the sixth image pair comprises two mask images which carry stripes and are single-channel, the stroboscopic intensity of the two mask images is the same, but the positions of the stripes are different;
a second image preprocessing unit, configured to perform second image preprocessing on two seventh images in the fifth image pair to obtain a seventh image pair, where the seventh image pair includes two images with four channels, a pixel point in the image with four channels includes pixel values of four channels, and each pixel value corresponds to one color type of the four color types;
the third image preprocessing unit is used for carrying out third image preprocessing on the two mask images to obtain an eighth image pair, and the eighth image pair comprises the two mask images with four channels;
the multiplication processing unit is used for multiplying the seventh image pair and the eighth image pair to obtain a fourth image pair; and counting the pixel values in the eighth image pair to obtain the stroboscopic scene classification label.
18. The apparatus according to claim 17, wherein the second obtaining unit is specifically configured to:
acquiring a ninth image pair under the shooting scene from a video, wherein the ninth image pair comprises two frame images with the frame number separated by a preset threshold value in the video;
and performing format conversion on the two frames of images in the ninth image pair to obtain the fifth image pair.
19. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the stroboscopic scene classification method of any one of claims 1-4 or the steps of the model training method of any one of claims 5-9.
20. A readable storage medium, wherein a program or instructions are stored on the readable storage medium, and the program or instructions, when executed by a processor, implement the steps of the stroboscopic scene classification method of any one of claims 1-4 or the steps of the model training method of any one of claims 5-9.
CN202210760670.8A 2022-06-29 2022-06-29 Stroboscopic scene classification method, model training method, related device and electronic equipment Pending CN115035393A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210760670.8A CN115035393A (en) 2022-06-29 2022-06-29 Stroboscopic scene classification method, model training method, related device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210760670.8A CN115035393A (en) 2022-06-29 2022-06-29 Stroboscopic scene classification method, model training method, related device and electronic equipment

Publications (1)

Publication Number Publication Date
CN115035393A true CN115035393A (en) 2022-09-09

Family

ID=83128968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210760670.8A Pending CN115035393A (en) 2022-06-29 2022-06-29 Stroboscopic scene classification method, model training method, related device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115035393A (en)

Similar Documents

Publication Title
CN111741211B (en) Image display method and apparatus
Jiang et al. A switched view of Retinex: Deep self-regularized low-light image enhancement
WO2021164234A1 (en) Image processing method and image processing device
JP7286010B2 (en) Human body attribute recognition method, device, electronic device and computer program
CN107808136A (en) Image processing method, device, readable storage medium storing program for executing and computer equipment
CN111583161A (en) Blurred image enhancement method, computer device and storage medium
CN107911625A (en) Light measuring method, device, readable storage medium storing program for executing and computer equipment
CN108876753A (en) Optional enhancing is carried out using navigational figure pairing growth exposure image
CN107172354A (en) Method for processing video frequency, device, electronic equipment and storage medium
Wang et al. Variational single nighttime image haze removal with a gray haze-line prior
CN109712177A (en) Image processing method, device, electronic equipment and computer readable storage medium
CN102088539B (en) Method and system for evaluating pre-shot picture quality
CN107368806A (en) Image correction method, device, computer-readable recording medium and computer equipment
WO2023151511A1 (en) Model training method and apparatus, image moire removal method and apparatus, and electronic device
Wang et al. Low-light image enhancement based on virtual exposure
CN113507570B (en) Exposure compensation method and device and electronic equipment
CN110674729A (en) Method for identifying number of people based on heat energy estimation, computer device and computer readable storage medium
CN116681636B (en) Light infrared and visible light image fusion method based on convolutional neural network
CN109191398A (en) Image processing method, device, computer readable storage medium and electronic equipment
CN111797694A (en) License plate detection method and device
CN107424134A (en) Image processing method, device, computer-readable recording medium and computer equipment
WO2023011280A1 (en) Image noise degree estimation method and apparatus, and electronic device and storage medium
Zhang et al. Color-to-gray conversion based on boundary points
CN115035393A (en) Stroboscopic scene classification method, model training method, related device and electronic equipment
CN111866476B (en) Image shooting method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination