WO2019233394A1 - Image processing method and apparatus, storage medium and electronic device


Info

Publication number
WO2019233394A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
label
detected
scene recognition
scene
Prior art date
Application number
PCT/CN2019/089914
Other languages
French (fr)
Chinese (zh)
Inventor
陈岩 (Chen Yan)
Original Assignee
Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Publication of WO2019233394A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/20 - Scenes; Scene-specific elements in augmented reality scenes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Definitions

  • the present application relates to the field of computer technology, and in particular, to an image processing method and device, a storage medium, and an electronic device.
  • the mobile terminal may perform scene recognition on the image to provide a smart experience for the user.
  • the embodiments of the present application provide an image processing method and device, a storage medium, and an electronic device, which can improve the accuracy of scene recognition on an image.
  • An image processing method includes:
  • An image processing device includes:
  • an image acquisition module configured to acquire an image to be detected;
  • a scene recognition module configured to perform scene recognition on the to-be-detected image according to a multi-label classification model to obtain labels corresponding to the to-be-detected image, wherein the multi-label classification model is trained on multi-label images containing multiple scene elements;
  • an output module configured to output the labels corresponding to the image to be detected as a result of scene recognition.
  • a computer-readable storage medium has stored thereon a computer program that, when executed by a processor, implements the operations of the image processing method described above.
  • An electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor. When the processor executes the computer program, the operations of the image processing method described above are performed.
  • the foregoing image processing method and device, storage medium, and electronic device acquire an image to be detected, perform scene recognition on it according to a multi-label classification model, and obtain the labels corresponding to the image to be detected.
  • the multi-label classification model is trained on multi-label images containing multiple scene elements.
  • the labels corresponding to the image to be detected are output as the result of scene recognition.
  • FIG. 1 is an internal structural diagram of an electronic device in an embodiment
  • FIG. 2 is a flowchart of an image processing method according to an embodiment
  • FIG. 3A is a flowchart of an image processing method according to another embodiment;
  • FIG. 3B is a schematic structural diagram of a neural network in an embodiment;
  • FIG. 4 is a flowchart of a method for obtaining a label corresponding to an image by performing scene recognition on the image according to the multi-label classification model in FIG. 2;
  • FIG. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment
  • FIG. 7 is a schematic structural diagram of an image processing apparatus according to another embodiment.
  • FIG. 8 is a schematic structural diagram of a scene recognition module in FIG. 6;
  • FIG. 9 is a block diagram of a partial structure of a mobile phone related to an electronic device according to an embodiment.
  • FIG. 1 is a schematic diagram of an internal structure of an electronic device in an embodiment.
  • the electronic device includes a processor, a memory, and a network interface connected through a system bus.
  • the processor is used to provide computing and control capabilities to support the operation of the entire electronic device.
  • the memory is used to store data, programs, and the like. At least one computer program is stored on the memory, and the computer program can be executed by a processor to implement the image processing method applicable to the electronic device provided in the embodiments of the present application.
  • the memory may include a non-volatile storage medium, such as a magnetic disk, an optical disc, or a read-only memory (ROM), or it may include a random-access memory (RAM).
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and a computer program.
  • the computer program can be executed by a processor to implement an image processing method provided by each of the following embodiments.
  • the internal memory provides a cached runtime environment for the operating system and the computer programs stored in the non-volatile storage medium.
  • the network interface may be an Ethernet card or a wireless network card, and is used to communicate with external electronic devices.
  • the electronic device may be a mobile phone, a tablet computer, a personal digital assistant, or a wearable device.
  • an image processing method is provided.
  • the method is applied to the electronic device in FIG. 1 as an example, and includes:
  • Operation 220 Acquire an image to be detected.
  • the user uses an electronic device (with a photographing function) to take a picture and obtain an image to be detected.
  • the image to be detected may be a photo preview screen, or a photo saved to an electronic device after the photo is taken.
  • the image to be detected refers to an image requiring scene recognition, and includes both an image containing only a single scene element and an image containing multiple scene elements (two or more).
  • the scene elements in the image include landscape, beach, blue sky, green grass, snow, night scene, dark, backlight, sunrise/sunset, fireworks, spotlight, indoor, macro, text document, portrait, baby, cat, dog, food, and so on.
  • this list is not exhaustive; many other categories of scene elements are possible.
  • Operation 240 Perform scene recognition according to the multi-label classification model to obtain tags corresponding to the image to be detected, and the multi-label classification model is obtained from a multi-label image including multiple scene elements.
  • scene recognition is performed on the image to be detected.
  • a pre-trained multi-label classification model is used to perform scene recognition on the image to obtain tags corresponding to the scene included in the image.
  • the multi-label classification model is obtained based on a multi-label image including multiple scene elements. That is, the multi-label classification model is a scene recognition model obtained after scene recognition training using an image containing multiple scene elements. After the multi-label classification model performs scene recognition on the images to be detected, labels corresponding to the scenes contained in the images to be detected are obtained.
  • the labels of the image to be detected can be directly output as beach, blue sky, and portrait.
  • the beach, blue sky, and portrait are labels corresponding to the scene in the image to be detected.
  • the label corresponding to the image to be detected is output as a result of scene recognition.
  • once the labels corresponding to the scenes contained in the to-be-detected image are obtained, these labels constitute the result of scene recognition, and the result is output.
  • an image requiring scene recognition is acquired, and scene recognition is performed on the image to be detected according to a multi-label classification model to obtain the labels corresponding to the image.
  • the multi-label classification model is trained on multi-label images containing multiple scene elements.
  • the labels corresponding to the image to be detected are output as the result of scene recognition. Because the multi-label classification model is a scene recognition model trained on multi-label images containing multiple scene elements, it can directly and accurately output the labels of the multiple scenes in an image containing different scene elements. Therefore, the accuracy of scene recognition for images containing different scene elements is improved, and so is its efficiency.
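The per-label behaviour described above can be sketched as follows. The logit values and label names are illustrative assumptions, and scoring each label independently with a sigmoid is a common multi-label convention, not something the patent specifies:

```python
import math

# Hypothetical per-label logits from the final layer of a multi-label
# classification model for one image; names and values are illustrative.
logits = {"beach": 1.2, "blue sky": 1.5, "portrait": 2.0, "dog": -0.4}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# In multi-label classification each label is scored independently, so
# several labels can be active for one image (unlike softmax, where the
# class scores compete with each other).
scores = {label: sigmoid(z) for label, z in logits.items()}
predicted = [label for label, s in scores.items() if s > 0.5]
print(predicted)  # ['beach', 'blue sky', 'portrait']
```

Because each score is independent, an image with a beach, a blue sky, and a person can receive all three labels at once, matching the example output above.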
  • before acquiring an image to be detected, the method further includes:
  • Operation 320 Obtain a multi-label image including multiple scene elements.
  • An image containing multiple scene elements is called a multi-label image in this embodiment because, after scene recognition is performed on such an image, each scene corresponds to one label, and all of these labels together form the label set of the image.
  • Operation 340 Train a multi-label classification model using a multi-label image including multiple scene elements.
  • scene recognition may first be performed manually on the above multi-label image samples to obtain a label for each sample, called the standard label. The images in the sample set are then used one by one for scene recognition training until the error between the trained scene recognition results and the standard labels becomes progressively smaller. The model obtained after this training is the multi-label classification model that can perform scene recognition on multi-label images.
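The training loop described above (adjust parameters until the error against the manually assigned standard labels shrinks) might look like the following toy sketch. One independent logistic "head" per label, binary cross-entropy as the per-label loss, and all feature/label/learning-rate values are illustrative assumptions; the patent does not name a loss function:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(y_true, y_pred, eps=1e-7):
    # Binary cross-entropy for one label (assumed loss, not from the patent).
    y_pred = min(max(y_pred, eps), 1 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

# One toy sample: a hypothetical feature vector and its manually
# assigned standard labels (1 = scene present, 0 = absent).
features = [0.9, 0.2, 0.7]
standard = {"beach": 1, "snow": 0}

# One independent logistic head per label stands in for the network.
weights = {"beach": [0.1, 0.1, 0.1], "snow": [0.1, 0.1, 0.1]}
lr = 0.5

def total_loss():
    loss = 0.0
    for label, y in standard.items():
        z = sum(w * f for w, f in zip(weights[label], features))
        loss += bce(y, sigmoid(z))
    return loss

before = total_loss()
for _ in range(50):  # repeat until the error to the standard labels shrinks
    for label, y in standard.items():
        z = sum(w * f for w, f in zip(weights[label], features))
        grad = sigmoid(z) - y  # d(BCE)/dz for a sigmoid output
        weights[label] = [w - lr * grad * f
                          for w, f in zip(weights[label], features)]
after = total_loss()
print(after < before)  # the error against the standard labels decreases
```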
  • because the multi-label classification model is a scene recognition model trained on multi-label images containing multiple scene elements,
  • the labels corresponding to the multiple scenes in an image containing different scene elements can be output directly and accurately after scene recognition.
  • thus, the accuracy of multi-label image recognition is improved, and so is its efficiency.
  • the multi-label classification model is constructed based on a neural network model.
  • the specific training method of the multi-label classification model is: input a training image containing a background training target and a foreground training target into a neural network, and obtain a first loss function that reflects the difference between the first prediction confidence and the first true confidence of each pixel in the background region of the training image,
  • and a second loss function that reflects the difference between the second prediction confidence and the second true confidence of each pixel in the foreground region of the training image.
  • the first prediction confidence is the confidence, predicted by the neural network, that a pixel in the background region belongs to the background training target.
  • the first true confidence represents the confidence that the pixel labeled in the training image belongs to the background training target;
  • the second prediction confidence is the confidence, predicted by the neural network, that a pixel in the foreground region belongs to the foreground training target; the second true confidence represents the confidence that the pixel labeled in the training image belongs to the foreground training target;
  • the background training target of the training image has corresponding labels
  • the foreground training target also has labels.
  • FIG. 3B is a schematic structural diagram of a neural network model in an embodiment.
  • the input layer of the neural network receives training images with image category labels and performs feature extraction through a base network (such as a CNN), and the extracted image features are output to the feature layer.
  • the first loss function is obtained by performing category detection on the background training target based on the image features;
  • the second loss function is obtained by performing category detection on the foreground training target based on the image features;
  • the position loss function is obtained by performing position detection on the foreground training target based on the foreground region;
  • the target loss function is obtained as the weighted sum of the first loss function, the second loss function, and the position loss function.
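The weighted-sum construction of the target loss function can be illustrated with placeholder numbers; the individual loss values and the weights are assumptions, since the patent does not specify them:

```python
# Illustrative loss values for one training step (assumed, not from the patent).
first_loss = 0.8     # background category loss
second_loss = 0.6    # foreground category loss
position_loss = 0.4  # foreground position loss

# Assumed weights; in practice these are hyperparameters chosen so the
# three terms contribute in the desired proportion.
weights = {"first": 0.4, "second": 0.4, "position": 0.2}

target_loss = (weights["first"] * first_loss
               + weights["second"] * second_loss
               + weights["position"] * position_loss)
print(round(target_loss, 2))  # 0.64
```

The network's parameters are then adjusted to reduce this single scalar, so the background category, foreground category, and foreground position objectives are optimized jointly.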
  • the neural network may be a convolutional neural network.
  • Convolutional neural networks include a data input layer, a convolutional calculation layer, an activation layer, a pooling layer, and a fully connected layer.
  • the data input layer is used to pre-process the original image data.
  • the pre-processing may include de-averaging, normalization, dimensionality reduction, and whitening processes.
  • De-averaging refers to centering each dimension of the input data at 0; the purpose is to pull the center of the samples back to the origin of the coordinate system.
  • Normalization scales the amplitudes to the same range.
  • Whitening normalizes the amplitude on each characteristic axis of the data.
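The de-averaging and normalization steps described above can be sketched on a toy one-dimensional "image"; real pipelines apply the same idea per channel:

```python
# Toy input data standing in for raw pixel values.
data = [10.0, 20.0, 30.0, 40.0]

# De-averaging: centre every dimension at 0 (pull the sample centre
# back to the origin of the coordinate system).
mean = sum(data) / len(data)
centred = [x - mean for x in data]

# Normalization: scale amplitudes into a common range, here [-1, 1].
peak = max(abs(x) for x in centred)
normalized = [x / peak for x in centred]

print(centred)     # [-15.0, -5.0, 5.0, 15.0]
print(normalized)  # [-1.0, -0.33..., 0.33..., 1.0]
```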
  • the convolution calculation layer performs local association and window sliding; the weights of each filter connected to the data window in the convolution calculation layer are fixed.
  • Each filter attends to one image feature, such as vertical edges, horizontal edges, color, or texture; combining these filters yields the features of the entire image.
  • a filter is a weight matrix.
  • one weight matrix can be convolved with the data in different windows.
  • the activation layer is used to non-linearly map the output of the convolution layer.
  • the activation function used by the activation layer may be ReLU (Rectified Linear Unit).
  • the pooling layer can be sandwiched between consecutive convolutional layers to compress the amount of data and parameters and reduce overfitting.
  • the pooling layer can use the maximum method or average method to reduce the dimensionality of the data.
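The two dimensionality-reduction methods mentioned, maximum and average pooling, can be sketched on a toy 4x4 feature map with non-overlapping 2x2 windows:

```python
# Toy 4x4 feature map (values are illustrative).
fmap = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 3],
    [4, 6, 5, 7],
]

def pool(fmap, size, op):
    """Apply `op` (e.g. max or mean) to non-overlapping size x size windows."""
    out = []
    for i in range(0, len(fmap), size):
        row = []
        for j in range(0, len(fmap[0]), size):
            window = [fmap[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(op(window))
        out.append(row)
    return out

max_pooled = pool(fmap, 2, max)
avg_pooled = pool(fmap, 2, lambda w: sum(w) / len(w))
print(max_pooled)  # [[7, 8], [9, 7]]
print(avg_pooled)  # [[4.0, 5.0], [5.25, 4.0]]
```

Either way, the 4x4 map is compressed to 2x2, which is exactly the data and parameter reduction the pooling layer provides between convolutional layers.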
  • the fully connected layer is located at the tail of the convolutional neural network, where every neuron in one layer is connected by a weight to every neuron in the adjacent layer.
  • Part of the convolutional layers is cascaded to the first confidence output node, part to the second confidence output node, and part to the position output node.
  • the category of the background of the image can be detected according to the first confidence output node.
  • the category of the foreground object of the image can be detected according to the second confidence output node, and the position corresponding to the foreground object can be detected according to the position output node.
  • artificial neural networks are also referred to as neural networks (NNs) or connectionist models. From the perspective of information processing, they abstract the neuron network of the human brain, establish a simple model, and form different networks according to different connection methods. In engineering and academia they are usually simply called neural networks. An artificial neural network can be understood as a mathematical model that processes information using a structure similar to that of brain synapses.
  • Neural networks are often used for classification, for example, the classification of spam, the classification of cats and dogs in images, and so on.
  • This kind of machine that can automatically classify the input variables is called a classifier.
  • the input to the classifier is a numeric vector called a feature (vector).
  • the classifier needs to be trained, that is, the neural network needs to be trained first.
  • the training of artificial neural networks relies on the back-propagation algorithm. First, a feature vector is input at the input layer and an output is obtained through network computation. If the output layer finds that the output does not match the correct class, it makes the last layer of neurons adjust their parameters, and these in turn direct the penultimate layer of neurons connected to them to adjust theirs, so that the layers are adjusted backward one by one. The adjusted network continues to be tested on the samples; if the output is still wrong, the next round of backward adjustment follows, until the output of the neural network is as consistent as possible with the correct result.
  • the neural network model includes an input layer, a hidden layer, and an output layer.
  • Feature vectors are extracted from multi-label images containing multiple scene elements and input into the hidden layer, the loss function is calculated, and the parameters of the neural network model are adjusted according to the loss function so that it continuously converges; the multi-label classification model is obtained by training the neural network model in this way.
  • the multi-label classification model can implement scene recognition on the input image to obtain tags for each scene included in the image, and output these tags as the result of scene recognition.
  • the target loss function is obtained as the weighted sum of the first loss function corresponding to the background training target and the second loss function corresponding to the foreground training target, and the parameters of the neural network are adjusted according to this target loss function to obtain the trained multi-label classification model. Subsequently identifying both the background category and the label of the foreground target yields more information and improves recognition efficiency.
  • operation 240 performing scene recognition according to the multi-label classification model to obtain a label corresponding to the image to be detected, including:
  • Operation 242 Perform scene recognition according to the multi-label classification model to obtain an initial label of the image to be detected and a confidence level corresponding to the initial label;
  • Operation 244 Determine whether the confidence level of the initial label is greater than a preset threshold
  • the multi-label classification model obtained by the training is used to perform scene recognition on a to-be-detected image that contains multiple scene elements, multiple initial tags of the to-be-detected image and the confidence levels corresponding to the initial tags will be obtained.
  • the confidence that the initial label of the image to be detected is beach is 0.6
  • the confidence that the initial label of the image to be detected is blue sky is 0.7
  • the confidence that the initial label of the image to be detected is a portrait is 0.8
  • the confidence that the initial label of the image to be detected is a dog is 0.4
  • the confidence that the initial label of the image to be detected is snow is 0.3.
  • the initial labels of the recognition results are filtered. Specifically, it is determined whether the confidence level of the initial labels is greater than a preset threshold.
  • the preset threshold may be a confidence threshold obtained during earlier training of the multi-label classification model, based on a large number of training samples, at the point where the loss function is relatively small and the model's results are close to the actual results. For example, if the confidence threshold obtained from a large number of training samples is 0.5, then in the above example it is determined whether the confidence of each initial label is greater than this preset threshold, and the initial labels whose confidence exceeds it are used as the labels corresponding to the image.
  • the labels corresponding to the obtained images to be detected are beach, blue sky, and portrait, and two interference terms, dog and snow scene with confidence lower than the threshold, are discarded.
  • scene recognition is performed on the image to be detected according to a multi-label classification model, and the initial labels of the image and the confidence level corresponding to each initial label are obtained. Because the initial labels obtained from scene recognition are not necessarily the true labels of the image, the confidence of each initial label is used to filter them, and the initial labels whose confidence exceeds the threshold are selected as the scene recognition result of the image to be detected. This improves the accuracy of the scene recognition results to a certain extent.
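The filtering step, using the confidence values and the 0.5 threshold from the example above, can be sketched as:

```python
# Initial labels and confidences from the example in the text.
initial_labels = {
    "beach": 0.6,
    "blue sky": 0.7,
    "portrait": 0.8,
    "dog": 0.4,
    "snow": 0.3,
}
threshold = 0.5  # preset confidence threshold from training

# Keep only the initial labels whose confidence exceeds the threshold.
labels = [name for name, conf in initial_labels.items() if conf > threshold]
print(labels)  # ['beach', 'blue sky', 'portrait'] - dog and snow are discarded
```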
  • the range of confidence corresponding to each initial label is [0,1].
  • because the multi-label classification model is a scene recognition model trained on multi-label images containing multiple scene elements, after scene recognition is performed on a to-be-detected image containing different scene elements, the labels corresponding to the multiple scenes in that image can be output directly and accurately.
  • the identification process of each label is independent, so the probability of each identified label can be between [0,1].
  • the recognition processes of different tags do not affect each other, so it is possible to comprehensively identify all the scenes included in the image to be detected and avoid omissions.
  • the method further includes:
  • Operation 520 Obtain position information when the image to be detected is captured
  • the result of scene recognition is corrected according to the position information to obtain a final result of scene recognition after correction.
  • the electronic device records the location of each picture, generally using GPS (Global Positioning System) to record address information. After the address information recorded by the electronic device is acquired, the position information of the image to be detected is obtained from it. Corresponding scene categories, and the weights of those categories, are matched to different address information in advance; specifically, this may be the result of statistical analysis over a large number of image materials, from which the corresponding scene categories and their weights are matched to different address information.
  • the result of scene recognition can be corrected according to the address information at the time of image shooting and the probability of the scene corresponding to the address information, to obtain the final result of scene recognition after correction.
  • for example, suppose the address information of the picture is "XXX grassland".
  • scenes such as "green grass", "snow", and "blue sky" have higher weights for "XXX grassland", so they have a higher probability of appearing, and the result of scene recognition is corrected accordingly. If "green grass", "snow", or "blue sky" appears in the result of scene recognition, it can be kept as part of the final result. If a "beach" scene appears in the result, the "beach" scene should be filtered out according to the address information at the time the image was taken, to avoid producing incorrect scene categories that are incompatible with the location.
  • position information at the time of shooting an image to be detected is acquired, and a result of scene recognition is corrected according to the position information to obtain a final result of scene recognition after correction.
  • the scene classification of the to-be-detected image obtained by using the shooting address information of the to-be-detected image can be implemented to calibrate the result of scene recognition, thereby ultimately improving the accuracy of scene detection.
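A minimal sketch of this location-based correction follows. The per-address weight table and the cut-off value are illustrative assumptions; the patent only states that scene categories and weights are matched to address information in advance:

```python
# Assumed per-address scene weights, e.g. from statistics over many images.
scene_weights_by_address = {
    "XXX grassland": {"green grass": 0.9, "snow": 0.7, "blue sky": 0.8,
                      "beach": 0.05},
}

def correct(recognition_result, address, min_weight=0.1):
    """Drop recognized scenes that are implausible at the shooting address."""
    weights = scene_weights_by_address.get(address, {})
    # Scenes with no entry keep the default weight and are left untouched.
    return [scene for scene in recognition_result
            if weights.get(scene, min_weight) >= min_weight]

result = correct(["green grass", "blue sky", "beach"], "XXX grassland")
print(result)  # ['green grass', 'blue sky'] - the implausible 'beach' is removed
```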
  • the method further includes:
  • the image to be detected is subjected to image processing corresponding to the result of scene recognition.
  • a label corresponding to the image to be detected is obtained, and a label corresponding to the image to be detected is output as a result of scene recognition.
  • the result of scene recognition can be used as the basis for image post-processing, and targeted image processing can be performed according to the result of scene recognition, thereby greatly improving the quality of the image. For example, if the scene type of the image to be detected is identified as night scene, the image may be processed in a suitable manner for the night scene, such as increasing brightness. If it is identified that the scene type of the image to be detected is backlighting, the image can be processed using a suitable processing method for backlighting.
  • the beach area can be processed in a manner suitable for beaches, and the green grass area in a manner suitable for green grass.
  • the blue sky area is likewise processed in a manner suitable for blue sky, so that the entire image achieves a good effect.
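Label-driven post-processing might be dispatched as follows. The specific adjustments (a brightness lift for "night scene", an exposure reduction for "backlight") and the toy pixel list are illustrative assumptions; real processing would operate on full image buffers:

```python
# Hypothetical per-scene adjustments on a flat list of 8-bit pixel values.
def increase_brightness(pixels, amount=30):
    return [min(255, p + amount) for p in pixels]

def reduce_exposure(pixels, amount=20):
    return [max(0, p - amount) for p in pixels]

# Map each scene label to its processing routine (assumed pairings).
PROCESSORS = {
    "night scene": increase_brightness,
    "backlight": reduce_exposure,
}

def post_process(pixels, labels):
    """Apply the processing routine of every recognized scene label."""
    for label in labels:
        handler = PROCESSORS.get(label)
        if handler:
            pixels = handler(pixels)
    return pixels

out = post_process([10, 100, 250], ["night scene"])
print(out)  # [40, 130, 255] - brightened, clipped at 255
```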
  • an image processing method is provided.
  • the method is applied to the electronic device in FIG. 1 as an example, and includes:
  • Operation 1 Obtain a multi-label image containing multiple scene elements, and use the multi-label image containing multiple scene elements to train a neural network model to obtain a multi-label classification model, that is, the multi-label classification model is based on a neural network architecture;
  • Operation 2 Perform scene recognition according to the multi-label classification model to obtain the initial label of the image to be detected and the confidence level corresponding to the initial label;
  • Operation 3 Determine whether the confidence level of each initial label is greater than a preset threshold; when the determination result is yes, use the initial labels whose confidence is greater than the preset threshold as the labels corresponding to the image to be detected, and output those labels as the result of scene recognition;
  • Operation 4 Obtain position information at the time of shooting the image to be detected, and correct the scene recognition result according to the position information to obtain the final result of scene recognition after correction;
  • Operation 5 According to the result of scene recognition, perform image processing corresponding to that result on the image to be detected to obtain a processed image.
  • because the multi-label classification model is a scene recognition model trained on multi-label images containing multiple scene elements, the labels corresponding to the multiple scenes in a to-be-detected image containing different scene elements can be output directly and accurately after scene recognition. Therefore, the accuracy of scene recognition on such images is improved, and so is its efficiency.
  • the result of scene recognition is corrected according to the position information when the image to be detected is captured, to obtain the final result of scene recognition after correction.
  • the scene classification of the to-be-detected image obtained by using the shooting address information of the to-be-detected image can be implemented to calibrate the result of scene recognition, thereby ultimately improving the accuracy of scene detection.
  • the result of scene recognition can be used as the basis for image post-processing, and the image can be targeted for image processing according to the result of scene recognition, thereby greatly improving the quality of the image.
  • an image processing device 600 includes an image acquisition module 610, a scene recognition module 620, and an output module 630, wherein:
  • An image acquisition module 610 configured to acquire an image to be detected
  • a scene recognition module 620 is configured to perform scene recognition according to a multi-label classification model to obtain a label corresponding to the image to be detected, and the multi-label classification model is obtained from a multi-label image including multiple scene elements;
  • An output module 630 is configured to output a label corresponding to the image to be detected as a result of scene recognition.
  • an image processing apparatus 600 is provided, and the apparatus further includes:
  • a multi-label image acquisition module 640 configured to acquire a multi-label image including multiple scene elements
  • a multi-label classification model training module 650 is configured to train a multi-label classification model using a multi-label image including multiple scene elements.
  • the scene recognition module 620 includes:
  • An initial label acquisition module 622 is configured to perform scene recognition based on a multi-label classification model to obtain an initial label of the image to be detected and a confidence level corresponding to the initial label;
  • a determining module 624 configured to determine whether the confidence level of the initial label is greater than a preset threshold
  • the image label generation module 626 is configured to, when the determination result is yes, use an initial label with a confidence level greater than a preset threshold as a label corresponding to the image to be detected.
  • an image processing device 600 is provided, which is further configured to obtain position information when an image to be detected is taken; and correct the scene recognition result according to the position information to obtain a final scene recognition result after the correction.
  • an image processing device 600 is provided, and further configured to perform image processing corresponding to a scene recognition result on an image to be detected according to a result of scene recognition.
  • each module in the above image processing apparatus is for illustration only. In other embodiments, the image processing apparatus may be divided into different modules as needed to complete all or part of the functions of the above image processing apparatus.
  • Each module in the image processing apparatus may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above modules may be embedded in hardware form in the processor, may be independent of the processor in the server, or may be stored in software form in the memory of the server, so that the processor can invoke them to perform the operations corresponding to the above modules.
  • a computer-readable storage medium on which a computer program is stored.
  • the computer program is executed by a processor, the operations of the image processing methods provided by the foregoing embodiments are implemented.
  • an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the computer program, the operations of the image processing method provided by the foregoing embodiments are performed.
  • An embodiment of the present application further provides a computer program product, which, when run on a computer, causes the computer to perform the operations of the image processing method provided by the foregoing embodiments.
  • An embodiment of the present application further provides an electronic device.
  • the above electronic device includes an image processing circuit.
  • the image processing circuit may be implemented by hardware and/or software components, and may include various processing units that define an ISP (Image Signal Processing) pipeline.
  • FIG. 9 is a schematic diagram of an image processing circuit in one embodiment. As shown in FIG. 9, for ease of description, only aspects of the image processing technology related to the embodiments of the present application are shown.
  • the image processing circuit includes an ISP processor 940 and a control logic 950.
  • the image data captured by the imaging device 910 is first processed by the ISP processor 940, which analyzes the image data to capture image statistics that can be used to determine one or more control parameters of the imaging device 910.
  • the imaging device 910 may include a camera having one or more lenses 912 and an image sensor 914.
  • the image sensor 914 may include a color filter array (such as a Bayer filter). The image sensor 914 may obtain the light intensity and wavelength information captured by each imaging pixel of the image sensor 914 and provide a set of original image data.
  • the sensor 920 may provide acquired image processing parameters (such as image stabilization parameters) to the ISP processor 940 based on the interface type of the sensor 920.
  • the sensor 920 interface may use a SMIA (Standard Mobile Imaging Architecture) interface, other serial or parallel camera interfaces, or a combination of the foregoing interfaces.
  • the image sensor 914 may also send the original image data to the sensor 920, and the sensor 920 may provide the original image data to the ISP processor 940 based on the interface type of the sensor 920, or the sensor 920 stores the original image data in the image memory 930.
  • the ISP processor 940 processes the original image data pixel by pixel in a variety of formats.
  • each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the ISP processor 940 may perform one or more image processing operations on the original image data and collect statistical information about the image data.
  • the image processing operations may be performed with the same or different bit depth accuracy.
  • the ISP processor 940 may also receive image data from the image memory 930.
  • the sensor 920 interface sends the original image data to the image memory 930, and the original image data in the image memory 930 is then provided to the ISP processor 940 for processing.
  • the image memory 930 may be a part of a memory device, a storage device, or a separate dedicated memory in an electronic device, and may include a DMA (Direct Memory Access) feature.
  • the ISP processor 940 may perform one or more image processing operations, such as time-domain filtering.
  • the processed image data may be sent to the image memory 930 for further processing before being displayed.
  • the ISP processor 940 receives processed data from the image memory 930 and performs processing on the image data in the original domain and in the RGB and YCbCr color spaces.
  • the image data processed by the ISP processor 940 may be output to the display 970 for viewing by the user and / or further processed by a graphics engine or a GPU (Graphics Processing Unit).
  • the output of the ISP processor 940 can also be sent to the image memory 930, and the display 970 can read image data from the image memory 930.
  • the image memory 930 may be configured to implement one or more frame buffers.
  • the output of the ISP processor 940 may be sent to an encoder / decoder 960 to encode / decode image data.
  • the encoded image data can be saved, and decompressed before being displayed on the display 970.
  • the encoder / decoder 960 may be implemented by a CPU or a GPU or a coprocessor.
  • the statistical data determined by the ISP processor 940 may be sent to the control logic 950.
  • the statistical data may include statistics of the image sensor 914 such as auto exposure, auto white balance, auto focus, flicker detection, black level compensation, and lens 912 shading correction.
  • the control logic 950 may include a processor and/or a microcontroller that executes one or more routines (such as firmware). The one or more routines may determine, according to the received statistical data, control parameters of the imaging device 910 and control parameters of the ISP processor 940.
  • control parameters of the imaging device 910 may include sensor 920 control parameters (such as gain, integration time for exposure control, and image stabilization parameters), camera flash control parameters, lens 912 control parameters (such as focal length for focusing or zooming), or a combination of these parameters.
  • ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (eg, during RGB processing), and lens 912 shading correction parameters.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM), which is used as external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
  • the program can be stored in a non-volatile computer-readable storage medium.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to an image processing method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: acquiring an image to be detected; performing scene recognition on the image to be detected according to a multi-label classification model to obtain labels corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements; and outputting the labels corresponding to the image to be detected as the result of the scene recognition.

Description

Image processing method and apparatus, storage medium, and electronic device
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201810585679.3, filed with the Chinese Patent Office on June 8, 2018 and entitled "Image Processing Method and Apparatus, Storage Medium, Electronic Device", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of computer technology, and in particular to an image processing method and apparatus, a storage medium, and an electronic device.
Background
With the popularity of mobile terminals and the rapid development of the mobile Internet, mobile terminals are being used by more and more users. The camera function of a mobile terminal has become one of the most commonly used functions. During or after taking a photo, the mobile terminal may perform scene recognition on the image to provide an intelligent experience for the user.
Summary
Embodiments of the present application provide an image processing method and apparatus, a storage medium, and an electronic device, which can improve the accuracy of scene recognition on an image.
An image processing method includes:
acquiring an image to be detected;
performing scene recognition on the image to be detected according to a multi-label classification model to obtain labels corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements; and
outputting the labels corresponding to the image to be detected as a result of the scene recognition.
An image processing apparatus includes:
an image acquisition module, configured to acquire an image to be detected;
a scene recognition module, configured to perform scene recognition on the image to be detected according to a multi-label classification model to obtain labels corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements; and
an output module, configured to output the labels corresponding to the image to be detected as a result of the scene recognition.
A computer-readable storage medium has a computer program stored thereon. When the computer program is executed by a processor, the operations of the image processing method described above are implemented.
An electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the operations of the image processing method described above are performed.
According to the image processing method and apparatus, the storage medium, and the electronic device described above, an image to be detected is acquired, and scene recognition is performed on the image to be detected according to a multi-label classification model to obtain labels corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements. The labels corresponding to the image to be detected are then output as the result of the scene recognition.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present application or the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a diagram of the internal structure of an electronic device in an embodiment;
FIG. 2 is a flowchart of an image processing method in an embodiment;
FIG. 3A is a flowchart of an image processing method in another embodiment;
FIG. 3B is a schematic diagram of the architecture of a neural network in an embodiment;
FIG. 4 is a flowchart of the method in FIG. 2 for performing scene recognition on an image according to the multi-label classification model to obtain the labels corresponding to the image;
FIG. 5 is a flowchart of an image processing method in still another embodiment;
FIG. 6 is a schematic structural diagram of an image processing apparatus in an embodiment;
FIG. 7 is a schematic structural diagram of an image processing apparatus in another embodiment;
FIG. 8 is a schematic structural diagram of the scene recognition module in FIG. 6;
FIG. 9 is a block diagram of a partial structure of a mobile phone related to the electronic device provided in an embodiment.
Detailed description
In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.
FIG. 1 is a schematic diagram of the internal structure of an electronic device in an embodiment. As shown in FIG. 1, the electronic device includes a processor, a memory, and a network interface connected through a system bus. The processor is used to provide computing and control capabilities to support the operation of the entire electronic device. The memory is used to store data, programs, and the like. At least one computer program is stored in the memory, and the computer program can be executed by the processor to implement the image processing method applicable to the electronic device provided in the embodiments of the present application. The memory may include a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM). For example, in one embodiment, the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the image processing method provided by each of the following embodiments. The internal memory provides a cached runtime environment for the operating system and computer programs in the non-volatile storage medium. The network interface may be an Ethernet card or a wireless network card, and is used to communicate with external electronic devices. The electronic device may be a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or the like.
In one embodiment, as shown in FIG. 2, an image processing method is provided. The method is described using its application to the electronic device in FIG. 1 as an example, and includes the following operations.
Operation 220: acquire an image to be detected.
A user takes a picture with an electronic device (having a camera function) to obtain an image to be detected. The image to be detected may be a photo preview screen, or a photo saved in the electronic device after being taken. The image to be detected refers to an image on which scene recognition needs to be performed, and includes both images containing only a single scene element and images containing multiple (two or more) scene elements. In general, the scene elements in an image include landscape, beach, blue sky, green grass, snow scene, night scene, darkness, backlight, sunrise/sunset, fireworks, spotlight, indoor, macro, text document, portrait, baby, cat, dog, food, and so on. Of course, this list is not exhaustive; many other categories of scene elements exist.
Operation 240: perform scene recognition on the image to be detected according to a multi-label classification model to obtain labels corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements.
After the image to be detected is acquired, scene recognition is performed on it. Specifically, a pre-trained multi-label classification model is used to perform scene recognition on the image to obtain the labels corresponding to the scenes contained in the image. The multi-label classification model is obtained from multi-label images containing multiple scene elements; that is, it is a scene recognition model obtained by performing scene recognition training with images containing multiple scene elements. After the multi-label classification model performs scene recognition on the image to be detected, the labels corresponding to the scenes contained in the image are obtained. For example, when the multi-label classification model performs scene recognition on an image to be detected that contains multiple scene elements such as a beach, blue sky, and portrait, it can directly output the labels of the image as beach, blue sky, and portrait. Beach, blue sky, and portrait are the labels corresponding to the scenes in the image to be detected.
Operation 260: output the labels corresponding to the image to be detected as the result of the scene recognition.
After the multi-label classification model performs scene recognition on the image to be detected and the labels corresponding to the scenes contained in the image are obtained, these labels are the result of the scene recognition, and this result is output.
In the embodiment of the present application, an image requiring scene recognition is acquired, scene recognition is performed on the image according to a multi-label classification model, and the labels corresponding to the image are obtained, the multi-label classification model being obtained from multi-label images containing multiple scene elements. The labels corresponding to the image are output as the result of the scene recognition. Because the multi-label classification model is a scene recognition model obtained from multi-label images containing multiple scene elements, it can perform scene recognition on an image containing different scene elements and then directly and fairly accurately output the label corresponding to each scene in that image. This improves the accuracy of scene recognition for images containing different scene elements, and at the same time improves the efficiency of scene recognition.
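The key property of the multi-label output in operation 240, as opposed to a single-label classifier, is that each label receives an independent confidence, so several labels can be emitted for one image. A minimal sketch, assuming independent sigmoid outputs per label; the label set and logit values are illustrative assumptions, not part of the embodiment:

```python
import math

# Hypothetical sketch: a multi-label head applies an independent sigmoid to
# each label's score, so one image can receive several labels at once
# (unlike a softmax head, which selects a single class).
LABELS = ["beach", "blue sky", "portrait", "dog", "snow scene"]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def scene_labels(logits, threshold=0.5):
    # Map each label's raw score to an independent confidence, then keep
    # the labels whose confidence exceeds the threshold.
    confidences = {name: sigmoid(z) for name, z in zip(LABELS, logits)}
    return [name for name, c in confidences.items() if c > threshold]

# Illustrative logits for an image containing a beach, blue sky, and a person.
print(scene_labels([0.4, 0.9, 1.4, -0.4, -0.8]))
```

With these assumed logits, the first three labels exceed the threshold while the last two do not, so the image receives the labels beach, blue sky, and portrait simultaneously.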
In one embodiment, as shown in FIG. 3A, before acquiring the image to be detected, the method includes the following operations.
Operation 320: acquire multi-label images containing multiple scene elements.
Images containing multiple scene elements are acquired; in this embodiment they are called multi-label images, because after scene recognition is performed on an image containing multiple scenes, each scene corresponds to one label, and all of these labels together constitute the labels of the image, hence "multi-label image".
Operation 340: train the multi-label classification model using the multi-label images containing multiple scene elements.
Some multi-label image samples are acquired, and scene recognition may be performed on these samples manually in advance to obtain the labels corresponding to each multi-label image sample, called the standard labels. Scene recognition training is then performed with the images in the multi-label image samples one by one, until the error between the trained scene recognition results and the standard labels becomes smaller and smaller. What is obtained after this training is a multi-label classification model that can perform scene recognition on multi-label images.
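As a concrete illustration of the training error against the manually annotated standard labels, a per-label binary cross-entropy can be used. The loss choice and all numbers below are illustrative assumptions; the embodiment only requires that the error between the recognition results and the standard labels keeps decreasing:

```python
import math

# Hypothetical sketch: per-label binary cross-entropy between the model's
# predicted confidences and the manually annotated standard labels.
# The choice of loss is an assumption; it is not specified by the embodiment.
def multi_label_loss(confidences, standard_labels):
    eps = 1e-7  # guard against log(0)
    total = 0.0
    for c, y in zip(confidences, standard_labels):
        c = min(max(c, eps), 1.0 - eps)
        total += -(y * math.log(c) + (1 - y) * math.log(1.0 - c))
    return total / len(confidences)

# Predictions close to the standard labels [1, 1, 0] yield a smaller loss;
# this per-sample error is the quantity training drives down.
close = multi_label_loss([0.9, 0.8, 0.1], [1, 1, 0])
far = multi_label_loss([0.3, 0.4, 0.7], [1, 1, 0])
print(close < far)
```

Training adjusts the model so that, sample by sample, this error keeps shrinking, which matches the "smaller and smaller" criterion above.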
In the embodiment of the present application, because the multi-label classification model is a scene recognition model obtained by training with multi-label images containing multiple scene elements, it can perform scene recognition on an image containing different scene elements and then directly and fairly accurately output the label corresponding to each scene in that image. This improves both the accuracy and the efficiency of multi-label image recognition.
In one embodiment, the multi-label classification model is constructed based on a neural network model.
The specific training method of the multi-label classification model is as follows: a training image containing a background training target and a foreground training target is input to a neural network, to obtain a first loss function reflecting the difference between the first prediction confidence and the first true confidence of each pixel in the background region of the training image, and a second loss function reflecting the difference between the second prediction confidence and the second true confidence of each pixel in the foreground region of the training image. The first prediction confidence is the confidence, predicted by the neural network, that a pixel in the background region of the training image belongs to the background training target, and the first true confidence is the pre-annotated confidence that the pixel in the training image belongs to the background training target. The second prediction confidence is the confidence, predicted by the neural network, that a pixel in the foreground region of the training image belongs to the foreground training target, and the second true confidence is the pre-annotated confidence that the pixel in the training image belongs to the foreground training target.
The first loss function and the second loss function are weighted and summed to obtain a target loss function.
The parameters of the neural network are adjusted according to the target loss function, and the neural network is trained to finally obtain the multi-label classification model. The background training target of the training image has corresponding labels, and the foreground training target also has labels.
FIG. 3B is a schematic diagram of the architecture of a neural network model in an embodiment. As shown in FIG. 3B, the input layer of the neural network receives training images with image category labels, performs feature extraction through a base network (such as a CNN), and outputs the extracted image features to the feature layer. From this feature layer, category detection is performed on the background training target to obtain the first loss function, category detection is performed on the foreground training target according to the image features to obtain the second loss function, and position detection is performed on the foreground training target according to the foreground region to obtain a position loss function. The first loss function, the second loss function, and the position loss function are weighted and summed to obtain the target loss function. The neural network may be a convolutional neural network, which includes a data input layer, convolution layers, activation layers, pooling layers, and a fully connected layer. The data input layer is used to pre-process the original image data. The pre-processing may include mean removal, normalization, dimensionality reduction, and whitening. Mean removal means centering each dimension of the input data at 0, in order to pull the center of the samples back to the origin of the coordinate system. Normalization scales the amplitudes to the same range. Whitening normalizes the amplitude on each feature axis of the data. The convolution layers are used for local correlation and window sliding. The weights with which each filter in a convolution layer is connected to a data window are fixed; each filter attends to one image feature, such as vertical edges, horizontal edges, color, or texture, and together these filters form the feature extractor set for the whole image. A filter is a weight matrix, which can be convolved with the data in different windows. The activation layers non-linearly map the outputs of the convolution layers; the activation function used may be the ReLU (Rectified Linear Unit). A pooling layer can be sandwiched between successive convolution layers to compress the amount of data and parameters and reduce overfitting; it can reduce the dimensionality of the data using the max or average method. The fully connected layer is located at the tail of the convolutional neural network, with weighted connections between all neurons of the two layers. Some convolution layers of the convolutional neural network are cascaded to a first confidence output node, some to a second confidence output node, and some to a position output node. The background classification of the image can be detected from the first confidence output node, the category of the foreground target of the image can be detected from the second confidence output node, and the position corresponding to the foreground target can be detected from the position output node.
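The weighted combination of the branch losses described above can be sketched as a single scalar; the weight values here are hypothetical hyperparameters, not values given by this embodiment:

```python
# Hedged sketch of the target loss: a weighted sum of the background-category
# loss (first loss), the foreground-category loss (second loss), and the
# foreground position loss. The weights w1, w2, w3 are assumed hyperparameters.
def target_loss(first_loss, second_loss, position_loss,
                w1=1.0, w2=1.0, w3=0.5):
    return w1 * first_loss + w2 * second_loss + w3 * position_loss

# Example: combine three branch losses into the single scalar that
# back-propagation minimizes when adjusting the network parameters.
print(target_loss(0.4, 0.6, 0.2))
```

The relative weights control how strongly each branch (background category, foreground category, foreground position) influences the parameter updates.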
Specifically, artificial neural networks (ANNs), also simply called neural networks (NNs) or connection models, abstract the neuron network of the human brain from the perspective of information processing, establish a simple model, and form different networks according to different connection methods. In engineering and academia they are often referred to directly as neural networks. An artificial neural network can be understood as a mathematical model that processes information using a structure similar to the synaptic connections of the brain.
Neural networks are often used for classification, for example, classifying spam or recognizing cats and dogs in images. A machine that can automatically classify input variables is called a classifier. The input to a classifier is a numeric vector called a feature vector. Before a classifier is used, it needs to be trained; that is, the neural network needs to be trained first.
The training of artificial neural networks relies on the back-propagation algorithm. Initially, a feature vector is fed to the input layer and an output is obtained through the network's computation. When the output layer finds that the output is inconsistent with the correct class, it makes the last layer of neurons adjust their parameters, and the last layer in turn makes the second-to-last layer connected to it adjust its own parameters, and so on, adjusting layer by layer backwards. The adjusted network is then tested again on the samples; if the output is still wrong, another round of backward adjustment follows, until the results output by the neural network agree with the correct results as closely as possible.
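The "adjust parameters, re-test, repeat" loop described above can be illustrated with a minimal gradient-descent example on a single weight; the toy data and learning rate are purely illustrative assumptions:

```python
# Toy illustration of iterative parameter adjustment: gradient descent on a
# single weight w so that the prediction w * x fits the correct answer y = 2 * x.
def train_single_weight(epochs=200, lr=0.1):
    w = 0.0
    samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    for _ in range(epochs):
        for x, y in samples:
            pred = w * x
            grad = 2.0 * (pred - y) * x  # derivative of (pred - y)**2 w.r.t. w
            w -= lr * grad               # back-propagation-style correction
    return w

# After repeated rounds of adjustment, the output agrees with the correct
# answer as closely as possible (w converges toward 2).
print(round(train_single_weight(), 3))
```

A real network applies the same idea layer by layer, propagating the error backwards through every weight instead of a single one.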
In the embodiment of the present application, the neural network model includes an input layer, hidden layers, and an output layer. Feature vectors are extracted from the multi-label images containing multiple scene elements, the feature vectors are input into the hidden layers to compute the value of the loss function, and the parameters of the neural network model are then adjusted according to the loss function so that the loss function continuously converges, thereby training the neural network model to obtain the multi-label classification model. The multi-label classification model can perform scene recognition on an input image to obtain a label for each scene contained in the image, and output these labels as the result of the scene recognition. The target loss function is obtained by a weighted sum of the first loss function corresponding to the background training target and the second loss function corresponding to the foreground training target, and the parameters of the neural network are adjusted according to the target loss function, so that the trained multi-label classification model can subsequently recognize both the background category and the labels of the foreground targets at the same time, obtaining more information and improving recognition efficiency.
In one embodiment, as shown in FIG. 4, operation 240, performing scene recognition on the image to be detected according to the multi-label classification model to obtain the labels corresponding to the image to be detected, includes:
Operation 242: perform scene recognition on the image to be detected according to the multi-label classification model to obtain initial labels of the image to be detected and the confidence corresponding to each initial label;
Operation 244: determine whether the confidence of an initial label is greater than a preset threshold;
Operation 246: when the determination result is yes, take the initial labels whose confidence is greater than the preset threshold as the labels corresponding to the image to be detected.
Even with a trained multi-label classification model, the output of image scene recognition in practice may still contain some error, so the error needs to be reduced further. In general, when the multi-label classification model obtained by the training above performs scene recognition on an image to be detected that contains multiple scene elements, it produces multiple initial labels for the image and a confidence for each initial label. For example, when scene recognition is performed on an image to be detected containing a beach, blue sky, and a portrait, the confidence that an initial label of the image is beach may be 0.6, that it is blue sky 0.7, that it is portrait 0.8, that it is dog 0.4, and that it is snow scene 0.3.
The initial labels in the recognition result are then filtered; specifically, it is determined whether the confidence of each initial label is greater than a preset threshold. The preset threshold may be a confidence threshold derived while the multi-label classification model was trained in the earlier stage: over a large number of training samples, when the loss function is relatively small and the obtained results are close to the actual results, a confidence threshold is obtained. For example, if the confidence threshold derived from a large number of training samples is 0.5, then in the example above the confidence of each initial label is compared with this preset threshold, and the initial labels whose confidence exceeds it are taken as the labels corresponding to the image. The labels obtained for the image to be detected are thus beach, blue sky, and portrait, while the two distractors dog and snow scene, whose confidence falls below the threshold, are discarded.
In this embodiment of the present application, scene recognition is performed on the image to be detected according to the multi-label classification model to obtain initial labels of the image and the confidence corresponding to each initial label. Because an initial label obtained by scene recognition is not necessarily a true label of the image to be detected, the confidence of each initial label is used to filter the initial labels, and those exceeding the confidence threshold are selected as the scene recognition result for the image. This improves the accuracy of the scene recognition result to a certain extent.
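Operations 242 to 246 can be sketched as a simple confidence filter; the label names and confidences repeat the example above, and 0.5 is the illustrative threshold mentioned there:

```python
def filter_labels(initial_labels, threshold=0.5):
    # Keep only the initial labels whose confidence exceeds the preset
    # threshold; the survivors form the scene recognition result.
    return [label for label, conf in initial_labels.items() if conf > threshold]

predictions = {"beach": 0.6, "blue sky": 0.7, "portrait": 0.8,
               "dog": 0.4, "snow scene": 0.3}
result = filter_labels(predictions)
# "dog" and "snow scene" fall below the threshold and are discarded
```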
In one embodiment, the confidence corresponding to each initial label lies in the range [0, 1].
Specifically, because the multi-label classification model is a scene recognition model trained on multi-label images containing multiple scene elements, it can perform scene recognition on images to be detected that contain different scene elements and directly output, with fairly high accuracy, the labels corresponding to the multiple scenes in the image. In this multi-label classification model the recognition process for each label is independent, so the probability of each recognized label may lie anywhere in [0, 1]. In this embodiment of the present application, the recognition processes for different labels do not affect one another, so all scenes contained in the image to be detected can be recognized comprehensively, avoiding omissions.
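A common way to obtain independent per-label confidences in [0, 1] is one sigmoid per output unit; this is an illustrative assumption about the output layer, not a detail fixed by this disclosure:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def label_confidences(logits):
    # One independent sigmoid per label: each confidence lies in [0, 1]
    # on its own, and unlike softmax the values need not sum to 1,
    # so recognising one label does not suppress another.
    return {label: sigmoid(z) for label, z in logits.items()}

confs = label_confidences({"beach": 1.2, "blue sky": 0.8, "dog": -0.5})
```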
In one embodiment, as shown in FIG. 5, after the labels corresponding to the image to be detected are output as the scene recognition result, the method includes:
Operation 520: obtain position information recorded when the image to be detected was captured;
Operation 540: correct the scene recognition result according to the position information to obtain the final, corrected scene recognition result.
Specifically, an electronic device generally records the location of each shot, usually using GPS (Global Positioning System) to record the address information. The address information recorded by the electronic device is obtained, and the position information of the image to be detected is then derived from it. Corresponding scene categories, together with a weight for each scene category, are matched to different address information in advance. Specifically, this matching may be based on the results of a statistical analysis of a large volume of image material. For example, such an analysis may show that when the address information reads "XXX grassland", the scene "green grass" corresponding to the address "grassland" has a weight of 9, "snow scene" a weight of 7, "landscape" a weight of 4, "blue sky" a weight of 6, and "beach" a weight of -8, with weights taking values in [-10, 10].
The larger the weight, the greater the probability that the scene appears in the image; the smaller the weight, the smaller that probability. The scene recognition result can therefore be corrected according to the address information recorded when the image was captured and the probabilities of the scenes associated with that address, yielding the final, corrected scene recognition result. For example, if the address information of a picture is "XXX grassland", the scenes "green grass", "snow scene", and "blue sky" corresponding to "XXX grassland" have high weights, so these scenes are likely to appear. The scene recognition result is corrected accordingly: if "green grass", "snow scene", or "blue sky" appears in the result, it can be kept as part of the final result; if the scene "beach" appears, it should be filtered out according to the address information recorded at capture time, removing the "beach" scene and avoiding incorrect, implausible scene categories.
In this embodiment of the present application, position information recorded when the image to be detected was captured is obtained, and the scene recognition result is corrected according to this position information to obtain the final, corrected result. The scene categories of the image to be detected, derived from its shooting address information, can thus be used to calibrate the scene recognition result, ultimately improving the accuracy of scene detection.
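The location-based correction can be sketched as a lookup over pre-matched scene weights; the weight table and the cutoff of 0 are illustrative assumptions based on the grassland example above, not values fixed by this disclosure:

```python
# Hypothetical weight table, assumed to be derived offline from statistical
# analysis of a large image corpus; weights take values in [-10, 10].
LOCATION_WEIGHTS = {
    "grassland": {"green grass": 9, "snow scene": 7, "landscape": 4,
                  "blue sky": 6, "beach": -8},
}

def correct_by_location(labels, location, cutoff=0):
    # Drop recognised labels whose weight at the shooting location is at or
    # below the cutoff, i.e. scenes implausible at that location; labels
    # without an entry in the table are kept unchanged.
    weights = LOCATION_WEIGHTS.get(location, {})
    return [label for label in labels if weights.get(label, cutoff + 1) > cutoff]
```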
In one embodiment, after the labels corresponding to the image to be detected are output as the scene recognition result, the method further includes:
performing, on the image to be detected, image processing corresponding to the scene recognition result.
In this embodiment of the present application, after the image to be detected has undergone scene recognition through the multi-label classification model, the labels corresponding to the image are obtained and output as the scene recognition result. This result can serve as the basis for image post-processing: targeted processing can be applied to the image to be detected according to the scene recognition result, greatly improving image quality. For example, if the scene category of the image to be detected is recognized as a night scene, the image can be processed in a way suited to night scenes, such as increasing brightness. If the scene category is recognized as backlit, the image can be processed in a way suited to backlighting. And if the image is recognized as multi-label, containing for example a beach, green grass, and blue sky, the beach region can be processed in a way suited to beaches, the green-grass region in a way suited to green grass, and the blue-sky region in a way suited to blue sky, so that the entire image looks very good.
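The scene-specific post-processing can be sketched as a dispatch over the recognised labels; the handlers below are hypothetical placeholders acting on a string stand-in for real image data, not the actual processing pipeline:

```python
def process_image(image, scene_labels):
    # Apply a scene-appropriate post-processing step for each recognised
    # label; unrecognised labels are simply skipped.
    handlers = {
        "night scene": lambda img: img + "+brightness",
        "backlight": lambda img: img + "+exposure-compensation",
        "beach": lambda img: img + "+beach-tone",
        "green grass": lambda img: img + "+grass-tone",
        "blue sky": lambda img: img + "+sky-tone",
    }
    for label in scene_labels:
        if label in handlers:
            image = handlers[label](image)
    return image
```

In a real system each handler would operate on the corresponding image region rather than the whole frame.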
In a specific embodiment, an image processing method is provided, described here as applied to the electronic device in FIG. 1, and including:
Operation 1: obtain multi-label images containing multiple scene elements, and use them to train a neural network model to obtain the multi-label classification model; that is, the multi-label classification model is based on a neural network architecture;
Operation 2: perform scene recognition on the image to be detected according to the multi-label classification model to obtain initial labels of the image and the confidence corresponding to each initial label;
Operation 3: determine whether the confidence of each initial label is greater than a preset threshold; when the determination result is yes, take the initial labels whose confidence is greater than the preset threshold as the labels corresponding to the image to be detected, and output them as the scene recognition result;
Operation 4: obtain position information recorded when the image to be detected was captured, and correct the scene recognition result according to this position information to obtain the final, corrected scene recognition result;
Operation 5: according to the scene recognition result, perform on the image to be detected the image processing corresponding to that result, obtaining the processed image.
In this embodiment of the present application, because the multi-label classification model is a scene recognition model obtained from multi-label images containing multiple scene elements, scene recognition can be performed on images to be detected containing different scene elements, directly and fairly accurately outputting the labels corresponding to the multiple scenes in the image. This improves the accuracy of scene recognition on such images and, at the same time, its efficiency. The scene recognition result is corrected according to the position information recorded when the image was captured, yielding the final, corrected result; the scene categories of the image, derived from its shooting address information, calibrate the recognition result and ultimately improve the accuracy of scene detection. Moreover, the scene recognition result can serve as the basis for image post-processing, so that targeted processing can be applied to the image according to the result, greatly improving image quality.
It should be understood that although the operations in the flowcharts above are displayed in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on their execution, and they may be performed in other orders. Moreover, at least some of the operations in the figures above may comprise multiple sub-operations or stages, which are not necessarily completed at the same moment but may be executed at different times; nor must these sub-operations or stages be executed sequentially, as they may be performed in turn or alternately with other operations or with at least part of the sub-operations or stages of other operations.
In one embodiment, as shown in FIG. 6, an image processing apparatus 600 is provided, including an image acquisition module 610, a scene recognition module 620, and an output module 630, wherein:
the image acquisition module 610 is configured to obtain an image to be detected;
the scene recognition module 620 is configured to perform scene recognition on the image to be detected according to a multi-label classification model to obtain the labels corresponding to the image, the multi-label classification model being obtained from multi-label images containing multiple scene elements; and
the output module 630 is configured to output the labels corresponding to the image to be detected as the scene recognition result.
In one embodiment, as shown in FIG. 7, the image processing apparatus 600 further includes:
a multi-label image acquisition module 640, configured to obtain multi-label images containing multiple scene elements; and
a multi-label classification model training module 650, configured to train the multi-label classification model using the multi-label images containing multiple scene elements.
In one embodiment, as shown in FIG. 8, the scene recognition module 620 includes:
an initial label acquisition module 622, configured to perform scene recognition on the image to be detected according to the multi-label classification model to obtain initial labels of the image and the confidence corresponding to each initial label;
a determination module 624, configured to determine whether the confidence of an initial label is greater than a preset threshold; and
an image label generation module 626, configured to, when the determination result is yes, take the initial labels whose confidence is greater than the preset threshold as the labels corresponding to the image to be detected.
In one embodiment, the image processing apparatus 600 is further configured to obtain position information recorded when the image to be detected was captured, and to correct the scene recognition result according to this position information to obtain the final, corrected scene recognition result.
In one embodiment, the image processing apparatus 600 is further configured to perform, on the image to be detected, image processing corresponding to the scene recognition result.
The division of the image processing apparatus into the modules above is for illustration only; in other embodiments the image processing apparatus may be divided into different modules as needed to implement all or part of its functions.
Each module in the image processing apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The network interface may be an Ethernet card, a wireless network card, or the like. The modules above may be embedded, in hardware form, in a processor in the server or exist independently of it, or may be stored in software form in the server's memory, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer-readable storage medium is provided on which a computer program is stored; when the computer program is executed by a processor, the operations of the image processing methods provided in the embodiments above are implemented.
In one embodiment, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the operations of the image processing methods provided in the embodiments above are implemented.
An embodiment of the present application further provides a computer program product which, when run on a computer, causes the computer to perform the operations of the image processing methods provided in the embodiments above.
An embodiment of the present application further provides an electronic device. The electronic device includes an image processing circuit, which may be implemented using hardware and/or software components and may include various processing units defining an ISP (Image Signal Processing) pipeline. FIG. 9 is a schematic diagram of the image processing circuit in one embodiment. As shown in FIG. 9, for ease of description, only the aspects of the image processing technology relevant to the embodiments of the present application are shown.
As shown in FIG. 9, the image processing circuit includes an ISP processor 940 and a control logic 950. Image data captured by an imaging device 910 is first processed by the ISP processor 940, which analyzes the image data to capture image statistics usable for determining one or more control parameters of the imaging device 910. The imaging device 910 may include a camera with one or more lenses 912 and an image sensor 914. The image sensor 914 may include a color filter array (such as a Bayer filter); it can obtain the light intensity and wavelength information captured by each of its imaging pixels and provide a set of raw image data that the ISP processor 940 can process. A sensor 920 (such as a gyroscope) may provide acquired image processing parameters (such as image stabilization parameters) to the ISP processor 940 based on the sensor 920 interface type. The sensor 920 interface may be an SMIA (Standard Mobile Imaging Architecture) interface, another serial or parallel camera interface, or a combination of such interfaces.
In addition, the image sensor 914 may also send the raw image data to the sensor 920, which may either provide it to the ISP processor 940 based on the sensor 920 interface type or store it in an image memory 930.
The ISP processor 940 processes the raw image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the ISP processor 940 may perform one or more image processing operations on the raw image data and collect statistics about it. The image processing operations may be performed at the same or different bit-depth precision.
The ISP processor 940 may also receive image data from the image memory 930. For example, the sensor 920 interface sends the raw image data to the image memory 930, and the raw image data in the image memory 930 is then provided to the ISP processor 940 for processing. The image memory 930 may be part of a memory device, a storage device, or a separate dedicated memory within the electronic device, and may include a DMA (Direct Memory Access) feature.
When receiving raw image data from the image sensor 914 interface, from the sensor 920 interface, or from the image memory 930, the ISP processor 940 may perform one or more image processing operations, such as temporal filtering. The processed image data may be sent to the image memory 930 for further processing before being displayed. The ISP processor 940 receives the data from the image memory 930 and processes it in the raw domain and in the RGB and YCbCr color spaces. The image data processed by the ISP processor 940 may be output to a display 970 for viewing by the user and/or for further processing by a graphics engine or GPU (Graphics Processing Unit). The output of the ISP processor 940 may also be sent to the image memory 930, from which the display 970 may read the image data. In one embodiment, the image memory 930 may be configured to implement one or more frame buffers. The output of the ISP processor 940 may further be sent to an encoder/decoder 960 to encode/decode the image data; the encoded image data may be saved and decompressed before being displayed on the display 970. The encoder/decoder 960 may be implemented by a CPU, a GPU, or a coprocessor.
The statistics determined by the ISP processor 940 may be sent to the control logic 950. The statistics may include, for example, image sensor 914 statistics such as auto exposure, auto white balance, auto focus, flicker detection, black-level compensation, and lens 912 shading correction. The control logic 950 may include a processor and/or microcontroller executing one or more routines (such as firmware) which, according to the received statistics, determine the control parameters of the imaging device 910 and of the ISP processor 940. For example, the control parameters of the imaging device 910 may include sensor 920 control parameters (such as gain, integration time for exposure control, and image stabilization parameters), camera flash control parameters, lens 912 control parameters (such as the focal length for focusing or zooming), or a combination of these parameters. The ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (for example, during RGB processing), as well as lens 912 shading correction parameters.
Any reference to memory, storage, a database, or another medium used in this application may include non-volatile and/or volatile memory. Suitable non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which serves as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the embodiments above can be completed by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or the like.
The technical features of the embodiments above may be combined arbitrarily. For brevity, not all possible combinations of these technical features have been described; nevertheless, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The embodiments above express only several implementations of the present application, and their description is relatively specific and detailed, but they must not therefore be understood as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. The protection scope of this application patent shall therefore be subject to the appended claims.

Claims (16)

  1. An image processing method, comprising:
    obtaining an image to be detected;
    performing scene recognition on the image to be detected according to a multi-label classification model to obtain labels corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements; and
    outputting the labels corresponding to the image to be detected as a scene recognition result.
  2. The method according to claim 1, wherein before the obtaining of the image to be detected, the method comprises:
    obtaining multi-label images containing multiple scene elements; and
    training the multi-label classification model using the multi-label images containing multiple scene elements.
  3. 根据权利要求2所述的方法,其特征在于,所述多标签分类模型基于神经网络模型构建。The method according to claim 2, wherein the multi-label classification model is constructed based on a neural network model.
  4. 根据权利要求1所述的方法,其特征在于,所述根据多标签分类模型对所述待检测图像进行场景识别,得到所述待检测图像对应的标签,包括:The method according to claim 1, wherein performing scene recognition on the image to be detected according to a multi-label classification model, and obtaining a label corresponding to the image to be detected comprises:
    根据多标签分类模型对所述待检测图像进行场景识别,得到所述待检测图像的初始标签及所述初始标签对应的置信度;Performing scene recognition on the image to be detected according to a multi-label classification model, and obtaining an initial label of the image to be detected and a confidence level corresponding to the initial label;
    判断所述初始标签的置信度是否大于预设阈值;及Determining whether the confidence level of the initial label is greater than a preset threshold; and
    当判断结果为是,则将置信度大于预设阈值的所述初始标签作为所述待检测图像对应的标签。When the determination result is yes, the initial label with a confidence level greater than a preset threshold is used as a label corresponding to the image to be detected.
  5. 根据权利要求4所述的方法,其特征在于,所述每个初始标签对应的置信度的范围为[0,1]。The method according to claim 4, wherein the range of the confidence corresponding to each initial label is [0,1].
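Claims 4 and 5 describe a simple thresholding rule over independent per-label confidences. A minimal sketch of that post-processing step in Python (the label names and the 0.5 threshold are illustrative assumptions, not values taken from the claims):

```python
def select_scene_labels(initial_labels, threshold=0.5):
    """Keep only the initial labels whose confidence exceeds the preset threshold.

    initial_labels: dict mapping label name -> confidence in [0, 1],
    as produced by a multi-label scene recognition model.
    """
    return {label: conf
            for label, conf in initial_labels.items()
            if conf > threshold}

# Hypothetical model output for one image to be detected
initial = {"blue_sky": 0.93, "beach": 0.71, "night_scene": 0.08}
result = select_scene_labels(initial)  # {"blue_sky": 0.93, "beach": 0.71}
```

Because each confidence is judged independently, zero, one, or several labels can survive for the same image, which is what distinguishes this multi-label scheme from single-label classification.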
  6. The method according to claim 1, wherein after the outputting of the label corresponding to the image to be detected as a result of the scene recognition, the method comprises:
    obtaining location information recorded when the image to be detected was shot; and
    correcting the result of the scene recognition according to the location information to obtain a final, corrected scene recognition result.
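Claim 6 does not specify how the location-based correction is performed; one plausible reading is a filter that drops labels implausible for the shooting location. The place categories and the implausibility table below are hypothetical illustrations, not part of the claim:

```python
# Hypothetical table: scene labels implausible for a given place type
IMPLAUSIBLE = {
    "indoor": {"blue_sky", "beach", "snow"},
    "urban": {"beach"},
}

def correct_by_location(labels, place_type):
    """Drop recognized labels that contradict where the photo was taken."""
    banned = IMPLAUSIBLE.get(place_type, set())
    return [label for label in labels if label not in banned]

corrected = correct_by_location(["portrait", "blue_sky"], "indoor")  # ["portrait"]
```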
  7. The method according to claim 1, wherein after the outputting of the label corresponding to the image to be detected as a result of the scene recognition, the method further comprises:
    performing, on the image to be detected, image processing corresponding to the scene recognition result according to the result of the scene recognition.
  8. An image processing apparatus, comprising:
    an image acquisition module configured to obtain an image to be detected;
    a scene recognition module configured to perform scene recognition on the image to be detected according to a multi-label classification model to obtain a label corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements; and
    an output module configured to output the label corresponding to the image to be detected as a result of the scene recognition.
  9. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the operations of the image processing method according to any one of claims 1 to 7 are implemented.
  10. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein when executing the computer program, the processor performs the following operations:
    obtaining an image to be detected;
    performing scene recognition on the image to be detected according to a multi-label classification model to obtain a label corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements; and
    outputting the label corresponding to the image to be detected as a result of the scene recognition.
  11. The electronic device according to claim 10, wherein before the obtaining an image to be detected, the operations comprise:
    obtaining multi-label images containing multiple scene elements; and
    training the multi-label classification model using the multi-label images containing multiple scene elements.
  12. The electronic device according to claim 11, wherein the multi-label classification model is constructed based on a neural network model.
  13. The electronic device according to claim 10, wherein the performing scene recognition on the image to be detected according to a multi-label classification model to obtain a label corresponding to the image to be detected comprises:
    performing scene recognition on the image to be detected according to the multi-label classification model to obtain initial labels of the image to be detected and a confidence corresponding to each initial label;
    determining whether the confidence of each initial label is greater than a preset threshold; and
    when a determination result is yes, using the initial labels whose confidences are greater than the preset threshold as the labels corresponding to the image to be detected.
  14. The electronic device according to claim 13, wherein the confidence corresponding to each initial label is in the range [0, 1].
  15. The electronic device according to claim 10, wherein after the outputting of the label corresponding to the image to be detected as a result of the scene recognition, the operations comprise:
    obtaining location information recorded when the image to be detected was shot; and
    correcting the result of the scene recognition according to the location information to obtain a final, corrected scene recognition result.
  16. The electronic device according to claim 10, wherein after the outputting of the label corresponding to the image to be detected as a result of the scene recognition, the operations further comprise:
    performing, on the image to be detected, image processing corresponding to the scene recognition result according to the result of the scene recognition.
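The claims leave the model internals open, but a common way for a neural-network multi-label head to produce the per-label confidence in [0, 1] required by claims 5 and 14 is a sigmoid applied independently to each label's logit, rather than a softmax over all labels (which would force the labels to compete). A sketch with made-up logit values:

```python
import math

def sigmoid(x):
    # Squashes any real-valued logit into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw scores from the network for one image
logits = {"blue_sky": 2.0, "beach": 0.4, "night_scene": -3.0}

# Independent per-label confidences: they need not sum to 1,
# so one image can legitimately carry several scene labels at once.
confidences = {label: sigmoid(z) for label, z in logits.items()}
```

Each confidence can then be compared against the preset threshold of claims 4 and 13 on its own.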
PCT/CN2019/089914 2018-06-08 2019-06-04 Image processing method and apparatus, storage medium and electronic device WO2019233394A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810585679.3A CN108764208B (en) 2018-06-08 2018-06-08 Image processing method and device, storage medium and electronic equipment
CN201810585679.3 2018-06-08

Publications (1)

Publication Number Publication Date
WO2019233394A1 true WO2019233394A1 (en) 2019-12-12

Family

ID=64000474

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089914 WO2019233394A1 (en) 2018-06-08 2019-06-04 Image processing method and apparatus, storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN108764208B (en)
WO (1) WO2019233394A1 (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111008145A (en) * 2019-12-19 2020-04-14 中国银行股份有限公司 Test information acquisition method and device
CN111125177A (en) * 2019-12-26 2020-05-08 北京奇艺世纪科技有限公司 Method and device for generating data label, electronic equipment and readable storage medium
CN111128348A (en) * 2019-12-27 2020-05-08 上海联影智能医疗科技有限公司 Medical image processing method, device, storage medium and computer equipment
CN111160289A (en) * 2019-12-31 2020-05-15 欧普照明股份有限公司 Method and device for detecting accident of target user and electronic equipment
CN111291800A (en) * 2020-01-21 2020-06-16 青梧桐有限责任公司 House decoration type analysis method and system, electronic device and readable storage medium
CN111292331A (en) * 2020-02-23 2020-06-16 华为技术有限公司 Image processing method and device
CN111353549A (en) * 2020-03-10 2020-06-30 创新奇智(重庆)科技有限公司 Image tag verification method and device, electronic device and storage medium
CN111461260A (en) * 2020-04-29 2020-07-28 上海东普信息科技有限公司 Target detection method, device and equipment based on feature fusion and storage medium
CN111612034A (en) * 2020-04-15 2020-09-01 中国科学院上海微系统与信息技术研究所 Method and device for determining object recognition model, electronic equipment and storage medium
CN111709371A (en) * 2020-06-17 2020-09-25 腾讯科技(深圳)有限公司 Artificial intelligence based classification method, device, server and storage medium
CN111985449A (en) * 2020-09-03 2020-11-24 深圳壹账通智能科技有限公司 Rescue scene image identification method, device, equipment and computer medium
CN112023400A (en) * 2020-07-24 2020-12-04 上海米哈游天命科技有限公司 Height map generation method, device, equipment and storage medium
CN112579587A (en) * 2020-12-29 2021-03-30 北京百度网讯科技有限公司 Data cleaning method and device, equipment and storage medium
CN112926158A (en) * 2021-03-16 2021-06-08 上海设序科技有限公司 General design method based on parameter fine adjustment in industrial machine design scene
CN113065513A (en) * 2021-01-27 2021-07-02 武汉星巡智能科技有限公司 Method, device and equipment for optimizing self-training confidence threshold of intelligent camera
CN113177498A (en) * 2021-05-10 2021-07-27 清华大学 Image identification method and device based on object real size and object characteristics
CN113221800A (en) * 2021-05-24 2021-08-06 珠海大横琴科技发展有限公司 Monitoring and judging method and system for target to be detected
CN113329173A (en) * 2021-05-19 2021-08-31 Tcl通讯(宁波)有限公司 Image optimization method and device, storage medium and terminal equipment
CN113569593A (en) * 2020-04-28 2021-10-29 京东方科技集团股份有限公司 Intelligent vase system, flower identification and display method and electronic equipment
CN113642595A (en) * 2020-05-11 2021-11-12 北京金山数字娱乐科技有限公司 Information extraction method and device based on picture
CN114049420A (en) * 2021-10-29 2022-02-15 马上消费金融股份有限公司 Model training method, image rendering method, device and electronic equipment
CN114118114A (en) * 2020-08-26 2022-03-01 顺丰科技有限公司 Image detection method, device and storage medium thereof
CN114255381A (en) * 2021-12-23 2022-03-29 北京瑞莱智慧科技有限公司 Training method of image recognition model, image recognition method, device and medium
CN115100419A (en) * 2022-07-20 2022-09-23 中国科学院自动化研究所 Target detection method and device, electronic equipment and storage medium

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764208B (en) * 2018-06-08 2021-06-08 Oppo广东移动通信有限公司 Image processing method and device, storage medium and electronic equipment
CN109635701B (en) * 2018-12-05 2023-04-18 宽凳(北京)科技有限公司 Lane passing attribute acquisition method, lane passing attribute acquisition device and computer readable storage medium
CN109657517B (en) * 2018-12-21 2021-12-03 深圳智可德科技有限公司 Miniature two-dimensional code identification method and device, readable storage medium and code scanning gun
CN109741288B (en) * 2019-01-04 2021-07-13 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN109831629B (en) * 2019-03-14 2021-07-02 Oppo广东移动通信有限公司 Terminal photographing mode adjusting method and device, terminal and storage medium
CN109831628B (en) * 2019-03-14 2021-07-16 Oppo广东移动通信有限公司 Terminal photographing mode adjusting method and device, terminal and storage medium
CN110348291A (en) * 2019-05-28 2019-10-18 华为技术有限公司 A kind of scene recognition method, a kind of scene Recognition device and a kind of electronic equipment
CN110266946B (en) * 2019-06-25 2021-06-25 普联技术有限公司 Photographing effect automatic optimization method and device, storage medium and terminal equipment
CN110796715B (en) * 2019-08-26 2023-11-24 腾讯科技(深圳)有限公司 Electronic map labeling method, device, server and storage medium
CN110704650B (en) * 2019-09-29 2023-04-25 携程计算机技术(上海)有限公司 OTA picture tag identification method, electronic equipment and medium
CN110781834A (en) * 2019-10-28 2020-02-11 上海眼控科技股份有限公司 Traffic abnormality image detection method, device, computer device and storage medium
CN111191706A (en) * 2019-12-25 2020-05-22 深圳市赛维网络科技有限公司 Picture identification method, device, equipment and storage medium
CN111212243B (en) * 2020-02-19 2022-05-20 深圳英飞拓智能技术有限公司 Automatic exposure adjusting system for mixed line detection
CN111523390B (en) * 2020-03-25 2023-11-03 杭州易现先进科技有限公司 Image recognition method and augmented reality AR icon recognition system
CN111597921B (en) * 2020-04-28 2024-06-18 深圳市人工智能与机器人研究院 Scene recognition method, device, computer equipment and storage medium
CN111709283A (en) * 2020-05-07 2020-09-25 顺丰科技有限公司 Method and device for detecting state of logistics piece
CN111613212B (en) * 2020-05-13 2023-10-31 携程旅游信息技术(上海)有限公司 Speech recognition method, system, electronic device and storage medium
CN111626353A (en) * 2020-05-26 2020-09-04 Oppo(重庆)智能科技有限公司 Image processing method, terminal and storage medium
CN111915598B (en) * 2020-08-07 2023-10-13 温州医科大学 Medical image processing method and device based on deep learning
CN112163110B (en) * 2020-09-27 2023-01-03 Oppo(重庆)智能科技有限公司 Image classification method and device, electronic equipment and computer-readable storage medium
CN112329725B (en) * 2020-11-27 2022-03-25 腾讯科技(深圳)有限公司 Method, device and equipment for identifying elements of road scene and storage medium
CN112651332A (en) * 2020-12-24 2021-04-13 携程旅游信息技术(上海)有限公司 Scene facility identification method, system, equipment and storage medium based on photo library
CN112686316A (en) * 2020-12-30 2021-04-20 上海掌门科技有限公司 Method and equipment for determining label
CN112906811B (en) * 2021-03-09 2023-04-18 西安电子科技大学 Automatic classification method for images of engineering vehicle-mounted equipment based on Internet of things architecture
CN113222058B (en) * 2021-05-28 2024-05-10 芯算一体(深圳)科技有限公司 Image classification method, device, electronic equipment and storage medium
CN113222055B (en) * 2021-05-28 2023-01-10 新疆爱华盈通信息技术有限公司 Image classification method and device, electronic equipment and storage medium
CN113065615A (en) * 2021-06-02 2021-07-02 南京甄视智能科技有限公司 Scenario-based edge analysis algorithm issuing method and device and storage medium
CN114998357B (en) * 2022-08-08 2022-11-15 长春摩诺维智能光电科技有限公司 Industrial detection method, system, terminal and medium based on multi-information analysis
CN116310665B (en) * 2023-05-17 2023-08-15 济南博观智能科技有限公司 Image environment analysis method, device and medium
CN117671497B (en) * 2023-12-04 2024-05-28 广东筠诚建筑科技有限公司 Engineering construction waste classification method and device based on digital images

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845549A (en) * 2017-01-22 2017-06-13 珠海习悦信息技术有限公司 A kind of method and device of the scene based on multi-task learning and target identification
CN106951911A (en) * 2017-02-13 2017-07-14 北京飞搜科技有限公司 A kind of quick multi-tag picture retrieval system and implementation method
CN108052966A (en) * 2017-12-08 2018-05-18 重庆邮电大学 Remote sensing images scene based on convolutional neural networks automatically extracts and sorting technique
CN108090497A (en) * 2017-12-28 2018-05-29 广东欧珀移动通信有限公司 Video classification methods, device, storage medium and electronic equipment
CN108764208A (en) * 2018-06-08 2018-11-06 Oppo广东移动通信有限公司 Image processing method and device, storage medium, electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622281B (en) * 2017-09-20 2021-02-05 Oppo广东移动通信有限公司 Image classification method and device, storage medium and mobile terminal


Also Published As

Publication number Publication date
CN108764208B (en) 2021-06-08
CN108764208A (en) 2018-11-06

Similar Documents

Publication Publication Date Title
WO2019233394A1 (en) Image processing method and apparatus, storage medium and electronic device
WO2019233393A1 (en) Image processing method and apparatus, storage medium, and electronic device
US11138478B2 (en) Method and apparatus for training, classification model, mobile terminal, and readable storage medium
CN108764370B (en) Image processing method, image processing device, computer-readable storage medium and computer equipment
CN108777815B (en) Video processing method and device, electronic equipment and computer readable storage medium
WO2019233297A1 (en) Data set construction method, mobile terminal and readable storage medium
CN108921161B (en) Model training method and device, electronic equipment and computer readable storage medium
US10896323B2 (en) Method and device for image processing, computer readable storage medium, and electronic device
WO2020001197A1 (en) Image processing method, electronic device and computer readable storage medium
WO2019233266A1 (en) Image processing method, computer readable storage medium and electronic device
CN108810413B (en) Image processing method and device, electronic equipment and computer readable storage medium
EP3598736B1 (en) Method and apparatus for processing image
US11132771B2 (en) Bright spot removal using a neural network
WO2019233392A1 (en) Image processing method and apparatus, electronic device, and computer-readable storage medium
CN108804658B (en) Image processing method and device, storage medium and electronic equipment
WO2019233260A1 (en) Method and apparatus for pushing advertisement information, storage medium and electronic device
CN108961302B (en) Image processing method, image processing device, mobile terminal and computer readable storage medium
CN108897786B (en) Recommendation method and device of application program, storage medium and mobile terminal
WO2019233271A1 (en) Image processing method, computer readable storage medium and electronic device
WO2019223594A1 (en) Neural network model processing method and device, image processing method, and mobile terminal
WO2020001196A1 (en) Image processing method, electronic device, and computer readable storage medium
WO2019223513A1 (en) Image recognition method, electronic device and storage medium
CN108848306B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN108717530A (en) Image processing method, device, computer readable storage medium and electronic equipment
CN110956679B (en) Image processing method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19816116

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19816116

Country of ref document: EP

Kind code of ref document: A1