CN108764208B - Image processing method and device, storage medium and electronic equipment
- Publication number: CN108764208B (application number CN201810585679.3A)
- Authority: CN (China)
- Prior art keywords: image, label, detected, training, scene
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The application relates to an image processing method and device, electronic equipment and a computer-readable storage medium. An image to be detected is acquired, and scene recognition is performed on it according to a multi-label classification model to obtain the labels corresponding to the image, the multi-label classification model being obtained from multi-label images containing various scene elements. The labels corresponding to the image to be detected are output as the scene recognition result. Since the multi-label classification model is a scene recognition model obtained from multi-label images containing multiple scene elements, the labels corresponding to the multiple scenes in an image can be output directly and accurately after scene recognition is performed on images containing different scene elements. The accuracy of scene recognition for images containing different scene elements is therefore improved, and the efficiency of scene recognition is improved at the same time.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method and apparatus, a storage medium, and an electronic device.
Background
With the popularization of mobile terminals and the rapid development of the mobile internet, the number of mobile terminal users keeps increasing. The photographing function has become one of the most commonly used functions of a mobile terminal. During or after photographing, the mobile terminal may perform scene recognition on the image in order to provide an intelligent experience for the user.
Disclosure of Invention
The embodiment of the application provides an image processing method and device, a storage medium and electronic equipment, which can improve the accuracy of scene recognition on an image.
An image processing method comprising:
acquiring an image to be detected;
carrying out scene recognition on the image to be detected according to a multi-label classification model to obtain a label corresponding to the image to be detected, wherein the multi-label classification model is obtained according to a multi-label image containing various scene elements;
and outputting the label corresponding to the image to be detected as a scene identification result.
An image processing apparatus, the apparatus comprising:
the image acquisition module is used for acquiring an image to be detected;
the scene recognition module is used for carrying out scene recognition on the image to be detected according to a multi-label classification model to obtain a label corresponding to the image to be detected, and the multi-label classification model is obtained according to a multi-label image containing various scene elements;
and the output module is used for outputting the label corresponding to the image to be detected as a scene identification result.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the image processing method as described above.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor performing the steps of the image processing method as described above when executing the computer program.
With the image processing method and device, the storage medium and the electronic equipment described above, an image to be detected is acquired, and scene recognition is performed on it according to a multi-label classification model to obtain the labels corresponding to the image, the multi-label classification model being obtained from multi-label images containing various scene elements. The labels corresponding to the image to be detected are output as the scene recognition result. Because the multi-label classification model is a scene recognition model obtained from multi-label images containing various scene elements, the labels corresponding to the multiple scenes in the image to be detected can be output directly and accurately after scene recognition is performed on images containing different scene elements. The accuracy of scene recognition for images containing different scene elements is therefore improved, and the efficiency of scene recognition is improved at the same time.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a diagram of the internal structure of an electronic device in one embodiment;
FIG. 2 is a flow diagram of a method of image processing in one embodiment;
FIG. 3A is a flow chart of a method of image processing in yet another embodiment;
FIG. 3B is a schematic diagram of an embodiment of a neural network;
FIG. 4 is a flowchart of the step in FIG. 2 of performing scene recognition on the image according to the multi-label classification model to obtain the labels corresponding to the image;
FIG. 5 is a flowchart of an image processing method in yet another embodiment;
FIG. 6 is a diagram showing a configuration of an image processing apparatus according to an embodiment;
FIG. 7 is a schematic diagram showing a configuration of an image processing apparatus according to still another embodiment;
FIG. 8 is a schematic diagram of the scene recognition module of FIG. 6;
FIG. 9 is a block diagram of a partial structure of a cellular phone related to an electronic device provided in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Fig. 1 is a schematic diagram of the internal structure of an electronic device in one embodiment. As shown in fig. 1, the electronic device includes a processor, a memory, and a network interface connected by a system bus. The processor is used to provide calculation and control capability and to support the operation of the whole electronic device. The memory is used for storing data, programs and the like; at least one computer program is stored on the memory and can be executed by the processor to realize the image processing method suitable for the electronic device provided by the embodiments of the application. The memory may include a non-volatile storage medium, such as a magnetic disk, an optical disk or a Read-Only Memory (ROM), and a Random Access Memory (RAM). For example, in one embodiment, the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the image processing method provided in the following embodiments. The internal memory provides a cached execution environment for the operating system and computer programs in the non-volatile storage medium. The network interface may be an Ethernet card or a wireless network card, etc., for communicating with external electronic devices. The electronic device may be a mobile phone, a tablet computer, a personal digital assistant, a wearable device, etc.
In one embodiment, as shown in fig. 2, an image processing method is provided, which is described by taking the method as an example applied to the electronic device in fig. 1, and includes:
Step 220, acquiring an image to be detected.

The user takes a photograph with the electronic equipment (which has a photographing function), and the image to be detected is obtained. The image to be detected may be a preview picture or a picture stored in the electronic equipment after photographing. The image to be detected refers to an image that needs scene recognition, which covers both images containing only a single scene element and images containing multiple scene elements (two or more). Scene elements in an image typically include landscape, beach, blue sky, green grass, snow scene, night scene, darkness, backlight, sunrise/sunset, fireworks, spotlight, indoor, macro, text document, portrait, baby, cat, dog, gourmet food, etc. Of course, the above list is not exhaustive, and many other categories of scene elements are also included.
Step 240, carrying out scene recognition on the image to be detected according to the multi-label classification model to obtain a label corresponding to the image to be detected, wherein the multi-label classification model is obtained according to a multi-label image containing various scene elements.
And after the image to be detected is obtained, carrying out scene recognition on the image to be detected. Specifically, scene recognition is performed on the image by adopting a pre-trained multi-label classification model, and labels corresponding to scenes contained in the image are obtained. The multi-label classification model is obtained according to a multi-label image containing a plurality of scene elements. That is, the multi-label classification model is a scene recognition model obtained after performing scene recognition training using an image including a plurality of scene elements. And carrying out scene recognition on the image to be detected through the multi-label classification model to obtain a label corresponding to a scene contained in the image to be detected. For example, a multi-label classification model is used for carrying out scene recognition on an image to be detected which simultaneously comprises a plurality of scene elements such as a beach, a blue sky and a portrait, and then labels of the image to be detected can be directly output as the beach, the blue sky and the portrait. The beach, the blue sky and the portrait are labels corresponding to scenes in the image to be detected.
Step 260, outputting the label corresponding to the image to be detected as the scene recognition result.
After scene recognition is carried out on the image to be detected through the multi-label classification model, the labels corresponding to the scenes contained in the image to be detected are obtained; these labels constitute the scene recognition result, which is then output.
In the embodiment of the application, the image needing scene recognition is acquired, and scene recognition is carried out on it according to the multi-label classification model to obtain the labels corresponding to the image, the multi-label classification model being obtained from multi-label images containing various scene elements. The labels corresponding to the image to be detected are output as the scene recognition result. Since the multi-label classification model is a scene recognition model obtained from multi-label images containing multiple scene elements, the labels corresponding to the multiple scenes in an image can be output directly and accurately after scene recognition is performed on images containing different scene elements. The accuracy of scene recognition for images containing different scene elements is therefore improved, and the efficiency of scene recognition is improved at the same time.
In one embodiment, as shown in fig. 3A, before acquiring the image to be detected, the method includes:
An image containing multiple scene elements is acquired; it is referred to as a multi-label image in this embodiment because, after scene recognition is performed on an image containing multiple scenes, each scene corresponds to one label, and all of these labels together constitute the labels of the image, hence the term multi-label image.
A number of multi-label image samples are obtained, and scene recognition is performed on them manually in advance to obtain the labels corresponding to each sample; these are called standard labels. Scene recognition training is then performed with the images in the multi-label image samples one by one, until the error between the trained scene recognition result and the standard labels becomes smaller and smaller. After training, the multi-label classification model capable of scene recognition on multi-label images is obtained.
In the embodiment of the application, because the multi-label classification model is a scene recognition model obtained by training on multi-label images containing multiple scene elements, the labels corresponding to the multiple scenes in an image can be output directly and accurately after scene recognition is performed on images containing different scene elements. The accuracy of multi-label image recognition is improved, and so is its efficiency.
In one embodiment, the multi-label classification model is constructed based on a neural network model.
The specific training method of the multi-label classification model comprises the following steps: inputting a training image containing a background training target and a foreground training target into a neural network to obtain a first loss function reflecting the difference between a first prediction confidence and a first real confidence of each pixel point in a background area in the training image and a second loss function reflecting the difference between a second prediction confidence and a second real confidence of each pixel point in a foreground area in the training image; the first prediction confidence coefficient is the confidence coefficient that a certain pixel point in a background area in a training image predicted by adopting a neural network belongs to a background training target, and the first real confidence coefficient represents the confidence coefficient that a pixel point labeled in advance in the training image belongs to the background training target; the second prediction confidence coefficient is the confidence coefficient that a certain pixel point in a foreground region in the training image predicted by adopting the neural network belongs to the foreground training target, and the second real confidence coefficient represents the confidence coefficient that a pixel point labeled in advance in the training image belongs to the foreground training target;
weighting and summing the first loss function and the second loss function to obtain a target loss function;
and adjusting parameters of the neural network according to the target loss function, and training the neural network to finally obtain the multi-label classification model. The background training target of the training image has a corresponding label, and the foreground training target also has a label.
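As a concrete illustration of this weighted-sum objective, the following is a minimal PyTorch sketch; the loss type (binary cross-entropy), the 0.5/0.5 weights and all names are illustrative assumptions, since the patent does not specify an implementation:

```python
import torch
import torch.nn.functional as F

def target_loss(bg_pred, bg_true, fg_pred, fg_true, w_bg=0.5, w_fg=0.5):
    """Weighted sum of the first (background) and second (foreground) losses.

    bg_pred / fg_pred: predicted per-pixel confidences for the background and
    foreground regions; bg_true / fg_true: the pre-labeled true confidences.
    """
    first_loss = F.binary_cross_entropy(bg_pred, bg_true)
    second_loss = F.binary_cross_entropy(fg_pred, fg_true)
    return w_bg * first_loss + w_fg * second_loss
```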
FIG. 3B is a block diagram of a neural network model in accordance with an embodiment. As shown in fig. 3B, an input layer of the neural network receives a training image with an image category label, performs feature extraction through a basic network (e.g., a CNN network), and outputs the extracted image features to a feature layer. The feature layer performs category detection on the background training target to obtain a first loss function, performs category detection on the foreground training target according to the image features to obtain a second loss function, and performs position detection on the foreground training target according to the foreground region to obtain a position loss function; the first loss function, the second loss function and the position loss function are then weighted and summed to obtain the target loss function.

The neural network may be a convolutional neural network, which comprises a data input layer, convolutional calculation layers, activation layers, pooling layers and a fully connected layer. The data input layer is used for preprocessing the raw image data. The preprocessing may include de-averaging, normalization, dimensionality reduction and whitening. De-averaging means centering each dimension of the input data on 0, in order to pull the center of the sample back to the origin of the coordinate system. Normalization scales the amplitudes to the same range. Whitening normalizes the amplitude on each feature axis of the data. The convolutional calculation layer is used for local correlation and window sliding. The weights of each filter connected to a data window in the convolutional calculation layer are fixed; each filter attends to one image feature, such as a vertical edge, horizontal edge, color or texture, and together the filters form the feature extractor set of the whole image. A filter is a weight matrix, and convolution with the data in different windows is performed through this weight matrix. The activation layer performs a nonlinear mapping on the output of the convolutional layer; the activation function used may be the Rectified Linear Unit (ReLU). A pooling layer may be sandwiched between successive convolutional layers to compress the amount of data and parameters and reduce overfitting; it may use a max or mean method to reduce the dimensionality of the data. The fully connected layer is located at the tail of the convolutional neural network, where all neurons between the two layers have weighted connections. Some convolutional layers of the convolutional neural network are cascaded to a first confidence output node, some to a second confidence output node, and some to a position output node; the background classification of the image can be detected from the first confidence output node, the classification of the foreground target from the second confidence output node, and the position corresponding to the foreground target from the position output node.
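The layer stack just described can be pictured with a short PyTorch sketch. This is a minimal assumed architecture for illustration, not the network of the patent: the layer sizes, the two output heads and all names are invented here:

```python
import torch
import torch.nn as nn

class MultiLabelCNN(nn.Module):
    """Convolution / activation / pooling / fully connected stack with
    separate background and foreground confidence heads (illustrative)."""

    def __init__(self, num_labels: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),  # convolution layer
            nn.ReLU(inplace=True),                       # activation layer (ReLU)
            nn.MaxPool2d(2),                             # pooling layer
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.background_head = nn.Linear(64, num_labels)  # first confidence output
        self.foreground_head = nn.Linear(64, num_labels)  # second confidence output

    def forward(self, x):
        feat = self.features(x).flatten(1)  # fully connected layers follow
        # Independent sigmoids: each label's confidence lies in [0, 1].
        return (torch.sigmoid(self.background_head(feat)),
                torch.sigmoid(self.foreground_head(feat)))
```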
In particular, Artificial Neural Networks (ANNs), also referred to simply as Neural Networks (NNs) or connection models, abstract the neuron network of the human brain from the perspective of information processing, establish simple models, and form different networks according to different connection modes. In engineering and academia they are often referred to directly as neural networks or neural-like networks. It can be understood that an artificial neural network is a mathematical model that uses a structure similar to the synaptic connections of brain neurons for information processing.
Neural networks are commonly used for classification, e.g., spam recognition, or recognizing cats and dogs in images. A machine that automatically classifies input variables in this way is called a classifier. The input to a classifier is a vector of numerical values called a feature vector. Before a classifier can be used it needs to be trained, that is, the neural network needs to be trained first.
Training an artificial neural network relies on the back-propagation algorithm. At first, feature vectors are fed into the input layer and an output is obtained through network calculation. When the output layer finds that the output is inconsistent with the correct class label, the last layer of neurons adjusts its parameters, and it also causes the second-to-last layer of neurons connected to it to adjust theirs; in this way the adjustment propagates backwards layer by layer. The adjusted network continues to be tested on the samples, and if the output is still erroneous, the next round of adjustment follows, until the results output by the neural network are as consistent as possible with the correct results.
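The back-and-forth adjustment described here is what modern frameworks automate through autograd. A minimal sketch of one such training loop, reusing the illustrative two-head model above (the batch format and optimizer settings are assumptions):

```python
import torch
import torch.nn.functional as F

def train(model, loader, epochs=10, lr=1e-3):
    """Back-propagation training: compute the output, compare it with the
    correct labels, and let the error adjust parameters layer by layer."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, bg_true, fg_true in loader:  # assumed batch format
            bg_pred, fg_pred = model(images)      # network calculation
            loss = (F.binary_cross_entropy(bg_pred, bg_true)
                    + F.binary_cross_entropy(fg_pred, fg_true))
            optimizer.zero_grad()
            loss.backward()    # errors propagate backwards layer by layer
            optimizer.step()   # each layer adjusts its own parameters
```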
In an embodiment of the application, the neural network model comprises an input layer, a hidden layer and an output layer. A feature vector is extracted from a multi-label image containing various scene elements and fed into the hidden layer to calculate the loss function, and the parameters of the neural network model are adjusted according to the loss function so that it keeps converging, thereby training the neural network model into the multi-label classification model. The multi-label classification model can perform scene recognition on an input image to obtain the labels of each scene contained in the image, which are output as the scene recognition result. The target loss function is obtained by the weighted summation of the first loss function corresponding to the background training target and the second loss function corresponding to the foreground training target, and the parameters of the neural network are adjusted according to the target loss function, so that the multi-label classification model obtained by training can subsequently identify the labels of the background class and the foreground target at the same time, yielding more information and improving recognition efficiency.
In an embodiment, as shown in fig. 4, step 240, performing scene recognition on an image to be detected according to a multi-label classification model to obtain a label corresponding to the image to be detected, includes:
Step 242, carrying out scene recognition on the image to be detected according to the multi-label classification model to obtain the initial labels of the image to be detected and the confidence corresponding to each initial label;

Step 244, judging whether the confidence of each initial label is greater than a preset threshold value;

Step 246, if so, taking the initial labels with confidence greater than the preset threshold value as the labels corresponding to the image to be detected.
Even with the multi-label classification model obtained through training, there may be a certain error in the output of image scene recognition in practice, so the error needs to be reduced further. In general, if scene recognition is performed on an image to be detected containing multiple scene elements using the trained multi-label classification model, multiple initial labels of the image and the confidence corresponding to each initial label are obtained. For example, when scene recognition is performed on an image to be detected containing a beach, blue sky and a portrait, the confidence that an initial label of the image is beach may be 0.6, blue sky 0.7, portrait 0.8, dog 0.4 and snow scene 0.3.
The initial labels of the recognition result are then screened. Specifically, it is judged whether the confidence of each initial label is greater than a preset threshold. The preset threshold may be the confidence threshold obtained from a large number of training samples when the multi-label classification model was trained, at the point where the loss function is relatively small and the results are close to the actual results. For example, if the confidence threshold obtained from a large number of training samples is 0.5, then in the above example the initial labels with confidence greater than the preset threshold are taken as the labels corresponding to the image: the labels obtained for the image to be detected are beach, blue sky and portrait, while the two interference items with confidence below the threshold, dog and snow scene, are discarded.
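A sketch of this screening step, with the label set, confidences and the 0.5 threshold taken from the example above (all purely illustrative values):

```python
LABELS = ["beach", "blue sky", "portrait", "dog", "snow scene"]

def screen_labels(confidences, threshold=0.5):
    """Keep only the initial labels whose confidence exceeds the preset threshold."""
    return [label for label, conf in zip(LABELS, confidences) if conf > threshold]

# Confidences from the example: dog (0.4) and snow scene (0.3) are discarded.
print(screen_labels([0.6, 0.7, 0.8, 0.4, 0.3]))
# ['beach', 'blue sky', 'portrait']
```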
In the embodiment of the application, scene recognition is carried out on the image to be detected according to the multi-label classification model to obtain the initial labels of the image and the confidence corresponding to each. Because an initial label obtained by scene recognition is not necessarily a real label of the image to be detected, the confidence of each initial label is used to screen the initial labels, and those above the confidence threshold are retained as the scene recognition result for the image. The accuracy of the scene recognition result is thereby improved to a certain extent.
In one embodiment, the confidence corresponding to each initial label is in the range [0, 1].
Specifically, because the multi-label classification model is a scene recognition model obtained by training on multi-label images containing multiple scene elements, the labels respectively corresponding to the multiple scenes in the image to be detected can be output directly and accurately after scene recognition is performed on images containing different scene elements. The recognition process for each label in the multi-label classification model is independent, so the probability of each recognized label can lie anywhere in [0, 1]. In the embodiment of the application, the recognition processes of different labels do not affect one another, so all the scenes contained in the image to be detected can be recognized comprehensively, without omission.
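This independence is what distinguishes a per-label sigmoid output from a softmax, whose scores must sum to 1 and therefore compete with each other. A small sketch of the difference, with illustrative logits:

```python
import torch

logits = torch.tensor([2.0, 1.5, -1.0])

# Softmax couples the scores: a strong label suppresses the others,
# which is unsuitable when several scenes co-occur in one image.
print(torch.softmax(logits, dim=0))   # approx. tensor([0.60, 0.37, 0.03])

# A per-label sigmoid keeps every score independent in [0, 1], so an image
# can score high for beach, blue sky and portrait at the same time.
print(torch.sigmoid(logits))          # approx. tensor([0.88, 0.82, 0.27])
```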
In one embodiment, as shown in fig. 5, after outputting the label corresponding to the image to be detected as the result of scene recognition, the method includes:
Step 520, acquiring the position information of the image to be detected at the time of shooting;

Step 540, correcting the scene recognition result according to the position information to obtain the corrected final scene recognition result.
Specifically, the electronic equipment generally records the location of each shot, typically recording the address information by means of the Global Positioning System (GPS). The address information recorded by the electronic equipment is acquired, and the position information of the image to be detected is obtained from it. Corresponding scene types, and a weight for each scene type, are matched to different address information in advance. Specifically, the scene types and weights corresponding to different address information may be matched according to the results of statistical analysis on a large number of image materials. For example, statistical analysis of a large number of image materials may show that when the address information reads "XXX grassland", the weight of "green grass" corresponding to "grassland" is 9, that of "snow scene" is 7, "landscape" 4, "blue sky" 6 and "beach" -8, the weights ranging over [-10, 10]. A larger weight indicates a larger probability of the scene appearing in the image, and a smaller weight a smaller probability. The scene recognition result can therefore be corrected according to the address information at the time of shooting and the probability of the scenes corresponding to that address information, yielding the corrected final scene recognition result. For example, if the address information of a picture is "XXX grassland", then the scenes with higher weights for "XXX grassland" are "green grass", "snow scene" and "blue sky", and these scenes are more likely to occur. The scene recognition result is corrected accordingly: if "green grass", "snow scene" or "blue sky" appears in the scene recognition result, it can be kept in the final result of scene recognition; if a "beach" scene appears in the scene recognition result, the "beach" scene is filtered out according to the address information at the time of shooting, avoiding a scene result that is incorrect and inconsistent with reality.
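A sketch of this correction step, using the grassland weights from the example; the weight table, the cutoff of 0 and all names are illustrative assumptions, since the patent does not fix a concrete filtering rule:

```python
# Scene weights per address keyword, on the [-10, 10] scale from the text.
SCENE_WEIGHTS = {
    "grassland": {"green grass": 9, "snow scene": 7, "blue sky": 6,
                  "landscape": 4, "beach": -8},
}

def correct_result(labels, address_keyword, cutoff=0):
    """Drop recognized labels whose location-based weight falls below the cutoff."""
    weights = SCENE_WEIGHTS.get(address_keyword, {})
    return [label for label in labels if weights.get(label, 0) >= cutoff]

print(correct_result(["green grass", "blue sky", "beach"], "grassland"))
# ['green grass', 'blue sky']  -- the implausible 'beach' label is filtered out
```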
In the embodiment of the application, the position information of the image to be detected at the time of shooting is acquired, and the scene recognition result is corrected according to it to obtain the corrected final scene recognition result. The scene types inferred from the shooting address information of the image to be detected can be used to calibrate the scene recognition result, which ultimately improves the accuracy of scene detection.
In one embodiment, after outputting the label corresponding to the image to be detected as the result of scene recognition, the method further includes:
and performing image processing corresponding to the scene recognition result on the image to be detected according to the scene recognition result.
In the embodiment of the application, after scene recognition is performed on the image to be detected through the multi-label classification model, the labels corresponding to the image are obtained and output as the scene recognition result. The scene recognition result can serve as the basis for image post-processing: targeted image processing can be applied to the image to be detected according to the scene recognition result, which greatly improves image quality. For example, if the scene type of the image to be detected is identified as a night scene, the image may be processed in a mode suitable for night scenes, such as increasing brightness. If the scene type is identified as backlight, a processing mode suitable for backlight may be adopted. And if the scene type of the image is identified as multi-label, for example containing beach, green grass and blue sky, then the beach region can be processed in a mode suitable for beaches, the green grass region in a mode suitable for green grass, and the blue sky region in a mode suitable for blue sky, so that the whole image achieves a very good effect.
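As one possible realization of this label-driven post-processing, here is a sketch using Pillow's ImageEnhance helpers; the whole-image adjustments and the factors are illustrative stand-ins for the scene-specific (and per-region) modes the text mentions:

```python
from PIL import Image, ImageEnhance

def enhance_for_scenes(image, labels):
    """Apply illustrative, label-specific post-processing to the image."""
    if "night scene" in labels:
        image = ImageEnhance.Brightness(image).enhance(1.3)  # lift brightness
    if "backlight" in labels:
        image = ImageEnhance.Contrast(image).enhance(1.2)    # recover contrast
    if {"beach", "green grass", "blue sky"} & set(labels):
        image = ImageEnhance.Color(image).enhance(1.1)       # richer colors
    return image

# Usage sketch: enhance_for_scenes(Image.open("photo.jpg"), ["night scene"])
```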
In a specific embodiment, an image processing method is provided, which is described by taking the application of the method to the electronic device in fig. 1 as an example, and includes:
firstly, acquiring multi-label images containing various scene elements, and training a neural network model with the multi-label images to obtain a multi-label classification model, i.e. the multi-label classification model is based on a neural network framework;

secondly, performing scene recognition on the image to be detected according to the multi-label classification model to obtain the initial labels of the image to be detected and the confidence corresponding to each initial label;

thirdly, judging whether the confidence of each initial label is greater than a preset threshold value; if so, taking the initial labels with confidence greater than the preset threshold value as the labels corresponding to the image to be detected, and outputting the labels corresponding to the image to be detected as the scene recognition result;

fourthly, acquiring the position information of the image to be detected at the time of shooting, and correcting the scene recognition result according to the position information to obtain the corrected final scene recognition result;

and fifthly, performing image processing corresponding to the scene recognition result on the image to be detected according to the scene recognition result, to obtain the processed image.
In the embodiment of the application, because the multi-label classification model is a scene recognition model obtained according to a multi-label image containing multiple scene elements, labels corresponding to multiple scenes in the image can be directly and accurately output after scene recognition is performed on an image to be detected containing different scene elements. Therefore, the accuracy of scene recognition of the to-be-detected images containing different scene elements is improved, and the efficiency of scene recognition is improved. And correcting the scene recognition result according to the position information of the image to be detected during shooting to obtain the corrected final scene recognition result. The scene type of the image to be detected, which is acquired through the shooting address information of the image to be detected, can be used for calibrating the scene identification result, so that the accuracy of scene detection is finally improved. And the result of scene recognition can be used as the basis of image post-processing, and the image can be subjected to targeted image processing according to the result of scene recognition, so that the quality of the image is greatly improved.
In one embodiment, as shown in fig. 6, there is provided an image processing apparatus 600, the apparatus comprising: an image acquisition module 610, a scene recognition module 620, and an output module 630. Wherein,
an image obtaining module 610, configured to obtain an image to be detected;
the scene recognition module 620 is configured to perform scene recognition on an image to be detected according to a multi-label classification model, so as to obtain a label corresponding to the image to be detected, where the multi-label classification model is obtained according to a multi-label image including multiple scene elements;
and the output module 630 is configured to output a label corresponding to the image to be detected as a result of scene recognition.
In one embodiment, as shown in fig. 7, there is provided an image processing apparatus 600, the apparatus further comprising:
a multi-label image obtaining module 640, configured to obtain a multi-label image including multiple scene elements;
a multi-label classification model training module 650 for training a multi-label classification model using a multi-label image containing a plurality of scene elements.
In one embodiment, as shown in FIG. 8, the scene recognition module 620 includes:
an initial label obtaining module 622, configured to perform scene recognition on an image to be detected according to the multi-label classification model, so as to obtain an initial label of the image to be detected and a confidence corresponding to the initial label;
a judging module 624, configured to judge whether the confidence of the initial tag is greater than a preset threshold;
and an image tag generating module 626, configured to, if yes, use the initial tag with the confidence coefficient greater than the preset threshold as the tag corresponding to the image to be detected.
In one embodiment, an image processing apparatus 600 is provided, which is further configured to acquire position information at the time of photographing an image to be detected; and correcting the scene recognition result according to the position information to obtain a corrected final scene recognition result.
In one embodiment, an image processing apparatus 600 is provided, which is further configured to perform image processing corresponding to a scene recognition result on an image to be detected according to the scene recognition result.
The division of the modules in the image processing apparatus is only for illustration, and in other embodiments, the image processing apparatus may be divided into different modules as needed to complete all or part of the functions of the image processing apparatus.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, implements the steps of the image processing method provided by the above embodiments.
In one embodiment, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the image processing method provided in the above embodiments are implemented.
The embodiments of the present application also provide a computer program product, which when run on a computer, causes the computer to execute the steps of the image processing method provided in the foregoing embodiments.
The embodiment of the application also provides the electronic equipment. The electronic device includes therein an Image Processing circuit, which may be implemented using hardware and/or software components, and may include various Processing units defining an ISP (Image Signal Processing) pipeline. FIG. 9 is a schematic diagram of an image processing circuit in one embodiment. As shown in fig. 9, for convenience of explanation, only aspects of the image processing technique related to the embodiments of the present application are shown.
As shown in fig. 9, the image processing circuit includes an ISP processor 940 and a control logic 950. The image data captured by the imaging device 910 is first processed by the ISP processor 940, and the ISP processor 940 analyzes the image data to capture image statistics that may be used to determine and/or control one or more parameters of the imaging device 910. The imaging device 910 may include a camera having one or more lenses 912 and an image sensor 914. Image sensor 914 may include an array of color filters (e.g., Bayer filters), and image sensor 914 may acquire light intensity and wavelength information captured with each imaging pixel of image sensor 914 and provide a set of raw image data that may be processed by ISP processor 940. The sensor 920 (e.g., a gyroscope) may provide parameters for processing the acquired image (e.g., anti-shake parameters) to the ISP processor 940, based on the interface type of the sensor 920. The sensor 920 interface may utilize an SMIA (Standard Mobile Imaging Architecture) interface, other serial or parallel camera interfaces, or a combination of the above.
In addition, image sensor 914 may also send raw image data to sensor 920, sensor 920 may provide raw image data to ISP processor 940 based on the type of interface of sensor 920, or sensor 920 may store raw image data in image memory 930.
The ISP processor 940 processes the raw image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the ISP processor 940 may perform one or more image processing operations on the raw image data, collecting statistical information about the image data. Wherein the image processing operations may be performed with the same or different bit depth precision.
Upon receiving raw image data from the image sensor 914 interface, from the sensor 920 interface or from image memory 930, ISP processor 940 may perform one or more image processing operations, such as temporal filtering. The processed image data may be sent to image memory 930 for additional processing before being displayed. ISP processor 940 receives the processed data from image memory 930 and performs image data processing on it in the raw domain and in the RGB and YCbCr color spaces. The image data processed by ISP processor 940 may be output to display 970 for viewing by a user and/or further processed by a Graphics Processing Unit (GPU). Further, the output of ISP processor 940 may also be sent to image memory 930, and display 970 may read image data from image memory 930. In one embodiment, image memory 930 may be configured to implement one or more frame buffers. In addition, the output of the ISP processor 940 may be transmitted to an encoder/decoder 960 for encoding/decoding the image data. The encoded image data may be saved, and decompressed before being displayed on the display 970. The encoder/decoder 960 may be implemented by a CPU or GPU or coprocessor.
The statistical data determined by the ISP processor 940 may be transmitted to the control logic 950 unit. For example, the statistical data may include image sensor 914 statistics such as auto-exposure, auto-white balance, auto-focus, flicker detection, black level compensation, lens 912 shading correction, and the like. The control logic 950 may include a processor and/or microcontroller that executes one or more routines (e.g., firmware) that may determine control parameters of the imaging device 910 and control parameters of the ISP processor 940 based on the received statistical data. For example, the control parameters of imaging device 910 may include sensor 920 control parameters (e.g., gain, integration time for exposure control, anti-shake parameters, etc.), camera flash control parameters, lens 912 control parameters (e.g., focal length for focusing or zooming), or a combination of these parameters. The ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (e.g., during RGB processing), as well as lens 912 shading correction parameters.
Any reference to memory, storage, a database or another medium used herein may include non-volatile and/or volatile memory. Suitable non-volatile memory can include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (10)
1. An image processing method, comprising:
acquiring an image to be detected;
scene recognition is carried out on the image to be detected according to a multi-label classification model, a plurality of labels corresponding to the image to be detected are directly obtained, and the multi-label classification model is obtained according to a multi-label image containing a plurality of scene elements; the multi-label classification model is constructed based on a neural network model, and the specific training method of the multi-label classification model comprises the following steps: inputting a training image containing a background training target and a foreground training target into a neural network to obtain a first loss function reflecting the difference between a first prediction confidence and a first real confidence of each pixel point in a background area in the training image and a second loss function reflecting the difference between a second prediction confidence and a second real confidence of each pixel point in a foreground area in the training image; the first prediction confidence coefficient is the confidence coefficient that a certain pixel point in a background area in a training image predicted by adopting a neural network belongs to a background training target, and the first real confidence coefficient represents the confidence coefficient that a pixel point labeled in advance in the training image belongs to the background training target; the second prediction confidence coefficient is the confidence coefficient that a certain pixel point in a foreground region in the training image predicted by adopting a neural network belongs to the foreground training target, and the second real confidence coefficient represents the confidence coefficient that a pixel point labeled in advance in the training image belongs to the foreground training target;
weighting and summing the first loss function and the second loss function to obtain a target loss function;
adjusting parameters of a neural network according to the target loss function, and training the neural network to obtain a multi-label classification model; the background training target of the training image is provided with a corresponding label, and the foreground training target is also provided with a corresponding label;
outputting a plurality of labels corresponding to the image to be detected as a scene recognition result;
and respectively carrying out image processing corresponding to the scene recognition result on the corresponding area in the image to be detected according to the scene recognition result.
2. The method according to claim 1, characterized in that it comprises, before said acquisition of the image to be detected:
acquiring a multi-label image containing a plurality of scene elements;
training the multi-label classification model using the multi-label image comprising the plurality of scene elements.
3. The method according to claim 1, wherein the performing scene recognition on the image to be detected according to the multi-label classification model to obtain the label corresponding to the image to be detected comprises:
carrying out scene recognition on the image to be detected according to the multi-label classification model to obtain an initial label of the image to be detected and a confidence coefficient corresponding to the initial label;
judging whether the confidence of the initial label is greater than a preset threshold value or not;
and if so, taking the initial label with the confidence coefficient larger than a preset threshold value as a label corresponding to the image to be detected.
4. The method of claim 3, wherein the confidence level for each of the initial labels is in the range [0, 1].
5. The method according to claim 1, wherein after outputting the label corresponding to the image to be detected as a result of scene recognition, the method comprises:
acquiring position information of the image to be detected during shooting;
and correcting the scene recognition result according to the position information to obtain a corrected scene recognition final result.
6. An image processing apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring an image to be detected;
the scene recognition module is used for carrying out scene recognition on the image to be detected according to a multi-label classification model to directly obtain a plurality of labels corresponding to the image to be detected, and the multi-label classification model is obtained according to a multi-label image containing a plurality of scene elements; the multi-label classification model is constructed based on a neural network model, and the specific training method of the multi-label classification model comprises the following steps: inputting a training image containing a background training target and a foreground training target into a neural network to obtain a first loss function reflecting the difference between a first prediction confidence and a first real confidence of each pixel point in a background area in the training image and a second loss function reflecting the difference between a second prediction confidence and a second real confidence of each pixel point in a foreground area in the training image; the first prediction confidence coefficient is the confidence coefficient that a certain pixel point in a background area in a training image predicted by adopting a neural network belongs to a background training target, and the first real confidence coefficient represents the confidence coefficient that a pixel point labeled in advance in the training image belongs to the background training target; the second prediction confidence coefficient is the confidence coefficient that a certain pixel point in a foreground region in the training image predicted by adopting a neural network belongs to the foreground training target, and the second real confidence coefficient represents the confidence coefficient that a pixel point labeled in advance in the training image belongs to the foreground training target; weighting and summing the first loss function and the second loss function to obtain a target loss function; adjusting parameters of a neural network according to the target loss function, and training the neural network to obtain a multi-label classification model; the background training target of the training image is provided with a corresponding label, and the foreground training target is also provided with a corresponding label;
the output module is used for outputting a plurality of labels corresponding to the image to be detected as a scene identification result;
and the image processing module is used for respectively carrying out image processing corresponding to the scene recognition result on the corresponding area in the image to be detected according to the scene recognition result.
7. The apparatus of claim 6, further comprising:
the multi-label image acquisition module is used for acquiring a multi-label image containing various scene elements;
and the multi-label classification model training module is used for training a multi-label classification model by using a multi-label image containing various scene elements.
8. The apparatus of claim 6, wherein the scene recognition module comprises:
the initial label obtaining module is used for carrying out scene recognition on the image to be detected according to the multi-label classification model to obtain an initial label of the image to be detected and a confidence coefficient corresponding to the initial label;
the judging module is used for judging whether the confidence coefficient of the initial label is greater than a preset threshold value or not;
and the image tag generation module is used for, if so, taking the initial tag with the confidence coefficient greater than the preset threshold value as the tag corresponding to the image to be detected.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the image processing method according to any one of claims 1 to 5.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the image processing method according to any of claims 1 to 5 are implemented by the processor when executing the computer program.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810585679.3A CN108764208B (en) | 2018-06-08 | 2018-06-08 | Image processing method and device, storage medium and electronic equipment |
PCT/CN2019/089914 WO2019233394A1 (en) | 2018-06-08 | 2019-06-04 | Image processing method and apparatus, storage medium and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810585679.3A CN108764208B (en) | 2018-06-08 | 2018-06-08 | Image processing method and device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108764208A CN108764208A (en) | 2018-11-06 |
CN108764208B (en) | 2021-06-08
Family
ID=64000474
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810585679.3A Expired - Fee Related CN108764208B (en) | 2018-06-08 | 2018-06-08 | Image processing method and device, storage medium and electronic equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108764208B (en) |
WO (1) | WO2019233394A1 (en) |
Families Citing this family (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108764208B (en) * | 2018-06-08 | 2021-06-08 | Oppo广东移动通信有限公司 | Image processing method and device, storage medium and electronic equipment |
CN109635701B (en) * | 2018-12-05 | 2023-04-18 | 宽凳(北京)科技有限公司 | Lane passing attribute acquisition method, lane passing attribute acquisition device and computer readable storage medium |
CN109657517B (en) * | 2018-12-21 | 2021-12-03 | 深圳智可德科技有限公司 | Miniature two-dimensional code identification method and device, readable storage medium and code scanning gun |
US20200210788A1 (en) * | 2018-12-31 | 2020-07-02 | Robert Bosch Gmbh | Determining whether image data is within a predetermined range that image analysis software is configured to analyze |
CN109741288B (en) * | 2019-01-04 | 2021-07-13 | Oppo广东移动通信有限公司 | Image processing method, image processing device, storage medium and electronic equipment |
CN109831629B (en) * | 2019-03-14 | 2021-07-02 | Oppo广东移动通信有限公司 | Terminal photographing mode adjusting method and device, terminal and storage medium |
CN109831628B (en) * | 2019-03-14 | 2021-07-16 | Oppo广东移动通信有限公司 | Terminal photographing mode adjusting method and device, terminal and storage medium |
CN110348291A (en) * | 2019-05-28 | 2019-10-18 | 华为技术有限公司 | A kind of scene recognition method, a kind of scene Recognition device and a kind of electronic equipment |
CN110266946B (en) * | 2019-06-25 | 2021-06-25 | 普联技术有限公司 | Photographing effect automatic optimization method and device, storage medium and terminal equipment |
CN110796715B (en) * | 2019-08-26 | 2023-11-24 | 腾讯科技(深圳)有限公司 | Electronic map labeling method, device, server and storage medium |
CN110704650B (en) * | 2019-09-29 | 2023-04-25 | 携程计算机技术(上海)有限公司 | OTA picture tag identification method, electronic equipment and medium |
CN110781834A (en) * | 2019-10-28 | 2020-02-11 | 上海眼控科技股份有限公司 | Traffic abnormality image detection method, device, computer device and storage medium |
CN111008145B (en) * | 2019-12-19 | 2023-09-22 | 中国银行股份有限公司 | Test information acquisition method and device |
CN111191706A (en) * | 2019-12-25 | 2020-05-22 | 深圳市赛维网络科技有限公司 | Picture identification method, device, equipment and storage medium |
CN111125177B (en) * | 2019-12-26 | 2024-01-16 | 北京奇艺世纪科技有限公司 | Method and device for generating data tag, electronic equipment and readable storage medium |
CN111128348B (en) * | 2019-12-27 | 2024-03-26 | 上海联影智能医疗科技有限公司 | Medical image processing method, medical image processing device, storage medium and computer equipment |
CN111160289A (en) * | 2019-12-31 | 2020-05-15 | 欧普照明股份有限公司 | Method and device for detecting accident of target user and electronic equipment |
CN111291800A (en) * | 2020-01-21 | 2020-06-16 | 青梧桐有限责任公司 | House decoration type analysis method and system, electronic device and readable storage medium |
CN111212243B (en) * | 2020-02-19 | 2022-05-20 | 深圳英飞拓智能技术有限公司 | Automatic exposure adjusting system for mixed line detection |
CN111292331B (en) * | 2020-02-23 | 2023-09-12 | 华为云计算技术有限公司 | Image processing method and device |
CN111353549B (en) * | 2020-03-10 | 2023-01-31 | 创新奇智(重庆)科技有限公司 | Image label verification method and device, electronic equipment and storage medium |
CN111523390B (en) * | 2020-03-25 | 2023-11-03 | 杭州易现先进科技有限公司 | Image recognition method and augmented reality AR icon recognition system |
CN111612034B (en) * | 2020-04-15 | 2024-04-12 | 中国科学院上海微系统与信息技术研究所 | Method and device for determining object recognition model, electronic equipment and storage medium |
CN113569593A (en) * | 2020-04-28 | 2021-10-29 | 京东方科技集团股份有限公司 | Intelligent vase system, flower identification and display method and electronic equipment |
CN111597921B (en) * | 2020-04-28 | 2024-06-18 | 深圳市人工智能与机器人研究院 | Scene recognition method, device, computer equipment and storage medium |
CN111461260B (en) * | 2020-04-29 | 2023-04-18 | 上海东普信息科技有限公司 | Target detection method, device and equipment based on feature fusion and storage medium |
CN111709283A (en) * | 2020-05-07 | 2020-09-25 | 顺丰科技有限公司 | Method and device for detecting state of logistics piece |
CN113642595B (en) * | 2020-05-11 | 2024-09-17 | 北京金山数字娱乐科技有限公司 | Picture-based information extraction method and device |
CN111613212B (en) * | 2020-05-13 | 2023-10-31 | 携程旅游信息技术(上海)有限公司 | Speech recognition method, system, electronic device and storage medium |
CN111626353A (en) * | 2020-05-26 | 2020-09-04 | Oppo(重庆)智能科技有限公司 | Image processing method, terminal and storage medium |
CN111709371B (en) * | 2020-06-17 | 2023-12-22 | 腾讯科技(深圳)有限公司 | Classification method, device, server and storage medium based on artificial intelligence |
CN112023400B (en) * | 2020-07-24 | 2024-07-26 | 上海米哈游天命科技有限公司 | Altitude map generation method, device, equipment and storage medium |
CN111915598B (en) * | 2020-08-07 | 2023-10-13 | 温州医科大学 | Medical image processing method and device based on deep learning |
CN114118114A (en) * | 2020-08-26 | 2022-03-01 | 顺丰科技有限公司 | Image detection method, device and storage medium thereof |
CN111985449A (en) * | 2020-09-03 | 2020-11-24 | 深圳壹账通智能科技有限公司 | Rescue scene image identification method, device, equipment and computer medium |
CN112163110B (en) * | 2020-09-27 | 2023-01-03 | Oppo(重庆)智能科技有限公司 | Image classification method and device, electronic equipment and computer-readable storage medium |
CN112329725B (en) * | 2020-11-27 | 2022-03-25 | 腾讯科技(深圳)有限公司 | Method, device and equipment for identifying elements of road scene and storage medium |
CN114647876A (en) * | 2020-12-18 | 2022-06-21 | 阿里巴巴集团控股有限公司 | Data processing method and device, electronic equipment and computer storage medium |
CN112651332B (en) * | 2020-12-24 | 2024-08-02 | 携程旅游信息技术(上海)有限公司 | Scene facility identification method, system, equipment and storage medium based on photo library |
CN112579587B (en) * | 2020-12-29 | 2024-07-02 | 纽扣互联(北京)科技有限公司 | Data cleaning method and device, equipment and storage medium |
CN112686316A (en) * | 2020-12-30 | 2021-04-20 | 上海掌门科技有限公司 | Method and equipment for determining label |
CN113065513B (en) * | 2021-01-27 | 2024-07-09 | 武汉星巡智能科技有限公司 | Optimization method, device and equipment for self-training confidence threshold of intelligent camera |
CN112906811B (en) * | 2021-03-09 | 2023-04-18 | 西安电子科技大学 | Automatic classification method for images of engineering vehicle-mounted equipment based on Internet of things architecture |
CN112926158B (en) * | 2021-03-16 | 2023-07-14 | 上海设序科技有限公司 | General design method based on parameter fine adjustment in industrial machinery design scene |
CN113177498B (en) * | 2021-05-10 | 2022-08-09 | 清华大学 | Image identification method and device based on object real size and object characteristics |
CN113329173A (en) * | 2021-05-19 | 2021-08-31 | Tcl通讯(宁波)有限公司 | Image optimization method and device, storage medium and terminal equipment |
CN113221800A (en) * | 2021-05-24 | 2021-08-06 | 珠海大横琴科技发展有限公司 | Monitoring and judging method and system for target to be detected |
CN113222058B (en) * | 2021-05-28 | 2024-05-10 | 芯算一体(深圳)科技有限公司 | Image classification method, device, electronic equipment and storage medium |
CN113222055B (en) * | 2021-05-28 | 2023-01-10 | 新疆爱华盈通信息技术有限公司 | Image classification method and device, electronic equipment and storage medium |
CN113065615A (en) * | 2021-06-02 | 2021-07-02 | 南京甄视智能科技有限公司 | Scenario-based edge analysis algorithm issuing method and device and storage medium |
CN113554625B (en) * | 2021-07-26 | 2024-10-18 | 中华全国供销合作总社济南果品研究院 | Method and system for detecting water content of fruits and vegetables in fruit and vegetable drying process |
CN113628100B (en) * | 2021-08-10 | 2024-07-02 | Oppo广东移动通信有限公司 | Video enhancement method, device, terminal and storage medium |
CN114049420B (en) * | 2021-10-29 | 2022-10-21 | 马上消费金融股份有限公司 | Model training method, image rendering method, device and electronic equipment |
CN114155465A (en) * | 2021-11-30 | 2022-03-08 | 哈尔滨工业大学(深圳) | Multi-scene flame detection method and device and storage medium |
CN114255381B (en) * | 2021-12-23 | 2023-05-12 | 北京瑞莱智慧科技有限公司 | Training method of image recognition model, image recognition method, device and medium |
CN114547361A (en) * | 2022-02-24 | 2022-05-27 | 特赞(上海)信息科技有限公司 | Automatic labeling method and device for commodity materials and storage medium |
CN115100419B (en) * | 2022-07-20 | 2023-02-21 | 中国科学院自动化研究所 | Target detection method and device, electronic equipment and storage medium |
CN114998357B (en) * | 2022-08-08 | 2022-11-15 | 长春摩诺维智能光电科技有限公司 | Industrial detection method, system, terminal and medium based on multi-information analysis |
CN116310665B (en) * | 2023-05-17 | 2023-08-15 | 济南博观智能科技有限公司 | Image environment analysis method, device and medium |
CN117671497B (en) * | 2023-12-04 | 2024-05-28 | 广东筠诚建筑科技有限公司 | Engineering construction waste classification method and device based on digital images |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106845549A (en) * | 2017-01-22 | 2017-06-13 | 珠海习悦信息技术有限公司 | A kind of method and device of the scene based on multi-task learning and target identification |
CN106951911A (en) * | 2017-02-13 | 2017-07-14 | 北京飞搜科技有限公司 | A kind of quick multi-tag picture retrieval system and implementation method |
CN107622281A (en) * | 2017-09-20 | 2018-01-23 | 广东欧珀移动通信有限公司 | Image classification method, device, storage medium and mobile terminal |
CN108052966A (en) * | 2017-12-08 | 2018-05-18 | 重庆邮电大学 | Remote sensing images scene based on convolutional neural networks automatically extracts and sorting technique |
CN108090497A (en) * | 2017-12-28 | 2018-05-29 | 广东欧珀移动通信有限公司 | Video classification methods, device, storage medium and electronic equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108764208B (en) * | 2018-06-08 | 2021-06-08 | Oppo广东移动通信有限公司 | Image processing method and device, storage medium and electronic equipment |
- 2018-06-08: CN application CN201810585679.3A filed; granted as CN108764208B (status: not active, Expired - Fee Related)
- 2019-06-04: WO application PCT/CN2019/089914 filed (WO2019233394A1; status: active, Application Filing)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106845549A (en) * | 2017-01-22 | 2017-06-13 | 珠海习悦信息技术有限公司 | A kind of method and device of the scene based on multi-task learning and target identification |
CN106951911A (en) * | 2017-02-13 | 2017-07-14 | 北京飞搜科技有限公司 | A kind of quick multi-tag picture retrieval system and implementation method |
CN107622281A (en) * | 2017-09-20 | 2018-01-23 | 广东欧珀移动通信有限公司 | Image classification method, device, storage medium and mobile terminal |
CN108052966A (en) * | 2017-12-08 | 2018-05-18 | 重庆邮电大学 | Remote sensing images scene based on convolutional neural networks automatically extracts and sorting technique |
CN108090497A (en) * | 2017-12-28 | 2018-05-29 | 广东欧珀移动通信有限公司 | Video classification methods, device, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
WO2019233394A1 (en) | 2019-12-12 |
CN108764208A (en) | 2018-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108764208B (en) | Image processing method and device, storage medium and electronic equipment | |
CN108921040A (en) | Image processing method and device, storage medium, electronic equipment | |
CN108777815B (en) | Video processing method and device, electronic equipment and computer readable storage medium | |
CN108805103B (en) | Image processing method and device, electronic equipment and computer readable storage medium | |
CN108764370B (en) | Image processing method, image processing device, computer-readable storage medium and computer equipment | |
US11138478B2 (en) | Method and apparatus for training, classification model, mobile terminal, and readable storage medium | |
CN108810418B (en) | Image processing method, image processing device, mobile terminal and computer readable storage medium | |
CN108921161B (en) | Model training method and device, electronic equipment and computer readable storage medium | |
CN108810413B (en) | Image processing method and device, electronic equipment and computer readable storage medium | |
CN108804658B (en) | Image processing method and device, storage medium and electronic equipment | |
CN110572573B (en) | Focusing method and device, electronic equipment and computer readable storage medium | |
CN107833197B (en) | Image processing method and device, computer readable storage medium and electronic equipment | |
CN110473185B (en) | Image processing method and device, electronic equipment and computer readable storage medium | |
CN108961302B (en) | Image processing method, image processing device, mobile terminal and computer readable storage medium | |
CN108805198B (en) | Image processing method, image processing device, computer-readable storage medium and electronic equipment | |
CN110580428A (en) | image processing method, image processing device, computer-readable storage medium and electronic equipment | |
CN110536068B (en) | Focusing method and device, electronic equipment and computer readable storage medium | |
CN110580487A (en) | Neural network training method, neural network construction method, image processing method and device | |
CN108805265B (en) | Neural network model processing method and device, image processing method and mobile terminal | |
CN108875619B (en) | Video processing method and device, electronic equipment and computer readable storage medium | |
CN108765033B (en) | Advertisement information pushing method and device, storage medium and electronic equipment | |
CN108897786B (en) | Recommendation method and device of application program, storage medium and mobile terminal | |
CN109712177B (en) | Image processing method, image processing device, electronic equipment and computer readable storage medium | |
CN108848306B (en) | Image processing method and device, electronic equipment and computer readable storage medium | |
CN110248101B (en) | Focusing method and device, electronic equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | |
Granted publication date: 20210608 |