WO2019233394A1 - Image processing method and apparatus, storage medium and electronic device


Info

Publication number
WO2019233394A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
label
detected
scene recognition
scene
Prior art date
Application number
PCT/CN2019/089914
Other languages
French (fr)
Chinese (zh)
Inventor
陈岩 (Chen Yan)
Original Assignee
Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Publication of WO2019233394A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/20 - Scenes; Scene-specific elements in augmented reality scenes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Definitions

  • the present application relates to the field of computer technology, and in particular, to an image processing method and device, a storage medium, and an electronic device.
  • the mobile terminal may perform scene recognition on the image to provide a smart experience for the user.
  • the embodiments of the present application provide an image processing method and device, a storage medium, and an electronic device, which can improve the accuracy of scene recognition on an image.
  • An image processing method includes:
  • An image processing device includes:
  • an image acquisition module configured to acquire an image to be detected;
  • a scene recognition module configured to perform scene recognition on the to-be-detected image according to a multi-label classification model to obtain labels corresponding to the to-be-detected image, wherein the multi-label classification model is trained on multi-label images containing multiple scene elements;
  • an output module configured to output the labels corresponding to the image to be detected as a result of scene recognition.
  • a computer-readable storage medium has stored thereon a computer program that, when executed by a processor, implements the operations of the image processing method described above.
  • An electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor. When the processor executes the computer program, the operations of the image processing method described above are performed.
  • the foregoing image processing method and device, storage medium, and electronic device acquire an image to be detected, perform scene recognition on it according to a multi-label classification model, and obtain the labels corresponding to the image to be detected.
  • the multi-label classification model is trained on multi-label images containing multiple scene elements.
  • the labels corresponding to the image to be detected are output as the result of scene recognition.
  • FIG. 1 is an internal structural diagram of an electronic device in an embodiment
  • FIG. 2 is a flowchart of an image processing method according to an embodiment
  • FIG. 3A is a flowchart of an image processing method according to another embodiment;
  • FIG. 3B is a schematic structural diagram of a neural network in an embodiment;
  • FIG. 4 is a flowchart of a method for obtaining a label corresponding to an image by performing scene recognition on the image according to the multi-label classification model in FIG. 2;
  • FIG. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment
  • FIG. 7 is a schematic structural diagram of an image processing apparatus according to another embodiment.
  • FIG. 8 is a schematic structural diagram of a scene recognition module in FIG. 6;
  • FIG. 9 is a block diagram of a partial structure of a mobile phone related to an electronic device according to an embodiment.
  • FIG. 1 is a schematic diagram of an internal structure of an electronic device in an embodiment.
  • the electronic device includes a processor, a memory, and a network interface connected through a system bus.
  • the processor is used to provide computing and control capabilities to support the operation of the entire electronic device.
  • the memory is used to store data, programs, and the like. At least one computer program is stored on the memory, and the computer program can be executed by a processor to implement the image processing method applicable to the electronic device provided in the embodiments of the present application.
  • the memory may include a non-volatile storage medium, such as a magnetic disk, an optical disc, or a read-only memory (ROM), or it may include a random-access memory (RAM).
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and a computer program.
  • the computer program can be executed by a processor to implement an image processing method provided by each of the following embodiments.
  • the internal memory provides a cached runtime environment for the operating system and the computer programs stored in the non-volatile storage medium.
  • the network interface may be an Ethernet card or a wireless network card, and is used to communicate with external electronic devices.
  • the electronic device may be a mobile phone, a tablet computer, a personal digital assistant, or a wearable device.
  • an image processing method is provided.
  • the method is applied to the electronic device in FIG. 1 as an example, and includes:
  • Operation 220 Acquire an image to be detected.
  • the user uses an electronic device (with a photographing function) to take a picture and obtain an image to be detected.
  • the image to be detected may be a photo preview screen, or a photo saved to an electronic device after the photo is taken.
  • the image to be detected refers to an image requiring scene recognition, and includes both an image containing only a single scene element and an image containing multiple scene elements (two or more).
  • the scene elements in the image include landscape, beach, blue sky, green grass, snow, night scene, dark, backlight, sunrise/sunset, fireworks, spotlight, indoor, macro, text document, portrait, baby, cat, dog, food, and so on.
  • this list is not exhaustive; many other categories of scene elements are possible.
  • Operation 240 Perform scene recognition according to the multi-label classification model to obtain tags corresponding to the image to be detected, and the multi-label classification model is obtained from a multi-label image including multiple scene elements.
  • scene recognition is performed on the image to be detected.
  • a pre-trained multi-label classification model is used to perform scene recognition on the image to obtain tags corresponding to the scene included in the image.
  • the multi-label classification model is obtained based on a multi-label image including multiple scene elements. That is, the multi-label classification model is a scene recognition model obtained after scene recognition training using an image containing multiple scene elements. After the multi-label classification model performs scene recognition on the images to be detected, labels corresponding to the scenes contained in the images to be detected are obtained.
  • the labels of the image to be detected can be directly output as beach, blue sky, and portrait.
  • the beach, blue sky, and portrait are labels corresponding to the scene in the image to be detected.
  • the label corresponding to the image to be detected is output as a result of scene recognition.
  • once the labels corresponding to the scenes contained in the to-be-detected image are obtained, these labels constitute the result of scene recognition, and the result is output.
  • an image requiring scene recognition is acquired, and scene recognition is performed on the image to be detected according to a multi-label classification model to obtain the labels corresponding to the image.
  • the multi-label classification model is trained on multi-label images containing multiple scene elements.
  • the labels corresponding to the image to be detected are output as the result of scene recognition. Because the multi-label classification model is a scene recognition model trained on multi-label images containing multiple scene elements, it can directly and accurately output the labels of the multiple scenes in an image containing different scene elements. Therefore, the accuracy of scene recognition for images containing different scene elements is improved, and so is its efficiency.
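The per-label behaviour described above can be sketched as follows. The logit values and label names are illustrative assumptions, and scoring each label independently with a sigmoid is a common multi-label convention, not something the patent specifies:

```python
import math

# Hypothetical per-label logits from the final layer of a multi-label
# classification model for one image; names and values are illustrative.
logits = {"beach": 1.2, "blue sky": 1.5, "portrait": 2.0, "dog": -0.4}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# In multi-label classification each label is scored independently, so
# several labels can be active for one image (unlike softmax, where the
# class scores compete with each other).
scores = {label: sigmoid(z) for label, z in logits.items()}
predicted = [label for label, s in scores.items() if s > 0.5]
print(predicted)  # ['beach', 'blue sky', 'portrait']
```

Because each score is independent, an image with a beach, a blue sky, and a person can receive all three labels at once, matching the example output above.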
  • before acquiring an image to be detected, the method further includes:
  • Operation 320 Obtain a multi-label image including multiple scene elements.
  • An image containing multiple scene elements is called a multi-label image in this embodiment because, after scene recognition is performed on such an image, each scene corresponds to one label, and all of these labels together form the label set of the image.
  • Operation 340 Train a multi-label classification model using a multi-label image including multiple scene elements.
  • scene recognition may first be performed manually on the above multi-label image samples to obtain a label for each sample, called the standard label. The images in the sample set are then used one by one for scene recognition training until the error between the trained scene recognition results and the standard labels becomes progressively smaller. The model obtained after this training is the multi-label classification model that can perform scene recognition on multi-label images.
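The training loop described above (adjust parameters until the error against the manually assigned standard labels shrinks) might look like the following toy sketch. One independent logistic "head" per label, binary cross-entropy as the per-label loss, and all feature/label/learning-rate values are illustrative assumptions; the patent does not name a loss function:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(y_true, y_pred, eps=1e-7):
    # Binary cross-entropy for one label (assumed loss, not from the patent).
    y_pred = min(max(y_pred, eps), 1 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

# One toy sample: a hypothetical feature vector and its manually
# assigned standard labels (1 = scene present, 0 = absent).
features = [0.9, 0.2, 0.7]
standard = {"beach": 1, "snow": 0}

# One independent logistic head per label stands in for the network.
weights = {"beach": [0.1, 0.1, 0.1], "snow": [0.1, 0.1, 0.1]}
lr = 0.5

def total_loss():
    loss = 0.0
    for label, y in standard.items():
        z = sum(w * f for w, f in zip(weights[label], features))
        loss += bce(y, sigmoid(z))
    return loss

before = total_loss()
for _ in range(50):  # repeat until the error to the standard labels shrinks
    for label, y in standard.items():
        z = sum(w * f for w, f in zip(weights[label], features))
        grad = sigmoid(z) - y  # d(BCE)/dz for a sigmoid output
        weights[label] = [w - lr * grad * f
                          for w, f in zip(weights[label], features)]
after = total_loss()
print(after < before)  # the error against the standard labels decreases
```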
  • because the multi-label classification model is a scene recognition model trained on multi-label images containing multiple scene elements,
  • the labels corresponding to the multiple scenes in an image containing different scene elements can be output directly and accurately after scene recognition.
  • thus, the accuracy of multi-label image recognition is improved, and so is its efficiency.
  • the multi-label classification model is constructed based on a neural network model.
  • the specific training method of the multi-label classification model is: input a training image containing a background training target and a foreground training target into a neural network, and obtain a first loss function that reflects the difference between the first prediction confidence and the first true confidence of each pixel in the background region of the training image,
  • and a second loss function that reflects the difference between the second prediction confidence and the second true confidence of each pixel in the foreground region of the training image.
  • the first prediction confidence is the confidence, predicted by the neural network, that a pixel in the background region belongs to the background training target.
  • the first true confidence represents the confidence that the pixel labeled in the training image belongs to the background training target;
  • the second prediction confidence is the confidence, predicted by the neural network, that a pixel in the foreground region belongs to the foreground training target; the second true confidence represents the confidence that the pixel labeled in the training image belongs to the foreground training target;
  • the background training target of the training image has corresponding labels
  • the foreground training target also has labels.
  • FIG. 3B is a schematic structural diagram of a neural network model in an embodiment.
  • the input layer of the neural network receives training images with image category labels and performs feature extraction through a base network (such as a CNN), and the extracted image features are output to the feature layer.
  • the first loss function is obtained by performing category detection on the background training target based on the image features;
  • the second loss function is obtained by performing category detection on the foreground training target based on the image features;
  • the position loss function is obtained by performing position detection on the foreground training target based on the foreground region;
  • the target loss function is obtained as the weighted sum of the first loss function, the second loss function, and the position loss function.
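The weighted-sum construction of the target loss function can be illustrated with placeholder numbers; the individual loss values and the weights are assumptions, since the patent does not specify them:

```python
# Illustrative loss values for one training step (assumed, not from the patent).
first_loss = 0.8     # background category loss
second_loss = 0.6    # foreground category loss
position_loss = 0.4  # foreground position loss

# Assumed weights; in practice these are hyperparameters chosen so the
# three terms contribute in the desired proportion.
weights = {"first": 0.4, "second": 0.4, "position": 0.2}

target_loss = (weights["first"] * first_loss
               + weights["second"] * second_loss
               + weights["position"] * position_loss)
print(round(target_loss, 2))  # 0.64
```

The network's parameters are then adjusted to reduce this single scalar, so the background category, foreground category, and foreground position objectives are optimized jointly.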
  • the neural network may be a convolutional neural network.
  • Convolutional neural networks include a data input layer, a convolutional calculation layer, an activation layer, a pooling layer, and a fully connected layer.
  • the data input layer is used to pre-process the original image data.
  • the pre-processing may include de-averaging, normalization, dimensionality reduction, and whitening processes.
  • De-averaging refers to centering each dimension of the input data at 0; the purpose is to pull the center of the samples back to the origin of the coordinate system.
  • Normalization scales the amplitudes to the same range.
  • Whitening normalizes the amplitude on each characteristic axis of the data.
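The de-averaging and normalization steps described above can be sketched on a toy one-dimensional "image"; real pipelines apply the same idea per channel:

```python
# Toy input data standing in for raw pixel values.
data = [10.0, 20.0, 30.0, 40.0]

# De-averaging: centre every dimension at 0 (pull the sample centre
# back to the origin of the coordinate system).
mean = sum(data) / len(data)
centred = [x - mean for x in data]

# Normalization: scale amplitudes into a common range, here [-1, 1].
peak = max(abs(x) for x in centred)
normalized = [x / peak for x in centred]

print(centred)     # [-15.0, -5.0, 5.0, 15.0]
print(normalized)  # [-1.0, -0.33..., 0.33..., 1.0]
```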
  • the convolution calculation layer performs local association and window sliding; the weights of each filter connected to the data window in the convolution calculation layer are fixed.
  • Each filter attends to one image feature, such as vertical edges, horizontal edges, color, or texture; combining these filters yields the features of the entire image.
  • a filter is a weight matrix.
  • one weight matrix can be convolved with the data in different windows.
  • the activation layer is used to non-linearly map the output of the convolution layer.
  • the activation function used by the activation layer may be ReLU (Rectified Linear Unit).
  • the pooling layer can be sandwiched between consecutive convolutional layers to compress the amount of data and parameters and reduce overfitting.
  • the pooling layer can use the maximum method or average method to reduce the dimensionality of the data.
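The two dimensionality-reduction methods mentioned, maximum and average pooling, can be sketched on a toy 4x4 feature map with non-overlapping 2x2 windows:

```python
# Toy 4x4 feature map (values are illustrative).
fmap = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 3],
    [4, 6, 5, 7],
]

def pool(fmap, size, op):
    """Apply `op` (e.g. max or mean) to non-overlapping size x size windows."""
    out = []
    for i in range(0, len(fmap), size):
        row = []
        for j in range(0, len(fmap[0]), size):
            window = [fmap[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(op(window))
        out.append(row)
    return out

max_pooled = pool(fmap, 2, max)
avg_pooled = pool(fmap, 2, lambda w: sum(w) / len(w))
print(max_pooled)  # [[7, 8], [9, 7]]
print(avg_pooled)  # [[4.0, 5.0], [5.25, 4.0]]
```

Either way, the 4x4 map is compressed to 2x2, which is exactly the data and parameter reduction the pooling layer provides between convolutional layers.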
  • the fully connected layer is located at the tail of the convolutional neural network, where every neuron in one layer is connected by a weight to every neuron in the adjacent layer.
  • Part of the convolutional layers is cascaded to the first confidence output node, part to the second confidence output node, and part to the position output node.
  • the category of the background of the image can be detected according to the first confidence output node.
  • the category of the foreground object of the image can be detected according to the second confidence output node, and the position corresponding to the foreground object can be detected according to the position output node.
  • artificial neural networks are also referred to as neural networks (NNs) or connectionist models. From the perspective of information processing, they abstract the neuron network of the human brain, establish a simple model, and form different networks according to different connection methods. In engineering and academia they are usually simply called neural networks. An artificial neural network can be understood as a mathematical model that processes information using a structure similar to that of brain synapses.
  • Neural networks are often used for classification, for example, the classification of spam, the classification of cats and dogs in images, and so on.
  • This kind of machine that can automatically classify the input variables is called a classifier.
  • the input to the classifier is a numeric vector called a feature (vector).
  • the classifier needs to be trained, that is, the neural network needs to be trained first.
  • the training of artificial neural networks relies on the back-propagation algorithm. First, a feature vector is input at the input layer and an output is obtained through network computation. If the output layer finds that the output does not match the correct class, it makes the last layer of neurons adjust their parameters, and these in turn direct the penultimate layer of neurons connected to them to adjust theirs, so that the layers are adjusted backward one by one. The adjusted network continues to be tested on the samples; if the output is still wrong, the next round of backward adjustment follows, until the output of the neural network is as consistent as possible with the correct result.
  • the neural network model includes an input layer, a hidden layer, and an output layer.
  • Feature vectors are extracted from multi-label images containing multiple scene elements and input into the hidden layer, the loss function is calculated, and the parameters of the neural network model are adjusted according to the loss function so that it continuously converges; the multi-label classification model is obtained by training the neural network model in this way.
  • the multi-label classification model can implement scene recognition on the input image to obtain tags for each scene included in the image, and output these tags as the result of scene recognition.
  • the target loss function is obtained as the weighted sum of the first loss function corresponding to the background training target and the second loss function corresponding to the foreground training target, and the parameters of the neural network are adjusted according to this target loss function to obtain the trained multi-label classification model. Subsequently identifying both the background category and the label of the foreground target yields more information and improves recognition efficiency.
  • operation 240 performing scene recognition according to the multi-label classification model to obtain a label corresponding to the image to be detected, including:
  • Operation 242 Perform scene recognition according to the multi-label classification model to obtain an initial label of the image to be detected and a confidence level corresponding to the initial label;
  • Operation 244 Determine whether the confidence level of the initial label is greater than a preset threshold
  • the multi-label classification model obtained by the training is used to perform scene recognition on a to-be-detected image that contains multiple scene elements, multiple initial tags of the to-be-detected image and the confidence levels corresponding to the initial tags will be obtained.
  • the confidence that the initial label of the image to be detected is beach is 0.6
  • the confidence that the initial label of the image to be detected is blue sky is 0.7
  • the confidence that the initial label of the image to be detected is a portrait is 0.8
  • the confidence that the initial label of the image to be detected is a dog is 0.4
  • the confidence that the initial label of the image to be detected is snow is 0.3.
  • the initial labels of the recognition results are filtered. Specifically, it is determined whether the confidence level of the initial labels is greater than a preset threshold.
  • the preset threshold may be a confidence threshold obtained during earlier training of the multi-label classification model, based on a large number of training samples, at the point where the loss function is relatively small and the model's results are close to the actual results. For example, if the confidence threshold obtained from a large number of training samples is 0.5, then in the above example it is determined whether the confidence of each initial label is greater than this preset threshold, and the initial labels whose confidence exceeds it are used as the labels corresponding to the image.
  • the labels corresponding to the obtained images to be detected are beach, blue sky, and portrait, and two interference terms, dog and snow scene with confidence lower than the threshold, are discarded.
  • scene recognition is performed on the image to be detected according to a multi-label classification model, and the initial labels of the image and the confidence level corresponding to each initial label are obtained. Because the initial labels obtained from scene recognition are not necessarily the true labels of the image, the confidence of each initial label is used to filter them, and the initial labels whose confidence exceeds the threshold are selected as the scene recognition result of the image to be detected. This improves the accuracy of the scene recognition results to a certain extent.
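The filtering step, using the confidence values and the 0.5 threshold from the example above, can be sketched as:

```python
# Initial labels and confidences from the example in the text.
initial_labels = {
    "beach": 0.6,
    "blue sky": 0.7,
    "portrait": 0.8,
    "dog": 0.4,
    "snow": 0.3,
}
threshold = 0.5  # preset confidence threshold from training

# Keep only the initial labels whose confidence exceeds the threshold.
labels = [name for name, conf in initial_labels.items() if conf > threshold]
print(labels)  # ['beach', 'blue sky', 'portrait'] - dog and snow are discarded
```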
  • the range of confidence corresponding to each initial label is [0,1].
  • because the multi-label classification model is a scene recognition model trained on multi-label images containing multiple scene elements, after scene recognition is performed on a to-be-detected image containing different scene elements, the labels corresponding to the multiple scenes in that image can be output directly and accurately.
  • the identification process of each label is independent, so the probability of each identified label can be between [0,1].
  • the recognition processes of different tags do not affect each other, so it is possible to comprehensively identify all the scenes included in the image to be detected and avoid omissions.
  • the method further includes:
  • Operation 520 Obtain position information when the image to be detected is captured
  • the result of scene recognition is corrected according to the position information to obtain a final result of scene recognition after correction.
  • the electronic device records the location of each picture, generally using GPS (Global Positioning System) to record address information. After the address information recorded by the electronic device is acquired, the position information of the image to be detected is obtained from it. Corresponding scene categories, and the weights of those categories, are matched to different address information in advance; specifically, this may be the result of statistical analysis over a large number of image materials, from which the corresponding scene categories and their weights are matched to different address information.
  • the result of scene recognition can be corrected according to the address information at the time of image shooting and the probability of the scene corresponding to the address information, to obtain the final result of scene recognition after correction.
  • for example, suppose the address information of the picture is "XXX grassland".
  • scenes such as "green grass", "snow", and "blue sky" have higher weights for "XXX grassland", so they have a higher probability of appearing, and the result of scene recognition is corrected accordingly. If "green grass", "snow", or "blue sky" appears in the result of scene recognition, it can be kept as part of the final result. If a "beach" scene appears in the result, the "beach" scene should be filtered out according to the address information at the time the image was taken, to avoid producing incorrect scene categories that are incompatible with the location.
  • position information at the time of shooting an image to be detected is acquired, and a result of scene recognition is corrected according to the position information to obtain a final result of scene recognition after correction.
  • the scene classification of the to-be-detected image obtained by using the shooting address information of the to-be-detected image can be implemented to calibrate the result of scene recognition, thereby ultimately improving the accuracy of scene detection.
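A minimal sketch of this location-based correction follows. The per-address weight table and the cut-off value are illustrative assumptions; the patent only states that scene categories and weights are matched to address information in advance:

```python
# Assumed per-address scene weights, e.g. from statistics over many images.
scene_weights_by_address = {
    "XXX grassland": {"green grass": 0.9, "snow": 0.7, "blue sky": 0.8,
                      "beach": 0.05},
}

def correct(recognition_result, address, min_weight=0.1):
    """Drop recognized scenes that are implausible at the shooting address."""
    weights = scene_weights_by_address.get(address, {})
    # Scenes with no entry keep the default weight and are left untouched.
    return [scene for scene in recognition_result
            if weights.get(scene, min_weight) >= min_weight]

result = correct(["green grass", "blue sky", "beach"], "XXX grassland")
print(result)  # ['green grass', 'blue sky'] - the implausible 'beach' is removed
```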
  • the method further includes:
  • the image to be detected is subjected to image processing corresponding to the result of scene recognition.
  • a label corresponding to the image to be detected is obtained, and a label corresponding to the image to be detected is output as a result of scene recognition.
  • the result of scene recognition can be used as the basis for image post-processing, and targeted image processing can be performed according to the result of scene recognition, thereby greatly improving the quality of the image. For example, if the scene type of the image to be detected is identified as night scene, the image may be processed in a suitable manner for the night scene, such as increasing brightness. If it is identified that the scene type of the image to be detected is backlighting, the image can be processed using a suitable processing method for backlighting.
  • the beach area can be processed in a manner suitable for beaches, and the green grass area in a manner suitable for green grass.
  • the blue sky area is likewise processed in a manner suitable for blue sky, so that the entire image achieves a good effect.
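Label-driven post-processing might be dispatched as follows. The specific adjustments (a brightness lift for "night scene", an exposure reduction for "backlight") and the toy pixel list are illustrative assumptions; real processing would operate on full image buffers:

```python
# Hypothetical per-scene adjustments on a flat list of 8-bit pixel values.
def increase_brightness(pixels, amount=30):
    return [min(255, p + amount) for p in pixels]

def reduce_exposure(pixels, amount=20):
    return [max(0, p - amount) for p in pixels]

# Map each scene label to its processing routine (assumed pairings).
PROCESSORS = {
    "night scene": increase_brightness,
    "backlight": reduce_exposure,
}

def post_process(pixels, labels):
    """Apply the processing routine of every recognized scene label."""
    for label in labels:
        handler = PROCESSORS.get(label)
        if handler:
            pixels = handler(pixels)
    return pixels

out = post_process([10, 100, 250], ["night scene"])
print(out)  # [40, 130, 255] - brightened, clipped at 255
```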
  • an image processing method is provided.
  • the method is applied to the electronic device in FIG. 1 as an example, and includes:
  • Operation 1 Obtain a multi-label image containing multiple scene elements, and use the multi-label image containing multiple scene elements to train a neural network model to obtain a multi-label classification model, that is, the multi-label classification model is based on a neural network architecture;
  • Operation 2 Perform scene recognition according to the multi-label classification model to obtain the initial label of the image to be detected and the confidence level corresponding to the initial label;
  • Operation 3 Determine whether the confidence level of each initial label is greater than a preset threshold; when the determination result is yes, use the initial labels whose confidence is greater than the preset threshold as the labels corresponding to the image to be detected, and output those labels as the result of scene recognition;
  • Operation 4 Obtain position information at the time of shooting the image to be detected, and correct the scene recognition result according to the position information to obtain the final result of scene recognition after correction;
  • Operation 5 According to the result of scene recognition, perform image processing corresponding to that result on the image to be detected to obtain a processed image.
  • because the multi-label classification model is a scene recognition model trained on multi-label images containing multiple scene elements, the labels corresponding to the multiple scenes in a to-be-detected image containing different scene elements can be output directly and accurately after scene recognition. Therefore, the accuracy of scene recognition on such images is improved, and so is its efficiency.
  • the result of scene recognition is corrected according to the position information when the image to be detected is captured, to obtain the final result of scene recognition after correction.
  • the scene classification of the to-be-detected image obtained by using the shooting address information of the to-be-detected image can be implemented to calibrate the result of scene recognition, thereby ultimately improving the accuracy of scene detection.
  • the result of scene recognition can be used as the basis for image post-processing, and the image can be targeted for image processing according to the result of scene recognition, thereby greatly improving the quality of the image.
  • an image processing device 600 includes an image acquisition module 610, a scene recognition module 620, and an output module 630, wherein:
  • An image acquisition module 610 configured to acquire an image to be detected
  • a scene recognition module 620 is configured to perform scene recognition according to a multi-label classification model to obtain a label corresponding to the image to be detected, and the multi-label classification model is obtained from a multi-label image including multiple scene elements;
  • An output module 630 is configured to output a label corresponding to the image to be detected as a result of scene recognition.
  • an image processing apparatus 600 is provided, and the apparatus further includes:
  • a multi-label image acquisition module 640 configured to acquire a multi-label image including multiple scene elements
  • a multi-label classification model training module 650 is configured to train a multi-label classification model using a multi-label image including multiple scene elements.
  • the scene recognition module 620 includes:
  • An initial label acquisition module 622 is configured to perform scene recognition based on a multi-label classification model to obtain an initial label of the image to be detected and a confidence level corresponding to the initial label;
  • a determining module 624 configured to determine whether the confidence level of the initial label is greater than a preset threshold
  • the image label generation module 626 is configured to, when the determination result is yes, use an initial label with a confidence level greater than a preset threshold as a label corresponding to the image to be detected.
  • an image processing device 600 is provided, which is further configured to obtain position information when an image to be detected is taken; and correct the scene recognition result according to the position information to obtain a final scene recognition result after the correction.
  • an image processing device 600 is provided, and further configured to perform image processing corresponding to a scene recognition result on an image to be detected according to a result of scene recognition.
  • each module in the above image processing apparatus is for illustration only. In other embodiments, the image processing apparatus may be divided into different modules as needed to complete all or part of the functions of the above image processing apparatus.
  • Each module in the image processing apparatus may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above modules may be embedded in hardware form in the processor, may be independent of the processor in the server, or may be stored in software form in the memory of the server, so that the processor can invoke them to perform the operations corresponding to the above modules.
  • a computer-readable storage medium on which a computer program is stored.
  • the computer program is executed by a processor, the operations of the image processing methods provided by the foregoing embodiments are implemented.
  • an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the computer program, the operations of the image processing method provided by the foregoing embodiments are performed.
  • An embodiment of the present application further provides a computer program product, which, when run on a computer, causes the computer to perform the operations of the image processing method provided by the foregoing embodiments.
  • An embodiment of the present application further provides an electronic device.
  • the above electronic device includes an image processing circuit.
  • the image processing circuit may be implemented by hardware and/or software components, and may include various processing units that define an ISP (Image Signal Processing) pipeline.
  • FIG. 9 is a schematic diagram of an image processing circuit in one embodiment. As shown in FIG. 9, for ease of description, only aspects of the image processing technology related to the embodiments of the present application are shown.
  • the image processing circuit includes an ISP processor 940 and a control logic 950.
  • the image data captured by the imaging device 910 is first processed by the ISP processor 940, which analyzes the image data to capture image statistics that can be used to determine one or more control parameters of the imaging device 910.
  • the imaging device 910 may include a camera having one or more lenses 912 and an image sensor 914.
  • the image sensor 914 may include a color filter array (such as a Bayer filter). The image sensor 914 may obtain the light intensity and wavelength information captured by each imaging pixel of the image sensor 914 and provide a set of original image data.
  • the sensor 920 may provide acquired image processing parameters (such as image stabilization parameters) to the ISP processor 940 based on the interface type of the sensor 920.
  • the sensor 920 interface may use a SMIA (Standard Mobile Imaging Architecture) interface, other serial or parallel camera interfaces, or a combination of the foregoing interfaces.
  • the image sensor 914 may also send the original image data to the sensor 920, and the sensor 920 may provide the original image data to the ISP processor 940 based on the interface type of the sensor 920, or the sensor 920 stores the original image data in the image memory 930.
  • the ISP processor 940 processes the original image data pixel by pixel in a variety of formats.
  • each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the ISP processor 940 may perform one or more image processing operations on the original image data and collect statistical information about the image data.
  • the image processing operations may be performed with the same or different bit depth accuracy.
  • the ISP processor 940 may also receive image data from the image memory 930.
  • the sensor 920 interface sends the original image data to the image memory 930, and the original image data in the image memory 930 is then provided to the ISP processor 940 for processing.
  • the image memory 930 may be a part of a memory device, a storage device, or a separate dedicated memory in an electronic device, and may include a DMA (Direct Memory Access) feature.
  • the ISP processor 940 may perform one or more image processing operations, such as time-domain filtering.
  • the processed image data may be sent to the image memory 930 for further processing before being displayed.
  • the ISP processor 940 receives processed data from the image memory 930 and performs processing on the image data in the original domain and in the RGB and YCbCr color spaces.
  • the image data processed by the ISP processor 940 may be output to the display 970 for viewing by the user and / or further processed by a graphics engine or a GPU (Graphics Processing Unit).
  • the output of the ISP processor 940 can also be sent to the image memory 930, and the display 970 can read image data from the image memory 930.
  • the image memory 930 may be configured to implement one or more frame buffers.
  • the output of the ISP processor 940 may be sent to an encoder / decoder 960 to encode / decode image data.
  • the encoded image data can be saved, and decompressed before being displayed on the display 970.
  • the encoder / decoder 960 may be implemented by a CPU or a GPU or a coprocessor.
  • the statistical data determined by the ISP processor 940 may be sent to the control logic 950.
  • the statistical data may include statistics of the image sensor 914 such as auto exposure, auto white balance, auto focus, flicker detection, black level compensation, and lens 912 shading correction.
  • the control logic 950 may include a processor and/or a microcontroller that executes one or more routines (such as firmware). The one or more routines may determine, according to the received statistical data, control parameters of the imaging device 910 and control parameters of the ISP processor 940.
  • control parameters of the imaging device 910 may include sensor 920 control parameters (such as gain, integration time for exposure control, and image stabilization parameters), camera flash control parameters, lens 912 control parameters (such as focal length for focusing or zooming), or a combination of these parameters.
  • ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (eg, during RGB processing), and lens 912 shading correction parameters.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM), which is used as external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
  • the program can be stored in a non-volatile computer-readable storage medium.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to an image processing method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: acquiring an image to be detected; performing scene recognition on the image to be detected according to a multi-label classification model to obtain labels corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements; and outputting the labels corresponding to the image to be detected as the result of the scene recognition.

Description

Image processing method and apparatus, storage medium, and electronic device
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201810585679.3, filed with the Chinese Patent Office on June 8, 2018 and entitled "Image Processing Method and Apparatus, Storage Medium, Electronic Device", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of computer technology, and in particular to an image processing method and apparatus, a storage medium, and an electronic device.
Background
With the popularity of mobile terminals and the rapid development of the mobile Internet, mobile terminals are being used by more and more users. The camera function of a mobile terminal has become one of the most commonly used functions. During or after taking a photo, the mobile terminal may perform scene recognition on the image to provide an intelligent experience for the user.
Summary
Embodiments of the present application provide an image processing method and apparatus, a storage medium, and an electronic device, which can improve the accuracy of scene recognition on an image.
An image processing method includes:
acquiring an image to be detected;
performing scene recognition on the image to be detected according to a multi-label classification model to obtain labels corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements; and
outputting the labels corresponding to the image to be detected as a result of the scene recognition.
An image processing apparatus includes:
an image acquisition module, configured to acquire an image to be detected;
a scene recognition module, configured to perform scene recognition on the image to be detected according to a multi-label classification model to obtain labels corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements; and
an output module, configured to output the labels corresponding to the image to be detected as a result of the scene recognition.
A computer-readable storage medium has a computer program stored thereon. When the computer program is executed by a processor, the operations of the image processing method described above are implemented.
An electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the operations of the image processing method described above are performed.
According to the image processing method and apparatus, the storage medium, and the electronic device described above, an image to be detected is acquired, and scene recognition is performed on the image to be detected according to a multi-label classification model to obtain labels corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements. The labels corresponding to the image to be detected are then output as the result of the scene recognition.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present application or the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a diagram of the internal structure of an electronic device in an embodiment;
FIG. 2 is a flowchart of an image processing method in an embodiment;
FIG. 3A is a flowchart of an image processing method in another embodiment;
FIG. 3B is a schematic diagram of the architecture of a neural network in an embodiment;
FIG. 4 is a flowchart of the method in FIG. 2 for performing scene recognition on an image according to the multi-label classification model to obtain the labels corresponding to the image;
FIG. 5 is a flowchart of an image processing method in still another embodiment;
FIG. 6 is a schematic structural diagram of an image processing apparatus in an embodiment;
FIG. 7 is a schematic structural diagram of an image processing apparatus in another embodiment;
FIG. 8 is a schematic structural diagram of the scene recognition module in FIG. 6;
FIG. 9 is a block diagram of a partial structure of a mobile phone related to the electronic device provided in an embodiment.
Detailed description
In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.
FIG. 1 is a schematic diagram of the internal structure of an electronic device in an embodiment. As shown in FIG. 1, the electronic device includes a processor, a memory, and a network interface connected through a system bus. The processor is used to provide computing and control capabilities to support the operation of the entire electronic device. The memory is used to store data, programs, and the like. At least one computer program is stored in the memory, and the computer program can be executed by the processor to implement the image processing method applicable to the electronic device provided in the embodiments of the present application. The memory may include a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM). For example, in one embodiment, the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the image processing method provided by each of the following embodiments. The internal memory provides a cached runtime environment for the operating system and computer programs in the non-volatile storage medium. The network interface may be an Ethernet card or a wireless network card, and is used to communicate with external electronic devices. The electronic device may be a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or the like.
In one embodiment, as shown in FIG. 2, an image processing method is provided. The method is described using its application to the electronic device in FIG. 1 as an example, and includes the following operations.
Operation 220: acquire an image to be detected.
A user takes a picture with an electronic device (having a camera function) to obtain an image to be detected. The image to be detected may be a photo preview screen, or a photo saved in the electronic device after being taken. The image to be detected refers to an image on which scene recognition needs to be performed, and includes both images containing only a single scene element and images containing multiple (two or more) scene elements. In general, the scene elements in an image include landscape, beach, blue sky, green grass, snow scene, night scene, darkness, backlight, sunrise/sunset, fireworks, spotlight, indoor, macro, text document, portrait, baby, cat, dog, food, and so on. Of course, this list is not exhaustive; many other categories of scene elements exist.
Operation 240: perform scene recognition on the image to be detected according to a multi-label classification model to obtain labels corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements.
After the image to be detected is acquired, scene recognition is performed on it. Specifically, a pre-trained multi-label classification model is used to perform scene recognition on the image to obtain the labels corresponding to the scenes contained in the image. The multi-label classification model is obtained from multi-label images containing multiple scene elements; that is, it is a scene recognition model obtained by performing scene recognition training with images containing multiple scene elements. After the multi-label classification model performs scene recognition on the image to be detected, the labels corresponding to the scenes contained in the image are obtained. For example, when the multi-label classification model performs scene recognition on an image to be detected that contains multiple scene elements such as a beach, blue sky, and portrait, it can directly output the labels of the image as beach, blue sky, and portrait. Beach, blue sky, and portrait are the labels corresponding to the scenes in the image to be detected.
Operation 260: output the labels corresponding to the image to be detected as the result of the scene recognition.
After the multi-label classification model performs scene recognition on the image to be detected and the labels corresponding to the scenes contained in the image are obtained, these labels are the result of the scene recognition, and this result is output.
In the embodiment of the present application, an image requiring scene recognition is acquired, scene recognition is performed on the image according to a multi-label classification model, and the labels corresponding to the image are obtained, the multi-label classification model being obtained from multi-label images containing multiple scene elements. The labels corresponding to the image are output as the result of the scene recognition. Because the multi-label classification model is a scene recognition model obtained from multi-label images containing multiple scene elements, it can perform scene recognition on an image containing different scene elements and then directly and fairly accurately output the label corresponding to each scene in that image. This improves the accuracy of scene recognition for images containing different scene elements, and at the same time improves the efficiency of scene recognition.
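The key property of the multi-label output in operation 240, as opposed to a single-label classifier, is that each label receives an independent confidence, so several labels can be emitted for one image. A minimal sketch, assuming independent sigmoid outputs per label; the label set and logit values are illustrative assumptions, not part of the embodiment:

```python
import math

# Hypothetical sketch: a multi-label head applies an independent sigmoid to
# each label's score, so one image can receive several labels at once
# (unlike a softmax head, which selects a single class).
LABELS = ["beach", "blue sky", "portrait", "dog", "snow scene"]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def scene_labels(logits, threshold=0.5):
    # Map each label's raw score to an independent confidence, then keep
    # the labels whose confidence exceeds the threshold.
    confidences = {name: sigmoid(z) for name, z in zip(LABELS, logits)}
    return [name for name, c in confidences.items() if c > threshold]

# Illustrative logits for an image containing a beach, blue sky, and a person.
print(scene_labels([0.4, 0.9, 1.4, -0.4, -0.8]))
```

With these assumed logits, the first three labels exceed the threshold while the last two do not, so the image receives the labels beach, blue sky, and portrait simultaneously.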
In one embodiment, as shown in FIG. 3A, before acquiring the image to be detected, the method includes the following operations.
Operation 320: acquire multi-label images containing multiple scene elements.
Images containing multiple scene elements are acquired; in this embodiment they are called multi-label images, because after scene recognition is performed on an image containing multiple scenes, each scene corresponds to one label, and all of these labels together constitute the labels of the image, hence "multi-label image".
Operation 340: train the multi-label classification model using the multi-label images containing multiple scene elements.
Some multi-label image samples are acquired, and scene recognition may be performed on these samples manually in advance to obtain the labels corresponding to each multi-label image sample, called the standard labels. Scene recognition training is then performed with the images in the multi-label image samples one by one, until the error between the trained scene recognition results and the standard labels becomes smaller and smaller. What is obtained after this training is a multi-label classification model that can perform scene recognition on multi-label images.
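As a concrete illustration of the training error against the manually annotated standard labels, a per-label binary cross-entropy can be used. The loss choice and all numbers below are illustrative assumptions; the embodiment only requires that the error between the recognition results and the standard labels keeps decreasing:

```python
import math

# Hypothetical sketch: per-label binary cross-entropy between the model's
# predicted confidences and the manually annotated standard labels.
# The choice of loss is an assumption; it is not specified by the embodiment.
def multi_label_loss(confidences, standard_labels):
    eps = 1e-7  # guard against log(0)
    total = 0.0
    for c, y in zip(confidences, standard_labels):
        c = min(max(c, eps), 1.0 - eps)
        total += -(y * math.log(c) + (1 - y) * math.log(1.0 - c))
    return total / len(confidences)

# Predictions close to the standard labels [1, 1, 0] yield a smaller loss;
# this per-sample error is the quantity training drives down.
close = multi_label_loss([0.9, 0.8, 0.1], [1, 1, 0])
far = multi_label_loss([0.3, 0.4, 0.7], [1, 1, 0])
print(close < far)
```

Training adjusts the model so that, sample by sample, this error keeps shrinking, which matches the "smaller and smaller" criterion above.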
In the embodiment of the present application, because the multi-label classification model is a scene recognition model obtained by training with multi-label images containing multiple scene elements, it can perform scene recognition on an image containing different scene elements and then directly and fairly accurately output the label corresponding to each scene in that image. This improves both the accuracy and the efficiency of multi-label image recognition.
In one embodiment, the multi-label classification model is constructed based on a neural network model.
The specific training method of the multi-label classification model is as follows: a training image containing a background training target and a foreground training target is input to a neural network, to obtain a first loss function reflecting the difference between the first prediction confidence and the first true confidence of each pixel in the background region of the training image, and a second loss function reflecting the difference between the second prediction confidence and the second true confidence of each pixel in the foreground region of the training image. The first prediction confidence is the confidence, predicted by the neural network, that a pixel in the background region of the training image belongs to the background training target, and the first true confidence is the pre-annotated confidence that the pixel in the training image belongs to the background training target. The second prediction confidence is the confidence, predicted by the neural network, that a pixel in the foreground region of the training image belongs to the foreground training target, and the second true confidence is the pre-annotated confidence that the pixel in the training image belongs to the foreground training target.
The first loss function and the second loss function are weighted and summed to obtain a target loss function.
The parameters of the neural network are adjusted according to the target loss function, and the neural network is trained to finally obtain the multi-label classification model. The background training target of the training image has corresponding labels, and the foreground training target also has labels.
FIG. 3B is a schematic diagram of the architecture of a neural network model in an embodiment. As shown in FIG. 3B, the input layer of the neural network receives training images with image category labels, performs feature extraction through a base network (such as a CNN), and outputs the extracted image features to the feature layer. From this feature layer, category detection is performed on the background training target to obtain the first loss function, category detection is performed on the foreground training target according to the image features to obtain the second loss function, and position detection is performed on the foreground training target according to the foreground region to obtain a position loss function. The first loss function, the second loss function, and the position loss function are weighted and summed to obtain the target loss function. The neural network may be a convolutional neural network, which includes a data input layer, convolution layers, activation layers, pooling layers, and a fully connected layer. The data input layer is used to pre-process the original image data. The pre-processing may include mean removal, normalization, dimensionality reduction, and whitening. Mean removal means centering each dimension of the input data at 0, in order to pull the center of the samples back to the origin of the coordinate system. Normalization scales the amplitudes to the same range. Whitening normalizes the amplitude on each feature axis of the data. The convolution layers are used for local correlation and window sliding. The weights with which each filter in a convolution layer is connected to a data window are fixed; each filter attends to one image feature, such as vertical edges, horizontal edges, color, or texture, and together these filters form the feature extractor set for the whole image. A filter is a weight matrix, which can be convolved with the data in different windows. The activation layers non-linearly map the outputs of the convolution layers; the activation function used may be the ReLU (Rectified Linear Unit). A pooling layer can be sandwiched between successive convolution layers to compress the amount of data and parameters and reduce overfitting; it can reduce the dimensionality of the data using the max or average method. The fully connected layer is located at the tail of the convolutional neural network, with weighted connections between all neurons of the two layers. Some convolution layers of the convolutional neural network are cascaded to a first confidence output node, some to a second confidence output node, and some to a position output node. The background classification of the image can be detected from the first confidence output node, the category of the foreground target of the image can be detected from the second confidence output node, and the position corresponding to the foreground target can be detected from the position output node.
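The weighted combination of the branch losses described above can be sketched as a single scalar; the weight values here are hypothetical hyperparameters, not values given by this embodiment:

```python
# Hedged sketch of the target loss: a weighted sum of the background-category
# loss (first loss), the foreground-category loss (second loss), and the
# foreground position loss. The weights w1, w2, w3 are assumed hyperparameters.
def target_loss(first_loss, second_loss, position_loss,
                w1=1.0, w2=1.0, w3=0.5):
    return w1 * first_loss + w2 * second_loss + w3 * position_loss

# Example: combine three branch losses into the single scalar that
# back-propagation minimizes when adjusting the network parameters.
print(target_loss(0.4, 0.6, 0.2))
```

The relative weights control how strongly each branch (background category, foreground category, foreground position) influences the parameter updates.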
Specifically, artificial neural networks (ANNs), also simply called neural networks (NNs) or connection models, abstract the neuron network of the human brain from the perspective of information processing, establish a simple model, and form different networks according to different connection methods. In engineering and academia they are often referred to directly as neural networks. An artificial neural network can be understood as a mathematical model that processes information using a structure similar to the synaptic connections of the brain.
Neural networks are often used for classification, for example, classifying spam or recognizing cats and dogs in images. A machine that can automatically classify input variables is called a classifier. The input to a classifier is a numeric vector called a feature vector. Before a classifier is used, it needs to be trained; that is, the neural network needs to be trained first.
The training of artificial neural networks relies on the back-propagation algorithm. Initially, a feature vector is fed to the input layer and an output is obtained through the network's computation. When the output layer finds that the output is inconsistent with the correct class, it makes the last layer of neurons adjust their parameters, and the last layer in turn makes the second-to-last layer connected to it adjust its own parameters, and so on, adjusting layer by layer backwards. The adjusted network is then tested again on the samples; if the output is still wrong, another round of backward adjustment follows, until the results output by the neural network agree with the correct results as closely as possible.
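The "adjust parameters, re-test, repeat" loop described above can be illustrated with a minimal gradient-descent example on a single weight; the toy data and learning rate are purely illustrative assumptions:

```python
# Toy illustration of iterative parameter adjustment: gradient descent on a
# single weight w so that the prediction w * x fits the correct answer y = 2 * x.
def train_single_weight(epochs=200, lr=0.1):
    w = 0.0
    samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    for _ in range(epochs):
        for x, y in samples:
            pred = w * x
            grad = 2.0 * (pred - y) * x  # derivative of (pred - y)**2 w.r.t. w
            w -= lr * grad               # back-propagation-style correction
    return w

# After repeated rounds of adjustment, the output agrees with the correct
# answer as closely as possible (w converges toward 2).
print(round(train_single_weight(), 3))
```

A real network applies the same idea layer by layer, propagating the error backwards through every weight instead of a single one.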
In the embodiment of the present application, the neural network model includes an input layer, hidden layers, and an output layer. Feature vectors are extracted from the multi-label images containing multiple scene elements, the feature vectors are input into the hidden layers to compute the value of the loss function, and the parameters of the neural network model are then adjusted according to the loss function so that the loss function continuously converges, thereby training the neural network model to obtain the multi-label classification model. The multi-label classification model can perform scene recognition on an input image to obtain a label for each scene contained in the image, and output these labels as the result of the scene recognition. The target loss function is obtained by a weighted sum of the first loss function corresponding to the background training target and the second loss function corresponding to the foreground training target, and the parameters of the neural network are adjusted according to the target loss function, so that the trained multi-label classification model can subsequently recognize both the background category and the labels of the foreground targets at the same time, obtaining more information and improving recognition efficiency.
In one embodiment, as shown in FIG. 4, operation 240, performing scene recognition on the image to be detected according to the multi-label classification model to obtain the labels corresponding to the image to be detected, includes:
Operation 242: perform scene recognition on the image to be detected according to the multi-label classification model to obtain initial labels of the image to be detected and the confidence corresponding to each initial label;
Operation 244: determine whether the confidence of an initial label is greater than a preset threshold;
Operation 246: when the determination result is yes, take the initial labels whose confidence is greater than the preset threshold as the labels corresponding to the image to be detected.
Even with a trained multi-label classification model, the output of image scene recognition in practice may still contain some error, so the error needs to be reduced further. In general, when the multi-label classification model obtained by the training above performs scene recognition on an image to be detected that contains multiple scene elements, it produces multiple initial labels for the image and a confidence for each initial label. For example, when scene recognition is performed on an image to be detected containing a beach, blue sky, and a portrait, the confidence that an initial label of the image is beach may be 0.6, that it is blue sky 0.7, that it is portrait 0.8, that it is dog 0.4, and that it is snow scene 0.3.
The initial labels in the recognition result are then filtered; specifically, it is determined whether the confidence of each initial label is greater than a preset threshold. The preset threshold may be a confidence threshold derived while the multi-label classification model was trained in the earlier stage: over a large number of training samples, when the loss function is relatively small and the obtained results are close to the actual results, a confidence threshold is obtained. For example, if the confidence threshold derived from a large number of training samples is 0.5, then in the example above the confidence of each initial label is compared with this preset threshold, and the initial labels whose confidence exceeds it are taken as the labels corresponding to the image. The labels obtained for the image to be detected are thus beach, blue sky, and portrait, while the two distractors dog and snow scene, whose confidence falls below the threshold, are discarded.
In this embodiment of the present application, scene recognition is performed on the image to be detected according to the multi-label classification model to obtain initial labels of the image and the confidence corresponding to each initial label. Because an initial label obtained by scene recognition is not necessarily a true label of the image to be detected, the confidence of each initial label is used to filter the initial labels, and those exceeding the confidence threshold are selected as the scene recognition result for the image. This improves the accuracy of the scene recognition result to a certain extent.
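Operations 242 to 246 can be sketched as a simple confidence filter; the label names and confidences repeat the example above, and 0.5 is the illustrative threshold mentioned there:

```python
def filter_labels(initial_labels, threshold=0.5):
    # Keep only the initial labels whose confidence exceeds the preset
    # threshold; the survivors form the scene recognition result.
    return [label for label, conf in initial_labels.items() if conf > threshold]

predictions = {"beach": 0.6, "blue sky": 0.7, "portrait": 0.8,
               "dog": 0.4, "snow scene": 0.3}
result = filter_labels(predictions)
# "dog" and "snow scene" fall below the threshold and are discarded
```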
In one embodiment, the confidence corresponding to each initial label lies in the range [0, 1].
Specifically, because the multi-label classification model is a scene recognition model trained on multi-label images containing multiple scene elements, it can perform scene recognition on images to be detected that contain different scene elements and directly output, with fairly high accuracy, the labels corresponding to the multiple scenes in the image. In this multi-label classification model the recognition process for each label is independent, so the probability of each recognized label may lie anywhere in [0, 1]. In this embodiment of the present application, the recognition processes for different labels do not affect one another, so all scenes contained in the image to be detected can be recognized comprehensively, avoiding omissions.
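A common way to obtain independent per-label confidences in [0, 1] is one sigmoid per output unit; this is an illustrative assumption about the output layer, not a detail fixed by this disclosure:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def label_confidences(logits):
    # One independent sigmoid per label: each confidence lies in [0, 1]
    # on its own, and unlike softmax the values need not sum to 1,
    # so recognising one label does not suppress another.
    return {label: sigmoid(z) for label, z in logits.items()}

confs = label_confidences({"beach": 1.2, "blue sky": 0.8, "dog": -0.5})
```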
In one embodiment, as shown in FIG. 5, after the labels corresponding to the image to be detected are output as the scene recognition result, the method includes:
Operation 520: obtain position information recorded when the image to be detected was captured;
Operation 540: correct the scene recognition result according to the position information to obtain the final, corrected scene recognition result.
Specifically, an electronic device generally records the location of each shot, usually using GPS (Global Positioning System) to record the address information. The address information recorded by the electronic device is obtained, and the position information of the image to be detected is then derived from it. Corresponding scene categories, together with a weight for each scene category, are matched to different address information in advance. Specifically, this matching may be based on the results of a statistical analysis of a large volume of image material. For example, such an analysis may show that when the address information reads "XXX grassland", the scene "green grass" corresponding to the address "grassland" has a weight of 9, "snow scene" a weight of 7, "landscape" a weight of 4, "blue sky" a weight of 6, and "beach" a weight of -8, with weights taking values in [-10, 10].
The larger the weight, the greater the probability that the scene appears in the image; the smaller the weight, the smaller that probability. The scene recognition result can therefore be corrected according to the address information recorded when the image was captured and the probabilities of the scenes associated with that address, yielding the final, corrected scene recognition result. For example, if the address information of a picture is "XXX grassland", the scenes "green grass", "snow scene", and "blue sky" corresponding to "XXX grassland" have high weights, so these scenes are likely to appear. The scene recognition result is corrected accordingly: if "green grass", "snow scene", or "blue sky" appears in the result, it can be kept as part of the final result; if the scene "beach" appears, it should be filtered out according to the address information recorded at capture time, removing the "beach" scene and avoiding incorrect, implausible scene categories.
In this embodiment of the present application, position information recorded when the image to be detected was captured is obtained, and the scene recognition result is corrected according to this position information to obtain the final, corrected result. The scene categories of the image to be detected, derived from its shooting address information, can thus be used to calibrate the scene recognition result, ultimately improving the accuracy of scene detection.
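The location-based correction can be sketched as a lookup over pre-matched scene weights; the weight table and the cutoff of 0 are illustrative assumptions based on the grassland example above, not values fixed by this disclosure:

```python
# Hypothetical weight table, assumed to be derived offline from statistical
# analysis of a large image corpus; weights take values in [-10, 10].
LOCATION_WEIGHTS = {
    "grassland": {"green grass": 9, "snow scene": 7, "landscape": 4,
                  "blue sky": 6, "beach": -8},
}

def correct_by_location(labels, location, cutoff=0):
    # Drop recognised labels whose weight at the shooting location is at or
    # below the cutoff, i.e. scenes implausible at that location; labels
    # without an entry in the table are kept unchanged.
    weights = LOCATION_WEIGHTS.get(location, {})
    return [label for label in labels if weights.get(label, cutoff + 1) > cutoff]
```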
In one embodiment, after the labels corresponding to the image to be detected are output as the scene recognition result, the method further includes:
performing, on the image to be detected, image processing corresponding to the scene recognition result.
In this embodiment of the present application, after the image to be detected has undergone scene recognition through the multi-label classification model, the labels corresponding to the image are obtained and output as the scene recognition result. This result can serve as the basis for image post-processing: targeted processing can be applied to the image to be detected according to the scene recognition result, greatly improving image quality. For example, if the scene category of the image to be detected is recognized as a night scene, the image can be processed in a way suited to night scenes, such as increasing brightness. If the scene category is recognized as backlit, the image can be processed in a way suited to backlighting. And if the image is recognized as multi-label, containing for example a beach, green grass, and blue sky, the beach region can be processed in a way suited to beaches, the green-grass region in a way suited to green grass, and the blue-sky region in a way suited to blue sky, so that the entire image looks very good.
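The scene-specific post-processing can be sketched as a dispatch over the recognised labels; the handlers below are hypothetical placeholders acting on a string stand-in for real image data, not the actual processing pipeline:

```python
def process_image(image, scene_labels):
    # Apply a scene-appropriate post-processing step for each recognised
    # label; unrecognised labels are simply skipped.
    handlers = {
        "night scene": lambda img: img + "+brightness",
        "backlight": lambda img: img + "+exposure-compensation",
        "beach": lambda img: img + "+beach-tone",
        "green grass": lambda img: img + "+grass-tone",
        "blue sky": lambda img: img + "+sky-tone",
    }
    for label in scene_labels:
        if label in handlers:
            image = handlers[label](image)
    return image
```

In a real system each handler would operate on the corresponding image region rather than the whole frame.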
In a specific embodiment, an image processing method is provided, described here as applied to the electronic device in FIG. 1, and including:
Operation 1: obtain multi-label images containing multiple scene elements, and use them to train a neural network model to obtain the multi-label classification model; that is, the multi-label classification model is based on a neural network architecture;
Operation 2: perform scene recognition on the image to be detected according to the multi-label classification model to obtain initial labels of the image and the confidence corresponding to each initial label;
Operation 3: determine whether the confidence of each initial label is greater than a preset threshold; when the determination result is yes, take the initial labels whose confidence is greater than the preset threshold as the labels corresponding to the image to be detected, and output them as the scene recognition result;
Operation 4: obtain position information recorded when the image to be detected was captured, and correct the scene recognition result according to this position information to obtain the final, corrected scene recognition result;
Operation 5: according to the scene recognition result, perform on the image to be detected the image processing corresponding to that result, obtaining the processed image.
In this embodiment of the present application, because the multi-label classification model is a scene recognition model obtained from multi-label images containing multiple scene elements, scene recognition can be performed on images to be detected containing different scene elements, directly and fairly accurately outputting the labels corresponding to the multiple scenes in the image. This improves the accuracy of scene recognition on such images and, at the same time, its efficiency. The scene recognition result is corrected according to the position information recorded when the image was captured, yielding the final, corrected result; the scene categories of the image, derived from its shooting address information, calibrate the recognition result and ultimately improve the accuracy of scene detection. Moreover, the scene recognition result can serve as the basis for image post-processing, so that targeted processing can be applied to the image according to the result, greatly improving image quality.
It should be understood that although the operations in the flowcharts above are displayed in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on their execution, and they may be performed in other orders. Moreover, at least some of the operations in the figures above may comprise multiple sub-operations or stages, which are not necessarily completed at the same moment but may be executed at different times; nor must these sub-operations or stages be executed sequentially, as they may be performed in turn or alternately with other operations or with at least part of the sub-operations or stages of other operations.
In one embodiment, as shown in FIG. 6, an image processing apparatus 600 is provided, including an image acquisition module 610, a scene recognition module 620, and an output module 630, wherein:
the image acquisition module 610 is configured to obtain an image to be detected;
the scene recognition module 620 is configured to perform scene recognition on the image to be detected according to a multi-label classification model to obtain the labels corresponding to the image, the multi-label classification model being obtained from multi-label images containing multiple scene elements; and
the output module 630 is configured to output the labels corresponding to the image to be detected as the scene recognition result.
In one embodiment, as shown in FIG. 7, the image processing apparatus 600 further includes:
a multi-label image acquisition module 640, configured to obtain multi-label images containing multiple scene elements; and
a multi-label classification model training module 650, configured to train the multi-label classification model using the multi-label images containing multiple scene elements.
In one embodiment, as shown in FIG. 8, the scene recognition module 620 includes:
an initial label acquisition module 622, configured to perform scene recognition on the image to be detected according to the multi-label classification model to obtain initial labels of the image and the confidence corresponding to each initial label;
a determination module 624, configured to determine whether the confidence of an initial label is greater than a preset threshold; and
an image label generation module 626, configured to, when the determination result is yes, take the initial labels whose confidence is greater than the preset threshold as the labels corresponding to the image to be detected.
In one embodiment, the image processing apparatus 600 is further configured to obtain position information recorded when the image to be detected was captured, and to correct the scene recognition result according to this position information to obtain the final, corrected scene recognition result.
In one embodiment, the image processing apparatus 600 is further configured to perform, on the image to be detected, image processing corresponding to the scene recognition result.
The division of the image processing apparatus into the modules above is for illustration only; in other embodiments the image processing apparatus may be divided into different modules as needed to implement all or part of its functions.
Each module in the image processing apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The network interface may be an Ethernet card, a wireless network card, or the like. The modules above may be embedded, in hardware form, in a processor in the server or exist independently of it, or may be stored in software form in the server's memory, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer-readable storage medium is provided on which a computer program is stored; when the computer program is executed by a processor, the operations of the image processing methods provided in the embodiments above are implemented.
In one embodiment, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the operations of the image processing methods provided in the embodiments above are implemented.
An embodiment of the present application further provides a computer program product which, when run on a computer, causes the computer to perform the operations of the image processing methods provided in the embodiments above.
An embodiment of the present application further provides an electronic device. The electronic device includes an image processing circuit, which may be implemented using hardware and/or software components and may include various processing units defining an ISP (Image Signal Processing) pipeline. FIG. 9 is a schematic diagram of the image processing circuit in one embodiment. As shown in FIG. 9, for ease of description, only the aspects of the image processing technology relevant to the embodiments of the present application are shown.
As shown in FIG. 9, the image processing circuit includes an ISP processor 940 and a control logic 950. Image data captured by an imaging device 910 is first processed by the ISP processor 940, which analyzes the image data to capture image statistics usable for determining one or more control parameters of the imaging device 910. The imaging device 910 may include a camera with one or more lenses 912 and an image sensor 914. The image sensor 914 may include a color filter array (such as a Bayer filter); it can obtain the light intensity and wavelength information captured by each of its imaging pixels and provide a set of raw image data that the ISP processor 940 can process. A sensor 920 (such as a gyroscope) may provide acquired image processing parameters (such as image stabilization parameters) to the ISP processor 940 based on the sensor 920 interface type. The sensor 920 interface may be an SMIA (Standard Mobile Imaging Architecture) interface, another serial or parallel camera interface, or a combination of such interfaces.
In addition, the image sensor 914 may also send the raw image data to the sensor 920, which may either provide it to the ISP processor 940 based on the sensor 920 interface type or store it in an image memory 930.
The ISP processor 940 processes the raw image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the ISP processor 940 may perform one or more image processing operations on the raw image data and collect statistics about it. The image processing operations may be performed at the same or different bit-depth precision.
The ISP processor 940 may also receive image data from the image memory 930. For example, the sensor 920 interface sends the raw image data to the image memory 930, and the raw image data in the image memory 930 is then provided to the ISP processor 940 for processing. The image memory 930 may be part of a memory device, a storage device, or a separate dedicated memory within the electronic device, and may include a DMA (Direct Memory Access) feature.
When receiving raw image data from the image sensor 914 interface, from the sensor 920 interface, or from the image memory 930, the ISP processor 940 may perform one or more image processing operations, such as temporal filtering. The processed image data may be sent to the image memory 930 for further processing before being displayed. The ISP processor 940 receives the data from the image memory 930 and processes it in the raw domain and in the RGB and YCbCr color spaces. The image data processed by the ISP processor 940 may be output to a display 970 for viewing by the user and/or for further processing by a graphics engine or GPU (Graphics Processing Unit). The output of the ISP processor 940 may also be sent to the image memory 930, from which the display 970 may read the image data. In one embodiment, the image memory 930 may be configured to implement one or more frame buffers. The output of the ISP processor 940 may further be sent to an encoder/decoder 960 to encode/decode the image data; the encoded image data may be saved and decompressed before being displayed on the display 970. The encoder/decoder 960 may be implemented by a CPU, a GPU, or a coprocessor.
The statistics determined by the ISP processor 940 may be sent to the control logic 950. The statistics may include, for example, image sensor 914 statistics such as auto exposure, auto white balance, auto focus, flicker detection, black-level compensation, and lens 912 shading correction. The control logic 950 may include a processor and/or microcontroller executing one or more routines (such as firmware) which, according to the received statistics, determine the control parameters of the imaging device 910 and of the ISP processor 940. For example, the control parameters of the imaging device 910 may include sensor 920 control parameters (such as gain, integration time for exposure control, and image stabilization parameters), camera flash control parameters, lens 912 control parameters (such as the focal length for focusing or zooming), or a combination of these parameters. The ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (for example, during RGB processing), as well as lens 912 shading correction parameters.
Any reference to memory, storage, a database, or another medium used in this application may include non-volatile and/or volatile memory. Suitable non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which serves as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the embodiments above can be completed by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or the like.
The technical features of the embodiments above may be combined arbitrarily. For brevity, not all possible combinations of these technical features have been described; nevertheless, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The embodiments above express only several implementations of the present application, and their description is relatively specific and detailed, but they must not therefore be understood as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. The protection scope of this application patent shall therefore be subject to the appended claims.

Claims (16)

  1. An image processing method, comprising:
    obtaining an image to be detected;
    performing scene recognition on the image to be detected according to a multi-label classification model to obtain labels corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements; and
    outputting the labels corresponding to the image to be detected as a scene recognition result.
  2. The method according to claim 1, wherein before the obtaining of the image to be detected, the method comprises:
    obtaining multi-label images containing multiple scene elements; and
    training the multi-label classification model using the multi-label images containing multiple scene elements.
  3. 根据权利要求2所述的方法,其特征在于,所述多标签分类模型基于神经网络模型构建。The method according to claim 2, wherein the multi-label classification model is constructed based on a neural network model.
  4. 根据权利要求1所述的方法,其特征在于,所述根据多标签分类模型对所述待检测图像进行场景识别,得到所述待检测图像对应的标签,包括:The method according to claim 1, wherein performing scene recognition on the image to be detected according to a multi-label classification model, and obtaining a label corresponding to the image to be detected comprises:
    根据多标签分类模型对所述待检测图像进行场景识别,得到所述待检测图像的初始标签及所述初始标签对应的置信度;Performing scene recognition on the image to be detected according to a multi-label classification model, and obtaining an initial label of the image to be detected and a confidence level corresponding to the initial label;
    判断所述初始标签的置信度是否大于预设阈值;及Determining whether the confidence level of the initial label is greater than a preset threshold; and
    当判断结果为是,则将置信度大于预设阈值的所述初始标签作为所述待检测图像对应的标签。When the determination result is yes, the initial label with a confidence level greater than a preset threshold is used as a label corresponding to the image to be detected.
  5. 根据权利要求4所述的方法,其特征在于,所述每个初始标签对应的置信度的范围为[0,1]。The method according to claim 4, wherein the range of the confidence corresponding to each initial label is [0,1].
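Claims 4 and 5 describe a simple thresholding rule over independent per-label confidences. A minimal sketch of that post-processing step in Python (the label names and the 0.5 threshold are illustrative assumptions, not values taken from the claims):

```python
def select_scene_labels(initial_labels, threshold=0.5):
    """Keep only the initial labels whose confidence exceeds the preset threshold.

    initial_labels: dict mapping label name -> confidence in [0, 1],
    as produced by a multi-label scene recognition model.
    """
    return {label: conf
            for label, conf in initial_labels.items()
            if conf > threshold}

# Hypothetical model output for one image to be detected
initial = {"blue_sky": 0.93, "beach": 0.71, "night_scene": 0.08}
result = select_scene_labels(initial)  # {"blue_sky": 0.93, "beach": 0.71}
```

Because each confidence is judged independently, zero, one, or several labels can survive for the same image, which is what distinguishes this multi-label scheme from single-label classification.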
  6. The method according to claim 1, wherein after the outputting of the label corresponding to the image to be detected as a result of the scene recognition, the method comprises:
    obtaining location information recorded when the image to be detected was shot; and
    correcting the result of the scene recognition according to the location information to obtain a final, corrected scene recognition result.
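Claim 6 does not specify how the location-based correction is performed; one plausible reading is a filter that drops labels implausible for the shooting location. The place categories and the implausibility table below are hypothetical illustrations, not part of the claim:

```python
# Hypothetical table: scene labels implausible for a given place type
IMPLAUSIBLE = {
    "indoor": {"blue_sky", "beach", "snow"},
    "urban": {"beach"},
}

def correct_by_location(labels, place_type):
    """Drop recognized labels that contradict where the photo was taken."""
    banned = IMPLAUSIBLE.get(place_type, set())
    return [label for label in labels if label not in banned]

corrected = correct_by_location(["portrait", "blue_sky"], "indoor")  # ["portrait"]
```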
  7. The method according to claim 1, wherein after the outputting of the label corresponding to the image to be detected as a result of the scene recognition, the method further comprises:
    performing, on the image to be detected, image processing corresponding to the scene recognition result according to the result of the scene recognition.
  8. An image processing apparatus, comprising:
    an image acquisition module configured to obtain an image to be detected;
    a scene recognition module configured to perform scene recognition on the image to be detected according to a multi-label classification model to obtain a label corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements; and
    an output module configured to output the label corresponding to the image to be detected as a result of the scene recognition.
  9. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the operations of the image processing method according to any one of claims 1 to 7 are implemented.
  10. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein when executing the computer program, the processor performs the following operations:
    obtaining an image to be detected;
    performing scene recognition on the image to be detected according to a multi-label classification model to obtain a label corresponding to the image to be detected, the multi-label classification model being obtained from multi-label images containing multiple scene elements; and
    outputting the label corresponding to the image to be detected as a result of the scene recognition.
  11. The electronic device according to claim 10, wherein before the obtaining an image to be detected, the operations comprise:
    obtaining multi-label images containing multiple scene elements; and
    training the multi-label classification model using the multi-label images containing multiple scene elements.
  12. The electronic device according to claim 11, wherein the multi-label classification model is constructed based on a neural network model.
  13. The electronic device according to claim 10, wherein the performing scene recognition on the image to be detected according to a multi-label classification model to obtain a label corresponding to the image to be detected comprises:
    performing scene recognition on the image to be detected according to the multi-label classification model to obtain initial labels of the image to be detected and a confidence corresponding to each initial label;
    determining whether the confidence of each initial label is greater than a preset threshold; and
    when a determination result is yes, using the initial labels whose confidences are greater than the preset threshold as the labels corresponding to the image to be detected.
  14. The electronic device according to claim 13, wherein the confidence corresponding to each initial label is in the range [0, 1].
  15. The electronic device according to claim 10, wherein after the outputting of the label corresponding to the image to be detected as a result of the scene recognition, the operations comprise:
    obtaining location information recorded when the image to be detected was shot; and
    correcting the result of the scene recognition according to the location information to obtain a final, corrected scene recognition result.
  16. The electronic device according to claim 10, wherein after the outputting of the label corresponding to the image to be detected as a result of the scene recognition, the operations further comprise:
    performing, on the image to be detected, image processing corresponding to the scene recognition result according to the result of the scene recognition.
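The claims leave the model internals open, but a common way for a neural-network multi-label head to produce the per-label confidence in [0, 1] required by claims 5 and 14 is a sigmoid applied independently to each label's logit, rather than a softmax over all labels (which would force the labels to compete). A sketch with made-up logit values:

```python
import math

def sigmoid(x):
    # Squashes any real-valued logit into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw scores from the network for one image
logits = {"blue_sky": 2.0, "beach": 0.4, "night_scene": -3.0}

# Independent per-label confidences: they need not sum to 1,
# so one image can legitimately carry several scene labels at once.
confidences = {label: sigmoid(z) for label, z in logits.items()}
```

Each confidence can then be compared against the preset threshold of claims 4 and 13 on its own.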
PCT/CN2019/089914 2018-06-08 2019-06-04 Image processing method and apparatus, storage medium and electronic device WO2019233394A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810585679.3A CN108764208B (en) 2018-06-08 2018-06-08 Image processing method and device, storage medium and electronic equipment
CN201810585679.3 2018-06-08

Publications (1)

Publication Number Publication Date
WO2019233394A1 true WO2019233394A1 (en) 2019-12-12

Family

ID=64000474

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089914 WO2019233394A1 (en) 2018-06-08 2019-06-04 Image processing method and apparatus, storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN108764208B (en)
WO (1) WO2019233394A1 (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111008145A (en) * 2019-12-19 2020-04-14 中国银行股份有限公司 Test information acquisition method and device
CN111125177A (en) * 2019-12-26 2020-05-08 北京奇艺世纪科技有限公司 Method and device for generating data label, electronic equipment and readable storage medium
CN111128348A (en) * 2019-12-27 2020-05-08 上海联影智能医疗科技有限公司 Medical image processing method, device, storage medium and computer equipment
CN111160289A (en) * 2019-12-31 2020-05-15 欧普照明股份有限公司 Method and device for detecting accident of target user and electronic equipment
CN111291800A (en) * 2020-01-21 2020-06-16 青梧桐有限责任公司 House decoration type analysis method and system, electronic device and readable storage medium
CN111292331A (en) * 2020-02-23 2020-06-16 华为技术有限公司 Image processing method and device
CN111353549A (en) * 2020-03-10 2020-06-30 创新奇智(重庆)科技有限公司 Image tag verification method and device, electronic device and storage medium
CN111461260A (en) * 2020-04-29 2020-07-28 上海东普信息科技有限公司 Target detection method, device and equipment based on feature fusion and storage medium
CN111612034A (en) * 2020-04-15 2020-09-01 中国科学院上海微系统与信息技术研究所 Method and device for determining object recognition model, electronic equipment and storage medium
CN111709371A (en) * 2020-06-17 2020-09-25 腾讯科技(深圳)有限公司 Artificial intelligence based classification method, device, server and storage medium
CN111985449A (en) * 2020-09-03 2020-11-24 深圳壹账通智能科技有限公司 Rescue scene image identification method, device, equipment and computer medium
CN112023400A (en) * 2020-07-24 2020-12-04 上海米哈游天命科技有限公司 Height map generation method, device, equipment and storage medium
CN112579587A (en) * 2020-12-29 2021-03-30 北京百度网讯科技有限公司 Data cleaning method and device, equipment and storage medium
CN112926158A (en) * 2021-03-16 2021-06-08 上海设序科技有限公司 General design method based on parameter fine adjustment in industrial machine design scene
CN113065513A (en) * 2021-01-27 2021-07-02 武汉星巡智能科技有限公司 Method, device and equipment for optimizing self-training confidence threshold of intelligent camera
CN113177498A (en) * 2021-05-10 2021-07-27 清华大学 Image identification method and device based on object real size and object characteristics
CN113221800A (en) * 2021-05-24 2021-08-06 珠海大横琴科技发展有限公司 Monitoring and judging method and system for target to be detected
CN113329173A (en) * 2021-05-19 2021-08-31 Tcl通讯(宁波)有限公司 Image optimization method and device, storage medium and terminal equipment
CN113569593A (en) * 2020-04-28 2021-10-29 京东方科技集团股份有限公司 Intelligent vase system, flower identification and display method and electronic equipment
CN113642595A (en) * 2020-05-11 2021-11-12 北京金山数字娱乐科技有限公司 Information extraction method and device based on picture
CN114049420A (en) * 2021-10-29 2022-02-15 马上消费金融股份有限公司 Model training method, image rendering method, device and electronic equipment
CN114118114A (en) * 2020-08-26 2022-03-01 顺丰科技有限公司 Image detection method, device and storage medium thereof
CN114255381A (en) * 2021-12-23 2022-03-29 北京瑞莱智慧科技有限公司 Training method of image recognition model, image recognition method, device and medium
CN115100419A (en) * 2022-07-20 2022-09-23 中国科学院自动化研究所 Target detection method and device, electronic equipment and storage medium

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764208B (en) * 2018-06-08 2021-06-08 Oppo广东移动通信有限公司 Image processing method and device, storage medium and electronic equipment
CN109635701B (en) * 2018-12-05 2023-04-18 宽凳(北京)科技有限公司 Lane passing attribute acquisition method, lane passing attribute acquisition device and computer readable storage medium
CN109657517B (en) * 2018-12-21 2021-12-03 深圳智可德科技有限公司 Miniature two-dimensional code identification method and device, readable storage medium and code scanning gun
CN109741288B (en) * 2019-01-04 2021-07-13 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN109831629B (en) * 2019-03-14 2021-07-02 Oppo广东移动通信有限公司 Terminal photographing mode adjusting method and device, terminal and storage medium
CN109831628B (en) * 2019-03-14 2021-07-16 Oppo广东移动通信有限公司 Terminal photographing mode adjusting method and device, terminal and storage medium
CN110348291A (en) * 2019-05-28 2019-10-18 华为技术有限公司 A kind of scene recognition method, a kind of scene Recognition device and a kind of electronic equipment
CN110266946B (en) * 2019-06-25 2021-06-25 普联技术有限公司 Photographing effect automatic optimization method and device, storage medium and terminal equipment
CN110796715B (en) * 2019-08-26 2023-11-24 腾讯科技(深圳)有限公司 Electronic map labeling method, device, server and storage medium
CN110704650B (en) * 2019-09-29 2023-04-25 携程计算机技术(上海)有限公司 OTA picture tag identification method, electronic equipment and medium
CN110781834A (en) * 2019-10-28 2020-02-11 上海眼控科技股份有限公司 Traffic abnormality image detection method, device, computer device and storage medium
CN111191706A (en) * 2019-12-25 2020-05-22 深圳市赛维网络科技有限公司 Picture identification method, device, equipment and storage medium
CN111212243B (en) * 2020-02-19 2022-05-20 深圳英飞拓智能技术有限公司 Automatic exposure adjusting system for mixed line detection
CN111523390B (en) * 2020-03-25 2023-11-03 杭州易现先进科技有限公司 Image recognition method and augmented reality AR icon recognition system
CN111597921B (en) * 2020-04-28 2024-06-18 深圳市人工智能与机器人研究院 Scene recognition method, device, computer equipment and storage medium
CN111709283A (en) * 2020-05-07 2020-09-25 顺丰科技有限公司 Method and device for detecting state of logistics piece
CN111613212B (en) * 2020-05-13 2023-10-31 携程旅游信息技术(上海)有限公司 Speech recognition method, system, electronic device and storage medium
CN111626353A (en) * 2020-05-26 2020-09-04 Oppo(重庆)智能科技有限公司 Image processing method, terminal and storage medium
CN111915598B (en) * 2020-08-07 2023-10-13 温州医科大学 Medical image processing method and device based on deep learning
CN112163110B (en) * 2020-09-27 2023-01-03 Oppo(重庆)智能科技有限公司 Image classification method and device, electronic equipment and computer-readable storage medium
CN112329725B (en) * 2020-11-27 2022-03-25 腾讯科技(深圳)有限公司 Method, device and equipment for identifying elements of road scene and storage medium
CN112651332A (en) * 2020-12-24 2021-04-13 携程旅游信息技术(上海)有限公司 Scene facility identification method, system, equipment and storage medium based on photo library
CN112686316A (en) * 2020-12-30 2021-04-20 上海掌门科技有限公司 Method and equipment for determining label
CN112906811B (en) * 2021-03-09 2023-04-18 西安电子科技大学 Automatic classification method for images of engineering vehicle-mounted equipment based on Internet of things architecture
CN113222058B (en) * 2021-05-28 2024-05-10 芯算一体(深圳)科技有限公司 Image classification method, device, electronic equipment and storage medium
CN113222055B (en) * 2021-05-28 2023-01-10 新疆爱华盈通信息技术有限公司 Image classification method and device, electronic equipment and storage medium
CN113065615A (en) * 2021-06-02 2021-07-02 南京甄视智能科技有限公司 Scenario-based edge analysis algorithm issuing method and device and storage medium
CN114998357B (en) * 2022-08-08 2022-11-15 长春摩诺维智能光电科技有限公司 Industrial detection method, system, terminal and medium based on multi-information analysis
CN116310665B (en) * 2023-05-17 2023-08-15 济南博观智能科技有限公司 Image environment analysis method, device and medium
CN117671497B (en) * 2023-12-04 2024-05-28 广东筠诚建筑科技有限公司 Engineering construction waste classification method and device based on digital images

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845549A (en) * 2017-01-22 2017-06-13 珠海习悦信息技术有限公司 A kind of method and device of the scene based on multi-task learning and target identification
CN106951911A (en) * 2017-02-13 2017-07-14 北京飞搜科技有限公司 A kind of quick multi-tag picture retrieval system and implementation method
CN108052966A (en) * 2017-12-08 2018-05-18 重庆邮电大学 Remote sensing images scene based on convolutional neural networks automatically extracts and sorting technique
CN108090497A (en) * 2017-12-28 2018-05-29 广东欧珀移动通信有限公司 Video classification methods, device, storage medium and electronic equipment
CN108764208A (en) * 2018-06-08 2018-11-06 Oppo广东移动通信有限公司 Image processing method and device, storage medium, electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622281B (en) * 2017-09-20 2021-02-05 Oppo广东移动通信有限公司 Image classification method and device, storage medium and mobile terminal


Also Published As

Publication number Publication date
CN108764208B (en) 2021-06-08
CN108764208A (en) 2018-11-06

Similar Documents

Publication Publication Date Title
WO2019233394A1 (en) Image processing method and apparatus, storage medium and electronic device
WO2019233393A1 (en) Image processing method and apparatus, storage medium, and electronic device
US11138478B2 (en) Method and apparatus for training, classification model, mobile terminal, and readable storage medium
CN108764370B (en) Image processing method, image processing device, computer-readable storage medium and computer equipment
CN108777815B (en) Video processing method and device, electronic equipment and computer readable storage medium
WO2019233297A1 (en) Data set construction method, mobile terminal and readable storage medium
CN108921161B (en) Model training method and device, electronic equipment and computer readable storage medium
US10896323B2 (en) Method and device for image processing, computer readable storage medium, and electronic device
WO2020001197A1 (en) Image processing method, electronic device and computer readable storage medium
WO2019233266A1 (en) Image processing method, computer readable storage medium and electronic device
CN108810413B (en) Image processing method and device, electronic equipment and computer readable storage medium
EP3598736B1 (en) Method and apparatus for processing image
US11132771B2 (en) Bright spot removal using a neural network
WO2019233392A1 (en) Image processing method and apparatus, electronic device, and computer-readable storage medium
CN108804658B (en) Image processing method and device, storage medium and electronic equipment
WO2019233260A1 (en) Method and apparatus for pushing advertisement information, storage medium and electronic device
CN108961302B (en) Image processing method, image processing device, mobile terminal and computer readable storage medium
CN108897786B (en) Recommendation method and device of application program, storage medium and mobile terminal
WO2019233271A1 (en) Image processing method, computer readable storage medium and electronic device
WO2019223594A1 (en) Neural network model processing method and device, image processing method, and mobile terminal
WO2020001196A1 (en) Image processing method, electronic device, and computer readable storage medium
WO2019223513A1 (en) Image recognition method, electronic device and storage medium
CN108848306B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN108717530A (en) Image processing method, device, computer readable storage medium and electronic equipment
CN110956679B (en) Image processing method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19816116

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19816116

Country of ref document: EP

Kind code of ref document: A1