WO2019228450A1

WO2019228450A1 - Image processing method, device, and equipment, and readable medium

Info

Publication number: WO2019228450A1
Application number: PCT/CN2019/089249
Authority: WO
Inventors: 徐跃书; 肖飞; 范蒙; 俞海
Original assignee: 杭州海康威视数字技术股份有限公司
Priority date: 2018-05-31
Filing date: 2019-05-30
Publication date: 2019-12-05
Also published as: CN110555877B; CN110555877A

Abstract

Provided by the present disclosure are an image processing method, device, and equipment, and a storage medium, the image processing method comprising: acquiring, from collected first image data of a first data format, position information of a specific target in the first image data; intercepting target data corresponding to the position information from the first image data; converting the data format of the target data from the first data format to a second data format, the second data format being suitable for displaying and/or transmitting the target data. The present disclosure may improve the image quality of a detection target.

Description

Image processing method, device and equipment, and readable medium

Cross-reference to related applications

This patent application claims priority to Chinese patent application No. 201810571964.X, filed on May 31, 2018 and entitled "Image processing method, apparatus and device, and readable medium", the entire contents of which are incorporated herein by reference.

Technical field

The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, apparatus, and device, and a readable medium.

Background

The main purpose of target detection technology is to detect and locate specific targets in a single picture or in video frames. Target detection technology is now widely used in many fields, for example text detection during goods handling in logistics, detection of vehicles committing violations in road traffic, and passenger flow detection and statistics in shopping malls and stations.

Target detection algorithms mainly use low-bit-width images that have already been processed by the ISP. After the target of interest is detected, the corresponding target image is extracted from that image for display or subsequent recognition. In systems built this way, the quality of the target image finally obtained varies widely: some images are of good quality, but in many cases the quality is poor, with problems such as blur, insufficient brightness, and insufficient contrast.

In the patent application document published by the Chinese Patent Office with the publication number of CN104463103A, an image processing method and device are proposed. When the detection target is text, the text in the target image is sharpened. The main flow of the solution is as follows: The object of interest in the image is detected. The detected object is classified using a preset classifier. When the classification result is a text, the text is sharpened.

Owing to design limitations of ISP processing algorithms and the accumulated loss across the individual processing modules, part of the original image information is lost. The data used for the subsequent text processing in the technical solution of CN104463103A is an image in a data format that has already been processed by the ISP algorithm; by then the information may be severely lost and cannot be recovered. Moreover, only text is processed, and text is generally a very small part of what people pay attention to. For other targets of interest, such as faces, vehicles, and buildings, that patent performs no subsequent processing to improve the quality of key targets. On the whole, the existing solution is limited and cannot comprehensively improve the image quality of detected targets.

Summary of the Invention

In view of this, the present disclosure provides an image processing method, apparatus, and device, and a readable medium, which can improve the image quality of a detection target.

A first aspect of the present disclosure provides an image processing method, including:

Acquiring position information of a specified target in the first image data from the collected first image data in a first data format;

Intercepting target data corresponding to the position information from the first image data;

Converting the data format of the target data from the first data format to a second data format, where the second data format is suitable for displaying and/or transmitting the target data.

According to an embodiment of the present disclosure, the acquiring position information of a specified target in the first image data from the acquired first image data in the first data format includes:

Converting the first image data into second image data capable of performing target detection;

The position information of the designated target is detected in the second image data, and the detected position information is determined as the position information of the designated target in the first image data.

According to an embodiment of the present disclosure, the detecting the position information of the designated target in the second image data, and determining the detected position information as the position information of the designated target in the first image data includes:

Inputting the second image data to a trained first neural network, where the first neural network locates the specified target and outputs its position information by means of at least a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation;

Determining the result output by the first neural network as the position information of the specified target in the first image data.

According to an embodiment of the present disclosure, the converting the first image data into second image data capable of performing target detection includes: converting the first image data into second image data capable of performing target detection by using at least one of the following image processing methods: black level correction, white balance correction, color interpolation, contrast enhancement, and bit width compression.

According to an embodiment of the present disclosure, the acquiring position information of a specified target in the first image data from the collected first image data in the first data format includes:

Inputting the first image data to a trained second neural network, where the second neural network, by means of at least a grayscale layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation, converts the first image data into second image data capable of target detection and detects the position information of the specified target in the second image data;

Determining the result output by the second neural network as the position information of the specified target in the first image data.

According to an embodiment of the present disclosure, the converting the data format of the target data from the first data format to the second data format includes: inputting the target data to a trained third neural network, where the third neural network converts the data format of the target data from the first data format to the second data format by means of at least a convolution layer for performing convolution.

According to an embodiment of the present disclosure, the converting a data format of the target data from the first data format to a second data format includes:

ISP processing is performed on the target data; wherein the ISP processing is used to convert a data format of the target data from the first data format to a second data format, and the ISP processing includes at least color interpolation.

According to an embodiment of the present disclosure, the ISP processing further includes at least one of the following: white balance correction and curve mapping.

A second aspect of the present disclosure provides an image processing apparatus including:

A first processing module, configured to obtain position information of a specified target in the first image data from the collected first image data in a first data format;

A second processing module, configured to intercept target data corresponding to the position information from the first image data;

A third processing module is configured to convert a data format of the target data from the first data format to a second data format, where the second data format is suitable for displaying and / or transmitting the target data.

According to an embodiment of the present disclosure, the first processing module includes a first processing unit and a second processing unit; the first processing unit is configured to convert the first image data into second image data capable of performing target detection; the second processing unit is configured to detect position information of the designated target in the second image data, and to determine the detected position information as the position information of the designated target in the first image data.

According to an embodiment of the present disclosure, the second processing unit is specifically configured to: input the second image data to a trained first neural network, and determine the result output by the first neural network as the position information of the specified target in the first image data; the first neural network locates the specified target and outputs its position information by means of at least a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation.

According to an embodiment of the present disclosure, the first processing unit is specifically configured to: convert the first image data into second image data capable of performing target detection by using at least one of the following image processing methods: black level correction, white balance correction, color interpolation, contrast enhancement, and bit width compression.

According to an embodiment of the present disclosure, the first processing module includes a third processing unit; the third processing unit is configured to input the first image data to a trained second neural network, and to determine the result output by the second neural network as the position information of the specified target in the first image data; the second neural network, by means of at least a grayscale layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation, converts the first image data into second image data capable of target detection and detects the position information of the specified target in the second image data.

According to an embodiment of the present disclosure, the third processing module includes a fourth processing unit; the fourth processing unit is configured to input the target data to a trained third neural network; the third neural network converts the data format of the target data from the first data format to the second data format by means of at least a convolution layer for performing convolution.

According to an embodiment of the present disclosure, the third processing module includes a fifth processing unit; the fifth processing unit is configured to perform ISP processing on the target data, where the ISP processing is used to convert the data format of the target data from the first data format to the second data format, and the ISP processing includes at least color interpolation.

According to a third aspect of the present disclosure, there is provided an electronic device including a processor and a memory; the memory stores a program that can be called by the processor; when the processor executes the program, the image processing method according to any one of the foregoing embodiments is implemented.

A fourth aspect of the present disclosure provides a machine-readable storage medium having a program stored thereon that, when executed by a processor, implements the image processing method according to any one of the foregoing embodiments.

Compared with the prior art, the embodiments of the present invention have the following beneficial effects:

In the embodiment of the present invention, the specified target is detected using the acquired first image data in the first data format to obtain its position information, and the target data corresponding to that position information is then intercepted from the first image data in the first data format. Because the target data is intercepted from the first image data, its image format and quality are unchanged. The target data is then converted into a data format suitable for display and/or transmission. Compared with existing approaches that post-process, after detection, an image that has already undergone image processing, this improves the image quality of the detected target.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flowchart of an image processing method according to an exemplary embodiment of the present disclosure.

FIG. 2 is a structural block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure.

FIG. 3 is a structural block diagram of an embodiment of a first processing module provided by the present disclosure.

FIG. 4 is a schematic flowchart of an embodiment of converting first image data to second image data provided by the present disclosure.

FIG. 5 is a schematic diagram of an embodiment of color interpolation provided by the present disclosure.

FIG. 6 is a structural block diagram of an embodiment of a first neural network provided by the present disclosure.

FIG. 7 is a structural block diagram of another embodiment of a first neural network provided by the present disclosure.

FIG. 8 is a structural block diagram of another embodiment of a first processing module provided by the present disclosure.

FIG. 9 is a structural block diagram of an embodiment of a second neural network provided by the present disclosure.

FIG. 10 is a structural block diagram of another embodiment of a second neural network provided by the present disclosure.

FIG. 11 is a schematic diagram of an embodiment for performing grayscale processing provided by the present disclosure.

FIG. 12 is a structural block diagram of an embodiment of an image processing apparatus provided by the present disclosure.

FIG. 13 is a structural block diagram of an embodiment of a third neural network provided by the present disclosure.

FIG. 14 is a structural block diagram of another embodiment of an image processing apparatus provided by the present disclosure.

FIG. 15 is a structural block diagram of an embodiment of an ISP process for converting target data from a first data format to a second data format provided by the present disclosure.

FIG. 16 is a structural block diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed description

Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure, as detailed in the appended claims.

The terminology used in this disclosure is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.

It should be understood that, although the terms first, second, third, etc. may be used in this disclosure to describe various information, such information should not be limited to these terms. These terms are only used to distinguish the same type of information from each other. For example, without departing from the scope of the present disclosure, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information. Depending on the context, the word "if" as used herein can be interpreted as "at" or "when" or "in response to determination".

In order to make the description of the disclosure more clear and concise, some technical terms in the disclosure are explained below:

ISP (Image Signal Processor) processing: processes the image signal collected by the image sensor of a front-end imaging device, with functions including dead pixel correction, black level correction, white balance correction, color interpolation, gamma correction, color correction, sharpening, and denoising; one or more of these may be selected according to the actual application.

Deep learning: a method that uses neural networks to simulate the analysis and learning of the human brain and to establish corresponding data representations.

Neural network: mainly composed of neurons; it may include convolutional layers and pooling layers.

The image processing method of the embodiment of the present disclosure is described in more detail below, but it should not be limited to this.

In one embodiment, referring to FIG. 1, an image processing method according to an embodiment of the present disclosure is shown. The method may include the following steps:

S1: acquiring position information of a specified target in the first image data from the collected first image data in a first data format;

S2: intercept target data corresponding to the position information from the first image data;

S3: Convert the data format of the target data from the first data format to a second data format, where the second data format is suitable for displaying and / or transmitting the target data.
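Steps S1 to S3 can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions: the names `crop_raw`, `process_frame`, `locate`, and `convert` are hypothetical, the raw frame is assumed to be a single-channel 12-bit array, and the detector and converter are stand-ins rather than the methods of the disclosure.

```python
import numpy as np

def crop_raw(raw, box):
    """S2: cut the region box = (x0, y0, x1, y1) out of the raw frame."""
    x0, y0, x1, y1 = box
    return raw[y0:y1, x0:x1].copy()

def process_frame(raw, locate, convert):
    """Run S1 to S3 on one raw frame.

    locate  : callable returning (x0, y0, x1, y1) of the specified target (S1)
    convert : callable turning a raw crop into display-ready data (S3)
    """
    box = locate(raw)                  # S1: position information
    target_raw = crop_raw(raw, box)    # S2: still in the first data format
    return convert(target_raw)         # S3: second data format

# Toy usage with stand-in callables (both hypothetical).
raw = np.arange(64, dtype=np.uint16).reshape(8, 8) * 60   # fake 12-bit frame

def locate(img):
    return (2, 2, 6, 6)                # pretend the target was found here

def convert(crop):
    return (crop >> 4).astype(np.uint8)  # assumed 12-bit to 8-bit reduction

out = process_frame(raw, locate, convert)
```

The point of the structure is that `convert` only ever sees data cropped from the untouched raw frame, never from an intermediate image prepared for detection.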

In the embodiment of the present disclosure, the image processing method of FIG. 1 may be applied to an image device. The image device may be a device with an imaging function, such as a camera, or a device capable of image post-processing; this is not limited. The first image data in the first data format may be image data acquired by the image device itself or image data acquired from other devices; this is not specifically limited.

The image data collected by the image device is the first image data, and its image data format is the first data format. The first data format is the original image format collected by the image device. For example, the original image format is the format, without any image preprocessing, generated by the image sensor in the image device after sensing one or more spectral bands. An image in the original image format may include data in one or more spectral bands; for example, it may include a spectral sampling signal in the wavelength range of 380 nm to 780 nm and/or a spectral sampling signal in the wavelength range of 780 nm to 2500 nm. Generally speaking, an image in the first data format is difficult to display or transmit directly.

In step S1, position information of a specified target in the first image data is acquired from the collected first image data in a first data format.

The first image data includes a specified target, and the specified target is an object that is expected to undergo ISP processing to improve the image quality of the specified target. A specified target may be detected and located on the first image data.

The position information of the designated target in the first image data may include the coordinates of a feature point of the designated target in the first image data together with the size of the image area of the designated target, or the coordinates of the start point and end point of the image area of the designated target. It is not specifically limited, as long as the position of the designated target in the first image data can be located.
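The two representations of position information mentioned above carry the same information and can be converted into each other. A minimal sketch follows, assuming (this is not stated in the disclosure) that the feature point is the top-left corner of the image area and that the end point is exclusive; the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PointAndSize:
    """Feature point (assumed here: top-left corner) plus region size."""
    x: int
    y: int
    width: int
    height: int

@dataclass(frozen=True)
class StartEnd:
    """Start point (inclusive) and end point (exclusive) of the region."""
    x0: int
    y0: int
    x1: int
    y1: int

def to_start_end(p: PointAndSize) -> StartEnd:
    return StartEnd(p.x, p.y, p.x + p.width, p.y + p.height)

def to_point_and_size(b: StartEnd) -> PointAndSize:
    return PointAndSize(b.x0, b.y0, b.x1 - b.x0, b.y1 - b.y0)

box = to_start_end(PointAndSize(x=40, y=16, width=128, height=64))
```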

Then step S2 is executed to intercept target data corresponding to the position information from the first image data.

The first image data in step S2 is the collected first image data in the first data format, that is, the original image as acquired by the device, rather than image data that has been processed in order to obtain the position information of the target, so no image information has been lost. In other words, the first image data used in steps S1 and S2 come from the same data source: they may be the same first image data, or different frames of first image data collected in the same scene, as long as the specified target does not move or otherwise change between the two frames. Preferably, the same first image data is used in step S1 and step S2; the first image data can be stored in the image device and retrieved when needed.

Since the position information is detected and acquired from the first image data, the image area corresponding to the position information in the first image data is the designated target. Image capture can be performed on the area pointed by the position information in the first image data to obtain target data corresponding to the specified target. Since the target data is intercepted from the first image data, its data format is still the first data format, which is the same as the data format of the first image data.

Step S3 is then executed to convert the data format of the target data from the first data format to a second data format, and the second data format is suitable for displaying and / or transmitting the target data.

In step S3, image processing is performed on the target data in the first data format, and the data format is converted into the second data format. The second data format is a data format suitable for displaying and / or transmitting the target data. Both the first data format and the second data format are image formats. The image processing process may not only perform data format conversion, but may also include other image processing to improve the image quality of the target data.

In the embodiment of the present disclosure, the specified target is detected using the acquired first image data in the first data format to obtain its position information, and the target data corresponding to that position information is then intercepted from the first image data in the first data format. Because the target data is intercepted from the first image data, its image format and quality are unchanged. The target data is then converted into a data format suitable for display and/or transmission. Compared with approaches that post-process, after detection, an image that has already undergone image processing, this improves the image quality of the detected target.

Step S1 is the step of acquiring position information. The position information can be obtained by detecting the specified target of interest and then locating it. The type of the specified target is not limited (for example text, characters, vehicles, license plates, or buildings), nor are its shape and size. The input first image data in the first data format may first be preprocessed into data commonly used for target detection and then subjected to target detection, or target detection may be performed directly on the first image data in the first data format to output the target position information; the specific implementation is not limited.

In one embodiment, the above method flow may be executed by the image processing apparatus 100. As shown in FIG. 2, the image processing apparatus 100 mainly includes three modules: a first processing module 101, a second processing module 102, and a third processing module 103. The first processing module 101 is configured to perform step S1, the second processing module 102 is configured to perform step S2, and the third processing module 103 is configured to perform step S3.

As shown in FIG. 2, the first processing module 101 detects a target or object of interest in the first image data in the first data format and outputs the position information of the detected target. The second processing module 102 takes the position information output by the first processing module 101 and the input first image data in the first data format, and obtains from the original first image data the target data, in the first data format, corresponding to the target of interest. The third processing module 103 performs adaptive ISP processing on the target data output by the second processing module 102 to obtain higher-quality target data in the second data format.

In one embodiment, as shown in FIG. 3, the first processing module 101 includes a first processing unit 1011 and a second processing unit 1012. Step S101 may be performed by the first processing unit 1011 and step S102 by the second processing unit 1012, thereby implementing step S1. Step S1 specifically includes the following steps:

S101: converting the first image data into second image data capable of performing target detection;

S102: Detect position information of a designated target in the second image data, and determine the detected position information as position information of the designated target in the first image data.

Because the specified target needs to be detected and the first image data is not convenient for direct detection, in step S101 the first image data is first converted into second image data that can be used for target detection. The specific conversion method is not limited, as long as the first image data can be converted into second image data on which target detection can be performed.

Because the second image data has been converted, its data format may no longer be the first data format. If the target were extracted from it for post-processing, the image quality could not be guaranteed. Therefore, in this embodiment the second image data is used only to detect the position information of the specified target, not to extract the specified target.

After step S101, step S102 is performed, in which the position information of the specified target is detected in the second image data. Recognizing and locating the specified target in the second image data determines its position information in the second image data. The position of the specified target generally does not change between the first image data and the second image data. Scaling or translation of the specified target between the first image data and the second image data is not excluded, but any such scaling and translation can be determined during processing. Therefore, once the position information of the specified target in the second image data is known, its position information in the first image data is also known, and the detected position information is determined as the position information of the specified target in the first image data.
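When scaling or translation does occur between the first and second image data, mapping the detected position back is a simple inverse transform. The sketch below assumes a uniform scale and a translation such that a coordinate in the second image equals scale times the coordinate in the first image plus a shift; the function name and this parameterization are hypothetical.

```python
import math

def map_box_to_first(box2, scale=1.0, shift=(0.0, 0.0)):
    """Map a box (x0, y0, x1, y1) found in the second image data back to
    the first image data, assuming second = scale * first + shift.

    The start point is rounded down and the end point rounded up, so the
    mapped box never cuts into the detected region.
    """
    tx, ty = shift
    x0, y0, x1, y1 = box2
    return (
        math.floor((x0 - tx) / scale),
        math.floor((y0 - ty) / scale),
        math.ceil((x1 - tx) / scale),
        math.ceil((y1 - ty) / scale),
    )

same = map_box_to_first((10, 20, 50, 60))                       # no change
half = map_box_to_first((10, 20, 50, 60), scale=0.5)            # downscaled 2x
moved = map_box_to_first((10, 20, 50, 60), scale=0.5, shift=(4, 2))
```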

Further, converting the first image data into second image data capable of performing target detection may include performing at least color interpolation on the first image data. On this basis, at least one of the following may also be performed: black level correction, white balance correction, contrast enhancement, and bit width compression. It is of course not specifically limited to these.

In a possible implementation, the first processing unit 1011 may implement step S101 by performing steps S1011 to S1015. Referring to FIG. 4, steps S1011 to S1015 are as follows:

S1011: black level correction;

S1012: white balance correction;

S1013: color interpolation;

S1014: contrast enhancement;

S1015: Bit width compression.

It can be understood that the method of converting the first image data into the second image data is not limited to steps S1011 to S1015, nor is the processing order limited. For example, the first image data may be converted into the second image data by color interpolation alone, as long as target detection can be performed on the resulting second image data.

In step S1011, let the first image data in the first data format be denoted imgR. Black level correction removes the influence of the black level from the first image data in the first data format and outputs imgR_blc:

imgR_blc = imgR - V_blc

where V_blc is the black level value. The "-" here is not a mathematical operation; it denotes "removal".
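One common way to realize this "removal" is a per-pixel subtraction clipped at zero. The sketch below makes that assumption explicitly; the disclosure itself does not fix the operation, and the function name and sample values are illustrative only.

```python
import numpy as np

def black_level_correct(img_r, v_blc):
    """Remove the black level V_blc from raw data imgR.

    Assumption: removal is implemented as subtraction, with results
    clipped at zero so pixels below the black level do not wrap around.
    """
    out = img_r.astype(np.int32) - int(v_blc)   # widen before subtracting
    return np.clip(out, 0, None).astype(img_r.dtype)

img_r = np.array([[60, 70], [300, 4095]], dtype=np.uint16)  # toy 12-bit data
img_r_blc = black_level_correct(img_r, v_blc=64)
```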

In step S1012, white balance correction removes the color cast caused by ambient light during imaging, so as to restore the original color information of the image. The R1 and B1 components are adjusted by two coefficients, R_gain and B_gain:

R1′ = R1 * R_gain

B1′ = B1 * B_gain

where R1 and B1 are the color components of the red and blue channels of the image data after black level correction, and R1′ and B1′ are the color components of the red and blue channels of the image output by the white balance correction module. The output image is denoted imgR_wb.
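Applied to a Bayer mosaic, the two gains touch only the red and blue sample sites. The following sketch assumes an RGGB layout (red at even rows and even columns, blue at odd rows and odd columns) and 12-bit data; both are assumptions for illustration, as are the function name and the gain values.

```python
import numpy as np

def white_balance(img_r_blc, r_gain, b_gain, max_value=4095):
    """Scale the R and B sites of an RGGB mosaic; G sites are left as is."""
    out = img_r_blc.astype(np.float64)
    out[0::2, 0::2] *= r_gain    # R1' = R1 * R_gain
    out[1::2, 1::2] *= b_gain    # B1' = B1 * B_gain
    # Clip to the assumed sensor range before returning to integer data.
    return np.clip(np.rint(out), 0, max_value).astype(img_r_blc.dtype)

raw = np.full((4, 4), 1000, dtype=np.uint16)
img_r_wb = white_balance(raw, r_gain=1.5, b_gain=2.0)
```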

In step S1013, color interpolation is applied to the data after white balance correction. Color interpolation can be implemented by nearest neighbor interpolation, which expands the single-channel first image data in the first data format into multi-channel data. For first image data in a Bayer format, each missing color value is filled directly with the nearest pixel of that color, so that every pixel contains all three RGB color components. The interpolation process is shown in FIG. 5: R11 is copied to its three neighboring pixels, which then take R11 as their red value. Which neighboring pixels are filled can be configured. The same applies to the other color pixels and is not repeated here. The interpolated image is denoted imgC.
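A minimal nearest neighbor interpolation can be sketched as follows. It assumes an RGGB mosaic with even height and width, and it picks one particular neighbor assignment (each 2x2 cell shares its R and B samples, and each row reuses the G sample in that row); the disclosure leaves the choice of neighbors configurable, so this is only one possible instance.

```python
import numpy as np

def _upsample2(plane):
    """Repeat every sample over its 2x2 cell."""
    return np.repeat(np.repeat(plane, 2, axis=0), 2, axis=1)

def demosaic_nearest(raw):
    """(H, W) RGGB mosaic with even H and W -> (H, W, 3) RGB image."""
    h, w = raw.shape
    if h % 2 or w % 2:
        raise ValueError("expected even height and width")
    r = _upsample2(raw[0::2, 0::2])          # R sample fills its whole cell
    b = _upsample2(raw[1::2, 1::2])          # B sample fills its whole cell
    g = np.empty_like(raw)
    g[0::2, :] = np.repeat(raw[0::2, 1::2], 2, axis=1)  # even rows: G right of R
    g[1::2, :] = np.repeat(raw[1::2, 0::2], 2, axis=1)  # odd rows: G left of B
    return np.stack([r, g, b], axis=-1)

raw = np.array([[10, 20],
                [30, 40]], dtype=np.uint16)  # one cell: R=10, G=20/30, B=40
img_c = demosaic_nearest(raw)
```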

In step S1014, contrast enhancement is applied to the data after color interpolation, to enhance the contrast of the interpolated image. A Gamma curve may be used for the mapping. Let the mapping function of the Gamma curve be f(), and let the enhanced image be denoted imgC_gm:

imgC_gm(i, j) = f(imgC(i, j)),

where (i, j) are the coordinates of the pixel.
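The mapping function f() is not specified further. As an illustration only, the sketch below assumes a power-law Gamma curve with exponent 1/2.2 applied per pixel on data normalized to a 12-bit range; the exponent, the range, and the function name are all assumptions.

```python
import numpy as np

def gamma_map(img_c, gamma=2.2, max_value=4095):
    """imgC_gm(i, j) = f(imgC(i, j)) with f(x) = max * (x / max) ** (1 / gamma)."""
    x = img_c.astype(np.float64) / max_value     # normalize to [0, 1]
    y = np.power(x, 1.0 / gamma)                 # assumed power-law curve
    return np.rint(y * max_value).astype(img_c.dtype)

img_c = np.array([0, 1024, 4095], dtype=np.uint16)
img_c_gm = gamma_map(img_c)
```

With this choice of curve the end points stay fixed while mid-tones are lifted, which is the usual effect of a display Gamma curve.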

In step S1015, the data targeted for bit-width compression is the data with enhanced contrast. Bit-width compression is the compression of the high-bit-width data imgC _gm obtained after the contrast enhancement. Linear compression is used directly, and the compressed image is recorded as imgC _1b :

imgC _1b (i, j) = imgC _gm (i, j) / M

Among them, M is a compression ratio corresponding to the compression of the first data format to the second data format.
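Steps S1014 and S1015 can be illustrated together with a short sketch. The power-law form of f(), the exponent 1/2.2, the 12-bit input range, and the ratio M = 16 (12 bits to 8 bits) are assumptions chosen for the example and are not specified by the disclosure.

```python
def gamma_map(value, max_in, gamma=1 / 2.2):
    """Map one sample through a power-law curve, keeping the input range."""
    return max_in * (value / max_in) ** gamma


def compress_bit_width(value, m):
    """Linear bit-width compression: divide by the ratio M and truncate."""
    return int(value / m)


MAX_12BIT = 4095   # assumed range of the high-bit-width data
M = 16             # assumed ratio, 2 ** (12 - 8)

row = [0, 1024, 4095]
enhanced = [gamma_map(v, MAX_12BIT) for v in row]        # imgC_gm
compressed = [compress_bit_width(v, M) for v in enhanced]  # imgC_1b
```

With these assumed values the compressed samples fall in the 8-bit range 0 to 255.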

In a possible implementation manner, the second processing unit 1012 may implement step S102 by performing steps S1021 to S1022.

S1021: input the second image data to a trained first neural network; the first neural network is used to achieve positioning through at least a convolution layer, a pooling layer, a fully connected layer, and a frame regression layer;

S1022: Determine a result output by the first neural network as position information of the specified target in the first image data.

In step S1021, the first neural network is a trained network, and inputting the second image data into the first neural network can realize the positioning of the specified target in the second image data and obtain the position information of the specified target accordingly.

The first neural network may be integrated in the second processing unit 1012 as a part of the first processing module 101, or may be provided outside the first processing module 101, and may be scheduled by the second processing unit 1012.

Referring to FIG. 6, the first neural network 200 may include at least one convolution layer 201 for performing convolution, at least one pooling layer 202 for performing downsampling, at least one fully connected layer 203 for performing feature synthesis, and at least one border regression layer 204 for performing coordinate transformation.

As an embodiment of the first neural network, referring to FIG. 7, the first neural network 200 may include a convolution layer 205, a convolution layer 206, a pooling layer 207, ..., a convolution layer 208, a pooling layer 209, a fully connected layer 210, and a border regression layer 211. The second image data is input to the first neural network 200, and the first neural network 200 outputs position information, which is used as the position information of the specified target in the first image data. The functions performed by each type of layer of the first neural network are described herein, and each layer may be adapted; for example, the convolution kernels of different convolution layers may differ, which is not described again here. It can be understood that the first neural network shown in FIG. 7 is only an example and is not specifically limited thereto. For example, convolution layers, pooling layers, and/or other layers may be removed or added.

The following describes the specific functions of the layers in the first neural network, but it should not be limited to this.

The convolution layer (Conv) performs a convolution operation and may also carry an activation function, such as ReLU, which is applied to the convolution result. The operation of a convolution layer can therefore be expressed by the following formula:

YC_i(I) = g(W_i * YC_{i-1}(I) + B_i)

Here, YC_i(I) is the output of the i-th convolution layer, YC_{i-1}(I) is the input of the i-th convolution layer, * denotes convolution, W_i and B_i are the weight coefficients and offset coefficients of the convolution filter of the i-th convolution layer, and g() represents the activation function. When the activation function is ReLU, g(x) = max(0, x), where x is the convolution result W_i * YC_{i-1}(I) + B_i.

The pooling layer (Pool) is a special type of downsampling layer: it reduces the feature map obtained by convolution. With a reduction window of size N × N, the maximum value within each window is taken as the value of the corresponding point in the output feature map. The specific formula is as follows:

YP_j(I) = maxpool(YP_{j-1}(I))

Here, YP_{j-1}(I) is the input of the j-th pooling layer, and YP_j(I) is the output of the j-th pooling layer.
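The convolution and pooling formulas above can be traced on a tiny feature map with the following sketch. It is written in plain Python for a single channel; the function names are hypothetical, and the kernel is applied without flipping (cross-correlation, as in most deep-learning frameworks), which is an assumption because the text only says "convolution".

```python
def conv2d_valid(x, w, b):
    """Slide kernel w over x without padding and add the offset b."""
    kh, kw = len(w), len(w[0])
    out = []
    for i in range(len(x) - kh + 1):
        row = []
        for j in range(len(x[0]) - kw + 1):
            s = sum(w[a][c] * x[i + a][j + c]
                    for a in range(kh) for c in range(kw))
            row.append(s + b)
        out.append(row)
    return out


def relu(fm):
    """g(x) = max(0, x), applied element-wise."""
    return [[max(0, v) for v in row] for row in fm]


def maxpool(fm, n):
    """Non-overlapping n x n max pooling; trailing rows/columns are dropped."""
    return [[max(fm[i + a][j + c] for a in range(n) for c in range(n))
             for j in range(0, len(fm[0]) - n + 1, n)]
            for i in range(0, len(fm) - n + 1, n)]


x = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]
w = [[1, 0],
     [0, 1]]
conv_out = conv2d_valid(x, w, b=-2)   # W * YC_{i-1} + B
yc = relu(conv_out)                   # YC_i = g(...)
yp = maxpool(yc, 2)                   # YP_j = maxpool(YP_{j-1})
```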

The fully connected layer (FC) can be regarded as a convolution layer with a filter window of 1 × 1. Each node of the fully connected layer is connected to all nodes of the previous layer, so as to integrate the previously extracted features. Its concrete implementation is similar to convolution filtering.

The width and height, W _ij and B _ij are the connection weight coefficients and bias coefficients of the fully connected layer, g () represents the activation function, and I is (i, j).

The border regression layer (BBR) seeks a mapping such that the window P output by the fully connected layer is mapped to a window G′ that is closer to the real window G. The regression is generally implemented as a coordinate transformation of window P, for example a translation transformation and/or a scaling transformation. Assume that the coordinates of the window P output by the fully connected layer are (x₁, x₂, y₁, y₂) and that the coordinates of the transformed window are (x₃, x₄, y₃, y₄).

If the transformation is a translation with translation amounts (Δx, Δy), the coordinates before and after the translation are related by:

x₃ = x₁ + Δx

x₄ = x₂ + Δx

y₃ = y₁ + Δy

y₄ = y₂ + Δy

If the transformation is a scaling with scaling factors dx and dy in the X and Y directions, respectively, the coordinates before and after the transformation are related by:

x₄ - x₃ = (x₂ - x₁) * dx

y₄ - y₃ = (y₂ - y₁) * dy.
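The two coordinate transformations can be sketched as follows. The scaling relation above constrains only the width and height of the window, so the sketch additionally assumes that the window centre stays fixed; that choice, and the function names, are illustrative assumptions.

```python
def translate_window(p, delta_x, delta_y):
    """Shift window p = (x1, x2, y1, y2) by (delta_x, delta_y)."""
    x1, x2, y1, y2 = p
    return (x1 + delta_x, x2 + delta_x, y1 + delta_y, y2 + delta_y)


def scale_window(p, dx, dy):
    """Scale width by dx and height by dy about the window centre (assumed)."""
    x1, x2, y1, y2 = p
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * dx / 2
    half_h = (y2 - y1) * dy / 2
    return (cx - half_w, cx + half_w, cy - half_h, cy + half_h)


p = (10, 30, 20, 60)
moved = translate_window(p, 5, -2)
scaled = scale_window(p, 2, 0.5)
```

For the scaled window, x₄ - x₃ equals (x₂ - x₁) * dx and y₄ - y₃ equals (y₂ - y₁) * dy, as required by the relations above.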

In step S1022, the position information of the specified target in the first image data is determined according to a result output by the first neural network, and the output result of the first neural network may be directly used as the position information of the specified target in the first image data. Or, the output result may also be converted by using the position change relationship of the specified target in the first image data and the second image data to obtain position information of the specified target in the first image data.

For training, the first neural network can be obtained by acquiring second image data samples and corresponding position information samples as a training sample set, and training a model with the second image data samples as input and the corresponding position information samples as output. The position information sample corresponding to a second image data sample can be obtained by processing the second image data sample with an image processing method capable of identifying the detection target.

In another embodiment, referring to FIG. 8, the first processing module 101 includes a third processing unit 1013, and step S111 and step S112 may be performed by the third processing unit 1013 to implement the foregoing step S1. Step S111 and step S112 are specifically:

S111: Input the first image data to a trained second neural network; the second neural network, through at least a grayscale layer, a convolution layer, a pooling layer, a fully connected layer, and a border regression layer, converts the first image data into second image data capable of target detection and detects position information of a specified target in the second image data;

S112: Determine a result output by the second neural network as position information of the specified target in the first image data.

The second neural network may be integrated in the third processing unit 1013 as a part of the first processing module 101, or may be provided outside the first processing module 101, and may be scheduled by the third processing unit 1013.

Referring to FIG. 9, the second neural network 300 includes at least a grayscale layer 301 for performing grayscale processing, a convolution layer 302 for performing convolution, a pooling layer 303 for performing downsampling, a fully connected layer 304 for performing feature synthesis, and a border regression layer 305 for performing coordinate transformation. The second neural network can convert the first image data into second image data capable of target detection and detect position information of a specified target in the second image data, without performing other ISP processing. Of course, depending on the requirements, other information processing can be performed in addition to the processing of the second neural network, which is not specifically limited.

As an example of the second neural network, referring to FIG. 10, the second neural network 300 may include a grayscale layer 306, a convolutional layer 307, a convolutional layer 308, a pooling layer 309, ..., a convolutional layer 310, a pooling layer 311, a fully connected layer 312, and a border regression layer 313. The first image data is input to the second neural network, the layers of the second neural network process the first image data in turn and output position information, and this position information is used as the position information of the specified target in the first image data. The function performed by each layer of the second neural network is the same as that of the corresponding layer in the first neural network, which has been described above. Each layer may be adapted; for example, the convolution kernels of different convolution layers may differ, which is not repeated here. It can be understood that the second neural network 300 shown in FIG. 10 is only an example and is not specifically limited thereto. For example, convolutional layers, pooling layers, and/or other layers may be removed or added.

The grayscale layer in the second neural network converts the multi-channel information in the first data format into single-channel grayscale information, which can be achieved by weighting the components representing different colors around the current pixel. Referring to FIG. 11, the grayscale layer weights the R, G, and B color components and converts them into single-channel grayscale information Y. For example, for Y22, the calculation formula is as follows:

Y22 = (B22 + (G12 + G32 + G21 + G23) / 4 + (R11 + R13 + R31 + R33) / 4) / 3

The other components are computed in the same way and are not repeated here.
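The Y22 formula can be checked numerically with a short sketch for one interior B site. The 0-based indexing (B22 at row 1, column 1 of a 3 × 3 patch), the list-of-rows representation, and the function name are assumptions for illustration.

```python
def gray_at_blue_site(img, i, j):
    """Grayscale value at an interior B site (i, j) of a Bayer mosaic.

    Averages the four edge-adjacent G samples and the four diagonal
    R samples, then averages the three colour terms.
    """
    b = img[i][j]
    g = (img[i - 1][j] + img[i + 1][j] + img[i][j - 1] + img[i][j + 1]) / 4
    r = (img[i - 1][j - 1] + img[i - 1][j + 1]
         + img[i + 1][j - 1] + img[i + 1][j + 1]) / 4
    return (b + g + r) / 3


patch = [[90, 60, 90],   # R11 G12 R13
         [60, 30, 60],   # G21 B22 G23
         [90, 60, 90]]   # R31 G32 R33
y22 = gray_at_blue_site(patch, 1, 1)
```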

The functions performed by the convolutional layer, pooling layer, fully connected layer, and border regression layer in the second neural network can be the same as those of the corresponding layers in the first neural network. Each layer may be adapted; for example, the convolution kernels of different convolution layers may differ, which is not repeated here.

For training, the second neural network can be obtained by acquiring first image data samples and corresponding position information samples as a training sample set, and training a model with the first image data samples as input and the corresponding position information samples as output. To obtain the position information sample corresponding to a first image data sample, the first image data sample may first be processed into an image on which target detection can be performed, and the target may then be detected by an image processing method capable of identifying the detection target.

In step S2, according to the position information of the specified target in the first image data obtained in step S1, data can be intercepted at the corresponding position in the originally input first image data in the first data format, and the intercepted data is used as the target data in the first data format corresponding to the specified target.

In one embodiment, assume that the position information of the specified target in the first image data obtained in step S1 is [x1, x2, y1, y2], where x1, y1 are the starting position and x2, y2 are the ending position. When the first image data in the first data format corresponding to the entire image is denoted imgR, the target data imgT in the first data format of the specified target is:

imgT = imgR (x1: x2, y1: y2).
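The interception can be sketched as a simple slice. The sketch assumes that imgR is stored as a list of rows indexed as img_r[y][x] and that both the starting and ending positions are inclusive; neither convention is stated in the disclosure, and the function name is hypothetical.

```python
def crop(img_r, x1, x2, y1, y2):
    """Return imgR(x1:x2, y1:y2) with inclusive bounds (assumed)."""
    return [row[x1:x2 + 1] for row in img_r[y1:y2 + 1]]


# A 4 x 5 test image whose value encodes its position: 10 * row + column.
img_r = [[10 * r + c for c in range(5)] for r in range(4)]
img_t = crop(img_r, x1=1, x2=3, y1=0, y2=1)
```

Because the slice copies raw samples without further processing, the target data keeps the first data format of the source image.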

In step S3, the target data in the first data format corresponding to the specified target obtained in step S2 is processed to convert the target data of the specified target from the first data format to the second data format. Step S3 is actually image processing for small target data, which can be implemented by ISP processing implemented by a non-neural network, or by a neural network.

In an embodiment, as shown in FIG. 12, the third processing module 103 includes a fourth processing unit 1031, and the fourth processing unit 1031 may perform the following steps to implement the above step S3.

Input the target data in the first data format to a trained third neural network; the third neural network converts the data format of the target data from the first data format to the second data format through at least a convolution layer.

The third neural network may be integrated in the fourth processing unit 1031 as a part of the third processing module 103, or may be provided outside the third processing module 103, and may be scheduled by the fourth processing unit 1031.

The third neural network may include at least one convolution layer for performing a convolution to convert a data format of the target data from the first data format to a second data format. Of course, the layer structure of the third neural network is not limited to this. For example, it may include at least one ReLu layer for performing activation, or may include other layers. The number of specific layers is not limited.

Image processing is implemented based on the third neural network, which reduces the error propagation that may be caused by traditional image processing in each processing step.

The operations performed by each layer of the third neural network are described in detail below, but it should not be limited to this.

For the convolutional layer of the third neural network, assuming that the input of each convolutional layer is FC_i and the output of the convolutional layer is FC_{i+1}, then:

FC_{i+1} = g(w_ik * FC_i + b_ik)

Here, w_ik and b_ik are the parameters of the k-th convolution in the current convolutional layer, and g(x) is a linear weighting function, that is, the convolution outputs of each convolutional layer are linearly weighted. The convolutional layer of the third neural network and the convolutional layer of the first neural network both perform convolution operations and therefore have similar functions; for related descriptions, refer also to the description of the convolutional layer of the first neural network.

For the ReLu layer of the third neural network, assuming that the input of each ReLu layer is FR_i and the output of the ReLu layer is FR_{i+1}, then:

FR_{i+1} = max(FR_i, 0),

that is, the larger of 0 and FR_i is selected.

As an embodiment of the third neural network, referring to FIG. 13, the third neural network 400 may include a convolutional layer 401, a convolutional layer 402, a ReLu layer 403, a convolutional layer 404, and a convolutional layer 405, which are connected in sequence. The input of the third neural network 400 is the target data in the first data format, and the output is the target data in the second data format. The functions performed by the layers of the third neural network have been described above. Each layer may be adapted; for example, the convolution kernels of different convolution layers may differ, which is not repeated here. It can be understood that the third neural network shown in FIG. 13 is only an example and is not specifically limited thereto. For example, convolutional layers, pooling layers, and/or other layers may be removed or added.

For the training of the third neural network, the deep neural network is optimized in advance: a large number of target data samples in the first data format and the corresponding target data samples in the ideal second data format can be used to form sample pairs. The network parameters of the third neural network are trained continuously until, when target data in the first data format is input, target data in the ideal second data format can be output. The network parameters at that point are output for actual testing and use by the third neural network.

The training process for training the third neural network may include the following steps:

S311: Collect training samples: collect first data format information corresponding to the target of interest and the corresponding ideal second data format information. Assume that n training sample pairs {(x₁, y₁), (x₂, y₂), ..., (x_n, y_n)} have been obtained, where x_i represents the input first data format information and y_i represents the corresponding ideal second data format information.

S312: design the structure of the third neural network; the network structure used in network training and the network structure used in testing are the same network structure;

S313: Initialize training parameters; initialize network parameters of the structure of the third neural network, which can be random value initialization, fixed value initialization, etc .; set training related parameters, such as learning rate, number of iterations, etc .;

S314: Forward propagation: based on the current network parameters, the training sample x_i is propagated forward through the third neural network to obtain the output F(x_i) of the third neural network, and the loss function Loss is calculated:

Loss = (F(x_i) - y_i)²;

S315: Backward propagation: use backward propagation to adjust the network parameters of the third neural network;

S316: Repeated iteration: Repeat iteration steps S314 and S315 until the network converges, and output network parameters at this time.
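Steps S311 to S316 can be sketched as a plain gradient-descent loop. To stay self-contained, the sketch replaces the third neural network with a hypothetical one-parameter-pair model F(x) = w * x + b and uses a fixed iteration count instead of a convergence test; the learning rate, the synthetic sample pairs, and all names are assumptions, and only the squared loss is taken from the text.

```python
import random


def train(samples, lr=0.01, iterations=2000, seed=0):
    """Fit F(x) = w * x + b to (x, y) pairs by per-sample gradient descent."""
    rng = random.Random(seed)
    w, b = rng.uniform(-0.1, 0.1), 0.0   # S313: initialise the parameters
    for _ in range(iterations):          # S316: repeat S314 and S315
        for x, y in samples:
            err = (w * x + b) - y        # S314: forward pass, Loss = err ** 2
            w -= lr * 2 * err * x        # S315: step along -dLoss/dw
            b -= lr * 2 * err            #       and along -dLoss/db
    return w, b


# S311: synthetic sample pairs generated from y = 0.5 * x + 1.0
samples = [(x, 0.5 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
w, b = train(samples)
```

In a real network the two update lines would be replaced by backpropagation through all layers, but the order of the steps is the same.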

Of course, the training process of the third neural network is not limited to this, and other training methods may be used, as long as the trained third neural network can output the corresponding target data in the second data format when target data in the first data format is input.

In another embodiment, as shown in FIG. 14, the third processing module 103 includes a fifth processing unit 1032. The fifth processing unit 1032 may perform ISP processing on the target data to convert the target data from the first data format to the second data format, where the ISP processing includes at least color interpolation, so as to implement step S3 described above.

Further, the ISP processing further includes at least one of the following processes: white balance correction and curve mapping, which can further improve image quality.

Computing the ISP processing parameters from only the target data in the first data format can improve the accuracy of the processing parameters, thereby improving the image quality of the processed target data.

As an embodiment of performing ISP processing on the target data, referring to FIG. 15, the ISP processing may include the following steps in order:

S301: white balance correction; inputting target data in a first data format;

S302: color interpolation;

S303: curve mapping; outputting the target data in the second data format.

It can be understood that the ISP processing for converting the target data from the first data format to the second data format is not limited to this, for example, only color interpolation may be performed, or other ISP processing methods may be included.

The ISP processes such as white balance correction, color interpolation, and curve mapping are described in more detail below, but it should not be limited to this.

White balance correction removes the color cast caused by ambient light, so as to restore the original color information of the image. Generally, two coefficients, R_gain and B_gain, are used to adjust the corresponding R and B components:

R2′ = R2 * R_gain

B2′ = B2 * B_gain

Here, R2 and B2 are the red- and blue-channel color components of the input image of the white balance correction, and R2′ and B2′ are the red- and blue-channel color components of the output image of the white balance correction. To calculate R_gain and B_gain, only the R, G, and B channel color components of the target of interest need to be counted.

When calculating R_gain and B_gain, the average values R_avg, G_avg, and B_avg of the color components of the R, G, and B channels are calculated first.
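The text gives the channel averages but not the formula that turns them into gains. One common choice, assumed here and not confirmed by the disclosure, is the gray-world rule R_gain = G_avg / R_avg and B_gain = G_avg / B_avg. The sketch below computes the averages over a cropped target only; it also assumes an RGGB mosaic whose crop starts on an R site, and the function names are hypothetical.

```python
def channel_means_rggb(target):
    """Return (R_avg, G_avg, B_avg) of an RGGB mosaic crop (list of rows)."""
    sums = {"R": 0.0, "G": 0.0, "B": 0.0}
    counts = {"R": 0, "G": 0, "B": 0}
    for i, row in enumerate(target):
        for j, value in enumerate(row):
            if i % 2 == 0 and j % 2 == 0:
                channel = "R"
            elif i % 2 == 1 and j % 2 == 1:
                channel = "B"
            else:
                channel = "G"
            sums[channel] += value
            counts[channel] += 1
    return tuple(sums[c] / counts[c] for c in "RGB")


def gray_world_gains(target):
    """Assumed gray-world rule: scale R and B so their means match G."""
    r_avg, g_avg, b_avg = channel_means_rggb(target)
    return g_avg / r_avg, g_avg / b_avg


target = [[50, 100, 50, 100],
          [100, 200, 100, 200]]
r_gain, b_gain = gray_world_gains(target)
```

If the crop starts on a different site of the Bayer pattern, the parity tests in `channel_means_rggb` would have to be shifted accordingly.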

Color interpolation refers to expanding the white-balance-corrected target data in the first data format from a single-channel format into a multi-channel data format in which each channel represents one color component. It can be implemented using nearest-neighbor interpolation, which expands the single-channel target data in the first data format into multi-channel target data. For example, for image data in the Bayer format, the nearest pixel of the corresponding color can be used directly to fill each missing pixel, so that each pixel contains the three RGB color components. The specific interpolation process may be the same as or similar to that of the aforementioned embodiment corresponding to FIG. 4 and is not repeated here.

Curve mapping refers to adjusting the brightness and contrast of image data according to the visual characteristics of the human eye. Gamma curves with different parameters are commonly used for the mapping. Assuming that the mapping function of the Gamma curve is g, the image before mapping is recorded as img, and the mapped image is recorded as img_gm, then:

img_gm(i, j) = g(img(i, j)).

The embodiment of the present disclosure uses the acquired first image data in the first data format to detect a specified target and obtain its position information, and then intercepts, from the first image data in the first data format, the target data corresponding to the obtained position information. Because the target data is intercepted from the first image data, its image format and quality are unchanged. The target data is then converted to a data format suitable for display and/or transmission. Compared with post-processing an image that has already undergone the image processing used for detection, this improves the image quality of the detected target.

The image processing apparatus according to the embodiment of the present disclosure is described below, but it should not be limited to this.

In one embodiment, referring to FIG. 2, an image processing apparatus 100 may include:

A first processing module 101, configured to obtain position information of a specified target in the first image data from the collected first image data in a first data format;

A second processing module 102, configured to intercept target data corresponding to the position information from the first image data;

A third processing module 103 is configured to convert a data format of the target data from the first data format to a second data format, where the second data format is suitable for displaying and / or transmitting the target data.

In the embodiment of the present disclosure, the image processing apparatus 100 may be applied to an image device. The image device may be a device with an imaging function, such as a video camera, or a device capable of performing image post-processing, which is not specifically limited. The first image data in the first data format may be image data acquired by the image device itself or image data obtained from other devices, which is not specifically limited.

In one embodiment, referring to FIG. 3, the first processing module 101 includes a first processing unit 1011 and a second processing unit 1012. The first processing unit 1011 is configured to convert the first image data into second image data capable of target detection. The second processing unit 1012 is configured to detect position information of the specified target in the second image data and determine the detected position information as the position information of the specified target in the first image data.

In one embodiment, the second processing unit 1012 is specifically configured to: input the second image data to a trained first neural network, and determine a result output by the first neural network as the position information of the specified target in the first image data. The first neural network includes at least a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a border regression layer for performing coordinate transformation, so as to locate the specified target and output its position information.

In one embodiment, the first processing unit 1011 is specifically configured to: convert the first image data into the second image data capable of target detection by using at least one of black level correction, white balance correction, color interpolation, contrast enhancement, and bit-width compression.

In one embodiment, referring to FIG. 8, the first processing module 101 includes a third processing unit 1013 for inputting the first image data to a trained second neural network. The second neural network includes at least a grayscale layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a border regression layer for performing coordinate transformation, so as to convert the first image data into second image data capable of target detection and detect position information of a specified target in the second image data. In this way, the position information of the specified target in the first image data may be determined according to a result output by the second neural network.

In one embodiment, referring to FIG. 12, the third processing module 103 includes a fourth processing unit 1031 for inputting the target data to a trained third neural network. Wherein, the third neural network includes at least a convolution layer for performing convolution to convert the target data from the first data format to a second data format.

In one embodiment, referring to FIG. 14, the third processing module 103 includes a fifth processing unit 1032 for performing ISP processing on the target data. The ISP processing is used to convert the target data from the first data format to the second data format, and the ISP processing includes at least color interpolation.

For details of how the functions of the units in the above apparatus are implemented, refer to the implementation of the corresponding steps in the foregoing method; details are not described herein again.

As for the device embodiment, since it basically corresponds to the method embodiment, the relevant part may refer to the description of the method embodiment. The device embodiments described above are only schematic, and the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units.

The present disclosure also provides an electronic device including a processor and a memory. The memory stores a program that can be called by the processor, and when the processor executes the program, the image processing method according to any one of the foregoing embodiments is implemented.

Embodiments of the image processing apparatus of the present disclosure can be applied to electronic devices. Taking software implementation as an example, the apparatus, as a device in a logical sense, is formed by the processor of the electronic device in which it is located reading the corresponding computer program instructions from the non-volatile memory into the memory. In terms of hardware, FIG. 16 is a hardware structural diagram of an electronic device in which the image processing apparatus 100 is located according to an exemplary embodiment of the present disclosure. In addition to the processor 510, the memory 530, the interface 520, and the non-volatile memory 540 shown in FIG. 16, the electronic device in which the apparatus 100 is located may generally include other hardware according to its actual function, and details are not described herein again.

The present disclosure also provides a machine-readable storage medium having a program stored thereon, which when executed by a processor, causes an image device to implement the image processing method according to any one of the foregoing embodiments.

The present disclosure may take the form of a computer program product implemented on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing program code. Machine-readable storage media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of machine-readable storage media include, but are not limited to: phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by computing devices.

The above are only examples of the present disclosure and are not intended to limit the present disclosure. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims

An image processing method includes:

Acquiring position information of a specified target in the first image data from the collected first image data in a first data format;

Intercepting target data corresponding to the position information from the first image data;

And converting the data format of the target data from the first data format to a second data format, where the second data format is suitable for displaying and / or transmitting the target data.
The image processing method according to claim 1, wherein the acquiring position information of a specified target in the first image data from the acquired first image data in a first data format comprises:

Converting the first image data into second image data capable of performing target detection;

The position information of the designated target is detected in the second image data, and the detected position information is determined as the position information of the designated target in the first image data.
The image processing method according to claim 2, wherein detecting the position information of the specified target in the second image data and determining the detected position information as the position information of the specified target in the first image data comprises:

Inputting the second image data to a trained first neural network, wherein the first neural network locates the specified target and outputs its position information through at least a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation;

Determining position information of the specified target in the first image data according to an output of the first neural network.
The image processing method according to claim 2, wherein the converting the first image data into second image data on which target detection can be performed comprises:

Converting the first image data into the second image data on which target detection can be performed by using at least one image processing method among black level correction, white balance correction, color interpolation, contrast enhancement, and bit width compression.
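
As a rough illustration of three of the operations listed above (black level correction, white balance correction, and bit width compression), the following sketch operates directly on a raw Bayer frame. The RGGB layout, the gain values, the black level, and the function names are hypothetical and are not taken from the disclosure.

```python
def black_level_correct(raw, black_level):
    """Subtract the sensor's black level and clamp at zero."""
    return [[max(v - black_level, 0) for v in row] for row in raw]


def white_balance(raw, r_gain, b_gain):
    """Scale red and blue samples of a Bayer frame.

    An RGGB layout is assumed: red at (even row, even column),
    blue at (odd row, odd column), green elsewhere.
    """
    out = []
    for y, row in enumerate(raw):
        new_row = []
        for x, v in enumerate(row):
            if y % 2 == 0 and x % 2 == 0:
                v = v * r_gain
            elif y % 2 == 1 and x % 2 == 1:
                v = v * b_gain
            new_row.append(v)
        out.append(new_row)
    return out


def compress_bit_width(raw, in_bits, out_bits=8):
    """Linearly rescale samples to a narrower bit width, clamping overflow."""
    in_max = (1 << in_bits) - 1
    out_max = (1 << out_bits) - 1
    return [[min(out_max, round(v * out_max / in_max)) for v in row]
            for row in raw]
```

Chaining the three functions in this order yields 8-bit data that a detector could consume; any of them may be omitted, since the claim requires only at least one of the listed operations.
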
The image processing method according to claim 1, wherein the acquiring, from the collected first image data in the first data format, the position information of the specified target in the first image data comprises:

Inputting the first image data to a trained second neural network, wherein the second neural network converts the first image data into second image data on which target detection can be performed and detects position information of the specified target in the second image data through at least a grayscale layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation;

Determining position information of the specified target in the first image data according to an output of the second neural network.
The image processing method according to any one of claims 1 to 5, wherein the converting the data format of the target data from the first data format to the second data format comprises:

Inputting the target data to a trained third neural network, wherein the third neural network converts the data format of the target data from the first data format to the second data format through at least a convolution layer for performing convolution.
The image processing method according to any one of claims 1 to 5, wherein the converting the data format of the target data from the first data format to the second data format comprises:

Performing image signal processor (ISP) processing on the target data, wherein the ISP processing is used to convert the data format of the target data from the first data format to the second data format, and the ISP processing includes at least color interpolation.
The image processing method according to claim 7, wherein the ISP processing further comprises at least one of the following processing: white balance correction, curve mapping.
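
The ISP-style conversion described in the two claims above (color interpolation, optionally followed by white balance correction and curve mapping) can be sketched as follows. This is an illustration under stated assumptions, not the disclosed ISP: the RGGB layout, the neighbor-averaging interpolation, the gamma curve used as the curve mapping, and all function names are hypothetical choices.

```python
def bayer_channel(y, x):
    """Channel index (0=R, 1=G, 2=B) at (y, x) for an assumed RGGB layout."""
    if y % 2 == 0:
        return 0 if x % 2 == 0 else 1
    return 1 if x % 2 == 0 else 2


def color_interpolate(raw):
    """Fill the missing channels of each pixel by averaging the samples
    of that channel inside the surrounding 3x3 window."""
    h, w = len(raw), len(raw[0])
    rgb = []
    for y in range(h):
        out_row = []
        for x in range(w):
            sums, counts = [0.0, 0.0, 0.0], [0, 0, 0]
            for ny in range(max(y - 1, 0), min(y + 2, h)):
                for nx in range(max(x - 1, 0), min(x + 2, w)):
                    c = bayer_channel(ny, nx)
                    sums[c] += raw[ny][nx]
                    counts[c] += 1
            pixel = [sums[c] / counts[c] if counts[c] else 0.0
                     for c in range(3)]
            # Keep the sample the sensor actually measured at this pixel.
            pixel[bayer_channel(y, x)] = float(raw[y][x])
            out_row.append(pixel)
        rgb.append(out_row)
    return rgb


def white_balance_rgb(rgb, r_gain, b_gain):
    """Scale the red and blue channels by hypothetical gains."""
    return [[[r * r_gain, g, b * b_gain] for r, g, b in row] for row in rgb]


def curve_map(rgb, in_max, gamma=2.2):
    """Map linear values to 8 bits through a gamma curve (one possible curve)."""
    def to_8bit(v):
        normalized = min(max(v / in_max, 0.0), 1.0)
        return round(255 * normalized ** (1.0 / gamma))
    return [[[to_8bit(v) for v in px] for px in row] for row in rgb]


def isp_convert(raw, in_max, r_gain=1.0, b_gain=1.0):
    """Raw Bayer crop in, 8-bit RGB out."""
    balanced = white_balance_rgb(color_interpolate(raw), r_gain, b_gain)
    return curve_map(balanced, in_max)
```

For example, `isp_convert(crop, in_max=4095)` would turn a 12-bit raw crop into 8-bit RGB rows suitable for display; values pushed above `in_max` by the gains are clamped by the curve mapping step.
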
An image processing apparatus, comprising:

A first processing module, configured to obtain position information of a specified target in the first image data from the collected first image data in a first data format;

A second processing module, configured to intercept target data corresponding to the position information from the first image data;

A third processing module, configured to convert a data format of the target data from the first data format to a second data format, wherein the second data format is suitable for displaying and/or transmitting the target data.
The image processing apparatus according to claim 9, wherein the first processing module comprises a first processing unit and a second processing unit;

The first processing unit is configured to convert the first image data into second image data capable of performing target detection;

The second processing unit is configured to detect position information of the specified target in the second image data, and determine the detected position information as position information of the specified target in the first image data.
The image processing apparatus according to claim 10, wherein the second processing unit is specifically configured to:

Input the second image data to a trained first neural network, and determine a result output by the first neural network as the position information of the specified target in the first image data, wherein the first neural network locates and outputs the position information of the specified target through at least a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation.
The image processing apparatus according to claim 10, wherein the first processing unit is specifically configured to convert the first image data into the second image data on which target detection can be performed by using at least one of black level correction, white balance correction, color interpolation, contrast enhancement, and bit width compression.
The image processing apparatus according to claim 9, wherein the first processing module includes a third processing unit;

The third processing unit is configured to: input the first image data to a trained second neural network, wherein the second neural network converts the first image data into second image data on which target detection can be performed and detects position information of the specified target in the second image data through at least a grayscale layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation; and

Determine position information of the specified target in the first image data according to a result output by the second neural network.
The image processing apparatus according to claim 9, wherein the third processing module includes a fourth processing unit;

The fourth processing unit is configured to input the target data to a trained third neural network, wherein the third neural network converts the data format of the target data from the first data format to the second data format through at least a convolution layer for performing convolution.
The image processing apparatus according to claim 9, wherein the third processing module includes a fifth processing unit;

The fifth processing unit is configured to perform ISP processing on the target data, wherein the ISP processing is used to convert the data format of the target data from the first data format to the second data format, and the ISP processing includes at least color interpolation.
An electronic device, comprising a processor and a memory, wherein the memory stores a program that can be called by the processor, and when the processor executes the program, the image processing method according to any one of claims 1-8 is implemented.
A machine-readable storage medium stores a program thereon, and when the program is executed by a processor, the image processing method according to any one of claims 1-8 is implemented.