WO2019228450A1 - Image processing method, apparatus and device, and readable medium - Google Patents

Image processing method, apparatus and device, and readable medium

Info

Publication number
WO2019228450A1
WO2019228450A1 · PCT/CN2019/089249
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
image data
image
position information
Prior art date
Application number
PCT/CN2019/089249
Other languages
English (en)
Chinese (zh)
Inventor
徐跃书
肖飞
范蒙
俞海
Original Assignee
杭州海康威视数字技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司
Publication of WO2019228450A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, apparatus, and device, and a readable medium.
  • target detection technology is to detect and locate specific targets from a single frame of pictures or videos.
  • target detection technology has been widely used in various fields in society, such as: text detection of goods handling in logistics, detection of illegal vehicles in road traffic, detection of passenger flow and statistics of passenger flow in shopping malls and stations, and so on.
  • the target detection algorithm mainly uses low-bit-width images that have already been processed by the ISP. After a target of interest is detected, the corresponding target image is extracted from the image for display or subsequent recognition. In such detection systems, the quality of the finally obtained target image varies widely: some images are of good quality, but in many cases the quality is poor, for example blurred, insufficiently bright, or lacking in contrast.
  • ISP processing loses some of the original image information, and the technical solution of CN104463103A performs its subsequent text processing on image data that has already been processed by the ISP algorithm; at that point the information loss may be severe and cannot be repaired later. Moreover, only text is processed, and text is generally only a very small part of what people pay attention to.
  • that patent performs no subsequent processing to improve the quality of key targets; on the whole, the current solution is limited and cannot comprehensively improve the image quality of detected targets.
  • the present disclosure provides an image processing method, apparatus, and device, and a readable medium, which can improve the image quality of a detection target.
  • a first aspect of the present disclosure provides an image processing method, including: acquiring position information of a specified target in first image data from collected first image data in a first data format; intercepting target data corresponding to the position information from the first image data; and converting the data format of the target data from the first data format to a second data format, where the second data format is suitable for displaying and/or transmitting the target data.
  • the acquiring position information of a specified target in the first image data from the acquired first image data in the first data format includes:
  • converting the first image data into second image data capable of performing target detection; and
  • detecting the position information of the designated target in the second image data, and determining the detected position information as the position information of the designated target in the first image data.
  • the detecting the position information of the designated target in the second image data, and determining the detected position information as the position information of the designated target in the first image data includes:
  • inputting the second image data into a trained first neural network, where the first neural network realizes the positioning and output of the position information of the specified target through at least a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation; and
  • the result output by the first neural network is determined as position information of the specified target in the first image data.
  • the converting the first image data into second image data capable of performing target detection includes: using at least one image processing mode of black level correction, white balance correction, color interpolation, contrast enhancement, and bit width compression to convert the first image data into second image data capable of performing target detection.
  • the acquiring position information of a specified target in the first image data from the acquired first image data in the first data format includes:
  • inputting the first image data into a trained second neural network, where the second neural network converts the first image data into second image data capable of target detection and detects the position information of the specified target in the second image data through at least a grayscale layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a border regression layer for performing coordinate transformation; and
  • the result output by the second neural network is determined as position information of the specified target in the first image data.
  • converting the data format of the target data from the first data format to the second data format includes: inputting the target data into a trained third neural network; the third neural network implements the conversion of the data format of the target data from the first data format to the second data format through at least a convolution layer for performing convolution.
  • the converting a data format of the target data from the first data format to a second data format includes:
  • ISP processing is performed on the target data; wherein the ISP processing is used to convert a data format of the target data from the first data format to a second data format, and the ISP processing includes at least color interpolation.
  • the ISP process further includes at least one of the following processes: white balance correction, curve mapping.
  • a second aspect of the present disclosure provides an image processing apparatus including:
  • a first processing module configured to obtain position information of a specified target in the first image data from the collected first image data in a first data format
  • a second processing module configured to intercept target data corresponding to the position information from the first image data
  • a third processing module is configured to convert a data format of the target data from the first data format to a second data format, where the second data format is suitable for displaying and / or transmitting the target data.
  • the first processing module includes a first processing unit and a second processing unit; the first processing unit is configured to convert the first image data into second image data capable of performing target detection; the second processing unit is configured to detect position information of the designated target in the second image data, and determine the detected position information as the position information of the designated target in the first image data.
  • the second processing unit is specifically configured to: input the second image data into a trained first neural network, and determine the result output by the first neural network as the position information of the specified target in the first image data; the first neural network realizes the positioning and output of the position information of the specified target through at least a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation.
  • the first processing unit is specifically configured to: use at least one image processing mode of black level correction, white balance correction, color interpolation, contrast enhancement, and bit width compression to convert the first image data into second image data capable of performing target detection.
  • the first processing module includes a third processing unit; the third processing unit is configured to input the first image data into a trained second neural network; the second neural network converts the first image data into second image data capable of target detection and detects position information of a specified target in the second image data through at least a grayscale layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a border regression layer for performing coordinate transformation; the result output by the second neural network is determined as the position information of the specified target in the first image data.
  • the third processing module includes a fourth processing unit; the fourth processing unit is configured to input the target data into a trained third neural network; the third neural network achieves the conversion of the data format of the target data from the first data format to the second data format through at least a convolution layer for performing convolution.
  • the third processing module includes a fifth processing unit; the fifth processing unit is configured to perform ISP processing on the target data; the ISP processing is used to convert the data format of the target data from the first data format to the second data format, and includes at least color interpolation.
  • a third aspect of the present disclosure provides an electronic device including a processor and a memory; the memory stores a program that can be called by the processor; when the processor executes the program, the image processing method according to any one of the foregoing embodiments is implemented.
  • a fourth aspect of the present disclosure provides a machine-readable storage medium having a program stored thereon that, when executed by a processor, implements the image processing method according to any one of the foregoing embodiments.
  • the embodiments of the present disclosure have the following beneficial effects:
  • the specified target is detected in the acquired first image data in the first data format to obtain its position information, and the target data corresponding to the obtained position information is then intercepted from that first image data in the first data format. Because the target data is intercepted directly from the first image data, there is no change in image format or loss of quality. The target data is then format-converted into a data format suitable for display and/or transmission. Compared with existing methods that post-process, after detection, an image that has already undergone image processing, this improves the image quality of the detected target.
  • FIG. 1 is a schematic flowchart of an image processing method according to an exemplary embodiment of the present disclosure.
  • FIG. 2 is a structural block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure.
  • FIG. 3 is a structural block diagram of an embodiment of a first processing module provided by the present disclosure.
  • FIG. 4 is a schematic flowchart of an embodiment of converting first image data to second image data provided by the present disclosure.
  • FIG. 5 is a schematic diagram of an embodiment of color interpolation provided by the present disclosure.
  • FIG. 6 is a structural block diagram of an embodiment of a first neural network provided by the present disclosure.
  • FIG. 7 is a structural block diagram of another embodiment of a first neural network provided by the present disclosure.
  • FIG. 8 is a structural block diagram of another embodiment of a first processing module provided by the present disclosure.
  • FIG. 9 is a structural block diagram of an embodiment of a second neural network provided by the present disclosure.
  • FIG. 10 is a structural block diagram of another embodiment of a second neural network provided by the present disclosure.
  • FIG. 11 is a schematic diagram of an embodiment for performing grayscale processing provided by the present disclosure.
  • FIG. 12 is a structural block diagram of an embodiment of an image processing apparatus provided by the present disclosure.
  • FIG. 13 is a structural block diagram of an embodiment of a third neural network provided by the present disclosure.
  • FIG. 14 is a structural block diagram of another embodiment of an image processing apparatus provided by the present disclosure.
  • FIG. 15 is a structural block diagram of an embodiment of an ISP process for converting target data from a first data format to a second data format provided by the present disclosure.
  • FIG. 16 is a structural block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
  • although the terms first, second, third, etc. may be used in this disclosure to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another.
  • for example, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information.
  • the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".
  • ISP (Image Signal Processor) processing: processes the image signals collected by the image sensor of a front-end imaging device, and may include dead pixel correction, black level correction, white balance correction, color interpolation, gamma correction, color correction, sharpening, denoising, and other functions; one or more of these can be selected according to the actual application.
  • Deep learning: a method that uses neural networks to simulate how the human brain analyzes and learns, and establishes corresponding data representations.
  • Neural Network: mainly composed of neurons; it can include convolutional layers (Convolutional Layer) and pooling layers (Pooling Layer).
  • an image processing method according to an embodiment of the present disclosure is shown.
  • the method may include the following steps:
  • S1 acquiring position information of a specified target in the first image data from the collected first image data in a first data format
  • S2 intercept target data corresponding to the position information from the first image data
  • S3 Convert the data format of the target data from the first data format to a second data format, where the second data format is suitable for displaying and / or transmitting the target data.
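  • as an illustration of the S1 to S3 flow, a minimal Python sketch follows; the detector and converter stubs and the 12-bit depth are hypothetical stand-ins assumed for illustration, not APIs or values from this disclosure.

```python
import numpy as np

def detect_target(img_raw: np.ndarray) -> tuple:
    """Hypothetical stand-in for step S1 (a real system would run the
    detection described in steps S101/S102 or the second neural network)."""
    h, w = img_raw.shape[:2]
    return w // 4, 3 * w // 4, h // 4, 3 * h // 4   # x1, x2, y1, y2

def convert_format(target_raw: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for step S3: e.g. compress 12-bit RAW to 8-bit."""
    return (target_raw // 16).astype(np.uint8)

def process(img_raw: np.ndarray) -> np.ndarray:
    x1, x2, y1, y2 = detect_target(img_raw)     # S1: position information
    target_raw = img_raw[y1:y2, x1:x2]          # S2: intercept from RAW frame
    return convert_format(target_raw)           # S3: format conversion
```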
  • the image processing method of FIG. 1 may be applied to an image device.
  • the image device may be a device with an imaging function, such as a camera, or a device capable of performing image post-processing, and the like is not limited.
  • the first image data in the first data format may be image data acquired by the image device itself, or image data acquired from other devices, which is not limited in particular.
  • the image data collected by the image device is first image data
  • the image data format of the first image data is the first data format.
  • the first data format is the original image format collected by the image device.
  • the original image format is the image format generated by an image sensor in the image device after sensing one or more spectral bands, without any image preprocessing. An image in the original image format may include data in one or more spectral bands, for example, a spectral sampling signal with a wavelength range of 380 nm to 780 nm and/or a spectral sampling signal with a wavelength range of 780 nm to 2500 nm.
  • step S1 position information of a specified target in the first image data is acquired from the collected first image data in a first data format.
  • the first image data includes a specified target
  • the specified target is an object that is expected to undergo ISP processing to improve the image quality of the specified target.
  • a specified target may be detected and located on the first image data.
  • the position information of the designated target in the first image data may include: the coordinates of a feature point of the designated target in the first image data together with the size of the designated target's image area; or the start-point and end-point coordinates of the designated target's image area.
  • the exact form is not specifically limited, as long as it can locate the position of the designated target in the first image data.
  • step S2 is executed to intercept target data corresponding to the position information from the first image data.
  • the first image data in step S2 is the collected first image data in the first data format, that is, the original image as acquired by the device, not first image data that has been processed in order to obtain the position information of the target object, so there is no problem of losing image information. That is, the first image data used in steps S1 and S2 come from the same data source: they may be the same frame of first image data, or different frames of first image data collected in the same scene, as long as the specified target does not move or otherwise change between the two frames.
  • the same first image data is selected, and the first image data can be stored in an image device and can be accessed when needed.
  • the image area corresponding to the position information in the first image data is the designated target.
  • Image capture can be performed on the area pointed by the position information in the first image data to obtain target data corresponding to the specified target. Since the target data is intercepted from the first image data, its data format is still the first data format, which is the same as the data format of the first image data.
  • Step S3 is then executed to convert the data format of the target data from the first data format to a second data format, and the second data format is suitable for displaying and / or transmitting the target data.
  • step S3 image processing is performed on the target data in the first data format, and the data format is converted into the second data format.
  • the second data format is a data format suitable for displaying and / or transmitting the target data.
  • Both the first data format and the second data format are image formats.
  • the image processing process may not only perform data format conversion, but may also include other image processing to improve the image quality of the target data.
  • the acquired first image data in the first data format is used to detect the specified target and obtain its position information, and the first image data in the first data format is then used to intercept the target data corresponding to the obtained position information. Because the target data is intercepted from the first image data, there is no change in image format or loss of quality; the target data is then format-converted into a data format suitable for display and/or transmission. Compared with the manner in which an image that has already undergone image processing is post-processed after detection, the image quality of the detected target is improved.
  • Step S1 is a step of acquiring position information.
  • the position information of the specified target can be obtained by detecting the specified target of interest and positioning after detecting the specified target.
  • the types of designated targets are not limited, such as text, characters, vehicles, license plates, buildings, etc. The shape and size are also unlimited.
  • preprocessing can be performed to convert the input first image data in the first data format into data commonly used for target detection before performing the detection, or target detection can be performed directly on the first image data in the first data format and the target position information output; the specific implementation is not limited.
  • the above method flow may be executed by the image processing apparatus 100.
  • the image processing apparatus 100 mainly includes three modules: a first processing module 101, a second processing module 102, and a third processing module 103.
  • the first processing module 101 is configured to perform step S1
  • the second processing module 102 is configured to perform step S2
  • the third processing module 103 is configured to perform step S3.
  • the first processing module 101 detects a target or object of interest from the first image data in the first data format and outputs position information of the detected target; the second processing module 102, based on the position information output by the first processing module 101 and the input first image data in the first data format, obtains from the original first image data in the first data format the target data in the first data format corresponding to the target of interest; the third processing module 103 performs adaptive ISP processing on the target data in the first data format output by the second processing module 102 to obtain higher-quality target data in the second data format.
  • the first processing module 101 includes a first processing unit 1011 and a second processing unit 1012.
  • Step S101 may be performed by the first processing unit 1011
  • step S102 may be performed by the second processing unit 1012.
  • the above step S1 specifically includes the following steps:
  • S101 Convert the first image data into second image data capable of performing target detection;
  • S102 Detect position information of a designated target in the second image data, and determine the detected position information as position information of the designated target in the first image data.
  • the first image data is first converted into second image data that can be used for target detection, so that the second Image data can be used to detect specific targets.
  • the specific conversion method is not limited, as long as the first image data can be converted into the second image data capable of detecting a target.
  • after conversion, the data format of the second image data may no longer be the first data format; if the second image data were used for post-processing to extract the target, image quality could not be guaranteed. Therefore, in this embodiment, the second image data is not used to extract the specified target; it is used only to detect the position information of the specified target.
  • step S102 is performed, and position information of a specified target is detected in the second image data.
  • Target recognition and positioning of the specified target in the second image data can determine the position information of the specified target in the second image data.
  • the positional relationship of the specified target in the first image data and the second image data generally does not change.
  • zooming or panning of the designated target may occur between the first image data and the second image data, but such zooming and panning are determinable during processing, so from the position information of the designated target in the second image data, its position information in the first image data can be known, and the detected position information is determined as the position information of the designated target in the first image data.
  • a manner of converting the first image data into second image data capable of performing target detection may include performing color interpolation processing on at least the first image data.
  • at least one of the following processes may also be performed: black level correction, white balance correction, contrast enhancement, and bit width compression, of course, it is not specifically limited to this.
  • the first processing unit 1011 may implement step S101 by performing steps S1011 to S1015.
  • steps S1011 to S1015 are specifically: S1011 black level correction, S1012 white balance correction, S1013 color interpolation, S1014 contrast enhancement, and S1015 bit-width compression.
  • the method of converting the first image data to the second image data is not limited to the above steps S1011 to S1015, and the processing order is not limited.
  • for example, the first image data can be converted into the second image data by the color interpolation process alone, as long as the obtained second image data can be used for target detection.
  • in step S1011, assume the first image data in the first data format is recorded as imgR. Black level correction removes the influence of the black level from the first image data in the first data format and outputs imgR_blc:
  • imgR_blc = imgR - V_blc
  • where V_blc is the black level value; the "-" here denotes removal of the black level rather than a plain mathematical subtraction.
  • step S1012 the white balance correction is to remove the image color cast due to the influence of ambient light in the image imaging to restore the original color information of the image.
  • in step S1012, the corresponding R1 and B1 components can be adjusted by two coefficients R_gain and B_gain:
  • R1' = R1 × R_gain, B1' = B1 × B_gain
  • where R1 and B1 are the color components of the red and blue channels of the image data after black level correction, and R1' and B1' are the color components of the red and blue channels of the output image of the white balance correction module.
  • the output image is recorded as imgR_wb.
  • the data targeted for color interpolation is data after white balance correction processing.
  • the color interpolation can be implemented by the nearest-neighbor interpolation method, expanding the single-channel first image data in the first data format into multi-channel data.
  • for first image data in the Bayer first data format, the missing color pixels are directly filled with the nearest color pixels, so that each pixel contains all three RGB color components.
  • the specific interpolation process is shown in FIG. 5: the value of R11 is filled into its neighboring pixels as their red component, and which specific neighboring pixels are filled can be set. The same applies to the other color pixels, which will not be repeated here.
  • the interpolated image is recorded as imgC.
  • the data for contrast enhancement is data after color interpolation.
  • Contrast enhancement is to enhance the contrast of the image after interpolation.
  • a Gamma curve can be used for the mapping. Assuming the mapping function of the Gamma curve is f(), the enhanced image is recorded as imgC_gm:
  • imgC_gm(i, j) = f(imgC(i, j))
  • bit-width compression compresses the high-bit-width data imgC_gm obtained after contrast enhancement. Linear compression is used directly, and the compressed image is recorded as imgC_lb:
  • imgC_lb(i, j) = imgC_gm(i, j) / M
  • where M is the compression ratio corresponding to the compression from the first data format to the second data format.
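  • taken together, steps S1011 to S1015 can be sketched in NumPy as below; the black level, gains, gamma, compression ratio, and the even-sized 12-bit RGGB input are illustrative assumptions, not values from this disclosure.

```python
import numpy as np

def preprocess(imgR: np.ndarray) -> np.ndarray:
    """Sketch of S1011-S1015 for a 12-bit RGGB Bayer mosaic with even
    height/width. All constants are illustrative assumptions."""
    # S1011 black level correction: imgR_blc = imgR - V_blc
    V_blc = 64
    img = np.clip(imgR.astype(np.float32) - V_blc, 0.0, None)
    # S1012 white balance correction: R1' = R1 * R_gain, B1' = B1 * B_gain
    R_gain, B_gain = 1.9, 1.6
    img[0::2, 0::2] *= R_gain            # R sites of the RGGB mosaic
    img[1::2, 1::2] *= B_gain            # B sites
    # S1013 nearest-neighbour color interpolation: copy each 2x2 cell's
    # R, G, B samples to all four pixels of the cell (single -> 3 channels)
    h, w = img.shape
    rgb = np.empty((h, w, 3), np.float32)
    rgb[..., 0] = np.repeat(np.repeat(img[0::2, 0::2], 2, 0), 2, 1)
    rgb[..., 1] = np.repeat(np.repeat(img[0::2, 1::2], 2, 0), 2, 1)
    rgb[..., 2] = np.repeat(np.repeat(img[1::2, 1::2], 2, 0), 2, 1)
    # S1014 contrast enhancement via a Gamma curve: imgC_gm = f(imgC)
    rgb = (np.clip(rgb, 0, 4095) / 4095.0) ** (1 / 2.2) * 4095.0
    # S1015 bit-width compression: imgC_lb = imgC_gm / M (12-bit -> 8-bit)
    M = 16
    return (rgb / M).astype(np.uint8)
```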
  • the second processing unit 1012 may implement step S102 by performing steps S1021 to S1022.
  • S1021 input the second image data to a trained first neural network; the first neural network is used to achieve positioning through at least a convolution layer, a pooling layer, a fully connected layer, and a frame regression layer;
  • S1022 Determine a result output by the first neural network as position information of the specified target in the first image data.
  • step S1021 the first neural network is a trained network, and inputting the second image data into the first neural network can realize the positioning of the specified target in the second image data and obtain the position information of the specified target accordingly.
  • the first neural network may be integrated in the second processing unit 1012 as a part of the first processing module 101, or may be provided outside the first processing module 101, and may be scheduled by the second processing unit 1012.
  • the first neural network 200 may include at least one convolution layer 201 for performing convolution, at least one pooling layer 202 for performing downsampling, at least one fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation.
  • as shown in FIG. 7, the first neural network 200 may include a convolution layer 205, a convolution layer 206, a pooling layer 207, ..., a convolution layer 208, a pooling layer 209, a fully connected layer 210, and a frame regression layer 211.
  • the second image data is input to the first neural network 200, and the first neural network 200 outputs position information, which is used as position information of the specified target in the first image data.
  • the functions performed by each layer of the first neural network have been described above, and each layer may have adaptive changes.
  • the convolution kernels of different convolution layers may be different, which will not be described again here.
  • the first neural network shown in FIG. 7 is only an example, and is not specifically limited thereto.
  • a convolution layer, a pooling layer, and / or other layers may be reduced or increased.
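  • for concreteness, a first-neural-network-style localizer can be sketched in PyTorch as below; the depths, channel widths, and 64 × 64 input size are illustrative assumptions, and only the layer types (convolution, pooling, fully connected, frame regression) follow the description above.

```python
import torch
import torch.nn as nn

class FirstNet(nn.Module):
    """Localizer in the spirit of FIG. 6/7: convolution, pooling,
    fully connected feature synthesis, and a frame-regression head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),   # convolution layers
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling (downsampling)
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(nn.Flatten(),            # feature synthesis
                                nn.Linear(64 * 16 * 16, 256), nn.ReLU())
        self.bbox = nn.Linear(256, 4)                    # frame regression head

    def forward(self, x):                                # x: (N, 3, 64, 64)
        return self.bbox(self.fc(self.features(x)))     # box coordinates

boxes = FirstNet()(torch.rand(1, 3, 64, 64))             # shape (1, 4)
```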
  • the convolution layer (Conv) performs a convolution operation and can also carry an activation function ReLU that activates the convolution result, so the operation of a convolution layer can be expressed by the following formula:
  • YC_i(I) = g(W_i * YC_{i-1}(I) + B_i)
  • where YC_i(I) is the output of the i-th convolution layer, YC_{i-1}(I) is the input of the i-th convolution layer, W_i and B_i are the weight and bias coefficients of the i-th convolution layer, and g() represents the activation function.
  • when the activation function is ReLU:
  • g(x) = max(0, x)
  • where x is the input to the activation function.
  • the pooling layer (Pool) is a special kind of downsampling layer: the feature map obtained by the convolution is reduced with a reduction window of size N × N, and the maximum value within each window is taken as the value of the corresponding point in the output image.
  • the specific formula is as follows:
  • YP_j(I) = maxpool(YP_{j-1}(I))
  • where YP_{j-1}(I) is the input of the j-th pooling layer and YP_j(I) is the output of the j-th pooling layer.
  • the fully connected layer can be regarded as a convolution layer with a filter window of 1 ⁇ 1.
  • Each node of the fully connected layer is connected to all the nodes in the previous layer, which is used to integrate the features extracted before.
  • W_ij and B_ij are the connection weight coefficients and bias coefficients of the fully connected layer, g() represents the activation function, and I is (i, j); since the fully connected layer can be regarded as a 1 × 1 convolution, its output takes the same g(W · input + B) form as the convolution layer.
  • the border regression layer finds a mapping such that the window P output by the fully connected layer is mapped to a window G' closer to the real window G; the regression is generally implemented as a coordinate transformation of window P, such as a translation transformation and/or a scaling transformation. Assuming the coordinates of the window P output by the fully connected layer are (x1, x2, y1, y2), the coordinates of the transformed window are (x3, x4, y3, y4).
  • if the transformation is a translation with translation scale (Δx, Δy), the coordinate relationship before and after the translation is:
  • x3 = x1 + Δx, x4 = x2 + Δx, y3 = y1 + Δy, y4 = y2 + Δy
  • if the transformation is a scaling transformation with scales dx and dy in the X and Y directions respectively, the coordinate relationship before and after the transformation is:
  • x3 = dx · x1, x4 = dx · x2, y3 = dy · y1, y4 = dy · y2
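  • a minimal sketch of these two coordinate transformations, under the box layout (x1, x2, y1, y2) defined above:

```python
def transform_box(box, dx=0.0, dy=0.0, sx=1.0, sy=1.0):
    """Translate box (x1, x2, y1, y2) by (dx, dy), then scale the
    coordinates by (sx, sy) about the origin."""
    x1, x2, y1, y2 = box
    x1, x2 = x1 + dx, x2 + dx        # translation in X
    y1, y2 = y1 + dy, y2 + dy        # translation in Y
    return (sx * x1, sx * x2, sy * y1, sy * y2)

print(transform_box((10, 50, 20, 60), dx=5, dy=-5, sx=1.2, sy=1.2))
```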
  • step S1022 the position information of the specified target in the first image data is determined according to a result output by the first neural network, and the output result of the first neural network may be directly used as the position information of the specified target in the first image data. Or, the output result may also be converted by using the position change relationship of the specified target in the first image data and the second image data to obtain position information of the specified target in the first image data.
  • the first neural network can be obtained by collecting second image data samples and corresponding position information samples as the training sample set, taking the second image data samples as the input and the corresponding position information samples as the output of the training model for training.
  • the second image data sample can be processed by an image processing method that can identify the detection target to obtain a corresponding position information sample.
  • the first processing module 101 includes a third processing unit 1013, and step S111 and step S112 may be performed by the third processing unit 1013 to implement the foregoing step S1.
  • Step S111 and step S112 are specifically:
  • S111 Input the first image data into a trained second neural network; the second neural network converts the first image data into second image data capable of target detection and detects position information of a specified target in the second image data through at least a grayscale layer, a convolution layer, a pooling layer, a fully connected layer, and a frame regression layer;
  • S112 Determine a result output by the second neural network as position information of the specified target in the first image data.
  • the second neural network may be integrated in the third processing unit 1013 as a part of the first processing module 101, or may be provided outside the first processing module 101, and may be scheduled by the third processing unit 1013.
  • the second neural network 300 includes at least one grayscale layer 301 for performing grayscale processing, one convolution layer 302 for performing convolution, one pooling layer for performing downsampling, one fully connected layer for performing feature synthesis, and one border regression layer for performing coordinate transformation.
  • the second neural network can be used to convert the first image data into second image data capable of target detection and detect position information of a specified target in the second image data without performing other ISP processing.
  • certain information processing can be performed on the basis of the second neural network processing, which is not limited in particular.
  • the second neural network 300 may include a grayscale layer 306, a convolutional layer 307, a convolutional layer 308, a pooling layer 309, ..., a convolutional layer 310, a pooling layer 311, a fully connected layer 312, and a border regression layer 313.
  • the first image data is input to the second neural network; the layers of the second neural network process the first image data in turn and output position information, which is used as the position information of the specified target in the first image data.
  • the function performed by each layer of the second neural network is the same as that of the corresponding layer in the first neural network, which has been described above.
  • Each layer can have adaptive changes.
  • the convolution kernels of different convolution layers may be different, which will not be repeated here. It can be understood that the second neural network 300 shown in FIG. 10 is only an example, and is not specifically limited thereto.
  • the convolutional layer, and / or, the pooling layer, and / or other layers may be reduced or increased.
  • the gray layer in the second neural network converts the multi-channel first-data-format information into single-channel gray information, which can be achieved by weighting the components representing different colors around the current pixel.
  • as shown in FIG. 11, through the processing of the graying layer, the components of the different colors R, G, and B are weighted and converted into single-channel gray information Y.
  • the calculation takes the form Y = w_R · R + w_G · G + w_B · B, where w_R, w_G, and w_B are the weighting coefficients of the three color components.
  • the functions performed by the convolutional layer, pooling layer, fully connected layer, and border regression layer in the second neural network can be the same as the corresponding layers in the first neural network.
  • each layer can have adaptive changes; for example, the convolution kernels of different convolution layers may be different, which will not be repeated here.
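  • the graying layer can be sketched as a fixed-weight 1 × 1 convolution, as below; the BT.601 luminance weights used here are a common choice assumed for illustration, since the disclosure leaves the weighting coefficients open.

```python
import torch
import torch.nn as nn

# Graying layer as a fixed-weight 1x1 convolution: Y = wR*R + wG*G + wB*B.
gray = nn.Conv2d(3, 1, kernel_size=1, bias=False)
with torch.no_grad():
    # BT.601 luminance weights, assumed for illustration.
    gray.weight[:] = torch.tensor([0.299, 0.587, 0.114]).view(1, 3, 1, 1)

y = gray(torch.rand(1, 3, 8, 8))   # single-channel gray information Y
```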
  • the second neural network can be obtained by collecting first image data samples and corresponding position information samples as the training sample set, taking the first image data samples as the input and the corresponding position information samples as the output of the training model for training.
  • regarding the acquisition of the first image data samples and the corresponding position information samples, a first image data sample may first be subjected to image processing that makes the target detectable, and the target may then be detected by an image processing method that can identify the detection target to obtain the corresponding position information sample.
  • in step S2, target data in the first data format can be intercepted at the corresponding position in the originally input first image data in the first data format according to the position information of the specified target obtained in step S1, and the intercepted data is used as the target data corresponding to the specified target.
  • the position information of the specified target obtained in step S1 in the first image data is [x1, x2, y1, y2], where x1, y1 are starting position information, and x2, y2 are ending position information.
  • the target data imgT in the first data format of the specified target is then:
  • imgT = imgR(x1:x2, y1:y2)
  • step S3 the target data in the first data format corresponding to the specified target obtained in step S2 is processed to convert the target data of the specified target from the first data format to the second data format.
  • Step S3 is actually image processing for small target data, which can be implemented by ISP processing implemented by a non-neural network, or by a neural network.
  • the third processing module 103 includes a fourth processing unit 1031, and the fourth processing unit 1031 may perform the following steps to implement the above step S3.
  • inputting the target data in the first data format into a trained third neural network, which implements, through at least a convolution layer, the conversion of the data format of the target data from the first data format to the second data format.
  • the third neural network may be integrated in the fourth processing unit 1031 as a part of the third processing module 103, or may be provided outside the third processing module 103, and may be scheduled by the fourth processing unit 1031.
  • the third neural network may include at least one convolution layer for performing a convolution to convert a data format of the target data from the first data format to a second data format.
  • the layer structure of the third neural network is not limited to this.
  • it may include at least one ReLu layer for performing activation, or may include other layers. The number of specific layers is not limited.
  • Image processing is implemented based on the third neural network, which reduces the error propagation that may be caused by traditional image processing in each processing step.
  • each layer of the third neural network is described in detail below, but it should not be limited to this.
  • FC_{i+1} = g(w_ik * FC_i + b_ik)
  • where w_ik and b_ik are the parameters of the k-th convolution kernel in the current convolution layer, and g(x) is a linear weighting function, that is, the convolution output of each convolution layer is linearly weighted.
  • the convolutional layer of the third neural network and the convolutional layer of the first neural network both perform convolution operations, and therefore have similar functions.
  • the third neural network 400 may include a convolutional layer 401, a convolutional layer 402, a ReLu layer 403, a convolutional layer 404, and a convolutional layer 405 which are sequentially connected.
  • the input of the third neural network 400 is the target data in the first data format
  • the output is the target data in the second data format.
  • the functions performed by each layer of the third neural network are the same as the corresponding layers of the first neural network, which have been described above.
  • Each layer may have adaptive changes.
  • the convolution kernels of different convolution layers may be different, and will not be repeated here.
  • the third neural network shown in FIG. 13 is only an example, and is not specifically limited thereto.
  • the convolutional layer, and / or, the pooling layer, and / or other layers may be reduced or increased.
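  • following the FIG. 13 layout (convolution layers 401 and 402, ReLu layer 403, convolution layers 404 and 405), a hedged PyTorch sketch is given below; the channel widths and the single-channel input are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Layer order follows FIG. 13; channel widths are illustrative assumptions.
third_net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1),    # convolution layer 401
    nn.Conv2d(32, 32, 3, padding=1),   # convolution layer 402
    nn.ReLU(),                         # ReLu layer 403
    nn.Conv2d(32, 32, 3, padding=1),   # convolution layer 404
    nn.Conv2d(32, 3, 3, padding=1),    # convolution layer 405 -> RGB output
)

out = third_net(torch.rand(1, 1, 64, 64))   # first -> second data format
```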
  • for the training of the third neural network, in order to optimize the deep neural network in advance, a large number of target data samples in the first data format and corresponding ideal target data samples in the second data format can be used to form training samples.
  • during training, the network parameters are continuously updated until, when target data in the first data format is input, target data in the ideal second data format is output; at this point, the network parameters are exported for actual testing and use by the third neural network.
  • the training process for training the third neural network may include the following steps:
  • S311 Collect training samples: collect first-data-format information corresponding to the target of interest and the corresponding ideal second-data-format information. Assume that n training sample pairs {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} have been obtained, where x_i represents the input first-data-format information and y_i represents the corresponding ideal second-data-format information.
  • S312 design the structure of the third neural network; the network structure used in network training and the network structure used in testing are the same network structure;
  • S313 Initialize training parameters: initialize the network parameters of the third neural network structure, which can be random-value initialization, fixed-value initialization, etc.; set training-related parameters, such as the learning rate and the number of iterations;
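  • a hedged sketch of this training procedure (S311 to S313 plus the iterative update the text implies) follows; the dummy data, loss choice, and optimizer are illustrative assumptions, not choices specified by the disclosure.

```python
import torch
import torch.nn as nn

# S311: training pairs (x_i, y_i); dummy tensors stand in for RAW targets
# and their ideal second-data-format counterparts.
x = torch.rand(8, 1, 64, 64)
y = torch.rand(8, 3, 64, 64)

# S312: the same network structure is used for training and testing.
net = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 3, 3, padding=1))

# S313: initialize parameters (PyTorch defaults here) and training settings.
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(100):                 # iterate until the output is close
    loss = nn.functional.l1_loss(net(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```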
  • the training process of the third neural network is not limited to this; other training methods can also be used, as long as the trained third neural network can, when target data in the first data format is input, output the corresponding target data in the second data format.
  • the third processing module 103 includes a fifth processing unit 1032.
  • the fifth processing unit 1032 may perform ISP processing on the target data to convert the data format of the target data from the first data format to the second data format; the ISP processing includes at least color interpolation, thereby implementing step S3 described above.
  • the ISP processing further includes at least one of the following processes: white balance correction and curve mapping, which can further improve image quality.
  • Using only the target data in the first data format to implement the calculation of the parameters in the ISP processing can improve the accuracy of the processing parameters, thereby improving the image quality after the target data is processed.
  • the ISP processing may include the following steps in order:
  • S301 white balance correction, taking the target data in the first data format as input; S302 color interpolation; and S303 curve mapping.
  • the ISP processing for converting the target data from the first data format to the second data format is not limited to this, for example, only color interpolation may be performed, or other ISP processing methods may be included.
  • the ISP processes such as white balance correction, color interpolation, and curve mapping are described in more detail below, but it should not be limited to this.
  • White balance correction is to remove the image color cast due to the influence of ambient light to restore the original color information of the image.
  • two coefficients R_gain and B_gain are used to adjust the corresponding R and B components:
  • R2' = R2 × R_gain
  • B2' = B2 × B_gain
  • where R2 and B2 are the color components of the red and blue channels of the white balance correction input image, and R2' and B2' are the color components of the red and blue channels of the white balance correction output image.
  • when calculating R_gain and B_gain, only the R, G, and B channel color components of the target of interest need to be counted and used in the calculation.
  • color interpolation refers to expanding the target data in the first data format after white balance correction from a single-channel format into a multi-channel data format in which each channel represents one color component; it can be implemented using the nearest-neighbor interpolation method to expand the single-channel target data in the first data format into multi-channel target data.
  • the nearest color pixels can be directly used to fill the missing pixels of the corresponding color, so that each pixel contains three RGB color components.
  • the specific interpolation process corresponds to the embodiment described above with reference to FIG. 5; it may be the same or similar, and is not repeated here.
  • Curve mapping refers to adjusting the brightness and contrast of image data according to the visual characteristics of the human eye.
  • Gamma curves with different parameters are commonly used for the mapping. Assuming the mapping function of the Gamma curve is g(), the image before mapping is recorded as img and the mapped image as img_gm; then:
  • img_gm(i, j) = g(img(i, j))
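  • a NumPy sketch of this target-only ISP path follows; computing the gains from the target crop's own channel means is a gray-world-style assumption used here for illustration, not a method mandated by the disclosure.

```python
import numpy as np

def isp_target(rgb: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """White balance gains computed from the target crop only (gray-world
    assumption), followed by Gamma curve mapping img_gm = g(img)."""
    r, g, b = (rgb[..., c].mean() for c in range(3))
    R_gain, B_gain = g / r, g / b        # statistics of the target only
    out = rgb.astype(np.float32)
    out[..., 0] *= R_gain                # R2' = R2 * R_gain
    out[..., 2] *= B_gain                # B2' = B2 * B_gain
    out = np.clip(out / max(out.max(), 1e-6), 0, 1) ** (1 / gamma)
    return (out * 255).astype(np.uint8)  # curve-mapped, displayable target
```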
  • the embodiment of the present disclosure uses the acquired first image data in the first data format to detect the specified target and obtain its position information, and then uses the first image data in the first data format to intercept the target data corresponding to the obtained position information. Because the target data is intercepted from the first image data, there is no change in image format or loss of quality; the target data is then converted into a data format suitable for display and/or transmission. Compared with post-processing, after detection, an image that has already undergone image processing, the image quality of the detected target is improved.
  • an image processing apparatus 100 may include:
  • a first processing module 101 configured to obtain position information of a specified target in the first image data from the collected first image data in a first data format
  • a second processing module 102 configured to intercept target data corresponding to the position information from the first image data
  • a third processing module 103 is configured to convert a data format of the target data from the first data format to a second data format, where the second data format is suitable for displaying and / or transmitting the target data.
  • the image processing apparatus 100 may be applied to an image device.
  • the image device may be a device with an imaging function, such as a video camera, or a device capable of performing image post-processing, and the like is not limited.
  • the first image data in the first data format may be image data acquired by the image device itself, or image data acquired from other devices, which is not limited in particular.
  • the first processing module 101 includes a first processing unit 1011 and a second processing unit 1012.
  • the first processing unit 1011 is configured to convert the first image data into second image data capable of performing target detection.
  • the second processing unit 1012 is configured to detect position information of the designated target in the second image data, and determine the detected position information as a position of the designated target in the first image data. information.
  • the second processing unit 1012 is specifically configured to: input the second image data into a trained first neural network, and determine the result output by the first neural network as the position information of the specified target in the first image data.
  • the first neural network includes at least a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation. In order to realize the positioning and output of the position information of the specified target.
  • the first processing unit 1011 is specifically configured to: use at least one of black level correction, white balance correction, color interpolation, contrast enhancement, and bit width compression to convert the first image data It is converted into the second image data capable of performing target detection.
  • the first processing module 101 includes a third processing unit 1013 for inputting the first image data to a trained second neural network.
  • the second neural network includes at least a grayscale layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation, so as to convert the first image data into second image data capable of target detection and detect position information of a specified target in the second image data. In this way, the position information of the specified target in the first image data may be determined according to the result output by the second neural network.
  • the third processing module 103 includes a fourth processing unit 1031 for inputting the target data to a trained third neural network.
  • the third neural network includes at least a convolution layer for performing convolution to convert the target data from the first data format to a second data format.
  • the third processing module 103 includes a fifth processing unit 1032 for performing ISP processing on the target data.
  • the ISP processing is used to convert the target data from the first data format to the second data format, and the ISP processing includes at least color interpolation.
  • the relevant part may refer to the description of the method embodiment.
  • the device embodiments described above are only schematic, and the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units.
  • the present disclosure also provides an electronic device including a processor and a memory; the memory stores a program that can be called by the processor; when the processor executes the program, the image processing method according to any one of the foregoing embodiments is implemented.
  • embodiments of the image processing apparatus of the present disclosure can be applied to electronic devices. Taking a software implementation as an example, the apparatus, as a device in a logical sense, is formed by the processor of the electronic device where it is located reading the corresponding computer program instructions from non-volatile memory into memory.
  • FIG. 16 is a hardware structural diagram of an electronic device in which the image processing apparatus 100 is located according to an exemplary embodiment of the present disclosure. In addition to the processor 510 and the memory shown in FIG. 16, the electronic device in which the apparatus 100 is located may generally include other hardware according to the actual function of the electronic device, and details are not described herein again.
  • the present disclosure also provides a machine-readable storage medium having a program stored thereon, which when executed by a processor, causes an image device to implement the image processing method according to any one of the foregoing embodiments.
  • the present disclosure may take the form of a computer program product implemented on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing program code therein.
  • Machine-readable storage media includes permanent and non-permanent, removable and non-removable media, and information can be stored by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of a program, or other data.
  • examples of machine-readable storage media include, but are not limited to: phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by computing devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention relates to an image processing method, apparatus, and device, and a storage medium. The image processing method includes: acquiring, from collected first image data in a first data format, position information of a specified target in the first image data; intercepting, from the first image data, target data corresponding to the position information; and converting the data format of the target data from the first data format to a second data format, the second data format being suitable for displaying and/or transmitting the target data. The present invention can improve the image quality of a detection target.
PCT/CN2019/089249 2018-05-31 2019-05-30 Image processing method, apparatus and device, and readable medium WO2019228450A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810571964.XA CN110555877B (zh) 2018-05-31 2018-05-31 一种图像处理方法、装置及设备、可读介质
CN201810571964.X 2018-05-31

Publications (1)

Publication Number Publication Date
WO2019228450A1 true WO2019228450A1 (fr) 2019-12-05

Family

ID=68698712

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089249 WO2019228450A1 (fr) Image processing method, apparatus and device, and readable medium

Country Status (2)

Country Link
CN (1) CN110555877B (fr)
WO (1) WO2019228450A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113077516A (zh) * 2021-04-28 2021-07-06 深圳市人工智能与机器人研究院 一种位姿确定方法及相关设备

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111110272B (zh) * 2019-12-31 2022-12-23 深圳开立生物医疗科技股份有限公司 超声图像测量信息显示方法、装置、设备及可读存储介质
RU2764395C1 (ru) 2020-11-23 2022-01-17 Самсунг Электроникс Ко., Лтд. Способ и устройство для совместного выполнения дебайеризации и устранения шумов изображения с помощью нейронной сети

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170083752A1 (en) * 2015-09-18 2017-03-23 Yahoo! Inc. Face detection
CN107886074A (zh) * 2017-11-13 2018-04-06 苏州科达科技股份有限公司 一种人脸检测方法以及人脸检测系统
CN108009524A (zh) * 2017-12-25 2018-05-08 西北工业大学 一种基于全卷积网络的车道线检测方法

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103873781B (zh) * 2014-03-27 2017-03-29 成都动力视讯科技股份有限公司 一种宽动态摄像机实现方法及装置
US9881234B2 (en) * 2015-11-25 2018-01-30 Baidu Usa Llc. Systems and methods for end-to-end object detection
CN106529446A (zh) * 2016-10-27 2017-03-22 桂林电子科技大学 基于多分块深层卷积神经网络的车型识别方法和系统
CN107301383B (zh) * 2017-06-07 2020-11-24 华南理工大学 一种基于Fast R-CNN的路面交通标志识别方法
CN107895378A (zh) * 2017-10-12 2018-04-10 西安天和防务技术股份有限公司 目标检测方法和装置、存储介质、电子设备
CN107808139B (zh) * 2017-11-01 2021-08-06 电子科技大学 一种基于深度学习的实时监控威胁分析方法及系统
CN107871126A (zh) * 2017-11-22 2018-04-03 西安翔迅科技有限责任公司 基于深层神经网络的车型识别方法和系统

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170083752A1 (en) * 2015-09-18 2017-03-23 Yahoo! Inc. Face detection
CN107886074A (zh) * 2017-11-13 2018-04-06 苏州科达科技股份有限公司 一种人脸检测方法以及人脸检测系统
CN108009524A (zh) * 2017-12-25 2018-05-08 西北工业大学 一种基于全卷积网络的车道线检测方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113077516A (zh) * 2021-04-28 2021-07-06 深圳市人工智能与机器人研究院 一种位姿确定方法及相关设备
CN113077516B (zh) * 2021-04-28 2024-02-23 深圳市人工智能与机器人研究院 一种位姿确定方法及相关设备

Also Published As

Publication number Publication date
CN110555877A (zh) 2019-12-10
CN110555877B (zh) 2022-05-31

Similar Documents

Publication Publication Date Title
US11882357B2 (en) Image display method and device
CN110738697B (zh) 基于深度学习的单目深度估计方法
WO2021164234A1 (fr) Procédé de traitement d'image et dispositif de traitement d'image
US20200234414A1 (en) Systems and methods for transforming raw sensor data captured in low-light conditions to well-exposed images using neural network architectures
WO2019228450A1 (fr) Procédé, dispositif et équipement de traitement d'image et support lisible
CN112966635B (zh) 面向低分辨率时序遥感影像的运动舰船检测方法及装置
US20240007600A1 (en) Spatially Varying Reduction of Haze in Images
US11170470B1 (en) Content-adaptive non-uniform image downsampling using predictive auxiliary convolutional neural network
CN112037129A (zh) 图像超分辨率重建方法、装置、设备及存储介质
US20220398698A1 (en) Image processing model generation method, processing method, storage medium, and terminal
CN109272014B (zh) 一种基于畸变适应卷积神经网络的图像分类方法
US20230127009A1 (en) Joint objects image signal processing in temporal domain
CN111582074A (zh) 一种基于场景深度信息感知的监控视频树叶遮挡检测方法
CN111784624A (zh) 目标检测方法、装置、设备及计算机可读存储介质
CN113409355A (zh) 一种基于fpga的运动目标识别系统及方法
CN110647813A (zh) 一种基于无人机航拍的人脸实时检测识别方法
CN112241982A (zh) 一种图像处理方法、装置及机器可读存储介质
CN117456376A (zh) 一种基于深度学习的遥感卫星影像目标检测方法
CN112288031A (zh) 交通信号灯检测方法、装置、电子设备和存储介质
CN117409244A (zh) 一种SCKConv多尺度特征融合增强的低照度小目标检测方法
CN116403200A (zh) 基于硬件加速的车牌实时识别系统
CN111815529B (zh) 一种基于模型融合和数据增强的低质图像分类增强方法
CN114463379A (zh) 一种视频关键点的动态捕捉方法及装置
CN112241670B (zh) 图像处理方法及装置
CN112241936B (zh) 图像处理方法、装置及设备、存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19810790

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19810790

Country of ref document: EP

Kind code of ref document: A1
