CN115100500A - Target detection method and device and readable storage medium - Google Patents


Info

Publication number
CN115100500A
Authority
CN
China
Prior art keywords
image
target detection
network
parameters
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210684927.6A
Other languages
Chinese (zh)
Inventor
谢旭
凌明
杨作兴
杨敏
艾国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen MicroBT Electronics Technology Co Ltd
Original Assignee
Shenzhen MicroBT Electronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen MicroBT Electronics Technology Co Ltd filed Critical Shenzhen MicroBT Electronics Technology Co Ltd
Priority to CN202210684927.6A priority Critical patent/CN115100500A/en
Publication of CN115100500A publication Critical patent/CN115100500A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/73Deblurring; Sharpening
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/32Normalisation of the pattern dimensions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/36Applying a local operator, i.e. means to operate on image points situated in the vicinity of a given point; Non-linear local filtering operations, e.g. median filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Nonlinear Science (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the invention provides a target detection method, a target detection device and a readable storage medium. The method comprises the following steps: inputting an image to be processed into a parameter prediction network, and outputting tuning parameters through the parameter prediction network, wherein the tuning parameters comprise at least one of defogging parameters, white balance parameters, contrast parameters, hue parameters, sharpening parameters and correction parameters; performing image enhancement processing on the image to be processed according to the tuning parameters output by the parameter prediction network to obtain an optimized image; inputting the optimized image corresponding to the image to be processed into a target detection network for target detection, and outputting a target detection result through the target detection network, wherein the target detection network and the parameter prediction network are neural networks obtained by joint training in advance by using training data, and the training data comprises images meeting preset conditions. The embodiment of the invention can reduce the operation cost of the user and improve the accuracy of target detection.

Description

Target detection method and device and readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a target detection method, an apparatus, and a readable storage medium.
Background
Artificial Intelligence (AI) is a technology for studying and developing theories, methods, and application systems for simulating, extending, and expanding human intelligence. Computer Vision (CV) is a branch of artificial intelligence that attempts to build systems capable of obtaining information from images or multidimensional data.
Object detection is an important application of computer vision, such as detecting faces, vehicles or buildings in images. For high-quality images, an object detection model can usually detect the objects in them accurately. However, for low-quality images, such as images shot in severe weather or dim light, it is difficult to detect the targets accurately, which greatly reduces the accuracy of target detection. Under conditions such as severe weather or dim light, camera parameters must be adjusted manually to improve the quality of the captured image and thereby the accuracy of target detection, but this approach cannot adapt to different shooting scenes and places high professional demands on the photographer.
Disclosure of Invention
The embodiment of the invention provides a target detection method, a target detection device and a readable storage medium, which can reduce the operation cost of a user and improve the accuracy of target detection.
In a first aspect, an embodiment of the present invention discloses a target detection method, where the method includes:
inputting an image to be processed into a parameter prediction network, and outputting tuning parameters through the parameter prediction network, wherein the tuning parameters comprise at least one of defogging parameters, white balance parameters, contrast parameters, hue parameters, sharpening parameters and correction parameters;
performing image enhancement processing on the image to be processed according to the tuning parameters output by the parameter prediction network to obtain an optimized image corresponding to the image to be processed;
inputting the optimized image corresponding to the image to be processed into a target detection network for target detection, and outputting a target detection result through the target detection network, wherein the target detection network and the parameter prediction network are neural networks obtained by joint training in advance by using training data, and the training data comprises images meeting preset conditions.
In a second aspect, an embodiment of the present invention discloses an apparatus for detecting a target, where the apparatus includes:
the parameter prediction module is used for inputting the image to be processed into a parameter prediction network and outputting tuning parameters through the parameter prediction network, wherein the tuning parameters comprise at least one of defogging parameters, white balance parameters, contrast parameters, hue parameters, sharpening parameters and correction parameters;
the image enhancement module is used for carrying out image enhancement processing on the image to be processed according to the tuning parameters output by the parameter prediction network to obtain an optimized image corresponding to the image to be processed;
and the target detection module is used for inputting the optimized image corresponding to the image to be processed into a target detection network for target detection, and outputting a target detection result through the target detection network, wherein the target detection network and the parameter prediction network are neural networks obtained by joint training by utilizing training data in advance, and the training data comprise images meeting preset conditions.
In a third aspect, embodiments of the invention disclose a machine-readable medium having instructions stored thereon, which when executed by one or more processors of an apparatus, cause the apparatus to perform an object detection method as described in one or more of the preceding.
The embodiment of the invention has the following advantages:
In the embodiment of the invention, a parameter prediction network and a target detection network are jointly trained in advance using training data. The image to be processed is input into the trained parameter prediction network, which outputs tuning parameters; image enhancement processing is performed on the image to be processed using these tuning parameters to obtain a corresponding optimized image; and the optimized image is input into the trained target detection network, which outputs a target detection result. Because the embodiment of the invention automatically predicts the tuning parameters required by the image to be processed through the parameter prediction network, no special expertise is required of the person shooting the image, and the operation cost for the user is reduced. In addition, the parameter prediction network is a neural network trained on a large amount of training data that includes images under preset conditions (such as severe weather), so it can accurately predict the tuning parameters required by images under those conditions, and its adaptability to such images is enhanced. Compared with setting parameters manually, the embodiment of the invention can therefore improve the accuracy of the tuning parameters and, in turn, the accuracy of target detection. Moreover, the parameter prediction network and the target detection network of the embodiment of the invention can be obtained through end-to-end training and testing, which reduces the cost of manual debugging.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
FIG. 1 is a flow chart of the steps of one embodiment of a method of target detection of the present invention;
FIG. 2 is a schematic diagram of an image enhancement module process flow in one example of the invention;
FIG. 3 is a schematic diagram of an end-to-end system architecture of the present invention;
FIG. 4 is a schematic structural diagram of an embodiment of an object detection apparatus according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first", "second" and the like in the description and in the claims of the present invention are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances, so that embodiments of the invention may be practiced in sequences other than those illustrated or described herein. Objects identified as "first", "second", etc. are not limited in number; for example, the first object may be one or more. Furthermore, the term "and/or" as used in the specification and claims to describe an association between objects means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The term "plurality" in the embodiments of the present invention means two or more, and other terms are to be understood similarly.
Referring to fig. 1, a flow chart of steps of an embodiment of a method of object detection of the present invention is shown, which may include the steps of:
step 101, inputting an image to be processed into a parameter prediction network, and outputting tuning parameters through the parameter prediction network, wherein the tuning parameters comprise at least one of defogging parameters, white balance parameters, contrast parameters, hue parameters, sharpening parameters and correction parameters;
step 102, performing image enhancement processing on the image to be processed according to the tuning parameters output by the parameter prediction network, to obtain an optimized image corresponding to the image to be processed;
step 103, inputting the optimized image corresponding to the image to be processed into a target detection network for target detection, and outputting a target detection result through the target detection network, wherein the target detection network and the parameter prediction network are neural networks obtained by joint training in advance by using training data, and the training data comprises images meeting preset conditions.
Target detection is the task of finding specific targets in an image and determining their positions in the image. The target detection method provided by the embodiment of the invention can be used in a face recognition scene to detect face targets in an image, or in an automatic driving scene to detect targets such as pedestrians, obstacles and traffic signals in an image.
The embodiment of the invention realizes the target detection method through an end-to-end system. And inputting the image to be processed into the end-to-end system, and outputting a target detection result of the image to be processed. The end-to-end system mainly comprises the following three parts: the system comprises a parameter prediction network, an image enhancement module and a target detection network.
The parameter prediction network receives the image to be processed and outputs the tuning parameters corresponding to it. The tuning parameters are parameters, determined by the parameter prediction network from the global information of the image to be processed, that are used to optimize the image to be processed and improve its quality. The tuning parameters may include, but are not limited to, at least one of a defogging parameter, a white balance parameter, a contrast parameter, a hue parameter, a sharpening parameter, and a correction parameter. The image enhancement module receives the image to be processed together with the tuning parameters output by the parameter prediction network, performs image enhancement processing on the image according to the tuning parameters, and outputs an optimized image. That is, the image enhancement module optimizes the image to be processed according to the tuning parameters provided by the parameter prediction network. For example, if the brightness of the image to be processed is too low, it can be enhanced according to the correction parameter output by the parameter prediction network; similarly, if the image to be processed was shot on a foggy day, it can be defogged according to the defogging parameter output by the parameter prediction network. By optimizing the image to be processed, the image enhancement module improves image quality and thereby the accuracy of target detection. It should be noted that the type and number of tuning parameters output by the parameter prediction network may be set according to actual needs.
The target detection network is used for receiving the optimized image output by the image enhancement module, carrying out target detection on the received optimized image and outputting a target detection result.
The embodiment of the invention does not limit the source of the image to be processed. For example, the image to be processed may be an image in a traffic monitoring video, or the image to be processed may be an image in a video recorded by a mobile phone of a user, and the like.
The target detection network and the parameter prediction network are neural networks obtained by joint training in advance by utilizing training data, and the training data comprises images meeting preset conditions. The image of the preset condition refers to an image with poor shooting condition, for example, the preset condition may include, but is not limited to, any one or more of the following shooting conditions: rainy day, snowy day, cloudy day, night, foggy day, dim light, strong light, etc.
The embodiment of the invention obtains the parameter prediction network and the target detection network by utilizing training data joint training in advance. Because the training data includes images meeting preset conditions, such as images shot in rainy days, snowy days, cloudy days, nights, foggy days, dim lights, strong lights and the like, the parameter prediction network and the target detection network are jointly trained based on the training data, the trained parameter prediction network can accurately predict tuning parameters corresponding to the images under the preset conditions, and the trained target detection network can accurately detect targets in the images under the preset conditions.
In an optional embodiment of the present invention, performing image enhancement processing on the image to be processed according to the tuning parameters output by the parameter prediction network in step 102 may include:
step S11, inputting the image to be processed and the tuning parameters output by the parameter prediction network into an image enhancement module, wherein the image enhancement module comprises filters corresponding to the tuning parameters output by the parameter prediction network;
and step S12, sequentially performing image enhancement processing on the images to be processed by using the corresponding tuning parameters through each filter.
In an embodiment of the invention, the image enhancement module may include several differentiable filters, each filter being used to perform some kind of optimization on the image. The tuning parameters output by the parameter prediction network have a one-to-one correspondence relationship with the filters included in the image enhancement module, and the parameter prediction network can input the corresponding tuning parameters into the corresponding filters in the image enhancement module, so that each filter can sequentially perform image enhancement processing on the image to be processed by using the corresponding tuning parameters.
Further, the image enhancement module may include, but is not limited to, at least one of a defogging filter, a white balance filter, a contrast filter, a hue filter, a sharpening filter, and a correction filter. The parameter prediction network may input the defogging parameters into the defogging filter, the white balance parameters into the white balance filter, the contrast parameters into the contrast filter, the hue parameters into the hue filter, the sharpening parameters into the sharpening filter, and the correction parameters into the correction filter.
The defogging filter can be used for defogging the received image according to the received defogging parameters; the white balance filter is used for carrying out white balance adjustment processing on the received image according to the received white balance parameters; the contrast filter is used for carrying out contrast adjustment processing on the received image according to the received contrast parameters; the tone filter is used for carrying out tone adjustment processing on the received image according to the received tone parameters; the sharpening filter is used for sharpening the received image according to the received sharpening parameter; the correction filter is used for adjusting the brightness of the received image according to the received correction parameters.
Referring to fig. 2, a schematic diagram of an image enhancement process flow in one example of the invention is shown. The flow diagram shown in fig. 2 includes a parameter prediction network 201 and an image enhancement module 202, and the image enhancement module 202 sequentially includes a correction filter, a white balance filter, a sharpening filter, a contrast filter, and a defogging filter. The filters shown in fig. 2 are connected in sequence, and the output of the previous filter is used as the input of the next filter, thereby realizing the gradual optimization of the image to be processed. As shown in fig. 2, the input of the correction filter is the image to be processed, the image to be processed is input into the white balance filter after being processed by the correction filter, is input into the sharpening filter after being processed by the white balance filter, is input into the contrast filter after being processed by the sharpening filter, is input into the defogging filter after being processed by the contrast filter, and is output to obtain the optimized image after being processed by the defogging filter.
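As an illustrative sketch (not the patent's implementation), the sequential filter chain described above can be expressed as follows. Each filter consumes the previous filter's output together with its own tuning parameter; the stand-in filter bodies and parameter values here are hypothetical, and in the real pipeline the parameters would come from the parameter prediction network:

```python
import numpy as np

def apply_filter_chain(image: np.ndarray, filters) -> np.ndarray:
    """Run the image through an ordered list of (filter_fn, tuning_param) pairs.

    The output of each filter is the input of the next, mirroring the
    chain in FIG. 2 (correction -> white balance -> sharpening -> ...).
    """
    for filter_fn, param in filters:
        image = filter_fn(image, param)
    return image

# Simplified stand-in filters with hypothetical tuning parameters.
chain = [
    (lambda img, g: img ** g, 0.8),   # correction (Gamma) filter
    (lambda img, w: img * w, 1.10),   # white balance (single weight for brevity)
    (lambda img, a: a * img, 1.20),   # contrast filter
]

optimized = apply_filter_chain(np.array([1.0, 0.25]), chain)
```

Because the filters are chained, reordering the list changes the result, which is why the connection order is an implementation choice.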
It should be noted that the connection sequence of the filters shown in fig. 2 is only an application example of the present invention, and the connection sequence of the filters included in the image enhancement module is not limited in the embodiment of the present invention, that is, the embodiment of the present invention does not limit the sequence of sequentially performing image enhancement processing on the image to be processed by using corresponding tuning parameters through each filter.
The correction filter may perform Gamma correction on the image by:
f(I) = I^γ    (1)
Gamma correction is an important nonlinear transformation that applies an exponential transform to the gray values of the input image to correct its brightness deviation, and is typically used to expand the details of dark tones. In equation (1), f(I) is the image output by the correction filter, I is the image input to the correction filter, and γ is the Gamma value used for the correction. In the embodiment of the present invention, the correction parameter output by the parameter prediction network is the Gamma value γ used for performing Gamma correction on the image to be processed. Taking FIG. 2 as an example, I is the image to be processed, and the parameter prediction network outputs the correction parameter γ for it. The image to be processed I and the correction parameter γ are input into the correction filter, which performs Gamma correction on I according to equation (1). By choosing γ, the correction filter can raise or lower the luminance of the image to be processed through equation (1).
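A minimal sketch of equation (1) in NumPy, assuming pixel values normalised to [0, 1] (the function and variable names are illustrative, not from the patent); under that convention, γ < 1 lifts dark tones and γ > 1 compresses them:

```python
import numpy as np

def gamma_correct(image: np.ndarray, gamma: float) -> np.ndarray:
    """Apply f(I) = I ** gamma (equation (1)) to an image in [0, 1]."""
    return np.clip(image, 0.0, 1.0) ** gamma

img = np.array([[0.25, 0.5],
                [0.75, 1.0]])
brighter = gamma_correct(img, 0.5)  # gamma < 1 lifts dark tones
darker = gamma_correct(img, 2.0)    # gamma > 1 compresses them
```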
The white balance filter may adjust the white balance of the image by:
f(I_r) = W_r · I_r    (2)
f(I_g) = W_g · I_g    (3)
f(I_b) = W_b · I_b    (4)
White balance describes how accurately a neutral white is reproduced when the three primary colors red, green and blue are mixed in a display. In equations (2), (3) and (4), f(I_r), f(I_g) and f(I_b) are the values of the R (red), G (green) and B (blue) channels of the image output by the white balance filter; I_r, I_g and I_b are the values of the R, G and B channels of the image input to the white balance filter; and W_r, W_g and W_b are the weights applied to the three RGB channels. In the embodiment of the present invention, the white balance parameters output by the parameter prediction network are the values of the weights W_r, W_g and W_b used for adjusting the white balance of the image to be processed. Taking FIG. 2 as an example, I is the image to be processed, the parameter prediction network outputs W_r, W_g and W_b for it, and the image input to the white balance filter is the image output by the correction filter. The image output by the correction filter and the white balance parameters (the values of W_r, W_g and W_b) are input into the white balance filter, which adjusts the white balance of the image output by the correction filter according to equations (2), (3) and (4).
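A sketch of equations (2)–(4), assuming an (H, W, 3) array in R, G, B channel order (the names and weight values are illustrative assumptions):

```python
import numpy as np

def white_balance(image: np.ndarray, w_r: float, w_g: float, w_b: float) -> np.ndarray:
    """Scale the R, G and B channels by their weights (equations (2)-(4)).

    `image` is assumed to have shape (H, W, 3) with channel order R, G, B.
    """
    weights = np.array([w_r, w_g, w_b])
    return image * weights  # broadcasts over the last (channel) axis

img = np.full((2, 2, 3), 0.5)
balanced = white_balance(img, 1.2, 1.0, 0.8)  # hypothetical weights: warm the image slightly
```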
The sharpening filter may sharpen the image by:
F(x,λ)=I(x)+λ(I(x)-Gau(I(x))) (5)
Sharpening improves the clarity of image edges and highlights the detail information of the image. In equation (5), F(x, λ) is the image output by the sharpening filter and I(x) is the image input to it; taking FIG. 2 as an example, the image input to the sharpening filter is the image output by the white balance filter. Gau(I(x)) denotes Gaussian filtering of the image input to the sharpening filter, and λ is a positive scale factor that controls the degree of sharpening. In the embodiment of the present invention, the sharpening parameter output by the parameter prediction network is the value of the positive scale factor λ used for sharpening the image to be processed. Taking FIG. 2 as an example, I is the image to be processed, and the parameter prediction network outputs the value of λ for it. The image output by the white balance filter and the sharpening parameter (the value of λ) are input into the sharpening filter, which sharpens the image output by the white balance filter according to equation (5).
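Equation (5) is the classic unsharp-masking formula. A hedged sketch follows, using a small 3×3 Gaussian kernel as a stand-in for Gau(I(x)) (the kernel size and function names are illustrative assumptions):

```python
import numpy as np

def gaussian_blur3(image: np.ndarray) -> np.ndarray:
    """3x3 Gaussian blur with reflected borders (stand-in for Gau(I(x)))."""
    kernel = np.array([[1.0, 2.0, 1.0],
                       [2.0, 4.0, 2.0],
                       [1.0, 2.0, 1.0]]) / 16.0
    padded = np.pad(image, 1, mode="reflect")
    h, w = image.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def sharpen(image: np.ndarray, lam: float) -> np.ndarray:
    """Unsharp masking: F(x, lambda) = I(x) + lambda * (I(x) - Gau(I(x)))."""
    return image + lam * (image - gaussian_blur3(image))
```

On a perfectly flat image the blurred copy equals the original, so sharpening leaves it unchanged; edges, by contrast, are amplified in proportion to λ.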
The contrast filter may adjust the contrast of the image by:
F(x,α)=αI(x) (6)
Contrast measures the difference in brightness between the brightest white and the darkest black of the bright and dark regions in an image: the larger the difference, the greater the contrast. In equation (6), F(x, α) is the image output by the contrast filter and I(x) is the image input to it; taking FIG. 2 as an example, the image input to the contrast filter is the image output by the sharpening filter. α is a contrast adjustment factor, and the contrast of the image can be adjusted by adjusting α. In the embodiment of the present invention, the contrast parameter output by the parameter prediction network is the value of the contrast adjustment factor α used for adjusting the contrast of the image to be processed. Taking FIG. 2 as an example, I is the image to be processed, and the parameter prediction network outputs the value of α for it. The image output by the sharpening filter and the contrast parameter (the value of α) are input into the contrast filter, which adjusts the contrast of the image output by the sharpening filter according to equation (6).
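A one-line sketch of equation (6), assuming pixel values in [0, 1]; the clipping step is an added assumption to keep the result in range and is not stated in the patent:

```python
import numpy as np

def adjust_contrast(image: np.ndarray, alpha: float) -> np.ndarray:
    """F(x, alpha) = alpha * I(x) (equation (6)); clipped to stay in [0, 1]."""
    return np.clip(alpha * image, 0.0, 1.0)

boosted = adjust_contrast(np.array([0.2, 0.5, 0.9]), 1.5)
```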
The defogging filter may be used to defog an image, and the image input to the defogging filter can be modeled as follows:
I(x)=J(x)t(x)+A(1-t(x)) (7)
In equation (7), I(x) represents the image input to the defogging filter (the foggy image), J(x) represents the normal (fog-free) image recovered from I(x) by the defogging process, A represents the atmospheric brightness of the image input to the defogging filter, and t(x) represents the medium transmission map. The key to the defogging process is the estimation of t(x).
From equation (7) above, it can be derived that t(x) can be expressed approximately as follows:
t(x) = 1 − min_C ( min_y ( I^C(y) / A^C ) )    (8)
in the above equation (8), C represents R, G, B three channels of the image input to the defogging filter, and the function t (x) processes these three channels in turn. When each channel is processed, the processing can be carried out according to the area, and the size of the area can be set according to the requirement. For example, assume that the image size of the input defogging filter is 416 × 416 pixels, and the region size per process is 3 × 3 pixels. For an image input to the defogging filter, y represents the pixel value of the currently processed region, min c Is the minimum pixel value, min, of the current channel y Is the minimum pixel value of the current region, I C (y) is the pixel value of the current region in the current channel, A C Is the brightness of the current channelAnd (4) degree.
Further, according to the above formula (8), a parameter ω can be introduced to control the degree of defogging as follows:
t(x, ω) = 1 − ω · min_C ( min_{y∈Ω(x)} ( I_C(y) / A_C ) )    (9)
In the above equation (9), t(x, ω) is the transmission map used by the defogging filter; taking fig. 2 as an example, the image input to the defogging filter is the image output by the contrast filter. In the embodiment of the present invention, the defogging parameters output by the parameter prediction network include the value of a defogging control parameter ω used for defogging the image to be processed. Taking fig. 2 as an example, I is the image to be processed, and the defogging parameter output by the parameter prediction network according to the image to be processed I is the value of the defogging control parameter ω. The image output by the contrast filter and the defogging parameter (the value of ω) output by the parameter prediction network are input into the defogging filter, and the defogging filter defogs the image output by the contrast filter according to the value of ω in accordance with equation (9). In the example shown in fig. 2, the image output by the defogging filter is the optimized image corresponding to the image I to be processed, that is, the image to be input into the target detection network.
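Equations (7)-(9) can be sketched in a few lines of numpy: estimate the transmission map over 3 × 3 regions, then invert equation (7) to recover the defogged image (the clipping floor t_min is an added numerical safeguard, not part of the patent):

```python
import numpy as np

def transmission_map(image, A, omega, patch=3):
    """t(x, omega) = 1 - omega * min over channels C and a local patch
    of I_C(y) / A_C (equations (8)-(9)); image is HxWx3 in [0, 1],
    A is the atmospheric brightness (scalar or per channel)."""
    h, w, _ = image.shape
    norm = image / np.asarray(A, dtype=float)   # I_C(y) / A_C
    dark = norm.min(axis=2)                     # min over R, G, B
    r = patch // 2
    padded = np.pad(dark, r, mode='edge')
    t = np.empty((h, w))
    for i in range(h):                          # min over each 3x3 region
        for j in range(w):
            t[i, j] = padded[i:i + patch, j:j + patch].min()
    return 1.0 - omega * t

def defog(image, A, omega, t_min=0.1):
    """Invert equation (7): J(x) = (I(x) - A) / max(t, t_min) + A."""
    t = np.clip(transmission_map(image, A, omega), t_min, 1.0)
    return (image - A) / t[..., None] + A
```

A larger ω removes fog more aggressively (t approaches 1 − ω in hazy regions), which is exactly the degree-of-defogging control described above.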
In an optional embodiment of the present invention, before the step 101 of inputting the image to be processed into the parameter prediction network, the method may further include: adjusting the original image to obtain a to-be-processed image of a first size and a to-be-processed image of a second size; the step 101 of inputting the image to be processed into the parameter prediction network may include: inputting the to-be-processed image of the first size into the parameter prediction network; and in step 102, performing image enhancement processing on the image to be processed according to the tuning parameters output by the parameter prediction network may include: performing image enhancement processing on the to-be-processed image of the second size according to the tuning parameters output by the parameter prediction network.
In the embodiment of the present invention, the image to be processed may be an image obtained by preprocessing an original image. The pre-processing may include resizing the original image. Further, the embodiment of the invention adjusts the original image to obtain the image to be processed with the first size and the image to be processed with the second size respectively.
The original image is an image which needs to be subjected to target detection, such as an image in a traffic monitoring video, or an image in a video recorded by a mobile phone of a user, and the like.
The embodiment of the invention inputs the to-be-processed image of the first size into the parameter prediction network to predict the tuning parameters. In order to improve the efficiency of target detection, the parameter prediction network may be a small neural network; in the embodiment of the present invention, the original image is adjusted to the first size (a smaller size) before being input into the parameter prediction network, so as to reduce the amount of calculation of the parameter prediction network, thereby improving the efficiency of parameter prediction and, in turn, the efficiency of target detection. Illustratively, an image preprocessing module may be provided, and the original image is input into the module for preprocessing, which may include, but is not limited to, resizing the original image (e.g., to the first size and the second size, respectively), removing noise in the original image, and the like. The to-be-processed image of the first size output by the image preprocessing module is input into the parameter prediction network, and the to-be-processed image of the second size output by the image preprocessing module is input into the image enhancement module. It should be noted that the method adopted by the embodiment of the present invention to adjust the size of the original image is not limited.
In a specific implementation, the larger the first size and the second size, the more complete the original information retained in the image, and the more accurate the target detection result. Processing the original image directly yields the most accurate target detection result, but the calculation efficiency is low, and feasibility and real-time performance are difficult to guarantee. Therefore, the first size and the second size may be chosen as the minimum sizes capable of securing the target detection effect. Illustratively, the first size may be 256 × 256 pixels, and the second size may be 416 × 416 pixels. The first size is chosen smaller so as to reduce the amount of calculation of the parameter prediction network and improve target detection efficiency, and the second size is chosen larger than the first size so as to ensure the image enhancement effect and, in turn, the target detection accuracy.
It should be noted that, in the embodiment of the present invention, the first size and the second size are not limited. For example, the first size may be smaller than the second size. Of course, the first size may also be greater than or equal to the second size.
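The two-size preprocessing described above can be sketched as follows (nearest-neighbour resizing is used for brevity; the patent does not fix a resizing method):

```python
import numpy as np

def nearest_resize(image, size):
    """Nearest-neighbour resize of an HxW(xC) image to size x size."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows][:, cols]

def preprocess(original, first_size=256, second_size=416):
    """Produce the two working copies: the first-size image feeds the
    parameter prediction network, the second-size image feeds the
    image enhancement module."""
    return nearest_resize(original, first_size), nearest_resize(original, second_size)
```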
Referring to fig. 3, a schematic diagram of an end-to-end system architecture of the present invention is shown. The system architecture shown in fig. 3 includes a parameter prediction network 301, an image enhancement module 302, and an object detection network 303.
In the embodiment of the present invention, the original image is adjusted to obtain a to-be-processed image of a first size (e.g., 256 × 256 pixels) and a to-be-processed image of a second size (e.g., 416 × 416 pixels), and the to-be-processed image of the first size is input into the parameter prediction network to predict the tuning parameters (the parameters required by each filter in the image enhancement module). The to-be-processed image of the second size and the tuning parameters output by the parameter prediction network are input into the image enhancement module; the image enhancement module performs image enhancement processing on the to-be-processed image of the second size step by step through each filter according to the received tuning parameters, so as to eliminate influences such as severe weather while retaining more key information, obtaining an optimized image; and the optimized image is input into the target detection network for target detection to obtain a target detection result.
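The data flow of fig. 3 can be sketched with placeholder components (the stubs stand in for the real neural networks and only illustrate the wiring between modules 301-303):

```python
def run_pipeline(original, resize, predict_params, enhance, detect):
    """End-to-end flow of fig. 3: resize once for the parameter
    prediction network, once for the image enhancement module, then
    run detection on the optimized image."""
    small = resize(original, 256)          # first size, for prediction
    large = resize(original, 416)          # second size, for enhancement
    params = predict_params(small)         # parameter prediction network (301)
    optimized = enhance(large, params)     # image enhancement module (302)
    return detect(optimized)               # target detection network (303)
```

Any concrete resize, prediction, enhancement, and detection implementations can be dropped into these slots without changing the overall flow.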
In an optional embodiment of the present invention, before the step 101 of inputting the image to be processed into the parameter prediction network, the method may further include:
step S21, training data is obtained;
step S22, labeling the targets contained in the training data to obtain labeling results;
step S23, inputting the training data into an initial parameter prediction network, and outputting tuning parameters through the initial parameter prediction network;
step S24, performing image enhancement processing on the training data according to the tuning parameters output by the initial parameter prediction network, to obtain an optimized image corresponding to the training data;
step S25, inputting the optimized image corresponding to the training data into an initial target detection network for target detection, and outputting a target detection result through the initial target detection network;
step S26, calculating a joint loss value according to the difference between the target detection result output by the initial target detection network and the labeling result, and performing iterative optimization on the parameters of the initial parameter prediction network and the parameters of the initial target detection network until the joint loss value meets an iteration stop condition to obtain a trained parameter prediction network and a trained target detection network.
The embodiment of the invention utilizes training data to jointly train the parameter prediction network and the target detection network in advance. Specifically, firstly, training data is obtained, and targets included in the training data are labeled to obtain a labeling result. The training data may include images under normal conditions (clear, high-quality photographing conditions) and images under the preset conditions. The annotation result may include whether the image includes the target and a position of the included target.
Then, the training data is input into the initial parameter prediction network in sequence for iterative training. For example, after initializing the parameter prediction network and the target detection network, a first image in the training data is input into the initial parameter prediction network, which outputs the tuning parameters of the first image; the initial parameter prediction network may input the tuning parameters of the first image to the image enhancement module to set the parameters of each filter in the image enhancement module. The image enhancement module performs image enhancement processing on the first image according to the tuning parameters output by the initial parameter prediction network, to obtain an optimized image corresponding to the first image. The image enhancement module then inputs the optimized image corresponding to the first image into the initial target detection network for target detection, and the target detection result of the first image is output through the initial target detection network. Finally, a joint loss value is calculated according to the difference between the target detection result of the first image output by the initial target detection network and the labeling result of the first image, and the parameters of the initial parameter prediction network and the parameters of the initial target detection network are optimized. If the joint loss value does not meet the iteration stop condition, the next round of training is entered: a second image in the training data is input into the initial parameter prediction network (whose parameters, together with those of the initial target detection network, have now been optimized once), and a second round of optimization is executed, and so on until the joint loss value meets the iteration stop condition, obtaining the trained parameter prediction network and the trained target detection network. The joint loss value satisfying the iteration stop condition may include: the joint loss value is less than a preset threshold.
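The iterative loop of steps S23-S26 can be sketched as follows; forward, loss_fn, and update are placeholders for the real prediction-enhancement-detection pipeline, the joint loss, and the optimizer:

```python
def joint_train(images, labels, forward, loss_fn, update,
                threshold=0.05, max_iters=1000):
    """Cycle through the training data, compute the joint loss per
    image, update both networks' parameters, and stop once the joint
    loss drops below the threshold (the iteration stop condition)."""
    loss = float('inf')
    for it in range(max_iters):
        image = images[it % len(images)]
        label = labels[it % len(images)]
        detection = forward(image)        # S23-S25: predict, enhance, detect
        loss = loss_fn(detection, label)  # S26: joint loss vs. labeling result
        if loss < threshold:              # iteration stop condition
            return loss, it
        update(loss)                      # optimize both networks' parameters
    return loss, max_iters
```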
In an optional embodiment of the present invention, the joint loss value may be obtained by performing weighted calculation on the loss value of the initial parameter prediction network and the loss value of the initial target detection network.
In a specific implementation, the loss value of the initial target detection network may be directly used as the joint loss value, or the loss value of the initial parameter prediction network may be further calculated, and the joint loss value is obtained by weighted calculation according to the loss value of the initial parameter prediction network and the loss value of the initial target detection network.
When training data is constructed for jointly training the parameter prediction network and the target detection network, the tuning parameters corresponding to the training data may also be labeled. Then, in the iterative training process, the loss value of the initial parameter prediction network may be calculated according to the difference between the tuning parameters output by the initial parameter prediction network and the labeled tuning parameters, and the joint loss value is obtained by weighted calculation from the loss value of the initial parameter prediction network and the loss value of the initial target detection network.
In one example, assuming that in a certain iteration the loss value of the initial parameter prediction network is a1 with weight b1, and the loss value of the initial target detection network is a2 with weight b2, the joint loss value may be: a1 × b1 + a2 × b2. It should be noted that, in the embodiment of the present invention, the specific values of the weights b1 and b2 are not limited; by setting b1 and b2, the degree of influence of the parameter prediction network and the target detection network on the whole end-to-end system during joint training can be adjusted. Illustratively, the weight b1 is set to 0.1 and the weight b2 is set to 1.
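The weighted joint loss in this example can be written directly as:

```python
def joint_loss(pred_loss, det_loss, b1=0.1, b2=1.0):
    """Joint loss a1 * b1 + a2 * b2, with the example weights
    b1 = 0.1 (parameter prediction network) and b2 = 1 (target
    detection network) as defaults."""
    return pred_loss * b1 + det_loss * b2
```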
The parameter prediction network and the target detection network can be obtained by performing supervised training on an existing neural network according to a large amount of training data and a machine learning method. It should be noted that the structures and training methods of the parameter prediction network and the target detection network are not limited in the embodiments of the present disclosure. The parameter prediction network and the target detection network can incorporate various neural networks, including, but not limited to, at least one of the following, or a combination, superposition, or nesting of at least two of them: CNN (Convolutional Neural Network), LSTM (Long Short-Term Memory) network, RNN (Recurrent Neural Network), attention neural network, and the like.
In an optional embodiment of the present invention, the parameter prediction network may include a first number of convolutional layers for performing convolution operation on an input image to be processed to output a feature map, and a second number of fully-connected layers for performing fully-connected operation on the feature map output by the convolutional layers to output tuning parameters.
The specific numerical values of the first number and the second number are not limited in the embodiment of the present invention. Illustratively, the first number is 5 and the second number is 2, and the parameter prediction network may include 5 convolutional layers and 2 fully-connected layers. Further, in a specific implementation, the number of channels of each convolutional layer and the size of each convolution kernel may also be set according to actual needs. Illustratively, the number of channels of the first convolutional layer may be set to 16, the number of channels of each of the second to fifth convolutional layers may be set to 32, the size of each convolution kernel may be set to 3 × 3, and the convolution stride may be set to 2. The input image to be processed is convolved through the 5 convolutional layers to obtain a feature map, and the feature map is passed through the two fully-connected layers to obtain the tuning parameters to be input into the image enhancement module.
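Under the example configuration above (kernel 3 × 3, stride 2; padding 1 is an assumption, since the patent does not state it), the feature-map sizes through the five convolutional layers can be computed as:

```python
def conv_out(size, kernel=3, stride=2, pad=1):
    """Spatial output size of one convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

def prediction_network_shapes(size=256, layers=5):
    """Feature-map sizes through the 5 stride-2 convolutions of the
    parameter prediction network (padding 1 assumed)."""
    sizes = [size]
    for _ in range(layers):
        sizes.append(conv_out(sizes[-1]))
    return sizes
```

With a 256 × 256 input, each stride-2 layer halves the spatial size, so the fully-connected layers operate on an 8 × 8 feature map.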
The parameter prediction network predicts the parameters required for optimizing the image to be processed according to the global information of the image to be processed and outputs the tuning parameters; after the image to be processed undergoes image enhancement processing using these tuning parameters, the target detection result is improved. Compared with manually adjusting camera parameters to improve the quality of the captured image, the embodiment of the invention can automatically predict the parameters required for optimizing image quality through the trained parameter prediction network, places no high professional requirements on the photographer, and can reduce the operation cost for the user. In addition, the parameter prediction network is a neural network trained on a large amount of training data, and the training data includes images under the preset conditions (such as severe weather); therefore, for images under the preset conditions, the parameter prediction network can accurately predict the tuning parameters under which the optimized image achieves the best target detection effect, and the accuracy of the tuning parameters can be improved relative to manual parameter adjustment.
The embodiment of the invention does not limit the type of the target detection network. In an alternative embodiment of the invention, the target detection network may comprise a YOLOX network.
The embodiment of the invention may adopt a YOLOX-based target detection network. YOLOX is an anchor-free target detection network, which can reduce the amount of post-processing computation, and a target detection network based on YOLOX-L, YOLOX-M or YOLOX-T may be selected according to the actually deployed device. Of course, the target detection networks listed above are only exemplary; in a specific implementation, the parameter prediction network and the image enhancement module of the embodiment of the present invention may be combined with various types of target detection networks for end-to-end training and detection.
Further, since the number of images under the preset conditions (such as images shot in severe weather) is usually small, and the amount of training data has an important influence on the accuracy of a neural network model, in the embodiment of the present invention, when the training data is constructed, images under the preset conditions are synthesized from images under normal conditions, so as to obtain more images under the preset conditions. The normal conditions refer to non-preset conditions, such as shooting conditions under which clear, high-quality images can be captured. The embodiment of the invention processes images shot under normal conditions to obtain images under the preset conditions, for example, modifying images shot under normal daytime illumination into images shot in dim light at night, modifying images shot on a sunny day into foggy images, and the like.
In an optional embodiment of the present invention, before the step 101 of inputting the image to be processed into the parameter prediction network, the method may further include:
step S31, acquiring a non-foggy day image;
step S32, processing the non-foggy day images through a first control parameter and a second control parameter to generate foggy day images with different illumination intensities and different foggy day grades, wherein the first control parameter is used for controlling the illumination intensity of the generated foggy day images, and the second control parameter is used for controlling the foggy day grades of the generated foggy day images;
and step S33, constructing training data by using the foggy day images.
Fog is a common natural phenomenon; visibility is low in foggy weather, which can greatly impair the detection and recognition of road vehicle information. Furthermore, because different illumination intensities and different foggy day grades affect image clarity to different degrees, foggy day images with different illumination intensities and different foggy day grades are generated when making foggy day images, so as to improve the accuracy of the jointly trained parameter prediction network and target detection network.
The embodiment of the invention can generate foggy day images with different illumination intensities and different foggy day grades through the following formula:
I(x) = (J(x)·t(x) + A·(1 − t(x)))^γ    (10)
In the above formula (10), I(x) is the synthesized foggy day image, J(x) is an image shot in normal weather (a non-foggy image), A is the brightness of J(x), γ is a first control parameter for controlling the illumination intensity of the generated foggy day image, and t(x) is the medium transmission map, defined as follows:
t(x) = e^(−β·d(x))    (11)
in the above formula (11), β is an atmospheric scattering coefficient, and d (x) is defined as follows:
d(x) = −0.04 · ρ + √(max(row, col))    (12)
In the above equation (12), ρ represents the Euclidean distance from the current pixel to the center pixel, and row and col represent the number of rows and columns of the image, respectively, in units of pixels. Illustratively, in the embodiment of the invention, A is 0.5 and β = 0.01 × i + 0.05, where i is the second control parameter taking values from 0 to 9, corresponding to ten different foggy day grades.
In the above equation (10), γ may be used to adjust the brightness of a generated foggy day image of a given foggy day grade so as to simulate foggy day images under different illumination intensities: if γ is greater than 1, the generated foggy day image becomes brighter, and if γ is less than 1, it becomes darker.
When generating foggy day images, the values of γ (the first control parameter) and i (the second control parameter) may be randomly selected to process the original (non-foggy) image, thereby generating foggy day images with different illumination intensities and different foggy day grades, where i controls the foggy day grade and γ controls the illumination intensity.
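Equations (10)-(12) can be combined into a small numpy routine for synthesizing fog (the −0.04 coefficient in d(x) follows the common image-adaptive formulation and is an assumption here; J is treated as a single-channel image in [0, 1] for brevity):

```python
import numpy as np

def make_foggy(J, i, gamma, A=0.5):
    """Synthesize a foggy day image per equations (10)-(12):
    beta = 0.01 * i + 0.05 selects one of ten fog levels (i in 0..9),
    gamma adjusts the illumination intensity."""
    rows, cols = J.shape
    beta = 0.01 * i + 0.05
    yy, xx = np.mgrid[0:rows, 0:cols]
    rho = np.sqrt((yy - rows / 2) ** 2 + (xx - cols / 2) ** 2)  # distance to center
    d = -0.04 * rho + np.sqrt(max(rows, cols))                  # equation (12), assumed coefficient
    t = np.exp(-beta * d)                                       # equation (11)
    return (J * t + A * (1.0 - t)) ** gamma                     # equation (10)
```

Sampling i and γ at random, as described above, yields foggy images spanning the ten fog grades and a range of illumination intensities.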
In summary, in the embodiments of the present invention, training data is used in advance for joint training to obtain the parameter prediction network and the target detection network; the image to be processed is input into the trained parameter prediction network, tuning parameters can be output through the parameter prediction network, image enhancement processing is performed on the image to be processed using the tuning parameters to obtain an optimized image corresponding to the image to be processed, and the optimized image is input into the trained target detection network so that a target detection result can be output. The embodiment of the invention can automatically predict the tuning parameters required by the image to be processed through the parameter prediction network, places no high professional requirements on the photographer when shooting the image, and can reduce the operation cost for the user. In addition, the parameter prediction network is a neural network trained on a large amount of training data, and the training data includes images under preset conditions (such as severe weather); therefore, the parameter prediction network of the embodiment of the invention can accurately predict the tuning parameters required by images under the preset conditions (such as severe weather), and its adaptability to such images can be enhanced. Compared with manual parameter setting, the embodiment of the invention can improve the accuracy of the tuning parameters, and further improve the accuracy of target detection. Moreover, the parameter prediction network and the target detection network of the embodiment of the invention can be obtained through end-to-end training and testing, which can reduce the cost of manual debugging.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently depending on the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 4, a block diagram of an embodiment of an object detection apparatus of the present invention is shown, and the apparatus may include:
the parameter prediction module 401 is configured to input an image to be processed into a parameter prediction network, and output a tuning parameter through the parameter prediction network, where the tuning parameter includes at least one of a defogging parameter, a white balance parameter, a contrast parameter, a hue parameter, a sharpening parameter, and a correction parameter;
an image enhancement module 402, configured to perform image enhancement processing on the image to be processed according to the tuning parameters output by the parameter prediction network, so as to obtain an optimized image corresponding to the image to be processed;
the target detection module 403 is configured to input the optimized image corresponding to the image to be processed into a target detection network for target detection, and output a target detection result through the target detection network, where the target detection network and the parameter prediction network are neural networks obtained by performing joint training in advance using training data, and the training data includes images meeting preset conditions.
Optionally, the apparatus further comprises:
the data acquisition module is used for acquiring training data;
the data labeling module is used for labeling the targets contained in the training data to obtain labeling results;
the initial prediction module is used for inputting the training data into an initial parameter prediction network and outputting tuning parameters through the initial parameter prediction network;
the initial tuning module is used for performing image enhancement processing on the training data according to the tuning parameters output by the initial parameter prediction network, to obtain an optimized image corresponding to the training data;
the initial detection module is used for inputting the optimized image corresponding to the training data into an initial target detection network for target detection and outputting a target detection result through the initial target detection network;
and the iterative training module is used for calculating a joint loss value according to the difference between the target detection result output by the initial target detection network and the labeling result, and performing iterative optimization on the parameters of the initial parameter prediction network and the parameters of the initial target detection network until the joint loss value meets an iteration stop condition to obtain a trained parameter prediction network and a trained target detection network.
Optionally, the joint loss value is obtained by performing weighted calculation according to the loss value of the initial parameter prediction network and the loss value of the initial target detection network.
Optionally, the apparatus further comprises:
the image acquisition module is used for acquiring non-foggy day images;
the image making module is used for processing the non-foggy day images through a first control parameter and a second control parameter to generate foggy day images with different illumination intensities and different foggy day grades, the first control parameter is used for controlling the illumination intensity of the generated foggy day images, and the second control parameter is used for controlling the foggy day grades of the generated foggy day images;
and the data construction module is used for constructing training data by utilizing the foggy day images.
Optionally, the apparatus further comprises:
the size adjusting module is used for adjusting the original image to respectively obtain a to-be-processed image with a first size and a to-be-processed image with a second size;
the parameter prediction module is specifically configured to input the image to be processed of the first size into the parameter prediction network;
the image enhancement module is specifically configured to perform image enhancement processing on the image to be processed of the second size according to the tuning parameters output by the parameter prediction network.
Optionally, the image enhancement module comprises:
the input submodule is used for inputting the image to be processed and the tuning parameters output by the parameter prediction network into an image enhancement module, and the image enhancement module comprises filters which correspond to the tuning parameters output by the parameter prediction network one by one;
and the processing submodule is used for sequentially carrying out image enhancement processing on the image to be processed by using the corresponding tuning parameters through each filter.
Optionally, the image enhancement module comprises at least one of a defogging filter, a white balance filter, a contrast filter, a tone filter, a sharpening filter, and a correction filter, an output of a previous filter being an input of a next filter.
Optionally, the parameter prediction network includes a first number of convolutional layers and a second number of fully-connected layers, where the convolutional layers are configured to perform convolution operation on an input image to be processed to output a feature map, and the fully-connected layers are configured to perform fully-connected operation on the feature map output by the convolutional layers to output tuning parameters.
Optionally, the target detection network comprises a YOLOX network.
The embodiment of the invention obtains a parameter prediction network and a target detection network by training data in a combined manner in advance, inputs the image to be processed into the trained parameter prediction network, can output tuning parameters through the parameter prediction network, performs image enhancement processing on the image to be processed by using the tuning parameters, can obtain an optimized image corresponding to the image to be processed, and inputs the optimized image into the trained target detection network to output a target detection result. The embodiment of the invention can automatically predict the tuning parameters required by the image to be processed through the parameter prediction network, does not need to have higher professional requirements on shooting personnel when shooting the image, and can also reduce the operation cost of a user. In addition, the parameter prediction network is a neural network obtained through training of a large amount of training data, and the training data comprise images under preset conditions (such as severe weather), so that the parameter prediction network provided by the embodiment of the invention can accurately predict tuning parameters required by the images under the preset conditions (such as severe weather), the adaptability of the parameter prediction network to the images under the preset conditions (such as severe weather) can be enhanced, and compared with manual parameter setting, the embodiment of the invention can improve the accuracy of the tuning parameters, and further improve the accuracy of target detection. Moreover, the parameter prediction network and the target detection network of the embodiment of the invention can be obtained through end-to-end training and testing, and the cost of manual debugging can be reduced.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The embodiments in the present specification are described in a progressive manner: each embodiment focuses on its differences from the others, and identical or similar parts may be cross-referenced among the embodiments.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium. When a processor of a device (a server or a terminal) executes the instructions in the storage medium, the device is enabled to perform the target detection method described in the embodiment corresponding to fig. 1, which will therefore not be repeated here. The beneficial effects, being the same as those of the method, are likewise not described again. For technical details not disclosed in the embodiments of the computer program product or computer program referred to in the present application, reference is made to the description of the method embodiments of the present application.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.
The target detection method, target detection device and machine-readable storage medium provided by the present invention have been described in detail above. Specific examples have been applied herein to explain the principles and embodiments of the invention, and the above description of the embodiments is intended only to aid understanding of the method and its core idea. Meanwhile, persons skilled in the art may, following the idea of the present invention, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (11)

1. A method of object detection, the method comprising:
inputting an image to be processed into a parameter prediction network, and outputting tuning parameters through the parameter prediction network, wherein the tuning parameters comprise at least one of defogging parameters, white balance parameters, contrast parameters, hue parameters, sharpening parameters and correction parameters;
performing image enhancement processing on the image to be processed according to the tuning parameters output by the parameter prediction network to obtain an optimized image corresponding to the image to be processed;
inputting the optimized image corresponding to the image to be processed into a target detection network for target detection, and outputting a target detection result through the target detection network, wherein the target detection network and the parameter prediction network are neural networks obtained by joint training in advance by using training data, and the training data comprises images meeting preset conditions.
2. The method of claim 1, wherein before inputting the image to be processed into the parameter prediction network, the method further comprises:
acquiring training data;
labeling targets contained in the training data to obtain a labeling result;
inputting the training data into an initial parameter prediction network, and outputting tuning parameters through the initial parameter prediction network;
performing image enhancement processing on the training data according to the tuning parameters output by the initial parameter prediction network to obtain an optimized image corresponding to the training data;
inputting the optimized image corresponding to the training data into an initial target detection network for target detection, and outputting a target detection result through the initial target detection network;
and calculating a joint loss value according to the difference between the target detection result output by the initial target detection network and the labeling result, and performing iterative optimization on the parameters of the initial parameter prediction network and the parameters of the initial target detection network until the joint loss value meets an iteration stop condition to obtain a trained parameter prediction network and a trained target detection network.
3. The method of claim 2, wherein the joint loss value is calculated by weighting the initial parameter prediction network loss value and the initial target detection network loss value.
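The weighted combination of claim 3 can be sketched as follows; the specific weight values are illustrative assumptions, since the claim states only that the two loss values are weighted.

```python
# Joint loss as a weighted combination of the two networks' losses (claim 3).
# The default weights are hypothetical; the claim does not fix them.

def joint_loss(pred_net_loss, det_net_loss, w_pred=0.5, w_det=0.5):
    # Weighted sum of the parameter prediction network loss and the
    # target detection network loss.
    return w_pred * pred_net_loss + w_det * det_net_loss

loss = joint_loss(0.4, 0.8)  # -> 0.6 with equal weights
```

During training, both networks' parameters would be updated against this single scalar, which is what makes the optimization joint and end-to-end.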
4. The method according to any one of claims 1 to 3, wherein before inputting the image to be processed into the parameter prediction network, the method further comprises:
acquiring a non-foggy day image;
processing the non-foggy day image through a first control parameter and a second control parameter to generate foggy day images with different illumination intensities and different foggy day grades, wherein the first control parameter is used for controlling the illumination intensity of the generated foggy day images, and the second control parameter is used for controlling the foggy day grades of the generated foggy day images;
and constructing training data by using the foggy day image.
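One common way to realize the fog synthesis of claim 4 is the atmospheric scattering model I = J·t + A·(1 − t) with t = exp(−β·d). The claim does not name a specific model, so treating the first control parameter as the atmospheric light A (illumination intensity) and the second as the scattering coefficient β (fog grade) is an assumption for illustration.

```python
import math

# Plausible realization of claim 4's fog synthesis via the standard
# atmospheric scattering model. A = illumination control parameter,
# beta = fog-grade control parameter; this exact model is an assumption.

def synthesize_fog(clear_pixels, depths, illumination_a, fog_beta):
    foggy = []
    for j, d in zip(clear_pixels, depths):
        t = math.exp(-fog_beta * d)          # transmission falls with depth
        foggy.append(j * t + illumination_a * (1.0 - t))
    return foggy

clear = [0.1, 0.5, 0.9]                      # toy non-foggy pixel values
depths = [1.0, 2.0, 3.0]                     # toy scene depths
foggy = synthesize_fog(clear, depths, illumination_a=0.8, fog_beta=0.6)
```

Sweeping the two parameters over grids yields foggy images of different illumination intensities and fog grades from a single clear image, which is how the synthetic training data of claim 4 could be constructed.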
5. The method of claim 1, wherein before inputting the image to be processed into the parameter prediction network, the method further comprises:
adjusting the original image to respectively obtain a to-be-processed image with a first size and a to-be-processed image with a second size;
the inputting of the image to be processed into the parameter prediction network comprises:
inputting the image to be processed with the first size into the parameter prediction network;
the performing image enhancement processing on the image to be processed according to the tuning parameters output by the parameter prediction network comprises:
performing image enhancement processing on the image to be processed with the second size according to the tuning parameters output by the parameter prediction network.
6. The method according to claim 1, wherein the performing image enhancement processing on the image to be processed according to the tuning parameters output by the parameter prediction network comprises:
inputting the image to be processed and the tuning parameters output by the parameter prediction network into an image enhancement module, wherein the image enhancement module comprises filters in one-to-one correspondence with the tuning parameters output by the parameter prediction network;
and performing image enhancement processing on the image to be processed through each filter in sequence, using the corresponding tuning parameters.
7. The method of claim 6, wherein the image enhancement module comprises at least one of a defogging filter, a white balance filter, a contrast filter, a hue filter, a sharpening filter, and a correction filter, wherein an output of a previous filter is used as an input to a next filter.
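The chained filters of claims 6 and 7, where each filter's output serves as the next filter's input, can be sketched as follows; the filter bodies are simplified placeholders rather than the patent's actual operators.

```python
# Sketch of the image enhancement module's filter chain (claims 6-7):
# each filter consumes the previous filter's output together with its own
# tuning parameter. Filter implementations are illustrative placeholders.

def white_balance_filter(image, gain):
    return [min(1.0, p * gain) for p in image]

def contrast_filter(image, factor):
    mean = sum(image) / len(image)
    return [max(0.0, min(1.0, (p - mean) * factor + mean)) for p in image]

def apply_filter_chain(image, chain):
    # chain: ordered list of (filter_fn, tuning_parameter) pairs.
    for filter_fn, param in chain:
        image = filter_fn(image, param)   # previous output feeds the next filter
    return image

result = apply_filter_chain([0.2, 0.4, 0.9],
                            [(white_balance_filter, 1.1),
                             (contrast_filter, 1.3)])
```

A full chain would run the defogging, white balance, contrast, hue, sharpening and correction filters in order, each driven by its predicted tuning parameter.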
8. The method of claim 1, wherein the parameter prediction network comprises a first number of convolutional layers and a second number of fully-connected layers, the convolutional layers being used for performing convolution operations on the input image to be processed to output a feature map, and the fully-connected layers being used for mapping the feature map output by the convolutional layers to the tuning parameters.
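A minimal one-dimensional sketch of the claim 8 architecture: a convolution produces a feature map, and a fully connected mapping turns it into a tuning parameter. The kernel and weight values are arbitrary illustrations, not trained parameters.

```python
# Toy 1-D analogue of claim 8: convolutional layer -> feature map ->
# fully connected layer -> tuning parameter. All values are illustrative.

def conv1d(signal, kernel):
    # Valid (no-padding) 1-D convolution producing the "feature map".
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def fully_connected(features, weights, bias):
    # Dense layer: weighted sum of the feature map plus a bias.
    return sum(f * w for f, w in zip(features, weights)) + bias

signal = [0.1, 0.4, 0.3, 0.8, 0.6]               # stand-in for image pixels
feature_map = conv1d(signal, kernel=[0.5, 0.5])  # "convolutional layer"
tuning_param = fully_connected(feature_map, [0.25] * len(feature_map), 0.1)
```

A real implementation would stack several 2-D convolutional layers followed by fully-connected layers emitting one value per tuning parameter.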
9. The method of claim 1, wherein the target detection network comprises a YOLOX network.
10. An object detection apparatus, characterized in that the apparatus comprises:
the parameter prediction module is used for inputting the image to be processed into a parameter prediction network and outputting tuning parameters through the parameter prediction network, wherein the tuning parameters comprise at least one of defogging parameters, white balance parameters, contrast parameters, hue parameters, sharpening parameters and correction parameters;
the image enhancement module is used for carrying out image enhancement processing on the image to be processed according to the tuning parameters output by the parameter prediction network to obtain an optimized image corresponding to the image to be processed;
and the target detection module is used for inputting the optimized image corresponding to the image to be processed into a target detection network for target detection, and outputting a target detection result through the target detection network, wherein the target detection network and the parameter prediction network are neural networks obtained by joint training by utilizing training data in advance, and the training data comprise images meeting preset conditions.
11. A machine-readable storage medium having stored thereon instructions which, when executed by one or more processors of an apparatus, cause the apparatus to perform the object detection method of any one of claims 1 to 9.
CN202210684927.6A 2022-06-14 2022-06-14 Target detection method and device and readable storage medium Pending CN115100500A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210684927.6A CN115100500A (en) 2022-06-14 2022-06-14 Target detection method and device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210684927.6A CN115100500A (en) 2022-06-14 2022-06-14 Target detection method and device and readable storage medium

Publications (1)

Publication Number Publication Date
CN115100500A true CN115100500A (en) 2022-09-23

Family

ID=83290702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210684927.6A Pending CN115100500A (en) 2022-06-14 2022-06-14 Target detection method and device and readable storage medium

Country Status (1)

Country Link
CN (1) CN115100500A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116843583A (en) * 2023-09-01 2023-10-03 荣耀终端有限公司 Image processing method, device, electronic equipment and storage medium
CN116843583B (en) * 2023-09-01 2024-05-14 荣耀终端有限公司 Image processing method, device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination