CN113936134A - Target detection method and device

Target detection method and device

Info

Publication number
CN113936134A
Authority
CN
China
Prior art keywords
target
detection
position information
detected
image
Legal status
Pending
Application number
CN202010602952.6A
Other languages
Chinese (zh)
Inventor
李翔
杨志雄
李亚
王文海
李俊
Current Assignee
Momenta Suzhou Technology Co Ltd
Original Assignee
Momenta Suzhou Technology Co Ltd
Application filed by Momenta Suzhou Technology Co Ltd
Priority to CN202010602952.6A
Publication of CN113936134A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

The embodiment of the invention discloses a target detection method and device, wherein the method comprises the following steps: obtaining an image to be detected; performing feature extraction on the image to be detected by using the feature extraction layer of a pre-established target detection model, and determining the features of the image to be detected corresponding to the image to be detected; determining, by using the feature regression layer of the pre-established target detection model and the features of the image to be detected, a probability value corresponding to each initial frame boundary position information corresponding to each detection target detected in the image to be detected; and, for each detection target, determining the target detection frame position information corresponding to the detection target based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target. The pre-established target detection model is a model obtained by training based on sample images and their corresponding calibration information, where the calibration information comprises the calibration frame position information corresponding to the sample targets in the corresponding sample images. More accurate detection of the boundary of a target in an image is thereby achieved.

Description

Target detection method and device
Technical Field
The invention relates to the technical field of target detection, in particular to a target detection method and device.
Background
Currently, training data for training a target detection model generally defines the true boundary of a target in an image as clearly as possible. Nevertheless, in part of the training data the boundary of a target in a sample image remains ambiguous or blurred, for example when the target in the image is partially occluded or when lighting leaves part of its outline unclear. In such cases, the bounding frame position information in the calibration information corresponding to the sample image cannot clearly represent the true boundary of the target in the image.
However, the frame regression method of the feature regression layer of current target detection models generally adopts single-value regression; that is, the frame of a target in an image is assumed to satisfy a Dirac delta distribution, so that the bounding frame position information in the calibration information corresponding to a sample image is taken to be the true boundary of the target in that image. Consequently, in the process of training the target detection model with such data, the model does not learn the uncertainty of the target boundary. When such a model is subsequently used, the detection result for a target that is partially occluded in an image, or whose outline is partially unclear because of lighting, may contain a large error; that is, the position of the detection frame determined for a target in the above situations is not accurate enough.
Disclosure of Invention
The invention provides a target detection method and a target detection device for detecting the boundary of a target in an image more accurately. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a target detection method, where the method includes:
obtaining an image to be detected;
performing feature extraction on the image to be detected by using the feature extraction layer of a pre-established target detection model, and determining the features of the image to be detected corresponding to the image to be detected;
determining, by using the feature regression layer of the pre-established target detection model and the features of the image to be detected, a probability value corresponding to each initial frame boundary position information corresponding to each detection target detected in the image to be detected; and, for each detection target, determining the target detection frame position information corresponding to the detection target based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target, wherein the pre-established target detection model is a model obtained by training based on sample images and their corresponding calibration information, and the calibration information comprises the calibration frame position information corresponding to the sample targets in the corresponding sample images.
Optionally, the step of determining, for each detection target, target detection frame position information corresponding to the detection target based on a probability value corresponding to each initial frame boundary position information corresponding to the detection target includes:
and for each detection target, determining the boundary position information of the target detection frame corresponding to the detection target on each azimuth based on the probability value corresponding to the boundary position information of each initial frame corresponding to the detection target on each azimuth so as to determine the position information of the target detection frame corresponding to the detection target, wherein the azimuth comprises the upper azimuth, the lower azimuth, the left azimuth and the right azimuth of the image to be detected.
Optionally, the step of determining, for each detection target, the target detection frame boundary position information corresponding to the detection target in each direction based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target in that direction includes:
determining the target detection frame boundary position information corresponding to the detection target in each direction based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target in that direction and a preset integral formula.
Optionally, the expression of the preset integral formula is as follows:

P = Σ_{i=0}^{n} a_i · p(a_i)

where P represents the target translation value between the position represented by the target detection frame boundary position information corresponding to the detection target in the target direction and the position of the activation point corresponding to the detection target as predicted by the pre-established target detection model; a_i represents the preset translation value corresponding to the i-th initial frame boundary position information corresponding to the detection target in the target direction; p(a_i) represents the probability value corresponding to that preset translation value, i.e., the probability value corresponding to the i-th initial frame boundary position information corresponding to the detection target in the target direction; n + 1 represents the total number of preset translation values corresponding to the initial frame boundary position information corresponding to the detection target in the target direction; and the target direction is any one of the directions.
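For intuition, a small worked example with assumed numbers (not taken from the patent): suppose the preset translation values in some direction are a_0 = 0, a_1 = 2, a_2 = 4, a_3 = 6 (so n + 1 = 4), and the regressed probability values are p(a_i) = 0.1, 0.2, 0.6, 0.1. Then P = 0×0.1 + 2×0.2 + 4×0.6 + 6×0.1 = 3.4, i.e., the frame boundary in that direction is placed 3.4 pixels of translation away from the activation point; the flatter the distribution, the more the result blends several candidate positions.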
Optionally, before the step of determining, by using the feature regression layer of the pre-established target detection model and the features of the image to be detected, a probability value corresponding to each initial frame boundary position information corresponding to each detection target detected in the image to be detected, and determining, for each detection target, the target detection frame position information corresponding to the detection target based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target, the method further includes:
a process of training a pre-established target detection model, wherein the process comprises:
obtaining a plurality of sample images and calibration information corresponding to each sample image;
obtaining an initial target detection model;
inputting the sample image into a feature extraction layer of the initial target detection model aiming at each sample image, and extracting to obtain sample image features corresponding to the sample image;
for each sample image, inputting the sample image features corresponding to the sample image into the feature regression layer of the initial target detection model, and determining the probability value corresponding to each prediction frame boundary position information corresponding to each direction of each sample target in the sample image; and determining the current frame position information corresponding to each sample target in the sample image based on the probability value corresponding to each prediction frame boundary position information corresponding to each direction of the sample target and a preset integral formula;
for each sample image, determining a current loss value by using a preset loss function, the current frame position information corresponding to each sample target in the sample image, and the calibration frame position information corresponding to each sample target in the calibration information corresponding to the sample image;
judging whether the current loss value exceeds a preset loss threshold value or not;
if the current loss value is judged to exceed a preset loss threshold value, adjusting model parameters of a feature extraction layer and a feature regression layer of the initial target detection model, returning to execute the step of inputting the sample image into the feature extraction layer of the initial target detection model aiming at each sample image, and extracting the sample image feature corresponding to the sample image;
and if the current loss value is judged not to exceed the preset loss threshold value, determining that the initial target detection model reaches a convergence state, and determining a pre-established target detection model comprising a feature extraction layer and a feature regression layer.
Optionally, after the step of determining, by using the feature regression layer of the pre-established target detection model and the features of the image to be detected, a probability value corresponding to each initial frame boundary position information corresponding to each detection target detected in the image to be detected, and determining, for each detection target, the target detection frame position information corresponding to the detection target based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target, the method further includes:
and outputting the probability value corresponding to each initial frame boundary position information corresponding to each detected target and/or the target detection frame position information corresponding to each detected target.
In a second aspect, an embodiment of the present invention provides an object detection apparatus, where the apparatus includes:
an obtaining module configured to obtain an image to be detected;
the first determining module is configured to perform feature extraction on the image to be detected by using the feature extraction layer of a pre-established target detection model, and determine the features of the image to be detected corresponding to the image to be detected;
the second determining module is configured to determine, by using the feature regression layer of the pre-established target detection model and the features of the image to be detected, a probability value corresponding to each initial frame boundary position information corresponding to each detection target detected in the image to be detected; and, for each detection target, determine the target detection frame position information corresponding to the detection target based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target, wherein the pre-established target detection model is a model obtained by training based on sample images and their corresponding calibration information, and the calibration information comprises the calibration frame position information corresponding to the sample targets in the corresponding sample images.
Optionally, the second determining module is specifically configured to
for each detection target, determine the target detection frame boundary position information corresponding to the detection target in each direction based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target in that direction, so as to determine the target detection frame position information corresponding to the detection target, wherein the directions comprise the top, bottom, left and right directions of the image to be detected.
Optionally, the second determining module is specifically configured to
determine the target detection frame boundary position information corresponding to the detection target in each direction based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target in that direction and a preset integral formula.
Optionally, the expression of the preset integral formula is as follows:

P = Σ_{i=0}^{n} a_i · p(a_i)

where P represents the target translation value between the position represented by the target detection frame boundary position information corresponding to the detection target in the target direction and the position of the activation point corresponding to the detection target as predicted by the pre-established target detection model; a_i represents the preset translation value corresponding to the i-th initial frame boundary position information corresponding to the detection target in the target direction; p(a_i) represents the probability value corresponding to that preset translation value; n + 1 represents the total number of preset translation values corresponding to the initial frame boundary position information corresponding to the detection target in the target direction; and the target direction is any one of the directions.
Optionally, the apparatus further comprises:
the model training module is configured to train and obtain the pre-established target detection model before the probability value corresponding to each initial frame boundary position information corresponding to each detection target detected in the image to be detected is determined by using the feature regression layer of the pre-established target detection model and the features of the image to be detected, and before, for each detection target, the target detection frame position information corresponding to the detection target is determined based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target, wherein the model training module is specifically configured to obtain a plurality of sample images and the calibration information corresponding to each sample image;
obtaining an initial target detection model;
inputting the sample image into a feature extraction layer of the initial target detection model aiming at each sample image, and extracting to obtain sample image features corresponding to the sample image;
for each sample image, inputting the sample image features corresponding to the sample image into the feature regression layer of the initial target detection model, and determining the probability value corresponding to each prediction frame boundary position information corresponding to each direction of each sample target in the sample image; and determining the current frame position information corresponding to each sample target in the sample image based on the probability value corresponding to each prediction frame boundary position information corresponding to each direction of the sample target and a preset integral formula;
for each sample image, determining a current loss value by using a preset loss function, the current frame position information corresponding to each sample target in the sample image, and the calibration frame position information corresponding to each sample target in the calibration information corresponding to the sample image;
judging whether the current loss value exceeds a preset loss threshold value or not;
if the current loss value exceeds a preset loss threshold value, adjusting model parameters of a feature extraction layer and a feature regression layer of the initial target detection model, returning to execute the step of inputting the sample image into the feature extraction layer of the initial target detection model aiming at each sample image, and extracting to obtain sample image features corresponding to the sample image;
and if the current loss value is judged not to exceed the preset loss threshold value, determining that the initial target detection model reaches a convergence state, and determining a pre-established target detection model comprising a feature extraction layer and a feature regression layer.
Optionally, the apparatus further comprises:
the output module is configured to, after the probability value corresponding to each initial frame boundary position information corresponding to each detection target detected in the image to be detected is determined by using the feature regression layer of the pre-established target detection model and the features of the image to be detected, and after, for each detection target, the target detection frame position information corresponding to the detection target is determined based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target, output the probability value corresponding to each initial frame boundary position information corresponding to each detection target and/or the target detection frame position information corresponding to each detection target.
As can be seen from the above, the target detection method and device provided by the embodiment of the invention obtain an image to be detected; perform feature extraction on the image to be detected by using the feature extraction layer of a pre-established target detection model to determine the features of the image to be detected; determine, by using the feature regression layer of the pre-established target detection model and the features of the image to be detected, a probability value corresponding to each initial frame boundary position information corresponding to each detection target detected in the image to be detected; and, for each detection target, determine the target detection frame position information corresponding to the detection target based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target. The pre-established target detection model is a model obtained by training based on sample images and their corresponding calibration information, and the calibration information comprises the calibration frame position information corresponding to the sample targets in the corresponding sample images.
By applying the embodiment of the invention, the probability value corresponding to each initial frame boundary position information corresponding to each detection target can be regressed by the regression layer of the pre-established target detection model; that is, the probability distribution of the frame boundary of each detection target in each direction, i.e., the uncertainty of the frame boundary, is regressed. For each detection target, the frame boundary position information in each direction is then determined based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target, so as to determine the target detection frame position information corresponding to the detection target. More accurate detection of the frame boundary of a target in an image is thereby achieved. Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
The innovation points of the embodiment of the invention comprise:
1. The probability value corresponding to each initial frame boundary position information corresponding to each detection target is regressed by the regression layer of the pre-established target detection model; that is, the probability distribution of each frame boundary corresponding to each detection target, i.e., the uncertainty of the frame boundary, is regressed. For each detection target, the position information of each frame boundary is then determined based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target, so as to determine the target detection frame position information corresponding to the detection target, thereby detecting the frame boundary of the target in the image more accurately.
2. For each detection target, the target detection frame boundary position information corresponding to the detection target in each direction is determined based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target in the top, bottom, left and right directions of the image to be detected, i.e., based on the probability distribution (uncertainty) of the frame boundary in each direction of the detection target, so as to determine the target detection frame position information corresponding to the detection target. Frame boundaries with more accurate positions in each direction are thus obtained, and more accurate position information of the detection target is determined.
3. The target detection frame boundary position information corresponding to the detection target in each direction is obtained by integration, based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target in that direction and a preset integral formula, so as to determine more accurate position information of the detection target.
4. In the process of training the pre-established target detection model, the probability value corresponding to each prediction frame boundary position information corresponding to each direction of a sample target in a sample image is regressed by the feature regression layer of the initial target detection model; that is, the probability values corresponding to different prediction frame boundary position information in each direction, i.e., the probability distribution (uncertainty) of the frame boundary of the sample target in each direction, are determined. The current frame position information corresponding to the sample target is then determined by combining the preset integral formula with the probability value corresponding to each prediction frame boundary position information corresponding to each direction of the sample target. For each sample image, a current loss value is determined by using a preset loss function, the current frame position information corresponding to each sample target in the sample image, and the calibration frame position information corresponding to each sample target in the calibration information corresponding to the sample image, and this decides whether to adjust the model parameters of the feature extraction layer and the feature regression layer of the initial target detection model, i.e., whether training of the initial target detection model is finished. The pre-established target detection model obtained by training thus learns the ability to predict the probability distribution, i.e., the uncertainty, of the frame boundary of a target in each direction in an image, and learns a more robust and stable image feature representation, which provides a basis for accurately determining the frame boundary position information of targets and a reference for subsequent tasks.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is to be understood that the drawings in the following description are merely exemplary of some embodiments of the invention. For a person skilled in the art, without inventive effort, further figures can be obtained from these figures.
Fig. 1 is a schematic flow chart of a target detection method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a training process for a pre-established target detection model;
FIG. 3 is a schematic view showing a visualization of probability values corresponding to each initial frame boundary position information corresponding to the detected target in each direction;
fig. 4 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the embodiments and drawings of the present invention are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
The invention provides a target detection method and a target detection device for detecting the boundary of a target in an image more accurately. The following provides a detailed description of embodiments of the invention.
Fig. 1 is a schematic flow chart of a target detection method according to an embodiment of the present invention. The method may comprise the following steps:
S101: and obtaining an image to be detected.
The target detection method provided by the embodiment of the invention can be applied to any electronic device with computing capability, and the electronic device may be a terminal or a server. In one implementation, the electronic device may be an on-board device installed in a vehicle; the vehicle may also be provided with an image acquisition device that captures images of the environment in which the vehicle is located, and the electronic device, connected to the image acquisition device, obtains the images captured by the image acquisition device as images to be detected. In another implementation, the electronic device may be an off-board device connected to an image acquisition device that captures images of a target scene, and it obtains the images captured for the target scene as images to be detected; the target scene may be, for example, a road scene, a square scene, or an indoor scene.
The image to be detected may be an RGB (Red Green Blue) image or an infrared image; either is possible. The embodiment of the invention does not limit the type of the image to be detected.
S102: and performing feature extraction on the image to be detected by using a pre-established feature extraction layer of the target detection model, and determining the feature of the image to be detected corresponding to the image to be detected.
In this step, the electronic device performs operations such as convolution and pooling on the image to be detected by using the feature extraction layer of the pre-established target detection model, so as to extract the image features corresponding to the image to be detected as the features of the image to be detected. The feature extraction layer may extract image features in any way available in the related art; the embodiment of the present invention is not limited in this respect.
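As a purely illustrative sketch of such a feature extraction layer in PyTorch (the architecture, layer sizes, and the name FeatureExtractor are assumptions for exposition, not the patent's concrete design):

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Hypothetical feature extraction layer: stacked convolution + pooling."""

    def __init__(self, in_channels: int = 3, out_channels: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),  # convolution operation
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                       # pooling operation
            nn.Conv2d(64, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (batch, 3, H, W) -> features: (batch, 256, H/2, W/2)
        return self.body(image)
```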
S103: determining a probability value corresponding to each initial frame boundary position information corresponding to each detected target detected in an image to be detected by utilizing a pre-established feature regression layer of a target detection model and the features of the image to be detected; and aiming at each detection target, determining target detection frame position information corresponding to the detection target based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target.
The pre-established target detection model is a model obtained by training based on sample images and their corresponding calibration information, where the calibration information comprises the calibration frame position information corresponding to the sample targets in the corresponding sample images. The feature regression layer of the pre-established target detection model has a preset number of regression outputs for each direction, and each regression output of each direction corresponds to a preset translation value.
After obtaining the features of the image to be detected, the electronic device uses the feature regression layer and the features of the image to be detected to regress the probability value corresponding to each initial frame boundary position information corresponding to each detection target detected in the image to be detected. The probability values corresponding to the initial frame boundary position information of each detection target include the probability values corresponding to the initial frame boundary position information in each direction of the detection target, i.e., in the four directions of top, bottom, left and right of the image to be detected. The probability values corresponding to the initial frame boundary position information in each direction of the detection target can represent a discretized generalized probability density function of the frame boundary in that direction, that is, the probability distribution of the frame boundary in that direction, i.e., the uncertainty of the position of the frame boundary in that direction.
Subsequently, for each detection target, the electronic device determines the target detection frame boundary position information corresponding to the detection target in each direction based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target in that direction, that is, based on the probability distribution of the frame boundary of the detection target in each direction, so as to determine the target detection frame position information corresponding to the detection target.
The above process of regressing the probability value corresponding to each initial frame boundary position information corresponding to each detection target in the image to be detected by using the feature regression layer and the features of the image to be detected may be as follows: the features of the image to be detected are input into the feature regression layer of the pre-established target detection model; the feature regression layer performs regression processing on the features and regresses a preset number of regression values for each detection target in each direction; and, for the regression values of each detection target in each direction, a preset number of probability values are determined by applying a preset activation function, namely the softmax function, to the preset number of regression values of the detection target in that direction, thereby obtaining the probability value corresponding to each initial frame boundary position information corresponding to each detection target. Each of the preset number of regression values of the detection target in each direction corresponds to one of the preset translation values mentioned below; after translating by the preset translation value from the corresponding activation point, the initial frame boundary position information corresponding to the detection target is obtained. The position of the activation point corresponding to the detection target is predicted and regressed by the pre-established target detection model; the way the model predicts and regresses the position of the activation point can follow the way a target detection model in the related art predicts the position of the activation point corresponding to a target in an image, which is not described again here.
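A minimal sketch of this regression step, written in PyTorch under assumed shapes; the class name, channel sizes, and n + 1 = 8 are illustrative assumptions, while the layout (a preset number of regression values per direction, turned into probability values by softmax) follows the description above:

```python
import torch
import torch.nn as nn

class BoundaryDistributionHead(nn.Module):
    """Hypothetical feature regression layer: for each activation point it
    regresses n + 1 values per direction, which softmax turns into the
    probability values of the initial frame boundary positions."""

    def __init__(self, feat_channels: int = 256, n_plus_1: int = 8):
        super().__init__()
        self.n_plus_1 = n_plus_1
        # 4 directions (top, bottom, left, right), n + 1 regression outputs each
        self.reg = nn.Conv2d(feat_channels, 4 * n_plus_1, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, _, h, w = feats.shape
        logits = self.reg(feats).view(b, 4, self.n_plus_1, h, w)
        # Softmax over the n + 1 candidate translation values of each direction
        return logits.softmax(dim=2)
```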
The preset translation values are determined based on the calibration distances between each frame boundary of the calibration frames, represented by the calibration frame position information in the calibration information corresponding to the sample images used to train the pre-established target detection model, and the center points of those calibration frames; that is, based on the calibration distances between the frame boundaries in the four directions of top, bottom, left and right of each calibration frame and the center point of the calibration frame. The minimum preset translation value may be 0, and the maximum preset translation value is the largest of the calibration distances. The values between the minimum and the maximum may increase uniformly or non-uniformly.
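By way of example only, the uniformly increasing case could be laid out as follows; the maximum calibration distance of 16 pixels and n + 1 = 8 are assumed values:

```python
import torch

n_plus_1 = 8                 # preset number of regression outputs per direction
max_calib_dist = 16.0        # assumed maximum calibration distance over the samples
# Uniformly increasing preset translation values from the minimum 0 to the maximum
translation_values = torch.linspace(0.0, max_calib_dist, n_plus_1)
# tensor([ 0.0000,  2.2857,  4.5714,  6.8571,  9.1429, 11.4286, 13.7143, 16.0000])
```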
By applying the embodiment of the invention, the probability value corresponding to each initial frame boundary position information corresponding to each detection target can be regressed by the regression layer of the pre-established target detection model; that is, the probability distribution of each frame boundary corresponding to each detection target, i.e., the uncertainty of the frame boundary, is regressed. For each detection target, the position information of each frame boundary is then determined based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target, so as to determine the target detection frame position information corresponding to the detection target, and the frame boundary of the target in the image is detected more accurately.
In another embodiment of the present invention, the step of determining, for each detection target, the target detection frame boundary position information corresponding to the detection target in each direction based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target in that direction may include:
determining the target detection frame boundary position information corresponding to the detection target in each direction based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target in that direction and a preset integral formula.
In this implementation, the electronic device determines, by using the feature regression layer of the pre-established target detection model, the probability value corresponding to each initial frame boundary position information corresponding to each detection target in each direction, that is, the probability distribution (uncertainty) of each frame boundary of each detection target in each direction, and then determines, by integration, the target detection frame boundary position information corresponding to each detection target in each direction based on those probability values and a preset integral formula.
In another embodiment of the present invention, in the process of training the pre-established target detection model, the model may learn the probability density function p(x) of the position of the frame boundary in each direction of a target in an image. To simplify the regression complexity of the feature regression layer of the target detection model, the regression range of the feature regression layer is correspondingly limited to [a_0, a_n], where a_0 is the minimum of the above-mentioned preset translation values and a_n is the maximum of the above-mentioned preset translation values, and this range is discretized. Correspondingly, the expression of the preset integral formula may be:

P = Σ_{i=0}^{n} a_i · p(a_i)

where P represents the target translation value between the position represented by the target detection frame boundary position information corresponding to the detection target in the target direction and the position of the activation point corresponding to the detection target as predicted by the pre-established target detection model; a_i represents the preset translation value corresponding to the i-th initial frame boundary position information corresponding to the detection target in the target direction; p(a_i) represents the probability value corresponding to that preset translation value, i.e., the probability value corresponding to the i-th initial frame boundary position information corresponding to the detection target in the target direction; n + 1 represents the total number of preset translation values corresponding to the initial frame boundary position information corresponding to the detection target in the target direction; and the target direction is any one of the directions.
Here n is a positive integer, and n + 1 is the above-mentioned preset number.
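Under the assumptions of the sketches above, the discretized preset integral formula reduces to an expectation over the preset translation values; a minimal decoding sketch:

```python
import torch

def decode_boundary(probs: torch.Tensor, translation_values: torch.Tensor) -> torch.Tensor:
    """P = sum_i a_i * p(a_i): expected translation of each frame boundary.

    probs: (batch, 4, n + 1, H, W) probability value per preset translation value.
    translation_values: (n + 1,) the a_i values.
    Returns (batch, 4, H, W): translation of the top/bottom/left/right boundary
    away from each activation point.
    """
    return (probs * translation_values.view(1, 1, -1, 1, 1)).sum(dim=2)
```

Adding the four decoded translation values to the coordinates of the corresponding activation point then yields the target detection frame position information for the detection target.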
In another embodiment of the present invention, a probability value corresponding to each initial frame boundary position information corresponding to each detected target detected in an image to be detected is determined by using a feature regression layer of a pre-established target detection model and the features of the image to be detected; before the step of determining, for each detection target, the target detection frame position information corresponding to the detection target based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target, the method may further include:
a process of training a pre-established target detection model, wherein, as shown in fig. 2, the process may include the following steps:
S201: obtaining a plurality of sample images and the calibration information corresponding to each sample image;
S202: obtaining an initial target detection model;
S203: for each sample image, inputting the sample image into the feature extraction layer of the initial target detection model, and extracting the sample image features corresponding to the sample image;
S204: for each sample image, inputting the sample image features corresponding to the sample image into the feature regression layer of the initial target detection model, and determining the probability value corresponding to each prediction frame boundary position information corresponding to each direction of each sample target in the sample image; and determining the current frame position information corresponding to each sample target in the sample image based on the probability value corresponding to each prediction frame boundary position information corresponding to each direction of the sample target and a preset integral formula;
S205: for each sample image, determining a current loss value by using a preset loss function, the current frame position information corresponding to each sample target in the sample image, and the calibration frame position information corresponding to each sample target in the calibration information corresponding to the sample image;
S206: judging whether the current loss value exceeds a preset loss threshold;
S207: if the current loss value exceeds the preset loss threshold, adjusting the model parameters of the feature extraction layer and the feature regression layer of the initial target detection model, and returning to execute S203;
S208: if the current loss value does not exceed the preset loss threshold, determining that the initial target detection model has reached a convergence state, and obtaining the pre-established target detection model comprising the feature extraction layer and the feature regression layer.
In this implementation, before determining the target detection result corresponding to the image to be detected, the electronic device may also carry out the process of training the pre-established target detection model. Correspondingly, the electronic device obtains a plurality of sample images and the calibration information corresponding to each sample image, where the calibration information comprises the calibration frame position information corresponding to the sample targets in the corresponding sample image. It obtains an initial target detection model whose feature regression layer has a preset number of regression outputs for each of the top, bottom, left and right directions of the image. For each sample image, the sample image is input into the feature extraction layer of the initial target detection model, and the sample image features corresponding to the sample image are extracted; the sample image features are then input into the feature regression layer of the initial target detection model, which performs regression processing on them and regresses a preset number of regression values for each sample target in each direction. A preset number of probability values are determined by applying a preset activation function, namely the softmax function, to the preset number of regression values of the sample target in each direction; that is, the probability value corresponding to each prediction frame boundary position information corresponding to each direction of the sample target in the sample image is determined. For each sample target in the sample image, the current frame position information corresponding to the sample target is obtained by integration based on the probability value corresponding to each prediction frame boundary position information corresponding to each direction of the sample target and a preset integral formula.
For each sample image, a current loss value is determined by using a preset loss function, the current frame position information corresponding to each sample target in the sample image, and the calibration frame position information corresponding to each sample target in the calibration information corresponding to the sample image, and it is judged whether the current loss value exceeds a preset loss threshold. If the current loss value exceeds the preset loss threshold, the initial target detection model is determined not to have reached a convergence state; the model parameters of its feature extraction layer and feature regression layer are adjusted by a preset optimization algorithm, and S203 is executed again. If the current loss value does not exceed the preset loss threshold, the initial target detection model is determined to have reached a convergence state, and the pre-established target detection model comprising the feature extraction layer and the feature regression layer is obtained. Through this training process, the pre-established target detection model learns the probability density function of the position of the frame boundary of a target in an image; it can predict the probability distribution of the position of the frame boundary of a target in an image, can explicitly model the uncertainty of the frame boundary of a target in an image, and learns a more robust and stable representation of image features, which provides a basis for accurately determining the position information of targets in subsequent images.
The preset loss function may be any type of loss function used in neural network models in the related art; the embodiment of the present invention is not limited in this respect. The preset optimization algorithm may include, but is not limited to, gradient descent.
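A compressed sketch of S201 to S208, reusing the modules sketched above; the SmoothL1 box loss, the loss threshold of 0.05, and the helpers sample_loader and gather_targets (which would match decoded boundaries to calibrated sample targets) are assumptions, since the patent leaves the loss function, threshold, and matching open:

```python
import torch

extractor = FeatureExtractor()                # feature extraction layer (sketched above)
head = BoundaryDistributionHead()             # feature regression layer (sketched above)
params = list(extractor.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)  # preset optimization algorithm: gradient descent
loss_fn = torch.nn.SmoothL1Loss()             # assumed preset loss function
loss_threshold = 0.05                         # assumed preset loss threshold

for sample_image, calib_boxes in sample_loader:          # hypothetical data loader
    feats = extractor(sample_image)                      # S203: sample image features
    probs = head(feats)                                  # S204: per-direction probability values
    boxes = decode_boundary(probs, translation_values)   # S204: preset integral formula
    # S205: hypothetical matching of decoded boundaries to calibrated sample targets
    loss = loss_fn(gather_targets(boxes), calib_boxes)
    if loss.item() > loss_threshold:                     # S206/S207: adjust model parameters
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    else:                                                # S208: convergence state reached
        break
```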
In another embodiment of the present invention, after S103, the method may further include the following step:
and outputting the probability value corresponding to each initial frame boundary position information corresponding to each detected target and/or the target detection frame position information corresponding to each detected target.
In this implementation, in one case, the electronic device may directly output, in numerical form, the probability value corresponding to each initial frame boundary position information corresponding to each detection target and/or the target detection frame position information corresponding to each detection target. In another case, the electronic device may output the same information in graphical form.
When the electronic device outputs the target detection frame position information corresponding to each detection target, it can draw the detection frame represented by that position information at the corresponding position of the image to be detected, so that a user can intuitively observe the position of the detection target.
When the probability value corresponding to each initial frame boundary position information corresponding to each detection target is output, these probability values can be visualized: for each direction of each detection target, a histogram is drawn from the probability values corresponding to the initial frame boundary position information of the detection target in that direction, where the ordinate of the histogram represents the probability value and the abscissa represents the preset translation value. As shown in fig. 3, the left image in fig. 3 is the image to be detected, and the four histograms on the right respectively represent the probability values corresponding to the preset translation values in the left ("left"), top ("top"), right ("right") and bottom ("bottom") directions of target A in the left image, that is, the probability values, i.e., the probability distributions, corresponding to the frame boundary position information of target A in the left, top, right and bottom directions.
The probability distribution and the uncertainty of the frame boundary in each direction of a detection target can thus be observed intuitively in graphical form. If the histogram for some direction of a target shows a flatter distribution of the probability values corresponding to the initial frame boundary position information, the position of the frame boundary of the target in that direction has higher uncertainty; if the histogram shows a sharper distribution, the position of the frame boundary in that direction has higher certainty, indicating that the pre-established target detection model predicts that frame boundary clearly and accurately. As shown in fig. 3, the frame boundaries in the top, left and right directions of target A in the left image are relatively clear; correspondingly, the histograms for the top, left and right directions of target A, determined by the target detection process provided by the embodiment of the present invention, each show a relatively sharp distribution of the probability values corresponding to the initial frame boundary position information, that is, the probability distributions of the frame boundaries in those directions are relatively sharp. However, because of lighting, the frame boundary in the bottom direction of target A in the left image of fig. 3 is blurred, and the histogram for the bottom direction of target A shows a flatter distribution of the probability values corresponding to the initial frame boundary position information, that is, the probability distribution of the frame boundary in the bottom direction of target A is flatter.
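A minimal matplotlib sketch of this histogram visualization; the probability values below are assumed for illustration and are not read from fig. 3:

```python
import matplotlib.pyplot as plt

directions = ["left", "top", "right", "bottom"]
translation_values = [0, 2, 4, 6, 8, 10, 12, 14]
# Assumed per-direction probability values for one detection target
probs = {
    "left":   [0.02, 0.05, 0.80, 0.08, 0.03, 0.01, 0.01, 0.00],  # sharp: boundary certain
    "top":    [0.01, 0.03, 0.07, 0.82, 0.04, 0.02, 0.01, 0.00],
    "right":  [0.03, 0.75, 0.12, 0.05, 0.03, 0.01, 0.01, 0.00],
    "bottom": [0.10, 0.14, 0.15, 0.16, 0.15, 0.13, 0.10, 0.07],  # flat: boundary uncertain
}

fig, axes = plt.subplots(1, 4, figsize=(16, 3))
for ax, d in zip(axes, directions):
    ax.bar(translation_values, probs[d], width=1.5)
    ax.set_title(d)
    ax.set_xlabel("preset translation value")  # abscissa
    ax.set_ylabel("probability value")         # ordinate
plt.tight_layout()
plt.show()
```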
Corresponding to the above method embodiment, an embodiment of the present invention provides an object detection apparatus, and as shown in fig. 4, the apparatus may include:
an obtaining module 410 configured to obtain an image to be detected;
the first determining module 420 is configured to perform feature extraction on the image to be detected by using a feature extraction layer of a pre-established target detection model, and determine a feature of the image to be detected corresponding to the image to be detected;
a second determining module 430, configured to determine, by using the feature regression layer of the pre-established target detection model and the features of the image to be detected, a probability value corresponding to each initial frame boundary position information corresponding to each detected target detected in the image to be detected; and, for each detection target, determine the target detection frame position information corresponding to the detection target based on the probability values corresponding to the initial frame boundary position information corresponding to the detection target, wherein the pre-established target detection model is a model obtained by training based on sample images and their corresponding calibration information, and the calibration information comprises the calibration frame position information corresponding to the sample targets in the corresponding sample images.
By applying this embodiment of the invention, the probability value corresponding to each initial frame boundary position information corresponding to each detection target can be regressed by the regression layer of the pre-established target detection model; that is, the probability distribution of each frame boundary of each detection target, and hence the uncertainty of that boundary, is regressed. The position information of each frame boundary is then determined, for each detection target, from the probability values corresponding to its initial frame boundary position information, thereby determining the target detection frame position information corresponding to that detection target and detecting the frame boundaries of targets in the image more accurately.
In another embodiment of the present invention, the second determining module 430 is specifically configured to:
for each detection target, determine the target detection frame boundary position information corresponding to the detection target in each direction based on the probability values corresponding to the initial frame boundary position information corresponding to the detection target in that direction, so as to determine the target detection frame position information corresponding to the detection target, wherein the directions comprise the up, down, left, and right directions of the image to be detected.
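As a concrete illustration of how the four directional boundary estimates compose into one detection frame, here is a small sketch; the activation-point coordinates and the four distance names are hypothetical, and the four distances would come from the per-direction boundary position information determined above (for example, via the preset integral formula described below).

```python
def decode_box(cx: float, cy: float,
               up: float, down: float, left: float, right: float):
    """Compose a detection frame from one activation point and four
    directional boundary distances.

    (cx, cy): position of the activation point for the detected target.
    up/down/left/right: translation distances from the activation point
    to the frame boundary in each of the four directions.
    """
    x_min, y_min = cx - left, cy - up
    x_max, y_max = cx + right, cy + down
    return x_min, y_min, x_max, y_max

# Example: activation point at (100, 80), boundaries 12 up, 20 down,
# 15 left, 18 right -> frame (85, 68, 118, 100)
print(decode_box(100.0, 80.0, 12.0, 20.0, 15.0, 18.0))
```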
In another embodiment of the present invention, the second determining module 430 is specifically configured to determine the target detection frame boundary position information corresponding to the detection target in each direction based on the probability values corresponding to the initial frame boundary position information corresponding to the detection target in that direction and a preset integral formula.
In another embodiment of the present invention, the preset integral formula is expressed as:

$$P=\sum_{i=0}^{n} a_i\, p(a_i)$$

wherein P represents the target translation value between the position represented by the target detection frame boundary position information corresponding to the detection target in the target direction and the position of the activation point corresponding to the detection target as predicted by the pre-established target detection model; $a_i$ represents the i-th preset translation value corresponding to the initial frame boundary position information of the detection target in the target direction; $p(a_i)$ represents the probability value corresponding to the i-th preset translation value; n+1 represents the total number of preset translation values corresponding to the initial frame boundary position information of the detection target in the target direction; and the target direction is any one of the aforementioned directions.
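In code, the preset integral formula is a discrete expectation over the n+1 preset translation values. The following sketch assumes the feature regression layer's raw outputs for one boundary are normalized with a softmax to obtain the probability values $p(a_i)$; the variable names and example numbers are illustrative only.

```python
import numpy as np

def expected_translation(logits: np.ndarray, a: np.ndarray) -> float:
    """Apply the preset integral formula P = sum_i a_i * p(a_i).

    logits: raw regression-layer outputs for one boundary direction, shape (n+1,).
    a: the n+1 preset translation values a_0 .. a_n.
    """
    p = np.exp(logits - logits.max())  # numerically stable softmax
    p /= p.sum()                       # p(a_i): a discrete probability distribution
    return float((a * p).sum())        # P = sum_i a_i * p(a_i)

# Example: 8 preset translation values 0..7 (e.g. in feature-map strides)
a = np.arange(8, dtype=np.float64)
logits = np.array([0.1, 0.2, 0.3, 2.5, 4.0, 1.0, 0.2, 0.1])
print(expected_translation(logits, a))  # ~3.8: a sub-bin boundary estimate
```

Because the expectation blends neighboring translation values by their probabilities, the decoded boundary is not restricted to the discrete preset values, which is what allows the sharp-versus-flat distributions discussed above to translate into finer boundary localization.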
In another embodiment of the present invention, the apparatus further comprises:
a model training module (not shown in the figure) configured to train the pre-established target detection model before the probability value corresponding to each initial frame boundary position information corresponding to each detected target detected in the image to be detected is determined by using the feature regression layer of the pre-established target detection model and the features of the image to be detected, and before the target detection frame position information corresponding to each detection target is determined based on the probability values corresponding to its initial frame boundary position information. The model training module is specifically configured to: obtain a plurality of sample images and the calibration information corresponding to each sample image;
obtaining an initial target detection model;
inputting, for each sample image, the sample image into the feature extraction layer of the initial target detection model, and extracting the sample image features corresponding to the sample image;
for each sample image, inputting the sample image features corresponding to the sample image into the feature regression layer of the initial target detection model, and determining the probability value corresponding to each prediction frame boundary position information corresponding to each direction of each sample target in the sample image; and determining the current frame position information corresponding to each sample target in the sample image based on the probability values corresponding to the prediction frame boundary position information in each direction of the sample target and the preset integral formula;
for each sample image, determining a current loss value by using a preset loss function, the current frame position information corresponding to each sample target in the sample image, and the calibration frame position information corresponding to each sample target in the calibration information corresponding to the sample image;
judging whether the current loss value exceeds a preset loss threshold value or not;
if the current loss value exceeds a preset loss threshold value, adjusting model parameters of a feature extraction layer and a feature regression layer of the initial target detection model, returning to execute the step of inputting the sample image into the feature extraction layer of the initial target detection model aiming at each sample image, and extracting to obtain sample image features corresponding to the sample image;
and if the current loss value is judged not to exceed the preset loss threshold value, determining that the initial target detection model reaches a convergence state, and determining a pre-established target detection model comprising a feature extraction layer and a feature regression layer.
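A condensed sketch of this training procedure is given below, with PyTorch standing in for the unspecified framework. `model`, `loader`, and `decode_boxes` are placeholders rather than names from the embodiment, and an L1 loss is assumed for the preset loss function.

```python
import torch
import torch.nn.functional as F

def train(model, loader, decode_boxes, loss_threshold=0.05, lr=1e-3):
    """Train until the current loss value no longer exceeds the preset
    loss threshold, mirroring the convergence test described above."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    while True:
        for images, calib_boxes in loader:         # sample images + calibration frame positions
            boundary_logits = model(images)        # feature extraction + feature regression layers
            boxes = decode_boxes(boundary_logits)  # current frame positions via the integral formula
            loss = F.l1_loss(boxes, calib_boxes)   # preset loss function (assumed L1 here)
            if loss.item() <= loss_threshold:      # loss no longer exceeds threshold: converged
                return model
            optimizer.zero_grad()
            loss.backward()                        # adjust feature-extraction and
            optimizer.step()                       # feature-regression layer parameters
```

Because the integral-formula decoding is a differentiable weighted sum, the loss on the decoded frame positions back-propagates through both the feature regression layer and the feature extraction layer in a single step.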
In another embodiment of the present invention, the apparatus further comprises:
an output module (not shown in the figure) configured to output, after the probability value corresponding to each initial frame boundary position information corresponding to each detected target detected in the image to be detected has been determined by using the feature regression layer of the pre-established target detection model and the features of the image to be detected, and after the target detection frame position information corresponding to each detection target has been determined based on the probability values corresponding to its initial frame boundary position information, the probability values corresponding to the initial frame boundary position information corresponding to each detection target and/or the target detection frame position information corresponding to each detection target.
The apparatus embodiments correspond to the method embodiments and have the same technical effects as the method embodiments; for a detailed description, refer to the method embodiments. The apparatus embodiments are obtained based on the method embodiments, and specific descriptions can be found in the method embodiment section, which are not repeated here. Those of ordinary skill in the art will understand that the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
Those of ordinary skill in the art will also understand that the modules in the devices of the embodiments may be distributed in the devices as described in the embodiments, or may be located, with corresponding changes, in one or more devices different from those of the embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of object detection, the method comprising:
obtaining an image to be detected;
extracting features of the image to be detected by using the feature extraction layer of a pre-established target detection model, and determining the feature of the image to be detected corresponding to the image to be detected;
determining a probability value corresponding to each initial frame boundary position information corresponding to each detected target detected in the image to be detected by using the feature regression layer of the pre-established target detection model and the features of the image to be detected; and, for each detection target, determining the target detection frame position information corresponding to the detection target based on the probability values corresponding to the initial frame boundary position information corresponding to the detection target, wherein the pre-established target detection model is a model obtained by training based on sample images and their corresponding calibration information, and the calibration information comprises the calibration frame position information corresponding to the sample targets in the corresponding sample images.
2. The method of claim 1, wherein the step of determining, for each detection target, the target detection frame position information corresponding to the detection target based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target comprises:
for each detection target, determining the target detection frame boundary position information corresponding to the detection target in each direction based on the probability values corresponding to the initial frame boundary position information corresponding to the detection target in that direction, so as to determine the target detection frame position information corresponding to the detection target, wherein the directions comprise the up, down, left, and right directions of the image to be detected.
3. The method of claim 2, wherein the step of determining the target detection frame boundary position information corresponding to the detection target in each direction based on the probability value corresponding to each initial frame boundary position information corresponding to the detection target in each direction comprises:
determining the target detection frame boundary position information corresponding to the detection target in each direction based on the probability values corresponding to the initial frame boundary position information corresponding to the detection target in that direction and a preset integral formula.
4. The method of claim 3, wherein the preset integral formula is expressed as:

$$P=\sum_{i=0}^{n} a_i\, p(a_i)$$

wherein P represents the target translation value between the position represented by the target detection frame boundary position information corresponding to the detection target in the target direction and the position of the activation point corresponding to the detection target as predicted by the pre-established target detection model; $a_i$ represents the i-th preset translation value corresponding to the initial frame boundary position information of the detection target in the target direction; $p(a_i)$ represents the probability value corresponding to the i-th preset translation value; n+1 represents the total number of preset translation values corresponding to the initial frame boundary position information of the detection target in the target direction; and the target direction is any one of the aforementioned directions.
5. The method according to any one of claims 1 to 4, wherein before the step of determining a probability value corresponding to each initial frame boundary position information corresponding to each detected target detected in the image to be detected by using the feature regression layer of the pre-established target detection model and the features of the image to be detected, and, for each detection target, determining the target detection frame position information corresponding to the detection target based on the probability values corresponding to the initial frame boundary position information corresponding to the detection target, the method further comprises:
a process of training a pre-established target detection model, wherein the process comprises:
obtaining a plurality of sample images and calibration information corresponding to each sample image;
obtaining an initial target detection model;
inputting, for each sample image, the sample image into the feature extraction layer of the initial target detection model, and extracting the sample image features corresponding to the sample image;
for each sample image, inputting the sample image features corresponding to the sample image into the feature regression layer of the initial target detection model, and determining the probability value corresponding to each prediction frame boundary position information corresponding to each direction of each sample target in the sample image; and determining the current frame position information corresponding to each sample target in the sample image based on the probability values corresponding to the prediction frame boundary position information in each direction of the sample target and the preset integral formula;
for each sample image, determining a current loss value by using a preset loss function, the current frame position information corresponding to each sample target in the sample image, and the calibration frame position information corresponding to each sample target in the calibration information corresponding to the sample image;
judging whether the current loss value exceeds a preset loss threshold value or not;
if the current loss value is judged to exceed a preset loss threshold value, adjusting model parameters of a feature extraction layer and a feature regression layer of the initial target detection model, returning to execute the step of inputting the sample image into the feature extraction layer of the initial target detection model aiming at each sample image, and extracting the sample image feature corresponding to the sample image;
and if the current loss value is judged not to exceed the preset loss threshold value, determining that the initial target detection model reaches a convergence state, and determining a pre-established target detection model comprising a feature extraction layer and a feature regression layer.
6. The method according to any one of claims 1 to 5, wherein after the step of determining a probability value corresponding to each initial frame boundary position information corresponding to each detected target detected in the image to be detected by using the feature regression layer of the pre-established target detection model and the features of the image to be detected, and, for each detection target, determining the target detection frame position information corresponding to the detection target based on the probability values corresponding to the initial frame boundary position information corresponding to the detection target, the method further comprises:
outputting the probability values corresponding to the initial frame boundary position information corresponding to each detected target and/or the target detection frame position information corresponding to each detected target.
7. An object detection apparatus, characterized in that the apparatus comprises:
an obtaining module configured to obtain an image to be detected;
a first determining module configured to perform feature extraction on the image to be detected by using the feature extraction layer of a pre-established target detection model, and determine the feature of the image to be detected corresponding to the image to be detected;
a second determining module configured to determine a probability value corresponding to each initial frame boundary position information corresponding to each detected target detected in the image to be detected by using the feature regression layer of the pre-established target detection model and the features of the image to be detected; and, for each detection target, determine the target detection frame position information corresponding to the detection target based on the probability values corresponding to the initial frame boundary position information corresponding to the detection target, wherein the pre-established target detection model is a model obtained by training based on sample images and their corresponding calibration information, and the calibration information comprises the calibration frame position information corresponding to the sample targets in the corresponding sample images.
8. The apparatus of claim 7, wherein the second determining module is specifically configured to:
for each detection target, determine the target detection frame boundary position information corresponding to the detection target in each direction based on the probability values corresponding to the initial frame boundary position information corresponding to the detection target in that direction, so as to determine the target detection frame position information corresponding to the detection target, wherein the directions comprise the up, down, left, and right directions of the image to be detected.
9. The apparatus of claim 8, wherein the second determining module is specifically configured to determine the target detection frame boundary position information corresponding to the detection target in each direction based on the probability values corresponding to the initial frame boundary position information corresponding to the detection target in that direction and a preset integral formula.
10. The apparatus of claim 9, wherein the preset integral formula is expressed as:

$$P=\sum_{i=0}^{n} a_i\, p(a_i)$$

wherein P represents the target translation value between the position represented by the target detection frame boundary position information corresponding to the detection target in the target direction and the position of the activation point corresponding to the detection target as predicted by the pre-established target detection model; $a_i$ represents the i-th preset translation value corresponding to the initial frame boundary position information of the detection target in the target direction; $p(a_i)$ represents the probability value corresponding to the i-th preset translation value; n+1 represents the total number of preset translation values corresponding to the initial frame boundary position information of the detection target in the target direction; and the target direction is any one of the aforementioned directions.
CN202010602952.6A 2020-06-29 2020-06-29 Target detection method and device Pending CN113936134A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010602952.6A CN113936134A (en) 2020-06-29 2020-06-29 Target detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010602952.6A CN113936134A (en) 2020-06-29 2020-06-29 Target detection method and device

Publications (1)

Publication Number Publication Date
CN113936134A true CN113936134A (en) 2022-01-14

Family

ID=79272702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010602952.6A Pending CN113936134A (en) 2020-06-29 2020-06-29 Target detection method and device

Country Status (1)

Country Link
CN (1) CN113936134A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115512131A (en) * 2022-10-11 2022-12-23 北京百度网讯科技有限公司 Image detection method and training method of image detection model
CN115512131B (en) * 2022-10-11 2024-02-13 北京百度网讯科技有限公司 Image detection method and training method of image detection model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination