CN114565940A - Target detection method and device - Google Patents

Target detection method and device

Info

Publication number: CN114565940A
Application number: CN202210164064.XA
Authority: CN (China)
Prior art keywords: detection, target, position information, probability, objects
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 肖传利 (Xiao Chuanli)
Assignee (current and original): Shenzhen Lianzhou International Technology Co Ltd
Application filed by Shenzhen Lianzhou International Technology Co Ltd


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a target detection method and apparatus. The method obtains the position information and discrimination probabilities of different detection objects in an original image and calculates, for each detection object position relationship combination, the occurrence probability of the position relationship between its two detection objects, where a combination consists of the position information of any two different detection objects. The method then judges whether the discrimination probabilities of the two detection objects in each combination and the occurrence probability of the position relationship between them satisfy a preset threshold condition, and, when the condition is satisfied, obtains the position information of the whole target corresponding to the combination. Embodiments of the invention effectively combine the occurrence probability of position relationships between different detection objects of a target with the discrimination probabilities of the detection objects' position information, locating the whole target accurately and thereby improving the detection precision of the whole target.

Description

Target detection method and device
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a target detection method and apparatus.
Background
With the growing demand for target detection technology, the accuracy of target detection has received wide attention. Taking human-shaped targets as an example, to improve accuracy a conventional human-shape detection method uses not only whole-pedestrian features but also features of the pedestrian's body components, such as half-body features and head features. Classifiers are used to detect the several body components of the pedestrian, candidate human bodies are screened using the position relationships between the body components and the whole human body, and the human-target detection result is finally output.
However, the inventors found that the prior art has at least the following problem: the score output by a classifier that detects a body component or the whole human body has a different physical meaning from the position relationship of a body component relative to the whole body, and there is no effective way to fuse the two; they can only be weighed by human experience, so the accuracy of pedestrian detection results is low.
Disclosure of Invention
Embodiments of the invention aim to provide a target detection method and apparatus that effectively combine the occurrence probability of the position relationships between different detection objects of a target with the discrimination probabilities of the detection objects' position information, so as to locate the whole target accurately and thereby improve the detection precision of the whole target.
In order to achieve the above object, an embodiment of the present invention provides a target detection method, including:
acquiring position information and discrimination probability of different detection objects of an original image; wherein the different detection objects comprise a target whole and at least one component constituting the target whole;
traversing to obtain all detection object position relation combinations according to the position information and the discrimination probability of each detection object; the detection object position relation combination consists of position information of any two different detection objects;
calculating the occurrence probability of the position relation between the two detection objects corresponding to each detection object position relation combination according to a pre-constructed probability model;
judging whether the discrimination probability of the two detection objects corresponding to each detection object position relation combination and the occurrence probability of the position relation between the two detection objects meet a preset threshold value condition or not;
and when a preset threshold condition is met, acquiring the position information of the whole target corresponding to the detection object position relation combination.
As an improvement of the above scheme, the probability model is constructed by the following method:
acquiring a training data set; the training data set comprises a plurality of sample images which are marked with detection object rectangular frames in advance, wherein the detection object rectangular frames comprise target rectangular frames corresponding to the whole target and sub-target rectangular frames corresponding to at least one component of the whole target;
carrying out size normalization on all images in the target rectangular frame according to a preset size;
according to the normalized target rectangular frame, determining the position information of the target rectangular frame and the sub-target rectangular frames corresponding to the target rectangular frame;
and calculating the occurrence probability of the position relation between any two detection objects in the training data set according to the position information of any two detection object rectangular frames in each sample image so as to construct and obtain the probability model.
As an improvement of the above scheme, the calculating, according to the position information of any two rectangular frames of the detection objects in each sample image, the probability of occurrence of the position relationship between any two detection objects in the training data set specifically includes:
calculating the occurrence probability of the position relationship between the two detection objects corresponding to the n-th detection object rectangular frame and the m-th detection object rectangular frame according to the position information of the n-th detection object rectangular frame and the m-th detection object rectangular frame in the same sample image by the following calculation formula:
P_pos(q, m, n) = K_{q,m,n} / K;
q = (x_{n,k} - x_{m,k}, y_{n,k} - y_{m,k});
wherein (x_{n,k}, y_{n,k}) is the position information of the n-th detection object rectangular frame and (x_{m,k}, y_{m,k}) is the position information of the m-th detection object rectangular frame in the k-th sample image; K is the number of sample images in the training data set; K_{q,m,n} is the number of sample images in the training data set in which the n-th detection object rectangular frame appears at position q relative to the m-th detection object rectangular frame; n = 1, 2, …, C; m = 1, 2, …, C; n ≠ m; and C is the number of detection object rectangular frames labeled in a sample image.
As an improvement of the above solution, the preset threshold condition is: at least one of the discrimination probabilities of the two detection objects in the detection object position relationship combination is greater than a preset first probability threshold; the occurrence probability of the position relationship between the two detection objects is greater than a preset second probability threshold; and the sum of the two discrimination probabilities and the occurrence probability of the position relationship is greater than a preset third probability threshold.
As an improvement of the above scheme, when a preset threshold condition is satisfied, acquiring position information of the whole target corresponding to the detection object position relationship combination specifically includes:
when a preset threshold condition is satisfied,
if one detection object exists in the two detection objects corresponding to the detection object position relation combination as the whole target, acquiring position information of the whole target;
and if one detection object does not exist in the two detection objects corresponding to the detection object position relation combination, calculating the position information of the whole target according to the position relation of the two detection objects.
As an improvement of the above scheme, the acquiring of the position information and the discrimination probability of different detection objects of the original image specifically includes:
acquiring an original image, and extracting the characteristics of the original image;
inputting the characteristics of the original image into different pre-trained classifiers respectively to obtain the position information and the discrimination probability of the detection object detected by each classifier; wherein, different classifiers are used for detecting the position information and the discrimination probability of different detection objects.
As an improvement of the above scheme, the classifier for detecting the whole target is trained by the following method:
acquiring a training data set; the training data set comprises a plurality of sample images which are marked with rectangular frames of detection objects in advance, wherein the rectangular frames of the detection objects comprise target rectangular frames corresponding to the whole targets;
carrying out size normalization on all images in the target rectangular frame according to a preset size;
and extracting the features of the normalized images in the target rectangular frame, and training a classifier by using the extracted features to obtain the classifier for detecting the whole target.
As an improvement of the above solution, the detection object rectangular frames further include a sub-target rectangular frame corresponding to at least one component of the whole target;
then, the classifier for detecting components of the same class is trained by:
acquiring the size of a sub-target rectangular frame corresponding to the target rectangular frame according to the normalized target rectangular frame;
calculating the normalized dimension of the components of the same category according to the dimension of the sub-target rectangular frames corresponding to the components of the same category in the training data set, and carrying out dimension normalization on the images in the sub-target rectangular frames corresponding to the components of the same category according to the normalized dimension;
and performing feature extraction on the images in the normalized sub-target rectangular frames, and training a classifier by using the extracted features to obtain the classifier for detecting the components of the same category.
As an improvement of the above solution, after the obtaining of the position information of the entire target corresponding to the detection object position relationship combination when the preset threshold condition is satisfied, the method further includes:
and screening the acquired position information of all the target integers to obtain the position information of the target integers meeting preset conditions.
As an improvement of the above scheme, the screening of the acquired position information of all the whole targets to obtain the position information of the whole targets meeting preset conditions specifically includes:
and screening the detected position information of all the whole targets by adopting a non-maximum suppression processing method so as to obtain the position information of the whole targets meeting preset conditions.
As an improvement of the above solution, before the acquiring the position information and the discrimination probability of the different detection objects of the original image, the method further includes:
obtaining an original image, and scaling the original image to different sizes;
then, sequentially executing, for the original images with different scales and sizes: and acquiring the position information and the discrimination probability of different detection objects of the original image.
An embodiment of the present invention further provides a target detection apparatus, including:
the detection object detection module is used for acquiring position information and discrimination probability of different detection objects of the original image; wherein the different detection objects include the whole target and at least one component constituting the whole target;
the position relation combination acquisition module is used for traversing to obtain all the position relation combinations of the detection objects according to the position information and the discrimination probability of each detection object; the detection object position relation combination consists of position information of any two different detection objects;
the probability calculation module is used for calculating the occurrence probability of the position relation between the two detection objects corresponding to each detection object position relation combination according to a pre-constructed probability model;
a threshold condition judging module, configured to judge whether a discrimination probability of two detection objects corresponding to each detection object position relationship combination and an occurrence probability of a position relationship between the two detection objects satisfy a preset threshold condition;
and the target position acquisition module is used for acquiring the position information of the whole target corresponding to the detection object position relation combination when a preset threshold condition is met.
An embodiment of the present invention further provides an object detection apparatus, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor executes the computer program to implement the object detection method according to any one of the above items.
Compared with the prior art, the target detection method and apparatus disclosed by the invention obtain the position information and discrimination probabilities of different detection objects in an original image, calculate the occurrence probability of the position relationship between the two detection objects of each detection object position relationship combination, judge whether the discrimination probabilities of the two detection objects and the occurrence probability of the position relationship between them satisfy a preset threshold condition, and, when the condition is satisfied, obtain the position information of the whole target corresponding to the combination. In the embodiment of the invention, the position relationships between different detection objects are taken into account during target detection and are expressed as probabilities. Because both the position relationships between detection objects and the position information of the detection objects are expressed as probabilities, the two quantities are easier to interpret and to threshold jointly; detecting the position of the whole target by combining the discrimination probabilities and the occurrence probability locates the whole target more accurately and improves the detection precision of the whole target.
Drawings
Fig. 1 is a schematic step diagram of a target detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a training procedure of a probabilistic model according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of the training steps of the classifier for detecting the whole target according to the third embodiment of the present invention;
FIG. 4 is a schematic diagram of the training steps of the classifier for detecting the components of the whole target according to the third embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating steps of another target detection method according to a fourth embodiment of the present invention;
fig. 6 is a schematic structural diagram of an object detection apparatus according to a fifth embodiment of the present invention;
fig. 7 is a schematic structural diagram of another object detection apparatus according to a sixth embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic step diagram of a target detection method according to an embodiment of the present invention. The first embodiment of the present invention provides a target detection method, which is specifically executed through steps S11 to S15:
s11, acquiring position information and discrimination probabilities of different detection objects of the original image; wherein the different detection objects include a target whole and at least one component constituting the target whole.
It should be noted that the original image includes at least one target whole, and the target whole is composed of several components. The plurality of parts into which the target whole is divided may be set in advance according to the type and structure of the target whole.
For example, taking a pedestrian as an object entirety, the pedestrian object is the object entirety to be detected and finally output by the method, and the components constituting the object entirety may include: head, left arm, right arm, body, left leg, and right leg.
It is to be understood that the above scenario is only an example, and in order to improve the detection accuracy, components such as a left shoulder, a right shoulder, a left hand, a right hand, a left foot, and a right foot may be further divided, and are not limited in detail herein.
For example, taking an automobile as a target whole, where the automobile target is the target whole to be detected and finally output by the method, the components constituting the target whole may include: wheels, windows, doors, etc.
It should be understood that the above-mentioned scenarios are only examples, and in practical applications, the components of the automobile may be further divided according to the detection requirement, which is not specifically limited herein.
In the embodiment of the invention, each detection object (the whole target and its components) in the original image is detected, yielding for each detection object several combinations of position information q and corresponding discrimination probability P_cls. The position information is the location of the detection object in the original image, such as coordinate information. The discrimination probability is the probability that the detection object is located at a given position in the original image.
Generally, when the probability value that the detection object is located at a certain position in the original image is greater than a preset threshold, for example, 75%, the position information is considered as the determined position of the detection object.
It should be noted that the method for calculating the position information and the discrimination probability of the detection object may adopt a detection method in the prior art, for example, a neural network model trained in advance for detecting the position information and the discrimination probability, and is not limited in this respect.
For convenience of explanation, a pedestrian is taken as the whole target in the following. Suppose the discrimination probability that the pedestrian target is located at position A is P_cls1, at position B is P_cls2, and at position C is P_cls3; and, for at least one component of the pedestrian target such as the head and the body, the discrimination probability that the head is at position D is P_cls4 and at position E is P_cls5, while the discrimination probability that the body is at position F is P_cls6 and at position G is P_cls7, and so on.
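As a minimal illustration (not part of the patented method itself), each detected object can be represented as a record of object identity, position information and discrimination probability; the Python sketch below uses hypothetical field names and values:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    obj_id: int                # 0 = whole target, 1..C-1 = component index
    position: Tuple[int, int]  # e.g. center (x, y) in the original image
    p_cls: float               # discrimination probability from the classifier

# The pedestrian example above expressed as detection records (values hypothetical):
detections: List[Detection] = [
    Detection(0, (120, 200), 0.91),  # pedestrian at position A, P_cls1
    Detection(0, (300, 210), 0.40),  # pedestrian at position B, P_cls2
    Detection(1, (118, 140), 0.88),  # head at position D, P_cls4
    Detection(2, (121, 230), 0.76),  # body at position F, P_cls6
]
```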
S12, traversing to obtain all detection object position relation combinations according to the position information and the discrimination probability of each detection object; the detection object position relation combination is composed of position information of any two different detection objects.
That is, from the several pieces of position information of each detection object and the corresponding discrimination probabilities obtained in step S11, a detection object position relationship combination is formed from one piece of position information of each of any two different detection objects.
As an example, a pedestrian target and a head of the pedestrian are taken as the two different detection objects, and the obtained detection object position relationship combination is as follows: six combinations of position a and position D, position a and position E, position B and position D, position B and position E, position C and position D, and position C and position E. And similarly, obtaining a detection object position relation combination taking the pedestrian target and the body of the pedestrian as the two different detection objects, a detection object position relation combination taking the head and the body of the pedestrian target as the two different detection objects, and the like.
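A sketch of the traversal in step S12, reusing the hypothetical `Detection` records from the previous sketch; only pairs of different detection objects form a combination:

```python
from itertools import combinations

def position_relation_combinations(detections):
    """Traverse all pairs of detections belonging to two different objects (step S12)."""
    for a, b in combinations(detections, 2):
        if a.obj_id != b.obj_id:   # a combination needs two *different* detection objects
            yield a, b

pairs = list(position_relation_combinations(detections))
# With three pedestrian positions and two head positions, the pedestrian/head
# pairs alone would give the six combinations (A,D), (A,E), ..., (C,E) above.
```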
And S13, calculating the occurrence probability of the position relation between the two detection objects corresponding to each detection object position relation combination according to a pre-constructed probability model.
A probability model is constructed in advance and used to calculate the occurrence probability P_pos of the position relationship between the two detection objects in a detection object position relationship combination. The occurrence probability P_pos characterizes the probability that the first detection object of the combination appears at position q relative to the second detection object, where position q is the relative position between the two detection objects.
Generally, the higher the occurrence probability of the position relationship between two detection objects, the more typical their relative positions are, and the closer the position information of the whole target formed from them is to the final target position to be detected.
Specifically, the position information of the two detection objects in the detection object position relationship combination is input into the probability model for calculation and analysis, so as to obtain the occurrence probability of the position relationship between the two detection objects.
And S14, judging whether the discrimination probability of the two detection objects corresponding to each detection object position relation combination and the occurrence probability of the position relation between the two detection objects meet a preset threshold value condition.
Specifically, the discrimination probabilities P_cls of the detection objects obtained in the preceding steps and the occurrence probability P_pos of the position relationship between two different detection objects are checked against the corresponding threshold conditions to obtain the detection object position relationship combinations that satisfy the threshold condition; from these, the position information of the corresponding whole target is determined, thereby locating the whole target to be detected on the original image.
Preferably, the preset threshold condition is: at least one of the discrimination probabilities of the two detection objects in the detection object position relationship combination is greater than a preset first probability threshold; the occurrence probability of the position relationship between the two detection objects is greater than a preset second probability threshold; and the sum of the two discrimination probabilities and the occurrence probability of the position relationship is greater than a preset third probability threshold.
In the embodiment of the invention, each detection object relationship combination is traversed to obtain the discrimination probabilities of its two detection objects i and j and the occurrence probability of the position relationship between them, denoted P_clsi(pi), P_clsj(pj) and P_pos(pj - pi, i, j), where pi is the position information of the i-th detection object and pj is the position information of the j-th detection object. The values P_clsi(pi), P_clsj(pj) and P_pos(pj - pi, i, j) taken over all i, j are collected into a set R, and the target is classified according to R. This classification operation is denoted op; it outputs 1 when a whole target to be detected is recognized, and 0 otherwise.
For convenience of explanation, the 0th detection object is defined as the whole target itself; when op(R) = 1, p0 is returned, i.e., the position of the whole target is obtained.
Here, if the pedestrian target is the 0th detection object and the head is the 1st detection object, then:
op(R) = op(Pa, Pb, Pc) = op(P_cls0(p0), P_cls1(p1), P_pos(p1 - p0, 0, 1));
op(R) outputs 1 when Pa or Pb is greater than the first probability threshold P1_thresh, Pc is greater than the second probability threshold P2_thresh, and Pa + Pb + Pc is greater than the third probability threshold P3_thresh, indicating that the detection object relationship combination satisfies the preset threshold condition.
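The threshold test described above, expressed as a runnable sketch; the threshold values are hypothetical placeholders:

```python
P1_THRESH, P2_THRESH, P3_THRESH = 0.6, 0.3, 1.5   # hypothetical thresholds

def op(pa: float, pb: float, pc: float) -> int:
    """Return 1 if the combination passes the preset threshold condition (step S14)."""
    at_least_one_cls = (pa > P1_THRESH) or (pb > P1_THRESH)  # discrimination probabilities
    pos_ok = pc > P2_THRESH                                  # occurrence probability P_pos
    sum_ok = (pa + pb + pc) > P3_THRESH                      # combined evidence
    return int(at_least_one_cls and pos_ok and sum_ok)
```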
And S15, when a preset threshold condition is met, acquiring the position information of the whole target corresponding to the detection object position relation combination.
Specifically, when a certain detection object relationship combination meets the preset threshold condition, if a detection object exists in two detection objects corresponding to the detection object position relationship combination and serves as the target whole, position information of the target whole is acquired; and if one detection object does not exist in the two detection objects corresponding to the detection object position relation combination, calculating the position information of the whole target according to the position relation of the two detection objects.
For example, when the detected object relationship combination includes the whole pedestrian object, the corresponding position information p0 is the final detected target position; when the detection object relation combination does not include the whole pedestrian target, such as the head and the body, the position information of the pedestrian target is determined according to the position information corresponding to the head and the body, and the target position obtained through final detection is obtained.
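The patent does not spell out how the whole-target position is computed from two component positions; one plausible sketch, assuming the probability model also records each component's modal offset from the whole-target center (the offsets and the averaging rule are assumptions):

```python
def whole_position_from_parts(p_i, p_j, offset_i, offset_j):
    """Estimate the whole-target position from two component positions (step S15).

    p_i, p_j         : detected (x, y) centers of components i and j
    offset_i/offset_j: assumed modal offset of each component from the
                       whole-target center, taken from training statistics
    """
    est_i = (p_i[0] - offset_i[0], p_i[1] - offset_i[1])  # center implied by part i
    est_j = (p_j[0] - offset_j[0], p_j[1] - offset_j[1])  # center implied by part j
    # Average the two estimates; other fusion rules are equally possible.
    return ((est_i[0] + est_j[0]) / 2, (est_i[1] + est_j[1]) / 2)
```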
The embodiment of the invention provides a target detection method that obtains the position information and discrimination probabilities of different detection objects in an original image, calculates the occurrence probability of the position relationship between the two detection objects of each detection object position relationship combination, judges whether the discrimination probabilities of the two detection objects and the occurrence probability of the position relationship between them satisfy a preset threshold condition, and, when the condition is satisfied, obtains the position information of the whole target corresponding to the combination. In the embodiment of the invention, the position relationships between different detection objects are taken into account during target detection and are expressed as probabilities. Because both the position relationships and the position information of the detection objects are expressed as probabilities, the two quantities are easier to interpret and to threshold jointly; detecting the position of the whole target by combining the discrimination probabilities and the occurrence probability locates the whole target more accurately and improves the detection precision of the whole target.
Fig. 2 is a schematic diagram of a training procedure of a probabilistic model according to a second embodiment of the present invention. The second embodiment of the invention provides a target detection method which is further implemented on the basis of the first embodiment. In the embodiment of the present invention, the probabilistic model in step S13 is constructed through the following steps S21 to S24:
s21, acquiring a training data set; the training data set comprises a plurality of sample images which are marked with rectangular frames of the detection objects in advance, wherein the rectangular frames of the detection objects comprise a target rectangular frame corresponding to the whole target and sub-target rectangular frames corresponding to at least one component of the whole target.
And S22, carrying out size normalization on the images in all the target rectangular frames according to the preset size.
And S23, determining the position information of the target rectangular frame and the sub-target rectangular frames corresponding to the target rectangular frame according to the normalized target rectangular frame.
S24, calculating the occurrence probability of the position relation between any two detection objects in the training data set according to the position information of any two detection object rectangular frames in each sample image to construct and obtain the probability model.
Specifically, a plurality of sample images containing a target are obtained, and rectangular frames of the detection object are labeled on the plurality of sample images, including a target rectangular frame corresponding to the whole target and sub-target rectangular frames corresponding to at least one component constituting the whole target.
Taking a pedestrian as an example as a whole, the plurality of sample images all include a pedestrian target, a target rectangular frame corresponding to the pedestrian target is marked on each sample image, and at least one component forming the pedestrian, such as a corresponding sub-target rectangular frame of a head, a left arm, a right arm, a body, a left leg, a right leg and the like.
Further, the images in all the target rectangular frames in the plurality of sample images are normalized, and the width and the height of the image in the target rectangular frame after size normalization are (normal _ width, normal _ height).
It can be understood that, since the sizes of the whole targets are normalized, the rectangular frames of each whole target's components, i.e., the sub-target rectangular frames, change size correspondingly. After normalization, the width and height of the i-th sub-target rectangular frame in the k-th sample image are denoted (part_width_{i,k}, part_height_{i,k}) and its center position is denoted (x_{i,k}, y_{i,k}), where k ∈ K, K being the number of sample images, and i ∈ C, C being the number of different detection object categories, i.e., the number of components plus one for the whole target. For example, when the components of the pedestrian target are set to the head, left arm, right arm, body, left leg, and right leg, the value of C is 7.
Further, based on the position information of the n-th rectangular frame and the m-th rectangular frame in the same sample image, the probability of occurrence of the positional relationship between the two detection objects corresponding to the n-th rectangular frame and the m-th rectangular frame is calculated by the following calculation formula:
P_pos(q, m, n) = K_{q,m,n} / K;
q = (x_{n,k} - x_{m,k}, y_{n,k} - y_{m,k});
wherein (x_{n,k}, y_{n,k}) is the position information of the n-th detection object rectangular frame and (x_{m,k}, y_{m,k}) is the position information of the m-th detection object rectangular frame; K is the number of sample images in the training data set; K_{q,m,n} is the number of sample images in which the n-th detection object rectangular frame appears at position q relative to the m-th detection object rectangular frame; n = 1, 2, …, C; m = 1, 2, …, C; n ≠ m; and C is the number of detection object rectangular frames labeled in a sample image.
For any two detection objects in each sample image, the occurrence probability of their relative position relationship is calculated in this way, establishing the mapping between the relative position relationship q and the occurrence probability P_pos for different detection objects, thereby constructing the probability model.
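A sketch of this model construction, assuming annotations are given as box-center coordinates; the offset binning is an implementation assumption (the patent counts exact relative positions q):

```python
from collections import defaultdict

def build_probability_model(samples, num_objects, bin_size=8):
    """Estimate P_pos(q, m, n) = K_{q,m,n} / K from labeled sample images.

    samples     : list of dicts mapping object index -> (x, y) box center,
                  one dict per sample image (K = len(samples))
    num_objects : C, the number of labeled detection object categories
    bin_size    : offsets are quantized so nearby positions share counts
                  (an assumption, not specified by the patent)
    """
    counts = defaultdict(int)   # (q, m, n) -> K_{q,m,n}
    K = len(samples)
    for boxes in samples:
        for n in range(num_objects):
            for m in range(num_objects):
                if n == m or n not in boxes or m not in boxes:
                    continue
                qx = (boxes[n][0] - boxes[m][0]) // bin_size
                qy = (boxes[n][1] - boxes[m][1]) // bin_size
                counts[((qx, qy), m, n)] += 1
    return {key: c / K for key, c in counts.items()}   # P_pos lookup table
```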
Therefore, in the application phase, that is, in step S13, the probability model is used to calculate the relative positional relationship (position q) between the two detection objects according to the positional information of the two detection objects in the detection object relationship combination, so as to obtain the occurrence probability of the positional relationship between the two corresponding detection objects.
By adopting the technical means of the embodiment of the invention, the probability model is trained by acquiring a plurality of labeled sample images, and the training process is simple, convenient and effective.
As a preferred implementation manner, a third embodiment of the present invention provides a target detection method, which is further implemented on the basis of the first embodiment or the second embodiment. In the embodiment of the present invention, step S11, that is, acquiring the position information and the determination probability of different detection objects of the original image, is specifically executed through steps S111 to S112:
and S111, acquiring an original image, and performing feature extraction on the original image.
S112, respectively inputting the characteristics of the original image into different pre-trained classifiers to obtain the position information and the discrimination probability of the detection object detected by each classifier; wherein, different classifiers are used for detecting the position information and the discrimination probability of different detection objects.
Specifically, a feature map is computed from the pixel values of the original image to obtain the feature map of the original image to be detected. Features are extracted from the feature map to obtain its feature vector, which is then input into the different pre-trained classifiers to obtain the position information and discrimination probability of the detection object detected by each classifier.
The different classifiers are pre-trained and used for detecting different detection objects. The classifier corresponding to a certain detection object can output the position information and the corresponding discrimination probability of the detection object by analyzing and calculating the input feature vector.
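A sliding-window sketch of this detection stage, under the assumption of one classifier per detection object; the feature extractor is a placeholder, since the patent leaves the feature type open:

```python
import numpy as np

def extract_features(window: np.ndarray) -> np.ndarray:
    """Placeholder feature extractor (the patent leaves the feature type open;
    HOG or CNN feature maps are typical choices)."""
    return window.astype(np.float32).ravel() / 255.0

def detect_objects(image, classifiers, window=(32, 32), stride=8, thresh=0.5):
    """Slide a window over the image and query each per-object classifier.

    classifiers : dict obj_id -> callable(features) -> discrimination probability
    Returns a list of (obj_id, (x, y) center, p_cls) for windows above `thresh`.
    """
    results = []
    h, w = image.shape[:2]
    for y in range(0, h - window[1] + 1, stride):
        for x in range(0, w - window[0] + 1, stride):
            feats = extract_features(image[y:y + window[1], x:x + window[0]])
            for obj_id, clf in classifiers.items():
                p = clf(feats)
                if p > thresh:
                    results.append((obj_id,
                                    (x + window[0] // 2, y + window[1] // 2), p))
    return results
```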
As a preferred implementation manner, refer to fig. 3, which is a schematic diagram of a training step of a classifier for detecting an entirety of a target according to a third embodiment of the present invention. The classifier C0 for detecting the whole target is obtained by training through the following steps S31 to S33:
s31, acquiring a training data set; the training data set comprises a plurality of sample images which are marked with rectangular frames of detection objects in advance, wherein the rectangular frames of the detection objects comprise target rectangular frames corresponding to the whole targets;
s32, carrying out size normalization on the images in all the target rectangular frames according to a preset scale;
and S33, extracting the features of the normalized images in the target rectangular frame, and training a classifier by using the extracted features to obtain the classifier for detecting the whole target.
Specifically, a plurality of sample images including a target are obtained, and a rectangular frame of the detection object is labeled on the plurality of sample images, including a target rectangular frame corresponding to the whole target.
Taking a pedestrian as an integral target, marking a target rectangular frame corresponding to the pedestrian target on each sample image, and normalizing the images in all the target rectangular frames in the sample images, wherein the width and the height of the images in the target rectangular frames after size normalization are (normal _ width, normal _ height).
By way of example, the size of the target rectangular box is scaled to the same size of 32 × 32.
Further, according to each pixel value of the normalized image in the target rectangular frame, calculating a feature map of the image in the target rectangular frame, performing feature extraction on the feature map to obtain feature vectors of the feature map, and then inputting the feature vectors into a preset classifier respectively to train the classifier.
Preferably, in order to improve the training precision of the classifier, the image in the target rectangular frame may be taken as a positive sample image, a plurality of image regions not containing the whole target may be obtained as negative sample images, a feature vector of each negative sample image is calculated and input into the classifier, so as to implement the training of the classifier.
Illustratively, a neural network can be used as the classifier: a loss function is calculated from the value predicted by the classifier and the true rectangular frame position, and the parameters of the neural network are updated with a gradient descent optimization algorithm to reduce the loss, until the loss function approaches its minimum and the trained classifier is obtained. The last layer of the neural network is the sigmoid activation function, and the loss function is the logistic loss, denoted H(y, y′). The sigmoid function and the logistic loss are expressed as:
y′ = sigmoid(x) = 1 / (1 + e^{-x});
H(y, y′) = -y · log(y′) - (1 - y) · log(1 - y′);
wherein y is the actual label, y′ is the predicted label value, and x is the input to the last layer of the neural network. After training of the neural network is completed, its output y′ can be used as the discrimination probability of the classifier.
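The sigmoid and logistic-loss formulas above in executable form (a minimal numpy sketch; the sample values are hypothetical):

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def logistic_loss(y: np.ndarray, y_pred: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """H(y, y') = -y*log(y') - (1-y)*log(1-y'); eps guards against log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -y * np.log(y_pred) - (1.0 - y) * np.log(1.0 - y_pred)

x = np.array([2.0, -1.0])        # last-layer inputs for two samples
y = np.array([1.0, 0.0])         # true labels (positive / negative sample)
y_pred = sigmoid(x)              # discrimination probabilities
print(logistic_loss(y, y_pred))  # per-sample loss values
```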
Further, referring to fig. 4, a schematic diagram of a training step of the classifier for detecting a component of the whole target according to the third embodiment of the present invention is shown. The rectangular frames for detection objects further include sub-target rectangular frames corresponding to at least one component of the whole target, for example, in the above-described scene in which the whole target is a pedestrian, and further include sub-target rectangular frames corresponding to at least one component of the pedestrian, for example, a head, a left arm, a right arm, a body, a left leg, a right leg, and the like.
Then, the classifier Ci for detecting components of the same category is trained by the following steps S31 'to S33':
s31', according to the normalized target rectangular frame, the size of the sub-target rectangular frame corresponding to the target rectangular frame is obtained.
S32', calculating the normalized dimension of the components of the same category according to the dimension of the sub-target rectangular boxes corresponding to the components of the same category in the training data set, and carrying out dimension normalization on the images in the sub-target rectangular boxes corresponding to the components of the same category according to the normalized dimension.
S33', extracting the features of the images in the normalized sub-target rectangular frame, and training a classifier by using the extracted features to obtain the classifier for detecting the components of the same category.
Understandably, because the sizes of the whole targets are normalized, the rectangular frame of each whole target's components, i.e., the sub-target rectangular frame, changes size correspondingly. After normalization, the width and height of the i-th sub-target rectangular frame in the k-th sample image are denoted (part_width_{i,k}, part_height_{i,k}) and its center position is denoted (x_{i,k}, y_{i,k}), where k ∈ K, K being the number of sample images, and i ∈ C, C being the number of different detection object categories.
Further, the coordinates of the center position of the sub-target rectangular frame with respect to the target rectangular frame, and the corresponding width and height are calculated. The width and height of sub-target rectangular boxes (for example, rectangular boxes corresponding to all heads, or rectangular boxes corresponding to all bodies, etc.) corresponding to the components of the same category in the training set are averaged and rounded, and the average is used as the normalized dimension of the sub-target rectangular boxes. The width and height corresponding to the normalized dimension are calculated as follows:
part_width_i = ceil( (1/K) · Σ_{k=1}^{K} part_width_{i,k} );
part_height_i = ceil( (1/K) · Σ_{k=1}^{K} part_height_{i,k} );
wherein ceil denotes the round-up operation.
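A direct transcription of the two averaging formulas (the example box sizes are hypothetical):

```python
import math

def normalized_part_size(widths, heights):
    """Average the sub-target box sizes of one component category and round up."""
    k = len(widths)
    part_width = math.ceil(sum(widths) / k)
    part_height = math.ceil(sum(heights) / k)
    return part_width, part_height

# e.g. head boxes observed across the normalized training images (values hypothetical)
print(normalized_part_size([10, 12, 11], [14, 15, 13]))   # -> (11, 14)
```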
And further, performing feature extraction on the normalized images in the sub-target rectangular frames, and performing classifier training by using the features of the sub-target rectangular frames corresponding to the components of the same category to obtain the classifier for detecting the components of the same category.
It can be understood that the training process of the classifier for detecting the components of the same class is similar to the training process of the classifier for detecting the whole target, and is not described herein again.
By adopting the technical means of the embodiment of the invention, different classifier training is carried out on different detection objects including the whole target and a plurality of components, so that the position information of the detection object can be more accurately obtained, and the subsequent detection precision of the whole target is improved.
As a preferred embodiment, the present invention is further implemented on the basis of the first embodiment. After the step S15, the object detection method further includes a step S16:
and S16, screening the acquired position information of all the target integers to obtain the position information of the target integers meeting the preset conditions.
Preferably, in order to improve the accuracy of positioning the target object, a non-maximum suppression processing method is adopted to screen the acquired position information of all the target entities, so as to obtain the position information of the target entities meeting a preset threshold condition.
Specifically, sorting the target rectangular frames corresponding to all the obtained targets integrally in a descending order according to scores; performing intersection operation on the currently traversed target frame and the currently remaining target frames to obtain a corresponding intersection point set, and calculating the intersection ratio IOU of every two target frames according to the area of a convex edge formed by judging the intersection point set; filtering the target frames with the IOU larger than a preset threshold value, and reserving the target frames with the IOU smaller than the preset threshold value; thereby obtaining the final target frame meeting the preset threshold condition.
Since the object classifier performs classification calculation according to the input feature vector, after the original image is subjected to the object detection operation in steps S11 to S15, when the discrimination probability satisfies the threshold condition, it is determined that there is a corresponding object frame and output, and at this time, position information of more than one object as a whole may be obtained. In this case, there may be a case where the target object in the target frame output by the target classifier is incomplete or low in representativeness, and therefore, in the embodiment of the present invention, a non-maximum suppression processing method is adopted to screen all the target frames output by the target classifier, so as to obtain a target frame satisfying a preset threshold condition, and the target frame is used as a target frame capable of accurately positioning the target object finally.
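A standard non-maximum suppression sketch consistent with this description, assuming axis-aligned rectangular frames (the convex-polygon IOU mentioned above also covers rotated frames):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring boxes, dropping overlaps above iou_thresh (step S16)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```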
In other embodiments, the obtained target frame of the whole target may be processed in other post-processing manners, for example, two adjacent complementary target frames are spliced to obtain a target frame including the whole target, so as to further improve the accuracy of target detection. And is not particularly limited herein.
By adopting the technical means of the embodiment of the invention, the position information of all the whole targets obtained after the threshold condition is met is further screened, so that the detection precision of the whole targets can be effectively improved.
Fig. 5 is a schematic step diagram of another target detection method provided in the fourth embodiment of the present invention. The fourth embodiment of the present invention is further implemented on the basis of the third embodiment, and the target detection method includes steps S41 to S46:
s41, acquiring an original image, and scaling the original image to different sizes;
s42, sequentially executing the following steps on the original images with different scales: acquiring position information and discrimination probability of different detection objects of an original image; wherein the different detection objects comprise a target whole and at least one component constituting the target whole;
s43, traversing to obtain all detection object position relation combinations according to the position information and the discrimination probability of each detection object; the detection object position relation combination consists of position information of any two different detection objects;
s44, calculating the occurrence probability of the position relation between the two detection objects corresponding to each detection object position relation combination according to a pre-constructed probability model;
s45, judging whether the judgment probability of the two detection objects corresponding to each detection object position relation combination and the occurrence probability of the position relation between the two detection objects meet a preset threshold value condition or not;
and S46, when a preset threshold condition is met, acquiring the position information of the whole target corresponding to the detection object position relation combination.
In the embodiment of the present invention, the classifier C0 trained in the manner of the third embodiment to detect the position information and discrimination probability of the whole target is only suited to detecting objects of one size, for example the 32 × 32 whole target mentioned in the above scenario. Accordingly, to comprehensively detect target objects of different sizes in the original image, multi-scale detection of the original image is required.
In an implementation manner, as described in the fourth embodiment of the present invention, a plurality of different dimensions are preset, the original image is scaled according to the dimensions to obtain original images with different dimensions, and the target detection method provided in the first embodiment is performed on the original image with each dimension.
For example, by reducing the original image so that the whole target with the scale size of 32 × 64 pixel values on the original image is scaled to the whole target with the scale size of 16 × 32 pixel values, the classifier C0 obtained by training in the above-mentioned third embodiment and the classifier Ci of the corresponding component are used for detection, and then the determination of the occurrence probability of the positional relationship is further combined, and finally the positional information corresponding to the whole target is obtained.
In another embodiment, in order to reduce the amount of computation, the feature map of the original image may be directly obtained, and then scaling with different scales and detail processing operations after scaling are performed on the feature map of the original image, and further, feature extraction is sequentially performed on the feature maps of the original image with different scales, and the feature maps are input into the classifier C0 obtained by training in the third embodiment and the classifier Ci of the corresponding component for detection, thereby implementing multi-scale detection on the original image.
In yet another embodiment, classifiers C0 and Ci corresponding to different scale sizes may also be trained in advance. For example, the classifier C0 for detecting the whole of the target with the scale size of 32 × 64 pixel values and the whole of the target with the scale size of 16 × 32 pixel values, and the classifier Ci at the corresponding scale size are trained, respectively. In the target detection process, classifiers with different scales are directly adopted for the original image to carry out target detection, so that multi-scale detection of the original image is realized.
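A sketch of the first variant (an image pyramid over preset scales), reusing the hypothetical `detect_objects` helper from the earlier sketch; the scale set and the nearest-neighbour resize are assumptions:

```python
import numpy as np

def simple_resize(img, shape):
    """Nearest-neighbour resize (a stand-in for a real interpolation routine)."""
    ys = (np.arange(shape[0]) * img.shape[0] / shape[0]).astype(int)
    xs = (np.arange(shape[1]) * img.shape[1] / shape[1]).astype(int)
    return img[ys][:, xs]

def multi_scale_detect(image, classifiers, scales=(1.0, 0.5, 0.25)):
    """Run detection on rescaled copies of the image and map detections back
    to original-image coordinates (the scale set is a hypothetical choice)."""
    all_detections = []
    for s in scales:
        h, w = image.shape[:2]
        resized = simple_resize(image, (int(h * s), int(w * s)))
        for obj_id, (x, y), p in detect_objects(resized, classifiers):
            all_detections.append((obj_id, (int(x / s), int(y / s)), p))
    return all_detections
```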
It can be understood that the foregoing embodiments are only examples, and in practical applications, the foregoing method may be used to perform target detection on an original image in different scales according to practical situations, or other processing methods may be used to perform multi-scale detection on an original image, without affecting the beneficial effects obtained by the present invention.
By adopting the technical means of the embodiment of the invention, the original image is subjected to multi-scale target detection, so that the position information of all targets is more accurately positioned and obtained, and the detection precision of the targets is improved.
Fig. 6 is a schematic structural diagram of a target detection apparatus according to a fifth embodiment of the present invention. The fifth embodiment of the present invention provides a target detection apparatus 50, which includes a detection object detection module 51, a position relationship combination obtaining module 52, a probability calculation module 53, a threshold condition judging module 54, and a target position obtaining module 55; wherein:
the detection object detection module 51 is configured to obtain position information and discrimination probabilities of different detection objects in an original image; wherein the different detection objects include the whole target and at least one component constituting the whole target;
the position relationship combination obtaining module 52 is configured to traverse to obtain position relationship combinations of all the detection objects according to the position information and the discrimination probability of each detection object; the detection object position relation combination consists of position information of any two different detection objects;
the probability calculation module 53 is configured to calculate, according to a pre-constructed probability model, an occurrence probability of a positional relationship between two detection objects corresponding to each detection object positional relationship combination;
the threshold condition determining module 54 is configured to determine whether the discrimination probability of the two detection objects corresponding to each detection object position relationship combination and the occurrence probability of the position relationship between the two detection objects satisfy a preset threshold condition;
the target position obtaining module 55 is configured to obtain position information of the whole target corresponding to the detection object position relationship combination when a preset threshold condition is met.
It should be noted that, the target detection apparatus provided in the fifth embodiment of the present invention is configured to execute all the process steps of the target detection method provided in any one of the first to fourth embodiments, and the working principles and beneficial effects of the two are in one-to-one correspondence, so that details are not described herein.
Fig. 7 is a schematic structural diagram of another target detection apparatus, provided in the sixth embodiment of the present invention. The target detection apparatus 60 according to the sixth embodiment of the present invention includes a processor 61, a memory 62, and a computer program stored in the memory and configured to be executed by the processor; when the processor executes the computer program, the target detection method according to any one of the first to fourth embodiments is implemented.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (13)

1. A method of object detection, comprising:
acquiring position information and discrimination probability of different detection objects of an original image; wherein the different detection objects comprise a target whole and at least one component constituting the target whole;
traversing all the detection objects according to the position information and the discrimination probability of each detection object to obtain all detection object position relation combinations; the detection object position relation combination consists of the position information of any two different detection objects;
calculating the occurrence probability of the position relation between the two detection objects corresponding to each detection object position relation combination according to a pre-constructed probability model;
judging whether the discrimination probability of the two detection objects corresponding to each detection object position relation combination and the occurrence probability of the position relation between the two detection objects satisfy a preset threshold condition;
and when a preset threshold condition is met, acquiring the position information of the whole target corresponding to the detection object position relation combination.
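For illustration only, a minimal end-to-end sketch of the steps of claim 1 might read as follows; the detection record format and the three helper callables are assumptions, not the patented implementation.

```python
from itertools import combinations

def detect_targets(detections, occurrence_prob, passes_threshold, whole_position):
    """detections: dicts with 'kind' ('whole' or a component name), 'pos' and
    'prob' (discrimination probability); all helper callables are hypothetical."""
    results = []
    # Traverse all combinations of two different detection objects.
    for a, b in combinations(detections, 2):
        if a['kind'] == b['kind']:
            continue  # a combination pairs two *different* detection objects
        p_rel = occurrence_prob(a, b)  # from the pre-constructed probability model
        if passes_threshold(a['prob'], b['prob'], p_rel):
            results.append(whole_position(a, b))
    return results
```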
2. The method of claim 1, wherein the probabilistic model is constructed by:
acquiring a training data set; the training data set comprises a plurality of sample images which are marked with rectangular frames of detection objects in advance, wherein the rectangular frames of the detection objects comprise a target rectangular frame corresponding to the whole target and sub-target rectangular frames corresponding to at least one component of the whole target;
carrying out size normalization on the images in all the target rectangular frames according to a preset size;
according to the normalized target rectangular frame, determining the position information of the target rectangular frame and the sub-target rectangular frames corresponding to the target rectangular frame;
and calculating the occurrence probability of the position relation between any two detection objects in the training data set according to the position information of any two detection object rectangular frames in each sample image, so as to construct the probability model.
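A hedged sketch of this construction follows; it assumes positions are quantized integer coordinates after normalization, which the claim does not spell out.

```python
from collections import Counter

def build_probability_model(samples):
    """samples: one dict per sample image, mapping a detection-object index n
    to the (x, y) position of its rectangular frame after size normalization."""
    counts = Counter()
    for boxes in samples:
        for n, (xn, yn) in boxes.items():
            for m, (xm, ym) in boxes.items():
                if n == m:
                    continue
                q = (xn - xm, yn - ym)  # relative position of frame n to frame m
                counts[(q, m, n)] += 1
    K = len(samples)  # number of sample images in the training data set
    # P(q | m, n) = K_{q,m,n} / K, matching the formula of claim 3.
    return {key: k_qmn / K for key, k_qmn in counts.items()}
```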
3. The method of claim 2, wherein the calculating the probability of occurrence of the positional relationship between any two detection objects in the training data set according to the positional information of any two detection object rectangular frames in each sample image specifically comprises:
calculating the occurrence probability of the position relationship between the two detection objects corresponding to the n-th detection object rectangular frame and the m-th detection object rectangular frame according to the position information of the n-th detection object rectangular frame and the m-th detection object rectangular frame in the same sample image by the following calculation formula:
P(q | m, n) = K_{q,m,n} / K;
q = (x_{n,k} − x_{m,k}, y_{n,k} − y_{m,k});
wherein (x_{n,k}, y_{n,k}) is the position information of the n-th detection object rectangular frame in the k-th sample image, (x_{m,k}, y_{m,k}) is the position information of the m-th detection object rectangular frame, and K is the number of sample images in the training data set; K_{q,m,n} is the number of sample images in the training data set in which the n-th detection object rectangular frame appears at the position q relative to the m-th detection object rectangular frame; n = 1, 2, …, C; m = 1, 2, …, C; n ≠ m; and C is the number of detection object rectangular frames marked in the sample image.
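As a worked illustration of these definitions (with invented numbers): if the training data set contains K = 100 sample images and, in K_{q,m,n} = 80 of them, the n-th frame (say, a head) appears at q = (0, −16) relative to the m-th frame (a body), then the occurrence probability recorded by the model is 80 / 100 = 0.8.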
4. The object detection method of claim 1, wherein the preset threshold condition is: at least one of the discrimination probabilities of the two detection objects corresponding to the detection object position relationship combination is greater than a preset first probability threshold, the occurrence probability of the position relationship between the two detection objects is greater than a preset second probability threshold, and the sum of the discrimination probabilities of the two detection objects and the occurrence probability of the position relationship between the two detection objects is greater than a preset third probability threshold.
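One possible reading of this condition, as a sketch; the three threshold values and the conjunctive interpretation are assumptions, not taken from the patent.

```python
def passes_threshold(p_a, p_b, p_rel, t1=0.5, t2=0.1, t3=1.2):
    """p_a, p_b: discrimination probabilities of the two detection objects;
    p_rel: occurrence probability of their position relation."""
    return (max(p_a, p_b) > t1           # at least one discrimination probability
            and p_rel > t2               # occurrence probability of the relation
            and p_a + p_b + p_rel > t3)  # combined sum of the three
```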
5. The method for detecting the target according to claim 1, wherein the obtaining the position information of the whole target corresponding to the position relationship combination of the detection objects when a preset threshold condition is satisfied specifically includes:
when a preset threshold condition is satisfied,
if one of the two detection objects corresponding to the detection object position relation combination is the target whole, acquiring the position information of that target whole;
and if neither of the two detection objects corresponding to the detection object position relation combination is the target whole, calculating the position information of the target whole according to the position relation of the two detection objects.
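A sketch of this case split; the component-to-whole offset table is a hypothetical stand-in for "the position relation of the two detection objects".

```python
def whole_position(a, b, expected_offset):
    """Return the position of the target whole for one detection pair."""
    for obj in (a, b):
        if obj['kind'] == 'whole':
            return obj['pos']  # one of the pair already is the target whole
    # Neither is the whole: derive it from a component's position plus the
    # expected offset (learned from training data) for that component kind.
    x, y = a['pos']
    dx, dy = expected_offset[a['kind']]
    return (x + dx, y + dy)
```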
6. The object detection method of claim 1, wherein the obtaining of the position information and the discrimination probability of different detection objects of the original image specifically comprises:
acquiring an original image, and extracting the characteristics of the original image;
inputting the characteristics of the original image into different pre-trained classifiers respectively to obtain the position information and the discrimination probability of the detection object detected by each classifier; wherein, different classifiers are used for detecting the position information and the discrimination probability of different detection objects.
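Illustratively (the per-classifier predict interface and the shared feature extractor are assumptions), this step might be organized as:

```python
def detect_objects(image, classifiers, extract_features):
    """classifiers: e.g. {'whole': c0, 'head': c1, ...}; each is assumed to
    expose predict(features) -> iterable of (position, probability)."""
    features = extract_features(image)  # extracted once, shared by all classifiers
    detections = []
    for kind, clf in classifiers.items():
        for pos, prob in clf.predict(features):
            detections.append({'kind': kind, 'pos': pos, 'prob': prob})
    return detections
```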
7. The object detection method of claim 6, wherein the classifier for detecting the entirety of the object is trained by:
acquiring a training data set; the training data set comprises a plurality of sample images which are marked with rectangular frames of detection objects in advance, wherein the rectangular frames of the detection objects comprise target rectangular frames corresponding to the whole targets;
carrying out size normalization on the images in all the target rectangular frames according to a preset size;
and extracting the features of the normalized images in the target rectangular frame, and training a classifier by using the extracted features to obtain the classifier for detecting the whole target.
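A minimal training sketch under stated assumptions: the feature extractor is left abstract, and an SVM with probability outputs stands in for the classifier family, which the patent does not specify.

```python
import cv2
from sklearn.svm import SVC

def train_whole_classifier(crops, labels, extract_features, size=(32, 64)):
    """crops: images inside the labelled target rectangular frames;
    labels: 1 for target whole, 0 for background. SVC(probability=True)
    is an illustrative choice whose predict_proba can later serve as the
    discrimination probability."""
    X = [extract_features(cv2.resize(crop, size)) for crop in crops]
    return SVC(probability=True).fit(X, labels)
```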
8. The object detection method according to claim 7, wherein the detection object rectangular frame further includes a sub-object rectangular frame corresponding to at least one component of the whole object;
then, the classifier for detecting components of the same class is trained by:
according to the normalized target rectangular frame, acquiring the size of a sub-target rectangular frame corresponding to the target rectangular frame;
calculating the normalized dimension of the components of the same category according to the dimension of the sub-target rectangular frames corresponding to the components of the same category in the training data set, and carrying out dimension normalization on the images in the sub-target rectangular frames corresponding to the components of the same category according to the normalized dimension;
and performing feature extraction on the images in the normalized sub-target rectangular frames, and training a classifier by using the extracted features to obtain the classifier for detecting the components of the same category.
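The normalized dimension computation is not spelled out in the claim; averaging the observed sub-target frame sizes is one plausible sketch.

```python
def normalized_part_size(part_boxes):
    """part_boxes: (width, height) of each sub-target rectangular frame for one
    component category, measured after whole-target normalization. Averaging
    is an assumption; the patent only requires some normalized dimension."""
    widths = [w for w, _ in part_boxes]
    heights = [h for _, h in part_boxes]
    return (round(sum(widths) / len(widths)),
            round(sum(heights) / len(heights)))
```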
9. The object detection method according to claim 1, wherein after acquiring the position information of the entire object corresponding to the detection object position relationship combination when the preset threshold condition is satisfied, the method further includes:
and screening the acquired position information of all the target wholes to obtain the position information of the target wholes that meet a preset condition.
10. The method for detecting the target according to claim 9, wherein the screening of the acquired position information of all the target wholes to obtain the position information of the target wholes that meet a preset condition specifically includes:
and screening the detected position information of all the target wholes by using a non-maximum suppression processing method, so as to obtain the position information of the target wholes that meet the preset condition.
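Non-maximum suppression is a standard screening step; a self-contained sketch follows (the IoU threshold value is invented for illustration).

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, score) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, iou_threshold=0.5):
    """Keep the highest-scoring box in every cluster of overlapping boxes."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return kept
```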
11. The object detection method according to claim 1, further comprising, before said acquiring position information and discrimination probabilities of different detection objects of an original image:
obtaining an original image, and scaling the original image to different sizes;
and then, for the original images of the different scale sizes, sequentially executing the step of acquiring the position information and the discrimination probability of different detection objects of the original image.
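This image-pyramid variant might be sketched as follows; the scale factors and the single-scale detect_fn interface are assumptions.

```python
import cv2

def detect_over_pyramid(image, detect_fn, factors=(1.0, 0.5, 0.25)):
    """Scale the original image to several sizes, run the single-scale
    detection step on each, and map positions back to original coordinates."""
    all_detections = []
    for f in factors:
        h, w = image.shape[:2]
        scaled = cv2.resize(image, (max(1, int(w * f)), max(1, int(h * f))))
        for pos, prob, kind in detect_fn(scaled):
            x, y = pos
            all_detections.append(((x / f, y / f), prob, kind))
    return all_detections
```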
12. An object detection device, comprising:
the detection object detection module is used for acquiring position information and discrimination probability of different detection objects of the original image; wherein the different detection objects include the whole target and at least one component constituting the whole target;
the position relation combination acquisition module is used for traversing all the detection objects according to the position information and the discrimination probability of each detection object to obtain all the detection object position relation combinations; the detection object position relation combination consists of the position information of any two different detection objects;
the probability calculation module is used for calculating the occurrence probability of the position relation between the two detection objects corresponding to each detection object position relation combination according to a pre-constructed probability model;
a threshold condition judging module, configured to judge whether a discrimination probability of two detection objects corresponding to each detection object position relationship combination and an occurrence probability of a position relationship between the two detection objects satisfy a preset threshold condition;
and the target position acquisition module is used for acquiring the position information of the whole target corresponding to the detection object position relation combination when a preset threshold condition is met.
13. An object detection apparatus, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the object detection method according to any one of claims 1 to 11 when executing the computer program.
CN202210164064.XA 2022-02-22 2022-02-22 Target detection method and device Pending CN114565940A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210164064.XA CN114565940A (en) 2022-02-22 2022-02-22 Target detection method and device

Publications (1)

Publication Number Publication Date
CN114565940A true CN114565940A (en) 2022-05-31

Family

ID=81713226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210164064.XA Pending CN114565940A (en) 2022-02-22 2022-02-22 Target detection method and device

Country Status (1)

Country Link
CN (1) CN114565940A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination