CN111783683A - Human body detection method based on feature balance and relationship enhancement - Google Patents


Info

Publication number: CN111783683A
Application number: CN202010634855.5A
Authority: CN (China)
Prior art keywords: human body, feature, scale, training, balance
Legal status: Withdrawn
Other languages: Chinese (zh)
Inventor: 安玉山
Current Assignee: Beijing Shizhen Intelligent Technology Co ltd
Original Assignee: Beijing Shizhen Intelligent Technology Co ltd
Application filed by Beijing Shizhen Intelligent Technology Co ltd
Priority to CN202010634855.5A
Publication of CN111783683A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V 40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/24 — Classification techniques
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/40 — Extraction of image or video features
    • G06V 10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components

Abstract

The invention discloses a human body detection method based on feature balance and relationship enhancement, applied to detecting human targets of different scenes at multiple scales. The method balances the feature information of each scale by fusing multi-scale feature information, and further strengthens feature expression by exploiting the implicit relationship between the background and the human body, giving better feature extraction and detection performance for human bodies of different scales and postures. Compared with the prior art, the method fuses and enhances the semantic information of multi-layer features, improves the model's ability to extract multi-scale human features, and copes with diverse human prediction tasks in different scenes. By using multiple pre-training stages and a balanced training-sampling technique, it needs fewer human training samples than other detection methods, improves the generalization and robustness of the model for human detection in different scenes, and is suitable for wide adoption.

Description

Human body detection method based on feature balance and relationship enhancement
Technical Field
The invention relates to the technical field of human body detection in computer vision, in particular to a human body detection method based on feature balance and relationship enhancement.
Background
Human body detection aims to determine whether a human target is present in an image scene and to give its position. Across application scenarios, the main difficulties of human detection are the diversity of human actions and postures, the complexity of the background, occlusion of humans by buildings and vehicles, mutual occlusion among humans, changes in illumination and viewing angle, and the diversity of human scales caused by shooting distance and scene changes. These difficulties pose many challenges to human detection research and reduce the detection accuracy of existing algorithms in diverse scenes.
At present, mainstream human detection methods treat the person as a whole, represent the human body with a rectangular box, extract traditional hand-crafted features such as wavelet features and histograms of oriented gradients, and then classify them with a classifier. In complex backgrounds, a conventional image pyramid or feature pyramid is used to fit human features at different scales and improve the detection accuracy of multi-scale human targets; however, for very small targets the boundary and appearance remain blurred, making them hard to distinguish from cluttered backgrounds and other overlapping targets. Dividing the torso into several parts and analysing or predicting the occlusion of each part can address occlusion, but this requires more complex annotation data and model inference, so the model cost is higher.
Given the shortcomings of the prior art, there is an urgent need for a human body detection method that addresses low detection accuracy, blurred boundaries and appearance of very small targets, high model cost, poor generalization and robustness across scenes, and the inability to cope with spoofing attacks such as copying and replaying.
Disclosure of Invention
In view of the above defects, the technical problem to be solved by the present invention is to provide a human body detection method based on feature balance and relationship enhancement, so as to solve the prior-art problems of low human detection accuracy in diverse scenes, blurred boundaries and appearance of very small targets, high model cost, poor generalization and robustness of the model for human detection in different scenes, and the inability to cope with spoofing attacks such as copying and replaying.
The invention provides a human body detection method based on feature balance and relationship enhancement, which comprises the following specific steps:
step 1, performing model pre-training on a detection model to obtain the detection model with better extraction capability and sensitivity on the characteristic features of a human body;
step 2, performing multi-scale feature fusion on the detection model to obtain a multi-scale feature pyramid;
step 3, enhancing the relation of image features of the image features after the multi-scale feature fusion to obtain fusion features after the relation enhancement;
step 4, performing multi-scale feature redistribution on the features of the detection model based on the feature result after the previous fusion and enhancement;
step 5, carrying out balanced sampling on the real sample frame data in the detection model by adopting a negative sample sampling and positive sample sampling method;
and 6, predicting and training the characteristics of the human bodies with different scales on different levels respectively by using a detection model according to the adjusted characteristic pyramid.
Preferably, the step 1 specifically comprises the following steps:
step 1.1, carrying out first-round pre-training on a detection model by adopting a huge universal object detection data set to obtain a detection model with higher generalization characteristic extraction capability;
step 1.2, after the first round of pre-training is finished, adjusting and detecting the top layer structure of the model;
and step 1.3, performing secondary pre-training by mixing a sample containing a human body target in a general scene to obtain a detection model with better extraction capability and sensitivity on the human body characteristic features.
Preferably, the step 2 specifically comprises the following steps:
2.1, selecting a proper intermediate scale, wherein the selection rule is: when the feature pyramid has n layers, the scale of the features at the ⌊n/2⌋-th layer (n/2 rounded down) is selected as the intermediate scale;
step 2.2, scaling the features of the detection model with bilinear interpolation to obtain a feature pyramid that retains as much of the information in the original features as possible;
2.3, simply stacking on the channel, and fusing to obtain a new characteristic diagram containing all levels of information;
and 2.4, compressing the number of channels of the new feature map to the number of channels before fusion by using an additional module to obtain a fused multi-scale feature pyramid.
Preferably, the specific steps of step 2.2 include:
step 2.2.1, setting relevant parameters, wherein the parameters comprise coefficients which need to be multiplied by the central value;
step 2.2.2, performing linear interpolation in one direction by using the grey levels of the four pixels adjacent to the pixel to be computed; since the grey level changes linearly from f(i, j) to f(i, j+1):
f(i, j+v) = [f(i, j+1) − f(i, j)] × v + f(i, j),
f(i+1, j+v) = [f(i+1, j+1) − f(i+1, j)] × v + f(i+1, j);
step 2.2.3, since the grey level also changes linearly from f(i, j+v) to f(i+1, j+v), obtaining the bilinearly interpolated pixel value:
f(i+u, j+v) = [f(i+1, j+v) − f(i, j+v)] × u + f(i, j+v).
preferably, the specific method for relationship enhancement in step 3 includes using a relationship metric function in a training process
Figure BDA0002567878860000032
It is derived that the relation between the 256 one-dimensional vectors H and the 256 one-dimensional vectors G measures the function value as:
Figure BDA0002567878860000033
further obtaining a relation-enhanced fused characteristic F'mComprises the following steps:
Figure BDA0002567878860000034
wherein α, β are parameters that can be learned in training, FmThe fused features are shown, the 256 one-dimensional vector H is the pooled features of the human instance, and the 256 one-dimensional vector G is the pooled features of a certain area around.
Preferably, the step 4 comprises:
step 4.1, performing the corresponding scaling operation on the fused and enhanced feature F′_m;
step 4.2, for the characteristic that the original scale is smaller than the intermediate scale after the fusion, adopting a pooling layer to carry out down-sampling;
4.3, performing up-sampling on the characteristics of which the original scale is larger than the intermediate scale after fusion by adopting bilinear interpolation;
and 4.4, adjusting the characteristics that the original scale is equal to the intermediate scale after fusion by adopting a convolution layer with unchanged size.
Preferably, the step 5 specifically comprises the following steps:
step 5.1, representing the training difficulty of a negative sample box by its intersection-over-union (IoU) with the real sample box, and applying an IoU-balanced negative sampling method to the sample data of the detection model;
and 5.2, measuring the representativeness of positive sampling by the number of positive samples matched to each real sample box, and applying an instance-balanced positive sampling method to the sample data of the detection model.
Preferably, the method in step 5.1 specifically divides all negative samples into K levels by using the label information of the image, and the number of negative samples randomly sampled at the k-th level is:
M_k = N / K,
wherein N is the total number of negative samples to be drawn; if the number N_k of negative samples at the k-th level is less than M_k, all negative samples at the k-th level are drawn, the negative samples at the (k+1)-th level are sorted by IoU, and the first M_k − N_k in ascending order are added as negative samples for the k-th level.
Preferably, the method in step 5.2 specifically assigns all positive samples to the P human body annotation boxes, and the number of positive samples randomly drawn around each annotation box is:
M / P,
wherein M is the number of positive samples to be drawn for training and P is the total number of human annotation boxes in the image.
Preferably, the step 6 is specifically:
6.1, in the training process, matching human bodies of different scales to a single optimal level each and training them there independently of one another;
and 6.2, in the prediction process, merging the prediction results of all levels with equal priority and obtaining the final prediction result with a single-class NMS algorithm.
According to the scheme, the human body detection method based on feature balance and relationship enhancement overcomes the shortcomings of existing general-purpose human detection techniques. The detection method, based on multi-scale feature fusion and fusion-feature enhancement, is applied to detecting human targets of different scenes at multiple scales; it balances the feature information of each scale by fusing multi-scale feature information and further strengthens feature expression by exploiting the implicit relationship between the background and the human body, giving better feature extraction and detection performance for human bodies of different scales and postures. Compared with the prior art, the method fuses and enhances the semantic information of multi-layer features, improves the model's ability to extract multi-scale human features, and copes with diverse human prediction tasks in different scenes. By using multiple pre-training stages and a balanced training-sampling technique, it needs fewer human training samples than other detection methods, improves the generalization and robustness of the model for human detection in different scenes, and is suitable for wide adoption.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are obviously only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is the first process block diagram of a human body detection method based on feature balance and relationship enhancement according to an embodiment of the present invention;
Fig. 2 is the second process block diagram of a human body detection method based on feature balance and relationship enhancement according to an embodiment of the present invention;
Fig. 3 is the third process block diagram of a human body detection method based on feature balance and relationship enhancement according to an embodiment of the present invention;
Fig. 4 is the fourth process block diagram of a human body detection method based on feature balance and relationship enhancement according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only a part of the embodiments of the present invention, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Referring to fig. 1 to 4, a specific embodiment of a human body detection method based on feature balance and relationship enhancement according to the present invention will now be described. The human body detection method based on feature balance and relationship enhancement balances feature information of each scale by fusing multi-scale feature information, and further enhances feature expression by using implicit relationship between a background and a human body, and the method specifically comprises the following steps:
and S1, performing model pre-training on the detection model to obtain the detection model with better extraction capability and sensitivity on the characteristic features of the human body.
Model pre-training: the same or a similar learning task is first carried out on a large-scale data set, so that all or part of the parameters of the model are initialized more reasonably.
This addresses the problem that the human samples in a single data set cover a single scene with low sample diversity, so that training on it directly causes overfitting and makes it hard to obtain a detection model with good generalization and robustness. By learning feature extraction for similar or other object classes during pre-training, the detection model avoids the pitfalls of a small or highly similar final training set, extracts features with better generalization and robustness, prevents overfitting, and achieves better performance.
The specific implementation steps of the step can be as follows:
s1.1, performing first-round pre-training on a detection model by adopting a huge universal object detection data set to obtain model parameters with better generalization performance, improving the feature extraction capability of the model, and finally obtaining the detection model with higher generalization feature extraction capability;
specific universal object detection datasets are, for example, PASCAL VOC, MS COCO and ImageNet.
S1.2, adjusting the top layer structure of the model after the first round of pre-training is finished;
the following are exemplary: when the number of categories of the general object detection data set is 80, the number of classification convolution channels at the top layer of the model is 80. At this time, the parameter value of the classified convolution is deleted, the number of the convolution channels is changed into the number of the classes required by human body detection, and the parameter value of the new convolution is initialized randomly.
S1.3, performing secondary pre-training by mixing a sample containing a human body target in a general scene, improving the extraction capability and sensitivity of the model to the human body characteristic features, and obtaining a detection model with better extraction capability and sensitivity to the human body characteristic features.
And S2, performing multi-scale feature fusion on the detection model, and obtaining a multi-scale feature pyramid based on the feature pyramid frame.
The feature pyramid is a basic component in a recognition system for detecting objects of different dimensions.
The specific implementation steps of the step can be as follows:
s2.1, selecting a proper intermediate scale, wherein the selection rule of the intermediate scale is as follows: when the number of the feature pyramid layers is n, selecting the scale of the feature of the first floor (n/2) (rounding down) layer as the middle scale;
s2.2, scaling by using bilinear interpolation to obtain a characteristic pyramid which retains as much information as possible in the original characteristics;
the step solves the problem that the influence caused by different feature sizes of the features of each layer is caused by the fact that the multi-scale feature pyramid obtained based on the feature pyramid frame, so that as much information as possible in the original features is reserved as much as possible.
The bilinear interpolation is used in the up-sampling process of images or features, the bilinear interpolation method has no defect of discontinuous gray scale, the result is satisfactory, and the specific calculation steps are as follows:
s2.2.1, setting relevant parameters, wherein the parameters comprise coefficients to be multiplied by the central value;
s2.2.2, linear interpolation is carried out in two directions by utilizing the gray scales of four adjacent pixels of the pixel to be solved, and the gray scale change from f (i, j) to f (i, j +1) is a linear relation, and the result is that:
for f (i, j + v), f (i, j + v) ═ f (i, j +1) -f (i, j) ] × v + f (i, j)
For (i +1, j + v), f (i +1, j + v) ═ f (i +1, j +1) -f (i +1, j) ] × v + f (i +1, j)
S2.2.3, obtaining the pixel gray value of bilinear interpolation according to the linear relationship of the gray change from f (i, j + v) to f (i +1, j + v):
Figure BDA0002567878860000071
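The two-step interpolation above can be sketched in Python (a minimal NumPy sketch; the function name `bilinear_sample` and the clamping of border indices are illustrative assumptions, not the patent's code):

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Sample img at fractional coordinates (y, x) = (i+u, j+v)
    by interpolating linearly along each axis in turn."""
    i, j = int(np.floor(y)), int(np.floor(x))
    u, v = y - i, x - j
    # clamp the far neighbours so border pixels stay in range
    i1 = min(i + 1, img.shape[0] - 1)
    j1 = min(j + 1, img.shape[1] - 1)
    # first interpolate along j on rows i and i+1 ...
    top = (img[i, j1] - img[i, j]) * v + img[i, j]
    bot = (img[i1, j1] - img[i1, j]) * v + img[i1, j]
    # ... then along i between the two intermediate values
    return (bot - top) * u + top
```

For the 2 × 2 image [[0, 1], [2, 3]], sampling at the centre (0.5, 0.5) averages all four pixels and returns 1.5.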
s2.3, simply overlapping on the channel, fusing to obtain a new feature map containing all levels of information, and not destroying the original semantic and spatial information of the zoomed features as far as possible;
and S2.4, compressing the number of channels of the new feature graph to the number of channels before fusion by using an additional module, and completing the whole feature fusion process.
The features are all three-dimensional matrices with dimensions (C, H, W), where C denotes the channel dimension; here C is 256, i.e. each feature has 256 channels.
Features can be fused in two ways; for example, to fuse 2 features: method one, add them element by element, which again yields a feature of size (C, H, W); method two, stack them along the channel dimension, which yields a feature of size (n × C, H, W), and then use an additional module (a 3 × 3 convolution plus a ReLU activation function) to convert the (n × C, H, W) feature back to size (C, H, W).
And S3, performing image feature relationship enhancement on the image features subjected to multi-scale feature fusion to obtain fusion features subjected to relationship enhancement.
Specifically, F_m denotes the fused feature. The correlation between the human instances in the image and the appearance of the surrounding background is exploited by a relationship metric function used in the training process,
R(H, G) = (H · G) / (‖H‖ ‖G‖),
which gives the relation metric value between the 256-dimensional vector H and the 256-dimensional vector G. The relation-enhanced fused feature F′_m is then:
F′_m = α · F_m + β · R(H, G) · F_m,
wherein α, β are parameters learned in training, the 256-dimensional vector H is the pooled feature of a human instance, and the 256-dimensional vector G is the pooled feature of a surrounding area.
The implicit relationship between the background and the human body is computed as follows: suppose the feature is F of size (C, H, W); the relationship value at position (h, w) is the average cosine similarity between F(h, w) (a C × 1 vector) and the 24 surrounding vectors F(h−2, w−2), F(h−2, w−1), …, F(h+2, w+2) in its 5 × 5 neighbourhood. When a surrounding vector does not exist, for example when h = 1, a zero vector is used instead.
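The neighbourhood-cosine computation above can be sketched as follows (a NumPy sketch; `relation_map` is a hypothetical name, and zero padding implements the zero-vector rule at the border):

```python
import numpy as np

def relation_map(F):
    """Relation value at each (h, w): mean cosine similarity between the
    C-dim vector F[:, h, w] and its surrounding vectors in a 5x5 window
    (out-of-range neighbours are treated as zero vectors via padding)."""
    C, H, W = F.shape
    Fp = np.pad(F, ((0, 0), (2, 2), (2, 2)))  # zero-pad spatial borders
    R = np.zeros((H, W))
    for h in range(H):
        for w in range(W):
            center = F[:, h, w]
            cn = np.linalg.norm(center)
            vals = []
            for dh in range(-2, 3):
                for dw in range(-2, 3):
                    if dh == 0 and dw == 0:
                        continue  # skip the centre vector itself
                    g = Fp[:, h + 2 + dh, w + 2 + dw]
                    gn = np.linalg.norm(g)
                    vals.append(0.0 if cn * gn == 0 else center @ g / (cn * gn))
            R[h, w] = np.mean(vals)
    return R
```

For an all-ones 3 × 3 × 3 feature, the centre position sees 8 identical in-range neighbours (cosine 1) and 16 zero-padded ones, so its relation value is 8/24.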
And S4, performing multi-scale feature reallocation on the features of the detection model based on the feature results after the previous fusion and enhancement.
The specific implementation steps of the step can be as follows:
s4.1, based on the size of the feature pyramid before the previous fusion, and according to the scale of each layer of features in the previous feature pyramid, fusing and enhancing the feature F'mCarrying out corresponding scaling operation;
s4.2, for the characteristic that the original scale is smaller than the intermediate scale after fusion, adopting a pooling layer to perform down-sampling;
s4.3, performing up-sampling on the characteristic that the original scale is larger than the intermediate scale after fusion by adopting bilinear interpolation;
in upsampling bilinear interpolation, an exemplary: when a certain pixel value of the original feature scale is f (i, j), the adjacent four pixel values of the (i, j), (2 × i +1, 2 × j), (2 × i, 2 × j +1), (2 × i +1, 2 × j +1) positions after upsampling are equal to f (i, j).
And S4.4, adjusting the characteristics that the original scale is equal to the intermediate scale after fusion by adopting a convolution layer with unchanged size.
And S5, carrying out balanced sampling on the real sample frame data in the detection model by adopting a negative sample sampling method and a positive sample sampling method.
Illustratively, of the 256 samples in each training, 128 negative samples and 128 positive samples are taken.
The specific implementation steps of the step can be as follows:
s5.1, representing the training difficulty of the negative sample box by using the size of the cross-over ratio (IOU) of the negative sample box and the real sample box, and carrying out a negative sample sampling method based on the cross-over ratio balance on the sample data of the detection model.
Specifically, the selection probability of the hard samples is increased to improve the representativeness of the sampled negative samples, and all the negative samples are used for detecting the accuracy of corresponding objects in a specific data set according to the IOU (interaction over Unit) measurement of a real human body frame by utilizing the marking information of the imageOne standard) into K levels, the number of negative samples randomly sampled at the K level is:
Figure BDA0002567878860000081
wherein N is the total number of negative samples to be extracted, if the negative samples of the k-th level appear, the number NkLess than MkIf yes, all negative samples in the kth level are extracted, the negative samples in the (k +1) th level are sorted according to IOU, and M is selected according to ascending orderk-NkThe negative examples are supplemented as negative examples for the kth level. The finally extracted negative samples have different IOU from the real frames and are distributed uniformly.
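The IoU-balanced negative sampling described above can be sketched as follows (a simplified Python sketch; the function name `iou_balanced_negatives`, the equal-width IoU binning, and the deficit carry-over into the next bin are illustrative assumptions):

```python
import random

def iou_balanced_negatives(negatives, N, K):
    """negatives: list of (sample_id, iou) pairs.  Split the IoU range
    into K equal bins, draw about N/K from each; if a bin is short,
    top up from the next bin in ascending IoU order."""
    random.seed(0)
    hi = max(iou for _, iou in negatives) + 1e-9
    bins = [[] for _ in range(K)]
    for s, iou in negatives:
        k = min(int(iou / hi * K), K - 1)
        bins[k].append((s, iou))
    per = N // K
    chosen, deficit = [], 0
    for k in range(K):
        want = per + deficit
        if len(bins[k]) <= want:
            chosen += bins[k]                 # take the whole short bin
            deficit = want - len(bins[k])     # carry shortfall forward
        else:
            pool = sorted(bins[k], key=lambda p: p[1])
            chosen += pool[:deficit]          # top-ups: ascending IoU first
            chosen += random.sample(pool[deficit:], per)
            deficit = 0
    return chosen
```

With N = 4 and K = 2, two easy negatives (low IoU) and two hard negatives (high IoU) are drawn, instead of the mostly easy negatives that uniform sampling tends to produce.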
S5.2, measuring the representativeness of positive sampling by the number of positive samples matched to each real sample box, and applying an instance-balanced positive sampling method to the sample data of the detection model.
Specifically, all positive samples are assigned to the P human body annotation boxes, and the number of positive samples randomly drawn around each annotation box is:
M / P,
wherein M is the number of positive samples to be drawn for training and P is the total number of human annotation boxes in the image.
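The instance-balanced positive sampling rule can be sketched as follows (a minimal Python sketch; `instance_balanced_positives` and its handling of boxes with too few matches are illustrative assumptions):

```python
import random

def instance_balanced_positives(pos_by_box, M):
    """pos_by_box: dict mapping each ground-truth box id to its list of
    matched positive samples.  Draw about M/P from each of the P boxes
    so that no single human instance dominates the positives."""
    random.seed(0)
    P = len(pos_by_box)
    per = max(M // P, 1)
    chosen = []
    for box, samples in pos_by_box.items():
        # a box with fewer than per matches contributes all it has
        chosen += random.sample(samples, min(per, len(samples)))
    return chosen
```

With M = 4 and two annotation boxes, two positives are drawn around each box, even if one box has many more candidate positives than the other.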
And S6, predicting and training the characteristics of the human body with different scales on different levels respectively by using a detection model according to the adjusted characteristic pyramid.
The specific implementation steps of the step can be as follows:
s6.1, in the training process, independently matching human bodies with different scales at a certain optimal level for training, wherein the human bodies are independent of each other;
the optimal level is related to anchor point frame design of the model, anchor point frames of different levels are different in size, after the size of the anchor point frames is compared with the size of a marking frame of a specific human body, the level with smaller frame size difference is selected as the optimal level used in training, and training is not carried out on other levels.
S6.2, during prediction, merging the prediction results of all levels with equal priority and obtaining the final result with a single-class NMS (non-maximum suppression) algorithm.
"Multi-layer result fusion with equal priority" means that the predictions of all levels are pooled together and sorted directly by prediction score; the final prediction keeps the highest-scoring boxes that survive suppression.
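The merge-then-suppress step can be sketched as follows (a generic single-class NMS sketch, not the patent's exact code; boxes are (x1, y1, x2, y2) and the 0.5 IoU threshold is an assumption):

```python
def nms(boxes, scores, iou_thr=0.5):
    """Single-class NMS: sort all merged predictions by score (equal
    priority across pyramid levels), greedily keep the best box and
    drop any remaining box overlapping it above iou_thr."""
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-9)
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)  # highest remaining score wins
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thr]
    return keep
```

Two heavily overlapping boxes collapse to the higher-scoring one, while a distant box is kept regardless of its score.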
The method is applied to detecting human targets of different scenes at multiple scales. It balances the feature information of each scale by fusing multi-scale feature information and further strengthens feature expression by exploiting the implicit relationship between the background and the human body, giving better feature extraction and detection performance for human bodies of different scales and postures. By fusing the semantic information of features at different depths with surrounding background information, the feature-balance and relationship-enhancement based method overcomes the shortcomings of existing human detection techniques, improves the robustness of human detection, and thus achieves better detection performance.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A human body detection method based on feature balance and relationship enhancement is characterized by comprising the following specific steps:
step 1, performing model pre-training on a detection model to obtain the detection model with better extraction capability and sensitivity on the characteristic features of a human body;
step 2, performing multi-scale feature fusion on the detection model to obtain a multi-scale feature pyramid;
step 3, performing relationship enhancement on the image features after the multi-scale feature fusion to obtain relationship-enhanced fusion features;
step 4, performing multi-scale feature redistribution on the features of the detection model based on the fused and relationship-enhanced features obtained above;
step 5, performing balanced sampling of the ground-truth sample box data in the detection model using negative-sample and positive-sample sampling methods;
step 6, according to the adjusted feature pyramid, separately predicting and training human body features of different scales at different levels with the detection model.
2. The human body detection method based on feature balance and relationship enhancement as claimed in claim 1, wherein the specific steps of step 1 include:
step 1.1, performing a first round of pre-training of the detection model on a large general object detection dataset to obtain a detection model with strong generalized feature extraction capability;
step 1.2, after the first round of pre-training is finished, adjusting the top-layer structure of the detection model;
step 1.3, performing a second round of pre-training on mixed samples containing human targets in general scenes to obtain a detection model with better extraction capability for, and sensitivity to, distinctive human body features.
3. The human body detection method based on feature balance and relationship enhancement as claimed in claim 2, wherein the step 2 comprises the following specific steps:
step 2.1, selecting a proper intermediate scale, wherein the selection rule of the intermediate scale is: when the feature pyramid has n layers, the scale of the features at the floor(n/2)-th layer (rounding down) is selected as the intermediate scale;
step 2.2, rescaling the feature maps of the detection model to the intermediate scale using bilinear interpolation, obtaining a feature pyramid that preserves as much information of the original features as possible;
step 2.3, stacking the rescaled features directly along the channel dimension and fusing them to obtain a new feature map containing information of all levels;
and 2.4, compressing the number of channels of the new feature map to the number of channels before fusion by using an additional module to obtain a fused multi-scale feature pyramid.
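The fusion procedure of claim 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a nearest-neighbour resize stands in for the bilinear interpolation of step 2.2, a fixed channel average stands in for the learned compression module of step 2.4, and the 0-indexed `levels[n // 2]` approximates the floor(n/2)-layer rule of step 2.1.

```python
import numpy as np

def resize_nearest(feat, out_h, out_w):
    # feat: (C, H, W); nearest-neighbour stand-in for the bilinear resize of step 2.2
    c, h, w = feat.shape
    ys = np.arange(out_h) * h // out_h
    xs = np.arange(out_w) * w // out_w
    return feat[:, ys][:, :, xs]

def fuse_pyramid(levels):
    """levels: list of (C, H_l, W_l) feature maps, fine to coarse.
    Returns the fused (C, H_m, W_m) map at the middle scale."""
    n = len(levels)
    mid = levels[n // 2]                                   # step 2.1: middle-scale level
    h, w = mid.shape[1:]
    rescaled = [resize_nearest(f, h, w) for f in levels]   # step 2.2: bring all to one scale
    stacked = np.concatenate(rescaled, axis=0)             # step 2.3: stack on channel axis
    # step 2.4: compress back to C channels; a fixed per-level average stands in
    # for the learned extra module the claim leaves unspecified.
    c = mid.shape[0]
    return stacked.reshape(n, c, h, w).mean(axis=0)
```

Each level contributes equally to every spatial location of the fused map, which is the "balance" the claim aims for.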
4. The human body detection method based on feature balance and relationship enhancement as claimed in claim 3, wherein the specific steps of step 2.2 include:
step 2.2.1, setting relevant parameters, wherein the relevant parameters comprise coefficients which need to be multiplied by the central value;
step 2.2.2, performing linear interpolation in two directions using the gray values of the four pixels adjacent to the pixel to be computed; from the linear variation of the gray level between f(i, j) and f(i, j + 1):
for (i, j + v): f(i, j + v) = [f(i, j + 1) − f(i, j)] × v + f(i, j),
for (i + 1, j + v): f(i + 1, j + v) = [f(i + 1, j + 1) − f(i + 1, j)] × v + f(i + 1, j);
step 2.2.3, since the gray level also varies linearly from f(i, j + v) to f(i + 1, j + v), the bilinearly interpolated pixel gray value is:
f(i + u, j + v) = [f(i + 1, j + v) − f(i, j + v)] × u + f(i, j + v) = (1 − u)(1 − v)f(i, j) + (1 − u)v·f(i, j + 1) + u(1 − v)f(i + 1, j) + uv·f(i + 1, j + 1).
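The interpolation of steps 2.2.2 and 2.2.3 can be checked with a few lines of code; the helper name `bilinear_sample` is hypothetical:

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Sample img at fractional coordinates (y, x) = (i + u, j + v):
    two horizontal linear interpolations (step 2.2.2) followed by one
    vertical interpolation between the intermediate values (step 2.2.3)."""
    i, j = int(np.floor(y)), int(np.floor(x))
    u, v = y - i, x - j
    f = img.astype(float)
    top = (f[i, j + 1] - f[i, j]) * v + f[i, j]              # f(i, j+v)
    bot = (f[i + 1, j + 1] - f[i + 1, j]) * v + f[i + 1, j]  # f(i+1, j+v)
    return (bot - top) * u + top                             # f(i+u, j+v)
```

At the center of a 2x2 patch the result is simply the mean of the four neighbours, a quick sanity check of the formula.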
5. the human body detection method based on feature balance and relationship enhancement as claimed in claim 4, wherein the specific method of relationship enhancement in step 3 comprises using a relationship metric function in a training process
Figure FDA0002567878850000022
it is derived that the value of the relation metric function between the 256-dimensional vector H and the 256-dimensional vector G is:
Figure FDA0002567878850000023
and further the relationship-enhanced fusion feature Fm′ is obtained as:
Figure FDA0002567878850000024
wherein α and β are parameters learnable during training, Fm denotes the fused feature, the 256-dimensional vector H is the pooled feature of a human instance, and the 256-dimensional vector G is the pooled feature of a certain surrounding region.
6. The human body detection method based on feature balance and relationship enhancement as claimed in claim 5, wherein the step 4 comprises:
step 4.1, performing the corresponding scaling operations on the fused and enhanced feature Fm′;
step 4.2, down-sampling, with a pooling layer, the features whose original scale is smaller than the intermediate scale after fusion;
step 4.3, up-sampling, with bilinear interpolation, the features whose original scale is larger than the intermediate scale after fusion;
step 4.4, adjusting, with a size-preserving convolution layer, the features whose original scale is equal to the intermediate scale after fusion.
7. The human body detection method based on feature balance and relationship enhancement as claimed in claim 6, wherein the specific steps of the step 5 include:
step 5.1, using the intersection-over-union (IOU) between a negative sample box and the ground-truth sample box to represent the training difficulty of the negative sample, and applying an IOU-balanced negative sample sampling method to the sample data of the detection model;
step 5.2, measuring the representativeness of positive sampling by the number of positive samples matched to each ground-truth sample box, and applying an instance-balanced positive sample sampling method to the sample data of the detection model.
8. The human body detection method based on feature balance and relationship enhancement as claimed in claim 7, wherein the method of step 5.1 specifically comprises dividing all negative samples into K levels using the annotation information of the image, the number of negative samples randomly sampled at the k-th level being:
Mk = N/K,
wherein N is the total number of negative samples to be drawn; if the number Nk of negative samples appearing at the k-th level is less than Mk, all negative samples at the k-th level are extracted, the negative samples at the (k + 1)-th level are sorted by IOU, and the first Mk − Nk in ascending order are selected to supplement the negative samples of the k-th level.
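Claim 8's IOU-balanced negative sampling might look like the following sketch; the function name, the uniform bin edges over [0, 1), and the exact backfill bookkeeping are assumptions beyond what the claim states:

```python
import random

def iou_balanced_negatives(neg_ious, n_total, k_bins):
    """Split negatives into K equal IOU bins and draw N/K from each;
    when a bin underflows, backfill from the next bin in ascending IOU order."""
    per_bin = n_total // k_bins                  # Mk = N/K
    bins = [[] for _ in range(k_bins)]
    for idx, iou in enumerate(neg_ious):
        b = min(int(iou * k_bins), k_bins - 1)   # assumed uniform bins over [0, 1)
        bins[b].append(idx)
    chosen = []
    for k in range(k_bins):
        if len(bins[k]) >= per_bin:
            chosen += random.sample(bins[k], per_bin)
        else:
            chosen += bins[k]                    # take all Nk < Mk samples
            deficit = per_bin - len(bins[k])
            if k + 1 < k_bins:
                # supplement with the Mk - Nk lowest-IOU samples of the next bin
                pool = sorted(bins[k + 1], key=lambda i: neg_ious[i])
                take = pool[:deficit]
                chosen += take
                bins[k + 1] = [i for i in bins[k + 1] if i not in take]
    return chosen
```

The effect is that easy negatives (low IOU) cannot crowd out the rarer hard negatives (high IOU) during training.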
9. The human body detection method based on feature balance and relationship enhancement according to claim 8, wherein the method of step 5.2 specifically comprises: all positive samples correspond to P human body annotation boxes, and the number of positive samples randomly drawn around each annotation box is:
M/P,
wherein M is the number of positive samples to be drawn for training and P is the total number of human body annotation boxes in the image.
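Claim 9's instance-balanced positive sampling is simpler; a hypothetical sketch (the function name and the per-box candidate-list input format are assumptions):

```python
import random

def instance_balanced_positives(pos_by_box, m_total):
    """pos_by_box: list of P candidate lists, one per ground-truth box.
    Draw M/P positives around each box so no single instance dominates."""
    p = len(pos_by_box)
    per_box = m_total // p                       # M/P per annotation box
    chosen = []
    for cands in pos_by_box:
        chosen += random.sample(cands, min(per_box, len(cands)))
    return chosen
```

This keeps heavily-matched large instances from dominating the positive set at the expense of small or occluded ones.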
10. The human body detection method based on feature balance and relationship enhancement according to claim 9, wherein the step 6 is specifically:
step 6.1, during training, matching human bodies of different scales each to a single optimal level for training, independently of one another;
step 6.2, during prediction, sorting the prediction results of all levels together with equal priority by means of multilayer result fusion, and obtaining the final prediction result with a single-class NMS algorithm.
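Step 6.2 ends with a single-class NMS over the pooled predictions of all levels; a minimal greedy sketch, with the (x1, y1, x2, y2) box format assumed:

```python
def nms_single_class(boxes, scores, iou_thr=0.5):
    """Merge the boxes of every pyramid level into one pool, sort by score
    with equal priority across levels, then greedily keep boxes that do not
    overlap an already-kept box by more than iou_thr."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter) if inter else 0.0
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return keep
```

Because the pool is single-class (human only), one pass over the sorted list suffices; no per-class partitioning is needed.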
CN202010634855.5A 2020-07-03 2020-07-03 Human body detection method based on feature balance and relationship enhancement Withdrawn CN111783683A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010634855.5A CN111783683A (en) 2020-07-03 2020-07-03 Human body detection method based on feature balance and relationship enhancement


Publications (1)

Publication Number Publication Date
CN111783683A true CN111783683A (en) 2020-10-16

Family

ID=72758523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010634855.5A Withdrawn CN111783683A (en) 2020-07-03 2020-07-03 Human body detection method based on feature balance and relationship enhancement

Country Status (1)

Country Link
CN (1) CN111783683A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966697A (en) * 2021-03-17 2021-06-15 西安电子科技大学广州研究院 Target detection method, device and equipment based on scene semantics and storage medium


Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN112287860B (en) Training method and device of object recognition model, and object recognition method and system
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN114359851A (en) Unmanned target detection method, device, equipment and medium
CN111160249A (en) Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
CN111709313B (en) Pedestrian re-identification method based on local and channel combination characteristics
CN115631344B (en) Target detection method based on feature self-adaptive aggregation
CN111461213A (en) Training method of target detection model and target rapid detection method
CN115019201B (en) Weak and small target detection method based on feature refinement depth network
CN112287859A (en) Object recognition method, device and system, computer readable storage medium
CN112395962A (en) Data augmentation method and device, and object identification method and system
CN114037640A (en) Image generation method and device
CN113095152A (en) Lane line detection method and system based on regression
CN115238758A (en) Multi-task three-dimensional target detection method based on point cloud feature enhancement
CN112149526A (en) Lane line detection method and system based on long-distance information fusion
Barodi et al. An enhanced artificial intelligence-based approach applied to vehicular traffic signs detection and road safety enhancement
WO2023284255A1 (en) Systems and methods for processing images
Pervej et al. Real-time computer vision-based bangla vehicle license plate recognition using contour analysis and prediction algorithm
CN111783683A (en) Human body detection method based on feature balance and relationship enhancement
CN115861595A (en) Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning
CN115439926A (en) Small sample abnormal behavior identification method based on key region and scene depth
CN114359955A (en) Object visual field estimation method based on appearance features and space constraints
CN117237830B (en) Unmanned aerial vehicle small target detection method based on dynamic self-adaptive channel attention
CN115761552B (en) Target detection method, device and medium for unmanned aerial vehicle carrying platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201016