CN114359687A - Target detection method, device, equipment and medium based on multi-mode data dual fusion - Google Patents

Target detection method, device, equipment and medium based on multi-mode data dual fusion Download PDF

Info

Publication number
CN114359687A
CN114359687A
Authority
CN
China
Prior art keywords
image
frequency sub
band
fused
visible light
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111483806.7A
Other languages
Chinese (zh)
Other versions
CN114359687B (en)
Inventor
张浪文
张晋凯
解宇敏
刘洁耿
施唯
彭雄峰
胡俊嘉
罗其昆
时佰仟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202111483806.7A priority Critical patent/CN114359687B/en
Publication of CN114359687A publication Critical patent/CN114359687A/en
Application granted granted Critical
Publication of CN114359687B publication Critical patent/CN114359687B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a target detection method and device, electronic equipment and a storage medium based on multi-modal data dual fusion, wherein the method comprises the following steps: generating a fused image according to any pair of visible light and infrared images in a visible light-infrared target detection data set; forming data samples from the visible light image, the infrared image and the fused image; training detectors with the data samples to obtain trained detectors for the different modalities; generating a fused image to be detected according to a pair of visible light and infrared images to be detected; respectively inputting the visible light image, the infrared image and the fused image to be detected into the trained detectors of the corresponding modalities to obtain detection results; and fusing the detection results to obtain a final detection result. The invention comprehensively utilizes the advantages of pixel-level fusion and decision-level fusion and makes full use of the information of the visible light and infrared modalities, thereby achieving better all-weather detection performance.

Description

Target detection method, device, equipment and medium based on multi-mode data dual fusion
Technical Field
The invention relates to the technical field of target detection, in particular to a target detection method and device based on multi-mode data dual fusion, electronic equipment and a storage medium.
Background
In recent years, target detection technology based on visible light images has matured and is widely used in daily life and industrial production. However, visible light images are not expressive enough under poor lighting, so the technology adapts poorly to changes in environmental illumination and its detection performance drops sharply when light is insufficient. To address this problem, the current solution is to introduce infrared images alongside the original visible light images to weaken the influence of illumination conditions on detection performance. Infrared images are insensitive to illumination and can distinguish a target from the background by differences in thermal radiation, while visible light images, which are consistent with the human visual system, provide texture details with high spatial resolution and clarity. The two are therefore complementary: combining the thermal radiation information in the infrared image with the detailed texture information in the visible light image provides more robust target representation for the detection task and effectively improves target detection performance under all-weather conditions.
At present, target detection methods based on visible light and infrared data fusion fall into two main types: detection methods that fuse before detection, and detection methods that detect before fusion. In fusion-before-detection methods, the visible light and infrared images are first fused at the pixel level to obtain a fused image, which is then sent to a detector. Such methods retain most of the information of the original modalities, but they place high demands on image alignment and may introduce partial information redundancy. In detection-before-fusion methods, the detection results are fused: detectors of the corresponding modalities first detect the visible light image and the infrared image respectively, and the detection results are then combined while the optimal decision result is retained. These methods have the lowest degree of information redundancy, but the quality of the detection result depends heavily on the choice of detectors, and it is difficult for existing decision algorithms to break through the performance ceiling of a given detector after fusing the results.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a target detection method, device, equipment and medium based on multi-modal data dual fusion. The method comprehensively utilizes the advantages of pixel-level fusion and decision-level fusion, so that the information of the visible light modality and the infrared modality is used as fully as possible, giving better all-weather detection performance.
The invention aims to provide a target detection method based on multi-modal data dual fusion.
The second purpose of the invention is to provide an object detection device based on multi-modal data dual fusion.
A third object of the present invention is to provide an electronic apparatus.
It is a fourth object of the present invention to provide a storage medium.
The first purpose of the invention can be achieved by adopting the following technical scheme:
a method for target detection based on multi-modal data dual fusion, the method comprising:
generating a fusion image according to any pair of visible light images and infrared images in the visible light-infrared target detection data set; forming a data sample by the visible light image, the infrared image and the fused image;
training a detector by using the data sample to obtain trained detectors with different modes;
generating a fusion image to be detected according to a pair of visible light images and infrared images to be detected; inputting the pair of visible light images and infrared images to be detected and the fused image to be detected into a trained detector with a corresponding mode respectively to obtain a detection result;
and fusing the detection results to obtain a final detection result.
Further, the generating of the fused image according to any pair of visible light and infrared images in the visible light-infrared target detection data set adopts an image fusion algorithm based on multi-scale transformation, and specifically comprises:
processing the visible light image and the infrared image by adopting wavelet transformation to generate a low-frequency sub-band and a high-frequency sub-band;
respectively fusing the low-frequency sub-band and the high-frequency sub-band to obtain a fused low-frequency sub-band and a fused high-frequency sub-band;
and reconstructing the fused low-frequency sub-band and the fused high-frequency sub-band to generate a fused image.
Further, the processing the visible light image and the infrared image by using wavelet transform to generate a low frequency sub-band and a high frequency sub-band specifically includes:
when i is 1, carrying out ith decomposition on the visible light image to generate an ith low-frequency sub-band and t ith high-frequency sub-bands of the visible light image; carrying out ith decomposition on the infrared image to generate an ith low-frequency sub-band and t ith high-frequency sub-bands of the infrared image; wherein i is a positive integer greater than or equal to 1 and less than or equal to c, t is a first set threshold, and c is a second set threshold;
when i is larger than 1 and is smaller than or equal to c, carrying out ith decomposition on the i-1 th low-frequency sub-band of the visible light image to generate the ith low-frequency sub-band and t ith high-frequency sub-bands of the visible light image; and carrying out ith decomposition on the i-1 th low-frequency sub-band of the infrared image to generate the ith low-frequency sub-band and t ith high-frequency sub-bands of the infrared image.
Further, the fusing the low-frequency sub-band and the high-frequency sub-band respectively to obtain a fused low-frequency sub-band and a fused high-frequency sub-band specifically includes:
fusing the low-frequency sub-bands by adopting a window fusion rule to obtain a c-th fused low-frequency sub-band;
and fusing the high-frequency sub-bands by adopting a region characteristic energy fusion method to obtain fused high-frequency sub-bands.
Further, the low-frequency sub-bands include the c-th low-frequency sub-band LF_c^RGB(x, y) of the visible light image and the c-th low-frequency sub-band LF_c^IR(x, y) of the infrared image;
The method comprises the following steps of adopting a window fusion rule to fuse the low-frequency sub-bands to obtain a c-th fused low-frequency sub-band, and specifically comprising the following steps:
obtaining the c-th fused low-frequency sub-band by using the following formula:
LF_c^F(x, y) = α_1 · LF_c^RGB(x, y) + α_2 · LF_c^IR(x, y)
wherein x and y are the horizontal and vertical coordinates of the processing point on the image, α_1 and α_2 are the fusion coefficients of the visible light image and the infrared image respectively, and α_1 + α_2 = 1;
The t ith high-frequency sub-bands comprise ith high-frequency sub-bands in the horizontal direction, the vertical direction and the diagonal direction;
the high-frequency sub-bands comprise the high-frequency sub-bands in the ith horizontal direction, the vertical direction and the diagonal direction of the visible light image and the high-frequency sub-bands in the ith horizontal direction, the vertical direction and the diagonal direction of the infrared image;
fusing the high-frequency sub-bands by adopting a region characteristic energy fusion method to obtain fused high-frequency sub-bands, which specifically comprises the following steps:
respectively extracting edge image features of the first image and the second image by using a canny operator to obtain edge feature images, calculating the regional variance energy features through a sliding window, and respectively obtaining the regional energy values RGB_E and IR_E of the first image and the second image at the (x, y) position; wherein the first image and the second image are respectively the high-frequency sub-band in the i-th horizontal direction of the visible light image and the high-frequency sub-band in the i-th horizontal direction of the infrared image, the high-frequency sub-band in the i-th vertical direction of the visible light image and the high-frequency sub-band in the i-th vertical direction of the infrared image, and the high-frequency sub-band in the i-th diagonal direction of the visible light image and the high-frequency sub-band in the i-th diagonal direction of the infrared image;
selective fusion is performed by regional energy comparison, and a fusion formula is as follows:
DF_i^F(x, y) = DF_i^RGB(x, y), if RGB_E ≥ IR_E
DF_i^F(x, y) = DF_i^IR(x, y), if RGB_E < IR_E
wherein DF_i^RGB(x, y) and DF_i^IR(x, y) are the first image and the second image respectively, and DF_i^F(x, y) is the fused image;
after the fusion, the fusion high-frequency sub-band in the ith horizontal direction, the fusion high-frequency sub-band in the ith vertical direction and the fusion high-frequency sub-band in the ith diagonal direction are respectively obtained.
Further, reconstructing the fused low-frequency subband and the fused high-frequency subband to generate a fused image specifically includes:
when i = 1, reconstructing the c-th fused low-frequency sub-band, the c-th fused high-frequency sub-band in the horizontal direction, the c-th fused high-frequency sub-band in the vertical direction and the c-th fused high-frequency sub-band in the diagonal direction to generate the (c-1)-th fused low-frequency sub-band;
when i is greater than 1 and less than or equal to c, reconstructing the (c+1-i)-th fused low-frequency sub-band, the (c+1-i)-th fused high-frequency sub-band in the horizontal direction, the (c+1-i)-th fused high-frequency sub-band in the vertical direction and the (c+1-i)-th fused high-frequency sub-band in the diagonal direction to generate the (c-i)-th fused low-frequency sub-band;
the 0 th fused low-frequency sub-band is the fused image.
Further, the detection results comprise a visible light mode detection result, an infrared mode detection result and a fusion mode detection result, wherein each mode detection result comprises a target boundary box coordinate, a class cls to which the target belongs and a confidence score;
the fusing the detection results to obtain a final detection result specifically comprises:
all the detection results are processed as follows:
for target bounding boxes with the same value of cls, then:
calculating the intersection ratio of the two targets IoU pairwise, and when IoU is more than or equal to a third set threshold, the two target bounding boxes are the same target; if the intersection ratio IoU of the two frames is less than a third set threshold, the two target boundary frames are different targets;
if the two target bounding boxes are the same target, fusing the coordinates and the confidence score of the target bounding boxes through a Bayesian decision level fusion algorithm, and putting the fused result into a set B; if the two target bounding boxes are different targets, putting the coordinates and confidence scores of the target bounding boxes into a set B;
for target bounding boxes with different cls values, putting coordinates and confidence scores of the target bounding boxes into a set B;
the set B is the final detection result.
Further, the fusing the target bounding box coordinates and the confidence score by using a bayesian decision-level fusion algorithm specifically includes:
fusing the confidence scores of all the modes together through a Bayesian rule to obtain fused confidence scores;
and calculating the average value of the coordinate values of the target boundary frames which represent the same target in different modes to obtain the coordinate of the fused target boundary frame.
The second purpose of the invention can be achieved by adopting the following technical scheme:
an object detection apparatus based on multi-modal data dual fusion, the apparatus comprising:
the data sample acquisition module is used for generating a fusion image according to any pair of visible light images and infrared images in the visible light-infrared target detection data set; forming a data sample by the visible light image, the infrared image and the fused image;
the detector training module is used for respectively training the detectors by using the data samples to obtain the trained detectors with different modes;
the detection result generation module is used for generating a fusion image to be detected according to the pair of visible light images and the infrared image to be detected; inputting the pair of visible light images and infrared images to be detected and the fused image to be detected into a trained detector with a corresponding mode respectively to obtain a detection result;
and the detection result fusion module is used for fusing the detection results to obtain a final detection result.
The third purpose of the invention can be achieved by adopting the following technical scheme:
an electronic device comprises a processor and a memory for storing a program executable by the processor, wherein the processor executes the program stored in the memory to realize the target detection method.
The fourth purpose of the invention can be achieved by adopting the following technical scheme:
a storage medium stores a program that, when executed by a processor, implements the object detection method described above.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention designs an image fusion algorithm based on multi-scale transformation, obtains a fusion image by utilizing a visible light image and an infrared image, forms three data sources with the visible light image and the infrared image, and adopts a multi-mode data dual fusion strategy at a data input level, thereby retaining original information to the maximum extent.
2. In the invention, on the aspect of detection result output, the detection results output by the three detectors according to the three modes are fused through a Bayesian decision level fusion algorithm, and the results of different detectors are integrated, so that the final output fusion result has a more accurate detection result compared with any detector.
3. The dual fusion combines the advantages of two levels of fusion, and has more excellent all-weather detection performance compared with the single use of pixel level fusion or decision level fusion.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
Fig. 1 is a schematic diagram of a target detection method based on multi-modal data dual fusion according to embodiment 1 of the present invention.
Fig. 2 is a flowchart of a target detection method based on multi-modal data dual fusion according to embodiment 1 of the present invention.
Fig. 3 is a flowchart of an image fusion algorithm based on multi-scale transformation according to embodiment 1 of the present invention.
Fig. 4 is a schematic diagram of the distribution of fused subbands after image fusion according to embodiment 1 of the present invention.
Fig. 5 is a flowchart of the bayesian decision-level fusion algorithm in embodiment 1 of the present invention.
Fig. 6 is a block diagram of a target detection apparatus based on multi-modal data dual fusion according to embodiment 2 of the present invention.
Fig. 7 is a block diagram of an electronic device according to embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts based on the embodiments of the present invention belong to the protection scope of the present invention. It should be understood that the description of the specific embodiments is intended to be illustrative only and is not intended to be limiting.
Example 1:
the application provides a target detection method based on multi-modal data dual fusion, which mainly relates to two key technologies: pixel level fusion algorithm: an image fusion algorithm based on multi-scale transformation is designed, and the existing visible light and infrared images are utilized to obtain a fusion image, so that the data source is enriched; (II) a decision-level fusion algorithm: and a Bayesian decision level fusion algorithm is designed, the results of independent detection input by the three detectors according to the three modes are fused, and the results of different detectors are synthesized. The method mainly comprises the following steps: generating a fused image, training a detector, predicting the detector and fusing a detection result, wherein: generating a fusion image, namely generating the fusion image according to the input visible light and infrared images through a pixel level fusion algorithm; training a detector, namely training the detector of a corresponding mode by utilizing a visible light image, an infrared image and a fusion image respectively; detector prediction, namely, respectively carrying out target detection on three image inputs by using the three detectors obtained in the previous step; and (4) fusion of detection results, namely merging the three groups of detection results obtained in the last step by using a decision-level fusion algorithm to obtain a final detection result.
As shown in fig. 1 and fig. 2, the present embodiment provides a target detection method based on multi-modal data dual fusion, including the following steps:
s201, generating a fusion image according to any pair of visible light images and infrared images in the visible light-infrared target detection data set; and forming a data sample by the visible light image, the infrared image and the fused image.
A public visible light-infrared target detection data set is obtained; the visible light image and the infrared image in the data set are fused to obtain a fused image, which forms three groups of data samples together with the visible light image and the infrared image.
As shown in fig. 3, the fused image is obtained by fusing the visible light image and the infrared image through an image fusion algorithm based on multi-scale transformation, which includes decomposing the images to generate sub-bands, processing the sub-band information by frequency band, and reconstructing the fused image.
The visible light image and the infrared image are fused with the image fusion algorithm based on multi-scale transformation: wavelet transformation is used to obtain a multi-scale representation of each input image, namely the low-frequency and high-frequency sub-bands of the two images; the low-frequency and high-frequency sub-bands of the two images are fused with different methods to obtain the fused sub-bands; and finally the fused sub-bands undergo the multi-scale inverse transformation, namely the inverse wavelet transformation, to obtain the fused image.
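For illustration, the following is a minimal, non-limiting sketch of this pixel-level fusion pipeline. It assumes the PyWavelets (pywt) and OpenCV libraries, the "db1" (Haar) wavelet and a simple max-absolute selection for the high-frequency sub-bands; the region characteristic energy rule actually used in this embodiment is sketched further below, after step S2012.

```python
# Hedged sketch of the pixel-level fusion pipeline (library choice and the
# max-absolute high-frequency rule are illustrative assumptions).
import cv2
import numpy as np
import pywt

def fuse_pair(rgb_path: str, ir_path: str, wavelet: str = "db1", level: int = 3) -> np.ndarray:
    rgb = cv2.imread(rgb_path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
    ir = cv2.imread(ir_path, cv2.IMREAD_GRAYSCALE).astype(np.float32)

    # Multi-scale decomposition: [LF_c, (DF_c^H, DF_c^V, DF_c^D), ..., (DF_1^H, DF_1^V, DF_1^D)]
    c_rgb = pywt.wavedec2(rgb, wavelet, level=level)
    c_ir = pywt.wavedec2(ir, wavelet, level=level)

    # Low-frequency sub-band: average fusion (window fusion rule, alpha1 = alpha2 = 0.5)
    fused = [0.5 * c_rgb[0] + 0.5 * c_ir[0]]

    # High-frequency sub-bands: keep the coefficient with the larger response (placeholder rule)
    for (h1, v1, d1), (h2, v2, d2) in zip(c_rgb[1:], c_ir[1:]):
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                           for a, b in ((h1, h2), (v1, v2), (d1, d2))))

    # Multi-scale inverse transform (inverse wavelet transform) gives the fused image
    return pywt.waverec2(fused, wavelet)
```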
Further, step S201 includes:
and S2011, processing the visible light image and the infrared image by adopting wavelet transformation to generate a high-frequency sub-band and a low-frequency sub-band.
The present embodiment is described by taking the visible light image as an example. For an input image I, the whole image is first taken as the target of decomposition, yielding a low-frequency sub-band LF_1 and three high-frequency sub-bands DF_1^H, DF_1^V and DF_1^D. From the second decomposition onwards, only the low-frequency sub-band obtained from the previous decomposition is decomposed, again yielding one low-frequency sub-band and three high-frequency sub-bands (for example, the second decomposition decomposes the low-frequency sub-band LF_1 obtained from the first decomposition into the low-frequency sub-band LF_2 and the high-frequency sub-bands DF_2^H, DF_2^V and DF_2^D). The wavelet transform thus yields one low-frequency image and three pieces of high-frequency image information in the horizontal, vertical and diagonal directions.
The wavelet transform generates the corresponding scale and displacement functions through a wavelet basis, namely the inner product of a square-integrable function f(t) and a wavelet function ψ(t), wherein the expression of the wavelet basis is as follows:
ψ_{α,β}(t) = |α|^(-1/2) · ψ((t − β)/α)
where α is the scale factor, β is the displacement factor, t is time, ψ(t) is the wavelet function, i.e. the mother wavelet, and ψ_{α,β}(t) is the family of functions (the wavelet basis) generated by shifting and scaling the mother wavelet ψ(t). Varying the scale factor α stretches (α > 1) or shrinks (α < 1) the function ψ_{α,β}(t); changing the displacement factor β shifts the analysis of the function f(t) around the point β.
Continuous wavelet transform:
W(α, β) = ⟨f, ψ_{α,β}⟩ = |α|^(-1/2) ∫ f(t) ψ*((t − β)/α) dt
where W(α, β) is the continuous wavelet transform of f(t).
A digital image is a discrete signal, so the scale factor α and the displacement factor β are discretized as
α = α_0^i,  β = j · α_0^i · β_0
where i and j are integers, α_0 is a constant greater than 1 and β_0 is a constant greater than 0. The discrete wavelet transform is then
W(i, j) = ∫ f(t) ψ_{i,j}(t) dt,  with  ψ_{i,j}(t) = α_0^(-i/2) · ψ(α_0^(-i) t − j β_0)
where W(i, j) is the discrete wavelet transform of f(t) and ψ_{i,j}(t) is the discretized wavelet basis.
In the embodiment, the Mallat decomposition algorithm is used to decompose the input original image to obtain a high frequency part and a low frequency part.
In this embodiment, the original image is decomposed three times by using the Mallat decomposition algorithm, giving three low-frequency parts and nine high-frequency parts. The mathematical expression of the algorithm is:
LF_i = L_r L_c LF_{i-1}
DF_i^H = L_r H_c LF_{i-1}
DF_i^V = H_r L_c LF_{i-1}
DF_i^D = H_r H_c LF_{i-1}
wherein i = 1, 2, 3; r and c represent the row and column values, i.e. the dimensions, of the input image; x and y are the horizontal and vertical coordinate values of the processing point on the image; LF_0 is the input original image; LF_i is the low-frequency part of the image after the i-th decomposition, so LF_1, LF_2 and LF_3 are the low-frequency parts after the first, second and third decompositions respectively; H_r and H_c are high-pass filters; L_r and L_c are low-pass filters; DF_i^H, DF_i^V and DF_i^D are the high-frequency parts in the horizontal, vertical and diagonal directions and can be expressed as LH, HL and HH respectively.
The infrared image is processed in the same way to obtain the corresponding low-frequency and high-frequency parts. The low-frequency parts thus comprise the low-frequency part LF_i^RGB(x, y) of the visible light image and the low-frequency part LF_i^IR(x, y) of the infrared image.
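As an illustration of the three-level decomposition, the following sketch uses the PyWavelets library; the "db1" (Haar) wavelet and the random test image are assumptions for demonstration only.

```python
# Hedged sketch of a three-level Mallat decomposition with PyWavelets.
import numpy as np
import pywt

image = np.random.rand(256, 256).astype(np.float32)   # stand-in for the input image LF_0

# wavedec2 returns [LF_3, (DF_3^H, DF_3^V, DF_3^D), (DF_2^H, DF_2^V, DF_2^D), (DF_1^H, DF_1^V, DF_1^D)]
coeffs = pywt.wavedec2(image, wavelet="db1", level=3)

LF3 = coeffs[0]                      # third (deepest) low-frequency part
DF3_H, DF3_V, DF3_D = coeffs[1]      # third-level horizontal, vertical, diagonal details
DF1_H, DF1_V, DF1_D = coeffs[-1]     # first-level details (finest scale)
```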
S2012, the high-frequency sub-band and the low-frequency sub-band are respectively fused to obtain a fused high-frequency sub-band and a fused low-frequency sub-band.
(1) For the low frequency part, a window fusion rule is adopted.
For the low-frequency part, the window fusion rule is used to fuse the N-th low-frequency sub-bands obtained after the visible light image and the infrared image are each decomposed N times (N = 3 in this embodiment); the low-frequency sub-bands of the first N-1 decompositions are decomposed again and therefore do not need to be fused. The N-th low-frequency sub-band fusion formula is:
LF_N^F(x, y) = α_1 · LF_N^RGB(x, y) + α_2 · LF_N^IR(x, y)
wherein α_1 and α_2 are the fusion coefficients of the visible light image and the infrared image respectively.
This embodiment uses average fusion for the low-frequency part, i.e. α_1 = 0.5 and α_2 = 0.5. Since the total number of decompositions is three, the third low-frequency sub-band fusion formula is:
LF_3^F(x, y) = 0.5 · LF_3^RGB(x, y) + 0.5 · LF_3^IR(x, y)
which gives the third fused low-frequency sub-band LF_3^F.
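A minimal sketch of this average (window) fusion of the third low-frequency sub-bands, assuming the sub-bands are NumPy arrays of equal shape:

```python
import numpy as np

def fuse_low(LF3_rgb: np.ndarray, LF3_ir: np.ndarray,
             alpha1: float = 0.5, alpha2: float = 0.5) -> np.ndarray:
    # Window fusion rule with alpha1 + alpha2 = 1; average fusion in this embodiment
    assert abs(alpha1 + alpha2 - 1.0) < 1e-6
    return alpha1 * LF3_rgb + alpha2 * LF3_ir
```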
(2) For the high frequency part, a region characteristic energy fusion method is adopted.
And for the fusion of the high-frequency sub-bands, a region characteristic energy fusion method is adopted, edge characteristics are extracted from the high-frequency components through a canny operator, a region energy value is calculated to serve as a threshold condition, and the fusion is selected through a threshold after up-sampling from top to bottom.
Specifically, for the fusion of the high-frequency part, edge feature extraction is carried out by using a canny operator, the variance energy feature of a region is calculated through a sliding window, the variance feature value is used as a threshold value for comparison, and a sub-band of an image corresponding to a larger feature value is selected as a fusion sub-band.
Taking the horizontal sub-bands as an example, the calculation flow for the visible light image sub-band DF_i^{H,RGB} and the infrared image sub-band DF_i^{H,IR} is as follows:
(1-1) extracting edge image features from the visible light and infrared sub-bands to be fused by using a canny operator to obtain edge feature images, calculating the regional variance energy features through a sliding window, and obtaining the regional energy values RGB_E and IR_E at the (x, y) position respectively;
(1-2) performing threshold selection fusion through regional energy comparison, wherein the fusion formula is:
DF_i^{H,F}(x, y) = DF_i^{H,RGB}(x, y), if RGB_E ≥ IR_E
DF_i^{H,F}(x, y) = DF_i^{H,IR}(x, y), if RGB_E < IR_E
wherein DF_i^{H,RGB}(x, y) is the high-frequency sub-band in the horizontal direction after the i-th decomposition of the visible light image and DF_i^{H,IR}(x, y) is the high-frequency sub-band in the horizontal direction after the i-th decomposition of the infrared image. After the fusion, the fused high-frequency sub-band in the horizontal direction, DF_i^{H,F}, is obtained.
By the same method, the fused high-frequency sub-bands in the vertical and diagonal directions, DF_i^{V,F} and DF_i^{D,F}, are obtained, as shown in fig. 4.
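The following sketch illustrates the region characteristic energy fusion for one pair of high-frequency sub-bands, assuming OpenCV; the Canny thresholds and the 3×3 sliding window size are illustrative assumptions, since the embodiment does not fix these values.

```python
# Hedged sketch of the region characteristic energy fusion rule.
import cv2
import numpy as np

def region_energy(subband: np.ndarray, ksize: int = 3) -> np.ndarray:
    # Edge feature image from the Canny operator (expects 8-bit input)
    band8 = cv2.normalize(subband, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    edges = cv2.Canny(band8, 50, 150).astype(np.float32)
    # Regional variance energy via a sliding window: E[x^2] - (E[x])^2
    mean = cv2.blur(edges, (ksize, ksize))
    mean_sq = cv2.blur(edges * edges, (ksize, ksize))
    return mean_sq - mean * mean

def fuse_high(df_rgb: np.ndarray, df_ir: np.ndarray) -> np.ndarray:
    rgb_e = region_energy(df_rgb)
    ir_e = region_energy(df_ir)
    # Keep the coefficient whose region has the larger energy value
    return np.where(rgb_e >= ir_e, df_rgb, df_ir)
```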
S2013, reconstructing the fused high-frequency sub-band and the fused low-frequency sub-band by adopting a Mallat algorithm to obtain a fused image.
The reconstruction is the inverse process of the decomposition: the fused low-frequency and high-frequency sub-bands of one level are reconstructed into the fused low-frequency sub-band of the previous level (for example, in this embodiment the sub-bands obtained by the third decomposition are fused, and the fused low-frequency sub-band and fused high-frequency sub-bands are then reconstructed into the fused low-frequency sub-band of the second decomposition, and so on). When the fused low-frequency sub-band LF_1^F of the first decomposition is reconstructed together with the fused high-frequency sub-bands DF_1^{H,F}, DF_1^{V,F} and DF_1^{D,F}, the resulting LF_0^F is the fused image.
The image reconstruction process of the Mallat algorithm can be described as follows: for i = 2, 1, 0, the fused low-frequency sub-band LF_i^F is obtained by up-sampling LF_{i+1}^F, DF_{i+1}^{H,F}, DF_{i+1}^{V,F} and DF_{i+1}^{D,F} and filtering them with the corresponding synthesis low-pass and high-pass filters L_r, L_c, H_r and H_c; the final LF_0^F is the fused image. Here r and c represent the row and column values, i.e. the dimensions, of the input image, and m and n are the horizontal and vertical coordinate values of the processing point on the generated image, with 0 ≤ m ≤ 2r and 0 ≤ n ≤ 2c.
And fusing each pair of images in the visible light-infrared data set by adopting the image fusion algorithm based on multi-scale transformation to obtain a fused image, and forming three groups of data samples in different modes with the visible light-infrared data set.
S202, training the detector by using the data sample to obtain the trained detectors with different modes.
Based on the YOLOv5 target detection framework, the detector is trained by using the visible light image, the infrared image and the corresponding fusion image respectively, so as to obtain trained detectors of three different modalities.
Further, the YOLOv5 target detection algorithm is used as the framework, three detectors with random weights are initialized, and the three detectors are trained respectively with the data samples of the visible light, infrared and fusion modalities to obtain the optimal model weight parameters, i.e. the trained detectors corresponding to the three modalities.
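As an illustration only, one way to launch the three trainings on top of the public ultralytics/yolov5 repository is sketched below; the dataset YAML file names and the train.py command-line flags shown are assumptions based on common YOLOv5 usage, not part of this disclosure.

```python
# Hedged sketch: one training run per modality using the yolov5 train.py script.
import subprocess

MODALITIES = {
    "visible": "data/visible.yaml",    # assumed dataset configuration files
    "infrared": "data/infrared.yaml",
    "fused": "data/fused.yaml",
}

for name, data_yaml in MODALITIES.items():
    subprocess.run(
        ["python", "train.py",
         "--data", data_yaml,
         "--cfg", "yolov5s.yaml",      # model definition; random initial weights
         "--weights", "",              # empty weights -> train from scratch
         "--epochs", "300",
         "--name", f"detector_{name}"],
        check=True,
    )
```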
S203, generating a fusion image to be detected according to the pair of visible light image and infrared image to be detected; and respectively inputting the pair of visible light images and infrared images to be detected and the fused images to be detected into the trained detectors with the corresponding modes to obtain detection results.
Using the detectors trained in step S202, the pair of visible light and infrared images and the fused image obtained from them through step S201 are input into the corresponding detectors respectively to obtain three groups of detection results.
For a pair of visible light and infrared images to be detected, a fused image is obtained through step S201; the visible light image, the infrared image and the fused image are respectively input into the corresponding detectors for detection, yielding three groups of detection results. Each detection result contains three pieces of information, namely the coordinates of the target bounding box, the class to which the target belongs, and the confidence score (i.e. the probability that the target belongs to each of the given classes), which are respectively marked as bbox, cls and conf.
Specifically, a visible light image, an infrared image and a fused image describing the same scene are respectively input into the corresponding detectors to obtain three groups of detection results, and each target in the detection results is described by the three pieces of information {bbox, cls, conf}, wherein bbox represents the coordinates of the bounding box of the target, cls represents the most likely category to which the target belongs, and conf represents the probability that the target belongs to each of the given categories.
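For illustration, one possible in-memory representation of a single detected target (an assumption about data layout, not prescribed by this embodiment) is:

```python
# Illustrative structure for one detected target output by a single-modality detector.
detection = {
    "bbox": (48.0, 120.0, 96.0, 260.0),   # (x_min, y_min, x_max, y_max)
    "cls": 0,                              # most likely class index, e.g. 0 = pedestrian
    "conf": [0.82, 0.10, 0.08],            # probability for each given class
}
```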
And S204, fusing the detection results to obtain a final detection result.
As shown in fig. 5, the three groups of detection results are merged by using a bayesian decision-level fusion algorithm to obtain a final detection result.
Specifically, three groups of detection results are put into a set A, and whether each overlapped bounding box represents the same target is judged according to the bounding box intersection ratio (IoU): taking a threshold value thres, when some boundary frames in the set are overlapped in pairs and the intersection ratio IoU of the two frames is more than or equal to thres, considering that the two frames represent the same target, fusing the classification confidence score and the boundary frame coordinate representing the same target by using a decision-level fusion algorithm, and putting the result into the set B; if the intersection ratio IoU of the two boxes is less than thres, the two boxes represent different targets, and the confidence score and the bounding box coordinate of the two boxes are reserved according to the original result and are placed in the set B. And the result in the set B is the final detection result.
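A minimal sketch of the pairwise IoU test used for this matching, assuming bounding boxes are given as (x_min, y_min, x_max, y_max) tuples:

```python
def iou(box_a, box_b) -> float:
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def same_target(box_a, box_b, thres: float = 0.5) -> bool:
    return iou(box_a, box_b) >= thres   # IoU >= thres -> same target
```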
Further, step S204 includes:
(1) Placing the three groups of detection results in a set A, calculating the intersection over union IoU of the bounding boxes with the same cls value pairwise, and setting the threshold thres to 0.5; when the IoU of two boxes is greater than or equal to 0.5, the two boxes are considered to represent the same target; if the IoU of the two boxes is less than thres, the two boxes are considered to represent different targets.
(2) And according to the IoU result, correspondingly processing the bounding box in the set in the following way:
box processing = fusion, if IoU ≥ thres
box processing = reserve, if IoU < thres
wherein fusion means that, when the two boxes represent the same target, the target bounding box coordinates and confidence scores are fused through the Bayesian decision-level fusion algorithm; reserve means that, when the two boxes do not represent the same target, no processing is performed on the bounding boxes and the original results are retained.
Further, the step (2) specifically comprises:
and (2-1) fusing the confidence scores of all the modes together through a Bayesian rule to obtain a fused confidence score.
The fusion of the confidence scores is based on the condition independence of the detection process of each mode and the combination of the Bayes rule and the prediction characteristics of the YOLOv5 algorithm, and the fused confidence scores and the original confidence scores have the following relations:
conf_F(y = k) = ∏_{j=1..3} p̂(y = k | x_j) / Σ_k ∏_{j=1..3} p̂(y = k | x_j)
wherein y represents the object category annotated in the visible light-infrared target detection data set, including but not limited to typical object categories such as pedestrians, bicycles and automobiles (when k = 0, 1, 2 the object is a pedestrian, a bicycle or an automobile, and so on); x_j is the input information of the target of category y in a certain modality; p(y | x_j) is the probability that a certain object belongs to class k (y = k) in the label y given x_j; and p̂(y | x_j) is the probability distribution of p(y | x_j) predicted under the YOLOv5 target detection framework.
And multiplying the prediction probabilities of the same target from different modes, and dividing the result by the sum of the probabilities of all the categories to perform normalization to obtain a fusion confidence score. The principle of confidence score fusion is as follows:
Given the label y of a certain object, if the prediction made from the visible light modality information x_1 does not change given the infrared modality information x_2 or the fusion modality information x_3, then conditional independence holds. According to this independence condition, the following probability relationship holds:
p(x_1, x_2, x_3 | y) = p(x_1 | y) p(x_2 | y) p(x_3 | y)
The probability that a certain target belongs to category y is predicted from the multi-modal information; by the Bayes rule:
p(y | x_1, x_2, x_3) = p(x_1, x_2, x_3 | y) p(y) / p(x_1, x_2, x_3)
According to the independence condition, the above formula can be rewritten as the following relation, which is called the Bayesian probability fusion formula:
p(y | x_1, x_2, x_3) ∝ p(y | x_1) p(y | x_2) p(y | x_3)
According to the prediction principle of the YOLOv5 target detection framework, when a certain modality information x_j is input, the predicted probability distribution (confidence score) that a certain target belongs to class k (y = k) in the label y can be expressed as p̂(y = k | x_j).
Substituting into the Bayesian probability fusion formula gives the fused confidence score:
conf_F(y = k) = ∏_{j=1..3} p̂(y = k | x_j) / Σ_k ∏_{j=1..3} p̂(y = k | x_j)
in summary, the process of bayesian probability fusion can be described simply as the normalization by multiplying the predicted probabilities from different modalities for the same target, and dividing by the sum of the probabilities of all classes.
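A minimal sketch of this product-and-normalize fusion of the confidence scores, assuming each score vector is the per-class probability output by one modality's detector for the same matched target:

```python
import numpy as np

def fuse_confidence(scores):
    # scores: list of per-class probability vectors, one per modality, for one target
    product = np.prod(np.stack(scores, axis=0), axis=0)  # multiply across modalities
    return product / product.sum()                       # normalise over all classes

# Example: one target scored by the visible, infrared and fused-image detectors
fused_conf = fuse_confidence([np.array([0.7, 0.2, 0.1]),
                              np.array([0.6, 0.3, 0.1]),
                              np.array([0.8, 0.1, 0.1])])
```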
(2-2) The bounding box coordinate fusion adopts the principle of simple averaging, i.e. the coordinate values of the bounding boxes from different modalities that represent the same target are averaged. Suppose the three detectors respectively detect the three images of different modalities but identical content, and n of them detect the same target (for example, the same automobile); the predicted coordinates of the target bounding boxes are expressed as
(x_min^j, y_min^j, x_max^j, y_max^j), j = 1, ..., n
wherein n is an integer and n ≤ 3; (x_min^j, y_min^j) are the horizontal and vertical coordinates of the upper-left corner of the bounding box, and (x_max^j, y_max^j) are the horizontal and vertical coordinates of the lower-right corner of the bounding box, taken from the prediction result of the j-th detector. The coordinates of the fused bounding box are then expressed as:
bbox_F = ( (1/n) Σ_j x_min^j, (1/n) Σ_j y_min^j, (1/n) Σ_j x_max^j, (1/n) Σ_j y_max^j )
the method of averaging the coordinates of the bounding box can properly reduce the prediction error of the coordinates of the detection box, so that the finally obtained fusion bounding box is closer to the real label.
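A minimal sketch of this coordinate averaging, assuming each box is an (x_min, y_min, x_max, y_max) prediction for the same matched target:

```python
import numpy as np

def fuse_boxes(boxes):
    # boxes: list of (x_min, y_min, x_max, y_max) predictions of the same target, n <= 3
    return tuple(np.mean(np.asarray(boxes, dtype=np.float32), axis=0))
```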
(3) Performing the above processing on all the prediction results in the set A, if the intersection ratio of the two frames is IoU < thres, considering that the two frames represent different targets, reserving the confidence score and the boundary frame coordinate of the two frames according to the original result, and putting the confidence score and the boundary frame coordinate into the set B; otherwise, the two frames represent the same target, the classification confidence score and the boundary frame coordinate representing the same target are fused, and the result is put into the set B.
Those skilled in the art will appreciate that all or part of the steps in the method for implementing the above embodiments may be implemented by a program to instruct associated hardware, and the corresponding program may be stored in a computer-readable storage medium.
It should be noted that although the method operations of the above-described embodiments are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the depicted steps may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Example 2:
as shown in fig. 6, the present embodiment provides an object detection apparatus based on multi-modal data dual fusion, the apparatus includes a data sample acquisition module 601, a detector training module 602, a detection result generation module 603, and a detection result fusion module 604, wherein:
the data sample acquisition module 601 is configured to generate a fused image according to any pair of visible light images and infrared images in the visible light-infrared target detection data set; forming a data sample by the visible light image, the infrared image and the fused image;
a detector training module 602, configured to train detectors respectively by using the data samples, so as to obtain trained detectors in different modalities;
a detection result generating module 603, configured to generate a fused image to be detected according to a pair of visible light images and infrared images to be detected; inputting the pair of visible light images and infrared images to be detected and the fused image to be detected into a trained detector with a corresponding mode respectively to obtain a detection result;
and a detection result fusion module 604, configured to fuse the detection results to obtain a final detection result.
The specific implementation of each module in this embodiment may refer to embodiment 1, which is not described herein any more; it should be noted that, the apparatus provided in this embodiment is only illustrated by dividing the functional modules, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure is divided into different functional modules to complete all or part of the functions described above.
Example 3:
this embodiment provides an electronic device, which may be a computer, as shown in fig. 7, and includes a processor 702, a memory, an input device 703, a display 704, and a network interface 705 that are connected by a system bus 701, where the processor is used to provide computing and control capabilities, the memory includes a nonvolatile storage medium 706 and an internal memory 707, the nonvolatile storage medium 706 stores an operating system, computer programs, and a database, the internal memory 707 provides an environment for the operating system and the computer programs in the nonvolatile storage medium to run, and when the processor 702 executes the computer programs stored in the memory, the object detection method of embodiment 1 is implemented as follows:
generating a fusion image according to any pair of visible light images and infrared images in the visible light-infrared target detection data set; forming a data sample by the visible light image, the infrared image and the fused image;
training a detector by using the data sample to obtain trained detectors with different modes;
generating a fusion image to be detected according to a pair of visible light images and infrared images to be detected; inputting the pair of visible light images and infrared images to be detected and the fused image to be detected into a trained detector with a corresponding mode respectively to obtain a detection result;
and fusing the detection results to obtain a final detection result.
Example 4:
the present embodiment provides a storage medium, which is a computer-readable storage medium, and stores a computer program, and when the computer program is executed by a processor, the method for detecting the target of embodiment 1 is implemented as follows:
generating a fusion image according to any pair of visible light images and infrared images in the visible light-infrared target detection data set; forming a data sample by the visible light image, the infrared image and the fused image;
training a detector by using the data sample to obtain trained detectors with different modes;
generating a fusion image to be detected according to a pair of visible light images and infrared images to be detected; inputting the pair of visible light images and infrared images to be detected and the fused image to be detected into a trained detector with a corresponding mode respectively to obtain a detection result;
and fusing the detection results to obtain a final detection result.
It should be noted that the computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In conclusion, the invention designs an image fusion algorithm based on multi-scale transformation, obtains a fusion image by utilizing a visible light image and an infrared image, forms three data sources with the visible light image and the infrared image, and furthest retains original information; meanwhile, three modal detection results output by the three detectors are fused through a Bayesian decision level fusion algorithm, and the results of different detectors are integrated, so that the final output fusion result has a more accurate detection result compared with any detector. The invention combines the advantages of two levels of fusion, and has more excellent all-weather detection performance compared with the single use of pixel level fusion or decision level fusion.
The above description is only for the preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto, and any person skilled in the art can substitute or change the technical solution and the inventive concept of the present invention within the scope of the present invention.

Claims (10)

1. A target detection method based on multi-modal data dual fusion is characterized by comprising the following steps:
generating a fusion image according to any pair of visible light images and infrared images in the visible light-infrared target detection data set; forming a data sample by the visible light image, the infrared image and the fused image;
training a detector by using the data sample to obtain trained detectors with different modes;
generating a fusion image to be detected according to a pair of visible light images and infrared images to be detected; inputting the pair of visible light images and infrared images to be detected and the fused image to be detected into a trained detector with a corresponding mode respectively to obtain a detection result;
and fusing the detection results to obtain a final detection result.
2. The target detection method according to claim 1, wherein the generating of the fused image according to any pair of the visible light image and the infrared image in the visible light-infrared target detection data set adopts an image fusion algorithm based on multi-scale transformation, and specifically comprises:
processing the visible light image and the infrared image by adopting wavelet transformation to generate a low-frequency sub-band and a high-frequency sub-band;
respectively fusing the low-frequency sub-band and the high-frequency sub-band to obtain a fused low-frequency sub-band and a fused high-frequency sub-band;
and reconstructing the fused low-frequency sub-band and the fused high-frequency sub-band to generate a fused image.
3. The object detection method according to claim 2, wherein the processing the visible light image and the infrared image by using wavelet transform to generate a low frequency subband and a high frequency subband specifically comprises:
when i is 1, carrying out ith decomposition on the visible light image to generate an ith low-frequency sub-band and t ith high-frequency sub-bands of the visible light image; carrying out ith decomposition on the infrared image to generate an ith low-frequency sub-band and t ith high-frequency sub-bands of the infrared image; wherein i is a positive integer greater than or equal to 1 and less than or equal to c, t is a first set threshold, and c is a second set threshold;
when i is larger than 1 and is smaller than or equal to c, carrying out ith decomposition on the i-1 th low-frequency sub-band of the visible light image to generate the ith low-frequency sub-band and t ith high-frequency sub-bands of the visible light image; and carrying out ith decomposition on the i-1 th low-frequency sub-band of the infrared image to generate the ith low-frequency sub-band and t ith high-frequency sub-bands of the infrared image.
4. The target detection method according to claim 3, wherein the fusing the low-frequency subband and the high-frequency subband respectively to obtain a fused low-frequency subband and a fused high-frequency subband specifically comprises:
fusing the low-frequency sub-bands by adopting a window fusion rule to obtain a c-th fused low-frequency sub-band;
and fusing the high-frequency sub-bands by adopting a region characteristic energy fusion method to obtain fused high-frequency sub-bands.
5. The object detection method of claim 4, wherein the low-frequency sub-bands include the c-th low-frequency sub-band LF_c^RGB(x, y) of the visible light image and the c-th low-frequency sub-band LF_c^IR(x, y) of the infrared image;
the fusing the low-frequency sub-bands by adopting a window fusion rule to obtain a c-th fused low-frequency sub-band specifically comprises:
obtaining the c-th fused low-frequency sub-band by using the following formula:
LF_c^F(x, y) = α_1 · LF_c^RGB(x, y) + α_2 · LF_c^IR(x, y)
wherein x and y are the horizontal and vertical coordinates of the processing point on the image, α_1 and α_2 are the fusion coefficients of the visible light image and the infrared image respectively, and α_1 + α_2 = 1;
The t ith high-frequency sub-bands comprise ith high-frequency sub-bands in the horizontal direction, the vertical direction and the diagonal direction;
the high-frequency sub-bands comprise the high-frequency sub-bands in the ith horizontal direction, the vertical direction and the diagonal direction of the visible light image and the high-frequency sub-bands in the ith horizontal direction, the vertical direction and the diagonal direction of the infrared image;
fusing the high-frequency sub-bands by adopting a region characteristic energy fusion method to obtain fused high-frequency sub-bands, which specifically comprises the following steps:
respectively extracting edge image features of the first image and the second image by using a canny operator to obtain edge feature images, calculating the regional variance energy features through a sliding window, and respectively obtaining the regional energy values RGB_E and IR_E of the first image and the second image at the (x, y) position; wherein the first image and the second image are respectively the high-frequency sub-band in the i-th horizontal direction of the visible light image and the high-frequency sub-band in the i-th horizontal direction of the infrared image, the high-frequency sub-band in the i-th vertical direction of the visible light image and the high-frequency sub-band in the i-th vertical direction of the infrared image, and the high-frequency sub-band in the i-th diagonal direction of the visible light image and the high-frequency sub-band in the i-th diagonal direction of the infrared image;
selective fusion is performed by regional energy comparison, and a fusion formula is as follows:
DF_i^F(x, y) = DF_i^RGB(x, y), if RGB_E ≥ IR_E
DF_i^F(x, y) = DF_i^IR(x, y), if RGB_E < IR_E
wherein DF_i^RGB(x, y) and DF_i^IR(x, y) are the first image and the second image respectively, and DF_i^F(x, y) is the fused image;
after the fusion, the fusion high-frequency sub-band in the ith horizontal direction, the fusion high-frequency sub-band in the ith vertical direction and the fusion high-frequency sub-band in the ith diagonal direction are respectively obtained.
6. The target detection method according to claim 5, wherein the reconstructing the fused low-frequency subband and the fused high-frequency subband to generate a fused image specifically comprises:
when i = 1, reconstructing the c-th fused low-frequency sub-band, the c-th fused high-frequency sub-band in the horizontal direction, the c-th fused high-frequency sub-band in the vertical direction and the c-th fused high-frequency sub-band in the diagonal direction to generate the (c-1)-th fused low-frequency sub-band;
when i is greater than 1 and less than or equal to c, reconstructing the (c+1-i)-th fused low-frequency sub-band, the (c+1-i)-th fused high-frequency sub-band in the horizontal direction, the (c+1-i)-th fused high-frequency sub-band in the vertical direction and the (c+1-i)-th fused high-frequency sub-band in the diagonal direction to generate the (c-i)-th fused low-frequency sub-band;
the 0 th fused low-frequency sub-band is the fused image.
7. The target detection method according to claim 1, wherein the detection results comprise visible light mode detection results, infrared mode detection results and fusion mode detection results, wherein each mode detection result comprises target bounding box coordinates, class cls to which the target belongs and a confidence score;
the fusing the detection results to obtain a final detection result specifically comprises:
all the detection results are processed as follows:
for target bounding boxes with the same value of cls, then:
calculating the intersection ratio of the two targets IoU pairwise, and when IoU is more than or equal to a third set threshold, the two target bounding boxes are the same target; if the intersection ratio IoU of the two frames is less than a third set threshold, the two target boundary frames are different targets;
if the two target bounding boxes are the same target, fusing the coordinates and the confidence score of the target bounding boxes through a Bayesian decision level fusion algorithm, and putting the fused result into a set B; if the two target bounding boxes are different targets, putting the coordinates and confidence scores of the target bounding boxes into a set B;
for target bounding boxes with different cls values, putting coordinates and confidence scores of the target bounding boxes into a set B;
the set B is the final detection result.
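The grouping step of this claim can be sketched as follows; the threshold value (0.5), the (x1, y1, x2, y2) box format and the greedy grouping around a seed box are assumptions made for illustration, since the claim only specifies a pairwise IoU comparison against a third set threshold within each class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def group_detections(detections, threshold=0.5):
    """detections: list of (box, cls, score) from the three modal detectors.
    Returns groups of detections judged to be the same target; singleton
    groups go into set B unchanged, multi-member groups are fused (claim 8)."""
    groups, used = [], [False] * len(detections)
    for i, (box_i, cls_i, _) in enumerate(detections):
        if used[i]:
            continue
        group, used[i] = [detections[i]], True
        for j in range(i + 1, len(detections)):
            box_j, cls_j, _ = detections[j]
            # boxes of different classes are always treated as different targets
            if not used[j] and cls_j == cls_i and iou(box_i, box_j) >= threshold:
                group.append(detections[j])
                used[j] = True
        groups.append(group)
    return groups
```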
8. The method for detecting the target according to claim 7, wherein the fusing the target bounding box coordinates and the confidence score by a Bayesian decision level fusion algorithm specifically comprises:
fusing the confidence scores of all the modes together through a Bayesian rule to obtain fused confidence scores;
and calculating the average value of the coordinates of the target bounding boxes that represent the same target in different modes to obtain the coordinates of the fused target bounding box.
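A sketch of the per-group fusion is given below. The claim only states that the confidence scores are combined "through a Bayesian rule"; the independence-based product form used here is an assumption, while the coordinate averaging follows the claim directly.

```python
import numpy as np

def bayes_fuse(group):
    """group: list of (box, cls, score) entries judged to be the same target."""
    boxes = np.array([g[0] for g in group], dtype=np.float64)
    scores = np.array([g[2] for g in group], dtype=np.float64)
    pos = np.prod(scores)
    neg = np.prod(1.0 - scores)
    fused_score = pos / (pos + neg + 1e-9)   # assumed Bayesian combination of modal scores
    fused_box = boxes.mean(axis=0)           # claim 8: average the bounding-box coordinates
    return fused_box, float(fused_score)
```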
9. An object detection device based on multi-modal data dual fusion, the device comprising:
the data sample acquisition module is used for generating a fusion image according to any pair of visible light images and infrared images in the visible light-infrared target detection data set; forming a data sample by the visible light image, the infrared image and the fused image;
the detector training module is used for respectively training the detectors by using the data samples to obtain the trained detectors with different modes;
the detection result generation module is used for generating a fused image to be detected according to a pair of visible light image and infrared image to be detected, and for respectively inputting the visible light image and infrared image to be detected and the fused image to be detected into the trained detectors of the corresponding modes to obtain detection results;
and the detection result fusion module is used for fusing the detection results to obtain a final detection result.
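For illustration only, the four modules of the device could be wired together as in the skeleton below; the class and parameter names are hypothetical, and the pixel-level and decision-level fusion steps are passed in as callables (e.g. the wavelet fusion of the earlier claims and the IoU/Bayesian fusion sketched above) rather than reproduced here.

```python
class DualFusionDetector:
    """Skeleton of the device: three single-mode detectors plus the two fusion stages."""
    def __init__(self, detectors, pixel_fuse, decision_fuse):
        self.detectors = detectors          # {"visible": ..., "infrared": ..., "fused": ...}
        self.pixel_fuse = pixel_fuse        # pixel-level fusion of a visible/infrared pair
        self.decision_fuse = decision_fuse  # decision-level fusion of all modal detections

    def detect(self, vis_img, ir_img):
        fused_img = self.pixel_fuse(vis_img, ir_img)
        inputs = {"visible": vis_img, "infrared": ir_img, "fused": fused_img}
        results = []
        for mode, image in inputs.items():
            results.extend(self.detectors[mode](image))   # (box, cls, score) tuples
        return self.decision_fuse(results)                # final detection result (set B)
```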
10. A storage medium storing a program, wherein the program, when executed by a processor, implements the target detection method of any one of claims 1 to 8.
CN202111483806.7A 2021-12-07 2021-12-07 Target detection method, device, equipment and medium based on multi-mode data double fusion Active CN114359687B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111483806.7A CN114359687B (en) 2021-12-07 2021-12-07 Target detection method, device, equipment and medium based on multi-mode data double fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111483806.7A CN114359687B (en) 2021-12-07 2021-12-07 Target detection method, device, equipment and medium based on multi-mode data double fusion

Publications (2)

Publication Number Publication Date
CN114359687A true CN114359687A (en) 2022-04-15
CN114359687B CN114359687B (en) 2024-04-09

Family

ID=81098072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111483806.7A Active CN114359687B (en) 2021-12-07 2021-12-07 Target detection method, device, equipment and medium based on multi-mode data double fusion

Country Status (1)

Country Link
CN (1) CN114359687B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451984A (en) * 2017-07-27 2017-12-08 桂林电子科技大学 A kind of infrared and visual image fusion algorithm based on mixing multiscale analysis
CN111754447A (en) * 2020-07-06 2020-10-09 江南大学 Infrared and visible light image fusion method based on multi-state context hidden Markov model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Li Wei et al.: "Infrared and Visible Image Fusion Combining NSST and LC Saliency", Electronic Technology & Software Engineering, no. 08, 15 April 2020 (2020-04-15) *
Qiu Wenjia et al.: "Infrared and Visible Image Fusion Based on NSCT and SLIP Model", Command Information System and Technology, no. 02, 8 May 2018 (2018-05-08) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116543283A (en) * 2023-07-05 2023-08-04 合肥工业大学 Multimode target detection method considering modal uncertainty
CN116543283B (en) * 2023-07-05 2023-09-15 合肥工业大学 Multimode target detection method considering modal uncertainty
CN117773405A (en) * 2024-02-28 2024-03-29 茌平鲁环汽车散热器有限公司 Method for detecting brazing quality of automobile radiator
CN117773405B (en) * 2024-02-28 2024-05-14 茌平鲁环汽车散热器有限公司 Method for detecting brazing quality of automobile radiator

Also Published As

Publication number Publication date
CN114359687B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
Xu et al. Inter/intra-category discriminative features for aerial image classification: A quality-aware selection model
CN111882002B (en) MSF-AM-based low-illumination target detection method
Tang et al. DIVFusion: Darkness-free infrared and visible image fusion
CN110956126B (en) Small target detection method combined with super-resolution reconstruction
CN111539343B (en) Black smoke vehicle detection method based on convolution attention network
CN110533046B (en) Image instance segmentation method and device, computer readable storage medium and electronic equipment
Raza et al. IR-MSDNet: Infrared and visible image fusion based on infrared features and multiscale dense network
CN111899203B (en) Real image generation method based on label graph under unsupervised training and storage medium
CN110807384A (en) Small target detection method and system under low visibility
Jiang et al. A self-attention network for smoke detection
CN112966747A (en) Improved vehicle detection method based on anchor-frame-free detection network
CN115565043A (en) Method for detecting target by combining multiple characteristic features and target prediction method
Hongmeng et al. A detection method for deepfake hard compressed videos based on super-resolution reconstruction using CNN
CN114359687B (en) Target detection method, device, equipment and medium based on multi-mode data double fusion
CN116311005A (en) Apparatus, method and storage medium for moving image processing
CN111368634A (en) Human head detection method, system and storage medium based on neural network
Shao et al. Generative image inpainting with salient prior and relative total variation
Chen et al. Real-time lane detection model based on non bottleneck skip residual connections and attention pyramids
Liang et al. An Interpretable Image Denoising Framework Via Dual Disentangled Representation Learning
Zhang et al. Trustworthy image fusion with deep learning for wireless applications
CN116883303A (en) Infrared and visible light image fusion method based on characteristic difference compensation and fusion
CN116958911A (en) Traffic monitoring image target detection method oriented to severe weather
CN111598841A (en) Example significance detection method based on regularized dense connection feature pyramid
CN112446292B (en) 2D image salient object detection method and system
CN114332754A (en) Cascade R-CNN pedestrian detection method based on multi-metric detector

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant