CN115424204A - Pedestrian detection method and system based on information fusion - Google Patents

Pedestrian detection method and system based on information fusion

Info

Publication number
CN115424204A
CN115424204A
Authority
CN
China
Prior art keywords
feature map
neural network
convolutional neural
classification
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202211029205.3A
Other languages
Chinese (zh)
Inventor
赵小龙 (Zhao Xiaolong)
韩中军 (Han Zhongjun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou Danguang Stationery Co ltd
Original Assignee
Wenzhou Danguang Stationery Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenzhou Danguang Stationery Co ltd filed Critical Wenzhou Danguang Stationery Co ltd
Priority to CN202211029205.3A
Publication of CN115424204A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The application relates to the field of intelligent pedestrian detection, and particularly discloses a pedestrian detection method and system based on information fusion. Monitoring image features acquired by a visual sensor in the unsafe area of a hoisting equipment site and echo signal features acquired by a millimeter wave radar are mined through deep-learning convolutional neural network models to comprehensively detect pedestrians in the area. In the process of feature fusion, a derivative information hyper-convexity measurement factor is applied as a loss function for training, so as to perform a hyper-convexity consistency derivative representation of the manifold of the high-dimensional features through information measurement among the internal element sub-dimensions of the feature maps in the high-dimensional space. In this way, the two convolutional neural network models can transfer-learn each other's feature extraction capability, which improves the classification accuracy. Therefore, pedestrians in the unsafe area of the hoisting equipment site can be accurately detected to ensure their safety.

Description

Pedestrian detection method and system based on information fusion
Technical Field
The present application relates to the field of intelligent pedestrian detection, and more particularly, to a pedestrian detection method and system based on information fusion.
Background
Casualty accidents occur frequently on hoisting equipment sites, so pedestrian detection and positioning technology deserves attention in specific construction projects. When detecting pedestrians on a construction site, the pedestrians must be extracted from a complex and changeable construction environment, and their position and motion state information must be acquired in real time to ensure their safety.
The visual sensor is widely applied to target detection thanks to its low cost, rich image information, and ease of target identification and classification, but vision-based detection methods are easily influenced by the environment (illumination, weather, dust concentration and the like) and cannot acquire accurate position information of the target. The millimeter wave radar, because the electromagnetic waves it emits penetrate dust well, is used for detecting the relative distance between an obstacle and the radar in a given environment, but it cannot acquire the geometric information and category information of a target.
Therefore, in order to accurately detect pedestrians in the unsafe area of a hoisting equipment site and ensure their safety, a pedestrian detection scheme based on information fusion is desirable.
Disclosure of Invention
The present application is proposed to solve the above technical problems. Embodiments of the application provide a pedestrian detection method and system based on information fusion. Monitoring image features acquired by a visual sensor in the unsafe area of a hoisting equipment site and echo signal features acquired by a millimeter wave radar are mined through deep-learning convolutional neural network models to comprehensively detect pedestrians in the area. In the process of feature fusion, a derivative information hyper-convexity measurement factor is applied as a loss function for training, so as to perform a hyper-convexity consistency derivative representation of the manifold of the high-dimensional features through information measurement among the internal element sub-dimensions of the feature maps in the high-dimensional space. The two convolutional neural network models can thus transfer-learn each other's feature extraction capability, which improves the classification accuracy. Therefore, pedestrians in the unsafe area of the hoisting equipment site can be accurately detected to ensure their safety.
According to one aspect of the application, a pedestrian detection method based on information fusion is provided, and comprises the following steps:
a training phase comprising:
acquiring a monitoring image through a visual sensor deployed on a hoisting equipment site;
acquiring an echo signal through a millimeter wave radar deployed on the hoisting equipment site;
passing the monitoring image through a first convolutional neural network to obtain a first feature map;
passing the waveform diagram of the echo signal through a second convolutional neural network to obtain a second feature map;
fusing the first feature map and the second feature map to obtain a classification feature map;
passing the classification feature map through a classifier to obtain a classification loss function value;
calculating a derivative information hyper-convexity measurement factor between the first feature map and the second feature map, wherein the derivative information hyper-convexity measurement factor is the weighted sum of the absolute values of the differences between the feature values at the respective positions of the first feature map and the second feature map along the width, height and channel dimensions, divided by the weighted sum of the differences between the feature values at the same positions along the same dimensions; and
calculating a weighted sum of the classification loss function value and the derivative information hyper-convexity measurement factor as a loss function value to train the first convolutional neural network and the second convolutional neural network; and
an inference phase comprising:
acquiring a monitoring image through a visual sensor deployed on a hoisting equipment site;
acquiring an echo signal through a millimeter wave radar deployed on the hoisting equipment site;
passing the monitoring image through the first convolutional neural network trained in the training stage to obtain a first feature map;
passing the waveform diagram of the echo signal through the trained second convolutional neural network to obtain a second feature map;
fusing the first feature map and the second feature map to obtain a classification feature map; and
passing the classification feature map through the classifier to obtain a classification result, wherein the classification result is used for indicating whether a pedestrian exists in the unsafe area.
According to another aspect of the present application, there is provided a pedestrian detection system based on information fusion, including:
a training module comprising:
the monitoring image acquisition unit is used for acquiring a monitoring image through a visual sensor deployed on a hoisting equipment site;
an echo signal acquisition unit, configured to acquire an echo signal through a millimeter wave radar deployed on the hoisting equipment site;
the first feature extraction unit is used for enabling the monitoring image to pass through a first convolutional neural network to obtain a first feature map;
a second feature extraction unit, configured to pass the waveform diagram of the echo signal through a second convolutional neural network to obtain a second feature map;
a fusion unit, configured to fuse the first feature map and the second feature map to obtain a classification feature map;
the classification loss function value calculation unit is used for enabling the classification characteristic map to pass through a classifier so as to obtain a classification loss function value;
a derivative information hyper-convexity measurement factor calculation unit, configured to calculate a derivative information hyper-convexity measurement factor between the first feature map and the second feature map, wherein the derivative information hyper-convexity measurement factor is the weighted sum of the absolute values of the differences between the feature values at the respective positions of the first feature map and the second feature map along the width, height and channel dimensions, divided by the weighted sum of the differences between the feature values at the same positions along the same dimensions; and
a training unit, configured to calculate a weighted sum of the classification loss function value and the derivative information hyper-convexity measurement factor as a loss function value to train the first convolutional neural network and the second convolutional neural network; and
an inference module comprising:
the inferred image acquisition unit is used for acquiring a monitoring image through a visual sensor deployed on a hoisting equipment site;
an inferred signal acquisition unit, configured to acquire an echo signal through a millimeter wave radar deployed on the hoisting equipment site;
a first feature map generation unit, configured to pass the monitoring image through the first convolutional neural network trained in the training stage to obtain a first feature map;
a second feature map generation unit, configured to pass the waveform diagram of the echo signal through the trained second convolutional neural network to obtain a second feature map;
a classification feature map generation unit, configured to fuse the first feature map and the second feature map to obtain a classification feature map; and
a classification unit, configured to pass the classification feature map through a classifier to obtain a classification result, wherein the classification result is used for indicating whether a pedestrian exists in the unsafe area.
According to yet another aspect of the present application, there is provided a computer readable medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to execute the information fusion-based pedestrian detection method as described above.
According to the pedestrian detection method and system based on information fusion provided above, the monitoring image features acquired by the visual sensor in the unsafe area of the hoisting equipment site and the echo signal features acquired by the millimeter wave radar are mined through deep-learning convolutional neural network models to comprehensively detect pedestrians in the area. In the process of feature fusion, a derivative information hyper-convexity measurement factor is applied as a loss function for training, so as to perform a hyper-convexity consistency derivative representation of the manifold of the high-dimensional features through information measurement among the internal element sub-dimensions of the feature maps in the high-dimensional space. The two convolutional neural network models can thus transfer-learn each other's feature extraction capability, which improves the classification accuracy. Therefore, pedestrians in the unsafe area of the hoisting equipment site can be accurately detected to ensure their safety.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a scene schematic diagram of a pedestrian detection method based on information fusion according to an embodiment of the present application.
Fig. 2A is a flowchart of a training phase in a pedestrian detection method based on information fusion according to an embodiment of the present application.
Fig. 2B is a flowchart of an inference phase in a pedestrian detection method based on information fusion according to an embodiment of the present application.
Fig. 3A is a schematic diagram of an architecture of a training phase in a pedestrian detection method based on information fusion according to an embodiment of the present disclosure.
Fig. 3B is a schematic diagram of an architecture of an inference stage in a pedestrian detection method based on information fusion according to an embodiment of the present application.
Fig. 4 is a block diagram of a pedestrian detection system based on information fusion according to an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Overview of a scene
As mentioned above, casualty accidents occur frequently on hoisting equipment sites, so pedestrian detection and positioning technology deserves attention in specific construction projects. When detecting pedestrians on a construction site, the pedestrians must be extracted from a complex and changeable construction environment, and their position and motion state information must be acquired in real time to ensure their safety.
The visual sensor is widely applied to target detection thanks to its low cost, rich image information, and ease of target identification and classification, but vision-based detection methods are easily influenced by the environment (illumination, weather, dust concentration and the like) and cannot acquire accurate position information of the target. The millimeter wave radar, because the electromagnetic waves it emits penetrate dust well, is used for detecting the relative distance between an obstacle and the radar in a given environment, but it cannot acquire the geometric information and category information of a target.
Therefore, in order to accurately detect pedestrians in the unsafe area of a hoisting equipment site and ensure their safety, a pedestrian detection scheme based on information fusion is desirable.
Accordingly, the inventors consider that although the visual sensor is susceptible to the environment and the millimeter wave radar cannot acquire the geometric and category information of the target, the two have complementary advantages. Therefore, if the data feature information acquired by the two can be combined for fusion classification, pedestrians in the unsafe area of the hoisting equipment site can be detected more accurately.
Specifically, in the technical scheme of the application, a monitoring image of the target area to be detected is first obtained through a visual sensor deployed on the hoisting equipment site, and an echo signal of the target area is obtained through a millimeter wave radar. It should be understood that, in order to extract the local implicit feature information in the monitoring image, a convolutional neural network model, which performs excellently in implicit feature extraction from images, is used for feature mining of the monitoring image. Moreover, in order to make full use of the geometric information of the target, shallow features of the monitoring image are extracted from a shallow layer of the convolutional neural network and deep features are extracted from a deep layer of the convolutional neural network. In particular, the depth of the shallow layer is selected according to the total number of layers of the convolutional neural network; for example, when the total number of layers is 40, the shallow feature map is extracted from the 4th layer, and when the total number of layers is 50, it is extracted from the 5th layer. The deep feature map and the shallow feature map are then fused to obtain a first feature map rich in the geometric information and category information of the target to be detected.
Then, for the echo signal acquired by the millimeter wave radar, feature extraction is performed using a depthwise separable convolutional neural network to obtain a second feature map. Specifically, the convolution kernel of the depthwise separable convolutional neural network is factorized as

$$K^{k \times k \times C} \approx K_{dw}^{k \times k \times 1} \otimes K_{pw}^{1 \times 1 \times C}$$

that is, a channel-wise k × k depthwise kernel followed by a 1 × 1 pointwise kernel across the C channels.
Here, the depthwise separable convolutional neural network can remove useless noise signals from the echo signal, so that pedestrian detection is more accurate, which benefits subsequent precision; it also replaces a three-dimensional convolution with two two-dimensional convolutions, which reduces the amount of computation while keeping the classification accurate.
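To make this factorization concrete, the following is a minimal PyTorch sketch of one depthwise separable block; the 3 × 3 kernel size, the single-channel waveform image input, and the 32 output channels are illustrative assumptions, not values fixed by this application.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """One depthwise separable block: a k x k depthwise convolution applied
    per channel, followed by a 1 x 1 pointwise convolution that mixes the
    channels. All channel counts here are illustrative."""

    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        # Depthwise: groups=in_channels gives one k x k x 1 kernel per channel.
        self.depthwise = nn.Conv2d(
            in_channels, in_channels, kernel_size,
            padding=kernel_size // 2, groups=in_channels, bias=False)
        # Pointwise: 1 x 1 convolution across all channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Example: a waveform image of the echo signal as a 1-channel input.
echo_waveform = torch.randn(1, 1, 128, 128)
block = DepthwiseSeparableConv(1, 32)
feature = block(echo_waveform)  # shape: (1, 32, 128, 128)
```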
In this way, the first feature map and the second feature map are fused to obtain a classification feature map for classification, so as to obtain a classification result indicating whether a pedestrian exists in the unsafe area. However, the robustness of the first convolutional neural network's feature extraction on the visual features of the visual sensor is susceptible to the environment, and the waveform image of the echo signal processed by the depthwise separable convolutional neural network inherently lacks the geometric information and category information of the target object. It is therefore desirable to train the first convolutional neural network and the depthwise separable convolutional neural network to learn from each other in feature extraction, so as to improve the feature extraction capability behind the obtained first feature map and second feature map.
Specifically, the derivative information hyper-convexity measurement factor is applied as a loss function, expressed as:

$$\mathrm{DIH}(F_1,F_2)=\frac{\sum_{i=1}^{W}\sum_{j=1}^{H}\sum_{k=1}^{C}\lambda_{ijk}\,\bigl|f^{(1)}_{ijk}-f^{(2)}_{ijk}\bigr|}{\sum_{i=1}^{W}\sum_{j=1}^{H}\sum_{k=1}^{C}\lambda_{ijk}\,\bigl(f^{(1)}_{ijk}-f^{(2)}_{ijk}\bigr)}$$

where f^{(1)}_{ijk} and f^{(2)}_{ijk} respectively represent the feature values of each position of the first feature map and the second feature map, both of scale W × H × C, and λ_{ijk} are the position weights of the weighted sums.
The derivative information hyper-convexity measurement factor performs a hyper-convexity consistency derivative representation of the manifold of the high-dimensional features through information measurement among the internal element sub-dimensions of the feature maps in the high-dimensional space. Training the first convolutional neural network and the depthwise separable convolutional neural network with this factor as a loss function makes the manifold difference between the feature maps tend toward hyper-convexity consistency in the information dimension, so that the two networks transfer-learn each other's feature extraction capability, further improving the classification accuracy.
Based on this, the application provides a pedestrian detection method based on information fusion, which includes a training phase and an inference phase. The training phase comprises the steps of: acquiring a monitoring image through a visual sensor deployed on a hoisting equipment site; acquiring an echo signal through a millimeter wave radar deployed on the hoisting equipment site; passing the monitoring image through a first convolutional neural network to obtain a first feature map; passing the waveform diagram of the echo signal through a second convolutional neural network to obtain a second feature map; fusing the first feature map and the second feature map to obtain a classification feature map; passing the classification feature map through a classifier to obtain a classification loss function value; calculating a derivative information hyper-convexity measurement factor between the first feature map and the second feature map, wherein the factor is the weighted sum of the absolute values of the differences between the feature values at the respective positions of the two feature maps along the width, height and channel dimensions, divided by the weighted sum of the differences between the feature values at the same positions along the same dimensions; and calculating a weighted sum of the classification loss function value and the derivative information hyper-convexity measurement factor as a loss function value to train the first convolutional neural network and the second convolutional neural network. The inference phase comprises the steps of: acquiring a monitoring image through the visual sensor deployed on the hoisting equipment site; acquiring an echo signal through the millimeter wave radar deployed on the hoisting equipment site; passing the monitoring image through the first convolutional neural network trained in the training phase to obtain a first feature map; passing the waveform diagram of the echo signal through the trained second convolutional neural network to obtain a second feature map; fusing the first feature map and the second feature map to obtain a classification feature map; and passing the classification feature map through the classifier to obtain a classification result, wherein the classification result is used for indicating whether a pedestrian exists in the unsafe area.
Fig. 1 illustrates a scene schematic diagram of a pedestrian detection method based on information fusion according to an embodiment of the present application. As shown in fig. 1, in the training phase of the application scenario, first, a monitoring image is acquired by a vision sensor (e.g., T as illustrated in fig. 1) disposed on the site of a lifting apparatus (e.g., E as illustrated in fig. 1), and an echo signal is acquired by a millimeter wave radar (e.g., R as illustrated in fig. 1) disposed on the site of the lifting apparatus. Then, the obtained monitoring image and the echo signal are input into a server (for example, S as illustrated in fig. 1) deployed with an information fusion-based pedestrian detection algorithm, wherein the server is capable of training the first convolutional neural network and the second convolutional neural network for information fusion-based pedestrian detection with the monitoring image and the echo signal based on the information fusion-based pedestrian detection algorithm.
After training is completed, in the inference phase, first, a monitoring image is acquired by the vision sensor (e.g., T as illustrated in fig. 1) deployed on the hoisting equipment site (e.g., E as illustrated in fig. 1), and an echo signal is acquired by the millimeter wave radar (e.g., R as illustrated in fig. 1) deployed on the same site. Then, the monitoring image and the echo signal are input into the server (e.g., S as illustrated in fig. 1) deployed with the information fusion-based pedestrian detection algorithm, wherein the server processes the monitoring image and the echo signal with the information fusion-based pedestrian detection algorithm to generate a classification result indicating whether a pedestrian is present in the unsafe area.
Having described the general principles of the present application, various non-limiting embodiments of the present application will now be described with reference to the accompanying drawings.
Exemplary method
Fig. 2A illustrates a flowchart of the training phase in the pedestrian detection method based on information fusion according to an embodiment of the present application. As shown in fig. 2A, the pedestrian detection method based on information fusion according to the embodiment of the application includes a training phase comprising the steps of: S110, acquiring a monitoring image through a visual sensor deployed on a hoisting equipment site; S120, acquiring an echo signal through a millimeter wave radar deployed on the hoisting equipment site; S130, passing the monitoring image through a first convolutional neural network to obtain a first feature map; S140, passing the waveform diagram of the echo signal through a second convolutional neural network to obtain a second feature map; S150, fusing the first feature map and the second feature map to obtain a classification feature map; S160, passing the classification feature map through a classifier to obtain a classification loss function value; S170, calculating a derivative information hyper-convexity measurement factor between the first feature map and the second feature map, wherein the factor is the weighted sum of the absolute values of the differences between the feature values at the respective positions of the two feature maps along the width, height and channel dimensions, divided by the weighted sum of the differences between the feature values at the same positions along the same dimensions; and S180, calculating a weighted sum of the classification loss function value and the derivative information hyper-convexity measurement factor as a loss function value to train the first convolutional neural network and the second convolutional neural network.
Fig. 2B illustrates a flowchart of the inference phase in the pedestrian detection method based on information fusion according to an embodiment of the present application. As shown in fig. 2B, the pedestrian detection method based on information fusion according to the embodiment of the application further includes an inference phase comprising the steps of: S210, acquiring a monitoring image through a visual sensor deployed on a hoisting equipment site; S220, acquiring an echo signal through a millimeter wave radar deployed on the hoisting equipment site; S230, passing the monitoring image through the first convolutional neural network trained in the training stage to obtain a first feature map; S240, passing the waveform diagram of the echo signal through the trained second convolutional neural network to obtain a second feature map; S250, fusing the first feature map and the second feature map to obtain a classification feature map; and S260, passing the classification feature map through a classifier to obtain a classification result, wherein the classification result is used for indicating whether a pedestrian exists in the unsafe area.
Fig. 3A illustrates an architecture diagram of the training phase in the pedestrian detection method based on information fusion according to an embodiment of the present application. As shown in fig. 3A, in the training phase, first, the obtained monitoring image (e.g., P1 as illustrated in fig. 3A) is passed through a first convolutional neural network (e.g., CNN1 as illustrated in fig. 3A) to obtain a first feature map (e.g., F1 as illustrated in fig. 3A); next, the obtained waveform diagram of the echo signal (e.g., P2 as illustrated in fig. 3A) is passed through a second convolutional neural network (e.g., CNN2 as illustrated in fig. 3A) to obtain a second feature map (e.g., F2 as illustrated in fig. 3A); then, the first feature map and the second feature map are fused to obtain a classification feature map (e.g., FC as illustrated in fig. 3A); then, the classification feature map is passed through a classifier (e.g., the classifier as illustrated in fig. 3A) to obtain a classification loss function value (e.g., CLV as illustrated in fig. 3A); then, a derivative information hyper-convexity measurement factor (e.g., DIH as illustrated in fig. 3A) between the first feature map and the second feature map is calculated; and finally, a weighted sum of the classification loss function value and the derivative information hyper-convexity measurement factor is calculated as a loss function value to train the first convolutional neural network and the second convolutional neural network.
Fig. 3B illustrates an architecture diagram of the inference phase in the pedestrian detection method based on information fusion according to an embodiment of the present application. As shown in fig. 3B, in the inference phase, first, the obtained monitoring image (e.g., Q1 as illustrated in fig. 3B) is passed through the first convolutional neural network (e.g., CN1 as illustrated in fig. 3B) trained in the training phase to obtain a first feature map (e.g., F1 as illustrated in fig. 3B); then, the obtained waveform diagram of the echo signal (e.g., Q2 as illustrated in fig. 3B) is passed through the trained second convolutional neural network (e.g., CN2 as illustrated in fig. 3B) to obtain a second feature map (e.g., F2 as illustrated in fig. 3B); then, the first feature map and the second feature map are fused to obtain a classification feature map (e.g., FC as illustrated in fig. 3B); finally, the classification feature map is passed through a classifier (e.g., the classifier as illustrated in fig. 3B) to obtain a classification result, which is used to indicate whether a pedestrian is present in the unsafe area.
More specifically, in the training phase, in steps S110, S120 and S130, a monitoring image is obtained through a visual sensor deployed on the hoisting equipment site, an echo signal is obtained through a millimeter wave radar deployed on the same site, and the monitoring image is passed through a first convolutional neural network to obtain a first feature map. As mentioned above, the visual sensor is susceptible to the environment and the millimeter wave radar cannot acquire the geometric and category information of the target; however, the visual sensor has advantages such as low cost, rich image information, and easy identification and classification of targets, while the electromagnetic waves emitted by the millimeter wave radar penetrate dust well and can detect the relative distance between an obstacle and the radar in a given environment. Therefore, in the technical scheme of the application, the data feature information acquired by the two is combined for fusion and classification, so that pedestrians in the unsafe area of the hoisting equipment site can be detected more accurately.
Specifically, in the technical scheme of the application, a monitoring image of the target area to be detected is first obtained through the visual sensor deployed on the hoisting equipment site, and an echo signal of the target area is obtained through the millimeter wave radar. It should be understood that, in order to extract the local implicit feature information in the monitoring image, a convolutional neural network model, which performs excellently in implicit feature extraction from images, is used for feature mining of the monitoring image. In order to make full use of the geometric information of the target, in the technical scheme of the application, shallow features of the monitoring image are extracted from a shallow layer of the first convolutional neural network and deep features are extracted from a deep layer of the first convolutional neural network, and feature fusion is then further carried out.
Specifically, in this embodiment of the present application, the process of passing the monitoring image through a first convolutional neural network to obtain a first feature map includes: each layer of the first convolutional neural network respectively performs the following operations on the input data in the forward pass of the layer: performing convolution processing based on a two-dimensional convolution kernel on the input data to generate a convolution feature map; pooling the convolution feature map to generate a pooled feature map; and performing activation based on a nonlinear activation function on the pooled feature map to obtain an activation feature map. Accordingly, in one specific example, a first activation feature map is extracted from the m-th layer of the first convolutional neural network, and a second activation feature map is extracted from the last layer of the first convolutional neural network. In particular, the depth of the shallow layer is selected according to the total number of layers of the first convolutional neural network: for example, when the total number of layers is 40, the shallow feature map is extracted from the 4th layer, and when the total number of layers is 50, from the 5th layer. In this way, the deep feature map and the shallow feature map are fused to obtain a first feature map rich in the geometric information and category information of the target to be detected; that is, the first activation feature map and the second activation feature map are fused to obtain the first feature map.
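The layer-wise extraction and fusion just described can be sketched as follows. The backbone keeps the activation map of the m-th layer as the shallow (first activation) feature map and that of the last layer as the deep (second activation) feature map, then fuses them; the channel plan, the choice m = 4, the 1 × 1 projection, and fusion by element-wise addition are illustrative assumptions, since the application leaves the backbone and the fusion operator open.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    """One layer: two-dimensional convolution, pooling, nonlinear activation."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, padding=1),
        nn.MaxPool2d(2),
        nn.ReLU(inplace=True))

class FirstCNN(nn.Module):
    """First CNN keeping a shallow map (layer m) and a deep map (last layer)."""

    def __init__(self, m=4, channels=(3, 16, 32, 64, 64, 128, 128)):
        super().__init__()
        self.m = m
        self.layers = nn.ModuleList(
            conv_block(cin, cout) for cin, cout in zip(channels, channels[1:]))
        # Project the shallow map to the deep map's channel count for fusion.
        self.project = nn.Conv2d(channels[m], channels[-1], kernel_size=1)

    def forward(self, x):
        shallow = None
        for i, layer in enumerate(self.layers, start=1):
            x = layer(x)
            if i == self.m:
                shallow = x                    # first activation feature map
        deep = x                               # second activation feature map
        shallow = self.project(shallow)
        shallow = F.adaptive_avg_pool2d(shallow, deep.shape[-2:])
        return deep + shallow                  # fused first feature map

f1 = FirstCNN()(torch.randn(1, 3, 256, 256))   # first feature map (1, 128, 4, 4)
```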
More specifically, in the training phase, in step S140, the waveform diagram of the echo signal is passed through a second convolutional neural network to obtain a second feature map. That is, in the technical solution of the present application, for the echo signal obtained by the millimeter wave radar, feature extraction is performed using a depthwise separable convolutional neural network model to obtain the second feature map, whose convolution kernel is factorized into a channel-wise k × k depthwise kernel followed by a 1 × 1 pointwise kernel, as described above. It should be understood that the depthwise separable convolutional neural network can remove useless noise signals from the echo signal, so that pedestrian detection is more accurate, which benefits subsequent precision; it also replaces a three-dimensional convolution with two two-dimensional convolutions, which reduces the amount of computation while keeping the classification accurate.
More specifically, in the training phase, in steps S150 and S160, the first feature map and the second feature map are fused to obtain a classification feature map, and the classification feature map is passed through a classifier to obtain a classification loss function value. That is, in the technical solution of the present application, after the first feature map and the second feature map are obtained, their feature information is fused for classification to obtain the classification loss function value. Accordingly, in one specific example, a position-wise weighted sum of the first feature map and the second feature map may be computed to obtain the classification feature map, as sketched below.
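A position-wise weighted sum of two feature maps of the same scale takes only a few lines; the weights alpha and beta below are assumed hyperparameters, not values fixed by the application, and the two maps are assumed to have been brought to a common scale W × H × C beforehand.

```python
import torch

def fuse_feature_maps(f1, f2, alpha=0.5, beta=0.5):
    """Classification feature map as a position-wise weighted sum."""
    assert f1.shape == f2.shape, "both maps must share the scale W x H x C"
    return alpha * f1 + beta * f2

fc = fuse_feature_maps(torch.randn(1, 128, 4, 4), torch.randn(1, 128, 4, 4))
```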
Specifically, in the embodiment of the present application, the process of passing the classification feature map through a classifier to obtain a classification loss function value includes: first, the classifier processes the classification feature map to generate a classification result according to the formula softmax{(W_n, B_n) : … : (W_1, B_1) | Project(F)}, where Project(F) denotes projecting the classification feature map into a vector, W_1 to W_n are the weight matrices of the fully connected layers, and B_1 to B_n are the bias matrices of the fully connected layers; then, the cross-entropy value between the classification result and the true value is calculated as the classification loss function value.
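A sketch of this classifier follows: projection of the classification feature map into a vector, a stack of fully connected layers (W_i, B_i), and a softmax folded into the cross-entropy against the true label. The layer widths and the two-class setup (pedestrian / no pedestrian) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Classifier(nn.Module):
    """Project(F) followed by fully connected layers (W_i, B_i)."""

    def __init__(self, feat_dim, hidden=256, num_classes=2):
        super().__init__()
        self.fc = nn.Sequential(              # (W_1, B_1) ... (W_n, B_n)
            nn.Linear(feat_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, num_classes))

    def forward(self, feature_map):
        v = torch.flatten(feature_map, start_dim=1)  # Project(F): map -> vector
        return self.fc(v)                            # logits; softmax is folded
                                                     # into the loss below

fc_map = torch.randn(1, 128, 4, 4)
clf = Classifier(feat_dim=128 * 4 * 4)
logits = clf(fc_map)
label = torch.tensor([1])                            # 1: pedestrian present
cls_loss = nn.CrossEntropyLoss()(logits, label)      # softmax + cross entropy
```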
More specifically, in the training phase, in steps S170 and S180, a derivative information hyper-convexity measurement factor between the first feature map and the second feature map is calculated, wherein the factor is the weighted sum of the absolute values of the differences between the feature values at the respective positions of the two feature maps along the width, height and channel dimensions, divided by the weighted sum of the differences between the feature values at the same positions along the same dimensions, and a weighted sum of the classification loss function value and the derivative information hyper-convexity measurement factor is calculated as a loss value to train the first convolutional neural network and the second convolutional neural network. The robustness of the first convolutional neural network's feature extraction on the visual features of the visual sensor is susceptible to the environment, and the waveform image of the echo signal processed by the depthwise separable convolutional neural network lacks the geometric information and category information of the target object. In the technical solution of the present application, it is therefore desirable to make the first convolutional neural network and the depthwise separable convolutional neural network learn from each other in feature extraction through joint training, so as to improve the feature extraction capability behind the obtained first feature map and second feature map. That is, the first convolutional neural network and the second convolutional neural network are trained with the weighted sum of the derivative information hyper-convexity measurement factor between the two feature maps and the classification loss function value as the loss function.
Specifically, in this embodiment of the present application, the derivative information hyper-convexity measurement factor between the first feature map and the second feature map is calculated with the following formula:

$$\mathrm{DIH}(F_1,F_2)=\frac{\sum_{i=1}^{W}\sum_{j=1}^{H}\sum_{k=1}^{C}\lambda_{ijk}\,\bigl|f^{(1)}_{ijk}-f^{(2)}_{ijk}\bigr|}{\sum_{i=1}^{W}\sum_{j=1}^{H}\sum_{k=1}^{C}\lambda_{ijk}\,\bigl(f^{(1)}_{ijk}-f^{(2)}_{ijk}\bigr)}$$

where f^{(1)}_{ijk} and f^{(2)}_{ijk} respectively represent the feature values of each position of the first feature map and the second feature map, both of scale W × H × C, and λ_{ijk} are the position weights of the weighted sums.
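Under this reading of the formula, the factor and the total loss can be computed as below; the uniform weights (all ones) and the small epsilon guarding the denominator are assumptions added for numerical illustration, as is the 0.1 loss weight.

```python
import torch

def dih_factor(f1, f2, weights=None, eps=1e-8):
    """Derivative information hyper-convexity measurement factor: the weighted
    sum of |f1 - f2| over the width, height and channel dimensions divided by
    the weighted sum of (f1 - f2) over the same dimensions."""
    diff = f1 - f2
    if weights is None:
        weights = torch.ones_like(diff)       # assumed uniform weights
    numerator = (weights * diff.abs()).sum()
    denominator = (weights * diff).sum()
    return numerator / (denominator + eps)    # eps guards a zero denominator

# Total loss: weighted sum with the classification loss function value.
f1, f2 = torch.randn(1, 128, 4, 4), torch.randn(1, 128, 4, 4)
cls_loss = torch.tensor(0.7)                  # e.g. from the classifier above
loss = cls_loss + 0.1 * dih_factor(f1, f2)    # 0.1 is an illustrative weight
```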
After training is completed, the inference phase is entered. That is, after the first convolutional neural network and the second convolutional neural network are trained using the pedestrian detection algorithm based on information fusion, the trained first convolutional neural network and second convolutional neural network are used in an actual pedestrian detection scene.
More specifically, in the inference phase, first, a monitoring image is acquired by the vision sensor deployed on the hoisting equipment site, and an echo signal is acquired by the millimeter wave radar deployed on the same site. Then, the monitoring image is passed through the first convolutional neural network trained in the training stage to obtain a first feature map, and the waveform diagram of the echo signal is passed through the trained second convolutional neural network to obtain a second feature map. Then, the first feature map and the second feature map are fused to obtain a classification feature map. Finally, the classification feature map is passed through the classifier to obtain a classification result, which indicates whether a pedestrian exists in the unsafe area.
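Putting the trained pieces together, the inference pass might look like the sketch below; the function and module names reuse the sketches above and are assumptions, not components named by the application.

```python
import torch

@torch.no_grad()
def detect_pedestrian(image, waveform, cnn1, cnn2, classifier):
    """Inference pass: trained CNNs, position-wise fusion, classification.
    Assumes cnn2's output has been brought to the same scale as cnn1's."""
    f1 = cnn1(image)                    # first feature map
    f2 = cnn2(waveform)                 # second feature map
    fc = 0.5 * f1 + 0.5 * f2            # classification feature map
    probs = classifier(fc).softmax(dim=1)
    return probs[0, 1].item()           # probability a pedestrian is present
```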
In summary, the pedestrian detection method based on information fusion has been elucidated. The monitoring image features acquired by the visual sensor in the unsafe area of the hoisting equipment site and the echo signal features acquired by the millimeter wave radar are mined through deep-learning convolutional neural network models to comprehensively detect pedestrians in the area. In the process of feature fusion, the derivative information hyper-convexity measurement factor is applied as a loss function for training, performing a hyper-convexity consistency derivative representation of the manifold of the high-dimensional features through information measurement among the internal element sub-dimensions of the feature maps in the high-dimensional space, so that the two convolutional neural network models transfer-learn each other's feature extraction capability and the classification accuracy is improved. Therefore, pedestrians in the unsafe area of the hoisting equipment site can be accurately detected, ensuring their safety.
Exemplary System
FIG. 4 illustrates a block diagram of a pedestrian detection system based on information fusion according to an embodiment of the application. As shown in fig. 4, the pedestrian detection system 400 based on information fusion according to the embodiment of the present application includes: a training module 410 and an inference module 420.
As shown in fig. 4, the training module 410 includes: a monitoring image acquisition unit 410, configured to acquire a monitoring image through a visual sensor deployed on a hoisting equipment site; an echo signal acquisition unit 420, configured to acquire an echo signal through a millimeter wave radar deployed on the hoisting equipment site; a first feature extraction unit 430, configured to pass the monitoring image through a first convolutional neural network to obtain a first feature map; a second feature extraction unit 440, configured to pass the waveform diagram of the echo signal through a second convolutional neural network to obtain a second feature map; a fusion unit 450, configured to fuse the first feature map and the second feature map to obtain a classification feature map; a classification loss function value calculation unit 460, configured to pass the classification feature map through a classifier to obtain a classification loss function value; a derivative information hyper-convexity measurement factor calculation unit 470, configured to calculate a derivative information hyper-convexity measurement factor between the first feature map and the second feature map, wherein the factor is the weighted sum of the absolute values of the differences between the feature values at the respective positions of the two feature maps along the width, height and channel dimensions, divided by the weighted sum of the differences between the feature values at the same positions along the same dimensions; and a training unit 480, configured to calculate a weighted sum of the classification loss function value and the derivative information hyper-convexity measurement factor as a loss function value to train the first convolutional neural network and the second convolutional neural network.
As shown in fig. 4, the inference module 420 includes: an inferred image acquisition unit 421, configured to acquire a monitoring image through a visual sensor deployed on a hoisting equipment site; an inferred signal acquisition unit 422, configured to acquire an echo signal through a millimeter wave radar deployed on the hoisting equipment site; a first feature map generation unit 423, configured to pass the monitoring image through the first convolutional neural network trained in the training phase to obtain a first feature map; a second feature map generation unit 424, configured to pass the waveform diagram of the echo signal through the trained second convolutional neural network to obtain a second feature map; a classification feature map generation unit 425, configured to fuse the first feature map and the second feature map to obtain a classification feature map; and a classification unit 426, configured to pass the classification feature map through a classifier to obtain a classification result, wherein the classification result is used to indicate whether a pedestrian exists in the unsafe area.
In an example, in the above pedestrian detection system 400 based on information fusion, the first feature extraction unit 430 is further configured such that each layer of the first convolutional neural network respectively performs the following operations on the input data in the forward pass of the layer: convolution processing based on a two-dimensional convolution kernel on the input data to generate a convolution feature map; pooling of the convolution feature map to generate a pooled feature map; and activation based on a nonlinear activation function on the pooled feature map to obtain an activation feature map.
In an example, in the above pedestrian detection system 400 based on information fusion, the first feature extraction unit 430 is further configured to: extract a first activation feature map from the m-th layer of the first convolutional neural network; extract a second activation feature map from the last layer of the first convolutional neural network; and fuse the first activation feature map and the second activation feature map to obtain the first feature map.
In an example, in the above pedestrian detection system 400 based on information fusion, the second feature extraction unit 440 is further configured such that the second convolutional neural network model is a depthwise separable convolutional neural network, wherein the convolution kernel of the depthwise separable convolutional neural network is factorized into a channel-wise k × k depthwise kernel followed by a 1 × 1 pointwise kernel, as described above.
In one example, in the above pedestrian detection system 400 based on information fusion, the classification loss function value calculation unit 460 is further configured to: process the classification feature map with the classifier to generate a classification result according to the formula softmax{(W_n, B_n) : … : (W_1, B_1) | Project(F)}, where Project(F) denotes projecting the classification feature map into a vector, W_1 to W_n are the weight matrices of the fully connected layers, and B_1 to B_n are the bias matrices of the fully connected layers; and calculate the cross-entropy value between the classification result and the true value as the classification loss function value.
In one example, in the above pedestrian detection system 400 based on information fusion, the derivative information hyper-convexity measurement factor calculation unit 470 is further configured to calculate the derivative information hyper-convexity measurement factor between the first feature map and the second feature map with the following formula:

$$\mathrm{DIH}(F_1,F_2)=\frac{\sum_{i=1}^{W}\sum_{j=1}^{H}\sum_{k=1}^{C}\lambda_{ijk}\,\bigl|f^{(1)}_{ijk}-f^{(2)}_{ijk}\bigr|}{\sum_{i=1}^{W}\sum_{j=1}^{H}\sum_{k=1}^{C}\lambda_{ijk}\,\bigl(f^{(1)}_{ijk}-f^{(2)}_{ijk}\bigr)}$$

where f^{(1)}_{ijk} and f^{(2)}_{ijk} respectively represent the feature values of each position of the first feature map and the second feature map, both of scale W × H × C, and λ_{ijk} are the position weights of the weighted sums.
Here, it can be understood by those skilled in the art that the specific functions and operations of the respective units and modules in the above-described information fusion-based pedestrian detection system 400 have been described in detail in the above description of the information fusion-based pedestrian detection method with reference to fig. 1 to 3B, and thus, a repetitive description thereof will be omitted.
As described above, the pedestrian detection system 400 based on information fusion according to the embodiment of the present application can be implemented in various terminal devices, such as a server running the pedestrian detection algorithm based on information fusion. In one example, the pedestrian detection system 400 may be integrated into the terminal device as a software module and/or a hardware module. For example, it may be a software module in the operating system of the terminal device, or an application developed for the terminal device; of course, it may equally be one of many hardware modules of the terminal device.
Alternatively, in another example, the pedestrian detection system 400 based on information fusion and the terminal device may be separate devices, and the pedestrian detection system 400 may be connected to the terminal device through a wired and/or wireless network and exchange information with it in an agreed data format.
Exemplary electronic device
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the functions of the information fusion-based pedestrian detection method according to the various embodiments of the present application described in the "exemplary methods" section of this specification, above.
The computer program product may be written with program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including object oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the information fusion-based pedestrian detection method described in the "exemplary method" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present application have been described above with reference to specific embodiments, but it should be noted that the advantages, effects, and the like mentioned in the present application are only examples and are not limiting, and these advantages and effects must not be assumed to be possessed by every embodiment of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description only and is not intended to be limiting, since the disclosure is not intended to be exhaustive or to limit the application to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown; as those skilled in the art will appreciate, these devices, apparatuses, and systems may be connected, arranged, or configured in any manner. Words such as "including," "comprising," and "having" are open-ended words meaning "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The phrase "such as" as used herein means, and is used interchangeably with, the phrase "such as, but not limited to."
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A pedestrian detection method based on information fusion, characterized by comprising: a training phase, comprising: acquiring a monitoring image through a visual sensor deployed on a hoisting equipment site; acquiring an echo signal through a millimeter wave radar deployed on the hoisting equipment site; passing the monitoring image through a first convolutional neural network to obtain a first feature map; passing a waveform diagram of the echo signal through a second convolutional neural network to obtain a second feature map; fusing the first feature map and the second feature map to obtain a classification feature map; passing the classification feature map through a classifier to obtain a classification loss function value; calculating a derivative information hyperconvexity measurement factor between the first feature map and the second feature map, wherein the derivative information hyperconvexity measurement factor is a weighted sum of the absolute values of the differences between the feature values at respective positions of the first feature map and the second feature map along the width, height and channel dimensions, divided by a weighted sum of the differences between the feature values at respective positions of the first feature map and the second feature map along the width, height and channel dimensions; and calculating a weighted sum of the classification loss function value and the derivative information hyperconvexity measurement factor as a loss function value to train the first convolutional neural network and the second convolutional neural network; and an inference phase, comprising: acquiring a monitoring image through the visual sensor deployed on the hoisting equipment site; acquiring an echo signal through the millimeter wave radar deployed on the hoisting equipment site; passing the monitoring image through the first convolutional neural network trained in the training phase to obtain a first feature map; passing a waveform diagram of the echo signal through the trained second convolutional neural network to obtain a second feature map; fusing the first feature map and the second feature map to obtain a classification feature map; and passing the classification feature map through the classifier to obtain a classification result, the classification result indicating whether a pedestrian is present in a non-safety area.
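For orientation, the following is a minimal, non-authoritative sketch of claim 1's training step, assuming PyTorch. All names (`training_step`, `cnn1`, `cnn2`, `fuse`, `classifier`) and the weights `alpha`/`beta` are illustrative assumptions, not taken from the patent; the factor term is one literal reading of the claim's verbal definition, with uniform weights, which the patent does not fix.

```python
import torch
import torch.nn.functional as F

def training_step(image, waveform, cnn1, cnn2, fuse, classifier,
                  label, alpha=1.0, beta=0.1, eps=1e-6):
    f1 = cnn1(image)        # first feature map (camera branch)
    f2 = cnn2(waveform)     # second feature map (radar echo branch)
    fused = fuse(f1, f2)    # classification feature map
    logits = classifier(fused)
    cls_loss = F.cross_entropy(logits, label)   # classification loss function value
    diff = f1 - f2
    # sum of |differences| over W, H, C divided by the sum of the
    # differences, per the claim's wording (uniform weights assumed)
    factor = diff.abs().sum() / (diff.sum() + eps)
    return alpha * cls_loss + beta * factor     # weighted-sum loss to backpropagate
```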
2. The pedestrian detection method based on information fusion according to claim 1, wherein passing the monitoring image through a first convolutional neural network to obtain a first feature map comprises: causing each layer of the first convolutional neural network to respectively perform, in its forward pass, the following operations on input data: performing convolution processing based on a two-dimensional convolution kernel on the input data to generate a convolution feature map; performing pooling on the convolution feature map to generate a pooled feature map; and performing activation based on a nonlinear activation function on the pooled feature map to obtain an activation feature map.
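Claim 2's per-layer recipe (two-dimensional convolution, then pooling, then a nonlinear activation) corresponds to a standard convolutional block. The sketch below assumes PyTorch; channel counts and kernel/pool sizes are illustrative, as the patent does not specify them.

```python
import torch.nn as nn

# one layer of the first convolutional neural network, as described in claim 2
layer = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),  # convolution feature map (2-D kernel)
    nn.MaxPool2d(2),                             # pooled feature map
    nn.ReLU(),                                   # activation feature map
)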
3. The pedestrian detection method based on information fusion according to claim 2, wherein passing the monitoring image through the first convolutional neural network to obtain the first feature map comprises: extracting a first activation feature map from an m-th layer of the first convolutional neural network; extracting a second activation feature map from the last layer of the first convolutional neural network; and fusing the first activation feature map and the second activation feature map to obtain the first feature map.
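A hedged sketch of claim 3: take an activation map from an intermediate (m-th) layer and from the last layer of the same backbone, then fuse them. The `nn.Sequential` backbone, the spatial alignment by interpolation, and channel concatenation as the fusion are all assumptions, since the patent fixes none of them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def multiscale_first_feature_map(backbone: nn.Sequential, x: torch.Tensor, m: int):
    layers = list(backbone.children())
    feat_m = nn.Sequential(*layers[:m])(x)                    # activation map of the m-th layer
    feat_last = backbone(x)                                   # activation map of the last layer
    feat_m = F.interpolate(feat_m, size=feat_last.shape[2:])  # align spatial sizes
    return torch.cat([feat_m, feat_last], dim=1)              # fused first feature map
```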
4. The pedestrian detection method based on information fusion according to claim 3, wherein the second convolutional neural network is a depthwise separable convolutional neural network, the convolution kernel of the depthwise separable convolutional neural network being defined by a formula that appears only as an image (FDA0003816818890000021) in the source and is not reproduced here.
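The depthwise separable convolution named in claim 4 is a standard factorization of a convolution into a per-channel (depthwise) filter followed by a 1 × 1 (pointwise) mixing convolution. Since the patent's own kernel formula survives only as an image, the sizes below are assumptions.

```python
import torch.nn as nn

def depthwise_separable(in_ch: int, out_ch: int, k: int = 3) -> nn.Sequential:
    return nn.Sequential(
        # depthwise: one k x k filter per input channel (groups = in_ch)
        nn.Conv2d(in_ch, in_ch, kernel_size=k, padding=k // 2, groups=in_ch),
        # pointwise: 1 x 1 convolution mixing channels
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
    )
```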
5. The pedestrian detection method based on information fusion according to claim 4, wherein passing the classification feature map through a classifier to obtain a classification loss function value comprises: processing, by the classifier, the classification feature map according to the following formula to generate a classification result: $\mathrm{softmax}\{(W_n, B_n) : \cdots : (W_1, B_1) \mid \mathrm{Project}(F)\}$, where $\mathrm{Project}(F)$ denotes projection of the classification feature map as a vector, $W_1$ to $W_n$ are the weight matrices of the fully connected layers, and $B_1$ to $B_n$ are the bias matrices of the fully connected layers; and calculating a cross-entropy value between the classification result and a true value as the classification loss function value.
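A minimal sketch of claim 5's classifier, assuming PyTorch: flatten the classification feature map ($\mathrm{Project}(F)$), pass it through stacked fully connected layers $(W_1, B_1)$ … $(W_n, B_n)$, and take the cross entropy against the true label, which applies the softmax implicitly. The layer widths and the two-class output are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

classifier = nn.Sequential(
    nn.Flatten(),                # Project(F): classification feature map -> vector
    nn.Linear(64 * 8 * 8, 256),  # (W_1, B_1)
    nn.ReLU(),
    nn.Linear(256, 2),           # (W_n, B_n); two classes: pedestrian / none
)

feature_map = torch.randn(4, 64, 8, 8)   # batch of classification feature maps
labels = torch.tensor([0, 1, 0, 1])      # true values
logits = classifier(feature_map)
loss = F.cross_entropy(logits, labels)   # softmax + cross entropy
```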
6. The pedestrian detection method based on information fusion according to claim 5, wherein calculating the derivative information hyperconvexity measurement factor between the first feature map and the second feature map comprises: calculating the derivative information hyperconvexity measurement factor between the first feature map and the second feature map according to the following formula, consistent with the definition in claim 1:

$$\mathit{factor} = \frac{\sum_{w=1}^{W}\sum_{h=1}^{H}\sum_{c=1}^{C} \lambda_{w,h,c}\,\bigl|\,f^{(1)}_{w,h,c} - f^{(2)}_{w,h,c}\,\bigr|}{\sum_{w=1}^{W}\sum_{h=1}^{H}\sum_{c=1}^{C} \lambda_{w,h,c}\,\bigl(\,f^{(1)}_{w,h,c} - f^{(2)}_{w,h,c}\,\bigr)}$$

wherein $f^{(1)}_{w,h,c}$ and $f^{(2)}_{w,h,c}$ respectively denote the feature values at each position of the first feature map and the second feature map, $\lambda_{w,h,c}$ denotes the weights, and the first feature map and the second feature map both have scale $W \times H \times C$.
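Under a uniform-weight assumption ($\lambda_{w,h,c} = 1$), the factor reduces to the sketch below; the `eps` term is an added numerical guard against a zero denominator, not part of the claim.

```python
import torch

def hyperconvexity_factor(f1: torch.Tensor, f2: torch.Tensor, eps: float = 1e-6):
    diff = f1 - f2                                # position-wise differences over W x H x C
    return diff.abs().sum() / (diff.sum() + eps)  # eps guards a zero denominator
```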
7. A pedestrian detection system based on information fusion, characterized by comprising: a training module, comprising: a monitoring image acquisition unit for acquiring a monitoring image through a visual sensor deployed on a hoisting equipment site; an echo signal acquisition unit for acquiring an echo signal through a millimeter wave radar deployed on the hoisting equipment site; a first feature extraction unit for passing the monitoring image through a first convolutional neural network to obtain a first feature map; a second feature extraction unit for passing a waveform diagram of the echo signal through a second convolutional neural network to obtain a second feature map; a fusion unit for fusing the first feature map and the second feature map to obtain a classification feature map; a classification loss function value calculation unit for passing the classification feature map through a classifier to obtain a classification loss function value; a derivative information hyperconvexity measurement factor calculation unit for calculating a derivative information hyperconvexity measurement factor between the first feature map and the second feature map, wherein the derivative information hyperconvexity measurement factor is a weighted sum of the absolute values of the differences between the feature values at respective positions of the first feature map and the second feature map along the width, height and channel dimensions, divided by a weighted sum of the differences between the feature values at respective positions of the first feature map and the second feature map along the width, height and channel dimensions; and a training unit for calculating a weighted sum of the classification loss function value and the derivative information hyperconvexity measurement factor as a loss function value to train the first convolutional neural network and the second convolutional neural network; and an inference module, comprising: an inference image acquisition unit for acquiring a monitoring image through the visual sensor deployed on the hoisting equipment site; an inference signal acquisition unit for acquiring an echo signal through the millimeter wave radar deployed on the hoisting equipment site; a first feature map generation unit for passing the monitoring image through the first convolutional neural network trained in the training phase to obtain a first feature map; a second feature map generation unit for passing a waveform diagram of the echo signal through the trained second convolutional neural network to obtain a second feature map; a classification feature map generation unit for fusing the first feature map and the second feature map to obtain a classification feature map; and a classification unit for passing the classification feature map through the classifier to obtain a classification result, the classification result indicating whether a pedestrian is present in a non-safety area.
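Claim 7's fusion unit is only required to combine the two feature maps into a classification feature map; element-wise weighted addition is one common choice and is assumed here, not stated in the patent. It matches the `fuse` callable assumed in the claim-1 sketch above.

```python
import torch

def fuse(f1: torch.Tensor, f2: torch.Tensor, w1: float = 0.5, w2: float = 0.5):
    # assumed fusion: weighted element-wise sum of same-shape feature maps
    return w1 * f1 + w2 * f2   # classification feature map
```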
8. The pedestrian detection system based on information fusion according to claim 7, wherein the first feature extraction unit is further configured to cause each layer of the first convolutional neural network to respectively perform, in its forward pass, the following operations on input data: performing convolution processing based on a two-dimensional convolution kernel on the input data to generate a convolution feature map; performing pooling on the convolution feature map to generate a pooled feature map; and performing activation based on a nonlinear activation function on the pooled feature map to obtain an activation feature map.
9. The pedestrian detection system based on information fusion according to claim 8, wherein the first feature extraction unit is further configured to: extract a first activation feature map from an m-th layer of the first convolutional neural network; extract a second activation feature map from the last layer of the first convolutional neural network; and fuse the first activation feature map and the second activation feature map to obtain the first feature map.
10. The pedestrian detection system based on information fusion according to claim 9, wherein the second feature extraction unit is further configured such that the second convolutional neural network is a depthwise separable convolutional neural network, the convolution kernel of the depthwise separable convolutional neural network being defined by a formula that appears only as images (FDA0003816818890000031, FDA0003816818890000032) in the source and is not reproduced here.
CN202211029205.3A 2022-08-26 2022-08-26 Pedestrian detection method and system based on information fusion Withdrawn CN115424204A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211029205.3A CN115424204A (en) 2022-08-26 2022-08-26 Pedestrian detection method and system based on information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211029205.3A CN115424204A (en) 2022-08-26 2022-08-26 Pedestrian detection method and system based on information fusion

Publications (1)

Publication Number Publication Date
CN115424204A true CN115424204A (en) 2022-12-02

Family

ID=84200427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211029205.3A Withdrawn CN115424204A (en) 2022-08-26 2022-08-26 Pedestrian detection method and system based on information fusion

Country Status (1)

Country Link
CN (1) CN115424204A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116797814A (en) * 2022-12-28 2023-09-22 中建新疆建工集团第三建设工程有限公司 Intelligent building site safety management system
CN115775116A (en) * 2023-02-13 2023-03-10 华设设计集团浙江工程设计有限公司 BIM-based road and bridge engineering management method and system
CN116343513A (en) * 2023-03-07 2023-06-27 江苏纬信工程咨询有限公司 Rural highway beyond-sight-distance risk point safety monitoring and early warning method and system thereof
CN116343513B (en) * 2023-03-07 2023-10-24 江苏纬信工程咨询有限公司 Rural highway beyond-sight-distance risk point safety monitoring and early warning method and system thereof
CN116595976A (en) * 2023-04-26 2023-08-15 杭州睿数科技有限公司 Scientific research innovation platform control method and system
CN116595976B (en) * 2023-04-26 2024-05-24 杭州睿数科技有限公司 Scientific research innovation platform control method and system
CN116309595A (en) * 2023-05-23 2023-06-23 杭州华得森生物技术有限公司 CTC intelligent full-automatic detection integrated machine and method thereof
CN116309595B (en) * 2023-05-23 2023-08-01 杭州华得森生物技术有限公司 CTC intelligent full-automatic detection integrated machine and method thereof
CN117036868A (en) * 2023-10-08 2023-11-10 之江实验室 Training method and device of human body perception model, medium and electronic equipment
CN117036868B (en) * 2023-10-08 2024-01-26 之江实验室 Training method and device of human body perception model, medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN115424204A (en) Pedestrian detection method and system based on information fusion
EP3488387B1 (en) Method for detecting object in image and objection detection system
US12031842B2 (en) Method and apparatus for binocular ranging
CN110309747B (en) Support quick degree of depth pedestrian detection model of multiscale
US8995714B2 (en) Information creation device for estimating object position and information creation method and program for estimating object position
US11410327B2 (en) Location determination apparatus, location determination method and computer program
US20210163038A1 (en) Methods and systems for diversity-aware vehicle motion prediction via latent semantic sampling
CN110751012A (en) Target detection evaluation method and device, electronic equipment and storage medium
CN110992424B (en) Positioning method and system based on binocular vision
CN114495064A (en) Monocular depth estimation-based vehicle surrounding obstacle early warning method
CN116310850B (en) Remote sensing image target detection method based on improved RetinaNet
CN115146676A (en) Circuit fault detection method and system
CN110321867B (en) Shielded target detection method based on component constraint network
CN117132649A (en) Ship video positioning method and device for artificial intelligent Beidou satellite navigation fusion
CN115797419A (en) Point cloud registration method, device and medium
CN117576665B (en) Automatic driving-oriented single-camera three-dimensional target detection method and system
CN116543283B (en) Multimode target detection method considering modal uncertainty
Xue et al. Indoor obstacle discovery on reflective ground via monocular camera
CN117523514A (en) Cross-attention-based radar vision fusion data target detection method and system
US20220335732A1 (en) Method and system for recognizing surrounding driving environment based on svm original image
KR20090113746A (en) A method of robot localization using spatial semantics of objects
CN115143128B (en) Fault diagnosis method and system for small-sized submersible electric pump
US20230005162A1 (en) Image processing system, image processing method, and storage medium
CN115586763A (en) Unmanned vehicle keeps away barrier test equipment
CN115762558A (en) Performance detection system and method for escalator production

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20221202