WO2024125583A1 - Detection method and related equipment - Google Patents

Detection method and related equipment

Info

Publication number
WO2024125583A1
WO2024125583A1 (PCT application No. PCT/CN2023/138649)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
feature map
sample image
heat map
detection
Prior art date
Application number
PCT/CN2023/138649
Other languages
English (en)
French (fr)
Inventor
宋天源
杨静林
李永翔
Original Assignee
中国电信股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国电信股份有限公司
Publication of WO2024125583A1

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present disclosure relates to the field of computer images, and in particular to a detection method and related equipment.
  • a detection method including: inputting a target image into a pre-trained first neural network and outputting a heat map, wherein the heat map reflects the position of a human head; inputting the target image into a second neural network and outputting a feature map; fusing the heat map with the feature map to determine a weighted feature map; inputting the weighted feature map into a pre-trained third neural network and outputting a plurality of detection boxes, wherein the center point of a detection box is at the center of the human head area; and determining, based on the coordinates in the heat map whose scores are greater than a threshold, the detection box with the largest coordinate score among the plurality of detection boxes as a detection result.
  • the detection method further includes: acquiring a target image; annotating a human head in the target image with a rectangular box to determine an annotation box, wherein the annotation box includes coordinate information and the center of the annotation box corresponds to the center of the human head area; determining a heat map according to the output after the target image is input into the first neural network; determining a true value heat map according to the center point of the rectangular annotation box; and training the first neural network according to the heat map and the true value heat map.
  • the detection method also includes: acquiring a first sample image; annotating a human head in the first sample image with a rectangular box to determine an annotation box, wherein the annotation box includes coordinate information and the center of the annotation box corresponds to the center of the head area; determining a heat map corresponding to the first sample image based on the output after the first sample image is input into the first neural network; determining a true value heat map corresponding to the first sample image based on the center point of the annotation box; and training the first neural network based on the heat map and the true value heat map corresponding to the first sample image.
  • inputting the target image into the second neural network and outputting a feature map includes: inputting the target image into the second neural network and outputting a feature map of the same size as the heat map.
  • the detection method further includes: fusing the feature map and the heat map by weighting to determine a weighted feature map; and training a third neural network based on the weighted feature map.
  • the detection method also includes: inputting the second sample image into a pre-trained first neural network, and outputting a heat map corresponding to the second sample image; inputting the second sample image into a second neural network, and outputting a feature map corresponding to the second sample image; performing feature fusion on the heat map and feature map corresponding to the second sample image to determine a weighted feature map corresponding to the second sample image; and training the third neural network based on the weighted feature map corresponding to the second sample image.
  • performing weighted feature fusion on the feature map and the heat map to determine the weighted feature map includes: multiplying the heat map point by point with the feature map on each channel, and performing a weighted summation according to the weight corresponding to the feature map on each channel to determine the weighted feature map, wherein the heat map includes one channel and the feature map includes at least one channel.
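  • As a concrete illustration of this fusion step, the minimal NumPy sketch below multiplies the single-channel heat map point by point with every channel of the feature map and applies per-channel weights, keeping all C channels (matching the later description of the weighted feature map F1). The function name, the channels-first array layout, and the optional `channel_weights` argument (defaulting to 1 for every channel) are illustrative assumptions, not the patent's reference implementation.

```python
from typing import Optional

import numpy as np

def fuse_heatmap_features(heatmap: np.ndarray,
                          features: np.ndarray,
                          channel_weights: Optional[np.ndarray] = None) -> np.ndarray:
    """Point-wise fusion of a 1-channel heat map with a C-channel feature map.

    heatmap:         (h, w)    head-position heat map H (one channel)
    features:        (C, h, w) feature map F (at least one channel)
    channel_weights: (C,)      per-channel weights (assumed; defaults to all 1s)
    returns:         (C, h, w) weighted feature map F1
    """
    assert heatmap.shape == features.shape[1:], "H and F must share the w*h size"
    if channel_weights is None:
        channel_weights = np.ones(features.shape[0], dtype=features.dtype)
    # Multiply H point by point with the feature map on every channel, then
    # scale each channel by its weight (broadcast over the spatial dimensions).
    return channel_weights[:, None, None] * (heatmap[None, :, :] * features)
```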
  • the detection method further includes: updating parameters of the second neural network and the third neural network according to a back propagation algorithm.
  • the training of the third neural network according to the weighted feature map corresponding to the second sample image includes: inputting the weighted feature map corresponding to the second sample image into the third neural network, and outputting multiple detection boxes corresponding to the second sample image; determining, according to the coordinates whose scores are greater than a threshold in the heat map corresponding to the second sample image, the detection box with the largest coordinate score among the multiple detection boxes corresponding to the second sample image as the detection result corresponding to the second sample image; and updating, based on the detection result corresponding to the second sample image and the position information of the human head in the second sample image, the parameters of the second neural network and the parameters of the third neural network via the back propagation algorithm, so as to train the second neural network and the third neural network.
  • the detection method further includes: deploying the first neural network in the first thread, and deploying the second neural network and the third neural network in the second thread.
  • a detection device including: a first neural network module, used to input the target image into a pre-trained first neural network and output a heat map, wherein the heat map reflects the position of the human head; a second neural network module, used to input the target image into the second neural network and output a feature map; a feature fusion module, used to fuse the heat map with the feature map and determine a weighted feature map; a third neural network module, used to input the weighted feature map into a pre-trained third neural network and output multiple detection boxes; and a detection result determination module, used to determine, based on the coordinates in the heat map whose scores are greater than a threshold, the detection box with the largest coordinate score among the multiple detection boxes as the detection result.
  • an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute any one of the above-mentioned detection methods by executing the executable instructions.
  • a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, any one of the above detection methods is implemented.
  • a computer program product including a computer program, wherein when the computer program is executed by a processor, the computer program implements any one of the above detection methods.
  • a computer program comprising: instructions, wherein when the instructions are executed by a processor, the detection method according to any of the above embodiments is implemented.
  • FIG. 1 shows a schematic diagram of a detection system structure in an embodiment of the present disclosure;
  • FIG. 2 shows a flow chart of a detection method in an embodiment of the present disclosure;
  • FIG. 3 shows a flow chart of another detection method in an embodiment of the present disclosure;
  • FIG. 4 shows a flow chart of another detection method in an embodiment of the present disclosure;
  • FIG. 5 shows a flow chart of another detection method in an embodiment of the present disclosure;
  • FIG. 6 shows a schematic diagram of the overall training of a detection method in an embodiment of the present disclosure;
  • FIG. 7 shows a schematic diagram of auxiliary network training of a detection method in an embodiment of the present disclosure;
  • FIG. 8 shows a schematic diagram of a detection device in an embodiment of the present disclosure;
  • FIG. 9 shows a structural block diagram of a computer device according to an embodiment of the present disclosure;
  • FIG. 10 shows a schematic diagram of a computer-readable storage medium in an embodiment of the present disclosure.
  • the inventors have discovered that, in the prior art, in some special scenarios where the target is relatively small, such as a construction-site safety-helmet scenario, commonly used detection algorithms cannot accurately identify small-sized targets.
  • the present disclosure provides a detection method and related equipment, which at least to some extent overcome the problem of inaccurate detection of small-sized human heads in the related art.
  • Fig. 1 shows a schematic diagram of an exemplary application system architecture to which the detection method in the embodiment of the present disclosure can be applied.
  • the system architecture 100 may include an auxiliary network 101 , a backbone network 102 and a detection network 103 .
  • the auxiliary network (the first neural network) is deployed in thread one and consists of multiple convolutional layers and pooling layers. It finally generates a heat map H of the head position.
  • the heat map size (dimension) is w*h and the number of channels is 1.
  • the cross entropy loss of the heat map H generated by the auxiliary network and the true value heat map H_GT is calculated point by point to train the auxiliary network.
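  • A minimal PyTorch sketch of this training signal is shown below, assuming the common binary formulation: the auxiliary network's raw heat map output is compared point by point against the {0, 1} ground-truth map H_GT. The sigmoid/binary-cross-entropy choice and the (N, 1, h, w) tensor layout are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def auxiliary_loss(pred_logits: torch.Tensor, gt_heatmap: torch.Tensor) -> torch.Tensor:
    """Point-by-point cross entropy between the predicted heat map H and the
    ground-truth heat map H_GT (head centers and their four neighbors are 1,
    everything else 0). Both tensors are assumed to be shaped (N, 1, h, w)."""
    # binary_cross_entropy_with_logits applies a sigmoid and averages the
    # per-pixel cross entropy, i.e. a point-by-point comparison of H and H_GT.
    return F.binary_cross_entropy_with_logits(pred_logits, gt_heatmap)
```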
  • the backbone network (the second neural network) is trained.
  • the backbone network is also composed of multiple convolutional and pooling layers, which is responsible for extracting image features.
  • the feature map F is extracted.
  • the size of the feature map is consistent with the heat map, which is w*h, and the number of channels is C.
  • the heat map and the feature map on each channel are multiplied point by point to generate a weighted feature map F1.
  • the size of F1 is w*h, and the number of channels is C.
  • the weighted feature map F1 is input into the detection network (third neural network) part, and the detection network is trained. During back propagation, only the parameters of the detection network and the backbone network are updated. After a large number of candidate boxes are obtained from the detection network, the coordinates of all points with scores greater than the threshold T are taken in the heat map generated by the auxiliary network, and the detection box with the largest score at each such coordinate position and its four neighborhoods is selected from the many candidate boxes as the final detection result. This replaces the non-maximum suppression (NMS) process of traditional detection algorithms and speeds up post-processing.
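  • The sketch below illustrates this NMS replacement under stated assumptions: candidate boxes arrive with confidence scores and with the heat-map cell of each box center precomputed, the four-neighborhood peak filtering follows the rule described later (if a point and its neighbors all exceed the threshold, only the largest is kept), and all names and array layouts are hypothetical.

```python
import numpy as np

def select_final_boxes(heatmap: np.ndarray, boxes: np.ndarray,
                       scores: np.ndarray, centers: np.ndarray,
                       thresh: float) -> np.ndarray:
    """NMS replacement: for every heat-map peak above `thresh`, keep the one
    candidate box with the highest score whose center cell lies on the peak
    or one of its four neighbors.

    heatmap: (h, w)  scores from the auxiliary network
    boxes:   (M, 4)  candidate boxes from the detection network
    scores:  (M,)    candidate confidence scores
    centers: (M, 2)  integer (row, col) heat-map cell of each box center
    """
    h, w = heatmap.shape
    nbrs = ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1))
    results = []
    for y, x in zip(*np.nonzero(heatmap > thresh)):
        # If a point and its four neighbors all exceed the threshold,
        # only the largest of them survives as a peak.
        vals = [heatmap[y + dy, x + dx] for dy, dx in nbrs
                if 0 <= y + dy < h and 0 <= x + dx < w]
        if heatmap[y, x] < max(vals):
            continue
        cells = {(y + dy, x + dx) for dy, dx in nbrs}
        mask = np.array([(cy, cx) in cells for cy, cx in centers])
        if mask.any():
            # The box with the largest score at this coordinate and its
            # four-neighborhood becomes the final detection result.
            results.append(boxes[np.argmax(np.where(mask, scores, -np.inf))])
    return np.array(results)
```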
  • the weight files of the auxiliary network, backbone network and detection network are first converted into the format required by the corresponding chip manufacturer.
  • the parameter precision can be appropriately reduced to improve the inference speed.
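  • As a hedged illustration of this conversion step, the snippet below exports a placeholder network to ONNX as a vendor-neutral intermediate format. The file name and the placeholder model are assumptions, and the actual precision reduction (e.g. FP32 to FP16 or INT8) would be performed by the target chip vendor's own toolchain, which the description leaves vendor-specific.

```python
import torch

# Placeholder standing in for a trained backbone/detection network (assumption).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
).eval()

dummy = torch.randn(1, 3, 480, 640)  # assumed input resolution
# Export a vendor-neutral weight file; a chip vendor's converter would then
# ingest it and emit a chip-specific engine, optionally at reduced precision.
torch.onnx.export(model, dummy, "backbone.onnx", opset_version=13)
```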
  • when the algorithm is deployed, two threads are enabled.
  • the auxiliary network is deployed in thread 1 (the first thread), and the backbone network and detection network are deployed in thread 2 (the second thread).
  • when thread 1 runs, the auxiliary network shares with thread 2 the heat map and the coordinates of the position points whose responses (scores) are greater than the threshold.
  • when the backbone network calculation in thread 2 is completed, the shared information is used for the subsequent calculations.
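  • A minimal sketch of this two-thread arrangement is given below using Python's `threading` module, with a one-slot queue as the sharing mechanism. The lambda "networks", the threshold, and all sizes are placeholders; a real deployment would call the vendor runtime's inference API instead.

```python
import queue
import threading

import numpy as np

# Placeholder "networks" standing in for the deployed models (assumptions).
aux_net = lambda img: np.random.rand(60, 80)             # heat map H, w*h = 80*60
backbone = lambda img: np.random.rand(16, 60, 80)        # feature map F, C = 16
detector = lambda fused: [("box", float(fused.mean()))]  # candidate boxes

shared = queue.Queue(maxsize=1)  # thread 1 -> thread 2 hand-off

def run_thread_one(frame, thresh=0.9):
    """Thread 1: auxiliary network; shares the heat map and the coordinates
    of the position points whose responses exceed the threshold."""
    heatmap = aux_net(frame)
    peaks = list(zip(*np.nonzero(heatmap > thresh)))
    shared.put((heatmap, peaks))

def run_thread_two(frame, out):
    """Thread 2: backbone + detection network; the backbone runs in parallel
    with thread 1, then blocks until the shared heat map arrives."""
    features = backbone(frame)
    heatmap, peaks = shared.get()      # wait for thread 1's shared info
    fused = heatmap[None] * features   # point-wise weighting of each channel
    out["detections"], out["peaks"] = detector(fused), peaks

frame, out = np.zeros((480, 640, 3)), {}
t1 = threading.Thread(target=run_thread_one, args=(frame,))
t2 = threading.Thread(target=run_thread_two, args=(frame, out))
t1.start(); t2.start(); t1.join(); t2.join()
```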
  • the present invention makes full use of the fact that, in the same monitoring scene, the shapes of objects such as human heads and helmets are relatively fixed and their scale changes little. It uses an additional auxiliary network to predict the position of the center of the human head in the image, uses the head center position to assist the training of the backbone network and the detection network, and directly obtains the final detection box from the candidate boxes during post-processing.
  • the present disclosure can significantly improve the accuracy and recall rate of small objects in images, and eliminates the need for time-consuming non-maximum suppression during post-processing, thereby increasing the speed of the algorithm.
  • the auxiliary network module disclosed in the present invention is decoupled from the backbone network and detection network modules during the inference process.
  • the two modules can be placed in different processes or threads for parallel processing during deployment, and can be run offline in real time on embedded devices of any manufacturer.
  • the numbers of auxiliary networks, backbone networks, and detection networks in FIG. 1 are merely illustrative, and any number of auxiliary networks, backbone networks, and detection networks may be provided according to actual needs, which is not limited in the embodiments of the present disclosure.
  • an embodiment of the present disclosure provides a small-size head detection method based on an attention mechanism, which can be executed by any electronic device with computing and processing capabilities.
  • FIG2 shows a flow chart of a detection method in an embodiment of the present disclosure.
  • the detection method provided in an embodiment of the present disclosure includes the following steps S202 to S210.
  • step S202 the target image is input into a pre-trained first neural network, and a heat map is output, wherein the heat map reflects the position of the human head.
  • the target image may be an image of a head wearing a helmet captured in a head detection scenario, for example, an image taken in a construction-site safety-helmet scene.
  • the first neural network can be a neural network model, i.e., a mathematical model described on the basis of the mathematical modeling of neurons.
  • the heat map can be a diagram that displays data, such as the areas of a page that visitors focus on, in a specially highlighted form; here, for example, it highlights the region of the image where a human head wearing a safety helmet is located.
  • step S204 the target image is input into a pre-trained second neural network, and a feature map is output.
  • the second neural network model can be a neural network model.
  • the feature map can be a two-dimensional image; in a neural network, the data exists in three-dimensional form as a stack of many two-dimensional images, each of which is called a feature map.
  • step S206 the heat map is fused with the feature map features to determine a weighted feature map.
  • the above-mentioned feature fusion can be an optimized combination of different feature vectors extracted from the same modality. For example, during feature fusion, the heat map is multiplied point by point with the feature map on each channel to generate a weighted feature map (equivalent to the above-mentioned weighted feature map).
  • step S208 the weighted feature map is input into a pre-trained third neural network, and a plurality of detection boxes are output.
  • the third neural network model may be a neural network model.
  • the detection box may be a recognized head area, for example, an axis-aligned rectangular box.
  • the center point of the detection box may be at the center of the human head region.
  • step S210 based on the coordinates in the heat map whose scores are greater than a preset threshold, the detection frame with the largest coordinate score among multiple detection frames is determined as the detection result.
  • scores may be scores at a point and its four neighborhoods.
  • the coordinates of all points with scores greater than a threshold T are taken in the generated heat map, and the detection box with the largest score at the coordinate position and its four neighborhoods is selected from among the numerous candidate boxes as the final detection result.
  • the detection method provided in the embodiment of the present disclosure inputs the target image into a pre-trained first neural network and outputs a heat map, wherein the heat map reflects the location of the human head; inputs the target image into a second neural network and outputs a feature map; fuses the heat map with the feature map to determine a weighted feature map; inputs the weighted feature map into a pre-trained third neural network and outputs multiple detection boxes; and, according to the coordinates in the heat map whose scores are greater than a threshold, determines the detection box with the largest coordinate score among the multiple detection boxes as the detection result.
  • since the heat map of the first neural network predicts the location of the human head and is fused with the feature map of the second neural network to implement the attention mechanism, a more accurate head bounding box can be obtained, thereby helping to improve the accuracy of small-sized head detection.
  • the present disclosure transforms ordinary rectangular annotations into head-position heat maps and uses them to train the first neural network; the first neural network is used to predict the position of the head and is fused with the feature map of the second neural network to realize the attention mechanism, assisting the training of the second neural network and the third neural network; and the first neural network is used to directly obtain the final detection box from the candidate boxes, replacing the non-maximum suppression process and speeding up post-processing.
  • the present disclosure predicts the position of the head through the first neural network heat map, and integrates it with the second neural network feature map to realize the attention mechanism, which can obtain a more accurate head bounding box, thereby helping to improve the accuracy of small-sized head detection.
  • the detection method provided in the embodiments of the present disclosure can train a first neural network through the following steps to accurately predict the center position of a human head in an image.
  • step S302 a target image (first sample image) is acquired.
  • step S304 a rectangular frame is marked on the head in the target image (first sample image) to determine the marked frame, wherein the marked frame includes coordinate information and the center of the marked frame corresponds to the center of the head area.
  • step S306 a heat map (corresponding to the first sample image) is determined according to the output after the target image (first sample image) is input into the first neural network.
  • step S308 a true value heat map (corresponding to the first sample image) is determined according to the center point marked by the rectangular box.
  • step S310 the first neural network is trained according to the heat map and the true value heat map.
  • the data processing part of the target image includes the acquisition, annotation process and the true value heat map generation process.
  • in the acquisition and annotation process, pictures (equivalent to the above target image (first sample image)) are collected on site or from the network to make a data set, and all head areas therein are annotated with rectangular boxes; when annotating, it is necessary to ensure that the center point of each rectangular box is at the center of the head area. The annotated coordinates are directly applied to the training of the backbone network (equivalent to the above second neural network) and the detection network (equivalent to the above third neural network). In addition, the rectangular-box annotations are used to generate the true value heat map: first the original image and the corresponding annotation boxes are scaled, where the scaled size must be the same as that of the heat map generated by the auxiliary network (equivalent to the above first neural network); then the coordinates of the center points of the scaled annotation boxes are taken as P1, P2, P3, etc., the pixel values of each center point and its four neighbors are set to 1, and the pixel values of the remaining positions are set to 0, generating the true value heat map H_GT.
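  • A minimal sketch of this true value heat map generation is shown below, assuming axis-aligned (x1, y1, x2, y2) annotations and the 80*60 heat-map grid used as an example later in the description; the function and argument names are illustrative only.

```python
import numpy as np

def make_gt_heatmap(boxes, img_w, img_h, map_w=80, map_h=60):
    """Build the true value heat map H_GT from rectangular head annotations.

    boxes: iterable of (x1, y1, x2, y2) annotation boxes in original image
    coordinates. The image is scaled to the auxiliary network's heat-map size
    (map_w x map_h); each scaled box center P1, P2, P3, ... and its four
    neighbors are set to 1, and all remaining positions are set to 0.
    """
    gt = np.zeros((map_h, map_w), dtype=np.float32)
    sx, sy = map_w / img_w, map_h / img_h
    for x1, y1, x2, y2 in boxes:
        cx = int((x1 + x2) / 2 * sx)  # scaled center column
        cy = int((y1 + y2) / 2 * sy)  # scaled center row
        for dy, dx in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)):
            y, x = cy + dy, cx + dx
            if 0 <= y < map_h and 0 <= x < map_w:
                gt[y, x] = 1.0
    # H_GT can be saved as a single-channel binary image or as coordinates.
    return gt
```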
  • the detection method provided in the embodiments of the present disclosure can output a feature map through the following steps, and can accurately determine the size of the feature map.
  • step S402 the target image is input into the second neural network, and a feature map having the same size as the heat map is output.
  • in the detection method provided in the embodiments of the present disclosure, the third neural network can be trained through the following steps, so as to accurately calculate the weighted feature map of the target image.
  • step S502 the feature map and the heat map are fused by weighting to determine a weighted feature map.
  • step S504 a third neural network is trained according to the weighted feature map.
  • the second sample image is input into a pre-trained first neural network, and a heat map corresponding to the second sample image is output;
  • the second sample image is input into a second neural network, and a feature map corresponding to the second sample image is output;
  • the heat map and feature map corresponding to the second sample image are feature fused to determine a weighted feature map corresponding to the second sample image;
  • the third neural network is trained based on the weighted feature map corresponding to the second sample image.
  • the first sample image and the second sample image may be the same or different.
  • feature fusion is performed on the feature map and the heat map by weighting, and determining the weighted feature map includes: point-by-point multiplication of the heat map with the feature map on each channel to obtain the result corresponding to each channel, and weighted summing the results corresponding to each channel according to the weight corresponding to the feature map on each channel to determine the weighted feature map, wherein the heat map includes one channel and the feature map includes at least one channel.
  • for example, the number of channels of the heat map H is 1 and the number of channels of the feature map is C, where C = 3 with channels C1, C2, and C3.
  • the heat map H is multiplied with the feature maps on channels C1, C2, and C3 and weighted to determine the weighted feature map, wherein the weights of the feature maps on C1, C2, and C3 can be allocated according to actual conditions.
  • the target detection method provided in the embodiments of the present disclosure further includes: updating the parameters of the second neural network and the third neural network according to a back propagation algorithm.
  • a second sample image is input into a pre-trained first neural network, and a heat map corresponding to the second sample image is output; the second sample image is input into a second neural network, and a feature map corresponding to the second sample image is output; the heat map and feature map corresponding to the second sample image are feature-fused to determine a weighted feature map corresponding to the second sample image; the weighted feature map corresponding to the second sample image is input into the third neural network, and multiple detection boxes corresponding to the second sample image are output; based on the coordinates in the heat map corresponding to the second sample image whose scores are greater than a threshold, the detection box with the largest coordinate score among the multiple detection boxes corresponding to the second sample image is determined as the detection result corresponding to the second sample image; and based on the detection result corresponding to the second sample image and the position information of the human head in the second sample image, the parameters of the second neural network and the parameters of the third neural network are updated according to the back propagation algorithm to train the second neural network and the third neural network.
  • the detection method provided in the embodiments of the present disclosure further includes: deploying the first neural network in the first thread, and deploying the second neural network and the third neural network in the second thread.
  • the first neural network and the second neural network disclosed in the present invention are decoupled and can be processed in parallel during deployment, thereby improving processing efficiency.
  • FIG6 shows a schematic diagram of an overall training of a detection method in an embodiment of the present disclosure.
  • the model training process includes the auxiliary network (equivalent to the first neural network) training process, the feature fusion process (fusing the heat map with the feature map), and the training process of the model's backbone network (equivalent to the second neural network) and final detection network (equivalent to the third neural network).
  • the input image 61 (equivalent to the target image) is input into the auxiliary network and the backbone network to obtain the heat map H and the feature map F respectively; the heat map H is then fused with the feature map F to generate the weighted feature map F1 (equivalent to the weighted feature map), which is sent to the detection network.
  • in the inference process of the model, after the network model training is completed, a large number of candidate boxes are obtained at the detection network output, and the final detection box is directly selected from the locations and surroundings of the points (for example, (x0, y0), (x1, y1)) whose heat map scores are greater than the threshold (when selecting points in the heat map, note that if a point and the points in its four neighborhoods are all greater than the threshold, only the largest one is selected).
  • the selection rule is to select the detection box with the highest score in the point and its four neighborhoods as the final result.
  • the above process replaces the non-maximum suppression process, and the threshold used in this process can be adjusted according to actual conditions.
  • FIG. 7 shows a schematic diagram of auxiliary network training of a detection method in an embodiment of the present disclosure.
  • the auxiliary network (the first neural network) is composed entirely of convolution and maximum pooling (i.e., 71 in Figure 7 includes convolution and maximum pooling).
  • the number of network layers (the number of network layers is greater than 5) and the input image size can be set according to actual conditions.
  • the auxiliary network finally generates a heat map H after passing through the fully connected layer 72, and compares it point by point with the true value H_GT of the heat map, and calculates the cross entropy loss point by point.
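  • For concreteness, a hedged PyTorch sketch of such an auxiliary network follows: convolution and max-pooling stages only, closed by a fully connected layer that emits the single-channel heat map. The channel counts, the 480x640 input resolution, and the exact depth are assumptions; the description only requires more than 5 layers and leaves the input size free.

```python
import torch
import torch.nn as nn

class AuxiliaryNet(nn.Module):
    """Sketch of the auxiliary (first) network: conv + max-pool stages (71)
    followed by a fully connected layer (72) producing a w*h heat map."""

    def __init__(self, in_hw=(480, 640), map_hw=(60, 80)):
        super().__init__()
        chans = [3, 16, 32, 64, 64, 64]  # assumed widths; 5 conv/pool stages
        layers = []
        for c_in, c_out in zip(chans, chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, 3, padding=1),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(2)]  # five 2x pools: /32 spatially
        self.features = nn.Sequential(*layers)
        h, w = in_hw[0] // 32, in_hw[1] // 32
        self.map_hw = map_hw
        self.fc = nn.Linear(chans[-1] * h * w, map_hw[0] * map_hw[1])

    def forward(self, x):
        f = self.features(x).flatten(1)
        # The fully connected layer maps the pooled features to the
        # single-channel heat map H of size w*h (here 80*60).
        return self.fc(f).view(-1, 1, *self.map_hw)

logits = AuxiliaryNet()(torch.randn(1, 3, 480, 640))  # -> (1, 1, 60, 80)
```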
  • the scaled size should be the same as the size of the heat map generated by the auxiliary network (for example, the size of the heat map generated by the auxiliary network is 80*60, so the scaled image size here should also be 80*60).
  • the coordinates of the center points of the scaled rectangular boxes are taken as P1, P2, P3, etc.; the pixel values of each center point and its four neighbors are set to 1, and the pixel values of the remaining positions are set to 0, to generate the true value heat map H_GT.
  • the heat map can be saved as a single-channel binary image or in the form of coordinate points for training the auxiliary network.
  • the auxiliary network module and the backbone and detection network modules are trained separately.
  • the auxiliary network is composed entirely of convolution and maximum pooling. The number of network layers and the size of the input image can be freely determined according to the actual situation.
  • the auxiliary network finally generates a heat map after passing through the fully connected layer, and compares it with the true value of the heat map point by point, and calculates the cross entropy loss point by point.
  • the backbone network and the detection network are trained.
  • the backbone network is also composed of multiple convolution and pooling layers. The size of the output feature map of the backbone network should be kept consistent with the auxiliary network. It is responsible for extracting image features.
  • the feature map calculated by the backbone network is multiplied point by point with the heat map on each channel to weight the feature map on each channel, enhancing the response of the head area and suppressing the response of the background.
  • the fused feature map is input into the final detection network part, and the detection network is trained. During back propagation, only the parameters of the detection network and the backbone network are updated.
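  • A hedged sketch of this selective back propagation is given below, assuming PyTorch modules for the three networks: the auxiliary branch runs under `torch.no_grad()` and the optimizer is built over backbone and detection parameters only, so gradients never reach the auxiliary network. `detection_loss` and `targets` are hypothetical stand-ins for whatever box regression/classification loss and labels the detection head uses.

```python
import torch

def train_step(aux_net, backbone, det_net, detection_loss,
               images, targets, optimizer):
    """One training step in which only the backbone and detection network are
    updated; `optimizer` is assumed to have been built over
    list(backbone.parameters()) + list(det_net.parameters()) only."""
    with torch.no_grad():           # freeze the (pre-trained) auxiliary net
        heatmap = aux_net(images).sigmoid()
    features = backbone(images)
    fused = heatmap * features      # point-wise weighting of each channel
    loss = detection_loss(det_net(fused), targets)
    optimizer.zero_grad()
    loss.backward()                 # back propagation touches only the
    optimizer.step()                # backbone and detection parameters
    return loss.item()
```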
  • the input image is first input into the auxiliary network and the backbone network to obtain the heat map and feature map respectively, and then the heat map and feature map are fused and sent to the detection network.
  • after the detection network outputs a large number of candidate boxes, the final detection box is directly selected from the locations and surroundings of the points greater than the threshold in the heat map (when selecting points in the heat map, note that if a point and the points in its four neighborhoods are all greater than the threshold, only the largest one is selected).
  • the selection rule is to select the detection frame with the highest score in the point and its four neighborhoods as the final result.
  • the above process replaces the non-maximum suppression process, and the threshold used in this process can be adjusted according to actual conditions.
  • the weight files of the auxiliary network, backbone network and detection network are first converted into the format required by the corresponding chip manufacturer.
  • each chip and device has a corresponding conversion method, and the parameter precision can be appropriately reduced to improve the inference speed; the algorithm proposed in this method decouples the auxiliary network from the backbone network, so they can be computed in parallel during inference.
  • the algorithm enables two threads when deployed.
  • the auxiliary network is deployed in thread 1 (equivalent to the first thread mentioned above), and the backbone network and detection network are deployed in thread 2 (equivalent to the second thread mentioned above).
  • thread 1 runs, the auxiliary network shares the heat map and the coordinates of the position points with responses greater than the threshold with thread 2.
  • when the backbone network calculation is completed, the information shared with thread 2 is used for the subsequent calculations.
  • the present disclosure also provides a small-size head detection device based on an attention mechanism, as described in the following embodiments. Since the principle by which the device embodiment solves the problem is similar to that of the above method embodiment, the implementation of the device embodiment can refer to the implementation of the above method embodiment, and repeated details are not described again.
  • Figure 8 shows a schematic diagram of a detection device in an embodiment of the present disclosure.
  • the device includes: a first neural network module 81, a second neural network module 82, a feature fusion module 83, a third neural network module 84, a detection result determination module 85, a back propagation module 86 and a network deployment module 87.
  • the first neural network module 81 is used to input the target image into the pre-trained first neural network and output a heat map; a second neural network module 82, used to input the target image into the second neural network and output a feature map; a feature fusion module 83, used to fuse the heat map with the feature map and determine a weighted feature map; a third neural network module 84, used to input the weighted feature map into a pre-trained third neural network and output multiple detection boxes; a detection result determination module 85, used to determine, based on the coordinates in the heat map whose scores are greater than a threshold, the detection box with the largest coordinate score among the multiple detection boxes as the detection result.
  • the first neural network module 81 is also used to: acquire a target image; annotate a human head in the target image with a rectangular frame to determine the annotated frame, wherein the annotated frame includes coordinate information and the center of the annotated frame corresponds to the center of the human head area; determine a heat map based on the output of the target image after inputting the first neural network; determine a true value heat map based on the center point of the rectangular frame annotation; and train the first neural network based on the heat map and the true value heat map.
  • the first neural network module 81 is further used to: obtain a first sample image; annotate a human head in the first sample image with a rectangular box to determine the annotation box, wherein the annotation box includes coordinate information and the center of the annotation box corresponds to the center of the human head area; determine a heat map corresponding to the first sample image according to the output after the first sample image is input into the first neural network; determine a true value heat map corresponding to the first sample image according to the center point of the annotation box; and train the first neural network according to the heat map and the true value heat map corresponding to the first sample image.
  • the second neural network module 82 is used to: input the target image into the second neural network, and output a feature map with the same size as the heat map.
  • the third neural network module 84 is further used to: perform feature fusion on the feature map and the heat map by weighting to determine a weighted feature map; and train the third neural network according to the weighted feature map.
  • the first neural network module 81 is also used to input the second sample image into the pre-trained first neural network, and output the heat map corresponding to the second sample image;
  • the second neural network module 82 is also used to input the second sample image into the second neural network, and output the feature map corresponding to the second sample image;
  • the feature fusion module 83 is also used to perform feature fusion on the heat map and feature map corresponding to the second sample image, and determine the weighted feature map corresponding to the second sample image;
  • the third neural network module 84 is also used to train the third neural network according to the weighted feature map corresponding to the second sample image.
  • the above-mentioned feature fusion module 83 is also used to: multiply the heat map by the feature map on each channel point by point, and perform weighted summation according to the weights corresponding to the feature map on each channel to determine the weighted feature map, wherein the heat map includes one channel and the feature map includes at least one channel.
  • the detection device further includes a back propagation module 86, used to update the parameters of the second neural network and the third neural network according to the back propagation algorithm.
  • the above-mentioned detection device also includes a back propagation module 86
  • the third neural network module is also used to input the weighted feature map corresponding to the second sample image into the third neural network, and output multiple detection boxes corresponding to the second sample image
  • the detection result determination module 85 is also used to determine the detection box with the largest coordinate score in the multiple detection boxes corresponding to the second sample image according to the coordinates with scores greater than a threshold in the heat map corresponding to the second sample image, as the detection result corresponding to the second sample image
  • the back propagation module 86 is used to update the parameters of the second neural network and the parameters of the third neural network according to the back propagation algorithm based on the detection result corresponding to the second sample image and the position information of the human head in the second sample image, so as to train the second neural network and the third neural network.
  • the above-mentioned detection device also includes a network deployment module 87 for: deploying the first neural network in the first thread; and deploying the second neural network and the third neural network in the second thread.
  • the first neural network module, the second neural network module, the feature fusion module, the third neural network module, and the detection result determination module correspond to S202 to S210 in the method embodiment, and the examples and application scenarios implemented by the above modules and the corresponding steps are the same, but are not limited to the contents disclosed in the above method embodiment. It should be noted that the above modules as part of the device can be executed in a computer system such as a set of computer executable instructions.
  • the electronic device 900 according to this embodiment of the present disclosure is described below with reference to Fig. 9.
  • the electronic device 900 shown in Fig. 9 is only an example and should not bring any limitation to the functions and scope of use of the embodiment of the present disclosure.
  • the electronic device 900 is presented in the form of a general computing device.
  • the components of the electronic device 900 may include but are not limited to: at least one processing unit 910, at least one storage unit 920, and a bus 930 connecting different system components (including the storage unit 920 and the processing unit 910).
  • the storage unit stores program codes, which can be executed by the processing unit 910, so that the processing unit 910 executes the steps described in the above “exemplary method” section of this specification according to various exemplary embodiments of the present disclosure.
  • the processing unit 910 may perform the following steps of the above method embodiment: input the target image into a pre-trained first neural network and output a heat map, wherein the heat map reflects the position of the human head; input the target image into the second neural network and output a feature map; fuse the heat map with the feature map to determine a weighted feature map; input the weighted feature map into the pre-trained third neural network and output multiple detection boxes; and, based on the coordinates in the heat map whose scores are greater than the threshold, determine the detection box with the largest coordinate score among the multiple detection boxes as the detection result.
  • the processing unit 910 can execute the following steps of the above method embodiment: obtain a target image; mark the human head in the target image with a rectangular frame to determine the marked frame, wherein the marked frame includes coordinate information and the center of the marked frame corresponds to the center of the head area; determine a heat map based on the output after the target image is input into the first neural network; determine a true value heat map based on the center point of the rectangular frame mark; and train the first neural network based on the heat map and the true value heat map.
  • the processing unit 910 can execute the following steps of the above method embodiment: obtain a first sample image; mark the human head in the first sample image with a rectangular box to determine the annotation box, wherein the annotation box includes coordinate information and the center of the annotation box corresponds to the center of the head area; determine the heat map corresponding to the first sample image according to the output after the first sample image is input into the first neural network; determine the true value heat map corresponding to the first sample image according to the center point of the annotation box; and train the first neural network according to the heat map and the true value heat map corresponding to the first sample image.
  • the processing unit 910 may perform the following steps of the above method embodiment: inputting the target image into the second neural network, and outputting a feature map with the same size as the heat map.
  • the processing unit 910 may execute the following steps of the above method embodiment: perform feature fusion on the feature map and the heat map by weighting to determine a weighted feature map; and train the third neural network according to the weighted feature map.
  • the processing unit 910 can execute the following steps of the above method embodiment: input the second sample image into a pre-trained first neural network, and output a heat map corresponding to the second sample image; input the second sample image into a second neural network, and output a feature map corresponding to the second sample image; perform feature fusion on the heat map and feature map corresponding to the second sample image to determine a weighted feature map corresponding to the second sample image; and train the third neural network based on the weighted feature map corresponding to the second sample image.
  • the processing unit 910 can execute the following steps of the above method embodiment: multiply the heat map with the feature map on each channel point by point, and perform weighted summation according to the weights corresponding to the feature map on each channel to determine the weighted feature map, wherein the heat map includes one channel and the feature map includes at least one channel.
  • the processing unit 910 may execute the following steps of the above method embodiment: updating the parameters of the second neural network and the third neural network according to the back propagation algorithm.
  • the processing unit 910 may perform the following steps of the above method embodiment: input the weighted feature map corresponding to the second sample image into the third neural network, and output a plurality of detection boxes corresponding to the second sample image; according to the coordinates in the heat map corresponding to the second sample image whose scores are greater than a threshold, determine the detection box with the largest coordinate score among the multiple detection boxes corresponding to the second sample image as the detection result corresponding to the second sample image; and, according to the detection result corresponding to the second sample image and the position information of the human head in the second sample image, update the parameters of the second neural network and the parameters of the third neural network via the back propagation algorithm to train the second neural network and the third neural network.
  • the processing unit 910 may execute the following steps of the above method embodiment: the first neural network is deployed in the first thread; the second neural network and the third neural network are deployed in the second thread.
  • the storage unit 920 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 9201 and/or a cache storage unit 9202 , and may further include a read-only storage unit (ROM) 9203 .
  • the storage unit 920 may also include a program/utility 9204 having a set (at least one) of program modules 9205, such program modules 9205 including but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which or some combination may include an implementation of a network environment.
  • Bus 930 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
  • the electronic device 900 may also communicate with one or more external devices 940 (e.g., keyboards, pointing devices, Bluetooth devices, etc.), may also communicate with one or more devices that enable a user to interact with the electronic device 900, and/or may communicate with any device that enables the electronic device 900 to communicate with one or more other computing devices (e.g., routers, modems, etc.). Such communication may be performed via an input/output (I/O) interface 950.
  • the electronic device 900 may also communicate with one or more networks (e.g., local area networks (LANs), wide area networks (WANs), and/or public networks, such as the Internet) via a network adapter 960.
  • the network adapter 960 communicates with other modules of the electronic device 900 via a bus 930. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 900, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, etc.
  • the technical solution according to the implementations of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a mobile hard disk, etc.) or on a network, and includes a number of instructions to enable a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
  • the process described above with reference to the flowchart may be implemented as a computer program product, which includes: a computer program, which implements the above detection method when executed by a processor.
  • a computer-readable storage medium is also provided, which may be a readable signal medium or a readable storage medium.
  • FIG. 10 shows a schematic diagram of a computer-readable storage medium in an embodiment of the present disclosure.
  • a program product capable of implementing the above-mentioned method of the present disclosure is stored on the computer-readable storage medium 1000.
  • various aspects of the present disclosure may also be implemented in the form of a program product, which includes a program code. When the program product is run on a terminal device, the program code is used to enable the terminal device to execute the steps described in the above “Exemplary Method” section of this specification according to various exemplary implementations of the present disclosure.
  • when the program product in the embodiment of the present disclosure is executed by a processor, the following steps are implemented: inputting the target image into a pre-trained first neural network to output a heat map, wherein the heat map reflects the position of the human head; inputting the target image into a second neural network to output a feature map; fusing the heat map with the feature map to determine a weighted feature map; inputting the weighted feature map into a pre-trained third neural network to output multiple detection boxes; and determining, based on the coordinates in the heat map whose scores are greater than a threshold, the detection box with the largest coordinate score among the multiple detection boxes as the detection result.
  • the method implements the following steps: acquiring a target image; marking a human head in the target image with a rectangular frame to determine the marked frame, wherein the marked frame includes coordinate information and the center of the marked frame corresponds to the center of the human head area; determining a heat map based on the output after the target image is input into the first neural network; determining a true value heat map based on the center point of the rectangular frame marking; and training the first neural network based on the heat map and the true value heat map.
  • the program product in the embodiment of the present disclosure implements the following steps when executed by a processor: obtaining a first sample image; marking a human head in the first sample image with a rectangular box to determine the annotation box, wherein the annotation box includes coordinate information and the center of the annotation box corresponds to the center of the head area; determining a heat map corresponding to the first sample image based on the output after the first sample image is input into the first neural network; determining a true value heat map corresponding to the first sample image based on the center point of the annotation box; and training the first neural network based on the heat map and the true value heat map corresponding to the first sample image.
  • the method implements the following steps: inputting the target image into the second neural network, and outputting a feature map with the same size as the heat map.
  • when the program product in the embodiment of the present disclosure is executed by a processor, the following steps are implemented: the feature map and the heat map are fused by weighting to determine a weighted feature map; and the third neural network is trained based on the weighted feature map.
  • the method implements the following steps: inputting the second sample image into a pre-trained first neural network, and outputting a heat map corresponding to the second sample image; inputting the second sample image into a second neural network, and outputting a feature map corresponding to the second sample image; performing feature fusion on the heat map and feature map corresponding to the second sample image, and determining a weighted feature map corresponding to the second sample image; and training the third neural network based on the weighted feature map corresponding to the second sample image.
  • the method implements the following steps: multiplying the heat map with the feature map on each channel point by point, and performing weighted summation according to the weights corresponding to the feature map on each channel to determine the weighted feature map, wherein the heat map includes one channel and the feature map includes at least one channel.
  • the method implements the following steps: updating the parameters of the second neural network and the third neural network according to the back propagation algorithm.
  • the program product in the embodiment of the present disclosure implements the following steps when executed by a processor: inputting the weighted feature map corresponding to the second sample image into the third neural network, and outputting multiple detection boxes corresponding to the second sample image; determining the detection box with the largest coordinate score in the multiple detection boxes corresponding to the second sample image according to the coordinates with scores greater than a threshold in the heat map corresponding to the second sample image, as the detection result corresponding to the second sample image; updating the parameters of the second neural network and the parameters of the third neural network according to the back propagation algorithm, so as to train the second neural network and the third neural network.
  • the program product in the embodiment of the present disclosure when executed by a processor, the following steps are implemented: the first neural network is deployed in the first thread; the second neural network and the third neural network are deployed in the second thread.
  • Computer-readable storage media in the present disclosure may include, but are not limited to, an electrical connection having one or more conductors, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above.
  • a readable signal medium may also be any readable medium other than a readable storage medium, which may send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
  • the program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++, etc., and conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may be executed entirely on the user computing device, partially on the user device, as a separate software package, partially on the user computing device and partially on a remote computing device, or entirely on a remote computing device or server.
  • the remote computing device may be connected to the user computing device through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., using an Internet service provider to connect through the Internet).
  • the technical solution according to the implementation mode of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a mobile hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the implementation mode of the present disclosure.

Abstract

The present disclosure provides a detection method and related device, relating to the field of computer images. The method includes: inputting a target image into a pre-trained first neural network and outputting a heat map, where the heat map reflects the position of a human head; inputting the target image into a second neural network and outputting a feature map; fusing the heat map with the feature map to determine a weighted feature map; inputting the weighted feature map into a pre-trained third neural network and outputting multiple detection boxes; and, according to the coordinates whose scores are greater than a threshold in the heat map, determining the detection box with the largest coordinate score among the multiple detection boxes as the detection result.

Description

Detection method and related device
Cross-Reference to Related Applications
This application is based on, and claims priority to, CN application No. 202211609796.1 filed on December 14, 2022, the disclosure of which is hereby incorporated into this application in its entirety.
Technical Field
The present disclosure relates to the field of computer images, and in particular to a detection method and related device.
Background Art
At present, automated monitoring systems based on video images and algorithms such as object detection are widely used in many areas of daily life. In an object-detection task, a large amount of image data of real scenes must first be collected on site or gathered from the network, and the collected image data is annotated manually; the collected data is then fed into a neural network, which is trained for the task to be performed; finally, the images to be predicted are input into the trained network model, and a series of operations is applied to the model's output to parse the model parameters into visualized detection-box results.
It should be noted that the information disclosed in the background section above is only intended to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art already known to a person of ordinary skill in the art.
Summary
Other features and advantages of the present disclosure will become apparent from the following detailed description, or will be learned in part through practice of the present disclosure.
According to one aspect of the present disclosure, a detection method is provided, including: inputting a target image into a pre-trained first neural network and outputting a heat map, where the heat map reflects the position of a human head; inputting the target image into a second neural network and outputting a feature map; fusing the heat map with the feature map to determine a weighted feature map; inputting the weighted feature map into a pre-trained third neural network and outputting multiple detection boxes, where the center point of each detection box is at the center of a head region; and, according to the coordinates whose scores are greater than a threshold in the heat map, determining the detection box with the largest coordinate score among the multiple detection boxes as the detection result.
In some embodiments, the detection method further includes: acquiring a target image; annotating the human heads in the target image with rectangular boxes to determine annotation boxes, where an annotation box includes coordinate information and its center corresponds to the center of a head region; determining a heat map from the output obtained after the target image is input into the first neural network; determining a ground-truth heat map from the center points of the rectangular-box annotations; and training the first neural network according to the heat map and the ground-truth heat map.
In some embodiments, the detection method further includes: acquiring a first sample image; annotating the human heads in the first sample image with rectangular boxes to determine annotation boxes, where an annotation box includes coordinate information and its center corresponds to the center of a head region; determining the heat map corresponding to the first sample image from the output obtained after the first sample image is input into the first neural network; determining the ground-truth heat map corresponding to the first sample image from the center points of the annotation boxes; and training the first neural network according to the heat map and the ground-truth heat map corresponding to the first sample image.
In some embodiments, inputting the target image into the second neural network and outputting the feature map includes: inputting the target image into the second neural network and outputting a feature map of the same size as the heat map.
In some embodiments, the detection method further includes: fusing the feature map with the heat map by weighting to determine a weighted feature map; and training the third neural network according to the weighted feature map.
In some embodiments, the detection method further includes: inputting a second sample image into the pre-trained first neural network and outputting a heat map corresponding to the second sample image; inputting the second sample image into the second neural network and outputting a feature map corresponding to the second sample image; fusing the heat map and the feature map corresponding to the second sample image to determine a weighted feature map corresponding to the second sample image; and training the third neural network according to the weighted feature map corresponding to the second sample image.
In some embodiments, fusing the feature map with the heat map by weighting to determine the weighted feature map includes: multiplying the heat map pointwise with the feature map on each channel, and performing a weighted sum according to the weight corresponding to the feature map on each channel, to determine the weighted feature map, where the heat map includes one channel and the feature map includes at least one channel.
In some embodiments, the detection method further includes: updating the parameters of the second neural network and the third neural network according to a back-propagation algorithm.
In some embodiments, training the third neural network according to the weighted feature map corresponding to the second sample image includes: inputting the weighted feature map corresponding to the second sample image into the third neural network and outputting multiple detection boxes corresponding to the second sample image; determining, according to the coordinates whose scores are greater than the threshold in the heat map corresponding to the second sample image, the detection box with the largest coordinate score among those boxes as the detection result corresponding to the second sample image; and, according to the detection result corresponding to the second sample image and the position information of the human heads in the second sample image, updating the parameters of the second neural network and of the third neural network with the back-propagation algorithm, so as to train the second and third neural networks.
In some embodiments, the detection method further includes: deploying the first neural network in a first thread; and deploying the second and third neural networks in a second thread.
According to another aspect of the present disclosure, a detection device is also provided, including: a first neural network module configured to input a target image into a pre-trained first neural network and output a heat map, the heat map reflecting the position of a human head; a second neural network module configured to input the target image into a second neural network and output a feature map; a feature fusion module configured to fuse the heat map with the feature map to determine a weighted feature map; a third neural network module configured to input the weighted feature map into a pre-trained third neural network and output multiple detection boxes; and a detection-result determination module configured to determine, according to the coordinates whose scores are greater than a threshold in the heat map, the detection box with the largest coordinate score among the multiple detection boxes as the detection result.
According to yet another aspect of the present disclosure, an electronic device is also provided, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any one of the above detection methods by executing the executable instructions.
According to still another aspect of the present disclosure, a computer-readable storage medium is also provided, on which a computer program is stored; when executed by a processor, the computer program implements any one of the above detection methods.
According to yet another aspect of the present disclosure, a computer program product is also provided, including a computer program that implements any one of the above detection methods when executed by a processor.
According to still another aspect of the present disclosure, a computer program is also provided, including instructions that, when executed by a processor, implement the detection method of any of the above embodiments.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings here are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure. Obviously, the drawings described below are only some embodiments of the present disclosure, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the structure of a detection system in an embodiment of the present disclosure;
FIG. 2 is a flowchart of a detection method in an embodiment of the present disclosure;
FIG. 3 is a flowchart of another detection method in an embodiment of the present disclosure;
FIG. 4 is a flowchart of yet another detection method in an embodiment of the present disclosure;
FIG. 5 is a flowchart of still another detection method in an embodiment of the present disclosure;
FIG. 6 is an overall training schematic of a detection method in an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of auxiliary-network training for a detection method in an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a detection device in an embodiment of the present disclosure;
FIG. 9 is a structural block diagram of a computer device in an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a computer-readable storage medium in an embodiment of the present disclosure.
Detailed Description
Example implementations will now be described more fully with reference to the accompanying drawings. However, example implementations can be embodied in many forms and should not be construed as limited to the examples set forth here; rather, these implementations are provided so that the present disclosure will be more thorough and complete and will fully convey the concepts of the example implementations to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more implementations.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. Identical reference numerals in the drawings denote identical or similar parts, so their repeated description is omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically separate entities; these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The inventors have found that in the prior art, in certain special scenarios where the targets are small, such as the construction-site safety-helmet scenario, commonly used detection algorithms cannot accurately recognize small-sized targets.
The present disclosure provides a detection method and related device that overcome, at least to some extent, the problem in the related art of inaccurate detection of small-sized human heads.
Specific implementations of the embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
FIG. 1 shows a schematic diagram of an exemplary application system architecture to which the detection method in the embodiments of the present disclosure can be applied. As shown in FIG. 1, the system architecture 100 may include an auxiliary network 101, a backbone network 102, and a detection network 103.
The auxiliary network (first neural network) is deployed in thread 1 and consists of multiple convolutional and pooling layers; it ultimately generates a heat map H of head positions, with size w*h and one channel. During training, the cross-entropy loss between the heat map H generated by the auxiliary network and the ground-truth heat map H_GT is computed pointwise to train the auxiliary network.
After the auxiliary network has been trained, training of the backbone network (second neural network) begins. The backbone network likewise consists of multiple convolutional and pooling layers and is responsible for extracting image features, finally producing a feature map F whose size matches that of the heat map, w*h, with C channels. During feature fusion, the heat map is multiplied pointwise with the feature map on each channel to generate a weighted feature map F1 of size w*h with C channels.
Finally, the weighted feature map F1 is fed into the detection network (third neural network), which is then trained; during back-propagation only the parameters of the detection network and the backbone network are updated. After a large number of candidate boxes are obtained through the detection network, the coordinates of all points whose scores exceed the threshold T are taken from the heat map generated by the auxiliary network, and among the many candidate boxes, the detection box with the highest score at each such coordinate and within its 4-neighborhood is selected as the final detection result. This replaces the non-maximum suppression (NMS) step of traditional detection algorithms and speeds up post-processing.
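By way of illustration, this post-processing step can be sketched in Python roughly as follows. This is a minimal editorial sketch, not the patent's reference implementation: the function name, the [x1, y1, x2, y2] box format, the assumption that candidate-box centers are expressed in heat-map coordinates, and the omission of the rule that keeps only the local maximum among adjacent above-threshold points are all assumptions.

    import numpy as np

    def select_boxes(heatmap, boxes, box_scores, T):
        # heatmap: (h, w) head-position scores from the auxiliary network
        # boxes: (N, 4) candidate boxes [x1, y1, x2, y2] in heat-map coordinates
        # box_scores: (N,) confidence of each candidate box
        h, w = heatmap.shape
        cx = np.round((boxes[:, 0] + boxes[:, 2]) / 2).astype(int)  # box-center columns
        cy = np.round((boxes[:, 1] + boxes[:, 3]) / 2).astype(int)  # box-center rows
        results = []
        for y, x in zip(*np.where(heatmap > T)):  # every point scoring above threshold T
            best_idx, best_score = -1, -np.inf
            # the point itself plus its 4-neighborhood
            for py, px in [(y, x), (y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]:
                if 0 <= py < h and 0 <= px < w:
                    hit = np.where((cx == px) & (cy == py))[0]  # boxes centered here
                    if hit.size and box_scores[hit].max() > best_score:
                        best_score = box_scores[hit].max()
                        best_idx = hit[np.argmax(box_scores[hit])]
            if best_idx >= 0:
                results.append(boxes[best_idx])  # highest-scoring box replaces NMS output
        return results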
Before the system runs, the weight files of the auxiliary, backbone, and detection networks are first format-converted into the format required by the corresponding chip vendor, and parameter precision may be appropriately reduced to increase inference speed. At deployment the algorithm starts two threads: the auxiliary network is deployed in thread 1 (the first thread), and the backbone and detection networks in thread 2 (the second thread). After thread 1 finishes running, the auxiliary network shares with thread 2 the heat map and the coordinates of the points whose response (score) exceeds the threshold; after the backbone network finishes its computation, it uses the information shared in thread 2 for subsequent computation.
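A minimal two-thread sketch of this deployment scheme follows. It assumes that aux_net, backbone, and det_net are already-converted inference callables (det_net is assumed to return candidate boxes and their scores), that select_boxes is the routine sketched above, and that img and the 0.5 threshold are placeholders; the standard-library queue merely stands in for whatever sharing mechanism the target runtime provides.

    import queue
    import threading
    import numpy as np

    shared = queue.Queue(maxsize=1)  # channel from thread 1 to thread 2

    def thread1_fn(image, T):
        heatmap = aux_net(image)                   # auxiliary network in thread 1
        peaks = list(zip(*np.where(heatmap > T)))  # points with response above T
        shared.put((heatmap, peaks))               # share heat map and coordinates

    def thread2_fn(image, T, out):
        feature = backbone(image)                  # backbone runs concurrently
        heatmap, peaks = shared.get()              # block until thread 1 is done
        fused = heatmap * feature                  # heat-map-weighted fusion
        boxes, scores = det_net(fused)
        out.append(select_boxes(heatmap, boxes, scores, T))

    out = []
    t1 = threading.Thread(target=thread1_fn, args=(img, 0.5))
    t2 = threading.Thread(target=thread2_fn, args=(img, 0.5, out))
    t1.start(); t2.start(); t1.join(); t2.join()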
The present disclosure makes full use of the fact that objects such as human heads and safety helmets have relatively fixed shapes and small scale variation within the same surveillance scene. An additional auxiliary network predicts the positions of head centers in the image; these head-center positions assist the training of the backbone and detection networks, and in post-processing the final detection boxes are obtained directly from the candidate boxes.
The present disclosure can greatly improve the precision and recall of small targets in images, and no time-consuming non-maximum suppression is needed during post-processing, increasing the speed of the algorithm.
The auxiliary network module of the present disclosure is decoupled from the backbone and detection network modules during inference; at deployment the two modules can be placed in different processes or threads for parallel processing, and can run in real time and offline on embedded devices from any vendor.
A person skilled in the art will appreciate that the numbers of auxiliary, backbone, and detection networks in FIG. 1 are merely illustrative; there may be any number of auxiliary, backbone, and detection networks according to actual needs. The embodiments of the present disclosure do not limit this.
Under the above system architecture, the embodiments of the present disclosure provide an attention-mechanism-based method for detecting small-sized human heads, which can be executed by any electronic device with computing and processing capabilities.
FIG. 2 shows a flowchart of a detection method in an embodiment of the present disclosure. As shown in FIG. 2, the detection method provided in the embodiments of the present disclosure includes the following steps S202 to S210.
In step S202, a target image is input into a pre-trained first neural network, and a heat map is output, where the heat map reflects the position of a human head.
It should be noted that the target image may be an image of a helmeted human head captured in a head-detection scenario, for example an image captured at a construction site where safety helmets are worn. The first neural network may be a neural network model, that is, a mathematical model described on the basis of mathematical models of neurons. The heat map may be an illustration that displays regions of interest in specially highlighted form (such as the page regions that visitors favor and the geographic regions where visitors are located); here, for example, the image regions of helmeted human heads are highlighted.
In step S204, the target image is input into a pre-trained second neural network, and a feature map is output.
It should be noted that the second neural network may be a neural network model. The feature map may be a two-dimensional image: in each convolutional layer the data exist in three-dimensional form, and this three-dimensional volume can be regarded as many two-dimensional images stacked together, each of which is called a feature map.
In step S206, the heat map is fused with the feature map to determine a weighted feature map.
It should be noted that feature fusion may be the optimized combination of different feature vectors extracted from the same pattern; for example, during feature fusion the heat map is multiplied pointwise with the feature map on each channel to generate a weighted feature map (equivalent to the aforementioned weighted feature map).
In step S208, the weighted feature map is input into a pre-trained third neural network, and multiple detection boxes are output.
It should be noted that the third neural network may be a neural network model. A detection box may be a recognized head region, for example an axis-aligned rectangular box.
In some instances, the center point of a detection box may be at the center of the head region.
In step S210, according to the coordinates whose scores are greater than a preset threshold in the heat map, the detection box with the largest coordinate score among the multiple detection boxes is determined as the detection result.
It should be noted that the score may be the score at a point and within its 4-neighborhood.
For example, the coordinates of all points whose scores exceed the threshold T are taken from the generated heat map, and among the many candidate boxes, the box with the highest score at each such coordinate and within its 4-neighborhood is selected as the final detection result.
In the detection method provided in the embodiments of the present disclosure, a target image is input into a pre-trained first neural network and a heat map is output, the heat map reflecting the position of a human head; the target image is input into a second neural network and a feature map is output; the heat map is fused with the feature map to determine a weighted feature map; the weighted feature map is input into a pre-trained third neural network and multiple detection boxes are output; and, according to the coordinates whose scores exceed a threshold in the heat map, the detection box with the largest coordinate score among the multiple detection boxes is determined as the detection result. In the embodiments of the present disclosure, because the heat map of the first neural network predicts the position of the head and is fused with the feature map of the second neural network to implement an attention mechanism, a more accurate head bounding box can be obtained, which helps to improve the accuracy of small-sized head detection.
In the head-detection scenario, the present disclosure converts ordinary rectangular annotations to generate head-position heat maps and trains the first neural network with them; the first neural network is used to predict the positions of heads, and its output is fused with the feature map of the second neural network to implement an attention mechanism that assists the training of the second and third neural networks. The first neural network is also used to obtain the final box directly from the candidate boxes, replacing non-maximum suppression and speeding up post-processing. By predicting head positions with the heat map of the first neural network and fusing it with the feature map of the second neural network to implement an attention mechanism, the present disclosure can obtain more accurate head bounding boxes, which helps improve the accuracy of small-sized head detection.
In some embodiments of the present disclosure, as shown in FIG. 3, the detection method provided in the embodiments of the present disclosure can train the first neural network through the following steps, enabling it to accurately predict the center positions of the heads in an image.
In step S302, a target image (first sample image) is acquired.
In step S304, the human heads in the target image (first sample image) are annotated with rectangular boxes to determine annotation boxes, where an annotation box includes coordinate information and its center corresponds to the center of a head region.
In step S306, a heat map (corresponding to the first sample image) is determined from the output obtained after the target image (first sample image) is input into the first neural network.
In step S308, a ground-truth heat map (corresponding to the first sample image) is determined from the center points of the rectangular-box annotations.
In step S310, the first neural network is trained according to the heat map and the ground-truth heat map (corresponding to the first sample image).
In some specific instances, the data processing for the target image (first sample image) includes a collection and annotation process and a ground-truth heat-map generation process. In the collection and annotation process, images collected on site and gathered from the network (equivalent to the aforementioned target image (first sample image)) are made into a dataset, and all head regions in them are annotated with rectangular boxes; during annotation it must be ensured that the center point of each rectangular box is at the center of the head region, and the annotated coordinates are applied directly to the training of the backbone network (equivalent to the aforementioned second neural network) and the detection network (equivalent to the aforementioned third neural network). In addition, the rectangular-box annotations are used to generate the ground-truth heat map: the original image and the corresponding annotation boxes are first scaled, the scaled size being kept identical to the size of the heat map generated by the auxiliary network (equivalent to the aforementioned first neural network); then the center-point coordinates of the scaled annotation boxes are taken as P1, P2, P3, and so on, the pixel values at each center point and within its 4-neighborhood are set to 1, and the pixel values at all other positions are set to 0, generating the ground-truth heat map H_GT.
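A sketch of this ground-truth construction follows; the box format and scale factors are editorial assumptions, while the center-plus-4-neighborhood rule is taken directly from the description above.

    import numpy as np

    def make_gt_heatmap(boxes, img_w, img_h, map_w, map_h):
        # boxes: annotated rectangles [x1, y1, x2, y2] in original-image coordinates
        H_GT = np.zeros((map_h, map_w), dtype=np.float32)
        sx, sy = map_w / img_w, map_h / img_h        # scale annotations to the heat-map size
        for x1, y1, x2, y2 in boxes:
            cx = int(round((x1 + x2) / 2 * sx))      # scaled center point, e.g. P1, P2, P3
            cy = int(round((y1 + y2) / 2 * sy))
            for dy, dx in [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]:
                py, px = cy + dy, cx + dx
                if 0 <= py < map_h and 0 <= px < map_w:
                    H_GT[py, px] = 1.0               # center and 4-neighborhood set to 1
        return H_GT                                  # all other positions remain 0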
In some embodiments of the present disclosure, as shown in FIG. 4, the detection method provided in the embodiments of the present disclosure can output the feature map through the following step, so that the feature-map size is determined accurately.
In step S402, the target image is input into the second neural network, and a feature map with the same size as the heat map is output.
In some embodiments of the present disclosure, as shown in FIG. 5, the detection method provided in the embodiments of the present disclosure can train the third neural network through the following steps, so that the weighted feature map of the target image is computed accurately.
In step S502, the feature map is fused with the heat map by weighting to determine a weighted feature map.
In step S504, the third neural network is trained according to the weighted feature map.
In some embodiments, a second sample image is input into the pre-trained first neural network, and a heat map corresponding to the second sample image is output; the second sample image is input into the second neural network, and a feature map corresponding to the second sample image is output; the heat map and feature map corresponding to the second sample image are fused to determine a weighted feature map corresponding to the second sample image; and the third neural network is trained according to the weighted feature map corresponding to the second sample image.
The first sample image and the second sample image may be the same or different.
In some specific instances, fusing the feature map with the heat map by weighting to determine the weighted feature map includes: multiplying the heat map pointwise with the feature map on each channel to obtain a per-channel result, and performing a weighted sum of the per-channel results according to the weight corresponding to the feature map on each channel, to determine the weighted feature map, where the heat map includes one channel and the feature map includes at least one channel.
For example, the heat map H has one channel and the feature map has C channels, C comprising the three channels C1, C2, and C3. During feature fusion, the heat map H is combined by weighting with feature maps C1, C2, and C3 respectively to determine the weighted feature map; the weights for feature maps C1, C2, and C3 may be assigned according to the actual situation.
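In PyTorch-style pseudocode, this fusion amounts to a broadcast multiply followed by per-channel scaling. The following is a minimal sketch in which the channel weights are assumed to be manually assigned, since the description leaves their assignment to the actual situation.

    import torch

    def fuse(heatmap, feature, channel_weights):
        # heatmap: (B, 1, h, w); feature: (B, C, h, w); channel_weights: (C,)
        weighted = heatmap * feature                         # pointwise multiply per channel
        return weighted * channel_weights.view(1, -1, 1, 1)  # weight each channel's map

    # e.g. C = 3 channels C1, C2, C3 with assumed weights
    H = torch.rand(1, 1, 60, 80)
    F = torch.rand(1, 3, 60, 80)
    F1 = fuse(H, F, torch.tensor([0.5, 0.3, 0.2]))           # F1 stays w*h with C channels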
In some instances, the detection method provided in the embodiments of the present disclosure further includes: updating the parameters of the second and third neural networks according to a back-propagation algorithm.
In some embodiments, a second sample image is input into the pre-trained first neural network, and a heat map corresponding to the second sample image is output; the second sample image is input into the second neural network, and a feature map corresponding to the second sample image is output; the heat map and feature map corresponding to the second sample image are fused to determine a weighted feature map corresponding to the second sample image; the weighted feature map corresponding to the second sample image is input into the third neural network, and multiple detection boxes corresponding to the second sample image are output; according to the coordinates whose scores are greater than the threshold in the heat map corresponding to the second sample image, the detection box with the largest coordinate score among those boxes is determined as the detection result corresponding to the second sample image; and, according to the detection result corresponding to the second sample image and the position information of the human heads in the second sample image, the parameters of the second neural network and of the third neural network are updated with the back-propagation algorithm, so as to train the second and third neural networks.
In some instances, the detection method provided in the embodiments of the present disclosure further includes: deploying the first neural network in a first thread, and deploying the second and third neural networks in a second thread.
In the present disclosure the first neural network and the second neural network are decoupled, so they can be processed in parallel at deployment, improving processing timeliness.
FIG. 6 shows an overall training schematic of a detection method in an embodiment of the present disclosure.
As shown in FIG. 6, the model training process includes the training of the auxiliary network (equivalent to the aforementioned first neural network), the feature-fusion process (fusing the heat map with the feature map), and the simultaneous training of the model's backbone network (equivalent to the aforementioned second neural network) and final detection network (equivalent to the aforementioned third neural network).
During training, the input image 61 (equivalent to the aforementioned target image) is first fed into the auxiliary network and the backbone network to obtain the heat map H and the feature map F respectively; the heat map H is then fused with the feature map F to generate the weighted feature map F1 (equivalent to the aforementioned weighted feature map), which is fed into the detection network.
The model's inference process is as follows: after the network model has been trained, the detection network outputs a large number of candidate boxes, and the final detection boxes are selected directly at and around the positions of the points in the heat map that exceed the threshold (for example, (x0, y0) and (x1, y1)); when selecting points in the heat map, note that if a point and the points in its 4-neighborhood all exceed the threshold, only the maximum is selected. The selection rule is to take, as the final result, the detection box with the highest score at that point and within its 4-neighborhood. This process thus replaces non-maximum suppression, and the threshold it uses can be adjusted according to the actual situation.
FIG. 7 shows a schematic diagram of auxiliary-network training for a detection method in an embodiment of the present disclosure.
As shown in FIG. 7, the auxiliary network (first neural network) consists entirely of convolution and max pooling (that is, block 71 in FIG. 7 comprises convolution and max pooling); the number of network layers (more than 5) and the input image size can be set according to the actual situation. The auxiliary network finally passes through a fully connected layer 72 to generate the heat map H, which is compared pointwise with the ground-truth heat map H_GT, computing the cross-entropy loss pointwise.
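A minimal sketch of such an auxiliary network follows. The channel widths, the three convolution blocks (abbreviated here, whereas the text calls for more than five layers), the 80*60 map size, and the sigmoid output are editorial placeholders; the conv-plus-max-pool composition, the final fully connected layer, and the pointwise cross-entropy loss follow the description.

    import torch
    import torch.nn as nn

    class AuxNet(nn.Module):
        def __init__(self, map_w=80, map_h=60):
            super().__init__()
            self.features = nn.Sequential(             # convolution and max pooling only
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d((map_h // 4, map_w // 4)),
            )
            self.fc = nn.Linear(64 * (map_h // 4) * (map_w // 4), map_w * map_h)
            self.map_w, self.map_h = map_w, map_h

        def forward(self, x):
            z = self.features(x).flatten(1)
            H = torch.sigmoid(self.fc(z))                 # per-point scores in [0, 1]
            return H.view(-1, 1, self.map_h, self.map_w)  # single-channel heat map

    # pointwise cross-entropy against the ground-truth heat map H_GT:
    loss_fn = nn.BCELoss()                                # loss = loss_fn(aux_net(imgs), H_GT)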
In some instances, surveillance images of the real scene to be detected are collected first; in general, the more the better, with at least 3,000 images guaranteed. The head regions in the images are then annotated using axis-aligned rectangular boxes. Next, the original images and the corresponding annotations are scaled so that the scaled size matches the size of the heat map generated by the auxiliary network (for example, if the auxiliary network generates an 80*60 heat map, the scaled image should also be 80*60); then the center-point coordinates of the scaled rectangular boxes are taken as P1, P2, P3, and so on, the pixel values at each center point and within its 4-neighborhood are set to 1, and all other positions are set to 0, generating the ground-truth heat map H_GT. The heat map may be saved as a single-channel binary image or as coordinate points, and is used for training the auxiliary network.
Next comes the model-training part, in which the auxiliary-network module and the backbone/detection-network module are trained separately. The auxiliary network consists entirely of convolution and max pooling; the number of layers and the input image size can be chosen freely according to the actual situation. The auxiliary network finally passes through a fully connected layer to generate the heat map, which is compared pointwise with the ground-truth heat map, computing the cross-entropy loss pointwise. After the auxiliary network has been trained, the backbone and detection networks are trained. The backbone network likewise consists of multiple convolutional and pooling layers and is responsible for extracting image features, with the size of its output feature map kept consistent with that of the auxiliary network. The feature map computed by the backbone network is multiplied pointwise with the heat map on every channel to weight each channel's feature map, strengthening the response of the head regions and suppressing the response of the background. Finally, the fused feature map is fed into the final detection network, which is then trained; during back-propagation only the parameters of the detection and backbone networks are updated.
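A hedged sketch of this second training stage follows. The optimizer, learning rate, and the detection_loss placeholder (box regression plus classification against the annotated coordinates) are assumptions; freezing the auxiliary network and updating only the backbone and detection networks follows the description.

    import torch

    aux_net.eval()                                    # auxiliary network already trained
    for p in aux_net.parameters():
        p.requires_grad = False                       # back-propagation skips it

    optimizer = torch.optim.SGD(
        list(backbone.parameters()) + list(det_net.parameters()), lr=1e-3)

    for images, targets in loader:
        with torch.no_grad():
            H = aux_net(images)                       # (B, 1, h, w) heat map
        F = backbone(images)                          # (B, C, h, w) feature map
        F1 = H * F                                    # boost head regions, suppress background
        preds = det_net(F1)
        loss = detection_loss(preds, targets)         # assumed detection loss
        optimizer.zero_grad()
        loss.backward()                               # gradients reach backbone and det_net only
        optimizer.step()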
During model inference, the input image is first fed into the auxiliary network and the backbone network to obtain the heat map and the feature map respectively; the heat map is then fused with the feature map and fed into the detection network. After the detection network outputs a large number of candidate boxes, the final detection boxes are selected directly at and around the positions of the points in the heat map that exceed the threshold (when selecting points in the heat map, note that if a point and the points in its 4-neighborhood all exceed the threshold, only the maximum is selected); the selection rule is to take, as the final result, the detection box with the highest score at that point and within its 4-neighborhood. This process thus replaces non-maximum suppression, and the threshold it uses can be adjusted according to the actual situation.
Finally, at deployment, the weight files of the auxiliary, backbone, and detection networks are first format-converted into the format required by the corresponding chip vendor (each specific chip and device has a corresponding conversion method), and parameter precision may be appropriately reduced to increase inference speed. The algorithm proposed in this method decouples the auxiliary network from the backbone network, so they can be computed in parallel at inference time. At deployment the algorithm starts two threads: the auxiliary network is deployed in thread 1 (equivalent to the aforementioned first thread), and the backbone and detection networks in thread 2 (equivalent to the aforementioned second thread). After thread 1 finishes running, the auxiliary network shares with thread 2 the heat map and the coordinates of the points whose responses exceed the threshold; after the backbone network finishes its computation, it uses the information shared in thread 2 for subsequent computation.
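As one assumed concretization of this conversion step (the description only requires converting to the chip vendor's format), the three sub-networks could first be exported to ONNX and then handed to the vendor toolchain, which may lower parameter precision, for example to FP16, for speed; the input shapes below are placeholders.

    import torch

    img = torch.randn(1, 3, 480, 640)   # assumed camera input resolution
    feat = torch.randn(1, 64, 60, 80)   # assumed fused-feature shape for the detection network
    torch.onnx.export(aux_net.eval(), img, "aux.onnx", opset_version=11)
    torch.onnx.export(backbone.eval(), img, "backbone.onnx", opset_version=11)
    torch.onnx.export(det_net.eval(), feat, "det.onnx", opset_version=11)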
Based on the same inventive concept, the embodiments of the present disclosure also provide an attention-mechanism-based device for detecting small-sized human heads, as described in the following embodiments. Since the principle by which this device embodiment solves the problem is similar to that of the above method embodiments, reference may be made to the implementation of the above method embodiments for the implementation of this device embodiment, and repeated descriptions are omitted.
FIG. 8 shows a schematic diagram of a detection device in an embodiment of the present disclosure. As shown in FIG. 8, the device includes: a first neural network module 81, a second neural network module 82, a feature fusion module 83, a third neural network module 84, a detection-result determination module 85, a back-propagation module 86, and a network deployment module 87.
The first neural network module 81 is configured to input a target image into a pre-trained first neural network and output a heat map; the second neural network module 82 is configured to input the target image into a second neural network and output a feature map; the feature fusion module 83 is configured to fuse the heat map with the feature map to determine a weighted feature map; the third neural network module 84 is configured to input the weighted feature map into a pre-trained third neural network and output multiple detection boxes; the detection-result determination module 85 is configured to determine, according to the coordinates whose scores are greater than a threshold in the heat map, the detection box with the largest coordinate score among the multiple detection boxes as the detection result.
In some embodiments of the present disclosure, the first neural network module 81 is further configured to: acquire a target image; annotate the human heads in the target image with rectangular boxes to determine annotation boxes, where an annotation box includes coordinate information and its center corresponds to the center of a head region; determine a heat map from the output obtained after the target image is input into the first neural network; determine a ground-truth heat map from the center points of the rectangular-box annotations; and train the first neural network according to the heat map and the ground-truth heat map.
In some embodiments, the first neural network module 81 is further configured to: acquire a first sample image; annotate the human heads in the first sample image with rectangular boxes to determine annotation boxes, where an annotation box includes coordinate information and its center corresponds to the center of a head region; determine the heat map corresponding to the first sample image from the output obtained after the first sample image is input into the first neural network; determine the ground-truth heat map corresponding to the first sample image from the center points of the annotation boxes; and train the first neural network according to the heat map and the ground-truth heat map corresponding to the first sample image.
In some embodiments of the present disclosure, the second neural network module 82 is configured to input the target image into the second neural network and output a feature map of the same size as the heat map.
In some embodiments of the present disclosure, the third neural network module 84 is further configured to: fuse the feature map with the heat map by weighting to determine a weighted feature map; and train the third neural network according to the weighted feature map.
In some embodiments, the first neural network module 81 is further configured to input a second sample image into the pre-trained first neural network and output the heat map corresponding to the second sample image; the second neural network module 82 is further configured to input the second sample image into the second neural network and output the feature map corresponding to the second sample image; the feature fusion module 83 is further configured to fuse the heat map and feature map corresponding to the second sample image to determine the weighted feature map corresponding to the second sample image; and the third neural network module 84 is further configured to train the third neural network according to the weighted feature map corresponding to the second sample image. In some embodiments of the present disclosure, the feature fusion module 83 is further configured to multiply the heat map pointwise with the feature map on each channel and perform a weighted sum according to the weight corresponding to the feature map on each channel, to determine the weighted feature map, where the heat map includes one channel and the feature map includes at least one channel.
In some embodiments of the present disclosure, the detection device further includes a back-propagation module 86 configured to update the parameters of the second and third neural networks according to a back-propagation algorithm.
In some embodiments of the present disclosure, the detection device further includes the back-propagation module 86; the third neural network module is further configured to input the weighted feature map corresponding to the second sample image into the third neural network and output multiple detection boxes corresponding to the second sample image; the detection-result determination module 85 is further configured to determine, according to the coordinates whose scores are greater than the threshold in the heat map corresponding to the second sample image, the detection box with the largest coordinate score among those boxes as the detection result corresponding to the second sample image; and the back-propagation module 86 is configured to update, according to the detection result corresponding to the second sample image and the position information of the human heads in the second sample image, the parameters of the second and third neural networks with the back-propagation algorithm, so as to train the second and third neural networks.
In some embodiments of the present disclosure, the detection device further includes a network deployment module 87 configured to deploy the first neural network in a first thread and deploy the second and third neural networks in a second thread.
It should be noted here that the above first neural network module, second neural network module, feature fusion module, third neural network module, and detection-result determination module correspond to steps S202 to S210 in the method embodiments; the examples and application scenarios implemented by these modules and the corresponding steps are the same, but are not limited to what is disclosed in the above method embodiments. It should be noted that, as part of the device, these modules can run in a computer system such as a set of computer-executable instructions.
Those skilled in the art will appreciate that various aspects of the present disclosure may be implemented as a system, method, or program product. Therefore, various aspects of the present disclosure may be embodied in the following forms: an entirely hardware implementation, an entirely software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software, which may collectively be referred to here as a "circuit", "module", or "system".
The electronic device 900 according to this implementation of the present disclosure is described below with reference to FIG. 9. The electronic device 900 shown in FIG. 9 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 9, the electronic device 900 takes the form of a general-purpose computing device. Its components may include, but are not limited to: at least one processing unit 910, at least one storage unit 920, and a bus 930 connecting different system components (including the storage unit 920 and the processing unit 910).
The storage unit stores program code that can be executed by the processing unit 910, causing the processing unit 910 to perform the steps according to various exemplary implementations of the present disclosure described in the "Exemplary Methods" section above of this specification.
For example, the processing unit 910 may perform the following steps of the above method embodiments: inputting a target image into a pre-trained first neural network and outputting a heat map, the heat map reflecting the position of a human head; inputting the target image into a second neural network and outputting a feature map; fusing the heat map with the feature map to determine a weighted feature map; inputting the weighted feature map into a pre-trained third neural network and outputting multiple detection boxes; and determining, according to the coordinates whose scores are greater than a threshold in the heat map, the detection box with the largest coordinate score among the multiple detection boxes as the detection result.
For example, the processing unit 910 may perform the following steps of the above method embodiments: acquiring a target image; annotating the human heads in the target image with rectangular boxes to determine annotation boxes, where an annotation box includes coordinate information and its center corresponds to the center of a head region; determining a heat map from the output obtained after the target image is input into the first neural network; determining a ground-truth heat map from the center points of the rectangular-box annotations; and training the first neural network according to the heat map and the ground-truth heat map.
For example, the processing unit 910 may perform the following steps of the above method embodiments: acquiring a first sample image; annotating the human heads in the first sample image with rectangular boxes to determine annotation boxes, where an annotation box includes coordinate information and its center corresponds to the center of a head region; determining the heat map corresponding to the first sample image from the output obtained after the first sample image is input into the first neural network; determining the ground-truth heat map corresponding to the first sample image from the center points of the annotation boxes; and training the first neural network according to the heat map and the ground-truth heat map corresponding to the first sample image.
For example, the processing unit 910 may perform the following step of the above method embodiments: inputting the target image into the second neural network and outputting a feature map of the same size as the heat map.
For example, the processing unit 910 may perform the following steps of the above method embodiments: fusing the feature map with the heat map by weighting to determine a weighted feature map; and training the third neural network according to the weighted feature map.
For example, the processing unit 910 may perform the following steps of the above method embodiments: inputting a second sample image into the pre-trained first neural network and outputting the heat map corresponding to the second sample image; inputting the second sample image into the second neural network and outputting the feature map corresponding to the second sample image; fusing the heat map and feature map corresponding to the second sample image to determine the weighted feature map corresponding to the second sample image; and training the third neural network according to the weighted feature map corresponding to the second sample image.
For example, the processing unit 910 may perform the following step of the above method embodiments: multiplying the heat map pointwise with the feature map on each channel and performing a weighted sum according to the weight corresponding to the feature map on each channel, to determine the weighted feature map, where the heat map includes one channel and the feature map includes at least one channel.
For example, the processing unit 910 may perform the following step of the above method embodiments: updating the parameters of the second and third neural networks according to a back-propagation algorithm.
For example, the processing unit 910 may perform the following steps of the above method embodiments: inputting the weighted feature map corresponding to the second sample image into the third neural network and outputting multiple detection boxes corresponding to the second sample image; determining, according to the coordinates whose scores are greater than the threshold in the heat map corresponding to the second sample image, the detection box with the largest coordinate score among those boxes as the detection result corresponding to the second sample image; and updating, according to the detection result corresponding to the second sample image and the position information of the human heads in the second sample image, the parameters of the second and third neural networks with the back-propagation algorithm, so as to train the second and third neural networks.
For example, the processing unit 910 may perform the following steps of the above method embodiments: deploying the first neural network in a first thread, and deploying the second and third neural networks in a second thread.
The storage unit 920 may include readable media in the form of volatile storage units, such as a random access storage unit (RAM) 9201 and/or a cache storage unit 9202, and may further include a read-only storage unit (ROM) 9203.
The storage unit 920 may also include a program/utility 9204 having a set of (at least one) program modules 9205, such program modules 9205 including but not limited to: an operating system, one or more applications, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
The bus 930 may represent one or more of several types of bus structures, including a storage-unit bus or storage-unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 900 may also communicate with one or more external devices 940 (such as a keyboard, pointing device, or Bluetooth device), with one or more devices that enable a user to interact with the electronic device 900, and/or with any device (such as a router or modem) that enables the electronic device 900 to communicate with one or more other computing devices. Such communication can occur via input/output (I/O) interfaces 950. Moreover, the electronic device 900 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 960. As shown, the network adapter 960 communicates with the other modules of the electronic device 900 via the bus 930. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 900, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of the implementations, those skilled in the art will readily understand that the example implementations described here may be implemented in software or in software combined with necessary hardware. Therefore, the technical solutions according to the implementations of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, USB flash drive, removable hard disk, etc.) or on a network, and which includes several instructions that cause a computing device (which may be a personal computer, server, terminal apparatus, or network device, etc.) to execute the method according to the implementations of the present disclosure.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer program product including a computer program that, when executed by a processor, implements the above detection method.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided, which may be a readable signal medium or a readable storage medium. FIG. 10 shows a schematic diagram of a computer-readable storage medium in an embodiment of the present disclosure; as shown in FIG. 10, a program product capable of implementing the above methods of the present disclosure is stored on the computer-readable storage medium 1000. In some possible implementations, various aspects of the present disclosure may also be implemented in the form of a program product including program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary implementations of the present disclosure described in the "Exemplary Methods" section above of this specification.
For example, when executed by a processor, the program product in the embodiments of the present disclosure implements a method with the following steps: inputting a target image into a pre-trained first neural network and outputting a heat map, the heat map reflecting the position of a human head; inputting the target image into a second neural network and outputting a feature map; fusing the heat map with the feature map to determine a weighted feature map; inputting the weighted feature map into a pre-trained third neural network and outputting multiple detection boxes; and determining, according to the coordinates whose scores are greater than a threshold in the heat map, the detection box with the largest coordinate score among the multiple detection boxes as the detection result.
For example, when executed by a processor, the program product in the embodiments of the present disclosure implements a method with the following steps: acquiring a target image; annotating the human heads in the target image with rectangular boxes to determine annotation boxes, where an annotation box includes coordinate information and its center corresponds to the center of a head region; determining a heat map from the output obtained after the target image is input into the first neural network; determining a ground-truth heat map from the center points of the rectangular-box annotations; and training the first neural network according to the heat map and the ground-truth heat map.
For example, when executed by a processor, the program product in the embodiments of the present disclosure implements a method with the following steps: acquiring a first sample image; annotating the human heads in the first sample image with rectangular boxes to determine annotation boxes, where an annotation box includes coordinate information and its center corresponds to the center of a head region; determining the heat map corresponding to the first sample image from the output obtained after the first sample image is input into the first neural network; determining the ground-truth heat map corresponding to the first sample image from the center points of the annotation boxes; and training the first neural network according to the heat map and the ground-truth heat map corresponding to the first sample image.
For example, when executed by a processor, the program product in the embodiments of the present disclosure implements a method with the following step: inputting the target image into the second neural network and outputting a feature map of the same size as the heat map.
For example, when executed by a processor, the program product in the embodiments of the present disclosure implements a method with the following steps: fusing the feature map with the heat map by weighting to determine a weighted feature map; and training the third neural network according to the weighted feature map.
For example, when executed by a processor, the program product in the embodiments of the present disclosure implements a method with the following steps: inputting a second sample image into the pre-trained first neural network and outputting the heat map corresponding to the second sample image; inputting the second sample image into the second neural network and outputting the feature map corresponding to the second sample image; fusing the heat map and feature map corresponding to the second sample image to determine the weighted feature map corresponding to the second sample image; and training the third neural network according to the weighted feature map corresponding to the second sample image.
For example, when executed by a processor, the program product in the embodiments of the present disclosure implements a method with the following step: multiplying the heat map pointwise with the feature map on each channel and performing a weighted sum according to the weight corresponding to the feature map on each channel, to determine the weighted feature map, where the heat map includes one channel and the feature map includes at least one channel.
For example, when executed by a processor, the program product in the embodiments of the present disclosure implements a method with the following step: updating the parameters of the second and third neural networks according to a back-propagation algorithm.
For example, when executed by a processor, the program product in the embodiments of the present disclosure implements a method with the following steps: inputting the weighted feature map corresponding to the second sample image into the third neural network and outputting multiple detection boxes corresponding to the second sample image; determining, according to the coordinates whose scores are greater than the threshold in the heat map corresponding to the second sample image, the detection box with the largest coordinate score among those boxes as the detection result corresponding to the second sample image; and updating, according to the detection result corresponding to the second sample image and the position information of the human heads in the second sample image, the parameters of the second and third neural networks with the back-propagation algorithm, so as to train the second and third neural networks.
For example, when executed by a processor, the program product in the embodiments of the present disclosure implements a method with the following steps: deploying the first neural network in a first thread, and deploying the second and third neural networks in a second thread.
More specific examples of the computer-readable storage medium in the present disclosure may include, but are not limited to: an electrical connection having one or more conductors, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the present disclosure, a computer-readable storage medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in conjunction with, an instruction execution system, apparatus, or device.
Optionally, the program code contained on the computer-readable storage medium may be transmitted over any appropriate medium, including but not limited to wireless, wired, optical cable, RF, and the like, or any suitable combination of the foregoing.
In specific implementation, the program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented languages such as Java and C++ as well as conventional procedural languages such as "C" or similar. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
It should be noted that although several modules or units of the device for action execution are mentioned in the detailed description above, this division is not mandatory. Indeed, according to implementations of the present disclosure, the features and functions of two or more modules or units described above may be embodied in a single module or unit; conversely, the features and functions of one module or unit described above may be further divided and embodied by multiple modules or units.
Furthermore, although the steps of the methods in the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all of the illustrated steps must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps, and so on.
From the above description of the implementations, those skilled in the art will readily understand that the example implementations described here may be implemented in software or in software combined with necessary hardware. Therefore, the technical solutions according to the implementations of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, USB flash drive, removable hard disk, etc.) or on a network, and which includes several instructions that cause a computing device (which may be a personal computer, server, mobile terminal, or network device, etc.) to execute the method according to the implementations of the present disclosure.
Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed here. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common general knowledge or customary technical means in the technical field not disclosed by the present disclosure. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the appended claims.

Claims (14)

  1. A detection method, comprising:
    inputting a target image into a pre-trained first neural network, and outputting a heat map, wherein the heat map reflects the position of a human head;
    inputting the target image into a second neural network, and outputting a feature map;
    fusing the heat map with the feature map to determine a weighted feature map;
    inputting the weighted feature map into a pre-trained third neural network, and outputting a plurality of detection boxes;
    determining, according to coordinates whose scores are greater than a threshold in the heat map, the detection box with the largest coordinate score among the plurality of detection boxes as the detection result.
  2. The detection method according to claim 1, further comprising:
    acquiring a target image;
    annotating a human head in the target image with a rectangular box to determine an annotation box, wherein the annotation box includes coordinate information and the center of the annotation box corresponds to the center of the head region;
    determining a heat map according to the output obtained after the target image is input into the first neural network;
    determining a ground-truth heat map according to the center point of the rectangular-box annotation;
    training the first neural network according to the heat map and the ground-truth heat map.
  3. The detection method according to claim 1, further comprising:
    acquiring a first sample image;
    annotating a human head in the first sample image with a rectangular box to determine an annotation box, wherein the annotation box includes coordinate information and the center of the annotation box corresponds to the center of the head region;
    determining a heat map corresponding to the first sample image according to the output obtained after the first sample image is input into the first neural network;
    determining a ground-truth heat map corresponding to the first sample image according to the center point of the annotation box;
    training the first neural network according to the heat map and the ground-truth heat map corresponding to the first sample image.
  4. The detection method according to claim 1 or 2, wherein inputting the target image into the second neural network and outputting the feature map comprises:
    inputting the target image into the second neural network, and outputting a feature map with the same size as the heat map.
  5. The detection method according to claim 1, further comprising:
    fusing the feature map with the heat map by weighting to determine a weighted feature map;
    training a third neural network according to the weighted feature map.
  6. The detection method according to any one of claims 1-5, further comprising:
    inputting a second sample image into the pre-trained first neural network, and outputting a heat map corresponding to the second sample image;
    inputting the second sample image into the second neural network, and outputting a feature map corresponding to the second sample image;
    fusing the heat map and the feature map corresponding to the second sample image to determine a weighted feature map corresponding to the second sample image;
    training the third neural network according to the weighted feature map corresponding to the second sample image.
  7. The detection method according to any one of claims 1-6, wherein fusing the feature map with the heat map by weighting to determine the weighted feature map comprises:
    multiplying the heat map pointwise with the feature map on each channel, and performing a weighted sum according to the weight corresponding to the feature map on each channel, to determine the weighted feature map, wherein the heat map includes one channel and the feature map includes at least one channel.
  8. The detection method according to any one of claims 1-7, further comprising:
    updating parameters of the second neural network and the third neural network according to a back-propagation algorithm.
  9. The detection method according to claim 6, wherein training the third neural network according to the weighted feature map corresponding to the second sample image comprises:
    inputting the weighted feature map corresponding to the second sample image into the third neural network, and outputting a plurality of detection boxes corresponding to the second sample image;
    determining, according to coordinates whose scores are greater than the threshold in the heat map corresponding to the second sample image, the detection box with the largest coordinate score among the plurality of detection boxes corresponding to the second sample image, as the detection result corresponding to the second sample image;
    according to the detection result corresponding to the second sample image and the position information of the human head in the second sample image, updating the parameters of the second neural network and the parameters of the third neural network according to a back-propagation algorithm, so as to train the second neural network and the third neural network.
  10. The detection method according to any one of claims 1-9, further comprising:
    deploying the first neural network in a first thread;
    deploying the second neural network and the third neural network in a second thread.
  11. A detection device, comprising:
    a first neural network module configured to input a target image into a pre-trained first neural network and output a heat map, wherein the heat map reflects the position of a human head;
    a second neural network module configured to input the target image into a second neural network and output a feature map;
    a feature fusion module configured to fuse the heat map with the feature map to determine a weighted feature map;
    a third neural network module configured to input the weighted feature map into a pre-trained third neural network and output a plurality of detection boxes;
    a detection-result determination module configured to determine, according to coordinates whose scores are greater than a threshold in the heat map, the detection box with the largest coordinate score among the plurality of detection boxes as the detection result.
  12. An electronic device, comprising:
    a processor; and
    a memory for storing executable instructions of the processor;
    wherein the processor is configured to perform the detection method of any one of claims 1 to 10 by executing the executable instructions.
  13. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the detection method of any one of claims 1 to 10.
  14. A computer program, comprising instructions that, when executed by a processor, implement the detection method of any one of claims 1 to 10.
PCT/CN2023/138649 2022-12-14 2023-12-14 Detection method and related device WO2024125583A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211609796.1A 2022-12-14 2022-12-14 Attention-mechanism-based small-sized human head detection method and related device
CN202211609796.1 2022-12-14

Publications (1)

Publication Number Publication Date
WO2024125583A1 true WO2024125583A1 (zh) 2024-06-20

Family

ID=85547329

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/138649 2022-12-14 2023-12-14 Detection method and related device

Country Status (2)

Country Link
CN (1) CN115830638A (zh)
WO (1) WO2024125583A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830638A (zh) 2022-12-14 2023-03-21 中国电信股份有限公司 Attention-mechanism-based small-sized human head detection method and related device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200143204A1 (en) * 2018-11-01 2020-05-07 International Business Machines Corporation Image classification using a mask image and neural networks
CN111339934A (zh) * 2020-02-25 2020-06-26 河海大学常州校区 Human head detection method fusing image preprocessing and deep-learning object detection
CN111931764A (zh) * 2020-06-30 2020-11-13 华为技术有限公司 Object detection method, object detection framework, and related device
CN114140683A (zh) * 2020-08-12 2022-03-04 天津大学 Method, device, and medium for object detection in aerial images
CN114067186A (zh) * 2021-09-26 2022-02-18 北京建筑大学 Pedestrian detection method and apparatus, electronic device, and storage medium
CN113920538A (зh) * 2021-10-20 2022-01-11 北京多维视通技术有限公司 Object detection method, apparatus, device, storage medium, and computer program product
CN115830638A (zh) * 2022-12-14 2023-03-21 中国电信股份有限公司 Attention-mechanism-based small-sized human head detection method and related device

Also Published As

Publication number Publication date
CN115830638A (zh) 2023-03-21

Similar Documents

Publication Publication Date Title
CN108345890B (zh) Image processing method, apparatus, and related device
JP7065199B2 (ja) Image processing method and apparatus, electronic device, storage medium, and program product
WO2024125583A1 (zh) Detection method and related device
CN111523413B (zh) Method and apparatus for generating face images
WO2019116354A1 (en) Training of artificial neural networks using safe mutations based on output gradients
CN108648140B (zh) Image stitching method, system, device, and storage medium
CN112200041B (zh) Video action recognition method and apparatus, storage medium, and electronic device
CN114821605B (zh) Text processing method, apparatus, device, and medium
KR102352942B1 (ko) Method and apparatus for inputting annotations of object boundary information
WO2024060558A1 (zh) Feasible-region prediction method, apparatus, system, and storage medium
CN110163052B (zh) Video action recognition method, apparatus, and machine device
CN113724128A (zh) Training-sample augmentation method
CN113626612A (zh) Prediction method and system based on knowledge-graph reasoning
CN114723646A (zh) Method and apparatus for generating annotated image data, storage medium, and electronic device
CN112686317A (zh) Neural network training method and apparatus, electronic device, and storage medium
CN114549369A (zh) Data repair method and apparatus, computer, and readable storage medium
CN114580425A (zh) Named-entity recognition method and apparatus, electronic device, and storage medium
JP2022103136A (ja) Image processing method, device, and computer-readable storage medium
Yang et al. Never forget: Balancing exploration and exploitation via learning optical flow
CN114581652A (zh) Target-object detection method and apparatus, electronic device, and storage medium
CN116229535A (zh) Training method for a face detection model, and face detection method and apparatus
CN116049691A (зh) Model conversion method and apparatus, electronic device, and storage medium
CN113191364B (zh) Vehicle exterior-part recognition method and apparatus, electronic device, and medium
EP4170552A1 (en) Method for generating neural network, and device and computer-readable storage medium
CN114882557A (зh) Face recognition method and apparatus