CN114913550A - Wounded person identification method and system based on deep learning under wound point gathering scene - Google Patents

Wounded person identification method and system based on deep learning under wound point gathering scene

Info

Publication number
CN114913550A
CN114913550A
Authority
CN
China
Prior art keywords
wounded
layer
picture
size
pictures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210599623.XA
Other languages
Chinese (zh)
Inventor
楼云江
彭建文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202210599623.XA priority Critical patent/CN114913550A/en
Publication of CN114913550A publication Critical patent/CN114913550A/en
Priority to PCT/CN2022/128628 priority patent/WO2023231290A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/32Indexing scheme for image data processing or generation, in general involving image mosaicing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30242Counting objects in image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a wounded person identification method and system based on a deep learning neural network. The method comprises the following steps: S10, capturing at least one wounded picture in the casualty gathering point environment with a depth camera, and collecting the wounded pictures into a live data set of wounded pictures; S20, generating, by data augmentation, an additional wounded picture of smaller size for each original wounded picture taken at close range in the live data set, and storing the additional wounded picture in the live data set in association with the original wounded picture; S30, inputting the live pictures taken by the depth camera into a deep-learning-based neural network to calculate the number of wounded persons in the live pictures, wherein the neural network is trained on a pre-training data set and the live data set. The system includes a depth camera, a memory, and a processor that implements the method when executing instructions stored in the memory.

Description

Wounded person identification method and system based on deep learning under wound point gathering scene
Technical Field
The invention relates to a wounded person identification method and system based on deep learning in a casualty gathering point scene, and belongs to the technical field of computer software, in particular robot visual recognition. The technical scheme of the invention is particularly suitable for emergency rescue applications.
Background
In the event of large-scale casualties in collapse environments caused by earthquakes, fires and accidents, rescue teams generally concentrate the wounded in a certain area. Such an area has the following characteristics: it is open and far from the disaster site; the ground is flat or grassy; and it is convenient for transportation, so that the wounded can subsequently be transferred. Such an area where the wounded are placed together is generally referred to as a casualty gathering point. In order to quickly triage the wounded at a casualty gathering point, a system is required that carries navigation sensors and medical sensors to the side of the wounded for autonomous injury detection. As shown in fig. 1, in a casualty gathering point scene after an earthquake, a quadruped robot dog carrying a mechanical arm, a depth camera, a radar and various medical sensors serves as the hardware system for triage at the gathering point (the dotted-line area in fig. 1). A system is therefore needed whose primary task is to quickly identify, in the gathering point environment, the position of the wounded, the position of the gathering point, and whether any wounded are present.
Disclosure of Invention
The invention provides a wounded person identification method and system based on deep learning in a casualty gathering point scene, and aims to solve at least one of the technical problems in the prior art.
In one aspect, the technical scheme of the invention is a wounded person identification method based on a deep learning neural network, comprising the following steps:
S10, capturing at least one wounded picture in the casualty gathering point environment with a depth camera, and collecting the wounded pictures into a live data set of wounded pictures;
S20, generating, by data augmentation, an additional wounded picture of smaller size for each original wounded picture taken at close range in the live data set, and storing the additional wounded picture in the live data set in association with the original wounded picture;
S30, inputting the live pictures taken by the depth camera into a deep-learning-based neural network to calculate the number of wounded persons in the live pictures, wherein the neural network is trained on a pre-training data set and the live data set.
Further, the step S10 includes:
performing image acquisition from a distance, during the travel of a mobile device to the casualty gathering environment, through a depth camera carried on the mobile device, wherein the mobile device comprises a quadruped robot dog, a mobile robot, a mobile smart vehicle or a flying unmanned aerial vehicle.
Further, the step S20 includes:
S21, determining whether a wounded picture taken at close range is a picture with a low proportion of small targets;
S22, reducing the image size of a picture with a low proportion of small targets to one quarter of the original size, and stitching four reduced pictures into a picture of the same size as the original.
Further, the step S21 includes:
identifying that the area of the picture occupied by a human body outline in the captured picture is smaller than a preset pixel-area threshold, and determining the picture to be a picture containing a small target.
Further, for the step S30, the neural network includes: a base network layer with the VGG16 network architecture; auxiliary convolutional layers for extracting feature maps of different scales; and prediction convolutional layers, including a position prediction convolutional layer and a class prediction convolutional layer.
Further, the neural network is configured to: send the feature maps output by the conv4_3, conv7, conv8_2, conv9_2, conv10_2 and conv11_2 layers to the prediction convolutional layers to acquire the position information and classification information of the wounded in the picture; and fuse the feature maps output by the conv4_3, conv7 and conv8_2 layers into a feature map of the same size as the conv4_3 output feature map, which is sent to the prediction convolutional layers in place of the conv4_3 feature map.
Further, the neural network is configured to:
the H1 x W1 x C1 feature map of the conv4_3 layer passes through a 3x3 convolution layer, followed by an L2 norm layer and a ReLU layer, so that a first feature map of the same size is output;
the H2 x W2 x C2 feature map of the conv7 layer passes through a deconvolution layer, followed by an L2 norm layer and a ReLU layer, so that a second feature map of size H1 x W1 x (C1/2) is output;
the H3 x W3 x C3 feature map of the conv8_2 layer passes through a deconvolution layer, followed by an L2 norm layer and a ReLU layer, so that a third feature map of size H1 x W1 x (C1/2) is output;
the output first, second and third feature maps are concatenated to generate a feature map of size H1 x W1 x (C1 x 2), where H, W and C respectively denote the dimensions of a feature map.
Further, the training of the neural network comprises the steps of:
S40, calculating a loss value for each point of the feature map output by the prediction convolutional layers through a loss function, and updating the model parameters of the neural network during training until the sum of the loss values over all points of the feature map is less than a preset threshold, at which point training stops, wherein the calculation of the loss function comprises:
calculating the overlap (IoU) between the prediction box and the real box at each point of the feature map; if the overlap is greater than a set threshold, the prediction box is given the same class label as the real box and is set as a positive class; if the overlap is less than the set threshold, the class of the prediction box is considered to be background and it is set as a negative class;
the loss function of the prediction box and the real box is equal to the sum of the position loss and the classification loss of the prediction box, wherein
the position loss of the prediction box is calculated as
L_loc = (1 / N_p) * Σ Distance(Box_i_pred, Box_i_real), summed over the positive prediction boxes;
the classification loss of the prediction box is calculated as
L_class = (1 / (N_p + N_n)) * Σ CE_loss(predicted class, labelled class), summed over the positive and negative prediction boxes;
where L_loc is the position loss function, L_class is the classification loss function, N_p is the number of positive-class prediction boxes, N_n is the number of negative-class prediction boxes, Box_i_pred is the coordinate information of the prediction box, Box_i_real is the coordinate information of the corresponding real box, Distance() denotes the Euclidean distance between coordinates, and CE_loss denotes the cross-entropy loss function.
Further, the training of the neural network may further include the steps of:
S51, collecting a number of wounded pictures from the Internet and adding them to the pre-training data set;
S52, reducing, by data augmentation, the image size of original wounded pictures taken at close range in the pre-training data set to one quarter of the original size, stitching four reduced pictures into a picture of the same size as the original, and storing the stitched wounded picture in the pre-training data set in association with the original wounded pictures.
Another aspect of the present invention relates to a wounded person identification system, including:
at least one depth camera carried by a mobile device;
a computer device connected to the depth camera, the computer device comprising a computer readable storage medium having stored thereon program instructions that, when executed by a processor, implement the method described above.
The beneficial effects of the invention are as follows.
With the identification method and system of this technical scheme, the neural network, through on-site self-learning, can quickly identify the position of the wounded, the position of the casualty gathering point, whether any wounded are present, and so on while travelling to the gathering point. The time the wounded spend waiting at the gathering point can therefore be greatly reduced, gaining precious time for quickly transferring them to the corresponding treatment points.
Drawings
Fig. 1 is a schematic diagram of an autonomous triage hardware system in a site of injury collection in one example.
Fig. 2 is a basic flow diagram of a victim identification method in an embodiment in accordance with the present invention.
Fig. 3 is a flow chart of the method for augmenting the picture of the small-sized wounded person in a picture augmentation manner according to the embodiment of the present invention.
Fig. 4 is a schematic diagram of an injury identification network infrastructure in an embodiment of the method according to the invention.
FIG. 5 is a schematic diagram of the auxiliary convolutional layers of the wounded identification network in an embodiment of the method according to the present invention.
FIG. 6 is a schematic diagram of the prediction convolutional layers of the wounded identification network in an embodiment of the method according to the present invention.
FIG. 7 is a schematic diagram of the feature-fused prediction convolutional layers of the wounded identification network in an embodiment of the method according to the present invention.
FIG. 8 is a schematic illustration of feature map fusion details in an embodiment of a method according to the invention.
FIG. 9 is a flow chart of training of the triage identification network in an embodiment of a method according to the invention.
Fig. 10 is a schematic illustration of a picture of an injured person in a training data set in an embodiment of the method according to the invention.
Fig. 11 is a diagram showing the effect of a wounded identification experiment in a casualty gathering point scene according to the technical solution of the present invention, in which the simulated wounded lie on a paved road surface.
Fig. 12 is a diagram showing the effect of a wounded identification experiment in a casualty gathering point scene, in which the simulated wounded lie on a flat lawn.
It should be understood that the words used in this specification and the accompanying drawings are words of description rather than of limitation, and may be used in various combinations without departing from the spirit and scope of the invention.
Detailed Description
The conception, the specific structure and the technical effects of the present invention will be clearly and completely described in conjunction with the embodiments and the accompanying drawings to fully understand the objects, the schemes and the effects of the present invention.
It should be noted that, unless otherwise specified, when a feature is referred to as being "fixed" or "connected" to another feature, it can be directly fixed or connected to the other feature or indirectly fixed or connected to the other feature. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any combination of one or more of the associated listed items.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element of the same type from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present disclosure. The use of any and all examples, or exemplary language (e.g., "such as" or "like"), provided herein is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed.
Referring to fig. 1, in some embodiments, the wounded identification system according to the present invention is generally used to identify the location of a casualty gathering point in a gathering point scene, to identify whether there are wounded at the gathering point and how many, and so on. The wounded identification system can include at least one depth camera carried by a mobile device and a computer device connected to the depth camera. The mobile device may be a quadruped robot dog, a mobile robot, a mobile smart vehicle or a flying unmanned aerial vehicle. Referring to fig. 1, taking a quadruped robot dog as an example, the depth camera 110 (for example, an Intel RealSense D455) carried by the robot dog 100 acquires field image data at the accident scene and generates pictures for the computer device to process and analyse. As shown in fig. 1, in its working task the robot dog 100 needs to search and advance from a distance towards each casualty gathering point (one gathering point is shown as the dotted area in fig. 1; there may be several), detect the wounded, and gradually approach each wounded person from far to near according to the detection results. In this process, an application program running on the computer device implements the wounded identification method according to the present invention: it analyses the images (or video) transmitted by the depth camera 110 in real time, extracts the pictures containing wounded persons, analyses the number and classification of the wounded, and then back-calculates the orientation of the wounded relative to the robot dog 100 from the shooting angle of the wounded pictures. The wounded identification method implemented in the computer device is described in more detail in the embodiments below with reference to figs. 2 to 12.
Referring to fig. 2, in some embodiments, a wounded identification method according to the present invention includes at least the following steps:
S10, capturing at least one wounded picture in the casualty gathering point environment with a depth camera, and collecting the wounded pictures into a live data set of wounded pictures;
S20, generating, by data augmentation, an additional wounded picture of smaller size for each original wounded picture taken at close range in the live data set, and storing the additional wounded picture in the live data set in association with the original wounded picture;
S30, inputting the live pictures taken by the depth camera into the deep-learning-based neural network to calculate the number of wounded persons in the live pictures. If the picture used for detection is a size-modified additional wounded picture of smaller size, the identified number of wounded can be associated with the original wounded picture of the original size.
Further, the neural network is trained on a pre-training data set and the live data set. The method according to the invention may therefore further comprise a training step for the neural network:
S40, calculating a loss value for each point of the feature map output by the prediction convolutional layers through a loss function, and updating the model parameters of the neural network during training until the sum of the loss values over all points of the feature map is less than a preset threshold, at which point training stops.
Detailed description of step S10
During the travel of the mobile device to the casualty gathering environment, image acquisition is performed from a distance by a depth camera mounted on the mobile device (such as a quadruped robot dog). In this process, an application program running on the computer device extracts pictures containing wounded persons by analysing the images (or video) transmitted by the depth camera in real time; these pictures are used to analyse the number and classification of the wounded, and the orientation of the wounded relative to the mobile device is then back-calculated from the shooting angle of the wounded pictures.
In addition, wounded pictures acquired by the onboard depth camera and confirmed in some way (for example by system review, by a superior system, or by other auxiliary sensors) can be used to build a wounded picture data set for continued training of the neural network, so that it learns to adapt to the on-site casualty scene.
Detailed description of step S20
S21, determining whether a wounded picture taken at close range is a picture with a low proportion of small targets; if the area of the picture occupied by a human body outline in the captured picture is identified to be smaller than a preset pixel-area threshold (such as 32 pixels x 32 pixels), the picture is determined to be a picture containing a small target;
S22, reducing the image size of a picture with a low proportion of small targets to one quarter of the original size, and stitching four reduced pictures into a picture of the same size as the original.
Specifically, the depth camera carried by the mobile device (such as a quadruped robot dog) detects the wounded from a distance and gradually approaches them from far to near according to the detection results. Because the field of view of the depth camera is wide, most of the detection performed while the quadruped robot moves towards the side of a wounded person is small-target detection, i.e. the wounded occupy less than 32 pixels x 32 pixels in the camera's field of view.
The sizes of the wounded in the corresponding pictures of the initial data set do not match this characteristic. In order to increase the number of small-target wounded samples in the wounded data set, this section proposes a data enhancement method that trains the target detector in a data-balanced manner: small-target samples are constructed and expanded by oversampling, the image size of pictures with a low proportion of small targets is reduced to 1/4, and 4 reduced images are stitched into a picture of the same size as the original for input to the network. The stitched image inevitably contains smaller target objects, thereby increasing the weight of small-target data, as shown in fig. 3.
With continued reference to fig. 3, preferably, if the size of the wounded in a wounded picture is smaller than a fixed value, the picture is considered to contain small-size wounded and can be input directly to the wounded detection network; otherwise, the picture passes through a preprocessing module, is reduced to 1/4, and four such pictures are combined into a picture of the original size, so that the combined picture contains small-size wounded targets and can be input to the wounded detection network. Of course, the four pictures combined need not be identical, which greatly enlarges the wounded data set and increases the number of small-size wounded targets. It should be noted that, since a picture synthesized by the preprocessing module from multiple pictures makes the wounded appear a multiple number of times (e.g. 4 times), the wounded count obtained by the detection network on the synthesized picture needs to be divided by the synthesis multiple (e.g. 4) to obtain the actual number of wounded.
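The preprocessing described above can be sketched in Python roughly as follows. This is only an illustrative sketch of the 1/4 reduction, the 2x2 stitching and the count correction; the function names (`contains_only_small_targets`, `make_mosaic`, `actual_wounded_count`), the use of PIL, and the per-box form of the small-target criterion are assumptions, not part of the patent text.

```python
from PIL import Image

SMALL_TARGET_AREA = 32 * 32  # preset pixel-area threshold for a "small target"

def contains_only_small_targets(person_boxes, threshold=SMALL_TARGET_AREA):
    """True if every human-body outline box (x1, y1, x2, y2) is below the threshold,
    i.e. the picture already contains small-size wounded and can be fed to the network directly."""
    return all((x2 - x1) * (y2 - y1) < threshold for x1, y1, x2, y2 in person_boxes)

def make_mosaic(pictures):
    """Reduce four (not necessarily identical) pictures to 1/4 of the original area
    (half width, half height) and stitch them into one picture of the original size."""
    assert len(pictures) == 4
    w, h = pictures[0].size
    canvas = Image.new("RGB", (w, h))
    for idx, pic in enumerate(pictures):
        tile = pic.resize((w // 2, h // 2))
        canvas.paste(tile, ((idx % 2) * (w // 2), (idx // 2) * (h // 2)))
    return canvas

def actual_wounded_count(detected_count, synthesis_multiple=4):
    """Divide the count detected on a synthesized picture by the synthesis multiple."""
    return detected_count / synthesis_multiple
```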
Detailed description of step S30
The neural network structure for wounded identification comprises a base network layer, auxiliary convolutional layers and prediction convolutional layers. The base network layer can use the VGG16 network architecture as the base network. The auxiliary convolutional layers are used to extract feature maps of different scales; detecting and locating the wounded with feature maps from several layers of different depths makes it possible to detect wounded of different sizes, and in the wounded identification network the auxiliary convolutional layers output 4 feature maps of different scales. The prediction convolutional layers are divided into a position prediction layer with four pieces of position information and a classification prediction layer with two pieces of information. The base network, the auxiliary convolutional layers and the prediction convolutional layers are shown in figs. 4, 5 and 6, respectively.
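As a purely illustrative sketch of this three-part structure (base network, auxiliary convolutional layers, prediction convolutional layers), the following PyTorch code wires a VGG16 base to a few auxiliary layers and to position/class prediction heads. The specific layer parameters, the use of only four of the feature maps, and the class count of 2 (wounded/background) are assumptions made for the sketch and are not taken from the patent.

```python
import torch
import torch.nn as nn
import torchvision

class WoundedDetectorSketch(nn.Module):
    """Base network (VGG16) + auxiliary convolutional layers + prediction convolutional layers."""

    def __init__(self, num_classes=2):  # assumed: wounded vs. background
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None).features
        self.base_to_conv4_3 = vgg[:23]      # VGG16 layers up to and including conv4_3 + ReLU
        self.base_rest = vgg[23:30]          # pool4 and the conv5 block
        # conv6/conv7 stand in for VGG16's fully connected layers, as in SSD-style detectors
        self.conv6 = nn.Conv2d(512, 1024, kernel_size=3, padding=6, dilation=6)
        self.conv7 = nn.Conv2d(1024, 1024, kernel_size=1)
        # auxiliary convolutional layers producing smaller-scale feature maps
        self.conv8_2 = nn.Sequential(
            nn.Conv2d(1024, 256, 1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.conv9_2 = nn.Sequential(
            nn.Conv2d(512, 128, 1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        # prediction convolutional layers: 4 position values and num_classes scores per point
        channels = [512, 1024, 512, 256]     # channels of conv4_3, conv7, conv8_2, conv9_2
        self.loc_heads = nn.ModuleList([nn.Conv2d(c, 4, 3, padding=1) for c in channels])
        self.cls_heads = nn.ModuleList([nn.Conv2d(c, num_classes, 3, padding=1) for c in channels])

    def forward(self, x):
        f1 = self.base_to_conv4_3(x)             # conv4_3 feature map
        y = self.base_rest(f1)
        y = torch.relu(self.conv6(y))
        f2 = torch.relu(self.conv7(y))           # conv7 feature map
        f3 = self.conv8_2(f2)                    # conv8_2 feature map
        f4 = self.conv9_2(f3)                    # conv9_2 feature map
        feats = [f1, f2, f3, f4]
        locs = [head(f) for head, f in zip(self.loc_heads, feats)]
        classes = [head(f) for head, f in zip(self.cls_heads, feats)]
        return locs, classes
```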
In the gathering point scene, the mobile device carrying the depth camera identifies the wounded from a distance and then slowly approaches them from far to near according to their positions. Because the camera's field of view is wide, the mobile device spends most of this far-to-near identification process recognising small-size wounded. In some preferred embodiments, the wounded identification network may therefore be optimised in the following ways to account for small-size wounded targets.
Preferably, referring to fig. 7, based on the base network and auxiliary convolutional layers of the wounded identification network, the network adopted in the identification method according to the present invention obtains wounded position information and classification information by sending the six feature maps output by the conv4_3, conv7, conv8_2, conv9_2, conv10_2 and conv11_2 layers to the prediction convolutional layers. In order to better use context information as additional help for detecting small-size wounded, the three feature maps output by the conv4_3, conv7 and conv8_2 layers are fused into a feature map of the same size as the conv4_3 output feature map, which is input to the prediction convolutional layers in place of the conv4_3 feature map. The conv4_3, conv7 and conv8_2 layers, rather than the last three layers, are selected for feature fusion because small-size wounded occupy few pixels and have low resolution, so larger feature maps are required for fusion. Fig. 7 shows a schematic diagram of the wounded identification network prediction convolutional layers after feature fusion.
Referring to fig. 8, specific feature fusion details are as follows:
(1) the conv4_3 layer feature map H1 x W1 x C1 passes through a convolution layer (convolution size 3x3, padding 1, stride 1), followed by an L2 norm layer and a ReLU layer, and a feature map of the same size is output;
(2) the conv7 layer feature map H2 x W2 x C2 passes through a deconvolution layer, followed by an L2 norm layer and a ReLU layer, and a feature map of size H1 x W1 x (C1/2) is output;
(3) the conv8_2 layer feature map H3 x W3 x C3 passes through a deconvolution layer, followed by an L2 norm layer and a ReLU layer, and a feature map of size H1 x W1 x (C1/2) is output;
(4) the three output feature maps are concatenated to generate a feature map of size H1 x W1 x (C1 x 2).
Here, H1 x W1 x C1 is 38x38x512, H2 x W2 x C2 is 19x19x1024, and H3 x W3 x C3 is 10x10x512.
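A minimal PyTorch sketch of this fusion module, assuming the sizes just stated (conv4_3: 38x38x512, conv7: 19x19x1024, conv8_2: 10x10x512) and a fused output of 38x38x1024, is given below. The deconvolution kernel sizes and the L2Norm implementation are assumptions chosen so that the spatial sizes match; the patent does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class L2Norm(nn.Module):
    """Channel-wise L2 normalisation with a learnable scale (SSD-style)."""
    def __init__(self, channels, scale=20.0):
        super().__init__()
        self.weight = nn.Parameter(torch.full((channels,), scale))

    def forward(self, x):
        x = F.normalize(x, p=2, dim=1)
        return x * self.weight.view(1, -1, 1, 1)

class FeatureFusion(nn.Module):
    """Fuse conv4_3 (38x38x512), conv7 (19x19x1024) and conv8_2 (10x10x512)
    into a single 38x38x1024 feature map that replaces the conv4_3 map."""
    def __init__(self):
        super().__init__()
        # branch 1: 3x3 convolution (padding 1, stride 1) keeps 38x38x512
        self.branch1 = nn.Conv2d(512, 512, 3, padding=1)
        self.norm1 = L2Norm(512)
        # branch 2: deconvolution 19x19x1024 -> 38x38x256 (C1/2 channels)
        self.branch2 = nn.ConvTranspose2d(1024, 256, kernel_size=2, stride=2)
        self.norm2 = L2Norm(256)
        # branch 3: deconvolution 10x10x512 -> 38x38x256 (C1/2 channels)
        self.branch3 = nn.ConvTranspose2d(512, 256, kernel_size=4, stride=4, padding=1)
        self.norm3 = L2Norm(256)

    def forward(self, conv4_3, conv7, conv8_2):
        f1 = F.relu(self.norm1(self.branch1(conv4_3)))   # 38x38x512
        f2 = F.relu(self.norm2(self.branch2(conv7)))     # 38x38x256
        f3 = F.relu(self.norm3(self.branch3(conv8_2)))   # 38x38x256
        return torch.cat([f1, f2, f3], dim=1)            # 38x38x1024
```

For example, calling the module on random tensors of shapes (1, 512, 38, 38), (1, 1024, 19, 19) and (1, 512, 10, 10) returns a (1, 1024, 38, 38) tensor, i.e. twice the channel count of the conv4_3 map.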
Detailed description of step S40
The overall flow of network training is shown in fig. 9: a batch of picture data (from the pre-training data set, the live data set and the initial data set) is input into the neural network model for training, and the loss function is then computed and used to adjust the model parameters. The key to training the network is defining the computation of the loss function. The prediction convolutional layers predict position information and classification information for each point of the feature map, and the loss value of a point is equal to the sum of its position loss and its classification loss. The IoU (overlap) between the prior box and the real box is computed at each point of the feature map; if the IoU is greater than a set fixed threshold, the prior box is given the same class label as the real box and is called a positive class; if it is less than the threshold, the class of the prior box is considered to be background and it is called a negative class. The loss function for the prediction box and the real box, equal to the sum of the position loss and the classification loss of the prediction box, can be expressed as
Loss = L_class + L_loc    (1.1)
where Loss is the overall loss function, L_class is the classification loss function, and L_loc is the position loss function. L_class and L_loc are calculated as in formulas 1.2 and 1.3:
L_class = (1 / (N_p + N_n)) * Σ CE_loss(predicted class, labelled class), summed over the positive and negative prediction boxes    (1.2)
L_loc = (1 / N_p) * Σ Distance(Box_i_pred, Box_i_real), summed over the positive prediction boxes    (1.3)
where N_p is the number of positive-class prediction boxes, N_n is the number of negative-class prediction boxes, Box_i_pred is the coordinate information of a prediction box, Box_i_real is the coordinate information of the corresponding ground-truth box, Distance() denotes the Euclidean distance between coordinates, and CE_loss denotes the cross-entropy loss function.
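The loss computation of formulas 1.1 to 1.3 can be sketched in PyTorch as follows. The argument names (`pred_boxes`, `pred_logits`, `gt_boxes`, `gt_labels`, `iou_matrix`) and the 0.5 IoU threshold are illustrative assumptions; the patent only fixes the structure Loss = L_class + L_loc with a Euclidean-distance position term and a cross-entropy classification term.

```python
import torch
import torch.nn.functional as F

def wounded_detection_loss(pred_boxes, pred_logits, gt_boxes, gt_labels,
                           iou_matrix, iou_threshold=0.5):
    """pred_boxes: (N, 4), pred_logits: (N, 2), gt_boxes: (M, 4), gt_labels: (M,),
    iou_matrix: (N, M) IoU between each prediction/prior box and each real box."""
    best_iou, best_gt = iou_matrix.max(dim=1)        # best-matching real box per prediction box
    positive = best_iou > iou_threshold              # positive class: same label as matched real box
    negative = ~positive                             # negative class: labelled as background (class 0)

    target_labels = torch.zeros_like(best_gt)
    target_labels[positive] = gt_labels[best_gt[positive]]

    n_p = int(positive.sum())
    n_n = int(negative.sum())

    # L_loc (formula 1.3): mean Euclidean distance between positive prediction boxes and their real boxes
    loc_loss = torch.tensor(0.0)
    if n_p > 0:
        diff = pred_boxes[positive] - gt_boxes[best_gt[positive]]
        loc_loss = diff.pow(2).sum(dim=1).sqrt().sum() / n_p

    # L_class (formula 1.2): cross-entropy over positive and negative boxes, normalised by N_p + N_n
    class_loss = F.cross_entropy(pred_logits, target_labels, reduction="sum") / (n_p + n_n)

    return class_loss + loc_loss                     # Loss = L_class + L_loc (formula 1.1)
```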
Regarding the initial training data set for the network model: public data sets are mostly general target detection or pedestrian detection data sets, and there is no dedicated wounded detection data set. Because of the particular lying posture of the wounded, the accuracy of wounded detection is unsatisfactory without a dedicated wounded data set. Therefore, wounded pictures were collected in a simulated gathering point scene using the depth camera carried by the quadruped robot dog, further wounded pictures were retrieved from the Internet, and thousands of wounded pictures were sorted to form the initial pre-training data set. Then, by data augmentation, the image size of original wounded pictures taken at close range in the pre-training data set can be reduced to one quarter of the original size, four reduced pictures can be stitched into a picture of the same size as the original, and the stitched wounded picture can be stored in the pre-training data set in association with the original pictures, so that the pre-training data set contains thousands of wounded targets. These pictures containing the wounded constitute the initial version of the wounded image data set (as shown in fig. 10) used for training the network model.
Experimental validation of the wounded identification method and system according to the present invention
In order to better approximate a casualty gathering scene, the experimental scene was set on a wide cement surface or lawn, four wounded models were laid out on the ground, other pedestrians simulated rescue workers shuttling among the wounded models, and the quadruped robot dog carried the depth camera to identify the wounded from far to near. The experimental results are shown in figs. 11 and 12, where the LYING PERSON label indicates a detected wounded person.
It should be recognized that the method steps in embodiments of the present invention may be embodied or carried out by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The method may use standard programming techniques. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention may also include the computer itself when programmed according to the methods and techniques described herein.
A computer program can be applied to input data to perform the functions described herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
The above description is only a preferred embodiment of the present invention, and the present invention is not limited to the above embodiment, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention as long as the technical effects of the present invention are achieved by the same means. The invention is capable of other modifications and variations in its technical solution and/or its implementation, within the scope of protection of the invention.

Claims (10)

1. A wounded person identification method based on a deep learning neural network, characterized by comprising the following steps:
S10, capturing at least one wounded picture in the casualty gathering point environment with a depth camera, and collecting the wounded pictures into a live data set of wounded pictures;
S20, generating, by data augmentation, an additional wounded picture of smaller size for each original wounded picture taken at close range in the live data set, and storing the additional wounded picture in the live data set in association with the original wounded picture;
S30, inputting the live pictures taken by the depth camera into a deep-learning-based neural network to calculate the number of wounded persons in the live pictures, wherein the neural network is trained on a pre-training data set and the live data set.
2. The method according to claim 1, wherein the step S10 comprises:
performing image acquisition from a distance, during the travel of a mobile device to the casualty gathering environment, through a depth camera carried on the mobile device, wherein the mobile device comprises a quadruped robot dog, a mobile robot, a mobile smart vehicle or a flying unmanned aerial vehicle.
3. The method according to claim 1, wherein the step S20 comprises:
S21, determining whether a wounded picture taken at close range is a picture with a low proportion of small targets;
S22, reducing the image size of a picture with a low proportion of small targets to one quarter of the original size, and stitching four reduced pictures into a picture of the same size as the original.
4. The method according to claim 3, wherein the step S21 comprises:
identifying that the area of the picture occupied by a human body outline in the captured picture is smaller than a preset pixel-area threshold, and determining the picture to be a picture containing a small target.
5. The method according to claim 1, wherein for the step S30, the neural network comprises:
the underlying network layer of the VGG16 network architecture;
an auxiliary convolution layer for extracting feature maps of different scales;
prediction convolutional layers, including a position prediction convolutional layer and a class prediction convolutional layer.
6. The method of claim 5, wherein the neural network is configured to:
sending feature maps output by the conv4_3 layer, the conv7 layer, the conv8_2 layer, the conv9_2 layer, the conv10_2 layer and the conv11_2 layer to a prediction convolutional layer to acquire wounded position information and classification information in a picture;
feature maps output by the conv4_3 layer, the conv7 layer and the conv8_2 layer are fused into a feature map with the same size as the conv4_3 layer output feature map, and are sent to the prediction convolutional layer instead of the conv4_3 layer feature map.
7. The method of claim 6, wherein the neural network is configured to:
the H1 x W1 x C1 feature map of the conv4_3 layer passes through a 3x3 convolution layer, followed by an L2 norm layer and a ReLU layer, so that a first feature map of the same size is output;
the H2 x W2 x C2 feature map of the conv7 layer passes through a deconvolution layer, followed by an L2 norm layer and a ReLU layer, so that a second feature map of size H1 x W1 x (C1/2) is output;
the H3 x W3 x C3 feature map of the conv8_2 layer passes through a deconvolution layer, followed by an L2 norm layer and a ReLU layer, so that a third feature map of size H1 x W1 x (C1/2) is output;
the output first, second and third feature maps are concatenated to generate a feature map of size H1 x W1 x (C1 x 2), where H, W and C respectively denote the dimensions of a feature map.
8. The method of claim 5, wherein the training of the neural network comprises the steps of:
S40, calculating a loss value for each point of the feature map output by the prediction convolutional layers through a loss function, and updating the model parameters of the neural network during training until the sum of the loss values over all points of the feature map is smaller than a preset threshold, at which point training stops, wherein the calculation of the loss function comprises the following steps:
calculating the overlapping degree of the prediction frame and the real frame of each point of the characteristic diagram; if the overlapping degree is larger than the set threshold value, the class marked by the prediction frame and the class marked by the real frame are the same, and the prediction frame is set as a positive class; if the overlapping degree is smaller than the set threshold value, the class marked by the prediction frame is considered as the background and is set as a negative class;
the loss function of the prediction box and the real box is equal to the sum of the position loss and the classification loss of the prediction box, wherein
the position loss of the prediction box is calculated as
L_loc = (1 / N_p) * Σ Distance(Box_i_pred, Box_i_real), summed over the positive prediction boxes;
the classification loss of the prediction box is calculated as
L_class = (1 / (N_p + N_n)) * Σ CE_loss(predicted class, labelled class), summed over the positive and negative prediction boxes;
where L_loc is the position loss function, L_class is the classification loss function, N_p is the number of positive-class prediction boxes, N_n is the number of negative-class prediction boxes, Box_i_pred is the coordinate information of the prediction box, Box_i_real is the coordinate information of the corresponding real box, Distance() denotes the Euclidean distance between coordinates, and CE_loss denotes the cross-entropy loss function.
9. The method of any one of claims 1 to 8, wherein the training of the neural network comprises the steps of:
S51, collecting a number of wounded pictures from the Internet and adding them to the pre-training data set;
S52, reducing, by data augmentation, the image size of original wounded pictures taken at close range in the pre-training data set to one quarter of the original size, stitching four reduced pictures into a picture of the same size as the original, and storing the stitched wounded picture in the pre-training data set in association with the original wounded pictures.
10. A wounded person identification system, comprising:
at least one depth camera carried by a mobile device;
a computer device connected to the depth camera, the computer device comprising a computer readable storage medium having stored thereon program instructions that, when executed by a processor, implement the method of any of claims 1-9.
CN202210599623.XA 2022-05-30 2022-05-30 Wounded person identification method and system based on deep learning under wound point gathering scene Pending CN114913550A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210599623.XA CN114913550A (en) 2022-05-30 2022-05-30 Wounded person identification method and system based on deep learning under wound point gathering scene
PCT/CN2022/128628 WO2023231290A1 (en) 2022-05-30 2022-10-31 Casualty recognition method and system based on deep learning in casualty gathering place scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210599623.XA CN114913550A (en) 2022-05-30 2022-05-30 Wounded person identification method and system based on deep learning under wound point gathering scene

Publications (1)

Publication Number Publication Date
CN114913550A true CN114913550A (en) 2022-08-16

Family

ID=82768950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210599623.XA Pending CN114913550A (en) 2022-05-30 2022-05-30 Wounded person identification method and system based on deep learning under wound point gathering scene

Country Status (2)

Country Link
CN (1) CN114913550A (en)
WO (1) WO2023231290A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023231290A1 (en) * 2022-05-30 2023-12-07 哈尔滨工业大学(深圳) Casualty recognition method and system based on deep learning in casualty gathering place scene

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344690B (en) * 2018-08-09 2022-09-23 上海青识智能科技有限公司 People counting method based on depth camera
CN109886085A (en) * 2019-01-03 2019-06-14 四川弘和通讯有限公司 People counting method based on deep learning target detection
US11443549B2 (en) * 2019-07-09 2022-09-13 Josh Lehman Apparatus, system, and method of providing a facial and biometric recognition system
CN114511710A (en) * 2022-02-10 2022-05-17 北京工业大学 Image target detection method based on convolutional neural network
CN114913550A (en) * 2022-05-30 2022-08-16 哈尔滨工业大学(深圳) Wounded person identification method and system based on deep learning under wound point gathering scene


Also Published As

Publication number Publication date
WO2023231290A1 (en) 2023-12-07

Similar Documents

Publication Publication Date Title
CN112740268B (en) Target detection method and device
US11755918B2 (en) Fast CNN classification of multi-frame semantic signals
US11836884B2 (en) Real-time generation of functional road maps
CN111401517B (en) Method and device for searching perceived network structure
CN113591872A (en) Data processing system, object detection method and device
CN112084835A (en) Generating map features based on aerial data and telemetry data
Sulistijono et al. Implementation of victims detection framework on post disaster scenario
US11900257B2 (en) Method for representing an environment of a mobile platform
CN114913550A (en) Wounded person identification method and system based on deep learning under wound point gathering scene
CN114972182A (en) Object detection method and device
Khalilullah et al. Road area detection method based on DBNN for robot navigation using single camera in outdoor environments
KR101862545B1 (en) Method and system for providing rescue service using robot
Vempati et al. Victim detection from a fixed-wing uav: Experimental results
CN113065637A (en) Perception network and data processing method
Gemerek Active vision and perception
Pathak et al. Mobile Rescue Robot
Xiang et al. A Study of Autonomous Landing of UAV for Mobile Platform
Klette et al. Computer Vision in Vehicles
US20220284623A1 (en) Framework For 3D Object Detection And Depth Prediction From 2D Images
Li et al. Multi-scale Small Target Detection for Indoor Mobile Rescue Vehicles Based on Improved YOLOv5
Das Vision-Based Lane and Vehicle Detection: A First Step Toward Autonomous Unmanned Vehicle
Shen A Novel Three-Dimensional Navigation Method for the Visually Impaired
Zbala et al. Image Classification for Autonomous Vehicles
Watson Improved Ground-Based Monocular Visual Odometry Estimation using Inertially-Aided Convolutional Neural Networks
Aburaya et al. Review of vision-based reinforcement learning for drone navigation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination