CN112749653A - Pedestrian detection method, device, electronic equipment and storage medium - Google Patents

Pedestrian detection method, device, electronic equipment and storage medium

Info

Publication number
CN112749653A
Authority
CN
China
Prior art keywords
pedestrian detection
classification
positioning
pedestrian
detection model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011637418.5A
Other languages
Chinese (zh)
Inventor
王健宗
瞿晓阳
李佳琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202011637418.5A (publication CN112749653A)
Priority to PCT/CN2021/083707 (publication WO2022141858A1)
Publication of CN112749653A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to artificial intelligence technology and discloses a pedestrian detection method comprising the following steps: performing data enhancement processing on a pedestrian detection data set to obtain a training image set; decomposing the classification and positioning task in a conventional detection model and adding a fully connected layer to obtain a decoupled pedestrian detection model; calculating a classification deviation and a positioning deviation, and generating a loss function according to the classification deviation and the positioning deviation; training the decoupled pedestrian detection model by using the training image set and the loss function to obtain a trained decoupled pedestrian detection model; and detecting an image to be detected by using the decoupled pedestrian detection model to obtain pedestrian detection information. In addition, the invention also relates to blockchain technology: the image to be detected can be stored in a node of a blockchain. The invention also provides a pedestrian detection device, an electronic device and a computer-readable storage medium. The invention can solve the problem of low accuracy of pedestrian detection results.

Description

Pedestrian detection method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a pedestrian detection method, apparatus, electronic device, and computer-readable storage medium.
Background
With the development of artificial intelligence, autonomous driving technology has also advanced. Autonomous driving is an important application of artificial intelligence with significant research and engineering value, and effective pedestrian detection and avoidance is one of its core components.
Conventional pedestrian detection methods (such as Fast R-CNN, SSD and YOLO) typically learn classification and regression localization jointly, with both tasks sharing the region boxes in which objects potentially exist and the feature extractor. However, the classification task focuses more on regions rich in semantic information, whereas the regression task focuses more on the bounding box of the object. Sharing the same region boxes and feature extractor between the two tasks therefore creates an inherent conflict, which hampers the training of the detection model and lowers the accuracy of the detection result.
Disclosure of Invention
The invention provides a pedestrian detection method, a pedestrian detection device and a computer-readable storage medium, and mainly aims to solve the problem of low accuracy of a pedestrian detection result.
In order to achieve the above object, the present invention provides a pedestrian detection method, including:
acquiring a pedestrian detection data set, and performing data enhancement processing on the pedestrian detection data set to obtain a training image set;
decomposing a classification positioning task in a traditional detection model, and adding a full-connection layer to obtain a decoupling pedestrian detection model;
respectively calculating classification deviation and positioning deviation by utilizing the full connection layer, and generating a loss function of the decoupling pedestrian detection model according to the classification deviation and the positioning deviation;
training the decoupling pedestrian detection model by using the training image set and the loss function to obtain a trained decoupling pedestrian detection model;
and detecting the image to be detected by utilizing the decoupling pedestrian detection model to obtain pedestrian detection information.
Optionally, decomposing the classification positioning task in the conventional detection model, and adding a full connection layer to obtain a decoupling pedestrian detection model, includes:
respectively constructing a classification network and a positioning network based on a deep learning algorithm;
replacing a classification detection network in a conventional detection model with the classification network and the positioning network;
and connecting the pre-constructed classification full-connection layer behind the classification network and connecting the pre-constructed positioning full-connection layer behind the positioning network to obtain the decoupling pedestrian detection model.
Optionally, the replacing the classification detection network in the conventional detection model with the classification network and the positioning network includes:
deleting the classified detection network in the traditional detection model;
and connecting the classification network and the positioning network in parallel to a feature extraction network in the traditional detection model.
Optionally, the respectively calculating the classification deviation and the positioning deviation by using the fully-connected layer includes:
determining, through the classification fully connected layer, whether each pixel in a classification frame predicted by the classification network contains the target class, and counting the pixels that do not contain the target class to obtain the classification deviation;
and calculating, through the positioning fully connected layer, the intersection over union (IoU) of the positioning frame predicted by the positioning network and the actual frame to obtain the positioning deviation.
Optionally, the generating a loss function of the decoupled pedestrian detection model according to the classification bias and the positioning bias includes:
acquiring an original loss function of the traditional detection model;
and adding classification deviation and positioning deviation in the original loss function to obtain a loss function of the decoupling pedestrian detection model.
Optionally, the performing data enhancement processing on the pedestrian detection data set to obtain a training image set includes:
marking the pedestrian detection data set according to the position of the pedestrian to obtain a first pedestrian data set with a label;
performing geometric transformation processing or Gaussian noise processing on the first pedestrian data set to obtain a second pedestrian data set;
and collecting the first pedestrian data set and the second pedestrian data set to obtain a training image set.
Optionally, the detecting the image to be detected by using the decoupling pedestrian detection model to obtain the pedestrian detection information includes:
extracting features of the image to be detected by using the feature extraction network of the decoupling pedestrian detection model to obtain a feature map;
classifying the feature map by using the classification network of the decoupling pedestrian detection model to obtain pedestrian classification information;
locating pedestrians in the feature map by using the positioning network of the decoupling pedestrian detection model to obtain position information;
and combining the pedestrian classification information and the position information to obtain pedestrian detection information corresponding to the image to be detected.
In order to solve the above problem, the present invention also provides a pedestrian detection device, including:
the training data set module is used for acquiring a pedestrian detection data set and performing data enhancement processing on the pedestrian detection data set to obtain a training image set;
the model construction module is used for decomposing the classification positioning task in the traditional detection model and adding a full connection layer to obtain a decoupling pedestrian detection model;
the loss function module is used for respectively calculating classification deviation and positioning deviation by utilizing the full-connection layer and generating a loss function of the decoupling pedestrian detection model according to the classification deviation and the positioning deviation;
the model training module is used for training the decoupling pedestrian detection model by using the training image set and the loss function to obtain a trained decoupling pedestrian detection model;
and the detection module is used for detecting the image to be detected by utilizing the decoupling pedestrian detection model to obtain the pedestrian detection information.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
and the processor executes the instructions stored in the memory to realize the pedestrian detection method.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, which stores at least one instruction, and the at least one instruction is executed by a processor in an electronic device to implement the pedestrian detection method.
According to the embodiment of the invention, the classification and positioning tasks of the conventional detection model are spatially decomposed, the model is trained by using the training image set, and the classification deviation and the positioning deviation are added to the loss function. The classification network and the regression (positioning) network thus each learn their own adapted candidate frame and feature extractor, and the final predictors no longer share the same input and feature extractor. This minimizes the conflict caused by inconsistent optimization targets, improves pedestrian detection performance, reduces misidentification and inaccurate bounding boxes, and improves detection accuracy. Therefore, the pedestrian detection method, the pedestrian detection device, the electronic device and the computer-readable storage medium can solve the problem of low accuracy of the pedestrian detection result.
Drawings
Fig. 1 is a schematic flow chart of a pedestrian detection method according to an embodiment of the present invention;
Fig. 2 is a functional block diagram of a pedestrian detection apparatus according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device for implementing the pedestrian detection method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a pedestrian detection method. The execution subject of the pedestrian detection method includes, but is not limited to, at least one of electronic devices such as a server and a terminal, which can be configured to execute the method provided by the embodiment of the present application. In other words, the pedestrian detection method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
Fig. 1 is a schematic flow chart of a pedestrian detection method according to an embodiment of the present invention. In this embodiment, the pedestrian detection method includes:
S1, acquiring a pedestrian detection data set, and performing data enhancement processing on the pedestrian detection data set to obtain a training image set.
In the embodiment of the invention, the pedestrian detection data set is the KITTI data set, which comprises real image data of pedestrians captured in various scenes such as urban areas, rural areas and expressways. Further, the pedestrian detection data set may be obtained from a public online platform.
In detail, the performing data enhancement processing on the pedestrian detection data set to obtain a training image set includes:
marking the pedestrian detection data set according to the position of the pedestrian to obtain a first pedestrian data set with a label;
performing geometric transformation processing or Gaussian noise processing on the first pedestrian data set to obtain a second pedestrian data set;
and collecting the first pedestrian data set and the second pedestrian data set to obtain a training image set.
Further, performing geometric transformation processing or Gaussian noise processing on the first pedestrian data set includes: randomly applying horizontal flipping, rotation and cropping to the images in the first pedestrian data set, adding Gaussian white noise of varying intensity, and applying the corresponding transformation to the labels annotated in each image.
According to the embodiment of the invention, the data enhancement processing is carried out on the pedestrian detection data set, so that the number of image samples is enriched, the learning of a detection model is facilitated, and the accuracy of identification and detection can be improved.
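The augmentation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: images are lists of pixel rows, labels are boxes (x, y, w, h) with (x, y) the box center measured from the left and top image edges, only horizontal flipping and Gaussian noise are shown (rotation and cropping are omitted), and the function names, the noise level `sigma` and the 50% choice between the two operations are hypothetical.

```python
import random


def hflip_with_boxes(image, boxes):
    """Flip an image (list of pixel rows) left-right and mirror its boxes.

    Boxes are (x, y, w, h) with (x, y) the box center, measured in pixel
    units from the left/top image edge (an assumed convention).
    """
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # Only the horizontal center moves; y, w and h are unchanged by a flip.
    flipped_boxes = [(width - x, y, w, h) for (x, y, w, h) in boxes]
    return flipped, flipped_boxes


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian white noise and clip to the 0..255 range."""
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]


def augment(dataset, sigma=5.0, seed=0):
    """Return the labeled set plus one transformed copy of every sample."""
    rng = random.Random(seed)
    second = []
    for image, boxes in dataset:
        if rng.random() < 0.5:
            # Geometric transformation: the labels must move with the pixels.
            image2, boxes2 = hflip_with_boxes(image, boxes)
        else:
            # Noise leaves the geometry, and therefore the labels, unchanged.
            image2, boxes2 = add_gaussian_noise(image, sigma, rng), list(boxes)
        second.append((image2, boxes2))
    # Training image set = first (labeled) set + second (transformed) set.
    return dataset + second
```

The point of the sketch is that a geometric transformation changes the box labels together with the pixels, whereas additive noise leaves the labels untouched.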
S2, decomposing the classification positioning task in the traditional detection model, and adding a full connection layer to obtain a decoupling pedestrian detection model.
In the embodiment of the invention, the conventional detection model is a deep learning network based on a target detection algorithm, such as a Fast R-CNN model, an SSD model or a YOLO model. Such a model generally learns classification and regression positioning jointly, sharing the region frames in which objects potentially exist and the feature extractor.
Optionally, the conventional detection model includes a feature extraction network and a classification detection network, where the feature extraction network is used to extract features, and the classification detection network is used for classification and regression positioning.
The fully connected layer is a three-layer fully connected neural network. Each layer comprises a plurality of neurons, each neuron has an activation function such as the ReLU function and is connected to all neurons in the previous layer, and the fully connected layer can be used to predict offsets.
In detail, the decomposing of the classification positioning task in the traditional detection model and the adding of the full connection layer to obtain the decoupling pedestrian detection model comprises the following steps:
respectively constructing a classification network and a positioning network based on a deep learning algorithm;
replacing a classification detection network in the conventional detection model with the classification network and the positioning network;
and connecting the pre-constructed classification full-connection layer behind the classification network and connecting the pre-constructed positioning full-connection layer behind the positioning network to obtain the decoupling pedestrian detection model.
The classification network is a convolutional neural network only used for classification, and the classification network can pay more attention to semantic features of images and classify the images according to the semantic features. The positioning network is a convolutional neural network only used for positioning, the positioning network can pay more attention to the position information of the image, namely the boundary frame of the object, and the object in the image is positioned according to the position information.
Further, the building of the classification network and the positioning network based on the deep learning algorithm includes: selecting a plurality of optional network operations according to the characteristics of the classification task, and connecting the plurality of optional network operations to obtain a classification network; and selecting a plurality of optional network operations according to the characteristics of the positioning task, and connecting the plurality of optional network operations to obtain the positioning network. Wherein the selectable network operations include a 1x1 convolution, a 3x3 convolution, a 5x5 convolution, a 7x7 convolution, a maximum pooling layer, and an average pooling layer.
Further, the replacing the classification detection network in the conventional detection model with the classification network and the positioning network according to the embodiment of the present invention includes deleting the classification detection network in the conventional detection model; and connecting the classification network and the positioning network in parallel after the feature extraction network in the traditional detection model.
According to the embodiment of the invention, classification and positioning tasks of a traditional detection model are decomposed in space, and deformation of a classification frame and deformation of a positioning frame are learned by using a three-layer full-connection network respectively, wherein the deformation of the classification frame is offset at a point level, and the deformation of the positioning frame is integral offset at a frame level. The areas concerned by the classification and positioning tasks can be obtained by combining the learned offset with the positions of the original classification frame and positioning frame, and the classification and positioning accuracy is improved by separating the classification from the positioning.
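A minimal sketch of the two three-layer fully connected offset predictors described above, in plain Python. The feature size, hidden width, 3x3 sampling grid, random initialization, and the names `make_fc_stack`, `fc_forward`, `predict_offsets` and `shift_box` are all illustrative assumptions; the classification and positioning networks themselves are not shown.

```python
import random


def make_fc_stack(sizes, rng):
    """Build a stack of fully connected layers as (weights, bias) pairs."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def fc_forward(layers, vector):
    """Run a vector through the stack, with ReLU after every layer but the last."""
    for index, (weights, bias) in enumerate(layers):
        vector = [sum(w * v for w, v in zip(row, vector)) + b
                  for row, b in zip(weights, bias)]
        if index < len(layers) - 1:
            vector = [max(0.0, v) for v in vector]
    return vector


FEATURE_DIM = 16  # hypothetical size of the shared feature vector
GRID_POINTS = 9   # hypothetical 3x3 grid of points inside the classification frame

rng = random.Random(0)
# Four sizes give three layers. Classification: one (dx, dy) per grid point.
cls_offset_fc = make_fc_stack([FEATURE_DIM, 32, 32, 2 * GRID_POINTS], rng)
# Positioning: a single (dx, dy) applied to the whole frame.
loc_offset_fc = make_fc_stack([FEATURE_DIM, 32, 32, 2], rng)


def predict_offsets(feature):
    """Return point-level offsets (classification) and one frame-level offset."""
    flat = fc_forward(cls_offset_fc, feature)
    point_offsets = [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]
    dx, dy = fc_forward(loc_offset_fc, feature)
    return point_offsets, (dx, dy)


def shift_box(box, offset):
    """Apply a frame-level offset to a box given as (x, y, w, h)."""
    x, y, w, h = box
    return (x + offset[0], y + offset[1], w, h)
```

Each task gets its own predictor, so the region examined for classification can deform point by point while the region examined for positioning moves as a whole.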
S3, calculating classification deviation and positioning deviation respectively by utilizing the full connection layer, and generating a loss function of the decoupling pedestrian detection model according to the classification deviation and the positioning deviation.
In the embodiment of the invention, the full connection layer is a 3-layer full connection network, and the full connection layer can predict the deviation between the classification frame and the positioning frame output by the classification network and the positioning network in the decoupling pedestrian detection model and the classification frame and the positioning frame of the real position to obtain the classification deviation and the positioning deviation.
The classification network and the positioning network in the decoupling pedestrian detection model are used for classification and positioning respectively, and output a feature map with a classification frame and a feature map with a positioning frame respectively. Further, calculating the classification deviation and the positioning deviation by using the fully connected layers includes: determining, through the classification fully connected layer, whether each pixel in the classification frame predicted by the classification network contains the target class, and counting the pixels that do not contain the target class to obtain the classification deviation; and calculating, through the positioning fully connected layer, the intersection over union (IoU) of the positioning frame predicted by the positioning network and the actual frame to obtain the positioning deviation. The positioning frame can be represented by (x, y, w, h), where (x, y) are the center coordinates of the positioning frame and w and h are its width and height.
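The two deviations can be illustrated with a short plain-Python sketch. Several points are assumptions rather than statements of the patent: the positioning deviation is taken as 1 - IoU (the text only says it is obtained from the IoU), the target class is given as a per-pixel boolean mask, box edges are truncated to integer pixel indices, and all function names are hypothetical.

```python
def to_corners(box):
    """Convert (x, y, w, h) with center (x, y) to (x0, y0, x1, y1)."""
    x, y, w, h = box
    return (x - w / 2.0, y - h / 2.0, x + w / 2.0, y + h / 2.0)


def iou(box_a, box_b):
    """Intersection over union of two center-format boxes."""
    ax0, ay0, ax1, ay1 = to_corners(box_a)
    bx0, by0, bx1, by1 = to_corners(box_b)
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def positioning_deviation(predicted_box, actual_box):
    # Assumption: a perfect overlap gives zero deviation.
    return 1.0 - iou(predicted_box, actual_box)


def classification_deviation(target_mask, box):
    """Count pixels inside the classification frame that lack the target class.

    target_mask[row][col] is True where the pixel contains the target class.
    """
    x0, y0, x1, y1 = to_corners(box)
    rows = range(max(0, int(y0)), min(len(target_mask), int(y1)))
    cols = range(max(0, int(x0)), min(len(target_mask[0]), int(x1)))
    return sum(1 for r in rows for c in cols if not target_mask[r][c])
```

For example, two 2x2 boxes whose centers are one pixel apart horizontally overlap in an area of 2 out of a union of 6, giving an IoU of 1/3.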
In detail, the generating a loss function of the decoupled pedestrian detection model according to the classification bias and the positioning bias includes:
acquiring an original loss function of the traditional detection model;
and adding classification deviation and positioning deviation in the original loss function to obtain a loss function of the decoupling pedestrian detection model.
Further, the original loss function in the embodiment of the present invention includes:
[Equation image BDA0002876979230000071; not reproduced in this text]
wherein $L(p_i, t_i)$ is the loss value; $N$ is the total number of samples in the training data set; the symbol shown in image BDA0002876979230000072 is the classification loss; $p_i$ is the predicted label output by the conventional detection model; the symbol shown in image BDA0002876979230000073 is the real label; the symbol shown in image BDA0002876979230000074 is the positioning loss; $t_i$ is the predicted positioning frame position information output by the conventional detection model; the symbol shown in image BDA0002876979230000075 is the position information of the real positioning frame; and $\lambda$ is a preset coefficient.
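The equation and four of its symbols appear only as image placeholders in this text, so the exact form cannot be recovered. One plausible reading, assuming the common multi-task detection loss in which a classification term and a positioning term are each averaged over the N samples and combined through λ, is sketched below. The symbol names $L_{cls}$, $L_{loc}$, $p_i^{*}$ and $t_i^{*}$ are assumptions chosen to match the surrounding definitions. Whether the positioning term carries any additional per-sample weighting cannot be determined from the text.

```latex
% Assumed reconstruction; the original equation image is not available.
L(p_i, t_i) = \frac{1}{N} \sum_{i} L_{cls}\left(p_i, p_i^{*}\right)
            + \lambda \, \frac{1}{N} \sum_{i} L_{loc}\left(t_i, t_i^{*}\right)
```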
The loss function of the decoupling pedestrian detection model provided by the embodiment of the invention is:
$\mathrm{Loss} = L(p_i, t_i) + E_c + E_l$
wherein $L(p_i, t_i)$ is the base loss value, $E_c$ is the classification deviation, and $E_l$ is the positioning deviation.
S4, training the decoupling pedestrian detection model by using the training image set and the loss function to obtain the trained decoupling pedestrian detection model.
In detail, the training of the decoupling pedestrian detection model by using the training image set and the loss function until a preset convergence condition is reached to obtain a trained decoupling pedestrian detection model includes:
inputting the training image set into the decoupling pedestrian detection model for classification recognition and positioning detection to obtain a prediction result, wherein the prediction result comprises category information and position information;
calculating a confidence level of the predicted result using the loss function;
and updating the parameters of the decoupling pedestrian detection model according to the confidence degree, returning to the step of inputting the training image set into the decoupling pedestrian detection model for classification recognition and positioning detection to obtain a prediction result, and obtaining the trained decoupling pedestrian detection model until a preset convergence condition is reached.
The convergence condition means that the current confidence degree is greater than the sum of the last calculated confidence degree and a preset confidence degree threshold value.
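The iteration described above can be sketched as a loop that takes the stated convergence condition literally. The callables `model_step` (one pass of prediction and parameter update) and `score_fn` (the current confidence), as well as the `max_rounds` safety limit, are hypothetical placeholders, not part of the patent.

```python
def train(model_step, score_fn, confidence_threshold, max_rounds=100):
    """Repeat training rounds until the stated convergence condition holds.

    model_step() runs one pass over the training image set and updates the
    model parameters; score_fn() returns the current confidence.
    Returns (rounds_run, last_confidence).
    """
    previous = score_fn()
    for round_index in range(1, max_rounds + 1):
        model_step()
        current = score_fn()
        # Stated condition: the current confidence exceeds the previous
        # confidence plus the preset confidence threshold.
        if current > previous + confidence_threshold:
            return round_index, current
        previous = current
    # Assumed safeguard: stop after max_rounds even without convergence.
    return max_rounds, previous
```

With a simulated confidence sequence of 0.50, 0.52, 0.53, 0.70 and a threshold of 0.1, the loop stops in the third round, the first round in which the confidence rises by more than the threshold.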
The loss function used in the training of the decoupling pedestrian detection model in the embodiment of the invention not only comprises the loss function of the original common pedestrian detection task, namely the integral error of the model, but also comprises the loss of the classification network in the model, namely the classification deviation, and the loss of the positioning network in the model, namely the positioning deviation, so that the optimization of the classification network and the positioning network is facilitated, and the accuracy of the detection result is improved.
S5, detecting the image to be detected by using the decoupling pedestrian detection model to obtain pedestrian detection information.
The image to be detected in the embodiment of the invention can be a real-time monitoring image during automatic driving. In order to further ensure the privacy and security of the image to be detected, the image to be detected may be stored in a node of a block chain.
In detail, the detecting the image to be detected by using the decoupling pedestrian detection model to obtain the pedestrian detection information includes:
extracting features of the image to be detected by using the feature extraction network of the decoupling pedestrian detection model to obtain a feature map;
classifying the feature map by using the classification network of the decoupling pedestrian detection model to obtain pedestrian classification information;
locating pedestrians in the feature map by using the positioning network of the decoupling pedestrian detection model to obtain position information;
and combining the pedestrian classification information and the position information to obtain pedestrian detection information corresponding to the image to be detected.
According to the embodiment of the invention, decoupling the classification and positioning tasks improves the accuracy of model detection. During detection, the image is input into the model, its features are extracted by the backbone network, and the features are then input into the classification network and the positioning network of the model respectively to obtain the position information of pedestrians.
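The detection flow can be sketched as a small function in which one shared feature map feeds two independent branches. The three callables stand in for the feature extraction, classification and positioning networks; their names and the dictionary layout of the result are hypothetical.

```python
def detect(image, extract_features, classify, locate):
    """Run the decoupled detection flow on one image.

    extract_features, classify and locate are placeholders for the feature
    extraction network, the classification network and the positioning
    network of the trained model.
    """
    feature_map = extract_features(image)
    # The two branches read the same feature map but do not feed each other.
    classification = classify(feature_map)
    position = locate(feature_map)
    return {"classification": classification, "position": position}
```

Because neither branch depends on the output of the other, the two calls could also run in parallel.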
According to the embodiment of the invention, the classification and positioning tasks of the conventional detection model are spatially decomposed, the model is trained by using the training image set, and the classification deviation and the positioning deviation are added to the loss function. The classification network and the regression (positioning) network thus each learn their own adapted candidate frame and feature extractor, and the final predictors no longer share the same input and feature extractor. This minimizes the conflict caused by inconsistent optimization targets, improves pedestrian detection performance, reduces misidentification and inaccurate bounding boxes, and improves detection accuracy. Therefore, the pedestrian detection method, the pedestrian detection device, the electronic device and the computer-readable storage medium can solve the problem of low accuracy of the pedestrian detection result.
Fig. 2 is a functional block diagram of a pedestrian detection device according to an embodiment of the present invention.
The pedestrian detection apparatus 100 according to the present invention may be installed in an electronic device. According to the implemented functions, the pedestrian detection apparatus 100 may include a training data set module 101, a model construction module 102, a loss function module 103, a model training module 104, and a detection module 105. The module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the training data set module 101 is configured to acquire a pedestrian detection data set, and perform data enhancement processing on the pedestrian detection data set to obtain a training image set.
In the embodiment of the invention, the pedestrian detection data set is the KITTI data set, which comprises real image data of pedestrians captured in various scenes such as urban areas, rural areas and expressways. Further, the pedestrian detection data set may be obtained from a public online platform.
In detail, when the pedestrian detection data set is subjected to data enhancement processing to obtain a training image set, the training data set module 101 specifically executes the following steps:
marking the pedestrian detection data set according to the position of the pedestrian to obtain a first pedestrian data set with a label;
performing geometric transformation processing or Gaussian noise processing on the first pedestrian data set to obtain a second pedestrian data set;
and collecting the first pedestrian data set and the second pedestrian data set to obtain a training image set.
Further, performing the geometric transformation processing or the Gaussian noise processing on the first pedestrian data set includes: randomly flipping the images in the first pedestrian data set left-right, rotating and cropping them, adding Gaussian white noise of different intensities, and applying the corresponding transformation to the labels marked in the images.
According to the embodiment of the invention, the data enhancement processing is carried out on the pedestrian detection data set, so that the number of image samples is enriched, the learning of a detection model is facilitated, and the accuracy of identification and detection can be improved.
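The data enhancement described above can be illustrated with a minimal sketch. It is not the embodiment's implementation: the function names, the list-of-lists image representation, the centre-format (x, y, w, h) labels and the 0.5 flip probability are all assumptions made for illustration, and only the left-right flip and the Gaussian noise step are shown.

```python
import random


def hflip_with_boxes(image, boxes):
    """Flip an image left-right and mirror its (x, y, w, h) centre-format boxes."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # Only the centre x-coordinate changes; with coordinates in [0, width],
    # a point at x moves to width - x.
    new_boxes = [(width - x, y, w, h) for (x, y, w, h) in boxes]
    return flipped, new_boxes


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel and clamp to [0, 255]."""
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def augment(image, boxes, sigma=5.0, seed=0):
    """Return one augmented copy of (image, boxes); labels follow the geometry."""
    rng = random.Random(seed)
    out_img, out_boxes = image, list(boxes)
    if rng.random() < 0.5:  # assumed flip probability
        out_img, out_boxes = hflip_with_boxes(out_img, out_boxes)
    out_img = add_gaussian_noise(out_img, sigma, rng)
    return out_img, out_boxes
```

The noise step leaves the labels untouched, whereas every geometric step has to update them; rotation and cropping would need their own label transforms in the same style.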
The model building module 102 is configured to decompose a classification positioning task in a conventional detection model, and add a full connection layer to obtain a decoupling pedestrian detection model.
In the embodiment of the invention, the traditional detection model is a deep learning network based on a target detection algorithm, such as a fast-RCNN model, an SSD model or a YOLO model. In a traditional detection model, classification and regression positioning are generally learned jointly and share both the candidate region frames in which an object potentially exists and the feature extractor.
Optionally, the conventional detection model includes a feature extraction network and a classification detection network, where the feature extraction network is used to extract features, and the classification detection network is used for classification and regression positioning.
The full connection layer is a three-layer network comprising a plurality of neurons. Each neuron contains an activation function, such as a ReLU function, and is fully connected with all neurons in the previous layer. The full connection layer can be used for predicting offsets.
In detail, the model building module 102 is specifically configured to:
respectively constructing a classification network and a positioning network based on a deep learning algorithm;
replacing a classification detection network in the conventional detection model with the classification network and the positioning network;
and connecting the pre-constructed classification full-connection layer behind the classification network and connecting the pre-constructed positioning full-connection layer behind the positioning network to obtain the decoupling pedestrian detection model.
The classification network is a convolutional neural network only used for classification, and the classification network can pay more attention to semantic features of images and classify the images according to the semantic features. The positioning network is a convolutional neural network only used for positioning, the positioning network can pay more attention to the position information of the image, namely the boundary frame of the object, and the object in the image is positioned according to the position information.
Further, the building of the classification network and the positioning network based on the deep learning algorithm includes: selecting a plurality of optional network operations according to the characteristics of the classification task, and connecting the plurality of optional network operations to obtain a classification network; and selecting a plurality of optional network operations according to the characteristics of the positioning task, and connecting the plurality of optional network operations to obtain the positioning network. Wherein the selectable network operations include a 1x1 convolution, a 3x3 convolution, a 5x5 convolution, a 7x7 convolution, a maximum pooling layer, and an average pooling layer.
Further, the replacing the classification detection network in the conventional detection model with the classification network and the positioning network according to the embodiment of the present invention includes deleting the classification detection network in the conventional detection model; and connecting the classification network and the positioning network in parallel after the feature extraction network in the traditional detection model.
According to the embodiment of the invention, classification and positioning tasks of a traditional detection model are decomposed in space, and deformation of a classification frame and deformation of a positioning frame are learned by using a three-layer full-connection network respectively, wherein the deformation of the classification frame is offset at a point level, and the deformation of the positioning frame is integral offset at a frame level. The areas concerned by the classification and positioning tasks can be obtained by combining the learned offset with the positions of the original classification frame and positioning frame, and the classification and positioning accuracy is improved by separating the classification from the positioning.
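The difference between a frame-level offset and point-level offsets can be sketched as follows. This is an illustrative assumption rather than the embodiment's code: the helper names, the k x k sampling grid and the convention that the frame-level offset is expressed as a fraction of the frame's width and height are all hypothetical.

```python
def shift_box(box, offset):
    """Apply one frame-level offset (dx, dy) to an (x, y, w, h) centre-format box.

    Assumed convention: dx and dy are fractions of the box width and height.
    """
    x, y, w, h = box
    dx, dy = offset
    return (x + dx * w, y + dy * h, w, h)


def sample_points(box, k):
    """Return a regular k x k grid of sampling points (cell centres) in the box."""
    x, y, w, h = box
    left, top = x - w / 2.0, y - h / 2.0
    return [(left + (j + 0.5) * w / k, top + (i + 0.5) * h / k)
            for i in range(k) for j in range(k)]


def shift_points(points, offsets):
    """Apply point-level offsets: one independent (dx, dy) per sampling point."""
    return [(px + dx, py + dy) for (px, py), (dx, dy) in zip(points, offsets)]
```

In this sketch the positioning branch moves the whole frame with a single pair of numbers, while the classification branch can move each sampling point independently, which lets it attend to an irregular region.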
The loss function module 103 is configured to calculate a classification deviation and a positioning deviation respectively by using the full connection layer, and generate a loss function of the decoupling pedestrian detection model according to the classification deviation and the positioning deviation.
In the embodiment of the invention, the full connection layer is a three-layer fully connected network. It predicts the deviation between the classification frame output by the classification network and the classification frame at the real position, and the deviation between the positioning frame output by the positioning network and the positioning frame at the real position, to obtain the classification deviation and the positioning deviation.
The classification network and the positioning network in the decoupling pedestrian detection model are used for classification and positioning respectively, and output a feature map with a classification frame and a feature map with a positioning frame respectively. Further, calculating the classification deviation and the positioning deviation by using the full connection layer includes: judging, through the classification full-connection layer, whether each pixel point in the classification frame predicted by the classification network contains the target class, and counting the pixel points that do not contain the target class to obtain the classification deviation; and calculating, through the positioning full-connection layer, the intersection over union (IoU) of the positioning frame predicted by the positioning network and the actual frame to obtain the positioning deviation. The positioning frame can be represented by (x, y, w, h), where (x, y) are the center coordinates of the positioning frame and w and h are its width and height.
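A minimal sketch of the two deviations follows, under stated assumptions. The text only says that the positioning deviation is obtained from the IoU and that the classification deviation is obtained by counting pixels without the target class; turning these into `1 - IoU` and into a fraction of pixels, as done here, is an assumption, and all function names are hypothetical.

```python
def to_corners(box):
    """Convert a centre-format (x, y, w, h) box to (x1, y1, x2, y2) corners."""
    x, y, w, h = box
    return (x - w / 2.0, y - h / 2.0, x + w / 2.0, y + h / 2.0)


def iou(box_a, box_b):
    """Intersection over union of two centre-format (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def positioning_deviation(pred_box, true_box):
    """Assumed form: 1 - IoU, so a perfect overlap gives a deviation of 0."""
    return 1.0 - iou(pred_box, true_box)


def classification_deviation(target_mask):
    """Fraction of pixels inside the classification frame lacking the target class.

    target_mask is a 2D list of truthy/falsy values, one per pixel in the frame.
    Normalising the count by the number of pixels is an assumption.
    """
    pixels = [p for row in target_mask for p in row]
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if not p) / len(pixels)
```

For example, two 2x2 frames whose centres are one unit apart horizontally overlap in a 1x2 strip, giving an IoU of 2 / 6 and a positioning deviation of about 0.667.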
In detail, when generating the loss function of the decoupled pedestrian detection model according to the classification bias and the positioning bias, the loss function module 103 specifically performs the following steps:
acquiring an original loss function of the traditional detection model;
and adding classification deviation and positioning deviation in the original loss function to obtain a loss function of the decoupling pedestrian detection model.
Further, the original loss function in the embodiment of the present invention takes the following form:
$$L(p_i, t_i) = \frac{1}{N}\sum_{i} L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N}\sum_{i} L_{loc}(t_i, t_i^*)$$
where $L(p_i, t_i)$ is the loss value, $N$ is the total number of samples in the training data set, $L_{cls}$ is the classification loss, $p_i$ is the predicted label output by the traditional detection model, $p_i^*$ is the real label, $L_{loc}$ is the positioning loss, $t_i$ is the predicted positioning frame position information output by the traditional detection model, $t_i^*$ is the position information of the real positioning frame, and $\lambda$ is a preset weighting coefficient.
The loss function of the decoupling pedestrian detection model provided by the embodiment of the invention is as follows:
$$Loss = L(p_i, t_i) + E_c + E_l$$
where $L(p_i, t_i)$ is the base loss value, $E_c$ is the classification deviation, and $E_l$ is the positioning deviation.
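As a worked illustration of how the three terms combine, the sketch below adds the two deviations to a base loss. The `base_loss` helper assumes that the base loss averages per-sample classification and positioning losses and weights the latter by λ; that form, the function names and the numeric inputs are assumptions for illustration only.

```python
def base_loss(cls_losses, loc_losses, lam=1.0):
    """Assumed base loss: mean classification loss + lam * mean positioning loss."""
    n = len(cls_losses)
    if n == 0 or n != len(loc_losses):
        raise ValueError("need the same non-zero number of per-sample losses")
    return sum(cls_losses) / n + lam * sum(loc_losses) / n


def decoupled_loss(base, classification_deviation, positioning_deviation):
    """Loss = base loss + classification deviation + positioning deviation."""
    return base + classification_deviation + positioning_deviation


# Two samples with made-up per-sample losses and made-up deviations.
example_base = base_loss([0.2, 0.4], [0.1, 0.3], lam=2.0)  # 0.3 + 2 * 0.2
example_total = decoupled_loss(example_base, 0.25, 0.1)
```

Because the deviations enter as plain additive terms, each branch receives its own error signal in addition to the overall model error.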
The model training module 104 is configured to train the decoupling pedestrian detection model by using the training image set and the loss function, so as to obtain a trained decoupling pedestrian detection model.
In detail, the model training module 104 is specifically configured to:
inputting the training image set into the decoupling pedestrian detection model for classification recognition and positioning detection to obtain a prediction result, wherein the prediction result comprises category information and position information;
calculating a confidence level of the predicted result using the loss function;
and updating the parameters of the decoupling pedestrian detection model according to the confidence level, and returning to the step of inputting the training image set into the decoupling pedestrian detection model for classification recognition and positioning detection to obtain a prediction result, until a preset convergence condition is reached, thereby obtaining the trained decoupling pedestrian detection model.
The convergence condition means that the current confidence level is greater than the sum of the previously calculated confidence level and a preset confidence threshold.
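The iteration described above can be sketched as a small loop. The sketch follows the stopping rule literally as the text states it; `step` is a hypothetical callable that runs one pass (prediction, scoring and parameter update) and returns the resulting confidence level, and `max_rounds` is an added safeguard that the text does not mention.

```python
def train_until_converged(step, threshold, max_rounds=100):
    """Repeat training passes until the stated convergence condition holds.

    step: hypothetical callable, runs one pass and returns a confidence level.
    threshold: the preset confidence threshold from the convergence condition.
    Returns the list of confidence levels observed, one per pass.
    """
    history = []
    previous = None
    for _ in range(max_rounds):
        current = step()
        history.append(current)
        # Condition as stated: current > previous + preset threshold.
        if previous is not None and current > previous + threshold:
            break
        previous = current
    return history
```

Many training loops instead stop once the improvement between passes falls below a threshold; the comparison is kept in one line here so that either rule can be swapped in.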
The loss function used in the training of the decoupling pedestrian detection model in the embodiment of the invention not only comprises the loss function of the original common pedestrian detection task, namely the integral error of the model, but also comprises the loss of the classification network in the model, namely the classification deviation, and the loss of the positioning network in the model, namely the positioning deviation, so that the optimization of the classification network and the positioning network is facilitated, and the accuracy of the detection result is improved.
The detection module 105 is configured to detect an image to be detected by using the decoupling pedestrian detection model to obtain pedestrian detection information.
The image to be detected in the embodiment of the invention may be a real-time monitoring image captured during automatic driving. To further ensure the privacy and security of the image to be detected, the image may be stored in a node of a blockchain.
In detail, the detecting the image to be detected by using the decoupling pedestrian detection model to obtain the pedestrian detection information includes:
extracting the features of the image to be detected by using the feature extraction network of the decoupling pedestrian detection model to obtain a feature map;
classifying the feature map by using the classification network of the decoupling pedestrian detection model to obtain pedestrian classification information;
positioning the feature map by using the positioning network of the decoupling pedestrian detection model to obtain position information;
and collecting the pedestrian classification information and the position information to obtain pedestrian detection information corresponding to the image to be detected.
According to the embodiment of the invention, the classification and positioning tasks are decoupled, which improves the accuracy of model detection. During detection, the image is input into the model, its features are extracted by the backbone network, and the features are then input into the classification network and the positioning network of the model respectively to obtain the pedestrian detection information.
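The detection flow can be summarized in a short sketch. The three callables stand in for the feature extraction network, the classification network and the positioning network; their names, and the dictionary returned, are hypothetical and chosen only to mirror the category information and position information mentioned in the text.

```python
def detect(image, extract_features, classify, locate):
    """Run the decoupled detection flow on one image.

    extract_features, classify and locate are placeholder callables for the
    backbone, the classification network and the positioning network.
    """
    features = extract_features(image)   # shared backbone, run once
    category = classify(features)        # classification branch
    position = locate(features)          # positioning branch, run in parallel
    return {"category": category, "position": position}


# Toy stand-ins, used only to show the data flow.
result = detect(
    [[1, 2], [3, 4]],
    extract_features=lambda img: sum(sum(row) for row in img),
    classify=lambda f: ["pedestrian"] if f > 5 else [],
    locate=lambda f: [(1.0, 1.0, 2.0, 2.0)],
)
```

Both branches read the same feature map but are otherwise independent, which is what allows them to be trained toward different objectives.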
Fig. 3 is a schematic structural diagram of an electronic device for implementing a pedestrian detection method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a pedestrian detection program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as a code of the pedestrian detection program 12, but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by operating or executing programs or modules (e.g., pedestrian detection programs, etc.) stored in the memory 11 and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The pedestrian detection program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
acquiring a pedestrian detection data set, and performing data enhancement processing on the pedestrian detection data set to obtain a training image set;
decomposing a classification positioning task in a traditional detection model, and adding a full-connection layer to obtain a decoupling pedestrian detection model;
respectively calculating classification deviation and positioning deviation by utilizing the full connection layer, and generating a loss function of the decoupling pedestrian detection model according to the classification deviation and the positioning deviation;
training the decoupling pedestrian detection model by using the training image set and the loss function to obtain a trained decoupling pedestrian detection model;
and detecting the image to be detected by utilizing the decoupling pedestrian detection model to obtain pedestrian detection information.
Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiments corresponding to fig. 1 to fig. 3, which is not repeated herein.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
acquiring a pedestrian detection data set, and performing data enhancement processing on the pedestrian detection data set to obtain a training image set;
decomposing a classification positioning task in a traditional detection model, and adding a full-connection layer to obtain a decoupling pedestrian detection model;
respectively calculating classification deviation and positioning deviation by utilizing the full connection layer, and generating a loss function of the decoupling pedestrian detection model according to the classification deviation and the positioning deviation;
training the decoupling pedestrian detection model by using the training image set and the loss function to obtain a trained decoupling pedestrian detection model;
and detecting the image to be detected by utilizing the decoupling pedestrian detection model to obtain pedestrian detection information.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
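The idea of data blocks linked by a cryptographic method can be illustrated with a toy hash chain. This is only a sketch of the linking principle, not a blockchain platform: there is no network, no consensus mechanism and no encryption, and the block layout and function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def make_block(payload, prev_hash):
    """Create a block whose hash covers both its payload and its predecessor."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"payload": payload, "prev_hash": prev_hash, "hash": digest}


def verify_chain(chain):
    """Check that every block links to its predecessor and matches its own hash."""
    prev_hash = GENESIS_HASH
    for block in chain:
        expected = make_block(block["payload"], block["prev_hash"])["hash"]
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash covers the previous block's hash, altering any stored payload invalidates that block and every block after it, which is the anti-counterfeiting property referred to above.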
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A pedestrian detection method, characterized in that the method comprises:
acquiring a pedestrian detection data set, and performing data enhancement processing on the pedestrian detection data set to obtain a training image set;
decomposing a classification positioning task in a traditional detection model, and adding a full-connection layer to obtain a decoupling pedestrian detection model;
respectively calculating classification deviation and positioning deviation by utilizing the full connection layer, and generating a loss function of the decoupling pedestrian detection model according to the classification deviation and the positioning deviation;
training the decoupling pedestrian detection model by using the training image set and the loss function to obtain a trained decoupling pedestrian detection model;
and detecting the image to be detected by utilizing the decoupling pedestrian detection model to obtain pedestrian detection information.
2. The pedestrian detection method of claim 1, wherein decomposing the classification and positioning task in the conventional detection model and adding a full connection layer to obtain a decoupled pedestrian detection model comprises:
respectively constructing a classification network and a positioning network based on a deep learning algorithm;
replacing a classification detection network in a conventional detection model with the classification network and the positioning network;
and connecting the pre-constructed classification full-connection layer behind the classification network and connecting the pre-constructed positioning full-connection layer behind the positioning network to obtain the decoupling pedestrian detection model.
3. The pedestrian detection method of claim 2, wherein the using the classification network and the positioning network to replace a classification detection network in a conventional detection model comprises:
deleting the classified detection network in the traditional detection model;
and connecting the classification network and the positioning network in parallel to a feature extraction network in the traditional detection model.
4. The pedestrian detection method of claim 1, wherein the separately calculating a classification bias and a localization bias using the fully-connected layers comprises:
judging, through the classification full-connection layer, whether each pixel point in a classification frame predicted by the classification network contains a target class, and counting the pixel points that do not contain the target class to obtain the classification deviation;
and calculating, through the positioning full-connection layer, the intersection over union of the positioning frame predicted by the positioning network and the actual frame to obtain the positioning deviation.
5. The pedestrian detection method of claim 1, wherein the generating a loss function of the decoupled pedestrian detection model from the classification bias and the localization bias comprises:
acquiring an original loss function of the traditional detection model;
and adding classification deviation and positioning deviation in the original loss function to obtain a loss function of the decoupling pedestrian detection model.
6. The pedestrian detection method of claim 1, wherein the data enhancement processing of the pedestrian detection data set to obtain a training image set comprises:
marking the pedestrian detection data set according to the position of the pedestrian to obtain a first pedestrian data set with a label;
performing geometric transformation processing or Gaussian noise processing on the first pedestrian data set to obtain a second pedestrian data set;
and collecting the first pedestrian data set and the second pedestrian data set to obtain a training image set.
7. The pedestrian detection method according to any one of claims 1 to 6, wherein the detecting the image to be detected by using the decoupled pedestrian detection model to obtain the pedestrian detection information comprises:
extracting the features of the image to be detected by using the feature extraction network of the decoupling pedestrian detection model to obtain a feature map;
classifying the feature map by using the classification network of the decoupling pedestrian detection model to obtain pedestrian classification information;
positioning the feature map by using the positioning network of the decoupling pedestrian detection model to obtain position information;
and collecting the pedestrian classification information and the position information to obtain pedestrian detection information corresponding to the image to be detected.
8. A pedestrian detection device, characterized in that the device comprises:
the training data set module is used for acquiring a pedestrian detection data set and performing data enhancement processing on the pedestrian detection data set to obtain a training image set;
the model construction module is used for decomposing the classification positioning task in the traditional detection model and adding a full connection layer to obtain a decoupling pedestrian detection model;
the loss function module is used for respectively calculating classification deviation and positioning deviation by utilizing the full-connection layer and generating a loss function of the decoupling pedestrian detection model according to the classification deviation and the positioning deviation;
the model training module is used for training the decoupling pedestrian detection model by using the training image set and the loss function to obtain a trained decoupling pedestrian detection model;
and the detection module is used for detecting the image to be detected by utilizing the decoupling pedestrian detection model to obtain the pedestrian detection information.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the pedestrian detection method of any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the pedestrian detection method according to any one of claims 1 to 7.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011637418.5A CN112749653A (en) 2020-12-31 2020-12-31 Pedestrian detection method, device, electronic equipment and storage medium
PCT/CN2021/083707 WO2022141858A1 (en) 2020-12-31 2021-03-30 Pedestrian detection method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011637418.5A CN112749653A (en) 2020-12-31 2020-12-31 Pedestrian detection method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112749653A true CN112749653A (en) 2021-05-04

Family

ID=75651116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011637418.5A Pending CN112749653A (en) 2020-12-31 2020-12-31 Pedestrian detection method, device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112749653A (en)
WO (1) WO2022141858A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116383427B (en) * 2023-06-06 2023-08-11 深圳市微克科技有限公司 Picture batch analysis method, system and medium based on intelligent wearable device
CN116721095B (en) * 2023-08-04 2023-11-03 杭州瑞琦信息技术有限公司 Aerial photographing road illumination fault detection method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102592076B1 (en) * 2015-12-14 2023-10-19 삼성전자주식회사 Appartus and method for Object detection based on Deep leaning, apparatus for Learning thereof
US11188794B2 (en) * 2017-08-10 2021-11-30 Intel Corporation Convolutional neural network framework using reverse connections and objectness priors for object detection
CN108376235A (en) * 2018-01-15 2018-08-07 深圳市易成自动驾驶技术有限公司 Image detecting method, device and computer readable storage medium
CN111160379B (en) * 2018-11-07 2023-09-15 北京嘀嘀无限科技发展有限公司 Training method and device of image detection model, and target detection method and device
CN111950329A (en) * 2019-05-16 2020-11-17 长沙智能驾驶研究院有限公司 Target detection and model training method and device, computer equipment and storage medium
CN111062413B (en) * 2019-11-08 2024-05-07 熊猫汽车(上海)有限公司 Road target detection method and device, electronic equipment and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255766A (en) * 2021-05-25 2021-08-13 平安科技(深圳)有限公司 Image classification method, device, equipment and storage medium
CN113255766B (en) * 2021-05-25 2023-12-22 平安科技(深圳)有限公司 Image classification method, device, equipment and storage medium
CN113743256A (en) * 2021-08-17 2021-12-03 武汉大学 Construction site safety intelligent early warning method and device
CN113743256B (en) * 2021-08-17 2023-12-26 武汉大学 Intelligent early warning method and device for site safety
CN115861316A (en) * 2023-02-27 2023-03-28 深圳佑驾创新科技有限公司 Pedestrian detection model training method and device and pedestrian detection method
CN115861316B (en) * 2023-02-27 2023-09-29 深圳佑驾创新科技股份有限公司 Training method and device for pedestrian detection model and pedestrian detection method

Also Published As

Publication number Publication date
WO2022141858A1 (en) 2022-07-07

Similar Documents

Publication Publication Date Title
CN112749653A (en) Pedestrian detection method, device, electronic equipment and storage medium
CN110033018B (en) Graph similarity judging method and device and computer readable storage medium
CN112395978A (en) Behavior detection method and device and computer readable storage medium
CN112447189A (en) Voice event detection method and device, electronic equipment and computer storage medium
CN112699775A (en) Certificate identification method, device and equipment based on deep learning and storage medium
CN112137591B (en) Target object position detection method, device, equipment and medium based on video stream
CN112396005A (en) Biological characteristic image recognition method and device, electronic equipment and readable storage medium
CN111695609A (en) Target damage degree determination method, target damage degree determination device, electronic device, and storage medium
CN113283446A (en) Method and device for identifying target object in image, electronic equipment and storage medium
CN112507934A (en) Living body detection method, living body detection device, electronic apparatus, and storage medium
CN113298159A (en) Target detection method and device, electronic equipment and storage medium
CN112580684A (en) Target detection method and device based on semi-supervised learning and storage medium
CN113516417A (en) Service evaluation method and device based on intelligent modeling, electronic equipment and medium
CN113222063A (en) Express carton garbage classification method, device, equipment and medium
CN111985449A (en) Rescue scene image identification method, device, equipment and computer medium
CN115471775A (en) Information verification method, device and equipment based on screen recording video and storage medium
CN112016617A (en) Fine-grained classification method and device and computer-readable storage medium
CN115830399A (en) Classification model training method, apparatus, device, storage medium, and program product
CN113887439A (en) Automatic early warning method, device, equipment and storage medium based on image recognition
CN113487621A (en) Medical image grading method and device, electronic equipment and readable storage medium
CN115049836B (en) Image segmentation method, device, equipment and storage medium
CN115760854A (en) Deep learning-based power equipment defect detection method and device and electronic equipment
CN112580505B (en) Method and device for identifying network point switch door state, electronic equipment and storage medium
CN114708073A (en) Intelligent detection method and device for surrounding mark and serial mark, electronic equipment and storage medium
CN112132037A (en) Sidewalk detection method, device, equipment and medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination