CN111008576A

CN111008576A - Pedestrian detection and model training and updating method, device and readable storage medium thereof

Info

Publication number: CN111008576A
Application number: CN201911163826.9A
Authority: CN
Inventors: 肖刚; 周捷; 王逸飞; 王正来
Original assignee: Gaochuang Anbang Beijing Technology Co Ltd
Current assignee: Gaochuang Anbang Beijing Technology Co Ltd
Priority date: 2019-11-22
Filing date: 2019-11-22
Publication date: 2020-04-14
Anticipated expiration: 2039-11-22
Also published as: CN111008576B

Abstract

The invention discloses a pedestrian detection method, a method for training and updating the pedestrian detection model, a device, and a readable storage medium. The method for training a pedestrian detection model comprises the following steps: acquiring training video data containing pedestrians; converting the training video data into an image sequence training sample set; labeling pedestrian regions in the image sequence training sample set according to whether the pedestrians are occluded, to obtain positive and negative sample sets; deriving a first training set from the positive and negative sample sets; and iteratively training a deep convolutional neural network model based on a cascade network and feature fusion according to the first training set to obtain the pedestrian detection model. In this training method, occluded pedestrians in the image sequence are given a separate label that distinguishes them from unoccluded pedestrians, which improves the detection accuracy for occluded pedestrians.

Description

Pedestrian detection and model training and updating method, device and readable storage medium thereof

Technical Field

The invention relates to the technical field of pedestrian detection, in particular to a pedestrian detection method, a model training method, a model updating method, equipment and a readable storage medium.

Background

Pedestrian detection is a technology for automatically finding the position and size of pedestrians in an arbitrary input image. It is a key problem in the field of target detection and is widely applied in fields such as automatic driving, video surveillance, biometric recognition and behavior analysis.

In complex real-world environments, pedestrians wear different clothing and are easily confused with the background, and parts of the torso are often occluded. The viewing angle of the surveillance camera, illumination and other interference aggravate the occlusion problem, which is one of the biggest challenges in conventional pedestrian detection. How to detect pedestrians efficiently and accurately, especially in crowded scenes, is a research hotspot and difficulty.

Traditional pedestrian detection methods usually rely on manually designed feature extraction; they achieve good detection results only in specific scenes, and the robustness of the algorithm is difficult to guarantee.

Disclosure of Invention

In view of this, embodiments of the present invention provide a pedestrian detection method, a method and an apparatus for training and updating a model thereof, and a readable storage medium, so as to solve the problem of poor accuracy of the existing pedestrian detection algorithm.

The technical scheme provided by the invention is as follows:

A first aspect of an embodiment of the present invention provides a method for training a pedestrian detection model, where the method includes: acquiring training video data containing pedestrians; converting the training video data into an image sequence training sample set; labeling pedestrian regions in the image sequence training sample set according to whether the pedestrians in the image sequence training sample set are occluded, to obtain positive and negative sample sets; calculating a first training set from the positive and negative sample sets; and iteratively training a deep convolutional neural network model based on a cascade network and feature fusion according to the first training set to obtain the pedestrian detection model.

According to the first aspect, in a first embodiment of the first aspect, the cascade network comprises an anchor point refining module and a target detection module, and iteratively training the deep convolutional neural network model based on the cascade network and feature fusion according to the first training set to obtain the pedestrian detection model comprises: inputting the first training set into the anchor point refining module and calculating a first feature vector; inputting the first feature vector into the target detection module and calculating a second feature vector; inputting the first feature vector and the second feature vector into the feature fusion module respectively and calculating a first loss function and a second loss function; superposing the first loss function and the second loss function to obtain a loss function; and selecting the model with the lowest loss function value as the pedestrian detection model.

According to the first aspect, in a second embodiment of the first aspect, the method for training a pedestrian detection model further comprises: calculating the positive and negative sample sets according to a preset proportion to obtain a first verification set; and verifying the pedestrian detection model according to the first verification set.

According to a second embodiment of the first aspect, in a third embodiment of the first aspect, the method for training a pedestrian detection model further comprises: calculating the positive and negative sample sets according to a preset proportion to obtain a first test set; and testing the verified pedestrian detection model according to the first test set to obtain a test result.

A second aspect of an embodiment of the present invention provides a pedestrian detection method, including: acquiring to-be-detected video data containing pedestrians; converting the to-be-detected video data into an image sequence detection sample set; and inputting the image sequence detection sample set into the pedestrian detection model trained by the method for training a pedestrian detection model according to the first aspect or any embodiment of the first aspect, to obtain a detection result.

A third aspect of the embodiments of the present invention provides a method for updating a pedestrian detection model, where the method includes: acquiring detection results at preset time intervals, where the detection results are obtained by the pedestrian detection method according to the second aspect of the embodiments of the present invention; calculating a second training set from the detection results; obtaining an updated model from the second training set by the method for training a pedestrian detection model according to the first aspect or any embodiment of the first aspect; comparing the accuracy of the updated model with that of the pedestrian detection model; when the accuracy of the pedestrian detection model is lower than that of the updated model, using the updated model as the pedestrian detection model to detect video data containing pedestrians; and when the accuracy of the pedestrian detection model is higher than that of the updated model, detecting the video data containing pedestrians with the pedestrian detection model.

According to the third aspect, in a first embodiment of the third aspect, calculating a second training set from the detection results includes: taking results whose score is higher than a preset value as a positive sample set, and taking results whose score is lower than a preset value as a negative sample set; and calculating the second training set from the positive sample set and the negative sample set.

According to the third aspect, in a second embodiment of the third aspect, comparing the accuracy of the updated model with that of the pedestrian detection model includes: calculating a second test set from the positive sample set and the negative sample set; calculating a second test set from the to-be-detected video data containing pedestrians; and determining the accuracy of the updated model and the pedestrian detection model according to a first test set and the second test set, where the first test set is obtained by the method for training a pedestrian detection model in the third embodiment of the first aspect.

A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to execute the method for training a pedestrian detection model according to the first aspect or any embodiment of the first aspect, or the pedestrian detection method according to the second aspect, or the method for updating a pedestrian detection model according to the third aspect or any embodiment of the third aspect.

A fifth aspect of an embodiment of the present invention provides a pedestrian detection apparatus, including a memory and a processor that are communicatively connected to each other, where the memory stores computer instructions and the processor executes the computer instructions to perform the method for training a pedestrian detection model according to the first aspect or any embodiment of the first aspect, or the pedestrian detection method according to the second aspect, or the method for updating a pedestrian detection model according to the third aspect or any embodiment of the third aspect.

The technical scheme provided by the invention has the following effects:

According to the pedestrian detection method, the model training and updating methods, the device and the readable storage medium provided by the embodiments of the present invention, video data containing pedestrians is acquired and converted into an image sequence, and occluded pedestrians in the image sequence are given a separate label that distinguishes them from unoccluded pedestrians, which improves the detection accuracy for occluded pedestrians. In addition, pedestrian features are extracted by the cascade network and the model performance is improved through iterative training, and feature fusion over the cascade network further improves the detection of small pedestrian targets.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a flow diagram of a method of training a pedestrian detection model according to an embodiment of the invention;

FIG. 2 is a flow diagram of a method of training a pedestrian detection model, according to another embodiment of the invention;

FIG. 3 is a flow chart of a pedestrian detection method according to an embodiment of the invention;

FIG. 4 is a flow chart of a method of updating a pedestrian detection model according to an embodiment of the invention;

FIG. 5 is a flow diagram of a method of updating a pedestrian detection model according to another embodiment of the invention;

FIG. 6 is a flow diagram of a method of updating a pedestrian detection model according to another embodiment of the invention;

FIG. 7 is a schematic hardware structure diagram of a pedestrian detection apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The pedestrian detection technology is a technology for automatically searching the position and size of a pedestrian in an arbitrary input image, and is widely applied to the fields of computer vision, pattern recognition and the like, such as automatic driving, video monitoring, biometric recognition and the like.

In complex real-world environments, pedestrian occlusion is one of the biggest challenges currently facing pedestrian detection, and how to detect pedestrians efficiently and accurately, especially in crowded scenes, is a research hotspot and difficulty. The embodiments of the present invention use deep learning: training video data containing pedestrians is acquired and converted into an image sequence, and the image sequence is labeled according to whether the pedestrians are occluded, so that a pedestrian detection result is obtained.

Deep learning is a learning method that builds a deep structural model. Typical deep learning algorithms include deep belief networks, convolutional neural networks, restricted Boltzmann machines and recurrent neural networks. Deep learning is also known as deep neural networks (referring to neural networks with more than 3 layers). It derives from multilayer neural networks and essentially provides a way of combining feature representation with learning. Deep learning is characterized by giving up interpretability and pursuing only the effectiveness of learning.

Referring to FIG. 1, a method for training a pedestrian detection model according to an embodiment of the present invention mainly includes the following steps:

Step S101: acquiring training video data containing pedestrians. Specifically, the video containing pedestrians may be surveillance video from cameras installed at intersections, or video data from different outdoor settings such as subway exits, supermarket exits, shopping mall exits, train station exits and schools, which is not limited by the present invention.

Step S102: converting the training video data into an image sequence training sample set. An image sequence is a series of images of a target acquired sequentially at different times and from different orientations. A video consists of a series of images called frames, acquired at fixed time intervals (the acquisition rate is called the frame rate, usually expressed in frames per second), so that a scene in motion can be displayed. The training video data may be converted into the image sequence training sample set using existing video conversion software.
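As a minimal sketch of the conversion in step S102, the following computes which frame indices are kept when a video is sampled at a fixed time interval. The function name `sample_frame_indices` and its parameters are hypothetical; the source only states that existing video conversion software may be used and does not specify a sampling interval.

```python
def sample_frame_indices(total_frames: int, fps: float, interval_s: float) -> list:
    """Return the indices of frames kept when sampling one frame every interval_s seconds."""
    if total_frames < 0 or fps <= 0 or interval_s <= 0:
        raise ValueError("total_frames must be >= 0; fps and interval_s must be > 0")
    # Number of source frames between two kept samples (at least 1).
    step = max(1, round(fps * interval_s))
    return list(range(0, total_frames, step))


# A 4-second clip at 25 frames/second, sampled once per second (assumed values).
indices = sample_frame_indices(total_frames=100, fps=25.0, interval_s=1.0)
```

The kept indices can then be passed to whatever decoder extracts the actual images.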

Step S103: labeling the pedestrian regions in the image sequence training sample set according to whether the pedestrians in the image sequence training sample set are occluded, to obtain positive and negative sample sets. The positive sample set is the set of samples containing pedestrians, and the negative sample set is the set of samples containing no pedestrians.

Specifically, all pedestrian regions in the image sequence training sample set can be labeled with rectangular frames and given a Person or Blocked Person label according to whether the pedestrian is occluded, where the Person label indicates that the pedestrian is not occluded and the Blocked Person label indicates that the pedestrian is occluded. Information such as the name and size of each image, the label of each pedestrian region, its completeness, its ease of identification and its coordinates is stored in a corresponding annotation file in the standard PASCAL VOC format. Here, completeness refers to whether the pedestrian in the labeled pedestrian region appears completely in the image; when the pedestrian appears completely, the image is easy to identify. The name of the annotation file can be kept consistent with the corresponding image in the image sequence training sample set, the format of the annotation file can be xml, and the images together with the annotation folder form the positive and negative sample set Y. In addition, in the embodiment of the present invention, the pedestrian regions may be labeled with a self-developed Python tool or with another labeling tool, which is not limited by the present invention.
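The annotation file described above can be sketched with the Python standard library as follows. The mapping of "completeness" to the VOC `truncated` field and of "ease of identification" to the VOC `difficult` field is an assumption, as are the helper name `build_voc_annotation`, the file name and the box coordinates.

```python
import xml.etree.ElementTree as ET


def build_voc_annotation(filename, width, height, objects):
    """Build a PASCAL VOC-style annotation tree for one image.

    objects: list of dicts with keys label, box (xmin, ymin, xmax, ymax),
    truncated (0/1) and difficult (0/1).
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    for obj in objects:
        node = ET.SubElement(root, "object")
        ET.SubElement(node, "name").text = obj["label"]
        # Assumed mapping: completeness -> truncated, ease of identification -> difficult.
        ET.SubElement(node, "truncated").text = str(obj["truncated"])
        ET.SubElement(node, "difficult").text = str(obj["difficult"])
        box = ET.SubElement(node, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), obj["box"]):
            ET.SubElement(box, tag).text = str(value)
    return root


# Hypothetical frame with one unoccluded and one occluded pedestrian.
annotation = build_voc_annotation(
    "frame_000001.jpg", 1920, 1080,
    [
        {"label": "Person", "box": (100, 200, 180, 420), "truncated": 0, "difficult": 0},
        {"label": "Blocked Person", "box": (300, 210, 360, 400), "truncated": 0, "difficult": 1},
    ],
)
xml_text = ET.tostring(annotation, encoding="unicode")
```

Writing `xml_text` to a file named after the image (for example `frame_000001.xml`) keeps the annotation name consistent with the image name, as the text requires.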

Step S104: calculating according to the positive and negative sample sets to obtain a first training set; specifically, the positive and negative sample sets may be divided into a training set and a first test set according to a first preset ratio, and then the training set may be divided into the first training set and a first verification set according to a second preset ratio.
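The two-stage split in step S104 can be sketched as below. The ratios 0.2 and 0.25, the fixed seed and the function name `split_samples` are illustrative assumptions; the source only says that a first and a second preset ratio are used.

```python
import random


def split_samples(samples, test_ratio, val_ratio, seed=0):
    """Split samples into (first_training_set, first_verification_set, first_test_set)."""
    if not (0 <= test_ratio < 1 and 0 <= val_ratio < 1):
        raise ValueError("ratios must lie in [0, 1)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # First preset ratio: carve the first test set out of all samples.
    n_test = int(len(shuffled) * test_ratio)
    test_set, training_pool = shuffled[:n_test], shuffled[n_test:]
    # Second preset ratio: carve the first verification set out of the remainder.
    n_val = int(len(training_pool) * val_ratio)
    val_set, train_set = training_pool[:n_val], training_pool[n_val:]
    return train_set, val_set, test_set


train, val, test = split_samples(range(100), test_ratio=0.2, val_ratio=0.25)
```

Shuffling with a seeded generator keeps the split reproducible, so the same first test set can be reused later when comparing an updated model with the current one.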

Step S105: iteratively training the deep convolutional neural network model based on the cascade network and feature fusion according to the first training set to obtain the pedestrian detection model. Specifically, the deep convolutional neural network model may be a convolutional neural network model that reaches a preset depth, and it comprises a cascade network and a feature fusion module. The cascade network serves as the feature extraction module of the convolutional network; it may comprise a plurality of convolutional layers and pooling layers for extracting pedestrian features, and iterative training improves the performance of the model. The feature fusion module may be a concat layer that fuses the pedestrian features extracted by the cascade network. The best model obtained during iterative training is selected as the pedestrian detection model.

Through steps S101 to S105, the method for training a pedestrian detection model according to the embodiment of the present invention acquires video data containing pedestrians, converts it into an image sequence, and labels occluded pedestrians in the image sequence separately from unoccluded pedestrians, which improves the detection accuracy for occluded pedestrians. Pedestrian features are extracted by the cascade network and the model performance is improved through iterative training, and feature fusion over the cascade network further improves the detection of small pedestrian targets.

As an optional implementation manner of the embodiment of the present invention, the cascade network provided in the foregoing embodiment includes: an anchor point refining module and a target detection module; as shown in fig. 2, the step S105 performs iterative training on the deep convolutional neural network model based on the cascade network and the feature fusion according to the first training set to obtain the pedestrian detection model, which includes the following steps:

Step S201: inputting the first training set into the anchor point refining module and calculating a first feature vector. Specifically, the images in the first training set are obtained and uniformly resized to 320 × 320, and the resized images are input into the anchor point refining module for calculation to obtain the image sizes required by the target detection module.

Step S202: inputting the first feature vector into the target detection module and calculating a second feature vector. Specifically, the image sizes in the first feature vector are obtained, and the feature vectors with sizes of 40 × 40, 20 × 20, 10 × 10 and 5 × 5, produced by the convolution calculations of different layers in the anchor point refining module, are selected and input into the target detection module to calculate the second feature vector. These four sizes are selected in order to obtain the deep features produced in the anchor point refining module. The target detection module may be connected to the anchor point refining module through a transmission link module, through which the images in the first feature vector are input into the target detection module. The target detection module further extracts pedestrian features on the basis of the anchor point refining module: it takes the anchors refined by the anchor point refining module as input and calculates the second feature vector, which includes information such as the position, the coordinates and size of the pedestrian frame, and the Person label, and can be used to further improve the regression of the pedestrian frame and the classification of the labels.

Step S203: inputting the first feature vector and the second feature vector into the feature fusion module respectively and calculating a first loss function and a second loss function. Specifically, the image sizes of the first feature vector and the second feature vector are obtained; from the two sets of feature vectors produced by the convolution calculations of different layers in the target detection module, those with sizes of 40 × 40, 20 × 20, 10 × 10 and 5 × 5 are selected and input into the feature fusion module respectively for convolution calculation, yielding the first loss function and the second loss function.
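The four feature sizes named in step S202 are consistent with a 320 × 320 input downsampled by strides of 8, 16, 32 and 64. Those strides are an assumption for illustration; the source gives only the input size and the resulting sizes. The arithmetic can be checked as follows.

```python
INPUT_SIZE = 320
ASSUMED_STRIDES = (8, 16, 32, 64)  # assumption: strides are not stated in the source


def feature_map_sizes(input_size, strides):
    """Return the side length of the square feature map produced at each stride."""
    sizes = []
    for stride in strides:
        if input_size % stride != 0:
            raise ValueError(f"{input_size} is not divisible by stride {stride}")
        sizes.append(input_size // stride)
    return sizes


sizes = feature_map_sizes(INPUT_SIZE, ASSUMED_STRIDES)
# Total number of grid cells over the four levels, each of which can carry anchors.
total_cells = sum(side * side for side in sizes)
```

Under these assumed strides the coarsest level has only 25 cells while the finest has 1600, which is why the finer levels matter most for small pedestrian targets.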

Step S204: superposing the first loss function and the second loss function, and calculating to obtain a loss function. Specifically, the first loss function may be expressed as

[equation not recoverable from the source text]

where N_arm is the number of anchor points, L_b is the classification loss, used to calculate whether the classification is correct, that is, whether categories such as the image background, the Person label and the Blocked Person label are classified correctly, and L_r is the regression loss, used to calculate the offset between the object detection frame and the real frame. The object detection frame is the coordinates of the pedestrian frame obtained from the first feature vector, and the real frame is the coordinates of the pedestrian frame actually shown in the image. p_i is the probability predicted by the network that an object is present, x_i denotes the detected coordinates, g_i denotes the real coordinates, and i denotes the respective images in the first training set. The second loss function may be expressed as

[equation not recoverable from the source text]

where N_odm is the number of anchor points acquired by the target detection module, L_m is the classification loss, L_r is the regression loss, c_i is the probability that the detection frame belongs to each category, and t_i and g_i denote the detected coordinates and the real coordinates respectively. After the first loss function and the second loss function are calculated, they are superposed to obtain the loss function, which can be represented by formula (1):

[equation not recoverable from the source text]

where the indicator term in formula (1) means that only the loss of the positive samples in the positive and negative sample sets is calculated in the frame regression. Since the positive sample set contains pedestrians and the negative sample set contains the other samples, and the final output requires only the pedestrian frames and not the frames of other categories, only the loss of the positive samples is calculated.
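The equations of step S204 do not survive in the text. As a hedged reconstruction only, the following form is consistent with the symbol definitions given there and with the loss commonly used by two-stage anchor-refinement detectors. The names L_arm and L_odm for the first and second loss functions, the ground-truth label l_i (0 for background) and the indicator bracket [l_i ≥ 1] are assumptions that do not appear in the source.

```latex
% First loss function (anchor point refining module), assumed form
L_{arm} = \frac{1}{N_{arm}} \Big( \sum_i L_b\big(p_i, [l_i \ge 1]\big)
          + \sum_i [l_i \ge 1] \, L_r(x_i, g_i) \Big)

% Second loss function (target detection module), assumed form
L_{odm} = \frac{1}{N_{odm}} \Big( \sum_i L_m(c_i, l_i)
          + \sum_i [l_i \ge 1] \, L_r(t_i, g_i) \Big)

% Superposition of the two losses
L = L_{arm} + L_{odm} \tag{1}
```

In this form the bracket [l_i ≥ 1] equals 1 for positive samples and 0 otherwise, which matches the statement that only the loss of positive samples enters the frame regression.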

Step S205: selecting the model with the lowest loss function value as the pedestrian detection model. Specifically, feature fusion fuses the feature vectors with sizes of 40 × 40, 20 × 20, 10 × 10 and 5 × 5 from the anchor point refining module and the target detection module in order to calculate the loss function value, and the model with the lowest loss function value calculated according to formula (1) is taken as the pedestrian detection model.
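Selecting the model with the lowest loss value, as in step S205, can be sketched as a minimum over recorded checkpoints. The checkpoint names and loss values below are invented for illustration, and `select_best_checkpoint` is a hypothetical helper.

```python
def select_best_checkpoint(loss_by_checkpoint):
    """Return (name, loss) of the checkpoint with the lowest total loss."""
    if not loss_by_checkpoint:
        raise ValueError("no checkpoints recorded")
    name = min(loss_by_checkpoint, key=loss_by_checkpoint.get)
    return name, loss_by_checkpoint[name]


# Hypothetical totals per saved iteration: first loss + second loss, as in formula (1).
history = {
    "iter_1000": 2.31 + 3.10,
    "iter_2000": 1.05 + 1.62,
    "iter_3000": 1.12 + 1.70,
}
best_name, best_loss = select_best_checkpoint(history)
```

In practice the losses compared would usually be measured on held-out data such as the first verification set, although the source does not say which set is used.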

In the embodiment of the invention, the anchor point refining module and the target detection module are built, the first training set is input into the cascade network for calculation, judgment and classification are carried out step by step, regression of a coarse detection frame to a fine detection frame is realized, and the detection precision of the pedestrian detection model is further improved; meanwhile, the loss function is calculated by carrying out feature fusion on the cascade network, so that the detection effect of the pedestrian detection model on the small pedestrian target is further effectively improved.

As an optional implementation manner of the embodiment of the present invention, the method for training a pedestrian detection model provided in the embodiment of the present invention further includes: inputting a first verification set obtained by calculating in the positive and negative sample sets in the embodiment into the obtained pedestrian detection model for verification; in addition, the first test set obtained by calculating the positive and negative sample sets in the embodiment is input into the verified pedestrian detection model for testing, so that a test result is obtained. Whether the detection result of the pedestrian detection model obtained by the embodiment of the invention meets the preset standard can be judged through the verification and test process, for example, the detection result can be divided into different scores, when the detection result of the preset proportion sample set is higher than the preset score, the detection result meets the preset standard, and the pedestrian detection model can be used for pedestrian detection.
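The acceptance rule described above (a preset proportion of samples must score above a preset score) can be sketched as follows. The threshold 0.85, the proportion 0.8 and the sample scores are assumed values, since the source leaves both presets open.

```python
def meets_preset_standard(scores, score_threshold, required_fraction):
    """True if at least required_fraction of the scores exceed score_threshold."""
    if not scores:
        return False
    passed = sum(1 for score in scores if score > score_threshold)
    return passed / len(scores) >= required_fraction


# Hypothetical detection scores on the first test set.
test_scores = [0.95, 0.91, 0.88, 0.97, 0.60]
ok = meets_preset_standard(test_scores, score_threshold=0.85, required_fraction=0.8)
```

Here four of the five scores exceed 0.85, so the model would be accepted under the assumed presets; raising the threshold to 0.9 would reject it.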

An embodiment of the present invention further provides a pedestrian detection method, as shown in fig. 3, the pedestrian detection method includes the following steps:

Step S301: acquiring to-be-detected video data containing pedestrians. Specifically, the to-be-detected video containing pedestrians may be surveillance video from cameras installed at intersections, or video data from different outdoor settings such as subway exits, supermarket exits, shopping mall exits, train station exits and schools, which is not limited by the present invention.

Step S302: converting the to-be-detected video data into an image sequence detection sample set. An image sequence is a series of images of a target acquired sequentially at different times and from different orientations. A video consists of a series of images called frames, acquired at fixed time intervals (the acquisition rate is called the frame rate, usually expressed in frames per second), so that a scene in motion can be displayed. The to-be-detected video data may be converted into the image sequence detection sample set using existing video conversion software.

Step S303: inputting the image sequence detection sample set into the pedestrian detection model obtained by the method for training a pedestrian detection model in the above embodiment to obtain a detection result. Specifically, the image sequence detection sample set may be input into the pedestrian detection model obtained by the training method shown in FIG. 1 to FIG. 2 for detection, so as to obtain the detection result.

Through steps S301 to S303, the pedestrian detection method provided in the embodiment of the present invention acquires video data containing pedestrians, converts the video data into an image sequence, and inputs the image sequence into the pedestrian detection model obtained by the method for training a pedestrian detection model in the above embodiment. Because occluded pedestrians in the image sequence are given a separate label that distinguishes them from unoccluded pedestrians, the partially visible human torso is learned separately, which improves the detection accuracy for occluded pedestrians.

An embodiment of the present invention further provides a method for updating a pedestrian detection model. As shown in FIG. 4, the method for updating a pedestrian detection model includes the following steps:

Step S401: acquiring detection results at preset time intervals, where the detection results are obtained by the pedestrian detection method in the above embodiment. Specifically, video data from different times is detected according to the pedestrian detection method in the above embodiment, and the resulting detection results are collected at intervals.

Step S402: calculating according to the detection result to obtain a second training set; specifically, the data in the detection result may be acquired as the second training set.

Step S403: by adopting the method for training the pedestrian detection model, the updated model is obtained according to the second training set. Specifically, the second training set may be trained according to the method for training a pedestrian detection model shown in fig. 1 to 2, so as to obtain an updated pedestrian detection model.

Step S404: comparing the accuracy of the updated model with that of the pedestrian detection model.

Step S405: when the accuracy of the pedestrian detection model is lower than that of the updated model, using the updated model as the pedestrian detection model to detect video data containing pedestrians.

Step S406: when the accuracy of the pedestrian detection model is higher than that of the updated model, continuing to detect video data containing pedestrians with the pedestrian detection model.
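The decision in steps S404 to S406 reduces to a comparison of two accuracy values. The sketch below uses a hypothetical function `choose_model`; keeping the current model when the two accuracies are equal is an assumption, because the source does not address that case.

```python
def choose_model(current_accuracy, updated_accuracy):
    """Return which model should serve detection after an update round."""
    if current_accuracy < updated_accuracy:
        return "updated"  # step S405: the updated model replaces the current one
    # Step S406 covers the case where the current model is more accurate.
    # Keeping the current model on a tie is an assumption, not stated in the source.
    return "current"
```

For example, `choose_model(0.90, 0.93)` selects the updated model, while `choose_model(0.93, 0.90)` keeps the current one.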

As an optional implementation manner of the embodiment of the present invention, as shown in fig. 5, the step S402 obtains a second training set by calculation according to the detection result, and includes the following steps:

step S501: taking a result with the score higher than a preset value in the detection result as a positive sample set, and taking a result with the score lower than the preset value in the detection result as a negative sample set; specifically, the detection result is obtained according to the pedestrian detection method shown in fig. 3, the detection result includes the detection score of the video data to be detected, the detection result is obtained once every preset time, the detection result with the score of 0-0.2 in the detection result is used as a negative sample set, the detection result with the score of more than 0.9 is used as a positive sample set, the similarity detection is performed on the images included in the negative sample set and the positive sample set, the repeated samples are removed, and the precision of the subsequent detection by using the training set can be improved. In the embodiment of the present invention, the scores of the detection results in the negative sample set and the positive sample set are only examples, and the detection results including other scores may also be used as the positive sample set and the negative sample set, which is not limited in the present invention.
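Step S501 can be sketched as a score-based partition followed by duplicate removal. The thresholds 0.2 and 0.9 come from the example in the text; dropping repeated image identifiers is only a stand-in for the similarity detection, whose method the source does not specify, and `build_sample_sets` is a hypothetical name.

```python
def build_sample_sets(detections, neg_max=0.2, pos_min=0.9):
    """Split detections into positive and negative sets by score, dropping duplicates.

    detections: iterable of (image_id, score) pairs.
    """
    positives, negatives, seen = [], [], set()
    for image_id, score in detections:
        if image_id in seen:
            continue  # stand-in for the similarity check (assumption)
        if score > pos_min:
            positives.append(image_id)
        elif 0.0 <= score <= neg_max:
            negatives.append(image_id)
        else:
            continue  # mid-range scores are left out of both sets
        seen.add(image_id)
    return positives, negatives


# Hypothetical detection results collected over one interval.
detections = [("a", 0.95), ("b", 0.10), ("c", 0.55), ("a", 0.97), ("d", 0.92), ("e", 0.0)]
positives, negatives = build_sample_sets(detections)
```

Leaving the mid-range scores out keeps ambiguous detections from entering the second training set with a possibly wrong label.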

Step S502: calculating a second training set according to the positive sample set and the negative sample set. Specifically, the negative sample set and the positive sample set are randomly sampled at a ratio of 3:1, and the sampling is repeated 5 times to obtain five sample sets G₁, G₂, G₃, G₄ and G₅. The five sample sets are respectively merged with the first training set obtained in step S104 of the above embodiment to obtain merged sample sets H₁, H₂, H₃, H₄ and H₅, so that the second training set may include five sample sets; a different number of sample sets may also be obtained depending on the number of sampling rounds, which is not limited in the present invention. When the pedestrian detection model is iteratively trained according to the second training set, the five sample sets can each be input into the model to obtain five updated models, whose accuracy can subsequently be compared with that of the pedestrian detection model.
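Steps S501 and S502 can be sketched as follows under several stated assumptions. The record layout (image identifier, score), all function names, and the fixed random seed are hypothetical. The similarity detection is replaced by a simple identifier match because the embodiment does not specify a similarity measure, and the 3:1 ratio is read as three negative samples per positive sample, following the order in which the two sets are named above.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout: (image identifier, detection score in [0, 1]).
Detection = Tuple[str, float]


def split_by_score(detections: Sequence[Detection],
                   low: float = 0.2,
                   high: float = 0.9) -> Tuple[List[Detection], List[Detection]]:
    """Step S501: scores above `high` are positives, scores in [0, low] negatives."""
    positives = [d for d in detections if d[1] > high]
    negatives = [d for d in detections if 0.0 <= d[1] <= low]
    return positives, negatives


def drop_duplicates(samples: Sequence[Detection]) -> List[Detection]:
    """Assumed stand-in for similarity detection: keep the first sample per identifier."""
    seen = set()
    unique = []
    for sample in samples:
        if sample[0] not in seen:
            seen.add(sample[0])
            unique.append(sample)
    return unique


def build_second_training_set(positives: Sequence[Detection],
                              negatives: Sequence[Detection],
                              first_training_set: Sequence[Detection],
                              rounds: int = 5,
                              seed: int = 0) -> List[List[Detection]]:
    """Step S502: draw `rounds` sample sets G_i at 3 negatives per positive
    (assumed reading of the 3:1 ratio) and merge each with the first training set."""
    rng = random.Random(seed)
    n_pos = min(len(positives), len(negatives) // 3)
    merged_sets = []
    for _ in range(rounds):
        g_i = rng.sample(list(negatives), 3 * n_pos) + rng.sample(list(positives), n_pos)
        merged_sets.append(list(first_training_set) + g_i)  # H_i
    return merged_sets
```

Each returned list corresponds to one merged sample set Hᵢ, so training once per list would yield the five updated models mentioned above.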

As an optional implementation of the embodiment of the present invention, as shown in fig. 6, step S404 of judging the accuracy of the updated model and the pedestrian detection model includes:

Step S601: calculating a second test set according to the to-be-detected video data containing pedestrians. Specifically, part of the most recent to-be-detected video data is acquired and converted into an image sequence set, and the resulting image sequence set is the second test set.

Step S602: judging the accuracy of the updated model and the pedestrian detection model according to the first test set and the second test set. Specifically, the first test set is obtained in step S104 of the above embodiment and the second test set is obtained in step S601. The first test set and the second test set are input for detection into the five updated models obtained in step S403 and into the pedestrian detection model obtained by the method for training a pedestrian detection model of the above embodiment, and the accuracy of the updated models and of the pedestrian detection model is judged according to the detection results.
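The comparison in step S602 can be sketched as follows. The embodiment does not name an accuracy metric, so this sketch assumes a simplified image-level accuracy (fraction of correct predictions) computed over the union of the two test sets, and it assumes that both test sets carry ground-truth labels. The type aliases, function names, and the toy models are all hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical layout: (image, ground-truth flag "contains a pedestrian").
Sample = Tuple[float, bool]
# A model is represented here as any callable mapping an image to a prediction.
Model = Callable[[float], bool]


def accuracy(model: Model, test_set: Sequence[Sample]) -> float:
    """Fraction of samples whose prediction matches the label (0.0 for an empty set)."""
    if not test_set:
        return 0.0
    correct = sum(1 for image, label in test_set if model(image) == label)
    return correct / len(test_set)


def judge_models(models: Dict[str, Model],
                 first_test_set: Sequence[Sample],
                 second_test_set: Sequence[Sample]) -> Dict[str, float]:
    """Score every model on the first and second test sets taken together."""
    combined = list(first_test_set) + list(second_test_set)
    return {name: accuracy(model, combined) for name, model in models.items()}


# Toy stand-ins: an "image" is a single number and a model is a threshold.
first_test = [(0.9, True), (0.1, False)]
second_test = [(0.6, True), (0.4, True)]
scores = judge_models(
    {"current": lambda x: x > 0.5, "H1": lambda x: x > 0.3},
    first_test,
    second_test,
)
```

In practice a detection metric such as average precision or miss rate may be more appropriate than image-level accuracy; the sketch only shows that all models are scored on the same two test sets before being compared.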

According to the method for updating a pedestrian detection model provided by the embodiment of the invention, the pedestrian detection model obtained by the training method of the above embodiment is updated with data extracted from the detection results, which improves the detection performance of the model. In addition, by judging the accuracy of the updated model, the more accurate model can be selected to detect the to-be-detected video data containing pedestrians, which reduces false alarms and missed alarms and improves the detection accuracy of the pedestrian detection method.

An embodiment of the present invention further provides a pedestrian detection apparatus. As shown in fig. 7, the pedestrian detection apparatus may include a processor 51 and a memory 52, which may be connected by a bus or in another manner; fig. 7 takes connection by a bus as an example.

The processor 51 may be a Central Processing Unit (CPU). The processor 51 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.

The memory 52, as a non-transitory computer readable storage medium, can be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the modules corresponding to the pedestrian detection method in the embodiments of the present invention. The processor 51 performs its various functional applications and data processing by running the non-transitory software programs, instructions and modules stored in the memory 52, that is, it implements the pedestrian detection method in the above method embodiment.

The memory 52 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 51, and the like. Further, the memory 52 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 52 may optionally include memory located remotely from the processor 51, and these remote memories may be connected to the processor 51 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

One or more modules are stored in the memory 52 and, when executed by the processor 51, perform the pedestrian detection method in the embodiment shown in fig. 3.

The details of the pedestrian detection device can be understood by referring to the corresponding related description and effects in the embodiment shown in fig. 3, and are not described herein again.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory, a Hard Disk Drive (HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of the above kinds of memory.

Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims

1. A method of training a pedestrian detection model, comprising:

acquiring training video data containing pedestrians;

converting the training video data into an image sequence training sample set;

marking pedestrian areas in the image sequence training sample set according to whether pedestrians in the image sequence training sample set are shielded to obtain positive and negative sample sets;

calculating to obtain a first training set according to the positive and negative sample sets;

and performing iterative training on the deep convolutional neural network model based on the cascade network and the feature fusion according to the first training set to obtain the pedestrian detection model.

2. The method of training a pedestrian detection model in accordance with claim 1, wherein the cascade network comprises: an anchor point refinement module and a target detection module;

iteratively training the deep convolutional neural network model based on the cascade network and the feature fusion according to the first training set to obtain the pedestrian detection model comprises:

inputting the first training set into the anchor point refinement module to calculate to obtain a first feature vector;

inputting the first feature vector into the target detection module to calculate to obtain a second feature vector;

inputting the first feature vector and the second feature vector into the feature fusion module respectively for calculation to obtain a first loss function and a second loss function;

superposing the first loss function and the second loss function, and calculating to obtain a loss function;

and selecting the model with the lowest loss function value as the pedestrian detection model.

3. The method of training a pedestrian detection model in accordance with claim 1, further comprising:

calculating the positive and negative sample sets according to a preset proportion to obtain a first verification set;

and verifying the pedestrian detection model according to the first verification set.

4. The method of training a pedestrian detection model in accordance with claim 3, further comprising:

calculating the positive and negative sample sets according to a preset proportion to obtain a first test set;

and testing the verified pedestrian detection model according to the first test set to obtain a test result.

5. A pedestrian detection method, characterized by comprising:

acquiring to-be-detected video data containing pedestrians;

converting the video data to be detected into an image sequence detection sample set;

inputting the image sequence detection sample set into a pedestrian detection model generated by training according to the method for training a pedestrian detection model of any one of claims 1 to 4, and obtaining a detection result.

6. A method of updating a pedestrian detection model, comprising:

acquiring detection results at intervals of preset time, wherein the detection results are obtained according to the pedestrian detection method of claim 5;

calculating to obtain a second training set according to the detection result;

obtaining an updated model from the second training set using the method of any of claims 1-4;

judging the accuracy of the updated model and the pedestrian detection model;

when the accuracy of the pedestrian detection model is lower than that of the updated model, the updated model is used as a pedestrian detection model to detect video data containing pedestrians;

and when the accuracy of the pedestrian detection model is higher than that of the updated model, detecting the video data containing pedestrians according to the pedestrian detection model.

7. The method of updating a pedestrian detection model according to claim 6, wherein calculating a second training set from the detection results comprises:

taking a result with a score higher than a preset value in the detection result as a positive sample set, and taking a result with a score lower than a preset value in the detection result as a negative sample set;

and calculating to obtain a second training set according to the positive sample set and the negative sample set.

8. The method of updating a pedestrian detection model according to claim 6, wherein determining the accuracy of the updated model and the pedestrian detection model comprises:

calculating to obtain a second test set according to the to-be-detected video data containing the pedestrians;

and judging the accuracy of the updated model and the pedestrian detection model according to a first test set and the second test set, wherein the first test set is obtained according to the method for training the pedestrian detection model in claim 4.

9. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-4, 5, or 6-8.

10. A pedestrian detection apparatus, characterized by comprising: a memory and a processor communicatively coupled to each other, the memory storing computer instructions, the processor executing the computer instructions to perform the method of any of claims 1-4, 5, or 6-8.