CN111008576B

CN111008576B - Pedestrian detection and model training method, device and readable storage medium

Info

Publication number: CN111008576B
Application number: CN201911163826.9A
Authority: CN
Inventors: 肖刚; 周捷; 王逸飞; 王正来
Original assignee: Gaochuang Anbang Beijing Technology Co ltd
Current assignee: Gaochuang Anbang Beijing Technology Co ltd
Priority date: 2019-11-22
Filing date: 2019-11-22
Publication date: 2023-09-01
Anticipated expiration: 2039-11-22
Also published as: CN111008576A

Abstract

The invention discloses a pedestrian detection method, methods for training and updating a pedestrian detection model, a device and a readable storage medium. The method for training the pedestrian detection model comprises the following steps: acquiring training video data containing pedestrians; converting the training video data into an image sequence training sample set; labeling the pedestrian regions in the image sequence training sample set according to whether each pedestrian is blocked (occluded), to obtain a positive sample set and a negative sample set; calculating a first training set from the positive and negative sample sets; and iteratively training a deep convolutional neural network model based on a cascade network and feature fusion on the first training set to obtain the pedestrian detection model. In the training method provided by the embodiment of the invention, blocked pedestrians in the image sequence are given a separate label that distinguishes them from unblocked pedestrians in the image, which improves the detection precision for blocked pedestrians.

Description

Pedestrian detection and model training method, device and readable storage medium

Technical Field

The invention relates to the technical field of pedestrian detection, and in particular to a pedestrian detection method, methods for training and updating a pedestrian detection model, a device and a readable storage medium.

Background

The pedestrian detection technology is a technology for automatically searching the position and the size of a pedestrian in any input image, is an important problem in the field of target detection, and has wide application in the fields of automatic driving, video monitoring, biological feature recognition, behavior analysis and the like.

In complex real-world environments, pedestrians wear different clothing and are easily confused with the background, parts of the torso are often blocked, and the viewing angle and illumination of the surveillance camera introduce further interference. The blocking of pedestrians is therefore one of the biggest challenges facing pedestrian detection today, and efficient, accurate pedestrian detection, particularly in crowded scenes, remains an active and difficult research topic.

Traditional pedestrian detection methods generally rely on manually designed and extracted features, so they usually achieve good detection results only in specific scenes, and the robustness of the algorithm is difficult to guarantee.

Disclosure of Invention

In view of the above, embodiments of the invention provide a pedestrian detection method, methods for training and updating a pedestrian detection model, a device and a readable storage medium, in order to solve the problem that existing pedestrian detection algorithms have poor accuracy.

The technical scheme provided by the invention is as follows:

a first aspect of an embodiment of the present invention provides a method for training a pedestrian detection model, the method including: acquiring training video data containing pedestrians; converting the training video data into an image sequence training sample set; labeling the pedestrian region in the image sequence training sample set according to whether the pedestrian in the image sequence training sample set is shielded or not to obtain a positive sample set and a negative sample set; calculating according to the positive and negative sample sets to obtain a first training set; and performing iterative training on the deep convolutional neural network model based on the cascade network and the feature fusion according to the first training set to obtain the pedestrian detection model.

In a first implementation of the first aspect, the cascade network includes an anchor point refinement module and a target detection module, and performing iterative training on the deep convolutional neural network model based on the cascade network and feature fusion according to the first training set to obtain the pedestrian detection model includes: inputting the first training set into the anchor point refinement module to calculate a first feature vector; inputting the first feature vector into the target detection module to calculate a second feature vector; inputting the first feature vector and the second feature vector respectively into the feature fusion module to calculate a first loss function and a second loss function; superposing the first loss function and the second loss function to obtain a loss function; and selecting the model with the lowest loss function value as the pedestrian detection model.

In a second implementation of the first aspect, the method for training a pedestrian detection model further includes: calculating a first verification set from the positive and negative sample sets according to a preset proportion; and verifying the pedestrian detection model according to the first verification set.

In a third implementation of the first aspect, based on the second implementation of the first aspect, the method for training a pedestrian detection model further includes: calculating a first test set from the positive and negative sample sets according to a preset proportion; and testing the verified pedestrian detection model according to the first test set to obtain a test result.

A second aspect of an embodiment of the present invention provides a pedestrian detection method, including: acquiring video data to be detected containing pedestrians; converting the video data to be detected into an image sequence detection sample set; and inputting the image sequence detection sample set into the pedestrian detection model trained by the method for training the pedestrian detection model according to the first aspect or any implementation of the first aspect, so as to obtain a detection result.

A third aspect of the embodiment of the present invention provides a method for updating a pedestrian detection model, the method including: acquiring, every preset time, the detection results obtained by the pedestrian detection method according to the second aspect; calculating a second training set according to the detection results; obtaining an updated model from the second training set by the method according to the first aspect or any implementation of the first aspect; comparing the precision of the updated model and the pedestrian detection model; when the precision of the pedestrian detection model is lower than that of the updated model, using the updated model as the pedestrian detection model to detect the video data containing pedestrians; and when the precision of the pedestrian detection model is higher than that of the updated model, detecting the video data containing pedestrians with the pedestrian detection model.

In a first implementation of the third aspect, calculating the second training set according to the detection result includes: taking the results whose score is higher than a preset value in the detection result as a positive sample set, and taking the results whose score is lower than the preset value in the detection result as a negative sample set; and calculating the second training set according to the positive sample set and the negative sample set.

In a second implementation of the third aspect, comparing the precision of the updated model and the pedestrian detection model includes: calculating a second test set according to the positive sample set and the negative sample set; calculating a second test set according to the video data to be detected containing pedestrians; and judging the precision of the updated model and the pedestrian detection model according to a first test set and the second test set, wherein the first test set is obtained by the method for training the pedestrian detection model according to the third implementation of the first aspect.

A fourth aspect of the embodiment of the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to perform the method for training a pedestrian detection model according to the first aspect or any implementation of the first aspect, or the pedestrian detection method according to the second aspect, or the method for updating a pedestrian detection model according to the third aspect or any implementation of the third aspect.

A fifth aspect of an embodiment of the present invention provides a pedestrian detection apparatus including a memory and a processor that are communicatively connected to each other, wherein the memory stores computer instructions and the processor executes the computer instructions to perform the method for training the pedestrian detection model according to the first aspect or any implementation of the first aspect, or the pedestrian detection method according to the second aspect, or the method for updating the pedestrian detection model according to the third aspect or any implementation of the third aspect.

The technical scheme provided by the invention has the following effects:

According to the pedestrian detection method, the model training and updating methods, the device and the readable storage medium provided by the embodiments of the invention, video data containing pedestrians is acquired and converted into an image sequence, and blocked pedestrians are given a separate label according to whether each pedestrian in the image sequence is blocked. This distinguishes them from unblocked pedestrians in the image and improves the detection precision for blocked pedestrians. In addition, pedestrian features are extracted through a cascade network, model performance is improved through iterative training, and feature fusion over the cascade network further improves the detection of small pedestrian targets.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a method of training a pedestrian detection model in accordance with an embodiment of the invention;

FIG. 2 is a flow chart of a method of training a pedestrian detection model in accordance with another embodiment of the invention;

FIG. 3 is a flow chart of a pedestrian detection method in accordance with an embodiment of the invention;

FIG. 4 is a flow chart of a method of updating a pedestrian detection model in accordance with an embodiment of the invention;

FIG. 5 is a flow chart of a method of updating a pedestrian detection model in accordance with another embodiment of the invention;

FIG. 6 is a flow chart of a method of updating a pedestrian detection model in accordance with another embodiment of the invention;

FIG. 7 is a schematic diagram of the hardware structure of a pedestrian detection apparatus in accordance with an embodiment of the invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The pedestrian detection technique is a technique for automatically searching the position and size of a pedestrian in an arbitrary input image, and is widely used in the fields of computer vision, pattern recognition, and the like, such as automatic driving, video monitoring, biometric recognition, and the like.

In complex real-world environments, the blocking of pedestrians is one of the biggest challenges facing pedestrian detection today, and efficient, accurate pedestrian detection, particularly in crowded scenes, remains an active and difficult research topic. In the pedestrian detection method of the embodiments, training video data containing pedestrians is acquired and converted into an image sequence, the image sequence is labeled according to whether each pedestrian is blocked, and pedestrian detection results are obtained through deep learning.

Deep learning is a learning method that builds models with deep structures; typical deep learning algorithms include deep belief networks, convolutional neural networks, restricted Boltzmann machines and recurrent neural networks. Deep learning is also referred to as deep neural networks (meaning neural networks with more than 3 layers). It derives from multi-layer neural networks and essentially provides a way to integrate feature representation and learning. Deep learning is characterized by giving up interpretability in favor of learning effectiveness.

Referring to fig. 1, the method for training a pedestrian detection model in the embodiment of the invention mainly includes the following steps:

Step S101: acquiring training video data containing pedestrians. Specifically, the video containing pedestrians may be surveillance video from cameras installed at intersections, or video data from other outdoor places such as subway exits, supermarket exits, market exits, train station exits and school campuses; the invention is not limited in this respect.

Step S102: converting the training video data into an image sequence training sample set. Here, an image sequence is a series of images of a target acquired sequentially at different times and from different directions. A video consists of a series of images called frames, which are captured at a fixed rate (the frame rate, usually expressed in frames per second) so that a scene in motion can be displayed. Existing video conversion software can be used to convert the training video data into the image sequence training sample set.
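As a minimal sketch of the frame selection behind step S102, the snippet below computes which frame indices to keep when a video is sampled at a fixed time interval. The function names, the sampling interval and the file naming scheme are illustrative assumptions; the source only states that existing video conversion software may be used, and actual decoding of the video is left to such software.

```python
from typing import List


def sample_frame_indices(total_frames: int, fps: float,
                         seconds_between_samples: float) -> List[int]:
    """Return the indices of the frames kept when sampling at a fixed interval."""
    if total_frames < 0 or fps <= 0 or seconds_between_samples <= 0:
        raise ValueError("total_frames must be >= 0; fps and interval must be > 0")
    # Number of frames between two kept frames; never less than one frame.
    step = max(1, round(fps * seconds_between_samples))
    return list(range(0, total_frames, step))


def frame_filename(index: int, prefix: str = "frame") -> str:
    """Hypothetical naming scheme for the images written to the sample set."""
    return f"{prefix}_{index:06d}.jpg"


indices = sample_frame_indices(total_frames=100, fps=25.0, seconds_between_samples=1.0)
names = [frame_filename(i) for i in indices]
```

Sampling less than every frame is a design choice assumed here to reduce near-duplicate images; keeping every frame corresponds to a step of one.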

Step S103: labeling the pedestrian regions in the image sequence training sample set according to whether each pedestrian in the image sequence training sample set is blocked, so as to obtain a positive sample set and a negative sample set. The positive sample set is a sample set containing pedestrians, and the negative sample set is a sample set not containing pedestrians.

Specifically, all pedestrian regions in the image sequence training sample set may be marked with rectangular frames and given a Person or Blocked Person label according to whether the pedestrian is blocked: the Person label indicates that the pedestrian is not blocked, and the Blocked Person label indicates that the pedestrian is blocked. For each image in the image sequence training sample set, information such as the image name and size, the label of each pedestrian region, whether the pedestrian is complete, whether the pedestrian is easy to identify, and the frame coordinates is stored in a corresponding annotation file in the standard VOC format. Here, "complete" refers to whether the pedestrian in the annotated pedestrian region appears entirely within the image; a pedestrian that appears entirely within the image is indicated as easy to identify. The annotation file may have the same name as the corresponding image in the image sequence training sample set, its format may be xml, and the images together with their annotation files form the positive and negative sample set Y. The tool used to mark the pedestrian regions in the embodiment of the invention may be a self-made Python tool, though other annotation tools may also be used; the invention is not limited in this respect.
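The annotation file described above can be sketched with the Python standard library as follows. The element names follow the commonly used Pascal VOC layout; mapping "whether the pedestrian is complete" to the `truncated` field and "whether it is easy to identify" to the `difficult` field is an assumption, as are the function name and the example values.

```python
import xml.etree.ElementTree as ET


def build_voc_annotation(filename, width, height, objects):
    """Build a VOC-style XML string.

    `objects` is a list of dicts with keys: label ("Person" or "Blocked Person"),
    truncated (bool), difficult (bool), box (xmin, ymin, xmax, ymax).
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for obj in objects:
        node = ET.SubElement(root, "object")
        ET.SubElement(node, "name").text = obj["label"]
        # Assumed mapping: truncated = pedestrian not fully inside the image.
        ET.SubElement(node, "truncated").text = str(int(obj["truncated"]))
        # Assumed mapping: difficult = pedestrian not easy to identify.
        ET.SubElement(node, "difficult").text = str(int(obj["difficult"]))
        box = ET.SubElement(node, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), obj["box"]):
            ET.SubElement(box, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = build_voc_annotation(
    "frame_000025.jpg", 1920, 1080,
    [
        {"label": "Person", "truncated": False, "difficult": False,
         "box": (100, 200, 180, 420)},
        {"label": "Blocked Person", "truncated": True, "difficult": True,
         "box": (600, 250, 660, 400)},
    ],
)
```

Writing `xml_text` to a file named after the image (for example `frame_000025.xml`) keeps the annotation name consistent with the image name, as the text suggests.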

Step S104: calculating a first training set from the positive and negative sample sets. Specifically, the positive and negative sample sets may be divided into a training set and a first test set according to a first preset ratio, and the training set may then be divided into the first training set and a first verification set according to a second preset ratio.

Step S105: iteratively training a deep convolutional neural network model based on a cascade network and feature fusion on the first training set to obtain a pedestrian detection model. Specifically, the deep convolutional neural network model may be a convolutional neural network that reaches a preset depth, and it comprises a cascade network and a feature fusion module. The cascade network serves as the feature extraction module of the convolutional network and may include a plurality of convolutional layers and pooling layers for extracting pedestrian features; model performance can be improved through iterative training. The feature fusion module may be a concat layer that fuses the pedestrian features extracted by the cascade network. The best model found during iterative training is selected as the pedestrian detection model.
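The two-stage division in step S104 can be sketched as below. The ratio values, the random shuffling and the fixed seed are assumptions made for illustration; the source specifies only that a first and a second preset ratio are used.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], test_ratio: float, val_ratio: float,
                  seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Split into (first training set, first verification set, first test set)."""
    if not (0 <= test_ratio < 1 and 0 <= val_ratio < 1):
        raise ValueError("ratios must lie in [0, 1)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # First preset ratio: separate the first test set from the training set.
    n_test = int(len(shuffled) * test_ratio)
    test, remainder = shuffled[:n_test], shuffled[n_test:]
    # Second preset ratio: separate the first verification set from the rest.
    n_val = int(len(remainder) * val_ratio)
    val, train = remainder[:n_val], remainder[n_val:]
    return train, val, test


train, val, test = split_samples(list(range(100)), test_ratio=0.2, val_ratio=0.25)
```

With these illustrative ratios, 100 samples yield 60 training, 20 verification and 20 test samples, and no sample appears in more than one set.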

Through steps S101 to S105, the method for training the pedestrian detection model provided by the embodiment of the invention acquires video data containing pedestrians, converts the video data into an image sequence, and gives blocked pedestrians a separate label according to whether each pedestrian in the image sequence is blocked, distinguishing them from unblocked pedestrians in the image and thereby improving the detection precision for blocked pedestrians. In addition, pedestrian features are extracted through the cascade network, model performance is improved through iterative training, and feature fusion over the cascade network further improves the detection of small pedestrian targets.

As an optional implementation of the embodiment of the present invention, the cascade network provided in the foregoing embodiment includes an anchor point refinement module and a target detection module. As shown in FIG. 2, step S105, in which the deep convolutional neural network model based on the cascade network and feature fusion is iteratively trained on the first training set to obtain the pedestrian detection model, includes the following steps:

Step S201: inputting the first training set into the anchor point refinement module to calculate a first feature vector. Specifically, the images in the first training set are acquired and resized to a uniform size, which may be 320 × 320; the resized images are input into the anchor point refinement module for calculation, yielding the image sizes required by the target detection module. The anchor point refinement module performs a preliminary extraction of pedestrian features. The first feature vector it calculates contains the label information, position coordinates, size and so on of the pedestrian frames, so the anchor point refinement module gives a coarse description of the position and size of each pedestrian and provides a better anchor point initialization for the target detection module.

Step S202: inputting the first feature vector into the target detection module to calculate a second feature vector. Specifically, the image sizes in the first feature vector are obtained, and the feature vectors with image sizes of 40 × 40, 20 × 20, 10 × 10 and 5 × 5, produced by the convolutions of different layers in the anchor point refinement module, are selected and input into the target detection module to calculate the second feature vector; these sizes are chosen in order to capture the deep features obtained in the anchor point refinement module. The target detection module may be connected to the anchor point refinement module through a transmission link module, through which the images in the first feature vector are input to the target detection module. The target detection module further extracts pedestrian features on the basis of the anchor point refinement module. It takes the anchor points refined by the anchor point refinement module as input and calculates the second feature vector, which contains the label information, position coordinates, size and so on of the pedestrian frames, can be used to distinguish categories such as the image background, the Person label and the Blocked Person label, and further improves the regression and prediction results.

Step S203: inputting the first feature vector and the second feature vector respectively into a feature fusion module to calculate a first loss function and a second loss function. Specifically, the image sizes in the first feature vector and the second feature vector are obtained; the feature vectors with image sizes of 40 × 40, 20 × 20, 10 × 10 and 5 × 5 in the two feature vectors, produced by the convolutions of different layers in the target detection module, are selected and input respectively into the feature fusion module for convolution, yielding the first loss function and the second loss function.
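The four feature map sizes named in step S202 are consistent with a 320 × 320 input that is downsampled by successive factors of two. The short sketch below reproduces that arithmetic; the strides of 8, 16, 32 and 64 are an assumption inferred from the sizes, since the source lists only the resulting sizes.

```python
from typing import List, Sequence


def feature_map_sizes(input_size: int,
                      strides: Sequence[int] = (8, 16, 32, 64)) -> List[int]:
    """Side length of each square feature map for a square input image."""
    return [input_size // stride for stride in strides]


sizes = feature_map_sizes(320)
# Total number of spatial positions across the four feature maps.
positions = sum(side * side for side in sizes)
```

Larger maps such as 40 × 40 preserve finer spatial detail, which is one reason multi-scale features are commonly used when small targets matter.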

Step S204: superposing the first loss function and the second loss function to obtain a loss function. Specifically, the first loss function is expressed in terms of the following quantities (the formula itself is not reproduced in this text): N_arm is the number of anchor points; L_b is the classification loss, used to calculate whether the classification is correct, that is, whether the background, the Person label and the Blocked Person label of the image are classified correctly; L_r is the regression loss, used to calculate the offset between the object detection frame and the real frame, where the object detection frame is the pedestrian frame coordinates obtained from the first feature vector and the real frame is the pedestrian frame coordinates actually shown in the image; p_i is the probability predicted by the network that an object is present; x_i denotes the detected coordinates; g_i denotes the real coordinates; and i denotes each image in the first training set. The second loss function is expressed in terms of the following quantities (formula likewise not reproduced): N_odm is the number of anchor points obtained by the target detection module; L_m is the classification loss; L_r is the regression loss; c_i is the probability that a detection frame belongs to each category; and t_i and g_i denote the detected coordinates and the real coordinates, respectively. After the first loss function and the second loss function are calculated, they are superposed to obtain the loss function, which is given by formula (1) (not reproduced in this text).

In formula (1), the frame regression loss is calculated only for the positive samples in the positive and negative sample sets. Since the positive sample set is the sample set containing pedestrians and the negative sample set consists of the other samples, and since the final output requires only the frames of pedestrians and not the frames of the other categories, only the loss of the positive samples is calculated.
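The formulas referenced in step S204 are missing from this text. One form that is consistent with the symbol definitions given there, and with the loss commonly published for two-stage anchor refinement detectors, is sketched below. It is a reconstruction under assumptions, not the formula of the source: the ground-truth label l_i (0 for background) and the indicator [l_i ≥ 1], which equals 1 for positive samples and 0 otherwise, are assumed notation.

```latex
\begin{aligned}
L_{\mathrm{arm}} &= \frac{1}{N_{\mathrm{arm}}}\Big(\sum_i L_b\big(p_i,\,[l_i \ge 1]\big)
                   + \sum_i [l_i \ge 1]\, L_r(x_i, g_i)\Big) \\
L_{\mathrm{odm}} &= \frac{1}{N_{\mathrm{odm}}}\Big(\sum_i L_m(c_i, l_i)
                   + \sum_i [l_i \ge 1]\, L_r(t_i, g_i)\Big) \\
L &= L_{\mathrm{arm}} + L_{\mathrm{odm}} \qquad (1)
\end{aligned}
```

In this form the indicator multiplies only the regression terms, which matches the statement that the regression loss is calculated for positive samples only.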

Step S205: selecting the model with the lowest loss function value as the pedestrian detection model. Specifically, feature fusion combines the feature vectors of sizes 40 × 40, 20 × 20, 10 × 10 and 5 × 5 from the anchor point refinement module and the target detection module and is used to calculate the loss function value, so that the model with the lowest loss function value according to formula (1) can be selected as the pedestrian detection model.
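Model selection in step S205 can be sketched as choosing, among saved training iterations, the one whose superposed loss is lowest. The checkpoint names and loss values below are made up for illustration, and the superposition is taken to be a plain sum, which is an assumption.

```python
from typing import Dict, Tuple


def total_loss(first_loss: float, second_loss: float) -> float:
    """Superpose the two loss values (assumed here to be a plain sum)."""
    return first_loss + second_loss


def select_best(history: Dict[str, Tuple[float, float]]) -> str:
    """Return the checkpoint name with the lowest superposed loss."""
    if not history:
        raise ValueError("history is empty")
    return min(history, key=lambda name: total_loss(*history[name]))


# Hypothetical (first loss, second loss) pairs recorded during training.
history = {
    "iter_1000": (1.2, 2.0),
    "iter_2000": (0.8, 1.5),
    "iter_3000": (0.9, 1.6),
}
best = select_best(history)
```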

In the embodiment of the invention, the anchor point refinement module and the target detection module are built to form a two-layer cascade network, the first training set is input into the cascade network for calculation, judgment and classification are gradually carried out, the regression of a coarse detection frame to a fine detection frame is realized, and the detection precision of the pedestrian detection model is further improved; meanwhile, the feature fusion is carried out on the cascade network, and the loss function is calculated, so that the detection effect of the pedestrian detection model on the pedestrian small target is further effectively improved.
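The coarse-to-fine regression described above can be illustrated by decoding frame offsets twice: once from the original anchor point and once from the refined anchor point. The offset parameterization used here (centre shifts scaled by width and height, log-scale size changes) is the one commonly used by anchor-based detectors and is an assumption; the source does not specify how the offsets are encoded.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (centre x, centre y, width, height)


def decode(anchor: Box, offsets: Box) -> Box:
    """Apply predicted offsets (dx, dy, dw, dh) to an anchor frame."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def cascade_decode(anchor: Box, coarse_offsets: Box, fine_offsets: Box) -> Box:
    """Stage 1 refines the anchor; stage 2 regresses from the refined anchor."""
    refined = decode(anchor, coarse_offsets)
    return decode(refined, fine_offsets)


anchor = (100.0, 100.0, 50.0, 100.0)
final = cascade_decode(anchor, (0.1, 0.0, 0.0, 0.0), (0.0, 0.05, 0.0, 0.0))
```

Because the second stage starts from an already adjusted frame, its offsets only need to cover a small residual error, which is the intuition behind the two-layer cascade.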

As an optional implementation of the embodiment of the present invention, the method for training the pedestrian detection model further includes: inputting the first verification set, calculated from the positive and negative sample sets in the above embodiment, into the obtained pedestrian detection model for verification; and inputting the first test set, likewise calculated from the positive and negative sample sets, into the verified pedestrian detection model for testing to obtain a test result. The verification and test process makes it possible to judge whether the detection results of the pedestrian detection model meet a preset standard. For example, the detection results may be assigned scores; when the detection results for a preset proportion of the sample set score higher than a preset score, the detection results meet the preset standard and the pedestrian detection model can be used for pedestrian detection.
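The acceptance check described above can be sketched as follows. The preset score of 0.8 and the preset proportion of 0.75 are illustrative assumptions, and how each detection result is scored is left open by the source.

```python
from typing import Sequence


def meets_standard(scores: Sequence[float], preset_score: float,
                   preset_proportion: float) -> bool:
    """True when enough detection results score higher than the preset score."""
    if not scores:
        return False
    passing = sum(1 for score in scores if score > preset_score)
    return passing / len(scores) >= preset_proportion


# Three of the four results exceed 0.8, which reaches the 0.75 proportion.
accepted = meets_standard([0.95, 0.90, 0.60, 0.85], preset_score=0.8,
                          preset_proportion=0.75)
```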

The embodiment of the invention also provides a pedestrian detection method, as shown in fig. 3, which comprises the following steps:

Step S301: acquiring video data to be detected containing pedestrians. Specifically, the video to be detected may be surveillance video from cameras installed at intersections, or video data from other outdoor places such as subway exits, supermarket exits, market exits, train station exits and schools; the invention is not limited in this respect.

Step S302: converting the video data to be detected into an image sequence detection sample set. Here, an image sequence is a series of images of a target acquired sequentially at different times and from different directions. A video consists of a series of images called frames, which are captured at a fixed rate (the frame rate, usually expressed in frames per second) so that a scene in motion can be displayed. Existing video conversion software can be used to convert the video data to be detected into the image sequence detection sample set.

Step S303: inputting the image sequence detection sample set into the pedestrian detection model obtained by the method for training the pedestrian detection model in the above embodiment, so as to obtain a detection result. Specifically, the image sequence detection sample set may be input for detection into a pedestrian detection model obtained by the training method shown in FIGS. 1 and 2, thereby obtaining the detection result.

Through steps S301 to S303, the pedestrian detection method provided by the embodiment of the invention acquires video data containing pedestrians, converts it into an image sequence, and inputs the image sequence into the pedestrian detection model obtained by the training method of the above embodiment. Blocked pedestrians in the image sequence are given a separate label that distinguishes them from unblocked pedestrians in the image, and partially visible human torsos are learned separately by deep learning, which improves the detection precision for blocked pedestrians.

The embodiment of the invention also provides a method for updating the pedestrian detection model, as shown in fig. 4, the method for updating the pedestrian detection model further comprises the following steps:

Step S401: acquiring, at intervals of a preset time, the detection results obtained by the pedestrian detection method in the above embodiment. Specifically, the pedestrian detection method of the above embodiment detects video data at different times, and the resulting detection results are collected at the preset interval.

Step S402: calculating a second training set according to the detection result; specifically, data in the detection result may be acquired as the second training set.

Step S403: obtaining an updated model from the second training set by the method for training the pedestrian detection model. Specifically, the model may be trained on the second training set according to the training method shown in FIGS. 1 and 2, so as to obtain an updated pedestrian detection model.

Step S404: comparing the precision of the updated model and the pedestrian detection model.

Step S405: when the precision of the pedestrian detection model is lower than that of the updated model, using the updated model as the pedestrian detection model to detect the video data containing pedestrians.

Step S406: when the precision of the pedestrian detection model is higher than that of the updated model, continuing to detect the video data containing pedestrians with the pedestrian detection model.
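The decision in steps S404 to S406 reduces to a comparison of two precision values, sketched below. The function name and the returned strings are illustrative. The source does not say what happens when the two precisions are equal; keeping the current model in that case is an assumption of this sketch.

```python
def choose_model(current_precision: float, updated_precision: float) -> str:
    """Return which model should be used for subsequent detection."""
    if current_precision < updated_precision:
        return "updated"   # step S405: the updated model replaces the current one
    return "current"       # step S406 (and, by assumption, the tie case)


decision = choose_model(current_precision=0.91, updated_precision=0.94)
```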

As an optional implementation manner of the embodiment of the present invention, as shown in fig. 5, step S402 calculates a second training set according to the detection result, including the following steps:

Step S501: taking the results with a score higher than a preset value in the detection result as a positive sample set, and taking the results with a score lower than the preset value in the detection result as a negative sample set. Specifically, a detection result is obtained according to the pedestrian detection method shown in fig. 3 and is collected once every preset time; the detection result includes a detection score for the video data to be detected. Detection results with a score of 0 to 0.2 are used as the negative sample set, and detection results with a score above 0.9 are used as the positive sample set. Similarity detection is then performed on the images in the negative sample set and the positive sample set and repeated samples are removed, which can improve the accuracy of subsequent detection by a model trained on this training set. In the embodiment of the invention, these score ranges are only illustrative; detection results with other scores may also be used as the positive sample set and the negative sample set, which is not limited by the invention.

Step S502: calculating a second training set according to the positive sample set and the negative sample set. Specifically, the negative sample set and the positive sample set may be randomly sampled at a ratio of 3:1, and the sampling repeated 5 times to obtain five sample sets G₁, G₂, G₃, G₄ and G₅. The five sample sets are each combined with the first training set obtained in step S104 of the foregoing embodiment to obtain combined sample sets H₁, H₂, H₃, H₄ and H₅. The second training set may thus include five sample sets; a different number of sample sets may also be obtained by changing the number of sampling rounds, which is not limited in the present invention. When the pedestrian detection model is iteratively trained according to the second training set, the five sample sets can each be input into the model to obtain five update models, and the precision of the five update models can be compared with that of the pedestrian detection model.
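As a minimal sketch of steps S501 and S502, the following code splits detections by score, removes duplicates, and draws five sample sets that are merged with the first training set. Several points are assumptions rather than part of the embodiment: the `(image identifier, score)` tuple layout, the function names, the use of identifier equality as a stand-in for the similarity detection, and the reading of "3:1" as negatives to positives.

```python
import random
from typing import List, Sequence, Tuple

Detection = Tuple[str, float]  # (image identifier, detection score); hypothetical layout


def split_by_score(detections: Sequence[Detection],
                   neg_max: float = 0.2, pos_min: float = 0.9):
    """S501: scores in [0, neg_max] become negatives, scores above pos_min positives."""
    # Deduplicate by image identifier; a simple stand-in for similarity detection.
    unique = list(dict(detections).items())
    positives = [d for d in unique if d[1] > pos_min]
    negatives = [d for d in unique if 0.0 <= d[1] <= neg_max]
    return positives, negatives


def sample_groups(positives: List[Detection], negatives: List[Detection],
                  first_training_set: Sequence, rounds: int = 5,
                  ratio: int = 3, seed: int = 0) -> List[list]:
    """S502: draw `rounds` groups G_i and merge each with the first training set (H_i)."""
    rng = random.Random(seed)
    # Largest positive count for which `ratio` times as many negatives are available.
    n_pos = min(len(positives), len(negatives) // ratio)
    merged = []
    for _ in range(rounds):
        group = rng.sample(positives, n_pos) + rng.sample(negatives, n_pos * ratio)
        merged.append(list(first_training_set) + group)
    return merged


# Usage with illustrative scores only.
scores = [0.95, 0.97, 0.05, 0.1, 0.15, 0.0, 0.2, 0.18, 0.5, 0.95]
detections = [(f"img{i}", s) for i, s in enumerate(scores)]
pos, neg = split_by_score(detections)
second_training_set = sample_groups(pos, neg, first_training_set=[("seed_img", 1.0)])
```

Each element of `second_training_set` corresponds to one combined sample set H₁ to H₅ and could be used to train one update model.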

As an alternative implementation manner of the embodiment of the present invention, as shown in fig. 6, step S404 determines the accuracy of the update model and the pedestrian detection model, including:

Step S601: calculating a second test set according to the video data to be detected containing pedestrians. Specifically, a part of the latest video data to be detected is acquired and converted into an image sequence set, and the obtained image sequence set is the second test set.

Step S602: judging the precision of the update model and the pedestrian detection model according to the first test set and the second test set. Specifically, the first test set is obtained in step S104 of the above embodiment and the second test set is obtained in step S601. The first test set and the second test set are input into the five update models obtained in step S403 and into the pedestrian detection model obtained by the method for training the pedestrian detection model in the above embodiment for detection, and the precision of the update models and of the pedestrian detection model is determined according to the detection results.
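The comparison in steps S601 and S602 can be sketched as below. The embodiment does not specify the precision metric or how results on the two test sets are combined; this sketch assumes precision in the sense TP / (TP + FP), assumes both test sets carry ground-truth labels, and assumes the two values are averaged. The names `precision`, `best_model` and the toy models are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Sample = Tuple[object, bool]      # (image, ground truth: pedestrian present)
Model = Callable[[object], bool]  # hypothetical: returns True when a pedestrian is detected


def precision(model: Model, test_set: Sequence[Sample]) -> float:
    """TP / (TP + FP); returns 0.0 when the model makes no positive prediction."""
    tp = fp = 0
    for image, truth in test_set:
        if model(image):
            if truth:
                tp += 1
            else:
                fp += 1
    return tp / (tp + fp) if tp + fp else 0.0


def best_model(models: Dict[str, Model],
               first_test_set: Sequence[Sample],
               second_test_set: Sequence[Sample]) -> str:
    """Return the name of the model with the highest mean precision on both test sets."""
    def score(name: str) -> float:
        m = models[name]
        return (precision(m, first_test_set) + precision(m, second_test_set)) / 2
    # max() returns the first maximal entry, so listing the current model
    # first keeps it when an update model only ties.
    return max(models, key=score)


# Usage with toy "images" (plain numbers) and threshold models.
models = {"current": lambda x: x > 5, "update_1": lambda x: x > 3}
first_test = [(1, False), (4, True), (6, True), (8, True)]
second_test = [(2, False), (4, False), (7, True)]
winner = best_model(models, first_test, second_test)
```

In practice the dictionary would hold the current pedestrian detection model together with the five update models obtained in step S403.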

In the method for updating the pedestrian detection model provided by the embodiment of the invention, the pedestrian detection model obtained by the training method in the above embodiment is updated with data extracted from the detection results, which improves the detection performance of the model. In addition, by comparing the precision of the update model with that of the pedestrian detection model, the model with the higher precision is used to detect the video data to be detected containing pedestrians, which reduces false alarms and missed detections and improves the detection precision of the pedestrian detection method.

The embodiment of the present invention further provides a pedestrian detection device, as shown in fig. 7, where the pedestrian detection device may include a processor 51 and a memory 52, where the processor 51 and the memory 52 may be connected by a bus or other means, and in fig. 7, the connection is exemplified by a bus.

The processor 51 may be a central processing unit (Central Processing Unit, CPU). The processor 51 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations thereof.

The memory 52, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs and modules, such as the modules corresponding to the pedestrian detection method in the embodiment of the invention. By running the non-transitory software programs, instructions and modules stored in the memory 52, the processor 51 executes its various functional applications and data processing, that is, implements the pedestrian detection method in the above method embodiment.

The memory 52 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created by the processor 51, etc. In addition, the memory 52 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 52 may optionally include memory located remotely from the processor 51, which may be connected to the processor 51 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The one or more modules are stored in the memory 52 and, when executed by the processor 51, perform the pedestrian detection method in the embodiment shown in fig. 3.

The details of the above pedestrian detection device can be understood by referring to the corresponding descriptions and effects in the embodiment shown in fig. 3, and will not be repeated here.

It will be appreciated by those skilled in the art that all or part of the flows of the above method embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer readable storage medium and, when executed, may include the flows of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a flash memory (Flash Memory), a hard disk drive (Hard Disk Drive, HDD), or a solid state drive (Solid-State Drive, SSD); the storage medium may also comprise a combination of memories of the kinds described above.

Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims

1. A method of training a pedestrian detection model, comprising:

acquiring training video data containing pedestrians;

converting the training video data into an image sequence training sample set;

labeling the pedestrian region in the image sequence training sample set according to whether the pedestrian in the image sequence training sample set is shielded or not to obtain a positive sample set and a negative sample set;

calculating according to the positive and negative sample sets to obtain a first training set;

performing iterative training on a deep convolutional neural network model based on cascade network and feature fusion according to the first training set to obtain the pedestrian detection model;

the cascade network includes: an anchor point refinement module and a target detection module;

performing iterative training on a deep convolutional neural network model based on cascade network and feature fusion according to the first training set to obtain the pedestrian detection model comprises:

inputting the first training set into the anchor point refinement module to calculate to obtain a first feature vector;

inputting the first feature vector into the target detection module to calculate to obtain a second feature vector;

respectively inputting the first feature vector and the second feature vector into a feature fusion module for calculation to obtain a first loss function and a second loss function;

superposing the first loss function and the second loss function, and calculating to obtain a loss function;

and selecting the model with the lowest loss function value as the pedestrian detection model.

2. The method of training a pedestrian detection model of claim 1 further comprising:

calculating from the positive and negative sample sets according to a preset proportion to obtain a first verification set;

and verifying the pedestrian detection model according to the first verification set.

3. The method of training a pedestrian detection model of claim 2 further comprising:

calculating a first test set from the positive and negative sample sets according to a preset proportion;

and testing the verified pedestrian detection model according to the first test set to obtain a test result.

4. A pedestrian detection method, characterized by comprising:

acquiring video data to be detected containing pedestrians;

converting the video data to be detected into an image sequence detection sample set;

inputting the image sequence detection sample set into a pedestrian detection model generated by training the method for training the pedestrian detection model according to any one of claims 1-3, so as to obtain a detection result.

5. A method of updating a pedestrian detection model, comprising:

acquiring detection results obtained by the pedestrian detection method according to claim 4 every preset time;

calculating a second training set according to the detection result;

obtaining an updated model from the second training set using the method of any one of claims 1-3;

judging the precision of the updated model and the pedestrian detection model;

when the precision of the pedestrian detection model is lower than that of the updating model, the updating model is used as a pedestrian detection model to detect the video data containing pedestrians;

and when the precision of the pedestrian detection model is higher than that of the updated model, detecting the video data containing the pedestrians according to the pedestrian detection model.

6. The method of updating a pedestrian detection model of claim 5 wherein computing a second training set based on the detection results comprises:

taking the result with the score higher than a preset value in the detection result as a positive sample set, and taking the result with the score lower than the preset value in the detection result as a negative sample set;

and calculating to obtain a second training set according to the positive sample set and the negative sample set.

7. The method of updating a pedestrian detection model of claim 5, wherein determining the accuracy of the updated model and the pedestrian detection model comprises:

calculating to obtain a second test set according to the video data to be detected containing pedestrians;

and judging the precision of the updating model and the pedestrian detection model according to a first test set and the second test set, wherein the first test set is obtained by the method for training the pedestrian detection model according to claim 3.

8. A computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-3, 4 or 5-7.

9. A pedestrian detection apparatus characterized by comprising: a memory and a processor in communication with each other, the memory storing computer instructions, the processor executing the computer instructions to perform the method of any of claims 1-3, 4, or 5-7.