CN110490058B - Training method, device and system of pedestrian detection model and computer readable medium - Google Patents


Info

Publication number
CN110490058B
CN110490058B
Authority
CN
China
Prior art keywords
training
detection
pedestrian
weight
detection frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910615436.4A
Other languages
Chinese (zh)
Other versions
CN110490058A (en)
Inventor
胡立
孙培泽
李伯勋
俞刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd
Priority claimed from application CN201910615436.4A
Publication of CN110490058A
Application granted
Publication of CN110490058B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition

Abstract

The invention provides a training method, device, and system for a pedestrian detection model, and a computer-readable medium. The training method comprises the following steps: inputting a training image into a neural network to generate prediction information about a target object in the training image, wherein the prediction information comprises a detection box position, a detection box weight, and a detection box score, the detection box weight represents the similarity between the target object in the detection box and the background, and the higher the similarity, the lower the detection box weight; calculating a first classification error between the detection box score and a score true value, and calculating a weighted classification error from the detection box weight and the first classification error; and updating network parameters of the neural network based at least on the weighted classification error. During training of the pedestrian detection model, the invention automatically reduces the detection box weight of pedestrian samples that resemble the background, thereby reducing the adverse effect of such confusable samples on the network parameters and greatly improving the accuracy of pedestrian detection.

Description

Training method, device and system of pedestrian detection model and computer readable medium
Technical Field
The invention relates to the technical field of pedestrian detection, and in particular to a training method, device, and system for a pedestrian detection model, and to a computer-readable medium.
Background
Pedestrian detection is widely used in fields such as security and autonomous driving, and aims to locate pedestrians in images or video. It is also the basis for many other visual tasks, such as pedestrian re-identification, pedestrian tracking, and pedestrian action recognition. Because real-world scenes often contain background objects that closely resemble pedestrians in appearance, a pedestrian detection system may produce false detections in such scenes, reducing detection accuracy. In security or autonomous-driving scenarios, erroneous detection results can have serious consequences, so a more accurate detection system is needed to reduce the interference of pedestrian-like background objects with pedestrian detection.
Disclosure of Invention
To address these problems, the invention provides a training scheme for a pedestrian detection model based on self-adjusting detection box weights. The scheme is briefly described below; more details are given in the detailed description with reference to the accompanying drawings.
According to an aspect of the embodiments of the present invention, there is provided a method for training a pedestrian detection model, the method including:
inputting a training image into a neural network to generate prediction information about a target object in the training image, wherein the prediction information comprises a detection box position, a detection box weight, and a detection box score, the detection box weight represents the similarity between the target object in the detection box and the background, and the higher the similarity, the lower the detection box weight;
calculating a first classification error between the detection box score and a score true value, and calculating a weighted classification error according to the detection box weight and the first classification error;
updating network parameters of the neural network based at least on the weighted classification errors.
In one embodiment, the weighted classification error is a product of the detection box weight and the first classification error.
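The product form above can be written out directly. The following minimal Python sketch assumes binary cross-entropy as the first classification error (the embodiment does not fix a particular loss function), and all sample values are illustrative:

```python
import math

def bce(score: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a predicted detection box score and its truth value."""
    score = min(max(score, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(score) + (1.0 - target) * math.log(1.0 - score))

def weighted_classification_error(box_weight: float, score: float, target: float) -> float:
    """First classification error scaled by the predicted detection box weight."""
    return box_weight * bce(score, target)

# A box whose content resembles the background receives a low weight,
# so its (possibly large) classification error contributes less to training.
hard_sample = weighted_classification_error(0.2, 0.4, 1.0)  # confusable sample
easy_sample = weighted_classification_error(0.9, 0.4, 1.0)  # clear sample
```

With the same raw classification error, the confusable sample's weighted error is smaller in proportion to its weight, which is exactly how the weight damps that sample's contribution to the gradient.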
In one embodiment, the method further comprises: calculating a second classification error between the detection frame weight and a weight true value; and updating the network parameter based on the second classification error.
In one embodiment, the method further comprises: calculating a position error between the position of the detection frame and a position true value; and updating the network parameter based on the location error.
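Taken together, the three error terms of these embodiments can be sketched as one combined objective. The example below is an assumption-laden illustration: binary cross-entropy for the two classification errors and smooth L1 for the position error are common choices, not mandated by the text, and the equal weighting of the three terms is hypothetical:

```python
import math

def bce(p: float, t: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy with clamping to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))

def smooth_l1(pred: float, target: float) -> float:
    """Smooth L1 distance, a common position-regression loss."""
    d = abs(pred - target)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def combined_loss(box: dict, gt: dict) -> float:
    # weighted classification error: box weight times the first classification error
    l_cls = box["weight"] * bce(box["score"], gt["score"])
    # second classification error: box weight against its truth value
    l_w = bce(box["weight"], gt["weight"])
    # position error: one term per box coordinate (x1, y1, x2, y2)
    l_pos = sum(smooth_l1(p, t) for p, t in zip(box["pos"], gt["pos"]))
    return l_cls + l_w + l_pos

box = {"weight": 1.0, "score": 0.99, "pos": (10.0, 20.0, 60.0, 180.0)}
gt = {"weight": 1.0, "score": 1.0, "pos": (10.0, 20.0, 60.0, 180.0)}
loss = combined_loss(box, gt)  # near-perfect prediction -> small total loss
```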
In one embodiment, generating the prediction information about the target object in the training image comprises: performing feature extraction on the training image with the neural network to generate a feature map of the training image; and generating the prediction information from the feature map.
According to another aspect of the embodiments of the present invention, there is provided a training device of a pedestrian detection model, including:
a prediction module, configured to input a training image to a neural network to generate prediction information about a target object in the training image, where the prediction information includes a detection box position, a detection box weight, and a detection box score, where the detection box weight represents similarity between the target object and a background in the detection box, and the higher the similarity is, the lower the detection box weight is;
the error calculation module is used for calculating a first classification error between the detection box score and the score true value and calculating a weighted classification error according to the detection box weight and the first classification error; and
a training module to update network parameters based at least on the weighted classification errors.
In one embodiment, the weighted classification error is a product of the detection box weight and the first classification error.
In one embodiment, the apparatus further comprises: the characteristic extraction module is used for extracting the characteristics of the training images on the basis of the neural network so as to generate a characteristic diagram of the training images; and the prediction module generates the prediction information according to the feature map.
According to a further aspect of the embodiments of the present invention, there is provided a training system for a pedestrian detection model, comprising a storage device and a processor, wherein the storage device stores a computer program to be executed by the processor, and the computer program, when executed by the processor, performs any one of the above training methods for a pedestrian detection model.
According to a further aspect of the embodiments of the present invention, there is provided a computer-readable medium having stored thereon a computer program which, when executed, performs the training method of the pedestrian detection model according to any one of the above.
The training method, apparatus, and system for a pedestrian detection model, and the computer-readable medium, automatically reduce the detection box weight of pedestrian samples that resemble the background during training, thereby reducing the adverse effect of confusable samples on the network parameters and greatly improving the accuracy of pedestrian detection.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following detailed description of the embodiments of the present invention when taken in conjunction with the accompanying drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 illustrates a schematic block diagram of an example electronic device for implementing a training method, apparatus, system and computer-readable medium for a pedestrian detection model in accordance with embodiments of the present invention;
FIG. 2 shows a schematic flow diagram of a method of training a pedestrian detection model in accordance with an embodiment of the invention;
FIG. 3 illustrates a framework diagram of a neural network of a training method of a pedestrian detection model, according to an embodiment of the present invention;
FIG. 4 shows a schematic block diagram of a training apparatus for a pedestrian detection model, according to an embodiment of the present invention; and
FIG. 5 shows a schematic block diagram of a training system for a pedestrian detection model, according to an embodiment of the invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, exemplary embodiments of the invention are described in detail below with reference to the accompanying drawings. The described embodiments are only a subset of the embodiments of the invention, not all of them, and the invention is not limited to the example embodiments described herein. All other embodiments obtained by a person skilled in the art from the embodiments described herein without inventive effort shall fall within the scope of protection of the invention.
First, an example electronic device 100 for implementing a pedestrian detection model training method, apparatus, system, and computer-readable medium according to an embodiment of the present invention is described with reference to fig. 1.
As shown in FIG. 1, electronic device 100 includes one or more processors 102, one or more memory devices 104, an input device 106, an output device 108, and an image sensor 110, which are interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and configuration of the electronic device 100 shown in FIG. 1 are exemplary only, and not limiting, and that the electronic device may have other components and configurations as desired.
The processor 102 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.
The storage 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disks, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 102 to implement the client functionality and/or other desired functionality of the embodiments of the invention described below. Various applications and data, such as data used and/or generated by those applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
The image sensor 110 may take images (e.g., photographs, videos, etc.) desired by the user and store the taken images in the storage device 104 for use by other components.
It should be noted that the components and structure of the electronic apparatus 100 shown in FIG. 1 are only exemplary; although the electronic apparatus 100 in FIG. 1 includes a plurality of different devices, some of them may be unnecessary and others may be more numerous, as required, and the invention is not limited in this respect.
Exemplary electronic devices for implementing the training method, recognition method, apparatus, and processing device of the pedestrian detection model according to embodiments of the present invention may be implemented as smart terminals such as smartphones and tablet computers.
Next, a training method 200 of a pedestrian detection model according to an embodiment of the invention will be described with reference to fig. 2.
As shown in fig. 2, in step S210, a training image is input to a neural network, and prediction information of a training target in the training image is obtained, where the prediction information includes a detection box position, a detection box weight, and a detection box score, where the detection box weight represents similarity between a target object in the detection box and a background, and the higher the similarity is, the lower the detection box weight is. Conversely, the lower the similarity, the higher the detection box weight.
The training image may be any captured image containing pedestrians. A training set may be constructed in advance, comprising a plurality of training images; typically each training image contains one or more pedestrians whose positions have been labeled beforehand with bounding boxes, referred to as real boxes. From this labeling, a position true value, a score true value, and a weight true value can be obtained for each detection box.
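The labeling step can be pictured with a minimal data structure. The dictionary layout below is purely illustrative (the patent prescribes no storage format), and it follows the embodiment in which the weight true value is taken equal to the score true value:

```python
# Each pedestrian in a training image is annotated with a real (ground-truth)
# box given as (x1, y1, x2, y2) pixel coordinates.
def make_truth_values(real_boxes):
    samples = []
    for box in real_boxes:
        samples.append({
            "pos": box,      # position true value
            "score": 1.0,    # score true value: a pedestrian is present
            "weight": 1.0,   # weight true value (here reused from the score truth)
        })
    return samples

truths = make_truth_values([(10, 20, 60, 180), (200, 15, 250, 170)])
```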
The neural network includes, but is not limited to, a convolutional neural network, and may be built by modifying any of various existing target detection networks, such as Faster R-CNN, RetinaNet, R-CNN, and Fast R-CNN.
The neural network specifically includes a feature extraction network and a pedestrian detection network. The feature extraction network is used for extracting features in the original image and outputting a feature map of the original image. The pedestrian detection network is used for carrying out pedestrian detection based on the characteristic diagram and outputting a detection result.
Specifically, the training image is first input into the neural network and feature extraction is performed to generate feature maps at multiple scales. The feature map of the target image may be obtained by applying a feature extraction algorithm such as HOG (Histogram of Oriented Gradients), LBP (Local Binary Patterns), or Haar-like features. In practice, the feature extraction network need not be built from scratch: a convolutional neural network pre-trained on an image classification task can be reused as the feature extraction network by removing its final fully connected classification layer. The structure of the feature extraction network and the specific manner of feature extraction are not limited herein.
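Of the classical feature extractors named above, HOG is the simplest to sketch. The toy example below computes a single gradient-orientation histogram over a whole image rather than per-cell blocks, so it is only a schematic of the idea, not a faithful HOG implementation:

```python
import numpy as np

def orientation_histogram(img: np.ndarray, bins: int = 9) -> np.ndarray:
    """Toy HOG-style descriptor: histogram of unsigned gradient orientations,
    weighted by gradient magnitude, over the whole image."""
    gy, gx = np.gradient(img.astype(float))          # image gradients (rows, cols)
    mag = np.hypot(gx, gy)                           # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    s = hist.sum()
    return hist / s if s > 0 else hist               # normalize to unit mass

img = np.zeros((16, 16))
img[:, 8:] = 1.0                  # vertical edge: all gradient lies at orientation 0
feat = orientation_histogram(img)
```

A vertical edge concentrates all gradient magnitude at orientation 0, so the first histogram bin dominates.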
Then, based on the feature map, prediction information about the target object is determined, specifically including the detection box position, which describes a rectangular box surrounding a pedestrian if one is present, the detection box weight, and the detection box score, which represents the probability that a pedestrian is present in the detection box. In one embodiment, the detection box weight may also be represented by a detection box score; see below for details.
As shown in fig. 3, an existing pedestrian detection model outputs only the position and score of each detection box during training. The embodiment of the present invention additionally outputs, for each detection box, a detection box weight that identifies the probability that the target object is present in the box, i.e., that represents the similarity between the target in the box and the background.
The prediction information about the target object can be obtained with any of various feasible target detection algorithms, which finally output the detection box position, the detection box weight, and the detection box score. Specifically, a typical detection model uses two classifiers to predict the detection box position and the detection box score respectively; for example, an RPN (Region Proposal Network) can simultaneously predict a target boundary and a target score at each position. In this embodiment, in addition to the detection box position, two identical or similar classifiers each compute the probability that a pedestrian is present in the detection box, one output serving as the detection box score and the other as the detection box weight.
Specifically, if the target in the detection box is highly similar to the background, it is easily confused, so the predicted probability that the target object is present in the box is low; conversely, if the similarity to the background is low, the predicted probability is high. Therefore, if the pedestrian at a detection box position resembles background objects, the network tends to output a low probability, which lowers the weight of that detection box and hence the weighted classification error computed subsequently; in other words, the sample's influence is reduced during training, limiting the adverse effect of confusable samples on the network parameters.
It will be appreciated that the detection box weight and the detection box score are both used to represent the probability that the target object is present at the location indicated by the detection box. Therefore, in actual training, similar or identical classifiers can be used to output the detection box weight and the detection box score respectively, that is, one of the probabilities output by the two classifiers is used as the detection box score, and the other is used as the detection box weight.
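The two parallel heads can be sketched as structurally identical linear-plus-sigmoid classifiers over a shared feature vector. The feature dimension and the random parameters below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Shared feature vector for one candidate detection box
# (e.g. pooled from the feature map); the dimension is illustrative.
feat = rng.normal(size=64)

# Two structurally identical linear + sigmoid heads with independent
# parameters: one produces the detection box score, the other the
# detection box weight.
w_score, b_score = rng.normal(size=64), 0.0
w_weight, b_weight = rng.normal(size=64), 0.0

box_score = sigmoid(feat @ w_score + b_score)
box_weight = sigmoid(feat @ w_weight + b_weight)
```

Both outputs are probabilities in (0, 1); in training, one is compared against the score true value and the other against the weight true value.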
In step S220, a first classification error between the box score and the score true value is calculated, and a weighted classification error is calculated according to the box weight and the first classification error.
As described above, the detection box weight represents the probability that a pedestrian is present in the detection box and thus indicates the similarity between the pedestrian in the box and the background; if the pedestrian at this position resembles background objects, the network tends to output a low probability, reducing the detection box weight. The weighted classification error combines the detection box weight and the first classification error, so when the weight is low, the resulting weighted classification error is small, reducing the influence of confusable samples on the training result. In one embodiment, the weighted classification error is the product of the detection box weight and the first classification error.
In addition, a second classification error between the detection box weight and a weight true value is calculated, and a position error between the detection box position and a position true value is calculated. When the detection box weight is represented by a detection box score, the weight true value can likewise be represented by a score true value. The second classification error is the training error of the detection box weight, and the position error is the training error of the detection box position.
In step S230, network parameters are updated based at least on the weighted classification errors.
Specifically, the network parameters of the initial neural network may be adjusted to minimize the weighted classification error as far as possible. The network parameters may include the weights of each layer of the neural network, the number of iterations, and so on. The network parameters may additionally be updated based on the second classification error and the position error.
The higher the similarity between the pedestrian and the background in a detection box, the lower the detection box weight, the smaller the resulting weighted classification error, and the smaller the influence of that pedestrian sample on the training result.
Specifically, end-to-end training may be performed with back-propagation and an optimization algorithm such as stochastic gradient descent (SGD) to optimize each parameter of the model. After each training image is processed, whether the end-of-training condition is met can be checked; if so, training ends and the current network parameters are used as the parameters of the trained pedestrian detection model. Otherwise, the process returns to S210 and training continues. The end-of-training condition may include exhausting the training images in the training set, convergence of the loss function, and so on.
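The training loop, including the weighted error and an end-of-training check, can be sketched on a toy one-parameter model. Everything here (the one-dimensional "network", the fixed per-sample weights, the learning rate) is a hypothetical stand-in for the real detector; it only demonstrates how a low detection box weight damps a confusable sample's pull on the parameters:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy "network": one parameter theta scoring a 1-D sample x as sigmoid(theta * x).
# Each sample carries a fixed detection box weight; the confusable sample
# (low weight) pulls on theta less strongly.
samples = [  # (x, score truth, box weight)
    (1.0, 1.0, 1.0),     # clear pedestrian
    (-1.0, 0.0, 1.0),    # clear background
    (-0.8, 1.0, 0.1),    # pedestrian resembling the background: down-weighted
]

theta, lr = 0.0, 0.5
prev_loss = float("inf")
for step in range(1000):
    grad, loss = 0.0, 0.0
    for x, t, w in samples:
        p = sigmoid(theta * x)
        loss += -w * (t * math.log(p) + (1 - t) * math.log(1 - p))
        grad += w * (p - t) * x          # d/dtheta of the weighted BCE term
    theta -= lr * grad                   # SGD-style parameter update
    if abs(prev_loss - loss) < 1e-9:     # end-of-training condition: loss converged
        break
    prev_loss = loss
```

Despite the third sample contradicting the first two, its low weight keeps it from flipping the learned parameter.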
The training method of the pedestrian detection model according to the embodiment of the invention is exemplarily described above. Illustratively, the training method of the pedestrian detection model according to the embodiment of the present invention may be implemented in an apparatus, a device or a system having a memory and a processor.
In addition, the training method of the pedestrian detection model according to the embodiment of the invention can be conveniently deployed on mobile equipment such as a smart phone, a tablet computer and a personal computer. Alternatively, the training method of the pedestrian detection model according to the embodiment of the invention may also be deployed at a server side (or a cloud side). Alternatively, the training method of the pedestrian detection model according to the embodiment of the invention can also be distributively deployed at a server side (or a cloud side) and a personal terminal side.
Based on the above description, the training method according to the embodiment of the invention automatically reduces the detection frame weight of the pedestrian sample similar to the background in the training process of the pedestrian detection model, thereby reducing the adverse effect of the confusable sample on the network parameters and greatly improving the accuracy of pedestrian detection.
The above describes exemplary steps included in the training method of the pedestrian detection model according to the embodiment of the present invention.
The following describes a training apparatus for a pedestrian detection model according to another aspect of the present invention with reference to fig. 4. Fig. 4 shows a schematic block diagram of a training apparatus 400 of a pedestrian detection model according to an embodiment of the present invention.
As shown in fig. 4, the training apparatus 400 for a pedestrian detection model according to an embodiment of the present invention includes a prediction module 410, an error calculation module 420, and a training module 430. These modules respectively perform the steps/functions of the training method of the pedestrian detection model described above in connection with fig. 2.
The prediction module 410 is configured to input a training image to a neural network, to obtain prediction information of a training target in the training image, where the prediction information includes a detection box position, a detection box weight, and a detection box score, where the detection box weight represents similarity between a target object in the detection box and a background, and the higher the similarity is, the lower the detection box weight is. Conversely, the lower the similarity, the higher the detection box weight.
The training image may be any captured image containing pedestrians. A training set may be constructed in advance, comprising a plurality of training images; typically each training image contains one or more pedestrians whose positions have been labeled beforehand with bounding boxes, referred to as real boxes. From this labeling, a position true value, a score true value, and a weight true value of the detection box can be obtained.
The neural network includes, but is not limited to, a convolutional neural network, and may be built by modifying any of various existing target detection networks, such as Faster R-CNN, RetinaNet, R-CNN, and Fast R-CNN.
The neural network specifically comprises a feature extraction network and a pedestrian detection network. The feature extraction network is used for extracting features in the original image and outputting a feature map of the original image. The pedestrian detection network is used for carrying out pedestrian detection based on the characteristic diagram and outputting a detection result.
Specifically, the training image is first input into the neural network and feature extraction is performed to generate feature maps at multiple scales. The feature map of the target image may be obtained by applying a feature extraction algorithm such as HOG (Histogram of Oriented Gradients), LBP (Local Binary Patterns), or Haar-like features. In practice, the feature extraction network need not be built from scratch: a convolutional neural network pre-trained on an image classification task can be reused as the feature extraction network by removing its final fully connected classification layer. The structure of the feature extraction network and the specific manner of feature extraction are not limited herein.
Then, based on the feature map, prediction information about the target object is determined, specifically including a detection frame position of the target object, which represents a rectangular frame surrounding a pedestrian if present, a detection frame weight, and a detection frame score representing a probability that the pedestrian is present in the detection frame. In one embodiment, the detection box weight may also be represented by a detection box score, see below for details.
As shown in fig. 3, an existing pedestrian detection model outputs only the position and score of each detection box during training. The embodiment of the present invention additionally outputs, for each detection box, a detection box weight that identifies the probability that the target object is present in the box, i.e., that represents the similarity between the target in the box and the background.
The prediction information about the target object can be obtained with any of various feasible target detection algorithms, which finally output the detection box position, the detection box weight, and the detection box score. Specifically, a typical detection model uses two classifiers to predict the detection box position and the detection box score respectively; for example, an RPN (Region Proposal Network) can simultaneously predict a target boundary and a target score at each position. In this embodiment, in addition to the detection box position, two identical or similar classifiers each compute the probability that a pedestrian is present in the detection box, one output serving as the detection box score and the other as the detection box weight.
Specifically, if the target in the detection box is highly similar to the background, it is easily confused, so the predicted probability that the target object is present in the box is low; conversely, if the similarity to the background is low, the predicted probability is high. Therefore, if the pedestrian at a detection box position resembles background objects, the network tends to output a low probability, which lowers the weight of that detection box and hence the weighted classification error computed subsequently; in other words, the sample's influence is reduced during training, limiting the adverse effect of confusable samples on the network parameters.
It will be appreciated that the detection box weight and the detection box score are both used to represent the probability that the target object is present at the location indicated by the detection box. Therefore, in actual training, similar or identical classifiers can be used to output the detection box weight and the detection box score respectively, that is, one of the probabilities output by the two classifiers is used as the detection box score, and the other is used as the detection box weight.
The error calculation module 420 is configured to calculate a first classification error between the box score and the score true value, and calculate a weighted classification error according to the box weight and the first classification error.
As described above, the detection frame weight represents the probability that a pedestrian exists in the detection frame, and thus indicates the similarity of the pedestrian in the detection frame to the background; if the pedestrian at this position resembles the background clutter, the network tends to output a low probability, so that the weight of the detection frame is reduced. The weighted classification error combines two factors, the detection frame weight and the first classification error: when the detection frame weight is low, the calculated weighted classification error is small, which reduces the influence of confusable samples on the training result. In one embodiment, the weighted classification error is the product of the detection box weight and the first classification error.
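The weighted classification error of this paragraph can be sketched as follows, taking the product form mentioned in the embodiment and assuming binary cross-entropy as the first classification error (the patent does not fix a particular classification loss); all numeric values are illustrative:

```python
import numpy as np

def binary_cross_entropy(p, y, eps=1e-7):
    # Per-sample cross-entropy between predicted probability p and label y.
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Two positive pedestrian samples: the second sits in a region that
# resembles the background, so the network assigns it a low box weight.
box_scores  = np.array([0.9, 0.6])   # predicted detection-box scores
score_truth = np.array([1.0, 1.0])   # score ground truth (both pedestrians)
box_weights = np.array([0.95, 0.2])  # predicted detection-box weights

first_cls_error = binary_cross_entropy(box_scores, score_truth)
weighted_cls_error = box_weights * first_cls_error  # product form
```

The confusable second sample contributes far less to the training signal than its raw classification error would.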
In addition, a second classification error between the detection box weight and the weight true value is calculated, and a position error between the detection box position and the position true value is calculated. When the detection box weight is represented by the detection box score, the weight true value can likewise be represented by the score true value. The second classification error is the training error of the detection frame weight, and the position error is the training error of the detection frame position.
The training module 430 is configured to update network parameters based at least on the weighted classification errors.
In particular, the network parameters of the initial neural network may be adjusted so as to minimize the value of the weighted classification error. The network parameters may include the weights of each layer of the neural network, the number of iterations, and the like. In addition, the network parameters may also be updated based on the second classification error and the position error.
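One plausible way to combine the three error terms into a single training objective is a plain sum, assuming a smooth-L1 position error; neither the sum nor the smooth-L1 choice is specified by the text, and all values below are illustrative:

```python
import numpy as np

def smooth_l1(pred, target):
    # Elementwise smooth-L1 (Huber-style) loss, summed over coordinates.
    d = np.abs(pred - target)
    return np.where(d < 1.0, 0.5 * d * d, d - 0.5).sum()

# Illustrative scalar error terms for one sample.
weighted_cls_error = 0.02   # box weight * first classification error
second_cls_error = 0.10     # error between box weight and weight truth

pred_box = np.array([10.0, 12.0, 50.0, 80.0])
truth_box = np.array([11.0, 12.5, 49.0, 82.0])
position_error = smooth_l1(pred_box, truth_box)

# Assumed unweighted sum; real implementations often balance the terms.
total_loss = weighted_cls_error + second_cls_error + position_error
```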
The higher the similarity between the pedestrian and the background in the detection frame, the lower the detection frame weight, the smaller the resulting weighted classification error, and the smaller the influence of the pedestrian sample at that position on the training result.
Specifically, end-to-end training may be performed by back-propagation (BP) with an optimizer such as stochastic gradient descent (SGD) to optimize each parameter in the model. After each training image is processed, it can be judged whether a training end condition is met; if so, training ends and the network parameters at that time can be used as the parameters of the trained pedestrian detection model; if not, training continues. The training end conditions may include that the training images in the training set have been exhausted, that the loss function has converged, and so on.
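A minimal sketch of such a training loop with both end conditions, using plain gradient descent on a weighted binary cross-entropy; the toy data, the fixed per-sample weights (standing in for the predicted detection-box weights), and the convergence threshold are all assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)

# Toy data: 2-D features, label 1 = pedestrian, 0 = background.
X = rng.normal(size=(64, 2)) + 1.0
y = (X.sum(axis=1) > 2.0).astype(float)

# Fixed weights standing in for predicted detection-box weights; in the
# embodiment these would come from the weight classifier itself.
sample_weight = np.where(y > 0, 0.9, 1.0)

w, b, lr, eps = np.zeros(2), 0.0, 0.5, 1e-7
losses = []
for epoch in range(200):                          # end: training set exhausted
    p = sigmoid(X @ w + b)
    loss = -(sample_weight * (y * np.log(p + eps)
                              + (1 - y) * np.log(1 - p + eps))).mean()
    if losses and abs(losses[-1] - loss) < 1e-6:  # end: loss converged
        losses.append(loss)
        break
    losses.append(loss)
    grad = sample_weight * (p - y)                # gradient of weighted BCE
    w -= lr * (X.T @ grad) / len(y)
    b -= lr * grad.mean()
```

The per-sample weight scales the gradient as well as the loss, so down-weighted samples move the parameters less.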
The above exemplarily describes the training apparatus of the pedestrian detection model according to the embodiment of the invention. Illustratively, the training apparatus of the pedestrian detection model according to the embodiment of the present invention may be implemented in a device, an apparatus, or a system having a memory and a processor.
In addition, the training device of the pedestrian detection model according to the embodiment of the invention can be conveniently deployed on a terminal device such as a smart phone, a tablet computer, or a personal computer. Alternatively, the training device of the pedestrian detection model according to the embodiment of the present invention may be deployed at a server side (or a cloud side). Alternatively, the training device of the pedestrian detection model according to the embodiment of the present invention may also be deployed in a distributed manner across the server side (or cloud side) and the personal terminal side.
Based on the above description, the training device according to the embodiment of the invention automatically reduces the detection frame weight of the pedestrian sample similar to the background in the training process of the pedestrian detection model, thereby reducing the adverse effect of the confusable sample on the network parameters and greatly improving the accuracy of pedestrian detection.
FIG. 5 shows a schematic block diagram of a training system 500 for a pedestrian detection model, according to an embodiment of the invention. The training system 500 for pedestrian detection models includes a memory device 510 and a processor 520.
The storage device 510 stores therein program codes for implementing respective steps in the training method of the pedestrian detection model according to the embodiment of the invention. The processor 520 is configured to run the program codes stored in the storage device 510 to perform the corresponding steps of the training method of the pedestrian detection model according to the embodiment of the present invention, and is configured to implement the corresponding modules in the training device of the pedestrian detection model according to the embodiment of the present invention.
In one embodiment, the program code when executed by the processor 520 causes the training system 500 for a pedestrian detection model to perform the steps of:
inputting a training image into a neural network to generate prediction information about a target object in the training image, wherein the prediction information comprises a detection frame position, a detection frame weight and a detection frame score, the detection frame weight represents the similarity of the target object in the detection frame and a background, and the higher the similarity is, the lower the detection frame weight is;
calculating a first classification error between the detection box score and a score true value, and calculating a weighted classification error according to the detection box weight and the first classification error;
updating network parameters of the neural network based at least on the weighted classification errors.
In one embodiment, the weighted classification error is a product of the detection box weight and the first classification error.
In one embodiment, the program code when executed by the processor 520 further causes the training system 500 for a pedestrian detection model to perform: calculating a second classification error between the detection frame weight and a weight true value; and updating the network parameter based on the second classification error.
In one embodiment, the program code, when executed by the processor 520, further causes the training system 500 for the pedestrian detection model to perform: calculating a position error between the position of the detection frame and a position true value; and updating the network parameter based on the location error.
In one embodiment, the generating the prediction information about the target object in the training image comprises: performing feature extraction on the training image based on the neural network to generate a feature map of the training image; and generating the prediction information according to the feature map.
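A toy sketch of this feature-extraction-then-prediction pipeline; the single-channel convolution, ReLU, and per-position sigmoid "heads" are illustrative stand-ins for a real backbone and the score/weight classifiers:

```python
import numpy as np

def conv2d_valid(img, kernel):
    # Naive valid-mode 2-D correlation, enough for a toy feature map.
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
image = rng.normal(size=(8, 8))    # toy single-channel "training image"
kernel = rng.normal(size=(3, 3))   # stands in for the backbone filters

feature_map = np.maximum(conv2d_valid(image, kernel), 0.0)  # conv + ReLU

# Per-position heads on the feature map: score and weight maps.
score_map = sigmoid(0.7 * feature_map - 0.1)
weight_map = sigmoid(0.5 * feature_map + 0.2)
```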
Further, according to an embodiment of the present invention, there is also provided a storage medium on which program instructions are stored; when executed by a computer or a processor, the program instructions perform the respective steps of the training method of a pedestrian detection model according to an embodiment of the present invention and implement the respective modules in the training apparatus of a pedestrian detection model according to an embodiment of the present invention. The storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
In one embodiment, the computer program instructions may, when executed by a computer, implement the functional modules of the training apparatus of a pedestrian detection model according to an embodiment of the present invention, and/or may perform the training method of a pedestrian detection model according to an embodiment of the present invention.
In one embodiment, the computer program instructions, when executed by a computer or processor, cause the computer or processor to perform the steps of:
inputting a training image to a neural network to generate prediction information about a target object in the training image, wherein the prediction information comprises a detection frame position, a detection frame weight and a detection frame score, the detection frame weight represents the similarity of the target object in the detection frame and a background, and the higher the similarity is, the lower the detection frame weight is;
calculating a first classification error between the detection box score and a score true value, and calculating a weighted classification error according to the detection box weight and the first classification error;
updating network parameters of the neural network based at least on the weighted classification errors.
In one embodiment, the weighted classification error is a product of the detection box weight and the first classification error.
In one embodiment, the computer program instructions, when executed by a computer or processor, further cause the computer or processor to perform: calculating a second classification error between the detection frame weight and a weight true value; and updating the network parameter based on the second classification error.
In one embodiment, the computer program instructions, when executed by a computer or processor, further cause the computer or processor to perform: calculating a position error between the position of the detection frame and a position truth value; and updating the network parameter based on the location error.
In one embodiment, the generating the prediction information about the target object in the training image comprises: performing feature extraction on the training image based on the neural network to generate a feature map of the training image; and generating the prediction information according to the feature map.
According to the training method, the training device, the training system and the computer readable medium of the pedestrian detection model, the weight of the detection frame of the pedestrian sample similar to the background is automatically reduced in the training process of the pedestrian detection model, so that the adverse effect of the confusable sample on the network parameters is reduced, and the accuracy of pedestrian detection is greatly improved.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the foregoing illustrative embodiments are merely exemplary and are not intended to limit the scope of the invention thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the method of the present invention should not be construed to reflect the intent: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where such features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Moreover, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments, not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or other suitable processor may be used in practice to implement some or all of the functionality of some of the modules according to embodiments of the invention. The present invention may also be embodied as apparatus programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
The above description is only for the specific embodiment of the present invention or the description thereof, and the protection scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of training a pedestrian detection model, the method comprising:
inputting a training image to a neural network to generate prediction information about a target object in the training image, wherein the prediction information comprises a detection frame position, a detection frame weight and a detection frame score, the detection frame weight represents the similarity of the target object in the detection frame and a background, and the higher the similarity is, the lower the detection frame weight is;
calculating a first classification error between the detection box score and a score true value, and calculating a weighted classification error according to the detection box weight and the first classification error;
updating network parameters of the neural network based at least on the weighted classification errors.
2. The method for training a pedestrian detection model according to claim 1, wherein the weighted classification error is a product of the detection box weight and the first classification error.
3. The training method of the pedestrian detection model according to claim 1, characterized by further comprising:
calculating a second classification error between the detection box weight and a weight true value; and
updating the network parameter based on the second classification error.
4. The training method of the pedestrian detection model according to claim 1, characterized by further comprising:
calculating a position error between the position of the detection frame and a position true value; and
updating the network parameter based on the location error.
5. The training method of the pedestrian detection model according to claim 1, wherein the generating of the prediction information about the target object in the training image includes:
performing feature extraction on the training image based on the neural network to generate a feature map of the training image;
and generating the prediction information according to the feature map.
6. A training device for a pedestrian detection model, characterized by comprising:
a prediction module, configured to input a training image to a neural network to generate prediction information about a target object in the training image, where the prediction information includes a detection box position, a detection box weight, and a detection box score, where the detection box weight represents similarity between the target object and a background in the detection box, and the higher the similarity is, the lower the detection box weight is;
the error calculation module is used for calculating a first classification error between the detection box score and the score true value and calculating a weighted classification error according to the detection box weight and the first classification error; and
a training module to update network parameters of the neural network based at least on the weighted classification errors.
7. The training apparatus for a pedestrian detection model according to claim 6, wherein the weighted classification error is a product of the detection box weight and the first classification error.
8. The training device of the pedestrian detection model according to claim 6, further comprising:
the feature extraction module is used for performing feature extraction on the training image based on the neural network to generate a feature map of the training image; and
the prediction module generates the prediction information according to the feature map.
9. A training system of a pedestrian detection model, characterized in that the training system of the pedestrian detection model comprises a storage device and a processor, the storage device having stored thereon a computer program to be run by the processor, the computer program, when being run by the processor, performing the training method of a pedestrian detection model according to any one of claims 1-5.
10. A computer-readable medium, characterized in that a computer program is stored on the computer-readable medium, which computer program, when running, performs a training method of a pedestrian detection model according to any one of claims 1-5.
CN201910615436.4A 2019-07-09 2019-07-09 Training method, device and system of pedestrian detection model and computer readable medium Active CN110490058B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910615436.4A CN110490058B (en) 2019-07-09 2019-07-09 Training method, device and system of pedestrian detection model and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910615436.4A CN110490058B (en) 2019-07-09 2019-07-09 Training method, device and system of pedestrian detection model and computer readable medium

Publications (2)

Publication Number Publication Date
CN110490058A CN110490058A (en) 2019-11-22
CN110490058B true CN110490058B (en) 2022-07-26

Family

ID=68546867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910615436.4A Active CN110490058B (en) 2019-07-09 2019-07-09 Training method, device and system of pedestrian detection model and computer readable medium

Country Status (1)

Country Link
CN (1) CN110490058B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091091A (en) * 2019-12-16 2020-05-01 北京迈格威科技有限公司 Method, device and equipment for extracting target object re-identification features and storage medium
CN112001321A (en) * 2020-08-25 2020-11-27 商汤国际私人有限公司 Network training method, pedestrian re-identification method, network training device, pedestrian re-identification device, electronic equipment and storage medium
CN112863187B (en) * 2021-01-18 2022-04-15 阿波罗智联(北京)科技有限公司 Detection method of perception model, electronic equipment, road side equipment and cloud control platform

Citations (3)

Publication number Priority date Publication date Assignee Title
CN107944369A (en) * 2017-11-17 2018-04-20 大连大学 A kind of pedestrian detection method based on tandem zones generation network and enhancing random forest
CN108446662A (en) * 2018-04-02 2018-08-24 电子科技大学 A kind of pedestrian detection method based on semantic segmentation information
WO2019071739A1 (en) * 2017-10-13 2019-04-18 平安科技(深圳)有限公司 Face living body detection method and apparatus, readable storage medium and terminal device

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN106845430A (en) * 2017-02-06 2017-06-13 东华大学 Pedestrian detection and tracking based on acceleration region convolutional neural networks
CN108038409B (en) * 2017-10-27 2021-12-28 江西高创保安服务技术有限公司 Pedestrian detection method
CN107909027B (en) * 2017-11-14 2020-08-11 电子科技大学 Rapid human body target detection method with shielding treatment
CN108563977A (en) * 2017-12-18 2018-09-21 华南理工大学 A kind of the pedestrian's method for early warning and system of expressway entrance and exit
CN108875903B (en) * 2018-01-02 2022-04-12 北京迈格威科技有限公司 Image detection method, device, system and computer storage medium
CN108875770B (en) * 2018-02-06 2021-11-19 北京迈格威科技有限公司 Pedestrian detection false alarm data labeling method, device, system and storage medium

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
WO2019071739A1 (en) * 2017-10-13 2019-04-18 平安科技(深圳)有限公司 Face living body detection method and apparatus, readable storage medium and terminal device
CN107944369A (en) * 2017-11-17 2018-04-20 大连大学 A kind of pedestrian detection method based on tandem zones generation network and enhancing random forest
CN108446662A (en) * 2018-04-02 2018-08-24 电子科技大学 A kind of pedestrian detection method based on semantic segmentation information

Non-Patent Citations (2)

Title
Pedestrian Detection for Transformer Substation Based on Gaussian Mixture Model and YOLO;Qiwei Peng, etc.;《2016 8th International Conference on Intelligent Human-Machine Systems and Cybernetics》;20160828;全文 *
Research on pedestrian detection method based on YOLOv2; Liu Jianguo et al.; 《数字制造科学》 (Digital Manufacturing Science); 20180315 (Issue 01); full text *

Also Published As

Publication number Publication date
CN110490058A (en) 2019-11-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant