CN112052818A - Unsupervised domain adaptive pedestrian detection method, unsupervised domain adaptive pedestrian detection system and storage medium - Google Patents


Info

Publication number: CN112052818A (granted publication: CN112052818B)
Application number: CN202010968987.1A
Authority: CN (China)
Legal status: Granted; currently active
Prior art keywords: pedestrian, pedestrian detection, image data, network, data
Other languages: Chinese (zh)
Inventor: 谭宇志
Current and original assignee: Zhejiang Smart Video Security Innovation Center Co Ltd
Application filed by: Zhejiang Smart Video Security Innovation Center Co Ltd

Classifications

    • G06V 40/23: Recognition of whole body movements, e.g. for sport training
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06V 10/40: Extraction of image or video features
    • G06V 20/53: Recognition of crowd images, e.g. recognition of crowd congestion
    • G06V 40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition


Abstract

The embodiments of the present application provide an unsupervised domain adaptive pedestrian detection method, system and storage medium. The method only requires collecting unlabeled image data in a new scene and does not require large-scale image annotation, which greatly reduces the manpower and material cost of data annotation during development and improves efficiency; it also improves the transfer capability of the model, making it better able to adapt to scene changes.

Description

Unsupervised domain adaptive pedestrian detection method, unsupervised domain adaptive pedestrian detection system and storage medium
Technical Field
The present application belongs to the technical field of pedestrian detection, and in particular relates to an unsupervised domain adaptive pedestrian detection method, system and storage medium.
Background
As pedestrian detection is increasingly used in fields such as intelligent security and autonomous driving, its application scenes are becoming more and more varied. However, lighting conditions, background, camera angle and so on all differ between scenes, which means the data distribution usually differs from scene to scene. Since the practical application scenes of pedestrian detection are so diverse, directly applying a network model trained in one scene to another scene causes a large drop in detection performance. Deep learning methods rely on large amounts of labeled data to improve the generalization performance of a network model, so the prior art mainly re-labels a large amount of data for each scene in a data-driven manner and then retrains the network on the newly labeled data to obtain a network model for the new scene.
Specifically, existing methods mainly handle this problem with supervised transfer learning: new data are manually annotated in the new scene, and the original model is fine-tuned on the newly annotated data to achieve model transfer. The concrete procedure is: 1. collect data in the target domain scene; 2. manually annotate the collected data; 3. fine-tune the model already trained on the source domain using the new data and its label information; 4. perform target-domain object detection with the fine-tuned model.
However, although this approach can adapt the network model to a new scene, each scene switch generates a large amount of new scene data that must be annotated, which consumes a great deal of manpower and material resources and brings great inconvenience to applications and developers. In addition, as scenes keep changing, the transfer capability of the model gradually deteriorates. Therefore, the above method of fine-tuning on labeled data hits a serious bottleneck in practical applications, and a new pedestrian detection algorithm is urgently needed to solve these problems.
Disclosure of Invention
The present invention provides an unsupervised domain adaptive pedestrian detection method, system and storage medium, aiming to solve the problem in the prior art that fine-tuning on labeled data requires manually annotating a large amount of data in each new scene, which is time-consuming and labor-intensive.
According to a first aspect of embodiments of the present application, there is provided an unsupervised domain adaptive pedestrian detection method, comprising the steps of:
randomly selecting labeled image data and unlabeled image data;
applying random data enhancement to the labeled image data to obtain enhanced labeled image data; applying random data enhancement to the unlabeled image data twice to obtain first enhanced unlabeled image data and second enhanced unlabeled image data respectively;
inputting the enhanced labeled image data to a first pedestrian detection network to obtain first pedestrian prediction features; inputting the first enhanced unlabeled image data to the first pedestrian detection network to obtain second pedestrian prediction features; inputting the second enhanced unlabeled image data to a second pedestrian detection network to obtain third pedestrian prediction features;
obtaining a supervised learning cost according to the label features of the labeled image data and the first pedestrian prediction features; obtaining a consistency cost according to the second pedestrian prediction features and the third pedestrian prediction features;
adding the supervised learning cost and the consistency cost to obtain a total cost;
updating the weight parameters of the first pedestrian detection network by a stochastic gradient descent algorithm according to the total cost; and
updating the weight parameters of the second pedestrian detection network by an exponential moving average algorithm according to the weight parameters of the first pedestrian detection network.
In some embodiments of the present application, the unsupervised domain adaptive pedestrian detection method further comprises:
repeating the above steps until the first pedestrian detection network and the second pedestrian detection network converge, obtaining an updated first pedestrian detection network and an updated second pedestrian detection network;
and inputting unlabeled image data to be detected into the updated first pedestrian detection network to obtain a pedestrian detection result.
In some embodiments of the present application, the labeled image data is randomly selected from image data whose labels are known; the unlabeled image data is randomly selected from the unlabeled image data to be detected.
In some embodiments of the present application, the first, second and third pedestrian prediction features include the size information, classification information and position information of pedestrians in the image.
In some embodiments of the present application, the supervised learning cost and the consistency cost include a pedestrian classification loss, a pedestrian center point offset loss, and a pedestrian bounding box width and height loss.
In some embodiments of the present application, the first and second pedestrian detection networks initially employ the same neural network architecture.
In some embodiments of the present application, the random data enhancement includes random enhancement of the image size or pixels.
In some embodiments of the present application, the weight parameters of the second pedestrian detection network are computed as an exponential moving average of the weight parameters of the first pedestrian detection network during training.
According to a second aspect of the embodiments of the present application, there is provided an unsupervised domain adaptive pedestrian detection system, specifically comprising:
a training data selection module, configured to randomly select labeled image data and unlabeled image data;
a data enhancement module, configured to apply random data enhancement to the labeled image data to obtain enhanced labeled image data, and to apply random data enhancement to the unlabeled image data twice to obtain first enhanced unlabeled image data and second enhanced unlabeled image data respectively;
a feature prediction network module, configured to input the enhanced labeled image data to a first pedestrian detection network to obtain first pedestrian prediction features, to input the first enhanced unlabeled image data to the first pedestrian detection network to obtain second pedestrian prediction features, and to input the second enhanced unlabeled image data to a second pedestrian detection network to obtain third pedestrian prediction features;
a supervised learning cost module, configured to obtain a supervised learning cost according to the label features of the labeled image data and the first pedestrian prediction features;
a consistency cost module, configured to obtain a consistency cost according to the second pedestrian prediction features and the third pedestrian prediction features;
a total cost module, configured to add the supervised learning cost and the consistency cost to obtain a total cost;
a first pedestrian detection network update module, configured to update the weight parameters of the first pedestrian detection network by a stochastic gradient descent algorithm according to the total cost; and
a second pedestrian detection network update module, configured to update the weight parameters of the second pedestrian detection network by an exponential moving average algorithm according to the weight parameters of the first pedestrian detection network.
In some embodiments of the present application, the unsupervised domain adaptive pedestrian detection system further comprises:
a training convergence module, configured to repeat the above steps until the first pedestrian detection network and the second pedestrian detection network converge, obtaining an updated first pedestrian detection network and an updated second pedestrian detection network; and
a pedestrian detection module, configured to input unlabeled image data to be detected into the updated first pedestrian detection network to obtain a pedestrian detection result.
According to a third aspect of the embodiments of the present application, there is provided a computer-readable storage medium having a computer program stored thereon; the computer program, when executed by a processor, implements the unsupervised domain adaptive pedestrian detection method.
With the unsupervised domain adaptive pedestrian detection method, system and storage medium of the embodiments of the present application, labeled image data and unlabeled image data are randomly selected; random data enhancement is applied to the labeled image data to obtain enhanced labeled image data, and random data enhancement is applied to the unlabeled image data twice to obtain first enhanced unlabeled image data and second enhanced unlabeled image data respectively; the enhanced labeled image data is input to a first pedestrian detection network to obtain first pedestrian prediction features, the first enhanced unlabeled image data is input to the first pedestrian detection network to obtain second pedestrian prediction features, and the second enhanced unlabeled image data is input to a second pedestrian detection network to obtain third pedestrian prediction features; a supervised learning cost is obtained according to the label features of the labeled image data and the first pedestrian prediction features, and a consistency cost is obtained according to the second and third pedestrian prediction features; the supervised learning cost and the consistency cost are added to obtain a total cost; the weight parameters of the first pedestrian detection network are updated by a stochastic gradient descent algorithm according to the total cost; and the weight parameters of the second pedestrian detection network are updated by an exponential moving average algorithm according to the weight parameters of the first pedestrian detection network. The method strengthens the transfer capability of the model through transfer learning: by unsupervised domain adaptation, the model is trained jointly on the labeled data of the existing scene and the unlabeled data of the new scene, so that the representation ability learned on existing-scene data can be transferred to new-scene data. Only unlabeled image data in the new scene needs to be collected, and no large-scale image annotation is required, which greatly reduces the manpower and material cost of data annotation during development, improves efficiency, improves the transfer capability of the model, and makes it better able to adapt to scene changes.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart illustrating the steps of an unsupervised domain adaptive pedestrian detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating the steps of an unsupervised domain adaptive pedestrian detection method according to another embodiment of the present application;
FIG. 3 is a flow diagram of an unsupervised domain adaptive pedestrian detection method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an unsupervised domain adaptive pedestrian detection system according to an embodiment of the present application.
Detailed Description
In the process of implementing the present application, the inventor found that existing approaches to handling pedestrian detection data in a new scene use supervised transfer learning: image data in the new scene are manually annotated to obtain new data, and the detection network of the original scene is fine-tuned on them. As scenes keep changing, the transfer capability of the model gradually deteriorates, the detection results become inaccurate, and the large amount of manual data annotation is time-consuming and labor-intensive.
To address the problems that the transfer capability of the model gradually deteriorates and that labeled data is costly, the present application provides an unsupervised domain adaptive transfer learning method that only needs to collect unlabeled data in the target scene and does not require manually annotating new data.
The present application constructs two identical pedestrian detection network models, one serving as a student model and the other as a teacher model. The teacher network generates pseudo labels from the target domain data, and the generated pseudo labels guide the student network that is to be trained. By making the output of the student network as close as possible to the output of the teacher network, the student network becomes better adapted to the distribution of the target domain data.
With the unsupervised domain adaptive pedestrian detection method, system and storage medium of the present application, labeled image data and unlabeled image data are randomly selected; random data enhancement is applied to the labeled image data to obtain enhanced labeled image data, and random data enhancement is applied to the unlabeled image data twice to obtain first enhanced unlabeled image data and second enhanced unlabeled image data respectively; the enhanced labeled image data is input to a first pedestrian detection network to obtain first pedestrian prediction features, the first enhanced unlabeled image data is input to the first pedestrian detection network to obtain second pedestrian prediction features, and the second enhanced unlabeled image data is input to a second pedestrian detection network to obtain third pedestrian prediction features; a supervised learning cost is obtained according to the label features of the labeled image data and the first pedestrian prediction features, and a consistency cost is obtained according to the second and third pedestrian prediction features; the supervised learning cost and the consistency cost are added to obtain a total cost; the weight parameters of the first pedestrian detection network are updated by a stochastic gradient descent algorithm according to the total cost; and the weight parameters of the second pedestrian detection network are updated by an exponential moving average algorithm according to the weight parameters of the first pedestrian detection network. The method strengthens the transfer capability of the model through transfer learning: by unsupervised domain adaptation, the model is trained jointly on the labeled data of the existing scene and the unlabeled data of the new scene, so that the representation ability learned on existing-scene data can be transferred to new-scene data. Only unlabeled image data in the new scene needs to be collected, and no large-scale re-annotation of images is required, which greatly improves efficiency and thus greatly improves the transfer performance of the model.
In order to make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. It is clear that the described embodiments are only some of the embodiments of the present application, not an exhaustive list of all embodiments. It should be noted that the embodiments in the present application and the features in the embodiments may be combined with each other without conflict.
Example 1
A flowchart of the steps of an unsupervised domain adaptive pedestrian detection method according to an embodiment of the present application is shown in FIG. 1.
As shown in fig. 1, the unsupervised domain adaptive pedestrian detection method specifically includes the following steps:
S101: randomly select labeled image data and unlabeled image data.
The labeled image data is randomly selected from image data whose labels are known; the unlabeled image data is randomly selected from the unlabeled image data to be detected.
S102: apply random data enhancement to the labeled image data to obtain enhanced labeled image data, and apply random data enhancement to the unlabeled image data twice to obtain first enhanced unlabeled image data and second enhanced unlabeled image data respectively.
The random data enhancement includes random enhancement of the image size or pixels.
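For illustration only, the following is a minimal sketch of the kind of random size and pixel enhancement described above, written with torchvision-style transforms. The concrete resize range, crop size and color-jitter parameters are assumptions for the sketch and are not values taken from this embodiment.

```python
from torchvision import transforms

def build_random_enhancement(out_size=(512, 512)):
    """Random size / pixel enhancement; the concrete parameters are illustrative assumptions."""
    return transforms.Compose([
        transforms.RandomResizedCrop(out_size, scale=(0.6, 1.0)),  # random size enhancement
        transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),  # random pixel enhancement
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.ToTensor(),
    ])

enhance = build_random_enhancement()
# The same unlabeled image x_t is enhanced twice to produce two different views,
# one for the student network and one for the teacher network:
#   x_t_view1 = enhance(x_t); x_t_view2 = enhance(x_t)
# For labeled images, the geometric transforms (crop, flip) would also have to be
# applied to the pedestrian boxes; that bookkeeping is omitted here.
```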
S103: input the enhanced labeled image data to the first pedestrian detection network to obtain the first pedestrian prediction features; input the first enhanced unlabeled image data to the first pedestrian detection network to obtain the second pedestrian prediction features; and input the second enhanced unlabeled image data to the second pedestrian detection network to obtain the third pedestrian prediction features.
The first pedestrian detection network and the second pedestrian detection network initially adopt the same neural network architecture.
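As a sketch of this arrangement in PyTorch (the backbone shown is a placeholder assumption, not the patent's network), the student and teacher can be built from the same architecture, with the teacher initialized to the student's weights and excluded from gradient-based updates:

```python
import copy
import torch.nn as nn

def build_detector() -> nn.Module:
    # Placeholder backbone; in practice this would be an anchor-based (SSD, YOLO-V3)
    # or anchor-free (Center-Net, YOLO-V1) pedestrian detection network, as noted below.
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 6, 1))

student = build_detector()          # first pedestrian detection network (student model)
teacher = copy.deepcopy(student)    # second network: identical architecture and initial weights
for p in teacher.parameters():
    p.requires_grad_(False)         # the teacher is updated by EMA, not by backpropagation
```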
S104: obtain the supervised learning cost according to the label features of the labeled image data and the first pedestrian prediction features, and obtain the consistency cost according to the second pedestrian prediction features and the third pedestrian prediction features.
The first, second and third pedestrian prediction features include the size information, classification information and position information of pedestrians in the image.
S105: add the supervised learning cost and the consistency cost to obtain the total cost.
S106: update the weight parameters of the first pedestrian detection network by a stochastic gradient descent algorithm according to the total cost.
The supervised learning cost and the consistency cost each include a pedestrian classification loss, a pedestrian center point offset loss, and a pedestrian bounding box width and height loss.
S107: update the weight parameters of the second pedestrian detection network by an exponential moving average algorithm according to the weight parameters of the first pedestrian detection network.
The weight parameters of the second pedestrian detection network are computed as an exponential moving average of the weight parameters of the first pedestrian detection network during training.
A schematic diagram of steps in an unsupervised domain adapted pedestrian detection method according to another embodiment of the present application is shown in fig. 2.
As shown in FIG. 2, in some embodiments of the present application, the unsupervised domain adaptive pedestrian detection method further comprises the following steps:
S108: repeat steps S101 to S107 until the first pedestrian detection network and the second pedestrian detection network converge, obtaining an updated first pedestrian detection network and an updated second pedestrian detection network.
S109: input unlabeled image data to be detected into the updated first pedestrian detection network to obtain a pedestrian detection result.
A flow diagram of an unsupervised domain adaptive pedestrian detection method according to an embodiment of the present application is shown in FIG. 3.
To implement the unsupervised domain adaptive pedestrian detection method, two identical pedestrian detection network models need to be built, namely a first pedestrian detection network and a second pedestrian detection network; the first pedestrian detection network serves as the student model and the second pedestrian detection network serves as the teacher model.
In this embodiment, the pedestrian detection network may adopt an anchor-based network, such as an SSD or YOLO-V3 network structure; an anchor-free pedestrian detection network, such as a Center-Net or YOLO-V1 network structure, may also be used.
As shown in FIG. 3, the unsupervised domain adaptive pedestrian detection method of the present application specifically includes the following steps:
1) Training data are randomly selected from the labeled image data of the existing scene and the unlabeled image data of the new scene respectively: labeled image data (x_s, B_s) and unlabeled image data x_t, where B_s denotes the labels of the labeled image data x_s.
2) The labeled image data (x_s, B_s) undergo random data enhancement to obtain the enhanced labeled data; x_t undergoes random data enhancement twice to obtain the first enhanced unlabeled image data and the second enhanced unlabeled image data respectively.
3) The enhanced labeled data and the first enhanced unlabeled image data are input into the student network to obtain the output features f_s and f_t^S respectively; the second enhanced unlabeled image data is input into the teacher network to obtain the output feature f_t^T, which serves as the pseudo label.
4) The supervision loss l_supervised between the output feature f_s and the label B_s is computed, and the consistency loss l_consist between f_t^S and f_t^T is computed.
5) The supervision loss l_supervised and the consistency loss l_consist are summed to obtain the total cost l_total.
6) The student network weight parameters are updated by a stochastic gradient descent (SGD) algorithm.
7) The student network weights are propagated to the teacher network by an exponential moving average (EMA).
8) Return to step 1) and repeat this series of steps until the student network converges.
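A minimal sketch of one iteration of this loop is given below in PyTorch style. The data-enhancement function, the two loss functions and the EMA decay value are passed in as parameters and are assumptions made for illustration; they are not prescribed by this embodiment.

```python
import torch

def train_step(student, teacher, optimizer, labeled_batch, unlabeled_batch,
               enhance, supervised_loss_fn, consistency_loss_fn, ema_decay=0.99):
    """One pass through steps 1)-7); `enhance` is an assumed batched random data enhancement."""
    x_s, b_s = labeled_batch                  # labeled image data (x_s, B_s) from the existing scene
    x_t = unlabeled_batch                     # unlabeled image data x_t from the new scene

    x_s_aug = enhance(x_s)                    # enhanced labeled data
    x_t_aug1, x_t_aug2 = enhance(x_t), enhance(x_t)   # two differently enhanced views of x_t

    f_s = student(x_s_aug)                    # student output features on labeled data
    f_t_student = student(x_t_aug1)           # student features f_t^S on the first enhanced view
    with torch.no_grad():
        f_t_teacher = teacher(x_t_aug2)       # teacher features f_t^T on the second view: the pseudo label

    l_supervised = supervised_loss_fn(f_s, b_s)                 # supervision loss
    l_consist = consistency_loss_fn(f_t_student, f_t_teacher)   # consistency loss
    l_total = l_supervised + l_consist                          # total cost

    optimizer.zero_grad()
    l_total.backward()
    optimizer.step()                          # SGD update of the student (first) network

    with torch.no_grad():                     # EMA update of the teacher (second) network
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(ema_decay).add_(p_s, alpha=1.0 - ema_decay)
    return l_total.item()
```

Step 8) then amounts to calling this routine repeatedly over randomly sampled labeled and unlabeled batches until the student network converges.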
In the testing stage, the image to be detected from the target domain in the new scene is input into the trained student network, and the student network outputs the classification confidence, the box offsets, and the box width and height. Feature points whose pedestrian classification confidence is greater than a certain threshold, together with their corresponding boxes, are taken as the final output; in this embodiment the threshold is set to 0.7.
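For the Center-Net style output described in this embodiment, inference could look roughly like the sketch below. The heatmap/size/offset tensor layout and the coordinate handling are assumptions made for the sketch; only the 0.7 confidence threshold comes from this embodiment.

```python
import torch

def detect_pedestrians(student, image, score_threshold=0.7):
    """Return (x1, y1, x2, y2, score) boxes for feature points above the confidence threshold."""
    with torch.no_grad():
        heatmap, wh, offset = student(image.unsqueeze(0))   # assumed output layout
    scores = heatmap[0, 0]                                  # pedestrian class confidence per feature point
    ys, xs = torch.nonzero(scores > score_threshold, as_tuple=True)
    boxes = []
    for y, x in zip(ys.tolist(), xs.tolist()):
        cx = x + offset[0, 0, y, x].item()                  # center refined by the predicted offset
        cy = y + offset[0, 1, y, x].item()
        w, h = wh[0, 0, y, x].item(), wh[0, 1, y, x].item() # predicted box width and height
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, scores[y, x].item()))
    return boxes
```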
In a specific implementation, the total cost of the student network, i.e., its loss function, consists of two parts: the supervision loss on the source domain data of the existing scene and the consistency loss on the target domain data of the new scene.
Regarding the supervision loss on the source domain data of the existing scene, taking the Center-Net detection network as an example, the output at each feature point of the network includes the classification information and the position information of the object to which that feature point belongs. The classification information is represented as the class confidence for each class in the detection task. The position information is represented by the offset of each point from the center of the target to which it belongs, together with the width and height of that target's bounding box.
Therefore, the supervision loss of the network includes the pedestrian center point classification loss, the pedestrian center point offset loss, and the pedestrian bounding box width and height loss.
The total supervision loss l_supervised is a weighted sum of three losses, as given in equation (1):
L_supervised = L_k + λ_shape · L_shape + λ_off · L_off    (1)
where L_k is the classification loss, L_shape is the pedestrian bounding box width and height loss, L_off is the pedestrian center point offset loss, λ_shape is the weight of the bounding box width and height loss, and λ_off is the weight of the center point offset loss.
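As an illustration of equation (1), the weighted sum could be computed as in the sketch below. The concrete form of each term (binary cross-entropy on the center heatmap, masked L1 on box size and offset) and the weight values are assumed Center-Net style choices, not values fixed by this embodiment.

```python
import torch
import torch.nn.functional as F

def supervised_loss(pred_heatmap, pred_wh, pred_offset,
                    gt_heatmap, gt_wh, gt_offset, gt_mask,
                    lambda_shape=0.1, lambda_off=1.0):
    """Equation (1): L = L_k + lambda_shape * L_shape + lambda_off * L_off.

    Predictions are assumed to be sigmoid-normalized heatmaps plus per-point size and
    offset maps; gt_mask marks the feature points that carry a ground-truth pedestrian.
    """
    l_k = F.binary_cross_entropy(pred_heatmap, gt_heatmap)                                     # center classification loss
    l_shape = (gt_mask * torch.abs(pred_wh - gt_wh)).sum() / gt_mask.sum().clamp(min=1)        # box width/height loss
    l_off = (gt_mask * torch.abs(pred_offset - gt_offset)).sum() / gt_mask.sum().clamp(min=1)  # center offset loss
    return l_k + lambda_shape * l_shape + lambda_off * l_off
```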
Regarding the consistency loss on the target domain data of the new scene: since the target domain data has no manual label information, the label information used is derived from the output of the teacher network. Its form is the same as that of the supervision loss on the source domain data of the existing scene, including the classification confidence loss and the box position and width/height losses, and is not described again here.
Specifically, as shown in FIG. 3, the student network receives two types of input data: image data from the source domain and image data from the target domain.
From the source domain image data, the student network outputs the feature f_s, which includes the pedestrian positions and class confidences in that image data.
Because the labels of the source domain image data are known, the supervised loss l_supervised can be calculated by comparing the pedestrian positions and class confidences predicted by the network with the true labels. During training, gradient descent makes l_supervised smaller and smaller, so that the student network outputs increasingly accurate pedestrian detection results on the source domain.
On the other hand, from the target domain image data the student network outputs the feature f_t^S, which includes the pedestrian positions and class confidences in that image data.
The target domain has no label information and therefore cannot be used directly to train the network. The present application therefore provides a method for constructing pseudo labels: the target domain image data is input into the teacher network to obtain the feature f_t^T, and f_t^T serves as the "pseudo label".
The output feature f_t^S of the student network is then compared with the pseudo label f_t^T and the loss l_consist is calculated. During training, gradient descent makes l_consist smaller and smaller, so that the student network and the teacher network output increasingly accurate pedestrian detection predictions on the target domain.
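Since the consistency loss has the same form as the supervision loss but uses the teacher output f_t^T in place of the ground-truth label, a sketch under the same assumed Center-Net style output layout could look as follows. The rule of supervising size and offset only at confident teacher points is an assumption made for illustration.

```python
import torch
import torch.nn.functional as F

def consistency_loss(student_out, teacher_out, lambda_shape=0.1, lambda_off=1.0):
    """Consistency cost: the teacher prediction f_t^T plays the role of the label for f_t^S."""
    s_heatmap, s_wh, s_offset = student_out   # student features on the first enhanced view
    t_heatmap, t_wh, t_offset = teacher_out   # teacher features (pseudo label) on the second view
    mask = (t_heatmap > 0.5).float()          # assumed: only confident teacher points supervise size/offset
    l_k = F.binary_cross_entropy(s_heatmap, t_heatmap)
    l_shape = (mask * torch.abs(s_wh - t_wh)).sum() / mask.sum().clamp(min=1)
    l_off = (mask * torch.abs(s_offset - t_offset)).sum() / mask.sum().clamp(min=1)
    return l_k + lambda_shape * l_shape + lambda_off * l_off
```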
In order to obtain the above-mentioned "pseudo label", the teacher model constructed in the present application is the same as the student model.
In the training process, the weights of the teacher network model are obtained by computing an exponential moving average (EMA) of the student network weights. The parameters of the teacher network after the current iteration are expressed as follows:
Y_t = λ · Y_{t-1} + (1 - λ) · X_t
where Y_t is the parameter of the teacher network after the current iteration, Y_{t-1} is the parameter of the teacher network after the previous update, X_t is the parameter of the student network after the current update, and λ is the weight assigned to the model parameters from the previous iteration.
When t = 0, Y is initialized to coincide with X, i.e., Y_0 = X_0.
The teacher network can therefore be viewed as a weighted sum of the student network over a series of different iteration stages, so its weights are smoother over time. At the same time, because it integrates student networks from different stages, it has stronger generalization capability.
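Written as a standalone helper, the parameter-wise form of this update is sketched below; the decay value λ is a hyperparameter whose concrete value is not specified here, and keeping non-learned buffers in sync is an added assumption rather than part of the formula.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    """Parameter-wise form of Y_t = decay * Y_(t-1) + (1 - decay) * X_t."""
    for y, x in zip(teacher.parameters(), student.parameters()):
        y.mul_(decay).add_(x, alpha=1.0 - decay)
    for y, x in zip(teacher.buffers(), student.buffers()):
        y.copy_(x)   # assumed detail: copy non-learned buffers such as batch-norm statistics
```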
The student network update in the embodiment of the present application uses stochastic gradient descent; other optimization-based update methods may also be adopted.
The teacher network update in the embodiment of the present application uses exponential moving average weighting; other weighting schemes may be substituted.
The pedestrian detection network in the embodiment of the present application is not limited to a specific deep network model.
In the unsupervised domain adaptive pedestrian detection method of the embodiment of the present application, the unlabeled target domain image is input to the student network and to the teacher network after different data enhancements. The teacher network predicts information such as the size, position and class confidence of the pedestrians in the image, and this information serves as the pseudo label that guides the learning of the student network. After the student network updates its weights, the teacher network weights are updated with the moving average.
Meanwhile, the teacher network and the student network receive images obtained from the same data through different data enhancements. By constraining the outputs of the two networks to be consistent, the student network can learn the underlying similarity of the target domain data. And because the teacher network has a higher generalization ability than the student network, making the output of the student network consistent with that of the teacher network enhances the generalization ability of the student network.
Finally, as shown in FIG. 3, by continuously iterating the above training process, the student model and the teacher model mutually promote each other's performance, so that the teacher model predicts better pseudo labels and the student model better adapts to the distribution of the target domain data.
With the unsupervised domain adaptive pedestrian detection method of the embodiment of the present application, labeled image data and unlabeled image data are randomly selected; random data enhancement is applied to the labeled image data to obtain enhanced labeled image data, and random data enhancement is applied to the unlabeled image data twice to obtain first enhanced unlabeled image data and second enhanced unlabeled image data respectively; the enhanced labeled image data is input to a first pedestrian detection network to obtain first pedestrian prediction features, the first enhanced unlabeled image data is input to the first pedestrian detection network to obtain second pedestrian prediction features, and the second enhanced unlabeled image data is input to a second pedestrian detection network to obtain third pedestrian prediction features; a supervised learning cost is obtained according to the label features of the labeled image data and the first pedestrian prediction features, and a consistency cost is obtained according to the second and third pedestrian prediction features; the supervised learning cost and the consistency cost are added to obtain a total cost; the weight parameters of the first pedestrian detection network are updated by a stochastic gradient descent algorithm according to the total cost; and the weight parameters of the second pedestrian detection network are updated by an exponential moving average algorithm according to the weight parameters of the first pedestrian detection network. The method strengthens the transfer capability of the model through transfer learning: by unsupervised domain adaptation, the model is trained jointly on the labeled data of the existing scene and the unlabeled data of the new scene, so that the representation ability learned on existing-scene data can be transferred to new-scene data. Only unlabeled image data in the new scene needs to be collected, and no large-scale image annotation is required, which greatly reduces the manpower and material cost of data annotation during development, improves efficiency, improves the transfer capability of the model, and makes it better able to adapt to scene changes.
Example 2
For details not disclosed in the unsupervised domain adaptive pedestrian detection system of this embodiment, please refer to the implementation of the unsupervised domain adaptive pedestrian detection method in the other embodiments.
Fig. 4 shows a schematic structural diagram of an unsupervised domain adapted pedestrian detection system according to an embodiment of the application.
As shown in fig. 4, the unsupervised domain adaptive pedestrian detection system according to the embodiment of the present application includes a training data selecting module 10, a data enhancing module 20, a feature prediction network module 30, a supervised learning cost module 40, a consistency cost module 50, a total cost module 60, a first pedestrian detection network updating module 70, and a second pedestrian detection network updating module 80.
Specifically, the method comprises the following steps:
the training data selection module 10: the method is used for randomly selecting the image data with the label and the image data without the label.
The data enhancement module 20: the system comprises a random data enhancement module, a tag image data acquisition module, a tag image data storage module and a tag image data processing module, wherein the random data enhancement module is used for enhancing the tag image data to obtain enhanced tag image data; the method is used for enhancing the label-free image data through random data to respectively obtain first enhanced label-free image data and second enhanced label-free image data.
Feature prediction network module 30: the system comprises a first pedestrian detection network, a second pedestrian detection network and a third pedestrian prediction network, wherein the first pedestrian prediction network is used for inputting enhanced labeled data to the first pedestrian detection network to obtain first pedestrian prediction characteristics; the first enhanced non-tag data is input to the first pedestrian detection network to obtain a second pedestrian prediction characteristic; and the third pedestrian prediction feature is obtained by inputting the second enhanced unlabeled image data to the second pedestrian detection network.
The supervised learning cost module 40: and obtaining the supervised learning cost according to the label characteristics of the labeled image data and the first pedestrian prediction characteristics.
Consistency cost module 50: and obtaining the consistency cost according to the second pedestrian prediction characteristic and the third pedestrian prediction characteristic.
Total cost module 60: and adding the supervised learning cost and the consistency cost to obtain the total cost.
The first pedestrian detection network update module 70: and updating the weight parameter of the first pedestrian detection network through a random gradient descent algorithm according to the total cost.
The second pedestrian detection network updating module 80: and the weight parameter updating module is used for updating the weight parameter of the second pedestrian detection network through an exponential moving average algorithm according to the weight parameter of the first pedestrian detection network.
In some embodiments of the present application, the unsupervised domain adaptive pedestrian detection system further comprises:
a training convergence module, configured to repeat the above steps until the first pedestrian detection network and the second pedestrian detection network converge, obtaining an updated first pedestrian detection network and an updated second pedestrian detection network; and
a pedestrian detection module, configured to input unlabeled image data to be detected into the updated first pedestrian detection network to obtain a pedestrian detection result.
An application flow diagram of the unsupervised domain adaptive pedestrian detection system according to the embodiment of the present application is also shown in FIG. 3.
First, two identical pedestrian detection network models need to be built, namely a first pedestrian detection network and a second pedestrian detection network; the first pedestrian detection network serves as the student model and the second pedestrian detection network serves as the teacher model.
In this embodiment, the pedestrian detection network may adopt an anchor-based network, such as an SSD or YOLO-V3 network structure; an anchor-free pedestrian detection network, such as a Center-Net or YOLO-V1 network structure, may also be used.
As shown in FIG. 3, the application of the unsupervised domain adaptive pedestrian detection system of the present application includes the following specific steps:
1) Training data are randomly selected from the labeled image data of the existing scene and the unlabeled image data of the new scene respectively: labeled image data (x_s, B_s) and unlabeled image data x_t, where B_s denotes the labels of the labeled image data x_s.
2) The labeled image data (x_s, B_s) undergo random data enhancement to obtain the enhanced labeled data; x_t undergoes random data enhancement twice to obtain the first enhanced unlabeled image data and the second enhanced unlabeled image data respectively.
3) The enhanced labeled data and the first enhanced unlabeled image data are input into the student network to obtain the output features f_s and f_t^S respectively; the second enhanced unlabeled image data is input into the teacher network to obtain the output feature f_t^T, which serves as the pseudo label.
4) The supervision loss l_supervised between the output feature f_s and the label B_s is computed, and the consistency loss l_consist between f_t^S and f_t^T is computed.
5) The supervision loss l_supervised and the consistency loss l_consist are summed to obtain the total cost l_total.
6) The student network weight parameters are updated by a stochastic gradient descent (SGD) algorithm.
7) The student network weights are propagated to the teacher network by an exponential moving average (EMA).
8) Return to step 1) and repeat this series of steps until the student network converges.
In the testing stage, the image to be detected from the target domain in the new scene is input into the trained student network, and the student network outputs the classification confidence, the box offsets, and the box width and height. Feature points whose pedestrian classification confidence is greater than a certain threshold, together with their corresponding boxes, are taken as the final output; in this embodiment the threshold is set to 0.7.
In a specific implementation, the total cost of the student network, i.e., its loss function, consists of two parts: the supervision loss on the source domain data of the existing scene and the consistency loss on the target domain data of the new scene.
Regarding the supervision loss on the source domain data of the existing scene, taking the Center-Net detection network as an example, the output at each feature point of the network includes the classification information and the position information of the object to which that feature point belongs. The classification information is represented as the class confidence for each class in the detection task. The position information is represented by the offset of each point from the center of the target to which it belongs, together with the width and height of that target's bounding box.
Therefore, the supervision loss of the network includes the pedestrian center point classification loss, the pedestrian center point offset loss, and the pedestrian bounding box width and height loss.
In the unsupervised domain adaptive pedestrian detection system of the embodiment of the present application, the unlabeled target domain image is input to the student network and to the teacher network after different data enhancements. The teacher network predicts information such as the size, position and class confidence of the pedestrians in the image, and this information serves as the pseudo label that guides the learning of the student network. After the student network updates its weights, the teacher network weights are updated with the moving average.
Meanwhile, the teacher network and the student network receive images obtained from the same data through different data enhancements. By constraining the outputs of the two networks to be consistent, the student network can learn the underlying similarity of the target domain data. And because the teacher network has a higher generalization ability than the student network, making the output of the student network consistent with that of the teacher network enhances the generalization ability of the student network.
Finally, as shown in FIG. 3, by continuously iterating the above training process, the student model and the teacher model mutually promote each other's performance, so that the teacher model predicts better pseudo labels and the student model better adapts to the distribution of the target domain data.
With the unsupervised domain adaptive pedestrian detection system of the embodiment of the present application, labeled image data and unlabeled image data are randomly selected; random data enhancement is applied to the labeled image data to obtain enhanced labeled image data, and random data enhancement is applied to the unlabeled image data twice to obtain first enhanced unlabeled image data and second enhanced unlabeled image data respectively; the enhanced labeled image data is input to a first pedestrian detection network to obtain first pedestrian prediction features, the first enhanced unlabeled image data is input to the first pedestrian detection network to obtain second pedestrian prediction features, and the second enhanced unlabeled image data is input to a second pedestrian detection network to obtain third pedestrian prediction features; a supervised learning cost is obtained according to the label features of the labeled image data and the first pedestrian prediction features, and a consistency cost is obtained according to the second and third pedestrian prediction features; the supervised learning cost and the consistency cost are added to obtain a total cost; the weight parameters of the first pedestrian detection network are updated by a stochastic gradient descent algorithm according to the total cost; and the weight parameters of the second pedestrian detection network are updated by an exponential moving average algorithm according to the weight parameters of the first pedestrian detection network. The system strengthens the transfer capability of the model through transfer learning: by unsupervised domain adaptation, the model is trained jointly on the labeled data of the existing scene and the unlabeled data of the new scene, so that the representation ability learned on existing-scene data can be transferred to new-scene data. Only unlabeled image data in the new scene needs to be collected, and no large-scale re-annotation of images is required, which greatly improves efficiency and thus greatly improves the transfer performance of the model.
Example 3
The present embodiment provides a computer-readable storage medium having a computer program stored thereon; the computer program is executed by a processor to implement the unsupervised domain adaptive pedestrian detection method of the other embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. An unsupervised domain adaptive pedestrian detection method, comprising the steps of:
randomly selecting labeled image data and unlabeled image data;
applying random data enhancement to the labeled image data to obtain enhanced labeled image data; applying random data enhancement to the unlabeled image data twice to obtain first enhanced unlabeled image data and second enhanced unlabeled image data respectively;
inputting the enhanced labeled image data to a first pedestrian detection network to obtain first pedestrian prediction features; inputting the first enhanced unlabeled image data to the first pedestrian detection network to obtain second pedestrian prediction features; inputting the second enhanced unlabeled image data to a second pedestrian detection network to obtain third pedestrian prediction features;
obtaining a supervised learning cost according to label features of the labeled image data and the first pedestrian prediction features; obtaining a consistency cost according to the second pedestrian prediction features and the third pedestrian prediction features;
adding the supervised learning cost and the consistency cost to obtain a total cost;
updating weight parameters of the first pedestrian detection network by a stochastic gradient descent algorithm according to the total cost; and
updating weight parameters of the second pedestrian detection network by an exponential moving average algorithm according to the weight parameters of the first pedestrian detection network.
2. The unsupervised domain adapted pedestrian detection method of claim 1, further comprising:
repeating the above steps until the first pedestrian detection network and the second pedestrian detection network converge, to obtain an updated first pedestrian detection network and an updated second pedestrian detection network;
and inputting the unlabeled image data to be detected into the updated first pedestrian detection network to obtain a pedestrian detection result.
3. The unsupervised domain adapted pedestrian detection method of claim 1, wherein the labeled image data is randomly selected from image data whose label data is known, and the unlabeled image data is randomly selected from the unlabeled image data to be detected.
4. The unsupervised domain adapted pedestrian detection method of claim 1, wherein the first, second and third pedestrian prediction features comprise pedestrian size information, classification information and position information in the image.
5. The unsupervised domain adapted pedestrian detection method of claim 3, wherein the supervised learning cost and the consistency cost comprise a pedestrian classification loss, a pedestrian center-point offset loss, and a pedestrian bounding-box width and height loss.
6. The unsupervised domain adapted pedestrian detection method of claim 1, wherein the first and second pedestrian detection networks initially employ the same neural network architecture.
7. The unsupervised domain adapted pedestrian detection method of claim 1, wherein the weight parameters of the second pedestrian detection network are obtained as an exponential moving average of the weight parameters of the first pedestrian detection network during training.
8. An unsupervised domain adapted pedestrian detection system, comprising:
a training data selection module, configured to randomly select labeled image data and unlabeled image data;
a data enhancement module, configured to apply random data augmentation to the labeled image data to obtain enhanced labeled image data, and to apply random data augmentation to the unlabeled image data to obtain first enhanced unlabeled image data and second enhanced unlabeled image data, respectively;
a feature prediction network module, configured to input the enhanced labeled image data into a first pedestrian detection network to obtain a first pedestrian prediction feature, input the first enhanced unlabeled image data into the first pedestrian detection network to obtain a second pedestrian prediction feature, and input the second enhanced unlabeled image data into a second pedestrian detection network to obtain a third pedestrian prediction feature;
a supervised learning cost module, configured to obtain a supervised learning cost according to the label features of the labeled image data and the first pedestrian prediction feature;
a consistency cost module, configured to obtain a consistency cost according to the second pedestrian prediction feature and the third pedestrian prediction feature;
a total cost module, configured to add the supervised learning cost and the consistency cost to obtain a total cost;
a first pedestrian detection network update module, configured to update the weight parameters of the first pedestrian detection network through a stochastic gradient descent algorithm according to the total cost;
and a second pedestrian detection network update module, configured to update the weight parameters of the second pedestrian detection network through an exponential moving average algorithm according to the weight parameters of the first pedestrian detection network.
9. The unsupervised domain adapted pedestrian detection system of claim 8, further comprising:
a training convergence module, configured to repeat the above steps until the first pedestrian detection network and the second pedestrian detection network converge, to obtain an updated first pedestrian detection network and an updated second pedestrian detection network;
and a pedestrian detection module, configured to input unlabeled image data to be detected into the updated first pedestrian detection network to obtain a pedestrian detection result.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the unsupervised domain adapted pedestrian detection method according to any one of claims 1-7.
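
For readers implementing the method, the following is a minimal, hypothetical sketch in Python with PyTorch (the patent itself does not prescribe a framework) of one training iteration as recited in claims 1, 6 and 7: a first (student) pedestrian detection network updated by stochastic gradient descent on the sum of the supervised learning cost and the consistency cost, and a second (teacher) network of the same architecture updated as an exponential moving average of the student's weights. All names used here (augment, supervised_cost, consistency_cost, decay, and so on) are illustrative assumptions rather than terms fixed by the patent.

    import torch

    def ema_update(teacher, student, decay=0.99):
        # Claim 7: the second network's weights are an exponential moving
        # average of the first network's weights over the course of training.
        with torch.no_grad():
            for t_param, s_param in zip(teacher.parameters(), student.parameters()):
                t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)

    def train_step(student, teacher, optimizer, labeled_img, labels, unlabeled_img,
                   augment, supervised_cost, consistency_cost):
        # One iteration following the steps of claim 1.
        # Random data augmentation: one view of the labeled image and two
        # independently augmented views of the unlabeled image.
        labeled_aug = augment(labeled_img)
        unlabeled_aug_1 = augment(unlabeled_img)
        unlabeled_aug_2 = augment(unlabeled_img)

        # First pedestrian detection network (student).
        pred_labeled = student(labeled_aug)          # first pedestrian prediction feature
        pred_unlabeled_s = student(unlabeled_aug_1)  # second pedestrian prediction feature

        # Second pedestrian detection network (teacher); its predictions are
        # treated as fixed targets, so no gradients are tracked.
        with torch.no_grad():
            pred_unlabeled_t = teacher(unlabeled_aug_2)  # third pedestrian prediction feature

        # Total cost = supervised learning cost + consistency cost.
        total_cost = (supervised_cost(pred_labeled, labels)
                      + consistency_cost(pred_unlabeled_s, pred_unlabeled_t))

        # Update the first network by stochastic gradient descent on the total cost.
        optimizer.zero_grad()
        total_cost.backward()
        optimizer.step()

        # Update the second network as an exponential moving average of the first.
        ema_update(teacher, student)
        return total_cost.item()

Per claim 6 the two networks start from the same neural network architecture, so in this sketch the teacher could be created as teacher = copy.deepcopy(student) and the optimizer as torch.optim.SGD(student.parameters(), lr=0.01); the decay factor of 0.99 and the particular augmentations are illustrative choices, not values fixed by the patent.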
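Claims 4 and 5 indicate that the prediction features carry pedestrian classification, center-point and size information, and that the costs are composed of a pedestrian classification loss, a center-point offset loss, and a bounding-box width and height loss. The sketch below assumes, purely for illustration, that each prediction is a dictionary of dense maps in the style of anchor-free center-point detectors; the dictionary keys and the particular choice of binary cross-entropy, L1 and MSE terms are assumptions, not details fixed by the patent.

    import torch
    import torch.nn.functional as F

    def supervised_cost(pred, target, weights=(1.0, 1.0, 0.1)):
        # Claim 5: classification loss + center-point offset loss
        # + bounding-box width and height loss, combined with assumed weights.
        w_cls, w_off, w_wh = weights
        cls_loss = F.binary_cross_entropy_with_logits(pred['cls'], target['cls'])
        offset_loss = F.l1_loss(pred['offset'], target['offset'])
        wh_loss = F.l1_loss(pred['wh'], target['wh'])
        return w_cls * cls_loss + w_off * offset_loss + w_wh * wh_loss

    def consistency_cost(student_pred, teacher_pred):
        # Consistency between the second and third prediction features (claim 1):
        # the teacher's outputs serve as soft targets for the student.
        cls_term = F.mse_loss(torch.sigmoid(student_pred['cls']),
                              torch.sigmoid(teacher_pred['cls']))
        offset_term = F.l1_loss(student_pred['offset'], teacher_pred['offset'])
        wh_term = F.l1_loss(student_pred['wh'], teacher_pred['wh'])
        return cls_term + offset_term + wh_term

These two functions could be passed directly as the supervised_cost and consistency_cost arguments of the train_step sketch above.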
CN202010968987.1A 2020-09-15 2020-09-15 Method, system and storage medium for detecting pedestrians without supervision domain adaptation Active CN112052818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010968987.1A CN112052818B (en) 2020-09-15 2020-09-15 Method, system and storage medium for detecting pedestrians without supervision domain adaptation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010968987.1A CN112052818B (en) 2020-09-15 2020-09-15 Method, system and storage medium for detecting pedestrians without supervision domain adaptation

Publications (2)

Publication Number Publication Date
CN112052818A true CN112052818A (en) 2020-12-08
CN112052818B CN112052818B (en) 2024-03-22

Family

ID=73602952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010968987.1A Active CN112052818B (en) 2020-09-15 2020-09-15 Method, system and storage medium for detecting pedestrians without supervision domain adaptation

Country Status (1)

Country Link
CN (1) CN112052818B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100246980A1 (en) * 2009-03-31 2010-09-30 General Electric Company System and method for automatic landmark labeling with minimal supervision
KR20190042429A (en) * 2017-10-15 2019-04-24 알레시오 주식회사 Method for image processing
CN111226258A (en) * 2017-10-15 2020-06-02 阿莱西奥公司 Signal conversion system and signal conversion method
US20190325299A1 (en) * 2018-04-18 2019-10-24 Element Ai Inc. Unsupervised domain adaptation with similarity learning for images
CN109902798A (en) * 2018-05-31 2019-06-18 华为技术有限公司 The training method and device of deep neural network
KR102001781B1 (en) * 2018-08-28 2019-10-01 건국대학교 산학협력단 Method of improving learning accuracy of neural networks and apparatuses performing the same
CN109977918A (en) * 2019-04-09 2019-07-05 华南理工大学 A kind of target detection and localization optimization method adapted to based on unsupervised domain
CN110135295A (en) * 2019-04-29 2019-08-16 华南理工大学 A kind of unsupervised pedestrian recognition methods again based on transfer learning
CN110659591A (en) * 2019-09-07 2020-01-07 中国海洋大学 SAR image change detection method based on twin network

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943868A (en) * 2021-05-31 2022-08-26 阿里巴巴新加坡控股有限公司 Image processing method, image processing device, storage medium and processor
CN114943868B (en) * 2021-05-31 2023-11-14 阿里巴巴新加坡控股有限公司 Image processing method, device, storage medium and processor
CN113255807A (en) * 2021-06-03 2021-08-13 北京的卢深视科技有限公司 Face analysis model training method, electronic device and storage medium
CN113255807B (en) * 2021-06-03 2022-03-25 北京的卢深视科技有限公司 Face analysis model training method, electronic device and storage medium
CN113536920A (en) * 2021-06-11 2021-10-22 复旦大学 Semi-supervised three-dimensional point cloud target detection method
CN113536920B (en) * 2021-06-11 2022-06-17 复旦大学 Semi-supervised three-dimensional point cloud target detection method
CN114399683A (en) * 2022-01-18 2022-04-26 南京甄视智能科技有限公司 End-to-end semi-supervised target detection method based on improved yolov5
CN114550215A (en) * 2022-02-25 2022-05-27 北京拙河科技有限公司 Target detection method and system based on transfer learning
CN114445670A (en) * 2022-04-11 2022-05-06 腾讯科技(深圳)有限公司 Training method, device and equipment of image processing model and storage medium

Also Published As

Publication number Publication date
CN112052818B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
CN112052818B (en) Method, system and storage medium for detecting pedestrians without supervision domain adaptation
Qin et al. Ultra fast structure-aware deep lane detection
Lian et al. Road extraction methods in high-resolution remote sensing images: A comprehensive review
CN113221905B (en) Semantic segmentation unsupervised domain adaptation method, device and system based on uniform clustering and storage medium
JP2018097807A (en) Learning device
EP3767536A1 (en) Latent code for unsupervised domain adaptation
CN107872644A (en) Video frequency monitoring method and device
CN104217225A (en) A visual target detection and labeling method
CN108491766B (en) End-to-end crowd counting method based on depth decision forest
Fang et al. Survey on the application of deep reinforcement learning in image processing
CN113469186B (en) Cross-domain migration image segmentation method based on small number of point labels
CN113128478B (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN112132014B (en) Target re-identification method and system based on non-supervised pyramid similarity learning
CN112232355B (en) Image segmentation network processing method, image segmentation device and computer equipment
CN110096979B (en) Model construction method, crowd density estimation method, device, equipment and medium
CN104268546A (en) Dynamic scene classification method based on topic model
CN113807399A (en) Neural network training method, neural network detection method and neural network detection device
KR20230171966A (en) Image processing method and device and computer-readable storage medium
CN114175068A (en) Method for performing on-device learning on machine learning network of automatic driving automobile through multi-stage learning by using adaptive hyper-parameter set and on-device learning device using same
CN112927266A (en) Weak supervision time domain action positioning method and system based on uncertainty guide training
CN114511077A (en) Training point cloud processing neural networks using pseudo-element based data augmentation
CN111914949B (en) Zero sample learning model training method and device based on reinforcement learning
CN113326825A (en) Pseudo tag generation method and device, electronic equipment and storage medium
CN116681961A (en) Weak supervision target detection method based on semi-supervision method and noise processing
Alajlan et al. Automatic lane marking prediction using convolutional neural network and S-Shaped Binary Butterfly Optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant