CN115880727A - Training method and device for human body recognition model - Google Patents
- Publication number
- CN115880727A (application CN202310214276.9A)
- Authority
- CN
- China
- Prior art keywords
- human body
- image
- pedestrian
- body image
- file
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Image Analysis (AREA)
- Collating Specific Patterns (AREA)
Abstract
Embodiments of the present application disclose a training method and apparatus for a human body recognition model. The method comprises: acquiring a face image and a body image of a pedestrian, and extracting facial features from the face image and body features from the body image; creating a profile from the facial features and the body features to obtain a first pedestrian profile; determining the spatio-temporal reachability between the image capture location of a first body image and that of a second body image in the first pedestrian profile, and, if spatio-temporal reachability is not satisfied, deleting the second body image from the first pedestrian profile to obtain a second pedestrian profile; and finally selecting body images from the second pedestrian profile as training data and training the human body recognition model with that data. The present application solves the technical problem in the related art that a high-confidence pedestrian profile cannot be obtained, and achieves the technical effect of improving the accuracy of pedestrian profiles.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for training a human body recognition model.
Background
In the profiling process, images belonging to the same person are grouped into the same pedestrian profile, and images belonging to different persons into different profiles. For profiling massive numbers of images, however, the first existing solution, building profiles through traditional manual labeling, suffers from heavy workload, low efficiency, and errors introduced by manual work. The second existing solution typically centers the profile on the face alone, supplemented with holographic information such as body and vehicle data for display. During profiling, images are clustered by maximum similarity or by a single factor (face or body), which yields low accuracy, and the divergent results cannot be managed uniformly; moreover, noise images may remain in the archived pedestrian profiles, and if they cannot be effectively removed, the accuracy of the profiles is greatly reduced. In addition, when a pedestrian profile obtained by an existing solution is used as training data for a related model, the low confidence of that data means the model's recognition performance cannot be significantly improved further.
No effective solution to the above problems has yet been proposed.
Disclosure of Invention
Embodiments of the present application provide a training method and apparatus for a human body recognition model, which at least solve the technical problem in the related art that a high-confidence pedestrian profile cannot be obtained.
According to one aspect of the embodiments of the present application, a training data processing method is provided, comprising: acquiring a face image and a body image of a pedestrian, and extracting facial features from the face image and body features from the body image; creating a profile based on the facial features and the body features to obtain a first pedestrian profile; classifying the body images in the first pedestrian profile into first body images, which contain a face image, and second body images, which do not, and determining the spatio-temporal reachability between the image capture location of a first body image and that of a second body image; and, when spatio-temporal reachability is not satisfied, deleting the second body image from the first pedestrian profile to obtain a second pedestrian profile.
Optionally, creating a profile based on the facial features of the face image and the body features of the body image to obtain the first pedestrian profile comprises: clustering the facial features and the body features separately to obtain face clusters and body clusters; and associating the face clusters with the body clusters according to the face images in the face clusters and the face images contained in the body clusters, to obtain the first pedestrian profile.
Optionally, determining the spatio-temporal reachability between the image capture location of the first body image and that of the second body image comprises: obtaining the spatial distance between the two image capture locations and a spatial distance threshold, where the threshold is determined from the time interval between the image capture times of the first and second body images; and comparing the spatial distance with the threshold and determining, based on the comparison result, whether spatio-temporal reachability is satisfied.
Optionally, obtaining the spatial distance threshold between the image capture location of the first body image and that of the second body image comprises: obtaining the time interval between the image capture times of the two images; when the motion posture in the first body image is the same as that in the second body image, obtaining the predetermined motion speed corresponding to that posture; and calculating the spatial distance threshold from the time interval and the predetermined motion speed.
Optionally, after classifying the body images in the first pedestrian profile into first and second body images, the method further comprises: obtaining the similarity between the body features of the first body image and those of the second body image; and deleting the second body image from the first pedestrian profile when the similarity is below a preset similarity threshold.
Optionally, after the second pedestrian profile is obtained, the method further comprises: obtaining a plurality of second pedestrian profiles of the same pedestrian, each corresponding to a different predetermined time period; and performing clothing classification on the body images in those profiles to obtain a clothing classification result for the pedestrian, the result comprising the body images corresponding to each clothing type.
According to another aspect of the embodiments of the present application, a training method for a human body recognition model is also provided, comprising: acquiring a body image and partitioning it to obtain a plurality of image blocks; applying dimensionality reduction/expansion to the image blocks and inputting the projected blocks into a Transformer neural network to obtain a feature map; inputting the feature map into a fully connected network to obtain a classification result; calculating, based on the classification result, an identity label loss value and a triplet loss value for the body images of the training data; and optimizing the human body recognition model using those identity label and triplet loss values.
Optionally, the human body recognition model is trained on body images in a predetermined pedestrian profile, or on body images in second pedestrian profiles from a historical time period, where the historical time period includes at least one time period preceding the predetermined time period.
According to another aspect of the embodiments of the present application, a training data processing apparatus is also provided, comprising: a first processing module for acquiring a face image and a body image of a pedestrian and extracting facial features from the face image and body features from the body image; a second processing module for creating a profile based on the facial features and the body features to obtain a first pedestrian profile; a third processing module for classifying the body images in the first pedestrian profile into first body images, which contain a face image, and second body images, which do not, and determining the spatio-temporal reachability between the image capture location of a first body image and that of a second body image; and a fourth processing module for deleting the second body image from the first pedestrian profile to obtain a second pedestrian profile when spatio-temporal reachability is not satisfied.
According to another aspect of the embodiments of the present application, a training apparatus for a human body recognition model is also provided, comprising: an acquisition module for obtaining a second pedestrian profile for a predetermined time period according to the training data processing method above; and a training module for selecting body images from the second pedestrian profile as training data and training the human body recognition model with that data.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform the steps of the method of any of the above.
According to another aspect of the embodiments of the present application, a computer-readable storage medium is also provided, comprising a stored program which, when run, controls the device on which the storage medium resides to execute the steps of any of the methods above.
In embodiments of the present application, a face image and a body image of a pedestrian are acquired, and facial features are extracted from the face image and body features from the body image; a profile is created from the facial features and the body features to obtain a first pedestrian profile; the body images in the first pedestrian profile are then classified into first body images and second body images, the spatio-temporal reachability between the image capture location of the first body image and that of the second body image is determined, and, if spatio-temporal reachability is not satisfied, the second body image is deleted from the first pedestrian profile to obtain a second pedestrian profile.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flowchart of a training data processing method according to an embodiment of the present application;
FIG. 2 is a flowchart of a training method for a human body recognition model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a training data processing apparatus according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a training apparatus for a human body recognition model according to an embodiment of the present application;
FIG. 5 is a block diagram of a target gathering system according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an image capturing unit according to an embodiment of the present application;
FIG. 7 is a flowchart of a video analysis module processing a video stream according to an embodiment of the present application;
FIG. 8 is a flowchart of clothing classification of body images in a pedestrian profile according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a training data organization according to an embodiment of the present application;
FIG. 10 is a flowchart of a training method of a pedestrian re-identification model according to an embodiment of the present application.
Detailed Description
To help those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them; all other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present application.
It should be noted that the terms "first", "second", and the like in the description, claims, and drawings of the present application are used to distinguish different objects, not to impose a specific order. The steps illustrated in the flowcharts may be performed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, the steps may in some cases be performed in a different order.
For convenience of description, some terms appearing in the present application are explained below.
Target gathering: face images and body images belonging to the same pedestrian, together with pictures of vehicles driven by that target, are gathered together; combined with the time and location information of each snapshot, the target's activity route can be drawn quickly and accurately.
Pedestrian re-identification: a computer-vision technique for determining whether a specific pedestrian is present in an image or video sequence. Given a monitored pedestrian image, the same pedestrian is retrieved across devices. It overcomes the visual limitation of fixed cameras, can be combined with pedestrian detection techniques, and is widely applicable to intelligent video surveillance, intelligent security, and similar fields.
Autonomous learning: as the deployed system runs continuously, the pedestrian re-identification model evolves automatically using the high-confidence data of the gathered profiles; model updates are triggered in stages, improving model capability and continually improving the profiling effect.
According to an aspect of the embodiments of the present application, a training data processing method is provided. FIG. 1 is a flowchart of the training data processing method provided by the embodiments of the present application; as shown in FIG. 1, the method includes the following steps:
step S102, acquiring a face image and a body image of a pedestrian, and extracting facial features from the face image and body features from the body image;
the face image and the body image of the pedestrian may be collected within a predetermined time period, which includes, but is not limited to, 1 day, 3 days, a week, and the like.
step S104, creating a profile based on the facial features of the face image and the body features of the body image to obtain a first pedestrian profile;
Optionally, during profile creation, identity labels are attached to the face images and body images, so that each pedestrian profile corresponds to one identity label.
step S106, classifying the body images in the first pedestrian profile into first body images and second body images, and determining the spatio-temporal reachability between the image capture location of the first body image and that of the second body image, where the first body image contains a face image and the second body image contains only a body;
step S108, when spatio-temporal reachability is not satisfied, deleting the second body image from the first pedestrian profile to obtain a second pedestrian profile.
It should be noted that the face images and body images of different pedestrians carry different identity labels, and each identity label corresponds to a different pedestrian profile.
Through the above steps, a face image and a body image of a pedestrian can be acquired, and facial features and body features extracted from them respectively; a profile is created from the facial features and the body features to obtain a first pedestrian profile; the body images in the first pedestrian profile are then classified into first and second body images, the spatio-temporal reachability between their image capture locations is determined, and, if it is not satisfied, the second body image is deleted from the first pedestrian profile to obtain a second pedestrian profile.
In an optional embodiment, creating a profile based on the facial features of the face image and the body features of the body image to obtain the first pedestrian profile comprises: clustering the facial features and the body features separately to obtain face clusters and body clusters; and associating the face clusters with the body clusters according to the face images in the face clusters and the face images contained in the body clusters, to obtain the first pedestrian profile.
Optionally, the facial features of the face images are extracted by a face recognition model and clustered with a predetermined clustering algorithm to obtain face clusters; likewise, the body features of the body images are extracted by the human body recognition model and clustered with a predetermined clustering algorithm to obtain body clusters. It should be noted that the predetermined clustering algorithm includes, but is not limited to, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), the Infomap network clustering algorithm, and the like. In addition, the embodiments of the present application place no limit on the number of face clusters or body clusters.
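The clustering step above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: it replaces DBSCAN/Infomap with a simple greedy rule that assigns each feature vector to the first cluster whose representative is cosine-similar enough, and the threshold value is an assumption.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster_features(features, sim_threshold=0.8):
    """Greedy single-pass clustering: assign each feature to the first
    cluster whose representative is similar enough, else open a new cluster."""
    clusters = []  # each entry: list of feature indices
    reps = []      # representative feature per cluster (its first member)
    for i, f in enumerate(features):
        for c, rep in enumerate(reps):
            if cosine_sim(f, rep) >= sim_threshold:
                clusters[c].append(i)
                break
        else:
            clusters.append([i])
            reps.append(f)
    return clusters
```

In a real deployment a density-based algorithm such as DBSCAN would be substituted here; the greedy rule only illustrates the grouping of face and body features into clusters.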
Further, the human body recognition model, also called the pedestrian re-identification model, is trained on body images in a predetermined pedestrian profile, or on body images in second pedestrian profiles from a historical time period, where the historical time period includes at least one time period preceding the predetermined time period.
In an optional embodiment, determining the spatio-temporal reachability between the image capture location of the first body image and that of the second body image comprises: obtaining the spatial distance between the two image capture locations and a spatial distance threshold, where the threshold is determined from the time interval between the image capture times of the first and second body images; and comparing the spatial distance with the threshold and determining, based on the comparison result, whether spatio-temporal reachability is satisfied.
Optionally, obtaining the spatial distance between the image capture locations of the first and second body images comprises: obtaining a first longitude and latitude for the capture location of the first body image and a second longitude and latitude for that of the second body image; and calculating the spatial distance between the two capture locations from the two longitude and latitude pairs.
Further, the spatial distance may be calculated, for example, with the great-circle (haversine) expression:

d = 2R · arcsin( sqrt( sin²((φ₁ − φ₂)/2) + cos φ₁ · cos φ₂ · sin²((λ₁ − λ₂)/2) ) )

where d represents the spatial distance between the image capture location of the first body image and that of the second body image; λ₂ and φ₂ represent the longitude and latitude of the image capture location of the second body image; λ₁ and φ₁ represent the longitude and latitude of the image capture location of the first body image; and R is the Earth's radius.
Optionally, comparing the spatial distance with the spatial distance threshold and determining, based on the comparison result, whether spatio-temporal reachability is satisfied comprises: when the spatial distance is greater than the threshold, determining that spatio-temporal reachability is not satisfied; or, when the spatial distance is less than or equal to the threshold, determining that it is satisfied.
In the above embodiment of the present application, whether spatio-temporal reachability is satisfied is determined by comparing the spatial distance between the image capture locations of the first and second body images with the spatial distance threshold.
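The reachability test above can be sketched as follows. This is a hedged sketch, not the patent's code: the haversine distance between the two capture points is compared against speed × time interval, and the Earth-radius constant and function names are illustrative choices.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (assumed constant)

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (longitude, latitude) points."""
    lam1, phi1, lam2, phi2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def spatiotemporally_reachable(loc1, t1, loc2, t2, speed_mps):
    """True if the two capture records could belong to the same pedestrian:
    the spatial distance must not exceed speed * |time interval|."""
    distance = haversine_m(loc1[0], loc1[1], loc2[0], loc2[1])
    threshold = speed_mps * abs(t2 - t1)
    return distance <= threshold
```

For example, two snapshots 60 seconds apart at points roughly 96 km apart cannot belong to a walking pedestrian, so the second body image would be deleted from the profile.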
In an optional embodiment, obtaining the spatial distance threshold between the image capture locations of the first and second body images comprises: obtaining the time interval between the image capture times of the two images; when the motion posture in the first body image is the same as that in the second body image, obtaining the predetermined motion speed corresponding to that posture; and calculating the spatial distance threshold from the time interval and the predetermined motion speed.
Further, the spatial distance threshold may be calculated using the following expression:

S = v · Δt

where S represents the spatial distance threshold between the image capture location of the first body image and that of the second body image; v represents the predetermined motion speed corresponding to the shared motion posture; and Δt represents the time interval, also called the time difference, between the image capture times of the first and second body images.
It should be noted that the motion postures include, but are not limited to, walking, cycling, driving, and so on.
Further, the predetermined motion speed differs across motion postures: walking, cycling, and driving each correspond to a different average speed, with walking the slowest and driving the fastest.
In the above embodiment of the present application, the spatial distance threshold is calculated accurately from the time interval between the image capture times of the first and second body images and the predetermined motion speed corresponding to their shared motion posture.
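As a concrete sketch of S = v · Δt, the per-posture speed table below uses assumed values; the patent's actual speed figures appear only in formula images that are not reproduced here.

```python
# Illustrative average speeds per motion posture (m/s); these values are
# assumptions, not taken from the patent.
POSTURE_SPEED_MPS = {
    "walking": 1.5,
    "cycling": 4.0,
    "driving": 12.0,
}

def spatial_distance_threshold(posture, t1, t2):
    """Threshold S = v * dt, where v is the posture's predetermined speed
    and dt is the interval between the two capture times, in seconds."""
    v = POSTURE_SPEED_MPS[posture]
    return v * abs(t2 - t1)
```

A pair of snapshots taken 100 seconds apart while walking thus gets a 150 m threshold; any greater spatial distance means spatio-temporal reachability fails.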
In an optional embodiment, after classifying the body images in the first pedestrian profile into first and second body images, the method further comprises: obtaining the similarity between the body features of the first body image and those of the second body image; and deleting the second body image from the first pedestrian profile when the similarity is below a preset similarity threshold.
Further, the similarity between the body features of the first body image and those of the second body image may be calculated, for example, as the cosine similarity:

sim = (f₁ · f₂) / (‖f₁‖ · ‖f₂‖)

where sim represents the similarity between the body features of the first and second body images, also called the body feature similarity; f₂ represents the body feature of the second body image; and f₁ represents the body feature of the first body image.
Optionally, the similarity between the body features of the first and second body images is first calculated and then compared with a preset similarity threshold: if the similarity is below the threshold, the second body image is deleted from the first pedestrian profile; if it is greater than or equal to the threshold, the second body image is retained. It should be noted that the preset similarity threshold may be set according to the needs of the application scenario.
In the above embodiment of the present application, body feature similarity is used to denoise the body images in the first pedestrian profile: body images that do not meet the requirements of the application scenario are deleted, improving the accuracy of the body images in the first pedestrian profile.
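The similarity-based denoising step can be sketched as follows, assuming cosine similarity and an illustrative threshold; the anchor is a body feature taken from a face-bearing (first) body image, and all names are hypothetical.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def denoise_profile(anchor_feature, candidate_images, sim_threshold=0.6):
    """Keep only the face-less (second) body images whose features are
    similar enough to the anchor; the rest are treated as noise."""
    kept = []
    for image_id, feature in candidate_images:
        if cosine_sim(anchor_feature, feature) >= sim_threshold:
            kept.append(image_id)
    return kept
```

The 0.6 threshold here is only a placeholder; as the text notes, the preset similarity threshold is chosen per application scenario.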
In an optional embodiment, after obtaining the second pedestrian profile, the method further comprises: obtaining a plurality of second pedestrian profiles of the same pedestrian, each corresponding to a different predetermined time period; and performing clothing classification on the body images in those profiles to obtain a clothing classification result for the pedestrian, the result comprising the body images corresponding to each clothing type.
Optionally, after the second pedestrian file is obtained, the second pedestrian files of the same pedestrian over a plurality of predetermined time periods are acquired, and the trained clothing recognition model performs clothing classification on the human body images in these second pedestrian files, thereby obtaining the clothing classification result of the same pedestrian. The human body images in the second pedestrian files are identified by the pedestrian's identity label; that is, second pedestrian files and identity labels correspond one to one.
It should be noted that the clothing recognition model can be used to recognize clothing features in the human body image, including style, category, color, and the like. Using a neural network, it converts the information in the human body image into features the model can interpret and then classifies the clothing according to those features.
In the above embodiment of the present application, the human body images in the plurality of second pedestrian files are subjected to clothing classification, so as to obtain the human body images corresponding to the same clothing type of the same pedestrian.
According to another aspect of the embodiments of the present application, there is also provided a method for training a human body recognition model, and fig. 2 is a flowchart of the method for training a human body recognition model provided in the embodiments of the present application, as shown in fig. 2, the method includes the following steps:
step S202, acquiring a human body image, and blocking the human body image to obtain a plurality of image blocks;
step S204, respectively reducing/increasing the dimension of the plurality of image blocks, and inputting the dimension-reduced/dimension-lifted image blocks into a Transformer neural network to obtain a feature map;
step S206, inputting the feature map into a full-connection network to obtain a classification result;
step S208, respectively calculating an identity label loss value and a triple loss value corresponding to the human body image of the training data based on the classification result;
and S210, optimizing the human body recognition model by using the identity label loss value and the triple loss value corresponding to the human body image of the training data.
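The blocking of step S202 can be sketched as follows; the square block size, the optional overlap via a stride parameter, and the row-major flattening are illustrative assumptions, not details fixed by the method:

```python
def partition_into_blocks(image, block_size, stride=None):
    """Partition a 2-D image (a list of rows) into square blocks.

    A stride smaller than block_size yields overlapping blocks,
    which the scheme permits; stride defaults to block_size
    (no overlap).
    """
    stride = stride or block_size
    h, w = len(image), len(image[0])
    blocks = []
    for top in range(0, h - block_size + 1, stride):
        for left in range(0, w - block_size + 1, stride):
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            # Flatten each block to a vector, ready for the
            # dimension-reduction/lifting projection of step S204.
            blocks.append([px for row in block for px in row])
    return blocks
```

A real pipeline would operate on multi-channel pixel arrays; nested lists keep the sketch self-contained.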
The human body recognition model includes, but is not limited to, a Transformer neural network, a fully connected network, and the like. In addition, according to the training data processing method, a second pedestrian file of a predetermined time period is obtained; human body images are screened from the second pedestrian file as training data, and the human body recognition model is trained with the training data.
Further, human body images can also be screened from a predetermined pedestrian file, or from the second pedestrian files of a historical time period, to serve as training data for the human body recognition model. In this way, on top of the human body images screened from the current second pedestrian file, the sample size of the training data is increased and the training effect is further improved. The trained human body recognition model can then be applied to the filing of the next time period.
In addition, when organizing the training data, whether the human body images are screened from the second pedestrian file, from a predetermined pedestrian file, or from the second pedestrian files of a historical time period, they are the human body images corresponding to the same clothing type of the same pedestrian, i.e. the human body images corresponding to the same clothing type under the same identity label.
In the embodiment of the application, the face image and the body image of a pedestrian are obtained, and the face characteristic of the face image and the body characteristic of the body image are respectively extracted; establishing a file by using the face characteristics of the face image and the body characteristics of the body image to obtain a first pedestrian file; then judging the space-time accessibility between the image acquisition place of the first human body image and the image acquisition place of the second human body image in the first pedestrian file, and deleting the second human body image from the first pedestrian file if the space-time accessibility is not met to obtain a second pedestrian file; and then screening out the human body image from the second pedestrian file to be used as training data, and training the human body recognition model by using the training data. Through the method and the device, the technical problem that the pedestrian file with high confidence level cannot be obtained in the related technology is solved, and the technical effect of improving the accuracy of the pedestrian file is achieved.
It should be noted that adopting this training method improves both the accuracy of the human body recognition model and its recognition effect.
Optionally, the human body recognition model is obtained by training according to a human body image in a predetermined pedestrian file or a human body image in a second pedestrian file of a historical time period, wherein the historical time period includes at least one time period before the predetermined time period.
It should be noted that the human body recognition model evolves autonomously: model updates are triggered in stages and the model capability improves, thereby continuously improving the aggregation effect of the files.
In the embodiment of the application, the human body characteristics corresponding to the human body image of the training data can be extracted by using the human body recognition model, the identity tag loss value and the triple loss value corresponding to the human body image of the training data are respectively calculated, and then the human body recognition model is optimized by using the identity tag loss value and the triple loss value respectively, so that the accuracy of the human body recognition model is improved, and the recognition effect of the human body recognition model is improved.
According to another aspect of the embodiments of the present application, there is also provided a training data processing apparatus, and fig. 3 is a schematic diagram of a training data processing apparatus provided in an embodiment of the present application, as shown in fig. 3, the training data processing apparatus includes: a first processing module 302, a second processing module 304, a third processing module 306, and a fourth processing module 308. The training data processing apparatus will be described in detail below.
The first processing module 302 is configured to obtain a face image and a body image of a pedestrian, and extract a face feature of the face image and a body feature of the body image respectively;
the second processing module 304 is connected with the first processing module 302, and is used for performing document creation based on the human face features of the human face image and the human body features of the human body image to obtain a first pedestrian file;
a third processing module 306, connected to the second processing module 304, configured to classify the human body image in the first pedestrian archive into a first human body image and a second human body image, and determine a spatiotemporal accessibility between an image acquisition location of the first human body image and an image acquisition location of the second human body image, where the first human body image includes a face image, and the second human body image does not include the face image;
and the fourth processing module 308 is connected to the third processing module 306, and configured to delete the second human body image from the first pedestrian file to obtain a second pedestrian file if the space-time accessibility is not satisfied.
In the embodiment of the application, the device can respectively extract the face features of the face image and the body features of the body image by acquiring the face image and the body image of a pedestrian; establishing a file by using the face characteristics of the face image and the body characteristics of the body image to obtain a first pedestrian file; then the human body image in the first pedestrian file is classified into a first human body image and a second human body image, the space-time accessibility between the image acquisition place of the first human body image and the image acquisition place of the second human body image is judged, if the space-time accessibility is not met, the second human body image is deleted from the first pedestrian file, and the second pedestrian file is obtained.
It should be noted here that the first processing module 302, the second processing module 304, the third processing module 306, and the fourth processing module 308 correspond to steps S102 to S108 in the method embodiment, and the modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in the method embodiment.
Optionally, the second processing module 304 includes: the clustering unit is used for respectively clustering the face features of the face images and the human body features of the human body images to obtain face clustering clusters and human body clustering clusters; and the association unit is used for associating the face cluster with the human body cluster according to the face image in the human body cluster and the face image in the human face cluster to obtain a first pedestrian file.
Optionally, the third processing module 306 includes: a first acquisition unit for acquiring a spatial distance between an image acquisition place of the first human body image and an image acquisition place of the second human body image and a spatial distance threshold value, wherein the spatial distance threshold value is determined according to a time interval between an image acquisition time of the first human body image and an image acquisition time of the second human body image; a determination unit configured to compare the spatial distance with a spatial distance threshold value, and determine whether the spatiotemporal reachability is satisfied based on a comparison result.
Optionally, the first obtaining unit includes: the first acquisition subunit is used for acquiring the time interval between the image acquisition time of the first human body image and the image acquisition time of the second human body image; the second acquiring subunit is used for acquiring a preset movement speed corresponding to the movement posture under the condition that the movement posture of the first human body image is the same as the movement posture of the second human body image; a calculating subunit for calculating the spatial distance threshold based on the time interval and the predetermined movement speed.
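The threshold computation performed by these units can be sketched as follows; the posture-to-speed table and its specific speed values are hypothetical placeholders, since the actual predetermined movement speeds are configured per application scenario:

```python
# Illustrative predetermined speeds (metres per second) per movement
# posture; the real values are preset by the application scenario.
POSTURE_SPEED = {"walking": 1.5, "riding": 5.0, "driving": 15.0}

def spatiotemporal_reachable(distance_m, t1, t2, posture):
    """Check whether the second image's acquisition place is reachable
    from the first image's acquisition place within the acquisition-time
    interval, moving at the predetermined speed for the shared posture."""
    interval = abs(t2 - t1)                        # seconds
    threshold = interval * POSTURE_SPEED[posture]  # spatial distance threshold
    return distance_m <= threshold
```

If the comparison returns False, the scheme deletes the second human body image from the first pedestrian file.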
Optionally, the apparatus further comprises: a tenth processing module, comprising: a second obtaining unit configured to obtain a similarity between a human body feature of the first human body image and a human body feature of the second human body image after classifying the human body image in the first pedestrian file into the first human body image and the second human body image; and the deleting unit is used for deleting the second human body image from the first pedestrian file under the condition that the similarity is lower than a preset similarity threshold.
Optionally, the apparatus further comprises: an eleventh processing module comprising: the third acquisition unit is used for acquiring a plurality of second pedestrian files of the same pedestrian after the second pedestrian files are obtained, wherein each second pedestrian file corresponds to a different preset time period; and the classification unit is used for carrying out clothing classification on the human body images in the second pedestrian files to obtain clothing classification results of the same pedestrian, wherein the clothing classification results comprise the human body images corresponding to different clothing types.
According to another aspect of the embodiment of the present application, there is further provided a training apparatus for a human body recognition model, and fig. 4 is a schematic diagram of the training apparatus for a human body recognition model provided in the embodiment of the present application, as shown in fig. 4, the training apparatus for a human body recognition model includes: a fifth processing module 402, a sixth processing module 404, a seventh processing module 406, an eighth processing module 408 and a ninth processing module 410. The following describes the training apparatus of the human body recognition model in detail.
A fifth processing module 402, configured to obtain a human body image, and block the human body image to obtain a plurality of image blocks;
a sixth processing module 404, configured to perform dimension reduction/dimension lifting on the multiple image blocks respectively, and input the multiple image blocks after dimension reduction/dimension lifting into a Transformer neural network to obtain a feature map;
a seventh processing module 406, configured to input the feature map into a fully connected network to obtain a classification result;
an eighth processing module 408, configured to calculate, based on the classification result, an identity label loss value and a triple loss value corresponding to the human body image of the training data, respectively;
the ninth processing module 410 is configured to optimize the human body recognition model by using the identity label loss value and the triplet loss value corresponding to the human body image of the training data.
In the embodiment of the application, the device acquires a face image and a body image of a pedestrian and respectively extracts the face characteristics of the face image and the body characteristics of the body image; the method comprises the steps of establishing a file by using human face characteristics of a human face image and human body characteristics of a human body image to obtain a first pedestrian file; then judging the space-time accessibility between the image acquisition place of the first human body image and the image acquisition place of the second human body image in the first pedestrian file, and deleting the second human body image from the first pedestrian file if the space-time accessibility is not met to obtain a second pedestrian file; and then screening out human body images from the second pedestrian file to serve as training data, and training the human body recognition model by using the training data. Through the application, the technical problem that the pedestrian file with high confidence coefficient cannot be acquired in the related technology is solved, and the technical effect of improving the accuracy of the pedestrian file is achieved.
It should be noted that, the training device for the human body recognition model can improve the accuracy of the human body recognition model and improve the recognition effect of the human body recognition model.
It should be noted here that the fifth processing module 402, the sixth processing module 404, the seventh processing module 406, the eighth processing module 408 and the ninth processing module 410 correspond to steps S202 to S210 in the method embodiment, and the modules are the same as the examples and application scenarios realized by the corresponding steps, but are not limited to the disclosure in the method embodiment.
Optionally, the human body recognition model is obtained by training according to a human body image in a predetermined pedestrian file or a human body image in a second pedestrian file of a historical time period, wherein the historical time period includes at least one time period before the predetermined time period.
According to another aspect of the embodiments of the present application, there is also provided a target archive system, and fig. 5 is a block diagram of the target archive system provided in the embodiments of the present application, and as shown in fig. 5, the system includes: the system comprises an image acquisition unit, a target gathering intelligent analysis unit and a data storage and system management unit.
The image acquisition unit comprises a plurality of high-definition cameras and security monitoring cameras; the high-definition cameras acquire face images and human body images, while the security monitoring cameras collect human body images (with a small number of images containing faces). Fig. 6 is a schematic diagram of an image acquisition unit according to an embodiment of the present disclosure; as shown in fig. 6, a plurality of cameras are erected in an area.
The target file-gathering intelligent analysis unit comprises a human face analysis server, an autonomous evolution server and a network transmission unit.
Fig. 7 is a flowchart of a video analysis module processing a video stream according to an embodiment of the present application, and as shown in fig. 7, the video analysis module processes an accessed video stream (a high definition video camera and a security monitoring camera), and outputs a face optimal frame and a body optimal frame of the same pedestrian through face and body detection, target tracking and association relationship analysis.
The human face image and the human body image are derived from video structured analysis and snapshot of a front-end intelligent camera, and data are gathered to a human face and human body analysis server, so that the target document gathering capacity is realized.
And (5) performing target filing by using the face image, and establishing a pedestrian file carrying an identity tag. And extracting face features from the face image through a face recognition model, and inputting the face features into a clustering algorithm (not limited to clustering algorithms such as DBSCAN and infomap) together to obtain a face filing result. And marking the identity label on the face image in the face clustering cluster to obtain the pedestrian file carrying the identity label.
The human body images are clustered, human body features are extracted from the human body images through a human body recognition model (also called a pedestrian re-recognition model), and the human body features are input into a clustering algorithm (not limited to clustering algorithms such as DBSCAN and infomap) together to obtain a human body clustering result (corresponding to the human body clustering cluster).
Through the face–human body association relationship, those human body clusters that contain successfully clustered associated faces are retained; the other human body clusters are considered to contain no visible face and can be filtered out. At this point, the construction of the target archive is complete.
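The association-and-filter step can be sketched schematically as follows; the data shapes (dictionaries of image-id sets) and the face-to-body association mapping are assumptions made for illustration:

```python
def build_pedestrian_files(face_clusters, body_clusters, face_to_body):
    """Associate face clusters with body clusters: a body cluster is
    kept only if it contains a body image whose associated face image
    falls in some face cluster; unmatched body clusters are filtered out.

    face_clusters: {identity label: set of face image ids}
    body_clusters: {cluster label: set of body image ids}
    face_to_body:  {face image id: associated body image id}
    """
    files = {}
    for face_label, faces in face_clusters.items():
        linked_bodies = {face_to_body[f] for f in faces if f in face_to_body}
        for body_label, bodies in body_clusters.items():
            if linked_bodies & bodies:
                # One pedestrian file carrying an identity label.
                files[face_label] = (faces, bodies)
    return files
```

Body clusters with no associated face simply never appear in the result, matching the filtering described above.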
The autonomous evolution server realizes autonomous learning of the pedestrian re-identification model, triggers and updates the model in stages, continuously improves the archive convergence effect, and has the autonomous evolution capability of the archive.
First, the human body images in the pedestrian files are denoised, so that only human body images of the same person remain in a single pedestrian file; the result is then used for the autonomous learning of the pedestrian re-identification model. A human body denoising scheme is designed for the pedestrian files accumulated within each period. The human body images associated with the face identity of the pedestrian file (i.e. the human body images containing a face image) are the candidate human body covers, and a plurality of human body covers (corresponding to the first human body images) are determined according to the total human body image quality score.
The quality evaluation of the human body image covers posture, orientation, sharpness, illumination, completeness, pitch angle, area, proximity to the edge, and the like; the specific implementation is as follows. Postures are classified into walking, riding, driving and so on. If only one posture is present, only the human body in that posture is taken as a cover; if several postures are present, a human body in each posture is taken as a cover. Orientations are divided into forward, backward and lateral. If only one orientation is present, only the human body in that orientation is taken as a cover; if several orientations are present, a human body in each orientation is taken as a cover. The total score is calculated by weighting the remaining scoring items. Suppose the sharpness score is s1, the illumination score s2, the completeness score s3, the pitch-angle score s4, the area score s5 and the edge-proximity score s6; the total score is then S = w1·s1 + w2·s2 + w3·s3 + w4·s4 + w5·s5 + w6·s6, where w1, …, w6 are preset weights. In each orientation, the candidate human body covers are ranked from high to low by total score and the highest-scoring one is taken, i.e. at most one human body cover per orientation.
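The weighted total score and the per-orientation cover selection can be sketched as follows; the scoring-item names and the weight values here are illustrative, not the preset weights of the scheme:

```python
def total_quality_score(scores, weights):
    """Weighted total of the scoring items (sharpness, illumination,
    completeness, pitch angle, area, edge proximity, ...)."""
    return sum(weights[k] * scores[k] for k in weights)

def pick_covers(candidates, weights):
    """Per orientation, rank the candidate covers by total score and
    keep the highest — i.e. at most one human body cover per orientation."""
    best = {}
    for cand in candidates:
        s = total_quality_score(cand["scores"], weights)
        o = cand["orientation"]
        if o not in best or s > best[o][0]:
            best[o] = (s, cand["id"])
    return {o: img_id for o, (s, img_id) in best.items()}
```

With candidates in several orientations, the result holds one cover id per orientation present.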
There are two denoising schemes; either one may be used alone, or the two may be combined.
In scheme 1, for any human body image in the pedestrian file (corresponding to the second human body image), the spatial distance between the image acquisition place of that human body image and the image acquisition place of the human body cover is calculated, and the retention or deletion of the human body image in the pedestrian file is determined by comparing that spatial distance with a spatial distance threshold.
In scheme 2, for any human body image in the pedestrian file, the human body feature similarity between the human body image and the human body cover is calculated and compared with a preset human body feature similarity threshold to determine whether the human body image is retained in or deleted from the pedestrian file.
Fig. 8 is a flowchart of clothing classification for the human body images in a pedestrian file according to an embodiment of the present disclosure. As shown in fig. 8, in the denoised pedestrian files, the trained clothing recognition model classifies, per period (e.g., a period of one month), the human body images of each pedestrian file, classifying pedestrian file X into clothing A, clothing B, clothing C, and so on. The pedestrian re-identification model is then trained with the human body images of a single set of clothes from each file. The training samples are organized by rule using the clothing recognition model, and the pedestrian re-identification model is iterated to convergence and deployed automatically, improving the pedestrian re-identification effect and the capability of the target file-gathering system.
The trigger time for autonomous learning takes the period as its unit. For example, in one period, pedestrian re-identification model learning is triggered at a set moment; after the model converges, it is applied to labeling the human body images of the pedestrian files of the next period. In that next period, the autonomous learning of a further pedestrian re-identification model is triggered and applied to labeling the human body images of the files of the period after it. Fig. 9 is a schematic diagram of a training data organization according to an embodiment of the present application. As shown in fig. 9, the old historical identity label data, the newly added historical identity label data and the newly generated identity label data are combined by identity and clothing type to organize the training data, with each identity label contributing the human body images of only one set of clothes. Model training then starts according to the training mode of the classification model until the model converges. As time goes on, the training data gradually increase, the capability and adaptability of the pedestrian re-identification model are enhanced, and the labeling effect for human body images in the pedestrian files improves correspondingly.
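The organization of training data — one set of clothes per identity label — can be sketched as follows; choosing the majority clothing type per identity is an illustrative tie-break of my own, not a rule specified by the scheme:

```python
from collections import defaultdict

def organize_training_data(samples, images_per_identity):
    """Combine identity and clothing type: for each identity label,
    keep human body images of a single set of clothes only (here the
    clothing type with the most images — an illustrative choice)."""
    by_identity = defaultdict(lambda: defaultdict(list))
    for identity, clothing, image_id in samples:
        by_identity[identity][clothing].append(image_id)
    training = {}
    for identity, per_clothing in by_identity.items():
        # Pick the clothing type with the most images for this identity.
        clothing = max(per_clothing, key=lambda c: len(per_clothing[c]))
        training[identity] = per_clothing[clothing][:images_per_identity]
    return training
```

Samples from old and new identity label data can simply be concatenated before calling this, mirroring the merge described above.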
In the first period, the predetermined identity label data are used as the historical identity label old data. After the system has run for a further period, a large amount of new data is generated in the pedestrian files and denoised. The historical identity label old data, the historical identity label new data and the newly generated identity label data are then taken as the training data of the current period. Suppose the training data contain P identity labels, with K human body images taken per identity label; valid triplets can be formed from these. Within a given identity label, the human body images of the same set of clothing are used for training until the model training converges. Because the pedestrian re-identification model learns appearance features, including clothing color, style and other information, if an identity label has clothing changes on different days or within the same day, forming that identity label's K human body images from several sets of clothes would mislead the model training and give a poor training effect.
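A minimal sketch of the triplet loss referred to above, with a hypothetical margin value; the exact formulation and distance metric used by the model may differ:

```python
def euclidean(f1, f2):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(f1, f2)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss: pull the positive (same identity,
    same set of clothing) towards the anchor and push the negative
    (another identity) at least `margin` further away."""
    return max(euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin, 0.0)
```

During training this would be averaged over triplets mined from the P identities, alongside the identity label (classification) loss.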
The pedestrian re-identification model autonomously trained in one period is used in the next period, and the human body clustering effect of the target file-gathering system in that period improves correspondingly. In the pedestrian files obtained in the new period, identity labels already present in the previous period are called the historical identity label old data; new data obtained in the new period under an identity label matching one from the previous period are called the historical identity label new data; and data whose identity labels differ from those of the previous period are the newly generated identity label data. The training-data organization of the previous period is then repeated to carry out iterative training of the pedestrian re-identification model.
Fig. 10 is a flowchart of a training method for a pedestrian re-identification model according to an embodiment of the present application. As shown in fig. 10, a human body image is divided into a number of image blocks, and there may be overlap between blocks. Each image block passes through dimension reduction/dimension lifting and is input into a Transformer neural network; guided by two losses, the corresponding identity label loss value and the inter-class triplet loss value are calculated respectively, and the pedestrian re-identification model is optimized with the identity label loss value and the triplet loss value. In a single period, for the identity labels picked from the target file-gathering system, the clothing classification of each identity label accurately yields the human body images of a single identity label with consistent clothing; the training set thus formed is large in scale, generally in the millions of images. Through the optimization training of the pedestrian re-identification model in this period, a pedestrian re-identification model with stronger feature expression capability is obtained, thereby further improving the effect of the target file-gathering system.
The network transmission unit refers to an industrial switch and an optical fiber transceiver which are arranged in a front-end chassis and are responsible for constructing a crossing local area network, realizing the transmission and exchange of front-end data and transmitting the snapshot data recorded by the front end to a rear-end monitoring center.
The data storage and system management unit is arranged in the rear-end monitoring center and is mainly responsible for storing videos collected by the image collection unit, face images and human body images which are captured in a snapping mode and the like. And the system management is responsible for configuring and managing the snapshot system.
It should be noted that the front-end cameras and the back-end server cooperate to realize intelligent target file-gathering analysis. In fields such as artificial intelligence and security, image clustering classifies images of the same person into the same file and images of different persons into different files; this can be used to analyze and predict user behavior and to realize applications such as target identification. For example, when an abnormal event occurs, useful information can be obtained quickly by searching images from massive image data, and upper-layer application analysis based on the target file-gathering capability serves as a prompt.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform the steps of the method of any of the above.
According to another aspect of embodiments of the present application, there is also provided a computer-readable storage medium comprising a stored program, wherein the program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the steps of any one of the above methods.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or may not be executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art may make improvements and modifications without departing from the principles of the present application, and such improvements and modifications shall also fall within the protection scope of the present application.
Claims (12)
1. A method of processing training data, comprising:
acquiring a face image and a human body image of a pedestrian, and extracting face features of the face image and human body features of the human body image respectively;
establishing a file based on the face features of the face image and the human body features of the human body image, to obtain a first pedestrian file;
classifying the human body images in the first pedestrian file into a first human body image and a second human body image, and determining spatiotemporal reachability between the image acquisition location of the first human body image and the image acquisition location of the second human body image, wherein the first human body image contains a face image and the second human body image contains no face image;
and when the spatiotemporal reachability is not satisfied, deleting the second human body image from the first pedestrian file to obtain a second pedestrian file.
2. The method of claim 1, wherein establishing the file based on the face features of the face image and the human body features of the human body image to obtain the first pedestrian file comprises:
clustering the face features of the face images and the human body features of the human body images respectively, to obtain face clusters and human body clusters;
and associating the face clusters with the human body clusters according to the face images contained in the face clusters and the face images contained in the human body clusters, to obtain the first pedestrian file.
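By way of a non-limiting illustration of the association step in claim 2 (the claim does not prescribe an implementation), a face cluster and a human body cluster can be linked whenever they share a capture, i.e., a body detection that also contains a face. The function name `associate_clusters` and the dictionary layout below are hypothetical.

```python
def associate_clusters(face_clusters, body_clusters):
    # face_clusters / body_clusters: dicts mapping a cluster id to the set of
    # capture ids it contains. A body cluster is attached to a pedestrian
    # file when it shares a capture (a body detection that also contains a
    # face) with that file's face cluster.
    files = []
    for face_id, face_captures in face_clusters.items():
        record = {"face_cluster": face_id,
                  "body_clusters": [],
                  "captures": set(face_captures)}
        for body_id, body_captures in body_clusters.items():
            if face_captures & body_captures:  # shared capture links the two
                record["body_clusters"].append(body_id)
                record["captures"] |= body_captures
        files.append(record)
    return files
```

Linking via shared captures avoids comparing face features against body features directly, which live in different embedding spaces.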
3. The method of claim 1, wherein determining spatiotemporal reachability between the image acquisition location of the first human body image and the image acquisition location of the second human body image comprises:
acquiring a spatial distance and a spatial distance threshold between an image acquisition place of the first human body image and an image acquisition place of the second human body image, wherein the spatial distance threshold is determined according to a time interval between image acquisition time of the first human body image and image acquisition time of the second human body image;
comparing the spatial distance with the spatial distance threshold, and determining whether the spatiotemporal reachability is satisfied based on a comparison result.
4. The method of claim 3, wherein obtaining the spatial distance threshold between the image acquisition location of the first human body image and the image acquisition location of the second human body image comprises:
acquiring a time interval between the image acquisition time of the first human body image and the image acquisition time of the second human body image;
under the condition that the motion posture of the first human body image is the same as that of the second human body image, acquiring a preset motion speed corresponding to the motion posture;
calculating the spatial distance threshold based on the time interval and the predetermined movement speed.
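Claims 3 and 4 reduce the reachability test to a single comparison: the threshold equals the predetermined movement speed multiplied by the capture-time interval. A minimal sketch follows, assuming camera positions given as latitude/longitude in degrees, intervals in seconds, and speeds in km/h; the haversine formula and all names are illustrative assumptions, not part of the claims.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two camera locations, in kilometres
    # (one possible distance measure; the claims do not specify one).
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def is_reachable(distance_km, interval_s, speed_kmh):
    # Claim 4: threshold = predetermined movement speed x capture-time interval.
    # Claim 3: reachability holds when the spatial distance is within it.
    threshold_km = speed_kmh * interval_s / 3600.0
    return distance_km <= threshold_km
```

The predetermined speed would differ per motion posture (claim 4), e.g. a walking speed versus a cycling speed.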
5. The method of claim 1, wherein after classifying the human images in the first pedestrian profile into a first human image and a second human image, the method further comprises:
acquiring similarity between the human body characteristics of the first human body image and the human body characteristics of the second human body image;
and deleting the second human body image from the first pedestrian file under the condition that the similarity is lower than a preset similarity threshold value.
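Claim 5 leaves the similarity measure unspecified; cosine similarity over the extracted human body features is one common choice in re-identification. A minimal sketch, in which the function names and the threshold value are assumptions:

```python
import math

def cosine_similarity(a, b):
    # Similarity between two feature vectors (assumed non-zero norms).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def filter_by_similarity(anchor_features, candidates, threshold):
    # Claim 5: drop face-less body images whose features fall below the
    # predetermined similarity threshold relative to the anchor features.
    return [c for c in candidates
            if cosine_similarity(anchor_features, c) >= threshold]
```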
6. The method of any one of claims 1 to 5, wherein after obtaining the second pedestrian profile, the method further comprises:
acquiring a plurality of second pedestrian files of the same pedestrian, wherein each second pedestrian file corresponds to a different preset time period;
and carrying out clothing classification on the human body images in the plurality of second pedestrian files to obtain clothing classification results of the same pedestrian, wherein the clothing classification results comprise human body images corresponding to different clothing types.
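Claim 6 likewise does not fix a clothing classifier. The grouping it describes can be sketched as below, with `clothing_label` standing in for whatever classifier is actually used; all names are hypothetical.

```python
from collections import defaultdict

def classify_by_clothing(profile_images, clothing_label):
    # profile_images: iterable of (period, image_id) pairs drawn from several
    # second pedestrian files of the same pedestrian (one file per period).
    # clothing_label: callable image_id -> clothing type, a stand-in for a
    # real clothing classifier.
    result = defaultdict(list)
    for _period, image_id in profile_images:
        result[clothing_label(image_id)].append(image_id)
    return dict(result)  # clothing type -> images of that type
```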
7. A training method of a human body recognition model is characterized by comprising the following steps:
acquiring a human body image, and partitioning the human body image to obtain a plurality of image blocks;
performing dimension reduction/expansion on the plurality of image blocks respectively, and inputting the dimension-reduced/expanded image blocks into a Transformer neural network to obtain a feature map;
inputting the feature map into a fully connected network to obtain a classification result;
calculating, based on the classification result, an identity-label loss value and a triplet loss value corresponding to the human body images of the training data respectively;
and optimizing the human body recognition model by using the identity-label loss value and the triplet loss value corresponding to the human body images of the training data.
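The pipeline of claim 7 can be outlined numerically. The NumPy sketch below covers only patch partitioning, the linear dimension reduction, and the two losses; the Transformer encoder and the fully connected classifier themselves are omitted, and all names and shapes are assumptions.

```python
import numpy as np

def partition(image, block):
    # Claim 7, step 1: split an HxWxC image into non-overlapping
    # block x block patches, each flattened to a vector.
    h, w, _ = image.shape
    patches = [image[i:i + block, j:j + block].reshape(-1)
               for i in range(0, h, block)
               for j in range(0, w, block)]
    return np.stack(patches)

def project(patches, weight):
    # Claim 7, step 2: linear dimension reduction/expansion applied to each
    # patch before it would enter the Transformer encoder (omitted here).
    return patches @ weight

def id_loss(logits, label):
    # Identity-label loss: softmax cross-entropy for one sample.
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Pull same-identity features together, push different ones apart.
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)
```

In an actual optimizer loop the two loss values would be summed (possibly weighted) and back-propagated through the network; the margin of 0.3 is a common re-identification default, not a value taken from the claims.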
8. The method of claim 7, wherein the human recognition model is trained from human images in a predetermined pedestrian profile or human images in a second pedestrian profile over a historical time period, wherein the historical time period comprises at least one time period prior to the predetermined time period.
9. A training data processing apparatus, comprising:
a first processing module, configured to acquire a face image and a human body image of a pedestrian, and to extract face features of the face image and human body features of the human body image respectively;
a second processing module, configured to establish a file based on the face features of the face image and the human body features of the human body image, to obtain a first pedestrian file;
a third processing module, configured to classify the human body images in the first pedestrian file into a first human body image and a second human body image, and to determine spatiotemporal reachability between the image acquisition location of the first human body image and the image acquisition location of the second human body image, wherein the first human body image contains a face image and the second human body image contains no face image;
and a fourth processing module, configured to delete the second human body image from the first pedestrian file to obtain a second pedestrian file when the spatiotemporal reachability is not satisfied.
10. A training device for a human body recognition model, comprising:
a fifth processing module, configured to acquire a human body image and partition the human body image to obtain a plurality of image blocks;
a sixth processing module, configured to perform dimension reduction/expansion on the plurality of image blocks respectively, and to input the dimension-reduced/expanded image blocks into a Transformer neural network to obtain a feature map;
a seventh processing module, configured to input the feature map into a fully connected network to obtain a classification result;
an eighth processing module, configured to calculate, based on the classification result, an identity-label loss value and a triplet loss value corresponding to the human body images of the training data respectively;
and a ninth processing module, configured to optimize the human body recognition model by using the identity-label loss value and the triplet loss value corresponding to the human body images of the training data.
11. An electronic device, comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform the steps of the method of any one of claims 1 to 8.
12. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the steps of the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310214276.9A CN115880727A (en) | 2023-03-01 | 2023-03-01 | Training method and device for human body recognition model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115880727A true CN115880727A (en) | 2023-03-31 |
Family
ID=85762027
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310214276.9A Pending CN115880727A (en) | 2023-03-01 | 2023-03-01 | Training method and device for human body recognition model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115880727A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109815908A (en) * | 2019-01-25 | 2019-05-28 | 同济大学 | A pedestrian re-identification method based on deep learning and metrics between overlapping image blocks |
WO2021088640A1 (en) * | 2019-11-06 | 2021-05-14 | 重庆邮电大学 | Facial recognition technology based on heuristic gaussian cloud transformation |
CN113269070A (en) * | 2021-05-18 | 2021-08-17 | 重庆邮电大学 | Pedestrian re-identification method fusing global and local features, memory and processor |
CN113469283A (en) * | 2021-07-23 | 2021-10-01 | 山东力聚机器人科技股份有限公司 | Image classification method, and training method and device of image classification model |
AU2021104108A4 (en) * | 2021-07-13 | 2022-04-07 | Kotipalli Akshaya | An automated iris recognition based authentication and gender classification using neural network for cyber security investigation |
CN114357216A (en) * | 2021-12-10 | 2022-04-15 | 浙江大华技术股份有限公司 | Portrait gathering method and device, electronic equipment and storage medium |
CN114519879A (en) * | 2021-12-30 | 2022-05-20 | 深圳云天励飞技术股份有限公司 | Human body data archiving method, device, equipment and storage medium |
CN114565791A (en) * | 2022-02-25 | 2022-05-31 | 浙江大华技术股份有限公司 | Figure file identification method, device, equipment and medium |
Non-Patent Citations (3)
Title |
---|
SHUTING HE; HAO LUO; PICHAO WANG; FAN WANG; HAO LI; WEI JIANG: "TransReID: Transformer-based Object Re-Identification", IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, pages 1 - 3 * |
ZHU Dewen, FU Guojiang: "Elevator Group Control Technology", China Electric Power Press, pages: 74 - 77 *
LI Zhendong; ZHONG Yong; CAO Dongping: "Deep convolutional feature vectors for fast face image retrieval", Journal of Computer-Aided Design & Computer Graphics, no. 12 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110428522B (en) | Intelligent security system of wisdom new town | |
US10248860B2 (en) | System and method for object re-identification | |
AU2017372905B2 (en) | System and method for appearance search | |
CN108460356B (en) | Face image automatic processing system based on monitoring system | |
Owens et al. | Application of the self-organising map to trajectory classification | |
CN107153817B (en) | Pedestrian re-identification data labeling method and device | |
CN112183353B (en) | Image data processing method and device and related equipment | |
WO2018102918A1 (en) | System and method for cnn layer sharing | |
KR20180135898A (en) | Systems and methods for training object classifiers by machine learning | |
CN110263712B (en) | Coarse and fine pedestrian detection method based on region candidates | |
WO2013101460A2 (en) | Clustering-based object classification | |
US10445885B1 (en) | Methods and systems for tracking objects in videos and images using a cost matrix | |
CN107277470A (en) | A kind of network-linked management method and digitlization police service linkage management method | |
US11354819B2 (en) | Methods for context-aware object tracking | |
US20220301275A1 (en) | System and method for a hybrid approach for object tracking across frames. | |
CN111723773A (en) | Remnant detection method, device, electronic equipment and readable storage medium | |
CN113537107A (en) | Face recognition and tracking method, device and equipment based on deep learning | |
CN115953650B (en) | Training method and device for feature fusion model | |
CN110598042A (en) | Incremental update-based video structured real-time updating method and system | |
CN113837144A (en) | Intelligent image data acquisition and processing method for refrigerator | |
Mseddi et al. | Real-time scene background initialization based on spatio-temporal neighborhood exploration | |
CN115880727A (en) | Training method and device for human body recognition model | |
Hunter et al. | Exploiting sparse representations in very high-dimensional feature spaces obtained from patch-based processing | |
CN111079477A (en) | Monitoring analysis method and monitoring analysis system | |
CN112907627B (en) | System, method, apparatus, processor and computer readable storage medium for realizing accurate tracking of small sample targets |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20230331 |