WO2020151300A1 - Deep residual network-based gender recognition method and apparatus, medium, and device - Google Patents

Deep residual network-based gender recognition method and apparatus, medium, and device

Info

Publication number
WO2020151300A1
Authority
WO
WIPO (PCT)
Prior art keywords
gender
preset number
video frames
target object
weighted
Prior art date
Application number
PCT/CN2019/116236
Other languages
French (fr)
Chinese (zh)
Inventor
马潜
李洪燕
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020151300A1 publication Critical patent/WO2020151300A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition

Definitions

  • This application relates to the field of intelligent recognition technology. Specifically, this application relates to a gender recognition method, device, computer-readable storage medium, and computer equipment based on a deep residual network.
  • To solve at least one of the above technical defects, this application provides a gender recognition method based on a deep residual network, together with a corresponding apparatus, computer-readable storage medium, and computer device.
  • According to one aspect, embodiments of the present application provide a gender recognition method based on a deep residual network, including the following steps:
  • obtaining a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
  • inputting the preset number of video frames respectively into a pre-trained gender recognition model to obtain gender prediction values respectively corresponding to the target object in the preset number of video frames, where the gender recognition model is pre-trained based on a deep residual network;
  • performing a weighted operation on the gender prediction values to obtain a weighted gender prediction value of the target object;
  • obtaining a gender recognition result of the target object according to the weighted gender prediction value.
  • According to another aspect, embodiments of the present application provide a gender recognition apparatus based on a deep residual network, including:
  • a video frame acquisition module, configured to obtain a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
  • a prediction value acquisition module, configured to input the preset number of video frames respectively into a pre-trained gender recognition model to obtain gender prediction values respectively corresponding to the target object in the preset number of video frames, where the gender recognition model is pre-trained based on a deep residual network;
  • a weighted operation module, configured to perform a weighted operation on the gender prediction values to obtain a weighted gender prediction value of the target object;
  • a gender recognition result generation module, configured to obtain a gender recognition result of the target object according to the weighted gender prediction value.
  • According to yet another aspect, embodiments of the present application provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above gender recognition method based on the deep residual network is implemented.
  • According to still another aspect, embodiments of the present application provide a computer device.
  • The computer device includes one or more processors, a memory, and one or more computer programs, where the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to execute the aforementioned gender recognition method based on the deep residual network.
  • The gender recognition method, apparatus, computer-readable storage medium, and computer device based on the deep residual network obtain multiple video frames from the video stream captured while the target object is walking and input them into a gender recognition model pre-trained based on the deep residual network to recognize the gender of the target object.
  • Real-time gender recognition of pedestrians can therefore be achieved without relying on face recognition.
  • Gender recognition efficiency and accuracy are high, meeting the practical requirements of real-time pedestrian gender recognition.
  • FIG. 1 is a method flowchart of a gender recognition method based on a deep residual network provided by an embodiment of this application;
  • FIG. 2 is a schematic structural diagram of a gender recognition device based on a deep residual network provided by an embodiment of this application;
  • Fig. 3 is a schematic structural diagram of a computer device provided by an embodiment of the application.
  • An embodiment of the application provides a gender recognition method based on a deep residual network. As shown in FIG. 1, the method includes:
  • Step S110: Obtain a preset number of video frames of the target object from the video stream based on a pedestrian tracking algorithm.
  • For this embodiment, the target object is a person whose gender is to be recognized.
  • In actual application scenarios, the target object is first tracked based on the pedestrian tracking algorithm within a preset time period, and the video stream captured while the target object walks during that period is recorded by a video surveillance tool; the preset number of video frames of the target object is then extracted from the video stream. The frames may be obtained by extracting key frames from the video stream at a preset period, which can be any duration such as 50 ms, 80 ms, or 1 s.
  • For this embodiment, the acquired preset number of video frames is used as the input data of the pre-trained gender recognition model.
  • The preset number can be any value such as 5, 9, or 15; those skilled in the art can determine its specific value according to actual application requirements, which is not limited in this embodiment.
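  • A minimal sketch of this key-frame extraction step is given below, assuming OpenCV and illustrative values of an 80 ms period and 9 frames (the application itself only requires "a preset period" and "a preset number"):

```python
# Hedged sketch: grab one key frame per preset period until the preset number is reached.
import cv2

def extract_key_frames(video_path, period_ms=80, preset_number=9):
    capture = cv2.VideoCapture(video_path)
    frames = []
    next_grab_ms = 0.0
    while len(frames) < preset_number:
        ok, frame = capture.read()
        if not ok:                                   # end of stream
            break
        timestamp_ms = capture.get(cv2.CAP_PROP_POS_MSEC)
        if timestamp_ms >= next_grab_ms:             # one frame per preset period
            frames.append((timestamp_ms, frame))
            next_grab_ms = timestamp_ms + period_ms
    capture.release()
    return frames                                    # (timestamp, image) pairs
```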
  • Step S120: Input the preset number of video frames respectively into the pre-trained gender recognition model to obtain gender prediction values respectively corresponding to the target object in the preset number of video frames; the gender recognition model is pre-trained based on a deep residual network.
  • For this embodiment, the gender recognition model is used to extract the gender features of the target object and calculate the gender prediction value.
  • For this embodiment, the acquired preset number of video frames are input into the pre-trained gender recognition model one after another, and the gender prediction values of the target object corresponding to the respective video frames are obtained in turn.
  • The gender recognition model estimates the gender prediction value of the target object as follows: a gender feature vector of the target object is extracted from the video frame supplied as input data, the probabilities that the target object is male and female are then estimated from this feature vector, and gender classification of the target object is performed according to these probabilities.
  • The deep residual network (ResNet) uses the residual block as its basic network structure.
  • This structure alleviates the performance degradation that occurs as networks become deeper, and provides strong technical support for improving both the accuracy and the computational efficiency of gender prediction.
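  • A minimal sketch of such a model is shown below, assuming PyTorch/torchvision and a ResNet-18 backbone (the application does not fix the depth or framework), with a single sigmoid output read as the gender prediction value:

```python
# Hedged sketch: a deep residual network with a one-logit head for gender prediction.
import torch
import torch.nn as nn
from torchvision import models

class GenderResNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = models.resnet18()                 # residual feature extractor
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 1)

    def forward(self, x):
        # x: batch of pedestrian body images, shape (N, 3, H, W)
        logit = self.backbone(x)
        return torch.sigmoid(logit).squeeze(1)            # gender prediction value in (0, 1)
```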
  • Step S130: Perform a weighted operation on the gender prediction values to obtain the weighted gender prediction value of the target object.
  • For this embodiment, the gender prediction values corresponding to the individual video frames are weighted according to a preset weighting scheme and combined to calculate the weighted gender prediction value of the target object.
  • Weighting and combining the per-frame gender prediction values yields a more accurate prediction than recognizing gender from a single static image, and therefore a more accurate gender recognition result.
  • Step S140: Obtain the gender recognition result of the target object according to the weighted gender prediction value.
  • For this embodiment, it is determined whether the weighted gender prediction value is greater than a preset threshold. If so, the gender of the target object is determined to be male and a male gender recognition result is obtained; if the weighted gender prediction value is less than or equal to the preset threshold, the gender of the target object is determined to be female and a female gender recognition result is obtained.
  • The preset threshold may be 0.5.
  • When the weighted gender prediction value is greater than 0.5, the gender of the target object is determined to be male; when it is less than or equal to 0.5, the gender of the target object is determined to be female.
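  • A minimal sketch of this decision step, using the 0.5 threshold mentioned above (the sample values are illustrative only):

```python
# Hedged sketch of step S140: threshold the weighted gender prediction value.
def decide_gender(weighted_value, threshold=0.5):
    return "male" if weighted_value > threshold else "female"

print(decide_gender(0.68))   # male
print(decide_gender(0.41))   # female
```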
  • The gender recognition method based on the deep residual network obtains multiple video frames from the video stream captured while the target object is walking and inputs them into the gender recognition model pre-trained based on the deep residual network to recognize the gender of the target object; real-time gender recognition of pedestrians can thus be achieved without relying on face recognition.
  • Gender recognition efficiency and accuracy are high, meeting the practical requirements of real-time pedestrian gender recognition.
  • In an embodiment, obtaining a preset number of video frames of the target object from the video stream based on a pedestrian tracking algorithm includes:
  • obtaining the preset number of video frames of the target object from the video stream based on the KCF target tracking algorithm.
  • The KCF target tracking algorithm is fast and robust, which can further improve the efficiency and accuracy of obtaining the preset number of video frames of the target object and meet real-time requirements.
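  • A minimal sketch of frame collection with a KCF tracker, assuming OpenCV's contrib implementation (cv2.TrackerKCF_create from opencv-contrib-python); the initial bounding box is assumed to come from a separate pedestrian detector that is not shown here:

```python
# Hedged sketch: follow one pedestrian with KCF and collect a preset number of frames.
import cv2

def track_pedestrian(video_path, initial_bbox, preset_number=9):
    capture = cv2.VideoCapture(video_path)
    ok, frame = capture.read()
    if not ok:
        return []
    tracker = cv2.TrackerKCF_create()
    tracker.init(frame, initial_bbox)                  # initial_bbox = (x, y, w, h)
    tracked = []
    while len(tracked) < preset_number:
        ok, frame = capture.read()
        if not ok:
            break
        found, bbox = tracker.update(frame)            # re-locate the same pedestrian
        if found:
            tracked.append((frame, tuple(int(v) for v in bbox)))
    capture.release()
    return tracked
```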
  • In an embodiment, performing the weighted operation on the gender prediction values to obtain the weighted gender prediction value of the target object includes:
  • presetting, for each video frame in the preset number of video frames, a weight used in the weighted calculation, to obtain the weight ratio of the preset number of video frames.
  • The weights used in the weighted calculation of the individual video frames may be the same or different.
  • For this embodiment, the weight of each video frame in the preset number of video frames is set according to the order of that frame's timestamp in the video stream; that is, the preset weight used in the weighted calculation of each frame is associated with the order of its timestamp.
  • In practice, the frames acquired when tracking of the target object has just started may not yet capture a relatively complete view of the target, which can reduce the accuracy of the weighted gender prediction value. As a preferred example, frames with later timestamps are therefore given larger weights, so that frames capturing a more complete view of the target object contribute more to the weighted gender prediction value, improving the accuracy of real-time pedestrian gender recognition.
  • For this embodiment, according to the weight ratio, the gender prediction value of each video frame in the preset number of video frames is multiplied by its corresponding weight and a weighted average is computed; this weighted average is used as the weighted gender prediction value of the target object.
  • In this embodiment, calculating the weighted gender prediction value of the target object through a weighted operation on the per-frame gender prediction values can further improve the accuracy of real-time pedestrian gender recognition.
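  • A minimal sketch of this weighting scheme, assuming a simple linear ramp over the timestamp order (the application only requires the weights to follow the timestamp order, not any particular shape):

```python
# Hedged sketch: later-timestamped frames get larger weights; combine by weighted average.
def timestamp_ordered_weights(timestamps):
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    raw = [0.0] * len(timestamps)
    for rank, idx in enumerate(order, start=1):
        raw[idx] = float(rank)                 # 1 for the earliest frame, N for the latest
    total = sum(raw)
    return [w / total for w in raw]            # weight ratio summing to 1

def weighted_gender_value(predictions, weights):
    return sum(p * w for p, w in zip(predictions, weights))

weights = timestamp_ordered_weights([0, 80, 160, 240, 320])           # e.g. one frame per 80 ms
value = weighted_gender_value([0.62, 0.55, 0.71, 0.68, 0.74], weights)
```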
  • In an embodiment, the gender recognition model is pre-trained through the following steps:
  • obtaining training samples containing pedestrian human body images and corresponding gender information;
  • training a deep residual network on the training samples to obtain the gender recognition model.
  • For this embodiment, the training samples for training the deep residual network into a gender recognition model are obtained from a preset pedestrian image database. The database stores a large number of pedestrian human body images, each showing a person in a walking state and pre-labeled with the corresponding gender.
  • For example, one hundred thousand pre-collected pedestrian human body images of males and females are obtained from the preset pedestrian database and used as input data for the deep residual network.
  • For this embodiment, a standard deep residual network is trained on the pedestrian human body images and their gender labels in the training samples to obtain a network structure and weights suited to the gender recognition task of this scheme; the trained network is the gender recognition model.
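  • A minimal training-loop sketch for this step, assuming PyTorch, a dataset yielding (image, label) pairs with 1.0 for male and 0.0 for female, and a model that outputs a probability (such as the residual-network sketch above); batch size, learning rate, and epoch count are illustrative only:

```python
# Hedged sketch: train the residual network into a gender recognition model.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_gender_model(model, dataset, epochs=10, lr=1e-3, device="cpu"):
    loader = DataLoader(dataset, batch_size=64, shuffle=True)
    criterion = nn.BCELoss()                               # model outputs probabilities
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.float().to(device)
            loss = criterion(model(images), labels)        # per-image gender prediction values
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```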
  • In an embodiment, after the gender recognition result of the target object is obtained according to the weighted gender prediction value, the method further includes: saving the preset number of video frames and the gender recognition result of the target object.
  • For this embodiment, some or all of the preset number of video frames of the target object, together with the corresponding gender recognition result, are saved in a gender recognition result database for fast matching and feedback of gender recognition results in subsequent repeated-recognition scenarios.
  • The video frames and corresponding gender recognition results stored in the gender recognition result database can be cleaned up periodically according to a preset policy.
  • In an embodiment, before the preset number of video frames are input into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object, the method further includes: judging whether a pedestrian human body image matching the preset number of video frames exists in a preset database; if so, obtaining the gender information corresponding to that pedestrian human body image prestored in the preset database and generating the gender recognition result of the target object from it; if not, continuing with the step of inputting the preset number of video frames into the pre-trained gender recognition model.
  • In actual application scenarios, after a pedestrian leaves the shooting range of the video surveillance tool, they may re-enter it within a period of time. To reduce the workload of real-time pedestrian gender recognition, a fast match against existing gender recognition results can therefore be performed before gender recognition of the target object.
  • For this embodiment, the preset database is the gender recognition result database that stores video frames of historical target objects and the corresponding gender recognition results, where a video frame of a historical target object is a pedestrian human body image containing that historical target object, i.e. a human body image of a person in a walking state. One or more of the acquired preset number of video frames of the target object are matched against the video frames in the gender recognition result database to determine whether a matching pedestrian human body image exists.
  • If a matching pedestrian human body image exists in the gender recognition result database, the gender information of the corresponding historical target object is determined from the prestored gender recognition result and used as the gender recognition result of the target object. If no matching pedestrian human body image exists, real-time gender recognition of the target object is performed.
  • In this embodiment, by performing a fast match against existing gender recognition results before gender recognition of the target object, the gender recognition system does not need to repeat gender recognition for a target object that re-enters the video shooting range within a preset time period, which significantly reduces the gender recognition workload in actual application scenarios and improves the efficiency of real-time pedestrian gender recognition.
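  • The application does not specify how a new video frame is matched against stored pedestrian images, so the sketch below only shows the cache-then-fallback control flow, with the matching routine passed in as a callable; a cosine-similarity comparison of appearance embeddings would be one hypothetical realization:

```python
# Hedged sketch: reuse a stored gender recognition result when a match is found,
# otherwise fall back to real-time recognition.
def gender_with_result_cache(frames, result_db, match_fn, recognize_fn):
    for frame in frames:
        key = match_fn(frame, result_db)      # returns a key in result_db, or None
        if key is not None:
            return result_db[key]             # historical gender recognition result
    return recognize_fn(frames)               # no match: run real-time recognition
```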
  • In an embodiment, inputting the preset number of video frames respectively into the pre-trained gender recognition model to obtain the gender prediction values respectively corresponding to the target object in the preset number of video frames includes: determining the human body region of the target object in the preset number of video frames; obtaining, according to the human body region, a preset number of pedestrian human body images corresponding to the preset number of video frames; and
  • inputting the preset number of pedestrian human body images respectively into the pre-trained gender recognition model to obtain the gender prediction values respectively corresponding to the target object in the preset number of video frames.
  • In actual application scenarios, the video surveillance tool records the video stream while the target object walks, so the preset number of video frames extracted from the stream may contain image information within the shooting range other than the target object, which would interfere with the gender recognition result of the target object. It is therefore necessary to preprocess the preset number of video frames and use the preprocessed frames as the input data of the gender recognition model.
  • Specifically, the preprocessing includes:
  • determining the human body region of the target object in the preset number of video frames and cropping the image of that region from each frame to obtain the preset number of pedestrian human body images; operations such as normalization, noise reduction, and illumination compensation may also be applied to the pedestrian human body images. The preprocessed pedestrian human body images are used as the input data of the gender recognition model and are input respectively into the pre-trained model to obtain the gender prediction values corresponding to the target object in the preset number of video frames. Preprocessing the input data of the gender recognition model in this way helps ensure its gender recognition accuracy.
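  • A minimal preprocessing sketch, assuming OpenCV, a 128x256 input size, Gaussian blur as a stand-in for noise reduction, and histogram equalization as a stand-in for illumination compensation (none of these specific choices are fixed by the application):

```python
# Hedged sketch: crop the body region, then resize, denoise, relight, and normalize it.
import cv2
import numpy as np

def preprocess_frame(frame, bbox, size=(128, 256)):
    x, y, w, h = bbox                                       # body region of the target object
    person = frame[y:y + h, x:x + w]
    person = cv2.resize(person, size)
    person = cv2.GaussianBlur(person, (3, 3), 0)            # light noise reduction
    ycrcb = cv2.cvtColor(person, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])       # simple illumination compensation
    person = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
    return person.astype(np.float32) / 255.0                # normalize pixels to [0, 1]
```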
  • In addition, an embodiment of the present application provides a gender recognition apparatus based on a deep residual network.
  • As shown in FIG. 2, the apparatus includes a video frame acquisition module 21, a prediction value acquisition module 22, a weighted operation module 23, and a gender recognition result generation module 24, where:
  • the video frame acquisition module 21 is configured to obtain a preset number of video frames of the target object from the video stream based on a pedestrian tracking algorithm;
  • the prediction value acquisition module 22 is configured to input the preset number of video frames respectively into a pre-trained gender recognition model to obtain gender prediction values respectively corresponding to the target object in the preset number of video frames, where the gender recognition model is pre-trained based on a deep residual network;
  • the weighted operation module 23 is configured to perform a weighted operation on the gender prediction values to obtain the weighted gender prediction value of the target object;
  • the gender recognition result generation module 24 is configured to obtain the gender recognition result of the target object according to the weighted gender prediction value.
  • In an embodiment, the video frame acquisition module 21 is specifically configured to:
  • obtain the preset number of video frames of the target object from the video stream based on the KCF target tracking algorithm.
  • In an embodiment, the prediction value acquisition module 22 is specifically configured to: obtain the weight ratio corresponding to the preset number of video frames, where the weight ratio is generated from the weights of the video frames and the weight of each frame is set according to the order of its timestamp in the video stream; and perform the weighted operation on the gender prediction values according to the weight ratio to obtain the weighted gender prediction value of the target object.
  • In an embodiment, the gender recognition model is pre-trained through the following steps: obtaining training samples containing pedestrian human body images and corresponding gender information, and training a deep residual network on the training samples to obtain the gender recognition model.
  • In an embodiment, after the gender recognition result of the target object is obtained according to the weighted gender prediction value, the method further includes: saving the preset number of video frames and the gender recognition result of the target object.
  • In an embodiment, before the preset number of video frames are input into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object, the method further includes: judging whether a pedestrian human body image matching the preset number of video frames exists in the preset database; if so, obtaining the prestored gender information corresponding to that image and generating the gender recognition result of the target object from it; if not, continuing with the step of inputting the preset number of video frames into the pre-trained gender recognition model.
  • In an embodiment, the prediction value acquisition module 22 is specifically configured to:
  • input the preset number of pedestrian human body images respectively into the pre-trained gender recognition model to obtain the gender prediction values respectively corresponding to the target object in the preset number of video frames.
  • The gender recognition apparatus based on the deep residual network provided in this application obtains multiple video frames from the video stream captured while the target object is walking and inputs them into the gender recognition model pre-trained based on the deep residual network to recognize the gender of the target object; real-time gender recognition of pedestrians can thus be achieved without relying on face recognition.
  • Gender recognition efficiency and accuracy are high, meeting the practical requirements of real-time pedestrian gender recognition.
  • The gender recognition apparatus based on the deep residual network provided by the embodiments of the present application can implement the method embodiments provided above.
  • Furthermore, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the gender recognition method based on the deep residual network described in the above embodiments is implemented.
  • The computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards.
  • That is, a readable storage medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer or a mobile phone), and may be a read-only memory, a magnetic disk, or an optical disk.
  • The computer-readable storage medium provided in this application makes it possible to obtain multiple video frames from the video stream captured while the target object is walking and to input them into the gender recognition model pre-trained based on the deep residual network to recognize the gender of the target object; real-time gender recognition of pedestrians can thus be achieved without relying on face recognition, with high efficiency and accuracy, meeting the practical requirements of real-time pedestrian gender recognition.
  • The computer-readable storage medium provided in the embodiments of the present application can implement the method embodiments provided above.
  • In addition, an embodiment of the present application also provides a computer device, as shown in FIG. 3.
  • The computer device described in this embodiment may be a server, a personal computer, a network device, or similar equipment.
  • The computer device includes a processor 302, a memory 303, an input unit 304, a display unit 305, and other components.
  • The memory 303 may be used to store a computer program 301 and various functional modules, and the processor 302 runs the computer program 301 stored in the memory 303 to execute the various functional applications and data processing of the device.
  • The memory may be internal memory or external memory, or include both internal and external memory.
  • The internal memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, or random access memory.
  • The external memory may include a hard disk, a floppy disk, a ZIP disk, a USB flash drive, a magnetic tape, and the like.
  • The memory disclosed in this application includes, but is not limited to, these types of memory.
  • The memory disclosed in this application is only an example and not a limitation.
  • The input unit 304 is used to receive signal input and to receive keywords entered by the user.
  • The input unit 304 may include a touch panel and other input devices.
  • The touch panel can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control keys and switch keys), a trackball, a mouse, and a joystick.
  • The display unit 305 can be used to display information entered by the user, information provided to the user, and the various menus of the computer device.
  • The display unit 305 may take the form of a liquid crystal display, an organic light-emitting diode display, or the like.
  • The processor 302 is the control center of the computer device: it connects the various parts of the entire computer through various interfaces and lines, and performs the various functions of the device and processes data by running or executing the software programs and/or modules stored in the memory 303 and invoking the data stored in the memory.
  • In one embodiment, the computer device includes one or more processors 302, a memory 303, and one or more computer programs 301, where the one or more computer programs 301 are stored in the memory 303 and configured to be executed by the one or more processors 302, and the one or more computer programs 301 are configured to execute the gender recognition method based on the deep residual network described in any of the above embodiments.
  • The computer device provided by this application obtains multiple video frames from the video stream captured while the target object is walking and inputs them into the gender recognition model pre-trained based on the deep residual network to recognize the gender of the target object; real-time gender recognition of pedestrians can thus be achieved without relying on face recognition.
  • Gender recognition efficiency and accuracy are high, meeting the practical requirements of real-time pedestrian gender recognition.
  • The computer device provided in the embodiments of the present application can implement the method embodiments provided above.
  • The functional units in the various embodiments of the present application may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module.
  • The above integrated modules may be implemented in the form of hardware or as software functional modules. If an integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.

Abstract

A deep residual network-based gender recognition method, comprising: obtaining a preset number of video frames of a target object from a video stream on the basis of a pedestrian tracking algorithm (S110); inputting the preset number of video frames into a pre-trained gender recognition model to obtain gender prediction values corresponding to the target object in the preset number of video frames, respectively, wherein the gender recognition model is pre-trained on the basis of a deep residual network (S120); weighting the gender prediction values to obtain the weighted gender prediction values of the target object (S130); and obtaining the gender recognition result of the target object according to the weighted gender prediction values (S140). The method can achieve real-time gender recognition of a pedestrian without face recognition, can achieve high gender recognition efficiency and accuracy, and meets the practical application needs of real-time pedestrian gender recognition.

Description

Gender recognition method, apparatus, medium, and device based on a deep residual network
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on January 25, 2019, with application number 201910074634.4 and titled "Gender recognition method, device, medium and equipment based on deep residual network", the entire contents of which are incorporated into this application by reference.
Technical field
This application relates to the field of intelligent recognition technology, and in particular to a gender recognition method, apparatus, computer-readable storage medium, and computer device based on a deep residual network.
Background
With the rapid development of artificial intelligence technology, more and more application scenarios require intelligent recognition of a person's gender. At present, most gender recognition is implemented based on face recognition technology. In actual application scenarios, however, a person's face is often occluded, making it difficult to recognize gender from facial features, so judgments can usually only be made from the person's figure, clothing, and other aspects of appearance. The difficulty in judging the gender of pedestrians is that, for people who dress neutrally, are overweight, or whose gender characteristics are not obvious, gender recognition is hard to achieve from a single viewing angle. Existing gender recognition methods have low accuracy and can hardly meet practical application requirements.
Summary of the invention
In order to solve at least one of the above technical defects, this application provides a gender recognition method based on a deep residual network, and a corresponding apparatus, computer-readable storage medium, and computer device.
According to one aspect, embodiments of this application provide a gender recognition method based on a deep residual network, including the following steps:
obtaining a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
inputting the preset number of video frames respectively into a pre-trained gender recognition model to obtain gender prediction values respectively corresponding to the target object in the preset number of video frames, where the gender recognition model is pre-trained based on a deep residual network;
performing a weighted operation on the gender prediction values to obtain a weighted gender prediction value of the target object;
obtaining a gender recognition result of the target object according to the weighted gender prediction value.
According to another aspect, embodiments of this application provide a gender recognition apparatus based on a deep residual network, including:
a video frame acquisition module, configured to obtain a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
a prediction value acquisition module, configured to input the preset number of video frames respectively into a pre-trained gender recognition model to obtain gender prediction values respectively corresponding to the target object in the preset number of video frames, where the gender recognition model is pre-trained based on a deep residual network;
a weighted operation module, configured to perform a weighted operation on the gender prediction values to obtain a weighted gender prediction value of the target object;
a gender recognition result generation module, configured to obtain a gender recognition result of the target object according to the weighted gender prediction value.
According to yet another aspect, embodiments of this application provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above gender recognition method based on the deep residual network is implemented.
According to still another aspect, embodiments of this application provide a computer device including one or more processors, a memory, and one or more computer programs, where the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to execute the above gender recognition method based on the deep residual network.
Compared with the prior art, this application has the following beneficial effects:
The gender recognition method, apparatus, computer-readable storage medium, and computer device based on the deep residual network provided by this application obtain multiple video frames from the video stream captured while the target object is walking and input them into a gender recognition model pre-trained based on the deep residual network to recognize the gender of the target object. Real-time gender recognition of pedestrians can thus be achieved without relying on face recognition, with high recognition efficiency and accuracy, meeting the practical requirements of real-time pedestrian gender recognition.
Additional aspects and advantages of this application will be partly given in the following description, and will become apparent from the description or be learned through practice of this application.
Description of the drawings
The above and/or additional aspects and advantages of this application will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of a gender recognition method based on a deep residual network provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of a gender recognition apparatus based on a deep residual network provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of a computer device provided by an embodiment of this application.
Detailed description
The embodiments of this application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements or elements with the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary and are only used to explain this application; they should not be construed as limiting it.
Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include plural forms. It should be further understood that the word "comprising" used in the specification of this application refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
An embodiment of this application provides a gender recognition method based on a deep residual network. As shown in FIG. 1, the method includes:
Step S110: Obtain a preset number of video frames of the target object from the video stream based on a pedestrian tracking algorithm.
For this embodiment, the target object is a person whose gender is to be recognized.
In actual application scenarios, the target object is first tracked based on the pedestrian tracking algorithm within a preset time period, and the video stream captured while the target object walks during that period is recorded by a video surveillance tool; a preset number of video frames of the target object are then extracted from the video stream. The preset number of video frames may be obtained by extracting key frames from the video stream at a preset period, which can be any duration such as 50 ms, 80 ms, or 1 s.
For this embodiment, the acquired preset number of video frames is used as the input data of the pre-trained gender recognition model.
The preset number can be any value such as 5, 9, or 15; those skilled in the art can determine its specific value according to actual application requirements, which is not limited in this embodiment.
Step S120: Input the preset number of video frames respectively into the pre-trained gender recognition model to obtain gender prediction values respectively corresponding to the target object in the preset number of video frames, where the gender recognition model is pre-trained based on a deep residual network.
For this embodiment, the gender recognition model is used to extract the gender features of the target object and calculate the gender prediction value.
For this embodiment, the acquired preset number of video frames are input into the pre-trained gender recognition model one after another, and the gender prediction values of the target object corresponding to the respective video frames are obtained in turn. The gender recognition model estimates the gender prediction value of the target object as follows: a gender feature vector of the target object is extracted from the video frame supplied as input data, the probabilities that the target object is male and female are then estimated from this feature vector, and gender classification of the target object is performed according to these probabilities.
The deep residual network (ResNet) uses the residual block as its basic network structure. This structure alleviates the performance degradation that occurs as networks become deeper, and provides strong technical support for improving both the accuracy and the computational efficiency of gender prediction.
Step S130: Perform a weighted operation on the gender prediction values to obtain the weighted gender prediction value of the target object.
For this embodiment, the gender prediction values corresponding to the individual video frames are weighted according to a preset weighting scheme and combined to calculate the weighted gender prediction value of the target object. This yields a more accurate gender prediction than recognition from a single static image, and therefore a more accurate gender recognition result.
Step S140: Obtain the gender recognition result of the target object according to the weighted gender prediction value.
For this embodiment, it is determined whether the weighted gender prediction value is greater than a preset threshold. If so, the gender of the target object is determined to be male and a male gender recognition result is obtained; if the weighted gender prediction value is less than or equal to the preset threshold, the gender of the target object is determined to be female and a female gender recognition result is obtained.
The preset threshold may be 0.5: when the weighted gender prediction value is greater than 0.5 the target object is determined to be male, and when it is less than or equal to 0.5 the target object is determined to be female.
The gender recognition method based on the deep residual network provided by this application obtains multiple video frames from the video stream captured while the target object is walking and inputs them into a gender recognition model pre-trained based on the deep residual network to recognize the gender of the target object. Real-time gender recognition of pedestrians can thus be achieved without relying on face recognition, with high efficiency and accuracy, meeting the practical requirements of real-time pedestrian gender recognition.
In an embodiment, obtaining a preset number of video frames of the target object from the video stream based on a pedestrian tracking algorithm includes:
obtaining the preset number of video frames of the target object from the video stream based on the KCF target tracking algorithm. The KCF target tracking algorithm is fast and robust, which can further improve the efficiency and accuracy of obtaining the preset number of video frames of the target object and meet real-time requirements.
In an embodiment, performing the weighted operation on the gender prediction values to obtain the weighted gender prediction value of the target object includes:
obtaining a weight ratio corresponding to the preset number of video frames, where the weight ratio is generated from the weights of the preset number of video frames, and the weight of each frame is set according to the order of its timestamp in the video stream;
performing the weighted operation on the gender prediction values according to the weight ratio to obtain the weighted gender prediction value of the target object.
For this embodiment, a weight used in the weighted calculation is preset for each of the preset number of video frames, yielding the weight ratio of the preset number of video frames. The weights of the individual video frames may be the same or different.
For this embodiment, the weight of each video frame is set according to the order of its timestamp in the video stream; that is, the preset weight of each frame is associated with the order of its timestamp. In practical scenarios, the frames acquired when tracking of the target object has just started may not yet capture a relatively complete view of the target, which can reduce the accuracy of the weighted gender prediction value. As a preferred example, frames with later timestamps are therefore given larger weights, so that frames capturing a more complete view of the target object contribute more to the weighted gender prediction value, improving the accuracy of real-time pedestrian gender recognition.
For this embodiment, according to the weight ratio, the gender prediction value of each video frame is multiplied by its corresponding weight and a weighted average is computed; this weighted average is used as the weighted gender prediction value of the target object.
In this embodiment, calculating the weighted gender prediction value of the target object through a weighted operation on the gender prediction values can further improve the accuracy of real-time pedestrian gender recognition.
In an embodiment, the gender recognition model is pre-trained through the following steps:
obtaining training samples containing pedestrian human body images and corresponding gender information;
training a deep residual network on the training samples to obtain the gender recognition model.
For this embodiment, the training samples for training the deep residual network into a gender recognition model are obtained from a preset pedestrian image database. The database stores a large number of pedestrian human body images, each showing a person in a walking state and pre-labeled with the corresponding gender.
For example, one hundred thousand pre-collected pedestrian human body images of males and females are obtained from the preset pedestrian database and used as input data for the deep residual network.
For this embodiment, a standard deep residual network is trained on the pedestrian human body images and their gender labels in the training samples to obtain a network structure and weights suited to the gender recognition task of this scheme; the trained network is the gender recognition model.
In an embodiment, after the gender recognition result of the target object is obtained according to the weighted gender prediction value, the method further includes:
saving the preset number of video frames and the gender recognition result of the target object.
For this embodiment, after the gender result of the target object is obtained, some or all of the preset number of video frames of the target object, together with the corresponding gender recognition result, are saved in a gender recognition result database for fast matching and feedback of gender recognition results in subsequent repeated-recognition scenarios. The video frames and corresponding gender recognition results stored in the gender recognition result database can be cleaned up periodically according to a preset policy.
In an embodiment, before the preset number of video frames are input into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames, the method further includes:
judging whether a pedestrian human body image matching the preset number of video frames exists in a preset database;
if so, obtaining the gender information corresponding to the pedestrian human body image prestored in the preset database, and generating the gender recognition result of the target object according to the gender information;
if not, continuing with the step of inputting the preset number of video frames into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
In actual application scenarios, after a pedestrian leaves the shooting range of the video surveillance tool, they may re-enter it within a period of time. To reduce the workload of real-time pedestrian gender recognition, a fast match against existing gender recognition results can therefore be performed before gender recognition of the target object.
For this embodiment, the preset database is the gender recognition result database that stores video frames of historical target objects and the corresponding gender recognition results, where a video frame of a historical target object is a pedestrian human body image containing that historical target object, i.e. a human body image of a person in a walking state. One or more of the acquired preset number of video frames of the target object are matched against the video frames in the gender recognition result database to determine whether a matching pedestrian human body image exists. If a matching pedestrian human body image exists in the gender recognition result database, the gender information of the corresponding historical target object is determined from the prestored gender recognition result and used as the gender recognition result of the target object; if no matching pedestrian human body image exists, real-time gender recognition of the target object is performed.
In this embodiment, by performing a fast match against existing gender recognition results before gender recognition of the target object, the gender recognition system does not need to repeat gender recognition for a target object that re-enters the video shooting range within a preset time period, which significantly reduces the gender recognition workload in actual application scenarios and improves the efficiency of real-time pedestrian gender recognition.
In an embodiment, inputting the preset number of video frames respectively into the pre-trained gender recognition model to obtain the gender prediction values respectively corresponding to the target object in the preset number of video frames includes:
determining the human body region of the target object in the preset number of video frames;
obtaining, according to the human body region, a preset number of pedestrian human body images corresponding to the preset number of video frames;
inputting the preset number of pedestrian human body images respectively into the pre-trained gender recognition model to obtain the gender prediction values respectively corresponding to the target object in the preset number of video frames.
In actual application scenarios, the video surveillance tool records the video stream while the target object walks, so the preset number of video frames extracted from the stream may contain image information within the shooting range other than the target object, which would interfere with the gender recognition result of the target object. It is therefore necessary to preprocess the preset number of video frames and use the preprocessed frames as the input data of the gender recognition model.
Specifically, the preprocessing includes: determining the human body region of the target object in the preset number of video frames and cropping the image of that region from each frame to obtain the preset number of pedestrian human body images; operations such as normalization, noise reduction, and illumination compensation may also be applied to the pedestrian human body images. The preprocessed pedestrian human body images are used as the input data of the gender recognition model and are input respectively into the pre-trained model to obtain the gender prediction values corresponding to the target object in the preset number of video frames. Preprocessing the input data of the gender recognition model in this way helps ensure its gender recognition accuracy.
In addition, an embodiment of the present application provides a gender recognition apparatus based on a deep residual network. As shown in FIG. 2, the apparatus includes a video frame acquisition module 21, a prediction value acquisition module 22, a weighted calculation module 23, and a gender recognition result generation module 24, wherein:
the video frame acquisition module 21 is configured to acquire a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
the prediction value acquisition module 22 is configured to respectively input the preset number of video frames into a pre-trained gender recognition model to obtain gender prediction values corresponding to the target object in the preset number of video frames, wherein the gender recognition model is pre-trained based on a deep residual network;
the weighted calculation module 23 is configured to perform a weighted calculation on the gender prediction values to obtain a weighted gender prediction value of the target object;
the gender recognition result generation module 24 is configured to obtain the gender recognition result of the target object according to the weighted gender prediction value.
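A compact sketch of how the four modules could cooperate is given below. The tracker and model objects, their track/predict methods, the linearly increasing weights, and the 0.5 decision threshold are all illustrative assumptions made for the sketch and are not fixed by the apparatus definition.

    class GenderRecognitionApparatus:
        def __init__(self, tracker, model, preset_count=5):
            self.tracker = tracker            # backs the video frame acquisition module 21
            self.model = model                # backs the prediction value acquisition module 22
            self.preset_count = preset_count

        def recognize(self, video_stream):
            frames = self.tracker.track(video_stream, self.preset_count)   # module 21
            scores = [self.model.predict(frame) for frame in frames]       # module 22
            weights = list(range(1, len(scores) + 1))                      # module 23: weights follow frame order
            weighted = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
            return "male" if weighted >= 0.5 else "female"                 # module 24 (threshold is assumed)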
In one embodiment, the video frame acquisition module 21 is specifically configured to:
acquire the preset number of video frames of the target object from the video stream based on the KCF target tracking algorithm.
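A sketch of acquiring the preset number of frames with OpenCV's KCF tracker follows. The initial bounding box is assumed to come from a pedestrian detector, and the sampling interval is an illustrative choice; depending on the OpenCV build, the tracker factory may be cv2.TrackerKCF_create or cv2.legacy.TrackerKCF_create.

    import cv2

    def collect_frames_kcf(video_path, init_box, preset_count=5, interval=10):
        # init_box: (x, y, w, h) of the target pedestrian in the first frame (assumed given).
        tracker = cv2.TrackerKCF_create()          # cv2.legacy.TrackerKCF_create() on some builds
        cap = cv2.VideoCapture(video_path)
        ok, frame = cap.read()
        if not ok:
            return []
        tracker.init(frame, init_box)
        kept, index = [], 0
        while len(kept) < preset_count:
            ok, frame = cap.read()
            if not ok:
                break
            located, box = tracker.update(frame)   # follow the same target across frames
            index += 1
            if located and index % interval == 0:  # keep one frame every `interval` frames
                kept.append((frame, tuple(int(v) for v in box)))
        cap.release()
        return kept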
In one embodiment, the weighted calculation module 23 is specifically configured to:
acquire a weight ratio corresponding to the preset number of video frames, wherein the weight ratio is generated from the weights of the preset number of video frames, and the weight of each video frame is set according to the chronological order of the video frame's timestamp in the video stream;
perform a weighted calculation on the gender prediction values according to the weight ratio to obtain the weighted gender prediction value of the target object.
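One plausible realization of the weight ratio and the weighted calculation is sketched below. The description only requires that the weights follow the timestamp order of the frames, so the choice of linearly increasing weights (later frames count more) is an assumption.

    def weighted_gender_value(scores_by_timestamp):
        # scores_by_timestamp: per-frame gender prediction values, ordered by the frames'
        # timestamps in the video stream (earliest first).
        weights = list(range(1, len(scores_by_timestamp) + 1))   # weight grows with recency
        total = sum(weights)
        ratios = [w / total for w in weights]                    # the weight ratio
        return sum(r * s for r, s in zip(ratios, scores_by_timestamp))

    # Example with five frames, later frames weighted more heavily:
    # weighted_gender_value([0.62, 0.71, 0.68, 0.80, 0.77]) ≈ 0.742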
In one embodiment, the gender recognition model is pre-trained through the following steps:
acquiring training samples containing pedestrian body images and corresponding gender information;
training a deep residual network based on the training samples to obtain the gender recognition model.
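A minimal PyTorch sketch of this training procedure follows, using a ResNet-18 backbone as one possible deep residual network with a two-class head for gender. The network depth, loss, and optimizer are assumptions made for illustration and are not fixed by the description (on torchvision versions older than 0.13, pretrained=False replaces weights=None).

    import torch
    import torch.nn as nn
    from torchvision import models

    def build_gender_model():
        net = models.resnet18(weights=None)              # a standard deep residual network
        net.fc = nn.Linear(net.fc.in_features, 2)        # two outputs: male / female
        return net

    def train(model, loader, epochs=10, lr=1e-3, device="cpu"):
        # loader yields (pedestrian body image batch, gender label batch) training samples.
        model = model.to(device)
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        model.train()
        for _ in range(epochs):
            for images, labels in loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()
        return model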
In one embodiment, after the gender recognition result of the target object is obtained according to the weighted gender prediction value, the method further includes:
saving the preset number of video frames and the gender recognition result of the target object.
In one embodiment, before the preset number of video frames are respectively input into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames, the method further includes:
judging whether a pedestrian body image matching the preset number of video frames exists in a preset database;
if so, acquiring gender information corresponding to the pedestrian body image pre-stored in the preset database, and generating the gender recognition result of the target object according to the gender information;
if not, continuing to perform the step of respectively inputting the preset number of video frames into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
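A sketch of this pre-recognition lookup follows. It assumes the preset database is an iterable of (stored pedestrian image, gender) records and that a similarity function is available that returns a score in [0, 1]; both are placeholders for whatever matching mechanism the system actually employs, and the 0.9 threshold is illustrative.

    def recognize_with_database(frames, database, match, run_model, threshold=0.9):
        # database: iterable of (stored_image, gender) records saved for recent targets.
        for stored_image, gender in database:
            if any(match(frame, stored_image) >= threshold for frame in frames):
                return gender            # reuse the pre-stored gender recognition result
        return run_model(frames)         # no match: fall back to the gender recognition model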
In one embodiment, the prediction value acquisition module 22 is specifically configured to:
determine the human body region of the target object in the preset number of video frames;
acquire, according to the human body region, a preset number of pedestrian body images corresponding to the preset number of video frames;
respectively input the preset number of pedestrian body images into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
The gender recognition apparatus based on a deep residual network provided in the present application acquires multiple video frames from the video stream of the target object walking and inputs them into a gender recognition model pre-trained based on a deep residual network to recognize the gender of the target object. Real-time gender recognition of pedestrians can thus be achieved without relying on face recognition, with high recognition efficiency and accuracy, meeting the practical application requirements of real-time pedestrian gender recognition.
The gender recognition apparatus based on a deep residual network provided in the embodiments of the present application can implement the method embodiments provided above; for the specific function implementation, refer to the descriptions in the method embodiments, which are not repeated here.
In addition, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the gender recognition method based on a deep residual network described in the above embodiments is implemented. The computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, the storage medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer or a mobile phone), such as a read-only memory, a magnetic disk, or an optical disk.
The computer-readable storage medium provided in the present application acquires multiple video frames from the video stream of the target object walking and inputs them into a gender recognition model pre-trained based on a deep residual network to recognize the gender of the target object, so that real-time gender recognition of pedestrians can be achieved without relying on face recognition, with high recognition efficiency and accuracy, meeting the practical application requirements of real-time pedestrian gender recognition.
The computer-readable storage medium provided in the embodiments of the present application can implement the method embodiments provided above; for the specific function implementation, refer to the descriptions in the method embodiments, which are not repeated here.
In addition, an embodiment of the present application further provides a computer device, as shown in FIG. 3. The computer device described in this embodiment may be a server, a personal computer, a network device, or the like. The computer device includes a processor 302, a memory 303, an input unit 304, a display unit 305, and other components. Those skilled in the art can understand that the structure shown in FIG. 3 does not limit all devices, which may include more or fewer components than shown, or combine certain components. The memory 303 may be used to store a computer program 301 and the functional modules, and the processor 302 runs the computer program 301 stored in the memory 303 to perform the various functional applications and data processing of the device. The memory may be an internal memory or an external memory, or include both. The internal memory may include a read-only memory (ROM), a programmable ROM (PROM), an electrically programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a flash memory, or a random access memory. The external memory may include a hard disk, a floppy disk, a ZIP disk, a USB flash drive, a magnetic tape, or the like. The memory disclosed in the present application includes, but is not limited to, these types of memory, which are given only as examples and not as limitations.
The input unit 304 is configured to receive signal input and keywords entered by a user. The input unit 304 may include a touch panel and other input devices. The touch panel can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection apparatus according to a preset program; the other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control keys and switch keys), a trackball, a mouse, and a joystick. The display unit 305 may be used to display information entered by the user, information provided to the user, and the various menus of the computer device, and may take the form of a liquid crystal display, an organic light-emitting diode display, or the like. The processor 302 is the control center of the computer device; it connects the various parts of the whole computer through various interfaces and lines, and performs various functions and processes data by running or executing the software programs and/or modules stored in the memory 303 and calling the data stored in the memory.
As an embodiment, the computer device includes one or more processors 302, a memory 303, and one or more computer programs 301, wherein the one or more computer programs 301 are stored in the memory 303 and configured to be executed by the one or more processors 302, and the one or more computer programs 301 are configured to perform the gender recognition method based on a deep residual network described in any of the above embodiments.
The computer device provided in the present application acquires multiple video frames from the video stream of the target object walking and inputs them into a gender recognition model pre-trained based on a deep residual network to recognize the gender of the target object, so that real-time gender recognition of pedestrians can be achieved without relying on face recognition, with high recognition efficiency and accuracy, meeting the practical application requirements of real-time pedestrian gender recognition.
The computer device provided in the embodiments of the present application can implement the method embodiments provided above; for the specific function implementation, refer to the descriptions in the method embodiments, which are not repeated here.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The above descriptions are only some of the implementations of the present application. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principles of the present application, and these improvements and modifications shall also be regarded as falling within the protection scope of the present application.

Claims (20)

  1. A gender recognition method based on a deep residual network, characterized in that it comprises the following steps:
    acquiring a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
    respectively inputting the preset number of video frames into a pre-trained gender recognition model to obtain gender prediction values corresponding to the target object in the preset number of video frames, wherein the gender recognition model is pre-trained based on a deep residual network;
    performing a weighted calculation on the gender prediction values to obtain a weighted gender prediction value of the target object;
    obtaining a gender recognition result of the target object according to the weighted gender prediction value.
  2. The gender recognition method according to claim 1, wherein the acquiring a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm comprises:
    acquiring the preset number of video frames of the target object from the video stream based on a KCF target tracking algorithm.
  3. The gender recognition method according to claim 1, wherein the performing a weighted calculation on the gender prediction values to obtain a weighted gender prediction value of the target object comprises:
    acquiring a weight ratio corresponding to the preset number of video frames, wherein the weight ratio is generated from the weights of the preset number of video frames, and the weight of each video frame is set according to the chronological order of the video frame's timestamp in the video stream;
    performing a weighted calculation on the gender prediction values according to the weight ratio to obtain the weighted gender prediction value of the target object.
  4. The gender recognition method according to claim 1, wherein the gender recognition model is pre-trained through the following steps:
    acquiring training samples containing pedestrian body images and corresponding gender information;
    training a deep residual network based on the training samples to obtain the gender recognition model.
  5. The gender recognition method according to claim 1, wherein after the obtaining a gender recognition result of the target object according to the weighted gender prediction value, the method further comprises:
    saving the preset number of video frames and the gender recognition result of the target object.
  6. The gender recognition method according to claim 1, wherein before the respectively inputting the preset number of video frames into a pre-trained gender recognition model to obtain gender prediction values corresponding to the target object in the preset number of video frames, the method further comprises:
    judging whether a pedestrian body image matching the preset number of video frames exists in a preset database;
    if so, acquiring gender information corresponding to the pedestrian body image pre-stored in the preset database, and generating the gender recognition result of the target object according to the gender information;
    if not, continuing to perform the step of respectively inputting the preset number of video frames into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
  7. The gender recognition method according to claim 1, wherein the respectively inputting the preset number of video frames into a pre-trained gender recognition model to obtain gender prediction values corresponding to the target object in the preset number of video frames comprises:
    determining a human body region of the target object in the preset number of video frames;
    acquiring, according to the human body region, a preset number of pedestrian body images corresponding to the preset number of video frames;
    respectively inputting the preset number of pedestrian body images into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
  8. A gender recognition apparatus based on a deep residual network, characterized in that it comprises:
    a video frame acquisition module, configured to acquire a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
    a prediction value acquisition module, configured to respectively input the preset number of video frames into a pre-trained gender recognition model to obtain gender prediction values corresponding to the target object in the preset number of video frames, wherein the gender recognition model is pre-trained based on a deep residual network;
    a weighted calculation module, configured to perform a weighted calculation on the gender prediction values to obtain a weighted gender prediction value of the target object;
    a gender recognition result generation module, configured to obtain a gender recognition result of the target object according to the weighted gender prediction value.
  9. The apparatus according to claim 8, wherein the video frame acquisition module is specifically configured to:
    acquire the preset number of video frames of the target object from the video stream based on a KCF target tracking algorithm.
  10. The apparatus according to claim 8, wherein the weighted calculation module is specifically configured to:
    acquire a weight ratio corresponding to the preset number of video frames, wherein the weight ratio is generated from the weights of the preset number of video frames, and the weight of each video frame is set according to the chronological order of the video frame's timestamp in the video stream;
    perform a weighted calculation on the gender prediction values according to the weight ratio to obtain the weighted gender prediction value of the target object.
  11. The apparatus according to claim 8, wherein the gender recognition model is pre-trained through the following steps:
    acquiring training samples containing pedestrian body images and corresponding gender information;
    training a deep residual network based on the training samples to obtain the gender recognition model.
  12. The apparatus according to claim 8, wherein:
    the gender recognition result generation module is further configured to save the preset number of video frames and the gender recognition result of the target object after the gender recognition result of the target object is obtained according to the weighted gender prediction value.
  13. The apparatus according to claim 8, wherein:
    the prediction value acquisition module is further configured to: before the preset number of video frames are respectively input into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames, judge whether a pedestrian body image matching the preset number of video frames exists in a preset database; if so, acquire gender information corresponding to the pedestrian body image pre-stored in the preset database, and generate the gender recognition result of the target object according to the gender information; if not, continue to respectively input the preset number of video frames into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
  14. The apparatus according to claim 8, wherein the prediction value acquisition module is specifically configured to:
    determine a human body region of the target object in the preset number of video frames;
    acquire, according to the human body region, a preset number of pedestrian body images corresponding to the preset number of video frames;
    respectively input the preset number of pedestrian body images into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
  15. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the gender recognition method based on a deep residual network according to any one of claims 1 to 7 is implemented.
  16. A computer device, characterized in that it comprises:
    one or more processors;
    a memory;
    one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to perform:
    acquiring a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
    respectively inputting the preset number of video frames into a pre-trained gender recognition model to obtain gender prediction values corresponding to the target object in the preset number of video frames, wherein the gender recognition model is pre-trained based on a deep residual network;
    performing a weighted calculation on the gender prediction values to obtain a weighted gender prediction value of the target object;
    obtaining a gender recognition result of the target object according to the weighted gender prediction value.
  17. The computer device according to claim 16, wherein when the preset number of video frames of the target object are acquired from the video stream based on the pedestrian tracking algorithm, the one or more computer programs are configured to perform:
    acquiring the preset number of video frames of the target object from the video stream based on a KCF target tracking algorithm.
  18. The computer device according to claim 16, wherein when the weighted calculation is performed on the gender prediction values to obtain the weighted gender prediction value of the target object, the one or more computer programs are configured to perform:
    acquiring a weight ratio corresponding to the preset number of video frames, wherein the weight ratio is generated from the weights of the preset number of video frames, and the weight of each video frame is set according to the chronological order of the video frame's timestamp in the video stream;
    performing a weighted calculation on the gender prediction values according to the weight ratio to obtain the weighted gender prediction value of the target object.
  19. The computer device according to claim 16, wherein before the preset number of video frames are respectively input into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames, the one or more computer programs are further configured to perform:
    judging whether a pedestrian body image matching the preset number of video frames exists in a preset database;
    if so, acquiring gender information corresponding to the pedestrian body image pre-stored in the preset database, and generating the gender recognition result of the target object according to the gender information;
    if not, continuing to perform the step of respectively inputting the preset number of video frames into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
  20. The computer device according to claim 16, wherein when the preset number of video frames are respectively input into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames, the one or more computer programs are configured to perform:
    determining a human body region of the target object in the preset number of video frames;
    acquiring, according to the human body region, a preset number of pedestrian body images corresponding to the preset number of video frames;
    respectively inputting the preset number of pedestrian body images into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
PCT/CN2019/116236 2019-01-25 2019-11-07 Deep residual network-based gender recognition method and apparatus, medium, and device WO2020151300A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910074634.4A CN109829415A (en) 2019-01-25 2019-01-25 Gender identification method, device, medium and equipment based on depth residual error network
CN201910074634.4 2019-01-25

Publications (1)

Publication Number Publication Date
WO2020151300A1 true WO2020151300A1 (en) 2020-07-30

Family

ID=66862501

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116236 WO2020151300A1 (en) 2019-01-25 2019-11-07 Deep residual network-based gender recognition method and apparatus, medium, and device

Country Status (2)

Country Link
CN (1) CN109829415A (en)
WO (1) WO2020151300A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469144A (en) * 2021-08-31 2021-10-01 北京文安智能技术股份有限公司 Video-based pedestrian gender and age identification method and model

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829415A (en) * 2019-01-25 2019-05-31 平安科技(深圳)有限公司 Gender identification method, device, medium and equipment based on depth residual error network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3023911A1 (en) * 2014-11-24 2016-05-25 Samsung Electronics Co., Ltd. Method and apparatus for recognizing object, and method and apparatus for training recognizer
CN106203306A (en) * 2016-06-30 2016-12-07 北京小米移动软件有限公司 The Forecasting Methodology at age, device and terminal
CN106529442A (en) * 2016-10-26 2017-03-22 清华大学 Pedestrian identification method and apparatus
CN107633223A (en) * 2017-09-15 2018-01-26 深圳市唯特视科技有限公司 A kind of video human attribute recognition approach based on deep layer confrontation network
CN107844784A (en) * 2017-12-08 2018-03-27 广东美的智能机器人有限公司 Face identification method, device, computer equipment and readable storage medium storing program for executing
CN108510000A (en) * 2018-03-30 2018-09-07 北京工商大学 The detection and recognition methods of pedestrian's fine granularity attribute under complex scene
CN109829415A (en) * 2019-01-25 2019-05-31 平安科技(深圳)有限公司 Gender identification method, device, medium and equipment based on depth residual error network


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469144A (en) * 2021-08-31 2021-10-01 北京文安智能技术股份有限公司 Video-based pedestrian gender and age identification method and model
CN113469144B (en) * 2021-08-31 2021-11-09 北京文安智能技术股份有限公司 Video-based pedestrian gender and age identification method and model

Also Published As

Publication number Publication date
CN109829415A (en) 2019-05-31

Similar Documents

Publication Publication Date Title
US11354901B2 (en) Activity recognition method and system
Jegham et al. Vision-based human action recognition: An overview and real world challenges
WO2021114892A1 (en) Environmental semantic understanding-based body movement recognition method, apparatus, device, and storage medium
CN105160318B (en) Lie detecting method based on facial expression and system
CN103870828B (en) Image similarity judges system and method
WO2016107482A1 (en) Method and device for determining identity identifier of human face in human face image, and terminal
Ji et al. Learning contrastive feature distribution model for interaction recognition
Lin et al. On the detection-to-track association for online multi-object tracking
CN112380512B (en) Convolutional neural network dynamic gesture authentication method and device, storage medium and equipment
WO2018103416A1 (en) Method and device for detecting facial image
WO2020151300A1 (en) Deep residual network-based gender recognition method and apparatus, medium, and device
CN110458235B (en) Motion posture similarity comparison method in video
Ponce-López et al. Multi-modal social signal analysis for predicting agreement in conversation settings
US20220027606A1 (en) Human behavior recognition method, device, and storage medium
Borghi et al. Fast gesture recognition with multiple stream discrete HMMs on 3D skeletons
WO2018068654A1 (en) Scenario model dynamic estimation method, data analysis method and apparatus, and electronic device
CN111783619A (en) Human body attribute identification method, device, equipment and storage medium
Zuo et al. Face liveness detection algorithm based on livenesslight network
US11138417B2 (en) Automatic gender recognition utilizing gait energy image (GEI) images
CN113011399A (en) Video abnormal event detection method and system based on generation cooperative judgment network
Radwan et al. Regression based pose estimation with automatic occlusion detection and rectification
Zhu et al. Multi-target tracking via hierarchical association learning
Yan et al. Foreground Extraction and Motion Recognition Technology for Intelligent Video Surveillance
KR100711223B1 (en) Face recognition method using Zernike/LDA and recording medium storing the method
Ren et al. Human fall detection model with lightweight network and tracking in video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19912077

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19912077

Country of ref document: EP

Kind code of ref document: A1