WO2020151300A1 - Deep residual network-based gender recognition method and apparatus, medium, and device - Google Patents

Deep residual network-based gender recognition method and apparatus, medium, and device

Info

Publication number
WO2020151300A1
Authority
WO
WIPO (PCT)
Prior art keywords
gender
preset number
video frames
target object
weighted
Prior art date
Application number
PCT/CN2019/116236
Other languages
French (fr)
Chinese (zh)
Inventor
马潜
李洪燕
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020151300A1 publication Critical patent/WO2020151300A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition

Definitions

  • This application relates to the field of intelligent recognition technology. Specifically, this application relates to a gender recognition method, device, computer-readable storage medium, and computer equipment based on a deep residual network.
  • To solve at least one of the above technical defects, this application provides a gender recognition method based on a deep residual network, together with a corresponding apparatus, computer-readable storage medium, and computer device.
  • According to one aspect, embodiments of the present application provide a gender recognition method based on a deep residual network, including the following steps:
  • obtaining a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
  • inputting the preset number of video frames respectively into a pre-trained gender recognition model to obtain gender prediction values respectively corresponding to the target object in the preset number of video frames, where the gender recognition model is pre-trained based on a deep residual network;
  • performing a weighted operation on the gender prediction values to obtain a weighted gender prediction value of the target object;
  • obtaining a gender recognition result of the target object according to the weighted gender prediction value.
  • According to another aspect, embodiments of the present application provide a gender recognition apparatus based on a deep residual network, including:
  • a video frame acquisition module, configured to obtain a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
  • a prediction value acquisition module, configured to input the preset number of video frames respectively into a pre-trained gender recognition model to obtain gender prediction values respectively corresponding to the target object in the preset number of video frames, where the gender recognition model is pre-trained based on a deep residual network;
  • a weighted operation module, configured to perform a weighted operation on the gender prediction values to obtain a weighted gender prediction value of the target object;
  • a gender recognition result generation module, configured to obtain a gender recognition result of the target object according to the weighted gender prediction value.
  • According to yet another aspect, embodiments of the present application provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above gender recognition method based on the deep residual network is implemented.
  • According to still another aspect, embodiments of the present application provide a computer device.
  • The computer device includes one or more processors, a memory, and one or more computer programs, where the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to execute the aforementioned gender recognition method based on the deep residual network.
  • The gender recognition method, apparatus, computer-readable storage medium, and computer device based on the deep residual network obtain multiple video frames from the video stream captured while the target object is walking and input them into a gender recognition model pre-trained based on the deep residual network to recognize the gender of the target object.
  • Real-time gender recognition of pedestrians can therefore be achieved without relying on face recognition.
  • Gender recognition efficiency and accuracy are high, meeting the practical requirements of real-time pedestrian gender recognition.
  • FIG. 1 is a method flowchart of a gender recognition method based on a deep residual network provided by an embodiment of this application;
  • FIG. 2 is a schematic structural diagram of a gender recognition device based on a deep residual network provided by an embodiment of this application;
  • Fig. 3 is a schematic structural diagram of a computer device provided by an embodiment of the application.
  • An embodiment of the application provides a gender recognition method based on a deep residual network. As shown in FIG. 1, the method includes:
  • Step S110: Obtain a preset number of video frames of the target object from the video stream based on a pedestrian tracking algorithm.
  • For this embodiment, the target object is a person whose gender is to be recognized.
  • In actual application scenarios, the target object is first tracked based on the pedestrian tracking algorithm within a preset time period, and the video stream captured while the target object walks during that period is recorded by a video surveillance tool; the preset number of video frames of the target object is then extracted from the video stream. The frames may be obtained by extracting key frames from the video stream at a preset period, which can be any duration such as 50 ms, 80 ms, or 1 s.
  • For this embodiment, the acquired preset number of video frames is used as the input data of the pre-trained gender recognition model.
  • The preset number can be any value such as 5, 9, or 15; those skilled in the art can determine its specific value according to actual application requirements, which is not limited in this embodiment.
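  • A minimal sketch of this key-frame extraction step is given below, assuming OpenCV and illustrative values of an 80 ms period and 9 frames (the application itself only requires "a preset period" and "a preset number"):

```python
# Hedged sketch: grab one key frame per preset period until the preset number is reached.
import cv2

def extract_key_frames(video_path, period_ms=80, preset_number=9):
    capture = cv2.VideoCapture(video_path)
    frames = []
    next_grab_ms = 0.0
    while len(frames) < preset_number:
        ok, frame = capture.read()
        if not ok:                                   # end of stream
            break
        timestamp_ms = capture.get(cv2.CAP_PROP_POS_MSEC)
        if timestamp_ms >= next_grab_ms:             # one frame per preset period
            frames.append((timestamp_ms, frame))
            next_grab_ms = timestamp_ms + period_ms
    capture.release()
    return frames                                    # (timestamp, image) pairs
```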
  • Step S120: Input the preset number of video frames respectively into the pre-trained gender recognition model to obtain gender prediction values respectively corresponding to the target object in the preset number of video frames; the gender recognition model is pre-trained based on a deep residual network.
  • For this embodiment, the gender recognition model is used to extract the gender features of the target object and calculate the gender prediction value.
  • For this embodiment, the acquired preset number of video frames are input into the pre-trained gender recognition model one after another, and the gender prediction values of the target object corresponding to the respective video frames are obtained in turn.
  • The gender recognition model estimates the gender prediction value of the target object as follows: a gender feature vector of the target object is extracted from the video frame supplied as input data, the probabilities that the target object is male and female are then estimated from this feature vector, and gender classification of the target object is performed according to these probabilities.
  • The deep residual network (ResNet) uses the residual block as its basic network structure.
  • This structure alleviates the performance degradation that occurs as networks become deeper, and provides strong technical support for improving both the accuracy and the computational efficiency of gender prediction.
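  • A minimal sketch of such a model is shown below, assuming PyTorch/torchvision and a ResNet-18 backbone (the application does not fix the depth or framework), with a single sigmoid output read as the gender prediction value:

```python
# Hedged sketch: a deep residual network with a one-logit head for gender prediction.
import torch
import torch.nn as nn
from torchvision import models

class GenderResNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = models.resnet18()                 # residual feature extractor
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 1)

    def forward(self, x):
        # x: batch of pedestrian body images, shape (N, 3, H, W)
        logit = self.backbone(x)
        return torch.sigmoid(logit).squeeze(1)            # gender prediction value in (0, 1)
```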
  • Step S130: Perform a weighted operation on the gender prediction values to obtain the weighted gender prediction value of the target object.
  • For this embodiment, the gender prediction values corresponding to the individual video frames are weighted according to a preset weighting scheme and combined to calculate the weighted gender prediction value of the target object.
  • Weighting and combining the per-frame gender prediction values yields a more accurate prediction than recognizing gender from a single static image, and therefore a more accurate gender recognition result.
  • Step S140: Obtain the gender recognition result of the target object according to the weighted gender prediction value.
  • For this embodiment, it is determined whether the weighted gender prediction value is greater than a preset threshold. If so, the gender of the target object is determined to be male and a male gender recognition result is obtained; if the weighted gender prediction value is less than or equal to the preset threshold, the gender of the target object is determined to be female and a female gender recognition result is obtained.
  • The preset threshold may be 0.5.
  • When the weighted gender prediction value is greater than 0.5, the gender of the target object is determined to be male; when it is less than or equal to 0.5, the gender of the target object is determined to be female.
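  • A minimal sketch of this decision step, using the 0.5 threshold mentioned above (the sample values are illustrative only):

```python
# Hedged sketch of step S140: threshold the weighted gender prediction value.
def decide_gender(weighted_value, threshold=0.5):
    return "male" if weighted_value > threshold else "female"

print(decide_gender(0.68))   # male
print(decide_gender(0.41))   # female
```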
  • The gender recognition method based on the deep residual network obtains multiple video frames from the video stream captured while the target object is walking and inputs them into the gender recognition model pre-trained based on the deep residual network to recognize the gender of the target object; real-time gender recognition of pedestrians can thus be achieved without relying on face recognition.
  • Gender recognition efficiency and accuracy are high, meeting the practical requirements of real-time pedestrian gender recognition.
  • In an embodiment, obtaining a preset number of video frames of the target object from the video stream based on a pedestrian tracking algorithm includes:
  • obtaining the preset number of video frames of the target object from the video stream based on the KCF target tracking algorithm.
  • The KCF target tracking algorithm is fast and robust, which can further improve the efficiency and accuracy of obtaining the preset number of video frames of the target object and meet real-time requirements.
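  • A minimal sketch of frame collection with a KCF tracker, assuming OpenCV's contrib implementation (cv2.TrackerKCF_create from opencv-contrib-python); the initial bounding box is assumed to come from a separate pedestrian detector that is not shown here:

```python
# Hedged sketch: follow one pedestrian with KCF and collect a preset number of frames.
import cv2

def track_pedestrian(video_path, initial_bbox, preset_number=9):
    capture = cv2.VideoCapture(video_path)
    ok, frame = capture.read()
    if not ok:
        return []
    tracker = cv2.TrackerKCF_create()
    tracker.init(frame, initial_bbox)                  # initial_bbox = (x, y, w, h)
    tracked = []
    while len(tracked) < preset_number:
        ok, frame = capture.read()
        if not ok:
            break
        found, bbox = tracker.update(frame)            # re-locate the same pedestrian
        if found:
            tracked.append((frame, tuple(int(v) for v in bbox)))
    capture.release()
    return tracked
```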
  • In an embodiment, performing the weighted operation on the gender prediction values to obtain the weighted gender prediction value of the target object includes:
  • presetting, for each video frame in the preset number of video frames, a weight used in the weighted calculation, to obtain the weight ratio of the preset number of video frames.
  • The weights used in the weighted calculation of the individual video frames may be the same or different.
  • For this embodiment, the weight of each video frame in the preset number of video frames is set according to the order of that frame's timestamp in the video stream; that is, the preset weight used in the weighted calculation of each frame is associated with the order of its timestamp.
  • In practice, the frames acquired when tracking of the target object has just started may not yet capture a relatively complete view of the target, which can reduce the accuracy of the weighted gender prediction value. As a preferred example, frames with later timestamps are therefore given larger weights, so that frames capturing a more complete view of the target object contribute more to the weighted gender prediction value, improving the accuracy of real-time pedestrian gender recognition.
  • For this embodiment, according to the weight ratio, the gender prediction value of each video frame in the preset number of video frames is multiplied by its corresponding weight and a weighted average is computed; this weighted average is used as the weighted gender prediction value of the target object.
  • In this embodiment, calculating the weighted gender prediction value of the target object through a weighted operation on the per-frame gender prediction values can further improve the accuracy of real-time pedestrian gender recognition.
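  • A minimal sketch of this weighting scheme, assuming a simple linear ramp over the timestamp order (the application only requires the weights to follow the timestamp order, not any particular shape):

```python
# Hedged sketch: later-timestamped frames get larger weights; combine by weighted average.
def timestamp_ordered_weights(timestamps):
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    raw = [0.0] * len(timestamps)
    for rank, idx in enumerate(order, start=1):
        raw[idx] = float(rank)                 # 1 for the earliest frame, N for the latest
    total = sum(raw)
    return [w / total for w in raw]            # weight ratio summing to 1

def weighted_gender_value(predictions, weights):
    return sum(p * w for p, w in zip(predictions, weights))

weights = timestamp_ordered_weights([0, 80, 160, 240, 320])           # e.g. one frame per 80 ms
value = weighted_gender_value([0.62, 0.55, 0.71, 0.68, 0.74], weights)
```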
  • In an embodiment, the gender recognition model is pre-trained through the following steps:
  • obtaining training samples containing pedestrian human body images and corresponding gender information;
  • training a deep residual network on the training samples to obtain the gender recognition model.
  • For this embodiment, the training samples for training the deep residual network into a gender recognition model are obtained from a preset pedestrian image database. The database stores a large number of pedestrian human body images, each showing a person in a walking state and pre-labeled with the corresponding gender.
  • For example, one hundred thousand pre-collected pedestrian human body images of males and females are obtained from the preset pedestrian database and used as input data for the deep residual network.
  • For this embodiment, a standard deep residual network is trained on the pedestrian human body images and their gender labels in the training samples to obtain a network structure and weights suited to the gender recognition task of this scheme; the trained network is the gender recognition model.
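  • A minimal training-loop sketch for this step, assuming PyTorch, a dataset yielding (image, label) pairs with 1.0 for male and 0.0 for female, and a model that outputs a probability (such as the residual-network sketch above); batch size, learning rate, and epoch count are illustrative only:

```python
# Hedged sketch: train the residual network into a gender recognition model.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_gender_model(model, dataset, epochs=10, lr=1e-3, device="cpu"):
    loader = DataLoader(dataset, batch_size=64, shuffle=True)
    criterion = nn.BCELoss()                               # model outputs probabilities
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.float().to(device)
            loss = criterion(model(images), labels)        # per-image gender prediction values
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```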
  • In an embodiment, after the gender recognition result of the target object is obtained according to the weighted gender prediction value, the method further includes: saving the preset number of video frames and the gender recognition result of the target object.
  • For this embodiment, some or all of the preset number of video frames of the target object, together with the corresponding gender recognition result, are saved in a gender recognition result database for fast matching and feedback of gender recognition results in subsequent repeated-recognition scenarios.
  • The video frames and corresponding gender recognition results stored in the gender recognition result database can be cleaned up periodically according to a preset policy.
  • In an embodiment, before the preset number of video frames are input into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object, the method further includes: judging whether a pedestrian human body image matching the preset number of video frames exists in a preset database; if so, obtaining the gender information corresponding to that pedestrian human body image prestored in the preset database and generating the gender recognition result of the target object from it; if not, continuing with the step of inputting the preset number of video frames into the pre-trained gender recognition model.
  • In actual application scenarios, after a pedestrian leaves the shooting range of the video surveillance tool, they may re-enter it within a period of time. To reduce the workload of real-time pedestrian gender recognition, a fast match against existing gender recognition results can therefore be performed before gender recognition of the target object.
  • For this embodiment, the preset database is the gender recognition result database that stores video frames of historical target objects and the corresponding gender recognition results, where a video frame of a historical target object is a pedestrian human body image containing that historical target object, i.e. a human body image of a person in a walking state. One or more of the acquired preset number of video frames of the target object are matched against the video frames in the gender recognition result database to determine whether a matching pedestrian human body image exists.
  • If a matching pedestrian human body image exists in the gender recognition result database, the gender information of the corresponding historical target object is determined from the prestored gender recognition result and used as the gender recognition result of the target object. If no matching pedestrian human body image exists, real-time gender recognition of the target object is performed.
  • In this embodiment, by performing a fast match against existing gender recognition results before gender recognition of the target object, the gender recognition system does not need to repeat gender recognition for a target object that re-enters the video shooting range within a preset time period, which significantly reduces the gender recognition workload in actual application scenarios and improves the efficiency of real-time pedestrian gender recognition.
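  • The application does not specify how a new video frame is matched against stored pedestrian images, so the sketch below only shows the cache-then-fallback control flow, with the matching routine passed in as a callable; a cosine-similarity comparison of appearance embeddings would be one hypothetical realization:

```python
# Hedged sketch: reuse a stored gender recognition result when a match is found,
# otherwise fall back to real-time recognition.
def gender_with_result_cache(frames, result_db, match_fn, recognize_fn):
    for frame in frames:
        key = match_fn(frame, result_db)      # returns a key in result_db, or None
        if key is not None:
            return result_db[key]             # historical gender recognition result
    return recognize_fn(frames)               # no match: run real-time recognition
```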
  • In an embodiment, inputting the preset number of video frames respectively into the pre-trained gender recognition model to obtain the gender prediction values respectively corresponding to the target object in the preset number of video frames includes: determining the human body region of the target object in the preset number of video frames; obtaining, according to the human body region, a preset number of pedestrian human body images corresponding to the preset number of video frames; and
  • inputting the preset number of pedestrian human body images respectively into the pre-trained gender recognition model to obtain the gender prediction values respectively corresponding to the target object in the preset number of video frames.
  • In actual application scenarios, the video surveillance tool records the video stream while the target object walks, so the preset number of video frames extracted from the stream may contain image information within the shooting range other than the target object, which would interfere with the gender recognition result of the target object. It is therefore necessary to preprocess the preset number of video frames and use the preprocessed frames as the input data of the gender recognition model.
  • Specifically, the preprocessing includes:
  • determining the human body region of the target object in the preset number of video frames and cropping the image of that region from each frame to obtain the preset number of pedestrian human body images; operations such as normalization, noise reduction, and illumination compensation may also be applied to the pedestrian human body images. The preprocessed pedestrian human body images are used as the input data of the gender recognition model and are input respectively into the pre-trained model to obtain the gender prediction values corresponding to the target object in the preset number of video frames. Preprocessing the input data of the gender recognition model in this way helps ensure its gender recognition accuracy.
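  • A minimal preprocessing sketch, assuming OpenCV, a 128x256 input size, Gaussian blur as a stand-in for noise reduction, and histogram equalization as a stand-in for illumination compensation (none of these specific choices are fixed by the application):

```python
# Hedged sketch: crop the body region, then resize, denoise, relight, and normalize it.
import cv2
import numpy as np

def preprocess_frame(frame, bbox, size=(128, 256)):
    x, y, w, h = bbox                                       # body region of the target object
    person = frame[y:y + h, x:x + w]
    person = cv2.resize(person, size)
    person = cv2.GaussianBlur(person, (3, 3), 0)            # light noise reduction
    ycrcb = cv2.cvtColor(person, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])       # simple illumination compensation
    person = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
    return person.astype(np.float32) / 255.0                # normalize pixels to [0, 1]
```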
  • In addition, an embodiment of the present application provides a gender recognition apparatus based on a deep residual network.
  • As shown in FIG. 2, the apparatus includes a video frame acquisition module 21, a prediction value acquisition module 22, a weighted operation module 23, and a gender recognition result generation module 24, where:
  • the video frame acquisition module 21 is configured to obtain a preset number of video frames of the target object from the video stream based on a pedestrian tracking algorithm;
  • the prediction value acquisition module 22 is configured to input the preset number of video frames respectively into a pre-trained gender recognition model to obtain gender prediction values respectively corresponding to the target object in the preset number of video frames, where the gender recognition model is pre-trained based on a deep residual network;
  • the weighted operation module 23 is configured to perform a weighted operation on the gender prediction values to obtain the weighted gender prediction value of the target object;
  • the gender recognition result generation module 24 is configured to obtain the gender recognition result of the target object according to the weighted gender prediction value.
  • In an embodiment, the video frame acquisition module 21 is specifically configured to:
  • obtain the preset number of video frames of the target object from the video stream based on the KCF target tracking algorithm.
  • In an embodiment, the prediction value acquisition module 22 is specifically configured to: obtain the weight ratio corresponding to the preset number of video frames, where the weight ratio is generated from the weights of the video frames and the weight of each frame is set according to the order of its timestamp in the video stream; and perform the weighted operation on the gender prediction values according to the weight ratio to obtain the weighted gender prediction value of the target object.
  • In an embodiment, the gender recognition model is pre-trained through the following steps: obtaining training samples containing pedestrian human body images and corresponding gender information, and training a deep residual network on the training samples to obtain the gender recognition model.
  • In an embodiment, after the gender recognition result of the target object is obtained according to the weighted gender prediction value, the method further includes: saving the preset number of video frames and the gender recognition result of the target object.
  • In an embodiment, before the preset number of video frames are input into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object, the method further includes: judging whether a pedestrian human body image matching the preset number of video frames exists in the preset database; if so, obtaining the prestored gender information corresponding to that image and generating the gender recognition result of the target object from it; if not, continuing with the step of inputting the preset number of video frames into the pre-trained gender recognition model.
  • In an embodiment, the prediction value acquisition module 22 is specifically configured to:
  • input the preset number of pedestrian human body images respectively into the pre-trained gender recognition model to obtain the gender prediction values respectively corresponding to the target object in the preset number of video frames.
  • The gender recognition apparatus based on the deep residual network provided in this application obtains multiple video frames from the video stream captured while the target object is walking and inputs them into the gender recognition model pre-trained based on the deep residual network to recognize the gender of the target object; real-time gender recognition of pedestrians can thus be achieved without relying on face recognition.
  • Gender recognition efficiency and accuracy are high, meeting the practical requirements of real-time pedestrian gender recognition.
  • The gender recognition apparatus based on the deep residual network provided by the embodiments of the present application can implement the method embodiments provided above.
  • Furthermore, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the gender recognition method based on the deep residual network described in the above embodiments is implemented.
  • The computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards.
  • That is, a readable storage medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer or a mobile phone), and may be a read-only memory, a magnetic disk, or an optical disk.
  • The computer-readable storage medium provided in this application makes it possible to obtain multiple video frames from the video stream captured while the target object is walking and to input them into the gender recognition model pre-trained based on the deep residual network to recognize the gender of the target object; real-time gender recognition of pedestrians can thus be achieved without relying on face recognition, with high efficiency and accuracy, meeting the practical requirements of real-time pedestrian gender recognition.
  • The computer-readable storage medium provided in the embodiments of the present application can implement the method embodiments provided above.
  • In addition, an embodiment of the present application also provides a computer device, as shown in FIG. 3.
  • The computer device described in this embodiment may be a server, a personal computer, a network device, or similar equipment.
  • The computer device includes a processor 302, a memory 303, an input unit 304, a display unit 305, and other components.
  • The memory 303 may be used to store a computer program 301 and various functional modules, and the processor 302 runs the computer program 301 stored in the memory 303 to execute the various functional applications and data processing of the device.
  • The memory may be internal memory or external memory, or include both internal and external memory.
  • The internal memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, or random access memory.
  • The external memory may include a hard disk, a floppy disk, a ZIP disk, a USB flash drive, a magnetic tape, and the like.
  • The memory disclosed in this application includes, but is not limited to, these types of memory.
  • The memory disclosed in this application is only an example and not a limitation.
  • The input unit 304 is used to receive signal input and to receive keywords entered by the user.
  • The input unit 304 may include a touch panel and other input devices.
  • The touch panel can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control keys and switch keys), a trackball, a mouse, and a joystick.
  • The display unit 305 can be used to display information entered by the user, information provided to the user, and the various menus of the computer device.
  • The display unit 305 may take the form of a liquid crystal display, an organic light-emitting diode display, or the like.
  • The processor 302 is the control center of the computer device: it connects the various parts of the entire computer through various interfaces and lines, and performs the various functions of the device and processes data by running or executing the software programs and/or modules stored in the memory 303 and invoking the data stored in the memory.
  • In one embodiment, the computer device includes one or more processors 302, a memory 303, and one or more computer programs 301, where the one or more computer programs 301 are stored in the memory 303 and configured to be executed by the one or more processors 302, and the one or more computer programs 301 are configured to execute the gender recognition method based on the deep residual network described in any of the above embodiments.
  • The computer device provided by this application obtains multiple video frames from the video stream captured while the target object is walking and inputs them into the gender recognition model pre-trained based on the deep residual network to recognize the gender of the target object; real-time gender recognition of pedestrians can thus be achieved without relying on face recognition.
  • Gender recognition efficiency and accuracy are high, meeting the practical requirements of real-time pedestrian gender recognition.
  • The computer device provided in the embodiments of the present application can implement the method embodiments provided above.
  • The functional units in the various embodiments of the present application may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module.
  • The above integrated modules may be implemented in the form of hardware or as software functional modules. If an integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.

Abstract

A deep residual network-based gender recognition method, comprising: obtaining a preset number of video frames of a target object from a video stream on the basis of a pedestrian tracking algorithm (S110); inputting the preset number of video frames into a pre-trained gender recognition model to obtain gender prediction values corresponding to the target object in the preset number of video frames, respectively, wherein the gender recognition model is pre-trained on the basis of a deep residual network (S120); weighting the gender prediction values to obtain the weighted gender prediction values of the target object (S130); and obtaining the gender recognition result of the target object according to the weighted gender prediction values (S140). The method can achieve real-time gender recognition of a pedestrian without face recognition, can achieve high gender recognition efficiency and accuracy, and meets the practical application needs of real-time pedestrian gender recognition.

Description

Gender recognition method, apparatus, medium, and device based on a deep residual network
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on January 25, 2019, with application number 201910074634.4 and titled "Gender recognition method, device, medium and equipment based on deep residual network", the entire contents of which are incorporated into this application by reference.
Technical field
This application relates to the field of intelligent recognition technology, and in particular to a gender recognition method, apparatus, computer-readable storage medium, and computer device based on a deep residual network.
Background
With the rapid development of artificial intelligence technology, more and more application scenarios require intelligent recognition of a person's gender. At present, most gender recognition is implemented based on face recognition technology. In actual application scenarios, however, a person's face is often occluded, making it difficult to recognize gender from facial features, so judgments can usually only be made from the person's figure, clothing, and other aspects of appearance. The difficulty in judging the gender of pedestrians is that, for people who dress neutrally, are overweight, or whose gender characteristics are not obvious, gender recognition is hard to achieve from a single viewing angle. Existing gender recognition methods have low accuracy and can hardly meet practical application requirements.
Summary of the invention
In order to solve at least one of the above technical defects, this application provides a gender recognition method based on a deep residual network, and a corresponding apparatus, computer-readable storage medium, and computer device.
According to one aspect, embodiments of this application provide a gender recognition method based on a deep residual network, including the following steps:
obtaining a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
inputting the preset number of video frames respectively into a pre-trained gender recognition model to obtain gender prediction values respectively corresponding to the target object in the preset number of video frames, where the gender recognition model is pre-trained based on a deep residual network;
performing a weighted operation on the gender prediction values to obtain a weighted gender prediction value of the target object;
obtaining a gender recognition result of the target object according to the weighted gender prediction value.
According to another aspect, embodiments of this application provide a gender recognition apparatus based on a deep residual network, including:
a video frame acquisition module, configured to obtain a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
a prediction value acquisition module, configured to input the preset number of video frames respectively into a pre-trained gender recognition model to obtain gender prediction values respectively corresponding to the target object in the preset number of video frames, where the gender recognition model is pre-trained based on a deep residual network;
a weighted operation module, configured to perform a weighted operation on the gender prediction values to obtain a weighted gender prediction value of the target object;
a gender recognition result generation module, configured to obtain a gender recognition result of the target object according to the weighted gender prediction value.
According to yet another aspect, embodiments of this application provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above gender recognition method based on the deep residual network is implemented.
According to still another aspect, embodiments of this application provide a computer device including one or more processors, a memory, and one or more computer programs, where the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to execute the above gender recognition method based on the deep residual network.
Compared with the prior art, this application has the following beneficial effects:
The gender recognition method, apparatus, computer-readable storage medium, and computer device based on the deep residual network provided by this application obtain multiple video frames from the video stream captured while the target object is walking and input them into a gender recognition model pre-trained based on the deep residual network to recognize the gender of the target object. Real-time gender recognition of pedestrians can thus be achieved without relying on face recognition, with high recognition efficiency and accuracy, meeting the practical requirements of real-time pedestrian gender recognition.
Additional aspects and advantages of this application will be partly given in the following description, and will become apparent from the description or be learned through practice of this application.
Description of the drawings
The above and/or additional aspects and advantages of this application will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of a gender recognition method based on a deep residual network provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of a gender recognition apparatus based on a deep residual network provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of a computer device provided by an embodiment of this application.
Detailed description
The embodiments of this application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements or elements with the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary and are only used to explain this application; they should not be construed as limiting it.
Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include plural forms. It should be further understood that the word "comprising" used in the specification of this application refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
An embodiment of this application provides a gender recognition method based on a deep residual network. As shown in FIG. 1, the method includes:
Step S110: Obtain a preset number of video frames of the target object from the video stream based on a pedestrian tracking algorithm.
For this embodiment, the target object is a person whose gender is to be recognized.
In actual application scenarios, the target object is first tracked based on the pedestrian tracking algorithm within a preset time period, and the video stream captured while the target object walks during that period is recorded by a video surveillance tool; a preset number of video frames of the target object are then extracted from the video stream. The preset number of video frames may be obtained by extracting key frames from the video stream at a preset period, which can be any duration such as 50 ms, 80 ms, or 1 s.
For this embodiment, the acquired preset number of video frames is used as the input data of the pre-trained gender recognition model.
The preset number can be any value such as 5, 9, or 15; those skilled in the art can determine its specific value according to actual application requirements, which is not limited in this embodiment.
Step S120: Input the preset number of video frames respectively into the pre-trained gender recognition model to obtain gender prediction values respectively corresponding to the target object in the preset number of video frames, where the gender recognition model is pre-trained based on a deep residual network.
For this embodiment, the gender recognition model is used to extract the gender features of the target object and calculate the gender prediction value.
For this embodiment, the acquired preset number of video frames are input into the pre-trained gender recognition model one after another, and the gender prediction values of the target object corresponding to the respective video frames are obtained in turn. The gender recognition model estimates the gender prediction value of the target object as follows: a gender feature vector of the target object is extracted from the video frame supplied as input data, the probabilities that the target object is male and female are then estimated from this feature vector, and gender classification of the target object is performed according to these probabilities.
The deep residual network (ResNet) uses the residual block as its basic network structure. This structure alleviates the performance degradation that occurs as networks become deeper, and provides strong technical support for improving both the accuracy and the computational efficiency of gender prediction.
Step S130: Perform a weighted operation on the gender prediction values to obtain the weighted gender prediction value of the target object.
For this embodiment, the gender prediction values corresponding to the individual video frames are weighted according to a preset weighting scheme and combined to calculate the weighted gender prediction value of the target object. This yields a more accurate gender prediction than recognition from a single static image, and therefore a more accurate gender recognition result.
Step S140: Obtain the gender recognition result of the target object according to the weighted gender prediction value.
For this embodiment, it is determined whether the weighted gender prediction value is greater than a preset threshold. If so, the gender of the target object is determined to be male and a male gender recognition result is obtained; if the weighted gender prediction value is less than or equal to the preset threshold, the gender of the target object is determined to be female and a female gender recognition result is obtained.
The preset threshold may be 0.5: when the weighted gender prediction value is greater than 0.5 the target object is determined to be male, and when it is less than or equal to 0.5 the target object is determined to be female.
The gender recognition method based on the deep residual network provided by this application obtains multiple video frames from the video stream captured while the target object is walking and inputs them into a gender recognition model pre-trained based on the deep residual network to recognize the gender of the target object. Real-time gender recognition of pedestrians can thus be achieved without relying on face recognition, with high efficiency and accuracy, meeting the practical requirements of real-time pedestrian gender recognition.
In an embodiment, obtaining a preset number of video frames of the target object from the video stream based on a pedestrian tracking algorithm includes:
obtaining the preset number of video frames of the target object from the video stream based on the KCF target tracking algorithm. The KCF target tracking algorithm is fast and robust, which can further improve the efficiency and accuracy of obtaining the preset number of video frames of the target object and meet real-time requirements.
In an embodiment, performing the weighted operation on the gender prediction values to obtain the weighted gender prediction value of the target object includes:
obtaining a weight ratio corresponding to the preset number of video frames, where the weight ratio is generated from the weights of the preset number of video frames, and the weight of each frame is set according to the order of its timestamp in the video stream;
performing the weighted operation on the gender prediction values according to the weight ratio to obtain the weighted gender prediction value of the target object.
For this embodiment, a weight used in the weighted calculation is preset for each of the preset number of video frames, yielding the weight ratio of the preset number of video frames. The weights of the individual video frames may be the same or different.
For this embodiment, the weight of each video frame is set according to the order of its timestamp in the video stream; that is, the preset weight of each frame is associated with the order of its timestamp. In practical scenarios, the frames acquired when tracking of the target object has just started may not yet capture a relatively complete view of the target, which can reduce the accuracy of the weighted gender prediction value. As a preferred example, frames with later timestamps are therefore given larger weights, so that frames capturing a more complete view of the target object contribute more to the weighted gender prediction value, improving the accuracy of real-time pedestrian gender recognition.
For this embodiment, according to the weight ratio, the gender prediction value of each video frame is multiplied by its corresponding weight and a weighted average is computed; this weighted average is used as the weighted gender prediction value of the target object.
In this embodiment, calculating the weighted gender prediction value of the target object through a weighted operation on the gender prediction values can further improve the accuracy of real-time pedestrian gender recognition.
In an embodiment, the gender recognition model is pre-trained through the following steps:
obtaining training samples containing pedestrian human body images and corresponding gender information;
training a deep residual network on the training samples to obtain the gender recognition model.
For this embodiment, the training samples for training the deep residual network into a gender recognition model are obtained from a preset pedestrian image database. The database stores a large number of pedestrian human body images, each showing a person in a walking state and pre-labeled with the corresponding gender.
For example, one hundred thousand pre-collected pedestrian human body images of males and females are obtained from the preset pedestrian database and used as input data for the deep residual network.
For this embodiment, a standard deep residual network is trained on the pedestrian human body images and their gender labels in the training samples to obtain a network structure and weights suited to the gender recognition task of this scheme; the trained network is the gender recognition model.
In an embodiment, after the gender recognition result of the target object is obtained according to the weighted gender prediction value, the method further includes:
saving the preset number of video frames and the gender recognition result of the target object.
For this embodiment, after the gender result of the target object is obtained, some or all of the preset number of video frames of the target object, together with the corresponding gender recognition result, are saved in a gender recognition result database for fast matching and feedback of gender recognition results in subsequent repeated-recognition scenarios. The video frames and corresponding gender recognition results stored in the gender recognition result database can be cleaned up periodically according to a preset policy.
In an embodiment, before the preset number of video frames are input into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames, the method further includes:
judging whether a pedestrian human body image matching the preset number of video frames exists in a preset database;
if so, obtaining the gender information corresponding to the pedestrian human body image prestored in the preset database, and generating the gender recognition result of the target object according to the gender information;
if not, continuing with the step of inputting the preset number of video frames into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
In actual application scenarios, after a pedestrian leaves the shooting range of the video surveillance tool, they may re-enter it within a period of time. To reduce the workload of real-time pedestrian gender recognition, a fast match against existing gender recognition results can therefore be performed before gender recognition of the target object.
For this embodiment, the preset database is the gender recognition result database that stores video frames of historical target objects and the corresponding gender recognition results, where a video frame of a historical target object is a pedestrian human body image containing that historical target object, i.e. a human body image of a person in a walking state. One or more of the acquired preset number of video frames of the target object are matched against the video frames in the gender recognition result database to determine whether a matching pedestrian human body image exists. If a matching pedestrian human body image exists in the gender recognition result database, the gender information of the corresponding historical target object is determined from the prestored gender recognition result and used as the gender recognition result of the target object; if no matching pedestrian human body image exists, real-time gender recognition of the target object is performed.
In this embodiment, by performing a fast match against existing gender recognition results before gender recognition of the target object, the gender recognition system does not need to repeat gender recognition for a target object that re-enters the video shooting range within a preset time period, which significantly reduces the gender recognition workload in actual application scenarios and improves the efficiency of real-time pedestrian gender recognition.
In an embodiment, inputting the preset number of video frames respectively into the pre-trained gender recognition model to obtain the gender prediction values respectively corresponding to the target object in the preset number of video frames includes:
determining the human body region of the target object in the preset number of video frames;
obtaining, according to the human body region, a preset number of pedestrian human body images corresponding to the preset number of video frames;
inputting the preset number of pedestrian human body images respectively into the pre-trained gender recognition model to obtain the gender prediction values respectively corresponding to the target object in the preset number of video frames.
In actual application scenarios, the video surveillance tool records the video stream while the target object walks, so the preset number of video frames extracted from the stream may contain image information within the shooting range other than the target object, which would interfere with the gender recognition result of the target object. It is therefore necessary to preprocess the preset number of video frames and use the preprocessed frames as the input data of the gender recognition model.
Specifically, the preprocessing includes: determining the human body region of the target object in the preset number of video frames and cropping the image of that region from each frame to obtain the preset number of pedestrian human body images; operations such as normalization, noise reduction, and illumination compensation may also be applied to the pedestrian human body images. The preprocessed pedestrian human body images are used as the input data of the gender recognition model and are input respectively into the pre-trained model to obtain the gender prediction values corresponding to the target object in the preset number of video frames. Preprocessing the input data of the gender recognition model in this way helps ensure its gender recognition accuracy.
In addition, an embodiment of the present application provides a gender recognition apparatus based on a deep residual network. As shown in FIG. 2, the apparatus includes a video frame acquisition module 21, a prediction value acquisition module 22, a weighted calculation module 23, and a gender recognition result generation module 24, wherein:
the video frame acquisition module 21 is configured to acquire a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
the prediction value acquisition module 22 is configured to respectively input the preset number of video frames into a pre-trained gender recognition model to obtain gender prediction values corresponding to the target object in the preset number of video frames, wherein the gender recognition model is pre-trained based on a deep residual network;
the weighted calculation module 23 is configured to perform a weighted calculation on the gender prediction values to obtain a weighted gender prediction value of the target object;
the gender recognition result generation module 24 is configured to obtain the gender recognition result of the target object according to the weighted gender prediction value.
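A compact sketch of how the four modules could cooperate is given below. The tracker and model objects, their track/predict methods, the linearly increasing weights, and the 0.5 decision threshold are all illustrative assumptions made for the sketch and are not fixed by the apparatus definition.

    class GenderRecognitionApparatus:
        def __init__(self, tracker, model, preset_count=5):
            self.tracker = tracker            # backs the video frame acquisition module 21
            self.model = model                # backs the prediction value acquisition module 22
            self.preset_count = preset_count

        def recognize(self, video_stream):
            frames = self.tracker.track(video_stream, self.preset_count)   # module 21
            scores = [self.model.predict(frame) for frame in frames]       # module 22
            weights = list(range(1, len(scores) + 1))                      # module 23: weights follow frame order
            weighted = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
            return "male" if weighted >= 0.5 else "female"                 # module 24 (threshold is assumed)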
In one embodiment, the video frame acquisition module 21 is specifically configured to:
acquire the preset number of video frames of the target object from the video stream based on the KCF target tracking algorithm.
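A sketch of acquiring the preset number of frames with OpenCV's KCF tracker follows. The initial bounding box is assumed to come from a pedestrian detector, and the sampling interval is an illustrative choice; depending on the OpenCV build, the tracker factory may be cv2.TrackerKCF_create or cv2.legacy.TrackerKCF_create.

    import cv2

    def collect_frames_kcf(video_path, init_box, preset_count=5, interval=10):
        # init_box: (x, y, w, h) of the target pedestrian in the first frame (assumed given).
        tracker = cv2.TrackerKCF_create()          # cv2.legacy.TrackerKCF_create() on some builds
        cap = cv2.VideoCapture(video_path)
        ok, frame = cap.read()
        if not ok:
            return []
        tracker.init(frame, init_box)
        kept, index = [], 0
        while len(kept) < preset_count:
            ok, frame = cap.read()
            if not ok:
                break
            located, box = tracker.update(frame)   # follow the same target across frames
            index += 1
            if located and index % interval == 0:  # keep one frame every `interval` frames
                kept.append((frame, tuple(int(v) for v in box)))
        cap.release()
        return kept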
In one embodiment, the weighted calculation module 23 is specifically configured to:
acquire a weight ratio corresponding to the preset number of video frames, wherein the weight ratio is generated from the weights of the preset number of video frames, and the weight of each video frame is set according to the chronological order of the video frame's timestamp in the video stream;
perform a weighted calculation on the gender prediction values according to the weight ratio to obtain the weighted gender prediction value of the target object.
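One plausible realization of the weight ratio and the weighted calculation is sketched below. The description only requires that the weights follow the timestamp order of the frames, so the choice of linearly increasing weights (later frames count more) is an assumption.

    def weighted_gender_value(scores_by_timestamp):
        # scores_by_timestamp: per-frame gender prediction values, ordered by the frames'
        # timestamps in the video stream (earliest first).
        weights = list(range(1, len(scores_by_timestamp) + 1))   # weight grows with recency
        total = sum(weights)
        ratios = [w / total for w in weights]                    # the weight ratio
        return sum(r * s for r, s in zip(ratios, scores_by_timestamp))

    # Example with five frames, later frames weighted more heavily:
    # weighted_gender_value([0.62, 0.71, 0.68, 0.80, 0.77]) ≈ 0.742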
In one embodiment, the gender recognition model is pre-trained through the following steps:
acquiring training samples containing pedestrian body images and corresponding gender information;
training a deep residual network based on the training samples to obtain the gender recognition model.
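A minimal PyTorch sketch of this training procedure follows, using a ResNet-18 backbone as one possible deep residual network with a two-class head for gender. The network depth, loss, and optimizer are assumptions made for illustration and are not fixed by the description (on torchvision versions older than 0.13, pretrained=False replaces weights=None).

    import torch
    import torch.nn as nn
    from torchvision import models

    def build_gender_model():
        net = models.resnet18(weights=None)              # a standard deep residual network
        net.fc = nn.Linear(net.fc.in_features, 2)        # two outputs: male / female
        return net

    def train(model, loader, epochs=10, lr=1e-3, device="cpu"):
        # loader yields (pedestrian body image batch, gender label batch) training samples.
        model = model.to(device)
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        model.train()
        for _ in range(epochs):
            for images, labels in loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()
        return model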
In one embodiment, after the gender recognition result of the target object is obtained according to the weighted gender prediction value, the method further includes:
saving the preset number of video frames and the gender recognition result of the target object.
In one embodiment, before the preset number of video frames are respectively input into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames, the method further includes:
judging whether a pedestrian body image matching the preset number of video frames exists in a preset database;
if so, acquiring gender information corresponding to the pedestrian body image pre-stored in the preset database, and generating the gender recognition result of the target object according to the gender information;
if not, continuing to perform the step of respectively inputting the preset number of video frames into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
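A sketch of this pre-recognition lookup follows. It assumes the preset database is an iterable of (stored pedestrian image, gender) records and that a similarity function is available that returns a score in [0, 1]; both are placeholders for whatever matching mechanism the system actually employs, and the 0.9 threshold is illustrative.

    def recognize_with_database(frames, database, match, run_model, threshold=0.9):
        # database: iterable of (stored_image, gender) records saved for recent targets.
        for stored_image, gender in database:
            if any(match(frame, stored_image) >= threshold for frame in frames):
                return gender            # reuse the pre-stored gender recognition result
        return run_model(frames)         # no match: fall back to the gender recognition model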
In one embodiment, the prediction value acquisition module 22 is specifically configured to:
determine the human body region of the target object in the preset number of video frames;
acquire, according to the human body region, a preset number of pedestrian body images corresponding to the preset number of video frames;
respectively input the preset number of pedestrian body images into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
The gender recognition apparatus based on a deep residual network provided in the present application acquires multiple video frames from the video stream of the target object walking and inputs them into a gender recognition model pre-trained based on a deep residual network to recognize the gender of the target object. Real-time gender recognition of pedestrians can thus be achieved without relying on face recognition, with high recognition efficiency and accuracy, meeting the practical application requirements of real-time pedestrian gender recognition.
The gender recognition apparatus based on a deep residual network provided in the embodiments of the present application can implement the method embodiments provided above; for the specific function implementation, refer to the descriptions in the method embodiments, which are not repeated here.
In addition, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the gender recognition method based on a deep residual network described in the above embodiments is implemented. The computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, the storage medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer or a mobile phone), such as a read-only memory, a magnetic disk, or an optical disk.
The computer-readable storage medium provided in the present application acquires multiple video frames from the video stream of the target object walking and inputs them into a gender recognition model pre-trained based on a deep residual network to recognize the gender of the target object, so that real-time gender recognition of pedestrians can be achieved without relying on face recognition, with high recognition efficiency and accuracy, meeting the practical application requirements of real-time pedestrian gender recognition.
The computer-readable storage medium provided in the embodiments of the present application can implement the method embodiments provided above; for the specific function implementation, refer to the descriptions in the method embodiments, which are not repeated here.
In addition, an embodiment of the present application further provides a computer device, as shown in FIG. 3. The computer device described in this embodiment may be a server, a personal computer, a network device, or the like. The computer device includes a processor 302, a memory 303, an input unit 304, a display unit 305, and other components. Those skilled in the art can understand that the structure shown in FIG. 3 does not limit all devices, which may include more or fewer components than shown, or combine certain components. The memory 303 may be used to store a computer program 301 and the functional modules, and the processor 302 runs the computer program 301 stored in the memory 303 to perform the various functional applications and data processing of the device. The memory may be an internal memory or an external memory, or include both. The internal memory may include a read-only memory (ROM), a programmable ROM (PROM), an electrically programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a flash memory, or a random access memory. The external memory may include a hard disk, a floppy disk, a ZIP disk, a USB flash drive, a magnetic tape, or the like. The memory disclosed in the present application includes, but is not limited to, these types of memory, which are given only as examples and not as limitations.
The input unit 304 is configured to receive signal input and keywords entered by a user. The input unit 304 may include a touch panel and other input devices. The touch panel can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection apparatus according to a preset program; the other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control keys and switch keys), a trackball, a mouse, and a joystick. The display unit 305 may be used to display information entered by the user, information provided to the user, and the various menus of the computer device, and may take the form of a liquid crystal display, an organic light-emitting diode display, or the like. The processor 302 is the control center of the computer device; it connects the various parts of the whole computer through various interfaces and lines, and performs various functions and processes data by running or executing the software programs and/or modules stored in the memory 303 and calling the data stored in the memory.
As an embodiment, the computer device includes one or more processors 302, a memory 303, and one or more computer programs 301, wherein the one or more computer programs 301 are stored in the memory 303 and configured to be executed by the one or more processors 302, and the one or more computer programs 301 are configured to perform the gender recognition method based on a deep residual network described in any of the above embodiments.
The computer device provided in the present application acquires multiple video frames from the video stream of the target object walking and inputs them into a gender recognition model pre-trained based on a deep residual network to recognize the gender of the target object, so that real-time gender recognition of pedestrians can be achieved without relying on face recognition, with high recognition efficiency and accuracy, meeting the practical application requirements of real-time pedestrian gender recognition.
The computer device provided in the embodiments of the present application can implement the method embodiments provided above; for the specific function implementation, refer to the descriptions in the method embodiments, which are not repeated here.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The above descriptions are only some of the implementations of the present application. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principles of the present application, and these improvements and modifications shall also be regarded as falling within the protection scope of the present application.

Claims (20)

  1. A gender recognition method based on a deep residual network, characterized in that it comprises the following steps:
    acquiring a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
    respectively inputting the preset number of video frames into a pre-trained gender recognition model to obtain gender prediction values corresponding to the target object in the preset number of video frames, wherein the gender recognition model is pre-trained based on a deep residual network;
    performing a weighted calculation on the gender prediction values to obtain a weighted gender prediction value of the target object;
    obtaining a gender recognition result of the target object according to the weighted gender prediction value.
  2. The gender recognition method according to claim 1, wherein the acquiring a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm comprises:
    acquiring the preset number of video frames of the target object from the video stream based on a KCF target tracking algorithm.
  3. The gender recognition method according to claim 1, wherein the performing a weighted calculation on the gender prediction values to obtain a weighted gender prediction value of the target object comprises:
    acquiring a weight ratio corresponding to the preset number of video frames, wherein the weight ratio is generated from the weights of the preset number of video frames, and the weight of each video frame is set according to the chronological order of the video frame's timestamp in the video stream;
    performing a weighted calculation on the gender prediction values according to the weight ratio to obtain the weighted gender prediction value of the target object.
  4. The gender recognition method according to claim 1, wherein the gender recognition model is pre-trained through the following steps:
    acquiring training samples containing pedestrian body images and corresponding gender information;
    training a deep residual network based on the training samples to obtain the gender recognition model.
  5. The gender recognition method according to claim 1, wherein after the obtaining a gender recognition result of the target object according to the weighted gender prediction value, the method further comprises:
    saving the preset number of video frames and the gender recognition result of the target object.
  6. The gender recognition method according to claim 1, wherein before the respectively inputting the preset number of video frames into a pre-trained gender recognition model to obtain gender prediction values corresponding to the target object in the preset number of video frames, the method further comprises:
    judging whether a pedestrian body image matching the preset number of video frames exists in a preset database;
    if so, acquiring gender information corresponding to the pedestrian body image pre-stored in the preset database, and generating the gender recognition result of the target object according to the gender information;
    if not, continuing to perform the step of respectively inputting the preset number of video frames into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
  7. The gender recognition method according to claim 1, wherein the respectively inputting the preset number of video frames into a pre-trained gender recognition model to obtain gender prediction values corresponding to the target object in the preset number of video frames comprises:
    determining a human body region of the target object in the preset number of video frames;
    acquiring, according to the human body region, a preset number of pedestrian body images corresponding to the preset number of video frames;
    respectively inputting the preset number of pedestrian body images into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
  8. A gender recognition apparatus based on a deep residual network, characterized in that it comprises:
    a video frame acquisition module, configured to acquire a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
    a prediction value acquisition module, configured to respectively input the preset number of video frames into a pre-trained gender recognition model to obtain gender prediction values corresponding to the target object in the preset number of video frames, wherein the gender recognition model is pre-trained based on a deep residual network;
    a weighted calculation module, configured to perform a weighted calculation on the gender prediction values to obtain a weighted gender prediction value of the target object;
    a gender recognition result generation module, configured to obtain a gender recognition result of the target object according to the weighted gender prediction value.
  9. The apparatus according to claim 8, wherein the video frame acquisition module is specifically configured to:
    acquire the preset number of video frames of the target object from the video stream based on a KCF target tracking algorithm.
  10. The apparatus according to claim 8, wherein the weighted calculation module is specifically configured to:
    acquire a weight ratio corresponding to the preset number of video frames, wherein the weight ratio is generated from the weights of the preset number of video frames, and the weight of each video frame is set according to the chronological order of the video frame's timestamp in the video stream;
    perform a weighted calculation on the gender prediction values according to the weight ratio to obtain the weighted gender prediction value of the target object.
  11. The apparatus according to claim 8, wherein the gender recognition model is pre-trained through the following steps:
    acquiring training samples containing pedestrian body images and corresponding gender information;
    training a deep residual network based on the training samples to obtain the gender recognition model.
  12. The apparatus according to claim 8, wherein:
    the gender recognition result generation module is further configured to save the preset number of video frames and the gender recognition result of the target object after the gender recognition result of the target object is obtained according to the weighted gender prediction value.
  13. The apparatus according to claim 8, wherein:
    the prediction value acquisition module is further configured to: before the preset number of video frames are respectively input into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames, judge whether a pedestrian body image matching the preset number of video frames exists in a preset database; if so, acquire gender information corresponding to the pedestrian body image pre-stored in the preset database, and generate the gender recognition result of the target object according to the gender information; if not, continue to respectively input the preset number of video frames into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
  14. The apparatus according to claim 8, wherein the prediction value acquisition module is specifically configured to:
    determine a human body region of the target object in the preset number of video frames;
    acquire, according to the human body region, a preset number of pedestrian body images corresponding to the preset number of video frames;
    respectively input the preset number of pedestrian body images into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
  15. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the gender recognition method based on a deep residual network according to any one of claims 1 to 7 is implemented.
  16. A computer device, characterized in that it comprises:
    one or more processors;
    a memory;
    one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to perform:
    acquiring a preset number of video frames of a target object from a video stream based on a pedestrian tracking algorithm;
    respectively inputting the preset number of video frames into a pre-trained gender recognition model to obtain gender prediction values corresponding to the target object in the preset number of video frames, wherein the gender recognition model is pre-trained based on a deep residual network;
    performing a weighted calculation on the gender prediction values to obtain a weighted gender prediction value of the target object;
    obtaining a gender recognition result of the target object according to the weighted gender prediction value.
  17. The computer device according to claim 16, wherein when the preset number of video frames of the target object are acquired from the video stream based on the pedestrian tracking algorithm, the one or more computer programs are configured to perform:
    acquiring the preset number of video frames of the target object from the video stream based on a KCF target tracking algorithm.
  18. The computer device according to claim 16, wherein when the weighted calculation is performed on the gender prediction values to obtain the weighted gender prediction value of the target object, the one or more computer programs are configured to perform:
    acquiring a weight ratio corresponding to the preset number of video frames, wherein the weight ratio is generated from the weights of the preset number of video frames, and the weight of each video frame is set according to the chronological order of the video frame's timestamp in the video stream;
    performing a weighted calculation on the gender prediction values according to the weight ratio to obtain the weighted gender prediction value of the target object.
  19. The computer device according to claim 16, wherein before the preset number of video frames are respectively input into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames, the one or more computer programs are further configured to perform:
    judging whether a pedestrian body image matching the preset number of video frames exists in a preset database;
    if so, acquiring gender information corresponding to the pedestrian body image pre-stored in the preset database, and generating the gender recognition result of the target object according to the gender information;
    if not, continuing to perform the step of respectively inputting the preset number of video frames into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
  20. The computer device according to claim 16, wherein when the preset number of video frames are respectively input into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames, the one or more computer programs are configured to perform:
    determining a human body region of the target object in the preset number of video frames;
    acquiring, according to the human body region, a preset number of pedestrian body images corresponding to the preset number of video frames;
    respectively inputting the preset number of pedestrian body images into the pre-trained gender recognition model to obtain the gender prediction values corresponding to the target object in the preset number of video frames.
PCT/CN2019/116236 2019-01-25 2019-11-07 Deep residual network-based gender recognition method and apparatus, medium, and device WO2020151300A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910074634.4A CN109829415A (en) 2019-01-25 2019-01-25 Gender identification method, device, medium and equipment based on depth residual error network
CN201910074634.4 2019-01-25

Publications (1)

Publication Number Publication Date
WO2020151300A1 true WO2020151300A1 (en) 2020-07-30

Family

ID=66862501

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116236 WO2020151300A1 (en) 2019-01-25 2019-11-07 Deep residual network-based gender recognition method and apparatus, medium, and device

Country Status (2)

Country Link
CN (1) CN109829415A (en)
WO (1) WO2020151300A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469144A (en) * 2021-08-31 2021-10-01 北京文安智能技术股份有限公司 Video-based pedestrian gender and age identification method and model

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829415A (en) * 2019-01-25 2019-05-31 平安科技(深圳)有限公司 Gender identification method, device, medium and equipment based on depth residual error network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3023911A1 (en) * 2014-11-24 2016-05-25 Samsung Electronics Co., Ltd. Method and apparatus for recognizing object, and method and apparatus for training recognizer
CN106203306A (en) * 2016-06-30 2016-12-07 北京小米移动软件有限公司 The Forecasting Methodology at age, device and terminal
CN106529442A (en) * 2016-10-26 2017-03-22 清华大学 Pedestrian identification method and apparatus
CN107633223A (en) * 2017-09-15 2018-01-26 深圳市唯特视科技有限公司 A kind of video human attribute recognition approach based on deep layer confrontation network
CN107844784A (en) * 2017-12-08 2018-03-27 广东美的智能机器人有限公司 Face identification method, device, computer equipment and readable storage medium storing program for executing
CN108510000A (en) * 2018-03-30 2018-09-07 北京工商大学 The detection and recognition methods of pedestrian's fine granularity attribute under complex scene
CN109829415A (en) * 2019-01-25 2019-05-31 平安科技(深圳)有限公司 Gender identification method, device, medium and equipment based on depth residual error network


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469144A (en) * 2021-08-31 2021-10-01 北京文安智能技术股份有限公司 Video-based pedestrian gender and age identification method and model
CN113469144B (en) * 2021-08-31 2021-11-09 北京文安智能技术股份有限公司 Video-based pedestrian gender and age identification method and model

Also Published As

Publication number Publication date
CN109829415A (en) 2019-05-31

Similar Documents

Publication Publication Date Title
US11354901B2 (en) Activity recognition method and system
Jegham et al. Vision-based human action recognition: An overview and real world challenges
WO2021114892A1 (en) Environmental semantic understanding-based body movement recognition method, apparatus, device, and storage medium
CN105160318B (en) Lie detecting method based on facial expression and system
CN103870828B (en) Image similarity judges system and method
WO2016107482A1 (en) Method and device for determining identity identifier of human face in human face image, and terminal
Ji et al. Learning contrastive feature distribution model for interaction recognition
Lin et al. On the detection-to-track association for online multi-object tracking
CN112380512B (en) Convolutional neural network dynamic gesture authentication method and device, storage medium and equipment
WO2018103416A1 (en) Method and device for detecting facial image
WO2020151300A1 (en) Deep residual network-based gender recognition method and apparatus, medium, and device
CN110458235B (en) Motion posture similarity comparison method in video
Ponce-López et al. Multi-modal social signal analysis for predicting agreement in conversation settings
US20220027606A1 (en) Human behavior recognition method, device, and storage medium
Borghi et al. Fast gesture recognition with multiple stream discrete HMMs on 3D skeletons
WO2018068654A1 (en) Scenario model dynamic estimation method, data analysis method and apparatus, and electronic device
CN111783619A (en) Human body attribute identification method, device, equipment and storage medium
Zuo et al. Face liveness detection algorithm based on livenesslight network
US11138417B2 (en) Automatic gender recognition utilizing gait energy image (GEI) images
CN113011399A (en) Video abnormal event detection method and system based on generation cooperative judgment network
Radwan et al. Regression based pose estimation with automatic occlusion detection and rectification
Zhu et al. Multi-target tracking via hierarchical association learning
Yan et al. Foreground Extraction and Motion Recognition Technology for Intelligent Video Surveillance
KR100711223B1 (en) Face recognition method using Zernike/LDA and recording medium storing the method
Ren et al. Human fall detection model with lightweight network and tracking in video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19912077

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19912077

Country of ref document: EP

Kind code of ref document: A1