CN110059577B - Pedestrian attribute information extraction method and device - Google Patents

Pedestrian attribute information extraction method and device

Info

Publication number
CN110059577B
CN110059577B (application CN201910232030.8A)
Authority
CN
China
Prior art keywords
pedestrian
features
attribute
segmentation
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910232030.8A
Other languages
Chinese (zh)
Other versions
CN110059577A (en)
Inventor
石娟峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN201910232030.8A
Publication of CN110059577A
Application granted
Publication of CN110059577B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a pedestrian attribute information extraction method and device. The method is performed using a convolutional neural network that comprises a preliminary feature extraction model, a pedestrian segmentation feature extraction model, a pedestrian attribute feature extraction model, and a fully connected layer. The method comprises the following steps: a preliminary feature extraction step of inputting a pedestrian image into the preliminary feature extraction model to obtain preliminary features of the pedestrian image; a segmentation feature extraction step of inputting the preliminary features into the pedestrian segmentation feature extraction model to extract pedestrian segmentation features; an attribute feature extraction step of inputting the preliminary features into the pedestrian attribute feature extraction model to extract pedestrian attribute features; a feature fusion step of fusing the pedestrian segmentation features and the pedestrian attribute features to obtain fusion features; and an attribute information prediction step of inputting the fusion features into the fully connected layer to obtain predicted pedestrian attribute information. Combining the pedestrian segmentation features with the pedestrian attribute features improves the accuracy of the pedestrian attribute information.

Description

Pedestrian attribute information extraction method and device
Technical Field
The present invention relates generally to the field of artificial intelligence technology, and more particularly, to a method and apparatus for extracting pedestrian attribute information, an electronic device, and a computer-readable storage medium.
Background
In many applications of video structuring, the analysis of pedestrians is crucial; in particular, pedestrian recognition plays a core role in person identification across fields such as security and video retrieval.
Pedestrian attributes are an important part of video structuring, and their accuracy is vital to the working efficiency of real application scenarios such as security.
Disclosure of Invention
In order to solve the above problems in the prior art, embodiments of the present invention provide a method and an apparatus for extracting pedestrian attribute information, an electronic device, and a computer-readable storage medium.
In a first aspect, an embodiment of the present invention provides a method for extracting pedestrian attribute information, where the method is performed using a convolutional neural network that includes a preliminary feature extraction model, a pedestrian segmentation feature extraction model, a pedestrian attribute feature extraction model, and a fully connected layer. The method includes: a preliminary feature extraction step of inputting a pedestrian image into the preliminary feature extraction model to obtain preliminary features of the pedestrian image; a segmentation feature extraction step of inputting the preliminary features into the pedestrian segmentation feature extraction model to extract pedestrian segmentation features; an attribute feature extraction step of inputting the preliminary features into the pedestrian attribute feature extraction model to extract pedestrian attribute features; a feature fusion step of fusing the pedestrian segmentation features and the pedestrian attribute features to obtain fusion features; and an attribute information prediction step of inputting the fusion features into the fully connected layer to obtain predicted pedestrian attribute information.
In one example, the pedestrian segmentation feature extraction model includes a multi-layer convolution and a multi-layer deconvolution neural network.
In one example, the feature fusion step includes: performing bitwise addition and/or bitwise multiplication on the pedestrian segmentation features and the pedestrian attribute features to obtain the fusion features.
In one example, the feature fusion step includes: acquiring pedestrian segmentation position information according to the pedestrian segmentation features; and using the pedestrian segmentation position information as a mask to mask out part of the pedestrian attribute features to obtain pedestrian attribute features of a region of interest, thereby obtaining the fusion features.
In one example, the pedestrian segmentation features include pedestrian segmentation semantic information and the pedestrian attribute features include pedestrian attribute semantic information, and the feature fusion step includes: obtaining the fusion features according to the semantic relevance between the pedestrian segmentation semantic information and the pedestrian attribute semantic information.
In one example, the convolutional neural network is obtained by training through the following steps: a preliminary feature extraction training step of inputting a sample pedestrian image into the preliminary feature extraction model to obtain sample preliminary features of the sample pedestrian image; a segmentation feature extraction training step of inputting the sample preliminary features into the pedestrian segmentation feature extraction model to extract sample pedestrian segmentation features; an attribute feature extraction training step of inputting the sample preliminary features into the pedestrian attribute feature extraction model to extract sample pedestrian attribute features; a feature fusion training step of fusing the sample pedestrian segmentation features and the sample pedestrian attribute features to obtain sample fusion features; an attribute information prediction training step of inputting the sample fusion features into the fully connected layer to obtain predicted sample pedestrian attribute information; a loss function calculation step of calculating an overall loss function of the convolutional neural network according to the sample pedestrian segmentation features, the sample pedestrian attribute features, and the sample pedestrian attribute information; a loss function feedback step of feeding the overall loss function back to the convolutional neural network; and a parameter adjustment step of adjusting the parameters of the convolutional neural network according to the overall loss function until the convolutional neural network converges.
In one example, the loss function calculation step includes: calculating a segmentation loss function according to the sample pedestrian segmentation features; calculating an attribute feature loss function according to the sample pedestrian attribute features; calculating an attribute information loss function according to the sample pedestrian attribute information; and performing a weighted summation of the segmentation loss function, the attribute feature loss function, and the attribute information loss function to obtain the overall loss function.
In a second aspect, an embodiment of the present invention provides a pedestrian attribute information extraction device, where the device is implemented using a convolutional neural network that includes a preliminary feature extraction model, a pedestrian segmentation feature extraction model, a pedestrian attribute feature extraction model, and a fully connected layer. The device includes: a preliminary feature extraction module configured to input a pedestrian image into the preliminary feature extraction model to obtain preliminary features of the pedestrian image; a segmentation feature extraction module configured to input the preliminary features into the pedestrian segmentation feature extraction model to extract pedestrian segmentation features; an attribute feature extraction module configured to input the preliminary features into the pedestrian attribute feature extraction model to extract pedestrian attribute features; a feature fusion module configured to fuse the pedestrian segmentation features and the pedestrian attribute features to obtain fusion features; and an attribute information prediction module configured to input the fusion features into the fully connected layer to obtain predicted pedestrian attribute information.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory for storing instructions; and a processor for calling the instructions stored in the memory to execute the above method.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, perform the above-described method.
The pedestrian attribute information extraction method and device, electronic device, and computer-readable storage medium provided by the embodiments of the present invention greatly improve the accuracy of pedestrian attribute information by combining pedestrian segmentation information with pedestrian attribute information.
Drawings
The above and other objects, features and advantages of embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
fig. 1 shows a flowchart of a pedestrian attribute information extraction method according to an embodiment of the present invention;
fig. 2 shows a block diagram of a pedestrian attribute information extraction device according to an embodiment of the present invention;
fig. 3 shows a block diagram of an electronic device according to an embodiment of the invention.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way.
As shown in fig. 1, one embodiment of the present invention provides a pedestrian attribute information extraction method 100. The method 100 is performed using a convolutional neural network, a type of network widely used in image recognition and video analysis that is composed of multiple convolution units (i.e., convolution kernels), each of which extracts different features. The convolutional neural network may include a preliminary feature extraction model, a pedestrian segmentation feature extraction model, a pedestrian attribute feature extraction model, and a fully connected layer. The method 100 includes steps S101-S105.
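As an illustration of how the four parts fit together, here is a minimal sketch in PyTorch; the layer shapes, channel widths, and the choice of multiplicative fusion are assumptions made for the example, not details fixed by the patent:

```python
import torch
import torch.nn as nn

class PedestrianAttributeNet(nn.Module):
    """Sketch of the four-part network outlined above (all sizes illustrative)."""

    def __init__(self, num_attributes: int = 10):
        super().__init__()
        # Preliminary feature extraction model (a tiny stand-in backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Pedestrian segmentation feature extraction model.
        self.seg_branch = nn.Conv2d(128, 128, kernel_size=3, padding=1)
        # Pedestrian attribute feature extraction model.
        self.attr_branch = nn.Conv2d(128, 128, kernel_size=3, padding=1)
        # Fully connected layer producing the attribute predictions.
        self.fc = nn.Linear(128, num_attributes)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        prelim = self.backbone(image)         # step S101: preliminary features
        seg_feat = self.seg_branch(prelim)    # step S102: segmentation features
        attr_feat = self.attr_branch(prelim)  # step S103: attribute features
        fused = seg_feat * attr_feat          # step S104: one possible fusion
        pooled = fused.mean(dim=(2, 3))       # global average pooling
        return self.fc(pooled)                # step S105: attribute prediction

logits = PedestrianAttributeNet()(torch.randn(1, 3, 128, 96))  # example call
```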
Step S101 is a preliminary feature extraction step of inputting a pedestrian image into a preliminary feature extraction model to obtain a preliminary feature of the pedestrian image.
For example, a classical network structure such as GoogLeNet, VGG, or ResNet may be used as the preliminary feature extraction model. In some embodiments, the image is first input into a preliminary feature extraction model whose parameters are initialized from a base model that has already been trained.
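As a sketch of this initialization (assuming PyTorch and torchvision; the patent names the backbones but not a framework), a pretrained ResNet can be truncated to serve as the preliminary feature extraction model:

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pretrained on ImageNet and keep only its convolutional
# trunk, so it outputs a spatial feature map instead of class logits.
resnet = models.resnet18(pretrained=True)
backbone = nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool and fc
```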
In some embodiments, the preliminary feature extraction model may include one or more layers of convolutional neural networks.
Step S102 is a segmentation feature extraction step of inputting the preliminary features into a pedestrian segmentation feature extraction model to extract pedestrian segmentation features.
In some embodiments, extracting the pedestrian segmentation features may include separating the image information of one or more pedestrians from the background, segmenting some pedestrians from images of other pedestrians, or segmenting the image information of a local region of a pedestrian, such as the head, upper body, or lower body.
Step S103 is an attribute feature extraction step of inputting the preliminary features into a pedestrian attribute feature extraction model to extract pedestrian attribute features.
In some embodiments, the pedestrian attribute features may include attribute features of all pedestrians in the image, of a subset of pedestrians, or of one particular pedestrian. For example only, the pedestrian attribute features may include: gender, age, clothing style, clothing length, clothing color, whether a hat is worn, whether a backpack is carried, bag style, hair length, whether a bicycle is ridden, and the like. This list is not exhaustive; any other pedestrian attribute feature may be included.
Although step S103 is shown in fig. 1 as being performed after step S102, it should be noted that the order of steps S102 and S103 is not limited thereto. As another embodiment, step S102 may be performed after step S103. As still another embodiment, step S102 and step S103 may be performed simultaneously. The invention is not limited in this respect.
Step S104 is a feature fusion step of fusing the pedestrian segmentation features and the pedestrian attribute features to obtain fusion features.
In some embodiments, the fusion features may include local pedestrian attribute features, which may be features of a single pedestrian among a plurality of pedestrians, or features of a local region of a pedestrian.
Step S105 is an attribute information prediction step of inputting the fusion features into the fully connected layer to obtain predicted pedestrian attribute information.
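Pedestrian attributes are typically multi-label (one person may simultaneously be young, wear a hat, and carry a backpack), so a plausible sketch of this prediction step pools the fusion features and applies a per-attribute sigmoid; the pooling, sigmoid, and threshold are assumptions, since the text only specifies a fully connected layer:

```python
import torch
import torch.nn as nn

num_attributes = 10                    # assumed number of attributes
fc = nn.Linear(128, num_attributes)    # the fully connected layer

fused = torch.randn(4, 128, 32, 24)    # stand-in fusion features (batch, C, H, W)
pooled = fused.mean(dim=(2, 3))        # collapse the spatial dimensions
logits = fc(pooled)                    # one logit per attribute
probs = torch.sigmoid(logits)          # independent probability per attribute
predicted = probs > 0.5                # thresholded multi-label decisions
```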
The pedestrian attribute information extraction method 100 provided by the embodiment of the invention combines the pedestrian segmentation information and the pedestrian attribute information, and improves the accuracy of the pedestrian attribute information by using the relationship between the two.
As one embodiment of the present invention, the pedestrian segmentation feature extraction model may include a multi-layer convolution and a multi-layer deconvolution neural network.
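A minimal sketch of such a convolution/deconvolution branch follows; the layer counts and channel widths are assumed for illustration:

```python
import torch.nn as nn

# Convolution layers downsample the preliminary features; transposed
# convolution ("deconvolution") layers upsample back to the original
# resolution, producing a spatially dense pedestrian segmentation feature map.
seg_branch = nn.Sequential(
    nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(256, 256, kernel_size=4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
)
```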
As an embodiment of the present invention, the feature fusion step S104 may include: performing bitwise addition and/or bitwise multiplication on the pedestrian segmentation features and the pedestrian attribute features to obtain the fusion features.
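In tensor terms, assuming the two feature maps share the same shape, this fusion is a single element-wise operation:

```python
import torch

seg_feat = torch.randn(4, 128, 32, 24)   # stand-in pedestrian segmentation features
attr_feat = torch.randn(4, 128, 32, 24)  # stand-in pedestrian attribute features

fused_add = seg_feat + attr_feat          # bitwise (element-wise) addition
fused_mul = seg_feat * attr_feat          # bitwise (element-wise) multiplication
```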
As an embodiment of the present invention, the feature fusion step S104 may include: acquiring pedestrian segmentation position information according to the pedestrian segmentation features; and using the pedestrian segmentation position information as a mask to mask out part of the pedestrian attribute features to obtain pedestrian attribute features of a region of interest, thereby obtaining the fusion features.
For example, when analyzing hat-related attributes, the hat region obtained from pedestrian segmentation can be analyzed while other regions such as the upper body and lower body are masked out; when analyzing bag-related attributes, the bag region obtained from pedestrian segmentation can be analyzed to obtain local information while other regions are masked out, avoiding interference from those regions.
By combining the pedestrian segmentation features as a mask with the pedestrian attribute features, subsequent processing steps can focus on the parts of interest within the pedestrian attribute features and mask out the uninteresting parts, avoiding their interference and improving the accuracy of the attribute information.
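A sketch of this masking, under the assumption that the segmentation features are first collapsed into a per-pixel foreground weight:

```python
import torch

seg_feat = torch.randn(4, 128, 32, 24)   # stand-in pedestrian segmentation features
attr_feat = torch.randn(4, 128, 32, 24)  # stand-in pedestrian attribute features

# Collapse the segmentation features into a per-pixel weight in [0, 1];
# locations outside the region of interest (e.g. outside the hat area)
# are driven toward 0 and thus suppressed.
mask = torch.sigmoid(seg_feat.mean(dim=1, keepdim=True))  # (batch, 1, H, W)
fusion_features = attr_feat * mask        # masked attribute features of the ROI
```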
As one embodiment of the present invention, the pedestrian segmentation features may include pedestrian segmentation semantic information, and the pedestrian attribute features may include pedestrian attribute semantic information. The feature fusion step S104 may include: obtaining the fusion features according to the semantic relevance between the pedestrian segmentation semantic information and the pedestrian attribute semantic information.
For example only, the pedestrian segmentation semantic information may include the semantics of various body parts of the pedestrian, such as the head; the pedestrian attribute semantic information may include semantic information such as whether the pedestrian wears a hat; and the segmentation features and the attribute features may be fused through the semantic association between the head and the hat.
Semantically associating the pedestrian segmentation features with the pedestrian attribute features makes it convenient to find mutually associated features and use both to obtain the fusion features.
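One way to picture such an association is a table linking each segmented body part to the attributes whose semantics depend on it; the mapping below is hypothetical, for illustration only:

```python
# Hypothetical part-to-attribute semantic associations (not from the patent).
PART_TO_ATTRIBUTES = {
    "head": ["wears_hat", "hair_length"],
    "upper_body": ["clothing_style", "clothing_color", "carries_backpack"],
    "lower_body": ["clothing_length", "rides_bicycle"],
}
```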
As an embodiment of the present invention, the convolutional neural network may be obtained by training through the following steps: a preliminary feature extraction training step of inputting a sample pedestrian image into the preliminary feature extraction model to obtain sample preliminary features of the sample pedestrian image; a segmentation feature extraction training step of inputting the sample preliminary features into the pedestrian segmentation feature extraction model to extract sample pedestrian segmentation features; an attribute feature extraction training step of inputting the sample preliminary features into the pedestrian attribute feature extraction model to extract sample pedestrian attribute features; a feature fusion training step of fusing the sample pedestrian segmentation features and the sample pedestrian attribute features to obtain sample fusion features; an attribute information prediction training step of inputting the sample fusion features into the fully connected layer to obtain predicted sample pedestrian attribute information; a loss function calculation step of calculating an overall loss function of the convolutional neural network according to the sample pedestrian segmentation features, the sample pedestrian attribute features, and the sample pedestrian attribute information; a loss function feedback step of feeding the overall loss function back to the convolutional neural network; and a parameter adjustment step of adjusting the parameters of the convolutional neural network according to the overall loss function until the convolutional neural network converges.
In some embodiments, a loss function may be used to measure how far the model's predictions deviate from the true values. It may be a non-negative real-valued function.
For example, the predicted pedestrian segmentation features, pedestrian attribute features, and pedestrian attribute information obtained from the current convolutional neural network are compared with the real pedestrian segmentation features, pedestrian attribute features, and pedestrian attribute information, respectively, to obtain the overall loss function.
In some embodiments, the loss function may include hinge loss, cross-entropy loss, square loss, exponential loss, and the like.
In some embodiments, the loss function calculation step may include: calculating a segmentation loss function according to the sample pedestrian segmentation features; calculating an attribute feature loss function according to the sample pedestrian attribute features; calculating an attribute information loss function according to the sample pedestrian attribute information; and performing a weighted summation of the segmentation loss function, the attribute feature loss function, and the attribute information loss function to obtain the overall loss function.
In some embodiments, the weighted summation may multiply each individual loss function by its respective weight and add the results, taking the total as the overall loss function. Alternatively, a weighted average may be used as the overall loss function, obtained by dividing the weighted sum by the number of loss terms involved. Of course, any other weighting may be used, and the invention is not limited in this respect.
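A sketch of both variants, with the individual losses and weights as assumed placeholder values:

```python
import torch

# Stand-in individual losses for one training batch (assumed values).
seg_loss = torch.tensor(0.8)        # segmentation loss
attr_feat_loss = torch.tensor(0.5)  # attribute feature loss
attr_info_loss = torch.tensor(0.3)  # attribute information loss

w_seg, w_attr, w_info = 1.0, 0.5, 1.0   # assumed weights (hyperparameters)
overall_loss = w_seg * seg_loss + w_attr * attr_feat_loss + w_info * attr_info_loss

# Weighted-average variant: divide the weighted sum by the number of terms.
overall_loss_avg = overall_loss / 3
```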
In some embodiments, adjusting parameters in the convolutional neural network according to the overall loss function may cause the convolutional neural network to tend to converge. For example, it may be determined that the convolutional neural network converges when the value of the loss function is below a certain threshold.
In some embodiments, training the network by adjusting the parameters of the convolutional neural network according to the overall loss function, as described above, may further be steered by adjusting the weight assigned to each individual loss function.
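The feedback and parameter adjustment steps correspond to ordinary backpropagation; a sketch of the training loop follows, with a stand-in model, loss, optimizer, and convergence threshold (all assumptions, as the patent does not fix them):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)                                   # stand-in for the network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # assumed optimizer

for step in range(100):                                   # iterate over batches
    overall_loss = model(torch.randn(4, 8)).pow(2).mean() # stand-in overall loss
    optimizer.zero_grad()    # clear gradients from the previous iteration
    overall_loss.backward()  # loss function feedback: backpropagate the loss
    optimizer.step()         # parameter adjustment: update network weights
    if overall_loss.item() < 1e-3:                        # assumed threshold
        break                # treat the network as converged
```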
As shown in fig. 2, one embodiment of the present invention provides a pedestrian attribute information extraction device 200. The apparatus 200 is implemented using a convolutional neural network. The convolutional neural network comprises a preliminary feature extraction model, a pedestrian segmentation feature extraction model, a pedestrian attribute feature extraction model, and a fully connected layer. The apparatus 200 includes modules 201-205.
The preliminary feature extraction module 201 may be configured to input the pedestrian image into a preliminary feature extraction model, obtaining preliminary features of the pedestrian image.
The segmentation feature extraction module 202 may be configured to input the preliminary features into a pedestrian segmentation feature extraction model, extracting pedestrian segmentation features.
The attribute feature extraction module 203 may be configured to input the preliminary features into a pedestrian attribute feature extraction model, extracting pedestrian attribute features.
The feature fusion module 204 may be configured to fuse the pedestrian segmentation features with the pedestrian attribute features to obtain fusion features.
The attribute information prediction module 205 may be configured to input the fusion features into the fully connected layer to obtain predicted pedestrian attribute information.
As one embodiment of the present invention, the pedestrian segmentation feature extraction model may include a multi-layer convolution and a multi-layer deconvolution neural network.
As an embodiment of the present invention, the feature fusion module 204 may be further configured to perform bitwise addition and/or bitwise multiplication on the pedestrian segmentation features and the pedestrian attribute features to obtain the fusion features.
As an embodiment of the present invention, the feature fusion module 204 may be further configured to: acquire pedestrian segmentation position information according to the pedestrian segmentation features; and use the pedestrian segmentation position information as a mask to mask out part of the pedestrian attribute features to obtain pedestrian attribute features of a region of interest, thereby obtaining the fusion features.
As one embodiment of the present invention, the pedestrian segmentation features may include pedestrian segmentation semantic information, and the pedestrian attribute features may include pedestrian attribute semantic information. Moreover, the feature fusion module 204 may be further configured to obtain the fusion features according to the semantic relevance between the pedestrian segmentation semantic information and the pedestrian attribute semantic information.
As an embodiment of the present invention, the convolutional neural network may be obtained by training through the following modules: a preliminary feature extraction training module configured to input a sample pedestrian image into the preliminary feature extraction model to obtain sample preliminary features of the sample pedestrian image; a segmentation feature extraction training module configured to input the sample preliminary features into the pedestrian segmentation feature extraction model to extract sample pedestrian segmentation features; an attribute feature extraction training module configured to input the sample preliminary features into the pedestrian attribute feature extraction model to extract sample pedestrian attribute features; a feature fusion training module configured to fuse the sample pedestrian segmentation features and the sample pedestrian attribute features to obtain sample fusion features; an attribute information prediction training module configured to input the sample fusion features into the fully connected layer to obtain predicted sample pedestrian attribute information; a loss function calculation module configured to calculate an overall loss function of the convolutional neural network according to the sample pedestrian segmentation features, the sample pedestrian attribute features, and the sample pedestrian attribute information; a loss function feedback module configured to feed the overall loss function back to the convolutional neural network; and a parameter adjustment module configured to adjust the parameters of the convolutional neural network according to the overall loss function until the convolutional neural network converges.
As an embodiment of the present invention, the loss function calculation module may be further configured to: calculate a segmentation loss function according to the sample pedestrian segmentation features; calculate an attribute feature loss function according to the sample pedestrian attribute features; calculate an attribute information loss function according to the sample pedestrian attribute information; and perform a weighted summation of the segmentation loss function, the attribute feature loss function, and the attribute information loss function to obtain the overall loss function.
The functions implemented by the modules in the apparatus correspond to the steps in the method described above, and for concrete implementation and technical effects, please refer to the description of the method steps above, which is not described herein again.
As shown in fig. 3, one embodiment of the invention provides an electronic device 300. The electronic device 300 includes a memory 301, a processor 302, and an Input/Output (I/O) interface 303. The memory 301 is used to store instructions, and the processor 302 is used to call the instructions stored in the memory 301 to execute the pedestrian attribute information extraction method according to the embodiment of the present invention. The processor 302 is connected to the memory 301 and the I/O interface 303, for example, via a bus system and/or another connection mechanism (not shown). The memory 301 may be used to store programs and data, including a pedestrian attribute information extraction program according to an embodiment of the present invention; the processor 302 executes various functional applications and data processing of the electronic device 300 by running the programs stored in the memory 301.
In an embodiment of the present invention, the processor 302 may be implemented in at least one hardware form among a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), and a Programmable Logic Array (PLA), and may be one or a combination of several Central Processing Units (CPUs) or other processing units with data processing capability and/or instruction execution capability.
The memory 301 in embodiments of the present invention may comprise one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read-Only Memory (ROM), flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD).
In the embodiment of the present invention, the I/O interface 303 may be used to receive input (e.g., numeric or character information) and to generate key signal inputs related to user settings and function control of the electronic device 300, and may also output various information (e.g., images or sounds) to the outside. The I/O interface 303 may comprise one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a mouse, a joystick, a trackball, a microphone, a speaker, a touch panel, and the like.
One embodiment of the present invention provides a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, perform any of the methods described above.
Although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in serial order, or that all illustrated operations be performed, to achieve desirable results. In certain environments, multitasking and parallel processing may be advantageous.
The methods and apparatus of the present invention can be implemented with standard programming techniques, using rule-based logic or other logic to accomplish the various method steps. It should also be noted that the words "means" and "module," as used herein and in the claims, are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving inputs.
Any of the steps, operations, or procedures described herein may be performed or implemented using one or more hardware or software modules, alone or in combination with other devices. In one embodiment, the software modules are implemented using a computer program product comprising a computer readable medium containing computer program code, which is executable by a computer processor for performing any or all of the described steps, operations, or procedures.
The foregoing description of the implementation of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. The embodiments were chosen and described in order to explain the principles of the invention and its practical application to enable one skilled in the art to utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated.

Claims (9)

1. A pedestrian attribute information extraction method, wherein the method is performed using a convolutional neural network, wherein the convolutional neural network includes a preliminary feature extraction model, a pedestrian segmentation feature extraction model, a pedestrian attribute feature extraction model, and a fully connected layer, the method comprising:
a preliminary feature extraction step of inputting a pedestrian image into the preliminary feature extraction model to obtain a preliminary feature of the pedestrian image;
a segmentation feature extraction step of inputting the preliminary features into the pedestrian segmentation feature extraction model to extract pedestrian segmentation features;
an attribute feature extraction step of inputting the preliminary features into the pedestrian attribute feature extraction model to extract pedestrian attribute features;
a feature fusion step of fusing the pedestrian segmentation features and the pedestrian attribute features to obtain fusion features; and
an attribute information prediction step of inputting the fusion features into the fully connected layer to obtain predicted pedestrian attribute information;
wherein the feature fusion step comprises: acquiring pedestrian segmentation position information according to the pedestrian segmentation features; and using the pedestrian segmentation position information as a mask to mask out part of the pedestrian attribute features to obtain pedestrian attribute features of a region of interest, thereby obtaining the fusion features.
2. The method of claim 1, wherein the pedestrian segmentation feature extraction model comprises a multi-layer convolution and a multi-layer deconvolution neural network.
3. The method of claim 1, wherein the feature fusion step comprises:
performing bitwise addition and/or bitwise multiplication on the pedestrian segmentation features and the pedestrian attribute features to obtain the fusion features.
4. The method of claim 1, wherein the pedestrian segmentation features comprise pedestrian segmentation semantic information and the pedestrian attribute features comprise pedestrian attribute semantic information, and wherein the feature fusion step comprises:
acquiring the fusion features according to the semantic relevance between the pedestrian segmentation semantic information and the pedestrian attribute semantic information.
5. The method of claim 1, wherein the convolutional neural network is obtained by training by:
a preliminary feature extraction training step of inputting a sample pedestrian image into the preliminary feature extraction model to obtain sample preliminary features of the sample pedestrian image;
a segmentation feature extraction training step of inputting the sample preliminary features into the pedestrian segmentation feature extraction model to extract sample pedestrian segmentation features;
an attribute feature extraction training step of inputting the sample preliminary features into the pedestrian attribute feature extraction model to extract sample pedestrian attribute features;
a feature fusion training step of fusing the sample pedestrian segmentation features and the sample pedestrian attribute features to obtain sample fusion features;
an attribute information prediction training step of inputting the sample fusion features into the fully connected layer to obtain predicted sample pedestrian attribute information;
a loss function calculation step of calculating an overall loss function of the convolutional neural network according to the sample pedestrian segmentation features, the sample pedestrian attribute features, and the sample pedestrian attribute information;
a loss function feedback step of feeding the overall loss function back to the convolutional neural network; and
a parameter adjustment step of adjusting the parameters of the convolutional neural network according to the overall loss function until the convolutional neural network converges.
6. The method of claim 5, wherein the loss function calculating step comprises:
calculating a segmentation loss function according to the sample pedestrian segmentation features;
calculating an attribute feature loss function according to the sample pedestrian attribute features;
calculating an attribute information loss function according to the sample pedestrian attribute information;
and performing a weighted summation of the segmentation loss function, the attribute feature loss function, and the attribute information loss function to obtain the overall loss function.
7. A pedestrian attribute information extraction device, wherein the device is implemented using a convolutional neural network, wherein the convolutional neural network includes a preliminary feature extraction model, a pedestrian segmentation feature extraction model, a pedestrian attribute feature extraction model, and a fully connected layer, the device comprising:
the preliminary feature extraction module is configured to input a pedestrian image into the preliminary feature extraction model to obtain a preliminary feature of the pedestrian image;
the segmentation feature extraction module is configured to input the preliminary features into the pedestrian segmentation feature extraction model and extract pedestrian segmentation features;
the attribute feature extraction module is configured to input the preliminary features into the pedestrian attribute feature extraction model and extract pedestrian attribute features;
the feature fusion module is configured to fuse the pedestrian segmentation features and the pedestrian attribute features to obtain fusion features;
the attribute information prediction module is configured to input the fusion features into the fully connected layer to obtain predicted pedestrian attribute information;
wherein the feature fusion module is specifically configured to acquire pedestrian segmentation position information according to the pedestrian segmentation features, and to use the pedestrian segmentation position information as a mask to mask out part of the pedestrian attribute features to obtain pedestrian attribute features of a region of interest, thereby obtaining the fusion features.
8. An electronic device, the electronic device comprising:
a memory to store instructions; and
a processor for invoking the instructions stored by the memory to perform the method of any of claims 1-6.
9. A computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, perform the method of any one of claims 1-6.
CN201910232030.8A 2019-03-26 2019-03-26 Pedestrian attribute information extraction method and device Active CN110059577B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910232030.8A (CN110059577B) | 2019-03-26 | 2019-03-26 | Pedestrian attribute information extraction method and device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910232030.8A (CN110059577B) | 2019-03-26 | 2019-03-26 | Pedestrian attribute information extraction method and device

Publications (2)

Publication Number | Publication Date
CN110059577A (en) | 2019-07-26
CN110059577B (en) | 2022-02-18

Family

ID=67316332

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910232030.8A (Active, CN110059577B) | Pedestrian attribute information extraction method and device | 2019-03-26 | 2019-03-26

Country Status (1)

Country Link
CN (1) CN110059577B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110969087B * | 2019-10-31 | 2023-11-21 | 杭州未名信科科技有限公司 | Gait recognition method and system
CN111046950B * | 2019-12-11 | 2023-09-22 | 北京迈格威科技有限公司 | Image processing method and device, storage medium and electronic device
CN111414812A * | 2020-03-03 | 2020-07-14 | 平安科技(深圳)有限公司 | Human body attribute identification method, system, computer device and storage medium
CN112232173B * | 2020-10-12 | 2023-04-07 | 济南博观智能科技有限公司 | Pedestrian attribute identification method, deep learning model, equipment and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108229267B * | 2016-12-29 | 2020-10-16 | 北京市商汤科技开发有限公司 | Object attribute detection, neural network training and region detection method and device
CN107784282B * | 2017-10-24 | 2020-04-03 | 北京旷视科技有限公司 | Object attribute identification method, device and system
CN108875536A * | 2018-02-06 | 2018-11-23 | 北京迈格威科技有限公司 | Pedestrian's analysis method, device, system and storage medium
CN108921054B * | 2018-06-15 | 2021-08-03 | 华中科技大学 | Pedestrian multi-attribute identification method based on semantic segmentation

Also Published As

Publication number | Publication date
CN110059577A (en) | 2019-07-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant