CN109447021B - Attribute detection method and attribute detection device - Google Patents

Publication number
CN109447021B
CN109447021B (application CN201811326781.8A)
Authority
CN
China
Prior art keywords
feature map
attribute
neural network
feature
vector
Prior art date
Legal status
Active
Application number
CN201811326781.8A
Other languages
Chinese (zh)
Other versions
CN109447021A (en)
Inventor
高岱恒
Current Assignee
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Lynxi Technology Co Ltd
Priority to CN201811326781.8A
Publication of CN109447021A
Priority to PCT/CN2019/113383 (WO2020093884A1)
Application granted
Publication of CN109447021B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

In the disclosed attribute detection method and attribute detection device, a plurality of neural network modules are controlled to process an image to be processed, and an attribute prediction vector and a constraint vector are obtained according to the resulting first feature map. The constraint vector then constrains the attribute prediction vector to produce an attribute detection result, which improves both the efficiency and the accuracy of attribute detection.

Description

Attribute detection method and attribute detection device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an attribute detection method and an attribute detection apparatus.
Background
With the rapid development of machine learning and deep learning technology, more and more practitioners apply these techniques to attribute recognition and detection. Attribute detection refers to a technique of acquiring various attributes in an image by processing the image. For example, pedestrian attribute detection acquires various attributes of a pedestrian in an image (such as sex, whether the pedestrian carries a backpack, whether the pedestrian wears glasses, etc.) by processing the input image.
One conventional attribute detection approach uses a convolutional neural network for deep multi-attribute recognition, reducing the number of convolutional layers and fully-connected layers to shrink the attribute detection model and speed up detection. However, this method is sensitive to the location of an attribute within the image, making detection less flexible and less accurate. Another approach is weakly supervised attribute detection based on middle-layer features, which guides weakly supervised learning with attribute annotations of images to improve detection accuracy. However, the attribute detection model obtained by this method is large, detection is inefficient, and its application scenarios are limited.
Disclosure of Invention
In view of this, the present invention provides an attribute detection method and an attribute detection apparatus, so as to improve the efficiency of attribute detection and the accuracy of attribute detection.
In a first aspect, an embodiment of the present invention provides an attribute detection method, including:
acquiring an image to be processed;
sequentially performing a plurality of neural network processes on the image to be processed to obtain a first feature map;
acquiring a constraint vector according to the first feature map, wherein the constraint vector comprises constraint values for a viewing angle;
acquiring an attribute prediction vector according to the first feature map;
and restricting the attribute prediction vector through the constraint vector to obtain an attribute detection result.
In the embodiment of the invention, the image to be processed is sequentially subjected to a plurality of neural network processes to obtain the first feature map, and the attribute prediction vector obtained from the first feature map is constrained by the constraint vector obtained from the same feature map to produce the attribute detection result, which improves both the efficiency and the accuracy of attribute detection.
Further, obtaining a constraint vector according to the first feature map comprises:
predicting viewing angle information of the first feature map;
and acquiring the constraint vector from a preset viewing angle attribute vector according to the viewing angle information, wherein the viewing angle attribute vector comprises constraint values of different attributes of the image at each viewing angle.
Further, obtaining an attribute prediction vector according to the first feature map comprises:
controlling an adaptive neural network to process the first feature map to obtain a second feature map;
acquiring the attribute prediction vector according to the second feature map;
wherein the adaptive neural network comprises at least one Inception model, so that the attribute detection model reduces parameters while increasing neural network depth and width.
In the embodiment of the invention, the data processing speed is further increased by adding an Inception model to the adaptive neural network.
Further, a squeeze-and-excitation (S-E) layer is deployed on at least one branch of the Inception model to generate a weight for each feature channel, and the weight is used to calibrate the attribute features in the first feature map.
In the embodiment of the invention, the S-E layer deployed on at least one branch of the Inception model generates a weight for each channel, and the attributes in the first feature map are calibrated by these weights, further improving the accuracy of attribute detection.
Further, controlling the adaptive neural network to process the first feature map to obtain a second feature map comprises:
performing feature stitching on the first feature map and at least one intermediate feature map to obtain a second feature map;
wherein the intermediate feature map is an intermediate output produced while the image to be processed passes through the plurality of neural network processes.
Further, feature-stitching the first feature map and the at least one intermediate feature map to obtain the second feature map comprises:
sequentially stitching the first feature map and the at least one intermediate feature map, from low-dimensional features to high-dimensional features, through corresponding Inception models to obtain the second feature map.
In the embodiment of the invention, the first feature map and the at least one intermediate feature map are sequentially stitched from low-dimensional features to high-dimensional features through corresponding Inception models to obtain the second feature map, further improving the accuracy of attribute prediction.
In a second aspect, an embodiment of the present invention provides an attribute detection apparatus, including:
an acquisition module configured to acquire an image to be processed;
a plurality of neural network modules configured to sequentially perform a plurality of neural network processes on the image to be processed to obtain a first feature map;
a viewing angle detection module configured to obtain a constraint vector according to the first feature map, wherein the constraint vector comprises constraint values for a viewing angle;
an attribute prediction module configured to obtain an attribute prediction vector according to the first feature map; and
a detection result obtaining module configured to restrict the attribute prediction vector by the constraint vector to obtain an attribute detection result.
Further, the viewing angle detection module includes:
a viewing angle information acquisition sub-module configured to predict viewing angle information of the first feature map; and
a constraint vector acquisition sub-module configured to acquire the constraint vector from a preset viewing angle attribute vector according to the viewing angle information, wherein the viewing angle attribute vector comprises constraint values of different attributes of the image at each viewing angle.
Further, the attribute prediction module comprises:
an adaptive neural network sub-module configured to process the first feature map to obtain a second feature map; and
an attribute prediction sub-module configured to obtain the attribute prediction vector according to the second feature map;
wherein the adaptive neural network comprises at least one Inception model such that parameters are reduced while neural network depth and width are increased.
Further, a squeeze-and-excitation (S-E) layer is deployed on at least one branch of the Inception model to generate a weight for each feature channel, and the weight is used to calibrate the attribute features in the first feature map.
Further, the adaptive neural network sub-module includes:
a feature stitching unit configured to perform feature stitching on the first feature map and at least one intermediate feature map to obtain the second feature map;
wherein the intermediate feature map is an intermediate output produced while the image to be processed passes through the plurality of neural network processes.
Further, the feature splicing unit includes:
a dimension feature stitching subunit configured to stitch the first feature map and the at least one intermediate feature map, from low-dimensional features to high-dimensional features, sequentially through corresponding Inception models to obtain the second feature map.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the method described above.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, wherein,
the memory is used to store one or more computer instructions, which are executed by the processor to implement the method as described above.
In a fifth aspect, embodiments of the present invention provide a computer program product, which when run on a computer, causes the computer to perform the method as described above.
According to the technical scheme of the embodiment of the invention, a plurality of neural network modules are controlled to process the image to be processed, and an attribute prediction vector and a constraint vector are obtained according to the resulting first feature map; the constraint vector then constrains the attribute prediction vector to produce the attribute detection result, improving both the efficiency and the accuracy of attribute detection.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of an attribute detection method of an embodiment of the present invention;
FIGS. 2 and 3 are schematic diagrams of images to be processed;
FIGS. 4 and 5 are image data flow diagrams of embodiments of the present invention;
FIG. 6 is a schematic diagram of a neural network in accordance with an embodiment of the present invention;
FIG. 7 is a schematic diagram of an Inception model according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a squeeze-and-excitation (S-E) layer of an embodiment of the present invention;
FIG. 9 is a schematic diagram of another Inception model of an embodiment of the invention;
FIG. 10 is a schematic diagram of an attribute detection apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of an electronic device of an embodiment of the invention.
Detailed Description
The present invention will be described below based on examples, but the present invention is not limited to only these examples. In the following detailed description of the present invention, certain specific details are set forth. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details. Well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present invention.
Further, those of ordinary skill in the art will appreciate that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only, do not denote any order, and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
The attribute detection method provided by the embodiment of the invention can be applied to various types of attribute detection, such as pedestrian attribute detection, fruit attribute classification and the like. The embodiment of the invention is mainly described by pedestrian attribute detection.
Fig. 1 is a flowchart of an attribute detection method according to an embodiment of the present invention. As shown in fig. 1, the attribute detection method of the present embodiment includes the following steps:
step S110, an image to be processed is acquired.
Step S120, a plurality of neural network processes are sequentially performed on the image to be processed to obtain a first feature map.
In an alternative implementation, the neural network processing may be convolutional neural network processing: for example, a plurality of convolution operations with different convolution kernels are performed on the image to be processed in sequence to obtain the first feature map.
And step S130, acquiring a constraint vector according to the first feature map, wherein the constraint vector comprises constraint values for the viewing angle.
Specifically, the viewing angle information of the first feature map is first predicted according to its image semantic features, and then the constraint vector is acquired from a preset viewing angle attribute vector according to the obtained viewing angle information. The preset viewing angle attribute vector comprises constraint values of different attributes of the image at each viewing angle. The viewing angle information can be, for example, the front, the back, or the side of a pedestrian.
In the following, the detected pedestrian attributes are whether the pedestrian wears glasses and whether the pedestrian carries a backpack. It is easy to understand that a backpack is generally carried on the back (that is, whether the pedestrian carries a backpack can be detected only when the image shows the back of the pedestrian), while glasses are generally visible from the front. Assume that the first column of the preset viewing angle attribute vector characterizes whether the image shows the front or the back, with 1 representing the front and 0 representing the back. The second column is the backpack attribute, with 1 indicating the presence of a backpack and 0 indicating the absence of a backpack. The third column is the glasses attribute, with 1 indicating the presence of glasses and 0 indicating the absence of glasses. Then the preset viewing angle attribute vector is:
front: [ 1 0 1 ]
back: [ 0 1 0 ]
If the viewing angle of the pedestrian in the image is predicted to be the front according to the feature information in the first feature map, the obtained constraint vector is [ 1 0 1 ].
It should be understood that the preset viewing angle attribute vector may also express probability constraints on the attributes. The above viewing angle attribute vector may then be:
front: [ 1 0.1 0.95 ]
back: [ 0 0.9 0.05 ]
that is, the probability of the backpack being 10% on the front side and 90% on the back side. The probability of the glasses being 95% on the front and 5% on the back.
In step S140, an attribute prediction vector is obtained according to the first feature map. Specifically, the adaptive neural network is controlled to process the first feature map to obtain a second feature map, and the attribute prediction vector is obtained according to the image semantic features of the second feature map. The adaptive neural network comprises at least one Inception model, so that parameters are reduced while the depth and width of the neural network are increased, which reduces the computation of the model and increases the data processing speed. In an optional implementation, a squeeze-and-excitation (S-E) layer is deployed on at least one branch of the Inception model to generate a weight for each feature channel and thereby calibrate the attribute features in the first feature map, which can further improve the accuracy of attribute prediction. That is, feature extraction is performed on the first feature map sequentially by at least one Inception model to acquire the second feature map.
It should be understood that steps S130 and S140 are not bound to a fixed chronological order, and their execution order may be interchanged. In a preferred implementation, steps S130 and S140 may be performed simultaneously to increase the processing speed.
And S150, constraining the attribute prediction vector through the constraint vector to obtain an attribute detection result. For example, if the backpack entry in the attribute prediction vector of a frontal pedestrian image is 1, it is forcibly modified to 0 by the constraint vector, so that the output attribute detection result reports no backpack. Since a backpack is usually carried on the back and is therefore not visible in a frontal view, the output attributes should not include a backpack.
In another optional implementation, the second feature map may be obtained by stitching high-dimensional and low-dimensional features to further improve the accuracy of attribute prediction. Specifically, the first feature map and at least one intermediate feature map are feature-stitched to obtain the second feature map. An intermediate feature map is an intermediate output produced while the image to be processed passes through the multiple neural network processes. Preferably, the first feature map and the at least one intermediate feature map are sequentially feature-stitched, from low-dimensional features to high-dimensional features, through corresponding Inception models to obtain the second feature map. In this way, better detail information is retained while stronger image semantic feature information is acquired, further improving the accuracy of attribute prediction. High-dimensional features are features with a large number of dimensions, which may include many features weakly correlated with the information to be extracted; low-dimensional features (comprising relatively few features, highly correlated with the information to be extracted) can be obtained by further neural network processing of the feature map, and this process may be referred to as dimension reduction.
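Feature stitching as described here amounts to concatenation along the channel dimension. The following sketch (shapes and names are illustrative assumptions) models a feature map as a list of 2-D channels:

```python
def stitch(map_a, map_b):
    """Concatenate two feature maps (lists of 2-D channels) along the
    channel dimension, as in the stitching step described above."""
    return map_a + map_b

# Illustrative shapes: a deep (low-dimensional) map with 8 channels and a
# shallower intermediate (higher-dimensional) map with 16 channels,
# both 4x4 spatially.
first_map = [[[0.0] * 4 for _ in range(4)] for _ in range(8)]
intermediate_map = [[[0.0] * 4 for _ in range(4)] for _ in range(16)]
second_map = stitch(first_map, intermediate_map)
```

In a real network the spatial sizes of the two maps must match before concatenation; that alignment step is omitted here.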
The following further illustrates aspects of embodiments of the present invention by describing the attribute detection process for two pedestrian images. Take the detected attributes to be the backpack and the glasses, and the preset viewing angle attribute vector to be:
front: [ 1 0 1 ]
back: [ 0 1 0 ]
Fig. 2 and 3 are schematic diagrams of two images to be processed. After the images of Fig. 2 and Fig. 3 have each been subjected to multiple neural network processes, their first feature maps are obtained. The viewing angle of the image in Fig. 2 is determined to be the back according to the image semantic features of its first feature map, so the obtained constraint vector is [ 0 1 0 ]. According to the image semantic features of this first feature map, the attribute prediction vector is [ 0 1 1 ]; that is, the prediction for the image in Fig. 2 is that it shows the back of a pedestrian who carries a backpack and wears glasses. Constraining the attribute prediction vector [ 0 1 1 ] by the constraint vector [ 0 1 0 ] yields the attribute detection result [ 0 1 0 ]. That is, the final detection result for the image in Fig. 2 is that it shows the back of a pedestrian who carries a backpack but does not wear glasses.
The viewing angle of the image in Fig. 3 is determined to be the front according to the image semantic features of its first feature map, so the obtained constraint vector is [ 1 0 1 ]. The attribute prediction vector obtained from this first feature map is [ 1 1 0 ]; that is, the prediction for the image in Fig. 3 is that it shows the front of a pedestrian who carries a backpack and does not wear glasses. Constraining the attribute prediction vector [ 1 1 0 ] by the constraint vector [ 1 0 1 ] yields the attribute detection result [ 1 0 0 ]. That is, the final detection result for the image in Fig. 3 is that it shows the front of a pedestrian who neither carries a backpack nor wears glasses. Obtaining the constraint vector of the image to be processed from the preset viewing angle attribute vector to correct its attribute prediction vector thus yields a more accurate attribute detection result.
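The constraint step in both worked examples is an element-wise product of the prediction and constraint vectors, which can be sketched as follows (vectors taken from the examples above):

```python
def apply_constraint(prediction, constraint):
    """Step S150 as an element-wise product (what the multiplier computes)."""
    return [p * c for p, c in zip(prediction, constraint)]

# Fig. 2 (back view):  prediction [0, 1, 1], constraint [0, 1, 0]
result_fig2 = apply_constraint([0, 1, 1], [0, 1, 0])  # [0, 1, 0]
# Fig. 3 (front view): prediction [1, 1, 0], constraint [1, 0, 1]
result_fig3 = apply_constraint([1, 1, 0], [1, 0, 1])  # [1, 0, 0]
```

With the probabilistic viewing angle attribute vector, the same element-wise product scales each predicted attribute score instead of hard-masking it.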
According to the technical scheme of the embodiment of the invention, a plurality of neural network modules are controlled to process the image to be processed, and an attribute prediction vector and a constraint vector are obtained according to the resulting first feature map; the constraint vector then constrains the attribute prediction vector to produce the attribute detection result, improving both the efficiency and the accuracy of attribute detection.
Fig. 4 and 5 are image data flow diagrams of embodiments of the present invention. As shown in Fig. 4, the 1st to Nth neural network modules 41 to 4N are controlled to sequentially perform neural network processing on the input image Pin to obtain a first feature map M, which is output to the adaptive neural network sub-module 4a of the attribute prediction module and to the viewing angle detection module 4b. The first feature map M is processed in sequence by the inceptionA, inceptionB, inceptionC and inceptionD models in the adaptive neural network sub-module 4a to obtain an attribute prediction vector m. The viewing angle detection module 4b extracts viewing angle information from the first feature map M and acquires a constraint vector n from the preset viewing angle attribute vector according to this information. The attribute prediction vector m and the constraint vector n are then passed through the multiplier 4c, which outputs the attribute detection result Pout for the input image Pin. It should be understood that the number of Inception models in this embodiment is only exemplary; the type and number of Inception models may be configured according to the actual application scenario.
In another optional implementation, some of the plurality of neural network modules may be controlled to output intermediate feature maps to the adaptive neural network sub-module, which performs feature stitching of high-dimensional and low-dimensional features on the intermediate feature maps and the first feature map to obtain the second feature map, further improving the accuracy of attribute detection. As shown in Fig. 5, the 1st neural network module 51 to the Nth neural network module 5N are controlled to sequentially process the input image Pin1, so that the (N-2)th neural network module 5N-2 outputs the intermediate feature map M2 to the inceptionC model in the adaptive neural network sub-module 5a, the (N-1)th neural network module 5N-1 outputs the intermediate feature map M1 to the inceptionB model, and the Nth neural network module 5N outputs the first feature map M' to the inceptionA model and the viewing angle detection module 5b. As shown in Fig. 5, the inceptionB model processes the output of the inceptionA model stitched with the intermediate feature map M1, and the inceptionC model processes the output of the inceptionB model stitched with the intermediate feature map M2. The inceptionD model processes the output of the inceptionC model to obtain an attribute prediction vector m1. The viewing angle detection module 5b extracts viewing angle information from the first feature map M' and obtains a constraint vector n1 from the preset viewing angle attribute vector according to this information. The attribute prediction vector m1 and the constraint vector n1 are then passed through the multiplier 5c, which outputs the attribute detection result Pout1 for the input image Pin1.
The image processing from the intermediate feature map M2 through the intermediate feature map M1 to the first feature map M' may be regarded as a dimension reduction process. That is, the feature dimension of the intermediate feature map M2 is greater than that of the intermediate feature map M1, which in turn is greater than that of the first feature map M'; the dimension reduction process extracts the key features.
Therefore, this implementation sequentially feature-stitches the first feature map and the at least one intermediate feature map, from low-dimensional features to high-dimensional features, through the corresponding Inception models to acquire the second feature map, retaining better detail information while acquiring stronger image semantic feature information, which further improves the accuracy of attribute prediction.
FIG. 6 is a schematic diagram of a neural network module of an embodiment of the present invention. In this embodiment, the neural network modules are all convolutional neural network modules, each of which may be structured as shown in Fig. 6. The neural network module 6 of this embodiment includes a convolution and pooling layer 61 and a BN (Batch Normalization) layer 62. The convolution and pooling layer 61 performs convolution and pooling operations on the input data. The BN layer 62 normalizes the input data while constraining the neural network to automatically adjust the strength of the normalization during training, which reduces the cost of weight initialization and increases the convergence speed of the model.
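The normalization performed by the BN layer can be sketched as follows (a simplified, stdlib-only illustration that omits the learned scale and shift parameters a real BN layer also applies):

```python
import math

def batch_normalize(activations, eps=1e-5):
    """Normalize a flat list of activations to zero mean and unit
    variance, as the BN layer 62 does (learned scale/shift omitted)."""
    mean = sum(activations) / len(activations)
    var = sum((a - mean) ** 2 for a in activations) / len(activations)
    # eps guards against division by zero for constant inputs.
    return [(a - mean) / math.sqrt(var + eps) for a in activations]

normalized = batch_normalize([1.0, 2.0, 3.0, 4.0])
```

In a convolutional network the statistics are computed per channel over the batch and spatial dimensions; the flat-list form above shows only the core normalization.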
FIG. 7 is a schematic diagram of an Inception model according to an embodiment of the present invention. The Inception model automatically extracts features through a convolutional neural network, and its parallel compression of the image increases the depth and width of the neural network while reducing parameters, thereby accelerating training. As shown in Fig. 7, the Inception model 7 of this embodiment improves on the prior-art Inception model. The Inception model 7 includes three branches. By adding an S-E layer (squeeze-and-excitation layer) at the end of two of the convolution branches to generate a weight for each feature channel, the attribute features in the input feature map are calibrated, which can improve the accuracy of attribute detection.
Fig. 8 is a schematic diagram of a squeeze-and-excitation (S-E) layer of an embodiment of the present invention. The S-E layer improves the expressive ability of the trained model by exploiting the interactions among the feature channels of a feature map. As shown in Fig. 8, the input feature map first undergoes the squeeze operation, that is, it is processed in sequence by the global pooling layer 81 and the fully connected layer 82. Specifically, the global pooling layer 81 compresses the spatial information of each channel's global receptive field into a descriptor, and the fully connected layer 82 acquires the global receptive field information of the input feature map from this descriptor. Then the excitation operation is performed: the fully connected layer 83 evaluates each feature channel according to the global receptive field information to generate a weight per channel. The weights are passed through a sigmoid function and applied to the corresponding original feature channels, completing the calibration of the original features along the channel dimension.
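The squeeze, excitation, and recalibration steps can be sketched as follows (a stdlib-only illustration for a single feature map; the weight matrices `w1` and `w2` stand in for the two fully connected layers and are assumptions for illustration, not the patent's trained weights):

```python
import math

def squeeze_excite(feature_map, w1, w2):
    """S-E layer sketch: feature_map is a list of 2-D channels."""
    # Squeeze: global average pooling collapses each channel's spatial
    # information into a single descriptor value.
    descriptor = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                  for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one weight per channel.
    hidden = [max(0.0, sum(w * d for w, d in zip(row, descriptor)))
              for row in w1]
    weights = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
               for row in w2]
    # Recalibration: scale each original channel by its learned weight.
    return [[[v * wt for v in r] for r in ch]
            for ch, wt in zip(feature_map, weights)]
```

With identity weight matrices the effect is easy to see: a channel with a larger mean activation receives a sigmoid weight closer to 1, so its features are emphasized relative to weaker channels.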
The expressive ability of the feature map group at this position is thereby enhanced. That is, by adding the S-E layer to the attribute detection model, the model can automatically learn the importance of each feature channel, and then enhance useful features and suppress features that are not useful for the current task according to that importance. This improves the accuracy of detecting attributes with large position changes (for example, a satchel on the left or right side of a pedestrian) and small-scale attributes (for example, glasses, hats, and the like).
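As an illustration, the Squeeze, Excitation, and channel-recalibration steps described above can be sketched in plain NumPy. This is a minimal sketch only; the weight matrices `w1` and `w2`, the channel count, and the reduction ratio are hypothetical stand-ins for parameters that would be learned during training.

```python
import numpy as np

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation recalibration of a (C, H, W) feature map."""
    # Squeeze: global average pooling collapses each channel's spatial
    # information into a single per-channel descriptor
    z = feature_map.mean(axis=(1, 2))            # shape (C,)
    # Excitation: two fully connected layers (a bottleneck) followed by a
    # sigmoid produce one weight per channel
    s = np.maximum(z @ w1, 0.0)                  # ReLU, shape (C // r,)
    w = 1.0 / (1.0 + np.exp(-(s @ w2)))          # sigmoid, shape (C,)
    # Scale: reweight each original channel by its learned importance
    return feature_map * w[:, None, None]

rng = np.random.default_rng(0)
c, r = 8, 2                                      # 8 channels, reduction ratio 2
x = rng.standard_normal((c, 4, 4))
w1 = rng.standard_normal((c, c // r))
w2 = rng.standard_normal((c // r, c))
y = se_block(x, w1, w2)
print(y.shape)  # (8, 4, 4)
```

Because the sigmoid weights lie strictly between 0 and 1, each output channel is the corresponding input channel scaled down according to its estimated importance, which matches the recalibration behavior described above.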
FIG. 9 is a diagram of another Inception model according to an embodiment of the present invention. As shown in fig. 9, the Inception model 9 of the present embodiment includes four branches, and adds an S-E layer (Squeeze-and-Excitation layer) at the ends of three of the convolution branches to generate a weight for each feature channel, which calibrates the attribute features in the input feature map and thereby improves the accuracy of attribute detection. Compared with the Inception model 7 in fig. 7, the Inception model 9 further reduces the size of the convolution kernels, thereby further reducing the number of parameters and improving data processing efficiency.
It should be understood that the embodiment is only illustrated by the Inception model 7 and the Inception model 9 in fig. 7 and fig. 9; other schemes that add an S-E layer at the end of at least one branch of an existing Inception model are within the protection scope of the embodiment.
Fig. 10 is a schematic diagram of an attribute detection apparatus according to another embodiment of the present invention. As shown in fig. 10, the attribute detection apparatus 1 of the present embodiment includes an acquisition module 11, a 1st neural network module 12, a 2nd neural network module 13, a 3rd neural network module 14, a 4th neural network module 15, a view angle detection module 16, an attribute prediction module 17, and a detection result acquisition module 18. The acquisition module 11 is configured to acquire an image to be processed. The 1st neural network module 12, the 2nd neural network module 13, the 3rd neural network module 14, and the 4th neural network module 15 are configured to process the image to be processed in sequence so as to output a first feature map to the attribute prediction module 17 and the view angle detection module 16. These may be convolutional neural network modules that, for example, sequentially perform convolution operations with different convolution kernels on the image to be processed to obtain the first feature map.
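The sequential convolution pipeline described above can be sketched in plain NumPy. This is an illustrative sketch only: the kernel sizes and averaging-kernel values are hypothetical, whereas the patent's actual modules use learned convolution kernels over multi-channel feature maps.

```python
import numpy as np

def conv2d(image, kernel):
    # Naive valid-mode 2D convolution for a single-channel image
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
# Two sequential stages with different (here: averaging) convolution kernels
kernels = [np.ones((3, 3)) / 9.0, np.ones((2, 2)) / 4.0]
feature_map = image
for k in kernels:
    feature_map = conv2d(feature_map, k)
print(feature_map.shape)  # (3, 3)
```

Each stage shrinks the spatial extent while aggregating context, and the final `feature_map` plays the role of the first feature map handed to the downstream modules.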
The view angle detection module 16 is configured to obtain a constraint vector from the first feature map. The view angle detection module 16 includes a view angle information acquisition sub-module 161 and a constraint vector acquisition sub-module 162. The view angle information acquisition sub-module 161 is configured to predict the view angle information of the first feature map according to the image semantic feature information of the first feature map. The constraint vector acquisition sub-module 162 is configured to acquire the constraint vector from a preset view angle attribute vector according to the obtained view angle information. The preset view angle attribute vector comprises constraint values of different attributes of the image at each view angle.
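For illustration, the lookup from predicted view angle to constraint vector can be sketched as follows. The table values, the set of view angles, and the attribute order are hypothetical; the actual preset view angle attribute vector is defined by the application.

```python
import numpy as np

# Hypothetical preset view-angle attribute table: one row per view angle
# (front, back, side), one column per attribute (backpack, satchel, glasses)
view_attribute_table = np.array([
    [0.0, 1.0, 1.0],   # front: a backpack is not visible from the front
    [1.0, 1.0, 0.0],   # back:  glasses are not visible from the back
    [1.0, 1.0, 1.0],   # side:  no attribute is ruled out
])

def constraint_vector(view_index):
    # The predicted view angle simply indexes the preset table
    return view_attribute_table[view_index]

print(constraint_vector(0))  # [0. 1. 1.]
```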
The attribute prediction module 17 is configured to obtain an attribute prediction vector from the first feature map. The attribute prediction module 17 includes an adaptive neural network sub-module 171 and an attribute prediction sub-module 172. The adaptive neural network sub-module 171 is configured to process the first feature map to obtain a second feature map. The adaptive neural network includes at least one Inception model, so that parameters are reduced while the depth and width of the neural network are increased. The attribute prediction sub-module 172 is configured to obtain the attribute prediction vector from the second feature map.
The detection result obtaining module 18 is configured to constrain the attribute prediction vector by the constraint vector to obtain an attribute detection result. For example, if the attribute prediction for the backpack in the attribute prediction vector is 1 for a front-view image of a pedestrian, it is forcibly modified to 0 by the constraint vector, so that the backpack attribute is absent from the output attribute detection result. Since a backpack is usually carried on the back and is therefore not visible in a front view of a pedestrian, the backpack attribute should not appear in the output.
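A minimal sketch of how this constraint step might work, under the assumption that both vectors are binary and the constraint acts as an element-wise mask (the attribute order is hypothetical):

```python
import numpy as np

attribute_prediction = np.array([1, 0, 1])   # backpack, satchel, glasses
front_constraint = np.array([0, 1, 1])       # backpack impossible in a front view
# Element-wise masking forces impossible attributes to 0
detection_result = attribute_prediction * front_constraint
print(detection_result)  # [0 0 1]
```

The backpack prediction of 1 is overridden to 0, matching the example above.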
It should be understood that the number of neural network modules in the present embodiment is merely exemplary; the number of neural networks in the attribute detection model should correspond to the actual application scenario.
According to the technical scheme of the embodiment of the invention, a plurality of neural network modules are controlled to process the image to be processed, and the attribute prediction vector and the constraint vector are obtained from the resulting first feature map; the constraint vector then restricts the attribute prediction vector to produce the attribute detection result. Both the efficiency and the accuracy of attribute detection can thereby be improved.
In another alternative implementation, the adaptive neural network sub-module 171 may include a feature stitching unit 171a configured to perform feature stitching on the first feature map and at least one intermediate feature map to obtain the second feature map. An intermediate feature map is an intermediate output produced while the neural networks successively process the image to be processed. Preferably, the feature stitching unit 171a may include a dimension feature stitching subunit (not shown in the figure) configured to stitch the first feature map and the at least one intermediate feature map sequentially, from low-dimensional features to high-dimensional features, through the corresponding Inception models to obtain the second feature map. Richer detail information and stronger image semantic feature information can thus be obtained, further improving the accuracy of attribute prediction.
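Feature stitching along the channel dimension can be sketched as follows. The channel counts and spatial size are hypothetical, and in the actual model the stitched maps would additionally pass through the corresponding Inception models rather than being concatenated directly.

```python
import numpy as np

# Hypothetical feature maps with matching spatial size: two intermediate
# (lower-dimensional) outputs and the final first feature map
low = np.zeros((16, 8, 8))      # early intermediate feature map, 16 channels
mid = np.zeros((32, 8, 8))      # later intermediate feature map, 32 channels
first = np.zeros((64, 8, 8))    # first feature map from the last module

# Stitch from low-dimensional to high-dimensional features along the
# channel axis to form the second feature map
second = np.concatenate([low, mid, first], axis=0)
print(second.shape)  # (112, 8, 8)
```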
Fig. 11 is a schematic diagram of an electronic device of an embodiment of the invention. The electronic device shown in fig. 11 is a general-purpose data processing apparatus comprising a general-purpose computer hardware structure that includes at least a processor 111 and a memory 112, connected by a bus 113. The memory 112 is adapted to store instructions or programs executable by the processor 111. The processor 111 may be a stand-alone microprocessor or a collection of one or more microprocessors. The processor 111 implements the processing of data and the control of other devices by executing the instructions stored in the memory 112, thereby performing the method flows of the embodiments of the present invention described above. The bus 113 connects the above components together and also connects them to a display controller 114, a display device, and input/output (I/O) devices 115. The input/output (I/O) devices 115 may be a mouse, keyboard, modem, network interface, touch input device, motion sensing input device, printer, or other devices known in the art. Typically, the input/output devices 115 are coupled to the system through input/output (I/O) controllers 116.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus (device) or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations of methods, apparatus (devices) and computer program products according to embodiments of the invention. It will be understood that each flow in the flow diagrams can be implemented by computer program instructions.
These computer program instructions may be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows.
These computer program instructions may also be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows.
The method of the embodiment of the present invention may be carried out by hardware chips such as a CPU or a GPU (Graphics Processing Unit). A hardware chip may include multiple processing cores to perform the methods of embodiments of the present invention, and the plurality of processing cores may share a common memory. The common memory is configured to store executable instructions that, when executed, perform the methods of embodiments of the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. An attribute detection method, comprising:
acquiring an image to be processed;
sequentially carrying out a plurality of neural network processing on the image to be processed to obtain a first characteristic diagram;
acquiring a constraint vector according to the first feature map, wherein the constraint vector comprises a constraint value of a visual angle;
acquiring an attribute prediction vector according to the first feature map;
restricting the attribute prediction vector through the constraint vector to obtain an attribute detection result;
wherein obtaining a constraint vector according to the first feature map comprises:
predicting view information of the first feature map;
and acquiring the constraint vector from a preset view angle attribute vector according to the view angle information, wherein the view angle attribute vector comprises constraint values of different attributes of the image at each visual angle, and the view angle attribute vector is probability constraint of each attribute.
2. The detection method according to claim 1, wherein obtaining an attribute prediction vector according to the first feature map comprises:
controlling an adaptive neural network to process the first feature map to obtain a second feature map;
acquiring the attribute prediction vector according to the second feature map;
wherein the adaptive neural network comprises at least one Inception model, such that the attribute detection model reduces parameters while increasing the neural network depth and width.
3. The detection method according to claim 2, wherein a squeeze-and-excitation layer is deployed on at least one branch of the Inception model to generate a weight for each feature channel, and the weight is used for calibrating the attribute feature in the first feature map.
4. The detection method according to claim 2 or 3, wherein controlling the adaptive neural network to process the first feature map to obtain a second feature map comprises:
performing feature splicing on the first feature map and at least one intermediate feature map to obtain a second feature map;
wherein the intermediate feature map is an intermediate output produced during the plurality of neural network processes on the image to be processed.
5. The detection method according to claim 4, wherein the feature stitching the first feature map and the at least one intermediate feature map to obtain the second feature map comprises:
and sequentially splicing the first feature map and the at least one intermediate feature map from low-dimensional features to high-dimensional features through corresponding Inception models to obtain the second feature map.
6. An attribute detection device, comprising:
an acquisition module configured to acquire an image to be processed;
the plurality of neural network modules are used for sequentially carrying out a plurality of neural network processes on the image to be processed so as to obtain a first characteristic diagram;
a visual angle detection module configured to obtain a constraint vector according to the first feature map, wherein the constraint vector comprises a constraint value of a visual angle;
an attribute prediction module configured to obtain an attribute prediction vector according to the first feature map; and
a detection result obtaining module configured to restrict the attribute prediction vector by the constraint vector to obtain an attribute detection result;
wherein the viewing angle detection module comprises:
a visual angle information acquisition sub-module configured to predict visual angle information of the first feature map; and
and the constraint vector acquisition submodule is configured to acquire the constraint vector from a preset view angle attribute vector according to the view angle information, the view angle attribute vector comprises constraint values of different attributes of the image at each visual angle, and the view angle attribute vector is probability constraint of each attribute.
7. The attribute detection device of claim 6, wherein the attribute prediction module comprises:
an adaptive neural network sub-module configured to process the first feature map to obtain a second feature map; and
an attribute prediction sub-module configured to obtain the attribute prediction vector according to the second feature map;
wherein the adaptive neural network comprises at least one Inception model such that parameters are reduced while neural network depth and width are increased.
8. The attribute detection apparatus according to claim 7, wherein a squeeze-and-excitation layer is deployed on at least one branch of the Inception model to generate a weight for each feature channel, the weight being used to calibrate the attribute feature in the first feature map.
9. The attribute detection device of claim 7 or 8, wherein the adaptive neural network sub-module comprises:
a feature stitching unit configured to perform feature stitching on the first feature map and at least one intermediate feature map to obtain the second feature map;
wherein the intermediate feature map is an intermediate output produced during the plurality of neural network processes on the image to be processed.
10. The attribute detection device according to claim 9, wherein the feature concatenation unit includes:
and the dimension feature splicing subunit is configured to splice the first feature map and the at least one intermediate feature map from low-dimensional features to high-dimensional features sequentially through corresponding Inception models to obtain the second feature map.
11. A computer-readable storage medium, on which a computer program is stored, characterized in that the program is executed by a processor to implement the method according to any of claims 1-5.
12. An electronic device comprising a memory and a processor, wherein,
the memory is to store one or more computer instructions, wherein the one or more computer instructions are to be executed by the processor to implement the method of any one of claims 1-5.
CN201811326781.8A 2018-11-08 2018-11-08 Attribute detection method and attribute detection device Active CN109447021B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811326781.8A CN109447021B (en) 2018-11-08 2018-11-08 Attribute detection method and attribute detection device
PCT/CN2019/113383 WO2020093884A1 (en) 2018-11-08 2019-10-25 Attribute detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811326781.8A CN109447021B (en) 2018-11-08 2018-11-08 Attribute detection method and attribute detection device

Publications (2)

Publication Number Publication Date
CN109447021A CN109447021A (en) 2019-03-08
CN109447021B true CN109447021B (en) 2020-11-27

Family

ID=65551185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811326781.8A Active CN109447021B (en) 2018-11-08 2018-11-08 Attribute detection method and attribute detection device

Country Status (2)

Country Link
CN (1) CN109447021B (en)
WO (1) WO2020093884A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447021B (en) * 2018-11-08 2020-11-27 北京灵汐科技有限公司 Attribute detection method and attribute detection device
CN112183299B (en) * 2020-09-23 2024-02-09 成都佳华物链云科技有限公司 Pedestrian attribute prediction method and device, electronic equipment and storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012002047A1 (en) * 2010-06-30 2012-01-05 Necソフト株式会社 Attribute determination method, attribute determination device, program, recording medium and attribute determination system
AU2014240213B2 (en) * 2014-09-30 2016-12-08 Canon Kabushiki Kaisha System and Method for object re-identification
US10346726B2 (en) * 2014-12-15 2019-07-09 Samsung Electronics Co., Ltd. Image recognition method and apparatus, image verification method and apparatus, learning method and apparatus to recognize image, and learning method and apparatus to verify image
CN105740903B (en) * 2016-01-29 2019-01-25 北京大学 More attribute recognition approaches and device
CN105912990B (en) * 2016-04-05 2019-10-08 深圳先进技术研究院 The method and device of Face datection
CN106951840A (en) * 2017-03-09 2017-07-14 北京工业大学 A kind of facial feature points detection method
CN108229478B (en) * 2017-06-30 2020-12-29 深圳市商汤科技有限公司 Image semantic segmentation and training method and device, electronic device, storage medium, and program
CN108062574B (en) * 2017-12-31 2020-06-16 厦门大学 Weak supervision target detection method based on specific category space constraint
CN108596243B (en) * 2018-04-20 2021-09-10 西安电子科技大学 Eye movement gaze prediction method based on hierarchical gaze view and conditional random field
CN108984724B (en) * 2018-07-10 2021-09-28 凯尔博特信息科技(昆山)有限公司 Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation
CN109447021B (en) * 2018-11-08 2020-11-27 北京灵汐科技有限公司 Attribute detection method and attribute detection device

Also Published As

Publication number Publication date
CN109447021A (en) 2019-03-08
WO2020093884A1 (en) 2020-05-14

Similar Documents

Publication Publication Date Title
US10957073B2 (en) Method and apparatus for recognizing image and method and apparatus for training recognition model based on data augmentation
TWI746674B (en) Type prediction method, device and electronic equipment for identifying objects in images
KR102400017B1 (en) Method and device for identifying an object
WO2018121690A1 (en) Object attribute detection method and device, neural network training method and device, and regional detection method and device
US9349076B1 (en) Template-based target object detection in an image
CN103136504B (en) Face identification method and device
CN111192292A (en) Target tracking method based on attention mechanism and twin network and related equipment
CN106897746B (en) Data classification model training method and device
US9639779B2 (en) Feature point detection device, feature point detection method, and computer program product
CN111340195B (en) Training method and device for network model, image processing method and storage medium
US20130251246A1 (en) Method and a device for training a pose classifier and an object classifier, a method and a device for object detection
US10832032B2 (en) Facial recognition method, facial recognition system, and non-transitory recording medium
US20160275415A1 (en) Reader learning method and device, data recognition method and device
CN107644032A (en) Outlier detection method and apparatus
CN108875474A (en) Assess the method, apparatus and computer storage medium of face recognition algorithms
US9904843B2 (en) Information processing device, information processing method, and program
CN109447021B (en) Attribute detection method and attribute detection device
CN111401339A (en) Method and device for identifying age of person in face image and electronic equipment
KR20180107988A (en) Apparatus and methdo for detecting object of image
CN110674685A (en) Human body analytic segmentation model and method based on edge information enhancement
CN111768457A (en) Image data compression method, device, electronic equipment and storage medium
CN108268840B (en) Face tracking method and device
Kashika et al. Deep learning technique for object detection from panoramic video frames
CN110533184B (en) Network model training method and device
US20190066285A1 (en) Image inspection apparatus, image inspection method, and image inspection program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant