CN108229559B - Clothing detection method, clothing detection device, electronic device, program, and medium


Info

Publication number: CN108229559B
Application number: CN201711498336.5A
Authority: CN (China)
Prior art keywords: clothing, human body, network, detection, key point
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN108229559A
Inventors: 陈益民 (Chen Yimin), 陈海峰 (Chen Haifeng), 张伟 (Zhang Wei)
Assignee (current and original): Shenzhen Sensetime Technology Co Ltd
Application filed by Shenzhen Sensetime Technology Co Ltd
Priority claimed from application CN201711498336.5A
Publication of application: CN108229559A
Application granted; publication of grant: CN108229559B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features


Abstract

Embodiments of the invention disclose a clothing detection method, a clothing detection apparatus, an electronic device, a program, and a medium. The method includes: extracting predicted human body key points from an image to be detected; extracting clothing features from the image to be detected; and performing clothing detection according to the clothing features and the predicted human body key points to obtain a clothing detection result. By extracting the predicted human body key points and the clothing features from the image to be detected and combining them, a clothing detection result is obtained; because human body key points provide better context information for clothing detection, using them to assist detection improves clothing detection accuracy.

Description

Clothing detection method, clothing detection device, electronic device, program, and medium
Technical Field
The present invention relates to computer vision technology, and in particular to a clothing detection method, apparatus, electronic device, program, and medium.
Background
The development of smartphones and the mobile internet has produced massive amounts of image data, and computer vision technologies based on such data have developed rapidly. Within computer vision, detecting objects in images is an important task and the basis of object recognition. At present, big-data-driven deep learning is a major focus in the field of artificial intelligence and is commonly applied to object detection in computer vision.
Detecting human apparel in images by means of object recognition can be used to make personalized recommendations for users and improve user experience. At present, before image recognition is performed on a picture containing apparel, relevant apparel features are collected in advance, and the whole picture is compared for similarity against the pre-collected features. Because every feature in the whole picture is compared against the collected apparel features, the accuracy of feature recognition is low.
Disclosure of Invention
Embodiments of the present invention provide a technical solution for clothing detection.
According to an aspect of the embodiments of the present invention, there is provided a clothing detection method, including:
extracting predicted human body key points from an image to be detected;
extracting clothing features from the image to be detected;
and performing clothing detection according to the clothing features and the predicted human body key points to obtain a clothing detection result.
In an embodiment of the present invention, extracting the predicted human body key points from the image to be detected includes: obtaining the predicted human body key points in the image to be detected based on a human body key point network;
and/or,
extracting the clothing features from the image to be detected includes: obtaining the clothing features in the image to be detected based on a clothing detection network.
In another embodiment of the present invention, the human body key point network and the clothing detection network share the 1st to Nth convolutional layers, which constitute a shared network layer, where N is an integer greater than or equal to 2.
In still another embodiment of the present invention, the human body key point network includes: the shared network layer and a key point detection branch layer;
the features output by the shared network layer are shared-layer features, which serve as the human body key point features;
the key point detection branch layer includes M convolutional layers, where M is an integer greater than or equal to 1;
the key point detection branch layer regresses the positions of the human body key points from the human body key point features output by the shared network layer to obtain the predicted human body key points.
In yet another embodiment of the present invention, the clothing detection network includes: the shared network layer and a clothing detection branch layer;
the features output by the shared network layer are shared-layer features, which serve as the clothing features;
the clothing detection branch layer includes: a pooling layer and a fully-connected layer;
the clothing detection branch layer is configured to detect the position information of the apparel in the image to be detected based on the predicted human body key points and the clothing features.
In another embodiment of the present invention, performing clothing detection according to the clothing features and the predicted human body key points includes:
generating at least one candidate box based on the predicted human body key points, the candidate boxes framing the regions of the predicted human body key points in the image to be detected;
performing clothing detection according to the clothing features and the at least one candidate box, and outputting a clothing detection result; the clothing detection result includes: apparel position information and/or apparel classification information.
In another embodiment of the present invention, performing clothing detection according to the clothing features and the at least one candidate box includes:
obtaining, based on the clothing detection network, the apparel position information in the image to be detected according to the clothing features and the at least one candidate box;
obtaining, based on a clothing classification network, the apparel classification information in the image to be detected according to the clothing features, the apparel position information, and the at least one candidate box.
In yet another embodiment of the present invention, the human body key point network, the clothing detection network, and the clothing classification network share the 1st to Nth convolutional layers, which constitute the shared network layer, where N is an integer greater than or equal to 2.
In yet another embodiment of the present invention, the clothing classification network includes: the shared network layer and a clothing classification branch layer; the features output by the shared network layer are shared-layer features, which serve as the clothing classification features; the clothing classification branch layer includes: an alignment layer and V convolutional layers, where V is an integer greater than or equal to 1; the clothing classification branch layer is configured to detect the classification information of the apparel in the image to be detected based on the clothing detection result and the clothing classification features.
In yet another embodiment of the present invention, the predicted human body key points include at least one of: a left shoulder point, a left arm point, a left hand point, a right shoulder point, a right arm point, a right hand point, a left waist point, a right waist point, a left knee point, a left foot point, a right knee point, and a right foot point.
In yet another embodiment of the present invention, the at least one candidate box includes at least one of:
the minimum rectangle enclosing the left and right shoulder points and the two waist points;
the minimum rectangle enclosing the left and right shoulder points and the left and right knee points;
the minimum rectangle enclosing the two waist points and the left and right knee points;
the minimum rectangle enclosing the two waist points and the left and right foot points;
the minimum rectangle enclosing the left and right shoulder points and the left and right foot points.
In still another embodiment of the present invention, the method further includes:
training, with a first sample image, the human body key point network and the clothing detection network used to implement the clothing detection method; the first sample image is annotated with human body key point features and clothing features.
In another embodiment of the present invention, training, with the first sample image, the human body key point network and the clothing detection network used to implement the clothing detection method includes:
extracting the human body key point features in the first sample image using the human body key point network;
performing human body key point detection based on the human body key point features using the human body key point network to obtain the predicted human body key points of the first sample image;
extracting the clothing features in the first sample image using the clothing detection network;
performing clothing detection according to the clothing features extracted from the first sample image and the predicted human body key points using the clothing detection network to obtain a predicted clothing detection result;
training the human body key point network and the clothing detection network based on a first difference between the predicted human body key points in the first sample image and the annotated human body key point features and/or a second difference between the predicted clothing detection result and the annotated clothing features, until a preset condition is met.
In still another embodiment of the present invention, satisfying the preset condition includes at least one of:
training the human body key point network and the clothing detection network for a preset number of times;
the first difference is smaller than a first preset threshold;
the second difference is less than a second preset threshold.
In still another embodiment of the present invention, the method further includes:
training, with a second sample image, the clothing classification network used to implement the clothing detection method, together with the networks producing the clothing position information; the second sample image is annotated with human body key point features, clothing features, and clothing classification features.
In another embodiment of the present invention, training, with the second sample image, the clothing classification network used to implement the clothing detection method, together with the networks producing the clothing position information, includes:
extracting the human body key point features in the second sample image using the human body key point network;
performing human body key point detection based on the human body key point features using the human body key point network to obtain the predicted human body key points of the second sample image;
extracting the clothing features in the second sample image using the clothing detection network;
performing clothing detection according to the clothing features extracted from the second sample image and the predicted human body key points using the clothing detection network to obtain a predicted clothing detection result;
performing clothing classification according to the clothing classification features extracted from the second sample image and the predicted clothing detection result using the clothing classification network to obtain a predicted clothing classification result;
training the human body key point network, the clothing detection network, and the clothing classification network based on a first difference between the predicted human body key points in the second sample image and the annotated human body key point features, and/or a second difference between the predicted clothing detection result and the annotated clothing features, and/or a third difference between the predicted clothing classification result and the annotated clothing classification features, until a preset condition is met.
In still another embodiment of the present invention, satisfying the preset condition includes at least one of:
training the human body key point network, the clothing detection network, and the clothing classification network for a preset number of times;
the first difference is smaller than a first preset threshold;
the second difference is smaller than a second preset threshold;
the third difference is less than a third preset threshold.
According to another aspect of the embodiments of the present invention, there is provided a clothing detection apparatus, including:
a predicted key point extraction module, configured to extract predicted human body key points from an image to be detected;
a clothing feature extraction module, configured to extract clothing features from the image to be detected;
and a clothing detection module, configured to perform clothing detection according to the clothing features and the predicted human body key points to obtain a clothing detection result.
In an embodiment of the present invention, the predicted key point extraction module is specifically configured to obtain the predicted human body key points in the image to be detected based on a human body key point network;
the clothing feature extraction module is specifically configured to obtain the clothing features in the image to be detected based on a clothing detection network.
In another embodiment of the present invention, the human body key point network and the clothing detection network share the 1st to Nth convolutional layers, which constitute a shared network layer, where N is an integer greater than or equal to 2.
In still another embodiment of the present invention, the human body key point network includes the shared network layer and a key point detection branch layer;
the features output by the shared network layer are shared-layer features, which serve as the human body key point features;
the key point detection branch layer includes M convolutional layers, where M is an integer greater than or equal to 1;
the key point detection branch layer is configured to regress the positions of the human body key points from the human body key point features output by the shared network layer to obtain the predicted human body key points.
In yet another embodiment of the present invention, the clothing detection network includes the shared network layer and a clothing detection branch layer;
the features output by the shared network layer are shared-layer features, which serve as the clothing features;
the clothing detection branch layer includes a pooling layer and a fully-connected layer;
the clothing detection branch layer is configured to detect the position information of the apparel in the image to be detected based on the predicted human body key points and the clothing features.
In yet another embodiment of the present invention, the clothing detection module includes: a candidate box generating unit and a clothing detection unit;
the candidate box generating unit is configured to generate at least one candidate box based on the predicted human body key points, the candidate boxes framing the regions of the predicted human body key points in the image to be detected;
the clothing detection unit is configured to perform clothing detection according to the clothing features and the at least one candidate box and output a clothing detection result; the clothing detection result includes: apparel position information and/or apparel classification information.
In still another embodiment of the present invention, the clothing detection unit includes: an apparel position obtaining subunit and an apparel classification obtaining subunit;
the apparel position obtaining subunit is configured to obtain, based on the clothing detection network, the apparel position information in the image to be detected according to the clothing features and the at least one candidate box;
the apparel classification obtaining subunit is configured to obtain, based on a clothing classification network, the apparel classification information in the image to be detected according to the clothing features, the apparel position information, and the at least one candidate box.
In yet another embodiment of the present invention, the human body key point network, the clothing detection network, and the clothing classification network share the 1st to Nth convolutional layers, which constitute the shared network layer, where N is an integer greater than or equal to 2.
In yet another embodiment of the present invention, the clothing classification network includes: the shared network layer and a clothing classification branch layer; the features output by the shared network layer are shared-layer features, which serve as the clothing classification features; the clothing classification branch layer includes: an alignment layer and V convolutional layers, where V is an integer greater than or equal to 1; the clothing classification branch layer is configured to detect the classification information of the apparel in the image to be detected based on the clothing detection result and the clothing classification features.
In yet another embodiment of the present invention, the predicted human body key points include at least one of: a left shoulder point, a left arm point, a left hand point, a right shoulder point, a right arm point, a right hand point, a left waist point, a right waist point, a left knee point, a left foot point, a right knee point, and a right foot point.
In yet another embodiment of the present invention, the at least one candidate box includes at least one of:
the minimum rectangle enclosing the left and right shoulder points and the two waist points;
the minimum rectangle enclosing the left and right shoulder points and the left and right knee points;
the minimum rectangle enclosing the two waist points and the left and right knee points;
the minimum rectangle enclosing the two waist points and the left and right foot points;
the minimum rectangle enclosing the left and right shoulder points and the left and right foot points.
In yet another embodiment of the present invention, the apparatus further comprises:
an apparel position training module, configured to train, with the first sample image, the human body key point network and the clothing detection network of the clothing detection apparatus; the first sample image is annotated with human body key point features and clothing features.
In yet another embodiment of the present invention, the apparel position training module includes:
a human body key point extraction unit, configured to extract the human body key point features in the first sample image using the human body key point network;
a human body key point detection unit, configured to perform human body key point detection based on the human body key point features using the human body key point network, to obtain the predicted human body key points of the first sample image;
a clothing feature extraction unit, configured to extract the clothing features in the first sample image using the clothing detection network;
a predicted clothing detection unit, configured to perform clothing detection according to the clothing features extracted from the first sample image and the predicted human body key points using the clothing detection network, to obtain a predicted clothing detection result;
and an apparel position training unit, configured to train the human body key point network and the clothing detection network based on a first difference between the predicted human body key points in the first sample image and the annotated human body key point features and/or a second difference between the predicted clothing detection result and the annotated clothing features, until a preset condition is met.
In still another embodiment of the present invention, satisfying the preset condition includes at least one of:
training the human body key point network and the clothing detection network for a preset number of times;
the first difference is smaller than a first preset threshold;
the second difference is less than a second preset threshold.
In yet another embodiment of the present invention, the apparatus further comprises:
a clothing classification training module, configured to train, with a second sample image, the clothing classification network of the clothing detection apparatus, together with the networks producing the clothing position information; the second sample image is annotated with human body key point features, clothing features, and clothing classification features.
In another embodiment of the present invention, the clothing classification training module includes:
a human body key point extraction unit, configured to extract the human body key point features in the second sample image using the human body key point network;
a human body key point detection unit, configured to perform human body key point detection based on the human body key point features using the human body key point network, to obtain the predicted human body key points of the second sample image;
a clothing feature extraction unit, configured to extract the clothing features in the second sample image using the clothing detection network;
a predicted clothing detection unit, configured to perform clothing detection according to the clothing features extracted from the second sample image and the predicted human body key points using the clothing detection network, to obtain a predicted clothing detection result;
a predicted clothing classification unit, configured to perform clothing classification according to the clothing classification features extracted from the second sample image and the predicted clothing detection result using the clothing classification network, to obtain a predicted clothing classification result;
and a clothing classification training unit, configured to train the human body key point network, the clothing detection network, and the clothing classification network based on a first difference between the predicted human body key points in the second sample image and the annotated human body key point features, and/or a second difference between the predicted clothing detection result and the annotated clothing features, and/or a third difference between the predicted clothing classification result and the annotated clothing classification features, until a preset condition is met.
In still another embodiment of the present invention, satisfying the preset condition includes at least one of:
training the human body key point network, the clothing detection network, and the clothing classification network for a preset number of times;
the first difference is smaller than a first preset threshold;
the second difference is smaller than a second preset threshold;
the third difference is less than a third preset threshold.
According to another aspect of the embodiments of the present invention, there is provided an electronic device equipped with the clothing detection apparatus of any of the above embodiments.
According to still another aspect of the embodiments of the present invention, there is provided an electronic apparatus including:
a memory for storing executable instructions; and
a processor in communication with the memory for executing the executable instructions to perform the operations of the apparel detection method of any of the above embodiments.
According to yet another aspect of the embodiments of the present invention, there is provided a computer program comprising computer-readable code which, when run on a device, causes a processor in the device to execute instructions implementing the steps of the clothing detection method of any one of the above embodiments.
According to another aspect of the embodiments of the present invention, a computer-readable storage medium is provided for storing computer-readable instructions, which when executed, perform the operations of the steps in the clothing detection method according to any one of the embodiments.
With the clothing detection method, apparatus, electronic device, program, and medium provided by the embodiments of the invention, the predicted human body key points and the clothing features are extracted from the image to be detected, and the clothing detection result is obtained by combining them; because human body key points provide better context information for clothing detection, using them to assist detection improves clothing detection accuracy.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
The invention will be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of an embodiment of a method for detecting clothing of the present invention.
FIG. 2 is a flow chart of an embodiment of training a convolutional neural network for clothing detection in an embodiment of the present invention.
FIG. 3 is a diagram of candidate boxes generated from predicted human body key points according to an embodiment of the present invention.
Fig. 4 is a flowchart of another embodiment of a clothing detection method of the present invention.
FIG. 5 is a flow chart of another embodiment of training a convolutional neural network for clothing detection in an embodiment of the present invention.
Figure 6 is a flow chart of yet another embodiment of a method of apparel detection in accordance with the present invention.
Figure 7 is a schematic structural diagram of one embodiment of the clothing inspection device of the present invention.
Figure 8 is a schematic structural view of another embodiment of the clothing inspection device of the present invention.
Figure 9 is a schematic structural view of another embodiment of the apparel detection device of the present invention.
Fig. 10 is a schematic structural diagram of an embodiment of a clothing detection unit in the clothing detection device according to the invention.
Figure 11 is a schematic structural view of yet another embodiment of the apparel detection device of the present invention.
Fig. 12 is a schematic structural diagram of an embodiment of an electronic device according to the invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the invention are operational with numerous other general-purpose or special-purpose computing system environments or configurations, and with numerous other electronic devices such as terminal devices, computer systems, and servers. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with such electronic devices include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
FIG. 1 is a flow chart of an embodiment of a method for detecting clothing of the present invention. As shown in fig. 1, the clothing detection method of this embodiment includes:
101, extracting predicted human body key points from an image to be detected.
The image to be detected in the embodiment of the invention is an image containing apparel; it may be a static picture or a frame of a video, and serves as the image from which clothing features are recognized.
A predicted human body key point is a predicted key point of a body part on which apparel may be worn.
102, extracting clothing features from the image to be detected.
The clothing features are image features describing the apparel worn on the human body.
103, performing clothing detection according to the clothing features and the predicted human body key points to obtain a clothing detection result.
In the course of implementing the invention, the inventors found that clothing in images is currently detected by applying a convolutional neural network directly to the image; however, in most clothing images the clothing is worn on a human body, and differences in human pose can affect the recognition result.
In the embodiment of the invention, the predicted human body key points and the clothing features are extracted from the image to be detected and combined to obtain the clothing detection result.
As an example, the predicted human body key points and the clothing features may both be extracted based on a convolutional neural network; extracting them simultaneously with one end-to-end convolutional neural network improves the efficiency of clothing recognition.
Optionally, the predicted human body key points in the image to be detected may be extracted by obtaining them based on the human body key point network.
Further, the clothing features in the image to be detected may be extracted by obtaining them based on the clothing detection network.
As an example, in the present embodiment the human body key point network comprises the 1st to Lth convolutional layers, and the clothing detection network comprises the 1st to Pth convolutional layers, a pooling layer, and a fully-connected layer, where L and P are integers greater than 1 and may be equal or different.
The pooling layer in this embodiment may also be a region pooling layer, which pools each candidate box on the feature map into a feature map of fixed size.
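By way of illustration only (not the patent's implementation), this region pooling step matches torchvision's roi_pool operator; the feature-map shape, box coordinates, and stride below are assumptions:

    import torch
    from torchvision.ops import roi_pool

    # Shared-layer feature map: batch of 1, 256 channels, 64x48 spatial size (assumed).
    features = torch.randn(1, 256, 64, 48)

    # One candidate box per row: (batch_index, x1, y1, x2, y2) in input-image pixels.
    boxes = torch.tensor([[0.0, 10.0, 20.0, 120.0, 300.0]])

    # Pool every candidate box into a fixed 7x7 feature map; spatial_scale maps
    # image coordinates onto the downsampled feature map (1/16 for a stride-16 net).
    pooled = roi_pool(features, boxes, output_size=(7, 7), spatial_scale=1.0 / 16)
    print(pooled.shape)  # torch.Size([1, 256, 7, 7])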
In this embodiment, the convolution layer included in the human body key point network and the convolution layer included in the clothing detection network may be independent convolution layers respectively.
By way of example, in the embodiment of the present invention, the human body key point network and the clothing detection network share the 1st to Nth convolutional layers, which constitute a shared network layer, where N is an integer greater than or equal to 2.
The human body key point network comprises the shared network layer and a key point detection branch layer.
The features output by the shared network layer are shared-layer features, which serve as the human body key point features; the key point detection branch layer comprises M convolutional layers, where M is an integer greater than or equal to 1; the key point detection branch layer regresses the positions of the human body key points from the human body key point features output by the shared network layer to obtain the predicted human body key points.
The clothing detection network comprises the shared network layer and a clothing detection branch layer.
The features output by the shared network layer are shared-layer features, which serve as the clothing features;
the clothing detection branch layer comprises a pooling layer and a fully-connected layer, and detects the position information of the apparel in the image to be detected based on the predicted human body key points and the clothing features.
For ease of understanding, consider the example of the human body key point network and the clothing detection network sharing the 1st to Nth convolutional layers, as shown in fig. 2, where N = 5 and M = 2; that is, the shared network layer comprises convolutional layer 1, convolutional layer 2, convolutional layer 3, convolutional layer 4, and convolutional layer 5.
The human body key point network therefore comprises convolutional layers 1 through 7, while the clothing detection network comprises convolutional layers 1 through 5, a pooling layer, and a fully-connected layer.
Convolutional layers 1 to 5 form the shared network layer of the human body key point network and the clothing detection network, and convolutional layers 6 and 7 form the key point detection branch layer of the human body key point network.
As can be seen from fig. 2, the two networks compute the shared-layer features once and consume them in parallel: the shared-layer features serve both as human body key point features and as clothing features, so predicted key point detection and clothing feature extraction proceed simultaneously.
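A minimal PyTorch sketch of this shared-backbone, two-branch layout follows; the channel counts, kernel sizes, and heatmap-style key point head are illustrative assumptions, not the patent's exact architecture:

    import torch
    import torch.nn as nn

    class SharedBackbone(nn.Module):
        """Convolutional layers 1-5: the shared network layer."""
        def __init__(self, in_ch=3, ch=64):
            super().__init__()
            layers = []
            for i in range(5):  # five shared conv layers, as in fig. 2
                layers += [nn.Conv2d(in_ch if i == 0 else ch, ch, 3, padding=1),
                           nn.ReLU(inplace=True)]
            self.body = nn.Sequential(*layers)

        def forward(self, x):
            return self.body(x)  # shared-layer features

    class KeypointBranch(nn.Module):
        """Convolutional layers 6-7: regresses one heatmap per key point."""
        def __init__(self, ch=64, num_keypoints=12):
            super().__init__()
            self.conv6 = nn.Conv2d(ch, ch, 3, padding=1)
            self.conv7 = nn.Conv2d(ch, num_keypoints, 1)

        def forward(self, feats):
            return self.conv7(torch.relu(self.conv6(feats)))

    backbone = SharedBackbone()
    keypoint_head = KeypointBranch()
    image = torch.randn(1, 3, 256, 192)   # image to be detected (assumed size)
    shared = backbone(image)              # computed once ...
    heatmaps = keypoint_head(shared)      # ... used by the key point branch
    # the same `shared` tensor is also fed to the clothing detection branch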
As an example, clothing detection based on the extracted predicted human body key points may be implemented as follows:
In a first step, at least one candidate box is generated based on the predicted human body key points.
In the embodiment of the invention, the candidate boxes are used to frame the regions of the predicted human body key points in the image to be detected.
It should be noted that the predicted human body key points include any one or more of the following: a left shoulder point, a left arm point, a left hand point, a right shoulder point, a right arm point, a right hand point, a left waist point, a right waist point, a left knee point, a left foot point, a right knee point, and a right foot point.
The at least one candidate box includes any one or more of:
the minimum rectangle enclosing the left and right shoulder points and the two waist points;
the minimum rectangle enclosing the left and right shoulder points and the left and right knee points;
the minimum rectangle enclosing the two waist points and the left and right knee points;
the minimum rectangle enclosing the two waist points and the left and right foot points;
the minimum rectangle enclosing the left and right shoulder points and the left and right foot points.
A candidate box may be represented by coordinate information: either the coordinates of its four vertices, or the coordinates of its center point together with its width and height.
As shown in fig. 3, the figure contains the 12 predicted human body key points described above. Based on these key points, the 5 candidate boxes of fig. 3 can be extracted to cover the possible positions of the apparel.
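Since each candidate box is the minimum enclosing rectangle of a subset of key points, the five boxes can be computed directly from the predicted coordinates. Below is a sketch under assumed key point names; the apparel hints in the comments are assumptions, not taken from the patent:

    from typing import Dict, List, Tuple

    Point = Tuple[float, float]

    def min_rect(points: List[Point]) -> Tuple[float, float, float, float]:
        """Minimum axis-aligned rectangle (x1, y1, x2, y2) enclosing the points."""
        xs, ys = zip(*points)
        return min(xs), min(ys), max(xs), max(ys)

    def candidate_boxes(kp: Dict[str, Point]) -> List[Tuple[float, float, float, float]]:
        shoulders = [kp["left_shoulder"], kp["right_shoulder"]]
        waist     = [kp["left_waist"], kp["right_waist"]]
        knees     = [kp["left_knee"], kp["right_knee"]]
        feet      = [kp["left_foot"], kp["right_foot"]]
        return [
            min_rect(shoulders + waist),   # upper-body apparel
            min_rect(shoulders + knees),   # e.g. dresses, long coats
            min_rect(waist + knees),       # e.g. shorts, skirts
            min_rect(waist + feet),        # e.g. trousers
            min_rect(shoulders + feet),    # full-body apparel
        ]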
secondly, clothing detection is carried out according to the extracted clothing characteristics and at least one alternative frame, and clothing detection results are output; the clothing detection result comprises clothing position information.
In one embodiment of the present invention, the clothing detection result may further include a clothing category, that is, the clothing detection result includes clothing position information and a clothing category.
Illustratively, the apparel position information may be represented in the form of an apparel position box.
Similarly, the apparel position box may also be represented by coordinate information, in the same manner as the candidate box, which is not repeated here.
Apparel is a general term for articles that dress or decorate the human body. Apparel categories include clothes (further refined into coats, trousers, skirts, and the like), shoes, hats, socks, gloves, scarves, ties, bags, umbrellas, hair ornaments, and so on; categories may further be divided into menswear and womenswear, or by age group.
Optionally, after the predicted human body key points are detected, they may also be output so that they can be displayed.
As an example, the method for training a model for clothing detection of the present invention includes: training, with a first sample image, the human body key point network and the clothing detection network used to implement the clothing detection method.
The first sample image is annotated with human body key point features and clothing features; the clothing features may include annotation categories.
The first sample image is at least one image, that is, the first sample image may be one image or a plurality of images. The first sample image is used to train a model of the convolutional neural network.
It should be noted that the detection framework of the convolutional neural network of the present invention may be, for example, the Faster R-CNN (Faster Regions with Convolutional Neural Networks) framework or the SSD (Single Shot MultiBox Detector) framework.
Illustratively, a method of training, with a first sample image, the human body key point network and the clothing detection network used to implement the clothing detection method is shown in fig. 4 and comprises the following steps:
and 401, extracting the human body key point features in the first sample image by using the human body key point network.
And 402, detecting the human key points by utilizing a human key point network based on the human key point characteristics to obtain the human prediction key points of the first sample image.
And 403, extracting clothing features in the first sample image by utilizing the clothing detection network.
And 404, detecting the clothing according to the clothing characteristics in the extracted first sample image and the human body prediction key points by utilizing a clothing detection network to obtain a prediction clothing detection result.
Optionally, the predicted clothing detection result may include predicted apparel position information.
Further optionally, the predicted clothing detection result may include a predicted apparel position box and a predicted apparel category.
405, training the human body key point network and the clothing detection network based on a first difference between the predicted human body key points in the first sample image and the annotated human body key point features and/or a second difference between the predicted clothing detection result and the annotated clothing features, until a preset condition is met.
Wherein, satisfying the preset condition comprises at least one of the following:
training the human body key point network and the clothing detection network for a preset number of times;
the first difference is smaller than a first preset threshold;
the second difference is less than a second preset threshold.
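Steps 401 through 405 amount to one optimization loop over a joint loss. The sketch below assumes smooth-L1 losses and hypothetical head modules (detection_head, keypoint_head); the patent does not fix these choices:

    import torch.nn.functional as F

    def joint_training_step(backbone, keypoint_head, detection_head, optimizer, batch):
        image, gt_keypoints, gt_boxes = batch          # first-sample-image annotations
        shared = backbone(image)                       # shared-layer features (401, 403)
        pred_kp = keypoint_head(shared)                # predicted key points (402)
        pred_boxes = detection_head(shared, pred_kp)   # predicted apparel boxes (404)

        loss_kp = F.smooth_l1_loss(pred_kp, gt_keypoints)   # first difference
        loss_det = F.smooth_l1_loss(pred_boxes, gt_boxes)   # second difference

        loss = loss_kp + loss_det                      # train both networks jointly (405)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Stop once a preset iteration count is reached or both differences
        # fall below their preset thresholds.
        return loss_kp.item(), loss_det.item()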
Because the apparel position boxes are obtained from the predicted human body key points, missed and false apparel detections under large variations in human pose are effectively reduced, as is false detection of the image background. In the clothing detection part, generating the position boxes from the human body key points reduces the number of apparel position boxes to consider; the generated boxes differ little in position from the actual apparel box, which lowers the difficulty of training the clothing detection network.
The convolutional neural network can be divided into two network structures (the human body key point network and the clothing detection network) that share the first several convolutional layers (e.g., convolutional layers 1-5 in fig. 2); that is, the deep features computed for predicting the human body key points are also applied to clothing detection. An image is processed by the shared convolutional layers and enters the key point detection branch layer, which regresses the positions of the predicted human body key points; the predicted key points are used to form the apparel position information (apparel position boxes), region pooling is applied to the last shared convolutional layer, and the network regresses the true position of the apparel and recognizes the apparel category.
According to the embodiment of the invention, training the extraction of human body key points and clothing features simultaneously in one end-to-end convolutional neural network provides better context information to assist clothing detection, which improves clothing recognition accuracy and efficiency; the human pose can also be predicted.
As an example, clothing detection based on the extracted predicted human body key points may also be implemented in another way. As shown in fig. 5, performing clothing detection according to the clothing features and the predicted human body key points (step 103 in fig. 1) may further include:
in a first step, at least one candidate box is generated based on the human body predicted key points.
Similarly, the candidate frame is used for framing the regions of the predicted key points of each human body in the image to be detected.
And secondly, clothing detection is carried out according to the clothing characteristics and the at least one alternative frame, and clothing detection results are output.
Specifically, based on a clothing detection network, clothing position information in an image to be detected is obtained according to clothing features and at least one alternative frame; and based on the clothing classification network, obtaining clothing classification information in the image to be detected according to the clothing characteristics, the clothing position information and the at least one alternative frame.
The clothing detection result may include: apparel classification information.
Optionally, the clothing detection result may include: apparel position information and apparel classification information.
Further optionally, the clothing detection result may include: apparel position information, apparel category, and apparel classification information.
Illustratively, the apparel position information may specifically be an apparel position box.
The apparel classification information includes the apparel category corresponding to at least one pixel within the apparel position box.
It is understood that an image is composed of many pixels. Semantic classification (also known as semantic segmentation) groups or segments the pixels of an image according to the semantic meaning they express. In an image semantic classification task, a three-channel color image is input and a matrix is output in which each element indicates the semantic label of the pixel at the corresponding position in the original image; image semantic classification is therefore also called image semantic labeling, pixel semantic labeling, or pixel semantic grouping. The difficulty of the task lies in the word "semantic": in a real image, an object expressing one semantic concept is often composed of different parts, and these parts often have different colors, textures, and even brightness, which makes accurate semantic segmentation difficult and challenging.
The initial semantic classification map has X channels, representing the probability that each pixel in the image belongs to each of X classes, where X is an integer greater than 1 covering X-1 semantic categories plus 1 background category. For example, X = 21 represents 20 semantic categories of pixels and one background category.
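As a sketch with assumed shapes, turning such an X-channel score map into a per-pixel semantic label map is a channel-wise softmax followed by an argmax:

    import torch

    X = 21                              # 20 semantic categories + 1 background
    scores = torch.randn(1, X, 64, 48)  # X-channel semantic classification map
    probs = scores.softmax(dim=1)       # per-pixel class probabilities
    labels = probs.argmax(dim=1)        # (1, 64, 48) semantic label per pixel
    background = labels == 0            # assuming channel 0 is the background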
After the clothing classification network is added, the apparel within the apparel position information (apparel position box) can be segmented accurately: it can be determined, pixel by pixel, whether each pixel in the box belongs to apparel (or the human body) or to background, and the extent of each apparel item can be delimited; the clothing classification network can therefore also be used to assist apparel recognition.
It should be noted that, in this embodiment, the logical structures of the human body key point network and the clothing detection network are the same as described above.
That is, the human body key point network may include the 1st to Lth convolutional layers; the clothing detection network includes the 1st to Pth convolutional layers, a pooling layer, and a fully-connected layer; the clothing classification network includes the 1st to Wth convolutional layers, an alignment layer, and Q further convolutional layers, where L, P, W, and Q are each integers greater than 1.
It is worth noting that, in another embodiment, the human body key point network, the clothing detection network, and the clothing classification network may share one shared network layer. Combining this with what has been described above:
The human body key point network includes: the shared network layer and a key point detection branch layer; the features output by the shared network layer are shared-layer features, which serve as the human body key point features; the key point detection branch layer includes M convolutional layers, where M is an integer greater than or equal to 1; the key point detection branch layer regresses the positions of the human body key points from the human body key point features output by the shared network layer to obtain the predicted human body key points.
The clothing detection network includes: the shared network layer and a clothing detection branch layer; the features output by the shared network layer are shared-layer features, which serve as the clothing features; the clothing detection branch layer includes a pooling layer and a fully-connected layer, and detects the position information of the apparel in the image to be detected based on the predicted human body key points and the clothing features.
The clothing classification network includes: the shared network layer and a clothing classification branch layer; the features output by the shared network layer are shared-layer features, which serve as the clothing classification features; the clothing classification branch layer includes an alignment layer and V convolutional layers, where V is an integer greater than or equal to 1, and detects the apparel classification information in the image to be detected based on the apparel position information and the clothing classification features (e.g., pixels).
In one embodiment of the invention, the alignment layer may be a region alignment layer, which aligns the apparel position box precisely onto the original feature map, so that the pixel-level classification (segmentation) carries no misalignment and can be mapped directly back to the original image.
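This region alignment behaves like torchvision's roi_align operator, which samples by bilinear interpolation instead of snapping box coordinates to the feature grid; the shapes and stride below are assumptions:

    import torch
    from torchvision.ops import roi_align

    features = torch.randn(1, 256, 64, 48)                   # shared-layer features (assumed)
    boxes = torch.tensor([[0.0, 10.0, 20.0, 120.0, 300.0]])  # (batch_idx, x1, y1, x2, y2)

    # roi_align samples with bilinear interpolation instead of rounding box
    # coordinates to the feature grid, so per-pixel predictions inside the box
    # map back onto the original image without quantization error.
    aligned = roi_align(features, boxes, output_size=(14, 14),
                        spatial_scale=1.0 / 16, sampling_ratio=2)
    print(aligned.shape)  # torch.Size([1, 256, 14, 14])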
For ease of understanding, this is shown in fig. 5 by way of example, with N = 5, M = 2, and V = 2. The human body key point network shown in fig. 5 comprises convolutional layers 1 through 7.
The clothing detection network comprises convolutional layers 1 through 5, a region pooling layer, and a fully-connected layer.
Convolutional layers 1 to 5 are the convolutional layers shared by the human body key point network and the clothing detection network; convolutional layers 6 and 7 form the key point detection branch layer of the human body key point network; the region pooling layer and the fully-connected layer form the clothing detection branch layer of the clothing detection network.
The clothing classification network comprises convolutional layers 1 through 5, a region alignment layer, convolutional layer 8, and convolutional layer 9; the region alignment layer and convolutional layers 8 and 9 form the clothing classification branch layer of the clothing classification network.
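Extending the earlier two-branch sketch, the three branches of fig. 5 can be wired as below. Head internals and channel counts remain assumptions, and for brevity the classification branch here aligns on the candidate boxes directly, whereas the patent feeds it the regressed apparel position boxes:

    import torch.nn as nn
    from torchvision.ops import roi_align, roi_pool

    class ThreeBranchClothingNet(nn.Module):
        def __init__(self, backbone, keypoint_head, num_classes=21, ch=64):
            super().__init__()
            self.backbone = backbone            # conv layers 1-5 (shared network layer)
            self.keypoint_head = keypoint_head  # conv layers 6-7 (key point branch)
            self.det_fc = nn.Linear(ch * 7 * 7, 4)          # regress apparel box (x1, y1, x2, y2)
            self.cls_conv8 = nn.Conv2d(ch, ch, 3, padding=1)
            self.cls_conv9 = nn.Conv2d(ch, num_classes, 1)  # per-pixel apparel categories

        def forward(self, image, candidate_boxes):          # boxes: (K, 5) with batch index
            shared = self.backbone(image)                   # computed once, used three times
            keypoints = self.keypoint_head(shared)          # key point branch
            pooled = roi_pool(shared, candidate_boxes, (7, 7), 1.0 / 16)
            apparel_box = self.det_fc(pooled.flatten(1))    # detection branch
            aligned = roi_align(shared, candidate_boxes, (14, 14), 1.0 / 16)
            seg = self.cls_conv9(self.cls_conv8(aligned).relu())  # classification branch
            return keypoints, apparel_box, seg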
As an example, the method for training a model for clothing detection of the present invention further includes: training, with a second sample image, the clothing classification network of the clothing detection method, together with the networks producing the clothing position information.
The second sample image is annotated with human body key point features, clothing features, and clothing classification features.
Optionally, the clothing features include apparel annotation boxes and may also include annotation categories.
It is to be understood that the second sample image is at least one image, and the type of image is also the same as the first sample image.
Exemplarily, a method according to an embodiment of the present invention of training, with a second sample image, the clothing classification network and the clothing position information of the clothing detection method is shown in fig. 6 and includes:
and 601, extracting the human body key point characteristics in the second sample image by using the human body key point network.
And 602, detecting the human key points by using a human key point network based on the human key point characteristics to obtain human prediction key points of the second sample image.
603, using the clothing detection network to extract clothing features in the second sample image.
And 604, detecting the clothing according to the clothing characteristics in the extracted second sample image and the human body prediction key points by utilizing a clothing detection network to obtain a prediction clothing detection result.
Optionally, predicting the clothing detection knot may include: a predicted apparel location box.
Alternatively, predicting the clothing detection knot may include: a predicted apparel location box and a predicted apparel category.
By generating the clothing position frames from the human body predicted key points, missed or false clothing detections caused by large variations in human body posture can be effectively reduced, and false detection of the image background can also be reduced. In the clothing detection part, generating the clothing position frames from the human body predicted key points reduces the number of candidate frames and keeps the position difference between the obtained frames and the actual clothing frame small, which lowers the difficulty of training the clothing detection network.
605, classifying the clothing according to the extracted clothing classification features in the second sample image and the predicted clothing detection result by using the clothing classification network to obtain a predicted clothing classification result.
Optionally, the predicted clothing classification result may include predicted clothing classification information.
606, training the human body key point network, the clothing detection network and the clothing classification network based on a first difference between the human body predicted key points in the second sample image and the labeled human body key point features, and/or a second difference between the predicted clothing detection result and the labeled clothing features, and/or a third difference between the predicted clothing classification result and the labeled clothing classification features, until a preset condition is met.
Wherein the preset condition being met comprises at least one of the following (a training sketch follows this list):
the number of training iterations of the human body key point network, the clothing detection network and the clothing classification network reaches a preset number;
the first difference is smaller than a first preset threshold;
the second difference is smaller than a second preset threshold;
the third difference is less than a third preset threshold.
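A hedged sketch of steps 601-606 follows; the particular loss functions, their equal weighting, the optimizer and the stopping thresholds are all assumptions, since the text above only specifies the three differences and the preset conditions.

```python
import torch

def train_joint(model, loader, optimizer, loss_fns, max_iters=10000,
                thresholds=(1e-3, 1e-3, 1e-3)):
    """loss_fns = (kpt_loss_fn, det_loss_fn, cls_loss_fn): placeholder
    callables measuring the first, second and third differences."""
    kpt_loss_fn, det_loss_fn, cls_loss_fn = loss_fns
    for it, (image, kpt_gt, box_gt, cls_gt) in enumerate(loader):
        kpts, boxes, logits = model(image)   # steps 601-605 in one forward pass
        d1 = kpt_loss_fn(kpts, kpt_gt)       # first difference
        d2 = det_loss_fn(boxes, box_gt)      # second difference
        d3 = cls_loss_fn(logits, cls_gt)     # third difference
        optimizer.zero_grad()
        (d1 + d2 + d3).backward()            # step 606: joint training
        optimizer.step()
        # preset conditions: iteration budget reached, or all three
        # differences below their preset thresholds
        if it + 1 >= max_iters or (d1 < thresholds[0] and d2 < thresholds[1]
                                   and d3 < thresholds[2]):
            break
```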
The convolutional neural network in the embodiment of the invention can be divided into three network structures (the human body key point network, the clothing detection network and the clothing classification network). The three network structures share the first several convolutional layers (such as convolutional layers 1-5 in fig. 5); that is, the deep features computed for human body key point prediction are also used for clothing detection and clothing classification. The image is first processed by the shared convolutional layers (convolutional layers 1-5 in fig. 5) and then enters the key point detection branch layer, which regresses the positions of the human body predicted key points; clothing position frames are formed from these key points, region pooling is performed on the output of the last shared convolutional layer, the true position of the clothing is regressed, and the clothing category is identified.

The embodiment of the invention can be extended to realize the clothing classification function by training the network on a data set labeled with human body key point features, clothing features (clothing annotation boxes) and clothing classification features. During training, the clothing position information (clothing position frames) is input into the clothing classification network; the clothing classification network shares the convolution computation and features of the first several convolutional layers with the human body key point network and the clothing detection network, and is trained with a classification loss function over each clothing frame. In use, after the human body predicted key point detection and the clothing position information detection are completed, each piece of clothing position information is passed through the clothing classification network to obtain the final classification result.
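At inference time this reduces to one pass through the shared layers followed by the three branches in sequence. The sketch below assumes the ClothingNetworkSketch module above; decode_keypoints and candidate_frames are hypothetical helpers sketched further below, while regress_positions and classify_clothing stand in for the pooling/full-connection and alignment/convolution heads.

```python
import torch

@torch.no_grad()
def detect_clothing(image, model):
    feat = model.shared(image)                    # shared convolution, run once
    keypoints = decode_keypoints(model.keypoint_branch(feat))
    frames = candidate_frames(keypoints)          # clothing position frames
    positions = regress_positions(model, feat, frames)    # pooling + full connection
    classes = classify_clothing(model, feat, positions)   # alignment + conv 8-9
    return positions, classes
```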
In an embodiment of the present invention, a clothing detection apparatus is further provided, and the apparatus of this embodiment may be used to implement the above method embodiments of the present invention. As shown in fig. 7, fig. 7 is a schematic structural diagram of an embodiment of the apparatus of the present invention, and the apparatus of the embodiment includes:
and the prediction key point extraction module 701 is used for extracting human body prediction key points in the image to be detected.
And the clothing feature extraction module 702 is used for extracting clothing features in the image to be detected.
And the clothing detection module 703 is configured to perform clothing detection according to the clothing characteristics and the human body prediction key points to obtain a clothing detection result.
In the process of implementing the invention, the inventors found that, in the clothing field, a convolutional neural network is currently often applied directly to detect the clothing in an image; however, in most clothing images the clothing is worn on a human body, and differences in human posture may affect the clothing recognition result.
In the embodiment of the invention, the human body predicted key points and the clothing features are extracted from the image to be detected, and the two are combined to obtain the clothing detection result.
As an example, the predicted key point extracting module 701 is specifically configured to obtain a human body predicted key point in an image to be detected based on a human body key point network.
As an example, the clothing feature extraction module 702 is specifically configured to obtain clothing features in an image to be detected based on a clothing detection network.
As an example, in the present embodiment, the human body key point network includes 1st to Lth convolutional layers; the clothing detection network includes 1st to Pth convolutional layers, a pooling layer and a full-connection layer, where L and P are integers greater than 1 and may be the same or different.
The pooling layer in this embodiment may also be a region pooling layer, which pools the candidate frames on the feature map to obtain feature maps of a fixed size.
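For instance, with torchvision's roi_pool (an assumed stand-in for the region pooling layer), candidate frames of any size come out at one fixed resolution:

```python
import torch
from torchvision.ops import roi_pool

feat = torch.randn(1, 64, 38, 38)                      # shared-layer feature map
frames = torch.tensor([[0.0, 2.0, 2.0, 30.0, 36.0],    # a large candidate frame
                       [0.0, 5.0, 5.0, 12.0, 14.0]])   # a small candidate frame
pooled = roi_pool(feat, frames, output_size=(7, 7))    # coords in feature-map scale
print(pooled.shape)  # torch.Size([2, 64, 7, 7]) -- fixed size for every frame
```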
In this embodiment, the convolution layer included in the human body key point network and the convolution layer included in the clothing detection network may be independent convolution layers respectively.
In the embodiment of the invention, the human body key point network and the clothing detection network share the 1st to Nth convolutional layers, where the 1st to Nth convolutional layers are the shared network layers and N is an integer greater than or equal to 2.
Specifically and optionally, the human body key point network includes: a shared network layer and a key point detection branch layer; the features output by the shared network layer are shared-layer features, which serve as the human body key point features; the key point detection branch layer comprises M convolutional layers, where M is an integer greater than or equal to 1; and the key point detection branch layer is used for regressing the positions of the human body key points according to the human body key point features output by the shared network layer to obtain the human body predicted key points.
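One plausible reading of this branch, sketched below, has the M convolutional layers emit one response map per key point, with the peak of each map taken as the regressed position; the argmax decoding is an assumption, as the text only states that key point positions are regressed.

```python
import torch

def decode_keypoints(heatmaps: torch.Tensor) -> torch.Tensor:
    """heatmaps: (N, K, H, W) response maps from the key point branch.
    Returns (N, K, 2) predicted (x, y) positions on the feature map."""
    n, k, h, w = heatmaps.shape
    flat = heatmaps.view(n, k, h * w).argmax(dim=-1)
    xs = flat % w
    ys = torch.div(flat, w, rounding_mode="floor")
    return torch.stack((xs, ys), dim=-1)
```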
Further, the clothing detection network includes: a shared network layer and a clothing detection branch layer; the features output by the shared network layer are shared-layer features, which serve as the clothing features; the clothing detection branch layer comprises a pooling layer and a full-connection layer; and the clothing detection branch layer is used for detecting the position information of the clothing in the image to be detected based on the human body predicted key points and the clothing features.
As an example, in another embodiment of the present invention, as shown in fig. 8, the clothing detection module 703 in the clothing detection apparatus may include: a candidate frame generating unit 7031 and a clothing detection unit 7032.
The candidate frame generating unit 7031 is configured to generate at least one candidate frame based on the human body prediction key points, where the candidate frame is used to frame an area of each human body prediction key point in the image to be detected.
And a clothing detection unit 7032, configured to perform clothing detection according to the clothing characteristics and the at least one candidate frame, and output a clothing detection result.
The clothing detection result comprises: garment location information and garment classification information.
In one embodiment of the invention, the human prediction key points comprise at least one of: a left shoulder point, a left arm point, a left hand point, a right shoulder point, a right arm point, a right hand point, a left waist point, a right waist point, a left knee point, a left foot point, a right knee point, and a right foot point.
Optionally, the at least one candidate frame includes at least one of the following (a sketch generating these frames follows the list):
the minimum rectangle of the left and right shoulder points and the left and right waist points;
the minimum rectangle of the left and right shoulder points and the left and right knee points;
the minimum rectangle of the left and right waist points and the left and right knee points;
the minimum rectangle of the left and right waist points and the left and right foot points;
the minimum rectangle of the left and right shoulder points and the left and right foot points.
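A minimal sketch of this generation step, and of the candidate_frames helper used in the inference sketch earlier: the key point ordering is an assumption, while the point groupings follow the list above.

```python
import torch

KEYPOINTS = ["l_shoulder", "l_arm", "l_hand", "r_shoulder", "r_arm", "r_hand",
             "l_waist", "r_waist", "l_knee", "l_foot", "r_knee", "r_foot"]
IDX = {name: i for i, name in enumerate(KEYPOINTS)}

FRAME_GROUPS = [  # each group spans one candidate frame from the list above
    ("l_shoulder", "r_shoulder", "l_waist", "r_waist"),
    ("l_shoulder", "r_shoulder", "l_knee", "r_knee"),
    ("l_waist", "r_waist", "l_knee", "r_knee"),
    ("l_waist", "r_waist", "l_foot", "r_foot"),
    ("l_shoulder", "r_shoulder", "l_foot", "r_foot"),
]

def candidate_frames(kpts: torch.Tensor) -> torch.Tensor:
    """kpts: (12, 2) predicted key point coordinates for one person.
    Returns (5, 4) frames as (x1, y1, x2, y2) minimum enclosing rectangles."""
    frames = []
    for group in FRAME_GROUPS:
        pts = kpts[[IDX[name] for name in group]]
        frames.append(torch.cat((pts.min(dim=0).values, pts.max(dim=0).values)))
    return torch.stack(frames)
```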
As an example, the present invention may also provide another clothing detection apparatus, as shown in fig. 9, the clothing detection apparatus may further include: apparel position training module 704.
A clothing position training module 704, configured to train a clothing detection network and a human body prediction key point of a clothing detection apparatus using the first sample image.
The first sample image is labeled with human body key point features and clothing features.
As an example, apparel position training module 704 includes: the system comprises a human body key point extraction unit, a human body key point detection unit, a clothing feature extraction unit, a predicted clothing detection unit and a clothing position training unit.
And the human body key point extracting unit is used for extracting the human body key point characteristics in the first sample image by utilizing the human body key point network.
And the human body key point detection unit is used for detecting the human body key points based on the human body key point characteristics by utilizing the human body key point network to obtain the human body prediction key points of the first sample image.
And the clothing feature extraction unit is used for extracting clothing features in the first sample image by utilizing a clothing detection network.
And the predicted clothing detection unit is used for detecting clothing according to the clothing characteristics in the extracted first sample image and the human body prediction key points by utilizing a clothing detection network to obtain a predicted clothing detection result.
And the clothing position training unit is used for training the human body key point network and the clothing detection network based on a first difference between the human body prediction key point in the first sample image and the labeled human body key point characteristic and/or a second difference between the predicted clothing detection result and the labeled clothing characteristic until a preset condition is met.
Wherein, satisfying the preset condition comprises at least one of the following:
training the human body key point network and the clothing detection network for a preset number of times;
the first difference is smaller than a first preset threshold;
the second difference is less than a second preset threshold.
Optionally, in another embodiment of the present invention, there is further provided a clothing detection apparatus. As shown in fig. 10, the clothing detection unit 7032 in the clothing detection apparatus further includes: a clothing position obtaining subunit 70321 and a clothing classification obtaining subunit 70322.
The clothing position obtaining subunit 70321 is configured to obtain the clothing position information in the image to be detected according to the clothing features and the at least one candidate frame based on the clothing detection network.
The clothing classification obtaining subunit 70322 is configured to obtain the clothing classification information in the image to be detected according to the clothing features, the clothing position information and the at least one candidate frame based on the clothing classification network.
In this embodiment, the clothing detection result output by the clothing detection module 703 may include: clothing classification information.
Optionally, the clothing detection result may include: clothing position information and clothing classification information.
Further optionally, the clothing detection result may include: clothing position information, a clothing category and clothing classification information.
For example, the human body key point network may include 1st to Lth convolutional layers; the clothing detection network includes 1st to Pth convolutional layers, a pooling layer and a full-connection layer; and the clothing classification network includes 1st to Wth convolutional layers, an alignment layer and Q convolutional layers, where L, P, W and Q are each integers greater than 1.
It should be noted that, in another embodiment, the human body key point network, the clothing detection network and the clothing classification network share the 1st to Nth convolutional layers, where the 1st to Nth convolutional layers are the shared network layers and N is an integer greater than or equal to 2.
The clothing classification network includes: a shared network layer and a clothing classification branch layer; the features output by the shared network layer are shared-layer features, which serve as the clothing classification features; the clothing classification branch layer comprises an alignment layer and V convolutional layers, V being an integer greater than or equal to 1; and the clothing classification branch layer is used for detecting the classification information of the clothing in the image to be detected based on the clothing detection result and the clothing classification features.
As an example, in another embodiment of the present invention, there is further provided a clothing detection apparatus, as shown in fig. 11, further including: clothing classification training module 705.
The clothing classification training module 705 is configured to train the clothing classification network and the clothing position information of the clothing detection apparatus by using a second sample image; the second sample image is labeled with human body key point features, clothing features and clothing classification features.
As an example, the clothing classification training module 705 may include: a human body key point extraction unit, a human body key point detection unit, a clothing feature extraction unit, a predicted clothing detection unit, a predicted clothing classification unit and a clothing classification training unit.
And the human body key point extracting unit is used for extracting the human body key point characteristics in the second sample image by using the human body key point network.
And the human body key point detection unit is used for detecting the human body key points based on the human body key point characteristics by utilizing the human body key point network to obtain the human body prediction key points of the second sample image.
And the clothing feature extraction unit is used for extracting clothing features in the second sample image by utilizing a clothing detection network.
And the predicted clothing detection unit is used for detecting clothing according to the extracted clothing features in the second sample image and the human body predicted key points by using the clothing detection network to obtain a predicted clothing detection result.
And the predicted clothing classification unit is used for classifying clothing according to the clothing classification characteristics in the extracted second sample image and the predicted clothing detection result by utilizing a clothing classification network to obtain a predicted clothing classification result.
And the clothing classification training unit is used for training the human body key point network, the clothing detection network and the clothing classification network based on a first difference between the human body prediction key point in the second sample image and the labeled human body key point characteristic, and/or a second difference between the predicted clothing detection result and the labeled clothing characteristic, and/or a third difference between the predicted clothing classification result and the labeled clothing classification characteristic until a preset condition is met.
Wherein, satisfying the preset condition comprises at least one of the following:
training times of the human body key point network, the clothing detection network and the clothing classification network reach preset times;
the first difference is smaller than a first preset threshold;
the second difference is smaller than a second preset threshold;
the third difference is less than a third preset threshold.
By generating the clothing position frames from the human body predicted key points, missed or false clothing detections caused by large variations in human body posture can be effectively reduced, and false detection of the image background can also be reduced. In the clothing detection part, generating the clothing position frames from the human body predicted key points reduces the number of candidate frames and keeps the position difference between the obtained frames and the actual clothing frame small, which lowers the difficulty of training the clothing detection network. In addition, using an end-to-end convolutional neural network to jointly train the extraction of the human body key point features and the clothing features in the image provides better context information; this information can assist clothing detection, which improves the accuracy and efficiency of clothing recognition while also allowing the human body posture to be predicted.
In addition, in an embodiment of the present invention, an electronic device is further provided, where the electronic device may be a mobile terminal, a Personal Computer (PC), a tablet computer, a server, and the like, and the electronic device is provided with the clothing detection apparatus in any embodiment of the present invention.
Based on the electronic device provided by the embodiment of the invention, the clothing detection apparatus uses an end-to-end convolutional neural network to jointly train the extraction of the human body key point features and the clothing features in the image, which provides better context information; this information can assist clothing detection, improving the accuracy and efficiency of clothing recognition while also allowing the human body posture to be predicted.
In addition, another electronic device provided in an embodiment of the present invention includes:
a memory for storing executable instructions; and the number of the first and second groups,
a processor in communication with the memory for executing the executable instructions to perform the operations of the clothing detection method of any of the above embodiments of the present invention.
Fig. 12 is a schematic structural diagram of an embodiment of an electronic device according to the present invention, suitable for use in implementing embodiments of the present application. As shown in fig. 12, the electronic device includes one or more processors, a communication section, and the like, for example: one or more Central Processing Units (CPUs) and/or one or more Graphics Processing Units (GPUs), which may perform various appropriate actions and processes according to executable instructions stored in a Read-Only Memory (ROM) or loaded from a storage section into a Random Access Memory (RAM). The communication section may include, but is not limited to, a network card, such as an IB (InfiniBand) network card. The processor may communicate with the read-only memory and/or the random access memory to execute the executable instructions, connect with the communication section through the bus, and communicate with other target devices through the communication section, so as to complete operations corresponding to any method provided by the embodiments of the present application, for example: extracting the human body predicted key points in the image to be detected; extracting the clothing features in the image to be detected; and performing clothing detection according to the clothing features and the human body predicted key points to obtain a clothing detection result.
In addition, the RAM can also store various programs and data necessary for the operation of the apparatus. The CPU, the ROM and the RAM are connected to each other via a bus. Where a RAM is present, the ROM is an optional module: the RAM stores the executable instructions, or the executable instructions are written into the ROM at runtime, and these instructions cause the processor to execute the operations corresponding to any one of the methods of the invention. An input/output (I/O) interface is also connected to the bus. The communication unit may be integrated, or may be provided with a plurality of sub-modules (e.g., a plurality of IB network cards) connected to the bus link.
The following components are connected to the I/O interface: an input section including a keyboard, a mouse, and the like; an output section including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section including a hard disk and the like; and a communication section including a network interface card such as a LAN card, a modem, or the like. The communication section performs communication processing via a network such as the internet. The drive is also connected to the I/O interface as needed. A removable medium such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive as necessary, so that a computer program read out therefrom is mounted into the storage section as necessary.
It should be noted that the architecture shown in fig. 12 is only an optional implementation. In practice, the number and types of the components in fig. 12 may be selected, deleted, added or replaced according to actual needs; with different functional component arrangements, separate or integrated implementations may also be used; for example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU, and the communication part may be arranged separately or integrated on the CPU or the GPU. These alternative embodiments all fall within the scope of the present disclosure.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method illustrated in the flowchart, the program code may include instructions corresponding to performing the steps of the apparel detection method provided by embodiments of the present application. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method of the present application.
In addition, an embodiment of the present invention further provides a computer program, which includes a computer readable code, and when the computer readable code runs on a device, a processor in the device executes instructions for implementing steps in the clothing detection method according to any embodiment of the present invention.
In addition, the embodiment of the present invention further provides a computer-readable storage medium, configured to store computer-readable instructions, where the instructions, when executed, perform operations of the steps in the clothing detection method according to any embodiment of the present invention.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The method and apparatus of the present invention may be implemented in a number of ways. For example, the methods and apparatus of the present invention may be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustrative purposes only, and the steps of the method of the present invention are not limited to the order specifically described above unless specifically indicated otherwise. Furthermore, in some embodiments, the present invention may also be embodied as a program recorded in a recording medium, the program including machine-readable instructions for implementing a method according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the method according to the present invention. The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (31)

1. A clothing detection method is characterized by comprising the following steps:
extracting human body prediction key points in an image to be detected;
extracting clothing features in the image to be detected;
clothing detection is carried out according to the clothing characteristics and the human body prediction key points to obtain clothing detection results;
the clothing detection according to the clothing characteristics and the human body prediction key points comprises the following steps:
generating at least one alternative frame based on the human body prediction key points, wherein the alternative frame is used for framing the area of each human body prediction key point in the image to be detected;
clothing detection is carried out according to the clothing characteristics and the at least one alternative frame, and clothing detection results are output; the clothing detection result comprises: apparel location information and/or apparel classification information;
the method for extracting the human body prediction key points in the image to be detected comprises the following steps: acquiring human body prediction key points in the image to be detected based on a human body key point network; and/or, the clothes characteristics in the image to be detected are extracted, and the method comprises the following steps: acquiring clothing characteristics in the image to be detected based on a clothing detection network;
the detecting clothing according to the clothing characteristics and the at least one alternative frame comprises:
based on the clothing detection network, acquiring clothing position information in the image to be detected according to the clothing characteristics and the at least one alternative frame;
and acquiring clothing classification information in the image to be detected according to the clothing characteristics, the clothing position information and the at least one alternative frame based on a clothing classification network.
2. The method of claim 1, wherein the human body key point network and the clothing detection network share the 1st to Nth convolutional layers, the 1st to Nth convolutional layers being shared network layers, and N being an integer greater than or equal to 2.
3. The method of claim 2,
the human body key point network comprises: a shared network layer and a key point detection branch layer;
the characteristics output by the shared network layer are shared layer characteristics, and the shared layer characteristics are used as key point characteristics of a human body;
the key point detection branch layer comprises M convolution layers, wherein M is an integer greater than or equal to 1;
and the key point detection branch layer is used for regressing the positions of the human body key points according to the human body key point characteristics output by the shared network layer to obtain the human body prediction key points.
4. The method of claim 2,
the clothing detection network includes: a shared network layer and a clothing detection branch layer;
the characteristics output by the shared network layer are shared layer characteristics, and the shared layer characteristics are used as the clothing characteristics;
the clothing detection branch layer comprises: a pooling layer and a full-connection layer;
the clothing detection branch layer is used for detecting the position information of clothing in the image to be detected based on the human body prediction key points and the clothing characteristics.
5. The method of claim 1, wherein the human body key point network, the clothing detection network and the clothing classification network share the 1st to Nth convolutional layers, the 1st to Nth convolutional layers being shared network layers, and N being an integer greater than or equal to 2.
6. The method of claim 5,
the clothing classification network includes: a shared network layer and a clothing classification branch layer; the characteristics output by the shared network layer are shared layer characteristics, and the shared layer characteristics are used as clothing classification characteristics; the clothing classification branch layer comprises: an alignment layer and V convolutional layers, wherein V is an integer greater than or equal to 1; and the clothing classification branch layer is used for detecting the classification information of the clothing in the image to be detected based on the clothing detection result and the clothing classification characteristics.
7. The method of claim 6, wherein the human prediction keypoints comprise at least one of: a left shoulder point, a left arm point, a left hand point, a right shoulder point, a right arm point, a right hand point, a left waist point, a right waist point, a left knee point, a left foot point, a right knee point, and a right foot point.
8. The method of claim 7, wherein the at least one candidate box comprises at least one of:
the minimum rectangle of the left and right shoulder points and the left and right waist points;
the minimum rectangle of the left and right shoulder points and the left and right knee points;
the minimum rectangle of the left and right waist points and the left and right knee points;
the minimum rectangle of the left and right waist points and the left and right foot points;
the minimum rectangle of the left and right shoulder points and the left and right foot points.
9. The method of claim 1, further comprising:
training a clothing detection network and human body prediction key points for realizing the clothing detection method by using a first sample image; the first sample image is marked with human body key point characteristics and clothing characteristics.
10. The method of claim 9, wherein training a clothing detection network and human prediction key points implementing the clothing detection method with a first sample image comprises:
extracting the human body key point features in the first sample image by using the human body key point network;
detecting the human body key points by using the human body key point network based on the human body key point characteristics to obtain human body prediction key points of the first sample image;
utilizing the clothing detection network to extract clothing characteristics in the first sample image;
detecting clothing according to the clothing characteristics in the extracted first sample image and the human body prediction key points by using the clothing detection network to obtain a prediction clothing detection result;
training the human body key point network and the clothing detection network based on a first difference between the human body prediction key point in the first sample image and the labeled human body key point characteristic and/or a second difference between the prediction clothing detection result and the labeled clothing characteristic until a preset condition is met.
11. The method of claim 10, wherein the preset condition being met comprises at least one of:
training the human body key point network and the clothing detection network for a preset number of times;
the first difference is smaller than a first preset threshold;
the second difference is less than a second preset threshold.
12. The method of claim 1, further comprising:
training a clothing classification network and the clothing location information implementing the clothing detection method of claim 1 using a second sample image; the second sample image is marked with human body key point features, clothing features and clothing classification features.
13. The method of claim 12, wherein training the clothing classification network and the clothing location information implementing the clothing detection method of claim 1 using the second sample image comprises:
extracting the human body key point features in the second sample image by using the human body key point network;
detecting the human body key points by using the human body key point network based on the human body key point characteristics to obtain human body prediction key points of the second sample image;
extracting clothing features in the second sample image by using the clothing detection network;
detecting clothing according to the clothing characteristics in the extracted second sample image and the human body prediction key points by using the clothing detection network to obtain a prediction clothing detection result;
utilizing the clothing classification network to classify clothing according to the clothing classification characteristics in the extracted second sample image and the predicted clothing detection result to obtain a predicted clothing classification result;
training the human body key point network, the clothing detection network and the clothing classification network based on a first difference between the human body prediction key point in the second sample image and the labeled human body key point characteristic, and/or a second difference between the predicted clothing detection result and the labeled clothing characteristic, and/or a third difference between the predicted clothing classification result and the labeled clothing classification characteristic until a preset condition is met.
14. The method of claim 13, wherein the predetermined condition being met comprises at least one of:
training the human body key point network, the clothing detection network and the clothing classification network for preset times;
the first difference is smaller than a first preset threshold;
the second difference is smaller than a second preset threshold;
the third difference is less than a third preset threshold.
15. A clothing detection apparatus, characterized by comprising:
the prediction key point extraction module is used for extracting human body prediction key points in the image to be detected;
the clothing feature extraction module is used for extracting clothing features in the image to be detected;
the clothing detection module is used for carrying out clothing detection according to the clothing characteristics and the human body prediction key points to obtain clothing detection results;
the clothing detection module includes: a candidate frame generating unit and a clothing detection unit;
the candidate frame generating unit is used for generating at least one candidate frame based on the human body prediction key points, and the candidate frame is used for framing the area of each human body prediction key point in the image to be detected;
the clothing detection unit is used for detecting clothing according to the clothing characteristics and the at least one alternative frame and outputting a clothing detection result; the clothing detection result comprises: apparel location information and/or apparel classification information;
the prediction key point extraction module is specifically used for acquiring human body prediction key points in the image to be detected based on a human body key point network; the clothing feature extraction module is specifically used for acquiring clothing features in the image to be detected based on a clothing detection network;
the dress detection element includes: a clothing position sub-unit is obtained, and a clothing classification sub-unit is obtained;
the obtaining clothing position subunit is configured to obtain clothing position information in the image to be detected according to the clothing feature and the at least one candidate frame based on the clothing detection network;
and the obtaining clothing classification subunit is used for obtaining clothing classification information in the image to be detected according to the clothing characteristics, the clothing position information and the at least one alternative frame based on a clothing classification network.
16. The apparatus of claim 15, wherein the human body key point network and the clothing detection network share the 1st to Nth convolutional layers, the 1st to Nth convolutional layers being shared network layers, and N being an integer greater than or equal to 2.
17. The apparatus of claim 16,
the human body key point network comprises: a shared network layer and a key point detection branch layer;
the characteristics output by the shared network layer are shared layer characteristics, and the shared layer characteristics are used as key point characteristics of a human body;
the key point detection branch layer comprises M convolution layers, wherein M is an integer greater than or equal to 1;
and the key point detection branch layer is used for regressing the positions of the human body key points according to the human body key point characteristics output by the shared network layer to obtain the human body prediction key points.
18. The apparatus of claim 16,
the clothing detection network includes: a shared network layer and a clothing detection branch layer;
the characteristics output by the shared network layer are shared layer characteristics, and the shared layer characteristics are used as the clothing characteristics;
the clothing detection branch layer comprises: a pooling layer and a full-connection layer;
the clothing detection branch layer is used for detecting the position information of clothing in the image to be detected based on the human body prediction key points and the clothing characteristics.
19. The apparatus of claim 15, wherein the human keypoint network, the garment detection network, and the garment classification network share 1 st to N convolutional layers, the 1 st to N convolutional layers being shared network layers, the N being an integer greater than or equal to 2.
20. The apparatus of claim 19,
the clothing classification network includes: a shared network layer and a clothing classification branch layer; the characteristics output by the shared network layer are shared layer characteristics, and the shared layer characteristics are used as clothing classification characteristics; the clothing classification branch layer comprises: an alignment layer and V convolutional layers, wherein V is an integer greater than or equal to 1; and the clothing classification branch layer is used for detecting the classification information of the clothing in the image to be detected based on the clothing detection result and the clothing classification characteristics.
21. The apparatus of claim 20, wherein the human prediction keypoints comprise at least one of: a left shoulder point, a left arm point, a left hand point, a right shoulder point, a right arm point, a right hand point, a left waist point, a right waist point, a left knee point, a left foot point, a right knee point, and a right foot point.
22. The apparatus of claim 21, wherein the at least one candidate box comprises at least one of:
the minimum rectangle of the left and right shoulder points and the left and right waist points;
the minimum rectangle of the left and right shoulder points and the left and right knee points;
the minimum rectangle of the left and right waist points and the left and right knee points;
the minimum rectangle of the left and right waist points and the left and right foot points;
the minimum rectangle of the left and right shoulder points and the left and right foot points.
23. The apparatus of claim 15, further comprising:
the clothing position training module is used for training a clothing detection network and human body prediction key points of the clothing detection apparatus by utilizing a first sample image; the first sample image is marked with human body key point characteristics and clothing characteristics.
24. The apparatus of claim 23, wherein the apparel position training module comprises:
a human body key point extracting unit, configured to extract a human body key point feature in the first sample image by using the human body key point network;
a human body key point detection unit, configured to perform human body key point detection based on the human body key point features by using the human body key point network, and obtain a human body predicted key point of the first sample image;
the clothing feature extraction unit is used for extracting clothing features in the first sample image by using the clothing detection network;
the predicted clothing detection unit is used for detecting clothing according to the clothing characteristics in the extracted first sample image and the human body prediction key points by utilizing the clothing detection network to obtain a predicted clothing detection result;
and the clothing position training unit is used for training the human body key point network and the clothing detection network based on a first difference between the human body prediction key point in the first sample image and the labeled human body key point characteristic and/or a second difference between the prediction clothing detection result and the labeled clothing characteristic until a preset condition is met.
25. The apparatus of claim 24, wherein the preset condition is met, comprising at least one of:
training the human body key point network and the clothing detection network for a preset number of times;
the first difference is smaller than a first preset threshold;
the second difference is less than a second preset threshold.
26. The apparatus of claim 15, further comprising:
a clothing classification training module for training a clothing classification network implementing the clothing detection apparatus of claim 15 and the clothing location information using a second sample image; the second sample image is marked with human body key point features, clothing features and clothing classification features.
27. The apparatus of claim 26, wherein the apparel classification training module comprises:
a human body key point extracting unit, configured to extract a human body key point feature in the second sample image by using the human body key point network;
a human body key point detection unit, configured to perform human body key point detection based on the human body key point features by using the human body key point network, and obtain a human body predicted key point of the second sample image;
a clothing feature extraction unit, configured to extract clothing features in the second sample image using the clothing detection network;
the predicted clothing detection unit is used for detecting clothing according to the clothing characteristics in the extracted second sample image and the human body prediction key points by using the clothing detection network to obtain a predicted clothing detection result;
the predicted clothing classification unit is used for classifying clothing according to the clothing classification characteristics in the extracted second sample image and the predicted clothing detection result by utilizing the clothing classification network to obtain a predicted clothing classification result;
and the clothing classification training unit is used for training the human body key point network, the clothing detection network and the clothing classification network based on a first difference between the human body prediction key point in the second sample image and the labeled human body key point characteristic, and/or a second difference between the predicted clothing detection result and the labeled clothing characteristic, and/or a third difference between the predicted clothing classification result and the labeled clothing classification characteristic until a preset condition is met.
28. The apparatus of claim 27, wherein the predetermined condition is satisfied comprises at least one of:
training the human body key point network, the clothing detection network and the clothing classification network for preset times;
the first difference is smaller than a first preset threshold;
the second difference is smaller than a second preset threshold;
the third difference is less than a third preset threshold.
29. An electronic device comprising the clothing detection apparatus of any one of claims 15-28.
30. An electronic device, comprising:
a memory for storing executable instructions; and
a processor in communication with the memory for executing the executable instructions to perform the operations of the clothing detection method of any one of claims 1-14.
31. A computer-readable storage medium storing computer-readable instructions that, when executed, perform the operations of the steps in the clothing detection method of any one of claims 1-14.
CN201711498336.5A 2017-12-29 2017-12-29 Clothing detection method, clothing detection device, electronic device, program, and medium Active CN108229559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711498336.5A CN108229559B (en) 2017-12-29 2017-12-29 Clothing detection method, clothing detection device, electronic device, program, and medium


Publications (2)

Publication Number Publication Date
CN108229559A CN108229559A (en) 2018-06-29
CN108229559B true CN108229559B (en) 2021-05-18

Family

ID=62642402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711498336.5A Active CN108229559B (en) 2017-12-29 2017-12-29 Clothing detection method, clothing detection device, electronic device, program, and medium

Country Status (1)

Country Link
CN (1) CN108229559B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241339A (en) * 2018-08-28 2019-01-18 三星电子(中国)研发中心 A kind of music recommended method and device
CN111382747A (en) * 2018-12-29 2020-07-07 杭州光启人工智能研究院 Data marking method, computer device and computer readable storage medium
CN109784350A (en) * 2018-12-29 2019-05-21 天津大学 In conjunction with the dress ornament key independent positioning method of empty convolution and cascade pyramid network
CN111507334B (en) * 2019-01-30 2024-03-12 中国科学院宁波材料技术与工程研究所 Instance segmentation method based on key points
CN111696172A (en) * 2019-03-12 2020-09-22 北京京东尚科信息技术有限公司 Image labeling method, device, equipment and storage medium
CN110096983A (en) * 2019-04-22 2019-08-06 苏州海赛人工智能有限公司 The safe dress ornament detection method of construction worker in a kind of image neural network based
CN110188701A (en) * 2019-05-31 2019-08-30 上海媒智科技有限公司 Dress ornament recognition methods, system and terminal based on the prediction of human body key node
CN110456955B (en) * 2019-08-01 2022-03-29 腾讯科技(深圳)有限公司 Exposed clothing detection method, device, system, equipment and storage medium
CN110555393A (en) * 2019-08-16 2019-12-10 北京慧辰资道资讯股份有限公司 method and device for analyzing pedestrian wearing characteristics from video data
CN113076775A (en) * 2020-01-03 2021-07-06 上海依图网络科技有限公司 Preset clothing detection method, device, chip and computer readable storage medium


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5605885B1 (en) * 2014-02-27 2014-10-15 木下 泰男 Virtual try-on system and virtual try-on program
CN105447529B (en) * 2015-12-30 2020-11-03 商汤集团有限公司 Method and system for detecting clothes and identifying attribute value thereof
CN107330451B (en) * 2017-06-16 2020-06-26 西交利物浦大学 Clothing attribute retrieval method based on deep convolutional neural network
CN107437099A (en) * 2017-08-03 2017-12-05 哈尔滨工业大学 A kind of specific dress ornament image recognition and detection method based on machine learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103455502A (en) * 2012-05-30 2013-12-18 盛乐信息技术(上海)有限公司 Clothing search method and system
WO2017008435A1 (en) * 2015-07-13 2017-01-19 百度在线网络技术(北京)有限公司 Method for recognizing picture, method and apparatus for labelling picture, and storage medium
CN106126579A (en) * 2016-06-17 2016-11-16 北京市商汤科技开发有限公司 Object identification method and device, data processing equipment and terminal unit

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Unconstrained Fashion Landmark Detection via Hierarchical Recurrent Transformer Networks;Sijie Yan et al.;《Proceedings of the 25th ACM international conference on Multimedia》;20171027;第172-180页 *
Research on clothing image matching algorithm of virtual fitting system; Geng Man et al.; Computer Technology and Development (《计算机技术与发展》); 20170131; vol. 27, no. 1, pp. 126-129 *

Also Published As

Publication number Publication date
CN108229559A (en) 2018-06-29

Similar Documents

Publication Publication Date Title
CN108229559B (en) Clothing detection method, clothing detection device, electronic device, program, and medium
US10621779B1 (en) Artificial intelligence based generation and analysis of 3D models
CN109670591B (en) Neural network training method and image matching method and device
TWI559242B (en) Visual clothing retrieval
US11232324B2 (en) Methods and apparatus for recommending collocating dress, electronic devices, and storage media
US11321769B2 (en) System and method for automatically generating three-dimensional virtual garment model using product description
Yamaguchi et al. Parsing clothing in fashion photographs
Yamaguchi et al. Paper doll parsing: Retrieving similar styles to parse clothing items
US8983142B1 (en) Programmatic silhouette attribute determination
CN111325226B (en) Information presentation method and device
CN110249304A (en) The Visual intelligent management of electronic equipment
CN109598249B (en) Clothing detection method and device, electronic equipment and storage medium
CN109614925A (en) Dress ornament attribute recognition approach and device, electronic equipment, storage medium
US11475500B2 (en) Device and method for item recommendation based on visual elements
CN110647906A (en) Clothing target detection method based on fast R-CNN method
CN111862116A (en) Animation portrait generation method and device, storage medium and computer equipment
CN112915539B (en) Virtual object detection method and device and readable storage medium
Yu et al. Inpainting-based virtual try-on network for selective garment transfer
Usmani et al. Enhanced deep learning framework for fine-grained segmentation of fashion and apparel
CN108764232B (en) Label position obtaining method and device
Liu et al. Toward fashion intelligence in the big data era: State-of-the-art and future prospects
Xu et al. Analysis of clothing image classification models: a comparison study between traditional machine learning and deep learning models
Attallah et al. A Cost-Efficient Approach for Creating Virtual Fitting Room using Generative Adversarial Networks (GANs)
Pan et al. Multi-person pose estimation with mid-points for human detection under real-world surveillance
Ankit et al. TryItOut: machine learning based virtual fashion assistant

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant