CN113989541A

CN113989541A - Dressing classification method and system based on feature aggregation

Info

Publication number: CN113989541A
Application number: CN202111112584.8A
Authority: CN
Inventors: 高朋; 刘辰飞; 陈英鹏; 席道亮
Original assignee: Synthesis Electronic Technology Co Ltd
Current assignee: Synthesis Electronic Technology Co Ltd
Priority date: 2021-09-23
Filing date: 2021-09-23
Publication date: 2022-01-28

Abstract

The invention discloses a dressing classification method and system based on feature aggregation, which relate to the technical field of image processing and comprise the following steps: carrying out multi-layer feature extraction on the image training set based on a pre-constructed classification model; performing channel dimension transformation on the extracted features of each layer, and compressing the feature dimensions to be consistent with the number of categories in the image training set to obtain compressed features; performing cosine transformation on the compressed features of each layer, mapping the features to an angle space, aggregating the features, and constructing a loss function after the feature aggregation; and training the classification model according to the loss function, and dressing the image to be recognized according to the trained classification model. By means of weighting and fusing coefficients, the color information of a shallow layer and the contour information of a deep layer are fully utilized, meanwhile, feature aggregation operation is added to map feature information to an angle space, a maximized classification boundary is obtained, and the similarity difference in the dressing matching process is increased.

Description

Dressing classification method and system based on feature aggregation

Technical Field

The invention relates to the technical field of image processing, in particular to a dressing classification method and system based on feature aggregation.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

At present, in the service industry, the standard dressing of related workers such as service windows, enterprise production, construction sites and the like is restrained, and the standardized dressing not only can improve the service image and enhance the production efficiency of enterprises, but also can reduce the safety production accidents. The existing dressing detection method based on the deep learning algorithm mainly adopts a classification network to classify targets or carry out similarity comparison with standard dressing so as to realize the dressing detection function. The method has the advantages that the classification network training is simple, and the dressing detection process is simple and quick.

The core of the classification model is to find the optimal classification boundary among the final sample characteristics of the convolutional network, so as to achieve the target classification effect. As the distribution of the convolution layer number deepens, the surface layer characteristics of the sample such as color, outline, texture and the like are continuously reduced, and the deep abstract characteristics are continuously increased. Due to the particularity of the tool sample, namely, the dressing types with similar colors exist, the classification boundary is not obvious enough, the similarity difference in the standard dressing matching process is small, and the detection result is influenced, so that the characteristics of the dressing sample need deep abstract characteristics and also need light characteristics such as colors, outlines, textures and the like.

Chinese patent CN114183472A discloses a method for detecting whether a worker wears a work clothes based on improved RetinaNet, which comprises the steps of extracting multi-scale features of an image by using MobileNet with a classification layer removed, obtaining three multi-scale feature maps of the image, and classifying clothing types by using Resnet18 to realize a clothing detection function. The method essentially adopts a classification network to realize the dressing type classification function, and no solution is provided when the dressing discrimination is reduced due to environmental change aiming at a specific scene.

Chinese patent CN110059674A discloses a standard dressing detection method based on deep learning, which adds related categories such as masks and hats in training samples by improving the setting parameters of anchor points in a target detection algorithm, completes target classification while realizing target detection, and further judges whether to dress as required. The method has simple network structure and high detection speed, but does not provide a solution on how to improve the degree of similar dressing areas.

Chinese patent CN111401418A discloses a method for detecting employee dressing specification based on improved Faster r-cnn, which is used for collecting, labeling and enhancing data of sample data sets aiming at different application scenes; establishing an improved Faster r-cnn network model; training the improved network by using the enhanced training sample set; detecting the test sample set by using the trained network model; and analyzing whether the detection result meets the predefined dressing specification. Although the diversity of the samples is increased by using a data enhancement method, no improvement method is proposed on the inter-sample-class difference.

Disclosure of Invention

In order to solve the problems, the invention provides a clothing classification method and a clothing classification system based on feature aggregation.

In order to achieve the purpose, the invention adopts the following technical scheme:

in a first aspect, the present invention provides a clothing classification method based on feature aggregation, including:

carrying out multi-layer feature extraction on the image training set based on a pre-constructed classification model;

performing channel dimension transformation on the extracted features of each layer, and compressing the feature dimensions to be consistent with the number of categories in the image training set to obtain compressed features;

performing cosine transformation on the compressed features of each layer, mapping the features to an angle space, aggregating the features, and constructing a loss function after the feature aggregation;

and training the classification model according to the loss function, and dressing the image to be recognized according to the trained classification model.

As an alternative embodiment, the extracted multi-layer features are embodied as a light color feature, a light texture feature, a deep contour feature and a deep abstract feature.

As an alternative implementation, the multi-layer feature extraction process is that except the last layer of feature layer, the output of each layer is subjected to feature extraction through convolution operation and maximum mean operation in sequence; and taking the output of more than one characteristic layer as input, and directly carrying out maximum mean value operation.

As an alternative embodiment, the channel dimension transformation adds a linear link layer to the classification model, and compresses the feature dimension according to the weight coefficient of the linear link layer.

As an alternative embodiment, each layer after compression is characterized by:

wherein Linear () represents a Linear chaining operation, W_clsIs a weight coefficient of a linear link layer whose dimension is W_cls∈R^2048,nN is the number of classes in the image training set, and F is the feature extracted from each layer.

As an alternative embodiment, the cosine transform is performed on the compressed features of each layer to obtain: f_cls＝||W_cls||||F||cosθ；

Wherein, F_clsFor features after compression, W_clsThe weight coefficient of the linear link layer is shown, n is the number of categories in the image training set, F is the feature extracted from each layer, and theta is the cosine included angle.

As an alternative embodiment, the loss function for each layer is:

wherein the content of the first and second substances,

is a linear link layer weight coefficient W_clsThe feature outputs a link weight to the ith output neuron; f. ofⁱIs the output of the ith neuron in the linear link layer; thetaⁱIs a cosine included angle; m is an interval parameter; s is a scaling factor.

As an alternative embodiment, in the process of mapping and aggregating the features to the angle space, the features are mapped to a hypersphere on the angle space, the radius of the hypersphere is controlled by a scaling factor s, the feature polymerization degree is controlled by an interval parameter m, and as m increases, the difference between the features is higher, and the feature polymerization degree is higher.

In a second aspect, the present invention provides a clothing classification system based on feature aggregation, including:

the characteristic extraction module is configured to perform multi-layer characteristic extraction on the image training set based on a pre-constructed classification model;

the dimension transformation module is configured to perform channel dimension transformation on the extracted features of each layer, and compress the feature dimensions to be consistent with the number of categories in the image training set to obtain compressed features;

the characteristic aggregation module is configured to perform cosine transform on the compressed characteristics of each layer, map the characteristics to an angle space, aggregate the characteristics, and construct a loss function after the characteristics are aggregated;

and the dressing classification module is configured to train the classification model according to the loss function, and the dressing image to be recognized is subjected to dressing classification according to the trained classification model.

In a third aspect, the present invention provides an electronic device comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, wherein when the computer instructions are executed by the processor, the method of the first aspect is performed.

In a fourth aspect, the present invention provides a computer readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first aspect.

Compared with the prior art, the invention has the beneficial effects that:

according to the dressing classification method and system based on feature aggregation, when feature extraction is carried out on an image training set, features with different dimensionalities are output by each feature extraction layer, when channel conversion is carried out, linear link layers are added, output feature dimensionalities are compressed to the number of categories through weight coefficients of the linear link layers, and through a mode of weighting fusion coefficients, color information output by a shallow layer and profile information of a deep layer are fully utilized on the basis of deep abstract features, the weighting fusion coefficients are continuously optimized, and an optimal fusion proportion is obtained.

According to the dressing classification method and system based on feature aggregation, feature aggregation operation is added, the similarity difference in the dressing matching process is increased, the feature information is mapped to the angle space, the maximum classification boundary is obtained, the difference among different types of samples is improved, and the distinguishing degree among different types of clothes is further improved.

Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the invention and together with the description serve to explain the invention and not to limit the invention.

Fig. 1 is a flowchart of a clothing classification method based on feature aggregation according to embodiment 1 of the present invention.

Detailed Description

The invention is further described with reference to the following figures and examples.

It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.

Example 1

As shown in fig. 1, the present embodiment provides a clothing classification method based on feature aggregation to enhance the difference between different color clothing types, which specifically includes:

s1: carrying out multi-layer feature extraction on the image training set based on a pre-constructed classification model;

s2: performing channel dimension transformation on the extracted features of each layer, and compressing the feature dimensions to be consistent with the number of categories in the image training set to obtain compressed features;

s3: performing cosine transformation on the compressed features of each layer, mapping the features to an angle space, aggregating the features, and constructing a loss function after the feature aggregation;

s4: and training the classification model according to the loss function, and obtaining a dressing classification result of the dressing image to be recognized according to the trained classification model.

In this embodiment, in step S1, the pre-constructed classification model uses Resnet50 as a basic network, and includes 4 layers of feature extraction layers, and totally outputs 4 layers of features with different dimensions, specifically, a light color feature, a light texture feature, a deep contour feature, and a deep abstract feature.

In this embodiment, the images in the image training set are samples with 3 channels, 256 pixels in width and 128 pixels in height, and output features of different dimensions after sequentially passing through 4 layers of feature extraction layers.

In step S2, the process of performing channel dimension transformation on the features extracted from each layer specifically includes:

s21-1: the output of the first Layer1 is L₁Dimension of L₁∈R^1,256,68,32The input of the layer is an image training set, and after convolution operation is performed on the images in the image training set, the following results are obtained:

L_{1_conv}＝Conv(L₁)；

wherein L is_1-convTo use a convolution kernel with Conv () of 1 × 1, input 256-dimensional and output 2048-dimensional convolution operations, output L_{1_conv}∈R^1,2048,68,32。

S21-2: to L_{1_conv}And performing an average operation to obtain the output characteristics of the layer:

F₁＝Avg(L_{1_conv})

wherein, F₁Is the output characteristic of Layer1 and has the dimension of F₁∈R^1,2048Avg () is the maximum mean operation.

S21-3: the operation of performing channel dimension transformation on the output features of the Layer1 Layer specifically includes:

adding a linear link layer in the classification model, compressing the dimension of the output feature to the number of categories, wherein the compressed feature is expressed as:

wherein Linear () represents a Linear chaining operation, W_cls1Is a weight coefficient of a linear link layer whose dimension is W_cls1∈R^2048,nAnd n is the number of classes in the image training set.

S22-1: the output of the second Layer2 is L₂Dimension of L₂∈R^1,512,33,16The input of the Layer is the output L of Layer1₁After convolution operation, we get:

L_{2_conv}＝Conv(L₂)

wherein L is_{2_conv}Adopting Conv () as 1 × 1 convolution kernel, inputting 512-dimensional convolution operation and outputting 2048-dimensional convolution operation, then outputting L_{2_conv}∈R^1,2048,33,16。

S22-2: to L_{2_conv}And performing an average operation to obtain the output characteristics of the layer:

F₂＝Avg(L_{2_conv})

wherein, F₂Is the output characteristic of Layer2 and has the dimension of F₂∈R^1,2048Avg () is the maximum mean operation.

S22-3: the operation of performing channel dimension transformation on the output features of the Layer2 Layer specifically includes:

adding a linear link layer, compressing the dimension of the output feature to the number of categories, wherein the compressed feature is as follows:

wherein Linear () represents a Linear chaining operation, W_cls2Is a weight coefficient of a linear link layer whose dimension is W_cls2∈R^2048,n。

S23-1: the output of the third Layer3 is L₃Dimension of L₃∈R^1,1024,17,8The input of the Layer is the output L of Layer2₂After convolution operation, we get:

L_{3_conv}＝Conv(L₃)

wherein L is_{3_conv}Adopting Conv () as 1 × 1 convolution kernel, inputting 1024-dimension and outputting 2048-dimension convolution operation, then outputting L_{3_conv}∈R^1,2048,17,8。

S23-2: to L_{3_conv}The average value operation is carried out, and the average value operation,the output characteristics of the layer are obtained:

F₃＝Avg(L_{3_conv})

wherein, F₃Is the output characteristic of Layer3 and has the dimension of F₃∈R^1,2048Avg () is the maximum mean operation.

S23-3: the operation of performing channel dimension transformation on the output features of the Layer3 Layer specifically includes:

adding a linear link layer to the output features, compressing the dimensions of the output features to the number of categories, and then the compressed features are:

wherein Linear () represents a Linear chaining operation, W_cls3Is a weight coefficient of a linear link layer whose dimension is W_cls3∈R^2048,n。

S24-1: the output of the fourth Layer4 is L₄Dimension of L₄∈R^1,2048,9,4The input of the Layer is the output L of Layer3₃Directly carrying out mean value operation on the obtained product to obtain an output characteristic F₄Dimension F₄∈R^1,2048：

F₄＝Avg(L₄)

S24-2: the operation of performing channel dimension transformation on the output features of the Layer3 Layer specifically includes:

wherein Linear () represents a Linear chaining operation, W_cls4Is a weight coefficient of a linear link layer whose dimension is W_cls4∈R^2048,n。

In step S3, performing a feature aggregation loss operation on the compressed features of each layer, specifically including:

for the features of Layer1, cosine transform is performed according to the cosine theorem:

the Layer1 Layer loss function is:

wherein N is the number of input samples,

is a linear link layer weight coefficient W_cls1The feature outputs a link weight to the ith output neuron;

is the output of the ith neuron in the linear link layer;

is the cosine included angle; m is a fixed interval parameter, and m is taken in general>0; s is a scaling factor.

Similarly, cosine transform is performed on the Layer2 Layer class output, and the loss function of the Layer is obtained as follows:

wherein the content of the first and second substances,

is a linear link layer weight coefficient W_cls2The feature outputs a link weight to the ith output neuron;

is the output of the ith neuron in the linear link layer;

is the cosine angle.

Cosine transforming the Layer3 Layer class output and obtaining the loss function of the Layer as:

wherein the content of the first and second substances,

is a linear link layer weight coefficient W_cls3The feature outputs a link weight to the ith output neuron;

is the output of the ith neuron in the linear link layer;

is the cosine included angle;

cosine transforming the Layer4 Layer class output and obtaining the loss function of the Layer as:

wherein the content of the first and second substances,

is a linear link layer weight coefficient W_cls4The feature outputs a link weight to the ith output neuron;

is the output of the ith neuron in the linear link layer;

is the cosine angle.

Based on the four types of loss functions, the total loss function of the classification model is as follows:

wherein the content of the first and second substances,

is a scaling factor.

After cosine transformation, the features are mapped to a hypersphere on an angle space, the radius of the hypersphere is controlled by a scaling factor s, the aggregation degree of the features is controlled by an interval parameter m, and with the increase of m, the difference between the features is more obvious, the polymerization degree of the features is higher, and the feature area is more compact, so that the difference between samples is increased.

In this embodiment, when the classification model is trained, the existing optimization method, such as a gradient correction method, may be used to correct the network model parameters and the feature fusion weights, and obtain the optimal parameters, so as to obtain the optimal classification model.

The embodiment enhances the difference between different colors and different dress types based on a deep learning algorithm, and fully utilizes the light color information and the deep contour information on the basis of deep abstract characteristics by training a weighting fusion coefficient. Meanwhile, the feature aggregation operation is added, the feature information is mapped to an angle space, the maximized classification boundary is obtained, and the similarity difference in the standard dressing matching process is increased.

Example 2

The embodiment provides a dressing classification system based on feature aggregation, which comprises:

It should be noted that the modules correspond to the steps described in embodiment 1, and the modules are the same as the corresponding steps in the implementation examples and application scenarios, but are not limited to the disclosure in embodiment 1. It should be noted that the modules described above as part of a system may be implemented in a computer system such as a set of computer-executable instructions.

In further embodiments, there is also provided:

an electronic device comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, the computer instructions when executed by the processor performing the method of embodiment 1. For brevity, no further description is provided herein.

It should be understood that in this embodiment, the processor may be a central processing unit CPU, and the processor may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate arrays FPGA or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and so on. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.

A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method described in embodiment 1.

The method in embodiment 1 may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor. The software modules may be located in ram, flash, rom, prom, or eprom, registers, among other storage media as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor. To avoid repetition, it is not described in detail here.

Those of ordinary skill in the art will appreciate that the various illustrative elements, i.e., algorithm steps, described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.

Claims

1. A clothing classification method based on feature aggregation is characterized by comprising the following steps:

2. The method for classifying dresses based on feature aggregation as claimed in claim 1, wherein the multi-layer feature extraction process comprises the steps of sequentially performing feature extraction on the outputs of each layer except the last layer by convolution operation and maximum mean value operation; and taking the output of more than one characteristic layer as input, and directly carrying out maximum mean value operation.

3. The method of claim 1, wherein the channel dimension transformation is to add a linear link layer to the classification model, and the feature dimension is compressed according to a weight coefficient of the linear link layer.

4. The method of claim 1, wherein the compressed features of each layer are as follows:

wherein Linear () represents a Linear chaining operation, W_clsIs a weight coefficient of a linear link layer whose dimension is W_cls∈R²⁰⁴⁸ ^,nN is the number of classes in the image training set, and F is the feature extracted from each layer.

5. The method of claim 1, wherein the method is based on feature aggregationThe method for classifying the loads is characterized in that cosine transform is carried out on the compressed features of each layer to obtain: f_cls＝||W_cls||||F||cosθ；

6. The method of claim 1, wherein the penalty function for each layer is:

wherein the content of the first and second substances,

7. The method of claim 6, wherein in the process of mapping the features to the angle space and aggregating the features, the features are mapped to a hypersphere on the angle space, the hypersphere radius is controlled by a scaling factor s, the degree of aggregation of the features is controlled by an interval parameter m, and as m increases, the higher the difference between the features, the higher the degree of aggregation of the features.

8. A clothing classification system based on feature aggregation, comprising:

9. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, the computer instructions when executed by the processor performing the method of any of claims 1-7.

10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of any one of claims 1 to 7.