CN109784350A

CN109784350A - Clothing keypoint localization method combining dilated convolution and a cascaded pyramid network

Info

Publication number: CN109784350A
Application number: CN201811634796.0A
Authority: CN
Inventors: 姚麟倩; 李锵; 关欣
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2019-05-21

Abstract

The present invention relates to a clothing keypoint localization method that combines dilated convolution with a cascaded pyramid network. The method comprises three parts: ResNet-101, GlobalNet, and RefineNet. ResNet-101 extracts image features; GlobalNet localizes the simple keypoints; RefineNet integrates the feature representations from GlobalNet and identifies the remaining difficult keypoints.

Description

Clothing keypoint localization method combining dilated convolution and a cascaded pyramid network

Technical field

The present invention relates to the fields of fashion, image processing, keypoint localization, and deep learning. It improves the Cascaded Pyramid Network (CPN) by combining it with dilated convolution in order to perform the clothing keypoint localization task.

Background

In recent years, with the rapid development of e-commerce platforms and the fashion industry, the demand for clothing analysis algorithms has become increasingly urgent. Clothing keypoint localization can effectively improve the alignment of clothing positions, speed up attribute recognition, and allow images to be classified automatically, and it has attracted wide attention. Algorithms for human keypoint detection have made great progress, but when they are carried over to the fashion industry, clothing keypoint localization still faces significant challenges because clothing varies widely in category, proportion, and appearance. For human keypoint localization, most early methods directly regressed the coordinates of the human joints, but because human motion is flexible and regression models scale poorly, such methods did not perform well.

With the development of deep learning, it has been widely applied to image classification, recognition, and keypoint detection. In 2016, Wei et al. proposed the Convolutional Pose Machines (CPM) network, which uses sequential convolutions to express spatial and texture information and yields a more robust keypoint localization algorithm. In the same year, Alejandro Newell et al. proposed the Hourglass network (Stacked Hourglass Networks), which handles single-person keypoint localization with a multi-module fully convolutional neural network (CNN); each CNN module captures image features at a different scale, from which the spatial relationships of the body are found and the joint positions are inferred. Multi-person keypoint detection algorithms then gradually appeared. The better-performing ones are top-down: they first detect each person and then localize that person's keypoints. The G-RMI algorithm proposed by Papandreou et al. in 2017 first detects the people in an image with Faster R-CNN and then localizes the keypoints precisely with a deep residual network (ResNet). In the same year, Kaiming He proposed Mask R-CNN, an improvement built on R-CNN, Fast R-CNN, and Faster R-CNN, which outperforms single models on several tasks, including instance segmentation, bounding-box detection, and human keypoint localization. The subsequent RMPE algorithm addresses the keypoint localization errors caused by inaccurate single-person detection boxes: it detects individual people with the pyramid-structured single-shot detector SSD (Single Shot Detector) and then uses an Hourglass network for single-person pose keypoint detection. To handle harder keypoints, Chen et al. proposed the Cascaded Pyramid Network (CPN): people in the image are first found with the object detection structure of Mask R-CNN, and the harder keypoints of each person are then detected by a cascade of GlobalNet (Global Pyramid Network) and RefineNet (Refined Pyramid Network). CPN won the 2017 human keypoint detection challenge. By distinguishing simple from difficult keypoints, CPN greatly improves localization accuracy, but the network still does not make good use of low-level image detail for keypoint localization and therefore needs further improvement.

Summary of the invention

The object of the present invention is to provide a clothing keypoint localization method that makes better use of low-level image detail for keypoint localization: ICPN (Improved Cascaded Pyramid Network). To address the fusion of semantic information from different levels in the keypoint localization task, ICPN uses dilated convolution to raise the spatial resolution of the high-level feature maps without shrinking their receptive field, so that more image detail is captured and keypoint detection accuracy improves further. Several data augmentation operations improve the robustness of the algorithm, and corresponding feature pruning avoids the increase in computational complexity that dilated convolution would otherwise bring. The technical solution is as follows:

A clothing keypoint localization method combining dilated convolution and a cascaded pyramid network comprises three parts: ResNet-101, GlobalNet, and RefineNet. ResNet-101 extracts image features; GlobalNet localizes the simple keypoints; RefineNet integrates the feature representations from GlobalNet and identifies the remaining difficult keypoints. The method includes:

1) ResNet-101 feature extraction network: for an input image of size N × N, features are extracted by a network whose shortcut connections skip certain layers and then rejoin the main path.

2) GlobalNet-based cascaded pyramid module for extracting features at different scales: in Conv4-Conv5, dilated convolution replaces the original convolution, enlarging the receptive field without reducing spatial resolution. The resulting feature maps have sizes 256 × N/4 × N/4, 512 × N/8 × N/8, 512 × N/8 × N/8, and 512 × N/8 × N/8, so the last three groups share the same scale. The low-level feature maps Conv2 and Conv3 have relatively high spatial resolution but relatively little semantic information, while the high-level feature maps Conv4 and Conv5 contain more semantic information without a loss of spatial resolution.

3) GlobalNet-based module for fusing features at different scales: because dilated convolution gives the last three groups of feature maps the same size, they can be added directly, and upsampling is needed only before the last level is fused.

4) RefineNet aggregates the feature representations from GlobalNet to localize the difficult keypoints; only the feature maps generated by Conv2 and Conv4 are retained in RefineNet.

5) Network training and testing: data augmentation operations such as corresponding image rotation and Gaussian blurring of the heatmaps are applied to the training images to enlarge the data and improve the robustness of the network. The network is evaluated on the test dataset, and the outputs are the clothing keypoint localization result images and the error rate of the final keypoint coordinates relative to the ground-truth labels.
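The feature map sizes in step 2 can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the usual ResNet layout in which Conv2 runs at 1/4 of the input resolution and each later stage halves it unless its stride is replaced by dilation. The function names are hypothetical, and the dilation rates 2 and 4 in the usage note are a common choice rather than values stated in this document.

```python
def effective_kernel(kernel, dilation):
    """Extent covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def stage_sizes(n, dilated):
    """Side length of the Conv2..Conv5 feature maps for an N x N input.

    Assumes Conv2 runs at N/4 and every later stage halves the resolution,
    except that with dilated=True the stride of Conv4 and Conv5 is replaced
    by dilation, so those stages keep the resolution of Conv3.
    """
    size = n // 4
    sizes = [size]
    for stage in ("Conv3", "Conv4", "Conv5"):
        keeps_resolution = dilated and stage in ("Conv4", "Conv5")
        if not keeps_resolution:
            size //= 2
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print("original pyramid:", stage_sizes(256, dilated=False))
    print("dilated pyramid: ", stage_sizes(256, dilated=True))
    print("3x3 kernel extent at dilation 1, 2, 4:",
          [effective_kernel(3, d) for d in (1, 2, 4)])
```

For N = 256 the dilated variant gives side lengths N/4, N/8, N/8, N/8, matching the sizes listed in step 2, while a 3 × 3 kernel covers a 5 × 5 or 9 × 9 window at the assumed dilation rates of 2 and 4, which is how the receptive field grows without further downsampling.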

The present invention performs clothing keypoint localization by combining dilated convolution with the CPN network. Compared with some classical methods, its main advantages are:

Novelty: the present invention brings artificial intelligence to the fashion field and effectively improves the precision of clothing keypoint localization, which has considerable commercial value in scenarios such as e-commerce and fashion matching. It improves the CPN network with dilated convolution, overcoming the tendency of the feature pyramid in the original network to lose much of the low-level detail in the deep feature maps, and achieves good results on the clothing keypoint localization task.

Robustness: the algorithm is applicable to a variety of keypoint localization tasks, and data augmentation operations such as corresponding image rotation and heatmap Gaussian blurring further improve the robustness of the model.
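The document does not spell out how the heatmap Gaussian blurring is performed. One common reading, sketched here under that assumption, is that each keypoint label is rendered as a 2-D Gaussian bump on its heatmap instead of a single hot pixel. The function names and the default `sigma` are hypothetical.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=2.0):
    """Return a height x width grid that peaks at 1.0 on the keypoint (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def peak_location(heatmap):
    """Return the (x, y) position of the largest heatmap value."""
    _, x, y = max(
        (value, x, y)
        for y, row in enumerate(heatmap)
        for x, value in enumerate(row)
    )
    return x, y


if __name__ == "__main__":
    heatmap = gaussian_heatmap(width=16, height=12, cx=5, cy=7)
    print("peak at", peak_location(heatmap))
```

A larger `sigma` spreads the target over more pixels, which tolerates small labelling errors at the cost of a less sharply defined peak.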

Brief description of the drawings

Fig. 1: Framework of the improved CPN algorithm

Fig. 2: Schematic of the ResNet network structure

Fig. 3: Schematic of the dilated convolution principle, panels (a), (b), and (c)

Fig. 4: Schematic of the feature pyramid structure: (a) original feature pyramid; (b) improved feature pyramid

Fig. 5: Comparison of the detection results of different networks: (a) ground truth; (b) Hourglass; (c) CPM; (d) CPN; (e) ICPN

Detailed description

The present invention comprises three parts: ResNet-101, GlobalNet, and RefineNet. ResNet-101 extracts image features. GlobalNet fuses features through the pyramid structure improved with dilated convolution and localizes the simple keypoints. RefineNet integrates the feature representations from GlobalNet, aggregates features of different dimensions, and identifies the remaining difficult keypoints. To avoid the increase in computational complexity caused by dilated convolution, the invention prunes the features in this part and retains only the feature maps generated by Conv2 and Conv4.
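The top-down fusion performed by GlobalNet can be illustrated on toy data. The sketch below uses single-channel nested lists in place of real feature maps. It assumes that the three deeper maps share one size, that nearest-neighbour upsampling is used for the single remaining size change, and it omits the lateral convolutions that a real pyramid network would apply. All function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D grid."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two grids that must have the same size."""
    assert len(a) == len(b) and len(a[0]) == len(b[0]), "maps must share a size"
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fuse_top_down(c2, c3, c4, c5):
    """Top-down fusion when C3, C4, and C5 already share one spatial size."""
    p5 = c5
    p4 = add_maps(c4, p5)              # same size: direct addition
    p3 = add_maps(c3, p4)              # same size: direct addition
    p2 = add_maps(c2, upsample2x(p3))  # the only step that needs upsampling
    return p2, p3, p4, p5


if __name__ == "__main__":
    c2 = [[4] * 4 for _ in range(4)]   # N/4 x N/4 for N = 16
    c3 = [[3] * 2 for _ in range(2)]   # N/8 x N/8
    c4 = [[2] * 2 for _ in range(2)]
    c5 = [[1] * 2 for _ in range(2)]
    p2, p3, p4, p5 = fuse_top_down(c2, c3, c4, c5)
    print(len(p2), "x", len(p2[0]), "fused map, first row:", p2[0])
```

In an undilated pyramid every one of the three additions would need its own upsampling step; here only the last one does.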

Fig. 1 shows the overall algorithm framework of the invention. Clothing keypoint localization combining dilated convolution with CPN comprises the following steps:

1) Feature extraction network. For an input image of size N × N, the invention first extracts features with ResNet-101, as shown in Fig. 2. Compared with an ordinary network, ResNet introduces shortcut connections that skip certain layers and then rejoin the main path, so that the error at the lower layers can be propagated upward through the shortcuts. This alleviates the vanishing gradient problem, speeds up training of the network model, and improves the training result without adding parameters or computational complexity.

2) GlobalNet cascaded pyramid module for extracting features at different scales. As shown in Fig. 4, C2-C5 denote the feature maps generated by Conv2-Conv5 of the residual network. Unlike the original pyramid structure, in which the feature map scale decreases layer by layer (Fig. 4(a)), the invention (Fig. 4(b)) replaces the original convolution in Conv4-Conv5 with dilated convolution (Fig. 3), enlarging the receptive field without reducing spatial resolution. The resulting feature maps have sizes 256 × N/4 × N/4, 512 × N/8 × N/8, 512 × N/8 × N/8, and 512 × N/8 × N/8. The low-level feature maps C2 and C3 have relatively high spatial resolution but relatively little semantic information, while the high-level feature maps C4 and C5 contain more semantic information without a loss of spatial resolution.

3) GlobalNet module for fusing pyramid features at different scales. As shown in Fig. 4, in the original pyramid structure every level of feature map must be upsampled during top-down feature fusion so that the scales match before the maps are added, which inevitably degrades feature map quality. In the present invention, dilated convolution gives the last three groups of feature maps the same size, so they can be added directly, and upsampling is needed only before the last level is fused.

4) RefineNet aggregates the feature representations from GlobalNet to localize the difficult keypoints. The invention retains only the feature maps generated by Conv2 and Conv4 in RefineNet, aiming to reduce feature redundancy without affecting keypoint localization and thereby to offset the additional computation introduced by dilated convolution.

5) The network is trained on the clothing keypoint localization dataset of the Alibaba Tianchi 2018 FashionAI competition. Data augmentation operations such as corresponding image rotation and heatmap Gaussian blurring are applied to the training images to enlarge the data and improve the robustness of the network. The network is evaluated on the test dataset, and the outputs are the clothing keypoint localization result images and the error rate of the final keypoint coordinates relative to the ground-truth labels.
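When a training image is rotated for augmentation in step 5, its keypoint labels have to be rotated by the same angle about the same centre, or the labels no longer line up with the clothing. The sketch below shows only the coordinate part. It assumes rotation about the image centre, and the function name and the sign convention for the angle are illustrative choices that must match whatever routine rotates the pixels.

```python
import math


def rotate_keypoints(points, angle_deg, width, height):
    """Rotate (x, y) keypoints about the image centre by angle_deg.

    With the y axis pointing down, as in image coordinates, a positive
    angle here moves a point on the positive x side toward larger y.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + cos_t * dx - sin_t * dy,
                        cy + sin_t * dx + cos_t * dy))
    return rotated


if __name__ == "__main__":
    keypoints = [(4.0, 2.0), (1.0, 3.0)]
    turned = rotate_keypoints(keypoints, 30.0, width=5, height=5)
    restored = rotate_keypoints(turned, -30.0, width=5, height=5)
    print(turned)
    print(restored)
```

Rotating by an angle and then by its negative returns the original coordinates up to floating-point error, which is a quick sanity check for an augmentation pipeline.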

Fig. 5 compares the detection results of the different networks. The keypoint localization results of the present invention are closest to the ground truth, while the other networks show larger localization errors; keypoints that resemble the background are especially hard to localize, and the gap between the algorithms is clear. The normalized error rates are listed in Table 1. The invention obtains the best result in each of the five clothing categories (tops, outerwear, dresses, skirts, and trousers) and overall.
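The document reports normalized error rates without giving the formula. The sketch below shows one plausible form: the mean, over labelled keypoints, of the Euclidean distance between prediction and ground truth divided by a per-image normalization distance. The function name, the `visible` flags, and the choice of `norm_dist` are all assumptions made for illustration.

```python
import math


def normalized_error(pred, truth, visible, norm_dist):
    """Mean of distance / norm_dist over the keypoints flagged as visible."""
    total, count = 0.0, 0
    for (px, py), (tx, ty), is_visible in zip(pred, truth, visible):
        if not is_visible:
            continue  # unlabelled keypoints do not contribute
        total += math.hypot(px - tx, py - ty) / norm_dist
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    pred = [(3.0, 4.0), (0.0, 0.0), (10.0, 10.0)]
    truth = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
    visible = [True, True, False]
    print(normalized_error(pred, truth, visible, norm_dist=10.0))
```

Dividing by a reference distance makes errors comparable across images of different sizes and garments of different scales, which is why a lower value indicates better localization regardless of resolution.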

Table 1. Normalized detection error rates of the different networks

Claims

1. A clothing keypoint localization method combining dilated convolution and a cascaded pyramid network, comprising three parts: ResNet-101, GlobalNet, and RefineNet, wherein ResNet-101 extracts image features; GlobalNet localizes the simple keypoints; and RefineNet integrates the feature representations from GlobalNet and identifies the remaining difficult keypoints. The method includes:

1) ResNet-101 feature extraction network: for an input image of size N × N, features are extracted by a network whose shortcut connections skip certain layers and then rejoin the main path;

2) GlobalNet-based cascaded pyramid module for extracting features at different scales: in Conv4-Conv5, dilated convolution replaces the original convolution, enlarging the receptive field without reducing spatial resolution; the resulting feature maps have sizes 256 × N/4 × N/4, 512 × N/8 × N/8, 512 × N/8 × N/8, and 512 × N/8 × N/8, so the last three groups share the same scale; the low-level feature maps Conv2 and Conv3 have relatively high spatial resolution but relatively little semantic information, while the high-level feature maps Conv4 and Conv5 contain more semantic information without a loss of spatial resolution;

3) GlobalNet-based module for fusing features at different scales: because dilated convolution gives the last three groups of feature maps the same size, they can be added directly, and upsampling is needed only before the last level is fused;

4) RefineNet aggregates the feature representations from GlobalNet to localize the difficult keypoints, and only the feature maps generated by Conv2 and Conv4 are retained in RefineNet;

5) Network training and testing: data augmentation operations such as corresponding image rotation and Gaussian blurring of the heatmaps are applied to the training images to enlarge the data and improve the robustness of the network; the network is evaluated on the test dataset, and the outputs are the clothing keypoint localization result images and the error rate of the final keypoint coordinates relative to the ground-truth labels.