CN108229497A - Image processing method, device, storage medium, computer program and electronic equipment

Info

Publication number: CN108229497A
Application number: CN201710632941.0A
Authority: CN
Inventors: 杨巍; 欧阳万里; 李爽; 李鸿升; 王晓刚
Original assignee: Beijing Sensetime Technology Development Co Ltd
Current assignee: Beijing Sensetime Technology Development Co Ltd
Priority date: 2017-07-28
Filing date: 2017-07-28
Publication date: 2018-06-29
Anticipated expiration: 2037-07-28
Also published as: CN108229497B; WO2019020075A1

Abstract

An embodiment of the present invention provides an image processing method, apparatus, storage medium, computer program and electronic equipment. The image processing method includes: obtaining a feature map of an image to be detected; performing, by a neural network, feature extraction on the feature map based on at least two different scales to obtain at least two other feature maps; and merging the feature map with each of the other feature maps to obtain a first feature map of the image to be detected. With the technical solution of the embodiment, the neural network can learn and extract features of different scales, which improves the accuracy and robustness of feature extraction.

Description

Image processing method, device, storage medium, computer program and electronic equipment

Technical field

Embodiments of the present invention relate to the technical field of computer vision, and in particular to an image processing method, apparatus, storage medium, computer program and electronic equipment.

Background

Human pose estimation locates the positions of the parts of a human body in a given image or video. It is an important research topic in the field of computer vision and is mainly applied to action recognition, behavior recognition, clothing parsing, task comparison, human-computer interaction and the like.

At present, human pose estimation methods rely on the features detected by object detectors, and existing object detectors are generally trained at a fixed size.

Summary of the invention

Embodiments of the present invention provide an image processing scheme.

According to a first aspect of the embodiments of the present invention, an image processing method is provided, including: obtaining a feature map of an image to be detected; performing, by a neural network, feature extraction on the feature map based on at least two different scales to obtain at least two other feature maps; and merging the feature map with each of the other feature maps to obtain a first feature map of the image to be detected.

Optionally, the method further includes: performing key point detection on a target object in the image to be detected according to the first feature map.

Optionally, performing key point detection on the target object according to the first feature map includes: obtaining, according to the first feature map, a score map for each of at least one key point of the target object; and determining the position of the corresponding key point of the target object according to the scores of the pixels included in each score map.
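As an illustration of the last step, the following is a minimal sketch of reading a key point position off each score map. It assumes that the position is taken at the highest-scoring pixel; the text above only says that the position is determined from the pixel scores, so the argmax rule, the function name `locate_keypoints` and the nested-list layout are assumptions made for this example.

```python
def locate_keypoints(score_maps):
    """Return one (row, col, score) triple per score map.

    score_maps: list of 2D lists, one score map per key point
    (hypothetical layout chosen for this sketch).
    """
    positions = []
    for score_map in score_maps:
        best = None
        for r, row in enumerate(score_map):
            for c, score in enumerate(row):
                # Assumption: the key point sits at the highest-scoring pixel.
                if best is None or score > best[2]:
                    best = (r, c, score)
        positions.append(best)
    return positions


# Two toy 3x3 score maps, for two different key points.
toy_score_maps = [
    [[0.1, 0.2, 0.1],
     [0.3, 0.9, 0.2],
     [0.0, 0.1, 0.1]],
    [[0.0, 0.1, 0.8],
     [0.2, 0.1, 0.3],
     [0.1, 0.0, 0.2]],
]
print(locate_keypoints(toy_score_maps))  # [(1, 1, 0.9), (0, 2, 0.8)]
```

Other selection rules, such as thresholding or sub-pixel refinement, would fit the same interface.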

Optionally, the neural network includes at least one feature pyramid sub-network, and the feature pyramid sub-network includes a first branch network and at least one second branch network, each connected in parallel with the first branch network. The other feature maps include a second feature map or a third feature map. The first branch network performs feature extraction on the feature map based on the original scale of the feature map to obtain the second feature map; each second branch network performs feature extraction on the feature map based on another scale different from the original scale to obtain the third feature map.

Optionally, the first branch network includes a second convolutional layer, a third convolutional layer and a fourth convolutional layer. The second convolutional layer reduces the dimension of the feature map; the third convolutional layer performs convolution on the dimension-reduced feature map based on the original scale of the feature map; the fourth convolutional layer raises the dimension of the convolved feature map to obtain the second feature map.

Optionally, at least one second branch network includes a fifth convolutional layer, a downsampling layer, a sixth convolutional layer, an upsampling layer and a seventh convolutional layer. The fifth convolutional layer reduces the dimension of the feature map; the downsampling layer downsamples the dimension-reduced feature map according to a set downsampling ratio, where the scale of the downsampled feature map is smaller than the original scale of the feature map; the sixth convolutional layer performs convolution on the downsampled feature map; the upsampling layer upsamples the convolved feature map according to a set upsampling ratio, where the scale of the upsampled feature map equals the original scale of the feature map; the seventh convolutional layer raises the dimension of the upsampled feature map to obtain the third feature map.

Optionally, there are multiple second branch networks; the set downsampling ratios of at least two of the second branch networks are different, and/or the set downsampling ratios of at least two of the second branch networks are the same.

Optionally, there are multiple second branch networks; the sixth convolutional layers of at least two of the second branch networks share parameters.

Optionally, the second branch network includes a fifth convolutional layer, a dilated convolutional layer and a seventh convolutional layer. The fifth convolutional layer reduces the dimension of the feature map; the dilated convolutional layer performs dilated convolution on the dimension-reduced feature map; the seventh convolutional layer raises the dimension of the feature map after dilated convolution to obtain the third feature map.

Optionally, there are multiple second branch networks; at least two of the second branch networks share the fifth convolutional layer and/or the seventh convolutional layer; and/or at least two of the second branch networks each have their own fifth convolutional layer and/or seventh convolutional layer.

Optionally, the feature pyramid sub-network further includes a first output merging layer. The first output merging layer merges the respective outputs, before the seventh convolutional layer, of the at least two second branch networks that share the seventh convolutional layer, and outputs the merged result to the shared seventh convolutional layer.

Optionally, the neural network includes at least two feature pyramid sub-networks. Each feature pyramid sub-network takes as input the first feature map output by the preceding feature pyramid sub-network connected to it and, from that input, extracts its own first feature map based on different scales.

Optionally, the neural network is an hourglass (HOURGLASS) neural network, and at least one hourglass module included in the hourglass neural network includes at least one feature pyramid sub-network.

Optionally, the initial network parameters of at least one network layer of the neural network are drawn from a network parameter distribution determined by the mean and variance of the initial network parameters, and the mean of the initial network parameters is zero.
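A minimal sketch of such an initialization follows. The text above fixes only a zero mean and a distribution determined by mean and variance; the Gaussian shape, the variance value 0.01 and the name `init_weights` are assumptions made for illustration.

```python
import random


def init_weights(n_weights, variance, seed=0):
    """Draw initial weights from a zero-mean distribution with the given variance.

    Assumption: a Gaussian is used; the source only states that the
    distribution is determined by its mean (zero) and variance.
    """
    rng = random.Random(seed)
    std = variance ** 0.5
    return [rng.gauss(0.0, std) for _ in range(n_weights)]


weights = init_weights(10000, variance=0.01)
sample_mean = sum(weights) / len(weights)
sample_var = sum((w - sample_mean) ** 2 for w in weights) / len(weights)
print(round(sample_mean, 3), round(sample_var, 3))
```

With enough samples, the empirical mean stays close to zero and the empirical variance close to the chosen value.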

Optionally, if the neural network contains a case in which at least two identity mappings are added together, an output adjustment module is provided in at least one of the identity mapping branches to be added, and the first feature map output by that identity mapping branch is adjusted by the output adjustment module.

According to a second aspect of the embodiments of the present invention, an image processing apparatus is provided, including: an acquisition module configured to obtain a feature map of an image to be detected; an extraction module configured to perform, by a neural network, feature extraction on the feature map based on at least two different scales to obtain at least two other feature maps; and a merging module configured to merge the feature map with each of the other feature maps to obtain a first feature map of the image to be detected.

Optionally, the apparatus further includes: a detection module configured to perform key point detection on a target object in the image to be detected according to the first feature map.

Optionally, the detection module includes: an obtaining unit configured to obtain, according to the first feature map, a score map for each of at least one key point of the target object; and a determination unit configured to determine the position of the corresponding key point of the target object according to the scores of the pixels included in each score map.

Optionally, the neural network includes at least one feature pyramid sub-network, and the feature pyramid sub-network includes a first branch network and at least one second branch network, each connected in parallel with the first branch network. The other feature maps include a second feature map or a third feature map. The first branch network is configured to perform feature extraction on the feature map based on the original scale of the feature map to obtain the second feature map; each second branch network is configured to perform feature extraction on the feature map based on another scale different from the original scale to obtain the third feature map.

Optionally, the first branch network includes a second convolutional layer, a third convolutional layer and a fourth convolutional layer. The second convolutional layer is configured to reduce the dimension of the feature map; the third convolutional layer is configured to perform convolution on the dimension-reduced feature map based on the original scale of the feature map; the fourth convolutional layer is configured to raise the dimension of the convolved feature map to obtain the second feature map.

Optionally, at least one second branch network includes a fifth convolutional layer, a downsampling layer, a sixth convolutional layer, an upsampling layer and a seventh convolutional layer. The fifth convolutional layer is configured to reduce the dimension of the feature map; the downsampling layer is configured to downsample the dimension-reduced feature map according to a set downsampling ratio, where the scale of the downsampled feature map is smaller than the original scale of the feature map; the sixth convolutional layer is configured to perform convolution on the downsampled feature map; the upsampling layer is configured to upsample the convolved feature map according to a set upsampling ratio, where the scale of the upsampled feature map equals the original scale of the feature map; the seventh convolutional layer is configured to raise the dimension of the upsampled feature map to obtain the third feature map.

Optionally, the second branch network includes a fifth convolutional layer, a dilated convolutional layer and a seventh convolutional layer. The fifth convolutional layer is configured to reduce the dimension of the feature map; the dilated convolutional layer is configured to perform dilated convolution on the dimension-reduced feature map; the seventh convolutional layer is configured to raise the dimension of the feature map after dilated convolution to obtain the third feature map.

Optionally, the feature pyramid sub-network further includes a first output merging layer. The first output merging layer is configured to merge the respective outputs, before the seventh convolutional layer, of the at least two second branch networks that share the seventh convolutional layer, and to output the merged result to the shared seventh convolutional layer.

Optionally, the neural network includes at least two feature pyramid sub-networks. Each feature pyramid sub-network is configured to take as input the first feature map output by the preceding feature pyramid sub-network connected to it and, from that input, to extract its own first feature map based on different scales.

Optionally, if the neural network contains a case in which at least two identity mappings are added together, an output adjustment module is provided in at least one of the identity mapping branches to be added, and the output adjustment module is configured to adjust the first feature map output by that identity mapping branch.

According to a third aspect of the embodiments of the present invention, a computer-readable storage medium is provided, on which computer program instructions are stored, where the program instructions, when executed by a processor, implement the steps of any of the foregoing image processing methods.

According to a fourth aspect of the embodiments of the present invention, electronic equipment is provided, including: a processor, a memory, a communication element and a communication bus, where the processor, the memory and the communication element communicate with one another through the communication bus; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to any of the foregoing image processing methods.

According to a fifth aspect of the embodiments of the present invention, a computer program is provided, including at least one executable instruction which, when executed by a processor, implements the operations corresponding to any of the foregoing image processing methods.

According to the image processing scheme of the embodiments of the present invention, after the feature map of the image to be detected is obtained, a neural network performs feature extraction on the feature map based on multiple different scales to obtain multiple other feature maps, and the feature map is merged with the multiple other feature maps to obtain the first feature map of the image to be detected. The neural network thus learns and extracts features of different scales, which improves the accuracy and robustness of its feature extraction.

Description of the drawings

Fig. 1 is a flowchart of the steps of an image processing method according to Embodiment One of the present invention;

Fig. 2 is a flowchart of the steps of an image processing method according to Embodiment Two of the present invention;

Fig. 3 is a first schematic structural diagram of the feature pyramid sub-network according to Embodiment Two of the present invention;

Fig. 4 is a second schematic structural diagram of the feature pyramid sub-network according to Embodiment Two of the present invention;

Fig. 5 is a third schematic structural diagram of the feature pyramid sub-network according to Embodiment Two of the present invention;

Fig. 6 is a schematic structural diagram of a neural network for image processing according to Embodiment Two of the present invention;

Fig. 7 is a schematic structural diagram of an HOURGLASS network according to Embodiment Two of the present invention;

Fig. 8 shows the score maps output by the image processing method according to Embodiment Two of the present invention;

Fig. 9 is a schematic structural diagram of identity mapping addition according to Embodiment Two of the present invention;

Fig. 10 is a schematic structural diagram of an image processing apparatus according to Embodiment Three of the present invention;

Fig. 11 is a schematic structural diagram of electronic equipment according to Embodiment Four of the present invention.

Detailed description

Specific implementations of the embodiments of the present invention are described in further detail below with reference to the accompanying drawings (identical reference numerals denote identical elements throughout the drawings) and the embodiments. The following embodiments illustrate the present invention but do not limit its scope.

Those skilled in the art will understand that terms such as "first" and "second" in the embodiments of the present invention are used only to distinguish different steps, devices, modules and the like; they carry no particular technical meaning and do not indicate any necessary logical order among them.

Embodiment one

Referring to Fig. 1, a flowchart of the steps of an image processing method according to Embodiment One of the present invention is shown.

The image processing method of this embodiment includes the following steps:

Step S102: Obtain a feature map of an image to be detected.

In this embodiment, any image analysis and processing method may be used to perform feature extraction on the image to be detected to obtain its feature map. Optionally, feature extraction is performed on the image to be detected by, for example, a convolutional neural network, to obtain a feature map (Feature Map) containing the feature information of the image to be detected. The image to be detected may be an independent still image or any frame of a video sequence.

Note that the obtained feature map may be a global feature map of the image to be detected or a non-global feature map; this embodiment does not limit this. For example, in practical applications, depending on the application scenario in which the obtained feature map is used, such as image processing or object recognition, a global feature map of the image to be detected or a local feature map containing the object may be obtained, respectively.

Step S104: Perform, by a neural network, feature extraction on the feature map based on at least two different scales to obtain at least two other feature maps.

The at least two other feature maps are obtained by the neural network performing further feature extraction on the feature map of the image to be detected based on each of the at least two different scales; each scale corresponds to one other feature map.

The scale on which the neural network bases its feature extraction limits the scale of the features that the extraction produces. In the embodiments of the present invention, the neural network performs feature extraction on the image to be detected based on different scales; by learning and extracting features of different scales, it can extract the features of the image to be detected stably and accurately. The embodiments can thus cope effectively with changes in the feature scale of the image to be detected caused by, for example, occlusion or perspective, which improves the robustness of feature extraction.

In practical applications, the scales on which feature extraction is based may differ in the physical size of the image, or in the size of the effective part of the image, among other possibilities. For example, although the physical size of the image is the same, the pixel values of some of its pixels may be processed by, for example but not limited to, setting them to zero; the part made up of the pixels other than these processed pixels is the effective part, and its size is smaller than the physical size of the image.

Optionally, the at least two different scales may include the original scale of the image to be detected and at least one scale different from the original scale, or may include at least two different scales that each differ from the original scale.

Step S106: Merge the feature map with each of the other feature maps to obtain a first feature map of the image to be detected.

Merging the feature map with each of the other feature maps yields a first feature map that contains the extracted features of different scales. The merged first feature map can be used for subsequent image processing of the image to be detected, such as key point detection, object detection, object recognition, image segmentation and object clustering, and can improve the results of that subsequent processing.

According to the image processing method of the embodiments of the present invention, after the feature map of the image to be detected is obtained, a neural network performs feature extraction on the feature map based on multiple different scales to obtain multiple other feature maps, and the feature map is merged with the multiple other feature maps to obtain the first feature map of the image to be detected. The neural network thus learns and extracts features of different scales, which improves the accuracy and robustness of its feature extraction.
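The data flow of steps S102 to S106 can be sketched on a single-channel toy feature map as follows. This is only an illustration under stated assumptions: the learned convolutions are omitted, average pooling and nearest-neighbour upsampling stand in for the scale change, merging is done by element-wise addition, and all function names and the ratios 2 and 4 are hypothetical.

```python
def downsample(fm, ratio):
    """Average-pool a 2D map by an integer ratio (sides must be divisible by it)."""
    h, w = len(fm), len(fm[0])
    return [
        [sum(fm[r * ratio + i][c * ratio + j]
             for i in range(ratio) for j in range(ratio)) / (ratio * ratio)
         for c in range(w // ratio)]
        for r in range(h // ratio)
    ]


def upsample(fm, ratio):
    """Nearest-neighbour upsampling of a 2D map by an integer ratio."""
    return [
        [fm[r // ratio][c // ratio] for c in range(len(fm[0]) * ratio)]
        for r in range(len(fm) * ratio)
    ]


def merge(maps):
    """Element-wise sum of equally sized 2D maps."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) for c in range(w)] for r in range(h)]


def multi_scale_merge(feature_map, ratios=(2, 4)):
    # One "other feature map" per scale; a real network would apply a
    # learned convolution between downsampling and upsampling.
    others = [upsample(downsample(feature_map, k), k) for k in ratios]
    return merge([feature_map] + others)


feature_map = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
first_feature_map = multi_scale_merge(feature_map)
print(first_feature_map[0][0], first_feature_map[3][3])  # 10.0 35.0
```

The merged map keeps the size of the input feature map, while each entry now also carries information pooled over larger neighbourhoods.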

Any image processing method provided by the embodiments of the present invention may be performed by any suitable device with data processing capability, including but not limited to terminal devices and servers. Alternatively, any image processing method provided by the embodiments may be performed by a processor, for example by the processor invoking corresponding instructions stored in a memory. This is not repeated below.

Embodiment two

Referring to Fig. 2, a flowchart of the steps of an image processing method according to Embodiment Two of the present invention is shown.

Step S202: Obtain a feature map of an image to be detected.

In this embodiment, feature extraction is performed on the image to be detected by a neural network to obtain the feature map. For example, the neural network includes a convolutional layer (Convolution, Conv) for feature extraction, which performs preliminary detection and feature extraction on the image to be detected that is input to the neural network, obtaining an initial feature map of the image to be detected.

Step S204: Perform, by a neural network, feature extraction on the feature map based on at least two different scales to obtain at least two other feature maps.

Optionally, the neural network includes at least one feature pyramid sub-network, which performs feature extraction on the feature map based on at least two different scales to obtain at least two other feature maps. The feature pyramid sub-network includes a first branch network and at least one second branch network, each connected in parallel with the first branch network. The first branch network performs further feature extraction on the feature map input to the feature pyramid sub-network based on the original scale of the image to be detected, obtaining a second feature map; each second branch network performs further feature extraction on the feature map based on another scale different from the original scale, obtaining a third feature map. That is, the at least two other feature maps include the second feature map and the third feature map.

In an optional implementation, referring to Fig. 3, the first branch network includes a second convolutional layer (Convolution 2, Conv 2), a third convolutional layer (Conv 3) and a fourth convolutional layer (Conv 4). At least one second branch network includes a fifth convolutional layer (Conv 5), a downsampling layer, a sixth convolutional layer (Conv 6), an upsampling layer and a seventh convolutional layer (Conv 7).

The first branch network is denoted f₀, and the second branch networks are denoted f₁ to f_c, where f₀ retains the original scale of the input features. The feature map input to the feature pyramid sub-network is fed to each of f₀ to f_c. The second convolutional layer of f₀ and the fifth convolutional layers of f₁ to f_c may each be a 1 × 1 convolution, used to reduce the dimension of the input feature map. The downsampling layers of f₁ to f_c downsample the dimension-reduced feature maps output by their respective fifth convolutional layers according to the set downsampling ratios Ratio 1 to Ratio c, yielding feature maps of different resolutions; the scale of each downsampled feature map is smaller than the original scale of the feature map. The third convolutional layer of f₀ and the sixth convolutional layers of f₁ to f_c may each be a 3 × 3 convolution; they convolve, respectively, the dimension-reduced feature map output by the second convolutional layer and the downsampled feature maps output by the corresponding downsampling layers, learning and extracting features of different scales. The upsampling layers of f₁ to f_c upsample, each at its own upsampling ratio, the convolved feature maps output by the respective sixth convolutional layers, so that the scale of each upsampled feature map equals the original scale of the feature map. The fourth convolutional layer of f₀ raises the dimension of the convolved feature map output by the third convolutional layer, obtaining the second feature map. The seventh convolutional layers of f₁ to f_c raise the dimension of the upsampled feature maps output by the corresponding upsampling layers, obtaining the respective third feature maps.
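To make the sequence of layers in one second branch network concrete, the sketch below traces the tensor shape (channels, height, width) through Conv 5, the downsampling layer, Conv 6, the upsampling layer and Conv 7. The 256 × 64 × 64 input size, the reduced channel count of 128, the ratio 2 and the padding of 1 for the 3 × 3 convolution are assumptions chosen for illustration; only the order of the layers and the 1 × 1 and 3 × 3 kernel sizes come from the description.

```python
def trace_second_branch(channels, height, width, reduced_channels, ratio):
    """List the (name, shape) of the feature map after each stage of one branch."""
    stages = [("input", (channels, height, width))]
    # Conv 5: a 1x1 convolution changes only the channel count.
    stages.append(("Conv 5 (1x1, reduce)", (reduced_channels, height, width)))
    # Downsampling layer: spatial size shrinks by the set ratio.
    small_h, small_w = height // ratio, width // ratio
    stages.append(("downsample", (reduced_channels, small_h, small_w)))
    # Conv 6: 3x3 convolution; padding 1 (assumed) keeps the spatial size.
    stages.append(("Conv 6 (3x3, pad 1)", (reduced_channels, small_h, small_w)))
    # Upsampling layer: the scale returns to that of the input feature map.
    stages.append(("upsample", (reduced_channels, height, width)))
    # Conv 7: a 1x1 convolution restores the channel count.
    stages.append(("Conv 7 (1x1, restore)", (channels, height, width)))
    return stages


branch_stages = trace_second_branch(256, 64, 64, reduced_channels=128, ratio=2)
for name, shape in branch_stages:
    print(f"{name:24s} {shape}")
```

Because the output shape matches the input shape, the third feature map can later be added to the second feature map of f₀ and to the input feature map.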

Among the second branch networks f₁ to f_c, the set downsampling ratios of at least two second branch networks are different, and/or the set downsampling ratios of at least two second branch networks are the same. That is, the downsampling ratios used by the second branch networks may be all different, partly the same, or all the same. In all three cases, together with the first branch network, which works at the original scale, the feature pyramid sub-network can extract different features based on at least two different scales.

In addition, because f₀ retains the original scale of the input features and does not change the feature resolution, f₀ does not use a downsampling layer or an upsampling layer; in practical applications, f₀ may instead use a downsampling layer and an upsampling layer whose ratios are both 1.

Optionally, the sixth convolutional layers of at least two second branch networks share parameters. For example, the sixth convolutional layers of at least two second branch networks share a convolution kernel, that is, the kernels of at least two sixth convolutional layers have identical parameters. This internal parameter-sharing mechanism reduces the number of parameters while still allowing higher accuracy to be obtained from parameters learned from the data and the task.
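The saving from sharing the sixth convolutional layer can be estimated with simple arithmetic. The sketch below counts weights and biases of a 3 × 3 convolution; the channel count of 128 and the four branches are assumed values, not taken from the description.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of one square convolution kernel bank."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def sixth_layer_params(num_branches, channels, shared):
    """Total parameters of the 3x3 sixth convolutional layers across branches."""
    per_layer = conv_params(channels, channels, 3)
    # With sharing, every branch reuses the same kernel bank.
    return per_layer if shared else per_layer * num_branches


unshared_total = sixth_layer_params(num_branches=4, channels=128, shared=False)
shared_total = sixth_layer_params(num_branches=4, channels=128, shared=True)
print(unshared_total, shared_total)  # 590336 147584
```

Under these assumptions, sharing cuts the parameters of these layers by a factor equal to the number of branches.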

In another optional implementation, the feature pyramid sub-network may take the structure shown in Fig. 4, in which at least one second branch network includes a fifth convolutional layer, a dilated convolutional layer and a seventh convolutional layer. The fifth convolutional layer reduces the dimension of the feature map; the dilated convolutional layer performs dilated convolution on the dimension-reduced feature map; the seventh convolutional layer raises the dimension of the feature map after dilated convolution, obtaining the third feature map. That is, the downsampling layer, sixth convolutional layer and upsampling layer of at least one second branch network are replaced by a dilated convolutional layer (dilated convolution, labeled dstride 1 to dstride c in the figure). This simplifies the network structure inside the feature pyramid sub-network and can increase the resolution of the input features: the dilated convolutional layer carries out the sampling of features at different resolutions, the extraction of features of different scales, the sampling of features at the same resolution, and so on, thereby obtaining features of different scales. Dilated convolution can also achieve downsampling: for example, by setting the pixel values of some pixels of the feature map to 0, the part of the feature map that has valid pixel values becomes smaller while the physical size of the image stays the same, which likewise achieves a downsampling effect.
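How a dilated convolution widens the region seen by a fixed-size kernel can be shown in one dimension. The sketch below is a plain "valid" (unpadded) cross-correlation with a dilation factor; the function names and the toy signal are hypothetical, and a real layer would use learned two-dimensional kernels.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid 1D cross-correlation whose kernel taps are `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation  # distance from first to last tap
    return [
        sum(kernel[i] * signal[start + i * dilation] for i in range(len(kernel)))
        for start in range(len(signal) - span)
    ]


def effective_kernel_size(kernel_size, dilation):
    """Number of input samples covered by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
print(dilated_conv1d(signal, kernel, 1))  # [-2, -2, -2, -2, -2]
print(dilated_conv1d(signal, kernel, 2))  # [-4, -4, -4]
print(effective_kernel_size(3, 2))        # 5
```

The same three kernel weights cover five input samples at dilation 2, which is how different dilation factors give access to different scales without explicit downsampling and upsampling layers.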

Optionally, at least two second branch networks share the fifth convolutional layer and/or the seventh convolutional layer; and/or at least two second branch networks each have their own fifth convolutional layer and/or seventh convolutional layer.

For example, to simplify the structure of the feature pyramid sub-network, the structure shown in Fig. 5 may be used, in which at least two second branch networks share the same fifth convolutional layer. For example, the fifth convolutional layer is a 1 × 1 convolution that reduces the dimension of the features input to the feature pyramid sub-network and outputs the result to the downsampling layer of each second branch network sharing that fifth convolutional layer. A feature pyramid sub-network of this structure has fewer parameters and lower computational complexity.

Optionally, the feature pyramid sub-network further includes a first output merging layer. The first output merging layer merges the respective outputs, before the seventh convolutional layer, of the at least two second branch networks that share the seventh convolutional layer, and outputs the merged result to the shared seventh convolutional layer.

For example, the first output merging layer is connected between the upsampling layers of the second branch networks that share the seventh convolutional layer and that seventh convolutional layer; it merges the feature maps output by the upsampling layers of those second branch networks and outputs the merged feature map to the seventh convolutional layer. Here, the merging may be an addition operation or a concatenation operation. For example, the symbol shown in the figure denotes addition of the outputs, and it may be replaced in the figure by another symbol to denote concatenation of the outputs (Concatenation). Addition can be expressed as the element-wise addition of multiple tensors, and concatenation as the joining of multiple tensors along one dimension. If the c second branch networks f₁ to f_c output c feature maps of size 256 × 64 × 64, the result is a 256 × 64 × 64 feature map after addition, or a (256 × c) × 64 × 64 feature map after concatenation.
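The difference between the two merging operations shows up directly in the output shape. The sketch below uses small nested-list tensors in [channel][row][column] layout; the sizes (three branches, 2 × 3 × 3 maps) and all function names are assumptions for illustration, standing in for the c maps of 256 × 64 × 64 mentioned above.

```python
def constant_map(value, channels=2, height=3, width=3):
    """Toy feature map filled with a single value."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


def shape(tensor):
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))


def add_maps(maps):
    """Element-wise sum: the output shape equals the shape of each input."""
    c, h, w = shape(maps[0])
    return [
        [[sum(m[ch][r][col] for m in maps) for col in range(w)] for r in range(h)]
        for ch in range(c)
    ]


def concat_maps(maps):
    """Concatenation along the channel dimension: channel counts add up."""
    return [channel for m in maps for channel in m]


branch_outputs = [constant_map(1.0), constant_map(2.0), constant_map(3.0)]
print(shape(add_maps(branch_outputs)))     # (2, 3, 3)
print(shape(concat_maps(branch_outputs)))  # (6, 3, 3)
```

Addition keeps the channel count fixed regardless of the number of branches, whereas concatenation multiplies it by the number of branches and therefore needs a later mapping back to the original channel count.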

In addition, the seventh convolutional layer also applies a linear transformation to the features output by the second branch networks so that they can be added to the original-scale features output by the first branch network. If the merging performed by the first output merging layer is concatenation, the seventh convolutional layer also applies a mapping transformation to the feature map output by the first output merging layer, mapping it back to the size the feature maps had before concatenation. For example, the (256 × c) × 64 × 64 feature map above is mapped to a 256 × 64 × 64 feature map.
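The two merging options and the mapping back to the original size can be illustrated with a short NumPy sketch. The value of `c` and the random weights are assumptions made only for illustration; the weights stand in for learned parameters of the seventh convolutional layer.

```python
import numpy as np

c = 3  # number of second branch networks (hypothetical value)
rng = np.random.default_rng(0)

# Feature maps output by the up-sampling layers of the c branches, each 256 x 64 x 64.
branch_outputs = [rng.standard_normal((256, 64, 64)) for _ in range(c)]

# Addition: point-to-point sum of the tensors; the shape is unchanged.
added = np.sum(branch_outputs, axis=0)

# Concatenation: the tensors are joined along the channel dimension.
concatenated = np.concatenate(branch_outputs, axis=0)

# A 1 x 1 convolution is a per-pixel linear map over channels, so it can map the
# (256 * c)-channel map back to 256 channels. Random weights are placeholders.
weights = rng.standard_normal((256, 256 * c)) / np.sqrt(256 * c)
mapped = (weights @ concatenated.reshape(256 * c, -1)).reshape(256, 64, 64)

print(added.shape, concatenated.shape, mapped.shape)
```

Addition keeps the channel count fixed regardless of `c`, whereas concatenation grows it with `c` and relies on the following layer to restore the original size.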

Step S206: Merge the feature map with each of the other feature maps to obtain the first feature map of the image to be detected.

Optionally, the feature pyramid sub-network further includes a second output merging layer, to which the output ends of the first branch network and of each second branch network are connected. Here, the output ends of the second branch networks comprise the output end of the shared seventh convolutional layer and the output ends of the up-sampling layers of those second branch networks that do not share the seventh convolutional layer. The second output merging layer merges the feature map, the second feature map output by the first branch network, and the third feature maps output by the second branch networks to obtain the first feature map. Here, the merging is an addition operation.

In this embodiment, the neural network includes at least two feature pyramid sub-networks. Each feature pyramid sub-network takes as input the first feature map output by the preceding feature pyramid sub-network connected to it and, from that input, extracts its own first feature map based on different scales. The input of the first feature pyramid sub-network is the feature map obtained in step S202, and steps S204 to S206 are performed on it to obtain a first feature map. The input of every other feature pyramid sub-network is the first feature map output by the preceding feature pyramid sub-network; steps S204 to S206 are performed to extract features from the input first feature map based on at least two different scales, and the other feature maps obtained are merged with the input first feature map to obtain the first feature map of the current feature pyramid sub-network.

In this embodiment, each sub-neural network includes multiple feature pyramid sub-networks, and the output of one feature pyramid sub-network may be the input of the adjacent next feature pyramid sub-network. For example, if x^(l) and W^(l) denote the input (feature map) and the parameters of the l-th feature pyramid sub-network, then the output of this feature pyramid sub-network, that is, the input of the next feature pyramid sub-network, can be expressed as:

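$$x^{(l+1)} = x^{(l)} + p\left(x^{(l)}; W^{(l)}\right) \tag{1}$$

where $p\left(x^{(l)}; W^{(l)}\right)$ is the feature extraction operation performed by the feature pyramid sub-network, which can be further expressed as:

$$p\left(x^{(l)}; W^{(l)}\right) = g\left(\sum_{i=1}^{c} f_i\left(x^{(l)}\right)\right) + f_0\left(x^{(l)}\right)$$

where $c$ is the number of second branch networks, $f_i(\cdot)$ ($i = 1, \dots, c$) denotes the feature extraction operation performed by each second branch network $f_1$ to $f_c$, $f_0(\cdot)$ denotes the feature extraction operation performed by the first branch network $f_0$, and $g(\cdot)$ denotes the processing performed by the seventh convolutional layer.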
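As a rough illustration of Equation (1), the following NumPy sketch computes the output of one feature pyramid sub-network as the input plus the first-branch features plus the merged, dimension-restored second-branch features, and then chains two such sub-networks. Several simplifications are assumptions rather than part of the described method: every convolution is a 1 × 1 convolution, down-sampling is average pooling by an integer factor, up-sampling is nearest-neighbour, nonlinearities and normalization are omitted, and `g` is used as a name for the processing of the shared seventh convolutional layer. All sizes and random weights are hypothetical.

```python
import numpy as np

def conv1x1(x, w):
    """1 x 1 convolution: a linear map over the channels of a (C, H, W) tensor."""
    c, h, wd = x.shape
    return (w @ x.reshape(c, -1)).reshape(w.shape[0], h, wd)

def downsample(x, k):
    """Average pooling by an integer factor k (an assumed down-sampling choice)."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def upsample(x, k):
    """Nearest-neighbour up-sampling by an integer factor k (also assumed)."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def pyramid_module(x, p):
    """Compute x + g(sum_i f_i(x)) + f_0(x) for one feature pyramid sub-network."""
    # f_0: reduce the dimension, convolve at the original scale, restore the dimension.
    f0 = conv1x1(conv1x1(conv1x1(x, p["f0_reduce"]), p["f0_conv"]), p["f0_restore"])
    # Shared dimension reduction feeding every second branch.
    reduced = conv1x1(x, p["shared_reduce"])
    merged = np.zeros_like(reduced)
    for k, w in zip(p["factors"], p["branch_convs"]):
        y = conv1x1(downsample(reduced, k), w)  # convolve at a smaller scale
        merged += upsample(y, k)                # back to the original scale, then add
    g = conv1x1(merged, p["g_restore"])         # shared layer that restores the dimension
    return x + g + f0

def make_params(rng, channels=16, reduced=4, factors=(2, 4)):
    """Random weights standing in for learned parameters (all sizes hypothetical)."""
    def rand(out_c, in_c):
        return rng.standard_normal((out_c, in_c)) / np.sqrt(in_c)
    return {
        "f0_reduce": rand(reduced, channels),
        "f0_conv": rand(reduced, reduced),
        "f0_restore": rand(channels, reduced),
        "shared_reduce": rand(reduced, channels),
        "branch_convs": [rand(reduced, reduced) for _ in factors],
        "g_restore": rand(channels, reduced),
        "factors": factors,
    }

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))
out = x
for params in [make_params(rng) for _ in range(2)]:  # two stacked sub-networks
    out = pyramid_module(out, params)
print(out.shape)
```

Because every branch returns to the original scale and dimension before the final addition, the output has the same shape as the input, which is what allows the sub-networks to be chained.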

In practical applications, the neural network can use the feature pyramid sub-network as its basic building block and use the feature pyramid learning mechanism to extract features at different scales.

In an optional embodiment, the neural network may use the HOURGLASS (hourglass) network structure shown in Fig. 6 as an optional basic network structure, but is not limited thereto. The multiple HOURGLASS structures included in the neural network are connected end to end to form the HOURGLASS network structure, and each HOURGLASS structure includes at least one feature pyramid sub-network. The output of one HOURGLASS structure is the input of the adjacent next HOURGLASS structure. With this network structure, bottom-up and top-down analysis and learning run through the whole model, so that the neural network extracts features more efficiently and accurately, which helps ensure the accuracy of the first feature map obtained. Since the HOURGLASS network uses the residual module (Residual Unit) as its basic building block, the feature pyramid sub-network of this embodiment may be a feature pyramid residual module (Pyramids Residual Module, PRM) used to form the HOURGLASS network structure. Here, the numbers of HOURGLASS structures and of feature pyramid sub-networks can be set as needed.

In the HOURGLASS network structure shown in Fig. 7, each HOURGLASS structure may be composed of multiple feature pyramid sub-networks, so that features at different scales are learned and extracted by the feature pyramid sub-networks and the first feature map is output. The feature pyramid sub-network may adopt any of the structures shown in Figs. 3 to 5. The neural network shown in Fig. 7 further includes a first convolutional layer (Conv1), which can be used to perform the aforementioned step S202 to obtain the feature map, and a pooling layer (Pooling, Pool), which can be used to progressively reduce the resolution of the feature map to obtain global features. The global features are then interpolated and enlarged and combined with the positions of corresponding resolution in the feature map; that is, the feature map of the image to be detected is obtained by performing global pooling on the feature map. The obtained feature map can be input to the feature pyramid sub-network, which learns and extracts from it at a deeper level and thereby extracts the first feature map based on different scales. Optionally, a feature pyramid sub-network or a convolutional layer may also be provided between the pooling layer and the feature pyramid sub-network to adjust attributes of the feature map such as its resolution.

Step S208: Perform keypoint detection on the target object in the image to be detected according to the first feature map.

Optionally, a score map is obtained for each of at least one keypoint of the target object according to the first feature map, and the position of the corresponding keypoint of the target object is determined according to the scores of the pixels in each score map. Because the first feature map of the image to be detected is obtained by the feature pyramid sub-network, which detects and extracts the features of the image at different scales, features at different scales can be detected stably and accurately. Performing keypoint detection on this basis according to the first feature map effectively improves the accuracy of keypoint detection.

In an optional embodiment, for a given keypoint, the position with the highest score in its score map represents the detected keypoint position. As shown in Fig. 8, corresponding to the image to be detected that is input to the neural network, each output score map corresponds to one keypoint of the target object in the image. Here, the target object in the image to be detected is a person with 16 keypoints, such as the hands and knees. The positions of the corresponding keypoints are determined from the highest-scoring positions in the 16 score maps, which completes the detection and localization of the 16 keypoints.
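Taking the highest-scoring position in each score map can be sketched as follows. The function name, the 64 × 64 score-map resolution and the synthetic peaks are assumptions made only for illustration.

```python
import numpy as np

def keypoints_from_score_maps(score_maps):
    """Return the (row, col) of the highest score in each map of a (K, H, W) array."""
    k, h, w = score_maps.shape
    flat_index = score_maps.reshape(k, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_index, (h, w))
    return np.stack([rows, cols], axis=1)

# 16 synthetic score maps, one per keypoint, each with a single known peak.
score_maps = np.zeros((16, 64, 64))
for i in range(16):
    score_maps[i, i, 2 * i] = 1.0

positions = keypoints_from_score_maps(score_maps)
print(positions[:3])
```

The returned coordinates are in score-map resolution; mapping them back to the resolution of the input image would be a separate step that depends on the network's overall down-sampling.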

In practical application scenarios, the image processing method of the embodiments of the present invention can be used for, but is not limited to, human pose estimation, video understanding and analysis, behavior recognition, human-computer interaction, image segmentation, object clustering, and the like.

For example, when performing human pose estimation, the image to be detected is input to the neural network, the feature pyramid sub-network extracts features based on different scales, and keypoint detection is performed on the target object according to the extracted features, so that the human pose can be estimated from the positions of the detected keypoints. For example, once the positions (for example, coordinates) of the keypoints corresponding to the 16 score maps shown in Fig. 8 are obtained, the human pose can be estimated accurately from the positions of the 16 keypoints. Because the image processing method of this embodiment extracts features using the feature pyramid learning mechanism, it can detect target objects at different scales, which helps ensure the robustness of human pose estimation.

As another example, for a video sequence containing a target object, the image processing method of this embodiment may be used to stably extract the feature maps of the video frames using the feature pyramid learning mechanism and then accurately locate the keypoints of the target object, which helps realize video understanding and analysis.

Optionally, the initial network parameters of at least one network layer of the neural network of this embodiment are obtained from a network parameter distribution determined by the mean and variance of the network parameters. The network parameter distribution may be a specified Gaussian distribution or uniform distribution; its mean and variance are determined by the numbers of inputs and outputs of the parameter layer, and the initial network parameters can be obtained by random sampling from the distribution. This parameter initialization method can be used to train neural networks with a multi-branch network structure. It applies not only to networks based on single-branch structures but also to the training of feature pyramid residual modules with multi-branch structures, making the training process of the neural network more stable.

For example, in the network parameter initialization process, for the forward propagation of the neural network, the mean of the network parameters is initialized to 0 so that the variances of the input and output of each layer of the neural network are essentially the same. After the variance σ of the network parameters is obtained, the initial network parameters can be sampled from a Gaussian distribution or a uniform distribution with mean 0 and variance σ and used as the initial network parameters for forward propagation. For the backward propagation of the neural network, the mean of the network parameters is initialized to 0 so that the mean of the gradient of the network parameters is 0, which keeps the variances of the input and output gradients of each layer of the neural network essentially the same. After the variance σ′ of the gradient of the network parameters is obtained, the initial network parameters can be sampled from a Gaussian distribution or a uniform distribution with mean 0 and variance σ′ and used as the initial network parameters for backward propagation.
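Sampling initial parameters from a zero-mean Gaussian or uniform distribution with a given variance can be sketched as below. The text does not give the formula that derives the variance from the numbers of inputs and outputs, so the value `1 / fan_in` used in the example is only an assumed placeholder, and the function name is hypothetical.

```python
import numpy as np

def sample_initial_weights(shape, variance, rng, distribution="gaussian"):
    """Sample zero-mean initial weights with the requested variance."""
    if distribution == "gaussian":
        return rng.normal(0.0, np.sqrt(variance), size=shape)
    if distribution == "uniform":
        # A uniform distribution on [-a, a] has variance a^2 / 3.
        limit = np.sqrt(3.0 * variance)
        return rng.uniform(-limit, limit, size=shape)
    raise ValueError(f"unknown distribution: {distribution}")

rng = np.random.default_rng(0)
fan_in, fan_out = 512, 512
variance = 1.0 / fan_in  # assumed placeholder; the text does not give the formula

w_gauss = sample_initial_weights((fan_out, fan_in), variance, rng, "gaussian")
w_unif = sample_initial_weights((fan_out, fan_in), variance, rng, "uniform")
print(w_gauss.mean(), w_gauss.var(), w_unif.var())
```

Both distributions are parameterized by the same variance, so switching between them changes only the shape of the distribution and not the scale of the initial weights.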

Optionally, if the neural network contains a case in which at least two identity mappings (Identity Mapping) are added together, an output adjustment module is provided in at least one of the identity mapping branches to be added, and the first feature map output by that identity mapping branch is adjusted by the output adjustment module.

For example, if the neural network contains a case in which at least two identity mappings are added together (the two shown in Fig. 9 are used here for illustration), a BN-ReLU-Conv (Batch Normalization-Rectified Linear Units-Convolution) module is provided in one of the identity mapping branches to adjust parameters such as the variance range of the output of that branch. When the outputs of the two identity mappings are added after this processing, the problem of the variance of the output response being multiplied by the two identity mapping branches is avoided, which helps keep the learning process of the neural network stable. Taking the case of two identity mappings shown in Fig. 9 as an example, the output adjustment module may be provided in either of the two identity mapping branches.

As another example, the neural networks in the embodiments corresponding to Figs. 3 to 5 also contain cases in which multiple identity mapping branches are added together. A BN-ReLU-Conv layer may be added to at least one of these identity mapping branches (for example f₀, f₁, ..., or f_c) to adjust the output of that branch, avoiding problems such as accumulation of the response variance when multiple identity mapping branches are added.
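The effect of placing an adjustment module on one branch before the addition can be illustrated numerically. This is a toy sketch under several assumptions: the two branch outputs are independent unit-variance signals, normalization is computed per channel over a single sample rather than over a batch, the convolution is 1 × 1, and small random weights stand in for learned ones.

```python
import numpy as np

def bn_relu_conv(x, w, eps=1e-5):
    """Normalize each channel, apply ReLU, then apply a 1 x 1 convolution."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    y = np.maximum((x - mean) / np.sqrt(var + eps), 0.0)
    c, h, wd = y.shape
    return (w @ y.reshape(c, -1)).reshape(w.shape[0], h, wd)

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 32, 32))  # output of the first identity mapping branch
b = rng.standard_normal((8, 32, 32))  # output of the second identity mapping branch

# Adding the two branches directly accumulates their variances.
plain_sum = a + b

# Passing one branch through BN-ReLU-Conv rescales it before the addition.
w = 0.1 * rng.standard_normal((8, 8))  # placeholder for learned weights
adjusted_sum = a + bn_relu_conv(b, w)

print(plain_sum.var(), adjusted_sum.var())
```

In a trained network the convolution weights are learned, so the module gives the network a way to control the scale of the adjusted branch rather than fixing it in advance.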

According to the image processing method of the embodiments of the present invention, the feature pyramid sub-network of the neural network performs feature extraction on the feature map of the image to be detected based on multiple different scales, and the multiple other feature maps obtained are merged with the feature map to obtain the first feature map of the image to be detected. Learning and extracting features at different scales with the feature pyramid network ensures the accuracy and robustness of feature extraction by the neural network. On this basis, performing keypoint detection according to the obtained first feature map effectively improves the accuracy of keypoint detection.

Embodiment Three

Referring to Fig. 10, a structural block diagram of an image processing apparatus according to Embodiment Three of the present invention is shown.

The image processing apparatus of this embodiment includes: an acquisition module 1002, configured to obtain a feature map of an image to be detected; an extraction module 1004, configured to perform feature extraction on the feature map by a neural network based on at least two different scales to obtain at least two other feature maps; and a merging module 1006, configured to merge the feature map with each of the other feature maps to obtain a first feature map of the image to be detected.

Optionally, the apparatus further includes a detection module 1008, configured to perform keypoint detection on the target object in the image to be detected according to the first feature map.

Optionally, the detection module 1008 includes: an obtaining subunit (not shown), configured to obtain a score map for each of at least one keypoint of the target object according to the first feature map; and a determination unit (not shown), configured to determine the position of the corresponding keypoint of the target object according to the scores of the pixels in each score map.

The image processing apparatus of this embodiment is used to implement the corresponding image processing method in the foregoing method embodiments and has the beneficial effects of the corresponding method embodiments, which are not repeated here.

This embodiment also provides a computer-readable storage medium storing computer program instructions, wherein the program instructions, when executed by a processor, implement the steps of any image processing method provided by the embodiments of the present invention.

This embodiment also provides a computer program including at least one executable instruction, wherein the at least one executable instruction, when executed by a processor, implements the steps of any image processing method provided by the embodiments of the present invention.

Embodiment Four

Embodiment Four of the present invention provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, or a server. Referring to Fig. 11, a schematic structural diagram of an electronic device 1100 suitable for implementing a terminal device or server of the embodiments of the present invention is shown. As shown in Fig. 11, the electronic device 1100 includes one or more processors, a communication element, and the like. The one or more processors are, for example, one or more central processing units (CPU) 1101 and/or one or more graphics processing units (GPU) 1113. A processor can perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 1102 or executable instructions loaded from a storage section 1108 into a random access memory (RAM) 1103. The communication element includes a communication component 1112 and/or a communication interface 1109. The communication component 1112 may include, but is not limited to, a network card, and the network card may include, but is not limited to, an IB (Infiniband) network card. The communication interface 1109 includes a communication interface of a network interface card such as a LAN card or a modem, and performs communication processing via a network such as the Internet.

The processor can communicate with the read-only memory 1102 and/or the random access memory 1103 to execute executable instructions, is connected to the communication component 1112 through a communication bus 1104, and communicates with other target devices through the communication component 1112, thereby completing the operations corresponding to any image processing method provided by the embodiments of the present invention, for example: obtaining a feature map of an image to be detected; performing feature extraction on the feature map by a neural network based on at least two different scales to obtain at least two other feature maps; and merging the feature map with each of the other feature maps to obtain a first feature map of the image to be detected.

In addition, the RAM 1103 can also store various programs and data needed for the operation of the apparatus. The CPU 1101 or GPU 1113, the ROM 1102, and the RAM 1103 are connected to one another through the communication bus 1104. Where the RAM 1103 is present, the ROM 1102 is an optional module. At runtime, the RAM 1103 stores executable instructions, or executable instructions are written into the ROM 1102, and the executable instructions cause the processor to perform the operations corresponding to the above method. An input/output (I/O) interface 1105 is also connected to the communication bus 1104. The communication component 1112 may be integrated, or may be provided with multiple sub-modules (for example, multiple IB network cards) linked on the communication bus.

The following components are connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including, for example, a cathode ray tube (CRT) or liquid crystal display (LCD) and a loudspeaker; a storage section 1108 including a hard disk and the like; and a communication interface 1109 of a network interface card such as a LAN card or a modem. A drive 1110 is also connected to the I/O interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read from it can be installed into the storage section 1108 as needed.

It should be noted that the architecture shown in Fig. 11 is only one optional implementation. In practice, the number and types of the components in Fig. 11 may be selected, deleted, added, or replaced according to actual needs. Different functional components may be provided separately or in an integrated manner; for example, the GPU and the CPU may be provided separately, or the GPU may be integrated on the CPU, and the communication element may be provided separately or integrated on the CPU or GPU. These alternative implementations all fall within the protection scope of the present invention.

In particular, according to the embodiments of the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of the present invention include a computer program product that includes a computer program tangibly embodied on a machine-readable medium. The computer program includes program code for performing the method shown in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present invention, for example: obtaining a feature map of an image to be detected; performing feature extraction on the feature map by a neural network based on at least two different scales to obtain at least two other feature maps; and merging the feature map with each of the other feature maps to obtain a first feature map of the image to be detected. In such an embodiment, the computer program may be downloaded and installed from a network through the communication element and/or installed from the removable medium 1111. When the computer program is executed by a processor, the functions defined in the method of the embodiments of the present invention are performed.

It should be noted that, depending on the needs of implementation, each component or step described in the embodiments of the present invention may be split into more components or steps, and two or more components or steps, or partial operations of components or steps, may be combined into a new component or step to achieve the purpose of the embodiments of the present invention.

The above methods according to the embodiments of the present invention may be implemented in hardware or firmware, or implemented as software or computer code that can be stored in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk), or implemented as computer code that is downloaded over a network, was originally stored in a remote recording medium or a non-transitory machine-readable medium, and is to be stored in a local recording medium, so that the methods described here can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It will be understood that a computer, processor, microprocessor controller, or programmable hardware includes a storage component (for example, RAM, ROM, or flash memory) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor, or hardware, the processing methods described here are implemented. In addition, when a general-purpose computer accesses code for implementing the processing shown here, execution of the code converts the general-purpose computer into a special-purpose computer for performing that processing.

Those of ordinary skill in the art will appreciate that the units and method steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the embodiments of the present invention.

The above embodiments are intended only to illustrate the embodiments of the present invention and not to limit them. Those of ordinary skill in the relevant technical field can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention, so all equivalent technical solutions also fall within the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention shall be defined by the claims.

Claims

1. An image processing method, comprising:

obtaining a feature map of an image to be detected;

performing feature extraction on the feature map by a neural network based on at least two different scales to obtain at least two other feature maps; and

merging the feature map with each of the other feature maps to obtain a first feature map of the image to be detected.

2. The method according to claim 1, further comprising:

performing keypoint detection on a target object in the image to be detected according to the first feature map.

3. The method according to claim 2, wherein performing keypoint detection on the target object according to the first feature map comprises:

obtaining a score map for each of at least one keypoint of the target object according to the first feature map; and

determining the position of the corresponding keypoint of the target object according to the scores of the pixels included in each score map.

4. The method according to any one of claims 1 to 3, wherein the neural network includes at least one feature pyramid sub-network, the feature pyramid sub-network including a first branch network and at least one second branch network each connected in parallel with the first branch network; the other feature maps include a second feature map or a third feature map;

the first branch network performs feature extraction on the feature map based on the original scale of the feature map to obtain the second feature map; and

each second branch network performs feature extraction on the feature map based on another scale different from the original scale to obtain the third feature map.

5. The method according to claim 4, wherein the first branch network includes a second convolutional layer, a third convolutional layer, and a fourth convolutional layer;

the second convolutional layer reduces the dimension of the feature map;

the third convolutional layer performs convolution on the dimension-reduced feature map based on the original scale of the feature map; and

the fourth convolutional layer increases the dimension of the convolved feature map to obtain the second feature map.

6. The method according to claim 4 or 5, wherein at least one second branch network includes a fifth convolutional layer, a down-sampling layer, a sixth convolutional layer, an up-sampling layer, and a seventh convolutional layer;

the fifth convolutional layer reduces the dimension of the feature map;

the down-sampling layer down-samples the dimension-reduced feature map according to a set down-sampling ratio, wherein the scale of the down-sampled feature map is smaller than the original scale of the feature map;

the sixth convolutional layer performs convolution on the down-sampled feature map;

the up-sampling layer up-samples the convolved feature map according to a set up-sampling ratio, wherein the scale of the up-sampled feature map is equal to the original scale of the feature map; and

the seventh convolutional layer increases the dimension of the up-sampled feature map to obtain the third feature map.

7. An image processing apparatus, comprising:

an acquisition module, configured to obtain a feature map of an image to be detected;

an extraction module, configured to perform feature extraction on the feature map by a neural network based on at least two different scales to obtain at least two other feature maps; and

a merging module, configured to merge the feature map with each of the other feature maps to obtain a first feature map of the image to be detected.

8. A computer-readable storage medium storing computer program instructions, wherein the program instructions, when executed by a processor, implement the steps of the image processing method according to any one of claims 1 to 6.

9. An electronic device, comprising: a processor, a memory, a communication element, and a communication bus, wherein the processor, the memory, and the communication element communicate with one another through the communication bus;

the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the image processing method according to any one of claims 1 to 6.

10. A computer program, comprising at least one executable instruction, wherein the at least one executable instruction, when executed by a processor, implements the operations corresponding to the image processing method according to any one of claims 1 to 6.