CN109934183A - Image processing method and device, detection device and storage medium

Info

Publication number
CN109934183A
CN109934183A (application CN201910205458.3A; granted as CN109934183B)
Authority
CN
China
Prior art keywords
feature
target
category feature
residual
category
Prior art date
Legal status
Granted
Application number
CN201910205458.3A
Other languages
Chinese (zh)
Other versions
CN109934183B (en)
Inventor
金晟
刘文韬
钱晨
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN201910205458.3A
Publication of CN109934183A
Application granted
Publication of CN109934183B
Legal status: Active (granted)


Landscapes

  • Image Analysis (AREA)

Abstract

The embodiments of the invention disclose an image processing method and apparatus, a detection device, and a storage medium. The image processing method comprises: determining a target area of a target in an image; extracting a first-category feature from the target area, where the first-category feature includes an image feature of the target; obtaining a second-category feature according to the distribution of the same target in two consecutive frames; and performing target tracking according to the first-category feature and the second-category feature.

Description

Image processing method and device, detection device and storage medium
Technical field
The present invention relates to the field of information technology, and more particularly to an image processing method and apparatus, a detection device, and a storage medium.
Background art
In fields such as security and motion analysis, key point detection needs to be performed on the human figures in an image; based on the detected key points, spatial position information and/or human body characteristic information is obtained. There are many techniques for detecting human body key points; however, it has been found that their error is relatively large. For example, a single figure may be identified as multiple figures.
Summary of the invention
Embodiments of the present invention are intended to provide an image processing method and apparatus, a detection device, and a storage medium.
The technical scheme of the present invention is realized as follows. An image processing method comprises:
determining a target area of a target in an image;
extracting a first-category feature from the target area, where the first-category feature includes an image feature of the target;
obtaining a second-category feature according to the distribution of the same target in two consecutive frames; and
performing target tracking according to the first-category feature and the second-category feature.
Based on the above scheme, the second-category feature includes:
a vector pointing from a key point of a target in the t-th frame image to the center point of the corresponding target in the (t+1)-th frame image, and/or
a vector pointing from a key point of a target in the (t+1)-th frame image to the center point of the corresponding target in the t-th frame image, where t is a natural number.
Based on the above scheme, performing target tracking according to the first-category feature and the second-category feature comprises:
matching the first-category features of the (t+1)-th frame image with the first-category features of the t-th frame image to obtain first difference information;
matching the second-category features of the (t+1)-th frame image relative to the t-th frame image with the second-category features of the t-th frame image relative to the (t-1)-th frame image to obtain second difference information; and
obtaining, according to the first difference information and the second difference information, the correspondence between targets in the (t+1)-th frame image and the corresponding targets in the t-th frame image.
Based on the above scheme, obtaining, according to the first difference information and the second difference information, the correspondence between targets in the (t+1)-th frame image and the corresponding targets in the t-th frame image comprises:
weighting and summing the first difference information of a first target in the (t+1)-th frame image and the second difference information of the first target to obtain a summed value; and
determining that the first target of the (t+1)-th frame image corresponding to the minimum summed value and a second target of the t-th frame image are the same target.
Based on the above scheme, extracting the first-category feature from the target area comprises: performing residual processing on the target area to obtain a residual feature; and obtaining the first-category feature based on the residual feature.
Based on the above scheme, extracting the first-category feature from the target area comprises:
performing residual processing on the target area using a first residual layer to obtain a first residual feature;
performing residual processing on the first residual feature using a second residual layer to obtain a second residual feature;
performing residual processing on the second residual feature using a third residual layer to obtain a third residual feature;
performing residual processing on the third residual feature using a fourth residual layer to obtain a fourth residual feature; and
obtaining the image feature based on the fourth residual feature.
Obtaining the first-category feature based on the residual feature comprises: obtaining the image feature based on the fourth residual feature.
Based on the above scheme, performing residual processing on the target area using the first residual layer to obtain the first residual feature comprises:
performing residual processing on the target area using a first residual sublayer comprising N1 first residual modules to obtain a primary residual feature;
performing residual processing on the primary residual feature using a second residual sublayer comprising N2 second residual modules to obtain a secondary residual feature, where N1 is a positive integer and N2 is a positive integer; and
combining the primary residual feature and the secondary residual feature to obtain the first residual feature.
Based on the above scheme, obtaining the image feature based on the fourth residual feature comprises:
performing pooling on the fourth residual feature to obtain a pooled feature; and
obtaining the image feature based on the pooled feature.
Based on the above scheme, obtaining the image feature based on the pooled feature comprises:
fully connecting the first pooled feature, obtained by applying first pooling to the fourth residual feature, with the second residual feature to obtain a first feature;
applying second pooling to the fourth residual feature to obtain a second feature; and
splicing the first feature and the second feature to obtain the image feature.
Based on the above scheme, obtaining the second-category feature according to the distribution of the same target in two consecutive frames comprises:
obtaining two third-category features, one from each of the two consecutive frames, where a third-category feature encodes the spatial position information of key points within the same target and can distinguish different targets; and
generating the second-category feature based on the two third-category features.
Based on the above scheme, generating the second-category feature based on the two third-category features comprises:
splicing the two third-category features according to a fourth-category feature, where the fourth-category feature includes a confidence indicating that the corresponding pixel is a key point of the target;
obtaining a spliced feature by splicing the two third-category features based on the fourth-category feature; and
generating the second-category feature based on the spliced feature.
Based on the above scheme, generating the second-category feature based on the spliced feature comprises:
performing convolution on the spliced feature using a first convolution layer to obtain a first convolution feature;
performing conversion on the first convolution feature using an hourglass-shaped conversion network to obtain a converted feature; and
performing convolution on the converted feature using a second convolution layer to obtain the second-category feature.
Based on the above scheme, performing convolution on the converted feature using the second convolution layer to obtain the second-category feature comprises:
performing primary convolution on the converted feature using a first convolution sublayer to obtain a primary convolution feature;
performing secondary convolution on the primary convolution feature using a second convolution sublayer to obtain a secondary convolution feature; and
performing tertiary convolution on the secondary convolution feature using a third convolution sublayer to obtain the second-category feature.
An image processing apparatus comprises:
a determining module, configured to determine a target area of a target in an image;
an extraction module, configured to extract a first-category feature from the target area, where the first-category feature includes an image feature of the target;
an obtaining module, configured to obtain a second-category feature according to the distribution of the same target in two consecutive frames; and
a tracking module, configured to perform target tracking according to the first-category feature and the second-category feature.
Based on the above scheme, the second-category feature includes:
a vector pointing from a key point of a target in the t-th frame image to the center point of the corresponding target in the (t+1)-th frame image, and/or
a vector pointing from a key point of a target in the (t+1)-th frame image to the center point of the corresponding target in the t-th frame image, where t is a natural number.
Based on the above scheme, the tracking module is specifically configured to: match the first-category features of the (t+1)-th frame image with the first-category features of the t-th frame image to obtain first difference information; match the second-category features of the (t+1)-th frame image relative to the t-th frame image with the second-category features of the t-th frame image relative to the (t-1)-th frame image to obtain second difference information; and obtain, according to the first difference information and the second difference information, the correspondence between targets in the (t+1)-th frame image and the corresponding targets in the t-th frame image.
Based on the above scheme, the tracking module is specifically configured to: weight and sum the first difference information of a first target in the (t+1)-th frame image and the second difference information of the first target to obtain a summed value; and determine that the first target of the (t+1)-th frame image corresponding to the minimum summed value and a second target of the t-th frame image are the same target.
Based on the above scheme, the extraction module is configured to perform residual processing on the target area to obtain a residual feature, and to obtain the first-category feature based on the residual feature.
Based on the above scheme, the extraction module is specifically configured to: perform residual processing on the target area using a first residual layer to obtain a first residual feature; perform residual processing on the first residual feature using a second residual layer to obtain a second residual feature; perform residual processing on the second residual feature using a third residual layer to obtain a third residual feature; perform residual processing on the third residual feature using a fourth residual layer to obtain a fourth residual feature; and obtain the image feature based on the fourth residual feature.
Based on the above scheme, the extraction module is specifically configured to: perform residual processing on the target area using a first residual sublayer comprising N1 first residual modules to obtain a primary residual feature; perform residual processing on the primary residual feature using a second residual sublayer comprising N2 second residual modules to obtain a secondary residual feature, where N1 and N2 are positive integers; and combine the primary residual feature and the secondary residual feature to obtain the first residual feature.
Based on the above scheme, the extraction module is specifically configured to perform pooling on the fourth residual feature to obtain a pooled feature, and to obtain the image feature based on the pooled feature.
Based on the above scheme, the extraction module is specifically configured to: fully connect the first pooled feature, obtained by applying first pooling to the fourth residual feature, with the second residual feature to obtain a first feature; apply second pooling to the fourth residual feature to obtain a second feature; and splice the first feature and the second feature to obtain the image feature.
Based on the above scheme, the obtaining module is specifically configured to: obtain two third-category features, one from each of the two consecutive frames, where a third-category feature encodes the spatial position information of key points within the same target and can distinguish different targets; and generate the second-category feature based on the two third-category features.
Based on the above scheme, the obtaining module is specifically configured to: splice the two third-category features according to a fourth-category feature, where the fourth-category feature includes a confidence indicating that the corresponding pixel is a key point of the target; obtain a spliced feature by splicing the two third-category features based on the fourth-category feature; and generate the second-category feature based on the spliced feature.
Based on the above scheme, the obtaining module is specifically configured to: perform convolution on the spliced feature using a first convolution layer to obtain a first convolution feature; perform conversion on the first convolution feature using an hourglass-shaped conversion network to obtain a converted feature; and perform convolution on the converted feature using a second convolution layer to obtain the second-category feature.
Based on the above scheme, the obtaining module is specifically configured to: perform primary convolution on the converted feature using a first convolution sublayer to obtain a primary convolution feature; perform secondary convolution on the primary convolution feature using a second convolution sublayer to obtain a secondary convolution feature; and perform tertiary convolution on the secondary convolution feature using a third convolution sublayer to obtain the second-category feature.
A detection device comprises:
a memory for storing computer-executable instructions; and
a processor connected to the memory, configured to implement, by executing the computer-executable instructions, the image processing method provided by any of the foregoing technical solutions.
A computer storage medium stores computer-executable instructions; after the computer-executable instructions are executed by a processor, the image processing method provided by any of the foregoing embodiments can be implemented.
The technical solutions provided by the embodiments of the present invention combine both the first-category feature and the second-category feature when performing key point detection, so that the characteristic value of each key point is obtained after the two kinds of features are fused. In this way, the characteristic value of each key point contains both sufficient appearance information and the internal spatial structure of the same target; distinguishing targets, or performing target detection, with key point characteristic values obtained in this way improves accuracy.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a first image processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of obtaining the first-category feature provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of human body key points provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a deep learning model for obtaining the first-category feature provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a deep learning model for obtaining the second-category feature provided by an embodiment of the present invention;
Fig. 6 is a schematic flowchart of detecting a target area provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a deep learning model provided by an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a detection device provided by an embodiment of the present invention.
Specific embodiments
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and the specific embodiments of the specification.
As shown in Fig. 1, this embodiment provides an image processing method, comprising:
Step S110: determining a target area of a target in an image;
Step S120: extracting a first-category feature from the target area, where the first-category feature includes an image feature of the target;
Step S130: obtaining a second-category feature according to the distribution of the same target in two consecutive frames;
Step S140: performing target tracking according to the first-category feature and the second-category feature.
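As an illustration only, the four steps can be chained as in the following sketch (the helpers detect, appearance_feat, temporal_feat, and match are assumed placeholders standing in for the networks and matching described later, not components named by this disclosure):

    def crop(image, box):
        # Step S110 output: cut the target area (x0, y0, x1, y1) out of the image.
        x0, y0, x1, y1 = box
        return image[y0:y1, x0:x1]

    def track_two_frames(frame_t, frame_t1, detect, appearance_feat, temporal_feat, match):
        boxes_t, boxes_t1 = detect(frame_t), detect(frame_t1)           # S110
        f1_t = [appearance_feat(crop(frame_t, b)) for b in boxes_t]     # S120: first-category
        f1_t1 = [appearance_feat(crop(frame_t1, b)) for b in boxes_t1]
        f2 = temporal_feat(frame_t, frame_t1)                           # S130: second-category
        return match(f1_t, f1_t1, f2)                                   # S140: tracking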
The target in this embodiment may be the graphical element of any movable object, such as a person, an animal, or a piece of equipment.
In this embodiment, step S110 may include: obtaining a bounding box based on the key points of the target, and taking the area enclosed by the bounding box as the target area. The image region enclosed by the bounding box may serve as the target area, which may also be called a region of interest.
In some embodiments, the detection device performing the image processing receives, together with the image from another device, the area coordinates of the image regions contained in the image; as another example, another network, such as a fully convolutional neural network, outputs the image regions.
In this embodiment, after the target area is obtained based on the key points of the same target, the target area is segmented from the image and used as the input for first-category feature extraction. The first-category feature is an image feature of the image region where the target is located, including but not limited to appearance features and/or structural features of the target. Structural features include, for example, the body proportions of the target.
Appearance features include but are not limited to color features and/or contour features observable on the surface of the target.
Structural features include but are not limited to the spatial position relationships between different parts of the target.
In this embodiment, in order to improve tracking precision, target tracking is not performed solely according to the first-category feature; a second-category feature is also obtained according to the distribution of the same target in two consecutive frames.
Combining the first-category feature and the second-category feature, the tracking result of target tracking is obtained comprehensively. This tracking result can rely on the first-category feature to exploit the similarity of the appearance features of the same target in two adjacent frames; at the same time, since the second-category feature is introduced into the target tracking, and the second-category feature reflects the spatial transformation of the same target, the tracking over two adjacent frames jointly considers the appearance similarity of the first-category feature and the spatial transformation relationship. Moreover, the second-category feature is obtained by jointly considering the features of the two consecutive frames, which is equivalent to imposing a temporal constraint on the appearance features when performing target tracking. Based on this temporal constraint, even if the target moves quickly in the image, moves across a large span, or deforms in appearance, target tracking can still be carried out accurately, which improves the accuracy of target tracking, reduces the occurrence of losing the target, and improves the tracking effect.
In some embodiments, the second-category feature includes: a vector pointing from a key point of a target in the t-th frame image to the center point of the corresponding target in the (t+1)-th frame image, and/or a vector pointing from a key point of a target in the (t+1)-th frame image to the center point of the corresponding target in the t-th frame image, where t is a natural number.
Here, the (t+1)-th frame image is the frame following the t-th frame image. Assuming that both the t-th frame image and the (t+1)-th frame image contain S targets, each of the two frames contains first-category features of S targets, and the (t+1)-th frame image yields one second-category feature map relative to the t-th frame image, in which the pixel values embedded in the feature map are the aforementioned second-category features. One second-category feature map thus contains the second-category features of the S targets.
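As a minimal sketch, assuming the key point pixel coordinates of frame t and the matched center points of frame t+1 are already known (the map layout is an illustrative assumption, not the exact encoding of this disclosure):

    import numpy as np

    def second_category_map(keypoints_t, centers_t1, height, width):
        # At each key point pixel of frame t, embed the 2-D vector pointing to the
        # center point of the same target in frame t+1; one list entry per target.
        feat = np.zeros((height, width, 2), dtype=np.float32)
        for kps, (cx, cy) in zip(keypoints_t, centers_t1):
            for (x, y) in kps:                 # integer pixel coordinates assumed
                feat[y, x, 0] = cx - x
                feat[y, x, 1] = cy - y
        return feat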
In some embodiments, step S140 may include:
matching the first-category features of the (t+1)-th frame image with the first-category features of the t-th frame image to obtain first difference information;
matching the second-category features of the (t+1)-th frame image relative to the t-th frame image with the second-category features of the t-th frame image relative to the (t-1)-th frame image to obtain second difference information; and
obtaining, according to the first difference information and the second difference information, the correspondence between targets in the (t+1)-th frame image and the corresponding targets in the t-th frame image.
The first difference information may be distance information, such as the Euclidean distance between different first-category features in the two images; the Euclidean distance here is merely an example, and many other implementations are possible.
Similarly, the second difference information may also be distance information, such as the distance between the second-category features corresponding to the two images, or dissimilarity information such as a dissimilarity measure.
Obtaining, according to the first difference information and the second difference information, the correspondence between targets in the (t+1)-th frame image and the corresponding targets in the t-th frame image comprises:
weighting and summing the first difference information of a first target in the (t+1)-th frame image and the second difference information of the first target to obtain a summed value; and
determining that the first target of the (t+1)-th frame image corresponding to the minimum summed value and a second target of the t-th frame image are the same target.
Since the key points corresponding to the first-category features are known, the center points corresponding to these key points are also known. The first-category features make use of the center point of the target, so matching can be performed according to the center points to establish which first-category feature corresponds to which second-category feature within a frame image. In this way, the first difference information and the second difference information of the same match can be weighted and summed to obtain final difference information; by comparing the final difference information of each match and finding the match with the smallest final difference information, the corresponding targets in the two adjacent frames are determined to be the same target, thereby achieving target tracking.
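A minimal sketch of this weighted summation and minimum-value decision (the equal weights and the per-row argmin are assumptions; neither the weights nor a particular assignment algorithm is fixed here):

    import numpy as np

    def match_by_weighted_sum(d1, d2, w1=0.5, w2=0.5):
        # d1, d2: [targets in frame t+1, targets in frame t] matrices holding the
        # first and second difference information (e.g. Euclidean distances).
        cost = w1 * np.asarray(d1) + w2 * np.asarray(d2)   # weighted summation
        pairs = []
        for i in range(cost.shape[0]):
            j = int(np.argmin(cost[i]))                    # minimum summed value
            pairs.append((i, j))                           # same target across the frames
        return pairs

In practice, a global assignment (for example scipy.optimize.linear_sum_assignment) could replace the per-row argmin so that no target of the t-th frame is matched twice.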
In some embodiments, step S120 may include:
performing residual processing on the target area to obtain a residual feature; and
obtaining the first-category feature based on the residual feature.
In this embodiment, the residual processing is processing performed by a residual module. On the one hand, residual processing is able to retain the original details; on the other hand, through the convolutions and other operations within the residual module, the required information of the target can be highlighted.
In some embodiments, as shown in Fig. 2, step S120 may include:
Step S121: performing residual processing on the target area using a first residual layer to obtain a first residual feature;
Step S122: performing residual processing on the first residual feature using a second residual layer to obtain a second residual feature;
Step S123: performing residual processing on the second residual feature using a third residual layer to obtain a third residual feature;
Step S124: performing residual processing on the third residual feature using a fourth residual layer to obtain a fourth residual feature;
Step S125: obtaining the image feature based on the fourth residual feature.
In some embodiments, the first residual layer may be a residual layer formed by a single residual module, or it may be formed by multiple residual sublayers. In short, a residual feature is obtained through residual processing.
The first residual layer may be any residual module provided in a residual network (ResNet).
In this embodiment, the second residual layer is located after the first residual layer, and performs residual processing again on the first residual feature obtained by the first residual layer to obtain the second residual feature. The third residual layer is located after the second residual layer, and the fourth residual layer is located after the third residual layer. The specific structures of the residual modules contained in the first, second, third, and fourth residual layers may be the same or different; these residual modules may come from residual networks of different versions or different structures.
Any two of the residual modules contained in the first, second, third, and fourth residual layers may also be identical.
Specifically, step S121 may include:
performing residual processing on the target area using a first residual sublayer comprising N1 first residual modules to obtain a primary residual feature; and
performing residual processing on the primary residual feature using a second residual sublayer comprising N2 second residual modules to obtain a secondary residual feature, where N1 is a positive integer and N2 is a positive integer.
In this embodiment, the first residual modules and the second residual modules may be residual modules with different network structures.
Optionally, the values of N1 and N2 are both positive integers not less than 2; for example, N1 is 4 and N2 is 6. The amount of target information retained by the residual features obtained after processing by different residual modules can differ; compared with using identical residual modules, this reduces the tracking error introduced when a single type of residual module loses a certain kind of information. Selecting different residual modules can therefore further improve tracking accuracy.
In some embodiments, step S125 may include: performing pooling on the fourth residual feature to obtain a pooled feature; and obtaining the image feature based on the pooled feature.
In this embodiment, the fourth residual feature already carries the required information, but in order to reduce the amount of data in subsequent processing, redundant feature values can be filtered out by the pooling operation. In this embodiment, the pooling operation includes but is not limited to average pooling and/or maximum pooling.
Specifically, step S125 may include:
fully connecting the first pooled feature, obtained by applying first pooling to the fourth residual feature, with the second residual feature to obtain a first feature;
applying second pooling to the fourth residual feature to obtain a second feature; and
splicing the first feature and the second feature to obtain the image feature.
In this embodiment, splicing the first feature and the second feature includes: directly splicing the first feature corresponding to the pixel in row i, column j of the first feature map containing the first feature with the second feature corresponding to the pixel in row i, column j of the second feature map containing the second feature. For example, splicing an S1-dimensional first feature with an S2-dimensional second feature yields an (S1+S2)-dimensional image feature.
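For instance, with PyTorch-style tensors (the framework and the value S1 = 256 are assumptions made for illustration; the 2048-dimensional second feature matches the description of Fig. 4 below):

    import torch

    first = torch.randn(256)     # S1-dimensional first feature (S1 = 256 assumed)
    second = torch.randn(2048)   # S2-dimensional second feature (2048-D per Fig. 4)
    image_feature = torch.cat([first, second], dim=0)   # (S1 + S2)-dimensional result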
Specifically, the image feature may be obtained as follows:
performing residual processing on the target area using a first residual sublayer comprising N1 first residual modules to obtain a primary residual feature, and performing secondary residual processing on the primary residual feature using a second residual sublayer comprising N2 second residual modules to obtain a secondary residual feature, where N1 is a positive integer and N2 is a positive integer; the primary and secondary residual features are combined into the first residual feature;
processing the first residual feature using the second residual layer to obtain the second residual feature;
processing the second residual feature using the third residual layer to obtain the third residual feature;
processing the third residual feature using the fourth residual layer to obtain the fourth residual feature;
fully connecting the first pooled feature, obtained by applying first pooling to the fourth residual feature, with the third residual feature to obtain a first feature;
applying second pooling to the fourth residual feature to obtain a second feature; and
splicing the first feature and the second feature to obtain the image feature.
As shown in Fig. 4, there are four first residual modules, namely res3a, res3b, res3c, and res3d, and six second residual modules, namely res4a, res4b, res4c, res4d, res4e, and res4f.
The third residual layer may include residual module res5a; the fourth residual layer may include residual module res5b; the fifth residual layer may include residual module res5c.
The first pooling may be average pooling; the middle-level feature can then be obtained through a full connection (fc), and is one kind of the aforementioned first feature.
The second pooling corresponding to the fifth residual feature may be average pooling; the resulting top-level feature is one kind of the second feature. The second feature may be a 2048-dimensional (D) feature.
After the middle-level feature and the top-level feature are fused, the first-category feature is obtained.
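Read together with Fig. 4, the branch can be sketched roughly as follows; only the module counts (res3a-res3d, res4a-res4f, res5a-res5c), the average pooling, the full connection, and the final fusion come from the text, while the channel widths, the internal structure of the residual module, and the use of one module type throughout are simplifying assumptions:

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        # A plain residual module; the disclosure allows modules of differing structures.
        def __init__(self, ch):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=1))

        def forward(self, x):
            return torch.relu(x + self.body(x))   # the skip connection retains detail

    class FirstCategoryBranch(nn.Module):
        def __init__(self, ch=256, fc_dim=256):
            super().__init__()
            self.res3 = nn.Sequential(*[ResBlock(ch) for _ in range(4)])  # res3a-res3d (N1=4)
            self.res4 = nn.Sequential(*[ResBlock(ch) for _ in range(6)])  # res4a-res4f (N2=6)
            self.res5 = nn.Sequential(*[ResBlock(ch) for _ in range(3)])  # res5a, res5b, res5c
            self.fc = nn.Linear(ch, fc_dim)       # full connection after the first pooling

        def forward(self, x):
            x = self.res5(self.res4(self.res3(x)))
            pooled = x.mean(dim=(2, 3))           # average pooling over spatial positions
            mid = self.fc(pooled)                 # middle-level feature (a first feature)
            top = pooled                          # top-level feature (a second feature)
            return torch.cat([mid, top], dim=1)   # fused first-category feature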
Fig. 4 may be a network architecture diagram of the deep learning model used to extract the first-category feature in this embodiment.
Fig. 5 may be a network architecture diagram of the deep learning model used to extract the second-category feature in this embodiment.
The two branches of the deep learning model in this embodiment obtain the first-category feature and the second-category feature respectively; target tracking is then realized by combining them, which can improve the target tracking result.
In some embodiments, step S130 may include:
obtaining two third-category features, one from each of the two consecutive frames, where a third-category feature encodes the spatial position information of key points within the same target and can distinguish different targets; and
generating the second-category feature based on the two third-category features.
The third-category feature in this embodiment is composed of the vectors of a target's key points relative to the target's center point, and characterizes the positional relationships between the key points within a target.
In some embodiments, generating the second-category feature based on the two third-category features comprises:
splicing the two third-category features according to a fourth-category feature, where the fourth-category feature includes a confidence indicating that the corresponding pixel is a key point of the target;
obtaining a spliced feature by splicing the two third-category features based on the fourth-category feature; and
generating the second-category feature based on the spliced feature.
For example, the fourth-category feature may be based on the confidence in a Gaussian response map obtained by a Gaussian algorithm. The confidence indicates the probability that the corresponding pixel is a key point of the target.
In some embodiments, generating the second-category feature based on the spliced feature comprises:
performing convolution on the spliced feature using a first convolution layer to obtain a first convolution feature;
performing conversion on the first convolution feature using an hourglass-shaped conversion network to obtain a converted feature; and
performing convolution on the converted feature using a second convolution layer to obtain the second-category feature.
The hourglass-shaped network is a network that is symmetric about its midpoint.
In other embodiments, performing convolution on the converted feature using the second convolution layer to obtain the second-category feature comprises:
performing primary convolution on the converted feature using a first convolution sublayer to obtain a primary convolution feature;
performing secondary convolution on the primary convolution feature using a second convolution sublayer to obtain a secondary convolution feature; and
performing tertiary convolution on the secondary convolution feature using a third convolution sublayer to obtain the second-category feature.
By performing convolution through multiple convolution layers or convolution sublayers, the second-category feature can be obtained step by step.
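A sketch of this convolution, hourglass, convolution chain (the channel counts, activation placement, and one-level hourglass are assumptions; only the ordering of the layers and the three sublayers comes from the text):

    import torch
    import torch.nn as nn

    class Hourglass(nn.Module):
        # Minimal midpoint-symmetric stand-in: one downsampling and one upsampling step.
        def __init__(self, ch):
            super().__init__()
            self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)
            self.up = nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1)

        def forward(self, x):
            return self.up(torch.relu(self.down(x)))

    class SecondCategoryHead(nn.Module):
        def __init__(self, in_ch, mid_ch=64, out_ch=2):
            super().__init__()
            self.conv1 = nn.Conv2d(in_ch, mid_ch, 3, padding=1)      # first convolution layer
            self.hourglass = Hourglass(mid_ch)                       # hourglass-shaped network
            self.conv2 = nn.Sequential(                              # second convolution layer:
                nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(),  # first sublayer
                nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(),  # second sublayer
                nn.Conv2d(mid_ch, out_ch, 1))                        # third sublayer

        def forward(self, spliced_feature):
            return self.conv2(self.hourglass(torch.relu(self.conv1(spliced_feature))))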
There are many ways to determine the image region in step S110; an optional way is presented below. As shown in Fig. 6, this embodiment provides an image processing method, comprising:
Step S220: detecting a fourth-category feature from the image, where the fourth-category feature includes at least the spatial position information of the target;
Step S210: detecting a fifth-category feature from the image, where the fifth-category feature includes at least the appearance information of the target;
Step S230: fusing the fifth-category feature and the fourth-category feature to obtain the characteristic values of key points.
The fifth-category feature (Keypoints Embedding, KE) is detected from the image. The KE includes but is not limited to the appearance information of the target's surface; the appearance information may be any directly observable contour information, texture information, skin texture information, and so on.
Taking a human body as the target as an example, the appearance information includes but is not limited to: contour information of the face, distribution information of the facial features, and so on.
The pixels of an image include pixels belonging to the target and pixels belonging to the background outside the target. In this embodiment, the pixels of the target and the pixels of the background are distinguished by using different pixel values (also called characteristic values) in the generated feature map containing the fifth-category feature. For example, pixels corresponding to the background of the detected image use the pixel value "0" in the feature map, while pixels corresponding to the target use pixel values other than "0". There may be multiple targets in the detected image; to distinguish multiple targets, the pixels of different targets take different values in the feature map. For example, the characteristic value corresponding to target A is represented by "1", the characteristic value corresponding to target B is represented by "2", and the characteristic value corresponding to the background is "0"; here 1 differs from 2 and from 0, and 2 also differs from 0. In this way, by comparing these values, one can know which pixels in the feature map are background and which are targets; at the same time, since different targets use different characteristic values, the specific characteristic value identifies which pixels belong to the same target.
The fourth-category feature includes the spatial position information of the target. Optionally, the characteristic value of the fourth-category feature indicates the relative positional relationship of each key point with respect to the center point of the target; specifically, the fourth-category feature may be the vector from a spatial key point to the center point of the target. The fourth-category feature can characterize the relative positional relationships between the various parts of the target. Specifically, taking a human body as the target, the fourth-category feature may include: the relative positional relationship of each joint key point of the human body with respect to the human body center point, which includes but is not limited to direction and/or distance and can be represented by a vector pointing from the key point to the human body center point. The human body center point may be the root node of the human body. Fig. 3 shows a schematic diagram of human body key points, in which key point 0 is the root node, obtained by calculation. In Fig. 3, key point 10 is the head key point; key point 9 is the neck key point; key points 11 and 14 are shoulder key points; key point 8 is the key point connecting the shoulders and the neck; key point 7 is the waist key point; key points 12 and 15 are elbow key points; key points 13 and 16 are wrist key points; key points 1 and 4 are crotch key points; key points 2 and 5 are knee key points; key points 3 and 6 are ankle key points.
In other embodiments, the human body center point may also be obtained by averaging the coordinates of the spatial key points belonging to the body, yielding the coordinate value of the human body center point. In this way, within a target, the distribution of each spatial key point relative to the human body center point satisfies a specific distribution condition. When judging whether spatial instance embedding features belong to one target, it can then be determined, according to whether the embedded values of the spatial instance embedding features satisfy that distribution condition, which embedded values correspond to spatial instance embedding features belonging to the same target.
Assuming the target is a human body, the embedded value corresponding to the spatial instance embedding feature is an array containing two elements, where the first element of the array represents the difference in the x direction and the second element represents the difference in the y direction; the x direction and the y direction are mutually perpendicular. Both directions are relative to the image: for example, if a two-dimensional rectangular coordinate system with an x axis and a y axis is established in the plane of the image, the x direction may be the x-axis direction of the image coordinate system and the y direction may be the y-axis direction. For example, in the embedded value obtained by subtracting the human body center point coordinate from the left-face key point coordinate of the head, the first element is positive and the second element is positive; in the embedded value obtained by subtracting the human body center point coordinate from the right-face key point coordinate of the head, the first element is negative and the second element is positive; in the embedded value obtained by subtracting the human body center point coordinate from the left-foot key point coordinate, the first element is positive and the second element is negative; and in the embedded value obtained by subtracting the human body center point coordinate from the right-foot key point coordinate, the first element is negative and the second element is negative. When judging which embedded values belong to one target, the judgment can be made according to the part of the key point characteristic value that corresponds to the embedded value, i.e., according to these characteristics of the embedded value.
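A toy numeric check of this sign pattern, with invented coordinates and the y axis taken to grow upward (both assumptions made only for illustration):

    import numpy as np

    center = np.array([100.0, 100.0])                  # human body center point (root node)
    keypoints = {
        "head left face":  np.array([110.0, 160.0]),
        "head right face": np.array([ 90.0, 160.0]),
        "left foot":       np.array([115.0,  20.0]),
        "right foot":      np.array([ 85.0,  20.0]),
    }
    for name, kp in keypoints.items():
        # Embedded value: key point coordinate minus center point coordinate.
        print(name, kp - center)   # signs: (+,+), (-,+), (+,-), (-,-) respectively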
In this embodiment, the fourth-category feature is the vector of each spatial key point relative to the center point, which essentially defines the relative positional relationships between the key points within one target.
Since the fifth-category feature focuses more on the appearance information of the target, in the absence of spatial constraints it may assign different key points of the same target to different targets, causing inaccuracy.
Since the fourth-category feature focuses more on the different spatial key points within a target, it may ignore the relative positional relationships between different targets; moreover, for points far from the center of a target, causes such as large encoding errors can lead to poor accuracy.
When the characteristic values of key points are detected in this embodiment, the two kinds of features are considered together so that they complement each other; for example, the fourth-category feature serves as a spatial constraint on the fifth-category feature, while the fifth-category feature compensates for the deficiencies of the fourth-category feature. The two features are fused, and the fused feature is taken as the characteristic value of the key point. Based on the characteristic values of the key points, it can be judged which key points belong to the same target, and the appearance information of the target can also be obtained, which helps to improve the detection accuracy of the target. Obtaining key point characteristic values in this way thus reduces the probability of one target being wrongly split into two or more targets. Moreover, since the accuracy of the key point characteristic values improves, the inefficiency of key point feature extraction caused by error correction and similar causes is reduced, and the efficiency of extracting key point characteristic values is improved.
In some embodiments, the method further includes:
detecting a third-category feature map from the image, where the third-category feature map includes at least prediction information for the characteristic values of key points.
Step S230 may include:
fusing, based on the third-category feature map, the fifth-category feature and the fourth-category feature to obtain the characteristic values of the key points.
In this embodiment, the third-category feature map may also be called a heat map. A pixel in the third-category feature map may carry prediction information such as a confidence or a probability value; the prediction information can represent the probability that the corresponding pixel in the image is a key point, or the confidence with which the pixel is predicted to be a key point.
In this embodiment, the detected positions of key points can be determined in conjunction with the third-category feature map.
When the fifth-category feature and the fourth-category feature are fused in step S230, the fifth-category feature map where the fifth-category feature resides and the spatial instance embedding map where the fourth-category feature resides are aligned, and both are aligned with the third-category feature map. Alignment here means that the maps contain the same number of pixels and correspond one-to-one in spatial position.
In this way, when the characteristic value of a key point is obtained, the fifth-category feature and the fourth-category feature at the same detected position are fused to obtain the characteristic value of the key point.
In this embodiment, the fusion of the fifth-category feature and the fourth-category feature includes but is not limited to:
splicing the fifth-category feature and the fourth-category feature. For example, if the fifth-category feature is an m1-dimensional feature and the fourth-category feature is an m2-dimensional feature, the feature obtained after splicing the two is an (m1+m2)-dimensional feature.
In some embodiments, the fifth-category feature may be a 1-dimensional feature, the fourth-category feature may be a 2-dimensional feature, and the spliced feature obtained after the fusion may be a 3-dimensional feature.
In this embodiment, through this direct splicing, the formed spliced feature simultaneously retains the characteristic values of the fifth-category feature and of the fourth-category feature, i.e., it simultaneously retains the appearance information and the spatial position information. Obtaining the characteristic value of the key point from the spliced feature formed in this way clearly reduces the error rate and improves accuracy.
In some embodiments, step S230 may specifically include:
determining the detected position of a key point's characteristic value according to the confidence of predicted key points in the third-category feature map (the key point Gaussian response map); and
splicing the fifth-category feature at the detected position in the fifth-category feature map with the fourth-category feature at the detected position in the fourth-category feature map to obtain the characteristic value of the key point.
In this embodiment, the higher the confidence, the higher the probability that the corresponding pixel in the third-category feature map is the characteristic value of a key point. For example, taking the confidence of the head key point as an example: traverse the pixel values (i.e., the confidences) of the pixels in the third-category feature map and find the local maxima, i.e., the locally maximal confidences, in different regions; let the pixel coordinate of such a maximal confidence be (X1, Y1); then take out the fifth-category feature at (X1, Y1) of the fifth-category feature map and the fourth-category feature at (X1, Y1) of the fourth-category feature map, and fuse the two features to obtain the characteristic value of one key point. The coordinate of this key point in the image is (X1, Y1), and its characteristic value is formed by the embedded value of the m1-dimensional fifth-category feature and the embedded value of the m2-dimensional fourth-category feature.
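A sketch of this look-up-and-splice step for a single key point type (numpy only; using a global argmax instead of true local maxima is a simplification, so this version extracts one key point per map):

    import numpy as np

    def keypoint_characteristic_value(heatmap, fifth_map, fourth_map):
        # heatmap:    [H, W]     confidences (third-category feature map)
        # fifth_map:  [H, W, m1] fifth-category feature map, spatially aligned
        # fourth_map: [H, W, m2] fourth-category feature map, spatially aligned
        y1, x1 = np.unravel_index(np.argmax(heatmap), heatmap.shape)
        fused = np.concatenate([fifth_map[y1, x1], fourth_map[y1, x1]])
        return (x1, y1), fused   # coordinate (X1, Y1) and the (m1+m2)-dim characteristic value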
For example, taking a human body as the target, if the human body includes M key points, then after the fifth-category feature and the fourth-category feature are finally fused based on the third-category feature map, the characteristic values of M key points can be obtained, each of which is formed by splicing the fifth-category feature and the fourth-category feature of the corresponding key point.
In some embodiments, the method may further include:
clustering the characteristic values of the key points to obtain a clustering result; and
determining, according to the clustering result, the key points that belong to the same target.
For example, after the splicing, the characteristic value of each key point has been obtained. Taking a human body as the target, if a human body has S1 key points and there are S2 targets in the image, S1*S2 key points can be obtained; the S1*S2 key points are then clustered to obtain the clustering result.
For example, this clustering may proceed as follows:
clustering the key points of each type of the human body according to a predetermined direction, for example a distance-based clustering;
obtaining locally optimal solutions between key points of different types based on the clustering; and
combining the locally optimal solutions to obtain the clustering result.
For example, taking a human body as the target, clustering is performed from the head toward the feet along a predetermined direction. Clustering the key points of each type of the human body according to the predetermined direction then includes:
clustering each head key point with the neck key points by distance to obtain the distance between each head key point and each neck key point;
clustering each neck key point with the chest key points to obtain the distance between each neck key point and each chest key point;
and so on, until all local key points have been traversed.
Obtaining the locally optimal solutions between key points of different types based on the clustering includes:
selecting the head key point and neck key point with the smallest distance as a local best match;
selecting the neck key point and chest key point with the smallest distance as a local best match;
and so on, until all local best matches have been traversed.
Combining the locally optimal solutions to obtain the clustering result includes:
combining the local best matches that involve the same key point to obtain the clustering result at the granularity of targets.
Finally, according to the clustering result, all the key points contained in the same target are deduced in reverse, as shown in the sketch below.
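A sketch of the chained local matching over characteristic values (a greedy nearest-neighbour match per adjacent body-part pair; the distance metric and the greedy order are assumptions):

    import numpy as np

    def local_best_matches(feats_a, feats_b):
        # Pair each key point of type A with the closest unused key point of type B
        # by characteristic-value distance: one local best match per pair of types.
        pairs, used = [], set()
        for i, fa in enumerate(feats_a):
            dists = [np.inf if j in used else np.linalg.norm(fa - fb)
                     for j, fb in enumerate(feats_b)]
            j = int(np.argmin(dists))
            used.add(j)
            pairs.append((i, j))
        return pairs

    # Chaining head-neck, neck-chest, ... matches and then merging matches that
    # share a key point yields the per-target clusters of key points.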
Of course, the above is only an example of assigning different key points to the same target; there are many specific implementations, which will not be enumerated here.
In this embodiment, a deep learning model is used to obtain the fifth-category feature and/or the spatial instance feature.
The deep learning model includes but is not limited to a neural network.
For example, as shown in Fig. 8, the deep learning model includes:
a feature extraction layer for extracting low-level features from the image to obtain a feature map;
a conversion layer, located after the feature extraction layer, for obtaining, based on the feature map, the third-category feature map, the fifth-category feature map containing the fifth-category feature, and the fourth-category feature map containing the fourth-category feature; and
a feature fusion convolution layer, located after the last conversion layer, for fusing the fifth-category feature map and the fourth-category feature map based on the third-category feature map.
In this embodiment, the third-category feature map, the fifth-category feature map, and the fourth-category feature map contain the same number of pixels, but the dimension of a single pixel may differ.
For example, the third-category feature map, the fifth-category feature map, and the fourth-category feature map all contain W*H pixels, where W and H are positive integers. The dimension of a pixel in the third-category feature map may be J; the dimension of a pixel in the fifth-category feature map may be J; the dimension of the fourth-category feature map may be 2. The number of channels of the feature fusion convolution layer may then be J+J+2, with a 1x1 convolution kernel and a convolution stride of 1.
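In PyTorch terms (an illustrative assumption, with J = 17 key point types also assumed), the feature fusion convolution layer is then a 1x1, stride-1 convolution over the J+J+2 concatenated channels:

    import torch
    import torch.nn as nn

    J = 17                                   # number of key point types (assumed)
    fusion = nn.Conv2d(in_channels=J + J + 2, out_channels=J + J + 2,
                       kernel_size=1, stride=1)
    x = torch.randn(1, J + J + 2, 64, 48)    # heatmap + KE + offset maps, aligned per pixel
    fused = fusion(x)                        # per-pixel mixing of the three aligned maps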
In some embodiments, the conversion layer includes N concatenated hourglass-shaped encoding sub-networks, each with an hourglass-shaped network architecture; the N hourglass-shaped encoding sub-networks obtain, based on the feature map, the third-category feature map, the fifth-category feature map containing the fifth-category feature, and the fourth-category feature map containing the fourth-category feature. N is a positive integer; for example, N may be 2, 3, or 4.
For example, the conversion layer may include: an hourglass-shaped encoding sub-network, at least two tail convolution sublayers located after the hourglass-shaped encoding sub-network, and a feature splicing node. The hourglass-shaped encoding sub-network obtains the feature map from the feature extraction layer and processes it; the processed feature is input to the at least two concatenated convolution sublayers for convolution. The convolution feature output by the last convolution sublayer is spliced with the feature map obtained from the feature extraction layer to yield a (J+J+2)-dimensional feature map, in which one J-dimensional feature corresponds to the third-category feature map, another J-dimensional feature may be the J-dimensional fifth-category feature map, and the 2-dimensional feature is the fourth-category feature map.
In this embodiment, the conversion layer uses hourglass-shaped encoding sub-networks; in specific implementations, residual modules may be used in place of the hourglass-shaped encoding sub-networks. In short, these are only examples; there are many specific implementations, which will not be enumerated one by one here.
As shown in Fig. 7, the present embodiment provides an image processing apparatus, comprising:
a determining module 110, configured to determine a target region of a target in the image;
an extraction module 120, configured to extract a first class feature from the target region, wherein the first class feature includes an image feature of the target;
an obtaining module 130, configured to obtain a second class feature according to the distribution of the same target in two consecutive frames of images;
a tracking module 140, configured to perform target tracking according to the first class feature and the second class feature.
The image processing apparatus provided by this embodiment can be applied in various electronic devices, for example, mobile devices and fixed devices. Mobile devices include, but are not limited to, mobile phones, tablet computers and various wearable devices. Fixed devices include, but are not limited to, desktop computers, notebook computers and servers.
In some embodiments, the determining module 110, the extraction module 120, the obtaining module 130 and the tracking module 140 can be program modules; after being executed by a processor, these modules can detect the first class feature and the second class feature and obtain the feature values of key points.
In other embodiments, the determining module 110, the extraction module 120, the obtaining module 130 and the tracking module 140 can be combined software-hardware modules, which may include various programmable arrays; programmable arrays include, but are not limited to, complex programmable arrays and field-programmable arrays.
In some embodiments, the second class feature includes:
a vector pointing from a key point of a target in the t-th frame image to the central point of the corresponding target in the (t+1)-th frame image, and/or,
a vector pointing from a key point of a target in the (t+1)-th frame image to the central point of the corresponding target in the t-th frame image, where t is a natural number.
In some embodiments, the tracking module 140 is specifically configured to: match the first class feature of the (t+1)-th frame image with the first class feature of the t-th frame image to obtain first difference information; match the second class feature of the (t+1)-th frame image relative to the t-th frame image with the second class feature of the t-th frame image relative to the (t-1)-th frame image to obtain second difference information; and obtain, according to the first difference information and the second difference information, the correspondence between targets in the (t+1)-th frame image and the corresponding targets in the t-th frame image.
In some embodiments, the tracking module 140 is specifically configured to perform a weighted summation of the first difference information of a first target in the (t+1)-th frame image and the second difference information of the first target to obtain a summation value, and to determine that the first target of the (t+1)-th frame image corresponding to the minimum summation value and the second target of the t-th frame image are the same target.
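As a concrete illustration, the following sketch matches targets across frames by the weighted sum of the two difference terms. The weights `w1` and `w2`, the greedy assignment order and the function name are assumptions for illustration, not values given in this embodiment.

```python
import numpy as np

def match_targets(first_diff: np.ndarray, second_diff: np.ndarray,
                  w1: float = 0.5, w2: float = 0.5) -> dict:
    """Greedy cross-frame assignment by weighted difference.

    first_diff[i, j]  : first-class (appearance) difference between target i
                        in frame t+1 and target j in frame t.
    second_diff[i, j] : second-class (spatio-temporal) difference for the pair.
    Returns {target index in frame t+1 -> target index in frame t}.
    """
    cost = w1 * first_diff + w2 * second_diff  # weighted summation value
    matches, used = {}, set()
    for i in range(cost.shape[0]):
        for j in np.argsort(cost[i]):
            if int(j) not in used:         # each frame-t target used once
                matches[i] = int(j)        # minimum summation value -> same target
                used.add(int(j))
                break
    return matches
```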
In some embodiments, the extraction module 120 is specifically configured to perform residual processing on the target region to obtain a residual feature, and to obtain the first class feature based on the residual feature.
Further, the extraction module 120 can be specifically configured to: perform residual processing on the target region using a first residual layer to obtain a first residual feature; perform residual processing on the first residual feature using a second residual layer to obtain a second residual feature; perform residual processing on the second residual feature using a third residual layer to obtain a third residual feature; perform residual processing on the third residual feature using a fourth residual layer to obtain a fourth residual feature; and obtain the image feature based on the fourth residual feature.
In some embodiments, the extraction module 120 is specifically configured to: perform residual processing on the target region using a first residual sublayer including N1 first residual modules to obtain a primary residual feature; perform residual processing on the primary residual feature using a second residual sublayer including N2 second residual modules to obtain a secondary residual feature, where N1 and N2 are positive integers; and combine the primary residual feature and the secondary residual feature to obtain the first residual feature.
In some embodiments, the extraction module 120 is specifically configured to perform pooling on the fourth residual feature to obtain a pooled feature, and to obtain the image feature based on the pooled feature.
In some embodiments, the extraction module 120 is specifically configured to: perform first pooling on the fourth residual feature, and fully connect the resulting first pooled feature with the second residual feature to obtain a first feature; perform second pooling on the fourth residual feature to obtain a second feature; and splice the first feature and the second feature to obtain the image feature.
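A minimal sketch of this extraction pipeline follows, assuming a `make_layer` factory for stacks of residual blocks and adaptive pooling for the two pooling branches. "Fully connecting the first pooled feature with the second residual feature" is read here as pooling the second residual feature and passing the concatenation through a linear layer, which is only one plausible interpretation; all channel widths are illustrative.

```python
import torch
import torch.nn as nn

class FirstClassFeatureExtractor(nn.Module):
    """Four cascaded residual layers followed by two pooling branches.

    `make_layer(in_ch, out_ch)` is assumed to return a stack of residual
    blocks (e.g. ResNet bottlenecks); widths here are illustrative.
    """
    def __init__(self, make_layer, feat_dim: int = 256):
        super().__init__()
        self.layer1 = make_layer(3, 64)     # first residual layer
        self.layer2 = make_layer(64, 128)   # second residual layer
        self.layer3 = make_layer(128, 256)  # third residual layer
        self.layer4 = make_layer(256, 512)  # fourth residual layer
        self.pool1 = nn.AdaptiveAvgPool2d(1)  # first pooling
        self.pool2 = nn.AdaptiveMaxPool2d(1)  # second pooling
        # Full connection of the first pooled feature with the (pooled)
        # second residual feature -- one plausible reading of the text.
        self.fc = nn.Linear(512 + 128, feat_dim)

    def forward(self, region: torch.Tensor) -> torch.Tensor:
        r1 = self.layer1(region)
        r2 = self.layer2(r1)
        r3 = self.layer3(r2)
        r4 = self.layer4(r3)
        p1 = self.pool1(r4).flatten(1)              # first pooled feature
        s2 = self.pool1(r2).flatten(1)              # pooled second residual feature
        first_feature = self.fc(torch.cat([p1, s2], dim=1))
        second_feature = self.pool2(r4).flatten(1)  # second pooling branch
        return torch.cat([first_feature, second_feature], dim=1)  # image feature
```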
In some embodiments, the obtaining module 130 is specifically configured to obtain two third class features, one from each of the two consecutive frames of images, wherein the third class feature includes a feature that encodes the spatial position information of key points inside the same target and can distinguish different targets; and to generate the second class feature based on the two third class features.
In some embodiments, the obtaining module 130 is specifically configured to splice the two third class features according to the fourth class feature, wherein the fourth class feature includes a confidence indicating that a pixel is a key point of the target; splicing the two third class features based on the fourth class feature yields a spliced feature, and the second class feature is generated based on the spliced feature.
In some embodiments, the obtaining module 130 is specifically configured to: perform convolution on the spliced feature using a first convolutional layer to obtain a first convolution feature; perform conversion on the first convolution feature using an hourglass-shaped conversion network to obtain a converted feature; and perform convolution on the converted feature using a second convolutional layer to obtain the second class feature.
In some embodiments, the obtaining module 130 is specifically configured to: perform a primary convolution on the converted feature using a first convolution sublayer to obtain a primary convolution feature; perform a secondary convolution on the primary convolution feature using a second convolution sublayer to obtain a secondary convolution feature; and perform a third convolution on the secondary convolution feature using a third convolution sublayer to obtain the second class feature.
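A minimal sketch of this generation path for the second class feature, assuming an available hourglass-shaped conversion module; the 256-channel widths, ReLU activations and class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SecondClassFeatureHead(nn.Module):
    """Splice -> 1*1 conv -> hourglass -> three convolution sublayers."""
    def __init__(self, hourglass: nn.Module, in_channels: int,
                 out_channels: int = 2):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, 256, kernel_size=1, stride=1)
        self.hourglass = hourglass
        self.conv1 = nn.Conv2d(256, 256, kernel_size=3, padding=1)  # primary
        self.conv2 = nn.Conv2d(256, 256, kernel_size=3, padding=1)  # secondary
        self.conv3 = nn.Conv2d(256, out_channels, kernel_size=1)    # third

    def forward(self, feats_t, feats_t1):
        spliced = torch.cat([feats_t, feats_t1], dim=1)  # splice two frames
        x = self.reduce(spliced)        # first convolution feature
        x = self.hourglass(x)           # converted feature
        x = torch.relu(self.conv1(x))   # primary convolution feature
        x = torch.relu(self.conv2(x))   # secondary convolution feature
        return self.conv3(x)            # second class feature
```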
Several specific examples are provided below in combination with any of the above embodiments:
Example 1:
Human body key point detection is the basis of video analysis and has important application prospects in the security field and the motion analysis field.
This example provides two human body key point detection techniques: one is a solution based on the first class feature (Keypoint Embedding, KE), and the other is an image processing method based on the second class feature (Spatial Instance Embedding, SIE).
The first class feature map and the second class feature map have the same dimensions and can likewise be represented by a series of two-dimensional matrices at the output resolution, where each key point category corresponds to one two-dimensional matrix that corresponds one-to-one with the key points in spatial position.
During training, the first class feature KE pulls together the embedded values of the key points of the same person and pushes apart the embedded values of the key points of different people.
KE mainly contains the apparent information of the pixels near a key point. KE mainly involves apparent information, is insensitive to spatial position, and can model long-range node relationships; however, since it lacks a spatial constraint, relying on KE alone may mistakenly group together the key points of different, distant people.
During training, the second class feature SIE regresses, at each pixel value, the vector to the human body center, so SIE contains the position information of the human body center.
SIE mainly contains spatial position information and encodes the human body center, so spatial position can be efficiently used for clustering. However, for points far from the human body center (such as the crown of the head or the ankles), the encoding error of SIE is larger, and the same person may be mistakenly split into multiple parts.
As shown in Fig. 6, this example proposes a multi-task, multi-branch key point detection model, which can extract the first class feature and the second class feature simultaneously and organically fuse the two bottom-up key point detection schemes, combining the advantages of both to realize more efficient and more accurate human body key point detection. When performing key point detection, the model shown in Fig. 6 also detects the third class feature map, which facilitates subsequently obtaining the feature values of the final key points (i.e., the final detection result shown in Fig. 6).
Specifically, this example proposes a multi-task, multi-branch image processing method, comprising: combining the first class feature and the second class feature to perform multi-person human body key point prediction.
The detection method can be used for multi-person key point detection and can also be extended to human body key point tracking tasks. As shown in Fig. 7, for each frame image, the multi-task bottom-up human body key point model first directly outputs the Gaussian response maps of the human body key points, the first class feature map and the second class feature map. The feature extraction layer shown in Fig. 7 comprises multiple convolution sublayers and a pooling layer; in Fig. 7 the number of convolution sublayers is 5. The pooling layer is a max pooling layer, i.e., a down-sampling layer that retains the maximum value. The 1st convolution sublayer has 64 channels, a 7*7 kernel and stride 2; the 2nd has 128 channels, a 3*3 kernel and stride 1; the 3rd has 128 channels, a 7*7 kernel and stride 1; the 4th has 128 channels, a 3*3 kernel and stride 1; the 5th has 256 channels, a 3*3 kernel and stride 1. The feature extraction layer outputs 256 feature maps, whose pixel values are the aforementioned low-level features.
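Under the stated configuration, the feature extraction layer may be sketched as follows; the position of the max pooling layer (placed here after the 1st convolution sublayer) and the use of ReLU activations are assumptions, since the text does not fix them.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, k, s):
    # Convolution sublayer with ReLU; padding keeps feature maps aligned.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=s, padding=k // 2),
        nn.ReLU(inplace=True),
    )

feature_extraction_layer = nn.Sequential(
    conv_block(3, 64, k=7, s=2),     # 1st sublayer: 64 ch, 7*7, stride 2
    nn.MaxPool2d(kernel_size=2),     # max pooling (assumed position)
    conv_block(64, 128, k=3, s=1),   # 2nd sublayer: 128 ch, 3*3, stride 1
    conv_block(128, 128, k=7, s=1),  # 3rd sublayer: 128 ch, 7*7, stride 1
    conv_block(128, 128, k=3, s=1),  # 4th sublayer: 128 ch, 3*3, stride 1
    conv_block(128, 256, k=3, s=1),  # 5th sublayer: 256 ch, 3*3, stride 1
)  # outputs 256 low-level feature maps
```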
The feature conversion layer is formed by S conversion modules; each conversion module includes an hourglass-shaped sub-network and multiple convolution sublayers. S can be any positive integer of 2 or more, for example 4. As shown in Fig. 4, each conversion module has two convolution sublayers, each with 256 channels, a 3*3 kernel and stride 1. After passing through the feature conversion layer formed by the 4 conversion modules, the deep learning model outputs, via a convolution sublayer, a J-dimensional third class feature map, a J-dimensional first class feature map and a 2-dimensional second class feature map.
After the fusion layer splices the features, a convolution with J+J+2 channels, a 1*1 kernel and stride 1 outputs the J-dimensional Gaussian response map, the J-dimensional first class feature map and the 2-dimensional second class feature map, respectively. These two classes of embedding feature maps are likewise represented by a series of two-dimensional matrices, where each key point category corresponds to one two-dimensional matrix that corresponds spatially to the Gaussian response map. In the first class feature map KE, the key points of the same person have similar embedded values, while the embedded values of the key points of different people are required to differ. The value of J is determined by the number of key points a target contains; for example, if the target is a human body, the number of key points can be 14 or 17, in which case J is 14 or 17.
In the spatial instance embedding map, each pixel regresses the coordinate vector to the human body center. The spatial instance embedding map SIE therefore naturally contains the coordinate information of the human body center position.
Through this bottom-up key point model based on convolutional neural networks, the Gaussian response of the human body key points, the first class feature and the second class feature can be obtained.
In the third class feature map, the value at each position is the confidence that the point is predicted to be the corresponding key point. The coordinate of the pixel with the maximum confidence in the map is the detected position of the corresponding key point.
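As a small illustration of decoding a key point position from such a confidence map (the text does not prescribe an implementation; this is one straightforward reading):

```python
import numpy as np

def decode_keypoint(confidence_map: np.ndarray) -> tuple:
    """Return the (row, col) of the maximum-confidence pixel."""
    idx = np.argmax(confidence_map)
    return np.unravel_index(idx, confidence_map.shape)
```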
Then the first class feature map and the second class feature map are spliced together along the feature dimension, and the joint points are clustered jointly; the final joint points constitute the entire human body pose.
The training loss function for the first class feature pulls the embedded values of each target's key points toward their mean, and can be written as:

L_1 = \frac{1}{JK} \sum_{k=1}^{K} \sum_{j=1}^{J} \left( m(p_{j,k}) - \bar{m}_k \right)^2

In the above formula, L_1 represents the loss function of the first class feature, J is the number of joint point categories, and K is the number of targets an image contains; m(p_{j,k}) is the embedded value corresponding to the first class feature; p_{j,k} is the position of the j-th key point of the k-th target; \bar{m}_k is the mean of the embedded values of the first class features of the k-th target.
The functional relation for the second loss term can take the following form:

L_2 = \frac{1}{JK} \sum_{k=1}^{K} \sum_{j=1}^{J} \left\| \left( p_{j,k} + v(p_{j,k}) \right) - \bar{c}_k \right\|^2

In the above formula, L_2 is the second loss term; v(p_{j,k}) is the vector predicted at the j-th key point of the k-th target, pointing from that key point toward the target's central point; \bar{c}_k is the coordinate of the central point of the k-th target; J is the total number of key points a target contains; K is the number of targets an image contains.
Using only the KE-based method: KE mainly involves apparent information, is insensitive to spatial position, and can model long-range node relationships; however, lacking a spatial constraint, relying on KE alone may mistakenly group together the key points of different, distant people.
Using only the SIE-based method: SIE mainly contains spatial position information and encodes the human body center, so spatial position can be efficiently used for clustering; however, for points far from the human body center (such as the crown of the head or the ankles), the encoding error of SIE is larger, and the same person may be mistakenly split into multiple parts.
In short, this example proposes a bottom-up multi-task key point prediction model that extracts the first class feature and the second class feature simultaneously, and combines them to perform multi-person human body key point prediction. The apparent information contained in the first class feature and the spatial position information contained in the second class feature complement each other and can effectively improve key point detection precision.
With the key point prediction model provided by this example, the algorithm can accurately predict the positions of human body key points in internet videos. The predicted key points can be used to analyze human behavior types, and, after accurately locating different body parts, to add real-time special effects to those parts. In some scenarios, the first class feature and the second class feature are used simultaneously in a product to perform key point detection or tracking tasks.
Example 2:
This example provides a dual-branch temporal feature extraction deep learning model that extracts the human fourth class feature and the fifth class feature for human body tracking. In this example, the human fourth class feature is one kind of the aforementioned fourth class feature; since the tracked target is a human body, it is called the human fourth class feature. In a concrete implementation, however, the tracked target is not limited to human bodies and can be other moving objects, for example, vehicles, ground mobile robots or low-altitude flying robots.
The human fourth class feature contains the overall apparent information of the key point region, and the temporal instance embedding contains a temporal consistency constraint.
Because the human fourth class feature contains overall apparent information and does not depend on the spatial position of the human body, it is robust to fast human motion, camera motion and scaling. The fifth class feature contains the temporal consistency constraint, which makes motion smoother and is more robust to posture changes and occlusion.
This example proposes combining the human fourth class feature and the temporal instance embedding to jointly perform the human body key point tracking task, which greatly improves the tracking performance of the model.
The deep learning model is used in multi-person human body key point tracking tasks. As shown in Fig. 8, this example performs bottom-up prediction of human body key points in a single frame based on the spatial instance embedding. For each frame image, the third class feature map, the second class feature map and the final pose detection result of the frame are obtained first.
Next, two consecutive frames are input to the dual-branch temporal feature extraction network to obtain the human fourth class feature and the fifth class feature. Combining the two outputs, the temporal association result is predicted jointly with the previous frame's detection result (the tracking result), realizing online tracking of human body key points.
Fig. 9 shows the schematic network structure of the dual-branch temporal feature extraction network. The human fourth class feature extraction branch takes the bottom-level feature representation of the neural network as input, extracts the region-of-interest alignment (ROI-Align) features of the human region according to the single-frame predicted human pose, and extracts higher-level features through a series of residual convolution operations.
Features at all levels are fused to obtain the human fourth class feature.
For each human body box (one human body box corresponds to one aforementioned target region), a vector of a predetermined dimension (for example, 3072) is obtained as the human fourth class feature.
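A hedged sketch of this branch using torchvision's ROI-Align follows; the residual-block stack (assumed channel-preserving), pooling, output size, spatial scale and the 3072-dimensional projection are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class HumanFourthClassFeature(nn.Module):
    """ROI-Align over backbone features, residual convs, 3072-d vector."""
    def __init__(self, residual_blocks: nn.Module, in_channels: int = 256):
        super().__init__()
        self.residual_blocks = residual_blocks  # assumed channel-preserving
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, 3072)  # predetermined dimension

    def forward(self, feature_map: torch.Tensor, boxes: list) -> torch.Tensor:
        # boxes: list of (n_i, 4) tensors of human boxes, (x1, y1, x2, y2).
        rois = roi_align(feature_map, boxes, output_size=(7, 7),
                         spatial_scale=0.25)
        x = self.residual_blocks(rois)   # higher-level features
        x = self.pool(x).flatten(1)
        return self.fc(x)                # one 3072-d vector per human box
```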
This vector is similar across the fourth class features of the same person, while the features of different people are not identical.
Its training method is similar to person re-identification algorithms: the fourth class features of the same person are required to be similar, and those of different people to be dissimilar.
Fig. 4 shows the temporal instance embedding branch. The feature maps of the low-level features extracted from two consecutive frames, the third class feature maps and the second class feature maps are spliced, processed by a convolution with 256 channels, a 1*1 kernel and stride 1, input into an hourglass model for processing, and then passed through three convolutional layers to output the temporal instance embedding. Among these three convolutional layers, the first two have 256 channels, 3*3 kernels and stride 1; the third has 2*2 channels (a 2-dimensional vector for each of the two temporal directions), a 1*1 kernel and stride 1.
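A hedged sketch of this temporal instance embedding branch follows; the hourglass module is assumed available, the ReLU placement is an assumption, and the 4 output channels correspond to the 2*2 configuration above.

```python
import torch
import torch.nn as nn

class TemporalInstanceEmbeddingBranch(nn.Module):
    """Splice two frames' maps -> 1*1 conv (256 ch) -> hourglass -> 3 convs."""
    def __init__(self, hourglass: nn.Module, in_channels: int):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, 256, kernel_size=1, stride=1)
        self.hourglass = hourglass
        self.conv1 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(256, 4, kernel_size=1, stride=1)  # 2*2 channels

    def forward(self, maps_t: torch.Tensor, maps_t1: torch.Tensor):
        x = torch.cat([maps_t, maps_t1], dim=1)  # splice the two frames' maps
        x = self.reduce(x)
        x = self.hourglass(x)
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        return self.conv3(x)  # forward + backward 2-D embeddings
```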
The temporal instance embedding is a two-way feature map. For the forward temporal instance embedding, each pixel on the t-th frame image regresses the human body center point coordinate in the (t+1)-th frame image. Conversely, for the backward temporal instance embedding, each pixel on the (t+1)-th frame image regresses the center point coordinate of the corresponding human body in the t-th frame image.
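To make the bidirectional regression concrete, the following sketch builds training targets for the two directions; restricting the dense per-pixel targets to each person's region mask, and the channel layout, are assumptions for illustration.

```python
import numpy as np

def build_temporal_targets(masks_t: np.ndarray, centers_t1: np.ndarray,
                           masks_t1: np.ndarray, centers_t: np.ndarray,
                           h: int, w: int) -> np.ndarray:
    """Targets for the two-way embedding: channels 0-1 forward, 2-3 backward.

    masks_t / masks_t1     : (K, H, W) boolean masks of each person's region.
    centers_t1 / centers_t : (K, 2) center coordinates in the other frame.
    """
    target = np.zeros((4, h, w), dtype=np.float32)
    for k in range(masks_t.shape[0]):
        # Forward: pixels of person k in frame t regress the frame-(t+1) center.
        target[0][masks_t[k]] = centers_t1[k, 0]
        target[1][masks_t[k]] = centers_t1[k, 1]
        # Backward: pixels of person k in frame t+1 regress the frame-t center.
        target[2][masks_t1[k]] = centers_t[k, 0]
        target[3][masks_t1[k]] = centers_t[k, 1]
    return target
```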
In summary, this example provides a dual-branch temporal feature extraction network that extracts the human fourth class feature and the fifth class feature for tracking. The human fourth class feature contains the overall apparent information of the key point region and, being independent of spatial position information, is robust to fast human motion, camera motion and scaling; the fifth class feature contains the temporal consistency constraint, which makes motion smoother and is more robust to posture changes and occlusion. Combining the two to jointly perform the human body key point tracking task greatly improves the tracking performance of the model.
As shown in Fig. 9, an embodiment of the present application provides a detection device, comprising:
a memory, configured to store information;
a processor, connected to the display and the memory respectively, configured to implement, by executing the computer executable instructions stored on the memory, the image processing method provided by one or more of the foregoing technical solutions, for example, at least one of the image processing methods shown in Fig. 1, Fig. 2, Fig. 3, Fig. 4, Fig. 5 and Fig. 7.
The memory can be any of various types of memory, such as random access memory, read-only memory or flash memory. The memory can be used for information storage, for example, storing computer executable instructions. The computer executable instructions can be various program instructions, for example, object program instructions and/or source program instructions.
The processor can be any of various types of processors, for example, a central processing unit, a microprocessor, a digital signal processor, a programmable array, an application-specific integrated circuit or an image processor.
The processor can be connected to the memory by a bus. The bus can be an integrated circuit bus or the like.
In some embodiments, the terminal device may further include a communication interface, which can include a network interface, for example, a local area network interface, a transceiver antenna, etc. The communication interface is likewise connected to the processor and can be used for information transmission and reception.
In some embodiments, the terminal device further includes a human-machine interaction interface; for example, the human-machine interaction interface can include various input and output devices, such as a keyboard, a touch screen, etc.
In some embodiments, the detection device further includes a display, which can display various prompts, collected face images and/or various interfaces.
An embodiment of the present application provides a computer storage medium storing computer executable code; after the computer executable code is executed, the image processing method provided by one or more of the foregoing technical solutions can be implemented, for example, at least one of the image processing methods shown in Fig. 1, Fig. 2, Fig. 3, Fig. 4, Fig. 5 and Fig. 7.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods can be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of units is only a logical functional division, and in actual implementation there can be other division manners, such as: multiple units or components can be combined, or integrated into another system, or some features can be omitted or not executed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed can be through some interfaces; the indirect coupling or communication connection of devices or units can be electrical, mechanical or of other forms.
The units described above as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they can be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this example can all be integrated into one processing module, or each unit can serve individually as a unit, or two or more units can be integrated into one unit; the integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions. The aforementioned program can be stored in a computer readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc.
The above are only specific embodiments of this example, but the protection scope of this example is not limited thereto. Any change or replacement that those familiar with the technical field can easily conceive within the technical scope disclosed by this example shall be covered within the protection scope of this example. Therefore, the protection scope of this example shall be based on the protection scope of the claims.

Claims (10)

1. An image processing method, characterized by comprising:
determining a target region of a target in the image;
extracting a first class feature from the target region, wherein the first class feature includes an image feature of the target;
obtaining a second class feature according to the distribution of the same target in two consecutive frames of images;
performing target tracking according to the first class feature and the second class feature.
2. The method according to claim 1, wherein
the second class feature includes:
a vector pointing from a key point of a target in the t-th frame image to the central point of the corresponding target in the (t+1)-th frame image, and/or,
a vector pointing from a key point of a target in the (t+1)-th frame image to the central point of the corresponding target in the t-th frame image, where t is a natural number.
3. The method according to claim 1 or 2, wherein performing target tracking according to the first class feature and the second class feature comprises:
matching the first class feature of the (t+1)-th frame image with the first class feature of the t-th frame image to obtain first difference information;
matching the second class feature of the (t+1)-th frame image relative to the t-th frame image with the second class feature of the t-th frame image relative to the (t-1)-th frame image to obtain second difference information;
obtaining, according to the first difference information and the second difference information, the correspondence between targets in the (t+1)-th frame image and the corresponding targets in the t-th frame image.
4. The method according to claim 3, wherein
obtaining, according to the first difference information and the second difference information, the correspondence between targets in the (t+1)-th frame image and the corresponding targets in the t-th frame image comprises:
performing a weighted summation of the first difference information of a first target in the (t+1)-th frame image and the second difference information of the first target to obtain a summation value;
determining that the first target of the (t+1)-th frame image corresponding to the minimum summation value and the second target of the t-th frame image are the same target.
5. The method according to any one of claims 1 to 4, wherein extracting the first class feature from the target region comprises:
performing residual processing on the target region to obtain a residual feature;
obtaining the first class feature based on the residual feature.
6. The method according to claim 5, wherein performing residual processing on the target region to obtain the residual feature comprises:
performing residual processing on the target region using a first residual layer to obtain a first residual feature;
performing residual processing on the first residual feature using a second residual layer to obtain a second residual feature;
performing residual processing on the second residual feature using a third residual layer to obtain a third residual feature;
performing residual processing on the third residual feature using a fourth residual layer to obtain a fourth residual feature;
and wherein obtaining the first class feature based on the residual feature comprises:
obtaining the image feature based on the fourth residual feature.
7. The method according to any one of claims 1 to 5, wherein obtaining the second class feature according to the distribution of the same target in two consecutive frames of images comprises:
obtaining two third class features, one from each of the two consecutive frames of images, wherein the third class feature includes a feature that encodes the spatial position information of key points inside the same target and can distinguish different targets;
generating the second class feature based on the two third class features.
8. An image processing apparatus, characterized by comprising:
a determining module, configured to determine a target region of a target in the image;
an extraction module, configured to extract a first class feature from the target region, wherein the first class feature includes an image feature of the target;
an obtaining module, configured to obtain a second class feature according to the distribution of the same target in two consecutive frames of images;
a tracking module, configured to perform target tracking according to the first class feature and the second class feature.
9. A detection device, comprising:
a memory, configured to store computer executable instructions;
a processor, connected to the memory, configured to implement the method provided by any one of claims 1 to 7 by executing the computer executable instructions.
10. A computer storage medium storing computer executable instructions; after the computer executable instructions are executed by a processor, the method according to any one of claims 1 to 7 can be implemented.