CN108229360A - Image processing method, device and storage medium - Google Patents

Image processing method, device and storage medium Download PDF

Info

Publication number
CN108229360A
CN108229360A
Authority
CN
China
Prior art keywords
layer
hand object
setting
hand
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711437662.5A
Other languages
Chinese (zh)
Other versions
CN108229360B (en)
Inventor
俞大海
陈术义
王欣博
周均扬
阮志锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Midea Group Co Ltd
Original Assignee
Midea Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Midea Group Co Ltd filed Critical Midea Group Co Ltd
Priority to CN201711437662.5A priority Critical patent/CN108229360B/en
Publication of CN108229360A publication Critical patent/CN108229360A/en
Application granted granted Critical
Publication of CN108229360B publication Critical patent/CN108229360B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/269Analysis of motion using gradient-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Abstract

Embodiments of the present invention disclose an image processing method, device, and storage medium. The method may include: detecting at least one hand object in a detection image frame of a video based on preset hand training data and a convolutional neural network (CNN) model, and obtaining description attribute values of each hand object; and, when the description attribute values of each hand object satisfy a preset trigger condition, tracking the hand objects in a set number of tracking image frames following the detection image frame based on preset object features and a tracking algorithm. By merging the detection algorithm with the tracking algorithm, the technical solution of the embodiments reduces the number of convolution operations required for hand detection, thereby reducing the computational cost of hand detection and the time consumed in the detection process.

Description

Image processing method, device and storage medium
Technical field
The present invention relates to the field of household appliances, and in particular to an image processing method, device, and storage medium.
Background technology
With the development of computer technology and signal processing technology, more and more home appliances can be controlled not only by traditional button operation but also by a user's voice or gestures.
Controlling a home appliance by gesture requires hand detection. Current hand detection schemes generally detect hands in first-person RGB images with a deep convolutional neural network (CNN, Convolutional Neural Network) comprising convolutional layers, pooling layers and fully connected layers. Because this scheme uses a network of considerable depth and performs full convolution over the whole image during detection, its detection accuracy is high, but it increases the computational cost and lengthens the time consumed in the detection process.
Summary of the invention
In order to solve the above technical problems, embodiments of the present invention are intended to provide an image processing method, device, and storage medium that can reduce the computational cost and the time consumed in the detection process.
The technical solution of the present invention is realized as follows:
In a first aspect, an embodiment of the present invention provides an image processing method, the method comprising:
detecting at least one hand object in a detection image frame of a video based on preset hand training data and a convolutional neural network (CNN) model, and obtaining description attribute values of each hand object;
when the description attribute values of each hand object satisfy a preset trigger condition, tracking the hand objects in a set number of tracking image frames following the detection image frame based on preset object features and a tracking algorithm.
In a second aspect, an embodiment of the present invention provides an image processing device, the device comprising a shooting apparatus, a memory and a processor, wherein:
the shooting apparatus is configured to capture video;
the memory is configured to store a computer program runnable on the processor;
the processor is configured to perform the steps of the method of the first aspect when running the computer program.
In a third aspect, an embodiment of the present invention provides a computer-readable medium storing an image processing program which, when executed by at least one processor, implements the steps of the method of the first aspect.
Embodiments of the present invention provide an image processing method, device, and storage medium. Based on the similarity between consecutive frames, the detection algorithm and the tracking algorithm are merged, reducing the number of convolution operations required for hand detection, thereby reducing the computational cost of hand detection and the time consumed in the detection process.
Description of the drawings
Fig. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of a hand object detection flow provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of video frames provided by an embodiment of the present invention;
Fig. 4 is a schematic flowchart of constructing a CNN model provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of CNN network construction provided by an embodiment of the present invention;
Fig. 6 is a schematic flowchart of a specific example of an image processing method provided by an embodiment of the present invention;
Fig. 7 is a schematic composition diagram of an image processing device provided by an embodiment of the present invention;
Fig. 8 is a schematic composition diagram of another image processing device provided by an embodiment of the present invention;
Fig. 9 is a specific hardware structure diagram of an image processing device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention.
Although hand detection with a deep CNN achieves high detection accuracy, it requires a convolution pass over every captured video frame, which increases the computational cost, lengthens the detection time, and prevents real-time detection. Furthermore, the processing chips and arithmetic units such as ROM or RAM typically fitted in smart home appliances fall well short of the computing capability of terminal devices such as smartphones or personal computers, so a balance must be struck between detection accuracy and detection speed. On this basis, the present invention is illustrated through the following embodiments.
Embodiment one
Referring to Fig. 1, which shows an image processing method provided by an embodiment of the present invention. The method can be applied to any home appliance that needs to perform hand detection, and can include:
S101: detecting at least one hand object in a detection image frame of a video based on preset hand training data and a convolutional neural network (CNN) model, and obtaining description attribute values of each hand object;
S102: when the description attribute values of each hand object satisfy a preset trigger condition, tracking the hand objects in a set number of tracking image frames following the detection image frame based on preset object features and a tracking algorithm.
It should be noted that, since the variation between adjacent frames of a video is small, it is unnecessary to run the CNN model on every video frame while detecting hand objects in the video. In the technical solution shown in Fig. 1, after the hand objects in a video frame have been detected by the CNN model, the video frames following the detection image frame can be handled by the tracking algorithm. Because the tracking algorithm does not convolve the frame images, the solution of Fig. 1 reduces the computational cost of hand detection and the time consumed in the detection process compared with related hand detection schemes.
For the technical solution shown in Fig. 1, it should be noted that step S101 can start from any frame of the video; the video frame on which step S101 is performed is therefore called the detection image frame. After hand detection is performed on the detection image frame based on the preset hand training data and CNN model, at least one hand object and the description attribute values of each hand object can be obtained from the detection image frame. The description attribute values of a hand object can include: the position of the hand object in the detection image frame, the size of the hand object in the detection image frame, the class of the hand object, and the confidence of the hand object detection. Specifically, the position of the hand object in the detection image frame can be the coordinates of the hand object, and the size can be the width and height of the hand object in the detection image frame.
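For illustration, these description attribute values can be pictured as one record per detected hand; a minimal sketch in Python (the field names are assumptions for illustration, not terms defined by this embodiment):

```python
from dataclasses import dataclass

@dataclass
class HandObject:
    """Description attribute values of one detected hand object (field names assumed)."""
    x: float           # position: horizontal coordinate in the detection image frame
    y: float           # position: vertical coordinate in the detection image frame
    width: float       # size: width of the hand object in the frame
    height: float      # size: height of the hand object in the frame
    category: str      # class of the hand object, e.g. a gesture label
    confidence: float  # confidence of the detection
```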
The preset hand training data can be obtained, before step S101 is performed, by training on gesture data that has been collected. Since the technical solution of Fig. 1 is applied in the household appliance field, gesture data commonly used in household scenarios can be collected in advance, and the hand training data obtained by training on this gesture data, so that the hand objects in the detection image frame can be detected in combination with the CNN model.
Step S101 can be implemented through the typical hand object detection flow shown in Fig. 2. As shown in Fig. 2, the hand object detection environment can include an application (APP) part and a gesture detection software development kit (Gesture SDK, Software Development Kit) deployed in the chip system. The Gesture SDK can be implemented on top of an open-source computer vision library (such as OpenCV, the Open Source Computer Vision Library, SimpleCV, or JavaCV) and includes the CNN model (Model) and third-party (Third Party) image processing algorithms. Video is acquired by calling the driver (Driver) of a shooting apparatus fitted in the home appliance, such as a camera (Camera); the captured images (Image Capture) are pre-processed (Pre-Processing) by calling the Gesture SDK, for example for denoising and sharpening, and the CNN model in the Gesture SDK is then called to perform hand object detection (Detection). After detection, the Gesture SDK is called to perform tracking and alignment (Tracking & Alignment) of the hand objects and, after recognition verification (Recognition) by the Gesture SDK, an API is called to pass the result to the monitor (Monitor) in the APP, so that the APP outputs the hand object detection result, which can include the number of hand objects and the description attribute values of each hand object. In the flow of Fig. 2, solid arrows represent data transmission paths and dashed arrows represent calling paths during data processing, as shown in the legend below Fig. 2.
It should be noted that after step S101 has been executed, the number of hand objects in the detection image frame and the description attribute values of each hand object are obtained. After step S101, the hand objects can be tracked in the video frames that follow the detection image frame; that is, the detection phase switches to the tracking phase. The technical solution of Fig. 1 therefore refers to the set number of video frames following the detection image frame as tracking image frames.
Specifically, when many hand objects are detected, tracking multiple objects would consume substantial computing resources, and it is then preferable to continue detecting hand objects according to step S101. Therefore, in this embodiment, the trigger condition that the description attribute values of each hand object must satisfy is preferably that the number of hand objects is less than 3. That is, for step S102, tracking the hand objects in the set number of tracking image frames following the detection image frame based on the preset object features and tracking algorithm, when the description attribute values of each hand object satisfy the preset trigger condition, can include:
when the number of hand objects is less than 3, tracking the hand objects in the tracking image frames based on the preset object features and the tracking algorithm;
when the number of hand objects is greater than or equal to 3, repeating S101, i.e. detecting at least one hand object in the frame following the detection image frame based on the hand training data and the CNN model, and obtaining the description attribute values of each hand object.
Taking the two illustrative detection image frames of Fig. 3 as an example: more than 3 hand objects are detected in the left frame, so hand object detection can continue on the subsequent video frames; only one hand object is detected in the right frame, so continuing detection on the subsequent video frames would consume considerable computing resources, and the subsequent video frames are therefore tracked instead.
In addition, while step S102 is performed, each hand object can be initialized. Specifically, the description attribute values of each hand object can be set as its tracking initial values, used to initialize the objects to be tracked. The tracking initial values of each hand object can include the position of the hand object in the detection image frame, the size of the hand object in the detection image frame, and the class of the hand object.
After initialization is complete, the hand objects in the set number of tracking image frames following the detection image frame can be tracked based on the preset object features and tracking algorithm. In this embodiment, the object features can include histogram of oriented gradients (HOG, Histogram of Oriented Gradient) features or grayscale (Gray) features; it should be pointed out that tracking with Gray features speeds up tracking.
In addition, during step S102, to avoid losing track of a fast-moving hand object, the technical solution of Fig. 1 can also set a corresponding tracking peak value peak_value for each hand object, which indicates whether that hand object is being tracked in time. In that case, the technical solution of Fig. 1 can further include:
when at least one of the tracking peak values peak_value of the hand objects falls below a preset tracking peak lower limit, or when the number of tracked image frames exceeds the set quantity, repeating S101, i.e. detecting at least one hand object in the frame following the current tracking image frame based on the hand training data and the CNN model, and obtaining the description attribute values of each hand object. At this point, the tracking phase can switch back to the detection phase. It should be noted that a different tracking peak lower limit can be set for each hand object, so as to reasonably determine whether each hand object is being tracked in time.
In a specific implementation of step S102, the preset tracking algorithm is preferably the kernelized correlation filter (KCF, Kernelized Correlation Filters) algorithm. In this embodiment, its detailed process can include: training a target detector; using the target detector to check whether the position in the next frame is the target; and then using the new detection result to update the training set and the target detector.
It should be noted that when training the target detector, the target region is selected as the positive sample and the surrounding regions as negative samples. Positive and negative samples are collected using the circulant matrix of the target region, the target detector is trained with a ridge regression algorithm, and the diagonalizability of circulant matrices in Fourier space is exploited to convert matrix operations into Hadamard products of matrices, reducing the amount of computation and raising the computing speed. In this embodiment, the extracted object features can include histogram of oriented gradients (HOG, Histogram of Oriented Gradient) features or grayscale (Gray) features; it should be pointed out that tracking with Gray features speeds up tracking.
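As a usage illustration only: OpenCV's contrib modules ship a KCF tracker that follows the same init-then-update pattern described above. The sketch below assumes opencv-contrib-python is installed, hedges on the factory location (which moved into cv2.legacy around OpenCV 4.5), and does not reproduce this embodiment's own KCF implementation:

```python
import cv2

def track_hand(video_path, init_box):
    """Track one hand with OpenCV's KCF tracker from a detected box.

    init_box is the (x, y, w, h) tuple produced by the detection stage.
    """
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        raise IOError("cannot read the detection image frame")

    try:                                   # factory location varies by OpenCV version
        tracker = cv2.TrackerKCF_create()
    except AttributeError:
        tracker = cv2.legacy.TrackerKCF_create()
    tracker.init(frame, init_box)

    box = init_box
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        found, box = tracker.update(frame)
        if not found:                      # tracking lost: the embodiment would
            break                          # switch back to CNN detection here
    cap.release()
    return box
```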
In addition to the above scheme, this embodiment can also include a process of constructing the preset convolutional neural network (CNN) model, which, referring to Fig. 4, can include:
S401: constructing a convolutional neural network, the convolutional neural network comprising at least four kinds of network layers: an image input layer, at least one convolutional layer, at least one pooling layer and at least one fully connected layer;
It should be noted that the technical solution provided by the embodiment of the present invention optimizes an existing convolutional neural network, so that the feature representation capability of the CNN model is improved under limited computing capability, and the computation consumed during detection by the CNN is also reduced.
S402: when the number of objects to be detected is less than a preset threshold, reducing the number of convolution kernels in the CNN;
It should be noted that experiments support the following conclusion: in single-class or few-class detection based on a convolutional neural network, a narrow network can still achieve high accuracy. Therefore, while keeping the convolutional neural network deep, the number of convolution kernels can be reduced appropriately for single-class or few-class detection (e.g. 2 to 3 classes). Specifically, on the premise of preserving the CNN network depth as far as possible, when performing single-class or few-class detection, the number of convolution kernels in each layer of the CNN is reduced to 100 or fewer, which lowers the computation consumed in the CNN computation process.
S403: dividing the image input by the image input layer, according to a preset boundary determination strategy, into at least one memory data segment stored in contiguous memory, and copying each memory data segment with a preset contiguous-memory copy function;
Regarding S403, it should be noted that the CNN computation process consists of data preparation and matrix multiplication, and when the number of convolution kernels is small as described in S402, data preparation can take up a large share of the total CNN computation time. Related data preparation schemes commonly copy data with the im2col function, which rearranges image blocks into matrix columns. To shorten the data preparation time, the boundary parameters of the convolution computation can be inspected and the arrangement of the data in memory analysed, so that the data to be prepared is organised entirely into memory data segments stored in contiguous memory, and the memory copy function memcpy is used for the copies instead of copying in a loop, which reduces the time consumed by data copying.
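To make the contiguous-copy idea concrete, the following NumPy sketch (an illustration, not the embodiment's C++ code) performs im2col so that each copied segment, one patch row of k elements, is contiguous in memory and can be transferred as a single block rather than element by element:

```python
import numpy as np

def im2col(img, k):
    """Rearrange k x k patches of a 2-D image into rows of a matrix.

    For a C-contiguous image, img[i, j:j+k] is one contiguous memory
    segment, so each of the k patch rows is copied in a single block
    (NumPy slice assignment uses memcpy-style copies underneath).
    """
    h, w = img.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((out_h * out_w, k * k), dtype=img.dtype)
    for i in range(out_h):
        for j in range(out_w):
            row = cols[i * out_w + j]
            for r in range(k):
                # one contiguous segment of k elements per copy
                row[r * k:(r + 1) * k] = img[i + r, j:j + k]
    return cols

img = np.arange(36, dtype=np.float32).reshape(6, 6)
kernel = np.ones((3, 3), dtype=np.float32)
# convolution (correlation) becomes one matrix product after data preparation
out = im2col(img, 3) @ kernel.reshape(-1)
print(out.reshape(4, 4))
```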
In addition, the matrix multiplication in the CNN computation process can preferably be sped up with a third-party library. The Winograd algorithm is currently the fastest matrix multiplication algorithm, and the NNPACK acceleration package developed by Facebook implements it; NNPACK is optimized both for the Streaming SIMD Extensions 2 (SSE2) instruction set of x86 processors and for the NEON instruction set of ARM processors. Accelerating the matrix multiplications in the CNN computation with the NNPACK acceleration package therefore improves the forward speed. However, although NNPACK speeds up matrix multiplication, relying only on its own parallel acceleration leaves CPU utilization low, so multithreading can additionally be applied at the periphery: the image to be detected is first split, and the resulting image blocks are then processed and recombined in parallel for acceleration.
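A minimal sketch of the peripheral splitting idea, under the assumption of a valid (no-padding) convolution: the input is cut into horizontal strips overlapping by k - 1 rows, the strips are filtered in parallel, and the partial outputs concatenate into the full result. Whether this matches the embodiment's exact threading scheme is an assumption; NumPy releases the GIL inside large array operations, so a thread pool gains real parallelism here:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(img, kernel):
    """Valid cross-correlation via sliding windows."""
    win = sliding_window_view(img, kernel.shape)
    return np.einsum('ijkl,kl->ij', win, kernel)

def conv2d_tiled(img, kernel, n_tiles=4):
    """Split the image into overlapping row strips and filter them in parallel."""
    k = kernel.shape[0]
    out_h = img.shape[0] - k + 1
    bounds = np.linspace(0, out_h, n_tiles + 1, dtype=int)
    # each strip needs k - 1 extra rows of overlap to produce its share of output
    strips = [img[lo:hi + k - 1] for lo, hi in zip(bounds[:-1], bounds[1:])]
    with ThreadPoolExecutor(max_workers=n_tiles) as pool:
        parts = list(pool.map(lambda s: conv2d_valid(s, kernel), strips))
    return np.concatenate(parts, axis=0)

img = np.random.rand(64, 64).astype(np.float32)
kernel = np.random.rand(3, 3).astype(np.float32)
assert np.allclose(conv2d_tiled(img, kernel), conv2d_valid(img, kernel), atol=1e-5)
```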
S404: merging, according to a preset merging strategy, the initial parameters of a batch normalization layer with the parameters of the convolutional layer or the fully connected layer, and using the merged parameters as the new parameters of the batch normalization layer; wherein the batch normalization layer follows the convolutional layer or the fully connected layer.
Regarding S404: in the CNN computation process, batch normalization (BN, Batch Normalization) is an indispensable step. A BN layer is added after each convolutional layer and fully connected layer of the CNN to normalize the data, which strengthens the expressive power of the CNN and improves the convergence speed of deep CNNs. A BN layer stores four batch normalization parameters, variance, mean, beta and gamma, and normalizes the features input to the BN layer; the normalization process is linear. These four parameters can be saved in the weight file as CNN model parameters. During target detection with the CNN, the convolutional layers and fully connected layers can also be regarded as linear transformation layers; therefore, the linear transform of the normalization can be combined with the linear transform of the convolutional layer or fully connected layer, merging the initial parameters of the batch normalization layer with the parameters of the convolutional layer or the fully connected layer. In one possible implementation, S404 can specifically include:
merging the mean parameter of the batch normalization layer with the bias parameter of the convolutional layer or the fully connected layer according to a first merging strategy, obtaining a first merging result;
based on the first merging result, merging the variance parameter of the batch normalization layer with the bias parameter and the weight parameter of the convolutional layer or the fully connected layer according to a second merging strategy.
For the above implementation, preferably, merging the mean parameter of the batch normalization layer with the bias parameter of the convolutional layer or the fully connected layer according to the first merging strategy to obtain the first merging result can include:
merging the mean parameter of the batch normalization layer with the bias parameter of the convolutional layer or the fully connected layer based on a first expression and a second expression, obtaining the first merging result;
wherein the first expression is Y = WX + bias, and the second expression is Yb = gamma × (Y − mean) / sqrt(variance) + beta, where X is the input of the convolutional layer or the fully connected layer, Y is the output of the convolutional layer or the fully connected layer, Yb is the output of the batch normalization layer, variance, mean, beta and gamma are the parameters of the batch normalization layer, W is the weight parameter of the convolutional layer or the fully connected layer, and bias is the bias parameter of the convolutional layer or the fully connected layer;
the first merging result is shown in formula 1:
Yb = gamma × (WX + bias − mean) / sqrt(variance) + beta    (formula 1)
Based on the above preferred implementation, merging, based on the first merging result, the variance parameter of the batch normalization layer with the bias parameter and the weight parameter of the convolutional layer or the fully connected layer according to the second merging strategy can include:
setting W′ = gamma × W / sqrt(variance) and bias′ = gamma × (bias − mean) / sqrt(variance) + beta, and simplifying the first merging result according to W′ and bias′, obtaining Yb = W′X + bias′;
saving W′ and bias′ as the model parameters of the convolutional layer or the fully connected layer.
The derivation of the above implementation is as follows:
based on the first expression and the second expression, substituting Y = WX + bias into the second expression merges them into formula 1. Next, setting W′ = gamma × W / sqrt(variance) and bias′ = gamma × (bias − mean) / sqrt(variance) + beta and simplifying the first merging result gives Yb = W′X + bias′. Comparing the final expression with the first expression shows that once W′ and bias′ have been computed in advance and saved as the model parameters of the convolutional layer or the fully connected layer, no batch normalization computation is needed in new forward passes, saving the batch normalization computation time.
For the construction process of Fig. 4, preferably, some computation steps of the convolutional layer can also be moved after the computation step of the pooling layer, so as to reduce the amount of computation. For example, a convolutional layer is usually followed by a pooling layer, such as a maxpool layer; if the leaky activation function and the bias (bias) addition of the convolutional layer are performed after the pooling layer, 3/4 of the activation-function and bias computations can be saved. In addition, expensive steps such as the leaky activation function and the maxpool layer can be processed with multiple threads, further raising multi-core utilization.
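The reordering is valid because the leaky activation is monotonically non-decreasing and the bias is a per-channel constant, so both commute with the per-window maximum; a small sketch checking the identity (an illustration, not the embodiment's C++ code):

```python
import numpy as np

def leaky(x, slope=0.1):
    return np.where(x > 0, x, slope * x)

def maxpool2x2(x):
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.random.randn(8, 8).astype(np.float32)
b = 0.37  # per-channel bias (a made-up value)

# conventional order: bias + activation on the full map, then pool
full = maxpool2x2(leaky(x + b))
# reordered: pool first, then bias + activation on a map 1/4 the size
cheap = leaky(maxpool2x2(x) + b)
assert np.allclose(full, cheap)  # identical result, ~3/4 fewer activations
```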
For the construction process of Fig. 4, preferably, the method can also include:
deriving, based on a Layer class, the output data Blob of each network layer in the CNN;
establishing linking relationships between different Blobs in a data-flow manner, so as to build the CNN.
For example, the CNN can be built with an efficient, lightweight deep neural network forward-computation framework. The resulting CNN is convenient to build and highly extensible, and network models such as darknet, caffe, tensorflow, keras and theano can easily be imported without re-optimizing for each deep learning framework. The framework is preferably pure C++ code without any third-party dependency, making it easy to port across platforms, and it reserves interfaces for basic functions such as GEMM and CONVOLUTION, so that users can conveniently plug in third-party libraries such as NNPACK or apply custom optimizations for different platforms. In addition, a CNN network built from cpp files allows efficient model initialization and can effectively hide the details of algorithms and data in a program's release build, preventing leakage of core algorithms. Specifically, during CNN construction, a data-flow-graph style similar to tensorflow can be adopted: the output data of each network layer is kept in a Blob, and building the network only requires establishing linking relationships between different Blobs with Layers. When extending the network layers, only a custom Layer class implementing the data-flow conversion between different Blobs is needed, so new network structures can be added easily, and unneeded network structures can be deleted just as easily to reduce program size. Fig. 5 shows such a construction: the Blob of each network layer is derived by defining multiple Layer classes, and chain-style connection relationships are established.
It should be noted that in this embodiment, Blob is a data structure type; specifically, a Blob is represented by a (Width, Height, Channel, Number) four-tuple, i.e. width, height, number of channels and quantity (or class) respectively.
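A toy version of this data-flow construction, sketched in Python rather than the pure C++ this embodiment prefers; the class and method names are assumptions for illustration:

```python
import numpy as np

class Blob:
    """Output data of one network layer: a (width, height, channel, number) 4-tuple."""
    def __init__(self, data):
        self.data = data            # stored as shape (N, C, H, W)
        n, c, h, w = data.shape
        self.shape = (w, h, c, n)

class Layer:
    """Base class; subclasses implement the data-flow conversion between Blobs."""
    def forward(self, blob: Blob) -> Blob:
        raise NotImplementedError

class ReluLayer(Layer):
    def forward(self, blob):
        return Blob(np.maximum(blob.data, 0))

class ScaleLayer(Layer):
    def __init__(self, s):
        self.s = s
    def forward(self, blob):
        return Blob(blob.data * self.s)

def build_net(layers):
    """Linking Blobs with Layers is all it takes to build the network."""
    def run(x):
        blob = Blob(x)
        for layer in layers:        # chain-style connection between Blobs
            blob = layer.forward(blob)
        return blob
    return run

net = build_net([ScaleLayer(2.0), ReluLayer()])   # extend by defining one more Layer class
print(net(np.random.randn(1, 3, 4, 4)).shape)
```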
With the construction process of Fig. 4, the CNN-YOLO v2 model can preferably be optimized accordingly, achieving a more than tenfold increase in computing speed without any loss of computational accuracy.
In practical applications, detecting and tracking a single gesture object through the above process in this embodiment improves hand detection performance compared with the related-art scheme of hand detection with a deep CNN: detection speed is increased more than fivefold, and computing resource consumption is reduced more than fourfold.
Embodiment two
Based on the same inventive concept as the preceding embodiment, this embodiment illustrates the technical solution of the above embodiment with a specific example flow. Referring to Fig. 6, the flow can include:
S601: setting a detection flag DETECT_FLAG of bool type, with the initial value of DETECT_FLAG set to True;
S602: receiving the i-th frame of the video, where i denotes the video frame number;
it should be understood that the scheme of this embodiment can start at any moment while the video is being acquired, and need not start from the first frame of the video.
S603: checking the value of DETECT_FLAG; going to S604A if it is TRUE, otherwise going to S604B;
it should be understood that when the value of DETECT_FLAG is TRUE, hand objects need to be detected in the i-th frame; when the value of DETECT_FLAG is FALSE, hand objects were already detected before the i-th frame, so the hand objects in the i-th frame only need to be tracked;
S604A: detecting the hand objects in the i-th frame according to the preset hand training data and CNN model, and obtaining the description attribute values of the hand objects;
specifically, the preset CNN model can be built according to the construction process of Fig. 4 in the preceding embodiment, which is not repeated in this embodiment.
S605A: judging whether the description attribute values of the hand objects satisfy the preset trigger condition; if so, going to S606A: setting the value of DETECT_FLAG to FALSE, setting i = i + 1 and going to S602 to receive the (i+1)-th frame of the video; otherwise, setting i = i + 1 and going to S602 to receive the (i+1)-th frame of the video;
S604B: tracking the hand objects in the i-th frame based on the preset object features and tracking algorithm;
S605B: judging whether tracking has been lost; if so, going to S606B: setting the value of DETECT_FLAG to TRUE, setting i = i + 1 and going to S602 to receive the (i+1)-th frame of the video; otherwise, setting i = i + 1 and going to S602 to receive the (i+1)-th frame of the video.
Through the flow of Fig. 6, during the detection and tracking of hand objects, the detection flag DETECT_FLAG is set and assigned accordingly to control the detection process and the tracking process, thereby implementing the technical solution shown in Fig. 1.
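The control loop of Fig. 6 can be summarized in the following sketch, where detect_hands() and make_trackers() are hypothetical stand-ins for the CNN detection of S604A and the KCF tracker initialization described in embodiment one; the trigger condition and thresholds are assumptions taken from the description above:

```python
def process_video(frames, detect_hands, make_trackers,
                  max_track_frames=30, peak_floor=0.3):
    """Alternate between CNN detection and KCF tracking (flow of Fig. 6).

    detect_hands(frame)         -> list of hand boxes          (hypothetical)
    make_trackers(frame, hands) -> list of trackers whose
                                   update(frame) returns (peak_value, box)
    """
    detect_flag = True                       # DETECT_FLAG, S601
    trackers, tracked = [], 0
    for frame in frames:                     # S602: receive the i-th frame
        if detect_flag:                      # S603
            hands = detect_hands(frame)      # S604A: CNN detection
            if 0 < len(hands) < 3:           # S605A: trigger condition met
                trackers = make_trackers(frame, hands)
                detect_flag, tracked = False, 0          # S606A
        else:
            results = [t.update(frame) for t in trackers]  # S604B: tracking
            tracked += 1
            lost = any(peak < peak_floor for peak, _ in results)
            if lost or tracked >= max_track_frames:      # S605B
                detect_flag = True                       # S606B
```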
Embodiment three
Based on the same inventive concept as the preceding embodiments, Fig. 7 shows the composition of an image processing device 70 provided by an embodiment of the present invention; the device 70 includes a detection part 701 and a tracking part 702;
wherein the detection part 701 is configured to detect at least one hand object in a detection image frame of a video based on preset hand training data and a convolutional neural network (CNN) model, and obtain the description attribute values of each hand object;
the tracking part 702 is configured to, when the description attribute values of each hand object satisfy a preset trigger condition, track the hand objects in a set number of tracking image frames following the detection image frame based on preset object features and a tracking algorithm.
In a possible implementation, the description attribute values of a hand object can include: the position of the hand object in the detection image frame, the size of the hand object in the detection image frame, the class of the hand object, and the confidence of the hand object detection.
In a possible implementation, the detection part 701 is further configured to collect in advance gesture data commonly used in household scenarios, and obtain the hand training data by training on the gesture data.
In a possible implementation, the tracking part 702 is configured to:
track the hand objects in the tracking image frames based on the preset object features and tracking algorithm when the number of hand objects is less than 3;
trigger the detection part 701 to detect at least one hand object in the frame following the detection image frame and obtain the description attribute values of each hand object when the number of hand objects is greater than or equal to 3.
In a possible implementation, the tracking part 702 is configured to:
set the description attribute values of each hand object as tracking initial values, wherein the tracking initial values of each hand object include the position of the hand object in the detection image frame, the size of the hand object in the detection image frame, and the class of the hand object.
In a possible implementation, the object features include histogram of oriented gradients (HOG) features or grayscale (Gray) features; the tracking algorithm includes the kernelized correlation filter (KCF, Kernelized Correlation Filters) algorithm.
In a possible implementation, the tracking part 702 is further configured to:
set a corresponding tracking peak value peak_value for each hand object;
when at least one of the tracking peak values peak_value of the hand objects falls below a preset tracking peak lower limit, or when the number of tracked image frames exceeds the set quantity, trigger the detection part 701 to detect at least one hand object in the frame following the current tracking image frame based on the hand training data and the CNN model, and obtain the description attribute values of each hand object.
In addition, referring to Fig. 8, the image processing device 70 can also include: a construction part 703, a first optimization part 704, a second optimization part 705 and a third optimization part 706; wherein:
the construction part 703 is configured to construct a convolutional neural network, the convolutional neural network comprising at least four kinds of network layers: an image input layer, at least one convolutional layer, at least one pooling layer and at least one fully connected layer;
the first optimization part 704 is configured to reduce the number of convolution kernels in the CNN when the number of objects to be detected is less than a preset threshold;
the second optimization part 705 is configured to divide the image input by the image input layer, according to a preset boundary determination strategy, into at least one memory data segment stored in contiguous memory, and to copy each memory data segment with a preset contiguous-memory copy function;
the third optimization part 706 is configured to merge, according to a preset merging strategy, the initial parameters of a batch normalization layer with the parameters of the convolutional layer or the fully connected layer, and to use the merged parameters as the new parameters of the batch normalization layer; wherein the batch normalization layer follows the convolutional layer or the fully connected layer.
In a possible implementation, the first optimization part 704 is configured to reduce the number of convolution kernels in each layer of the CNN to 100 or fewer when performing single-class or few-class detection.
In a possible implementation, the second optimization part 705 is configured to copy all memory data segments stored in contiguous memory using the memory copy function memcpy.
In a possible implementation, the third optimization part 706 is configured to:
merge the mean parameter of the batch normalization layer with the bias parameter of the convolutional layer or the fully connected layer according to a first merging strategy, obtaining a first merging result;
based on the first merging result, merge the variance parameter of the batch normalization layer with the bias parameter and the weight parameter of the convolutional layer or the fully connected layer according to a second merging strategy.
Based on the above implementation, the third optimization part 706 is specifically configured to:
merge the mean parameter of the batch normalization layer with the bias parameter of the convolutional layer or the fully connected layer based on a first expression and a second expression, obtaining the first merging result;
wherein the first expression is Y = WX + bias, and the second expression is Yb = gamma × (Y − mean) / sqrt(variance) + beta, where X is the input of the convolutional layer or the fully connected layer, Y is the output of the convolutional layer or the fully connected layer, Yb is the output of the batch normalization layer, variance, mean, beta and gamma are the parameters of the batch normalization layer, W is the weight parameter of the convolutional layer or the fully connected layer, and bias is the bias parameter of the convolutional layer or the fully connected layer;
the first merging result is shown in formula 2:
Yb = gamma × (WX + bias − mean) / sqrt(variance) + beta    (formula 2)
Based on the above implementation, the third optimization part 706 is specifically configured to:
set W′ = gamma × W / sqrt(variance) and bias′ = gamma × (bias − mean) / sqrt(variance) + beta, and simplify the first merging result according to W′ and bias′, obtaining Yb = W′X + bias′;
save W′ and bias′ as the model parameters of the convolutional layer or the fully connected layer, so that no batch normalization computation is needed in new forward passes, saving computation time.
In a possible implementation, the image processing device 70 further includes a fourth optimization part 707, configured to move some computation steps of the convolutional layer after the computation step of the pooling layer.
In a possible implementation, the image processing device 70 further includes a fifth optimization part 708, configured to:
derive, based on a Layer class, the output data Blob of each network layer in the CNN;
establish linking relationships between different Blobs in a data-flow manner, so as to build the CNN.
In a possible implementation, the image processing device 70 further includes a sixth optimization part 709, configured to:
allocate, in a single pass, the memory space needed by each network layer according to the memory size required by each layer;
obtain all data directly from the allocated memory space;
when data is no longer used, let subsequent data overwrite the memory space occupied by the unused data, without repeatedly allocating and releasing memory. This reduces both the computation time required for repeated allocation and release and the amount of memory that needs to be allocated.
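A minimal sketch of the one-shot allocation idea in NumPy terms (the sizes and the ping-pong scheme are assumptions for illustration; the embodiment implements this in C++): every layer's buffer is carved out of memory allocated once up front, and a region whose contents are no longer used is simply overwritten by a later layer instead of being freed and reallocated:

```python
import numpy as np

# sizes (in floats) each network layer needs, known once from the model definition
layer_sizes = [4096, 2048, 1024]

# one up-front allocation covers every layer: two ping-pong regions of the
# largest size ever needed, instead of a malloc/free per layer per frame
buf = [np.empty(max(layer_sizes), dtype=np.float32) for _ in range(2)]

def run_layers(x, ops):
    n = x.size
    buf[0][:n] = x
    for i, (op, out_n) in enumerate(zip(ops, layer_sizes[1:])):
        src, dst = buf[i % 2][:n], buf[(i + 1) % 2][:out_n]
        op(src, out=dst)        # the layer writes over memory no longer in use
        n = out_n
    return buf[len(ops) % 2][:n]

# toy "layers": each halves the data by strided copy (stands in for real ops)
halve = lambda src, out: np.copyto(out, src[::2])
print(run_layers(np.arange(4096, dtype=np.float32), [halve, halve]).size)
```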
It should be understood that in this embodiment, a "part" can be a partial circuit, a partial processor, a partial program or software, and so on; it can also be a unit, and it can be modular or non-modular.
In addition, the components of this embodiment can be integrated into one processing unit, or each unit can exist physically on its own, or two or more units can be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software function module.
If the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment, in essence or in the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform all or part of the steps of the method of this embodiment. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc.
Accordingly, this embodiment provides a computer-readable medium storing an image processing program which, when executed by at least one processor, implements the steps of the method described in embodiment one above.
Based on the composition of the above image processing device 70 and the computer-readable medium, Fig. 9 shows a specific hardware structure of the image processing device 70 provided by an embodiment of the present invention, which can include: a shooting apparatus 901, a memory 902 and a processor 903; the components are coupled together by a bus system 904. It should be understood that the bus system 904 is used to implement connection and communication between these components. Besides a data bus, the bus system 904 also includes a power bus, a control bus and a status signal bus; but for the sake of clarity, all buses are labelled as the bus system 904 in Fig. 9. The shooting apparatus 901 is configured to capture video.
The memory 902 is configured to store a computer program runnable on the processor 903;
the processor 903 is configured to perform the following steps when running the computer program:
detecting at least one hand object in a detection image frame of a video based on preset hand training data and a convolutional neural network (CNN) model, and obtaining description attribute values of each hand object;
when the description attribute values of each hand object satisfy a preset trigger condition, tracking the hand objects in a set number of tracking image frames following the detection image frame based on preset object features and a tracking algorithm.
It can be understood that the memory 902 in the embodiment of the present invention may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), used as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as static RAM (Static RAM, SRAM), dynamic RAM (Dynamic RAM, DRAM), synchronous dynamic RAM (Synchronous DRAM, SDRAM), double data rate synchronous dynamic RAM (Double Data Rate SDRAM, DDRSDRAM), enhanced synchronous dynamic RAM (Enhanced SDRAM, ESDRAM), synchlink dynamic RAM (Synchlink DRAM, SLDRAM) and direct rambus RAM (Direct Rambus RAM, DRRAM). The memory 902 of the systems and methods described herein is intended to include, without being limited to, these and any other suitable types of memory.
The processor 903 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method can be completed by integrated logic circuits of hardware in the processor 903 or by instructions in the form of software. The above processor 903 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and can implement or execute the methods, steps and block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed with reference to the embodiments of the present invention can be embodied directly as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module can be located in a storage medium mature in this field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory 902, and the processor 903 reads the information in the memory 902 and completes the steps of the above method in combination with its hardware.
It can be understood that the embodiments described herein can be implemented with hardware, software, firmware, middleware, microcode or a combination thereof. For a hardware implementation, the processing unit can be implemented in one or more application-specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processing, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein can be implemented by modules (such as processes or functions) that perform the functions described herein. The software code can be stored in a memory and executed by a processor. The memory can be implemented inside the processor or outside the processor.
Specifically, the processor 903 in the image processing device 70 is further configured to perform the method steps described in embodiment one above when running the computer program, which is not repeated here.
Those skilled in the art should understand that the embodiments of the present invention can be provided as a method, a system or a computer program product. Therefore, the present invention can take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the present invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention.

Claims (18)

  1. An image processing method, characterized in that the method comprises:
    detecting at least one hand object in a detection image frame of a video based on preset hand training data and a convolutional neural network (CNN) model, and obtaining description attribute values of each hand object;
    when the description attribute values of each hand object satisfy a preset trigger condition, tracking the hand objects in a set number of tracking image frames following the detection image frame based on preset object features and a tracking algorithm.
  2. The method according to claim 1, characterized in that the description attribute values of the hand object include: the position of the hand object in the detection image frame, the size of the hand object in the detection image frame, the class of the hand object, and the confidence of the hand object detection.
  3. The method according to claim 1, characterized in that the method further comprises: collecting in advance gesture data commonly used in household scenarios, and obtaining the hand training data by training on the gesture data.
  4. The method according to claim 1, characterized in that tracking the hand objects in the set number of tracking image frames following the detection image frame based on the preset object features and tracking algorithm, when the description attribute values of each hand object satisfy the preset trigger condition, comprises:
    when the number of hand objects is less than a set number, tracking the hand objects in the tracking image frames based on the preset object features and the tracking algorithm;
    when the number of hand objects is greater than or equal to the set number, detecting at least one hand object in the frame following the detection image frame based on the hand training data and the CNN model, and obtaining the description attribute values of each hand object.
  5. The method according to claim 1, characterized in that, when the description attribute values of each hand object satisfy the preset trigger condition, the method further comprises:
    setting the description attribute values of each hand object as tracking initial values, wherein the tracking initial values of each hand object include the position of the hand object in the detection image frame, the size of the hand object in the detection image frame, and the class of the hand object.
  6. The method according to claim 1, characterized in that the object features include histogram of oriented gradients (HOG) features or grayscale (Gray) features; the tracking algorithm includes the kernelized correlation filter (KCF, Kernelized Correlation Filters) algorithm.
  7. The method according to claim 1, characterized in that the method further comprises:
    setting a corresponding tracking peak value peak_value for each hand object;
    when at least one of the tracking peak values peak_value of the hand objects falls below a preset tracking peak lower limit, or when the number of tracked image frames exceeds the set quantity, detecting at least one hand object in the frame following the current tracking image frame based on the hand training data and the CNN model, and obtaining the description attribute values of each hand object.
  8. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    building a convolutional neural network, the convolutional neural network comprising at least four network layers: an image input layer, at least one convolutional layer, at least one pooling layer and at least one fully connected layer;
    when the number of objects to be detected is less than a preset threshold, reducing the number of convolution kernels in the CNN;
    dividing the image input by the image input layer into at least one memory data segment using contiguous storage, according to a set edge determination strategy, and copying each memory data segment using a set contiguous-memory copy function;
    merging, according to a set merging strategy, the initial parameters of a batch normalization layer with the parameters of the convolutional layer or the fully connected layer, and using the merged parameters as the new parameters of the batch normalization layer; wherein the batch normalization layer follows the convolutional layer or the fully connected layer.
  9. The method according to claim 8, characterized in that reducing the number of convolution kernels in the CNN when the number of objects to be detected is less than the preset threshold comprises:
    when performing single-class or few-class detection, reducing the number of convolution kernels in each layer of the CNN to no more than 100.
  10. The method according to claim 8, characterized in that copying each memory data segment using the set contiguous-memory copy function comprises:
    copying all memory data segments using contiguous storage with the memory copy function memcpy.
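    A sketch of this bulk-copy step using the standard C library function memcpy that the claim names. The row-wise segmentation and the image dimensions are illustrative assumptions, not the claimed edge determination strategy.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

int main() {
    const int width = 320, height = 240, channels = 3;
    const std::size_t rowBytes =
        static_cast<std::size_t>(width) * channels;
    std::vector<std::uint8_t> src(rowBytes * height, 128);
    std::vector<std::uint8_t> dst(rowBytes * height);

    // Treat each image row as one contiguously stored data segment and
    // copy it in a single call instead of element by element.
    for (int y = 0; y < height; ++y)
        std::memcpy(dst.data() + y * rowBytes,
                    src.data() + y * rowBytes, rowBytes);

    // If the whole image occupies one contiguous block, one call suffices.
    std::memcpy(dst.data(), src.data(), src.size());
    return 0;
}
```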
  11. The method according to claim 8, characterized in that merging the initial parameters of the batch normalization layer with the parameters of the convolutional layer or the fully connected layer according to the set merging strategy, and using the merged parameters as the new parameters of the batch normalization layer, comprises:
    merging the mean parameter of the batch normalization layer with the bias parameter of the convolutional layer or the fully connected layer according to a first merging strategy, obtaining a first merging result;
    based on the first merging result, merging the variance parameter of the batch normalization layer with the bias parameter and the weight parameter of the convolutional layer or the fully connected layer according to a second merging strategy.
  12. The method according to claim 11, characterized in that merging the mean parameter of the batch normalization layer with the bias parameter of the convolutional layer or the fully connected layer according to the first merging strategy, obtaining the first merging result, comprises:
    merging the mean parameter of the batch normalization layer with the bias parameter of the convolutional layer or the fully connected layer based on a first expression and a second expression, obtaining the first merging result;
    wherein the first expression is Y = W*X + bias, and the second expression is Yb = gamma * (Y - mean) / sqrt(variance) + beta; X is the input of the convolutional layer or the fully connected layer, Y is the output of the convolutional layer or the fully connected layer, Yb is the output of the batch normalization layer, and the variance variance, the mean mean, beta and gamma are parameters of the batch normalization layer; W is the weight parameter of the convolutional layer or the fully connected layer, and bias is the bias parameter of the convolutional layer or the fully connected layer;
    the first merging result is as shown in Equation 1:
    Yb = gamma * (W*X + bias1) / sqrt(variance) + beta        (Equation 1)
    wherein bias1 = bias - mean.
  13. The method according to claim 12, characterized in that, based on the first merging result, merging the variance parameter of the batch normalization layer with the bias parameter and the weight parameter of the convolutional layer or the fully connected layer according to the second merging strategy comprises:
    setting W' = gamma * W / sqrt(variance) and bias' = gamma * (bias - mean) / sqrt(variance) + beta, and simplifying the first merging result according to W' and bias', obtaining Yb = W'X + bias';
    saving W' and bias' as the model parameters of the convolutional layer or the fully connected layer.
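    The merge in claims 12 and 13 can be checked numerically. The sketch below assumes the standard batch-normalization transform Yb = gamma*(Y - mean)/sqrt(variance) + beta (the small epsilon usually added to the variance is omitted, as in the claim text); the values are arbitrary, and the assertion verifies that the folded form Yb = W'X + bias' reproduces the unfolded output.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

int main() {
    // One-channel example of Y = W*X + bias followed by batch norm.
    const double W = 0.5, bias = 0.2, X = 3.0;
    const double mean = 0.8, variance = 4.0, gamma = 1.5, beta = -0.1;

    const double Y  = W * X + bias;
    const double Yb = gamma * (Y - mean) / std::sqrt(variance) + beta;

    // Folded parameters per claim 13.
    const double Wf = gamma * W / std::sqrt(variance);
    const double bf = gamma * (bias - mean) / std::sqrt(variance) + beta;
    const double Yf = Wf * X + bf;

    assert(std::fabs(Yb - Yf) < 1e-12);
    std::printf("BN output %.6f == folded output %.6f\n", Yb, Yf);
    return 0;
}
```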
  14. The method according to claim 8, characterized in that the method further comprises:
    moving part of the computation steps of the convolutional layer to after the computation steps of the pooling layer.
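    One concrete case in which this reordering is exact (an illustration, not necessarily the patent's full strategy): a per-channel bias addition commutes with max pooling, because max(x_i) + b == max(x_i + b), so adding the bias after a 2x2 pooling layer touches a quarter as many elements. The assertion below checks the equality on sample data.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// 2x2 max pooling over an h x w single-channel map (h and w even).
std::vector<float> maxPool2x2(const std::vector<float>& in, int h, int w) {
    std::vector<float> out((h / 2) * (w / 2));
    for (int y = 0; y < h; y += 2)
        for (int x = 0; x < w; x += 2)
            out[(y / 2) * (w / 2) + x / 2] =
                std::max({in[y * w + x], in[y * w + x + 1],
                          in[(y + 1) * w + x], in[(y + 1) * w + x + 1]});
    return out;
}

int main() {
    const int h = 4, w = 4;
    const std::vector<float> conv{1, 5, 2, 0, 3, 4, 7, 1,
                                  0, 2, 6, 8, 9, 1, 3, 2};
    const float bias = 0.5f;

    std::vector<float> before = conv;   // bias added before pooling: 16 adds
    for (auto& v : before) v += bias;
    before = maxPool2x2(before, h, w);

    std::vector<float> after = maxPool2x2(conv, h, w);
    for (auto& v : after) v += bias;    // bias added after pooling: 4 adds

    assert(before == after);
    return 0;
}
```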
  15. The method according to claim 8, characterized in that the method further comprises:
    deriving, based on a layer (Layer) class, classes that hold the output data blob (Blob) of each network layer in the CNN;
    establishing the linking relationships between different Blobs in a data-flow manner to build the CNN.
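    A sketch of this construction with hypothetical class names (the patent does not publish its class definitions): each layer derives from a Layer base class that owns its output Blob, and the network is assembled by feeding each layer's output Blob to the next layer, i.e. by data flow rather than by a fixed index.

```cpp
#include <vector>

struct Blob {                        // holds one network layer's output
    std::vector<float> data;
    int n = 1, c = 1, h = 1, w = 1;  // shape (batch, channels, height, width)
};

class Layer {
public:
    virtual ~Layer() = default;
    virtual void forward(const Blob& input) = 0;
    Blob& output() { return output_; }
protected:
    Blob output_;                    // derived layers write their result here
};

class ReluLayer : public Layer {     // stand-in for conv/pool/fc layers
public:
    void forward(const Blob& input) override {
        output_ = input;
        for (auto& v : output_.data) v = v > 0.0f ? v : 0.0f;
    }
};

int main() {
    Blob image;                      // output of the image input layer
    image.data = {-1.0f, 2.0f, -3.0f, 4.0f};
    image.w = 4;

    ReluLayer l1, l2;
    l1.forward(image);               // l1 consumes the input Blob...
    l2.forward(l1.output());         // ...and l2 consumes l1's output Blob
    return l2.output().data.size() == 4 ? 0 : 1;
}
```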
  16. The method according to claim 8, characterized in that, when building the CNN, the method further comprises:
    allocating the memory space required by each network layer in a single pass, according to the memory size required by each network layer;
    obtaining all data directly from the allocated memory space;
    when data is no longer used, allowing the memory space occupied by the unused data to be overwritten by subsequent data.
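    Illustrative sketch of this one-shot allocation scheme (buffer sizes and the ping-pong layout are assumptions): memory for all layers is allocated once up front, every layer reads and writes directly inside that space, and buffers whose data is no longer needed are overwritten by the outputs of subsequent layers.

```cpp
#include <cstddef>
#include <vector>

int main() {
    const std::size_t maxLayerBytes = 1 << 20;  // largest layer's requirement
    std::vector<unsigned char> arena(2 * maxLayerBytes);  // single allocation

    unsigned char* bufA = arena.data();
    unsigned char* bufB = arena.data() + maxLayerBytes;

    const int numLayers = 5;
    for (int layer = 0; layer < numLayers; ++layer) {
        unsigned char* in  = (layer % 2 == 0) ? bufA : bufB;
        unsigned char* out = (layer % 2 == 0) ? bufB : bufA;
        // Run the layer: read from `in`, write to `out`. On the next
        // iteration the old input buffer (its data now unused) is reused
        // as the output buffer and overwritten.
        out[0] = static_cast<unsigned char>(in[0] + 1);
    }
    return 0;
}
```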
  17. An image processing device, characterized in that the device comprises a camera, a memory and a processor, wherein:
    the camera is configured to capture video;
    the memory is configured to store a computer program runnable on the processor;
    the processor is configured to perform the steps of the method according to any one of claims 1 to 16 when running the computer program.
  18. A computer-readable medium, storing an image processing program which, when executed by at least one processor, implements the steps of the method according to any one of claims 1 to 16.
CN201711437662.5A 2017-12-26 2017-12-26 Image processing method, device and storage medium Active CN108229360B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711437662.5A CN108229360B (en) 2017-12-26 2017-12-26 Image processing method, device and storage medium


Publications (2)

Publication Number Publication Date
CN108229360A true CN108229360A (en) 2018-06-29
CN108229360B CN108229360B (en) 2021-03-19

Family

ID=62648133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711437662.5A Active CN108229360B (en) 2017-12-26 2017-12-26 Image processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN108229360B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010121031A1 * 2009-04-16 2010-10-21 Neonode Inc. Optical touch screen systems using reflected light
CN105279484A * 2015-10-10 2016-01-27 北京旷视科技有限公司 Method and device for object detection
CN106920251A * 2016-06-23 2017-07-04 阿里巴巴集团控股有限公司 Human hand detection and tracking method and device
CN106557743A * 2016-10-26 2017-04-05 桂林电子科技大学 Facial feature extraction system and method based on FECNN
CN107066990A * 2017-05-04 2017-08-18 厦门美图之家科技有限公司 Target tracking method and mobile device

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
SERGEY IOFFE ET AL.: "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", arXiv *
刘旷: "Facial expression recognition method based on convolutional network ensembles", China Master's Theses Full-text Database, Information Science and Technology *
吴杰: "Research on gesture recognition based on deep learning", China Master's Theses Full-text Database, Information Science and Technology *
姚钦文: "Research on vehicle face recognition based on convolutional neural networks", China Master's Theses Full-text Database, Information Science and Technology *
朱威 et al.: "Image classification algorithm of straight-through convolutional neural networks combined with batch normalization", Journal of Computer-Aided Design & Computer Graphics *
杨眷玉: "Research and implementation of object recognition based on convolutional neural networks", China Master's Theses Full-text Database, Information Science and Technology *
甘晓楠: "Research on convolutional neural networks in traffic sign recognition", Wanfang Data Knowledge Service Platform *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359239A (en) * 2018-09-06 2019-02-19 安徽华米信息科技有限公司 Picture recommendation method and device
CN109550249A * 2018-11-28 2019-04-02 腾讯科技(深圳)有限公司 Target object control method and related apparatus
US11351458B2 (en) 2018-11-28 2022-06-07 Tencent Technology (Shenzhen) Company Limited Method for controlling target object, apparatus, device, and storage medium
CN109550249B (en) * 2018-11-28 2022-04-29 腾讯科技(深圳)有限公司 Target object control method, device and equipment
CN109902588B (en) * 2019-01-29 2021-08-20 北京奇艺世纪科技有限公司 Gesture recognition method and device and computer readable storage medium
CN109902588A * 2019-01-29 2019-06-18 北京奇艺世纪科技有限公司 Gesture recognition method and device, and computer-readable storage medium
CN110008818A * 2019-01-29 2019-07-12 北京奇艺世纪科技有限公司 Gesture recognition method and device, and computer-readable storage medium
CN110188795A (en) * 2019-04-24 2019-08-30 华为技术有限公司 Image classification method, data processing method and device
CN110188795B (en) * 2019-04-24 2023-05-09 华为技术有限公司 Image classification method, data processing method and device
CN110222760A * 2019-06-04 2019-09-10 东南大学 Fast image processing method based on the Winograd algorithm
CN110222760B * 2019-06-04 2023-05-23 东南大学 Fast image processing method based on the Winograd algorithm
CN110609684A * 2019-09-18 2019-12-24 四川长虹电器股份有限公司 Method for converting video into character animation under the Spring Boot framework
CN110609684B * 2019-09-18 2022-08-19 四川长虹电器股份有限公司 Method for converting video into character animation under the Spring Boot framework
CN111091488A (en) * 2019-12-10 2020-05-01 河北万方中天科技有限公司 Memory management method, device and terminal based on OpenCV
CN111091488B (en) * 2019-12-10 2023-06-23 河北元然昀略科技有限公司 OpenCV-based memory management method, device and terminal
CN116777727A (en) * 2023-06-21 2023-09-19 北京忆元科技有限公司 Integrated memory chip, image processing method, electronic device and storage medium
CN116777727B (en) * 2023-06-21 2024-01-09 北京忆元科技有限公司 Integrated memory chip, image processing method, electronic device and storage medium

Also Published As

Publication number Publication date
CN108229360B (en) 2021-03-19

Similar Documents

Publication Publication Date Title
CN108229360A (en) A kind of method of image procossing, equipment and storage medium
CN109670558B (en) Digital image completion using deep learning
JP7059318B2 (en) Learning data generation method and system for classifier learning with regional characteristics
CN109118491A Image segmentation method, system and electronic device based on deep learning
CN107820020A Shooting parameter adjustment method, device, storage medium and mobile terminal
CN110276780A Multi-object tracking method, device, electronic equipment and storage medium
CN109598250B Feature extraction method, device, electronic equipment and computer-readable medium
US20210027514A1 (en) Method and system for creating animal type avatar using human face
CN111126347B (en) Human eye state identification method, device, terminal and readable storage medium
US20210272292A1 (en) Detection of moment of perception
CN113238972B (en) Image detection method, device, equipment and storage medium
CN111310724A (en) In-vivo detection method and device based on deep learning, storage medium and equipment
Spizhevoi et al. OpenCV 3 Computer Vision with Python Cookbook: Leverage the power of OpenCV 3 and Python to build computer vision applications
Xie et al. Deepmatcher: a deep transformer-based network for robust and accurate local feature matching
CN113160231A (en) Sample generation method, sample generation device and electronic equipment
CN108648213A Implementation method of the KCF tracking algorithm on TMS320C6657
Yuan et al. Optical flow training under limited label budget via active learning
CN112528978B (en) Face key point detection method and device, electronic equipment and storage medium
CN111967529B (en) Identification method, device, equipment and system
Beaini et al. Deep green function convolution for improving saliency in convolutional neural networks
CN110070017A Face artificial eye image generation method and device
Shi et al. Focus for free in density-based counting
KR102348368B1 (en) Device, method, system and computer readable storage medium for generating training data of machine learing model and generating fake image using machine learning model
CN110910478B (en) GIF map generation method and device, electronic equipment and storage medium
CN112822425B (en) Method and equipment for generating high dynamic range image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant