CN106919918A

CN106919918A - Face tracking method and device

Info

Publication number: CN106919918A
Application number: CN201710108748.7A
Authority: CN
Inventors: 赵凌; 李季檩
Original assignee: Tencent Technology Shanghai Co Ltd
Current assignee: Tencent Technology Shanghai Co Ltd
Priority date: 2017-02-27
Filing date: 2017-02-27
Publication date: 2017-07-04
Anticipated expiration: 2037-02-27
Also published as: CN106919918B; WO2018153294A1

Abstract

Embodiments of the invention disclose a face tracking method and device. When face tracking needs to be performed on a video stream, a corresponding deep learning network model is obtained and memory resources are allocated for the network model such that all layers of the network model share the same storage space; the video stream is then processed based on the allocated memory resources and the network model to track faces in real time. Because all layers of the network model share the same storage space in this scheme, there is no need to allocate a separate storage space for each layer of the network model. This not only reduces memory usage and improves computational efficiency, but also reduces memory fragmentation and improves application performance.

Description

Face tracking method and device

Technical field

The present invention relates to the field of communication technologies, and in particular to a face tracking method and device.

Background

In recent years, face tracking technology has developed significantly. In many fields, such as surveillance, video conferencing and distance teaching, a specific face needs to be tracked and analyzed.

Various face tracking techniques exist in the prior art, and deep learning forward prediction is one of them. In deep learning forward prediction, different network models need to be built for different application fields, and the number of layers of a network model varies with the complexity of the problem to be solved; for example, a more complex problem generally requires a deeper network model. On a personal computer (PC), each layer of the network model needs exclusive use of a storage region, and the storage region can be configured through a configuration file. For example, when storage resources are allocated, the storage size of the current layer can be calculated by reading the configuration file, and storage space is then allocated for the current layer. The storage region of each layer is allocated independently, and no memory is shared between the storage regions of different layers.

In the course of researching and practicing the prior art, the inventors of the present invention found that, because each layer of the network model needs exclusive use of a storage region in the existing scheme, the total memory required is large; on a platform with limited storage this degrades computing performance and may even prevent the algorithm from running. Moreover, because the allocation operation has to be performed many times, memory fragmentation forms easily, which degrades application performance.

Summary of the invention

Embodiments of the present invention provide a face tracking method and device that not only reduce memory usage and improve computational efficiency, but also reduce memory fragmentation and improve application performance.

An embodiment of the present invention provides a face tracking method, including:

obtaining a video stream on which face tracking needs to be performed and a deep learning network model;

allocating memory resources for the network model such that all layers of the network model share the same storage space; and

tracking a face in the video stream based on the allocated memory resources and the network model.

Correspondingly, an embodiment of the present invention further provides a face tracking device, including:

an acquiring unit, configured to obtain a video stream on which face tracking needs to be performed and a deep learning network model;

an allocation unit, configured to allocate memory resources for the network model such that all layers of the network model share the same storage space; and

a tracking unit, configured to track a face in the video stream based on the allocated memory resources and the network model.

When deep learning needs to be applied to a video stream in order to perform face tracking, the embodiments of the present invention obtain a corresponding deep learning network model and allocate memory resources for it such that all layers of the network model share the same storage space, and then process the video stream based on the allocated memory resources and the network model to track the face in real time. Because all layers of the network model share the same storage space in this scheme, there is no need to allocate a separate storage space for each layer. This greatly reduces memory usage and improves computational efficiency; moreover, because only one allocation is needed, the number of allocation operations is greatly reduced, which reduces memory fragmentation and helps improve application performance.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without creative effort.

Fig. 1a is a schematic diagram of a scenario of the face tracking method provided by an embodiment of the present invention;

Fig. 1b is a flowchart of the face tracking method provided by an embodiment of the present invention;

Fig. 1c is a schematic diagram of memory allocation in the face tracking method provided by an embodiment of the present invention;

Fig. 1d is a schematic diagram of the use of memory space in the face tracking method provided by an embodiment of the present invention;

Fig. 2a is another flowchart of the face tracking method provided by an embodiment of the present invention;

Fig. 2b is a schematic diagram of the layers of the network model in the face tracking method provided by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of the face tracking device provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a terminal provided by an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention provides a face tracking method and device.

The face tracking device may be integrated in a device such as a mobile terminal. For example, referring to Fig. 1a, when face tracking needs to be performed on a video stream, the mobile terminal may obtain a corresponding deep learning network model and allocate memory resources for the network model in a single allocation, such that all layers of the network model share the same storage space. For instance, the storage space required by each layer of the network model may be calculated, the maximum of these values selected as the size of the pre-allocated storage space, and memory resources allocated for the network model accordingly. After the memory resources are allocated, the face in the video stream can be tracked based on the allocated memory resources and the network model, thereby reducing memory usage, reducing memory fragmentation and improving computational efficiency.

Detailed descriptions are given below. It should be noted that the numbering of the following embodiments does not imply an order of preference among them.

Embodiment one

This embodiment is described from the perspective of the face tracking device. The face tracking device may be integrated in a device such as a mobile terminal, and the mobile terminal may include a mobile phone, a tablet computer, a smart wearable device or the like.

A face tracking method includes: obtaining a video stream on which face tracking needs to be performed and a deep learning network model; allocating memory resources for the network model such that all layers of the network model share the same storage space; and tracking a face in the video stream based on the allocated memory resources and the network model.

As shown in Fig. 1b, the specific flow of the face tracking method may be as follows:

101. Obtain a video stream on which face tracking needs to be performed and a deep learning network model.

For example, the video stream and the deep learning network model may be obtained locally or from another storage device.

The network model may be configured according to the requirements of the actual application, and details are not described here.

102. Allocate memory resources for the network model such that all layers of the network model share the same storage space. For example, this may be done as follows:

(1) Calculate the storage space required by each layer of the network model.

For example, the configuration file of the network model may be obtained, and the storage space required by each layer of the network model calculated according to the configuration file. Specifically, this may be done as follows:

First, the configuration file of the network model is read. Next, the number of parameters of each network layer is calculated according to the configuration file, yielding the sizes of the input (Bottom) Blob, the output (Top) Blob and the temporarily allocated Blob (temporary Blob) that each layer needs. Finally, the storage space required by the layer is calculated from the Bottom Blob, the Top Blob and the temporary Blob, that is, the size of the A+B+C region shown in Fig. 1c. Area A can serve as the input area of the current layer and also as the output area of the previous or next layer; area B is the temporary area of the current layer; area C can serve as the output area of the current layer and also as the input area of the previous or next layer.

It should be noted that, for ease of description, in the embodiments of the present invention the Bottom Blob is referred to as the input area, the Top Blob as the output area, and the temporary Blob as the temporary area. A Blob is the name of the storage unit of a deep network model; it is a four-dimensional matrix and includes the size of each dimension of the matrix.

(2) Use the maximum of the storage spaces required by the layers as the size of the pre-allocated storage space.

For example, for a six-layer network model, if the fifth layer requires the most storage space, the storage space required by the fifth layer is used as the size of the pre-allocated storage space, and so on.

(3) Allocate memory resources for the network model according to the size of the pre-allocated storage space.

Memory resources of the pre-allocated size need to be allocated for the network model only once; no other space needs to be allocated during the forward computation.
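Sub-steps (1) to (3) can be illustrated with a minimal Python sketch. The layer names, the blob shapes and the 4-byte element size below are assumptions made purely for illustration; in the described scheme these sizes would be derived from the network model's configuration file.

```python
from math import prod

BYTES_PER_VALUE = 4  # assumption: each value is a 32-bit float

# Hypothetical per-layer blob shapes (num, channels, height, width),
# standing in for what would be derived from the configuration file.
LAYERS = [
    {"name": "conv1", "bottom": (1, 3, 64, 64), "temp": (1, 27, 62, 62), "top": (1, 16, 62, 62)},
    {"name": "pool1", "bottom": (1, 16, 62, 62), "temp": (0, 0, 0, 0), "top": (1, 16, 31, 31)},
    {"name": "ip1", "bottom": (1, 16, 31, 31), "temp": (0, 0, 0, 0), "top": (1, 128, 1, 1)},
]


def blob_bytes(shape):
    """Size in bytes of one four-dimensional blob."""
    return prod(shape) * BYTES_PER_VALUE


def layer_bytes(layer):
    """Sub-step (1): A (input area) + B (temporary area) + C (output area)."""
    return sum(blob_bytes(layer[key]) for key in ("bottom", "temp", "top"))


def preallocated_size(layers):
    """Sub-step (2): the largest per-layer requirement."""
    return max(layer_bytes(layer) for layer in layers)


# Sub-step (3): a single allocation that every layer will share.
shared_memory = bytearray(preallocated_size(LAYERS))
```

In this toy configuration the first layer has the largest requirement, so it alone determines the size of the shared block.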

The memory allocation process of the forward computation may be as shown in Fig. 1d. Assume that area A currently stores the input area (Bottom Blob) data of layer n, area B stores the temporary data required by the current layer, and area C stores the computed output area (Top Blob) data. After layer n has computed its output, the output area (Top Blob) pointer is assigned to the input area (Bottom Blob) of layer n+1, and the input area pointer of area A is assigned to the output area (Top Blob) of layer n+1 to store the output of layer n+1; area B is likewise used to hold the temporary data of layer n+1. Repeating this completes the computation of the whole feedforward network. The process requires no other data copying or transfer, and even the pointer assignments can be completed in the preprocessing stage.
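The pointer exchange between areas A and C can be sketched in Python, where swapping two list references plays the role of the pointer assignment. The toy layers and the `capacity` parameter are hypothetical, and the sketch assumes each area is large enough for the largest blob it will ever hold.

```python
def forward(layers, input_values, capacity):
    """Run all layers inside three areas that are allocated exactly once."""
    area_a = [0.0] * capacity  # starts as the input area of the first layer
    area_b = [0.0] * capacity  # temporary area, reused by every layer
    area_c = [0.0] * capacity  # starts as the output area of the first layer

    area_a[:len(input_values)] = input_values
    bottom, top, count = area_a, area_c, len(input_values)
    for layer in layers:
        # Each layer reads `count` values from bottom, may use area_b as
        # scratch space, writes into top and returns its output count.
        count = layer(bottom, area_b, top, count)
        bottom, top = top, bottom  # swap references only, no data is copied
    return bottom[:count]


def scale_by_two(bottom, temp, top, count):
    """Toy layer: doubles every input value."""
    for i in range(count):
        top[i] = 2.0 * bottom[i]
    return count


def pairwise_sum(bottom, temp, top, count):
    """Toy layer: stages the input in the temporary area, then sums pairs."""
    for i in range(count):
        temp[i] = bottom[i]
    half = count // 2
    for i in range(half):
        top[i] = temp[2 * i] + temp[2 * i + 1]
    return half


result = forward([scale_by_two, pairwise_sum, scale_by_two], [1.0, 2.0, 3.0, 4.0], capacity=4)
```

Because the output of one layer becomes the input of the next by reference, the loop never allocates or copies a buffer, whatever the number of layers.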

103. Track the face in the video stream based on the allocated memory resources and the network model. For example, this may be done as follows:

(1) Determine, from the video stream, the image that currently needs to be processed, to obtain the current frame.

(2) Obtain the face key point coordinates and confidence of the previous frame image of the current frame.

Face key points are information that reflects facial features, such as the eyes, eyebrows, nose, mouth and facial contour. Face key point coordinates are the coordinates of these face key points. The coordinates can be represented by an array, for example (x₁, y₁, x₂, y₂, ..., x_n, y_n), where (x_i, y_i) is the coordinate of the i-th point.
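As a small illustration of the array layout (x₁, y₁, x₂, y₂, ..., x_n, y_n), the following Python sketch splits such a flat array into (x_i, y_i) pairs. The function name and the sample values are hypothetical.

```python
def to_points(flat):
    """Split a flat [x1, y1, x2, y2, ...] array into (x, y) pairs."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of values")
    return list(zip(flat[0::2], flat[1::2]))


points = to_points([10, 20, 30, 40, 50, 60])
```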

(3) Predict the face key point coordinates and confidence of the current frame based on the allocated memory resources, the network model, and the face key point coordinates and confidence of the previous frame image, and return to the step of determining, from the video stream, the image that currently needs to be processed, until all images in the video stream have been processed.

The face key point coordinates and confidence of the current frame can be predicted in various ways based on the allocated memory resources, the network model, and the face key point coordinates and confidence of the previous frame image. For example:

When the confidence of the previous frame image is determined to be higher than a preset threshold, the allocated memory resources are used to compute on the face key point coordinates of the previous frame image through the network model to obtain a calculation result; the face key point coordinates of the current frame are predicted according to the calculation result, and the confidence of the current frame is calculated.

The preset threshold may be set according to the requirements of the actual application, and details are not described here.

For example, if the network model includes a public (shared) network part, a key point prediction branch and a confidence prediction branch, the step of "computing on the face key point coordinates of the previous frame image through the network model to obtain a calculation result, predicting the face key point coordinates of the current frame according to the calculation result, and calculating the confidence of the current frame" may be as follows:

The face key point coordinates of the previous frame image are computed on by the public network part to obtain a calculation result; the calculation result is processed by the key point prediction branch to obtain the face key point coordinates of the current frame, and processed by the confidence prediction branch to obtain the confidence of the current frame.
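The structure of one public network part feeding two branches can be sketched as follows. All function names and the toy arithmetic are hypothetical stand-ins; only the data flow (the shared result is computed once and passed to both branches) reflects the description above.

```python
def predict_current_frame(prev_keypoints, public_network, keypoint_branch, confidence_branch):
    """Compute the shared result once and pass it to both branches."""
    shared = public_network(prev_keypoints)
    return keypoint_branch(shared), confidence_branch(shared)


# Hypothetical stand-ins for the three parts of the network model.
calls = {"public": 0}


def toy_public_network(prev_keypoints):
    calls["public"] += 1
    return list(prev_keypoints)


def toy_keypoint_branch(shared):
    return [value + 1.0 for value in shared]


def toy_confidence_branch(shared):
    return 0.9


keypoints, confidence = predict_current_frame(
    [10.0, 20.0], toy_public_network, toy_keypoint_branch, toy_confidence_branch
)
```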

It should be noted that if the confidence (that is, the confidence of the face key point coordinates of the previous frame image) is not higher than (that is, lower than or equal to) the preset threshold, the face key point coordinates of the previous frame have little reference value, so the face key point coordinates in the current frame can be obtained by detection instead. Similarly, if the face key point coordinates and confidence of the previous frame image of the current frame cannot be obtained, for example because the current frame is the first frame of the video stream, the face key point coordinates in the current frame can also be obtained by detection. That is, optionally, after the step of "allocating memory resources for the network model", the face tracking method may further include:

When the face key point coordinates and confidence of the previous frame image of the current frame cannot be obtained, or when the confidence of the previous frame image is determined to be lower than or equal to the preset threshold, detecting the face in the current frame by a face detection algorithm based on the allocated memory resources, to determine the face key point coordinates and confidence of the current frame.

Detection can be performed in various ways. For example:

A. Determine the face region of the current frame by a face detection algorithm based on the allocated memory resources. For example:

Based on the allocated memory resources, the face features in the current frame are obtained by computing an integral image of the frame, a strong classifier that distinguishes faces from non-faces is built from the face features, and the current frame is then processed with the strong classifier to obtain the face region of the current frame.
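The integral image mentioned above can be sketched in plain Python. The sketch shows only how the integral image is built and how it yields the sum over any rectangle with four lookups; which rectangle sums are combined into face features is not specified here, and the function names and the sample image are illustrative.

```python
def integral_image(image):
    """Return a table where entry [y][x] is the sum of image[0:y][0:x]."""
    height, width = len(image), len(image[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1."""
    return table[bottom][right] - table[top][right] - table[bottom][left] + table[top][left]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
table = integral_image(image)
lower_right = rect_sum(table, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
```

Once the table is built, every rectangle sum costs the same four lookups regardless of the rectangle's size, which is what makes this representation attractive for scanning many candidate regions.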

To improve the accuracy of face detection, the AdaBoost algorithm may be used to build the strong classifiers that distinguish faces from non-faces, and the strong classifiers may be cascaded together in a single system.

AdaBoost is an iterative algorithm whose core idea is to train different classifiers (weak classifiers) on the same training set and then combine these weak classifiers into a stronger final classifier (the strong classifier).
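The combination step of AdaBoost can be sketched as a weighted vote. The sketch below assumes the weak classifiers and their weighted training errors are already known and uses the standard discrete AdaBoost weight 0.5 * ln((1 - error) / error); the iterative training and sample reweighting are not shown, and the threshold classifiers on a single scalar feature are hypothetical.

```python
import math


def classifier_weight(error):
    """Discrete AdaBoost weight for a weak classifier with 0 < error < 0.5."""
    return 0.5 * math.log((1.0 - error) / error)


def strong_classify(x, weak_classifiers):
    """Weighted vote of weak classifiers; each predicts +1 (face) or -1."""
    score = sum(classifier_weight(error) * predict(x) for predict, error in weak_classifiers)
    return 1 if score >= 0 else -1


# Hypothetical weak classifiers: thresholds on one scalar feature,
# each paired with an assumed weighted training error.
WEAK = [
    (lambda x: 1 if x > 0.3 else -1, 0.2),
    (lambda x: 1 if x > 0.6 else -1, 0.3),
    (lambda x: 1 if x > 0.5 else -1, 0.4),
]

label_low = strong_classify(0.1, WEAK)
label_mid = strong_classify(0.4, WEAK)
```

For the input 0.4 only the most accurate weak classifier votes "face", but its weight outweighs the other two combined, so the strong classifier still answers +1.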

B. Predict the positions of the facial features in the face region through the network model to obtain the face key point coordinates and confidence of the current frame.

As can be seen from the above, when face tracking needs to be performed on a video stream, this embodiment obtains a corresponding deep learning network model and allocates memory resources for it such that all layers of the network model share the same storage space, and then processes the video stream based on the allocated memory resources and the network model to track the face in real time. Because all layers of the network model share the same storage space in this scheme, there is no need to allocate a separate storage space for each layer. This greatly reduces memory usage and improves computational efficiency; moreover, because only one allocation is needed, the number of allocation operations is greatly reduced, which reduces memory fragmentation and helps improve application performance.

Embodiment two

The method described in Embodiment one is described in further detail below by way of example.

In this embodiment, the face tracking device being integrated in a mobile terminal is used as an example.

As shown in Fig. 2a, the specific flow of a face tracking method may be as follows:

201. The mobile terminal obtains a video stream.

For example, the mobile terminal may receive a video stream sent by another device, or obtain a video stream from local storage.

202. The mobile terminal obtains a deep learning network model.

The network model may be configured according to the requirements of the actual application. For example, the network model may include three parts: the first part is a public network part, and the other two are branches that follow the public network part, namely a key point prediction branch and a confidence prediction branch. The layers of each part may be determined according to requirements. For example, referring to Fig. 2b, the layers of each part may be as follows:

The public network part may include six convolution layers, namely convolution layers 1 to 6. Each convolution layer is immediately followed by a rectified linear unit (ReLU) activation function, referred to below as a nonlinear activation function, and some of the nonlinear activation functions may in turn be followed by a pooling layer, which is used for aggregation; for details, see Fig. 2b.

The key point prediction branch may include one convolution layer and three inner product layers; for example, referring to Fig. 2b, it may include convolution layer 7 and inner product layers 1, 2 and 3. Each convolution layer and inner product layer is immediately followed by a nonlinear activation function.

The confidence prediction branch may include one convolution layer (convolution layer 8), five inner product layers (inner product layers 4 to 8) and one softmax layer. The softmax layer outputs two values, the face probability and the non-face probability, which sum to 1.0. In addition, each convolution layer and each inner product layer may be followed by a nonlinear activation function.
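The property that the two softmax outputs sum to 1.0 can be checked with a short Python sketch. The two input scores are hypothetical values standing in for the output of the last inner product layer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    largest = max(scores)
    exps = [math.exp(score - largest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical raw scores for the classes (face, non-face).
face_probability, non_face_probability = softmax([2.0, 0.5])
```

Subtracting the largest score before exponentiating does not change the result but avoids overflow for large scores.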

203. The mobile terminal calculates the storage space required by each layer of the network model.

For example, the configuration file of the network model may be read and the number of parameters of each network layer calculated according to the configuration file, yielding the sizes of the input area, output area and temporary area of each layer. The storage space required by the layer, that is, the size of the A+B+C region shown in Fig. 1c, can then be calculated from the sizes of the input area, output area and temporary area. For details, see Embodiment one; they are not repeated here.

204. The mobile terminal uses the maximum of the storage spaces required by the layers as the size of the pre-allocated storage space, and allocates memory resources for the network model according to the size of the pre-allocated storage space.

The memory allocation process of the forward computation may be as shown in Fig. 1d. Assume that area A currently stores the input area data of layer n, area B stores the temporary data required by the current layer, and area C stores the computed output area data. After layer n has computed its output, the output area pointer is assigned to the input area of layer n+1, and the input area pointer of area A is assigned to the output area of layer n+1 to store the output of layer n+1; area B is likewise used to hold the temporary data of layer n+1. After layer n+1 has been processed, n is updated to n+1 and the above process is repeated until the computation of the whole feedforward network is complete. For example, with an initial value of n = 1, the process may be as follows:

After layer 1 has computed its output, the output area pointer of layer 1 (that is, area C of layer 1) is assigned to the input area of layer 2, and the input area pointer of area A is assigned to the output area of layer 2 to store the output of layer 2; area B is likewise used to hold the temporary data of layer 2. Similarly, after layer 2 has computed its output, the output area pointer of layer 2 (area C of layer 2, which is also area A of layer 1) is assigned to the input area of layer 3, and the input area pointer of layer 2 (area A of layer 2, that is, area C of layer 1) is assigned to the output area of layer 3 to store the output of layer 3; area B is likewise used to hold the temporary data of layer 3, and so on.

The process requires no other data copying or transfer, and even the pointer assignments can be completed in the preprocessing stage.

As can be seen, this computation exploits a property of deep learning: the computation of layer n+1 uses only the input area of layer n+1 (that is, the output area of layer n) and the output area of layer n+1, and no longer needs the input area of layer n, so the memory occupied by the input area of layer n can be reused. In other words, the computation of every layer is carried out in the pre-allocated "A+B+C" memory region. Therefore, however deep the network is, the storage space required depends only on the storage space of a single layer, which reduces memory usage and makes it possible to apply complex, deep networks on mobile terminal platforms. Furthermore, because the process consists only of pointer assignments in memory, it is fast and efficient.

205. The mobile terminal determines, from the video stream, the image that currently needs to be processed, to obtain the current frame.

206. The mobile terminal obtains the face key point coordinates and confidence of the previous frame image of the current frame, and then performs step 207.

Face key points are information that reflects facial features, such as the eyes, eyebrows, nose, mouth and facial contour. Face key point coordinates are the coordinates of these face key points.

It should be noted that if the face key point coordinates and confidence of the previous frame image of the current frame cannot be obtained, for example because the current frame is the first frame of the video stream, the face key point coordinates and confidence of the current frame can be obtained by detection, that is, step 208 is performed.

207. The mobile terminal determines whether the confidence of the face key point coordinates of the previous frame image is higher than a preset threshold. If so, face key point tracking has succeeded and step 209 is performed; otherwise, if the confidence is not higher than the preset threshold, face key point tracking has failed and step 208 is performed.

208. The mobile terminal detects the face in the current frame by a face detection algorithm based on the allocated memory resources, to determine the face key point coordinates and confidence of the current frame, and then performs step 210.

(1) The mobile terminal determines the face region of the current frame by a face detection algorithm based on the allocated memory resources. For example:

Based on the allocated memory resources, the mobile terminal obtains the face features in the current frame by computing an integral image of the frame, builds a strong classifier that distinguishes faces from non-faces from the face features, and then processes the current frame with the strong classifier to obtain the face region of the current frame.

To improve the accuracy of face detection, the AdaBoost algorithm may be used to build the strong classifiers that distinguish faces from non-faces, and the strong classifiers may be cascaded together in a single system.

(2) The mobile terminal predicts the positions of the facial features in the face region through the network model to obtain the face key point coordinates and confidence of the current frame.

To reduce computation time and save computing resources, the face key point coordinates and the confidence may be computed simultaneously.

209. Using the allocated memory resources, the mobile terminal computes on the face key point coordinates of the previous frame image through the network model to obtain a calculation result, predicts the face key point coordinates of the current frame according to the calculation result, calculates the confidence of the current frame, and then performs step 210.

For example, the mobile terminal may compute on the face key point coordinates of the previous frame image through the public network part of the network model to obtain a calculation result, then process the calculation result through the key point prediction branch to obtain the face key point coordinates of the current frame, and process the calculation result through the confidence prediction branch to obtain the confidence of the current frame.

For instance, the public network part of the network model may compute the envelope box (enclosing box) of the face key point coordinates of the previous frame image. Then, on the one hand, the key point prediction branch calculates the positions of the face key points in the current frame according to the envelope box to obtain the face key point coordinates of the current frame; on the other hand, the confidence prediction branch analyzes the accuracy of the face recognition, for example by analyzing whether the image inside the envelope box is a face, and the confidence of the current frame is then calculated from the analysis result.
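One plausible reading of the envelope box is the smallest axis-aligned rectangle that contains all key points of the previous frame. The Python sketch below computes it under that assumption; the function name and the sample points are hypothetical.

```python
def envelope_box(points):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) around the points."""
    if not points:
        raise ValueError("at least one key point is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


box = envelope_box([(10, 20), (30, 5), (25, 40)])
```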

To reduce computation time and save computing resources, the face key point coordinates and the confidence may be computed simultaneously, that is, the key point prediction branch and the confidence prediction branch may run in parallel.

210. The mobile terminal determines whether all images in the video stream have been recognized. If so, the flow ends; otherwise, the flow returns to step 205.

That is, the face key point coordinates and confidence of the current frame serve as a reference for face tracking in the next frame image, and this cycle repeats until all images in the video stream have been recognized.
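The control flow of steps 205 to 210 can be sketched as a loop that chooses between tracking and detection for each frame. The `detect` and `track` callables, the toy stubs and the threshold value are hypothetical placeholders for the face detection algorithm and the network model.

```python
def track_faces(frames, detect, track, threshold):
    """Return (keypoints, confidence) for every frame of the stream."""
    results = []
    previous = None  # result of the previous frame, if any
    for frame in frames:  # step 205: take the next frame
        # Steps 206-207: use the previous result only if it is reliable.
        if previous is not None and previous[1] > threshold:
            current = track(frame, previous[0])  # step 209: track
        else:
            current = detect(frame)  # step 208: fall back to detection
        results.append(current)
        previous = current  # reference for the next frame
    return results  # step 210: every frame has been processed


# Hypothetical stubs that record which path handled each frame.
calls = []


def toy_detect(frame):
    calls.append(("detect", frame))
    return [0.0, 0.0], 0.9


def toy_track(frame, prev_keypoints):
    calls.append(("track", frame))
    confidence = 0.2 if frame == 2 else 0.9  # tracking degrades on frame 2
    return prev_keypoints, confidence


results = track_faces([0, 1, 2, 3], toy_detect, toy_track, threshold=0.5)
```

In this run the first frame is detected because no previous result exists, and frame 3 is detected again because the confidence of frame 2 fell to the threshold or below, which corresponds to the automatic reset of tracking.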

As can be seen from the above, when face tracking needs to be performed on a video stream, this embodiment obtains a corresponding deep learning network model and allocates memory resources for it such that all layers of the network model share the same storage space, and then processes the video stream based on the allocated memory resources and the network model, so that real-time face tracking is completed on the mobile terminal. On the one hand, because all layers of the network model share the same storage space in this scheme, there is no need to allocate a separate storage space for each layer. This greatly reduces memory usage and improves computational efficiency; moreover, because only one allocation is needed, the number of allocation operations is greatly reduced, which reduces memory fragmentation and helps improve application performance. On the other hand, when face tracking becomes abnormal, for example when the confidence is lower than or equal to the threshold or the face key point coordinates and confidence of the previous frame cannot be obtained, this scheme automatically resets tracking (that is, obtains the face key point coordinates and confidence again by detection), which improves the continuity of face tracking.

In addition, because the scheme needs little memory and is computationally efficient, it places low demands on device performance and is suitable for devices such as mobile terminals. Compared with schemes that run the deep learning forward algorithm on a server, it can therefore track faces more efficiently and flexibly, which helps improve the user experience.

Embodiment three,

To better implement the above method, an embodiment of the present invention further provides a face tracking device. As shown in Fig. 3, the face tracking device includes an acquiring unit 301, an allocation unit 302 and a tracking unit 303, as follows:

(1) Acquiring unit 301;

The acquiring unit 301 is configured to obtain the video stream on which face tracking is to be performed and a deep learning network model.

The network model may be configured according to the requirements of the practical application. For example, the network model may include a common network part, a key point prediction branch and a confidence prediction branch; for details, refer to the foregoing method embodiments, which are not repeated here.

(2) Allocation unit 302;

The allocation unit 302 is configured to allocate memory resources for the network model such that all layers of the network model share the same storage space.

For example, the allocation unit 302 may include a computation subunit and an allocation subunit, as follows:

The computation subunit may be configured to calculate the storage space required by each layer of the network model.

For example, the computation subunit may specifically be configured to obtain the configuration file of the network model and calculate, according to the configuration file, the storage space required by each layer of the network model, for example as follows:

The computation subunit reads the configuration file of the network model, calculates the number of parameters of each network layer according to the configuration file, and obtains the sizes of the input area, output area and temporary area of each layer. The storage space required by a layer is then calculated as the sum of the sizes of its input area, output area and temporary area, that is, the size of the A+B+C region shown in Fig. 1c. For details, refer to Embodiment one, which is not repeated here.

The allocation subunit may be configured to take the maximum of the storage spaces required by the layers as the size of the pre-allocated storage space, and to allocate memory resources for the network model according to that size.
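The sizing rule applied by the two subunits can be illustrated with a minimal sketch. The layer names and byte counts below are made-up values, and `LayerSpec` and `plan_shared_buffer` are hypothetical names; in practice the sizes would be derived from the configuration file of the network model.

```python
from dataclasses import dataclass


@dataclass
class LayerSpec:
    name: str
    input_bytes: int   # size of the input area
    output_bytes: int  # size of the output area
    temp_bytes: int    # size of the temporary area

    @property
    def required_bytes(self) -> int:
        # Storage needed by this layer: input + output + temporary areas.
        return self.input_bytes + self.output_bytes + self.temp_bytes


def plan_shared_buffer(layers):
    """Return the size of the single buffer that every layer can reuse."""
    return max(layer.required_bytes for layer in layers)


layers = [
    LayerSpec("conv1", 4096, 8192, 1024),
    LayerSpec("pool1", 8192, 2048, 0),
    LayerSpec("fc1", 2048, 512, 256),
]

shared_size = plan_shared_buffer(layers)
shared_buffer = bytearray(shared_size)  # allocated once, before inference

# For comparison: total size if every layer had its own independent region.
per_layer_total = sum(layer.required_bytes for layer in layers)
```

With these illustrative numbers the shared buffer needs 13312 bytes, whereas independent per-layer regions would need 26368 bytes in total and three separate allocations. The sketch only sizes the buffer; how each layer places its areas inside it is not shown.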

(3) Tracking unit 303;

The tracking unit 303 is configured to track the face in the video stream based on the allocated memory resources and the network model.

For example, the tracking unit 303 may include a determination subunit, a parameter acquisition subunit and a prediction subunit, as follows:

The determination subunit may be configured to determine, from the video stream, the image that currently needs to be processed, thereby obtaining the current frame;

The parameter acquisition subunit may be configured to obtain the face key point coordinates and confidence of the previous frame image of the current frame.

The prediction subunit may be configured to predict the face key point coordinates and confidence of the current frame based on the allocated memory resources, the network model, and the face key point coordinates and confidence of the previous frame image, and to trigger the determination subunit to perform the operation of determining, from the video stream, the image that currently needs to be processed, until all images in the video stream have been processed.

The prediction subunit may specifically be configured to, when it is determined that the confidence of the previous frame image is greater than a preset threshold, use the allocated memory resources to perform calculation on the face key point coordinates of the previous frame image through the network model to obtain a calculation result, predict the face key point coordinates of the current frame according to the calculation result, and calculate the confidence of the current frame.

For example, taking the case in which the network model includes a common network part, a key point prediction branch and a confidence prediction branch, the prediction subunit may specifically be configured to:

It should be noted that if the confidence of the previous frame of the current frame is not higher than the preset threshold, the face key point coordinates of the previous frame have little reference value, so the face key point coordinates in the current frame may instead be obtained by detection. Similarly, if the face key point coordinates and confidence of the previous frame image cannot be obtained, for example because the current frame is the first frame of the video stream, the face key point coordinates in the current frame may likewise be obtained by detection. That is, optionally, the tracking unit 303 may further include a detection subunit, as follows:

The detection subunit may be configured to, when the face key point coordinates and confidence of the previous frame image of the current frame cannot be obtained, or when it is determined that the confidence of the previous frame image is less than or equal to the preset threshold, detect the face in the current frame by a face detection algorithm based on the allocated memory resources, so as to determine the face key point coordinates and confidence of the current frame.
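The choice between prediction from the previous frame and renewed detection reduces to a small decision rule, sketched below. The function name `choose_mode` and the threshold value 0.5 used in the example are hypothetical; the preset threshold is not fixed here.

```python
def choose_mode(previous, threshold):
    """Return "detect" or "track" for the current frame.

    previous is None when no previous-frame result can be obtained (for
    example on the first frame of the stream); otherwise it is a
    (keypoints, confidence) tuple for the previous frame image.
    """
    if previous is None:
        return "detect"
    _, confidence = previous
    if confidence <= threshold:
        # The previous key points have little reference value.
        return "detect"
    return "track"


first_frame = choose_mode(None, 0.5)
low_confidence = choose_mode(([(10, 12)], 0.5), 0.5)
high_confidence = choose_mode(([(10, 12)], 0.8), 0.5)
```

Here `first_frame` and `low_confidence` select detection, while `high_confidence` selects tracking. A confidence exactly equal to the threshold falls on the detection side, matching the "less than or equal to" condition above.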

Detection can be performed in various ways. For example, the detection subunit may specifically be configured to determine the face region of the current frame by a face detection algorithm based on the allocated memory resources, and to predict the positions of the facial features in the face region through the network model, thereby obtaining the face key point coordinates and confidence of the current frame.
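The face detection algorithm is not fixed by the paragraph above. Purely as an illustration, a detector in the integral-image and cascaded-classifier style could be organized as in the following sketch. The stage functions and their thresholds are hypothetical placeholders; in a real detector each stage would be a trained strong classifier operating on face features.

```python
def integral_image(image):
    """Return a summed-area table with an extra leading row and column of zeros."""
    height, width = len(image), len(image[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def cascade_accepts(table, window, stages):
    """Pass a window through the stages in order; the first failing stage rejects it."""
    return all(stage(table, window) for stage in stages)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)

# Hypothetical stages: each one compares the window sum with a fixed threshold.
stages = [
    lambda t, w: rect_sum(t, *w) > 10,
    lambda t, w: rect_sum(t, *w) > 20,
]
accepted = cascade_accepts(table, (1, 1, 2, 2), stages)  # window sum is 28
rejected = cascade_accepts(table, (0, 0, 1, 1), stages)  # window sum is 1
```

The integral image makes the sum over any rectangle a constant-time operation, and the cascade lets most non-face windows be discarded by the early stages without evaluating the later ones.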

In specific implementations, the above units may each be implemented as an independent entity, or may be combined arbitrarily and implemented as one or several entities. For the specific implementation of the above units, refer to the foregoing method embodiments, which are not repeated here.

The face tracking device may specifically be integrated in a device such as a mobile terminal; the mobile terminal may include a mobile phone, a tablet computer, a smart wearable device, or the like.

As can be seen from the above, when face tracking needs to be performed on a video stream, this embodiment can obtain the corresponding deep learning network model through the acquiring unit 301, and the allocation unit 302 allocates memory resources for the network model such that all layers of the network model share the same storage space; the tracking unit 303 then processes the video stream based on the allocated memory resources and the network model to achieve real-time face tracking. Because all layers of the network model share the same storage space in this scheme, there is no need to allocate an independent storage space for each layer. This not only greatly reduces memory usage and improves computational efficiency but, because allocation is performed only once, also greatly reduces the number of allocation operations and the resulting memory fragmentation, which helps improve application performance.

Embodiment four,

Correspondingly, an embodiment of the present invention further provides a mobile terminal. As shown in Fig. 4, the mobile terminal may include a radio frequency (RF, Radio Frequency) circuit 401, a memory 402 including one or more computer-readable storage media, an input unit 403, a display unit 404, a sensor 405, an audio circuit 406, a Wireless Fidelity (WiFi, Wireless Fidelity) module 407, a processor 408 including one or more processing cores, a power supply 409, and other components. Those skilled in the art will understand that the mobile terminal structure shown in Fig. 4 does not limit the mobile terminal, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components. Specifically:

The RF circuit 401 may be used to receive and send signals while information is being sent and received or during a call. In particular, after receiving downlink information from a base station, it hands the information to one or more processors 408 for processing; in addition, it sends uplink data to the base station. Generally, the RF circuit 401 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM, Subscriber Identity Module) card, a transceiver, a coupler, a low noise amplifier (LNA, Low Noise Amplifier), a duplexer, and the like. The RF circuit 401 may also communicate with networks and other devices by wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM, Global System of Mobile communication), General Packet Radio Service (GPRS, General Packet Radio Service), Code Division Multiple Access (CDMA, Code Division Multiple Access), Wideband Code Division Multiple Access (WCDMA, Wideband Code Division Multiple Access), Long Term Evolution (LTE, Long Term Evolution), e-mail, Short Message Service (SMS, Short Messaging Service), and the like.

The memory 402 may be used to store software programs and modules, and the processor 408 performs various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the mobile terminal (such as audio data or a phone book), and the like. In addition, the memory 402 may include a high-speed random access memory, and may also include a non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Correspondingly, the memory 402 may further include a memory controller to provide the processor 408 and the input unit 403 with access to the memory 402.

The input unit 403 may be used to receive input numeric or character information, and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. Specifically, in one specific embodiment, the input unit 403 may include a touch-sensitive surface and other input devices. The touch-sensitive surface, also called a touch display screen or touchpad, can collect touch operations by the user on or near it (such as operations performed by the user on or near the touch-sensitive surface with a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connected device according to a preset program. Optionally, the touch-sensitive surface may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and passes the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends these to the processor 408, and can receive and execute commands sent by the processor 408. The touch-sensitive surface may be implemented in various types, such as resistive, capacitive, infrared and surface acoustic wave. Besides the touch-sensitive surface, the input unit 403 may also include other input devices. Specifically, the other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys or a power key), a trackball, a mouse, a joystick, and the like.

The display unit 404 may be used to display information entered by the user or provided to the user, as well as the various graphical user interfaces of the mobile terminal; these graphical user interfaces may be composed of graphics, text, icons, video, and any combination thereof. The display unit 404 may include a display panel, which may optionally be configured in the form of a liquid crystal display (LCD, Liquid Crystal Display), an organic light-emitting diode (OLED, Organic Light-Emitting Diode) display, or the like. Further, the touch-sensitive surface may cover the display panel. When the touch-sensitive surface detects a touch operation on or near it, it passes the operation to the processor 408 to determine the type of the touch event, and the processor 408 then provides the corresponding visual output on the display panel according to the type of the touch event. Although in Fig. 4 the touch-sensitive surface and the display panel implement the input and output functions as two independent components, in some embodiments the touch-sensitive surface and the display panel may be integrated to implement the input and output functions.

The mobile terminal may also include at least one sensor 405, such as an optical sensor, a motion sensor, or another sensor. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel according to the brightness of the ambient light, and the proximity sensor can turn off the display panel and/or the backlight when the mobile terminal is moved to the ear. As one type of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes), and can detect the magnitude and direction of gravity when stationary. It can be used in applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait orientation, related games, and magnetometer attitude calibration) and in functions related to vibration recognition (such as a pedometer or tap detection). Other sensors with which the mobile terminal may also be equipped, such as a gyroscope, barometer, hygrometer, thermometer and infrared sensor, are not described here.

The audio circuit 406, a loudspeaker and a microphone can provide an audio interface between the user and the mobile terminal. The audio circuit 406 can convert received audio data into an electrical signal and transmit it to the loudspeaker, which converts it into a sound signal for output. Conversely, the microphone converts a collected sound signal into an electrical signal, which the audio circuit 406 receives and converts into audio data; after the audio data is output to the processor 408 for processing, it is sent through the RF circuit 401 to, for example, another mobile terminal, or output to the memory 402 for further processing. The audio circuit 406 may also include an earphone jack to provide communication between an external earphone and the mobile terminal.

WiFi is a short-range wireless transmission technology. Through the WiFi module 407, the mobile terminal can help the user send and receive e-mail, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access. Although Fig. 4 shows the WiFi module 407, it can be understood that the module is not an essential component of the mobile terminal and may be omitted as needed without changing the essence of the invention.

The processor 408 is the control center of the mobile terminal. It connects the various parts of the entire mobile phone using various interfaces and lines, and performs the various functions of the mobile terminal and processes data by running or executing the software programs and/or modules stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the mobile phone as a whole. Optionally, the processor 408 may include one or more processing cores. Preferably, the processor 408 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 408.

The mobile terminal also includes a power supply 409 (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the processor 408 through a power management system, so that functions such as charging management, discharging management and power consumption management are implemented through the power management system. The power supply 409 may also include any components such as one or more direct current or alternating current power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.

Although not shown, the mobile terminal may also include a camera, a Bluetooth module and the like, which are not described here. Specifically, in this embodiment, the processor 408 in the mobile terminal loads the executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 408 runs the application programs stored in the memory 402 to implement various functions:

Obtaining the video stream on which face tracking is to be performed and a deep learning network model; allocating memory resources for the network model such that all layers of the network model share the same storage space; and tracking the face in the video stream based on the allocated memory resources and the network model.

For example, the storage space required by each layer of the network model may be calculated, for instance by obtaining the configuration file of the network model and calculating, according to the configuration file, the storage space required by each layer; then the maximum of the storage spaces required by the layers is taken as the size of the pre-allocated storage space, and memory resources are allocated for the network model according to that size, and so on.

The structure of the network model may be configured according to the requirements of the practical application. For example, the network model may include a common network part, a key point prediction branch and a confidence prediction branch. In addition, the number of layers of the common network part, the key point prediction branch and the confidence prediction branch may also be determined according to the requirements of the practical application; for details, refer to the foregoing method embodiments, which are not repeated here.

The face in the video stream may be tracked based on the allocated memory resources and the network model in various ways. For example, the face key point coordinates and confidence of the previous frame image of the current frame may be obtained, and the face key point coordinates and confidence of the current frame may then be predicted based on the allocated memory resources, the network model, and the face key point coordinates and confidence of the previous frame image. That is, the application programs stored in the memory 402 may also implement the following functions:

Determining, from the video stream, the image that currently needs to be processed, thereby obtaining the current frame; obtaining the face key point coordinates and confidence of the previous frame image of the current frame; predicting the face key point coordinates and confidence of the current frame based on the allocated memory resources, the network model, and the face key point coordinates and confidence of the previous frame image; and returning to the step of determining, from the video stream, the image that currently needs to be processed, until all images in the video stream have been processed.

For example, when it is determined that the confidence of the previous frame image is greater than the preset threshold, the allocated memory resources may be used to perform calculation on the face key point coordinates of the previous frame image through the network model to obtain a calculation result; the face key point coordinates of the current frame are then predicted according to the calculation result, and the confidence of the current frame is calculated.

It should be noted that if the confidence of the previous frame is not higher than the preset threshold, the face key point coordinates of the previous frame have little reference value, so the face key point coordinates in the current frame may instead be obtained by detection. Similarly, if the face key point coordinates and confidence of the previous frame image of the current frame cannot be obtained, for example because the current frame is the first frame of the video stream, the face key point coordinates in the current frame may likewise be obtained by detection. That is, the application programs stored in the memory 402 may also implement the following functions:

For the specific implementation of each of the above operations, refer to the foregoing embodiments, which are not repeated here.

As can be seen from the above, when the mobile terminal of this embodiment needs to apply deep learning to a video stream in order to perform face tracking, it can obtain the corresponding deep learning network model and allocate memory resources for it such that all layers of the network model share the same storage space; the video stream is then processed based on the allocated memory resources and the network model to achieve real-time face tracking. Because all layers of the network model share the same storage space in this scheme, there is no need to allocate an independent storage space for each layer. This not only greatly reduces memory usage and improves computational efficiency but, because allocation is performed only once, also greatly reduces the number of allocation operations and the resulting memory fragmentation, which helps improve application performance.

One of ordinary skill in the art will appreciate that all or some of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disc, or the like.

The face tracking method and device provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, those skilled in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A face tracking method, characterized by comprising:

2. The method according to claim 1, characterized in that allocating memory resources for the network model such that all layers of the network model share the same storage space comprises:

calculating the storage space required by each layer of the network model;

taking the maximum of the storage spaces required by the layers as the size of a pre-allocated storage space;

allocating memory resources for the network model according to the size of the pre-allocated storage space.

3. The method according to claim 2, characterized in that calculating the storage space required by each layer of the network model comprises:

obtaining a configuration file of the network model;

calculating, according to the configuration file, the storage space required by each layer of the network model.

4. The method according to any one of claims 1 to 3, characterized in that tracking the face in the video stream based on the allocated memory resources and the network model comprises:

determining, from the video stream, the image that currently needs to be processed, thereby obtaining a current frame;

obtaining the face key point coordinates and confidence of the previous frame image of the current frame;

predicting the face key point coordinates and confidence of the current frame based on the allocated memory resources, the network model, and the face key point coordinates and confidence of the previous frame image, and returning to the step of determining, from the video stream, the image that currently needs to be processed, until all images in the video stream have been processed.

5. The method according to claim 4, characterized in that predicting the face key point coordinates and confidence of the current frame based on the allocated memory resources, the network model, and the face key point coordinates and confidence of the previous frame image comprises:

when it is determined that the confidence of the previous frame image is greater than a preset threshold, using the allocated memory resources to perform calculation on the face key point coordinates of the previous frame image through the network model, to obtain a calculation result;

predicting the face key point coordinates of the current frame according to the calculation result, and calculating the confidence of the current frame.

6. The method according to claim 5, characterized in that the network model includes a common network part, a key point prediction branch and a confidence prediction branch, and performing calculation on the face key point coordinates of the previous frame image through the network model to obtain a calculation result comprises:

performing calculation on the face key point coordinates of the previous frame image through the common network part, to obtain the calculation result;

and predicting the face key point coordinates of the current frame according to the calculation result and calculating the confidence of the current frame comprises: processing the calculation result through the key point prediction branch to obtain the face key point coordinates of the current frame, and processing the calculation result through the confidence prediction branch to obtain the confidence of the current frame.

7. The method according to claim 4, characterized by further comprising:

when the face key point coordinates and confidence of the previous frame image of the current frame cannot be obtained, or when it is determined that the confidence of the previous frame image is less than or equal to a preset threshold, detecting the face in the current frame by a face detection algorithm based on the allocated memory resources, so as to determine the face key point coordinates and confidence of the current frame.

8. The method according to claim 7, characterized in that detecting the face in the current frame by a face detection algorithm based on the allocated memory resources, so as to determine the face key point coordinates and confidence of the current frame, comprises:

determining the face region of the current frame by the face detection algorithm based on the allocated memory resources;

predicting the positions of the facial features in the face region through the network model, to obtain the face key point coordinates and confidence of the current frame.

9. The method according to claim 8, characterized in that determining the face region of the current frame by the face detection algorithm based on the allocated memory resources comprises:

obtaining the face features in the current frame by computing an integral image, based on the allocated memory resources;

building, according to the face features, strong classifiers that distinguish faces from non-faces, and cascading the strong classifiers in the same system;

processing the current frame according to the strong classifiers, to obtain the face region of the current frame.

10. A face tracking device, characterized by comprising:

an allocation unit, configured to allocate memory resources for the network model such that all layers of the network model share the same storage space;

a tracking unit, configured to track the face in the video stream based on the allocated memory resources and the network model.

11. The device according to claim 10, characterized in that the allocation unit includes a computation subunit and an allocation subunit;

the computation subunit is configured to calculate the storage space required by each layer of the network model;

the allocation subunit is configured to take the maximum of the storage spaces required by the layers as the size of a pre-allocated storage space, and to allocate memory resources for the network model according to the size of the pre-allocated storage space.

12. The device according to claim 11, characterized in that

the computation subunit is specifically configured to obtain a configuration file of the network model and calculate, according to the configuration file, the storage space required by each layer of the network model.

13. The device according to any one of claims 10 to 12, characterized in that the tracking unit includes a determination subunit, a parameter acquisition subunit and a prediction subunit;

the determination subunit is configured to determine, from the video stream, the image that currently needs to be processed, thereby obtaining a current frame;

the parameter acquisition subunit is configured to obtain the face key point coordinates and confidence of the previous frame image of the current frame;

the prediction subunit is configured to predict the face key point coordinates and confidence of the current frame based on the allocated memory resources, the network model, and the face key point coordinates and confidence of the previous frame image, and to trigger the determination subunit to perform the operation of determining, from the video stream, the image that currently needs to be processed, until all images in the video stream have been processed.

14. The device according to claim 13, characterized in that the prediction subunit is specifically configured to:

15. The device according to claim 13, characterized in that the tracking unit further includes a detection subunit;

the detection subunit is configured to, when the face key point coordinates and confidence of the previous frame image of the current frame cannot be obtained, or when it is determined that the confidence of the previous frame image is less than or equal to a preset threshold, detect the face in the current frame by a face detection algorithm based on the allocated memory resources, so as to determine the face key point coordinates and confidence of the current frame.

16. The device according to claim 15, characterized in that

the detection subunit is specifically configured to determine the face region of the current frame by the face detection algorithm based on the allocated memory resources, and to predict the positions of the facial features in the face region through the network model, to obtain the face key point coordinates and confidence of the current frame.