CN109829432A - Method and apparatus for generating information - Google Patents

Method and apparatus for generating information

Info

Publication number
CN109829432A
CN109829432A (application CN201910099415.1A)
Authority
CN
China
Prior art keywords
video frame, sample, key point, face key point information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910099415.1A
Other languages
Chinese (zh)
Other versions
CN109829432B (en)
Inventor
Deng Qili (邓启力)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910099415.1A
Publication of CN109829432A
Application granted
Publication of CN109829432B
Legal status: Active


Abstract

Embodiments of the disclosure disclose a method and apparatus for generating information. One specific embodiment of the method includes: extracting, from a target face video, a target video frame and a reference video frame of the target video frame, where the reference video frame is adjacent to the target video frame; determining face key point information corresponding to the reference video frame, and generating, based on the determined face key point information and a preset image, a heatmap corresponding to the reference video frame, where the image region of the heatmap includes a set of numerical values, and each value in the set characterizes the probability that a face key point is located at that value's position; and inputting the target video frame, the reference video frame, and the generated heatmap into a pre-trained first recognition model to obtain face key point information corresponding to the target video frame. The embodiment helps reduce the jitter of face key points across consecutive video frames and improves the stability of face key point localization.

Description

Method and apparatus for generating information
Technical field
Embodiments of the disclosure relate to the field of computer technology, and more particularly, to a method and apparatus for generating information.
Background technique
With the popularization of mobile video software, various video processing algorithms are widely used. Face key point tracking in video, as one of the basic processing functions for video, is likewise widely used.
Existing video face key point tracking methods are generally implemented on the basis of image face key point detection, i.e., the corresponding face key points are obtained from the face image of each frame independently.
Summary of the invention
Embodiments of the disclosure propose a method and apparatus for generating information.
In a first aspect, an embodiment of the present disclosure provides a method for generating information, the method comprising: extracting, from a target face video, a target video frame and a reference video frame of the target video frame, where the reference video frame is adjacent to the target video frame; determining face key point information corresponding to the reference video frame, and generating, based on the determined face key point information and a preset image, a heatmap corresponding to the reference video frame, where the preset image and the reference video frame are identical in shape and size, the image region of the heatmap includes a set of numerical values, and each value in the set characterizes the probability that a face key point is located at that value's position; and inputting the target video frame, the reference video frame, and the generated heatmap into a pre-trained first recognition model to obtain face key point information corresponding to the target video frame.
In some embodiments, determining the face key point information corresponding to the reference video frame comprises: inputting the reference video frame into a pre-trained second recognition model to obtain the face key point information corresponding to the reference video frame.
In some embodiments, the second recognition model is trained as follows: obtaining a training sample set, where each training sample includes a sample face image and sample face key point information annotated in advance for the sample face image; and, using a machine learning method, taking the sample face image of a training sample in the training sample set as input and the sample face key point information corresponding to the input sample face image as desired output, and training to obtain the second recognition model.
In some embodiments, the first recognition model is trained through the following steps: obtaining multiple sample video frame groups, where each sample video frame group includes two adjacent video frames extracted from a sample face video; for each sample video frame group among the multiple sample video frame groups, performing the following steps: determining a sample target video frame and a sample reference video frame from the sample video frame group; determining face key point information corresponding to the sample reference video frame in the sample video frame group, and determining face key point information corresponding to the sample target video frame in the sample video frame group as sample face key point information; generating a sample heatmap based on the face key point information corresponding to the sample reference video frame and a preset image; and composing a training sample from the sample video frame group, the generated sample heatmap, and the sample face key point information of the sample target video frame; and, using a machine learning method, taking the sample video frame group and sample heatmap included in a training sample among the composed training samples as input and the sample face key point information of the sample target video frame corresponding to the input sample video frame group and sample heatmap as desired output, and training to obtain the first recognition model.
In some embodiments, determining the face key point information corresponding to the sample target video frame in the sample video frame group as sample face key point information comprises: determining initial face key point information corresponding to the sample target video frame in the sample video frame group; and, based on weights assigned in advance to the face key point information of the sample reference video frame and the initial face key point information of the sample target video frame respectively, performing weighted summation on the determined face key point information of the sample reference video frame and the determined initial face key point information of the sample target video frame, and taking the processing result as the sample face key point information of the sample target video frame in the sample video frame group.
In some embodiments, generating the heatmap corresponding to the reference video frame based on the determined face key point information and the preset image comprises: generating, on the preset image using a Gaussian function, the set of numerical values corresponding to the face key point information of the reference video frame; and generating the heatmap corresponding to the reference video frame based on the preset image including the generated set of numerical values.
In a second aspect, an embodiment of the disclosure provides an apparatus for generating information, the apparatus comprising: an extraction unit configured to extract, from a target face video, a target video frame and a reference video frame of the target video frame, where the reference video frame is adjacent to the target video frame; a determination unit configured to determine face key point information corresponding to the reference video frame and to generate, based on the determined face key point information and a preset image, a heatmap corresponding to the reference video frame, where the preset image and the reference video frame are identical in shape and size, the image region of the heatmap includes a set of numerical values, and each value in the set characterizes the probability that a face key point is located at that value's position; and a generation unit configured to input the target video frame, the reference video frame, and the generated heatmap into a pre-trained first recognition model to obtain face key point information corresponding to the target video frame.
In some embodiments, the determination unit is further configured to: input the reference video frame into a pre-trained second recognition model to obtain the face key point information corresponding to the reference video frame.
In some embodiments, the second recognition model is trained as follows: obtaining a training sample set, where each training sample includes a sample face image and sample face key point information annotated in advance for the sample face image; and, using a machine learning method, taking the sample face image of a training sample in the training sample set as input and the sample face key point information corresponding to the input sample face image as desired output, and training to obtain the second recognition model.
In some embodiments, the first recognition model is trained through the following steps: obtaining multiple sample video frame groups, where each sample video frame group includes two adjacent video frames extracted from a sample face video; for each sample video frame group among the multiple sample video frame groups, performing the following steps: determining a sample target video frame and a sample reference video frame from the sample video frame group; determining face key point information corresponding to the sample reference video frame in the sample video frame group, and determining face key point information corresponding to the sample target video frame in the sample video frame group as sample face key point information; generating a sample heatmap based on the face key point information corresponding to the sample reference video frame and a preset image; and composing a training sample from the sample video frame group, the generated sample heatmap, and the sample face key point information of the sample target video frame; and, using a machine learning method, taking the sample video frame group and sample heatmap included in a training sample among the composed training samples as input and the sample face key point information of the sample target video frame corresponding to the input sample video frame group and sample heatmap as desired output, and training to obtain the first recognition model.
In some embodiments, determining the face key point information corresponding to the sample target video frame in the sample video frame group as sample face key point information comprises: determining initial face key point information corresponding to the sample target video frame in the sample video frame group; and, based on weights assigned in advance to the face key point information of the sample reference video frame and the initial face key point information of the sample target video frame respectively, performing weighted summation on the determined face key point information of the sample reference video frame and the determined initial face key point information of the sample target video frame, and taking the processing result as the sample face key point information of the sample target video frame in the sample video frame group.
In some embodiments, the determination unit includes: a first generation module configured to generate, on the preset image using a Gaussian function, the set of numerical values corresponding to the face key point information of the reference video frame; and a second generation module configured to generate the heatmap corresponding to the reference video frame based on the preset image including the generated set of numerical values.
In a third aspect, an embodiment of the disclosure provides an electronic device, comprising: one or more processors; and a storage apparatus on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the above embodiments of the method for generating information.
In a fourth aspect, an embodiment of the disclosure provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the method of any of the above embodiments of the method for generating information.
The method and apparatus for generating information provided by embodiments of the disclosure extract, from a target face video, a target video frame and a reference video frame of the target video frame, where the reference video frame is adjacent to the target video frame; determine face key point information corresponding to the reference video frame, and generate, based on the determined face key point information and a preset image, a heatmap corresponding to the reference video frame, where the preset image and the reference video frame are identical in shape and size, the image region of the heatmap includes a set of numerical values, and each value in the set characterizes the probability that a face key point is located at that value's position; and input the target video frame, the reference video frame, and the generated heatmap into a pre-trained first recognition model to obtain face key point information corresponding to the target video frame. The face key point information of the reference video frame of the target video frame can thus serve as reference data for generating the face key point information of the target video frame, which helps reduce the jitter of face key points across consecutive video frames and improves the stability of face key point localization.
Brief description of the drawings
Other features, objects, and advantages of the disclosure will become more apparent by reading the following detailed description of non-restrictive embodiments made with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which an embodiment of the disclosure may be applied;
Fig. 2 is a flowchart of one embodiment of the method for generating information according to the disclosure;
Fig. 3 is a schematic diagram of a heatmap of an embodiment of the disclosure;
Fig. 4 is a schematic diagram of an application scenario of the method for generating information according to an embodiment of the present disclosure;
Fig. 5 is a flowchart of another embodiment of the method for generating information according to the disclosure;
Fig. 6 is a structural schematic diagram of one embodiment of the apparatus for generating information according to the disclosure;
Fig. 7 is a structural schematic diagram of a computer system adapted to implement the electronic device of an embodiment of the disclosure.
Detailed description of embodiments
The disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It is understood that the specific embodiments described here are used only to explain the related invention, rather than to limit the invention. It should also be noted that, for ease of description, only the parts relevant to the related invention are shown in the drawings.
It should be noted that, in the absence of conflict, the embodiments in the disclosure and the features in the embodiments may be combined with each other. The disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for generating information or the apparatus for generating information of the disclosure may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as video processing software, image processing software, web browser applications, search applications, instant messaging tools, and social platform software.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (for example, for providing distributed services), or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server providing various services, for example, a background video processing server that processes the target face video captured by the terminal devices 101, 102, 103. The background video processing server may analyze and otherwise process the received data such as the target face video, and obtain a processing result (for example, the face key point information corresponding to a target video frame).
It should be noted that the method for generating information provided by embodiments of the disclosure may be executed by the server 105, or by the terminal devices 101, 102, 103; correspondingly, the apparatus for generating information may be set in the server 105, or in the terminal devices 101, 102, 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, for providing distributed services), or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided according to implementation needs. In the case where the data used in generating the face key point information corresponding to the target video frame in the target face video does not need to be obtained remotely, the above system architecture may include no network, and only a terminal device or a server.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for generating information according to the disclosure is shown. The method for generating information comprises the following steps:
Step 201: extracting a target video frame and a reference video frame of the target video frame from a target face video.
In the present embodiment, the executing body of the method for generating information (for example, the server shown in Fig. 1) may obtain the target face video through a wired or wireless connection, and extract the target video frame and the reference video frame of the target video frame from the target face video. Here, the target face video is the face video on which face key point detection is to be performed. A face video may be a video obtained by shooting a face, and the video frames included in the face video include face images. In practice, face key points may be key points in a face, specifically, points that affect the face contour or the shape of the facial features. As an example, a face key point may be a point corresponding to the nose, a point corresponding to an eye, and so on. Specifically, the target face video may be stored in the above executing body, or may be sent to the above executing body by an electronic device (for example, a terminal device shown in Fig. 1) in communication connection with the executing body.
In the present embodiment, the target video frame is the video frame whose corresponding face key point information is to be determined. Face key point information characterizes the position of a face key point in a video frame and may include but is not limited to at least one of the following: text, numbers, symbols, images. The reference video frame of the target video frame is a video frame used for determining the face key point information corresponding to the target video frame. Here, the reference video frame is adjacent to the target video frame. Specifically, in the video frame sequence corresponding to the target face video, the reference video frame may be the video frame adjacent to and before the target video frame, or the video frame adjacent to and after the target video frame.
In the present embodiment, the above executing body may extract the target video frame and the reference video frame of the target video frame from the target face video using various methods. For example, the target video frame may first be extracted at random from the target face video, and then the video frame adjacent to the target video frame may be extracted as the reference video frame; alternatively, the video frame with the highest definition among the video frames included in the target face video may first be extracted as the target video frame, and then the video frame adjacent to the target video frame may be extracted as the reference video frame. It should be noted that, after the target video frame is extracted, whether the video frame before the target video frame or the video frame after the target video frame is extracted as the reference video frame may be predetermined by a technician, or may be random.
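As a minimal sketch of the first extraction strategy above (a random target frame plus an adjacent reference frame), working on frame indices rather than actual decoded frames; the function name and index-based interface are illustrative assumptions, not part of the patent:

```python
import random

def extract_target_and_reference(num_frames, use_previous=True):
    """Pick a target frame index at random from a video of num_frames
    frames, then return an adjacent index as the reference frame.
    Whether the previous or the next frame is preferred may be fixed in
    advance by a technician (use_previous) or chosen at random, per the
    text; frames at the sequence boundary fall back to the only
    available neighbour."""
    if num_frames < 2:
        raise ValueError("need at least two frames to form a pair")
    target = random.randrange(num_frames)
    if use_previous and target > 0:
        reference = target - 1
    else:
        reference = target + 1 if target + 1 < num_frames else target - 1
    return target, reference

target, reference = extract_target_and_reference(100)
print(abs(target - reference))  # 1: the reference frame is always adjacent
```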
Step 202: determining face key point information corresponding to the reference video frame, and generating, based on the determined face key point information and a preset image, a heatmap corresponding to the reference video frame.
In the present embodiment, based on the reference video frame obtained in step 201, the above executing body may determine the face key point information corresponding to the reference video frame. Specifically, the executing body may determine this information by various methods. For example, the executing body may output the reference video frame for display, and obtain the face key point information marked out for the reference video frame by a user.
In the present embodiment, based on the determined face key point information and the preset image, the above executing body may generate the heatmap corresponding to the reference video frame. Here, the preset image may be an image set in advance for generating the heatmap, and the preset image may be identical to the reference video frame in shape and size. In addition, the initial image may include only a background image, without a foreground image. In turn, the executing body may add numerical values to the initial image to generate the heatmap. The image region of the heatmap includes a set of numerical values, and each value in the set characterizes the probability that a face key point is located at that value's position. It can be understood that, since the heatmap and the reference video frame are identical in shape and size, the positions of the values in the heatmap can correspond to positions in the reference video frame, and thus the heatmap can be used to indicate the positions of the face key points in the reference video frame.
It should be noted that a heatmap may include at least two sets of numerical values, where each of the at least two sets may correspond to one piece of face key point information of the reference video frame.
Specifically, in the heatmap, the value at the position corresponding to the position characterized by the face key point information in the reference video frame may be 1. The value corresponding to each other position in the heatmap may decrease gradually with that position's distance from the position of the value 1; that is, the farther a position is from the position of the value 1, the smaller its corresponding value.
It should be noted that the position of a value in the heatmap may be determined by the smallest rectangle enclosing that value. Specifically, the center of the smallest rectangle may be determined as the position of the value, or an endpoint of the smallest rectangle may be determined as the position of the value.
In the present embodiment, the above executing body may, in various ways, use the face key point information of the reference video frame to generate, on the initial image, the set of numerical values corresponding to that face key point information, and thereby obtain the heatmap corresponding to the reference video frame.
In some optional implementations of the present embodiment, the above executing body may generate the heatmap corresponding to the reference video frame through the following steps. First, the executing body may use a Gaussian function to generate, on the preset image, the set of numerical values corresponding to the face key point information of the reference video frame. Then, the executing body may generate the heatmap corresponding to the reference video frame based on the preset image including the generated set of numerical values. Specifically, the preset image including the set of numerical values may be determined directly as the heatmap corresponding to the reference video frame; alternatively, image processing (for example, adding a background color) may be performed on the preset image including the set of numerical values, and the processed image may be determined as the heatmap corresponding to the reference video frame.
In this implementation, the above executing body may take a position as the independent variable of the Gaussian function and the value corresponding to that position as the dependent variable, and then determine the set of numerical values based on the positions. It can be understood that, here, the position corresponding to the value 1 (i.e., the face key point) is the independent variable corresponding to the mathematical expectation of the Gaussian function.
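Under the assumption that the Gaussian function is a standard isotropic 2-D Gaussian centred on the key point (the patent names the function but not its exact form), the set of numerical values on a blank preset image could be generated as follows; the image size, sigma, and key point coordinates are illustrative:

```python
import numpy as np

def gaussian_heatmap(height, width, keypoints, sigma=3.0):
    """Render one value map per key point on a blank (preset) image of
    the given shape. Each pixel holds a probability-like score that the
    key point is located there: exactly 1.0 at the key point (the
    Gaussian's expectation), decaying toward 0 with distance, matching
    the 1 -> 0.8 -> 0.4 -> 0.1 decay described for Fig. 3."""
    ys, xs = np.mgrid[0:height, 0:width]
    maps = []
    for (kx, ky) in keypoints:  # keypoints given as (x, y) pixel coords
        d2 = (xs - kx) ** 2 + (ys - ky) ** 2
        maps.append(np.exp(-d2 / (2.0 * sigma ** 2)))
    return np.stack(maps)  # shape: (num_keypoints, height, width)

hm = gaussian_heatmap(64, 64, [(20, 30), (40, 10)])
print(hm.shape)        # (2, 64, 64)
print(hm[0, 30, 20])   # 1.0 at the first key point's position
```

Note that one map is produced per key point, consistent with a heatmap including at least two sets of numerical values when the reference frame has multiple key points.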
As an example, Fig. 3 shows a schematic diagram of a heatmap of an embodiment of the disclosure. The figure includes a set of numerical values 301, and the face key point information corresponding to the set 301 is the face key point information corresponding to the face key point 302. As shown in Fig. 3, the value at the position of the face key point 302 is 1. As the distance from the face key point 302 increases, the values gradually decrease from 0.8 to 0.4, and then from 0.4 to 0.1. It should be noted that, here, the value corresponding to a position not marked out in the heatmap may be 0. The position of a value may be determined by the smallest rectangle enclosing the value (for example, reference numeral 303).
Step 203: inputting the target video frame, the reference video frame, and the generated heatmap into a pre-trained first recognition model to obtain face key point information corresponding to the target video frame.
In the present embodiment, based on the target video frame and the reference video frame obtained in step 201 and the heatmap obtained in step 202, the above executing body may input the target video frame, the reference video frame, and the generated heatmap into the pre-trained first recognition model to obtain the face key point information corresponding to the target video frame.
In the present embodiment, the first recognition model may be used to characterize the correspondence between, on the one hand, the target video frame, the reference video frame, and the heatmap corresponding to the reference video frame and, on the other hand, the face key point information corresponding to the target video frame. Specifically, as an example, the first recognition model may be a mapping table pre-established by a technician based on statistics over a large number of target video frames, reference video frames, heatmaps corresponding to the reference video frames, and face key point information corresponding to the target video frames, storing multiple target video frames, reference video frames, heatmaps corresponding to the reference video frames, and the face key point information corresponding to the respective target video frames; or it may be a model obtained by training an initial model (for example, a neural network) on preset training samples using a machine learning method.
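The patent does not specify how the three inputs are combined before entering the neural-network variant of the first recognition model; one common arrangement, offered here purely as an assumption, is to stack the target frame, the reference frame, and the per-key-point heatmaps along the channel axis into a single tensor:

```python
import numpy as np

def build_model_input(target_frame, ref_frame, heatmaps):
    """Stack the first recognition model's three inputs along the
    channel axis. Frames are H x W x 3 RGB arrays; heatmaps is a
    K x H x W array with one channel per face key point. This channel
    stacking is an assumed input layout, not stated in the patent."""
    hm = np.transpose(heatmaps, (1, 2, 0))  # K x H x W -> H x W x K
    return np.concatenate([target_frame, ref_frame, hm], axis=2)

target = np.zeros((64, 64, 3))
ref = np.zeros((64, 64, 3))
hms = np.zeros((5, 64, 64))  # e.g. 5 face key points
x = build_model_input(target, ref, hms)
print(x.shape)  # (64, 64, 11): 3 + 3 + 5 channels
```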
In some optional implementations of the present embodiment, the first recognition model may be trained through the following steps:
Step 2031: obtaining multiple sample video frame groups.
Here, a sample video frame group includes two adjacent video frames extracted from a sample face video. A sample face video is a face video obtained by shooting a face. Specifically, sample video frame groups may be extracted from the sample face video using various methods, for example, by random extraction, or by extracting two video frames arranged at preset positions in the video frame sequence corresponding to the sample face video as a sample video frame group.
Step 2032: for each sample video frame group among the multiple sample video frame groups, performing the following steps: determining a sample target video frame and a sample reference video frame from the sample video frame group; determining face key point information corresponding to the sample reference video frame in the sample video frame group, and determining face key point information corresponding to the sample target video frame in the sample video frame group as sample face key point information; generating a sample heatmap based on the face key point information corresponding to the sample reference video frame and a preset image; and composing a training sample from the sample video frame group, the generated sample heatmap, and the sample face key point information of the sample target video frame.
Here, the sample reference video frame is a video frame used for determining the face key point information corresponding to the sample target video frame. Specifically, the sample target video frame and the sample reference video frame may be determined from the sample video frame group using various methods. For example, one sample video frame may be selected at random from the sample video frame group as the sample target video frame, and the unselected sample video frame in the group is then the sample reference video frame; alternatively, the order of the sample video frames in the group within the video frame sequence corresponding to the sample face video may be determined, and then the later sample video frame may be determined as the sample target video frame and the earlier one as the sample reference video frame.
In this implementation, the face key point information corresponding to the sample reference video frame in the sample video frame group may be determined using a method similar to the method described in step 202 for determining the face key point information of the reference video frame; details are not repeated here.
Here, the sample face key point information corresponding to the sample target video frame in the sample video frame group may be determined using various methods. For example, a method similar to the above method for determining the face key point information corresponding to the sample reference video frame in the sample video frame group may be used to determine the sample face key point information corresponding to the sample target video frame in the sample video frame group.
In some optional implementations of this embodiment, the face key point information corresponding to the sample target video frame in the sample video frame group may be determined as the sample face key point information through the following steps. First, initial face key point information corresponding to the sample target video frame in the sample video frame group may be determined. Then, based on weights pre-assigned to the face key point information of the sample reference video frame and to the initial face key point information of the sample target video frame respectively, a weighted summation may be performed on the determined face key point information of the sample reference video frame and the determined initial face key point information of the sample target video frame, and the result of the processing is taken as the sample face key point information of the sample target video frame in the sample video frame group.
Here, the initial face key point information characterizes the position of an initial face key point in the sample target video frame, and may include, but is not limited to, at least one of the following: numbers, text, symbols, and images. The initial face key point information of the sample target video frame may serve as the basis for the sample face key point information of the sample target video frame, i.e., it is used for determining that sample face key point information. Specifically, the initial face key point information of the sample target video frame in the sample video frame group may be determined using a method similar to the method described in step 202 for determining the face key point information of the reference video frame.
In this implementation, the weights pre-assigned to the face key point information of the sample reference video frame and to the initial face key point information of the sample target video frame may be used to characterize the degree of influence that each has on the sample face key point information of the sample target video frame. Specifically, the larger the assigned weight, the higher the degree of influence on the sample face key point information of the target video frame.
As an example, suppose the face key point information of the sample reference video frame is the coordinate (14, 5) of a face key point in the sample reference video frame, and the initial face key point information of the sample target video frame is the coordinate (13, 6) of the initial face key point in the sample target video frame. The weights pre-assigned to the face key point information of the sample reference video frame and to the initial face key point information of the sample target video frame are 0.4 and 0.6, respectively. Then the sample face key point information of the sample target video frame is (13.4, 5.6), where 13.4 = 14 × 0.4 + 13 × 0.6 and 5.6 = 5 × 0.4 + 6 × 0.6.
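The weighted summation in this example can be sketched in code as follows; the function name and structure are illustrative only and are not part of the disclosed method.

```python
def fuse_keypoints(ref_kp, init_kp, ref_weight=0.4, init_weight=0.6):
    """Weighted summation of a reference-frame keypoint coordinate and the
    target frame's initial keypoint coordinate, per the example above."""
    return tuple(r * ref_weight + i * init_weight
                 for r, i in zip(ref_kp, init_kp))

# Values from the example: reference (14, 5), initial (13, 6), weights 0.4/0.6
fused = fuse_keypoints((14, 5), (13, 6))
print(fused)  # approximately (13.4, 5.6)
```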
It can be appreciated that this implementation can generate the sample face key point information corresponding to the sample target video frame in combination with the face key point information corresponding to the sample reference video frame of the sample target video frame. In this way, the generated sample face key point information can incorporate features of the face key points in the sample reference video frame, which helps to enhance the continuity of face key points between adjacent video frames and improves the accuracy of the sample face key point information corresponding to the sample target video frame.
Furthermore, the sample heatmap corresponding to the sample reference video frame may be generated, based on the face key point information corresponding to the sample reference video frame and the preset image, using the method described in step 202 for generating the heatmap corresponding to the reference video frame; details are not repeated here.
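As a hedged illustration of the heatmap generation step (the disclosure's first generation module, described later, uses a Gaussian function on the preset image), a minimal sketch might look like the following; the function name, the nested-list image representation, and the choice of sigma are assumptions.

```python
import math

def gaussian_heatmap(height, width, keypoint, sigma=2.0):
    """Fill an image-sized grid with values that peak (at 1.0) at the
    keypoint and fall off with distance, characterizing the probability
    that the face key point lies at each position."""
    kx, ky = keypoint
    return [[math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

heat = gaussian_heatmap(8, 8, keypoint=(3, 4))
print(heat[4][3])  # 1.0 at the keypoint position itself
```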
Step 2033: using a machine learning method, take the sample video frame group and the sample heatmap included in each training sample of the formed training samples as input, take the sample face key point information of the sample target video frame corresponding to the input sample video frame group and sample heatmap as the desired output, and train to obtain the first identification model.

Here, a machine learning method may be used to take the sample video frame group and the sample heatmap included in each training sample of the formed training samples as the input of an initial model, take the sample face key point information of the sample target video frame corresponding to the input sample video frame group and sample heatmap as the desired output of the initial model, and train the initial model, finally obtaining the first identification model through training.
Here, various existing convolutional neural network structures may be used as the initial model for training. A convolutional neural network is a feedforward neural network whose artificial neurons can respond to surrounding units within part of their coverage area; it performs outstandingly for image processing, and therefore a convolutional neural network may be used to process the sample video frame groups and sample heatmaps in the formed training samples. It should be noted that other models with image processing functionality may also be used as the initial model, which is not limited to convolutional neural networks; the specific model structure may be set according to actual needs and is not limited here.
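The first identification model consumes the target video frame, the reference video frame, and the heatmap together. As a rough sketch of how such inputs could be stacked channel-wise for a convolutional network, under the assumption of single-channel planes represented as nested lists (the function name and representation are illustrative, not from the disclosure):

```python
def assemble_model_input(target_frame, reference_frame, heatmap):
    """Stack the three same-sized planes channel-wise, producing the
    (channels, height, width) layout a CNN-style first identification
    model could consume."""
    assert len(target_frame) == len(reference_frame) == len(heatmap)
    return [target_frame, reference_frame, heatmap]

h, w = 4, 4
target = [[0.0] * w for _ in range(h)]
reference = [[1.0] * w for _ in range(h)]
heat = [[0.5] * w for _ in range(h)]
stacked = assemble_model_input(target, reference, heat)
print(len(stacked), len(stacked[0]), len(stacked[0][0]))  # 3 4 4
```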
It should be noted that, in practice, the executing subject of the steps for generating the model may be the same as or different from the executing subject of the method for generating information. If they are the same, the executing subject of the steps for generating the model may store the trained model locally after training. If they are different, the executing subject of the steps for generating the model may send the trained model, after training, to the executing subject of the method for generating information.
With continued reference to Fig. 4, Fig. 4 is a schematic diagram of an application scenario of the method for generating information according to this embodiment. In the application scenario of Fig. 4, the server 401 may first obtain a target face video 402 pre-stored locally, and extract a target video frame 403 and a reference video frame 404 of the target video frame 403 from the target face video 402, where the reference video frame 404 is adjacent to the target video frame 403; for example, the reference video frame 404 may be the video frame that is adjacent to, and located before, the target video frame 403.
Then, the server 401 may determine face key point information 405 corresponding to the reference video frame 404 and, based on the determined face key point information 405 and a preset image 406, generate a heatmap 407 corresponding to the reference video frame 404, where the preset image 406 has the same shape and size as the reference video frame 404, and the image region of the heatmap 407 includes a set of values; each value in the set characterizes the probability that a face key point is at the position of that value.
Finally, the server 401 may input the target video frame 403, the reference video frame 404, and the generated heatmap 407 into a pre-trained first identification model 408 to obtain face key point information 409 corresponding to the target video frame 403.
The method provided by the above embodiment of the present disclosure can use the face key point information of the reference video frame of the target video frame as reference data for generating the face key point information of the target video frame, which helps to reduce the jitter of face key points between consecutive video frames and improves the stability of face key point localization.
With further reference to Fig. 5, a flow 500 of another embodiment of the method for generating information is illustrated. The flow 500 of the method for generating information includes the following steps:
Step 501: extract a target video frame and a reference video frame of the target video frame from a target face video.
In this embodiment, the executing subject of the method for generating information (e.g., the server shown in Fig. 1) may obtain the target face video through a wired or wireless connection, and extract the target video frame and the reference video frame of the target video frame from the target face video. Here, the target face video is a face video on which face key point detection is to be performed. A face video may be a video obtained by shooting a human face, and the video frames included in the face video include face images. In practice, face key points may be key points in a face; specifically, they may be points that affect the facial contour or the shape of the facial features.
In this embodiment, the target video frame is the video frame whose corresponding face key point information is to be determined. Face key point information characterizes the positions of face key points in a video frame, and may include, but is not limited to, at least one of the following: text, numbers, symbols, and images. The reference video frame of the target video frame is a video frame used for determining the face key point information corresponding to the target video frame. Here, the reference video frame is adjacent to the target video frame.
Step 502: input the reference video frame into a pre-trained second identification model to obtain the face key point information corresponding to the reference video frame, and generate a heatmap corresponding to the reference video frame based on the determined face key point information and a preset image.
In this embodiment, based on the reference video frame obtained in step 501, the above executing subject may input the reference video frame into the pre-trained second identification model to obtain the face key point information corresponding to the reference video frame. Here, the second identification model characterizes the correspondence between a face image and the face key point information corresponding to the face image. Specifically, as an example, the second identification model may be a correspondence table pre-established by technicians based on statistics of a large number of face images and their corresponding face key point information, storing multiple face images in correspondence with their face key point information; or it may be a model obtained by training an initial model (e.g., a neural network) with a machine learning method based on preset training samples.
In some optional implementations of this embodiment, the second identification model may be trained through the following steps. First, a training sample set is obtained, where each training sample includes a sample face image and sample face key point information pre-annotated for the sample face image. Then, using a machine learning method, the sample face image of each training sample in the training sample set is taken as input, the sample face key point information corresponding to the input sample face image is taken as the desired output, and the second identification model is obtained through training.
Specifically, the sample face image of each training sample in the training sample set may be taken as the input of a predetermined initial model (e.g., a convolutional neural network), the sample face key point information corresponding to the input sample face image may be taken as the desired output of the initial model, and the initial model may be trained, finally obtaining the second identification model through training.
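As a loose analogy for this input/desired-output training pattern, the following toy example fits a single scalar parameter by gradient descent; it is a stand-in only, since the actual second identification model would be a neural network operating on face images, and all names here are illustrative.

```python
def fit_scalar_model(inputs, desired_outputs, lr=0.1, steps=200):
    """Toy gradient-descent loop: fit one parameter w so that w * x
    approximates each desired output, mirroring the pattern of training
    an initial model on (input, desired output) pairs."""
    w = 0.0
    n = len(inputs)
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(inputs, desired_outputs))
        w -= lr * grad / n
    return w

# Inputs 1, 2, 3 with desired outputs 2, 4, 6: training recovers w close to 2
w = fit_scalar_model([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(w, 3))  # 2.0
```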
It can be appreciated that the reference video frame is a video frame extracted from the target face video, and the target face video is essentially a sequence of face images arranged in chronological order. Therefore, the reference video frame is essentially a face image. In turn, the above executing subject can determine the face key point information corresponding to the reference video frame based on the second identification model.
In this embodiment, the method for generating the heatmap corresponding to the reference video frame based on the determined face key point information and the preset image may be the same as the method in the embodiment corresponding to Fig. 2; details are not repeated here.
Step 503: input the target video frame, the reference video frame, and the generated heatmap into the pre-trained first identification model to obtain the face key point information corresponding to the target video frame.
In this embodiment, based on the target video frame and reference video frame obtained in step 501 and the heatmap obtained in step 502, the above executing subject may input the target video frame, the reference video frame, and the generated heatmap into the pre-trained first identification model to obtain the face key point information corresponding to the target video frame.
In this embodiment, the first identification model may characterize the correspondence between, on the one hand, the target video frame, the reference video frame, and the heatmap corresponding to the reference video frame and, on the other hand, the face key point information corresponding to the target video frame.
The above step 501 and step 503 are consistent with step 201 and step 203 in the preceding embodiment, respectively; the descriptions of step 201 and step 203 above also apply to step 501 and step 503, and details are not repeated here.
As can be seen from Fig. 5, compared with the embodiment corresponding to Fig. 2, the flow 500 of the method for generating information in this embodiment highlights the step of determining the face key point information corresponding to the reference video frame using the second identification model. Therefore, the solution described in this embodiment can use the second identification model to generate more accurate face key point information corresponding to the reference video frame, and can in turn use that face key point information to generate more accurate face key point information corresponding to the target video frame, improving the accuracy of information generation.
With further reference to Fig. 6, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating information. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may specifically be applied to various electronic devices.
As shown in Fig. 6, the apparatus 600 for generating information of this embodiment includes an extraction unit 601, a determination unit 602, and a generation unit 603. The extraction unit 601 is configured to extract a target video frame and a reference video frame of the target video frame from a target face video, where the reference video frame is adjacent to the target video frame. The determination unit 602 is configured to determine face key point information corresponding to the reference video frame and, based on the determined face key point information and a preset image, generate a heatmap corresponding to the reference video frame, where the preset image has the same shape and size as the reference video frame, the image region of the heatmap includes a set of values, and each value in the set characterizes the probability that a face key point is at the position of that value. The generation unit 603 is configured to input the target video frame, the reference video frame, and the generated heatmap into a pre-trained first identification model to obtain face key point information corresponding to the target video frame.
In this embodiment, the extraction unit 601 of the apparatus 600 for generating information may obtain the target face video through a wired or wireless connection, and extract the target video frame and the reference video frame of the target video frame from the target face video. Here, the target face video is a face video on which face key point detection is to be performed. A face video may be a video obtained by shooting a human face, and the video frames included in the face video include face images. In practice, face key points may be key points in a face; specifically, they may be points that affect the facial contour or the shape of the facial features.
In this embodiment, the target video frame is the video frame whose corresponding face key point information is to be determined. Face key point information characterizes the positions of face key points in a video frame, and may include, but is not limited to, at least one of the following: text, numbers, symbols, and images. The reference video frame of the target video frame is a video frame used for determining the face key point information corresponding to the target video frame. Here, the reference video frame is adjacent to the target video frame.
In this embodiment, based on the reference video frame obtained by the extraction unit 601, the determination unit 602 determines the face key point information corresponding to the reference video frame and, based on the determined face key point information and a preset image, generates the heatmap corresponding to the reference video frame. Here, the preset image may be a pre-set image used for generating the heatmap, and may have the same shape and size as the reference video frame. The image region of the heatmap includes a set of values; each value in the set characterizes the probability that a face key point is at the position of that value.
In this embodiment, based on the target video frame and reference video frame obtained by the extraction unit 601 and the heatmap obtained by the determination unit 602, the generation unit 603 may input the target video frame, the reference video frame, and the generated heatmap into the pre-trained first identification model to obtain the face key point information corresponding to the target video frame. Here, the first identification model may characterize the correspondence between the target video frame, the reference video frame, and the heatmap corresponding to the reference video frame, on the one hand, and the face key point information corresponding to the target video frame, on the other.
In some optional implementations of this embodiment, the determination unit 602 may be further configured to input the reference video frame into a pre-trained second identification model to obtain the face key point information corresponding to the reference video frame.
In some optional implementations of this embodiment, the second identification model may be trained through the following steps: obtaining a training sample set, where each training sample includes a sample face image and sample face key point information pre-annotated for the sample face image; and, using a machine learning method, taking the sample face image of each training sample in the training sample set as input and the sample face key point information corresponding to the input sample face image as the desired output, training to obtain the second identification model.
In some optional implementations of this embodiment, the first identification model may be trained through the following steps: obtaining multiple sample video frame groups, where each sample video frame group includes two adjacent video frames extracted from a sample face video; for each sample video frame group among the multiple sample video frame groups, executing the following steps: determining a sample target video frame and a sample reference video frame from the sample video frame group; determining face key point information corresponding to the sample reference video frame in the sample video frame group, and determining face key point information corresponding to the sample target video frame in the sample video frame group as sample face key point information; generating a sample heatmap based on the face key point information corresponding to the sample reference video frame and a preset image; and forming a training sample from the sample video frame group, the generated sample heatmap, and the sample face key point information of the sample target video frame; and then, using a machine learning method, taking the sample video frame group and the sample heatmap included in each training sample of the formed training samples as input and the sample face key point information of the sample target video frame corresponding to the input sample video frame group and sample heatmap as the desired output, training to obtain the first identification model.
In some optional implementations of this embodiment, determining the face key point information corresponding to the sample target video frame in the sample video frame group as the sample face key point information includes: determining initial face key point information corresponding to the sample target video frame in the sample video frame group; and, based on weights pre-assigned to the face key point information of the sample reference video frame and to the initial face key point information of the sample target video frame respectively, performing a weighted summation on the determined face key point information of the sample reference video frame and the determined initial face key point information of the sample target video frame, and taking the result of the processing as the sample face key point information of the sample target video frame in the sample video frame group.
In some optional implementations of this embodiment, the determination unit 602 may include: a first generation module (not shown in the figure) configured to generate, on the preset image using a Gaussian function, the set of values corresponding to the face key point information of the reference video frame; and a second generation module (not shown in the figure) configured to generate the heatmap corresponding to the reference video frame based on the preset image including the generated set of values.
It can be understood that the units recorded in the apparatus 600 correspond to the respective steps in the method described with reference to Fig. 2. Therefore, the operations, features, and beneficial effects described above with respect to the method are equally applicable to the apparatus 600 and the units included therein; details are not repeated here.
The apparatus 600 provided by the above embodiment of the present disclosure can use the face key point information of the reference video frame of the target video frame as reference data for generating the face key point information of the target video frame, which helps to reduce the jitter of face key points between consecutive video frames and improves the stability of face key point localization.
Referring now to Fig. 7, a structural schematic diagram of an electronic device 700 (e.g., the server or terminal device shown in Fig. 1) suitable for implementing embodiments of the present disclosure is illustrated. Terminal devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The terminal device or server shown in Fig. 7 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of the present disclosure.
As shown in Fig. 7, the electronic device 700 may include a processing unit (e.g., a central processing unit, a graphics processor, etc.) 701, which may execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. Various programs and data required for the operation of the electronic device 700 are also stored in the RAM 703. The processing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
In general, the following devices may be connected to the I/O interface 705: an input device 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, or gyroscope; an output device 707 including, for example, a liquid crystal display (LCD), speaker, or vibrator; a storage device 708 including, for example, a magnetic tape or hard disk; and a communication device 709. The communication device 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 7 shows an electronic device 700 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or present; more or fewer devices may alternatively be implemented or present. Each block shown in Fig. 7 may represent one device or, as needed, multiple devices.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 709, installed from the storage device 708, or installed from the ROM 702. When the computer program is executed by the processing unit 701, the above-described functions defined in the methods of the embodiments of the present disclosure are executed. It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in connection with an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted with any suitable medium, including but not limited to: electric wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device. The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: extract a target video frame and a reference video frame of the target video frame from a target face video, where the reference video frame is adjacent to the target video frame; determine face key point information corresponding to the reference video frame and, based on the determined face key point information and a preset image, generate a heatmap corresponding to the reference video frame, where the preset image has the same shape and size as the reference video frame, the image region of the heatmap includes a set of values, and each value in the set characterizes the probability that a face key point is at the position of that value; and input the target video frame, the reference video frame, and the generated heatmap into a pre-trained first identification model to obtain face key point information corresponding to the target video frame.
Computer program code for executing the operations of the embodiments of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in a different order than noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or by combinations of dedicated hardware and computer instructions.
Being described in unit involved in embodiment of the disclosure can be realized by way of software, can also be passed through The mode of hardware is realized.Described unit also can be set in the processor, for example, can be described as: a kind of processor Including extraction unit, determination unit and generation unit.Wherein, the title of these units is not constituted under certain conditions to the list The restriction of member itself, for example, extraction unit is also described as " extracting the unit of video frame ".
Above description is only the preferred embodiment of the disclosure and the explanation to institute's application technology principle.Those skilled in the art Member it should be appreciated that embodiment of the disclosure involved in invention scope, however it is not limited to the specific combination of above-mentioned technical characteristic and At technical solution, while should also cover do not depart from foregoing invention design in the case where, by above-mentioned technical characteristic or its be equal Feature carries out any combination and other technical solutions for being formed.Such as disclosed in features described above and embodiment of the disclosure (but It is not limited to) technical characteristic with similar functions is replaced mutually and the technical solution that is formed.

Claims (14)

1. A method for generating information, comprising:
extracting a target video frame and a reference video frame of the target video frame from a target face video, wherein the reference video frame is adjacent to the target video frame;
determining face key point information corresponding to the reference video frame, and generating, based on the determined face key point information and a preset image, a heatmap corresponding to the reference video frame, wherein the shape and size of the preset image are respectively identical to those of the reference video frame, an image region of the heatmap comprises a set of values, and each value in the set characterizes a probability that a face key point is located at the position of that value;
inputting the target video frame, the reference video frame and the generated heatmap into a pre-trained first recognition model to obtain face key point information corresponding to the target video frame.
2. The method according to claim 1, wherein the determining face key point information corresponding to the reference video frame comprises:
inputting the reference video frame into a pre-trained second recognition model to obtain the face key point information corresponding to the reference video frame.
3. The method according to claim 2, wherein the second recognition model is trained as follows:
obtaining a training sample set, wherein each training sample comprises a sample face image and sample face key point information annotated in advance for the sample face image;
using a machine learning method, training to obtain the second recognition model by taking the sample face image of a training sample in the training sample set as an input and the sample face key point information corresponding to the input sample face image as a desired output.
4. The method according to claim 1, wherein the first recognition model is trained through the following steps:
obtaining a plurality of sample video frame groups, wherein each sample video frame group comprises two adjacent video frames extracted from a sample face video;
for a sample video frame group among the plurality of sample video frame groups, performing the following steps: determining a sample target video frame and a sample reference video frame from the sample video frame group; determining face key point information corresponding to the sample reference video frame in the sample video frame group, and determining face key point information corresponding to the sample target video frame in the sample video frame group as sample face key point information; generating a sample heatmap based on the face key point information corresponding to the sample reference video frame and a preset image; and composing a training sample from the sample video frame group, the generated sample heatmap and the sample face key point information of the sample target video frame;
using a machine learning method, training to obtain the first recognition model by taking the sample video frame group and the sample heatmap included in a composed training sample as inputs and the sample face key point information of the sample target video frame corresponding to the input sample video frame group and sample heatmap as a desired output.
5. The method according to claim 4, wherein the determining face key point information corresponding to the sample target video frame in the sample video frame group as sample face key point information comprises:
determining initial face key point information corresponding to the sample target video frame in the sample video frame group;
performing, based on weights assigned in advance to the face key point information of the sample reference video frame and to the initial face key point information of the sample target video frame respectively, a weighted summation of the determined face key point information of the sample reference video frame and the determined initial face key point information of the sample target video frame, and taking the processing result as the sample face key point information of the sample target video frame in the sample video frame group.
6. The method according to one of claims 1-5, wherein the generating, based on the determined face key point information and the preset image, a heatmap corresponding to the reference video frame comprises:
generating, on the preset image using a Gaussian function, the set of values corresponding to the face key point information of the reference video frame;
generating the heatmap corresponding to the reference video frame based on the preset image including the generated set of values.
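Claims 1 and 6 describe building a heatmap by placing Gaussian responses at the face key points on a preset image whose shape and size match the reference video frame. The following is a minimal illustrative sketch in Python/NumPy, not part of the claims; the image size, key point coordinates and the standard deviation `sigma` are assumed values chosen for the example:

```python
import numpy as np

def make_heatmap(keypoints, height, width, sigma=3.0):
    """Build a heatmap the same shape and size as the reference frame.

    Each pixel holds a value characterizing the probability that a face
    key point lies at that position (claim 1), generated with a Gaussian
    function centered on each key point (claim 6).
    """
    heatmap = np.zeros((height, width), dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    for (x, y) in keypoints:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        heatmap = np.maximum(heatmap, g)  # keep the strongest response per pixel
    return heatmap

# Hypothetical key points (e.g. two eye corners) on a 64x64 "preset image".
hm = make_heatmap([(20, 24), (44, 24)], height=64, width=64)
print(hm.shape)                      # (64, 64)
print(round(float(hm[24, 20]), 3))   # 1.0 at a key point location
```

Taking the per-pixel maximum over key points keeps every value in [0, 1], so each pixel remains interpretable as a probability that some key point lies at that position.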
7. An apparatus for generating information, comprising:
an extraction unit, configured to extract a target video frame and a reference video frame of the target video frame from a target face video, wherein the reference video frame is adjacent to the target video frame;
a determination unit, configured to determine face key point information corresponding to the reference video frame, and generate, based on the determined face key point information and a preset image, a heatmap corresponding to the reference video frame, wherein the shape and size of the preset image are respectively identical to those of the reference video frame, an image region of the heatmap comprises a set of values, and each value in the set characterizes a probability that a face key point is located at the position of that value;
a generation unit, configured to input the target video frame, the reference video frame and the generated heatmap into a pre-trained first recognition model to obtain face key point information corresponding to the target video frame.
8. The apparatus according to claim 7, wherein the determination unit is further configured to:
input the reference video frame into a pre-trained second recognition model to obtain the face key point information corresponding to the reference video frame.
9. The apparatus according to claim 8, wherein the second recognition model is trained as follows:
obtaining a training sample set, wherein each training sample comprises a sample face image and sample face key point information annotated in advance for the sample face image;
using a machine learning method, training to obtain the second recognition model by taking the sample face image of a training sample in the training sample set as an input and the sample face key point information corresponding to the input sample face image as a desired output.
10. The apparatus according to claim 7, wherein the first recognition model is trained through the following steps:
obtaining a plurality of sample video frame groups, wherein each sample video frame group comprises two adjacent video frames extracted from a sample face video;
for a sample video frame group among the plurality of sample video frame groups, performing the following steps: determining a sample target video frame and a sample reference video frame from the sample video frame group; determining face key point information corresponding to the sample reference video frame in the sample video frame group, and determining face key point information corresponding to the sample target video frame in the sample video frame group as sample face key point information; generating a sample heatmap based on the face key point information corresponding to the sample reference video frame and a preset image; and composing a training sample from the sample video frame group, the generated sample heatmap and the sample face key point information of the sample target video frame;
using a machine learning method, training to obtain the first recognition model by taking the sample video frame group and the sample heatmap included in a composed training sample as inputs and the sample face key point information of the sample target video frame corresponding to the input sample video frame group and sample heatmap as a desired output.
11. The apparatus according to claim 10, wherein the determining face key point information corresponding to the sample target video frame in the sample video frame group as sample face key point information comprises:
determining initial face key point information corresponding to the sample target video frame in the sample video frame group;
performing, based on weights assigned in advance to the face key point information of the sample reference video frame and to the initial face key point information of the sample target video frame respectively, a weighted summation of the determined face key point information of the sample reference video frame and the determined initial face key point information of the sample target video frame, and taking the processing result as the sample face key point information of the sample target video frame in the sample video frame group.
12. The apparatus according to one of claims 7-11, wherein the determination unit comprises:
a first generation module, configured to generate, on the preset image using a Gaussian function, the set of values corresponding to the face key point information of the reference video frame;
a second generation module, configured to generate the heatmap corresponding to the reference video frame based on the preset image including the generated set of values.
13. An electronic device, comprising:
one or more processors; and
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-6.
14. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6.
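The weighted summation recited in claims 5 and 11 — fusing the reference frame's key point information with the target frame's initial key point information using pre-assigned weights — can be sketched as follows. This is an illustrative sketch, not part of the claims; the 0.3/0.7 weight split and the key point coordinates are assumptions for the example, with the two weights assumed to sum to 1:

```python
import numpy as np

def fuse_keypoints(ref_kpts, initial_target_kpts, ref_weight=0.3):
    """Weighted summation of key point coordinates (claims 5 and 11).

    ref_kpts:            key points of the sample reference video frame
    initial_target_kpts: initial key points of the sample target video frame
    Because the two weights sum to 1, the result stays in coordinate space.
    """
    ref = np.asarray(ref_kpts, dtype=np.float64)
    init = np.asarray(initial_target_kpts, dtype=np.float64)
    return ref_weight * ref + (1.0 - ref_weight) * init

# Two hypothetical key points drifting slightly between adjacent frames.
fused = fuse_keypoints([(20.0, 24.0), (44.0, 24.0)],
                       [(22.0, 26.0), (46.0, 26.0)])
print(np.round(fused, 2).tolist())  # [[21.4, 25.4], [45.4, 25.4]]
```

Pulling the target frame's initial estimate toward the adjacent reference frame in this way is what damps frame-to-frame jitter of the key points, as described in the abstract.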
CN201910099415.1A 2019-01-31 2019-01-31 Method and apparatus for generating information Active CN109829432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910099415.1A CN109829432B (en) 2019-01-31 2019-01-31 Method and apparatus for generating information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910099415.1A CN109829432B (en) 2019-01-31 2019-01-31 Method and apparatus for generating information

Publications (2)

Publication Number Publication Date
CN109829432A true CN109829432A (en) 2019-05-31
CN109829432B CN109829432B (en) 2020-11-20

Family

ID=66863303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910099415.1A Active CN109829432B (en) 2019-01-31 2019-01-31 Method and apparatus for generating information

Country Status (1)

Country Link
CN (1) CN109829432B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102177726A (en) * 2008-08-21 2011-09-07 杜比实验室特许公司 Feature optimization and reliability estimation for audio and video signature generation and detection
US20130195341A1 (en) * 2012-01-31 2013-08-01 Ge Medical Systems Global Technology Company Method for sorting ct image slices and method for constructing 3d ct image
CN107590482A (en) * 2017-09-29 2018-01-16 百度在线网络技术(北京)有限公司 information generating method and device
US20180101733A1 (en) * 2012-06-20 2018-04-12 Kuntal Sengupta Active presence detection with depth sensing
CN108205655A * 2017-11-07 2018-06-26 北京市商汤科技开发有限公司 Key point prediction method, apparatus, electronic device and storage medium
CN108304765A (en) * 2017-12-11 2018-07-20 中国科学院自动化研究所 Multitask detection device for face key point location and semantic segmentation
CN108960064A * 2018-06-01 2018-12-07 重庆锐纳达自动化技术有限公司 Face detection and recognition method based on convolutional neural networks
US20180353072A1 (en) * 2017-06-08 2018-12-13 Fdna Inc. Systems, methods, and computer-readable media for gene and genetic variant prioritization
CN109034085A (en) * 2018-08-03 2018-12-18 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN109101919A (en) * 2018-08-03 2018-12-28 北京字节跳动网络技术有限公司 Method and apparatus for generating information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHIWU HUANG: "A Benchmark and Comparative Study of Video-Based Face Recognition on COX Face Database", IEEE Transactions on Image Processing *
LIU Chunping: "Real-time dynamic expression transfer based on face key points", Modern Computer (《现代计算机》) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532891A (en) * 2019-08-05 2019-12-03 北京地平线机器人技术研发有限公司 Target object state identification method, device, medium and equipment
CN110532891B (en) * 2019-08-05 2022-04-05 北京地平线机器人技术研发有限公司 Target object state identification method, device, medium and equipment
CN111797665A (en) * 2019-08-21 2020-10-20 北京沃东天骏信息技术有限公司 Method and apparatus for converting video
CN111797665B (en) * 2019-08-21 2023-12-08 北京沃东天骏信息技术有限公司 Method and apparatus for converting video
CN111028144A (en) * 2019-12-09 2020-04-17 腾讯音乐娱乐科技(深圳)有限公司 Video face changing method and device and storage medium
CN111028144B (en) * 2019-12-09 2023-06-20 腾讯音乐娱乐科技(深圳)有限公司 Video face changing method and device and storage medium
CN111027495A (en) * 2019-12-12 2020-04-17 京东数字科技控股有限公司 Method and device for detecting key points of human body
CN112101109A (en) * 2020-08-11 2020-12-18 深圳数联天下智能科技有限公司 Face key point detection model training method and device, electronic equipment and medium
CN112101109B (en) * 2020-08-11 2024-04-30 深圳数联天下智能科技有限公司 Training method and device for face key point detection model, electronic equipment and medium
CN112381926A (en) * 2020-11-13 2021-02-19 北京有竹居网络技术有限公司 Method and apparatus for generating video
CN113128436A (en) * 2021-04-27 2021-07-16 北京百度网讯科技有限公司 Method and device for detecting key points
CN113128436B (en) * 2021-04-27 2022-04-01 北京百度网讯科技有限公司 Method and device for detecting key points
CN113177603A (en) * 2021-05-12 2021-07-27 中移智行网络科技有限公司 Training method of classification model, video classification method and related equipment
CN117437505A (en) * 2023-12-18 2024-01-23 杭州任性智能科技有限公司 Training data set generation method and system based on video

Also Published As

Publication number Publication date
CN109829432B (en) 2020-11-20

Similar Documents

Publication Publication Date Title
CN109829432A (en) Method and apparatus for generating information
CN109858445A (en) Method and apparatus for generating model
CN108805091B (en) Method and apparatus for generating a model
CN111476871B (en) Method and device for generating video
CN109993150B (en) Method and device for identifying age
CN109800732A Method and apparatus for generating a model for generating caricature avatars
CN108830235A (en) Method and apparatus for generating information
CN110288049A (en) Method and apparatus for generating image recognition model
CN109086719A (en) Method and apparatus for output data
CN109981787B (en) Method and device for displaying information
CN110009059B (en) Method and apparatus for generating a model
CN109815365A (en) Method and apparatus for handling video
CN109977839A (en) Information processing method and device
CN109948700A (en) Method and apparatus for generating characteristic pattern
CN110084317A Method and apparatus for image recognition
CN110059624A (en) Method and apparatus for detecting living body
CN109754464A (en) Method and apparatus for generating information
CN110110666A (en) Object detection method and device
CN110427915A (en) Method and apparatus for output information
CN110111241A (en) Method and apparatus for generating dynamic image
CN110097004B (en) Facial expression recognition method and device
CN110008926B (en) Method and device for identifying age
CN109117758A (en) Method and apparatus for generating information
CN109829431A (en) Method and apparatus for generating information
CN110046571B (en) Method and device for identifying age

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant