CN110287912A - Method, apparatus and medium for determining the affective state of a target object based on deep learning - Google Patents

Method, apparatus and medium for determining the affective state of a target object based on deep learning

Info

Publication number
CN110287912A
CN110287912A (application CN201910576185.3A)
Authority
CN
China
Prior art keywords
affective state
target object
model
face
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910576185.3A
Other languages
Chinese (zh)
Inventor
赵志舜
黄国恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN201910576185.3A
Publication of CN110287912A
Legal status: Pending (Current)


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 - Facial expression recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses a method for determining the affective state of a target object based on deep learning, comprising: acquiring image frames that contain the target object from a video clip; inputting the image frames separately into a face detection model and a behavior detection model, and determining a corresponding expression affective state and behavior affective state from the recognized facial features and behavioral features; and determining the target affective state of the target object according to the expression affective state and the behavior affective state. Compared with the prior art, the method for determining the affective state of a target object based on deep learning provided by this embodiment additionally takes into account the influence of the target object's behavioral features on the determined affective state, and can therefore obtain the affective state of the target object more accurately. Also disclosed are an apparatus for determining the affective state of a target object based on deep learning and a computer-readable storage medium, both of which have the above beneficial effects.

Description

Method, apparatus and medium for determining the affective state of a target object based on deep learning
Technical field
The present invention relates to the field of image recognition, and in particular to a method, an apparatus and a computer-readable storage medium for determining the affective state of a target object based on deep learning.
Background art
With the development of information technology, video technology has become increasingly common in daily life. For example, people use Internet chat rooms for video chat, multinational companies hold video conferences over the network, and public places such as subways, squares and supermarkets carry out video surveillance through cameras. At present, in order to enable people to understand the affective state of a target object in a video clip more directly and accurately while watching the video, to improve the efficiency of communication between people, or to give timely warning of a target object in an abnormal affective state and avoid danger, a method for determining the affective state of a target object based on deep learning has been proposed. In this method, image recognition is performed on image frames extracted from the video clip, and the affective state of the target object is determined according to the recognized expression of the target object. However, since the expression of the target object and its actual affective state do not correspond exactly, determining the affective state of the target object according to the prior-art method is subject to error.
Therefore, how to improve the accuracy of determining the affective state of a target object is a technical problem that currently needs to be solved by those skilled in the art.
Summary of the invention
In view of this, an object of the present invention is to provide a method for determining the affective state of a target object based on deep learning, which can improve the accuracy of determining the affective state of the target object; a further object of the present invention is to provide an apparatus for determining the affective state of a target object based on deep learning and a computer-readable storage medium, both of which have the above beneficial effects.
In order to solve the above technical problems, the present invention provides a method for determining the affective state of a target object based on deep learning, comprising:
acquiring image frames that contain the target object from a video clip;
inputting the image frames separately into a face detection model and a behavior detection model, and determining a corresponding expression affective state and behavior affective state from the recognized facial features and behavioral features, wherein the face detection model and the behavior detection model are models configured through deep learning; and
determining the target affective state of the target object according to the expression affective state and the behavior affective state.
Preferably, the process of inputting the image frames separately into the face detection model and the behavior detection model and determining the corresponding expression affective state and behavior affective state from the recognized facial features and behavioral features specifically comprises:
inputting the image frames into the face detection model, and performing face detection and facial key point extraction to obtain face coordinates of the target object;
inputting the face images corresponding to the face coordinates into a pre-trained facial expression classification model to obtain the corresponding expression affective state;
inputting the image frames into the behavior detection model, and performing human body detection and human body key point extraction to obtain limb coordinates of the target object; and
inputting the image frames annotated with the limb coordinates into a pre-trained limb expression classification model to obtain the corresponding behavior affective state.
Preferably, after acquiring the image frames that contain the target object from the video clip, the method further comprises:
calculating the similarity between any two image frames and a sampled distance threshold using the feature vector of each image frame;
performing initial clustering on the image frames using the sampled distance threshold and the similarities, and optimizing the result to obtain target cluster centres; and
determining key frames using the target cluster centres.
Correspondingly, the process of inputting the image frames separately into the face detection model and the behavior detection model and determining the corresponding expression affective state and behavior affective state from the recognized facial features and behavioral features is specifically:
inputting the key frames separately into the face detection model and the behavior detection model, and determining the corresponding expression affective state and behavior affective state from the recognized facial features and behavioral features.
Preferably, the face detection model and the behavior detection model are specifically the face detection model and the behavior detection model in a Two-pathway CNN network.
Preferably, the method further comprises:
recording the affective state of the target object and the corresponding time at which the affective state was determined.
In order to solve the above technical problems, the present invention also provides an apparatus for determining the affective state of a target object based on deep learning, comprising:
an acquisition module, configured to acquire image frames that contain the target object from a video clip;
a recognition module, configured to input the image frames separately into a face detection model and a behavior detection model and to determine a corresponding expression affective state and behavior affective state from the recognized facial features and behavioral features, wherein the face detection model and the behavior detection model are models configured through deep learning; and
a determination module, configured to determine the target affective state of the target object according to the expression affective state and the behavior affective state.
Preferably, the apparatus further comprises:
a calculation module, configured to calculate the similarity between any two image frames and a sampled distance threshold using the feature vector of each image frame;
a clustering module, configured to perform initial clustering on the image frames using the sampled distance threshold and the similarities and to optimize the result to obtain target cluster centres; and
a determination module, configured to determine key frames using the target cluster centres.
In order to solve the above technical problems, the present invention also provides another apparatus for determining the affective state of a target object based on deep learning, comprising:
a memory, configured to store a computer program; and
a processor, configured to implement the steps of any one of the above methods for determining the affective state of a target object based on deep learning when executing the computer program.
In order to solve the above technical problems, the present invention also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of any one of the above methods for determining the affective state of a target object based on deep learning.
The method for determining the affective state of a target object based on deep learning provided by the present invention additionally takes into account the influence of the target object's behavioral features on the determined affective state. After the image frames that contain the target object are acquired from the video clip, the image frames are input separately into the face detection model and the behavior detection model, the corresponding expression affective state and behavior affective state are determined from the recognized facial features and behavioral features, and the target affective state of the target object is then determined according to the expression affective state and the behavior affective state. The affective state of the target object can therefore be obtained more accurately.
The present invention also provides an apparatus for determining the affective state of a target object based on deep learning and a computer-readable storage medium, both of which have the above beneficial effects.
Brief description of the drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from the provided drawings by those of ordinary skill in the art without creative effort.
Fig. 1 is a flowchart of a method for determining the affective state of a target object based on deep learning according to an embodiment of the present invention;
Fig. 2 is a structural diagram of an apparatus for determining the affective state of a target object based on deep learning according to an embodiment of the present invention;
Fig. 3 is a structural diagram of another apparatus for determining the affective state of a target object based on deep learning according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The core of the embodiments of the present invention is to provide a method for determining the affective state of a target object based on deep learning, which can improve the accuracy of determining the affective state of the target object; another core of the present invention is to provide an apparatus for determining the affective state of a target object based on deep learning and a computer-readable storage medium, both of which have the above beneficial effects.
In order to enable those skilled in the art to better understand the solution of the present invention, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of a method for determining the affective state of a target object based on deep learning according to an embodiment of the present invention. As shown in Fig. 1, the method for determining the affective state of a target object based on deep learning comprises:
S10: acquiring image frames that contain the target object from a video clip.
Specifically, in this embodiment, after the video clip is obtained, the image frames used to determine the affective state of the target object are first extracted from the video clip. Since the image frames need to be analysed and recognized, they must contain the target object.
S20: inputting the image frames separately into a face detection model and a behavior detection model, and determining a corresponding expression affective state and behavior affective state from the recognized facial features and behavioral features, wherein the face detection model and the behavior detection model are models configured through deep learning.
Specifically, after the image frames are obtained, they are input separately into the face detection model and the behavior detection model for face recognition and behavior recognition. The facial features of the target object are obtained through face recognition, the behavioral features of the target object are obtained through behavior recognition, and the corresponding expression affective state and behavior affective state are then obtained from the facial features and the behavioral features respectively.
S30: determining the target affective state of the target object according to the expression affective state and the behavior affective state.
It should be understood that the same expression conveys different emotions under different behavioral states. For example, for the affective state of crying, without considering the accompanying behavior we cannot judge whether a person is weeping out of sorrow or weeping for joy. Specifically, in this embodiment the expression affective state and the behavior affective state are analysed jointly; for example, the target affective state of the target object can be determined by weighting the expression affective state and the behavior affective state with preset weights.
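For illustration only, the following sketch shows one way the weighted combination described above could be implemented, fusing per-class probability vectors from the two pathways with preset weights. The emotion label set, the weight values and the function name fuse_affective_states are assumptions made for this example and are not prescribed by the embodiment.

```python
import numpy as np

# Illustrative emotion classes; the embodiment does not fix a label set.
EMOTIONS = ["happy", "sad", "angry", "neutral"]

def fuse_affective_states(expr_probs, behav_probs, w_expr=0.6, w_behav=0.4):
    """Combine the expression and behavior affective states (assumed to be
    per-class probability vectors) with preset weights and return the
    resulting target affective state."""
    expr_probs = np.asarray(expr_probs, dtype=float)
    behav_probs = np.asarray(behav_probs, dtype=float)
    fused = w_expr * expr_probs + w_behav * behav_probs  # preset weights
    return EMOTIONS[int(np.argmax(fused))], fused

# Example: the expression pathway alone favours "sad", but the behavior
# pathway shifts the fused decision to "happy" (e.g. weeping for joy).
state, scores = fuse_affective_states([0.1, 0.5, 0.1, 0.3],
                                      [0.8, 0.05, 0.05, 0.1])
print(state, scores)
```

Any other fusion rule, such as learned weights, could be substituted here; the embodiment only requires that both affective states contribute to the final decision.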
It should be noted that, in practical applications, corresponding operations can be performed according to the determined target affective state of the target object. For example, assuming the video clip is obtained from a camera installed in a public place, target objects in an abnormal affective state, such as anger or rage, can be identified from the target affective states recognized in the video clip, so that early warning can be prepared and abnormal situations can be handled in time.
Compared with the prior art, the method for determining the affective state of a target object based on deep learning provided by this embodiment additionally takes into account the influence of the behavioral features of the target object on the determined affective state. After the image frames that contain the target object are acquired from the video clip, the image frames are input separately into the face detection model and the behavior detection model, the corresponding expression affective state and behavior affective state are determined from the recognized facial features and behavioral features, and the target affective state of the target object is then determined according to the expression affective state and the behavior affective state. The affective state of the target object can therefore be obtained more accurately.
On the basis of the above embodiment, this embodiment further explains and optimizes the technical solution. Specifically, the process of inputting the image frames separately into the face detection model and the behavior detection model and determining the corresponding expression affective state and behavior affective state from the recognized facial features and behavioral features specifically comprises:
inputting the image frames into the face detection model, and performing face detection and facial key point extraction to obtain the face coordinates of the target object;
inputting the face images corresponding to the face coordinates into a pre-trained facial expression classification model to obtain the corresponding expression affective state;
inputting the image frames into the behavior detection model, and performing human body detection and human body key point extraction to obtain the limb coordinates of the target object; and
inputting the image frames annotated with the limb coordinates into a pre-trained limb expression classification model to obtain the corresponding behavior affective state.
Specifically, the image frames are first input separately into the face detection model and the behavior detection model; face detection and facial key point extraction, as well as human body detection and human body key point extraction, are performed to obtain the face coordinates and the limb coordinates of the target object.
Then, the face images corresponding to the face coordinates are input into the pre-trained facial expression classification model to obtain the corresponding expression affective state, and the image frames annotated with the limb coordinates are input into the pre-trained limb expression classification model to obtain the corresponding behavior affective state.
In this embodiment, the face detection model and the behavior detection model are specifically the face detection model and the behavior detection model in a Two-pathway CNN network.
Specifically, in this embodiment face detection and human body detection are performed through the Two-pathway CNN network, which improves detection efficiency and saves detection time. Moreover, in the Two-pathway CNN network, the face detection model may specifically be an MTCNN convolutional neural network and the behavior detection model may specifically be a Faster R-CNN convolutional neural network, which is not limited in this embodiment. In addition, the facial expression classification model may specifically be a local CNN neural network and the limb expression classification model may specifically be a global CNN neural network, which is also not specifically limited in this embodiment.
It can be seen that this embodiment uses pre-trained models to obtain the expression affective state and the behavior affective state from the image frames, so a more accurate recognition result can be obtained.
It should be understood that a video clip, as unstructured dynamic data composed of consecutive, associated image frames, often contains redundant information. For example, within the same shot, the image frame at time t and the image frame at time t+1 often differ very little in visual content and features. Therefore, when recognizing the emotion of the target object in a video clip, using every image frame in the clip to analyse the affective state of the target object would make the analysis complicated, computationally expensive and redundant. Therefore, on the basis of the above embodiment, this embodiment further explains and optimizes the technical solution. Specifically, after acquiring the image frames that contain the target object from the video clip, the method further comprises:
calculating the similarity between any two image frames and a sampled distance threshold using the feature vector of each image frame;
performing initial clustering on the image frames using the sampled distance threshold and the similarities, and optimizing the result to obtain target cluster centres; and
determining key frames using the target cluster centres.
Specifically, in this embodiment the video clip containing the target object is obtained first, each image frame in the video clip is then extracted, the feature vector of each image frame is obtained, the Euclidean distance between any two image frames, i.e. the similarity between any two image frames, is calculated, and the calculated Euclidean distances are then used to calculate the sampled distance threshold.
More specifically, the Euclidean distance between the feature vectors of any two image frames is calculated as

$$\mathrm{dis}(F_i, F_j) = \sqrt{\sum_{r=1}^{n} \left(F_i^{r} - F_j^{r}\right)^2}$$

where dis(F_i, F_j) denotes the Euclidean distance between the feature vectors of any two frames i and j in the video; F_i and F_j denote the feature vectors of those two frames; F_i^r and F_j^r denote their r-th components; and n is the dimension of the feature vector of an image frame.
Specifically, the sampled distance threshold is calculated as

$$d = \frac{c}{N(N-1)/2} \sum_{i<j} \mathrm{dis}(F_i, F_j)$$

where c is a constant, N is the number of image frames, and N(N-1)/2 is the number of Euclidean distances calculated.
With this method, a different sampled distance threshold can be chosen for each video clip. Moreover, a relatively small calculated sampled distance threshold produces more initial clusters, which is conducive to the subsequent merging of cluster centres and the secondary clustering.
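A minimal sketch of the distance and threshold computation above, assuming the frame feature vectors are held in an (N, n) NumPy array; the function name and the value of the constant c are assumptions made for illustration.

```python
import numpy as np

def sampled_distance_threshold(features, c=0.5):
    """Compute the pairwise Euclidean distances between frame feature vectors
    and the sampled distance threshold d (a constant c times the mean of all
    pairwise distances). `features` is an (N, n) array."""
    features = np.asarray(features, dtype=float)
    n_frames = len(features)
    dists = {}
    for i in range(n_frames):
        for j in range(i + 1, n_frames):
            dists[(i, j)] = float(np.linalg.norm(features[i] - features[j]))
    # N*(N-1)/2 is the number of Euclidean distances calculated.
    d = c * sum(dists.values()) / (n_frames * (n_frames - 1) / 2)
    return dists, d
```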
After the sampled distance threshold and the similarities are calculated, initial clustering is performed on the image frames in the video clip using the sampled distance threshold and the similarities to obtain an initial cluster centre set C and a cluster number K.
Specifically, the feature vector of a randomly selected image frame is taken as a cluster centre, and the distance between each remaining feature vector and the existing cluster centres is calculated. It is then judged whether the minimum of these distances is not less than twice the sampled distance threshold, i.e. whether Min(dis) ≥ 2*d. If so, the cluster count is incremented by 1 and the image frame is taken as a new cluster centre, that is, its feature vector is added to the set of cluster centres; if not, the feature vector corresponding to the image frame is discarded. The next image frame is then selected and the judgement is repeated, until all image frames of the video clip have been processed, giving the initial cluster centre set C and the cluster number K.
Then, the obtained initial cluster centre set C is clustered again using the K-means algorithm to obtain an updated cluster centre set G. The distance between every two cluster centres in the cluster centre set G is calculated in turn; if Min(dis) ≤ 2*d, the two cluster centres are merged by sequential clustering and the corresponding updated cluster centre is calculated; otherwise, another pair of cluster centres is selected and the judgement is repeated, until every cluster centre in the cluster centre set G has been traversed, thereby obtaining the optimized target cluster centres.
After the target cluster centres are obtained, the image frame closest to each target cluster centre is selected from all the image frames as a key frame of the video clip. Correspondingly, the key frames are input separately into the face detection model and the behavior detection model, and the corresponding expression affective state and behavior affective state are determined from the recognized facial features and behavioral features.
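Putting the clustering steps together, the following sketch illustrates one possible implementation under stated assumptions: the first frame (rather than a random one) seeds the initial clustering, the secondary clustering uses scikit-learn's KMeans, and close centres are merged by simple averaging. The function name extract_key_frames and these implementation choices are illustrative, not prescribed by the embodiment.

```python
import numpy as np
from sklearn.cluster import KMeans  # used here for the secondary clustering

def extract_key_frames(features, d):
    """Select key-frame indices from an (N, n) array of frame feature vectors,
    given the sampled distance threshold d."""
    features = np.asarray(features, dtype=float)

    # 1) Initial clustering: a frame becomes a new cluster centre when its
    #    minimum distance to all existing centres is at least 2*d.
    centres = [features[0]]  # seed with the first frame (patent: a random frame)
    for f in features[1:]:
        min_dis = min(np.linalg.norm(f - c) for c in centres)
        if min_dis >= 2 * d:
            centres.append(f)  # new cluster centre
        # otherwise the frame is not kept as a centre candidate
    K = len(centres)

    # 2) Secondary clustering: refine the initial centres with K-means.
    km = KMeans(n_clusters=K, init=np.array(centres), n_init=1).fit(features)
    refined = list(km.cluster_centers_)

    # 3) Merge refined centres that lie within 2*d of each other.
    merged = []
    for c in refined:
        close = [i for i, m in enumerate(merged) if np.linalg.norm(c - m) <= 2 * d]
        if close:
            i = close[0]
            merged[i] = (merged[i] + c) / 2.0  # merge by averaging (assumption)
        else:
            merged.append(c)

    # 4) Key frames: for each target cluster centre, pick the nearest frame.
    key_idx = sorted({int(np.argmin(np.linalg.norm(features - m, axis=1)))
                      for m in merged})
    return key_idx
```

The returned indices would then be used to fetch the key frames that are fed to the face detection model and the behavior detection model.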
In this embodiment, by further selecting the key frames of the video, a large amount of redundant information between image frames can be reduced and the information contained in the video clip can be expressed more compactly. Using the selected key frames to determine the affective state of the target object further reduces the amount of computation and improves the real-time performance of determining the affective state of the target object.
On the basis of the above embodiments, this embodiment further explains and optimizes the technical solution. Specifically, this embodiment further comprises:
recording the affective state of the target object and the corresponding time at which the affective state was determined.
Specifically, after the target affective state of the target object is determined according to the expression affective state and the behavior affective state, the affective state of the target object and the corresponding time at which it was determined are recorded. It should be noted that the record may be kept as text or in the form of a table, which is not limited in this embodiment. More specifically, the record may be stored on a memory stick, a hard disk, a TF (Trans-flash) card, an SD (Secure Digital Memory) card or the like, selected according to actual needs, which is not limited in this embodiment.
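As a trivial example of this recording step, the sketch below appends each determined affective state and the time of determination to a CSV file; the file name and column layout are assumptions made for illustration.

```python
import csv
from datetime import datetime

def record_affective_state(state, path="affective_states.csv"):
    """Append the determined affective state and the determination time."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([datetime.now().isoformat(timespec="seconds"), state])

record_affective_state("happy")
```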
In this embodiment, by recording the affective state of the target object and the corresponding time at which the affective state was determined, subsequent analysis of changes in the target object's emotions is facilitated, which further improves the user experience.
The embodiments of the method for determining the affective state of a target object based on deep learning provided by the present invention have been described in detail above. The present invention also provides an apparatus for determining the affective state of a target object based on deep learning and a computer-readable storage medium corresponding to this method. Since the embodiments of the apparatus and the computer-readable storage medium correspond to the embodiments of the method, their description can refer to the description of the method embodiments and is not repeated here.
Fig. 2 is a structural diagram of an apparatus for determining the affective state of a target object based on deep learning according to an embodiment of the present invention. As shown in Fig. 2, the apparatus for determining the affective state of a target object based on deep learning comprises:
an acquisition module 21, configured to acquire image frames that contain the target object from a video clip;
a recognition module 22, configured to input the image frames separately into a face detection model and a behavior detection model and to determine a corresponding expression affective state and behavior affective state from the recognized facial features and behavioral features, wherein the face detection model and the behavior detection model are models configured through deep learning; and
a determination module 23, configured to determine the target affective state of the target object according to the expression affective state and the behavior affective state.
The apparatus for determining the affective state of a target object based on deep learning provided by the embodiment of the present invention has the beneficial effects of the above method for determining the affective state of a target object based on deep learning.
As a preferred embodiment, the apparatus further comprises:
a calculation module, configured to calculate the similarity between any two image frames and a sampled distance threshold using the feature vector of each image frame;
a clustering module, configured to perform initial clustering on the image frames using the sampled distance threshold and the similarities and to optimize the result to obtain target cluster centres; and
a determination module, configured to determine key frames using the target cluster centres.
Fig. 3 is a structural diagram of another apparatus for determining the affective state of a target object based on deep learning according to an embodiment of the present invention. As shown in Fig. 3, the apparatus for determining the affective state of a target object based on deep learning comprises:
a memory 31, configured to store a computer program; and
a processor 32, configured to implement the steps of the above method for determining the affective state of a target object based on deep learning when executing the computer program.
The apparatus for determining the affective state of a target object based on deep learning provided by the embodiment of the present invention has the beneficial effects of the above method for determining the affective state of a target object based on deep learning.
In order to solve the above technical problems, the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above method for determining the affective state of a target object based on deep learning.
The computer-readable storage medium provided by the embodiment of the present invention has the beneficial effects of the above method for determining the affective state of a target object based on deep learning.
The method, apparatus and computer-readable storage medium for determining the affective state of a target object based on deep learning provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and embodiments of the present invention, and the above description of the embodiments is only intended to help understand the method of the present invention and its core idea. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications to the present invention without departing from the principle of the present invention, and these improvements and modifications also fall within the protection scope of the claims of the present invention.
Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can refer to each other. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively simple, and relevant details can be found in the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. In order to clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of their functions. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.

Claims (9)

1. A method for determining the affective state of a target object based on deep learning, characterized by comprising:
acquiring image frames that contain the target object from a video clip;
inputting the image frames separately into a face detection model and a behavior detection model, and determining a corresponding expression affective state and behavior affective state from the recognized facial features and behavioral features, wherein the face detection model and the behavior detection model are models configured through deep learning; and
determining the target affective state of the target object according to the expression affective state and the behavior affective state.
2. The method according to claim 1, characterized in that the process of inputting the image frames separately into the face detection model and the behavior detection model and determining the corresponding expression affective state and behavior affective state from the recognized facial features and behavioral features specifically comprises:
inputting the image frames into the face detection model, and performing face detection and facial key point extraction to obtain face coordinates of the target object;
inputting the face images corresponding to the face coordinates into a pre-trained facial expression classification model to obtain the corresponding expression affective state;
inputting the image frames into the behavior detection model, and performing human body detection and human body key point extraction to obtain limb coordinates of the target object; and
inputting the image frames annotated with the limb coordinates into a pre-trained limb expression classification model to obtain the corresponding behavior affective state.
3. The method according to claim 1, characterized in that, after acquiring the image frames that contain the target object from the video clip, the method further comprises:
calculating the similarity between any two image frames and a sampled distance threshold using the feature vector of each image frame;
performing initial clustering on the image frames using the sampled distance threshold and the similarities, and optimizing the result to obtain target cluster centres; and
determining key frames using the target cluster centres;
correspondingly, the process of inputting the image frames separately into the face detection model and the behavior detection model and determining the corresponding expression affective state and behavior affective state from the recognized facial features and behavioral features is specifically:
inputting the key frames separately into the face detection model and the behavior detection model, and determining the corresponding expression affective state and behavior affective state from the recognized facial features and behavioral features.
4. The method according to claim 1, characterized in that the face detection model and the behavior detection model are specifically the face detection model and the behavior detection model in a Two-pathway CNN network.
5. The method according to any one of claims 1 to 4, characterized by further comprising:
recording the affective state of the target object and the corresponding time at which the affective state was determined.
6. An apparatus for determining the affective state of a target object based on deep learning, characterized by comprising:
an acquisition module, configured to acquire image frames that contain the target object from a video clip;
a recognition module, configured to input the image frames separately into a face detection model and a behavior detection model and to determine a corresponding expression affective state and behavior affective state from the recognized facial features and behavioral features, wherein the face detection model and the behavior detection model are models configured through deep learning; and
a determination module, configured to determine the target affective state of the target object according to the expression affective state and the behavior affective state.
7. The apparatus according to claim 6, characterized by further comprising:
a calculation module, configured to calculate the similarity between any two image frames and a sampled distance threshold using the feature vector of each image frame;
a clustering module, configured to perform initial clustering on the image frames using the sampled distance threshold and the similarities and to optimize the result to obtain target cluster centres; and
a determination module, configured to determine key frames using the target cluster centres.
8. An apparatus for determining the affective state of a target object based on deep learning, characterized by comprising:
a memory, configured to store a computer program; and
a processor, configured to implement the steps of the method for determining the affective state of a target object based on deep learning according to any one of claims 1 to 5 when executing the computer program.
9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the method for determining the affective state of a target object based on deep learning according to any one of claims 1 to 5.
CN201910576185.3A 2019-06-28 2019-06-28 Method, apparatus and medium for determining the affective state of a target object based on deep learning Pending CN110287912A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910576185.3A CN110287912A (en) 2019-06-28 2019-06-28 Method, apparatus and medium for determining the affective state of a target object based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910576185.3A CN110287912A (en) 2019-06-28 2019-06-28 Method, apparatus and medium for determining the affective state of a target object based on deep learning

Publications (1)

Publication Number Publication Date
CN110287912A true CN110287912A (en) 2019-09-27

Family

ID=68019647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910576185.3A Pending CN110287912A (en) 2019-06-28 2019-06-28 Method, apparatus and medium for determining the affective state of a target object based on deep learning

Country Status (1)

Country Link
CN (1) CN110287912A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111882625A (en) * 2020-07-07 2020-11-03 北京达佳互联信息技术有限公司 Method and device for generating dynamic graph, electronic equipment and storage medium
CN113723165A (en) * 2021-03-25 2021-11-30 山东大学 Method and system for detecting dangerous expressions of people to be detected based on deep learning
CN114064969A (en) * 2021-11-19 2022-02-18 浙江大学 Dynamic picture linkage display device based on emotional curve

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295568A (en) * 2016-08-11 2017-01-04 上海电力学院 The mankind's naturalness emotion identification method combined based on expression and behavior bimodal
CN106503646A (en) * 2016-10-19 2017-03-15 竹间智能科技(上海)有限公司 Multi-modal emotion identification system and method
CN106851437A (en) * 2017-01-17 2017-06-13 南通同洲电子有限责任公司 A kind of method for extracting video frequency abstract
CN107220585A (en) * 2017-03-31 2017-09-29 南京邮电大学 A kind of video key frame extracting method based on multiple features fusion clustering shots
CN107808146A (en) * 2017-11-17 2018-03-16 北京师范大学 A kind of multi-modal emotion recognition sorting technique
CN108520250A (en) * 2018-04-19 2018-09-11 北京工业大学 A kind of human motion sequence extraction method of key frame
CN109145754A (en) * 2018-07-23 2019-01-04 上海电力学院 Merge the Emotion identification method of facial expression and limb action three-dimensional feature
CN109766759A (en) * 2018-12-12 2019-05-17 成都云天励飞技术有限公司 Emotion identification method and Related product

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295568A (en) * 2016-08-11 2017-01-04 上海电力学院 The mankind's naturalness emotion identification method combined based on expression and behavior bimodal
CN106503646A (en) * 2016-10-19 2017-03-15 竹间智能科技(上海)有限公司 Multi-modal emotion identification system and method
CN106851437A (en) * 2017-01-17 2017-06-13 南通同洲电子有限责任公司 A kind of method for extracting video frequency abstract
CN107220585A (en) * 2017-03-31 2017-09-29 南京邮电大学 A kind of video key frame extracting method based on multiple features fusion clustering shots
CN107808146A (en) * 2017-11-17 2018-03-16 北京师范大学 A kind of multi-modal emotion recognition sorting technique
CN108520250A (en) * 2018-04-19 2018-09-11 北京工业大学 A kind of human motion sequence extraction method of key frame
CN109145754A (en) * 2018-07-23 2019-01-04 上海电力学院 Merge the Emotion identification method of facial expression and limb action three-dimensional feature
CN109766759A (en) * 2018-12-12 2019-05-17 成都云天励飞技术有限公司 Emotion identification method and Related product

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111882625A (en) * 2020-07-07 2020-11-03 北京达佳互联信息技术有限公司 Method and device for generating dynamic graph, electronic equipment and storage medium
CN111882625B (en) * 2020-07-07 2024-04-05 北京达佳互联信息技术有限公司 Method, device, electronic equipment and storage medium for generating dynamic diagram
CN113723165A (en) * 2021-03-25 2021-11-30 山东大学 Method and system for detecting dangerous expressions of people to be detected based on deep learning
CN113723165B (en) * 2021-03-25 2022-06-07 山东大学 Method and system for detecting dangerous expressions of people to be detected based on deep learning
CN114064969A (en) * 2021-11-19 2022-02-18 浙江大学 Dynamic picture linkage display device based on emotional curve

Similar Documents

Publication Publication Date Title
Fan et al. Lasot: A high-quality benchmark for large-scale single object tracking
CN110675475B (en) Face model generation method, device, equipment and storage medium
US20230049135A1 (en) Deep learning-based video editing method, related device, and storage medium
CN107423398A (en) Exchange method, device, storage medium and computer equipment
CN108229268A (en) Expression Recognition and convolutional neural networks model training method, device and electronic equipment
CN110287912A (en) Method, apparatus and medium for determining the affective state of a target object based on deep learning
CN111881776B (en) Dynamic expression acquisition method and device, storage medium and electronic equipment
CN113822254B (en) Model training method and related device
CN110503076A (en) Video classification methods, device, equipment and medium based on artificial intelligence
CN111027419B (en) Method, device, equipment and medium for detecting video irrelevant content
CN110531849A (en) Intelligent teaching system based on 5G communication and capable of enhancing reality
CN113963304B (en) Cross-modal video time sequence action positioning method and system based on time sequence-space diagram
CN113393544B (en) Image processing method, device, equipment and medium
CN113723530A (en) Intelligent psychological assessment system based on video analysis and electronic psychological sand table
CN116309992A (en) Intelligent meta-universe live person generation method, equipment and storage medium
CN111405314B (en) Information processing method, device, equipment and storage medium
Cui et al. Deep learning based advanced spatio-temporal extraction model in medical sports rehabilitation for motion analysis and data processing
Liu et al. Trampoline motion decomposition method based on deep learning image recognition
He Athlete human behavior recognition based on continuous image deep learning and sensors
CN115546491B (en) Fall alarm method, system, electronic equipment and storage medium
CN111783587A (en) Interaction method, device and storage medium
CN108596068A (en) A kind of method and apparatus of action recognition
Zhao Research on athlete behavior recognition technology in sports teaching video based on deep neural network
CN111760276B (en) Game behavior control method, device, terminal, server and storage medium
CN113824989A (en) Video processing method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20190927