CN109697416A - Video data processing method and related apparatus - Google Patents
Video data processing method and related apparatus
- Publication number
- CN109697416A (Application CN201811532116.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V40/45: Detection of the body part being alive (under G06V40/40, Spoof detection, e.g. liveness detection)
- G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames (under G06V20/40, Scenes; scene-specific elements in video content)
- G06V40/168: Feature extraction; face representation (under G06V40/16, Human faces, e.g. facial parts, sketches or expressions)
- G06V40/172: Classification, e.g. identification (under G06V40/16, Human faces, e.g. facial parts, sketches or expressions)
Abstract
Embodiments of the invention disclose a video data processing method and related apparatus. The method comprises: obtaining a target video sequence; extracting, from each video frame of the target video sequence, the target object region in which a target object is located; performing keypoint localization on the target object within the target object region to obtain the target keypoints of the target object in each video frame and the location information of the target keypoints of each video frame; obtaining, based on the location information of the target keypoints of each video frame, the dynamic feature value corresponding to the target keypoints of that frame; selecting from the target video sequence the video frames whose dynamic feature values satisfy a target state, and determining the video frames under the target state as key video frames; and capturing, in a key video frame, the local image region corresponding to the target keypoints and recognizing it to obtain a target recognition result, from which the attribute of the target object is determined. The invention can improve the precision of liveness detection and thereby strengthen the identity-authentication capability of a system.
Description
Technical field
The present invention relates to the field of Internet technologies, and in particular to a video data processing method and related apparatus.
Background art

With the development of science and technology, terminals such as mobile phones, computers and attendance machines have come into common use. Many terminals now integrate a face recognition system, so that identity authentication can be performed based on face recognition and a corresponding operation (for example, unlocking the terminal) is triggered when authentication succeeds.

However, when such a terminal obtains a target image of a user through the face recognition system, it performs face recognition directly on the face in the image, regardless of whether the image actually contains the live user. In other words, existing face recognition systems recognize a single captured picture. Consequently, when a user (for example, an illegitimate user) performs identity authentication with a fake face (such as another person's photograph), the illegitimate user will be mistaken for the owner and the unlock operation triggered as soon as the facial features in the photograph match the preset target image features of the owner. This lowers the precision of face liveness detection and seriously weakens the identity-authentication capability of the system.
Summary of the invention
Embodiments of the present invention provide a video data processing method and related apparatus, which can improve the precision of liveness detection and thereby strengthen the identity-authentication capability of a system.
In one aspect, an embodiment of the present invention provides a video data processing method, comprising:

obtaining a target video sequence, and extracting, from each video frame of the target video sequence, the target object region in which a target object is located;

performing keypoint localization on the target object within the target object region to obtain the target keypoints of the target object in each video frame and the location information of the target keypoints of each video frame;

obtaining, based on the location information of the target keypoints of each video frame, the dynamic feature value corresponding to the target keypoints of each video frame;

selecting from the target video sequence the video frames whose dynamic feature values satisfy a target state, the video frames under the target state being those, filtered out of the target video sequence, in which the action is coherent and the target object is in the target state;

determining the video frames under the target state as key video frames, and capturing, according to the body part to which the target keypoints in a key video frame belong, the local image region in which that part is located in the key video frame; and

recognizing the local image region to obtain a target recognition result, and determining the attribute of the target object based on the target recognition result.
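As a concrete illustration, the method steps above can be sketched end to end. This is a minimal hypothetical sketch, assuming the two target keypoints are upper and lower eyelid points and that the target state is an eye-closing (blink) action; the patent does not fix these choices, and all names and values below are illustrative.

```python
import math

def keypoint_distance(p1, p2):
    """Euclidean distance between two keypoints given as (x, y)."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

def dynamic_feature_values(frames, kp_a="eye_top", kp_b="eye_bottom"):
    """One dynamic feature value per frame: the distance between the
    two target keypoints (here, upper and lower eyelid)."""
    return [keypoint_distance(f[kp_a], f[kp_b]) for f in frames]

def pick_key_frame(values, threshold):
    """Key frame: first frame whose feature value meets the target
    state (here, eyelid distance below threshold, i.e. eye closing)."""
    candidates = [i for i, v in enumerate(values) if v < threshold]
    return candidates[0] if candidates else None

# Synthetic sequence: eyelid gap shrinking over five frames (a blink).
frames = [{"eye_top": (0, 10 - 2 * i), "eye_bottom": (0, 0)} for i in range(5)]
values = dynamic_feature_values(frames)
key = pick_key_frame(values, threshold=4.0)
print(values)  # [10.0, 8.0, 6.0, 4.0, 2.0]
print(key)     # 4
```

In a full pipeline, the frame at index `key` would then be cropped to the eyelid region and passed to the recognition network.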
Wherein, obtaining the target video sequence and extracting, from each video frame of the target video sequence, the target object region in which the target object is located comprises:

collecting video data containing the target object, parsing the video data into the target video sequence corresponding to the target object, and obtaining a first video frame and a second video frame from the target video sequence; and

obtaining the image region in which the target object is located in the first video frame as the target object region in the first video frame, and obtaining the image region in which the target object is located in the second video frame as the target object region in the second video frame.
Wherein, performing keypoint localization on the target object within the target object region to obtain the target keypoints of the target object in each video frame and the location information of the target keypoints comprises:

performing keypoint localization on the target object within the target object region of the first video frame to obtain all keypoints in the first video frame and the location information of all keypoints of the target object in the first video frame, and determining, from the obtained keypoints, two keypoints at a first position as the target keypoints of the first video frame;

tracking all keypoints of the first video frame within the target object region of the second video frame to obtain all keypoints in the second video frame and the location information of all keypoints in the second video frame; and

determining, according to the target keypoints of the first video frame, two keypoints at a second position, among all keypoints contained in the target object region of the second video frame, as the target keypoints of the second video frame.
Wherein, obtaining the image region in which the target object is located in the first video frame as the target object region in the first video frame comprises:

if the first video frame is the first frame of the target video sequence, filtering out the background region in the first video frame based on a first network model, identifying, based on the first network model, the image region of the target object in the first video frame after background removal, and taking the identified image region as the target object region of the target object in the first video frame.
Wherein, tracking all keypoints of the first video frame within the target object region of the second video frame to obtain all keypoints in the second video frame and the location information of all keypoints in the second video frame comprises:

mapping each keypoint, by tracking based on its location information in the first video frame, into the target object region of the second video frame, obtaining all keypoints in the second video frame based on the keypoints mapped into the target object region of the second video frame, and determining, in the second video frame, the location information of each keypoint of the second video frame.
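A hedged sketch of this tracking-by-mapping step, under the simplifying assumption that keypoints are propagated by the displacement of the detected target object region between the two frames; an actual implementation would more likely use an optical-flow or model-based tracker, and all coordinates below are illustrative.

```python
# Propagate keypoints from frame 1 into frame 2 by shifting them with
# the displacement of the target object region between the two frames.

def track_keypoints(keypoints, region_frame1, region_frame2):
    """keypoints: list of (x, y) in frame 1; regions: (x, y, w, h)."""
    dx = region_frame2[0] - region_frame1[0]
    dy = region_frame2[1] - region_frame1[1]
    return [(x + dx, y + dy) for (x, y) in keypoints]

kps1 = [(12, 20), (18, 20)]      # keypoints located in frame 1
region1 = (10, 10, 40, 40)       # target object region in frame 1
region2 = (14, 12, 40, 40)       # region after the object moved
tracked = track_keypoints(kps1, region1, region2)
print(tracked)  # [(16, 22), (22, 22)]
```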
Wherein, obtaining, based on the location information of the target keypoints of each video frame, the dynamic feature value corresponding to the target keypoints of each video frame comprises:

obtaining the location information of the target keypoints of each video frame, determining, according to the location information of the target keypoints of each video frame, the distance difference corresponding to the target keypoints of each video frame, and determining the determined distance difference as the dynamic feature value corresponding to the target keypoints of the corresponding video frame.
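One plausible reading of the distance-difference feature is sketched below: the per-frame distance between the two target keypoints is computed, and the frame-to-frame change in that distance serves as the dynamic feature value. The keypoint coordinates are illustrative and not from the patent.

```python
import math

def distances(kp_a_per_frame, kp_b_per_frame):
    """Per-frame Euclidean distance between the two target keypoints."""
    return [math.hypot(ax - bx, ay - by)
            for (ax, ay), (bx, by) in zip(kp_a_per_frame, kp_b_per_frame)]

def distance_differences(dists):
    """Frame-to-frame change in the keypoint distance."""
    return [round(b - a, 6) for a, b in zip(dists, dists[1:])]

a = [(0, 8), (0, 6), (0, 2)]   # e.g. upper-eyelid keypoint over 3 frames
b = [(0, 0), (0, 0), (0, 0)]   # lower-eyelid keypoint, held fixed
d = distances(a, b)
diffs = distance_differences(d)
print(d)      # [8.0, 6.0, 2.0]
print(diffs)  # [-2.0, -4.0]
```

Negative differences here would indicate a closing motion of the tracked part.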
Wherein, the target state includes a first sign state and a second sign state, and the dynamic feature value corresponding to the target keypoints of each video frame includes a dynamic feature value under the first sign state and a dynamic feature value under the second sign state;

selecting from the target video sequence the video frames whose dynamic feature values satisfy the target state comprises:

obtaining, in each video frame, the dynamic feature value under the first sign state, obtaining the first maximum dynamic feature value among the dynamic feature values under the first sign state, and determining a first target threshold based on the first maximum dynamic feature value;

obtaining, in each video frame, the dynamic feature value under the second sign state, obtaining the second maximum dynamic feature value among the dynamic feature values under the second sign state, and determining a second target threshold based on the second maximum dynamic feature value;

comparing the dynamic feature values under the first sign state with the first target threshold, and comparing the dynamic feature values under the second sign state with the second target threshold; and

determining the video frames corresponding to a plurality of consecutive dynamic feature values under the first sign state that are greater than the first target threshold, and/or dynamic feature values under the second sign state that are less than the second target threshold, as the video frames, filtered out of the target video sequence, in which the action is coherent and the target object is in the target state, thereby obtaining the video frames under the target state.
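The selection rule above can be sketched as follows; only the first-sign-state (greater-than) branch is shown. Deriving the target threshold as a fixed fraction of the maximum dynamic feature value is an assumption, since the patent only states that the threshold is determined from the maximum value. Requiring a run of consecutive frames above threshold keeps the selected action coherent rather than a single-frame spike.

```python
def select_coherent_frames(values, ratio=0.5, min_run=2):
    """Return indices of frames in runs of at least min_run consecutive
    dynamic feature values above a threshold derived from the maximum."""
    threshold = max(values) * ratio
    selected, run = [], []
    for i, v in enumerate(values):
        if v > threshold:
            run.append(i)
        else:
            if len(run) >= min_run:
                selected.extend(run)
            run = []
    if len(run) >= min_run:          # flush a run ending at the last frame
        selected.extend(run)
    return selected

vals = [0.1, 0.9, 0.2, 0.6, 0.7, 0.8, 0.1]
sel = select_coherent_frames(vals)
print(sel)  # [3, 4, 5]
```

Note that the isolated spike at index 1 is discarded even though it exceeds the threshold, because it does not belong to a coherent run.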
Wherein, determining the video frames under the target state as key video frames and capturing, according to the body part to which the target keypoints in a key video frame belong, the local image region in which that part is located in the key video frame comprises:

taking the filtered-out video frames in which the action is coherent and the target object is in the target state as candidate video frames, performing quality evaluation on the target object region in the candidate video frames, and filtering out blurry video frames among the candidate video frames according to the quality evaluation result; and

determining, among the candidate video frames remaining after the blurry frames are filtered out, the candidate video frame with the highest resolution as the key video frame, and capturing, based on the body part to which the target keypoints in the key video frame belong, the region of that part in the key video frame as the local image region.
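A hedged sketch of the quality-evaluation step. The patent does not name a sharpness metric; the variance of a Laplacian response is a common stand-in and is used here purely as an assumption, on tiny synthetic grayscale grids.

```python
def laplacian_variance(img):
    """img: 2-D list of grayscale values; higher variance = sharper."""
    responses = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            lap = (img[y-1][x] + img[y+1][x] + img[y][x-1] + img[y][x+1]
                   - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

sharp = [[0] * 4 for _ in range(4)]
sharp[1][1] = 255                      # strong local contrast
blurry = [[100] * 4 for _ in range(4)] # flat region, no detail
best = max([sharp, blurry], key=laplacian_variance)
print(best is sharp)  # True
```

Candidate frames scoring below some cutoff would be discarded as blurry, and the sharpest survivor retained as the key video frame.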
Wherein, recognizing the local image region to obtain the target recognition result and determining the attribute of the target object based on the target recognition result comprises:

determining the local image region as a region to be processed, and performing feature extraction on the region to be processed based on a second network model to obtain the image feature corresponding to the region to be processed;

obtaining, in the second network model, the matching degrees between the image feature and a plurality of attribute type features in the second network model; and

associating the matching degrees obtained by the second network model with the label information corresponding to the plurality of attribute type features in the second network model to obtain the target recognition result corresponding to the second network model, and determining the attribute corresponding to the target object based on the target recognition result.
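A hedged sketch of mapping the second network model's output to an attribute: the matching degrees are modelled here as softmax scores over the attribute type features, and the label associated with the highest score is taken as the target recognition result. The labels and raw scores are illustrative, not from the patent.

```python
import math

def softmax(scores):
    """Normalize raw scores into matching degrees that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def recognize(raw_scores, labels):
    """Associate each matching degree with its label; return the best."""
    degrees = softmax(raw_scores)
    best = max(range(len(labels)), key=lambda i: degrees[i])
    return labels[best], degrees[best]

labels = ["live", "non-live"]        # living / non-living body attribute
attr, degree = recognize([2.0, 0.5], labels)
print(attr)  # live
```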
Wherein, the method further comprises:

obtaining a sample set associated with the target object, determining the sample data carrying first label information in the sample set as positive samples, and determining the sample data carrying second label information in the sample set as negative samples, wherein a positive sample is sample data whose target object attribute is a living-body attribute and a negative sample is sample data whose target object attribute is a non-living-body attribute; and

in the sample set, scaling the image data corresponding to the positive samples to the same size, and training the second network model based on the first label information corresponding to the scaled positive samples and the second label information corresponding to the negative samples.
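A hedged sketch of the training-data preparation described above: image samples are rescaled to one common size (nearest-neighbour resampling here, as an assumption) and paired with their labels before training the second network model. Label values 1 (living body, first label information) and 0 (non-living body, second label information) are illustrative.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list to (out_h, out_w)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)] for y in range(out_h)]

positives = [[[1, 2], [3, 4]],                 # 2x2 live sample
             [[5] * 4 for _ in range(4)]]      # 4x4 live sample
negatives = [[[0] * 3 for _ in range(3)]]      # 3x3 non-live sample

# Scale every sample to a common 2x2 size and attach its label.
dataset = ([(resize_nearest(img, 2, 2), 1) for img in positives] +
           [(resize_nearest(img, 2, 2), 0) for img in negatives])
print([len(img) for img, _ in dataset])  # [2, 2, 2]
```

The resulting `(image, label)` pairs would then feed a standard supervised training loop for the second network model.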
Wherein, optionally, the body part includes first sign information and second sign information;

recognizing the local image region to obtain the target recognition result and determining the attribute of the target object based on the target recognition result comprises:

determining, in the local image region, the region in which the first sign information is located as a first image region and the region in which the second sign information is located as a second image region, and inputting the first image region and the second image region into a cascade network model to extract a first image feature of the first image region and a second image feature of the second image region;

inputting the first image feature into a first classifier in the cascade network model, and outputting the first matching degrees between the first image feature and the plurality of attribute type features of the first classifier in the second network model;

inputting the second image feature into a second classifier in the cascade network model, and outputting the second matching degrees between the second image feature and the plurality of attribute type features of the second classifier in the cascade network model, the second classifier being a classifier cascaded with the first classifier; and

fusing the first matching degrees with the second matching degrees based on the weight of the first classifier and the weight of the second classifier to obtain the target recognition result corresponding to the cascade network model, and determining the attribute corresponding to the target object based on the target recognition result.
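A hedged sketch of the cascaded fusion step: each classifier produces matching degrees for its sign region, and the two sets are combined with per-classifier weights before the final attribute is read off. The weights, scores and region names are illustrative assumptions, not values fixed by the patent.

```python
def fuse(m1, m2, w1=0.6, w2=0.4):
    """Weighted fusion of per-attribute matching degrees from the two
    cascaded classifiers."""
    return [round(w1 * a + w2 * b, 6) for a, b in zip(m1, m2)]

labels = ["live", "non-live"]
m_first = [0.9, 0.1]    # first sign region (e.g. eyes), first classifier
m_second = [0.7, 0.3]   # second sign region (e.g. mouth), second classifier
fused = fuse(m_first, m_second)
attribute = labels[max(range(len(fused)), key=lambda i: fused[i])]
print(fused)      # [0.82, 0.18]
print(attribute)  # live
```

Weighting lets the model trust one sign region (say, the eye region) more than the other when deciding the living-body attribute.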
In one aspect, an embodiment of the present invention provides a video data processing apparatus, comprising:

an acquisition module, configured to obtain a target video sequence and extract, from each video frame of the target video sequence, the target object region in which a target object is located;

a keypoint localization module, configured to perform keypoint localization on the target object within the target object region to obtain the target keypoints of the target object in each video frame and the location information of the target keypoints of each video frame;

a feature value acquisition module, configured to obtain, based on the location information of the target keypoints of each video frame, the dynamic feature value corresponding to the target keypoints of each video frame;

a video frame selection module, configured to select from the target video sequence the video frames whose dynamic feature values satisfy a target state, the video frames under the target state being those, filtered out of the target video sequence, in which the action is coherent and the target object is in the target state;

a key frame determination module, configured to determine the video frames under the target state as key video frames and capture, according to the body part to which the target keypoints in a key video frame belong, the local image region in which that part is located in the key video frame; and

a local recognition module, configured to recognize the local image region to obtain a target recognition result and determine the attribute of the target object based on the target recognition result.
Wherein, the acquisition module comprises:

a data parsing unit, configured to collect video data containing the target object, parse the video data into the target video sequence corresponding to the target object, and obtain a first video frame and a second video frame from the target video sequence; and

a region determination unit, configured to obtain the image region in which the target object is located in the first video frame as the target object region in the first video frame, and to obtain the image region in which the target object is located in the second video frame as the target object region in the second video frame.
Wherein, the keypoint localization module comprises:

a keypoint localization unit, configured to perform keypoint localization on the target object within the target object region of the first video frame, obtain all keypoints of the target object in the first video frame and the location information of all keypoints in the first video frame, and determine, from the obtained keypoints, two keypoints at a first position as the target keypoints of the first video frame;

a keypoint tracking unit, configured to track all keypoints of the first video frame within the target object region of the second video frame to obtain all keypoints in the second video frame and the location information of all keypoints in the second video frame; and

a keypoint determination unit, configured to determine, according to the target keypoints of the first video frame, two keypoints at a second position, among all keypoints contained in the target object region of the second video frame, as the target keypoints of the second video frame.
Wherein, the region determination unit is specifically configured to: if the first video frame is the first frame of the target video sequence, filter out the background region in the first video frame based on a first network model, identify, based on the first network model, the image region of the target object in the first video frame after background removal, and take the identified image region as the target object region of the target object in the first video frame.
Wherein, the keypoint tracking unit is specifically configured to map each keypoint, by tracking based on its location information in the first video frame, into the target object region of the second video frame, obtain all keypoints in the second video frame based on the keypoints mapped into the target object region of the second video frame, and determine, in the second video frame, the location information of each keypoint of the second video frame.
Wherein, the feature value acquisition module is specifically configured to obtain the location information of the target keypoints of each video frame, determine, according to the location information of the target keypoints of each video frame, the distance difference corresponding to the target keypoints of each video frame, and determine the determined distance difference as the dynamic feature value corresponding to the target keypoints of the corresponding video frame.
Wherein, the target state includes a first sign state and a second sign state, and the dynamic feature value corresponding to the target keypoints of each video frame includes a dynamic feature value under the first sign state and a dynamic feature value under the second sign state;

the video frame selection module comprises:

a first threshold determination unit, configured to obtain, in each video frame, the dynamic feature value under the first sign state, obtain the first maximum dynamic feature value among the dynamic feature values under the first sign state, and determine a first target threshold based on the first maximum dynamic feature value;

a second threshold determination unit, configured to obtain, in each video frame, the dynamic feature value under the second sign state, obtain the second maximum dynamic feature value among the dynamic feature values under the second sign state, and determine a second target threshold based on the second maximum dynamic feature value;

a threshold comparison unit, configured to compare the dynamic feature values under the first sign state with the first target threshold, and compare the dynamic feature values under the second sign state with the second target threshold; and

a video frame selection unit, configured to determine the video frames corresponding to a plurality of consecutive dynamic feature values under the first sign state that are greater than the first target threshold, and/or dynamic feature values under the second sign state that are less than the second target threshold, as the video frames, filtered out of the target video sequence, in which the action is coherent and the target object is in the target state, thereby obtaining the video frames under the target state.
Wherein, the key frame determination module comprises:

a quality evaluation unit, configured to take the filtered-out video frames in which the action is coherent and the target object is in the target state as candidate video frames, perform quality evaluation on the target object region in the candidate video frames, and filter out blurry video frames among the candidate video frames according to the quality evaluation result; and

a key frame determination unit, configured to determine, among the candidate video frames remaining after the blurry frames are filtered out, the candidate video frame with the highest resolution as the key video frame, and capture, based on the body part to which the target keypoints in the key video frame belong, the region of that part in the key video frame as the local image region.
Wherein, the local recognition module comprises:

a feature extraction unit, configured to determine the local image region as a region to be processed and perform feature extraction on the region to be processed based on a second network model to obtain the image feature corresponding to the region to be processed;

a feature matching unit, configured to obtain, in the second network model, the matching degrees between the image feature and a plurality of attribute type features in the second network model; and

an attribute determination unit, configured to associate the matching degrees obtained by the second network model with the label information corresponding to the plurality of attribute type features in the second network model, obtain the target recognition result corresponding to the second network model, and determine the attribute corresponding to the target object based on the target recognition result.
Wherein, the local recognition module further comprises:

a sample acquisition unit, configured to obtain a sample set associated with the target object, determine the sample data carrying first label information in the sample set as positive samples, and determine the sample data carrying second label information in the sample set as negative samples, wherein a positive sample is sample data whose target object attribute is a living-body attribute and a negative sample is sample data whose target object attribute is a non-living-body attribute; and

a model training unit, configured to scale, in the sample set, the image data corresponding to the positive samples to the same size, and train the second network model based on the first label information corresponding to the scaled positive samples and the second label information corresponding to the negative samples.
Wherein, the body part includes first sign information and second sign information;

the local recognition module comprises:

an image region determination unit, configured to determine, in the local image region, the region in which the first sign information is located as a first image region and the region in which the second sign information is located as a second image region, and to input the first image region and the second image region into a cascade network model to extract a first image feature of the first image region and a second image feature of the second image region;

a first matching unit, configured to input the first image feature into a first classifier in the cascade network model and output the first matching degrees between the first image feature and the plurality of attribute type features of the first classifier in the second network model;

a second matching unit, configured to input the second image feature into a second classifier in the cascade network model and output the second matching degrees between the second image feature and the plurality of attribute type features of the second classifier in the cascade network model, the second classifier being a classifier cascaded with the first classifier; and

a matching fusion unit, configured to fuse the first matching degrees with the second matching degrees based on the weight of the first classifier and the weight of the second classifier, obtain the target recognition result corresponding to the cascade network model, and determine the attribute corresponding to the target object based on the target recognition result.
In one aspect, an embodiment of the present invention provides a video data processing apparatus comprising a processor and a memory, the processor being connected to the memory, wherein the memory is configured to store program code and the processor is configured to invoke the program code to perform the method in the first aspect of the embodiments of the present invention.
In one aspect, an embodiment of the present invention provides a computer storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, perform the method in the first aspect of the embodiments of the present invention.
In the embodiments of the present invention, when the target video sequence corresponding to a target object is obtained, the target object in the target video sequence can first be detected, so that the target keypoints of each video frame of the target video sequence can subsequently be found and the location information at which the target keypoints appear in each video frame can be captured; the dynamic feature value corresponding to the target keypoints in each video frame can then be computed from that location information. For example, taking keypoint A and keypoint B as the target keypoints, the difference in the distance between keypoint A and keypoint B across the video frames can be computed, thereby obtaining the dynamic feature value corresponding to the target keypoints of each video frame. Then, from the dynamic feature values corresponding to the target keypoints in each video frame, the video frames under a particular state can be filtered out, that is, the video frames whose dynamic feature values satisfy the target state, and the key video frame can be determined from the filtered-out frames, which improves the efficiency of liveness detection while ensuring its accuracy. Next, the body part to which the target keypoints belong can be determined in the key video frame, and the local image region in which that part is located can be captured from the key video frame, which improves the efficiency of image recognition. Finally, the part in the local image region under the particular state can be recognized by a trained liveness detection model, which improves the precision of liveness detection under that state and thereby strengthens the identity-authentication capability of the system.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a target video sequence according to an embodiment of the present invention;

Fig. 3 is a schematic flowchart of a video data processing method according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of obtaining video data according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of obtaining keypoints according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of tracking keypoints according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of obtaining the dynamic feature value of target keypoints according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of capturing a local image region according to an embodiment of the present invention;

Fig. 9 is a schematic flowchart of another video data processing method according to an embodiment of the present invention;

Fig. 10 is a schematic diagram of obtaining a target object region according to an embodiment of the present invention;

Fig. 11 is a schematic diagram of obtaining a key video frame according to an embodiment of the present invention;

Fig. 12 is a schematic structural diagram of a video data processing apparatus according to an embodiment of the present invention;

Fig. 13 is a schematic structural diagram of another video data processing apparatus according to an embodiment of the present invention.
Description of embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
Refer to Figure 1, which is a schematic structural diagram of a network architecture according to an embodiment of the present invention. As shown in Fig. 1, the network architecture may include a cloud server 2000 and a user terminal cluster. The user terminal cluster may include multiple user terminals; as shown in Fig. 1, it specifically includes a user terminal 3000a, a user terminal 3000b, ..., and a user terminal 3000n. As shown in Fig. 1, the user terminal 3000a, the user terminal 3000b, ..., and the user terminal 3000n can each establish a network connection with the cloud server 2000.
For ease of understanding, in this embodiment of the present invention, one user terminal may be selected from the multiple user terminals shown in Fig. 1 as a target user terminal; for example, the user terminal 3000a shown in Fig. 1 may serve as the target user terminal. The target user terminal may include a smart terminal with a camera function, such as a smartphone, a tablet computer, a desktop computer, or a smart television. When the target user terminal (for example, a smartphone) detects that a camera of the terminal (for example, a front-facing camera) is turned on, it may use a video recording function on a data acquisition interface corresponding to the camera, to obtain a video stream containing a target object (for example, a face), that is, to obtain video data containing the target object. The video stream may consist of multiple video frames containing the face. Therefore, by parsing the acquired video stream, a target video sequence corresponding to the target object can be obtained. Target prompt information may be displayed on the data acquisition interface. The target prompt information is used to instruct the user to perform a specific action (for example, closing the eyes), so that a key video frame under a particular state (for example, the eye-closed state) can subsequently be captured by counting the positional changes of target key points (for example, two key points in the eye region) across the video frames. A local image region corresponding to the target key points can then be determined from the key video frame; in other words, the eye region under the eye-closed state can be extracted from the key video frame as the local image region, so that whether the face is a living body can be judged from the local image features within the local image region.
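As an illustrative sketch only (the function name, padding value, and coordinate convention are assumptions for illustration and do not appear in the patent text), extracting the local image region around the located key points can be expressed as computing a padded bounding box that is then used to crop the key video frame:

```python
# Hypothetical sketch: given the key points of a body part (e.g. the eye
# region) in a key video frame, compute a padded bounding box usable for
# cropping the local image region. Coordinates are (x, y) pixels.

def local_region_bbox(keypoints, frame_w, frame_h, pad=10):
    """Return (x0, y0, x1, y1) enclosing the key points, padded and clamped
    to the frame boundaries."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    x0 = max(min(xs) - pad, 0)
    y0 = max(min(ys) - pad, 0)
    x1 = min(max(xs) + pad, frame_w)
    y1 = min(max(ys) + pad, frame_h)
    return (x0, y0, x1, y1)
```

Clamping to the frame boundaries ensures the crop stays valid even when the part sits near the edge of the frame.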
Face liveness judgment is a very important link in a face verification system (which may also be referred to as a face identity authentication system). For example, through the face verification system, a user can remotely perform operations such as account opening on an application platform, remote authentication, and account-unblocking appeals. To improve the security of the face verification system, real-name, real-person authentication needs to be performed on the target user currently using the target user terminal, to ensure that the target user is a genuine, legitimate user. In other words, the target user terminal may acquire video data containing the face of the target user and transmit the video data to the cloud server 2000 in the embodiment corresponding to Fig. 1, so that the cloud server 2000 can use its powerful computing capability to perform liveness judgment on the face in the target video sequence corresponding to the video data. Optionally, when acquiring the video data containing the face of the target user, the target user terminal may alternatively perform liveness judgment locally on the face in the target video sequence corresponding to the video data.
In view of this, when the cloud server 2000 integrates an identity verification system capable of verifying the liveness attribute of the target object, the above video stream may be parsed into the target video sequence on the cloud server 2000. In other words, when the cloud server 2000 receives the video data transmitted by the target user terminal, it may parse the video data to obtain the target video sequence corresponding to the target object. Optionally, when the identity verification system is integrated in the target user terminal, the video stream may be parsed on the target user terminal to obtain the target video sequence corresponding to the target object.
For ease of understanding, this embodiment of the present invention takes the case where the identity verification system is integrated in the target user terminal as an example, to describe how the target user terminal obtains the target video sequence, how it filters out the video frames under a target state from the target video sequence based on the dynamic feature values of the target key points in each video frame, and how it performs liveness recognition based on the filtered video frames.
For ease of understanding, further refer to Fig. 2, which is a schematic diagram of a target video sequence according to an embodiment of the present invention. As shown in Fig. 2, when the front-facing camera of the target user terminal (for example, a smartphone) is turned on, the target user may record a video based on the target prompt information displayed on a data acquisition interface 100a, to obtain video data containing the target user (the video data may include multiple video frames). It should be understood that the video frames constituting the video data can be distributed sequentially along the time axis shown in Fig. 2. Therefore, when the target user terminal parses the acquired video data, it can obtain a target video sequence corresponding to the face (that is, the target object) of the target user, and the video frames in the target video sequence can be distributed sequentially in the time order shown in Fig. 2. Any two adjacent video frames on the time axis shown in Fig. 2 may serve as a first video frame and a second video frame in the target video sequence. For example, the video frame corresponding to the 1st moment on the time axis may be referred to as the first video frame in the target video sequence, and the video frame corresponding to the 2nd moment on the time axis may be referred to as the second video frame. Optionally, the video frame corresponding to the 2nd moment on the time axis may also be referred to as the first video frame, and the video frame corresponding to the 3rd moment on the time axis may be referred to as the second video frame. By taking any two consecutive video frames in the target video sequence as the first video frame and the second video frame, the target key points appearing in each video frame can be found quickly.
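The sliding pairing of adjacent frames described above can be sketched as follows (a minimal illustration; the function name is an assumption, and a "frame" here is any opaque value standing in for decoded image data):

```python
def adjacent_frame_pairs(sequence):
    """Pair every video frame with its successor, so that each adjacent
    (first_video_frame, second_video_frame) pair can be processed in turn."""
    return list(zip(sequence, sequence[1:]))
```

For a sequence of N frames this yields N-1 overlapping pairs, matching the scheme in which the frame at moment 2 is the second frame of one pair and the first frame of the next.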
For ease of understanding, this embodiment of the present invention takes, as the target key points, two key points at specific positions in the mouth region, to describe the detailed process of determining the key video frame under the mouth-open state based on these two key points. The two key points may be referred to as a first key point and a second key point. The first key point may be a key point on the upper lip within the mouth region, and the second key point may be the key point on the lower lip within the mouth region corresponding to the first key point; these two key points may then be collectively referred to as the target key points in a video frame. Each video frame in the target video sequence may contain such target key points. Therefore, by counting the position information of the target key points in each video frame, the dynamic feature value of the target key points in each video frame can be calculated, the video frames whose dynamic feature values meet the target state can then be quickly filtered out based on the dynamic feature value corresponding to the target key points in each video frame, and the key video frame can be found among the filtered video frames, improving the efficiency of image recognition. In other words, by counting the position-change rule of the target key points (that is, the first key point and the second key point) in each video frame, the distance difference between the first key point and the second key point in each video frame can be calculated, and each calculated distance difference can be referred to as the dynamic feature value corresponding to the target key points in the corresponding video frame. In view of this, in the target video sequence shown in Fig. 2, a key video frame that has a higher resolution and is in the mouth-open state can be found based on the dynamic feature value corresponding to the target key points of each video frame, so that liveness judgment can be performed on the mouth region (that is, the local image region) in the found key video frame, improving recognition efficiency. It should be understood that the purpose of liveness judgment is to confirm that the video data acquired by the target user terminal is valid data rather than invalid data obtained by flat re-shooting (using a photo, a certificate, or the like), which is aggressive toward the face verification system. In view of this, the key video frame can be quickly found through the statistics of the dynamic feature values, improving recognition efficiency and strengthening the authentication capability of the system.
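The distance difference between the first key point (upper lip) and the second key point (lower lip) can be sketched as a Euclidean distance; this is one natural reading of "distance difference" and is offered as an assumption for illustration, not as the patent's mandated formula:

```python
import math

def dynamic_feature_value(first_kp, second_kp):
    """Distance difference between the two target key points of one frame,
    e.g. upper-lip and lower-lip key points, used as the frame's dynamic
    feature value (mouth-open depth). Key points are (x, y) pixel tuples."""
    (x1, y1), (x2, y2) = first_kp, second_kp
    return math.hypot(x2 - x1, y2 - y1)
```

A closed mouth yields a small value; the wider the mouth opens, the larger the value, which is what lets the mouth-open state be filtered on this quantity.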
For the detailed process in which the target user terminal obtains the target video sequence, determines the dynamic feature values corresponding to the target key points, and extracts the local image region, refer to the implementations provided in the embodiments corresponding to Fig. 3 to Figure 11 below.
Further, refer to Fig. 3, which is a schematic flowchart of a video data processing method according to an embodiment of the present invention. As shown in Fig. 3, the method provided in this embodiment of the present invention may include the following steps.
Step S101: Obtain a target video sequence, and extract, from each video frame of the target video sequence, a target object region where a target object is located.
Specifically, the target user terminal may acquire video data containing the target object, parse the video data into the target video sequence corresponding to the target object, and obtain a first video frame and a second video frame from the target video sequence. Further, the terminal obtains the image region where the target object is located in the first video frame as the target object region in the first video frame, and obtains the image region where the target object is located in the second video frame as the target object region in the second video frame.
The target user terminal may be the target user terminal in the embodiment corresponding to Fig. 1; for example, it may be the user terminal 3000a in the embodiment corresponding to Fig. 1. The target user terminal may acquire the video data containing the target object when the camera is turned on. For example, when the front-facing camera is turned on, the target user terminal may display target prompt information on the data acquisition interface 100a in the embodiment corresponding to Fig. 2. The target prompt information may be used to instruct the target user (for example, a user A) of the target user terminal to make corresponding actions according to the target prompt information (for example, specific actions such as opening the mouth, closing the eyes, or raising the eyebrows), so that the target user terminal can acquire a video stream containing these actions, that is, obtain the video data containing the target object (for example, the face of the user A).
Optionally, in some face verification systems (for example, smart access control or remote authentication), when a rear camera is turned on, the target user terminal may also play the target prompt information displayed on the above data acquisition interface by means of voice broadcast, to acquire the video data of another user different from the user A. Further, refer to Fig. 4, which is a schematic diagram of acquiring video data according to an embodiment of the present invention. As shown in Fig. 4, a user B may receive the target prompt information (for example, "Please move closer to the camera terminal") displayed on a data acquisition interface 200a and broadcast by the target user terminal (that is, a smartphone), and may, according to the target prompt information, slowly move from a geographic location A to a geographic location B to draw closer to the target user terminal. The target user terminal can thereby acquire a video stream of the user B during the execution of the specific action, that is, obtain the video data containing the target object (for example, the face of the user B).
Further, the target user terminal may perform data parsing on the acquired video data to obtain the target video sequence corresponding to the target object. The target video sequence may be the target video sequence in the embodiment corresponding to Fig. 2 and contains N video frames. For ease of understanding, this embodiment of the present invention takes only two consecutive adjacent video frames in the target video sequence as an example, to describe the detailed process of determining the target key points from these two video frames. The two video frames may be the video frame at the 1st moment and the video frame at the 2nd moment in the embodiment corresponding to Fig. 2, where the video frame at the 1st moment may be referred to as the first video frame, and the video frame at the 2nd moment may be referred to as the second video frame.
Step S102: Perform key point positioning on the target object in the target object region, to obtain the target key points of the target object in each video frame and the position information of the target key points of each video frame.
Specifically, the target user terminal may perform key point positioning on the target object in the target object region in the first video frame, to obtain all key points of the target object in the first video frame and the position information of all the key points in the first video frame, and determine, from all the obtained key points, the two key points at a first position as the target key points of the first video frame. Further, the target user terminal may track all the key points in the first video frame within the target object region in the second video frame, to obtain all key points in the second video frame and the position information of all the key points in the second video frame. Further, according to the target key points in the first video frame, the target user terminal may determine, among all the key points contained in the target object region of the second video frame, the two key points at a second position as the target key points of the second video frame.
It should be understood that, for the two consecutive video frames (that is, the first video frame and the second video frame) obtained in step S101, the target user terminal can position all the key points in the target object region of each of the two video frames, and can then locate, in each video frame, the target key points of that video frame and the position information of the target key points. In other words, the target user terminal may perform face detection on the input video frames based on a first network model (for example, a convolutional neural network model in the first network model) to get the face position of each video frame (the face position may be referred to as a face frame, or as a face region), and may then determine, in each video frame based on the found face frame, all the key points associated with the target object and the position information of all the key points. The target key points are the two key points at a preset position in each video frame (for example, a key point on the upper eyelid and a corresponding key point on the lower eyelid), and these two key points are a part of all the key points in the corresponding video frame. For the continuously acquired video frames containing a specific action, the positional change of the target key points across the video frames can be counted, and under normal circumstances the positional change follows a certain change rule.
To save face detection time and greatly improve processing speed, face detection may be performed only in the initial stage. In other words, if the first video frame is the first video frame of the target video sequence, the target user terminal may, based on the first network model, filter out the background region in the first video frame, identify, based on the first network model, the image region of the target object in the first video frame after the background region is filtered out, and use the identified image region as the target object region of the target object in the first video frame. Further, the target user terminal may position the key points of the target object within the target object region in the first video frame, thereby obtaining all the key points of the target object in the first video frame and the position information of all the key points in the first video frame, and may determine, from all the obtained key points, the two key points at the first position as the target key points of the first video frame.
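The "detect once, then track" control flow described above can be sketched as follows. This is a hedged illustration: `detect` and `track` are placeholders for the patent's (unnamed) detection model and key point tracking algorithm, injected as callables so the flow itself is testable:

```python
def locate_keypoints(frames, detect, track):
    """Run the expensive detector only on the first frame of the sequence,
    then propagate the key points to every later frame with the cheap
    tracker. Returns one key point list per frame."""
    keypoints_per_frame = [detect(frames[0])]
    for prev_frame, frame in zip(frames, frames[1:]):
        keypoints_per_frame.append(
            track(prev_frame, frame, keypoints_per_frame[-1]))
    return keypoints_per_frame
```

The design choice this illustrates is exactly the one the passage motivates: detection runs once, so per-frame cost after the initial stage is only that of tracking.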
For example, face detection may first be performed on the first video frame of the target video sequence to obtain a face frame, after which multiple key points (for example, 94 key points) can be extracted from the face frame. Then, based on the extracted key points and the position information of these key points, the target user terminal may use a key point tracking algorithm in the subsequent frame (that is, the second video frame), and may use the mapped key points tracked in the second video frame as the key points in the corresponding frame. It should be understood that each mapped key point in the second video frame (which may also be referred to as each key point in the second video frame) has a one-to-one mapping relationship with the corresponding key point in the first video frame.
For ease of understanding, further refer to Fig. 5, which is a schematic diagram of obtaining key points according to an embodiment of the present invention. When the first video frame is the first video frame of the target video sequence, the target user terminal may filter out the background region of the first video frame based on a convolutional neural network model (for example, a model A) in the first network model, to obtain the target object region containing the target object (that is, the face). In other words, the target user terminal may filter out the background region in the first video frame, perform face recognition on the image region in the first video frame after the background region is filtered out, and determine the identified image region as the region where the face is located, that is, obtain the target object region in the first video frame shown in Fig. 5. Then, the target user terminal may input the region where the face is located (that is, the target object region) into another convolutional neural network model (for example, a model B) in the first network model. The model B is used to identify each part (that is, the facial contour and the facial features) in the face region, so that the facial contour and the facial features can be represented by corresponding numbers of key points in the target object region shown in Fig. 5. In other words, the target user terminal may perform key point positioning on the target object in the target object region shown in Fig. 5, to extract from the target object region multiple key points associated with each part. The target user terminal may further refer to these extracted key points collectively as all the key points of the target object in the first video frame, and may obtain the position information of these key points from the first video frame, so that all the key points in the first video frame can be tracked in subsequent video frames, and all the key points of the corresponding video frames and the position information of these key points can be obtained stably in the subsequent frames.
Further, refer to Fig. 6, which is a schematic diagram of tracking key points according to an embodiment of the present invention. Based on all the key points in the first video frame obtained in the embodiment corresponding to Fig. 5, a schematic diagram of the key points in the first video frame as shown in Fig. 6 can be obtained. Optionally, the target user terminal may further add all the obtained key points in the first video frame to a key point set M corresponding to the first video frame, and all the key points in the key point set M may be displayed in the first video frame as shown in Fig. 6. If the specific action during data acquisition is opening the mouth, the target user terminal may, after obtaining the multiple key points in the first video frame described in Fig. 6, further take the two key points at the preset position (that is, the first position in the first video frame) in a display interface 100b shown in Fig. 6 (a key point A and a key point B) as a first key point pair. In addition, the target user terminal may also track all the key points in the first video frame, to obtain the mapped key point of each key point in the second video frame (each mapped key point in the second video frame may be referred to as a key point in the second video frame). In other words, the target user terminal may also determine the position information of all the key points in the first video frame, map all the key points in the first video frame to the target object region in the second video frame according to a key point tracking algorithm and the position information of each key point in the first video frame, and may further obtain, based on the key points mapped into the target object region of the second video frame, all the key points in the second video frame as shown in Fig. 6. After obtaining all the key points in the second video frame, the terminal may add these mapped key points to a key point set N corresponding to the second video frame, so that each key point in the key point set N has a one-to-one mapping relationship with each key point in the key point set M. It should be understood that, for the same target object, after all the key points in the first video frame are mapped, the mapped key point of each key point can be found in the second video frame; that is, there is a one-to-one mapping relationship between each key point in the first video frame and the corresponding mapped key point in the second video frame. In view of this, each key point in the first video frame and each mapped key point in the second video frame may be collectively referred to as the key points in the two video frames. Further, the target user terminal may determine a key point B' and a key point A' obtained through mapping in a display interface 200b described in Fig. 6 as a second key point pair in the second video frame. Further, the target user terminal may collectively refer to the first key point pair in the first video frame and the second key point pair in the second video frame as the target key points in the corresponding video frames. It should be understood that, when the target user terminal finds the key point pairs constituted by the two key points at the corresponding positions in each video frame of the target video sequence, these found key point pairs may be collectively referred to as the target key points in the corresponding video frames of the target video sequence, and the positional changes of these key points in each video frame can then be counted, so that step S103 can be further performed.
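The index-preserving mapping from key point set M to key point set N can be sketched as follows. This is a minimal illustration under stated assumptions: `motion` stands in for the real tracking algorithm (for example, sparse optical flow), and the function names are not from the patent:

```python
def map_keypoint_set(set_m, motion):
    """Map every key point of set M (first video frame) into set N (second
    video frame). Indices are preserved, so the mapping is one-to-one: the
    i-th point of N is the tracked image of the i-th point of M."""
    return [motion(pt) for pt in set_m]

def target_pair(keypoint_set, pair_indices):
    """Select the target key point pair (e.g. key points A and B) by index;
    the same indices pick A'/B' in the mapped set."""
    i, j = pair_indices
    return keypoint_set[i], keypoint_set[j]
```

Because the mapping preserves indices, the pair (A, B) chosen in frame one automatically identifies the pair (A', B') in frame two, which is what makes per-frame re-selection of the target key points unnecessary.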
It can be understood that the first network model can be understood as a multitask neural network model. The multitask neural network model can identify, according to the obtained target video sequence under the specific action, the target object in each video frame of the target video sequence, and use the identified image region as the target object region. For each identified target object region, the target user terminal may further identify, based on the multitask neural network model, the body part to which the target key points belong from the target object region, and may then capture, in the local image region where the part is located, the target key points associated with the specific action and the position information of the target key points (for example, when the specific action is opening the mouth, the target key points corresponding to the mouth region may be captured in the display interface 100b described above). The captured target key points can then be tracked through the key point tracking algorithm, to quickly find the target key points corresponding to the part in the second video frame. The target key points in the target video sequence may include the first key point pair and the second key point pair in the embodiment corresponding to Fig. 6.
It should be understood that, through the key point tracking algorithm, the second video frame described above may be taken as a new first video frame, and the video frame at the 3rd moment adjacent to the new first video frame in the embodiment corresponding to Fig. 2 may be referred to as a new second video frame. Then, the target user terminal may use all the key points in the second video frame in the embodiment corresponding to Fig. 6 as the key point set in the new first video frame, and may use the second key point pair in the embodiment corresponding to Fig. 6 as the first key point pair determined from that key point set in the new first video frame. To distinguish it from the first key point pair in the first video frame in the embodiment corresponding to Fig. 6, the first key point pair in the new first video frame may be referred to as a third key point pair. Further, the target user terminal may track all the key points in the new first video frame to obtain all the key points in the new second video frame, and may then determine a new second key point pair among all the key points in the new second video frame according to the third key point pair in the new first video frame. It can be understood that, to distinguish it from the second key point pair in the second video frame in the embodiment corresponding to Fig. 6, the new second key point pair determined in the new second video frame may be referred to as a fourth key point pair. Similarly, the third key point pair and the fourth key point pair may be collectively referred to as the target key points in the target video sequence. It should be understood that the fourth key point pair determined in each subsequent video frame can be understood as the second key point pair determined in each subsequent video frame. Therefore, the target user terminal may determine the first key point pair in the first video frame of the target video sequence, together with the second key point pair determined in each video frame after the first video frame, as the target key points of each video frame in the target video sequence.
In view of this, all the key points in the first video frame are mapped into the subsequent video frames, so that the mapped key points of these key points can be found in the subsequent video frames, and each mapped key point in a subsequent video frame may be referred to as a key point in the corresponding video frame. It can be understood that, for the detailed process of obtaining the target key points in each subsequent video frame, refer to the detailed process of obtaining the second key point pair in the second video frame described above; details are not repeated here.
Step S103: Obtain, based on the position information of the target key points of each video frame, the dynamic feature value corresponding to the target key points of each video frame.
Specifically, the target user terminal may obtain the position information of the target key points of each video frame in the target video sequence, determine, according to the position information of the target key points of each video frame, the distance difference corresponding to the target key points of each video frame, and determine the determined distance difference as the dynamic feature value corresponding to the target key points of the corresponding video frame.
The dynamic feature value corresponding to the target key points of each video frame may include a dynamic feature value under a first sign state and a dynamic feature value under a second sign state, where the first sign state and the second sign state may be collectively referred to as the target state. Optionally, the dynamic feature value of each video frame may be only the dynamic feature value under the first sign state, or only the dynamic feature value under the second sign state. The first sign state may be the mouth-open state, and the second sign state may be the eye-closed state; optionally, the first sign state may be the eye-open state, and the second sign state may be the mouth-closed state.
For ease of understanding, this embodiment of the present invention takes only the case where the first sign state is the mouth-open state as an example, to describe the positional changes, in each video frame, of the target key points in the mouth region in the embodiment corresponding to Fig. 2. Further, refer to Fig. 7, which is a schematic diagram of obtaining the dynamic feature value of the target key points according to an embodiment of the present invention. The target key points are the two key points at specific positions determined after key point positioning is performed on the target object in each video frame based on the first network model; that is, the target key points in each video frame are the key point pair determined among the key points representing the face and the facial contour identified from the corresponding video frame. Therefore, the process in which the target user terminal counts the positional changes of the target key points in each video frame of the target video sequence under the mouth-open state is equivalent to counting the dynamic feature values of the target key points in the mouth region shown in Fig. 7. The video frames under the target state can then be found based on the dynamic feature value corresponding to the target key points in each video frame; that is, video frames whose motion is coherent and that are in the mouth-open state can be filtered out from the target video sequence. The dynamic feature value of the target key points in each video frame may be represented by the mouth-open depth (that is, the distance difference between the two key points). For ease of understanding, the mouth-open depths in the video frames in the embodiment corresponding to Fig. 2 may be represented respectively by a first distance difference, a second distance difference, ..., and an Nth distance difference as shown in Fig. 7, so that the dynamic feature value corresponding to the target key points of each video frame can be obtained.
In other words, in the target video sequence, the mouth-opening depth in the video frame at the 1st moment may be the first distance difference corresponding to the target key points, the mouth-opening depth in the video frame at the 2nd moment may be the second distance difference corresponding to the target key points, ..., and the mouth-opening depth in the video frame at the Nth moment may be the Nth distance difference corresponding to the target key points. Since the video frames in the target video sequence are serialized in the time order of the embodiment corresponding to Fig. 2 above, the mouth-opening depth of each video frame can be obtained by counting the location information of the two key points among the target key points (i.e., the first key point and the second key point) in each video frame, thereby obtaining the dynamic feature value corresponding to the target key points of the corresponding video frame shown in Fig. 7.
For ease of understanding, the dynamic feature values of the target key points in two adjacent video frames can be described with reference to key point A and key point B in the first video frame, and key point A' and key point B' in the second video frame, of the embodiment corresponding to Fig. 6 above. Key point A and key point B can be understood as two key points at preset positions under a specific action; for example, key point A is a key point on the lower lip, and key point B is the key point on the upper lip corresponding to key point A. Key point A' is the mapped key point obtained by mapping key point A into the second video frame, and key point B' is the mapped key point obtained by mapping key point B into the second video frame. The target user terminal can then obtain the location information of key point A and key point B in the first video frame in the display interface 100b above, and obtain, from the distance difference between these two key points, the mouth-opening depth, in the first video frame, of the target key points constituted by key point A and key point B (i.e., the first distance value shown in Fig. 7). Similarly, the target user terminal can obtain the location information of key point A' and key point B' in the second video frame in the display interface 200b above, and then calculate the mouth-opening depth of the target key points in the second video frame (i.e., the second distance value shown in Fig. 7). Likewise, based on all the key points tracked in the second video frame, these key points can be tracked in each subsequent video frame, and the target key points at the corresponding positions can be captured among the tracked key points; therefore, based on the location information of the target key points in each subsequent video frame, the mouth-opening depth of the target key points in each subsequent video frame can be calculated (for example, the Nth distance difference in the Nth video frame shown in Fig. 7 can be obtained, where N can be a positive integer greater than or equal to 3). Once the mouth-opening depth in each video frame is obtained, the dynamic feature value of the target key points in the corresponding video frame can be determined. The dynamic feature value can be used to depict the mouth-opening depth and/or eye-opening depth of the target key points under the target state, where the target state can be understood as the state at the time of a mouth-opening action and/or an eye-closing action.
Here, the mouth-opening depth is the distance difference corresponding to the target key points calculated in the embodiment corresponding to Fig. 7 above. Since the location information of the target key points differs from frame to frame, the distance difference corresponding to the target key points in each video frame of the target video sequence can be referred to as the dynamic feature value corresponding to those target key points. According to the sizes of these distance differences, the target user terminal can further determine by statistics that, in the embodiment corresponding to Fig. 7 above, the second distance difference is the maximum distance difference (for example, 5 cm); that is, the video frame at the 2nd moment, corresponding to the second distance difference, can be regarded as the video frame in the fully open state. Optionally, the target user terminal can also determine from these distance differences that the Nth distance difference is the minimum distance difference (for example, 0 cm); the video frame at the Nth moment, corresponding to the Nth distance difference, can then be regarded as the video frame in the fully closed state. Similarly, the eye-opening depth can be understood as the distance difference, in the eye region, of the target key points in the corresponding video frames, counted by the target user terminal based on the position changes of the target key points in the eye region in the embodiment corresponding to Fig. 6 above; through the counted distance differences in the eye region, the video frames under the closed-eye state, or the video frames under the open-eye state, can accordingly be found among these video frames.
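The search for the fully open and fully closed frames from the per-frame depth series could be sketched as follows (the depth values are illustrative, matching the worked example in Table 1 below, not measured data):

```python
# Per-frame mouth-opening depths (distance differences), in cm,
# one per moment of the target video sequence.
depths = [3.0, 5.0, 4.0, 3.0, 1.0, 0.0]

# Index of the maximum depth -> fully open state (2nd moment here);
# index of the minimum depth -> fully closed state (6th moment here).
fully_open = max(range(len(depths)), key=lambda i: depths[i])
fully_closed = min(range(len(depths)), key=lambda i: depths[i])
```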
Step S104, selecting, from the target video sequence, the video frames whose dynamic feature values satisfy the target state;
Specifically, if the dynamic feature value corresponding to the target key points includes both the dynamic feature value under the first sign state and the dynamic feature value under the second sign state, the target user terminal can, over the video frames of the target video sequence, obtain the dynamic feature values under the first sign state, take the first maximum dynamic feature value among them, and determine a first target threshold based on that first maximum dynamic feature value; at the same time, over the video frames, the target user terminal can also obtain the dynamic feature values under the second sign state, take the second maximum dynamic feature value among them, and determine a second target threshold based on that second maximum dynamic feature value. Further, the target user terminal can compare the dynamic feature values under the first sign state against the first target threshold, and compare the dynamic feature values under the second sign state against the second target threshold; the target user terminal can then, according to the comparison results, screen out of the target video sequence the video frames whose dynamic feature values satisfy the target state, i.e., the video frames under the target state can be characterized as the video frames, screened out of the target video sequence, in which the motion is coherent and the target object is in the target state.
The comparison results may include a first comparison result and/or a second comparison result. Based on the first comparison result, the video frames whose dynamic feature values under the first sign state are greater than the first target threshold can be screened out; based on the second comparison result, the video frames whose dynamic feature values under the second sign state are less than the second target threshold can be screened out. There may be m video frames screened out based on the first comparison result, and n video frames screened out based on the second comparison result. It should be understood that when the dynamic feature value of each video frame in the target video sequence simultaneously includes the dynamic feature value under the first sign state and the dynamic feature value under the second sign state, the dynamic feature value under the first sign state can be compared against the first target threshold and the dynamic feature value under the second sign state against the second target threshold, to obtain the comparison result for the corresponding video frame; in this case the comparison result may include both the first comparison result and the second comparison result, i.e., the k video frames whose dynamic feature values under the first sign state are greater than the first target threshold and whose dynamic feature values under the second sign state are less than the second target threshold can be screened out of the target video sequence. Here m, n, and k can all be positive integers, and the k screened-out video frames can be understood as a subset of the m screened-out video frames that is also a subset of the n screened-out video frames; in other words, the k video frames are the video frames common to the m screened-out video frames and the n screened-out video frames.
For ease of understanding, the embodiment of the present invention takes only the target key points under the first sign state (for example, the mouth-open state) as an example, to illustrate the detailed process of screening, from the target video sequence, the video frames whose dynamic feature values satisfy the target state. Further, refer to Table 1, which shows the distribution, over part of the consecutive video frames, of the dynamic feature values of the target key points (the target key points under the first sign state) counted by the embodiment of the present invention.
Table 1

Video frame | 1st moment | 2nd moment | 3rd moment | 4th moment | 5th moment | 6th moment
Dynamic feature value | 3 cm | 5 cm | 4 cm | 3 cm | 1 cm | 0 cm
Based on steps S101-S103 above, the target user terminal can obtain, by key point location, the location information of the target key points of each video frame in the target video sequence, and can then calculate, from that location information, the dynamic feature value of the target key points of the corresponding video frame, i.e., the dynamic feature value of the target key points in each video frame. As shown in Table 1 above, the target user terminal can count that the dynamic feature value of the target key points (for example, the target key points in the mouth region above) is 3 cm in the video frame at the 1st moment, 5 cm at the 2nd moment, 4 cm at the 3rd moment, 3 cm at the 4th moment, 1 cm at the 5th moment, and 0 cm at the 6th moment. The target user terminal can then find, among these dynamic feature values, that the maximum dynamic feature value under the mouth-open state is 5 cm; that is, the maximum dynamic feature value under the mouth-open state can be referred to as the first maximum dynamic feature value (i.e., 5 cm) obtained from the dynamic feature values under the first sign state. The target user terminal can then determine the first target threshold based on the first maximum dynamic feature value and a threshold parameter (for example, the first target threshold can be: 5 cm * 0.4 = 2 cm). Here the threshold parameter is 0.4; it is used to judge the open/closed state of the first sign information region (i.e., the mouth region), in other words, the determined first target threshold can be used to judge the open/closed state of the mouth. Therefore, the video frames in the target video sequence whose dynamic feature values of the target key points in the mouth region are greater than the first target threshold can be referred to as the video frames under the mouth-open state, and the target user terminal can determine the video frames with the found dynamic feature values as the video frames satisfying the target state. For example, as shown in Table 1 above, the four (m = 4) video frames whose dynamic feature values are greater than 2 cm (i.e., the video frames at the 1st, 2nd, 3rd, and 4th moments) can be determined as the video frames whose dynamic feature values satisfy the target state. In other words, the target user terminal can screen out, from the target video sequence, these four video frames under the mouth-open state, so that step S105 can subsequently be performed, i.e., the key video frame can be determined from the four screened-out video frames under the target state.
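The Table 1 walkthrough above (threshold parameter 0.4, first target threshold 5 cm * 0.4 = 2 cm) can be sketched in a few lines; the function name is illustrative:

```python
def screen_open_frames(depths, ratio=0.4):
    """Screen frames whose mouth-opening depth exceeds
    ratio * (maximum depth), as in the Table 1 walkthrough."""
    threshold = max(depths) * ratio
    return [i for i, d in enumerate(depths) if d > threshold]

# Table 1 values (cm): threshold = 5 * 0.4 = 2 cm -> moments 1-4.
selected = screen_open_frames([3.0, 5.0, 4.0, 3.0, 1.0, 0.0])
```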
Step S105, determining the video frames under the target state as key video frames, and cropping, according to the body part to which the target key points in the key video frame belong, the local image region where that body part is located in the key video frame.
As shown in Table 1 above, the target user terminal can screen out, from the target video sequence, these four video frames in which the motion is coherent and the target object is in the target state, as the screened-out video frames under the target state, and can then find the key video frame among these four video frames through a quality assessment model (for example, the key video frame can be the video frame at the 2nd moment shown in Table 1, i.e., the video frame with the highest definition). The body part to which the target key points belong can then be found in the key video frame, and the region where that body part is located can be cropped out as the local image region (that is, the target key points in the video frame at the 2nd moment of the embodiment corresponding to Fig. 2 above are in the region where the mouth is located, so the region where the mouth is located can be taken as the local image region), so as to further perform step S106. In this case, the target user terminal does not consider the open/closed state of the eyes under the second sign state.
Optionally, for ease of understanding, the embodiment of the present invention can also take only the target key points under the second sign state (for example, the closed-eye state) as an example, to illustrate the detailed process of screening, from the target video sequence, the video frames whose dynamic feature values satisfy the target state. Further, refer to Table 2, which shows the distribution, over part of the consecutive video frames, of the dynamic feature values of the target key points (the target key points under the second sign state) counted by the embodiment of the present invention.
Table 2

Video frame | 1st moment | 2nd moment | 3rd moment | 4th moment | 5th moment | 6th moment
Dynamic feature value | 1.5 cm | 1.2 cm | 0.6 cm | 0.5 cm | 0.4 cm | 0 cm
When the target key points are two key points at specific positions in the eye region, the target user terminal can calculate the distance difference between the locations of these two key points in each video frame, i.e., obtain the dynamic feature value of the target key points in each video frame. As shown in Table 2 above, the target user terminal can count that the dynamic feature value of the target key points (i.e., the target key points constituted by the two key points in the eye region) is 1.5 cm in the video frame at the 1st moment, 1.2 cm at the 2nd moment, 0.6 cm at the 3rd moment, 0.5 cm at the 4th moment, 0.4 cm at the 5th moment, and 0 cm at the 6th moment. The target user terminal can then find, among these dynamic feature values, that the maximum dynamic feature value under the open-eye state is 1.5 cm; that is, the maximum dynamic feature value under the open-eye state can be referred to as the second maximum dynamic feature value (i.e., 1.5 cm) obtained from the dynamic feature values under the second sign state. The target user terminal can then determine the second target threshold based on the second maximum dynamic feature value and the threshold parameter (for example, the second target threshold can be: 1.5 cm * 0.4 = 0.6 cm). Here the threshold parameter is 0.4; it is used to judge the open/closed state of the second sign information region (i.e., the eye region), in other words, the determined second target threshold can be used to judge the open/closed state of the eyes. Therefore, the video frames in the target video sequence whose dynamic feature values of the target key points in the eye region are less than the second target threshold can be referred to as the video frames under the closed-eye state, and determined as the video frames, found by the target user terminal, whose dynamic feature values satisfy the target state. For example, as shown in Table 2 above, the three (n = 3) video frames whose dynamic feature values are less than 0.6 cm (i.e., the video frames at the 4th, 5th, and 6th moments) can be determined as the video frames whose dynamic feature values satisfy the target state. The video frames with coherent motion in which the target object is under the target state (i.e., the video frames at the 4th, 5th, and 6th moments) can thus be screened out of the target video sequence as the selected video frames whose dynamic feature values satisfy the target state. The key video frame can then be further found among these three video frames through the quality assessment model (for example, the key video frame can be the video frame at the 6th moment shown in Table 2), the body part to which the target key points belong (i.e., the eyes) can be further found in the key video frame, and the region where that body part is located can be cropped out of the key video frame as the local image region (that is, the target key points in the video frame at the 6th moment of the embodiment corresponding to Fig. 2 above are in the region where the eyes are located, so the region where the eyes are located can be taken as the local image region), so as to further perform step S106. In this case, the target user terminal does not consider the open/closed state of the mouth under the first sign state.
Optionally, when selecting, from the target video sequence, the video frames whose dynamic feature values satisfy the target state, the target user terminal can also consider the dynamic feature value under the first sign state and the dynamic feature value under the second sign state synchronously; that is, the video frames whose dynamic feature values under the first sign state are greater than the first target threshold (i.e., > 2 cm) and whose dynamic feature values under the second sign state are less than the second target threshold (i.e., < 0.6 cm) are determined as the screened-out video frames whose dynamic feature values satisfy the target state, so as to obtain the k screened-out video frames. Combining the first comparison result corresponding to the dynamic feature values, in the six video frames above, of the target key points in the mouth region (which can here be referred to as the first target key points) under the first sign state (i.e., based on Table 1 above, four video frames whose dynamic feature values are greater than the first target threshold can be found in the target video sequence) with the second comparison result corresponding to the dynamic feature values, in the six video frames above, of the target key points in the eye region (which can here be referred to as the second target key points) under the second sign state (i.e., based on Table 2 above, three video frames whose dynamic feature values are less than the second target threshold can be found in the target video sequence), the video frames whose dynamic feature values satisfy the target state (i.e., both the mouth-open and the closed-eye state) can be screened out based on the first comparison result and the second comparison result. In this case, the target key points in the target video sequence simultaneously include the first target key points and the second target key points. Accordingly, for the six video frames shown in Tables 1 and 2 above, the target user terminal can screen out one (k = 1) video frame (i.e., the video frame at the 4th moment) from these six video frames as the video frame, selected from the target video sequence, whose dynamic feature values satisfy the target state. It can be understood that the video frame at the 4th moment is the video frame common to the four screened-out video frames and the three screened-out video frames. Accordingly, the target user terminal can further determine the screened-out video frame at the 4th moment as the key video frame, determine that the body parts to which the target key points in the key video frame belong include the eyes and the mouth, then crop the eye region and the mouth region out of the key video frame, and determine the cropped-out eye region and mouth region as the local image regions corresponding to the target key points, so as to further perform step S106.
Step S106, recognizing the body part in the local image region to obtain a target recognition result, and determining the attribute of the target object based on the target recognition result.
Specifically, the target user terminal can determine the divided local image region as a region to be processed, perform feature extraction on the region to be processed based on a second network model, and obtain the image features corresponding to the body part in the region to be processed; obtain, in the second network model, the matching degrees between the image features and the multiple attribute type features in the second network model; associate the matching degrees obtained by the second network model with the label information corresponding to the multiple attribute type features in the second network model, obtain the target recognition result corresponding to the second network model, and determine, based on the target recognition result, that the attribute corresponding to the target object is a living-body attribute. Optionally, the target user terminal can also determine, based on the target recognition result, that the attribute corresponding to the target object is a non-living-body attribute.

In the process of making the liveness judgment through the local image region, the second network model needs to be trained in advance, so that the trained second network model can improve the recognition rate of image recognition. It should be understood that the local image features in the local image region cropped out of the key video frame may include the image features corresponding to body parts such as eye features, mouth features, nose features, ear features, and eyebrow features, from which the matching degrees between the image features corresponding to the body part and the multiple attribute type features in the second network model can be obtained.
Further, refer to Fig. 8, which is a schematic diagram of cropping a local image region according to an embodiment of the present invention. If the body parts to which the target key points belong include the first sign information and the second sign information, where the first sign information is, for example, the eyes and the second sign information is, for example, the mouth, the cropped-out local image regions can be the eye image region and the mouth image region shown in Fig. 8. Optionally, if the body part to which the target key points belong is only the second sign information, the cropped-out local image region can be the region where the mouth under the mouth-open state is located. Optionally, if the body part to which the target key points belong is only the first sign information, the cropped-out local image region can be the region where the eyes under the closed-eye state are located.
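The patent does not specify how the local image region is cropped; one plausible sketch, assuming the region is the padded bounding box of the relevant key points (the margin, names, and sizes are made up for illustration), is:

```python
import numpy as np

def crop_local_region(frame, keypoints, margin=8):
    """Crop the bounding box of the given (x, y) key points,
    padded by `margin` pixels and clipped to the frame bounds."""
    xs = [int(x) for x, _ in keypoints]
    ys = [int(y) for _, y in keypoints]
    h, w = frame.shape[:2]
    x0, x1 = max(min(xs) - margin, 0), min(max(xs) + margin, w)
    y0, y1 = max(min(ys) - margin, 0), min(max(ys) + margin, h)
    return frame[y0:y1, x0:x1]

frame = np.zeros((120, 160), dtype=np.uint8)   # stand-in key video frame
mouth = crop_local_region(frame, [(70, 90), (90, 100)])
```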
Further, as shown in Fig. 8, the target user terminal can input the cropped-out local image region into the second network model shown in Fig. 8, which can be a model obtained after training on a large sample set (real-person samples and attack samples). The second network model can be a convolutional neural network model, and the region to be processed can be the local target region corresponding to the target object (the face) (for example, the region where the mouth is located in the embodiment corresponding to Fig. 8 above). To improve the accuracy of the subsequent image data recognition in the region to be processed, the region to be processed corresponding to the target object can first be adjusted to a fixed size, and the image data in the resized region to be processed can then be fed into the input layer of the convolutional neural network model. The convolutional neural network model may include an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer, where the parameter size of the input layer equals the size of the resized region to be processed. After the image data in the region to be processed is input to the input layer of the convolutional neural network, it enters the convolutional layer: first, a small block of the image data in the region to be processed is randomly selected as a sample, and some feature information is learned from this small sample; then this sample is slid, as a window, over all the pixel regions of the region to be processed. That is to say, the feature information learned from the sample is convolved with the image data in the region to be processed, so as to obtain the most significant image features of the image data at different locations of the region to be processed (for example, when the target object is the face of an animal or a person, the local image features corresponding to each facial position of the face in the region to be processed can be obtained). After the convolution operation is completed, the image features of the image data in the region to be processed have been extracted, but the number of features extracted by the convolution operation alone is large; to reduce the amount of computation, a pooling operation is also needed, i.e., the image features extracted from the region to be processed by the convolution operation are transmitted to the pooling layer, and aggregate statistics are computed over the extracted image features. The order of magnitude of these statistical image features is far lower than that of the image features extracted by the convolution operation, and the classification effect can also be improved. Common pooling methods mainly include the average pooling method and the maximum pooling method: average pooling computes, within an image feature set, an average image feature to represent that feature set; maximum pooling extracts, within an image feature set, the maximum image feature to represent that feature set. Through the convolution processing of the convolutional layer and the pooling processing of the pooling layer, the static structural feature information of the image data in the region to be processed can be extracted, i.e., the image features corresponding to the region to be processed can be obtained.
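The two pooling methods described above can be illustrated with a small NumPy sketch (2x2 non-overlapping windows; the feature map values are made up):

```python
import numpy as np

def pool2d(features, size=2, mode="max"):
    """Non-overlapping `size` x `size` pooling over a square feature map."""
    h, w = features.shape
    blocks = features.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))       # maximum pooling
    return blocks.mean(axis=(1, 3))          # average pooling

fmap = np.array([[1., 2., 0., 1.],
                 [3., 4., 2., 2.],
                 [0., 1., 5., 6.],
                 [1., 0., 7., 8.]])
pooled_max = pool2d(fmap, mode="max")
pooled_avg = pool2d(fmap, mode="avg")
```

Either variant reduces a 4x4 feature map to 2x2 statistics, which is exactly the order-of-magnitude reduction the text attributes to the pooling layer.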
The fully connected classification layer (i.e., the classifier) in the convolutional neural network model is then used to identify the matching degrees between the image features corresponding to the to-be-processed region and the multiple attribute-type features in the convolutional neural network model. The classifier in the convolutional neural network model is trained in advance; its input is the image features corresponding to the to-be-processed region, and its output is the matching degrees between those image features and the various attribute-type features. A higher matching degree indicates a higher probability that the local image features extracted from the to-be-processed region match the label information corresponding to the respective attribute-type feature. Therefore, the target user terminal can further determine the maximum matching degree among the matching degrees output by the classifier of the convolutional neural network model, and, according to the maximum matching degree and the label information associated with the attribute-type feature having that maximum matching degree, obtain the target recognition result corresponding to the second network model.
The attribute corresponding to the target object can then be determined based on the target recognition result; for example, the attribute may include a living-body attribute corresponding to a real face and a non-living-body attribute corresponding to a fake face. The number and type of attribute-type features included in the second network model are determined by the number and type of label information contained in the large training sample set (i.e., a large number of positive-sample video clips and a large number of negative-sample video clips) used when training the second network model. The positive samples are sample data in which the attribute of the target object is the living-body attribute, i.e., video clips containing a real person; the negative samples are sample data in which the attribute of the target object is the non-living-body attribute, i.e., sample data containing attacks such as hole-punched photos, photo cut-outs, or scribbled photos.
Therefore, if the classifier in the second network model shown in Fig. 8 includes both a classifier for recognizing the eyes and a classifier for recognizing the mouth, then in the embodiment corresponding to Fig. 8 above, as long as either the maximum matching degree obtained by the mouth classifier or the maximum matching degree obtained by the eye classifier fails to satisfy the local liveness-detection threshold, the attribute corresponding to the target object is considered to be the non-living-body attribute. Optionally, if the classifier in the second network model shown in Fig. 8 includes only the classifier for recognizing the eyes, then when the maximum matching degree obtained by the eye classifier fails to satisfy the local liveness-detection threshold, the attribute corresponding to the target object is considered to be the non-living-body attribute; otherwise, the attribute corresponding to the target object may be the living-body attribute. Optionally, if the classifier in the second network model shown in Fig. 8 includes only the classifier for recognizing the mouth, then when the maximum matching degree obtained by the mouth classifier fails to satisfy the local liveness-detection threshold, the attribute corresponding to the target object is considered to be the non-living-body attribute; otherwise, the attribute corresponding to the target object may be the living-body attribute.
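The decision rule above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name, the threshold value, and the matching degrees are hypothetical, and "every part classifier must pass" is simply the rule described for the two-classifier case.

```python
def classify_attribute(max_matching_degrees, local_liveness_threshold):
    """Return 'living' only if the maximum matching degree of every part
    classifier (e.g. eye, mouth) satisfies the local liveness-detection
    threshold; if any one falls short, the attribute is 'non-living'."""
    if all(m >= local_liveness_threshold
           for m in max_matching_degrees.values()):
        return "living"
    return "non-living"

# The eye classifier passes but the mouth classifier does not,
# so the target object is classified as non-living.
degrees = {"eye": 0.93, "mouth": 0.41}
print(classify_attribute(degrees, 0.80))  # non-living
```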
It can be understood that, by collecting data streams containing a specific action, a set of action language associated with the target object can be obtained. The action language can be understood as the position-change rule of the target key points across the video frames (i.e., the variation of the location information of the two key points in each video frame). Based on this position-change rule, the video frames under a particular state can be screened out, and the attribute of the target object (which may include the living-body attribute and the non-living-body attribute) can then be further determined based on the video frames under that particular state. For a real person, the key video frame under the particular state can be found based on the set of action language; whereas for a reproduced video synthesized by cropping photos or videos, each video frame in the obtained target video sequence will exhibit cropping and editing traces, so the second network model can identify the part in the specified region as having the non-living-body attribute. Malicious attacks using photos, videos, or static 3D models can thereby be effectively resisted. In addition, by screening the video frames under the particular state out of the target video sequence and handing them to the second network model for liveness recognition, the efficiency of liveness recognition can be improved, the performance of the system can be enhanced, and the recognition rate of the system can be increased.
In the embodiment of the present invention, when the target video sequence corresponding to the target object is obtained, the target object in the target video sequence can first be detected, so that the target key points of each video frame in the target video sequence can subsequently be found, and the location information at which the target key points appear in each video frame can be captured. The dynamic feature value corresponding to the target key points in each video frame can then be calculated from the location information of the target key points in each video frame. For example, taking the target key points as key point A and key point B, the distance difference between key point A and key point B in each video frame can be calculated, so as to obtain the dynamic feature value corresponding to the target key points in the corresponding video frame. Then, by means of the dynamic feature values corresponding to the target key points in each video frame, the video frames under the particular state can be screened out, i.e., the video frames whose dynamic feature values satisfy the target state can be screened out, and the key video frame can then be determined from the screened-out video frames, which improves the efficiency of liveness recognition while ensuring its accuracy. Next, the part to which the target key points belong can be determined in the key video frame, and the local image region where that part is located can be extracted from the key video frame, which improves the efficiency of image recognition. Finally, the part in the local image region under the particular state can be recognized through the trained liveness-detection model, which improves the precision of liveness recognition under the particular state and thereby strengthens the authentication capability of the system.
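The dynamic feature value computed from key point A and key point B can be sketched as the distance between the two points in each frame. This is an illustrative sketch: the coordinates are hypothetical, and the patent does not fix Euclidean distance as the only distance measure.

```python
import math

def dynamic_feature_value(point_a, point_b):
    """Dynamic feature value of the target key-point pair in one video
    frame: the distance between key point A and key point B."""
    (xa, ya), (xb, yb) = point_a, point_b
    return math.hypot(xb - xa, yb - ya)

# Positions of the key-point pair (A, B) in three consecutive frames
# (illustrative coordinates); one feature value is obtained per frame.
frames = [((10, 20), (10, 24)),
          ((10, 20), (10, 29)),
          ((10, 20), (10, 22))]
values = [dynamic_feature_value(a, b) for a, b in frames]
print(values)  # [4.0, 9.0, 2.0]
```

The per-frame values form the sequence that is later compared against the target threshold to screen out frames in the target state.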
Further, referring to Fig. 9, which is a schematic flowchart of another video data processing method provided by an embodiment of the present invention. As shown in Fig. 9, the method provided by the embodiment of the present invention may include:
Step S201: collect video data containing the target object, parse the video data into the target video sequence corresponding to the target object, and obtain a first video frame and a second video frame from the target video sequence;
Step S202: obtain the image region where the target object is located in the first video frame as the target object region in the first video frame, and obtain the image region where the target object is located in the second video frame as the target object region in the second video frame.
The first video frame may be the first frame of the target video sequence, or it may be a non-first frame of the target video sequence; that is, the target user terminal can obtain from the target video sequence the target object region corresponding to the target object in each video frame, so that step S203 can then be performed, i.e., key-point localization can be carried out on the target object within the target object region of each video frame.
Step S203: perform key-point localization on the target object in the target object region of the first video frame to obtain all key points of the target object in the first video frame and the location information of those key points in the first video frame, and determine, from all the obtained key points, the two key points at a first position as the target key points of the first video frame.
It can be understood that, when obtaining all key points in the first video frame, the target user terminal can further add the obtained key points and their location information to a first key-point set of the first video frame, and then, based on the location information of each key point in the first key-point set, determine the two key points at the first position as the target key points of the first video frame.
The specific process by which the target user terminal localizes all key points in the first video frame may be as follows: if the first video frame is the first frame of the target video sequence, the background region in the first video frame is filtered out based on the first network model, the image region of the target object in the first video frame with the background region removed is identified, and the identified image region is taken as the target object region in the first video frame. Further, the target user terminal can extract all key points in the target object region and add the extracted key points and their location information to the first key-point set of the first video frame.
Further, referring to Figure 10, which is a schematic diagram of obtaining a target object region provided by an embodiment of the present invention. As shown in Figure 10, assume that a target user is performing face recognition on an identity-verification platform (for example, a banking/finance platform) through the target user terminal shown in Figure 10, which collects image data containing the target user's face, so that it can subsequently be identified whether the target user holding the target user terminal is a real person. Before recognizing the face, the target user terminal shown in Figure 10 needs to first invoke the camera application in the terminal, collect video data under a specific action through the camera corresponding to the camera application (for example, a front camera built into the target user terminal), and parse the video data in the target user terminal into the target video sequence corresponding to the face. To improve the efficiency of face recognition and the processing efficiency of the terminal, face detection may be performed only on the first frame of the target video sequence during face recognition, so as to obtain the face frame corresponding to the face (i.e., the target object region shown in Figure 10). It can be understood that this first frame of the sequence may be referred to as the first video frame. The target user terminal can then perform image processing on the obtained first video frame in the background; for example, the foreground and background regions in the first video frame can be segmented, so as to extract, from the first image frame shown in Figure 10, the object contour region corresponding to the overall contour of the target user shown in Figure 10. The foreground region is the object contour region corresponding to the overall contour of the target user, and the background region is the image region of the first video frame that remains after the object contour region of the target user is extracted.
The above identity-verification platform may also include any platform that needs to perform face recognition, such as access control, attendance, transportation, community services, or pension-eligibility verification.
It should be understood that, by filtering out the background region in the first image frame, interference from the pixels in the background region can be prevented, thereby improving the accuracy of subsequent face recognition. The target user terminal can then further identify the face (i.e., the target object) within the object contour region shown in Figure 10, so as to obtain the region where the face shown in Figure 10 is located (i.e., the target object region). Next, the target user terminal shown in Figure 10 can determine, based on the first network model (for example, the first network model may be a multitask convolutional neural network model), the key points associated with the face from the target object region (the face frame) shown in Figure 10 (all key points in the face frame are the key points added to the key-point set shown in Figure 10). Further, the target user terminal can obtain the location information of each key point in the key-point set, track and map each key point into the second video frame (i.e., the next video frame adjacent to the first video frame), and obtain all key points in the second video frame based on the key points mapped into it. It can be seen that the target user terminal can, according to a key-point tracking algorithm, track in subsequent video frames all key points that appeared in the first key-point set, so that stable corresponding key points are obtained in subsequent video frames while the face-detection time is saved, greatly improving the processing speed. Each key point is a feature point that can characterize a facial part of the target user. All key points obtained by mapping into the second video frame can be added to a second key-point set corresponding to the second video frame, and each key point in the first key-point set has a one-to-one mapping relationship with a key point in the second key-point set.
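The one-to-one track-and-map step can be sketched as follows. This is an illustrative sketch: the `tracker` callable stands in for a real key-point tracking algorithm (e.g. sparse optical flow), and the point names and coordinates are hypothetical.

```python
def track_keypoints(first_keypoint_set, tracker):
    """Map every key point of the first key-point set into the second
    video frame, producing the second key-point set; the one-to-one
    correspondence is kept by naming each mapped point with a prime."""
    second_keypoint_set = {}
    for name, position in first_keypoint_set.items():
        second_keypoint_set[name + "'"] = tracker(position)
    return second_keypoint_set

# Stand-in tracker: between the two frames every point drifts by (1, -2).
drift = lambda p: (p[0] + 1, p[1] - 2)
first = {"A": (100, 120), "B": (140, 120)}
print(track_keypoints(first, drift))
# {"A'": (101, 118), "B'": (141, 118)}
```

Because the second set is derived entirely from the first, no face detection is rerun on the second frame, which is the time saving the text describes.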
To improve the accuracy of recognizing the key points of the facial parts in the target object region, the target object region can first be taken as the to-be-processed region and adjusted to a fixed size, and the image data of the resized to-be-processed region can then be fed into the input layer of the multitask convolutional neural network. The multitask convolutional neural network model may include an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer, where the parameter size of the input layer equals the size of the resized to-be-processed region. After the image data of the to-be-processed region is fed into the input layer of the multitask convolutional neural network model, it enters the convolutional layers. First, a small block of the image data in the to-be-processed region is randomly selected as a sample, and some feature information is learned from this small sample; the sample is then slid as a window over all pixel regions of the to-be-processed region. That is, the feature information learned from the sample is convolved with the image data of the to-be-processed region, thereby obtaining the most significant feature information of the image data at different positions in the to-be-processed region; i.e., the multitask convolutional neural network can locate the feature points corresponding to each facial part of the target user in the to-be-processed region. After the convolution operation is completed, the feature information of the image data in the to-be-processed region has been extracted, but the number of features extracted by convolution alone is large. To reduce the computational cost, a pooling operation is also needed: the feature information extracted from the to-be-processed region by the convolution operation is transmitted to the pooling layer, and aggregate statistics are computed over the extracted feature information. The order of magnitude of these pooled statistics is far lower than that of the feature information extracted by the convolution operation, and pooling can also improve the classification effect. Common pooling methods mainly include average pooling and maximum pooling. Average pooling computes one average feature from a feature-information set to represent that set; maximum pooling extracts the largest feature from a feature-information set to represent that set. Through the convolution processing of the convolutional layers and the pooling processing of the pooling layers, the static structural feature information of the image data in the to-be-processed region can be extracted, i.e., the feature information corresponding to the facial parts in the to-be-processed region can be obtained.
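The sliding-window convolution described above can be sketched as follows. This is an illustrative pure-Python sketch under stated assumptions: the "learned" 2x2 window and the region values are hypothetical, and a real model would learn many such windows and operate on full-resolution image data.

```python
def convolve2d(image, kernel):
    """Slide the learned feature window (kernel) over all pixel positions
    of the region, producing one response per position (valid convolution,
    no padding); strong responses mark where the learned feature appears."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# A 2x2 window learned from a small sample, slid over a 3x3 region.
region = [[1, 2, 0],
          [0, 1, 3],
          [4, 0, 1]]
window = [[1, 0],
          [0, 1]]
print(convolve2d(region, window))  # [[2, 5], [0, 2]]
```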
Then, the target user terminal can further use the classifier in the multitask convolutional neural network model to identify the matching degrees between the static structural feature information of the image data in the to-be-processed region and the multiple attribute-type features in the multitask convolutional neural network model, and associate the maximum matching degree among the matching degrees output by the classifier with the label information corresponding to the respective attribute-type feature, so that the region where each facial part is located under the specific action can be found, and the feature points of each part of the face can subsequently be localized.
The key points in the key-point sets (i.e., the first key-point set and the second key-point set) can be the localized feature points that characterize each facial part; that is, the above key points can be the feature points corresponding to salient facial parts such as the mouth and the eyes. For example, for the mouth, all key points located in the mouth region can be found in the key-point set; similarly, for the eyes, all key points located in the eye region can be found in the key-point set.
The number and type of attribute-type features included in the multitask convolutional neural network model (i.e., the first network model) are determined by the number and type of label information contained in the large training data set (i.e., the standard image set) used when training the multitask convolutional neural network.
The multiple attribute-type features included in the multitask neural network model may be an eye-type feature, a nose-type feature, a mouth-type feature, and a face-contour-type feature, and each attribute-type feature in the multitask neural network model corresponds to one piece of label information, so that the matching degrees between the feature information corresponding to the facial parts and the multiple attribute-type features can be obtained in the multitask neural network. The target user terminal can then further associate the maximum matching degree obtained by the multitask neural network model with the label information corresponding to the respective attribute-type feature among the multiple attribute-type features in the multitask neural network, so as to classify the facial parts in the face region, and thereby localize, in the first video frame, the key points that characterize the eye and mouth regions of the target user. The two key points at the first position can then be determined as the target key points of the first video frame from the key-point set shown in Figure 10, so that step S204 can be performed.
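Grouping the localized key points by facial part can be sketched as follows. This is an illustrative sketch: the point names and labels are hypothetical stand-ins for the label information associated with each point's best-matching attribute-type feature.

```python
def keypoints_for_part(keypoint_labels, part):
    """Collect all key points whose attribute-type label matches a given
    facial part (e.g. 'eye' or 'mouth'), as located by the classifier of
    the multitask model."""
    return [name for name, label in keypoint_labels.items() if label == part]

# Illustrative label information attached to each localized key point.
labels = {"p1": "eye", "p2": "eye", "p3": "mouth",
          "p4": "nose", "p5": "mouth"}
print(keypoints_for_part(labels, "mouth"))  # ['p3', 'p5']
print(keypoints_for_part(labels, "eye"))    # ['p1', 'p2']
```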
Step S204: track, in the target object region of the second video frame, the key points related to those in the first video frame, and obtain all key points in the second video frame and the location information of all key points in the second video frame;
Step S205: according to the target key points in the first video frame, determine, among all key points contained in the target object region of the second video frame, the two key points at a second position as the target key points of the second video frame.
For the specific implementation of steps S204-S205, reference may be made to the detailed process of obtaining the target key points in each video frame in the embodiment corresponding to Fig. 3 above, which will not be repeated here.
For the first key-point pair determined from the first video frame in the embodiment corresponding to Fig. 6 above, the first key-point pair includes key point A and key point B. In the embodiment corresponding to Fig. 6, all key points in the second video frame are obtained by tracking based on the location information of all key points in the first video frame; therefore, a tracking mapping relationship necessarily exists between the key point A' and key point B' constituting the second key-point pair and the key point A and key point B constituting the first key-point pair. Further, refer to Table 3, which is a tracking mapping relation table provided by an embodiment of the present invention.
Table 3
First video frame | Key point A | Key point B |
Location information | (C1, B1) | (C2, B2) |
Second video frame | Key point A ' | Key point B ' |
Location information | (C1 ', B1 ') | (C2 ', B2 ') |
As shown in Table 3 above, the location information of key point A in the first video frame is the coordinate (C1, B1), and the location information of key point B is the coordinate (C2, B2); key point A and key point B are the first key-point pair determined in the first video frame, i.e., the target key points of the first video frame. Therefore, the location information of the target key points of the first video frame is the location information of key point A and key point B. It should be understood that, since the second key-point pair in the second video frame is determined by track-mapping the location information of key point A and key point B of the first key-point pair, the above tracking mapping relationship exists between key point A' in the second video frame and key point A in the first video frame, and the location information of key point A' in the second video frame may be the coordinate (C1', B1'); similarly, the above tracking mapping relationship also exists between key point B' in the second video frame and key point B in the first video frame, and the location information of key point B' in the second video frame may be the coordinate (C2', B2'). In view of this, the location information of the target key points of the second video frame is the location information of key point A' and key point B'. It should be understood that, using the location information of each remaining key point in the first video frame, each remaining key point can likewise be mapped into the second video frame, and the mapped key point obtained for each remaining key point can then be found in the second video frame (the mapped key points may be referred to as the key points in the second video frame), so that the tracking mapping relationships between all key points in the second video frame and all key points in the first video frame can be determined. For the tracking mapping relationship between each remaining key point in the first video frame and the corresponding mapped key point in the second video frame, reference may be made to the description of the tracking mapping relationship between key point A and key point A' cited in the embodiment of the present invention, which will not be repeated here. Similarly, for the specific process of determining a new second key-point pair from each subsequent video frame, reference may be made to the description in the embodiment of the present invention of determining the second key-point pair in the second video frame, which will also not be repeated here.
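The tracking mapping relation of Table 3 can be sketched as a small data structure. This is an illustrative sketch: the coordinates are hypothetical placeholders for (C1, B1), (C1', B1'), etc.

```python
def tracking_map(first_pair, tracked_positions):
    """Build the tracking mapping relation of Table 3: each key point of
    the first key-point pair maps to its primed counterpart in the second
    video frame together with that counterpart's location information."""
    return {name: (name + "'", tracked_positions[name])
            for name, _position in first_pair}

# First key-point pair with locations (C1, B1) and (C2, B2), and the
# tracked locations (C1', B1') and (C2', B2') in the second frame.
first_pair = [("Key point A", (3, 7)), ("Key point B", (3, 12))]
tracked = {"Key point A": (4, 6), "Key point B": (4, 11)}
table = tracking_map(first_pair, tracked)
print(table["Key point A"])  # ("Key point A'", (4, 6))
```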
Step S206: based on the location information of the target key points of each video frame, obtain the dynamic feature value corresponding to the target key points of each video frame.
The target state may include a first sign state and a second sign state, and the dynamic feature value corresponding to the target key points may include a dynamic feature value under the first sign state and a dynamic feature value under the second sign state.
Step S207: obtain the dynamic feature value under the first sign state in each video frame, obtain the first maximum dynamic feature value among the dynamic feature values under the first sign state, and determine a first target threshold based on the first maximum dynamic feature value;
Step S208: obtain the dynamic feature value under the second sign state in each video frame, obtain the second maximum dynamic feature value among the dynamic feature values under the second sign state, and determine a second target threshold based on the second maximum dynamic feature value;
Step S209: compare the dynamic feature value under the first sign state with the first target threshold, and compare the dynamic feature value under the second sign state with the second target threshold;
Step S210: determine the video frames corresponding to a continuous run of dynamic feature values under the first sign state that are greater than the first target threshold, and/or dynamic feature values under the second sign state that are less than the second target threshold, as the video frames screened out of the target video sequence in which the action is coherent and the target object is in the target state, thereby obtaining the video frames under the target state.
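The threshold derivation and screening of steps S207-S210 can be sketched as follows for the first sign state. This is an illustrative sketch under stated assumptions: the fraction used to derive the threshold from the maximum value is hypothetical (the patent only says the threshold is determined based on the maximum), the feature values are invented, and the check that the run of frames is continuous is omitted for brevity.

```python
def target_state_frames(feature_values, fraction=0.4):
    """Steps S207-S210 sketched for one sign state: derive the target
    threshold from the maximum dynamic feature value (here, a fixed
    fraction of it), then keep the frames whose value exceeds it."""
    threshold = max(feature_values.values()) * fraction
    kept = [f for f, v in sorted(feature_values.items()) if v > threshold]
    return kept, threshold

# Dynamic feature values (e.g. mouth opening) per frame index; the
# maximum is 5.0, so with fraction 0.4 the target threshold is 2.0,
# matching the 5 cm / 2 cm example below.
values = {1: 0.5, 2: 1.2, 3: 4.8, 4: 5.0, 5: 4.1, 6: 0.7}
frames, threshold = target_state_frames(values)
print(threshold)  # 2.0
print(frames)     # [3, 4, 5]
```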
Step S211: determine the video frame under the target state as the key video frame, and, according to the part to which the target key points in the key video frame belong, extract from the key video frame the local image region where that part is located.
Specifically, the target user terminal can take the video frames screened out in step S210, in which the action is coherent and the target object is in the target state, as candidate video frames, perform quality evaluation on the target object region in the candidate video frames, and filter out the blurry video frames among the candidate video frames according to the quality evaluation result. Among the candidate video frames remaining after the blurry frames are filtered out, the candidate video frame with the highest resolution is determined as the key video frame, and, based on the part to which the target key points in the key video frame belong, the region where that part is located is extracted from the key video frame as the local image region.
For ease of understanding, the embodiment of the present invention takes only the screening out of the video frames under the mouth-open state as candidate video frames as an example, to describe the specific process of performing quality evaluation on the candidate video frames. Further, referring to Figure 11, which is a schematic diagram of obtaining a key video frame provided by an embodiment of the present invention. Assume that the first maximum dynamic feature value under the mouth-open state obtained through step S209 above is 5 cm. In determining the first maximum dynamic feature value, the target user terminal can sort the dynamic feature values of the target key points of the mouth region appearing in each video frame (for example, in descending order), and thereby find the first maximum dynamic feature value among the dynamic feature values under the first sign state. Further, referring to the distribution of the dynamic feature values over the partial consecutive video frames in the embodiment corresponding to Table 1 above, the first target threshold (for example, 2 cm) can then be determined based on the first maximum dynamic feature value. The target user terminal can then compare the dynamic feature value in each video frame with the first target threshold, so as to screen out from these video frames the video frames corresponding to the dynamic feature values under the first sign state that are greater than the first target threshold, and determine the screened-out video frames as candidate video frames, i.e., the candidate video frames shown in Figure 11 can be obtained. The target user terminal can then perform quality evaluation on video frame 10, video frame 20, video frame 30, and video frame 40 among the candidate video frames through a quality evaluation model, so as to obtain a quality evaluation result. For example, among these four video frames, if the quality evaluation result is that the resolutions of video frame 10, video frame 20, and video frame 30 do not reach the resolution threshold of the quality evaluation model while the resolution of video frame 40 reaches the resolution threshold of the quality evaluation model, then the target user terminal can filter out the blurry video frames among the candidate video frames according to the quality evaluation result, i.e., filter out video frame 10, video frame 20, and video frame 30 among the four video frames. At the same time, video frame 40, which has the highest resolution, can be determined as the key video frame, the part to which the target key points belong can be determined from the key video frame, and the region where that part is located can be taken from the key video frame as the local image region, i.e., the local image region corresponding to the target key points (the region where the mouth is located) can be found from the key video frame.
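The quality-evaluation step can be sketched as follows. This is an illustrative sketch: the resolution numbers and threshold are hypothetical, and resolution stands in for whatever sharpness measure the quality evaluation model actually produces.

```python
def pick_key_frame(candidate_frames, resolution_threshold):
    """Quality-evaluation sketch: drop candidate frames whose resolution
    falls below the threshold (the blurry frames), then determine the
    remaining frame with the highest resolution as the key video frame."""
    sharp = {frame: res for frame, res in candidate_frames.items()
             if res >= resolution_threshold}
    if not sharp:
        return None  # every candidate was blurry
    return max(sharp, key=sharp.get)

# Resolutions of the four candidate frames of the example (illustrative):
# frames 10, 20 and 30 fall below the threshold, frame 40 passes.
candidates = {"frame 10": 210, "frame 20": 180,
              "frame 30": 200, "frame 40": 480}
print(pick_key_frame(candidates, 300))  # frame 40
```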
Optionally, among these four video frames, if the quality evaluation result is that the resolutions of video frame 20 and video frame 30 do not reach the resolution threshold of the quality evaluation model while the resolutions of video frame 10 and video frame 40 reach the resolution threshold of the quality evaluation model, then the target user terminal can filter out the blurry video frames among the candidate video frames according to the quality evaluation result, i.e., filter out video frame 20 and video frame 30 among the four video frames. At the same time, among the candidate video frames remaining after the blurry frames are filtered out, the resolutions of video frame 10 and video frame 40 are compared; if the resolution of video frame 40 is greater than that of video frame 10, video frame 40, which has the highest resolution, can be determined as the key video frame, and the local image region corresponding to the target key points (the region where the mouth is located) can be obtained from the key video frame.
Step S212 identifies the local image region, obtains target identification as a result, and knowing based on the target
Other result determines the attribute of the target object.
The local part in the local image region may include first sign information and second sign information. When executing step S212, the target user terminal can specifically further execute the following steps: determine, in the local image region, the region where the first sign information is located as a first image region and the region where the second sign information is located as a second image region, and input the first image region and the second image region into a cascade network model to extract a first image feature in the first image region and a second image feature in the second image region. Further, the first image feature is input into a first classifier in the cascade network model, and a first matching degree between the first image feature and a plurality of attribute type features of the first classifier in the cascade network model is output. Further, the second image feature is input into a second classifier in the cascade network model, and a second matching degree between the second image feature and a plurality of attribute type features of the second classifier in the cascade network model is output; the second classifier is a classifier cascaded with the first classifier. Further, based on the weight value of the first classifier and the weight value of the second classifier, the first matching degree is fused with the second matching degree to obtain a target recognition result corresponding to the cascade network model, and the attribute corresponding to the target object is determined based on the target recognition result.
For the specific implementation of step S212, refer to the description of step S104 in the embodiment corresponding to Fig. 3 above, which will not be repeated here.
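The weighted fusion of the two classifiers' outputs described above can be sketched as follows. The per-attribute matching degrees and the weight values are assumptions for illustration; the attribute type with the highest fused score is taken as the target recognition result.

```python
def fuse_matching_degrees(first_degrees, second_degrees, w1, w2):
    """Fuse per-attribute matching degrees of two cascaded classifiers.

    first_degrees/second_degrees: {attribute type: matching degree}.
    w1/w2: weight values of the first and second classifier.
    """
    fused = {}
    for attr in first_degrees:
        # Weighted sum of the first and second matching degree for each
        # attribute type (e.g. "living" vs "non-living").
        fused[attr] = w1 * first_degrees[attr] + w2 * second_degrees[attr]
    # The target recognition result is the best-matching attribute type.
    return max(fused, key=fused.get)

result = fuse_matching_degrees(
    {"living": 0.8, "non-living": 0.2},   # first classifier (assumed)
    {"living": 0.6, "non-living": 0.4},   # second classifier (assumed)
    w1=0.7, w2=0.3,
)
```

Here the fused "living" score (0.7·0.8 + 0.3·0.6 = 0.74) exceeds the fused "non-living" score, so the living-body attribute would be reported.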
In the embodiment of the present invention, when the target video sequence corresponding to a target object is obtained, the target object in the target video sequence can first be detected, so that the target key point of each video frame in the target video sequence can subsequently be found, and the position information at which the target key point appears in each video frame can be captured. The dynamic feature value corresponding to the target key point in each video frame can then be calculated from the position information of the target key point in each video frame. For example, taking the target key points as key point A and key point B, the distance difference between key point A and key point B in each video frame can be further calculated, so that the dynamic feature value corresponding to the target key point in the corresponding video frame can be obtained. Then, by means of the dynamic feature value corresponding to the target key point in each video frame, the video frames in a specific state can be selected, that is, the video frames whose dynamic feature values satisfy the target state can be filtered out, and a key video frame can then be determined from the selected video frames, thereby improving the efficiency of liveness recognition and ensuring its accuracy. Next, the local part to which the target key point belongs can be determined in the key video frame, and the local image region where the local part is located can be cropped from the key video frame, thereby improving the efficiency of image recognition. Finally, the local part in the local image region in the specific state can be identified by means of a trained liveness detection model, thereby improving the precision of liveness recognition in the specific state and strengthening the identity verification of the system.
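The dynamic feature value described above can be sketched as the distance between the two target key points of each frame. The coordinates below are illustrative; key points A and B could be, for instance, points on the upper and lower lip.

```python
import math

def dynamic_feature_value(point_a, point_b):
    """Euclidean distance between the two target key points of a frame."""
    return math.dist(point_a, point_b)

# Position information of key points A and B captured in three frames
# (assumed coordinates for illustration).
frames = [
    {"A": (50.0, 80.0), "B": (50.0, 84.0)},   # mouth nearly closed
    {"A": (50.0, 78.0), "B": (50.0, 90.0)},   # mouth opening
    {"A": (50.0, 76.0), "B": (50.0, 96.0)},   # mouth open
]
values = [dynamic_feature_value(f["A"], f["B"]) for f in frames]
```

A growing sequence of distances, as here, would indicate a mouth-opening motion across the frames.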
Further, referring to Fig. 12, which is a schematic structural diagram of a video data processing apparatus provided in an embodiment of the present invention. As shown in Fig. 12, the video data processing apparatus 1 may be the target user terminal in the embodiment corresponding to Fig. 1 above. The video data processing apparatus 1 may include: a retrieval module 10, a key point locating module 20, a characteristic value acquisition module 30, a video frame selection module 40, a key frame determining module 50 and a local identification module 60;
The retrieval module 10 is configured to obtain a target video sequence and extract, from each video frame of the target video sequence, the target object region where a target object is located.
The retrieval module 10 includes a data parsing unit 101 and an area determination unit 102.
The data parsing unit 101 is configured to collect video data containing the target object, parse the video data into the target video sequence corresponding to the target object, and obtain a first video frame and a second video frame from the target video sequence.
The area determination unit 102 is configured to obtain the image region where the target object is located in the first video frame as the target object region in the first video frame, and obtain the image region where the target object is located in the second video frame as the target object region in the second video frame.
The area determination unit 102 is specifically configured to: if the first video frame is the first frame of the target video sequence, filter out the background region in the first video frame based on a first network model, identify, based on the first network model, the image region of the target object in the first video frame after the background region is removed, and take the identified image region as the target object region of the target object in the first video frame.
For the specific implementation of the data parsing unit 101 and the area determination unit 102, refer to the description of step S101 in the embodiment corresponding to Fig. 3 above, which will not be repeated here.
The key point locating module 20 is configured to perform key point locating on the target object in the target object region, and obtain the target key point of the target object in each video frame and the position information of the target key point of each video frame.
The key point locating module 20 includes a key point positioning unit 201, a key point tracing unit 202 and a key point determination unit 203.
The key point positioning unit 201 is configured to perform key point locating on the target object in the target object region in the first video frame, obtain all key points of the target object in the first video frame and the position information of all key points in the first video frame, and determine, from the obtained key points, the two key points at a first position as the target key points of the first video frame.
The key point tracing unit 202 is configured to track, in the target object region in the second video frame, all the key points in the first video frame, and obtain all key points in the second video frame and the position information of all key points in the second video frame.
The key point tracing unit 202 is specifically configured to map, based on the position information of each key point in the first video frame, each tracked key point to the target object region in the second video frame, obtain all key points in the second video frame based on the key points mapped into the target object region in the second video frame, and determine, in the second video frame, the position information of each key point in the second video frame.
The key point determination unit 203 is configured to determine, according to the target key points in the first video frame, the two key points at a second position among all the key points contained in the target object region of the second video frame as the target key points of the second video frame.
For the specific implementation of the key point positioning unit 201, the key point tracing unit 202 and the key point determination unit 203, refer to the description of step S102 in the embodiment corresponding to Fig. 3 above, which will not be repeated here.
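A minimal sketch of the tracking-and-mapping step performed by the key point tracing unit, under the simplifying assumption that the second frame's target object region is a translated copy of the first frame's region (a real implementation might instead track each point with optical flow, e.g. Lucas-Kanade):

```python
def map_key_points(key_points, region1, region2):
    """Map key points from frame 1 into frame 2's target object region.

    key_points: {name: (x, y)} positions in the first video frame.
    region1/region2: (x, y) top-left corner of the target object region
    in the first and second video frame respectively (assumed inputs).
    """
    dx = region2[0] - region1[0]
    dy = region2[1] - region1[1]
    # Each tracked key point is shifted by the region's displacement,
    # yielding its position information in the second video frame.
    return {name: (x + dx, y + dy) for name, (x, y) in key_points.items()}

pts_frame1 = {"A": (50, 80), "B": (50, 96)}
pts_frame2 = map_key_points(pts_frame1, region1=(40, 60), region2=(43, 62))
```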
The characteristic value acquisition module 30 is configured to obtain, based on the position information of the target key point of each video frame, the dynamic feature value corresponding to the target key point of each video frame.
The characteristic value acquisition module 30 is specifically configured to obtain the position information of the target key point of each video frame, determine, according to the position information of the target key point of each video frame, the distance difference corresponding to the target key point of each video frame, and determine the determined distance difference as the dynamic feature value corresponding to the target key point of the corresponding video frame.
The video frame selection module 40 is configured to select, from the target video sequence, the video frames whose dynamic feature values satisfy a target state; the video frames in the target state are used to characterize the video frames, selected from the target video sequence, in which the motion is coherent and the target object is in the target state.
The target state includes a first sign state and a second sign state, and the dynamic feature value corresponding to the target key point of each video frame includes a dynamic feature value in the first sign state and a dynamic feature value in the second sign state.
The video frame selection module 40 includes a first threshold determination unit 401, a second threshold determination unit 402, a threshold comparison unit 403 and a video frame selection unit 404.
The first threshold determination unit 401 is configured to obtain, in each video frame, the dynamic feature value in the first sign state, obtain a first maximum dynamic feature value among the dynamic feature values in the first sign state, and determine a first target threshold based on the first maximum dynamic feature value.
The second threshold determination unit 402 is configured to obtain, in each video frame, the dynamic feature value in the second sign state, obtain a second maximum dynamic feature value among the dynamic feature values in the second sign state, and determine a second target threshold based on the second maximum dynamic feature value.
The threshold comparison unit 403 is configured to compare the dynamic feature value in the first sign state with the first target threshold, and compare the dynamic feature value in the second sign state with the second target threshold.
The video frame selection unit 404 is configured to determine the video frames corresponding to a plurality of consecutive dynamic feature values in the first sign state greater than the first target threshold, and/or dynamic feature values in the second sign state less than the second target threshold, as the video frames, selected from the target video sequence, in which the motion is coherent and the target object is in the target state, so as to obtain the video frames in the target state.
For the specific execution of the first threshold determination unit 401, the second threshold determination unit 402, the threshold comparison unit 403 and the video frame selection unit 404, refer to the description of step S104 in the embodiment corresponding to Fig. 3 above, which will not be repeated here.
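The threshold-and-run selection above can be sketched as follows. The factor used to derive the target threshold from the maximum dynamic feature value, and the minimum run length standing in for "coherent motion", are assumptions for illustration; the document does not specify them.

```python
def select_target_state_frames(values, ratio=0.6, min_run=2):
    """Select runs of consecutive frame indices whose dynamic feature
    value exceeds a target threshold derived from the maximum value.

    values: per-frame dynamic feature values (e.g. first sign state).
    ratio: assumed factor mapping the maximum value to the threshold.
    min_run: assumed minimum run length for coherent motion.
    """
    threshold = max(values) * ratio      # target threshold from the maximum
    selected, run = [], []
    for i, v in enumerate(values):
        if v > threshold:
            run.append(i)
        else:
            if len(run) >= min_run:      # keep only coherent (consecutive) runs
                selected.extend(run)
            run = []
    if len(run) >= min_run:
        selected.extend(run)
    return selected

frames = select_target_state_frames([4.0, 12.0, 20.0, 18.0, 3.0])
```

With these values the threshold is 12.0, so only the consecutive pair of frames at indices 2 and 3 qualifies.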
The key frame determining module 50 is configured to determine the video frames in the target state as key video frames, and, according to the local part to which the target key point belongs in the key video frame, crop the local image region where the local part is located from the key video frame.
The key frame determining module 50 includes a quality estimation unit 501 and a key frame determination unit 502.
The quality estimation unit 501 is configured to take the selected video frames in which the motion is coherent and the target object is in the target state as candidate video frames, perform quality evaluation on the target object region in the candidate video frames, and filter out the blurry video frames among the candidate video frames according to the quality evaluation result.
The key frame determination unit 502 is configured to determine, among the candidate video frames remaining after the blurry video frames are filtered out, the candidate video frame with the highest resolution as the key video frame, and, based on the local part to which the target key point belongs in the key video frame, crop the region where the local part is located from the key video frame as the local image region.
For the specific execution of the quality estimation unit 501 and the key frame determination unit 502, refer to the description of step S105 in the embodiment corresponding to Fig. 3 above, which will not be repeated here.
The local identification module 60 is configured to identify the local part in the local image region to obtain a target recognition result, and determine the attribute of the target object based on the target recognition result.
The local identification module 60 includes a feature extraction unit 601, a characteristic matching unit 602 and an attribute determining unit 603.
The feature extraction unit 601 is configured to determine the local image region as a region to be processed, and perform feature extraction on the region to be processed based on a second network model to obtain an image feature corresponding to the region to be processed.
The characteristic matching unit 602 is configured to obtain, according to the second network model, the matching degree between the image feature and a plurality of attribute type features in the second network model.
The attribute determining unit 603 is configured to associate the matching degree obtained by the second network model with the label information corresponding to the plurality of attribute type features in the second network model, obtain the target recognition result corresponding to the second network model, and determine the attribute corresponding to the target object based on the target recognition result.
For the specific execution of the feature extraction unit 601, the characteristic matching unit 602 and the attribute determining unit 603, refer to the description of step S106 in the embodiment corresponding to Fig. 3 above, which will not be repeated here.
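The association between matching degrees and label information performed by the attribute determining unit can be sketched as follows. The attribute type names and label values are hypothetical; the unit simply reports the label of the best-matching attribute type feature.

```python
def recognize_attribute(matching_degrees, label_info):
    """Associate the matching degrees output by the second network model
    with the label information of its attribute type features, taking
    the best-matching label as the target recognition result.

    matching_degrees: {attribute type: matching degree} (assumed).
    label_info: {attribute type: label information} (assumed).
    """
    # Pick the attribute type feature with the highest matching degree,
    # then look up its associated label information.
    best_attr = max(matching_degrees, key=matching_degrees.get)
    return label_info[best_attr]

result = recognize_attribute(
    {"type_1": 0.15, "type_2": 0.85},
    {"type_1": "non-living", "type_2": "living"},
)
```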
Optionally, the local identification module 60 further includes a sample determination unit 604 and a model training unit 605.
The sample determination unit 604 is configured to obtain a sample set associated with the target object, determine the sample data carrying first label information in the sample set as positive samples, and determine the sample data carrying second label information in the sample set as negative samples; the positive samples are sample data in which the attribute of the target object is a living-body attribute, and the negative samples are sample data in which the attribute of the target object is a non-living-body attribute.
The model training unit 605 is configured to scale, in the sample set, the image data corresponding to the positive samples to the same size, and train the second network model based on the first label information corresponding to the scaled positive samples and the second label information corresponding to the negative samples.
For the specific execution of the sample determination unit 604 and the model training unit 605, refer to the description of the second network model in the embodiment corresponding to Fig. 3 above, which will not be repeated here.
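The positive/negative sample preparation described above can be sketched as follows. The label values and the common target size are assumptions for illustration; only the bookkeeping (label-based split and uniform sizing) mirrors the description.

```python
def prepare_samples(sample_set, target_size=(64, 64)):
    """Split a sample set into positive (living-body) and negative
    (non-living-body) samples by label, recording the common size each
    image would be scaled to before training.

    sample_set: list of dicts with an assumed "label" key, where
    "living" stands for the first label information and "non-living"
    for the second.
    """
    positives, negatives = [], []
    for sample in sample_set:
        # Image data is scaled to the same size before training.
        scaled = {"id": sample["id"], "size": target_size, "label": sample["label"]}
        if sample["label"] == "living":
            positives.append(scaled)
        else:
            negatives.append(scaled)
    return positives, negatives

pos, neg = prepare_samples([
    {"id": 1, "label": "living"},
    {"id": 2, "label": "non-living"},
    {"id": 3, "label": "living"},
])
```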
Optionally, the local part includes first sign information and second sign information.
The local identification module 60 may further specifically include an image region determination unit 606, a first matching unit 607, a second matching unit 608 and a matching fusion unit 609.
The image region determination unit 606 is configured to determine, in the local image region, the region where the first sign information is located as a first image region and the region where the second sign information is located as a second image region, and input the first image region and the second image region into a cascade network model to extract a first image feature in the first image region and a second image feature in the second image region.
The first matching unit 607 is configured to input the first image feature into a first classifier in the cascade network model, and output a first matching degree between the first image feature and a plurality of attribute type features of the first classifier in the cascade network model.
The second matching unit 608 is configured to input the second image feature into a second classifier in the cascade network model, and output a second matching degree between the second image feature and a plurality of attribute type features of the second classifier in the cascade network model; the second classifier is a classifier cascaded with the first classifier.
The matching fusion unit 609 is configured to fuse, based on the weight value of the first classifier and the weight value of the second classifier, the first matching degree with the second matching degree, obtain the target recognition result corresponding to the cascade network model, and determine the attribute corresponding to the target object based on the target recognition result.
For the specific execution of the image region determination unit 606, the first matching unit 607, the second matching unit 608 and the matching fusion unit 609, refer to the description of step S212 in the embodiment corresponding to Fig. 9 above, which will not be repeated here.
For the specific execution of the retrieval module 10, the key point locating module 20, the characteristic value acquisition module 30, the video frame selection module 40, the key frame determining module 50 and the local identification module 60, refer to the description of steps S101 to S106 in the embodiment corresponding to Fig. 3 above, which will not be repeated here.
In the embodiment of the present invention, when the target video sequence corresponding to a target object is obtained, the target object in the target video sequence can first be detected, so that the target key point of each video frame in the target video sequence can subsequently be found, and the position information at which the target key point appears in each video frame can be captured. The dynamic feature value corresponding to the target key point in each video frame can then be calculated from the position information of the target key point in each video frame. For example, taking the target key points as key point A and key point B, the distance difference between key point A and key point B in each video frame can be further calculated, so that the dynamic feature value corresponding to the target key point in the corresponding video frame can be obtained. Then, by means of the dynamic feature value corresponding to the target key point in each video frame, the video frames in a specific state can be selected, that is, the video frames whose dynamic feature values satisfy the target state can be filtered out, and a key video frame can then be determined from the selected video frames, thereby improving the efficiency of liveness recognition and ensuring its accuracy. Next, the local part to which the target key point belongs can be determined in the key video frame, and the local image region where the local part is located can be cropped from the key video frame, thereby improving the efficiency of image recognition. Finally, the local part in the local image region in the specific state can be identified by means of a trained liveness detection model, thereby improving the precision of liveness recognition in the specific state and strengthening the identity verification of the system.
Further, referring to Fig. 13, which is a schematic structural diagram of another video data processing apparatus provided in an embodiment of the present invention. As shown in Fig. 13, the video data processing apparatus 1000 can be applied to the target user terminal in the embodiment corresponding to Fig. 1 above. The video data processing apparatus 1000 may include a processor 1001, a network interface 1004 and a memory 1005; in addition, the video data processing apparatus 1000 may further include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to realize connection and communication between these components. The user interface 1003 may include a display (Display) and a keyboard (Keyboard), and the optional user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory, or may be a non-volatile memory, such as at least one magnetic disk memory. The memory 1005 may optionally be at least one storage device located remotely from the aforementioned processor 1001. As shown in Fig. 13, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module and a device control application program.
In the video data processing apparatus 1000 shown in Fig. 13, the network interface 1004 can provide a network communication function, the user interface 1003 is mainly used to provide an input interface for the user, and the processor 1001 can be used to call the device control application program stored in the memory 1005, so as to realize:
obtaining a target video sequence, and extracting, from each video frame of the target video sequence, the target object region where a target object is located;
performing key point locating on the target object in the target object region, and obtaining the target key point of the target object in each video frame and the position information of the target key point of each video frame;
obtaining, based on the position information of the target key point of each video frame, the dynamic feature value corresponding to the target key point of each video frame;
selecting, from the target video sequence, the video frames whose dynamic feature values satisfy a target state, the video frames in the target state being used to characterize the video frames, selected from the target video sequence, in which the motion is coherent and the target object is in the target state;
determining the video frames in the target state as key video frames, and, according to the local part to which the target key point belongs in the key video frame, cropping the local image region where the local part is located from the key video frame;
identifying the local image region to obtain a target recognition result, and determining the attribute of the target object based on the target recognition result.
It should be understood that the video data processing apparatus 1000 described in the embodiment of the present invention can execute the description of the above video data processing method in the embodiments corresponding to Fig. 3 or Fig. 9 above, and can also execute the description of the above video data processing apparatus 1 in the embodiment corresponding to Fig. 12 above, which will not be repeated here. In addition, the description of the beneficial effects of using the same method will not be repeated either.
In addition, it should be pointed out that an embodiment of the present invention further provides a computer storage medium, in which the computer program executed by the aforementioned video data processing apparatus 1 is stored, the computer program including program instructions. When the processor executes the program instructions, it can execute the description of the above video data processing method in the embodiments corresponding to Fig. 3 or Fig. 9 above, which will therefore not be repeated here. In addition, the description of the beneficial effects of using the same method will not be repeated either. For the technical details not disclosed in the computer storage medium embodiment of the present invention, please refer to the description of the method embodiments of the present invention.
Those of ordinary skill in the art can understand that all or part of the processes in the above embodiment methods can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and when executed may include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The above disclosure is merely a preferred embodiment of the present invention, which certainly cannot be used to limit the scope of rights of the present invention. Therefore, equivalent changes made in accordance with the claims of the present invention still fall within the scope covered by the present invention.
Claims (15)
1. A video data processing method, characterized by comprising:
obtaining a target video sequence, and extracting, from each video frame of the target video sequence, the target object region where a target object is located;
performing key point locating on the target object in the target object region, and obtaining the target key point of the target object in each video frame and the position information of the target key point of each video frame;
obtaining, based on the position information of the target key point of each video frame, the dynamic feature value corresponding to the target key point of each video frame;
selecting, from the target video sequence, the video frames whose dynamic feature values satisfy a target state, the video frames in the target state being used to characterize the video frames, selected from the target video sequence, in which the motion is coherent and the target object is in the target state;
determining the video frames in the target state as key video frames, and, according to the local part to which the target key point belongs in the key video frame, cropping the local image region where the local part is located from the key video frame;
identifying the local image region to obtain a target recognition result, and determining the attribute of the target object based on the target recognition result.
2. The method according to claim 1, characterized in that the obtaining a target video sequence, and extracting, from each video frame of the target video sequence, the target object region where a target object is located, comprises:
collecting video data containing the target object, parsing the video data into the target video sequence corresponding to the target object, and obtaining a first video frame and a second video frame from the target video sequence;
obtaining the image region where the target object is located in the first video frame as the target object region in the first video frame, and obtaining the image region where the target object is located in the second video frame as the target object region in the second video frame.
3. The method according to claim 2, characterized in that the performing key point locating on the target object in the target object region, and obtaining the target key point of the target object in each video frame and the position information of the target key point, comprises:
performing key point locating on the target object in the target object region in the first video frame, obtaining all key points of the target object in the first video frame and the position information of all key points in the first video frame, and determining, from the obtained key points, the two key points at a first position as the target key points of the first video frame;
tracking, in the target object region in the second video frame, all the key points in the first video frame, and obtaining all key points in the second video frame and the position information of all key points in the second video frame;
determining, according to the target key points in the first video frame, the two key points at a second position among all the key points contained in the target object region of the second video frame as the target key points of the second video frame.
4. The method according to claim 2, characterized in that the obtaining the image region where the target object is located in the first video frame as the target object region in the first video frame comprises:
if the first video frame is the first frame of the target video sequence, filtering out the background region in the first video frame based on a first network model, identifying, based on the first network model, the image region of the target object in the first video frame after the background region is removed, and taking the identified image region as the target object region of the target object in the first video frame.
5. The method according to claim 3, characterized in that the tracking, in the target object region in the second video frame, all the key points in the first video frame, and obtaining all key points in the second video frame and the position information of all key points in the second video frame, comprises:
mapping, based on the position information of each key point in the first video frame, each tracked key point to the target object region in the second video frame, obtaining all key points in the second video frame based on the key points mapped into the target object region in the second video frame, and determining, in the second video frame, the position information of each key point in the second video frame.
6. The method according to any one of claims 1-5, characterized in that the obtaining, based on the position information of the target key point of each video frame, the dynamic feature value corresponding to the target key point of each video frame, comprises:
obtaining the position information of the target key point of each video frame, determining, according to the position information of the target key point of each video frame, the distance difference corresponding to the target key point of each video frame, and determining the determined distance difference as the dynamic feature value corresponding to the target key point of the corresponding video frame.
7. The method according to claim 1, wherein the target state includes a first sign state and a second sign state, and the dynamic feature value corresponding to the target key point of each video frame includes a dynamic feature value under the first sign state and a dynamic feature value under the second sign state;
the selecting, from the target video sequence, video frames whose dynamic feature values meet the target state comprises:
obtaining, in each video frame, the dynamic feature value under the first sign state, obtaining a first maximum dynamic feature value among the dynamic feature values under the first sign state, and determining a first target threshold based on the first maximum dynamic feature value;
obtaining, in each video frame, the dynamic feature value under the second sign state, obtaining a second maximum dynamic feature value among the dynamic feature values under the second sign state, and determining a second target threshold based on the second maximum dynamic feature value;
comparing the dynamic feature values under the first sign state with the first target threshold, and comparing the dynamic feature values under the second sign state with the second target threshold;
determining the video frames corresponding to dynamic feature values under the first sign state that are greater than the first target threshold for multiple consecutive frames, and/or dynamic feature values under the second sign state that are less than the second target threshold, as the video frames screened from the target video sequence in which the action is coherent and the target object is in the target state, so as to obtain the video frames under the target state.
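The max-derived threshold and consecutive-run selection of claim 7 can be sketched for one sign state as below. The `ratio` and `min_run` parameters are hypothetical knobs, not values taken from the patent.

```python
def select_target_state_frames(values, ratio=0.5, min_run=3):
    """Select indices of frames whose dynamic feature value exceeds a
    threshold derived from the maximum value (threshold = ratio * max),
    keeping only runs of at least `min_run` consecutive frames so the
    selected action is coherent."""
    threshold = ratio * max(values)
    selected, run = [], []
    for i, v in enumerate(values):
        if v > threshold:
            run.append(i)
        else:
            if len(run) >= min_run:
                selected.extend(run)
            run = []
    if len(run) >= min_run:
        selected.extend(run)
    return selected
```

For the second sign state the claim compares against the threshold with `<` rather than `>`; the same run-filtering logic applies.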
8. The method according to claim 1, wherein the determining a key video frame from the video frames under the target state, and extracting, according to the part to which the target key point belongs in the key video frame, the local image region where the part is located comprises:
taking the screened video frames in which the action is coherent and the target object is in the target state as candidate video frames, performing quality evaluation on the target object region in the candidate video frames, and filtering out blurry video frames from the candidate video frames according to the quality evaluation result;
among the candidate video frames remaining after the blurry video frames are filtered out, determining the candidate video frame with the highest resolution as the key video frame, and, based on the part to which the target key point belongs in the key video frame, extracting the region of that part in the key video frame as the local image region.
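The quality-evaluation and key-frame selection of claim 8 could look like the sketch below. The patent does not specify the blur metric; variance of the Laplacian is a common stand-in, and the `blur_threshold` value is an assumption.

```python
import numpy as np

LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=np.float64)

def sharpness(gray):
    """Variance of the Laplacian response over a grayscale image:
    a common blur metric -- lower values indicate blurrier frames."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * gray[dy:dy + h - 2, dx:dx + w - 2]
    return out.var()

def pick_key_frame(frames, blur_threshold=10.0):
    """Drop candidate frames whose sharpness falls below the threshold,
    then return the index of the highest-resolution remaining frame
    (here: largest pixel count), per claim 8."""
    kept = [i for i, f in enumerate(frames) if sharpness(f) >= blur_threshold]
    return max(kept, key=lambda i: frames[i].size)
```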
9. The method according to claim 1, wherein the identifying the local image region to obtain a target recognition result, and determining the attribute of the target object based on the target recognition result comprises:
determining the local image region as a region to be processed, and performing feature extraction on the region to be processed based on a second network model to obtain an image feature corresponding to the region to be processed;
obtaining, in the second network model, the matching degrees between the image feature and multiple attribute type features in the second network model;
associating the matching degrees obtained by the second network model with the label information corresponding to the multiple attribute type features in the second network model to obtain the target recognition result corresponding to the second network model, and determining the attribute corresponding to the target object based on the target recognition result.
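The matching-degree step of claim 9 might be realized as below. The patent leaves the similarity measure open; cosine similarity followed by a softmax is one plausible choice, and the function and label names are hypothetical.

```python
import numpy as np

def match_attributes(image_feature, attribute_type_features, labels):
    """Compute a matching degree between the extracted image feature and
    each attribute type feature (cosine similarity here), normalize the
    degrees into a distribution, and associate them with label info to
    form a recognition result."""
    feats = np.asarray(attribute_type_features, dtype=np.float64)
    x = np.asarray(image_feature, dtype=np.float64)
    sims = feats @ x / (np.linalg.norm(feats, axis=1) * np.linalg.norm(x))
    probs = np.exp(sims) / np.exp(sims).sum()  # softmax over matching degrees
    best = int(np.argmax(probs))
    return labels[best], dict(zip(labels, probs.round(3)))
```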
10. The method according to claim 9, further comprising:
obtaining a sample set associated with the target object, determining the sample data carrying first label information in the sample set as positive samples, and determining the sample data carrying second label information in the sample set as negative samples; wherein a positive sample is sample data in which the attribute of the target object is a living-body attribute, and a negative sample is sample data in which the attribute of the target object is a non-living-body attribute;
in the sample set, scaling the image data corresponding to the positive samples to the same size, and training the second network model based on the first label information corresponding to the scaled positive samples and the second label information corresponding to the negative samples.
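The sample-preparation step of claim 10 (resize to a common size, label positives and negatives) can be sketched as follows. The nearest-neighbour resize and the 64x64 target size are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbour resize to a common (h, w), standing in for the
    claim's 'scale image data to the same size' step."""
    h, w = size
    ys = (np.arange(h) * img.shape[0] / h).astype(int)
    xs = (np.arange(w) * img.shape[1] / w).astype(int)
    return img[ys][:, xs]

def build_training_set(samples, size=(64, 64)):
    """samples: list of (image, label_info) where label_info is
    'living' (first label information) or 'non-living' (second).
    Returns stacked images and 1/0 targets for training the model."""
    images = np.stack([resize_nearest(img, size) for img, _ in samples])
    targets = np.array([1 if lbl == 'living' else 0 for _, lbl in samples])
    return images, targets
```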
11. The method according to claim 1, wherein the part includes first sign information and second sign information;
the identifying the local image region to obtain a target recognition result, and determining the attribute of the target object based on the target recognition result comprises:
determining, in the local image region, the region where the first sign information is located as a first image region and the region where the second sign information is located as a second image region, and inputting the first image region and the second image region into a cascade network model to extract a first image feature of the first image region and a second image feature of the second image region;
inputting the first image feature into a first classifier in the cascade network model, and outputting first matching degrees between the first image feature and multiple attribute type features of the first classifier in the cascade network model;
inputting the second image feature into a second classifier in the cascade network model, and outputting second matching degrees between the second image feature and multiple attribute type features of the second classifier in the cascade network model, the second classifier being a classifier cascaded with the first classifier;
fusing the first matching degrees with the second matching degrees based on a weight value of the first classifier and a weight value of the second classifier, obtaining the target recognition result corresponding to the cascade network model, and determining the attribute corresponding to the target object based on the target recognition result.
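The weighted fusion of the two classifiers' matching degrees in claim 11 reduces to a weighted sum followed by an argmax. A minimal sketch, with hypothetical weight values (the patent does not state how the weights are chosen):

```python
import numpy as np

def fuse_cascade(first_degrees, second_degrees, w1=0.6, w2=0.4,
                 labels=('living', 'non-living')):
    """Weighted fusion of the matching degrees produced by two cascaded
    classifiers (e.g. one per facial region); the fused distribution
    yields the target recognition result."""
    fused = w1 * np.asarray(first_degrees) + w2 * np.asarray(second_degrees)
    return labels[int(np.argmax(fused))], fused
```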
12. A video data processing apparatus, comprising:
a retrieval module, configured to obtain a target video sequence, and extract, from each video frame of the target video sequence, the target object region where the target object is located;
a key point locating module, configured to perform key point location on the target object in the target object region to obtain the target key points of the target object in each video frame and the position information of the target key points of each video frame;
a feature value acquisition module, configured to obtain, based on the position information of the target key points of each video frame, the dynamic feature values corresponding to the target key points of each video frame;
a video frame selection module, configured to select, from the target video sequence, video frames whose dynamic feature values meet the target state; the video frames under the target state being used to characterize the video frames screened from the target video sequence in which the action is coherent and the target object is in the target state;
a key frame determining module, configured to determine the video frames under the target state as key video frames, and extract, according to the part to which the target key point belongs in the key video frame, the local image region where the part is located in the key video frame;
a local identification module, configured to identify the local image region to obtain a target recognition result, and determine the attribute of the target object based on the target recognition result.
13. The apparatus according to claim 12, wherein the retrieval module comprises:
a data parsing unit, configured to collect video data containing the target object, parse the video data into the target video sequence corresponding to the target object, and obtain a first video frame and a second video frame from the target video sequence;
a region determining unit, configured to obtain the image region where the target object is located in the first video frame as the target object region in the first video frame, and obtain the image region where the target object is located in the second video frame as the target object region in the second video frame.
14. A video data processing apparatus, comprising a processor and a memory;
the processor is connected to the memory, wherein the memory is configured to store program code, and the processor is configured to call the program code to perform the method according to any one of claims 1-11.
15. A computer storage medium, wherein the computer storage medium stores a computer program, the computer program comprises program instructions, and the program instructions, when executed by a processor, perform the method according to any one of claims 1-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811532116.4A CN109697416B (en) | 2018-12-14 | 2018-12-14 | Video data processing method and related device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811532116.4A CN109697416B (en) | 2018-12-14 | 2018-12-14 | Video data processing method and related device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109697416A true CN109697416A (en) | 2019-04-30 |
CN109697416B CN109697416B (en) | 2022-11-18 |
Family
ID=66231658
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811532116.4A Active CN109697416B (en) | 2018-12-14 | 2018-12-14 | Video data processing method and related device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109697416B (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110390033A (en) * | 2019-07-25 | 2019-10-29 | 腾讯科技(深圳)有限公司 | Training method, device, electronic equipment and the storage medium of image classification model |
CN111242178A (en) * | 2020-01-02 | 2020-06-05 | 杭州睿琪软件有限公司 | Object identification method, device and equipment |
CN111291736A (en) * | 2020-05-07 | 2020-06-16 | 南京景三医疗科技有限公司 | Image correction method and device and medical equipment |
CN111460419A (en) * | 2020-03-31 | 2020-07-28 | 周亚琴 | Internet of things artificial intelligence face verification method and Internet of things cloud server |
CN111507301A (en) * | 2020-04-26 | 2020-08-07 | 腾讯科技(深圳)有限公司 | Video processing method, video processing device, computer equipment and storage medium |
CN111836072A (en) * | 2020-05-21 | 2020-10-27 | 北京嘀嘀无限科技发展有限公司 | Video processing method, device, equipment and storage medium |
CN111860107A (en) * | 2020-05-28 | 2020-10-30 | 四川中科凯泽科技有限公司 | Standing long jump evaluation method based on deep learning attitude estimation |
CN111881726A (en) * | 2020-06-15 | 2020-11-03 | 马上消费金融股份有限公司 | Living body detection method and device and storage medium |
CN111932604A (en) * | 2020-08-24 | 2020-11-13 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and device for measuring human ear characteristic distance |
CN112016437A (en) * | 2020-08-26 | 2020-12-01 | 中国科学院重庆绿色智能技术研究院 | Living body detection method based on face video key frame |
CN112055247A (en) * | 2020-09-11 | 2020-12-08 | 北京爱奇艺科技有限公司 | Video playing method, device, system and storage medium |
CN112182256A (en) * | 2020-09-28 | 2021-01-05 | 长城汽车股份有限公司 | Object identification method and device and vehicle |
CN112287850A (en) * | 2020-10-30 | 2021-01-29 | 维沃移动通信有限公司 | Article information identification method and device, electronic equipment and readable storage medium |
CN113158918A (en) * | 2021-04-26 | 2021-07-23 | 深圳市商汤科技有限公司 | Video processing method and device, electronic equipment and storage medium |
CN113178206A (en) * | 2021-04-22 | 2021-07-27 | 内蒙古大学 | AI (Artificial intelligence) composite anchor generation method, electronic equipment and readable storage medium |
WO2021203667A1 (en) * | 2020-04-06 | 2021-10-14 | Huawei Technologies Co., Ltd. | Method, system and medium for identifying human behavior in a digital video using convolutional neural networks |
CN113657251A (en) * | 2021-08-16 | 2021-11-16 | 联想(北京)有限公司 | Detection method and device |
CN114494954A (en) * | 2022-01-18 | 2022-05-13 | 北京达佳互联信息技术有限公司 | Video identification method and device, electronic equipment and storage medium |
CN115272923A (en) * | 2022-07-22 | 2022-11-01 | 华中科技大学同济医学院附属协和医院 | Intelligent identification method and system based on big data platform |
CN115761598A (en) * | 2022-12-20 | 2023-03-07 | 昆明思碓网络科技有限公司 | Big data analysis method and system based on cloud service platform |
CN116389761A (en) * | 2023-05-15 | 2023-07-04 | 南京邮电大学 | Clinical simulation teaching data management system of nursing |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100851981B1 (en) * | 2007-02-14 | 2008-08-12 | 삼성전자주식회사 | Liveness detection method and apparatus in video image |
CN104794464A (en) * | 2015-05-13 | 2015-07-22 | 上海依图网络科技有限公司 | In vivo detection method based on relative attributes |
CN105426815A (en) * | 2015-10-29 | 2016-03-23 | 北京汉王智远科技有限公司 | Living body detection method and device |
CN106557723A (en) * | 2015-09-25 | 2017-04-05 | 北京市商汤科技开发有限公司 | A kind of system for face identity authentication with interactive In vivo detection and its method |
CN106897658A (en) * | 2015-12-18 | 2017-06-27 | 腾讯科技(深圳)有限公司 | The discrimination method and device of face live body |
CN107346422A (en) * | 2017-06-30 | 2017-11-14 | 成都大学 | A kind of living body faces recognition methods based on blink detection |
CN107358153A (en) * | 2017-06-02 | 2017-11-17 | 广州视源电子科技股份有限公司 | Mouth movement detection method and device and living body identification method and system |
CN107358155A (en) * | 2017-06-02 | 2017-11-17 | 广州视源电子科技股份有限公司 | Method and device for detecting ghost face action and method and system for recognizing living body |
CN107392089A (en) * | 2017-06-02 | 2017-11-24 | 广州视源电子科技股份有限公司 | Eyebrow movement detection method and device and living body identification method and system |
US20170345181A1 (en) * | 2016-05-27 | 2017-11-30 | Beijing Kuangshi Technology Co., Ltd. | Video monitoring method and video monitoring system |
US20180308107A1 (en) * | 2017-04-24 | 2018-10-25 | Guangdong Matview Intelligent Science & Technology Co., Ltd. | Living-body detection based anti-cheating online research method, device and system |
WO2018202089A1 (en) * | 2017-05-05 | 2018-11-08 | 商汤集团有限公司 | Key point detection method and device, storage medium and electronic device |
2018-12-14: CN CN201811532116.4A patent/CN109697416B/en active Active
Non-Patent Citations (6)
Title |
---|
TAO WANG 等: "Face Liveness Detection Using 3D Structure Recovered from a Single Camera", 《2013 INTERNATIONAL CONFERENCE ON BIOMETRICS (ICB)》 * |
XIAO-NAN HOU 等: "Similarity metric learning for face verification using sigmoid decision function", 《VIS COMPUT》 * |
XUAN QI 等: "CNN Based Key Frame Extraction for Face in Video Recognition", 《2018 IEEE 4TH INTERNATIONAL CONFERENCE ON IDENTITY, SECURITY, AND BEHAVIOR ANALYSIS (ISBA)》 * |
LI Wenbo et al.: "Fourier-based shape-context motion capture for cartoon animation", Computer and Modernization *
YANG Jianwei: "Research on face liveness detection methods for face recognition", China Master's Theses Full-text Database, Information Science and Technology *
WANG Dehui: "Application and research of video-based face recognition technology in a prison AB-door control system", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110390033A (en) * | 2019-07-25 | 2019-10-29 | 腾讯科技(深圳)有限公司 | Training method, device, electronic equipment and the storage medium of image classification model |
CN110390033B (en) * | 2019-07-25 | 2023-04-21 | 腾讯科技(深圳)有限公司 | Training method and device for image classification model, electronic equipment and storage medium |
CN111242178A (en) * | 2020-01-02 | 2020-06-05 | 杭州睿琪软件有限公司 | Object identification method, device and equipment |
WO2021135828A1 (en) * | 2020-01-02 | 2021-07-08 | 杭州睿琪软件有限公司 | Object identification method, apparatus and device |
CN111460419A (en) * | 2020-03-31 | 2020-07-28 | 周亚琴 | Internet of things artificial intelligence face verification method and Internet of things cloud server |
CN111460419B (en) * | 2020-03-31 | 2020-11-27 | 深圳市微网力合信息技术有限公司 | Internet of things artificial intelligence face verification method and Internet of things cloud server |
US11625646B2 (en) | 2020-04-06 | 2023-04-11 | Huawei Cloud Computing Technologies Co., Ltd. | Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks |
WO2021203667A1 (en) * | 2020-04-06 | 2021-10-14 | Huawei Technologies Co., Ltd. | Method, system and medium for identifying human behavior in a digital video using convolutional neural networks |
CN111507301A (en) * | 2020-04-26 | 2020-08-07 | 腾讯科技(深圳)有限公司 | Video processing method, video processing device, computer equipment and storage medium |
CN111507301B (en) * | 2020-04-26 | 2021-06-08 | 腾讯科技(深圳)有限公司 | Video processing method, video processing device, computer equipment and storage medium |
CN111291736B (en) * | 2020-05-07 | 2020-08-25 | 南京景三医疗科技有限公司 | Image correction method and device and medical equipment |
CN111291736A (en) * | 2020-05-07 | 2020-06-16 | 南京景三医疗科技有限公司 | Image correction method and device and medical equipment |
CN111836072A (en) * | 2020-05-21 | 2020-10-27 | 北京嘀嘀无限科技发展有限公司 | Video processing method, device, equipment and storage medium |
CN111860107A (en) * | 2020-05-28 | 2020-10-30 | 四川中科凯泽科技有限公司 | Standing long jump evaluation method based on deep learning attitude estimation |
CN111881726A (en) * | 2020-06-15 | 2020-11-03 | 马上消费金融股份有限公司 | Living body detection method and device and storage medium |
CN111932604A (en) * | 2020-08-24 | 2020-11-13 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and device for measuring human ear characteristic distance |
CN112016437B (en) * | 2020-08-26 | 2023-02-10 | 中国科学院重庆绿色智能技术研究院 | Living body detection method based on face video key frame |
CN112016437A (en) * | 2020-08-26 | 2020-12-01 | 中国科学院重庆绿色智能技术研究院 | Living body detection method based on face video key frame |
CN112055247A (en) * | 2020-09-11 | 2020-12-08 | 北京爱奇艺科技有限公司 | Video playing method, device, system and storage medium |
CN112182256A (en) * | 2020-09-28 | 2021-01-05 | 长城汽车股份有限公司 | Object identification method and device and vehicle |
CN112287850A (en) * | 2020-10-30 | 2021-01-29 | 维沃移动通信有限公司 | Article information identification method and device, electronic equipment and readable storage medium |
CN113178206B (en) * | 2021-04-22 | 2022-05-31 | 内蒙古大学 | AI (Artificial intelligence) composite anchor generation method, electronic equipment and readable storage medium |
CN113178206A (en) * | 2021-04-22 | 2021-07-27 | 内蒙古大学 | AI (Artificial intelligence) composite anchor generation method, electronic equipment and readable storage medium |
CN113158918A (en) * | 2021-04-26 | 2021-07-23 | 深圳市商汤科技有限公司 | Video processing method and device, electronic equipment and storage medium |
CN113657251A (en) * | 2021-08-16 | 2021-11-16 | 联想(北京)有限公司 | Detection method and device |
CN114494954A (en) * | 2022-01-18 | 2022-05-13 | 北京达佳互联信息技术有限公司 | Video identification method and device, electronic equipment and storage medium |
CN115272923B (en) * | 2022-07-22 | 2023-04-21 | 华中科技大学同济医学院附属协和医院 | Intelligent identification method and system based on big data platform |
CN115272923A (en) * | 2022-07-22 | 2022-11-01 | 华中科技大学同济医学院附属协和医院 | Intelligent identification method and system based on big data platform |
CN115761598A (en) * | 2022-12-20 | 2023-03-07 | 昆明思碓网络科技有限公司 | Big data analysis method and system based on cloud service platform |
CN115761598B (en) * | 2022-12-20 | 2023-09-08 | 易事软件(厦门)股份有限公司 | Big data analysis method and system based on cloud service platform |
CN116389761A (en) * | 2023-05-15 | 2023-07-04 | 南京邮电大学 | Clinical simulation teaching data management system of nursing |
CN116389761B (en) * | 2023-05-15 | 2023-08-08 | 南京邮电大学 | Clinical simulation teaching data management system of nursing |
Also Published As
Publication number | Publication date |
---|---|
CN109697416B (en) | 2022-11-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109697416A (en) | Video data processing method and related device | |
CN112215180B (en) | Living body detection method and device | |
CN105872477B (en) | video monitoring method and video monitoring system | |
CN110210276A (en) | Motion track acquisition method and device, storage medium, and terminal | |
CN112183353B (en) | Image data processing method and device and related equipment | |
CN110210535A (en) | Neural network training method and device and image processing method and device | |
CN107358146A (en) | Method for processing video frequency, device and storage medium | |
CN112312087B (en) | Method and system for quickly positioning event occurrence time in long-term monitoring video | |
CN109299658B (en) | Face detection method, face image rendering device and storage medium | |
CN106156702A (en) | Identity identifying method and equipment | |
CN113269091A (en) | Personnel trajectory analysis method, equipment and medium for intelligent park | |
CN109472193A (en) | Method for detecting human face and device | |
CN110442742A (en) | Retrieve method and device, processor, electronic equipment and the storage medium of image | |
CN106372603A (en) | Shielding face identification method and shielding face identification device | |
CN110069983A (en) | Vivo identification method, device, terminal and readable medium based on display medium | |
CN109948727A (en) | The training and classification method of image classification model, computer equipment and storage medium | |
CN108960145A (en) | Facial image detection method, device, storage medium and electronic equipment | |
CN112836625A (en) | Face living body detection method and device and electronic equipment | |
CN106407908A (en) | Training model generation method and human face detection method and device | |
KR20200060942A (en) | Method for face classifying based on trajectory in continuously photographed image | |
CN108171135A (en) | Method for detecting human face, device and computer readable storage medium | |
CN111259757B (en) | Living body identification method, device and equipment based on image | |
CN111881740A (en) | Face recognition method, face recognition device, electronic equipment and medium | |
CN115082992A (en) | Face living body detection method and device, electronic equipment and readable storage medium | |
CN114299583A (en) | Face authentication identification method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |