Summary of the invention
The technical problem to be solved by the present invention is the complicated operation found in the prior art. The invention provides a new speaker identity authentication method based on face recognition, which is simple to operate, not prone to omissions, and highly automated.
In order to solve the above technical problems, the technical solution used is as follows:
A speaker identity authentication method based on face recognition, the method comprising identity information entry and identity recognition. The identity information entry comprises: using a face detection algorithm to detect the position of the speaker's face in a video image of the speaker, extracting features from the speaker's face image to obtain a face feature code T, and entering the identity information Info corresponding to the speaker;
The identity recognition comprises:
Step 1: start recording the live broadcast, and continuously obtain images F from the recorded video;
Step 2: preprocess image F to obtain image G;
Step 3: perform face detection on image G using the same face detection algorithm as in identity information entry, locate the face regions in the image, and obtain a face image sequence L;
Step 4: extract feature values from the face image sequence L using the same method as in identity information entry, obtaining the face feature sequence LT of the faces in L;
Step 5: compute the distance between the features in database FData and the features in the face feature sequence LT, the distance metric Lx being the squared multidimensional Euclidean distance; with a preset threshold lambda, compare distance Lx against lambda; if Lx is less than lambda, read the information Info corresponding to the matched feature code TD in FData and increment its mark count by 1;
Step 6: repeat steps 1 to 5 until recording ends, and take the identity information Info with the largest mark count as the speaker's identity information.
In the above scheme, as an optimization, the identity information entry further comprises:
Step A1: record footage in which only the speaker appears, with the speaker facing the camera;
Step A2: capture the main broadcast video stream to obtain image FF;
Step A3: preprocess image FF to obtain image GG;
Step A4: perform face detection on image GG, determine the face region in image GG, and obtain face image RR;
Step A5: extract feature values from face image RR using the feature value extraction method, obtaining the face feature code T;
Step A6: enter the speaker's information as Info, associate it with the face feature code T, and save both in database FData.
Further, the feature value extraction method comprises:
Step A: annotate face image R with 68 feature points;
Step B: use an affine transformation to move the feature points from step A to fixed positions, converting the face orientation in face image R to frontal;
Step C: encode the face image obtained in step B, representing the face features as an N-dimensional array T; the distance between arrays Ti and Tj is computed as:
$$D(i, j) = \sum_{k} \bigl(T_i(k) - T_j(k)\bigr)^2$$
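The distance formula above can be sketched in a few lines. This is a minimal illustration, assuming feature codes are plain sequences of floats; the function name `squared_distance` and the toy 3-dimensional codes are hypothetical and not taken from the invention.

```python
from typing import Sequence


def squared_distance(t_i: Sequence[float], t_j: Sequence[float]) -> float:
    """Squared Euclidean distance D(i, j) between two feature codes."""
    if len(t_i) != len(t_j):
        raise ValueError("feature codes must have the same dimension N")
    # Sum over k of (Ti(k) - Tj(k))^2; no square root is taken.
    return sum((a - b) ** 2 for a, b in zip(t_i, t_j))


# Toy 3-dimensional codes for illustration only (hypothetical values).
t1 = [0.0, 1.0, 2.0]
t2 = [1.0, 1.0, 4.0]
d12 = squared_distance(t1, t2)
```

Skipping the square root keeps the ordering of distances unchanged, so a threshold comparison works the same way as long as the threshold is chosen on the squared scale.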
Further, the identity information Info includes name, user name and gender.
Further, the preprocessing is image denoising.
Beneficial effects of the present invention: The invention extracts face feature information with a new deep learning algorithm to perform identity recognition, which greatly improves accuracy and reliability. It analyzes the video signal of the recording system, so no additional terminal equipment is needed, which reduces hardware cost and later maintenance cost. Because recognition uses the face, the speaker does not need to perform any extra operation to log in, which makes routine recording practical. The relatively independent recognition system only needs to maintain a lightweight face information management platform, which greatly improves the maintainability of the system.
The invention makes full use of the existing recording system's equipment by extracting the recording system's video for identity recognition, whereas fingerprint and card-swiping approaches require additional equipment at the terminal. The recognition process of the invention is integrated into the whole recording process, and the speaker does not need to perform any special operation. This ensures routine recording and avoids the prior-art risk that the speaker forgets to swipe a card or register a fingerprint.
Embodiment 1
This embodiment provides a speaker identity authentication method based on face recognition. As shown in Fig. 1, the method comprises identity information entry and identity recognition. The identity information entry comprises: using a face detection algorithm to detect the position of the speaker's face in a video image of the speaker, extracting features from the speaker's face image to obtain a face feature code T, and entering the identity information Info corresponding to the speaker. The identity recognition comprises:
Step 1: start recording the live broadcast, and continuously obtain images F from the recorded video;
Step 2: preprocess image F to obtain image G;
Step 3: perform face detection on image G using the same face detection algorithm as in identity information entry, locate the face regions in the image, and obtain a face image sequence L;
Step 4: extract feature values from the face image sequence L using the same method as in identity information entry, obtaining the face feature sequence LT of the faces in L;
Step 5: compute the distance between the features in database FData and the features in the face feature sequence LT, the distance metric Lx being the squared multidimensional Euclidean distance; with a preset threshold lambda, compare distance Lx against lambda; if Lx is less than lambda, read the information Info corresponding to the matched feature code TD in FData and increment its mark count by 1;
Step 6: repeat steps 1 to 5 until recording ends, and take the identity information Info with the largest mark count as the speaker's identity information.
As shown in Fig. 2, the identity information entry part:
Before identity information entry starts, make sure that only the speaker is in front of the recording camera, keep the speaker facing the camera, and turn on the recording equipment for normal image capture and video stream transmission. Current recording equipment all has a remote operation function and can automatically send the recorded video to the network as a video stream.
The system captures the main broadcast video stream from the recording and broadcasting system; in this embodiment the system captures the RTMP main stream of the recording system and obtains image F;
Video preprocessing, such as denoising and brightness equalization, is applied to image F. This embodiment performs only image denoising, using the existing bilateral filter method with brightness sigma = 20 and spatial sigma = 7, and obtains image G;
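The denoising step can be illustrated with a naive bilateral filter. This is a sketch under stated assumptions, not the embodiment's implementation: the image is a grayscale list of rows, the window `radius` of 3 is a hypothetical choice the text does not give, and `sigma_color` stands for the brightness sigma while `sigma_space` stands for the spatial sigma.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image as rows of intensities


def bilateral_filter(img: Image, sigma_color: float = 20.0,
                     sigma_space: float = 7.0, radius: int = 3) -> Image:
    """Naive bilateral filter: each output pixel is a weighted mean of its
    neighbours, weighted by spatial closeness and intensity similarity."""
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = img[y][x]
            weighted_sum = 0.0
            weight_total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue  # ignore neighbours outside the image
                    value = img[ny][nx]
                    # Gaussian weight on spatial distance.
                    w_space = math.exp(-(dx * dx + dy * dy)
                                       / (2 * sigma_space ** 2))
                    # Gaussian weight on intensity difference.
                    w_color = math.exp(-((value - center) ** 2)
                                       / (2 * sigma_color ** 2))
                    weight = w_space * w_color
                    weighted_sum += weight * value
                    weight_total += weight
            # weight_total >= 1 because the centre pixel has weight 1.
            out[y][x] = weighted_sum / weight_total
    return out
```

Because neighbours with very different intensity receive almost no weight, edges such as the face outline are preserved while flat regions are smoothed. In practice a library routine such as OpenCV's `cv2.bilateralFilter`, whose `sigmaColor` and `sigmaSpace` arguments play the same roles, would likely be used instead of this pure-Python loop.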
A face detection algorithm is applied to image G to find the face region image R in the image. The face detection algorithm is a publicly available fast algorithm; in this example, histogram of oriented gradients (HOG) feature detection is used.
Feature value extraction is performed on the detected face image R. Face feature extraction can be divided into the following steps:
(1) Annotate the face in image R with 68 feature points, including the top of the chin, the outer contour of each eye, the inner contour of each eyebrow, and so on; an existing feature point annotation method is used;
(2) Move the feature points found in the previous step to fixed positions using an affine transformation; this transformation converts the face orientation in face image R to frontal; a simple affine transformation ensures that the face image is not distorted;
(3) Encode the face features, i.e. represent them as an N-dimensional array T (N between 64 and 256), so that the feature array T extracted from each face image is different. When the face images are different pictures of the same person, their arrays T are close together in N-dimensional space, whereas the arrays T of pictures of different people are farther apart. The feature encoding method used in this example is obtained by training a deep neural network; this approach is widely used in face recognition applications. In this step, array T has 128 dimensions. The distance between different arrays T is the squared Euclidean distance in this example; for instance, the distance between Ti and Tj is:
$$D(i, j) = \sum_{k} \bigl(T_i(k) - T_j(k)\bigr)^2$$
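Step (2) of the extraction, moving landmarks to fixed positions, can be sketched with a similarity transform (rotation, uniform scale, translation), which is a restricted affine transform that cannot distort the face. This is an illustration under assumptions: only two landmarks (the eye centers) are used to estimate the transform, and the function names and target coordinates are hypothetical. The text does not say how the transform is estimated from the 68 points.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def similarity_from_two_points(src_a: Point, src_b: Point,
                               dst_a: Point, dst_b: Point
                               ) -> Tuple[complex, complex]:
    """Return (a, b) such that z -> a*z + b maps src_a to dst_a and
    src_b to dst_b, treating each point (x, y) as the complex x + yi."""
    p1, p2 = complex(*src_a), complex(*src_b)
    q1, q2 = complex(*dst_a), complex(*dst_b)
    if p1 == p2:
        raise ValueError("source landmarks must be distinct")
    a = (q2 - q1) / (p2 - p1)  # rotation and uniform scale
    b = q1 - a * p1            # translation
    return a, b


def apply_transform(a: complex, b: complex,
                    points: List[Point]) -> List[Point]:
    """Apply z -> a*z + b to every landmark."""
    result = []
    for x, y in points:
        z = a * complex(x, y) + b
        result.append((z.real, z.imag))
    return result


# Hypothetical landmarks of a tilted face: left eye, right eye, chin.
landmarks = [(30.0, 40.0), (70.0, 60.0), (40.0, 100.0)]
# Hypothetical fixed positions for the two eyes (same height = level face).
a, b = similarity_from_two_points(landmarks[0], landmarks[1],
                                  (30.0, 50.0), (70.0, 50.0))
aligned = apply_transform(a, b, landmarks)
```

After alignment both eyes sit at the chosen fixed positions, so the later encoding step sees faces in a consistent pose. A full affine fit over all 68 points (for example by least squares) would be a natural extension.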
Finally, the speaker's information, such as name, user name and gender, is entered into the identity recognition system and denoted Info. The user name is the speaker's user name on the personnel management platform of his or her organization, which facilitates later data integration. After entry, the system saves the face feature code T extracted in step five together with the corresponding information Info into database FData. This completes the identity entry process.
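The stored association between a feature code T and its Info can be sketched as a small in-memory structure. This is a minimal illustration, assuming FData is a list of records; the class names, the `enroll` method and the sample values are hypothetical, and a real system would presumably use a persistent database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Record:
    code: List[float]     # face feature code T
    info: Dict[str, str]  # identity information Info


@dataclass
class FaceDatabase:
    """In-memory stand-in for database FData."""
    records: List[Record] = field(default_factory=list)

    def enroll(self, code: Sequence[float], name: str,
               username: str, gender: str) -> Record:
        """Store a feature code together with the speaker's Info."""
        record = Record(
            code=list(code),
            info={"name": name, "username": username, "gender": gender},
        )
        self.records.append(record)
        return record


fdata = FaceDatabase()
# Hypothetical 128-dimensional code and identity values.
entry = fdata.enroll([0.1] * 128, "Example Speaker", "speaker01", "F")
```

Only the numeric code is stored, not the face image, which matches the embodiment's point that comparison and storage stay small.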
Identity recognition part:
After the speaker starts recording, the identity recognition system begins capturing the video stream from the recording system and continuously obtains images F;
Image preprocessing is applied to image F. In this example only denoising is performed, using the existing bilateral filter method with brightness sigma = 20 and spatial sigma = 7, to obtain image G;
A face detection algorithm is applied to image G to find the face regions in the image, yielding a list of regions L that may represent more than one person. The face detection algorithm is histogram of oriented gradients (HOG) feature detection, the same algorithm as in the identity entry part.
Feature value extraction is performed on the detected face image sequence L, following the same steps as in the identity entry part, to obtain the feature sequence LT of all faces in L;
The pairwise distances between the features in database FData and those in the feature sequence LT are computed, using the squared multidimensional Euclidean distance as the metric. When the distance between a feature code T in LT and a feature code TD in FData is less than a threshold lambda, the information Info corresponding to TD is read and its occurrence count is incremented by one (the initial occurrence count is 0);
Steps one to five are repeated until recording ends; the system then selects the identity information Info with the largest occurrence count and sends it to the recording system as the speaker information for this recording. Through step six, continuous face recognition ensures that the speaker's face is captured correctly and further reduces the face recognition error rate.
This embodiment performs face recognition analysis on the video signal of the recording and broadcasting system, so no extra equipment needs to be added at the terminal. Video stream transmission is a standard function of current recording systems; this example captures the RTMP stream, but images may also be transmitted using a proprietary protocol.
In this embodiment, both the identity information entry and the identity recognition of the speaker use the video signal of the recording and broadcasting system rather than static information such as photos of the speaker.
This embodiment stores the user's face feature code as the entered user data rather than the face image itself, which greatly reduces the complexity of comparison and the amount of information stored.
In the recognition part, this embodiment continuously captures and analyzes the video signal of the recording and broadcasting system through step six, and derives the speaker's identity information for the recording from the statistics of the analysis results. This greatly improves the accuracy of face recognition and ensures that the speaker's identity information is obtained accurately under routine recording.
Many face detection and feature encoding methods are currently available for this embodiment. The examples of the present invention use only some of them; the content of the invention can equally be implemented by combining other face detection algorithms and face feature encoding methods according to the steps of the invention. Although illustrative specific embodiments of the invention are described above so that those skilled in the art can understand the invention, the invention is not limited to the scope of these specific embodiments. To those of ordinary skill in the art, as long as variations fall within the spirit and scope of the invention as defined and determined by the appended claims, all inventions and creations that make use of the inventive concept are protected.