CN108734144A

CN108734144A - Speaker identity authentication method based on face recognition

Info

Publication number: CN108734144A
Application number: CN201810524477.8A
Authority: CN
Inventors: 张轶君
Original assignee: Beijing Wen Xiang Information Technology Co Ltd
Current assignee: Beijing Wen Xiang Information Technology Co Ltd
Priority date: 2018-05-28
Filing date: 2018-05-28
Publication date: 2018-11-02

Abstract

The present invention relates to a speaker identity authentication method based on face recognition, which solves the technical problem of complicated operation. The method comprises identity information enrollment and identity recognition. Identity information enrollment includes detecting the position of the speaker's face in the speaker's video image using a face detection algorithm, extracting features from the speaker's face image to obtain a face feature code T, and inputting the identity information Info corresponding to the speaker. Identity recognition includes continuously capturing and analyzing the video signal of the recording and broadcasting system and analyzing the statistics of multiple recognition results to obtain the identity information of the speaker for the current recording. This technical solution solves the problem well and can be used in multimedia applications.

Description

Speaker identity authentication method based on face recognition

Technical field

The present invention relates to the multimedia field, and in particular to a speaker identity authentication method based on face recognition.

Background technology

Recording and broadcasting is now a function of many information-enabled meeting rooms and classrooms. As recording systems have matured, fully automatic tracking shooting and recording has become a standard feature. The management of recorded content, however, is still largely manual. As the volume of recorded content grows, archiving it becomes increasingly complex, yet sound and accurate archiving is required before a video can finally be distributed and preserved. Studios in schools and companies are shared, and the equipment maintainers cannot know every speaker. To the speaker, the recording system is professional equipment, so it is not suitable for the speaker to log in and operate it.

The current practice is mainly for the person recording or the equipment maintainer to fill in the recording information manually, before or after the recording, as the means of identifying the speaker. This is complicated to operate, has a high labor cost, and is prone to omissions. It is therefore necessary to provide an easy-to-operate speaker identity authentication method based on face recognition. The present invention identifies the speaker mainly by recognizing the speaker's face in the recorded video, providing technical support for automated archiving of recorded video.

Invention content

The technical problem to be solved by the present invention is the complicated operation found in the prior art. A new speaker identity authentication method based on face recognition is provided, which is simple to operate, not prone to omissions, and highly automated.

To solve the above technical problem, the following technical solution is adopted:

A speaker identity authentication method based on face recognition, the method including identity information enrollment and identity recognition; the identity information enrollment includes detecting the position of the speaker's face in the speaker's video image using a face detection algorithm, extracting features from the speaker's face image to obtain a face feature code T, and inputting the identity information Info corresponding to the speaker;

The identity recognition includes:

Step 1: start recording the live broadcast and continuously obtain images F from the recorded video;

Step 2: preprocess image F to obtain image G;

Step 3: perform face detection on image G using the same face detection algorithm as in identity information enrollment, locate the face regions in the image, and obtain the face image sequence L;

Step 4: extract feature values from the face image sequence L using the same method as in identity information enrollment, obtaining the face feature sequence LT of the face image sequence L;

Step 5: compute the distance between each feature code in the face feature sequence LT and each feature code TD stored in the database FData, where the distance metric Lx is the squared multidimensional Euclidean distance; preset a threshold lambda and compare the distance Lx with lambda; if Lx is less than lambda, read the Info corresponding to TD and increment its mark count by 1;

Step 6: repeat Steps 1 to 5 until the recording ends, and take the identity information Info with the largest mark count as the speaker's identity information.

In the above solution, as an optimization, the identity information enrollment further includes:

Step A1: record a picture in which the speaker is the only person, with the speaker facing the camera;

Step A2: capture the main broadcast video stream to obtain image FF;

Step A3: preprocess image FF to obtain image GG;

Step A4: perform face detection on image GG, determine the face region in image GG, and obtain face image RR;

Step A5: extract feature values from face image RR using the feature value extraction method to obtain the face feature code T;

Step A6: input the speaker's information as Info, associate it with the face feature code T, and save both in the database FData.

Further, the feature value extraction method includes:

Step A: calibrate the face image R by marking 68 feature points;

Step B: adjust the feature points from Step A to fixed positions using an affine transformation, converting the face orientation in face image R to frontal;

Step C: encode the face image obtained in Step B, encoding the face features as an N-dimensional array T; the distance between arrays Ti and Tj is computed as:

D(i, j) = ∑_k (Ti(k) - Tj(k))².
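As a minimal sketch of the distance formula above, the following Python function computes D(i, j) for two feature codes of equal length. The function name `squared_distance` and the use of plain Python sequences are illustrative assumptions, not part of the disclosure.

```python
from typing import Sequence


def squared_distance(ti: Sequence[float], tj: Sequence[float]) -> float:
    """Squared Euclidean distance D(i, j) = sum over k of (Ti(k) - Tj(k))^2."""
    if len(ti) != len(tj):
        raise ValueError("feature codes must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(ti, tj))
```

Because the square root is monotonic, comparing squared distances against a threshold orders candidates the same way as comparing Euclidean distances, while avoiding the root.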

Further, the identity information Info includes name, user name, and gender.

Further, the preprocessing is image denoising.

Beneficial effects of the present invention: the invention extracts face feature information with a new deep learning algorithm to perform identity recognition, which greatly improves accuracy and reliability. Because it analyzes the video signal of the recording system, no additional terminal equipment is needed, which reduces hardware cost and later maintenance cost. Because identity is recognized from the face, the speaker does not need to log in or perform any additional operation, so routine recording is truly achieved. The relatively independent identification system only needs to maintain an ultra-lightweight face information management platform, which greatly improves the maintainability of the system.

The present invention makes full use of the existing recording system equipment and extracts the recording system's video for identity recognition, whereas fingerprint and card-swiping approaches require additional equipment at the terminal. The recognition process is merged into the recording process itself, so the speaker performs no special operation. This ensures routine recording for the speaker and avoids the prior-art risk that the speaker forgets to swipe a card or register a fingerprint.

Description of the drawings

The present invention is further described below with reference to the drawings and embodiments.

Fig. 1 is a schematic diagram of the speaker identity authentication method based on face recognition.

Fig. 2 is a flow diagram of identity information enrollment.

Specific implementation mode

To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.

Embodiment 1

This embodiment provides a speaker identity authentication method based on face recognition. As shown in Fig. 1, the method includes identity information enrollment and identity recognition; the identity information enrollment includes detecting the position of the speaker's face in the speaker's video image using a face detection algorithm, extracting features from the speaker's face image to obtain the face feature code T, and inputting the identity information Info corresponding to the speaker; the identity recognition includes Step 1: start recording the live broadcast and continuously obtain images F from the recorded video;

Step 2: preprocess image F to obtain image G;

As shown in Fig. 2, the identity information enrollment part:

Before identity information enrollment starts, ensure that only the speaker is in front of the recording camera and is facing it, then turn on the recording equipment so that it captures images and transmits video normally. All current recording equipment supports remote operation and can automatically send the recorded video to the network as a video stream.

The system captures the main broadcast video stream from the recording and broadcasting system; in this embodiment the system captures the recording system's RTMP main stream, obtaining image F;

Video preprocessing, such as denoising and brightness equalization, is applied to image F; this embodiment only denoises the image, using the existing bilateral filter method with the parameters brightness sigma = 20 and space sigma = 7, obtaining image G;
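The denoising step can be illustrated with a plain-Python bilateral filter on a grayscale image. This is a sketch under assumptions: the image is a list of rows of brightness values, the window `radius` is a hypothetical parameter that the text does not give, and `sigma_brightness` and `sigma_space` are assumed to correspond to the brightness sigma and space sigma quoted above.

```python
import math
from typing import List

Image = List[List[float]]


def bilateral_filter(img: Image, radius: int = 2,
                     sigma_brightness: float = 20.0,
                     sigma_space: float = 7.0) -> Image:
    """Edge-preserving smoothing: each output pixel is a weighted mean of its
    neighbours, weighted by spatial closeness and by brightness similarity."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            centre = img[y][x]
            total = 0.0
            weight_sum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w):
                        continue  # ignore neighbours outside the image
                    value = img[ny][nx]
                    w_space = math.exp(-(dx * dx + dy * dy)
                                       / (2.0 * sigma_space ** 2))
                    w_bright = math.exp(-((value - centre) ** 2)
                                        / (2.0 * sigma_brightness ** 2))
                    weight = w_space * w_bright
                    total += weight * value
                    weight_sum += weight
            # weight_sum >= 1 because the centre pixel always has weight 1
            out[y][x] = total / weight_sum
    return out
```

The brightness term is what distinguishes this from a Gaussian blur: neighbours across a strong edge receive almost no weight, so facial contours stay sharp while small brightness noise is averaged out.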

A face detection algorithm is applied to image G to find the face region image R in the image; the face detection algorithm is a publicly available fast algorithm, which in this example is detection based on histogram of oriented gradients (HOG) features.
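To show what a HOG feature is, the sketch below computes the orientation histogram of a single image cell in plain Python. It covers only the descriptor building block; a complete HOG detector also normalizes blocks of cells and runs a trained classifier over a sliding window, none of which is specified in the text. The function name, the 9-bin unsigned layout, and the central-difference gradient are assumptions chosen for illustration.

```python
import math
from typing import List


def hog_cell_histogram(cell: List[List[float]], bins: int = 9) -> List[float]:
    """Histogram of unsigned gradient orientations (0 to 180 degrees) for one
    cell, with each interior pixel voting by its gradient magnitude."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

A cell whose brightness increases from left to right puts all of its votes in the first bin, while a cell whose brightness increases from top to bottom votes in the bin containing 90 degrees.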

Feature values are extracted from the detected face image R. Face feature extraction is divided into the following steps:

(1) Calibrate the face in image R by marking 68 feature points, including the top of the chin, the outer contour of each eye, the inner contour of each eyebrow, and so on; an existing feature calibration method is used;

(2) Adjust the feature points found in the previous step to fixed positions using an affine transformation; this transformation converts the face orientation in face image R to frontal, and a simple affine transformation ensures that the face image is not distorted;

(3) Encode the face features, that is, express them as an N-dimensional array T (N between 64 and 256), so that the array T extracted from each face image is different. For different pictures of the same person, the arrays T are close together in N-dimensional space; for pictures of different people, the arrays T are far apart. The feature encoding method used in this example is one obtained by training a deep neural network, an approach widely used in face recognition applications. In this step, the array T has 128 dimensions. The distance between different arrays T is, in this example, the squared Euclidean distance; for example, the distance between Ti and Tj is:

D(i, j) = ∑_k (Ti(k) - Tj(k))²;
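The alignment in step (2) can be illustrated by estimating an affine transformation from landmark correspondences. The sketch below is a simplification under stated assumptions: it solves exactly for the six affine parameters from three non-collinear landmarks, whereas aligning all 68 landmarks would normally use a least-squares fit. The function names and the landmark coordinates in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system with Cramer's rule."""
    det = _det3(a)
    if abs(det) < 1e-12:
        raise ValueError("landmarks are collinear")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = rhs[r]
        solution.append(_det3(m) / det)
    return solution


def estimate_affine(src: Sequence[Point], dst: Sequence[Point]):
    """Return ((a, b, c), (d, e, f)) with x' = a*x + b*y + c, y' = d*x + e*y + f,
    mapping the three source landmarks onto the three fixed target positions."""
    a = [[x, y, 1.0] for x, y in src]
    row_x = _solve3(a, [p[0] for p in dst])
    row_y = _solve3(a, [p[1] for p in dst])
    return tuple(row_x), tuple(row_y)


def apply_affine(matrix, point: Point) -> Point:
    (a, b, c), (d, e, f) = matrix
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)
```

For example, `estimate_affine([(30, 40), (70, 45), (50, 80)], [(35, 40), (65, 40), (50, 85)])` yields a matrix that moves two slightly tilted eye positions and a chin point onto level, centred target positions; the same matrix is then applied to every pixel coordinate of the face image.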

Finally, the speaker's information, such as name, user name, and gender, is entered into the identification system and denoted Info; the user name is the user name on the personnel management platform of the speaker's organization, which facilitates later data integration. After input, the system saves the face feature code T extracted in step five (the feature value extraction step), together with the corresponding Info, into the database FData. This completes the identity enrollment process.
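One way to picture the stored data is an in-memory table that pairs each feature code T with its Info. The sketch below is illustrative only: the class names, the choice of the user name as key, and the in-memory dictionary are assumptions, since the text does not describe how FData is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Info:
    name: str
    username: str  # user name on the organization's personnel platform
    gender: str


@dataclass
class FaceRecord:
    code: List[float]  # face feature code T
    info: Info


class FData:
    """Hypothetical in-memory stand-in for the database FData."""

    def __init__(self, dimension: int = 128) -> None:
        self.dimension = dimension
        self._records: Dict[str, FaceRecord] = {}

    def enroll(self, code: Sequence[float], info: Info) -> None:
        """Save the pair (T, Info); re-enrolling a user name overwrites it."""
        if len(code) != self.dimension:
            raise ValueError("feature code has the wrong dimension")
        self._records[info.username] = FaceRecord(list(code), info)

    def records(self) -> List[FaceRecord]:
        return list(self._records.values())
```

Only the feature code and the identity fields are stored, never the face image itself, which keeps each record to a few hundred numbers.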

Identity recognition part:

After the speaker starts recording, the identification system begins capturing the video stream from the recording system and continuously obtains images F;

Image F is preprocessed; in this example the image is only denoised, using the existing bilateral filter method with the parameters brightness sigma = 20 and space sigma = 7, obtaining image G;

A face detection algorithm is applied to image G to find the face regions in the image, yielding a list of regions L that may represent more than one person; the face detection algorithm is HOG feature detection, the same algorithm as in the identity enrollment part.

Feature values are extracted from the detected face image sequence L, using the same feature value extraction steps as in the identity enrollment part, finally obtaining the feature sequence LT of all faces in L;

The pairwise distances between the feature codes in the database FData and those in the feature sequence LT are computed, using the squared multidimensional Euclidean distance as the metric. When the distance between a feature code T in LT and a feature code TD in FData is less than a threshold lambda, the Info corresponding to TD is read and its occurrence count is incremented by one (the initial occurrence count is 0);
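The per-frame matching just described can be sketched as follows. The representation of FData as a list of (user name, TD) pairs, the function names, and the threshold value used in the usage note are assumptions made for illustration; the text does not give a value for lambda.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Code = Sequence[float]


def squared_distance(a: Code, b: Code) -> float:
    """Squared Euclidean distance between two feature codes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def count_matches(fdata: List[Tuple[str, Code]], lt: List[Code],
                  threshold: float, counts: Counter) -> Counter:
    """Compare every code T in LT with every stored code TD; each pair whose
    distance is below the threshold adds one occurrence for that identity."""
    for t in lt:
        for username, td in fdata:
            if squared_distance(t, td) < threshold:
                counts[username] += 1
    return counts
```

With `fdata = [("zhang", [0.0, 0.0]), ("li", [1.0, 1.0])]`, a frame containing the codes `[0.1, 0.0]`, `[0.9, 1.0]`, and `[5.0, 5.0]`, and a toy threshold of 0.25, the first two faces each add one occurrence to the matching identity and the third face matches nobody. Passing the same `Counter` to every frame accumulates the occurrence counts over the whole recording.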

Steps 1 to 5 are repeated until the recording ends; the system then takes the identity information Info with the largest occurrence count and sends it to the recording system as the speaker information for this recording. Through Step 6, continuous face recognition ensures that the speaker's frontal face is captured correctly and further reduces the face recognition error rate.
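The final selection amounts to a vote over all frames of the recording. In the sketch below, each frame's result is represented as a list of matched user names; this representation, the function name, and the sample names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional


def select_speaker(frame_matches: Iterable[List[str]]) -> Optional[str]:
    """Return the identity matched most often across all frames,
    or None if no frame produced a match."""
    counts: Counter = Counter()
    for matched in frame_matches:
        counts.update(matched)
    if not counts:
        return None
    winner, _ = counts.most_common(1)[0]
    return winner


# Five frames: the speaker is matched three times, two other people once each.
frames = [["zhang"], ["zhang", "li"], [], ["wang"], ["zhang"]]
speaker = select_speaker(frames)
```

Here `speaker` is `"zhang"`: audience members who wander into a frame, or an occasional false match, are outvoted by the person who is in front of the camera for most of the recording. How ties are broken is not stated in the text; this sketch simply returns whichever tied identity `Counter.most_common` lists first.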

This embodiment performs face recognition analysis on the video signal of the recording and broadcasting system, without adding extra equipment at the terminal. Video stream transmission is a standard feature of current recording systems; this example captures an RTMP stream, but images may also be transmitted over a proprietary protocol.

Both the identity information enrollment and the identity recognition of the speaker in this embodiment use the video signal of the recording and broadcasting system, rather than static information such as a photo of the speaker.

This embodiment enrolls the user's face feature code rather than the face image itself, which greatly reduces the complexity of comparison and the amount of information stored.

In the recognition part, this embodiment continuously captures and analyzes the video signal of the recording and broadcasting system through Step 6 and analyzes the statistics of multiple results to obtain the speaker's identity information for this recording. This greatly improves the accuracy of face recognition and ensures that the speaker's identity information is obtained accurately under routine recording.

Many face detection and feature encoding methods exist at present, and the examples given in the present invention use only some of them. The content of the invention can equally be implemented by combining other face detection algorithms and face feature encoding methods according to the steps of the invention. Although illustrative embodiments of the invention are described above so that those skilled in the art can understand it, the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, as long as the various changes fall within the spirit and scope of the invention as defined and determined by the appended claims, all inventions and creations that make use of the inventive concept are protected.

Claims

1. A speaker identity authentication method based on face recognition, characterized in that the speaker identity authentication method includes identity information enrollment and identity recognition;

The identity information enrollment includes detecting the position of the speaker's face in the speaker's video image using a face detection algorithm, extracting features from the speaker's face image to obtain a face feature code T, and inputting the identity information Info corresponding to the speaker;

The identity recognition includes:

Step 1: start recording the live broadcast and continuously obtain images F from the recorded video;

Step 2: preprocess image F to obtain image G;

Step 3: perform face detection on image G using the same face detection algorithm as in identity information enrollment, locate the face regions in the image, and obtain the face image sequence L;

Step 4: extract feature values from the face image sequence L using the same method as in identity information enrollment, obtaining the face feature sequence LT of the face image sequence L;

Step 5: compute the distance between each feature code in the face feature sequence LT and each feature code TD stored in the database FData, where the distance metric Lx is the squared multidimensional Euclidean distance; preset a threshold lambda and compare the distance Lx with lambda; if Lx is less than lambda, read the Info corresponding to TD and increment its mark count by 1;

Step 6: repeat Steps 1 to 5 until the recording ends, and take the identity information Info with the largest mark count as the speaker's identity information.

2. The speaker identity authentication method based on face recognition according to claim 1, characterized in that the identity information enrollment includes:

Step A1: record a picture in which the speaker is the only person, with the speaker facing the camera;

Step A2: capture the main broadcast video stream to obtain image FF;

Step A3: preprocess image FF to obtain image GG;

Step A4: perform face detection on image GG, determine the face region in image GG, and obtain face image RR;

Step A5: extract feature values from face image RR using the feature value extraction method to obtain the face feature code T;

Step A6: input the speaker's information as Info, associate it with the face feature code T, and save both in the database FData.

3. The speaker identity authentication method based on face recognition according to claim 2, characterized in that the feature value extraction method includes:

Step A: calibrate the face image R by marking 68 feature points;

Step B: adjust the feature points from Step A to fixed positions using an affine transformation, converting the face orientation in face image R to frontal;

Step C: encode the face image obtained in Step B, encoding the face features as an N-dimensional array T; the distance between arrays Ti and Tj is computed as:

D(i, j) = ∑_k (Ti(k) - Tj(k))².

4. The speaker identity authentication method based on face recognition according to claim 1, characterized in that the identity information Info includes name, user name, and gender.

5. The speaker identity authentication method based on face recognition according to claim 1, characterized in that the preprocessing is image denoising.