CN104537358A

CN104537358A - Lip language recognition lip-shape training database generating method based on deep learning

Info

Publication number: CN104537358A
Application number: CN201510018956.9A
Authority: CN
Inventors: 陈拥权; 李建中; 郑荣稳
Original assignee: ANHUI COSWIT INFORMATION TECHNOLOGY Co Ltd
Current assignee: ANHUI COSWIT INFORMATION TECHNOLOGY Co Ltd
Priority date: 2014-12-26
Filing date: 2014-12-26
Publication date: 2015-04-22

Abstract

The invention discloses a method for generating a lip-shape training database for lip reading recognition based on deep learning. The method comprises the following steps: sound video acquisition, in which the lip video images and speech of a target person are captured synchronously by a camera with a microphone; audio and video analysis, in which a computer analyzes the lip video images using image analysis technology to obtain lip-shape feature values, and analyzes the speech using speech recognition technology to obtain text information; and training database formation, in which the lip-shape feature values are matched with the corresponding text information to generate the training database. This technical scheme greatly improves the efficiency of building a lip-shape model library.

Description

Method for generating a lip-shape training database for lip reading recognition based on deep learning

This application is a divisional application. The original application has application number 201410829417.9 and a filing date of December 26, 2014, and its title is: Method for building a lip-shape model library for lip reading recognition based on deep learning.

Technical field

The present invention relates to the technical field of building model libraries for human-computer interaction, and specifically to a method for generating a lip-shape training database for lip reading recognition based on deep learning.

Background art

With the development of artificial intelligence technology, computer video analysis has begun to be applied to lip reading recognition, in order to solve the problem of issuing spoken commands in noisy settings such as workshops. Lip reading recognition relies on a lip-shape model library, whose accuracy and comprehensiveness directly determine recognition efficiency. In the prior art, lip-shape models are mostly built manually one by one, which not only involves a heavy workload but also makes comprehensiveness difficult to guarantee.

To solve the above problem, the invention provides a method for building a lip-shape model library for lip reading recognition based on deep learning, which effectively reduces manual workload and improves the comprehensiveness of the lip-shape model library.

Summary of the invention

In view of the above problems, the present invention combines deep learning, speech recognition, and image analysis technologies so that a computer learns from a large volume of lip video with sound and builds the lip-shape model library automatically. It provides a method for generating the lip-shape training database for lip reading recognition based on deep learning, thereby effectively improving the efficiency of building the lip-shape model library.

The specific technical scheme provided by the invention is a method for generating a lip-shape training database for lip reading recognition based on deep learning, comprising the following steps:

Sound video acquisition: the lip video images and speech of a target person are captured synchronously by a camera with a microphone;

Audio and video analysis: a computer analyzes the lip video images using image analysis technology to obtain lip-shape feature values, and analyzes the speech using speech recognition technology to obtain text information;

Training database formation: the lip-shape feature values are placed in one-to-one correspondence with the text information to generate the training database.

The number of target persons should be no fewer than two.
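The three steps above can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the names `Segment`, `TrainingRecord`, and `build_training_database` are hypothetical, and the lip-shape feature values and recognized text are assumed to have already been produced by the image analysis and speech recognition steps, which the text does not specify in detail.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """One synchronously captured utterance (hypothetical structure)."""
    speaker_id: str                # which target person was recorded
    lip_features: Sequence[float]  # assumed output of the image analysis step
    text: str                      # assumed output of the speech recognition step


@dataclass(frozen=True)
class TrainingRecord:
    """One entry of the training database: features paired with text."""
    lip_features: Tuple[float, ...]
    text: str


def build_training_database(
    segments: List[Segment], min_speakers: int = 2
) -> List[TrainingRecord]:
    """Pair each segment's lip features with its text, one record per segment."""
    speakers = {s.speaker_id for s in segments}
    # The scheme calls for no fewer than two target persons.
    if len(speakers) < min_speakers:
        raise ValueError(
            f"need at least {min_speakers} target persons, got {len(speakers)}"
        )
    return [TrainingRecord(tuple(s.lip_features), s.text) for s in segments]
```

Each input segment yields exactly one record, which mirrors the one-to-one correspondence between feature values and text described above.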

Beneficial effects: in the method provided by the invention for building a lip-shape model library for lip reading recognition based on deep learning, a computer automatically analyzes the lip images and speech, extracts the lip-shape feature values, and generates the text corresponding to the speech to form a training database. Deep learning is then applied to the training database to build the lip-shape model library. This technical scheme provides an efficient means of building a lip-shape model library and significantly improves the efficiency of its construction.

Brief description of the drawings

Fig. 1 is a workflow diagram of the present invention.

Embodiment

To describe the present invention more specifically, its technical scheme is described in detail below with reference to the drawing and a specific embodiment.

As shown in Fig. 1, the lip video images and speech of the target person are first captured synchronously by a camera with a microphone. During capture, the video and audio must be kept synchronized so that problems such as audio delay do not introduce errors into the subsequent analysis. Next, a computer analyzes the lip video images using image analysis technology to obtain lip-shape feature values, and analyzes the speech using speech recognition technology to obtain the corresponding text information. The lip-shape feature values are placed in one-to-one correspondence with the text information to generate the training database. The computer then learns from the training database using deep learning to build the lip-shape model library.
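To illustrate why synchronization matters, the sketch below assigns video frames to recognized words by timestamp and compensates for a known audio delay. It is an assumption-laden example: the function name `align_frames_to_words`, the `(text, start_ms, end_ms)` word format, and the `audio_delay_ms` parameter are all hypothetical, and the text does not say how the delay would be measured or corrected.

```python
from typing import List, Sequence, Tuple

Word = Tuple[str, int, int]  # (text, start_ms, end_ms) on the audio clock


def align_frames_to_words(
    frame_times_ms: Sequence[int],
    words: Sequence[Word],
    audio_delay_ms: int = 0,
) -> List[Tuple[str, List[int]]]:
    """Return, for each word, the indices of the video frames that show it.

    audio_delay_ms is how far the audio lags behind the video (assumed to be
    known); word times are shifted back by that amount before matching.
    """
    aligned = []
    for text, start_ms, end_ms in words:
        start = start_ms - audio_delay_ms
        end = end_ms - audio_delay_ms
        # A frame belongs to a word if its timestamp falls in [start, end).
        frames = [i for i, t in enumerate(frame_times_ms) if start <= t < end]
        aligned.append((text, frames))
    return aligned
```

If the delay is ignored when it is actually nonzero, each word is paired with frames from the wrong moment, which is the kind of error the embodiment warns about.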

Each lip-shape model in the lip-shape model library corresponds to a piece of text, namely the text obtained from the speech.
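The word-to-model correspondence can be pictured with a deliberately simple stand-in. The sketch below does not use deep learning: it groups training records by text and averages the feature vectors for each word, purely to show the shape of a library keyed by text. The function name `build_lip_model_library` and the use of a mean vector as the "model" are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def build_lip_model_library(
    records: Iterable[Tuple[Sequence[float], str]]
) -> Dict[str, List[float]]:
    """Map each text label to one model (here: the mean feature vector).

    The averaging is a placeholder for the deep learning step, which the
    source text does not describe in detail.
    """
    grouped: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for features, text in records:
        grouped[text].append(features)

    library: Dict[str, List[float]] = {}
    for text, vectors in grouped.items():
        count = len(vectors)
        # Average each feature dimension across all samples of this word.
        library[text] = [sum(column) / count for column in zip(*vectors)]
    return library
```

In a real system the averaged vector would be replaced by whatever model the deep learning step produces; only the mapping from text to model is taken from the description.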

For deep learning, the training effect depends on the amount of data in the training database: the larger the data volume, the more accurate the training result. Therefore, to obtain a better lip-shape model library, as much video with sound as possible should be collected to form a richer training database.

The foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
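One practical way to act on this is to check which words are still thinly covered before collecting more video. The sketch below is a hypothetical helper, not part of the described method: the name `underrepresented_words` and the `min_samples` threshold are assumptions.

```python
from collections import Counter
from typing import Iterable, List


def underrepresented_words(texts: Iterable[str], min_samples: int) -> List[str]:
    """List the words that have fewer than min_samples training records."""
    counts = Counter(texts)
    # Sorted so the report is stable from run to run.
    return sorted(word for word, n in counts.items() if n < min_samples)
```

Words returned by this check are candidates for further recording sessions.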

Claims

1. A method for generating a lip-shape training database for lip reading recognition based on deep learning, characterized in that the method comprises the following steps:

2. The method for generating a lip-shape training database for lip reading recognition based on deep learning as claimed in claim 1, characterized in that the number of target persons is no fewer than two.