CN104537358A - Lip language recognition lip-shape training database generating method based on deep learning - Google Patents

Lip language recognition lip-shape training database generating method based on deep learning

Info

Publication number
CN104537358A
Authority
CN
China
Prior art keywords
lip
training database
shape
sound
generating method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510018956.9A
Other languages
Chinese (zh)
Inventor
陈拥权
李建中
郑荣稳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ANHUI COSWIT INFORMATION TECHNOLOGY Co Ltd
Original Assignee
ANHUI COSWIT INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ANHUI COSWIT INFORMATION TECHNOLOGY Co Ltd
Priority to CN201510018956.9A
Publication of CN104537358A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method for generating a lip-shape training database for lip language recognition based on deep learning. The method includes the following steps: audio-video capture, in which lip video images and speech of a target person are captured synchronously by a camera equipped with a microphone; audio and video analysis, in which a computer analyzes the lip video images using image analysis technology to obtain lip-shape feature values and analyzes the speech using speech recognition technology to obtain text information; and training database formation, in which the lip-shape feature values are matched with the text information to generate the training database. The advantage of this technical solution is that it greatly improves the efficiency of building a lip-shape model library.

Description

Method for generating a lip-shape training database for lip language recognition based on deep learning
This application is a divisional application. The application number of the original application is 201410829417.9, its filing date is December 26, 2014, and its title is: Method for constructing a lip-shape model library for lip language recognition based on deep learning.
Technical field
The present invention relates to the technical field of model library construction for human-computer interaction, and specifically to a method for generating a lip-shape training database for lip language recognition based on deep learning.
Background
With the development of artificial intelligence, computer video analysis has begun to be applied to lip language recognition in order to solve the problem of issuing spoken commands in noisy settings such as workshops. Lip language recognition requires a lip-shape model library, whose accuracy and comprehensiveness directly determine the efficiency of recognition. In the prior art, lip-shape models are mostly built one by one by hand, which is labor-intensive and makes comprehensiveness difficult to guarantee.
To solve the above problems, the present invention provides a method for constructing a lip-shape model library for lip language recognition based on deep learning, which effectively reduces manual workload and improves the comprehensiveness of the lip-shape model library.
Summary of the invention
To address the above problems, the present invention combines deep learning, speech recognition, and image analysis so that a computer learns from a large volume of lip videos with sound and builds the lip-shape model library automatically. It provides a lip-shape training database for lip language recognition based on deep learning, thereby effectively improving the efficiency of building the lip-shape model library.
The specific technical solution provided by the present invention is a method for generating a lip-shape training database for lip language recognition based on deep learning, comprising the following steps:
Audio-video capture: lip video images and speech of a target person are captured synchronously by a camera equipped with a microphone;
Audio and video analysis: a computer analyzes the lip video images using image analysis technology to obtain lip-shape feature values, and analyzes the speech using speech recognition technology to obtain text information;
Training database formation: the lip-shape feature values are matched one-to-one with the text information to generate the training database.
The number of target persons should be no fewer than two.
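The three steps above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `Recording`, `Sample`, `build_training_database`, `extract_features`, and `transcribe` are hypothetical, and the sketch assumes that each recording has already been cut into one lip clip and one matching audio clip per utterance. The image analysis and speech recognition steps are passed in as callables because the source does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Recording:
    """One capture session for one target person (hypothetical structure)."""
    speaker_id: str
    lip_clips: Sequence[object]    # one lip video clip per utterance
    audio_clips: Sequence[object]  # the matching audio clip per utterance


@dataclass(frozen=True)
class Sample:
    """One training database entry: lip-shape feature values paired with text."""
    speaker_id: str
    lip_features: tuple
    text: str


def build_training_database(
    recordings: Sequence[Recording],
    extract_features: Callable[[object], Sequence[float]],  # stands in for image analysis
    transcribe: Callable[[object], str],                    # stands in for speech recognition
    min_speakers: int = 2,
) -> List[Sample]:
    # The method requires no fewer than two target persons.
    speakers = {rec.speaker_id for rec in recordings}
    if len(speakers) < min_speakers:
        raise ValueError(f"need at least {min_speakers} target persons, got {len(speakers)}")

    database: List[Sample] = []
    for rec in recordings:
        # One-to-one correspondence needs the same number of lip and audio clips.
        if len(rec.lip_clips) != len(rec.audio_clips):
            raise ValueError(f"clip count mismatch for speaker {rec.speaker_id}")
        for lip_clip, audio_clip in zip(rec.lip_clips, rec.audio_clips):
            database.append(
                Sample(rec.speaker_id, tuple(extract_features(lip_clip)), transcribe(audio_clip))
            )
    return database
```

Rejecting mismatched clip counts up front keeps a dropped utterance from silently shifting every later pairing by one.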
Beneficial effects: in the method for constructing a lip-shape model library for lip language recognition based on deep learning provided by the present invention, a computer automatically analyzes the lip images and the speech, extracts the lip-shape feature values, and generates the text corresponding to the speech to form a training database; deep learning is then applied to the training database to build the lip-shape model library. This technical solution provides an efficient means of building the lip-shape model library and significantly improves its construction efficiency.
Brief description of the drawings
Fig. 1 is a workflow diagram of the present invention.
Detailed description
To describe the present invention more specifically, its technical solution is described in detail below with reference to the drawing and a specific embodiment.
As shown in Fig. 1, lip video images and speech of a target person are first captured synchronously by a camera equipped with a microphone. During capture, the lip video and the speech must be kept synchronized, so that problems such as audio delay do not introduce errors into the subsequent analysis. A computer then analyzes the lip video images using image analysis technology to obtain lip-shape feature values, and analyzes the speech using speech recognition technology to obtain the text corresponding to the speech. The lip-shape feature values are matched one-to-one with the text information to generate the training database, and the computer then learns from the training database using deep learning to build the lip-shape model library.
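The synchronization requirement can be made concrete with a small sketch that maps the time interval of one recognized word onto video frame indices. Everything here is an assumption for illustration: the function name `frames_for_word`, the use of millisecond timestamps, a constant frame rate, and a known, fixed audio delay are not stated in the source. Integer arithmetic is used so that frame boundaries are not affected by floating-point rounding.

```python
def frames_for_word(start_ms, end_ms, fps, n_frames, audio_delay_ms=0):
    """Return the range of video frame indices that overlap one recognized word.

    start_ms, end_ms : word interval on the audio clock, in milliseconds
    fps              : video frame rate (frames per second, integer)
    n_frames         : total number of frames in the captured video
    audio_delay_ms   : how far the audio lags the video (assumed known and constant)
    """
    if end_ms <= start_ms:
        raise ValueError("word interval must have positive duration")

    # Shift the audio timestamps onto the video clock.
    start = start_ms - audio_delay_ms
    end = end_ms - audio_delay_ms

    first = (start * fps) // 1000      # floor: first frame touched by the word
    last = -((-end * fps) // 1000)     # ceiling: one past the last frame touched

    # Clamp to the frames that were actually captured.
    first = max(first, 0)
    last = min(last, n_frames)
    if first >= last:
        raise ValueError("word falls outside the captured video")
    return range(first, last)
```

If the delay is left uncorrected, a word's text is paired with lip frames from a neighboring moment, which is the kind of error the description warns about.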
Each lip-shape model in the lip-shape model library corresponds to one word, namely the word obtained from the speech.
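The word-to-model correspondence can be pictured as a dictionary keyed by word. The sketch below is deliberately not deep learning: it swaps in a nearest-centroid stand-in (the mean feature vector per word) purely to show the shape of such a library. The names `build_model_library` and `recognize`, and the assumption that each sample is a fixed-length feature vector, are hypothetical.

```python
from collections import defaultdict
from math import dist


def build_model_library(samples):
    """Build a word -> model mapping from (word, feature_vector) pairs.

    Each "model" here is just the mean feature vector for that word,
    a simple stand-in for whatever a deep learning step would produce.
    """
    grouped = defaultdict(list)
    for word, vector in samples:
        grouped[word].append(vector)

    library = {}
    for word, vectors in grouped.items():
        count = len(vectors)
        # Average each feature dimension across all samples of this word.
        library[word] = tuple(sum(column) / count for column in zip(*vectors))
    return library


def recognize(library, vector):
    """Return the word whose model is closest (Euclidean distance) to the vector."""
    return min(library, key=lambda word: dist(library[word], vector))
```

A learned model would replace the centroid and the distance comparison, but the library would still hold one entry per word obtained from the speech.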
In deep learning, training quality depends on the amount of data in the training database: the larger the data volume, the more accurate the training result. Therefore, to obtain a better lip-shape model library, as many videos with sound as possible should be collected so as to form a richer training database.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (2)

1. A method for generating a lip-shape training database for lip language recognition based on deep learning, characterized in that the method comprises the following steps:
Audio-video capture: lip video images and speech of a target person are captured synchronously by a camera equipped with a microphone;
Audio and video analysis: a computer analyzes the lip video images using image analysis technology to obtain lip-shape feature values, and analyzes the speech using speech recognition technology to obtain text information;
Training database formation: the lip-shape feature values are matched one-to-one with the text information to generate the training database.
2. The method for generating a lip-shape training database for lip language recognition based on deep learning according to claim 1, characterized in that the number of target persons is no fewer than two.
CN201510018956.9A 2014-12-26 2014-12-26 Lip language recognition lip-shape training database generating method based on deep learning Pending CN104537358A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510018956.9A CN104537358A (en) 2014-12-26 2014-12-26 Lip language recognition lip-shape training database generating method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510018956.9A CN104537358A (en) 2014-12-26 2014-12-26 Lip language recognition lip-shape training database generating method based on deep learning

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201410829417.9A Division CN104484656A (en) 2014-12-26 2014-12-26 Deep learning-based lip language recognition lip shape model library construction method

Publications (1)

Publication Number Publication Date
CN104537358A (en) 2015-04-22

Family

ID=52852878

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510018956.9A Pending CN104537358A (en) 2014-12-26 2014-12-26 Lip language recognition lip-shape training database generating method based on deep learning

Country Status (1)

Country Link
CN (1) CN104537358A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104808794A (en) * 2015-04-24 2015-07-29 北京旷视科技有限公司 Method and system for inputting lip language
CN105653595A (en) * 2015-12-18 2016-06-08 合肥寰景信息技术有限公司 Intelligent voice assistance type network community
CN107945803A (en) * 2017-11-28 2018-04-20 上海与德科技有限公司 The assisted learning method and robot of a kind of robot
CN108520741A (en) * 2018-04-12 2018-09-11 科大讯飞股份有限公司 A kind of whispering voice restoration methods, device, equipment and readable storage medium storing program for executing
CN110276259A (en) * 2019-05-21 2019-09-24 平安科技(深圳)有限公司 Lip reading recognition methods, device, computer equipment and storage medium
CN111724786A (en) * 2019-03-22 2020-09-29 上海博泰悦臻网络技术服务有限公司 Lip language identification system and method
CN111783892A (en) * 2020-07-06 2020-10-16 广东工业大学 Robot instruction identification method and device, electronic equipment and storage medium
CN111832412A (en) * 2020-06-09 2020-10-27 北方工业大学 Sound production training correction method and system
US10834295B2 (en) 2018-08-29 2020-11-10 International Business Machines Corporation Attention mechanism for coping with acoustic-lips timing mismatch in audiovisual processing
CN111988652A (en) * 2019-05-23 2020-11-24 北京地平线机器人技术研发有限公司 Method and device for extracting lip language training data
CN113112997A (en) * 2019-12-25 2021-07-13 华为技术有限公司 Data acquisition method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101101752A (en) * 2007-07-19 2008-01-09 华中科技大学 Monosyllabic language lip-reading recognition system based on vision character
US20090018831A1 (en) * 2005-01-28 2009-01-15 Kyocera Corporation Speech Recognition Apparatus and Speech Recognition Method
CN102169642A (en) * 2011-04-06 2011-08-31 李一波 Interactive virtual teacher system having intelligent error correction function
CN102637071A (en) * 2011-02-09 2012-08-15 英华达(上海)电子有限公司 Multimedia input method applied to multimedia input device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090018831A1 (en) * 2005-01-28 2009-01-15 Kyocera Corporation Speech Recognition Apparatus and Speech Recognition Method
CN101101752A (en) * 2007-07-19 2008-01-09 华中科技大学 Monosyllabic language lip-reading recognition system based on vision character
CN102637071A (en) * 2011-02-09 2012-08-15 英华达(上海)电子有限公司 Multimedia input method applied to multimedia input device
CN102169642A (en) * 2011-04-06 2011-08-31 李一波 Interactive virtual teacher system having intelligent error correction function

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104808794A (en) * 2015-04-24 2015-07-29 北京旷视科技有限公司 Method and system for inputting lip language
CN104808794B (en) * 2015-04-24 2019-12-10 北京旷视科技有限公司 lip language input method and system
CN105653595A (en) * 2015-12-18 2016-06-08 合肥寰景信息技术有限公司 Intelligent voice assistance type network community
CN107945803A (en) * 2017-11-28 2018-04-20 上海与德科技有限公司 The assisted learning method and robot of a kind of robot
CN108520741A (en) * 2018-04-12 2018-09-11 科大讯飞股份有限公司 A kind of whispering voice restoration methods, device, equipment and readable storage medium storing program for executing
US11508366B2 (en) 2018-04-12 2022-11-22 Iflytek Co., Ltd. Whispering voice recovery method, apparatus and device, and readable storage medium
US10834295B2 (en) 2018-08-29 2020-11-10 International Business Machines Corporation Attention mechanism for coping with acoustic-lips timing mismatch in audiovisual processing
CN111724786A (en) * 2019-03-22 2020-09-29 上海博泰悦臻网络技术服务有限公司 Lip language identification system and method
CN110276259B (en) * 2019-05-21 2024-04-02 平安科技(深圳)有限公司 Lip language identification method, device, computer equipment and storage medium
CN110276259A (en) * 2019-05-21 2019-09-24 平安科技(深圳)有限公司 Lip reading recognition methods, device, computer equipment and storage medium
CN111988652B (en) * 2019-05-23 2022-06-03 北京地平线机器人技术研发有限公司 Method and device for extracting lip language training data
CN111988652A (en) * 2019-05-23 2020-11-24 北京地平线机器人技术研发有限公司 Method and device for extracting lip language training data
CN113112997A (en) * 2019-12-25 2021-07-13 华为技术有限公司 Data acquisition method and device
CN111832412A (en) * 2020-06-09 2020-10-27 北方工业大学 Sound production training correction method and system
CN111832412B (en) * 2020-06-09 2024-04-09 北方工业大学 Sounding training correction method and system
CN111783892B (en) * 2020-07-06 2021-10-01 广东工业大学 Robot instruction identification method and device, electronic equipment and storage medium
CN111783892A (en) * 2020-07-06 2020-10-16 广东工业大学 Robot instruction identification method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN104484656A (en) Deep learning-based lip language recognition lip shape model library construction method
CN104537358A (en) Lip language recognition lip-shape training database generating method based on deep learning
CN104504088A (en) Construction method of lip shape model library for identifying lip language
CN111325817B (en) Virtual character scene video generation method, terminal equipment and medium
CN108922538A (en) Conferencing information recording method, device, computer equipment and storage medium
CN110110104B (en) Method and device for automatically generating house explanation in virtual three-dimensional space
CN109346063B (en) Voice data enhancement method
CN109064532B (en) Automatic mouth shape generating method and device for cartoon character
CN103218924A (en) Audio and video dual mode-based spoken language learning monitoring method
CN109410911A (en) Artificial intelligence learning method based on speech recognition
CN104573231A (en) BIM based smart building system and method
CN102982572A (en) Intelligent image editing method and device thereof
CN104777911A (en) Intelligent interaction method based on holographic technique
CN110610698B (en) Voice labeling method and device
CN107833503A (en) Distribution core job augmented reality simulation training system
CN116758451A (en) Audio-visual emotion recognition method and system based on multi-scale and global cross attention
CN115984486A (en) Method and device for generating bridge model fusing laser radar and depth camera
CN104636324B (en) Topic source tracing method and system
CN117315102A (en) Virtual anchor processing method, device, computing equipment and storage medium
CN113053361A (en) Speech recognition method, model training method, device, equipment and medium
CN114492436B (en) Audit interview information processing method, device and system
CN104484041A (en) Lip shape image identification character input method based on deep learning
CN115294947A (en) Audio data processing method and device, electronic equipment and medium
CN115393501A (en) Information processing method and device
WO2019120247A1 (en) Method and device for checking word text

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150422
