CN104537358A - Lip language recognition lip-shape training database generating method based on deep learning - Google Patents
- Publication number
- CN104537358A (application number CN201510018956.9A)
- Authority
- CN
- China
- Prior art keywords
- lip
- training database
- shape
- sound
- generating method
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a method for generating a lip-shape training database for lip language recognition based on deep learning. The method comprises the following steps: audio-video acquisition, in which the lip video images and speech of a target person are captured synchronously by a camera equipped with a microphone; audio and video analysis, in which a computer analyzes the lip video images using image analysis technology to obtain lip-shape feature values, and analyzes the speech using speech recognition technology to obtain text information; and training database formation, in which the lip-shape feature values are matched with the text information to generate the training database. The advantage of this technical scheme is that it greatly improves the efficiency of building a lip-shape model library.
Description
This application is a divisional application. The original application number is 201410829417.9, the filing date is December 26, 2014, and the title of the invention is: Deep learning-based lip language recognition lip shape model library construction method.
Technical field
The present invention relates to the technical field of constructing model libraries for human-computer interaction, and specifically to a method for generating a lip-shape training database for lip language recognition based on deep learning.
Background art
With the development of artificial intelligence technology, computer video analysis has begun to be applied to lip language recognition, in order to solve the problem of issuing spoken production commands in noisy settings such as workshops. Lip language recognition requires a lip-shape model library, whose accuracy and comprehensiveness directly determine the efficiency of recognition. In the prior art, lip-shape models are mostly built one by one by hand, which not only involves a heavy workload but also makes comprehensiveness difficult to guarantee.
To solve the above problems, the present invention provides a deep learning-based method for constructing a lip-shape model library for lip language recognition, which can effectively reduce manual workload and improve the comprehensiveness of the lip-shape model library.
Summary of the invention
In view of the above problems, the present invention combines deep learning, speech recognition, and image analysis technologies so that a computer learns from a large volume of lip videos with sound and automatically builds the lip-shape model library. It provides a lip-shape training database for lip language recognition based on deep learning, thereby effectively improving the efficiency of building the lip-shape model library.
The specific technical scheme provided by the present invention is a method for generating a lip-shape training database for lip language recognition based on deep learning, comprising the following steps:
Audio-video acquisition: the lip video images and speech of a target person are captured synchronously by a camera equipped with a microphone;
Audio and video analysis: a computer analyzes the lip video images using image analysis technology to obtain lip-shape feature values, and analyzes the speech using speech recognition technology to obtain text information;
Training database formation: the lip-shape feature values are matched one-to-one with the text information to generate the training database.
The number of target persons should be no fewer than two.
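The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `TrainingSample` and `build_training_database` are hypothetical, and the feature vectors and texts are placeholder values standing in for the outputs of the image analysis and speech recognition steps, which the patent does not specify in detail.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TrainingSample:
    """One database entry: lip-shape feature values paired with their text."""
    speaker_id: str
    lip_features: Tuple[float, ...]
    text: str


def build_training_database(
    lip_features: Sequence[Sequence[float]],
    texts: Sequence[str],
    speaker_ids: Sequence[str],
    min_speakers: int = 2,
) -> List[TrainingSample]:
    """Pair each feature vector with its text, one-to-one.

    Assumes the i-th feature vector and the i-th text come from the same
    synchronized recording segment.
    """
    if not (len(lip_features) == len(texts) == len(speaker_ids)):
        raise ValueError("features, texts and speaker ids must have equal length")
    # The method asks for no fewer than two target persons.
    if len(set(speaker_ids)) < min_speakers:
        raise ValueError(f"need at least {min_speakers} distinct speakers")
    return [
        TrainingSample(speaker, tuple(features), text)
        for features, text, speaker in zip(lip_features, texts, speaker_ids)
    ]


# Placeholder values standing in for real analysis output.
database = build_training_database(
    lip_features=[[0.42, 0.10], [0.38, 0.12], [0.15, 0.30]],
    texts=["start", "start", "stop"],
    speaker_ids=["speaker_a", "speaker_b", "speaker_a"],
)
```

Keeping the speaker identifier in each sample is a design choice of this sketch; it makes the two-person requirement checkable at build time.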
Beneficial effects: in the deep learning-based method for constructing a lip-shape model library for lip language recognition provided by the present invention, a computer automatically analyzes the lip images and speech, extracts the lip-shape feature values, and generates the text corresponding to the speech, forming a training database. Deep learning is then applied to the training database to build the lip-shape model library. This technical scheme provides an efficient technical means for constructing the lip-shape model library and significantly improves the efficiency of its construction.
Brief description of the drawings
Fig. 1 is a workflow diagram of the present invention.
Detailed description of the embodiments
To describe the present invention more specifically, its technical scheme is described in detail below with reference to the drawing and a specific embodiment.
As shown in Fig. 1, the lip video images and speech of the target person are first captured synchronously by a camera equipped with a microphone. During capture, the lip video and the speech must be kept synchronized, so that problems such as audio delay do not introduce errors into the subsequent analysis. Next, a computer analyzes the lip video images using image analysis technology to obtain lip-shape feature values, and analyzes the speech using speech recognition technology to obtain the corresponding text information. The lip-shape feature values are matched one-to-one with the text information to generate the training database, and the computer then learns from the training database using deep learning to build the lip-shape model library.
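To illustrate why synchronization matters for the pairing step, the sketch below groups timestamped lip frames under the recognized text whose time span covers them. Everything here is an assumption made for illustration: the patent does not describe timestamps, the `LipFrame` and `RecognizedWord` types, or the `audio_delay_ms` correction, and the feature values are placeholders.

```python
from typing import List, NamedTuple, Sequence, Tuple


class LipFrame(NamedTuple):
    time_ms: int                  # capture time of the video frame
    features: Tuple[float, ...]   # lip-shape feature values for that frame


class RecognizedWord(NamedTuple):
    start_ms: int                 # start of the text span on the audio clock
    end_ms: int                   # end of the text span (exclusive)
    text: str


def align_frames_to_words(
    frames: Sequence[LipFrame],
    words: Sequence[RecognizedWord],
    audio_delay_ms: int = 0,
) -> List[Tuple[str, List[Tuple[float, ...]]]]:
    """Collect, for each recognized text, the lip frames captured while it was spoken.

    audio_delay_ms is a known lag of the audio track behind the video; it is
    subtracted from the text time spans before matching.
    """
    aligned = []
    for word in words:
        start = word.start_ms - audio_delay_ms
        end = word.end_ms - audio_delay_ms
        matched = [f.features for f in frames if start <= f.time_ms < end]
        if matched:
            aligned.append((word.text, matched))
    return aligned


# Four frames at 25 frames per second; the audio lags the video by 40 ms.
frames = [
    LipFrame(0, (0.40,)),
    LipFrame(40, (0.42,)),
    LipFrame(80, (0.15,)),
    LipFrame(120, (0.17,)),
]
words = [RecognizedWord(40, 120, "start"), RecognizedWord(120, 200, "stop")]

pairs = align_frames_to_words(frames, words, audio_delay_ms=40)
uncorrected = align_frames_to_words(frames, words)
```

In this toy data, `uncorrected` assigns a frame from the second text to the first one, which is the kind of error that unsynchronized capture would carry into the training database.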
Each lip-shape model in the lip-shape model library corresponds to its own text, namely the text obtained from the speech.
For deep learning, the training effect depends on the amount of data in the training database: the larger the data volume, the more accurate the training result. Therefore, to obtain a better lip-shape model library, as many audio-video recordings as possible should be collected to form a richer training database.
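One simple way to act on this point is to count how many samples the database holds for each text and flag the ones that need more recordings. The sketch below is only an illustration: the function name `underrepresented_texts` and the `min_samples` threshold are assumptions, and the patent does not state any specific data volume.

```python
from collections import Counter
from typing import Iterable, List


def underrepresented_texts(labels: Iterable[str], min_samples: int) -> List[str]:
    """Return, in sorted order, the texts that have fewer than min_samples samples."""
    counts = Counter(labels)
    return sorted(text for text, n in counts.items() if n < min_samples)


# Text labels of the samples currently in a hypothetical training database.
labels = ["start", "start", "stop", "start", "pause"]
needs_more = underrepresented_texts(labels, min_samples=2)
```

Here `needs_more` lists "pause" and "stop", each of which has only one sample in the toy data.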
The above is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (2)
1. A method for generating a lip-shape training database for lip language recognition based on deep learning, characterized in that the method comprises the following steps:
Audio-video acquisition: the lip video images and speech of a target person are captured synchronously by a camera equipped with a microphone;
Audio and video analysis: a computer analyzes the lip video images using image analysis technology to obtain lip-shape feature values, and analyzes the speech using speech recognition technology to obtain text information;
Training database formation: the lip-shape feature values are matched one-to-one with the text information to generate the training database.
2. The method for generating a lip-shape training database for lip language recognition based on deep learning according to claim 1, characterized in that the number of target persons should be no fewer than two.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510018956.9A CN104537358A (en) | 2014-12-26 | 2014-12-26 | Lip language recognition lip-shape training database generating method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510018956.9A CN104537358A (en) | 2014-12-26 | 2014-12-26 | Lip language recognition lip-shape training database generating method based on deep learning |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410829417.9A Division CN104484656A (en) | 2014-12-26 | 2014-12-26 | Deep learning-based lip language recognition lip shape model library construction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104537358A (en) | 2015-04-22 |
Family
ID=52852878
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510018956.9A Pending CN104537358A (en) | 2014-12-26 | 2014-12-26 | Lip language recognition lip-shape training database generating method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104537358A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090018831A1 (en) * | 2005-01-28 | 2009-01-15 | Kyocera Corporation | Speech Recognition Apparatus and Speech Recognition Method |
CN101101752A (en) * | 2007-07-19 | 2008-01-09 | 华中科技大学 | Monosyllabic language lip-reading recognition system based on vision character |
CN102637071A (en) * | 2011-02-09 | 2012-08-15 | 英华达(上海)电子有限公司 | Multimedia input method applied to multimedia input device |
CN102169642A (en) * | 2011-04-06 | 2011-08-31 | 李一波 | Interactive virtual teacher system having intelligent error correction function |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104808794A (en) * | 2015-04-24 | 2015-07-29 | 北京旷视科技有限公司 | Method and system for inputting lip language |
CN104808794B (en) * | 2015-04-24 | 2019-12-10 | 北京旷视科技有限公司 | lip language input method and system |
CN105653595A (en) * | 2015-12-18 | 2016-06-08 | 合肥寰景信息技术有限公司 | Intelligent voice assistance type network community |
CN107945803A (en) * | 2017-11-28 | 2018-04-20 | 上海与德科技有限公司 | The assisted learning method and robot of a kind of robot |
CN108520741A (en) * | 2018-04-12 | 2018-09-11 | 科大讯飞股份有限公司 | A kind of whispering voice restoration methods, device, equipment and readable storage medium storing program for executing |
US11508366B2 (en) | 2018-04-12 | 2022-11-22 | Iflytek Co., Ltd. | Whispering voice recovery method, apparatus and device, and readable storage medium |
US10834295B2 (en) | 2018-08-29 | 2020-11-10 | International Business Machines Corporation | Attention mechanism for coping with acoustic-lips timing mismatch in audiovisual processing |
CN111724786A (en) * | 2019-03-22 | 2020-09-29 | 上海博泰悦臻网络技术服务有限公司 | Lip language identification system and method |
CN110276259B (en) * | 2019-05-21 | 2024-04-02 | 平安科技(深圳)有限公司 | Lip language identification method, device, computer equipment and storage medium |
CN110276259A (en) * | 2019-05-21 | 2019-09-24 | 平安科技(深圳)有限公司 | Lip reading recognition methods, device, computer equipment and storage medium |
CN111988652B (en) * | 2019-05-23 | 2022-06-03 | 北京地平线机器人技术研发有限公司 | Method and device for extracting lip language training data |
CN111988652A (en) * | 2019-05-23 | 2020-11-24 | 北京地平线机器人技术研发有限公司 | Method and device for extracting lip language training data |
CN113112997A (en) * | 2019-12-25 | 2021-07-13 | 华为技术有限公司 | Data acquisition method and device |
CN111832412A (en) * | 2020-06-09 | 2020-10-27 | 北方工业大学 | Sound production training correction method and system |
CN111832412B (en) * | 2020-06-09 | 2024-04-09 | 北方工业大学 | Sounding training correction method and system |
CN111783892B (en) * | 2020-07-06 | 2021-10-01 | 广东工业大学 | Robot instruction identification method and device, electronic equipment and storage medium |
CN111783892A (en) * | 2020-07-06 | 2020-10-16 | 广东工业大学 | Robot instruction identification method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104484656A (en) | Deep learning-based lip language recognition lip shape model library construction method | |
CN104537358A (en) | Lip language recognition lip-shape training database generating method based on deep learning | |
CN104504088A (en) | Construction method of lip shape model library for identifying lip language | |
CN111325817B (en) | Virtual character scene video generation method, terminal equipment and medium | |
CN108922538A (en) | Conferencing information recording method, device, computer equipment and storage medium | |
CN110110104B (en) | Method and device for automatically generating house explanation in virtual three-dimensional space | |
CN109346063B (en) | Voice data enhancement method | |
CN109064532B (en) | Automatic mouth shape generating method and device for cartoon character | |
CN103218924A (en) | Audio and video dual mode-based spoken language learning monitoring method | |
CN109410911A (en) | Artificial intelligence learning method based on speech recognition | |
CN104573231A (en) | BIM based smart building system and method | |
CN102982572A (en) | Intelligent image editing method and device thereof | |
CN104777911A (en) | Intelligent interaction method based on holographic technique | |
CN110610698B (en) | Voice labeling method and device | |
CN107833503A (en) | Distribution core job augmented reality simulation training system | |
CN116758451A (en) | Audio-visual emotion recognition method and system based on multi-scale and global cross attention | |
CN115984486A (en) | Method and device for generating bridge model fusing laser radar and depth camera | |
CN104636324B (en) | Topic source tracing method and system | |
CN117315102A (en) | Virtual anchor processing method, device, computing equipment and storage medium | |
CN113053361A (en) | Speech recognition method, model training method, device, equipment and medium | |
CN114492436B (en) | Audit interview information processing method, device and system | |
CN104484041A (en) | Lip shape image identification character input method based on deep learning | |
CN115294947A (en) | Audio data processing method and device, electronic equipment and medium | |
CN115393501A (en) | Information processing method and device | |
WO2019120247A1 (en) | Method and device for checking word text |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20150422 |
|
WD01 | Invention patent application deemed withdrawn after publication |