CN111832412A - Sound production training correction method and system - Google Patents
- Publication number
- CN111832412A (application CN202010517909.XA)
- Authority
- CN
- China
- Prior art keywords
- lip
- target object
- sequence
- standard
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Abstract
The invention provides a method and a system for vocalization training correction. The method comprises the following steps: extracting gesture sequence features and lip sequence features of a target object from a video to be recognized; extracting, from a pre-constructed speech training database, standard lip sequence features matched with the gesture sequence features; and obtaining the similarity of the corresponding lip shapes based on the standard lip sequence features and the lip sequence features of the target object, and providing the target object with a standard lip-reading sequence for training. The technical scheme provided by the invention can effectively correct the pronunciation lip shape of the target object and improve the target object's speech ability without the help of other people.
Description
Technical Field
The invention relates to the field of rehabilitation training, in particular to a method and a system for correcting vocalization training.
Background
People with hearing impairment or pronunciation difficulties cannot communicate like ordinary people because their mouth shapes are incorrect and their expression is unclear; furthermore, the resulting lack of communication aggravates the speech disability of hearing-impaired people. It is therefore necessary to prevent speech disability in hearing-impaired people through rehabilitation training.
However, the shortage of hearing health-care personnel and related resources is considered one of the major obstacles to the treatment of hearing disorders worldwide. In recent years, automatic lip-reading technology has played a crucial role in visual perception; in particular, using automatic lip reading to promote the social interaction of hearing-impaired people is one of the most promising applications of artificial intelligence in healthcare and rehabilitation. In automatic lip reading, the system automatically detects and captures a speaker's lip movements in order to identify speech information, and the technology can be widely applied to information security and to speech recognition and driver-assistance systems in noisy environments. Current research focuses on improving the accuracy of lip and gesture feature extraction and the recognition rate of those features; applying automatic lip-reading technology to the rehabilitation training of hearing-impaired people would be of genuine help to them.
Disclosure of Invention
In order to solve the above-mentioned deficiencies in the prior art, the invention provides a method for correcting vocalization training, comprising:
extracting gesture sequence features and lip sequence features of a target object from a video to be recognized;
extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database;
and obtaining the similarity of corresponding lips based on the standard lip sequence characteristics and the lip sequence characteristics of the target object, and providing a standard lip reading sequence for training for the target object.
Preferably, the speech training database stores standard sign language vocabulary and a lip shape corresponding to the sign language vocabulary.
Preferably, the extracting the standard lip sequence features matched with the gesture sequence features from the pre-constructed voice training database includes:
finding out various sign language vocabularies matched with the gesture sequence characteristics from the voice training database;
obtaining a standard lip shape corresponding to each sign language vocabulary based on each sign language vocabulary;
and generating standard lip sequence characteristics from the standard lips corresponding to the sign language vocabularies.
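The lookup described in the steps above — gesture features matched to sign-language vocabulary, each vocabulary item mapped to a standard lip shape — can be sketched as a simple dictionary lookup. This is an illustrative sketch only; the vocabulary entries, lip-shape labels, and function names are hypothetical, not taken from the patent.

```python
# Hypothetical speech training database: sign-language vocabulary mapped to
# standard lip-shape sequences. All entries are invented for illustration.
SPEECH_TRAINING_DB = {
    "hello": ["lip_open", "lip_round", "lip_close"],
    "six":   ["lip_spread", "lip_narrow"],
}

def standard_lip_sequence(recognized_vocab):
    """Concatenate the standard lip shapes for each recognized sign word,
    producing a standard lip sequence feature for comparison."""
    sequence = []
    for word in recognized_vocab:
        if word not in SPEECH_TRAINING_DB:
            raise KeyError(f"no standard lip shape stored for {word!r}")
        sequence.extend(SPEECH_TRAINING_DB[word])
    return sequence
```

In a real system the values would be feature vectors produced by the trained network rather than string labels, but the database structure — vocabulary keys to standard lip sequences — is the same.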
Preferably, after finding each sign language vocabulary matching with the gesture sequence feature from the speech training database, the method further includes:
feeding back a recognition result to the target object based on each sign language vocabulary;
and correcting the recognition result based on the expression content of the target object until the recognition result is consistent with the expression content of the target object.
Preferably, the modifying the recognition result based on the expression content of the target object includes:
and when the identification result is inconsistent with the expression content of the target object, recording the video to be identified again based on the expression content of the target object.
Preferably, the lip similarity is calculated as follows:
wherein Similarityrate is the lip similarity, X1 is a lip shape in the lip sequence feature of the target object, X2 is the corresponding lip shape in the standard lip sequence feature, and k is a penalty coefficient.
Preferably, the extracting the gesture sequence feature and the lip sequence feature of the target object from the video to be recognized includes:
matching the gesture labels with the lip labels one by one through ResNet50 to obtain gesture sequence characteristics;
segmenting lip image areas in the video to be identified by using a MobileNet network to extract lip characteristics;
and learning time sequence information by using the LSTM network to obtain lip sequence characteristics.
Based on the same inventive concept, the invention also provides a system for correcting the vocalization training, which comprises:
the extraction module is used for extracting the gesture sequence characteristics and the lip sequence characteristics of the target object from the video to be recognized;
the matching module is used for extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database;
and the comparison module is used for obtaining the similarity of the corresponding lip shape based on the standard lip shape sequence characteristics and the lip shape sequence characteristics of the target object and providing a standard lip reading sequence for training for the target object.
Based on the same inventive concept, the present invention also provides an electronic device, comprising:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement any of the vocalization training correction methods.
Based on the same inventive concept, the present invention further provides a computer-readable storage medium, where at least one instruction is stored, and the at least one instruction is executed by a processor in an electronic device to implement any one of the vocalization training correction methods.
Compared with the prior art, the invention has the beneficial effects that:
according to the technical scheme provided by the invention, the gesture sequence characteristics and the lip sequence characteristics of the target object are extracted from the video to be recognized; extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database; and obtaining the similarity of corresponding lips based on the standard lip sequence characteristics and the lip sequence characteristics of the target object, and providing a standard lip reading sequence for training for the target object. The invention integrates the gesture characteristic and the lip characteristic, utilizes sign language recognition to assist lip language recognition, further realizes the correction of the mouth shape of a target object, helps the deaf-mute and the old with unclear pronunciation to express correctly by training the speaking modes of the deaf-mute and the old, recovers the speaking ability of the communication between the deaf-mute and the old, and improves the daily life level.
Drawings
FIG. 1 is a flow chart of a method of vocal training correction according to the present invention;
FIG. 2 is a schematic diagram of the inventive concept of the present invention;
FIG. 3 is a diagram illustrating simulation test results according to an embodiment of the present invention.
Detailed Description
For a better understanding of the present invention, reference is made to the following description taken in conjunction with the accompanying drawings and examples.
The invention provides a vocalization training correction method for hearing-impaired people and people with dysphonia, which trains their speech skills by comparing their mouth shapes with those of normal speakers. As shown in FIG. 1, the scheme comprises the following steps:
s1, extracting gesture sequence features and lip sequence features of the target object from the video to be recognized;
s2, extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database;
and S3, obtaining the similarity of the corresponding lip shape based on the standard lip shape sequence characteristics and the lip shape sequence characteristics of the target object, and providing a standard lip reading sequence for training for the target object.
As shown in FIG. 2, the present invention aims to help hearing-impaired people correctly adjust their pronunciation lip shapes and supports them in continuous training. Lip-reading recognition is performed on the video captured by the camera using current advanced recognition technology, thereby providing help to hearing-impaired people.
S1, extracting the gesture sequence feature and the lip sequence feature of the target object from the video to be recognized, wherein the method comprises the following steps:
extracting lip sequence characteristics from the instantly recorded video through a mixed neural network of a MobileNet and a long-short term memory network (LSTM); and extracts the gesture sequence features using ResNet 50.
In the actual extraction process the lip region is small, so the region where the lips appear must be searched for in a face image that includes background. Because locating the lips directly in the full frame is very difficult, the face is detected first and the lips are then searched for within the face region. Gesture features are extracted similarly: the hand region is found first, then the hand is localized. Once the lips and hands are localized, their image features are extracted.
The invention combines ResNet50 with a hybrid MobileNet-LSTM network, which preserves the advantage that accuracy does not degrade as the network deepens while also reducing the number of parameters and the complexity of the model.
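The extraction pipeline above (face detection, lip localization, per-frame features, temporal modelling) can be sketched structurally as follows. The heavy networks (ResNet50, MobileNet, LSTM) are replaced here by stub functions so that only the data flow is shown; every function body is a placeholder assumption, not the patent's actual model.

```python
def detect_face_then_lips(frame):
    # Placeholder: in practice the face is detected first, and the lips are
    # then searched for inside the face region (locating lips directly in
    # the full frame is difficult). Here a frame is a dict with the lip
    # region pre-cropped, for illustration only.
    return frame["lip_region"]

def mobilenet_features(lip_region):
    # Stub for per-frame MobileNet lip features.
    return [float(x) for x in lip_region]

def lstm_over_time(per_frame_features):
    # Stub for LSTM temporal modelling: here simply a per-dimension mean
    # over time, standing in for the learned sequence representation.
    n = len(per_frame_features)
    dims = len(per_frame_features[0])
    return [sum(f[d] for f in per_frame_features) / n for d in range(dims)]

def lip_sequence_feature(video_frames):
    """Per-frame lip features followed by temporal aggregation."""
    per_frame = [mobilenet_features(detect_face_then_lips(f)) for f in video_frames]
    return lstm_over_time(per_frame)
```

A real implementation would use pretrained backbones (e.g. `torchvision.models.mobilenet_v2` for frame features and `torch.nn.LSTM` for the sequence), with ResNet50 running in parallel on the hand region for gesture features.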
S2, extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database, wherein the standard lip sequence features comprise:
a speech training database, namely a multi-feature fusion network model, is created in advance, and the database stores standard mouth shapes and corresponding sign language vocabularies.
And finding sign language vocabularies matched with the gesture sequence characteristics and the correct lip shapes matched with the sign language vocabularies from a voice training database.
The recognition result is output according to the sign-language vocabulary, and the target object judges the output: if it matches the intention the target object wants to convey, proceed to S3; if the target object considers the output different from its own idea, the lip-reading and gesture sequence is input again, and trial and error continues until the output content matches the target object's intent.
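The trial-and-error loop above can be sketched as a small control loop: recognize, confirm against the intended expression, and re-record on mismatch. The `recognize` and `re_record` callables are hypothetical placeholders for the recognition network and the camera capture step.

```python
def confirm_recognition(intended, video, recognize, re_record, max_attempts=5):
    """Repeat recognition, re-recording the video after each mismatch,
    until the recognized content matches the intended expression or the
    attempt budget is exhausted. Returns the confirmed result or None."""
    for _ in range(max_attempts):
        result = recognize(video)
        if result == intended:
            return result
        video = re_record()  # target object records the video again
    return None
```

The `max_attempts` cap is an added safeguard not stated in the patent, which describes the loop as continuing until the output matches the target object's intent.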
S3, obtaining the similarity of the corresponding lip shape based on the standard lip shape sequence feature and the lip shape sequence feature of the target object, and providing a standard lip reading sequence for training for the target object, including:
and comparing the standard lip sequence characteristics with the lip of the target object, obtaining the similarity, drawing comparison data and the similarity according to the size of the sounding lip of the target object, the angle of the opened lip and the difference between different lip shapes, and giving a standard lip reading sequence of the target object for learning and training. According to the invention, lip shapes which are easy to make mistakes are formed into a pronunciation training model library according to the comparison result of each target object, wherein the pronunciation training model library comprises specific pronunciation analysis.
Based on the comparison results, the target object can analyze, correct, and continuously train autonomously to improve its mouth-shape similarity.
Wherein the similarity is calculated according to the following formula:
in the formula, X1,X2Respectively, the lip shape and the correct lip shape of the target object, and k is a penalty coefficient.
The technical scheme provided by the invention fuses gesture features and lip-reading features, uses the constructed multi-feature fusion network model to perform sign-language recognition and lip-language recognition synchronously, and uses the sign language to help the system capture the lip language, correcting the recognized text and the expresser's mouth shape and repeating trial and error until the mouth shape is correct. Finally, a pronunciation training model library, including pronunciation lip details, is constructed from the target object's comparison results to train the speaking patterns of deaf-mute people and elderly people with unclear pronunciation, helping them express themselves correctly, restoring their speaking ability, and improving their daily life.
The simulation test results of this embodiment are shown in FIG. 3. The first row is the standard lip shape for the English pronunciation of the numeral 6; the second row is the lip-reading sequence after a tester deliberately altered the pronunciation lip shape; and the third row is the lip-reading sequence after the tester imitated the standard lip-reading image and pronounced the numeral 6 correctly. According to the system's matching results, the matching degree after deliberately altered pronunciation is 71.76%, while that of correct pronunciation is 86.83%. The experiment shows that a target object can train autonomously with a system based on this vocalization training correction method: by following the standard lip-reading sequence and the lip-reading comparison results given by the system, the target object's pronunciation lip shape is effectively corrected and its speech ability improved without the help of other people.
The target objects described in the present embodiment include a hearing-impaired person and a person who has difficulty in speaking.
Based on the same inventive concept, the invention also provides a system for correcting the vocalization training, which comprises:
the extraction module is used for extracting the gesture sequence characteristics and the lip sequence characteristics of the target object from the video to be recognized;
the matching module is used for extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database;
and the comparison module is used for obtaining the similarity of the corresponding lip shape based on the standard lip shape sequence characteristics and the lip shape sequence characteristics of the target object and providing a standard lip reading sequence for training for the target object.
Firstly, the dynamic video is pre-processed and the gesture labels are matched one by one with the lip labels, with a ResNet50 network ensuring accuracy. Secondly, the lip image area is segmented using a MobileNet network to extract features. The temporal sequence information is then learned using an LSTM network. Since current research mainly studies how to improve the accuracy of lip and gesture feature extraction and the recognition rate of those features, combining the ResNet50, MobileNet, and LSTM networks and applying them to rehabilitation training makes a substantial contribution to the rehabilitation of hearing-impaired people.
The hearing-impaired person can train autonomously according to the comparison results of the automatic lip-reading recognition, correcting and improving mouth-shape similarity.
The system provided by the invention supports cooperation with medical instruments such as cochlear implants, assisting hearing-impaired people in learning how to pronounce correctly and thereby helping them recover their speech ability.
The present invention also provides an electronic device, including:
a memory storing at least one instruction; and
and the processor executes the instructions stored in the memory to realize any vocalization training correction method provided by the invention.
An embodiment of the present invention further provides a computer-readable storage medium, where at least one instruction is stored in the computer-readable storage medium, and the at least one instruction is executed by a processor in an electronic device to implement any one of the vocalization training correction methods provided in the present invention.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The present invention is not limited to the above embodiments, and any modifications, equivalent replacements, improvements, etc. made within the spirit and principle of the present invention are included in the scope of the claims of the present invention which are filed as the application.
Claims (10)
1. A method for correcting vocalization training, comprising:
extracting gesture sequence features and lip sequence features of a target object from a video to be recognized;
extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database;
and obtaining the similarity of corresponding lips based on the standard lip sequence characteristics and the lip sequence characteristics of the target object, and providing a standard lip reading sequence for training for the target object.
2. The method of claim 1, wherein the speech training database stores standard sign language vocabulary and lip shapes corresponding to the sign language vocabulary.
3. The method of claim 2, wherein the extracting standard lip sequence features from a pre-built speech training database that match the gesture sequence features comprises:
finding out various sign language vocabularies matched with the gesture sequence characteristics from the voice training database;
obtaining a standard lip shape corresponding to each sign language vocabulary based on each sign language vocabulary;
and generating standard lip sequence characteristics from the standard lips corresponding to the sign language vocabularies.
4. The method of claim 3, wherein after finding the sign language vocabulary matching the gesture sequence features from the speech training database, further comprising:
feeding back a recognition result to the target object based on each sign language vocabulary;
and correcting the recognition result based on the expression content of the target object until the recognition result is consistent with the expression content of the target object.
5. The method of claim 4, wherein the modifying the recognition result based on the expression content of the target object comprises:
and when the identification result is inconsistent with the expression content of the target object, recording the video to be identified again based on the expression content of the target object.
7. The method as claimed in claim 1, wherein the extracting of the gesture sequence feature and the lip sequence feature of the target object from the video to be recognized comprises:
matching the gesture labels with the lip labels one by one through ResNet50 to obtain gesture sequence characteristics;
segmenting lip image areas in the video to be identified by using a MobileNet network to extract lip characteristics;
and learning time sequence information by using the LSTM network to obtain lip sequence characteristics.
8. A vocalization training orthotic system, comprising:
the extraction module is used for extracting the gesture sequence characteristics and the lip sequence characteristics of the target object from the video to be recognized;
the matching module is used for extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database;
and the comparison module is used for obtaining the similarity of the corresponding lip shape based on the standard lip shape sequence characteristics and the lip shape sequence characteristics of the target object and providing a standard lip reading sequence for training for the target object.
9. An electronic device, characterized in that the electronic device comprises:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement the vocalization training correction method of any of claims 1-7.
10. A computer-readable storage medium having stored therein at least one instruction for execution by a processor in an electronic device to implement the vocalization training correction method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010517909.XA CN111832412B (en) | 2020-06-09 | 2020-06-09 | Sounding training correction method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010517909.XA CN111832412B (en) | 2020-06-09 | 2020-06-09 | Sounding training correction method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111832412A true CN111832412A (en) | 2020-10-27 |
CN111832412B CN111832412B (en) | 2024-04-09 |
Family
ID=72899322
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010517909.XA Active CN111832412B (en) | 2020-06-09 | 2020-06-09 | Sounding training correction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111832412B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114758647A (en) * | 2021-07-20 | 2022-07-15 | 无锡柠檬科技服务有限公司 | Language training method and system based on deep learning |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101882390A (en) * | 2010-06-12 | 2010-11-10 | 黑龙江新洋科技有限公司 | Three-dimensional lip language interactive teaching system and method thereof |
CN104537358A (en) * | 2014-12-26 | 2015-04-22 | 安徽寰智信息科技股份有限公司 | Lip language recognition lip-shape training database generating method based on deep learning |
CN105047196A (en) * | 2014-04-25 | 2015-11-11 | 通用汽车环球科技运作有限责任公司 | Systems and methods for speech artifact compensation in speech recognition systems |
CN107301863A (en) * | 2017-07-13 | 2017-10-27 | 江苏师范大学 | A kind of deaf-mute child's disfluency method of rehabilitation and rehabilitation training system |
CN109389098A (en) * | 2018-11-01 | 2019-02-26 | 重庆中科云丛科技有限公司 | A kind of verification method and system based on lip reading identification |
CN109637521A (en) * | 2018-10-29 | 2019-04-16 | 深圳壹账通智能科技有限公司 | A kind of lip reading recognition methods and device based on deep learning |
CN110047511A (en) * | 2019-04-23 | 2019-07-23 | 赵旭 | A kind of speech training method, device, computer equipment and its storage medium |
CN110059575A (en) * | 2019-03-25 | 2019-07-26 | 中国科学院深圳先进技术研究院 | A kind of augmentative communication system based on the identification of surface myoelectric lip reading |
CN110532850A (en) * | 2019-07-02 | 2019-12-03 | 杭州电子科技大学 | A kind of fall detection method based on video artis and hybrid classifer |
- 2020-06-09: CN application CN202010517909.XA filed; granted as patent CN111832412B (active)
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114758647A (en) * | 2021-07-20 | 2022-07-15 | 无锡柠檬科技服务有限公司 | Language training method and system based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN111832412B (en) | 2024-04-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Engwall | | Analysis of and feedback on phonetic features in pronunciation training with a virtual teacher |
CN103838866B (en) | | Text conversion method and device |
WO2015158017A1 (en) | | Intelligent interaction and psychological comfort robot service system |
KR102167760B1 (en) | | Sign language analysis algorithm system using recognition of sign language motion process and motion tracking pre-trained model |
CN113657168B (en) | | Student learning emotion recognition method based on convolutional neural network |
CN111126280B (en) | | Gesture recognition fusion-based aphasia patient auxiliary rehabilitation training system and method |
Koller et al. | | Read my lips: Continuous signer independent weakly supervised viseme recognition |
Hoque et al. | | Automated Bangla sign language translation system: Prospects, limitations and applications |
Abdulsalam et al. | | Emotion recognition system based on hybrid techniques |
CN115188074A (en) | | Interactive physical training evaluation method, device and system and computer equipment |
CN110096987B (en) | | Dual-path 3DCNN model-based mute action recognition method |
CN111832412B (en) | | Sound production training correction method and system |
Bhat et al. | | Vision sensory substitution to aid the blind in reading and object recognition |
KR20190068841A (en) | | System for training and evaluation of English pronunciation using artificial intelligence speech recognition application programming interface |
Krishnamoorthy et al. | | E-Learning Platform for Hearing Impaired Students |
Chitu et al. | | Visual speech recognition automatic system for lip reading of Dutch |
Krňoul et al. | | Correlation analysis of facial features and sign gestures |
Mishra et al. | | Environment descriptor for the visually impaired |
Datar et al. | | A Review on Deep Learning Based Lip-Reading |
Foysol et al. | | Vision-based Real Time Bangla Sign Language Recognition System Using MediaPipe Holistic and LSTM |
Thahseen et al. | | Smart System to Support Hearing Impaired Students in Tamil |
CN112786151B (en) | | Language function training system and method |
Janbandhu et al. | | Sign Language Recognition Using CNN |
Huda et al. | | Automated Bangla sign language conversion system: Present and future |
Bazaz et al. | | Real Time Conversion Of Sign Language To Text and Speech (For Marathi and English) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |