CN111832412B - Sounding training correction method and system
- Publication number: CN111832412B (application CN202010517909.XA)
- Authority
- CN
- China
- Prior art keywords
- lip
- target object
- standard
- sequence
- sign language
- Prior art date
- Legal status: Active (the status is an assumption and is not a legal conclusion)
Classifications
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06F18/22—Matching criteria, e.g. proximity measures
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Abstract
The invention provides a sounding training correction method and system. The method comprises the following steps: extracting gesture sequence features and lip sequence features of a target object from a video to be identified; extracting, from a pre-constructed voice training database, standard lip sequence features matched with the gesture sequence features; and obtaining the similarity of the corresponding lip shapes based on the standard lip sequence features and the target object's lip sequence features, and providing the target object with a standard lip-reading sequence for training. The technical scheme provided by the invention can effectively correct the target object's pronunciation lip shape, so that speech ability can be improved without the help of others.
Description
Technical Field
The invention relates to the field of rehabilitation training, in particular to a sounding training correction method and system.
Background
People with hearing impairment or dysphonia cannot communicate as ordinary people do because of incorrect mouth shapes and unclear articulation. Moreover, hearing-impaired people tend to develop speech impairment through lack of communication, so it is necessary to prevent speech impairment in hearing-impaired people through rehabilitation training.
However, the shortage of hearing healthcare personnel and related resources is considered one of the major obstacles to treating hearing disorders globally. In recent years, automatic lip-reading technology has come to play a crucial role in visual perception; in particular, using it to promote the social interaction of people with hearing impairment is one of the most promising applications of artificial intelligence in medical care and rehabilitation. Automatic lip reading captures a speaker's lip movements through automatic detection in order to recognize speech information, and can be widely applied to speech recognition in information security and in noisy environments, as well as to driving assistance systems. Current research focuses on improving the accuracy of lip and gesture feature extraction and the resulting recognition rates; applying automatic lip-reading technology to the rehabilitation training of hearing-impaired people would be of great help to them.
Disclosure of Invention
To address the above shortcomings of the prior art, the present invention provides a sounding training correction method, comprising:
extracting gesture sequence features and lip sequence features of a target object from a video to be identified;
extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database;
and obtaining the similarity of the corresponding lip shapes based on the standard lip sequence features and the target object's lip sequence features, and providing the target object with a standard lip-reading sequence for training.
Preferably, the voice training database stores standard sign language vocabulary and the lip shapes corresponding to each sign language vocabulary item.
Preferably, the extracting of standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database includes (a sketch follows this list):
finding, in the voice training database, each sign language vocabulary item matched with the gesture sequence features;
obtaining the standard lip shape corresponding to each sign language vocabulary item;
and generating the standard lip sequence features from the standard lip shapes corresponding to each vocabulary item.
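As a minimal illustration of this lookup, assuming the database is an in-memory mapping from sign-language vocabulary to standard lip-shape sequences (the patent does not specify a storage schema, so the structure and names below are hypothetical):

```python
# Hypothetical sketch of the vocabulary-to-lip-shape lookup; the patent does
# not specify a storage schema, so this in-memory mapping is an assumption.
from typing import Dict, List

# Each sign-language vocabulary item maps to its standard lip-shape sequence,
# represented here as a list of per-frame feature vectors.
VoiceTrainingDB = Dict[str, List[List[float]]]

def standard_lip_sequence(vocab_items: List[str],
                          db: VoiceTrainingDB) -> List[List[float]]:
    """Concatenate the standard lip shapes of each matched vocabulary item."""
    sequence: List[List[float]] = []
    for word in vocab_items:
        if word in db:                 # skip words absent from the database
            sequence.extend(db[word])
    return sequence
```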
Preferably, after finding each sign language vocabulary item matched with the gesture sequence features in the voice training database, the method further includes:
feeding back a recognition result to the target object based on each vocabulary item;
and correcting the recognition result based on the expression content of the target object until the recognition result is consistent with the expression content of the target object.
Preferably, the correcting of the recognition result based on the expression content of the target object includes:
when the recognition result is inconsistent with the expression content of the target object, re-recording the video to be identified based on the expression content of the target object.
Preferably, the similarity of the lip shapes is calculated by a formula (rendered only as an image in the original publication) in which similarity is the lip-shape similarity, X1 is a lip shape in the target object's lip sequence features, X2 is the corresponding lip shape in the standard lip sequence features, and k is a penalty coefficient.
Preferably, the extracting of the gesture sequence features and lip sequence features of the target object from the video to be identified includes:
matching gesture labels with lip labels one by one through ResNet50 to obtain the gesture sequence features;
dividing lip image areas in the video to be identified by using a MobileNet network to extract lip features;
and learning time sequence information by utilizing the LSTM network to obtain lip sequence characteristics.
Based on the same inventive concept, the invention also provides a sounding training correction system, comprising:
the extraction module is used for extracting gesture sequence features and lip sequence features of the target object from the video to be identified;
the matching module is used for extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database;
and the comparison module is used for obtaining the similarity of the corresponding lips based on the standard lip sequence characteristics and the lip sequence characteristics of the target object and providing a standard lip reading sequence for training for the target object.
Based on the same inventive concept, the invention also provides an electronic device, comprising:
a memory storing at least one instruction; and
a processor executing the instructions stored in the memory to implement any of the above sounding training correction methods.
Based on the same inventive concept, the present invention also provides a computer-readable storage medium having at least one instruction stored therein, the at least one instruction being executed by a processor in an electronic device to implement any of the sounding training correction methods.
Compared with the prior art, the invention has the beneficial effects that:
according to the technical scheme provided by the invention, the gesture sequence characteristics and the lip sequence characteristics of the target object are extracted from the video to be identified; extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database; and obtaining similarity of corresponding lips based on the standard lip sequence features and the lip sequence features of the target object, and providing a standard lip reading sequence for training for the target object. According to the invention, gesture features and lip features are fused, and the gesture language recognition is utilized to assist in lip language recognition, so that the mouth shape of a target object is corrected, the correct expression of the deaf-mute and the old with unclear pronunciation is helped by training the speaking modes of the deaf-mute and the old, the speaking ability of the old communicating with the person is recovered, and the daily living standard is improved.
Drawings
FIG. 1 is a flow chart of a method for correcting sounding training according to the present invention;
FIG. 2 is a schematic diagram of the inventive concept of the present invention;
FIG. 3 is a schematic diagram of a simulation test result in an embodiment of the present invention.
Detailed Description
For a better understanding of the present invention, reference is made to the following description, drawings and examples.
The invention provides a sounding training correction method for people with hearing impairment and dysarthria, which trains their speech skills by comparing their mouth shapes with those of normal speakers. As shown in FIG. 1, the scheme comprises the following steps:
s1, extracting gesture sequence features and lip sequence features of a target object from a video to be identified;
s2, extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database;
s3, obtaining similarity of corresponding lips based on the standard lip sequence features and the lip sequence features of the target object, and providing a standard lip reading sequence for training for the target object.
As shown in FIG. 2, the present invention aims to help hearing-impaired people acquire correct sounding lip shapes and to support continuous autonomous training. Lip-reading recognition is performed on video captured by the camera using current speech recognition technology, providing assistance to hearing-impaired people.
S1, extracting gesture sequence features and lip sequence features of a target object from a video to be identified, wherein the gesture sequence features and the lip sequence features comprise:
Lip sequence features are extracted from the captured video through a hybrid neural network combining MobileNet and a long short-term memory (LSTM) network, and gesture sequence features are extracted using ResNet50.
In the actual extraction process, the lip area is small, so the region where the lips appear must be found in a face image that includes background. Locating lips directly in a panoramic view is very difficult, so a face is detected first and the lips are then searched for within the face area. When gesture features are extracted, the regions where the hands appear are found first and the hands are then localized; the image features of the lips and hands are extracted once both have been localized.
The invention combines ResNet50 with the MobileNet-LSTM hybrid network, which preserves the advantage that accuracy does not degrade as the network deepens, while also reducing the parameter count and model complexity. A sketch of the two branches follows.
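As an illustrative sketch only (layer sizes, input shapes, and the interface between branches are assumptions, not the patent's actual implementation), the two feature-extraction branches could be written in PyTorch as:

```python
# Illustrative PyTorch sketch of the two branches: MobileNet features per
# frame fed to an LSTM for lip sequences, and ResNet50 for gesture frames.
# Shapes and sizes are assumptions, not the patent's actual configuration.
import torch
import torch.nn as nn
from torchvision import models

class LipSequenceEncoder(nn.Module):
    def __init__(self, hidden_size: int = 256):
        super().__init__()
        self.backbone = models.mobilenet_v2(weights=None).features  # per-frame CNN
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(input_size=1280, hidden_size=hidden_size,
                            batch_first=True)                       # temporal model

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W) cropped lip regions
        b, t = clips.shape[:2]
        x = self.backbone(clips.flatten(0, 1))       # per-frame spatial features
        x = self.pool(x).flatten(1).view(b, t, -1)   # (batch, time, 1280)
        out, _ = self.lstm(x)                        # lip sequence features
        return out

class GestureEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        resnet = models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop classifier

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3, H, W) hand regions; returns (batch, 2048) features
        return self.backbone(frames).flatten(1)
```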
S2, extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database, which includes:
A voice training database, i.e. a multi-feature fusion network model, is pre-created; it stores standard mouth shapes and the corresponding sign language vocabulary.
The sign language vocabulary matched with the gesture sequence features, together with the correct lip shapes matched with that vocabulary, is found in the voice training database.
The recognition result of the method is output according to the sign language vocabulary and judged by the target object: if the output result is consistent with the meaning the target object wants to convey, the method proceeds to S3; if the target object considers the result different from what was intended, the lip-reading and gesture sequences are entered again, and trial and error continues until the output content matches the target object's intended meaning (a sketch of this loop follows).
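A minimal sketch of this confirm-or-retry loop, with all three callables standing in for components of this embodiment (names are hypothetical):

```python
# Sketch of the confirm-or-retry loop: the recognition result is shown to
# the target object, who either confirms it or re-records the video.
from typing import Any, Callable

def confirmed_recognition(record_video: Callable[[], Any],
                          recognize: Callable[[Any], str],
                          user_confirms: Callable[[str], bool]) -> str:
    result = recognize(record_video())
    while not user_confirms(result):        # result differs from intended meaning
        result = recognize(record_video())  # re-record and recognize again
    return result
```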
S3, obtaining the similarity of the corresponding lip shapes based on the standard lip sequence features and the target object's lip sequence features, and providing the target object with a standard lip-reading sequence for training, which includes:
The standard lip sequence features are compared with the target object's lips to obtain a similarity score; comparison data and similarity are derived from the size of the target object's sounding lips, the opening angle of the lips, and the differences between lip shapes, and a standard lip-reading sequence is provided to the target object for learning and training. From each target object's comparison results, the invention compiles error-prone lip shapes into a pronunciation training model library that includes specific pronunciation analyses.
Based on the comparison results, the target object can analyze and correct its lip shape and train continuously to improve its mouth-shape similarity.
The similarity is calculated by the formula referenced above (rendered only as an image in the original publication), where X1 and X2 are the target object's lip shape and the correct lip shape respectively, and k is a penalty coefficient.
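Since the exact formula cannot be reproduced from the image, the following is purely an illustration of a penalized lip-shape similarity over X1 (target) and X2 (standard) with penalty coefficient k; the exponential-decay form is an assumption:

```python
# Hypothetical similarity measure; the patent's actual formula appears only
# as an image, so this exponential-decay form is an illustrative assumption.
import numpy as np

def lip_similarity(x1: np.ndarray, x2: np.ndarray, k: float = 1.0) -> float:
    """Return a similarity in (0, 1]: 1.0 for identical lip-shape features,
    decaying with the k-weighted mean squared difference."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.exp(-k * np.mean(diff ** 2)))
```

Under such a measure, a larger penalty coefficient k makes the score more sensitive to deviations from the standard lip shape, consistent with the role of the penalty coefficient described above.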
According to the technical scheme provided by the invention, gesture features and lip-reading features are fused, and the constructed multi-feature fusion network model performs sign language recognition and lip-language recognition synchronously, with the sign language assisting the system in capturing the lip language. The recognized text and the speaker's mouth shape are then corrected, with repeated trial and error until the speaker's mouth shape is correct. Finally, a pronunciation training model library, including pronunciation lip-shape details, is built from the target object's comparison results; it is used to train the speaking patterns of deaf-mute people and of elderly people with unclear pronunciation, helping them express themselves correctly, restoring their speech ability, and improving their daily quality of life.
The simulation test results of this embodiment are shown in FIG. 3, where the first row is the standard lip shape for the English pronunciation of the number 6, the second row is the lip-reading sequence after the tester intentionally changed the sounding lip shape, and the third row is the lip-reading sequence after the tester imitated the standard lip-reading image and correctly pronounced the number 6. According to the system's matching results, the matching degree after intentionally changing the pronunciation is 71.76%, while the matching degree of the correct pronunciation is 86.83%. The experiments show that, with a system based on this sounding training correction method, the target object can train autonomously and effectively correct its sounding lip shape according to the standard lip-reading sequence and comparison results given by the system, improving speech ability without the help of others.
The target objects described in this embodiment include people with hearing impairment and people with difficulty in pronunciation.
Based on the same inventive concept, the invention also provides a sounding training correction system, comprising:
the extraction module is used for extracting gesture sequence features and lip sequence features of the target object from the video to be identified;
the matching module is used for extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database;
and the comparison module is used for obtaining the similarity of the corresponding lips based on the standard lip sequence characteristics and the lip sequence characteristics of the target object and providing a standard lip reading sequence for training for the target object.
Firstly, the dynamic video needs to be preprocessed: gesture labels are matched one by one with lip labels, and the ResNet50 network is used to ensure accuracy. Secondly, the lip image area is segmented using a MobileNet network to extract features. Then, the temporal information is learned using the LSTM network. Current research mainly studies how to improve the accuracy of lip and gesture feature extraction and the corresponding recognition rates; the invention instead applies the combination of the ResNet50, MobileNet, and LSTM networks to rehabilitation training, making a substantial contribution to the rehabilitation of hearing-impaired people.
Through autonomous training, hearing-impaired people can correct their mouth shapes and improve mouth-shape similarity according to the comparison results of the automatic lip-reading recognition.
The system provided by the invention complements medical devices such as cochlear implants, helping hearing-impaired people learn how to pronounce correctly so that they can recover their speech ability.
The invention also provides an electronic device comprising:
a memory storing at least one instruction; and
a processor executing the instructions stored in the memory to implement any of the sounding training correction methods provided by the invention.
An embodiment of the invention also provides a computer-readable storage medium having at least one instruction stored therein, the at least one instruction being executed by a processor in an electronic device to implement any of the sounding training correction methods provided by the invention.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is illustrative of the present invention and is not to be construed as limiting it; all modifications, equivalents, and improvements of the invention are intended to be included within the scope of the invention as defined by the appended claims.
Claims (8)
1. A sounding training correction method, comprising:
extracting gesture sequence features and lip sequence features of a target object from a video to be identified;
extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database;
obtaining similarity of corresponding lips based on the standard lip sequence features and the lip sequence features of the target object, and providing a standard lip reading sequence for training for the target object;
the voice training database stores standard sign language words and lip shapes corresponding to the sign language words;
the extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database comprises the following steps:
finding out each sign language vocabulary matched with the gesture sequence characteristics from the voice training database;
obtaining standard lip shapes corresponding to each sign language vocabulary based on each sign language vocabulary;
and generating standard lip sequence characteristics from the standard lips corresponding to each sign language vocabulary.
2. The method of claim 1, wherein, after finding each sign language vocabulary item matched with the gesture sequence features in the voice training database, the method further comprises:
feeding back a recognition result to the target object based on each sign language vocabulary;
and correcting the recognition result based on the expression content of the target object until the recognition result is consistent with the expression content of the target object.
3. The method of claim 2, wherein the correcting of the recognition result based on the expression content of the target object comprises:
and when the recognition result is inconsistent with the expression content of the target object, re-recording the video to be identified based on the expression content of the target object.
4. The method of claim 1, wherein the similarity of the lip shapes is calculated by a formula (rendered only as an image in the original publication) in which similarity is the lip-shape similarity, X1 is a lip shape in the target object's lip sequence features, X2 is the corresponding lip shape in the standard lip sequence features, and k is a penalty coefficient.
5. The method of claim 1, wherein extracting gesture sequence features and lip sequence features of a target object from a video to be identified comprises:
the gesture labels are matched with the lip labels one by one through ResNet50, so that gesture sequence characteristics are obtained;
dividing lip image areas in the video to be identified by using a MobileNet network to extract lip features;
and learning time sequence information by utilizing the LSTM network to obtain lip sequence characteristics.
6. A vocalization training correction system, comprising:
the extraction module is used for extracting gesture sequence features and lip sequence features of the target object from the video to be identified;
the matching module is used for extracting standard lip sequence features matched with the gesture sequence features from a pre-constructed voice training database;
the comparison module is used for obtaining the similarity of the corresponding lips based on the standard lip sequence characteristics and the lip sequence characteristics of the target object and providing a standard lip reading sequence for training for the target object;
the voice training database stores standard sign language words and lip shapes corresponding to the sign language words;
the matching module is specifically configured to:
finding out each sign language vocabulary matched with the gesture sequence characteristics from the voice training database;
obtaining standard lip shapes corresponding to each sign language vocabulary based on each sign language vocabulary;
and generating standard lip sequence characteristics from the standard lips corresponding to each sign language vocabulary.
7. An electronic device, the electronic device comprising:
a memory storing at least one instruction; and
A processor executing the instructions stored in the memory to implement the sounding training correction method of any one of claims 1 to 5.
8. A computer readable storage medium having stored therein at least one instruction for execution by a processor in an electronic device to implement the sounding training correction method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010517909.XA CN111832412B (en) | 2020-06-09 | 2020-06-09 | Sounding training correction method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010517909.XA CN111832412B (en) | 2020-06-09 | 2020-06-09 | Sounding training correction method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111832412A CN111832412A (en) | 2020-10-27 |
CN111832412B (en) | 2024-04-09 |
Family
ID=72899322
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010517909.XA Active CN111832412B (en) | 2020-06-09 | 2020-06-09 | Sounding training correction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111832412B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114758647A (en) * | 2021-07-20 | 2022-07-15 | 无锡柠檬科技服务有限公司 | Language training method and system based on deep learning |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101882390A (en) * | 2010-06-12 | 2010-11-10 | 黑龙江新洋科技有限公司 | Three-dimensional lip language interactive teaching system and method thereof |
CN104537358A (en) * | 2014-12-26 | 2015-04-22 | 安徽寰智信息科技股份有限公司 | Lip language recognition lip-shape training database generating method based on deep learning |
CN105047196A (en) * | 2014-04-25 | 2015-11-11 | 通用汽车环球科技运作有限责任公司 | Systems and methods for speech artifact compensation in speech recognition systems |
CN107301863A (en) * | 2017-07-13 | 2017-10-27 | 江苏师范大学 | A kind of deaf-mute child's disfluency method of rehabilitation and rehabilitation training system |
CN109389098A (en) * | 2018-11-01 | 2019-02-26 | 重庆中科云丛科技有限公司 | A kind of verification method and system based on lip reading identification |
CN109637521A (en) * | 2018-10-29 | 2019-04-16 | 深圳壹账通智能科技有限公司 | A kind of lip reading recognition methods and device based on deep learning |
CN110047511A (en) * | 2019-04-23 | 2019-07-23 | 赵旭 | A kind of speech training method, device, computer equipment and its storage medium |
CN110059575A (en) * | 2019-03-25 | 2019-07-26 | 中国科学院深圳先进技术研究院 | A kind of augmentative communication system based on the identification of surface myoelectric lip reading |
CN110532850A (en) * | 2019-07-02 | 2019-12-03 | 杭州电子科技大学 | A kind of fall detection method based on video artis and hybrid classifer |
Also Published As
Publication number | Publication date |
---|---|
CN111832412A (en) | 2020-10-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |