CN113658584A - Intelligent pronunciation correction method and system - Google Patents
- Publication number
- CN113658584A (application CN202110956597.7A)
- Authority
- CN
- China
- Prior art keywords
- module
- pronunciation
- user
- lip
- training
- Prior art date
- Legal status: Pending (the listed status is an assumption, not a legal conclusion)
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
- G10L15/26—Speech to text systems
- G10L25/51—Speech or voice analysis techniques specially adapted for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques specially adapted for estimating an emotional state
Abstract
The invention relates to the field of pronunciation correction, and in particular to an intelligent pronunciation correction method and system. The system comprises a data acquisition module, a data processing module, and a feedback module. The technical scheme integrates speech recognition, electropalatography (electric palate diagram), and lip-shape recognition: electropalatography and lip-shape recognition serve as auxiliary rehabilitation correction tools that help the patient recover normal pronunciation; speech recognition serves as the primary detection and evaluation standard, judging the patient's degree of rehabilitation by whether the patient's pronunciation can be recognized; and electropalatography and lip-shape recognition additionally serve as a secondary auxiliary standard for detecting the degree of rehabilitation.
Description
Technical Field
The invention relates to the field of pronunciation correction, in particular to an intelligent pronunciation correction method and system.
Background
At present, dysarthria rehabilitation still requires the participation of speech therapists, and domestic speech therapists are too few to meet market demand. Although existing speech rehabilitation systems can assist patients in rehabilitation therapy, they only exercise the patient's breathing and pronunciation muscles; they cannot help the patient correct pronunciation or master a correct pronunciation method. Tongue position and mouth shape during pronunciation must still be guided and corrected by a dedicated person, so intelligent correction and fully self-service home rehabilitation cannot be realized, and dysarthria rehabilitation therefore depends to a great extent on speech therapists.
The application of computer-aided technology makes dysarthria rehabilitation efficient and convenient. Studies have shown that multisensory stimulation is more beneficial for learning: combining vision, hearing, and touch helps new skills be mastered more quickly and remembered more deeply.
Brain injury is one of the major causes of dysarthria. Patients with brain injury often have cognitive impairments, such as deficits in attention and memory, which reduce higher cognitive functions such as comprehension and learning, or language impairments such as poor auditory understanding, which make verbal guidance hard to follow; in these cases it is important to increase visual feedback. During clinical treatment, therapists often need to give patients complex verbal explanations and demonstrations, but because the patients' understanding and learning abilities are poor and actions inside the mouth are hard to see, patients, especially children, often have difficulty understanding the therapist's intention and need repeated communication, which affects the clinical curative effect.
Disclosure of Invention
The invention aims to provide an intelligent pronunciation correction method.
It is yet another object of the present invention to provide an intelligent pronunciation correction system.
The intelligent pronunciation correction system comprises a data acquisition module, a data processing module and a feedback module, wherein,
the data acquisition module comprises a sound acquisition module, a lip-shaped acquisition module and an oral cavity acquisition module, wherein,
the sound collection module is used for collecting the pitch, tone, and semantic content of the patient's speech,
the lip shape acquisition module is used for acquiring the shape of the lips of a patient during pronunciation,
the oral cavity acquisition module is used for acquiring the related data of the sound, the airflow, the internal pressure of the oral cavity, the tongue position and the mouth shape of a user;
the data processing module comprises a voice processing module, an image recognition module and a tongue position sensing module, wherein,
the voice processing module uses natural language processing to compare the pitch, tone, and semantic information collected by the sound collection module against a standard model built into the system, and feeds back a correct/incorrect judgment on the input information,
the image recognition module compares the collected lip information with the lip shape with correct pronunciation to judge whether the pronunciation of the patient is correct or not,
the tongue position sensing module realizes automatic processing of electric palate image data and feeds back tongue and palate contact condition images and data processing results in real time;
the feedback module comprises a voice feedback module, a lip feedback module and an oral cavity feedback module.
In the intelligent pronunciation correction system, the sound collection module comprises a microphone, the lip-shape collection module comprises a camera, and the oral cavity collection module comprises an electropalatograph (electric palate diagram device) and a dedicated hard-palate mold.
According to the intelligent pronunciation correction system, the image recognition module recognizes the lip shape through the camera.
The intelligent pronunciation correction system according to the present invention, wherein the speech processing module comprises a preprocessing module, a feature extraction module, an acoustic model, a language model and dictionary module, and a decoding module,
the preprocessing module carries out audio data preprocessing on the collected sound signals and extracts audio signals needing to be analyzed from original signals;
the feature extraction module converts the sound signal from a time domain to a frequency domain and provides a proper feature vector for the acoustic model;
the acoustic model calculates the score of each feature vector on the acoustic features according to the acoustic characteristics;
the language model calculates the probability that the sound signal corresponds to a possible syllable, character, phrase, or sentence sequence according to linguistic theory;
and the decoding module decodes the phrase sequence according to the existing dictionary to obtain the final text information fed back by the user.
The system further comprises a text analysis module, which analyzes the obtained text information, understanding and processing the text output by the speech processing stage; after analysis, the result is compared with the standard text to obtain the user's result on the current evaluation or training task, which is displayed in the feedback module.
The intelligent pronunciation correction method of the invention comprises a dysarthria evaluation process and a training process, wherein,
the dysarthria assessment process comprises the following steps:
s1-1: acquiring user account information;
s1-2: acquiring a test task target;
s1-3: obtaining lip shape, sound, airflow, oral cavity internal pressure and tongue position information of a user;
s1-4: processing the collected user voice, lip shape and oral cavity information, and comparing with a built-in standard model;
s1-5: it is determined whether the user information matches a standard model, wherein,
if it matches, the next syllable test is carried out; if it does not match, the error count of the current syllable is increased by one and the next syllable test is carried out; if the error count of a syllable is not less than 3, the syllable is added to the training list;
s1-6: all syllable tests are completed and a report is presented,
the training process comprises the steps of:
s2-1: acquiring user account information;
s2-2: prompting the user to perform pre-training relaxation exercises;
s2-3: prompting a user to perform sound-forming organ training, playing a training course, enabling the user to practice breathing, feel vocal organs and enhance oral muscle strength;
s2-4: connecting the external equipment for lip-shape and tongue-position information acquisition;
s2-5: prompting a user to perform daily voice training recommended by the system;
s2-6: presenting feedback: whether the pronunciation is correct is fed back through the screen and a prompt voice; if correct, the next item begins; if wrong, a model picture of the correct pronunciation, tongue position, and lip shape is shown for the user's reference;
s2-7: when the daily training tasks are all finished, the user is reminded that training is complete.
The technical scheme of the application has the following advantages:
(1) Speech recognition, electropalatography, and lip-shape recognition are fused together for the first time, assisting the patient in correcting pronunciation through hearing, vision, and touch. In addition, the system can help judge the course of the patient's dysarthria, intelligently match a rehabilitation scheme, integrate training with evaluation, and adjust the training scheme in real time, truly realizing personalized and targeted treatment.
(2) Face recognition technology is further improved to realize dynamic lip-shape recognition, which is applied to the detection and rehabilitation correction of dysarthria.
(3) Speech recognition technology is improved: where the original technology was dedicated to converting speech into text, here it is combined with artificial intelligence to help a speech therapist or patient judge dysarthria and its severity, and is applied to the speech rehabilitation process.
(4) Applying the technology of the invention can greatly reduce the market's demand for speech trainers and cover a wider population, saving manpower and material costs while improving the level of rehabilitation. The system also provides convenience for patients, who can receive professional speech therapy services without leaving home.
The technical scheme of the invention integrates speech recognition, electropalatography, and lip-shape recognition. Electropalatography and lip-shape recognition act as auxiliary rehabilitation correction tools to help the patient recover normal pronunciation; speech recognition acts as the primary detection and evaluation standard, judging the patient's degree of rehabilitation by whether the pronunciation can be recognized; and electropalatography and lip-shape recognition also act as a secondary auxiliary standard to help the patient gauge the degree of rehabilitation.
Drawings
FIG. 1 is a schematic structural diagram of an intelligent pronunciation correction system according to the present invention;
FIG. 2 is a flow chart of dysarthria assessment of the intelligent pronunciation correction method of the present invention;
FIG. 3 is a flow chart of dysarthria training of the intelligent pronunciation correction method of the present invention.
Detailed Description
The technical scheme of the application is described in detail in the following with the accompanying drawings.
As shown in fig. 1, the intelligent pronunciation correction system of the present invention comprises a data acquisition module, a data processing module and a feedback module, wherein,
the data acquisition module comprises a sound acquisition module, a lip-shaped acquisition module and an oral cavity acquisition module, wherein,
the sound collection module is used for collecting the pitch, tone, and semantic content of the patient's speech, facilitating evaluation of the severity and type of the patient's dysarthria.
The lip shape collection module is used for collecting the shape of lips of a patient during pronunciation, so that the wrong pronunciation mode of the patient can be corrected conveniently.
The oral cavity collection module is used for collecting the user's sound, airflow, intra-oral pressure, tongue position, and mouth shape. This facilitates detecting the patient's dysarthria symptoms, provides a basis for matching training tasks, and helps the patient learn and master a correct pronunciation method.
The data processing module comprises a voice processing module, an image recognition module and a tongue position sensing module, wherein,
the voice processing module uses natural language processing to compare the pitch, tone, and semantic information acquired by the sound collection module against a standard model built into the system, and judges whether the input information is correct;
the image recognition module compares the collected lip information with the lip with correct pronunciation to judge whether the pronunciation of the patient is correct or not;
the tongue position sensing module realizes the automatic processing of the electric palate image data and feeds back tongue and palate contact condition images and data processing results in real time.
The feedback module comprises a voice feedback module, a lip feedback module and an oral cavity feedback module.
According to the intelligent pronunciation correction system, the sound collection module comprises a microphone so as to better collect the user's voice and realize speech recognition; the lip-shape collection module comprises a camera; and the oral cavity collection module comprises an electropalatograph and a dedicated hard-palate mold.
According to the intelligent pronunciation correction system, the electropalatography (electric palate diagram) device requires the subject to wear a specially customized, personalized hard-palate mold fitted with 62 electronic sensors. The sensors display the tongue-palate contact points on the display module in real time, and can also detect sound waves, the sound spectrum, oral and nasal airflow rates, throat vibration, and intra-oral pressure, facilitating the detection and training of dysarthria.
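The 62-sensor contact data mentioned above could be decoded and displayed roughly as follows. This is a hedged sketch only: the 8-row layout and the front row having 6 sensors are assumptions for illustration, since the patent does not specify the sensor grid.

```python
# Hedged sketch: rendering one electropalatography (EPG) contact frame.
# The row layout below is an assumption; only the 62-sensor count comes
# from the description.
ROWS = [6, 8, 8, 8, 8, 8, 8, 8]   # 62 sensors total, fewer in the front row (assumed)

def render_frame(bits):
    """Render a 62-element contact frame (1 = tongue touches palate) row by row."""
    assert len(bits) == sum(ROWS) == 62
    lines, i = [], 0
    for width in ROWS:
        row = bits[i:i + width]
        lines.append("".join("#" if b else "." for b in row).center(8))
        i += width
    return "\n".join(lines)
```

A display module could refresh such a rendering on every frame received from the mold, giving the real-time tongue-palate contact feedback the text describes.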
According to the intelligent pronunciation correction system, the image recognition module recognizes the dynamic lip shape through the camera, capturing the movements of the articulators outside the oral cavity and realizing all-around pronunciation guidance.
The intelligent pronunciation correction system matches a rehabilitation scheme to the user's evaluation result. The system first determines the causes of the patient's dysarthria from the evaluation result, such as muscle weakness, and recommends corresponding training tasks, such as articulator and breathing exercises, according to the user's condition. Second, the system stores a large number of standard libraries, such as acoustic models, language models, and dictionaries; it judges the patient's pronunciation by comparing the collected speech with the built-in standard models, and then matches corresponding characters, words, and pronunciations for practice according to the evaluation result.
According to the intelligent pronunciation correction system, the lip-shape collection module collects the shape of the patient's lips during pronunciation, facilitating correction of wrong pronunciation patterns. Compared with full-face recognition, dynamic lip-shape recognition covers a smaller region and is sensitive to fine movements. Further, dynamic lip-shape recognition processes not a still image but a moving image or a continuous multi-frame sequence, recognizing the coherent relationship between frames. A miniature camera captures the speaker's dynamic lip image; the lip image is processed to extract lip features, which are then compared against the standard phoneme or syllable lip model built into the system to judge whether the lip shape is correct.
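The comparison step just described (extract lip features, compare with the standard lip model) might be sketched as below. The 2-D landmark representation, the width-based normalisation, and the 0.1 threshold are all assumptions for illustration, not the patent's actual algorithm.

```python
# Hedged sketch: comparing measured lip landmarks against a standard model.
import math

def normalise(points):
    """Translate to the centroid and scale by mouth width so camera distance cancels."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    width = (max(x for x, _ in points) - min(x for x, _ in points)) or 1.0
    return [((x - cx) / width, (y - cy) / width) for x, y in points]

def lip_matches(measured, standard, threshold=0.1):
    """True when the mean landmark distance to the standard lip model is small."""
    m, s = normalise(measured), normalise(standard)
    dist = sum(math.dist(a, b) for a, b in zip(m, s)) / len(m)
    return dist <= threshold
```

Because both shapes are normalised, a lip captured closer to or farther from the camera still compares correctly against the stored standard model.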
According to the technical scheme of the invention, speech processing mainly comprises two aspects: speech recognition and text processing. Speech recognition is realized by the speech processing module, which mainly comprises a preprocessing module, a feature extraction module, an acoustic model, a language model and dictionary module, and a decoding module.
The preprocessing module preprocesses the collected sound signal, for example by filtering and framing, and extracts the audio segments to be analyzed from the original signal;
the feature extraction module converts the sound signal from a time domain to a frequency domain and provides a proper feature vector for the acoustic model;
the acoustic model calculates the score of each feature vector on the acoustic features according to the acoustic characteristics;
the language model calculates the probability that the sound signal corresponds to a possible syllable, word, phrase, or sentence sequence according to linguistic theory;
and the decoding module decodes the phrase sequence according to the existing dictionary to obtain the final text information fed back by the user.
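As a rough, hypothetical illustration of the five-stage pipeline just described (preprocessing, feature extraction, acoustic scoring, language modelling, decoding), the sketch below uses toy stand-ins for every model; the phone set, Gaussian means, and bigram probabilities are invented for demonstration only.

```python
# Toy sketch of the five-stage recognition pipeline; every model is a stand-in.
import math

def preprocess(samples, frame_len=4):
    """Gate out silence and split the signal into fixed-length frames."""
    voiced = [s for s in samples if abs(s) > 0.01]        # crude silence gate
    return [voiced[i:i + frame_len]
            for i in range(0, len(voiced) - frame_len + 1, frame_len)]

def extract_features(frame):
    """Time domain -> a tiny stand-in feature: log frame energy."""
    return math.log(sum(s * s for s in frame) + 1e-9)

def acoustic_score(feature, phone):
    """Score a candidate phone against the feature (toy Gaussian, invented means)."""
    means = {"a": 0.0, "i": -2.0}
    return -(feature - means[phone]) ** 2

def language_prob(seq):
    """Probability of a phone sequence under a toy bigram model."""
    bigram = {("a", "i"): 0.6, ("a", "a"): 0.4, ("i", "a"): 0.9, ("i", "i"): 0.1}
    p = 1.0
    for prev, cur in zip(seq, seq[1:]):
        p *= bigram.get((prev, cur), 0.01)
    return p

def decode(frames):
    """Greedy decode: best phone per frame, then rescore with the language model."""
    seq = []
    for fr in frames:
        f = extract_features(fr)
        seq.append(max("ai", key=lambda ph: acoustic_score(f, ph)))
    return seq, language_prob(seq)
```

A real system would use MFCC-style spectral features, trained acoustic and language models, and a dictionary-driven decoder, but the data flow between the five modules is the same as sketched here.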
Text processing is realized through the text analysis module. Based mainly on artificial-intelligence natural language processing, it understands and processes the text information output by the speech processing stage, such as syllables, characters, phrases, or sentences.
Natural language understanding is applied to the text generated by speech recognition; the text length and content complexity vary with the evaluation or training content. The technology can process longer texts, perform structure prediction, mark the boundary of each word in a sentence fed back by the user, extract central elements such as time, place, and person, and complete classification and cluster analysis. After analysis, the result is compared with the standard text to obtain the user's result on the current evaluation or training task, which is presented in the feedback module.
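A minimal sketch of the final comparison step, scoring the recognized text against the standard text; `difflib.SequenceMatcher` is a simple stand-in for the patent's NLP-based comparison, used here only to illustrate the idea of a 0-to-1 task score.

```python
# Hedged sketch: scoring a recognized utterance against the standard text.
from difflib import SequenceMatcher

def score_against_standard(recognized: str, standard: str) -> float:
    """Return a similarity score in [0, 1] between recognized and standard text."""
    return SequenceMatcher(None, recognized, standard).ratio()
```

The feedback module could then present this score directly as the user's result on the current evaluation or training task.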
The pronunciation training method for phonemes and single syllables is as follows: the tablet system first plays the sound, the patient then imitates it, and the tablet recognizes the speech uttered by the patient. If the pronunciation is correct, the next training task begins; if incorrect, a corresponding course is presented according to the characteristics of the phoneme or syllable, while the lip-positioning and electropalatography equipment records the patient's lip and tongue information in real time.
During speech training, the tablet screen displays a correct articulation video together with real-time lip anchor points and tongue-palate contact images for the practiced phoneme or syllable. While the patient imitates the tablet's sound, the camera and positioning technique record the lip shape and display it on the tablet screen in real time; likewise, the electropalatograph records the tongue-palate contact points and displays them on the screen in real time. In both cases the system marks a correct result with a green indicator and an error with a red indicator. The system can thus monitor, for each sound, the vocal organs that should participate in it, check their condition, and complete detection while training. Word and sentence training additionally uses pictures matching the meaning of the word or sentence to help patients practice their vocalization.
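The green/red marking described above can be sketched as a per-sensor comparison between the measured contact map and the target pattern; the 1-bit contact lists are hypothetical, standing in for the real EPG and lip-positioning data.

```python
# Hedged sketch of the green/red feedback: compare each measured contact
# point (True = contact) with the target pattern for the practiced sound.
def contact_feedback(measured, target):
    """Per-sensor feedback: 'green' where contact matches the target, 'red' otherwise."""
    return ["green" if m == t else "red" for m, t in zip(measured, target)]
```

The display layer would colour each palate contact point or lip anchor according to this list on every frame.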
The training system can help the old people to recover dysarthria after stroke and can also help children with dysarthria to correct pronunciation.
The intelligent pronunciation correction method comprises a dysarthria evaluation process and a training process. The training process mainly helps the user exercise the articulators, breathing, and muscles; speech training relies on external equipment to help the user correct lip shape, tongue position, and the like, so that the patient can produce meaningful speech with improved accuracy.
As shown in fig. 2, the dysarthria assessment process includes the following steps:
s1-1: acquiring user account information;
s1-2: acquiring test task targets, such as simple tones, words, phrases, sentences, or pictures with storylines;
s1-3: obtaining information such as the user's lip shape, sound, airflow, oral pressure, and tongue position;
s1-4: processing the collected user voice, lip shape and oral cavity information, and comparing with a built-in standard model;
s1-5: it is determined whether the user information matches a standard model, wherein,
if it matches, the next syllable test is carried out; if it does not match, 1 is added to the error count of the current syllable and the next syllable test is carried out; if the error count of a syllable is not less than 3, the syllable is added to the training list;
s1-6: after all syllable tests are completed, a report is presented. The report includes the user's basic information (name, gender, age) as well as the accuracy of each syllable, the syllables needing correction, and guidance suggestions.
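The S1-5/S1-6 branching above might be sketched as follows. The three-attempts-per-syllable interpretation and the report format are assumptions; only the rule that 3 or more errors routes a syllable into the training list comes from the text.

```python
# Hedged sketch of the evaluation loop (S1-5/S1-6): count mismatches per
# syllable; an error count >= 3 puts the syllable into the training list.
def evaluate(syllables, matches_standard, attempts=3):
    """Return a per-syllable accuracy report and the training list."""
    errors = {}
    for syl in syllables:
        errors[syl] = sum(1 for _ in range(attempts) if not matches_standard(syl))
    training_list = [s for s, e in errors.items() if e >= 3]
    report = {s: 1 - e / attempts for s, e in errors.items()}  # accuracy per syllable
    return report, training_list
```

Here `matches_standard` stands in for the whole capture-process-compare chain of steps S1-3 to S1-5; the returned report corresponds to the per-syllable accuracy section of the S1-6 report.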
As shown in fig. 3, the training process includes the following steps:
s2-1: acquiring user account information;
s2-2: the user is prompted to relax before training: the tongue is guided to move and the throat muscles are relaxed, improving the flexibility of the vocal organs;
s2-3: prompting the user to perform articulator training: breath and pronunciation-muscle training courses are played so that the patient can practice breathing, feel the articulators, strengthen oral muscle force, and better control the vocal organs;
s2-4: connecting external equipment such as a lip-capture device, the hard-palate mold, and earphones.
S2-5: and prompting the user to perform daily voice training recommended by the system.
S2-6: presenting feedback: whether the pronunciation is correct is fed back through the screen and a prompt voice; if correct, the next item begins; if wrong, a model picture of the correct pronunciation, tongue position, and lip shape is shown for the user's reference.
S2-7: and if the daily training task is completely finished, reminding the user to finish the training.
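The daily training loop of steps s2-5 to s2-7 might be organized as in this sketch; the `evaluate_pronunciation` callback, the retry limit, and the message strings are assumptions for illustration:

```python
# Illustrative daily-training loop (steps s2-5 to s2-7).
def run_daily_training(tasks, evaluate_pronunciation, show, max_retries=3):
    """Present each task, give pass/fail feedback, and retry failed items."""
    for task in tasks:
        for attempt in range(1, max_retries + 1):
            correct, detail = evaluate_pronunciation(task, attempt)
            if correct:
                show(f"{task}: correct")        # correct: move on to the next item
                break
            # wrong: show model pictures of pronunciation, tongue position, lip shape
            show(f"{task}: incorrect ({detail}); showing model pictures")
    show("Daily training complete")             # step s2-7: remind the user
```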
In the intelligent pronunciation correction method, the evaluation process requires the patient to speak while wearing an electropalatography device. Artificial-intelligence speech recognition technology judges the severity of the dysarthria and which sounds are dysarthric; the electropalatogram judges the health of the vocal organs and the strength of the articulation muscles from factors such as oral-nasal airflow and intraoral pressure; and the lip-shape recognizer judges whether the lip shape during phonation is correct. An evaluation report is produced by combining these three factors, and a reasonable training scheme is given according to the report.
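One way to combine the three assessment channels (speech recognition, electropalatogram, lip-shape recognition) into a single report is a weighted score, as in the sketch below; the weights, thresholds, and severity labels are illustrative assumptions, since the patent does not specify how the factors are combined:

```python
# Illustrative combination of the three assessment factors into one report.
def build_report(asr_score, epg_score, lip_score, weights=(0.5, 0.3, 0.2)):
    """Each channel score is in [0, 1]; returns an overall score and severity label."""
    w_asr, w_epg, w_lip = weights
    overall = w_asr * asr_score + w_epg * epg_score + w_lip * lip_score
    severity = ("mild" if overall >= 0.8
                else "moderate" if overall >= 0.5
                else "severe")
    return {"overall": round(overall, 3), "severity": severity}
```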
In the speech recognition process, the invention integrates natural language processing technology from the field of artificial intelligence, converting the information understanding, recall, and retrieval work that doctors, examiners, or trainers must perform in traditional practice into real-time computer feedback. This improves training efficiency for patients with language disorders and reduces the workload of doctors, examiners, and trainers. In addition, doctors, examiners, and trainers evaluate the patients' real-time feedback results through the system, helping the system automatically refine its standard models of syllables, characters, words, phrases, and longer texts, improving recognition accuracy and making the rehabilitation system more precise and intelligent.
Although existing systems can combine audio and video, the activity of the articulation organs remains difficult for the speech trainer and the patient to observe. Combining audio and video chiefly increases the interest and richness of training and improves the training effect only indirectly, by stimulating multiple senses; such a system is better called an assistive tool for a speech therapist than an intelligent speech rehabilitation system.
The invention introduces electropalatography (EPG) technology into dysarthria rehabilitation: by recording the contacts between the tongue and the hard palate during speech, it measures the activity of the patient's articulation organs and forms an electropalatogram. The technical scheme of the invention thus provides dynamic visual feedback that reflects speech activity which clinicians cannot easily observe directly. Studies have shown that this mode of visualization is effective in treating patients with dysarthria after brain injury. Computer-assisted technology gives the therapist a more tangible model and a more accurate way to address complex requirements.
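The electropalatographic contact pattern described above can be derived from per-frame electrode readings roughly as follows. A real EPG palate typically carries a few dozen electrodes; the 0/1 grid frames here are an assumed format for illustration:

```python
# Illustrative derivation of an EPG contact profile from electrode frames.
def contact_profile(frames):
    """frames: list of 2-D 0/1 grids, where 1 = tongue touching that palate electrode.

    Returns the total contacts per frame and a cumulative per-electrode count,
    i.e. the raw material of an electropalatogram."""
    rows, cols = len(frames[0]), len(frames[0][0])
    cumulative = [[0] * cols for _ in range(rows)]
    per_frame = []
    for frame in frames:
        per_frame.append(sum(sum(row) for row in frame))   # contacts in this frame
        for r in range(rows):
            for c in range(cols):
                cumulative[r][c] += frame[r][c]            # per-electrode totals
    return per_frame, cumulative
```

The per-frame totals track overall tongue-palate activity over time, while the cumulative grid shows which palate regions the tongue touched, the kind of pattern presented as visual feedback.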
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
Claims (6)
1. An intelligent pronunciation correction system is characterized by comprising a data acquisition module, a data processing module and a feedback module, wherein,
the data acquisition module comprises a sound acquisition module, a lip-shape acquisition module, and an oral-cavity acquisition module, wherein,
the sound acquisition module is used for collecting the pitch, tone, and semantic content of the patient's speech,
the lip-shape acquisition module is used for collecting the shape of the patient's lips during pronunciation,
the oral-cavity acquisition module is used for collecting data on the user's voice, airflow, intraoral pressure, tongue position, and mouth shape;
the data processing module comprises a voice processing module, an image recognition module and a tongue position sensing module, wherein,
the voice processing module, using natural language processing technology, compares the pitch, tone, and semantic information collected by the sound acquisition module with the standard model built into the system and feeds back a correct/incorrect judgment on the input information,
the image recognition module compares the collected lip information with the lip shape of correct pronunciation to judge whether the patient's pronunciation is correct,
the tongue position sensing module automatically processes the electropalatogram data and feeds back tongue-palate contact images and data processing results in real time;
the feedback module comprises a voice feedback module, a lip feedback module and an oral cavity feedback module.
2. The intelligent pronunciation correction system of claim 1, wherein the sound acquisition module comprises a microphone, the lip-shape acquisition module comprises a camera, and the oral-cavity acquisition module comprises an electropalatography device and a dedicated hard-palate mold for electropalatography.
3. The intelligent pronunciation correction system of claim 1, wherein the image recognition module recognizes the lip shape via a camera.
4. The intelligent pronunciation correction system of claim 1, wherein the speech processing module comprises a pre-processing module, a feature extraction module, an acoustic model, a language model and dictionary module, and a decoding module,
the preprocessing module performs audio-data preprocessing on the collected sound signal and extracts the audio signal to be analyzed from the original signal;
the feature extraction module converts the sound signal from the time domain to the frequency domain and provides suitable feature vectors for the acoustic model;
the acoustic model calculates, from the acoustic characteristics, the score of each feature vector on the acoustic features;
the language model calculates, according to linguistic theory, the probability that the sound signal corresponds to a candidate sequence of syllables, characters, phrases, or sentences;
and the decoding module decodes the candidate sequence against the existing dictionary to obtain the final text of the user's input.
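The pipeline of claim 4 (preprocessing, feature extraction, acoustic scoring, language-model decoding) can be illustrated with a deliberately tiny sketch. Real systems use MFCC-style spectral features and trained acoustic and language models; the per-frame log-energy feature and threshold "decoder" below are simplifying assumptions:

```python
# Toy sketch of the claim-4 speech-processing pipeline (illustrative only).
import math

def preprocess(signal, frame_len=4):
    """Split the raw signal into fixed-length frames (crude preprocessing)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def extract_features(frames):
    """Map each time-domain frame to a single log-energy feature value."""
    return [math.log(sum(x * x for x in f) + 1e-9) for f in frames]

def decode(features, threshold=0.0):
    """Stand-in for the acoustic model + decoder: frames whose energy exceeds
    the threshold become a voiced symbol 'V', the rest silence '-'."""
    return "".join("V" if f > threshold else "-" for f in features)
```

For example, `decode(extract_features(preprocess([0, 0, 0, 0, 2, 2, 2, 2])))` distinguishes the silent frame from the voiced frame. In a real system the acoustic model scores each feature vector against phone models, and the language model rescores candidate syllable or word sequences before dictionary lookup.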
5. The intelligent pronunciation correction system of claim 4, further comprising a text analysis module, wherein the text analysis module analyzes the obtained text information, understanding and processing the text output by the speech processing stage; after the analysis is completed, it compares the result with the standard text to obtain the user's result on the current evaluation or training task and presents that result to the feedback module.
6. An intelligent pronunciation correction method, which is characterized by comprising a dysarthria evaluation process and a training process, wherein,
the dysarthria assessment process comprises the following steps:
s1-1: acquiring user account information;
s1-2: acquiring the test task target;
s1-3: obtaining lip shape, sound, airflow, oral cavity internal pressure and tongue position information of a user;
s1-4: processing the collected user voice, lip-shape, and oral-cavity information and comparing it with the built-in standard model;
s1-5: determining whether the user information matches the standard model, wherein,
if it matches, the next syllable test is carried out; if it does not match, the error count of the current syllable is increased by one and the next syllable test is carried out; and if the error count of a syllable reaches 3 or more, that syllable is entered into the training plan;
s1-6: after all syllable tests are completed, a report is presented,
the training process comprises the steps of:
s2-1: acquiring user account information;
s2-2: prompting the user to perform pre-training relaxation exercises;
s2-3: prompting the user to perform articulation-organ training: playing a training course so that the user practices breathing, becomes aware of the vocal organs, and strengthens the oral muscles;
s2-4: connecting the external devices that acquire lip-shape and tongue-position information;
s2-5: prompting a user to perform daily voice training recommended by the system;
s2-6: presenting feedback: whether the pronunciation is correct is fed back through the screen and a voice prompt; if correct, the next item is presented; if wrong, model pictures of the correct pronunciation, tongue position, and lip shape are shown for the user's reference;
s2-7: if the daily training tasks are all finished, reminding the user that training is complete.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110956597.7A CN113658584A (en) | 2021-08-19 | 2021-08-19 | Intelligent pronunciation correction method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113658584A true CN113658584A (en) | 2021-11-16 |
Family
ID=78492480
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110956597.7A Pending CN113658584A (en) | 2021-08-19 | 2021-08-19 | Intelligent pronunciation correction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113658584A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN2458683Y (en) * | 2000-12-26 | 2001-11-07 | 徐巍 | Visible pronunciation training apparatus |
CN101751809A (en) * | 2010-02-10 | 2010-06-23 | 长春大学 | Deaf children speech rehabilitation method and system based on three-dimensional head portrait |
CN102063903A (en) * | 2010-09-25 | 2011-05-18 | 中国科学院深圳先进技术研究院 | Speech interactive training system and speech interactive training method |
Non-Patent Citations (1)
Title |
---|
Lin Xin, et al.: "Speech Pathology" (《语音病理学》), Zhejiang Gongshang University Press, pages: 243 - 246 *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114944218A (en) * | 2022-01-18 | 2022-08-26 | 华东师范大学 | Data query processing method and database system for correcting consonant dysarthria |
CN114783049A (en) * | 2022-03-21 | 2022-07-22 | 广东工业大学 | Spoken language learning method and system based on deep neural network visual recognition |
CN114783049B (en) * | 2022-03-21 | 2023-06-23 | 广东工业大学 | Spoken language learning method and system based on deep neural network visual recognition |
CN116705070A (en) * | 2023-08-02 | 2023-09-05 | 南京优道言语康复研究院 | Method and system for correcting speech pronunciation and nasal sound after cleft lip and palate operation |
CN116705070B (en) * | 2023-08-02 | 2023-10-17 | 南京优道言语康复研究院 | Method and system for correcting speech pronunciation and nasal sound after cleft lip and palate operation |
CN117198340A (en) * | 2023-09-20 | 2023-12-08 | 南京优道言语康复研究院 | Dysarthria correction effect analysis method based on optimized acoustic parameters |
CN117198340B (en) * | 2023-09-20 | 2024-04-30 | 南京优道言语康复研究院 | Dysarthria correction effect analysis method based on optimized acoustic parameters |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113658584A (en) | Intelligent pronunciation correction method and system | |
Benus et al. | Articulatory characteristics of Hungarian ‘transparent’ vowels | |
US20070055523A1 (en) | Pronunciation training system | |
JP6234563B2 (en) | Training system | |
CN106073706B (en) | A kind of customized information and audio data analysis method and system towards Mini-mental Status Examination | |
Cosentino et al. | Quantitative laughter detection, measurement, and classification—A critical survey | |
CN107301863A (en) | A kind of deaf-mute child's disfluency method of rehabilitation and rehabilitation training system | |
US20020087322A1 (en) | Method for utilizing oral movement and related events | |
CN109727608A (en) | A kind of ill voice appraisal procedure based on Chinese speech | |
TWI294107B (en) | A pronunciation-scored method for the application of voice and image in the e-learning | |
CN108113651A (en) | A kind of patients with Chinese aphasia mental language evaluation method and evaluation system | |
Beckman et al. | Methods for eliciting, annotating, and analyzing databases for child speech development | |
Freitas et al. | An introduction to silent speech interfaces | |
WO2006034569A1 (en) | A speech training system and method for comparing utterances to baseline speech | |
WO2007134494A1 (en) | A computer auxiliary method suitable for multi-languages pronunciation learning system for deaf-mute | |
CN108320625A (en) | Vibrational feedback system towards speech rehabilitation and device | |
CN109166629A (en) | The method and system of aphasia evaluation and rehabilitation auxiliary | |
CN114916921A (en) | Rapid speech cognition assessment method and device | |
Davis et al. | Repeating and remembering foreign language words: Implications for language teaching systems | |
Liu et al. | An interactive speech training system with virtual reality articulation for Mandarin-speaking hearing impaired children | |
CN107591163B (en) | Pronunciation detection method and device and voice category learning method and system | |
CN113593374A (en) | Multi-modal speech rehabilitation training system combining oral muscle training | |
Zhao et al. | Pronouncing rehabilitation of hearing-impaired children based on chinese 3d visual-speech database | |
CN112786151B (en) | Language function training system and method | |
Yamada et al. | Assistive speech technology for persons with speech impairments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||