CN113658584A - Intelligent pronunciation correction method and system - Google Patents
- Publication number
- CN113658584A (application CN202110956597.7A)
- Authority
- CN
- China
- Prior art keywords
- module
- pronunciation
- user
- lip
- training
- Prior art date
- Legal status: Pending (the listed status is an assumption, not a legal conclusion)
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
- G10L15/26—Speech to text systems
- G10L25/51—Speech or voice analysis techniques specially adapted for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques specially adapted for estimating an emotional state
Abstract
The invention relates to the field of pronunciation correction, and in particular to an intelligent pronunciation correction method and system. The system comprises a data acquisition module, a data processing module, and a feedback module. The technical scheme integrates speech recognition, electropalatography (electric palate diagram), and lip-shape recognition: electropalatography and lip-shape recognition serve as auxiliary rehabilitation correction tools that help the patient recover normal pronunciation; speech recognition serves as the primary detection and evaluation standard, judging the patient's degree of rehabilitation by whether the patient's pronunciation can be recognized; and electropalatography and lip-shape recognition additionally serve as a secondary auxiliary standard for detecting the degree of rehabilitation.
Description
Technical Field
The invention relates to the field of pronunciation correction, in particular to an intelligent pronunciation correction method and system.
Background
At present, dysarthria rehabilitation still requires the participation of speech therapists, and domestic speech therapists are too few to meet market demand. Although existing speech rehabilitation systems can assist patients in rehabilitation therapy, they only exercise the patient's breathing and pronunciation muscles; they cannot help the patient correct pronunciation or master a correct pronunciation method. Tongue position and mouth shape during pronunciation must still be guided and corrected by a dedicated person, so intelligent correction and fully self-service home rehabilitation cannot be realized, and dysarthria rehabilitation therefore depends to a great extent on speech therapists.
The application of computer-aided technology makes dysarthria rehabilitation efficient and convenient. Studies have shown that multisensory stimulation is more beneficial for learning: combining vision, hearing, and touch helps new skills be mastered more quickly and remembered more deeply.
Brain injury is one of the major causes of dysarthria. Patients with brain injury often have cognitive impairments, such as deficits in attention and memory, which reduce higher cognitive functions such as comprehension and learning, or language impairments such as poor auditory understanding, which make verbal guidance hard to follow; in these cases it is important to increase visual feedback. During clinical treatment, therapists often need to give patients complex verbal explanations and demonstrations, but because the patients' understanding and learning abilities are poor and actions inside the mouth are hard to see, patients, especially children, often have difficulty understanding the therapist's intention and need repeated communication, which affects the clinical curative effect.
Disclosure of Invention
The invention aims to provide an intelligent pronunciation correction method.
It is yet another object of the present invention to provide an intelligent pronunciation correction system.
The intelligent pronunciation correction system comprises a data acquisition module, a data processing module and a feedback module, wherein,
the data acquisition module comprises a sound acquisition module, a lip-shaped acquisition module and an oral cavity acquisition module, wherein,
the sound collection module is used for collecting the pitch, tone, and semantic content of the patient's speech,
the lip shape acquisition module is used for acquiring the shape of the lips of a patient during pronunciation,
the oral cavity acquisition module is used for acquiring the related data of the sound, the airflow, the internal pressure of the oral cavity, the tongue position and the mouth shape of a user;
the data processing module comprises a voice processing module, an image recognition module and a tongue position sensing module, wherein,
the voice processing module uses natural language processing to compare the pitch, tone, and semantic information collected by the sound collection module against a standard model built into the system, and feeds back a correct/incorrect judgment on the input information,
the image recognition module compares the collected lip information with the lip shape with correct pronunciation to judge whether the pronunciation of the patient is correct or not,
the tongue position sensing module realizes automatic processing of electric palate image data and feeds back tongue and palate contact condition images and data processing results in real time;
the feedback module comprises a voice feedback module, a lip feedback module and an oral cavity feedback module.
In the intelligent pronunciation correction system, the sound collection module comprises a microphone, the lip-shape collection module comprises a camera, and the oral cavity collection module comprises an electropalatograph (electric palate diagram device) and a dedicated hard-palate mold.
According to the intelligent pronunciation correction system, the image recognition module recognizes the lip shape through the camera.
The intelligent pronunciation correction system according to the present invention, wherein the speech processing module comprises a preprocessing module, a feature extraction module, an acoustic model, a language model and dictionary module, and a decoding module,
the preprocessing module carries out audio data preprocessing on the collected sound signals and extracts audio signals needing to be analyzed from original signals;
the feature extraction module converts the sound signal from a time domain to a frequency domain and provides a proper feature vector for the acoustic model;
the acoustic model calculates the score of each feature vector on the acoustic features according to the acoustic characteristics;
the language model calculates the probability that the sound signal corresponds to a possible syllable, character, phrase, or sentence sequence according to linguistic theory;
and the decoding module decodes the phrase sequence according to the existing dictionary to obtain the final text information fed back by the user.
The system further comprises a text analysis module, which analyzes the obtained text information, understanding and processing the text output by the speech processing stage; after analysis, the result is compared with the standard text to obtain the user's result on the current evaluation or training task, which is displayed in the feedback module.
The intelligent pronunciation correction method of the invention comprises a dysarthria evaluation process and a training process, wherein,
the dysarthria assessment process comprises the following steps:
s1-1: acquiring user account information;
s1-2: acquiring a test task target;
s1-3: obtaining lip shape, sound, airflow, oral cavity internal pressure and tongue position information of a user;
s1-4: processing the collected user voice, lip shape and oral cavity information, and comparing with a built-in standard model;
s1-5: it is determined whether the user information matches a standard model, wherein,
if it matches, the next syllable test is carried out; if it does not match, the error count of the current syllable is increased by one and the next syllable test is carried out; if the error count of a syllable is not less than 3, the syllable is added to the training list;
s1-6: all syllable tests are completed and a report is presented,
the training process comprises the steps of:
s2-1: acquiring user account information;
s2-2: prompting the user to perform pre-training relaxation exercises;
s2-3: prompting a user to perform sound-forming organ training, playing a training course, enabling the user to practice breathing, feel vocal organs and enhance oral muscle strength;
s2-4: connecting the external equipment for lip-shape and tongue-position information acquisition;
s2-5: prompting a user to perform daily voice training recommended by the system;
s2-6: presenting feedback: whether the pronunciation is correct is fed back through the screen and a prompt voice; if correct, the next item begins; if wrong, a model picture of the correct pronunciation, tongue position, and lip shape is shown for the user's reference;
s2-7: when the daily training tasks are all finished, the user is reminded that training is complete.
The technical scheme of the application has the following advantages:
(1) Speech recognition, electropalatography, and lip-shape recognition are fused together for the first time, assisting the patient in correcting pronunciation through hearing, vision, and touch. In addition, the system can help judge the course of the patient's dysarthria, intelligently match a rehabilitation scheme, integrate training with evaluation, and adjust the training scheme in real time, truly realizing personalized and targeted treatment.
(2) Face recognition technology is further improved to realize dynamic lip-shape recognition, which is applied to the detection and rehabilitation correction of dysarthria.
(3) Speech recognition technology is improved: where the original technology was dedicated to converting speech into text, here it is combined with artificial intelligence to help a speech therapist or patient judge dysarthria and its severity, and is applied to the speech rehabilitation process.
(4) Applying the technology of the invention can greatly reduce the market's demand for speech trainers and cover a wider population, saving manpower and material costs while improving the level of rehabilitation. The system also provides convenience for patients, who can receive professional speech therapy services without leaving home.
The technical scheme of the invention integrates speech recognition, electropalatography, and lip-shape recognition. Electropalatography and lip-shape recognition act as auxiliary rehabilitation correction tools to help the patient recover normal pronunciation; speech recognition acts as the primary detection and evaluation standard, judging the patient's degree of rehabilitation by whether the pronunciation can be recognized; and electropalatography and lip-shape recognition also act as a secondary auxiliary standard to help the patient gauge the degree of rehabilitation.
Drawings
FIG. 1 is a schematic structural diagram of an intelligent pronunciation correction system according to the present invention;
FIG. 2 is a flow chart of dysarthria assessment of the intelligent pronunciation correction method of the present invention;
FIG. 3 is a flow chart of dysarthria training of the intelligent pronunciation correction method of the present invention.
Detailed Description
The technical scheme of the application is described in detail in the following with the accompanying drawings.
As shown in fig. 1, the intelligent pronunciation correction system of the present invention comprises a data acquisition module, a data processing module and a feedback module, wherein,
the data acquisition module comprises a sound acquisition module, a lip-shaped acquisition module and an oral cavity acquisition module, wherein,
the sound collection module is used for collecting the pitch, tone, and semantic content of the patient's speech, facilitating evaluation of the severity and type of the patient's dysarthria.
The lip shape collection module is used for collecting the shape of lips of a patient during pronunciation, so that the wrong pronunciation mode of the patient can be corrected conveniently.
The oral cavity collection module is used for collecting the user's sound, airflow, intra-oral pressure, tongue position, and mouth shape. This facilitates detecting the patient's dysarthria symptoms, provides a basis for matching training tasks, and helps the patient learn and master a correct pronunciation method.
The data processing module comprises a voice processing module, an image recognition module and a tongue position sensing module, wherein,
the voice processing module uses natural language processing to compare the pitch, tone, and semantic information acquired by the sound collection module against a standard model built into the system, and judges whether the input information is correct;
the image recognition module compares the collected lip information with the lip with correct pronunciation to judge whether the pronunciation of the patient is correct or not;
the tongue position sensing module realizes the automatic processing of the electric palate image data and feeds back tongue and palate contact condition images and data processing results in real time.
The feedback module comprises a voice feedback module, a lip feedback module and an oral cavity feedback module.
According to the intelligent pronunciation correction system, the sound collection module comprises a microphone so as to better collect the user's voice and realize speech recognition; the lip-shape collection module comprises a camera; and the oral cavity collection module comprises an electropalatograph and a dedicated hard-palate mold.
According to the intelligent pronunciation correction system, the electropalatography (electric palate diagram) device requires the subject to wear a specially customized, personalized hard-palate mold fitted with 62 electronic sensors. The sensors display the tongue-palate contact points on the display module in real time, and can also detect sound waves, the sound spectrum, oral and nasal airflow rates, throat vibration, and intra-oral pressure, facilitating the detection and training of dysarthria.
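The 62-sensor contact data mentioned above could be decoded and displayed roughly as follows. This is a hedged sketch only: the 8-row layout and the front row having 6 sensors are assumptions for illustration, since the patent does not specify the sensor grid.

```python
# Hedged sketch: rendering one electropalatography (EPG) contact frame.
# The row layout below is an assumption; only the 62-sensor count comes
# from the description.
ROWS = [6, 8, 8, 8, 8, 8, 8, 8]   # 62 sensors total, fewer in the front row (assumed)

def render_frame(bits):
    """Render a 62-element contact frame (1 = tongue touches palate) row by row."""
    assert len(bits) == sum(ROWS) == 62
    lines, i = [], 0
    for width in ROWS:
        row = bits[i:i + width]
        lines.append("".join("#" if b else "." for b in row).center(8))
        i += width
    return "\n".join(lines)
```

A display module could refresh such a rendering on every frame received from the mold, giving the real-time tongue-palate contact feedback the text describes.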
According to the intelligent pronunciation correction system, the image recognition module recognizes the dynamic lip shape through the camera, capturing the movements of the articulators outside the oral cavity and realizing all-around pronunciation guidance.
The intelligent pronunciation correction system matches a rehabilitation scheme to the user's evaluation result. The system first determines the causes of the patient's dysarthria from the evaluation result, such as muscle weakness, and recommends corresponding training tasks, such as articulator and breathing exercises, according to the user's condition. Second, the system stores a large number of standard libraries, such as acoustic models, language models, and dictionaries; it judges the patient's pronunciation by comparing the collected speech with the built-in standard models, and then matches corresponding characters, words, and pronunciations for practice according to the evaluation result.
According to the intelligent pronunciation correction system, the lip-shape collection module collects the shape of the patient's lips during pronunciation, facilitating correction of wrong pronunciation patterns. Compared with full-face recognition, dynamic lip-shape recognition covers a smaller region and is sensitive to fine movements. Further, dynamic lip-shape recognition processes not a still image but a moving image or a continuous multi-frame sequence, recognizing the coherent relationship between frames. A miniature camera captures the speaker's dynamic lip image; the lip image is processed to extract lip features, which are then compared against the standard phoneme or syllable lip model built into the system to judge whether the lip shape is correct.
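The comparison step just described (extract lip features, compare with the standard lip model) might be sketched as below. The 2-D landmark representation, the width-based normalisation, and the 0.1 threshold are all assumptions for illustration, not the patent's actual algorithm.

```python
# Hedged sketch: comparing measured lip landmarks against a standard model.
import math

def normalise(points):
    """Translate to the centroid and scale by mouth width so camera distance cancels."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    width = (max(x for x, _ in points) - min(x for x, _ in points)) or 1.0
    return [((x - cx) / width, (y - cy) / width) for x, y in points]

def lip_matches(measured, standard, threshold=0.1):
    """True when the mean landmark distance to the standard lip model is small."""
    m, s = normalise(measured), normalise(standard)
    dist = sum(math.dist(a, b) for a, b in zip(m, s)) / len(m)
    return dist <= threshold
```

Because both shapes are normalised, a lip captured closer to or farther from the camera still compares correctly against the stored standard model.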
According to the technical scheme of the invention, speech processing mainly comprises two aspects: speech recognition and text processing. Speech recognition is realized by the speech processing module, which mainly comprises a preprocessing module, a feature extraction module, an acoustic model, a language model and dictionary module, and a decoding module.
The preprocessing module preprocesses the collected sound signal, for example by filtering and framing, and extracts the audio segments to be analyzed from the original signal;
the feature extraction module converts the sound signal from a time domain to a frequency domain and provides a proper feature vector for the acoustic model;
the acoustic model calculates the score of each feature vector on the acoustic features according to the acoustic characteristics;
the language model calculates the probability that the sound signal corresponds to a possible syllable, word, phrase, or sentence sequence according to linguistic theory;
and the decoding module decodes the phrase sequence according to the existing dictionary to obtain the final text information fed back by the user.
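As a rough, hypothetical illustration of the five-stage pipeline just described (preprocessing, feature extraction, acoustic scoring, language modelling, decoding), the sketch below uses toy stand-ins for every model; the phone set, Gaussian means, and bigram probabilities are invented for demonstration only.

```python
# Toy sketch of the five-stage recognition pipeline; every model is a stand-in.
import math

def preprocess(samples, frame_len=4):
    """Gate out silence and split the signal into fixed-length frames."""
    voiced = [s for s in samples if abs(s) > 0.01]        # crude silence gate
    return [voiced[i:i + frame_len]
            for i in range(0, len(voiced) - frame_len + 1, frame_len)]

def extract_features(frame):
    """Time domain -> a tiny stand-in feature: log frame energy."""
    return math.log(sum(s * s for s in frame) + 1e-9)

def acoustic_score(feature, phone):
    """Score a candidate phone against the feature (toy Gaussian, invented means)."""
    means = {"a": 0.0, "i": -2.0}
    return -(feature - means[phone]) ** 2

def language_prob(seq):
    """Probability of a phone sequence under a toy bigram model."""
    bigram = {("a", "i"): 0.6, ("a", "a"): 0.4, ("i", "a"): 0.9, ("i", "i"): 0.1}
    p = 1.0
    for prev, cur in zip(seq, seq[1:]):
        p *= bigram.get((prev, cur), 0.01)
    return p

def decode(frames):
    """Greedy decode: best phone per frame, then rescore with the language model."""
    seq = []
    for fr in frames:
        f = extract_features(fr)
        seq.append(max("ai", key=lambda ph: acoustic_score(f, ph)))
    return seq, language_prob(seq)
```

A real system would use MFCC-style spectral features, trained acoustic and language models, and a dictionary-driven decoder, but the data flow between the five modules is the same as sketched here.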
Text processing is realized through the text analysis module. Based mainly on artificial-intelligence natural language processing, it understands and processes the text information output by the speech processing stage, such as syllables, characters, phrases, or sentences.
Natural language understanding is applied to the text generated by speech recognition; the text length and content complexity vary with the evaluation or training content. The technology can process longer texts, perform structure prediction, mark the boundary of each word in a sentence fed back by the user, extract central elements such as time, place, and person, and complete classification and cluster analysis. After analysis, the result is compared with the standard text to obtain the user's result on the current evaluation or training task, which is presented in the feedback module.
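A minimal sketch of the final comparison step, scoring the recognized text against the standard text; `difflib.SequenceMatcher` is a simple stand-in for the patent's NLP-based comparison, used here only to illustrate the idea of a 0-to-1 task score.

```python
# Hedged sketch: scoring a recognized utterance against the standard text.
from difflib import SequenceMatcher

def score_against_standard(recognized: str, standard: str) -> float:
    """Return a similarity score in [0, 1] between recognized and standard text."""
    return SequenceMatcher(None, recognized, standard).ratio()
```

The feedback module could then present this score directly as the user's result on the current evaluation or training task.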
The pronunciation training method for phonemes and single syllables is as follows: the tablet system first plays the sound, the patient then imitates it, and the tablet recognizes the speech uttered by the patient. If the pronunciation is correct, the next training task begins; if incorrect, a corresponding course is presented according to the characteristics of the phoneme or syllable, while the lip-positioning and electropalatography equipment records the patient's lip and tongue information in real time.
During speech training, the tablet screen displays a correct articulation video together with real-time lip anchor points and tongue-palate contact images for the practiced phoneme or syllable. While the patient imitates the tablet's sound, the camera and positioning technique record the lip shape and display it on the tablet screen in real time; likewise, the electropalatograph records the tongue-palate contact points and displays them on the screen in real time. In both cases the system marks a correct result with a green indicator and an error with a red indicator. The system can thus monitor, for each sound, the vocal organs that should participate in it, check their condition, and complete detection while training. Word and sentence training additionally uses pictures matching the meaning of the word or sentence to help patients practice their vocalization.
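The green/red marking described above can be sketched as a per-sensor comparison between the measured contact map and the target pattern; the 1-bit contact lists are hypothetical, standing in for the real EPG and lip-positioning data.

```python
# Hedged sketch of the green/red feedback: compare each measured contact
# point (True = contact) with the target pattern for the practiced sound.
def contact_feedback(measured, target):
    """Per-sensor feedback: 'green' where contact matches the target, 'red' otherwise."""
    return ["green" if m == t else "red" for m, t in zip(measured, target)]
```

The display layer would colour each palate contact point or lip anchor according to this list on every frame.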
The training system can help the old people to recover dysarthria after stroke and can also help children with dysarthria to correct pronunciation.
The intelligent pronunciation correction method comprises a dysarthria evaluation process and a training process. The training process mainly helps the user exercise the articulators, breathing, and muscles; speech training relies on external equipment to help the user correct lip shape, tongue position, and the like, so that the patient can produce meaningful speech with improved accuracy.
As shown in fig. 2, the dysarthria assessment process includes the following steps:
s1-1: acquiring user account information;
s1-2: acquiring test task targets, such as simple tones, words, phrases, sentences, or pictures with storylines;
s1-3: obtaining information such as the user's lip shape, sound, airflow, oral pressure, and tongue position;
s1-4: processing the collected user voice, lip shape and oral cavity information, and comparing with a built-in standard model;
s1-5: it is determined whether the user information matches a standard model, wherein,
if it matches, the next syllable test is carried out; if it does not match, 1 is added to the error count of the current syllable and the next syllable test is carried out; if the error count of a syllable is not less than 3, the syllable is added to the training list;
s1-6: after all syllable tests are completed, a report is presented. The report includes the user's basic information (name, gender, age) as well as the accuracy of each syllable, the syllables needing correction, and guidance suggestions.
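The S1-5/S1-6 branching above might be sketched as follows. The three-attempts-per-syllable interpretation and the report format are assumptions; only the rule that 3 or more errors routes a syllable into the training list comes from the text.

```python
# Hedged sketch of the evaluation loop (S1-5/S1-6): count mismatches per
# syllable; an error count >= 3 puts the syllable into the training list.
def evaluate(syllables, matches_standard, attempts=3):
    """Return a per-syllable accuracy report and the training list."""
    errors = {}
    for syl in syllables:
        errors[syl] = sum(1 for _ in range(attempts) if not matches_standard(syl))
    training_list = [s for s, e in errors.items() if e >= 3]
    report = {s: 1 - e / attempts for s, e in errors.items()}  # accuracy per syllable
    return report, training_list
```

Here `matches_standard` stands in for the whole capture-process-compare chain of steps S1-3 to S1-5; the returned report corresponds to the per-syllable accuracy section of the S1-6 report.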
As shown in fig. 3, the training process includes the following steps:
s2-1: acquiring user account information;
s2-2: the user is prompted to relax before training: the tongue is guided to move and the throat muscles are relaxed, improving the flexibility of the vocal organs;
s2-3: prompting the user to perform articulator training: breath and pronunciation-muscle training courses are played so that the patient can practice breathing, feel the articulators, strengthen oral muscle force, and better control the vocal organs;
s2-4: connecting external equipment such as a lip-capture device, the hard-palate mold, and earphones.
S2-5: and prompting the user to perform daily voice training recommended by the system.
S2-6: presenting feedback: whether the pronunciation is correct is fed back through the screen and a prompt voice; if correct, the next item begins; if wrong, a model picture of the correct pronunciation, tongue position, and lip shape is shown for the user's reference.
S2-7: and if the daily training task is completely finished, reminding the user to finish the training.
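The daily training loop of steps s2-5 to s2-7 might be organized as in this sketch; the `evaluate_pronunciation` callback, the retry limit, and the message strings are assumptions for illustration:

```python
# Illustrative daily-training loop (steps s2-5 to s2-7).
def run_daily_training(tasks, evaluate_pronunciation, show, max_retries=3):
    """Present each task, give pass/fail feedback, and retry failed items."""
    for task in tasks:
        for attempt in range(1, max_retries + 1):
            correct, detail = evaluate_pronunciation(task, attempt)
            if correct:
                show(f"{task}: correct")        # correct: move on to the next item
                break
            # wrong: show model pictures of pronunciation, tongue position, lip shape
            show(f"{task}: incorrect ({detail}); showing model pictures")
    show("Daily training complete")             # step s2-7: remind the user
```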
In the intelligent pronunciation correction method, the evaluation process requires the patient to speak while wearing an electropalatography device. Artificial-intelligence speech recognition technology judges the severity of the dysarthria and which sounds are dysarthric; the electropalatogram judges the health of the vocal organs and the strength of the articulation muscles from factors such as oral-nasal airflow and intraoral pressure; and the lip-shape recognizer judges whether the lip shape during phonation is correct. An evaluation report is produced by combining these three factors, and a reasonable training scheme is given according to the report.
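One way to combine the three assessment channels (speech recognition, electropalatogram, lip-shape recognition) into a single report is a weighted score, as in the sketch below; the weights, thresholds, and severity labels are illustrative assumptions, since the patent does not specify how the factors are combined:

```python
# Illustrative combination of the three assessment factors into one report.
def build_report(asr_score, epg_score, lip_score, weights=(0.5, 0.3, 0.2)):
    """Each channel score is in [0, 1]; returns an overall score and severity label."""
    w_asr, w_epg, w_lip = weights
    overall = w_asr * asr_score + w_epg * epg_score + w_lip * lip_score
    severity = ("mild" if overall >= 0.8
                else "moderate" if overall >= 0.5
                else "severe")
    return {"overall": round(overall, 3), "severity": severity}
```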
In the speech recognition process, the invention integrates natural language processing technology from the field of artificial intelligence, converting the information understanding, recall, and retrieval work that doctors, examiners, or trainers must perform in traditional practice into real-time computer feedback. This improves training efficiency for patients with language disorders and reduces the workload of doctors, examiners, and trainers. In addition, doctors, examiners, and trainers evaluate the patients' real-time feedback results through the system, helping the system automatically refine its standard models of syllables, characters, words, phrases, and longer texts, improving recognition accuracy and making the rehabilitation system more precise and intelligent.
Although existing systems can combine audio and video, the activity of the articulation organs remains difficult for the speech trainer and the patient to observe. Combining audio and video chiefly increases the interest and richness of training and improves the training effect only indirectly, by stimulating multiple senses; such a system is better called an assistive tool for a speech therapist than an intelligent speech rehabilitation system.
The invention introduces electropalatography (EPG) technology into dysarthria rehabilitation: by recording the contacts between the tongue and the hard palate during speech, it measures the activity of the patient's articulation organs and forms an electropalatogram. The technical scheme of the invention thus provides dynamic visual feedback that reflects speech activity which clinicians cannot easily observe directly. Studies have shown that this mode of visualization is effective in treating patients with dysarthria after brain injury. Computer-assisted technology gives the therapist a more tangible model and a more accurate way to address complex requirements.
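The electropalatographic contact pattern described above can be derived from per-frame electrode readings roughly as follows. A real EPG palate typically carries a few dozen electrodes; the 0/1 grid frames here are an assumed format for illustration:

```python
# Illustrative derivation of an EPG contact profile from electrode frames.
def contact_profile(frames):
    """frames: list of 2-D 0/1 grids, where 1 = tongue touching that palate electrode.

    Returns the total contacts per frame and a cumulative per-electrode count,
    i.e. the raw material of an electropalatogram."""
    rows, cols = len(frames[0]), len(frames[0][0])
    cumulative = [[0] * cols for _ in range(rows)]
    per_frame = []
    for frame in frames:
        per_frame.append(sum(sum(row) for row in frame))   # contacts in this frame
        for r in range(rows):
            for c in range(cols):
                cumulative[r][c] += frame[r][c]            # per-electrode totals
    return per_frame, cumulative
```

The per-frame totals track overall tongue-palate activity over time, while the cumulative grid shows which palate regions the tongue touched, the kind of pattern presented as visual feedback.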
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
Claims (6)
1. An intelligent pronunciation correction system is characterized by comprising a data acquisition module, a data processing module and a feedback module, wherein,
the data acquisition module comprises a sound acquisition module, a lip-shape acquisition module, and an oral-cavity acquisition module, wherein,
the sound acquisition module is used for collecting the pitch, tone, and semantic content of the patient's speech,
the lip-shape acquisition module is used for collecting the shape of the patient's lips during pronunciation,
the oral-cavity acquisition module is used for collecting data on the user's voice, airflow, intraoral pressure, tongue position, and mouth shape;
the data processing module comprises a voice processing module, an image recognition module and a tongue position sensing module, wherein,
the voice processing module, using natural language processing technology, compares the pitch, tone, and semantic information collected by the sound acquisition module with the standard model built into the system and feeds back a correct/incorrect judgment on the input information,
the image recognition module compares the collected lip information with the lip shape of correct pronunciation to judge whether the patient's pronunciation is correct,
the tongue position sensing module automatically processes the electropalatogram data and feeds back tongue-palate contact images and data processing results in real time;
the feedback module comprises a voice feedback module, a lip feedback module and an oral cavity feedback module.
2. The intelligent pronunciation correction system of claim 1, wherein the sound acquisition module comprises a microphone, the lip-shape acquisition module comprises a camera, and the oral-cavity acquisition module comprises an electropalatography device and a dedicated hard-palate mold for electropalatography.
3. The intelligent pronunciation correction system of claim 1, wherein the image recognition module recognizes the lip shape via a camera.
4. The intelligent pronunciation correction system of claim 1, wherein the speech processing module comprises a pre-processing module, a feature extraction module, an acoustic model, a language model and dictionary module, and a decoding module,
the preprocessing module performs audio-data preprocessing on the collected sound signal and extracts the audio signal to be analyzed from the original signal;
the feature extraction module converts the sound signal from the time domain to the frequency domain and provides suitable feature vectors for the acoustic model;
the acoustic model calculates, from the acoustic characteristics, the score of each feature vector on the acoustic features;
the language model calculates, according to linguistic theory, the probability that the sound signal corresponds to a candidate sequence of syllables, characters, phrases, or sentences;
and the decoding module decodes the candidate sequence against the existing dictionary to obtain the final text of the user's input.
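The pipeline of claim 4 (preprocessing, feature extraction, acoustic scoring, language-model decoding) can be illustrated with a deliberately tiny sketch. Real systems use MFCC-style spectral features and trained acoustic and language models; the per-frame log-energy feature and threshold "decoder" below are simplifying assumptions:

```python
# Toy sketch of the claim-4 speech-processing pipeline (illustrative only).
import math

def preprocess(signal, frame_len=4):
    """Split the raw signal into fixed-length frames (crude preprocessing)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def extract_features(frames):
    """Map each time-domain frame to a single log-energy feature value."""
    return [math.log(sum(x * x for x in f) + 1e-9) for f in frames]

def decode(features, threshold=0.0):
    """Stand-in for the acoustic model + decoder: frames whose energy exceeds
    the threshold become a voiced symbol 'V', the rest silence '-'."""
    return "".join("V" if f > threshold else "-" for f in features)
```

For example, `decode(extract_features(preprocess([0, 0, 0, 0, 2, 2, 2, 2])))` distinguishes the silent frame from the voiced frame. In a real system the acoustic model scores each feature vector against phone models, and the language model rescores candidate syllable or word sequences before dictionary lookup.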
5. The intelligent pronunciation correction system of claim 4, further comprising a text analysis module, wherein the text analysis module analyzes the obtained text information, understanding and processing the text output by the speech processing stage; after the analysis is completed, it compares the result with the standard text to obtain the user's result on the current evaluation or training task and presents that result to the feedback module.
6. An intelligent pronunciation correction method, which is characterized by comprising a dysarthria evaluation process and a training process, wherein,
the dysarthria assessment process comprises the following steps:
s1-1: acquiring user account information;
s1-2: acquiring the test task target;
s1-3: obtaining lip shape, sound, airflow, oral cavity internal pressure and tongue position information of a user;
s1-4: processing the collected user voice, lip-shape, and oral-cavity information and comparing it with the built-in standard model;
s1-5: determining whether the user information matches the standard model, wherein,
if it matches, the next syllable test is carried out; if it does not match, the error count of the current syllable is increased by one and the next syllable test is carried out; and if the error count of a syllable reaches 3 or more, that syllable is entered into the training plan;
s1-6: after all syllable tests are completed, a report is presented,
the training process comprises the steps of:
s2-1: acquiring user account information;
s2-2: prompting the user to perform pre-training relaxation exercises;
s2-3: prompting the user to perform articulation-organ training: playing a training course so that the user practices breathing, becomes aware of the vocal organs, and strengthens the oral muscles;
s2-4: connecting the external devices that acquire lip-shape and tongue-position information;
s2-5: prompting a user to perform daily voice training recommended by the system;
s2-6: presenting feedback: whether the pronunciation is correct is fed back through the screen and a voice prompt; if correct, the next item is presented; if wrong, model pictures of the correct pronunciation, tongue position, and lip shape are shown for the user's reference;
s2-7: if the daily training tasks are all finished, reminding the user that training is complete.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110956597.7A CN113658584A (en) | 2021-08-19 | 2021-08-19 | Intelligent pronunciation correction method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113658584A true CN113658584A (en) | 2021-11-16 |
Family
ID=78492480
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110956597.7A Pending CN113658584A (en) | 2021-08-19 | 2021-08-19 | Intelligent pronunciation correction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113658584A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN2458683Y (en) * | 2000-12-26 | 2001-11-07 | 徐巍 | Visible pronunciation training apparatus |
CN101751809A (en) * | 2010-02-10 | 2010-06-23 | 长春大学 | Deaf children speech rehabilitation method and system based on three-dimensional head portrait |
CN102063903A (en) * | 2010-09-25 | 2011-05-18 | 中国科学院深圳先进技术研究院 | Speech interactive training system and speech interactive training method |
Non-Patent Citations (1)
Title |
---|
Lin Xin, et al.: "Speech Pathology" (《语音病理学》), Zhejiang Gongshang University Press, pages: 243 - 246 *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114944218A (en) * | 2022-01-18 | 2022-08-26 | 华东师范大学 | Data query processing method and database system for correcting consonant dysarthria |
CN114783049A (en) * | 2022-03-21 | 2022-07-22 | 广东工业大学 | Spoken language learning method and system based on deep neural network visual recognition |
CN114783049B (en) * | 2022-03-21 | 2023-06-23 | 广东工业大学 | Spoken language learning method and system based on deep neural network visual recognition |
CN116705070A (en) * | 2023-08-02 | 2023-09-05 | 南京优道言语康复研究院 | Method and system for correcting speech pronunciation and nasal sound after cleft lip and palate operation |
CN116705070B (en) * | 2023-08-02 | 2023-10-17 | 南京优道言语康复研究院 | Method and system for correcting speech pronunciation and nasal sound after cleft lip and palate operation |
CN117198340A (en) * | 2023-09-20 | 2023-12-08 | 南京优道言语康复研究院 | Dysarthria correction effect analysis method based on optimized acoustic parameters |
CN117198340B (en) * | 2023-09-20 | 2024-04-30 | 南京优道言语康复研究院 | Dysarthria correction effect analysis method based on optimized acoustic parameters |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113658584A (en) | Intelligent pronunciation correction method and system | |
Benus et al. | Articulatory characteristics of Hungarian ‘transparent’ vowels | |
US20070055523A1 (en) | Pronunciation training system | |
JP6234563B2 (en) | Training system | |
CN106073706B (en) | A kind of customized information and audio data analysis method and system towards Mini-mental Status Examination | |
Cosentino et al. | Quantitative laughter detection, measurement, and classification—A critical survey | |
CN107301863A (en) | A kind of deaf-mute child's disfluency method of rehabilitation and rehabilitation training system | |
US20020087322A1 (en) | Method for utilizing oral movement and related events | |
CN109727608A (en) | A kind of ill voice appraisal procedure based on Chinese speech | |
TWI294107B (en) | A pronunciation-scored method for the application of voice and image in the e-learning | |
CN108113651A (en) | A kind of patients with Chinese aphasia mental language evaluation method and evaluation system | |
Beckman et al. | Methods for eliciting, annotating, and analyzing databases for child speech development | |
Freitas et al. | An introduction to silent speech interfaces | |
WO2006034569A1 (en) | A speech training system and method for comparing utterances to baseline speech | |
WO2007134494A1 (en) | A computer auxiliary method suitable for multi-languages pronunciation learning system for deaf-mute | |
CN108320625A (en) | Vibrational feedback system towards speech rehabilitation and device | |
CN109166629A (en) | The method and system of aphasia evaluation and rehabilitation auxiliary | |
CN114916921A (en) | Rapid speech cognition assessment method and device | |
Davis et al. | Repeating and remembering foreign language words: Implications for language teaching systems | |
Liu et al. | An interactive speech training system with virtual reality articulation for Mandarin-speaking hearing impaired children | |
CN107591163B (en) | Pronunciation detection method and device and voice category learning method and system | |
CN113593374A (en) | Multi-modal speech rehabilitation training system combining oral muscle training | |
Zhao et al. | Pronouncing rehabilitation of hearing-impaired children based on chinese 3d visual-speech database | |
CN112786151B (en) | Language function training system and method | |
Yamada et al. | Assistive speech technology for persons with speech impairments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||