CN114515138A - Language disorder assessment and correction system - Google Patents
- Publication number
- CN114515138A CN114515138A CN202210011831.3A CN202210011831A CN114515138A CN 114515138 A CN114515138 A CN 114515138A CN 202210011831 A CN202210011831 A CN 202210011831A CN 114515138 A CN114515138 A CN 114515138A
- Authority
- CN
- China
- Prior art keywords
- module
- audio
- user
- pronunciation
- audio signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/40—Detecting, measuring or recording for evaluating the nervous system
- A61B5/48—Other medical applications
- A61B5/4803—Speech analysis specially adapted for diagnostic purposes
Abstract
The invention relates to the technical field of language disorders and discloses a language disorder assessment and correction system comprising an input module, a recording module, a storage module, a processing module, a comparison module, a display module and a playing module. The input module receives external audio signals; the recording module records them; the storage module stores the audio signals; the processing module converts the data to be compared into a comparable type; the comparison module compares data of the same type to generate a comparison similarity; the display module shows the comparison similarity to the user; and the playing module lets the user click to play audio. The system addresses the problems of the prior art, in which the correction of language disorders lacks pertinence and rigour and the assessment systems are costly, difficult to operate and geographically limited.
Description
Technical Field
The invention relates to the technical field of language disorder, in particular to a language disorder evaluation and correction system.
Background
Language is the most important human communication tool: it is the means by which human beings think and express thought, the most basic information carrier of human society, and an important medium for the exchange of ideas; human interaction cannot be separated from language.
No product on the current market assesses the condition of patients with language disorders; such assessment relies on manual, face-to-face evaluation, in which professional skill varies from person to person and no unified standard exists, so the accuracy of the assessment result is affected. On the other hand, existing speech-guidance systems on the market require complex devices: a correction device, a projection device, earphones, a headset and other hardware must be prepared, and the correction device itself combines a recording unit, a playing unit, a display unit, a storage unit and other units, so the equipment is cumbersome and must be set up for each individual user. Such an auxiliary device is not only expensive but offers little advantage over one-to-one professional guidance; moreover, the correction process it supports lacks pertinence and rigour. These are the defects of the traditional assessment system: high cost, difficult operation and geographic limitation. A language disorder assessment and correction system is therefore needed to solve these problems.
Disclosure of Invention
The invention provides a language disorder assessment and correction system that dispenses with complex equipment and manual operations, fills the gap in digitalised assessment, allows the person being assessed to carry out language assessment and correction at any time and place, and solves the prior-art problems of a correction process that lacks pertinence and rigour and an assessment system that is costly, difficult to operate and geographically limited.
The invention provides the following technical scheme: a language disorder assessment and correction system comprises an input module, a recording module, a storage module, a processing module, a comparison module, a display module and a playing module. The input module receives external audio signals; the recording module records them; the storage module stores the audio signals; the processing module converts the data to be compared into a comparable type; the comparison module compares data of the same type to generate a comparison similarity; the display module shows results to the user; and the playing module lets the user click to play audio. The storage module holds standard pronunciation audio in a folder in advance; the database stores only the paths of the standard pronunciation audio, which are presented to the user one by one through a loop-traversal method. When the user clicks to play an item, the stored standard pronunciation audio is heard; after hearing it, the user reads after it. The recording module records the user's speech and stores it through the storage module as the external audio signal; at the same time the clicked-and-played audio is recorded as the internal audio signal. The recording module records the correct pronunciations of phonemes and syllables, including single-vowel finals, compound finals and consonant initials, which are stored through the storage module as video files with picture annotation and added pronunciation.
Preferably, the external audio signal and the internal audio signal are processed by fourier transform and a window function to obtain an energy spectrum of an audio segment, the energy spectrum of the audio segment is extracted into chroma feature vectors of each frame, and a set of the chroma feature vectors is formed, so as to obtain respective feature matrices.
Preferably, the feature matrices of the external audio signal and the internal audio signal are compared with each other by a DTW algorithm to determine the similarity between the two signals.
Preferably, the picture-annotation-and-pronunciation video file contains a graphic and text description of the speech correction method, accompanied by a spoken explanation of the correction operation and a demonstration audio-video recording of the actual practice.
Preferably, the storage module further stores speech-flow training materials, which are used for Mandarin speech-flow training.
Preferably, the phonemes comprise vowels and consonants, and the syllables comprise initials and finals.
Preferably, the finals and initials comprise six single-vowel finals, eighteen compound and nasal finals, and twenty-one consonant initials.
The invention has the following beneficial effects:
Aimed at the manual assessment of language disorders prevailing on the current market, the invention fills the gap in digitalised assessment.
The invention can be operated by the user on a mobile phone for online assessment and correction, eliminating the drawbacks of dedicated equipment, complex manual operation and fixed-place assessment, and enabling the person being assessed to carry out language assessment and correction at any time and place.
After the self-service assessment, an assessment report is produced according to the person's performance, and a targeted correction scheme can then be provided by the subsequent language correction system, so that correction proceeds more effectively.
The system can assess at any time during the correction process to determine the user's progress, and the correction process is recorded as data; the system can adjust the correction plan according to each assessment and the user's needs. The closed loop of assessment, correction, feedback and re-assessment monitors the user's speech production in real time, improving the pronunciation accuracy of people with language disorders.
The online assessment and correction mode greatly eases the assessment process for the person being assessed and the family: assessment can be done at home without travelling to a designated place, making appointments, queuing or consulting. When an epidemic is raging, this mode also improves safety and meets the requirements of national epidemic control.
Drawings
FIG. 1 is a diagram of data collection in accordance with the present invention;
FIG. 2 is a data processing diagram of the present invention;
FIG. 3 is a diagram of syllables of the present invention;
FIG. 4 is an exemplary diagram of a phrase of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Referring to fig. 1 to 4, a language disorder evaluating and correcting system includes an input module, a recording module, a storage module, a processing module, a comparison module, a display module, and a playing module.
The input module receives external audio signals; the recording module records them; the storage module stores the audio signals; the processing module converts the data to be compared into a comparable type; the comparison module compares data of the same type to generate a comparison similarity; the display module shows results to the user; and the playing module lets the user click to play audio. An external audio signal entered through the input module is recorded by the recording module and stored; the standard pronunciation audio held by the storage module is passed to the processing module, where the external and internal audio signals are processed and converted into data structures of the same type; the comparison module then compares them to form a comparison similarity, which is recorded and stored, and a test report is shown on the display module. The storage module holds standard pronunciation audio in a folder in advance; the database stores only the paths of the standard pronunciation audio, which are presented to the user one by one through a loop-traversal method. When the user clicks to play an item, the stored standard pronunciation audio is heard; after hearing it, the user reads after it. The recording module records the user's speech and stores it through the storage module as the external audio signal; at the same time the clicked-and-played audio is recorded as the internal audio signal. The recording module records the correct pronunciations of phonemes and syllables, which are stored through the storage module as video files with picture annotation and added pronunciation.
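The description above stores only file paths in the database and presents them to the user by loop traversal. As an illustration only (the patent names no implementation), a minimal Python sketch of this indexing-and-listing step might look as follows; the folder name, table layout and `.wav` format are assumptions:

```python
import sqlite3
from pathlib import Path

# Hypothetical folder holding the pre-recorded standard-pronunciation clips.
AUDIO_DIR = Path("standard_pronunciations")

def index_audio_paths(db_path="audio.db"):
    """Store only the file paths (not the audio itself) in the database,
    as the description specifies."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS audio (label TEXT, path TEXT)")
    for wav in sorted(AUDIO_DIR.glob("*.wav")):
        conn.execute("INSERT INTO audio VALUES (?, ?)", (wav.stem, str(wav)))
    conn.commit()
    return conn

def list_for_playback(conn):
    """Loop over the stored paths so the UI can present them one by one."""
    return [(label, path) for label, path in
            conn.execute("SELECT label, path FROM audio ORDER BY label")]
```

A UI layer would bind each returned `(label, path)` pair to a play button; only the path is read from the database when the user clicks.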
The external audio signal and the internal audio signal are processed by a Fourier transform and a window function to obtain the energy spectrum of an audio segment; the energy spectrum is then reduced to a chroma feature vector for each frame, and the set of chroma feature vectors forms the feature matrix of each signal.
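The feature-extraction step just described (windowed Fourier transform, per-frame energy spectrum, chroma vectors, feature matrix) can be sketched in Python/NumPy as below. The frame size, hop length, Hanning window and 12-bin pitch-class mapping are illustrative assumptions; the patent fixes none of these parameters:

```python
import numpy as np

def chroma_features(signal, sr=16000, frame=2048, hop=512):
    """Windowed FFT -> per-frame energy spectrum -> 12-bin chroma vectors.
    Returns a feature matrix of shape (n_frames, 12)."""
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    chroma = np.zeros((n_frames, 12))
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    # Map each FFT bin to one of 12 pitch classes (A4 = 440 Hz reference).
    valid = freqs > 0
    pitch_class = np.zeros(len(freqs), dtype=int)
    pitch_class[valid] = (
        np.round(12 * np.log2(freqs[valid] / 440.0)).astype(int) % 12)
    for i in range(n_frames):
        seg = signal[i * hop:i * hop + frame] * window
        energy = np.abs(np.fft.rfft(seg)) ** 2      # energy spectrum
        for pc in range(12):
            chroma[i, pc] = energy[valid & (pitch_class == pc)].sum()
    # Normalise each frame so the comparison is loudness-independent.
    norms = np.linalg.norm(chroma, axis=1, keepdims=True)
    return chroma / np.maximum(norms, 1e-12)
```

Each row of the returned matrix is one frame's chroma vector; stacking the rows gives the "feature matrix" the description compares with DTW.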
The feature matrices of the external audio signal and the internal audio signal are compared with each other through a DTW algorithm to judge the similarity of the two signals.
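The patent names the DTW algorithm for comparing the two feature matrices but gives no formulation. Below is a minimal dynamic-time-warping sketch under assumed choices (cosine frame distance, length-normalised alignment cost); both choices are illustrative, not taken from the patent:

```python
import numpy as np

def dtw_distance(A, B):
    """Dynamic time warping between two feature matrices (frames x dims).
    Lower cost means the two pronunciations align more closely."""
    def cost(a, b):
        # Cosine distance between two frame vectors.
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return 1.0 - (a @ b) / denom if denom else 1.0
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost(A[i - 1], B[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)   # length-normalised alignment cost

def similarity(A, B):
    """Map DTW cost to a 0..1 similarity score (1 = identical)."""
    return 1.0 - min(dtw_distance(A, B), 1.0)
```

DTW is chosen here because the user's recording and the standard audio generally differ in duration; the warping path aligns frames before the costs are summed.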
The picture-annotation-and-pronunciation video file contains a graphic and text description of the speech correction method, accompanied by a spoken explanation of the correction operation and a demonstration audio-video recording of the actual practice.
The storage module also stores speech-flow training materials, which are used for Mandarin speech-flow training.
The phonemes comprise vowels and consonants, and the syllables comprise initials and finals.
The finals and initials comprise six single-vowel finals, eighteen compound and nasal finals, and twenty-one consonant initials.
The working principle is as follows: the standard pronunciation audio is stored in a folder in advance, and the database stores only its paths, which are presented to the user one by one through a loop-traversal method. When the user clicks to play an item, the stored standard pronunciation audio is heard; the user then reads after it. The system records the user's speech and stores it as the external audio signal, while the clicked-and-played audio is recorded as the internal audio signal. Both signals are processed by a Fourier transform and a window function to obtain the energy spectrum of an audio segment; the energy spectrum is reduced to a chroma feature vector per frame, and the set of chroma feature vectors forms each signal's feature matrix. The DTW algorithm then compares the two feature matrices to judge the similarity of the signals. After the DTW comparison, with the external audio signal as the reference, the similarity is recorded and the resulting information is stored; the accuracy of the user's pronunciation is judged by whether the similarity reaches a given value, and finally an assessment report is formed and displayed to the user.
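The working principle judges pronunciation accuracy by whether the similarity "reaches a given value" and then forms a report. A small illustrative sketch of that thresholding-and-report step follows; the 0.8 cut-off and the report fields are assumptions, not specified by the patent:

```python
def assessment_report(scores, threshold=0.8):
    """Judge each recorded item's similarity score against a threshold
    and build a simple per-item assessment report."""
    report = {item: {"similarity": round(s, 3), "correct": s >= threshold}
              for item, s in scores.items()}
    # Overall accuracy across all assessed items (computed before the
    # summary key is added, so only item entries are counted).
    report["accuracy"] = sum(r["correct"] for r in report.values()) / len(scores)
    return report
```

In the closed loop described above, items marked not correct would be fed to the correction stage and re-assessed until they pass the threshold.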
The language disorder assessment content comprises:
1. Single-vowel finals: a, o, e, i, u, ü
2. Compound finals: ai, ei, ui, ao, ou, iu, ie, üe, er
Front-nasal finals: an, en, in, un, ün
Back-nasal finals: ang, eng, ing, ong
3. Consonant initials: bilabials b, p, m; labiodental f; dental sibilants z, c, s; alveolars d, t, n, l; retroflexes zh, ch, sh, r; palatals j, q, x; velars g, k, h.
Note: the correct pronunciations of the 6 single-vowel finals, the 18 compound and nasal finals and the 21 consonant initials are recorded and stored as video files with picture annotation and pronunciation, which the user can consult while reading after the model.
The language disorder correction contents include:
Step one, dysarthria correction: phonemes (vowels and consonants) are corrected first, then syllables (composed of initials and finals); the assessment system re-evaluates syllable pronunciation until the pronunciation meets the standard.
1. Correction of finals and initials: for the 6 single-vowel finals, the 18 compound and nasal finals and the 21 consonant initials, a graphic and text description of the specific correction method for each final and initial is recorded; with the accompanying spoken explanation of the correction operation and the demonstration audio-video, the user learns to correct the wrong pronunciation.
2. Correction of syllables: Chinese character pronunciations are formed from the 21 consonant initials combined with the finals; for each set of identically pronounced syllables, one pronunciation is selected as the correction object. A graphic and text description of the syllable correction method is recorded, and with the accompanying spoken explanation and demonstration audio-video the user learns to correct the wrong pronunciation.
3. The corrected pronunciation is continuously compared with the standard pronunciation and re-evaluated by the assessment system until the pronunciation is completely correct.
Step two, Mandarin speech-flow training: phrases, then short sentences, then short passages and conversations, progressing from simple to difficult.
Mandarin speech-flow training: the correction system plays the speech-flow training materials for the user to read after; phrase training is first carried out on the corrected syllables to improve the user's fluency, and once these are mastered, short sentences and short passages are trained until normal conversation is fluent and standard.
Claims (7)
1. A language disorder assessment and correction system, characterized by comprising an input module, a recording module, a storage module, a processing module, a comparison module, a display module and a playing module, wherein the input module is used for inputting external audio signals, the recording module is used for recording the external audio signals, the storage module is used for storing the audio signals, the processing module is used for converting the data to be compared into a comparable type, the comparison module is used for comparing data of the same type to generate a comparison similarity, the display module is used for displaying results to the user, and the playing module is used for the user to click and play audio; the storage module stores standard pronunciation audio in a folder in advance, and a database stores only the paths of the standard pronunciation audio, which are presented to the user one by one through a loop-traversal method; when the user clicks to play the corresponding audio, the stored standard pronunciation audio is heard, after which the user reads after it; the recording module records the user's speech and stores it through the storage module as the external audio signal, while the clicked-and-played audio is recorded as the internal audio signal; the recording module records the correct pronunciations of phonemes and syllables, which are stored through the storage module as video files with picture annotation and added pronunciation.
2. The system of claim 1, wherein the external audio signal and the internal audio signal are processed by a Fourier transform and a window function to obtain the energy spectrum of an audio segment, the energy spectrum is reduced to a chroma feature vector for each frame, and the set of chroma feature vectors forms the feature matrix of each signal.
3. The system of claim 2, wherein the feature matrices of the external audio signal and the internal audio signal are compared by a DTW algorithm to judge the similarity of the two signals.
4. The system of claim 1, wherein the picture-annotation-and-pronunciation video file contains a graphic and text description of the speech correction method, accompanied by a spoken explanation of the correction operation and a demonstration audio-video recording.
5. The system of claim 1, wherein the storage module further stores speech-flow training materials, which are used for Mandarin speech-flow training.
6. The system of claim 1, wherein the phonemes comprise vowels and consonants, and the syllables comprise initials and finals.
7. The system of claim 6, wherein the finals and initials comprise six single-vowel finals, eighteen compound and nasal finals, and twenty-one consonant initials.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210011831.3A CN114515138A (en) | 2022-01-06 | 2022-01-06 | Language disorder assessment and correction system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114515138A (en) | 2022-05-20
Family
ID=81595938
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210011831.3A Pending CN114515138A (en) | 2022-01-06 | 2022-01-06 | Language disorder assessment and correction system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114515138A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070112568A1 (en) * | 2003-07-28 | 2007-05-17 | Tim Fingscheidt | Method for speech recognition and communication device |
JP2008040344A (en) * | 2006-08-09 | 2008-02-21 | Yamaha Corp | Speech evaluating device |
CN106531182A (en) * | 2016-12-16 | 2017-03-22 | 上海斐讯数据通信技术有限公司 | Language learning system |
CN108520650A (en) * | 2018-03-27 | 2018-09-11 | 深圳市神经科学研究院 | A kind of intelligent language training system and method |
CN108922563A (en) * | 2018-06-17 | 2018-11-30 | 海南大学 | Based on the visual verbal learning antidote of deviation organ morphology behavior |
CN110223688A (en) * | 2019-06-08 | 2019-09-10 | 安徽中医药大学 | A kind of self-evaluating system of compressed sensing based hepatolenticular degeneration disfluency |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Tran et al. | Improvement to a NAM-captured whisper-to-speech system | |
US6853971B2 (en) | Two-way speech recognition and dialect system | |
US7280964B2 (en) | Method of recognizing spoken language with recognition of language color | |
US20090037171A1 (en) | Real-time voice transcription system | |
JP6705956B1 (en) | Education support system, method and program | |
WO2004063902A2 (en) | Speech training method with color instruction | |
JPH075807A (en) | Device for training conversation based on synthesis | |
TW201214413A (en) | Modification of speech quality in conversations over voice channels | |
CN106328146A (en) | Video subtitle generation method and apparatus | |
Rose | Crosslinguistic corpus of hesitation phenomena: a corpus for investigating first and second language speech performance. | |
KR20150076128A (en) | System and method on education supporting of pronunciation ussing 3 dimensional multimedia | |
JP2003186379A (en) | Program for voice visualization processing, program for voice visualization figure display and for voice and motion image reproduction processing, program for training result display, voice-speech training apparatus and computer system | |
Stemberger et al. | Phonetic transcription for speech-language pathology in the 21st century | |
JP2013088552A (en) | Pronunciation training device | |
Nakai et al. | Viewing speech in action: speech articulation videos in the public domain that demonstrate the sounds of the International Phonetic Alphabet (IPA) | |
JP2003228279A (en) | Language learning apparatus using voice recognition, language learning method and storage medium for the same | |
KR20140087956A (en) | Apparatus and method for learning phonics by using native speaker's pronunciation data and word and sentence and image data | |
JP2844817B2 (en) | Speech synthesis method for utterance practice | |
CN114515138A (en) | Language disorder assessment and correction system | |
Johnson | An integrated approach for teaching speech spectrogram analysis to engineering students | |
JPH0756494A (en) | Pronunciation training device | |
JP2873830B2 (en) | Automatic conversation practice device | |
Deshpande et al. | Speech Coach: A framework to evaluate and improve speech delivery | |
Lavagetto | Multimedia Telephone for Hearing-Impaired People | |
CN112951208B (en) | Method and device for speech recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||