CN112885168B

CN112885168B - Immersive speech feedback training system based on AI

Info

Publication number: CN112885168B
Application number: CN202110081356.2A
Authority: CN
Inventors: 范虹; 刘蓝冰; 尉泽民; 严晓波; 茹文亚
Original assignee: Shaoxing Peoples Hospital
Current assignee: Shaoxing Peoples Hospital
Priority date: 2021-01-21
Filing date: 2021-01-21
Publication date: 2022-09-09
Anticipated expiration: 2041-01-21
Also published as: CN112885168A

Abstract

The invention discloses an AI-based immersive speech feedback training system, which comprises a capability rating module, a grading learning module, a standard library, a movie playing module, a voice recognition module, an image acquisition module, a data receiving module, a data processing module, a learning scoring module and a scoring display module, wherein the learning module is used for learning and scoring; the ability rating module is used for rating the voice ability of the patient with language disorder and generating voice rating information, the voice rating information is sent to the grading learning module, the grading learning module receives the voice rating information and calls learning movies at corresponding levels from a standard library, and the standard library stores training movie information at different levels; the movie playing module is used for receiving movie information of a corresponding level sent by the hierarchical learning module control standard library, and the movie playing module starts playing after receiving the movie information of the corresponding level. The invention can better help and promote the rehabilitation training of the person with language disorder.

Description

Immersive speech feedback training system based on AI

Technical Field

The invention relates to the field of language training, in particular to an immersive speech feedback training system based on AI.

Background

Speech and language dysgenesis refers to a disorder of normal language acquisition patterns at an early stage of development, manifested as delays and abnormalities in pronunciation, language understanding, or development of language expression abilities that affect learning, occupation, and social functions. The situations are not caused by the abnormality of nerve or speech mechanisms, sensory impairment, mental development retardation or surrounding environmental factors, and a speech feedback training system is used for assisting rehabilitation training in the rehabilitation process of language disorder.

The existing speech feedback training system has a single function in the using process, so that the training effect is poor, the using requirement of a user cannot be met, certain influence is brought to the use of the speech feedback training system, and therefore, the immersive speech feedback training system based on the AI is provided.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: how to solve current speech feedback training system, in the use, the function singleness leads to its training effect relatively poor, can not satisfy user's user demand, has brought the problem of certain influence for speech feedback training system's use, provides an immersive speech feedback training system based on AI.

The invention solves the technical problems through the following technical scheme that the system comprises a capability rating module, a grading learning module, a standard library, a film and television playing module, a sound identification module, an image acquisition module, a data receiving module, a data processing module, a learning scoring module and a scoring display module;

the ability rating module is used for rating the voice ability of the patient with language disorder and generating voice rating information, the voice rating information is sent to the grading learning module, the grading learning module receives the voice rating information and calls learning movies at corresponding levels from a standard library, and the standard library stores training movie information at different levels;

the video playing module is used for receiving video information of a corresponding level sent by the hierarchical learning module control standard library, the video playing module starts playing after receiving the video information of the corresponding level, the sound identification module collects voice information sent by a language barrier patient at the moment, and meanwhile, the image collection module operates to collect mouth action information when the language barrier patient sends the voice information;

the voice information sent by the language disorder patient and the mouth action information when the voice information is sent by the language disorder patient are both sent to the data receiving module, and the data receiving module processes the received voice information sent by the language disorder patient and the mouth action information when the voice information is sent by the language disorder patient to generate voice comparison information and action comparison information;

the voice comparison information and the action comparison information are both sent to a learning scoring module, the learning scoring module processes the voice comparison information and the action comparison information to generate training scoring information, the training scoring information is sent to a scoring display module, and the scoring display module is used for displaying training scoring.

Preferably, the specific process of the ability rating module for rating the ability of the language-handicapped patient is as follows:

the method comprises the following steps: the capability rating module is preset with different levels of text content information, including primary text, middle level text information, high level text information and normal text information, and the difficulty level is as follows: the primary characters are less than the middle-level character information and more than the high-level character information and less than the normal character information;

step two: sequentially selecting at least x groups of character information from the primary character information, the middle-level character information, the high-level character information and the normal character information from normal to high level, wherein x is more than or equal to 5;

step three: displaying x groups of character information selected from the primary characters, the middle-level character information, the high-level character information and the normal character information, sequentially reading the x groups of character information in the primary characters, the x groups of character information in the middle-level character information, the x groups of character information in the high-level character information and the x groups of character information in the normal character information by a patient with language disorder, and respectively marking the reading of the character information from low to high as K1, K2, K3 and K4 according to the rank sequence;

step four: extracting preset pronunciation information of x groups of character information selected from the primary characters, the middle-level character information, the high-level character information and the normal character information, and respectively marking the preset pronunciation information as M1, M2, M3 and M4 according to the rank order;

step five: carrying out similarity matching on K1 and M1 to obtain similarity Km1, carrying out similarity matching on K2 and M2 to obtain similarity Km2, carrying out similarity matching on K3 and M3 to obtain similarity Km3, and carrying out similarity matching on K4 and M4 to obtain similarity Km 4;

step five: when any one of the similarity of Km1, Km2, Km3 and Km4 is larger than a preset value, the level is judged to belong to, and when two or more than two similarities are larger than the preset value, the similarity with high level is taken as a final judgment result.

Preferably, the video playing module plays the audio information synchronously while playing the video.

Preferably, training movie and television information of different grades stored in the standard library comprises mouth shape coefficient information corresponding to character information, the data processing module processes the acquired mouth motion information into real-time mouth shape coefficient information through the image acquisition module, and the real-time mouth shape coefficient information is compared with the pre-stored mouth shape coefficient information to obtain motion comparison information.

Preferably, the shape factor comprises a first shape factor and a second shape factor, and the specific process of the shape factor is as follows:

the method comprises the following steps: marking the key point of the upper lip as a point A1, marking two corner points of the upper lip as a point A2 and a point A3 respectively, and acquiring an arc segment L1 through the set point A1, the point A2 and the point A3;

step two: marking the key point of the lower lip as a point B1, marking two corner points of the lower lip as a point B2 and a point B3 respectively, and acquiring an arc line segment L2 through a set point B1, a set point B2 and a set point B3;

step three: connecting the point A1 with the point A2 to obtain a line segment L3, measuring the radians of the arc line segment L1 and the arc line segment L2, and measuring the length of the line segment L3;

step four: by the formula (L1+ L2)/(L1-L2) ═ L _{Ratio of} ，L _{Ratio of} The length of L3 is the second mouth shape factor;

the specific process of comparing the real-time mouth shape coefficient information with the pre-stored mouth shape coefficient information by the data processing module is as follows:

s1: extracting a real-time first mouth shape coefficient, a real-time second mouth shape coefficient, a preset first mouth shape coefficient and a preset second mouth shape coefficient, marking the real-time first mouth shape coefficient as P1, marking the real-time second mouth shape coefficient as P2, marking the preset first mouth shape coefficient as Q1 and marking the preset second mouth shape coefficient as Q2;

s2: the difference Pq1 between the real-time first shape factor P1 and the preset first shape factor labeled Q1 is calculated _{Difference (D)} Then, the difference between the real-time second shape coefficient P2 and the preset second shape coefficient Q2 is calculatedValue Pq2 _{Difference (D)} ；

S3: calculate Pq1 _{Difference (D)} Absolute value of (1) and Pq2 _{Difference (D)} Of absolute value of (Pq) _And obtaining the action comparison information Pq _And 。

preferably, the specific processing procedure of the data processing module for processing the voice comparison information is as follows:

SS 1: extracting standard voice information of the film and television information in the pre-storage library, performing voiceprint processing on the standard voice information to obtain standard voiceprint, and marking the standard voiceprint as F _{Sign board} ；

SS 2: performing culture processing on voice information of preset character contents read by language-handicapped patients acquired by the voice recognition module to obtain real-time voiceprints, and marking the real-time voiceprints as F _{Fruit of Chinese wolfberry} I.e. voice comparison information F _{Fruit of Chinese wolfberry} ；

SS 3: the obtained real-time voiceprint F _{Fruit of Chinese wolfberry} And standard voiceprint F _{Sign board} Comparing the similarity to obtain a similarity F _{Ratio of} 。

Preferably, the specific process of the learning scoring module for processing the voice comparison information and the motion comparison information to generate the training scoring information is as follows:

s01: extracting the obtained voice comparison information and action comparison information, and respectively marking the obtained voice comparison information and action comparison information as M and N;

s02: in order to highlight the importance of voice comparison, a correction value U1 is given to the voice comparison information, a correction value U2 is given to the action comparison information, U1 is greater than U2, and U1+ U2 is 1;

s03: by the formula M U1+ N U2 ═ Mn _And obtaining training score information Mn _And 。

preferably, the scoring display module ranks all the received training scoring information from high to low, and displays the personnel information corresponding to the first three maximum training scoring information after being amplified by a preset font.

Compared with the prior art, the invention has the following advantages: the immersive speech feedback training system based on AI can better evaluate the level of language disorder of a patient with language disorder before performing speech training, through the arrangement, the system can better provide speech training contents for the patient with language disorder, the arrangement is easy to achieve, the use experience of the system can be effectively improved, the frustration caused by the difficulty of the training contents to the patient with language disorder can be effectively avoided, the mouth shape condition of pronunciation can be simultaneously checked for the patient with language disorder through synchronous playing of movie contents and sound, pronunciation is performed through observing mouth shape simulation, the rehabilitation training progress of the patient with language disorder is accelerated, meanwhile, through double analysis of mouth shape and pronunciation, the rehabilitation training progress of the patient with language disorder can be more accurately evaluated, different arrangements meet different use requirements of the patient with language disorder, the system is more worthy of popularization and application.

Drawings

FIG. 1 is a system block diagram of the present invention.

Detailed Description

The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.

As shown in fig. 1, the present embodiment provides a technical solution: an immersion type speech feedback training system based on AI comprises a capability rating module, a grading learning module, a standard library, a film and television playing module, a voice identification module, an image acquisition module, a data receiving module, a data processing module, a learning scoring module and a scoring display module;

the movie playing module is used for receiving movie information of a corresponding level sent by the hierarchical learning module control standard library, the movie playing module starts playing after receiving the movie information of the corresponding level, the movie playing module performs amplification close-up processing on mouths of characters in images when playing the movie information, so that mouth shape imitation of patients with language disorder is facilitated, the sound identification module collects voice information sent by the patients with language disorder at the moment, and meanwhile, the image collection module operates to collect mouth action information when the patients with language disorder send the voice information;

The specific process of the capacity rating module for rating the capacity of the language barrier patient is as follows:

the method comprises the following steps: the capability rating module is preset with different levels of text content information, including primary text, middle level text information, high level text information and normal text information, and the difficulty level is as follows: the primary characters < the middle-level character information < the high-level character information < the normal character information;

step two: sequentially selecting at least x groups of character information from the primary character information, the middle-level character information, the high-level character information and the normal character information from normal to high, wherein x is more than or equal to 5;

step three: displaying x groups of character information selected from the primary characters, the middle-level character information, the high-level character information and the normal character information, sequentially reading the x groups of character information in the primary characters, the x groups of character information in the middle-level character information, the x groups of character information in the high-level character information and the x groups of character information in the normal character information by a patient with language disorder, and marking the reading voices of the patients with language disorder as K1, K2, K3 and K4 from low to high according to the level sequence;

step four: extracting preset pronunciation information of x groups of character information selected from primary characters, middle-level character information, high-level character information and normal character information, and respectively marking the preset pronunciation information as M1, M2, M3 and M4 according to the rank order;

step five: when any one of the similarity of Km1, Km2, Km3 and Km4 is greater than a preset value, the level is judged to belong to, and when two or more than two similarities are greater than the preset value, the similarity with high level is taken as a final judgment result;

before carrying out the pronunciation training, the better rank assessment of language barrier carries out the language barrier to language barrier patient, through this kind of setting, lets the better for language barrier patient of this system provides the pronunciation training content, from easy to difficult setting, can effectual promotion this system use experience, has effectively avoided the training content too difficult to the frustration that causes for language barrier patient.

The movie & TV broadcast module is when carrying out the image broadcast, and synchronous broadcast audio information has effectually avoided the pronunciation of the language disorder patient that the sound painting desynchronized leads to make mistakes to through movie & TV content and sound synchronous broadcast, let can look over the mouth shape situation of pronunciation for the language disorder patient simultaneously, imitate through observing the mouth shape and pronounce, accelerated language disorder patient's rehabilitation training progress.

The standard library stores training video information of different grades, wherein the training video information comprises mouth shape coefficient information corresponding to character information, the data processing module processes the acquired mouth motion information into real-time mouth shape coefficient information through the image acquisition module, the real-time mouth shape coefficient information is compared with the pre-stored mouth shape coefficient information to acquire motion comparison information, and the rehabilitation training state of the patient with language disorder can be better evaluated by setting the mouth shape coefficient.

The shape coefficient comprises a first shape coefficient and a second shape coefficient, and the specific process of the shape coefficient is as follows:

the judgment accuracy is further improved by setting the two mouth shape coefficients;

s2: the difference Pq1 between the real-time first shape factor P1 and the preset first shape factor labeled Q1 is calculated _{Difference between} Then, the difference Pq2 between the real-time second shape coefficient P2 and the preset second shape coefficient Q2 is calculated _{Difference (D)} ；

S3: calculate Pq1 _{Difference (D)} Absolute value of (1) and Pq2 _{Difference (D)} Of absolute value of (Pq) _And obtaining the action comparison information Pq _And ；

through the setting, the acquisition of action comparison information can be better carried out.

The specific processing process of the data processing module for processing the voice comparison information is as follows:

SS 2: the voice information of the preset character content read by the language barrier patient and acquired by the voice identification module is subjected to culture-filling processing to obtain real-time voiceprints, and the real-time voiceprints are marked as F _{Fruit of Chinese wolfberry} I.e. voice comparison information F _{Fruit of Chinese wolfberry} ；

SS 3: obtaining the real-time voiceprint F _{Fruit of Chinese wolfberry} And standard voiceprint F _Sign Comparing the similarity to obtain a similarity F _{Ratio of} 。

The specific process of processing the voice comparison information and the action comparison information by the learning scoring module to generate training scoring information is as follows:

s03: by the formula M U1+ N U2 ═ Mn _And obtaining training score information Mn _And ；

through the double analysis to the shape of mouth and pronunciation, language disorder patient's that can be more accurate rehabilitation training progress aassessment, and the different user demands of language disorder patient have been satisfied in the setting of multiple difference, let this system be worth using widely more.

The scoring display module is used for ranking all the received training scoring information from high to low, amplifying the personnel information corresponding to the first three persons with the maximum training scoring information by using a preset font and then displaying the personnel information;

the method has the advantages that the language disorder patients can know the recovery states of other patients through the arrangement of the ranking information, and the rehabilitation training confidence of the language disorder patients can be stimulated, so that the rehabilitation speed of the language disorder patients is accelerated.

In conclusion, when the system is used, the ability rating module is used for rating the voice ability of the patient with language disorder and generating voice rating information, the voice rating information is sent to the grading learning module, the grading learning module receives the voice rating information and calls learning movies at corresponding levels from the standard library, the standard library stores training movie information at different levels, the system can better perform level evaluation on the language disorder of the patient with language disorder before voice training, through the setting, the system can better provide voice training contents for the patient with language disorder, the setting is easy to go wrong, the use experience of the system can be effectively improved, the frustration caused by the fact that the training contents are too difficult to the patient with language disorder can be effectively avoided, the movie playing module is used for receiving the movie information of corresponding levels sent by the grading learning module control standard library, the video playing module starts playing after receiving the video information of the corresponding level, and synchronously plays through video content and sound to enable a patient with language disorder to simultaneously check the mouth shape condition of pronunciation, pronounces through observing mouth shape simulation, accelerates the rehabilitation training progress of the patient with language disorder, the sound identification module collects the voice information sent by the patient with language disorder at the moment, the image collection module operates to collect the mouth action information when the patient with language disorder sends the voice information, the voice information sent by the patient with language disorder and the mouth action information when the patient with language disorder sends the voice information are both sent to the data receiving module, the data receiving module processes the received voice information sent by the patient with language disorder and the mouth action information when the patient with language disorder sends the voice information, and generates voice comparison information and action comparison information, the voice comparison information and the action comparison information are all sent to the learning scoring module, the learning scoring module processes the voice comparison information and the action comparison information to generate training scoring information, and meanwhile, through double analysis of mouth shape and pronunciation, the rehabilitation training progress of the language disorder patient can be more accurate to evaluate, different use requirements of the language disorder patient are met through setting of multiple differences, the system is enabled to be more worthy of popularization and use, the training scoring information is sent to the scoring display module, and the scoring display module is used for conducting training scoring display.

Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or to implicitly indicate the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of the feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless explicitly specified otherwise.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. An immersion type speech feedback training system based on AI is characterized by comprising a capability rating module, a grading learning module, a standard library, a movie playing module, a voice recognition module, an image acquisition module, a data receiving module, a data processing module, a learning scoring module and a scoring display module;

the voice comparison information and the action comparison information are both sent to a learning scoring module, the learning scoring module processes the voice comparison information and the action comparison information to generate training scoring information, the training scoring information is sent to a scoring display module, and the scoring display module is used for displaying training scoring;

the standard library stores training video information of different grades, including mouth shape coefficient information corresponding to character information, the data processing module processes the acquired mouth motion information into real-time mouth shape coefficient information by the image acquisition module, and compares the real-time mouth shape coefficient information with the pre-stored mouth shape coefficient information to acquire motion comparison information;

the mouth shape coefficient comprises a first mouth shape coefficient and a second mouth shape coefficient, and the specific process of the mouth shape coefficient is as follows:

SS 2: performing voiceprint processing on the voice information of the preset text content read by the language barrier patient acquired by the voice identification module to obtain a real-time voiceprint, and marking the real-time voiceprint as F _{Fruit of Chinese wolfberry} I.e. voice comparison information F _{Fruit of Chinese wolfberry} ；

SS 3: obtaining the real-time voiceprint F _{Fruit of Chinese wolfberry} And standard voiceprint F _Sign Comparing the similarity to obtain the similarity F _Than ；

The specific process of the learning scoring module for processing the voice comparison information and the action comparison information to generate training scoring information is as follows:

s03: by the formula M U1+ N U2 Mn _And obtaining the training score information Mn _And 。

2. the AI-based immersive verbal feedback training system of claim 1, wherein: the specific process of the capacity rating module for rating the capacity of the language barrier patient is as follows:

the method comprises the following steps: the capability rating module presets the text content information of different grades, including primary text, middle-grade text information, high-grade text information and normal text information, and the difficulty is as follows: the primary characters are less than the middle-level character information and more than the high-level character information and less than the normal character information;

step three: displaying x groups of character information selected from the primary characters, the middle-level character information, the high-level character information and the normal character information, sequentially reading the x groups of character information in the primary characters, the x groups of character information in the middle-level character information, the x groups of character information in the high-level character information and the x groups of character information in the normal character information by a patient with language disorder, and respectively marking the reading voices of the patients with language disorder as K1, K2, K3 and K4 from low to high according to the level sequence;

step five: when any one of the similarity degrees of Km1, Km2, Km3 and Km4 is greater than a preset value, the level is judged to belong to, and when two or more than two similarity degrees are greater than the preset value, the similarity degree with high level is taken as a final judgment result.

3. The AI-based immersive verbal feedback training system of claim 1, wherein: the video playing module plays the video and simultaneously plays the audio information synchronously.

4. The AI-based immersive verbal feedback training system of claim 1, wherein: the scoring display module is used for ranking all the received training scoring information from high to low, amplifying the personnel information corresponding to the first three maximum training scoring information by using a preset font and then displaying the personnel information.