CN112885168B - Immersive speech feedback training system based on AI - Google Patents


Info

Publication number
CN112885168B
Authority
CN
China
Prior art keywords
information
module
voice
shape coefficient
mouth shape
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110081356.2A
Other languages
Chinese (zh)
Other versions
CN112885168A (en)
Inventor
范虹
刘蓝冰
尉泽民
严晓波
茹文亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaoxing Peoples Hospital
Original Assignee
Shaoxing Peoples Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaoxing Peoples Hospital
Priority to CN202110081356.2A
Publication of CN112885168A
Application granted
Publication of CN112885168B
Legal status: Active


Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00: Electrically-operated educational appliances
    • G09B5/06: Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G09B5/065: Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00: Teaching, or communicating with, the blind, deaf or mute
    • G09B21/06: Devices for teaching lip-reading

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses an AI-based immersive speech feedback training system comprising an ability rating module, a grading learning module, a standard library, a movie playing module, a voice recognition module, an image acquisition module, a data receiving module, a data processing module, a learning scoring module and a scoring display module. The ability rating module rates the voice ability of a patient with a language disorder and generates voice rating information, which is sent to the grading learning module; the grading learning module receives the voice rating information and retrieves learning movies of the corresponding level from the standard library, which stores training movie information of different levels. The movie playing module receives the movie information of the corresponding level retrieved from the standard library under control of the grading learning module, and starts playing after receiving it. The invention can better help and promote the rehabilitation training of people with language disorders.

Description

Immersive speech feedback training system based on AI
Technical Field
The invention relates to the field of language training, in particular to an immersive speech feedback training system based on AI.
Background
Speech and language developmental disorders are disturbances of the normal pattern of language acquisition in early development, manifested as delays and abnormalities in pronunciation, language comprehension, or the development of language expression that affect learning, occupation, and social function. These conditions are not caused by abnormalities of the nervous system or speech mechanism, sensory impairment, intellectual disability, or environmental factors. During the rehabilitation of language disorders, a speech feedback training system is used to assist the training.
Existing speech feedback training systems perform only a single function, so their training effect is poor and they cannot meet users' needs, which limits their usefulness. The AI-based immersive speech feedback training system is therefore proposed.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: the existing speech feedback training system has a single function, which leads to a poor training effect, fails to meet users' needs, and adversely affects the use of such systems; the invention provides an AI-based immersive speech feedback training system to address this.
The invention solves this technical problem through the following technical scheme: the system comprises an ability rating module, a grading learning module, a standard library, a movie playing module, a voice recognition module, an image acquisition module, a data receiving module, a data processing module, a learning scoring module and a scoring display module;
the ability rating module is used for rating the voice ability of the patient with language disorder and generating voice rating information, the voice rating information is sent to the grading learning module, the grading learning module receives the voice rating information and calls learning movies at corresponding levels from a standard library, and the standard library stores training movie information at different levels;
the movie playing module is used for receiving movie information of the corresponding level, which the grading learning module retrieves from the standard library; the movie playing module starts playing after receiving the movie information of the corresponding level, at which point the voice recognition module collects the voice information produced by the patient with a language disorder, while the image acquisition module collects the patient's mouth motion information as the voice is produced;
both the voice information produced by the patient and the mouth motion information recorded while it was produced are sent to the data receiving module, which processes them to generate voice comparison information and motion comparison information;
the voice comparison information and the motion comparison information are both sent to the learning scoring module, which processes them to generate training score information; the training score information is sent to the scoring display module, which displays the training score.
Preferably, the specific process by which the ability rating module rates the ability of the patient with a language disorder is as follows:
Step one: the ability rating module is preset with text content information of different levels, comprising primary text information, middle-level text information, high-level text information and normal text information, with difficulty ordered: primary text < middle-level text < high-level text < normal text;
Step two: at least x groups of text information are selected in turn from the primary, middle-level, high-level and normal text information, in order of increasing difficulty, where x ≥ 5;
Step three: the selected x groups of text information from each level are displayed, and the patient reads aloud, in turn, the x groups in the primary, middle-level, high-level and normal text information; the patient's readings are marked K1, K2, K3 and K4 in order of level from low to high;
Step four: the preset pronunciation information of the selected x groups of text information from the primary, middle-level, high-level and normal text information is extracted and marked M1, M2, M3 and M4 in the same order of level;
Step five: K1 is matched against M1 for similarity to obtain the similarity Km1, K2 against M2 for Km2, K3 against M3 for Km3, and K4 against M4 for Km4;
Step six: when any one of Km1, Km2, Km3 and Km4 exceeds a preset value, the patient is judged to belong to the corresponding level; when two or more exceed the preset value, the highest such level is taken as the final judgment result.
Preferably, the movie playing module plays the audio information synchronously while playing the video.
Preferably, the training movie information of different levels stored in the standard library comprises mouth shape coefficient information corresponding to the text information; the data processing module processes the mouth motion information acquired by the image acquisition module into real-time mouth shape coefficient information and compares it with the pre-stored mouth shape coefficient information to obtain the motion comparison information.
Preferably, the mouth shape coefficient comprises a first mouth shape coefficient and a second mouth shape coefficient, and the specific process of obtaining the mouth shape coefficients is as follows:
Step one: the key point of the upper lip is marked as point A1 and the two corner points of the upper lip as points A2 and A3, and an arc segment L1 is fitted through points A1, A2 and A3;
Step two: the key point of the lower lip is marked as point B1 and the two corner points of the lower lip as points B2 and B3, and an arc segment L2 is fitted through points B1, B2 and B3;
Step three: point A1 is connected with point A2 to obtain a line segment L3; the arc measures of arc segments L1 and L2 are measured, as is the length of line segment L3;
Step four: the first mouth shape coefficient L_ratio is obtained by the formula L_ratio = (L1 + L2) / (L1 - L2), and the length of L3 is the second mouth shape coefficient;
the specific process by which the data processing module compares the real-time mouth shape coefficient information with the pre-stored mouth shape coefficient information is as follows:
S1: the real-time first mouth shape coefficient, the real-time second mouth shape coefficient, the preset first mouth shape coefficient and the preset second mouth shape coefficient are extracted and marked P1, P2, Q1 and Q2 respectively;
S2: the difference Pq1_diff between the real-time first mouth shape coefficient P1 and the preset first mouth shape coefficient Q1 is calculated, and then the difference Pq2_diff between the real-time second mouth shape coefficient P2 and the preset second mouth shape coefficient Q2 is calculated;
S3: the sum Pq_sum of the absolute value of Pq1_diff and the absolute value of Pq2_diff is calculated, giving the motion comparison information Pq_sum.
Preferably, the specific process by which the data processing module produces the voice comparison information is as follows:
SS1: the standard voice information of the pre-stored movie information is extracted and voiceprint-processed to obtain the standard voiceprint, marked F_std;
SS2: the voice information collected by the voice recognition module as the patient reads the preset text content is voiceprint-processed to obtain the real-time voiceprint, marked F_real, i.e. the voice comparison information F_real;
SS3: the obtained real-time voiceprint F_real is compared for similarity with the standard voiceprint F_std to obtain the similarity F_ratio.
Preferably, the specific process by which the learning scoring module processes the voice comparison information and the motion comparison information to generate the training score information is as follows:
S01: the obtained voice comparison information and motion comparison information are extracted and marked M and N respectively;
S02: to emphasize the importance of the voice comparison, a correction value U1 is assigned to the voice comparison information and a correction value U2 to the motion comparison information, where U1 > U2 and U1 + U2 = 1;
S03: the training score information Mn_sum is obtained by the formula Mn_sum = M × U1 + N × U2.
Preferably, the scoring display module ranks all received training score information from high to low and displays the personnel information corresponding to the three highest training scores, enlarged in a preset font.
Compared with the prior art, the invention has the following advantages: the AI-based immersive speech feedback training system evaluates the level of a patient's language disorder before speech training begins, so the system can provide speech training content suited to the patient; arranging the content from easy to difficult effectively improves the experience of using the system and avoids the frustration caused by training content that is too difficult. By playing movie content and sound synchronously, the patient can watch the mouth shape of the pronunciation while listening, imitate the observed mouth shape, and thereby speed up rehabilitation training. Meanwhile, the combined analysis of mouth shape and pronunciation allows the patient's rehabilitation progress to be evaluated more accurately, and the various settings meet the different needs of patients with language disorders, making the system well worth popularizing.
Drawings
FIG. 1 is a system block diagram of the present invention.
Detailed Description
The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.
As shown in fig. 1, the present embodiment provides a technical solution: an AI-based immersive speech feedback training system comprising an ability rating module, a grading learning module, a standard library, a movie playing module, a voice recognition module, an image acquisition module, a data receiving module, a data processing module, a learning scoring module and a scoring display module;
the ability rating module is used for rating the voice ability of the patient with language disorder and generating voice rating information, the voice rating information is sent to the grading learning module, the grading learning module receives the voice rating information and calls learning movies at corresponding levels from a standard library, and the standard library stores training movie information at different levels;
the movie playing module is used for receiving movie information of the corresponding level, which the grading learning module retrieves from the standard library, and starts playing after receiving it; when playing the movie information, the movie playing module enlarges the mouth of the person in the image as a close-up, making the mouth shape easier for the patient with a language disorder to imitate; the voice recognition module then collects the voice information produced by the patient, while the image acquisition module collects the patient's mouth motion information as the voice is produced;
both the voice information produced by the patient and the mouth motion information recorded while it was produced are sent to the data receiving module, which processes them to generate voice comparison information and motion comparison information;
the voice comparison information and the motion comparison information are both sent to the learning scoring module, which processes them to generate training score information; the training score information is sent to the scoring display module, which displays the training score.
The specific process by which the ability rating module rates the ability of the patient with a language disorder is as follows:
Step one: the ability rating module is preset with text content information of different levels, comprising primary text information, middle-level text information, high-level text information and normal text information, with difficulty ordered: primary text < middle-level text < high-level text < normal text;
Step two: at least x groups of text information are selected in turn from the primary, middle-level, high-level and normal text information, in order of increasing difficulty, where x ≥ 5;
Step three: the selected x groups of text information from each level are displayed, and the patient reads aloud, in turn, the x groups in the primary, middle-level, high-level and normal text information; the patient's readings are marked K1, K2, K3 and K4 in order of level from low to high;
Step four: the preset pronunciation information of the selected x groups of text information from the primary, middle-level, high-level and normal text information is extracted and marked M1, M2, M3 and M4 in the same order of level;
Step five: K1 is matched against M1 for similarity to obtain the similarity Km1, K2 against M2 for Km2, K3 against M3 for Km3, and K4 against M4 for Km4;
Step six: when any one of Km1, Km2, Km3 and Km4 exceeds a preset value, the patient is judged to belong to the corresponding level; when two or more exceed the preset value, the highest such level is taken as the final judgment result.
Because the level of the patient's language disorder is evaluated before speech training begins, the system can provide suitable speech training content; arranging the content from easy to difficult effectively improves the experience of using the system and avoids the frustration caused by training content that is too difficult. The rating decision is sketched below.
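As a minimal illustrative sketch of steps five and six in Python: the similarity matcher, the threshold of 0.8, the fallback to the primary level when nothing passes, and all names below are assumptions, since the patent specifies none of them.

    LEVELS = ["primary", "middle", "high", "normal"]  # ordered low -> high

    def rate_ability(readings, presets, similarity, threshold=0.8):
        # readings: patient recordings K1..K4; presets: reference audio M1..M4.
        # `similarity` is any matcher returning a score in [0, 1].
        scores = [similarity(k, m) for k, m in zip(readings, presets)]  # Km1..Km4
        passed = [lvl for lvl, s in zip(LEVELS, scores) if s > threshold]
        # One passing level -> that level; several -> the highest such level;
        # none -> primary (the patent leaves the no-pass case open).
        return passed[-1] if passed else LEVELS[0]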
The movie & TV broadcast module is when carrying out the image broadcast, and synchronous broadcast audio information has effectually avoided the pronunciation of the language disorder patient that the sound painting desynchronized leads to make mistakes to through movie & TV content and sound synchronous broadcast, let can look over the mouth shape situation of pronunciation for the language disorder patient simultaneously, imitate through observing the mouth shape and pronounce, accelerated language disorder patient's rehabilitation training progress.
The training movie information of different levels stored in the standard library comprises mouth shape coefficient information corresponding to the text information; the data processing module processes the mouth motion information acquired by the image acquisition module into real-time mouth shape coefficient information and compares it with the pre-stored mouth shape coefficient information to obtain the motion comparison information; defining a mouth shape coefficient allows the patient's rehabilitation state to be evaluated more reliably.
The mouth shape coefficient comprises a first mouth shape coefficient and a second mouth shape coefficient, and the specific process of obtaining the mouth shape coefficients is as follows:
Step one: the key point of the upper lip is marked as point A1 and the two corner points of the upper lip as points A2 and A3, and an arc segment L1 is fitted through points A1, A2 and A3;
Step two: the key point of the lower lip is marked as point B1 and the two corner points of the lower lip as points B2 and B3, and an arc segment L2 is fitted through points B1, B2 and B3;
Step three: point A1 is connected with point A2 to obtain a line segment L3; the arc measures of arc segments L1 and L2 are measured, as is the length of line segment L3;
Step four: the first mouth shape coefficient L_ratio is obtained by the formula L_ratio = (L1 + L2) / (L1 - L2), and the length of L3 is the second mouth shape coefficient;
Using two mouth shape coefficients further improves the accuracy of the judgment; a geometric sketch follows.
The specific process by which the data processing module compares the real-time mouth shape coefficient information with the pre-stored mouth shape coefficient information is as follows:
S1: the real-time first mouth shape coefficient, the real-time second mouth shape coefficient, the preset first mouth shape coefficient and the preset second mouth shape coefficient are extracted and marked P1, P2, Q1 and Q2 respectively;
S2: the difference Pq1_diff between the real-time first mouth shape coefficient P1 and the preset first mouth shape coefficient Q1 is calculated, and then the difference Pq2_diff between the real-time second mouth shape coefficient P2 and the preset second mouth shape coefficient Q2 is calculated;
S3: the sum Pq_sum of the absolute value of Pq1_diff and the absolute value of Pq2_diff is calculated, giving the motion comparison information Pq_sum.
Through this setting, the motion comparison information can be obtained reliably.
The specific process by which the data processing module produces the voice comparison information is as follows:
SS1: the standard voice information of the pre-stored movie information is extracted and voiceprint-processed to obtain the standard voiceprint, marked F_std;
SS2: the voice information collected by the voice recognition module as the patient reads the preset text content is voiceprint-processed to obtain the real-time voiceprint, marked F_real, i.e. the voice comparison information F_real;
SS3: the obtained real-time voiceprint F_real is compared for similarity with the standard voiceprint F_std to obtain the similarity F_ratio.
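The patent does not specify the voiceprint algorithm. As one hedged instantiation, a time-averaged MFCC vector compared by cosine similarity is sketched below; librosa and all parameter values are assumptions, not part of the patent.

    import numpy as np
    import librosa  # assumed toolkit; the patent names none

    def voiceprint(wav_path, sr=16000, n_mfcc=20):
        # A simple "voiceprint": the time-averaged MFCC vector of a recording.
        audio, _ = librosa.load(wav_path, sr=sr)
        mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
        return mfcc.mean(axis=1)

    def voice_similarity(f_real, f_std):
        # F_ratio: cosine similarity of the real-time and standard voiceprints.
        return float(np.dot(f_real, f_std)
                     / (np.linalg.norm(f_real) * np.linalg.norm(f_std)))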
The specific process by which the learning scoring module processes the voice comparison information and the motion comparison information to generate the training score information is as follows:
S01: the obtained voice comparison information and motion comparison information are extracted and marked M and N respectively;
S02: to emphasize the importance of the voice comparison, a correction value U1 is assigned to the voice comparison information and a correction value U2 to the motion comparison information, where U1 > U2 and U1 + U2 = 1;
S03: the training score information Mn_sum is obtained by the formula Mn_sum = M × U1 + N × U2.
Through the combined analysis of mouth shape and pronunciation, the patient's rehabilitation progress can be evaluated more accurately, and the various settings meet the different needs of patients with language disorders, making the system well worth popularizing.
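The weighted score of S01 to S03 in code form; the concrete weights 0.6 and 0.4 are illustrative, since the patent only requires U1 > U2 and U1 + U2 = 1:

    def training_score(m, n, u1=0.6, u2=0.4):
        # Mn_sum = M*U1 + N*U2 with U1 > U2 and U1 + U2 = 1.
        assert u1 > u2 and abs(u1 + u2 - 1.0) < 1e-9
        return m * u1 + n * u2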
The scoring display module ranks all received training score information from high to low and displays the personnel information corresponding to the three highest training scores, enlarged in a preset font; this ranking lets patients with language disorders see the recovery state of other patients, which stimulates their confidence in rehabilitation training and so accelerates their recovery.
In conclusion, when the system is used, the ability rating module rates the voice ability of the patient with a language disorder and generates voice rating information, which is sent to the grading learning module; the grading learning module receives the voice rating information and retrieves learning movies of the corresponding level from the standard library, which stores training movie information of different levels. Because the system evaluates the level of the patient's language disorder before speech training begins, it can provide suitable training content, arranged from easy to difficult, which improves the experience of use and avoids the frustration caused by content that is too difficult. The movie playing module receives the movie information of the corresponding level retrieved from the standard library and starts playing; the synchronized movie content and sound let the patient watch the mouth shape of the pronunciation while listening and imitate it, speeding up rehabilitation training. The voice recognition module collects the voice information produced by the patient while the image acquisition module collects the mouth motion information as the voice is produced; both are sent to the data receiving module, which processes them to generate voice comparison information and motion comparison information. These are sent to the learning scoring module, which processes them to generate training score information; the combined analysis of mouth shape and pronunciation evaluates the patient's rehabilitation progress more accurately, and the various settings meet patients' different needs, making the system well worth popularizing. The training score information is sent to the scoring display module, which displays the training score.
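Tying the pieces together, one training pass after the level has been rated could be composed from the sketches above; the Movie record and the two callables standing in for the hardware modules are hypothetical placeholders, not elements named by the patent.

    from dataclasses import dataclass

    @dataclass
    class Movie:
        f_std: object  # preset standard voiceprint vector
        q1: float      # preset first mouth shape coefficient
        q2: float      # preset second mouth shape coefficient

    def training_session(movie, record_voice, capture_lip_points):
        # One feedback loop: play -> collect -> compare -> score.
        # record_voice() returns a path to the recorded audio;
        # capture_lip_points() returns the six landmarks (A1, A2, A3, B1, B2, B3).
        f_real = voiceprint(record_voice())                  # voice recognition module
        p1, p2 = mouth_coefficients(*capture_lip_points())   # image acquisition module
        m = voice_similarity(f_real, movie.f_std)            # voice comparison info M
        n = motion_comparison(p1, p2, movie.q1, movie.q2)    # motion comparison info N
        return training_score(m, n)                          # training score Mn_sum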
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two or three, unless explicitly specified otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (4)

1. An AI-based immersive speech feedback training system, characterized by comprising an ability rating module, a grading learning module, a standard library, a movie playing module, a voice recognition module, an image acquisition module, a data receiving module, a data processing module, a learning scoring module and a scoring display module;
the ability rating module is used for rating the voice ability of the patient with language disorder and generating voice rating information, the voice rating information is sent to the grading learning module, the grading learning module receives the voice rating information and calls learning movies at corresponding levels from a standard library, and the standard library stores training movie information at different levels;
the movie playing module is used for receiving movie information of the corresponding level, which the grading learning module retrieves from the standard library; the movie playing module starts playing after receiving the movie information of the corresponding level, at which point the voice recognition module collects the voice information produced by the patient with a language disorder, while the image acquisition module collects the patient's mouth motion information as the voice is produced;
both the voice information produced by the patient and the mouth motion information recorded while it was produced are sent to the data receiving module, which processes them to generate voice comparison information and motion comparison information;
the voice comparison information and the motion comparison information are both sent to the learning scoring module, which processes them to generate training score information; the training score information is sent to the scoring display module, which displays the training score;
the standard library stores training movie information of different levels, comprising mouth shape coefficient information corresponding to the text information; the data processing module processes the mouth motion information acquired by the image acquisition module into real-time mouth shape coefficient information and compares it with the pre-stored mouth shape coefficient information to obtain the motion comparison information;
the mouth shape coefficient comprises a first mouth shape coefficient and a second mouth shape coefficient, and the specific process of obtaining the mouth shape coefficients is as follows:
Step one: the key point of the upper lip is marked as point A1 and the two corner points of the upper lip as points A2 and A3, and an arc segment L1 is fitted through points A1, A2 and A3;
Step two: the key point of the lower lip is marked as point B1 and the two corner points of the lower lip as points B2 and B3, and an arc segment L2 is fitted through points B1, B2 and B3;
Step three: point A1 is connected with point A2 to obtain a line segment L3; the arc measures of arc segments L1 and L2 are measured, as is the length of line segment L3;
Step four: the first mouth shape coefficient L_ratio is obtained by the formula L_ratio = (L1 + L2) / (L1 - L2), and the length of L3 is the second mouth shape coefficient;
the specific process by which the data processing module compares the real-time mouth shape coefficient information with the pre-stored mouth shape coefficient information is as follows:
S1: the real-time first mouth shape coefficient, the real-time second mouth shape coefficient, the preset first mouth shape coefficient and the preset second mouth shape coefficient are extracted and marked P1, P2, Q1 and Q2 respectively;
S2: the difference Pq1_diff between the real-time first mouth shape coefficient P1 and the preset first mouth shape coefficient Q1 is calculated, and then the difference Pq2_diff between the real-time second mouth shape coefficient P2 and the preset second mouth shape coefficient Q2 is calculated;
S3: the sum Pq_sum of the absolute value of Pq1_diff and the absolute value of Pq2_diff is calculated, giving the motion comparison information Pq_sum;
the specific process by which the data processing module produces the voice comparison information is as follows:
SS1: the standard voice information of the pre-stored movie information is extracted and voiceprint-processed to obtain the standard voiceprint, marked F_std;
SS2: the voice information collected by the voice recognition module as the patient reads the preset text content is voiceprint-processed to obtain the real-time voiceprint, marked F_real, i.e. the voice comparison information F_real;
SS3: the obtained real-time voiceprint F_real is compared for similarity with the standard voiceprint F_std to obtain the similarity F_ratio;
the specific process by which the learning scoring module processes the voice comparison information and the motion comparison information to generate the training score information is as follows:
S01: the obtained voice comparison information and motion comparison information are extracted and marked M and N respectively;
S02: to emphasize the importance of the voice comparison, a correction value U1 is assigned to the voice comparison information and a correction value U2 to the motion comparison information, where U1 > U2 and U1 + U2 = 1;
S03: the training score information Mn_sum is obtained by the formula Mn_sum = M × U1 + N × U2.
2. The AI-based immersive speech feedback training system of claim 1, wherein the specific process by which the ability rating module rates the ability of the patient with a language disorder is as follows:
Step one: the ability rating module is preset with text content information of different levels, comprising primary text information, middle-level text information, high-level text information and normal text information, with difficulty ordered: primary text < middle-level text < high-level text < normal text;
Step two: at least x groups of text information are selected in turn from the primary, middle-level, high-level and normal text information, in order of increasing difficulty, where x ≥ 5;
Step three: the selected x groups of text information from each level are displayed, and the patient reads aloud, in turn, the x groups in the primary, middle-level, high-level and normal text information; the patient's readings are marked K1, K2, K3 and K4 in order of level from low to high;
Step four: the preset pronunciation information of the selected x groups of text information from the primary, middle-level, high-level and normal text information is extracted and marked M1, M2, M3 and M4 in the same order of level;
Step five: K1 is matched against M1 for similarity to obtain the similarity Km1, K2 against M2 for Km2, K3 against M3 for Km3, and K4 against M4 for Km4;
Step six: when any one of Km1, Km2, Km3 and Km4 exceeds a preset value, the patient is judged to belong to the corresponding level; when two or more exceed the preset value, the highest such level is taken as the final judgment result.
3. The AI-based immersive speech feedback training system of claim 1, wherein the movie playing module plays the audio information synchronously while playing the video.
4. The AI-based immersive speech feedback training system of claim 1, wherein the scoring display module ranks all received training score information from high to low and displays the personnel information corresponding to the three highest training scores, enlarged in a preset font.
CN202110081356.2A 2021-01-21 2021-01-21 Immersive speech feedback training system based on AI Active CN112885168B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110081356.2A CN112885168B (en) 2021-01-21 2021-01-21 Immersive speech feedback training system based on AI

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110081356.2A CN112885168B (en) 2021-01-21 2021-01-21 Immersive speech feedback training system based on AI

Publications (2)

Publication Number Publication Date
CN112885168A (en) 2021-06-01
CN112885168B (en) 2022-09-09

Family

ID=76051484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110081356.2A Active CN112885168B (en) 2021-01-21 2021-01-21 Immersive speech feedback training system based on AI

Country Status (1)

Country Link
CN (1) CN112885168B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113744880B * 2021-09-08 2023-11-17 Shaoyang University Child language barrier degree management analysis system
CN114306871A (en) * 2021-12-30 2022-04-12 首都医科大学附属北京天坛医院 Artificial intelligence-based aphasia patient rehabilitation training method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0713046U (en) * 1993-07-19 1995-03-03 武盛 豊永 Dictation word processor
US6356868B1 (en) * 1999-10-25 2002-03-12 Comverse Network Systems, Inc. Voiceprint identification system
US9548048B1 (en) * 2015-06-19 2017-01-17 Amazon Technologies, Inc. On-the-fly speech learning and computer model generation using audio-visual synchronization

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080004879A1 (en) * 2006-06-29 2008-01-03 Wen-Chen Huang Method for assessing learner's pronunciation through voice and image
CN101751809B (en) * 2010-02-10 2011-11-09 长春大学 Deaf children speech rehabilitation method and system based on three-dimensional head portrait
CN102063903B (en) * 2010-09-25 2012-07-04 中国科学院深圳先进技术研究院 Speech interactive training system and speech interactive training method
KR20140075994A (en) * 2012-12-12 2014-06-20 주홍찬 Apparatus and method for language education by using native speaker's pronunciation data and thought unit
CN105982641A (en) * 2015-01-30 2016-10-05 上海泰亿格康复医疗科技股份有限公司 Speech and language hypoacousie multi-parameter diagnosis and rehabilitation apparatus and cloud rehabilitation system
CN109872714A (en) * 2019-01-25 2019-06-11 广州富港万嘉智能科技有限公司 A kind of method, electronic equipment and storage medium improving accuracy of speech recognition
CN111081080B (en) * 2019-05-29 2022-05-03 广东小天才科技有限公司 Voice detection method and learning device
CN110349565B (en) * 2019-07-02 2021-03-19 长春大学 Auxiliary pronunciation learning method and system for hearing-impaired people
CN110379221A (en) * 2019-08-09 2019-10-25 陕西学前师范学院 A kind of pronunciation of English test and evaluation system
CN110853624A (en) * 2019-11-29 2020-02-28 杭州南粟科技有限公司 Speech rehabilitation training system
CN112233679B (en) * 2020-10-10 2024-02-13 安徽讯呼信息科技有限公司 Artificial intelligence speech recognition system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0713046U (en) * 1993-07-19 1995-03-03 武盛 豊永 Dictation word processor
US6356868B1 (en) * 1999-10-25 2002-03-12 Comverse Network Systems, Inc. Voiceprint identification system
US9548048B1 (en) * 2015-06-19 2017-01-17 Amazon Technologies, Inc. On-the-fly speech learning and computer model generation using audio-visual synchronization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Lip-Reading Recognition Based on Deep Learning; Wu Dajiang; China Master's Theses Full-text Database (Information Science and Technology); 2018-12-31; full text *

Also Published As

Publication number Publication date
CN112885168A (en) 2021-06-01

Similar Documents

Publication Publication Date Title
CN112885168B (en) Immersive speech feedback training system based on AI
CN108281052B (en) A kind of on-line teaching system and online teaching method
WO2020215966A1 (en) Remote teaching interaction method, server, terminal and system
Davies et al. Facial composite production: A comparison of mechanical and computer-driven systems.
CN107203953A (en) It is a kind of based on internet, Expression Recognition and the tutoring system of speech recognition and its implementation
CN105516802B (en) The news video abstract extraction method of multiple features fusion
WO2019095447A1 (en) Guided teaching method having remote assessment function
Hong et al. Video accessibility enhancement for hearing-impaired users
CN106021496A (en) Video search method and video search device
CN111212317A (en) Skip navigation method for video playing
EP1139318A1 (en) Pronunciation evaluation system
US20140302469A1 (en) Systems and Methods for Providing a Multi-Modal Evaluation of a Presentation
CN1804934A (en) Computer-aided Chinese language phonation learning method
CN106952515A (en) The interactive learning methods and system of view-based access control model equipment
TW202042172A (en) Intelligent teaching consultant generation method, system and device and storage medium
CN106228996A (en) Vocality study electron assistant articulatory system
CN110490173B (en) Intelligent action scoring system based on 3D somatosensory model
CN112534425A (en) Singing teaching system, use method thereof and computer readable storage medium
CN111554303A (en) User identity recognition method and storage medium in song singing process
CN116088675A (en) Virtual image interaction method, related device, equipment, system and medium
KR100756671B1 (en) English studying system which uses an accomplishment multimedia
KR20140087956A Apparatus and method for learning phonics by using native speaker's pronunciation data and word and sentence and image data
CN108429932A (en) Method for processing video frequency and device
CN114936952A (en) Digital education internet learning system
KR20140075994A Apparatus and method for language education by using native speaker's pronunciation data and thought unit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant