CN112232127A - Intelligent speech training system and method - Google Patents
- Publication number: CN112232127A
- Application number: CN202010961200.9A
- Authority
- CN
- China
- Prior art keywords: image, sound, signal, speaker, speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V40/174 — Facial expression recognition
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06V20/41 — Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V40/20 — Movements or behaviour, e.g. gesture recognition
- G10L21/0208 — Noise filtering (speech enhancement)
- G10L25/51 — Speech or voice analysis techniques specially adapted for comparison or discrimination
Abstract
The invention discloses an intelligent speech training system. A human body biological state sensor collects the gesture actions and sound information of a speaker; an image processing module processes the collected gesture actions to obtain clean image feature signals; a sound processing module converts the speaker's sound signal into an electrical signal and performs noise reduction and filtering; a central processing module compares the image feature signals and the sound electrical signal with pre-stored standard gesture actions and speech sounds and gives improvement suggestions, which are output through an output module and delivered to the speaker for timely improvement. The invention solves the problems that existing speech training is not standardized, cannot provide standard guidance, has poor training effect, and relies excessively on experienced instructors.
Description
Technical Field
The invention relates to the field of speech training, in particular to an intelligent speech training system and method.
Background
In traditional speech instruction, a comprehensive judgment can only be made by manually watching the speaker's speech state, observing his or her posture, and assessing the volume and emotional intensity of the voice. Moreover, different instructors hold different opinions, so no uniform standard can be formed and deviations easily occur.
With the continuous development of science and technology and the introduction of information technology, computer technology and artificial intelligence, robotics research has gradually moved beyond the industrial field and expanded into medical treatment, health care, the home, entertainment, the service industry and other fields. People's expectations of robots have likewise risen from simple, repetitive mechanical actions to intelligent robots capable of anthropomorphic question answering, autonomy and interaction with other robots, and human-computer interaction has become an important factor in the development of intelligent robots. Having a robot collect the speech state of a speaker and compare it with a standard state has therefore become a development trend in speech training instruction.
Disclosure of Invention
Therefore, the invention provides an intelligent speech training system and method, aiming to solve the problems that existing speech training is not standardized, cannot provide standard guidance, has poor training effect, and relies excessively on experienced instructors.
In order to achieve the above purpose, the invention provides the following technical solutions:
According to the first aspect of the invention, an intelligent speech training system is disclosed. A human body biological state sensor collects the gesture actions and sound information of a speaker; an image processing module processes the collected gesture actions to obtain clean image feature signals; a sound processing module converts the speaker's sound signal into an electrical signal and performs noise reduction and filtering; a central processing module compares the image feature signals and the sound electrical signal with pre-stored standard gesture actions and speech sounds and gives improvement suggestions, which are output through an output module and delivered to the speaker for timely improvement.
Further, the human body biological state sensor comprises a posture detector and a sound collection device. The posture detector is mounted horizontally on a stand 2-3 m in front of the speaker, the stand being 1.6 m high; the sound collection device is attached to the speaker's clothes or to the podium. The posture detector captures the speaker's facial expressions and limb actions, and the sound collection device captures the speaker's sound signal.
Further, the image processing module is connected to the posture detector and performs image signal extraction, image signal preprocessing, image signal feature extraction, direction analysis and intelligent tracking, and encoded storage of image information on the collected facial expression and limb action images.
Further, the image processing module performs noise reduction and feature enhancement on the recorded image signals to obtain relatively clean image feature vectors that satisfy the conditions for subsequent feature extraction. Image signal extraction and preprocessing are mainly performed by the front-end camera equipment. Feature extraction pulls from the image sequence the feature information usable for target tracking. Direction analysis and intelligent tracking recognize the direction of a given feature to judge its range and frequency of activity, while intelligent tracking also records the motion trajectory of the feature target. Encoded storage compresses the analyzed and tracked image information by encoding it, reducing the stored information volume and making later retrieval by subsequent systems convenient.
Further, the sound collection device sends the collected sound information of the speaker to the sound processing module, which converts the sound signal into an electrical signal and obtains the loudness, frequency, content, duration and the interval between syllables; the electrical signal is filtered and denoised by a filter and amplified by an amplifier to obtain a clean electrical signal free of clutter interference.
Further, the central processing module contains a storage unit, a training unit and a comparison unit. The storage unit stores the image signals and the sound electrical signals; the training unit is trained with standard action image signals and corresponding standard sound electrical signals for a large number of speech articles; the comparison unit compares the collected image and sound signals with the standard action image signals and the corresponding standard sound electrical signals.
Further, the training unit learns from the actions and sounds corresponding to lecture materials prepared in advance; different types of lecture material correspond to different gesture actions and speech-sound emotions, and after training the unit can output standard guidance actions and sounds for different lecture materials.
Further, the comparison unit compares the collected actual image signals and sound electrical signals with the standard image signals and sound electrical signals, labels the places that differ, points out the differences, gives corresponding improvement suggestions, and transmits them to the output module for output.
Further, the output module sends the specific improvement suggestions to a display screen in front of the speaker and, through a Bluetooth module, sends the related voice prompts to a Bluetooth headset worn by the speaker.
According to a second aspect of the present invention, a method for intelligent speech training is disclosed, the method comprising:
collecting the facial expressions and limb actions of the speaker with a posture detector, and collecting the speaker's sound signal with a sound collection device;
performing, through an image processing module connected to the posture detector, image signal extraction, preprocessing and feature extraction on the collected facial expression and limb action images, together with direction analysis, intelligent tracking and encoded storage of the image information;
converting the sound signal into an electrical signal through the sound processing module, filtering and denoising it with a filter, and amplifying it with an amplifier to obtain a clean electrical signal free of clutter interference;
training a training unit in the central processing module in advance with a large amount of data, so that after training it can output standard guidance actions and sounds for different lecture materials;
comparing the collected actual image signals and sound electrical signals with the standard ones, and giving improvement suggestions for the places that differ;
delivering the improvement suggestions through the display screen in front of the speaker and the Bluetooth headset, so that the speaker receives the relevant suggestions and adjusts his or her speech state in training, making the speech more standard and raising the speech level.
The invention has the following advantages:
The invention discloses an intelligent speech training system and method. The gesture actions and sound signals of a speaker are collected; a central processing module compares the processed gesture actions and sound signals with standard actions and sounds, labels the places that differ, and outputs improvement suggestions, which are fed back to the speaker through a display screen and a Bluetooth headset. The speaker adjusts according to the given suggestions, which provides effective guidance, improves the speaker's level, and allows a personalized guidance scheme to be formed for different lecture contents.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings in the following description are merely exemplary, and those of ordinary skill in the art can derive other embodiments from them without inventive effort.
The structures, ratios and sizes shown in this specification are used only to match the disclosed contents for the understanding of those skilled in the art, and do not limit the conditions under which the invention can be implemented; any structural modification, change of ratio or adjustment of size that does not affect the effects achievable by the invention still falls within the scope covered by the disclosed technical contents.
Fig. 1 is a flowchart of an intelligent speech training system according to an embodiment of the present invention;
fig. 2 is a schematic diagram of hardware connection of an intelligent speech training system according to an embodiment of the present invention;
Detailed Description
The present invention is described below in terms of particular embodiments; other advantages and effects of the invention will become readily apparent to those skilled in the art from this disclosure. It should be understood that the described embodiments are merely exemplary and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the invention.
Example 1
This embodiment discloses an intelligent speech training system. The system uses a human body biological state sensor to collect the gesture actions and sound information of a speaker; the collected gesture actions are processed by an image processing module to obtain clean image feature signals; a sound processing module converts the speaker's sound signal into an electrical signal and performs noise reduction and filtering; a central processing module compares the image feature signals and the sound electrical signal with pre-stored standard gesture actions and speech sounds and gives improvement suggestions, which are output through an output module and delivered to the speaker for timely improvement.
The human body biological state sensor comprises a posture detector and a sound collection device. The posture detector is mounted horizontally on a stand 2-3 m in front of the speaker, the stand being 1.6 m high; the sound collection device is attached to the speaker's clothes or to the podium. The posture detector captures the speaker's facial expressions and limb actions, and the sound collection device captures the speaker's sound signal. The information collected by the posture detector includes facial expressions such as smiling, anger, joy, sadness and excitement, and body actions such as waving, clenching a fist and clapping. The sound collection device collects the speech content of the speaker; the sound state includes rousing, low, gentle, cheerful and the like. The collected information is sent to a memory for storage.
The image processing module is connected to the posture detector and performs image signal extraction, image signal preprocessing, image signal feature extraction, direction analysis and intelligent tracking, and encoded storage of image information on the collected facial expression and limb action images. The module performs noise reduction and feature enhancement on the recorded image signals to obtain relatively clean image feature vectors that satisfy the conditions for subsequent feature extraction. Image signal extraction and preprocessing are mainly performed by the front-end camera equipment. Feature extraction pulls from the image sequence the feature information usable for target tracking. Direction analysis and intelligent tracking recognize the direction of a given feature to judge its range and frequency of activity, while intelligent tracking also records the motion trajectory of the feature target. Encoded storage compresses the analyzed and tracked image information by encoding it, reducing the stored information volume and making later retrieval by subsequent systems convenient.
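The noise-reduction and feature-extraction stages described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: a mean filter stands in for the noise reduction, and the feature vector (mean intensity, variance, gradient energy) is a hypothetical choice of trackable features.

```python
import numpy as np

def denoise(frame: np.ndarray, k: int = 3) -> np.ndarray:
    """k x k mean filter as a stand-in for the module's noise reduction."""
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    out = np.empty_like(frame, dtype=float)
    for i in range(frame.shape[0]):
        for j in range(frame.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def feature_vector(frame: np.ndarray) -> np.ndarray:
    """Crude illustrative features: mean intensity, variance, gradient energy."""
    gy, gx = np.gradient(frame)
    return np.array([frame.mean(), frame.var(), (gx ** 2 + gy ** 2).mean()])
```

In practice the denoised frames would feed the tracking stage; here the point is only that smoothing reduces pixel noise before features are taken.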
After collecting the speaker's sound information, the sound collection device sends it to the sound processing module to build an acoustic model, whose purpose is to provide an effective way to compute the distance between the feature-vector sequence of the speech and each pronunciation template. Acoustic model design is closely tied to the characteristics of speech pronunciation. The size of the acoustic model unit (a word, semi-syllable or phoneme model) strongly influences the amount of training data required, the system recognition rate and the system's flexibility, and must be chosen according to the characteristics of the language and the vocabulary size of the recognition system. The commonly used units are initials, finals, syllables or words, selected according to the purpose of the implementation. Mandarin Chinese has 412 base syllables (including neutral-tone syllables) and 1282 tonal syllables, so words are often chosen as units for small-vocabulary isolated-word recognition, syllables or initials and finals for large-vocabulary recognition, and initial-final modeling for continuous speech recognition because of coarticulation effects. The common statistics-based speech recognition model is the HMM λ = (N, M, π, A, B); the related theory covers structure selection of the model, initialization, re-estimation of the model parameters, and the corresponding recognition algorithms.
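For the HMM λ = (N, M, π, A, B) mentioned above, the core likelihood computation can be illustrated with the forward algorithm. This is a textbook sketch, not the patent's recognizer; any concrete parameter values used with it would be hypothetical.

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Forward algorithm: P(obs | lambda) for a discrete HMM.

    pi  -- (N,) initial state probabilities
    A   -- (N, N) state transition matrix
    B   -- (N, M) emission matrix over M observation symbols
    obs -- sequence of observation symbol indices
    """
    alpha = pi * B[:, obs[0]]          # initialization: alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction over time steps
    return float(alpha.sum())          # termination: sum over final states
```

A quick sanity check is to compare the result against brute-force summation over all state paths, which must give the same probability.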
The sound processing module converts the sound signal into an electrical signal; obtains the loudness, frequency, content, duration and the interval between syllables; filters and denoises the electrical signal with a filter; and amplifies it with an amplifier to obtain a clean electrical signal free of clutter interference.
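The filtering and measurement steps can be sketched in software. In this illustration a moving-average low-pass filter stands in for the hardware filter and amplifier chain, and the sampling rate is an assumed value; loudness is taken as RMS and frequency as the dominant FFT bin.

```python
import numpy as np

FS = 8000  # assumed sampling rate in Hz

def lowpass(signal: np.ndarray, k: int = 5) -> np.ndarray:
    """Moving-average low-pass filter standing in for the hardware filter."""
    kernel = np.ones(k) / k
    return np.convolve(signal, kernel, mode="same")

def analyze(signal: np.ndarray, fs: int = FS):
    """Return (RMS loudness, dominant frequency in Hz) of the cleaned signal."""
    rms = np.sqrt(np.mean(signal ** 2))
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return rms, freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
```

Feeding a noisy tone through `lowpass` and then `analyze` recovers the tone's frequency, which is the essence of the "clean electrical signal" the module produces.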
The central processing module contains a storage unit, a training unit and a comparison unit. The storage unit stores the image signals and the sound electrical signals; the training unit uses a large number of standard action image signals and corresponding standard sound electrical signals for different lecture articles; the comparison unit compares the collected image and sound signals with the standard action image signals and the corresponding standard sound electrical signals.
The training unit may adopt a convolutional neural network model. It learns from the actions and sounds corresponding to lecture materials prepared in advance; different types of lecture material correspond to different gesture actions and speech-sound emotions, and after training the unit can output standard guidance actions and sounds for different lecture materials. The training data come from the body actions, facial expressions, speech sound changes and emotion changes of speech experts in the industry for different lecture materials. After extensive training, the unit can output the standard gesture actions and speech sounds corresponding to a given speech script.
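The patent names a convolutional neural network for the training unit; as a much simpler stand-in, the sketch below reduces each lecture-material category to the mean of its expert feature vectors and returns that mean as the "standard guidance" template. The class name, category labels and feature layout are all hypothetical.

```python
import numpy as np

class GuidanceModel:
    """Toy stand-in for the training unit: each lecture-material category is
    summarized by the mean of its expert feature vectors, which then serves
    as the standard guidance template for that category."""

    def __init__(self):
        self.templates = {}

    def fit(self, examples):
        """examples: dict mapping category name -> list of feature vectors."""
        for category, vectors in examples.items():
            self.templates[category] = np.mean(vectors, axis=0)

    def standard_for(self, category):
        """Return the standard template for the given material category."""
        return self.templates[category]
```

A real system would learn far richer mappings from script to action and sound, but the lookup structure — material type in, standard template out — is the same.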
The comparison unit compares the collected actual image signals and sound electrical signals with the standard ones. For images, the comparison covers action amplitude, gesture, the timing, period and frequency of actions, and changes of facial expression, down to the micro-movements of the facial features; for sound, the comparison covers tone, frequency, amplitude, loudness, emotional intensity and the like. Places that differ are labeled, the differences are pointed out, corresponding improvement suggestions are given, and the suggestions are transmitted to the output module for output.
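The comparison logic — measure each feature of the actual performance against the standard, flag the places that differ, and phrase a suggestion — can be sketched as below. The feature names, the relative-deviation measure and the 15% tolerance are illustrative assumptions (the sketch also assumes nonzero standard values).

```python
# Hypothetical feature order shared by the actual and standard vectors.
FEATURES = ["amplitude", "tempo", "loudness", "pitch"]

def compare(actual, standard, tol=0.15):
    """Flag features deviating from the standard by more than tol (relative)
    and return a human-readable improvement suggestion for each one."""
    suggestions = []
    for name, a, s in zip(FEATURES, actual, standard):
        dev = (a - s) / abs(s)  # signed relative deviation from the standard
        if abs(dev) > tol:
            direction = "reduce" if dev > 0 else "increase"
            suggestions.append(f"{direction} {name} (off by {dev:+.0%})")
    return suggestions
```

The returned strings correspond to the "labels the places that differ and gives improvement suggestions" step; the output module would then route them to the display and headset.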
The output module sends the specific improvement suggestions to a display screen in front of the speaker and, through a Bluetooth module, sends the related voice prompts to a Bluetooth headset worn by the speaker. The display screen prompts the speaker with text about the places needing attention and improvement, while the Bluetooth headset conveys the related suggestions and instructions by voice. The speaker can thus adjust the speech state in time and train continuously, raising the speech level.
Example 2
The embodiment discloses an intelligent speech training method, which comprises the following steps:
collecting the facial expressions and limb actions of the speaker with a posture detector, and collecting the speaker's sound signal with a sound collection device;
performing, through an image processing module connected to the posture detector, image signal extraction, preprocessing and feature extraction on the collected facial expression and limb action images, together with direction analysis, intelligent tracking and encoded storage of the image information;
converting the sound signal into an electrical signal through the sound processing module, filtering and denoising it with a filter, and amplifying it with an amplifier to obtain a clean electrical signal free of clutter interference;
training a training unit in the central processing module in advance with a large amount of data, so that after training it can output standard guidance actions and sounds for different lecture materials;
comparing the collected actual image signals and sound electrical signals with the standard ones, and giving improvement suggestions for the places that differ;
delivering the improvement suggestions through the display screen in front of the speaker and the Bluetooth headset, so that the speaker receives the relevant suggestions and adjusts his or her speech state in training, making the speech more standard and raising the speech level.
Although the invention has been described in detail above with reference to general descriptions and specific embodiments, it will be apparent to those skilled in the art that modifications or improvements can be made on the basis of the invention. Accordingly, such modifications and improvements are intended to fall within the scope of the claimed invention.
Claims (10)
1. An intelligent speech training system, characterized in that the system uses a human body biological state sensor to collect the gesture actions and sound information of a speaker; the collected gesture actions are processed by an image processing module to obtain clean image feature signals; a sound processing module converts the speaker's sound signal into an electrical signal and performs noise reduction and filtering; a central processing module compares the image feature signals and the sound electrical signal with pre-stored standard gesture actions and speech sounds and gives improvement suggestions, which are output through an output module and delivered to the speaker for timely improvement.
2. The intelligent speech training system of claim 1, wherein the human body biological state sensor comprises a posture detector and a sound collection device; the posture detector is mounted horizontally on a stand 2-3 m in front of the speaker, the stand being 1.6 m high; the sound collection device is attached to the speaker's clothes or to the podium; the posture detector captures the speaker's facial expressions and limb actions, and the sound collection device captures the speaker's sound signal.
3. The intelligent speech training system of claim 1, wherein the image processing module is connected to the posture detector and performs image signal extraction, image signal preprocessing, image signal feature extraction, direction analysis and intelligent tracking, and encoded storage of image information on the collected facial expression and limb action images.
4. The intelligent speech training system of claim 3, wherein the image processing module performs noise reduction and feature enhancement on the recorded image signals to obtain relatively clean image feature vectors that satisfy the conditions for subsequent feature extraction; image signal extraction and preprocessing are mainly performed by the front-end camera equipment; feature extraction pulls from the image sequence the feature information usable for target tracking; direction analysis and intelligent tracking recognize the direction of a given feature to judge its range and frequency of activity, while intelligent tracking also records the motion trajectory of the feature target; and encoded storage compresses the analyzed and tracked image information by encoding it, reducing the stored information volume and making later retrieval by subsequent systems convenient.
5. The intelligent speech training system according to claim 1, wherein the sound collection device collects the speaker's voice and sends it to the sound processing module; the sound processing module converts the sound signal into an electrical signal and extracts the loudness, frequency, content and duration of the speech and the time interval between successive syllables; the electrical signal is filtered by a filter to reduce noise and amplified by an amplifier, yielding a clean electrical signal with spurious interference removed.
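Claim 5's analysis and cleanup steps can be sketched with numpy: RMS as a stand-in for loudness, the FFT peak for the dominant frequency, and a moving-average low-pass filter plus gain standing in for the claimed filter and amplifier (the patent does not specify the filter type; these choices are illustrative):

```python
import numpy as np

def analyze(signal, rate):
    """Extract loudness (RMS) and dominant frequency (FFT peak bin) from a mono signal."""
    loudness = float(np.sqrt(np.mean(signal ** 2)))
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    return loudness, float(freqs[np.argmax(spectrum)])

def clean(signal, gain=2.0, k=5):
    """Moving-average low-pass filter followed by amplification."""
    kernel = np.ones(k) / k
    return gain * np.convolve(signal, kernel, mode="same")
```

For a 50 Hz sine sampled at 1 kHz, `analyze` reports a dominant frequency of 50 Hz and an RMS near 1/sqrt(2).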
6. The intelligent speech training system according to claim 1, wherein the central processing module comprises a memory unit, a training unit and a comparison unit; the memory unit stores the image signals and sound electrical signals; the training unit is trained on a plurality of standard action image signals and the corresponding standard sound electrical signals for given speech articles; and the comparison unit compares the collected image signals and sound electrical signals against the standard action image signals and corresponding standard sound electrical signals.
7. The intelligent speech training system according to claim 6, wherein the training unit learns from actions and sounds prepared in advance for corresponding speech materials, different types of speech material corresponding to different gestures and vocal emotions; after training is complete, the unit can output standard guidance actions and sounds for different speech materials.
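After training, claim 7's unit behaves like a lookup from material type to standard guidance. A toy dictionary standing in for the trained model; the categories and guidance strings are invented for illustration and do not appear in the patent:

```python
# Hypothetical material-type -> guidance table standing in for the trained unit.
STANDARD_GUIDANCE = {
    "motivational": {"gesture": "open arms, raised chin", "tone": "energetic, rising pitch"},
    "technical":    {"gesture": "measured hand counts",   "tone": "even pace, neutral pitch"},
    "eulogy":       {"gesture": "stillness, lowered gaze", "tone": "soft, slow"},
}

def guidance_for(material_type):
    """Return the standard action/sound guidance for a speech material type,
    falling back to a neutral style for unknown types."""
    return STANDARD_GUIDANCE.get(material_type, STANDARD_GUIDANCE["technical"])
```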
8. The intelligent speech training system of claim 6, wherein the comparison unit compares the collected actual image and sound electrical signals with the standard image and sound electrical signals, marks the points of difference, generates a corresponding improvement suggestion for each, and transmits the improvement suggestions to the output module for output.
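The comparison step of claim 8 amounts to diffing actual against standard feature values and turning each out-of-tolerance dimension into a suggestion. A sketch with an invented tolerance and message format (the patent specifies neither):

```python
def compare(actual, standard, names, tol=0.1):
    """Flag feature dimensions where |actual - standard| exceeds tol and
    return a per-feature improvement suggestion (wording is illustrative)."""
    suggestions = []
    for a, s, name in zip(actual, standard, names):
        if abs(a - s) > tol:
            direction = "increase" if a < s else "reduce"
            suggestions.append(f"{direction} {name} (yours {a:.2f}, standard {s:.2f})")
    return suggestions
```

Features within tolerance produce no output, so only genuine points of difference reach the output module.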
9. The intelligent speech training system of claim 8, wherein the output module sends specific improvement suggestions to a display screen in front of the speaker and, via a Bluetooth module, sends related voice prompts to a Bluetooth headset worn by the speaker.
10. An intelligent speech training method, characterized by comprising the following steps:
collecting the speaker's facial expressions and body movements with a posture detector, and collecting the speaker's sound signal with a sound collection device;
connecting the image processing module to the posture detector to perform image signal extraction, image signal preprocessing, image signal feature extraction, direction analysis and intelligent tracking, and image information encoding and storage on the collected facial expression and body movement images;
converting, in the sound processing module, the sound signal into an electrical signal, filtering it with a filter to reduce noise, and amplifying it with an amplifier to obtain a clean electrical signal with spurious interference removed;
training a training unit in the central processing module in advance on a large amount of data, so that after training it can output standard guidance actions and sounds for different speech materials;
comparing the collected actual image and sound electrical signals with the standard image and sound electrical signals, and giving improvement suggestions for the points of difference;
delivering the improvement suggestions through the display screen in front of the speaker and the Bluetooth headset, so that the speaker receives the relevant prompts, adjusts his or her speaking state accordingly, and the speech becomes more natural and the level of delivery improves.
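The method steps of claim 10 can be strung together as one loop iteration. Everything below is a deliberately simplified placeholder (frame mean for expression, mean absolute amplitude for loudness, invented standard values and threshold), showing only the claimed sequence: process image and sound, compare against the trained standard, emit suggestions for the output devices:

```python
def run_session(frames, audio):
    """One training-loop iteration following the claimed method steps.
    frames: list of flattened pixel lists; audio: list of samples."""
    # Stand-in for image feature extraction: mean brightness per frame.
    features = [sum(f) / len(f) for f in frames]
    expr = sum(features) / len(features)
    # Stand-in for sound processing: mean absolute amplitude as loudness.
    loudness = sum(abs(x) for x in audio) / len(audio)
    # Stand-in for the pre-trained unit's standard guidance values.
    standard = {"expression": 0.5, "loudness": 0.6}
    suggestions = []
    if abs(expr - standard["expression"]) > 0.1:
        suggestions.append("adjust facial expression/body action")
    if abs(loudness - standard["loudness"]) > 0.1:
        suggestions.append("adjust speaking volume")
    return suggestions  # would be routed to the display screen and Bluetooth headset
```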
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010961200.9A CN112232127A (en) | 2020-09-14 | 2020-09-14 | Intelligent speech training system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112232127A true CN112232127A (en) | 2021-01-15 |
Family
ID=74116213
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010961200.9A Pending CN112232127A (en) | 2020-09-14 | 2020-09-14 | Intelligent speech training system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112232127A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103714248A (en) * | 2013-12-23 | 2014-04-09 | 青岛优维奥信息技术有限公司 | Training system for competitive speech |
CN204889399U (en) * | 2015-08-18 | 2015-12-23 | 蒋彬 | Intelligence body -building mirror |
CN106847263A (en) * | 2017-01-13 | 2017-06-13 | 科大讯飞股份有限公司 | Speech level evaluation method and apparatus and system |
CN106997243A (en) * | 2017-03-28 | 2017-08-01 | 北京光年无限科技有限公司 | Speech scene monitoring method and device based on intelligent robot |
CN206619289U (en) * | 2017-02-24 | 2017-11-07 | 绥化学院 | A kind of broadcaster's speech training device |
CN206991571U (en) * | 2017-05-17 | 2018-02-09 | 咸阳师范学院 | A kind of sound comparator |
CN108322865A (en) * | 2017-12-28 | 2018-07-24 | 广州华夏职业学院 | A kind of teaching private classroom speaker unit and application method |
CN108921284A (en) * | 2018-06-15 | 2018-11-30 | 山东大学 | Interpersonal interactive body language automatic generation method and system based on deep learning |
CN209962447U (en) * | 2019-03-26 | 2020-01-17 | 共赢时代有限公司 | Speech training device |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113257246A (en) * | 2021-04-19 | 2021-08-13 | 歌尔股份有限公司 | Prompting method, device, equipment, system and storage medium |
CN113257246B (en) * | 2021-04-19 | 2023-03-14 | 歌尔股份有限公司 | Prompting method, device, equipment, system and storage medium |
CN113411252A (en) * | 2021-06-22 | 2021-09-17 | 邓润阳 | Speech platform and speech method |
CN115629894A (en) * | 2022-12-21 | 2023-01-20 | 深圳市人马互动科技有限公司 | Speech prompting method and related device |
CN117787921A (en) * | 2024-02-27 | 2024-03-29 | 北京烽火万家科技有限公司 | Intelligent education training management method and identity anti-counterfeiting method for intelligent education training |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112232127A (en) | Intelligent speech training system and method | |
Ramakrishnan et al. | Speech emotion recognition approaches in human computer interaction | |
CN103996155A (en) | Intelligent interaction and psychological comfort robot service system | |
US10878818B2 (en) | Methods and apparatus for silent speech interface | |
Rosen et al. | Automatic speech recognition and a review of its functioning with dysarthric speech | |
Hennecke et al. | Visionary speech: Looking ahead to practical speechreading systems | |
CN108805087A (en) | Semantic temporal fusion association based on multi-modal Emotion identification system judges subsystem | |
Kandali et al. | Emotion recognition from Assamese speeches using MFCC features and GMM classifier | |
CN110931111A (en) | Autism auxiliary intervention system and method based on virtual reality and multi-mode information | |
CN103413113A (en) | Intelligent emotional interaction method for service robot | |
JPH08339446A (en) | Interactive system | |
CN114121006A (en) | Image output method, device, equipment and storage medium of virtual character | |
Freitas et al. | An introduction to silent speech interfaces | |
CN110444189A (en) | One kind is kept silent communication means, system and storage medium | |
Siriwardena et al. | The secret source: Incorporating source features to improve acoustic-to-articulatory speech inversion | |
Meltzner et al. | Speech recognition for vocalized and subvocal modes of production using surface EMG signals from the neck and face. | |
US20240220811A1 (en) | System and method for using gestures and expressions for controlling speech applications | |
Kim et al. | Preliminary test of a wireless magnetic tongue tracking system for silent speech interface | |
Ye et al. | Attention bidirectional LSTM networks based mime speech recognition using sEMG data | |
Kim et al. | Multiview Representation Learning via Deep CCA for Silent Speech Recognition. | |
Li et al. | Interpreting sign components from accelerometer and sEMG data for automatic sign language recognition | |
CN111627444A (en) | Chat system based on artificial intelligence | |
Freitas et al. | Multimodal silent speech interface based on video, depth, surface electromyography and ultrasonic doppler: Data collection and first recognition results | |
Zhao et al. | Realizing speech to gesture conversion by keyword spotting | |
CN109822587B (en) | Control method for head and neck device of voice diagnosis guide robot for factory and mine hospitals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||