CN112232127A - Intelligent speech training system and method - Google Patents

Intelligent speech training system and method

Info

Publication number
CN112232127A
CN112232127A (application CN202010961200.9A)
Authority
CN
China
Prior art keywords
image
sound
signal
speaker
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010961200.9A
Other languages
Chinese (zh)
Inventor
赵新博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning University Of International Business And Economics
Original Assignee
Liaoning University Of International Business And Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning University Of International Business And Economics filed Critical Liaoning University Of International Business And Economics
Priority to CN202010961200.9A
Publication of CN112232127A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

The invention discloses an intelligent speech training system. A human-body biological-state sensor collects the posture movements and voice of a speaker; an image processing module processes the captured posture movements to obtain clean image feature signals; a sound processing module converts the speaker's voice signal into an electrical signal and applies noise reduction and filtering; a central processing module compares the image feature signals and the voice electrical signal against pre-stored standard posture movements and speech delivery, and generates improvement suggestions, which are output through an output module and delivered to the speaker for timely correction. The invention addresses the problems that existing speech training lacks uniform standards, cannot provide standardized guidance, produces poor training results, and relies excessively on experienced instructors.

Description

Intelligent speech training system and method
Technical Field
The invention relates to the field of speech training, in particular to an intelligent speech training system and method.
Background
In traditional speech coaching, a judgment can be formed only by a person watching the speaker and observing the posture, voice volume and emotional intensity of the delivery. Moreover, different instructors hold different opinions, so a uniform standard cannot be formed and assessments easily diverge.
With the continuous development of science and technology and the introduction of information technology, computer technology and artificial intelligence, robotics research has gradually moved beyond the industrial field into medical care, health care, the home, entertainment, the service industry and other fields. People's expectations of robots have likewise risen from simple, repetitive mechanical actions to intelligent robots capable of anthropomorphic question answering, autonomy and interaction with other robots, and human-computer interaction has become an important factor in the development of such robots. Having a robot collect the speaker's state and compare it with a standard state is therefore becoming a development trend in speech training instruction.
Disclosure of Invention
Therefore, the invention provides an intelligent speech training system and method, aiming to solve the problems that existing speech training lacks uniform standards, cannot provide standardized guidance, produces poor training results, and relies excessively on experienced instructors.
In order to achieve the above purpose, the invention provides the following technical scheme:
according to the first aspect of the invention, an intelligent speech training system is disclosed. The system uses a human-body biological-state sensor to collect the posture movements and voice of a speaker; an image processing module processes the captured posture movements to obtain clean image feature signals; a sound processing module converts the speaker's voice signal into an electrical signal and performs noise reduction and filtering; a central processing module compares the image feature signals and the voice electrical signal with pre-stored standard posture movements and speech delivery and generates improvement suggestions; and the suggestions are output through an output module and delivered to the speaker for timely correction.
Further, the human-body biological-state sensor comprises a posture detector and a sound collection device. The posture detector is mounted horizontally on a stand 2-3 m in front of the speaker, with the stand set at a height of 1.6 m; the sound collection device is attached to the speaker's clothing or to the podium. The posture detector captures the speaker's facial expressions and body movements, and the sound collection device captures the speaker's voice signal.
Further, the image processing module is connected to the posture detector and performs, on the captured facial-expression and body-movement images, image signal extraction, image signal preprocessing, image feature extraction, direction analysis with intelligent tracking, and encoded storage of the image information.
Further, the image processing module applies noise reduction and feature enhancement to the recorded image signals to obtain relatively clean image feature vectors that satisfy the requirements of subsequent feature extraction. Image signal extraction and preprocessing are carried out mainly by the front-end camera equipment. Feature extraction pulls from the image sequence the feature information usable for target tracking. Direction analysis recognizes the direction of a given feature to judge its range and frequency of movement, while intelligent tracking records the motion trajectory of the feature target. Finally, the analyzed and tracked image information is encoded, which compresses the stored information and makes later retrieval by downstream stages of the system convenient.
Further, the sound collection device collects the speaker's voice and sends it to the sound processing module. The sound processing module converts the voice signal into an electrical signal and extracts the loudness, frequency, content, duration and the intervals between syllables; the electrical signal is filtered and denoised by a filter and amplified by an amplifier, yielding a clean electrical signal with interference removed.
Further, the central processing module contains a storage unit, a training unit and a comparison unit. The storage unit stores the graphic image signals and the voice electrical signals; the training unit is trained on the standard-action image signals and corresponding standard voice electrical signals of a large number of speech texts; and the comparison unit compares the collected image signals and voice electrical signals with the standard-action image signals and corresponding standard voice electrical signals.
Further, the training unit learns from the actions and voice deliveries corresponding to speech materials prepared in advance; different types of speech material correspond to different posture movements and vocal emotions, and once training is complete the unit can output standard guidance actions and delivery for any given speech material.
Further, the comparison unit compares the collected electrical signals of the actual image signals and voice with those of the standard image signals and voice, labels the points of difference, generates the corresponding improvement suggestions, and passes them to the output module for output.
Further, the output module sends the specific improvement suggestions to a display screen in front of the speaker and, via a Bluetooth module, sends the related voice prompts to a Bluetooth headset worn by the speaker.
According to a second aspect of the present invention, a method for intelligent speech training is disclosed, the method comprising:
collecting the speaker's facial expressions and body movements with a posture detector, and collecting the speaker's voice signal with a sound collection device;
connecting the image processing module to the posture detector to perform, on the captured facial-expression and body-movement images, image signal extraction, image signal preprocessing, image feature extraction, direction analysis with intelligent tracking, and encoded storage of the image information;
converting the voice signal into an electrical signal in the sound processing module, then filtering and denoising it with a filter and amplifying it with an amplifier to obtain a clean electrical signal with interference removed;
training a training unit in the central processing module in advance on a large amount of data, after which it can output standard guidance actions and delivery for different speech materials;
comparing the collected electrical signals of the actual image signals and voice with those of the standard image signals and voice, and generating improvement suggestions for the points of difference;
delivering the improvement suggestions through the display screen in front of the speaker and through a Bluetooth headset, so that the speaker receives the prompts and adjusts the speaking state in training, making the delivery more standard and raising the speaking level.
The invention has the following advantages:
the invention discloses an intelligent speech training system and method, which are characterized in that attitude actions and sound signals of a speaker are collected, a central processing module is used for comparing the processed attitude actions and sound signals with standard actions and sound, different places are marked, improvement suggestions are output, and the improvement suggestions are fed back to the speaker through a display screen and a Bluetooth headset. The lecturer adjusts according to the given improvement suggestion, has better guiding significance, improves the lecture level of the lecturer, and can form a personalized guidance scheme aiming at different lecture contents.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are merely exemplary, and those of ordinary skill in the art can derive other embodiments from them without inventive effort.
The structures, proportions and sizes shown in this specification serve only to accompany the disclosure so that those skilled in the art can understand and read it; they are not intended to limit the conditions under which the invention can be implemented and therefore carry no substantive technical significance. Any structural modification, change of proportion or adjustment of size that does not affect the effects and objectives achievable by the invention still falls within the scope covered by the technical content disclosed herein.
Fig. 1 is a flowchart of an intelligent speech training system according to an embodiment of the present invention;
fig. 2 is a schematic diagram of hardware connection of an intelligent speech training system according to an embodiment of the present invention.
Detailed Description
The present invention is described below by way of particular embodiments; other advantages and effects of the invention will be readily apparent to those skilled in the art from this disclosure. It should be understood that the described embodiments are merely a part of the embodiments of the invention, not all of them, and are not intended to limit the invention to the particular forms disclosed. All other embodiments obtained by a person of ordinary skill in the art from the embodiments given herein without creative effort fall within the protection scope of the present invention.
Example 1
This embodiment discloses an intelligent speech training system. The system uses a human-body biological-state sensor to collect the posture movements and voice of a speaker; an image processing module processes the captured posture movements to obtain clean image feature signals; a sound processing module converts the speaker's voice signal into an electrical signal and performs noise reduction and filtering; a central processing module compares the image feature signals and the voice electrical signal with pre-stored standard posture movements and speech delivery, and generates improvement suggestions, which are output through an output module and delivered to the speaker for timely correction.
The human-body biological-state sensor comprises a posture detector and a sound collection device. The posture detector is mounted horizontally on a stand 2-3 m in front of the speaker, with the stand set at a height of 1.6 m; the sound collection device is attached to the speaker's clothing or to the podium. The posture detector captures the speaker's facial expressions and body movements, and the sound collection device captures the speaker's voice signal. The information captured by the posture detector covers facial expressions and body actions: facial expressions such as smiling, anger, joy, sadness and excitement, and body actions such as waving, clenching a fist and clapping. The sound collection device records the spoken content of the speech; the speaker's vocal state may be, for example, rousing, subdued, gentle or cheerful. The collected information is sent to the memory for storage.
The image processing module is connected to the posture detector and performs, on the captured facial-expression and body-movement images, image signal extraction, image signal preprocessing, image feature extraction, direction analysis with intelligent tracking, and encoded storage of the image information. The module applies noise reduction and feature enhancement to the recorded image signals to obtain relatively clean image feature vectors that satisfy the requirements of subsequent feature extraction. Image signal extraction and preprocessing are carried out mainly by the front-end camera equipment. Feature extraction pulls from the image sequence the feature information usable for target tracking. Direction analysis recognizes the direction of a given feature to judge its range and frequency of movement, while intelligent tracking records the motion trajectory of the feature target. Finally, the analyzed and tracked image information is encoded, which compresses the stored information and makes later retrieval convenient.
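As an illustration of the image-processing stages above, the following is a minimal Python sketch (not the patent's implementation) built on OpenCV: it denoises each frame, extracts corner features usable for tracking, and records the motion trajectory of each feature target with Lucas-Kanade optical flow. The video path, blur kernel and corner-detection parameters are assumptions made for the example.

import cv2
import numpy as np

def track_features(video_path: str, max_corners: int = 50):
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        raise IOError(f"cannot read {video_path}")
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Noise reduction before feature extraction (Gaussian blur as a stand-in).
    prev_gray = cv2.GaussianBlur(prev_gray, (5, 5), 0)
    # Feature extraction: corner points usable for target tracking.
    points = cv2.goodFeaturesToTrack(prev_gray, max_corners, 0.01, 10)
    trajectories = [[tuple(p.ravel())] for p in points]
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (5, 5), 0)
        # Direction analysis / intelligent tracking via Lucas-Kanade optical flow.
        new_points, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)
        for traj, p, tracked in zip(trajectories, new_points, status.ravel()):
            if tracked:
                traj.append(tuple(p.ravel()))
        prev_gray, points = gray, new_points
    cap.release()
    return trajectories  # the recorded motion trail of each feature target

Each returned trajectory can then be summarized (range and frequency of movement) and encoded for compact storage, in line with the stages described above.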
After collecting the speaker's voice, the sound collection device sends it to the sound processing module, which builds an acoustic model. The purpose of the acoustic model is to provide an effective way to compute the distance between the feature-vector sequence of the speech and each pronunciation template. Acoustic model design is closely tied to the characteristics of speech production. The size of the acoustic modeling unit (word, semi-syllable or phoneme) strongly influences the amount of training data required, the recognition rate and the flexibility of the system, so the unit size must be chosen according to the characteristics of the language and the vocabulary size of the recognizer. The commonly used units are initials, finals, syllables or words, selected according to the purpose of the implementation. Mandarin Chinese has 412 base syllables (including the neutral tone) and 1,282 tonal syllables; therefore whole words are often chosen as units for small-vocabulary isolated-word recognition, syllables or initials and finals for large-vocabulary recognition, and initial/final modeling for continuous speech, where co-articulation effects dominate. The common statistics-based speech recognition model is the HMM λ = (N, M, π, A, B); the related theory covers model structure selection, model initialization, re-estimation of model parameters, and the corresponding recognition algorithms.
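The HMM λ = (N, M, π, A, B) mentioned above can be scored against an observation sequence with the forward algorithm. The following numpy sketch shows the scaled forward recursion for a discrete HMM; all parameter values are illustrative toy numbers, not taken from the patent.

import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """log P(obs | λ) for a discrete HMM with N states and M symbols."""
    alpha = pi * B[:, obs[0]]          # initialization
    scale = alpha.sum()
    alpha /= scale                     # scale to avoid numerical underflow
    log_like = np.log(scale)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction step
        scale = alpha.sum()
        alpha /= scale
        log_like += np.log(scale)
    return log_like                    # sum of log scales equals log P(obs | λ)

# Toy λ with N = 2 states and M = 3 observation symbols.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
obs = [0, 1, 2, 1]                     # quantised acoustic feature sequence
print(forward_log_likelihood(pi, A, B, obs))

In a template-matching recognizer of the kind described, the pronunciation template whose HMM gives the highest log-likelihood for the incoming feature sequence would be selected.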
The sound processing module converts the voice signal into an electrical signal and extracts the loudness, frequency, content, duration and the intervals between syllables; the electrical signal is filtered and denoised by a filter and amplified by an amplifier, yielding a clean electrical signal with interference removed.
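As a hedged sketch of this stage (the filter band, sampling rate and feature set are assumptions, not the patent's specification), the following Python function band-pass filters a recorded signal with scipy and then reads off loudness (RMS), dominant frequency and duration from the cleaned waveform:

import numpy as np
from scipy.signal import butter, filtfilt

def clean_and_describe(signal: np.ndarray, fs: int = 16000):
    # Band-pass 80-4000 Hz: roughly the band carrying speech energy.
    b, a = butter(4, [80 / (fs / 2), 4000 / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, signal)            # zero-phase noise filtering
    loudness = np.sqrt(np.mean(filtered ** 2))   # RMS loudness
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(len(filtered), 1 / fs)
    dominant = freqs[np.argmax(spectrum)]        # strongest frequency component
    duration = len(signal) / fs                  # seconds
    return filtered, {"loudness": loudness,
                      "dominant_hz": dominant,
                      "duration_s": duration}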
The central processing module contains a storage unit, a training unit and a comparison unit. The storage unit stores the graphic image signals and the voice electrical signals; the training unit uses the standard-action image signals and corresponding standard voice electrical signals of a large number of speech texts; and the comparison unit compares the collected image signals and voice electrical signals against the standard ones.
The training unit may adopt a convolutional neural network model. It learns from the actions and voice deliveries corresponding to speech materials prepared in advance; different types of speech material correspond to different posture movements and vocal emotions, and once training is complete the unit can output standard guidance actions and delivery for any given speech material. The training data come from the body movements, facial expressions, vocal variation and emotional changes of expert speakers across different speech materials. After extensive training, the unit can output the standard posture movements and delivery corresponding to a given speech script.
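A minimal sketch of the kind of convolutional network the training unit could adopt is given below, assuming PyTorch. The layer sizes, the 64x64 input resolution and the five gesture classes are illustrative assumptions, and a single training step on random stand-in tensors takes the place of the expert-speaker dataset described above.

import torch
import torch.nn as nn

class GestureCNN(nn.Module):
    def __init__(self, num_classes: int = 5):  # e.g. wave, fist, clap, ...
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):                       # x: (batch, 3, 64, 64)
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = GestureCNN()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
# One illustrative training step on random stand-in data.
images = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 5, (8,))
loss = loss_fn(model(images), labels)
optimiser.zero_grad()
loss.backward()
optimiser.step()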
The comparison unit compares the collected electrical signals of the actual image signals and voice with those of the standard image signals and voice. The comparison covers, for movements: the amplitude of each action, the gesture itself, and the moment, period and frequency with which actions are made, as well as changes of facial expression down to the micro-movements of the facial features; for voice: the pitch, frequency, amplitude, loudness and emotional intensity. The unit labels the points of difference, generates the corresponding improvement suggestions, and passes them to the output module for output.
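The comparison logic might be sketched as follows; the feature names, the relative-deviation measure and the 15% tolerance are assumptions chosen for illustration, not values from the patent.

def compare_to_standard(actual: dict, standard: dict, tolerance: float = 0.15):
    suggestions = []
    for feature, target in standard.items():
        value = actual.get(feature)
        if value is None or target == 0:
            continue
        deviation = (value - target) / target
        if abs(deviation) > tolerance:          # label the point of difference
            direction = "increase" if deviation < 0 else "reduce"
            suggestions.append(
                f"{feature}: measured {value:.2f}, standard {target:.2f}; "
                f"suggest you {direction} it."
            )
    return suggestions

# Example: loudness and gesture frequency compared against the template.
print(compare_to_standard(
    {"loudness": 0.32, "gesture_rate_per_min": 2.0},
    {"loudness": 0.50, "gesture_rate_per_min": 4.0},
))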
The output module sends the specific improvement suggestions to a display screen in front of the speaker and, via a Bluetooth module, sends the related voice prompts to a Bluetooth headset worn by the speaker. The display screen shows, as text, the points that need attention and improvement, while the Bluetooth headset delivers the related suggestions and instructions by voice. The speaker can thus adjust the speaking state in time and keep training, so that the speaking level improves.
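As a final hedged sketch, the output stage could be approximated as below, assuming the pyttsx3 text-to-speech package for the voice prompt; routing the audio to a paired Bluetooth headset is left to the operating system, and a print statement stands in for the display screen.

import pyttsx3

def deliver_suggestions(suggestions: list) -> None:
    engine = pyttsx3.init()
    for tip in suggestions:
        print(f"[DISPLAY] {tip}")  # text prompt for the on-stage screen
        engine.say(tip)            # spoken prompt for the worn headset
    engine.runAndWait()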
Example 2
The embodiment discloses an intelligent speech training method, which comprises the following steps:
collecting the speaker's facial expressions and body movements with a posture detector, and collecting the speaker's voice signal with a sound collection device;
connecting the image processing module to the posture detector to perform, on the captured facial-expression and body-movement images, image signal extraction, image signal preprocessing, image feature extraction, direction analysis with intelligent tracking, and encoded storage of the image information;
converting the voice signal into an electrical signal in the sound processing module, then filtering and denoising it with a filter and amplifying it with an amplifier to obtain a clean electrical signal with interference removed;
training a training unit in the central processing module in advance on a large amount of data, after which it can output standard guidance actions and delivery for different speech materials;
comparing the collected electrical signals of the actual image signals and voice with those of the standard image signals and voice, and generating improvement suggestions for the points of difference;
delivering the improvement suggestions through the display screen in front of the speaker and through a Bluetooth headset, so that the speaker receives the prompts and adjusts the speaking state in training, making the delivery more standard and raising the speaking level.
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (10)

1. An intelligent speech training system, characterized in that the system uses a human-body biological-state sensor to collect the posture movements and voice of a speaker; an image processing module processes the captured posture movements to obtain clean image feature signals; a sound processing module converts the speaker's voice signal into an electrical signal and performs noise reduction and filtering; a central processing module compares the image feature signals and the voice electrical signal with pre-stored standard posture movements and speech delivery and generates improvement suggestions; and the improvement suggestions are output through an output module and delivered to the speaker for timely correction.
2. The intelligent speech training system of claim 1, wherein the human-body biological-state sensor comprises a posture detector and a sound collection device; the posture detector is mounted horizontally on a stand 2-3 m in front of the speaker, with the stand set at a height of 1.6 m; the sound collection device is attached to the speaker's clothing or to the podium; the posture detector captures the speaker's facial expressions and body movements, and the sound collection device captures the speaker's voice signal.
3. The intelligent speech training system of claim 1, wherein the image processing module is connected to the posture detector and performs, on the captured facial-expression and body-movement images, image signal extraction, image signal preprocessing, image feature extraction, direction analysis with intelligent tracking, and encoded storage of the image information.
4. The intelligent speech training system of claim 3, wherein the image processing module applies noise reduction and feature enhancement to the recorded image signals to obtain relatively clean image feature vectors that satisfy the requirements of subsequent feature extraction; image signal extraction and preprocessing are carried out mainly by the front-end camera equipment; feature extraction pulls from the image sequence the feature information usable for target tracking; direction analysis recognizes the direction of a given feature to judge its range and frequency of movement, while intelligent tracking records the motion trajectory of the feature target; and the analyzed and tracked image information is encoded, compressing the stored information and making later retrieval convenient.
5. The intelligent speech training system of claim 1, wherein the sound collection device collects the speaker's voice and sends it to the sound processing module; the sound processing module converts the voice signal into an electrical signal and extracts the loudness, frequency, content, duration and the intervals between syllables; and the electrical signal is filtered and denoised by a filter and amplified by an amplifier, yielding a clean electrical signal with interference removed.
6. The intelligent speech training system of claim 1, wherein the central processing module contains a storage unit, a training unit and a comparison unit; the storage unit stores the graphic image signals and the voice electrical signals; the training unit uses the standard-action image signals and corresponding standard voice electrical signals of a large number of speech texts; and the comparison unit compares the collected image signals and voice electrical signals with the standard-action image signals and corresponding standard voice electrical signals.
7. The intelligent speech training system of claim 6, wherein the training unit learns from the actions and voice deliveries corresponding to speech materials prepared in advance; different types of speech material correspond to different posture movements and vocal emotions; and once training is complete the unit can output standard guidance actions and delivery for any given speech material.
8. The intelligent speech training system of claim 6, wherein the comparison unit compares the collected electrical signals of the actual image signals and voice with those of the standard image signals and voice, labels the points of difference, generates the corresponding improvement suggestions, and passes them to the output module for output.
9. The intelligent speech training system of claim 8, wherein the output module sends the specific improvement suggestions to a display screen in front of the speaker and, via a Bluetooth module, sends the related voice prompts to a Bluetooth headset worn by the speaker.
10. An intelligent speech training method is characterized by comprising the following steps:
collecting the speaker's facial expressions and body movements with a posture detector, and collecting the speaker's voice signal with a sound collection device;
connecting the image processing module to the posture detector to perform, on the captured facial-expression and body-movement images, image signal extraction, image signal preprocessing, image feature extraction, direction analysis with intelligent tracking, and encoded storage of the image information;
converting the voice signal into an electrical signal in the sound processing module, then filtering and denoising it with a filter and amplifying it with an amplifier to obtain a clean electrical signal with interference removed;
training a training unit in the central processing module in advance on a large amount of data, after which it can output standard guidance actions and delivery for different speech materials;
comparing the collected electrical signals of the actual image signals and voice with those of the standard image signals and voice, and generating improvement suggestions for the points of difference;
delivering the improvement suggestions through the display screen in front of the speaker and through a Bluetooth headset, so that the speaker receives the prompts and adjusts the speaking state in training, making the delivery more standard and raising the speaking level.
CN202010961200.9A 2020-09-14 2020-09-14 Intelligent speech training system and method Pending CN112232127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010961200.9A CN112232127A (en) 2020-09-14 2020-09-14 Intelligent speech training system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010961200.9A CN112232127A (en) 2020-09-14 2020-09-14 Intelligent speech training system and method

Publications (1)

Publication Number Publication Date
CN112232127A (en) 2021-01-15

Family

ID=74116213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010961200.9A Pending CN112232127A (en) 2020-09-14 2020-09-14 Intelligent speech training system and method

Country Status (1)

Country Link
CN (1) CN112232127A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113257246A (en) * 2021-04-19 2021-08-13 歌尔股份有限公司 Prompting method, device, equipment, system and storage medium
CN113411252A (en) * 2021-06-22 2021-09-17 邓润阳 Speech platform and speech method
CN115629894A (en) * 2022-12-21 2023-01-20 深圳市人马互动科技有限公司 Speech prompting method and related device
CN117787921A (en) * 2024-02-27 2024-03-29 北京烽火万家科技有限公司 Intelligent education training management method and identity anti-counterfeiting method for intelligent education training

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103714248A (en) * 2013-12-23 2014-04-09 青岛优维奥信息技术有限公司 Training system for competitive speech
CN204889399U (*) 2015-08-18 2015-12-23 蒋彬 Intelligence body-building mirror
CN106847263A (en) * 2017-01-13 2017-06-13 科大讯飞股份有限公司 Speech level evaluation method and apparatus and system
CN106997243A (en) * 2017-03-28 2017-08-01 北京光年无限科技有限公司 Speech scene monitoring method and device based on intelligent robot
CN206619289U (en) * 2017-02-24 2017-11-07 绥化学院 A kind of broadcaster's speech training device
CN206991571U (en) * 2017-05-17 2018-02-09 咸阳师范学院 A kind of sound comparator
CN108322865A (en) * 2017-12-28 2018-07-24 广州华夏职业学院 A kind of teaching private classroom speaker unit and application method
CN108921284A (en) * 2018-06-15 2018-11-30 山东大学 Interpersonal interactive body language automatic generation method and system based on deep learning
CN209962447U (en) * 2019-03-26 2020-01-17 共赢时代有限公司 Speech training device

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103714248A (en) * 2013-12-23 2014-04-09 青岛优维奥信息技术有限公司 Training system for competitive speech
CN204889399U (*) 2015-08-18 2015-12-23 蒋彬 Intelligence body-building mirror
CN106847263A (en) * 2017-01-13 2017-06-13 科大讯飞股份有限公司 Speech level evaluation method and apparatus and system
CN206619289U (en) * 2017-02-24 2017-11-07 绥化学院 A kind of broadcaster's speech training device
CN106997243A (en) * 2017-03-28 2017-08-01 北京光年无限科技有限公司 Speech scene monitoring method and device based on intelligent robot
CN206991571U (en) * 2017-05-17 2018-02-09 咸阳师范学院 A kind of sound comparator
CN108322865A (en) * 2017-12-28 2018-07-24 广州华夏职业学院 A kind of teaching private classroom speaker unit and application method
CN108921284A (en) * 2018-06-15 2018-11-30 山东大学 Interpersonal interactive body language automatic generation method and system based on deep learning
CN209962447U (en) * 2019-03-26 2020-01-17 共赢时代有限公司 Speech training device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113257246A (en) * 2021-04-19 2021-08-13 歌尔股份有限公司 Prompting method, device, equipment, system and storage medium
CN113257246B (en) * 2021-04-19 2023-03-14 歌尔股份有限公司 Prompting method, device, equipment, system and storage medium
CN113411252A (en) * 2021-06-22 2021-09-17 邓润阳 Speech platform and speech method
CN115629894A (en) * 2022-12-21 2023-01-20 深圳市人马互动科技有限公司 Speech prompting method and related device
CN117787921A (en) * 2024-02-27 2024-03-29 北京烽火万家科技有限公司 Intelligent education training management method and identity anti-counterfeiting method for intelligent education training

Similar Documents

Publication Publication Date Title
CN112232127A (en) Intelligent speech training system and method
Lee et al. Biosignal sensors and deep learning-based speech recognition: A review
CN110992987B (en) Parallel feature extraction system and method for general specific voice in voice signal
Rosen et al. Automatic speech recognition and a review of its functioning with dysarthric speech
US10878818B2 (en) Methods and apparatus for silent speech interface
CN108805087A (en) Semantic temporal fusion association based on multi-modal Emotion identification system judges subsystem
CN103413113A (en) Intelligent emotional interaction method for service robot
CN107972028A (en) Man-machine interaction method, device and electronic equipment
CN105807925A (en) Flexible electronic skin based lip language identification system and method
CN103366618A (en) Scene device for Chinese learning training based on artificial intelligence and virtual reality
Freitas et al. An introduction to silent speech interfaces
CN108052250A (en) Virtual idol deductive data processing method and system based on multi-modal interaction
CN115330911A (en) Method and system for driving mimicry expression by using audio
CN110444189A (en) One kind is kept silent communication means, system and storage medium
CN114121006A (en) Image output method, device, equipment and storage medium of virtual character
Kim et al. Preliminary test of a wireless magnetic tongue tracking system for silent speech interface
Ye et al. Attention bidirectional LSTM networks based mime speech recognition using sEMG data
Siriwardena et al. The secret source: Incorporating source features to improve acoustic-to-articulatory speech inversion
Freitas et al. Multimodal silent speech interface based on video, depth, surface electromyography and ultrasonic doppler: Data collection and first recognition results
Zhao et al. Realizing speech to gesture conversion by keyword spotting
Busso et al. Joint analysis of the emotional fingerprint in the face and speech: A single subject study
Iribe et al. Improvement of animated articulatory gesture extracted from speech for pronunciation training
CN109822587B (en) Control method for head and neck device of voice diagnosis guide robot for factory and mine hospitals
CN108109614A (en) A kind of new robot band noisy speech identification device and method
Vo et al. Automatic vowel sequence reproduction for a talking robot based on PARCOR coefficient template matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination