CN111951629A - Pronunciation correction system, method, medium and computing device


Info

Publication number
CN111951629A
Authority
CN
China
Prior art keywords
pronunciation
user
pronunciation correction
correction
image
Prior art date
Legal status
Pending
Application number
CN201910408726.1A
Other languages
Chinese (zh)
Inventor
刘晨晨
沈欣尧
崔守首
胡太
孙怿
余津锐
刘阿猛
纪阳
Current Assignee
Shanghai Liulishuo Information Technology Co ltd
Original Assignee
Shanghai Liulishuo Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Liulishuo Information Technology Co ltd
Priority to CN201910408726.1A
Publication of CN111951629A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B7/00 - Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B7/02 - Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

An embodiment of the invention provides a pronunciation correction system. The system comprises a test module configured to output test content determined from the user's learning data and to collect first test data that the user inputs in response to that content, the first test data comprising facial image data and voice data; and a pronunciation correction module configured to extract the user's pronunciation features from the first test data, match a corresponding pronunciation correction strategy, and feed pronunciation correction content back to the user based on that strategy, the correction content indicating the type of pronunciation difference and the corresponding correction strategy. Because a targeted pronunciation correction strategy is derived from the user's test data and matched with corresponding correction content, the system delivers targeted correction feedback, corrects the user's pronunciation problems, improves the correction effect, and improves the user's learning experience. The invention further provides a pronunciation correction method, a medium and a computing device.

Description

Pronunciation correction system, method, medium and computing device
Technical Field
Embodiments of the present invention relate to the field of software, and more particularly to a pronunciation correction system, method, medium, and computing device.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
Pronunciation is one of the key skills in language learning. In general, learners of any language can improve their pronunciation by reading aloud, follow-along reading and similar exercises. In most cases, however, a learner cannot tell whether his or her own pronunciation is accurate. Traditional in-person instruction can evaluate a learner's pronunciation, but the evaluation is limited by the instructor's ability and cannot fully and accurately reflect the learner's pronunciation problems.
Currently, the pronunciation evaluation/correction functions in existing language learning software or terminals record the user's voice and return an evaluation result that merely tells the user whether the pronunciation is accurate. The result is limited in content, typically a single score of pronunciation accuracy, and cannot correct specific pronunciation errors, so it lacks pertinence. In other words, existing pronunciation evaluation cannot expose the weak links in most users' pronunciation, existing correction functions cannot train users on those weak links, and users' pronunciation, reading ability and fluency are therefore difficult to improve.
Therefore, there is a need for an improved pronunciation correction scheme to solve the above-mentioned technical problems.
Disclosure of Invention
The pronunciation evaluation result produced by the evaluation/correction functions in current language learning software or tools is limited in content and cannot indicate the specific errors in a user's pronunciation, so it lacks pertinence; that is, current pronunciation evaluation cannot expose the weak links in most users' pronunciation, current correction functions cannot train users on those weak links, and users' pronunciation, reading ability and fluency are difficult to improve. An improved pronunciation correction solution is therefore highly needed to solve the above technical problems.
In this context, embodiments of the present invention are intended to provide a pronunciation correction system, method, medium, and computing device.
In a first aspect of embodiments of the present invention, there is provided a pronunciation correction system comprising: a test module configured to output test contents determined based on the user learning data; acquiring first test data input by a user according to test contents, wherein the first test data comprises facial image data and voice data;
a pronunciation correction module configured to extract user pronunciation characteristics from the first test data and match a corresponding pronunciation correction strategy; and feeding back pronunciation correction contents to the user based on the pronunciation correction strategy, wherein the pronunciation correction contents are used for indicating the pronunciation difference type and the corresponding pronunciation correction strategy.
In one embodiment of the invention, the test module is further configured to: after the pronunciation correction module feeds back pronunciation correction contents to the user based on the pronunciation correction strategy, collecting second test data input by the user according to the pronunciation correction contents, wherein the second test data comprises facial image data and voice data; and obtaining an exercise result based on the second test data and a preset exercise strategy.
In one embodiment of the invention, the test module is further configured to: before the first test data or the second test data is acquired, whether the image acquisition device identifies a face region of the user is determined.
When the test module collects the first test data or the second test data, the test module is specifically configured to: if the image acquisition equipment identifies the face area of the user, outputting a test starting instruction to the user; and/or triggering the image acquisition equipment to start image data acquisition on the face area of the user to obtain face image data, wherein the face image data comprises image data corresponding to the facial action generated by the user based on the test content or pronunciation correction content.
In one embodiment of the invention, the test module is further configured to: after judging whether the image acquisition equipment identifies the face area of the user, if the image acquisition equipment does not identify the face area of the user, outputting an identification result; and/or indicating to the user through the adjusting instruction to adjust the relative position between the face area of the user and the image acquisition equipment.
In one embodiment of the invention, the facial image data includes mouth shape data and/or facial key point data.
In one embodiment of the invention, the user articulation feature comprises an articulation image feature. When the pronunciation correction module extracts the user pronunciation characteristics from the first test data and matches the corresponding pronunciation correction strategy, the pronunciation correction module is specifically configured to: extracting pronunciation image features from at least one image frame corresponding to the facial image data, wherein the pronunciation image features comprise user pronunciation mouth shape features corresponding to preset phonetic symbols; comparing the user pronunciation mouth shape characteristics corresponding to the preset phonetic symbols with the standard mouth shape characteristics corresponding to the preset phonetic symbols and matching the corresponding mouth shape difference types; and setting the mouth shape difference correction strategy corresponding to the mouth shape difference type as a pronunciation correction strategy.
In one embodiment of the present invention, the pronunciation correction content includes a pronunciation correction image. When pushing pronunciation correction content to the user based on the pronunciation correction strategy, the pronunciation correction module is specifically configured to: simulate, based on the mouth shape difference correction strategy and the user facial features extracted from the facial image data, the mouth shape image the user would present when pronouncing the preset phonetic symbol correctly, and use it as a mouth shape correction guide image; and push the simulated guide image to the user as pronunciation correction content.
In one embodiment of the present invention, the pronunciation correction module is further configured to: before collecting pronunciation correction data input by a user according to pronunciation correction contents, judging whether the image collection equipment identifies a user face area; the pronunciation correction module is specifically configured to, when collecting pronunciation correction data input by a user according to pronunciation correction contents and determining a pronunciation correction result: if the image acquisition equipment identifies the face area of the user, outputting an image acquisition starting instruction to the user; and/or triggering image acquisition equipment to start image data acquisition on a user face area so as to take the obtained face image data as pronunciation correction data; and comparing the user pronunciation mouth shape features extracted from the pronunciation correction data with the standard mouth shape features and matching corresponding pronunciation correction results.
In a second aspect of the embodiments of the present invention, there is also provided a pronunciation correction method, including: outputting test contents determined based on the user learning data; collecting first test data input by a user according to test contents and determining a pronunciation correction strategy, wherein the first test data comprises facial image data and voice data; extracting user pronunciation characteristics from the first test data and matching corresponding pronunciation correction strategies; and feeding back pronunciation correction contents to the user based on the pronunciation correction strategy, wherein the pronunciation correction contents are used for indicating the pronunciation difference type and the corresponding pronunciation correction strategy.
In an embodiment of the present invention, after feeding back pronunciation correction content to the user based on the pronunciation correction strategy, the method further includes: collecting second test data input by a user according to pronunciation correction content, wherein the second test data comprises facial image data and voice data; and obtaining an exercise result based on the second test data and a preset exercise strategy.
In an embodiment of the present invention, before acquiring the first test data or the second test data, the method further includes: it is determined whether the image capture device identifies a user's facial region. Collecting first test data or second test data, comprising: if the image acquisition equipment identifies the face area of the user, outputting a test starting instruction to the user; and/or triggering the image acquisition equipment to start image data acquisition on the face area of the user to obtain face image data, wherein the face image data comprises image data corresponding to the facial action generated by the user based on the test content or pronunciation correction content.
In one embodiment of the present invention, after determining whether the image capturing device identifies the face area of the user, the method further includes: if the image acquisition equipment does not identify the face area of the user, outputting an identification result; and/or indicating to the user through the adjusting instruction to adjust the relative position between the face area of the user and the image acquisition equipment.
In one embodiment of the invention, the facial image data includes mouth shape data and/or facial key point data.
In one embodiment of the invention, the user articulation feature comprises an articulation image feature. Extracting user pronunciation characteristics from the first test data and matching corresponding pronunciation correction strategies, comprising: extracting pronunciation image features from at least one image frame corresponding to the facial image data, wherein the pronunciation image features comprise user pronunciation mouth shape features corresponding to preset phonetic symbols; comparing the user pronunciation mouth shape characteristics corresponding to the preset phonetic symbols with the standard mouth shape characteristics corresponding to the preset phonetic symbols and matching the corresponding mouth shape difference types; and setting the mouth shape difference correction strategy corresponding to the mouth shape difference type as a pronunciation correction strategy.
In one embodiment of the present invention, the pronunciation correction content includes a pronunciation correction image. Based on a mouth shape difference correction strategy and user facial features extracted from facial image data, simulating a mouth shape image when the user correctly pronounces a preset phonetic symbol as a mouth shape correction guide image; and pushing the simulated mouth shape correction guide image as pronunciation correction content to the user.
In an embodiment of the present invention, before collecting pronunciation correction data input by a user according to pronunciation correction contents, the method further includes: it is determined whether the image capture device identifies a user's facial region. Collecting pronunciation correction data input by a user according to pronunciation correction contents and determining a pronunciation correction result, comprising: if the image acquisition equipment identifies the face area of the user, outputting an image acquisition starting instruction to the user; and/or triggering image acquisition equipment to start image data acquisition on a user face area so as to take the obtained face image data as pronunciation correction data; and comparing the user pronunciation mouth shape features extracted from the pronunciation correction data with the standard mouth shape features and matching corresponding pronunciation correction results.
In a third aspect of embodiments of the present invention, there is provided a medium storing computer-executable instructions for causing a computer to perform the functions of any one of the module configurations of the pronunciation correction system as described in the first aspect, or to perform the method of any one of the embodiments of the second aspect.
In a fourth aspect of embodiments of the present invention, there is provided a computing device comprising a processing unit, a memory, and an input/output (I/O) interface; the memory stores programs or instructions for execution by the processing unit; the processing unit executes, according to the programs or instructions stored in the memory, the functions configured for any module of the pronunciation correction system of the first aspect, or the method of any embodiment of the second aspect; and the I/O interface receives or transmits data under control of the processing unit.
According to the technical solution provided by embodiments of the invention, a targeted pronunciation correction strategy is derived from the user's test data and matched with corresponding correction content, realizing targeted correction feedback: the user's pronunciation problems are corrected, the correction effect improves, and so does the user's learning experience. In addition, compared with current pronunciation evaluation schemes, the embodiments consume fewer computing resources and are better suited to mobile terminal devices.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
fig. 1 schematically shows a schematic structural diagram of an application scenario according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a structure of a pronunciation correction system according to an embodiment of the present invention;
FIG. 3A is a schematic diagram illustrating a structure of a user interface for presenting pronunciation tests according to an embodiment of the invention;
FIG. 3B is a schematic diagram illustrating another user interface for presenting pronunciation tests, according to an embodiment of the invention;
FIG. 4 is a flow chart of a pronunciation correction strategy acquisition method according to an embodiment of the invention;
FIG. 5 is a schematic diagram illustrating a structure of a user interface for presenting pronunciation correction content according to an embodiment of the present invention;
FIG. 6 is a flow chart illustrating a pronunciation correction method according to an embodiment of the present invention;
FIG. 7 schematically shows a schematic structural diagram of a medium according to an embodiment of the invention;
FIG. 8 schematically illustrates a structural diagram of a computing device in accordance with an embodiment of the present invention;
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to an embodiment of the present invention, a pronunciation correction system, method, medium, and computing device are provided.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention. Moreover, any number of elements in the drawings is intended to be illustrative rather than restrictive, and any naming of elements in the drawings is for distinction only and carries no limiting sense.
Summary of The Invention
The inventor has found that the pronunciation evaluation result produced by the evaluation/correction functions in current language learning software or tools is limited in content, usually a bare evaluation of the user's pronunciation accuracy, and cannot correct the specific errors in the user's pronunciation, so the result lacks pertinence. That is, existing pronunciation evaluation cannot expose the weak links in most users' pronunciation, existing correction functions cannot train users on those weak links, and users' pronunciation, reading ability and fluency are therefore difficult to improve.
To overcome the problems of the prior art, the present invention proposes a pronunciation correction system, method, medium, and computing device. The system comprises: the testing module is configured to output testing content determined based on the user learning data, and collect first testing data input by a user according to the testing content, wherein the first testing data comprises face image data and voice data; the pronunciation correction module is configured to extract user pronunciation characteristics from the first test data and match a corresponding pronunciation correction strategy, and feed back pronunciation correction content to the user based on the pronunciation correction strategy, wherein the pronunciation correction content is used for indicating a pronunciation difference type and the corresponding pronunciation correction strategy.
With this pronunciation correction system, a targeted pronunciation correction strategy is derived from the user's test data and matched with corresponding correction content, realizing targeted correction feedback: the user's pronunciation problems are corrected, the correction effect improves, and so does the user's learning experience. In addition, compared with current pronunciation evaluation schemes, the embodiments of the invention consume fewer computing resources and are better suited to mobile terminal devices.
Having described the general principles of the invention, various non-limiting embodiments of the invention are described in detail below.
Application scene overview
Embodiments of the invention can be applied to pronunciation learning scenarios, in particular pronunciation learning or pronunciation correction in language learning, where the languages include, but are not limited to, foreign languages such as English, French, German and Japanese, and Chinese varieties such as Mandarin, Cantonese and Sichuanese. The language learning scenario of an embodiment may be, for example, a pronunciation evaluation scenario, a pronunciation correction scenario, or another language learning scenario in language learning software or a language learning terminal; embodiments of the invention are not limited in this respect.
Referring to fig. 1, fig. 1 is a schematic view of a pronunciation learning/correction application scenario of the present invention. In fig. 1, a user performs pronunciation learning through a terminal device A. The terminal device A may display the image content to be learned on its interface and may output audio content in voice form through an audio playback device such as a speaker. While the user practices pronunciation, the terminal device A may also collect the user's voice/audio data through a microphone (audio collection device) and video/image data through a camera (image collection device), to help determine whether the user has learned the correct pronunciation. It is understood that the voice/audio data and the video/image data may be downloaded by terminal A from a server, and the data collected by terminal A may be analyzed and processed by the server.
The above application scenario is only an example. In practice the server may have multiple tiers: a receiving server receives the video sent by the terminal device, a processing server processes the received video data according to the pronunciation learning method of the invention to determine where the user's pronunciation is correct or wrong, and the result is fed back to terminal A so that the user can correct the errors.
The devices carrying the user interface to which the embodiments of the invention apply include terminals and/or network devices. Terminals include, but are not limited to, the following electronic devices: smart phones, tablet computers, MP4/MP3 players, PCs, PDAs, wearable devices, head-mounted display devices, and the like. Network devices include, but are not limited to, a single network server, a server group composed of multiple network servers, or a cloud based on Cloud Computing and composed of a large number of computers or network servers; cloud computing is a form of distributed computing, a virtual super-computer composed of a collection of loosely coupled machines. A computer device may operate alone to implement the invention, or may join a network and implement the invention by interoperating with other computer devices in that network. The network in which the computer device sits includes, but is not limited to, the internet, wide area networks, metropolitan area networks, local area networks, VPN networks, and the like. It should be noted that the user equipment, network devices and networks above are only examples; other existing or future computer devices or networks, where applicable, are also within the scope of the present invention and are included by reference.
Exemplary System
A system for pronunciation correction according to an exemplary embodiment of the present invention is described below with reference to fig. 1 in conjunction with an application scenario. It should be noted that the above application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present invention, and the embodiments of the present invention are not limited in this respect. Rather, embodiments of the present invention may be applied to any scenario where applicable.
An embodiment of the present invention provides a pronunciation correction system, as shown in fig. 2, the pronunciation correction system 200 at least includes:
a test module 201 configured to output test contents determined based on the user learning data; acquiring first test data input by a user according to test contents, wherein the first test data comprises facial image data and voice data;
a pronunciation correction module 202 configured to extract user pronunciation characteristics from the first test data and match a corresponding pronunciation correction strategy; pronunciation correction content is fed back to the user based on the pronunciation correction strategy, and the pronunciation correction content is used for indicating the pronunciation difference type and the corresponding pronunciation correction strategy.
The testing module 201 is further configured to collect second testing data input by the user according to the pronunciation correction content after the pronunciation correction module 202 feeds back the pronunciation correction content to the user based on the pronunciation correction strategy, and obtain an exercise result based on the second testing data and a preset exercise strategy. Wherein the second test data includes face image data and voice data.
The face image data according to the embodiment of the present invention includes image data corresponding to a face motion generated by a user based on test content or pronunciation correction content, for example, face video data when the user pronounces. The facial image data includes, but is not limited to, one or a combination of mouth shape data, facial key point data.
Existing pronunciation testing and learning techniques are implemented on audio data alone, and audio data locates problems in a user's pronunciation only imprecisely. The test module 201 therefore also collects the facial image data the user produces in response to the test content or pronunciation correction content, so that pronunciation problems can subsequently be located from the facial image data as well, improving the accuracy with which they are located.
The test module 201 is further configured to determine, before capturing the first test data or the second test data, whether the image capture device has recognized the user's face region. Specifically, in an optional implementation, face detection is performed with Histogram of Oriented Gradients (HOG) image features to obtain a bounding box for the user's face region: sliding windows are scanned over the image, and the image data inside each window is classified as belonging to the user's face region or not. The HOG feature is a descriptor used for object detection in computer vision and image processing; it is built by computing and accumulating histograms of local gradient orientations, weighted by gradient magnitude, over local regions of the image.
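The following sketch illustrates one way the detection step above could look in practice. The use of dlib (whose frontal face detector is exactly a HOG descriptor evaluated over sliding windows) and OpenCV is an assumption for illustration; the patent does not name a library.

```python
import cv2
import dlib

# dlib's frontal face detector is a HOG-based detector scanned over
# sliding windows at multiple scales, matching the approach described above.
hog_detector = dlib.get_frontal_face_detector()

def detect_face_bounding_box(frame_bgr):
    """Return (left, top, right, bottom) of the largest detected face, or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    detections = hog_detector(gray, 1)  # upsample once to catch smaller faces
    if not detections:
        return None
    face = max(detections, key=lambda r: r.width() * r.height())
    return face.left(), face.top(), face.right(), face.bottom()
```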
Optionally, before determining whether the image capturing device identifies the face area of the user, an image capturing area is set on the interface of the terminal device and displayed to the user, where the image capturing area corresponds to an image capturing range of the image capturing device. In determining whether the image capturing device recognizes the user face area, it is determined whether the user face area is within an image capturing range of the image capturing device, that is, whether the user face area displayed on the terminal device interface is within the image capturing area. Taking the example that the image capture area is set as a quadrilateral dashed frame in the terminal device interface shown in fig. 3A, the test module 201 may further determine whether the user's face area is within the quadrilateral dashed frame displayed on the terminal device interface before capturing the face image data in the first test data or the second test data.
In one case, the test module 201 determines that the image capturing device identifies the facial region of the user, and outputs a test start instruction to the user, where the test start instruction is used to instruct the user to start the acquisition process of the first test content, that is, to remind the user that a pronunciation test can be performed according to the test content. In this case, the test module 201 may also trigger the image capturing device to start capturing image data of the face region of the user to obtain the face image data. For example, triggering an image pickup device (such as a camera or a camera mounted on a language learning terminal) to start image data acquisition of the face region of the user.
It should be noted that the form of the test start instruction includes, but is not limited to, one or a combination of a prompt tone, a prompt animation, and an image identifier. For example, the test module 201 may play a prompt tone and play a prompt animation to the user in the presentation interface, so as to guide the user to record video data (i.e., capture image data).
In another case, the test module 201 determines that the image capturing device does not recognize the face region of the user, and outputs a recognition result, where the recognition result is used to indicate to the user that the image capturing device fails to recognize the face region of the user; and/or indicating to the user through the adjusting instruction to adjust the relative position between the face area of the user and the image acquisition equipment. The adjustment instruction related to the embodiment of the present invention includes, but is not limited to, target direction indication information, target angle indication information, target distance indication information, or other indication information for prompting an adjustment skill, for example, the target direction indication information may be "move left into box", "move up into box".
Taking the terminal device interface shown in fig. 3A as an example, before capturing the facial image data in the first or second test data, the test module 201 sets the image capture area as the quadrilateral dashed frame in the interface and determines whether the user's face region lies within it. If, within a first preset time period, the overlap between the user's face region and the dashed frame does not exceed a threshold, the image capture device has not yet recognized the face region within that period; in this case the test module 201 continues recognizing and outputs the first-stage recognition result "face recognition" at the bottom of the interface. If the overlap still does not exceed the threshold within a second preset time period, the test module 201 outputs an adjustment instruction in the interface, such as "move the face into the frame" or "keep the face in the frame". Alternatively, in this case the test module 201 outputs the next test content according to the recognition result, jumping to the next test stage. It should be noted that parameters such as the starting time point and duration of the first and second preset time periods may be preset or configured; for example, the first preset time period may be configured with a duration of 30 seconds, and the second preset time period configured to start when the first ends, with a duration of 1 minute 30 seconds.
If the overlap between the user's face region and the dashed frame displayed on the terminal device interface exceeds the threshold, that is, if the test module 201 determines that the face region is within the frame, the image capture device has recognized the face region. In this case the test module 201 plays a "drop" sound (a prompt tone) and triggers the image capture device to begin automatically capturing the facial image data for the user's face region.
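A minimal sketch of the overlap test described above, assuming axis-aligned (left, top, right, bottom) boxes; the 0.7 threshold and the box format are illustrative assumptions, not values from the patent.

```python
def overlap_ratio(face_box, capture_box):
    """Fraction of the face box that lies inside the on-screen capture area."""
    fx1, fy1, fx2, fy2 = face_box
    cx1, cy1, cx2, cy2 = capture_box
    iw = max(0, min(fx2, cx2) - max(fx1, cx1))  # intersection width
    ih = max(0, min(fy2, cy2) - max(fy1, cy1))  # intersection height
    face_area = (fx2 - fx1) * (fy2 - fy1)
    return (iw * ih) / face_area if face_area > 0 else 0.0

def face_recognized(face_box, capture_box, threshold=0.7):
    """True once the face region sufficiently overlaps the dashed capture frame."""
    return face_box is not None and overlap_ratio(face_box, capture_box) > threshold
```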
In an embodiment of the present invention, the user pronunciation features include pronunciation image features including, but not limited to, user pronunciation mouth shape features corresponding to the preset phonetic symbols. There are various implementations of the pronunciation correction module 202 extracting the user pronunciation characteristics from the first test data and matching the corresponding pronunciation correction strategy, wherein one implementation includes the following steps:
S401, extracting pronunciation image features from at least one image frame corresponding to the face image data.
Specifically, a user utterance start frame and a user utterance end frame are detected in the facial image data, and the facial image data is clipped to at least one image frame based on the detected start and end frames. The start and end frames may be detected, for example, by z-score threshold matching, where the z-score of the current facial image data is its value minus the mean, divided by the standard deviation; the smaller the z-score magnitude, the smaller the fluctuation in the facial image data. A plurality of facial key points is then obtained for each image frame, and the distances between those key points and a preset center point are used as the pronunciation image features; the concrete form of the features includes, but is not limited to, a pronunciation image feature sequence. Furthermore, taking the first-order derivative of the pronunciation image feature sequence yields a derivative feature sequence, which helps improve the accuracy of the pronunciation image features and the recognition of poorly captured facial image data.
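A sketch of the two feature steps above: key-point distances as per-frame features, and z-score thresholding to find the utterance start and end frames. The mouth-motion signal, the 1.0 threshold and the epsilon guard are illustrative assumptions.

```python
import numpy as np

def keypoint_distance_features(landmarks, center):
    """Per-frame feature: distance from each facial key point to a preset center.

    landmarks: (num_points, 2) array; center: (2,) array.
    """
    return np.linalg.norm(landmarks - center, axis=1)

def utterance_bounds(motion_signal, z_threshold=1.0):
    """Detect utterance start/end frames by z-score threshold matching.

    z-score = (value - mean) / standard deviation; a small |z| means the
    face is near its resting state, a large |z| means articulatory motion.
    """
    s = np.asarray(motion_signal, dtype=float)
    z = (s - s.mean()) / (s.std() + 1e-8)  # epsilon avoids division by zero
    active = np.flatnonzero(np.abs(z) > z_threshold)
    if active.size == 0:
        return None  # no utterance detected
    return int(active[0]), int(active[-1])

def first_order_delta(feature_sequence):
    """First-order difference of the feature sequence, as described above."""
    return np.diff(np.asarray(feature_sequence, dtype=float), axis=0)
```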
S402, comparing the user pronunciation mouth shape characteristics corresponding to the preset phonetic symbols with the standard mouth shape characteristics corresponding to the preset phonetic symbols and matching the corresponding mouth shape difference types. Before S402, a user pronunciation mouth shape feature corresponding to a preset phonetic symbol is extracted from the pronunciation image feature.
Specifically, the pronunciation image feature sequence corresponding to the preset phonetic symbol is selected from the at least one sequence corresponding to the at least one image frame, and features are extracted from it to obtain the user's pronunciation mouth shape features for that symbol (including, but not limited to, mouth shape features). The user's pronunciation mouth shape features and the standard mouth shape features for the preset phonetic symbol are then classified by a mouth shape classifier. If they do not fall into the same category, the user's pronunciation mouth shape deviates from the standard, and the mouth shape difference type is obtained from the category to which the standard mouth shape features belong. If they fall into the same category, there is no difference between the user's pronunciation mouth shape and the standard.
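The classifier comparison could be sketched as below. The patent does not specify the classifier type; k-nearest-neighbours is an assumption, and train_features/train_labels stand in for a labelled set of mouth shape feature vectors.

```python
from sklearn.neighbors import KNeighborsClassifier

def build_mouth_shape_classifier(train_features, train_labels):
    """Fit a classifier over labelled mouth shape feature vectors (assumed k-NN)."""
    clf = KNeighborsClassifier(n_neighbors=3)
    clf.fit(train_features, train_labels)
    return clf

def match_difference_type(clf, user_feature, standard_feature):
    """Return a mouth shape difference type, or None when the shapes agree.

    Per the description above, the difference type is derived from the
    category to which the standard mouth shape feature belongs.
    """
    user_cls = clf.predict([user_feature])[0]
    standard_cls = clf.predict([standard_feature])[0]
    return None if user_cls == standard_cls else standard_cls
```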
It should be noted that the selection criteria for the pronunciation image feature sequence corresponding to a preset phonetic symbol may be set based on how the symbol is articulated, for example: for MAX_HEIGHT phonetic symbols, the sequence is the one corresponding to the image frame at which the mouth opening is at its maximum; for STANDSTILL phonetic symbols, i.e. phonetic symbols pronounced without opening the mouth (the example symbol appears as an inline image in the original), the sequence is the one corresponding to the image frames at a pause moment within a preset duration range; for MIN_MAX phonetic symbols, the sequence is the one corresponding to the image frames at the minimum and maximum mouth openings; for MIN_MIN phonetic symbols, such as certain vowels, the sequence is the one corresponding to the image frames at multiple mouth opening maxima. Besides these four criteria, the sequence corresponding to a preset phonetic symbol may be selected in other ways; the embodiment of the present invention is not limited in this respect. A sketch of the four criteria follows.
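The sketch below drives the four selection criteria with a per-frame mouth-opening signal; the criterion names follow the patent, while the signal definition, the stillness tolerance and the peak-picking rule are illustrative assumptions.

```python
import numpy as np

def select_frames(mouth_opening, criterion):
    """Pick the image frame indices that represent the preset phonetic symbol."""
    s = np.asarray(mouth_opening, dtype=float)
    if criterion == "MAX_HEIGHT":   # frame where the mouth opens widest
        return [int(s.argmax())]
    if criterion == "STANDSTILL":   # pause frames where the mouth barely moves
        delta = np.abs(np.diff(s))
        return [int(i) for i in np.flatnonzero(delta < 0.01)]
    if criterion == "MIN_MAX":      # frames at the minimum and maximum opening
        return [int(s.argmin()), int(s.argmax())]
    if criterion == "MIN_MIN":      # frames at multiple local opening maxima
        return [i for i in range(1, len(s) - 1) if s[i - 1] < s[i] >= s[i + 1]]
    raise ValueError(f"unknown criterion: {criterion}")
```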
S403, setting the mouth shape difference correction strategy corresponding to the mouth shape difference type as a pronunciation correction strategy.
More particularly, the mouth shape difference correction strategies corresponding to the mouth shape difference types in embodiments of the present invention include several types, such as: the user needs to open the mouth wider (ENLARGE), open the mouth a moderate amount (MIDDLE), open the mouth less (SMALL), round the lips (ROUND), or spread the lips apart a little more, stretching the mouth into a flat "one" shape (FLAT).
In an embodiment of the invention, the pronunciation correction content includes a pronunciation correction image and/or a pronunciation correction prompt. The pronunciation correction module 202 can push pronunciation correction content to the user based on the pronunciation correction strategy in several ways. One implementation generates a pronunciation correction prompt from the mouth shape difference correction strategy and pushes it to the user. Taking the terminal device interface shown in fig. 3B as an example, the prompt "mouth opens a little more" is generated based on the mouth shape difference correction strategy ENLARGE and displayed in the middle of the interface.
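One way to realize this strategy-to-prompt step is a plain lookup table, as sketched below; the English prompt strings paraphrase the patent's examples, and the table itself is an assumed data structure.

```python
# Mouth shape difference correction strategies mapped to prompt messages.
CORRECTION_PROMPTS = {
    "ENLARGE": "Open your mouth a little wider.",
    "MIDDLE": "Open your mouth a moderate amount.",
    "SMALL": "Open your mouth a little less.",
    "ROUND": "Round your lips.",
    "FLAT": "Spread your lips into a flat shape.",
}

def correction_prompt(difference_type):
    """Return the prompt to display for a matched difference type."""
    if difference_type is None:
        return "Your mouth shape matches the standard."
    return CORRECTION_PROMPTS.get(difference_type, "Keep practising this sound.")
```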
Another implementation simulates, based on the mouth shape difference correction strategy and the user facial features extracted from the facial image data, the mouth shape image the user would present when pronouncing the preset phonetic symbol correctly, uses it as a mouth shape correction guide image, and pushes the simulated guide image to the user as pronunciation correction content.
Still another implementation selects a mouth shape correction teaching image and/or teaching text from a pronunciation correction content library, based on the mouth shape difference correction strategy, and pushes it to the user as pronunciation correction content; for example, the instructional image and instructional text for the phonetic symbol /i:/ shown in the terminal device interface of fig. 5. Specifically, in one embodiment, a standard mouth shape video matching the preset phonetic symbol is selected, based on the mouth shape difference correction strategy, from pre-recorded standard mouth shape videos and pushed to the user.
In a further possible embodiment, after extracting the user's pronunciation features from the first test data, the pronunciation correction module 202 may indicate to the user that the pronunciation mouth shape is correct if it determines that the mouth shape features extracted from the facial image data are consistent with the standard features, or that their similarity exceeds a threshold; in this case the user's pronunciation, or at least the pronunciation mouth shape, is correct. Further, in this case no pronunciation correction content is fed back to the user.
After comparing the user's pronunciation mouth shape features extracted from the facial image data with the standard features and matching the corresponding mouth shape difference type, the pronunciation correction module 202 may set the pronunciation correction strategy to a retest strategy. In that case, when pushing pronunciation correction content based on the strategy, the module pushes the user either a retest instruction, directing the user to input test data for the test content again, or an end instruction, directing the user to end the pronunciation correction. This step is similar to the test module 201 outputting test content; the common parts are cross-referenced and not repeated here.
Before collecting the pronunciation correction data the user inputs in response to the pronunciation correction content, the pronunciation correction module 202 determines whether the image capture device has recognized the user's face region. The way the pronunciation correction module 202 makes this determination is similar to the way the test module 201 does; the common parts are cross-referenced and not repeated here.
There are several ways the pronunciation correction module 202 can collect the pronunciation correction data input by the user in response to the correction content and determine a pronunciation correction result; one of them is described here. If the image capture device is determined to have recognized the user's face region, an image capture start instruction is output to the user; and/or the image capture device is triggered to start capturing image data of the user's face region, with the resulting facial image data used as the pronunciation correction data. The user's pronunciation mouth shape features extracted from the correction data are then compared with the standard mouth shape features and matched to a corresponding correction result. The pronunciation correction data is processed like the image frames in S401-S402 above; the common parts are cross-referenced and not repeated here.
Specifically, the user's pronunciation mouth shape features are extracted from the correction data and classified, together with the standard mouth shape features, by the mouth shape classifier. If they do not fall into the same category, the user's pronunciation mouth shape deviates from the standard, and the correction result is derived from the category to which the standard features belong. If they fall into the same category, the user's pronunciation mouth shape does not differ from the standard, and the correction result indicates that the user's pronunciation mouth shape is correct.
Exemplary method
Having described the system of the exemplary embodiments of the present invention, an exemplary method is described next. The pronunciation correction method provided by the invention can perform the operations configured for any module of the system in the embodiment corresponding to fig. 2. Referring to fig. 6, the pronunciation correction method comprises at least:
S601, outputting test contents determined based on user learning data;
S602, collecting first test data input by a user according to the test contents and determining a pronunciation correction strategy, wherein the first test data comprises facial image data and voice data;
S603, extracting user pronunciation characteristics from the first test data and matching a corresponding pronunciation correction strategy;
S604, feeding back pronunciation correction contents to the user based on the pronunciation correction strategy, wherein the pronunciation correction contents are used for indicating the pronunciation difference type and the corresponding pronunciation correction strategy.
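Taken together, S601-S604 could be orchestrated as in the sketch below. Every helper is a hypothetical stub standing in for the pipeline detailed in the system embodiment; the names, return values and the example strategy are assumptions.

```python
def output_test_content(user_learning_data):
    # S601 (stub): choose test content from the user's learning data.
    return {"phonetic_symbol": "i:", "word": user_learning_data.get("weak_word", "see")}

def collect_first_test_data(test_content):
    # S602 (stub): capture facial image data and voice data from the user.
    return {"facial_frames": [], "audio": b""}

def match_correction_strategy(first_test_data):
    # S603 (stub): extract pronunciation features and match a correction strategy.
    return "ENLARGE"

def pronunciation_correction_flow(user_learning_data):
    test_content = output_test_content(user_learning_data)      # S601
    first_test_data = collect_first_test_data(test_content)     # S602
    strategy = match_correction_strategy(first_test_data)       # S603
    prompts = {"ENLARGE": "Open your mouth a little wider."}    # S604
    return strategy, prompts.get(strategy, "")
```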
In an embodiment of the present invention, the face image data includes mouth shape data and/or face key point data.
Optionally, after the step S604 of feeding back the pronunciation correction content to the user based on the pronunciation correction policy, the method further includes: collecting second test data input by a user according to pronunciation correction content, wherein the second test data comprises facial image data and voice data; and obtaining an exercise result based on the second test data and a preset exercise strategy.
Optionally, before the acquiring the first test data or the second test data in S602, the method further includes: it is determined whether the image capture device identifies a user's facial region. Collecting first test data or second test data, comprising: if the image acquisition equipment identifies the face area of the user, outputting a test starting instruction to the user; and/or triggering the image acquisition equipment to start image data acquisition on the face area of the user to obtain face image data, wherein the face image data comprises image data corresponding to the facial action generated by the user based on the test content or pronunciation correction content.
Further, after determining whether the image capturing device identifies the face area of the user, the method further includes: if the image acquisition equipment does not identify the face area of the user, outputting an identification result; and/or indicating to the user through the adjusting instruction to adjust the relative position between the face area of the user and the image acquisition equipment.
Optionally, the user articulation feature comprises an articulation image feature. Extracting user pronunciation characteristics from the first test data and matching corresponding pronunciation correction strategies, comprising: extracting pronunciation image features from at least one image frame corresponding to the facial image data, wherein the pronunciation image features comprise user pronunciation mouth shape features corresponding to preset phonetic symbols; comparing the user pronunciation mouth shape characteristics corresponding to the preset phonetic symbols with the standard mouth shape characteristics corresponding to the preset phonetic symbols and matching the corresponding mouth shape difference types; and setting the mouth shape difference correction strategy corresponding to the mouth shape difference type as a pronunciation correction strategy.
Optionally, the pronunciation correction content includes a pronunciation correction image. Based on a mouth shape difference correction strategy and user facial features extracted from facial image data, simulating a mouth shape image when the user correctly pronounces a preset phonetic symbol as a mouth shape correction guide image; and pushing the simulated mouth shape correction guide image as pronunciation correction content to the user.
Optionally, before collecting pronunciation correction data input by the user according to the pronunciation correction content, the method further includes: it is determined whether the image capture device identifies a user's facial region. Collecting pronunciation correction data input by a user according to pronunciation correction contents and determining a pronunciation correction result, comprising: if the image acquisition equipment identifies the face area of the user, outputting an image acquisition starting instruction to the user; and/or triggering image acquisition equipment to start image data acquisition on a user face area so as to take the obtained face image data as pronunciation correction data; and comparing the user pronunciation mouth shape features extracted from the pronunciation correction data with the standard mouth shape features and matching corresponding pronunciation correction results.
Exemplary Medium
Having described the method and system of the exemplary embodiments of this invention, reference is next made to FIG. 7. The present invention provides an exemplary medium storing computer-executable instructions; the instructions cause a computer to perform the method of the exemplary embodiment corresponding to fig. 6, or the functions of any module configuration of the exemplary embodiment corresponding to fig. 2.
Exemplary Computing Device
Having described the methods, media, and apparatus of the exemplary embodiments of the invention, reference is next made to FIG. 8, which illustrates an exemplary computing device 80. The computing device 80 includes a processing unit 801, a memory 802, a bus 803, an external device 804, an I/O interface 805, and a network adapter 806, where the memory 802 includes a random access memory (RAM) 8021, a cache memory 8022, a read-only memory (ROM) 8023, and a memory cell array 8025 composed of at least one memory cell 8024. The memory 802 stores the programs or instructions executed by the processing unit 801; the processing unit 801 executes, according to the programs or instructions stored in the memory 802, the method of any one of the exemplary embodiments corresponding to FIG. 6 or the functions of any one of the module configurations of the exemplary embodiments corresponding to FIG. 2; and the I/O interface 805 receives or transmits data under the control of the processing unit 801.
It should be noted that although several units/modules or sub-units/modules of the apparatus are mentioned in the detailed description above, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the units/modules described above may be embodied in a single unit/module. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.
Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one, and/or one step may be broken down into multiple steps.
While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments; nor does the division into aspects imply that features in those aspects cannot be combined to advantage, this division being for convenience of presentation only. The invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A pronunciation correction system, comprising:
a test module configured to: output test content determined based on user learning data; and acquire first test data input by a user according to the test content, wherein the first test data comprises facial image data and voice data;
a pronunciation correction module configured to: extract user pronunciation features from the first test data and match a corresponding pronunciation correction strategy; and feed back pronunciation correction content to the user based on the pronunciation correction strategy, wherein the pronunciation correction content is used for indicating the pronunciation difference type and the corresponding pronunciation correction strategy.
2. The system of claim 1, wherein the test module is further configured to:
after the pronunciation correction module feeds back the pronunciation correction content to the user based on the pronunciation correction strategy, acquire second test data input by the user according to the pronunciation correction content, wherein the second test data comprises facial image data and voice data;
and obtain an exercise result based on the second test data and a preset exercise strategy.
3. The system of claim 1 or 2, wherein the test module is further configured to:
before the first test data or the second test data is collected, determine whether the image acquisition device has identified the user's face area;
when acquiring the first test data or the second test data, the test module is specifically configured to:
if the image acquisition device is determined to have identified the user's face area, output a test start instruction to the user; and/or,
trigger the image acquisition device to start image data acquisition on the user's face area to obtain the facial image data, wherein the facial image data comprises image data corresponding to the facial actions produced by the user based on the test content or the pronunciation correction content.
4. The system of claim 3, wherein the test module is further configured to:
after determining whether the image acquisition device has identified the user's face area, if the image acquisition device does not identify the user's face area, output the recognition result; and/or,
instruct the user, through an adjustment instruction, to adjust the relative position between the user's face area and the image acquisition device.
5. The system of any one of claims 2 to 4, wherein the facial image data comprises mouth shape data and/or facial key point data.
6. The system of claim 1, wherein the user pronunciation features include pronunciation image features;
when extracting the user pronunciation features from the first test data and matching the corresponding pronunciation correction strategy, the pronunciation correction module is specifically configured to:
extract the pronunciation image features from at least one image frame of the facial image data, wherein the pronunciation image features include the user pronunciation mouth shape features corresponding to a preset phonetic symbol;
compare the user pronunciation mouth shape features corresponding to the preset phonetic symbol with the standard mouth shape features corresponding to the preset phonetic symbol and match the corresponding mouth shape difference type;
and set the mouth shape difference correction strategy corresponding to the mouth shape difference type as the pronunciation correction strategy.
7. The system of claim 6, wherein the pronunciation correction content includes a pronunciation correction image;
when pushing the pronunciation correction content to the user based on the pronunciation correction strategy, the pronunciation correction module is specifically configured to:
simulate, based on the mouth shape difference correction strategy and the user facial features extracted from the facial image data, the mouth shape image of the user correctly pronouncing the preset phonetic symbol as a mouth shape correction guide image;
and push the simulated mouth shape correction guide image to the user as the pronunciation correction content.
8. A pronunciation correction method applied to the pronunciation correction system of any one of claims 1 to 7, comprising:
outputting test content determined based on user learning data;
collecting first test data input by a user according to the test content, wherein the first test data comprises facial image data and voice data;
extracting user pronunciation features from the first test data and matching a corresponding pronunciation correction strategy;
and feeding back pronunciation correction content to the user based on the pronunciation correction strategy, wherein the pronunciation correction content is used for indicating the pronunciation difference type and the corresponding pronunciation correction strategy.
9. A medium storing program code which, when executed by a processor, implements the functions of any one of the module configurations in the pronunciation correction system of any one of claims 1 to 7, or implements the method of claim 8.
10. A computing device comprising a processor and a storage medium storing program code which, when executed by the processor, implements the functions of any one of the module configurations in the pronunciation correction system of any one of claims 1 to 7, or implements the method of claim 8.
CN201910408726.1A 2019-05-16 2019-05-16 Pronunciation correction system, method, medium and computing device Pending CN111951629A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910408726.1A CN111951629A (en) 2019-05-16 2019-05-16 Pronunciation correction system, method, medium and computing device

Publications (1)

Publication Number Publication Date
CN111951629A (en) 2020-11-17

Family

ID=73336653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910408726.1A Pending CN111951629A (en) 2019-05-16 2019-05-16 Pronunciation correction system, method, medium and computing device

Country Status (1)

Country Link
CN (1) CN111951629A (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101292281A (en) * 2005-09-29 2008-10-22 独立行政法人产业技术综合研究所 Pronunciation diagnosis device, pronunciation diagnosis method, recording medium, and pronunciation diagnosis program
CN102023703A (en) * 2009-09-22 2011-04-20 现代自动车株式会社 Combined lip reading and voice recognition multimodal interface system
CN103220465A (en) * 2013-03-21 2013-07-24 广东欧珀移动通信有限公司 Method, device and mobile terminal for accurate positioning of facial image when mobile phone is used for photographing
KR20140133056A (en) * 2013-05-09 2014-11-19 중앙대학교기술지주 주식회사 Apparatus and method for providing auto lip-synch in animation
US20150056580A1 (en) * 2013-08-26 2015-02-26 Seli Innovations Inc. Pronunciation correction apparatus and method thereof
CN106157956A (en) * 2015-03-24 2016-11-23 中兴通讯股份有限公司 The method and device of speech recognition
CN105261246A (en) * 2015-12-02 2016-01-20 武汉慧人信息科技有限公司 Spoken English error correcting system based on big data mining technology
CN106506959A (en) * 2016-11-15 2017-03-15 上海传英信息技术有限公司 Photographic means and camera installation
CN106775238A (en) * 2016-12-14 2017-05-31 深圳市金立通信设备有限公司 A kind of photographic method and terminal
CN108806367A (en) * 2017-07-21 2018-11-13 河海大学 A kind of Oral English Practice voice correcting system
CN107424450A (en) * 2017-08-07 2017-12-01 英华达(南京)科技有限公司 Pronunciation correction system and method
CN108537702A (en) * 2018-04-09 2018-09-14 深圳市鹰硕技术有限公司 Foreign language teaching evaluation information generation method and device
CN108960166A (en) * 2018-07-11 2018-12-07 谢涛远 A kind of vision testing system, method, terminal and medium
CN109726663A (en) * 2018-12-24 2019-05-07 广东德诚科教有限公司 Online testing monitoring method, device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
栾悉道 (Luan Xidao): 《多媒体情报处理技术》 [Multimedia Intelligence Processing Technology], 31 May 2016, 国防工业出版社 (National Defense Industry Press) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112786151A (en) * 2020-12-28 2021-05-11 深圳市艾利特医疗科技有限公司 Language function training system and method
CN112949554A (en) * 2021-03-22 2021-06-11 湖南中凯智创科技有限公司 Intelligent children accompanying education robot
CN112949554B (en) * 2021-03-22 2022-02-08 湖南中凯智创科技有限公司 Intelligent children accompanying education robot
CN113297924A (en) * 2021-04-30 2021-08-24 北京有竹居网络技术有限公司 Method, device, storage medium and electronic equipment for correcting pronunciation
CN113257231A (en) * 2021-07-07 2021-08-13 广州思正电子股份有限公司 Language sound correcting system method and device
CN113257231B (en) * 2021-07-07 2021-11-26 广州思正电子股份有限公司 Language sound correcting system method and device

Similar Documents

Publication Publication Date Title
CN110782921B (en) Voice evaluation method and device, storage medium and electronic device
CN110085261B (en) Pronunciation correction method, device, equipment and computer readable storage medium
CN111951629A (en) Pronunciation correction system, method, medium and computing device
US10706738B1 (en) Systems and methods for providing a multi-modal evaluation of a presentation
CN111951825A (en) Pronunciation evaluation method, medium, device and computing equipment
WO2024000867A1 (en) Emotion recognition method and apparatus, device, and storage medium
CN112153397B (en) Video processing method, device, server and storage medium
CN111081080B (en) Voice detection method and learning device
CN113703579B (en) Data processing method, device, electronic equipment and storage medium
US20240064383A1 (en) Method and Apparatus for Generating Video Corpus, and Related Device
CN112614489A (en) User pronunciation accuracy evaluation method and device and electronic equipment
CN112232276B (en) Emotion detection method and device based on voice recognition and image recognition
CN113392273A (en) Video playing method and device, computer equipment and storage medium
CN111079501B (en) Character recognition method and electronic equipment
CN109065024B (en) Abnormal voice data detection method and device
CN111950327A (en) Mouth shape correcting method, mouth shape correcting device, mouth shape correcting medium and computing equipment
CN110874554A (en) Action recognition method, terminal device, server, system and storage medium
CN110046354B (en) Recitation guiding method, apparatus, device and storage medium
CN116645683A (en) Signature handwriting identification method, system and storage medium based on prompt learning
CN111079504A (en) Character recognition method and electronic equipment
CN111078992B (en) Dictation content generation method and electronic equipment
CN113051985B (en) Information prompting method, device, electronic equipment and storage medium
CN111031232B (en) Dictation real-time detection method and electronic equipment
CN111755026B (en) Voice recognition method and system
KR102385779B1 (en) Electronic apparatus and methoth for caption synchronization of contents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201117