CN112309371A - Intonation detection method, apparatus, device and computer readable storage medium - Google Patents


Info

Publication number
CN112309371A
CN112309371A
Authority
CN
China
Prior art keywords
intonation
preset
audio data
change
actual
Prior art date
Legal status
Pending
Application number
CN201910696870.XA
Other languages
Chinese (zh)
Inventor
蒋成林
刘晨晨
沈欣尧
余津锐
Current Assignee
Shanghai Liulishuo Information Technology Co ltd
Original Assignee
Shanghai Liulishuo Information Technology Co ltd
Priority date: 2019-07-30
Filing date: 2019-07-30
Publication date: 2021-02-02
Application filed by Shanghai Liulishuo Information Technology Co., Ltd.
Priority to CN201910696870.XA
Publication of CN112309371A
Current legal status: Pending

Classifications

    • G10L 15/02: Feature extraction for speech recognition; selection of recognition unit
    • G09B 19/06: Teaching of foreign languages
    • G09B 5/04: Electrically-operated educational appliances with audible presentation of the material to be studied
    • G10L 15/08: Speech classification or search
    • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Abstract

The application discloses an intonation detection method, apparatus, device and computer-readable storage medium, wherein the method comprises the following steps: acquiring audio data recorded for a predetermined sentence; analyzing the audio data to determine the actual intonation change in the predetermined sentence; and comparing the actual intonation change with a preset intonation change corresponding to the predetermined sentence to generate feedback information indicating whether the current intonation of the predetermined sentence is correct. The method provided by this application can automatically analyze the recorded audio data, determine whether the actual intonation change in it matches the preset intonation, and feed back to the user whether the intonation is correct, helping the user understand the concept of intonation change and thereby effectively master intonation change in spoken English. At the same time, this application no longer requires a teacher to give live demonstration or correction, overcoming the limitations of time and space: the user can practice anytime and anywhere, which saves learning cost.

Description

Intonation detection method, apparatus, device and computer readable storage medium
Technical Field
The present application relates to the field of speech technology, and more particularly to an intonation detection method, apparatus, device, and computer-readable storage medium.
Background
With the development of science and technology, Internet-based language-learning applications have developed rapidly. In some language-learning applications, the application provider sends learning materials to a client over the Internet, and the user obtains the learning materials through the client to study accordingly. For language learning, besides grammar and vocabulary, pronunciation is one of the most important capabilities. In general, users can improve their pronunciation by reading aloud, reciting, and the like. However, in most cases the user cannot know whether the pronunciation is accurate.
In the process of learning English, learners whose native language is Chinese are used to simply raising the pitch on the last word or syllable. Contrary to most learners' intuition, however, English intonation needs to rise or fall starting from the stressed syllable of a key word in the sentence, and because this sustained rising or falling way of reading is unfamiliar, many learners still cannot produce intonation correctly even after they understand the principle of English pronunciation. Moreover, the learning process traditionally requires feedback from a human teacher to correct intonation problems, so the user's effective practice is limited in time and space.
Disclosure of Invention
The present application aims to provide an intonation detection method, apparatus, device and computer-readable storage medium, so as to solve the problems of the conventional approach: low learning efficiency and the limited time and space available for effective practice.
In order to achieve the above object, the present application provides an intonation detection method, including:
acquiring audio data recorded for a predetermined sentence;
analyzing the audio data to determine the actual intonation change in the predetermined sentence;
and comparing the actual intonation change with a preset intonation change corresponding to the predetermined sentence to generate feedback information indicating whether the current intonation of the predetermined sentence is correct.
Optionally, the analyzing of the audio data to determine the actual intonation change in the predetermined sentence includes:
analyzing the audio data to detect the vowel portion in the audio data;
determining the vibration frequency of the vowel portion and calculating the rate of change of the vibration frequency;
determining the actual intonation change in the predetermined sentence based on the rate of change.
Optionally, the analyzing of the audio data to detect the vowel portion in the audio data includes:
performing forced segmentation and alignment on the audio data through speech recognition to obtain the vowel portion of the audio data.
Optionally, before the acquiring of the audio data recorded for the predetermined sentence, the method further includes:
marking the preset intonation change with a first visual element on a display interface.
Optionally, after generating the feedback information indicating whether the current intonation of the predetermined sentence is correct, the method further includes:
when the comparison shows that the actual intonation change is consistent with the preset intonation change, indicating through a second visual element of the display interface that the intonation change of the predetermined sentence is correct;
and when the comparison shows that the actual intonation change is inconsistent with the preset intonation change, indicating through a third visual element of the display interface that the intonation change of the predetermined sentence is incorrect.
Optionally, after generating the feedback information indicating whether the current intonation of the predetermined sentence is correct, the method further includes:
prompting the feedback information through a specific sound effect.
In order to achieve the above object, the present application provides an intonation detection apparatus, including:
an acquisition module for acquiring audio data recorded for a predetermined sentence;
a determining module for analyzing the audio data and determining the actual intonation change in the predetermined sentence;
and a generating module for comparing the actual intonation change with a preset intonation change corresponding to the predetermined sentence and generating feedback information indicating whether the current intonation of the predetermined sentence is correct.
In order to achieve the above object, the present application provides an intonation detection device applied to a server, the device including:
a memory for storing a computer program;
a processor for implementing the steps of any of the intonation detection methods disclosed above when executing the computer program.
In order to achieve the above object, the present application provides an intonation detection device applied to a client, the device including:
an audio acquisition device for recording audio data for a predetermined sentence;
a communication device for sending the audio data to a server, so that the server analyzes the audio data, determines the actual intonation change in the predetermined sentence, and compares the actual intonation change with a preset intonation change corresponding to the predetermined sentence to generate feedback information indicating whether the current intonation of the predetermined sentence is correct;
and a display device for displaying the feedback information on a display interface.
To achieve the above object, the present application provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of any of the intonation detection methods disclosed above.
According to the above scheme, the intonation detection method provided by the application includes: acquiring audio data recorded for a predetermined sentence; analyzing the audio data to determine the actual intonation change in the predetermined sentence; and comparing the actual intonation change with a preset intonation change corresponding to the predetermined sentence to generate feedback information indicating whether the current intonation of the predetermined sentence is correct. The method provided by this application can automatically analyze the recorded audio data, determine whether the actual intonation change in it matches the preset intonation, and feed back to the user whether the intonation is correct, helping the user understand the concept of intonation change and thereby effectively master intonation change in spoken English. At the same time, this application no longer requires a teacher to give live demonstration or correction, overcoming the limitations of time and space: the user can practice anytime and anywhere, which saves learning cost.
The application also discloses an intonation detection apparatus, device and computer-readable storage medium that achieve the same technical effects.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be derived from these drawings without creative effort.
Fig. 1 is a flowchart of an intonation detection method disclosed in an embodiment of the present application;
Fig. 2 is a schematic diagram of the visual presentation prompting the user to practice intonation on the display interface;
Fig. 3 is a flowchart of the process of determining the actual intonation change in a predetermined sentence;
Fig. 4 is a flowchart of another intonation detection method disclosed in an embodiment of the present application;
Fig. 5 is a schematic diagram of the visual feedback on the user's intonation practice on the display interface;
Fig. 6 is a block diagram of an intonation detection apparatus according to an embodiment of the present application;
Fig. 7 is a block diagram of an intonation detection device applied to a server according to an embodiment of the present application;
Fig. 8 is a block diagram of an intonation detection device applied to a client according to an embodiment of the present application;
Fig. 9 is a block diagram of an intonation detection system according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the disclosure, a more particular description of the disclosure will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the descriptions relating to "first", "second", etc. in the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, provided that the combination can be realized by a person skilled in the art; when technical solutions are contradictory or cannot be realized, the combination should be considered not to exist, and it falls outside the protection scope of the present invention.
The embodiments of the invention can be used in pronunciation-learning scenarios, especially pronunciation-learning or pronunciation-correction scenarios in language learning, where the languages include but are not limited to foreign languages such as English, French, German and Japanese, and Chinese dialects such as Cantonese and Sichuanese. The language-learning scenario of the embodiments may be, for example, a pronunciation evaluation scenario or a pronunciation correction scenario in language-learning software or a language-learning terminal, or another language-learning scenario; the embodiments of the present invention are not limited in this respect.
The application scenario of the embodiments of the present application is explained in detail below. A user performs pronunciation learning through a client; the client displays the content the user is to learn on a display interface, and can also output audio content in speech form to the user through an audio playback device such as a speaker. When the user practices the pronunciation of speech, the client collects audio data of the user's pronunciation through an audio acquisition device for the subsequent intonation detection operation. It can be understood that the entity performing the intonation detection operation may be either the client or the server, which does not affect the implementation of the present application.
The client in the embodiments of the present invention may include, but is not limited to: smart phones, tablet computers, MP4 players, MP3 players, PCs, PDAs, wearable devices, head-mounted display devices, and the like; the server may include, but is not limited to: a single web server, a server group composed of multiple web servers, or a cloud-computing-based cloud composed of a large number of computers or web servers.
With reference to the above application scenarios, fig. 1 shows a flowchart of a specific implementation of the intonation detection method provided by the present application. The method specifically includes:
S101: acquiring audio data recorded for a predetermined sentence.
In this embodiment, the predetermined sentence is a sentence to be used for practicing intonation, and may include one or more sentences; each sentence includes one or more sense groups, and each sense group is at least one word. The user reads the predetermined sentence aloud through the client to record speech for the sentence to be practiced, and the audio data corresponding to the speech is collected by the audio acquisition device.
Specifically, as a preferred implementation, the embodiment of the present application may mark the predetermined sentence with visual elements to prompt the user with the correct intonation change. As shown in fig. 2, the stressed words in the predetermined sentence may be shown in bold and the stressed syllables further enlarged, and the rise and fall of intonation may be indicated by inclined arrows above the words, where each arrow starts from the stressed syllable of a stressed word and extends to the end of the intonation rise or fall.
S102: analyzing the audio data to determine the actual intonation change in the predetermined sentence.
In this embodiment, the actual intonation change of the user while reading the predetermined sentence is obtained by analyzing the audio data. This process may be executed by the client or by the background server; either choice does not affect the implementation of the present application.
As a specific implementation, referring to fig. 3, the analyzing of the audio data to determine the actual intonation change in the predetermined sentence may include:
S1021: analyzing the audio data to detect the vowel portions in the audio data.
In English pronunciation, the consonant portions have no obvious periodicity, while the frequency of sound vibration can be detected in the vowel portions. Specifically, the audio data may be forcibly segmented and aligned using speech recognition to obtain the vowel portions in the audio data.
S1022: determining the vibration frequency of the vowel portions and calculating the rate of change of the vibration frequency.
In this embodiment, the vibration frequency of the vowel portion is calculated at each time step, for example every 0.01 seconds, and the rate of change of the vibration frequency is determined. Specifically, the rate of change of the vibration frequency of the audio data can be estimated through a minimum mean-square-error (least-squares) fit, which yields the corresponding slope.
S1023: determining the actual intonation change in the predetermined sentence based on the rate of change.
It will be appreciated that after the rate of change of the vibration frequency is determined, the actual intonation change can be determined from it. For example, if the rate of change of the vibration frequency is less than zero, the audio data recorded by the user is determined to be falling in tone; otherwise, it is determined to be rising.
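The sign test just described reduces to a few lines. In the sketch below, the eps dead zone is an extra assumption of this illustration, kept at 0 by default so the behavior matches the rule in the text exactly:

```python
def classify_intonation(change_rate, eps=0.0):
    """Map the fitted F0 slope to an intonation direction. With eps == 0
    this is exactly the rule in the text: change_rate < 0 means a falling
    tone, otherwise a rising tone."""
    if change_rate < -eps:
        return "falling"
    if change_rate >= eps:
        return "rising"
    return "level"  # only reachable when eps > 0 (a sketch assumption)
```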
S103: comparing the actual intonation change with the preset intonation change corresponding to the predetermined sentence to generate feedback information indicating whether the current intonation of the predetermined sentence is correct.
After the actual intonation change corresponding to the audio data recorded by the user is determined, it is compared with the preset intonation change. If any inconsistency exists, the current intonation change is considered incorrect.
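Putting S101 to S103 together, a minimal end-to-end sketch reusing the helpers above might look as follows; the per-segment input and output structures are illustrative assumptions, not formats defined by the application:

```python
def detect_intonation(f0_tracks, preset_changes):
    """Compare the actual intonation change on each analyzed segment
    (e.g. the vowel of each key word) with the preset pattern and
    assemble the feedback information."""
    feedback = []
    for f0_track, expected in zip(f0_tracks, preset_changes):
        actual = classify_intonation(f0_change_rate(f0_track))
        feedback.append({"expected": expected, "actual": actual,
                         "correct": actual == expected})
    # The current intonation is correct only if no inconsistency exists.
    overall_correct = all(item["correct"] for item in feedback)
    return {"overall_correct": overall_correct, "segments": feedback}
```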
In addition, the feedback information may be displayed to the user visually, or supplemented with a corresponding sound effect, which is not limited in this embodiment.
According to the above scheme, the intonation detection method provided by the application includes: acquiring audio data recorded for a predetermined sentence; analyzing the audio data to determine the actual intonation change in the predetermined sentence; and comparing the actual intonation change with a preset intonation change corresponding to the predetermined sentence to generate feedback information indicating whether the current intonation of the predetermined sentence is correct. The method provided by this application can automatically analyze the recorded audio data, determine whether the actual intonation change in it matches the preset intonation, and feed back to the user whether the intonation is correct, helping the user understand the concept of intonation change and thereby effectively master intonation change in spoken English. At the same time, this application no longer requires a teacher to give live demonstration or correction, overcoming the limitations of time and space: the user can practice anytime and anywhere, which saves learning cost.
An embodiment of the application discloses another intonation detection method; compared with the previous embodiment, this embodiment further explains and optimizes the technical scheme. Referring to fig. 4, specifically:
s201: acquiring audio data input aiming at a preset statement;
s202: analyzing the audio data to determine the actual intonation change in the preset sentence;
s203: comparing the actual intonation change with a preset intonation change corresponding to the preset sentence;
s204: when the actual intonation change is consistent with the preset intonation change in comparison, indicating that the intonation change of the preset sentence is correct through a second visual element of a display interface;
s205: and when the actual tone variation is inconsistent with the preset tone variation in comparison, indicating that the tone variation of the preset sentence is incorrect through a third visual element of the display interface.
The second visual element and the third visual element can be different geometric patterns, or the same geometric pattern distinguished by different indication colors or other characteristics. For example, the geometric pattern may be a circle: when the overall pronunciation intonation is correct, the circle turns green and a preset first sound effect is played to indicate that the overall intonation is correct; when the overall pronunciation intonation is incorrect, the circle turns red and shakes, or a preset second sound effect is played, to indicate the overall intonation error.
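The mapping from the comparison result to these cues is a simple dispatch; a sketch follows, where the cue identifiers are placeholders rather than names from the application:

```python
def feedback_cues(overall_correct):
    """Choose the visual/audio cues for the circle described above:
    green plus a first sound effect when the overall intonation is
    correct; red, shaking, plus a second sound effect when it is not."""
    if overall_correct:
        return {"circle_color": "green", "shake": False, "sound": "sound_effect_1"}
    return {"circle_color": "red", "shake": True, "sound": "sound_effect_2"}
```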
Referring to fig. 5, a schematic diagram of the visual feedback on the user's intonation practice in the display interface: in this embodiment, the predetermined sentence is "Can we record it?". The large circle at the upper left of the interface indicates whether the user's actual overall pronunciation intonation is correct: the circle turning green indicates that the overall intonation is correct, and the circle turning red indicates that it is incorrect. The key words in the predetermined sentence are shown in bold, the stressed syllables are further enlarged, and the rise and fall of intonation are indicated by inclined arrows above the words, where each arrow starts from the stressed syllable of a key word and extends to the end of the intonation rise or fall. For example, the stressed word "record" is displayed in bold, and its stressed syllable "cord" is further enlarged and highlighted.
In this embodiment, feedback information is returned to the user through the display interface; the feedback information may include, but is not limited to: whether the overall intonation is correct, the actual stress pattern of the syllables in each word as read, and the standard stress pattern. The concept of word stress is difficult for learners to grasp and needs continual reinforcement; in particular, at the early learning stage it is hard for learners to judge stress accurately from standard audio alone, and they need direct, explicit explanation. This embodiment therefore uses visual elements to help the learner clearly see the stress pattern of the practice content, reinforcing the concept during practice and quickly locating the learner's problems. Visually, large-versus-small contrast, through the enlargement and reduction of words or through more abstract geometric shapes of different sizes, helps the user intuitively understand the intonation changes of the words.
Furthermore, by playing back the audio data recorded during the user's practice together with the standard demonstration audio, the present application can help the user clearly identify the problems in their own pronunciation and gives them the opportunity to improve further by imitating the standard demonstration audio.
An intonation detection apparatus provided by an embodiment of the present application is introduced below; the intonation detection apparatus described below and the intonation detection method described above may be referred to correspondingly.
Fig. 6 is a block diagram of the intonation detection apparatus according to an embodiment of the present application; referring to fig. 6, the intonation detection apparatus may include:
an obtaining module 100, configured to obtain audio data recorded for a predetermined sentence;
a determining module 200, configured to analyze the audio data and determine the actual intonation change in the predetermined sentence;
a generating module 300, configured to compare the actual intonation change with the preset intonation change corresponding to the predetermined sentence and generate feedback information indicating whether the current intonation of the predetermined sentence is correct.
As a specific implementation, in the embodiment of the present application the determining module 200 may specifically include:
a vowel detection unit for analyzing the audio data and detecting the vowel portions in the audio data;
a frequency determining unit for determining the vibration frequency of the vowel portions and calculating the rate of change of the vibration frequency;
and an intonation determining unit for determining the actual intonation change in the predetermined sentence based on the rate of change.
As a specific implementation, the vowel detection unit in the embodiment of the present application is specifically configured to: perform forced segmentation and alignment on the audio data through speech recognition to obtain the vowel portions in the audio data.
As a specific implementation, the embodiment of the present application may further include:
an identification module for marking the preset intonation change with a first visual element on a display interface before the audio data recorded for the predetermined sentence is acquired.
As a specific implementation, the embodiment of the present application may further include:
a first indicating module for indicating, through a second visual element of the display interface, that the intonation change of the predetermined sentence is correct when the comparison shows that the actual intonation change is consistent with the preset intonation change;
and a second indicating module for indicating, through a third visual element of the display interface, that the intonation change of the predetermined sentence is incorrect when the comparison shows that the actual intonation change is inconsistent with the preset intonation change.
As a specific implementation, the embodiment of the present application may further include:
a prompting module for prompting the feedback information through a specific sound effect after the feedback information indicating whether the current intonation of the predetermined sentence is correct is generated.
The intonation detection apparatus of the present embodiment is configured to implement the above-mentioned intonation detection method, and therefore the specific implementation manner of the intonation detection apparatus can be seen in the foregoing examples of the intonation detection method, for example, the obtaining module 100, the determining module 200, and the generating module 300 are respectively configured to implement steps S101, S102, and S103 in the above-mentioned intonation detection method, so that the specific implementation manner thereof may refer to the description of the corresponding examples of each part, and will not be described again here.
This application can automatically analyze the recorded audio data, determine whether the actual intonation change in it matches the preset intonation, and feed back to the user whether the intonation is correct. This helps the user understand the concept of intonation change and thereby effectively master intonation change in spoken English. At the same time, this application no longer requires a teacher to give live demonstration or correction, overcoming the limitations of time and space: the user can practice anytime and anywhere, which saves learning cost.
In addition, the present application further provides an intonation detection device applied to the server 1; as shown in fig. 7, the device includes:
a memory 11 for storing a computer program;
a processor 12 for implementing the following steps when executing the computer program: acquiring audio data recorded for a predetermined sentence; analyzing the audio data to determine the actual intonation change in the predetermined sentence; and comparing the actual intonation change with a preset intonation change corresponding to the predetermined sentence to generate feedback information indicating whether the current intonation of the predetermined sentence is correct.
The memory 11 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disks, optical disks, and the like. In some embodiments the memory 11 may be an internal storage unit of the intonation detection device, for example a hard disk. In other embodiments the memory 11 may be an external storage device of the intonation detection device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card. Further, the memory 11 may include both an internal storage unit and an external storage device of the intonation detection device. The memory 11 may be used not only to store the application software installed in the intonation detection device and various types of data, such as the code of the intonation detection program 01, but also to temporarily store data that has been output or is to be output.
In some embodiments the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data-processing chip, used to run the program code stored in the memory 11 or to process data, for example to execute the intonation detection program 01.
Optionally, the processor 12 is configured to implement the following steps when executing the computer program: analyzing the audio data to detect the vowel portions in the audio data; determining the vibration frequency of the vowel portions and calculating the rate of change of the vibration frequency; determining the actual intonation change in the predetermined sentence based on the rate of change.
Optionally, the processor 12 is configured to implement the following step when executing the computer program: performing forced segmentation and alignment on the audio data through speech recognition to obtain the vowel portions in the audio data.
Optionally, the processor 12 is configured to implement the following step when executing the computer program: marking the preset intonation change with a first visual element on a display interface before the audio data recorded for the predetermined sentence is acquired.
Optionally, the processor 12 is configured to implement the following steps when executing the computer program: when the comparison shows that the actual intonation change is consistent with the preset intonation change, indicating through a second visual element of the display interface that the intonation change of the predetermined sentence is correct; and when the comparison shows that the actual intonation change is inconsistent with the preset intonation change, indicating through a third visual element of the display interface that the intonation change of the predetermined sentence is incorrect.
Optionally, the processor 12 is configured to implement the following step when executing the computer program: after generating the feedback information indicating whether the current intonation of the predetermined sentence is correct, prompting the feedback information through a specific sound effect.
It can be understood that the server in the embodiment of the present application may include, but is not limited to: a single web server, a server group of multiple web servers, or a cloud based on cloud computing consisting of a large number of computers or web servers.
In addition, the present application further provides an intonation detection device applied to the client 2; as shown in fig. 8, the device includes:
an audio acquisition device 21 for recording audio data for a predetermined sentence;
a communication device 22 for sending the audio data to a server, so that the server analyzes the audio data, determines the actual intonation change in the predetermined sentence, and compares the actual intonation change with the preset intonation change corresponding to the predetermined sentence to generate feedback information indicating whether the current intonation of the predetermined sentence is correct;
and a display device 23 for displaying the feedback information on a display interface.
Optionally, in the intonation detection device provided in the embodiment of the present application, the display device 23 may be further configured to: mark the preset intonation change with a first visual element on the display interface before the audio data recorded for the predetermined sentence is acquired.
It can be understood that the client in the embodiment of the present application may include, but is not limited to: smart phones, tablets, MP4 players, MP3 players, PCs, PDAs, wearable devices, head-mounted display devices, and the like.
Further, the present application also provides an intonation detection system; as shown in fig. 9, the system includes the server 1 and the client 2 described above. The user performs pronunciation learning through the client: the client displays the content to be learned on the display interface, and can also output audio content in speech form to the user through an audio playback device such as a speaker. When the user practices the pronunciation of speech, the client collects the audio data of the user's pronunciation through the audio acquisition device and sends the audio data to the server, which carries out the intonation detection process. After the server analyzes the audio data and obtains the feedback information, it sends the feedback information to the client. The feedback information is displayed through the display device of the client, providing visual auxiliary information to the user.
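As a rough sketch of this client/server exchange (the application does not specify a transport or API; Flask, the /detect route, and the helper names here are illustrative assumptions):

```python
# Server side: receive the client's recording, run intonation detection,
# and return the feedback information for the client to display.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/detect", methods=["POST"])
def detect():
    audio_bytes = request.data                     # audio captured at the client
    sentence_id = request.args.get("sentence_id")  # identifies the predetermined sentence
    # run_intonation_detection is a hypothetical name for the analysis pipeline.
    feedback = run_intonation_detection(audio_bytes, sentence_id)
    return jsonify(feedback)

# Client side (separate process), e.g. with the requests library:
#   import requests
#   resp = requests.post("http://server/detect?sentence_id=42",
#                        data=open("recording.wav", "rb").read())
#   display_feedback(resp.json())   # hypothetical rendering helper
```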
Furthermore, the present application also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of any of the intonation detection methods disclosed in the foregoing embodiments.
The intonation detection apparatus, device, system and computer-readable storage medium provided by the application correspond to the intonation detection method above. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
To sum up, this application can automatically analyze the recorded audio data to determine whether the actual intonation change matches the preset intonation, and feed back to the user whether the intonation is correct. This helps the user understand the concept of intonation change and thereby effectively master intonation change in spoken English. At the same time, this application no longer requires a teacher to give live demonstration or correction, overcoming the limitations of time and space: the user can practice anytime and anywhere, which saves learning cost.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. An intonation detection method, comprising:
acquiring audio data recorded for a predetermined sentence;
analyzing the audio data to determine the actual intonation change in the predetermined sentence;
and comparing the actual intonation change with a preset intonation change corresponding to the predetermined sentence to generate feedback information indicating whether the current intonation of the predetermined sentence is correct.
2. The intonation detection method according to claim 1, wherein said analyzing the audio data to determine the actual intonation change in the predetermined sentence comprises:
analyzing the audio data to detect the vowel portion in the audio data;
determining the vibration frequency of the vowel portion and calculating the rate of change of the vibration frequency;
and determining the actual intonation change in the predetermined sentence based on the rate of change.
3. The intonation detection method according to claim 2, wherein said analyzing the audio data to detect the vowel portion of the audio data comprises:
performing forced segmentation and alignment on the audio data through speech recognition to obtain the vowel portion of the audio data.
4. The intonation detection method according to any one of claims 1 to 3, wherein before said acquiring of the audio data recorded for a predetermined sentence, the method further comprises:
marking the preset intonation change with a first visual element on a display interface.
5. The intonation detection method according to claim 4, wherein after generating the feedback information indicating whether the current intonation of the predetermined sentence is correct, the method further comprises:
when the comparison shows that the actual intonation change is consistent with the preset intonation change, indicating through a second visual element of the display interface that the intonation change of the predetermined sentence is correct;
and when the comparison shows that the actual intonation change is inconsistent with the preset intonation change, indicating through a third visual element of the display interface that the intonation change of the predetermined sentence is incorrect.
6. The intonation detection method according to claim 5, wherein after generating the feedback information indicating whether the current intonation of the predetermined sentence is correct, the method further comprises:
prompting the feedback information through a specific sound effect.
7. An intonation detection apparatus, comprising:
an acquisition module for acquiring audio data recorded for a predetermined sentence;
a determining module for analyzing the audio data and determining the actual intonation change in the predetermined sentence;
and a generating module for comparing the actual intonation change with a preset intonation change corresponding to the predetermined sentence and generating feedback information indicating whether the current intonation of the predetermined sentence is correct.
8. An intonation detection device, applied to a server, the device comprising:
a memory for storing a computer program;
a processor for implementing the steps of the intonation detection method according to any one of claims 1 to 6 when executing the computer program.
9. An intonation detection device, applied to a client, the device comprising:
an audio acquisition device for recording audio data for a predetermined sentence;
a communication device for sending the audio data to a server, so that the server analyzes the audio data, determines the actual intonation change in the predetermined sentence, and compares the actual intonation change with a preset intonation change corresponding to the predetermined sentence to generate feedback information indicating whether the current intonation of the predetermined sentence is correct;
and a display device for displaying the feedback information on a display interface.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the intonation detection method according to any one of claims 1 to 6.
CN201910696870.XA (priority date 2019-07-30, filing date 2019-07-30): Intonation detection method, apparatus, device and computer readable storage medium. Status: Pending.

Priority Applications (1)

Application Number: CN201910696870.XA; Priority date: 2019-07-30; Filing date: 2019-07-30; Title: Intonation detection method, apparatus, device and computer readable storage medium


Publications (1)

CN112309371A, published 2021-02-02

Family

ID=74485234

Family Applications (1)

CN201910696870.XA (Pending): publication CN112309371A, priority date 2019-07-30, filing date 2019-07-30

Country Status (1)

CN: CN112309371A

Citations (9)

* Cited by examiner, † Cited by third party

JPH02203396A * (Sharp Corp; priority 1989-02-01, published 1990-08-13): Feature extraction system for voice
US20040006461A1 * (Gupta Sunil K.; priority 2002-07-03, published 2004-01-08): Method and apparatus for providing an interactive language tutor
US20040006468A1 * (Lucent Technologies Inc.; priority 2002-07-03, published 2004-01-08): Automatic pronunciation scoring for language learning
CN101354889A * (北京中星微电子有限公司; priority 2008-09-18, published 2009-01-28): Method and apparatus for tonal modification of voice
CN101739870A * (深圳先进技术研究院; priority 2009-12-03, published 2010-06-16): Interactive language learning system and method
CN103310273A * (南京邮电大学; priority 2013-06-26, published 2013-09-18): Method for articulating Chinese vowels with tones based on the DIVA model
CN104485116A * (上海流利说信息技术有限公司; priority 2014-12-04, published 2015-04-01): Voice quality evaluation device, method and system
CN107507610A * (河南理工大学; priority 2017-09-28, published 2017-12-22): Chinese tone recognition method based on vowel fundamental frequency information
CN109272992A * (北京粉笔未来科技有限公司; priority 2018-11-27, published 2019-01-25): Spoken language assessment method and device, and device for generating a spoken language assessment model



Legal Events

PB01: Publication (application publication date: 2021-02-02)
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication