CN109326162A - Spoken language practice automatic evaluation method and device


Info

Publication number
CN109326162A
CN109326162A (application CN201811364229.8A)
Authority
CN
China
Prior art keywords: audio, information, practice, colloquial, standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811364229.8A
Other languages
Chinese (zh)
Inventor
罗德安
张春晓
夏林中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Information Technology
Original Assignee
Shenzhen Institute of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Shenzhen Institute of Information Technology filed Critical Shenzhen Institute of Information Technology
Priority to CN201811364229.8A priority Critical patent/CN109326162A/en
Publication of CN109326162A publication Critical patent/CN109326162A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 7/00: Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B 7/02: Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention, applicable to the field of audio signal processing, provides a spoken language practice automatic evaluation method and device, comprising: obtaining a standard spoken-language file and a practice audio corresponding to the standard spoken-language file; extracting first speech prosody information of the standard spoken-language file; extracting characteristic information of the practice audio, wherein the characteristic information of the practice audio includes second speech prosody information of the practice audio; analyzing disfluency factors of the practice audio according to the first speech prosody information and the second speech prosody information; and marking the positions of the disfluency factors in the practice audio together with feedback information, and outputting a first feedback audio. Embodiments of the present invention can improve the efficiency of spoken language training.

Description

Spoken language practice automatic evaluation method and device
Technical field
The present invention belongs to the field of audio signal processing, and in particular relates to a spoken language practice automatic evaluation method and device.
Background art
Existing automatic spoken-language evaluation technology mainly trains acoustic models and language models on a massive corpus, predicts a manual score through a statistical model built from teachers' scoring strategies, and feeds the result back to the learner in the form of text, charts, and the like.
Because it merely predicts a manual score, the above approach cannot completely and concretely reflect a learner's detailed performance or overall oral ability, so the efficiency of spoken language training is low.
Summary of the invention
In view of this, embodiments of the present invention provide a spoken language practice automatic evaluation method and device, so as to solve the problem of low spoken-language training efficiency in the prior art.
A first aspect of the embodiments of the present invention provides a spoken language practice automatic evaluation method, comprising:
obtaining a standard spoken-language file and a practice audio corresponding to the standard spoken-language file;
extracting first speech prosody information of the standard spoken-language file;
extracting characteristic information of the practice audio, wherein the characteristic information of the practice audio includes second speech prosody information of the practice audio;
analyzing disfluency factors of the practice audio according to the first speech prosody information and the second speech prosody information; and
marking positions of the disfluency factors in the practice audio together with feedback information, and outputting a first feedback audio.
A second aspect of the embodiments of the present invention provides a spoken language practice automatic evaluation device, comprising:
a first acquisition unit, configured to obtain a standard spoken-language file and a practice audio corresponding to the standard spoken-language file;
a first extraction unit, configured to extract first speech prosody information of the standard spoken-language file;
a second extraction unit, configured to extract characteristic information of the practice audio, wherein the characteristic information of the practice audio includes second speech prosody information of the practice audio;
an analysis unit, configured to analyze disfluency factors of the practice audio according to the first speech prosody information and the second speech prosody information; and
a first output unit, configured to mark positions of the disfluency factors in the practice audio together with feedback information, and to output a first feedback audio.
A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the spoken language practice automatic evaluation method described above.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the spoken language practice automatic evaluation method described above.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects: because the disfluency factors in a whole passage of spoken-language practice audio are analyzed, the specific disfluent places in the whole practice audio are marked, feedback information is provided, and a first feedback audio is output, the embodiments can reflect the learner's detailed performance over an entire spoken-language practice and feed back the evaluation result completely and concretely, so that the learner can intuitively see the specific areas to improve, which improves the efficiency of spoken language training.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the embodiments or the prior-art description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative labor.
Fig. 1 is a schematic flowchart of a first spoken language practice automatic evaluation method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a second spoken language practice automatic evaluation method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a spoken language practice automatic evaluation device provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of a terminal device provided by an embodiment of the present invention.
Detailed description of the embodiments
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, in order to provide a thorough understanding of the embodiments of the present invention. However, it will be clear to those skilled in the art that the present invention may also be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
To illustrate the technical solutions of the present invention, specific embodiments are described below.
It should be appreciated that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, wholes, steps, operations, elements and/or components, but does not preclude the presence or addition of one or more other features, wholes, steps, operations, elements, components and/or sets thereof.
It should also be understood that the terminology used in this specification is merely for the purpose of describing specific embodiments and is not intended to limit the application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to, and includes, any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrases "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".
In addition, in the description of the present application, the terms "first", "second", "third" and so on are used only to distinguish the descriptions and cannot be understood as indicating or implying relative importance.
Embodiment one:
Fig. 1 shows a schematic flowchart of a first spoken language practice automatic evaluation method provided by an embodiment of the present application, detailed as follows:
In S101, a standard spoken-language file and a practice audio corresponding to the standard spoken-language file are obtained.
An instruction is received and a standard spoken-language file is obtained; the spoken-language file may include a standard-pronunciation video file or a standard-pronunciation audio file. A practice audio corresponding to the standard spoken-language file is obtained, i.e., a practice audio recorded by the learner imitating the standard spoken-language file.
In S102, first speech prosody information of the standard spoken-language file is extracted.
The first speech prosody information of the standard spoken-language file may be extracted with a first audio extraction model trained in advance through machine learning.
Optionally, the first speech prosody information includes rhythm, intonation, and duration information of the standard spoken-language file.
The rhythm, intonation, duration, and other such information of the audio in the standard spoken-language file are extracted, where the rhythm includes the positions of pronunciation pauses in the audio, the intonation includes the pitch height, and the duration includes the time for which each word needs to be sustained.
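The patent does not specify how the extracted prosody information is represented; as a minimal illustrative sketch, the three quantities named above (pause positions, pitch height, per-word durations) could be grouped in a container such as the following. All field names here are assumptions, not part of the invention.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ProsodyInfo:
    """Illustrative container for the prosody information described above.

    Only the three quantities the description names are kept: rhythm
    (pause positions), intonation (pitch height), and duration (how
    long each word is sustained).
    """
    pauses: List[Tuple[float, float]] = field(default_factory=list)  # (start_s, end_s) of each pause
    pitch_hz: List[float] = field(default_factory=list)              # per-word pitch height
    durations_s: List[float] = field(default_factory=list)           # sustain time of each word, seconds

# Example: a two-word utterance with one pause between the words
standard = ProsodyInfo(pauses=[(0.42, 0.65)],
                       pitch_hz=[180.0, 210.0],
                       durations_s=[0.42, 0.55])
```

The same structure would hold both the first prosody information (from the standard file) and the second (from the practice audio), which keeps the later comparison step a field-by-field operation.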
In S103, characteristic information of the practice audio is extracted, wherein the characteristic information of the practice audio includes second speech prosody information of the practice audio.
The characteristic information of the practice audio may be extracted with a second audio extraction module trained in advance through machine learning, wherein the characteristic information of the practice audio includes the second speech prosody information of the practice audio.
Optionally, the second speech prosody information includes rhythm, intonation, and duration information of the practice audio.
The rhythm, intonation, duration, and other such information of the audio in the practice audio are extracted, where the rhythm includes the positions of pronunciation pauses in the audio, the intonation includes the pitch height, and the duration includes the time for which each word needs to be sustained.
In S104, disfluency factors of the practice audio are analyzed according to the first speech prosody information and the second speech prosody information.
Taking the first speech prosody information as the standard, the second speech prosody information is compared against the first speech prosody information to find the disfluency factors of the practice audio. A comparison model trained with a machine learning algorithm may automatically diagnose where the prosody of the practice audio deviates from that of the standard audio, yielding disfluency factors, which include rhythm too fast or too slow, pitch too high or too low, duration too long or too short, and the like.
Specifically, a rhythm tolerance threshold, a pitch tolerance threshold, a duration tolerance threshold, and the like may be set to judge the disfluency factors. Taking the judgment of duration as an example, the duration tolerance threshold is set to 2 seconds, and the second speech prosody information is compared with the first speech prosody information. If some speech duration in the second speech prosody information is more than 2 seconds longer than in the first speech prosody information, the duration is judged too long and a "duration too long" disfluency factor is fed back; if it is more than 2 seconds shorter, the duration is judged too short and a "duration too short" disfluency factor is fed back.
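The threshold comparison for the duration example above can be sketched as follows. This is an illustrative reading of the described rule, not the patented implementation; the function and variable names are invented, and only the 2-second duration tolerance is taken from the text.

```python
# Per-word durations are compared against the standard; deviations
# beyond the tolerance threshold are reported as disfluency factors.
DURATION_TOLERANCE_S = 2.0  # the 2-second tolerance from the example above

def find_duration_disfluencies(standard_durations, practice_durations,
                               tolerance=DURATION_TOLERANCE_S):
    """Return (word_index, verdict) pairs for words outside the tolerance."""
    factors = []
    for i, (std, prac) in enumerate(zip(standard_durations, practice_durations)):
        if prac - std > tolerance:
            factors.append((i, "duration too long"))
        elif std - prac > tolerance:
            factors.append((i, "duration too short"))
    return factors

# Word 1 is held 2.5 s longer than the standard, word 2 is 2.2 s shorter
print(find_duration_disfluencies([1.0, 3.0, 4.0], [1.5, 5.5, 1.8]))
# → [(1, 'duration too long'), (2, 'duration too short')]
```

Analogous comparisons with the rhythm and pitch tolerance thresholds would yield the "rhythm too fast/slow" and "pitch too high/low" factors the text lists.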
In S105, the positions of the disfluency factors in the practice audio are marked together with feedback information, and a first feedback audio is output.
The positions in the practice audio of the disfluency factors analyzed in S104 are marked; the marking may be applied to the specific time periods of the practice audio, optionally by marking the audio periods with disfluency factors in color. Meanwhile, the specific feedback information of each disfluency factor is recorded at its marked period; optionally, when a touch or click command is detected on a marked period, the specific feedback information is displayed. The specific feedback information includes the disfluency factor, i.e., feedback such as rhythm too fast or too slow, pitch too high or too low, duration too long or too short. After the practice audio is marked, the first feedback audio is obtained and output as feedback to the learner.
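As a hedged sketch of the marking step, each disfluency factor can be attached to the time period of the practice audio where it occurs, producing the annotation list that accompanies the first feedback audio. The dictionary keys and the input format are assumptions for illustration only.

```python
# Each input item is (start_s, end_s, description) for one disfluency
# factor found in the practice audio.
def mark_practice_audio(disfluency_factors):
    """Build the annotation list for the first feedback audio."""
    marks = []
    for start, end, description in disfluency_factors:
        marks.append({
            "period": (start, end),   # the highlighted segment, e.g. shown in color
            "feedback": description,  # shown when the mark is touched or clicked
        })
    return marks

marks = mark_practice_audio([(2.1, 4.6, "duration too long"),
                             (7.0, 7.8, "pitch too low")])
print(len(marks), marks[0]["feedback"])  # → 2 duration too long
```

A player rendering the first feedback audio would highlight each `period` on the timeline and reveal its `feedback` text on touch, matching the interaction the text describes.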
In this embodiment of the present invention, because the disfluency factors in a whole passage of spoken-language practice audio are analyzed, the specific disfluent places in the whole practice audio are marked with feedback information, and a first feedback audio is output, the learner's detailed performance over an entire spoken-language practice can be reflected and the evaluation result fed back completely and concretely, so that the learner can intuitively see the specific areas to improve, which improves the efficiency of spoken language training.
Embodiment two:
Fig. 2 shows a schematic flowchart of a second spoken language practice automatic evaluation method provided by an embodiment of the present application, detailed as follows:
In S201, a standard spoken-language file and a practice audio corresponding to the standard spoken-language file are obtained.
An instruction is received and a standard spoken-language file is obtained; the spoken-language file may include a standard-pronunciation video file or a standard-pronunciation audio file. A practice audio corresponding to the standard spoken-language file is obtained; the practice audio includes the audio recorded by the learner imitating the standard spoken-language file.
In S202, first speech prosody information of the standard spoken-language file is extracted.
The first speech prosody information of the standard spoken-language file may be extracted with a first audio extraction model trained in advance through machine learning.
In S203, characteristic information of the practice audio is extracted, wherein the characteristic information of the practice audio includes second speech prosody information of the practice audio and timbre information of the practice audio.
The characteristic information of the practice audio may be extracted with a third audio extraction module trained in advance through machine learning, wherein the characteristic information of the practice audio includes the second speech prosody information of the practice audio and the timbre information of the practice audio. The timbre information may be obtained by analyzing the frequency of each frame of the practice audio.
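The text says only that the timbre information may be obtained by analyzing the frequency of each frame; one minimal way to estimate a frame's dominant frequency, shown purely as an illustrative sketch, is autocorrelation over the frame's samples. The frame size and lag search range below are assumptions, and a real system would use a proper pitch tracker.

```python
import math

def frame_frequency(frame, sample_rate):
    """Estimate the dominant frequency of one audio frame by autocorrelation."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(20, len(frame) // 2):  # skip trivially small lags
        corr = sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag  # period in samples -> frequency in Hz

# Synthetic 200 Hz tone as a stand-in for one frame of practice audio
sr = 8000
frame = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
print(round(frame_frequency(frame, sr)))  # → 200
```

Applying such an estimate frame by frame yields the per-frame frequency sequence from which the timbre comparison in the later steps could work.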
In S204, disfluency factors of the practice audio are analyzed according to the first speech prosody information and the second speech prosody information.
Taking the first speech prosody information as the standard, the second speech prosody information is compared against the first speech prosody information to find the disfluency factors of the practice audio. A comparison model trained with a machine learning algorithm may automatically diagnose where the prosody of the practice audio deviates from that of the standard audio, yielding disfluency factors, which include rhythm too fast or too slow, pitch too high or too low, duration too long or too short, and the like.
Specifically, a rhythm tolerance threshold, a pitch tolerance threshold, a duration tolerance threshold, and the like may be set to judge the disfluency factors. Taking the judgment of duration as an example, the duration tolerance threshold is set to 2 seconds, and the second speech prosody information is compared with the first speech prosody information. If some speech duration in the second speech prosody information is more than 2 seconds longer than in the first speech prosody information, the duration is judged too long and a "duration too long" disfluency factor is fed back; if it is more than 2 seconds shorter, the duration is judged too short and a "duration too short" disfluency factor is fed back.
In S205, the positions of the disfluency factors in the practice audio are marked together with feedback information, and a first feedback audio is output.
The positions in the practice audio of the disfluency factors analyzed in S204, i.e., specific time periods of the practice audio, are marked, optionally by marking the audio periods with disfluency factors in red. Meanwhile, the specific feedback information of each disfluency factor is recorded at its marked period; optionally, when a touch instruction is detected on a marked period, the specific feedback information is displayed. The specific feedback information includes the disfluency factor, i.e., feedback such as rhythm too fast or too slow, pitch too high or too low, duration too long or too short. After the practice audio is marked, the first feedback audio is obtained and output as feedback to the learner.
In S206, a second feedback audio is output according to the standard spoken-language file and the timbre information of the practice audio.
Specifically, step S206 may include:
S206A1: synthesizing, according to the standard spoken-language file and the timbre information of the practice audio, a first standard audio whose timbre is identical to that of the practice audio.
According to the audio content of the standard spoken-language file and the learner's timbre information in the practice audio, a first standard audio with the learner's timbre is synthesized.
Optionally, the content to be expressed and the pronunciation rules of the standard spoken-language file are extracted through the text information of the standard spoken-language file and its first speech prosody information.
The text information of the standard spoken-language file may be extracted with a language-and-text extraction model trained in advance. After the content to be expressed and the pronunciation rules in the standard spoken-language file are obtained, the first standard audio with the learner's timbre is synthesized according to that content, those pronunciation rules, and the learner's timbre information. In the synthesized first standard audio, the pronunciation frequency of each frame of speech is identical to the learner's, i.e., the timbre of the first standard audio is identical to that of the learner's practice audio, while the expressed content and the speech prosody are identical to those of the standard spoken-language file.
S206A2: outputting the first standard audio.
The synthesized first standard audio is output and fed back to the learner.
Alternatively, step S206 may include:
S206B1: obtaining, according to the standard spoken-language file and the timbre information of the practice audio, a second standard audio whose timbre is closest to that of the practice audio from among the pre-stored standard audios.
For a standard spoken-language file, standard audios with particular timbres may be pre-stored. For example, a standard audio may be pre-stored in which a person whose timbre is close to the learner's, and who has a correct grasp of the standard pronunciation, imitates the standard spoken-language file. Alternatively, multiple standard audios of different timbres, such as a male voice, a female voice, and a child's voice, may be pre-stored. The audio whose timbre is closest to that of the practice audio is obtained from these pre-stored standard audios as the second standard audio; for example, the second standard audio whose speech pronunciation frequency differs least from that of the practice audio is selected from the pre-stored standard audios.
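The nearest-timbre selection described above reduces to a minimum search over the pre-stored audios, using the pronunciation-frequency difference as the distance. The sketch below is illustrative; the voice labels and frequency values are invented, and a real system would compare richer timbre features than a single frequency.

```python
def closest_standard_audio(practice_freq_hz, prestored):
    """Pick the pre-stored voice whose pronunciation frequency is nearest.

    prestored: dict mapping a voice label to its pronunciation frequency (Hz).
    """
    return min(prestored, key=lambda label: abs(prestored[label] - practice_freq_hz))

# Hypothetical pre-stored standard audios of different timbres
voices = {"male": 120.0, "female": 210.0, "child": 300.0}
print(closest_standard_audio(198.0, voices))  # → female
```

The selected label would then index the actual pre-stored second standard audio to output in S206B2.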
S206B2: outputting the second standard audio.
The pre-stored second standard audio is output and fed back to the learner.
In this embodiment of the present invention, after the first feedback audio containing the disfluency-factor feedback is output, a second feedback audio is further output according to the standard spoken-language file and the timbre information of the practice audio, providing the learner with a standard audio whose timbre is identical or close to the learner's own, so that the learner can imitate the standard pronunciation more quickly and accurately, further improving the efficiency of spoken language training.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present invention.
Embodiment three:
Fig. 3 shows a structural schematic diagram of a spoken language practice automatic evaluation device provided by an embodiment of the present application; for convenience of explanation, only the parts relevant to the embodiment of the present application are illustrated:
The spoken language practice automatic evaluation device includes: a first acquisition unit 31, a first extraction unit 32, a second extraction unit 33, an analysis unit 34, and a first output unit 35. Wherein:
The first acquisition unit 31 is configured to obtain a standard spoken-language file and a practice audio corresponding to the standard spoken-language file.
The spoken-language file may include a standard-pronunciation video file or a standard-pronunciation audio file. The practice audio corresponding to the standard spoken-language file, i.e., a practice audio recorded by the learner imitating the standard spoken-language file, is obtained.
The first extraction unit 32 is configured to extract first speech prosody information of the standard spoken-language file.
Optionally, the first extraction unit includes a first audio extraction model trained in advance through machine learning, configured to extract the first speech prosody information of the standard spoken-language file.
Optionally, the first extraction unit 32 includes a first speech prosody information extraction module, configured to extract the rhythm, intonation, and duration information of the standard spoken-language file.
The second extraction unit 33 is configured to extract characteristic information of the practice audio, wherein the characteristic information of the practice audio includes second speech prosody information of the practice audio.
Optionally, the second extraction unit includes a second audio extraction module trained in advance through machine learning, configured to extract the characteristic information of the practice audio, wherein the characteristic information of the practice audio includes the second speech prosody information of the practice audio. Optionally, the characteristic information further includes the timbre information of the practice audio.
The second extraction unit 33 includes a second speech prosody information extraction module, configured to extract the rhythm, intonation, and duration information of the practice audio.
The analysis unit 34 is configured to analyze disfluency factors of the practice audio according to the first speech prosody information and the second speech prosody information.
Taking the first speech prosody information as the standard, the second speech prosody information is compared against the first speech prosody information to find the disfluency factors of the practice audio. A comparison model trained with a machine learning algorithm may automatically diagnose where the prosody of the practice audio deviates from that of the standard audio, yielding disfluency factors including rhythm too fast or too slow, pitch too high or too low, duration too long or too short, and the like.
The first output unit 35 is configured to mark positions of the disfluency factors in the practice audio together with feedback information, and to output a first feedback audio.
The positions in the practice audio of the analyzed disfluency factors, i.e., specific time periods of the practice audio, are marked, optionally by marking the audio periods with disfluency factors in red. Meanwhile, the specific feedback information of each disfluency factor is recorded at its marked period; optionally, when a touch instruction is detected on a marked period, the specific feedback information is displayed. The specific feedback information includes the disfluency factor, i.e., feedback such as rhythm too fast or too slow, pitch too high or too low, duration too long or too short. After the practice audio is marked, the first feedback audio is obtained and output as feedback to the learner.
Optionally, the spoken language practice automatic evaluation device further includes:
a second output unit, configured to output a second feedback audio according to the standard spoken-language file and the timbre information of the practice audio.
Optionally, the second output unit includes a synthesis module and a first standard audio output module:
the synthesis module, configured to synthesize, according to the standard spoken-language file and the timbre information of the practice audio, a first standard audio whose timbre is identical to that of the practice audio;
the first standard audio output module, configured to output the first standard audio.
Optionally, the second output unit includes a second acquisition unit and a second standard audio output module:
the second acquisition unit, configured to obtain, according to the standard spoken-language file and the timbre information of the practice audio, a second standard audio whose timbre is closest to that of the practice audio from among the pre-stored standard audios;
the second standard audio output module, configured to output the second standard audio.
In this embodiment of the present invention, because the disfluency factors in a whole passage of spoken-language practice audio are analyzed, the specific disfluent places in the whole practice audio are marked with feedback information, and a first feedback audio is output, the learner's detailed performance over an entire spoken-language practice can be reflected and the evaluation result fed back completely and concretely, so that the learner can intuitively see the specific areas to improve, which improves the efficiency of spoken language training.
Embodiment four:
Fig. 4 is a schematic diagram of a terminal device provided by an embodiment of the present invention. As shown in Fig. 4, the terminal device 4 of this embodiment includes: a processor 40, a memory 41, and a computer program 42, such as a spoken language practice automatic evaluation program, stored in the memory 41 and executable on the processor 40. When the processor 40 executes the computer program 42, the steps in each of the above spoken language practice automatic evaluation method embodiments are implemented, such as steps S101 to S105 shown in Fig. 1. Alternatively, when the processor 40 executes the computer program 42, the functions of the modules/units in each of the above device embodiments are implemented, such as the functions of modules 31 to 35 shown in Fig. 3.
Illustratively, the computer program 42 may be divided into one or more modules/units, which are stored in the memory 41 and executed by the processor 40 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of completing particular functions, the instruction segments being used to describe the execution process of the computer program 42 in the terminal device 4. For example, the computer program 42 may be divided into a first acquisition unit, a first extraction unit, a second extraction unit, an analysis unit, and a first output unit, whose specific functions are as follows:
a first acquisition unit, configured to acquire a standard spoken-language file and practice audio corresponding to the standard spoken-language file;
a first extraction unit, configured to extract first speech-rhythm information of the standard spoken-language file;
a second extraction unit, configured to extract feature information of the practice audio, wherein the feature information of the practice audio includes second speech-rhythm information of the practice audio;
an analysis unit, configured to analyze disfluency factors of the practice audio according to the first speech-rhythm information and the second speech-rhythm information;
a first output unit, configured to mark positions of the disfluency factors in the practice audio together with feedback information, and output a first feedback audio.
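The analysis unit and first output unit described above can be sketched in miniature as follows. This is an illustrative sketch only, not the patented implementation: the per-word data model (word, duration, trailing pause) and the deviation thresholds are assumptions made for the example.

```python
# Minimal sketch of the analysis unit: compare second speech-rhythm
# information (practice audio) against first speech-rhythm information
# (standard file) and return marked disfluent positions with feedback.
def analyze_disfluency(standard_rhythm, practice_rhythm, tolerance=0.5):
    """Each rhythm list holds (word, duration_s, pause_after_s) tuples,
    assumed aligned one-to-one between standard file and practice audio."""
    findings = []
    for i, ((w, d_std, p_std), (_, d_prc, p_prc)) in enumerate(
            zip(standard_rhythm, practice_rhythm)):
        if d_prc > d_std * (1 + tolerance):       # word drawn out vs. standard
            findings.append((i, w, "drawn-out word"))
        if p_prc > p_std + 0.4:                   # abnormally long pause
            findings.append((i, w, "long pause after word"))
    return findings

standard = [("good", 0.30, 0.05), ("morning", 0.45, 0.10)]
practice = [("good", 0.32, 0.80), ("morning", 0.90, 0.10)]
print(analyze_disfluency(standard, practice))
```

In a full system the position/feedback pairs returned here would drive the first output unit, which overlays the marks and feedback prompts on the practice audio.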
The terminal device 4 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 40 and the memory 41. Those skilled in the art will understand that Fig. 4 is only an example of the terminal device 4 and does not constitute a limitation on the terminal device 4, which may include more or fewer components than illustrated, combine certain components, or use different components; for example, the terminal device may further include input/output devices, a network access device, a bus, and the like.
The processor 40 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or internal memory of the terminal device 4. The memory 41 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 4. Further, the memory 41 may include both an internal storage unit and an external storage device of the terminal device 4. The memory 41 is used to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been output or is to be output.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the division into the above functional units and modules is illustrated merely as an example; in practical applications, the above functions may be allocated to different functional units and modules as needed, i.e., the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for convenience of distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described or recorded in detail in a given embodiment, reference may be made to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative; for instance, the division of the modules or units is only a logical function division, and there may be other division manners in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the processes of the above method embodiments by instructing relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source-code form, object-code form, an executable file, certain intermediate forms, or the like. The computer-readable medium may include any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electric carrier signals and telecommunication signals.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.

Claims (10)

1. A spoken-practice automatic evaluation method, characterized by comprising:
acquiring a standard spoken-language file and practice audio corresponding to the standard spoken-language file;
extracting first speech-rhythm information of the standard spoken-language file;
extracting feature information of the practice audio, wherein the feature information of the practice audio comprises second speech-rhythm information of the practice audio;
analyzing disfluency factors of the practice audio according to the first speech-rhythm information and the second speech-rhythm information; and
marking positions of the disfluency factors in the practice audio together with feedback information, and outputting a first feedback audio.
2. The spoken-practice automatic evaluation method according to claim 1, characterized in that:
the first speech-rhythm information comprises rhythm, intonation, and duration information of speech in the standard spoken-language file; and
the second speech-rhythm information comprises rhythm, intonation, and duration information of speech in the practice audio.
3. The spoken-practice automatic evaluation method according to claim 1, characterized in that the feature information of the practice audio further comprises timbre information of the practice audio, and in this case, after the marking positions of the disfluency factors in the practice audio together with feedback information and outputting the first feedback audio, the method further comprises:
outputting a second feedback audio according to the standard spoken-language file and the timbre information of the practice audio.
4. The spoken-practice automatic evaluation method according to claim 3, characterized in that the outputting a second feedback audio according to the standard spoken-language file and the timbre information of the practice audio comprises:
synthesizing, according to the standard spoken-language file and the timbre information of the practice audio, a first standard audio having the same timbre as the practice audio; and
outputting the first standard audio.
5. The spoken-practice automatic evaluation method according to claim 3, characterized in that the outputting a second feedback audio according to the standard spoken-language file and the timbre information of the practice audio comprises:
obtaining, according to the standard spoken-language file and the timbre information of the practice audio, a second standard audio whose timbre is closest to that of the practice audio from among prestored standard audios; and
outputting the second standard audio.
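The selection step of claim 5, choosing from prestored standard audios the one whose timbre is closest to the practice audio, can be sketched as a nearest-neighbor search over timbre feature vectors. The feature representation (a fixed-length vector, e.g. averaged spectral features per speaker) and cosine similarity are illustrative assumptions; the claim does not fix either choice.

```python
# Sketch: pick the prestored standard audio whose timbre vector is most
# similar (by cosine similarity) to the timbre vector of the practice audio.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def closest_standard(practice_timbre, prestored):
    """prestored: mapping of speaker id -> timbre feature vector."""
    return max(prestored, key=lambda k: cosine(practice_timbre, prestored[k]))

prestored = {
    "speaker_a": [1.0, 0.1, 0.0],
    "speaker_b": [0.0, 0.9, 0.4],
}
print(closest_standard([0.1, 1.0, 0.3], prestored))  # speaker_b
```

The returned identifier would index the prestored second standard audio to be output to the learner.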
6. A spoken-practice automatic evaluation apparatus, characterized by comprising:
a first acquisition unit, configured to acquire a standard spoken-language file and practice audio corresponding to the standard spoken-language file;
a first extraction unit, configured to extract first speech-rhythm information of the standard spoken-language file;
a second extraction unit, configured to extract feature information of the practice audio, wherein the feature information of the practice audio comprises second speech-rhythm information of the practice audio;
an analysis unit, configured to analyze disfluency factors of the practice audio according to the first speech-rhythm information and the second speech-rhythm information; and
a first output unit, configured to mark positions of the disfluency factors in the practice audio together with feedback information, and output a first feedback audio.
7. The spoken-practice automatic evaluation apparatus according to claim 6, characterized in that:
the first extraction unit comprises a first speech-rhythm information extraction module, configured to extract rhythm, intonation, and duration information of speech in the standard spoken-language file; and
the second extraction unit comprises a second speech-rhythm information extraction module, configured to extract rhythm, intonation, and duration information of speech in the practice audio.
8. The spoken-practice automatic evaluation apparatus according to claim 6, characterized in that the apparatus further comprises:
a second output unit, configured to output a second feedback audio according to the standard spoken-language file and the timbre information of the practice audio.
9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
CN201811364229.8A 2018-11-16 2018-11-16 A kind of spoken language exercise method for automatically evaluating and device Pending CN109326162A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811364229.8A CN109326162A (en) 2018-11-16 2018-11-16 A kind of spoken language exercise method for automatically evaluating and device


Publications (1)

Publication Number Publication Date
CN109326162A (en) 2019-02-12

Family

ID=65257851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811364229.8A Pending CN109326162A (en) 2018-11-16 2018-11-16 A kind of spoken language exercise method for automatically evaluating and device

Country Status (1)

Country Link
CN (1) CN109326162A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109903594A (en) * 2019-04-02 2019-06-18 北京儒博科技有限公司 Spoken language exercise householder method, device, equipment and storage medium
CN111353066A (en) * 2020-02-20 2020-06-30 联想(北京)有限公司 Information processing method and electronic equipment
CN111524045A (en) * 2020-04-13 2020-08-11 北京猿力教育科技有限公司 Dictation method and device
CN111739527A (en) * 2020-06-01 2020-10-02 广东小天才科技有限公司 Speech recognition method, electronic device and computer readable storage medium
CN111951626A (en) * 2019-05-16 2020-11-17 上海流利说信息技术有限公司 Language learning apparatus, method, medium, and computing device
CN112017694A (en) * 2020-08-25 2020-12-01 天津洪恩完美未来教育科技有限公司 Voice data evaluation method and device, storage medium and electronic device
CN112116832A (en) * 2019-06-19 2020-12-22 广东小天才科技有限公司 Spoken language practice method and device
CN115346421A (en) * 2021-05-12 2022-11-15 北京猿力未来科技有限公司 Spoken language fluency scoring method, computing device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101197084A (en) * 2007-11-06 2008-06-11 安徽科大讯飞信息科技股份有限公司 Automatic spoken English evaluating and learning system
CN103559892A (en) * 2013-11-08 2014-02-05 安徽科大讯飞信息科技股份有限公司 Method and system for evaluating spoken language
CN106782609A (en) * 2016-12-20 2017-05-31 杨白宇 A kind of spoken comparison method
US20170154623A1 (en) * 2014-02-21 2017-06-01 Microsoft Technology Licensing, Llc. Pronunciation learning through correction logs
CN106847260A (en) * 2016-12-20 2017-06-13 山东山大鸥玛软件股份有限公司 A kind of Oral English Practice automatic scoring method of feature based fusion
CN107818795A (en) * 2017-11-15 2018-03-20 苏州驰声信息科技有限公司 The assessment method and device of a kind of Oral English Practice
CN108806719A (en) * 2018-06-19 2018-11-13 合肥凌极西雅电子科技有限公司 Interacting language learning system and its method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG Pan et al.: "Research on Soft Computing Methods in Optimization and Control", 31 January 2017, Hubei Science and Technology Press *
HUANG Lihe: "A Study of Illocutionary Force Based on Multimodal Corpora: New Explorations in Multimodal Pragmatics", 30 September 2018, Shanghai Foreign Language Education Press *


Similar Documents

Publication Publication Date Title
CN109326162A (en) A kind of spoken language exercise method for automatically evaluating and device
CN109036464A (en) Pronounce error-detecting method, device, equipment and storage medium
CN108428446A (en) Audio recognition method and device
CN110085261A (en) A kind of pronunciation correction method, apparatus, equipment and computer readable storage medium
CN101751919B (en) Spoken Chinese stress automatic detection method
CN110600033B (en) Learning condition evaluation method and device, storage medium and electronic equipment
CN109545197B (en) Voice instruction identification method and device and intelligent terminal
CN106503805A (en) A kind of bimodal based on machine learning everybody talk with sentiment analysis system and method
CN108711421A (en) A kind of voice recognition acoustic model method for building up and device and electronic equipment
CN111241357A (en) Dialogue training method, device, system and storage medium
CN111833853A (en) Voice processing method and device, electronic equipment and computer readable storage medium
CN101551947A (en) Computer system for assisting spoken language learning
CN106897559A (en) A kind of symptom and sign class entity recognition method and device towards multi-data source
CN104978884A (en) Teaching system of preschool education profession student music theory and solfeggio learning
CN109448704A (en) Construction method, device, server and the storage medium of tone decoding figure
CN110136748A (en) A kind of rhythm identification bearing calibration, device, equipment and storage medium
CN109492221A (en) Information reply method based on semantic analysis and wearable device
CN109300469A (en) Simultaneous interpretation method and device based on machine learning
CN109858009A (en) Device, method and its computer storage medium of control instruction are generated according to text
CN110111778A (en) A kind of method of speech processing, device, storage medium and electronic equipment
CN109388705A (en) A kind of text intent classifier method
CN107437090A (en) The continuous emotion Forecasting Methodology of three mode based on voice, expression and electrocardiosignal
CN107844531A (en) Answer output intent, device and computer equipment
CN104572617A (en) Oral test answer deviation detection method and device
CN110349567A (en) The recognition methods and device of voice signal, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190212