CN116894442A - Language translation method and system for correcting guide pronunciation - Google Patents


Info

Publication number
CN116894442A
CN116894442A (application number CN202311159416.3A)
Authority
CN
China
Prior art keywords
pronunciation
audio
byte
user
correction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311159416.3A
Other languages
Chinese (zh)
Other versions
CN116894442B (en)
Inventor
刘新星 (Liu Xinxing)
陈新 (Chen Xin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Linyi University
Original Assignee
Linyi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Linyi University filed Critical Linyi University
Priority to CN202311159416.3A priority Critical patent/CN116894442B/en
Publication of CN116894442A publication Critical patent/CN116894442A/en
Application granted granted Critical
Publication of CN116894442B publication Critical patent/CN116894442B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use, for comparison or discrimination
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to the technical field of language translation, and specifically discloses a language translation method and system for correcting guide pronunciation. By identifying the pronunciation audio produced by a specified user after guided correction and analyzing the user's pronunciation correction improvement coefficient, the method provides more scientific data support for the subsequent analysis of the user's corrected audio and improves the user's pronunciation standard to a certain extent. At the same time, by preprocessing the specified user's compliance correction pronunciation audio before performing text translation conversion, interference factors in that audio are effectively reduced, the subsequent text translation conversion becomes more reliable, and the clarity, audibility and understandability of the audio are improved.

Description

Language translation method and system for correcting guide pronunciation
Technical Field
The application relates to the technical field of language translation, in particular to a language translation method and a language translation system for correcting guide pronunciation.
Background
At present, with the rapid development of cultural exchange between countries, more and more people want to understand foreign cultures through foreign language learning, which has therefore become a popular topic in recent years. In foreign language learning, however, the prior art still has significant shortcomings in correcting a user's pronunciation audio and translating it into the required foreign-language text, which can adversely affect the user's learning to a certain extent. To ensure that the user's pronunciation is more accurate, the pronunciation audio needs to be corrected more carefully, so that the subsequently translated text is more accurate.
Today, the deficiencies of language translation for correcting guide pronunciation are manifested in the following ways: in the prior art, a user's pronunciation audio is usually corrected only once, without considering that the corrected audio may still differ from the standard audio and require a second correction. Moreover, the corrected audio is generally not preprocessed; performing text translation conversion on it directly leaves interference factors in the audio, which negatively affects the translation analysis quality and the understandability of the audio.
For example, the patent application with publication No. CN111968676A discloses a pronunciation correction method and device, an electronic apparatus and a storage medium. Speech segments with lower scores are obtained through speech evaluation, and a pronunciation correction mode is determined by identifying the facial image information of the user corresponding to those segments, so that by combining speech and images, pronunciation correction can be carried out quickly and accurately for students during spoken-language teaching.
For example, the patent with publication No. CN112037788B discloses a speech correction fusion method that collects the speech data and video data of a speaker, performs punctuation preprocessing on the mouth shapes captured in the video data, and compares the speech data with an audio database to obtain a speech recognition result. Adding lip-reading recognition on top of speech recognition effectively removes the influence of accents on the recognition, and recognizing the spoken content through the lips makes the result more accurate.
However, in the process of implementing the technical solution of the embodiments of the present application, the inventors discovered that the above technologies have at least the following technical problems:
in the prior art, a user's pronunciation audio is generally identified and corrected only once by electronic means. Although this corrects the user's audio to a certain extent, the corrected audio may still not be standard enough, so that the translated text cannot be obtained accurately when the audio is subsequently converted into text.
Disclosure of Invention
Aiming at the defects of the prior art, the application provides a language translation method and system for correcting guide pronunciation, which can effectively solve the problems described in the background art.
In order to achieve the above purpose, the application is realized by the following technical solution. The first aspect of the present application provides a language translation method for correcting guide pronunciation, comprising: step one, obtaining the required translation conversion language of a specified user, identifying the pronunciation audio of the specified user, and analyzing the pronunciation error index of each byte corresponding to the specified user.
Step two, issuing a correction prompt for the specified user's pronunciation, translating and converting the corrected standard pronunciation audio into text, and calculating the pronunciation conversion degree coefficient of the specified user.
Step three, identifying the pronunciation audio of the specified user after guided correction, and analyzing the pronunciation correction improvement coefficient of the specified user.
Step four, preprocessing the compliance correction pronunciation audio of the specified user, performing text translation conversion, and calculating the compliance value of the text translation conversion, so as to issue a management prompt for the converted translation text of the specified user's audio.
As a further method, the specific analysis process of identifying the pronunciation audio of the specified user is as follows: identifying, screening and counting the byte audio segments in the voice audio of the specified user, and then identifying and counting the associated parameters of each byte audio segment, where the associated parameters comprise duration, frequency and intensity, i denotes the number of each byte, i = 1, 2, ..., m, and m denotes the number of bytes. The standard duration, standard frequency and standard intensity corresponding to each byte are extracted from the language information base, and the sound spectrum coincidence coefficient corresponding to each byte of the specified user is calculated from the deviations of the duration, frequency and intensity from their standard values, weighted by predefined correction factors corresponding to duration, frequency and intensity. The input perception waveform corresponding to each byte audio of the specified user is collected and compared with the standard perception waveform for that byte extracted from the language information base, yielding the waveform height difference and the input perception waveform overlap length corresponding to each byte audio of the specified user; at the same time, the predefined permissible waveform height difference corresponding to each byte audio and the standard perception waveform length corresponding to each byte are extracted. The perception waveform coincidence coefficient corresponding to each byte of the specified user is then calculated from the waveform height difference and the overlap length, normalized by the predefined permissible waveform height difference and the standard perception waveform length.
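As a rough illustration of the byte-level comparison just described, the sketch below computes a sound spectrum coincidence coefficient as a correction-factor-weighted relative deviation of duration, frequency and intensity from their standard values. The exact claimed formula is not reproduced in this text, so the weighted form and the factors a1, a2, a3 are illustrative assumptions, not the patented formula.

```python
def spectrum_coincidence(duration, freq, intensity,
                         std_duration, std_freq, std_intensity,
                         a1=1.0, a2=1.0, a3=1.0):
    """Illustrative sound spectrum coincidence coefficient: 1 minus the
    correction-factor-weighted mean relative deviation of duration,
    frequency and intensity from their standard values (assumed form)."""
    dev = (a1 * abs(duration - std_duration) / std_duration
           + a2 * abs(freq - std_freq) / std_freq
           + a3 * abs(intensity - std_intensity) / std_intensity) / 3
    return max(0.0, 1.0 - dev)

# A byte whose parameters match the standards exactly scores 1.0;
# any deviation lowers the coefficient toward 0.
print(spectrum_coincidence(0.21, 220.0, 60.0, 0.21, 220.0, 60.0))
```

A byte with non-standard duration, frequency or intensity yields a coefficient below 1, which is what makes it a candidate for the correction prompt described later.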
The average tone value corresponding to each byte audio of the specified user is acquired, the standard tone value corresponding to each byte audio is extracted from the language information base, and the tone coincidence coefficient corresponding to each byte of the specified user is calculated from the deviation of the average tone value from the standard tone value, using a predefined correction factor corresponding to byte tone, where e denotes a natural constant in the exponential form of the calculation. As a further method, the specific analysis process of the pronunciation error index of each byte corresponding to the specified user is as follows: according to the sound spectrum coincidence coefficient, the perception waveform coincidence coefficient and the tone coincidence coefficient corresponding to each byte of the specified user, the pronunciation error index of each byte is calculated as a combination of the three coincidence coefficients weighted by predefined weight factors corresponding to the sound spectrum coincidence coefficient, the perception waveform coincidence coefficient and the tone coincidence coefficient. According to the pronunciation error index of each byte corresponding to the specified user, the erroneous bytes in the audio of the specified user are obtained, a correction prompt is issued for those erroneous bytes, and the corrected standard pronunciation audio is translated and converted into text, which is recorded as the first translation conversion text.
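The weighting of the three coincidence coefficients into a per-byte pronunciation error index can be sketched as follows. The complement-based form, the weight values b1, b2, b3 and the error limit are illustrative assumptions rather than the claimed formula.

```python
def pronunciation_error_index(spectrum_c, waveform_c, tone_c,
                              b1=0.4, b2=0.3, b3=0.3):
    """Illustrative per-byte pronunciation error index: the weighted
    shortfall of the three coincidence coefficients from perfect
    coincidence (assumed form; b1 + b2 + b3 = 1)."""
    return b1 * (1 - spectrum_c) + b2 * (1 - waveform_c) + b3 * (1 - tone_c)

def erroneous_bytes(indexes, limit=0.2):
    """Byte numbers whose error index exceeds the assumed permitted limit."""
    return [i for i, p in enumerate(indexes, start=1) if p > limit]

indexes = [pronunciation_error_index(1.0, 1.0, 1.0),
           pronunciation_error_index(0.5, 0.6, 0.7)]
print(erroneous_bytes(indexes))  # only the second byte is flagged
```

The flagged byte numbers are exactly the "erroneous bytes" for which the correction prompt would be issued.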
As a further method, the specific analysis process of the pronunciation conversion degree coefficient of the specified user is as follows: according to the pronunciation error index of each byte corresponding to the specified user, it is compared with the preset reference pronunciation conversion degree limit value corresponding to each pronunciation error index interval, to obtain the reference pronunciation conversion degree limit value of each byte corresponding to the specified user. The duration difference, frequency difference, intensity difference, input perception waveform length difference and tone difference corresponding to each byte audio of the specified user are extracted in turn. The pronunciation conversion degree coefficient of the specified user is then calculated from these differences, normalized by the predefined unit duration difference, unit frequency difference, unit intensity difference, unit input perception waveform length difference and unit tone difference. As a further method, the specific analysis process of the pronunciation correction improvement coefficient of the specified user is as follows: identifying the pronunciation audio of the specified user after guided correction to obtain the corrected pronunciation audio of the specified user, screening and extracting the correction frequency spectrum of each byte of the specified user, and obtaining the frequency spectrum of each byte corresponding to the specified user according to each byte audio. The correction frequency spectrum of each byte of the specified user is overlapped and compared with the frequency spectrum of the corresponding byte, yielding the frequency spectrum overlap length corresponding to each byte of the specified user; at the same time, the standard spectrum length corresponding to each byte is extracted from the language information base.
According to the analysis mode of the pronunciation error index of each byte corresponding to the specified user, the corrected pronunciation audio of the specified user is analyzed in the same way to obtain the pronunciation error index of each corrected byte, and the predefined permissible pronunciation error limit value corresponding to each byte is extracted at the same time. According to the pronunciation correction improvement coefficient threshold value corresponding to each predefined pronunciation conversion degree coefficient interval, the pronunciation correction improvement coefficient threshold value of the specified user is obtained by matching. The pronunciation correction improvement coefficient of the specified user is then calculated from the spectrum overlap lengths and the corrected pronunciation error indexes, using preset correction factors corresponding to the spectrum length and the pronunciation error index. As a further method, the compliance correction pronunciation audio of the specified user is preprocessed, and the specific analysis process is as follows: the preprocessing parameters corresponding to the compliance correction pronunciation audio of the specified user are acquired, where the preprocessing parameters comprise the gain value and the frequency value of each byte, and the maximum byte gain value and the maximum byte frequency value corresponding to the compliance correction pronunciation audio of the specified user are obtained through screening. The predefined permitted byte gain value and reference byte frequency value corresponding to the audio are extracted, and the audio preprocessing degree index of the specified user is calculated from the maximum byte gain and frequency values relative to the permitted values, using predefined correction factors corresponding to the byte gain value and the byte frequency value.
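The preprocessing check on the compliance correction pronunciation audio can be sketched as below, where the maximum byte gain and frequency values are measured against the permitted reference values. The ratio form and the correction factors c1, c2 are illustrative assumptions, not the claimed formula.

```python
def preprocessing_degree(max_gain, max_freq,
                         permitted_gain, reference_freq,
                         c1=0.5, c2=0.5):
    """Illustrative audio preprocessing degree index: how far the
    maximum byte gain and maximum byte frequency stay within the
    permitted reference values, combined with correction factors
    (assumed form; a fully in-range audio scores 1.0)."""
    gain_part = min(max_gain, permitted_gain) / permitted_gain
    freq_part = min(max_freq, reference_freq) / reference_freq
    return c1 * gain_part + c2 * freq_part

# Gain in dB and frequency in Hz are hypothetical example values.
print(preprocessing_degree(18.0, 3000.0, 24.0, 4000.0))
```

A higher index corresponds to audio whose gain and frequency content sit comfortably inside the permitted range, which later selects the reference byte counts used in the compliance value.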
As a further method, the specific analysis process of the compliance value of the text translation conversion is as follows: text translation conversion is performed on the compliance correction pronunciation audio of the specified user to obtain the compliance correction text of the specified user, and the number of coincident bytes and the number of repeated identifiers in the compliance correction text of the specified user are extracted. The first translation conversion text is compared with the compliance correction text of the specified user to obtain the number of compliant coincident bytes and the number of compliant repeated identifiers. According to the audio preprocessing degree index of the specified user, it is compared with the permitted reference coincident byte number and the permitted reference repeated identifier number corresponding to each preset audio preprocessing degree index interval, to obtain the permitted reference coincident byte number and the permitted reference repeated identifier number of the specified user.
The compliance value of the text translation conversion is then calculated from the compliant coincident byte number and the compliant repeated identifier number relative to the permitted reference values, using predefined correction factors corresponding to the coincident byte number and the repeated identifier number. As a further method, the management prompt for the converted translation text of the specified user's audio proceeds as follows: according to the pronunciation correction improvement coefficient of the specified user and the required translation conversion language of the specified user, matching with the reference correction improvement compliance threshold corresponding to each predefined pronunciation correction improvement coefficient interval, to obtain the reference correction improvement compliance threshold corresponding to the required translation conversion language of the specified user. The compliance value of the text translation conversion is compared with the reference correction improvement compliance threshold corresponding to the translation conversion language of the specified user, and if the compliance value is lower than that threshold, a management prompt is issued for the translation text of the specified user's audio. A second aspect of the present application provides a language translation system for correcting guide pronunciation, comprising: a pronunciation audio identification and analysis module, used for acquiring the required translation conversion language of the specified user, identifying the pronunciation audio of the specified user, and analyzing the pronunciation error index of each byte corresponding to the specified user.
A pronunciation correction translation module, used for issuing a correction prompt for the specified user's pronunciation, translating and converting the corrected standard pronunciation audio into text, and calculating the pronunciation conversion degree coefficient of the specified user.
A correction audio identification and analysis module, used for identifying the pronunciation audio of the specified user after guided correction, and calculating the pronunciation correction improvement coefficient of the specified user.
A compliance audio translation management module, used for preprocessing the compliance correction pronunciation audio of the specified user, performing text translation conversion, and calculating the compliance value of the text translation conversion, so as to issue a management prompt for the converted translation text of the specified user's audio.
A language information base, used for storing the standard duration, standard frequency and standard intensity corresponding to each byte, the standard perception waveform and standard tone corresponding to each byte audio, and the standard spectrum length corresponding to each byte.
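The compliance evaluation carried out by the compliance audio translation management module can be sketched as follows. The ratio-based compliance value and the factors e1, e2 are illustrative assumptions, and the threshold comparison mirrors the management-prompt condition described in the method's fourth step.

```python
def compliance_value(coincident_bytes, repeated_ids,
                     ref_coincident, ref_repeated,
                     e1=0.6, e2=0.4):
    """Illustrative text-translation compliance value: the compliant
    coincident-byte and repeated-identifier counts relative to the
    permitted reference counts, combined with correction factors
    (assumed form)."""
    return (e1 * min(coincident_bytes, ref_coincident) / ref_coincident
            + e2 * min(repeated_ids, ref_repeated) / ref_repeated)

def needs_management_prompt(value, threshold):
    """A prompt is issued when the compliance value falls below the
    reference correction improvement compliance threshold."""
    return value < threshold

# Hypothetical counts and threshold: 45 of 50 reference coincident
# bytes and 3 of 4 reference repeated identifiers were matched.
v = compliance_value(45, 3, 50, 4)
print(needs_management_prompt(v, 0.95))
```

When the compliance value reaches the threshold, no prompt is issued and the converted translation text is accepted as-is.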
Compared with the prior art, the embodiment of the application has at least the following advantages or beneficial effects:
(1) The application provides a language translation method and system for correcting guide pronunciation that perform a second correction and text translation conversion on the user's pronunciation audio, thereby providing a more scientific and reliable data basis for comprehensively correcting the user's pronunciation audio, providing more persuasive supporting data for the management prompt on the converted translation text of the user's audio, and ensuring that the analysis and processing of the user's pronunciation audio is more accurate.
(2) By identifying the pronunciation audio of the specified user after guided correction and analyzing the pronunciation correction improvement coefficient of the specified user, the application can accurately provide more scientific data support for the subsequent analysis of the user's corrected audio, and improves the user's pronunciation standard to a certain extent.
(3) By preprocessing the compliance correction pronunciation audio of the specified user before performing text translation conversion, the application effectively reduces interference factors in that audio, makes the subsequent text translation conversion more reliable, and improves the clarity, audibility and understandability of the audio.
(4) By calculating the compliance value of the text translation conversion and issuing a management prompt for the converted translation text of the specified user's audio, the application comprehensively analyzes the user's compliance correction pronunciation audio together with the audio preprocessing degree index of the specified user, improving the effectiveness of managing the converted translation text of the user's audio and helping the user understand and express the converted translation text more accurately.
Drawings
The application will be further described with reference to the accompanying drawings. The embodiments shown do not constitute any limitation of the application, and other drawings can be obtained by one of ordinary skill in the art from the following drawings without inventive effort.
FIG. 1 is a flow chart of the method steps of the present application.
Fig. 2 is a schematic diagram of system configuration connection according to the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments, and all other embodiments obtained by those skilled in the art without making creative efforts based on the embodiments of the present application are included in the protection scope of the present application.
Referring to fig. 1, a first aspect of the present application provides a language translation method for correcting guide pronunciation, including: step one, obtaining the required translation conversion language of a specified user, identifying the pronunciation audio of the specified user, and analyzing the pronunciation error index of each byte corresponding to the specified user.
Specifically, the specific analysis process for identifying the pronunciation audio of the specified user is as follows:
identifying, screening and counting the byte audio segments in the voice audio of the specified user, and then identifying and counting the associated parameters of each byte audio segment, where the associated parameters comprise duration, frequency and intensity, i denotes the number of each byte, i = 1, 2, ..., m, and m denotes the number of bytes. It should be explained that the above identification of the pronunciation audio of the specified user obtains the associated parameters of each byte audio by sensing the user's pronunciation audio through speech recognition technology. The standard duration, standard frequency and standard intensity corresponding to each byte are extracted from the language information base, and the sound spectrum coincidence coefficient corresponding to each byte of the specified user is calculated from the deviations of the duration, frequency and intensity from their standard values, weighted by predefined correction factors corresponding to duration, frequency and intensity. It should be further explained that the sound spectrum coincidence coefficient is calculated for each byte because the duration of a sound provides the duration information of syllables and phonemes, used to detect whether the user's speech rhythm and pitch boundaries are standard; the frequency of a sound provides the pitch information of the pronunciation, used to detect the accuracy of tone and pitch in the user's audio; and the intensity of a sound provides the volume information of the pronunciation, used to detect whether the user pronounces too strongly or too weakly. A refined analysis of the associated parameters of each byte can therefore provide a more comprehensive data basis for the subsequent correction of the user's audio.
The input perception waveform corresponding to each byte audio of the specified user is collected and compared with the standard perception waveform for that byte extracted from the language information base, yielding the waveform height difference and the input perception waveform overlap length corresponding to each byte audio of the specified user; at the same time, the predefined permissible waveform height difference corresponding to each byte audio and the standard perception waveform length corresponding to each byte are extracted. The input perception waveforms corresponding to each byte audio of the specified user can be collected and analyzed by audio analysis software. It should be further explained that the waveform height difference corresponding to each byte audio of the specified user is obtained by screening the highest and lowest points of the input perception waveform of each byte audio and taking the difference of the corresponding waveform heights. The input perception waveform overlap length is obtained by overlapping and comparing the input perception waveform of each byte audio with the standard perception waveform. The perception waveform coincidence coefficient corresponding to each byte of the specified user is then calculated from the waveform height difference and the overlap length, normalized by the predefined permissible waveform height difference and the standard perception waveform length.
It should be explained that the perception waveform coincidence coefficient is calculated for each byte because the perception waveform allows the difference between the user's pronunciation and the correct pronunciation to be compared and the user's pronunciation accuracy to be evaluated, and because the perception waveform provides speech rhythm and intonation information, used to detect whether the user's speech rhythm is consistent with the target speech and whether the intonation is correct. It is therefore necessary to analyze the perception waveform in the user's pronunciation audio, providing more accurate data support for the subsequent analysis of the user's corrected audio. The average tone value corresponding to each byte audio of the specified user is acquired, the standard tone value corresponding to each byte audio is extracted from the language information base, and the tone coincidence coefficient corresponding to each byte of the specified user is calculated from the deviation of the average tone value from the standard tone value, using a predefined correction factor corresponding to byte tone, where e denotes a natural constant in the exponential form of the calculation. It should be further explained that the average tone value of each byte audio corresponding to the specified user is obtained through audio analysis software. Tone refers to the pitch characteristic of a sound, determined by the vibration frequency of the vocal cords: the higher the vibration frequency, the higher the tone, and the lower the vibration frequency, the lower the tone, with tone measured in hertz. Analyzing the tone of the user's audio allows tone correction to be performed, including adjusting the pitch, tone or frequency components of the audio to improve its quality and clarity.
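Since the tone coincidence calculation is described as involving the natural constant e, an exponential-decay reading is one plausible form. The sketch below assumes exactly that, with the decay factor g standing in for the predefined byte-tone correction factor; both the form and the factor are illustrative assumptions, not the claimed formula.

```python
import math

def tone_coincidence(avg_tone_hz, std_tone_hz, g=1.0):
    """Illustrative tone coincidence coefficient: exponential decay in
    the relative deviation of the average tone value (pitch, in hertz)
    from the standard tone value (assumed form using the natural
    constant e)."""
    return math.exp(-g * abs(avg_tone_hz - std_tone_hz) / std_tone_hz)

print(tone_coincidence(220.0, 220.0))  # a perfect pitch match gives 1.0
```

Any deviation from the standard tone drives the coefficient below 1, so mispitched bytes contribute to a higher pronunciation error index.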
Further, the specific analysis process of the pronunciation error index of each byte corresponding to the specified user is as follows: according to the sound spectrum coincidence coefficient, the perception waveform coincidence coefficient and the tone coincidence coefficient corresponding to each byte of the specified user, the pronunciation error index of each byte is calculated as a combination of the three coincidence coefficients weighted by predefined weight factors corresponding to the sound spectrum coincidence coefficient, the perception waveform coincidence coefficient and the tone coincidence coefficient.
According to the pronunciation error index of each byte corresponding to the specified user, the erroneous bytes in the audio of the specified user are obtained, a correction prompt is issued for those erroneous bytes, and the corrected standard pronunciation audio is translated and converted into text, which is recorded as the first translation conversion text.
Step two, issuing a correction prompt for the specified user's pronunciation, translating and converting the corrected standard pronunciation audio into text, and calculating the pronunciation conversion degree coefficient of the specified user.
Specifically, the specific analysis process of the pronunciation conversion degree coefficient of the specified user is as follows:
according to the pronunciation error index of each byte corresponding to the specified user, it is compared with the preset reference pronunciation conversion degree limit value corresponding to each pronunciation error index interval, to obtain the reference pronunciation conversion degree limit value of each byte corresponding to the specified user. The duration difference, frequency difference, intensity difference, input perception waveform length difference and tone difference corresponding to each byte audio of the specified user are extracted in turn. It should be explained that these differences are obtained by acquiring the duration, frequency, intensity, input perception waveform overlap length and average tone value of each byte audio of the specified user, extracting the standard duration, standard frequency, standard intensity, standard perception waveform length and standard tone corresponding to each byte from the language information base, and taking the differences. The pronunciation conversion degree coefficient of the specified user is then calculated from these differences, normalized by the predefined unit duration difference, unit frequency difference, unit intensity difference, unit input perception waveform length difference and unit tone difference. Step three, identifying the pronunciation audio of the specified user after guided correction, and analyzing the pronunciation correction improvement coefficient of the specified user.
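The conversion-degree computation described above, in which each acoustic difference is normalized by its predefined unit difference, can be sketched as follows. Averaging the normalized differences is an illustrative assumption, not the claimed formula, and the unit-difference values are hypothetical.

```python
def conversion_degree(diffs, unit_diffs):
    """Illustrative pronunciation conversion degree coefficient: each
    per-byte difference (duration, frequency, intensity, input
    perception waveform length, tone) divided by its predefined unit
    difference, then averaged (assumed form)."""
    ratios = [d / u for d, u in zip(diffs, unit_diffs)]
    return sum(ratios) / len(ratios)

# Hypothetical measured differences and unit differences, in order:
# duration (s), frequency (Hz), intensity (dB), waveform length, tone (Hz).
diffs = (0.02, 5.0, 2.0, 0.1, 4.0)
unit_diffs = (0.01, 10.0, 4.0, 0.2, 8.0)
print(conversion_degree(diffs, unit_diffs))
```

A larger coefficient indicates the user's audio deviates more from the standards, which selects a stricter pronunciation correction improvement coefficient threshold in step three.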
Specifically, the analysis process for the pronunciation correction improvement coefficient of the specified user is as follows:
the pronunciation audio of the specified user after guided correction is identified to obtain the corrected pronunciation audio of the specified user, the correction spectrum of each byte of the specified user is screened and extracted, and the spectrum of each byte corresponding to the specified user is acquired according to each byte of audio corresponding to the specified user.
It should be explained that the above screening and extraction of the correction spectrum of each byte of the specified user is performed by spectrum analysis software. By analyzing the spectrum of the user's audio, erroneous phonemes in the user's pronunciation can be detected and identified; if the tone of the user's pronunciation deviates from the correct tone range, the pronunciation can be corrected by providing an accurate tone target, thereby providing a finer analysis basis for the subsequent correction of the user's pronunciation.
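The spectrum overlap comparison described here can be sketched as a bin-by-bin check. This is an assumed interpretation — the patent does not specify the bin layout, the magnitude tolerance, or the overlap measure — with hypothetical names and values throughout.

```python
def spectrum_overlap_length(corrected, reference, tol=1.0):
    # Count the frequency bins whose corrected magnitude lies within
    # `tol` of the reference magnitude; the overlap "length" is taken
    # as the number of matching bins (an assumption -- the patent does
    # not define the overlap measure).
    return sum(1 for c, r in zip(corrected, reference) if abs(c - r) <= tol)

# Illustrative four-bin spectra for one byte of audio.
overlap = spectrum_overlap_length([1.0, 2.0, 3.0, 10.0],
                                  [1.2, 2.5, 5.0, 10.0])
```

The resulting overlap length would then be set against the standard spectrum length stored in the language information base.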
The correction spectrum of each byte of the specified user is compared, by overlapping, with the spectrum of each byte corresponding to the specified user, so as to obtain the spectrum overlap length corresponding to each byte of the specified user; at the same time, the standard spectrum length corresponding to each byte is extracted from the language information base. According to the analysis mode of the pronunciation error index of each byte corresponding to the specified user, the corrected pronunciation audio of the specified user is analyzed in the same way to obtain the pronunciation error index of each corrected byte corresponding to the specified user, and the predefined allowable pronunciation error limit value corresponding to each byte is extracted at the same time. According to the pronunciation correction improvement coefficient threshold value corresponding to each predefined pronunciation conversion degree coefficient interval, the pronunciation correction improvement coefficient threshold value of the specified user is obtained by matching. The pronunciation correction improvement coefficient of the specified user is then calculated from the spectrum overlap length and standard spectrum length, together with the corrected pronunciation error index and allowable pronunciation error limit value, weighted by the correction factors preset for the spectrum length and the pronunciation error index. In a specific embodiment, the application identifies the pronunciation audio of the specified user after guided correction and analyzes the pronunciation correction improvement coefficient of the specified user, which not only provides more scientific data support for the subsequent analysis of the user's correction audio, but also improves the standardization of the user's pronunciation to a certain extent.
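The formula for the improvement coefficient is likewise an unreproduced image; the two-term weighted form below is an assumption based only on the quantities the text names (spectrum overlap length, standard spectrum length, corrected error index, allowable error limit, and two preset correction factors, for which `w_spec` and `w_err` are hypothetical stand-ins).

```python
def improvement_coefficient(overlap_len, std_len, err_idx, allowed_err,
                            w_spec=0.6, w_err=0.4):
    # Assumed form: a spectral-overlap ratio term plus an error-margin
    # term, each scaled by a correction factor. Both the shape of the
    # terms and the factor values are hypothetical.
    spec_term = overlap_len / std_len
    err_term = max(0.0, (allowed_err - err_idx) / allowed_err)
    return w_spec * spec_term + w_err * err_term

psi = improvement_coefficient(overlap_len=8.0, std_len=10.0,
                              err_idx=0.2, allowed_err=0.5)
# The coefficient is then compared against the threshold matched from
# the conversion-degree coefficient interval (0.7 is illustrative).
compliant = psi >= 0.7
```

Higher spectral overlap and a corrected error index well inside the allowable limit both raise the coefficient under this assumed form.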
Step four: the compliance correction pronunciation audio of the specified user is preprocessed and subjected to text translation conversion, and the compliance value of the text translation conversion is calculated, so that a management prompt is issued for the converted translation text of the specified user's audio. Specifically, the analysis process for preprocessing the compliance correction pronunciation audio of the specified user is as follows: the preprocessing parameters corresponding to the compliance correction pronunciation audio of the specified user are acquired, where the preprocessing parameters comprise the gain value and frequency value of each byte, and the maximum byte gain value and maximum byte frequency value corresponding to the compliance correction pronunciation audio of the specified user are obtained by screening. It should be explained that the compliance correction pronunciation audio of the specified user is obtained by matching the pronunciation correction improvement coefficient of the specified user against the preset pronunciation correction improvement coefficient interval: if the pronunciation correction improvement coefficient of the specified user falls within the interval, the corrected pronunciation audio of the specified user is extracted as compliance correction pronunciation audio.
It should be further explained that the device used to obtain, by screening, the maximum byte gain value and maximum byte frequency value corresponding to the compliance correction pronunciation audio of the specified user is an audio processor. By adjusting the gain value of the audio, the volume levels of different audio segments can be relatively balanced, which helps eliminate volume differences in the audio, effectively controls noise, and improves the audibility and perceived quality of the audio; by adjusting the frequency value of the audio and balancing the energy distribution of different frequency components, the clarity and balance of the audio can be improved, which is beneficial to the standardization of the subsequent text translation conversion process. The predefined licensed byte gain value and reference byte frequency value corresponding to the audio are extracted, and the audio preprocessing degree index of the specified user is calculated from the deviations of the maximum byte gain value and maximum byte frequency value from these reference values, weighted by the correction factors predefined for the byte gain value and the byte frequency value. In a specific embodiment, the method preprocesses the compliance correction pronunciation audio of the specified user and performs text translation conversion, which effectively reduces the interference factors in the correction pronunciation audio, makes the subsequent analysis of the text translation conversion more reliable, and improves the clarity, audibility and intelligibility of the audio. Further, the analysis process for the compliance value of the text translation conversion is as follows:
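The preprocessing degree index formula is another unreproduced image. A minimal sketch, assuming a relative-deviation form over the two named quantities (`w_gain` and `w_freq` are hypothetical stand-ins for the predefined correction factors, and all numeric values are illustrative):

```python
def preprocessing_degree_index(max_gain, max_freq, ref_gain, ref_freq,
                               w_gain=0.5, w_freq=0.5):
    # Assumed form: how far the maximum byte gain value and maximum
    # byte frequency value stray from the licensed reference values,
    # as weighted relative deviations.
    return (w_gain * abs(max_gain - ref_gain) / ref_gain
            + w_freq * abs(max_freq - ref_freq) / ref_freq)

phi = preprocessing_degree_index(max_gain=12.0, max_freq=440.0,
                                 ref_gain=10.0, ref_freq=400.0)
```

A larger index would indicate audio that needed heavier gain and frequency adjustment before text translation conversion.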
Text translation conversion is performed on the compliance correction pronunciation audio of the specified user to obtain the compliance correction text of the specified user, and the number of coincident bytes and the number of repeated identifiers in the compliance correction text of the specified user are extracted. It should be explained that extracting the number of coincident bytes and the number of repeated identifiers in the compliance correction text of the specified user makes the analysis of the compliance value of the text translation conversion more accurate, so that when a subsequent management prompt is issued for the converted translation text of the specified user's audio, the management work of relevant personnel is facilitated to a certain extent, and the user is helped to understand and express the converted translation text more accurately. The first translation conversion text is compared with the compliance correction text of the specified user to obtain the number of compliant coincident bytes and the number of compliant repeated identifiers. According to the audio preprocessing degree index of the specified user, comparison is made with the licensed reference number of coincident bytes and the licensed reference number of repeated identifiers corresponding to each preset audio preprocessing degree index interval, so as to obtain the licensed reference number of coincident bytes and the licensed reference number of repeated identifiers of the specified user. The compliance value of the text translation conversion is then calculated from these quantities, weighted by the correction factors predefined for the number of coincident bytes and the number of repeated identifiers.
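The compliance-value formula is also an image not carried over into the text. The sketch below assumes one possible shape — coincident bytes credited up to the licensed reference count, repeated identifiers in excess of the licensed reference count penalized — with hypothetical weights and counts throughout:

```python
def compliance_value(n_coincident, n_repeat, ref_coincident, ref_repeat,
                     w_c=0.5, w_r=0.5):
    # Assumed form: the compliant coincident-byte count is credited up
    # to the licensed reference count, while repeated identifiers
    # beyond the licensed reference count lower the score. Weights and
    # ratio shapes are hypothetical.
    c_term = min(n_coincident / ref_coincident, 1.0)
    r_term = min(ref_repeat / n_repeat, 1.0) if n_repeat else 1.0
    return w_c * c_term + w_r * r_term

value = compliance_value(n_coincident=45, n_repeat=4,
                         ref_coincident=50, ref_repeat=2)
```

Under these assumed terms, a text that matches most of the first translation conversion text while avoiding excessive repeated identifiers scores close to 1.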
Specifically, the analysis process for the management prompt of the converted translation text of the specified user's audio is as follows: according to the pronunciation correction improvement coefficient of the specified user and the translation conversion language required by the specified user, matching is performed against the reference correction improvement compliance threshold corresponding to each predefined conversion language within the pronunciation correction improvement coefficient interval, so as to obtain the reference correction improvement compliance threshold corresponding to the translation conversion language required by the specified user.
The compliance value of the text translation conversion is compared with the reference correction improvement compliance threshold corresponding to the translation conversion language of the specified user; if the compliance value of the text translation conversion is lower than that threshold, a management prompt is issued for the translation text of the specified user's audio.
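The comparison step above reduces to a single threshold check, sketched here with illustrative values (the function name and numbers are hypothetical):

```python
def needs_management_prompt(compliance_value, ref_threshold):
    # True when the converted translation text should be flagged: the
    # compliance value falls below the reference correction improvement
    # compliance threshold matched for the required target language.
    return compliance_value < ref_threshold

flag = needs_management_prompt(0.62, 0.75)  # illustrative values
```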
In a specific embodiment, by calculating the compliance value of the text translation conversion, issuing a management prompt for the converted translation text of the user's audio, and comprehensively analyzing the user's compliance correction pronunciation audio together with the user's audio preprocessing degree index, the method improves the effectiveness of managing the translation text of the user's audio and helps the user understand and express the converted translation text more accurately. Referring to FIG. 2, a second aspect of the present application provides a language translation system for correcting a guided pronunciation, comprising: a pronunciation audio recognition analysis module, a pronunciation correction translation module, a correction audio recognition analysis module, a compliance audio translation management module, and a language information base.
The pronunciation audio recognition analysis module is connected with the pronunciation correction translation module, the pronunciation correction translation module is connected with the correction audio recognition analysis module, the correction audio recognition analysis module is connected with the compliance audio translation management module, and both the pronunciation audio recognition analysis module and the correction audio recognition analysis module are connected with the language information base.
The pronunciation audio recognition analysis module is used for acquiring the translation conversion language required by the specified user, recognizing the pronunciation audio of the specified user, and analyzing the pronunciation error index of each byte corresponding to the specified user.
The pronunciation correction translation module is used for issuing correction prompts for the pronunciation of the specified user, translating and converting the corrected standard pronunciation audio into a text, and calculating the pronunciation conversion degree coefficient of the specified user.
The correction audio recognition analysis module is used for recognizing the pronunciation audio of the specified user after guided correction and calculating the pronunciation correction improvement coefficient of the specified user.
The compliance audio translation management module is used for preprocessing the compliance correction pronunciation audio of the specified user, performing text translation conversion, and calculating the compliance value of the text translation conversion, so as to issue a management prompt for the converted translation text of the specified user's audio.
The language information base is used for storing the standard duration, standard frequency and standard intensity corresponding to each byte, storing the standard perceived waveform and standard tone corresponding to each byte of audio, and storing the standard spectrum length corresponding to each byte.
In a specific embodiment, the present application provides the above language translation method and system for correcting a guided pronunciation, which perform secondary correction and text translation conversion on the user's pronunciation audio and provide a more scientific and reliable data basis for comprehensively correcting the user's pronunciation audio, thereby providing more convincing supporting data for the management prompt of the converted translation text of the user's audio, while ensuring that the analysis and processing of the user's pronunciation audio is more accurate.
The foregoing is merely illustrative of the structures of this application, and those skilled in the art can make various modifications, additions and substitutions to the described embodiments without departing from the scope of the application as defined in the accompanying claims.

Claims (9)

1. A language translation method for correcting a guided pronunciation, comprising:
step one, acquiring the translation conversion language required by a specified user, identifying the pronunciation audio of the specified user, and analyzing the pronunciation error index of each byte corresponding to the specified user;
step two, issuing correction prompts for the pronunciation of the specified user, translating and converting the corrected standard pronunciation audio into a text, and calculating the pronunciation conversion degree coefficient of the specified user;
step three, identifying the pronunciation audio of the specified user after guided correction, and analyzing the pronunciation correction improvement coefficient of the specified user;
and step four, preprocessing the compliance correction pronunciation audio of the specified user, performing text translation conversion, and calculating the compliance value of the text translation conversion, so as to issue a management prompt for the converted translation text of the specified user's audio.
2. A language translation method for correcting a guided pronunciation as recited in claim 1, wherein the specific analysis process for identifying the pronunciation audio of the specified user is as follows:
identifying, screening and counting each byte of audio in the pronunciation audio of the specified user, thereby identifying and counting the associated parameters of each byte of audio of the specified user, the associated parameters comprising duration, frequency and intensity, where i denotes the number of each byte, i = 1, 2, ..., m, and m denotes the number of bytes; extracting the standard duration, standard frequency and standard intensity corresponding to each byte from the language information base, and calculating the sound spectrum coincidence coefficient corresponding to each byte of the specified user from the deviations of the duration, frequency and intensity from their standard values, weighted by the predefined correction factors for duration, frequency and intensity; collecting the input perceived waveform corresponding to each byte of audio of the specified user and comparing it with the standard perceived waveform corresponding to each byte of audio extracted from the language information base, so as to obtain the waveform height difference and the input perceived waveform overlap length corresponding to each byte of audio of the specified user, while extracting the predefined allowable waveform height difference corresponding to each byte of audio and the standard perceived waveform length corresponding to each byte; calculating the perceived waveform coincidence coefficient corresponding to each byte of the specified user from these quantities, weighted by the predefined correction factors for the waveform height difference and the perceived waveform length; and acquiring the average tone value corresponding to each byte of audio of the specified user, while extracting the standard tone value corresponding to each byte of audio from the language information base, and calculating the tone coincidence coefficient corresponding to each byte of the specified user, in exponential form with natural constant e, from the deviation of the average tone value from the standard tone value, weighted by the correction factor corresponding to the predefined byte tone.
3. A language translation method for correcting a guided pronunciation as recited in claim 2, wherein the specific analysis process for the pronunciation error index of each byte corresponding to the specified user is as follows:
according to the sound spectrum coincidence coefficient, perceived waveform coincidence coefficient and tone coincidence coefficient corresponding to each byte of the specified user, calculating the pronunciation error index of each byte corresponding to the specified user as a weighted combination of the three coefficients, using the predefined weight factors corresponding to the sound spectrum coincidence coefficient, the perceived waveform coincidence coefficient and the tone coincidence coefficient; obtaining the erroneous bytes in the audio of the specified user according to the pronunciation error index of each byte corresponding to the specified user, issuing correction prompts for the erroneous bytes in the audio of the specified user, and translating and converting the corrected standard pronunciation audio into a text, which is recorded as a first translation conversion text.
4. A language translation method for correcting a guided pronunciation as recited in claim 3, wherein the specific analysis process for the pronunciation conversion degree coefficient of the specified user is as follows:
according to the pronunciation error index of each byte corresponding to the specified user, comparing with the preset reference pronunciation conversion degree limit value corresponding to each pronunciation error index interval, so as to obtain the reference pronunciation conversion degree limit value of each byte corresponding to the specified user; sequentially extracting the duration difference, frequency difference, intensity difference, input perceived waveform length difference and tone difference corresponding to each byte of audio of the specified user; and calculating the pronunciation conversion degree coefficient of the specified user from these five differences, each measured against its corresponding predefined unit value, namely the unit duration difference, unit frequency difference, unit intensity difference, unit input perceived waveform length difference and unit tone difference.
5. A language translation method for correcting a guided pronunciation as recited in claim 1, wherein the specific analysis process for the pronunciation correction improvement coefficient of the specified user is as follows:
identifying the pronunciation audio of the specified user after guided correction to obtain the corrected pronunciation audio of the specified user, screening and extracting the correction spectrum of each byte of the specified user, and acquiring the spectrum of each byte corresponding to the specified user according to each byte of audio corresponding to the specified user;
comparing, by overlapping, the correction spectrum of each byte of the specified user with the spectrum of each byte corresponding to the specified user, so as to obtain the spectrum overlap length corresponding to each byte of the specified user, while extracting the standard spectrum length corresponding to each byte from the language information base; analyzing the corrected pronunciation audio of the specified user in the same way as the analysis of the pronunciation error index of each byte corresponding to the specified user, so as to obtain the pronunciation error index of each corrected byte corresponding to the specified user, while extracting the predefined allowable pronunciation error limit value corresponding to each byte; matching against the pronunciation correction improvement coefficient threshold value corresponding to each predefined pronunciation conversion degree coefficient interval, so as to obtain the pronunciation correction improvement coefficient threshold value of the specified user; and calculating the pronunciation correction improvement coefficient of the specified user from the spectrum overlap length, the standard spectrum length, the corrected pronunciation error index and the allowable pronunciation error limit value, weighted by the correction factors preset for the spectrum length and the pronunciation error index.
6. A language translation method for correcting a guided pronunciation as recited in claim 1, wherein the specific analysis process for preprocessing the compliance correction pronunciation audio of the specified user is as follows:
acquiring the preprocessing parameters corresponding to the compliance correction pronunciation audio of the specified user, the preprocessing parameters comprising the gain value and frequency value of each byte, and obtaining, by screening, the maximum byte gain value and maximum byte frequency value corresponding to the compliance correction pronunciation audio of the specified user; extracting the predefined licensed byte gain value and reference byte frequency value corresponding to the audio; and calculating the audio preprocessing degree index of the specified user from the deviations of the maximum byte gain value and maximum byte frequency value from these reference values, weighted by the correction factors predefined for the byte gain value and the byte frequency value.
7. A language translation method for correcting a guided pronunciation as recited in claim 6, wherein the specific analysis process for the compliance value of the text translation conversion is as follows:
performing text translation conversion on the compliance correction pronunciation audio of the specified user to obtain the compliance correction text of the specified user, and extracting the number of coincident bytes and the number of repeated identifiers in the compliance correction text of the specified user; comparing the first translation conversion text with the compliance correction text of the specified user to obtain the number of compliant coincident bytes and the number of compliant repeated identifiers; comparing, according to the audio preprocessing degree index of the specified user, with the licensed reference number of coincident bytes and the licensed reference number of repeated identifiers corresponding to each preset audio preprocessing degree index interval, so as to obtain the licensed reference number of coincident bytes and the licensed reference number of repeated identifiers of the specified user; and calculating the compliance value of the text translation conversion from these quantities, weighted by the correction factors predefined for the number of coincident bytes and the number of repeated identifiers.
8. A language translation method for correcting a guided pronunciation as recited in claim 1, wherein the specific analysis process for the management prompt of the converted translation text of the specified user's audio is as follows:
according to the pronunciation correction improvement coefficient of the specified user and the translation conversion language required by the specified user, matching against the reference correction improvement compliance threshold corresponding to each predefined conversion language within the pronunciation correction improvement coefficient interval, so as to obtain the reference correction improvement compliance threshold corresponding to the translation conversion language required by the specified user;
and comparing the compliance value of the text translation conversion with the reference correction improvement compliance threshold corresponding to the translation conversion language of the specified user, and if the compliance value of the text translation conversion is lower than that threshold, issuing a management prompt for the translation text of the specified user's audio.
9. A language translation system for correcting a guided pronunciation, characterized by comprising:
a pronunciation audio recognition analysis module, used for acquiring the translation conversion language required by a specified user, recognizing the pronunciation audio of the specified user, and analyzing the pronunciation error index of each byte corresponding to the specified user;
a pronunciation correction translation module, used for issuing correction prompts for the pronunciation of the specified user, translating and converting the corrected standard pronunciation audio into a text, and calculating the pronunciation conversion degree coefficient of the specified user;
a correction audio recognition analysis module, used for recognizing the pronunciation audio of the specified user after guided correction and calculating the pronunciation correction improvement coefficient of the specified user;
a compliance audio translation management module, used for preprocessing the compliance correction pronunciation audio of the specified user, performing text translation conversion, and calculating the compliance value of the text translation conversion, so as to issue a management prompt for the converted translation text of the specified user's audio;
and a language information base, used for storing the standard duration, standard frequency and standard intensity corresponding to each byte, storing the standard perceived waveform and standard tone corresponding to each byte of audio, and storing the standard spectrum length corresponding to each byte.
CN202311159416.3A 2023-09-11 2023-09-11 Language translation method and system for correcting guide pronunciation Active CN116894442B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311159416.3A CN116894442B (en) 2023-09-11 2023-09-11 Language translation method and system for correcting guide pronunciation


Publications (2)

Publication Number Publication Date
CN116894442A true CN116894442A (en) 2023-10-17
CN116894442B CN116894442B (en) 2023-12-05

Family

ID=88311100


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117313722A (en) * 2023-11-28 2023-12-29 卓世未来(天津)科技有限公司 Large language model reasoning accuracy prediction method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101661675A (en) * 2009-09-29 2010-03-03 苏州思必驰信息科技有限公司 Self-sensing error tone pronunciation learning method and system
JP2010055044A (en) * 2008-04-22 2010-03-11 Ntt Docomo Inc Device, method and system for correcting voice recognition result
CN109461436A (en) * 2018-10-23 2019-03-12 广东小天才科技有限公司 A kind of correcting method and system of speech recognition pronunciation mistake
CN109545189A (en) * 2018-12-14 2019-03-29 东华大学 A kind of spoken language pronunciation error detection and correcting system based on machine learning
CN110085261A (en) * 2019-05-16 2019-08-02 上海流利说信息技术有限公司 A kind of pronunciation correction method, apparatus, equipment and computer readable storage medium
CN110718210A (en) * 2019-09-25 2020-01-21 北京字节跳动网络技术有限公司 English mispronunciation recognition method, device, medium and electronic equipment
CN115148226A (en) * 2021-03-30 2022-10-04 暗物智能科技(广州)有限公司 Pronunciation correction method and device and electronic equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAOBIN LIU: "Improving English pronunciation via automatic speech recognition technology", INTERNATIONAL JOURNAL OF INNOVATION AND LEARNING, pages 126 - 140 *
ZHAO Bo; TAN Xiaohong: "An oral English teaching system based on speech recognition technology", Computer Applications (计算机应用), pages 762 - 773 *


Also Published As

Publication number Publication date
CN116894442B (en) 2023-12-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant