CN116805495B - Pronunciation deviation detection and action feedback method and system based on large language model - Google Patents


Info

Publication number
CN116805495B
CN116805495B (application CN202311039410.2A)
Authority
CN
China
Prior art keywords
pronunciation
data set
feedback
language model
text
Prior art date
Legal status
Active
Application number
CN202311039410.2A
Other languages
Chinese (zh)
Other versions
CN116805495A (en)
Inventor
解焱陆
钟辉航
Current Assignee
BEIJING LANGUAGE AND CULTURE UNIVERSITY
Original Assignee
BEIJING LANGUAGE AND CULTURE UNIVERSITY
Priority date
Filing date
Publication date
Application filed by BEIJING LANGUAGE AND CULTURE UNIVERSITY
Priority to CN202311039410.2A
Publication of CN116805495A
Application granted
Publication of CN116805495B
Legal status: Active

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Electrically Operated Instructional Devices (AREA)

Abstract

The application provides a pronunciation deviation detection and action feedback method and system based on a large language model, relating to the field of pronunciation deviation detection. The method comprises: obtaining a bilingual pronunciation error data set for a follow-up text and labeling its phonemes; sending the labeled data set and the phonemes corresponding to the follow-up text to GPT-4, and calling the GPT-4 API to obtain a pronunciation action feedback data set for any number of follow-up texts; and, based on the pronunciation action feedback data set, obtaining a pronunciation action feedback fine-tuned large language model, into which any follow-up text can be input, thereby completing pronunciation deviation detection and action feedback based on the large language model. Pronunciation action feedback can thus be produced for arbitrary follow-up texts, and the data set used to fine-tune the large language model can be obtained automatically. This overcomes the high labor cost of existing approaches and their restriction to a fixed set of follow-up texts; and because the feedback is based on the learner's real phonemes, it is more effective than the feedback obtained by statistics-based methods.

Description

Pronunciation deviation detection and action feedback method and system based on large language model
Technical Field
The application relates to the technical field of pronunciation deviation detection, and in particular to a pronunciation deviation detection and action feedback method and system based on a large language model.
Background
In computer-aided bilingual teaching systems, how to return feedback that includes concrete pronunciation actions to a bilingual learner has long been a focus and difficulty of both research and industry. One approach models the paths of correct speech and of likely mispronunciations of a follow-up text with a finite state automaton; feedback for each path is written manually in advance according to the possible recognition outcomes, so that feedback with pronunciation actions can be given to the learner. The current common industrial practice is instead to obtain the learner's actual phonemes through a mispronunciation detection system and compare them with the phonemes of the follow-up text: positions where the phonemes differ are treated as likely errors, and statistical methods are used to output the most probable causes of error at those positions.
The finite-state-automaton approach can give the bilingual learner feedback with pronunciation actions. However, it first requires the possible speech paths to be designed in advance or obtained statistically, and the feedback for each path to be written manually; both tasks must be done by speech specialists, and industry generally cannot afford the human resources this consumes. Second, the approach can only give feedback on follow-up texts within a given text range, which is another reason it is difficult to apply industrially. The statistics-based approach, for its part, returns the most probable cause of error at each position; but since there may be countless causes of a given error, the feedback most bilingual learners see does not address their actual situation, the useful information they finally obtain is limited, and no feedback for correcting pronunciation actions is produced.
Disclosure of Invention
The application provides a pronunciation deviation detection and action feedback method and system based on a large language model, solving the problems in the prior art that the feedback a bilingual learner sees does not address the learner's actual situation, that the useful information finally obtained is limited, and that no feedback for correcting pronunciation actions can be obtained.
To achieve the above object, the technical solution provided by the application is as follows: a pronunciation deviation detection and action feedback method based on a large language model, comprising the following steps:
S1, acquiring a follow-up text, obtaining a bilingual pronunciation deviation data set based on the follow-up text, and labeling the phonemes of the bilingual pronunciation deviation data set;
S2, sending the labeled bilingual pronunciation deviation data set and the phonemes corresponding to the follow-up text to GPT-4, and performing pronunciation correction training through GPT-4;
S3, based on the pronunciation correction training, calling the GPT-4 API to obtain a pronunciation action feedback data set for any number of follow-up texts, and giving attribute feedback on mispronunciations through the pronunciation action feedback data set;
S4, obtaining a pronunciation action feedback fine-tuned large language model based on the pronunciation action feedback data set;
S5, inputting any follow-up text into the pronunciation action feedback fine-tuned large language model, thereby completing pronunciation deviation detection and action feedback based on the large language model.
Preferably, in step S1, the phoneme labeling of the bilingual pronunciation deviation dataset includes:
and labeling the real phonemes in the bilingual pronunciation deviation data set of the follow-up text.
Preferably, in step S2, sending the labeled bilingual pronunciation deviation data set and the phonemes corresponding to the follow-up text to GPT-4 and performing pronunciation correction training through GPT-4 includes:
sending the labeled bilingual pronunciation deviation data set and the phonemes corresponding to the follow-up text to GPT-4, performing word segmentation through GPT-4, and obtaining the actual phonemes after word segmentation;
inputting the preset correspondence between correct phonemes and pronunciation attributes to GPT-4;
and converting, through GPT-4 and based on the correspondence between correct phonemes and pronunciation attributes, the parts where the actual phonemes differ from the correct phonemes into pronunciation attributes, and obtaining correction information for the pronunciation actions.
Preferably, inputting the preset correspondence between correct phonemes and pronunciation attributes to GPT-4 includes:
presetting the correct phonemes, and inputting the correspondence between the correct phonemes and the pronunciation attributes, together with the meaning of each dimension of the pronunciation attributes, to GPT-4.
Preferably, converting, through GPT-4 and based on the correspondence between correct phonemes and pronunciation attributes, the parts where the actual phonemes differ from the correct phonemes into pronunciation attributes and obtaining correction information for the pronunciation actions includes:
converting the parts where the actual phonemes differ from the correct phonemes into pronunciation attributes through GPT-4, based on the correspondence between phonemes and pronunciation attributes;
and judging the dimensions in which the pronunciation is wrong by checking the meaning of each dimension of the pronunciation attributes, and obtaining correction information for the pronunciation actions.
Preferably, in step S3, calling the GPT-4 API based on the pronunciation correction training to obtain a pronunciation action feedback data set for any number of follow-up texts, and giving attribute feedback on mispronunciations through the pronunciation action feedback data set, includes:
based on the pronunciation correction training, calling the GPT-4 API and adding prior knowledge of pronunciation attributes to the prompt;
obtaining a pronunciation action feedback data set for any number of follow-up texts, wherein the data in the pronunciation action feedback data set focus on pronunciation action feedback;
and giving attribute feedback on mispronunciations through the pronunciation action feedback data set.
Preferably, in step S4, obtaining a pronunciation action feedback fine-tuned large language model based on the pronunciation action feedback data set includes:
fine-tuning the large language model with P-tuning based on the pronunciation action feedback data set, to obtain the pronunciation action feedback fine-tuned large language model;
wherein the large language model is ChatGLM-6B or ChatGLM-130B.
Preferably, fine-tuning the large language model includes:
presetting a question data set and an answer data set;
limiting the context length of the follow-up text through the question data set, in which the phonemes corresponding to the follow-up text and the real phonemes are segmented into words and it is judged whether insertion, deletion or substitution errors exist;
the answer data set comprising the answers to the question data set, together with the feedback information for pronunciation action correction that the model gives through the pronunciation attributes;
and delimiting the erroneous parts of the follow-up text through the question data set and the answer data set, and fine-tuning the large language model according to the erroneous parts.
Preferably, in step S5, inputting any follow-up text into the pronunciation action feedback fine-tuned large language model to complete pronunciation deviation detection and action feedback based on the large language model includes:
inputting any follow-up text into the pronunciation action feedback fine-tuned large language model;
judging the insertion, deletion or substitution errors in the follow-up text;
and generating pronunciation action feedback information for the insertion, deletion or substitution errors, thereby completing pronunciation deviation detection and action feedback based on the large language model.
A pronunciation deviation detection and action feedback system based on a large language model, used for the above pronunciation deviation detection and action feedback method based on a large language model, the system comprising:
the data acquisition labeling module is used for acquiring a follow-up text, acquiring a bilingual pronunciation deviation data set based on the follow-up text, and labeling phonemes of the bilingual pronunciation deviation data set;
the training module, used for sending the labeled data set and the phonemes corresponding to the follow-up text to GPT-4, and performing pronunciation correction training through GPT-4;
the preliminary feedback module, used for calling the GPT-4 API based on the pronunciation correction training to obtain a pronunciation action feedback data set for any number of follow-up texts, and giving attribute feedback on mispronunciations;
the model construction module is used for obtaining a pronunciation action feedback fine tuning large language model based on the pronunciation action feedback data set;
and the error detection feedback module, used for inputting any follow-up text into the pronunciation action feedback fine-tuned large language model, to complete pronunciation deviation detection and action feedback based on the large language model.
In one aspect, an electronic device is provided, comprising a processor and a memory in which at least one instruction is stored, the at least one instruction being loaded and executed by the processor to implement the above pronunciation deviation detection and action feedback method based on a large language model.
In one aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, the at least one instruction being loaded and executed by a processor to implement the above pronunciation deviation detection and action feedback method based on a large language model.
Compared with the prior art, the above technical solution has at least the following beneficial effects:
the application aims to enable a computer-aided bilingual teaching system to return feedback that includes pronunciation actions, and to this end provides a pronunciation deviation detection and action feedback method based on a large language model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a pronunciation deviation detection and action feedback method based on a large language model according to an embodiment of the application;
FIG. 2 is a flow chart of GPT-4 generating pronunciation action feedback according to an embodiment of the application;
FIG. 3 is a P-tuning fine-tuning diagram according to an embodiment of the application;
FIG. 4 is learner feedback from a fluent-reading app, shown for comparison in an embodiment of the application;
FIG. 5 is a feedback diagram using a finite state automaton according to an embodiment of the application;
FIG. 6 is a ChatGPT feedback diagram according to an embodiment of the application;
FIG. 7 is a ChatGPT feedback diagram with pronunciation attributes added, according to an embodiment of the application;
FIG. 8 is a block diagram of a pronunciation deviation detection and action feedback system based on a large language model according to an embodiment of the application;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present application. It will be apparent that the described embodiments are some, but not all, embodiments of the application. All other embodiments, which can be made by a person skilled in the art without creative efforts, based on the described embodiments of the present application fall within the protection scope of the present application.
Aiming at the prior-art problems that the feedback a bilingual learner sees does not address the learner's actual situation, that the useful information finally obtained is limited, and that no feedback for correcting pronunciation actions can be obtained, the application provides a pronunciation deviation detection and action feedback method and system based on a large language model.
As shown in FIG. 1, an embodiment of the application provides a pronunciation deviation detection and action feedback method based on a large language model, which can be implemented by an electronic device. The processing flow of the method may include the following steps:
s101, acquiring a follow-up text, acquiring a bilingual pronunciation deviation data set based on the follow-up text, and labeling phonemes of the bilingual pronunciation deviation data set;
in a possible implementation, the phoneme labeling of the bilingual pronunciation bias data set includes:
and labeling the real phonemes in the bilingual pronunciation deviation dataset of the follow-up text.
S102, sending the labeled bilingual pronunciation deviation data set and the phonemes corresponding to the follow-up text to GPT-4, and performing pronunciation correction training through GPT-4;
in a possible implementation, sending the labeled data set to GPT-4 and performing pronunciation correction training through GPT-4 includes:
sending the labeled data set and the phonemes corresponding to the follow-up text to GPT-4, performing word segmentation through GPT-4, and obtaining the actual phonemes after word segmentation;
inputting the preset correspondence between phonemes and pronunciation attributes to GPT-4;
and GPT-4 converting, based on the correspondence between phonemes and pronunciation attributes, the parts where the actual phonemes differ from the correct phonemes into pronunciation attributes, and obtaining correction information for the pronunciation actions.
In a possible implementation, inputting the preset correspondence between phonemes and pronunciation attributes to GPT-4 includes:
inputting the correspondence between phonemes and pronunciation attributes, together with the meaning of each dimension of the pronunciation attributes, to GPT-4.
In a possible implementation, GPT-4 converting, based on the correspondence between phonemes and pronunciation attributes, the parts where the actual phonemes differ from the correct phonemes into pronunciation attributes and obtaining correction information for the pronunciation actions includes:
GPT-4 converting the parts where the actual phonemes differ from the correct phonemes into pronunciation attributes, based on the correspondence between phonemes and pronunciation attributes;
and judging the dimensions in which mispronunciations exist by checking the meaning of each dimension of the pronunciation attributes, and obtaining correction information for the pronunciation actions.
In a possible implementation, the follow-up text, the phonemes corresponding to the follow-up text and the real phonemes are first sent to GPT-4, so that GPT-4 segments the phonemes corresponding to the follow-up text and the real phonemes by word. For example, for the follow-up text "She was his now forever", the correct phonemes are: sil sh iy w ah z sil hh ih z n aw f er eh v er sil, and the actual phonemes are: sil sh iy w ah s ih n hh ih z n aw f ao eh v er sil. The word segmentation result is: she: [sh iy] -> she: [sh iy]; was: [w ah z] -> was: [w ah s]; his: [hh ih z] -> his: [ih n hh ih z]; now: [n aw] -> now: [n aw]; forever: [f er eh v er] -> forever: [f ao eh v er]. GPT-4 is then told the correspondence between phonemes and pronunciation attributes and the meaning of each dimension of the pronunciation attributes; these meanings are shown in Table 1. For example, in the first dimension, 0 indicates that the jaw is almost closed, 1 indicates that the jaw is in a normal position, 2 indicates that the jaw is slightly lowered, and 3 indicates that the jaw is lowered.
TABLE 1 meaning of each dimension of pronunciation attributes
GPT-4 then converts the parts where the correct and actual phonemes differ into pronunciation attributes via the phoneme-to-attribute mapping, and obtains correction information for the pronunciation actions by checking the meaning of each dimension of the attributes. For example, s has the pronunciation attribute vector [1,2,2,3,3,3,0,0] and z has [1,2,2,3,3,3,0,1]. Comparing the two shows that they differ only in the 8th dimension, which reflects whether the vocal cords vibrate. So if the z in a word is mispronounced as s, comparing the pronunciation attributes yields concrete action feedback: when reading this word, the learner needs to add vocal-cord vibration to produce z. This is just one simple example; by comparing pronunciation attributes, much real and useful pronunciation action feedback can be obtained for the errors that occur in actual speech. FIG. 2 is a flow chart of this GPT-4 analysis of the follow-up text.
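The attribute comparison just described can be sketched in a few lines of Python; the two vectors and the meaning of dimension 8 come from the example above, while the function name is illustrative:

```python
# Pronunciation-attribute vectors from the example above (8 dimensions each).
# Per the text, dimension 8 encodes vocal-cord vibration: 0 = off, 1 = on.
ATTRS = {
    "s": [1, 2, 2, 3, 3, 3, 0, 0],
    "z": [1, 2, 2, 3, 3, 3, 0, 1],
}

def attribute_feedback(correct_phone, actual_phone):
    """Return the 1-based attribute dimensions where the two phones differ."""
    return [
        i + 1
        for i, (c, a) in enumerate(zip(ATTRS[correct_phone], ATTRS[actual_phone]))
        if c != a
    ]

# A learner produced /s/ where /z/ was expected: only dimension 8 differs,
# so the action feedback is "add vocal-cord vibration".
print(attribute_feedback("z", "s"))  # [8]
```

Looking the differing dimension up in Table 1 then turns the numeric difference into a human-readable pronunciation-action instruction.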
S103, based on the pronunciation correction training, calling the GPT-4 API to obtain a pronunciation action feedback data set for any number of follow-up texts, and giving attribute feedback on mispronunciations through the pronunciation action feedback data set;
in a possible implementation, calling the GPT-4 API based on the pronunciation correction training to obtain a pronunciation action feedback data set for any number of follow-up texts includes:
based on the pronunciation correction training of GPT-4, calling the GPT-4 API, adding prior knowledge of pronunciation attributes to the prompt, obtaining a pronunciation action feedback data set for any number of follow-up texts whose feedback focuses on pronunciation actions, and giving attribute feedback on mispronunciations.
In a possible implementation, the instructions of S101-S102 teach GPT-4 how to return feedback with pronunciation action corrections via the pronunciation attributes; these chat records are then placed into the GPT-4 API context, and the API is called to obtain pronunciation action feedback information for any number of follow-up texts.
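As an illustration only, the step of packaging the S101-S102 chat records plus the attribute prior knowledge into an API request might look as follows; the message layout follows the common chat-completion convention, all names are assumptions, and the network call itself is omitted:

```python
# Illustrative sketch only: packaging the S101-S102 chat records plus the
# pronunciation-attribute prior into a chat-completion request body.
# Function/field names are assumptions; no network call is made here.

ATTRIBUTE_PRIOR = (
    "Each phoneme maps to an 8-dimensional pronunciation-attribute vector; "
    "e.g. dimension 8 indicates vocal-cord vibration (0 = off, 1 = on)."
)

def build_request(teaching_turns, follow_up_text, correct_phones, actual_phones):
    """Assemble the attribute prior, the prior teaching turns, and a new query."""
    messages = [{"role": "system", "content": ATTRIBUTE_PRIOR}]
    messages.extend(teaching_turns)  # chat records from the S101-S102 training
    messages.append({
        "role": "user",
        "content": (
            f"Follow-up text: {follow_up_text}\n"
            f"Correct phonemes: {correct_phones}\n"
            f"Actual phonemes: {actual_phones}\n"
            "Give pronunciation-action feedback via the pronunciation attributes."
        ),
    })
    return {"model": "gpt-4", "messages": messages}

req = build_request(
    teaching_turns=[
        {"role": "user", "content": "(example teaching question)"},
        {"role": "assistant", "content": "(example attribute-based feedback)"},
    ],
    follow_up_text="she was his now forever",
    correct_phones="sh iy w ah z hh ih z n aw f er eh v er",
    actual_phones="sh iy w ah s ih n hh ih z n aw f ao eh v er",
)
print(len(req["messages"]))  # 4
```

Repeating the final user turn for each new follow-up text yields the arbitrary-size feedback data set described above.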
S104, acquiring a pronunciation action feedback fine tuning large language model based on the pronunciation action feedback data set;
in a possible embodiment, obtaining a pronunciation action feedback fine tuning large language model based on a pronunciation action feedback data set includes:
fine-tuning the large language model with P-tuning based on the pronunciation action feedback data set;
wherein the large language model is ChatGLM-6B or ChatGLM-130B.
In a possible embodiment, fine-tuning the large language model includes:
presetting a question data set and an answer data set;
the question data set limits the context length of the follow-up text: in it, the phonemes corresponding to the follow-up text and the real phonemes are segmented into words, and for each word it is judged only whether insertion, deletion or substitution errors exist;
the answer data set comprises the answers to the question data set, together with the feedback information for pronunciation action correction that the model gives through the pronunciation attributes;
the erroneous parts of the follow-up text are delimited through the question data set and the answer data set, and the large language model is fine-tuned according to the erroneous parts.
In a possible implementation, the pronunciation action feedback data set obtained through the GPT-4 API is used to fine-tune a large language model, so that pronunciation action feedback can be returned for any follow-up text. This use of the API differs from the usual one: existing practice searches for a better prompt to complete a specific downstream task, whereas the application additionally adds prior knowledge of the pronunciation attributes to the prompt, so that the replies focus on pronunciation action feedback. Fine-tuning may use P-tuning, and the large language model may be ChatGLM-6B or ChatGLM-130B. FIG. 3 is a flow chart of P-tuning fine-tuning: P-tuning adapts to various downstream tasks by adding a learnable vector to the prompt, and the figure shows the task of adding a learnable vector to the follow-up text and adapting to pronunciation action deviation feedback.
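The P-tuning idea mentioned above can be illustrated numerically: a small matrix of learnable prompt vectors is concatenated in front of the frozen prompt embeddings, and only that matrix is trained. All sizes below are toy values and no real model is involved:

```python
import random

random.seed(0)

embed_dim = 16   # toy embedding size (illustrative)
prefix_len = 4   # number of learnable prompt vectors
prompt_len = 10  # tokens in the follow-up-text prompt

def rand_matrix(rows, cols):
    """Stand-in for an embedding matrix: rows x cols of random floats."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# P-tuning keeps the pretrained model frozen and trains only these vectors.
learnable_prefix = rand_matrix(prefix_len, embed_dim)

# Embeddings of the tokenized follow-up text (frozen embedder stand-in).
prompt_embeddings = rand_matrix(prompt_len, embed_dim)

# The model input is the learnable prefix concatenated in front of the
# prompt embeddings; during fine-tuning, gradients flow only into
# `learnable_prefix` while every pretrained weight stays fixed.
model_input = learnable_prefix + prompt_embeddings
print(len(model_input), len(model_input[0]))  # 14 16
```

This is why P-tuning is cheap relative to full fine-tuning: the trainable parameter count is `prefix_len * embed_dim`, independent of the model size.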
S105, inputting any follow-up text into the pronunciation action feedback fine-tuned large language model, thereby completing pronunciation deviation detection and action feedback based on the large language model.
In a possible implementation, inputting any follow-up text into the pronunciation action feedback fine-tuned large language model to complete pronunciation deviation detection and action feedback based on the large language model includes:
inputting any follow-up text into the pronunciation action feedback fine-tuned large language model;
judging the insertion, deletion or substitution errors in the follow-up text;
and generating pronunciation action feedback information for those errors, thereby completing pronunciation deviation detection and action feedback based on the large language model.
In a practical embodiment, the application mainly targets language models with relatively small parameter counts, such as ChatGLM-6B; the final goal is that the fine-tuned large language model can analyze and reason on the pronunciation action feedback task the way GPT-4 does. Two data sets are used when fine-tuning ChatGLM. The questions of the first data set segment the follow-up text by word and ask, for the phonemes corresponding to the follow-up text and the real phonemes, whether insertion, deletion or substitution errors exist; the replies segment the phonemes corresponding to the follow-up text and the real phonemes by word, and then judge only whether each word contains an insertion, deletion or substitution error. The questions of the second data set are the answers of the first data set, and the model is required to give feedback information for pronunciation action correction via the pronunciation attributes. The purpose of the first data set is to let ChatGLM learn to generate feedback word by word, rather than generating all the feedback for a sentence at once. This approach is particularly suited to language models whose parameter counts are not that large (greater than 6B and less than 10B): it limits the model to seeing the context of only one word when generating feedback, which is critical for large language models that are weaker at logic and at capturing context. The second data set lets ChatGLM learn to generate pronunciation action feedback through the pronunciation attributes; although its questions are long, since the questions already state which parts are wrong and which are not, the model is likewise limited to seeing only the erroneous part of the context at a time.
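The two data sets described above might be constructed along the following lines; the field names and the simplified error label are assumptions for illustration, not the patent's actual format:

```python
# Hypothetical construction of the two fine-tuning data sets described above.
# Field names and the simplified error label are assumptions for illustration.

def make_detection_pair(word, correct_phones, actual_phones):
    """Data set 1: a per-word question asking only whether insertion,
    deletion or substitution errors exist."""
    question = (
        f"Word: {word}\n"
        f"Correct phonemes: {' '.join(correct_phones)}\n"
        f"Actual phonemes: {' '.join(actual_phones)}\n"
        "Are there insertion, deletion or substitution errors?"
    )
    # Real labels would name the error type per phoneme; this is simplified.
    answer = "no error" if correct_phones == actual_phones else "substitution error"
    return {"question": question, "answer": answer}

def make_feedback_pair(detection_answer):
    """Data set 2: the answer of data set 1 becomes the question, and the
    model must reply with attribute-based pronunciation-action feedback."""
    return {
        "question": detection_answer,
        "answer": "(pronunciation-action feedback derived from the attributes)",
    }

pair1 = make_detection_pair("was", ["w", "ah", "z"], ["w", "ah", "s"])
pair2 = make_feedback_pair(pair1["answer"])
print(pair1["answer"])  # substitution error
```

Iterating `make_detection_pair` over every word of every follow-up text, and chaining its answers into `make_feedback_pair`, yields the two per-word data sets that keep the model's visible context small.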
Finally, after ChatGLM is fine-tuned, generating feedback requires two steps: the first step generates the judgment of which insertion, deletion and substitution errors exist, and the second step generates the pronunciation action feedback information.
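The first of the two steps — deciding which insertion, deletion and substitution errors exist between the correct and real phonemes of a word — is in essence an edit-distance alignment, which can be sketched as follows (an illustrative stand-in, not the patent's implementation):

```python
def edit_ops(correct, actual):
    """Classify the differences between two phoneme sequences as insertion,
    deletion or substitution via standard Levenshtein alignment."""
    m, n = len(correct), len(actual)
    # dp[i][j] = minimum edits to turn correct[:i] into actual[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if correct[i - 1] == actual[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match / substitution
    # Backtrace to list the operations.
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] \
                and correct[i - 1] == actual[j - 1]:
            i, j = i - 1, j - 1                       # match, no operation
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(("substitute", correct[i - 1], actual[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("delete", correct[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", None, actual[j - 1]))
            j -= 1
    return list(reversed(ops))

# "was": /w ah z/ read as /w ah s/ -> one substitution z -> s
print(edit_ops(["w", "ah", "z"], ["w", "ah", "s"]))
# [('substitute', 'z', 's')]
```

Each substitution found this way can then be handed to the attribute comparison of step two to produce the concrete pronunciation-action feedback.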
The main advantage of the application is that feedback can be given for any follow-up text. FIG. 4 is an interface screenshot of a fluent-reading app: it can only tell the learner the possible error positions and an overall score, the learner cannot find out what exactly went wrong, and feedback is only available for a given set of texts. FIG. 5 models the correct pronunciation and possible mispronunciation paths with a finite state automaton; but since the feedback obtained by this method must be written manually in advance, the human resources consumed are enormous, feedback is again restricted to a given text, and the method has therefore not been adopted by industry. FIG. 6 shows an interface with feedback from ChatGPT; the feedback is more linguistically informed than that of the fluent-reading app, but it does not tell the bilingual learner how to correct the pronunciation actions, so a learner with a weak English foundation still obtains no effective feedback. FIG. 7 shows the method of the application, in which pronunciation attribute information is added to the ChatGPT prompt; many concrete pronunciation actions appear in the feedback, which is for the same follow-up sentence: "But there came no promise from the bow of the canoe." The comparison of FIG. 6 and FIG. 7 demonstrates the effectiveness of the method of the application.
Fig. 8 is a schematic diagram of a large language model-based pronunciation deviation detection and action feedback system according to the present application, where the system 200 is used in the large language model-based pronunciation deviation detection and action feedback method, and the system 200 includes:
the data acquisition labeling module 210 is configured to obtain a follow-up text, obtain a bilingual pronunciation deviation dataset based on the follow-up text, and label phonemes for the bilingual pronunciation deviation dataset;
the training module 220, configured to send the labeled bilingual pronunciation deviation data set and the phonemes corresponding to the follow-up text to GPT-4, and to perform pronunciation correction training through GPT-4;
the preliminary feedback module 230, configured to call the GPT-4 API based on the pronunciation correction training to obtain a pronunciation action feedback data set for any number of follow-up texts, and to give attribute feedback on mispronunciations through the pronunciation action feedback data set;
the model building module 240 is configured to obtain a pronunciation action feedback fine tuning large language model based on the pronunciation action feedback data set;
and the error detection feedback module 250 is used for inputting any follow-up text to the pronunciation action feedback fine tuning large language model to complete pronunciation error detection and action feedback based on the large language model.
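As a minimal illustration of how the five modules above could be composed into one pipeline, consider the following Python sketch. Every class, callable, and return value here is an illustrative assumption for exposition, not part of the patent's actual implementation.

```python
# Hypothetical sketch of wiring modules 210-250 into one pipeline.
# All names and stand-in callables are assumptions for illustration only.

class PronunciationFeedbackSystem:
    def __init__(self, annotate, train, build_feedback_set, build_model, detect):
        self.annotate = annotate                      # module 210: label phonemes
        self.train = train                            # module 220: GPT-4 correction training
        self.build_feedback_set = build_feedback_set  # module 230: API-built data set
        self.build_model = build_model                # module 240: fine-tune the LLM
        self.detect = detect                          # module 250: error detection feedback

    def run(self, follow_up_text):
        # Data flows through the modules in the order S1 -> S5.
        dataset = self.annotate(follow_up_text)
        trained = self.train(dataset)
        feedback_set = self.build_feedback_set(trained)
        model = self.build_model(feedback_set)
        return self.detect(model, follow_up_text)

# Stand-in callables just to show the data flow end to end.
system = PronunciationFeedbackSystem(
    annotate=lambda text: {"text": text, "phonemes": []},
    train=lambda ds: ds,
    build_feedback_set=lambda ds: [ds],
    build_model=lambda fs: "fine-tuned-llm",
    detect=lambda model, text: f"{model}: feedback for {text!r}",
)
print(system.run("thin"))  # → fine-tuned-llm: feedback for 'thin'
```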
Preferably, the data acquisition labeling module 210 is configured to label the real phonemes in the bilingual pronunciation deviation data set of the follow-up text.
Preferably, the training module 220 is configured to send the labeled bilingual pronunciation deviation data set and the phonemes corresponding to the follow-up text to GPT-4, and to perform word segmentation through GPT-4 to obtain the actual phonemes after word segmentation;

input the preset correspondence between correct phonemes and pronunciation attributes to GPT-4;

and, based on the correspondence between correct phonemes and pronunciation attributes, convert the parts where the actual phonemes differ from the correct phonemes into pronunciation attributes through GPT-4, obtaining correction information for the pronunciation action.

Preferably, the training module 220 is configured to preset the correct phonemes, and to input the correspondence between the correct phonemes and the pronunciation attributes, together with the meaning of the pronunciation attribute in each dimension, to GPT-4.

Preferably, the training module 220 is configured to convert, based on the correspondence between phonemes and pronunciation attributes, the parts where the actual phonemes differ from the correct phonemes into pronunciation attributes through GPT-4;

and to determine the dimension in which the pronunciation is wrong by checking the meaning of the pronunciation attribute in each dimension, obtaining correction information for the pronunciation action.
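The conversion from phoneme differences to dimension-level attribute feedback can be sketched as follows. The three-dimension attribute scheme (manner, place, voicing) and the phoneme inventory are illustrative assumptions; the patent does not publish its actual attribute tables.

```python
# Hypothetical phoneme-to-pronunciation-attribute table; each phoneme maps
# to a tuple of articulatory attributes (manner, place, voicing).
ATTRIBUTES = {
    "s":  ("fricative", "alveolar", "voiceless"),
    "z":  ("fricative", "alveolar", "voiced"),
    "th": ("fricative", "dental",   "voiceless"),
}
DIMENSIONS = ("manner", "place", "voicing")

def attribute_feedback(actual, correct):
    """Return the attribute dimensions on which the learner's actual
    phoneme differs from the correct (canonical) phoneme."""
    diffs = []
    for dim, a, c in zip(DIMENSIONS, ATTRIBUTES[actual], ATTRIBUTES[correct]):
        if a != c:
            diffs.append((dim, a, c))
    return diffs

# A learner produced /s/ where /th/ was expected: only the place of
# articulation differs (alveolar instead of dental), so the action
# feedback targets tongue position.
print(attribute_feedback("s", "th"))  # → [('place', 'alveolar', 'dental')]
```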
Preferably, the preliminary feedback module 230 is configured to invoke the GPT-4 API based on the pronunciation correction training, adding prior knowledge of pronunciation attributes to the prompt;

acquire a pronunciation action feedback data set for any number of follow-up texts, where the data in the pronunciation action feedback data set focus on pronunciation action feedback;

and perform attribute feedback on mispronunciations through the pronunciation action feedback data set.
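One way the prompt with pronunciation-attribute prior knowledge might be assembled is sketched below. The wording of the prior, the phoneme labels, and the `build_prompt` helper are assumptions for illustration; the actual GPT-4 API call is omitted.

```python
# Hypothetical prompt construction: prior knowledge of the pronunciation
# attributes is prepended before the per-word error report.
ATTRIBUTE_PRIOR = (
    "Each phoneme is described by pronunciation attributes such as manner, "
    "place of articulation, and voicing. Where the learner's phoneme differs "
    "from the canonical one, state which attribute is wrong and which "
    "articulator movement corrects it."
)

def build_prompt(word, canonical, actual):
    # `canonical` / `actual` are lists of phoneme symbols (ARPAbet-like).
    return (
        f"{ATTRIBUTE_PRIOR}\n\n"
        f"Word: {word}\n"
        f"Canonical phonemes: {' '.join(canonical)}\n"
        f"Learner phonemes: {' '.join(actual)}\n"
        "Give pronunciation action feedback."
    )

prompt = build_prompt("there", ["dh", "eh", "r"], ["d", "eh", "r"])
print("Canonical phonemes: dh eh r" in prompt)  # → True
```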
Preferably, the model building module 240 is configured to fine-tune the large language model using P-tuning based on the pronunciation action feedback data set, obtaining the pronunciation action feedback fine tuning large language model;

wherein the large language model comprises: ChatGLM-6B or ChatGLM-130B.
Preferably, the model building module 240 is configured to preset a question data set and an answer data set;

limit the context length of the follow-up text through the question data set; according to the question data set, perform word segmentation on the phonemes corresponding to the follow-up text and the real phonemes, and judge whether insertion, deletion, and replacement errors exist;

the answer data set includes: the answers to the question data set, and the feedback information for pronunciation action correction given by the model through pronunciation attributes;

and define the erroneous part of the follow-up text through the question data set and the answer data set, and fine-tune the large language model according to the erroneous part.
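Judging insertion, deletion, and replacement errors between the canonical phoneme sequence and the recognized learner phonemes amounts to an edit-distance alignment. A minimal sketch using Python's standard `difflib` (the phoneme symbols are illustrative):

```python
import difflib

def edit_errors(canonical, actual):
    """Label insertion, deletion, and replacement errors between the
    canonical phonemes and the recognized learner phonemes."""
    sm = difflib.SequenceMatcher(a=canonical, b=actual, autojunk=False)
    # get_opcodes() yields (tag, i1, i2, j1, j2); keep only the error spans.
    return [(tag, canonical[i1:i2], actual[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes()
            if tag != "equal"]

# "think": /th/ replaced by /s/, final /k/ deleted.
print(edit_errors(["th", "ih", "ng", "k"], ["s", "ih", "ng"]))
```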
Preferably, the error detection feedback module 250 is configured to input any follow-up text into the pronunciation action feedback fine tuning large language model;

judge the insertion, deletion, or replacement errors in the follow-up text;

and generate pronunciation action feedback information for the insertion, deletion, or replacement errors, completing pronunciation deviation detection and action feedback based on the large language model.
The embodiments of the present application provide, first, a scheme that can give feedback on any follow-up text; compared with existing industrial solutions such as the Fluent app, the present application is more widely applicable. Second, the present application can return feedback information that contains pronunciation actions, which is more effective than the feedback information obtained by the prior art; the comparison of Figs. 6 and 7 shows that with the present application, feedback on specific pronunciation actions can be obtained, which the prior art does not achieve. Third, the present application provides a fine-tuning method for a language model with a smaller number of parameters, and proposes that, when using the API to acquire the data set, word-level feedback is acquired first and sentence-level feedback afterwards, so that a language model with fewer parameters can run inference under a shorter context length and can still feed back pronunciation action information.
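The word-level-then-sentence-level strategy described above can be sketched as follows. `call_model` is a placeholder for the fine-tuned language model, and the prompt wording and context limit are assumptions for illustration.

```python
def two_stage_feedback(word_errors, call_model, max_ctx=256):
    """Stage 1: one short word-level prompt per mispronounced word.
    Stage 2: a single sentence-level summary over the word results,
    so no individual call needs a long context."""
    word_feedback = []
    for word, canonical, actual in word_errors:
        prompt = f"Word '{word}': expected /{canonical}/, got /{actual}/. Feedback?"
        assert len(prompt) < max_ctx  # each word-level call stays short
        word_feedback.append(call_model(prompt))
    summary_prompt = "Summarize for the sentence: " + " | ".join(word_feedback)
    return call_model(summary_prompt)

# Stand-in model that just echoes in upper case, to show the call pattern.
result = two_stage_feedback(
    [("thin", "th ih n", "s ih n")],
    call_model=lambda p: p.upper(),
)
print(result.startswith("SUMMARIZE FOR THE SENTENCE:"))  # → True
```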
Fig. 9 is a schematic structural diagram of an electronic device 300 according to an embodiment of the present application. The electronic device 300 may vary considerably in configuration and performance, and may include one or more processors (central processing units, CPU) 301 and one or more memories 302, where at least one instruction is stored in the memories 302 and is loaded and executed by the processors 301 to implement the following steps of the large language model-based pronunciation deviation detection and action feedback method:
s1, acquiring a follow-up text, acquiring a bilingual pronunciation deviation data set based on the follow-up text, and labeling phonemes of the bilingual pronunciation deviation data set;
s2, sending the labeled bilingual pronunciation deviation data set and the phonemes corresponding to the follow-up text to GPT-4, and performing pronunciation correction training through GPT-4;

s3, based on the pronunciation correction training, calling the GPT-4 API to obtain a pronunciation action feedback data set for any number of follow-up texts, and performing attribute feedback on mispronunciations through the pronunciation action feedback data set;
s4, acquiring a pronunciation action feedback fine tuning large language model based on the pronunciation action feedback data set;
s5, inputting any follow-up text to the pronunciation action feedback fine-tuning large language model, and finishing pronunciation deviation detection and action feedback based on the large language model.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory including instructions executable by a processor in a terminal to perform the above large language model-based pronunciation deviation detection and action feedback method. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

Claims (6)

1. A pronunciation deviation detection and action feedback method based on a large language model is characterized by comprising the following steps:
s1, acquiring a follow-up text, acquiring a bilingual pronunciation deviation data set based on the follow-up text, and labeling phonemes of the bilingual pronunciation deviation data set;
in the step S1, labeling phonemes on the bilingual pronunciation deviation dataset includes:
labeling real phonemes in the bilingual pronunciation deviation data set of the follow-up text;
s2, sending the labeled bilingual pronunciation deviation data set and the phonemes corresponding to the follow-up text to GPT-4, and performing pronunciation correction training through GPT-4;

in the step S2, sending the labeled bilingual pronunciation deviation data set and the phonemes corresponding to the follow-up text to GPT-4 and performing pronunciation correction training through GPT-4 includes:

sending the labeled bilingual pronunciation deviation data set and the phonemes corresponding to the follow-up text to GPT-4, and performing word segmentation through GPT-4 to obtain the actual phonemes after word segmentation;

inputting the preset correspondence between correct phonemes and pronunciation attributes to GPT-4;

based on the correspondence between the correct phonemes and the pronunciation attributes, converting the parts where the actual phonemes differ from the correct phonemes into pronunciation attributes through GPT-4, and acquiring correction information for the pronunciation action;

the inputting of the preset correspondence between the correct phonemes and the pronunciation attributes to GPT-4 includes:

presetting the correct phonemes, and inputting the correspondence between the correct phonemes and the pronunciation attributes, together with the meaning of the pronunciation attribute in each dimension, to GPT-4;

the converting, based on the correspondence between the correct phonemes and the pronunciation attributes, of the parts where the actual phonemes differ from the correct phonemes into pronunciation attributes through GPT-4, and the acquiring of correction information for the pronunciation action, includes:

converting the parts where the actual phonemes differ from the correct phonemes into pronunciation attributes through GPT-4 based on the correspondence between the correct phonemes and the pronunciation attributes;

judging the dimension in which the pronunciation is wrong by checking the meaning of the pronunciation attribute in each dimension, and acquiring the correction information for the pronunciation action;

s3, based on the pronunciation correction training, calling the GPT-4 API to obtain a pronunciation action feedback data set for any number of follow-up texts, and performing attribute feedback on mispronunciations through the pronunciation action feedback data set;
s4, acquiring a pronunciation action feedback fine tuning large language model based on the pronunciation action feedback data set;
s5, inputting any follow-up text to the pronunciation action feedback fine-tuning large language model, and finishing pronunciation deviation detection and action feedback based on the large language model.
2. The method according to claim 1, wherein in the step S3, calling the GPT-4 API based on the pronunciation correction training to obtain a pronunciation action feedback data set for any number of follow-up texts, and performing attribute feedback on mispronunciations through the pronunciation action feedback data set, includes:

based on the pronunciation correction training, calling the GPT-4 API and adding prior knowledge of pronunciation attributes to the prompt;

acquiring a pronunciation action feedback data set for any number of follow-up texts, wherein the data in the pronunciation action feedback data set focus on pronunciation action feedback;

and performing attribute feedback on mispronunciations through the pronunciation action feedback data set.
3. The method according to claim 2, wherein in the step S4, obtaining a pronunciation action feedback fine-tuning large language model based on the pronunciation action feedback data set includes:
based on the pronunciation action feedback data set, fine-tuning the large language model using P-tuning to obtain the pronunciation action feedback fine tuning large language model;

wherein the large language model comprises: ChatGLM-6B or ChatGLM-130B.
4. A method according to claim 3, wherein said fine-tuning a large language model comprises:
presetting a question data set and an answer data set;
limiting the context length of the follow-up text through the question data set; according to the question data set, performing word segmentation on the phonemes corresponding to the follow-up text and the real phonemes, and judging whether insertion, deletion, and replacement errors exist;

the answer data set includes: the answers to the question data set, and the feedback information for pronunciation action correction given by the model through pronunciation attributes;
and defining the error part of the follow-up text through the question data set and the answer data set, and fine-tuning the large language model according to the error part.
5. The method according to claim 4, wherein in step S5, inputting any follow-up text into the pronunciation action feedback fine tuning large language model to complete pronunciation deviation detection and action feedback based on the large language model includes:

inputting any follow-up text into the pronunciation action feedback fine tuning large language model;
judging insertion, deletion or replacement errors in the follow-up text;
and generating pronunciation action feedback information aiming at the insertion, deletion or replacement errors, and finishing pronunciation deviation detection and action feedback based on a large language model.
6. A large language model-based pronunciation deviation detection and action feedback system, wherein the system is used for the large language model-based pronunciation deviation detection and action feedback method as claimed in any one of claims 1 to 5, and the system comprises:
the data acquisition labeling module is used for acquiring a follow-up text, acquiring a bilingual pronunciation deviation data set based on the follow-up text, and labeling phonemes of the bilingual pronunciation deviation data set;
the data acquisition labeling module is used for labeling real phonemes in the bilingual pronunciation deviation data set of the follow-up text;
the training module is used for sending the labeled bilingual pronunciation deviation data set and the phonemes corresponding to the follow-up text to GPT-4, and performing pronunciation correction training through GPT-4;

the training module is used for sending the labeled bilingual pronunciation deviation data set and the phonemes corresponding to the follow-up text to GPT-4, and performing word segmentation through GPT-4 to obtain the actual phonemes after word segmentation;

inputting the preset correspondence between correct phonemes and pronunciation attributes to GPT-4;

based on the correspondence between the correct phonemes and the pronunciation attributes, converting the parts where the actual phonemes differ from the correct phonemes into pronunciation attributes through GPT-4, and acquiring correction information for the pronunciation action;

the training module is used for presetting the correct phonemes, and inputting the correspondence between the correct phonemes and the pronunciation attributes, together with the meaning of the pronunciation attribute in each dimension, to GPT-4;

the training module is used for converting, based on the correspondence between the phonemes and the pronunciation attributes, the parts where the actual phonemes differ from the correct phonemes into pronunciation attributes through GPT-4;

judging the dimension in which the pronunciation is wrong by checking the meaning of the pronunciation attribute in each dimension, and acquiring the correction information for the pronunciation action;
the preliminary feedback module is used for calling the GPT-4 API, based on the pronunciation correction training, to obtain a pronunciation action feedback data set for any number of follow-up texts, and for performing attribute feedback on mispronunciations through the pronunciation action feedback data set;
the model construction module is used for obtaining a pronunciation action feedback fine tuning large language model based on the pronunciation action feedback data set;
and the error detection feedback module is used for inputting any follow-up text to the pronunciation action feedback fine-tuning large language model to finish pronunciation error detection and action feedback based on the large language model.
CN202311039410.2A 2023-08-17 2023-08-17 Pronunciation deviation detection and action feedback method and system based on large language model Active CN116805495B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311039410.2A CN116805495B (en) 2023-08-17 2023-08-17 Pronunciation deviation detection and action feedback method and system based on large language model


Publications (2)

Publication Number Publication Date
CN116805495A CN116805495A (en) 2023-09-26
CN116805495B true CN116805495B (en) 2023-11-21

Family

ID=88080818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311039410.2A Active CN116805495B (en) 2023-08-17 2023-08-17 Pronunciation deviation detection and action feedback method and system based on large language model

Country Status (1)

Country Link
CN (1) CN116805495B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739870A (en) * 2009-12-03 2010-06-16 深圳先进技术研究院 Interactive language learning system and method
CN109545189A (en) * 2018-12-14 2019-03-29 东华大学 A kind of spoken language pronunciation error detection and correcting system based on machine learning
CN112750465A (en) * 2020-12-29 2021-05-04 昆山杜克大学 Cloud language ability evaluation system and wearable recording terminal
CN115985342A (en) * 2022-12-29 2023-04-18 科大讯飞股份有限公司 Pronunciation error detection method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11568761B2 (en) * 2017-09-26 2023-01-31 Nippon Telegraph And Telephone Corporation Pronunciation error detection apparatus, pronunciation error detection method and program




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant