CN116564143A - Spoken language learning method and device based on large language model - Google Patents
Spoken language learning method and device based on large language model
- Publication number
- CN116564143A (application CN202310585313.7A)
- Authority
- CN
- China
- Prior art keywords
- language
- text
- student
- level
- language model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/04—Electrically-operated educational appliances with audible presentation of the material to be studied
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/08—Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations
- G09B5/14—Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations with provision for individual teacher-student communication
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/005—Language recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Theoretical Computer Science (AREA)
- Educational Technology (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Educational Administration (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Tourism & Hospitality (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Primary Health Care (AREA)
- Marketing (AREA)
- Human Resources & Organizations (AREA)
- General Health & Medical Sciences (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention relates to a spoken language learning method and device based on a large language model, an electronic device, and a computer-readable medium. Scenes and person settings are configured so that students can select a scene of interest and practice spoken language under a specific person setting. The large language model generates feedback based on the specific scene, the specific person settings, the student's spoken content, and the student's spoken language level, and interacts with the student. A difficulty-following mechanism keeps the difficulty of the large language model's feedback at, or slightly above, the student's spoken language level, achieving targeted learning adapted to each student's situation. Students can practice foreign-language speaking anytime and anywhere, and dialogue practice grounded in specific scenes and person settings makes spoken language learning highly targeted and practical.
Description
Technical Field
The present invention relates to the field of computer information processing, and in particular, to a spoken language learning method and apparatus based on a large language model, an electronic device, and a computer readable medium.
Background
Spoken language practice is a major difficulty for students learning foreign languages. Common ways of practicing include: going to an English corner to find partners for spoken practice; hiring a foreign teacher for one-on-one spoken practice; and listening to foreign-language songs or watching foreign-language films and imitating the characters' dialogue.
Each of these approaches has advantages and disadvantages.
Going to an English corner for spoken practice demands a good deal of free time: travel takes a relatively long time, and English corners are not always in session, so one cannot practice whenever one wants.
One-on-one spoken practice with a foreign teacher is expensive and unaffordable for many people. Moreover, many small cities have few or no foreign teachers.
Listening to foreign-language songs, watching foreign-language films, and imitating the characters' dialogue is also a common way of learning spoken language; it is convenient and inexpensive, but it lacks interaction.
Disclosure of Invention
In view of the above, the invention provides a spoken language learning method, device, electronic equipment, and computer-readable medium based on a large language model. The method achieves targeted learning adapted to each student's situation, lets students practice foreign-language speaking anytime and anywhere, and makes spoken language learning highly targeted and practical through dialogue practice grounded in specific scenes and person settings.
Other features and advantages of the invention will be apparent from the following detailed description, or may be learned by the practice of the invention.
According to an aspect of the present invention, there is provided a spoken language learning method based on a large language model, the method including: determining a target scene for the student's spoken language learning from a plurality of spoken language learning scenes; determining the person setting of the student and the person setting of the large language model based on the target scene; acquiring the speech that the student inputs orally under the student's person setting in the target scene, and converting the speech into a student language text; performing level evaluation based on a large language input text to obtain a text difficulty level, wherein the large language input text is the student language text, or is a text converted from the student language text that is more difficult than the student language text; calling the large language model and sending it the target scene, the person setting of the student, the person setting of the large language model, the large language input text, and a large language model difficulty level, so that the large language model feeds back large language model language text based on these inputs, the large language model difficulty level being the same as, or higher than, the text difficulty level; and receiving the large language model language text fed back by the large language model, converting it into speech, and outputting the speech to the student.
In an exemplary embodiment of the present invention, a person setting of a student has a correspondence relationship with a person setting of a large language model.
In an exemplary embodiment of the present invention, performing level evaluation based on a large language input text to obtain a text difficulty level includes: setting at least two evaluation indexes, where different evaluation indexes evaluate the large language input text from different aspects; setting a weight for each evaluation index; determining the score of the large language input text on each evaluation index; and determining the text difficulty level from the score of the large language input text on each evaluation index and the weight of each evaluation index.
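The weighted evaluation described above can be sketched as follows. The index names, the weights, and the integer 1-5 level scale are illustrative assumptions, not values specified by this disclosure.

```python
def evaluate_text_level(scores, weights):
    """Combine per-index scores into a single difficulty level (1 = easiest)."""
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return round(weighted_sum / total_weight)

# Assumed indexes and weights; scores are per-index levels on a 1-5 scale.
weights = {"vocabulary": 0.4, "grammar": 0.3, "fluency": 0.2, "sentence_count": 0.1}
scores = {"vocabulary": 3, "grammar": 2, "fluency": 4, "sentence_count": 2}
text_difficulty_level = evaluate_text_level(scores, weights)  # weighted mean, rounded
```

In a real system the per-index scores would themselves come from analyzers (vocabulary graders, grammar checkers), but the combination step reduces to this weighted average.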
In an exemplary embodiment of the invention, the evaluation indexes include at least two of: vocabulary level, grammar level, idiomaticity, fluency, word count, and sentence count.
In an exemplary embodiment of the present invention, the large language input text is a text converted from the student language text that is more difficult than the student language text, the large language model difficulty level is the same as the text difficulty level, and performing level evaluation based on the large language input text to obtain a text difficulty level includes: extracting language elements of the student language text and replacing them with higher-level language elements to obtain the large language input text, where the language elements include at least one of vocabulary, phrases, and grammar; performing level evaluation based on the large language input text to obtain the text difficulty level; and taking the text difficulty level as the large language model difficulty level.
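A minimal sketch of the element-replacement step, assuming a toy word-level synonym table. A real system would draw higher-level elements from a graded vocabulary and grammar resource rather than a hard-coded dictionary.

```python
# Toy table mapping a word to an assumed higher-level equivalent.
HIGHER_LEVEL = {
    "big": "sizeable",
    "good": "excellent",
    "buy": "purchase",
}

def upgrade_text(student_text):
    """Replace known words with higher-level equivalents, keeping the rest."""
    words = student_text.split()
    upgraded = [HIGHER_LEVEL.get(word.lower(), word) for word in words]
    return " ".join(upgraded)

upgraded = upgrade_text("I want to buy a big coffee")
```

The upgraded text, rather than the student's original, is then what gets level-evaluated and sent to the model.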
In an exemplary embodiment of the invention, the method includes a number of cycles, each cycle producing a student language text, the large language input text corresponding to the student language text in that cycle, and the large language model language text corresponding to the large language input text in that cycle. The method further comprises: performing level evaluation based on the large language input text generated in each cycle to obtain a text difficulty level, taking the text difficulty level in each cycle as the large language model difficulty level in that cycle, and transmitting it to the large language model.
In an exemplary embodiment of the invention, the method further comprises: performing level evaluation on the student language text generated in each cycle; when the level obtained for the student language text generated in the current cycle is equal to the level obtained for the student language text generated in the previous cycle, extracting the vocabulary in the student language texts of the two cycles; judging whether the vocabulary level in the student language text of the current cycle is higher than that in the student language text of the previous cycle; and if so, increasing the number of language elements extracted and replaced in the next cycle.
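The plateau rule above (same overall level across two cycles while the vocabulary level still rises) can be sketched as a small helper. The function name and the +1 increment are assumptions; the disclosure only says the replacement count increases.

```python
def next_replacement_count(prev_level, cur_level,
                           prev_vocab_level, cur_vocab_level,
                           replacements):
    """Increase the number of replaced language elements only when the
    overall level plateaus while the vocabulary level still rises."""
    if cur_level == prev_level and cur_vocab_level > prev_vocab_level:
        return replacements + 1
    return replacements
```

The intent is that a student whose overall score is flat but whose word choice is improving gets pushed slightly harder in the next turn.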
In an exemplary embodiment of the invention, the method further comprises: and storing the language texts of the students and the language texts of the large language model, and arranging and displaying the language texts of the students according to the time sequence of generating the texts.
According to another aspect of the present invention, there is provided a spoken language learning device based on a large language model, the device including: a first determination unit configured to determine a target scene for the student's spoken language learning from a plurality of spoken language learning scenes; a second determination unit configured to determine the person setting of the student and the person setting of the large language model based on the target scene; an acquisition unit configured to acquire the speech that the student inputs orally under the student's person setting in the target scene and convert the speech into a student language text; an evaluation unit configured to perform level evaluation based on a large language input text to obtain a text difficulty level, where the large language input text is the student language text, or is a text converted from the student language text that is more difficult than the student language text; a calling unit configured to call the large language model and send it the target scene, the person setting of the student, the person setting of the large language model, the large language input text, and the large language model difficulty level, so that the large language model feeds back large language model language text based on these inputs, the large language model difficulty level being the same as, or higher than, the text difficulty level; and a receiving unit configured to receive the large language model language text fed back by the large language model, convert it into speech, and output the speech to the student.
In an exemplary embodiment of the present invention, a person setting of a student has a correspondence relationship with a person setting of a large language model.
In an exemplary embodiment of the invention, the evaluation unit comprises: a first setting subunit, configured to set at least two evaluation indexes, where different evaluation indexes are used to evaluate the large language input text from different aspects; a second setting subunit configured to set a weight of each of the evaluation indexes; a first determination subunit configured to determine a score of the large language input text on each of the evaluation indexes; and the second determining subunit is used for determining the text difficulty level according to the score of the large language input text on each evaluation index and the weight of each evaluation index.
In an exemplary embodiment of the invention, the evaluation indexes include at least two of: vocabulary level, grammar level, idiomaticity, fluency, word count, and sentence count.
In an exemplary embodiment of the present invention, the large language input text is a text converted based on the student language text and having a higher difficulty than the student language text, the large language model difficulty level is the same as the text difficulty level, and the evaluation unit includes: the extraction subunit is used for extracting language elements of the student language text and replacing the language elements with more advanced language elements to obtain the large language input text, wherein the language elements comprise at least one of vocabulary, phrase and grammar; the evaluation subunit is used for performing level evaluation based on the large language input text to obtain the text difficulty level; and the third determining subunit is used for taking the text difficulty level as the large language model difficulty level.
In an exemplary embodiment of the present invention, the apparatus operates over a number of cycles, each cycle generating a student language text, the large language input text corresponding to the student language text in that cycle, and the large language model language text corresponding to the large language input text in that cycle. The evaluation unit performs level evaluation based on the large language input text generated in each cycle to obtain a text difficulty level, takes the text difficulty level in each cycle as the large language model difficulty level in that cycle, and transmits it to the large language model.
In an exemplary embodiment of the invention, the apparatus further comprises an extraction unit that, when the level obtained by the evaluation unit for the student language text generated in the current cycle is equal to the level obtained for the student language text generated in the previous cycle, extracts the vocabulary in the student language texts of the two cycles; judges whether the vocabulary level in the student language text of the current cycle is higher than that in the student language text of the previous cycle; and if so, increases the number of language elements extracted and replaced in the next cycle.
In an exemplary embodiment of the invention, the apparatus further comprises a storage unit configured to store the student language texts and the large language model language texts, arrange them in the chronological order in which they were generated, and display them to the student.
According to still another aspect of the present invention, there is provided an electronic device including: one or more processors; a storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods as described above.
According to a further aspect of the invention, a computer-readable medium is proposed, on which a computer program is stored which, when being executed by a processor, implements a method as described above.
According to the method and the device, scenes and person settings are configured so that students can select a scene of interest and practice spoken language under a specific person setting. The large language model generates feedback based on the specific scene, the specific person settings, the student's spoken content, and the student's spoken language level, and interacts with the student. A difficulty-following mechanism keeps the difficulty of the large language model's feedback at, or slightly above, the student's spoken language level, achieving targeted learning adapted to each student's situation. Students can practice foreign-language speaking anytime and anywhere, and dialogue practice grounded in specific scenes and person settings makes spoken language learning highly targeted and practical.
In addition, the technical solution of the present invention brings about many other advantages, which will be described in detail in the detailed description.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings of the embodiments are briefly described below. It is apparent that the drawings described below relate only to some embodiments of the present invention and do not limit the present invention.
FIG. 1 is a flow chart of a spoken language learning method based on a large language model provided in an embodiment of the present application;
FIGS. 2-1 to 2-3 are schematic diagrams of a software interface of a spoken language learning method based on a large language model according to an embodiment of the present application;
FIGS. 3-1 to 3-5 are schematic diagrams of a software interface of a spoken language learning method based on a large language model according to an embodiment of the present application;
FIG. 4 is a flowchart of a spoken language learning method based on a large language model provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a spoken language learning device based on a large language model, according to an example embodiment;
FIG. 6 is a block diagram of an electronic device, shown in accordance with an exemplary embodiment;
FIG. 7 is a block diagram of a computer-readable medium shown according to an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
Terms such as "first" and "second" are used herein only to distinguish one element from another. Accordingly, a first component discussed below could be termed a second component without departing from the teachings of the present inventive concept. As used herein, the term "and/or" includes any one of the associated listed items and all combinations of one or more of them.
Those skilled in the art will appreciate that the drawings are schematic representations of example embodiments and that the modules or flows in the drawings are not necessarily required to practice the invention and therefore should not be taken to limit the scope of the invention.
In the embodiment of the application, the spoken language exercise and the spoken language learning are the same concept.
Fig. 1 is a flowchart of a spoken language learning method based on a large language model according to an embodiment of the present application. A large language model is a natural language processing tool driven by artificial intelligence technology, such as ChatGPT. The method can be executed by an application program installed on a mobile terminal such as a mobile phone.
As shown in fig. 1, the method includes:
step S101: a target scene of student spoken language learning is determined from a plurality of spoken language learning scenes.
Step S102: determine the person setting of the student and the person setting of the large language model based on the target scene.
Step S103: acquire the speech that the student inputs orally under the student's person setting in the target scene, and convert the speech into a student language text.
Step S104: perform level evaluation based on the large language input text to obtain a text difficulty level, where the large language input text is the student language text, or is a text converted from the student language text that is more difficult than the student language text.
Step S105: call the large language model and send it the target scene, the person setting of the student, the person setting of the large language model, the large language input text, and the large language model difficulty level, so that the large language model feeds back large language model language text based on these inputs; the large language model difficulty level is the same as, or higher than, the text difficulty level.
Step S106: receive the large language model language text fed back by the large language model, convert it into speech, and output the speech to the student.
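Steps S103 to S106 can be sketched as one dialogue turn. The speech recognition, level evaluation, and model calls are stubbed out; the function names and prompt fields are assumptions rather than the actual interfaces of this disclosure.

```python
def spoken_practice_turn(scene, student_persona, llm_persona,
                         speech_to_text, evaluate_level, call_llm, text_to_speech):
    student_text = speech_to_text()            # S103: speech -> student language text
    difficulty = evaluate_level(student_text)  # S104: text difficulty level
    request = {                                # S105: everything sent to the model
        "scene": scene,
        "student_person_setting": student_persona,
        "llm_person_setting": llm_persona,
        "input_text": student_text,
        "difficulty": difficulty,  # same as, or above, the evaluated level
    }
    reply_text = call_llm(request)
    return text_to_speech(reply_text)          # S106: text -> speech for the student

# Demo with stubs standing in for the real speech, evaluation, and LLM services.
reply_audio = spoken_practice_turn(
    "ordering coffee", "customer", "barista",
    speech_to_text=lambda: "I want a latte",
    evaluate_level=lambda text: 2,
    call_llm=lambda req: f"(level {req['difficulty']}) Sure, what size latte?",
    text_to_speech=lambda text: text,  # identity stub in place of real TTS
)
```

Repeating this turn in a loop, with the difficulty re-evaluated each time, gives the cyclic behavior described in the exemplary embodiments above.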
The inventors found that when students practice spoken language within a specific scene, the practice is highly targeted and a better learning effect is achieved. A specific scene has its own corresponding common vocabulary, phrases, and sentences. Practicing spoken language in that scene exercises exactly those common vocabulary, phrases, and sentences, so students can master them in a targeted manner.
As shown in fig. 2-1, 2-2, and 2-3, the scenes may include: introducing oneself, talking about hobbies, talking about pets, reunions with old friends, new life at the workplace, ordering at a restaurant, ordering coffee, ordering at a fast-food restaurant, discussing party activities, shopping lists, buying groceries, describing shopping needs, asking for directions, weekend plans, holiday plans, airport boarding, customs clearance, talking to a taxi driver, hotel check-in, talking about fitness, talking about sports, party invitations, asking for help, expressing thanks, saying farewell, expressing apology, shopping advice, requesting returns, restaurant reservations, ordering, dissatisfaction and complaints about food, settling the bill at a restaurant, health problems, dialing 911, talking about apartments, dissatisfaction and complaints about hotel accommodation, team building, housewarming gifts, expressing sadness, comforting friends, expressing support, and the like.
The student clicks a scene displayed on the interface to enter the spoken language practice interface corresponding to that scene.
Step S102 determines the person setting of the student and the person setting of the large language model based on the target scene. The inventors found that in many cases a scene corresponds to particular person settings: a specific scene corresponds to specific person settings, and the two person settings of a dialogue also correspond to each other.
For example, if the scene is a restaurant reservation, the dialogue is usually between a customer and a restaurant server; the two person settings are therefore customer and server, the student's person setting may be the customer, and the large language model's person setting may be the server.
For example, if the scene is ordering coffee, the dialogue is usually between a customer and a barista; the student's person setting may be the customer, and the large language model's person setting may be the barista.
For example, if the scene is describing shopping needs, the dialogue is usually between a customer and a store clerk; the student's person setting may be the customer, and the large language model's person setting may be the clerk.
For example, if the scene is talking to a taxi driver, the dialogue is usually between a passenger and the driver; the student's person setting may be the passenger, and the large language model's person setting may be the taxi driver.
For example, if the scene is a hotel check-in, the dialogue is usually between a guest and a hotel receptionist; the student's person setting may be the guest, and the large language model's person setting may be the receptionist.
For example, if the scene is talking about fitness, the dialogue is usually between a customer and a gym staff member; the student's person setting may be the customer, and the large language model's person setting may be the staff member.
The inventors found that the two person settings of the dialogue can be determined from a given specific scene (the target scene) and assigned to the student and the large language model, respectively. As an alternative implementation, the person setting that actively asks questions is assigned to the student, and the person setting that passively answers is assigned to the large language model, which improves the student's initiative and flexibility in learning.
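The scene-to-person-setting correspondence described above can be sketched as a simple lookup table. The scene names and role labels below are illustrative assumptions, not taken from the figures:

```python
# Hypothetical mapping from a target scene to the two conversation roles.
# The active (questioning) role is assigned to the student; the passive
# (answering) role is assigned to the large language model.
SCENE_PERSONAS = {
    "restaurant reservation": ("customer", "restaurant server"),
    "ordering coffee":        ("customer", "barista"),
    "hotel check-in":         ("guest", "hotel receptionist"),
    "talking to taxi driver": ("passenger", "taxi driver"),
}

def assign_personas(target_scene: str) -> dict:
    """Return the student's and the model's person setting for a scene."""
    student_role, model_role = SCENE_PERSONAS[target_scene]
    return {"student": student_role, "large_language_model": model_role}
```

A caller would look up the selected scene once, then attach both roles to every prompt sent during the session.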
In step S104, a level evaluation is performed based on the large language input text. The evaluation mainly includes the following steps: setting at least two evaluation indexes, where different indexes evaluate the large language input text from different aspects; setting a weight for each evaluation index; determining the score of the large language input text on each evaluation index; and determining the text difficulty level from the scores and the weights.
In an alternative embodiment of step S104, the large language input text is the student language text itself. In this case, the text difficulty level obtained by evaluating the large language input text is the level of the student language text and directly indicates the student's spoken level.
In another alternative embodiment of step S104, the large language input text is a text converted from the student language text that is more difficult than it. For example, language elements of the student language text are extracted and replaced with higher-level language elements to obtain the large language input text, where a language element is at least one of a vocabulary item, a phrase, and a grammar structure.
The application software builds a synonym thesaurus in which words of the same meaning are assigned different levels.
As an alternative embodiment, at least one word in the text is replaced with a word of the same meaning but a higher level from the synonym thesaurus. By raising the level of the words in the student's language input, the large language model is led to provide a higher-level language reply.
The application software likewise builds a synonymous phrase library in which phrases of the same meaning are assigned different levels.
As an alternative embodiment, at least one phrase in the text is replaced with a phrase of the same meaning but a higher level from the synonymous phrase library, again leading the large language model to provide a higher-level language reply.
The application software also builds a synonymous grammar library in which grammar structures of the same meaning are assigned different levels.
As an alternative embodiment, at least one grammar structure in the text is replaced with a structure of the same meaning but a higher level from the synonymous grammar library, leading the large language model to provide a higher-level language reply.
The purpose of extracting and replacing language elements is to obtain a text more difficult than the original and send that text to the large language model. Because the large language model feeds back text matching the difficulty of its input, the text obtained from the model is slightly more difficult than the student's input; this effectively guides the student and prevents the model from conversing in overly simple words, grammar, and phrases.
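As a rough illustration of the element-replacement step, the sketch below swaps words for higher-level synonyms from a toy synonym thesaurus. The thesaurus contents, level numbers, and function name are invented for the example:

```python
# Toy synonym thesaurus: words of the same meaning grouped with levels.
SYNONYM_THESAURUS = {
    "good": [("good", 1), ("excellent", 2), ("superb", 3)],
    "big":  [("big", 1), ("enormous", 2), ("colossal", 3)],
}

def raise_level(text: str, max_replacements: int = 1) -> str:
    """Replace up to `max_replacements` words with higher-level synonyms."""
    words = text.split()
    replaced = 0
    for i, word in enumerate(words):
        if replaced >= max_replacements:
            break
        key = word.lower()
        if key in SYNONYM_THESAURUS:
            candidates = SYNONYM_THESAURUS[key]
            current = next(lvl for w, lvl in candidates if w == key)
            higher = [w for w, lvl in candidates if lvl > current]
            if higher:
                words[i] = higher[0]  # take the next level up
                replaced += 1
    return " ".join(words)
```

The same pattern would apply to the phrase and grammar libraries, with phrase- or pattern-level matching instead of single-word lookup.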
In this case, the level evaluation of the large language input text yields a text difficulty level that is not necessarily the level of the student language text, and therefore does not necessarily indicate the student's spoken level directly.
Different evaluation indexes evaluate the text from different aspects; the more indexes are used, the more comprehensive the evaluation. As an alternative embodiment, the evaluation indexes include vocabulary level, grammar level, appropriateness, fluency, number of words, and number of sentences.
The embodiment of the application operates in a human-machine dialogue mode: the student speaks a passage and the application answers it, the student speaks again and the application answers again, and so on. One passage spoken by the student together with the application's answer constitutes one dialogue unit. A "passage" is generally one or several sentences. One spoken-language learning session of the method may include a plurality of dialogue units.
The evaluation of step S104 is based on one dialogue unit.
The vocabulary level is an evaluation index for the difficulty of the words contained in the text, for example whether the words the student uses in spoken language are CET-4 or CET-6 words.
The grammar level is an evaluation index for the complexity of the grammar contained in the text, for example whether the student's spoken language contains more complex grammar such as the subjunctive mood, perfect tenses, or clauses.
The appropriateness index evaluates whether the text is appropriate for the context.
The fluency index evaluates whether the text is fluent.
The word-count index evaluates how many words the text contains.
The sentence-count index evaluates how many sentences the text contains.
Each of the six evaluation indexes (vocabulary level, grammar level, appropriateness, fluency, number of words, and number of sentences) is given a weight, and the six weights sum to 1. For example, the weight of the vocabulary level is W(V), the grammar level W(G), the appropriateness W(A), the fluency W(F), the number of words W(W), and the number of sentences W(S).
The large language input text is scored on each of the six indexes, yielding six scores: Score(V), Score(G), Score(A), Score(F), Score(W), and Score(S), where 1 is the lowest score and 10 the highest.
S=Score(V)×W(V)+Score(G)×W(G)+Score(A)×W(A)+Score(F)×W(F)+Score(W)×W(W)+Score(S)×W(S)
Where S represents the score obtained by evaluating the large language input text.
If the score S of the large language input text lies in [1, 3], the text difficulty level is the first level.
If S lies in (3, 5], the text difficulty level is the second level.
If S lies in (5, 6.5], the text difficulty level is the third level.
If S lies in (6.5, 8], the text difficulty level is the fourth level.
If S lies in (8, 9.5], the text difficulty level is the fifth level.
If S lies in (9.5, 10], the text difficulty level is the sixth level.
Wherein the first level is the lowest level and the sixth level is the highest level.
As an alternative embodiment, W(V) = 25%; W(G) = 20%; W(A) = 15%; W(F) = 15%; W(W) = 15%; W(S) = 10%.
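With these example weights, the weighted scoring and level bucketing can be sketched as follows; the function names are illustrative, and the intervals follow the level table above:

```python
# Example weights from the embodiment: vocabulary, grammar,
# appropriateness, fluency, word count, sentence count.
WEIGHTS = {"V": 0.25, "G": 0.20, "A": 0.15, "F": 0.15, "W": 0.15, "S": 0.10}

def weighted_score(scores: dict) -> float:
    """Combine the six index scores (each 1..10) into one score S."""
    return sum(scores[k] * WEIGHTS[k] for k in WEIGHTS)

def difficulty_level(s: float) -> int:
    """Map score S to a difficulty level 1..6 per the intervals above."""
    bounds = [3, 5, 6.5, 8, 9.5, 10]  # upper bound of each level
    for level, upper in enumerate(bounds, start=1):
        if s <= upper:
            return level
    return 6
```

For instance, the second worked example below (scores 4, 2, 3, 5, 3, 4) yields S = 3.45, which falls in (3, 5] and maps to the second level.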
Assume the large language input text scores 1 on each of vocabulary level, grammar level, appropriateness, fluency, number of words, and number of sentences; then S = 1×25% + 1×20% + 1×15% + 1×15% + 1×15% + 1×10% = 1. The score lies in [1, 3], so the text difficulty level is the first level.
Assume the scores on the six indexes are 4, 2, 3, 5, 3, and 4 respectively; then S = 4×25% + 2×20% + 3×15% + 5×15% + 3×15% + 4×10% = 3.45. The score lies in (3, 5], so the text difficulty level is the second level.
Assume the scores are 5, 7, 6, 4, 4, and 5 respectively; then S = 5×25% + 7×20% + 6×15% + 4×15% + 4×15% + 5×10% = 5.25. The score lies in (5, 6.5], so the text difficulty level is the third level.
Assume the scores are 6, 6, 9, 8, 7, and 7 respectively; then S = 6×25% + 6×20% + 9×15% + 8×15% + 7×15% + 7×10% = 7. The score lies in (6.5, 8], so the text difficulty level is the fourth level.
Assume the scores are 10, 8, 7, 9, 8, and 9 respectively; then S = 10×25% + 8×20% + 7×15% + 9×15% + 8×15% + 9×10% = 8.6. The score lies in (8, 9.5], so the text difficulty level is the fifth level.
Assume the scores are all 10; then S = 10×25% + 10×20% + 10×15% + 10×15% + 10×15% + 10×10% = 10. The score lies in (9.5, 10], so the text difficulty level is the sixth level.
It should be noted that "the large language input text is a text converted from the student language text and more difficult than it" does not necessarily mean that its difficulty level is higher. Since each level covers a score interval, the large language input text may score higher than the student language text while both fall in the same difficulty level. For example, if the student language text scores 8.2 and the large language input text scores 8.8, both are at the fifth level. It is also possible for the large language input text to reach a higher level: if the student language text scores 8 (fourth level), the large language input text may score 8.5 (fifth level).
In step S105, the target scene, the person setting of the student, the person setting of the large language model, the large language input text, and the large language model difficulty level constitute instruction information (which may also be called prompt information) sent to the large language model. After receiving the instruction information, the large language model feeds back a text.
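A minimal sketch of assembling the instruction (prompt) information of step S105 might look as follows. The field names and prompt wording are assumptions; the embodiment only specifies which five pieces of information are sent:

```python
def build_prompt(scene: str, student_persona: str, model_persona: str,
                 input_text: str, difficulty_level: int) -> str:
    """Assemble the five-part instruction information into one prompt.

    Wording and layout are illustrative; only the five fields themselves
    are specified by the method.
    """
    return (
        f"Scene: {scene}\n"
        f"The user plays: {student_persona}\n"
        f"You play: {model_persona}\n"
        f"Reply at difficulty level {difficulty_level} "
        f"to the following utterance:\n{input_text}"
    )
```

The returned string would be sent to the large language model once per cycle, with the difficulty level updated from that cycle's evaluation.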
The large language model difficulty level is the desired difficulty level of the spoken text fed back by the large language model. If the student's spoken level is poor, the feedback should also be low in difficulty; if the student's spoken level is good, the feedback should be correspondingly high. This matches the model's feedback to the student's spoken level and enhances the effect of the student's spoken-language learning.
As an alternative implementation, the large language model difficulty level is the same as the student's spoken level, so that the student can converse with ease.
As another alternative, the large language model difficulty level is one level higher than the student's spoken level. The inventors found that many students have a higher listening level than speaking level; they can understand what they cannot yet say. In other words, a student who has not fully mastered a sentence pattern or a vocabulary item cannot use it proficiently and flexibly, but understands it well when hearing it. This means the feedback of the large language model can be one level above the student's spoken level and still be understood, while further reinforcing the student's comprehension and memorization of sentence patterns and vocabulary not yet fully mastered.
The method of the embodiment comprises a plurality of cycles. Each cycle generates a student language text, the large language input text corresponding to it, and the large language model language text corresponding to that input. The method further comprises: performing a level evaluation on the large language input text generated in each cycle to obtain a text difficulty level, taking that level as the large language model difficulty level of the cycle, and sending it to the large language model.
Steps S103, S104, S105, and S106 constitute one loop, which starts with the student's speech and ends with the application's spoken response. One learning session includes a plurality of loops, for example twenty. A "loop" has the same meaning as a "dialogue unit" above.
Because the evaluation step of the method occurs in every loop (dialogue unit), the application can evaluate the student's level in real time and feed back text and speech matching the student's actual level. For example, a student who has not practiced for a long time may be rusty and perform poorly in the first five loops, then warm up and perform normally from the sixth loop onward. The difficulty-following mechanism of the method lets the application output simpler speech in the first five loops and more complex speech from the sixth loop, matching the level the student currently exhibits so that the student can learn at ease at any time.
The method of the embodiment further comprises: performing a level evaluation on the student language text generated in each cycle; when the level of the student language text in the later cycle equals the level in the earlier cycle, extracting the vocabulary of both texts; judging whether the vocabulary level in the later cycle is higher than in the earlier cycle; and if so, increasing the number of language elements extracted and replaced in the next cycle.
Performing a level evaluation on the student language text determines the student's true spoken level.
In some cases the vocabulary a student uses becomes progressively more advanced, yet this is not visible in the level rating. This is because each level covers a score interval, and a score increase does not necessarily raise the level.
From another perspective, even if the student's level rating is the same in two cycles, this does not mean the student's spoken level has made no progress.
When the student's level ratings in the two cycles are the same, the vocabulary of the two student language texts is extracted. If the vocabulary level of the later cycle is higher than that of the earlier cycle, the student is using vocabulary more and more proficiently and at a higher level, so the number of language elements extracted and replaced is increased in the next cycle. For example, if one vocabulary item was originally extracted and replaced per cycle, two or three items may be replaced in the next cycle; or one vocabulary item and one phrase may be replaced instead of one vocabulary item alone. Because extraction and replacement substitute higher-level vocabulary, phrases, and grammar for the originals, the resulting text is more difficult than the student's original input, and the more substitutions, the greater the difficulty. Replacing more language elements therefore raises the difficulty of the text the large language model feeds back, adapting it to the student's improving level.
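The escalation rule above can be expressed as a small helper. The parameter names are illustrative, and the one-element increment is a simplification of the "two or three items" example:

```python
def next_replacement_count(prev_level: int, curr_level: int,
                           prev_vocab_level: int, curr_vocab_level: int,
                           current_count: int) -> int:
    """Decide how many language elements to replace in the next cycle.

    If the overall level rating is unchanged between two cycles but the
    later cycle's vocabulary level is higher, escalate the replacement
    count by one (e.g. from one replacement to two); otherwise keep it.
    """
    if curr_level == prev_level and curr_vocab_level > prev_vocab_level:
        return current_count + 1
    return current_count
```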
The method of the embodiment further comprises: storing the student language texts and the large language model language texts, and displaying them to the student in chronological order of generation, to facilitate review.
Through the design of scenes and person settings, the method and device of the application let students choose an interesting scene and learn spoken language under a specific person setting. The large language model feeds back based on the specific scene, the specific person settings, the student's spoken content, and the student's spoken level, interacting with the student. With the difficulty-following mechanism, the difficulty of the model's feedback matches, or is slightly higher than, the student's spoken level, achieving targeted learning tailored to each student's situation. Students can practice foreign-language speaking anytime and anywhere, and because the dialogue exercises are tied to specific scenes and person settings, the spoken-language learning is highly targeted and practical.
The embodiment of the application provides application software which can execute the spoken language learning method based on the large language model.
By logging in to the application software, students can select a scene of interest to learn.
As shown in fig. 2-1, 2-2, and 2-3, the available scenes are those listed above.
For example, a student who clicks the "order at restaurant" module of FIG. 2-2 enters the interface shown in FIG. 3-1. The interface displays: "You are in a restaurant you have always wanted to try. After browsing the menu, order from the server. Now tell them what you want for dinner." The interface also displays:
“Helpful words and expressions
I’d like to have the grilled salmon.
I’ll have the steak with mashed potatoes.”
These sentences prompt the student: if the student does not know what to say, the prompted sentences can be said to start the learning process. The student may also ignore the prompts and say directly what they want to express. After clicking "start", the student enters the language learning process and can speak.
For example, the student clicks the "order coffee" module of FIG. 2-2 and enters the interface shown in FIG. 3-2. The interface displays: "You are now in a stylish, modern coffee shop. What would you like: coffee, tea, or something else? Tell the barista what you want." The interface also displays:
“Helpful words and expressions
A hot Americano, please.
Hi, can I get a medium latte, please? ”
These prompt sentences serve the same purpose as described above: the student may say them to start the learning process, or ignore them and speak freely after clicking "start".
For example, a student who clicks the "describe shopping needs" module of FIG. 2-2 enters the interface shown in FIG. 3-3. The interface displays: "You are in a clothing store looking for a new jacket, but your budget is limited. Ask the clerk to help you find a suitable jacket you can afford." The interface also displays:
“Helpful words and expressions
I’m looking for a jacket.
Do you have this in black?”
These prompt sentences serve the same purpose as described above: the student may say them to start the learning process, or ignore them and speak freely after clicking "start".
For example, a student who clicks the "talk to taxi driver" module of FIG. 2-2 enters the interface shown in FIGS. 3-4. The interface displays: "You have just gotten into a New York taxi. Tell the taxi driver where you are going, and ask how long the ride will take." The interface also displays:
“Helpful words and expressions
I’m going to MOMA.
Can you take me to the Central train station?”
These prompt sentences serve the same purpose as described above: the student may say them to start the learning process, or ignore them and speak freely after clicking "start".
For example, a student who clicks the "hotel check-in" module of FIG. 2-2 enters the interface shown in FIGS. 3-5. The interface displays: "You have arrived at the hotel you reserved. Check in with your passport ready, and ask about Wi-Fi, breakfast, the gym, or laundry services." The interface also displays:
“Helpful words and expressions
We have a reservation under Linda Holmes.
What time is the breakfast served?”
These prompt sentences serve the same purpose as described above: the student may say them to start the learning process, or ignore them and speak freely after clicking "start".
As the interfaces shown in FIGS. 3-1 to 3-5 illustrate, both the student's person setting and the large language model's person setting are associated with the scene, and the two person settings correspond to each other. In commercial or consumer scenes, the student's person setting is the customer, while the large language model's person setting is the person providing the service. In chat scenes, the student's person setting and the model's person setting are in a friend relationship, colleague relationship, or the like. Through this design of scenes and person settings, the student enters a specific scene and practices dialogue with the large language model under a specific person setting, becoming familiar with the common words, phrases, and sentences of that scene, which is highly targeted. Students can improve their spoken level in a short time, perceive that improvement, and gain confidence and interest in spoken-language learning. Moreover, dialogue training with the large language model under a specific person setting in a specific scene is extremely practical.
Compared with the approaches mentioned in the background art, such as listening to foreign-language songs, watching foreign-language movies, and imitating a character's dialogue, the method of the embodiment is highly interactive and practical: spoken language is practiced in concrete situations from daily life, which greatly improves the efficiency of spoken-language learning.
When the student clicks "start" in one of the interfaces shown in FIGS. 3-1 to 3-5 and speaks, the application software converts the speech into text, scores the text from different aspects (vocabulary level, grammar level, appropriateness, fluency, number of words, and number of sentences), and weights the scores to judge the student's spoken level. The feedback difficulty of the large language model is determined according to that level: if the student's spoken level is high, the feedback difficulty is high; if it is low, the feedback difficulty is low, so that the model's feedback adaptively matches the student's spoken level.
The large language model is then called: the specific scene selected by the student, the two person settings, the text generated from the student's speech (optionally after language element extraction and replacement), and the designated difficulty level (the large language model difficulty level) are sent to the model, which feeds back a text accordingly. The application software receives the fed-back text, converts it into speech, and outputs it to the student; at this point one cycle ends. In the next cycle the student speaks again, the application converts the speech into text, scores and evaluates it, determines the student's level from the evaluation, designates the feedback difficulty accordingly, sends the scene, the person settings, the text, and the designated difficulty level to the large language model, receives the fed-back text, converts it into speech, and outputs it to the student, completing a further cycle.
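One full cycle as just described can be sketched with the speech-to-text, conversion, evaluation, large-language-model, and text-to-speech steps passed in as stubs. The embodiment names no concrete services, so every callable and name here is an assumption:

```python
def run_cycle(audio, scene, student_persona, model_persona,
              speech_to_text, convert_text, evaluate_level,
              call_llm, text_to_speech):
    """One dialogue unit: student audio in, application speech out."""
    student_text = speech_to_text(audio)       # student speaks
    converted = convert_text(student_text)     # optional element replacement
    level = evaluate_level(converted)          # per-cycle level evaluation
    reply_text = call_llm(scene, student_persona,
                          model_persona, converted, level)
    return text_to_speech(reply_text)          # spoken reply to the student
```

In a session, this function would be invoked once per loop, so the evaluated level (and hence the feedback difficulty) is refreshed every turn.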
Spoken practice for a specific scene typically includes tens of cycles, for example twenty: the student speaks twenty times, the large language model feeds back twenty times, and the student's spoken level is evaluated twenty times (once per cycle). The benefit is that the difficulty of the model's feedback is dynamically adjusted over time according to the student's spoken level. The inventors found that many students cannot exhibit their true level at the beginning of a conversation and perform below it: although practice has started, they have not fully entered the right state, and only reach their true level as the dialogue progresses. If the student's spoken level were fixed from the text at the start of the dialogue, the evaluation would tend to be low relative to the real situation, the feedback difficulty of the large language model would also be low, and the student's spoken-language learning would suffer. The method instead dynamically updates the evaluation of the student's spoken level, yielding an evaluation closer to the student's actual situation.
As an alternative implementation, six difficulty levels are set, from low to high, A1, A2, B1, B2, C1, C2 respectively.
The lexical level is divided into three levels: a1, A2, B1.
Phrase level is divided into three levels: a1, A2, B1.
Grammar level is divided into three levels: a1, A2, B1.
Based on the six difficulty levels, six modes are set for the large language model: an A1 mode, an A2 mode, a B1 mode, a B2 mode, a C1 mode, and a C2 mode. The difficulty levels of the output texts corresponding to the six modes are A1, A2, B1, B2, C1, and C2, respectively; that is, in the A1 mode the difficulty level of the text fed back by the large language model is A1, in the A2 mode it is A2, and so on up to the C2 mode, in which it is C2.
In each cycle, the student's spoken-language level in that cycle is detected. The method of evaluating the level based on the six evaluation indexes of vocabulary, grammar, delicacy, fluency, number of words, and number of sentences is described above, but it is not the only way. In some embodiments, the evaluation may also be performed in two steps. First, the vocabulary level, the grammar level, and the number of words in a sentence are detected: if the vocabulary used by the student is very simple (A1), the grammar is very simple (A1), and the number of words in a sentence is also very small (for example, fewer than 10), the conclusion that the student's spoken language is very poor can be drawn, and no further evaluation of delicacy, fluency, and the like is required. If the vocabulary used by the student belongs to the medium difficulty level (A2), the grammar also belongs to the medium difficulty level (A2), and the number of words in a sentence is moderate (for example, 10 to 30), the evaluation then proceeds along more dimensions.
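The two-step evaluation can be sketched as below. The threshold values and function names are assumptions for illustration; only the structure (a cheap screen that short-circuits the full multi-index evaluation for clearly-beginner input) follows the description:

```python
def quick_screen(vocab_level, grammar_level, words_per_sentence):
    """Cheap first pass: conclusive only for clearly-beginner input."""
    if vocab_level == "A1" and grammar_level == "A1" and words_per_sentence < 10:
        return "A1"      # clearly beginner: skip delicacy, fluency, etc.
    return None          # inconclusive: fall through to the full evaluation

def screen_then_evaluate(vocab_level, grammar_level, words_per_sentence, full_eval):
    """Run the cheap screen first; call the expensive multi-index
    evaluation (full_eval) only when the screen is inconclusive."""
    result = quick_screen(vocab_level, grammar_level, words_per_sentence)
    return result if result is not None else full_eval()
```

This avoids spending the full multi-dimensional evaluation on input that a few cheap checks already classify.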
If the student's spoken language level assessment in a certain cycle is A1, then the large language model is instructed to feed back in A2 mode.
If the student's spoken language level evaluation result in a certain cycle is A2, the large language model is instructed to feed back in B1 mode.
If the student's spoken language level assessment in a certain cycle is B1, then the large language model is instructed to feed back in B2 mode.
If the student's spoken language level assessment in a certain cycle is B2, then the large language model is instructed to feed back in C1 mode.
If the student's spoken language level assessment in a certain cycle is C1, then the large language model is instructed to feed back in C2 mode.
If the student's spoken language level assessment in a certain cycle is C2, then the large language model is instructed to feed back in C2 mode.
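The six rules above amount to a lookup table: the model is instructed to feed back one level above the student's assessed level, saturating at C2. A direct transcription:

```python
# Assessed student level -> mode the large language model is instructed
# to use for its feedback (one level higher, capped at C2).
NEXT_MODE = {
    "A1": "A2",
    "A2": "B1",
    "B1": "B2",
    "B2": "C1",
    "C1": "C2",
    "C2": "C2",  # already the highest level: stay in C2 mode
}
```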
In the method provided by the embodiments of the present application, the student's spoken-language level evaluation may jump between two adjacent cycles. For example, if the evaluation result in the first cycle is A2 and in the second cycle is B2, the B1 level is skipped; correspondingly, the large language model feeds back in the B1 mode in the first cycle and in the C1 mode in the second cycle, skipping the B2 mode. The benefit of allowing the evaluation to jump is that, as described above, many students cannot fully exhibit their spoken-language level at the initial stage of the dialogue and perform below their real level, gradually reaching it as the dialogue advances. Dynamically adjusting the evaluation in time brings it as close as possible to the student's real spoken-language level and reduces misjudgment; only on the basis of an accurate evaluation of the student's true level can the large language model be instructed to output text of the expected difficulty. For example, suppose student A's true spoken-language level is B2, but, out of practice, the student starts the exercise somewhat rusty, so the first cycle yields the lower rating A2 and the large language model feeds back in the B1 mode. In the second cycle the rating is still A2, and the model still feeds back in the B1 mode. In the third cycle the student gradually shows the true level, the rating rises to B2, and the model feeds back in the C1 mode. As can be seen, the method provided by the present application comprises a plurality of cycles in a single spoken-language exercise, each cycle including a student dialogue part and a large language model dialogue part.
The large language model dialogue part is the feedback to the student dialogue part. The student dialogue part is given a level evaluation in each cycle. A difficulty-following mechanism is adopted, so that the difficulty level of the large language model dialogue part in each cycle changes with the level of the student dialogue part.
The method provided by the application dynamically adjusts the evaluation of the spoken language level of the student so as to realize the accurate evaluation of the spoken language level of the student; the difficulty of the text fed back by the large language model is determined by the evaluation of the spoken language level of the student obtained in each cycle, so that the difficulty of the text fed back by the large language model is equal to or slightly higher than the spoken language level of the student.
Fig. 4 is a flowchart of a method provided in an embodiment of the present application. After the application software is installed on the mobile terminal, the students can learn the spoken language. The method comprises the following steps:
step S201: the student selects a scene.
The application software provides a scene selection interface carrying a plurality of scenes, such as those shown in figs. 2-1, 2-2, and 2-3: introducing oneself, talking about hobbies, talking about pets, an old friends' reunion, new life at a workplace, ordering food in a restaurant, ordering coffee in a fast-food restaurant, discussing a shopping list, and the like.
After the student clicks the module of any scene, that scene is selected, and the spoken-language exercise interface corresponding to it can be entered.
Step S202: the application software determines the settings of the person according to the scene.
The human settings include those of students and those of large language models.
The inventors found that, in many cases, a scene corresponds to person settings: a specific scene corresponds to specific person settings, and a correspondence also exists between the two person settings of a dialogue.
For example, if the scene is ordering coffee, the dialogue typically takes place between a customer and a coffee-shop clerk; the two person settings are therefore a customer and a coffee-shop clerk, the student's person setting may be the customer, and the large language model's person setting may be the coffee-shop clerk.
Likewise, if the scene is describing a shopping need, the dialogue typically takes place between a customer and a store clerk; the two person settings are therefore a customer and a store clerk, the student's person setting may be the customer, and the large language model's person setting may be the store clerk.
The inventors have found that the two person settings of a dialogue can be determined from a specific scene (the target scene) and assigned to the student and the large language model, respectively. As an alternative implementation, the actively questioning person setting is assigned to the student and the passively answering person setting is assigned to the large language model, which improves the student's initiative and flexibility in learning.
Step S203: the student inputs speech.
Step S204: the speech is converted to text.
Step S205: extracting elements in the text and replacing the elements to obtain a higher-level text.
The elements in the text include at least one of: vocabulary, phrases, grammar.
The application software builds a synonym thesaurus in which words of the same meaning are assigned different levels.
At least one word in the text is replaced with a word of the same meaning but a higher level from the synonym thesaurus. By raising the level of the words in the student's language input, the large language model is prompted to provide a higher-level language reply.
The application software builds a synonymous-phrase library in which phrases of the same meaning are assigned different levels.
At least one phrase in the text is replaced with a phrase of the same meaning but a higher level from the synonymous-phrase library. By raising the level of the phrases in the student's language input, the large language model is prompted to provide a higher-level language reply.
The application software builds a synonymous-grammar library in which grammars of the same meaning are assigned different levels.
At least one grammar in the text is replaced with a grammar of the same meaning but a higher level from the synonymous-grammar library. By raising the level of the grammar in the student's language input, the large language model is prompted to provide a higher-level language reply.
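The word-replacement part of step S205 can be sketched as follows. The thesaurus entries here are invented sample data, not the patent's actual libraries, and a real implementation would also handle phrases and grammar patterns, not just single words:

```python
# Toy synonym thesaurus: lower-level word -> higher-level word of the
# same meaning (sample entries only).
SYNONYMS = {
    "want": "would like",
    "buy": "purchase",
    "good": "great",
}

def upgrade_text(text):
    """Replace words that appear in the thesaurus with their
    higher-level synonyms; leave all other words unchanged."""
    return " ".join(SYNONYMS.get(w.lower(), w) for w in text.split())

upgrade_text("I want to buy a good coffee")
# -> "I would like to purchase a great coffee"
```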
Step S206: the higher level text is rated.
The evaluation mainly comprises: setting at least two evaluation indexes, the evaluation indexes comprising at least two of vocabulary level, grammar level, delicacy, fluency, number of words, and number of sentences, where different evaluation indexes evaluate the text from different aspects.
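A weighted multi-index evaluation of this kind can be sketched as below. The index names follow the description, but the weights, the [0, 1] score range, and the mapping onto the six levels are assumptions for illustration:

```python
# Assumed weights for four of the evaluation indexes (must sum to 1).
WEIGHTS = {"vocabulary": 0.3, "grammar": 0.3, "fluency": 0.2, "length": 0.2}

def overall_score(scores):
    """Weighted sum of per-index scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def to_level(score):
    """Map an overall [0, 1] score onto the six difficulty levels."""
    levels = ["A1", "A2", "B1", "B2", "C1", "C2"]
    return levels[min(int(score * 6), 5)]
```

For example, a student scoring 0.5 on every index would land in the middle of the scale (B2 under this mapping).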
Step S207: and calling the large language model, and sending the scene, the person setting, the higher-level text and the level evaluation to the large language model.
Step S208: text fed back by the large language model is received.
Step S209: and converting the text fed back by the large language model into voice.
Steps S203 to S209 form a cycle that starts with the student's voice input and ends with the application software feeding back voice to the student, completing one dialogue turn. Such a cycle may be performed several times until the exercise ends. The condition for ending may be the exhaustion of a preset number of cycles, or the student choosing to end the session.
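The loop and its two end conditions can be sketched as follows (a minimal sketch with assumed names; the real loop would also perform the evaluation and feedback work of steps S204 to S209 inside each iteration):

```python
def run_exercise(turns, max_cycles=20):
    """Drive the S203-S209 loop. `turns` yields (student_text, wants_to_quit)
    pairs; the loop ends when the preset cycle budget is exhausted or the
    student chooses to end the session. Returns completed cycle count."""
    done = 0
    for student_text, wants_to_quit in turns:
        if done >= max_cycles:
            break                 # preset number of cycles exhausted
        done += 1                 # one full S203-S209 cycle completed
        if wants_to_quit:
            break                 # student chose to end the session
    return done
```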
The inventors have found that many students' listening level is higher than their speaking level; that is, a student can understand language that the student cannot yet produce. In other words, if a student has not fully mastered a sentence pattern or word, the student cannot apply it proficiently and flexibly, but can still understand it on hearing it. This means the feedback of the large language model can be one level higher in difficulty than the student's spoken-language level and the student can still understand it. By extracting and replacing elements in the text, a higher-level text is obtained whose words, phrases, and sentence patterns are more advanced, and whose level evaluation may be higher than that of the original text. Therefore, when the scene, the person settings, the higher-level text, and the level evaluation of the higher-level text are sent to the large language model, the model can feed back text at a level higher than the student's; when the application software converts the fed-back text into voice, the student hears a dialogue at a level higher than the student's own spoken language, which greatly benefits the understanding and memorization of the sentence patterns and words the student has not fully mastered.
As shown in fig. 5, the embodiment of the present invention further provides a spoken language learning device based on a large language model, where the device includes: the first determining unit 10, the second determining unit 20, the acquiring unit 30, the evaluating unit 40, the calling unit 50, and the receiving unit 60.
A first determining unit 10 for determining a target scene of student spoken learning from a plurality of spoken learning scenes.
A second determining unit 20 for determining a person setting of the student and a person setting of the large language model based on the target scene.
And an acquisition unit 30 for acquiring a voice of a student in a target scene based on a person setting of the student for spoken input and converting the voice into a student language text.
And an evaluation unit 40, configured to perform a level evaluation based on a large language input text, where the large language input text is the student language text, or is a text that is converted based on the student language text and has a higher difficulty than the student language text, to obtain a text difficulty level.
And a calling unit 50, configured to call a large language model and send the target scene, the student's person setting, the large language model's person setting, the large language input text, and a large language model difficulty level to the large language model, so that the large language model feeds back the large language model language text based on the target scene, the student's person setting, the large language model's person setting, the large language input text, and the large language model difficulty level, where the large language model difficulty level is the same as the text difficulty level, or the large language model difficulty level is higher than the text difficulty level.
And a receiving unit 60 for receiving the language text of the large language model fed back by the large language model, and converting the language text of the large language model into voice to output to students.
By setting up scenes and person settings, the student can select a scene of interest and practise spoken language under a specific person setting. The large language model feeds back based on the specific scene, the specific person settings, the student's spoken content, and the student's spoken-language level, and learns interactively with the student. Moreover, a difficulty-following mechanism is adopted, so that the difficulty of the large language model's feedback matches or is slightly higher than the student's spoken-language level. This achieves targeted learning adapted to the different situations of different students: the student can practise foreign-language speaking anytime and anywhere, and the dialogue exercises built on specific scenes and person settings make the spoken-language practice highly targeted and practical.
Alternatively, the person settings of the student have a correspondence with the person settings of the large language model.
Alternatively, the evaluation unit 40 includes: a first setting subunit, configured to set at least two evaluation indexes, where different evaluation indexes are used to evaluate the large language input text from different aspects; a second setting subunit configured to set a weight of each of the evaluation indexes; a first determination subunit configured to determine a score of the large language input text on each of the evaluation indexes; and the second determining subunit is used for determining the text difficulty level according to the score of the large language input text on each evaluation index and the weight of each evaluation index.
Optionally, the evaluation index includes at least two of vocabulary level, grammar level, delicacy, fluency, number of words, number of sentences.
Optionally, the large language input text is a text that is converted based on the student language text and has a higher difficulty than the student language text, the large language model difficulty level is the same as the text difficulty level, and the evaluation unit 40 includes: the extraction subunit is used for extracting language elements of the student language text and replacing the language elements with more advanced language elements to obtain the large language input text, wherein the language elements comprise at least one of vocabulary, phrase and grammar; the evaluation subunit is used for performing level evaluation based on the large language input text to obtain the text difficulty level; and the third determining subunit is used for taking the text difficulty level as the large language model difficulty level.
Optionally, the apparatus includes a plurality of loops; each loop generates a student language text, a large language input text corresponding to the student language text in that loop, and a large language model language text corresponding to the large language input text in that loop. The evaluation unit 40 performs a level evaluation based on the large language input text generated in each loop to obtain a text difficulty level, takes the text difficulty level of each loop as the large language model difficulty level of that loop, and sends it to the large language model.
Optionally, the apparatus further comprises: an extraction unit, configured to, when the level obtained by the evaluation unit 40 for the student language text generated in the last cycle is equal to the level obtained for the student language text generated in the previous cycle, extract the vocabulary in the student language texts of the two cycles; judge whether the level of the vocabulary in the last cycle's student language text is higher than that of the vocabulary in the previous cycle's student language text; and, if so, increase the number of language elements extracted and replaced in the next cycle.
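This rule can be sketched as follows; `vocab_rank` and the (overall level, vocabulary level) pair representation are assumptions made for illustration:

```python
def vocab_rank(level):
    """Numeric rank of a lexical level (lexical levels are A1/A2/B1)."""
    return ["A1", "A2", "B1"].index(level)

def adjust_replacements(prev, last, n_replace):
    """prev/last: (overall_level, vocab_level) of two consecutive cycles.
    If the overall level stalled but the vocabulary level rose, replace
    one more language element in the next cycle; otherwise keep n_replace."""
    if prev[0] == last[0] and vocab_rank(last[1]) > vocab_rank(prev[1]):
        return n_replace + 1
    return n_replace
```

The idea is that a rising vocabulary level with a flat overall score signals headroom, so the input text is pushed slightly higher.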
Optionally, the apparatus further comprises: the storage unit is used for storing the language texts of the students and the language texts of the large language model, and the language texts are arranged according to the time sequence of the generated texts and are displayed to the students.
Fig. 6 is a block diagram of an electronic device, according to an example embodiment.
An electronic device 700 according to such an embodiment of the present disclosure is described below with reference to fig. 6. The electronic device 700 shown in fig. 6 is merely an example and should not be construed to limit the functionality and scope of use of embodiments of the present disclosure in any way.
As shown in fig. 6, the electronic device 700 is embodied in the form of a general purpose computing device. Components of electronic device 700 may include, but are not limited to: at least one processing unit 710, at least one memory unit 720, a bus 730 connecting the different system components (including the memory unit 720 and the processing unit 710), a display unit 740, and the like.
Wherein the storage unit stores program code that is executable by the processing unit 710 such that the processing unit 710 performs steps described in the present specification according to various exemplary embodiments of the present disclosure.
The memory unit 720 may include readable media in the form of volatile memory units, such as Random Access Memory (RAM) 7201 and/or cache memory 7202, and may further include Read Only Memory (ROM) 7203.
The storage unit 720 may also include a program/utility 7204 having a set (at least one) of program modules 7205, such program modules 7205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 730 may be a bus representing one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 700 may also communicate with one or more external devices 700' (e.g., keyboard, pointing device, bluetooth device, etc.), devices that enable a user to interact with the electronic device 700, and/or any devices (e.g., routers, modems, etc.) with which the electronic device 700 can communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 750. Also, electronic device 700 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 760. Network adapter 760 may communicate with other modules of electronic device 700 via bus 730. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 700, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, as shown in fig. 7, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, or a network device, etc.) to perform the above-described method according to the embodiments of the present disclosure.
The software product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
The computer-readable medium carries one or more programs, which when executed by one of the devices, cause the computer-readable medium to perform the functions of: determining a target scene of student spoken language learning from a plurality of spoken language learning scenes; determining the personal settings of the students and the personal settings of the large language model based on the target scene; acquiring the voice of a student in a target scene based on the person setting of the student for spoken input, and converting the voice into a student language text; performing level evaluation based on a large language input text to obtain a text difficulty level, wherein the large language input text is the student language text or is a text which is obtained based on the conversion of the student language text and has higher difficulty than the student language text; a large language model is called, the target scene, the person setting of the student, the person setting of the large language model, the large language input text and a large language model difficulty level are sent to the large language model, so that the large language model feeds back the large language model language text based on the target scene, the person setting of the student, the large language input text and the large language model difficulty level, and the large language model difficulty level is the same as the text difficulty level or higher than the text difficulty level; and receiving the large language model language text fed back by the large language model, converting the large language model language text into voice and outputting the voice to students.
Those skilled in the art will appreciate that the modules may be distributed throughout several devices as described in the embodiments, and that corresponding variations may be implemented in one or more devices that are unique to the embodiments. The modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in combination with the necessary hardware. Thus, the technical solutions according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and include several instructions to cause a computing device (may be a personal computer, a server, a mobile terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
Exemplary embodiments of the present disclosure are specifically illustrated and described above. It is to be understood that this disclosure is not limited to the particular arrangements, instrumentalities and methods of implementation described herein; on the contrary, the disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims (11)
1. A spoken language learning method based on a large language model, the method comprising:
determining a target scene of student spoken language learning from a plurality of spoken language learning scenes;
determining the personal settings of the students and the personal settings of the large language model based on the target scene;
acquiring the voice of a student in a target scene based on the person setting of the student for spoken input, and converting the voice into a student language text;
performing level evaluation based on a large language input text to obtain a text difficulty level, wherein the large language input text is the student language text or is a text which is obtained based on the conversion of the student language text and has higher difficulty than the student language text;
a large language model is called, the target scene, the person setting of the student, the person setting of the large language model, the large language input text and a large language model difficulty level are sent to the large language model, so that the large language model feeds back the large language model language text based on the target scene, the person setting of the student, the large language input text and the large language model difficulty level, and the large language model difficulty level is the same as the text difficulty level or higher than the text difficulty level;
And receiving the large language model language text fed back by the large language model, converting the large language model language text into voice and outputting the voice to students.
2. The method of claim 1, wherein the student's personal settings have a correspondence with the personal settings of the large language model.
3. The method of claim 1, wherein performing a level rating based on the large language input text to obtain a text difficulty rating comprises:
setting at least two evaluation indexes, wherein different evaluation indexes are used for evaluating the large language input text from different aspects;
setting the weight of each evaluation index;
determining a score of the large language input text on each of the evaluation indicators;
and determining the text difficulty level according to the score of the large language input text on each evaluation index and the weight of each evaluation index.
4. The method of claim 3, wherein the evaluation indexes include at least two of vocabulary level, grammar level, delicacy, fluency, number of words, and number of sentences.
5. The method of claim 1, wherein the large language input text is a text converted based on the student language text and having a higher difficulty than the student language text, wherein the large language model difficulty level is the same as the text difficulty level, and wherein performing a level evaluation based on the large language input text to obtain a text difficulty level comprises:
Extracting language elements of the student language text, and replacing the language elements with higher-level language elements to obtain the large language input text, wherein the language elements comprise at least one of vocabulary, phrase and grammar;
performing level evaluation based on the large language input text to obtain the text difficulty level;
and taking the text difficulty level as the large language model difficulty level.
6. The method of claim 5, wherein the method includes a plurality of loops, each loop producing a student language text, a large language input text corresponding to the student language text in the loop, and a large language model language text corresponding to the large language input text in the loop, the method further comprising: performing a level evaluation based on the large language input text generated in each loop to obtain a text difficulty level, taking the text difficulty level in each loop as the large language model difficulty level in that loop, and sending it to the large language model.
7. The method of claim 6, wherein the method further comprises:
performing level evaluation on the student language text generated in each cycle;
when the level obtained by evaluating the student language text generated in the current cycle is equal to the level obtained by evaluating the student language text generated in the previous cycle, extracting the vocabulary in the student language texts of the two cycles;
determining whether the level of the vocabulary in the student language text of the current cycle is higher than that of the vocabulary in the student language text of the previous cycle;
if so, increasing the number of language elements extracted and replaced in the next cycle.
8. The method according to any one of claims 1 to 7, further comprising: storing the student language texts and the large language model language texts, and arranging and displaying them in the chronological order in which the texts were generated.
9. A spoken language learning device based on a large language model, the device comprising:
a first determination unit configured to determine a target scene of student spoken learning from a plurality of spoken learning scenes;
a second determining unit configured to determine a person setting of a student and a person setting of a large language model based on the target scene;
an acquisition unit configured to acquire the speech spoken by the student based on the person setting of the student in the target scene, and to convert the speech into a student language text;
an evaluation unit configured to perform level evaluation based on a large language input text to obtain a text difficulty level, wherein the large language input text is the student language text, or is a text converted from the student language text and having a higher difficulty than the student language text;
a calling unit configured to call a large language model and to send the target scene, the person setting of the student, the person setting of the large language model, the large language input text, and the large language model difficulty level to the large language model, so that the large language model feeds back a large language model language text based on the target scene, the person setting of the student, the person setting of the large language model, the large language input text, and the large language model difficulty level, wherein the large language model difficulty level is the same as the text difficulty level, or the large language model difficulty level is higher than the text difficulty level;
a receiving unit configured to receive the large language model language text fed back by the large language model, to convert the large language model language text into speech, and to output the speech to the student.
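The four units of the claim-9 device form a single conversational turn, which can be sketched end to end as below. The class name, the injected backends (`stt`, `llm`, `tts`), and the stand-in evaluation rule are all hypothetical; real speech recognition, synthesis, and model backends would be plugged in behind the same interfaces.

```python
class SpokenLearningDevice:
    """Illustrative composition of the acquisition, evaluation,
    calling, and receiving units from claim 9."""

    def __init__(self, llm, stt, tts):
        # injected backends with assumed callable interfaces
        self.llm, self.stt, self.tts = llm, stt, tts

    def run_turn(self, scene, student_persona, model_persona, audio):
        student_text = self.stt(audio)                  # acquisition unit
        level = self.evaluate(student_text)             # evaluation unit
        reply_text = self.llm(scene, student_persona,   # calling unit
                              model_persona, student_text, level)
        return self.tts(reply_text)                     # receiving unit

    def evaluate(self, text):
        # crude stand-in for the level evaluation: longer texts rank higher
        return 1 if len(text.split()) < 8 else 2
```

With identity stubs for speech-to-text and text-to-speech, a turn simply routes the recognized text and its evaluated level through the model call and returns the model's reply for synthesis.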
10. An electronic device, comprising:
one or more processors;
a storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
11. A computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310585313.7A CN116564143A (en) | 2023-05-23 | 2023-05-23 | Spoken language learning method and device based on large language model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116564143A true CN116564143A (en) | 2023-08-08 |
Family
ID=87503278
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310585313.7A Pending CN116564143A (en) | 2023-05-23 | 2023-05-23 | Spoken language learning method and device based on large language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116564143A (en) |
2023-05-23: CN application CN202310585313.7A filed (patent/CN116564143A/en), status: Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |