CN113539245B - Language model automatic training method and system - Google Patents


Info

Publication number
CN113539245B
CN113539245B (application CN202110757208.8A)
Authority
CN
China
Prior art keywords
intention
language model
language
corpus
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110757208.8A
Other languages
Chinese (zh)
Other versions
CN113539245A (en)
Inventor
史彤 (Shi Tong)
董鑫 (Dong Xin)
初敏 (Chu Min)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sipic Technology Co Ltd filed Critical Sipic Technology Co Ltd
Priority to CN202110757208.8A
Publication of CN113539245A
Application granted
Publication of CN113539245B
Legal status: Active

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/06 — Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 — Training
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/08 — Speech classification or search
    • G10L15/18 — Speech classification or search using natural language modelling

Abstract

The embodiment of the invention provides an automatic training method for a language model. The method comprises the following steps: sending an expected intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list; inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system. The embodiment of the invention also provides a language model automatic training system applied to a robot customization system. The embodiment of the invention uses the corpus generated by semantic generalization in natural language generation as training data for the speech recognition language model; after generalization, this corpus represents, to a certain extent, what the speaker is likely to say. A large number of utterances representative of the speaker can thus be obtained automatically, with high accuracy, so that the voice recognition robot recognizes replies more accurately.

Description

Language model automatic training method and system
Technical Field
The invention relates to the field of intelligent voice, in particular to an automatic training method and system for a language model.
Background
An intelligent speech dialogue system is typically composed of five parts: ASR (Automatic Speech Recognition), NLU (Natural Language Understanding), DM (Dialogue Management), NLG (Natural Language Generation), and TTS (Text To Speech); sometimes an FAQ (Frequently Asked Questions) module is also present. Speech recognition is the first module of an intelligent speech dialogue system, and its accuracy directly influences the task success rate of the whole dialogue system. Currently, language models are classified by application scope as follows:
One-way language model: a general-purpose speech recognition language model, suitable for broad dialogue systems, such as chitchat.
Two-way language model: a speech recognition language model oriented to a particular industry scene, suitable for the dialogue system of a certain vertical industry, such as the financial or express-delivery industry.
Three-way language model: a speech recognition language model oriented to particular dialogue nodes, suitable for dialogue nodes with specific replies, such as expressing confirmation or stating a license plate number.
For a robot in a certain scene, a two-way language model can be configured to enhance ASR recognition; for certain nodes within the robot, three-way language models can be configured to enhance ASR recognition.
Adding corpus of likely replies for a specific scene or node offline, training a model on it, and associating the model with that scene or node can greatly improve the speech recognition accuracy of the voice robot. Model training often begins with corpus collection, followed by manual training and binding to the corresponding scene or node.
In the process of implementing the present invention, the inventor finds that at least the following problems exist in the related art:
the traditional method of building and applying two-way and three-way language models often requires manually collating a large amount of corpus, manually training, and manually associating the models with the corresponding scenes or nodes. Manual operation is error-prone, the manually collated corpus is often incomplete, and repeated training is needed.
Manual corpus collation and the repeated manual work it entails represent a large workload, and corpus collation is rarely completed in one pass: typically, when test verification or the online environment uncovers misrecognized cases, those cases are repeated several times as corpus to retrain the model. When new misrecognition cases appear, corpus is added and the model retrained again, and the manual association is repeated. This process only patches errors after they have occurred, and the strengthened model is not applied during the robot customization process.
Model training and dialogue customization often require manually operating several systems at the same time; the tuning process described above is not standardized, which to some extent increases production effort. For robot customization, the workload is hard to estimate, the work is repetitive, recognition improves only through continuous manual training, efficiency is low, labor cost is high, and errors are easy to make.
Disclosure of Invention
In order to at least solve the problems in the prior art that training a model requires a large amount of corpus and is inefficient, the following aspects are provided.
In a first aspect, an embodiment of the present invention provides a method for automatically training a language model, which is applied to a robot customization system, including:
sending an expected intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list;
inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system.
In a second aspect, an embodiment of the present invention provides a method for automatically configuring a language model, which is applied to a robot customization system, including:
sending an expected intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list;
inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system;
and automatically associating the first language model and/or the second language model with the corresponding scene and/or dialogue node, so as to realize automatic configuration of the language model.
In a third aspect, an embodiment of the present invention provides a language model automatic training system applied to a robot customization system, including:
a self-generated corpus determining program module, configured to send an expected intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and to receive the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list;
a self-training program module, configured to input the self-generated corpus into a language model training system and to automatically publish the first language model and/or the second language model trained by the language model training system.
In a fourth aspect, an embodiment of the present invention provides a language model automatic configuration system applied to a robot customization system, including:
a self-generated corpus determining program module, configured to send an expected intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and to receive the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list;
a self-training program module, configured to input the self-generated corpus into a language model training system and to automatically publish the first language model and/or the second language model trained by the language model training system;
and a self-association program module, configured to automatically associate the first language model and/or the second language model with the corresponding scene and/or dialogue node, so as to realize automatic configuration of the language model.
In a fifth aspect, there is provided an electronic device, comprising: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the language model automatic training method and the automatic configuration method of any one of the embodiments of the present invention.
In a sixth aspect, an embodiment of the present invention provides a storage medium having a computer program stored thereon, wherein the program when executed by a processor implements the steps of the language model automatic training method and the automatic configuration method of any one of the embodiments of the present invention.
The embodiments of the invention have the following beneficial effects: the corpus generated by semantic generalization in natural language generation is used as training data for the speech recognition language model. Because the expected intent list of the intelligent voice robot represents the intentions likely to be expressed by a person talking with the robot, the generalized corpus represents, to a certain extent, what the speaker is likely to say. A large number of utterances representative of the speaker can therefore be obtained automatically, saving labor cost while maintaining high accuracy, so that the voice robot recognizes scene- or node-specific replies more accurately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention or of the prior art, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. It is obvious that the drawings described below show some embodiments of the present invention, and that a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for automatic training of language models according to an embodiment of the present invention;
FIG. 2 is a block diagram of an automatic language model training flow of an automatic language model training method according to an embodiment of the present invention;
FIG. 3 is a flow chart of a method for automatically configuring language models according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a dialogue node speaking edit click "three-way model training" of a language model automatic configuration method according to an embodiment of the present invention;
FIG. 5 is a schematic drawing of automatic speech recognition three-way model automatic training corpus of a language model automatic configuration method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of inputting supplementary corpus into a supplementary corpus frame of an automatic configuration method of language model according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a training state of a submission training review of a method for automatically configuring a language model according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an automatic training completion and association three-way model resource for a language model automatic configuration method according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a scenario set click "model training" pull corpus of a method for automatically configuring language models according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a two-way model training popup for automatic speech recognition for a method for automatically configuring language models according to an embodiment of the present invention, in which "select corpus" is clicked;
FIG. 11 is a schematic diagram of a new pulling task for clicking "generate corpus" in a corpus list of an automatic configuration method for language models according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of refreshing a list during execution of a corpus pulling task in a language model auto-configuration method according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of inputting supplementary corpus into a supplementary corpus frame of an automatic configuration method of language model according to an embodiment of the present invention;
FIG. 14 is a schematic diagram showing the submitting training and viewing training states of a language model automatic configuration method according to an embodiment of the present invention;
FIG. 15 is a schematic diagram of an automatic training completion and association two-way model resource of a language model automatic configuration method according to an embodiment of the present invention;
FIG. 16 is a schematic diagram of a language model automatic training system applied to a robot customization system according to an embodiment of the present invention;
fig. 17 is a schematic structural diagram of a language model automatic configuration system applied to a robot customization system according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 is a flowchart of a method for automatically training a language model according to an embodiment of the present invention, including the following steps:
S11: sending an expected intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list;
S12: inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system.
In this embodiment, the method is applied to a robot customization system, to which a function for automatically training two-way and three-way language models is added. For the two-way model, the possible replies related to the whole robot serve as the self-generated corpus covering all NLU (Natural Language Understanding) configuration; supplementary corpus can optionally be added during training, model training is initiated, the model is automatically published when training completes, and the robot is associated with the language model. For the three-way model, the possible replies related to the corresponding dialogue node serve as the complete self-generated corpus; supplementary corpus can be added during training, model training is initiated, and the model is automatically published when training completes.
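The two steps above can be sketched as a small pipeline. The names below (`SemanticGeneralizer`, `LanguageModelTrainer`, `auto_train`) are illustrative assumptions, not the patent's actual interfaces; the stubs only model the data flow from S11 to S12.

```python
# Sketch of the automatic training pipeline (steps S11 and S12).
# All class and function names here are illustrative stand-ins.

class SemanticGeneralizer:
    """Stub: expands each expected intent into candidate speaker utterances."""
    def generalize(self, intents, repeats=3):
        corpus = []
        for intent in intents:
            # A real system would generalize each intent into many utterances;
            # here each intent yields `repeats` placeholder sentences.
            corpus.extend([f"utterance for {intent}"] * repeats)
        return corpus

class LanguageModelTrainer:
    """Stub: trains a language model from a corpus and publishes it."""
    def train(self, corpus):
        return {"corpus_size": len(corpus), "status": "trained"}
    def publish(self, model):
        model["status"] = "published"   # move to the online-available state
        return model

def auto_train(intents, supplement=()):
    # S11: pull the self-generated corpus from the semantic generalization system
    corpus = SemanticGeneralizer().generalize(intents)
    corpus.extend(supplement)           # optional developer-supplied corpus
    # S12: submit to the training system and auto-publish on completion
    trainer = LanguageModelTrainer()
    return trainer.publish(trainer.train(corpus))

model = auto_train(["confirm", "deny", "busy"], supplement=["this number is wrong"])
```

The same skeleton serves both model kinds: the two-way model passes robot-level intents, the three-way model passes node-level intents.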
For step S11, the robot customization system actively pulls corpus from the semantic generalization system. As shown in fig. 2, the developer prepares an expected intent list for the project requirements — for example, a language model for the express-delivery industry, or, within a certain dialogue, a node that requires the user to confirm their identity. After preparing the expected intent list, the developer enters it into the robot customization system, and the robot customization system pulls corpus from the semantic generalization system based on that list.
The semantic generalization system expands the corpus based on the expected intent list and sets the corpus repetition count according to the parameters, thereby producing a corpus file.
As one embodiment, the expected intent list of the scene-oriented first language model includes: the business intents of the robot and the knowledge base questions of the robot's scene configuration.
The expected intent list of the dialogue-node-oriented second language model includes: the business intents of the current node, the business intents of the global dialogue, and the knowledge base questions of the robot's scene configuration.
The first language model comprises a two-way language model, and the second language model comprises a three-way language model.
In this embodiment, the robot customization system sends the expected intent list to the semantic generalization system to obtain the corpus required by the robot scene or node. The expected intent list of the two-way model includes: the complete business intents of the robot (e.g., send a package, query package information, cancel a transaction, etc.) and the complete knowledge base questions of the robot's scene configuration (e.g., "How do I adjust the volume?"). The expected intent list of the three-way model includes: the business intents of the current node, the global dialogue intents, and the complete knowledge base questions of the robot's scene configuration.
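A minimal sketch of how the two expected intent lists might be assembled; the dictionary keys and intent names below are invented for illustration.

```python
# Assemble the expected intent list for each model kind.
# Data shapes and names are illustrative assumptions.

def expected_intents(robot, node=None):
    """Two-way (scene) list when node is None; three-way (node) list otherwise."""
    if node is None:
        return robot["business_intents"] + robot["kb_questions"]
    return node["business_intents"] + robot["global_intents"] + robot["kb_questions"]

robot = {
    "business_intents": ["send_express", "query_express", "cancel"],
    "global_intents": ["repeat", "transfer_to_agent"],
    "kb_questions": ["How do I adjust the volume?"],
}
node = {"business_intents": ["confirm_identity", "deny_identity", "busy"]}

two_way_intents = expected_intents(robot)          # whole-robot scope
three_way_intents = expected_intents(robot, node)  # node + global + KB scope
```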
As an embodiment, the expected intention list of the second language model facing the dialogue node includes: the service intention of the current node, the service intention of the global dialogue and the knowledge base question of the robot scene configuration.
In this embodiment, the semantic generalization system generates the corpus file from the expected intent list; different intent types have different generation strategies.
Business intents include built-in intents, which are algorithm intents built into the system for general semantic recognition (generally realized by machine-learning model training or by regular rules), such as answering a call or turning down the volume. The strategy here is to obtain the positive corpus data or expanded regular rules used during model training. Business intents also include rule intents, meaning keywords or regular rules written according to business requirements when customizing the robot, possibly referencing a dictionary, for example: number (wrong | not right | incorrect | erroneous | problematic). The strategy is to expand the regular rules into written corpus, with wildcards expanded by part of speech according to position (personal pronouns, verbs, and the like); the content of the dictionary is read and added to the corpus. Business intents also include similarity intents, meaning similar sentences filled in according to business requirements when customizing the robot, for example: "This number is not right." "This should not be the number." The strategy is to use all the similar sentences as part of the corpus.
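The rule-intent expansion strategy could be sketched as a Cartesian expansion of alternation groups plus appended dictionary entries; the groups below are assumptions modeled on the example in the text.

```python
import itertools

# Sketch of rule-intent expansion: a regular rule such as
# "number (wrong | not right | ...)" is expanded by taking the Cartesian
# product of its alternation groups; dictionary entries join the corpus directly.

def expand_rule(groups, dictionary=()):
    """Expand alternation groups like (a|b)(c|d) into full sentences."""
    sentences = [" ".join(parts) for parts in itertools.product(*groups)]
    sentences.extend(dictionary)
    return sentences

groups = [
    ["this number", "that number"],
    ["is wrong", "is not right", "is incorrect"],
]
rule_corpus = expand_rule(groups, dictionary=["license plate number"])
```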
Knowledge base questions include standard questions, i.e., the standard questions of knowledge Q&A, for example: "How do I adjust the volume?" The strategy is to use the standard questions as part of the corpus. Knowledge base questions also include similar questions, i.e., the expanded similar questions of knowledge Q&A (containing both complete sentences and regular-rule sentences), for example: "How do I adjust the volume of the Sipic voice channel?" "Why is the Sipic speech volume so low?" The strategy is to use the complete similar questions as part of the corpus and to expand the regular-rule sentences into complete sentences.
All the possible expected intent combinations can number in the billions, so the sentences are screened according to their position and confidence to finally obtain a training set of suitable size. Because the two-way and three-way language models are used to enhance ASR recognition, the generated corpus can be repeated 3-10 times according to the parameters, finally producing the NLU self-generated corpus file.
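The screening-and-repetition step might look like the sketch below. The text screens by both sentence position and confidence; for brevity this sketch keeps only confidence-based screening, and the function name and candidate shape are assumptions.

```python
# Screen generalized candidates and repeat the survivors for ASR weighting.

def build_training_set(candidates, max_size, repeats=3):
    """Keep the highest-confidence sentences, then repeat each one
    `repeats` times (3-10 per the text) to weight it for ASR training."""
    ranked = sorted(candidates, key=lambda c: c["confidence"], reverse=True)
    kept = [c["text"] for c in ranked[:max_size]]
    return [text for text in kept for _ in range(repeats)]

candidates = [
    {"text": "yes it is me", "confidence": 0.95},
    {"text": "speaking", "confidence": 0.90},
    {"text": "maybe possibly", "confidence": 0.10},
]
training_set = build_training_set(candidates, max_size=2, repeats=3)
```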
For step S12, the self-generated corpus file is sent to the language model training system for model training. When the self-generated corpus is complete, the developer can choose to submit training; the system submits the task to the language model training system in real time, and after a period of training (generally 5-10 minutes), the two-way or three-way language model is trained successfully and automatically published to the online-available state.
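The submit-train-publish cycle could be sketched as a polling loop. `TrainingJob` below is a stub standing in for the real training system, which the text says takes about 5-10 minutes; the stub "finishes" after three status polls.

```python
import time

# Stub training job: reports "training" until the third poll.
class TrainingJob:
    def __init__(self):
        self._polls = 0
    def status(self):
        self._polls += 1
        return "training" if self._polls < 3 else "succeeded"

def submit_and_publish(job, poll_interval=0.0):
    """Poll until training succeeds, then auto-publish the model online."""
    while job.status() != "succeeded":
        time.sleep(poll_interval)       # a real system would wait minutes
    return {"state": "published"}       # online-available state

result = submit_and_publish(TrainingJob())
```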
In this embodiment, the corpus generated by semantic generalization in natural language generation is used as training data for the speech recognition language model. Because the expected intent list of the intelligent voice robot represents the intentions likely to be expressed by a person talking with the robot, the generalized corpus represents, to a certain extent, what the speaker is likely to say. A large number of utterances representative of the speaker can therefore be obtained automatically, saving labor cost while maintaining high accuracy, so that the voice robot recognizes scene- or node-specific replies more accurately.
As an embodiment, after receiving the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list, the method further comprises:
presenting a preview of the self-generated corpus to the developer;
and when corpus for misrecognized speech is not contained in the self-generated corpus, receiving supplementary corpus input by the developer, and supplementing the self-generated corpus based on the supplementary corpus.
In this embodiment, considering that corpus arising from erroneous speech recognition is not included in the self-generated corpus file described above, the self-generated corpus is previewed after it is obtained, giving the developer an opportunity to supplement it. When speech is recognized incorrectly, the corresponding corpus is not generated, so the developer needs to supplement it. The developer can manually supplement the corpus and set its repetition count, and the supplemented corpus is then used to train the language model.
This embodiment provides a browsing interface for the developer and can receive the corpus the developer supplements, thereby helping the developer complete the self-generated corpus, so that the robot recognizes scene- or node-specific replies more accurately.
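Merging the developer's supplementary corpus into the self-generated corpus might look like the sketch below; the function name is an assumption, while the 3× repetition follows the count mentioned elsewhere in the text.

```python
# Merge developer-supplied supplementary corpus into the self-generated corpus,
# repeating each supplementary sentence to raise its training weight.

def supplement_corpus(self_generated, supplements, repeats=3):
    merged = list(self_generated)
    for sentence in supplements:
        merged.extend([sentence] * repeats)   # repeat misrecognized utterances
    return merged

merged = supplement_corpus(["yes", "no"], ["this number is not right"], repeats=3)
```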
Fig. 3 is a flowchart of a method for automatically configuring a language model according to an embodiment of the present invention, including the following steps:
S21: sending an expected intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list;
S22: inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system;
S23: automatically associating the first language model and/or the second language model with the corresponding scene and/or dialogue node, so as to realize automatic configuration of the language model.
In this embodiment, to make the language model workflow more automatic, the language model is automatically associated with the corresponding scene or dialogue node after automatic publication, so that automatic configuration of the language model is realized end to end.
As one embodiment, the first language model and/or the second language model are automatically associated with the corresponding scene and/or dialogue node in a configuration interface for display.
A text box for the dialogue node and a three-way model training button are provided in the natural language generation configuration interface;
in response to the developer clicking the three-way model training button, a three-way model automatic training configuration box is generated, and the corpus file generated from the expected intent list and a text box for supplementary corpus are provided in the configuration box;
and in response to the developer clicking the submit-training button in the three-way model automatic training configuration box, three-way model training is performed and the interface jumps back to the natural language generation configuration interface; after training is complete, the dialogue node is automatically associated with the three-way language model, which is used to recognize the next-turn speech replying to the prompt, thereby completing automatic configuration of the dialogue robot's dialogue node.
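The final association step amounts to recording which trained model a node should use for next-turn recognition; the configuration shape and identifiers below are assumptions for illustration.

```python
# Bind a trained node-level (three-way) model to a dialogue node so that
# the next-turn reply at that node is recognized with the node's model.
# Configuration keys and IDs are illustrative.

def associate_model(config, node_id, model_id):
    config.setdefault("node_asr_resources", {})[node_id] = model_id
    return config

config = associate_model({}, "confirm_identity", "lm-3way-001")
```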
In this embodiment, in an outbound intelligent voice dialogue system, the robot may ask the call recipient whether they are the intended person; this exchange takes place at the "confirm identity" node, where the user may express intents such as confirm, deny, or busy. The system recognizes the user's speech, then recognizes the intent, and finally jumps between dialogue nodes according to the intent label. Next, taking the construction and application of the three-way model for the identity-confirmation node as an example, automatic training of a dialogue-node-level language model is introduced.
Taking the "confirm identity" node as an example, three-way model training is selected in the interface, and the three-way model automatic training configuration box displays the pulled corpus, as shown in fig. 5. Clicking "click to view corpus content" shows all information about the intelligently generated corpus, and the text content in txt format can be viewed via "download".
When the developer wishes to supplement the corpus, the corpus content to be trained is manually entered in the "supplementary corpus" input box, as shown in fig. 6; corpus that the business has found to be misrecognized can be filled in and repeated 3 times.
Clicking "submit training" creates a three-way language model training task; as shown in fig. 7, the current state "model training" is displayed, and clicking "refresh" shows the training state. As shown in fig. 8, the model is automatically associated after completion, and the model name and corresponding ID are displayed under "next-turn ASR three-way resource". This completes automatic training and association of the ASR three-way language model; the language model then acts on speech recognition during voice testing, improving recognition accuracy on the model's training corpus.
As another embodiment, the configuration interface includes a scene-oriented scene configuration interface;
a two-way model training button for the scene is provided in the scene configuration interface;
in response to the developer clicking the two-way model training button, a two-way model automatic training configuration box is generated, and the corpus file generated from the expected intent list and a text box for supplementary corpus are provided in the configuration box;
and in response to the developer clicking the submit-training button in the two-way model automatic training configuration box, two-way model training is performed and the interface jumps back to the scene configuration interface; after training is complete, the scene is automatically associated with the two-way model, which is used to recognize dialogue speech in the scene, thereby completing automatic configuration of the dialogue robot's scene.
This embodiment shows automatic construction and association of the robot's two-way model. Intelligent voice robots often talk with users to perform dialogue tasks; for example, the financial field involves a large number of proper terms related to finance, banking, and credit. The system recognizes the user's speech, then recognizes the intent, and finally jumps between dialogue nodes according to the intent label. To improve the recognition accuracy of proper nouns, two-way model resources at the scene dimension must be configured. Next, taking automatic construction and association of a financial robot's two-way model as an example, NLU-based automatic training of the two-way model is introduced.
To pull the corpus, as shown in fig. 9, "Model Training" is clicked in the scene settings of the financial robot to open the model training page.
Clicking "select corpus", selecting corpus needed for training from the corpus list which is successfully pulled, clicking "generate corpus", creating corpus pulling task, and refreshing list viewing state in task execution process. As shown in fig. 10, the two-way model trains the popup window, clicks "select corpus", as shown in fig. 11, clicks "generate corpus" new pull task in the corpus list, as shown in fig. 12, and in the execution of the corpus pull task, the list view can be refreshed. After the supplementary corpus is selected, the supplementary corpus is input into a supplementary corpus box shown in fig. 13, the corpus content to be trained is manually supplemented in the supplementary corpus input box, and the corpus content which is actually not recognized to be correct by the service can be filled up and written for 3 times.
Clicking "submit training", creating a two-way language model training task, and as shown in fig. 14, displaying "in model training" in the current state, clicking "refreshing", and viewing the training state. As shown in FIG. 15, the model is automatically associated after completion, and the model name and corresponding ID are displayed at the "ASR two-way model". Thus, automatic training and association of the ASR two-way language model are completed, and the language model can act on voice recognition in the voice test process, so that the recognition accuracy of the model training corpus is improved.
Fig. 16 is a schematic structural diagram of a language model automatic training system applied to a robot customization system according to an embodiment of the present invention, where the system may execute the language model automatic training method according to any of the above embodiments and be configured in a terminal.
The language model automatic training system 10 applied to the robot customization system provided in the present embodiment includes: a self-generated corpus determining program module 11 and a self-training program module 12.
Wherein the self-generated corpus determining program module 11 is configured to send a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and to receive self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list; the self-training program module 12 is configured to input the self-generated corpus into a language model training system and automatically publish the first language model and/or the second language model trained by the language model training system.
The embodiment of the invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions can execute the language model automatic training method in any method embodiment;
as one embodiment, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
transmitting a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list;
inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system.
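The two steps above — generalize the desired intention list into corpus, then train and publish — can be sketched end to end as follows (a toy stand-in under stated assumptions: the generalization templates, the training stand-in, and the publish flag are all invented for illustration):

```python
# Hypothetical end-to-end sketch of the automatic training flow:
# desired intention list -> semantic generalization -> training -> publish.

def generalize(intention_list):
    """Stand-in for the semantic generalization system: expand each
    intention into example utterances (templates invented here)."""
    corpus = []
    for intention in intention_list:
        corpus.append(f"I want to {intention}")
        corpus.append(f"please help me {intention}")
    return corpus

def train_language_model(corpus):
    """Stand-in for the language model training system."""
    vocab = set(" ".join(corpus).split())
    return {"model_id": "lm-001", "vocab_size": len(vocab)}

def auto_train(intention_list):
    corpus = generalize(intention_list)   # step 1: self-generated corpus
    model = train_language_model(corpus)  # step 2: train the model
    model["published"] = True             # step 3: automatic publishing
    return model

model = auto_train(["check balance", "repay loan"])
```

The point of the sketch is the hand-off: the intention list is the only developer input, and everything downstream (corpus, training, publishing) is automated.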
As a non-volatile computer readable storage medium, it may be used to store a non-volatile software program, a non-volatile computer executable program, and modules, such as program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in a non-transitory computer readable storage medium that, when executed by a processor, perform the language model auto-training method of any of the method embodiments described above.
Fig. 17 is a schematic structural diagram of a language model automatic configuration system applied to a robot customization system according to an embodiment of the present invention, where the system may execute the language model automatic configuration method according to any of the foregoing embodiments and be configured in a terminal.
The language model automatic configuration system 20 applied to the robot customization system provided in the present embodiment includes: a self-generated corpus determining program module 21, a self-training program module 22 and a self-association program module 23.
Wherein the self-generated corpus determining program module 21 is configured to send a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and to receive self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list; the self-training program module 22 is configured to input the self-generated corpus into a language model training system and automatically publish the first language model and/or the second language model trained by the language model training system; the self-association program module 23 is configured to automatically associate the first language model and/or the second language model with the corresponding scene and/or dialogue node, so as to realize automatic configuration of the language model.
The embodiment of the invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions can execute the automatic configuration method of the language model in any method embodiment;
as one embodiment, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
transmitting a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list;
inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system;
and automatically associating the first language model and/or the second language model with the corresponding scene and/or dialogue node, so as to realize automatic configuration of the language model.
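The automatic association step can be sketched as a small registry binding published model IDs to scenes and dialogue nodes (illustrative only; the class, method, and ID formats are assumptions, not the system's actual data model):

```python
# Hypothetical sketch of automatic association: after publishing, bind each
# model ID to its scene or dialogue node so speech tests pick it up.

class RobotConfig:
    def __init__(self):
        self.scene_models = {}  # scene name -> two-way model ID
        self.node_models = {}   # dialogue node -> three-way model ID

    def associate(self, target_type, target_name, model_id):
        """Record the model used when recognizing speech for this target."""
        if target_type == "scene":
            self.scene_models[target_name] = model_id
        elif target_type == "node":
            self.node_models[target_name] = model_id
        else:
            raise ValueError(f"unknown target type: {target_type}")

config = RobotConfig()
config.associate("scene", "finance", "two-way-001")
config.associate("node", "repayment", "three-way-002")
```

Keeping scene-level and node-level bindings separate mirrors the document's split between the scene-oriented first (two-way) model and the node-oriented second (three-way) model.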
As a non-volatile computer readable storage medium, it may be used to store a non-volatile software program, a non-volatile computer executable program, and modules, such as program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in a non-transitory computer readable storage medium that, when executed by a processor, perform the language model auto-configuration method of any of the method embodiments described above.
The non-transitory computer readable storage medium may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the non-transitory computer readable storage medium may optionally include memory remotely located relative to the processor, which may be connected to the apparatus via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiment of the invention also provides electronic equipment, which comprises: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the language model automatic training and configuration method of any one of the embodiments of the present invention.
The electronic device of the embodiments of the present application exists in a variety of forms, including but not limited to:
(1) Mobile communication devices, which are characterized by mobile communication capability and are aimed primarily at providing voice and data communication. Such terminals include smart phones, multimedia phones, feature phones, low-end phones, and the like.
(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as tablet computers.
(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio players, video players, handheld game consoles, electronic books, smart toys, and portable vehicle navigation devices.
(4) Other electronic devices with data processing functions.
In this document, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the technical solution above, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer readable storage medium such as ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical schemes described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A language model automatic training method, applied to a robot customization system, comprising the following steps:
transmitting a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list; wherein the semantic generalization system generates a corpus file according to the desired intention list, and different intention types have different corpus generation strategies;
previewing the self-generated corpus for a developer;
when corpus incorrectly recognized by speech recognition is not contained in the self-generated corpus, receiving supplementary corpus input by the developer, and supplementing the self-generated corpus based on the supplementary corpus;
inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system;
wherein the desired intention list includes: business intentions and knowledge base questions, and the types of the business intentions include: built-in intentions and regular intentions; a built-in intention is an algorithm intention built into the system for general semantic recognition, and its strategy is to obtain positive corpus data or to expand the written regular rules at model training time; a regular intention means that keywords or regular rules are written according to business requirements when the robot is customized, and its strategy is to expand the corpus according to the regular rules, with wildcards expanded by part of speech according to their positions.
2. The method of claim 1, wherein the desired intention list of the scene-oriented first language model comprises: the business intentions of the robot and the knowledge base questions of the robot's scene configuration.
3. The method of claim 1, wherein the desired intention list of the dialogue-node-oriented second language model comprises: the business intentions of the current node, the business intentions of the global dialogue, and the knowledge base questions of the robot's scene configuration.
4. The method according to claim 2 or 3, wherein the types of the business intentions further comprise: similarity intentions, and the types of the knowledge base questions include: standard questions and similar questions.
5. The method of claim 1, wherein the first language model comprises a two-way language model and the second language model comprises a three-way language model.
6. A language model automatic configuration method, applied to a robot customization system, comprising the following steps:
transmitting a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list; previewing the self-generated corpus for a developer; when corpus incorrectly recognized by speech recognition is not contained in the self-generated corpus, receiving supplementary corpus input by the developer, and supplementing the self-generated corpus based on the supplementary corpus; wherein the semantic generalization system generates a corpus file according to the desired intention list, and different intention types have different corpus generation strategies;
inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system;
automatically associating the first language model and/or the second language model with the corresponding scene and/or dialogue node, so as to realize automatic configuration of the language model;
wherein the desired intention list includes: business intentions and knowledge base questions, and the types of the business intentions include: built-in intentions and regular intentions; a built-in intention is an algorithm intention built into the system for general semantic recognition, and its strategy is to obtain positive corpus data or to expand the written regular rules at model training time; a regular intention means that keywords or regular rules are written according to business requirements when the robot is customized, and its strategy is to expand the corpus according to the regular rules, with wildcards expanded by part of speech according to their positions.
7. The method of claim 6, wherein, after the automatically associating the first language model and/or the second language model with the corresponding scene and/or dialogue node, the method further comprises:
displaying, in a configuration interface, the automatic association of the first language model and/or the second language model with the corresponding scene and/or dialogue node.
8. The method of claim 7, wherein the configuration interface comprises: a natural language generation configuration interface oriented to the dialogue node;
providing a speech script text box of the dialogue node and a three-way model training button in the natural language generation configuration interface;
generating a three-way model automatic training configuration box in response to a developer clicking the three-way model training button, and providing, in the three-way model automatic training configuration box, a corpus file generated based on the desired intention list and a text box for supplementing the corpus;
and in response to a developer clicking the submit-training button in the three-way model automatic training configuration box, performing three-way model training, jumping back to the natural language generation configuration interface, and automatically associating the dialogue node with the three-way model after training is completed, the three-way model being used to recognize the next-round speech replying to the speech script, thereby completing the automatic configuration of the dialogue node of the dialogue robot.
9. The method of claim 7, wherein the configuration interface comprises: a scene-oriented scene configuration interface;
providing a two-way model training button for the scene in the scene configuration interface;
generating a two-way model automatic training configuration box in response to a developer clicking the two-way model training button, and providing, in the two-way model automatic training configuration box, a corpus file generated based on the desired intention list and a text box for supplementing the corpus;
and in response to a developer clicking the submit-training button in the two-way model automatic training configuration box, performing two-way model training, jumping back to the scene configuration interface, and automatically associating the scene with the two-way model after training is completed, the two-way model being used to recognize dialogue speech in the scene, thereby completing the automatic scene configuration of the dialogue robot.
10. A language model automatic training system for a robotic customization system, comprising:
a self-generated corpus determining program module, configured to send a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and to receive self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list; to preview the self-generated corpus for a developer; and, when corpus incorrectly recognized by speech recognition is not contained in the self-generated corpus, to receive supplementary corpus input by the developer and supplement the self-generated corpus based on the supplementary corpus; wherein the semantic generalization system generates a corpus file according to the desired intention list, and different intention types have different corpus generation strategies;
a self-training program module, configured to input the self-generated corpus into a language model training system and automatically publish the first language model and/or the second language model trained by the language model training system;
wherein the desired intention list includes: business intentions and knowledge base questions, and the types of the business intentions include: built-in intentions and regular intentions; a built-in intention is an algorithm intention built into the system for general semantic recognition, and its strategy is to obtain positive corpus data or to expand the written regular rules at model training time; a regular intention means that keywords or regular rules are written according to business requirements when the robot is customized, and its strategy is to expand the corpus according to the regular rules, with wildcards expanded by part of speech according to their positions.
11. A language model auto-configuration system for a robotic customization system, comprising:
a self-generated corpus determining program module, configured to send a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and to receive self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list; to preview the self-generated corpus for a developer; and, when corpus incorrectly recognized by speech recognition is not contained in the self-generated corpus, to receive supplementary corpus input by the developer and supplement the self-generated corpus based on the supplementary corpus; wherein the semantic generalization system generates a corpus file according to the desired intention list, and different intention types have different corpus generation strategies;
a self-training program module, configured to input the self-generated corpus into a language model training system and automatically publish the first language model and/or the second language model trained by the language model training system;
a self-association program module, configured to automatically associate the first language model and/or the second language model with the corresponding scene and/or dialogue node, so as to realize automatic configuration of the language model;
wherein the desired intention list includes: business intentions and knowledge base questions, and the types of the business intentions include: built-in intentions and regular intentions; a built-in intention is an algorithm intention built into the system for general semantic recognition, and its strategy is to obtain positive corpus data or to expand the written regular rules at model training time; a regular intention means that keywords or regular rules are written according to business requirements when the robot is customized, and its strategy is to expand the corpus according to the regular rules, with wildcards expanded by part of speech according to their positions.
CN202110757208.8A 2021-07-05 2021-07-05 Language model automatic training method and system Active CN113539245B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110757208.8A CN113539245B (en) 2021-07-05 2021-07-05 Language model automatic training method and system

Publications (2)

Publication Number Publication Date
CN113539245A CN113539245A (en) 2021-10-22
CN113539245B (en) 2024-03-15

Family

ID=78126720

Country Status (1)

Country Link
CN (1) CN113539245B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002268676A (en) * 2001-03-07 2002-09-20 Atr Onsei Gengo Tsushin Kenkyusho:Kk Language model generating device and voice recognition device
CN103165129A (en) * 2011-12-13 2013-06-19 北京百度网讯科技有限公司 Method and system for optimizing voice recognition acoustic model
CN103198828A (en) * 2013-04-03 2013-07-10 中金数据系统有限公司 Method and system of construction of voice corpus
WO2018157700A1 (en) * 2017-03-02 2018-09-07 腾讯科技(深圳)有限公司 Method and device for generating dialogue, and storage medium
CN109949797A (en) * 2019-03-11 2019-06-28 北京百度网讯科技有限公司 A kind of generation method of training corpus, device, equipment and storage medium
CN110349569A (en) * 2019-07-02 2019-10-18 苏州思必驰信息科技有限公司 The training and recognition methods of customized product language model and device
CN111339309A (en) * 2020-05-22 2020-06-26 支付宝(杭州)信息技术有限公司 Corpus expansion method and system for user intention
CN111460117A (en) * 2020-03-20 2020-07-28 平安科技(深圳)有限公司 Dialog robot intention corpus generation method, device, medium and electronic equipment
CN111933116A (en) * 2020-06-22 2020-11-13 厦门快商通科技股份有限公司 Speech recognition model training method, system, mobile terminal and storage medium
CN111933118A (en) * 2020-08-17 2020-11-13 苏州思必驰信息科技有限公司 Method and device for optimizing voice recognition and intelligent voice dialogue system applying same
KR20210016682A (en) * 2019-08-05 2021-02-17 한국전자통신연구원 Apparatus for fixing error of speech recognition result and method thereof

Similar Documents

Publication Publication Date Title
US10679613B2 (en) Spoken language understanding system and method using recurrent neural networks
US20200175890A1 (en) Device, method, and graphical user interface for a group reading environment
CN111226224B (en) Method for translating voice signals and electronic equipment
WO2022078146A1 (en) Speech recognition method and apparatus, device, and storage medium
CN112819664A (en) Apparatus for learning foreign language and method for providing foreign language learning service using the same
US20140315163A1 (en) Device, method, and graphical user interface for a group reading environment
CN111739519A (en) Dialogue management processing method, device, equipment and medium based on voice recognition
CN114830139A (en) Training models using model-provided candidate actions
KR102418558B1 (en) English speaking teaching method using interactive artificial intelligence avatar, device and system therefor
CN116821290A (en) Multitasking dialogue-oriented large language model training method and interaction method
Tomko et al. Towards efficient human machine speech communication: The speech graffiti project
CN111046674A (en) Semantic understanding method and device, electronic equipment and storage medium
CN113539245B (en) Language model automatic training method and system
CN111723559A (en) Real-time information extraction method and device
CN114860910A (en) Intelligent dialogue method and system
KR20190070682A (en) System and method for constructing and providing lecture contents
KR20190070683A (en) Apparatus and method for constructing and providing lecture contents
KR20220140301A (en) Video learning systems for enable learners to be identified through artificial intelligence and method thereof
CN109891410A (en) Data collection for new session conversational system
CN110222161B (en) Intelligent response method and device for conversation robot
CN115408500A (en) Question-answer consistency evaluation method and device, electronic equipment and medium
TWI752437B (en) At least two phoneme-based voice input operation method and computer program product
KR102577643B1 (en) Online one to one korean lecture platform system and operating server included in the same
CN112966077B (en) Method, device and equipment for determining conversation state and storage medium
Patel et al. My Buddy App: Communications between Smart Devices through Voice Assist

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant