CN113539245B - Language model automatic training method and system - Google Patents


Info

Publication number
CN113539245B
CN113539245B (application CN202110757208.8A)
Authority
CN
China
Prior art keywords
intention
language model
language
corpus
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110757208.8A
Other languages
Chinese (zh)
Other versions
CN113539245A (en)
Inventor
史彤 (Shi Tong)
董鑫 (Dong Xin)
初敏 (Chu Min)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sipic Technology Co Ltd filed Critical Sipic Technology Co Ltd
Priority to CN202110757208.8A
Publication of CN113539245A
Application granted
Publication of CN113539245B
Legal status: Active

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/06 — Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 — Training
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/08 — Speech classification or search
    • G10L15/18 — Speech classification or search using natural language modelling

Abstract

The embodiment of the invention provides an automatic training method for a language model. The method comprises the following steps: sending an expected intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list; inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system. The embodiment of the invention also provides a language model automatic training system applied to a robot customization system. The embodiment of the invention uses the corpus generated by semantic generalization in natural language generation as training data for the speech recognition language model; after generalization, this corpus represents, to a certain extent, what the speaker is likely to say. A large number of utterances representative of the speaker can thus be obtained automatically, with high accuracy, so that the voice recognition robot recognizes replies more accurately.

Description

Language model automatic training method and system
Technical Field
The invention relates to the field of intelligent voice, in particular to an automatic training method and system for a language model.
Background
An intelligent speech dialogue system is typically composed of five parts: ASR (Automatic Speech Recognition), NLU (Natural Language Understanding), DM (Dialogue Management), NLG (Natural Language Generation), and TTS (Text To Speech); sometimes an FAQ (Frequently Asked Questions) module is also present. Speech recognition is the first module of an intelligent speech dialogue system, and its accuracy directly influences the task success rate of the whole dialogue system. Currently, language models are classified by application scope as follows:
One-way language model: a general-purpose speech recognition language model, suitable for broad dialogue systems, such as chitchat.
Two-way language model: a speech recognition language model oriented to a particular industry scene, suitable for the dialogue system of a certain vertical industry, such as the financial or express-delivery industry.
Three-way language model: a speech recognition language model oriented to particular dialogue nodes, suitable for dialogue nodes with specific replies, such as expressing confirmation or stating a license plate number.
For a robot in a certain scene, a two-way language model can be configured to enhance ASR recognition; for certain nodes within the robot, three-way language models can be configured to enhance ASR recognition.
Adding corpus of likely replies for a specific scene or node offline, training a model on it, and associating the model with that scene or node can greatly improve the speech recognition accuracy of the voice robot. Model training often begins with corpus collection, followed by manual training and binding to the corresponding scene or node.
In the process of implementing the present invention, the inventor finds that at least the following problems exist in the related art:
the traditional method of building and applying two-way and three-way language models often requires manually collating a large amount of corpus, manually training, and manually associating the models with the corresponding scenes or nodes. Manual operation is error-prone, the manually collated corpus is often incomplete, and repeated training is needed.
Manual corpus collation and the repeated manual work it entails represent a large workload, and corpus collation is rarely completed in one pass: typically, when test verification or the online environment uncovers misrecognized cases, those cases are repeated several times as corpus to retrain the model. When new misrecognition cases appear, corpus is added and the model retrained again, and the manual association is repeated. This process only patches errors after they have occurred, and the strengthened model is not applied during the robot customization process.
Model training and dialogue customization often require manually operating several systems at the same time; the tuning process described above is not standardized, which to some extent increases production effort. For robot customization, the workload is hard to estimate, the work is repetitive, recognition improves only through continuous manual training, efficiency is low, labor cost is high, and errors are easy to make.
Disclosure of Invention
In order to at least solve the problems in the prior art that training a model requires a large amount of corpus and is inefficient, the following aspects are provided.
In a first aspect, an embodiment of the present invention provides a method for automatically training a language model, which is applied to a robot customization system, including:
sending an expected intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list;
inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system.
In a second aspect, an embodiment of the present invention provides a method for automatically configuring a language model, which is applied to a robot customization system, including:
sending an expected intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list;
inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system;
and automatically associating the first language model and/or the second language model with the corresponding scene and/or dialogue node, so as to realize automatic configuration of the language model.
In a third aspect, an embodiment of the present invention provides a language model automatic training system applied to a robot customization system, including:
a self-generated corpus determining program module, configured to send an expected intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and to receive the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list;
a self-training program module, configured to input the self-generated corpus into a language model training system and to automatically publish the first language model and/or the second language model trained by the language model training system.
In a fourth aspect, an embodiment of the present invention provides a language model automatic configuration system applied to a robot customization system, including:
a self-generated corpus determining program module, configured to send an expected intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and to receive the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list;
a self-training program module, configured to input the self-generated corpus into a language model training system and to automatically publish the first language model and/or the second language model trained by the language model training system;
and a self-association program module, configured to automatically associate the first language model and/or the second language model with the corresponding scene and/or dialogue node, so as to realize automatic configuration of the language model.
In a fifth aspect, there is provided an electronic device, comprising: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the language model automatic training method and the automatic configuration method of any one of the embodiments of the present invention.
In a sixth aspect, an embodiment of the present invention provides a storage medium having a computer program stored thereon, wherein the program when executed by a processor implements the steps of the language model automatic training method and the automatic configuration method of any one of the embodiments of the present invention.
The embodiments of the invention have the following beneficial effects: the corpus generated by semantic generalization in natural language generation is used as training data for the speech recognition language model. Because the expected intent list of the intelligent voice robot represents the intentions likely to be expressed by a person talking with the robot, the generalized corpus represents, to a certain extent, what the speaker is likely to say. A large number of utterances representative of the speaker can therefore be obtained automatically, saving labor cost while maintaining high accuracy, so that the voice robot recognizes scene- or node-specific replies more accurately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention or of the prior art, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. It is obvious that the drawings described below show some embodiments of the present invention, and that a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for automatic training of language models according to an embodiment of the present invention;
FIG. 2 is a block diagram of an automatic language model training flow of an automatic language model training method according to an embodiment of the present invention;
FIG. 3 is a flow chart of a method for automatically configuring language models according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a dialogue node speaking edit click "three-way model training" of a language model automatic configuration method according to an embodiment of the present invention;
FIG. 5 is a schematic drawing of automatic speech recognition three-way model automatic training corpus of a language model automatic configuration method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of inputting supplementary corpus into a supplementary corpus frame of an automatic configuration method of language model according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a training state of a submission training review of a method for automatically configuring a language model according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an automatic training completion and association three-way model resource for a language model automatic configuration method according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a scenario set click "model training" pull corpus of a method for automatically configuring language models according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a two-way model training popup for automatic speech recognition for a method for automatically configuring language models according to an embodiment of the present invention, in which "select corpus" is clicked;
FIG. 11 is a schematic diagram of a new pulling task for clicking "generate corpus" in a corpus list of an automatic configuration method for language models according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of refreshing a list during execution of a corpus pulling task in a language model auto-configuration method according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of inputting supplementary corpus into a supplementary corpus frame of an automatic configuration method of language model according to an embodiment of the present invention;
FIG. 14 is a schematic diagram showing the submitting training and viewing training states of a language model automatic configuration method according to an embodiment of the present invention;
FIG. 15 is a schematic diagram of an automatic training completion and association two-way model resource of a language model automatic configuration method according to an embodiment of the present invention;
FIG. 16 is a schematic diagram of a language model automatic training system applied to a robot customization system according to an embodiment of the present invention;
fig. 17 is a schematic structural diagram of a language model automatic configuration system applied to a robot customization system according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 is a flowchart of a method for automatically training a language model according to an embodiment of the present invention, including the following steps:
S11: sending an expected intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list;
S12: inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system.
In this embodiment, the method is applied to a robot customization system, to which a function for automatically training two-way and three-way language models is added. For the two-way model, the possible replies related to the whole robot serve as the self-generated corpus covering all NLU (Natural Language Understanding) configuration; supplementary corpus can optionally be added during training, model training is initiated, the model is automatically published when training completes, and the robot is associated with the language model. For the three-way model, the possible replies related to the corresponding dialogue node serve as the complete self-generated corpus; supplementary corpus can be added during training, model training is initiated, and the model is automatically published when training completes.
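The two steps above can be sketched as a small pipeline. The names below (`SemanticGeneralizer`, `LanguageModelTrainer`, `auto_train`) are illustrative assumptions, not the patent's actual interfaces; the stubs only model the data flow from S11 to S12.

```python
# Sketch of the automatic training pipeline (steps S11 and S12).
# All class and function names here are illustrative stand-ins.

class SemanticGeneralizer:
    """Stub: expands each expected intent into candidate speaker utterances."""
    def generalize(self, intents, repeats=3):
        corpus = []
        for intent in intents:
            # A real system would generalize each intent into many utterances;
            # here each intent yields `repeats` placeholder sentences.
            corpus.extend([f"utterance for {intent}"] * repeats)
        return corpus

class LanguageModelTrainer:
    """Stub: trains a language model from a corpus and publishes it."""
    def train(self, corpus):
        return {"corpus_size": len(corpus), "status": "trained"}
    def publish(self, model):
        model["status"] = "published"   # move to the online-available state
        return model

def auto_train(intents, supplement=()):
    # S11: pull the self-generated corpus from the semantic generalization system
    corpus = SemanticGeneralizer().generalize(intents)
    corpus.extend(supplement)           # optional developer-supplied corpus
    # S12: submit to the training system and auto-publish on completion
    trainer = LanguageModelTrainer()
    return trainer.publish(trainer.train(corpus))

model = auto_train(["confirm", "deny", "busy"], supplement=["this number is wrong"])
```

The same skeleton serves both model kinds: the two-way model passes robot-level intents, the three-way model passes node-level intents.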
For step S11, the robot customization system actively pulls corpus from the semantic generalization system. As shown in fig. 2, the developer prepares an expected intent list for the project requirements — for example, a language model for the express-delivery industry, or, within a certain dialogue, a node that requires the user to confirm their identity. After preparing the expected intent list, the developer enters it into the robot customization system, and the robot customization system pulls corpus from the semantic generalization system based on that list.
The semantic generalization system expands the corpus based on the expected intent list and sets the corpus repetition count according to the parameters, thereby producing a corpus file.
As one embodiment, the expected intent list of the scene-oriented first language model includes: the business intents of the robot and the knowledge base questions of the robot's scene configuration.
The expected intent list of the dialogue-node-oriented second language model includes: the business intents of the current node, the business intents of the global dialogue, and the knowledge base questions of the robot's scene configuration.
The first language model comprises a two-way language model, and the second language model comprises a three-way language model.
In this embodiment, the robot customization system sends the expected intent list to the semantic generalization system to obtain the corpus required by the robot scene or node. The expected intent list of the two-way model includes: the complete business intents of the robot (e.g., send a package, query package information, cancel a transaction, etc.) and the complete knowledge base questions of the robot's scene configuration (e.g., "How do I adjust the volume?"). The expected intent list of the three-way model includes: the business intents of the current node, the global dialogue intents, and the complete knowledge base questions of the robot's scene configuration.
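A minimal sketch of how the two expected intent lists might be assembled; the dictionary keys and intent names below are invented for illustration.

```python
# Assemble the expected intent list for each model kind.
# Data shapes and names are illustrative assumptions.

def expected_intents(robot, node=None):
    """Two-way (scene) list when node is None; three-way (node) list otherwise."""
    if node is None:
        return robot["business_intents"] + robot["kb_questions"]
    return node["business_intents"] + robot["global_intents"] + robot["kb_questions"]

robot = {
    "business_intents": ["send_express", "query_express", "cancel"],
    "global_intents": ["repeat", "transfer_to_agent"],
    "kb_questions": ["How do I adjust the volume?"],
}
node = {"business_intents": ["confirm_identity", "deny_identity", "busy"]}

two_way_intents = expected_intents(robot)          # whole-robot scope
three_way_intents = expected_intents(robot, node)  # node + global + KB scope
```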
As an embodiment, the expected intention list of the second language model facing the dialogue node includes: the service intention of the current node, the service intention of the global dialogue and the knowledge base question of the robot scene configuration.
In this embodiment, the semantic generalization system generates the corpus file from the expected intent list; different intent types have different generation strategies.
Business intents include built-in intents, which are algorithm intents built into the system for general semantic recognition (generally realized by machine-learning model training or by regular rules), such as answering a call or turning down the volume. The strategy here is to obtain the positive corpus data or expanded regular rules used during model training. Business intents also include rule intents, meaning keywords or regular rules written according to business requirements when customizing the robot, possibly referencing a dictionary, for example: number (wrong | not right | incorrect | erroneous | problematic). The strategy is to expand the regular rules into written corpus, with wildcards expanded by part of speech according to position (personal pronouns, verbs, and the like); the content of the dictionary is read and added to the corpus. Business intents also include similarity intents, meaning similar sentences filled in according to business requirements when customizing the robot, for example: "This number is not right." "This should not be the number." The strategy is to use all the similar sentences as part of the corpus.
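The rule-intent expansion strategy could be sketched as a Cartesian expansion of alternation groups plus appended dictionary entries; the groups below are assumptions modeled on the example in the text.

```python
import itertools

# Sketch of rule-intent expansion: a regular rule such as
# "number (wrong | not right | ...)" is expanded by taking the Cartesian
# product of its alternation groups; dictionary entries join the corpus directly.

def expand_rule(groups, dictionary=()):
    """Expand alternation groups like (a|b)(c|d) into full sentences."""
    sentences = [" ".join(parts) for parts in itertools.product(*groups)]
    sentences.extend(dictionary)
    return sentences

groups = [
    ["this number", "that number"],
    ["is wrong", "is not right", "is incorrect"],
]
rule_corpus = expand_rule(groups, dictionary=["license plate number"])
```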
Knowledge base questions include standard questions, i.e., the standard questions of knowledge Q&A, for example: "How do I adjust the volume?" The strategy is to use the standard questions as part of the corpus. Knowledge base questions also include similar questions, i.e., the expanded similar questions of knowledge Q&A (containing both complete sentences and regular-rule sentences), for example: "How do I adjust the volume of the Sipic voice channel?" "Why is the Sipic speech volume so low?" The strategy is to use the complete similar questions as part of the corpus and to expand the regular-rule sentences into complete sentences.
All the possible expected intent combinations can number in the billions, so the sentences are screened according to their position and confidence to finally obtain a training set of suitable size. Because the two-way and three-way language models are used to enhance ASR recognition, the generated corpus can be repeated 3-10 times according to the parameters, finally producing the NLU self-generated corpus file.
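The screening-and-repetition step might look like the sketch below. The text screens by both sentence position and confidence; for brevity this sketch keeps only confidence-based screening, and the function name and candidate shape are assumptions.

```python
# Screen generalized candidates and repeat the survivors for ASR weighting.

def build_training_set(candidates, max_size, repeats=3):
    """Keep the highest-confidence sentences, then repeat each one
    `repeats` times (3-10 per the text) to weight it for ASR training."""
    ranked = sorted(candidates, key=lambda c: c["confidence"], reverse=True)
    kept = [c["text"] for c in ranked[:max_size]]
    return [text for text in kept for _ in range(repeats)]

candidates = [
    {"text": "yes it is me", "confidence": 0.95},
    {"text": "speaking", "confidence": 0.90},
    {"text": "maybe possibly", "confidence": 0.10},
]
training_set = build_training_set(candidates, max_size=2, repeats=3)
```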
For step S12, the self-generated corpus file is sent to the language model training system for model training. When the self-generated corpus is complete, the developer can choose to submit training; the system submits the task to the language model training system in real time, and after a period of training (generally 5-10 minutes), the two-way or three-way language model is trained successfully and automatically published to the online-available state.
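The submit-train-publish cycle could be sketched as a polling loop. `TrainingJob` below is a stub standing in for the real training system, which the text says takes about 5-10 minutes; the stub "finishes" after three status polls.

```python
import time

# Stub training job: reports "training" until the third poll.
class TrainingJob:
    def __init__(self):
        self._polls = 0
    def status(self):
        self._polls += 1
        return "training" if self._polls < 3 else "succeeded"

def submit_and_publish(job, poll_interval=0.0):
    """Poll until training succeeds, then auto-publish the model online."""
    while job.status() != "succeeded":
        time.sleep(poll_interval)       # a real system would wait minutes
    return {"state": "published"}       # online-available state

result = submit_and_publish(TrainingJob())
```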
In this embodiment, the corpus generated by semantic generalization in natural language generation is used as training data for the speech recognition language model. Because the expected intent list of the intelligent voice robot represents the intentions likely to be expressed by a person talking with the robot, the generalized corpus represents, to a certain extent, what the speaker is likely to say. A large number of utterances representative of the speaker can therefore be obtained automatically, saving labor cost while maintaining high accuracy, so that the voice robot recognizes scene- or node-specific replies more accurately.
As an embodiment, after receiving the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list, the method further comprises:
presenting a preview of the self-generated corpus to the developer;
and when corpus for misrecognized speech is not contained in the self-generated corpus, receiving supplementary corpus input by the developer, and supplementing the self-generated corpus based on the supplementary corpus.
In this embodiment, considering that corpus arising from erroneous speech recognition is not included in the self-generated corpus file described above, the self-generated corpus is previewed after it is obtained, giving the developer an opportunity to supplement it. When speech is recognized incorrectly, the corresponding corpus is not generated, so the developer needs to supplement it. The developer can manually supplement the corpus and set its repetition count, and the supplemented corpus is then used to train the language model.
This embodiment provides a browsing interface for the developer and can receive the corpus the developer supplements, thereby helping the developer complete the self-generated corpus, so that the robot recognizes scene- or node-specific replies more accurately.
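Merging the developer's supplementary corpus into the self-generated corpus might look like the sketch below; the function name is an assumption, while the 3× repetition follows the count mentioned elsewhere in the text.

```python
# Merge developer-supplied supplementary corpus into the self-generated corpus,
# repeating each supplementary sentence to raise its training weight.

def supplement_corpus(self_generated, supplements, repeats=3):
    merged = list(self_generated)
    for sentence in supplements:
        merged.extend([sentence] * repeats)   # repeat misrecognized utterances
    return merged

merged = supplement_corpus(["yes", "no"], ["this number is not right"], repeats=3)
```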
Fig. 3 is a flowchart of a method for automatically configuring a language model according to an embodiment of the present invention, including the following steps:
S21: sending an expected intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving the self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system from the intention information in the expected intention list;
S22: inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system;
S23: automatically associating the first language model and/or the second language model with the corresponding scene and/or dialogue node, so as to realize automatic configuration of the language model.
In this embodiment, to make the language model workflow more automatic, the language model is automatically associated with the corresponding scene or dialogue node after automatic publication, so that automatic configuration of the language model is realized end to end.
As one embodiment, the first language model and/or the second language model are automatically associated with the corresponding scene and/or dialogue node in a configuration interface for display.
A text box for the dialogue node and a three-way model training button are provided in the natural language generation configuration interface;
in response to the developer clicking the three-way model training button, a three-way model automatic training configuration box is generated, and the corpus file generated from the expected intent list and a text box for supplementary corpus are provided in the configuration box;
and in response to the developer clicking the submit-training button in the three-way model automatic training configuration box, three-way model training is performed and the interface jumps back to the natural language generation configuration interface; after training is complete, the dialogue node is automatically associated with the three-way language model, which is used to recognize the next-turn speech replying to the prompt, thereby completing automatic configuration of the dialogue robot's dialogue node.
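The final association step amounts to recording which trained model a node should use for next-turn recognition; the configuration shape and identifiers below are assumptions for illustration.

```python
# Bind a trained node-level (three-way) model to a dialogue node so that
# the next-turn reply at that node is recognized with the node's model.
# Configuration keys and IDs are illustrative.

def associate_model(config, node_id, model_id):
    config.setdefault("node_asr_resources", {})[node_id] = model_id
    return config

config = associate_model({}, "confirm_identity", "lm-3way-001")
```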
In this embodiment, in an outbound intelligent voice dialogue system, the robot may ask the call recipient whether they are the intended person; this exchange takes place at the "confirm identity" node, where the user may express intents such as confirm, deny, or busy. The system recognizes the user's speech, then recognizes the intent, and finally jumps between dialogue nodes according to the intent label. Next, taking the construction and application of the three-way model for the identity-confirmation node as an example, automatic training of a dialogue-node-level language model is introduced.
Taking the "confirm identity" node as an example, three-way model training is selected in the interface, and the three-way model automatic training configuration box displays the pulled corpus, as shown in fig. 5. Clicking "click to view corpus content" shows all information about the intelligently generated corpus, and the text content in txt format can be viewed via "download".
When the developer wishes to supplement the corpus, the corpus content to be trained is manually entered in the "supplementary corpus" input box, as shown in fig. 6; corpus that the business has found to be misrecognized can be filled in and repeated 3 times.
Clicking "submit training" creates a three-way language model training task; as shown in fig. 7, the current state "model training" is displayed, and clicking "refresh" shows the training state. As shown in fig. 8, the model is automatically associated after completion, and the model name and corresponding ID are displayed under "next-turn ASR three-way resource". This completes automatic training and association of the ASR three-way language model; the language model then acts on speech recognition during voice testing, improving recognition accuracy on the model's training corpus.
As another embodiment, the configuration interface includes a scene-oriented scene configuration interface;
a two-way model training button for the scene is provided in the scene configuration interface;
in response to the developer clicking the two-way model training button, a two-way model automatic training configuration box is generated, and the corpus file generated from the expected intent list and a text box for supplementary corpus are provided in the configuration box;
and in response to the developer clicking the submit-training button in the two-way model automatic training configuration box, two-way model training is performed and the interface jumps back to the scene configuration interface; after training is complete, the scene is automatically associated with the two-way model, which is used to recognize dialogue speech in the scene, thereby completing automatic configuration of the dialogue robot's scene.
This embodiment shows automatic construction and association of the robot's two-way model. Intelligent voice robots often talk with users to perform dialogue tasks; for example, the financial field involves a large number of proper terms related to finance, banking, and credit. The system recognizes the user's speech, then recognizes the intent, and finally jumps between dialogue nodes according to the intent label. To improve the recognition accuracy of proper nouns, two-way model resources at the scene dimension must be configured. Next, taking automatic construction and association of a financial robot's two-way model as an example, NLU-based automatic training of the two-way model is introduced.
To pull the corpus, as shown in fig. 9, "Model Training" is clicked in the scene settings of the financial robot to open the model training page.
Clicking "select corpus", selecting corpus needed for training from the corpus list which is successfully pulled, clicking "generate corpus", creating corpus pulling task, and refreshing list viewing state in task execution process. As shown in fig. 10, the two-way model trains the popup window, clicks "select corpus", as shown in fig. 11, clicks "generate corpus" new pull task in the corpus list, as shown in fig. 12, and in the execution of the corpus pull task, the list view can be refreshed. After the supplementary corpus is selected, the supplementary corpus is input into a supplementary corpus box shown in fig. 13, the corpus content to be trained is manually supplemented in the supplementary corpus input box, and the corpus content which is actually not recognized to be correct by the service can be filled up and written for 3 times.
Clicking "submit training", creating a two-way language model training task, and as shown in fig. 14, displaying "in model training" in the current state, clicking "refreshing", and viewing the training state. As shown in FIG. 15, the model is automatically associated after completion, and the model name and corresponding ID are displayed at the "ASR two-way model". Thus, automatic training and association of the ASR two-way language model are completed, and the language model can act on voice recognition in the voice test process, so that the recognition accuracy of the model training corpus is improved.
Fig. 16 is a schematic structural diagram of a language model automatic training system applied to a robot customization system according to an embodiment of the present invention, where the system may execute the language model automatic training method according to any of the above embodiments and be configured in a terminal.
The language model automatic training system 10 applied to the robot customization system provided in the present embodiment includes: a self-generated corpus determining program module 11 and a self-training program module 12.
Wherein the self-generated corpus determining program module 11 is configured to send a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and to receive self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list; the self-training program module 12 is configured to input the self-generated corpus into a language model training system and automatically publish the first language model and/or the second language model trained by the language model training system.
The embodiment of the invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions can execute the language model automatic training method in any method embodiment;
as one embodiment, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
transmitting a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list;
inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system.
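The two steps above — generalize the desired intention list into corpus, then train and publish — can be sketched end to end as follows (a toy stand-in under stated assumptions: the generalization templates, the training stand-in, and the publish flag are all invented for illustration):

```python
# Hypothetical end-to-end sketch of the automatic training flow:
# desired intention list -> semantic generalization -> training -> publish.

def generalize(intention_list):
    """Stand-in for the semantic generalization system: expand each
    intention into example utterances (templates invented here)."""
    corpus = []
    for intention in intention_list:
        corpus.append(f"I want to {intention}")
        corpus.append(f"please help me {intention}")
    return corpus

def train_language_model(corpus):
    """Stand-in for the language model training system."""
    vocab = set(" ".join(corpus).split())
    return {"model_id": "lm-001", "vocab_size": len(vocab)}

def auto_train(intention_list):
    corpus = generalize(intention_list)   # step 1: self-generated corpus
    model = train_language_model(corpus)  # step 2: train the model
    model["published"] = True             # step 3: automatic publishing
    return model

model = auto_train(["check balance", "repay loan"])
```

The point of the sketch is the hand-off: the intention list is the only developer input, and everything downstream (corpus, training, publishing) is automated.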
As a non-volatile computer readable storage medium, it may be used to store a non-volatile software program, a non-volatile computer executable program, and modules, such as program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in a non-transitory computer readable storage medium that, when executed by a processor, perform the language model auto-training method of any of the method embodiments described above.
Fig. 17 is a schematic structural diagram of a language model automatic configuration system applied to a robot customization system according to an embodiment of the present invention, where the system may execute the language model automatic configuration method according to any of the foregoing embodiments and be configured in a terminal.
The language model automatic configuration system 20 applied to the robot customization system provided in the present embodiment includes: a self-generated corpus determining program module 21, a self-training program module 22 and a self-association program module 23.
Wherein the self-generated corpus determining program module 21 is configured to send a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and to receive self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list; the self-training program module 22 is configured to input the self-generated corpus into a language model training system and automatically publish the first language model and/or the second language model trained by the language model training system; the self-association program module 23 is configured to automatically associate the first language model and/or the second language model with the corresponding scene and/or dialogue node, so as to realize automatic configuration of the language model.
The embodiment of the invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions can execute the automatic configuration method of the language model in any method embodiment;
as one embodiment, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
transmitting a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list;
inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system;
and automatically associating the first language model and/or the second language model with the corresponding scene and/or dialogue node, so as to realize automatic configuration of the language model.
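The automatic association step can be sketched as a small registry binding published model IDs to scenes and dialogue nodes (illustrative only; the class, method, and ID formats are assumptions, not the system's actual data model):

```python
# Hypothetical sketch of automatic association: after publishing, bind each
# model ID to its scene or dialogue node so speech tests pick it up.

class RobotConfig:
    def __init__(self):
        self.scene_models = {}  # scene name -> two-way model ID
        self.node_models = {}   # dialogue node -> three-way model ID

    def associate(self, target_type, target_name, model_id):
        """Record the model used when recognizing speech for this target."""
        if target_type == "scene":
            self.scene_models[target_name] = model_id
        elif target_type == "node":
            self.node_models[target_name] = model_id
        else:
            raise ValueError(f"unknown target type: {target_type}")

config = RobotConfig()
config.associate("scene", "finance", "two-way-001")
config.associate("node", "repayment", "three-way-002")
```

Keeping scene-level and node-level bindings separate mirrors the document's split between the scene-oriented first (two-way) model and the node-oriented second (three-way) model.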
As a non-volatile computer readable storage medium, it may be used to store a non-volatile software program, a non-volatile computer executable program, and modules, such as program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in a non-transitory computer readable storage medium that, when executed by a processor, perform the language model auto-configuration method of any of the method embodiments described above.
The non-transitory computer readable storage medium may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the non-transitory computer readable storage medium may optionally include memory remotely located relative to the processor, which may be connected to the apparatus via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiment of the invention also provides electronic equipment, which comprises: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the language model automatic training and configuration method of any one of the embodiments of the present invention.
The electronic device of the embodiments of the present application exists in a variety of forms, including but not limited to:
(1) Mobile communication devices, which are characterized by mobile communication capability and are aimed primarily at providing voice and data communication. Such terminals include smart phones, multimedia phones, feature phones, low-end phones, and the like.
(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as tablet computers.
(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio players, video players, handheld game consoles, electronic books, smart toys, and portable vehicle navigation devices.
(4) Other electronic devices with data processing functions.
In this document, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the technical solution above, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer readable storage medium such as ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical schemes described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A language model automatic training method, applied to a robot customization system, comprising the following steps:
transmitting a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list; wherein the semantic generalization system generates a corpus file according to the desired intention list, and different intention types have different corpus generation strategies;
previewing the self-generated corpus for a developer;
when corpus incorrectly recognized by speech recognition is not contained in the self-generated corpus, receiving supplementary corpus input by the developer, and supplementing the self-generated corpus based on the supplementary corpus;
inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system;
wherein the desired intention list includes: business intentions and knowledge base questions, and the types of the business intentions include: built-in intentions and regular intentions; a built-in intention is an algorithm intention built into the system for general semantic recognition, and its strategy is to obtain positive corpus data or to expand the written regular rules at model training time; a regular intention means that keywords or regular rules are written according to business requirements when the robot is customized, and its strategy is to expand the corpus according to the regular rules, with wildcards expanded by part of speech according to their positions.
2. The method of claim 1, wherein the desired intention list of the scene-oriented first language model comprises: the business intentions of the robot and the knowledge base questions of the robot's scene configuration.
3. The method of claim 1, wherein the desired intention list of the dialogue-node-oriented second language model comprises: the business intentions of the current node, the business intentions of the global dialogue, and the knowledge base questions of the robot's scene configuration.
4. The method according to claim 2 or 3, wherein the types of the business intentions further comprise: similarity intentions, and the types of the knowledge base questions include: standard questions and similar questions.
5. The method of claim 1, wherein the first language model comprises a two-way language model and the second language model comprises a three-way language model.
6. A language model automatic configuration method, applied to a robot customization system, comprising the following steps:
transmitting a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and receiving self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list; previewing the self-generated corpus for a developer; when corpus incorrectly recognized by speech recognition is not contained in the self-generated corpus, receiving supplementary corpus input by the developer, and supplementing the self-generated corpus based on the supplementary corpus; wherein the semantic generalization system generates a corpus file according to the desired intention list, and different intention types have different corpus generation strategies;
inputting the self-generated corpus into a language model training system, and automatically publishing the first language model and/or the second language model trained by the language model training system;
automatically associating the first language model and/or the second language model with the corresponding scene and/or dialogue node, so as to realize automatic configuration of the language model;
wherein the desired intention list includes: business intentions and knowledge base questions, and the types of the business intentions include: built-in intentions and regular intentions; a built-in intention is an algorithm intention built into the system for general semantic recognition, and its strategy is to obtain positive corpus data or to expand the written regular rules at model training time; a regular intention means that keywords or regular rules are written according to business requirements when the robot is customized, and its strategy is to expand the corpus according to the regular rules, with wildcards expanded by part of speech according to their positions.
7. The method of claim 6, wherein, after the automatically associating the first language model and/or the second language model with the corresponding scene and/or dialogue node, the method further comprises:
displaying, in a configuration interface, the automatic association of the first language model and/or the second language model with the corresponding scene and/or dialogue node.
8. The method of claim 7, wherein the configuration interface comprises: a natural language generation configuration interface oriented to the dialogue node;
providing a speech script text box of the dialogue node and a three-way model training button in the natural language generation configuration interface;
generating a three-way model automatic training configuration box in response to a developer clicking the three-way model training button, and providing, in the three-way model automatic training configuration box, a corpus file generated based on the desired intention list and a text box for supplementing the corpus;
and in response to a developer clicking the submit-training button in the three-way model automatic training configuration box, performing three-way model training, jumping back to the natural language generation configuration interface, and automatically associating the dialogue node with the three-way model after training is completed, the three-way model being used to recognize the next-round speech replying to the speech script, thereby completing the automatic configuration of the dialogue node of the dialogue robot.
9. The method of claim 7, wherein the configuration interface comprises: a scene-oriented scene configuration interface;
providing a two-way model training button for the scene in the scene configuration interface;
generating a two-way model automatic training configuration box in response to a developer clicking the two-way model training button, and providing, in the two-way model automatic training configuration box, a corpus file generated based on the desired intention list and a text box for supplementing the corpus;
and in response to a developer clicking the submit-training button in the two-way model automatic training configuration box, performing two-way model training, jumping back to the scene configuration interface, and automatically associating the scene with the two-way model after training is completed, the two-way model being used to recognize dialogue speech in the scene, thereby completing the automatic scene configuration of the dialogue robot.
10. A language model automatic training system for a robotic customization system, comprising:
a self-generated corpus determining program module, configured to send a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and to receive self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list; to preview the self-generated corpus for a developer; and, when corpus incorrectly recognized by speech recognition is not contained in the self-generated corpus, to receive supplementary corpus input by the developer and supplement the self-generated corpus based on the supplementary corpus; wherein the semantic generalization system generates a corpus file according to the desired intention list, and different intention types have different corpus generation strategies;
a self-training program module, configured to input the self-generated corpus into a language model training system and automatically publish the first language model and/or the second language model trained by the language model training system;
wherein the desired intention list includes: business intentions and knowledge base questions, and the types of the business intentions include: built-in intentions and regular intentions; a built-in intention is an algorithm intention built into the system for general semantic recognition, and its strategy is to obtain positive corpus data or to expand the written regular rules at model training time; a regular intention means that keywords or regular rules are written according to business requirements when the robot is customized, and its strategy is to expand the corpus according to the regular rules, with wildcards expanded by part of speech according to their positions.
11. A language model auto-configuration system for a robotic customization system, comprising:
a self-generated corpus determining program module, configured to send a desired intention list of a scene-oriented first language model and/or a dialogue-node-oriented second language model to a semantic generalization system, and to receive self-generated corpus, representing the speaker's intention, generalized by the semantic generalization system based on the intention information in the desired intention list; to preview the self-generated corpus for a developer; and, when corpus incorrectly recognized by speech recognition is not contained in the self-generated corpus, to receive supplementary corpus input by the developer and supplement the self-generated corpus based on the supplementary corpus; wherein the semantic generalization system generates a corpus file according to the desired intention list, and different intention types have different corpus generation strategies;
a self-training program module, configured to input the self-generated corpus into a language model training system and automatically publish the first language model and/or the second language model trained by the language model training system;
a self-association program module, configured to automatically associate the first language model and/or the second language model with the corresponding scene and/or dialogue node, so as to realize automatic configuration of the language model;
wherein the desired intention list includes: business intentions and knowledge base questions, and the types of the business intentions include: built-in intentions and regular intentions; a built-in intention is an algorithm intention built into the system for general semantic recognition, and its strategy is to obtain positive corpus data or to expand the written regular rules at model training time; a regular intention means that keywords or regular rules are written according to business requirements when the robot is customized, and its strategy is to expand the corpus according to the regular rules, with wildcards expanded by part of speech according to their positions.
CN202110757208.8A 2021-07-05 2021-07-05 Language model automatic training method and system Active CN113539245B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110757208.8A CN113539245B (en) 2021-07-05 2021-07-05 Language model automatic training method and system

Publications (2)

Publication Number Publication Date
CN113539245A CN113539245A (en) 2021-10-22
CN113539245B (en) 2024-03-15

Family

ID=78126720

Country Status (1)

Country Link
CN (1) CN113539245B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002268676A (en) * 2001-03-07 2002-09-20 Atr Onsei Gengo Tsushin Kenkyusho:Kk Language model generating device and voice recognition device
CN103165129A (en) * 2011-12-13 2013-06-19 北京百度网讯科技有限公司 Method and system for optimizing voice recognition acoustic model
CN103198828A (en) * 2013-04-03 2013-07-10 中金数据系统有限公司 Method and system of construction of voice corpus
WO2018157700A1 (en) * 2017-03-02 2018-09-07 腾讯科技(深圳)有限公司 Method and device for generating dialogue, and storage medium
CN109949797A (en) * 2019-03-11 2019-06-28 北京百度网讯科技有限公司 A kind of generation method of training corpus, device, equipment and storage medium
CN110349569A (en) * 2019-07-02 2019-10-18 苏州思必驰信息科技有限公司 The training and recognition methods of customized product language model and device
CN111339309A (en) * 2020-05-22 2020-06-26 支付宝(杭州)信息技术有限公司 Corpus expansion method and system for user intention
CN111460117A (en) * 2020-03-20 2020-07-28 平安科技(深圳)有限公司 Dialog robot intention corpus generation method, device, medium and electronic equipment
CN111933116A (en) * 2020-06-22 2020-11-13 厦门快商通科技股份有限公司 Speech recognition model training method, system, mobile terminal and storage medium
CN111933118A (en) * 2020-08-17 2020-11-13 苏州思必驰信息科技有限公司 Method and device for optimizing voice recognition and intelligent voice dialogue system applying same
KR20210016682A (en) * 2019-08-05 2021-02-17 한국전자통신연구원 Apparatus for fixing error of speech recognition result and method thereof

Similar Documents

Publication Publication Date Title
US10679613B2 (en) Spoken language understanding system and method using recurrent neural networks
US20200175890A1 (en) Device, method, and graphical user interface for a group reading environment
CN111226224B (en) Method for translating voice signals and electronic equipment
WO2022078146A1 (en) Speech recognition method and apparatus, device, and storage medium
CN112819664A (en) Apparatus for learning foreign language and method for providing foreign language learning service using the same
US20140315163A1 (en) Device, method, and graphical user interface for a group reading environment
CN111739519A (en) Dialogue management processing method, device, equipment and medium based on voice recognition
CN114830139A (en) Training models using model-provided candidate actions
KR102418558B1 (en) English speaking teaching method using interactive artificial intelligence avatar, device and system therefor
CN116821290A (en) Multitasking dialogue-oriented large language model training method and interaction method
Tomko et al. Towards efficient human machine speech communication: The speech graffiti project
CN111046674A (en) Semantic understanding method and device, electronic equipment and storage medium
CN113539245B (en) Language model automatic training method and system
CN111723559A (en) Real-time information extraction method and device
CN114860910A (en) Intelligent dialogue method and system
KR20190070682A (en) System and method for constructing and providing lecture contents
KR20190070683A (en) Apparatus and method for constructing and providing lecture contents
KR20220140301A (en) Video learning systems for enable learners to be identified through artificial intelligence and method thereof
CN109891410A (en) Data collection for new session conversational system
CN110222161B (en) Intelligent response method and device for conversation robot
CN115408500A (en) Question-answer consistency evaluation method and device, electronic equipment and medium
TWI752437B (en) At least two phoneme-based voice input operation method and computer program product
KR102577643B1 (en) Online one to one korean lecture platform system and operating server included in the same
CN112966077B (en) Method, device and equipment for determining conversation state and storage medium
Patel et al. My Buddy App: Communications between Smart Devices through Voice Assist

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant