Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a semantic resource training method for a speech dialogue platform according to an embodiment of the present invention, which includes the following steps:
S11: in response to a click on the release button of a voice product in the voice dialogue platform, acquiring a first version number of the training script from the historical training information of the voice product, and acquiring a second version number of the training script of the voice dialogue platform;
S12: when the first version number is consistent with the second version number, determining a first semantic resource corresponding to unmodified data and a second semantic resource corresponding to modified data in the voice product;
S13: determining a training category of the second semantic resource according to the modified data, and determining at least one sub-training for the second semantic resource;
S14: determining a third semantic resource through the at least one sub-training, and combining the first semantic resource with the third semantic resource to complete the training of the semantic resource.
In this embodiment, the voice dialogue platform is an open intelligent dialogue platform for developers. To make the overall dialogue content easier for developers to understand and more user-friendly, to lower the barrier to entry and to speed up development, the platform builds several layers of high-level abstraction on top of base technologies such as speech recognition and semantic recognition, abstracting them into levels such as products, skills, tasks, intents and utterances. Semantic training on the voice dialogue platform is performed at the skill level: a skill comprises tasks, each task comprises intents, the intents involve semantic entities, and lexicons are added to a voice skill to widen recognition coverage.
Semantic training and recognition on the voice dialogue platform are performed at the skill level. A skill is similar to an app: it performs one or more specific functions through voice dialogue. Just as WeChat supports multiple functions such as messages, circle of friends (Moments) and payments, a task is one function of a skill, for example address query, navigation, or searching for nearby places or locations. A task is an important component of a skill and consists of a single dialogue turn, or a set of dialogue turns, that completes a certain function. Each turn of the user's dialogue can be regarded as an intent, and a task is composed of one or more intents.
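The hierarchy described above can be pictured with a small data model. This is only an illustrative sketch: the class names, field names and sample values are hypothetical, and attaching lexicons to a skill is an assumption based on the statement that lexicons are added to a voice skill.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Intent:
    """One dialogue turn of the user, with example utterances (hypothetical model)."""
    name: str
    utterances: List[str] = field(default_factory=list)


@dataclass
class Task:
    """One function of a skill, composed of one or more intents."""
    name: str
    intents: List[Intent] = field(default_factory=list)


@dataclass
class Skill:
    """App-like unit at which semantic training is performed."""
    name: str
    tasks: List[Task] = field(default_factory=list)
    lexicons: List[str] = field(default_factory=list)  # assumption: lexicons attach to a skill


@dataclass
class VoiceProduct:
    """Top-level product that a developer releases."""
    name: str
    skills: List[Skill] = field(default_factory=list)


def count_intents(product: VoiceProduct) -> int:
    """Count every intent across all skills and tasks of a product."""
    return sum(len(task.intents) for skill in product.skills for task in skill.tasks)


navigation = Skill(
    name="navigation",
    tasks=[
        Task("address_query", [Intent("ask_address", ["where is the station"])]),
        Task("search_nearby", [Intent("find_nearby", ["find a cafe nearby"])]),
    ],
    lexicons=["place_names"],
)
product = VoiceProduct(name="demo_product", skills=[navigation])
```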
The function of semantic recognition is to determine whether what the user says hits a skill and, if so, which task and which intent are hit. Once the specific intent has been recognized, subsequent data processing and result display can proceed.
The developer develops a voice product in the voice dialogue platform and, after developing the voice skills and/or lexicons in the voice product, performs a release test.
For step S11, in response to a click on the release button of the voice product, the voice dialogue platform queries the historical training information of the voice product to obtain the version number of the training script used for the most recent training of that product (the first version number); for example, the historical training information records that the voice product was last trained with version 2.0 of the training script. The platform also acquires the second version number, that is, the version number of the training script of the voice dialogue platform itself.
For step S12, suppose for example that the version number of the training script of the voice dialogue platform is also 2.0. The first version number is then consistent with the second version number, and the first semantic resource corresponding to unmodified data and the second semantic resource corresponding to modified data in the voice product are determined. For example, the historical version of the voice product can be retrieved from the voice dialogue platform and compared with the version being released, so as to determine which data have been modified and which have not, and thus to determine the first semantic resource corresponding to the unmodified data and the second semantic resource corresponding to the modified data.
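The comparison in step S12 could be sketched as follows. The embodiment does not specify how the historical version is compared with the version being released; comparing content fingerprints is an assumption made here for illustration, and the function names and the `kind:name` key format are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, Set, Tuple


def fingerprint(item: Any) -> str:
    """Stable hash of a JSON-serializable item (assumed comparison method)."""
    payload = json.dumps(item, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def split_by_modification(
    previous: Dict[str, Any], current: Dict[str, Any]
) -> Tuple[Set[str], Set[str]]:
    """Return (unmodified keys, modified keys) of the data being released.

    New items count as modified. Items deleted since the last release are
    not handled in this sketch.
    """
    unmodified: Set[str] = set()
    modified: Set[str] = set()
    for key, item in current.items():
        if key in previous and fingerprint(previous[key]) == fingerprint(item):
            unmodified.add(key)
        else:
            modified.add(key)
    return unmodified, modified


previous_release = {
    "skill:weather": {"intents": ["ask_temperature"]},
    "lexicon:cities": ["Beijing"],
}
current_release = {
    "skill:weather": {"intents": ["ask_temperature", "ask_weather"]},
    "lexicon:cities": ["Beijing"],
    "skill:music": {"intents": ["play_song"]},
}
unmodified_keys, modified_keys = split_by_modification(previous_release, current_release)
```

In this sketch the unmodified keys identify the data behind the first semantic resource, and the modified keys identify the data behind the second semantic resource.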
For step S13, the training category of the second semantic resource is determined according to the modified data, and at least one sub-training is determined for the second semantic resource. The modifications may include, for example, modifications to the voice skills in the voice product (adding and/or modifying voice skills, or modifying intents and/or utterances within voice skills) and modifications to the lexicons in the voice product. The training category of the second semantic resource is determined from these modified data, and the at least one sub-training is determined accordingly.
For step S14, the trained third semantic resource is determined through the at least one sub-training determined in step S13, and the first semantic resource is combined with the third semantic resource. Specifically, the results of the sub-trainings and the semantic resources left unchanged since the last training are integrated in the format required by the semantic parsing service; training information is generated from the training script, training time and training path used for the training; and the final semantic resource is generated, completing the training of the semantic resource in the voice dialogue platform.
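The combination in step S14 might look like the sketch below. The format required by the semantic parsing service is not given in this description, so the dictionary layout, the function name and all field names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def merge_semantic_resources(
    first: Dict[str, Any],
    third: Dict[str, Any],
    script_version: str,
    training_path: str,
) -> Dict[str, Any]:
    """Combine unchanged resources with freshly trained ones and attach training info."""
    merged = dict(first)   # resources kept from the last training
    merged.update(third)   # results of the sub-trainings replace or extend them
    return {
        "resources": merged,
        "training_info": {
            "script_version": script_version,
            "training_time": datetime.now(timezone.utc).isoformat(),
            "training_path": training_path,
        },
    }


final_resource = merge_semantic_resources(
    first={"lexicon:cities": "cities-model-v1"},
    third={"skill:weather": "weather-model-v2"},
    script_version="2.0",
    training_path="/tmp/training/demo_product",  # hypothetical path
)
```

The recorded script version is what a later release would read back as the first version number in step S11.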
In this embodiment, by responding to the click on the release button of the voice product in the voice dialogue platform, the semantic resources can be updated in real time after the developer modifies the skills and/or lexicons in the product, which facilitates debugging and improves the developer's experience. Because the modified data are identified by comparison, only a small portion of the resources needs to be trained, which saves server resources, allows semantic resource training to be completed efficiently even when server resources are scarce, and further improves the developer's experience.
As an implementation, in this embodiment, the training categories of the semantic resource include: task classification, intent recognition and lexicon recognition;
the modified data includes: voice skills, intents and/or utterances, and lexicons;
when the modified data includes at least voice skills, the sub-training of the second semantic resource includes at least task classification training;
when the modified data includes at least intents and/or utterances, the sub-training of the second semantic resource includes at least task classification training and intent recognition training;
when the modified data includes at least a lexicon, the sub-training of the second semantic resource includes at least lexicon recognition training.
In this embodiment, for example, the developer adds a weather query intent to the weather skill among the voice skills of a voice product, and adds some obscure place names to a lexicon. This expands the functionality of the voice product and widens the range of regions in which it is applicable. The modified data therefore includes both an intent and a lexicon.
When the modified data includes an intent and a lexicon, the sub-training of the second semantic resource includes at least a task classification training, an intent recognition training, and a lexicon recognition training.
In this embodiment, the modified data are split into tasks by category; tasks of the same category are trained together and each category receives its corresponding training, which improves the training effect.
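The rules above, which map the kinds of modified data to sub-trainings, can be written as a lookup table. This is a minimal sketch; the string labels and the function name are hypothetical, and only the mapping itself comes from the embodiment.

```python
from typing import Dict, Iterable, Set

# Kind of modified data -> sub-trainings it requires (labels are hypothetical).
SUB_TRAININGS_BY_MODIFIED_KIND: Dict[str, Set[str]] = {
    "skill": {"task_classification"},
    "intent": {"task_classification", "intent_recognition"},
    "utterance": {"task_classification", "intent_recognition"},
    "lexicon": {"lexicon_recognition"},
}


def select_sub_trainings(modified_kinds: Iterable[str]) -> Set[str]:
    """Return the union of sub-trainings required by every kind of modified data."""
    selected: Set[str] = set()
    for kind in modified_kinds:
        if kind not in SUB_TRAININGS_BY_MODIFIED_KIND:
            raise ValueError(f"unknown kind of modified data: {kind}")
        selected |= SUB_TRAININGS_BY_MODIFIED_KIND[kind]
    return selected
```

For the example in which an intent and a lexicon are modified, `select_sub_trainings({"intent", "lexicon"})` yields task classification, intent recognition and lexicon recognition training.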
As an implementation manner, in this embodiment, when the first version number is not consistent with the second version number, a training category of a semantic resource of the voice product is determined according to all data in the voice product, and at least one sub-training is determined for the semantic resource to train the semantic resource.
In this embodiment, when the training script versions are found to be inconsistent, all skills are retrained in the voice dialogue platform. Semantic resources trained with either version 1.0 or version 2.0 of the training script are usable, and the voice dialogue platform calls the kernel version that corresponds to the training script version. The version check ensures that all resources of one voice product are trained with the same version of the training script; if they are not, the voice dialogue platform cannot determine which kernel version to call for that voice product.
In this embodiment, by comparing the version numbers of the training scripts, the voice dialogue platform can determine unambiguously which kernel to call, which solves the kernel calling problem caused by inconsistent training script versions among the semantic resources of the same voice product.
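The version check that chooses between incremental and full training can be sketched as below. The function and parameter names are hypothetical, and treating a missing training history as a mismatch is an assumption that the embodiment does not address.

```python
from typing import Optional, Set


def plan_training_scope(
    first_version: Optional[str],
    second_version: str,
    all_items: Set[str],
    modified_items: Set[str],
) -> Set[str]:
    """Return the items to train for this release.

    first_version:  script version recorded in the product's training history
                    (None when no history exists; assumed to force full training).
    second_version: script version of the platform's training script.
    """
    if first_version != second_version:
        # Inconsistent versions: retrain everything so that one product is
        # served by resources from a single script version.
        return set(all_items)
    # Consistent versions: only the modified data needs retraining.
    return set(modified_items)
```

For example, with `all_items={"skill:weather", "lexicon:cities"}` and `modified_items={"skill:weather"}`, versions `"2.0"` and `"2.0"` select only the weather skill, while `"1.0"` and `"2.0"` select both items.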
As an implementation, in this embodiment, the method further includes: training the at least one sub-training in parallel through a coroutine pool.
Generally, thread pools and process pools are best suited to processing identical or similar tasks. If the inputs and outputs of the tasks differ greatly, a thread pool or process pool has to use callback functions to continue the subsequent work when task results are returned, which increases development difficulty and easily introduces bugs. Because there are many types of sub-tasks, creating one thread pool per type of task would make the parallel effect much worse, while placing multiple types of tasks in a single thread pool would increase development difficulty. Therefore, coroutines are used to implement a coroutine pool, and a training task can be added to the coroutine pool simply by adding a decorator where the task is used. This reduces the amount of code while preserving the parallel effect.
This implementation further accelerates training through parallel training, reduces the developer's waiting time and improves the developer's experience.
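A decorator-based coroutine pool of the kind described above could be sketched with Python's `asyncio`. This is not the platform's implementation: the `CoroutinePool` class, its `task` decorator and the two placeholder training coroutines are hypothetical, and `asyncio.sleep` stands in for real training work.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Tuple


class CoroutinePool:
    """Hypothetical coroutine pool that bounds how many sub-trainings run at once."""

    def __init__(self, max_concurrency: int = 4) -> None:
        self._max_concurrency = max_concurrency
        self._pending: List[Tuple[str, Callable[[], Awaitable[Any]]]] = []

    def task(self, func: Callable[..., Awaitable[Any]]) -> Callable[..., None]:
        """Decorator: calling the decorated function queues it instead of running it."""
        def submit(*args: Any, **kwargs: Any) -> None:
            self._pending.append((func.__name__, lambda: func(*args, **kwargs)))
        return submit

    async def _run(self) -> List[Tuple[str, Any]]:
        # Create the semaphore inside the running event loop.
        semaphore = asyncio.Semaphore(self._max_concurrency)

        async def guarded(factory: Callable[[], Awaitable[Any]]) -> Any:
            async with semaphore:
                return await factory()

        names = [name for name, _ in self._pending]
        results = await asyncio.gather(*(guarded(f) for _, f in self._pending))
        self._pending.clear()
        return list(zip(names, results))  # same order as submission

    def run_all(self) -> List[Tuple[str, Any]]:
        """Run every queued task concurrently and return (task name, result) pairs."""
        return asyncio.run(self._run())


pool = CoroutinePool(max_concurrency=2)


@pool.task
async def train_task_classification(skill: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for real training work
    return f"task-classifier:{skill}"


@pool.task
async def train_lexicon_recognition(lexicon: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for real training work
    return f"lexicon-model:{lexicon}"


def train_in_parallel() -> List[Tuple[str, Any]]:
    """Queue two different kinds of sub-training and run them together."""
    train_task_classification("weather")
    train_lexicon_recognition("place_names")
    return pool.run_all()
```

Sub-trainings with different inputs and outputs share one pool without callbacks, and each is registered with a single decorator line. Coroutines overlap only while a task is awaiting, so this sketch assumes each sub-training waits on an external process or service rather than computing in the event loop.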
Fig. 2 is a schematic structural diagram of a semantic resource training system for a voice dialog platform according to an embodiment of the present invention, where the technical solution of this embodiment is applicable to a semantic resource training method for a voice dialog platform of a device, and the system can execute the semantic resource training method for the voice dialog platform according to any of the above embodiments and is configured in a terminal.
The semantic resource training system for the voice dialogue platform provided by the embodiment comprises: version number determining program module 11, semantic resource determining program module 12, training category determining program module 13 and semantic resource training program module 14.
The version number determining program module 11 is configured to, in response to a click of a release button of a voice product in a voice dialog platform, obtain a first version number of a training script in historical training information of the voice product, and obtain a second version number of the training script of the voice dialog platform; the semantic resource determining program module 12 is configured to determine, when the first version number is consistent with the second version number, a first semantic resource corresponding to unmodified data and a second semantic resource corresponding to modified data in the voice product; the training category determining program module 13 is configured to determine a training category of the second semantic resource according to the modified data, and determine at least one sub-training for the second semantic resource; the semantic resource training program module 14 is configured to determine a third semantic resource through the at least one sub-training, and combine the first semantic resource with the third semantic resource to complete training of the semantic resource.
Further, the training categories of the semantic resource include: task classification, intent recognition and lexicon recognition;
the modified data includes: voice skills, intents and/or utterances, and lexicons;
when the modified data includes at least voice skills, the sub-training of the second semantic resource includes at least task classification training;
when the modified data includes at least intents and/or utterances, the sub-training of the second semantic resource includes at least task classification training and intent recognition training;
when the modified data includes at least a lexicon, the sub-training of the second semantic resource includes at least lexicon recognition training.
Further, the system is also configured to: and when the first version number is not consistent with the second version number, determining the training type of the semantic resources of the voice product according to all data in the voice product, and determining at least one sub-training for the semantic resources to train the semantic resources.
Further, the system is also configured to: train the at least one sub-training in parallel through a coroutine pool.
The embodiment of the invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions which can execute the semantic resource training method for the voice conversation platform in any method embodiment;
as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
responding to clicking of a word stock button created in a word stock editing page, and generating a word stock editing interface, wherein the word stock editing interface at least comprises a word stock adding button;
responding to clicking of a release button of a voice product in a voice conversation platform, acquiring a first version number of a training script from historical training information of the voice product, and acquiring a second version number of the training script of the voice conversation platform;
when the first version number is consistent with the second version number, determining a first semantic resource corresponding to unmodified data and a second semantic resource corresponding to modified data in the voice product;
determining a training category of the second semantic resource according to the modified data, and determining at least one sub-training for the second semantic resource;
and determining a third semantic resource through the at least one sub-training, and combining the first semantic resource with the third semantic resource to complete the training of the semantic resource.
The non-volatile computer-readable storage medium may be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the method in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the semantic resource training method for a voice dialogue platform in any of the method embodiments described above.
The non-volatile computer-readable storage medium may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, which includes at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the steps of the semantic resource training method for a voice dialogue platform of any of the embodiments of the present invention.
The client of the embodiment of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: such devices are characterized by mobile communication capabilities and are primarily intended to provide voice and data communications. Such terminals include smart phones (e.g., iPhone), multimedia phones, feature phones, and low-end phones, among others.
(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) Portable entertainment devices: such devices can display and play multimedia content. Such devices include audio and video players (e.g., iPod), handheld game consoles, e-book readers, as well as smart toys and portable car navigation devices.
(4) Other electronic devices with data processing capabilities.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.