CN114357125A

CN114357125A - Natural language identification method, device and equipment in task type dialogue system

Info

Publication number: CN114357125A
Application number: CN202011086835.5A
Authority: CN
Inventors: 李淼; 熊昊奇; 曹云波
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-10-12
Filing date: 2020-10-12
Publication date: 2022-04-15

Abstract

The application provides a natural language identification method, a natural language identification device and natural language identification equipment in a task-based dialog system, and relates to the technical field of computers, in particular to the field of artificial intelligence. The method comprises the following steps: determining a target skill matched with the semantics of the natural language to be recognized according to the matching condition between the semantic information of the natural language to be recognized and the skill description information of each conversational task in the task-based conversational system; determining a target intention according to matching conditions between semantic information of a natural language to be recognized and intention description information of each alternative intention corresponding to the target skill; extracting slot position information of each alternative slot position from the natural language to be identified according to slot position description information of each alternative slot position corresponding to the target intention and slot position information extraction conditions; and obtaining a conversation type task recognition result according to the skill description information of the target skill, the intention description information of the target intention, the slot position description information of each alternative slot position and the slot position information thereof.

Description

Natural language identification method, device and equipment in task type dialogue system

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, and a device for natural language identification in a task-based dialog system.

Background

An open dialogue platform comprises a plurality of task-based dialogue systems with different functions. A task-based dialogue system is a dialogue system that helps users complete specific tasks, and each task-based dialogue system corresponds to a related skill, for example, a skill that helps users query and book airline tickets in the air ticket field. The task-based dialogue system correctly understands the various possible user inputs related to its skill through a natural language understanding model. In the related art, the open dialogue platform allows developers to develop new skills independently. The number of skills supported by the open dialogue platform greatly affects the user experience; therefore, enabling developers to develop new skills more quickly and easily has become a core capability of the open dialogue platform.

When developing a new skill, in order to obtain a robust natural language understanding model, a developer needs to write a large amount of corpus data related to the skill and label it to provide sample data for the natural language understanding model. The developer usually needs to write and label thousands of corpus entries to train a sufficiently good natural language understanding model and thereby achieve recognition of the conversational task. This requires huge labor and time costs and reduces development efficiency. Therefore, how to improve the labeling efficiency of natural language and the recognition efficiency of conversational tasks is a problem to be considered.

Disclosure of Invention

The embodiments of the present application provide a natural language identification method, apparatus and device in a task-based dialog system, which are used to improve the labeling efficiency of natural language and the development efficiency of the task-based dialog system.

In a first aspect of the present application, a natural language identification method in a task-based dialog system is provided, including:

determining a target skill matched with the semantics of the natural language to be recognized from each conversational task according to a matching condition between the semantic information of the natural language to be recognized and the skill description information of each conversational task in the task-based conversational system;

determining a target intention matched with the semantics of the natural language to be recognized according to matching conditions between the semantic information of the natural language to be recognized and intention description information of each alternative intention corresponding to the target skill, wherein each intention description information is used for describing a subtask included in the conversational task;

according to the slot position description information of each alternative slot position corresponding to the target intention and the slot position information extraction condition, extracting the slot position information of each alternative slot position from the natural language to be identified, wherein the slot position information of each alternative slot position is used for limiting each key information of the subtask corresponding to the intention;

and obtaining a conversational task recognition result of the natural language to be recognized according to the skill description information of the target skill, the intention description information of the target intention, the slot description information of each alternative slot and the slot information thereof.
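The four steps of the first aspect can be pictured with a minimal Python sketch. Everything named here is hypothetical: the `Skill`, `Intent` and `Slot` records, the `recognize` function, and the word-overlap score, which merely stands in for whatever semantic matching the method actually uses. The slot "extraction condition" is assumed, for illustration only, to be a list of allowed values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (a crude stand-in for semantic matching)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


@dataclass
class Slot:
    description: str       # slot description information
    candidates: List[str]  # assumed extraction condition: allowed surface values


@dataclass
class Intent:
    description: str       # intention description information (one subtask)
    slots: List[Slot]


@dataclass
class Skill:
    description: str       # skill description information
    intents: List[Intent]


def recognize(query: str, skills: List[Skill]) -> dict:
    # Step 1: target skill = skill whose description best matches the query.
    skill = max(skills, key=lambda s: overlap(query, s.description))
    # Step 2: target intention, chosen only among the target skill's intentions.
    intent = max(skill.intents, key=lambda i: overlap(query, i.description))
    # Step 3: slot information for every alternative slot of the target intention.
    slots: Dict[str, Optional[str]] = {}
    for slot in intent.slots:
        slots[slot.description] = next(
            (c for c in slot.candidates if c.lower() in query.lower()), None
        )
    # Step 4: assemble the recognition result from the three descriptions and slot values.
    return {"skill": skill.description, "intent": intent.description, "slots": slots}
```

Because each step matches against description text rather than a fixed label set, a skill added later participates in `recognize` as soon as its descriptions are defined.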

In one possible implementation manner, the method further includes:

labeling natural languages identified as untrained conversational tasks;

obtaining sample data of an untrained conversational task;

performing update training on the skill classification model, the intention recognition model and the slot position extraction model based on the obtained sample data; and

updating the untrained conversational task to a trained conversational task.
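As a rough illustration of this bookkeeping, the sketch below tracks which conversational tasks are trained, collects labeled samples, and promotes an untrained task once update training has run on its samples. The `TaskRegistry` class and the `retrain` callback are hypothetical names introduced for illustration; the actual update training of the three models is outside the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class TaskRegistry:
    trained: Set[str] = field(default_factory=set)
    untrained: Set[str] = field(default_factory=set)
    samples: Dict[str, List[Tuple[str, dict]]] = field(default_factory=dict)

    def add_labeled(self, task: str, utterance: str, label: dict) -> None:
        """Store one automatically labeled utterance as sample data for a task."""
        self.samples.setdefault(task, []).append((utterance, label))

    def update_training(self, retrain: Callable[[dict], None]) -> List[str]:
        """Retrain on samples of untrained tasks, then mark those tasks as trained."""
        promoted = sorted(t for t in self.untrained if self.samples.get(t))
        if promoted:
            # retrain stands in for updating the skill, intention and slot models.
            retrain({t: self.samples[t] for t in promoted})
            self.untrained -= set(promoted)
            self.trained |= set(promoted)
        return promoted
```

Tasks with no labeled samples stay in the untrained set, so they continue to be matched only through their description information.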

In a second aspect of the present application, a method for natural language labeling in a task-based dialog system is provided, including:

extracting slot position information of each alternative slot position from the natural language to be identified according to slot position description information and slot position information extraction conditions of each alternative slot position corresponding to the target intention, wherein the slot position information of each alternative slot position is used for limiting each key information of the subtask corresponding to the intention;

and marking the natural language to be identified according to the skill description information of the target skill, the intention description information of the target intention, and the slot description information and the slot information of each alternative slot to obtain sample data of the conversational task.

In a third aspect of the present application, there is provided a natural language recognition apparatus in a task-based dialog system, comprising:

a first skill determination unit, configured to determine, from each dialogue-type task, a target skill that matches semantics of a natural language to be recognized, according to a matching condition between semantic information of the natural language to be recognized and skill description information of each dialogue-type task in the task-type dialogue system;

a first intention determining unit, configured to determine a target intention that matches the semantics of the natural language to be recognized according to a matching condition between the semantic information of the natural language to be recognized and intention description information of each alternative intention corresponding to the target skill, each intention description information being used for describing one subtask included in the conversational task;

a first slot position determining unit, configured to extract slot position information of each candidate slot position from the natural language to be identified according to slot position description information and a slot position information extraction condition of each candidate slot position corresponding to the target intent, where the slot position information of each candidate slot position is used to define each key information of a subtask corresponding to the intent;

and the recognition unit is used for obtaining the conversation type task recognition result of the natural language to be recognized according to the skill description information of the target skill, the intention description information of the target intention, the slot position description information of each alternative slot position and the slot position information thereof.

In one possible implementation, the target skill is obtained based on a trained skill classification model, the target intent is obtained based on a trained intent recognition model, and the slot position information is obtained based on a trained slot position extraction model, wherein:

the conversational tasks comprise trained conversational tasks and untrained conversational tasks, and the skill classification model, the intention recognition model and the slot position extraction model are obtained by adopting sample data training of each trained conversational task;

the untrained conversational task comprises: skill description information, intention description information of various alternative intentions, and alternative slot description information of each alternative intention.

In one possible implementation manner, the skill classification model includes a first vector representation module and a first classification module. The skill classification corpus samples for training the skill classification model include at least one sample corresponding to each trained conversational task, and each skill classification corpus sample is labeled with a similarity probability value for the conversational task to which the sample belongs. The trained skill classification model is trained in the following manner:

respectively obtaining a first target vector representation of skill description information of each trained task based on a first vector representation module;

for each skill classification corpus sample, obtaining a first reference vector representation of the skill classification corpus sample based on a first vector representation module;

respectively obtaining first similarity between the first reference vector representations and the first target vector representations on the basis of the first classification module;

and adjusting parameters of the first vector representation module and the first classification module based on the obtained first similarities until the first similarities meet the set condition.

In a possible implementation, the skill determination unit is specifically configured to:

obtaining a first reference vector representation of skill description information for each trained conversational task and untrained conversational task, respectively, based on a first vector representation module;

obtaining a first to-be-identified vector representation of the to-be-identified natural corpus based on the first vector representation module;

and obtaining, based on the first classification module, a second similarity between each first reference vector representation and the first to-be-identified vector representation, and obtaining a classification result based on the second similarities.
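The inference steps above can be sketched as follows. This is only an illustration under stated assumptions: `embed` is a bag-of-words stand-in for the first vector representation module, `cosine` stands in for the first classification module, and `classify_skill` is a hypothetical name. The point it shows is that an untrained task takes part in classification through its skill description alone.

```python
import math
from collections import Counter
from typing import Dict, Tuple


def embed(text: str) -> Counter:
    """Stand-in vector representation: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify_skill(query: str, descriptions: Dict[str, str]) -> Tuple[str, Dict[str, float]]:
    """Score the query against every skill description, trained or untrained."""
    q = embed(query)  # vector of the natural language to be recognized
    scores = {name: cosine(q, embed(desc)) for name, desc in descriptions.items()}
    return max(scores, key=scores.get), scores
```

For example, calling `classify_skill("book a room in a hotel", {"weather": "query the weather forecast for a city", "hotel": "book a hotel room"})` selects `"hotel"` even if no hotel samples were ever used for training.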

In one possible implementation, the intention recognition model includes a second vector representation module and a second classification module. The intention recognition corpus samples for training the intention recognition model include at least one sample corresponding to each of the trained conversational tasks, and each intention recognition corpus sample is labeled with a similarity probability value for its alternative intention. The trained intention recognition model is trained in the following manner:

respectively obtaining second target vector representation of intention description information corresponding to each subtask in each trained task based on a second vector representation module;

for each intention recognition corpus sample, obtaining a second reference vector representation of the intention recognition corpus sample based on a second vector representation module;

respectively obtaining third similarity between the second reference vector representations and the second target vector representations on the basis of a second classification module;

and adjusting the parameters of the second vector representation module and the second classification module based on the obtained third similarities until the third similarities meet the set conditions.

In a possible implementation, the intent determination unit is specifically configured to:

respectively obtaining a second reference vector representation of the intention description information of each subtask in the trained conversational task and each subtask in the untrained conversational task based on a second vector representation module;

obtaining a second vector representation to be recognized of the natural corpus to be recognized based on the second vector representation module;

and obtaining, based on the second classification module, a fourth similarity between each second reference vector representation and the second vector representation to be recognized, and obtaining a classification result based on the fourth similarities.

In one possible implementation manner, the slot extraction model includes a third vector representation module and a slot extraction module, the slot extraction corpus sample for training the slot extraction model includes at least one sample corresponding to each trained conversational task, each slot extraction corpus sample is labeled with reference slot position information of an affiliated candidate slot position, and the trained slot extraction model is trained in the following manner:

for each slot extraction corpus sample, obtaining a third reference vector representation of the slot extraction corpus sample based on the third vector representation module;

obtaining, based on the slot position extraction module, first target slot position information in the third reference vector representation that corresponds to the slot position description information of each alternative slot position;

and adjusting parameters of the third vector representation module and the slot position extraction module based on the obtained first target slot position information until the obtained first target slot position information meets the set condition.

In a possible implementation manner, the slot determining unit is specifically configured to:

obtaining a third vector representation to be recognized of the natural corpus to be recognized based on the third vector representation module;

and obtaining, based on the slot position extraction module, second target slot position information from the third vector representation to be recognized, and obtaining a classification result based on the second target slot position information.
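Slot extraction of this kind returns, for each alternative slot, a span of the input text. The sketch below illustrates only that input/output shape. It replaces the trained slot position extraction model with hand-written regular expressions, which is an assumption made for illustration and not the described method; `extract_slots` and the patterns are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

Span = Tuple[str, int, int]  # (slot value, start offset, end offset)


def extract_slots(utterance: str, slot_specs: Dict[str, str]) -> Dict[str, Optional[Span]]:
    """For each slot description, return the matching span of the utterance or None.

    slot_specs maps a slot description to a regex with one capture group;
    the regex is a stand-in for a learned extraction model.
    """
    result: Dict[str, Optional[Span]] = {}
    for description, pattern in slot_specs.items():
        match = re.search(pattern, utterance)
        result[description] = (
            (match.group(1), match.start(1), match.end(1)) if match else None
        )
    return result
```

A slot with no matching span is reported as `None`, which a dialog manager could treat as information still to be collected from the user.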

In a possible implementation manner, the system further includes a first labeling unit, specifically configured to:

labeling natural languages identified as untrained conversational tasks;

obtaining sample data of an untrained conversational task;

and updating the untrained conversational task to a trained conversational task.

In a fourth aspect of the present application, there is provided a natural language labeling apparatus in a task-based dialog system, including:

a second skill determination unit, configured to determine, from each dialogue-type task, a target skill that matches the semantic of the natural language to be recognized, according to a matching condition between the semantic information of the natural language to be recognized and the skill description information of each dialogue-type task in the task-type dialogue system;

a second intention determining unit, configured to determine a target intention that matches the semantics of the natural language to be recognized according to a matching condition between the semantic information of the natural language to be recognized and intention description information of each alternative intention corresponding to the target skill, each intention description information being used for describing one subtask included in the conversational task;

a second slot position determining unit, configured to extract slot position information of each candidate slot position from the natural language to be identified according to slot position description information and slot position information extraction conditions of each candidate slot position corresponding to the target intent, where the slot position information of each candidate slot position is used to define each key information of a subtask corresponding to the intent;

and the second labeling unit is used for labeling the natural language to be identified according to the skill description information of the target skill, the intention description information of the target intention, and the slot position description information and the slot position information of each alternative slot position to obtain sample data of the conversational task.

The conversational tasks comprise trained conversational tasks and untrained conversational tasks, and the skill classification model, the intention recognition model and the slot position extraction model are obtained by training sample data of each trained conversational task.

In a possible implementation manner, the system further includes a training unit, specifically configured to:

and further performing updating training on the skill classification model, the intention recognition model and the slot position extraction model according to the obtained sample data, wherein when the obtained sample data comprises the natural language recognized as the untrained conversational task, the untrained conversational task is updated into the trained conversational task after the updating training.

In a fifth aspect of the present application, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of the first aspect when executing the program.

A sixth aspect of the present application provides a computer readable storage medium having stored thereon computer instructions which, when run on a computer, cause the computer to perform the method of the first aspect.

Due to the adoption of the technical scheme, the embodiment of the application has at least the following technical effects:

The skill to which the natural language to be recognized belongs can be determined, among the plurality of skills corresponding to the task-based dialog system, by directly matching the semantic information of the natural language to be recognized with the skill description information of the task-based dialog system. The intention to which the natural language belongs can be determined by directly matching its semantic information with the intention description information corresponding to the skill, and the slot information in the natural language can be determined from the slot description information and the slot extraction conditions of the alternative slots, so that the natural language to be recognized is recognized. When a new skill is developed, natural language matching the new skill can be acquired from a natural language library according to the skill description information, the intention description information and the slot description information of the new skill, so that labeling of the natural language is completed automatically and sample data of the new skill is obtained. A large amount of sample data therefore does not need to be written and labeled manually: the task-based dialog system can obtain sample data of a new skill once the skill description information, the intention description information and the alternative slot description information of the skill to be developed are defined, and the obtained sample data can be used to train the task-based dialog system. This simplifies the process of developing a new skill, saves time and labor costs, and allows a developer to develop new skills more quickly and conveniently.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present application, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.

FIG. 1 is an exemplary diagram of skill-related information, intention-related information, and slot-related information in the air ticket field in an embodiment of the present application;

FIG. 2 is a diagram illustrating an example of labeling a natural language input by a user in an embodiment of the present application;

FIG. 3 is a schematic diagram of an application scenario in which the present application is applied;

FIG. 4 is a flowchart illustrating a natural language identification method in a task-based dialog system according to an embodiment of the present application;

FIG. 5 is a diagram illustrating a plurality of natural language understanding models according to an embodiment of the present application;

FIG. 6 is a diagram illustrating an example of a generic natural language understanding model according to an embodiment of the present application;

FIG. 7 is a diagram illustrating a structure of a generic natural language understanding model according to an embodiment of the present application;

FIG. 8 is a diagram illustrating an example of a skill classification model according to an embodiment of the present application;

FIG. 9 is a diagram illustrating an example of a structure of a BERT-based classification model according to an embodiment of the present application;

FIG. 10 is a diagram illustrating an example of a structure of an intent recognition model in an embodiment of the present application;

FIG. 11 is a diagram illustrating an exemplary structure of a slot extraction model according to an embodiment of the present application;

FIG. 12 is a diagram illustrating an example of a structure of a BERT-based QA model according to an embodiment of the present application;

FIG. 13 is a sample data example diagram of skill description information and a partial sample of a weather conversational task, provided by an embodiment of the application;

FIG. 14 is a sample data example diagram of skill description information and a partial sample of a flight conversational task according to an embodiment of the present application;

FIG. 15 is a diagram illustrating skill description information and examples of natural languages to be recognized for a hotel conversation-type task, according to an embodiment of the present disclosure;

FIG. 16 is an exemplary diagram of a natural language identification result of a hotel dialogue-type task according to an embodiment of the present application;

FIG. 17 is a flowchart illustrating a natural language labeling method in a task based dialog system according to an embodiment of the present application;

FIG. 18 is a schematic structural diagram of a natural language recognition apparatus in a task-based dialog system according to an embodiment of the present application;

FIG. 19 is a schematic structural diagram of a natural language labeling apparatus in a task-based dialog system according to an embodiment of the present application;

FIG. 20 is a schematic structural diagram of a computing device in an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by one of ordinary skill in the art from the embodiments given herein without making any creative effort, shall fall within the scope of the claimed protection. In the present application, the embodiments and features of the embodiments may be arbitrarily combined with each other without conflict. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.

The terms "first" and "second" in the description and claims of the present application and the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the term "comprises" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may optionally include other steps or elements not listed or inherent to such process, method, article, or apparatus. The term "plurality" in the present application means at least two, for example, two, three or more, and the embodiments of the present application are not limited in this respect.

The embodiments of the present application relate to Artificial Intelligence (AI) and machine learning technologies, and are designed based on Natural Language Processing (NLP) and Machine Learning (ML) in AI.

Artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. The artificial intelligence technology mainly comprises a computer vision technology, a natural language processing technology, machine learning/deep learning and other directions.

With the research and progress of artificial intelligence technology, artificial intelligence is researched and applied in a plurality of fields, such as common smart homes, smart customer service, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, robots, smart medical treatment and the like.

Machine learning is a multidisciplinary field that involves probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in studying how a computer simulates or realizes human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental way to make computers intelligent, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and the like. In the semantic feature extraction process of the video title, the semantic feature extraction model based on machine learning or deep learning is adopted to learn the video title sample with the category label, so that the feature vector of the semantic feature of the input video title can be extracted.

Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field therefore involves natural language, that is, the language people use every day, and is closely related to linguistics. Natural language processing techniques typically include text generation, text processing, semantic understanding, machine translation, question answering robots, knowledge graphs, and the like. According to the embodiment of the application, a semantic understanding technology in a natural language processing technology is adopted to carry out semantic understanding on video titles of various videos, and each video in a video data set is clustered based on the obtained feature vector capable of representing the semantic features of the video titles, so that a plurality of video sets are obtained.

In order to facilitate those skilled in the art to better understand the technical solutions of the present application, the following description refers to the technical terms of the present application.

1) Task-based dialog system: a dialog system that assists a user in completing a particular task by providing information or services under particular conditions. Unlike a chat-type dialogue system, it is generally intended to serve task-oriented scenarios with a specific purpose, such as traffic checking, telephone charge checking, meal ordering, ticket booking, consultation and the like. Because user needs are relatively complex, multiple rounds of interaction are usually required, and the user may continually revise and refine the request during the conversation; the task-based dialog system needs to help the user clarify the goal through inquiry, clarification and confirmation. The core modules of the task-based dialog system mainly include a Natural Language Understanding module, a Dialog Management module, and a Natural Language Generation module.

2) Query: the natural language that a user inputs into the task-based dialog system. For example, if the user wants to book a ticket, a request such as "I want to book a high-speed rail ticket to Beijing for today" may be input into the task-based dialog system.

3) Natural Language Understanding (NLU), domain, intent, and slot:

Natural language understanding: the natural language understanding technique in a task-based dialog system. Natural language input by the user is processed in the natural language understanding module by three sub-modules: domain identification, user intent identification and slot extraction.

Domain: the field in which the user needs to perform a specific task, such as the air ticket domain, the restaurant domain, the music domain, etc. In a task-based dialog system, a domain is typically defined by a set of intents and slots.

Intent: within a particular domain there are usually some fine-grained user intents; for example, the air ticket domain may have an intent to search for an airline ticket, an intent to book an airline ticket, an intent to cancel an airline ticket, etc.

Slot: key information that needs to be collected to complete a task in a specific domain. For example, the air ticket domain has slots such as departure city, destination city and departure date, and each slot needs to be filled with the corresponding key information, such as departure city: Beijing; destination city: Shenzhen; departure date: month, day; etc.

4) Application: an application program, that is, a computer program that can perform one or more services and typically has a visual display interface for interacting with a user. Electronic maps and calendars, for example, may both be referred to as applications. Some applications must be installed on the user's terminal device before use, while others need no installation, for example the applets and webpages within some social applications. An applet can be used without downloading or installing it; the user can open it by scanning or searching.

5) Client: also called the user side, a program that corresponds to a server and provides local services to the user. Except for some applications that run only locally, a client is generally installed on an ordinary user machine and needs to operate together with a server. Since the development of the internet, common clients include web browsers used on the world wide web, email clients for receiving and sending email, and instant messaging client software. For this kind of application, a corresponding server and service program are required in the network to provide the corresponding services, such as database services and e-mail services, so a specific communication connection must be established between the client and the server to ensure the normal operation of the application.

The following explains the concept of the present application.

In the related art, different task-based dialog systems are developed separately, and configuring the natural language understanding capability for a new skill on an open dialog platform generally requires three main steps:

1) Defining skill-related intents and slots: after the intents and slots of a skill are defined, the natural language understanding module corresponding to the skill needs to identify the skill's intent in the natural language input by the user and extract the slot information from it. It should be noted that, in the related art, natural language understanding models and skills are in one-to-one correspondence. Fig. 1 shows the skill-related information, intent-related information, and slot-related information for the airline ticket domain provided by the embodiment of the present application.

2) Writing and labeling skill-related corpora: at present, the natural language understanding module of an open dialogue platform is generally based on machine learning, and training a well-performing natural language understanding model requires a large amount of labeled data for the dialogue-type task to be developed. The developer therefore first writes user utterances that may trigger the skill and then labels the intent and the slots involved in each utterance. Because natural language is diverse and rich, training a robust model usually requires the developer to write and label a large amount of varied natural language data. Fig. 2 is a schematic diagram of labeling a natural language input by a user.

3) Performing model training with the labeled data: after the labeled data is obtained, training of the natural language understanding model can begin. Intent recognition is a classification task, for which classification models such as TextCNN and LSTM (Long Short-Term Memory network) can be adopted; slot extraction is a sequence labeling task, for which a combination of LSTM and CRF (Conditional Random Field) models can be used, without limitation.
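To make the labeling of step 2) and the sequence labeling formulation of step 3) concrete, the following minimal sketch converts one annotated utterance into per-token BIO tags, a common encoding for slot extraction. The utterance, the intent name, the slot names, and the helper `to_bio` are hypothetical illustrations and are not taken from the figures of the application.

```python
from typing import Dict, List, Tuple

def to_bio(tokens: List[str], slots: Dict[str, Tuple[int, int]]) -> List[str]:
    """Convert slot spans (start, end-exclusive token indices) into BIO tags."""
    tags = ["O"] * len(tokens)
    for name, (start, end) in slots.items():
        tags[start] = f"B-{name}"          # first token of the slot value
        for i in range(start + 1, end):
            tags[i] = f"I-{name}"          # remaining tokens of the slot value
    return tags

# Hypothetical annotated sample: one intent label plus slot spans.
sample = {
    "tokens": ["book", "a", "flight", "from", "Beijing", "to", "Shenzhen", "tomorrow"],
    "intent": "book_ticket",
    "slots": {
        "departure_city": (4, 5),
        "destination_city": (6, 7),
        "departure_date": (7, 8),
    },
}
bio_tags = to_bio(sample["tokens"], sample["slots"])
```

Under this encoding the intent label is the target of the classification model, and the tag sequence is the target of the sequence labeling model.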

Of the above three steps, writing and labeling the sample data of each conversational task usually consumes a great deal of labor and time, which hinders the development of new skills.

In order to develop the skills of a task-based dialog system more quickly and conveniently, the embodiment of the application takes the similarity among related dialog-based tasks into account and provides a general task-based dialog system. The sample data of the general task-based dialog system comprises the sample data used to train each dialog-based task separately, so the trained general system can learn the features common to the dialog-based tasks. In use, the semantic information of the natural language to be recognized is matched directly against the description information of each dialog-based task, and the recognition result of the natural language to be recognized is determined according to the similarity. In this way, one system can realize the functions of a plurality of task-based dialog systems. In addition, when a new task-based dialog system needs to be developed, the description information of the new system is matched against natural language to be recognized according to similarity, the matched natural language is taken as candidate corpus for training the new system, and after the candidate corpus is sorted and labeled it is used as sample data to train the new system. Therefore, the general task-based dialog system provided by the embodiment of the application can not only recognize the dialog task for the natural language to be recognized, but also find sample data for training a new task-based dialog system, thereby greatly improving the development efficiency of task-based dialog systems.

Specifically, the semantic information of the natural language to be recognized is matched against the skill description information of the different dialog-type tasks, so that the skill to which the natural language belongs can be determined among the plurality of skills of the task-based dialog system. The semantic information is then matched directly against the intention description information corresponding to that skill to determine the intention of the natural language to be recognized, and the slot information in the natural language is determined through the slot description information and the slot extraction conditions of the alternative slots, whereby the natural language is recognized. In the related art, each natural language understanding model corresponds to one skill; in the present application, the task-based dialog system corresponds to one natural language understanding model, and that model corresponds to a plurality of different skills. Therefore, when a new skill is developed, there is no need to write and label a large amount of sample data; only the skill description information, the intention description information, and the alternative slot description information of the new skill need to be defined in order to add the skill to the task-based dialog system. This simplifies the process of developing a new skill, saves time and labor cost, and lets developers develop new skills more quickly and conveniently.

It should be noted that, after recognizing the skill, intention, and slot information in the natural language input by the user, the task-based dialog system of the present application may also label the natural language with the recognized skill, intention, and slot information and use the labeled natural language as sample data for the corresponding skill. Alternatively, the natural language may be manually checked or labeled on the basis of this pre-labeling by the task-based dialog system, which further reduces the labor cost of labeling data.

In particular, when a new conversational task needs to be developed, natural language recognized as belonging to an untrained conversational task can be labeled to obtain sample data for that task; the skill classification model, the intention recognition model, and the slot extraction model are then update-trained on the obtained sample data, and the untrained conversational task becomes a trained conversational task. The labeling may be done manually after recognition; or automatically, after the target skill, target intention, and slot information of the natural language are determined directly by the skill classification model, the intention recognition model, and the slot extraction model; or automatically and then corrected manually. Update training with the obtained sample data can improve the recognition accuracy of the skill classification model, the intention recognition model, and the slot extraction model.
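As an illustration of adding a skill through descriptions alone, the following minimal sketch registers a new skill that carries only skill, intention, and slot description information and no labeled corpus. The class names, the `registry` dictionary, and the hotel example are assumptions made for illustration; the application does not prescribe a storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SlotSpec:
    name: str
    description: str          # slot description information

@dataclass
class IntentSpec:
    name: str
    description: str          # intention description information
    slots: List[SlotSpec] = field(default_factory=list)

@dataclass
class SkillSpec:
    name: str
    description: str          # skill description information
    intents: List[IntentSpec] = field(default_factory=list)

# Hypothetical in-memory registry of the skills known to the dialog system.
registry: Dict[str, SkillSpec] = {}

def register_skill(skill: SkillSpec) -> None:
    """Add a skill by its descriptions only; no labeled samples are attached."""
    registry[skill.name] = skill

register_skill(SkillSpec(
    name="hotel",
    description="search for and book hotel rooms",
    intents=[IntentSpec(
        name="book_room",
        description="reserve a hotel room",
        slots=[SlotSpec("city", "city where the hotel is located"),
               SlotSpec("check_in_date", "date of arrival")],
    )],
))
```

In such a layout, the descriptions are the only per-skill input that the shared natural language understanding model would need at recognition time.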

In order to better understand the technical solution provided by the embodiment of the present application, the application scenarios to which the technical solution is applicable are briefly described below. It should be noted that the application scenarios described below only illustrate the embodiment of the present application and are not limiting. In specific implementation, the technical solution provided by the embodiment of the application can be applied flexibly according to actual needs.

Fig. 3 is a schematic view of an application scenario of a natural language identification method in a task-based dialog system according to an embodiment of the present application. The application scenario includes a plurality of terminal devices 301 and a server 302. The terminal device 301 and the server 302 are connected via a wireless or wired communication network, and the terminal device 301 includes but is not limited to a desktop computer, a mobile phone, a mobile computer, a tablet computer, a media player, an intelligent wearable device, an intelligent television, a vehicle-mounted device, a Personal Digital Assistant (PDA), and other electronic devices. The server 302 may be a server, a server cluster composed of several servers, or a cloud computing center.

The terminal device 301 is provided with a client for which the server 302 may provide the natural language recognition service, the dialogue management service, the natural language generation service, and the like of a task-based dialogue system, or a natural language labeling service. The client in the terminal device 301 includes a general task-based dialogue system that can recognize natural language corresponding to a plurality of skills. It should be noted that the skill information, domain information, and slot information that the natural language recognition service obtains from the natural language serve as the input of the dialogue management system. The dialogue management system comprises two parts, a state tracking module and a dialogue policy. The state tracking module holds the various information of the ongoing dialogue and updates the current dialogue state according to the previous state, the labeling information from the natural language recognition service, and the system state (that is, the result of querying the database). For example, for the natural language "I want to book an air ticket to Beijing", the state tracking module can directly query for air tickets to Beijing and obtain the related information. The dialogue policy depends on the task scenario and is usually the output of the dialogue management module, for example a follow-up question strategy for a slot that is missing in the scenario. For example, if the natural language for booking an air ticket lacks the slot information for the travel time, the task-based dialogue system can ask or remind the user.

In this embodiment, after receiving the natural language input by the user, the client of the terminal device 301 determines the skill, intention, and slot information of that natural language through the natural language recognition service, and then provides the corresponding service to the user.

Of course, the method provided in the embodiment of the present application is not limited to the application scenario shown in fig. 3, and may also be used in other possible application scenarios, and the embodiment of the present application is not limited. The functions that can be implemented by each device in the application scenario shown in fig. 3 will be described in the following method embodiments, and will not be described in detail herein.

To further illustrate the technical solutions provided by the embodiments of the present application, a detailed description is given below with reference to the accompanying drawings and specific embodiments. Although the embodiments of the present application provide method steps as shown in the following embodiments or figures, the method may include more or fewer steps based on conventional or non-inventive effort. For steps that have no logically necessary causal relationship, the order of execution is not limited to that provided by the embodiments of the present application. In an actual processing procedure, or when executed by a device, the method can be executed sequentially or in parallel according to the method shown in the embodiments or figures.

The present application provides a natural language recognition method in a task-based dialog system that may be performed by a task-based dialog server, such as server 302 in fig. 3. The natural language identification method in the task-based dialog system provided by the embodiment of the application is shown in fig. 4, and the flowchart shown in fig. 4 is described as follows.

Step S401, according to the matching conditions between the semantic information of the natural language to be recognized and the skill description information of each conversational task in the task-based conversational system, determining a target skill matched with the semantic of the natural language to be recognized from each conversational task;

In the embodiment of the application, the natural language to be recognized is a natural language input by a user and expresses a specific task that the user needs to complete. In the related art, a task-based dialog system supports only dialog-type tasks that were trained in advance. In the embodiment of the application, as long as a developer defines in advance the skill description information of a dialog-type task, the intention description information of each alternative intention, and the alternative slot description information of each alternative intention, related services can be provided to the user without training each task one by one beforehand.

As shown in fig. 5, each domain in a task-based dialog system of the related art corresponds to its own natural language understanding model, and the domain of each model is fixed, so the model for each domain must be trained on sample data labeled for that domain, which greatly increases the labor and time cost of writing and labeling sample data. In the embodiment of the application, the skill description information of each dialog-type task is defined in advance, and a target skill matching the semantics of the natural language to be recognized is determined from the dialog-type tasks according to the matching condition between the semantic information of the natural language to be recognized and the skill description information of each dialog-type task in the task-based dialog system. The matching condition may be, but is not limited to, the following: the similarity between the semantic information of the natural language to be recognized and the skill description information of each dialog-type task is determined, and either a dialog-type task whose similarity reaches a target threshold, or the dialog-type task with the greatest similarity, is determined as the target skill matching the natural language to be recognized.

It should be noted that, in the embodiment of the present application, the semantic information of the natural language to be recognized may be, but is not limited to, a vector representation of the natural language to be recognized, and optionally, a vector representation of skill description information of each conversational task in the task-based conversational system may also be determined, and by comparing the similarity between the vector representation of the natural language to be recognized and the vector representation of the skill description information of each conversational task in the task-based conversational system, a target skill that matches the semantic of the natural language to be recognized is determined from each conversational task.
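The vector comparison described above can be sketched as follows. This is a minimal illustration only: the `embed` function is a toy bag-of-words stand-in for whatever sentence encoder is actually used, cosine similarity is an assumed choice of similarity measure, and the skill names, descriptions, and the threshold value of 0.2 are hypothetical. The function combines the two matching conditions mentioned above by taking the most similar skill and accepting it only if its similarity reaches the threshold.

```python
import math
from collections import Counter
from typing import Dict, Optional

def embed(text: str) -> Counter:
    """Toy stand-in for a sentence encoder: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def match_skill(query: str, descriptions: Dict[str, str], threshold: float = 0.2) -> Optional[str]:
    """Return the most similar skill, or None if its similarity is below the threshold."""
    q = embed(query)
    scores = {name: cosine(q, embed(desc)) for name, desc in descriptions.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

# Hypothetical skill description information.
skills = {
    "flight": "search and book a flight ticket",
    "music": "play a song or an album",
}
target_skill = match_skill("book a flight to Beijing", skills)
```

Returning `None` when no skill reaches the threshold is a design choice of this sketch; it lets a caller treat the utterance as out of scope rather than forcing a match.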

The domains in which a user needs to complete a specific task may include the air ticket domain, the hotel domain, and the like. It should be noted that, because developers differ, the same domain may contain several similar skills; these skills may provide the same services, or may differ only slightly in their intentions and slot extraction. Therefore, each conversational task in the embodiment of the application corresponds directly to a skill, and the appropriate skill can be selected for the user according to the user's historical operation data.

Step S402, determining a target intention matched with the semantics of the natural language to be recognized according to the matching conditions between the semantic information of the natural language to be recognized and the intention description information of each alternative intention corresponding to the target skill, wherein each intention description information is used for describing a subtask included in the conversational task;

In the embodiment of the application, each target skill corresponds to a plurality of alternative intentions. After the target skill corresponding to the natural language to be recognized is determined, the target intention matching the semantics of the natural language to be recognized is determined from the alternative intentions of the target skill according to the matching condition between the semantic information of the natural language to be recognized and the intention description information of each alternative intention. The matching condition may be, but is not limited to, the following: the similarity between the semantic information of the natural language to be recognized and the intention description information of each alternative intention corresponding to the target skill is determined, and either an alternative intention whose similarity reaches a target threshold, or the alternative intention with the greatest similarity, is determined as the target intention matching the natural language to be recognized.

It should be noted that, in the embodiment of the present application, the semantic information of the natural language to be recognized may be, but is not limited to, a vector representation of the natural language to be recognized. Optionally, a vector representation of the intention description information of each alternative intention corresponding to the target skill may also be determined, and the target intention matching the semantics of the natural language to be recognized is determined by comparing the similarity between the vector representation of the natural language to be recognized and the vector representation of the intention description information of each alternative intention. In the embodiment of the application, the alternative intentions under the same skill are treated as subtasks in the task-based dialog system; for example, an air ticket task includes a subtask of booking an air ticket and a subtask of querying an air ticket.

Step S403, extracting slot position information of each alternative slot position from the natural language to be identified according to slot position description information and slot position information extraction conditions of each alternative slot position corresponding to the target intention, wherein the slot position information of each alternative slot position is used for limiting each key information of the subtask corresponding to the intention;

Each intention in the embodiment of the present application corresponds to at least one alternative slot. For example, the intention of booking an air ticket requires the identity information of the user booking the ticket, the travel time, the departure city, the destination, and the like. Generally, the slot information for the corresponding intention is contained in the natural language to be recognized. In the embodiment of the present application, the slot extraction condition is to search the natural language to be recognized for the slot information that corresponds to the slot description information of an alternative slot. For example, if the slot description information indicates that the alternative slot is the departure city, information corresponding to a location is searched for in the natural language to be recognized according to the slot extraction condition, and the departure location is determined according to the semantic information of the natural language to be recognized. In implementation, the slot information of each alternative slot is extracted from the natural language to be recognized according to the slot description information of the alternative slots and the slot extraction condition. In the embodiment of the present application, the slot information of each alternative slot is used to define each piece of key information of the subtask corresponding to the intention.
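The pairing of slot description information with a slot extraction condition can be illustrated with the minimal sketch below. It is an assumption of this sketch that the extraction condition can be modeled as a regular expression over a small city list; in the application the slot information is obtained from a trained slot extraction model, and the slot names, descriptions, and the `CITIES` list are hypothetical.

```python
import re
from typing import Dict, Optional

CITIES = ["Beijing", "Shenzhen", "Shanghai"]  # hypothetical gazetteer of locations
CITY = "(" + "|".join(CITIES) + ")"

# Each candidate slot pairs its description with an extraction condition,
# modeled here (as an assumption) by a regular expression.
SLOTS = {
    "departure_city": {"description": "city the trip starts from",
                       "condition": re.compile(r"\bfrom " + CITY)},
    "destination_city": {"description": "city the trip ends in",
                         "condition": re.compile(r"\bto " + CITY)},
    "departure_date": {"description": "day of departure",
                       "condition": re.compile(r"\b(today|tomorrow)\b")},
}

def extract_slots(utterance: str) -> Dict[str, Optional[str]]:
    """Return the value found for each candidate slot, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name, spec in SLOTS.items():
        match = spec["condition"].search(utterance)
        result[name] = match.group(1) if match else None
    return result

slots = extract_slots("I want to book a flight from Shenzhen to Beijing tomorrow")
```

A slot left as `None` corresponds to missing slot information, which a dialogue policy could then ask the user to supply.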

Step S404, obtaining a conversation type task recognition result of the natural language to be recognized according to the skill description information of the target skill, the intention description information of the target intention, the slot position description information of each alternative slot position and the slot position information thereof.

In the embodiment of the application, according to the skill description information of the target skill, the intention description information of the target intention, and the slot description information of each alternative slot and the slot information thereof, a task identification result of the natural language to be identified can be obtained, and then, a service is provided for a user according to the task identification result.
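Steps S401 to S404 can be read as a hierarchical narrowing from skill to intention to slots. The following minimal sketch shows that flow end to end under simplifying assumptions: similarity is a toy token overlap rather than a learned model, slot extraction is a lookup in hypothetical value lists, and the `SKILLS` definition and all names in it are invented for illustration.

```python
from typing import Dict

def overlap(a: str, b: str) -> float:
    """Toy similarity: Jaccard overlap of lowercase token sets."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 0.0

def best_match(query: str, candidates: Dict[str, dict]) -> str:
    """Pick the candidate whose description is most similar to the query."""
    return max(candidates, key=lambda name: overlap(query, candidates[name]["description"]))

# Hypothetical skill definitions consisting of descriptions only.
SKILLS = {
    "flight": {
        "description": "flight ticket services",
        "intents": {
            "book_ticket": {"description": "book a flight ticket",
                            "slots": {"destination_city": ["beijing", "shenzhen"]}},
            "cancel_ticket": {"description": "cancel a flight ticket",
                              "slots": {"order_id": []}},
        },
    },
    "music": {
        "description": "play music songs",
        "intents": {
            "play_song": {"description": "play a song", "slots": {"song_name": []}},
        },
    },
}

def recognize(query: str) -> dict:
    skill = best_match(query, SKILLS)                  # S401: target skill
    intents = SKILLS[skill]["intents"]
    intent = best_match(query, intents)                # S402: target intention
    tokens = query.lower().split()
    slots = {name: next((t for t in tokens if t in values), None)  # S403: slot info
             for name, values in intents[intent]["slots"].items()}
    return {"skill": skill, "intent": intent, "slots": slots}      # S404: result

result = recognize("book a flight to beijing")
```

Only the intentions of the selected skill, and only the slots of the selected intention, are considered at each later step, which mirrors the order of the steps above.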

As shown in fig. 7, the natural language understanding model in the task-based dialog system in the embodiment of the present application includes three sub-models: a skill classification model, an intention recognition model, and a slot extraction model. The target skill is obtained based on the trained skill classification model, that is, the natural language to be recognized is input into the trained skill classification model to obtain the target skill matching the natural language to be recognized. The target intention is obtained based on the trained intention recognition model, that is, the natural language to be recognized is input into the trained intention recognition model to obtain the target intention matching the natural language to be recognized. The slot information is obtained based on the trained slot extraction model, that is, the natural language to be recognized is input into the trained slot extraction model, and the slot information of each alternative slot is extracted from the natural language to be recognized.

The conversational tasks comprise trained conversational tasks and untrained conversational tasks. The skill classification model, the intention recognition model, and the slot extraction model are obtained by training on the sample data of each trained conversational task: the skill classification model is trained on labeled sample data covering a plurality of skills, the intention recognition model is trained on labeled sample data covering a plurality of intentions, and the slot extraction model is trained on labeled sample data covering a plurality of slots. A conversational task whose sample data was used in training is a trained conversational task. An untrained conversational task comprises skill description information, intention description information of each alternative intention, and alternative slot description information of each alternative intention. The skill classification model obtained after training in the embodiment of the application is a general skill classification model, so only the skill description information of a new skill needs to be defined when the new skill is developed.

The training process and the using process of the skill classification model, the intention recognition model and the slot extraction model are specifically described as follows:

1. skill classification model

As shown in fig. 8, the skill classification model includes a first vector representation module and a first classification module, wherein the first vector representation module is configured to perform vector representation on input data, and the first classification module is configured to determine the similarity between at least two obtained vectors.

The skill classification corpus samples used to train the skill classification model comprise at least one sample for each trained conversational task. Each sample comprises the skill description information of a conversational task and is labeled with a similarity probability value for that task. In the embodiment of the application, the similarity probability value between a skill classification corpus sample and a conversational task may be, but is not limited to, 0 or 1: a value of 1 means the skill classification corpus sample matches the corresponding conversational task, and otherwise it does not match.

Specifically, suppose the existing sample data contains $N$ skills $D_1, D_2, \dots, D_N$, where skill $D_n$ comprises $I_n$ intents, $S_n$ slots, and $M_n$ labeled skill classification corpus samples, and the set formed by all natural language data of all domains is $M$. For any natural language data $q$ in $M$ and any skill $D_i$, a skill classification corpus sample $(l, D_i, q)$ can be generated, where $l = 1$ if $q \in D_i$ and $l = 0$ otherwise.
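The construction of skill classification corpus samples described above can be sketched as follows. The function pairs every utterance q with every skill and sets the label l to 1 only when q belongs to that skill; the function name and the two-skill data set are hypothetical, and the skill name stands in for the skill description information.

```python
from typing import Dict, List, Tuple

def build_skill_samples(data: Dict[str, List[str]]) -> List[Tuple[int, str, str]]:
    """Pair every utterance q in M with every skill D_i; l = 1 iff q belongs to D_i."""
    all_utterances = [(q, owner) for owner, qs in data.items() for q in qs]  # the set M
    return [(1 if owner == skill else 0, skill, q)
            for q, owner in all_utterances
            for skill in data]

# Hypothetical labeled data for two trained skills.
data = {
    "flight": ["book a flight to Beijing"],
    "music": ["play a song", "next track"],
}
samples = build_skill_samples(data)
```

With N skills, each utterance yields one positive sample and N - 1 negative samples, so the negatives come for free from the data already labeled for other skills.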

In the embodiment of the application, the trained skill classification model is trained by the following method:

1) respectively obtaining a first target vector representation of skill description information of each trained task based on a first vector representation module;

the vector representation obtained based on the first vector representation module may be, but is not limited to, a sentence vector representation, and a person skilled in the art may set the vector representation according to actual needs, which is not described herein again.

2) For each skill classification corpus sample, obtaining a first reference vector representation of the skill classification corpus sample based on a first vector representation module;

3) respectively obtaining first similarity between the first reference vector representations and the first target vector representations on the basis of the first classification module;

wherein the first similarity between the first reference vector representation and each first target vector representation may be, but is not limited to, a value between 0 and 1, the greater the first similarity, the more similar the first reference vector representation and the corresponding first target vector representation.

4) Adjusting parameters of the first vector representation module and the first classification module based on the obtained first similarities until the first similarities meet set conditions;

The set condition may be, but is not limited to, a similarity threshold: when a first similarity is greater than the similarity threshold, that first similarity is considered to satisfy the set condition.
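The loop of steps 1) to 4) can be illustrated with the toy sketch below. Several simplifications are assumptions of this sketch and not part of the application: the vector representation step is replaced by a fixed token-overlap feature (so only the classification step has trainable parameters), the classification step is a one-feature logistic model trained by gradient descent on cross-entropy, and the stopping condition is that matching pairs score above the threshold while non-matching pairs do not.

```python
import math
from typing import List, Tuple

def overlap(a: str, b: str) -> float:
    """Fixed toy stand-in for the vector representation step: Jaccard token overlap."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y)

def similarity(w: float, b: float, x: float) -> float:
    """Toy classification step: logistic score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def train(samples: List[Tuple[int, str, str]], threshold: float = 0.5,
          lr: float = 1.0, max_steps: int = 1000) -> Tuple[float, float]:
    """Adjust (w, b) until the similarities satisfy the assumed set condition."""
    feats = [(l, overlap(d, q)) for l, d, q in samples]
    w = b = 0.0
    for _ in range(max_steps):
        scores = [similarity(w, b, x) for _, x in feats]
        # Assumed set condition: matches score above the threshold, non-matches do not.
        if all((s > threshold) == (l == 1) for (l, _), s in zip(feats, scores)):
            break
        # One gradient step on the cross-entropy loss.
        w -= lr * sum((s - l) * x for (l, x), s in zip(feats, scores))
        b -= lr * sum(s - l for (l, _), s in zip(feats, scores))
    return w, b

# Hypothetical samples of the form (l, skill description, utterance).
samples = [
    (1, "book a flight ticket", "book a flight to beijing"),
    (0, "play a song", "book a flight to beijing"),
    (0, "book a flight ticket", "play a song by queen"),
    (1, "play a song", "play a song by queen"),
]
w, b = train(samples)
```

In the model described above, the parameters of the vector representation module would be adjusted as well; the sketch keeps that part fixed purely to stay short.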

The process of obtaining a target skill based on the skill classification model is described below:

1) obtaining a first reference vector representation of skill description information for each trained conversational task and untrained conversational task, respectively, based on a first vector representation module;

2) obtaining a first to-be-identified vector representation of the natural language to be recognized based on the first vector representation module;

3) obtaining, based on the first classification module, second similarities between the first to-be-identified vector representation and each first reference vector representation, and obtaining a classification result based on the second similarities.

As an alternative embodiment, the skill classification model may be, but is not limited to, a pre-trained BERT model; specifically, it uses the structure of a BERT-based classification model, a schematic diagram of which is shown in fig. 9.
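For illustration, a BERT-style classifier commonly receives two text segments as a single sequence with special tokens and segment ids. The sketch below builds such an input from a skill description and an utterance. It is an assumption of this sketch that the description and the utterance are fed as a sentence pair, since fig. 9 is not reproduced here; whitespace splitting is a stand-in for WordPiece tokenization, and the function name is hypothetical.

```python
from typing import List, Tuple

def build_pair_input(description: str, query: str) -> Tuple[List[str], List[int]]:
    """Build a BERT-style sentence-pair input: tokens plus segment ids."""
    a, b = description.split(), query.split()
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    # Segment 0 covers [CLS], the first text, and its [SEP]; segment 1 covers the rest.
    segment_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    return tokens, segment_ids

tokens, segment_ids = build_pair_input("flight ticket services", "book a flight")
```

In a classification setup of this kind, the output at the `[CLS]` position is typically passed to a classification layer that produces the match score.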

2. Intention recognition model

As shown in fig. 10, the intention recognition model includes a second vector representation module and a second classification module, wherein the second vector representation module is used for performing vector representation on the input data, and the second classification module is used for determining the similarity between at least two vectors;

in the embodiment of the present application, the similar probability value of the alternative intent to which each intent recognition corpus sample belongs may be, but is not limited to, 0 and 1, when the similar probability value of the intent recognition corpus sample and the alternative intent is 1, the intent recognition corpus sample matches with the corresponding alternative intent, and otherwise, the intent recognition corpus sample does not match.

Specifically, suppose the existing sample data contains N skills D_1, D_2, …, D_N, where skill D_n comprises I_n intentions, S_n slots and M_n labeled skill classification corpus samples, and the set formed by all natural language data of all domains is M. The intention recognition corpus samples are then generated as follows: for any data q in the sample data M_i of a domain D_i and any intention I_i in that domain, annotation data (l, I_i, q) of the intention classification model can be generated, where l is 1 if the intention of q is I_i, and l is 0 otherwise.
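The generation of annotation data (l, I_i, q) can be sketched in Python as below. The function name `build_intent_samples` and the example intents and utterances are hypothetical and serve only to illustrate the pairing of every utterance in a domain with every intention of that domain.

```python
def build_intent_samples(domain_intents, domain_data):
    """Build (l, intent, q) triples for one domain D_i.

    domain_intents: intention descriptions of the domain.
    domain_data: list of (q, labeled_intent) pairs from the sample data M_i.
    """
    samples = []
    for q, labeled_intent in domain_data:
        for intent in domain_intents:
            label = 1 if intent == labeled_intent else 0
            samples.append((label, intent, q))
    return samples


# Hypothetical flight-domain data.
intents = ["search flight", "book flight"]
data = [
    ("find flights to Shanghai", "search flight"),
    ("buy two tickets to Beijing", "book flight"),
]
for triple in build_intent_samples(intents, data):
    print(triple)
```

Each utterance therefore yields one positive sample and, for a domain with k intentions, k - 1 negative samples.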

In the embodiment of the application, the trained intention recognition model is trained in the following way:

1) respectively obtaining second target vector representation of intention description information corresponding to each subtask in each trained task based on a second vector representation module;

the vector representation obtained based on the second vector representation module may be, but is not limited to, sentence vector representation, and those skilled in the art can set the vector representation according to actual requirements, which is not described herein.

2) For each intention recognition corpus sample, obtaining a second reference vector representation of the intention recognition corpus sample based on a second vector representation module;

3) respectively obtaining third similarity between the second reference vector representations and the second target vector representations on the basis of a second classification module;

wherein the third similarity between the second reference vector representation and each second target vector representation may be, but is not limited to, a value between 0 and 1; the greater the third similarity, the more similar the second reference vector representation and the corresponding second target vector representation.

4) Adjusting parameters of the second vector representation module and the second classification module based on the obtained third similarities until the third similarities meet the set conditions;

The set condition may be, but is not limited to, a similarity threshold: when a third similarity is greater than the similarity threshold, the third similarity is considered to satisfy the set condition.

The following describes the process of obtaining the target intention based on the intention recognition model:

1) respectively obtaining a second reference vector representation of the intention description information of each subtask in the trained conversational task and each subtask in the untrained conversational task based on a second vector representation module;

2) obtaining a second vector representation to be recognized of the natural corpus to be recognized based on a second vector representation module;

3) respectively obtaining, based on the second classification module, a fourth similarity between the second vector representation to be recognized and each second reference vector representation, and obtaining a classification result based on the fourth similarities.

As an alternative embodiment, the intention recognition model may, but is not limited to, employ a BERT pre-training model, specifically using the structure of the BERT-based classification model.
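For readers unfamiliar with BERT-based sentence-pair classification, the following Python sketch shows the conventional input layout `[CLS] A [SEP] B [SEP]` with segment ids. It is an assumption that the intention description is placed first and the utterance second; the text does not specify the order. Whitespace splitting stands in for the real WordPiece tokenizer, and `build_pair_input` is a hypothetical helper name.

```python
def build_pair_input(description, utterance):
    """Lay out a description/utterance pair in BERT sentence-pair format."""
    desc_tokens = description.lower().split()   # stand-in for WordPiece
    utt_tokens = utterance.lower().split()
    tokens = ["[CLS]"] + desc_tokens + ["[SEP]"] + utt_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the description and the first [SEP];
    # segment 1 covers the utterance and the final [SEP].
    segment_ids = [0] * (len(desc_tokens) + 2) + [1] * (len(utt_tokens) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_pair_input("book hotel", "reserve a room")
print(tokens)
print(segment_ids)
```

In such a layout, a classifier on top of the `[CLS]` position would typically produce the similarity between the two segments.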

3. Slot extraction model

As shown in fig. 11, the slot extracting model includes a third vector representation module and a slot extracting module, where the third vector representation module is configured to perform vector representation on input data, and the slot extracting module is configured to extract slot information in a natural language;

the slot position extraction corpus sample of the training slot position extraction model at least comprises a sample corresponding to each trained dialogue-type task, reference slot position information of an affiliated alternative slot position is marked on each slot position extraction corpus sample, and the trained slot position extraction model is trained in the following mode:

1) for each slot position extraction corpus sample, obtaining a third reference vector representation of the slot position extraction corpus sample based on a third vector representation module;

the vector representation obtained based on the third vector representation module may be, but is not limited to, a sentence vector representation, a word vector representation or a character vector representation, and those skilled in the art may set the vector representation according to actual requirements, which is not described herein.

2) Respectively obtaining first target slot position information of slot position description information corresponding to each alternative slot position in the third reference vector representation based on the slot position extraction module;

for example, if the candidate slot is a specific time representation, the first target slot information is specific time information in the slot extraction corpus sample.

3) Adjusting parameters of a third vector representation module and a slot position extraction module based on the obtained first target slot position information until the obtained first target slot position information meets a set condition;

The first target slot position information meets the set condition when the slot position information corresponding to the alternative slot position can be accurately obtained from the slot position extraction corpus sample.

The method for obtaining slot position information based on the trained slot position extraction model comprises the following steps:

1) obtaining a third vector representation to be recognized of the natural corpus to be recognized based on a third vector representation module;

2) and respectively obtaining second target slot position information represented by the third vector to be identified based on the slot position extraction module, and obtaining a classification result based on the second target slot position information.

As an alternative embodiment, the slot extraction model may be, but is not limited to, a BERT pre-training model, specifically, a structure of a BERT-based QA model is used, and a schematic diagram of the structure of the BERT-based QA model is shown in fig. 12.
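In a QA-style extraction model, the slot description plays the role of the question and the natural language to be recognized plays the role of the passage; the model scores each token as a possible start and end of the answer. The Python sketch below shows only the usual span-decoding step on top of such scores. The score values are made up for illustration (in practice they would come from the model), and the names `best_span`, `extract_slot` and `max_len` are hypothetical.

```python
def best_span(start_scores, end_scores, max_len=5):
    """Return (i, j) maximizing start_scores[i] + end_scores[j], i <= j."""
    best, best_score = None, float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


def extract_slot(tokens, start_scores, end_scores):
    """Map the best-scoring span back to text; None if there are no tokens."""
    span = best_span(start_scores, end_scores)
    if span is None:
        return None
    i, j = span
    return " ".join(tokens[i : j + 1])


# Utterance tokens and made-up scores for the slot description "city".
tokens = ["book", "a", "hotel", "in", "Beijing", "tomorrow"]
start = [0.1, 0.0, 0.2, 0.3, 2.5, 0.4]
end = [0.0, 0.1, 0.3, 0.1, 2.8, 0.9]
print(extract_slot(tokens, start, end))
```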

The following describes in detail a natural language identification method in a task-based dialogue system according to an embodiment of the present application with reference to a specific embodiment, assuming that a conventional open dialogue platform includes sample data of weather dialogue-based task skills, flight dialogue-based task skills, and a related natural language identification model. The skill description information and part of the sample data of the weather dialogue-type task are shown in fig. 13, and the skill description information and part of the sample data of the flight dialogue-type task are shown in fig. 14.

Through the skill description information of the two skills and the labeled sample data, three general natural language recognition models provided by the invention can be obtained through training, wherein the three general natural language recognition models are respectively as follows: a general skill classification model, a general intent recognition model, and a general slot extraction model. The main functions of these three general natural language recognition models are to find semantic correlations in the skill description information of the skill and the natural language to be recognized input by the user in the three aspects of skill, intention, and slot position.

The skill classification model learns semantic correlations between the skill description information and the natural language to be recognized input by the user. For example, "air ticket" and the keywords "plane", "flight", etc. in the natural language to be recognized, which is input by the user, have significant semantic relevance; the 'weather' has obvious semantic relevance with keywords such as 'weather', 'heat', 'humidity' and the like in the natural language to be recognized, which is input by a user.

The intention recognition model learns the semantic relevance between the intention description information and the natural language to be recognized input by the user, for example, the semantic relevance between the 'search' and the keywords 'search' in the natural language to be recognized input by the user is obvious; the "booking" has significant semantic relevance to the keywords "booking", "buying", etc. in the natural language to be recognized entered by the user.

The slot position extraction model learns the semantic relevance between the slot position description information and the natural language to be recognized input by the user. For example, "city" has semantic relevance with city names such as "Beijing" and "Shanghai"; "date" has semantic relevance with "today" and "tomorrow"; "from" and "from" have semantic relevance; "reach" and "go" have semantic relevance; and "number of people" and "two" have semantic relevance.

Suppose a developer newly develops a skill for a hotel conversational task; the skill description information of the skill and natural language to be recognized that a user might input are shown in fig. 15. By utilizing the semantic correlation, learned by the universal natural language understanding model, between skill description information and input natural language to be recognized, the universal natural language understanding model provided by the invention can recognize the skill, the intention and the slot position of the natural language to be recognized that the user inputs for this skill, without any labeled data.

First, the skill classification model finds that keywords such as "hotel" and "room" in the natural language to be recognized input by the user have low semantic correlation with skills such as "flight" and "weather", and high semantic correlation with the "hotel" skill; the skill classification is thereby completed, and the skill of the natural language to be recognized is determined to be the hotel skill. The intention recognition model can distinguish the two intentions "inquire hotel" and "reserve hotel" through keywords such as "find" and "reserve" in the natural language to be recognized. The slot extraction model can likewise extract the related keywords from the natural language to be recognized according to the keywords that define the slots, such as "city" and "date". The recognition result of the general natural language understanding model is shown in fig. 16.
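The cascade described above (skill first, then only the intentions of that skill, then only the slots of that intention) can be sketched in Python as follows. This is a toy under stated assumptions: word overlap stands in for the trained skill and intention models, a small lexicon lookup stands in for the slot extraction model, and the schema, the lexicon and the names `recognize`, `overlap` and `lookup` are all hypothetical.

```python
def recognize(utterance, skills, match, extract):
    """skills maps skill description -> {intent description -> [slot descriptions]}."""
    target_skill = max(skills, key=lambda d: match(d, utterance))
    intents = skills[target_skill]              # candidates narrowed by skill
    target_intent = max(intents, key=lambda d: match(d, utterance))
    slots = {s: extract(s, utterance) for s in intents[target_intent]}
    return {"skill": target_skill, "intent": target_intent, "slots": slots}


def overlap(description, utterance):
    # Stand-in matcher: number of shared lowercase words.
    return len(set(description.lower().split()) & set(utterance.lower().split()))


LEXICON = {"city": {"beijing", "shanghai"}, "date": {"today", "tomorrow"}}


def lookup(slot_description, utterance):
    # Stand-in extractor: first token found in the slot's toy lexicon.
    for token in utterance.split():
        if token.lower() in LEXICON.get(slot_description, set()):
            return token
    return None


schema = {
    "weather": {"query weather": ["city", "date"]},
    "flight": {"search flight": ["city", "date"], "book flight": ["city", "date"]},
    "hotel": {"find hotel": ["city", "date"], "book hotel": ["city", "date"]},
}
result = recognize("book a hotel in Beijing tomorrow", schema, overlap, lookup)
print(result)
```

Passing the matcher and the extractor in as arguments keeps the cascade independent of how the three models are actually implemented.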

On the basis of the natural language identification method in the task-based dialog system provided by the present application, the present application also provides a natural language annotation method, which can be executed by a task-based dialog server, for example, the server 302 in fig. 3. Fig. 17 shows a natural language labeling method in a task-based dialog system according to an embodiment of the present application, and a flowchart shown in fig. 17 is described as follows.

Step S1701, determining a target skill matched with the semantics of the natural language to be recognized from each conversational task according to the matching condition between the semantic information of the natural language to be recognized and the skill description information of each conversational task in the task-based conversational system;

step S1702, determining a target intention matched with the semantics of the natural language to be recognized according to the matching condition between the semantic information of the natural language to be recognized and the intention description information of each alternative intention corresponding to the target skill, wherein each intention description information is used for describing a subtask included in the conversational task;

step S1703, slot position information of each alternative slot position is extracted from the natural language to be identified according to slot position description information and slot position information extraction conditions of each alternative slot position corresponding to the target intention, and the slot position information of each alternative slot position is used for limiting each key information of the subtask corresponding to the intention;

and step S1704, marking the natural language to be identified according to the skill description information of the target skill, the intention description information of the target intention, the slot position description information of each alternative slot position and the slot position information thereof, and obtaining sample data of the conversational task.

The target skill is obtained based on a trained skill classification model, the target intention is obtained based on a trained intention recognition model, the slot position information is obtained based on a trained slot position extraction model, the conversational tasks comprise a trained conversational task and an untrained conversational task, and the skill classification model, the intention recognition model and the slot position extraction model are obtained by adopting sample data of each trained conversational task.

As described above, once the universal natural language understanding model is obtained, natural language can be recognized directly through it. Similarly, after the skill, the intention and the slot position information of the natural language are obtained through the universal natural language understanding model, the natural language can be labeled, and the labeled natural language is used as sample data of the conversational task. The skill classification model, the intention recognition model and the slot position extraction model are then further updated and trained with the obtained sample data. It should be noted that the obtained sample data may include natural language of trained conversational tasks and natural language of untrained conversational tasks. First, updating and training the three models with this sample data can improve their processing precision for the trained conversational tasks. Second, when the obtained sample data includes natural language identified as belonging to an untrained conversational task, that untrained conversational task is updated to a trained conversational task after the updating and training.
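The labeling and update cycle described above can be sketched in Python as follows. It is a minimal sketch under stated assumptions: `recognize` and `retrain` are hypothetical callables standing in for the universal natural language understanding model and for the update training of the three models, and the record layout of a sample is invented for illustration.

```python
def label_and_promote(utterances, recognize, retrain, trained, untrained, samples):
    """Label utterances, retrain on the samples, promote newly covered tasks."""
    for text in utterances:
        result = recognize(text)                 # skill, intent and slots
        samples.append({"text": text, **result}) # labeled sample data
    retrain(samples)                             # update training (stand-in)
    covered = {sample["skill"] for sample in samples}
    for skill in sorted(untrained & covered):
        untrained.discard(skill)                 # the task is no longer untrained
        trained.add(skill)
    return samples


# Hypothetical stand-ins for the recognition model and the training routine.
def fake_recognize(text):
    return {"skill": "hotel", "intent": "book hotel", "slots": {"city": "Beijing"}}


retrain_calls = []
trained, untrained, samples = {"weather", "flight"}, {"hotel"}, []
label_and_promote(
    ["book a hotel in Beijing"], fake_recognize, retrain_calls.append,
    trained, untrained, samples,
)
print(sorted(trained), sorted(untrained), len(samples))
```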

As an optional implementation manner, in the embodiment of the present application, not only the natural language input by the user may be labeled through the general natural language understanding model, but also the skill, the intention, and the slot information of the natural language may be obtained and labeled through the skill language understanding model, the intention recognition model, and the slot model in the related art. The recognition and marking of the natural language and the training of the model can be carried out regularly according to the updating of the natural language library, and the recognition and marking of the natural language can be automatically completed, so that the model training efficiency and the new skill development efficiency are greatly improved.

Based on the same inventive concept, the embodiments of the present application provide a natural language identification device in a task-based dialog system, where the natural language identification device in the task-based dialog system may be a hardware structure, a software module, or a hardware structure plus a software module. The natural language identification device in the task-based dialog system may be, for example, the server 302 in fig. 3, or may be a functional device disposed in the server 302, and the natural language identification device in the task-based dialog system may be implemented by a chip system, and the chip system may be formed by a chip, or may include a chip and other discrete devices. Referring to fig. 18, the natural language recognition apparatus in the task-based dialog system in the embodiment of the present application includes a first skill determination unit 1801, a first intention determination unit 1802, a first slot position determination unit 1803, and a recognition unit 1804, where:

a first skill determination unit 1801, configured to determine, according to a matching condition between semantic information of a natural language to be recognized and skill description information of each conversational task in the task-based conversational system, a target skill that matches the semantic of the natural language to be recognized from each conversational task;

a first intention determining unit 1802 for determining a target intention matching the semantics of the natural language to be recognized, based on a matching condition between semantic information of the natural language to be recognized and intention description information of respective alternative intentions corresponding to target skills, each intention description information being for describing one subtask included in the dialogue-type task;

a first slot determining unit 1803, configured to extract slot information of each candidate slot from the natural language to be identified according to slot description information and a slot information extraction condition of each candidate slot corresponding to the target intent, where the slot information of each candidate slot is used to define each key information of a subtask corresponding to the intent;

the identifying unit 1804 is configured to obtain a natural language dialog task identification result to be identified according to the skill description information of the target skill, the intention description information of the target intention, and the slot description information and the slot information of each alternative slot.

In one possible implementation, the target skill is obtained based on a trained skill classification model, the target intent is obtained based on a trained intent recognition model, and the slot information is obtained based on a trained slot extraction model, wherein:

the conversational tasks comprise trained conversational tasks and untrained conversational tasks, and the skill classification model, the intention recognition model and the slot position extraction model are obtained by adopting sample data of each trained conversational task for training;

untrained conversational tasks include: skill description information, intention description information of various alternative intentions, and alternative slot description information of each alternative intention.

In one possible implementation manner, the skill classification model includes a first vector representation module and a first classification module, the skill classification corpus samples of the training skill classification model at least include one sample corresponding to each trained conversational task, each skill classification corpus sample is labeled with a similar probability value of the conversational task to which the skill classification model belongs, and the trained skill classification model is trained in the following manner:

In a possible implementation manner, the first skill determining unit 1801 is specifically configured to:

obtaining a first to-be-identified vector representation of the natural corpus to be identified based on a first vector representation module;

In one possible implementation manner, the intention recognition model includes a second vector representation module and a second classification module, the intention recognition corpus samples for training the intention recognition model at least include one sample corresponding to each of the trained conversational tasks, each of the intention recognition corpus samples is labeled with a similar probability value of the candidate intention, and the trained intention recognition model is trained by the following method:

In one possible implementation, the first intention determining unit 1802 is specifically configured to:

obtaining a second vector representation to be recognized of the natural corpus to be recognized based on a second vector representation module;

In one possible implementation manner, the slot extraction model includes a third vector representation module and a slot extraction module, the slot extraction corpus sample of the training slot extraction model includes at least one sample corresponding to each trained conversational task, each slot extraction corpus sample is labeled with reference slot position information of an affiliated candidate slot position, and the trained slot extraction model is trained in the following manner:

In a possible implementation manner, the first slot determining unit 1803 is specifically configured to:

obtaining a third vector representation to be recognized of the natural corpus to be recognized based on a third vector representation module;

and respectively obtaining second target slot position information represented by the third vector to be identified based on the slot position extraction module, and obtaining a classification result based on the second target slot position information.

In a possible implementation manner, the system further includes a first labeling unit 1805, which is specifically configured to:

labeling natural languages identified as untrained conversational tasks;

obtaining sample data of an untrained conversational task;

updating and training the skill classification model, the intention recognition model and the slot position extraction model based on the obtained sample data; and

updating the untrained conversational task to a trained conversational task.

All relevant contents of each step involved in the embodiment of the natural language identification method in the task dialog system can be cited to the functional description of the functional module corresponding to the natural language identification device in the task dialog system in the embodiment of the present application, and are not described herein again.

Based on the same inventive concept, the embodiments of the present application provide a natural language labeling device in a task-based dialog system, where the natural language labeling device in the task-based dialog system may be a hardware structure, a software module, or a hardware structure plus a software module. The natural language labeling device in the task-based dialog system may be, for example, the server 302 in fig. 3, or may be a functional device disposed in the server 302, and the natural language labeling device in the task-based dialog system may be implemented by a chip system, and the chip system may be formed by a chip, or may include a chip and other discrete devices. Referring to fig. 19, the natural language labeling device in the task-based dialog system in the embodiment of the present application includes a second skill determination unit 1901, a second intention determination unit 1902, a second slot determination unit 1903, and a second labeling unit 1904, where:

a second skill determination unit 1901, configured to determine, according to matching conditions between the semantic information of the natural language to be recognized and the skill description information of each conversational task in the task-based conversational system, a target skill that matches the semantic of the natural language to be recognized from each conversational task;

a second intention determining unit 1902, configured to determine, according to a matching condition between the semantic information of the natural language to be recognized and intention description information of each alternative intention corresponding to the target skill, a target intention that matches the semantic of the natural language to be recognized, where each intention description information is used to describe one subtask included in the conversational task;

a second slot determining unit 1903, configured to extract slot information of each candidate slot from the natural language to be identified according to slot description information and slot information extraction conditions of each candidate slot corresponding to the target intent, where the slot information of each candidate slot is used to define each key information of a subtask corresponding to the intent;

a second labeling unit 1904, configured to label the natural language to be identified according to the skill description information of the target skill, the intention description information of the target intention, and the slot position description information of each alternative slot position and the slot position information thereof, so as to obtain sample data of the conversational task.

In a possible implementation manner, the system further includes a training unit 1905, specifically configured to:

All relevant contents of each step involved in the embodiment of the natural language labeling method in the task dialog system can be cited to the functional description of the functional module corresponding to the natural language labeling device in the task dialog system in the embodiment of the present application, and are not described herein again.

The division of the modules in the embodiments of the present application is schematic, and only one logical function division is provided, and in actual implementation, there may be another division manner, and in addition, each functional module in each embodiment of the present application may be integrated in one processor, may also exist alone physically, or may also be integrated in one module by two or more modules. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.

Based on the same inventive concept, an embodiment of the present application provides a computing device, for example the server 302 in fig. 3 mentioned above, which is capable of executing the natural language identification method in the task-based dialog system provided by the embodiment of the present application. As shown in fig. 20, the computing device in the embodiment of the present application includes at least one processor 2001, and a memory 2002 and a communication interface 2003 connected to the at least one processor 2001. The specific connection medium between the processor 2001 and the memory 2002 is not limited in the embodiment of the present application. Fig. 20 illustrates an example in which the processor 2001 and the memory 2002 are connected by a bus 2000; the bus 2000 is represented by a thick line in fig. 20, and the connection manner between other components is merely illustrated schematically and is not limiting. The bus 2000 may be divided into an address bus, a data bus, a control bus, etc.; only one thick line is shown in fig. 20 for convenience of illustration, but this does not mean that there is only one bus or one type of bus.

In the embodiment of the present application, the memory 2002 stores a computer program executable by the at least one processor 2001, and the at least one processor 2001 may execute the steps included in the natural language identification method in the task dialog system by executing the computer program stored in the memory 2002.

The processor 2001 is the control center of the computing device. It may connect the various parts of the entire computing device by using various interfaces and lines, and performs the various functions of the computing device and processes data by running or executing instructions stored in the memory 2002 and calling data stored in the memory 2002, thereby monitoring the computing device as a whole. Optionally, the processor 2001 may include one or more processing modules, and may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor described above may also not be integrated into the processor 2001. In some embodiments, the processor 2001 and the memory 2002 may be implemented on the same chip; in other embodiments, they may be implemented separately on separate chips.

The processor 2001 may be a general-purpose processor, such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, that implements or performs the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.

The memory 2002, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 2002 may include at least one type of storage medium, and may include, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 2002 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 2002 in the embodiments of the present application may also be circuitry or any other device capable of performing a storage function for storing program instructions and/or data.

The communication interface 2003 is a transmission interface capable of being used for communication, and data can be received or sent through the communication interface 2003, for example, data interaction with other devices can be performed through the communication interface 2003, so that the purpose of communication can be achieved.

Further, the computing device includes a basic input/output system (I/O system) 2004 that facilitates the transfer of information between the various devices within the computing device, and a mass storage device 2008 for storing an operating system 2005, application programs 2006, and other program modules 2007.

The basic input/output system 2004 includes a display 2009 for displaying information and an input device 2010, such as a mouse or keyboard, for user input of information. The display 2009 and the input device 2010 are both connected to the processor 2001 through the basic input/output system 2004, which is connected to the system bus 2000. The basic input/output system 2004 may also include an input/output controller for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller may also provide output to a display screen, a printer, or another type of output device.

The mass storage device 2008 is connected to the processor 2001 through a mass storage controller (not shown) connected to the system bus 2000. The mass storage device 2008 and its associated computer-readable media provide non-volatile storage for the computing device. That is, the mass storage device 2008 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.

According to various embodiments of the present application, the computing device may also be operated through a remote computer connected to it via a network, such as the Internet. That is, the computing device may be connected to the network 2011 via the communication interface 2003 coupled to the system bus 2000, or the communication interface 2003 may be used to connect to other types of networks or remote computer systems (not shown).

Based on the same inventive concept, the present application also provides a storage medium, which may be a computer-readable storage medium, and the storage medium stores computer instructions, which, when executed on a computer, cause the computer to perform the steps of the natural language identification method in the task-based dialog system as described above.
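As a rough illustration only, the steps that such stored instructions would carry out (matching a target skill, matching a target intention, extracting slot information, and assembling the recognition result) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the present application: a simple token-overlap score stands in for the trained skill classification and intent recognition models, a list of known surface forms stands in for the slot information extraction condition, and every name in it (`overlap`, `Skill`, `Intent`, `Slot`, `recognize`, and the example skills) is hypothetical.

```python
from dataclasses import dataclass, field


def overlap(text: str, description: str) -> float:
    """Toy matching score: fraction of description tokens present in the text.

    Assumption: stands in for a trained model's similarity probability.
    """
    text_tokens = set(text.lower().split())
    desc_tokens = set(description.lower().split())
    if not desc_tokens:
        return 0.0
    return len(text_tokens & desc_tokens) / len(desc_tokens)


@dataclass
class Slot:
    description: str
    values: list  # hypothetical extraction condition: known surface forms


@dataclass
class Intent:
    description: str
    slots: dict = field(default_factory=dict)


@dataclass
class Skill:
    description: str
    intents: dict = field(default_factory=dict)


def recognize(utterance: str, skills: dict) -> dict:
    # Step 1: pick the skill whose description best matches the utterance.
    skill_name = max(skills, key=lambda n: overlap(utterance, skills[n].description))
    skill = skills[skill_name]

    # Step 2: among that skill's alternative intents, pick the best match.
    intent_name = max(
        skill.intents, key=lambda n: overlap(utterance, skill.intents[n].description)
    )
    intent = skill.intents[intent_name]

    # Step 3: for each alternative slot of the intent, extract slot information.
    lowered = utterance.lower()
    slot_info = {
        name: next((v for v in slot.values if v in lowered), None)
        for name, slot in intent.slots.items()
    }

    # Step 4: assemble the recognition result from the pieces above.
    return {"skill": skill_name, "intent": intent_name, "slots": slot_info}


# Hypothetical example data, invented for illustration.
skills = {
    "weather": Skill(
        "weather forecast rain temperature",
        {
            "query_weather": Intent(
                "what is the weather",
                {"city": Slot("city name", ["beijing", "shenzhen"])},
            ),
            "query_air": Intent("air quality index"),
        },
    ),
    "music": Skill(
        "play music song",
        {"play_song": Intent("play a song", {"artist": Slot("singer", ["jay chou"])})},
    ),
}

result = recognize("what is the weather in Shenzhen", skills)
```

In this sketch each stage narrows the search space of the next one: only the intents of the selected skill are scored, and only the slots of the selected intent are extracted.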

Based on the same inventive concept, the embodiment of the present application further provides a chip system, where the chip system includes a processor and may further include a memory, and is configured to implement the steps of the natural language identification method in the task-based dialog system. The chip system may be formed by a chip, and may also include a chip and other discrete devices.

In some possible implementations, various aspects of the natural language identification method in the task-based dialog system provided by the embodiments of the present application can also be implemented in the form of a program product including program code which, when the program product is run on a computer, causes the computer to perform the steps of the natural language identification method in the task-based dialog system according to the various exemplary implementations of the present application described above.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A method for natural language recognition in a task-based dialog system, the method comprising:

and acquiring the conversational task recognition result of the natural language to be recognized according to the skill description information of the target skill, the intention description information of the target intention, the slot description information of each alternative slot and the slot information thereof.

2. The method of claim 1, wherein the target skill is obtained based on a trained skill classification model, the target intent is obtained based on a trained intent recognition model, and the slot information is obtained based on a trained slot extraction model, wherein:

3. The method of claim 2, wherein the skill classification model comprises a first vector representation module and a first classification module, wherein the skill classification corpus samples for training the skill classification model at least comprise one sample for each trained conversational task, each skill classification corpus sample is labeled with a similar probability value of the conversational task to which it belongs, and the trained skill classification model is trained by:

4. The method according to claim 3, wherein the target skills are obtained based on a trained skill classification model, comprising in particular:

5. The method of claim 2, wherein the intent recognition model comprises a second vector representation module and a second classification module, wherein the intent recognition corpus samples for training the intent recognition model comprise at least one sample for each of the trained conversational tasks, wherein each of the intent recognition corpus samples is labeled with a similar probability value of the candidate intent, and wherein the trained intent recognition model is trained by:

6. The method according to claim 5, wherein the target intent is obtained based on a trained intent recognition model, specifically comprising:

7. The method according to claim 2, wherein the slot extraction model comprises a third vector representation module and a slot extraction module, wherein a slot extraction corpus sample for training the slot extraction model at least comprises one sample corresponding to each of the trained dialogue-type tasks, each slot extraction corpus sample is labeled with reference slot position information of the candidate slot to which the slot belongs, and the trained slot extraction model is trained by:

8. The method of claim 7, wherein the slot information is obtained based on a trained slot extraction model, comprising:

9. A method for natural language tagging in a task-based dialog system, comprising:

10. The method of claim 9, wherein the target skill is obtained based on a trained skill classification model, the target intent is obtained based on a trained intent recognition model, the slot information is obtained based on a trained slot extraction model, wherein:

11. The method of claim 9, further comprising:

12. A natural language recognition apparatus in a task-based dialog system, comprising:

13. A natural language labeling apparatus in a task-based dialog system, comprising:

14. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any one of claims 1 to 8, or 9 to 11 are performed when the program is executed by the processor.

15. A computer-readable storage medium having stored thereon computer instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1-8, or 9-11.