CN115221291A - Method, device, equipment and storage medium for recognizing intention of dialogue information - Google Patents


Info

Publication number
CN115221291A
CN115221291A (application CN202110404810.3A)
Authority
CN
China
Prior art keywords
feature, classification, dimensional, information, text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110404810.3A
Other languages
Chinese (zh)
Inventor
侯政旭
刘亚飞
欧子菁
赵瑞辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110404810.3A
Publication of CN115221291A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G06F 16/35 Clustering; Classification
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a method, an apparatus, a device, and a storage medium for recognizing the intention of dialog information, belonging to the technical field of artificial intelligence. The method comprises the following steps: acquiring a dialog text to be recognized; performing feature extraction processing on the dialog text to obtain feature information of the dialog text; performing multi-dimensional classification coding on the feature information, converting the plurality of discrete feature variables in the feature information into continuous feature variables, and obtaining multi-dimensional feature vectors of the feature information; classifying the multi-dimensional feature vectors respectively to obtain a multi-dimensional classification result of the feature information; and performing fusion recognition processing on the multi-dimensional classification result to determine an intention recognition result of the dialog text. Because the intention recognition result is determined through analysis across multiple dimensions, its accuracy can be improved; moreover, converting the plurality of discrete feature variables into multi-dimensional continuous feature variables effectively reduces information loss in subsequent feature processing.

Description

Method, device, equipment and storage medium for recognizing intention of dialogue information
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for recognizing an intention of dialog information.
Background
Task-oriented dialog refers to an automated communication dialog between an artificial intelligence and a user. In a task-oriented dialog, intention recognition must be performed on the dialog text input by the user in order to feed back an appropriate response text.
In the related art, an automatic response system determines fixed response texts in advance. After acquiring the dialog text input by a user, it determines keywords in the dialog text according to preset keyword extraction rules, determines the next response text corresponding to those keywords, and then displays that response text to the user, thereby realizing an automatic dialog.
However, in the above related art, the keywords cannot completely and accurately reflect the semantic information of the dialog text input by the user, resulting in inaccurate intention recognition of the dialog text.
Disclosure of Invention
The embodiments of the present application provide a method, an apparatus, a device, and a storage medium for intention recognition of dialog information, which can improve the accuracy of intention recognition results. The technical scheme is as follows:
according to an aspect of an embodiment of the present application, there is provided a method for recognizing an intention of dialog information, the method including:
acquiring a dialog text to be identified;
carrying out feature extraction processing on the dialog text to obtain feature information of the dialog text; wherein the feature information comprises a plurality of discrete feature variables;
carrying out multi-dimensional classification coding on the feature information, converting a plurality of discrete feature variables in the feature information into continuous feature variables, and obtaining a multi-dimensional feature vector of the feature information; wherein the multidimensional feature vector comprises a plurality of continuous feature variables of different dimensions;
classifying the multi-dimensional feature vectors respectively to obtain multi-dimensional classification results of the feature information, wherein the multi-dimensional classification results comprise a plurality of classification results of different dimensions;
and performing fusion recognition processing on the multi-dimensional classification result to determine an intention recognition result of the dialog text.
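The claimed steps can be sketched end to end as follows. All function and intent names are illustrative stand-ins (a hash-based toy encoder, a threshold classifier, and a majority vote), not the patent's actual models:

```python
from collections import Counter

def extract_features(dialog_text):
    # Feature extraction: the discrete feature variables, here simply tokens.
    return dialog_text.split()

def encode_multidim(features, dims=3):
    # Multi-dimensional classification coding: one continuous vector per
    # dimension. A hash-based toy embedding stands in for the learned encoder.
    return [[(hash((d, f)) % 1000) / 1000.0 for f in features]
            for d in range(dims)]

def classify_dim(vector):
    # Per-dimension classification; a threshold on the mean is a stand-in.
    return "intent_A" if sum(vector) / len(vector) > 0.5 else "intent_B"

def fuse(results):
    # Fusion recognition, here reduced to a simple majority vote.
    return Counter(results).most_common(1)[0][0]

def recognize_intent(dialog_text):
    features = extract_features(dialog_text)      # feature extraction
    vectors = encode_multidim(features)           # multi-dimensional coding
    per_dim = [classify_dim(v) for v in vectors]  # per-dimension classification
    return fuse(per_dim)                          # fusion recognition

result = recognize_intent("when does the clinic open")
```

In the actual method each of these stages would be a learned model component rather than a hard-coded rule.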
According to an aspect of an embodiment of the present application, there is provided a method for training an intention recognition model, the method including:
obtaining at least one training sample of an intention recognition model, wherein the training sample comprises a sample dialogue text and label information corresponding to the sample dialogue text;
carrying out feature extraction processing on the sample dialogue text to obtain feature information of the sample dialogue text; wherein the feature information comprises a plurality of discrete feature variables;
carrying out multi-dimensional classification coding on the feature information, converting a plurality of discrete feature variables in the feature information into continuous feature variables, and obtaining a multi-dimensional feature vector of the feature information; the multi-dimensional feature vector comprises a plurality of feature vectors with different dimensions, and the feature vectors are continuous feature variables;
classifying the multi-dimensional feature vectors respectively to obtain multi-dimensional classification results of the feature information, wherein the multi-dimensional classification results comprise a plurality of classification results of different dimensions;
performing fusion recognition processing on the multi-dimensional classification result, and determining an intention recognition result of the sample dialog text;
calculating a loss function value of the intention recognition model based on the multi-dimensional classification result, the intention recognition result and label information corresponding to the sample dialog text;
adjusting parameters of an intent recognition model of the dialog information based on the loss function value.
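Under one plausible reading of the loss computation above (per-dimension classification losses plus a loss on the fused recognition result, each measured against the sample's label information), the loss function value could be combined as below. The cross-entropy form and the weighting factor `alpha` are assumptions for illustration, not values specified by the text:

```python
import math

def cross_entropy(probs, label_idx):
    # Negative log-likelihood of the ground-truth class.
    return -math.log(max(probs[label_idx], 1e-12))

def total_loss(dim_probs, fused_probs, label_idx, alpha=0.5):
    # Sum the per-dimension classification losses, then add the weighted
    # loss of the fused intention recognition result.
    dim_loss = sum(cross_entropy(p, label_idx) for p in dim_probs)
    return dim_loss + alpha * cross_entropy(fused_probs, label_idx)

loss = total_loss(
    dim_probs=[[0.7, 0.3], [0.6, 0.4]],  # classifier outputs for two dimensions
    fused_probs=[0.8, 0.2],              # fused intent distribution
    label_idx=0,                         # ground-truth intent from the label info
)
```

The resulting scalar would then drive an ordinary gradient-based parameter update of the model.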
According to an aspect of an embodiment of the present application, there is provided an intention recognition apparatus for dialog information, the apparatus including:
the text acquisition module is used for acquiring a dialog text to be identified;
the feature extraction module is used for carrying out feature extraction processing on the dialogue text to obtain feature information of the dialogue text; wherein the feature information comprises a plurality of discrete feature variables;
the characteristic coding module is used for carrying out multi-dimensional classified coding on the characteristic information, converting a plurality of discrete characteristic variables in the characteristic information into continuous characteristic variables and obtaining multi-dimensional characteristic vectors of the characteristic information; wherein the multidimensional feature vector comprises a plurality of continuous feature variables of different dimensions;
the characteristic classification module is used for respectively classifying the multi-dimensional characteristic vectors to obtain multi-dimensional classification results of the characteristic information, wherein the multi-dimensional classification results comprise a plurality of classification results with different dimensions;
and the result fusion module is used for carrying out fusion recognition processing on the multi-dimensional classification result and determining the intention recognition result of the dialog text.
According to an aspect of an embodiment of the present application, there is provided an apparatus for training an intention recognition model, the apparatus including:
the system comprises a sample acquisition module, a recognition module and a recognition module, wherein the sample acquisition module is used for acquiring at least one training sample of an intention recognition model, and the training sample comprises a sample dialogue text and label information corresponding to the sample dialogue text;
the characteristic acquisition module is used for carrying out characteristic extraction processing on the sample conversation text to obtain characteristic information of the sample conversation text; wherein the feature information comprises a plurality of discrete feature variables;
the vector acquisition module is used for carrying out multi-dimensional classified coding on the feature information, converting a plurality of discrete feature variables in the feature information into continuous feature variables and obtaining multi-dimensional feature vectors of the feature information; the multi-dimensional feature vector comprises a plurality of feature vectors with different dimensions, and the feature vectors are continuous feature variables;
the vector classification module is used for respectively classifying the multi-dimensional feature vectors to obtain a multi-dimensional classification result of the feature information, wherein the multi-dimensional classification result comprises a plurality of classification results with different dimensions;
the result acquisition module is used for performing fusion recognition processing on the multi-dimensional classification result and determining an intention recognition result of the sample dialog text;
a function value determining module, configured to calculate a loss function value of the intention recognition model based on the multi-dimensional classification result, the intention recognition result, and label information corresponding to the sample dialog text;
and the parameter adjusting module is used for adjusting the parameters of the intention recognition model of the dialogue information based on the loss function value.
According to an aspect of an embodiment of the present application, there is provided a computer device including a processor and a memory, where at least one instruction, at least one program, a code set, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the intent recognition method for dialog information described above or to implement the training method for the intent recognition model described above.
According to an aspect of the embodiments of the present application, a computer-readable storage medium is provided, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the method for intention recognition of dialog information described above, or to implement the method for training the intention recognition model described above.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the intention recognition method for the dialogue information or implements the training method for the intention recognition model.
The technical scheme provided by the embodiment of the application can bring the following beneficial effects:
By performing multi-dimensional classification coding and multi-dimensional classification processing on the dialog text, a multi-dimensional classification result of the dialog text is determined; the multi-dimensional classification result is then fused and recognized to determine the intention recognition result of the dialog text. Determining the intention recognition result through analysis across multiple dimensions can improve its accuracy. Moreover, during classification coding, multiple discrete feature variables are converted into multi-dimensional continuous feature variables, which effectively reduces information loss in subsequent feature processing and further improves the accuracy and reliability of the intention recognition result.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; other drawings can be obtained by those skilled in the art from these drawings without creative effort.
FIG. 1 is a flow chart of an intent recognition method for dialog information provided by an embodiment of the present application;
FIG. 2 is a flow chart of a method for intent recognition of dialog messages according to another embodiment of the present application;
FIG. 3 is a flow diagram of a method for training an intent recognition model according to one embodiment of the present application;
FIG. 4 illustrates a schematic diagram of the structure of an intent recognition model;
FIG. 5 is a schematic diagram illustrating the structure of a VAE model;
FIG. 6 illustrates a diagram of an application of an intent recognition model in a medical application scenario;
FIG. 7 is a block diagram of an intent recognition apparatus for dialog information provided in one embodiment of the present application;
FIG. 8 is a block diagram of an intent recognition apparatus for dialog information provided in accordance with another embodiment of the present application;
FIG. 9 is a block diagram of an apparatus for training an intent recognition model provided in one embodiment of the present application;
FIG. 10 is a block diagram of a training apparatus for an intent recognition model provided in one embodiment of the present application;
fig. 11 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, the following detailed description of the embodiments of the present application will be made with reference to the accompanying drawings.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline involving a wide range of fields, covering both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly comprise computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers using natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics; research in this field involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other subjects. It studies how a computer can simulate or realize human learning behavior in order to acquire new knowledge or skills, and how it can reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and development of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart medical services, smart customer service, and the like.
The scheme provided by the embodiments of the present application relates to artificial intelligence technologies such as machine learning: an intention recognition model is trained using training samples, where each training sample comprises a sample dialog text and the label information corresponding to that sample dialog text. After training of the intention recognition model is completed, a dialog text is input into the model, the model determines the intention recognition result of the dialog text, and the computer device selects a response text corresponding to the dialog text from a dialog text library according to the intention recognition result, realizing an automatic dialog between the user and the intelligent customer service.
It should be noted that the intention recognition model provided by the present application can be widely applied to various application scenarios. The method comprises the following specific steps:
(1) In a medical application scenario, a user can learn the relevant regulations of a hospital through the hospital's automatic response system. The user inputs the content he or she wants to know into the hospital's automatic question-answering system, which generates a dialog text from that content. The dialog text is processed by the intention recognition model to determine its intention recognition result; a response text associated with that result is then selected from a dialog text library and displayed to the user, so that the user can communicate with the automatic response system through response texts. Owing to the particularity of the medical application scenario, the automatic response system can effectively provide basic medical services to users, which indirectly supports the reasonable arrangement of medical resources and ensures that sufficient medical resources can be arranged in time when an emergency occurs.
(2) In a traffic application scenario, a user plans a reasonable route through the automatic response system of a vehicle-mounted terminal. The user inputs the place to be visited into the automatic response system, which generates a dialog text to be recognized from the input place in combination with the travel domain to which the vehicle-mounted terminal belongs. The dialog text is processed by the intention recognition model to determine its intention recognition result; a response text associated with that result is then selected from a dialog text library and displayed to the user, so that the user can plan a reasonable route according to the information provided by the response text.
(3) In a shopping application scenario, a user purchases required items through the automatic response system of a shopping application program. The user inputs the currently required service function into the automatic response system, which generates a dialog text accordingly. The dialog text is processed by the intention recognition model to determine its intention recognition result; a response text associated with that result is then selected from a dialog text library and displayed to the user. It should be noted that shopping in this scenario refers to any consumption of real or virtual items.
Of course, the intention recognition model of dialog information in the present application can also be applied to other various fields, which are not exemplified here.
For convenience of description, in the following method embodiments, each step is described as being executed by a computer device, which may be any electronic device with computing and storage capabilities. For example, the computer device may be a server: an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. For another example, the computer device may also be a terminal, such as, but not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, or a smart watch. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application. It should be noted that the steps may all be executed by the same computer device, or by a plurality of different computer devices cooperating interactively, which is not limited herein. It should also be noted that the execution subject of the intention recognition method for dialog information described below may be the same computer device as the execution subject of the training method for the intention recognition model described below, or a different one; the embodiments of the present application are not limited in this respect.
The technical solution of the present application will be described in detail with reference to several embodiments.
Referring to fig. 1, a flowchart of an intention recognition method for dialog information according to an embodiment of the present application is shown. The method may comprise the following steps (101-105):
step 101, a dialog text to be recognized is obtained.
The dialog text refers to text obtained from the content input by the user. Optionally, the content input by the user includes text information, voice information, dynamic image information, static image information, and the like, which is not limited in the embodiments of the present application.
In one possible embodiment, the computer device directly uses the content entered by the user as the dialog text to be recognized.
In another possible implementation, after acquiring the content input by the user, the computer device performs optimization processing on the content, and then takes the content after optimization processing as the dialog text to be recognized. Wherein the optimization process includes, but is not limited to, at least one of the following: extracting text information in the image information, converting voice information into text information, removing invalid contents (such as punctuation marks, blank spaces, meaningless auxiliary words and the like) in the text information, and the like.
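A minimal sketch of the invalid-content removal described above; the exact cleanup rules (strip punctuation, collapse whitespace) are illustrative assumptions rather than the patent's specification:

```python
import re

def optimize_text(raw):
    # Strip punctuation marks and collapse redundant whitespace so that only
    # meaningful text reaches the recognition stage.
    text = re.sub(r"[^\w\s]", "", raw)
    return re.sub(r"\s+", " ", text).strip()

cleaned = optimize_text("  What time   does the clinic open?? ")
```

Speech-to-text and image-text extraction would precede this step when the input is not already plain text.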
In the embodiment of the application, after detecting the content input by the user, the computer device acquires the dialog text to be recognized, and further determines the user intention according to the intention recognition result of the dialog text. The user intention means a meaning that the user expects to express through the dialogue text.
Optionally, the computer device may obtain the dialog text to be recognized in real time, or may obtain it at certain time intervals. In practical application, the computer device determines the acquisition mode of the dialog text to be recognized according to the actual application scenario.
In a possible implementation manner, the actual application scenario is an instant conversation scenario, and the computer device obtains the conversation text to be recognized in real time. The instant conversation scene refers to a scene in which a user needs to respond to the content input by the user in time, such as registration, dish ordering, real-time retrieval and the like. Alternatively, in this case, when it is detected that the user provides new input content, the computer device obtains the content input by the user from itself or another device in real time, and generates the dialog text to be recognized according to the content.
In another possible implementation, the actual application scenario is a non-instant dialog scenario, and the computer device obtains the dialog text to be recognized at certain time intervals. A non-instant dialog scenario is one that does not require a timely response to the content input by the user, such as disease inquiry or route planning. Optionally, in this case, the computer device acquires the target content input by the user from itself or another device at certain time intervals, and generates the dialog text to be recognized based on the target content. The target content may include a plurality of contents input by a single user or a plurality of different contents input by a plurality of users, which is not limited in the embodiments of the present application. It should be noted that the time interval may be any value, such as 1 s, 1 min, 1 h, 1 day, or 1 month, and the computer device may flexibly set and adjust it according to the actual situation, which is not limited in the embodiments of the present application.
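The interval-based acquisition mode might look like the following sketch, where the `fetch` callable, the polling interval, and the number of rounds are all illustrative parameters rather than anything the text specifies:

```python
import time

def poll_inputs(fetch, interval_s=60.0, rounds=3):
    # Collect user inputs at fixed time intervals for a non-instant dialog
    # scene; each batch would then be turned into a dialog text to recognize.
    texts = []
    for _ in range(rounds):
        texts.extend(fetch())
        time.sleep(interval_s)
    return texts

collected = poll_inputs(lambda: ["where is the pharmacy"],
                        interval_s=0.0, rounds=2)
```

An instant dialog scene would instead react to an input event immediately rather than on a timer.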
In actual application, the computer device may adopt different acquisition modes for the dialog text to be recognized in different situations within a single dialog. For example, in the medical field, during a disease inquiry stage, the corresponding disease needs to be determined according to the disease description information provided by the user. At this stage the requirement for immediacy of the dialog scene is not high, so the computer device obtains the disease description information at certain time intervals and determines the dialog text to be recognized from it; after the user intention is determined, a disease inquiry unit determines a disease identification result from the disease description information. Then, after the computer device provides the disease description result to the user, the dialog scene changes into an instant response scene: the intelligent customer service responds to the user's next question in real time, and at this point the computer device acquires the content input by the user in real time and determines the dialog text to be recognized accordingly.
Step 102, performing feature extraction processing on the dialog text to obtain feature information of the dialog text.
In the embodiment of the application, after acquiring the dialog text to be recognized, the computer device performs feature extraction processing on the dialog text to obtain feature information of the dialog text. Wherein, the characteristic information comprises a plurality of discrete characteristic variables.
Optionally, after acquiring the dialog text, the computer device performs feature extraction processing on the dialog text, acquires features included in the dialog text, and generates feature information composed of the features. At this time, each feature is independent and discrete in the feature information, that is, the feature is a discrete feature variable.
In one possible implementation, the computer device obtains the feature information through text preprocessing. Optionally, after obtaining the dialog text, the computer device performs word segmentation on the dialog text to obtain each entity in the dialog text, selects the important entities as keywords according to the importance degree of each entity, and determines the features included in the dialog text according to the keywords. The importance degree of an entity is obtained statistically according to the actual meaning of the entity and the usage of the entity in big data.
In another possible implementation, the computer device obtains the feature information through a feature extraction network. Optionally, after acquiring the dialog text, the computer device inputs the dialog text into the feature extraction network, and then acquires feature information output by the feature extraction network.
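The preprocessing route described above (word segmentation, importance-based keyword selection) can be sketched as follows. This is a minimal illustration: the importance table, the scores, and the `extract_features` name are all invented for the sketch; the embodiment obtains importance statistically from big data and does not fix a concrete procedure.

```python
# Hypothetical sketch of the text-preprocessing route: segment the dialog
# text into entities, rank them by a (stubbed) importance score, and keep
# the top-ranked entities as the discrete feature variables.

# Stand-in importance table; in the embodiment this would be obtained
# statistically from each entity's meaning and its usage in big data.
IMPORTANCE = {"fever": 0.9, "cough": 0.8, "today": 0.1, "i": 0.05, "have": 0.05, "a": 0.01}

def extract_features(dialog_text, top_k=2):
    # Word segmentation (whitespace split stands in for a real segmenter).
    entities = dialog_text.lower().split()
    # Rank entities by importance, highest first.
    scored = sorted(((IMPORTANCE.get(e, 0.0), e) for e in entities), reverse=True)
    # Keep the top-k entities with nonzero importance as keywords.
    return [e for score, e in scored[:top_k] if score > 0]

features = extract_features("I have a fever cough today")  # each keyword is one discrete feature variable
```

Each returned keyword plays the role of one discrete feature variable in the feature information.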
Step 103, performing multi-dimensional classification coding on the feature information, and converting a plurality of discrete feature variables in the feature information into continuous feature variables to obtain a multi-dimensional feature vector of the feature information.
In the embodiment of the application, after the computer device obtains the feature information, the computer device performs multi-dimensional classification coding on the feature information, converts a plurality of discrete feature variables in the feature information into continuous feature variables, and obtains a multi-dimensional feature vector of the feature information.
The multi-dimensional feature vector comprises a plurality of continuous feature variables with different dimensions. Optionally, in this embodiment of the present application, the multidimensional feature vector includes a domain feature vector, an action feature vector, and a noun feature vector.
Optionally, in this embodiment of the application, after obtaining the feature information, the computer device uses a domain encoder to perform classified encoding on the feature information from a domain dimension, and further obtains a domain feature vector output by the domain encoder. Wherein the domain feature vector is used for indicating the domain to which the dialog text belongs. Illustratively, the domain may be a wide range of domains, such as medical, transportation, education, and the like; alternatively, the field may be a small-scale field, such as a cold, a fever, a bus trip, a walk, an examination, a book borrowing, and the like, which is not limited in the embodiment of the present application.
Optionally, in this embodiment of the application, after obtaining the feature information, the computer device uses an action encoder to perform classification coding on the feature information from the action dimension, and obtains the action feature vector output by the action encoder. The action feature vector is used to indicate an action to be performed by the inputter of the dialog text (i.e., the user). Illustratively, the action may be an action directly contained in the dialog text, such as "see" in the dialog text "see book"; or, the action may be an action indirectly implied by the dialog text, such as "see, borrow, return, etc." implied by the dialog text "book". In the latter case, the subsequently obtained intention recognition result for the dialog text may include a plurality of intention recognition results, which are presented to the user, and the user selects his or her actual intention according to the actual situation. The action to be performed may be an action that the user needs to perform himself or herself, or a service action that the user needs another person to perform, which is not limited in this embodiment of the present application.
Optionally, in this embodiment of the present application, after obtaining the feature information, the computer device uses an entity noun encoder to perform classification coding on the feature information from the entity noun dimension, and obtains the entity noun feature vector output by the entity noun encoder. The entity noun feature vector is used to indicate the entity nouns associated with the field and the action, and the number of entity nouns may be one or more, which is not limited in the embodiments of the present application. Illustratively, the entity noun may be an entity noun directly contained in the dialog text, such as "book" in the dialog text "see book"; or, the entity noun may be an entity noun indirectly implied by the dialog text, such as "book, tv play, video, etc." implied by the dialog text "see". In the latter case, the subsequently obtained intention recognition result for the dialog text may include a plurality of intention recognition results, which are presented to the user, and the user selects his or her actual intention according to the actual situation.
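The three per-dimension encoders described above can be sketched as follows, with toy randomly initialized embedding tables standing in for trained encoder networks. All names, sizes, and values here are illustrative assumptions, not part of the embodiment.

```python
# Sketch: encode the same discrete feature variables into three separate
# continuous vectors, one per dimension (domain, action, entity noun).
# Each "encoder" is an embedding table plus mean pooling; a real encoder
# network would learn its parameters during training.
import random

random.seed(0)
DIMENSIONS = ("domain", "action", "entity_noun")

def make_encoder(dim_size=4):
    table = {}  # per-dimension embedding table (learned in a real model)
    def encode(features):
        vecs = []
        for f in features:
            if f not in table:  # lazily initialize one embedding per feature
                table[f] = [random.uniform(-1, 1) for _ in range(dim_size)]
            vecs.append(table[f])
        # Mean-pool the per-feature embeddings into one continuous vector.
        return [sum(col) / len(vecs) for col in zip(*vecs)]
    return encode

encoders = {name: make_encoder() for name in DIMENSIONS}
# One continuous feature vector per dimension for the same discrete features.
multi_dim_vector = {name: enc(["fever", "cough"]) for name, enc in encoders.items()}
```

The resulting `multi_dim_vector` plays the role of the multi-dimensional feature vector: one continuous vector each for the domain, action, and entity noun dimensions.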
Step 104, performing classification processing on the multi-dimensional feature vectors respectively to obtain a multi-dimensional classification result of the feature information.
In the embodiment of the present application, after the computer device obtains the multi-dimensional feature vectors, the multi-dimensional feature vectors are respectively classified to obtain a multi-dimensional classification result of feature information. Wherein the multi-dimensional classification result comprises a plurality of classification results of different dimensions.
Optionally, the multidimensional classification result includes a domain classification result, an action classification result, and an entity noun classification result, corresponding to the multidimensional feature vector in the foregoing.
Step 105, performing fusion recognition processing on the multi-dimensional classification result to determine an intention recognition result of the dialog text.
In the embodiment of the application, after the computer device obtains the multi-dimensional classification result, fusion recognition processing is performed on the multi-dimensional classification result, and an intention recognition result of the dialog text is determined. Wherein the intention recognition result is used for indicating the dialog intention of the user aiming at the dialog text.
Optionally, in a case that the multi-dimensional classification result includes a domain classification result, an action classification result, and an entity noun classification result, the computer device performs superposition recognition processing on the domain classification result, the action classification result, and the entity noun classification result, and then determines the intention recognition result of the dialog text according to the superposition recognition result.
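A minimal sketch of the superposition step, assuming the simplest possible fusion operator (concatenation of the three per-dimension labels in "domain-action-entity noun" form). The embodiment does not fix a concrete fusion operator; `fuse` and the label strings are invented for illustration.

```python
# Toy superposition-recognition step: fuse the three per-dimension
# classification results into a single intention label, using "empty"
# for any dimension that could not be classified.
def fuse(domain_result, action_result, noun_result):
    parts = [domain_result or "empty", action_result or "empty", noun_result or "empty"]
    return "-".join(parts)

intent = fuse("medical", "inquire", "fever")
```

This concatenated form also matches the "domain-action-entity noun" marking used later for fast response-text lookup.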
In summary, in the technical scheme provided by the embodiment of the application, the multi-dimensional classification result of the dialog text is determined through multi-dimensional classification coding and multi-dimensional classification processing of the dialog text, and fusion recognition processing is then performed on the multi-dimensional classification result to determine the intention recognition result of the dialog text. Determining the intention recognition result through analysis of multiple dimensions can improve the accuracy of the intention recognition result. Moreover, during classification coding, a plurality of discrete feature variables are converted into multi-dimensional continuous feature variables, which can effectively reduce information loss in the subsequent feature processing and thereby improve the accuracy and reliability of the intention recognition result.
Next, the manner of obtaining the target dimension classification result is described in terms of the target dimension. The target dimension may be any dimension, such as the above domain dimension, the action dimension, the noun dimension, and the like, which is not limited in the embodiments of the present application.
In an exemplary embodiment, the above step 104 includes the following steps:
1. for the target dimension, processing the target dimension feature vector by using a mean calculation network to obtain a mean vector of the target dimension feature vector;
2. processing the target dimension feature vector by using a variance calculation network to obtain a variance vector of the target dimension feature vector;
3. performing noise superposition processing on the variance vector by using target noise to obtain a processed variance vector;
4. determining a feature vector to be classified corresponding to the target dimension feature vector based on the mean vector and the processed variance vector;
5. classifying the feature vector to be classified to obtain a target dimension classification result of the feature information.
The mean vector is used to indicate the distribution mean of the target dimension feature vector over space. The variance vector is used to indicate the distribution variance of the target dimension feature vector over space. In the embodiment of the present application, after obtaining a target dimension feature vector, the computer device processes the target dimension feature vector with the mean calculation network to obtain its mean vector, and processes the target dimension feature vector with the variance calculation network to obtain its variance vector.
Then, the computer device performs noise superposition processing on the variance vector using target noise to obtain a processed variance vector, determines the feature vector to be classified corresponding to the target dimension feature vector based on the mean vector and the processed variance vector, and performs classification processing on the feature vector to be classified to obtain the target dimension classification result of the feature information. The target noise may be any noise satisfying a normal distribution, such as Gaussian noise.
Optionally, when performing classification processing on the feature vectors to be classified, the computer device uses classifiers of different dimensions for the feature vectors to be classified of different dimensions. Exemplarily, for the domain dimension, the computer device uses a domain classifier to classify the feature vector to be classified of the domain dimension to obtain the domain classification result; for the action dimension, the computer device uses an action classifier to classify the feature vector to be classified of the action dimension to obtain the action classification result; for the entity noun dimension, the computer device uses an entity noun classifier to classify the feature vector to be classified of the entity noun dimension to obtain the entity noun classification result.
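Steps 1 to 5 above can be sketched for a single target dimension as follows. The mean network, variance network, and classifier are toy stand-ins invented for the sketch; only the noise-superposition structure (mean vector plus noise-scaled variance vector, with noise drawn from a normal distribution) follows the description above.

```python
# Sketch of steps 1-5 for one target dimension: mean vector, variance
# vector, Gaussian target noise, feature vector to be classified, and
# classification.
import random

random.seed(42)

def mean_net(x):
    # Stand-in for the mean calculation network.
    return [0.5 * v for v in x]

def var_net(x):
    # Stand-in for the variance calculation network (kept positive).
    return [abs(v) + 0.1 for v in x]

def to_classify(feature_vec):
    h = mean_net(feature_vec)                            # step 1: mean vector
    sigma = var_net(feature_vec)                         # step 2: variance vector
    eps = [random.gauss(0.0, 1.0) for _ in feature_vec]  # step 3: target noise ~ N(0, I)
    # Steps 3-4: superpose the noise on the variance vector and combine
    # with the mean vector to get the feature vector to be classified.
    return [hi + si * ei for hi, si, ei in zip(h, sigma, eps)]

def classify(vec, labels=("disease_inquiry", "path_planning")):
    # Step 5: toy per-dimension classifier.
    return labels[0] if sum(vec) >= 0 else labels[1]

s = to_classify([0.2, -0.4, 0.9])
result = classify(s)
```

Sampling the noise per call means the feature vector to be classified varies around the mean vector, which is the intended effect of the noise superposition processing.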
Referring to fig. 2, a flowchart of an intention recognition method for dialog information according to another embodiment of the present application is shown. The method can comprise the following steps (201-208):
step 201, obtaining a dialog text to be recognized.
Step 202, performing feature extraction processing on the dialog text to obtain feature information of the dialog text.
Step 203, performing multi-dimensional classification coding on the feature information, and converting a plurality of discrete feature variables in the feature information into continuous feature variables to obtain a multi-dimensional feature vector of the feature information.
Step 204, performing classification processing on the multi-dimensional feature vectors respectively to obtain a multi-dimensional classification result of the feature information.
Step 205, performing fusion recognition processing on the multi-dimensional classification result to determine an intention recognition result of the dialog text.
The above steps 201 to 205 are the same as steps 101 to 105 in the embodiment of fig. 1, and are specifically referred to the embodiment of fig. 1, which is not repeated herein.
Step 206, acquiring, according to the intention recognition result, a response text associated with the intention recognition result from a dialog text library as a candidate response text.
In the embodiment of the application, after acquiring the intention recognition result, the computer device acquires, according to the intention recognition result, the response text associated with the intention recognition result from the dialog text library as a candidate response text. Optionally, after obtaining the response text, the computer device may display the response text to the user, and the user then inputs a new dialog text according to the response text in combination with the actual situation.
Optionally, the dialog text library stores response texts corresponding to intention recognition results. In the embodiment of the application, since the intention recognition result is determined by superposition recognition of the domain classification result, the action classification result, and the entity noun classification result, the response texts for intention recognition results are organized in the dialog text library along the same dimensions.
Illustratively, the dialog text library may be organized by field: it stores a response text directly corresponding to a target field (used when the intention recognition result cannot accurately indicate an action and an entity noun), response texts corresponding to each action under the target field (used when the intention recognition result accurately indicates the field and the action but cannot accurately indicate the entity noun), and, under each target action, response texts corresponding to each entity noun (used when the intention recognition result accurately indicates the field, the action, and the entity noun). Alternatively, the dialog text library may be organized by action: it stores a response text directly corresponding to a target action (used when the intention recognition result cannot accurately indicate the field and the entity noun), response texts corresponding to each field under the target action (used when the intention recognition result accurately indicates the field and the action but cannot accurately indicate the entity noun), and, under each target field, response texts corresponding to each entity noun (used when the intention recognition result accurately indicates the field, the action, and the entity noun). Alternatively, the dialog text library may be organized by entity noun: it stores a response text directly corresponding to a target entity noun (used when the intention recognition result cannot accurately indicate the field and the action), response texts corresponding to each field under the target entity noun (used when the intention recognition result accurately indicates the field and the entity noun but cannot accurately indicate the action), and, under each target field, response texts corresponding to each action (used when the intention recognition result accurately indicates the field, the action, and the entity noun). Optionally, the computer device may mark each response text in the form of "domain-action-entity noun" to enable fast lookup of response texts, where a dimension for which a response text has no corresponding value is represented by "empty"; the action and entity noun dimensions are handled in the same way as the domain dimension.
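The "domain-action-entity noun" marking described above can be sketched as a keyed lookup with "empty" as the wildcard value. All keys, response texts, and the fallback order shown here are invented for illustration; the embodiment does not fix a concrete data structure.

```python
# Minimal "domain-action-entity noun" lookup: response texts are keyed
# by the triple, with "empty" standing in for any dimension the
# intention recognition result could not indicate accurately.
DIALOG_TEXT_LIBRARY = {
    ("medical", "inquire", "fever"): "Fever usually suggests ...",
    ("medical", "inquire", "empty"): "What symptom would you like to ask about?",
    ("medical", "empty", "empty"): "How can I help with your medical question?",
}

def find_candidates(domain, action, noun):
    # Fall back from the most specific key to progressively emptier ones.
    for key in (
        (domain, action, noun),
        (domain, action, "empty"),
        (domain, "empty", "empty"),
    ):
        if key in DIALOG_TEXT_LIBRARY:
            return [DIALOG_TEXT_LIBRARY[key]]
    return []
```

Keying by the full triple gives constant-time lookup, which is the "fast search" property the marking is intended to provide.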
Step 207, acquiring a quality evaluation index of each candidate response text in a case that the number of candidate response texts is not unique.
In this embodiment of the application, after acquiring the candidate answer texts, the computer device determines a processing mode for the candidate answer texts based on the number of the candidate answer texts. If the number of the candidate answer texts is unique, the computer equipment directly takes the candidate answer texts as answer texts corresponding to the dialog texts; and if the number of the candidate answer texts is not unique, the computer equipment acquires the quality evaluation index of each candidate answer text, and selects the answer text corresponding to the dialog text from the candidate answer texts based on the quality evaluation index.
The quality evaluation index is used to indicate the quality of a response text. Optionally, the quality evaluation index includes, but is not limited to, at least one of: the adoption rate of the response text, the amount of information contained in the response text, the number of uses of the response text, and the like. The adoption rate of a response text refers to the ratio of the number of accurate responses of the response text to the number of times the response text has been used; an accurate response means that the user continues a smooth conversation after the response text. Exemplarily, in a case that negative keywords such as "not right" or "wrong" do not appear in the dialog text following the response text, it is determined that the user continues a smooth conversation with respect to the response text.
Step 208, selecting, from the candidate response texts, a candidate response text whose quality evaluation index meets a condition as the response text corresponding to the dialog text.
In the embodiment of the application, after acquiring the quality evaluation index, the computer device selects a candidate answer text with the quality evaluation index meeting the condition from the candidate answer texts as an answer text corresponding to the dialog text. Alternatively, the condition may be a condition for a single index in the quality evaluation indexes, or may also be a condition for multiple indexes in the quality evaluation indexes, and a computer device or a worker may flexibly set and adjust the condition according to the actual situation, which is not limited in this embodiment of the present application.
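Steps 207 and 208 can be sketched as follows, under the assumption that the adoption rate is the quality evaluation index and "highest adoption rate" is the selection condition. The embodiment allows other indices and conditions; the function names and statistics here are invented for the sketch.

```python
# Toy candidate selection: if the candidate is unique, use it directly;
# otherwise pick the candidate whose quality evaluation index (here,
# adoption rate = accurate responses / uses) is highest.
def adoption_rate(stats):
    return stats["accurate"] / stats["uses"] if stats["uses"] else 0.0

def select_response(candidates):
    # candidates: list of {"text": ..., "accurate": ..., "uses": ...}
    if len(candidates) == 1:           # unique candidate: use it directly
        return candidates[0]["text"]
    return max(candidates, key=adoption_rate)["text"]

best = select_response([
    {"text": "A", "accurate": 80, "uses": 100},  # adoption rate 0.8
    {"text": "B", "accurate": 30, "uses": 100},  # adoption rate 0.3
])
```

A condition combining multiple indices (e.g. adoption rate weighted with information amount) would only change the `key` function.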
In summary, in the technical scheme provided by the embodiment of the application, the response text corresponding to the dialog text is obtained through the intention recognition result, so that smooth proceeding of automatic dialog in the intelligent service is ensured, and the response text meeting the conditions is selected according to the quality evaluation indexes of the candidate response texts, so that the reliability of the response text is ensured.
Referring to fig. 3, a flowchart of a method for training an intention recognition model according to an embodiment of the present application is shown. The method may comprise the following steps (301-307):
step 301, at least one training sample of an intent recognition model is obtained.
The intention recognition model refers to a deep learning model that determines the intention recognition result of a dialog text from the feature information of the dialog text. In an embodiment of the application, a computer device obtains at least one training sample of the intention recognition model prior to training it. A training sample includes a sample dialog text and label information corresponding to the sample dialog text. Optionally, the computer device obtains the training samples based on historical data of an automatic response system, or retrieves the training samples from a network environment.
Step 302, performing feature extraction processing on the sample dialogue text to obtain feature information of the sample dialogue text.
In this embodiment of the present application, after obtaining the sample dialog text, the computer device performs feature extraction processing on the sample dialog text to obtain feature information of the sample dialog text. Wherein, the characteristic information comprises a plurality of discrete characteristic variables.
Optionally, the computer device performs feature extraction processing on the sample dialog text through a feature extraction layer in the intention recognition model, and obtains the feature information output by the feature extraction layer.
Step 303, performing multidimensional classification coding on the feature information, and converting a plurality of discrete feature variables in the feature information into continuous feature variables to obtain a multidimensional feature vector of the feature information.
In the embodiment of the application, after the computer device obtains the feature information, the computer device performs multi-dimensional classification coding on the feature information, converts a plurality of discrete feature variables in the feature information into continuous feature variables, and obtains a multi-dimensional feature vector of the feature information. Wherein the multi-dimensional feature vector comprises a plurality of continuous feature variables of different dimensions.
Optionally, the computer device separately performs classified encoding on the feature information from different dimensions by using encoders of different dimensions in the intention recognition model, and further separately obtains a plurality of continuous feature variables of different dimensions output by the encoders of different dimensions.
Step 304, performing classification processing on the multi-dimensional feature vectors respectively to obtain a multi-dimensional classification result of the feature information.
In this embodiment, after obtaining the multidimensional feature vector, the computer device performs classification processing on the multidimensional feature vector, respectively, to obtain a multidimensional classification result of feature information. The multi-dimensional classification result comprises a plurality of classification results with different dimensions.
Optionally, when obtaining the multi-dimensional classification result, the computer device processes the multiple dimensional feature vectors respectively through a mean calculation network in the intention recognition model to obtain the mean vector corresponding to each dimensional feature vector, and processes them respectively through a variance calculation network in the intention recognition model to obtain the variance vector corresponding to each dimensional feature vector. Then, through a noise superposition network in the intention recognition model, target noise is superposed based on the mean vector and variance vector corresponding to each dimensional feature vector to determine the feature vector to be classified corresponding to each dimensional feature vector, and the feature vectors to be classified are classified in their corresponding dimensions through classifiers of different dimensions in the intention recognition model to obtain the multi-dimensional classification result. The mean vector is used to indicate the distribution mean of a dimensional feature vector over space, and the variance vector is used to indicate its distribution variance over space.
Step 305, performing fusion recognition processing on the multi-dimensional classification result to determine an intention recognition result of the sample dialog text.
In the embodiment of the application, after the computer device obtains the multi-dimensional classification result, fusion recognition processing is performed on the multi-dimensional classification result, and an intention recognition result of the sample dialog text is determined.
Optionally, the computer device performs an overlay recognition process on the multi-dimensional classification result through a result fusion layer in the intention recognition model, and determines an intention recognition result of the sample dialog text.
And step 306, calculating a loss function value of the intention recognition model based on the multi-dimensional classification result, the intention recognition result and the label information corresponding to the sample dialog text.
In an embodiment of the application, the computer device calculates a loss function value of the intention recognition model based on the multi-dimensional classification result, the intention recognition result, and the label information corresponding to the sample dialog text. Wherein the loss function is used to measure the reliability of the intention recognition model.
Step 307, parameters of the intent recognition model are adjusted based on the loss function values.
In this embodiment, after obtaining the loss function value, the computer device adjusts a parameter of the intention recognition model based on the loss function value, and continues to train the intention recognition model after the parameter adjustment by using the training sample until the loss function value converges.
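The adjust-and-retrain loop of steps 306 and 307 can be sketched schematically as follows. A one-parameter toy loss stands in for the intention recognition model's loss function value; a real model would be updated by backpropagation, and all names and values here are invented for the sketch.

```python
# Schematic training loop: compute the loss function value, adjust the
# parameter based on it, and repeat until the loss value converges.
def loss_fn(param, target=3.0):
    return (param - target) ** 2  # stand-in for the combined loss value

def train(param=0.0, lr=0.1, tol=1e-6, max_steps=1000):
    prev = loss_fn(param)
    for _ in range(max_steps):
        grad = 2 * (param - 3.0)   # analytic gradient of the toy loss
        param -= lr * grad         # parameter adjustment
        cur = loss_fn(param)
        if abs(prev - cur) < tol:  # loss function value has converged
            break
        prev = cur
    return param

trained = train()
```

The convergence test on successive loss values mirrors "continue to train ... until the loss function value converges" above.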
In summary, in the technical solution provided in the embodiment of the present application, the intention recognition model is trained through the training samples, and in the training process, during the classification coding, a plurality of discrete feature variables are converted into multi-dimensional continuous feature variables, so that the information loss of the intention recognition model in the use process can be effectively reduced, and the accuracy of the intention recognition result can be improved.
Optionally, in this embodiment of the application, the multi-dimensional classification result includes a domain classification result, an action classification result, and an entity noun classification result, and the label information includes a domain classification label, an action classification label, an entity noun classification label, and an intention recognition label corresponding to the sample dialog text. The labels are the ground-truth results against which the intention recognition model is trained. Optionally, the computer device splits the intention recognition label based on a preset rule to obtain the domain classification label, the action classification label, and the entity noun classification label. Next, the manner of obtaining the loss function value is described.
In an exemplary embodiment, the above step 306 includes the following steps:
1. determining a first loss function value corresponding to the intention recognition model based on the field classification result, the action classification result and the entity noun classification result and in combination with the field classification label, the action classification label and the entity noun classification label;
2. determining a second loss function value corresponding to the intention identification model based on the intention identification result and the intention identification label;
3. a loss function value for the intent recognition model is determined based on the first loss function value and the second loss function value.
The first loss function is used to measure the accuracy of the multi-dimensional classification result of the intent recognition model. The second loss function is used to measure the accuracy of the intent recognition result of the intent recognition model. In the embodiment of the application, in order to ensure the reliability of each classifier in the intention recognition model, the computer device determines a first loss function value corresponding to the intention recognition model based on the field classification result, the action classification result and the entity noun classification result in combination with the field classification label, the action classification label and the entity noun classification label; in order to ensure the accuracy of the intention recognition result in the intention recognition model, the computer device determines a second loss function value corresponding to the intention recognition model based on the intention recognition result and the intention recognition label. Further, the computer device determines a loss function value for the intent recognition model based on the first loss function value and the second loss function value.
Optionally, when the first loss function value is obtained, a loss function value corresponding to the domain classifier in the intention recognition model is determined based on the domain classification result in combination with the domain classification label, the action classification label, and the entity noun classification label; a loss function value corresponding to the action classifier in the intention recognition model is determined based on the action classification result in combination with the domain classification label, the action classification label, and the entity noun classification label; and a loss function value corresponding to the entity noun classifier in the intention recognition model is determined based on the entity noun classification result in combination with the domain classification label, the action classification label, and the entity noun classification label. The first loss function value is then determined from the loss function value corresponding to the domain classifier, the loss function value corresponding to the action classifier, and the loss function value corresponding to the entity noun classifier. The computer device may add these three loss function values together to obtain the first loss function value.
Exemplarily, let s be d Is a domain dimension ofClassification feature vector, s a Feature vectors to be classified, s, for the action dimension s The feature vector to be classified of the entity noun dimension is the intention identification result of the intention identification model
Figure BDA0003021871840000181
Comprises the following steps:
Figure BDA0003021871840000182
wherein Decoder stands for Decoder, [ s ] d ,s a ,s s ]H + sigma belongs to the element from N (0, I), h represents the mean vector, sigma represents the variance vector, and e represents the Gaussian noise conforming to the normal distribution;
the loss function L_enc^i of classifier i in the first loss function is:

L_enc^i = CrossEntropy(a_i, A_i), with a_i = softmax(W_i · s_i), i ∈ {d, a, s}

wherein W_i represents the parameters of classifier i, s_i represents the feature vector to be classified of dimension i, a_i represents the classifier output, A_i represents the classification label of classifier i, d represents the domain, a represents the action, and s represents the entity noun;
the second loss function L_dec is:

L_dec = (1/|S|) Σ_{n=1..|S|} CrossEntropy(s_n, ŝ_n)

wherein S represents the set of training samples, |S| represents the number of training samples, s_n represents the intention recognition label of the n-th training sample, and ŝ_n represents the corresponding intention recognition result.
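The second loss can be sketched minimally as below; the per-sample comparison between the label s_n and the recognition result ŝ_n is taken to be cross-entropy, which is an assumption where the original figure is unrecoverable.

```python
import math

def decoder_loss(label_indices, predicted_probs):
    # L_dec sketch: average per-sample loss over the |S| training samples
    # between the intention recognition label s_n (a class index) and the
    # recognition result s_hat_n (a probability distribution).
    n = len(label_indices)
    return -sum(math.log(probs[idx])
                for idx, probs in zip(label_indices, predicted_probs)) / n
```

A perfect prediction contributes zero; a uniform two-way prediction contributes log 2 per sample.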
Next, the structure of the intention recognition model will be described. Optionally, the intention recognition model comprises: a feature extraction layer, a domain encoder, an action encoder, an entity noun encoder, a mean calculation network, a variance calculation network, a noise superposition network, a domain classifier, an action classifier, an entity noun classifier and a result fusion layer. The feature extraction layer is used for performing feature extraction processing on the dialog text to acquire feature information corresponding to the dialog text, wherein the feature information comprises a plurality of discrete feature variables. The domain encoder is used for performing domain classification encoding on the feature information, converting the plurality of discrete feature variables into continuous feature variables to obtain a domain feature vector of the feature information, the domain feature vector being a continuous feature variable; the action encoder is used for performing action classification encoding on the feature information in the same manner to obtain an action feature vector; and the entity noun encoder is used for performing entity noun classification encoding on the feature information to obtain an entity noun feature vector, both likewise continuous feature variables. The mean calculation network is used for respectively processing the plurality of dimensional feature vectors to obtain mean vectors corresponding to the plurality of dimensional feature vectors, the mean vectors indicating the distribution means of the dimensional feature vectors in space; the variance calculation network is used for respectively processing the plurality of dimensional feature vectors to obtain variance vectors corresponding to the plurality of dimensional feature vectors, the variance vectors indicating the distribution variances of the dimensional feature vectors in space. The noise superposition network is used for superposing target noise based on the mean vector and the variance vector respectively corresponding to the plurality of dimensional feature vectors, and determining the feature vectors to be classified respectively corresponding to the plurality of dimensional feature vectors. The domain classifier is used for classifying the feature vector to be classified of the domain dimension to obtain a domain dimension classification result; the action classifier is used for classifying the feature vector to be classified of the action dimension to obtain an action dimension classification result; the entity noun classifier is used for classifying the feature vector to be classified of the entity noun dimension to obtain an entity noun dimension classification result. The result fusion layer is used for performing superposition recognition on the domain classification result, the action classification result and the entity noun classification result to determine the intention recognition result of the dialog text.
An intention recognition method using the intention recognition model will be described with reference to fig. 4 as an example. The computer device inputs the dialog text into the intention recognition model, and the feature extraction layer in the intention recognition model performs feature extraction processing on the dialog text to acquire the feature information of the dialog text. Further, the domain encoder performs domain classification encoding on the feature information to obtain a domain feature vector h_d; the action encoder performs action classification encoding on the feature information to obtain an action feature vector h_a; and the entity noun encoder performs entity noun classification encoding on the feature information to obtain an entity noun feature vector h_s. Then, a mean vector and a variance vector corresponding to each dimensional feature vector are obtained through the mean calculation network and the variance calculation network respectively, and the feature vector to be classified of the domain dimension S_d, the feature vector to be classified of the action dimension S_a, and the feature vector to be classified of the entity noun dimension S_s are obtained through the noise superposition network. Then, the feature vector to be classified S_d is classified by the domain classifier to determine the domain classification result; the feature vector to be classified S_a is classified by the action classifier to determine the action classification result; and the feature vector to be classified S_s is classified by the entity noun classifier to determine the entity noun classification result.
And then, overlapping and fusing the domain classification result, the action classification result and the entity noun classification result, and decoding the overlapped and fused result by adopting a decoder to obtain an intention recognition result of the dialog text.
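The flow above can be sketched end to end as follows. The dense-layer stand-in, the log-variance parameterization, and fusion by concatenating the three classification distributions are illustrative assumptions; the patent does not fix these details.

```python
import math
import random

def linear(vec, weights):
    # A dense-layer stand-in: one output per weight row (illustrative only).
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def encode_dimension(features, enc_w, mean_w, var_w, rng):
    # Encoder -> mean/variance networks -> noise superposition for one
    # dimension: S = h + sigma * eps, eps ~ N(0, 1) per component.
    h_enc = linear(features, enc_w)           # h_d / h_a / h_s
    mean = linear(h_enc, mean_w)              # mean vector
    log_var = linear(h_enc, var_w)            # variance vector (log-space)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]  # S_d / S_a / S_s

def recognize(features, dims, rng):
    # Run each dimension's pipeline, classify, then fuse by concatenating
    # the three classification distributions (fusion rule assumed).
    fused = []
    for enc_w, mean_w, var_w, cls_w in dims:
        s = encode_dimension(features, enc_w, mean_w, var_w, rng)
        fused.extend(softmax(linear(s, cls_w)))
    return fused
```

With three dimension pipelines of three classes each, the fused output is nine probabilities, one softmax distribution per classifier.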
In addition, an improvement of the intention recognition model in the present application is described in comparison with the VAE (Variational Auto-Encoder) model in the related art. As shown in fig. 5, the VAE model takes only six discrete dimensions (X1, X2, X3, X4, X5, X6) as input, each dimension being 0 or 1; two neural networks are then used to calculate the mean and variance corresponding to each dimension, standard Gaussian noise is introduced to form new sampling variables (Z1, Z2, Z3, Z4, Z5, Z6), and finally the generator transforms the sampling variables into the generated samples X̂.
However, the intention recognition model in the application converts the discrete characteristic variables into continuous characteristic variables for processing, and effectively reduces the loss of information in the VAE model.
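The contrast can be sketched as follows: thresholding each dimension to 0/1 (the discrete VAE input above) discards magnitude information, whereas noise superposition z = h + σ·ε operates on the continuous values directly. The helper names are illustrative only.

```python
import random

def binarize(x):
    # Discrete 0/1 input per dimension, as in the VAE setting above;
    # thresholding is where magnitude information is lost.
    return [1 if v > 0 else 0 for v in x]

def noise_superpose(h, sigma, rng):
    # Continuous-variable noise superposition: z = h + sigma * eps,
    # eps ~ N(0, 1) per dimension (the reparameterization trick).
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(h, sigma)]
```

Two inputs with very different magnitudes binarize identically, while the continuous pathway keeps them distinct (and with σ = 0 returns the mean vector unchanged).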
Also, tables 1 and 2 below show data obtained by experiments:

TABLE 1 Comparison of information loss between the VAE model and the intention recognition model of the present application

Model | Information loss
VAE | 0.1
Intention recognition model of the present application | 0.02

TABLE 2 Comparison of the number of training steps to convergence between the VAE model and the intention recognition model of the present application

Model | Training steps to convergence
VAE | 100
Intention recognition model of the present application | 20
Therefore, the information loss of the intention recognition model in the data processing process is lower than that of the VAE model, the accuracy of the intention recognition result can be effectively ensured, the training steps required by the intention recognition model are fewer, and the cost required by model training is reduced to a certain extent.
In the following, referring to fig. 6, the intention identification method in the present application will be described by taking a medical application scenario as an example. The method specifically comprises the following steps:
step 601, obtaining a dialog text to be identified, wherein the dialog text is input by a user through an automatic answering type medical consultation system.
Step 602, inputting the dialog text into the intention recognition model, and obtaining an intention recognition result output by the intention recognition model.
Optionally, the intention recognition model comprises a feature extraction layer, a domain encoder, an action encoder, an entity noun encoder, a mean calculation network, a variance calculation network, a noise superposition network, a domain classifier, an action classifier, an entity noun classifier and a result fusion layer.
And performing feature extraction processing on the dialog text by the feature extraction layer to acquire feature information corresponding to the dialog text, wherein the feature information comprises a plurality of discrete feature variables.
Performing domain classification encoding on the feature information by a domain encoder, converting a plurality of discrete feature variables in the feature information into continuous feature variables to obtain domain feature vectors of the feature information, wherein the domain feature vectors are continuous feature variables; performing action classification encoding on the feature information by an action encoder to obtain action feature vectors, which are likewise continuous feature variables; and performing entity noun classification encoding on the feature information by an entity noun encoder to obtain entity noun feature vectors, which are likewise continuous feature variables.
Respectively processing the plurality of dimensional feature vectors by a mean calculation network to obtain mean vectors corresponding to the plurality of dimensional feature vectors, wherein the mean vector is used for indicating a distribution mean of the dimensional feature vectors in space; and respectively processing the plurality of dimensional feature vectors by a variance calculation network to obtain variance vectors corresponding to the plurality of dimensional feature vectors, wherein the variance vectors are used for indicating the distribution variance of the dimensional feature vectors in space.
And superposing the target noise by a noise superposition network based on the mean vector and the variance vector which respectively correspond to the multiple dimensional feature vectors, and determining the feature vectors to be classified which respectively correspond to the multiple dimensional feature vectors.
Classifying the feature vectors to be classified of the domain dimension by a domain classifier to obtain domain dimension classification results of the feature vectors; classifying the feature vectors to be classified of the action dimension by an action classifier to obtain action dimension classification results of the feature vectors; and classifying the feature vectors to be classified of the entity noun dimension by an entity noun classifier to obtain entity noun dimension classification results of the feature vectors.
And the result fusion layer carries out superposition recognition on the domain classification result, the action classification result and the entity noun classification result to determine the intention recognition result of the dialog text.
Step 603, determining whether the dialog intention of the user is a specific disease consultation according to the intention recognition result. If the user's dialog intention is a specific disease consultation, go to step 604; if not, perform steps 605-608.
Step 604, distributing the manual consultation service to the users.
Step 605, obtaining the answer text associated with the intention recognition result from the dialog text library as a candidate answer text.
Step 606, under the condition that the number of the candidate answer texts is not unique, acquiring the quality evaluation index of each candidate answer text.
Step 607, selecting candidate answer texts with quality evaluation indexes meeting the conditions from the candidate answer texts as answer texts corresponding to the dialog texts.
Step 608, the response text is displayed to the user, and a new dialog text to be recognized is obtained, and the steps are repeated from step 601 until the dialog is finished.
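Steps 601-608 above can be sketched as a single dispatch routine. The intent label, answer-bank structure, and fallback value are illustrative assumptions, not from the patent.

```python
def handle_dialog(dialog_text, recognize_intent, answer_bank, quality_index):
    # Sketch of steps 601-608: route specific-disease consultations to a
    # human agent; otherwise select the best-scoring candidate answer.
    intent = recognize_intent(dialog_text)             # step 602
    if intent == "specific_disease_consultation":      # step 603
        return "assign_human_agent"                    # step 604
    candidates = answer_bank.get(intent, [])           # step 605
    if not candidates:
        return "no_answer_found"
    if len(candidates) == 1:
        return candidates[0]
    return max(candidates, key=quality_index)          # steps 606-607
```

Displaying the chosen text and looping back to step 601 (step 608) would wrap this routine in the consultation system's dialog loop.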
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 7, a block diagram of an apparatus for recognizing intent of dialog information according to an embodiment of the present application is shown. The device has the function of realizing the intention identification method of the dialogue information, and the function can be realized by hardware or hardware executing corresponding software. The device can be computer equipment, and can also be arranged in the computer equipment. The apparatus 700 may include: a text acquisition module 710, a feature extraction module 720, a feature encoding module 730, a feature classification module 740, and a result fusion module 750.
And the text acquisition module 710 is used for acquiring the dialog text to be recognized.
The feature extraction module 720 is configured to perform feature extraction processing on the dialog text to obtain feature information of the dialog text; wherein the feature information includes a plurality of discrete feature variables.
The feature coding module 730 is configured to perform multidimensional classification coding on the feature information, convert a plurality of discrete feature variables in the feature information into continuous feature variables, and obtain multidimensional feature vectors of the feature information; wherein the multi-dimensional feature vector comprises a plurality of continuous feature variables of different dimensions.
The feature classification module 740 is configured to perform classification processing on the multidimensional feature vectors respectively to obtain multidimensional classification results of the feature information, where the multidimensional classification results include multiple classification results of different dimensions.
And a result fusion module 750, configured to perform fusion recognition processing on the multi-dimensional classification result, and determine an intention recognition result of the dialog text.
In an exemplary embodiment, the multi-dimensional feature vector includes a domain feature vector, an action feature vector, and a noun feature vector; wherein the domain feature vector is used for indicating a domain to which the dialog text belongs, the action feature vector is used for indicating an action to be performed by an inputter of the dialog text, and the entity noun feature vector is used for indicating an entity noun associated with the domain and the action.
In an exemplary embodiment, the feature classification module 740 is configured to, for a target dimension, process a target dimension feature vector by using a mean calculation network, and obtain a mean vector of the target dimension feature vector; the mean vector is used for indicating the distribution average value of the target dimension feature vector in space; processing the target dimension characteristic vector by adopting a variance calculation network to obtain a variance vector of the target dimension characteristic vector; wherein the variance vector is used for indicating the distribution variance of the target dimension feature vector in space; carrying out noise superposition processing on the variance vector by adopting target noise to obtain a processed variance vector; determining a feature vector to be classified corresponding to the target dimension feature vector based on the mean vector and the processed variance vector; and classifying the feature vectors to be classified to obtain a target dimension classification result of the feature information.
In an exemplary embodiment, as shown in fig. 8, the apparatus 700 further includes: candidate acquisition module 760, metric acquisition module 770, and text selection module 780.
And a candidate obtaining module 760, configured to obtain, according to the intention recognition result, a response text associated with the intention recognition result from a dialog text library as a candidate response text.
An index obtaining module 770, configured to obtain a quality evaluation index of each candidate answer text when the number of the candidate answer texts is not unique.
A text selecting module 780, configured to select, from the candidate response texts, a candidate response text with the quality evaluation index satisfying the condition as a response text corresponding to the dialog text.
To sum up, in the technical solution provided in the embodiment of the present application, the multidimensional classification result of the dialog text is determined through multidimensional classification encoding and multidimensional classification processing of the dialog text, and the multidimensional classification result is then subjected to fusion recognition processing to determine the intention recognition result of the dialog text; because the intention recognition result is determined through analysis from multiple dimensions, its accuracy can be improved. Moreover, during classification encoding, a plurality of discrete feature variables are converted into multi-dimensional continuous feature variables, which can effectively reduce information loss in the subsequent feature processing and thereby improve the accuracy and reliability of the intention recognition result.
Referring to fig. 9, a block diagram of a training apparatus for an intention recognition model according to an embodiment of the present application is shown. The device has the function of realizing the training method of the intention recognition model, and the function can be realized by hardware or hardware executing corresponding software. The device can be computer equipment, and can also be arranged in the computer equipment. The apparatus 900 may include: a sample acquisition module 910, a feature acquisition module 920, a vector acquisition module 930, a vector classification module 940, a result acquisition module 950, a function value determination module 960, and a parameter adjustment module 970.
A sample obtaining module 910, configured to obtain at least one training sample of the intent recognition model, where the training sample includes a sample dialog text and tag information corresponding to the sample dialog text.
A feature obtaining module 920, configured to perform feature extraction processing on the sample dialog text to obtain feature information of the sample dialog text; wherein the feature information includes a plurality of discrete feature variables.
A vector obtaining module 930, configured to perform multidimensional classification coding on the feature information, and convert a plurality of discrete feature variables in the feature information into continuous feature variables to obtain a multidimensional feature vector of the feature information; the multi-dimensional feature vector comprises a plurality of feature vectors with different dimensions, and the feature vectors are continuous feature variables.
A vector classification module 940, configured to perform classification processing on the multidimensional feature vectors respectively to obtain multidimensional classification results of the feature information, where the multidimensional classification results include multiple classification results of different dimensions.
And a result obtaining module 950, configured to perform fusion recognition processing on the multi-dimensional classification result, and determine an intention recognition result of the dialog text.
A function value determining module 960, configured to calculate a loss function value of the intention recognition model based on the multi-dimensional classification result, the intention recognition result, and the label information corresponding to the sample dialog text.
A parameter adjusting module 970, configured to adjust a parameter of the intention recognition model of the dialog information based on the loss function value.
In an exemplary embodiment, the multi-dimensional classification result comprises a domain classification result, an action classification result, and an entity noun classification result; the label information comprises a field classification label, an action classification label and an entity noun classification label corresponding to the sample dialog text, and an intention identification label corresponding to the sample dialog text.
In an exemplary embodiment, the function value determining module 960 includes: the device comprises a first acquisition unit, a second acquisition unit and a function acquisition unit.
A first obtaining unit, configured to determine, based on the domain classification result, the action classification result, and the entity noun classification result, a first loss function value corresponding to the intention identification model in combination with the domain classification tag, the action classification tag, and the entity noun classification tag; wherein the first loss function is used for measuring the accuracy of the multi-dimensional classification result of the intention recognition model.
A second obtaining unit, configured to determine a second loss function value corresponding to the intention recognition model based on the intention recognition result and the intention recognition tag; wherein the second loss function is used for measuring the accuracy of the intention recognition result of the intention recognition model.
A function obtaining unit configured to determine a loss function value of the intention recognition model based on the first loss function value and the second loss function value.
In an exemplary embodiment, the first obtaining unit is configured to determine, based on the domain classification result, a loss function value corresponding to a domain classifier in the intention recognition model in combination with the domain classification label, the action classification label and the noun classification label; based on the action classification result, determining a loss function value corresponding to an action classifier in the intention recognition model by combining the field classification label, the action classification label and the entity noun classification label; determining a loss function value corresponding to a noun classifier in the intention recognition model based on the noun classification result and in combination with the domain classification label, the action classification label and the noun classification label; and determining the first loss function value according to the loss function value corresponding to the domain classifier, the loss function value corresponding to the action classifier and the loss function value corresponding to the entity noun classifier.
In an exemplary embodiment, as shown in fig. 10, the apparatus 900 further comprises: a tag acquisition module 980.
A tag obtaining module 980, configured to split the intention identification tag based on a preset rule, and obtain the field classification tag, the action classification tag, and the entity noun classification tag.
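The splitting performed by the tag obtaining module 980 can be sketched as below. The label format "domain-action-entity" and the separator are hypothetical; the patent only states that the intention recognition label is split based on a preset rule.

```python
def split_intent_label(intent_label, sep="-"):
    # Hypothetical preset rule: an intent label such as
    # "medical-consult-diabetes" yields the domain classification label,
    # the action classification label and the entity noun classification
    # label, in that order.
    domain, action, entity = intent_label.split(sep, 2)
    return {"domain": domain, "action": action, "entity_noun": entity}
```

Under this assumed rule, one annotated intent label supplies training targets for all three classifiers at once.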
In an exemplary embodiment, the intent recognition model includes: a feature extraction layer, a domain encoder, an action encoder, an entity noun encoder, a mean calculation network, a variance calculation network, a noise superposition network, a domain classifier, an action classifier, an entity noun classifier and a result fusion layer.
The feature extraction layer is used for performing feature extraction processing on the dialog text to acquire feature information corresponding to the dialog text; the feature information includes a plurality of discrete feature variables.
The domain encoder is used for performing domain classification encoding on the feature information, converting a plurality of discrete feature variables in the feature information into continuous feature variables and obtaining domain feature vectors of the feature information; wherein the domain feature vector is a continuous feature variable.
The action encoder is used for carrying out action classification encoding on the characteristic information, converting a plurality of discrete characteristic variables in the characteristic information into continuous characteristic variables and obtaining action characteristic vectors of the characteristic information; wherein the motion feature vector is a continuous feature variable.
The entity noun encoder is used for carrying out entity noun classification encoding on the feature information, converting a plurality of discrete feature variables in the feature information into continuous feature variables and obtaining entity noun feature vectors of the feature information; wherein, the entity noun feature vector is a continuous feature variable.
The mean value calculation network is used for respectively processing the plurality of dimensional feature vectors to obtain mean value vectors corresponding to the plurality of dimensional feature vectors; wherein the mean vector is used for indicating the distribution mean of the dimension feature vector in the space.
The variance calculation network is used for respectively processing the plurality of dimensional characteristic vectors to obtain variance vectors corresponding to the plurality of dimensional characteristic vectors; wherein the variance vector is used for indicating the distribution variance of the dimension characteristic vector in space.
The noise superposition network is used for superposing target noise based on the mean vector and the variance vector respectively corresponding to the multiple dimensional feature vectors, and determining the feature vectors to be classified respectively corresponding to the multiple dimensional feature vectors.
The domain classifier is used for classifying the feature vectors to be classified of the domain dimensions to obtain domain dimension classification results of the feature vectors.
The action classifier is used for classifying the characteristic vectors to be classified of the action dimensions to obtain action dimension classification results of the characteristic vectors.
The entity noun classifier is used for classifying the feature vectors to be classified of entity noun dimensions to obtain entity noun dimension classification results of the feature vectors.
And the result fusion layer is used for superposing the field classification result, the action classification result and the entity noun classification result to determine the intention recognition result of the dialog text.
In summary, in the technical scheme provided by the embodiment of the application, the intention recognition model is trained through the training samples, and in the training process and during the classification coding, a plurality of discrete characteristic variables are converted into multi-dimensional continuous characteristic variables, so that the information loss of the intention recognition model in the use process can be effectively reduced, and the accuracy of the intention recognition result is improved.
It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Referring to fig. 11, a block diagram of a computer device according to an embodiment of the present application is shown. The computer device may be configured to implement the functions of the above-described intention recognition method of dialogue information or the training method of an intention recognition model. Specifically, the method comprises the following steps:
the computer apparatus 1100 includes a Central Processing Unit (CPU) 1101, a system Memory 1104 including a Random Access Memory (RAM) 1102 and a Read Only Memory (ROM) 1103, and a system bus 1105 connecting the system Memory 1104 and the CPU 1101. The computer device 1100 also includes a basic Input/Output system (I/O system) 1106, which facilitates transfer of information between devices within the computer, and a mass storage device 1107 for storing an operating system 1113, application programs 1114, and other program modules 1115.
The basic input/output system 1106 includes a display 1108 for displaying information and an input device 1109 such as a mouse, keyboard, etc. for a user to input information. Wherein the display 1108 and the input device 1109 are connected to the central processing unit 1101 through an input output controller 1110 connected to the system bus 1105. The basic input/output system 1106 may also include an input/output controller 1110 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 1110 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 1107 is connected to the central processing unit 1101 through a mass storage controller (not shown) that is connected to the system bus 1105. The mass storage device 1107 and its associated computer-readable media provide non-volatile storage for the computer device 1100. That is, the mass storage device 1107 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact disk Read-Only Memory) drive.
Without loss of generality, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM (Erasable Programmable Read Only Memory), EEPROM (Electrically Erasable Programmable Read Only Memory), flash Memory or other solid state Memory technology, CD-ROM, DVD (Digital Video Disc) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media is not limited to the foregoing. The system memory 1104 and the mass storage device 1107 described above may be collectively referred to as memory.
According to various embodiments of the application, the computer device 1100 may also operate by connecting, through a network such as the Internet, to a remote computer on the network. That is, the computer device 1100 may connect to the network 1112 through the network interface unit 1111 connected to the system bus 1105, or may use the network interface unit 1111 to connect to other types of networks or remote computer systems (not shown).
The memory further stores a computer program that is configured to be executed by one or more processors to implement the above-described method for recognizing an intention of dialogue information, or to implement the above-described method for training an intention recognition model.
In an exemplary embodiment, there is also provided a computer-readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which when executed by a processor, implements the above-described intention recognition method of dialog information or implements the above-described training method of intention recognition models.
Optionally, the computer-readable storage medium may include: ROM (Read-Only Memory), RAM (Random Access Memory), SSD (Solid State Drive), or an optical disc. The random access memory may include a ReRAM (Resistive Random Access Memory) and a DRAM (Dynamic Random Access Memory).
In an exemplary embodiment, a computer program product or computer program is also provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the intention recognition method for the dialogue information or the training method for the intention recognition model.
It should be understood that "a plurality" herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that A and B exist simultaneously, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects. In addition, the step numbers described herein merely show, by way of example, one possible execution order of the steps; in some other embodiments, the steps may also be executed out of the numbered order, for example, two steps with different numbers may be executed simultaneously or in an order reverse to that illustrated, which is not limited in this application.
The above description is merely exemplary of the present application and is not intended to limit the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (14)

1. A method for recognizing an intention of dialogue information, the method comprising:
acquiring a dialog text to be identified;
carrying out feature extraction processing on the dialog text to obtain feature information of the dialog text; wherein the feature information comprises a plurality of discrete feature variables;
carrying out multi-dimensional classification coding on the feature information, converting a plurality of discrete feature variables in the feature information into continuous feature variables, and obtaining multi-dimensional feature vectors of the feature information; wherein the multi-dimensional feature vector comprises a plurality of continuous feature variables of different dimensions;
classifying the multi-dimensional feature vectors respectively to obtain multi-dimensional classification results of the feature information, wherein the multi-dimensional classification results comprise a plurality of classification results with different dimensions;
and performing fusion recognition processing on the multi-dimensional classification result, and determining an intention recognition result of the dialog text.
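The pipeline of claim 1 — feature extraction, multi-dimensional classification encoding, per-dimension classification, and fusion recognition — can be illustrated outside the claim language. The sketch below is a toy illustration only, not the claimed implementation: all names (`extract_features`, `encode`, `classify`, `recognize_intent`), the hash-bucketed features, the random embedding table, and the four-class output space are assumptions introduced for illustration.

```python
import numpy as np

def extract_features(text: str) -> np.ndarray:
    # Toy "feature extraction": each token becomes a discrete feature
    # variable (a bucketed hash id). Stand-in for the claimed extractor.
    return np.array([hash(tok) % 100 for tok in text.split()], dtype=np.int64)

def encode(features: np.ndarray, dim: int = 8) -> np.ndarray:
    # Toy "classification encoding": map the discrete variables to a
    # continuous vector via a fixed embedding table, then mean-pool.
    table = np.random.default_rng(42).normal(size=(100, dim))
    return table[features].mean(axis=0)

def classify(vec: np.ndarray, n_classes: int, seed: int) -> np.ndarray:
    # Toy per-dimension classifier: a fixed random linear map + softmax.
    w = np.random.default_rng(seed).normal(size=(vec.shape[0], n_classes))
    logits = vec @ w
    e = np.exp(logits - logits.max())
    return e / e.sum()

def recognize_intent(text: str) -> int:
    vec = encode(extract_features(text))
    # One classifier per dimension (e.g. domain, action, entity noun).
    dists = [classify(vec, n_classes=4, seed=s) for s in (1, 2, 3)]
    # "Fusion recognition processing" sketched as superposing (summing)
    # the per-dimension class distributions and taking the argmax.
    return int(np.argmax(np.sum(dists, axis=0)))

intent = recognize_intent("book an appointment with a cardiologist")
```

The fusion step here simply sums distributions; the claim does not specify the fusion operator, so this choice is illustrative.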
2. The method of claim 1, wherein the multidimensional feature vector comprises a domain feature vector, an action feature vector, and an entity noun feature vector;
wherein the domain feature vector is used for indicating a domain to which the dialog text belongs, the action feature vector is used for indicating an action to be performed by an inputter of the dialog text, and the entity noun feature vector is used for indicating an entity noun associated with the domain and the action.
3. The method according to claim 1, wherein the classifying the multidimensional feature vectors to obtain multidimensional classification results of the feature information comprises:
for a target dimension, processing the target dimension feature vector by using a mean calculation network to obtain a mean vector of the target dimension feature vector; wherein the mean vector is used for indicating the distribution mean of the target dimension feature vector in space;
processing the target dimension feature vector by using a variance calculation network to obtain a variance vector of the target dimension feature vector; wherein the variance vector is used for indicating the distribution variance of the target dimension feature vector in space;
performing noise superposition processing on the variance vector by using target noise to obtain a processed variance vector;
determining a feature vector to be classified corresponding to the target dimension feature vector based on the mean vector and the processed variance vector;
and classifying the feature vectors to be classified to obtain a target dimension classification result of the feature information.
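Claim 3 describes a mean calculation network, a variance calculation network, and superposition of target noise on the variance — a structure that resembles the reparameterization trick used in variational autoencoders. The sketch below assumes that resemblance; treating the variance network's output as a log-variance, using Gaussian target noise, and letting plain linear maps stand in for the two networks are all illustrative assumptions, not details stated in the claim.

```python
import numpy as np

def reparameterize(feature_vec, mean_w, var_w, rng):
    # Mean network and variance network are stand-ins: single linear maps.
    mu = mean_w @ feature_vec          # distribution mean in space
    log_var = var_w @ feature_vec      # assumed log-variance output
    # Noise superposition: scale standard Gaussian "target noise" by the
    # standard deviation, then shift by the mean, yielding the feature
    # vector to be classified.
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
dim = 8
feature_vec = rng.normal(size=dim)
mean_w = rng.normal(size=(dim, dim))
var_w = rng.normal(size=(dim, dim))
z = reparameterize(feature_vec, mean_w, var_w, rng)
```

Sampling through the noise rather than classifying the mean directly makes the downstream classifier robust to perturbations around the encoded point, which is the usual motivation for this construction.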
4. The method according to any one of claims 1 to 3, wherein after performing the fusion recognition process on the multi-dimensional classification result and determining the intention recognition result of the dialog text, the method further comprises:
acquiring, according to the intention recognition result, a response text associated with the intention recognition result from a dialog text library as a candidate response text;
under the condition that the number of the candidate response texts is not unique, acquiring a quality evaluation index of each candidate response text;
and selecting, from the candidate response texts, a candidate response text whose quality evaluation index meets a condition as the response text corresponding to the dialog text.
5. A method of training an intent recognition model, the method comprising:
obtaining at least one training sample of an intention recognition model, wherein the training sample comprises a sample dialogue text and label information corresponding to the sample dialogue text;
performing feature extraction processing on the sample dialogue text to obtain feature information of the sample dialogue text; wherein the feature information comprises a plurality of discrete feature variables;
carrying out multi-dimensional classification coding on the feature information, converting a plurality of discrete feature variables in the feature information into continuous feature variables, and obtaining a multi-dimensional feature vector of the feature information; wherein the multidimensional feature vector comprises a plurality of continuous feature variables of different dimensions;
classifying the multi-dimensional feature vectors respectively to obtain multi-dimensional classification results of the feature information, wherein the multi-dimensional classification results comprise a plurality of classification results with different dimensions;
performing fusion recognition processing on the multi-dimensional classification result, and determining an intention recognition result of the sample dialog text;
calculating a loss function value of the intention recognition model based on the multi-dimensional classification result, the intention recognition result and label information corresponding to the sample dialog text;
adjusting parameters of the intent recognition model based on the loss function values.
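The final two steps of claim 5 — computing a loss function value, then adjusting model parameters based on it — can be sketched for a single toy classifier. The linear softmax model, the cross-entropy loss, the learning rate, and the single-sample loop below are assumptions for illustration, not the claimed model.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(W, x, label, lr=0.1):
    # Forward pass: linear classifier over the encoded feature vector.
    probs = softmax(W @ x)
    loss = -np.log(probs[label] + 1e-12)  # cross-entropy loss value
    # Backward pass: gradient of softmax cross-entropy w.r.t. W,
    # then adjust the parameters based on the loss ("adjusting parameters
    # of the model based on the loss function values").
    grad = np.outer(probs - np.eye(len(probs))[label], x)
    return W - lr * grad, float(loss)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))   # 3 classes, 8-dimensional features
x = rng.normal(size=8)        # one encoded training sample
losses = []
for _ in range(50):
    W, loss = train_step(W, x, label=1)
    losses.append(loss)
```

Iterating the step drives the loss down on this fixed sample, which is the behavior the training procedure of claim 5 relies on.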
6. The method of claim 5, wherein:
the multi-dimensional classification result comprises a field classification result, an action classification result and an entity noun classification result;
the label information comprises a field classification label, an action classification label and an entity noun classification label corresponding to the sample dialogue text, and an intention identification label corresponding to the sample dialogue text.
7. The method of claim 6, wherein calculating the loss function value of the intent recognition model based on the multi-dimensional classification result, the intent recognition result, and label information corresponding to the sample dialog text comprises:
determining a first loss function value corresponding to the intention recognition model based on the domain classification result, the action classification result and the entity noun classification result and in combination with the domain classification label, the action classification label and the entity noun classification label; wherein, the first loss function is used for measuring the accuracy of the multi-dimensional classification result of the intention recognition model;
determining a second loss function value corresponding to the intention identification model based on the intention identification result and the intention identification label; wherein a second loss function is used for measuring the accuracy of the intention recognition result of the intention recognition model;
determining a loss function value for the intent recognition model based on the first loss function value and the second loss function value.
8. The method of claim 7, wherein the determining a first loss function value corresponding to the intention recognition model based on the domain classification result, the action classification result, and the entity noun classification result in combination with the domain classification label, the action classification label, and the entity noun classification label comprises:
determining a loss function value corresponding to a domain classifier in the intention recognition model based on the domain classification result and in combination with the domain classification label, the action classification label and the entity noun classification label;
determining a loss function value corresponding to an action classifier in the intention recognition model based on the action classification result and in combination with the domain classification label, the action classification label and the entity noun classification label;
based on the entity noun classification result, determining a loss function value corresponding to an entity noun classifier in the intention recognition model by combining the field classification label, the action classification label and the entity noun classification label;
and determining the first loss function value according to the loss function value corresponding to the domain classifier, the loss function value corresponding to the action classifier and the loss function value corresponding to the entity noun classifier.
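Claims 7 and 8 combine a first loss (measuring the per-classifier accuracy over the domain, action, and entity noun dimensions) with a second loss (measuring the accuracy of the intent recognition result). A toy composition is sketched below; the assumptions that each component is a cross-entropy and that the two losses are combined by a weighted sum (the weight `alpha` is not specified in the claims) are illustrative.

```python
import numpy as np

def cross_entropy(probs, label):
    # Cross-entropy of a predicted distribution against a class label.
    return -float(np.log(probs[label] + 1e-12))

def total_loss(domain_p, action_p, noun_p, intent_p,
               domain_y, action_y, noun_y, intent_y, alpha=1.0):
    # First loss function value: sum of the loss values of the domain,
    # action, and entity noun classifiers (claim 8).
    first = (cross_entropy(domain_p, domain_y)
             + cross_entropy(action_p, action_y)
             + cross_entropy(noun_p, noun_y))
    # Second loss function value: intent-recognition loss (claim 7).
    second = cross_entropy(intent_p, intent_y)
    # Combined loss; the weighting is an assumption.
    return first + alpha * second

loss = total_loss(np.array([0.7, 0.3]), np.array([0.6, 0.4]),
                  np.array([0.8, 0.2]), np.array([0.5, 0.5]),
                  0, 0, 0, 0)
```

Supervising the intermediate classifiers alongside the final intent head gives each encoder a direct training signal, which is the usual rationale for such a composite loss.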
9. The method of claim 6, further comprising:
and splitting the intention identification label based on a preset rule to obtain the field classification label, the action classification label and the entity noun classification label.
10. The method of any of claims 6 to 8, wherein the intent recognition model comprises: a feature extraction layer, a domain encoder, an action encoder, an entity noun encoder, a mean calculation network, a variance calculation network, a noise superposition network, a domain classifier, an action classifier, an entity noun classifier, and a result fusion layer;
the feature extraction layer is used for performing feature extraction processing on the dialog text to acquire feature information corresponding to the dialog text; the feature information comprises a plurality of discrete feature variables;
the domain encoder is used for performing domain classification encoding on the feature information, converting a plurality of discrete feature variables in the feature information into continuous feature variables and obtaining domain feature vectors of the feature information; wherein, the domain feature vector is a continuous feature variable;
the motion encoder is used for performing motion classification encoding on the feature information, converting a plurality of discrete feature variables in the feature information into continuous feature variables and obtaining motion feature vectors of the feature information; wherein, the motion characteristic vector is a continuous characteristic variable;
the entity noun encoder is used for carrying out entity noun classified encoding on the feature information, converting a plurality of discrete feature variables in the feature information into continuous feature variables and obtaining entity noun feature vectors of the feature information; wherein, the entity noun feature vector is a continuous feature variable;
the mean value calculation network is used for respectively processing the plurality of dimensional feature vectors to obtain mean value vectors corresponding to the plurality of dimensional feature vectors; wherein the mean vector is used for indicating the distribution mean of the dimension feature vectors in the space;
the variance calculation network is used for respectively processing the plurality of dimensional characteristic vectors to obtain variance vectors corresponding to the plurality of dimensional characteristic vectors; wherein the variance vector is used for indicating the distribution variance of the dimension feature vector in space;
the noise superposition network is used for superposing target noise based on the mean vector and the variance vector respectively corresponding to the plurality of dimension feature vectors, and determining the feature vectors to be classified respectively corresponding to the plurality of dimension feature vectors;
the domain classifier is used for classifying the feature vectors to be classified of the domain dimensions to obtain domain dimension classification results of the feature vectors;
the action classifier is used for classifying the characteristic vectors to be classified of the action dimensions to obtain action dimension classification results of the characteristic vectors;
the entity noun classifier is used for classifying the feature vectors to be classified of entity noun dimensions to obtain entity noun dimension classification results of the feature vectors;
and the result fusion layer is used for superposing the domain classification result, the action classification result and the entity noun classification result to determine the intention recognition result of the dialog text.
11. An apparatus for recognizing an intention of dialogue information, the apparatus comprising:
the text acquisition module is used for acquiring a dialog text to be identified;
the feature extraction module is used for carrying out feature extraction processing on the dialog text to obtain feature information of the dialog text; wherein the feature information comprises a plurality of discrete feature variables;
the characteristic coding module is used for carrying out multi-dimensional classified coding on the characteristic information, converting a plurality of discrete characteristic variables in the characteristic information into continuous characteristic variables and obtaining multi-dimensional characteristic vectors of the characteristic information; wherein the multi-dimensional feature vector comprises a plurality of continuous feature variables of different dimensions;
the characteristic classification module is used for respectively classifying the multi-dimensional characteristic vectors to obtain multi-dimensional classification results of the characteristic information, wherein the multi-dimensional classification results comprise a plurality of classification results with different dimensions;
and the result fusion module is used for performing fusion recognition processing on the multi-dimensional classification result and determining an intention recognition result of the dialog text.
12. An apparatus for training an intention recognition model, the apparatus comprising:
the system comprises a sample acquisition module, a recognition module and a recognition module, wherein the sample acquisition module is used for acquiring at least one training sample of an intention recognition model, and the training sample comprises a sample dialogue text and label information corresponding to the sample dialogue text;
the characteristic acquisition module is used for carrying out characteristic extraction processing on the sample conversation text to obtain characteristic information of the sample conversation text; the characteristic information comprises a plurality of discrete characteristic variables;
the vector acquisition module is used for carrying out multi-dimensional classified coding on the characteristic information, converting a plurality of discrete characteristic variables in the characteristic information into continuous characteristic variables and obtaining multi-dimensional characteristic vectors of the characteristic information; the multi-dimensional feature vector comprises a plurality of feature vectors with different dimensions, and the feature vectors are continuous feature variables;
the vector classification module is used for respectively classifying the multi-dimensional feature vectors to obtain multi-dimensional classification results of the feature information, wherein the multi-dimensional classification results comprise a plurality of classification results with different dimensions;
the result acquisition module is used for performing fusion recognition processing on the multi-dimensional classification results and determining an intention recognition result of the sample dialog text;
a function value determining module, configured to calculate a loss function value of the intention recognition model based on the multi-dimensional classification result, the intention recognition result, and tag information corresponding to the sample dialog text;
and the parameter adjusting module is used for adjusting parameters of the intention recognition model based on the loss function value.
13. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the method of intent recognition of dialog information of any of claims 1 to 4 or to implement the method of training of an intent recognition model of any of claims 5 to 10.
14. A computer-readable storage medium, wherein at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the storage medium, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by a processor to implement the method for intention recognition of dialog information according to any one of claims 1 to 4, or to implement the method for training the intention recognition model according to any one of claims 5 to 10.
CN202110404810.3A 2021-04-15 2021-04-15 Method, device, equipment and storage medium for recognizing intention of dialogue information Pending CN115221291A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110404810.3A CN115221291A (en) 2021-04-15 2021-04-15 Method, device, equipment and storage medium for recognizing intention of dialogue information


Publications (1)

Publication Number Publication Date
CN115221291A (en) 2022-10-21

Family

ID=83605895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110404810.3A Pending CN115221291A (en) 2021-04-15 2021-04-15 Method, device, equipment and storage medium for recognizing intention of dialogue information

Country Status (1)

Country Link
CN (1) CN115221291A (en)


Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code
    Ref country code: HK
    Ref legal event code: DE
    Ref document number: 40075295
    Country of ref document: HK
SE01 Entry into force of request for substantive examination