CN113190669A - Intelligent dialogue method, device, terminal and storage medium - Google Patents
- Publication number
- CN113190669A (application number CN202110603528.8A)
- Authority
- CN
- China
- Prior art keywords
- intention
- text
- information
- slot
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses an intelligent dialogue method, device, terminal and storage medium, wherein the method comprises the following steps: acquiring a user's input text in the current round of dialogue; performing intention recognition on the input text with a pre-trained multi-task joint model to obtain intention information and slot information, wherein the multi-task joint model comprises an intention recognition layer and a slot filling layer, both constructed from gated recurrent units; querying a resource library for at least one standard text according to the intention information and the slot information; and calculating the similarity between each standard text and the input text, and outputting the standard text with the highest similarity as the final text. In this way, feasible standard texts are located from the user's intention information and slot information, and the most appropriate output text is selected by similarity, achieving accurate question answering between the user and the system. Because a simplified multi-task joint model architecture is adopted, the system is suitable for low-cost environments and is inexpensive to maintain.
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an intelligent dialogue method, an intelligent dialogue device, an intelligent dialogue terminal, and a storage medium.
Background
Intelligent dialogue systems are the current mainstream form of human-computer dialogue, and question-answering systems fall into vertical-domain systems and open-domain chit-chat. A vertical-domain question-answering system can replace manual customer service for most routine work and spare users the trouble of manually looking up answers; depending on how the vertical domain and the scope of the training corpus are defined, it can be applied to fields such as taxation, workflow, medicine and finance. Most existing intelligent dialogue frameworks are based on a pipeline approach: on the language side they contain a natural language understanding module and a natural language generation module, which are kept separate, and the core intention recognition model for natural language understanding is usually Google's BERT, with recognition completed by fine-tuning BERT on downstream tasks. However, BERT and related models have large architectures: iterative corpus training must be run in a GPU environment, training takes a long time and consumes substantial resources, so such systems are not suited to low-cost environments and hinder rapid development and iteration.
Disclosure of Invention
The application provides an intelligent dialogue method, device, terminal and storage medium, aiming to solve the problems that existing question-answering systems have large model architectures, high operating costs, and are difficult to maintain.
In order to solve the above technical problem, one technical solution adopted by the application is to provide an intelligent dialogue method, comprising: acquiring a user's input text in the current round of dialogue; performing intention recognition on the input text with a pre-trained multi-task joint model to obtain intention information and slot information, wherein the multi-task joint model comprises an intention recognition layer and a slot filling layer, both constructed from gated recurrent units, at least one slot is preset for each piece of intention information, and the slot information is the attribute information of the corresponding slot; querying a resource library for at least one standard text according to the intention information and the slot information; and calculating the similarity between each standard text and the input text, and outputting the standard text with the highest similarity as the final text.
As a further improvement of the present application, after performing intention recognition on the input text with the pre-trained multi-task joint model to obtain the intention information and the slot information, the method further includes: judging whether the slot information is complete according to the at least one slot corresponding to the intention information; if not, generating a new round of dialogue text for the slots whose information is missing; and receiving the new input text the user enters in response to the new round of dialogue text, and extracting the missing slot information.
As a further improvement of the application, performing intention recognition on the input text with the pre-trained multi-task joint model to obtain intention information and slot information comprises the following steps: preprocessing the input text, and converting the preprocessed input text into a vector representation; converting the vector representation into a number sequence, and inputting the number sequence into the multi-task joint model to obtain an intention and slot sequence; and inversely mapping the intention and slot sequence to obtain the intention information and slot information.
As a further improvement of the present application, preprocessing the input text and converting the preprocessed input text into a vector representation comprises: preprocessing the input text; judging whether the input text is Chinese or English; and, if the input text is Chinese, converting it character by character into character vectors.
As a further improvement of the present application, the method further includes pre-training the multi-task joint model, which comprises: acquiring training samples, preprocessing them and converting them into sample vectors; labelling the slots in the sample vectors, and acquiring a pre-constructed intention list; converting the sample vectors into sample number sequences, and inputting them into the intention recognition layer and slot filling layer to be trained to obtain sample intention and slot sequences, wherein the intention recognition layer and slot filling layer are constructed from gated recurrent units; inversely mapping the sample intention and slot sequences to obtain sample intention information and sample slot information; querying the intention list according to the sample intention information and sample slot information to generate a prediction result; and comparing the prediction result with the ground-truth result of the training sample, and back-propagating the comparison result to update the intention recognition layer and the slot filling layer.
As a further improvement of the present application, after outputting the standard text with the highest similarity as the final text, the method further includes: storing the input text and the final text as a new sample in a new sample library; and, when the number of new samples in the new sample library reaches a preset threshold, iteratively training the multi-task joint model with the new samples and emptying the new sample library.
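The new-sample accumulation and threshold-triggered retraining described in this improvement can be sketched as follows (the class name, threshold value and retrain callback are illustrative, not part of the claims):

```python
class NewSampleLibrary:
    """Buffer of (input text, final text) pairs; when the buffer reaches
    the preset threshold, the retrain callback runs and the library is
    emptied, as described in the improvement above."""

    def __init__(self, threshold, retrain):
        self.threshold = threshold
        self.retrain = retrain     # callback: retrain(samples)
        self.samples = []

    def add(self, input_text, final_text):
        self.samples.append((input_text, final_text))
        if len(self.samples) >= self.threshold:
            self.retrain(self.samples)   # iterative training on new samples
            self.samples = []            # empty the new sample library
```

In practice the retrain callback would re-run the pre-training procedure of the multi-task joint model on the accumulated samples.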
As a further improvement of the present application, outputting the standard text with the highest similarity as the final text includes: converting the standard text with the highest similarity into speech and outputting the speech.
In order to solve the above technical problem, another technical solution adopted by the present application is to provide an intelligent dialogue device, including: an acquisition module for acquiring a user's input text in the current round of dialogue; a recognition module for performing intention recognition on the input text with a pre-trained multi-task joint model to obtain intention information and slot information, wherein the multi-task joint model comprises an intention recognition layer and a slot filling layer, both constructed from gated recurrent units, at least one slot is preset for each piece of intention information, and the slot information is the attribute information of the corresponding slot; a query module for querying a resource library for at least one standard text according to the intention information and the slot information; and an output module for calculating the similarity between each standard text and the input text and outputting the standard text with the highest similarity as the final text.
In order to solve the above technical problem, the present application adopts another technical solution that: there is provided a terminal comprising a processor, a memory coupled to the processor, the memory having stored therein program instructions which, when executed by the processor, cause the processor to carry out the steps of the intelligent dialog method as claimed in any one of the above.
In order to solve the above technical problem, the present application adopts another technical solution that: there is provided a storage medium storing a program file capable of implementing the above-described intelligent dialogue method.
The beneficial effect of this application is: the intelligent dialogue method identifies the intention information and slot information in the user's input text to determine all feasible standard texts in reply, uses similarity calculation to select the standard text that best matches the input text, and outputs it, completing the dialogue and achieving accurate question answering with the user. Moreover, the multi-task joint model consists of an intention recognition layer and a slot filling layer constructed from gated recurrent units; compared with a BERT model, its architecture is greatly simplified, so it is suitable for low-cost environments and inexpensive to maintain.
Drawings
FIG. 1 is a flowchart of an intelligent dialogue method according to a first embodiment of the invention;
FIG. 2 is a flowchart of the intelligent dialogue method according to a second embodiment of the invention;
FIG. 3 is a flowchart of the intelligent dialogue method according to a third embodiment of the invention;
FIG. 4 is a functional block diagram of an intelligent dialog device according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a storage medium according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any indication of the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. All directional indications (such as up, down, left, right, front, and rear … …) in the embodiments of the present application are only used to explain the relative positional relationship between the components, the movement, and the like in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indication is changed accordingly. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Fig. 1 is a flowchart illustrating an intelligent dialogue method according to a first embodiment of the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in fig. 1 if the results are substantially the same. As shown in fig. 1, the method comprises the steps of:
step S101: and acquiring the input text of the user in the current wheel conversation process.
Generally, the mode of the human-computer conversation can be divided into a text conversation and a voice conversation, and in this embodiment, if the voice conversation is adopted, after the voice input by the user is acquired, the voice needs to be converted into text.
In step S101, a human-computer dialogue generally completes one question-and-answer exchange between the two parties per round; whenever a new round of question answering begins, the user's input text for that round must be obtained first.
Step S102: the method comprises the steps of utilizing a pre-trained multitask combined model to conduct intention identification on an input text to obtain intention information and slot position information, wherein the multitask combined model comprises an intention identification layer and a slot filling layer, the intention identification layer and the slot filling layer are constructed through a gate control circulation unit, at least one slot position is preset in the intention information, and the slot position information is attribute information of the corresponding slot position.
In step S102, after the user's input text in the current round of dialogue is acquired, intention recognition is performed on it with the pre-trained multi-task joint model; the recognition result includes intention information and slot information. Intention recognition judges what the user wants to do from the input text. For example, if the user asks "How is the weather in Shenzhen today?", the intention is to know whether it is sunny, cloudy or rainy in Shenzhen today. In intention recognition, the intention is usually materialized by slot filling: the intention information is the core content of the user's current input text, and the slot information is the keyword information used to recognize the user's intention. For the input "How is the weather in Shenzhen today?", intention recognition yields that the user's intention is to ask about the weather, the time is "today" and the place is "Shenzhen"; here "today" and "Shenzhen" are slot information, and the user's intention can be identified by combining the intention information with the slot information.
It should be understood that the intention information and its corresponding slots are preset before the multi-task joint model is trained. The intention information may include several user-defined slots, and each piece of intention information corresponds to at least one slot. Identifying the user's intention is therefore a slot-filling process: the slot information is filled into the slots corresponding to the intention information, converting the user's intention into an explicit instruction. A slot is a tag, and the slot information is the attribute information corresponding to that tag.
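The slot-filling idea described above can be sketched in a few lines (a minimal illustration; the intent names, slot names and data layout are hypothetical, not prescribed by the patent):

```python
# Each intention is pre-configured with the slots it requires.
INTENT_SLOTS = {
    "query_weather": ["time", "location"],
    "book_flight": ["origin", "destination", "time"],
}

def fill_slots(intent, extracted):
    """Map extracted keyword information onto the intention's preset slots.

    Returns (filled, missing): the filled slots and any slots still empty.
    """
    required = INTENT_SLOTS[intent]
    filled = {s: extracted[s] for s in required if s in extracted}
    missing = [s for s in required if s not in extracted]
    return filled, missing

# "How is the weather in Shenzhen today?" -> intention + slot values
filled, missing = fill_slots("query_weather",
                             {"time": "today", "location": "Shenzhen"})
```

A fully filled slot set corresponds to the "explicit instruction" the text above describes; a non-empty `missing` list triggers the follow-up dialogue of the second embodiment.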
Further, the step S102 specifically includes:
1. Preprocessing the input text and converting the preprocessed input text into a vector representation.
Specifically, the input text is preprocessed, where preprocessing includes removing special symbols, correcting wrongly written characters, recognizing pinyin, and similar operations; after preprocessing, the input text is converted into a vector representation to obtain a vector matrix that the model can recognize.
Further, in some embodiments, to further improve the accuracy of model identification, the input text is preprocessed, and the preprocessed input text is converted into a vector representation, including:
1.1 Preprocessing the input text.
1.2 Judging whether the input text is Chinese or English.
1.3 If the input text is Chinese, converting it character by character into character vectors.
Specifically, a conventional approach converts the input text into word vectors, but for Chinese the resulting word vectors may be inaccurate because word segmentation can go wrong: for an input such as "the liquid crystal of my computer screen", the segmenter may split the compound incorrectly, for example treating "computer screen" as one word or cutting it into the wrong pieces. Therefore, in this embodiment, when the input text is converted into a vector representation, it is first judged whether the text is Chinese or English; if it is Chinese, it is converted character by character into character vectors to avoid inaccurate word segmentation, while English can be split word by word (rather than letter by letter) into word vectors.
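The language check and the character-versus-word tokenization described above can be sketched as follows (a simplified heuristic that treats any text containing a CJK character as Chinese; real preprocessing would also handle mixed-language input):

```python
def tokenize(text):
    """Character-level tokenization for Chinese, word-level for English,
    as described above. The CJK range check is a simplifying assumption."""
    is_chinese = any('\u4e00' <= ch <= '\u9fff' for ch in text)
    if is_chinese:
        # One character per token, avoiding unreliable word segmentation.
        return [ch for ch in text if not ch.isspace()]
    # English: split word by word, not letter by letter.
    return text.split()
```

Each token would then be looked up in an embedding table to produce the character or word vectors the model consumes.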
2. Converting the vector representation into a number sequence, and inputting the number sequence into the multi-task joint model to obtain an intention and slot sequence.
Specifically, serializing the vector representation enables the multi-task joint model to recognize the number sequence.
3. Inversely mapping the intention and slot sequence to obtain the intention information and slot information.
Specifically, after the intention and slot sequence is obtained, inverse mapping is performed, i.e., the number sequence is converted back into characters, yielding the intention information and slot information.
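The serialization and inverse-mapping steps can be sketched as a token-to-id vocabulary and its inverse (the vocabulary contents and the special padding/unknown tokens are illustrative assumptions):

```python
class Vocab:
    """Bidirectional token/label <-> id mapping (illustrative special
    tokens: id 0 = padding, id 1 = unknown)."""

    def __init__(self, tokens):
        self.itos = ["<pad>", "<unk>"] + sorted(set(tokens))
        self.stoi = {t: i for i, t in enumerate(self.itos)}

    def encode(self, tokens):
        """Serialize tokens into the number sequence the model consumes."""
        return [self.stoi.get(t, 1) for t in tokens]

    def decode(self, ids):
        """Inverse mapping: convert the number sequence back to labels."""
        return [self.itos[i] for i in ids]

labels = Vocab(["O", "B-origin", "I-origin"])
ids = labels.encode(["B-origin", "I-origin", "O"])
decoded = labels.decode(ids)   # round-trips to the original labels
```

The same mapping is applied in reverse to the model's output slot sequence to recover the slot information as text.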
Further, in this embodiment, the intelligent dialogue method also includes pre-training the multi-task joint model, specifically including:
1. and acquiring a training sample, preprocessing the training sample and converting the training sample into a sample vector.
Specifically, the training sample is prepared in advance by a user, and after the training sample is obtained, the training sample is preprocessed and then converted into a sample vector.
2. And marking the slot positions in the sample vector, and acquiring a pre-constructed intention list.
Specifically, after the sample vector is obtained, slot BIO labeling is performed on the sample vector, for example, "airplane tickets from shenzhen to beijing," where "shenzhen" is labeled as "origin," and "beijing" is labeled as "destination," and when slot filling is performed, two slots of "origin" and "destination" need to be filled. The intention list is a list preset by the user, and the preset intention which the user may initiate is recorded on the intention list.
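The BIO labelling of the "airplane ticket from Shenzhen to Beijing" example, together with the inverse operation of recovering slot values from a tag sequence, might look like this (the tag names are illustrative; the patent only states that slots are BIO-labelled):

```python
# "深圳到北京的机票" = "airplane ticket from Shenzhen to Beijing",
# labelled character by character with B(egin)/I(nside)/O(utside) tags.
tokens = ["深", "圳", "到", "北", "京", "的", "机", "票"]
tags   = ["B-origin", "I-origin", "O",
          "B-destination", "I-destination", "O", "O", "O"]

def spans_from_bio(tokens, tags):
    """Recover (slot, value) pairs from a BIO tag sequence."""
    spans, cur_slot, cur_chars = [], None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if cur_slot:
                spans.append((cur_slot, "".join(cur_chars)))
            cur_slot, cur_chars = tag[2:], [tok]
        elif tag.startswith("I-") and cur_slot == tag[2:]:
            cur_chars.append(tok)
        else:
            if cur_slot:
                spans.append((cur_slot, "".join(cur_chars)))
            cur_slot, cur_chars = None, []
    if cur_slot:
        spans.append((cur_slot, "".join(cur_chars)))
    return spans
```

During training the tag sequence is the target the slot filling layer learns to predict; at inference time `spans_from_bio` turns the predicted tags back into slot information.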
3. Converting the sample vectors into sample number sequences, and inputting them into the intention recognition layer and slot filling layer to be trained, obtaining sample intention and slot sequences, wherein the intention recognition layer and slot filling layer are constructed from gated recurrent units.
Specifically, the intention recognition layer and the slot filling layer are constructed using gated recurrent units (GRUs). A GRU contains only a reset gate and an update gate, so a model built from GRUs has a leaner structure; the amount of computation during training is smaller, the whole iterative process completes faster, and the time required for model training is reduced.
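For reference, a single GRU step, which uses only an update gate and a reset gate as noted above, can be written out directly (a plain-NumPy sketch of the standard GRU equations, not the patent's actual implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU step. W, U and b are 3-tuples of the (update, reset,
    candidate) parameters; x is the current input, h the previous
    hidden state. No separate cell state or output gate is needed,
    which is why the layer is cheaper to train than larger models."""
    Wz, Wr, Wh = W
    Uz, Ur, Uh = U
    bz, br, bh = b
    z = sigmoid(x @ Wz + h @ Uz + bz)               # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)               # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)   # candidate state
    return (1.0 - z) * h + z * h_tilde              # new hidden state
```

In the joint model, the final hidden state would feed the intention classifier while the per-step hidden states feed the slot tag classifier.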
4. Inversely mapping the sample intention and slot sequences to obtain sample intention information and sample slot information.
5. Querying the intention list according to the sample intention information and sample slot information to generate a prediction result.
6. Comparing the prediction result with the ground-truth result of the training sample, and back-propagating the comparison result to update the intention recognition layer and the slot filling layer.
Step S103: and inquiring at least one standard text from the resource library according to the intention information and the slot position information.
In step S103, the resource pool is a database constructed in advance for storing various types of resource data. For a question of a certain user, after the user is subjected to the ideographic recognition, only one answer may exist, and at this time, all qualified standard texts need to be found out, so as to conveniently confirm the final text from all the standard texts.
Step S104: and calculating the similarity of each standard text and the input text, and outputting the standard text with the highest similarity as the final text.
In step S104, after all the standard texts are queried, the similarity between each standard text and the input text is calculated, and after the similarity is calculated, the standard text with the highest similarity is selected from the calculated standard texts and is output as the final text. The method adopts a similarity calculation mode to improve the understanding of semantics, so that the most accurate selection is made, and the improper answer is further provided.
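The patent does not fix a particular similarity measure; as one illustrative choice, cosine similarity over character counts can score each candidate standard text against the input and select the best match:

```python
import math
from collections import Counter

def cosine_sim(a, b):
    """Cosine similarity over character counts; a simple illustrative
    measure, not the specific one used by the patent."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[ch] * cb[ch] for ch in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def pick_answer(candidates, query):
    """Return the standard text most similar to the input text."""
    return max(candidates, key=lambda c: cosine_sim(c, query))
```

In a production system the counts would typically be replaced by sentence embeddings, but the selection rule (output the highest-similarity standard text) is the same.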
It should be noted that if only one standard text exists, it can be output directly as the final text.
Further, when the human-computer dialogue is conducted by voice, outputting the standard text with the highest similarity as the final text specifically includes:
and converting the standard text with the highest similarity into speech and outputting the speech.
The intelligent dialogue method of the first embodiment of the invention identifies the intention information and slot information in the user's input text to determine all feasible standard texts in reply, uses similarity calculation to select the standard text that best matches the input text, and outputs it as the output text, completing the dialogue with the user and achieving accurate question answering. Moreover, the multi-task joint model consists of an intention recognition layer and a slot filling layer constructed from gated recurrent units; compared with a BERT model, its architecture is greatly simplified, so it is suitable for low-cost environments and inexpensive to maintain.
Fig. 2 is a flowchart illustrating an intelligent dialogue method according to a second embodiment of the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in fig. 2 if the results are substantially the same. As shown in fig. 2, the method comprises the steps of:
step S201: and acquiring the input text of the user in the current wheel conversation process.
In this embodiment, step S201 in fig. 2 is similar to step S101 in fig. 1, and for brevity, is not described herein again.
Step S202: the method comprises the steps of utilizing a pre-trained multitask combined model to conduct intention identification on an input text to obtain intention information and slot position information, wherein the multitask combined model comprises an intention identification layer and a slot filling layer, the intention identification layer and the slot filling layer are constructed through a gate control circulation unit, at least one slot position is preset in the intention information, and the slot position information is attribute information of the corresponding slot position.
In this embodiment, step S202 in fig. 2 is similar to step S102 in fig. 1, and for brevity, is not described herein again.
Step S203: and judging whether the slot position information is complete according to at least one slot position corresponding to the intention information. If not, executing step S204; if yes, go to step S206.
In step S203, during the human-computer conversation, the input text of the user in a round of conversation may lack slot information, for example, when the input text of the user is "there is also a flight from shenzhen to beijing", through intention identification, it can be obtained that the intention information of the user is an air ticket, and it is assumed that a slot corresponding to the intention information of the air ticket includes a departure place, a destination and a time, and the slot information includes the departure place "shenzhen" and the destination "beijing", but does not include slot information corresponding to the time, that is, the slot information of the slot that lacks the time, so that corresponding operations cannot be performed according to the identified intention of the user.
Step S204: and generating a new round of dialog text according to the slot position missing the slot position information.
In step S204, when it is determined that the slot information in the intention information lacks the necessary slot information, a new round of dialog text is generated according to the necessary slot information and output, for example, the explanation continues with "flight from shenzhen to beijing", where in the new round of dialog, the machine can output "what time is required to be scheduled to ask for a question? "to request the user to further supplement the new slot information. It is to be understood that in this case, the intention information and the slot information in the previous session need to be saved to the next session.
Step S205: receive the new input text entered by the user in response to the new round of dialog text, and extract the missing slot information.
In step S205, when the new input text entered by the user in response to the new round of dialog text is received and the necessary slot information is extracted from it, the slot information obtained in the previous round is merged with the necessary slot information obtained in the current round to form complete slot information.
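The completeness check and cross-round merge of steps S203 to S205 can be sketched as follows. The slot schema, function names, and example values are illustrative assumptions, not part of the patent:

```python
# Sketch of the slot completeness check and cross-round merge (steps S203-S205).
# The per-intent slot schema and all names here are illustrative assumptions.
INTENT_SLOTS = {"book_flight": ["departure", "destination", "time"]}

def missing_slots(intent, slots):
    """Return the required slots for `intent` that have no value yet."""
    return [s for s in INTENT_SLOTS[intent] if s not in slots]

def merge_rounds(previous_slots, new_slots):
    """Merge slot values carried over from the previous round with this round's."""
    merged = dict(previous_slots)
    merged.update(new_slots)
    return merged

# Round 1: "Is there also a flight from Shenzhen to Beijing?"
slots = {"departure": "Shenzhen", "destination": "Beijing"}
assert missing_slots("book_flight", slots) == ["time"]  # ask the user for a time

# Round 2: the user supplies a time; merge and re-check completeness.
slots = merge_rounds(slots, {"time": "tomorrow 9:00"})
assert missing_slots("book_flight", slots) == []
```

The check is repeated before each query to the resource library, which matches the loop described in the embodiment.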
Further, in some embodiments, each time before the resource library is queried with the intent information and slot information to obtain the standard text, the completeness of the necessary slot information is checked again, and the dialog continues until the user has supplied all of the necessary slot information.
Step S206: query at least one standard text from the resource library according to the intent information and the slot information.
In this embodiment, step S206 in fig. 2 is similar to step S103 in fig. 1, and for brevity, is not described herein again.
Step S207: calculate the similarity between each standard text and the input text, and output the standard text with the highest similarity as the final text.
In this embodiment, step S207 in fig. 2 is similar to step S104 in fig. 1, and for brevity, is not described herein again.
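The patent does not fix a particular similarity measure for step S207; as a minimal sketch, bag-of-character cosine similarity is used below as an illustrative assumption:

```python
# Illustrative sketch of step S207: rank candidate standard texts by similarity
# to the input text. The cosine-over-character-counts metric is an assumption;
# the patent only requires "a similarity" and a highest-scoring winner.
from collections import Counter
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity over bag-of-character counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[ch] * cb[ch] for ch in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def best_match(input_text, standard_texts):
    """Return the standard text most similar to the input text."""
    return max(standard_texts, key=lambda t: cosine_similarity(input_text, t))

candidates = ["flights from Shenzhen to Beijing", "trains from Shenzhen to Beijing"]
assert best_match("flight from Shenzhen to Beijing", candidates) == candidates[0]
```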
The intelligent dialogue method of the second embodiment of the invention builds on the first embodiment by detecting whether the slot information carried in the user's input text is complete. If it is incomplete, a new round of dialogue is conducted with the user to request the missing necessary slot information; once all necessary slot information has been supplied, the corresponding standard text is queried according to the intent information and slot information, ensuring the accuracy of the matched standard text.
Fig. 3 is a flowchart illustrating an intelligent dialogue method according to a third embodiment of the present invention. It should be noted that, provided substantially the same result is obtained, the method of the present invention is not limited to the flow sequence shown in fig. 3. As shown in fig. 3, the method comprises the following steps:
step S301: acquire the input text of the user during the current round of conversation.
In this embodiment, step S301 in fig. 3 is similar to step S101 in fig. 1, and for brevity, is not described herein again.
Step S302: perform intent recognition on the input text using a pre-trained multi-task joint model to obtain intent information and slot information, wherein the multi-task joint model comprises an intent recognition layer and a slot filling layer, both constructed from gated recurrent units (GRUs); at least one slot is preset for the intent information, and the slot information is the attribute information of the corresponding slot.
In this embodiment, step S302 in fig. 3 is similar to step S102 in fig. 1, and for brevity, is not described herein again.
Step S303: query at least one standard text from the resource library according to the intent information and the slot information.
In this embodiment, step S303 in fig. 3 is similar to step S103 in fig. 1, and for brevity, is not described herein again.
Step S304: calculate the similarity between each standard text and the input text, and output the standard text with the highest similarity as the final text.
In this embodiment, step S304 in fig. 3 is similar to step S104 in fig. 1, and for brevity, is not described herein again.
Step S305: store the input text and the final text as a new sample in the newly added sample library.
Step S306: when the number of newly added samples in the newly added sample library reaches a preset threshold, perform iterative training on the multi-task joint model using the newly added samples, and then empty the newly added sample library.
In steps S305 to S306, after a conversation with the user is completed, the input text and the final text of that conversation are stored in a newly added sample library. When the number of samples in the library reaches a preset threshold, the multi-task joint model is iteratively retrained with the newly added samples to further improve its accuracy. After this iterative training, the library is emptied and the count of newly added samples restarts from zero.
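The buffer-and-retrain cycle of steps S305 to S306 can be sketched as follows; the class and the retraining callback are illustrative assumptions standing in for the model-update machinery:

```python
# Sketch of the newly-added-sample library with threshold-triggered retraining
# (steps S305-S306). `retrain` is a stand-in for the joint-model update.
class SampleBuffer:
    def __init__(self, threshold, retrain):
        self.threshold = threshold
        self.retrain = retrain        # callback that retrains the joint model
        self.samples = []

    def add(self, input_text, final_text):
        self.samples.append((input_text, final_text))
        if len(self.samples) >= self.threshold:
            self.retrain(list(self.samples))  # hand a copy to the trainer
            self.samples.clear()              # empty the library, restart the count

trained_batches = []
buf = SampleBuffer(threshold=2, retrain=trained_batches.append)
buf.add("q1", "a1")
assert buf.samples == [("q1", "a1")]          # below threshold: no retraining yet
buf.add("q2", "a2")                           # threshold reached: retrain fires
assert trained_batches == [[("q1", "a1"), ("q2", "a2")]]
assert buf.samples == []                      # library emptied after retraining
```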
The intelligent dialogue method of the third embodiment of the invention builds on the first embodiment by using historical dialogue data generated in human-machine conversations as newly added samples for iterative training of the multi-task joint model, thereby enhancing the accuracy of the multi-task joint model.
Fig. 4 is a functional module diagram of an intelligent dialogue device according to an embodiment of the present invention. As shown in fig. 4, the intelligent dialogue device 40 includes an acquisition module 41, a recognition module 42, a query module 43, and an output module 44.
The obtaining module 41 is configured to acquire the input text of the user during the current round of conversation; the recognition module 42 is configured to perform intent recognition on the input text using a pre-trained multi-task joint model to obtain intent information and slot information, where the multi-task joint model comprises an intent recognition layer and a slot filling layer, both constructed from gated recurrent units, at least one slot is preset for the intent information, and the slot information is attribute information of the corresponding slot; the query module 43 is configured to query the resource library for at least one standard text according to the intent information and the slot information; and the output module 44 is configured to calculate the similarity between each standard text and the input text and output the standard text with the highest similarity as the final text.
Optionally, after performing the operation of recognizing the intent of the input text using the pre-trained multi-task joint model to obtain intent information and slot information, the recognition module 42 is further configured to: judge whether the slot information is complete according to the at least one slot corresponding to the intent information; if not, generate a new round of dialog text according to the slot whose slot information is missing; and receive the new input text entered by the user in response to the new round of dialog text and extract the missing slot information.
Optionally, the operation performed by the recognition module 42 of recognizing the intent of the input text using a pre-trained multi-task joint model to obtain intent information and slot information specifically includes: preprocessing the input text and converting the preprocessed input text into a vector representation; converting the vector representation into a numeric sequence and inputting the numeric sequence into the multi-task joint model to obtain an intent and a slot sequence; and inverse-mapping the intent and slot sequence to obtain the intent information and the slot information.
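The preprocess / map-to-ids / model / inverse-map pipeline can be sketched as follows. The vocabulary, label tables, and `joint_model` stub are illustrative assumptions; in the patent the model is the trained GRU-based intent recognition and slot filling network:

```python
# Sketch of the recognition pipeline: preprocess the text, map tokens to a
# numeric sequence, run the joint model, then inverse-map its outputs to
# intent information and slot information. All tables and the `joint_model`
# stub are illustrative assumptions.
VOCAB = {"<unk>": 0, "flight": 1, "shenzhen": 2, "to": 3, "beijing": 4}
INTENTS = {0: "book_flight"}
SLOT_TAGS = {0: "O", 1: "B-departure", 2: "B-destination"}

def preprocess(text):
    return text.lower().split()

def to_ids(tokens):
    return [VOCAB.get(t, VOCAB["<unk>"]) for t in tokens]

def joint_model(ids):
    # Placeholder for the GRU intent-recognition and slot-filling layers.
    tag_for = {VOCAB["shenzhen"]: 1, VOCAB["beijing"]: 2}
    return 0, [tag_for.get(i, 0) for i in ids]

def recognize(text):
    tokens = preprocess(text)
    intent_id, tag_ids = joint_model(to_ids(tokens))
    # Inverse mapping: numeric labels back to readable intent and slot names.
    slots = {SLOT_TAGS[t][2:]: tok for tok, t in zip(tokens, tag_ids) if t != 0}
    return INTENTS[intent_id], slots

intent, slots = recognize("flight Shenzhen to Beijing")
assert intent == "book_flight"
assert slots == {"departure": "shenzhen", "destination": "beijing"}
```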
Optionally, the operation performed by the recognition module 42 of preprocessing the input text and converting it into a vector representation may further be: preprocessing the input text; judging whether the input text is Chinese or English; and, if the input text is Chinese, converting it into word vectors character by character.
Optionally, the recognition module 42 is further configured to pre-train the multi-task joint model, which includes: acquiring a training sample, preprocessing it, and converting it into a sample vector; labeling the slots in the sample vector and acquiring a pre-constructed intent map; converting the sample vector into a sample numeric sequence and inputting it into the intent recognition layer and slot filling layer to be trained, both constructed from gated recurrent units, to obtain a sample intent and slot sequence; inverse-mapping the sample intent and slot sequence to obtain sample intent information and sample slot information; querying the intent map with the sample intent information and sample slot information to generate a prediction result; and comparing the prediction result with the true result corresponding to the training sample and back-propagating the comparison result to update the intent recognition layer and the slot filling layer.
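The control flow of this pre-training procedure can be sketched as below; `forward`, `loss_fn`, and `backprop` are stand-ins (assumptions) for the GRU layers' actual computation, and only the loop structure mirrors the described procedure:

```python
# High-level sketch of the pre-training loop: forward pass through both
# layers, compare prediction with the true result, back-propagate the
# comparison to update both layers jointly. The callbacks are assumptions.
def pretrain(samples, forward, loss_fn, backprop, epochs=2):
    """Each sample: (numeric_sequence, true_intent, true_slot_tags)."""
    for _ in range(epochs):
        for seq, true_intent, true_tags in samples:
            intent, tags = forward(seq)                  # intent + slot sequence
            loss = loss_fn((intent, tags), (true_intent, true_tags))
            backprop(loss)                               # update both layers

updates = []
pretrain(
    samples=[([1, 2, 3], 0, [0, 1, 0])],
    forward=lambda seq: (0, [0, 1, 0]),                  # stub joint model
    loss_fn=lambda pred, true: 0.0 if pred == true else 1.0,
    backprop=updates.append,
)
assert updates == [0.0, 0.0]                             # one update per epoch
```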
Optionally, after performing the operation of outputting the standard text with the highest similarity as the final text, the output module 44 is further configured to: store the input text and the final text as a new sample in the newly added sample library; and, when the number of newly added samples in the library reaches a preset threshold, perform iterative training on the multi-task joint model using the newly added samples and empty the library.
Optionally, the operation performed by the output module 44 of outputting the standard text with the highest similarity as the final text may further include: converting the standard text with the highest similarity into speech and outputting the speech.
For other details of the technical solution implemented by each module in the intelligent dialog device in the above embodiment, reference may be made to the description of the intelligent dialog method in the above embodiment, and details are not described here again.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention. As shown in fig. 5, the terminal 50 includes a processor 51 and a memory 52 coupled to the processor 51, wherein the memory 52 stores program instructions, and the program instructions, when executed by the processor 51, cause the processor 51 to execute the steps of the intelligent dialogue method according to any of the embodiments.
The processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip having signal processing capabilities. The processor 51 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a storage medium according to an embodiment of the invention. The storage medium of the embodiment of the present invention stores a program file 61 capable of implementing all the methods described above, wherein the program file 61 may be stored in the storage medium in the form of a software product and includes several instructions enabling a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, as well as terminal devices such as a computer, a server, a mobile phone, or a tablet.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. The above embodiments are merely examples and are not intended to limit the scope of the present disclosure, and all modifications, equivalents, and flow charts using the contents of the specification and drawings of the present disclosure or those directly or indirectly applied to other related technical fields are intended to be included in the scope of the present disclosure.
Claims (10)
1. An intelligent dialog method, comprising:
acquiring an input text of a user during a current round of conversation;
performing intent recognition on the input text by using a pre-trained multi-task joint model to obtain intent information and slot information, wherein the multi-task joint model comprises an intent recognition layer and a slot filling layer, the intent recognition layer and the slot filling layer are constructed from gated recurrent units, the intent information is preset with at least one slot, and the slot information is attribute information of the corresponding slot;
querying at least one standard text from a resource library according to the intent information and the slot information; and
calculating the similarity between each standard text and the input text, and outputting the standard text with the highest similarity as a final text.
2. The intelligent dialog method according to claim 1, wherein after the performing intent recognition on the input text by using the pre-trained multi-task joint model to obtain the intent information and the slot information, the method further comprises:
judging whether the slot information is complete according to the at least one slot corresponding to the intent information;
if not, generating a new round of dialog text according to the slot whose slot information is missing; and
receiving a new input text entered by the user in response to the new round of dialog text, and extracting the missing slot information.
3. The intelligent dialog method according to claim 1, wherein the performing intent recognition on the input text by using the pre-trained multi-task joint model to obtain the intent information and the slot information comprises:
preprocessing the input text, and converting the preprocessed input text into a vector representation;
converting the vector representation into a numeric sequence, and inputting the numeric sequence into the multi-task joint model to obtain an intent and slot sequence; and
inverse-mapping the intent and slot sequence to obtain the intent information and the slot information.
4. The intelligent dialog method according to claim 3, wherein the preprocessing the input text and converting the preprocessed input text into a vector representation comprises:
preprocessing the input text;
judging whether the input text is Chinese or English; and
if the input text is Chinese, converting the input text into word vectors character by character.
5. The intelligent dialog method according to claim 1, further comprising pre-training the multi-task joint model, which comprises:
acquiring a training sample, preprocessing it, and converting it into a sample vector;
labeling the slots in the sample vector, and acquiring a pre-constructed intent map;
converting the sample vector into a sample numeric sequence, and inputting the sample numeric sequence into the intent recognition layer and the slot filling layer to be trained to obtain a sample intent and slot sequence;
inverse-mapping the sample intent and slot sequence to obtain sample intent information and sample slot information;
querying the intent map according to the sample intent information and the sample slot information to generate a prediction result; and
comparing the prediction result with the true result corresponding to the training sample, and back-propagating the comparison result to update the intent recognition layer and the slot filling layer.
6. The intelligent dialog method according to claim 1, wherein after the outputting the standard text with the highest similarity as the final text, the method further comprises:
storing the input text and the final text as a new sample in a newly added sample library; and
when the number of newly added samples in the newly added sample library reaches a preset threshold, performing iterative training on the multi-task joint model using the newly added samples, and emptying the newly added sample library.
7. The intelligent dialog method according to claim 1, wherein the outputting the standard text with the highest similarity as the final text comprises:
converting the standard text with the highest similarity into speech and outputting the speech.
8. An intelligent dialog device, comprising:
an acquisition module, configured to acquire an input text of a user during a current round of conversation;
a recognition module, configured to perform intent recognition on the input text by using a pre-trained multi-task joint model to obtain intent information and slot information, wherein the multi-task joint model comprises an intent recognition layer and a slot filling layer, the intent recognition layer and the slot filling layer are constructed from gated recurrent units, the intent information is preset with at least one slot, and the slot information is attribute information of the corresponding slot;
a query module, configured to query a resource library to obtain at least one standard text according to the intent information and the slot information; and
an output module, configured to calculate the similarity between each standard text and the input text, and output the standard text with the highest similarity as a final text.
9. A terminal, comprising a processor and a memory coupled to the processor, wherein the memory stores program instructions which, when executed by the processor, cause the processor to carry out the steps of the intelligent dialog method according to any one of claims 1 to 7.
10. A storage medium storing a program file capable of implementing the intelligent dialog method according to any one of claims 1 to 7.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110603528.8A CN113190669A (en) | 2021-05-31 | 2021-05-31 | Intelligent dialogue method, device, terminal and storage medium |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110603528.8A CN113190669A (en) | 2021-05-31 | 2021-05-31 | Intelligent dialogue method, device, terminal and storage medium |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN113190669A (en) | 2021-07-30 |

Family: ID=76985898

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110603528.8A Pending CN113190669A (en) | | 2021-05-31 | 2021-05-31 |

Country Status (1)

| Country | Link |
|---|---|
| CN (1) | CN113190669A (en) |
Cited By (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113705249A | 2021-08-25 | 2021-11-26 | 上海云从企业发展有限公司 | Dialogue processing method, system, device and computer readable storage medium |
| CN118312581A | 2024-04-12 | 2024-07-09 | 中电科东方通信集团有限公司 | Interactive question-answering method, computer device, storage medium and program product |
| WO2024160041A1 | 2023-02-01 | 2024-08-08 | 浙江阿里巴巴机器人有限公司 | Multi-modal conversation method and apparatus, and device and storage medium |
Citations (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106156003A | 2016-06-30 | 2016-11-23 | 北京大学 | A kind of question sentence understanding method in question answering system |
| CN111737432A | 2020-06-22 | 2020-10-02 | 四川长虹电器股份有限公司 | Automatic dialogue method and system based on joint training model |
| CN111767384A | 2020-07-08 | 2020-10-13 | 上海风秩科技有限公司 | Man-machine conversation processing method, device, equipment and storage medium |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |