CN114428746A - Bad case identification method, device, equipment and storage medium - Google Patents

Bad case identification method, device, equipment and storage medium

Info

Publication number
CN114428746A
CN114428746A
Authority
CN
China
Prior art keywords
data
user
information
bad case
identified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210132588.0A
Other languages
Chinese (zh)
Inventor
汪建
袁春阳
张鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority claimed from application CN202210132588.0A
Publication of CN114428746A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/3668 Software testing
    • G06F 11/3672 Test management
    • G06F 11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/3668 Software testing
    • G06F 11/3672 Test management
    • G06F 11/3692 Test management for test results analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems

Abstract

The specification discloses a bad case identification method, apparatus, device, and storage medium. Data to be identified and preset guidance information are displayed to a first user, and the guidance information guides the first user to input information of a specified type for the data to be identified. The specified-type information input by the first user is taken as the feature information corresponding to the data to be identified, and the feature information is input into a pre-trained bad case identification model to obtain the bad case cause type to which the data to be identified belongs. By guiding the first user, according to the preset guidance information, to input specified-type information for the data to be identified, the method obtains the feature information of the data to be identified, standardizes the judgment of bad cases, and improves the accuracy of bad case cause analysis.

Description

Bad case identification method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for identifying bad cases.
Background
In a man-machine dialogue, a user and an intelligent dialogue system exchange information in a certain dialogue language through a certain mode of interaction. During the dialogue, the intelligent dialogue system typically recognizes and analyzes the voice or text information input by the user to determine the user's intention, and then produces an output according to that intention to complete the dialogue with the user.
However, in practical applications, because the algorithms for recognizing and analyzing voice or text information are not yet mature, the output of the intelligent dialogue system may fail to meet the user's expectation; that is, a bad case occurs. By finding the bad cases that occur during the man-machine dialogue and determining their causes, the defects of the man-machine dialogue algorithm can be remedied in a targeted manner, thereby improving the ability of the intelligent dialogue system to recognize and analyze user input.
At present, annotators usually rely on business experience to find and analyze bad cases that occur while the intelligent dialogue system interacts with users.
However, this approach demands a high level of business expertise from annotators, their judgment processes are inconsistent, and the resulting bad case analyses depend largely on individual experience and subjective opinion, so the accuracy of bad case cause analysis is low.
Disclosure of Invention
The present specification provides a method, an apparatus, a device and a storage medium for identifying a bad case, so as to partially solve the above problems in the prior art.
The technical scheme adopted by the specification is as follows:
the present specification provides a method for identifying a bad case, including:
acquiring data to be identified;
displaying the data to be identified and preset guide information to a first user; the guide information is used for guiding the first user to input information of a specified type aiming at the data to be identified;
receiving information of a specified type input by the first user for the data to be identified based on the guide information, and taking the information input by the first user as feature information of the data to be identified;
inputting the feature information into a pre-trained bad case identification model to obtain the bad case cause type to which the data to be identified belongs; a bad case comprises dialogue data that is output by the intelligent dialogue system during a dialogue with a second user and does not meet the second user's expectation.
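The four claimed steps can be illustrated with a minimal sketch. Everything below is hypothetical scaffolding: the patent specifies no API, `answer_fn` stands in for the first user's guided input, and `RuleStubModel` stands in for the pre-trained bad case identification model.

```python
def collect_feature_info(data_to_identify, guidance_prompts, answer_fn):
    """Show each preset guidance prompt for the data and collect the typed input.

    answer_fn stands in for the first user's input channel (e.g. a UI callback).
    """
    feature_info = {}
    for field, prompt in guidance_prompts.items():
        feature_info[field] = answer_fn(data_to_identify, prompt)
    return feature_info


def identify_bad_case(data_to_identify, guidance_prompts, answer_fn, model):
    """Steps S102-S106: guide input, build feature info, classify the cause."""
    features = collect_feature_info(data_to_identify, guidance_prompts, answer_fn)
    return model.predict(features)  # bad case cause type


class RuleStubModel:
    """Trivial stand-in for the pre-trained bad case identification model."""

    def predict(self, features):
        if features.get("content_error"):
            return "content_recognition_error"
        if features.get("intent_error"):
            return "intent_recognition_error"
        return "capability_missing"
```

In a real system the prompts would be the preset guidance information and `answer_fn` would block on the annotator's form submission; here both are faked so the data flow is visible end to end.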
Optionally, the acquiring data to be identified specifically includes:
acquiring dialogue data between the intelligent dialogue system and a second user;
inputting the dialogue data into a pre-trained bad case discovery model, and judging whether the dialogue data input into the bad case discovery model is a bad case;
and if so, taking the dialogue data input into the bad case discovery model as the data to be identified.
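The optional acquisition step above, filtering dialogue data through a bad case discovery model, can be sketched as follows. The keyword heuristic `toy_discovery_model` is a hypothetical stand-in for the trained discovery model, used only to make the filtering flow concrete.

```python
def discover_bad_cases(dialogue_logs, is_bad_case):
    """Keep only the dialogues the discovery model judges to be bad cases."""
    return [dialogue for dialogue in dialogue_logs if is_bad_case(dialogue)]


def toy_discovery_model(dialogue):
    # Hypothetical heuristic: flag dialogues where the user signals a problem.
    # A real discovery model would be trained on labeled sample dialogue data.
    complaint_markers = ("not what i asked", "wrong", "i said")
    return any(marker in turn["user"].lower()
               for turn in dialogue
               for marker in complaint_markers)
```

Dialogues the model rejects are simply dropped, matching the claim: only flagged dialogue data becomes data to be identified.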
Optionally, the data to be identified includes at least user dialogue data input by the second user and intelligent dialogue data output by the intelligent dialogue system in response to the user dialogue data.
Optionally, the preset guidance information includes at least one of: guidance information for guiding the first user to input the dialogue turn in which the bad case is located; guidance information for guiding the first user to input the content recognition accuracy in the user dialogue data; guidance information for guiding the first user to input the intention recognition accuracy; and guidance information for guiding the first user to input the reason for generating the bad case.
Optionally, when the guidance information includes guidance information for guiding the first user to input the dialogue turn in which the bad case is located, the feature information of the data to be identified includes the dialogue turn in which the bad case is located;
when the guidance information includes guidance information for guiding the first user to input the content recognition accuracy in the user dialogue data, the feature information of the data to be identified includes information on content recognition errors in the user dialogue data;
when the guidance information includes guidance information for guiding the first user to input the intention recognition accuracy, the feature information of the data to be identified includes information on intention recognition errors;
when the guidance information includes guidance information for guiding the first user to input the reason for generating the bad case, the feature information of the data to be identified includes the reason for generating the bad case.
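The four guidance/feature pairings above amount to a fixed schema mapping each guidance item to the feature field it populates. A minimal sketch, with field names and prompt wording that are illustrative rather than taken from the patent:

```python
# Hypothetical schema: one entry per guidance type named in the claims.
GUIDANCE_SCHEMA = {
    "dialogue_turn":   "Which dialogue turn contains the bad case?",
    "content_error":   "Mark any wrongly recognized content in the user dialogue data.",
    "intent_error":    "Is the intention recognized correctly? If not, explain.",
    "bad_case_reason": "What do you think caused the bad case?",
}


def build_feature_info(answers):
    """Keep only the answers for fields defined in the schema, so the feature
    information always has a predictable, quantifiable shape."""
    return {field: answers[field] for field in GUIDANCE_SCHEMA if field in answers}
```

Restricting feature information to a fixed schema is one way to realize the standardization benefit the specification claims: every annotator fills in the same fields.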
Optionally, the step of inputting the feature information into a pre-trained bad case identification model to obtain the bad case cause type to which the data to be identified belongs includes:
inputting the dialogue data, the data to be identified, and the feature information of the data to be identified into a pre-trained bad case identification model to obtain the bad case cause type to which the data to be identified belongs.
Optionally, the pre-training of the bad case identification model specifically includes:
obtaining, in advance, sample dialogue data, sample data to be identified corresponding to the sample dialogue data, and sample feature information of the sample data to be identified, as samples for training the bad case identification model;
inputting the samples into the bad case identification model to be trained to obtain the to-be-optimized bad case cause type output by the model;
and training the bad case identification model with minimizing the difference between the to-be-optimized bad case cause type and the labeled bad case cause type corresponding to the samples as the training target.
Optionally, the pre-training of the bad case discovery model specifically includes:
obtaining sample dialogue data in advance to serve as a sample for training a bad case discovery model;
inputting the sample into a bad case discovery model to be trained to obtain a discovery result to be optimized output by the bad case discovery model to be trained;
and training the bad case discovery model with minimizing the difference between the to-be-optimized discovery result and the labeled discovery result corresponding to the sample as the training target.
The present specification provides a bad case identification apparatus, including:
the data to be identified acquisition module is used for acquiring data to be identified;
the display module is used for displaying the data to be identified and preset guide information to a first user; the guide information is used for guiding the first user to input information of a specified type aiming at the data to be identified;
a receiving module, configured to receive information of a specified type input by the first user for the data to be identified based on the guidance information, and use the information input by the first user as feature information of the data to be identified;
a bad case cause type obtaining module, configured to input the feature information into a pre-trained bad case identification model to obtain the bad case cause type to which the data to be identified belongs; a bad case comprises dialogue data that is output by the intelligent dialogue system during a dialogue with a second user and does not meet the second user's expectation.
The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described bad case identification method.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the above-mentioned bad case identification method when executing the program.
The technical scheme adopted by the specification can achieve the following beneficial effects:
In the bad case identification method provided by the present specification, data to be identified and preset guidance information are displayed to a first user, the first user is guided to input information of a specified type for the data to be identified, the specified-type information input by the first user is taken as the feature information corresponding to the data to be identified, and the feature information is input into a pre-trained bad case identification model to obtain the bad case cause type to which the data to be identified belongs. By guiding the first user, according to the preset guidance information, to input specified-type information for the data to be identified, the method obtains the feature information of the data to be identified, standardizes the judgment of bad cases, and improves the accuracy of bad case cause analysis.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification and are incorporated in and constitute a part of this specification, illustrate embodiments of the specification and together with the description serve to explain the specification; they do not constitute an undue limitation of the specification. In the drawings:
FIG. 1 is a schematic flow chart of a method for identifying bad cases in this specification;
FIG. 2 is a schematic diagram of a bad case identification apparatus provided in the present specification;
fig. 3 is a schematic diagram of an electronic device corresponding to fig. 1 provided in the present specification.
Detailed Description
To make the objects, technical solutions, and advantages of the present disclosure clearer, the technical solutions of the present disclosure are described clearly and completely below with reference to specific embodiments and the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present specification without creative effort fall within the protection scope of the present specification.
With the rapid development of artificial intelligence technology, man-machine conversation is increasingly applied to the work and life of people. In the man-machine conversation, the information exchange between the human and the intelligent conversation system is completed through a certain interaction mode between the user and the intelligent conversation system.
In general, an intelligent dialogue system may be implemented with technologies such as keyword spotting (KWS) for voice wakeup, Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Natural Language Generation (NLG). The core of man-machine dialogue is that, within a preset system framework and based on training or learning from prior data, the intelligent dialogue system can automatically understand and analyze the information input by the user and give meaningful responses. The information input by the user may be task-oriented, such as business consultation and business handling, or non-task-oriented, such as chatting, and it may take the form of voice, text, pictures, and the like; this specification does not limit the type or form of the user's input.
However, because the algorithms for recognizing and understanding user input are not yet mature, existing intelligent dialogue systems cannot fully match human understanding and expression. As a result, during a man-machine dialogue, the answer the intelligent dialogue system outputs for a user's question may not meet the user's expectation; such a wrong answer is treated as a bad case in the man-machine dialogue. Bad cases can prevent the man-machine dialogue from proceeding smoothly, reducing the reliability of the dialogue function. To improve the reliability of the intelligent dialogue system, bad cases occurring during the man-machine dialogue can be found and their causes determined, so that the defects of the man-machine dialogue algorithm can be remedied in a targeted manner and the system's ability to recognize and analyze user input improved. Here, a bad case comprises dialogue data that the intelligent dialogue system outputs during a dialogue with a second user and that does not accord with the second user's intention.
In the embodiment of the present specification, a worker who assists in analyzing the cause of the bad case is taken as a first user, and a user who has a conversation with the intelligent conversation system is taken as a second user. The intelligent dialogue device configured with the intelligent dialogue system and used by the second user may be a smart speaker, a smart phone, an intelligent dialogue robot, and the like, which is not limited in this specification.
The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a method for identifying a bad case in this specification, which specifically includes the following steps:
s100: and acquiring data to be identified.
Typically, the second user conducts man-machine dialogue interaction while using an intelligent dialogue device configured with the intelligent dialogue system. During the dialogue, the intelligent dialogue system completes a round of dialogue with the second user in three steps: first receiving and processing the user dialogue data input by the second user, then analyzing the second user's intention, and finally outputting intelligent dialogue data according to that intention. Of course, the second user may conduct multiple rounds of dialogue with the intelligent dialogue system in order to achieve the second user's intention.
During the man-machine conversation, the intelligent conversation system can record user conversation data input by a second user and intelligent conversation data output by the intelligent conversation system in response to the user conversation data to generate at least one pair of conversation data. In general, each pair of dialog data may include user dialog data input by a second user and intelligent dialog data output by the intelligent dialog system in response to the user dialog data. The user dialog data input by the second user may be data in the form of voice, text, picture, and the like, which is not limited in this specification.
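The recorded dialogue pairs described above map naturally onto a small data structure: each round holds the user dialogue data and the intelligent dialogue data output in response. The layout below is one illustrative possibility, not specified by the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DialogueRound:
    user_input: str     # user dialogue data from the second user
    system_output: str  # intelligent dialogue data output in response


@dataclass
class DialogueSession:
    """All recorded rounds of one man-machine dialogue."""
    rounds: List[DialogueRound] = field(default_factory=list)

    def record(self, user_input: str, system_output: str) -> None:
        self.rounds.append(DialogueRound(user_input, system_output))
```

Keeping rounds ordered in a session makes the first guidance type below (entering the dialogue turn in which the bad case is located) a simple index into `rounds`.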
S102: Displaying the data to be identified and preset guidance information to a first user; the guidance information is used to guide the first user to input information of a specified type for the data to be identified.
The data to be identified and the guidance information preset for it are displayed to a first user, and feature extraction for the data to be identified is completed by guiding the first user to input information of the specified type. In this step, presetting guidance information for the data to be identified structures and normalizes the first user's analysis process. This avoids incomplete or erroneous feature extraction caused by the first user's insufficient business experience or subjective opinion, and also makes the feature information of the data to be identified easy to quantify.
S104: Receiving the specified-type information input by the first user for the data to be identified based on the guidance information, and taking the information input by the first user as the feature information of the data to be identified.
S106: Inputting the feature information into a pre-trained bad case identification model to obtain the bad case cause type to which the data to be identified belongs; a bad case comprises dialogue data that is output by the intelligent dialogue system during a dialogue with a second user and does not meet the second user's expectation.
Optionally, the dialogue data, the data to be recognized and the feature information of the data to be recognized are input into a pre-trained bad case recognition model, so as to obtain the bad case cause type to which the data to be recognized belongs.
Specifically, the bad case cause type to which the data to be identified belongs may include at least one of: the intelligent dialogue system recognizing the content in the user dialogue data incorrectly; the intelligent dialogue system recognizing the intention incorrectly; and the intelligent dialogue system being unable to implement the second user's intention. In the last case, the intelligent dialogue system correctly recognizes both the content in the user dialogue data and the second user's intention, but it lacks the function needed to fulfil that intention; the function the user expects therefore cannot be realized, which results in a bad case.
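The three cause types listed above form a closed label set for the identification model's output. A sketch with illustrative enum names:

```python
from enum import Enum


class BadCaseCause(Enum):
    """Closed label set for the bad case cause types; names are illustrative."""
    CONTENT_RECOGNITION_ERROR = "content in the user dialogue data recognized incorrectly"
    INTENT_RECOGNITION_ERROR = "the second user's intention recognized incorrectly"
    CAPABILITY_MISSING = "system cannot implement the second user's intention"
```

Modeling the output as an enum, rather than free text, keeps the cause analysis quantifiable, in line with the standardization goal stated earlier.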
By inputting the feature information of the data to be identified into the pre-trained bad case identification model to obtain the bad case cause type, the method can at least partially avoid the cause-analysis errors introduced by insufficient manual business experience and subjective opinion, thereby improving the accuracy of bad case cause analysis.
In this embodiment of the present specification, in step S100 of fig. 1, the data to be identified for analyzing the bad case cause type may be dialogue data between the intelligent dialogue system and the second user, or dialogue data processed by a bad case discovery model; this specification does not limit this.
For the case where the data to be identified is dialogue data between the intelligent dialogue system and the second user: the acquired dialogue data is taken directly as the data to be identified, so that it can subsequently be analyzed to obtain the bad case cause type. Compared with obtaining the data to be identified by analyzing the dialogue data with a bad case discovery model, this approach performs no additional processing on the dialogue data. Although it increases labor cost somewhat, it retains the feature information in the dialogue data to the greatest extent and can improve the accuracy of subsequently determining the feature information of the data to be identified and the bad case cause type.
For the case where the data to be identified is dialogue data processed by the bad case discovery model: the acquired dialogue data between the intelligent dialogue system and the second user is input into a pre-trained bad case discovery model, which judges whether the input dialogue data is a bad case. If the bad case discovery model judges that the input dialogue data is not a bad case, that dialogue data is not processed further. If the model judges that the input dialogue data is a bad case, that dialogue data is taken as the data to be identified, so that it can subsequently be analyzed to obtain the bad case cause type. Because the discovery model filters the dialogue data, at least part of the dialogue data that is not a bad case is removed, which improves the efficiency of bad case analysis.
In this embodiment of the specification, as shown in steps S102 to S104 of fig. 1, the data to be identified and preset guidance information are displayed to a first user, and the specified-type information that the first user inputs for the data to be identified based on the guidance information is received as the feature information of the data to be identified. Because the specified-type information input by the first user serves as the feature information used when subsequently analyzing the bad case cause type, the guidance information needs to be set according to the bad case cause types of the data to be identified.
Alternatively, the guidance information may be set as questions posed about the data to be identified. The data to be identified and the guidance information are presented to the first user, who is guided to answer and explain the questions; the answers the first user inputs for the data to be identified then constitute its feature information. Question-form guidance information may take the form of multiple-choice, true/false, fill-in-the-blank, or short-answer questions, with the answer format constrained by the question. This makes the specified-type information input by the first user more standardized and avoids incomplete or erroneous feature extraction caused by the first user's insufficient business experience or subjective opinion.
The guidance information and the corresponding feature information of the data to be identified may include at least the following four types:
the first type is: and when the guiding information comprises guiding information for guiding the first user to input the conversation turn in which the bad example is positioned, the characteristic information of the data to be identified comprises the conversation turn in which the bad example is positioned.
The data to be identified displayed to the first user can comprise multiple rounds of conversations of the intelligent conversation system and the second user, in order to accurately position the round of occurrence of the bad case, the first user can be guided to input guide information of the conversation round where the bad case is located by setting, the first user is guided to mark the conversation round where the bad case is located, the position of the bad case is conveniently positioned, and therefore the conversation round where the bad case corresponding to the data to be identified input by the first user is located can be used as feature information of the data to be identified.
Optionally, the first type of guidance information exists in the form of a question: a fill-in-the-blank question that guides the first user to fill in the dialogue turn in which the bad case is located. For example, the data to be identified is presented to the first user together with a fill-in-the-blank question asking for that turn, "The bad case is in dialogue turn ( )", where the blank is restricted to numbers only. If, by analyzing the data to be identified, the first user determines that the bad case is in the second round of the dialogue, the first user fills in "The bad case is in dialogue turn (2)".
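The fill-in-the-blank question above restricts the answer to numbers only. A minimal validation sketch (the function name and error message are illustrative):

```python
def parse_turn_answer(raw_answer: str) -> int:
    """Accept only a positive integer dialogue turn, as the question requires."""
    text = raw_answer.strip()
    if not text.isdigit() or int(text) < 1:
        raise ValueError("answer must be a positive dialogue turn number")
    return int(text)
```

Rejecting anything but a positive integer at input time is what makes the resulting feature information directly usable by the identification model, with no free-text cleanup.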
The second type: when the guidance information includes guidance information for guiding the first user to input the content recognition accuracy in the user dialogue data, the feature information of the data to be identified includes information on content recognition errors in the user dialogue data.
When the second user inputs user dialogue data, the content recognition accuracy of the intelligent dialogue system may be low because the user misremembers the data to be input, cannot express the need fluently, uses a dialect, breaks sentences unclearly, and so on. Low content recognition accuracy in the user dialogue data may at least include missing, wrong, or redundant text content, or incorrect sentence breaks, when converting voice to text or extracting text from images. By setting guidance information that guides the first user to input the content recognition accuracy in the user dialogue data, the first user is guided to mark the content recognition errors, so that the content-recognition-error information input by the first user for the data to be identified is used as feature information of the data to be identified.
For example, in a scenario where the second user chats with the intelligent dialogue device, the data to be identified displayed to the first user contains: first, user dialogue data "you eat beef dray"; then intelligent dialogue data "I do not eat beef"; then second user dialogue data "I asked whether you eat beef jerky". Clearly, the intelligent dialogue device misrecognized the user dialogue data, producing a content recognition error in the intelligent dialogue system and hence a bad case, and the bad case cause type is clearly that the content in the user dialogue data was recognized incorrectly. The guidance information can therefore guide the first user, combining the multiple rounds of data to be identified, to label "beef jerky" in the user dialogue data and to input the information that the intelligent dialogue system misrecognized the content in this round of dialogue.
Optionally, the second type of guidance information exists in the form of a question, which may be a multiple-choice or true/false question guiding the first user to judge whether the content corresponding to the data to be identified was recognized incorrectly. For example, the data to be identified is presented to the first user together with a question asking whether the content was misrecognized, "The bad case in the second round of dialogue ( ) a content recognition error", where the first user can only answer yes or no. If analysis of the data to be identified shows that the bad case in the second round is a content recognition error, the first user answers "The bad case in the second round of dialogue (is) a content recognition error".
The third type: when the guide information includes guide information for guiding the first user to input the intention recognition accuracy, the feature information of the data to be recognized includes information of an intention recognition error.
During a human-machine dialogue, the situations in which the intelligent dialogue system misrecognizes the intention of the second user's input information may include: the second user's actual input is inconsistent with the second user's real need and the intelligent dialogue system fails to correct for this; the intelligent dialogue system locates the wrong business domain for the second user's input information; or the content is misrecognized, which in turn causes the intention to be misrecognized.
By setting guidance information that guides the first user to input the intention recognition accuracy, the first user is guided to label the information of the intention recognition error, so that the intention recognition error information input by the first user for the data to be identified is used as the feature information of the data to be identified.
For example, in the data to be identified displayed to the first user, the user dialogue data of the current round is "i want to listen to song a" and the intelligent dialogue data is a detailed introduction of TV drama a. Here the second user expects the smart device equipped with the intelligent dialogue system to play song a, but because song a in the user dialogue data and drama a output by the intelligent dialogue system share the same name, the intelligent dialogue system misrecognized the second user's intention, producing an output that does not match the second user's expectation; the type of this bad case is therefore an intention recognition error. Thus, the guidance information can guide the first user to label "listen" in the user dialogue data and "drama" in the intelligent dialogue data of the current round, and to input information indicating that the intelligent dialogue system misrecognized the second user's intention in the current round of dialogue.
Optionally, the third type of guidance information exists in the form of a question, which may be a multiple-choice or yes/no judgement question, and guides the first user to judge whether the intention corresponding to the data to be identified was recognized incorrectly. For example, the data to be identified is presented to the first user together with the question "Is the bad case in the second round of dialogue an intention recognition error? ( )", and the first user is restricted to answering only "yes" or "no". When the first user determines, through analysis of the data to be identified, that the bad case in the second round of dialogue is not an intention recognition error, the first user may answer the question with "no".
The fourth type: when the guide information comprises guide information for guiding the first user to input the reason for generating the bad example, the characteristic information of the data to be identified comprises the reason for generating the bad example.
Optionally, the bad case in the data to be identified shown to the first user may have a cause type that is neither a content recognition error nor an intention recognition error. In this case, the first user can be guided to analyze the data to be identified in combination with business experience and to input the reason why the bad case corresponding to the data to be identified was generated, so that this reason, as input by the first user, is used as the feature information of the data to be identified.
For example, in the data to be identified displayed to the first user, the user dialogue data is "delete order record" and the intelligent dialogue data is "sorry, this function is temporarily not supported". Here the intelligent dialogue system correctly recognized both the content of the user dialogue data and the second user's intention, but the intelligent dialogue system conversing with the second user does not provide the function needed to fulfil that intention; it therefore cannot delete the order record, which produces a bad case. In this situation, the first user can be guided to input, as the reason for the bad case in the current dialogue, that the intelligent dialogue system correctly identified the second user's intention but does not yet provide the function required to fulfil it.
Optionally, the fourth type of guidance information exists in the form of a short-answer question that guides the first user to briefly describe the reason for the bad case. For example, the data to be identified is presented to the first user together with the question "Please briefly describe the reason why the bad case in the second round of dialogue was generated." After determining the reason through analysis of the data to be identified, the first user outlines, at the designated input entry, the reason why the bad case in the second round of dialogue was generated.
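The four forms of guidance information above, and the feature information they elicit from the first user, can be sketched as a small data model. All names here are hypothetical illustrations; the specification does not prescribe any concrete implementation:

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical representation of one round of dialogue shown to the first user.
@dataclass
class DialogueRound:
    user_text: str    # user dialogue data input by the second user
    system_text: str  # intelligent dialogue data output in response

# Feature information collected from the first user via the four guidance-question types.
@dataclass
class FeatureInfo:
    bad_case_round: Optional[int] = None    # type 1: dialogue turn where the bad case is
    content_error: Optional[bool] = None    # type 2: yes/no judgement question
    intent_error: Optional[bool] = None     # type 3: yes/no judgement question
    free_text_reason: Optional[str] = None  # type 4: short-answer question

def build_guidance_questions(rounds: List[DialogueRound]) -> List[str]:
    """Generate the guidance questions displayed alongside the data to be identified."""
    n = len(rounds)
    return [
        f"In which of the {n} dialogue rounds is the bad case located?",
        "Is the bad case a content recognition error? (yes/no)",
        "Is the bad case an intention recognition error? (yes/no)",
        "Please briefly describe the reason why the bad case was generated.",
    ]

rounds = [DialogueRound("i want to listen to song a",
                        "detailed introduction of drama a")]
questions = build_guidance_questions(rounds)
# A first user who judges this an intention recognition error might answer:
answers = FeatureInfo(bad_case_round=1, content_error=False, intent_error=True)
```

The `FeatureInfo` fields are optional because, per the description, the preset guidance information may contain any subset of the four types.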
In the embodiments of the present disclosure, the bad case discovery model shown in step S100 in fig. 1 may be trained by the following method.
Firstly, obtaining sample dialogue data in advance as a sample for training a bad case discovery model;
secondly, inputting the sample into a bad case discovery model to be trained to obtain a discovery result to be optimized output by the bad case discovery model to be trained;
and then, training the bad case discovery model by taking the minimum difference between the discovery result to be optimized and the labeled discovery result corresponding to the sample as a training target.
The labeled discovery result corresponding to a sample may be a discovery result labeled by a professional in the business field to which the sample belongs. The degree of difference between the discovery result to be optimized and the labeled discovery result can be evaluated with a loss function, and the bad case discovery model can be trained with the gradient descent method and the back-propagation algorithm, taking minimization of the loss function value as the training target. The present specification does not limit the specific loss function used.
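The three training steps above (predict, compare against the labeled discovery result via a loss function, minimize by gradient descent) can be illustrated with a minimal binary "discovery model". This is only a sketch: it assumes the sample dialogue data has already been vectorized, and it uses logistic regression with binary cross-entropy, whereas the specification leaves the model architecture and loss function open:

```python
import numpy as np

def train_discovery_model(X, y, lr=0.5, epochs=200):
    """Train a minimal binary 'bad case discovery' classifier.

    X: (n_samples, n_features) vectorized sample dialogue data (assumption)
    y: (n_samples,) labeled discovery results (1 = bad case, 0 = not)
    Minimizes binary cross-entropy between the discovery result to be
    optimized (the model's prediction) and the labeled discovery result.
    """
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # discovery result to be optimized
        grad = p - y                            # gradient of the loss w.r.t. logits
        w -= lr * X.T @ grad / len(y)           # gradient descent step
        b -= lr * grad.mean()
    return w, b

# Toy vectorized samples: feature 0 correlates with being a bad case.
X = np.array([[1.0, 0.2], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]])
y = np.array([1, 1, 0, 0])
w, b = train_discovery_model(X, y)
preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
```

In practice the discovery model would be a larger network over text features, but the predict/compare/minimize loop has the same shape.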
In this embodiment, the bad case identification model shown in step S106 in fig. 1 may be trained by the following method.
Firstly, sample dialogue data, sample to-be-identified data corresponding to the sample dialogue data and sample characteristic information of the sample to-be-identified data are obtained in advance and used as samples for training a bad case identification model.
Secondly, inputting the sample into a bad case identification model to be trained to obtain the reason type of the bad case to be optimized output by the bad case identification model to be trained.
And then, training the bad case identification model by taking the difference minimization between the type of the bad case reason to be optimized and the type of the marked bad case reason corresponding to the sample as a training target.
The labeled bad case cause type corresponding to a sample may be a bad case cause type labeled by a professional in the business field to which the sample belongs. The scheme for training the bad case identification model is similar to that for training the bad case discovery model and is not repeated here.
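Analogously, the bad case identification model maps the combined sample inputs to a bad case cause type, which is a multi-class rather than binary target. A minimal sketch, again under the assumptions that inputs are already vectorized and that softmax cross-entropy is an acceptable stand-in for the unspecified loss:

```python
import numpy as np

def train_identification_model(X, y, n_classes, lr=0.5, epochs=300):
    """Minimal multi-class 'bad case identification' model.

    X: per-sample concatenation of vectorized dialogue data, data to be
       identified, and feature information (assumption)
    y: labeled bad case cause type indices, e.g. 0 = content recognition
       error, 1 = intention recognition error, 2 = other reason
    Minimizes softmax cross-entropy between the cause type to be
    optimized (the prediction) and the labeled cause type.
    """
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(X.shape[1], n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = X @ W
        z = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
        probs = z / z.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (probs - onehot) / len(y)               # gradient descent step
    return W

# Toy data: each cause type has one dominant feature.
X = np.array([[1.0, 0.0, 0.1], [0.9, 0.1, 0.0],
              [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],
              [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]])
y = np.array([0, 0, 1, 1, 2, 2])
W = train_identification_model(X, y, n_classes=3)
pred_types = (X @ W).argmax(axis=1)
```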
Based on the same idea, corresponding to the bad case identification method provided above, one or more embodiments of the present specification further provide a corresponding bad case identification apparatus, as shown in fig. 2.
Fig. 2 is a schematic diagram of a bad case identification apparatus provided in this specification, which specifically includes:
a to-be-identified data acquisition module 200, configured to acquire to-be-identified data;
a display module 202, configured to display the data to be identified and preset guidance information to a first user; the guide information is used for guiding the first user to input information of a specified type aiming at the data to be identified;
a receiving module 204, configured to receive information input by the first user for a specified type of the data to be identified based on the guidance information, and use the information input by the first user as feature information of the data to be identified;
a bad case cause type obtaining module 206, configured to input the feature information into a pre-trained bad case identification model to obtain the bad case cause type to which the data to be identified belongs; wherein a bad case comprises dialogue data output by the intelligent dialogue system, during its dialogue with the second user, that does not match the second user's expectation.
Optionally, the to-be-identified data acquisition module 200 is specifically configured to: acquire dialogue data between the intelligent dialogue system and the second user; input the dialogue data into a pre-trained bad case discovery model and judge whether the dialogue data input into the bad case discovery model is a bad case; and if so, take the dialogue data input into the bad case discovery model as the data to be identified.
Optionally, the data to be recognized includes at least user dialogue data input by the second user and intelligent dialogue data output by the intelligent dialogue system in response to the user dialogue data.
Optionally, the preset guidance information includes at least one of: guidance information for guiding the first user to input the dialogue turn in which the bad case is located, guidance information for guiding the first user to input the content recognition accuracy in the user dialogue data, guidance information for guiding the first user to input the intention recognition accuracy, and guidance information for guiding the first user to input the reason why the bad case was generated.
Optionally, when the guidance information includes guidance information for guiding the first user to input a conversation turn in which a bad example is located, the feature information of the data to be identified includes the conversation turn in which the bad example is located; when the guide information comprises guide information for guiding a first user to input content recognition accuracy in the user dialogue data, the characteristic information of the data to be recognized comprises information of content recognition errors in the user dialogue data; when the guide information includes guide information for guiding a first user to input intention recognition accuracy, the feature information of the data to be recognized includes intention recognition error information; when the guide information comprises guide information for guiding the first user to input the reason for generating the bad example, the characteristic information of the data to be identified comprises the reason for generating the bad example.
Optionally, the bad case cause type obtaining module 206 is specifically configured to input the dialogue data, the data to be recognized, and the feature information of the data to be recognized into a pre-trained bad case recognition model, so as to obtain a bad case cause type to which the data to be recognized belongs.
Optionally, the apparatus further comprises:
the bad case identification model training module 208 is specifically configured to obtain sample dialogue data, sample to-be-identified data corresponding to the sample dialogue data, and sample feature information of the sample to-be-identified data in advance, as a sample for training a bad case identification model; inputting the sample into a bad case identification model to be trained to obtain the reason type of the bad case to be optimized output by the bad case identification model to be trained; and training the bad case identification model by taking the minimum difference between the reason type of the bad case to be optimized and the reason type of the marked bad case corresponding to the sample as a training target.
Optionally, the apparatus further comprises:
a bad case discovery model training module 210, configured to obtain sample dialogue data in advance, where the sample dialogue data is used as a sample for training a bad case discovery model; inputting the sample into a bad case discovery model to be trained to obtain a discovery result to be optimized output by the bad case discovery model to be trained; and training the bad case discovery model by using the minimization of the difference between the discovery result to be optimized and the labeled discovery result corresponding to the sample as a training target.
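The cooperation between modules 200, 202, 204, and 206 can be sketched as a simple pipeline. The class and method names below are hypothetical, and the two pre-trained models are stubbed with trivial callables purely to show the data flow:

```python
class BadCaseIdentificationDevice:
    """Sketch of the apparatus in fig. 2: modules 200/202/204/206 chained."""

    def __init__(self, discovery_model, identification_model):
        self.discovery_model = discovery_model          # pre-trained (stubbed here)
        self.identification_model = identification_model

    def acquire_data_to_identify(self, dialogue_data):   # module 200
        # Keep only dialogue data the discovery model flags as a bad case.
        return [d for d in dialogue_data if self.discovery_model(d)]

    def display(self, data, guidance):                   # module 202
        # Bundle the data to be identified with the preset guidance information.
        return {"data": data, "guidance": guidance}

    def receive(self, first_user_input):                 # module 204
        # The first user's answers become the feature information.
        return first_user_input

    def identify_cause_type(self, feature_info):         # module 206
        return self.identification_model(feature_info)

# Stub models standing in for the pre-trained networks.
device = BadCaseIdentificationDevice(
    discovery_model=lambda d: "error" in d,
    identification_model=lambda f: "content recognition error"
    if f.get("content_error") else "other reason",
)
to_identify = device.acquire_data_to_identify(
    ["round 1 ok", "round 2 content error"])
shown = device.display(to_identify, ["Is the bad case a content "
                                     "recognition error? (yes/no)"])
feature_info = device.receive({"content_error": True})
cause = device.identify_cause_type(feature_info)
```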
The present specification also provides a computer-readable storage medium storing a computer program, which can be used to execute the bad case identification method provided in fig. 1.
This specification also provides a schematic block diagram of the electronic device shown in fig. 3. As shown in fig. 3, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, but may also include hardware required for other services. The processor reads a corresponding computer program from the non-volatile memory into the memory and then runs the computer program to implement the bad case identification method described in fig. 1. Of course, besides the software implementation, the present specification does not exclude other implementations, such as logic devices or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may be hardware or logic devices.
In the 1990s, an improvement in a technology could be clearly distinguished as either an improvement in hardware (for example, an improvement in a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement in a method flow). However, as technology has advanced, many of today's improvements in method flows can be regarded as direct improvements in hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement in a method flow cannot be realized with hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is now mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development, and the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); at present, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can easily be obtained merely by briefly programming the method flow into an integrated circuit using one of the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller; examples of such microcontrollers include, but are not limited to, the ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be realized by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing the various functions may also be regarded as structures within the hardware component. Indeed, the means for performing the functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the various elements may be implemented in the same one or more software and/or hardware implementations of the present description.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present invention.

Claims (11)

1. A bad case identification method is characterized by comprising the following steps:
acquiring data to be identified;
displaying the data to be identified and preset guide information to a first user; the guide information is used for guiding the first user to input information of a specified type aiming at the data to be identified;
receiving information of a specified type input by the first user for the data to be identified based on the guide information, and taking the information input by the first user as feature information of the data to be identified;
inputting the characteristic information into a pre-trained bad case identification model to obtain the bad case cause type to which the data to be identified belongs; wherein a bad case comprises dialogue data output by the intelligent dialogue system, during a dialogue with a second user, that does not match the second user's expectation.
2. The method of claim 1, wherein obtaining data to be identified specifically comprises:
acquiring dialogue data between the intelligent dialogue system and a second user;
inputting the dialogue data into a pre-trained bad case discovery model, and judging whether the dialogue data input into the bad case discovery model is a bad case;
and if so, taking the dialogue data input into the bad case discovery model as the data to be identified.
3. The method of claim 2, wherein the data to be identified includes at least user dialog data entered by the second user and smart dialog data output by the smart dialog system in response to the user dialog data.
4. The method of claim 3, wherein the preset guidance information comprises at least one of: guidance information for guiding the first user to input the dialogue turn in which the bad case is located, guidance information for guiding the first user to input the content recognition accuracy in the user dialogue data, guidance information for guiding the first user to input the intention recognition accuracy, and guidance information for guiding the first user to input the reason why the bad case was generated.
5. The method of claim 4, wherein when the guidance information includes guidance information for guiding the first user to input a conversation turn in which a bad case is located, the feature information of the data to be identified includes the conversation turn in which the bad case is located;
when the guide information comprises guide information for guiding a first user to input content recognition accuracy in the user dialogue data, the characteristic information of the data to be recognized comprises information of content recognition errors in the user dialogue data;
when the guide information includes guide information for guiding a first user to input intention recognition accuracy, the feature information of the data to be recognized includes intention recognition error information;
when the guide information comprises guide information for guiding the first user to input the reason for generating the bad example, the characteristic information of the data to be identified comprises the reason for generating the bad example.
6. The method of claim 2, wherein the step of inputting the feature information into a pre-trained bad case identification model to obtain the bad case cause type to which the data to be identified belongs specifically comprises:
and inputting the dialogue data, the data to be recognized and the characteristic information of the data to be recognized into a pre-trained bad case recognition model to obtain the bad case reason type of the data to be recognized.
7. The method of claim 6, wherein pre-training the bad case identification model specifically comprises:
obtaining sample dialogue data, sample to-be-identified data corresponding to the sample dialogue data and sample characteristic information of the sample to-be-identified data in advance, and using the sample dialogue data, the sample to-be-identified data and the sample characteristic information as samples for training a bad case identification model;
inputting the sample into a bad case identification model to be trained to obtain the reason type of the bad case to be optimized output by the bad case identification model to be trained;
and training the bad case identification model by taking the minimum difference between the reason type of the bad case to be optimized and the reason type of the marked bad case corresponding to the sample as a training target.
8. The method of claim 2, wherein pre-training the bad case discovery model specifically comprises:
obtaining sample dialogue data in advance as a sample for training a bad case discovery model;
inputting the sample into a bad case discovery model to be trained to obtain a discovery result to be optimized output by the bad case discovery model to be trained;
and training the bad case discovery model by using the minimization of the difference between the discovery result to be optimized and the labeled discovery result corresponding to the sample as a training target.
9. A bad case identification apparatus, comprising:
the data to be identified acquisition module is used for acquiring data to be identified;
the display module is used for displaying the data to be identified and preset guide information to a first user; the guide information is used for guiding the first user to input information of a specified type aiming at the data to be identified;
a receiving module, configured to receive information of a specified type input by the first user for the data to be identified based on the guidance information, and use the information input by the first user as feature information of the data to be identified;
a bad case cause type obtaining module, configured to input the feature information into a pre-trained bad case identification model to obtain the bad case cause type to which the data to be identified belongs; wherein a bad case comprises dialogue data output by the intelligent dialogue system, during a dialogue with a second user, that does not match the second user's expectation.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1 to 8.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 8 when executing the program.
CN202210132588.0A 2022-02-14 2022-02-14 Bad case identification method, device, equipment and storage medium Pending CN114428746A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210132588.0A CN114428746A (en) 2022-02-14 2022-02-14 Bad case identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210132588.0A CN114428746A (en) 2022-02-14 2022-02-14 Bad case identification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114428746A (en) 2022-05-03

Family

ID=81314196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210132588.0A Pending CN114428746A (en) 2022-02-14 2022-02-14 Bad case identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114428746A (en)

Similar Documents

Publication Publication Date Title
US11062090B2 (en) Method and apparatus for mining general text content, server, and storage medium
CN108985358B (en) Emotion recognition method, device, equipment and storage medium
US20190279622A1 (en) Method for speech recognition dictation and correction, and system
CN107622054B (en) Text data error correction method and device
CN108922564B (en) Emotion recognition method and device, computer equipment and storage medium
CN112951240B (en) Model training method, model training device, voice recognition method, voice recognition device, electronic equipment and storage medium
CN113221555B (en) Keyword recognition method, device and equipment based on multitasking model
CN109616101B (en) Acoustic model training method and device, computer equipment and readable storage medium
CN111027291A (en) Method and device for adding punctuation marks in text and training model and electronic equipment
CN112417093B (en) Model training method and device
CN111309876A (en) Service request processing method and device, electronic equipment and storage medium
CN116127305A (en) Model training method and device, storage medium and electronic equipment
CN112908315A (en) Question-answer intention judgment method based on voice characteristics and voice recognition
CN117216271A (en) Article text processing method, device and equipment
CN110047473B (en) Man-machine cooperative interaction method and system
CN116882418A (en) Method, apparatus, computing device and medium for generating contextual tasks for dialogue data
CN114428746A (en) Bad case identification method, device, equipment and storage medium
US20190279623A1 (en) Method for speech recognition dictation and correction by spelling input, system and storage medium
CN114297229A (en) Data query method and device, electronic equipment and storage medium
CN112951274A (en) Voice similarity determination method and device, and program product
CN112163078A (en) Intelligent response method, device, server and storage medium
CN111414468A (en) Method and device for selecting dialect and electronic equipment
CN115658891B (en) Method and device for identifying intention, storage medium and electronic equipment
CN113705248B (en) Method and device for processing tactical training data based on result evaluation
CN116805495B (en) Pronunciation deviation detection and action feedback method and system based on large language model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination