CN114168706A - Intelligent dialogue ability test method, medium and test equipment - Google Patents


Info

Publication number
CN114168706A
CN114168706A
Authority
CN
China
Prior art keywords
intelligent
test
dialogue
path
dialog
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010948532.3A
Other languages
Chinese (zh)
Inventor
杨佳霖
周立君
蒋孝霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202010948532.3A priority Critical patent/CN114168706A/en
Publication of CN114168706A publication Critical patent/CN114168706A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3343Query execution using phonetics

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to an intelligent dialogue capability test method, medium, and test device, comprising: the test device plays a test statement to the electronic device and receives a feedback operation generated by the electronic device on the test statement using a voice assistant application program; the test device records a dialogue corpus, wherein the dialogue corpus comprises the test statement and a feedback statement corresponding to the feedback operation; the test device determines an intention dialogue path from the dialogue corpus, wherein the intention dialogue path is composed of one or more intentions generated based on the test statement and the feedback operation; the test device determines a level of intelligent dialogue capability of the voice assistant application by comparing the intention dialogue path with a preset intention dialogue path associated with the test statement. In this way, the intelligent dialogue capability of the voice assistant application is measured automatically, saving labor cost.

Description

Intelligent dialogue ability test method, medium and test equipment
Technical Field
The present application relates to the field of information processing technologies, and in particular, to an intelligent dialogue capability test method, medium, and test device.
Background
The voice assistant is an application program which is equipped in most current intelligent terminal devices (such as a smart phone, a smart sound box and the like), and can solve problems for users or present functions of some intelligent devices in a smart conversation and instant question and answer mode.
Voice assistant applications are increasingly diversified. For example, by combining Artificial Intelligence (AI), Automatic Speech Recognition (ASR), and Natural Language Understanding (NLU) over a wireless connection, a voice assistant can not only play multimedia resources but also control Internet-of-Things devices to realize smart home control, in-vehicle voice control, and teleconference systems. However, this very diversification makes the testing and verification of voice assistant products more difficult: the intelligent dialogue capability of the voice assistant, such as speech recognition and scene coverage, also needs to be evaluated to ensure the dialogue quality of the product.
At present, in the intelligent dialogue capability test of the voice assistant, no test technology for automatically evaluating the intelligent dialogue capability of the intelligent voice assistant exists.
Disclosure of Invention
The embodiment of the application provides an electronic device and an intelligent dialogue capability test method, medium and test device of a voice assistant application program thereof, so that the intelligent dialogue capability of an intelligent voice assistant can be automatically evaluated.
A first aspect of the present application provides a method for testing intelligent dialogue capability of a voice assistant application, comprising:
the test equipment sends a test statement to the electronic equipment and receives a feedback operation generated by the electronic equipment on the test statement by utilizing a voice assistant application program;
the test equipment records dialogue linguistic data, and the dialogue linguistic data comprises test sentences and feedback sentences corresponding to the feedback operation;
the testing device determines an intention dialogue path according to the dialogue corpus, wherein the intention dialogue path is composed of one or more intents generated based on a testing statement and the feedback operation;
the testing device determines a level of intelligent dialog capability of the voice assistant application by comparing the intended dialog path with a preset intended dialog path associated with a test statement.
In the embodiment of the application, on one hand, the function of automatically measuring the intelligent dialogue capability of the voice assistant application is realized, reducing or even eliminating human intervention in the testing of the intelligent dialogue capability of the voice assistant application and saving labor cost. On the other hand, the intentions in the dialogue corpus are compared with preset intentions. Because an intention is an abstraction of the content of each sentence in the dialogue corpus, the comparison does not require a large amount of time; and because the same meaning can be expressed in many different ways, comparing intentions rather than literal sentences still matches the preset intention dialogue path even when the wording differs, so that the test is both accurate and fast.
In one possible implementation of the first aspect, the comparing, by the testing device, the intention dialog path with a preset intention dialog path associated with a test statement includes:
the test equipment judges whether the intention dialogue path contains a preset intention dialogue path related to the test statement.
In the embodiment of the application, the preset intention dialogue path associated with the test statement is composed of an intention based on the test statement and an intention of setting feedback for the test statement. If the intention conversation path extracted from the conversation corpus comprises the preset intention conversation path associated with the test statement, the voice assistant application program has the conversation capacity corresponding to the preset intention conversation path associated with the test statement, so that the test equipment can automatically judge whether the voice assistant application program has a certain capacity in the intelligent conversation capacity by judging whether the intention conversation path comprises the preset intention conversation path associated with the test statement, and the purpose of automatically and quickly measuring the intelligent conversation capacity of the voice assistant application program is achieved.
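The containment judgment described above can be sketched as a contiguous-subsequence match over lists of intention labels. The function name and the list representation here are illustrative assumptions, not the patent's actual implementation:

```python
def contains_preset_path(intent_path, preset_path):
    """Return True if preset_path occurs as a contiguous run inside intent_path."""
    n, m = len(intent_path), len(preset_path)
    if m == 0 or m > n:
        return m == 0  # an empty preset path is trivially contained
    return any(intent_path[i:i + m] == preset_path for i in range(n - m + 1))
```

For instance, a preset path such as ["ask_weather", "inform_weather"] would be found inside a longer recorded intention dialogue path, indicating that the voice assistant exhibited the corresponding dialogue capability.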
In a possible implementation of the first aspect, the method further includes: the intention of each test statement and feedback statement corresponding to the feedback operation in the intention dialogue path corresponds to a node, and the method comprises the following steps:
and traversing the content of the nodes in the intention dialogue path, and matching it against the content of the nodes in the preset intention dialogue path list, to judge whether the preset intention dialogue path associated with the test statement is contained.
In the embodiment of the application, each test statement corresponds to one intention, and the feedback statement corresponding to the feedback operation corresponds to one intention, so that the purpose of automatically and quickly measuring the intelligent conversation capacity of the voice assistant application program can be achieved by judging the intention content of each node.
In a possible implementation of the first aspect, the method further includes: the contents of the nodes in the traversal intent dialog path include:
traversing the content of the nodes in the intention dialogue path through a sliding window, wherein the sliding window comprises N nodes, the sliding step length is M, N is a natural number larger than 2, and M is a natural number larger than or equal to 1.
In the embodiment of the application, the preset intention dialogue path associated with the test statement is composed of an intention based on the test statement and an intention of setting feedback for the test statement. Whether the voice assistant application program has intelligent conversation capacity or not is tested, and the intelligent conversation capacity can be tested through the intention of one test statement and the intention of one set feedback, and also can be tested through the intentions of two test statements and the intentions of two set feedbacks corresponding to each test statement, so that the number of nodes of the sliding window can be adjusted according to actual conditions, and the traversal sliding times are reduced. Accurately and quickly determining whether the intended dialog path in the dialog corpus contains a preset intended dialog path.
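The sliding-window traversal can be sketched as follows, with a window of N nodes and a step of M nodes per the constraints above. This is an illustrative sketch rather than the patent's code:

```python
def sliding_windows(nodes, n, m):
    """Yield successive windows of n nodes, advancing m nodes per step (n > 2, m >= 1)."""
    for i in range(0, len(nodes) - n + 1, m):
        yield nodes[i:i + n]

def window_match(nodes, preset, n, m):
    """True if any window contains the preset path as a contiguous run of intentions."""
    k = len(preset)
    return any(
        any(window[j:j + k] == preset for j in range(n - k + 1))
        for window in sliding_windows(nodes, n, m)
    )
```

Enlarging the window (larger N) or the step (larger M) reduces the number of traversal slides, which is the adjustment the embodiment describes.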
In a possible implementation of the first aspect, the method further includes: the determining the level of intelligent dialog capability of the voice assistant application comprises:
under the condition that the preset intention conversation path associated with the test statement is included in the intention conversation path, the electronic equipment determines that a voice assistant application program on the electronic equipment has intelligent conversation capacity corresponding to the preset intention conversation path;
in the case that the preset intention conversation path associated with the test statement is not included in the intention conversation path, the electronic device determines that the voice assistant application program on the electronic device does not have the intelligent conversation capability corresponding to the preset intention conversation path.
In a possible implementation of the first aspect, the method further includes: the determining the level of intelligent dialog capability of the voice assistant application comprises:
scoring at least one of the intelligent dialog capabilities of the voice assistant application and/or a total score for each of the intelligent dialog capabilities.
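One plausible way to aggregate such scores is shown below; the 0-100 scale and the equal weighting across capabilities are assumptions for illustration, not specified by the application:

```python
def score_capabilities(results):
    """results maps a capability name to the pass/fail outcomes of its test cases.

    Returns a per-capability score (percentage of passed cases) and the
    total score averaged over all capabilities.
    """
    scores = {cap: 100.0 * sum(cases) / len(cases) for cap, cases in results.items()}
    total = sum(scores.values()) / len(scores)
    return scores, total
```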
In a possible implementation of the first aspect, the method further includes: the feedback operation comprises feedback voice of the test statement and/or feedback information which is presented on a display screen of the electronic equipment and is associated with the feedback voice.
In a possible implementation of the first aspect, the method further includes: the preset intention dialog path associated with the test sentence is composed of an intention based on the test sentence and an intention of setting feedback for the test sentence.
In a possible implementation of the first aspect, the method further includes: the intelligent dialogue capability comprises at least one of: semantic understanding capability, behavioral intelligent dialogue capability, and fallback expression capability.
A second aspect of the present application provides a computer-readable medium having stored thereon instructions that, when executed on a computer, cause the computer to perform the method for intelligent dialogue capability testing of a voice assistant application as provided in the first aspect above.
A third aspect of the present application provides a test apparatus, characterized by comprising:
a memory for storing instructions for execution by one or more processors of the system, an
The processor, which is one of the processors of the testing device, is used for executing the intelligent dialogue capability testing method of the voice assistant application program provided by the first aspect.
A fourth aspect of the present application provides a test system comprising an electronic device and a test device;
the electronic equipment is used for running a voice assistant application program;
the test equipment is used for sending a test statement to the electronic equipment and receiving a feedback operation generated by the electronic equipment on the test statement by using a voice assistant application program;
the testing equipment is used for recording dialogue corpora, and the dialogue corpora comprises testing sentences and feedback sentences corresponding to the feedback operation;
determining an intended dialog path from the dialog corpus, the intended dialog path consisting of one or more intents generated based on a test statement and the feedback operation;
for determining a level of intelligent dialog capability of the voice assistant application by comparing the intended dialog path to a preset intended dialog path associated with a test statement.
Drawings
FIG. 1A illustrates an application scenario diagram of a method for intelligent dialog capability testing for a voice assistant application, according to some embodiments of the present application.
FIG. 1B illustrates an application scenario diagram of a method for intelligent dialog capability testing for a voice assistant application, according to some embodiments of the present application.
FIG. 1C illustrates an application scenario diagram of a method for intelligent dialog capability testing for a voice assistant application, according to some embodiments of the present application.
FIG. 1D illustrates an application scenario diagram of a method for intelligent dialog capability testing for a voice assistant application, according to some embodiments of the present application.
Fig. 2A illustrates a prior art application scenario diagram of dialogue capability testing, according to some embodiments of the present application.
FIG. 2B illustrates a prior art application scenario diagram of dialogue capability testing, according to some embodiments of the present application.
FIG. 3 illustrates a block diagram of an intelligent dialog capability testing system for a voice assistant application, according to some embodiments of the present application.
FIG. 4 illustrates a flow diagram for a voice assistant application's intelligent dialog capability, according to some embodiments of the present application.
FIG. 5 illustrates a structural schematic diagram of a neural network-based dialogue model, according to some embodiments of the present application.
Fig. 6A illustrates a structural schematic of a conversation path, according to some embodiments of the present application.
Fig. 6B illustrates a structural schematic of another conversation path, according to some embodiments of the present application.
FIG. 7 illustrates a prompt, according to some embodiments of the present application.
FIG. 8 illustrates a flow diagram for a voice assistant application's intelligent dialog capability, according to some embodiments of the present application.
Fig. 9A illustrates a training process diagram of an intelligent dialogue capability evaluation model, according to some embodiments of the present application.
Fig. 9B illustrates an application process diagram of an intelligent dialogue capability evaluation model, according to some embodiments of the present application.
FIG. 10 illustrates a block diagram of a handset 100 capable of implementing the intelligent dialogue capability test functionality of a voice assistant application, according to some embodiments of the present application.
Fig. 11 illustrates a block diagram of the software architecture of the handset 100, according to some embodiments of the present application.
Detailed Description
Illustrative embodiments of the present application include, but are not limited to, intelligent dialog capability testing methods, media and test equipment for electronic devices and their voice assistant applications.
According to the embodiment of the application, the test device 200 carries on a voice dialogue with the electronic device 100 loaded with the voice assistant application, so that the intelligent dialogue capability of the voice assistant application of the electronic device 100 is tested automatically. The scheme of the application reduces or even eliminates human intervention in the testing of the intelligent dialogue capability of the voice assistant application, saving labor cost.
Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Figs. 1A, 1B, 1C, and 1D illustrate application scenario diagrams of a method for intelligent dialog capability testing for a voice assistant application, according to some embodiments of the present application. As shown in fig. 1A, the scenario includes an electronic device 100 loaded with a voice assistant application; when the electronic device 100 starts the voice assistant application, the testing device 200 may conduct an intelligent dialog with the electronic device 100.
As shown in fig. 1B, the scenario includes the electronic device 100 and the test device 200. During the test session, the test device 200 converses with the electronic device 100 with a specific purpose. For example, the test device 200 utters keywords or expressions that are easily misunderstood as other intentions, to evaluate whether the voice assistant on the electronic device 100 correctly understands the intended keywords or expressions. If the understanding result is correct, the voice assistant application may be considered to have the capability of correctly understanding the intended keywords or expressions. For another example, the test device 200 issues intention-jump voices to judge whether the voice assistant on the electronic device 100 accurately recognizes them; if the recognition result is correct, the voice assistant application is determined to have the capability of accurately recognizing intention-jump voices. For another example, the test device 200 issues a sentence with multiple pronouns to determine whether the voice assistant on the electronic device 100 can accurately determine what the pronouns refer to. The embodiment of the application can also be combined with multi-scene speech recognition testing technology: the dialogue corpus generated by the dialogue model is played in different scenes, and the speech recognition capability and the context processing capability of the voice assistant under test are comprehensively evaluated.
In some embodiments, the test device 200 is provided with the ability to train or optimize a training dialogue model. As shown in fig. 1C, a dialog conducted by the testing device 200 with the electronic device 100 may generate a new dialog corpus, that is, a dialog that has not occurred before, and the testing device 200 may use the new dialog corpus as a training set to train the dialog model, that is, more sample materials in each dialog scenario are used to optimize the dialog model and improve the intelligent dialog capability of the dialog model.
It will be appreciated that, unlike the embodiments described above, a separate training device may be used to train or optimize the dialogue model. As shown in fig. 1D, the model training device 300 and the testing device 200 may be connected via a wired or wireless link. When a dialog between the testing device 200 and the electronic device 100 generates a new dialog corpus, that is, a dialog that has not occurred before, the testing device 200 transmits the new dialog corpus to the model training device 300, where it serves as a training set to train and optimize the dialog models in the model training device 300. The model training device 300 may be a server, a computer, or the like, and may be composed of a processor and a memory. The processor may be a Graphics Processing Unit (GPU), a Central Processing Unit (CPU), or a Tensor Processing Unit (TPU).
It is understood that, although figs. 1A, 1B, 1C and 1D show the mobile phone 100, the electronic device 100 suitable for the present application may be any device with intelligent dialogue capability, such as a mobile phone, a computer, a smart speaker, a smart electric curtain, a smart security device, a smart lighting device, a smart door lock, a smart range hood, a smart rice cooker, a smart washing machine, a smart refrigerator, a smart printer, a smart projector, etc., but is not limited thereto. Although figs. 1B, 1C, and 1D illustrate the smart test robot 200, the test device 200 suitable for the present application may be any device capable of intelligent dialogue capability testing, such as a mobile phone or a computer, but is not limited thereto.
Figs. 2A and 2B illustrate application scenario diagrams of prior art dialogue capability tests. As shown in fig. 2A, the testing device 400 is loaded with a voice assistant and a voice response testing program, plays pre-recorded test voices through the audio device 410, and adds other test variables such as background noise, angle, distance, and wireless interference during the test to perform a stress test on the voice assistant and judge the response accuracy of the intelligent voice assistant. This technology mainly tests the speech recognition capability of the intelligent voice assistant under different conditions; unlike the present application, it does not comprehensively evaluate the degree of intelligence (such as knowledge richness and context processing capability) after speech recognition succeeds.
As shown in fig. 2B, in the prior art, a corresponding test question bank is constructed from three aspects: basic capability, primary knowledge service capability, and advanced knowledge service capability, according to the intelligence level of the intelligent voice assistant at the present stage, and is used for evaluating the knowledge service capability of the intelligent voice assistant. However, this prior art has the following disadvantages. A large number of manual steps are required; for example, the question bank and the corresponding test corpora/answers need to be constructed manually, so the labor cost is high and the workload is large. When testing the learning capability and personalized service capability among the advanced knowledge service capabilities, whether the test target completes a task needs to be judged manually, case by case. It mainly tests the degree of intelligence of the voice assistant in a single-turn question-answering scenario and, unlike the present application, does not realize automatic assessment of the voice assistant's context processing capability (such as intention continuation/jump and reference/context reasoning) in multi-turn dialogue scenarios.
In the following description, the electronic device 100 is taken to be the mobile phone 100 and the test device 200 to be the smart test robot 200 by way of example.
Fig. 3 shows a schematic structural diagram of an intelligent dialogue capability test system of a voice assistant application according to some embodiments of the present application, and as shown in fig. 3, the system includes a mobile phone 100 and an intelligent test robot 200.
The handset 100 includes a speaker 101, a microphone 102, and a processor 103. The smart test robot 200 includes a speaker 201, a microphone 202, and a processor 203. In the embodiment of the present application, after the mobile phone 100 starts the voice assistant application, it is able to recognize the external voice. When the intelligent test robot 200 emits some keywords or expression voice which can be easily understood as other intentions through the speaker 201. Accordingly, the mobile phone 100 receives the voice emitted from the smart test robot 200 through the microphone 102 and transmits the voice to the processor 103. The processor 103 processes the speech to derive a feedback statement. The mobile phone 100 converts the feedback sentence into voice through the speaker 101 and sends the voice to the intelligent test robot 200. The smart test robot 200 receives the voice uttered by the cellular phone 100 through the microphone 202 and transmits the voice to the processor 203. The processor 203 processes the voice to obtain a voice feedback statement, and the intelligent test robot 200 converts the feedback statement into voice through the speaker 201 and sends the voice to the mobile phone 100. In the process of one-to-one conversation between the intelligent test robot 200 and the mobile phone 100, the intelligent test robot 200 measures the intelligent conversation capability of the voice assistant on the mobile phone 100. For example, the intelligent test robot 200 determines whether the voice assistant on the mobile phone 100 correctly understands the intended keyword or expression, and if the voice assistant does not correctly understand the intended keyword or expression, the voice assistant application is considered to have no ability to correctly understand the intended keyword or expression.
In order to intelligently and accurately recognize the intelligent conversation capability of the voice assistant application, such as the capability of correctly understanding the keyword or expression of the intention, the capability of accurately recognizing the intention skipping voice, and the like, the following describes the testing process of the intelligent conversation capability of the voice assistant application.
The first embodiment is as follows:
FIG. 4 illustrates a flow diagram for intelligent dialogue capability testing of a voice assistant application according to some embodiments of the present application. As shown in FIG. 4, the method for testing the intelligent conversation capability of the voice assistant application program according to the application comprises the following steps:
when testing the intelligent conversation capability of the voice assistant application program, the voice assistant application program needs to be run, that is, the voice assistant application program installed in the mobile phone 100 is triggered to be started, and the mobile phone 100 runs the voice assistant application program, can receive the language transmitted from the outside, and can respond to the language received from the outside.
The method of triggering the voice assistant application may be "voice assistant" as shown in FIG. 1A by clicking an icon control, or may be awakened by a specific voice.
S401, the intelligent test robot 200 performs a test dialogue with a voice assistant application program running on the mobile phone 100, and the intelligent test robot 200 records dialogue corpora.
In some embodiments, the test dialog is dialog content generated during a process of asking for a question and answering by the smart test robot 200 and the mobile phone 100, and the dialog corpus includes test statements and feedback statements corresponding to feedback operations. The dialog corpus may be obtained in the following exemplary manner.
The intelligent test robot 200 sends a test statement to the voice assistant application program on the mobile phone 100, and the mobile phone 100 generates a feedback operation on the test statement by using the voice assistant application program;
the intelligent test robot 200 records dialogue corpora, wherein the dialogue corpora comprise test sentences and feedback sentences corresponding to feedback operations; it is to be understood that the feedback operation may include feedback speech on the test statement; and/or feedback information associated with the feedback speech and/or text, e.g., feedback picture information or text information, etc., presented on the display of the handset 100.
In some embodiments, a trained dialogue model implements the dialogue function between the smart test robot 200 and the voice assistant on the mobile phone 100; that is, the trained dialogue model is loaded on the smart test robot 200, which exchanges test statements with the voice assistant application on the mobile phone 100 to form dialogue corpora. For example, the smart test robot 200 performs speech recognition on a section of speech uttered by the mobile phone 100 to obtain a text, and inputs the text into the trained dialogue model, which performs intention recognition on the text and outputs a feedback sentence; the smart test robot 200 then converts the feedback sentence into speech and utters it to the mobile phone 100.
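The test loop just described can be sketched as below. The `asr`, `tts`, and `dialog_model` callables are hypothetical stand-ins for whatever speech recognition, speech synthesis, and dialogue model components the test robot actually uses; nothing here is the patent's concrete interface:

```python
def run_test_dialog(test_statements, dialog_model, asr, tts):
    """Drive one test session and record the dialogue corpus.

    tts(text) plays a sentence to the phone; asr() recognizes the phone's
    spoken reply; dialog_model(text) may produce a follow-up turn (or None).
    """
    corpus = []
    for statement in test_statements:
        tts(statement)            # play the test statement to the phone
        feedback = asr()          # recognize the phone's spoken feedback
        corpus.append((statement, feedback))
        follow_up = dialog_model(feedback)
        if follow_up:
            tts(follow_up)        # continue the multi-turn dialogue
    return corpus
```

The recorded corpus of (test statement, feedback statement) pairs is then the input from which the intention dialogue path is determined.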
In other embodiments, the trained dialogue model is used to generate context-coherent dialogue corpora according to any one or more of an input dialogue objective, such as an intention, slot information, an entity, or a dialogue path in a specific scene. An intention is the purpose of an utterance, i.e., a purpose classification; for example, it determines whether the purpose is to ask about the weather, travel, or a movie, and different application scenes correspond to different intention classifications. An entity is a vocabulary item representing a class, e.g., city or country; different words of the same entity class can be substituted for one another by constructing an entity vocabulary and a synonym table. Slots can be understood as well-defined attributes of an entity; for example, a taxi-hailing scene has a departure-point slot, a destination slot, and a departure-time slot, whose attributes are "departure point", "destination", and "departure time", respectively. Slot filling refers to the process of completing this information in order to translate the user intent into a specific user instruction, i.e., extracting the values of the well-defined attributes of a given entity from a large-scale corpus.
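A minimal sketch of slot filling by vocabulary lookup follows; the vocabulary contents and function name are illustrative assumptions based on the entities used in this description:

```python
ENTITY_VOCAB = {                     # illustrative entity vocabulary / synonym table
    "city": ["Harbin", "Xi'an", "Beijing"],
    "date": ["today", "Friday"],
}

def extract_slots(sentence, vocab=ENTITY_VOCAB):
    """Fill slots by matching known entity words against the sentence."""
    slots = {}
    for slot, words in vocab.items():
        for word in words:
            if word in sentence:
                slots[slot] = word   # take the first matching vocabulary word
                break
    return slots
```

A real system would use NLU-based entity recognition rather than substring lookup; the sketch only shows how an entity vocabulary maps sentence fragments into slot values.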
For example, inputting into the trained dialog model the intention of consulting the weather and the entities Friday, Harbin, and Xi'an generates the following exemplary dialog corpus:
##1593763495883
-utter_wakeup
wakeup_response: I am there.
-utter_inquire_weather
inform_weather: The weather on [Friday]{date} in [Harbin]{city} is good.
-utter_further_inquire_city_or_date
inform_weather: The weather in [Xi'an]{city} [today]{date} is good.
-utter_hesitate_inquire_weather
inform_weather: The weather [today]{date} in [Harbin]{city} is good.
Through intention recognition, the dialog corpus may be used to derive a preset intention dialog path. The preset intention dialog path is associated with the test sentence and is composed of the intention of the test sentence and the intention of the feedback set for the test sentence.
In the scheme of the present application, dialog corpora in a specific scene/task are automatically generated to replace manually written test corpora/question banks. The automatically generated dialog corpora have sufficient generalization capability and can behave reasonably in dialog situations that do not appear in the corpus.
The dialog model can be trained in the following exemplary manner. The initial training set comprises intents, slot information, entities, and dialog paths. The initial training set can be obtained from a corpus database or compiled manually, and is input into the dialog model to be trained to obtain a trained dialog model. Some exemplary initial training set entries containing intents, slot information, and entities are as follows:
example 1
##intent:restaurant_search
- show me [Chinese]{cuisine} restaurants
In this example, ## is followed by the intent, which is a restaurant query, and - is followed by the corpus sentence "show me Chinese restaurants", where [Chinese] is the entity value and {cuisine} is its entity class; the value may be replaced by other words of the same class in the vocabulary.
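A minimal sketch of how the "[value]{entity}" annotation style used in these training examples could be parsed (the function and regular expression are illustrative assumptions, not part of the patent's implementation):

```python
import re

# Matches annotations of the form "[value]{entity}" in a training line.
ANNOTATION = re.compile(r"\[(?P<value>[^\]]+)\]\{(?P<entity>[^}]+)\}")

def parse_training_line(line):
    """Return the plain sentence and the list of (value, entity) annotations."""
    entities = [(m.group("value"), m.group("entity"))
                for m in ANNOTATION.finditer(line)]
    # Strip the annotation markup, keeping only the entity value in the text.
    plain = ANNOTATION.sub(lambda m: m.group("value"), line)
    return plain, entities
```

For the line above, such a parser would recover the plain sentence "show me Chinese restaurants" together with the annotation ("Chinese", "cuisine").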
Example 2
##intent:greet
-hey
-hello
-hi
-good morning
-good evening
-hey there
In this example, ## is followed by the intent, which is a greeting, and the lines beginning with - are the corpus, a set of greeting phrases.
Some exemplary initial training sets for the dialog path are as follows:
example 1
##simple path
*weather_address_date_time{"weather_address": "Shanghai", "date_time": "tomorrow"}
-utter_working_on_it
-action_report_weather
-utter_report_weather
The dialog scenario of this example is simply asking the weather in Shanghai tomorrow.
Example 2
##simple path with greet
*greet
-utter_greet
*weather_address_date_time{"weather_address": "Shanghai", "date_time": "tomorrow"}
-utter_working_on_it
-action_report_weather
-utter_report_weather
The dialog scenario of this example is politely greeting before asking the weather in Shanghai tomorrow.
Example 3
##address+date_time path with greet
*greet
-utter_greet
"weather _ address": "Shanghai" }
-utter_ask_date_time
*weather_date_time{"weather_date_time": "tomorrow"}
-utter_working_on_it
-action_report_weather
-utter_report_weather
The dialog scenario of this example is to politely greet, and then, given the intention of consulting the weather in Shanghai, ask which date's weather is to be consulted.
The trained dialog model may be a neural network-based dialog model. The structure of the neural network-based dialogue model is exemplified below. Fig. 5 shows a schematic structural diagram of a neural network-based dialogue model according to some embodiments of the present application, and as shown in fig. 5, the neural network-based dialogue model mainly includes three modules:
the Natural Language Understanding (NLU) module 501 mainly functions to process sentences input by a user, or the results of speech recognition, and to extract the user's dialog intention and the information the user conveys. For example, if the user asks "I want to eat mutton soup buns", the NLU module can recognize that the user's intention is "find restaurant" and the key entity is "mutton soup bun". In the embodiment of the present application, the exemplary initial training set entries containing intents, slot information, and entities above may be used as the initial training set of this module.
A Dialog Management (DM) module 502, which is mainly used to control the process of human-machine dialog according to the results of the NLU module, such as the intent, slot, and key entity, to update the state of the system and generate corresponding system actions. The DM module includes two sub-modules: a Dialog State Tracking (DST) module 5021 and a Dialog Policy (DP) module 5022, wherein the DST module maintains the current dialog State according to the dialog history, and the dialog State is semantic representations such as accumulated intention, slot position and key entity of the whole dialog history; and the DP module outputs the next dialog action according to the current dialog state. For example, based on a text derived from a speech spoken by the cell phone 100, the smart test robot 200 may generate a response text to the text. In the embodiment of the present application, some exemplary initial training sets of the above-mentioned dialog paths may be used as the initial training set of the module.
Natural Language Generation (NLG) module 503: converts the dialog action output by the DM module into text, expressing the dialog action in textual form.
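The three-module structure above can be sketched in code as follows. This is a toy skeleton: the rule tables stand in for the trained neural components described in the patent, and all names and template strings are illustrative.

```python
def nlu(text):
    """NLU: extract the intent from the input text (toy keyword rules)."""
    if "weather" in text:
        return {"intent": "inquire_weather", "entities": {}}
    return {"intent": "greet", "entities": {}}

class DialogManager:
    """DM: tracks dialog state (DST) and picks the next action (DP)."""
    def __init__(self):
        self.state = []  # accumulated intents over the dialog history

    def next_action(self, nlu_result):
        self.state.append(nlu_result["intent"])  # DST: update dialog state
        # DP: map the current intent to the next system action
        return {"inquire_weather": "report_weather",
                "greet": "utter_greet"}.get(nlu_result["intent"], "utter_default")

def nlg(action):
    """NLG: turn a dialog action into a textual reply."""
    templates = {"report_weather": "The weather is good.",
                 "utter_greet": "Hello!"}
    return templates.get(action, "Sorry, I did not understand.")

dm = DialogManager()
reply = nlg(dm.next_action(nlu("how is the weather tomorrow")))
```

In a real system, the rule tables in nlu, DialogManager, and nlg would each be replaced by the trained neural modules (e.g., LSTM- or Transformer-based), but the data flow between the three modules is as shown.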
It is understood that in some embodiments, the neural network may be a Long Short-Term Memory (LSTM) network, a Transformer network, or the like. In the embodiment of the present application, a basic corpus sufficient for the test to be executed is compiled as the initial training set; new dialog corpora are continually explored in the interaction between the dialog model and the voice assistant under test, and the dialog model is trained iteratively, achieving self-improvement.
In some embodiments, the dialog model may be continually optimized. The neural network-based dialog model introduced above simulates the interaction between a user and a voice assistant through deep learning, and the quality of the generated dialog can be continually improved through self-learning. Specifically, the intelligent test robot 200 adds newly generated dialog corpora, obtained in the process of dialog with the voice assistant on the mobile phone 100, to the training set of the trained dialog model and trains it iteratively; that is, more sample material becomes available in each dialog scene, so as to optimize the dialog model and improve its intelligent dialog capability. A newly generated dialog corpus differs from the historically generated dialog corpora. New dialog corpora may be determined as in the following example:
The intelligent test robot 200 records the dialog corpus generated in the dialog process between the intelligent test robot 200 and the mobile phone 100;
the intelligent testing robot 200 matches the recorded dialogue corpus with the original dialogue corpus of the intelligent testing robot 200 in corpus content, and if the recorded dialogue corpus is not matched with the original dialogue corpus of the intelligent testing robot 200, determines that the recorded dialogue corpus is a new dialogue corpus.
In addition, in some other embodiments, the new dialog corpus may first be optimized, the optimized dialog corpus added to the training set of the trained dialog model, and the trained dialog model trained iteratively to obtain a trained dialog model conforming to the preset loss function. It can be understood that the optimization may, for example, involve manually removing unnecessary words, adjusting ungrammatical sentences, and the like.
S402, the intelligent test robot 200 determines a conversation path according to the conversation corpus.
In some embodiments, the dialog path is used to represent the reaction capability of the voice assistant application on the mobile phone 100, when it converses with the intelligent test robot 200, to utterances containing keywords or expressions easily understood as other intentions, intention-skipping utterances, sentences with multiple personal pronouns, and the like. The intelligent test robot 200 acquires the dialog corpus generated in the back-and-forth dialog with the mobile phone 100, arranges the exchanges in the chronological order in which the speech was uttered to form a dialog path, and determines the intelligent dialog capability according to the dialog path.
In addition, in some other embodiments, comparing the sentences in the dialog corpus directly with sentences designed to test the level of the intelligent dialog capability of the voice assistant application requires a great deal of comparison time; moreover, the same sentence has many different expressions, so enough dialog paths would have to be preset to complete the test. Thus, the test operates on the intention of each sentence: an intention dialog path is determined from the dialog corpus, and the intention dialog path is compared with a preset intention dialog path associated with the test sentence to determine the level of the intelligent dialog capability of the voice assistant application. The following illustrates a solution for determining the level of intelligent dialog capability of the voice assistant application using preset intention dialog paths.
According to the dialog corpus generated in the back-and-forth dialog between the intelligent test robot 200 and the mobile phone 100, the intelligent test robot 200 analyzes the speaking intention of the intelligent test robot 200 or the mobile phone 100 at each turn; that is, the intention dialog path is composed of one or more intentions generated from the test sentences and feedback operations, so that the dialog path can be compared quickly and conveniently with the preset intention dialog path. For example, the intention of each sentence is recognized by an intention recognition model and recorded in a dialog process file; an intention dialog path, composed of the dialog intentions arranged in the chronological order in which the speech was uttered, is extracted from the dialog process file, and the intelligent dialog capability is determined according to the intention dialog path. The ordering by utterance time may be chronological or reverse chronological.
Fig. 6A shows a structural diagram of a dialog path according to some embodiments of the present application, as shown in a text box 610 in fig. 6A, and the content and format of the dialog process file recorded by the intelligent test robot 200 may be as follows:
##1593763495883
initial: profile: Friday{init_date}; Harbin{init_city}; Xi'an{further_inquire_city};
-utter_wakeup
wakeup_response: I am there.
-utter_inquire_weather
inform_weather: The weather on [Friday]{date} in [Harbin]{city} is good.
-utter_further_inquire_city_or_date
inform_weather: The weather in [Xi'an]{city} [today]{date} is good.
-utter_hesitate_inquire_weather
inform_weather: The weather [today]{date} in [Harbin]{city} is good.
The sentence starting with "+" is the text corresponding to the reply intention of the voice assistant on the handset 100. The sentence beginning with "-" is the slot information of the intelligent test robot 200 or the text corresponding to the execution action. ": the "following" indicates a specific sentence, which may be expressed in chinese format or english, but is not limited thereto. ": the preceding content table indicates the intent of a particular statement.
It is to be understood that the format of the dialog process file may be other than the format shown in fig. 6A, and is not limited thereto.
An intention dialog path is extracted from the dialog process file. For example, as shown in fig. 6A, the intentions of the test sentences and of the feedback sentences corresponding to the feedback operations in box 611 of text box 610 are extracted and arranged in the chronological order in which the speech was uttered, forming an intention dialog path. The content and format of the intention dialog path may be as shown in fig. 6A. The content of node 1 is initial, representing the initiating action of the mobile phone 100, for example turning on the voice assistant application and storing entity vocabulary such as Friday, Harbin, and Xi'an. The content of node 2 is -utter_wakeup, representing that the intelligent test robot 200 utters a sentence intended to wake up the voice assistant on the mobile phone 100. The content of node 3 is wakeup_response, representing that the mobile phone 100 utters a sentence replying to the wake-up intention. The content of node 4 is -utter_inquire_weather, representing that the intelligent test robot 200 utters a sentence intended to ask about the weather. The content of node 5 is inform_weather, representing that the mobile phone 100 replies to the sentence intended to ask about the weather. The content of node 6 is -utter_further_inquire_city_or_date, representing that the intelligent test robot 200 further utters a sentence intended to ask about the weather of other cities and dates. The content of node 7 is inform_weather, representing that the mobile phone 100 replies to the sentence intended to ask about the weather. The content of node 8 is -utter_hesitate_inquire_weather, representing that the intelligent test robot 200 utters a hesitation sentence intended to ask about the weather. The content of node 9 is inform_weather, representing that the mobile phone 100 recognizes that the intention of the hesitation sentence is to inquire about the weather and replies to that intention.
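Extracting an intention dialog path from a dialog process file of the kind shown above might be sketched as follows (the parsing conventions, function name, and sample record are illustrative assumptions: "-" lines carry the robot's intent/action, and "name: sentence" lines carry the assistant's reply intent before the colon):

```python
def extract_intent_path(process_file_text):
    """Extract the ordered list of intents from a recorded dialog process file."""
    path = []
    for line in process_file_text.splitlines():
        line = line.strip()
        if not line or line.startswith("##") or line.startswith("initial"):
            continue  # skip the session id and initial-profile lines
        if line.startswith("-"):
            path.append(line.lstrip("-").strip())       # robot intent/action
        elif ":" in line:
            path.append(line.split(":", 1)[0].strip())  # assistant reply intent
    return path

record = """##1593763495883
-utter_wakeup
wakeup_response: I am there.
-utter_inquire_weather
inform_weather: The weather on Friday in Harbin is good."""
```

Applied to the sample record, this yields the intent sequence utter_wakeup, wakeup_response, utter_inquire_weather, inform_weather, i.e., the node contents of an intention dialog path in utterance order.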
S403, the intelligent test robot 200 determines the intelligent dialog capability level of the voice assistant application by comparing the dialog path with a preset dialog path associated with a test sentence in the test dialog.
In each specific scene, such as booking tickets or shopping, the mobile phone 100 and the intelligent test robot 200 carry out a dialog; each sentence in the dialog corresponds to one intention, and the intentions in the dialog form a dialog path. The dialog path represents the reaction capability of the mobile phone 100 to the test robot; the preset dialog path thus has a mapping relation with the intelligent dialog capability, and the intelligent dialog capability of the voice assistant application on the mobile phone 100 can be judged from the dialog path.
In some embodiments, the intelligent dialog capability includes three categories: semantic understanding capability, behavioral intelligent dialog capability, and fallback performance capability. The intelligent dialog capability is described by way of example below; the tested intelligent dialog capability level indicator is embodied in at least one of the following:
1. Semantic understanding capability.
(a) Complex intent understanding capability, i.e., the capability to understand material containing keywords or expressions easily understood as other intentions. For example, the intelligent test robot 200 says to the mobile phone 100, "I want to listen to 'Going to Pizza'"; the mobile phone 100 should assign the intention of the sentence to [music], not [navigation]. If the mobile phone 100 assigns the intention of the sentence to [navigation], it is determined that the voice assistant application does not have complex intent understanding capability.
For another example, the intelligent test robot 200 says, "I want to book a train ticket to Shanghai tomorrow; remind me to grab the ticket at 8 o'clock in the morning." The mobile phone 100 should assign the intention of the sentence to [item reminder] rather than [book train ticket]; if the mobile phone 100 assigns the intention of the sentence to [book train ticket], it is determined that the voice assistant does not have complex intent understanding capability.
(b) Hesitation/pause correct-understanding capability, i.e., the capability to correctly understand utterances containing long hesitations or pauses. For example, the intelligent test robot 200 says to the mobile phone 100, "I want to listen to … I want … Jay Chou's 'Blue and White Porcelain'." The mobile phone 100 should assign the intention of the sentence to [music] instead of [shopping]; if the mobile phone 100 assigns the intention of the sentence to [shopping], it is determined that the voice assistant application does not have hesitation/pause correct-understanding capability.
(c) Generalized expression understanding capability. For example, the intelligent test robot 200 may say to the mobile phone 100 either "I want to eat Kentucky Fried Chicken" or "I want to eat KFC", and the mobile phone 100 should understand that KFC means the same as Kentucky Fried Chicken; if the mobile phone 100 cannot understand the meaning of KFC, it is determined that the voice assistant application does not have generalized expression understanding capability.
(d) Intelligent slot filling capability, i.e., the capability to complete the user's intention. Slot filling refers to the process of completing information in order to translate the user's intention into a user-specific instruction, i.e., translating the user's implicit intention into an explicit instruction the computer can understand. For example, the intelligent test robot 200 says, "I want to book a plane ticket to Shanghai." The mobile phone 100 recognizes that the intention of the speech is to book a plane ticket, but the slot information for the booking time is missing, so the mobile phone 100 may ask the intelligent test robot 200 for the booking time; in that case, the voice assistant application has intelligent slot filling capability.
2. Behavioral intelligent dialogue capabilities.
(a) User guidance capability. That is, when the intelligent test robot 200 utters a piece of speech whose intention the mobile phone 100 cannot recognize, the mobile phone 100 may issue guidance to rephrase. In addition, when the intelligent test robot 200 utters a piece of speech with an instruction whose intention the mobile phone 100 recognizes but for which the mobile phone 100 has no corresponding function, the mobile phone 100 guides the intelligent test robot 200. For example, the intelligent test robot 200 utters the instruction "please help me play movie XX", but the mobile phone 100 does not support opening a video application by voice to play movie XX; the voice application of the mobile phone 100 then issues a prompt and guidance voice: "Playing movie XX through a voice-enabled video application is not supported; you may choose to open the music application to play audio instead."
(b) Intent continuation/jump capability. That is, after the previous topic/task is completed, the capability to continue/alter/cancel the previous task within a new topic. For example, user: "Remind me of the meeting at three o'clock tomorrow." Reply: "OK, alarm created." User: "Change the alarm to two o'clock." Alternatively, when the previous topic/task is not completed and a new topic is interposed, the capability to return to the previous topic after the new topic is completed. For example, user: "Help me book a plane ticket to Beijing." User: "How is the weather in Beijing tomorrow?" Reply: "Tomorrow Beijing will be sunny, 27 degrees." Reply: "Departing tomorrow?"
(c) Reference/context reasoning capability, i.e., the capability of referential reasoning (the Xth one, his/her/it) over user information within the same topic/task. For example, user: "Navigate to a nearby supermarket." Reply: "Found 3; navigate to which one?" User: "The first one." It also covers shared contextual reasoning (time, place) across topics/tasks. For example, user: "How is the weather in Beijing tomorrow?" Reply: "Tomorrow Beijing will be sunny, 27 degrees." User: "Help me book a plane ticket there."
(d) Explicit/implicit confirmation capability. For explicit confirmation, for example, the intelligent test robot 200 issues the instruction "set an eight o'clock alarm", and the mobile phone 100 replies: "OK, the eight o'clock alarm has been created." For implicit confirmation, the intelligent test robot 200 issues the instruction to set an eight o'clock alarm, and the mobile phone 100 replies: "OK, the eight o'clock alarm has been created and will remind you in two hours."
(e) Feedback collection capability. For example, when the intelligent test robot 200 says to the mobile phone 100 that it is not satisfied with the dialog, the mobile phone 100 collects the feedback information from the intelligent test robot 200.
3. Fallback performance capability, i.e., the performance exhibited when intention recognition fails or the task corresponding to a voice instruction cannot be completed.
(a) Failure-reason feedback capability, i.e., the capability to report why the task corresponding to a voice instruction cannot be completed even though the intention is recognized. For example, the intelligent test robot 200 utters the instruction "please help me play movie XX", but the mobile phone 100 does not support playing movie XX through a voice-enabled video application; the voice application of the mobile phone 100 then issues a prompt voice indicating the reason for the failure: "Playing movie XX through a voice-enabled video application is not supported."
(b) Personalized reply capability, i.e., humorous and interesting reply capability. For example, the intelligent test robot 200 utters the instruction "please help me play movie XX", but the mobile phone 100 does not support playing movie XX through a voice-enabled video application; the voice application of the mobile phone 100 then sends a humorous reply message such as: "Your humble servant cannot do it."
In the embodiment of the present application, the level of the intelligent dialog capability of the voice assistant application includes whether the voice assistant application on the mobile phone 100 has or does not have the intelligent dialog capability. The level may also include, but is not limited to, a score on at least one item of the intelligent dialog capability of the voice assistant application on the mobile phone 100, or a composite score over multiple items, in order to determine the level of the intelligent dialog capability of the voice assistant application on the mobile phone 100.
The following exemplary description describes a technical solution for determining whether the voice assistant application on the mobile phone 100 has the above-mentioned intelligent conversation capability:
In some embodiments, the intelligent test robot 200 determines whether the dialog path includes the preset dialog path associated with the test sentence; if so, it determines that the voice assistant application on the mobile phone 100 has the intelligent dialog capability corresponding to the preset dialog path; if not, it determines that the voice assistant application on the mobile phone 100 does not have the intelligent dialog capability corresponding to the preset dialog path. In some embodiments, upon determining that the voice assistant application does not have the intelligent dialog capability corresponding to the preset dialog path, a prompt message is displayed on the display screen of the intelligent test robot 200; for example, fig. 7 illustrates a prompt message according to some embodiments of the present application. As shown in fig. 7, when the intelligent test robot 200 determines that the voice assistant on the mobile phone 100 does not correctly understand an intended keyword or expression, it determines that the voice assistant application does not have the capability of correctly understanding the intended keyword or expression, and displays on the display screen of the intelligent test robot 200: "The voice assistant does not have the capability to correctly understand the intended keyword or expression."
Further, in some other embodiments, the dialog path is an intention dialog path, i.e., the intentions corresponding to the test sentences and feedback operations arranged in the chronological order in which the speech was uttered. The intelligent test robot 200 determines whether the intention dialog path includes the preset dialog path associated with the test sentence; that is, the intelligent test robot 200 matches the content of the intention dialog path against the content of the preset dialog path and determines, according to the matching degree, whether the voice assistant application on the mobile phone 100 has the intelligent dialog capability corresponding to the dialog path. If the intention dialog path matches the preset dialog path with a matching degree of 100%, the voice assistant application on the mobile phone 100 has the intelligent dialog capability corresponding to the dialog path. If the matching degree is less than 100%, the voice assistant application on the mobile phone 100 does not have the intelligent dialog capability corresponding to the dialog path.
It will be appreciated that, in some embodiments, each node in the intention dialog path corresponds to the intention of a test sentence or of the feedback sentence corresponding to a feedback operation. The contents of the nodes in the intention dialog path are traversed and matched against the contents of the nodes in the preset dialog path list to determine whether the content of the preset dialog path associated with the test sentence is included. For example, the contents of the nodes in the dialog path are traversed through a sliding window containing N nodes with sliding step M, where N is a natural number greater than 2 and M is a natural number greater than or equal to 1.
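The sliding-window traversal described above can be sketched as follows (a simplified illustration: the function name, default window size, and step are assumptions, and node contents are represented as plain strings):

```python
# Slide a window of window_size node contents over the intent dialog path with
# step `step`, checking whether preset_path appears contiguously inside a window.
def window_contains(intent_path, preset_path, window_size=4, step=1):
    """Return True if preset_path occurs inside some window of the intent path."""
    k = len(preset_path)
    for start in range(0, max(len(intent_path) - window_size, 0) + 1, step):
        window = intent_path[start:start + window_size]
        for i in range(len(window) - k + 1):
            if window[i:i + k] == preset_path:
                return True
    return False
```

With window_size = 4 (N = 4) and step = 1 (M = 1), this reproduces the check illustrated in fig. 6A: a preset path of node contents is present if and only if it occurs contiguously in the intention dialog path.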
As shown in fig. 6A, in the preset path, the content of node 1 is utter_hesitate_inquire_weather and the content of node 2 is inform_weather. The content of each node in the intention dialog path 620 is traversed by the sliding window 621, which covers the contents of four nodes; the contents of node 8 and node 9 in the intention dialog path 620 are found to match the contents of node 1 and node 2 of the preset path, so the judgment result is that the voice assistant application on the mobile phone 100 has hesitation/pause correct-understanding capability.
Fig. 6B is a schematic diagram of another structure of the dialog path according to some embodiments of the present application. As shown in fig. 6B, unlike fig. 6A, when the intelligent test robot 200 asks in a hesitant sentence how the weather in Harbin is today, the mobile phone 100 replies with a travel intention. In the preset path, the content of node 1 is utter_hesitate_inquire_weather and the content of node 2 is inform_weather. Traversing each node of the intention dialog path 622 with the sliding window 621, which covers the contents of four nodes, shows that the contents of node 8 and node 9 in the intention dialog path 622 match only the content of node 1 of the preset path and not the content of node 2; the judgment result is therefore that the voice assistant application on the mobile phone 100 does not have hesitation/pause correct-understanding capability.
In some embodiments, the determination may be stored in the memory in a text form.
In addition, in some other embodiments, the feedback operation includes feedback information associated with feedback voice and/or text presented on the display screen of the mobile phone 100, and the intelligent test robot 200 determines whether the intelligent dialog capability is present by detecting the feedback information displayed by the mobile phone 100 or the action it performs. For example, when the intelligent test robot 200 sends a test sentence for opening a video, the mobile phone 100 starts a video application according to its understanding of the test sentence; if the intelligent test robot 200 detects that the video application has started, it determines that the voice assistant application has semantic understanding capability.
In some embodiments, the preset dialog path associated with the test sentence is derived from an intention of the test sentence in the intelligent test robot 200 and an intention corresponding to a set feedback sentence for the test sentence. The preset intention dialog path may be text information composed of the test sentence intention and the set feedback sentence intention arranged in time series of dialog issuance. The preset dialog path may be stored in the intelligent test robot 200 or other memory where the intelligent test robot 200 may retrieve the preset dialog path.
It is understood that the preset dialog path may be manually written; or a dialogue path obtained by recognizing dialogue linguistic data extracted from a webpage dictionary and an online article through the intention of a machine or a person; or a preset dialogue corpus in a given environment is obtained through a trained dialogue model, and the test robot 200 or the human performs intention recognition on the dialogue corpus to obtain a preset intention dialogue path corresponding to the preset dialogue corpus.
Each dialog unit of each test model contains a certain test point in semantic understanding capability, behavioral intelligent dialog capability, or fallback performance capability, and each test point has a corresponding score. The following illustrates an exemplary scenario for scoring the voice assistant application on the mobile phone 100:
It is understood that, in some embodiments, the preset score corresponding to at least one item of the intelligent dialog capability of the voice assistant application on the mobile phone 100 may be obtained, so as to score that item of the intelligent dialog capability, or to obtain a total score over multiple items of the intelligent dialog capability according to their preset scores.
The level of intelligent dialog capability of the voice assistant application may also be obtained by scoring: the intelligent dialog capability corresponding to each preset intention dialog path has a preset score. The preset intention dialog paths include multiple paths; for each of them, the contents of the nodes in the intention dialog path are traversed through a sliding window, and if the intention dialog path is recognized to include one of the preset intention dialog paths, the corresponding intelligent dialog capability is present. The scores of the individual preset paths may be the same or different. If the intention dialog path is recognized not to include a preset path, the voice assistant application on the mobile phone 100 does not have the intelligent dialog capability corresponding to that preset intention dialog path, and the score of that intelligent dialog capability is low, for example zero.
For example, based on the content of fig. 4, taking the intent continuation capability as the intelligent dialogue capability under test, the intelligent test robot 200 sets a key_query in the dialogue path when performing the intent continuation test. The key_query indicates that the keyword (value) of the key question is inquire_weather and is used to trigger the intent continuation check tied to that key question. After the intelligent test robot 200 finds the keyword corresponding to the key question, it further determines whether the intent continuation capability is present and whether a score is awarded; otherwise an empty string is output, indicating that the key point does not exist in the dialogue corpus, and no score is counted. When determining whether a score is awarded, the whole set of dialogue paths corresponding to the dialogue corpus is traversed against all paths in the preset paths to obtain the intelligent dialogue capability test result.
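The key_query gate described above can be sketched as follows. This is a minimal illustration assuming the keyword is the string inquire_weather; the exact keyword name in the patent's example is rendered inconsistently by the translation:

```python
# Hedged sketch of the key_query check: before the intent-continuation
# test is scored, verify that the keyword tied to the key question
# appears among the nodes of the recorded intent dialogue path.
# Returns the keyword on success, or "" to signal that the key point
# is absent from the dialogue corpus and no score should be counted.
def check_key_query(intent_path, key_query="inquire_weather"):
    for node in intent_path:
        if key_query in node:
            return key_query
    return ""

recorded_path = ["inquire_weather", "utter_inquire_weather", "inform_weather"]
result = check_key_query(recorded_path)  # keyword found: proceed to scoring
```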
The following illustrates the contents and formats of all paths in a preset path:
intention_continue_path["0"] = ["utter_inquire_weather"]
intention_continue_path["1"] = ["inform_weather"]
intention_continue_path["2"] = ["utter_further_inquire_city_or_date"]
intention_continue_path["3"] = ["inform_weather"]
Based on the above example of the preset path, if the content of the dialogue path of a dialogue corpus matches the right-hand side of each line from path [0] to path [3], the corpus is awarded 2 points for intent continuation.
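The matching step above can be sketched in Python. This is a minimal illustration: the preset path contents come from the example above, the 2-point score from the surrounding text, while the window size and step are free choices (the claims only require a window of N > 2 nodes and a step M ≥ 1):

```python
# Preset intent dialogue path for the intent-continuation test,
# taken from the example above (paths [0]..[3]).
INTENTION_CONTINUE_PATH = [
    "utter_inquire_weather",               # path [0]
    "inform_weather",                      # path [1]
    "utter_further_inquire_city_or_date",  # path [2]
    "inform_weather",                      # path [3]
]

def match_preset_path(intent_path, preset=INTENTION_CONTINUE_PATH, step=1):
    """Slide a window of len(preset) nodes over the recorded intent
    dialogue path; return the preset score (2 points) if the window
    ever matches the preset path exactly, otherwise 0."""
    n = len(preset)
    for start in range(0, len(intent_path) - n + 1, step):
        if intent_path[start:start + n] == preset:
            return 2
    return 0

dialog = ["greet", "utter_inquire_weather", "inform_weather",
          "utter_further_inquire_city_or_date", "inform_weather", "goodbye"]
score = match_preset_path(dialog)  # → 2, the dialogue contains the preset path
```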
The following is an exemplary description of the intelligent dialogue capabilities and the score (preset score) corresponding to each.
[Table: intelligent dialogue capabilities and their corresponding preset scores — rendered as images (Figure BDA0002676115750000161, Figure BDA0002676115750000171) in the original publication]
In particular, it will be appreciated that, in some embodiments, the intelligent dialogue capability exhibited by the voice assistant on the mobile phone 100 may be tested via a scoring script, i.e. an executable file written in a particular descriptive language according to a certain format and used to score dialogue corpora. For example, the dialogue corpora between the dialogue model and the voice assistant under test are recorded, and the intelligent behaviors exhibited by the voice assistant application (such as intent continuation/jump and reference/contextual reasoning) are automatically scored by the scoring script. Other machine languages are also possible.
The embodiments of the application are mainly applied to the testing and evaluation of voice assistants. The industry has not yet formed a comprehensive test/evaluation system for the degree of intelligence of voice assistants; the embodiments of the application mainly perform automated testing and evaluation of a voice assistant's intent recognition capability and its context processing capability (such as intent continuation/jump and reference/contextual reasoning) in multi-turn dialogue scenarios, thereby supplementing, to a certain extent, the industry's lack of test methods in this area.
The embodiments of the application have the following advantages: test corpora and test cases are generated automatically, and the dialogue between a user and the voice assistant under test is simulated so as to cover the test scenarios and branches.
The method solves the problem that corpora can only be played to the voice assistant in a fixed order: reasonable replies can be generated automatically according to the voice assistant's feedback, achieving anthropomorphic interaction with the voice assistant.
The method has self-learning capability: the quality and diversity of the generated corpora/cases improve on their own as interaction with the voice assistant under test continues.
The intent recognition capability and the context processing capability (such as intent continuation/jump and reference/contextual reasoning) of the voice assistant under test in multi-turn dialogue scenarios are evaluated automatically.
Example two:
The second embodiment differs from the first in that a deep-learning-based scoring model is used instead of dialogue path matching to automatically recognize and score the intelligent dialogue capability exhibited in dialogue transcripts. The technical scheme of the second embodiment is described below by way of example.
FIG. 8 illustrates a flow diagram of a method for testing the intelligent dialogue capability of a voice assistant application, according to some embodiments of the present application. As shown in fig. 8:
When testing the intelligent dialogue capability of the voice assistant application, the voice assistant application needs to be running: the voice assistant application installed on the mobile phone 100 is triggered to start, and once the mobile phone 100 is running it, the application can receive speech from the outside and respond to it.
The voice assistant application may be triggered by clicking the "voice assistant" icon control shown in FIG. 1A, or may be awakened by a specific voice.
S801, the intelligent test robot 200 performs a test dialogue with the voice assistant application program on the mobile phone 100, and the intelligent test robot 200 records dialogue corpora.
This step is similar to S402 and will not be described herein.
S802, the intelligent test robot 200 inputs the dialogue corpus into the intelligent dialogue capability evaluation model and determines the intelligent dialogue capability level of the voice assistant application based on that model.
In some embodiments, the level of intelligent dialogue capability indicates whether the capability is present or absent. In other embodiments, it may further include a score corresponding to each intelligent dialogue capability, or the score for each capability together with a total score across all of them.
Fig. 9A shows a schematic diagram of a training process of the intelligent dialogue capability evaluation model according to some embodiments of the present application, and as shown in fig. 9A, in a training phase, dialogue corpus data, intelligent dialogue capability label data, and score label data are input into the intelligent dialogue capability evaluation model 900 for training, so as to obtain the intelligent dialogue capability evaluation model 900 meeting a preset loss function condition.
The intelligent dialogue ability evaluation model 900 may be a neural network (e.g., RNN, Transformer) based evaluation model. It is to be understood that the dialogue corpus data, the intelligent dialogue capability tag data and the score tag data may be the training data obtained in the first embodiment, or may be obtained by other methods, such as manual labeling. The training process may be performed on the intelligent test robot 200 as shown in fig. 1C, or may be performed on the model training apparatus 300 as shown in fig. 1D.
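The assembly of the three training inputs named above might be sketched as follows; the field names and record structure are assumptions, since the patent does not specify a serialization format:

```python
# Illustrative sketch: pair each recorded dialogue corpus with its
# intelligent-dialogue-capability label and score label to form the
# training set for the evaluation model 900.
def build_training_examples(corpora, capability_labels, score_labels):
    """Zip the three parallel inputs into one training record each."""
    assert len(corpora) == len(capability_labels) == len(score_labels)
    return [
        {"dialog": dialog, "capability": cap, "score": score}
        for dialog, cap, score in zip(corpora, capability_labels, score_labels)
    ]

examples = build_training_examples(
    [["inquire_weather", "inform_weather"]],  # dialogue corpus data
    ["intent_continuation"],                  # capability label data
    [2],                                      # score label data
)
```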
Fig. 9B is a schematic diagram illustrating an application process of the intelligent dialogue capability evaluation model according to some embodiments of the present application. As shown in fig. 9B, in the application stage, dialogue corpora are input into the intelligent dialogue capability evaluation model 900, which outputs capability test results such as whether each intelligent dialogue capability is present and the score corresponding to it.
According to the embodiments of the application, when testing the voice assistant under test, dialogue behavior is generated by a deep-learning-based dialogue model, covering a variety of scenarios and branches.
FIG. 10 illustrates a block diagram of a handset 100 capable of implementing the intelligent dialogue capability test functionality of a voice assistant application, according to some embodiments of the present application. Specifically, as shown in fig. 10, the mobile phone 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the mobile phone 100. In other embodiments of the present application, the handset 100 may include more or fewer components than shown, or some components may be combined, some components may be separated, or a different arrangement of components may be used. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors. The processor 110 may determine whether the voice assistant application on the mobile phone 100 has the intelligent conversation capability corresponding to the preset conversation path according to the intended conversation path in the conversation corpus.
The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. For example, the processor 110 may store data such as dialogue corpora, intent dialogue paths, and preset intent dialogue paths. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or are used cyclically by the processor 110. If the processor 110 needs to reuse the instructions or data, it can call them directly from the memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The USB interface 130 is an interface conforming to USB standard specifications, such as a Mini USB interface, a Micro USB interface, or a USB Type-C interface. The USB interface 130 may be used to connect a charger to charge the mobile phone 100, and may also be used to transmit data between the mobile phone 100 and peripheral devices, for example, to transmit PPG data and ECG data of a user. It can also be used to connect an earphone and play audio through the earphone, and to connect other electronic devices, such as AR devices and the like.
It should be understood that the interface connection relationship between the modules illustrated in the embodiment of the present application is only an exemplary illustration, and does not constitute a limitation on the structure of the mobile phone 100. In other embodiments of the present application, the mobile phone 100 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The charging management module 140 is configured to receive charging input from a charger. The power management module 141 is used to connect the battery 142, the charging management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, and battery state of health (leakage, impedance). In some other embodiments, the power management module 141 may also be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same device.
The wireless communication function of the mobile phone 100 can be realized by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the handset 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including wireless communication of 2G/3G/4G/5G, etc. applied to the handset 100. The wireless communication module 160 may provide solutions for wireless communication applied to the mobile phone 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.
In some embodiments, the mobile phone 100 can be in communication connection with the smart test robot 200 through the mobile communication module 150 or the wireless communication module 160.
In some embodiments, the antenna 1 of the handset 100 is coupled to the mobile communication module 150 and the antenna 2 is coupled to the wireless communication module 160, so that the handset 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, among others. GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The mobile phone 100 implements the display function through the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The mobile phone 100 may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like. In some embodiments of the present application, the display screen 194 is used to enable human-computer interaction with a user.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the mobile phone 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, the ECG data and PPG data of the user are saved in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The data storage area may store data (e.g., audio data, a phonebook, etc.) created during use of the handset 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like. The processor 110 executes various functional applications of the cellular phone 100 and data processing by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The mobile phone 100 can implement audio functions, such as music playing and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.
The keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys. Or may be touch keys. The cellular phone 100 may receive a key input, and generate a key signal input related to user setting and function control of the cellular phone 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration cues, as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also respond to different vibration feedback effects for touch operations applied to different areas of the display screen 194. Different application scenes (such as time reminding, receiving information, alarm clock, game and the like) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The SIM card interface 195 is used to connect a SIM card.
Referring now to fig. 11, the software system of the handset 100 may employ a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiment of the application takes an Android system with a layered architecture as an example, and exemplarily illustrates a software structure of a terminal device. Fig. 11 illustrates a block diagram of the software architecture of the handset 100, according to some embodiments of the present application.
The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into four layers, an application layer, an application framework layer, an Android runtime (Android runtime) and system library, and a kernel layer from top to bottom.
The application layer may include a series of application packages.
As shown in fig. 11, the application package may include voice assistant applications, phone, camera, gallery, calendar, call, map, navigation, WLAN, bluetooth, music, video, short message, etc. applications.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 11, the application framework layers may include a window manager, content provider, view system, phone manager, resource manager, notification manager, and the like.
The window manager is used for managing window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, and the like.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The telephone manager is used for providing a communication function of the terminal equipment. Such as management of call status (including on, off, etc.).
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables the application to display notification information in the status bar, can be used to convey notification-type messages, can disappear automatically after a short dwell, and does not require user interaction. Such as a notification manager used to inform download completion, message alerts, etc. The notification manager may also be a notification that appears in the form of a chart or scroll bar text at the top status bar of the system, such as a notification of a background running application, or a notification that appears on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is given, the terminal device vibrates, an indicator light flickers, and the like.
The Android Runtime comprises a core library and a virtual machine. The Android runtime is responsible for scheduling and managing an Android system.
The core library comprises two parts: one part is a function which needs to be called by java language, and the other part is a core library of android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the java files of the application layer and the application framework layer as binary files. The virtual machine performs functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), Media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), and the like.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports a variety of commonly used audio, video format playback and recording, and still image files, among others. The media library may support a variety of audio-video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The inner core layer at least comprises a display driver, a camera driver, an audio driver and a sensor driver.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one example embodiment or technology in accordance with the present disclosure. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
The disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer-readable medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, each of which may be coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
The processes and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform one or more method steps. The structure for a variety of these systems is discussed in the description that follows. In addition, any particular programming language sufficient to implement the techniques and embodiments of the present disclosure may be used. Various programming languages may be used to implement the present disclosure as discussed herein.
Moreover, the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the disclosed subject matter. Accordingly, the present disclosure is intended to be illustrative, but not limiting, of the scope of the concepts discussed herein.

Claims (11)

1. A method for testing the intelligent dialogue capability of a voice assistant application program, for use in a system comprising an electronic device and a test device, the method comprising:
the test equipment plays a test statement to the electronic equipment and receives a feedback operation generated by the electronic equipment on the test statement by using a voice assistant application program;
the test device records a dialogue corpus, the dialogue corpus comprising the test statement and a feedback statement corresponding to the feedback operation;
the testing device determines an intention dialogue path according to the dialogue corpus, wherein the intention dialogue path is composed of one or more intents generated based on a testing statement and the feedback operation;
the testing device determines a level of intelligent dialog capability of the voice assistant application by comparing the intended dialog path with a preset intended dialog path associated with a test statement.
2. The method of claim 1, wherein the test device compares the intended dialog path with a preset intended dialog path associated with a test statement by:
the test equipment judges whether the intention dialogue path contains a preset intention dialogue path related to the test statement.
3. The method of claim 2, wherein, in the intended dialog path, the intent of each test statement and of each feedback statement corresponding to a feedback operation corresponds to a node, and wherein the judging comprises:
and traversing the content of the node in the intention dialogue path, and matching the content of the node in the preset intention dialogue path list to judge whether the content of the preset intention dialogue path associated with the test statement is contained.
4. The method of claim 3, wherein traversing the contents of a node in the intended dialog path comprises:
traversing the content of the nodes in the intention dialogue path through a sliding window, wherein the sliding window comprises N nodes, the sliding step length is M, N is a natural number larger than 2, and M is a natural number larger than or equal to 1.
5. The method of claim 2, wherein determining the level of intelligent dialog capabilities of the voice assistant application comprises:
under the condition that the preset intention conversation path associated with the test statement is included in the intention conversation path, the electronic equipment determines that a voice assistant application program on the electronic equipment has intelligent conversation capacity corresponding to the preset intention conversation path;
in the case that the preset intention conversation path associated with the test statement is not included in the intention conversation path, the electronic device determines that the voice assistant application program on the electronic device does not have the intelligent conversation capability corresponding to the preset intention conversation path.
6. The method of claim 1, wherein determining the level of intelligent dialog capabilities of the voice assistant application comprises:
scoring at least one of the intelligent dialog capabilities of the voice assistant application and/or providing a total score across the intelligent dialog capabilities.
7. The method of claim 1, wherein the feedback operation comprises feedback speech for the test statement and/or feedback information associated with the feedback speech presented on a display screen of the electronic device.
8. The method of claim 1, wherein the preset intent dialog path associated with the test statement is comprised of an intent based on the test statement and an intent fed back for a setting of the test statement.
9. The method of claim 1, wherein the intelligent dialog capability comprises at least one of: semantic understanding capability, behavioral intelligent dialogue capability, and fallback expression capability.
10. A computer-readable medium having stored thereon instructions that, when executed on a computer, cause the computer to perform the method for intelligent dialog capability testing of a voice assistant application of any of claims 1 to 9.
11. A test device, comprising:
a memory for storing instructions to be executed by one or more processors of the test device; and
a processor, being one of the processors of the test device, for performing the intelligent dialog capability test method for a voice assistant application of any of claims 1 to 9.
CN202010948532.3A 2020-09-10 2020-09-10 Intelligent dialogue ability test method, medium and test equipment Pending CN114168706A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010948532.3A CN114168706A (en) 2020-09-10 2020-09-10 Intelligent dialogue ability test method, medium and test equipment


Publications (1)

Publication Number Publication Date
CN114168706A true CN114168706A (en) 2022-03-11

Family

ID=80475656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010948532.3A Pending CN114168706A (en) 2020-09-10 2020-09-10 Intelligent dialogue ability test method, medium and test equipment

Country Status (1)

Country Link
CN (1) CN114168706A (en)

Similar Documents

Publication Publication Date Title
US11582337B2 (en) Electronic device and method of executing function of electronic device
US20210065716A1 (en) Voice processing method and electronic device supporting the same
US10832674B2 (en) Voice data processing method and electronic device supporting the same
US11861315B2 (en) Continuous learning for natural-language understanding models for assistant systems
EP3603040B1 (en) Electronic device and method of executing function of electronic device
CN109102802A (en) System for handling user spoken utterances
KR20200059054A (en) Electronic apparatus for processing user utterance and controlling method thereof
CN108701127A (en) Electronic equipment and its operating method
CN112154465A (en) Method, device and equipment for learning intention recognition model
Neustein Advances in speech recognition: mobile environments, call centers and clinics
EP3550449A1 (en) Search method and electronic device using the method
US20220366904A1 (en) Active Listening for Assistant Systems
US20220020358A1 (en) Electronic device for processing user utterance and operation method therefor
US20210312138A1 (en) System and method for handling out of scope or out of domain user inquiries
JP4000828B2 (en) Information system, electronic equipment, program
CN112219235A (en) System comprising an electronic device for processing a user's speech and a method for controlling speech recognition on an electronic device
CN114860910A (en) Intelligent dialogue method and system
CN114168706A (en) Intelligent dialogue ability test method, medium and test equipment
US11983329B1 (en) Detecting head gestures using inertial measurement unit signals
US20220165267A1 (en) Electronic device and operating method for generating response to user input
US20230419952A1 (en) Data Synthesis for Domain Development of Natural Language Understanding for Assistant Systems
KR20220137437A (en) Electronic device and operation method thereof
CN118098199A (en) Personalized speech synthesis method, electronic device, server and storage medium
CN117377942A (en) Active listening of assistant systems
CN116705024A (en) Interaction method, intelligent terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination