CN113220590A - Automatic testing method, device, equipment and medium for voice interaction application - Google Patents


Info

Publication number: CN113220590A
Application number: CN202110624566.1A
Authority: CN (China)
Prior art keywords: test, voice interaction, interaction application, voice, processing
Legal status: Pending (assumed; not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 余卓成, 常乐, 陈孝良
Current and original assignee: Beijing SoundAI Technology Co Ltd
Application filed by Beijing SoundAI Technology Co Ltd

Classifications

    • G06F11/3688: Test management for test execution, e.g. scheduling of test suites (under G06F11/36, preventing errors by testing or debugging software)
    • G06F11/3692: Test management for test results analysis
    • G06F3/165: Management of the audio stream, e.g. setting of volume, audio stream path (under G06F3/16, sound input; sound output)
    • G06N3/044: Recurrent networks, e.g. Hopfield networks (under G06N3/02, neural networks)
    • G06N3/08: Learning methods (under G06N3/02, neural networks)
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use


Abstract

The application discloses an automated testing method, apparatus, device, and medium for a voice interaction application, and belongs to the field of computer technologies. In the embodiments of the application, test audio can be obtained directly and played through an artificial mouth in place of human pronunciation, so that the voice interaction application is tested without relying on manual work: the test case executes automatically, realizing an automated testing process. The artificial mouth, a special artificial sound source, can simulate the sound produced by a human mouth when playing the test audio, so the voice signal the voice interaction application collects from it differs little from a voice signal collected from human pronunciation, and the testing accuracy is therefore consistent with that of the manual-pronunciation approach. The automated testing process reduces labor cost and can greatly improve testing efficiency.

Description

Automatic testing method, device, equipment and medium for voice interaction application
Technical Field
The present application relates to the field of computer technologies, and in particular, to an automated testing method, apparatus, device, and medium for voice interaction applications.
Background
With the development of computer technology and the rise of the artificial intelligence industry, voice interaction applications are increasing. A voice interaction application has a voice interaction function, and when the application is tested, this voice interaction function needs to be tested.
At present, testing of a voice interaction application is usually completed through manual pronunciation and cannot be executed automatically by software alone. This manual-pronunciation testing approach has high labor cost and low testing efficiency.
Disclosure of Invention
The embodiment of the application provides an automatic testing method, device, equipment and medium for voice interaction application, which can reduce labor cost and improve testing efficiency. The technical scheme is as follows:
in one aspect, an automated testing method for a voice interaction application is provided, where the method includes:
acquiring an automatic test case, wherein the automatic test case comprises test audio and expected processing data;
playing the test audio in the automated test case based on an artificial mouth;
performing interactive processing on the test audio played by the artificial mouth based on the voice interaction application;
and determining a test result based on the data generated by the interactive processing and the expected processing data in the automatic test case.
In some embodiments, the test audio acquisition process comprises:
and acquiring the sound emitted by the user based on recording equipment to obtain test audio, wherein the content of the test audio adopts the sentence patterns or the keywords corresponding to the voice interaction function of the voice interaction application.
In some embodiments, the test audio acquisition process comprises:
acquiring historical data of the voice interaction application;
and extracting historical audio when the user interacts with the voice interaction application from the historical data to be used as test audio.
In some embodiments, the determining a test result based on the data generated by the interactive processing and expected processing data in the automated test case comprises:
comparing the data generated by the interactive processing with expected processing data in the automatic test case;
in response to the comparison result indicating that the data generated by the interactive processing is consistent with the expected processing data, determining that the test result indicates that the automated test case passes the test;
and in response to the comparison result indicating that the data generated by the interactive processing is inconsistent with the expected processing data, determining that the test result indicates that the automated test case fails the test.
In some embodiments, the data generated by the interactive processing includes log information stored by a server corresponding to the voice interaction application, log information stored by a terminal where the voice interaction application is located, and interface display information when the terminal feeds back the test audio.
In some embodiments, the interface display information includes elements displayed in an interface when the terminal feeds back the test audio; or the interface display information comprises element changes corresponding to interface jumping when the terminal feeds back the test audio.
In some embodiments, the interactive processing of the test audio played by the artificial mouth based on the voice interaction application includes:
based on voice interaction application, collecting the test audio played by the artificial mouth to obtain a voice signal, wherein the content of the voice signal comprises the content of the test audio;
acquiring feedback information corresponding to the voice signal;
and displaying the feedback information based on the terminal where the voice interaction application is located.
In some embodiments, the displaying the feedback information based on the terminal where the voice interaction application is located includes any one of:
displaying the feedback information in a current display interface of the terminal where the voice interaction application is located;
and displaying the feedback information in a feedback interface after the voice interaction application jumps in the terminal.
In some embodiments, the method further comprises:
in response to the test result indicating that the automated test case fails the test, performing screen capture processing on the interface displayed when the terminal feeds back the test audio to obtain a screenshot;
and sending the screenshot to a target account, so that the target account determines, based on the screenshot, the errors contained in the voice interaction function of the voice interaction application.
In one aspect, an apparatus for automated testing of voice interaction applications is provided, the apparatus comprising:
the acquisition module is used for acquiring an automated test case, wherein the automated test case comprises test audio and expected processing data;
the playing module is used for playing the test audio in the automated test case based on the artificial mouth;
the processing module is used for carrying out interactive processing on the test audio played by the artificial mouth based on the voice interactive application;
and the determining module is used for determining a test result based on the data generated by the interactive processing and the expected processing data in the automatic test case.
In some embodiments, the test audio acquisition process comprises:
and acquiring the sound emitted by the user based on recording equipment to obtain test audio, wherein the content of the test audio adopts the sentence patterns or the keywords corresponding to the voice interaction function of the voice interaction application.
In some embodiments, the test audio acquisition process comprises:
acquiring historical data of the voice interaction application;
and extracting historical audio when the user interacts with the voice interaction application from the historical data to be used as test audio.
In some embodiments, the determination module is to:
comparing the data generated by the interactive processing with expected processing data in the automatic test case;
in response to the comparison result indicating that the data generated by the interactive processing is consistent with the expected processing data, determining that the test result indicates that the automated test case passes the test;
and in response to the comparison result indicating that the data generated by the interactive processing is inconsistent with the expected processing data, determining that the test result indicates that the automated test case fails the test.
In some embodiments, the data generated by the interactive processing includes log information stored by a server corresponding to the voice interaction application, log information stored by a terminal where the voice interaction application is located, and interface display information when the terminal feeds back the test audio.
In some embodiments, the interface display information includes elements displayed in an interface when the terminal feeds back the test audio; or the interface display information comprises element changes corresponding to interface jumping when the terminal feeds back the test audio.
In some embodiments, the processing module is to:
based on voice interaction application, collecting the test audio played by the artificial mouth to obtain a voice signal, wherein the content of the voice signal comprises the content of the test audio;
acquiring feedback information corresponding to the voice signal;
and displaying the feedback information based on the terminal where the voice interaction application is located.
In some embodiments, the processing module is to perform any one of:
displaying the feedback information in a current display interface of the terminal where the voice interaction application is located;
and displaying the feedback information in a feedback interface after the voice interaction application jumps in the terminal.
In some embodiments, the apparatus further comprises:
the screen capture module is used for, in response to the test result indicating that the automated test case fails the test, performing screen capture processing on the interface displayed when the terminal feeds back the test audio to obtain a screenshot;
and the sending module is used for sending the screenshot to a target account, so that the target account determines, based on the screenshot, the errors contained in the voice interaction function of the voice interaction application.
In one aspect, an electronic device is provided that includes one or more processors and one or more memories having stored therein at least one computer program that is loaded and executed by the one or more processors to implement various alternative implementations of the automated testing method for voice interaction applications described above.
In one aspect, an automated testing system for a voice interaction application is provided; the system comprises a terminal where the voice interaction application is located, an artificial mouth, and a processing device. The processing device is configured to acquire an automated test case comprising test audio and expected processing data, and to determine a test result based on data generated by interactive processing and the expected processing data in the automated test case. The artificial mouth is configured to play the test audio in the automated test case acquired by the processing device. The terminal where the voice interaction application is located is configured to perform interactive processing on the test audio played by the artificial mouth.
In some embodiments, the system further comprises a sound recording device for collecting sound emitted by a user to obtain the test audio.
In one aspect, a computer-readable storage medium is provided, in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor to implement various alternative implementations of the automated testing method for voice interaction applications described above.
In one aspect, a computer program product or computer program is provided that includes one or more program codes stored in a computer-readable storage medium. The one or more program codes are read from the computer-readable storage medium by one or more processors of the electronic device, and the one or more processors execute the one or more program codes, so that the electronic device executes the automated testing method for the voice interaction application of any one of the above possible embodiments.
In the embodiments of the application, test audio can be obtained directly and played through an artificial mouth in place of human pronunciation, so that the voice interaction application is tested without relying on manual work: the test case executes automatically, realizing an automated testing process. The artificial mouth, a special artificial sound source, can simulate the sound produced by a human mouth when playing the test audio, so the voice signal the voice interaction application collects from it differs little from a voice signal collected from human pronunciation, and the testing accuracy is therefore consistent with that of the manual-pronunciation approach. The automated testing process reduces labor cost and can greatly improve testing efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic diagram of an automated testing system for voice interaction applications provided by an embodiment of the present application;
FIG. 2 is a flowchart of an automated testing method for a voice interaction application according to an embodiment of the present application;
FIG. 3 is a flowchart of an automated testing method for a voice interaction application according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an automated testing apparatus for a voice interaction application according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
fig. 6 is a block diagram of a terminal according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and the like in this application are used for distinguishing between similar items and items that have substantially the same function or similar functionality, and it should be understood that "first," "second," and "nth" do not have any logical or temporal dependency or limitation on the number or order of execution. It will be further understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, the first image is referred to as a second image, and similarly, the second image is referred to as a first image without departing from the scope of the various examples. The first image and the second image are both images, and in some cases, separate and distinct images.
The term "at least one" is used herein to mean one or more, and the term "plurality" is used herein to mean two or more, e.g., a plurality of packets means two or more packets.
It is to be understood that the terminology used in the description of the various examples herein is for the purpose of describing particular examples only and is not intended to be limiting. As used in the description of the various examples and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. The term "and/or" describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in the present application generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that, in the embodiments of the present application, the size of the serial number of each process does not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It should also be understood that determining B from a does not mean determining B from a alone, but also from a and/or other information.
It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also understood that the term "if" may be interpreted to mean "when," "upon," "in response to a determination," or "in response to a detection." Similarly, the phrase "if it is determined" or "if [a stated condition or event] is detected" may be interpreted to mean "upon determining," "in response to determining," "upon detecting [the stated condition or event]," or "in response to detecting [the stated condition or event]," depending on the context.
The following describes an embodiment of the present application.
Fig. 1 is a schematic diagram of an automated testing system for a voice interaction application according to an embodiment of the present application. As shown in fig. 1 (a), the system includes a terminal where the voice interaction application is located, an artificial mouth, and a processing device. The processing device is connected to the terminal where the voice interaction application is located through a wireless or wired network, and is likewise connected to the artificial mouth through a wireless or wired network.
The terminal where the voice interaction application is located is at least one of a smartphone, a game console, a desktop computer, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, a vehicle-mounted terminal, an intelligent robot, or a self-service payment device.
The voice interaction application is installed and runs on this terminal, and other applications may also be installed and run there, for example a system application, an instant messaging application, a news push application, a shopping application, an online video application, or a social application.
Illustratively, the terminal where the voice interaction application is located has a voice interaction function. The terminal where the voice interaction application is located can collect sound in the surrounding environment and carries out interaction processing based on the collected sound.
In some embodiments, the terminal where the voice interaction application is located may complete the work independently, or a server corresponding to the voice interaction application may provide data services for the terminal. The embodiments of the present application do not limit this.
The server corresponding to the voice interaction application comprises at least one of a server, a plurality of servers, a cloud computing platform and a virtualization center. The server corresponding to the voice interaction application is used for providing background service for supporting the voice interaction application.
Optionally, a server corresponding to the voice interaction application undertakes primary processing, and a terminal where the voice interaction application is located undertakes secondary processing; or, the server corresponding to the voice interactive application undertakes the secondary processing work, and the terminal where the voice interactive application is located undertakes the primary processing work; or, the server corresponding to the voice interactive application or the terminal where the voice interactive application is located respectively and independently undertake processing work. Or, a distributed computing architecture is adopted between the server corresponding to the voice interactive application and the terminal where the voice interactive application is located for performing collaborative computing.
Optionally, the server corresponding to the voice interaction application includes at least one server and a database, where the database is used to store data. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The terminal may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, or the like, but is not limited thereto.
Illustratively, the processing device is the server corresponding to the voice interaction application, or the terminal where the voice interaction application is located, or another device other than that server and that terminal; the other device may itself be a terminal or a server.
In the embodiments of the application, the voice interaction application can be tested through this automated testing system. Specifically, the terminal where the voice interaction application is located, the artificial mouth, and the processing device included in the system may be placed in a quiet environment, and each device in the system then starts to execute an automated test case, realizing the testing process. A test procedure is provided below.
The processing device is configured to acquire an automated test case comprising test audio and expected processing data. The artificial mouth is configured to play the test audio in the automated test case acquired by the processing device. The terminal where the voice interaction application is located is configured to perform interactive processing on the test audio played by the artificial mouth. The processing device is configured to determine a test result based on the data generated by the interactive processing and the expected processing data in the automated test case.
After the processing device acquires the automated test case, it can control the artificial mouth, through its connection with it, to play the test audio in the automated test case. The terminal where the voice interaction application is located then collects surrounding sound, that is, collects a voice signal whose content includes the content of the test audio. The terminal can perform interactive processing based on the collected voice signal; data are generated during this interactive processing, and the processing device can determine whether the voice interaction function of the voice interaction application is normal by comparing the data generated during the interaction with the expected processing data.
In some embodiments, as shown in fig. 1 (b), the system further comprises a recording device for collecting the sound emitted by a user to obtain the test audio. The recording device can thus be used before testing to record the test audio in advance; then, by writing an automated test case, the artificial mouth can call the recorded test audio from the recording device and play it.
Fig. 2 is a flowchart of an automated testing method for a voice interaction application according to an embodiment of the present application, where the method may be applied to an electronic device, where the electronic device is a terminal or a server. The method can also be applied to the automatic testing system of the voice interaction application. Referring to fig. 2, the method includes the following steps.
201. An automated test case is obtained, the automated test case including test audio and expected processing data.
202. And playing the test audio in the automated test case based on the artificial mouth.
203. And performing interactive processing on the test audio played by the artificial mouth based on the voice interactive application.
204. And determining a test result based on the data generated by the interactive processing and the expected processing data in the automatic test case.
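Purely as an illustration of how steps 201 to 204 chain together, the following Python sketch models one pass of the loop; the callables for driving the artificial mouth and collecting the generated data are hypothetical placeholders, since the application does not define a programming interface.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AutomatedTestCase:
    audio_path: str   # pre-recorded test audio (step 201)
    expected: dict    # expected processing data (step 201)

def run_test(case: AutomatedTestCase,
             play_on_artificial_mouth: Callable[[str], None],
             collect_generated_data: Callable[[], dict]) -> str:
    """One pass of steps 201-204. The two callables stand in for the
    artificial-mouth playback and log/interface collection interfaces."""
    play_on_artificial_mouth(case.audio_path)   # step 202: play the test audio
    generated = collect_generated_data()        # step 203: app performs interactive processing
    # step 204: compare generated data with expected processing data
    return "pass" if generated == case.expected else "fail"
```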
In the embodiments of the application, test audio can be obtained directly and played through an artificial mouth in place of human pronunciation, so that the voice interaction application is tested without relying on manual work: the test case executes automatically, realizing an automated testing process. The artificial mouth, a special artificial sound source, can simulate the sound produced by a human mouth when playing the test audio, so the voice signal the voice interaction application collects from it differs little from a voice signal collected from human pronunciation, and the testing accuracy is therefore consistent with that of the manual-pronunciation approach. The automated testing process reduces labor cost and can greatly improve testing efficiency.
In some embodiments, the test audio acquisition process includes:
and acquiring the sound emitted by the user based on the recording equipment to obtain a test audio, wherein the content of the test audio adopts a sentence pattern or a keyword corresponding to the voice interaction function of the voice interaction application.
In some embodiments, the test audio acquisition process includes:
acquiring historical data of the voice interaction application;
and extracting historical audio when the user interacts with the voice interaction application from the historical data as test audio.
In some embodiments, the determining a test result based on the data generated by the interactive process and expected process data in the automated test case comprises:
comparing the data generated by the interactive processing with the expected processing data in the automatic test case;
in response to the comparison result indicating that the data generated by the interactive processing is consistent with the expected processing data, determining that the test result indicates that the automated test case passes the test;
and in response to the comparison result indicating that the data generated by the interactive processing is inconsistent with the expected processing data, determining that the test result indicates that the automated test case fails the test.
In some embodiments, the data generated by the interactive processing includes log information stored by a server corresponding to the voice interaction application, log information stored by a terminal where the voice interaction application is located, and interface display information when the terminal feeds back the test audio.
In some embodiments, the interface display information includes elements displayed in an interface when the terminal feeds back the test audio; or the interface display information comprises element changes corresponding to interface jumping when the terminal feeds back the test audio.
In some embodiments, the interactive voice-based application performs interactive processing on the test audio played by the artificial mouth, including:
based on voice interaction application, collecting the test audio played by the artificial mouth to obtain a voice signal, wherein the content of the voice signal comprises the content of the test audio;
acquiring feedback information corresponding to the voice signal;
and displaying the feedback information based on the terminal where the voice interaction application is located.
In some embodiments, the displaying the feedback information based on the terminal where the voice interaction application is located includes any one of:
displaying the feedback information in a current display interface of the terminal where the voice interaction application is located;
and displaying the feedback information in a feedback interface after the voice interaction application jumps in the terminal.
In some embodiments, the method further comprises:
in response to the test result indicating that the automated test case fails the test, performing screen capture processing on the interface displayed when the terminal feeds back the test audio to obtain a screenshot;
and sending the screenshot to a target account, so that the target account determines, based on the screenshot, the errors contained in the voice interaction function of the voice interaction application.
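As one way this screenshot step could look in practice, here is a sketch assuming an Android terminal reachable over adb; the delivery channel to the target account is application-specific and left as a stub.

```python
import subprocess
from pathlib import Path

def capture_failure_screenshot(out_dir: str, case_id: str) -> Path:
    # Grab the interface the terminal showed while feeding back the test audio.
    # `adb exec-out screencap -p` writes a PNG to stdout on Android devices.
    out = Path(out_dir) / f"{case_id}_failure.png"
    png = subprocess.run(["adb", "exec-out", "screencap", "-p"],
                         check=True, capture_output=True).stdout
    out.write_bytes(png)
    return out

def send_to_target_account(screenshot: Path) -> None:
    # Application-specific: mail, IM bot, issue tracker, etc.
    raise NotImplementedError
```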
Fig. 3 is a flowchart of an automated testing method for a voice interaction application according to an embodiment of the present application, and referring to fig. 3, the method includes the following steps.
301. The electronic device obtains an automated test case that includes test audio and expected processing data.
The voice interaction application has a voice interaction function, and voice interaction is a form of human-computer interaction. Human-Computer Interaction (HCI) refers to the process of information exchange between a person and a computer, carried out in a certain dialogue language and a certain interactive manner, to complete a given task.
Voice interaction means that a person utters speech, and the computer can search for information according to that speech and feed certain information back to the person; the feedback may take the form of a display, speech playback, or both.
In the embodiment of the application, the automatic test can be realized on the voice interaction application through the automatic test case. In the testing process, the voice interaction function of the voice interaction application needs to be tested, so that the automatic test case needs to include a test audio, and whether the voice interaction function of the voice interaction application is normal is tested by observing the processing condition of the voice interaction application after the test audio is played.
The automated test case can be compiled by a technician in the related art when testing the voice interaction application is required. The automated test case includes test audio and expected processing data. The following will specifically describe these two data types.
The test audio in the automated test case can be obtained directly and played directly whenever a test is needed. Manual pronunciation can therefore be replaced: no human speaker needs to be arranged at the test site, the automated test case can execute automatically, and an automated testing process is realized.
In some embodiments, the test audio may be collected by a recording device from sound made by a user. The test audio then carries the characteristics of human pronunciation and, compared with machine-synthesized speech converted from text, is closer to human pronunciation, so testing with it better ensures test accuracy. In some embodiments, the recording device may be a recording sound card through which the test audio can be collected. The test audio is a test corpus that is played directly to support the testing process.
Specifically, the obtaining process of the test audio may be: and acquiring the sound emitted by the user based on the recording equipment to obtain a test audio, wherein the content of the test audio adopts a sentence pattern or a keyword corresponding to the voice interaction function of the voice interaction application.
The voice interaction application may include one or more voice interaction functions, such as weather queries, calendar queries, time queries, route queries, audio queries, connected headset power queries, volume control functions, audio search and play control functions, and the like.
For different voice interaction functions, different sentence patterns or keywords may be provided. These may be referred to as dialog patterns, and here they refer to the features of the sentences used to trigger the corresponding voice interaction functions. For example, the sentence patterns corresponding to the weather query function may be "how is the weather today", "today's weather", and so on, and the corresponding keywords may include "weather", "today", and the like. The sentence patterns corresponding to the audio search and play control function may be "switch to next song", "switch song", "play XXX", and so on, and the corresponding keywords may be "switch", "play", and the like. The above description is only exemplary, and the embodiments of the present application are not limited thereto. An illustrative test-case table is sketched below.
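For illustration, an automated test case built around such sentence patterns might pair each recorded utterance with its expected processing data as follows; the field names and values are assumptions made for the sketch, not defined by this application.

```python
# Hypothetical test-case table: one recorded utterance per sentence pattern,
# paired with the processing data the voice interaction application is
# expected to generate for it.
TEST_CASES = [
    {"audio": "weather_today.wav",    # "how is the weather today"
     "expected": {"intent": "weather_query", "slots": {"date": "today"}}},
    {"audio": "switch_song.wav",      # "switch to next song"
     "expected": {"intent": "audio_play_control", "slots": {"action": "next"}}},
    {"audio": "play_xxx.wav",         # "play XXX"
     "expected": {"intent": "audio_play_control", "slots": {"song": "XXX"}}},
]
```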
In other embodiments, the test audio may be extracted from historical data of the voice interaction application. Specifically, the obtaining process of the test audio may be: acquiring historical data of the voice interaction application; and extracting historical audio when the user interacts with the voice interaction application from the historical data as test audio.
The historical data is data generated while users use the voice interaction application. For example, when a user controls the voice interaction application through voice, the application collects the sound emitted by the user as audio and analyzes and processes it; the audio collected in this way is the historical audio.
Expected process data in an automated test case refers to data resulting from the expected interactive processing based on the test audio. That is, if the voice interaction function of the voice interaction application is normal, when the test audio is collected, the data generated when the voice interaction application performs the interaction processing is the expected processing data. The expected processing data is used for comparing with data generated in the test process to analyze whether the voice interaction function of the current voice interaction application is normal.
302. The electronic device plays the test audio in the automated test case based on the artificial mouth.
The artificial mouth is a special artificial sound source, also called a simulated mouth or mouth simulator. When playing audio, it simulates the sound produced by a human mouth, hence the name. The artificial mouth simulates the average directivity and radiation pattern of the human mouth, and with frequency compensation and sound pressure limitation, the difference between the emitted sound and a human voice is small. The artificial mouth is a high-fidelity loudspeaker with a good fidelity effect, and it can simulate human-produced sound very well.
The construction of the artificial mouth is described here by way of one example only. The artificial mouth consists of a small loudspeaker mounted on a specially shaped baffle, the baffle being shaped to simulate the average directivity and radiation pattern of a human mouth. The artificial mouth may also incorporate a frequency compensation network or sound compression to achieve a certain sound frequency response, for simulating, testing, or calibrating the electro-acoustic characteristics of a microphone under actual operating conditions.
The test audio in this automated test case is audio recorded in advance or extracted from historical data, so it can be obtained directly, without a person speaking on site. Combined with playback through the artificial mouth, the played sound differs little from human pronunciation and can replace it, which ensures the accuracy of the testing process while reducing labor cost, allowing the automated test case to execute automatically and realizing an automated testing process.
In some embodiments, when the test audio has been recorded by the recording device, the artificial mouth can call the test audio from the recording device for playback. Related technical personnel can write an automated test script in advance; based on this script, the recording device can be called to retrieve the test audio and play it, as sketched below.
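A minimal playback sketch, assuming the artificial mouth is exposed to the operating system as an ordinary audio output device; the device name is an assumption.

```python
import soundfile as sf    # reads the pre-recorded test audio
import sounddevice as sd  # routes playback to a named output device

def play_on_artificial_mouth(wav_path: str, device: str = "Artificial Mouth") -> None:
    data, sample_rate = sf.read(wav_path, dtype="float32")
    sd.play(data, sample_rate, device=device)
    sd.wait()  # block until playback finishes so capture and playback stay aligned
```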
303. The electronic equipment collects the test audio played by the artificial mouth based on the voice interaction application to obtain a voice signal, wherein the content of the voice signal comprises the content of the test audio.
The device in which the voice interaction application is located can have a voice interaction function, and specifically, the device in which the voice interaction application is located can collect sounds in the surrounding environment.
The device where the voice interaction application is located may be a terminal; hereinafter it is referred to as the terminal where the voice interaction application is located. Considering that a terminal has an interface display function, which allows the actual performance of the voice interaction application to be tested more fully, a terminal can be used to install and test the voice interaction application.
For example, the terminal where the voice interaction application is located may be equipped with a sound collection device, such as a microphone or a microphone array, based on which the terminal can collect sound in the surrounding environment. The sound collection device may also be another device, which is not limited in the embodiments of the present application.
In some embodiments, the automated testing process may be performed in a quiet testing environment to avoid recognition errors caused by external noise interference. In the automatic testing process, the testing audio is played by the manual mouth, and the surrounding environment comprises the sound played by the manual mouth. The voice interaction application can collect the sound in the surrounding environment to obtain the voice signal, and the process of collecting the test audio played by the artificial mouth to obtain the voice signal is also realized.
In some embodiments, the placement positions of the artificial mouth and the device where the voice interaction application is located may be set for the testing process. Accordingly, in step 302, the artificial mouth may determine a playing volume according to its placement position and play the test audio at that volume. By correlating the artificial mouth's placement position with its playing volume, the decibel value of the voice signal that the voice interaction application collects from the played test audio is made large enough that the test audio is interactively processed rather than treated as noise.
Correspondingly, in step 303, the test audio played by the artificial mouth is collected based on the voice interaction application; after the voice signal is obtained, its decibel value can be determined, and in response to the decibel value being greater than a decibel threshold, the following step 304 is executed. If the decibel value is less than or equal to the decibel threshold, the voice signal is discarded: no subsequent processing is performed on it and no corresponding voice interaction function is executed.
In this embodiment, the voice interaction application may be provided with a decibel threshold corresponding to the interaction processing, and when the decibel value of the collected voice signal is greater than the decibel threshold, the voice signal is considered to be a signal for controlling the voice interaction application to perform voice interaction. If the decibel value is less than or equal to the decibel threshold, the voice signal is considered to be noise in the surrounding environment or a chat signal between users.
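A sketch of such a gate, using the RMS level of the captured buffer in dBFS; the threshold value is an assumption, the application does not fix one, and a deployed system might gate on dB SPL instead.

```python
import numpy as np

DB_THRESHOLD = -30.0  # assumed value, in dBFS

def is_interaction_speech(signal: np.ndarray) -> bool:
    # Compute the RMS level of the captured signal and compare it with the
    # decibel threshold; below-threshold buffers are discarded as noise/chat.
    rms = np.sqrt(np.mean(np.square(signal.astype(np.float64))))
    level_db = 20.0 * np.log10(max(rms, 1e-12))  # guard against log(0)
    return level_db > DB_THRESHOLD
```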
304. The electronic equipment acquires feedback information corresponding to the voice signal.
After the voice interaction application collects the voice signal, it can process the signal to identify the intent of the voice signal and determine the required feedback information based on that intent.
In some embodiments, the processing of the speech signal may include processes of speech recognition, intent recognition, and slot extraction. Since the voice signal is a signal in a voice form, the intention of the voice signal cannot be directly known from the voice signal, and the voice signal can be firstly subjected to voice recognition to obtain the text content of the voice signal.
Speech recognition is a technology by which a machine converts a speech signal into corresponding text or commands through recognition and understanding; in brief, it is the process of converting a speech signal into its text content. Once the text content of the speech signal has been obtained by speech recognition, the text content can be analyzed to determine what the speech signal intends the voice interaction application to do.
Specifically, the speech recognition process includes Voice Activity Detection (VAD), framing, feature extraction, feature matching, and the like. Framing refers to dividing the speech signal into segments and is implemented by a window function, such as a Hamming window or a Hanning window. The feature extraction step may be implemented by various algorithms, for example Mel Frequency Cepstral Coefficients (MFCC), which yield Mel-spectrum features. A feature is usually expressed in vector form, and the feature extraction process extracts a feature vector for the utterance. The feature matching step matches features to the corresponding phonemes and further matches the phonemes to characters; this can be realized by an acoustic model and a dictionary. For example, the features are converted into phonemes through the acoustic model, and the characters corresponding to the phonemes of the audio signal are determined according to the correspondence between phonemes and characters, thereby obtaining the sentence. A sketch of the framing step is given below.
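The following sketch illustrates the framing step under common (assumed) parameter choices of 25 ms frames and a 10 ms hop at a 16 kHz sampling rate, with a Hamming window applied per frame.

```python
import numpy as np

def frame_signal(signal: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    # Slice the speech signal into overlapping frames and apply a Hamming
    # window to each, producing an (n_frames, frame_len) array.
    if len(signal) < frame_len:  # pad short signals to one full frame
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * np.hamming(frame_len)

# Feature extraction would then run per frame; e.g., with librosa:
#   mfcc = librosa.feature.mfcc(y=signal, sr=16000, n_mfcc=13)
```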
The intent in intent recognition describes the purpose of the user's voice interaction with the machine. In the embodiments of the present application, the intent describes the purpose of the voice interaction between the user and the voice interaction application, that is, the instruction or information feedback the user expects from the voice interaction application. For example, intents include switching songs, searching for and playing a certain song, querying the weather, querying routes, querying the calendar, and so forth. Intent recognition means recognizing which intent the user's speech expresses.
Specifically, intent recognition may be implemented in any manner, such as comparison against a sentence database, recognition by an intent recognition model, or matching against seed sentences of candidate intents. The intent recognition model may be a classifier such as Bayes, XGBoost, or BERT, or a model of another type, or statistical probability analysis trained on features such as bag-of-words or n-grams (e.g., bigrams), which is not described in detail here. The embodiments of the present application do not limit the specific manner in which intent recognition is performed.
A slot in slot extraction can be understood as an attribute, and the key information as the attribute value corresponding to that attribute. In the embodiments of the present application, slots can be understood as the information that must be supplemented to turn a preliminary user intent into a specific user instruction. The slot involved in the speech uttered by the user may, for example, indicate the name of the song the user wants to search for, the content the user wants to obtain, or the subject of the question the user asks, such as "weather", "today", or "XXX (song title)".
Slot extraction refers to the process of obtaining the key information corresponding to a slot from a sentence, and may also be referred to as slot filling. Its purpose is to complete the information needed to translate the user's intent into a specific instruction. For example, if the speech recognition result is "play XXX", slot extraction aims to extract "XXX".
Specifically, after the intent is recognized, slot extraction can be performed on the speech recognition result based on the recognized intent. The slot extraction process can be realized in various ways, for example based on a recursive transition network, based on a long short-term memory network (LSTM) combined with a conditional random field (CRF), or based on a Deep Belief Network (DBN), a Support Vector Machine (SVM), or a bidirectional Recurrent Neural Network (RNN); the embodiments of the present application do not limit which way the slot extraction process adopts. A trivial rule-based stand-in, illustrating only the input/output contract, is sketched below.
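This stand-in is not the LSTM-CRF or DBN/SVM/RNN approaches named above, which are far more general; it only shows the contract of the step: text plus intent in, slot dictionary out.

```python
import re

def extract_slots(text: str, intent: str) -> dict:
    # Rule-based placeholder for slot filling, keyed on the recognized intent.
    if intent == "audio_play_control":
        match = re.match(r"play (?P<song>.+)", text)
        return {"song": match.group("song")} if match else {}
    if intent == "weather_query":
        return {"date": "today"} if "today" in text else {}
    return {}
```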
In some embodiments, the speech recognition may be executed by the terminal where the voice interaction application is located; the terminal then sends the text content obtained by speech recognition to the server corresponding to the voice interaction application, which executes the intent recognition and slot extraction steps and determines the feedback information based on the final recognition result and the slots. The server may then send the feedback information to the terminal, which executes the subsequent display step, that is, step 305.
The recognized intent differs across voice interaction functions, and so does the way feedback information is obtained. For example, in the weather query function, the identified intent may be "weather query" with slots such as "today" and "weather", and the voice interaction application can acquire today's weather information from the internet as the feedback information. In the route query function, the recognized intent may be "route query" with slots "position 1 (start point)" and "position 2 (end point)", and the application can acquire all routes from position 1 to position 2 from the internet as the feedback information. In the audio search and play control function, the identified intent may be "audio play control" with slot "XXX (song name)", and the application can obtain the audio file of "XXX" from the audio database as the feedback information. A minimal dispatch sketch follows.
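The sketch below illustrates such a dispatch; the intent names, slot keys, and handler outputs mirror the examples above but are otherwise assumptions.

```python
def get_feedback(intent: str, slots: dict) -> dict:
    # Map a recognized intent plus its extracted slots to feedback
    # information. Real handlers would query the internet or an audio
    # database; here they are stubbed as plain dictionaries.
    handlers = {
        "weather_query": lambda s: {"type": "weather", "date": s.get("date", "today")},
        "route_query": lambda s: {"type": "routes", "start": s["start"], "end": s["end"]},
        "audio_play_control": lambda s: {"type": "audio_file", "song": s.get("song")},
    }
    handler = handlers.get(intent)
    if handler is None:
        return {"type": "fallback", "text": "Sorry, I did not understand."}
    return handler(slots)
```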
305. And the electronic equipment displays the feedback information based on the terminal where the voice interaction application is located.
After the electronic equipment acquires the feedback information, the electronic equipment can interact with the user based on the feedback information.
During interaction, the terminal where the voice interaction application is located may display the feedback information in its current display interface, or may display it in the feedback interface reached after a jump. Whether an interface jump occurs may be decided by related technical personnel when setting the voice interaction function, or determined according to the feedback information.
For example, if the feedback information is weather information, the interface does not need to jump, and the weather information is displayed in the current display interface. If the feedback information is the searched audio file of a certain song, if the current display interface is not the playing interface, the interface can be jumped to the playing interface, the related information of the song is displayed in the playing interface, and the audio file can be played.
The foregoing steps 303 to 305 constitute the process of performing interactive processing, based on the voice interaction application, on the test audio played by the artificial mouth; the description above takes as its example only the case in which the voice interaction application collects the voice signal and then acquires and displays the feedback information.
306. The electronic device determines a test result based on the data generated by the interactive processing and the expected processing data in the automated test case.
Through the above steps, the voice interaction application performs interactive processing on the test audio played by the artificial mouth, and a great variety of data can be generated during this processing, such as the collected voice signal, its speech recognition result, the intent recognition result, the slot extraction result, the feedback information, and how the terminal where the voice interaction application is located displays that feedback information.
This data reflects how the voice interaction application responds to the test audio, so the data generated by the interactive processing can be compared with the expected processing data to determine whether the voice interaction function is normal. Specifically, in step 306, the data generated by the interactive processing is compared with the expected processing data in the automated test case, a comparison result is obtained, and the test result is determined from that comparison result. Two cases can arise, as follows.
Case one: in response to the comparison result indicating that the data generated by the interactive processing is consistent with the expected processing data, it is determined that the test result indicates that the automated test case passes the test.
In this first case, the comparison result indicates that the data generated by the interactive processing is consistent with the expected processing data. The voice interaction function of the voice interaction application therefore achieves the expected voice interaction effect, and it can be determined that the automated test case passes the test, that is, the voice interaction application normally implements the voice interaction function corresponding to that test case.
Case two: in response to the comparison result indicating that the data generated by the interactive processing is inconsistent with the expected processing data, it is determined that the test result indicates that the automated test case fails the test.
In this second case, the comparison result indicates that the data generated by the interactive processing is inconsistent with the expected processing data, so the interactive processing of the test audio clearly does not achieve the expected voice interaction effect and differs from the expected processing. The voice interaction function may therefore need to be modified to achieve the expected effect, and it can be determined that the automated test case fails the test, indicating that the voice interaction application cannot normally implement the corresponding voice interaction function; it can then be further determined whether that function needs to be optimized.
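A minimal sketch of this comparison in step 306 follows (the data layout is an assumption added for illustration; the application does not prescribe one). Any expected field whose generated value differs marks the case as failed:

```python
# Hypothetical step-306 comparison: data generated by the interactive
# processing versus the expected processing data of the automated test case.

def determine_test_result(generated: dict, expected: dict) -> dict:
    """Collect every expected field whose generated value differs."""
    mismatches = {
        field: {"generated": generated.get(field), "expected": value}
        for field, value in expected.items()
        if generated.get(field) != value
    }
    return {"passed": not mismatches, "mismatches": mismatches}

generated = {"asr_text": "what is the weather today",
             "intent": "weather_query",
             "feedback_type": "weather"}
expected = dict(generated)  # identical here, so the case passes

result = determine_test_result(generated, expected)
print("case passed" if result["passed"] else result["mismatches"])
```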
In some embodiments, the comparison between the data generated by the interactive processing and the expected processing data may cover various kinds of data: not only the final result of the interactive processing but also the processing result of each intermediate link, so that whether each link is implemented normally can be analyzed at a fine granularity. Specifically, the data generated by the interactive processing includes log information stored by the server corresponding to the voice interaction application, log information stored by the terminal where the voice interaction application is located, and the interface display information when the terminal feeds back the test audio.
During the interactive processing, the voice interaction application can record the data generated in each step as log information. Because different steps are executed by the server corresponding to the voice interaction application and by the terminal where the application is located, log information is generated on both devices, and the two logs may be the same or different: each device may record only the data it generates itself, or may also synchronously record the data generated by the other device. In addition, since the terminal displays the feedback information in step 305, the interface display information at feedback time can also be compared when determining the test result, to ensure that the interface display part of the voice interaction function is normal.
For example, in one specific example, the log information stored by the terminal where the voice interaction application is located may include the voice recognition result of the collected voice signal and the feedback information returned by the server, while the log information stored by the server may include the voice recognition result, the intent recognition result, the slot extraction result, and the feedback information. This is only an exemplary illustration; the embodiments of the present disclosure do not limit the specific log information.
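For illustration only (the field names below are assumptions), a fine-grained check of this kind could verify that the terminal log and the server log agree on the fields they share:

```python
# A small consistency check (illustrative assumption): fields present in
# both logs should agree; any disagreement points at the link that failed.

def check_log_consistency(terminal_log: dict, server_log: dict) -> list:
    """Return the shared fields on which the two logs disagree."""
    shared = terminal_log.keys() & server_log.keys()
    return [key for key in sorted(shared) if terminal_log[key] != server_log[key]]

terminal_log = {"asr_text": "play song XXX", "feedback": {"song": "XXX"}}
server_log = {"asr_text": "play song XXX", "intent": "audio_playback_control",
              "slots": {"song_name": "XXX"}, "feedback": {"song": "XXX"}}

print(check_log_consistency(terminal_log, server_log))  # [] -> consistent
```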
As for the interface display information, the comparison can be realized by acquiring the elements displayed in the interface. Specifically, the interface display information includes the elements displayed in the interface when the terminal feeds back the test audio, or the element changes corresponding to an interface jump during that feedback. The acquired elements are then compared with the elements expected to be displayed according to the expected processing data. Because the whole interface need not be acquired, the comparison of the interface display can be accomplished by comparing the displayed elements alone; the amount of acquired data is small and the comparison is more efficient.
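A sketch of this element-level comparison, under the assumption that displayed and expected elements are available as simple sets (an illustrative choice, not mandated by this application):

```python
# Element-level comparison of the interface display: only the displayed
# elements are compared, not the whole interface, keeping the data small.

def compare_elements(displayed: set, expected: set) -> dict:
    return {"missing": sorted(expected - displayed),
            "unexpected": sorted(displayed - expected)}

displayed = {"song_title", "play_button", "progress_bar"}
expected = {"song_title", "play_button", "progress_bar"}

diff = compare_elements(displayed, expected)
print("interface display OK" if not any(diff.values()) else diff)
```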
In other embodiments, the interface display information may instead be compared by performing screen capture processing on the displayed interface. The interface screenshot may include a capture of the display interface at feedback time and, in the case of an interface jump, captures of the display interfaces before and after the jump. The screenshot is then compared with the expected interface image in the expected processing data, so that the interface can be compared as a whole and the overall interface display can be tested.
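For this screenshot-based variant, a sketch using the Pillow imaging library (one possible tool, an assumption rather than a requirement of this application) could compare an interface screenshot against the expected interface image:

```python
# Whole-interface comparison via screenshots: two images match when their
# pixel-wise difference image is entirely black (getbbox() returns None).

from PIL import Image, ImageChops

def screenshots_match(actual_path: str, expected_path: str) -> bool:
    actual = Image.open(actual_path).convert("RGB")
    expected = Image.open(expected_path).convert("RGB")
    if actual.size != expected.size:
        return False
    return ImageChops.difference(actual, expected).getbbox() is None

# Demo with two synthetic images standing in for real interface captures.
Image.new("RGB", (4, 4), "white").save("actual.png")
Image.new("RGB", (4, 4), "white").save("expected.png")
print(screenshots_match("actual.png", "expected.png"))  # True
```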
In some embodiments, the test result may indicate that the automated test case passes the test or that it fails. When the test fails, a technician may need to determine what problems exist in the voice interaction function so as to optimize it. Specifically, in response to the test result indicating that the automated test case fails the test, screen capture processing may be performed on the interface displayed when the terminal feeds back the test audio to obtain a screenshot; the screenshot is then sent to a target account, and the holder of the target account determines, based on the screenshot, the errors contained in the voice interaction function of the voice interaction application.
The target account is the account of the technician. Capturing the interface display at the moment the interactive processing goes wrong, and sending that screenshot to the technician, allows the technician to know precisely which voice interaction function erred and to determine what the error is, so that corresponding corrective measures can be taken.
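A hypothetical sketch of this failure path follows; the screenshot and messaging facilities are stubbed out, and all names are assumptions added for illustration:

```python
# Assumed failure handling: on a failed case, capture the interface shown
# during feedback and send the screenshot to the technician's account.

def on_test_failure(case_id: str, capture_screen, send_to_account):
    screenshot_path = f"{case_id}_failure.png"
    capture_screen(screenshot_path)  # e.g. a device screenshot command
    send_to_account("target_account", screenshot_path,
                    note=f"Automated case {case_id} failed")

# Stubs standing in for real screenshot and messaging facilities.
def fake_capture(path): print(f"saved screenshot to {path}")
def fake_send(account, path, note): print(f"sent {path} to {account}: {note}")

on_test_failure("case_042", fake_capture, fake_send)
```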
The above process describes one automated test of the voice interaction function of the voice interaction application, framed as the execution of a single automated test case. In practice, one or more automated test cases may be used in a test run; each is executed in the same way, which is not repeated here.
It should be noted that, in addition to the voice interaction function, the voice interaction application may also provide other functions, such as those corresponding to basic operations like clicking, sliding, inputting, and searching. These functions can also be tested. Unlike the process above, the corresponding automated test case contains no test audio; it is executed directly on the terminal where the voice interaction application is located, and the test result is determined from the data generated during processing and the expected processing data.
The above testing process combines hardware, such as the artificial mouth and the recording device, with the voice interaction application software. Through this combination of software and hardware it realizes automated testing of the voice interaction application; it is simple to operate, easy to implement, covers most automated voice interaction tests, and greatly improves working efficiency.
In the embodiment of the present application, the test audio can be obtained directly and played through the artificial mouth in place of human speech, so that the voice interaction application is tested without relying on manual work: the test cases execute automatically, realizing an automated testing process. Moreover, the artificial mouth, as a special artificial sound source, can simulate the sound produced by a human mouth, so the voice signal collected by the voice interaction application is indistinguishable from one collected from human speech, and the accuracy of testing on this basis matches the accuracy of testing with human pronunciation. The automated testing process thus reduces labor costs and can greatly improve testing efficiency.
All of the optional technical solutions above may be combined in any manner to form optional embodiments of the present application, which are not described again here.
Fig. 4 is a schematic structural diagram of an automated testing apparatus for a voice interaction application according to an embodiment of the present application, and referring to fig. 4, the apparatus includes:
an obtaining module 401, configured to obtain an automatic test case, where the automatic test case includes a test audio and expected processing data;
a playing module 402, configured to play the test audio in the automated test case based on the artificial mouth;
a processing module 403, configured to perform interactive processing on the test audio played by the artificial mouth based on a voice interactive application;
a determining module 404, configured to determine a test result based on the data generated by the interactive processing and the expected processing data in the automated test case.
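To make the division of labor among these modules concrete, here is a minimal, illustrative Python sketch (not part of the disclosure; all names are assumptions) in which the modules of Fig. 4 are wired together and one test case is run end to end:

```python
# Illustrative orchestration of the modules of Fig. 4 (all names assumed).
# The case dict passed to run_case() stands in for the obtaining module
# 401; the three callables play the roles of modules 402-404.

class AutomatedTester:
    def __init__(self, play_audio, run_interaction, compare):
        self.play_audio = play_audio            # playing module 402
        self.run_interaction = run_interaction  # processing module 403
        self.compare = compare                  # determining module 404

    def run_case(self, case: dict) -> bool:
        self.play_audio(case["test_audio"])               # artificial mouth
        generated = self.run_interaction()                # interactive processing
        return self.compare(generated, case["expected"])  # test result

tester = AutomatedTester(
    play_audio=lambda audio: print(f"playing {audio} through artificial mouth"),
    run_interaction=lambda: {"intent": "weather_query"},
    compare=lambda got, want: got == want,
)
print(tester.run_case({"test_audio": "weather.wav",
                       "expected": {"intent": "weather_query"}}))  # True
```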
In some embodiments, the test audio acquisition process includes:
and acquiring the sound emitted by the user based on the recording equipment to obtain a test audio, wherein the content of the test audio adopts a sentence pattern or a keyword corresponding to the voice interaction function of the voice interaction application.
In some embodiments, the test audio acquisition process includes:
acquiring historical data of the voice interaction application;
and extracting historical audio when the user interacts with the voice interaction application from the historical data as test audio.
In some embodiments, the determination module 404 is configured to:
comparing the data generated by the interactive processing with the expected processing data in the automatic test case;
in response to the comparison result indicating that the data generated by the interactive processing is consistent with the expected processing data, determining that the test result indicates that the automated test case passes the test;
and in response to the comparison result indicating that the data generated by the interactive process is inconsistent with the expected process data, determining that the test result indicates that the automated test case test fails.
In some embodiments, the data generated by the interactive processing includes log information stored by a server corresponding to the voice interaction application, log information stored by a terminal where the voice interaction application is located, and interface display information when the terminal feeds back the test audio.
In some embodiments, the interface display information includes elements displayed in an interface when the terminal feeds back the test audio; or the interface display information comprises element changes corresponding to interface jumping when the terminal feeds back the test audio.
In some embodiments, the processing module 403 is configured to:
based on voice interaction application, collecting the test audio played by the artificial mouth to obtain a voice signal, wherein the content of the voice signal comprises the content of the test audio;
acquiring feedback information corresponding to the voice signal;
and displaying the feedback information based on the terminal where the voice interaction application is located.
In some embodiments, the processing module 403 is configured to perform any of:
displaying the feedback information in a current display interface of the terminal where the voice interaction application is located;
and displaying the feedback information in a feedback interface after the voice interaction application jumps in the terminal.
In some embodiments, the apparatus further comprises:
a screen capture module, configured to, in response to the test result indicating that the automated test case fails the test, perform screen capture processing on the interface displayed when the terminal feeds back the test audio to obtain a screenshot;
and a sending module, configured to send the screenshot to a target account, where the target account determines, based on the screenshot, the errors contained in the voice interaction function of the voice interaction application.
With the device provided in the embodiment of the present application, the test audio can be obtained directly and played through the artificial mouth in place of human speech, so that the voice interaction application is tested without relying on manual work: the test cases execute automatically, realizing an automated testing process. The artificial mouth, as a special artificial sound source, can simulate the sound produced by a human mouth, so the voice signal collected by the voice interaction application is indistinguishable from one collected from human speech, ensuring that the accuracy of testing on this basis matches the accuracy of testing with human pronunciation. The automated testing process reduces labor costs and can greatly improve testing efficiency.
It should be noted that the automated testing apparatus for a voice interaction application provided in the above embodiment is illustrated only in terms of the division of the functional modules above. In practical applications, these functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the automated testing apparatus and the automated testing method provided in the above embodiments belong to the same concept; the specific implementation process of the apparatus is detailed in the method embodiments and is not repeated here.
Fig. 5 is a schematic structural diagram of an electronic device 500 according to an embodiment of the present application. The electronic device 500 may vary considerably in configuration or performance, and includes one or more processors (CPUs) 501 and one or more memories 502, where the memory 502 stores at least one computer program that is loaded and executed by the processor 501 to implement the automated testing method for a voice interaction application provided by the above method embodiments. The electronic device further includes other components for implementing device functions, for example a wired or wireless network interface and an input/output interface, which are not described in detail here.
The electronic device in the above method embodiments may be implemented as a terminal. For example, Fig. 6 is a block diagram of a terminal according to an embodiment of the present application. The terminal 600 may be a portable mobile terminal such as a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 600 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 600 includes: a processor 601 and a memory 602.
The processor 601 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 601 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor: the main processor processes data in the awake state and is also called a CPU (Central Processing Unit); the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 601 may also include an AI (Artificial Intelligence) processor for handling computational operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 602 is used to store at least one instruction for execution by processor 601 to implement the automated testing method for voice interaction applications provided by method embodiments herein.
In some embodiments, the terminal 600 may further optionally include: a peripheral interface 603 and at least one peripheral. The processor 601, memory 602, and peripheral interface 603 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 603 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 604, a display 605, a camera assembly 606, an audio circuit 607, a positioning component 608, and a power supply 609.
The peripheral interface 603 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 601 and the memory 602. In some embodiments, the processor 601, memory 602, and peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 604 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 604 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 604 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 604 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 604 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 605 is a touch display, it can also capture touch signals on or above its surface; such a touch signal may be input to the processor 601 as a control signal for processing. The display 605 may then also provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display 605, disposed on the front panel of the terminal 600; in other embodiments, there may be at least two, disposed on different surfaces of the terminal 600 or in a folded design; in still other embodiments, the display 605 may be a flexible display disposed on a curved or folded surface of the terminal 600. The display 605 may even be arranged in a non-rectangular, irregular pattern, that is, an irregularly shaped screen. The display 605 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 606 is used to capture images or video. Optionally, camera assembly 606 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 606 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
Audio circuitry 607 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 601 for processing or inputting the electric signals to the radio frequency circuit 604 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 600. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuitry 607 may also include a headphone jack.
The positioning component 608 is used to determine the current geographic location of the terminal 600 to implement navigation or LBS (Location Based Service). The positioning component 608 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
Power supply 609 is used to provide power to the various components in terminal 600. The power supply 609 may be ac, dc, disposable or rechargeable. When the power supply 609 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal 600 also includes one or more sensors 610. The one or more sensors 610 include, but are not limited to: acceleration sensor 611, gyro sensor 612, pressure sensor 613, fingerprint sensor 614, optical sensor 615, and proximity sensor 616.
The acceleration sensor 611 may detect the magnitude of acceleration in three coordinate axes of the coordinate system established with the terminal 600. For example, the acceleration sensor 611 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 601 may control the display screen 605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 611. The acceleration sensor 611 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 612 may detect a body direction and a rotation angle of the terminal 600, and the gyro sensor 612 and the acceleration sensor 611 may cooperate to acquire a 3D motion of the user on the terminal 600. The processor 601 may implement the following functions according to the data collected by the gyro sensor 612: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 613 may be disposed on the side bezel of terminal 600 and/or underneath display screen 605. When the pressure sensor 613 is disposed on the side frame of the terminal 600, a user's holding signal of the terminal 600 can be detected, and the processor 601 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed at the lower layer of the display screen 605, the processor 601 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 605. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 614 is used for collecting a fingerprint of a user, and the processor 601 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 614, or the fingerprint sensor 614 identifies the identity of the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 601 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 614 may be disposed on the front, back, or side of the terminal 600. When a physical button or vendor Logo is provided on the terminal 600, the fingerprint sensor 614 may be integrated with the physical button or vendor Logo.
The optical sensor 615 is used to collect the ambient light intensity. In one embodiment, processor 601 may control the display brightness of display screen 605 based on the ambient light intensity collected by optical sensor 615. Specifically, when the ambient light intensity is high, the display brightness of the display screen 605 is increased; when the ambient light intensity is low, the display brightness of the display screen 605 is adjusted down. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
A proximity sensor 616, also known as a distance sensor, is typically disposed on the front panel of the terminal 600 and is used to collect the distance between the user and the front of the terminal 600. In one embodiment, when the proximity sensor 616 detects that this distance is gradually decreasing, the processor 601 controls the display 605 to switch from the screen-on state to the screen-off state; when it detects that the distance is gradually increasing, the processor 601 controls the display 605 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 6 is not intended to be limiting of terminal 600 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
The electronic device in the above method embodiments may also be implemented as a server. For example, Fig. 7 is a schematic structural diagram of a server 700 according to an embodiment of the present application. The server 700 may vary considerably in configuration or performance, and includes one or more processors (CPUs) 701 and one or more memories 702, where the memory 702 stores at least one computer program that is loaded and executed by the processor 701 to implement the automated testing method for a voice interaction application described in the above method embodiments. The server also has a wired or wireless network interface and an input/output interface to facilitate input and output, as well as other components for implementing device functions, which are not described here.
In an exemplary embodiment, a computer-readable storage medium, such as a memory including at least one computer program, executable by a processor, is also provided to perform the automated testing method of the voice interaction application in the above embodiments. For example, the computer readable storage medium is a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or a computer program is also provided, which comprises one or more program codes, which are stored in a computer-readable storage medium. The one or more program codes are read from the computer-readable storage medium by one or more processors of the electronic device, and the one or more processors execute the one or more program codes, so that the electronic device executes the automated testing method of the voice interaction application.
In some embodiments, the computer program according to the embodiments of the present application may be deployed to be executed on one electronic device, on a plurality of electronic devices located at one site, or on a plurality of electronic devices distributed over a plurality of sites and interconnected by a communication network; the electronic devices distributed over a plurality of sites and interconnected by a communication network may constitute a blockchain system.
Those skilled in the art will understand that all or part of the steps for implementing the above embodiments may be completed by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
The above description is intended only to be an alternative embodiment of the present application, and not to limit the present application, and any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (14)

1. An automated testing method for voice interactive applications, the method comprising:
acquiring an automatic test case, wherein the automatic test case comprises test audio and expected processing data;
playing the test audio in the automatic test case based on an artificial mouth;
based on voice interaction application, carrying out interaction processing on the test audio played by the artificial mouth;
and determining a test result based on the data generated by the interactive processing and the expected processing data in the automatic test case.
2. The method of claim 1, wherein the test audio acquisition process comprises:
and acquiring the sound emitted by the user based on recording equipment to obtain test audio, wherein the content of the test audio adopts the sentence patterns or the keywords corresponding to the voice interaction function of the voice interaction application.
3. The method of claim 1, wherein the test audio acquisition process comprises:
acquiring historical data of the voice interaction application;
and extracting historical audio when the user interacts with the voice interaction application from the historical data to be used as test audio.
4. The method of claim 1, wherein determining a test result based on the data generated by the interactive process and expected process data in the automated test case comprises:
comparing the data generated by the interactive processing with expected processing data in the automatic test case;
in response to the comparison result indicating that the data generated by the interactive processing is consistent with the expected processing data, determining that the test result indicates that the automated test case passes the test;
and in response to the comparison result indicating that the data generated by the interactive processing is inconsistent with the expected processing data, determining that the test result indicates that the automated test case test fails.
5. The method according to claim 1, wherein the data generated by the interactive processing includes log information stored by a server corresponding to the voice interaction application, log information stored by a terminal where the voice interaction application is located, and interface display information when the terminal feeds back the test audio.
6. The method of claim 5, wherein the interface display information comprises elements displayed in an interface when the terminal feeds back the test audio; or the interface display information comprises element changes corresponding to interface jumping when the terminal feeds back the test audio.
7. The method of claim 1, wherein the interactive processing of the test audio played by the artificial mouth based on the voice interaction application comprises:
based on voice interaction application, collecting the test audio played by the artificial mouth to obtain a voice signal, wherein the content of the voice signal comprises the content of the test audio;
acquiring feedback information corresponding to the voice signal;
and displaying the feedback information based on the terminal where the voice interaction application is located.
8. The method according to claim 7, wherein the displaying the feedback information based on the terminal where the voice interaction application is located comprises any one of:
displaying the feedback information in a current display interface of the terminal where the voice interaction application is located;
and displaying the feedback information in a feedback interface after the voice interaction application jumps in the terminal.
9. The method of claim 1, further comprising:
in response to the test result indicating that the automated test case fails the test, performing screen capture processing on an interface displayed when the terminal feeds back the test audio to obtain a screenshot;
and sending the screenshot to a target account, the target account determining, based on the screenshot, the errors contained in the voice interaction function of the voice interaction application.
10. An apparatus for automated testing of voice interactive applications, the apparatus comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring an automatic test case, and the automatic test case comprises test audio and expected processing data;
the playing module is used for playing the test audio in the automatic test case based on an artificial mouth;
the processing module is used for carrying out interactive processing on the test audio played by the artificial mouth based on the voice interactive application;
and the determining module is used for determining a test result based on the data generated by the interactive processing and the expected processing data in the automatic test case.
11. An electronic device, comprising one or more processors and one or more memories, wherein at least one computer program is stored in the one or more memories, and loaded and executed by the one or more processors to implement the automated testing method for the voice interaction application according to any one of claims 1 to 9.
12. An automatic testing system for a voice interaction application, characterized by comprising a terminal where the voice interaction application is located, an artificial mouth, and a processing device;
the processing equipment is used for the automatic test case to comprise test audio and expected processing data; determining a test result based on the data generated by the interactive processing and expected processing data in the automatic test case;
the manual mouth is used for playing the test audio in the automatic test case acquired by the processing equipment;
and the terminal where the voice interaction application is located is used for carrying out interaction processing on the test audio played by the artificial mouth.
13. The system of claim 12, further comprising a recording device for capturing sounds made by a user to obtain the test audio.
14. A computer-readable storage medium, in which at least one computer program is stored, which is loaded and executed by a processor to implement a method for automated testing of a voice interaction application according to any one of claims 1 to 9.
CN202110624566.1A 2021-06-04 2021-06-04 Automatic testing method, device, equipment and medium for voice interaction application Pending CN113220590A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110624566.1A CN113220590A (en) 2021-06-04 2021-06-04 Automatic testing method, device, equipment and medium for voice interaction application

Publications (1)

Publication Number Publication Date
CN113220590A true CN113220590A (en) 2021-08-06

Family

ID=77082961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110624566.1A Pending CN113220590A (en) 2021-06-04 2021-06-04 Automatic testing method, device, equipment and medium for voice interaction application

Country Status (1)

Country Link
CN (1) CN113220590A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113782003A (en) * 2021-09-14 2021-12-10 上汽通用五菱汽车股份有限公司 Test method and system
CN114999494A (en) * 2022-05-20 2022-09-02 青岛海尔科技有限公司 Voice interaction function testing method and device, storage medium and electronic device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107516510A (en) * 2017-07-05 2017-12-26 百度在线网络技术(北京)有限公司 A kind of smart machine automated voice method of testing and device
CN108228468A (en) * 2018-02-12 2018-06-29 腾讯科技(深圳)有限公司 A kind of test method, device, test equipment and storage medium
CN109003602A (en) * 2018-09-10 2018-12-14 百度在线网络技术(北京)有限公司 Test method, device, equipment and the computer-readable medium of speech production
US20200007680A1 (en) * 2018-06-29 2020-01-02 At&T Intellectual Property I, L.P. Interactive voice response system design, development and testing tool
CN110689903A (en) * 2019-09-24 2020-01-14 百度在线网络技术(北京)有限公司 Method, device, equipment and medium for evaluating intelligent sound box
CN111933108A (en) * 2020-09-25 2020-11-13 蘑菇车联信息科技有限公司 Automatic testing method for intelligent voice interaction system of intelligent network terminal
CN112799901A (en) * 2021-04-13 2021-05-14 智道网联科技(北京)有限公司 Automatic testing method and device for voice interaction application program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Derboven, Jan; Huyghe, Jonathan; De Grooff, Dirk: "Designing voice interaction for people with physical and speech impairments", Association for Computing Machinery, 26 October 2014 (2014-10-26), page 217, XP059135099, DOI: 10.1145/2639189.2639252 *
Ren Lebing; Zhou Duan: "Design and Implementation of Automated Testing for a Voice Interaction System" (语音交互系统自动化测试的设计与实现), Electronic Science and Technology (电子科技), no. 08, 15 August 2007 (2007-08-15) *


Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination