CN115223545A - Voice interaction test method, device, system, equipment and storage medium - Google Patents


Info

Publication number
CN115223545A
Authority
CN
China
Prior art keywords
information
voice interaction
voice
terminal
interaction
Prior art date
Legal status
Pending
Application number
CN202210717430.XA
Other languages
Chinese (zh)
Inventor
杨诗鹏
刘露平
刘巍
车婷婷
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210717430.XA priority Critical patent/CN115223545A/en
Publication of CN115223545A publication Critical patent/CN115223545A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/01 Assessment or evaluation of speech recognition systems
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The disclosure provides a voice interaction testing method, apparatus, system, device and storage medium, relating to the field of computer technology and in particular to voice technology, artificial intelligence, natural language processing and deep learning. The specific implementation scheme is as follows: perform voice interaction with a first terminal according to determined interaction scene information, where the interaction scene information includes voice interaction mode information, noise information and corpus information; acquire, at a preset sampling frequency, first CPU information corresponding to the first terminal during the voice interaction; and generate a test result of the voice interaction function of the first terminal according to the first CPU information. With this scheme, the CPU information corresponding to the voice interaction function in a specific voice interaction scenario can be obtained, and a test result of the terminal's voice interaction function can be derived from that CPU information.

Description

Voice interaction test method, device, system, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technology, and more particularly, to the field of speech technology, artificial intelligence, natural language processing, and deep learning.
Background
With the rapid development of artificial intelligence technology and breakthroughs in its core techniques, more and more intelligent terminal devices, such as smart-home appliances and mobile phones, carry voice interaction technology, providing users with more convenient and efficient interaction modes. The Central Processing Unit (CPU) usage of voice interaction software has a great influence on the operating stability of the intelligent terminal device and on the user's interaction experience: if the voice interaction software occupies too much CPU, it may stop responding, produce recognition errors, and so on.
Disclosure of Invention
The present disclosure provides a method, apparatus, system, device and storage medium for voice interaction testing.
According to an aspect of the present disclosure, there is provided a method of voice interaction testing, including:
performing voice interaction with a first terminal according to determined interaction scene information, where the interaction scene information includes voice interaction mode information, noise information and corpus information;
acquiring, at a preset sampling frequency, first CPU information corresponding to the first terminal during the voice interaction; and
generating a test result of the voice interaction function of the first terminal according to the first CPU information.
According to another aspect of the present disclosure, there is provided an apparatus for voice interaction testing, including:
a first interaction module configured to perform voice interaction with a first terminal according to determined interaction scene information, where the interaction scene information includes voice interaction mode information, noise information and corpus information;
a first acquisition module configured to acquire, at a preset sampling frequency, first CPU information corresponding to the first terminal during the voice interaction; and
a first generation module configured to generate a test result of the voice interaction function of the first terminal according to the first CPU information.
According to another aspect of the present disclosure, there is provided a system for voice interaction testing, comprising:
a control terminal configured to control a first voice playing device and the first terminal to perform voice interaction according to determined interaction scene information, acquire, at a preset sampling frequency, first CPU information corresponding to the first terminal during the voice interaction, and generate a test result of the voice interaction function of the first terminal according to the first CPU information, where the interaction scene information includes voice interaction mode information, noise information and corpus information;
the first voice playing device, configured to play interaction voice according to the received interaction scene information; and
the first terminal, configured to perform voice interaction with the first voice playing device according to the received interaction scene information and the interaction voice.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present disclosure.
With the scheme of the present disclosure, the CPU information corresponding to the voice interaction function in a specific voice interaction scenario can be obtained, and a test result of the terminal's voice interaction function can be derived from that CPU information.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow diagram illustrating a method of voice interaction testing in accordance with an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an application scenario of a method of voice interaction testing according to an embodiment of the present disclosure;
FIG. 3 is a flow diagram of a method of voice interaction testing according to another embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an apparatus for voice interaction testing according to an embodiment of the present disclosure;
FIG. 5 is a schematic block diagram of a system for voice interaction testing in accordance with an embodiment of the present disclosure;
FIG. 6 is a block diagram of an electronic device for implementing a method of voice interaction testing of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
According to an aspect of the present disclosure, as shown in fig. 1, a method for testing voice interaction is provided, which may include:
s101: and performing voice interaction with the first terminal according to the determined interaction scene information. Wherein, the interactive scene information comprises: voice interaction mode information, noise information, and corpus information.
S102: and acquiring first CPU information corresponding to the first terminal in the voice interaction process according to the preset sampling frequency.
S103: and generating a test result of the voice interaction function of the first terminal according to the first CPU information.
According to the embodiments of the present disclosure, it should be noted that:
the interactive scene information may be understood as voice interactive information that needs to be utilized when performing voice interaction with the first terminal.
The first terminal can be understood as any intelligent terminal device capable of realizing a voice interaction function, for example a smartphone, smart watch, smart glasses, smart speaker, computer, robot or in-vehicle unit, without limitation, as long as it has voice interaction software installed or carries a voice interaction module.
Performing voice interaction with the first terminal according to the determined interaction scene information can be understood in two ways: the execution subject of the voice interaction testing method interacts with the first terminal directly according to the interaction scene information, or it interacts with the first terminal indirectly, using another voice playing device, according to the interaction scene information. The execution subject may be a computer, a server, a distributed server cluster, and so on, without limitation.
The voice interaction mode information may include, but is not limited to: an active mode, a response mode, a continuous interaction mode, or a combined mode. The active mode can be understood as a mode that wakes up or activates the voice interaction function of the first terminal with a specific wake-up word. The response mode can be understood as a mode that makes the first terminal perform at least one round of voice interaction. The continuous interaction mode can be understood as a mode that makes the first terminal perform several rounds of voice interaction in succession. The combined mode can be understood as including any two or all of the active mode, the response mode and the continuous interaction mode.
The corpus information can be understood as the content of the voice (a query sentence or a wake-up word sentence) to be played to the first terminal during voice interaction. The corpus information is used by the first terminal for voice recognition, and the first terminal can feed back reply content related to the corpus information based on the recognition result.
The noise information can be understood as describing the noise environment in which the first terminal is located during voice interaction; when noise is present, the voice received by the first terminal may include noise. The noise information may indicate a noise-free environment, an internal noise environment, or an external noise environment. A noise-free environment is a quiet environment without noise, i.e. the voice received by the first terminal contains only the voice corresponding to the corpus information. An internal noise environment is one in which the first terminal itself generates noise, for example by playing music, video or voice. An external noise environment is one in which the noise around the first terminal is generated by sound sources other than the first terminal.
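As a purely hypothetical illustration, the three components of the interaction scene information described above might be represented as follows; all type and field names are invented for the sketch and do not appear in the disclosure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set

class InteractionMode(Enum):
    ACTIVE = "active"          # wake up via a specific wake-up word
    RESPONSE = "response"      # at least one query/reply round
    CONTINUOUS = "continuous"  # several rounds in succession

class NoiseEnvironment(Enum):
    NONE = "noise-free"        # only the corpus voice reaches the terminal
    INTERNAL = "internal"      # the terminal itself plays music/video/voice
    EXTERNAL = "external"      # another sound source near the terminal

@dataclass
class InteractionScene:
    modes: List[InteractionMode]  # one mode, or several for a combined mode
    noise: Set[NoiseEnvironment]  # internal and external noise may coexist
    corpus: List[str]             # query or wake-up word sentences to play
```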
The first CPU information corresponding to the first terminal during voice interaction can be understood as the data generated about the CPU usage of the first terminal during voice interaction; this data changes in real time as the interaction proceeds.
The first CPU information may include, but is not limited to, one or more of: the maximum CPU occupancy, the minimum CPU occupancy, the average CPU occupancy, the CPU occupancy growth, and the CPU occupancy trend.
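A minimal sketch of how the listed metrics could be computed from a series of CPU occupancy samples; the disclosure only names the metrics, so the exact definitions of "growth" and "trend" here are assumptions.

```python
def summarize_cpu_samples(samples):
    """Summarize CPU occupancy percentages sampled during voice interaction.

    The "growth" and "trend" formulas are assumptions; the disclosure
    names these metrics without fixing their definitions.
    """
    avg = sum(samples) / len(samples)
    return {
        "max": max(samples),                 # maximum CPU occupancy
        "min": min(samples),                 # minimum CPU occupancy
        "avg": avg,                          # average CPU occupancy
        "growth": samples[-1] - samples[0],  # occupancy increase over the run
        "trend": [b - a for a, b in zip(samples, samples[1:])],  # change per step
    }
```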
The test result of the voice interaction function of the first terminal can be understood as a test result for the program running in the voice interaction software installed on, or the voice interaction module carried by, the first terminal, obtained from how much CPU the voice interaction function occupies during voice interaction.
The first terminal realizes the voice interaction function through installed voice interaction software or a carried voice interaction module. After the voice interaction function is started, the voice interaction software or module begins to run and occupies part of the CPU and/or memory of the first terminal. The proportion of CPU and/or memory occupied differs greatly across noise environments and voice interaction modes, so comprehensively and objectively evaluating the CPU performance of the first terminal's voice interaction software or module is a difficult problem. With the scheme of the embodiments of the present disclosure, the CPU information corresponding to the voice interaction function in a specific voice interaction scenario can be obtained, and a test result of the terminal's voice interaction function can be derived from that CPU information. From the test result, it can be accurately determined whether the program running in the installed voice interaction software or carried voice interaction module needs to be optimized. For the interaction function test of the first terminal, the embodiments of the present disclosure can run automated voice interaction tests in the various designed voice interaction modes, achieving full coverage of voice interaction scenarios. The CPU information of the first terminal is collected automatically during the voice interaction, and the corresponding test result is generated automatically from it, improving the efficiency and accuracy of the voice interaction test and ensuring the objectivity and reliability of the test result.
Because the method of the embodiments of the present disclosure runs automatically, it also reduces the labor cost of testing: the voice interaction test, the collection of CPU performance data, and the analysis and display of the test result are all automated.
In one embodiment, the method of voice interaction testing of the embodiments of the present disclosure includes steps S101 to S103, where step S101, performing voice interaction with the first terminal according to the determined interaction scene information (the interaction scene information including voice interaction mode information, noise information and corpus information), may include:
determining the voice interaction mode information, noise information and corpus information in the interaction scene information;
when the noise information indicates a noise-free environment, sending the voice interaction mode information and the corpus information to a first voice playing device; and
sending the voice interaction mode information to the first terminal, so that the first voice playing device and the first terminal perform voice interaction.
According to the embodiments of the present disclosure, it should be noted that:
the voice interaction mode information sent to the first voice playing device may include only one voice interaction mode, for example, any one of an active mode, a response mode, or a continuous interaction mode. The voice interaction mode information sent to the first voice playing device may also include a plurality of voice interaction modes, for example, any two or all of an active mode, a response mode and a persistent interaction mode.
The specific corpus content, corpus duration, etc. of the corpus information sent to the first voice playing device may be selected and adjusted according to the test requirement, and no specific limitation is made here.
The first voice playing device may be any device capable of playing voice, such as a sound box, a mobile terminal, a speaker, and the like.
With this scheme, a voice interaction test can be performed on the first terminal in a noise-free environment, and the CPU information generated by the first terminal during voice interaction in that environment can be acquired accurately. The CPU performance of the first terminal's voice interaction software or voice interaction module in a noise-free environment can then be determined based on this CPU information.
In one embodiment, the method of voice interaction testing of the embodiments of the present disclosure includes steps S101 to S103, where step S101, performing voice interaction with the first terminal according to the determined interaction scene information (the interaction scene information including voice interaction mode information, noise information and corpus information), may include:
determining the voice interaction mode information, noise information and corpus information in the interaction scene information;
when the noise information indicates an internal noise environment, sending the voice interaction mode information and the corpus information to the first voice playing device; and
sending the voice interaction mode information and the noise information to the first terminal, so that the first voice playing device and the first terminal perform voice interaction in the internal noise environment.
According to the embodiments of the present disclosure, it should be noted that:
the voice interaction mode information sent to the first voice playing device may include only one voice interaction mode, for example, any one of an active mode, a response mode, or a continuous interaction mode. The voice interaction mode information sent to the first voice playing device may also include a plurality of voice interaction modes, for example, any two or all of an active mode, a response mode and a persistent interaction mode.
The specific corpus content, corpus duration, etc. of the corpus information sent to the first voice playing device may be selected and adjusted according to the test requirement, and no specific limitation is made here.
The first voice playing device may be any device capable of playing voice, such as a sound box, a mobile terminal, a speaker, and the like.
With this scheme, a voice interaction test can be performed on the first terminal in an internal noise environment, and the CPU information generated by the first terminal during voice interaction in that environment can be acquired accurately. The CPU performance of the first terminal's voice interaction software or voice interaction module in an internal noise environment can then be determined based on this CPU information.
In one embodiment, the method of voice interaction testing of the embodiments of the present disclosure includes steps S101 to S103, where step S101, performing voice interaction with the first terminal according to the determined interaction scene information (the interaction scene information including voice interaction mode information, noise information and corpus information), may include:
determining the voice interaction mode information, noise information and corpus information in the interaction scene information;
when the noise information indicates an external noise environment, sending the voice interaction mode information and the corpus information to the first voice playing device;
sending the voice interaction mode information to the first terminal; and
sending the noise information to a second voice playing device, so that the first voice playing device and the first terminal perform voice interaction in the external noise environment.
According to the embodiments of the present disclosure, it should be noted that:
the voice interaction mode information sent to the first voice playing device may include only one voice interaction mode, for example, any one of an active mode, a response mode, or a persistent interaction mode. The voice interaction mode information sent to the first voice playing device may also include a plurality of voice interaction modes, for example, any two or all of an active mode, a response mode and a persistent interaction mode.
The specific corpus content, corpus duration, etc. of the corpus information sent to the first voice playing device may be selected and adjusted according to the test requirement, and no specific limitation is made here.
The first voice playing device and the second voice playing device may be any devices capable of playing voice, such as a sound box, a mobile terminal, a speaker, and the like. The first voice playing device and the second voice playing device may be the same voice playing device in the embodiment of the present disclosure.
With this scheme, a voice interaction test can be performed on the first terminal in an external noise environment, and the CPU information generated by the first terminal during voice interaction in that environment can be acquired accurately. The CPU performance of the first terminal's voice interaction software or voice interaction module in an external noise environment can then be determined based on this CPU information.
In one embodiment, the method of voice interaction testing of the embodiments of the present disclosure includes steps S101 to S103, where step S101, performing voice interaction with the first terminal according to the determined interaction scene information (the interaction scene information including voice interaction mode information, noise information and corpus information), may include:
determining the voice interaction mode information, noise information and corpus information in the interaction scene information;
when the noise information includes both an internal noise environment and an external noise environment, sending the voice interaction mode information and the corpus information to the first voice playing device;
sending the voice interaction mode information and the noise information (internal noise environment) to the first terminal; and
sending the noise information (external noise environment) to the second voice playing device,
so that the first voice playing device and the first terminal perform voice interaction in both the internal noise environment and the external noise environment.
With this scheme, a voice interaction test can be performed on the first terminal in an environment with both external and internal noise, and the CPU information generated by the first terminal during voice interaction in that environment can be acquired accurately. The CPU performance of the first terminal's voice interaction software or voice interaction module in an environment with both external and internal noise can then be determined based on this CPU information.
In one example, as shown in fig. 2, the steps of the method of voice interaction testing of the embodiments of the present disclosure may be performed by a computer acting as the execution subject. The computer can test multiple terminals at the same time: it controls the first voice playing device to play the interaction voice, controls the second voice playing device to play external noise simulating the external noise environment, and can also control the terminal itself to play internal noise simulating the internal noise environment.
In one embodiment, the method of voice interaction testing of the embodiments of the present disclosure includes steps S101 to S103, where step S102, acquiring, at a preset sampling frequency, the first CPU information corresponding to the first terminal during the voice interaction, may include:
determining the corpus duration according to the corpus information;
determining the preset sampling frequency from the corpus duration based on the Nyquist first criterion; and
acquiring, at the preset sampling frequency, the first CPU information corresponding to the first terminal during the voice interaction.
With this scheme, the sampling frequency determined by the Nyquist first criterion ensures that the sampling points fully reflect the CPU performance of the program running in the first terminal's voice interaction software or voice interaction module while it works, effectively verifying whether its computing load is reasonable and what its maximum bearing capacity is. Moreover, the CPU information sampling of the embodiments of the present disclosure has small fluctuation and little sampling-point redundancy, effectively covers the working period of the program, and reflects the CPU performance at work well.
In one example, the preset sampling frequency is defined as follows: if the corpus duration is l and the sampling interval of the CPU performance data is t, then t is less than or equal to l/4.
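The rule above translates directly into code. This helper is hypothetical, for illustration only; it returns the largest admissible sampling interval for a given corpus duration, which guarantees at least four CPU samples per utterance.

```python
def max_sampling_interval(corpus_duration_s: float) -> float:
    """Largest CPU sampling interval t satisfying t <= l/4 for corpus
    duration l, so each utterance is covered by at least four samples."""
    if corpus_duration_s <= 0:
        raise ValueError("corpus duration must be positive")
    return corpus_duration_s / 4.0
```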
In one embodiment, the method of voice interaction testing of the embodiments of the present disclosure includes steps S101 to S103, and may further include:
acquiring memory information and/or file handle information corresponding to the first terminal during the voice interaction; and
step S103, generating a test result of the voice interaction function of the first terminal according to the first CPU information, may further include:
generating the test result of the voice interaction function of the first terminal according to the first CPU information together with the memory information and/or the file handle information.
According to the embodiments of the present disclosure, it should be noted that:
the memory information includes: Dalvik memory occupancy, native memory occupancy, and total memory occupancy.
The file handle information may be understood as the number of acquired file handles.
According to the scheme of the embodiment of the disclosure, the memory information corresponding to the voice interaction function in a specific voice interaction scene can be obtained, so that the memory performance of the voice interaction software of the first terminal or of the carried voice interaction module can be determined based on the memory information. During the voice interaction test, file handle information is obtained, and it can be determined whether the number of file handles fluctuates within an expected range. If the number exceeds the expected fluctuation range, it can be determined from the file handle information that the file handle performance index of the voice interaction software of the first terminal or of the carried voice interaction module does not reach the standard, indicating an error in the processing logic of that software or module.
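A minimal sketch of the file-handle fluctuation check described above (the function name and range representation are assumptions; the patent does not specify the check's implementation):

```python
def out_of_range_samples(handle_counts, expected_min, expected_max):
    """Return indices of sampled file-handle counts that fall outside
    the expected fluctuation range.

    A non-empty result suggests the handle-management logic of the
    software under test leaks handles or releases too many.
    """
    return [i for i, n in enumerate(handle_counts)
            if not expected_min <= n <= expected_max]
```

Any returned index pinpoints the sampling moment at which the anomaly appeared, which helps correlate it with the interaction mode active at that time.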
In an example, when at least two of the active mode, the response mode, and the persistent interaction mode are tested simultaneously, the first CPU information, the memory information, and the file handle information of the first terminal during the execution of each mode may be respectively obtained according to threads of different modes.
In an embodiment, the method for testing voice interaction according to the embodiments of the present disclosure includes steps S101 to S103, and may further include:
and performing voice interaction with the second terminal according to the determined interaction scene information.
And acquiring second CPU information corresponding to the second terminal in the voice interaction process according to the preset sampling frequency.
And generating a comparison test result according to the first CPU information and the second CPU information.
According to the embodiments of the present disclosure, it should be noted that:
the voice interaction software (or the carried voice interaction module) of the second terminal and that of the first terminal may be different versions; for example, the second terminal's is a reference version, while the first terminal's is the upgrade version to be tested.
According to the scheme of the embodiment of the disclosure, the CPU information corresponding to the voice interaction functions of different terminals in a specific voice interaction scene can be obtained, and by comparing the voice interaction function versions of the different terminals it can be determined more intuitively whether the CPU performance of the voice interaction software of the first terminal, or of the carried voice interaction module, meets the design requirements.
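One way to sketch the test-version vs. baseline-version comparison is shown below; it is illustrative only, since the patent does not specify which statistics are compared:

```python
from statistics import mean

def compare_versions(test_cpu, baseline_cpu):
    """Compare average CPU usage of the version under test (first
    terminal) against the baseline version (second terminal)."""
    t_avg, b_avg = mean(test_cpu), mean(baseline_cpu)
    return {
        "test_avg": t_avg,
        "baseline_avg": b_avg,
        "delta": t_avg - b_avg,  # positive => regression vs. baseline
    }
```

The same shape of comparison extends naturally to memory occupancy and file-handle counts.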
In an example, the method for testing voice interaction according to the embodiment of the present disclosure includes steps S101 to S103, and may further include:
and acquiring memory information and/or file handle information corresponding to the second terminal in the voice interaction process.
And generating a comparison test result according to the first CPU information, the memory information and/or the file handle information of the first terminal and the second CPU information, the memory information and/or the file handle information of the second terminal.
According to the scheme of the embodiment of the disclosure, the CPU information, memory information and file handle information corresponding to the voice interaction functions of different terminals in a specific voice interaction scene can be obtained, and by comparing the voice interaction function versions of the different terminals it can be determined more intuitively whether the performance of the voice interaction software of the first terminal, or of the carried voice interaction module, meets the design requirements.
In an example, when at least two of the active mode, the response mode, and the persistent interaction mode are tested simultaneously, second CPU information, memory information, and file handle information of the second terminal during the execution of each mode may be acquired according to threads of different modes.
In one example, as shown in fig. 3, the method for testing voice interaction of the embodiment of the present disclosure may be implemented by a combination of a plurality of functional modules. Specifically, these modules include:
and the test presetting module is used for setting versions to be tested, and if a compared baseline version exists, two versions can be simultaneously set for simultaneous testing. The module also performs setting of a test scenario (noise information) and setting of an interaction mode (voice interaction mode). Based on the implementation principle of voice interaction software, under a quiet scene (a noiseless environment) and an internal noise (an internal noise environment) plus external noise (an external noise environment) scene, the internal operation logic of the voice interaction software is obviously different, so that the test scene setting comprises the quiet scene and the internal noise plus external noise scene, the internal noise plus external noise scene is set, an execution main body executing a test scheme sends an internal noise playing command to an intelligent terminal device to be tested (a first terminal) to enable the intelligent terminal device to play preset internal noise corpora, and in addition, the execution main body transmits audio to a loudspeaker (a second voice playing device) to play the preset external noise corpora; and setting a quiet scene, and not executing the command of playing the internal noise and the external noise. The interaction mode of general voice interaction software comprises the following steps: activate, respond, continue to interact. In the activation mode, the voice interaction software only responds to specific words; in the response mode, the voice interaction software can recognize a voice request and respond to the request once; in the continuous interactive mode, the voice interactive software can continuously recognize the voice request and respond to the request in turn. 
The voice interaction software has different CPU, memory, and other performance indexes under different interaction modes. Various interaction modes can be set quickly through the test presetting module, and combinations of different interaction modes can be set to cover the interaction modes of the software under test. In addition, the test duration of one round can be set through this module.
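The interaction modes and their combinations described above could be modeled, purely for illustration, as a flag enumeration (the class and member names are assumptions, not names from the patent):

```python
from enum import Flag, auto

class InteractionMode(Flag):
    ACTIVATE = auto()    # responds only to specific wake words
    RESPOND = auto()     # one recognize-and-respond cycle
    CONTINUOUS = auto()  # keeps recognizing and responding in turn

# A combined mode covers at least two of the basic modes, as in the
# combination testing described above.
combo = InteractionMode.ACTIVATE | InteractionMode.RESPOND
```

Representing modes as flags makes it easy to enumerate every combination when planning rounds that cover all interaction modes of the software under test.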
And the test execution module is used for first executing the preset scenario, which includes sending the external-noise audio signal to the loudspeaker (the second voice playing device) and sending the internal-noise corpus playing command to the terminal device. The interaction mode is then executed: the execution module sends the corresponding interaction-mode broadcast command to the intelligent terminal device to be tested (the first terminal) so that the voice interaction software enters the working state of that mode. The execution module then sends a playing command to the voice-request playing device (the first voice playing device), and the request corpus, about 12 s long, is played cyclically and continuously. According to the corpus duration, the host uniformly samples the CPU performance data of the test-version intelligent device and of the baseline-version intelligent device at 3 s intervals, so that the sampling result can completely reflect the changes of the CPU performance data during corpus playback. The sampling results are transmitted back to the execution main body for storage. After the test duration for one round of scenario and interaction mode ends, the next round of testing starts automatically.
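The uniform-sampling step of the execution module might look like the following sketch; the `read_cpu` callback is a stand-in for whatever mechanism the host uses to query CPU data from the device, and is an assumption of this example:

```python
import time

def sample_uniformly(read_cpu, corpus_duration_s, interval_s):
    """Collect CPU samples at a fixed interval for the duration of one
    corpus playback, e.g. every 3 s across a 12 s corpus."""
    samples = []
    for _ in range(round(corpus_duration_s / interval_s)):
        samples.append(read_cpu())
        time.sleep(interval_s)
    return samples
```

With a 12 s corpus and a 3 s interval, this collects four evenly spaced samples per playback, which are then returned to the execution main body for storage.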
And the result analysis module analyzes and evaluates the performance of the voice interaction software according to the stored performance data. This includes collecting statistics on the CPU-related performance information of the voice interaction software and analyzing CPU peaks, averages, variation trends, and the like, refined down to each thread. The memory information can distinguish different memory areas, such as the Dalvik memory and the native memory, which effectively helps developers locate problems. The statistics also include file handle information, CPU temperature information, and the like. Meanwhile, a comparison table of the test version and the comparison version can be generated automatically, with the differences calculated, so the gap between the two versions can be seen quickly. Several automatic charting functions are also provided, so that data differences and data changes can be seen intuitively.
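The per-series statistics of the result analysis module (peak, average, trend) can be sketched as follows; "trend" here is simplified to an end-minus-start difference, which is an assumption rather than the patent's actual trend analysis:

```python
def summarize(samples):
    """Peak, average, and end-to-start trend of one CPU usage series;
    applied per thread in the analysis described above."""
    return {
        "peak": max(samples),
        "avg": sum(samples) / len(samples),
        "trend": samples[-1] - samples[0],  # > 0 means usage grew
    }
```

Running this over each thread's series, for both the test and baseline versions, yields the raw numbers from which the comparison table and charts are built.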
The method for testing voice interaction provided by the embodiment of the disclosure classifies test scenes in detail. Voice interaction performance differs across noise environments, corpus information, and voice interaction modes. Starting from the operating principle of the voice interaction software of the first terminal or of the carried voice interaction module, the test scenes cover the running paths of that software or module: the noise environment settings include a quiet scene and an internal-plus-external-noise scene, and a voice interaction mode (activation mode, response mode, or continuous interaction mode) can be set in each scene. Different interaction modes correspond to different voice algorithm operation modules of the voice interaction software or of the carried voice interaction module. By setting different interaction modes, the performance of the different voice algorithm operation modules can be evaluated, making the evaluation of CPU and/or memory performance targeted; by testing combinations of different interaction modes, the scheme makes the evaluation of CPU and/or memory performance comprehensive.
According to an aspect of the present disclosure, as shown in fig. 4, an apparatus for testing voice interaction is provided, which may include:
and a first interaction module 410, configured to perform voice interaction with the first terminal according to the determined interaction scenario information. Wherein, the interactive scene information comprises: voice interaction mode information, noise information, and corpus information.
The first obtaining module 420 is configured to obtain, according to a preset sampling frequency, first CPU information corresponding to the first terminal in the voice interaction process. And
and the first generating module 430 is configured to generate a test result of the voice interaction function of the first terminal according to the first CPU information.
According to the scheme of the embodiment of the disclosure, the CPU information corresponding to the voice interaction function in a specific voice interaction scene can be obtained, and the test result of the voice interaction function of the terminal can be obtained based on the CPU information. From the test result, it can be accurately determined whether the performance of the voice interaction software installed on the first terminal, or of the running program in the carried voice interaction module, needs to be optimized. For the interactive function test of the first terminal, the embodiment of the disclosure can perform automatic voice interaction tests with the first terminal according to various designed voice interaction modes, achieving full coverage of voice interaction scenes. The CPU information of the first terminal is acquired automatically during voice interaction, and the corresponding test result is generated automatically from it, which improves the efficiency and accuracy of the first terminal's voice interaction test and ensures the objectivity and reliability of the test result. Meanwhile, since the method of the embodiment of the disclosure is completed automatically, the labor cost of testing is reduced, and automatic voice interaction testing, automatic acquisition of CPU performance data, and automatic analysis and display of test results are realized.
In one embodiment, the first interaction module 410 includes:
and the first determining submodule is used for determining voice interaction mode information, noise information and language material information in the interaction scene information.
And the first sending submodule is used for sending the voice interaction mode information and the corpus information to the first voice playing device and sending the voice interaction mode information to the first terminal under the condition that the noise information is determined to comprise a noise-free environment, so that the first voice playing device and the first terminal perform voice interaction.
In one embodiment, the first interaction module 410 includes:
and the second determining submodule is used for determining the voice interaction mode information, the noise information and the corpus information in the interaction scene information.
And the second sending submodule is used for sending the voice interaction mode information and the corpus information to the first voice playing device under the condition that the noise information is determined to comprise an internal noise environment. And sending the voice interaction mode information and the noise information to the first terminal so that the first voice playing device and the first terminal perform voice interaction in an internal noise environment.
In one embodiment, the first interaction module 410 includes:
and the third determining submodule is used for determining voice interaction mode information, noise information and corpus information in the interaction scene information.
And the third sending submodule is used for sending the voice interaction mode information and the corpus information to the first voice playing device under the condition that the noise information is determined to comprise an external noise environment. And sending the voice interaction mode information to the first terminal. And sending the noise information to the second voice playing device so that the first voice playing device and the first terminal perform voice interaction in an external noise environment.
In one embodiment, the voice interaction mode information includes: an active mode, a response mode, a persistent interaction mode, or a combination mode. Wherein the active mode is for waking up the first terminal. The response mode is used for enabling the first terminal to perform voice interaction once. The continuous interactive mode is used for enabling the first terminal to continuously carry out voice interaction for a plurality of times. The combination mode includes at least two modes of an activation mode, a response mode, and a persistent interaction mode.
In one embodiment, the first obtaining module 420 includes:
and the fourth determining submodule is used for determining the duration of the corpus according to the corpus information.
And the fifth determining submodule is used for determining the preset sampling frequency according to the duration of the corpus, based on Nyquist's first criterion.
And the obtaining submodule is used for obtaining first CPU information corresponding to the first terminal in the voice interaction process according to the preset sampling frequency.
In one embodiment, the apparatus for testing voice interaction further comprises:
and the second acquisition module is used for acquiring the corresponding memory information and/or file handle information of the first terminal in the voice interaction process. And
the first generating module is further used for generating a test result of the voice interaction function of the first terminal according to the first CPU information, the memory information and/or the file handle information.
In one embodiment, the apparatus for testing voice interaction further comprises:
and the second interaction module is used for carrying out voice interaction with the second terminal according to the determined interaction scene information.
And the third acquisition module is used for acquiring second CPU information corresponding to the second terminal in the voice interaction process according to the preset sampling frequency.
And the second generation module is used for generating a comparison test result according to the first CPU information and the second CPU information.
For a description of specific functions and examples of each module and each sub-module of the apparatus in the embodiment of the present disclosure, reference may be made to the related description of the corresponding steps in the foregoing method embodiments, and details are not repeated here.
According to an aspect of the present disclosure, as shown in fig. 5, a system for testing voice interaction is provided, which may include:
and the control terminal is used for controlling the first voice playing equipment and the first terminal to carry out voice interaction according to the determined interaction scene information. And acquiring first CPU information corresponding to the first terminal in the voice interaction process according to the preset sampling frequency. And generating a test result of the voice interaction function of the first terminal according to the first CPU information. Wherein, the interactive scene information comprises: voice interaction mode information, noise information, and corpus information.
And the first voice playing device is used for playing the interactive voice according to the received interactive scene information.
And the first terminal is used for carrying out voice interaction with the first voice playing device according to the received interactive scene information and the interactive voice.
It should be noted that the control end may be understood as an execution subject of the method for testing voice interaction according to any embodiment of the present disclosure.
According to the scheme of the embodiment of the disclosure, the CPU information corresponding to the voice interaction function in a specific voice interaction scene can be obtained, and the test result of the voice interaction function of the terminal can be obtained based on the CPU information. From the test result, it can be accurately determined whether the performance of the voice interaction software installed on the first terminal, or of the running program in the carried voice interaction module, needs to be optimized. For the interactive function test of the first terminal, the embodiment of the disclosure can perform automatic voice interaction tests with the first terminal according to various designed voice interaction modes, achieving full coverage of voice interaction scenes. The CPU information of the first terminal is acquired automatically during voice interaction, and the corresponding test result is generated automatically from it, which improves the efficiency and accuracy of the first terminal's voice interaction test and ensures the objectivity and reliability of the test result. Meanwhile, since the method of the embodiment of the disclosure is completed automatically, the labor cost of testing is reduced, and automatic voice interaction testing, automatic acquisition of CPU performance data, and automatic analysis and display of test results are realized.
In one embodiment, the system for testing voice interaction further comprises:
and the second voice playing device is used for playing the noise of the external noise environment under the condition that the received noise information comprises the external noise environment.
In one embodiment, the system for voice interaction testing further comprises:
and the second terminal is used for carrying out voice interaction with the first voice playing device according to the received interactive scene information and the interactive voice.
For a description of specific functions and examples of the control end, the first voice playing device, the first terminal, the second voice playing device, and the second terminal of the system in the embodiment of the present disclosure, reference may be made to the related description of the corresponding steps in the foregoing method embodiments, and details are not repeated here.
In the technical scheme of the disclosure, the acquisition, storage, and application of the personal information of related users all comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The calculation unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, and the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Computing unit 601 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 601 performs the various methods and processes described above, such as the method of voice interaction testing. For example, in some embodiments, the method of voice interaction testing may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by the computing unit 601, one or more steps of the method of voice interaction testing described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g., by means of firmware) to perform the method of voice interaction testing.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server with a combined blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (22)

1. A method of voice interaction testing, comprising:
performing voice interaction with the first terminal according to the determined interaction scene information; wherein the interactive scene information includes: voice interaction mode information, noise information and corpus information;
acquiring first CPU information corresponding to the first terminal in the voice interaction process according to a preset sampling frequency; and
and generating a test result of the voice interaction function of the first terminal according to the first CPU information.
2. The method of claim 1, wherein performing voice interaction with the first terminal according to the determined interaction scene information, the interaction scene information including voice interaction mode information, noise information and corpus information, comprises:
determining voice interaction mode information, noise information and corpus information in interaction scene information;
under the condition that the noise information is determined to include a noise-free environment, sending the voice interaction mode information and the corpus information to first voice playing equipment; and
and sending the voice interaction mode information to a first terminal so as to enable the first voice playing equipment to perform voice interaction with the first terminal.
3. The method of claim 1, wherein performing voice interaction with the first terminal according to the determined interaction scene information, the interaction scene information including voice interaction mode information, noise information and corpus information, comprises:
determining voice interaction mode information, noise information and corpus information in the interaction scene information;
under the condition that the noise information comprises an internal noise environment, sending the voice interaction mode information and the corpus information to first voice playing equipment; and
and sending the voice interaction mode information and the noise information to a first terminal so as to enable the first voice playing device and the first terminal to perform voice interaction in the internal noise environment.
4. The method of claim 1 or 3, wherein performing voice interaction with the first terminal according to the determined interaction scene information comprises:
determining the voice interaction mode information, the noise information, and the corpus information in the interaction scene information;
in a case where the noise information indicates an external noise environment, sending the voice interaction mode information and the corpus information to a first voice playing device;
sending the voice interaction mode information to the first terminal; and
sending the noise information to a second voice playing device, so that the first voice playing device and the first terminal perform voice interaction in the external noise environment.
5. The method of claim 1, wherein the voice interaction mode information comprises: an active mode, a response mode, a continuous interaction mode, or a combined mode;
the active mode is used for waking up a voice interaction function of the first terminal;
the response mode is used for enabling the first terminal to perform at least one voice interaction;
the continuous interaction mode is used for enabling the first terminal to perform voice interaction multiple times in succession; and
the combined mode includes at least two of the active mode, the response mode, and the continuous interaction mode.
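Because the combined mode of claim 5 is defined as a union of at least two base modes, the modes compose naturally as flags. A minimal sketch (the enum and its member names are assumptions for illustration only):

```python
from enum import Flag, auto

class InteractionMode(Flag):
    ACTIVE = auto()       # wakes up the terminal's voice interaction function
    RESPONSE = auto()     # at least one question/answer round
    CONTINUOUS = auto()   # several interaction rounds in succession

# A "combined mode" is any union of at least two base modes:
combined = InteractionMode.ACTIVE | InteractionMode.CONTINUOUS
print(InteractionMode.ACTIVE in combined, InteractionMode.RESPONSE in combined)
```

A test driver could then dispatch on membership (`InteractionMode.ACTIVE in combined`) rather than on a list of mode strings.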
6. The method of claim 1, wherein acquiring, according to the preset sampling frequency, the first CPU information corresponding to the first terminal during the voice interaction comprises:
determining a duration of the corpus according to the corpus information;
determining the preset sampling frequency from the duration of the corpus based on the Nyquist sampling criterion; and
acquiring, according to the preset sampling frequency, the first CPU information corresponding to the first terminal during the voice interaction.
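Claim 6 ties the sampling frequency to the corpus duration via the Nyquist criterion but gives no formula. One plausible reading, treating a single corpus utterance as the shortest event whose CPU footprint must be captured, is f_s ≥ 2/T. The helper below is a sketch under that assumption, not the claimed computation:

```python
def nyquist_sampling_hz(corpus_duration_s, margin=1.0):
    """Minimum sampling frequency so that each corpus utterance of
    duration T is covered by at least two samples: f_s >= 2 / T.
    `margin` scales the rate above the Nyquist minimum."""
    event_hz = 1.0 / corpus_duration_s   # one utterance == one "cycle"
    return 2.0 * event_hz * margin

print(nyquist_sampling_hz(0.5))  # a 0.5 s utterance needs at least 4.0 Hz
```

Sampling any slower than this risks missing the CPU spike of a short utterance entirely, which is presumably why the claim anchors the rate to corpus duration.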
7. The method of claim 1, further comprising:
acquiring memory information and/or file handle information corresponding to the first terminal during the voice interaction,
wherein generating the test result of the voice interaction function of the first terminal according to the first CPU information comprises:
generating the test result of the voice interaction function of the first terminal according to the first CPU information and the memory information and/or the file handle information.
8. The method of claim 1, further comprising:
performing voice interaction with a second terminal according to the determined interaction scene information;
acquiring, according to the preset sampling frequency, second CPU information corresponding to the second terminal during the voice interaction; and
generating a comparison test result according to the first CPU information and the second CPU information.
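The comparison test of claim 8 amounts to aggregating the two terminals' CPU traces and reporting their difference; the function and field names below are illustrative assumptions:

```python
from statistics import mean

def compare_cpu(first_cpu, second_cpu):
    """Aggregate two terminals' sampled CPU usage and report the gap."""
    first_avg, second_avg = mean(first_cpu), mean(second_cpu)
    return {
        "first_avg": first_avg,
        "second_avg": second_avg,
        "delta": first_avg - second_avg,  # positive: first terminal costlier
    }

report = compare_cpu([20.0, 30.0, 40.0], [25.0, 25.0, 25.0])
print(report["delta"])
```

Because both traces are captured under the same interaction scene and sampling frequency, the delta isolates the device-side cost difference rather than differences in the test stimulus.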
9. An apparatus for voice interaction testing, comprising:
a first interaction module configured to perform voice interaction with a first terminal according to determined interaction scene information, wherein the interaction scene information includes voice interaction mode information, noise information, and corpus information;
a first acquisition module configured to acquire, according to a preset sampling frequency, first CPU information corresponding to the first terminal during the voice interaction; and
a first generation module configured to generate a test result of the voice interaction function of the first terminal according to the first CPU information.
10. The apparatus of claim 9, wherein the first interaction module comprises:
a first determining submodule configured to determine the voice interaction mode information, the noise information, and the corpus information in the interaction scene information; and
a first sending submodule configured to, in a case where the noise information indicates a noise-free environment, send the voice interaction mode information and the corpus information to a first voice playing device and send the voice interaction mode information to the first terminal, so that the first voice playing device performs voice interaction with the first terminal.
11. The apparatus of claim 9, wherein the first interaction module comprises:
a second determining submodule configured to determine the voice interaction mode information, the noise information, and the corpus information in the interaction scene information; and
a second sending submodule configured to, in a case where the noise information indicates an internal noise environment, send the voice interaction mode information and the corpus information to a first voice playing device, and send the voice interaction mode information and the noise information to the first terminal, so that the first voice playing device and the first terminal perform voice interaction in the internal noise environment.
12. The apparatus of claim 9 or 11, wherein the first interaction module comprises:
a third determining submodule configured to determine the voice interaction mode information, the noise information, and the corpus information in the interaction scene information; and
a third sending submodule configured to, in a case where the noise information indicates an external noise environment, send the voice interaction mode information and the corpus information to a first voice playing device, send the voice interaction mode information to the first terminal, and send the noise information to a second voice playing device, so that the first voice playing device and the first terminal perform voice interaction in the external noise environment.
13. The apparatus of claim 9, wherein the voice interaction mode information comprises: an active mode, a response mode, a continuous interaction mode, or a combined mode;
the active mode is used for waking up a voice interaction function of the first terminal;
the response mode is used for enabling the first terminal to perform at least one voice interaction;
the continuous interaction mode is used for enabling the first terminal to perform voice interaction multiple times in succession; and
the combined mode includes at least two of the active mode, the response mode, and the continuous interaction mode.
14. The apparatus of claim 9, wherein the first acquisition module comprises:
a fourth determining submodule configured to determine a duration of the corpus according to the corpus information;
a fifth determining submodule configured to determine the preset sampling frequency from the duration of the corpus based on the Nyquist sampling criterion; and
an acquiring submodule configured to acquire, according to the preset sampling frequency, the first CPU information corresponding to the first terminal during the voice interaction.
15. The apparatus of claim 9, further comprising:
a second acquisition module configured to acquire memory information and/or file handle information corresponding to the first terminal during the voice interaction,
wherein the first generation module is further configured to generate the test result of the voice interaction function of the first terminal according to the first CPU information and the memory information and/or the file handle information.
16. The apparatus of claim 9, further comprising:
a second interaction module configured to perform voice interaction with a second terminal according to the determined interaction scene information;
a third acquisition module configured to acquire, according to the preset sampling frequency, second CPU information corresponding to the second terminal during the voice interaction; and
a second generation module configured to generate a comparison test result according to the first CPU information and the second CPU information.
17. A system for voice interaction testing, comprising:
a control end configured to control a first voice playing device to perform voice interaction with a first terminal according to determined interaction scene information, acquire, according to a preset sampling frequency, first CPU information corresponding to the first terminal during the voice interaction, and generate a test result of the voice interaction function of the first terminal according to the first CPU information, wherein the interaction scene information includes voice interaction mode information, noise information, and corpus information;
the first voice playing device, configured to play interactive voice according to the received interaction scene information; and
the first terminal, configured to perform voice interaction with the first voice playing device according to the received interaction scene information and the interactive voice.
18. The system of claim 17, further comprising:
a second voice playing device configured to play noise of an external noise environment in a case where the received noise information indicates the external noise environment.
19. The system of claim 17 or 18, further comprising:
a second terminal configured to perform voice interaction with the first voice playing device according to the received interaction scene information and the interactive voice.
20. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 8.
21. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 8.
22. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
CN202210717430.XA 2022-06-23 2022-06-23 Voice interaction test method, device, system, equipment and storage medium Pending CN115223545A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210717430.XA CN115223545A (en) 2022-06-23 2022-06-23 Voice interaction test method, device, system, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115223545A true CN115223545A (en) 2022-10-21

Family

ID=83610257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210717430.XA Pending CN115223545A (en) 2022-06-23 2022-06-23 Voice interaction test method, device, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115223545A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination