CN111128139B - Non-invasive voice test method and device

Info

Publication number: CN111128139B
Application number: CN201911309691.2A
Authority: CN
Inventors: 何龙; 秦以南
Original assignee: Sipic Technology Co Ltd
Current assignee: Sipic Technology Co Ltd
Priority date: 2019-12-18
Filing date: 2019-12-18
Publication date: 2022-07-08
Anticipated expiration: 2039-12-18
Also published as: CN111128139A

Abstract

The invention discloses a non-invasive voice testing method, which comprises the following steps: configuring operation instruction information; acquiring image information of a test target according to the configured operation instruction information; and performing image recognition on the acquired image information of the test target to generate a first test result for storage. The invention also discloses a non-invasive voice testing device. With the disclosed method and device, testing and related operations can be carried out in a non-invasive manner, test efficiency is greatly improved, the operation is simple and convenient, ordinary personnel can complete the test, and labor cost is saved.

Description

Non-invasive voice test method and device

Technical Field

The invention relates to the technical field of voice test, in particular to a non-invasive voice test method and a non-invasive voice test device.

Background

Because of the scenarios in which voice technology products are used, later-stage debugging of each developed voice technology product is particularly important. During voice testing, the manual part of the process is often replaced by a machine in some way, so as to improve test efficiency, stability and the like. However, the appium approach requires a test app to be deployed on the host machine (the machine under test) so that the app can collect the machine's interface feedback, and this requires the machine under test to satisfy the following conditions:

1. a debugging interface is available (generally USB on Android);

2. adb debugging is supported;

3. supporting installation of the app;

4. a communication port is available.

If any one of the above conditions is not met, the method cannot be implemented. Some machines under test do not meet these conditions, so testing in this way is complex to implement and has significant limitations.

Disclosure of Invention

To solve the above problems, the inventors propose a non-intrusive automatic voice test in which an external device directly acquires the screen information of the machine under test and recognizes the picture with image recognition technology in order to judge the voice test result. Testing and related operations can thus be performed in a non-intrusive manner, test efficiency is greatly improved, the operation is simple and convenient, and the shortcomings described above are overcome.

According to one aspect of the present invention, there is provided a non-invasive voice testing method, comprising the steps of:

configuring operation instruction information;

acquiring image information of a test target according to the configured operation instruction information;

and carrying out image recognition on the acquired image information of the test target to generate a first test result for storage.

According to another aspect of the present invention, there is provided a non-intrusive voice test device, comprising:

a first configuration module, used for configuring and storing the operation instruction information;

the image acquisition module is used for acquiring the image information of the test target according to the configured operation instruction information; and

a first test result generation module, used for performing image recognition on the acquired image information of the test target to generate a first test result for storage.

According to still another aspect of the present invention, there is provided an electronic device including: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the above-described method.

According to a further aspect of the invention, a storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.

According to the scheme of the embodiment of the invention, a test app is not required to be implanted into the target equipment, the requirement on the tested equipment is reduced, the implementation is simple, the operation is convenient, the test can be finished without professional testers, the labor cost is saved, and the test efficiency is improved.

Drawings

FIG. 1 is a flow chart of a method for non-intrusive voice testing in accordance with an embodiment of the present invention;

FIG. 2 is a flow chart of a method for non-intrusive voice testing in accordance with another embodiment of the present invention;

FIG. 3 is a schematic structure diagram of a non-invasive voice testing apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic structure diagram of a non-invasive voice testing apparatus according to another embodiment of the present invention;

FIG. 5 is a schematic block diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

As used in this application, the terms "module," "apparatus," "system," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or a server, may be an element. One or more elements may be in a process and/or thread of execution and an element may be localized on one computer and/or distributed between two or more computers and can be operated by various computer-readable media. The elements may also communicate by way of local and/or remote processes based on a signal having one or more data packets, e.g., from a data packet interacting with another element in a local system, distributed system, and/or across a network in the internet with other systems by way of the signal.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The implementation scheme of the non-intrusive voice test related to the embodiment of the invention can be applied to any target device which needs to be subjected to a voice test, so that the voice test on the voice interaction device is realized, and the implementation scheme is particularly suitable for the voice interaction device which cannot or is inconvenient to implant test apps, such as vehicle-mounted devices, and the like, but the application range of the invention is not limited to this. By the scheme provided by the embodiment of the invention, the noninvasive voice test can be realized without implanting a test app into the target equipment, the requirement on the test equipment is reduced, the operation is convenient, and the test efficiency is improved.

The present invention will be described in further detail with reference to the accompanying drawings.

Fig. 1 schematically shows a flowchart of a method for non-intrusive voice test according to an embodiment of the present invention, and as shown in fig. 1, the method of this embodiment includes the following steps:

Step S101: configuring operation instruction information. To automate the non-intrusive voice test, the embodiment of the invention configures the operation instruction information in advance, as required, so that it triggers the subsequent operations. Specifically, in this embodiment the subsequent operation triggered automatically by the operation instruction information is the acquisition of image information of the test target for later processing. The operation instruction information includes an instruction execution condition and a first instruction content. The instruction execution condition is the trigger condition for executing the first instruction content and may be user-defined according to the user's test requirements. It may be set as an execution frequency and a number of executions (for example, the first instruction content is executed once every 10 seconds, 20 times in total), or as a condition based on a specified time (for example, the first instruction content is executed XX minutes XX seconds on XX days XX). The embodiment of the invention does not limit the specific condition, provided that execution of the first instruction content can be triggered according to the user's test requirements. The first instruction content is the action instruction for acquiring the image information of the test target. Illustratively, in one specific implementation the first instruction content is an instruction to start a camera and take a picture; in another, it may be an instruction to take a screenshot.
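
The configuration described in step S101 could be represented roughly as follows. This is only a minimal sketch: the class name `OperationInstruction`, its field names, and the two instruction labels `"camera_photo"` and `"screenshot"` are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationInstruction:
    """Hypothetical container for the configured operation instruction information."""
    interval_seconds: float   # instruction execution condition: time between executions
    total_runs: int           # instruction execution condition: number of executions
    first_instruction: str    # first instruction content: how the image is obtained

    def __post_init__(self) -> None:
        # Reject configurations that could never trigger a capture.
        if self.interval_seconds <= 0 or self.total_runs <= 0:
            raise ValueError("interval_seconds and total_runs must be positive")
        if self.first_instruction not in ("camera_photo", "screenshot"):
            raise ValueError("unknown first instruction content")


# The example from the text: execute once every 10 seconds, 20 times in total.
config = OperationInstruction(interval_seconds=10, total_runs=20,
                              first_instruction="camera_photo")
```

A time-based condition (execution at a specified time) would need an additional field; it is left out here to keep the sketch short.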

Step S102: acquiring the image information of the test target according to the configured operation instruction information. Once the operation instruction information has been configured and the method has been started, the corresponding instruction execution condition is obtained and monitored (for example, by checking the condition in real time). When the instruction execution condition is determined to be satisfied, the first instruction content is obtained and executed so as to acquire the image information of the test target. Illustratively, when the first instruction content is an instruction to start a camera and take a picture, the camera is started and takes the picture according to the instruction; with the camera placed in front of the screen of the test target, automatically executing the first instruction content acquires image information of the test target, such as a picture of its screen. When the first instruction content is an instruction to take a screenshot (for example, an adb screenshot command when the test target is a vehicle-mounted device), no camera is needed; it is only necessary to issue the screenshot command to the test target, such as the vehicle-mounted device. However, compared with the prior art, taking pictures with a camera is simple and convenient to operate, needs no external debugging cable, and has a wider range of application.
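
For a frequency-and-count execution condition, the monitoring in step S102 might look like the loop below. It is a sketch under stated assumptions: `run_capture_loop` and `fake_capture` are hypothetical names, the capture action is passed in as a callable so that a camera or a screenshot command can be substituted, and the wait function is injectable so the loop can be exercised without real delays.

```python
import time
from typing import Callable, List


def run_capture_loop(interval_seconds: float,
                     total_runs: int,
                     capture: Callable[[], bytes],
                     sleep: Callable[[float], None] = time.sleep) -> List[bytes]:
    """Execute the first instruction content total_runs times, waiting between runs."""
    images: List[bytes] = []
    for run in range(total_runs):
        images.append(capture())      # first instruction content: obtain one image
        if run < total_runs - 1:      # no wait is needed after the final capture
            sleep(interval_seconds)
    return images


# Stand-in capture function; a real one would drive a camera or a screenshot command.
def fake_capture() -> bytes:
    return b"screen-image"


waits: List[float] = []               # records the requested waits instead of sleeping
captured = run_capture_loop(10, 3, fake_capture, sleep=waits.append)
```

Passing `sleep=waits.append` replaces the real delay with a recorder, which is convenient when checking the schedule itself.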

Step S103: performing image recognition on the acquired image information of the test target to generate a first test result for storage. Because the voice test is carried out by image recognition, making the image recognition work and optimizing its effect is both the key to, and the main difficulty in, ensuring an accurate test result. To ensure the image recognition effect, the embodiment of the invention performs image recognition on the acquired image information of the test target by image comparison. Specifically, this may be implemented as follows. First, a mapping relation between voice action characteristics and target state images is configured, and a target state image library is generated and stored; that is, each voice action characteristic is stored in association with the correct target state image that should be presented when that voice action characteristic is used. Second, the acquired image information of the test target is compared with the target state images in the target state image library to determine whether a target state image consistent with the acquired image information exists. Finally, when the comparison confirms that a consistent target state image exists, the system time and the corresponding voice action characteristic are acquired, and a first test result is generated from them and stored.

The voice action characteristic mentioned in the embodiment of the present invention refers to the test voice used. Illustratively, in a wake-up test the voice action characteristic is the wake-up word used in the test, and in a voice recognition test it is the corpus used in the test. Each test voice, that is, each voice action characteristic, corresponds to a target state image that should be presented; the target state image can be captured as a screenshot and uploaded in advance by a developer or user.

For example, the image comparison may be implemented with a Python image library by comparing the pixel matrices of the two images for similarity; a similarity that reaches a preset threshold is the criterion for judging the two images consistent.
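
The pixel-matrix comparison can be illustrated without committing to a particular image library. The sketch below works on plain nested lists of grayscale values; the function names, the per-pixel `tolerance` parameter and the default threshold of 0.9 are assumptions made for illustration, not values given in the patent.

```python
from typing import Sequence

Pixel = int  # grayscale value 0..255; an assumption made for brevity


def pixel_similarity(a: Sequence[Sequence[Pixel]],
                     b: Sequence[Sequence[Pixel]],
                     tolerance: int = 0) -> float:
    """Return the fraction of positions whose values differ by at most tolerance."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("images must not be empty")
    matching = sum(
        1
        for ra, rb in zip(a, b)
        for pa, pb in zip(ra, rb)
        if abs(pa - pb) <= tolerance
    )
    return matching / total


def is_consistent(a, b, threshold: float = 0.9, tolerance: int = 0) -> bool:
    """Two images count as consistent when similarity reaches the preset threshold."""
    return pixel_similarity(a, b, tolerance) >= threshold


expected = [[0, 0, 255, 255], [0, 0, 255, 255]]
observed = [[0, 0, 255, 255], [0, 10, 255, 255]]
score = pixel_similarity(expected, observed)  # 7 of the 8 pixels match exactly
```

A photograph of a screen will rarely match a stored screenshot pixel for pixel, so in practice a non-zero tolerance, a lower threshold, or prior alignment and scaling would probably be needed.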

Illustratively, the image comparison may also be implemented using a Baidu comparison API.

When image comparison finds a consistent target state image, the voice test is considered successful; otherwise it is considered failed. A first test result can be generated from the comparison result. Illustratively, the first test result may include a test time and a test outcome (that is, success or failure). The test time is obtained from the system time, and the test outcome is judged from the comparison result. When the first test result is generated, it is bound to the corresponding voice action characteristic, that is, the test voice used, so that it is labeled for later use. In a specific implementation, the content of the first test result may also be set according to the user's test purpose, which is not limited in the embodiments of the present invention.
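
Lookup in the target state image library and generation of the first test result might be sketched as follows. The names `FirstTestResult` and `recognize` are hypothetical, images are represented as raw bytes, and the comparison function is passed in. Recording no voice action characteristic on failure is an assumption of this sketch; a caller that knows which test voice was played could bind it instead.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FirstTestResult:
    voice_action_feature: Optional[str]  # matched test voice, or None on failure
    test_time: datetime                  # taken from the system time
    success: bool


def recognize(image: bytes,
              library: Dict[str, bytes],
              consistent: Callable[[bytes, bytes], bool],
              now: Callable[[], datetime] = datetime.now) -> FirstTestResult:
    """Compare the image with every target state image and build a first test result."""
    for feature, target_image in library.items():
        if consistent(image, target_image):
            return FirstTestResult(feature, now(), True)
    return FirstTestResult(None, now(), False)


# Target state image library: voice action characteristic -> expected screen image.
library = {"wake_word": b"awake-screen"}
hit = recognize(b"awake-screen", library, lambda a, b: a == b)
miss = recognize(b"idle-screen", library, lambda a, b: a == b)
```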

With this method, a target device (that is, the device under test) can be voice-tested automatically, in particular when it does not meet the conditions for installing a test app. The requirement on the device is low (only a screen is needed), the operation is simple, no professional tester is required, labor is saved, and test efficiency is effectively improved.

Fig. 2 schematically shows a non-intrusive speech testing method according to another embodiment of the present invention, and as shown in fig. 2, the method according to the embodiment of the present invention further includes, on the basis of the embodiment shown in fig. 1:

The operation instruction information configured in step S101 further includes a second instruction content, which is, for example, an instruction to generate a final test report and send it to a specified user. Based on the second instruction content, the required test result, namely the second test result, can be formed and output to the specified address when the test ends. The second instruction content may be obtained based on the instruction execution condition: while the instruction execution condition is being monitored, if it is found that the condition for ending the test has been satisfied, the second instruction content is obtained to automatically trigger the subsequent operation. Therefore, based on this configuration information, the method of the embodiment of the present invention further includes:

Step S104: acquiring the second instruction content according to the instruction execution condition. The instruction execution condition is generally set according to the test purpose. For example, when the instruction execution condition is to terminate upon a false wake-up, the second instruction content may be acquired once a comparison failure is detected from the image comparison result; when the instruction execution condition is to test based on a frequency and a number of times, the second instruction content may be acquired when the preset number of tests is reached. The setting can be customized by the user as required.

Step S105: acquiring the stored first test results according to the second instruction content, and generating a second test result from the stored first test results for output. The content of the second test result is defined according to the user's test purpose. For example, when the user's purpose is to obtain the wake-up rate, the wake-up rate (number of successful wake-ups / total number of tests) is calculated from all the stored first test results and output to the user (for example, sent to a mailbox specified by the user).
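
The wake-up rate calculation in step S105 is simple arithmetic and can be sketched as below. The function name `wake_up_rate` and the report wording are hypothetical, and the stored first test results are reduced here to a list of success flags.

```python
from typing import Iterable


def wake_up_rate(results: Iterable[bool]) -> float:
    """Wake-up rate = number of successful wake-ups / total number of tests."""
    outcomes = list(results)
    if not outcomes:
        raise ValueError("no first test results stored")
    return sum(outcomes) / len(outcomes)


stored = [True, True, False, True]      # success flags of the stored first test results
rate = wake_up_rate(stored)             # 3 successes out of 4 tests
report = f"Wake-up rate: {rate:.1%}"    # text that could be sent to the user's mailbox
```

Delivering the report (for example by e-mail) is left out, since the patent does not specify the transport.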

By the method, the required test result report can be automatically acquired and output to the user on the basis of realizing the automatic voice test, so that the method is convenient for the user to check and is very friendly to the user.

As a preferred embodiment, in a specific implementation, to ensure the operating performance and effect of the above scheme, a sleep time of predetermined duration, such as ten seconds, may be set after each image comparison that generates a first test result; once the sleep time has elapsed, the next round of instruction execution condition monitoring and image acquisition is started.

FIG. 3 is a schematic block diagram of a non-invasive voice testing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus 1 includes:

a first configuration module 30, configured to configure and store the operation instruction information, where the configured operation instruction information includes an instruction execution condition, a first instruction content, and a second instruction content; for the specific configuration manner and content, refer to the description in the method section;

a second configuration module 31, configured to configure a mapping relationship between the voice action features and the target state image, and generate a target state image library for storage, where the contents of the configured voice action features and the target state image are described in the foregoing method section;

the image acquisition module 32 is used for acquiring the image information of the test target according to the configured operation instruction information; and

a first test result generating module 33, configured to perform image recognition on the acquired image information of the test target to generate a first test result, and store the first test result.

Preferably, the image obtaining module 32 of the apparatus may be implemented as a camera for obtaining a screen picture of the test target according to the operation instruction information, and may be exemplarily implemented as a camera carried by a notebook computer, so that the first instruction content in the configured operation instruction information may be set based on an instruction interface for calling the camera thereof to take a picture.

The first test result generation module 33 of the apparatus shown in FIG. 3 may be implemented to include:

a comparison unit 33A, configured to compare the acquired screen photo with the target state images in the target state image library, determine whether a target state image consistent with the acquired screen photo exists, and, when one exists, output the corresponding voice action characteristic to the result generation unit 33B; and

a result generation unit 33B, configured to obtain the system time and to generate and store a first test result according to the voice action characteristic and the system time, where the first test result may include the voice action characteristic, the test time (recorded as the system time), and the test outcome (recorded as failure or success).

In other embodiments, the image obtaining module 32 may not include the image capturing unit 32A, and may instead obtain a screenshot as the image information of the test target through an adb command, over a debugging cable connected to the target device.
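
When a debugging cable is available, the screenshot could be fetched with the standard `adb exec-out screencap -p` command, which writes a PNG image to standard output. The sketch below is an assumption about how that might be wrapped: the function names are hypothetical, and the patent does not state which adb command is used.

```python
import subprocess
from typing import List, Optional


def screencap_command(serial: Optional[str] = None) -> List[str]:
    """Build the adb argument list; -s selects a device when several are attached."""
    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]
    cmd += ["exec-out", "screencap", "-p"]
    return cmd


def capture_screenshot(serial: Optional[str] = None, timeout: float = 10.0) -> bytes:
    """Run the command and return the PNG bytes; raises if adb fails or times out."""
    completed = subprocess.run(screencap_command(serial), capture_output=True,
                               check=True, timeout=timeout)
    return completed.stdout
```

`capture_screenshot` requires adb on the PATH and a connected device with debugging enabled, which are exactly the preconditions the camera-based approach avoids.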

Fig. 4 schematically shows a schematic block diagram of a non-intrusive speech testing apparatus according to another embodiment of the present invention, and as shown in fig. 4, the embodiment of the present invention further includes, on the basis of the embodiment shown in fig. 3: and the second test result generating module 34 is configured to, in response to the instruction for ending the test, obtain the stored first test result, and generate a second test result according to the stored first test result for output. The instruction for ending the test is the content of the second instruction, and the content of the generated second test result is set by a user according to a test purpose, and specifically, the content may be output to a mailbox specified by the user.

The specific implementation and interaction process of each module and unit in the apparatus are described in the foregoing method section, and are not described herein again.

It should be noted that the apparatus in the embodiment of the present invention is an intelligent device independent of a target device, such as a PC or a smart phone, and a test app is not required to be implanted on the target device (i.e., a device under test), and a screen photograph or a screenshot of the target device may be obtained through a camera or a debugging line of the device only by using the independent apparatus, so that an automatic voice test is implemented, the execution is convenient, the requirement on the target device is low, the labor cost is greatly saved, and the test efficiency is improved. And the device can generate a second test result, namely a test report, based on the test purpose and output the second test result to the user after the test is finished, so that the device is very intelligent and improves the user experience.

In some embodiments, the present invention further provides a computer-readable storage medium, in which one or more programs including executable instructions are stored, where the executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform the above-mentioned method for non-intrusive voice test of the present invention.

In some embodiments, the present invention further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the above-described method of non-intrusive speech testing.

In some embodiments, an embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method for non-intrusive voice testing.

In some embodiments, the present invention further provides a storage medium having a computer program stored thereon, where the computer program is capable of performing the above method of non-intrusive voice testing when executed by a processor.

The non-invasive voice testing apparatus according to the embodiment of the present invention may be used to execute the non-invasive voice testing method according to the embodiment of the present invention, and accordingly achieves the technical effects of that method, which are not described herein again. In the embodiment of the present invention, the relevant functional modules may be implemented by a hardware processor.

Fig. 5 is a schematic hardware structure diagram of an electronic device for performing a method of non-intrusive voice test according to another embodiment of the present application, and as shown in fig. 5, the electronic device includes:

one or more processors 510 and memory 520, with one processor 510 being an example in fig. 5.

The apparatus for performing the method of non-intrusive voice testing may further comprise: an input device 530 and an output device 540.

The processor 510, the memory 520, the input device 530, and the output device 540 may be connected by a bus or other means; connection by a bus is taken as the example in FIG. 5.

Memory 520, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the method for non-intrusive voice testing in embodiments of the present application. The processor 510 executes various functional applications of the server and data processing, i.e., the method of implementing the non-intrusive voice test in the above method embodiments, by executing non-volatile software programs, instructions and modules stored in the memory 520.

The memory 520 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the stored data area may store data created from use of the apparatus for non-intrusive voice testing, and the like. Further, the memory 520 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 520 may optionally include memory remotely located from processor 510, which may be connected to a device for non-intrusive voice testing via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 530 may receive input numeric or character information and generate signals related to user settings and functional control of the device for non-intrusive voice testing. The output device 540 may include a display device such as a display screen.

The one or more modules described above are stored in the memory 520 and, when executed by the one or more processors 510, perform the method of non-intrusive voice testing in any of the method embodiments described above.

The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) Mobile communication devices: these devices are characterized by mobile communication functions and are primarily aimed at providing voice and data communication. Such terminals include smart phones (e.g., the iPhone), multimedia phones, feature phones, and low-end phones, among others.

(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.

(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (e.g., the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.

(4) Servers: a server has an architecture similar to that of a general-purpose computer, but has higher requirements for processing capacity, stability, reliability, security, scalability, manageability and the like, because it must provide highly reliable services.

(5) Other electronic devices with data interaction functions.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general hardware platform, and can of course also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or in the part contributing to the related art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method described in the embodiments or in some parts of the embodiments.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions in the embodiments of the present application.

What has been described above are merely some embodiments of the present invention. It will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention.

Claims

1. A non-invasive voice testing method, characterized by comprising the following steps:

configuring operation instruction information;

acquiring image information of a test target according to the configured operation instruction information;

carrying out image recognition on the acquired image information of the test target to generate a first test result for storage, which comprises:

configuring a mapping relation between voice action characteristics and a target state image, and generating a target state image library for storage;

comparing the acquired image information of the test target with a target state image in a target state image library, and determining whether a target state image consistent with the image information exists;

and when it is confirmed that a target state image consistent with the image information exists, acquiring the system time and the corresponding voice action characteristic, and generating a first test result according to the acquired voice action characteristic and the system time for storage.

2. The method according to claim 1, wherein the configured operation instruction information includes an instruction execution condition and a first instruction content, and the acquiring the image information of the test target according to the configured operation instruction information includes:

acquiring first instruction content according to the instruction execution condition;

and executing the action of acquiring the image information of the test target according to the first instruction content to acquire the image information of the test target.

3. The method of claim 2, wherein the configured operation instruction information further comprises a second instruction content, the method further comprising:

acquiring second instruction content according to the instruction execution condition;

and acquiring the stored first test result according to the second instruction content, and generating a second test result according to the stored first test result for outputting.

4. The method according to claim 3, wherein the first instruction content is an instruction for starting a camera to take a picture, and the acquired image information of the test target is a screen picture of the test target.

5. A non-intrusive voice testing apparatus, comprising:

the image acquisition module is used for acquiring the image information of the test target according to the configured operation instruction information;

the first test result generation module is used for carrying out image recognition on the acquired image information of the test target to generate a first test result for storage;

the second configuration module is used for configuring the mapping relation between the voice action characteristics and the target state image, and generating a target state image library for storage;

wherein the image acquisition module is used for acquiring a screen photo of the test target according to the operation instruction information;

the first test result generation module comprises:

a comparison unit, used for comparing the acquired screen photo with the target state images in the target state image library, confirming whether a target state image consistent with the acquired screen photo exists, and outputting the corresponding voice action characteristic when such a target state image exists; and

a result generating unit, used for acquiring the system time and generating a first test result according to the voice action characteristic and the system time for storage.

6. The apparatus of claim 5, further comprising:

and the second test result generation module is used for responding to the instruction for finishing the test, acquiring the stored first test result, and generating and outputting a second test result according to the stored first test result.

7. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1-4.

8. A storage medium on which a computer program is stored, wherein the program, when executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
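
For illustration only, the flow recited in claims 1 to 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the names `NonInvasiveTester`, `FirstTestResult`, and `is_consistent` are hypothetical, an image is assumed to be a flat tuple of grayscale pixel values, the image recognition step is replaced by a simple mean absolute pixel difference with an assumed threshold, and the second test result is assumed to be a count of matches per voice action characteristic. Evaluation of the instruction execution condition is left to the caller.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional

# Hypothetical image type: a flat tuple of grayscale pixel values (0-255).
Image = tuple[int, ...]


@dataclass
class FirstTestResult:
    voice_action: str      # voice action characteristic mapped to the matched image
    system_time: datetime  # system time at which the match was confirmed


def is_consistent(a: Image, b: Image, max_mean_diff: float = 8.0) -> bool:
    """Assumed stand-in for image recognition: mean absolute pixel difference."""
    if len(a) != len(b) or not a:
        return False
    mean_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return mean_diff <= max_mean_diff


class NonInvasiveTester:
    def __init__(self, capture: Callable[[], Image],
                 clock: Callable[[], datetime] = datetime.now) -> None:
        self.capture = capture  # e.g. a camera photographing the target's screen
        self.clock = clock      # source of the system time
        self.library: dict[str, Image] = {}        # target state image library
        self.results: list[FirstTestResult] = []   # stored first test results

    def configure_mapping(self, voice_action: str, target_state: Image) -> None:
        """Map a voice action characteristic to its target state image."""
        self.library[voice_action] = target_state

    def run_first_instruction(self) -> Optional[FirstTestResult]:
        """Capture an image, compare it with the library, store a result on a match."""
        photo = self.capture()
        for voice_action, target in self.library.items():
            if is_consistent(photo, target):
                result = FirstTestResult(voice_action, self.clock())
                self.results.append(result)
                return result
        return None  # no consistent target state image exists

    def run_second_instruction(self) -> dict[str, int]:
        """Build a second test result (assumed: match count per voice action)."""
        summary: dict[str, int] = {}
        for r in self.results:
            summary[r.voice_action] = summary.get(r.voice_action, 0) + 1
        return summary
```

In use, a caller would register one target state image per voice action with `configure_mapping`, call `run_first_instruction` each time the instruction execution condition is met, and call `run_second_instruction` when the test ends. Because the tester only needs a `capture` callable, nothing has to be installed on the tested machine, which is the non-invasive property the claims rely on.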