CN114639373A

CN114639373A - Intelligent voice evaluation method, system and storage medium

Info

Publication number: CN114639373A
Application number: CN202210259474.2A
Authority: CN
Inventors: 马立民
Original assignee: Beijing Defengyun Technology Co ltd
Current assignee: Beijing Defengyun Technology Co ltd
Priority date: 2022-03-16
Filing date: 2022-03-16
Publication date: 2022-06-17

Abstract

The invention discloses an intelligent voice evaluation method, a system and a storage medium, wherein the method comprises the following steps: configuring voice test equipment and test monitoring equipment, and associating the voice test equipment and the test monitoring equipment; configuring voice test detection logic and a voice test task, and associating the voice test detection logic and the voice test task; executing a voice test task, and acquiring voice test process information output by the voice test equipment by the test monitoring equipment; and judging the voice test process information according to the voice test detection logic to obtain a voice test result. According to the embodiment of the invention, the test information is automatically analyzed by using the voice test detection logic to obtain the test result, so that the voice evaluation efficiency and accuracy are improved; because the evaluation standard is uniform and the data in the judgment process is completely recorded, the evaluation of various intelligent voice products is more fair and accurate, thereby improving the credibility of the intelligent voice evaluation.

Description

Intelligent voice evaluation method, system and storage medium

Technical Field

The invention relates to the technical field of computers, in particular to an intelligent voice evaluation method, an intelligent voice evaluation system and a storage medium.

Background

With the development of intelligent voice technology, such as speech recognition, more and more products offer intelligent voice functions. Products such as smart speakers, smartphones and smart home devices all provide intelligent voice functions and perform the corresponding intelligent control through speech recognition.

An intelligent voice product needs to be tested before release, or evaluated by a third-party organization. At present, intelligent voice products are tested or evaluated mainly by hand, so test efficiency is low; because manual testing applies inconsistent standards, the accuracy and fairness of the evaluation are also low.

Disclosure of Invention

The invention mainly aims to provide an intelligent voice evaluation method, an intelligent voice evaluation system and a storage medium, and aims to solve the problems of low evaluation efficiency and low fairness of intelligent voice products in the prior art.

In order to achieve the purpose, the invention provides an intelligent voice evaluation method, which comprises the following steps:

configuring voice test equipment and test monitoring equipment, and associating the voice test equipment with the test monitoring equipment;

configuring voice test detection logic and a voice test task, and associating the voice test detection logic and the voice test task;

executing the voice test task, wherein the test monitoring equipment acquires voice test process information output by the voice test equipment;

and judging the voice test process information according to the voice test detection logic to obtain a voice test result.

Optionally, the test monitoring device comprises: an image acquisition device and/or a data acquisition device;

the image acquisition device is used for acquiring image information in the testing process of the voice testing equipment;

the data acquisition device is used for acquiring data output in the testing process of the voice testing equipment.

Optionally, the voice test detection logic comprises at least one of: brightness recognition, icon recognition, character recognition, motion recognition and voice recognition;

the brightness recognition comprises the following steps:

a first identification range is circled on an image acquisition picture of the image acquisition device;

setting a brightness threshold value;

the image acquisition device acquires the brightness value in the first identification range and judges whether the brightness value is in the brightness threshold range; if the brightness is within the brightness threshold range, the voice test brightness recognition is successful;

the icon identification comprises the following steps:

a second identification range is circled on an image acquisition picture of the image acquisition device;

setting a standard icon;

the image acquisition device acquires the icon in the second identification range and judges whether the icon matches the standard icon; if they match, the voice test icon recognition is successful;

the character recognition comprises the following steps:

a third identification range is circled on an image acquisition picture of the image acquisition device;

setting standard characters;

the image acquisition device acquires the characters in the third identification range and judges whether the characters match the standard characters; if they match, the voice test character recognition is successful;

the motion recognition comprises the following steps:

a fourth identification range is circled on an image acquisition picture of the image acquisition device;

setting a movement starting position and a movement ending position in the fourth identification range;

the image acquisition device acquires the moving image in the fourth identification range, and judges whether the moving object reaches the movement starting position or the movement ending position according to the moving image; if the moving object reaches the movement starting position or the movement ending position, the voice test motion recognition is successful;

the speech recognition comprises the following steps:

setting standard voice information;

the voice acquisition device acquires the voice output in the testing process of the voice testing equipment, and performs voice recognition on the voice to obtain testing voice information;

judging whether the test voice information matches the standard voice information; if they match, the voice test speech recognition is successful.

Optionally, the determining whether the moving object reaches the movement starting position or the movement ending position according to the moving image includes:

selecting a first characteristic device and/or installing a second characteristic device in the moving object;

acquiring the feature identifier of the moving object according to the first feature device and/or the second feature device;

judging whether the characteristic mark is in the range of the motion starting position or the motion ending position; determining that the moving object reaches the motion starting position or the motion ending position if the feature identifier is within the motion starting position or the motion ending position range.

Optionally, the voice test task comprises at least one of: a voice false wake-up test task, a voice recognition test task and a voice control test task;

the voice false wake-up test task comprises at least one of the following: first noise information, first voice test detection logic, number of tests and first test monitoring equipment;

the voice wake-up test task comprises at least one of the following: second noise information, wake-up set information, second voice test detection logic, minimum response time, maximum response time and second test monitoring equipment;

the voice recognition test task comprises at least one of the following: third noise information, third voice test detection logic and third test monitoring equipment;

the voice control test task comprises at least one of the following: fourth noise information, fourth voice test detection logic and fourth test monitoring equipment.

Optionally, the method further comprises the steps of:

generating a voice test report according to the voice test result;

the voice test report includes at least one of: a voice false wake-up test report, a voice recognition test report, and a voice control test report.

In addition, in order to achieve the above object, the present invention further provides an intelligent voice evaluation system, including:

the first configuration module is used for configuring voice test equipment and test monitoring equipment and associating the voice test equipment and the test monitoring equipment;

the second configuration module is used for configuring voice test detection logic and a voice test task and associating the voice test detection logic and the voice test task;

the test execution module is used for executing the voice test task, and the test monitoring equipment acquires voice test process information output by the voice test equipment;

and the test analysis module is used for judging the voice test process information according to the voice test detection logic to obtain a voice test result.

Optionally, the second configuration module comprises: a voice test detection logic configuration unit and a voice test task configuration unit;

the voice test detection logic configuration unit is used for configuring voice test detection logic; the voice test detection logic configuration comprises at least one of: brightness recognition configuration, icon recognition configuration, character recognition configuration, motion recognition configuration and voice recognition configuration;

the brightness recognition configuration comprises the steps of:

setting a brightness threshold value;

the icon recognition configuration comprises the steps of:

setting a standard icon;

the character recognition configuration comprises the following steps:

setting standard characters;

the motion recognition configuration comprises the steps of:

the speech recognition configuration comprises the steps of:

setting standard voice information;

judging whether the test voice information is matched with the standard voice information; if the matching is successful, the voice test voice recognition is successful;

the voice test task configuration unit is used for configuring a voice test task;

the voice test task includes at least one of: a voice false wake-up test task, a voice recognition test task and a voice control test task;

Optionally, the system further comprises: a test report generation module;

the test report generating module is used for generating a voice test report according to the voice test result;

Furthermore, to achieve the above object, the present invention also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the intelligent speech assessment method as described above.

According to the embodiment of the invention, the test information is automatically analyzed by using the voice test detection logic to obtain the test result, so that the voice evaluation efficiency and accuracy are improved; because the evaluation standard is uniform and the data in the judgment process is completely recorded, the evaluation of various intelligent voice products is more fair and accurate, thereby improving the credibility of the intelligent voice evaluation.

Drawings

Fig. 1 is a schematic flow chart of an intelligent voice evaluation method provided by the present invention.

Fig. 2 is a schematic flow chart of the brightness recognition provided by the present invention.

Fig. 3 is a schematic flow chart of icon identification provided in the present invention.

Fig. 4 is a schematic flow chart of the character recognition provided by the present invention.

Fig. 5 is a schematic flow chart of the motion recognition provided by the present invention.

Fig. 6 is a schematic flow chart of the method for determining that a moving object reaches a specified position according to the present invention.

Fig. 7 is a schematic flow chart of the speech recognition provided by the present invention.

Fig. 8 is a block diagram of an embodiment of an intelligent speech evaluation system according to the present invention.

Fig. 9 is a block diagram of a second configuration module according to an embodiment of the present invention.

Fig. 10 is another block diagram of the intelligent voice evaluation system according to the embodiment of the present invention.

Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

In order to make the technical problems to be solved, the technical solutions and the advantageous effects of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module", "component" and "unit" may be used interchangeably.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

In one embodiment, as shown in fig. 1, the present invention provides an intelligent voice assessment method, including:

Step 101, configuring a voice testing device and a testing monitoring device, and associating the voice testing device with the testing monitoring device.

Before intelligent voice testing or evaluation is carried out, the intelligent voice product under test needs to be configured in the intelligent voice evaluation system. The configuration of an intelligent voice product includes the following contents:

Brand manufacturer: selected by the user from a drop-down list; custom manufacturers can also be added (either in the database dictionary or from the front end); required.

Product name: entered manually by the user, with no limit on length or characters; required. This field is referenced when device management configures the associated product.

Equipment model: entered manually by the user, with no limit on length or characters; required.

Product category: selected by the user from a drop-down list; custom categories can also be added (either in the database dictionary or from the front end); required.

Product picture: uploaded manually by the user, with no format restriction; optional.

After the intelligent voice product is configured, the evaluation system supports operations such as modifying, deleting and querying its configuration information.

Next, the test monitoring equipment for the intelligent voice product is configured. The test monitoring equipment collects the test information output during testing of the intelligent voice product and includes an image acquisition device and a data acquisition device. The image acquisition device acquires image information during testing of the intelligent voice product; the data acquisition device acquires the data output during testing of the intelligent voice product.

If the intelligent voice product can output test data through a USB interface or a serial port, the data acquisition device is used to acquire the data output during testing. For example, an Android smartphone is connected to a data acquisition device (such as a PC) through a USB interface, and the data generated during the intelligent voice test is then output through the ADB function of the Android system.

If the intelligent voice product cannot output test data through a USB interface or other interfaces, an image acquisition device (such as a camera) is used to acquire image information in real time during testing, such as the content of a smartphone screen or the running state of a smart home device.

Before intelligent voice testing or evaluation is performed, one or more test monitoring devices need to be configured for each intelligent voice product. When the test monitoring equipment is configured, the following contents are included:

Bound device: either a USB device or an image device; required.

Device MAC: filled in automatically after the bound device is selected; it is the unique identifier of the monitoring device; required.

Device type: filled in automatically after the bound device is selected; required.

Device name: entered manually by the user, with no limit on length or characters; required.

Device description: entered manually by the user; optional.

Device picture: uploaded manually by the user; optional.

By the time the test monitoring equipment is configured, it has already been connected to the system via WiFi or USB. During device binding, the user selects a test monitoring device that has connected to the system. After the device is selected, its MAC address and device type (image acquisition device or data acquisition device) are obtained automatically.

The intelligent voice product needs to be associated with the test monitoring equipment, and one intelligent voice product can be associated with one or more test monitoring equipment. As shown in the following table:

Step 102, configuring a voice test detection logic and a voice test task, and associating the voice test detection logic and the voice test task.

Each intelligent voice product has different testing methods, and each testing method has corresponding testing detection logic. Each test task may be associated with one or more test methods, each of which is in turn associated with a test detection logic.
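The association between test tasks, test methods and test detection logic can be pictured as a small data model. The following Python sketch is illustrative only: the class names (DetectionLogic, VoiceTestMethod, VoiceTestTask), the field names and the sample values are assumptions introduced here and do not come from the patent text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DetectionLogic:
    """One test detection logic, e.g. brightness or icon recognition (hypothetical model)."""
    kind: str  # e.g. "brightness", "icon", "character", "motion"
    params: dict = field(default_factory=dict)


@dataclass
class VoiceTestMethod:
    """A test method is associated with one detection logic."""
    name: str
    logic: DetectionLogic


@dataclass
class VoiceTestTask:
    """A test task may be associated with one or more test methods."""
    name: str
    methods: List[VoiceTestMethod] = field(default_factory=list)

    def logic_kinds(self) -> List[str]:
        # Collect the detection logic kinds reachable from this task.
        return [m.logic.kind for m in self.methods]


# Assumed sample configuration: a wake-up task checked through an indicator lamp.
wake_logic = DetectionLogic("brightness", {"low": 175, "high": 200, "seconds": 5})
task = VoiceTestTask("wake-up test", [VoiceTestMethod("indicator lamp check", wake_logic)])
```

Keeping the detection logic as a separate object lets several test methods reuse one configured logic, which is one plausible reading of the association described above.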

The test detection logic includes: brightness recognition, icon recognition, character recognition and motion recognition.

The brightness recognition detection logic is configured with reference to the flow shown in Fig. 2:

Step 201, a first identification range is circled on an image acquisition picture of the image acquisition device.

The brightness recognition detection logic can only be applied to image acquisition devices, such as network cameras. The identification range is circled with a mouse in the image picture of the network camera. The camera subsequently detects brightness only within this identification range.

Step 202, setting a brightness threshold.

A brightness threshold is set for the identification range configured in the image acquisition device. When the brightness threshold is set, the camera automatically acquires the brightness value in the current identification range. For example, the brightness value in the identification range when the indicator lamp is lit after the smart speaker under test is woken up is used as a first brightness threshold; when the brightness subsequently detected by the network camera in the identification range reaches the first brightness threshold, the smart speaker is considered to have been woken up successfully.

The brightness value of the smart speaker in the non-working state can be set as a second brightness threshold; when the brightness subsequently detected by the network camera in the identification range reaches the second brightness threshold, the smart speaker is considered to be in the non-working state. This can be used in wake-up testing to confirm whether the smart speaker has returned to its initial state, since the smart speaker needs to be restored to the initial state before each wake-up test.

The brightness threshold may be a range of values, such as 175 to 200. When the brightness threshold is set, a brightness detection duration also needs to be set. For example, if the brightness detection duration is set to 5 seconds, the brightness in the current identification range is considered to have reached the set brightness threshold only when the brightness detected by the camera stays within the brightness threshold range for 5 consecutive seconds. The brightness thresholds set are shown in the following table:

Step 203, the image acquisition device acquires the brightness value in the first identification range, and judges whether the brightness value is in the brightness threshold range; if the brightness value is within the brightness threshold range, the voice test brightness recognition is successful.

After the brightness threshold is set for the designated range of the network camera, the test system obtains the configured brightness threshold and detection duration according to the test case (the test case is matched with the test detection logic). If the brightness acquired by the network camera stays within the brightness threshold range for the detection duration, the test detection logic is considered to be matched successfully; otherwise, the matching is considered to have failed, that is, execution of the test case fails.
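The brightness judgment described above (a threshold range plus a detection duration) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name brightness_recognized, the (timestamp, brightness) sample format and the sample readings are assumptions.

```python
from typing import Iterable, Optional, Tuple


def brightness_recognized(
    samples: Iterable[Tuple[float, float]],
    low: float,
    high: float,
    duration_s: float,
) -> bool:
    """Return True once brightness stays within [low, high] for duration_s seconds.

    samples: (timestamp_seconds, brightness) pairs in time order (assumed format).
    """
    run_start: Optional[float] = None  # time at which the current in-range run began
    for t, value in samples:
        if low <= value <= high:
            if run_start is None:
                run_start = t
            if t - run_start >= duration_s:
                return True
        else:
            run_start = None  # an out-of-range sample breaks the run
    return False


# Assumed readings, one per second; brightness enters the range at t=2 and stays there.
readings = [(0, 40), (1, 90), (2, 180), (3, 185), (4, 190), (5, 182), (6, 188), (7, 191)]
wake_detected = brightness_recognized(readings, low=175, high=200, duration_s=5)
```

Resetting the run on any out-of-range sample is what makes the check "continuous": a brief flicker of the indicator lamp does not count as a successful wake-up.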

The icon recognition detection logic is configured with reference to the flow shown in Fig. 3:

During testing of an intelligent voice product, a corresponding APP can be opened or a corresponding device can be controlled through intelligent voice recognition. For example, if a smartphone starts an APP through voice recognition, the smartphone displays the content of that APP after it is opened. The network camera can judge whether the correct APP was started by comparing images in real time.

Step 301, a second identification range is circled on an image acquisition picture of the image acquisition device.

The identification range is circled with a mouse in the image picture of the network camera. The camera subsequently detects images only within this identification range.

Step 302, setting a standard icon.

A standard image is set for the identification range configured in the image acquisition device. When the standard image is set, the camera acquires the image in the current identification range. For example, the image in the identification range when the smartphone under test has opened an APP (such as a navigation APP) is used as a first standard image; when the image subsequently detected by the network camera in the identification range matches the first standard image, the smartphone is considered to have opened the APP successfully.

The UI after the smartphone closes the APP can also be used as a second standard image; when the image subsequently detected by the network camera in the identification range matches the second standard image, the smartphone is considered to have closed the APP. This can be used to judge whether the smartphone has returned to its initial state, since the smartphone needs to be restored to the initial state before each APP-opening test.

The standard images set up are shown in the following table:

Identification range	Standard icon
Identification range A	Open App.jpg
Identification range B	Close app

Step 303, the image acquisition device acquires the icon in the second identification range and judges whether the icon matches the standard icon; if they match, the voice test icon recognition is successful.

After the standard picture is set in the specified range of the network camera, the test system acquires the configured standard picture according to the test case (the test case is matched with the test detection logic). And when the network camera acquires the image in the designated range in real time, matching the image with the standard image. If the acquired image is matched with the standard image, the test detection logic is considered to be successfully matched; otherwise, the matching is considered to be failed, namely the test case is failed to execute.

The image acquired by the network camera can be matched with the standard image by pixel matching, that is, the two pictures are considered matched if their pixel similarity is greater than a certain threshold; matching can also be performed with artificial intelligence, for example by using a convolutional neural network to judge whether the two pictures are similar. This technical solution does not limit the specific matching technique.
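The pixel matching option can be illustrated with a short sketch. The patent does not define how pixel similarity is computed; the version below assumes grayscale images of equal size and defines similarity as the fraction of pixels whose values differ by at most a tolerance. The names pixel_similarity and icons_match, the tolerance and the threshold values are all assumptions.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0-255 values (assumed format)


def pixel_similarity(a: Image, b: Image, tolerance: int = 10) -> float:
    """Fraction of pixel positions whose values differ by at most `tolerance`."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("images must not be empty")
    close = sum(
        1
        for ra, rb in zip(a, b)
        for pa, pb in zip(ra, rb)
        if abs(pa - pb) <= tolerance
    )
    return close / total


def icons_match(captured: Image, standard: Image, threshold: float = 0.9) -> bool:
    """Treat the icons as matched when the similarity is greater than the threshold."""
    return pixel_similarity(captured, standard) > threshold


# Assumed sample data: a 2x4 standard icon and a capture in which one pixel differs strongly.
standard = [[0, 0, 255, 255], [0, 0, 255, 255]]
captured = [[3, 1, 250, 255], [0, 2, 251, 120]]
```

In this sample, seven of the eight pixels fall within the tolerance, so the capture passes a threshold of 0.8 but not the default of 0.9. A real deployment would first crop both images to the configured identification range.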

The character recognition detection logic is configured with reference to the flow shown in Fig. 4:

During testing of an intelligent voice product, speech can be converted into text through intelligent voice recognition, making it convenient for a user to input content. For example, a smartphone inputs text through voice recognition instead of a keyboard. After acquiring an image, the network camera can recognize the characters in the image and then compare the recognized characters with the standard characters to judge whether the voice recognition of the intelligent voice product is correct.

Step 401, a third identification range is circled on an image acquisition picture of the image acquisition device.

The identification range is circled with a mouse in the image picture of the network camera. The camera subsequently performs character recognition only on images within this identification range.

Step 402, standard characters are set.

Standard characters are set for the identification range configured in the image acquisition device. The standard characters can be set according to the test case: if the characters corresponding to the voice content of the test case are characters A, characters A are used as the standard characters.

The standard text set is shown in the following table:

Identification range	Standard characters
Identification range A	Character A
Identification range B	Character B

Step 403, the image acquisition device acquires the characters in the third identification range, and judges whether the characters match the standard characters; if they match, the voice test character recognition is successful.

After the network camera acquires the image in the designated range, character recognition is performed on the characters in the image; the recognized characters are then compared with the standard characters to determine the matching degree between them. If the matching degree is greater than a certain threshold, the recognized characters are considered to match the standard characters, and the voice recognition test of the intelligent voice product is deemed successful; otherwise, the test is considered to have failed.

The threshold of the matching degree between the recognized text and the standard text can be set according to the requirement, for example, the threshold is set to be 85% of the matching degree.
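The matching-degree comparison can be sketched as below. The patent does not say how the matching degree is calculated; this example assumes it is the similarity ratio from Python's standard difflib.SequenceMatcher, and the function names and sample strings are hypothetical. The character recognition step itself (extracting text from the camera image) is assumed to have already produced the recognized string.

```python
from difflib import SequenceMatcher


def matching_degree(recognized: str, standard: str) -> float:
    """Similarity ratio in [0, 1] between the recognized and the standard characters."""
    return SequenceMatcher(None, recognized, standard).ratio()


def characters_match(recognized: str, standard: str, threshold: float = 0.85) -> bool:
    """The match succeeds when the matching degree is greater than the threshold."""
    return matching_degree(recognized, standard) > threshold


# Assumed test case: the spoken content corresponds to this standard text.
standard_text = "navigate to the airport"
ok_with_typo = characters_match("navigate to the airprt", standard_text)
ok_unrelated = characters_match("play some music", standard_text)
```

A ratio-based threshold tolerates small recognition errors, such as a single dropped character, while still rejecting unrelated text.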

The motion recognition detection logic is configured with reference to the flow shown in Fig. 5:

During testing of an intelligent voice product, a smart device is controlled through the intelligent voice product. For example, a smart speaker is used to control smart home devices, such as opening or closing a smart curtain. After acquiring an image, the network camera can recognize the movement of an object (such as the curtain) in the image and then compare the recognized position of the object with the configured starting position or ending position to judge whether the voice control of the intelligent voice product is correct.

Step 501, a fourth identification range is circled on an image acquisition picture of the image acquisition device.

The identification range is circled with a mouse in the image picture of the network camera. The camera subsequently performs motion recognition only on images within this identification range.

Step 502, setting a movement starting position and a movement ending position in the fourth identification range.

The starting position and the ending position of the object's movement are marked within the identification range. For example, when a curtain is opened, its starting position is on the right side and its ending position is on the left side. A specific position can be selected with a mouse within the identification range, and the selected area is then used as the starting position or the ending position.

Step 503, the image acquisition device acquires the motion image in the fourth identification range, and judges whether the moving object reaches the motion starting position or the motion ending position according to the motion image; if the moving object reaches the movement starting position or the movement ending position, the voice test motion recognition is successful.

The network camera acquires images in the identification range in real time, and then judges from the images whether the moving object has moved to the starting position or the ending position. The specific judgment process follows the flow shown in Fig. 6:

Step 601, selecting a first characteristic device and/or installing a second characteristic device in the moving object.

The motion recognition detection logic requires the network camera or the background system to support recognizing the motion of objects such as curtains (including the degree of opening, such as a curtain opened to 50 percent), drying racks and sweeping robots. To make it easier to recognize the moving object (such as a curtain) in the images acquired by the network camera, a distinctive part of the moving object that carries a special mark can be selected as the characteristic device; if the moving object has no such part, as with a window glass, a special marker can be attached to the moving object, for example a sticker of a distinctive color pasted on the window glass, and the sticker is then used as the characteristic device.

Step 602, obtaining the feature identifier of the moving object according to the first feature device and/or the second feature device.

The feature identifier can be marked and acquired from an image of the moving object containing the characteristic device by using an artificial intelligence algorithm, or the coordinates of the special mark can be selected manually in the image containing the characteristic device. For example, a red sticker pasted on the window glass is used as the feature identifier.

Step 603, judging whether the feature identifier is in the range of the motion starting position or the motion ending position; determining that the moving object reaches the motion starting position or the motion ending position if the feature identifier is within the motion starting position or the motion ending position range.

The images acquired by the network camera in the designated range are judged by pixel comparison or by an artificial intelligence algorithm (a deep learning algorithm) to determine whether the feature identifier of the moving object (such as a red sticker pasted on the window glass) has moved into the designated range of the starting position or the ending position.

For example, in a test case that controls the closing of a car window by intelligent voice, the position of the bottom of the window is set as the starting position of the moving object, and the position of the top of the window is set as the ending position. When the red sticker on the window glass is detected at the ending position, the voice control command for closing the window has been executed successfully; when the red sticker is detected at the starting position, the window is open and the window-closing voice control command can be tested. Before each test of the window-closing voice control command, the window must be confirmed to be in the open state.
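The judgment in steps 601 to 603 can be illustrated with a minimal sketch. It assumes a frame is a list of rows of (R, G, B) tuples, that the feature identifier is a red sticker, and that the starting and ending positions are rectangles; the names `is_red`, `marker_centroid` and `classify_position` and the color thresholds are hypothetical and are not taken from the patent, which leaves the detection algorithm open (pixel comparison or deep learning).

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]        # (R, G, B), each 0-255
Region = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def is_red(pixel: Pixel) -> bool:
    """Hypothetical color test for the red sticker; thresholds are illustrative."""
    r, g, b = pixel
    return r >= 200 and g <= 60 and b <= 60


def marker_centroid(image: List[List[Pixel]]) -> Optional[Tuple[float, float]]:
    """Return the (x, y) centroid of marker pixels, or None if no marker is visible."""
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if is_red(pixel):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return sum(xs) / len(xs), sum(ys) / len(ys)


def in_region(point: Tuple[float, float], region: Region) -> bool:
    """Check whether a point lies inside an inclusive rectangle."""
    x, y = point
    left, top, right, bottom = region
    return left <= x <= right and top <= y <= bottom


def classify_position(image: List[List[Pixel]],
                      start_region: Region,
                      end_region: Region) -> str:
    """Return 'start', 'end' or 'unknown' for the marker position in one frame."""
    centroid = marker_centroid(image)
    if centroid is None:
        return "unknown"
    if in_region(centroid, start_region):
        return "start"
    if in_region(centroid, end_region):
        return "end"
    return "unknown"
```

In the car window example, the bottom rows of the frame would be passed as `start_region` and the top rows as `end_region`; a result of `"end"` after the window-closing command corresponds to a successful test.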

The speech recognition detection logic is configured with reference to the flow chart in fig. 7:

In the testing process of an intelligent voice product, the product plays a corresponding voice response after recognizing the voice command of the user. For example, the intelligent sound box plays music, reports the weather, and the like according to the voice command of the user. The voice acquisition device acquires, in real time, the voice played by the intelligent sound box during the test, then recognizes that voice, and judges whether the intelligent sound box responded correctly according to the semantics of the recognized text.

Step 701, setting standard voice information.

Corresponding standard voice information is set according to the test case. For example, for the voice-controlled playback test case "play music by Liu De Hua", the standard voice information is: the singer is Liu De Hua. Different test cases correspond to different standard voice information. For a weather forecast query, the standard voice information is: semantic field: weather forecast.

Step 702, the voice collecting device obtains the voice output in the testing process of the voice testing equipment, and performs voice recognition on the voice to obtain testing voice information.

Step 703, judging whether the test voice information matches the standard voice information; if the matching is successful, the voice test voice recognition is indicated as successful.

The voice acquisition equipment acquires voice according to the test case, for example the test case "play music by Liu De Hua". After the voice control instruction has been played, voice acquisition starts and the voice content played by the voice test equipment is acquired.

After the voice content is acquired, it is recognized. What is recognized depends on the test case. For a music playback test case, the name and the singer of the music played by the intelligent sound box are recognized.

For example, for the voice-controlled playback test case "play music by Liu De Hua", after the voice acquired by the voice acquisition equipment is recognized, if the recognized singer is Liu De Hua, the test case has executed successfully; otherwise, the test case has failed.
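The comparison in step 703 can be sketched as follows, assuming that speech recognition has already turned the captured audio into a small dictionary of recognized fields. The field names (`singer`, `semantic_field`) and the function `matches_standard` are hypothetical; the patent does not specify how the test voice information is represented.

```python
from typing import Dict


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not cause a mismatch."""
    return " ".join(text.lower().split())


def matches_standard(test_info: Dict[str, str], standard_info: Dict[str, str]) -> bool:
    """Return True when every field of the standard voice information
    is present in the recognized test voice information with the same value."""
    return all(
        normalize(test_info.get(field, "")) == normalize(expected)
        for field, expected in standard_info.items()
    )
```

Only the fields listed in the standard voice information are compared, so a music test case can check the singer without constraining the song title.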

When the intelligent voice product is evaluated, a corresponding test task needs to be configured. A test task can contain a plurality of test cases. For example, a car window voice control test task may include: a voice wake-up test case, a window opening test case, a window closing test case, and the like. Each test case is associated with a test detection logic, an intelligent voice product and one or more test monitoring devices. As shown in the following table:

Step 103, executing the voice test task, wherein the test monitoring equipment acquires the voice test process information output by the voice test equipment.

And step 104, judging the voice test process information according to the voice test detection logic to obtain a voice test result.

The evaluation system tests according to the configured voice test tasks; when a test task comprises a plurality of test cases, the test cases are executed one by one. For example, the car window voice control test task comprises: a voice wake-up test case, a window opening test case and a window closing test case.
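A minimal sketch of a task that executes its test cases one by one is given here. The classes `VoiceTestCase` and `VoiceTestTask` and the `run` callable are hypothetical stand-ins for whatever the evaluation system uses to play a command and judge the monitored output.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class VoiceTestCase:
    name: str
    detection_logic: str      # e.g. "brightness", "motion", "voice"
    run: Callable[[], bool]   # hypothetical: executes the case and returns pass/fail


@dataclass
class VoiceTestTask:
    name: str
    cases: List[VoiceTestCase] = field(default_factory=list)


def execute_task(task: VoiceTestTask) -> Dict[str, bool]:
    """Execute the test cases of a task in their configured order
    and collect one pass/fail result per case."""
    results: Dict[str, bool] = {}
    for case in task.cases:
        results[case.name] = case.run()
    return results
```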

When the voice wake-up test case is executed, the information from the test process is judged according to the voice test detection logic configured for the case, and whether the wake-up succeeded is judged through brightness recognition. When the wake-up is judged successful, the wake-up duration of the intelligent voice equipment can be calculated as the time from playing the wake-up word to the successful wake-up. A wake-up duration threshold can also be set; if the waiting time after the wake-up word is played exceeds this threshold, the test case execution is considered failed.

In a test task, a test case can be executed multiple times, and the specific number of times can be set according to test requirements. For example, if the wake-up test is executed 100 times, the wake-up success rate and the wake-up failure rate of the intelligent voice equipment are calculated from the number of successful wake-ups and the number of failed wake-ups.
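The wake-up duration check and the success and failure rates described above amount to simple arithmetic, sketched below. The function names and the use of timestamps in seconds are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def judge_wake(played_at: float,
               woke_at: Optional[float],
               max_wait: float) -> Tuple[bool, Optional[float]]:
    """Judge one wake-up attempt.

    played_at: time the wake-up word was played (seconds).
    woke_at:   time the wake-up was detected, or None if never detected.
    max_wait:  wake-up duration threshold (seconds).
    Returns (success, wake-up duration or None).
    """
    if woke_at is None:
        return False, None
    duration = woke_at - played_at
    return duration <= max_wait, duration


def success_and_failure_rate(outcomes: List[bool]) -> Tuple[float, float]:
    """Compute (success rate, failure rate) over repeated executions of one test case."""
    if not outcomes:
        raise ValueError("at least one execution is required")
    successes = sum(outcomes)
    total = len(outcomes)
    return successes / total, (total - successes) / total
```

For 100 executions with 97 successes, `success_and_failure_rate` yields a success rate of 0.97 and a failure rate of 0.03.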

When the test case is executed, if the test case is configured with the noise information, the noise file in the noise information is played in the test process, and the playing time can be configured.

When the window opening test case is executed, the information from the test process is judged according to the voice test detection logic configured for the case, and whether the window opened successfully is judged through motion recognition. When the window opening is judged successful, the control response duration of the intelligent voice equipment can be calculated as the time from playing the control command (such as "open the window") to the successful opening of the window. A control response duration threshold can also be set; if the waiting time after the control command is played exceeds this threshold, the test case execution is considered failed.

In a test task, a test case can be executed multiple times, and the specific number of times can be set according to test requirements. For example, if the window opening test is executed 100 times, the command control success rate and the command control failure rate of the intelligent voice equipment are calculated from the number of successful window openings and the number of failed window openings.

Some test cases require the moving object to be restored to its initial state. For example, when the window opening test case is executed multiple times, the window must be restored to the closed state before each execution. Restoring the moving object to the initial state can be done through a corresponding test case; for example, the window opening test case is executed only after the window closing test case has executed successfully. If the moving object cannot be restored to the initial state, for example if the window closing test case fails repeatedly, execution of the subsequent window opening test cases is stopped and the user is prompted that the test task has failed.
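The restore-then-test loop can be sketched as below. The function `run_with_reset`, its parameters, and the default of three reset attempts are hypothetical; the patent only states that the task stops after the reset case fails multiple times.

```python
from typing import Callable, List


def run_with_reset(run_case: Callable[[], bool],
                   reset_case: Callable[[], bool],
                   repetitions: int,
                   max_reset_attempts: int = 3) -> List[bool]:
    """Run a test case repeatedly, restoring the initial state before each run.

    reset_case is the case that restores the initial state (e.g. closing the window).
    If it fails max_reset_attempts times in a row, the task is aborted.
    """
    results: List[bool] = []
    for _ in range(repetitions):
        # any() stops at the first successful reset attempt
        restored = any(reset_case() for _ in range(max_reset_attempts))
        if not restored:
            raise RuntimeError("test task failed: could not restore the initial state")
        results.append(run_case())
    return results
```

Raising an exception is one way to stop the remaining executions; a real system would presumably also surface the failure to the user, as the text describes.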

After the evaluation system has executed the test cases according to the test tasks, it generates a test report from the test result of each test case. The voice test report includes: a voice false wake-up test report, a voice recognition test report, a voice control test report, and the like. The results can also be summarized into an overall test report, such as a rating report that rates or scores the evaluated products based on the test results.

The voice false wake-up test report includes: false wake-up frequency and average false wake-up frequency; the voice wake-up test report includes: wake-up success rate and failure rate; the voice recognition test report includes: recognition success rate and failure rate; the voice control test report includes: recognition success rate and failure rate.

In addition, an embodiment of the present invention further provides an intelligent voice evaluation system, and with reference to fig. 8, the intelligent voice evaluation system includes: a first configuration module 10, a second configuration module 20, a test execution module 30, and a test analysis module 40.

A first configuration module 10, configured to configure a voice testing device and a testing monitoring device, and associate the voice testing device with the testing monitoring device;

Product picture: uploaded manually by the user; no format restriction; optional;

And configuring intelligent voice product test monitoring equipment, wherein the test monitoring equipment is used for collecting test information output in the intelligent voice product test process. The test monitoring device includes: image acquisition device, data acquisition device. The image acquisition device is used for acquiring image information in the intelligent voice product testing process; the data acquisition device is used for acquiring data output in the testing process of the intelligent voice product.

If the intelligent voice product cannot output test data through a USB interface or another interface, an image acquisition device (such as a camera) is used to obtain image information during the test in real time, such as the content of a smart phone screen or the running state of a smart home device.

Before intelligent voice testing or evaluation, one or more test monitoring devices need to be configured for each intelligent voice product. When the test monitoring equipment is configured, the following contents are included:

Bound device: one of two types, USB device or image device; required;

Device MAC: filled in automatically after the bound device is selected; it is the unique identifier of the monitoring device; required;

Device picture: uploaded manually by the user; optional.
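The configuration items above can be modeled as a small record with validation of the required fields. This is only a sketch: the class `MonitoringDeviceConfig`, the type labels `"usb"` and `"image"`, and the colon-separated MAC format are assumptions, not details given in the patent.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

DEVICE_TYPES = ("usb", "image")  # hypothetical labels for the two bound device types
MAC_PATTERN = re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")  # assumed MAC format


@dataclass
class MonitoringDeviceConfig:
    bound_device_type: str               # required
    mac: str                             # required, unique identifier of the device
    picture_path: Optional[str] = None   # optional, uploaded by the user

    def validation_errors(self) -> List[str]:
        """Return a list of problems with the required fields (empty if valid)."""
        errors: List[str] = []
        if self.bound_device_type not in DEVICE_TYPES:
            errors.append("bound device type must be 'usb' or 'image'")
        if not MAC_PATTERN.fullmatch(self.mac):
            errors.append("device MAC is required and must look like AA:BB:CC:DD:EE:FF")
        return errors
```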

a second configuration module 20, configured to configure a voice test detection logic and a voice test task, and associate the voice test detection logic and the voice test task.

As shown in fig. 9, the second configuration module 20 includes: a voice test detection logic configuration unit 21 and a voice test task configuration unit 22.

A voice test detection logic configuration unit 21, configured to configure a voice test detection logic; the voice test detection logic configuration includes at least one of: brightness recognition configuration, icon recognition configuration, character recognition configuration, motion recognition configuration, and voice recognition configuration.

The brightness recognition configuration comprises the following steps:

setting a brightness threshold value;

the image acquisition device acquires the brightness value in the first identification range and judges whether the brightness value is greater than the brightness threshold value; if it is greater, the voice test brightness recognition is indicated as successful.

The test detection logic of brightness recognition can only be applied to image acquisition devices, such as network cameras. The identification range is circled with the mouse in the image picture of the network camera. The camera subsequently detects brightness only within the identification range.

Step 202, setting a brightness threshold.

The brightness value of the intelligent sound box in the non-working state can be set as a second brightness threshold; when the brightness subsequently detected by the network camera in the identification range reaches the second brightness threshold, the intelligent sound box is considered to be in the non-working state. This can be used in wake-up tests to confirm whether the intelligent sound box has returned to the initial state. Before each wake-up test, the intelligent sound box must be restored to the initial state.

Recognition range	Brightness threshold	Detection duration
Recognition range A	175 to 200	5 seconds
Recognition range B	50 to 75	5 seconds

After the brightness threshold value is set in the designated range of the network camera, the testing system obtains the configured brightness threshold value and the detection duration according to the test case (the test case is matched with the test detection logic). When the brightness acquired by the network camera within the detection time is within the brightness threshold range, the test detection logic is considered to be successfully matched; otherwise, the matching is considered to be failed, namely the test case is failed to execute.
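The check that brightness stays inside the threshold range for the whole detection duration can be sketched as follows. It assumes the camera delivers time-ordered `(timestamp, brightness)` samples; the function name `brightness_held` and this sampling model are illustrative assumptions.

```python
from typing import Iterable, Tuple


def brightness_held(samples: Iterable[Tuple[float, float]],
                    low: float,
                    high: float,
                    duration: float) -> bool:
    """Return True if brightness stays within [low, high] continuously
    for at least `duration` seconds.

    samples: (timestamp in seconds, brightness value) pairs in time order.
    """
    run_start = None  # timestamp at which the current in-range run began
    for timestamp, value in samples:
        if low <= value <= high:
            if run_start is None:
                run_start = timestamp
            if timestamp - run_start >= duration:
                return True
        else:
            run_start = None  # any out-of-range sample breaks the run
    return False
```

With the table above, recognition range A would be checked as `brightness_held(samples, 175, 200, 5)`.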

The icon identifying configuration comprises the steps of:

setting a standard icon;

the image acquisition device acquires the icons in the second identification range and judges whether the icons are matched with the standard icons or not; and if the matching is carried out, the voice test icon recognition is successful.

The identification range is circled with the mouse in the image picture of the network camera. The camera subsequently detects images only within the identification range.

Step 302, setting a standard icon.

The UI shown after the smart phone closes the APP can be used as a second standard image; when the image subsequently detected by the network camera in the identification range matches the second standard image, the smart phone is considered to have closed the APP. This can be used to judge whether the smart phone has been restored to the initial state. Before each APP start test, the smart phone must be restored to the initial state.

The brightness threshold may be a range of values, such as: 175 to 200. When setting the brightness threshold, it is also necessary to set the brightness detection time period. If the brightness detection time length is set to be 5 seconds, the brightness detected by the camera in continuous 5 seconds is in the brightness threshold range, and the brightness in the current recognition range reaches the set brightness threshold. The set brightness thresholds are shown in the following table:

Recognition range	Standard icon
Recognition range A	Turn on app.jpg
Recognition range B	Close app.jpg

The image acquired by the network camera is matched against the standard image. Matching can be done by pixel matching, that is, the two pictures are considered to match if their pixel similarity is greater than a certain threshold; it can also be done with artificial intelligence, for example by using a convolutional neural network to judge whether the two pictures are similar. The specific matching method is not limited by this technical solution.
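A minimal sketch of the pixel matching option is shown below, assuming both pictures are grayscale images of the same size stored as lists of rows. The per-pixel `tolerance`, the default `threshold` of 0.9, and the function names are assumptions; the patent only requires that the similarity exceed "a certain threshold".

```python
from typing import List

GrayImage = List[List[int]]  # rows of grayscale values, 0-255


def pixel_similarity(a: GrayImage, b: GrayImage, tolerance: int = 10) -> float:
    """Fraction of pixel positions whose values differ by at most `tolerance`."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("images must not be empty")
    close = sum(
        1
        for ra, rb in zip(a, b)
        for pa, pb in zip(ra, rb)
        if abs(pa - pb) <= tolerance
    )
    return close / total


def icons_match(captured: GrayImage, standard: GrayImage,
                threshold: float = 0.9, tolerance: int = 10) -> bool:
    """Two pictures match when their pixel similarity is greater than the threshold."""
    return pixel_similarity(captured, standard, tolerance) > threshold
```

The tolerance absorbs small lighting differences between the camera frame and the stored standard icon; a convolutional neural network, as the text notes, is an alternative when the camera angle or scale varies.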

The character recognition configuration comprises the following steps:

setting standard characters;

the image acquisition device acquires characters in the third identification range and judges whether the characters match the standard characters; if they match, the voice test character recognition is indicated as successful.

Step 402, standard characters are set.

And setting standard characters in the recognition range set in the image acquisition device. The set characters can be set according to the test case, and if the characters corresponding to the voice content of the test case are the characters A, the characters A are used as standard characters.

The standard text set is shown in the following table:

Step 403, the image acquisition device acquires the characters in the third identification range, and judges whether the characters match the standard characters; if the matching is successful, the voice test character recognition is indicated as successful.

The motion recognition configuration comprises the following steps:

the image acquisition device acquires the motion image in the fourth identification range, and judges whether the moving object reaches the motion starting position or the motion ending position according to the motion image; if the moving object reaches the motion starting position or the motion ending position, the voice test motion recognition is indicated as successful.

The identification range is circled with the mouse in the image picture of the network camera. The camera subsequently performs motion recognition only on images within the identification range.

The network camera acquires images in the identification range in real time, and then judges whether the moving object moves to the initial position or the end position according to the images. Referring to the flow illustrated in fig. 6, a specific determination process is shown:

The speech recognition configuration comprises the following steps:

setting standard voice information;

Step 701, standard voice information is set.

Corresponding standard voice information is set according to the test case. For example, for the voice-controlled playback test case "play music by Liu De Hua", the standard voice information is: the singer is Liu De Hua. Different test cases correspond to different standard voice information. For a weather forecast query, the standard voice information is: semantic field: weather forecast.

And a voice test task configuration unit 22 for configuring the voice test task.

The voice test task includes at least one of: a voice false wake-up test task, a voice recognition test task and a voice control test task;

the voice wake-up test task includes at least one of: second noise information, awakening set information, second voice test detection logic, minimum response time, maximum response time and second test monitoring equipment;

When the intelligent voice product is evaluated, a corresponding test task needs to be configured. A test task can contain a plurality of test cases. For example, a car window voice control test task may include: a voice wake-up test case, a window opening test case, a window closing test case, and the like. Each test case is associated with a test detection logic, an intelligent voice product and one or more test monitoring devices. As shown in the following table:

the test execution module 30 is configured to execute the voice test task, and the test monitoring device obtains the voice test process information output by the voice test device;

and the test analysis module 40 is configured to judge the voice test process information according to the voice test detection logic to obtain a voice test result.

When the voice wake-up test case is executed, the information from the test process is judged according to the voice test detection logic configured for the case, and whether the wake-up succeeded is judged through brightness recognition. When the wake-up is judged successful, the wake-up duration of the intelligent voice device can be calculated as the time from playing the wake-up word to the successful wake-up. A wake-up duration threshold can also be set; if the waiting time after the wake-up word is played exceeds this threshold, the test case execution is considered failed.

When the test case is executed, if the test case is configured with the noise information, the noise file in the noise information is played in the test process, and the playing time length can be configured.

Some test cases require the moving object to be restored to its initial state. For example, when the window opening test case is executed multiple times, the window must be restored to the closed state before each execution. Restoring the moving object to the initial state can be done through a corresponding test case; for example, the window opening test case is executed only after the window closing test case has executed successfully. If the moving object cannot be restored to the initial state, for example if the window closing test case does not succeed even once after multiple attempts, execution of the subsequent window opening test cases is stopped and the user is prompted that the test task has failed.

In addition, an embodiment of the present invention further provides another intelligent voice evaluation system. With reference to fig. 10, this system builds on the system shown in fig. 8 and further includes: a test report generation module 50.

A test report generating module 50, configured to generate a voice test report according to the voice test result; the voice test report includes at least one of: a voice false wake-up test report, a voice recognition test report, and a voice control test report.

According to the embodiment of the invention, the test report is automatically generated according to the test result, so that the usability of the test system is improved, and the user experience is effectively improved.

It should be noted that each module or unit in the system may be configured to implement each step in the method, and achieve the corresponding technical effect, which is not described herein again.

Referring to fig. 11, fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

As shown in fig. 11, the electronic device may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to implement connection communication among these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., WI-FI, 4G, 5G interfaces). The memory 1005 may be a high-speed RAM memory or a non-volatile memory such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the configuration shown in fig. 11 does not limit the electronic device, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.

As shown in fig. 11, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and an intelligent voice evaluation program.

In the electronic apparatus shown in fig. 11, the network interface 1004 is mainly used for data communication with an external network; the user interface 1003 is mainly used for receiving input instructions of a user; the electronic device calls the smart voice evaluation program stored in the memory 1005 through the processor 1001 and performs the following operations:

the brightness recognition comprises the following steps:

setting a brightness threshold value;

the icon identification comprises the following steps:

setting a standard icon;

the character recognition comprises the following steps:

setting standard characters;

the motion recognition comprises the following steps:

the speech recognition comprises the following steps:

setting standard voice information;

the voice false wake-up test task comprises at least one of the following: first noise information, first voice test detection logic, number of tests and first test monitoring equipment;

Optionally, the method further comprises the steps of:

generating a voice test report according to the voice test result;

In addition, an embodiment of the present invention further provides a computer-readable storage medium, where an intelligent voice evaluation program is stored on the computer-readable storage medium, and when executed by a processor, the intelligent voice evaluation program implements the following operations:

the brightness recognition comprises the following steps:

setting a brightness threshold value;

the icon identification comprises the following steps:

setting a standard icon;

the character recognition comprises the following steps:

setting standard characters;

the motion recognition comprises the following steps:

the speech recognition comprises the following steps:

setting standard voice information;

judging whether the characteristic mark is in the range of the motion starting position or the motion ending position; determining that the moving object reaches the motion start position or the motion end position if the feature identifier is within the motion start position or the motion end position range.

the voice-controlled test task comprises at least one of: fourth noise information, fourth voice test detection logic and fourth test monitoring equipment.

Optionally, the method further comprises the steps of:

generating a voice test report according to the voice test result;

the voice test report includes at least one of: a voice false wake-up test report, a voice recognition test report, a voice control test report.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, a controller, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent structures or equivalent processes performed by the present invention or directly or indirectly applied to other related technical fields are also included in the scope of the present invention.

Claims

1. An intelligent voice evaluation method is characterized by comprising the following steps:

2. The method of claim 1, wherein the test monitoring device comprises: an image acquisition device and/or a data acquisition device and/or a voice acquisition device;

the data acquisition device is used for acquiring data output in the test process of the voice test equipment;

the voice acquisition device is used for acquiring the voice output in the testing process of the voice testing equipment.

3. The method of claim 2, wherein the voice test detection logic comprises at least one of: brightness recognition, icon recognition, character recognition, motion recognition and voice recognition;

the brightness recognition comprises the following steps:

setting a brightness threshold value;

the icon identification comprises the following steps:

setting a standard icon;

the character recognition comprises the following steps:

setting standard characters;

the motion recognition comprises the following steps:

the speech recognition comprises the following steps:

setting standard voice information;

4. The method according to claim 3, wherein the determining whether a moving object reaches the movement start position or the movement end position based on the moving image comprises:

5. The method of claim 3, wherein the voice test task comprises at least one of: a voice false wake-up test task, a voice recognition test task and a voice control test task;

6. The method according to claim 1, characterized in that the method further comprises the steps of:

generating a voice test report according to the voice test result;

7. An intelligent voice evaluation system, characterized in that the system comprises:

the first configuration module is used for configuring voice test equipment and test monitoring equipment and associating the voice test equipment with the test monitoring equipment;

the second configuration module is used for configuring voice test detection logic and a voice test task and associating the voice test detection logic with the voice test task;

the test execution module is used for executing the voice test task, and the test monitoring equipment acquires the voice test process information output by the voice test equipment;

8. The system of claim 7, wherein the second configuration module comprises: a voice test detection logic configuration unit and a voice test task configuration unit;

the brightness recognition configuration comprises the following steps:

setting a brightness threshold value;

the icon identifying configuration comprises the steps of:

setting a standard icon;

the character recognition configuration comprises the following steps:

setting standard characters;

the motion recognition configuration comprises the following steps:

the speech recognition configuration comprises the following steps:

setting standard voice information;

9. The system of claim 7, further comprising: a test report generation module;

10. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, carries out the steps of the intelligent voice evaluation method according to any one of claims 1 to 6.