CN113450767A

CN113450767A - Voice recognition test method, device, test equipment and storage medium

Info

Publication number: CN113450767A
Application number: CN202110706771.2A
Authority: CN
Inventors: 陆海鹏
Original assignee: Ping An International Smart City Technology Co Ltd
Current assignee: Ping An International Smart City Technology Co Ltd
Priority date: 2021-06-24
Filing date: 2021-06-24
Publication date: 2021-09-28

Abstract

The application relates to the field of artificial intelligence and provides a voice recognition testing method, a voice recognition testing apparatus, a test device and a storage medium. The method comprises the following steps: acquiring a test file, wherein the test file comprises a plurality of voice files generated based on different sound reception angles, together with the sound reception angle and the standard text corresponding to each voice file; sending the plurality of voice files to a plurality of devices to be tested, and instructing the devices to be tested to perform voice recognition on the voice files, so that each device to be tested generates a plurality of recognition texts in one-to-one correspondence with the voice files; acquiring the recognition texts generated by each of the devices to be tested; comparing the recognition text and the standard text corresponding to the same voice file to determine the recognition result of each device to be tested for each voice file; and determining the voice recognition accuracy of each device to be tested according to its recognition results for the plurality of voice files. The accuracy of batch testing can thereby be improved.

Description

Voice recognition test method, device, test equipment and storage medium

Technical Field

The application relates to artificial intelligence and provides a voice recognition testing method, a voice recognition testing device, voice recognition testing equipment and a storage medium.

Background

With the development of artificial intelligence, more and more electronic devices have a voice recognition function, such as AI robots, intelligent home appliances, intelligent toys, and the like, and it can be understood that the recognition accuracy of such electronic devices to natural voice is a very critical index, which directly affects the effectiveness of the devices and the user experience, and therefore, the accuracy test for voice recognition is essential.

In order to improve testing efficiency, voice recognition tests on electronic devices of the same type generally adopt a batch testing mode, that is, a plurality of electronic devices are tested simultaneously. Specifically, during batch testing, a playing source plays a plurality of natural voices so that each electronic device can perform voice recognition separately. However, during batch testing the position and angle of each electronic device relative to the playing source are random, and the electronic devices interfere with each other, which affects the accuracy of the batch test.

Disclosure of Invention

Based on this, the embodiment of the application provides a voice recognition testing method, a voice recognition testing device, a testing device and a storage medium, so as to improve the accuracy in batch testing.

In a first aspect, an embodiment of the present application provides a speech recognition testing method, which is applied to a test device, and the method includes:

acquiring a test file, wherein the test file comprises a plurality of voice files generated based on different sound reception angles, the sound reception angle corresponding to each voice file and the standard text corresponding to each voice file;

sending the plurality of voice files to a plurality of devices to be tested, and instructing the plurality of devices to be tested to perform voice recognition on the plurality of voice files respectively, so that each of the devices to be tested generates a plurality of recognition texts in one-to-one correspondence with the plurality of voice files;

acquiring the plurality of recognition texts generated by each of the devices to be tested;

comparing the recognition text and the standard text corresponding to the same voice file to determine the recognition result of each of the devices to be tested for each voice file;

and determining the respective voice recognition accuracy of the multiple devices to be tested according to the respective recognition results of the multiple devices to be tested on the multiple voice files.

In a second aspect, an embodiment of the present application provides a speech recognition testing apparatus, including:

the first data acquisition module is used for acquiring a test file, wherein the test file comprises a plurality of voice files generated based on different sound reception angles, the sound reception angle corresponding to each voice file and the standard text corresponding to each voice file;

the data sending module is used for sending the voice files to a plurality of devices to be tested and instructing the devices to be tested to perform voice recognition on the voice files respectively, so that each of the devices to be tested generates a plurality of recognition texts in one-to-one correspondence with the voice files;

the second data acquisition module is used for acquiring the plurality of recognition texts generated by each of the devices to be tested;

the text comparison module is used for comparing the recognition text and the standard text corresponding to the same voice file to determine the recognition result of each of the devices to be tested for each voice file;

and the result determining module is used for determining the voice recognition accuracy of the equipment to be tested according to the recognition results of the equipment to be tested on the voice files respectively.

In a third aspect, an embodiment of the present application provides a test apparatus, including a processor and a memory;

the memory for storing a computer program;

the processor is configured to execute the computer program and to implement the speech recognition testing method according to the first aspect when executing the computer program.

In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program causes the processor to implement the speech recognition testing method according to the first aspect.

The embodiments of the application provide a voice recognition testing method, a voice recognition testing apparatus, a test device and a storage medium. The method comprises the following steps: acquiring a test file, wherein the test file comprises a plurality of voice files generated based on different sound reception angles, together with the sound reception angle and the standard text corresponding to each voice file; sending the plurality of voice files to a plurality of devices to be tested, and instructing the devices to be tested to perform voice recognition on the voice files, so that each device to be tested generates a plurality of recognition texts in one-to-one correspondence with the voice files; acquiring the recognition texts generated by each of the devices to be tested; comparing the recognition text and the standard text corresponding to the same voice file to determine the recognition result of each device to be tested for each voice file; and determining the voice recognition accuracy of each device to be tested according to its recognition results for the plurality of voice files. Thus, during batch testing, the test device sends the plurality of voice files to the plurality of devices to be tested and then determines the voice recognition accuracy of each device to be tested. Because the voice files are generated based on different sound reception angles, and because during their generation there is no external interference and both the broadcast direction and the sound reception direction are fixed, the accuracy of batch testing is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a schematic diagram of an exemplary prior art electronic device with speech recognition function;

FIG. 2 is a diagram of an exemplary scenario for performing batch testing in the prior art;

FIG. 3 is a schematic flow chart illustrating a speech recognition testing method according to an embodiment of the present application;

FIG. 4 is an exemplary scenario diagram of test file generation in an embodiment of the present application;

FIG. 5 is a schematic block diagram of the structure of a test device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should also be understood that the terms "first," "second," "third," "fourth," and the like in the description, in the claims, or in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order, and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features indicated.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As shown in FIG. 1, an electronic device with a voice recognition function may include a microphone array, an audio processing module, a voice recognition module, and so on. The microphone array is used for receiving natural voice, for example the natural voice "how is the weather today la". The audio processing module is configured to perform audio processing on the natural voice received by the microphone array, such as dereverberation, noise reduction and gain compensation, and to output an audio stream file to the voice recognition module after the audio processing; for example, it processes the natural voice "how is the weather today la" received by the microphone array to generate a corresponding audio stream file. The voice recognition module is configured to perform voice recognition on the audio stream file input by the audio processing module to generate a recognition text, for example generating the recognition text "how is the weather today" from the audio stream file. It can be understood that the recognition text is not necessarily identical to the actual content of the natural voice. If the voice recognition accuracy of the electronic device is too low, the effectiveness of the device and the user experience are affected, so an accuracy test for voice recognition is indispensable.

In the prior art, a batch test mode is usually adopted for voice recognition tests of the same kind of electronic equipment, specifically, a playing source plays multiple pieces of natural voice in a laboratory, so that each electronic equipment can perform voice recognition respectively, and the voice recognition accuracy of each equipment can be determined respectively. The inventor finds that: in the batch test process, the position angles of each electronic device and the playing source are random, that is, the broadcasting direction of the playing source and the receiving direction of each electronic device are random, and the electronic devices are interfered with each other, so that the accuracy of the test result is influenced. Illustratively, as shown in fig. 2, the position angles of the respective electronic devices and the playing sources are random and interfere with each other.

To this end, an embodiment of the present application provides a speech recognition testing method applied to a test device. The test device may be, for example, a terminal device such as a personal computer, or a server.

As shown in fig. 3, the method includes steps S10 through S50.

Step S10, obtaining a test file, where the test file includes a plurality of voice files generated based on different sound reception angles, a sound reception angle corresponding to each voice file, and a standard text corresponding to each voice file.

Specifically, the test file acquired by the test device is mainly used for batch testing of voice recognition. To ensure the accuracy of the test results, the plurality of voice files in the test file are generated based on different sound reception angles, that is, based on a plurality of sound reception angles.

The plurality of voice files can be generated by a voice processing apparatus receiving natural voice at different sound reception angles. Specifically, the voice processing apparatus may receive natural voice and perform audio processing on it, thereby outputting an audio stream file; that is, each voice file is an audio stream file. The components of the voice processing apparatus may be the same as the corresponding components of the device under test. For example, the voice processing apparatus may include a microphone array and an audio processing module, in which case the microphone array in the voice processing apparatus and the microphone array in the device under test are the same electronic component, and likewise the audio processing module in the voice processing apparatus and the audio processing module in the device under test are the same software and/or hardware module. It can be understood that, for each device under test, the plurality of voice files are equivalent to that device having received natural voice from different sound reception angles; in other words, in the embodiments of the present application each device under test does not need to receive natural voice and generate voice files itself. In an embodiment, the plurality of voice files may be generated by a target device receiving natural voice from different sound reception angles, where the target device is one of the plurality of devices to be tested. Since the target device is of the same kind as the other devices to be tested, having the target device generate the voice files improves the convenience of batch testing.

The sound reception angle corresponding to a voice file represents the relative position of the voice processing apparatus and the playing source when the natural voice was received; it can be understood simply as the angle at which the voice processing apparatus received the natural voice. The standard text corresponding to a voice file refers to the actual content of the natural voice. For example, if a voice file is generated by the voice processing apparatus receiving the natural voice "hello" at a certain sound reception angle, the standard text corresponding to that voice file is "hello".

It can be understood that the plurality of voice files acquired by the test device are generated by the voice processing device receiving the natural voice alone, and in the process, when the voice processing device receives the natural voice, not only does not have external interference (for example, no interference caused by other devices to be tested), but also the relative position relationship between the voice processing device and the playing source is determined when the natural voice is received, that is, the broadcasting direction of the playing source is determined, and the sound receiving direction of the voice processing device is also determined, so that the accuracy of the batch test can be improved.

For example, the test file acquired by the test device may be generated in the following manner. In the exemplary scenario shown in FIG. 4, the playing source may play natural voice at the four preset positions in the drawing. When the playing source plays natural voice at different preset positions, the relative position of the voice processing apparatus and the playing source differs, that is, the sound reception angle differs; specifically, when the playing source plays natural voice at the four preset positions in the drawing, the sound reception angles are 45°, 90°, 135° and 225°, respectively. It should be further noted that the playing source may play different natural voices at the same preset position, or play the same natural voice at different preset positions, and the embodiments of the present application are not limited in this respect. For example, the playing source plays the natural voices "how is the weather today", "what news is there today" and "play music" at each of the four preset positions in the drawing. After the playing is completed, the voice processing apparatus has generated a plurality of voice files, and the sound reception angle and the standard text corresponding to each voice file can be annotated manually, so that the test device can acquire the test file.
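To make the structure of the test file concrete, the following is a minimal sketch of how the three pieces of information (voice file, sound reception angle, standard text) might be held together. The names `VoiceSample` and `build_test_file`, the `.pcm` file naming scheme, and the choice to record every utterance at every position are illustrative assumptions; the source does not prescribe any particular layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VoiceSample:
    """One entry of the test file: a voice file plus its annotations."""
    audio_path: str      # audio stream file produced by the voice processing apparatus
    angle_deg: float     # sound reception angle at recording time
    standard_text: str   # actual content of the natural voice


def build_test_file(angles: List[float], utterances: List[str]) -> List[VoiceSample]:
    """Pair every preset position (angle) with every utterance."""
    samples = []
    for angle in angles:
        for index, text in enumerate(utterances):
            # Hypothetical naming scheme, chosen only for this example.
            path = f"angle{int(angle):03d}_utt{index}.pcm"
            samples.append(VoiceSample(path, angle, text))
    return samples


test_file = build_test_file(
    angles=[45, 90, 135, 225],
    utterances=["how is the weather today", "what news is there today", "play music"],
)
```

With four positions and three utterances this yields twelve entries, each carrying the angle and standard text needed later for comparison and per-angle statistics.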

Step S20, sending the multiple voice files to multiple devices under test, and instructing the multiple devices under test to perform voice recognition on the multiple voice files, so that the multiple devices under test generate multiple recognition texts corresponding to the multiple voice files one to one.

After acquiring the test file, the test device can perform the batch test. Specifically, the test device sends the plurality of voice files in the test file to each device to be tested and instructs each device to be tested to perform voice recognition on them. After completing voice recognition, each device to be tested has generated a plurality of recognition texts in one-to-one correspondence with the voice files; that is, for each device to be tested, every voice file has a corresponding recognition text. In an embodiment, the test device may send the plurality of voice files to a device to be tested by calling an SDK interface service preset on that device.

As can be seen from the foregoing discussion, the multiple voice files can be generated by the device under test receiving natural voices from different radio angles, so that the device under test performs voice recognition on the voice files to generate recognition texts, and accuracy of the voice recognition is not affected.

For example, the plurality of voice files may include a voice file 1 and a voice file 2. Voice file 1 may correspond to a sound reception angle of 45° and the standard text "how is the weather today la", and voice file 2 may correspond to a sound reception angle of 90° and the standard text "what news is there today". The test device may send the two voice files to each device under test and instruct each device under test to perform voice recognition. Since the two voice files are already in a recognizable state, each device under test can recognize them directly and generate a recognition text for each voice file; for example, a certain device under test generates the recognition text "how is the weather today" for voice file 1 and the recognition text "what news is there today" for voice file 2.
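The dispatch in step S20 can be sketched as a loop over devices and voice files. The sketch below models each device under test as a plain callable from a voice file name to a recognition text; this stands in for the preset SDK interface service, whose actual interface the source does not describe. The names `run_batch`, `canned` and the device identifiers are hypothetical.

```python
from typing import Callable, Dict, List

# A device under test is modelled as a callable: voice file name -> recognition text.
Recognizer = Callable[[str], str]


def run_batch(voice_files: List[str],
              devices: Dict[str, Recognizer]) -> Dict[str, Dict[str, str]]:
    """Send every voice file to every device and collect the recognition texts."""
    results = {}
    for device_id, recognize in devices.items():
        results[device_id] = {name: recognize(name) for name in voice_files}
    return results


# Stand-in recognizers that look answers up in a table (purely illustrative).
canned = {
    "voice_file_1": "how is the weather today",
    "voice_file_2": "what news is there today",
}
devices = {
    "device_1": lambda name: canned[name],
    "device_2": lambda name: canned[name].upper(),
}

recognized = run_batch(["voice_file_1", "voice_file_2"], devices)
```

The result is keyed first by device and then by voice file, which keeps the one-to-one correspondence between voice files and recognition texts explicit for each device.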

Step S30, acquiring a plurality of recognition texts generated by each of the plurality of devices under test.

Step S40, comparing the recognition text and the standard text corresponding to the same voice file to determine the recognition result of each of the devices to be tested for each voice file.

After each device to be tested generates its recognition texts, the test device can acquire the recognition texts generated by each device to be tested and perform text comparison on them. Specifically, as can be seen from the foregoing discussion, each voice file in the test file has a corresponding standard text, and each recognition text is generated by a device under test from a voice file, so for a given device under test one voice file corresponds to one standard text and one recognition text. Illustratively, if the standard text corresponding to voice file 1 is A and the recognition text generated by device under test 1 from voice file 1 is B1, then standard text A and recognition text B1 correspond to the same voice file; similarly, if the recognition text generated by device under test 2 from voice file 1 is B2, then standard text A and recognition text B2 correspond to the same voice file.

Therefore, for a device to be tested, the test device can perform text comparison (i.e., text content comparison) on the recognition text corresponding to the same voice file and the standard text, and determine the recognition result of the device to be tested on each voice file. It will be appreciated that in the same manner, the test device may determine the recognition result of each voice file by each device under test.

For example, the devices under test may include a device under test A and a device under test B, and the plurality of voice files in the test file may include a voice file a and a voice file b, where voice file a corresponds to standard text a and voice file b corresponds to standard text b. The test device may send both voice files to both devices under test and instruct both devices under test to perform voice recognition. Device under test A may then generate recognition text A1 and recognition text A2 for voice file a and voice file b, respectively, and device under test B may generate recognition text B1 and recognition text B2 for voice file a and voice file b, respectively. The test device then obtains the recognition texts generated by each device under test and performs text comparison. Specifically, for device under test A, it compares recognition text A1 with standard text a (both corresponding to voice file a) and recognition text A2 with standard text b (both corresponding to voice file b), thereby determining the recognition results of device under test A for the two voice files. Device under test B is handled in the same way, which is not repeated here.
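The pairing described in this example can be sketched as a lookup keyed by voice file, so that each recognition text is only ever compared with the standard text of the same voice file. The example texts and the function name `compare_all` are made up for illustration, and plain string equality is used here as the simplest possible comparison.

```python
from typing import Dict

# Standard text per voice file (voice files "a" and "b").
standard_texts = {"a": "play music", "b": "what news is there today"}

# recognition_texts[device][voice_file] -> text returned by that device.
recognition_texts = {
    "A": {"a": "play music", "b": "what news is there today"},
    "B": {"a": "play music", "b": "what shoes are there today"},
}


def compare_all(standard: Dict[str, str],
                recognized: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, bool]]:
    """For each device, compare the texts that belong to the same voice file."""
    return {
        device: {voice: texts.get(voice) == expected
                 for voice, expected in standard.items()}
        for device, texts in recognized.items()
    }


results = compare_all(standard_texts, recognition_texts)
```

In this made-up data, device A matches both standard texts while device B misrecognizes voice file b, so only that one cell of the result table is `False`.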

In an embodiment, the step S40 of comparing the recognition text corresponding to the same voice file with the standard text to determine the recognition result of the voice file by the device under test includes:

Step S410, determining whether the text content of the recognition text is consistent with the text content of the standard text.

Step S420, if yes, determining that the recognition result of the device under test for the voice file is valid.

Specifically, the text comparison may adopt a consistency comparison strategy, that is, whether the text content of the recognized text is consistent with the text content of the standard text is judged, if so, it is determined that the recognition result of the device to be tested on the voice file is valid, and if not, it is determined that the recognition result of the device to be tested on the voice file is invalid, where the fact that the recognition result of the device to be tested on the voice file is valid indicates that the device to be tested can successfully recognize the natural voice corresponding to the voice file, and conversely, the fact that the recognition result of the device to be tested on the voice file is invalid indicates that the device to be tested cannot successfully recognize the natural voice corresponding to the voice file. For example, the standard text corresponding to the voice file may be "what news exists today", and the recognition text generated by the device under test after performing voice recognition may be "what news exists today", so that the test device may compare the texts after obtaining the recognition text, and obviously it can be determined that the text contents of the two are consistent, and therefore it can be determined that the recognition result of the device under test on the voice file is valid. It is understood that in the same manner, the recognition result of each device under test for each voice file can be determined.

In an embodiment, after step S410, the method further includes:

Step S430, if the text content of the recognition text is inconsistent with the text content of the standard text, determining the similarity between the text content of the recognition text and the text content of the standard text.

Step S440, if the similarity is greater than or equal to a preset threshold, determining that the recognition result of the device under test for the voice file is valid.

Step S450, if the similarity is smaller than the preset threshold, determining that the recognition result of the device under test for the voice file is invalid.

In some cases, the actual content of the natural voice may include modal particles (e.g., "la") that are not easily picked up by the device under test, which may cause the text content of the recognition text to differ from that of the standard text. For example, the text content of the standard text is "how is the weather today la" and the text content of the recognition text is "how is the weather today". In this case, directly determining the recognition result to be invalid would distort the measured voice recognition accuracy of the device under test. Therefore, in order to further improve the accuracy of batch testing, the embodiments of the present application make a further determination when the text contents are inconsistent. Specifically, the similarity between the two text contents is determined; if the similarity is greater than or equal to a preset threshold, the recognition result of the device under test for the voice file is determined to be valid, and if the similarity is lower than the preset threshold, the recognition result is determined to be invalid. The preset threshold can be set according to the actual situation. For example, the text content of the recognition text may be "how is the weather today", the text content of the standard text may be "how is the weather today la", and the preset threshold may be 90%; the similarity between the two text contents exceeds the preset threshold, so the recognition result for the voice file may be determined to be valid.
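Steps S410 to S450 amount to a two-stage decision: exact consistency first, then a similarity threshold. The sketch below shows that control flow only. As a stand-in similarity it uses `difflib.SequenceMatcher` from the Python standard library, which is a character-overlap ratio and not the sentence-vector measure described in this application; the function names and the 0.9 default are assumptions taken from the 90% example.

```python
from difflib import SequenceMatcher
from typing import Callable


def ratio_similarity(recognized: str, standard: str) -> float:
    """Stand-in similarity in [0, 1]; NOT the sentence-vector measure of the source."""
    return SequenceMatcher(None, recognized, standard).ratio()


def is_valid(recognized: str, standard: str, threshold: float = 0.9,
             similarity: Callable[[str, str], float] = ratio_similarity) -> bool:
    """Return True if the recognition result counts as valid."""
    # S410/S420: identical text content is valid outright.
    if recognized == standard:
        return True
    # S430-S450: otherwise compare a similarity score with the preset threshold.
    return similarity(recognized, standard) >= threshold
```

Passing the similarity function as a parameter keeps the decision logic independent of how similarity is computed, so a sentence-vector cosine measure can be swapped in without changing `is_valid`.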

In one embodiment, the step S430 of determining the similarity between the text content of the recognized text and the text content of the standard text includes:

Step S431, converting the recognition text into a first sentence vector, and converting the standard text into a second sentence vector.

Step S432, determining a cosine distance value based on the first sentence vector and the second sentence vector, and taking the cosine distance value as the similarity.

Specifically, the test device can convert the recognition text and the standard text into sentence vectors through a preset word2vec model, and then calculate the cosine distance value from the two sentence vectors, thereby determining the similarity between the text contents of the recognition text and the standard text.

In one embodiment, the first sentence vector converted from the text content of the recognition text may be represented as Y = [y₁, y₂, y₃, ..., y_n], where Y represents the first sentence vector, y_n represents a word vector, and n represents the number of characters of the recognition text; for example, if the text content of the recognition text is "how is the weather today" (seven characters in the original Chinese), n is 7 and y₁ is the word vector corresponding to the first character. The second sentence vector converted from the text content of the standard text may be represented as X = [x₁, x₂, x₃, ..., x_n], where X represents the second sentence vector, x_n represents a word vector, and n represents the number of characters of the standard text; for example, if the text content of the standard text is "how is the weather today la" (eight characters in the original Chinese), n is 8 and x₈ is the word vector corresponding to "la". Based on this, the cosine distance value cos θ can be calculated by the following formula and taken as the similarity between the text contents of the recognition text and the standard text:

$$\cos\theta = \frac{X \cdot Y}{\lVert X \rVert \, \lVert Y \rVert}$$
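As a rough illustration of steps S431 and S432, the sketch below computes the cosine of the angle between two sentence vectors. Two things here are assumptions rather than details from the source: the tiny hand-written table `TOY_VECTORS` stands in for a trained word2vec model, and each sentence vector is obtained by averaging its character vectors so that texts of different lengths yield vectors of equal dimension.

```python
import math
from typing import Dict, List, Sequence

# Toy 3-dimensional character vectors; a trained word2vec model would supply real ones.
TOY_VECTORS: Dict[str, List[float]] = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [0.0, 0.0, 1.0],
}


def sentence_vector(text: str, table: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the characters found in the table (an assumed pooling)."""
    vectors = [table[ch] for ch in text if ch in table]
    if not vectors:
        raise ValueError("no known characters in text")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# "ab" plays the role of the recognition text, "abc" the standard text with one extra character.
similarity = cosine(sentence_vector("ab", TOY_VECTORS), sentence_vector("abc", TOY_VECTORS))
```

With orthogonal toy vectors the missing character lowers the score noticeably; real embeddings of a short modal particle would typically shift the averaged vector much less.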

Step S50, determining the voice recognition accuracy of each of the devices to be tested according to its recognition results for the plurality of voice files.

For a device to be tested, after text comparison is completed, the test device can determine the voice recognition accuracy of the device to be tested according to the recognition results of the device to be tested on a plurality of voice files. In the same manner, the test equipment can determine the speech recognition accuracy of each device under test.

It can be understood that the embodiment of the present application can determine the overall speech recognition accuracy of the device under test, and the foregoing discussion can show that the speech file corresponds to the radio reception angle, so that the embodiment of the present application can also determine the speech recognition accuracy of the device under test at a certain radio reception angle, the speech recognition accuracy in a certain radio reception angle range, and the like.

In one embodiment, the test device may store the speech recognition accuracy of the device under test in a blockchain node, where the blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.

In one embodiment, step S50 includes at least one of:

determining the voice recognition accuracy of the device under test at a sound reception angle according to the recognition results of a plurality of voice files corresponding to the same sound reception angle;

determining the voice recognition accuracy of the device under test within a preset sound reception angle range according to the recognition results of a plurality of voice files corresponding to the preset sound reception angle range; and

determining the overall voice recognition accuracy of the device under test according to the recognition results of all the voice files.

As can be seen from the foregoing discussion, each voice file corresponds to a sound reception angle. Therefore, to determine the voice recognition accuracy of the device under test when the sound source is at a certain position (that is, the voice recognition accuracy of the device under test at a target sound reception angle), the accuracy can be determined according to the recognition results of all the voice files corresponding to the target sound reception angle. For example, if 100 voice files correspond to the target sound reception angle and the recognition results of 91 of them are valid, the voice recognition accuracy of the device under test at the target sound reception angle is 91%.

Similarly, to determine the speech recognition accuracy of the device under test when the sound source is within a certain position range (that is, the speech recognition accuracy of the device under test within a target sound reception angle range), the accuracy can be determined according to the recognition results of all the voice files whose sound reception angles fall within that range. For example, if the target sound reception angle range is 30° to 150°, 100 voice files have sound reception angles within the range, and the recognition results of 90 of them are valid, the voice recognition accuracy of the device under test within the target sound reception angle range is 90%.

Of course, to determine the overall speech recognition accuracy of the device under test, the accuracy may be determined according to the recognition results of all the voice files. For example, if there are 100 voice files and the recognition results of 89 of them are valid, the overall speech recognition accuracy of the device under test is 89%.
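The three accuracy calculations described above can be sketched as follows. This is a minimal illustration and not the implementation of the present application: the names `RecognitionResult`, `accuracy`, `accuracy_at_angle`, and `accuracy_in_range` are hypothetical, and treating the angle range as inclusive at both ends is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RecognitionResult:
    """Outcome for one voice file on one device under test (hypothetical type)."""
    angle_deg: float  # sound reception angle associated with the voice file
    valid: bool       # True if the recognition result was judged valid


def accuracy(results: Iterable[RecognitionResult]) -> Optional[float]:
    """Fraction of valid recognition results, or None if there are no results."""
    items = list(results)
    if not items:
        return None
    return sum(1 for r in items if r.valid) / len(items)


def accuracy_at_angle(results, angle_deg):
    """Accuracy over the voice files recorded at one sound reception angle."""
    return accuracy(r for r in results if r.angle_deg == angle_deg)


def accuracy_in_range(results, low_deg, high_deg):
    """Accuracy over voice files whose angle lies in [low_deg, high_deg] (assumed inclusive)."""
    return accuracy(r for r in results if low_deg <= r.angle_deg <= high_deg)
```

Calling `accuracy` on all results of one device gives its overall accuracy, while the two filtered variants give the per-angle and per-range figures; repeating this per device yields the batch result.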

In an embodiment, the test device may establish a connection with the device under test through the TCP (Transmission Control Protocol)/UDP (User Datagram Protocol) protocol. It can be understood that, during batch testing, data transmission between the test device and the devices under test may not require the participation of a third-party device (e.g., a third-party server). The test device may therefore establish a connection with each device under test through the TCP/UDP protocol and transmit data to it over that connection; for example, the test device sends the plurality of voice files to the device under test based on the TCP/UDP protocol. Because no third-party device is involved, data transmission based on the TCP/UDP protocol can help ensure the security of data transmission during batch testing.
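As an illustration of sending a voice file directly over a TCP connection, the sketch below frames each file with a small length-prefixed header. The description does not specify any message format, so the framing (a 2-byte name length and a 4-byte payload length in network byte order) and the function names `send_voice_file` and `recv_voice_file` are assumptions made only for this example.

```python
import socket
import struct

# Assumed header layout: name length (unsigned 16-bit), payload length (unsigned 32-bit),
# both in network byte order. This framing is hypothetical, not taken from the source.
_HEADER = struct.Struct("!HI")


def send_voice_file(sock: socket.socket, name: str, payload: bytes) -> None:
    """Send one voice file over a connected stream socket."""
    encoded_name = name.encode("utf-8")
    header = _HEADER.pack(len(encoded_name), len(payload))
    sock.sendall(header + encoded_name + payload)


def _recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, since recv() may return fewer than requested."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_voice_file(sock: socket.socket):
    """Receive one voice file framed by send_voice_file; returns (name, payload)."""
    name_len, payload_len = _HEADER.unpack(_recv_exact(sock, _HEADER.size))
    name = _recv_exact(sock, name_len).decode("utf-8")
    return name, _recv_exact(sock, payload_len)
```

On the test device side, the socket would typically come from `socket.create_connection((host, port))` for each device under test; explicit framing is needed because TCP delivers a byte stream without message boundaries.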

In combination with the foregoing discussion, it can be seen that the speech recognition testing method provided in the embodiments of the present application includes: acquiring a test file, wherein the test file comprises a plurality of voice files generated based on different sound reception angles, and the sound reception angles and standard texts corresponding to the voice files; sending the plurality of voice files to a plurality of devices under test, and instructing the plurality of devices under test to perform voice recognition on the plurality of voice files respectively, so that each device under test generates a plurality of recognition texts corresponding one-to-one to the plurality of voice files; acquiring the plurality of recognition texts generated by each of the plurality of devices under test; comparing the recognition text and the standard text corresponding to the same voice file to determine the recognition result of each device under test on each voice file; and determining the respective voice recognition accuracy of the plurality of devices under test according to their respective recognition results on the plurality of voice files. Therefore, during batch testing, the test device can send the plurality of voice files to the plurality of devices under test and then determine the voice recognition accuracy of each device under test. It can be understood that the plurality of voice files are generated based on different sound reception angles, and that during their generation there is no external interference and the broadcast direction and the sound reception direction are both determined, so the accuracy of batch testing is improved.

The embodiment of the present application further provides a speech recognition testing apparatus, and the apparatus includes:

The specific implementation of the above scheme is discussed in the foregoing, and is not described in detail here.

An embodiment of the present application further provides a test device, as shown in FIG. 5. The test device includes a processor and a memory; the memory is used to store a computer program, and the processor is configured to execute the computer program and, when executing it, implement any one of the speech recognition test methods provided in the embodiments of the present application.

It should be understood that the processor may be a Central Processing Unit (CPU), and may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor.

The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the processor is enabled to implement any one of the voice recognition test methods provided by the embodiment of the present application.

It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, and the functional modules/units in the systems and devices disclosed above, may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer-readable storage media (or non-transitory media) and communication media (or transitory media).

The term computer-readable storage medium includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer-readable storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

For example, the computer readable storage medium may be an internal storage unit of the testing device described in the foregoing embodiment, for example, a hard disk or a memory of the testing device. The computer readable storage medium may also be an external storage device of the test device, such as a plug-in hard disk provided on the computer terminal, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like.

While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A speech recognition testing method for testing a device, the method comprising:

2. The method according to claim 1, wherein the comparing the recognition text corresponding to the same voice file with the standard text to determine the recognition result of the device under test on the voice file comprises:

determining whether the text content of the recognized text is consistent with the text content of the standard text;

and if they are consistent, determining that the recognition result of the device under test on the voice file is valid.

3. The method according to claim 2, wherein the comparing the recognition text corresponding to the same voice file with the standard text to determine the recognition result of the device under test on the voice file further comprises:

if the text content of the recognition text is determined to be inconsistent with the text content of the standard text, determining the similarity between the text content of the recognition text and the text content of the standard text;

if the similarity is larger than or equal to a preset threshold value, determining that the recognition result of the equipment to be tested on the voice file is valid;

and if the similarity is smaller than the preset threshold, determining that the recognition result of the equipment to be tested on the voice file is invalid.

4. The method of claim 3, wherein determining the similarity between the text content of the recognized text and the text content of the standard text comprises:

converting the recognized text into a first sentence vector;

converting the standard text into a second sentence vector;

determining a cosine angle distance value based on the first sentence vector and the second sentence vector, and taking the cosine angle distance value as the similarity.

5. The method of claim 1, wherein determining the speech recognition accuracy of the device under test according to the recognition result of the device under test on the plurality of speech files comprises at least one of:

determining the voice recognition accuracy of the equipment to be tested under the radio reception angle according to the recognition results of a plurality of voice files corresponding to the same radio reception angle;

6. The method of any of claims 1-5, wherein the plurality of voice files are generated by a target device receiving natural voice from different radio angles, the target device being one of the plurality of devices under test.

7. The method according to any of claims 1-5, wherein the test device establishes a connection with the device under test via TCP/UDP protocol.

8. A speech recognition test device, comprising:

9. A test apparatus comprising a processor and a memory;

the memory for storing a computer program;

the processor for executing the computer program and implementing the speech recognition testing method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to carry out the speech recognition test method according to any one of claims 1 to 7.