WO2018157840A1 - Voice recognition test method, test terminal, computing device, and storage medium - Google Patents

Voice recognition test method, test terminal, computing device, and storage medium

Info

Publication number
WO2018157840A1
Authority
WO
WIPO (PCT)
Prior art keywords
tested
test
result
voice recognition
speech recognition
Prior art date
Application number
PCT/CN2018/077784
Other languages
English (en)
French (fr)
Inventor
单永生
张驰
王亚军
Original Assignee
广东神马搜索科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广东神马搜索科技有限公司
Publication of WO2018157840A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals

Definitions

  • the present invention relates to the field of voice recognition, and in particular to a voice recognition test method, a test terminal, a computing device, and a storage medium.
  • an object of the embodiments of the present invention is to provide a voice recognition test method and a test terminal.
  • a voice recognition test method for use in a voice recognition test system, the system comprising a test terminal, a client terminal, and a voice recognition server, the test terminal being electrically connected to the client terminal through an audio transmission line, and the client terminal being communicatively connected to the voice recognition server through a network, the method including:
  • the test terminal transmits the voice data generated by the audio file to be tested played by the test terminal to the client terminal through the audio transmission line;
  • the client terminal encodes the received voice data, and sends the encoded voice data to the voice recognition server for voice recognition;
  • the voice recognition server identifies the voice data and transmits the voice recognition result to the client terminal;
  • the test terminal acquires the voice recognition result from the client terminal; and
  • the test terminal compares the speech recognition result with a pre-stored standard result corresponding to the to-be-tested audio file to obtain a test result.
  • a test terminal, wherein the test terminal is electrically connected to a client terminal through an audio transmission line, and the client terminal is communicatively coupled to a voice recognition server, the test terminal comprising:
  • a voice recognition test device installed/stored in the memory and executed by the processor
  • the voice recognition test device includes:
  • a voice data transmission module configured to transmit voice data generated by the audio file to be tested played by the test terminal to the client terminal through an audio transmission line, so that the client terminal encodes the voice data and sends the voice data to the voice recognition a server, the voice recognition server transmitting a voice recognition result to the client terminal;
  • a recognition result obtaining module configured to acquire the voice recognition result from the client terminal
  • the test result generating module is configured to compare the speech recognition result with the pre-stored standard result corresponding to the to-be-tested audio file to obtain a test result.
  • a voice recognition test method is further provided, which is applied to a test terminal, wherein the test terminal is electrically connected to a client terminal through an audio transmission line, and the client terminal is communicatively connected to a voice recognition server; the method includes:
  • a computing device comprising: a processor; and a memory having executable code stored thereon which, when executed by the processor, causes the processor to perform the method of the first aspect and the third aspect of the invention described above.
  • a non-transitory machine readable storage medium having executable code stored thereon which, when executed by a processor of an electronic device, causes the processor to perform the method of the first aspect and the third aspect of the invention described above.
  • the voice recognition test method and the test terminal of the present invention use the test terminal to simulate user voice input directly, transmitting voice data to the client terminal through the audio transmission line; the client terminal sends the received voice data to the voice recognition server, which implements automated voice search testing.
  • the voice data is transmitted to the client terminal through the audio transmission line, which most realistically simulates the user usage scene, and can avoid the problem that the test accuracy is unreliable due to interference of external factors such as noise, and greatly improves the test efficiency.
  • FIG. 1 is a schematic diagram of a speech recognition test system in accordance with one embodiment of the present invention.
  • FIG. 2 is a block schematic diagram of a test terminal in accordance with one embodiment of the present invention.
  • FIG. 3 is a schematic diagram of functional modules of a speech recognition test apparatus according to an embodiment of the present invention.
  • FIG. 4 is a flow chart of a speech recognition test method in accordance with one embodiment of the present invention.
  • FIG. 5 is a flow chart of a speech recognition test method according to another embodiment of the present invention.
  • FIG. 1 is a schematic diagram of a speech recognition test system in accordance with one embodiment of the present invention.
  • the voice recognition test system of the present invention may include: a test terminal 100, a client terminal 200, and a voice recognition server 300.
  • the test terminal 100 and the client terminal 200 may be a personal computer (PC), a tablet computer, a smart phone, a personal digital assistant (PDA), or the like.
  • the test terminal 100 can be a PC for testing the voice recognition function of the client terminal 200, and the client terminal 200 can be a mobile terminal, such as a mobile phone or tablet computer, on which voice recognition software (such as search software, a browser, or instant messaging software) is installed.
  • the voice recognition server 300 is in communication connection with one or more client terminals 200 over a network for data communication or interaction.
  • the speech recognition server 300 can include, but is not limited to, a network speech recognition server, a database speech recognition server, and the like.
  • FIG. 2 is a block schematic diagram of a test terminal in accordance with one embodiment of the present invention.
  • the test terminal 100 of the present invention may include a voice recognition test device 110, a memory 111, a memory controller 112, a processor 113, a peripheral interface 114, an input and output unit 115, an audio unit 116, and a display unit 117.
  • the memory 111, the memory controller 112, the processor 113, the peripheral interface 114, the input and output unit 115, the audio unit 116, and the display unit 117 are directly or indirectly electrically connected to each other to implement data transmission or interaction.
  • the components can be electrically connected to one another via one or more communication buses or signal lines.
  • the voice recognition test apparatus 110 includes at least one software function module that can be stored in the memory 111 or in an operating system (OS) of the test terminal in the form of software or firmware.
  • the processor 113 is configured to execute an executable module stored in a memory, such as a software function module or a computer program included in the voice recognition test device 110.
  • the memory 111 can be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and the like.
  • RAM random access memory
  • ROM read only memory
  • PROM programmable read-only memory
  • EPROM Erasable Programmable Read-Only Memory
  • EEPROM Electric Erasable Programmable Read-Only Memory
  • the memory 111 is used to store a program, and the processor 113 executes the program after receiving the execution instruction.
  • the method executed by the test terminal 100, as defined by the flow disclosed in any embodiment of the present invention, may be applied to the processor 113 or implemented by the processor 113.
  • the processor 113 may be an integrated circuit chip with signal processing capabilities.
  • the processor 113 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • CPU central processing unit
  • NP network processor
  • DSP digital signal processor
  • ASIC application specific integrated circuit
  • FPGA Field Programmable Gate Array
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • peripheral interface 114 couples various input/output devices to the processor 113 and the memory 111.
  • peripheral interface 114, processor 113, and memory controller 112 can be implemented in a single chip. In other instances, they can be implemented by separate chips.
  • the input and output unit 115 is configured to let the user input data.
  • the input and output unit 115 may be, but not limited to, a mouse, a keyboard, and the like.
  • the audio unit 116 provides an audio interface to a user, which may include one or more microphones, one or more speakers, and audio circuitry.
  • the display unit 117 provides an interactive interface (such as a user operation interface) between the test terminal 100 and the user, or displays image data for the user's reference.
  • the display unit 117 can be a liquid crystal display or a touch display.
  • if it is a touch display, it can be a capacitive or resistive touch screen that supports single-point and multi-touch operations, meaning that the touch display can sense touch operations occurring simultaneously at one or more positions on it and hand the sensed touch operations to the processor for calculation and processing.
  • FIG. 3 is a schematic diagram of functional modules of a speech recognition test apparatus according to an embodiment of the present invention.
  • the voice recognition test apparatus 110 includes a voice data transmission module 1101, a recognition result acquisition module 1102, a test result generation module 1103, a recognition result judgment module 1104, and a data deletion module 1105.
  • the test result generating module 1103 specifically includes: a correct word count calculating unit 11031 and an accuracy rate calculating unit 11032.
  • FIG. 4 is a flow chart of a speech recognition test method applied to the speech recognition test system shown in FIG. 1 according to an embodiment of the present invention. The specific flow shown in FIG. 4 will be described in detail below.
  • in step S101, the test terminal 100 transmits the voice data generated by playing the audio file to be tested on the test terminal 100 to the client terminal 200 through an audio transmission line.
  • the process described in step S101 can be performed and implemented by the voice data transmission module 1101.
  • the test terminal 100 can be electrically connected to the client terminal 200 through an audio transmission line.
  • the test terminal 100 can be connected to the microphone of the client terminal 200 through an audio transmission line.
  • the audio file to be tested can be played through the test terminal 100, and then the voice data is generated.
  • the test terminal 100 transmits the voice data generated by playing the audio file to be tested to the client terminal 200 through the audio transmission line.
  • in step S102, the client terminal 200 encodes the received voice data, and transmits the encoded voice data to the voice recognition server 300 for voice recognition.
  • the step S102 can be performed by the to-be-tested application with the voice recognition function installed on the client terminal 200; after encoding the received voice data, the to-be-tested application can send it directly to the voice recognition server 300 for voice recognition, so that the voice recognition function of the application to be tested is exercised automatically.
  • in step S103, the voice recognition server 300 recognizes the voice data and transmits the voice recognition result to the client terminal 200.
  • the result obtained when the voice recognition server 300 recognizes the voice data may be a corresponding character string. For example, if the voice data is Chinese speech, the voice recognition result is a character string composed of Chinese characters; if the voice data is English speech, the voice recognition result is a character string composed of English words or letters.
  • the client terminal 200 may generate a result log of the received voice recognition result, and store the result log to a system log buffer.
  • the client terminal 200 can use android.util.Log to print the final result to the system log buffer as a log.
  • in step S104, the test terminal 100 acquires the voice recognition result from the client terminal 200.
  • the process described in step S104 is performed and implemented by the recognition result obtaining module 1102.
  • test terminal 100 can directly obtain the speech recognition result from the system log buffer area of the client terminal 200.
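A minimal sketch of how a test terminal might read the result back from the Android system log buffer. The text only says that the client writes the result with android.util.Log and that the test terminal reads the log buffer; using `adb logcat` over a USB connection, the tag name `AsrResult`, and the function names are all assumptions made for illustration.

```python
import re
import subprocess

# Hypothetical log tag; the source does not name the tag the app uses.
RESULT_TAG = "AsrResult"

# Matches the "<priority> <tag>: <message>" part of a logcat line
# in the default "threadtime" layout.
_LINE = re.compile(r"\s([VDIWEF])\s+(?P<tag>[^\s:]+)\s*:\s(?P<msg>.*)$")


def extract_results(log_text: str, tag: str = RESULT_TAG) -> list[str]:
    """Return the messages logged under `tag`, in order of appearance."""
    results = []
    for line in log_text.splitlines():
        match = _LINE.search(line)
        if match and match.group("tag") == tag:
            results.append(match.group("msg").strip())
    return results


def dump_results(tag: str = RESULT_TAG) -> list[str]:
    """Dump the device log buffer once via adb (assumed transport)."""
    completed = subprocess.run(
        ["adb", "logcat", "-d", "-s", f"{tag}:V"],
        capture_output=True, text=True, check=True,
    )
    return extract_results(completed.stdout, tag)
```

Keeping the parsing separate from the `adb` call lets the parser be checked against captured log text without a device attached.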
  • in step S105, the test terminal 100 compares the speech recognition result with the pre-stored standard result corresponding to the to-be-tested audio file to obtain a test result.
  • the process described in step S105 can be performed and implemented by the test result generation module 1103.
  • the corresponding standard result may be a manually labeled character string corresponding to the audio file to be tested; it is obtained manually from the content of the audio file to be tested and recorded in the test terminal.
  • before using the test terminal 100 for the voice recognition test, the tester stores a plurality of the to-be-tested audio files in the memory 111, and stores the manually labeled character string (standard result) corresponding to each to-be-tested audio file in association with that audio file.
  • for example, the degree of similarity between the two can be obtained by comparing the speech recognition result with the manually labeled character string.
  • the test result may be the match rate between the speech recognition result and the manually labeled character string.
  • step S105 specifically includes: comparing the speech recognition result with the standard result corresponding to the audio file to be tested to obtain the number of correctly recognized characters in the speech recognition result.
  • the process described in this step can be performed and implemented by the correct word count computing unit 11031.
  • for example, the standard result corresponding to the audio file to be tested may be “I am going to work at nine o'clock today”, and the speech recognition result may be “I am going to work near today”; the number of correctly recognized characters is then five.
  • the speech recognition accuracy rate of the audio file to be tested is calculated according to the number of correctly recognized characters and the number of characters included in the standard result.
  • the process described in this step can be performed and implemented by the accuracy calculation unit 11032.
  • for example, the standard result corresponding to the audio file to be tested may be “I am going to work at nine o'clock today”, and the speech recognition result may be “I am going to work today”; the number of correctly recognized characters is then five.
  • the standard result corresponding to the audio file to be tested contains seven characters, so the speech recognition accuracy is five-sevenths.
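The two sub-steps of S105 can be sketched as follows. The text does not say how "correctly recognized characters" are counted; this sketch assumes the longest common subsequence between the standard result and the recognition result, and the function names are illustrative.

```python
def count_correct(standard: str, recognized: str) -> int:
    """Count matching characters as the longest common subsequence length.

    Assumption: the source does not specify the matching rule.
    """
    prev = [0] * (len(recognized) + 1)
    for s_char in standard:
        curr = [0]
        for j, r_char in enumerate(recognized, start=1):
            if s_char == r_char:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def accuracy(standard: str, recognized: str) -> float:
    """Correct characters divided by the length of the standard result."""
    if not standard:
        raise ValueError("standard result must not be empty")
    return count_correct(standard, recognized) / len(standard)
```

With a seven-character standard result such as "abcdefg" and a recognition result "abxyefg", five characters match, so the accuracy is 5/7, which mirrors the five-of-seven example in the text.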
  • test terminal 100 may continuously send a plurality of different sets of audio files to be tested to the client terminal 200, and the step S105 may further include the following steps:
  • the process described in this step can be performed and implemented by the correct word count computing unit 11031.
  • the test result may include: the number of correct characters in a single voice test result, the total number of words in the voice data generated by that audio file to be tested, the total number of correct characters across the multiple sets of voice test results, the total number of words in the voice data generated by all the audio files to be tested, the error rate of a single voice test result, and the total error rate of the multiple sets of voice test results.
  • ai denotes the number of correct characters in the i-th voice test result
  • bi denotes the total number of words in the voice data generated by the i-th audio file to be tested
  • m denotes the total number of correct characters across the multiple sets of voice test results
  • [symbol missing] denotes the total number of words in the voice data generated by all the audio files to be tested
  • wi denotes the error rate of the i-th voice test result
  • wt denotes the total error rate of the multiple sets of voice test results.
  • the calculation formula of the test result can be:
  • n is the number of sets of voice data generated by the audio file to be tested for one test.
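The formula itself is not reproduced in this text. Given the symbol definitions above, and assuming each error rate is the complement of the corresponding accuracy, one consistent reading is sketched below. The symbol $B$ for the total word count is a placeholder introduced here, since the original symbol is not given.

```latex
\begin{aligned}
w_i &= 1 - \frac{a_i}{b_i}, \qquad i = 1, \dots, n \\
m   &= \sum_{i=1}^{n} a_i, \qquad B = \sum_{i=1}^{n} b_i \\
w_t &= 1 - \frac{m}{B}
\end{aligned}
```

Under this reading, $w_t$ pools all characters before dividing, so longer audio files weigh more than shorter ones.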
  • the test terminal 100 directly simulates user voice input and transmits the voice data to the client terminal 200, and the client terminal 200 sends the received voice data to the voice recognition server, implementing an automated voice search test.
  • the voice data is transmitted to the client terminal 200 through the audio transmission line, which most realistically simulates the user usage scene, avoids interference of external factors such as noise, and improves the accuracy and efficiency of the automated voice recognition test.
  • the test terminal 100 may further determine whether the voice recognition result satisfies a preset condition; when it does not, the test terminal 100 is triggered to play the audio file to be tested that corresponds to that voice recognition result and transmit it to the client terminal 200 through the audio transmission line, so that the audio file is tested again.
  • the preset condition may be that the number of characters corresponding to the speech recognition result exceeds three characters.
  • the test terminal 100 may delete the audio file to be tested after determining that the voice recognition result of the to-be-tested audio file does not satisfy the preset condition after the preset number of times of testing (for example, three times).
  • the preset condition may be that the voice recognition result includes a character length exceeding a preset character length.
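The retry-and-discard behavior described above can be sketched as a small loop. The threshold of three characters and the limit of three attempts come from the examples in the text; the function names and the convention of returning `None` to signal "delete this audio file" are assumptions.

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # "for example, three times" in the text
MIN_LENGTH = 3    # example threshold: result must exceed three characters


def meets_condition(result: str, min_length: int = MIN_LENGTH) -> bool:
    """Preset condition: the result has more than `min_length` characters."""
    return len(result) > min_length


def recognize_with_retry(run_once: Callable[[], str],
                         max_attempts: int = MAX_ATTEMPTS) -> Optional[str]:
    """Replay the audio until the result meets the condition.

    `run_once` is a hypothetical callable that plays the audio file and
    returns the recognition result. Returns None when every attempt fails,
    which the caller can treat as the cue to delete the audio file.
    """
    for _ in range(max_attempts):
        result = run_once()
        if meets_condition(result):
            return result
    return None
```

Passing the play-and-fetch step in as a callable keeps the retry policy independent of how audio is actually played or how the result is read back.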
  • test terminal 100 may further send the test result to a communication terminal corresponding to the preset communication account.
  • the communication account may be a communication account corresponding to the tester, so that the tester can view the test result in real time.
  • FIG. 5 is a flow chart of a speech recognition test method applied to the test terminal 100 shown in FIG. 2, in accordance with one embodiment of the present invention. This embodiment is similar to the above embodiment, except that the present embodiment is based on the test terminal 100 for explaining the voice recognition test method. Further details regarding the present embodiment can be further referred to the method embodiments described above. As shown in FIG. 5, the method in this embodiment includes the following steps:
  • in step S201, the voice data generated by playing the audio file to be tested on the test terminal 100 is transmitted to the client terminal 200 through the audio transmission line, so that the client terminal 200 encodes the voice data and sends it to the voice recognition server 300, and the voice recognition server 300 transmits a voice recognition result to the client terminal 200.
  • the process described in step S201 is performed and implemented by the voice data transmission module 1101.
  • in step S202, the speech recognition result is acquired from the client terminal 200.
  • the process described in step S202 is performed and implemented by the recognition result obtaining module 1102.
  • in step S203, the speech recognition result is compared with the pre-stored standard result corresponding to the audio file to be tested to obtain a test result.
  • the process described in step S203 is performed and implemented by the test result generation module 1103.
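Seen from the test terminal, steps S201 to S203 form a simple three-stage pipeline. The sketch below shows only that ordering; `AudioCase`, `run_case`, and the three injected callables are hypothetical names, standing in for the modules 1101, 1102, and 1103 rather than reproducing them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AudioCase:
    audio_path: str  # audio file to be tested
    standard: str    # manually labeled standard result


def run_case(case: AudioCase,
             play: Callable[[str], None],
             fetch_result: Callable[[], str],
             compare: Callable[[str, str], float]) -> float:
    """Run one audio file through the S201 -> S202 -> S203 sequence."""
    play(case.audio_path)                      # S201: play over the audio line
    recognized = fetch_result()                # S202: read result from client
    return compare(case.standard, recognized)  # S203: compare with standard
```

Injecting the three stages as callables means the sequence can be exercised with stubs before any audio hardware or client device is connected.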
  • the test terminal 100 may further determine whether the voice recognition result satisfies a preset condition; when it does not, the test terminal 100 is triggered to play the audio file to be tested that corresponds to that voice recognition result and transmit it to the client terminal 200 through the audio transmission line, so that the audio file is tested again.
  • the preset condition may be that the number of characters corresponding to the speech recognition result exceeds three characters.
  • the test terminal 100 may delete the audio file to be tested after determining that the voice recognition result of the to-be-tested audio file does not satisfy the preset condition after the preset number of times of testing (for example, three times).
  • the preset condition may be that the voice recognition result includes a character length exceeding a preset character length.
  • step S203 may specifically include: comparing the speech recognition result with a standard result corresponding to the audio file to be tested, and obtaining a correct number of characters in the speech recognition result.
  • the process described in the above steps is performed and implemented by the correct word count computing unit 11031.
  • the speech recognition accuracy rate of the audio file to be tested is calculated according to the number of correctly recognized characters and the number of characters included in the standard result.
  • the process described in the above steps is performed and implemented by the accuracy calculation unit 11032.
  • in the step S203, comparing the voice recognition result with the standard result corresponding to the audio file to be tested and obtaining the number of correct characters in the voice recognition result may comprise: calculating the total number of correct characters obtained from voice recognition of a plurality of audio files to be tested.
  • the process described in the above steps is performed and implemented by the correct word count computing unit 11031.
  • the step of calculating the voice recognition accuracy rate of the audio file to be tested according to the number of correctly recognized characters and the number of characters included in the standard result includes: calculating the total accuracy of voice recognition for the plurality of audio files to be tested according to the total number of correct characters and the sum of the numbers of characters included in the standard results corresponding to the plurality of audio files to be tested.
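The total accuracy over several audio files, as described above, is the summed correct-character count divided by the summed standard-result length. A minimal sketch, with an illustrative function name:

```python
from typing import Sequence


def total_accuracy(correct_counts: Sequence[int],
                   standard_lengths: Sequence[int]) -> float:
    """Pooled accuracy: sum of correct characters over sum of standard lengths."""
    if len(correct_counts) != len(standard_lengths):
        raise ValueError("one correct count is needed per standard result")
    total_length = sum(standard_lengths)
    if total_length == 0:
        raise ValueError("standard results must contain at least one character")
    return sum(correct_counts) / total_length
```

For two files with 5 of 7 and 4 of 5 characters correct, the pooled accuracy is 9/12 = 0.75. This differs from averaging the per-file accuracies (about 0.757), because pooling weights each file by its length.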
  • the test terminal 100 directly simulates user voice input and transmits the voice data to the client terminal 200, and the client terminal 200 sends the received voice data to the voice recognition server, implementing an automated voice search test.
  • the voice data is transmitted to the client terminal 200 through the audio transmission line, which most realistically simulates the user usage scenario, and improves the accuracy and efficiency of the automated voice recognition test.
  • each block of the flowchart or block diagram can represent a module, a program segment, or a portion of code that includes one or more executable instructions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may also occur in a different order than those illustrated in the drawings.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented in a dedicated hardware-based system that performs the specified function or action, or by a combination of dedicated hardware and computer instructions.
  • the above technical concept of the present invention can be embodied as a non-transitory machine readable storage medium having executable code stored thereon.
  • when the executable code is executed by a processor of the electronic device, the processor is caused to perform the method described above.
  • the above technical concept of the present invention can also be implemented as a computing device including a processor and a memory.
  • executable code is stored in the memory.
  • when the executable code is executed by the processor, the processor is caused to perform the method described above.
  • the functional modules in various embodiments of the present invention may be integrated to form a separate portion, or each module may exist separately, or two or more modules may be integrated to form a separate portion.
  • the functions, if implemented in the form of software functional modules and sold or used as separate products, may be stored in a computer readable storage medium.
  • the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes a number of instructions used to cause a computer device (which may be a personal computer, voice recognition server 300, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.
  • the foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

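A voice recognition test method and a test terminal are provided. The voice recognition test method includes: the test terminal transmits voice data, generated by playing an audio file to be tested on the test terminal, to a client terminal through an audio transmission line (S101); the client terminal encodes the received voice data and sends the encoded voice data to a voice recognition server for voice recognition (S102); the voice recognition server recognizes the voice data and sends the voice recognition result to the client terminal (S103); the test terminal acquires the voice recognition result from the client terminal (S104); and the test terminal compares the voice recognition result with a pre-stored standard result corresponding to the audio file to be tested to obtain a test result (S105). Voice recognition testing can thereby be automated, improving its efficiency.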

Description

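Voice recognition test method, test terminal, computing device, and storage medium
Technical Field
The present invention relates to the field of voice recognition, and in particular to a voice recognition test method, a test terminal, a computing device, and a storage medium.
Background
As information technology continues to develop, information search is no longer confined to conventional methods such as text search, and more and more software products offer voice search in place of manually typed keywords. Software products that provide voice search need to undergo voice recognition testing before they are sold or released, to ensure the efficiency of voice search. Most existing voice recognition test methods require speech to be entered manually in order to test recognition accuracy. This manual approach has considerable drawbacks. For example, the test case (such as the input speech) cannot be guaranteed to be identical between two tests of the same speech content, so the test results have limited reference value. In addition, many factors affect recognition accuracy; speaking rate and intonation, for instance, can both influence the final recognition result. Moreover, manual testing covers relatively few speech entries and is time-consuming and laborious.
Summary of the Invention
In view of this, an object of the embodiments of the present invention is to provide a voice recognition test method and a test terminal.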
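According to a first aspect of the present invention, a voice recognition test method is provided for use in a voice recognition test system. The system includes a test terminal, a client terminal, and a voice recognition server; the test terminal is electrically connected to the client terminal through an audio transmission line, and the client terminal is communicatively connected to the voice recognition server through a network. The method includes:
the test terminal transmits the voice data generated by playing the audio file to be tested on the test terminal to the client terminal through the audio transmission line;
the client terminal encodes the received voice data and sends the encoded voice data to the voice recognition server for voice recognition;
the voice recognition server recognizes the voice data and sends the voice recognition result to the client terminal;
the test terminal acquires the voice recognition result from the client terminal; and
the test terminal compares the voice recognition result with a pre-stored standard result corresponding to the audio file to be tested to obtain a test result.
According to a second aspect of the present invention, a test terminal is further provided. The test terminal is electrically connected to a client terminal through an audio transmission line, and the client terminal is communicatively connected to a voice recognition server. The test terminal includes:
a memory;
a processor;
a voice recognition test device installed/stored in the memory and executed by the processor;
the voice recognition test device includes:
a voice data transmission module, configured to transmit the voice data generated by playing the audio file to be tested on the test terminal to the client terminal through the audio transmission line, so that the client terminal encodes the voice data and sends it to the voice recognition server, and the voice recognition server sends the voice recognition result to the client terminal;
a recognition result acquisition module, configured to acquire the voice recognition result from the client terminal;
a test result generation module, configured to compare the voice recognition result with the pre-stored standard result corresponding to the audio file to be tested to obtain a test result.
According to a third aspect of the present invention, a voice recognition test method is further provided. The method is applied to a test terminal; the test terminal is electrically connected to a client terminal through an audio transmission line, and the client terminal is communicatively connected to a voice recognition server. The method includes:
transmitting the voice data generated by playing the audio file to be tested on the test terminal to the client terminal through the audio transmission line, so that the client terminal encodes the voice data and sends it to the voice recognition server, and the voice recognition server sends the voice recognition result to the client terminal;
acquiring the voice recognition result from the client terminal;
comparing the voice recognition result with the pre-stored standard result corresponding to the audio file to be tested to obtain a test result.
According to a fourth aspect of the present invention, a computing device is further provided, including: a processor; and a memory having executable code stored thereon which, when executed by the processor, causes the processor to perform the method of the first aspect and the third aspect of the present invention described above.
According to a fifth aspect of the present invention, a non-transitory machine-readable storage medium is further provided, having executable code stored thereon which, when executed by a processor of an electronic device, causes the processor to perform the method of the first aspect and the third aspect of the present invention described above.
Compared with the prior art, the voice recognition test method and test terminal of the present invention use the test terminal to simulate user voice input directly, transmitting voice data to the client terminal through the audio transmission line; the client terminal sends the received voice data to the voice recognition server, implementing automated voice search testing. In addition, because the voice data is transmitted to the client terminal through the audio transmission line, the user usage scenario is simulated most realistically, unreliable test accuracy caused by interference from external factors such as noise is avoided, and test efficiency is greatly improved.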
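To make the above objects, features, and advantages of the present invention more apparent and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of a speech recognition test system in accordance with one embodiment of the present invention.
FIG. 2 is a block schematic diagram of a test terminal in accordance with one embodiment of the present invention.
FIG. 3 is a schematic diagram of functional modules of a speech recognition test apparatus according to an embodiment of the present invention.
FIG. 4 is a flow chart of a speech recognition test method in accordance with one embodiment of the present invention.
FIG. 5 is a flow chart of a speech recognition test method according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. The components of the embodiments of the present invention, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments of it. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In the description of the present invention, the terms "first", "second", and the like are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.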
图1为根据本发明一个实施例的语音识别测试系统的示意图。
如图1所示,本发明的语音识别测试系统可以包括:测试终端100、客户终端200及语音识别服务器300。所述测试终端100及客户终端200可以是个人电脑(personal computer,PC)、平板电脑、智能手机、个人数字助理(personal digital assistant,PDA)等。在一个优选实施例中,所述测试终端100可以是一个用于测试客户终端200的语音识别功能的PC机,所述客户终端200为安装有语音识别软件(如搜索软件、浏览器、即时通信软件等)的手机、平板电脑等移动终端。所述语音识别服务器300通过网络与一个或多个客户终端200进行通信连接,以进行数据通信或交互。所述语音识别服务器300可以包括但不限于是网络语音识别服务器、数据库语音 识别服务器等。
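FIG. 2 is a block diagram of a test terminal according to an embodiment of the present invention.
As shown in FIG. 2, the test terminal 100 of the present invention may include a speech recognition test apparatus 110, a memory 111, a storage controller 112, a processor 113, a peripheral interface 114, an input/output unit 115, an audio unit 116 and a display unit 117.
The memory 111, storage controller 112, processor 113, peripheral interface 114, input/output unit 115, audio unit 116 and display unit 117 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements may be electrically connected through one or more communication buses or signal lines. The speech recognition test apparatus 110 includes at least one software functional module that may be stored in the memory 111 in the form of software or firmware, or built into the operating system (OS) of the test terminal. The processor 113 executes the executable modules stored in the memory, for example the software functional modules or computer programs included in the speech recognition test apparatus 110.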
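The memory 111 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM) or the like. The memory 111 stores a program, and the processor 113 executes the program after receiving an execution instruction. The method performed by the test terminal 100, as defined by the flow disclosed in any embodiment of the present invention, may be applied to the processor 113 or implemented by the processor 113.
The processor 113 may be an integrated circuit chip with signal processing capability. The processor 113 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logic blocks disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor.
The peripheral interface 114 couples various input/output devices to the processor 113 and the memory 111. In some embodiments, the peripheral interface 114, the processor 113 and the storage controller 112 may be implemented in a single chip. In other embodiments, they may each be implemented by a separate chip.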
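The input/output unit 115 allows the user to input data. The input/output unit 115 may be, but is not limited to, a mouse, a keyboard or the like.
The audio unit 116 provides an audio interface to the user and may include one or more microphones, one or more speakers and audio circuitry.
The display unit 117 provides an interactive interface (for example a user operation interface) between the test terminal 100 and the user, or displays image data for the user's reference. In this embodiment, the display unit 117 may be a liquid crystal display or a touch display. A touch display may be a capacitive or resistive touch screen that supports single-point and multi-point touch operations, meaning that it can sense touch operations occurring simultaneously at one or more positions on the display and pass the sensed touch operations to the processor for calculation and processing.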
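FIG. 3 is a schematic diagram of the functional modules of a speech recognition test apparatus according to an embodiment of the present invention. As shown in FIG. 3, the speech recognition test apparatus 110 includes a voice data transmission module 1101, a recognition result acquisition module 1102, a test result generation module 1103, a recognition result judgment module 1104 and a data deletion module 1105. The test result generation module 1103 specifically includes a correct character count calculation unit 11031 and an accuracy calculation unit 11032.
The functional modules of the speech recognition test apparatus 110 are described in detail below in connection with two embodiments of the speech recognition test method.
FIG. 4 is a flowchart of a speech recognition test method according to an embodiment of the present invention, applied to the speech recognition test system shown in FIG. 1. The flow shown in FIG. 4 is described in detail below.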
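Referring to FIG. 4, in step S101 the test terminal 100 transmits the voice data generated by playing an audio file under test to the client terminal 200 through the audio cable. In a preferred embodiment, step S101 may be performed by the voice data transmission module 1101.
In this embodiment, the test terminal 100 may be electrically connected to the client terminal 200 through the audio cable. For example, the test terminal 100 may be connected through the audio cable to the microphone of the client terminal 200. When the test starts, the test terminal 100 plays the audio file under test, which generates voice data. The test terminal 100 then transmits this voice data to the client terminal 200 through the audio cable.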
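In step S102, the client terminal 200 encodes the received voice data and sends the encoded voice data to the speech recognition server 300 for speech recognition. Step S102 may be performed by an application under test that is installed on the client terminal 200 and has a speech recognition function. After encoding the received voice data, the application under test may send it directly to the speech recognition server 300 for speech recognition, so that the speech recognition function of the application under test is tested automatically.
In step S103, the speech recognition server 300 recognizes the voice data and sends the speech recognition result to the client terminal 200. In this embodiment, the recognition result produced by the speech recognition server 300 may be the corresponding character string. For example, if the voice data is Chinese speech, the speech recognition result is a string of Chinese characters. As another example, if the voice data is English speech, the speech recognition result is an English string composed of English words or letters.
Further, the client terminal 200 may generate a result log from the received speech recognition result and store the result log in the system log buffer. In one example, the client terminal 200 may use android.util.Log to print the final result to the system log buffer as a log entry.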
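In step S104, the test terminal 100 acquires the speech recognition result from the client terminal 200. In a preferred embodiment, step S104 is performed by the recognition result acquisition module 1102.
For example, the test terminal 100 may acquire the speech recognition result directly from the system log buffer of the client terminal 200.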
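As a minimal sketch of how a test terminal might read the result from the log buffer, the Python below parses log text in logcat's `threadtime` layout. The tag `ASR_RESULT`, the function names and the assumption that each result is logged on a single line are illustrative only; the source states just that the client terminal writes the result with android.util.Log. On a PC with the Android platform tools installed, `adb logcat -d -s ASR_RESULT` dumps the buffer filtered to that tag and exits.

```python
import re
import subprocess

# Hypothetical log tag; the source does not name the tag used by the client app.
RESULT_TAG = "ASR_RESULT"


def parse_results(log_text, tag=RESULT_TAG):
    """Return the messages logged under `tag`, in order of appearance."""
    # A threadtime line looks like:
    # "03-01 10:00:00.123  1234  1234 I ASR_RESULT: <message>"
    # Short tags may be padded with spaces before the colon.
    pattern = re.compile(r"\s[VDIWEF]\s+" + re.escape(tag) + r"\s*: (.*)$")
    results = []
    for line in log_text.splitlines():
        match = pattern.search(line)
        if match:
            results.append(match.group(1).strip())
    return results


def dump_results_from_device(tag=RESULT_TAG):
    """Dump the device log buffer over adb and parse it (requires adb on PATH)."""
    completed = subprocess.run(
        ["adb", "logcat", "-d", "-s", tag],
        capture_output=True, encoding="utf-8", check=True,
    )
    return parse_results(completed.stdout, tag)
```

Keeping the parser separate from the `adb` call makes it possible to check the parsing logic against saved log text without a device attached.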
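In step S105, the test terminal 100 compares the speech recognition result with the pre-stored standard result corresponding to the audio file under test to obtain a test result. In a preferred embodiment, step S105 may be performed by the test result generation module 1103.
In this embodiment, the standard result may be a manually annotated character string corresponding to the audio file under test: the standard result is produced manually from the content of the audio file and recorded in the test terminal 100. In one example, before using the test terminal 100 for speech recognition testing, the tester stores multiple audio files under test in the memory 111 and stores each file's manually annotated string (its standard result) in association with that file. The speech recognition result can then be compared with the manually annotated string to determine how similar the two are. The test result may be the proportion of the speech recognition result that matches the manually annotated string.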
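In one implementation, step S105 specifically includes:
Comparing the speech recognition result with the standard result corresponding to the audio file under test to obtain the number of correctly recognized characters in the speech recognition result. In a preferred embodiment, this step may be performed by the correct character count calculation unit 11031. For example, if the standard result corresponding to the audio file under test is "我今天九点上班" ("I start work at nine today") and the speech recognition result is "我今天就近上班" ("I work nearby today"), the number of correctly recognized characters is five.
Calculating the speech recognition accuracy for the audio file under test from the number of correctly recognized characters and the number of characters in the standard result. In a preferred embodiment, this step may be performed by the accuracy calculation unit 11032. In the example above, five characters are recognized correctly and the standard result contains seven characters, so the speech recognition accuracy is five sevenths.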
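The character comparison can be sketched as follows. The source does not specify how characters are aligned when the two strings differ, so this sketch assumes an alignment based on Python's standard `difflib.SequenceMatcher`; the function names are hypothetical.

```python
from difflib import SequenceMatcher


def count_correct_chars(standard, recognized):
    """Count characters of `recognized` that align with `standard`."""
    # Assumption: alignment by difflib's matching blocks, not the patent's method.
    matcher = SequenceMatcher(None, standard, recognized, autojunk=False)
    # The final matching block is a zero-length sentinel, so summing is safe.
    return sum(block.size for block in matcher.get_matching_blocks())


def accuracy(standard, recognized):
    """Return (correct, total, correct / total) for one audio file."""
    if not standard:
        raise ValueError("standard result must not be empty")
    correct = count_correct_chars(standard, recognized)
    return correct, len(standard), correct / len(standard)
```

For the example strings in the text, the matching blocks are "我今天" and "上班", which gives five correct characters out of seven.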
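Further, the test terminal 100 may send several different audio files under test to the client terminal 200 in succession, in which case step S105 may also include the following steps:
Calculating the total number of correctly recognized characters over the multiple audio files under test. In a preferred embodiment, this step may be performed by the correct character count calculation unit 11031.
Calculating the overall speech recognition accuracy for the multiple audio files under test from the total number of correctly recognized characters and the sum of the character counts of the standard results corresponding to those files. This step may be performed by the accuracy calculation unit 11032.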
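Further, the test result may include: the number of correctly recognized characters in a single speech test result, the total number of characters in the voice data generated from a single audio file under test, the total number of correct characters over multiple groups of speech test results, the total number of characters in the voice data generated from all the audio files under test, the error rate of a single speech test result, and the overall error rate over multiple groups of speech test results. In one example, let a_i denote the number of correct characters in the i-th speech test result, b_i the total number of characters in the voice data generated from the i-th audio file under test, m the total number of correct characters over the multiple groups, n the total number of characters in the voice data generated from the audio files under test, w_i the error rate of the i-th speech test result, and w_t the overall error rate over the multiple groups. With this notation, the test result may be calculated as:
w_i = (b_i - a_i) / b_i;
w_t = ((b_1 + b_2 + b_3 + … + b_k) - (a_1 + a_2 + a_3 + … + a_k)) / (b_1 + b_2 + b_3 + … + b_k) = (n - m) / n;
where k is the number of groups of voice data, generated from the audio files under test, that are used in one test.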
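The two error-rate formulas can be checked with a short Python sketch. The function name and the input layout (a list of pairs of correct and total character counts, one pair per audio file) are assumptions made for illustration; exact fractions are used so the results can be compared without rounding.

```python
from fractions import Fraction


def error_rates(counts):
    """Compute per-file and overall error rates.

    counts: list of (a_i, b_i) pairs, where a_i is the number of correct
    characters and b_i the total number of characters for file i.
    A b_i of zero raises ZeroDivisionError.
    """
    per_file = [Fraction(b - a, b) for a, b in counts]
    total_correct = sum(a for a, _ in counts)
    total_chars = sum(b for _, b in counts)
    overall = Fraction(total_chars - total_correct, total_chars)
    return per_file, overall
```

With two files scored (5, 7) and (4, 4), the per-file error rates are 2/7 and 0, and the overall error rate is (11 - 9) / 11 = 2/11. Note that the overall rate weights each file by its character count, so it is not the mean of the per-file rates.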
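In the speech recognition test method provided by the above embodiment, the test terminal 100 simulates a user's voice input directly by transmitting voice data to the client terminal 200, and the client terminal 200 sends the received voice data to the speech recognition server, so that voice search testing is automated. In addition, because the voice data reaches the client terminal 200 over the audio cable, the user's real usage scenario is simulated closely and interference from noise and other external factors is avoided, which can improve the accuracy and efficiency of automated speech recognition testing.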
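Further, the test terminal 100 may also judge whether the speech recognition result satisfies a preset condition and, when it does not, trigger the test terminal 100 to play the corresponding audio file under test again, so that the generated voice data is transmitted to the client terminal 200 through the audio cable and the audio file is tested again. For example, the preset condition may be that the speech recognition result contains more than three characters. In a preferred embodiment, this step is performed by the recognition result judgment module 1104.
Building on the above step, if the test terminal 100 determines that the speech recognition result for an audio file under test still does not satisfy the preset condition after a preset number of tests (for example three), it may delete that audio file. The preset condition may be that the character length of the speech recognition result exceeds a preset character length. Deleting audio files whose recognition results fail the preset condition excludes test results caused by abnormal audio files, which helps improve test efficiency, avoids useless tests and saves test resources. In a preferred embodiment, this step is performed by the data deletion module 1105.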
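The retest-then-delete logic can be sketched in Python as below. `run_once` is a hypothetical callable standing in for "play the file over the audio cable and fetch the recognition result"; the function names and default thresholds (more than three characters, three attempts) follow the examples in the text but are otherwise assumptions.

```python
def meets_preset_condition(result, min_chars=3):
    """Preset condition from the text: the result has more than `min_chars` characters."""
    return result is not None and len(result) > min_chars


def run_with_retries(audio_file, run_once, max_attempts=3, min_chars=3):
    """Test `audio_file` up to `max_attempts` times.

    Returns (result, keep_file). keep_file is False when every attempt
    failed the preset condition, i.e. the file should be deleted from
    the test set.
    """
    result = None
    for _ in range(max_attempts):
        result = run_once(audio_file)
        if meets_preset_condition(result, min_chars):
            return result, True
    return result, False
```

The sketch only reports that a file should be dropped; whether to delete it from disk or merely exclude it from the statistics is left to the caller.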
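Further, the test terminal 100 may also send the test result to a communication terminal corresponding to a preset communication account. For example, the communication account may belong to a tester, so that the tester can view the test result in real time.
FIG. 5 is a flowchart of a speech recognition test method according to another embodiment of the present invention, applied to the test terminal 100 shown in FIG. 2. This embodiment is similar to the one above; the difference is that it describes the speech recognition test method from the perspective of the test terminal 100. For other details of this embodiment, refer to the method embodiment above. As shown in FIG. 5, the method in this embodiment includes the following steps: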
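In step S201, the voice data generated when the test terminal 100 plays the audio file under test is transmitted to the client terminal 200 through the audio cable, so that the client terminal 200 encodes the voice data and sends it to the speech recognition server 300, and the speech recognition server 300 sends the speech recognition result to the client terminal 200.
In a preferred embodiment, step S201 is performed by the voice data transmission module 1101.
In step S202, the speech recognition result is acquired from the client terminal 200.
In a preferred embodiment, step S202 is performed by the recognition result acquisition module 1102.
In step S203, the speech recognition result is compared with the pre-stored standard result corresponding to the audio file under test to obtain a test result.
In a preferred embodiment, step S203 is performed by the test result generation module 1103.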
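Steps S201 to S203 can be sketched as one Python function with the three modules passed in as callables. `AudioCase`, `run_case` and the callable parameters are hypothetical names; how audio is actually played over the cable and how the result is read back depend on the test setup and are not shown.

```python
from dataclasses import dataclass


@dataclass
class AudioCase:
    audio_path: str       # audio file under test
    standard_result: str  # manually annotated reference string


def run_case(case, transmit, fetch_result, compare):
    """Run steps S201 to S203 for one audio file under test."""
    transmit(case.audio_path)                         # S201: play over the audio cable
    recognized = fetch_result()                       # S202: read the result from the client
    return compare(recognized, case.standard_result)  # S203: compare with the standard result
```

Passing the steps as callables mirrors the module split in FIG. 3 and lets each step be replaced by a stub when the flow is exercised without hardware.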
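Further, the test terminal 100 may also judge whether the speech recognition result satisfies a preset condition and, when it does not, trigger the test terminal 100 to play the corresponding audio file under test again, so that the generated voice data is transmitted to the client terminal 200 through the audio cable and the audio file is tested again. For example, the preset condition may be that the speech recognition result contains more than three characters. In a preferred embodiment, this step is performed by the recognition result judgment module 1104.
Building on the above step, if the test terminal 100 determines that the speech recognition result for an audio file under test still does not satisfy the preset condition after a preset number of tests (for example three), it may delete that audio file. The preset condition may be that the character length of the speech recognition result exceeds a preset character length. Deleting audio files whose recognition results fail the preset condition excludes test results caused by abnormal audio files, which helps improve test efficiency, avoids useless tests and saves test resources. In a preferred embodiment, this step is performed by the data deletion module 1105.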
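In one implementation, step S203 may specifically include: comparing the speech recognition result with the standard result corresponding to the audio file under test to obtain the number of correctly recognized characters in the speech recognition result, which in a preferred embodiment is performed by the correct character count calculation unit 11031; and calculating the speech recognition accuracy for the audio file under test from the number of correctly recognized characters and the number of characters in the standard result, which is performed by the accuracy calculation unit 11032.
Further, within step S203, the step of obtaining the number of correctly recognized characters may include calculating the total number of correctly recognized characters over multiple audio files under test, which in a preferred embodiment is performed by the correct character count calculation unit 11031. The step of calculating the speech recognition accuracy may include calculating the overall speech recognition accuracy for the multiple audio files under test from the total number of correctly recognized characters and the sum of the character counts of the standard results corresponding to those files, which is performed by the accuracy calculation unit 11032.
In the speech recognition test method provided by the above embodiment, the test terminal 100 simulates a user's voice input directly by transmitting voice data to the client terminal 200, and the client terminal 200 sends the received voice data to the speech recognition server, so that voice search testing is automated. In addition, because the voice data reaches the client terminal 200 over the audio cable, the user's real usage scenario is simulated closely, which improves the accuracy and efficiency of automated speech recognition testing.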
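It should be understood that the systems, apparatuses and methods disclosed in the embodiments of this application may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architecture, functions and operations of apparatuses, methods and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. Each block in the block diagrams and/or flowcharts, and any combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
Accordingly, the above technical concept of the present invention may be implemented as a non-transitory machine-readable storage medium storing executable code which, when executed by a processor of an electronic device, causes the processor to perform the method described above.
The above technical concept of the present invention may also be implemented as a computing device comprising a processor and a memory. The memory stores executable code which, when executed by the processor, causes the processor to perform the method described above.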
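In addition, the functional modules in the embodiments of the present invention may be integrated together to form one independent part, may each exist separately, or two or more modules may be integrated to form one independent part.
If the functions are implemented as software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, the speech recognition server 300, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc. It should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. The terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article or device. Absent further limitation, an element qualified by "includes a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.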
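The above are merely preferred embodiments of the present invention and are not intended to limit it. A person skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
The above are merely specific implementations of the present invention, and the scope of protection of the present invention is not limited to them. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within its scope of protection. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.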

Claims (19)

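  1. A speech recognition test method, characterized in that it is applied to a speech recognition test system comprising a test terminal, a client terminal and a speech recognition server, the test terminal being electrically connected to the client terminal through an audio cable and the client terminal being communicatively connected to the speech recognition server over a network, the method comprising:
    the test terminal transmitting the voice data generated when the test terminal plays an audio file under test to the client terminal through the audio cable;
    the client terminal encoding the received voice data and sending the encoded voice data to the speech recognition server for speech recognition;
    the speech recognition server recognizing the voice data and sending a speech recognition result to the client terminal;
    the test terminal acquiring the speech recognition result from the client terminal; and
    the test terminal comparing the speech recognition result with a pre-stored standard result corresponding to the audio file under test to obtain a test result.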
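  2. The speech recognition test method according to claim 1, characterized in that the method further comprises:
    the test terminal judging whether the speech recognition result satisfies a preset condition,
    and, if the speech recognition result does not satisfy the preset condition, playing the audio file under test corresponding to the speech recognition result again, so that the generated voice data is transmitted to the client terminal through the audio cable and the audio file under test is tested again.
  3. The speech recognition test method according to claim 2, characterized in that the method further comprises:
    the test terminal deleting the audio file under test after determining that the speech recognition result for the audio file under test still does not satisfy the preset condition after a preset number of tests.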
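  4. The speech recognition test method according to claim 1, characterized in that the step of the test terminal comparing the speech recognition result with the pre-stored standard result corresponding to the audio file under test to obtain a test result specifically comprises:
    comparing the speech recognition result with the standard result corresponding to the audio file under test to obtain the number of correctly recognized characters in the speech recognition result; and
    calculating the speech recognition accuracy for the audio file under test from the number of correctly recognized characters and the number of characters in the standard result.
  5. The speech recognition test method according to claim 4, characterized in that the step of the test terminal comparing the speech recognition result with the pre-stored standard result corresponding to the audio file under test to obtain a test result specifically comprises:
    calculating the total number of correctly recognized characters over multiple audio files under test; and
    calculating the overall speech recognition accuracy for the multiple audio files under test from the total number of correctly recognized characters and the sum of the character counts of the standard results corresponding to the multiple audio files under test.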
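  6. The speech recognition test method according to claim 1, characterized in that the method further comprises:
    the client terminal generating a result log from the speech recognition result and storing the result log in a system log buffer; and
    the test terminal acquiring the speech recognition result from the system log buffer of the client terminal.
  7. The speech recognition test method according to any one of claims 1-6, characterized in that the method further comprises:
    the test terminal sending the test result to a communication terminal corresponding to a preset communication account.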
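  8. A test terminal, electrically connected to a client terminal through an audio cable, the client terminal being communicatively connected to a speech recognition server, characterized in that the test terminal comprises:
    a memory;
    a processor;
    a speech recognition test apparatus installed/stored in the memory and executed by the processor;
    the speech recognition test apparatus comprising:
    a voice data transmission module, configured to transmit the voice data generated when the test terminal plays an audio file under test to the client terminal through the audio cable, so that the client terminal encodes the voice data and sends it to the speech recognition server, and the speech recognition server performs speech recognition and then sends a speech recognition result to the client terminal;
    a recognition result acquisition module, configured to acquire the speech recognition result from the client terminal;
    a test result generation module, configured to compare the speech recognition result with a pre-stored standard result corresponding to the audio file under test to obtain a test result.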
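  9. The test terminal according to claim 8, characterized in that the test terminal further comprises:
    a recognition result judgment module, configured to judge whether the speech recognition result satisfies a preset condition,
    and, if the speech recognition result does not satisfy the preset condition, to trigger the test terminal to play the audio file under test corresponding to the speech recognition result again, so that the generated voice data is transmitted to the client terminal through the audio cable and the audio file under test is tested again.
  10. The test terminal according to claim 9, characterized in that the test terminal further comprises:
    a data deletion module, configured to delete the audio file under test after the speech recognition result for the audio file under test still does not satisfy the preset condition after a preset number of tests.
  11. The test terminal according to claim 8, characterized in that the test result generation module comprises:
    a correct character count calculation unit, configured to compare the speech recognition result with the standard result corresponding to the audio file under test to obtain the number of correctly recognized characters in the speech recognition result; and
    an accuracy calculation unit, configured to calculate the speech recognition accuracy for the audio file under test from the number of correctly recognized characters and the number of characters in the standard result.
  12. The test terminal according to claim 11, characterized in that the correct character count calculation unit is further configured to calculate the total number of correctly recognized characters over multiple audio files under test;
    and the accuracy calculation unit is further configured to calculate the overall speech recognition accuracy for the multiple audio files under test from the total number of correctly recognized characters and the sum of the character counts of the standard results corresponding to the multiple audio files under test.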
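  13. A speech recognition test method, applied to a test terminal, characterized in that the test terminal is electrically connected to a client terminal through an audio cable and the client terminal is communicatively connected to a speech recognition server, the method comprising:
    transmitting the voice data generated when the test terminal plays an audio file under test to the client terminal through the audio cable, so that the client terminal encodes the voice data and sends it to the speech recognition server, and the speech recognition server performs speech recognition and then sends a speech recognition result to the client terminal;
    acquiring the speech recognition result from the client terminal;
    comparing the speech recognition result with a pre-stored standard result corresponding to the audio file under test to obtain a test result.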
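  14. The speech recognition test method according to claim 13, characterized in that the method further comprises:
    judging whether the speech recognition result satisfies a preset condition,
    and, when the speech recognition result does not satisfy the preset condition, triggering the test terminal to play the audio file under test corresponding to the speech recognition result again, so that the generated voice data is transmitted to the client terminal through the audio cable and the audio file under test is tested again.
  15. The speech recognition test method according to claim 14, characterized in that the method further comprises:
    deleting the audio file under test after the speech recognition result for the audio file under test still does not satisfy the preset condition after a preset number of tests.
  16. The speech recognition test method according to claim 14, characterized in that the step of comparing the speech recognition result with the pre-stored standard result corresponding to the audio file under test to obtain a test result comprises:
    comparing the speech recognition result with the standard result corresponding to the audio file under test to obtain the number of correctly recognized characters in the speech recognition result; and
    calculating the speech recognition accuracy for the audio file under test from the number of correctly recognized characters and the number of characters in the standard result.
  17. The speech recognition test method according to claim 16, characterized in that
    the step of comparing the speech recognition result with the standard result corresponding to the audio file under test to obtain the number of correctly recognized characters in the speech recognition result comprises: calculating the total number of correctly recognized characters over multiple audio files under test;
    and the step of calculating the speech recognition accuracy for the audio file under test from the number of correctly recognized characters and the number of characters in the standard result comprises: calculating the overall speech recognition accuracy for the multiple audio files under test from the total number of correctly recognized characters and the sum of the character counts of the standard results corresponding to the multiple audio files under test.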
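  18. A computing device, comprising:
    a processor; and
    a memory storing executable code which, when executed by the processor, causes the processor to perform the method according to any one of claims 1-8 and 13-17.
  19. A non-transitory machine-readable storage medium storing executable code which, when executed by a processor of an electronic device, causes the processor to perform the method according to any one of claims 1-8 and 13-17.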
PCT/CN2018/077784 2017-03-01 2018-03-01 Speech recognition test method and test terminal, computing device and storage medium WO2018157840A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710118130.9 2017-03-01
CN201710118130.9A CN108538296A (zh) 2017-03-01 2017-03-01 Speech recognition test method and test terminal

Publications (1)

Publication Number Publication Date
WO2018157840A1 true WO2018157840A1 (zh) 2018-09-07

Family

ID=63369812

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/077784 WO2018157840A1 (zh) 2017-03-01 2018-03-01 Speech recognition test method and test terminal, computing device and storage medium

Country Status (2)

Country Link
CN (1) CN108538296A (zh)
WO (1) WO2018157840A1 (zh)

Also Published As

Publication number Publication date
CN108538296A (zh) 2018-09-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18761678

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18761678

Country of ref document: EP

Kind code of ref document: A1