CN110211567A - Voice recognition terminal evaluation system and method - Google Patents

Publication number
CN110211567A
Authority
CN
China
Prior art keywords
test, equipment, voice, recognition, noise
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910393143.6A
Other languages
Chinese (zh)
Inventor
傅蓉蓉
刘毓伟
李玮
董千洲
张小雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Information and Communications Technology CAICT
Original Assignee
China Academy of Information and Communications Technology CAICT
Application filed by China Academy of Information and Communications Technology CAICT
Priority to CN201910393143.6A
Publication of CN110211567A

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/01: Assessment or evaluation of speech recognition systems
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/75: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 for modelling vocal tract parameters

Abstract

The present invention provides a voice recognition terminal evaluation system and method. The system includes: a voice playback device for outputting a test speech corpus; a terminal under test for recognizing the test speech corpus under different test environments, including a noise test environment, to obtain a recognition result; a noise generation device for generating the noise required for testing; an image acquisition device for capturing an image of the recognition result to obtain a speech recognition image and sending it to a control device; and the control device, for converting a test text corpus into the test speech corpus by a speech synthesis method, performing image recognition on the speech recognition image based on a deep learning algorithm to obtain the recognition result, and comparing the recognition result with preset labeled data to obtain a comparison result, the comparison result being used to indicate the speech recognition performance of the terminal under test. The scheme uses automated testing, supports repeatable testing, and reduces labor cost by using functional tests based on deep-learning comparison.

Description

Voice recognition terminal evaluation system and method
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a voice recognition terminal evaluation system and method.
Background
With the rapid development and growth of the Internet of Things, voice interaction has become one of the more mature entry points to the Internet of Things, and speech recognition has become one of the most popular technologies in today's consumer technology market. As voice interaction technology matures and reaches the market, the testing of speech recognition is receiving more and more attention. However, speech recognition testing is still at a developing stage: most tests have to be completed manually, so test results are poor and labor costs are high.
Summary of the invention
Embodiments of the present invention provide a voice recognition terminal evaluation system and method, which solve the technical problem in the prior art that manual speech recognition testing yields poor test results and consumes labor.
The voice recognition terminal evaluation system provided by an embodiment of the present invention includes: a control device, a terminal under test, an image acquisition device, a voice playback device and a noise generation device, wherein the control device is connected to the image acquisition device and the voice playback device;
wherein the control device is configured to: convert a test text corpus into a test speech corpus by a speech synthesis method;
the voice playback device is configured to: output the test speech corpus;
the terminal under test is configured to: recognize, under different test environments, the test speech corpus output by the voice playback device to obtain a recognition result, the different test environments including a noise test environment;
the noise generation device is configured to: generate the noise required for testing in the noise test environment;
the image acquisition device is configured to: capture an image of the recognition result to obtain a speech recognition image, and send the speech recognition image to the control device;
the control device is further configured to: perform image recognition on the speech recognition image based on a deep learning algorithm to obtain the recognition result, and compare the recognition result with preset labeled data to obtain a comparison result, the comparison result being used to indicate the speech recognition performance of the terminal under test.
The voice recognition terminal evaluation method also provided by an embodiment of the present invention includes:
the control device converts a test text corpus into a test speech corpus by a speech synthesis method;
the voice playback device outputs the test speech corpus;
the noise generation device generates the noise required for testing in a noise test environment;
the terminal under test recognizes, under different test environments, the test speech corpus output by the voice playback device to obtain a recognition result, the different test environments including the noise test environment;
the image acquisition device captures an image of the recognition result to obtain a speech recognition image, and sends the speech recognition image to the control device;
the control device performs image recognition on the speech recognition image based on a deep learning algorithm to obtain the recognition result, and compares the recognition result with preset labeled data to obtain a comparison result, the comparison result being used to indicate the speech recognition performance of the terminal under test;
wherein the control device is connected to the image acquisition device and the voice playback device.
An embodiment of the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method described above when executing the computer program.
An embodiment of the present invention also provides a computer-readable storage medium storing a computer program for executing the method described above.
In embodiments of the present invention, the control device converts a test text corpus into a test speech corpus by a speech synthesis method; the voice playback device outputs the test speech corpus; the noise generation device generates the noise required for testing in the noise test environment; the terminal under test recognizes, under different test environments including the noise test environment, the test speech corpus output by the voice playback device to obtain a recognition result; the image acquisition device captures an image of the recognition result to obtain a speech recognition image; and the control device performs image recognition on the speech recognition image based on a deep learning algorithm to obtain the recognition result and compares it with preset labeled data to obtain a comparison result, which is used to indicate the speech recognition performance of the terminal under test. Compared with the prior art, the present invention uses automated testing, supports repeatable testing, and reduces labor cost by using functional tests based on deep-learning comparison.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a structural block diagram of a voice recognition terminal evaluation system provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of device placement provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In embodiments of the present invention, a voice recognition terminal evaluation system is provided. As shown in Fig. 1, the system includes: a control device 4, a terminal under test 3, an image acquisition device 1, a voice playback device 5 and a noise generation device 6, wherein the control device 4 is connected to the image acquisition device 1 and the voice playback device 5;
wherein the control device 4 is configured to: convert a test text corpus into a test speech corpus by a speech synthesis method (TTS, i.e. Text To Speech);
the voice playback device 5 is configured to: output the test speech corpus;
the terminal under test 3 is configured to: recognize, under different test environments, the test speech corpus output by the voice playback device to obtain a recognition result, the different test environments including a noise test environment;
the noise generation device 6 is configured to: generate the noise required for testing in the noise test environment;
the image acquisition device 1 is configured to: capture an image of the recognition result to obtain a speech recognition image, and send the speech recognition image to the control device;
the control device 4 is further configured to: perform image recognition on the speech recognition image based on a deep learning algorithm to obtain the recognition result, and compare the recognition result with preset labeled data to obtain a comparison result, the comparison result being used to indicate the speech recognition performance of the terminal under test.
Here, the preset labeled data are an existing set of expected correct image data for the speech device under test, corresponding to the test corpus. Comparing the recognition result with the preset labeled data to obtain the comparison result means running a recognition algorithm to compare the images of the device under test captured during the test with the expected correct images, obtaining the comparison result and computing the correct rate.
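The comparison and correct-rate step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the text has already been extracted from the captured image and from the labeled data by some recognition step, and the helper names `normalize`, `compare` and `correct_rate` are hypothetical.

```python
import unicodedata


def normalize(text: str) -> str:
    """Apply NFKC, lowercase, and keep only letters and digits.

    Assumption: differences in case, spacing and punctuation are not
    treated as recognition errors.
    """
    text = unicodedata.normalize("NFKC", text).lower()
    return "".join(ch for ch in text if ch.isalnum())


def compare(recognized: str, expected: str) -> bool:
    """Return True when the recognized text matches the labeled text."""
    return normalize(recognized) == normalize(expected)


def correct_rate(pairs) -> float:
    """Fraction of (recognized, expected) pairs that match."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one result is required")
    hits = sum(compare(recognized, expected) for recognized, expected in pairs)
    return hits / len(pairs)
```

Exact matching after normalization is only one possible criterion; an edit-distance threshold could be substituted if partial matches should count.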
In embodiments of the present invention, as shown in Fig. 1, the image acquisition device 1 can be a high-speed camera mounted on a bracket above the terminal under test 3. The voice playback device 5 can be an artificial mouth, the noise generation device 6 can be a hi-fi speaker, and the control device 4 can be a computer device.
As shown in Fig. 1, the terminal under test 3 and the control device 4 should belong to the same wireless local area network 7. The wireless connection can include, but is not limited to, 3G/4G, WiFi, Bluetooth and other wireless connection methods currently known or developed in the future.
As shown in Fig. 1, the system can also include a test bench 2, on which the terminal under test 3 is placed.
In embodiments of the present invention, the noise generation device 6 can be adjusted manually so that it generates the required noise. Alternatively, the noise generation device 6 can be connected to the control device 4: noise generation parameters are set in the control device 4 and sent to the noise generation device 6, which then generates the corresponding noise according to these parameters. In this way the noise is generated automatically.
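As an illustration of sending noise generation parameters from the control device to the noise generation device, the sketch below serializes a parameter set to JSON. The patent does not specify which parameters exist or how they are transmitted; the fields `scene`, `level_db` and `duration_s` and the JSON encoding are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NoiseParams:
    """Hypothetical noise generation parameters set on the control device."""
    scene: str         # e.g. "office" or "mall" (assumed field)
    level_db: float    # playback level in dB (assumed field)
    duration_s: float  # playback duration in seconds (assumed field)

    def __post_init__(self):
        if self.duration_s <= 0:
            raise ValueError("duration_s must be positive")


def encode(params: NoiseParams) -> bytes:
    """Serialize parameters for transmission to the noise generation device."""
    return json.dumps(asdict(params), sort_keys=True).encode("utf-8")


def decode(payload: bytes) -> NoiseParams:
    """Rebuild the parameters on the noise generation device side."""
    return NoiseParams(**json.loads(payload.decode("utf-8")))
```

The transport itself (network socket, serial link, and so on) is left open here, as it is in the description.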
In embodiments of the present invention, the terminal under test 3, the voice playback device 5 and the noise generation device 6 can be placed as shown in Fig. 2. The distance between the terminal under test 3 and the artificial mouth depends on the type of terminal: if the terminal under test 3 is a mobile phone, the horizontal distance can be kept at 50 cm; if it is a smart speaker, the horizontal distance to the artificial mouth can be 3 m and 5 m. Depending on the test requirements and environment, if a particular environment is required (shopping mall, kindergarten, office, etc.), the noise generation device 6 can be placed at a horizontal distance of 1.5 m from the terminal under test 3. A smart speaker is generally equipped with noise-suppressing microphones, so when the terminal under test 3 is a smart speaker, the voice playback device 5 and the noise generation device 6 can be tested at angles such as 45°, 90°, 135° and 180° on the left and right sides.
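The placement rules above can be enumerated programmatically. The sketch below is only an illustration built from the distances and angles stated in the text; the dictionary keys, the function name `placements`, and the choice to leave the angle unset for phones are assumptions.

```python
from itertools import product

# Horizontal distance between artificial mouth and terminal, in metres.
MOUTH_DISTANCES_M = {"phone": [0.5], "smart_speaker": [3.0, 5.0]}
# Angles between voice playback device and noise source for smart speakers.
SPEAKER_ANGLES_DEG = [45, 90, 135, 180]
# Horizontal distance between noise source and terminal, in metres.
NOISE_DISTANCE_M = 1.5


def placements(terminal_type: str) -> list:
    """List every placement to test for the given terminal type."""
    distances = MOUTH_DISTANCES_M[terminal_type]
    # Assumption: angle is varied only for smart speakers; None means
    # "no angle requirement".
    angles = SPEAKER_ANGLES_DEG if terminal_type == "smart_speaker" else [None]
    return [
        {
            "mouth_distance_m": distance,
            "angle_deg": angle,
            "noise_distance_m": NOISE_DISTANCE_M,
        }
        for distance, angle in product(distances, angles)
    ]
```

For a smart speaker this yields two distances times four angles; testing on both the left and right sides would double that count.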
In embodiments of the present invention, because terminals under test 3 differ (they can be smart voice devices, smart TVs, Bluetooth headsets, smartphones, smart speakers, etc.), their test requirements also differ, as shown in Table 1.
Table 1
Based on this, test dimensions can be set when performing a speech recognition test. Specifically, the control device is further configured to: set multiple test dimensions according to the characteristics of the terminal under test;
perform image recognition on the speech recognition image according to the multiple test dimensions to obtain multiple recognition results corresponding to the multiple test dimensions; compare the multiple recognition results with the corresponding preset labeled data to obtain multiple comparison results corresponding to the multiple test dimensions; and statistically analyze the multiple comparison results to obtain a statistical analysis result, the statistical analysis result being used to indicate the speech recognition performance of the terminal under test.
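Collecting comparison results per test dimension, as described above, might look like the following sketch. The input format (pairs of dimension name and pass/fail flag) and the function name are assumptions for illustration.

```python
from collections import defaultdict


def successes_by_dimension(results) -> dict:
    """Aggregate (dimension, passed) pairs into (successes, total) per dimension."""
    counts = defaultdict(lambda: [0, 0])
    for dimension, passed in results:
        counts[dimension][1] += 1      # one more test case in this dimension
        if passed:
            counts[dimension][0] += 1  # one more successful recognition
    return {dimension: tuple(pair) for dimension, pair in counts.items()}
```

The per-dimension success counts produced this way are the inputs a later weighted scoring step would consume.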
In embodiments of the present invention, the design of the test corpus in the control device 4 should ensure that the test speech is consistent with the actual application scenario. Factors that a voice recognition terminal may face include, but are not limited to, different users, language, accent, pronunciation, speaking rate, vocabulary, context, distance and noise environment, and corpus design should take each of them fully into account. Corpus content is also selected and designed according to the requirements of the object under test. Voice recognition terminals are mostly used in home, in-vehicle and public-place scenarios, and the content can cover daily life and entertainment, business meetings and so on. The specific standard corpus set is shown in Table 2.
Table 2
Based on this, when performing a speech recognition test, the control device selects multiple test text corpora, which form a test corpus set. For example, the test corpus set can be chosen from the designed standard corpus set by considering gender, region and language.
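Choosing a test corpus set by gender, region and language can be sketched as a simple filter. The `CorpusItem` fields and the sample values used with it are hypothetical; the actual standard corpus set of Table 2 is not reproduced in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusItem:
    """One entry of a standard corpus set (fields are assumed)."""
    text: str
    gender: str
    region: str
    language: str


def select_corpus(items, **criteria) -> list:
    """Return the items whose attributes equal every given criterion."""
    return [
        item for item in items
        if all(getattr(item, key) == value for key, value in criteria.items())
    ]
```

For example, `select_corpus(items, gender="female", language="zh")` keeps only the entries spoken by female speakers in Chinese.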
For example, according to the test requirements of the built-in voice assistant of a smartphone, four test dimensions can be chosen as test items: voice wake-up, semantic understanding, user delivery rate and service coverage.
The system performs the speech recognition test as follows:
the control device is configured to: convert multiple test text corpora into multiple test speech corpora by a speech synthesis method;
the voice playback device (which can be an artificial head) is configured to: output the multiple test speech corpora in sequence;
the terminal under test (which can be a smartphone) is configured to: recognize in sequence, under different test environments, the multiple test speech corpora output by the voice playback device to obtain multiple recognition results, the different test environments including a noise test environment;
the noise generation device (which can be a hi-fi speaker) is configured to: generate the noise required for testing in the noise test environment.
For example, according to the characteristics of the device under test, three test environments are provided: a quiet environment and two ambient-noise environments (shopping mall and office). The horizontal distance between the artificial head and the smartphone is 50 cm, and the specific placement is shown in Fig. 2. Because the microphone of the smartphone has no noise suppression function, there is no requirement on the angle between the hi-fi speaker (noise source) and the line through the artificial head and the smartphone.
The image acquisition device is configured to: capture images of the multiple recognition results to obtain multiple speech recognition images, and send the multiple speech recognition images to the control device;
the control device is further configured to: assign different weights to the multiple test dimensions according to the characteristics of the terminal under test (the weight parameters can be adjusted flexibly for each device under test); perform image recognition on the multiple speech recognition images according to the multiple test dimensions to obtain multiple recognition results corresponding to the multiple test speech corpora and the multiple test dimensions; compare these recognition results with the corresponding preset labeled data to obtain multiple comparison results corresponding to the multiple test speech corpora and the multiple test dimensions, each comparison result being the number of test speech corpora successfully recognized; and obtain a statistical analysis result from the weights of the multiple test dimensions and the success counts corresponding to the multiple comparison results, the statistical analysis result being used to indicate the speech recognition performance of the terminal under test.
In some embodiments, taking the service coverage test dimension as an example, with N test cases in total (i.e. test speech corpora), the control device outputs the test cases in sequence through the artificial mouth while the high-speed camera continuously photographs the feedback of the phone's voice assistant. The computer device compares the feedback with the labeled data using an image algorithm and outputs the result. If the test case is completed, this test ends; otherwise the artificial mouth continues to output the test case for a repeat test, until this test ends.
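The play, capture, compare and repeat loop described above can be sketched as follows. The hardware interactions are injected as callables (`play`, `capture`, `recognize`), which are hypothetical stand-ins for the artificial mouth, the high-speed camera and the deep-learning image recognition. The `max_attempts` limit is also an assumption, since the text does not say when a repeated test stops.

```python
def run_case(case, play, capture, recognize, expected, max_attempts=3):
    """Run one test case, repeating it until it passes or attempts run out.

    Returns (passed, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        play(case)                         # artificial mouth outputs the test speech
        image = capture()                  # camera photographs the terminal's feedback
        if recognize(image) == expected:   # compare recognized feedback with the label
            return True, attempt
    return False, max_attempts


def run_suite(cases, play, capture, recognize, max_attempts=3):
    """Run (case, expected) pairs in sequence and count the successes."""
    passed = 0
    for case, expected in cases:
        ok, _ = run_case(case, play, capture, recognize, expected, max_attempts)
        passed += ok
    return passed
```

Injecting the device operations keeps the control logic testable without any hardware attached.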
Automated testing of the voice recognition terminal can be achieved by the above method.
Specifically, the control device is configured to:
obtain the statistical analysis result from the weights of the multiple test dimensions and the success counts corresponding to the multiple comparison results according to the following formula:
$$\mathrm{Score} = \sum_{i=1}^{k} w_i \cdot \frac{n_{\mathrm{Success},i}}{N}$$
where Score is the statistical analysis result; k is the total number of test dimensions; $w_i$ is the weight of the i-th test dimension, i = 1, 2, ..., k, and $\sum_{i=1}^{k} w_i = 1$; $n_{\mathrm{Success},i}$ is the number of test speech corpora successfully recognized under the i-th test dimension; and N is the number of test speech corpora.
If the weight of each dimension is 0.25, the final score of the voice assistant test is
$$\mathrm{Score} = 0.25 \sum_{i=1}^{4} \frac{n_{\mathrm{Success},i}}{N}$$
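A weighted score of this kind can be computed with a few lines of Python. The sketch assumes the score is the weighted sum of per-dimension success ratios and that the weights sum to 1; both are inferred from the surrounding definitions rather than stated explicitly, and the function name is hypothetical.

```python
import math


def weighted_score(weights, successes, n_cases):
    """Weighted sum of per-dimension success ratios.

    weights:   one weight per test dimension (assumed to sum to 1)
    successes: number of successfully recognized corpora per dimension
    n_cases:   number of test speech corpora per dimension
    """
    if len(weights) != len(successes):
        raise ValueError("weights and successes must have the same length")
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * s / n_cases for w, s in zip(weights, successes))
```

With four dimensions weighted 0.25 each and 100 test cases, success counts of 90, 80, 70 and 100 give a score of about 0.85 under these assumptions.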
Statistically analyzing each test dimension with the formula and summarizing the results into an evaluation report makes the evaluation result directly visible.
Similarly, in some embodiments, a test dimension can be further divided into sub-dimensions, each of which is weighted and scored. For example, the voice wake-up test item can be divided into false wake-up, wake-up response time and wake-up rate, each assigned its own weight. One scoring method for the voice wake-up test item is as follows. The wake-up rate score is computed from n_total, the number of correct wake-ups, and N, the total number of wake-up tests. The false wake-up rate score is computed from n_false, the number of times the device under test is falsely woken, and N, the test duration in hours (N ≥ 24 is required). The wake-up time is T = T2 - T1, where T1 is the moment the utterance ends and T2 is the moment the terminal under test begins to respond. The wake-up time test item can be tested repeatedly to accumulate a large amount of test data; the data are then analyzed and scored in segments using a piecewise function. The benefit is that the result is evaluated by relative values rather than absolute values. For example, some test cases measure wake-up delay and response time, on which network delay has a certain influence; converting the measurements to scores with a piecewise function removes the influence of network delay. Finally, the three sub-item scores are each multiplied by their weights and summed to obtain the final voice wake-up test result.
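The wake-up sub-items can be illustrated with the sketch below. The exact scoring formulas are not reproduced in the text, so the ratios used here, the latency bands in `wake_time_score` and all function names are assumptions chosen only to show the shape of a piecewise scoring scheme.

```python
def wake_rate(n_correct, n_trials):
    """Ratio of correct wake-ups to total wake-up attempts."""
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    return n_correct / n_trials


def false_wakes_per_hour(n_false, hours):
    """False wake-ups per hour; the text requires at least 24 test hours."""
    if hours < 24:
        raise ValueError("test duration must be at least 24 hours")
    return n_false / hours


def wake_time_score(t1, t2, bands=((0.5, 100), (1.0, 80), (2.0, 60))):
    """Piecewise score for wake-up time T = T2 - T1 (in seconds).

    bands holds (upper_limit_s, score) pairs in ascending order; the
    values are hypothetical and would be tuned from accumulated test data.
    """
    latency = t2 - t1
    for limit, score in bands:
        if latency <= limit:
            return score
    return 0


def wake_up_total(sub_scores, weights):
    """Weighted sum of the three sub-item scores."""
    return sum(s * w for s, w in zip(sub_scores, weights))
```

Scoring by band rather than by raw latency means small timing differences, such as those caused by network jitter, fall into the same band and do not change the result.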
Based on the same inventive concept, an embodiment of the present invention also provides a voice recognition terminal evaluation method, as described in the following embodiment. Because the method solves the problem on a principle similar to that of the voice recognition terminal evaluation system, its implementation may refer to the implementation of the system, and repeated details are omitted.
The voice recognition terminal evaluation method includes:
the control device converts a test text corpus into a test speech corpus by a speech synthesis method;
the voice playback device outputs the test speech corpus;
the noise generation device generates the noise required for testing in a noise test environment;
the terminal under test recognizes, under different test environments, the test speech corpus output by the voice playback device to obtain a recognition result, the different test environments including the noise test environment;
the image acquisition device captures an image of the recognition result to obtain a speech recognition image, and sends the speech recognition image to the control device;
the control device performs image recognition on the speech recognition image based on a deep learning algorithm to obtain the recognition result, and compares the recognition result with preset labeled data to obtain a comparison result, the comparison result being used to indicate the speech recognition performance of the terminal under test;
wherein the control device is connected to the image acquisition device and the voice playback device.
An embodiment of the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method described above when executing the computer program.
An embodiment of the present invention also provides a computer-readable storage medium storing a computer program for executing the method described above.
In conclusion, by photographing the terminal with a high-speed camera and applying an image comparison algorithm based on deep learning, the voice recognition terminal evaluation system and method proposed by the present invention achieve automated testing without a software interface to the terminal, extend the test scope from objective tests to subjective test items, reduce labor cost, and support repeatable testing. In addition, quantifiable scoring criteria are used in the final evaluation, so the scheme is highly extensible and widely applicable, and testers can adjust parameters flexibly according to test needs.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the present invention. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the embodiments of the present invention may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (10)

1. A voice recognition terminal evaluation system, characterized by comprising: a control device, a terminal under test, an image acquisition device, a voice playback device and a noise generation device, wherein the control device is connected to the image acquisition device and the voice playback device;
wherein the control device is configured to: convert a test text corpus into a test speech corpus by a speech synthesis method;
the voice playback device is configured to: output the test speech corpus;
the terminal under test is configured to: recognize, under different test environments, the test speech corpus output by the voice playback device to obtain a recognition result, the different test environments including a noise test environment;
the noise generation device is configured to: generate the noise required for testing in the noise test environment;
the image acquisition device is configured to: capture an image of the recognition result to obtain a speech recognition image, and send the speech recognition image to the control device;
the control device is further configured to: perform image recognition on the speech recognition image based on a deep learning algorithm to obtain the recognition result, and compare the recognition result with preset labeled data to obtain a comparison result, the comparison result being used to indicate the speech recognition performance of the terminal under test.
2. The voice recognition terminal evaluation system according to claim 1, characterized in that the image acquisition device is a high-speed camera, the voice playback device is an artificial mouth, and the noise generation device is a hi-fi speaker.
3. The voice recognition terminal evaluation system according to claim 1, characterized by further comprising: a test bench;
the terminal under test is placed on the test bench.
4. The voice recognition terminal evaluation system according to claim 1, characterized in that the control device is also connected to the noise generation device;
the control device is further configured to: set noise generation parameters, and send the noise generation parameters to the noise generation device;
the noise generation device is specifically configured to: generate the corresponding noise according to the noise generation parameters.
5. The voice recognition terminal evaluation system according to claim 1, wherein the control device is further configured to: set multiple test dimensions according to the characteristics of the terminal under test;
Perform image recognition on the speech recognition image according to the multiple test dimensions to obtain multiple recognition results corresponding to the multiple test dimensions; compare the multiple recognition results with the corresponding preset labeled data respectively to obtain multiple comparison results corresponding to the multiple test dimensions; and perform statistical analysis on the multiple comparison results to obtain a statistical analysis result, the statistical analysis result indicating the speech recognition performance of the terminal under test.
6. The voice recognition terminal evaluation system according to claim 5, wherein there are multiple test text corpora;
The control device is configured to: convert the multiple test text corpora into multiple test voice corpora by a speech synthesis method;
The voice playing device is configured to: output the multiple test voice corpora in sequence;
The terminal under test is configured to: recognize, in sequence and under different test environments, the multiple test voice corpora output by the voice playing device to obtain multiple recognition results, the different test environments including a noise test environment;
The noise generating device is configured to: generate, in the noise test environment, the noise required for the test;
The image acquisition device is configured to: capture images of the multiple recognition results to obtain multiple speech recognition images, and send the multiple speech recognition images to the control device;
The control device is further configured to: assign different weights to the multiple test dimensions according to the characteristics of the terminal under test; perform image recognition on the multiple speech recognition images according to the multiple test dimensions to obtain multiple recognition results corresponding to the multiple test voice corpora and the multiple test dimensions; compare these recognition results with the corresponding preset labeled data respectively to obtain multiple comparison results corresponding to the multiple test voice corpora and the multiple test dimensions, each comparison result being the number of test voice corpora successfully recognized; and obtain a statistical analysis result according to the weights of the multiple test dimensions and the success counts of the multiple comparison results, the statistical analysis result indicating the speech recognition performance of the terminal under test.
7. The voice recognition terminal evaluation system according to claim 6, wherein the control device is specifically configured to:
Obtain the statistical analysis result from the weights of the multiple test dimensions and the success counts of the multiple comparison results according to the following formula:
Wherein Score is the statistical analysis result; K is the total number of test dimensions; the weight of the i-th test dimension is defined for i = 1, 2, ..., K; n_Success is the number of test voice corpora successfully recognized under each test dimension; and N is the number of test voice corpora.
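The formula itself is not reproduced in the text. The sketch below shows one scoring rule consistent with the listed symbols, under two explicit assumptions: the score is the weighted sum of per-dimension success rates (n_Success divided by N), and the weights sum to 1. Neither assumption is confirmed by the source, and the function name `weighted_score` is hypothetical.

```python
import math
from typing import Sequence


def weighted_score(weights: Sequence[float], successes: Sequence[int], n_corpora: int) -> float:
    """Assumed form: Score = sum over i of weight_i * (n_success_i / N)."""
    if len(weights) != len(successes):
        raise ValueError("one weight and one success count are needed per test dimension")
    if n_corpora <= 0:
        raise ValueError("n_corpora must be positive")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights are assumed to sum to 1")
    if any(s < 0 or s > n_corpora for s in successes):
        raise ValueError("each success count must lie between 0 and n_corpora")
    return sum(w * s / n_corpora for w, s in zip(weights, successes))
```

For example, with three dimensions weighted 0.5, 0.3 and 0.2, success counts of 90, 80 and 70, and 100 test voice corpora, this rule gives 0.45 + 0.24 + 0.14 = 0.83.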
8. A voice recognition terminal evaluation method, comprising:
A control device converts a test text corpus into a test voice corpus by a speech synthesis method;
A voice playing device outputs the test voice corpus;
A noise generating device generates, in a noise test environment, the noise required for the test;
A terminal under test recognizes, under different test environments, the test voice corpus output by the voice playing device to obtain a recognition result, the different test environments including the noise test environment;
An image acquisition device captures an image of the recognition result to obtain a speech recognition image, and sends the speech recognition image to the control device;
The control device performs image recognition on the speech recognition image based on a deep learning algorithm to obtain a recognition result, and compares the recognition result with preset labeled data to obtain a comparison result, the comparison result indicating the speech recognition performance of the terminal under test;
Wherein the control device is connected to the image acquisition device and the voice playing device.
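The order of the steps in claim 8 can be sketched as a small orchestration loop. Every hardware or model interaction is passed in as a callable, so the sketch makes no claim about real device interfaces. The names `run_evaluation`, `synthesize`, `set_noise`, `play`, `capture` and `recognize_image` are hypothetical stand-ins for the speech synthesis, noise generating device, voice playing device, image acquisition device and deep-learning image recognition described above.

```python
from typing import Callable, Iterable, List, Tuple

Case = Tuple[str, str]  # (test text corpus, preset label)


def run_evaluation(
    cases: Iterable[Case],
    environments: Iterable[str],
    synthesize: Callable[[str], bytes],
    set_noise: Callable[[str], None],
    play: Callable[[bytes], None],
    capture: Callable[[], bytes],
    recognize_image: Callable[[bytes], str],
) -> List[Tuple[str, str, bool]]:
    """Run every case in every environment and record whether recognition matched the label."""
    case_list = list(cases)
    results: List[Tuple[str, str, bool]] = []
    for env in environments:
        set_noise(env)                      # noise generating device prepares the environment
        for text, label in case_list:
            play(synthesize(text))          # text -> voice corpus -> voice playing device
            recognized = recognize_image(capture())  # photograph the result, then read it
            results.append((env, text, recognized == label))
    return results
```

Injecting the callables keeps the loop testable with simple stubs before any camera, loudspeaker or recognition model is attached.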
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of claim 8 when executing the computer program.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for executing the method of claim 8.
CN201910393143.6A 2019-05-13 2019-05-13 Voice recognition terminal evaluation system and method Pending CN110211567A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910393143.6A CN110211567A (en) 2019-05-13 2019-05-13 Voice recognition terminal evaluation system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910393143.6A CN110211567A (en) 2019-05-13 2019-05-13 Voice recognition terminal evaluation system and method

Publications (1)

Publication Number Publication Date
CN110211567A true CN110211567A (en) 2019-09-06

Family

ID=67787161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910393143.6A Pending CN110211567A (en) 2019-05-13 2019-05-13 Voice recognition terminal evaluation system and method

Country Status (1)

Country Link
CN (1) CN110211567A (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110415681A (en) * 2019-09-11 2019-11-05 北京声智科技有限公司 A kind of speech recognition effect testing method and system
CN110728975A (en) * 2019-10-10 2020-01-24 南京创维信息技术研究院有限公司 System and method for automatically testing ASR recognition rate
CN110808029A (en) * 2019-11-20 2020-02-18 斑马网络技术有限公司 Vehicle-mounted machine voice test system and method
CN110838285A (en) * 2019-11-20 2020-02-25 青岛海尔科技有限公司 System, method and device for terminal voice test
CN110942768A (en) * 2019-11-20 2020-03-31 Oppo广东移动通信有限公司 Equipment wake-up test method and device, mobile terminal and storage medium
CN111026652A (en) * 2019-11-27 2020-04-17 南京创维信息技术研究院有限公司 Automatic testing method for awakening rate of voice artificial intelligence system
CN111242455A (en) * 2020-01-07 2020-06-05 北京百度网讯科技有限公司 Method and device for evaluating voice function of electronic map, electronic equipment and storage medium
CN111261195A (en) * 2020-01-10 2020-06-09 Oppo广东移动通信有限公司 Audio testing method and device, storage medium and electronic equipment
CN111785268A (en) * 2020-06-30 2020-10-16 北京声智科技有限公司 Method and device for testing voice interaction response speed and electronic equipment
CN112017635A (en) * 2020-08-27 2020-12-01 北京百度网讯科技有限公司 Method and device for detecting voice recognition result
CN112405561A (en) * 2020-11-30 2021-02-26 天津链数科技有限公司 Testing system for intelligent level test of household appliances
CN112685083A (en) * 2019-10-17 2021-04-20 北京沃东天骏信息技术有限公司 Method and system for measuring wake-up rate
CN113362806A (en) * 2020-03-02 2021-09-07 北京奇虎科技有限公司 Intelligent sound evaluation method, system, storage medium and computer equipment thereof
CN114120969A (en) * 2022-01-29 2022-03-01 中国电子技术标准化研究院 Method and system for testing voice recognition function of intelligent terminal and electronic equipment
WO2022052945A1 (en) * 2020-09-11 2022-03-17 International Business Machines Corporation Chaos testing for voice enabled devices
CN115171657A (en) * 2022-05-26 2022-10-11 青岛海尔科技有限公司 Voice equipment testing method and device and storage medium
CN115474146A (en) * 2022-08-26 2022-12-13 北京百度网讯科技有限公司 Voice test system, method and device
CN115512686A (en) * 2022-06-22 2022-12-23 青岛海尔科技有限公司 Method and device for determining wake-up result, storage medium and electronic device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548772A (en) * 2017-01-16 2017-03-29 上海智臻智能网络科技股份有限公司 Speech recognition test system and method
CN107613447A (en) * 2017-10-27 2018-01-19 深圳市传测科技有限公司 A kind of intelligent terminal audio test device, system and method for testing
CN107680613A (en) * 2017-08-13 2018-02-09 惠州市德赛西威汽车电子股份有限公司 A kind of voice-operated device speech recognition capabilities method of testing and equipment
CN108806666A (en) * 2018-05-28 2018-11-13 成都昊铭科技有限公司 Without the speech recognition test device of interface, system and method
CN108899012A (en) * 2018-07-27 2018-11-27 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) Interactive voice equipment evaluating method, system, computer equipment and storage medium
US20180342236A1 (en) * 2016-10-11 2018-11-29 Mediazen, Inc. Automatic multi-performance evaluation system for hybrid speech recognition
CN109192195A (en) * 2018-09-29 2019-01-11 深圳市微测检测有限公司 A kind of speech recognition test macro and method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180342236A1 (en) * 2016-10-11 2018-11-29 Mediazen, Inc. Automatic multi-performance evaluation system for hybrid speech recognition
CN106548772A (en) * 2017-01-16 2017-03-29 上海智臻智能网络科技股份有限公司 Speech recognition test system and method
CN107680613A (en) * 2017-08-13 2018-02-09 惠州市德赛西威汽车电子股份有限公司 A kind of voice-operated device speech recognition capabilities method of testing and equipment
CN107613447A (en) * 2017-10-27 2018-01-19 深圳市传测科技有限公司 A kind of intelligent terminal audio test device, system and method for testing
CN108806666A (en) * 2018-05-28 2018-11-13 成都昊铭科技有限公司 Without the speech recognition test device of interface, system and method
CN108899012A (en) * 2018-07-27 2018-11-27 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) Interactive voice equipment evaluating method, system, computer equipment and storage medium
CN109192195A (en) * 2018-09-29 2019-01-11 深圳市微测检测有限公司 A kind of speech recognition test macro and method

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110415681A (en) * 2019-09-11 2019-11-05 北京声智科技有限公司 A kind of speech recognition effect testing method and system
CN110415681B (en) * 2019-09-11 2022-02-18 北京声智科技有限公司 Voice recognition effect testing method and system
CN110728975A (en) * 2019-10-10 2020-01-24 南京创维信息技术研究院有限公司 System and method for automatically testing ASR recognition rate
CN112685083A (en) * 2019-10-17 2021-04-20 北京沃东天骏信息技术有限公司 Method and system for measuring wake-up rate
CN110808029A (en) * 2019-11-20 2020-02-18 斑马网络技术有限公司 Vehicle-mounted machine voice test system and method
CN110838285A (en) * 2019-11-20 2020-02-25 青岛海尔科技有限公司 System, method and device for terminal voice test
CN110942768A (en) * 2019-11-20 2020-03-31 Oppo广东移动通信有限公司 Equipment wake-up test method and device, mobile terminal and storage medium
CN111026652A (en) * 2019-11-27 2020-04-17 南京创维信息技术研究院有限公司 Automatic testing method for awakening rate of voice artificial intelligence system
CN111242455A (en) * 2020-01-07 2020-06-05 北京百度网讯科技有限公司 Method and device for evaluating voice function of electronic map, electronic equipment and storage medium
CN111261195A (en) * 2020-01-10 2020-06-09 Oppo广东移动通信有限公司 Audio testing method and device, storage medium and electronic equipment
CN113362806A (en) * 2020-03-02 2021-09-07 北京奇虎科技有限公司 Intelligent sound evaluation method, system, storage medium and computer equipment thereof
CN111785268A (en) * 2020-06-30 2020-10-16 北京声智科技有限公司 Method and device for testing voice interaction response speed and electronic equipment
CN112017635A (en) * 2020-08-27 2020-12-01 北京百度网讯科技有限公司 Method and device for detecting voice recognition result
WO2022052945A1 (en) * 2020-09-11 2022-03-17 International Business Machines Corporation Chaos testing for voice enabled devices
GB2614192A (en) * 2020-09-11 2023-06-28 Ibm Chaos testing for voice enabled devices
US11769484B2 (en) 2020-09-11 2023-09-26 International Business Machines Corporation Chaos testing for voice enabled devices
CN112405561A (en) * 2020-11-30 2021-02-26 天津链数科技有限公司 Testing system for intelligent level test of household appliances
CN114120969A (en) * 2022-01-29 2022-03-01 中国电子技术标准化研究院 Method and system for testing voice recognition function of intelligent terminal and electronic equipment
CN115171657A (en) * 2022-05-26 2022-10-11 青岛海尔科技有限公司 Voice equipment testing method and device and storage medium
CN115512686A (en) * 2022-06-22 2022-12-23 青岛海尔科技有限公司 Method and device for determining wake-up result, storage medium and electronic device
CN115474146A (en) * 2022-08-26 2022-12-13 北京百度网讯科技有限公司 Voice test system, method and device

Similar Documents

Publication Publication Date Title
CN110211567A (en) Voice recognition terminal evaluation system and method
Schuller et al. The INTERSPEECH 2021 computational paralinguistics challenge: COVID-19 cough, COVID-19 speech, escalation & primates
Czyzewski et al. An audio-visual corpus for multimodal automatic speech recognition
CN108288468B (en) Audio recognition method and device
CN105512348B (en) For handling the method and apparatus and search method and device of video and related audio
CN109862393B (en) Method, system, equipment and storage medium for dubbing music of video file
US20190228791A1 (en) Method and device for generating far-field speech data, computer device and computer readable storage medium
CN107767869A (en) Method and apparatus for providing voice service
US20130211826A1 (en) Audio Signals as Buffered Streams of Audio Signals and Metadata
CN107799126A (en) Sound end detecting method and device based on Supervised machine learning
CN110853617B (en) Model training method, language identification method, device and equipment
CN109545192A (en) Method and apparatus for generating model
CN109616142A (en) Device and method for audio classification and processing
JP2020034895A (en) Responding method and device
CN109754783A (en) Method and apparatus for determining the boundary of audio sentence
CN108877782A (en) Audio recognition method and device
WO2020253128A1 (en) Voice recognition-based communication service method, apparatus, computer device, and storage medium
CN111369976A (en) Method and device for testing voice recognition equipment
CN109256115A (en) A kind of speech detection system and method for intelligent appliance
CN110019962B (en) Method and device for generating video file information
US8620670B2 (en) Automatic realtime speech impairment correction
JP2022509485A (en) Systems and methods for domain adaptation in neural networks using cross-domain batch normalization
WO2022105693A1 (en) Sample generation method and apparatus
CN113257283A (en) Audio signal processing method and device, electronic equipment and storage medium
US11087779B2 (en) Apparatus that identifies a scene type and method for identifying a scene type

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Fu Rongrong, Liu Yuwei, Li Wei, Dong Qianzhou, Zhang Xiaoyu
Inventor before: Fu Rongrong, Liu Yuwei, Li Wei, Dong Qianzhou, Zhang Xiaoyu

RJ01 Rejection of invention patent application after publication

Application publication date: 20190906