CN111429884A

CN111429884A - Speech recognition rate analysis system

Info

Publication number: CN111429884A
Application number: CN202010244371.XA
Authority: CN
Inventors: 潘浩贤; 蔡伟雄; 严冬; 冼佳莉; 陈南洲; 陈晓燕
Original assignee: Foshan University
Current assignee: Foshan University
Priority date: 2020-03-31
Filing date: 2020-03-31
Publication date: 2020-07-17
Anticipated expiration: 2040-03-31
Also published as: CN111429884B

Abstract

The invention discloses a speech recognition rate analysis system comprising: a first microphone array for acquiring test audio; a display module for displaying the function options of the system for the user to select by clicking; a wireless transmitting module and a wireless receiving module that cooperate to receive feedback information from the voice module under test; a test audio delivery module comprising a first loudspeaker for playing the test audio and a second microphone array for collecting the test audio signal played by the first loudspeaker; and a distance measurement module for measuring the distance between the processing module and the voice module under test. The invention makes the voice analysis system more intelligent and provides a good hardware environment for accurate testing of the voice recognition rate.

Description

Speech recognition rate analysis system

Technical Field

The invention relates to the technical field of voice recognition, in particular to a voice recognition rate analysis system.

Background

Speech recognition is currently a very popular technology and is used in many industries.

The voice recognition rates reported for existing voice products have low reference value, because the test process lacks analysis of the environmental parameters and functional parameters present during recognition.

Existing voice recognition rate tests mainly use two methods: software simulation testing and manual testing. The former feeds audio signals to a voice module through software and obtains the recognition result on a computer. The latter has a large number of testers repeatedly test, record, upload data and perform statistical analysis on site; it consumes a large amount of human resources, involves complicated operating steps and is inefficient.

At present, a few voice recognition devices are tested with purpose-built hardware systems, but the parameters they test are of little value for judging, analyzing and measuring the performance of a voice module, and the hardware architecture is complex.

In the few existing hardware test systems, most test objects are complete voice recognition products, so the tests are difficult to adapt to different devices and their universality is low.

Therefore, the market urgently needs a speech recognition rate analysis system that can be built with a simpler hardware structure and that provides an excellent hardware environment for accurate testing of the speech recognition rate.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a voice recognition rate analysis system, which can be built through a simpler hardware structure and can provide a good hardware environment for completing accurate test of the voice recognition rate.

The solution of the invention for solving the technical problem is as follows: a speech recognition rate analysis system comprising:

a first microphone array for acquiring test audio;

a display module for displaying the function options of the system for the user to select by clicking;

a wireless transmitting module and a wireless receiving module that cooperate to receive feedback information from the voice module under test;

a test audio delivery module comprising a first loudspeaker and a second microphone array, wherein the first loudspeaker is used for playing the test audio, and the second microphone array is used for collecting the test audio signal played by the first loudspeaker;

a distance measurement module for measuring the distance between the processing module and the voice module under test; and

a processing module comprising:

an ambient noise measurement unit for measuring the level of ambient noise as a sound pressure level;

a multimedia encoder for filtering the test audio collected by the first microphone array, performing A/D conversion on the filtered test audio and storing the converted test audio in a FLASH cache in the form of a WAV file; and

a multimedia decoder for performing D/A conversion on the WAV file when the WAV file is called through the serial port, after which the signal is power-amplified and played by the first loudspeaker.

Further, the display module is embedded in the center of a box body, the first microphone array is arranged around the lower part of the box body, the distance measurement module and the wireless receiving module are arranged on the left and right sides of the center of the lower part of the rear end of the box body, and the first loudspeaker is arranged at the center of the upper part of the rear end of the box body.

Further, the wireless transmitting module and the wireless receiving module respectively comprise an infrared transmitting module and an infrared receiving module.

Further, the distance measurement module is specifically a laser distance measurement module comprising a laser emitter and a SPAD infrared receiver.

Further, the display module comprises a TFT LCD display screen.

Further, the processing module comprises a micro control module of STM32F10X series and peripheral circuits thereof.

Further, the second microphone array is arranged at a tested voice module, and a second loudspeaker is further arranged at the tested voice module.

The beneficial effects of the invention are as follows: the invention provides a voice recognition rate analysis system that can interact directly with a test object once the voice module (the test object) is externally connected to a wireless transmitting module. Manual speaking and manual result recording are thereby replaced, so that the voice analysis system is more intelligent and a good hardware environment is provided for accurate testing of the voice recognition rate.

Drawings

In order to more clearly illustrate the technical solution in the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly described below. It is clear that the described figures are only some embodiments of the invention, not all embodiments, and that a person skilled in the art can also derive other designs and figures from them without inventive effort.

FIG. 1 is a schematic diagram of a front side of a display hardware portion of a speech recognition rate analysis system according to the present invention;

FIG. 2 is a schematic diagram of the rear side of the hardware portion of the display of a speech recognition rate analysis system of the present invention;

FIG. 3 is a system diagram of a speech recognition rate analysis system of the present invention;

FIG. 4 is a functional flow diagram of a speech recognition rate analysis system of the present invention;

FIG. 5 is a schematic diagram illustrating the generation and playing principle of the test audio of the speech recognition rate analysis system according to the present invention.

Detailed Description

The conception, the specific structure and the technical effects of the present invention will be clearly and completely described below in conjunction with the embodiments and the accompanying drawings to fully understand the objects, the features and the effects of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and those skilled in the art can obtain other embodiments without inventive effort based on the embodiments of the present invention, and all embodiments are within the protection scope of the present invention. In addition, all the connection relations mentioned herein do not mean that the components are directly connected, but mean that a better connection structure can be formed by adding or reducing connection accessories according to the specific implementation situation. All technical characteristics in the invention can be interactively combined on the premise of not conflicting with each other.

Embodiment 1: referring to FIG. 1, FIG. 2, FIG. 3, FIG. 4 and FIG. 5, a speech recognition rate analysis system includes:

a first microphone array 110 for acquiring test audio;

a display module 120 for displaying the function options of the system for the user to select by clicking;

a wireless transmitting module 131 and a wireless receiving module 132 that cooperate to receive feedback information from the voice module 200 under test;

wherein the wireless transmitting module 131 is placed at the voice module 200 under test, and the wireless receiving module 132 is connected to pins of the processing module 160 and is controlled by the processing module 160;

a test audio delivery module comprising a first speaker 141 and a second microphone array 142, wherein the first speaker 141 is used for playing the test audio, and the second microphone array 142 is used for collecting the test audio signal played by the first speaker 141;

a ranging module 150 for measuring the distance between the processing module 160 and the voice module 200 under test; and

a processing module 160 comprising:

a multimedia encoder 161 configured to filter the test audio collected by the first microphone array 110, perform A/D conversion on the filtered test audio, and store the converted test audio as a WAV file in a FLASH buffer; and

a multimedia decoder 162 for performing D/A conversion on the WAV file when the WAV file is called through a serial port, after which the signal is power-amplified and played by the first speaker 141.

Specifically, the first microphone array 110 and the second microphone array 142 are used to record high-fidelity test audio on site, that is, the speech that the voice module 200 is to recognize, and the audio is played through the first speaker 141. Before testing, the spoken command voice is stored in the memory; playing it back during testing achieves the effect of manual testing, so that manual operation is replaced by objective, scientific and efficient testing.

When the display module 120 is started, it presents five virtual buttons as function options: "audio recording", "audio playing", "distance testing", "noise testing" and "automatic testing".

The automatic test repeats the test automatically and retains the test results; the other tests are single tests.

After entering the "automatic test" option, the user sets and selects the test audio, the number of tests, the audio playing interval and the placement position, and then clicks "test" to start the test.

First, the ambient sound signal is detected through the first microphone array 110, and the sound pressure level is obtained through system processing. Next, the distance measurement module 150 measures the distance: when the sensor on the distance measurement module 150 receives the laser light scattered back, the distance is obtained through data processing. The first playback of the test audio then starts through the first speaker 141. When playback finishes, the micro control module starts an internal timer, and the infrared receiving module waits for a signal from the infrared transmitting module. When the second microphone array 142 at the voice module 200 receives the command voice from the test system, an infrared signal is output through the infrared transmitting module at the test port. The micro control module stops timing when it receives the infrared signal, processes and records the received signal, completes the first test and displays the current test result on the display screen. The test is repeated until the set number of tests is reached.
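The bookkeeping of the automatic test cycle can be illustrated with a minimal sketch. The description does not state a formula for the recognition rate, so the sketch assumes it is the fraction of trials whose feedback matches the played command, and that a trial with no feedback counts as a failure. The names `Trial` and `summarize` are hypothetical and are not part of the described system.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Trial:
    expected: str                # command played by the first speaker, e.g. "000"
    received: Optional[str]      # decoded infrared feedback, None if none arrived
    response_s: Optional[float]  # timer value when the infrared signal arrived


def summarize(trials: Sequence[Trial]) -> dict:
    """Aggregate repeated trials into a recognition rate and a mean response time.

    Assumption: recognition rate = correct trials / total trials.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    correct = [t for t in trials if t.received == t.expected]
    times = [t.response_s for t in correct if t.response_s is not None]
    return {
        "trials": len(trials),
        "correct": len(correct),
        "recognition_rate": len(correct) / len(trials),
        "mean_response_s": sum(times) / len(times) if times else None,
    }
```

In this sketch the response time is averaged over correct trials only; a real test report might equally keep every timing, which the description leaves open.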

The audio recording function is turned on by operating the touch display screen, and the first microphone array immediately begins detecting and receiving sound signals. The collected sound signals pass through a filter circuit into the multimedia encoder 161, which integrates an analog-to-digital converter (ADC) with adjustable sampling frequency to complete the analog-to-digital conversion. The encoder outputs the generated WAV (uncompressed audio format) file, which is stored in a memory such as FLASH (flash memory). When the WAV file needs to be extracted for a voice test, the upper computer sends an instruction, the voice data is read from FLASH and sent at high speed to the multimedia decoder 162 over the SPI (serial peripheral interface) protocol, decoded by a high-performance DAC, and played as test voice through a power amplifier circuit and the first loudspeaker 141.
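As a rough illustration of the storage step, the sketch below packs digitized samples into a WAV container and reads them back, using Python's standard `wave` module on a host computer rather than the encoder chip itself. The 16 kHz sample rate, the 16-bit mono format and the function names are assumptions made for the example; the description only says that the sampling frequency is adjustable.

```python
import io
import struct
import wave


def samples_to_wav(samples, sample_rate=16000):
    """Pack signed 16-bit mono samples into an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)          # mono (assumed)
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM (assumed)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


def wav_to_samples(data):
    """Read the samples and the sample rate back from WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        count = wav.getnframes()
        frames = wav.readframes(count)
        return list(struct.unpack(f"<{count}h", frames)), wav.getframerate()
```

The bytes returned by `samples_to_wav` stand in for what would be written to FLASH, and `wav_to_samples` stands in for the read-back that precedes D/A conversion.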

In addition to serving as a recorder and player, the microphone array and the multimedia encoder 161 can be used to detect the ambient sound pressure. After the ambient sound signal passes through the first microphone array, the ADC of the multimedia encoder 161 generates a WAV file, completing the conversion of the sound signal into a voltage signal. The sound pressure is obtained from the voltage signal and the sensitivity parameter of the microphone array, and the ambient level in decibels is finally obtained through the sound pressure level formula.
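The conversion from voltage to decibels can be sketched as follows, using the standard sound pressure level definition SPL = 20·log10(p / p0) with the reference pressure p0 = 20 µPa. The use of the RMS voltage, the sensitivity expressed in V/Pa and the function names are assumptions for illustration; the description does not give these details.

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (20 µPa)


def rms(values):
    """Root-mean-square of a sequence of voltage samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def sound_pressure_level(voltages, sensitivity_v_per_pa):
    """Sound pressure level in dB from microphone voltages and sensitivity (V/Pa)."""
    pressure = rms(voltages) / sensitivity_v_per_pa  # sound pressure in pascals
    if pressure <= 0:
        raise ValueError("sound pressure must be positive to take a logarithm")
    return 20 * math.log10(pressure / P_REF)
```

For example, a microphone with a sensitivity of 10 mV/Pa that outputs 10 mV RMS is exposed to 1 Pa, which corresponds to about 94 dB.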

In a preferred embodiment of the present invention, the display module 120 is embedded in the center of a case, the first microphone array 110 is disposed around the lower portion of the case, the distance measuring module 150 and the wireless receiving module 132 are disposed on the left and right sides of the center of the lower portion of the rear end of the case, and the first speaker 141 is disposed in the center of the upper portion of the rear end of the case.

In addition, a second microphone array 210 and a second speaker 220 are disposed at the tested voice module, wherein the second microphone array 210 is used for detecting and receiving the sound signal of the voice module, and the second speaker 220 is used for enhancing the sound emitted by the voice module.

In a preferred embodiment of the present invention, the wireless transmitting module 131 and the wireless receiving module 132 respectively include an infrared transmitting module and an infrared receiving module.

In the present embodiment, the DATA pin of the infrared receiving module is led out as "REMOTE_IN" and connected to PB9 of the STM32 of the processing module 160. In the infrared communication protocol used in the present system, the DATA bit is normally set, i.e. DATA remains connected to the 3.3 V high level, so that REMOTE_IN is pulled up to the high level; when the line needs to be pulled down, DATA is cleared and REMOTE_IN is pulled down to the low level.

As a preferred embodiment of the present invention, the distance measuring module 150 is specifically a laser distance measuring module 150, and includes a laser transmitter and a SPAD infrared receiver.

In this embodiment, the core chip of the laser ranging module 150 is the VL53L0X, and an XC6206P282MR voltage regulator chip is adopted.

The recognition rate is determined as follows. The data frame fed back by the voice module 200 consists of a start code, a user code, a data code and the complement of the data code, and the data code carries the core information. Because the voice module 200 feeds back different data frames for different voices, the receiving end compares the decoded signal with the voice that was sent to judge whether the voice module 200 recognized it correctly. For example, if the system plays the voice test file "000" and the voice module 200 recognizes it and feeds back the corresponding, unique data frame "000", the recognition is correct; if the device plays the voice test file "000" and receives the feedback data frame "001", the recognition is wrong.
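The comparison at the receiving end can be sketched as below. The description does not state field widths, so the sketch assumes that the user code, the data code and its complement are each one byte, that the complement is the bitwise inverse of the data code, and that the start code has already been detected and removed. The function names and the default user code are hypothetical.

```python
from typing import Optional, Tuple

Frame = Tuple[int, int, int]  # (user code, data code, data code complement)


def decode_data_code(frame: Frame, expected_user_code: int) -> Optional[int]:
    """Return the data code if the frame is self-consistent, otherwise None."""
    user_code, data_code, complement = frame
    if user_code != expected_user_code:
        return None  # frame comes from a different device
    if (data_code ^ complement) != 0xFF:
        return None  # complement check failed, frame is corrupted
    return data_code


def is_recognized(played_command: int, frame: Frame,
                  expected_user_code: int = 0x00) -> bool:
    """True when the fed-back data code matches the command that was played."""
    data_code = decode_data_code(frame, expected_user_code)
    return data_code is not None and data_code == played_command
```

With this mapping, test file "000" corresponds to data code 0 and feedback "001" to data code 1, so a frame carrying data code 1 in response to command 0 is counted as a wrong recognition.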

In this embodiment, the distance measuring module 150 measures the distance between the micro control module and the voice module 200 under test using laser ranging. The processing module 160 can issue instructions directly to the distance measuring module 150; once the measuring mode is selected, the distance is measured and the measured value is displayed. Specifically, pulsed ranging is used, with a 940 nm laser that produces no visible red flicker. When the upper computer sends the instruction to start ranging, it simultaneously starts the internal timer, and the laser emitter radiates photons toward the target. As shown in FIG. 5, the photons are scattered after striking the object under test; as soon as the sensor receives the returned photons, it sends an interrupt request to the upper computer and the timing ends. The round-trip distance is the product of the measured round-trip flight time of the photons and the speed of light, and half of that value is the actual distance.
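The time-of-flight calculation described above amounts to d = c·t / 2. A minimal sketch, with hypothetical function and constant names, is:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def distance_from_round_trip(round_trip_s: float) -> float:
    """One-way distance in metres from the round-trip flight time in seconds."""
    if round_trip_s < 0:
        raise ValueError("flight time cannot be negative")
    # The photons travel to the target and back, so halve the total path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0
```

A round trip of 10 ns corresponds to roughly 1.5 m, which shows why the timing has to resolve fractions of a nanosecond for centimetre-level accuracy.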

The display module 120 comprises a TFT LCD display screen. The system uses a 4.3-inch TFT LCD with a touch screen, also called a true color LCD. The module has a resolution of 800 × 480 and a 24-bit color depth and can display 65536 colors; it is controlled by an NT35510 driver chip, and its pins are connected to the FSMC of the STM32 of the processing module 160.

As a preferred embodiment of the present invention, the processing module 160 includes a micro control module of model STM32F10X series and its peripheral circuits.

While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that the present invention is not limited to the details of the embodiments shown and described, but is capable of numerous equivalents and substitutions without departing from the spirit of the invention as set forth in the claims appended hereto.

Claims

1. A speech recognition rate analysis system, comprising:

a first microphone array for conducting acquisition of test audio;

the processing module comprises:

2. A speech recognition rate analysis system according to claim 1, wherein: the display module is embedded in the center of a box body, the first microphone array is arranged around the lower part of the box body, the ranging module and the wireless receiving module are arranged on the left and right sides of the center of the lower part of the rear end of the box body, and the first speaker is arranged at the center of the upper part of the rear end of the box body.

3. A speech recognition rate analysis system according to claim 2, wherein: the wireless transmitting module and the wireless receiving module respectively comprise an infrared transmitting module and an infrared receiving module.

4. A speech recognition rate analysis system according to claim 2, wherein: the ranging module is specifically a laser ranging module and comprises a laser transmitter and an SPAD infrared receiver.

5. The system of claim 1, wherein the display module comprises a TFT LCD display.

6. The speech recognition rate analysis system of claim 5, wherein: the processing module comprises a micro control module of STM32F10X series and a peripheral circuit thereof.

7. A speech recognition rate analysis system according to claim 1, wherein: the second microphone array is arranged at a tested voice module, and a second loudspeaker is further arranged at the tested voice module.