Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a voice recognition testing method and system based on reverberation simulation.
In a first aspect, the present invention provides a method for testing speech recognition based on reverberation simulation, comprising the steps of:
step S1, setting a first test scene, wherein the first test scene comprises at least one first sound source, a first closed boundary and a plurality of reverberation parameter acquisition devices, wherein:
the first sound source is located within the first closed boundary;
the reverberation parameter acquisition devices are arranged within the first closed boundary, around a preset position to be measured in three-dimensional space;
step S2, a first test audio is emitted by the first sound source, the first test audio is reflected by the first closed boundary to form a reverberant sound, and each reverberation parameter acquisition device performs reverberation acquisition according to the reverberant sound received in its acquisition direction and generates a corresponding reverberation parameter;
step S3, generating simulated reverberation test audio according to the reverberation parameter and a second test audio, wherein the second test audio comprises a test instruction corpus representing a preset test instruction;
step S4, setting a second test scene, wherein the second test scene comprises a second closed boundary and a plurality of second sound sources, and a device under test is arranged in the second test scene, wherein:
the second closed boundary is used for realizing sound insulation between an internal closed environment and an external open environment and eliminating the reverberation possibly generated by the internal closed environment;
the plurality of second sound sources and the device under test are located within the second closed boundary, and the relative positional relationship between the device under test and each second sound source is consistent with the relative positional relationship between the position to be measured and each reverberation parameter acquisition device;
step S5, the simulated reverberation test audio is emitted by the second sound sources, and the device under test performs voice recognition according to the received simulated reverberation test audio and generates a corresponding voice recognition result;
step S6, judging whether the voice recognition result is consistent with the preset test instruction, and recording the judgment result.
Optionally, the simulated reverberation test audio sent by the second sound source is generated according to the reverberation parameter generated by the reverberation parameter acquisition device with consistent relative position relationship and the second test audio.
Optionally, in the step S2, the reverberation collection includes:
step S21, sequentially extracting one first test audio from a first test audio set and emitting it through the first sound source, wherein the first test audio set comprises a plurality of first test audios with different frequencies, and each first test audio has the same first duration;
step S22, each reverberation parameter acquisition device continuously acquires the audio signal received in its acquisition direction, and obtains a second duration and the frequency variation of the audio signal;
steps S21 to S22 are executed repeatedly until none of the reverberation parameter acquisition devices can acquire the audio signal and the first sound source has played all the first test audios in the first test audio set.
Optionally, the reverberation parameter includes a reverberation duration and a frequency decay curve corresponding to a frequency of each of the first test audio;
the reverberation duration includes a difference between the second duration and the first duration at a corresponding frequency;
the frequency decay curve includes the frequency variation over the reverberation duration at a corresponding frequency.
Optionally, in the step S3, the generating the simulated reverberation test audio includes:
step S31, extracting a characteristic segment of the second test audio, and acquiring the average frequency of the characteristic segment;
step S32, selecting the corresponding reverberation parameter according to the average frequency, and generating reverberation superposition audio based on the selected reverberation parameter;
and step S33, superposing the reverberation superposition audio and the second test audio to generate the simulated reverberation test audio.
Optionally, the second test audio further includes an environmental noise corpus representing the preset test instruction, where the environmental noise corpus is used to provide a real environmental simulation for the speech recognition test.
Optionally, the simulated reverberation test audio includes test instruction reverberation audio and ambient noise reverberation audio;
the test instruction reverberation audio is generated according to the test instruction corpus and the reverberation parameter;
the ambient noise reverberant audio is generated according to the ambient noise corpus and the reverberation parameter.
Optionally, at least a portion of the plurality of second sound sources emit the test instruction reverberant audio;
at least a portion of the plurality of second sound sources play the ambient noise reverberant audio.
Optionally, the second sound source emitting the test instruction reverberant audio is on the same horizontal plane as the device under test.
In a second aspect, the present invention further provides a voice recognition test system based on reverberant sound simulation, which is applied to the voice recognition test method, and includes:
a first test scene for providing a reverberation parameter acquisition environment, including at least one first sound source, a first closed boundary, and a plurality of reverberation parameter acquisition devices, wherein:
the first sound source is positioned in the first closed boundary and is used for emitting first test audio, and the first test audio is reflected by the first closed boundary to form reverberant sound;
the reverberation parameter acquisition devices are arranged within the first closed boundary, around the position to be measured in three-dimensional space, and are used for performing reverberation acquisition according to the reverberant sound received in their acquisition directions and generating corresponding reverberation parameters;
the audio generator is used for generating simulated reverberation test audio according to the reverberation parameter and second test audio, and the second test audio comprises a test instruction corpus representing a preset test instruction;
the second test scene is used for providing a voice recognition test environment for the device under test, the second test scene comprises a second closed boundary and a plurality of second sound sources, and the device under test is arranged in the second test scene, wherein:
the second closed boundary is used for realizing sound insulation between an internal closed environment and an external open environment and eliminating the reverberation possibly generated by the internal closed environment;
the plurality of second sound sources, which are used for playing the simulated reverberation test audio, and the device under test are all arranged within the second closed boundary, and the relative positional relationship between the device under test and each second sound source is consistent with the relative positional relationship between the position to be measured and each reverberation parameter acquisition device;
the device under test is used for performing voice recognition on the received simulated reverberation test audio and generating a corresponding voice recognition result;
and the processor is used for judging whether the voice recognition result is consistent with the preset test instruction or not and recording the judgment result.
Compared with the prior art, the invention has the following beneficial effects:
1. According to the voice recognition testing method and system based on reverberation simulation, simulating real reverberation replaces the traditional approach of testing in a real environment, so testing is no longer limited by the site and is more convenient and faster to carry out.
2. According to the voice recognition testing method and system based on reverberation simulation, multiple different scenes can be simulated within the same second closed boundary, which enables simulation tests under different reverberation conditions and broadens the range of application. After the test of one scene is completed, there is no need to move to another test environment to test another scene, which greatly improves overall test efficiency.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.
Before the embodiments of the present invention are described, reverberation is briefly explained. When a sound wave propagates indoors, it is reflected by the walls, ceiling, floor and other obstacles, and part of its energy is absorbed by the obstacle at each reflection. Therefore, after the sound source stops sounding, the sound waves undergo multiple reflections and absorptions in the room before finally dying away, and for a period of time after the source stops a listener still perceives a mixture of sound waves (that is, the sound persists in the room after the source has stopped). This phenomenon is called reverberation, and this period is called the reverberation time.
Examples
Fig. 1 is a flowchart of a voice recognition testing method based on reverberation simulation according to an embodiment of the present invention; Fig. 2 is a block diagram of a voice recognition testing system based on reverberation simulation according to the embodiment; Fig. 3 is a graph of the reverberation parameters at 1000 Hz; Fig. 4 is a graph of the reverberation parameters at 1100 Hz; and Fig. 5 is a reverberation parameter acquisition chart. Referring to Figs. 1, 3, 4 and 5, the method in this embodiment includes:
step S1, setting a first test scene, wherein the first test scene comprises at least one first sound source, a first closed boundary and a plurality of reverberation parameter acquisition devices, wherein:
the first sound source is located within the first closed boundary;
the reverberation parameter acquisition devices are arranged within the first closed boundary, around a preset position to be measured in three-dimensional space.
In this embodiment, the first closed boundary in step S1 may be a living room, a bedroom, or a conference room, which is not particularly limited in this application. The first sound source is generally located at a common sound-producing point within the first closed boundary, such as a position where a person is usually located or a position where an electronic device is placed. Before the reverberation parameter acquisition devices collect the reverberation parameters, one or several positions to be measured are preset, so that the coordinates of each reverberation parameter acquisition device relative to the preset position to be measured are known; this facilitates setting the positions of the device under test and each second sound source in the subsequent step S4.
Step S2, a first test audio is emitted by the first sound source, the first test audio is reflected by the first closed boundary to form a reverberant sound, and each reverberation parameter acquisition device performs reverberation acquisition according to the reverberant sound received in its acquisition direction and generates a corresponding reverberation parameter.
In this embodiment, the reverberation parameter can include a pulse file (an impulse response file). The pulse file can be regarded as a snapshot of how a physical space or an audio system responds to an input signal, that is, of the output it produces for that input. In this embodiment, the pulse file may correspond to a reverberation characteristic curve representing the reverberation parameter.
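As an illustration only, and under the common assumption (not stated in this embodiment) that the acoustic space behaves as a linear, time-invariant system, the role of the pulse file can be written as a discrete convolution. The symbols are introduced here for this sketch: $x[n]$ is the input signal, $h[n]$ is the response stored in the pulse file with length $K$ samples, and $y[n]$ is the resulting output.

```latex
y[n] = (x * h)[n] = \sum_{k=0}^{K-1} h[k]\, x[n-k]
```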
In this embodiment, the first test audio in step S2 may be a sound produced by a person or by an electronic device. The reverberation parameter acquisition devices may collect separately in six directions relative to the preset position to be measured (above, below, left, right, front and back), or the number of directions may be increased to 12 or more; collecting reverberation parameters in multiple directions brings the subsequently simulated environment closer to the real environment.
Step S3, generating a simulated reverberation test audio according to the reverberation parameter and a second test audio, wherein the second test audio comprises a test instruction corpus representing a preset test instruction.
In this embodiment, the second test audio generally includes a test instruction corpus and an environmental noise corpus; for a voice performance test, the test instruction corpus is the minimum required to complete the test.
Step S4, setting a second test scene, wherein the second test scene comprises a second closed boundary and a plurality of second sound sources, and the device under test is arranged in the second test scene, wherein:
the second closed boundary is used for realizing sound insulation between the inner closed environment and the outer open environment and eliminating reverberation possibly generated by the inner closed environment;
the plurality of second sound sources and the device under test are all located within the second closed boundary, and the relative positional relationship between the device under test and each second sound source is consistent with the relative positional relationship between the preset position to be measured and each reverberation parameter acquisition device; this arrangement reproduces the test scene of the first closed boundary.
In this embodiment, the device under test may be a mobile phone or an intelligent voice recognition robot in a shopping mall, which is not specifically limited in this application. The consistent relative positional relationship may be understood as follows: within the first closed boundary, a spatial rectangular coordinate system is established with the preset position to be measured as the origin, so that the position of a reverberation parameter acquisition device can be expressed as (X, Y, Z); within the second closed boundary, an identical spatial rectangular coordinate system is established with the position of the device under test as the origin, so that the position of the second sound source corresponding to that reverberation parameter acquisition device can likewise be expressed as (X, Y, Z), where X, Y and Z are the coordinates on the X axis, Y axis and Z axis of the coordinate system, respectively.
Step S5, the simulated reverberation test audio is emitted by the second sound sources, and the device under test performs voice recognition according to the received simulated reverberation test audio and generates a corresponding voice recognition result.
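The coordinate correspondence described above can be sketched as follows. This is a minimal illustration rather than part of the disclosed method: the names `Point`, `relative_offsets` and `speaker_positions`, and all example coordinates (in metres), are hypothetical.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float
    z: float


def relative_offsets(position_to_be_measured: Point, devices: list[Point]) -> list[Point]:
    """Coordinates of each acquisition device, taking the preset position as origin."""
    p = position_to_be_measured
    return [Point(d.x - p.x, d.y - p.y, d.z - p.z) for d in devices]


def speaker_positions(device_under_test: Point, offsets: list[Point]) -> list[Point]:
    """One second sound source per offset, taking the device under test as origin."""
    o = device_under_test
    return [Point(o.x + r.x, o.y + r.y, o.z + r.z) for r in offsets]


# Hypothetical first test scene: six acquisition devices, each 1 m from the
# preset position (right, left, front, back, above, below).
preset = Point(2.0, 3.0, 1.0)
devices = [
    Point(3.0, 3.0, 1.0), Point(1.0, 3.0, 1.0),
    Point(2.0, 4.0, 1.0), Point(2.0, 2.0, 1.0),
    Point(2.0, 3.0, 2.0), Point(2.0, 3.0, 0.0),
]

offsets = relative_offsets(preset, devices)
# Hypothetical second test scene: device under test at (1, 1, 1) inside the box.
speakers = speaker_positions(Point(1.0, 1.0, 1.0), offsets)
```

Each entry of `speakers` has the same offset from the device under test as the corresponding acquisition device had from the preset position to be measured.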
In this embodiment, the speech recognition result may include specific text information.
Step S6, judging whether the voice recognition result is consistent with the preset test instruction, and recording the judgment result.
In this embodiment, the text information of the voice recognition result is compared with the text information of the preset test instruction. If they are consistent, the audio is marked as normal audio and a correct recognition is recorded in the test log; if they are inconsistent, the audio is marked as abnormal audio and a misrecognized word or an unrecognized word is recorded in the test log.
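The comparison and logging in step S6 might look like the sketch below. It is illustrative only: the `Judgment` record, the `normalize` rule (case and whitespace are ignored) and the example phrases are assumptions, not details given in the embodiment.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    expected: str
    recognized: str
    consistent: bool
    label: str  # "normal audio" or "abnormal audio"
    note: str   # what is written to the test log


def normalize(text: str) -> str:
    # Assumption: the comparison ignores letter case and extra whitespace.
    return " ".join(text.lower().split())


def judge(expected: str, recognized: str) -> Judgment:
    """Compare the recognition result with the preset test instruction."""
    if normalize(recognized) == normalize(expected):
        return Judgment(expected, recognized, True, "normal audio", "correct recognition")
    # An empty result is logged as unrecognized, anything else as misrecognized.
    note = "unrecognized" if not recognized.strip() else "misrecognized"
    return Judgment(expected, recognized, False, "abnormal audio", note)


# Hypothetical test instruction and three hypothetical recognition results.
test_log = [
    judge("turn on the light", "Turn on the light"),
    judge("turn on the light", "turn on the night"),
    judge("turn on the light", ""),
]
```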
In this embodiment, different test scenes can be simulated within the same second closed boundary. That is, when multiple different real scenes are to be simulated, each real scene can be reproduced within the same second closed boundary by applying the reverberation parameters that correspond to that scene. Specifically, for a given second sound source, the simulated reverberation test audio it emits can be generated by combining the second test audio with different pulse (impulse response) files, each of which corresponds to a different real scene, so that different test scenes are simulated within the same second closed boundary.
In an alternative embodiment, the simulated reverberation test audio emitted by a second sound source is generated according to the second test audio and the reverberation parameter generated by the reverberation parameter acquisition device whose relative positional relationship is consistent with that of the second sound source.
In this embodiment, the generation of the simulated reverberation test audio is implemented by using the reverberation parameters having the consistent relative positional relationship described in step S4.
In an alternative embodiment, in step S2, the reverberation collection includes:
step S21, sequentially extracting one first test audio from a first test audio set and emitting it through the first sound source, wherein the first test audio set comprises a plurality of first test audios with different frequencies, and each first test audio has the same first duration;
step S22, each reverberation parameter acquisition device continuously acquires the audio signal received in its acquisition direction, and obtains the second duration and the frequency variation of the audio signal;
steps S21 to S22 are executed repeatedly until none of the reverberation parameter acquisition devices can acquire the audio signal and the first sound source has played all the first test audios in the first test audio set.
In this embodiment, the first test audio set uses 100 Hz as the playing precision, that is, the frequency difference between any two audios in the first test audio set is a natural multiple of 100 Hz. The first test audio set includes sounds with frequencies from 100 Hz to 20 kHz, and the first duration of each frequency may be 4 s. During collection, a blank time is reserved after the reverberation of one frequency has completely died away; the blank time is used for recording the reverberation parameter in the environment, and the next frequency is collected after the recording is completed.
In this embodiment, audio with frequencies of 1000 Hz and 1100 Hz is collected. Fig. 3 shows the reverberation characteristic curve at 1000 Hz and Fig. 4 shows the reverberation characteristic curve at 1100 Hz, where the reverberation characteristic curve is composed of the reverberation duration and the frequency decay curve. In the coordinate systems of Figs. 3 and 4, the horizontal axis represents time in seconds and the vertical axis represents frequency in hertz, and the first duration of the first test audio at both 1000 Hz and 1100 Hz is 4 s.
In Fig. 3, the upper curve is the frequency curve of the first test audio at 1000 Hz, the lower curve is the frequency curve collected by the reverberation parameter acquisition device, and the part of the curve inside the rectangular frame 300 is the frequency decay curve within the reverberation duration, i.e. the reverberation characteristic curve. Likewise, in Fig. 4, the upper curve is the frequency curve of the first test audio at 1100 Hz, the lower curve is the frequency curve collected by the reverberation parameter acquisition device, and the part of the curve inside the rectangular frame 400 is the frequency decay curve within the reverberation duration.
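The first test audio set described in this embodiment can be sketched as a list of tone frequencies plus a simple tone generator. The sample rate and the use of pure sine tones are assumptions made for illustration; the embodiment only specifies the 100 Hz spacing, the 100 Hz to 20 kHz range and the 4 s duration.

```python
import math

STEP_HZ = 100            # frequency spacing stated in the embodiment
MAX_HZ = 20_000          # upper frequency stated in the embodiment
FIRST_DURATION_S = 4.0   # first duration stated in the embodiment
SAMPLE_RATE = 48_000     # assumption: not specified in the source


def build_frequency_list() -> list[int]:
    """Frequencies of the first test audio set: 100 Hz, 200 Hz, ..., 20 kHz."""
    return list(range(STEP_HZ, MAX_HZ + STEP_HZ, STEP_HZ))


def sine_tone(freq_hz: float, duration_s: float = FIRST_DURATION_S,
              sample_rate: int = SAMPLE_RATE) -> list[float]:
    """Samples of a sine tone (assumed waveform) at the given frequency."""
    n = int(round(duration_s * sample_rate))
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


frequencies = build_frequency_list()
# A 10 ms excerpt at 1000 Hz keeps the example light; a real run would use 4 s.
short_tone = sine_tone(1000, duration_s=0.01)
```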
In an alternative embodiment, the reverberation parameter includes a reverberation duration and a frequency decay curve corresponding to the frequency of each first test audio;
the reverberation duration includes a difference between the second duration and the first duration at the corresponding frequency;
the frequency decay curve includes the frequency change over the duration of the reverberation at the corresponding frequency.
In the present embodiment, the reverberation duration is calculated as $T_0 = T_1 - T_2$,
where $T_0$ is the reverberation duration, $T_1$ is the second duration, and $T_2$ is the first duration. This formula follows directly from the definition of reverberation time given above.
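A worked example of the reverberation duration formula is sketched below. How the second duration is measured is not detailed in the embodiment, so the level envelope, the frame rate and the threshold used here to decide that the signal can no longer be acquired are all assumptions, and the envelope itself is synthetic data.

```python
import math


def active_duration(envelope: list[float], frame_rate: float, threshold: float) -> float:
    """Time spanned by the frames whose level exceeds the threshold, in seconds."""
    active = [i for i, level in enumerate(envelope) if level > threshold]
    if not active:
        return 0.0
    return (active[-1] - active[0] + 1) / frame_rate


def reverberation_duration(second_duration: float, first_duration: float) -> float:
    """T0 = T1 - T2, with T1 the second duration and T2 the first duration."""
    return second_duration - first_duration


FRAME_RATE = 100.0     # assumption: envelope frames per second
FIRST_DURATION = 4.0   # T2, as in the embodiment
THRESHOLD = 0.001      # assumption: amplitude ratio of -60 dB counts as silence

# Synthetic envelope: full level for 4 s, followed by an exponential decay.
steady = [1.0] * int(FIRST_DURATION * FRAME_RATE)
tail = [math.exp(-10.0 * k / FRAME_RATE) for k in range(1, 201)]
envelope = steady + tail

t1 = active_duration(envelope, FRAME_RATE, THRESHOLD)  # second duration T1
t0 = reverberation_duration(t1, FIRST_DURATION)        # reverberation duration T0
```

With this synthetic envelope the second duration comes out at 4.69 s, so the reverberation duration is about 0.69 s.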
In an alternative embodiment, in step S3, the generation of the simulated reverberation test audio includes:
step S31, extracting a characteristic segment of the second test audio, and acquiring the average frequency of the characteristic segment;
step S32, selecting corresponding reverberation parameters according to the average frequency, and generating reverberation superposition audio based on the selected reverberation parameters;
and step S33, superposing the reverberation superposition audio and the second test audio to generate the simulated reverberation test audio.
In this embodiment, the characteristic segment of the second test audio in step S31 is the segment of the second test audio, over the corresponding time, that remains after the frequency peak regions and valley regions are excluded. In step S32, generating the reverberation superposition audio based on the selected reverberation parameter may be understood as convolving the reverberation parameter with the second test audio; those skilled in the art may perform this convolution using conventional calculation methods in the art, which are not limited herein.
In an alternative embodiment, the second test audio further includes an environmental noise corpus representing a preset test instruction, where the environmental noise corpus is used to simulate a real noise environment for the speech recognition test; by adding the environmental noise corpus, the voice performance of the device under test can be tested more thoroughly.
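Steps S31 to S33 can be sketched as follows. This is one possible reading, not the definitive procedure: excluding peaks and valleys is approximated by a trimmed mean, the reverberation parameter is assumed to be a short impulse response that contains only the reverberant tail, selection picks the nearest measured frequency, and all sample values are hypothetical.

```python
def average_frequency(freq_track: list[float], trim_fraction: float = 0.1) -> float:
    """S31: mean frequency after discarding the highest and lowest values."""
    ordered = sorted(freq_track)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] or ordered
    return sum(kept) / len(kept)


def select_impulse(avg_hz: float, impulses: dict[int, list[float]]) -> list[float]:
    """S32: pick the reverberation parameter measured at the nearest frequency."""
    nearest = min(impulses, key=lambda f: abs(f - avg_hz))
    return impulses[nearest]


def convolve(signal: list[float], impulse: list[float]) -> list[float]:
    """S32: direct-form convolution producing the reverberation superposition audio."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out


def superpose(dry: list[float], wet: list[float]) -> list[float]:
    """S33: sample-wise sum of the second test audio and the reverberant audio."""
    n = max(len(dry), len(wet))
    dry_p = dry + [0.0] * (n - len(dry))
    wet_p = wet + [0.0] * (n - len(wet))
    return [a + b for a, b in zip(dry_p, wet_p)]


# Hypothetical data: per-frame frequencies of the second test audio, two
# measured impulse responses (tail only), and a very short test signal.
freq_track = [900, 1000, 1050, 1100, 1000, 950, 3000, 200, 1000, 1000]
impulses = {1000: [0.0, 0.5, 0.25], 1100: [0.0, 0.4, 0.2]}
second_test_audio = [1.0, 0.0, 0.0, -1.0]

avg_hz = average_frequency(freq_track)
wet = convolve(second_test_audio, select_impulse(avg_hz, impulses))
simulated = superpose(second_test_audio, wet)
```

For real recordings the nested-loop convolution would normally be replaced by an FFT-based routine; the direct form is used here only to keep the example dependency-free.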
In this embodiment, the environmental noise corpus may be a sound emitted by a person or a sound emitted by other electronic devices, that is, the environmental noise corpus basically adopts a sound that may occur in a real environment.
In an alternative embodiment, the simulated reverberation test audio includes test instruction reverberation audio and ambient noise reverberation audio;
the test instruction reverberation audio is generated according to the test instruction corpus and the reverberation parameter;
the ambient noise reverberant audio is generated from the ambient noise corpus and the reverberation parameters.
In an alternative embodiment, at least a portion of the plurality of second sound sources emit test instruction reverberant audio;
at least a portion of the plurality of second sound sources plays the ambient noise reverberant audio.
In an alternative embodiment, the second sound source that emits the test instruction reverberant audio is on the same horizontal plane as the device under test. Because in real life the sound source and the device under test are generally on the same horizontal plane, using the same arrangement in the simulation test brings the test closer to the real environment and makes the results more practically meaningful.
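One way to allocate the two kinds of reverberant audio among the second sound sources is sketched below. The rule used here (every source in the horizontal plane of the device under test plays the test instruction reverberant audio, all others play ambient noise reverberant audio) is only one allocation compatible with the description, and the `Speaker` type and example layout are hypothetical.

```python
from typing import NamedTuple


class Speaker(NamedTuple):
    name: str
    x: float
    y: float
    z: float  # offsets from the device under test, in metres


def assign_roles(speakers: list[Speaker], tolerance: float = 1e-6) -> dict[str, str]:
    """Map each second sound source to the audio it plays."""
    roles = {}
    for s in speakers:
        # Same horizontal plane means no vertical offset from the device under test.
        same_plane = abs(s.z) <= tolerance
        roles[s.name] = "test instruction" if same_plane else "ambient noise"
    return roles


# Hypothetical layout with six second sound sources.
layout = [
    Speaker("front", 0.0, 1.0, 0.0), Speaker("back", 0.0, -1.0, 0.0),
    Speaker("left", -1.0, 0.0, 0.0), Speaker("right", 1.0, 0.0, 0.0),
    Speaker("top", 0.0, 0.0, 1.0), Speaker("bottom", 0.0, 0.0, -1.0),
]
roles = assign_roles(layout)
```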
Referring to fig. 2, the present embodiment further provides a voice recognition test system based on reverberant sound simulation, which is applied to the above voice recognition test method, and includes:
a first test scene for providing a reverberation parameter acquisition environment, comprising at least one first sound source, a first closed boundary and a plurality of reverberation parameter acquisition devices 3, wherein:
the first sound source is positioned in the first closed boundary and used for emitting first test audio, and the first test audio is reflected by the first closed boundary to form reverberant sound;
a plurality of reverberation parameter acquisition devices 3, which are placed in the first closed boundary and around the position to be measured in the three-dimensional space, and are used for performing reverberation acquisition according to the reverberation received in the acquisition direction and generating corresponding reverberation parameters;
the audio generator 4 is used for generating simulated reverberation test audio according to the reverberation parameter and second test audio, wherein the second test audio comprises a test instruction corpus representing a preset test instruction;
a second test scenario, configured to provide a speech recognition test environment for the device under test 8, where the second test scenario includes a second closed boundary 9 and a plurality of second sound sources, and the device under test 8 is in the second test scenario, where:
the second closed boundary is used for realizing sound insulation between the inner closed environment and the outer open environment and eliminating reverberation possibly generated by the inner closed environment;
the plurality of second sound sources and the device under test 8 are all arranged within the second closed boundary 9, and the relative positional relationship between the device under test 8 and each second sound source is consistent with the relative positional relationship between the position to be measured and each reverberation parameter acquisition device 3; the second sound sources are used for playing the simulated reverberation test audio.
In this embodiment, both the first sound source and the second sound source may be hi-fi (high-fidelity) loudspeakers.
In this embodiment, the second closed boundary 9 is a box that isolates external sound and fully absorbs internal sound, thereby avoiding the generation of reverberation. The box comprises a shell formed of six panels made of cold-rolled steel plate 2-3 mm thick; the shell is manufactured by stamping, welding, pickling and spraying, and one of the six panels can be opened and closed. Three to six layers of composite sound insulation panels made of damping, sound insulation and sound absorption materials are fixed to the inner wall of the shell, so that sound is fully absorbed.
The device under test 8 is used for performing voice recognition on the received simulated reverberation test audio and generating a corresponding voice recognition result.
The processor 7 is used for judging whether the voice recognition result is consistent with the preset test instruction and recording the judgment result.
In this embodiment, in a specific implementation, the system may further include, as shown in Fig. 2, an input interface 1, provided on the filter 2, for importing specified entries;
the filter 2 is used for filtering the test library corpus according to the imported specified entries, so as to obtain the corpus of the specified entries as the test instruction corpus;
and the memory 6 is used for automatically storing the test log file and the abnormal audio file which are output by the processor.
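The behaviour of the filter 2 can be illustrated with a few lines of code. The representation of the test library as a mapping from entry text to an audio file name, and the file names themselves, are hypothetical and serve only to show the selection step.

```python
def filter_corpus(test_library: dict[str, str], specified_entries: list[str]) -> dict[str, str]:
    """Keep only the library items whose entry text was imported."""
    wanted = set(specified_entries)
    return {entry: audio for entry, audio in test_library.items() if entry in wanted}


# Hypothetical test library and imported entries.
test_library = {
    "turn on the light": "light_on.wav",
    "turn off the light": "light_off.wav",
    "play music": "play_music.wav",
}
imported = ["play music", "turn on the light", "open the window"]

test_instruction_corpus = filter_corpus(test_library, imported)
```

Entries that were imported but are absent from the library, such as "open the window" here, simply do not appear in the result.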
By the above embodiment, the following effects can be achieved:
1. The real environment reverberation audio generation method, the voice performance test method and the system provided by the invention simulate real reverberation instead of testing in a real environment as in the traditional approach, so testing is no longer limited by the site and is more convenient and faster to carry out.
2. According to the real environment reverberation audio generation method, the voice performance test method and the system, multiple different scenes can be simulated within the same second closed boundary, which enables simulation tests under different reverberation conditions and broadens the range of application. After the test of one scene is completed, there is no need to move to another test environment to test another scene, which greatly improves overall test efficiency.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without affecting the spirit of the invention. The embodiments of the present application and features in the embodiments may be combined with each other arbitrarily without conflict.