WO2022199461A1 - Test method for a voice interaction system, audio recognition method, and related device - Google Patents

Test method for a voice interaction system, audio recognition method, and related device

Info

Publication number
WO2022199461A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
waveform data
standard
test
content
Prior art date
Application number
PCT/CN2022/081530
Other languages
English (en)
French (fr)
Inventor
苗锐
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2022199461A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/01 - Assessment or evaluation of speech recognition systems
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command

Definitions

  • the present application relates to a test method of a voice interaction system, an audio recognition method and related equipment.
  • Voice interaction technology is increasingly widely used; for example, more and more vehicles are equipped with a voice interaction function, so that the driver can invoke car navigation, adjust the driving mode, and control various actuators of the vehicle by voice, which greatly improves the convenience of the driver's operation.
  • the function and performance of the voice interactive system need to be strictly tested to ensure the actual application effect of the voice interactive system.
  • the embodiments of the present application provide a test method for testing a voice interactive system, a test device for implementing the test method, an audio recognition method and a device for implementing the recognition method, which can realize automated testing and identification.
  • a first aspect of the present application provides a test method for testing a voice interaction system, comprising: sending a voice command to the voice interaction system; acquiring first waveform data of audio output from a speaker of the voice interaction system; acquiring second waveform data of standard audio; dividing the first waveform data into a plurality of first waveform data blocks; dividing the second waveform data into a plurality of second waveform data blocks; calculating the correlation between the first waveform data blocks and the second waveform data blocks; and generating a first test result according to the correlation, the first test result indicating that the output audio matches or does not match the voice instruction.
  • the output audio content can be recognized at a lower development cost and higher recognition efficiency.
  • the application provides an automated testing method that enables rapid testing. It is worth noting that in the method of the present application, it is not necessary to directly identify the content of the output audio, and the identification of the content itself usually requires high development costs (for example, a large amount of speech training is required), and the recognition speed is low.
  • the above test method is to indirectly identify the content of the output audio based on the comparison result by comparing the output audio with the standard audio. Therefore, the above test method is especially suitable for scenarios where the output audio is relatively fixed, such as the interaction of command/control voice applied to the vehicle and the interaction of the automatic telephone reply system.
  • the standard audio can be retrieved by the test device from its local database according to the voice command, or it can be sent by the server to the test device together with the voice command.
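  • As an illustration of the block-division-and-correlation comparison described above, the following is a minimal sketch in Python, assuming the output audio and the standard audio are already available as mono NumPy arrays sampled at the same rate; the function names, block length, and normalized-correlation measure are illustrative assumptions, not taken from the application.

```python
import numpy as np

def split_into_blocks(waveform: np.ndarray, block_len: int) -> list:
    """Divide waveform data (audio intensity over time) into fixed-length blocks."""
    n_blocks = len(waveform) // block_len
    return [waveform[i * block_len:(i + 1) * block_len] for i in range(n_blocks)]

def block_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized correlation between two equally sized waveform data blocks."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def blockwise_correlations(output_wave: np.ndarray, standard_wave: np.ndarray,
                           block_len: int = 4096) -> list:
    """Correlation per block pair; surplus blocks on the longer side are not processed."""
    out_blocks = split_into_blocks(output_wave, block_len)
    std_blocks = split_into_blocks(standard_wave, block_len)
    return [block_correlation(o, s) for o, s in zip(out_blocks, std_blocks)]
```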
  • calculating the correlation between the first waveform data blocks and the second waveform data blocks includes: separately calculating a plurality of correlations between the plurality of first waveform data blocks and the plurality of second waveform data blocks; generating the first test result according to the correlations includes: comparing the plurality of correlations with a plurality of preset thresholds respectively, and generating the first test result according to the result of the comparison.
  • the sizes of the multiple preset thresholds are set to be different.
  • the output audio may contain various noise and interference due to different collection or recording conditions, and the quality of the output audio will affect the audio recognition result.
  • by adjusting the preset threshold corresponding to each block pair, the adaptability of the audio recognition method of the present application to various output audio qualities can be improved, thereby improving the reliability of the recognition result.
  • in this way, the applicable range of the above-mentioned test method can be broadened.
  • the method further includes: acquiring the content of the standard audio; acquiring the content of the plurality of second waveform data blocks according to the content of the standard audio; and setting the preset thresholds according to the content of the second waveform data blocks.
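  • A short sketch of content-dependent, per-block thresholds follows; the content labels and threshold values are assumptions used only to illustrate the idea that blocks with variable content can be judged more leniently than blocks with fixed content.

```python
def thresholds_from_content(block_contents: list,
                            strict: float = 0.7, loose: float = 0.4) -> list:
    """Assign a preset threshold to each second waveform data block based on its content."""
    # Blocks whose spoken content may legitimately vary get the looser threshold.
    return [loose if content == "variable" else strict for content in block_contents]

def compare_with_thresholds(correlations: list, thresholds: list) -> list:
    """Compare each block correlation with its own preset threshold."""
    return [c >= t for c, t in zip(correlations, thresholds)]
```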
  • the method further includes: acquiring the content of the standard audio; and, when the first test result indicates that the output audio matches the voice instruction, generating the content of the output audio according to the content of the standard audio.
  • The test method of the voice interaction system can thus recognize the content of the output audio generated in response to the voice command, thereby improving the test depth.
  • the testing method further includes: acquiring a first output image of the display of the voice interaction system; acquiring a first standard image; and generating a second test result based on the first output image and the first standard image, the second test result indicating that the first output image and the voice command match or do not match.
  • the above test method can further improve the test depth and improve the test reliability.
  • the first output image is acquired through an Android debugging bridge interface of the voice interaction system.
  • the output image is collected through the Android debugging bridge interface of the voice interactive system, and the above test method can quickly and accurately obtain the output image. Compared with the traditional way of collecting the output image through the camera, this method can reduce or avoid the introduction of negative factors such as image deformation during the acquisition process, thereby improving the reliability of subsequent identification.
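  • As a concrete illustration of grabbing the display output over ADB, the standard "adb exec-out screencap -p" command can be driven from the test device as in the following sketch; the device serial and output path are assumed placeholders.

```python
import subprocess

def capture_output_image(serial: str = "emulator-5554", out_path: str = "output.png") -> str:
    """Grab the current display frame from the head unit over ADB and save it as PNG."""
    png_bytes = subprocess.run(
        ["adb", "-s", serial, "exec-out", "screencap", "-p"],
        check=True, capture_output=True).stdout
    with open(out_path, "wb") as f:
        f.write(png_bytes)
    return out_path
```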
  • the testing method further includes: after sending the voice command, sending a man-machine interface operation command to the voice interaction system; acquiring a second output image of the display of the voice interaction system; acquiring a second standard image; and generating a third test result based on the second output image and the second standard image, the third test result indicating that the second output image and the man-machine interface operation command match or do not match.
  • the second output image is acquired through the Android debug bridge interface.
  • the test method can provide man-machine interface operation instructions related to the voice commands, and can thus provide a test environment closer to the actual use scenario of the vehicle head unit. Thus, the above-mentioned test method increases the test depth, thereby improving the test reliability.
  • the method further includes: acquiring a first message sent and received by the voice interaction system; acquiring a first standard message; and generating a fourth test result according to the first message and the first standard message, the fourth test result indicating that the first message and the voice command match or do not match.
  • the above test method can further improve the test depth and improve the test reliability.
  • a second aspect of the present application provides an audio recognition method, comprising: acquiring first waveform data of audio to be recognized; acquiring second waveform data of standard audio; dividing the first waveform data into a plurality of first waveform data blocks; dividing the second waveform data into a plurality of second waveform data blocks; calculating the correlation between the first waveform data blocks and the second waveform data blocks; and generating a first recognition result according to the correlation, the first recognition result indicating that the to-be-recognized audio is the same as or different from the standard audio.
  • calculating the correlation between the first waveform data blocks and the second waveform data blocks includes: separately calculating a plurality of correlations between the plurality of first waveform data blocks and the plurality of second waveform data blocks; generating the first recognition result according to the correlations includes: comparing the plurality of correlations with a plurality of preset thresholds respectively, and generating the first recognition result according to the result of the comparison.
  • the sizes of the multiple preset thresholds are set to be different.
  • the method further includes: acquiring the content of the standard audio; acquiring the content of the plurality of second waveform data blocks according to the content of the standard audio; and setting the preset thresholds according to the content of the second waveform data blocks.
  • the content of the audio to be recognized is generated according to the content of the standard audio.
  • a third aspect of the present application provides a test device for testing a voice interaction system, including: a voice command generation device for sending voice commands to the voice interaction system; an audio collection device for acquiring first waveform data of the output audio of the speaker of the voice interaction system; a first obtaining device for obtaining second waveform data of standard audio; a first dividing module for dividing the first waveform data into a plurality of first waveform data blocks; a second dividing module for dividing the second waveform data into a plurality of second waveform data blocks; a calculation module for calculating the correlation between the first waveform data blocks and the second waveform data blocks; and an audio determining device, configured to generate a first test result according to the correlation, the first test result indicating that the output audio matches or does not match the voice instruction.
  • the calculating of the correlation between the first waveform data blocks and the second waveform data blocks performed by the calculation module includes: calculating a plurality of correlations between the plurality of first waveform data blocks and the plurality of second waveform data blocks; the generating of the first test result according to the correlation performed by the audio determining device includes: comparing the plurality of correlations with a plurality of preset thresholds respectively, and generating the first test result according to the result of the comparison.
  • a threshold adjustment module is further included, configured to set the sizes of the plurality of preset thresholds to be different.
  • the first obtaining device is further configured to: obtain the content of the standard audio; and obtain the content of the plurality of second waveform data blocks according to the content of the standard audio;
  • the threshold adjustment module is further configured to set the preset thresholds according to the content of the second waveform data blocks.
  • the first obtaining device is further configured to obtain the content of the standard audio;
  • the testing device further includes an audio recognition module, and the audio recognition module is configured to generate the content of the output audio according to the content of the standard audio when the first test result indicates that the output audio matches the voice instruction.
  • the testing device further includes: an image acquisition device for acquiring the first output image of the display of the voice interaction system; a second acquisition device for acquiring a first standard image; and an image determination device for generating a second test result according to the first output image and the first standard image, the second test result indicating that the first output image matches or does not match the voice instruction.
  • the acquiring of the first output image of the display of the voice interaction system performed by the image acquisition device includes: acquiring the first output image through an Android debugging bridge interface of the voice interaction system.
  • the testing device further includes: a man-machine interface operation instruction generating device, configured to send a man-machine interface operation instruction to the voice interaction system after sending the voice instruction; an image acquisition device for acquiring a second output image of the display of the voice interaction system; a second acquiring device for acquiring a second standard image; and an image determining device for generating a third test result according to the second output image and the second standard image, the third test result indicating that the second output image matches or does not match the man-machine interface operation instruction.
  • the acquiring of the second output image of the display of the voice interaction system performed by the image acquisition device includes: acquiring the second output image through the Android debugging bridge interface of the voice interaction system.
  • the testing device further includes: a message collection device, configured to obtain a first message sent and received by the voice interaction system; a third obtaining device, used to obtain a first standard message; and a message determination device, configured to generate a fourth test result according to the first message and the first standard message, where the fourth test result indicates that the first message and the voice command match or do not match.
  • a fourth aspect of the present application provides an audio recognition device, comprising: an audio acquisition module for acquiring first waveform data of audio to be recognized; a first acquiring module for acquiring second waveform data of standard audio; a first dividing module for dividing the first waveform data into a plurality of first waveform data blocks; a second dividing module for dividing the second waveform data into a plurality of second waveform data blocks; a calculation module for calculating the correlation between the first waveform data blocks and the second waveform data blocks; and an identification module for generating a first recognition result according to the correlation, the first recognition result indicating that the to-be-recognized audio is the same as or different from the standard audio.
  • the calculating of the correlation between the first waveform data blocks and the second waveform data blocks performed by the calculation module includes: calculating a plurality of correlations between the plurality of first waveform data blocks and the plurality of second waveform data blocks; the generating of the first recognition result according to the correlation performed by the identification module includes: comparing the plurality of correlations with a plurality of preset thresholds respectively, and generating the first recognition result according to the result of the comparison.
  • a threshold adjustment module is further included, configured to set the sizes of the plurality of preset thresholds to be different.
  • the first obtaining module is further configured to: obtain the content of the standard audio; and obtain the content of the plurality of second waveform data blocks according to the content of the standard audio; the threshold adjustment module is further configured to set the preset thresholds according to the content of the second waveform data blocks.
  • the first obtaining module is further configured to obtain the content of the standard audio; the identification module is further configured to, when the first recognition result indicates that the audio to be recognized is the same as the standard audio, generate the content of the to-be-recognized audio according to the content of the standard audio.
  • a fifth aspect of the present application provides a vehicle voice interaction test system, which includes: a test management device for sending test cases to manage the vehicle voice interaction test; and the test device according to any one of the above-mentioned fourth aspect, connected with the test management device and used for performing a voice interaction test on the vehicle head unit; wherein the test device provides the test instructions according to the test case.
  • since the fifth aspect includes the test device of the fourth aspect, it will similarly have the advantages or benefits of the fourth aspect, which will not be repeated here.
  • a sixth aspect of the present application provides a computing device comprising: a bus; a communication interface connected to the bus; at least one processor connected to the bus; and at least one memory connected to the bus and storing program instructions which, when executed by the at least one processor, cause the at least one processor to perform the method of any one of the above-mentioned first aspect and the above-mentioned third aspect.
  • since the computing device of the sixth aspect may perform the method described in any one of the first aspect and the third aspect above, it will similarly have the advantages or benefits of the first aspect or the third aspect, which will not be repeated here.
  • a seventh aspect of the present application provides a computer-readable storage medium storing program instructions, wherein the program instructions, when executed by a computer, cause the computer to execute the method described in any one of the above-mentioned first aspect and the above-mentioned third aspect.
  • since the seventh aspect can perform the method described in any one of the first aspect and the third aspect above, it will similarly have the advantages or benefits of the first aspect or the third aspect, which will not be repeated here.
  • FIG. 1 is a schematic structural block diagram of a vehicle machine involved in an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of a vehicle-machine voice interaction test system involved in an embodiment of the present application
  • Fig. 3 is the structural representation of the test equipment of Fig. 2;
  • FIG. 4 is a schematic block diagram of the structure of an electronic control unit involved in an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a testing method of a voice interaction system according to an embodiment of the present application
  • FIG. 6 is a schematic structural diagram of an audio recognition device according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an identification module according to an embodiment of the present application.
  • Fig. 8 is the audio recognition method of one embodiment of the present application.
  • Fig. 9 is the pre-processing performed by the signal processing module of an embodiment of the present application on the waveform of the output audio;
  • Fig. 10 is the division of the waveform of the output audio of Fig. 9 and the waveform of standard audio;
  • FIG. 11 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of various aspects of an image recognition method related to one embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • test system plays the corpus through a playback device such as an artificial mouth
  • the car microphone collects the playing corpus
  • the car recognizes the playing corpus and plays the feedback voice through the speaker.
  • the test system then collects the audio or original waveform signal from the vehicle head unit's speakers to determine whether the head unit recognizes the voice command and gives voice feedback.
  • this solution only judges whether the vehicle head unit has given voice feedback; it does not determine the specific content of that voice feedback, nor does it verify other aspects of the interaction, so the test depth is insufficient.
  • in another test method, the test system manages the corpus database and the noise database, plays the corpus that needs to be played in a specific scenario through the voice playback system, and superimposes noise of a certain decibel level on the played audio through the noise simulation system, so as to verify the recognition performance of the vehicle voice interaction system in various noise scenarios.
  • This method can verify the recognition performance of the vehicle-machine voice interaction system in various noise scenarios simulating the actual driving process.
  • however, this method only verifies the voice recognition performance of the vehicle head unit by collecting its voice recognition log, and lacks a process of verifying the content of the feedback voice against the actual feedback voice of the head unit.
  • that is, this method only tests the speech recognition performance of the head unit and does not test its feedback voice performance.
  • the voice interaction test is incomplete and the test depth is insufficient.
  • this method cannot verify whether the vehicle head unit performs other related operations besides the feedback voice for the voice command, and cannot completely verify the correctness of the vehicle's interaction logic.
  • an embodiment of the present application provides a test method for testing a voice interaction system.
  • the testing method of the voice interaction system includes: sending a voice command to the microphone of the voice interaction system; obtaining first waveform data of the output audio of the speaker of the voice interaction system; obtaining second waveform data of standard audio; dividing the first waveform data into a plurality of first waveform data blocks; dividing the second waveform data into a plurality of second waveform data blocks; calculating the correlation between the first waveform data blocks and the second waveform data blocks; and generating a first test result based on the correlation, the first test result indicating that the output audio matches or does not match the voice instruction.
  • the waveform data here is data representing the change in the intensity of the audio over time.
  • the present application provides an automated testing method that can quickly perform testing.
  • in the above method, it is not necessary to directly recognize the content of the output audio itself; recognition of the content itself usually requires high development costs (e.g., a large amount of speech training is required) and is slow.
  • the above method compares the output audio with the standard audio, and indirectly identifies the content of the output audio based on the comparison result, so that the test result can be obtained quickly.
  • the above method is especially suitable for scenarios where the output audio is relatively fixed, such as the interaction of command/control voice applied to the vehicle and the interaction of an automatic telephone reply system.
  • the block processing method can be used to set different discrimination conditions in different audio output environments, thereby increasing the reliability of the automated test results.
  • the testing method of the voice interaction system of the present embodiment can realize rapid and reliable automatic testing.
  • the frequency-domain correlation calculation may be performed for each waveform data block separately to obtain a plurality of frequency-domain correlation degrees.
  • the frequency domain correlation here is an example of the correlation in the present application.
  • the so-called correlation here refers to the degree of similarity, and it goes without saying that in addition to the correlation in the frequency domain, other correlations may also be used.
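  • For reference, one possible frequency-domain correlation is sketched below: the magnitude spectra of a pair of equally sized blocks are compared with a normalized correlation. This is only an example of "correlation"; other similarity measures can be substituted, as noted above.

```python
import numpy as np

def frequency_domain_correlation(block_a: np.ndarray, block_b: np.ndarray) -> float:
    """Normalized correlation of the magnitude spectra of two waveform data blocks."""
    spec_a = np.abs(np.fft.rfft(block_a))
    spec_b = np.abs(np.fft.rfft(block_b))
    spec_a = spec_a - spec_a.mean()
    spec_b = spec_b - spec_b.mean()
    denom = np.linalg.norm(spec_a) * np.linalg.norm(spec_b)
    return float(np.dot(spec_a, spec_b) / denom) if denom > 0 else 0.0
```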
  • when dividing, the same number of first waveform data blocks and second waveform data blocks can be obtained, or different numbers of first waveform data blocks and second waveform data blocks can be obtained; in the latter case, a part of the waveform data on the side with the larger number may be left unprocessed.
  • the waveform data of the output audio and the waveform data of the standard audio can be divided by the same time step, or the time steps used for dividing the waveform data of the output audio and the waveform data of the standard audio can be adjusted separately.
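  • The following is a brief sketch of such time-step-based division; the sample rate and step length are illustrative assumptions, and surplus blocks on the longer side are simply skipped, as described above.

```python
import numpy as np

def split_by_time_step(waveform: np.ndarray, sample_rate: int, step_s: float) -> list:
    """Divide waveform data into blocks of step_s seconds each."""
    step = int(sample_rate * step_s)
    return [waveform[i:i + step] for i in range(0, len(waveform) - step + 1, step)]

# Example usage (values assumed): both waveforms divided with a 0.5 s step at 16 kHz;
# if the block counts differ, the surplus blocks on the longer side are left unprocessed.
# out_blocks = split_by_time_step(output_wave, 16000, 0.5)
# std_blocks = split_by_time_step(standard_wave, 16000, 0.5)
# block_pairs = list(zip(out_blocks, std_blocks))
```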
  • when calculating the correlation between the first waveform data blocks and the second waveform data blocks, the correlation may be calculated for each pair of data blocks, or only for a part of the data block pairs.
  • as a case where the correlation is calculated only for a part of the data block pairs, for example, the content of the standard audio can be obtained, then the content of each second waveform data block of the standard audio can be obtained, and whether to calculate the correlation is determined according to that content. Specifically, assuming that the content of the standard audio is "
  • the calculating of the correlation between the first waveform data blocks and the second waveform data blocks includes: calculating a plurality of correlations between the plurality of first waveform data blocks and the plurality of second waveform data blocks respectively; the generating of the first test result according to the correlations includes: comparing the plurality of correlations with a plurality of preset thresholds respectively, and generating the first test result according to the result of the comparison.
  • the first test result indicates that the output audio matches or does not match the voice command
  • for example, when the number of correlations greater than the preset thresholds accounts for a certain proportion or more of the total number of correlations, a first test result indicating that the output audio matches the voice command is generated.
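  • A minimal sketch of this decision rule follows; the 80% proportion is an assumed example value, not one specified by the application.

```python
def first_test_result(correlations: list, thresholds: list,
                      required_ratio: float = 0.8) -> str:
    """Judge 'match' when enough block correlations exceed their preset thresholds."""
    passed = sum(c >= t for c, t in zip(correlations, thresholds))
    return "match" if passed >= required_ratio * len(correlations) else "mismatch"
```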
  • the sizes of the plurality of preset thresholds are set to be different.
  • the output audio may contain various noise and interference due to different collection or recording conditions, and the quality of the output audio will affect the audio recognition result.
  • by adjusting the preset threshold corresponding to each block pair, the adaptability of the testing method of this embodiment to various output audio qualities can be improved, thereby improving the reliability of the recognition result.
  • the applicable scope of the test method of this embodiment can be improved, so that the above-mentioned test method can be adapted to various scenarios, and the reliability of the test result can be ensured.
  • optionally, it further includes: acquiring the content of the standard audio; acquiring the content of the plurality of second waveform data blocks according to the content of the standard audio; and setting the preset thresholds according to the content of the second waveform data blocks.
  • the multiple preset thresholds that are set are usually different, but may also be the same.
  • the preset threshold corresponding to the data block is set to a lower value, and the preset threshold corresponding to the waveform data block of “
  • optionally, it further includes: acquiring the content of the standard audio; when the first test result indicates that the output audio matches the voice instruction, generating the content of the output audio according to the content of the standard audio.
  • the content of the standard audio can be directly used as the content of the output audio.
  • the content of the output audio may be slightly different from that of the standard audio.
  • the test method of the voice interaction system can thus recognize the content of the output audio generated in response to the voice command, and can then judge whether the output audio content matches the voice command, thereby improving the test depth.
  • the voice interaction system includes a display; the test method further includes: acquiring a first output image of the display; acquiring a first standard image; and generating a second test result based on the first output image and the first standard image, the second test result indicating that the first output image and the voice command match or do not match.
  • the above test method can further improve the test depth and improve the reliability of the test result.
  • the voice interaction system includes an Android debugging bridge interface; and the first output image is obtained through the Android debugging bridge interface.
  • the output image is collected through the Android debugging bridge interface of the voice interactive system, and the above test method can quickly and accurately obtain the output image.
  • this method can reduce or avoid negative factors such as image deformation during the acquisition process, thereby improving the reliability of subsequent image recognition, thereby ensuring the reliability of test results.
  • the test method further includes: after sending the voice command, sending a man-machine interface operation command to the voice interaction system; acquiring a second output image of the display of the voice interaction system; acquiring a second standard image; and generating a third test result based on the second output image and the second standard image, the third test result indicating that the second output image and the man-machine interface operation command match or do not match.
  • the human-computer interaction system includes an Android debugging bridge interface; the second output image is obtained through the Android debugging bridge interface.
  • the man-machine interface operation instructions are provided through the Android debugging bridge interface, so the above test method can provide operation instructions related to the voice instructions that act directly on the vehicle head unit, in particular on its man-machine interaction interface.
  • in this way, a test environment that is closer to the actual use scenario of the vehicle can be provided.
  • the above-mentioned test method increases the test depth, thereby improving the test reliability.
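  • As an illustration only, a man-machine interface operation instruction can be driven through the standard ADB input commands ("adb shell input tap"); the coordinates, device serial, and delay below are assumptions.

```python
import subprocess
import time

def send_hmi_operation(serial: str, x: int, y: int) -> None:
    """Send a tap to the head-unit display through the ADB 'input tap' command."""
    subprocess.run(["adb", "-s", serial, "shell", "input", "tap", str(x), str(y)],
                   check=True)

# Example flow (coordinates and delay are assumed):
# play_voice_command("turn on the air conditioner")   # step 1: voice instruction
# time.sleep(2.0)                                      # wait for the head unit to respond
# send_hmi_operation("emulator-5554", 540, 960)        # step 2: related HMI operation
```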
  • optionally, the test method of this embodiment further includes: acquiring a first message sent and received by the voice interaction system; acquiring a first standard message; and generating a fourth test result according to the first message and the first standard message, the fourth test result indicating that the first message and the voice command match or do not match.
  • the above test method can further improve the test depth and improve the test reliability.
  • correspondingly, an embodiment of the present application provides a test device for a voice interaction system, including: a voice command generation device for sending voice commands to the voice interaction system; an audio collection device for obtaining first waveform data of the output audio of the speaker of the voice interaction system; a first obtaining device for obtaining second waveform data of standard audio; a first division module for dividing the first waveform data into a plurality of first waveform data blocks; a second division module for dividing the second waveform data into a plurality of second waveform data blocks; a calculation module for calculating the correlation between the first waveform data blocks and the second waveform data blocks; and an audio judgment device, configured to generate a first test result according to the correlation, the first test result indicating that the output audio matches or does not match the voice instruction.
  • the present application provides an automated testing method that can quickly perform testing.
  • in the above method, it is not necessary to directly recognize the content of the output audio itself; recognition of the content itself usually requires high development costs (e.g., a large amount of speech training is required) and is slow.
  • the above method compares the output audio with the standard audio, and indirectly identifies the content of the output audio based on the comparison result, so that the test result can be obtained quickly.
  • the above method is especially suitable for scenarios where the output audio is relatively fixed, such as the interaction of command/control voice applied to the vehicle and the interaction of an automatic telephone reply system.
  • the block processing method can be used to set different discrimination conditions in different audio output environments, thereby increasing the reliability of the automated test results.
  • the testing method of the voice interaction system of the present embodiment can realize rapid and reliable automatic testing.
  • the frequency-domain correlation calculation may be performed for each waveform data block separately to obtain a plurality of frequency-domain correlation degrees.
  • the frequency domain correlation here is an example of the correlation in the present application.
  • the so-called correlation here refers to the degree of similarity, and it goes without saying that in addition to the correlation in the frequency domain, other correlations may also be used.
  • the calculating of the correlation between the first waveform data blocks and the second waveform data blocks performed by the calculation module includes: calculating a plurality of correlations between the plurality of first waveform data blocks and the plurality of second waveform data blocks; the generating of the first test result according to the correlation performed by the audio determining device includes: comparing the plurality of correlations with a plurality of preset thresholds respectively, and generating the first test result according to the result of the comparison.
  • optionally, the testing device of this embodiment further includes a threshold adjustment module, configured to set the sizes of the plurality of preset thresholds to be different.
  • the output audio may contain various noise and interference due to different collection or recording conditions, and the quality of the output audio will affect the audio recognition result.
  • by adjusting the preset threshold corresponding to each block pair, the adaptability of the testing device of this embodiment to various output audio qualities can be improved, thereby improving the reliability of the recognition result.
  • the applicable scope of the test equipment of this embodiment can be improved, so that the above-mentioned test method can be adapted to various scenarios, and the reliability of the test result can be ensured.
  • the first obtaining device is further configured to: obtain the content of the standard audio; and obtain the content of the plurality of second waveform data blocks according to the content of the standard audio; the threshold adjustment module is further configured to set the preset thresholds according to the content of the second waveform data blocks.
  • the preset threshold corresponding to the data block is set to a lower value, and the preset threshold corresponding to the waveform data block of “
  • the first obtaining device is further configured to obtain the content of the standard audio; the test device further includes an audio recognition module, and the audio recognition module is used for generating the content of the output audio according to the content of the standard audio when the first test result indicates that the output audio matches the voice instruction.
  • the test device of this embodiment can thus recognize the content of the output audio generated in response to the voice command, and can then judge whether the output audio content matches the voice command, thereby improving the test depth.
  • the voice interaction system includes a display; the test device further includes: an image acquisition device for acquiring a first output image of the display; a second acquisition device for acquiring a first standard image; and an image determination device for generating a second test result according to the first output image and the first standard image, the second test result indicating that the first output image and the voice instruction match or do not match.
  • the test device of this embodiment can further improve the test depth and improve the reliability of the test result.
  • the voice interaction system includes an Android debugging bridge interface; the image acquisition device is configured to acquire the first output image through the Android debugging bridge interface.
  • the output image is collected through the Android debugging bridge interface of the voice interaction system, and the test device of this embodiment can quickly and accurately obtain the output image.
  • this method can reduce or avoid negative factors such as image deformation during the acquisition process, thereby improving the reliability of subsequent image recognition, thereby ensuring the reliability of test results.
  • the voice interaction system includes a display; the test device further includes: a man-machine interface operation instruction generating device, configured to send a man-machine interface operation instruction to the voice interaction system after sending the voice instruction;
  • an image acquisition device, used to acquire a second output image of the display;
  • a second acquisition device, used to acquire a second standard image;
  • an image determination device, used to generate a third test result according to the second output image and the second standard image, the third test result indicating that the second output image matches or does not match the man-machine interface operation instructions.
  • the human-computer interaction system includes an Android debugging bridge interface; the image acquisition device is configured to acquire the second output image through the Android debugging bridge interface.
  • the man-machine interface operation instructions are provided through the Android debugging bridge interface, so the test device of this embodiment can provide operation instructions related to the voice instructions that act directly on the vehicle head unit, in particular on its man-machine interaction interface; this can provide a test environment that is closer to the actual use scenario of the vehicle.
  • the test method of the present embodiment increases the test depth, thereby improving the test reliability.
  • optionally, the test device of this embodiment further includes: a message collection device, used for obtaining a first message sent and received by the voice interaction system; a third obtaining device, used for obtaining a first standard message; and a message determination device, configured to generate a fourth test result according to the first message and the first standard message, where the fourth test result indicates that the first message and the voice command match or do not match.
  • an embodiment of the present application provides an audio recognition method.
  • the method includes: acquiring first waveform data of the audio to be recognized; acquiring second waveform data of standard audio; dividing the first waveform data into a plurality of first waveform data blocks; dividing the second waveform data into a plurality of second waveform data blocks; calculating the correlation between the first waveform data blocks and the second waveform data blocks; and generating a first recognition result according to the correlation, the first recognition result indicating that the to-be-recognized audio is the same as or different from the standard audio.
  • the waveform data of the audio to be recognized and of the standard audio are divided into blocks, and then the correlation between the waveform data blocks of the audio to be recognized and the data blocks of the standard audio is calculated; when the correlation satisfies the set conditions, the audio to be recognized can be considered the same as the standard audio. Since the waveform data of the audio to be recognized that is actually collected is easily disturbed in the actual environment, the block processing method makes it possible to set different judgment conditions in different environments, thereby increasing the reliability of the automatic judgment result.
  • the block correlation-based audio recognition method and apparatus can be applied to various scenarios.
  • the above-mentioned audio recognition method can be applied not only to the vehicle machine to recognize the human voice command, but also to the vehicle machine test equipment to recognize the output audio generated by the vehicle machine in response to the test instruction.
  • the above-mentioned audio recognition method can not only be applied to the car machine and the voice interaction test equipment for testing the car machine, but also can be applied to devices and systems with audio recognition functions and the voice interaction test equipment that can test these devices and systems.
  • the device and system with audio recognition function are, for example, a robot system, an automatic telephone answering system, an automatic customer service system, and the like.
  • the test method of the voice interaction system provided by the embodiments of the present application can not only be applied to tests whose object is the vehicle head unit, but can also be similarly applied to tests whose object is a robot system, an automatic telephone reply system, an automatic customer service system, and the like.
  • the frequency-domain correlation calculation may be performed for each waveform data block separately to obtain a plurality of frequency-domain correlation degrees.
  • the frequency domain correlation here is an example of the correlation in the present application.
  • the so-called correlation here refers to the degree of similarity, and it goes without saying that in addition to the correlation in the frequency domain, other correlations may also be used.
  • when dividing, the same number of first waveform data blocks and second waveform data blocks can be obtained, or different numbers of first waveform data blocks and second waveform data blocks can be obtained; in the latter case, a part of the waveform data on the side with the larger number may be left unprocessed.
  • when calculating the correlation between the first waveform data blocks and the second waveform data blocks, the correlation may be calculated for each pair of data blocks, or only for a part of the data block pairs.
  • as a case where the correlation is calculated only for a part of the data block pairs, for example, the content of the standard audio can be obtained, then the content of each second waveform data block of the standard audio can be obtained, and whether to calculate the correlation is determined according to that content. Specifically, assuming that the content of the standard audio is "
  • the calculating of the correlation between the first waveform data blocks and the second waveform data blocks includes: respectively calculating a plurality of correlations between the plurality of first waveform data blocks and the plurality of second waveform data blocks; the generating of a first recognition result according to the correlations includes: respectively comparing the plurality of correlations with a plurality of preset thresholds, and generating the first recognition result according to the result of the comparison.
  • the sizes of the plurality of preset thresholds are set to be different.
  • the collected audio to be recognized may contain various noise and interference due to different collection or recording conditions, and the quality of the audio to be recognized will affect the audio recognition result.
  • by adjusting the preset threshold corresponding to each block pair, the adaptability of the audio recognition method of this embodiment to various qualities of the audio to be recognized can be improved, thereby improving the reliability of the recognition result.
  • the applicable scope of the method of this embodiment can be increased, so that the method can adapt to various scenarios, and the reliability of the test result can be ensured.
  • optionally, it further includes: acquiring the content of the standard audio; acquiring the content of the plurality of second waveform data blocks according to the content of the standard audio; and setting the preset thresholds according to the content of the second waveform data blocks.
  • the reliability of the recognition result, or the accuracy of the recognition result can be improved.
  • the content of the standard audio is "
  • the preset threshold corresponding to the data block is set lower, and the preset threshold corresponding to the waveform data block of "
  • optionally, the audio recognition method of this embodiment further includes: acquiring the content of the standard audio; when the first recognition result indicates that the to-be-recognized audio is the same as the standard audio, generating the content of the to-be-recognized audio according to the content of the standard audio.
  • correspondingly, an embodiment of the present application provides an audio recognition device, comprising: an audio acquisition module for acquiring first waveform data of the audio to be recognized; a first acquisition module for acquiring second waveform data of the standard audio; a first dividing module for dividing the first waveform data into a plurality of first waveform data blocks; a second dividing module for dividing the second waveform data into a plurality of second waveform data blocks; a calculation module for calculating the correlation between the first waveform data blocks and the second waveform data blocks; and an identification module for generating a first recognition result according to the correlation, the first recognition result indicating that the to-be-recognized audio is the same as or different from the standard audio.
  • the waveform data of the audio to be recognized and of the standard audio are divided into blocks, and then the correlation between the waveform data blocks of the audio to be recognized and the data blocks of the standard audio is calculated; when the correlation satisfies the set conditions, the audio to be recognized can be considered the same as the standard audio. Since the waveform data of the audio to be recognized that is actually collected is easily disturbed in the actual environment, the block processing method makes it possible to set different judgment conditions in different environments, thereby increasing the reliability of the automatic judgment result.
  • the calculating of the correlation between the first waveform data blocks and the second waveform data blocks performed by the calculation module includes: calculating a plurality of correlations between the plurality of first waveform data blocks and the plurality of second waveform data blocks; the generating of a first recognition result according to the correlations performed by the identification module includes: comparing the plurality of correlations with a plurality of preset thresholds respectively, and generating the first recognition result according to the result of the comparison.
  • a threshold adjustment module is further included, configured to set the sizes of the plurality of preset thresholds to be different.
  • the first obtaining module is further configured to: obtain the content of the standard audio; and obtain the content of the plurality of second waveform data blocks according to the content of the standard audio; the threshold adjustment module is further configured to set the preset thresholds according to the content of the second waveform data blocks.
  • the first acquisition module is further configured to acquire the content of the standard audio; the identification module is further configured to generate the content of the audio to be recognized according to the content of the standard audio when the first recognition result indicates that the audio to be recognized is the same as the standard audio.
  • FIG. 1 is a structural block diagram of a vehicle machine according to an embodiment of the present application.
  • the vehicle 2000 has a voice interaction function, as shown in FIG. 1 , it has a control unit 2001 , a microphone 2002 , a speaker 2003 , a display 2004 and an Android Debug Bridge (ADB) interface 2005 and the like.
  • the control unit 2001 can be an electronic control unit (ECU), which refers to a control device composed of integrated circuits for realizing a series of functions such as data analysis, processing, and transmission; the arithmetic processing involved in this embodiment is performed by the control unit 2001.
  • Microphone 2002 is used to receive voice commands.
  • the speaker 2003 is used to issue a prompt sound to the occupant.
  • the speaker 2003 sends out a prompt sound of "the air conditioner has been turned on for you".
  • the speaker 2003 can also be used to play music and the like.
  • the display 2004, for example, has a touch screen for displaying a human-computer interaction interface, and the display 2004, for example, can also display a navigation screen and the like.
  • the Android debug bridge interface 2005 is used for the test equipment described later to provide input operations to the human-computer interaction interface displayed on the display 2004 , and is also used for the test equipment to obtain the output image displayed on the display 2004 from the vehicle machine 2000 .
  • FIG. 2 schematically shows a schematic structural diagram of a vehicle-machine voice interaction test system 1000 according to an embodiment of the present application.
  • the vehicle-machine voice interaction test system 1000 in this embodiment includes a test management device 1100 and a test device 1200 that are connected to each other.
  • the test management device 1100 is used to manage the tests performed by the test device 1200 by sending test cases to the test device 1200 .
  • the test management device 1100 may be a device or server with data storage and management functions, such as a cloud server, a network server, an application server, and a management server.
  • the test management device 1100 may send test cases to the test device 1200 through the interactive interface, and may receive feedback information such as test results from the test device 1200.
  • the functions of the test management device 1100 and the functions of the test device 1200 may also be integrated into one device.
  • Test cases may include voice command use cases, Human Machine Interaction (HMI) operation command use cases, standard audio, standard images, and standard messages. All of these contents may be sent by the test management device 1100 to the test device 1200 , or the test management device 1100 may only send a part of the content to the test device 1200 .
  • the local database of the test device 1200 includes a standard audio database, a standard image database, and a standard message database.
  • the test management device 1100 sends the voice command use case and the man-machine interface operation command use case to the test device 1200, and the test device 1200 retrieves standard audio, standard image and standard message from the local database.
  • a search index associated with each test instruction may also be attached to the test case.
  • the retrieval index is used by the test equipment 1200 to retrieve the standard audio, standard image and standard message associated with the command use case (voice command use case and human-machine interface operation command use case) from all standard audios, standard images and standard messages.
  • for example, if the instruction use case is related to the air conditioner, the test device retrieves the standard audio, standard image, and standard message related to the air conditioner function, and compares them with the audio, image, and message collected from the vehicle head unit to determine whether the test passes. In this way, compared with testing against all standard audio, standard images, and standard messages, the amount of computation can be reduced and the test speed improved.
  • the number of retrieved standard audios associated with the instruction use case may be multiple or one.
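  • A small sketch of such index-based retrieval is shown below; the dictionary layout, index names, and file names are assumptions made only to illustrate how a retrieval index can narrow the comparison set.

```python
# Standard data keyed by retrieval index (layout and file names are illustrative).
standard_db = {
    "air_conditioner": {
        "audio": ["ac_on_reply.wav", "ac_temp_reply.wav"],
        "image": ["ac_screen.png"],
        "message": ["ac_frame.bin"],
    },
    # ... entries for other vehicle functions
}

def retrieve_by_index(retrieval_index: str) -> dict:
    """Return only the standard audio/images/messages associated with the instruction use case."""
    return standard_db.get(retrieval_index, {"audio": [], "image": [], "message": []})
```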
  • the test management device 1100 can send the test case to the test device 1200, for example, through the Transmission Control Protocol/Internet Protocol (TCP/IP).
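  • As an illustration, a test case could be pushed over TCP/IP as in the following sketch; the port number and the JSON layout of the test case are assumptions, not defined by the application.

```python
import json
import socket

def send_test_case(host: str, port: int, test_case: dict) -> None:
    """Push one test case from the test management device to the test device over TCP."""
    payload = json.dumps(test_case).encode("utf-8")
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)

# Example (address, port, and field names are assumed):
# send_test_case("192.168.1.20", 5000, {
#     "voice_command": "turn on the air conditioner",
#     "hmi_operation": {"type": "tap", "x": 540, "y": 960},
#     "retrieval_index": "air_conditioner",
# })
```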
  • the test equipment 1200 is used to test the vehicle machine 2000 based on the instruction case (voice instruction and man-machine interface operation instruction) included in the test case.
  • the in-vehicle machine 2000 here is an example of the voice interaction system in this application, which can also be referred to as an in-vehicle voice interaction system.
  • FIG. 3 schematically shows a schematic structural diagram of the test equipment of FIG. 2 .
  • the test apparatus of the embodiment of the present application will be described in detail below with reference to FIG. 3 .
  • the test equipment 1200 may include a voice command generation device 1201, an audio collection device 1202, a first acquisition device 1203, an audio recognition device 1204 and an audio determination device 1205; these devices are used to determine whether the content of the audio output by the vehicle machine in response to the voice command contained in the test case is consistent with the intent of that voice command.
  • the voice instruction generating means 1201 is configured to provide voice instructions based on the test case sent by the test management device 1100.
  • the voice instruction generating means 1201 may include, for example, a speaker for providing voice instructions.
  • the audio collection device 1202 is configured to collect the output audio generated by the speaker of the vehicle machine 2000 in response to the voice command, and obtain the waveform data of the output audio.
  • the waveform data here is data representing the change in the intensity of the audio over time.
  • the audio collection device 1202 may be, for example, a microphone, a tape recorder, or the like.
  • the audio collection device 1202 can also collect the original waveform signal of the speaker.
  • the audio collection device 1202 here also corresponds to the audio collection module in this application.
  • the first acquiring means 1203 is configured to acquire the standard audio associated with the voice command, and obtain the content and waveform data of the standard audio.
  • the standard audio may be acquired from the test case, or may be acquired from the audio database of the test device 1200 according to the retrieval index included in the test case.
  • the first obtaining means 1203 here also corresponds to the first obtaining module in this application.
  • the waveform data of all standard audios of the same voice instruction may be stored in association with the same retrieval index.
  • the standard audio may be stored in the database in the form of text, and when it is called, the waveform data of the standard audio is generated according to the text of each standard audio.
  • the audio recognition device 1204 is connected to the audio collection device 1202 and the first acquisition device 1203, and is used to obtain the recognition result of the output audio.
  • the audio recognition device 1204 and the audio recognition method performed by the audio recognition device 1204 in the embodiment of the present application will be described in detail below.
  • the audio judgment device 1205 is connected with the voice command generation device 1201, and is used to determine, based on the recognition result of the output audio, whether the output audio is consistent with the intent of the voice command, that is, whether the output audio and the voice command match, and to obtain an audio judgment result ("match" or "no match"); the audio judgment result corresponds to the first test result in this application.
  • in FIG. 3, the connection between the voice instruction generating device 1201 and the audio determining device 1205 is not represented by a solid connecting line, but by two line segments, each extending from one of the two devices and ending with a dot.
  • the first acquisition device 1203, the audio recognition device 1204 and the audio determination device 1205 are conceptually three independent devices; however, the functions of the first acquisition device 1203 and the audio determination device 1205 may also be integrated into the audio recognition device 1204, in which case the audio recognition device 1204 corresponds to the audio recognition device in this application.
  • the first acquiring means 1203, the audio identifying means 1204 and the audio determining means 1205 may be implemented by an electronic control unit.
  • FIG. 4 schematically shows a structural block diagram of an electronic control unit (ECU) according to an embodiment of the present application, which includes a microcomputer, an input circuit, an output circuit, and an analog-to-digital (A/D) converter.
  • the main function of the input circuit is to preprocess the input signal (such as the signal from the sensor).
  • Different input signals have different processing methods.
  • the input circuit may include an input circuit processing analog signals and an input circuit processing digital signals.
  • the main function of the A/D converter is to convert the analog signal into a digital signal.
  • the analog signal is preprocessed by the corresponding input circuit and then input to the A/D converter for processing and converted into a digital signal accepted by the microcomputer.
  • the output circuit is a device that establishes the connection between the microcomputer and the actuator. Its function is to convert the processing results sent by the microcomputer into control signals to drive the actuators to work.
  • the output circuit generally uses a power transistor, which controls the electronic circuit of the actuator by turning on or off according to the instructions of the microcomputer.
  • the microcomputer includes a central processing unit (CPU), a memory and an input/output (I/O) interface.
  • the CPU is connected to the memory and the I/O interface through a bus, and information can be exchanged between them through the bus.
  • the memory can be a read-only memory (ROM) or a random access memory (RAM) or other memory.
  • the I/O interface is the connection circuit for exchanging information between the central processing unit (CPU) and the input circuit, the output circuit or the A/D converter. Specifically, the I/O interface can be divided into a bus interface and a communication interface .
  • the memory stores a program, and the CPU can call the program in the memory to execute the test method and the audio recognition method described in the corresponding embodiments of FIG. 5 , FIG. 8 , and FIG. 12 .
  • the test equipment may also include a man-machine interface operation instruction generation device 1214, an image acquisition device 1206, a second acquisition device 1207, an image recognition device 1208, and an image determination device 1209; these devices are used in conjunction with the voice instruction generation device 1201 to recognize the output images of the vehicle machine 2000, in particular the output images generated by the display of the vehicle machine 2000 in response to the voice command and the man-machine interface operation command.
  • the output image output by the display in response to the voice command corresponds to the first output image in the present application
  • the output image output by the display in response to the man-machine interface operation command corresponds to the second output image in the present application.
  • the man-machine interface operation instruction generating device 1214 is configured to send the man-machine interface operation instruction to the vehicle machine 2000.
  • the man-machine interface operation instruction is an operation instruction performed by a simulated human hand on the man-machine interface, and is related to the voice instruction.
  • for example, when the voice instruction is "turn on the air conditioner", the man-machine interface operation instruction may be an instruction to click a button related to air-conditioner operation that is displayed on the man-machine interface in response to the voice instruction.
  • the man-machine interface operation instruction generating device 1214 may be a controller that provides input operations to the vehicle-machine man-machine interface via an Android Debug Bridge (ADB) interface.
  • the image acquisition device 1206 is used to acquire the output image generated by the vehicle machine 2000 in response to the voice command and the man-machine interface operation command.
  • the image acquisition device 1206 has an ADB interface, and can be connected to the bottom layer ADB interface of the vehicle machine through the ADB interface, so as to directly acquire the image of the human-computer interaction interface from the vehicle machine.
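  • purely as an illustration, the following sketch shows how a test device might capture the human-computer interaction interface and inject a touch operation over a standard Android Debug Bridge connection; it assumes the adb tool is available on the test host and that the vehicle machine exposes an ADB endpoint, and the helper names are hypothetical.

```python
import subprocess
from typing import Optional

def capture_hmi_screenshot(out_path: str, serial: Optional[str] = None) -> None:
    """Grab the current human-machine interface frame directly over ADB instead of a camera."""
    cmd = ["adb"] + (["-s", serial] if serial else []) + ["exec-out", "screencap", "-p"]
    png_bytes = subprocess.run(cmd, check=True, capture_output=True).stdout
    with open(out_path, "wb") as f:
        f.write(png_bytes)

def tap_hmi(x: int, y: int, serial: Optional[str] = None) -> None:
    """Send a simulated tap as a man-machine interface operation instruction."""
    cmd = ["adb"] + (["-s", serial] if serial else []) + ["shell", "input", "tap", str(x), str(y)]
    subprocess.run(cmd, check=True)

# Example: capture the screen after a voice command, then press an on-screen button.
# capture_hmi_screenshot("hmi_after_command.png")
# tap_hmi(640, 360)
```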
  • the image acquisition device 1206 may also be a camera.
  • the second acquiring means 1207 is configured to acquire the standard image associated with the voice command and the man-machine interface operation command.
  • the standard image can be obtained from the test case, or it can be obtained from the local database according to the retrieval index contained in the test case.
  • the standard image associated with the voice command corresponds to the first standard image in the present application
  • the standard image associated with the man-machine interface operation command corresponds to the second standard image in the present application.
  • the image recognition device 1208 is connected to the image acquisition device 1206 and the second acquisition device 1207, and is configured to obtain the recognition result of the output image based on the output image and the standard image.
  • the image recognition device 1208 in the embodiment of the present application and the image recognition method related thereto will be described in detail below.
  • the image determination device 1209 is connected to the voice command generation device 1201, the man-machine interface operation command generation device 1214 and the image recognition device 1208, and is used to determine, based on the recognition result of the output image, whether the output image is consistent with the intent of the voice command or of the man-machine interface operation command, that is, whether the output image matches the voice command or the man-machine interface operation command, and to obtain an image judgment result ("match" or "no match").
  • the image judgment result indicating that the output image matches or does not match the voice command corresponds to the second test result in this application, and the image judgment result indicating that the output image matches or does not match the man-machine interface operation instruction corresponds to the third test result in this application.
  • in FIG. 3, the connection between the voice instruction generating device 1201 and the image determining device 1209 is not represented by a solid connecting line, but by two line segments, each extending from one of the two devices and ending with a dot; similarly, the connection between the man-machine interface operation instruction generating device 1214 and the image determining device 1209 is represented by two line segments, each extending from one of the two devices and ending with a square point.
  • the test equipment of the present application may further include a message collection device 1210, a third acquisition device 1211, a message identification device 1212, and a message determination device 1213; these devices are used in conjunction with the voice command generation device 1201 to determine the uplink and downlink messages sent and received by the vehicle machine 2000 in response to the voice command.
  • the message collection device 1210 is used for collecting the uplink and downlink messages sent and received after the vehicle machine 2000 receives the voice command or the man-machine interface operation command.
  • the uplink and downlink packets include uplink packets and downlink packets.
  • the uplink message is generated by the vehicle machine 2000 in response to the voice command, and the downlink message is generated by a vehicle actuator (not shown) in response to the uplink message output by the vehicle machine 2000 .
  • there are various ways to implement the message collection apparatus 1210, which are not limited in this application.
  • the message associated with the voice command corresponds to the first message in the present application
  • the message associated with the man-machine interface operation command corresponds to the second message in the present application.
  • the third obtaining means 1211 is configured to obtain standard messages associated with voice commands and man-machine interface operation commands.
  • the standard message may be obtained from the test case, or may be obtained from the local database according to the retrieval index included in the test case.
  • the standard message associated with the voice command corresponds to the first standard message in the present application
  • the standard message associated with the man-machine interface operation command corresponds to the second standard message in the present application.
  • the message identification device 1212 is connected to the message collection device 1210 and the third obtaining device 1211, and is configured to obtain the identification result of the upstream and downstream messages based on the upstream and downstream messages and the standard messages.
  • the message identification device 1212 first compares the uplink and downlink messages with the standard messages, and then compares the comparison result (e.g., a correlation) with a preset identification condition (e.g., a preset threshold); if the comparison result satisfies the preset identification condition, the message identification device 1212 outputs a positive message identification result, and if it does not, the message identification device 1212 outputs a negative message identification result.
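  • the comparison can be sketched, under the assumption that the messages are raw byte payloads and that the identification condition is a simple similarity threshold, as follows; the function names and the 0.9 threshold are illustrative, not prescribed by the embodiment.

```python
from typing import Iterable

def message_similarity(collected: bytes, standard: bytes) -> float:
    """Fraction of byte positions on which the two messages agree, over the longer message."""
    if not collected and not standard:
        return 1.0
    n = max(len(collected), len(standard))
    matches = sum(1 for a, b in zip(collected, standard) if a == b)
    return matches / n

def identify_messages(collected_msgs: Iterable[bytes],
                      standard_msgs: Iterable[bytes],
                      threshold: float = 0.9) -> bool:
    """Positive identification only if every standard message is matched by some collected message."""
    collected = list(collected_msgs)
    return all(
        any(message_similarity(c, s) >= threshold for c in collected)
        for s in standard_msgs
    )
```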
  • the message determination device 1213 is connected to the voice command generation device 1201 and the message recognition device 1212, and is used to determine, based on the message recognition result, whether the uplink and downlink messages are consistent with the intent of the voice command or of the man-machine interface operation command, that is, whether the uplink and downlink messages match the voice command, and to obtain a message judgment result ("match" or "no match").
  • the message judgment result indicating that the uplink and downlink messages match or do not match the voice command corresponds to the fourth test result in this application, and the message judgment result indicating that the uplink and downlink messages match or do not match the man-machine interface operation instruction corresponds to the fifth test result in this application.
  • in FIG. 3, the connection between the voice instruction generating device 1201 and the message recognition device 1212 is not represented by a solid connecting line, but by two line segments, each extending from one of the two devices and ending with a dot.
  • Test equipment 1200 may also include test summary means 1215 .
  • the test summary device 1215 can be connected with the audio determination device 1205, the image determination device 1209 and the message determination device 1213, respectively, and is used to aggregate the judgment results from these three devices into a test result (pass or fail) and to send the aggregated test result, for example, to the test management device 1100 or another device (not shown).
  • for example, after a voice command is input, when the audio judgment device 1205, the image judgment device 1209 and the message judgment device 1213 all produce judgment results of "consistent", the test equipment 1200 determines that the aggregated test result for this voice command is "passed".
  • the test management device 1100 determines that the test result for the entire vehicle is "passed” when the aggregated test results of all the voice commands are passed or the proportion of the aggregated test results of "passed” exceeds the threshold.
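  • a minimal sketch of this two-level aggregation, assuming boolean per-device judgments and an illustrative pass-ratio threshold, might look as follows.

```python
from typing import List

def summarize_case(audio_ok: bool, image_ok: bool, message_ok: bool) -> str:
    """A single voice command passes only when audio, image and message judgments all agree."""
    return "passed" if (audio_ok and image_ok and message_ok) else "failed"

def summarize_vehicle(case_results: List[str], pass_ratio_threshold: float = 0.95) -> str:
    """The whole vehicle machine passes when the proportion of passed cases exceeds the threshold."""
    if not case_results:
        return "failed"
    ratio = case_results.count("passed") / len(case_results)
    return "passed" if ratio >= pass_ratio_threshold else "failed"

# Example: summarize_vehicle(["passed", "failed", "passed"]) -> "failed" with the 0.95 threshold.
```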
  • the function of the test management device 1100 can also be implemented by the test device 1200, for example, the function is integrated in the test summary device 1215.
  • in this way, the test equipment 1200 can detect the feedback signals (such as the output audio, the output images, and the uplink and downlink messages) of the vehicle machine 2000 in response to each test instruction, thereby realizing in-depth "full link" detection.
  • FIG. 5 schematically shows a flowchart of the test method of the voice interaction system according to an embodiment of the present application, which includes steps S101 to S115.
  • step S101 the voice command generating apparatus 1201 provides voice commands to the vehicle machine 2000 based on the test case received by the test equipment 1200 .
  • step S102 the audio collection device 1202 collects the output audio generated by the vehicle machine 2000 in response to the voice command.
  • step S103 the first obtaining means 1203 obtains the standard audio associated with the voice command.
  • step S104 the audio recognition device 1204 performs audio recognition based on the output audio and the standard audio, and obtains a recognition result of the output audio.
  • the specific audio recognition method used in this step will be described in more detail later.
  • step S105 the audio determination device 1205 determines the consistency of the intent of the output audio and the voice command based on the recognition result of the output audio, and obtains an audio determination result.
  • step S106 the man-machine interface operation instruction generating device 1214 provides the vehicle machine 2000 with the man-machine interface operation instruction related to the voice instruction via the ADB interface based on the test case received by the test equipment 1200.
  • step S107 the image capture device 1206 captures the output image generated by the display of the vehicle 2000 in response to the voice command and the man-machine interface operation command.
  • the output image generated by the display of the vehicle machine 2000 in response to the voice command corresponds to the first output image in this application
  • the output image generated by the display of the vehicle machine 2000 in response to the man-machine interface operation instruction corresponds to the second output image in this application.
  • step S108 the second obtaining means 1207 obtains the standard image associated with the voice command and the man-machine interface operation command.
  • the standard image associated with the voice command corresponds to the first standard image in the present application
  • the standard image associated with the human interface operation command corresponds to the second standard image in the present application.
  • step S109 the image recognition device 1208 performs image recognition based on the output image and the standard image, and obtains a recognition result of the output image.
  • the specific image recognition method used in this step will be described in more detail later.
  • step S110 based on the recognition result of the output image, the image determination device 1209 determines whether the output image is consistent with the intent of the voice command or of the man-machine interface operation instruction, and obtains the image determination result.
  • step S111 the message collection device 1210 collects the uplink and downlink messages generated by the vehicle machine 2000 in response to the test instruction.
  • the uplink and downlink packets include uplink packets and downlink packets.
  • the uplink message is generated by the vehicle machine 2000 in response to the test command, and the downlink message is generated by a vehicle actuator (not shown) in response to the uplink message output by the vehicle machine 2000 .
  • there are various ways to collect the messages, which are not limited in this application.
  • step S112 the third obtaining means 1211 obtains the standard message associated with the test instruction.
  • step S113 the message identification device 1212 obtains the identification result of the upstream and downstream messages based on the upstream and downstream messages and the standard message.
  • step S114 the message determination device 1213 determines the consistency of the uplink and downlink messages with the intent of the test instruction based on the identification result of the uplink and downlink messages, and obtains the message determination result.
  • the uplink and downlink packets can be compared with the standard packets first, and then the comparison result (such as correlation) can be compared with preset identification conditions (such as preset threshold). If the packet comparison result satisfies the preset identification condition, output a positive packet identification result; if the packet comparison result does not meet the preset identification condition, output a negative packet identification result.
  • step S115 the test summarizing means 1215 summarises each judgment result, and forms and outputs the summarised test result.
  • steps S101 to S115 are not arranged in the order in which they actually occur.
  • steps S102 to S105 , steps S106 to S110 , and steps S111 to S114 may be performed in sequence or performed simultaneously.
  • some of the above steps may be omitted, for example, steps S106 to S110 and/or steps S111 to S114 may be omitted, or step S115 may be omitted.
  • step S106 may be omitted, and in this case, steps S107 to S110 may be adjusted accordingly to perform related operations only based on voice commands.
  • some of the above-mentioned means may be omitted, for example, the means for performing steps S106 to S110 and/or the means for performing steps S111 to S114, or the means for performing step S115 may be omitted.
  • the man-machine interface operation instruction generating means 1214 may be omitted, in this case, each means for performing steps S107 to S110 may be adjusted accordingly to perform related operations based on voice instructions only.
  • FIG. 6 schematically shows a schematic structural diagram of an audio recognition apparatus 1204 according to an embodiment of the present application.
  • the audio recognition device 1204 includes an audio receiving module 401 , a signal processing module 402 , a division module 403 , a calculation module 404 , an audio recognition module 405 , a polling module 406 and a threshold adjustment module 407 .
  • the audio receiving module 401 is configured to respectively receive the waveform data of the output audio and the waveform data of the standard audio.
  • the signal processing module 402 is connected to the audio receiving module 401, and is used for pre-processing the acquired waveform of the output audio, and normalizing the processed waveform.
  • the dividing module 403 is connected with the signal processing module 402, and is used for dividing the waveform data of the output audio and the waveform data of the standard audio with the same time step (for example, 0.5 seconds) to obtain N (N is a natural number) pairs of waveform data blocks.
  • the waveform data of the output audio corresponds to the first waveform data in the present application
  • the waveform data of the standard audio corresponds to the second waveform data in the present application.
  • the N waveform data blocks obtained by dividing the waveform data of the output audio correspond to the first waveform data block in the present application
  • the N waveform data blocks obtained by dividing the waveform data of the standard audio correspond to the second waveform data blocks in the present application.
  • the output audio also corresponds to the audio to be recognized in this application.
  • the division module 403 corresponds to the first division module and the second division module in this application.
  • in this embodiment, the number of waveform data blocks obtained by dividing the waveform data of the output audio is the same as the number obtained by dividing the waveform data of the standard audio; however, in other embodiments the numbers may differ, for example when the normalization performed by the signal processing module 402 described above is omitted and the waveform data of the output audio and of the standard audio are simply divided according to the same preset time step.
  • in addition, in this embodiment the waveform data of the output audio and the waveform data of the standard audio are divided entirely according to the same preset time step; however, in other embodiments the time step of some waveform data blocks may be adjusted appropriately, for example by extending the time step of the data blocks at the beginning and end of the output audio.
  • the calculation module 404 is connected with the division module 403, and is used for performing frequency domain correlation calculation on each pair of waveform data blocks respectively to obtain N frequency domain correlation degrees.
  • the frequency domain correlation here is an example of the correlation in the present application.
  • the so-called correlation here refers to the degree of similarity, and it goes without saying that in addition to the correlation in the frequency domain, other correlations can also be used.
  • the calculation module 404 may calculate the correlation degree for each pair of data blocks, or may calculate the correlation degree only for a part of the data block pairs.
  • when the correlation is calculated only for a part of the data block pairs, for example, the content of the standard audio can be obtained first, then the content of each data segment of the standard audio can be obtained, and whether to calculate the correlation for a given pair is determined according to that content.
  • the audio recognition module 405 is respectively connected to the calculation module 404 and the division module 403, and is used for obtaining the recognition result of the output audio based on the N frequency domain correlations and the N preset thresholds corresponding to the N pairs of waveform data blocks.
  • FIG. 7 schematically shows a schematic structural diagram of an audio recognition module 405 according to an embodiment of the present application.
  • the audio recognition module 405 includes a first audio recognition module 4051 , a second audio recognition module 4052 and a third audio recognition module 4053 .
  • the audio recognition module 405 corresponds to the recognition module in this application.
  • the first audio identification module 4051 is configured to compare N frequency-domain correlations with N preset thresholds corresponding to N pairs of waveform data blocks, and obtain N comparison results.
  • the second audio recognition module 4052 is connected to the first audio recognition module 4051, and is used to generate, based on a preset recognition condition and the N comparison results, a recognition result indicating that the output audio is the same as or different from the standard audio; this recognition result corresponds to the first recognition result in this application.
  • the third audio identification module 4053 is connected to the second audio identification module 4052, and is configured to acquire the content of the standard audio based on the identification result of the output audio, and output the content of the standard audio as the content of the output audio.
  • the content of the output audio generated by the third audio identification module 4053 may also differ slightly from the content of the standard audio.
  • for example, when the content of the standard audio is "Xiao U has turned on the air conditioner for you", the content of the output audio generated according to the content of the standard audio may be "Xiao Y has turned on the air conditioner for you".
  • the polling module 406 is respectively connected with the audio identification module 405 and the division module 403, and is used to determine, based on the identification results of the output audio against the standard audio, whether there are other standard audios that have not yet been compared; when an uncompared standard audio exists, the processing performed by the division module 403, the calculation module 404 and the audio recognition module 405 is repeated until the output audio has been compared with all standard audios.
  • the threshold adjustment module 407 is respectively connected with the audio receiving module 401 and the audio identification module 405, and is used for adjusting the N preset thresholds respectively based on the output audio and the standard audio, so that the N preset thresholds are different.
  • the preset threshold value corresponding to the standard audio waveform data block may be adjusted according to the content, which will be described in detail later.
  • the first obtaining device may also obtain the content of the waveform data blocks of the standard audio according to the content of the standard audio.
  • the present embodiment can realize the indirect identification of the content of the output audio with a small development cost, and avoid the direct identification of the audio content with a high development cost.
  • the present embodiment performs processing based on data blocks, the reliability and flexibility are improved.
  • FIG. 8 schematically shows an audio recognition method according to an embodiment of the present application.
  • the audio recognition method specifically includes steps S201-S209.
  • step S201 the audio receiving module 401 respectively receives the waveform data of the output audio and the waveform data and content of the standard audio.
  • the standard audio can come from an audio database included in the test equipment.
  • for example, the output audio of a vehicle machine under test in response to the test command can be "OK, Xiao X has turned on the air conditioner for you", and the standard audios pre-stored in the database or included in the test case may include "OK, Xiao U has turned on the air conditioner for you", "OK, the air conditioner is turned on for you", and "The air conditioner is turned on".
  • step S202 the signal processing module 402 pre-processes the acquired waveform of the output audio, and normalizes the processed waveform.
  • the pre-processing may include filtering processing for suppressing noise, phase correction processing for removing blank audio segments, and the like, so as to remove the negative effects of noise, blank audio signals, etc. on subsequent processing; normalization processing can reduce the complexity of subsequent processing and reduce its computational overhead.
  • for example, the output audio may include not only the feedback voice signal "Okay, Xiao X has turned on the air conditioner for you" generated by the vehicle machine, but also vibration noise generated while the vehicle is driving and ambient noise such as chatting in the car; in this case, the output audio can be filtered to suppress the noise.
  • the output audio may also include a blank audio signal recorded during this period of time.
  • phase correction can be performed on the output audio.
  • FIG. 9 schematically shows the preprocessing performed by the signal processing module of the embodiment of the present application on the waveform signal of the output audio.
  • the upper part of FIG. 9 is the waveform of the output audio
  • the lower part is the waveform of the standard audio
  • the horizontal axis in the figure is time
  • the vertical axis is the amplitude; since the output audio and the standard audio both include blank audio, phase correction is achieved by shifting the waveforms of the output audio and the standard audio to the left as a whole (removing the blank part), so that the initial phases of the output audio and the standard audio are basically the same.
  • in addition, the shorter of the output audio and the standard audio can be used as a reference to cut the longer one (for example, by removing the rear part of the longer audio that extends beyond the shorter one; refer to the part behind the vertical thick dashed line in FIG. 10), so that the duration of the processed output audio is basically the same as that of the standard audio.
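  • assuming the waveforms are held as NumPy arrays sampled at the same rate, the phase correction, length alignment and normalization described above can be sketched roughly as follows; the silence-detection rule and the helper names are illustrative, not part of the embodiment.

```python
import numpy as np

def trim_leading_silence(x: np.ndarray, silence_level: float = 0.02) -> np.ndarray:
    """Shift the waveform left by dropping the leading blank segment (a simple phase correction)."""
    peak = np.max(np.abs(x)) if x.size else 0.0
    if peak == 0.0:
        return x
    above = np.flatnonzero(np.abs(x) > silence_level * peak)
    return x[above[0]:] if above.size else x

def align_lengths(a: np.ndarray, b: np.ndarray):
    """Cut the longer waveform so both have the duration of the shorter one."""
    n = min(len(a), len(b))
    return a[:n], b[:n]

def normalize(x: np.ndarray) -> np.ndarray:
    """Scale the peak amplitude to 1 so playback volume does not affect the comparison."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

# output_wave and standard_wave are 1-D float arrays sampled at the same rate:
# output_wave, standard_wave = (normalize(trim_leading_silence(w)) for w in (output_wave, standard_wave))
# output_wave, standard_wave = align_lengths(output_wave, standard_wave)
```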
  • step S203 the dividing module 403 divides the waveform of the output audio and the waveform of the standard audio in the same manner to obtain N pairs of waveform data blocks.
  • FIG. 10 schematically shows the division of the waveform of the output audio and the waveform of the standard audio of FIG. 9 .
  • the initial phase of the output audio in the upper part of FIG. 10 is basically the same as the initial phase of the standard audio in the lower part of FIG. 10 , and the durations of the two are basically the same.
  • the output audio and the standard audio are divided in units of 0.5 seconds to obtain 12 pairs of waveform data blocks.
  • dividing in the same way is intended to make the durations of two waveform data blocks in a pair of waveform data blocks equal, but is not intended to limit the duration of different waveform pairs.
  • the duration of the first pair of waveform data blocks may not be equal to the duration of the second pair of waveform data blocks.
  • the unit duration of the waveform data block can be adjusted.
  • the unit duration used for division is in the range of 0.2 to 0.7 seconds.
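  • a simple sketch of this division step, assuming NumPy arrays and an illustrative 16 kHz sampling rate, is given below; the last block of each waveform may be shorter than the time step, in line with the flexibility discussed above.

```python
import numpy as np
from typing import List, Tuple

def split_into_blocks(wave: np.ndarray, sample_rate: int, step_s: float = 0.5) -> List[np.ndarray]:
    """Divide a waveform into consecutive blocks with the same time step (0.5 s by default)."""
    step = int(sample_rate * step_s)
    return [wave[i:i + step] for i in range(0, len(wave), step)]

def pair_blocks(out_wave: np.ndarray, std_wave: np.ndarray,
                sample_rate: int = 16000, step_s: float = 0.5) -> List[Tuple[np.ndarray, np.ndarray]]:
    """Form N pairs of waveform data blocks from the output audio and the standard audio."""
    out_blocks = split_into_blocks(out_wave, sample_rate, step_s)
    std_blocks = split_into_blocks(std_wave, sample_rate, step_s)
    return list(zip(out_blocks, std_blocks))
```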
  • step S204 the calculation module 404 respectively performs frequency domain correlation calculation on each pair of waveform data blocks to obtain N frequency domain correlations.
  • step S204 first performs a fast Fourier transform (FFT) on each waveform data block in the 12 pairs of waveform data blocks, so that a total of 24 waveform data blocks are converted from time-domain functions to frequency-domain functions.
  • then, taking a pair of waveform data blocks as a unit, correlation calculation is performed on each pair of frequency-domain waveform data blocks respectively, and 12 frequency domain correlations are obtained.
  • time-domain-frequency-domain transforms such as other Fourier transforms, may be used.
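  • one possible realization of the per-pair frequency domain correlation is sketched below: it compares the FFT magnitude spectra of the two blocks with a normalized inner product scaled to 0..100; the exact correlation measure used by the embodiment is not prescribed, so this formula is only an assumption.

```python
import numpy as np

def frequency_domain_correlation(block_a: np.ndarray, block_b: np.ndarray) -> float:
    """Similarity of two waveform data blocks in the frequency domain, scaled to 0..100."""
    n = max(len(block_a), len(block_b))
    if n == 0:
        return 0.0
    # The FFT converts each block from a time-domain function to a frequency-domain function.
    spec_a = np.abs(np.fft.rfft(block_a, n=n))
    spec_b = np.abs(np.fft.rfft(block_b, n=n))
    denom = np.linalg.norm(spec_a) * np.linalg.norm(spec_b)
    if denom == 0:
        return 0.0
    return float(100.0 * np.dot(spec_a, spec_b) / denom)

# correlations = [frequency_domain_correlation(a, b) for a, b in block_pairs]
# where block_pairs are the N pairs of waveform data blocks from the division step.
```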
  • step S205 the threshold adjustment module 407 adjusts N preset thresholds respectively based on the output audio and the standard audio, wherein the N preset thresholds correspond to N waveform pairs.
  • the output audio can be intercepted as "OK, Xiao X has turned on the air conditioner for you”.
  • the output audio and the standard audio can each be divided into 12 blocks to form 12 pairs of waveform data blocks.
  • the frequency domain correlations of the above 12 pairs of waveform data blocks are calculated.
  • the 12 preset thresholds corresponding to the 12 frequency domain correlations are adjusted according to the test instruction "turn on the air conditioner" and the standard audio "Okay, Xiao X has turned on the air conditioner for you": the preset thresholds corresponding to the 9th to 12th waveform pairs are set to a low value, such as 0, while the preset thresholds corresponding to the other waveform pairs are set to a relatively higher value, such as 60.
  • in step S205, the preset thresholds of the key recognition areas of the audio can be increased, the preset thresholds of the non-key recognition areas can be lowered, and the preset thresholds of areas that may legitimately differ between vehicles (for example, the anthropomorphic names used by different vehicle machines) can even be set to 0; in this way, the success rate of audio recognition can be improved to a greater extent, and the false recognition rate can be reduced.
  • the above-mentioned adjustment and setting methods of the threshold are only exemplary.
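  • one content-driven policy of this kind can be sketched as follows, assuming the content of each standard-audio block is available as text; the word lists and the numeric values 60 and 30 are illustrative, not values mandated by the embodiment.

```python
from typing import List

def adjust_thresholds(block_contents: List[str],
                      key_words: List[str],
                      ignore_words: List[str],
                      high: float = 60.0,
                      low: float = 30.0) -> List[float]:
    """Per-block preset thresholds: raise key recognition areas, lower non-key areas,
    and disable blocks that legitimately differ between vehicles (e.g. the assistant's name)."""
    thresholds = []
    for content in block_contents:
        if any(w in content for w in ignore_words):
            thresholds.append(0.0)   # anthropomorphic name etc.: never reject on this block
        elif any(w in content for w in key_words):
            thresholds.append(high)  # key recognition area such as "air conditioner"
        else:
            thresholds.append(low)   # non-key recognition area
    return thresholds

# Example: thresholds = adjust_thresholds(block_contents,
#                                         key_words=["air conditioner"],
#                                         ignore_words=["Xiao U", "Xiao X"])
```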
  • step S206 the first audio recognition module 4051 of the audio recognition module 405 compares the N frequency domain correlations and N preset thresholds respectively, and obtains N comparison results.
  • when a frequency domain correlation is greater than or equal to its corresponding preset threshold, the result of the comparison is an affirmative result ("Yes").
  • when a frequency domain correlation is smaller than its corresponding preset threshold, the result of the comparison is a negative result ("No").
  • step S207 the second audio recognition module 4052 of the audio recognition module 405 obtains the recognition result of the output audio based on the preset recognition condition and the N comparison results; when the recognition result is a positive result, step S208 is performed, and when the recognition result is a negative result, step S209 is executed.
  • for example, the preset identification condition may be that the number of positive results is more than 80% of the number of all comparison results.
  • when the preset identification condition is satisfied, the output audio is identified as the standard audio (a positive recognition result).
  • when the preset identification condition is not satisfied, the output audio is identified as not being the standard audio (a negative recognition result).
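  • steps S206 and S207 can then be sketched together as a single decision, assuming per-block correlations and thresholds from the preceding steps; the 80% ratio follows the example above.

```python
from typing import List

def recognize_against_standard(correlations: List[float],
                               thresholds: List[float],
                               pass_ratio: float = 0.8) -> bool:
    """Steps S206/S207: compare each block correlation with its preset threshold and return a
    positive recognition result when at least `pass_ratio` of the comparisons are positive."""
    results = [c >= t for c, t in zip(correlations, thresholds)]
    if not results:
        return False
    return sum(results) / len(results) >= pass_ratio
```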
  • step S208 the third audio recognition module 4053 of the audio recognition module 405 acquires the content of the standard audio, and outputs the content of the standard audio as the content of the output audio.
  • note that step S201 acquires the waveform of the standard audio, whereas step S208 acquires the file containing the content of the standard audio.
  • step S209 the polling module 406 judges whether there are other standard audios associated with the test instruction; if there are, the method returns to step S203, and if there are none and the recognition results of the output audio against all standard audios are negative, a negative recognition result is output.
  • for example, if in step S207 the output audio is identified as not being the first standard audio, then in step S209 the second standard audio "the air conditioner has been turned on" is taken as the standard audio and the method returns to step S203; the subsequent steps are repeated, and in step S207 the output audio will again be identified as not being the second standard audio "the air conditioner has been turned on".
  • then, in step S209, the third standard audio "Okay, Xiao U has turned on the air conditioner for you" is taken as the standard audio and the method returns to step S203; the subsequent steps are repeated, and in step S207 the output audio "Okay, Xiao X has turned on the air conditioner for you" is identified as the third standard audio "Okay, Xiao U has turned on the air conditioner for you"; finally, in step S208, the content of the third standard audio is obtained and output as the content of the output audio.
  • if, on the contrary, in step S207 the output audio "Okay, Xiao X has turned on the air conditioner for you" is identified as not being the third standard audio "Okay, Xiao U has turned on the air conditioner for you", then in the subsequent step S209, since there are no other unrecognized standard audios at this time, the polling module 406 outputs a negative recognition result.
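  • the polling loop of steps S203 to S209 over all standard audios associated with a test instruction can be sketched generically as follows; the compare callable stands for the block-division, correlation and threshold comparison sketched earlier, and all names are illustrative.

```python
from typing import Callable, Iterable, Optional, Tuple

def recognize_output_audio(output_wave,
                           standard_audios: Iterable[Tuple[str, object]],
                           compare: Callable[[object, object], bool]) -> Optional[str]:
    """Poll every standard audio associated with the test instruction (steps S203 to S209).

    standard_audios yields (content_text, waveform) pairs, and compare is the block-wise
    division/correlation/threshold test. Returns the content of the first matching standard
    audio, or None when all comparisons are negative."""
    for content, std_wave in standard_audios:
        if compare(output_wave, std_wave):
            # Step S208: the content of the matching standard audio is output
            # as the content of the output audio.
            return content
    return None
```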
  • steps S201 to S209 are not arranged in the order in which they actually occur.
  • step S205 may occur before step S204.
  • some of the above steps may be omitted, for example, steps S202, S205, S208, S209 or any combination thereof may be omitted.
  • in that case, step S207 may be specified to directly output a positive or negative recognition result.
  • some of the above modules may be omitted, for example, the modules for performing steps S202, S205, S208 or S209, or any combination of these modules may be omitted.
  • FIG. 11 schematically shows a schematic structural diagram of an image recognition apparatus 1208 according to an embodiment of the present application.
  • the image recognition device 1208 includes an image receiving module 801 , a grayscale processing module 802 , a binarization processing module 803 , an image matching and recognition module 804 and an image character recognition module 805 .
  • the image receiving module 801 is configured to receive the output image generated by the human-machine interface (user interface, UI) of the vehicle machine 2000 in response to a voice command and/or a human-machine interface operation command (such as a screen-cutting operation), and to receive the standard image correspondingly associated with the voice command and/or the human-machine interface operation command.
  • the grayscale processing module 802 is connected to the image receiving module 801, and is used for performing grayscale processing on the output image. Since the grayscale processing of images is a relatively mature technology, the implementation manner of the grayscale processing will not be described in detail in this application.
  • the binarization processing module 803 is connected to the grayscale processing module 802, and is used to perform binarization processing on the output image. Since image binarization processing is a relatively mature technology, the present application will not describe in detail how the binarization processing can be implemented.
  • the image matching and identification module 804 is connected with the binarization processing module 803 and the image receiving module 801, and is used for intercepting the processed output image in the shape of a preset template, then matching and identifying the intercepted output image against the standard image, and outputting the first image recognition result.
  • the image character recognition module 805 is connected to the binarization processing module 803, and is used for performing Optical Character Recognition (OCR) on the processed output image and outputting a second image recognition result. Since the optical character recognition of images is a relatively mature technology, this application will not describe in detail how the optical character recognition can be realized.
  • FIG. 12 schematically shows a schematic diagram of various aspects of an image recognition method related to an embodiment of the present application.
  • the test device of the embodiment of the present application respectively sends a voice command and a screen-cutting operation command as a man-machine interface operation command to the vehicle machine.
  • the screen cutting operation instruction is transmitted via the ADB interface.
  • the human-machine interface of the vehicle-machine generates an output image in response to the voice command and the man-machine interface operation command.
  • the test equipment collects the output image of the human-computer interaction interface from the vehicle machine, and obtains from the image database the standard images respectively associated with the voice command and with the screen-cutting operation command.
  • an image recognition method involved in an embodiment of the present application is schematically shown, and the image recognition method includes the following steps S301 to S305 .
  • step S301 the image receiving module 801 receives, for example, the output image generated by the human-machine interface of the vehicle machine in response to a voice command and/or a man-machine interface operation instruction (such as a screen-cutting operation), and receives, for example from the image database of the test equipment, the standard image correspondingly associated with the voice command and/or the human-machine interface operation command (e.g., the screen-cutting operation).
  • step S302 the grayscale processing module 802 performs grayscale processing on the output image.
  • step S303 the binarization processing module 803 performs binarization processing on the output image.
  • step S304 the image matching and recognizing module 804 intercepts the processed output image in the shape of the preset template, and then matches and recognizes the intercepted output image with the standard image, and outputs the first image recognition result.
  • step S305 the image character recognition module 805 performs optical character recognition on the processed output image and outputs a second image recognition result.
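  • assuming OpenCV and pytesseract are available, steps S302 to S305 might be sketched as below; Otsu binarization, the normalized template-matching score and the 0.8 threshold are illustrative choices, and the standard image is assumed to be no larger than the captured output image.

```python
import cv2
import pytesseract

def recognize_hmi_image(output_path: str, standard_path: str, match_threshold: float = 0.8):
    """Steps S302 to S305: grayscale, binarize, template-match against the standard image,
    and run optical character recognition on the processed output image."""
    out_img = cv2.imread(output_path, cv2.IMREAD_GRAYSCALE)   # S302: grayscale processing
    std_img = cv2.imread(standard_path, cv2.IMREAD_GRAYSCALE)
    _, out_bin = cv2.threshold(out_img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)  # S303
    _, std_bin = cv2.threshold(std_img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    # S304: slide the (template-sized) standard image over the processed output image.
    scores = cv2.matchTemplate(out_bin, std_bin, cv2.TM_CCOEFF_NORMED)
    first_image_recognition_result = bool(scores.max() >= match_threshold)

    # S305: optical character recognition on the processed output image.
    second_image_recognition_result = pytesseract.image_to_string(out_bin)
    return first_image_recognition_result, second_image_recognition_result
```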
  • steps S302 and S303 may be omitted.
  • step S304 or S305 may be omitted.
  • modules for performing steps S302 and S303 may be omitted.
  • the module for performing step S304 or performing step S305 may be omitted.
  • FIG. 13 is a schematic structural diagram of a computing device 1500 provided by an embodiment of the present application.
  • the computing device 1500 includes: a processor 1510 , a memory 1520 , a communication interface 1530 , and a bus 1540 .
  • the communication interface 1530 in the computing device 1500 shown in FIG. 13 may be used to communicate with other devices.
  • the processor 1510 can be connected with the memory 1520 .
  • the memory 1520 may be used to store program codes and data; the memory 1520 may be a storage unit inside the processor 1510, an external storage unit independent of the processor 1510, or a component including both a storage unit inside the processor 1510 and an external storage unit independent of the processor 1510.
  • computing device 1500 may also include bus 1540 .
  • the memory 1520 and the communication interface 1530 may be connected to the processor 1510 through the bus 1540 .
  • the bus 1540 may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an Extended Industry Standard Architecture (Extended Industry Standard Architecture, EISA) bus or the like.
  • the bus 1540 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one line is shown in FIG. 13, but it does not mean that there is only one bus or one type of bus.
  • the processor 1510 may adopt a central processing unit (central processing unit, CPU).
  • the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the processor 1510 uses one or more integrated circuits to execute related programs to implement the technical solutions provided by the embodiments of the present application.
  • the memory 1520 may include read only memory and random access memory and provides instructions and data to the processor 1510 .
  • a portion of the memory 1520 may also include non-volatile random access memory.
  • the processor 1510 may also store device type information.
  • the processor 1510 executes the computer-implemented instructions in the memory 1520 to perform the operational steps of the above-described methods.
  • the computing device 1500 may correspond to a corresponding execution subject of the methods according to the various embodiments of the present application, and the above-mentioned and other operations and/or functions of the modules in the computing device 1500 are respectively intended to implement the corresponding processes of the methods in the embodiments of the present application, which are not repeated here for brevity.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it is used to execute any one of the above-mentioned audio recognition methods, image recognition methods, and test methods, the method including at least one of the solutions described in the above embodiments.
  • the computer storage medium of the embodiments of the present application may adopt any combination of one or more computer-readable media.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer readable storage media include: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), Erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a propagated data signal in baseband or as part of a carrier wave, with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • computer program code for performing the operations of the present application may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or wide area network (WAN), or may be connected to an external computer (eg, through the Internet using an Internet service provider) connect).
  • in the above embodiment, the content of the output audio is identified as the content of the standard audio on the condition that the number of waveform data blocks whose correlation is greater than the threshold is 80% or more of the number of all comparison results.
  • however, the present application is not limited to this; for example, the calculated correlations can be averaged, and when the average value is greater than an average-value threshold, the content of the output audio is identified as the content of the standard audio; different weight values can also be assigned to the correlations, and the average calculated based on these weights.
  • the weight values here can be set according to the content of the standard audio.
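  • a sketch of this weighted-average variant, with illustrative weights and an illustrative average-value threshold, is given below.

```python
from typing import List

def weighted_average_match(correlations: List[float],
                           weights: List[float],
                           average_threshold: float = 70.0) -> bool:
    """Alternative decision rule: compare a weighted average of the block correlations
    with a single average-value threshold; the weights can follow the standard audio content."""
    total_w = sum(weights)
    if total_w == 0:
        return False
    avg = sum(c * w for c, w in zip(correlations, weights)) / total_w
    return avg >= average_threshold
```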


Abstract

Embodiments of the present application provide a test method for a voice interaction system, an audio recognition method, and related devices. The test method for the voice interaction system includes: sending a voice instruction to the voice interaction system; acquiring first waveform data of the audio output by a speaker of the voice interaction system; acquiring second waveform data of a standard audio; dividing the first waveform data into a plurality of first waveform data blocks; dividing the second waveform data into a plurality of second waveform data blocks; calculating the correlation between the first waveform data blocks and the second waveform data blocks; and determining, according to the correlation, that the output audio matches or does not match the voice instruction. By using the above audio recognition method, the test method for the voice interaction system of the embodiments of the present application can realize rapid automated testing.

Description

语音交互系统的测试方法、音频识别方法及相关设备 技术领域
本申请涉及一种语音交互系统的测试方法、音频识别方法及相关设备。
背景技术
语音交互技术被越来越多的应用,例如越来越多的车辆中附带了语音交互功能,使驾驶员可通过语音的方式调用汽车导航、调整驾驶模式和控制车辆各个执行器等,这大大提高了驾驶员操作的方便性。在车辆的出厂交付前,需要对语音交互系统的功能、性能等进行严格的测试,以保证语音交互系统的实际应用效果。
语音交互系统涉及的场景和测试项众多,例如,包括车机语音交互的噪声叠加性能测试,带口音普通话/方言的测试,不同语速的测试等。因此,需要采用自动化的音频识别测试手段来替代人工点测试,来满足测试普适性和测试效率的要求。
发明内容
本申请实施例提供了一种测试语音交互系统的测试方法及实现该测试方法的测试设备、音频识别方法及实现该识别方法的装置,能够实现自动化的测试与识别。
本申请第一方面提供一种测试语音交互系统的测试方法,包括:向所述语音交互系统发送语音指令;获取所述语音交互系统的扬声器的输出音频的第一波形数据;获取标准音频的第二波形数据;将所述第一波形数据分为多个第一波形数据分块;将所述第二波形数据分为多个第二波形数据分块;计算所述第一波形数据分块与所述第二波形数据分块的相关度;根据所述相关度生成第一测试结果,所述第一测试结果指示所述输出音频与所述语音指令相匹配或不匹配。
采用如上的测试方法,通过对输出音频和标准音频划分成块,并计算各分块对的相关度,能够以较低开发成本和较高识别效率,实现对输出音频内容的识别,由此本申请提供了一种能够迅速进行测试的自动化测试方法。值得注意的是,本申请的方法中不必须对输出音频的内容本身进行直接识别,对内容本身进行识别通常需要较高的开发成本(例如需要进行大量语音训练等),并且识别速度较低。上述测试方法是通过将输出音频和标准音频进行比较,基于该比较结果来间接地对输出音频的内容进行识别。因此,上述测试方法尤其适合应用于输出音频比较固定的场景,例如应用于车机的指令/控制性语音的交互,电话自动回复系统的交互等。
这里的标准音频的获取方式没有限制,例如该标准音频可以是对语音交互系统进行测试的测试设备根据语音指令在本地数据库中查询获取到的,也可以是服务器等连同语音指令发送给测试设备的。
作为本申请第一方面的一个可能的实现方式,所述计算所述第一波形数据分块与所述第二波形数据分块的相关度包括:分别计算所述多个第一波形数据分块与所述多个第二波形数据分块的多个相关度;所述根据所述相关度生成第一测试结果包括:将 所述多个相关度与多个预设阈值分别进行比较,根据所述比较的结果生成第一测试结果。
作为本申请第一方面的一个可能的实现方式,所述多个预设阈值的大小被设定为不同。
在实际中,输出音频由于采集或收录条件的不同,可能包含各种噪音和杂音,输出音频的质量会影响音频识别结果。通过分别调整对应于各个分块对的预设阈值,能够提高本申请音频识别方法对于各种输出音频质量的适应性,从而提高识别结果的可靠性。另外,通过自由调整预设阈值,能够提高上述测试方法的适用范围。
作为本申请第一方面的一个可能的实现方式,还包括:获取所述标准音频的内容;根据所述标准音频的内容获取所述多个第二波形数据分块的内容;根据所述第二波形数据分块的内容,设定所述预设阈值。
作为本申请第一方面的一个可能的实现方式,还包括:获取所述标准音频的内容;在所述第一测试结果指示所述输出音频与所述语音指令相匹配时,根据所述标准音频的内容生成所述输出音频的内容。
如此,语音交互系统的测试方法能够实现对响应于语音指令而产生的输出音频的内容的识别,从而提高测试深度。
作为本申请第一方面的一个可能的实现方式,所述测试方法还包括:获取所述语音交互系统的显示器的第一输出图像;获取第一标准图像;基于所述第一输出图像和所述第一标准图像生成第二测试结果,所述第二测试结果指示所述第一输出图像和所述语音指令相匹配或不匹配。
通过实现对响应于语音指令而产生的输出图像的识别,以及实现输出图像与语音指令之间一致性的判别,上述测试方法能够进一步提高测试深度,提高测试可靠性。
作为本申请第一方面的一个可能的实现方式,通过所述语音交互系统的安卓调试桥接口获取所述第一输出图像。
通过语音交互系统的安卓调试桥接口采集输出图像,上述测试方法能够快速准确地获取输出图像。与传统通过摄像头采集输出图像相比,这种方式能够减小或避免采集过程中引入图像变形等负面因素,从而提高了后续识别的可靠性。
作为本申请第一方面的一个可能的实现方式,所述测试方法还包括:在发送所述语音指令之后,向所述语音交互系统发送人机界面操作指令;获取所述语音交互系统显示器的第二输出图像;获取第二标准图像;基于所述第二输出图像和所述第二标准图像生成第三测试结果,所述第三测试结果指示所述第二输出图像与所述人机界面操作指令相匹配或不匹配。
作为本申请第一方面的一个可能的实现方式,通过所述安卓调试桥接口获取所述第二输出图像。
通过安卓调试桥接口提供人机界面操作指令,这样测试方法能够提供与语音指令相关的人机界面操作指令,该人机界面操作指令直接作用于车机,尤其是可以直接作用于车机的人机交互界面,从而能够提供与车机的实际使用场景更加贴合的测试环境。由此,上述测试方法提高了测试深度,从而提高了测试可靠性。
作为本申请第一方面的一个可能的实现方式,还包括:获取所述语音交互系统收 发的第一报文;获取第一标准报文;根据所述第一报文和所述第一标准报文生成第四测试结果,所述第四测试结果指示所述第一报文和所述语音指令相匹配或不匹配。
通过实现对响应于语音指令而产生的报文的识别,上述测试方法能够进一步提高测试深度,提高测试可靠性。
本申请第二方面提供一种音频识别方法,包括:获取待识别音频的第一波形数据;获取标准音频的第二波形数据;将所述第一波形数据分为多个第一波形数据分块;将所述第二波形数据分为多个第二波形数据分块;计算所述第一波形数据分块与所述第二波形数据分块的相关度;根据所述相关度生成第一识别结果,所述第一识别结果指示所述待识别音频与所述标准音频相同或不同。
作为本申请第二方面的一个可能的实现方式,所述计算所述第一波形数据分块与所述第二波形数据分块的相关度包括:分别计算所述多个第一波形数据分块与所述多个第二波形数据分块的多个相关度;所述根据所述相关度生成第一识别结果包括:将所述多个相关度与多个预设阈值分别进行比较,根据所述比较的结果生成第一识别结果。
作为本申请第二方面的一个可能的实现方式,所述多个预设阈值的大小被设定为不同。
作为本申请第二方面的一个可能的实现方式,还包括:获取所述标准音频的内容;根据所述标准音频的内容获取所述多个第二波形数据分块的内容;根据所述第二波形数据分块的内容,设定所述预设阈值。
作为本申请第二方面的一个可能的实现方式,在所述第一识别结果指示所述待识别音频与所述标准音频相同时,根据所述标准音频的内容生成所述待识别音频的内容。
本申请第三方面提供一种测试语音交互系统的测试设备,包括:语音指令生成装置,用于向所述语音交互系统发送语音指令;音频采集装置,用于获取所述语音交互系统的扬声器的输出音频的第一波形数据;第一获取装置,用于获取标准音频的第二波形数据;第一划分模块,用于将所述第一波形数据分为多个第一波形数据分块;第二划分模块,用于将所述第二波形数据分为多个第二波形数据分块;计算模块,用于计算所述第一波形数据分块与所述第二波形数据分块的相关度;音频判定装置,用于根据所述相关度生成第一测试结果,所述第一测试结果指示所述输出音频与所述语音指令相匹配或不匹配。
作为本申请第三方面的一个可能的实现方式,所述计算模块执行的所述计算所述第一波形数据分块与所述第二波形数据分块的相关度包括:分别计算所述多个第一波形数据分块与所述多个第二波形数据分块的多个相关度;所述音频判定装置执行的所述根据所述相关度生成第一测试结果包括:将所述多个相关度与多个预设阈值分别进行比较,根据所述比较的结果生成所述第一测试结果。
作为本申请第三方面的一个可能的实现方式,还包括阈值调整模块,用于将所述多个预设阈值的大小设定为不同。
作为本申请第三方面的一个可能的实现方式，所述第一获取装置还用于：获取所述标准音频的内容；根据所述标准音频的内容获取所述多个第二波形数据分块的内容；所述阈值调整模块还用于根据所述第二波形数据分块的内容，设定所述预设阈值。
作为本申请第三方面的一个可能的实现方式,所述第一获取装置还用于获取所述标准音频的内容;所述测试设备还包括音频识别模块,所述音频识别模块用于在所述第一测试结果指示所述输出音频与所述语音指令相匹配时,根据所述标准音频的内容生成所述输出音频的内容。
作为本申请第三方面的一个可能的实现方式,所述测试设备还包括:图像采集装置,用于获取所述语音交互系统的显示器的第一输出图像;第二获取装置,用于获取第一标准图像;图像判定装置,用于根据所述第一输出图像和所述第一标准图像生成第二测试结果,所述第二测试结果指示所述第一输出图像和所述语音指令相匹配或不匹配。
作为本申请第三方面的一个可能的实现方式,所述图像采集装置执行的所述获取所述语音交互系统的显示器的第一输出图像包括:通过所述语音交互系统的安卓调试桥接口获取所述第一输出图像。
作为本申请第三方面的一个可能的实现方式，所述测试设备还包括：人机界面操作指令生成装置，用于在发送所述语音指令之后，向所述语音交互系统发送人机界面操作指令；图像采集装置，用于获取所述语音交互系统的显示器的第二输出图像；第二获取装置，用于获取第二标准图像；图像判定装置，用于根据所述第二输出图像和所述第二标准图像生成第三测试结果，所述第三测试结果指示所述第二输出图像与所述人机界面操作指令相匹配或不匹配。
作为本申请第三方面的一个可能的实现方式,所述图像采集装置执行的所述获取所述语音交互系统的显示器的第二输出图像包括:通过所述人机交互系统的安卓调试桥接口获取所述第二输出图像。
作为本申请第三方面的一个可能的实现方式,还包括:报文采集装置,用于获取所述语音交互系统收发的第一报文;第三获取装置,用于获取第一标准报文;报文判定装置,用于根据所述第一报文和所述第一标准报文生成第四测试结果,所述第四测试结果指示所述第一报文和所述语音指令相匹配或不匹配。
本申请第四方面提供一种音频识别装置,包括:音频采集模块,用于获取待识别音频的第一波形数据;第一获取模块,用于获取标准音频的第二波形数据;第一划分模块,用于将所述第一波形数据分为多个第一波形数据分块;第二划分模块,用于将所述第二波形数据分为多个第二波形数据分块;计算模块,用于计算所述第一波形数据分块与所述第二波形数据分块的相关度;识别模块,用于根据所述相关度生成第一识别结果,所述第一识别结果指示所述待识别音频与所述标准音频相同或不同。
作为本申请第四方面的一个可能的实现方式,所述计算模块执行的所述计算所述第一波形数据分块与所述第二波形数据分块的相关度包括:分别计算所述多个第一波形数据分块与所述多个第二波形数据分块的多个相关度;其中,所述识别模块执行的所述根据所述相关度生成第一识别结果包括:将所述多个相关度与多个预设阈值分别进行比较,根据所述比较的结果生成所述第一识别结果。
作为本申请第四方面的一个可能的实现方式,还包括阈值调整模块,用于将所述多个预设阈值的大小设定为不同。
作为本申请第四方面的一个可能的实现方式,所述第一获取模块还用于:获取所述标准音频的内容;根据所述标准音频的内容获取所述多个第二波形数据分块的内容;所述阈值调整模块还用于根据所述第二波形数据分块的内容,设定所述预设阈值。
作为本申请第四方面的一个可能的实现方式,所述第一获取模块还用于获取所述标准音频的内容;所述识别模块还用于在所述第一识别结果指示所述待识别音频与所述标准音频相同时,根据所述标准音频的内容生成所述待识别音频的内容。
第五方面，本申请提供一种车机语音交互测试系统，其包括：测试管理设备，用于发送测试用例，以管理车机语音交互测试；如上述第三方面中任一种所述的测试设备，与所述测试管理设备连接，用于对车机进行车机语音交互测试；其中，所述测试设备根据所述测试用例提供所述测试用指令。
由于第五方面包括第三方面的测试设备，其将类似地具有上述第三方面所具有的优点或益处，因此对于第五方面的优点或益处在此不再赘述。
第六方面，本申请提供一种计算设备，其包括：总线；通信接口，其与所述总线连接；至少一个处理器，其与所述总线连接；以及至少一个存储器，其与所述总线连接并存储有程序指令，所述程序指令当被所述至少一个处理器执行时使得所述至少一个处理器执行上述第一方面和上述第二方面中任一所述的方法。
由于第六方面可以执行上述第一方面和上述第二方面中任一所述的方法，其将类似地具有上述第一方面或第二方面所具有的优点或益处，因此对于第六方面的优点或益处在此不再赘述。
第七方面，本申请提供一种计算机可读存储介质，存储有程序指令，其特征在于，所述程序指令当被计算机执行时使得所述计算机执行上述第一方面和上述第二方面中任一所述的方法。
由于第七方面可以执行上述第一方面和上述第二方面中任一所述的方法，其将类似地具有上述第一方面或第二方面所具有的优点或益处，因此对于第七方面的优点或益处在此不再赘述。
本申请的这些和其它方面在以下(多个)实施例的描述中会更加简明易懂。
附图说明
以下参照附图来进一步说明本申请的各个特征和各个特征之间的联系。附图均为示例性的,一些特征并不以实际比例示出,并且一些附图中可能省略了本申请所涉及领域的惯常的且对于本申请非必要的特征,或是额外示出了对于本申请非必要的特征,附图所示的各个特征的组合并不用以限制本申请。另外,在本说明书全文中,相同的附图标记所指代的内容也是相同的。具体的附图说明如下:
图1是本申请一个实施方式中涉及的车机的结构示意框图;
图2是本申请一个实施方式中涉及的车机语音交互测试系统的结构示意图;
图3是图2的测试设备的结构示意图;
图4是本申请一个实施方式中涉及的电子控制单元的结构示意框图;
图5是本申请一个实施方式的语音交互系统的测试方法的流程示意图;
图6是本申请一个实施方式的音频识别装置的结构示意图;
图7是本申请一个实施方式的识别模块的结构示意图;
图8是本申请一个实施方式的音频识别方法;
图9是本申请一个实施方式的信号处理模块对输出音频的波形进行的前处理;
图10是图9的输出音频的波形和标准音频的波形的划分;
图11是本申请一个实施方式的图像识别装置的结构示意图;
图12是涉及本申请一个实施方式的图像识别方法的各方面的示意图;以及
图13是本申请实施例提供的一种计算设备的结构性示意性图。
具体实施方式
说明书和权利要求书中的词语“第一、第二、第三等”是为了在同类事物间予以区分,不代表特定排序和重要性。
在以下的描述中,所涉及的表示步骤的标号,如S101、S102……等,并不表示一定会按此步骤执行,在允许的情况下可以互换前后步骤的顺序,或同时执行。
目前有一种对语音交互系统的测试方法,其中,测试系统通过人工嘴等播放设备播放语料,车机麦克风采集播放语料,车机识别播放语料,并通过扬声器播放反馈语音。测试系统然后采集车机扬声器的音频或原始波形信号,以此判断车机是否识别语音指令并进行了语音反馈。然而,这种方案仅对车机是否进行了语音反馈进行判断,而没有判断出车机反馈语音的具体内容,无法验证车机反馈内容和语音指令意图的一致性,更不用说实现车机语音交互其他方面的验证,测试深度不足。
还有一种测试方法,在该方法中,测试系统管理语料数据库和噪声数据库,通过语音播放系统将在特定场景下需要播放的语料进行播放,并且通过噪声模拟系统,对播放的音频叠加一定分贝值的噪声,从而验证车机语音交互系统在各种噪声场景下的识别性能。这种方法能够验证车机语音交互系统在各种模拟实际行车过程中的噪声场景下的识别性能。然而,这种方法通过采集车机语音识别日志的方式仅验证了车机的语音识别性能,缺乏根据车机实际的反馈语音验证车机的反馈语音内容这个过程。也就是说,该方法仅测试了车机的语音识别性能,并没有测试车机的反馈语音性能,语音交互测试不完整,测试深度不足。此外,这种方法还无法验证车机是否针对语音指令执行除反馈语音之外的其他相关操作,无法完全验证车体逻辑交互的正确性。
有鉴于此,本申请一个实施方式提供一种测试语音交互系统的测试方法。该语音交互系统的测试方法包括:向所述语音交互系统的麦克风发送语音指令;获取所述语音交互系统的扬声器的输出音频的第一波形数据;获取标准音频的第二波形数据;将所述第一波形数据分为多个第一波形数据分块;将所述第二波形数据分为多个第二波形数据分块;计算所述第一波形数据分块与所述第二波形数据分块的相关度;根据所述相关度生成第一测试结果,所述第一测试结果指示所述输出音频与所述语音指令相匹配或不匹配。
这里的波形数据是表示音频的强度随时间的变化的数据。
采用如上的测试方法，通过对输出音频和标准音频划分成多个分块数据，并计算输出音频的分块数据和标准音频的分块数据间的相关度，能够以较低开发成本和较高识别效率，实现对输出音频内容的识别，由此本申请提供了一种能够迅速进行测试的自动化测试方法。在具体一点说，上述方法中不必须对输出音频的内容本身进行直接识别，对内容本身进行识别通常需要较高的开发成本（例如需要进行大量语音训练等），并且识别速度较低。上述方法是通过将输出音频和标准音频进行比较，基于该比较结果来间接地对输出音频的内容进行识别，从而能够迅速地得到测试结果。另外，上述方法尤其适合应用于输出音频比较固定的场景，例如应用于车机的指令/控制性语音的交互，电话自动回复系统的交互等。
另外,采用如上的测试方法,使用根据波形数据分块相关度对采集的输出音频的波形数据和标准音频的波形数据进行比对,即对采集的输出音频的波形数据和标准音频进行分块,然后对输出音频的波形数据分块进行和标准音频的数据分块的相关性计算,满足设定条件后,可认为采集的输出音频和标准音频一致。由于实际采集的波形数据在实际环境下容易受到干扰,因此采用分块处理手段,能够在不同音频输出环境下,设定不同的判别条件,从而增加自动化测试结果的可靠性。
综上,通过如上的技术手段,本实施方式的语音交互系统的测试方法能够实现迅速、可靠的自动化测试。
另外,作为上述相关度的计算方法的例子,可以分别对每个波形数据分块进行频域相关性计算,获得多个频域相关度。这里的频域相关度是本申请中的相关度的一例。此处所谓的相关度表示的是相似程度,不言而喻,除了频域相关度外,还可以采用其他方式的相关度。
另外,在对输出音频的波形数据和标准音频的波形数据进行划分时,可以得到相同数量的第一波形数据分块和第二波形数据分块,也可以得到不同数量的第一波形数据分块和第二波形数据分块。此时,可以对较多的那一方中的一部分波形数据分块不做处理。
另外，可以用相同的时间步长对输出音频的波形数据和标准音频的波形数据进行划分，也可以对输出音频的波形数据和标准音频的波形数据的划分所使用的时间步长分别进行调整。
另外,在本实施方式中,在计算第一波形数据分块与第二波形数据分块的相关度时,可以对每对数据分块都计算相关度,也可以仅对一部分数据分块对计算相关度。作为仅对一部分数据分块计算相关度的情形,例如,可以获取标准音频的内容,进而获取标准音频的各第二数据分块的内容,根据其内容判断是否计算与其相关度。具体而言,假设标准音频的内容是“|好|的|,|小|U|已|为|您|打|开|空|调|”,那么,可以选择不计算“|小|U|”对应的第二数据分块的相关度。
在本实施方式的测试方法中,可选地,所述计算所述第一波形数据分块与所述第二波形数据分块的相关度包括:分别计算所述多个第一波形数据分块与所述多个第二波形数据分块的多个相关度;所述根据所述相关度生成第一测试结果包括:将所述多个相关度与多个预设阈值分别进行比较,根据所述比较的结果生成第一测试结果。
在上述测试方法中，关于“根据相关度生成第一测试结果，第一测试结果指示输出音频与语音指令相匹配或不匹配”，例如，在存在多个相关度的情况下，可以是在全部的相关度都大于预设阈值时，生成指示输出音频与语音指令相匹配的第一测试结果，也可以是在一部分相关度大于预设阈值时，生成指示输出音频与语音指令相匹配的第一测试结果，例如当大于预设阈值的相关度的数量占全部相关度的数量一定比例以上时，生成指示输出音频与语音指令相匹配的第一测试结果。
关于“输出音频与语音指令是否相匹配”的含义，以语音指令的内容是“请打开空调”为例进行说明，当输出音频的内容是“好的，小U已为您打开空调”、“好的，请问空调需要设定为多少度？”或者“电池电量不足，不能打开空调”等与空调相关的内容（即与语音指令的内容相关的内容）时，会得到指示输出音频与语音指令相匹配的第一测试结果；而当输出音频是“今天的天气是多云”、“好的，已为您打开雨刮器”等与空调无关的内容时，会得到指示输出音频与语音指令不匹配的第一测试结果。
在本实施方式的测试方法中,可选地,所述多个预设阈值的大小被设定为不同。
在实际中,输出音频由于采集或收录条件的不同,可能包含各种噪音和杂音,输出音频的质量会影响音频识别结果。通过分别调整对应于各个分块对的预设阈值,能够提高本实施方式的测试方法对于各种输出音频质量的适应性,从而提高识别结果的可靠性。另外,通过自由调整预设阈值,能够提高本实施方式的测试方法的适用范围,使上述测试方法能够适应多种场景,保证测试结果的可靠性。
在本实施方式的测试方法中,可选地,还包括:获取所述标准音频的内容;根据所述标准音频的内容获取所述多个第二波形数据分块的内容;根据所述第二波形数据分块的内容,设定所述预设阈值。此时,设定的多个预设阈值通常是不同的,但也可能是相同的。
例如,假设标准音频的内容是“|好|的|,|小|U|已|为|您|打|开|空|调|”,此时,可以将“|小|U|”的波形数据分块对应的预设阈值设定得较低,“|打|开|空|调|”的波形数据分块对应的预设阈值设定得较高,如此,能够提高测试结果的可靠性。
在本实施方式的测试方法中,可选地,还包括:获取所述标准音频的内容;在所述第一测试结果指示所述输出音频与所述语音指令相匹配时,根据所述标准音频的内容生成所述输出音频的内容。例如,可以直接将标准音频的内容作为输出音频的内容。此外,也可以使输出音频的内容与标准音频的内容稍稍不同。
如此,语音交互系统的测试方法能够实现对响应于语音指令而产生的输出音频的内容的识别,进而能够实现输出音频内容与语音指令之间匹配性的判别,从而提高测试深度。
在本实施方式的测试方法中,可选地,所述语音交互系统包括显示器;所述测试方法还包括:获取所述显示器的第一输出图像;获取第一标准图像;基于所述第一输出图像和所述第一标准图像生成第二测试结果,所述第二测试结果指示所述第一输出图像和所述语音指令相匹配或不匹配。
通过实现对响应于语音指令而产生的输出图像的识别,以及实现输出图像与语音指令之间匹配性的判别,上述测试方法能够进一步提高测试深度,提高测试结果的可靠性。
在本实施方式的测试方法中,可选地,所述语音交互系统包括安卓调试桥接口; 所述第一输出图像是通过所述安卓调试桥接口获取的。
通过语音交互系统的安卓调试桥接口采集输出图像,上述测试方法能够快速准确地获取输出图像。与通过摄像头采集输出图像的传统方式相比,这种方式能够减小或避免采集过程中图像变形等负面因素,从而提高了后续图像识别的可靠性,进而保证测试结果的可靠性。
在本实施方式的测试方法中，可选地，所述测试方法还包括：在发送所述语音指令之后，向所述语音交互系统发送人机界面操作指令；获取所述语音交互系统的显示器的第二输出图像；获取第二标准图像；基于所述第二输出图像和所述第二标准图像生成第三测试结果，所述第三测试结果指示所述第二输出图像与所述人机界面操作指令相匹配或不匹配。
在本实施方式的测试方法中,可选地,所述人机交互系统包括安卓调试桥接口;所述第二输出图像是通过所述安卓调试桥接口获取的。
如此,通过安卓调试桥接口提供人机界面操作指令,上述测试方法能够提供与语音指令相关的操作指令,该操作指令直接作用于车机,尤其是可以直接作用于车机的人机交互界面,从而能够提供与车机的实际使用场景更加贴合的测试环境。由此,上述测试方法提高了测试深度,从而提高了测试可靠性。
在本实施方式的测试方法中,可选地,还包括:获取所述语音交互系统收发的第一报文;获取第一标准报文;根据所述第一报文和所述第一标准报文生成第四测试结果,所述第四测试结果指示所述第一报文和所述语音指令相匹配或不匹配。
通过实现对响应于语音指令而产生的报文的识别,上述测试方法能够进一步提高测试深度,提高测试可靠性。
另外,与上面的测试方法相对应,本申请一个实施方式提供了一种语音交互系统的测试设备,包括:语音指令生成装置,用于向所述语音交互系统发送语音指令;音频采集装置,用于获取所述语音交互系统的扬声器的输出音频的第一波形数据;第一获取装置,用于获取标准音频的第二波形数据;第一划分模块,用于将所述第一波形数据分为多个第一波形数据分块;第二划分模块,用于将所述第二波形数据分为多个第二波形数据分块;计算模块,用于计算所述第一波形数据分块与所述第二波形数据分块的相关度;音频判定装置,用于根据所述相关度生成第一测试结果,所述第一测试结果指示所述输出音频与所述语音指令相匹配或不匹配。
采用如上的测试方法,通过对输出音频和标准音频划分成多个分块数据,并计算输出音频的分块数据和标准音频的分块数据间的相关度,能够以较低开发成本和较高识别效率,实现对输出音频内容的识别,由此本申请提供了一种能够迅速进行测试的自动化测试方法。在具体一点说,上述方法中不必须对输出音频的内容本身进行直接识别,对内容本身进行识别通常需要较高的开发成本(例如需要进行大量语音训练等),并且识别速度较低。上述方法是通过将输出音频和标准音频进行比较,基于该比较结果来间接地对输出音频的内容进行识别,从而能够迅速地得到测试结果。另外,上述方法尤其适合应用于输出音频比较固定的场景,例如应用于车机的指令/控制性语音的交互,电话自动回复系统的交互等。
另外,采用如上的测试方法,使用根据波形数据分块相关度对采集的输出音频的 波形数据和标准音频的波形数据进行比对,即对采集的输出音频的波形数据和标准音频进行分块,然后对输出音频的波形数据分块进行和标准音频的数据分块的相关性计算,满足设定条件后,可认为采集的输出音频和标准音频一致。由于实际采集的波形数据在实际环境下容易受到干扰,因此采用分块处理手段,能够在不同音频输出环境下,设定不同的判别条件,从而增加自动化测试结果的可靠性。
综上,通过如上的技术手段,本实施方式的语音交互系统的测试方法能够实现迅速、可靠的自动化测试。
另外,作为上述相关度的计算方法的例子,可以分别对每个波形数据分块进行频域相关性计算,获得多个频域相关度。这里的频域相关度是本申请中的相关度的一例。此处所谓的相关度表示的是相似程度,不言而喻,除了频域相关度外,还可以采用其他方式的相关度。
在本实施方式的测试设备中,可选地,所述计算模块执行的所述计算所述第一波形数据分块与所述第二波形数据分块的相关度包括:分别计算所述多个第一波形数据分块与所述多个第二波形数据分块的多个相关度;所述音频判定装置执行的所述根据所述相关度生成第一测试结果包括:将所述多个相关度与多个预设阈值分别进行比较,根据所述比较的结果生成第一测试结果。
在本实施方式的测试设备中,可选地,还包括阈值调整模块,用于将所述多个预设阈值的大小设定为不同。
在实际中,输出音频由于采集或收录条件的不同,可能包含各种噪音和杂音,输出音频的质量会影响音频识别结果。通过分别调整对应于各个分块对的预设阈值,能够提高本实施方式的测试设备对于各种输出音频质量的适应性,从而提高识别结果的可靠性。另外,通过自由调整预设阈值,能够提高本实施方式的测试设备的适用范围,使上述测试方法能够适应多种场景,保证测试结果的可靠性。
在本实施方式的测试设备中,可选地,所述第一获取装置还用于:获取所述标准音频的内容;根据所述标准音频的内容获取所述多个第二波形数据分块的内容;所述阈值调整模块还用于根据所述第二波形数据分块的内容,设定所述预设阈值。
例如,假设标准音频的内容是“|好|的|,|小|U|已|为|您|打|开|空|调|”,此时,可以将“|小|U|”的波形数据分块对应的预设阈值设定得较低,“|打|开|空|调|”的波形数据分块对应的预设阈值设定得较高,如此,能够提高测试结果的可靠性。
在本实施方式的测试设备中，可选地，所述第一获取装置还用于获取所述标准音频的内容；所述测试设备还包括音频识别模块，所述音频识别模块用于在所述第一测试结果指示所述输出音频与所述语音指令相匹配时，根据所述标准音频的内容生成所述输出音频的内容。
如此,语音交互系统的测试方法能够实现对响应于语音指令而产生的输出音频的内容的识别,进而能够实现输出音频内容与语音指令之间的匹配性的判别,从而提高测试深度。
在本实施方式的测试设备中,可选地,所述语音交互系统包括显示器;所述测试设备还包括:图像采集装置,用于获取所述显示器的第一输出图像;第二获取装置,用于获取第一标准图像;图像判定装置,用于根据所述第一输出图像和所述第一标准图像生成第二测试结果,所述第二测试结果指示所述第一输出图像和所述语音指令相 匹配或不匹配。
通过实现对响应于语音指令而产生的输出图像的识别,以及实现输出图像与语音指令之间是否匹配的判别,本实施方式的测试设备能够进一步提高测试深度,提高测试结果的可靠性。
在本实施方式的测试设备中,可选地,所述语音交互系统包括安卓调试桥接口;所述图像采集装置用于通过所述安卓调试桥接口获取所述第一输出图像。
通过语音交互系统的安卓调试桥接口采集输出图像,本实施方式的测试设备能够快速准确地获取输出图像。与通过摄像头采集输出图像的传统方式相比,这种方式能够减小或避免采集过程中图像变形等负面因素,从而提高了后续图像识别的可靠性,进而保证测试结果的可靠性。
在本实施方式的测试设备中，可选地，所述语音交互系统包括显示器；所述测试设备还包括：人机界面操作指令生成装置，用于在发送所述语音指令之后，向所述语音交互系统发送人机界面操作指令；图像采集装置，用于获取所述显示器的第二输出图像；第二获取装置，用于获取第二标准图像；图像判定装置，用于根据所述第二输出图像和所述第二标准图像生成第三测试结果，所述第三测试结果指示所述第二输出图像与所述人机界面操作指令相匹配或不匹配。
在本实施方式的测试设备中,可选地,所述人机交互系统包括安卓调试桥接口;所述图像采集装置用于通过所述安卓调试桥接口获取所述第二输出图像。
如此,通过安卓调试桥接口提供人机界面操作指令,本实施方式的测试设备能够提供与语音指令相关的操作指令,该操作指令直接作用于车机,尤其是可以直接作用于车机的人机交互界面,从而能够提供与车机的实际使用场景更加贴合的测试环境。由此,本实施方式的测试方法提高了测试深度,从而提高了测试可靠性。
在本实施方式的测试设备中,可选地,还包括:报文采集装置,用于获取所述语音交互系统收发的第一报文;第三获取装置,用于获取第一标准报文;报文判定装置,用于根据所述第一报文和所述第一标准报文生成第四测试结果,所述第四测试结果指示所述第一报文和所述语音指令相匹配或不匹配。
此外,本申请一个实施方式提供一种音频识别方法。在该音频识别方法中,包括:获取待识别音频的第一波形数据;获取标准音频的第二波形数据;将所述第一波形数据分为多个第一波形数据分块;将所述第二波形数据分为多个第二波形数据分块;计算所述第一波形数据分块与所述第二波形数据分块的相关度;根据所述相关度生成第一识别结果,所述第一识别结果指示所述待识别音频与所述标准音频相同或不同。
采用如上的音频识别方法,对待识别音频的波形数据和标准音频进行分块,然后对待识别音频的波形数据分块进行和标准音频的数据分块的相关度计算,满足设定条件后,可认为待识别音频和标准音频一致。由于实际采集的待识别音频的波形数据在实际环境下容易受到干扰,因此采用分块处理手段,能够在不同环境下,设定不同的判别条件,从而增加自动化判断结果的可靠性。
本申请实施方式提供的这种基于分块相关性的音频识别方法和装置能够应用于多种场景。例如,上述音频识别方法既可以应用于车机以识别人的语音指令,也可以应用于车机测试设备以识别车机响应于测试用指令产生的输出音频。此外,上述音频 识别方法不仅能够应用于车机以及用于测试车机的语音交互测试设备,还可以应用于具有音频识别功能的装置、系统以及可以测试这些装置、系统的语音交互测试设备。具有音频识别功能的装置、系统例如为机器人系统、电话自动回复系统、自动客服系统等。
类似地,本申请实施方式提供的语音交互系统的测试方法不仅可以应用于以车机作为对象的测试中,还可以类似地应用于以机器人系统、电话自动回复系统、自动客服系统等作为测试对象的测试中。
另外,作为上述相关度的计算方法的例子,可以分别对每个波形数据分块进行频域相关性计算,获得多个频域相关度。这里的频域相关度是本申请中的相关度的一例。此处所谓的相关度表示的是相似程度,不言而喻,除了频域相关度外,还可以采用其他方式的相关度。
另外,在对待识别音频的波形数据和标准音频的波形数据进行划分时,可以得到相同数量的第一波形数据分块和第二波形数据分块,也可以得到不同数量的第一波形数据分块和第二波形数据分块。此时,可以对较多的那一方中的一部分波形数据分块不做处理。
另外,在本实施方式中,在计算第一波形数据分块与第二波形数据分块的相关度时,可以对每对数据分块都计算相关度,也可以仅对一部分数据分块对计算相关度。作为仅对一部分数据分块计算相关度的情形,例如,可以获取标准音频的内容,进而获取标准音频的各第二数据分块的内容,根据其内容判断是否计算与其相关度。具体而言,假设标准音频的内容是“|好|的|,|小|U|已|为|您|打|开|空|调|”,那么,可以选择不计算“|小|U|”对应的第二数据分块的相关度。
在本实施方式的音频识别方法中,可选地,所述计算所述第一波形数据分块与所述第二波形数据分块的相关度包括:分别计算所述多个第一波形数据分块与所述多个第二波形数据分块的多个相关度;所述根据所述相关度生成第一识别结果包括:将所述多个相关度与多个预设阈值分别进行比较,根据所述比较的结果生成第一识别结果。
在本实施方式的音频识别方法中,可选地,所述多个预设阈值的大小被设定为不同。
在实际中，采集的待识别音频由于采集或收录条件的不同，可能包含各种噪音和杂音，待识别音频的质量会影响音频识别结果。通过分别调整对应于各个分块对的预设阈值，能够提高本实施方式的音频识别方法对于各种待识别音频质量的适应性，从而提高识别结果的可靠性。另外，通过自由调整预设阈值，能够提高本实施方式的方法的适用范围，使该方法能够适应多种场景，保证测试结果的可靠性。
在本实施方式的音频识别方法中,可选地,还包括:获取所述标准音频的内容;根据所述标准音频的内容获取所述多个第二波形数据分块的内容;根据所述第二波形数据分块的内容,设定所述预设阈值。
如此，能够提高识别结果的可靠性，或者说提高识别结果的精度。例如，假设标准音频的内容是“|好|的|,|小|U|已|为|您|打|开|空|调|”，此时，可以将“|小|U|”的波形数据分块对应的预设阈值设定得较低，“|打|开|空|调|”的波形数据分块对应的预设阈值设定得较高，如此，能够提高识别结果的可靠性。
在本实施方式的音频识别方法中,可选地,还包括获取所述标准音频的内容;在所述第一识别结果指示所述待识别音频与所述标准音频相同时,根据所述标准音频的内容生成所述待识别音频的内容。
与上述音频识别方法相对应,本申请一个实施方式提供一种音频识别装置,包括:音频采集模块,用于获取待识别音频的第一波形数据;第一获取模块,用于获取标准音频的第二波形数据;第一划分模块,用于将所述第一波形数据分为多个第一波形数据分块;第二划分模块,用于将所述第二波形数据分为多个第二波形数据分块;计算模块,用于计算所述第一波形数据分块与所述第二波形数据分块的相关度;识别模块,用于根据所述相关度生成第一识别结果,所述第一识别结果指示所述待识别音频与所述标准音频相同或不同。
采用如上的音频识别装置,对待识别音频的波形数据和标准音频进行分块,然后对待识别音频的波形数据分块进行和标准音频的数据分块的相关度计算,满足设定条件后,可认为待识别音频和标准音频一致。由于实际采集的待识别音频的波形数据在实际环境下容易受到干扰,因此采用分块处理手段,能够在不同环境下,设定不同的判别条件,从而增加自动化判断结果的可靠性。
在本实施方式的音频识别装置中,可选地,所述计算模块执行的所述计算所述第一波形数据分块与所述第二波形数据分块的相关度包括:分别计算所述多个第一波形数据分块与所述多个第二波形数据分块的多个相关度;所述识别模块执行的所述根据所述相关度生成第一识别结果包括:将所述多个相关度与多个预设阈值分别进行比较,根据所述比较的结果生成第一识别结果。
在本实施方式的音频识别装置中,可选地,还包括阈值调整模块,用于将所述多个预设阈值的大小设定为不同。
在本实施方式的音频识别装置中,可选地,所述第一获取模块还用于:获取所述标准音频的内容;根据所述标准音频的内容获取所述多个第二波形数据分块的内容;所述阈值调整模块还用于根据所述第二波形数据分块的内容,设定所述预设阈值。
在本实施方式的音频识别装置中，可选地，所述第一获取模块还用于获取所述标准音频的内容；所述识别模块还用于在所述第一识别结果指示所述待识别音频与所述标准音频相同时，根据所述标准音频的内容生成所述待识别音频的内容。
以下结合附图1-12对本申请的一个实施方式进行详细描述。
图1是本申请一个实施方式中涉及的车机的结构框图。该车机2000具备语音交互功能,如图1所示,其具有控制单元2001、麦克风2002、扬声器2003、显示器2004与安卓调试桥(Android Debug Bridge,ADB)接口2005等。控制单元2001可以是电子控制单元(electronic control unit,ECU),电子控制单元是指由集成电路组成的用于实现对数据的分析处理发送等一系列功能的控制装置,上述语音交互功能的所需的运算处理由该控制单元2001执行。麦克风2002用于接收语音指令。扬声器2003例如用于向乘员发出提示音,例如,在乘员发出“打开空调”的语音指令时,在控制单元2001的控制下,扬声器2003发出“已为您打开空调”的提示音。另外,扬声器2003还可以用于播放音乐等。显示器2004例如具有触控屏,用于显示人机交互界面,此 外显示器2004例如还可以显示导航画面等。安卓调试桥接口2005用于供后述的测试设备对显示器2004显示的人机交互界面提供输入操作,还用于供测试设备从车机2000获取显示器2004显示的输出图像。
图2示意性地示出了本申请一个实施方式的车机语音交互测试系统1000的结构示意图。如图2所示,本实施方式的车机语音交互测试系统1000包括相互连接的测试管理设备1100和测试设备1200。
测试管理设备1100用于通过向测试设备1200发送测试用例来管理测试设备1200执行的测试。测试管理设备1100可以是云服务器、网络服务器、应用服务器以及管理服务器等具有数据存储、管理功能的设备或服务器。测试管理设备1100可以通过交互接口来向测试设备1200发送测试用例，并且可以接收来自测试设备1200的例如为测试结果的反馈信息。另外，作为其他实施方式，也可以将测试管理设备1100的功能与测试设备1200的功能集成在一个设备中。
测试用例可以包括语音指令用例、人机界面(Human Machine Interaction,简称HMI)操作指令用例、标准音频、标准图像和标准报文。这些内容可以全部由测试管理设备1100发送给测试设备1200,也可以是测试管理设备1100仅发送一部分内容给测试设备1200。例如,测试设备1200的本地数据库包括标准音频数据库、标准图像数据库和标准报文数据库,在执行测试时,测试管理设备1100将语音指令用例、人机界面操作指令用例发送给测试设备1200,测试设备1200从本地数据库中调取标准音频、标准图像和标准报文。
测试用例中还可以附带与各测试用指令相关联的检索索引。该检索索引供测试设备1200从全部的标准音频、标准图像和标准报文中检索与指令用例(语音指令用例和人机界面操作指令用例)相关联的标准音频、标准图像和标准报文。例如,在指令用例与空调相关时,检索出与空调功能相关的标准音频、标准图像和标准报文,用这些标准音频、标准图像和标准报文同采集到的车机中的音频、图像和报文进行比较,来判断测试是否通过。如此,与使用全部的标准音频、标准图像和标准报文进行测试相比,能够削减运算量,提高测试速度。检索到的与指令用例相关联的标准音频的数量可能是多个,也可能是一个。
测试管理设备1100向测试设备1200发送测试用例的方式可以有多种。作为一种可实现方式，可以通过传输控制协议/网际协议（Transmission Control Protocol/Internet Protocol，TCP/IP）发送测试用例。
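作为一个便于理解的示意（并非本申请限定的实现方式），下面用Python给出通过TCP套接字下发测试用例的极简草图，其中的地址、端口与JSON字段名均为假设：

```python
import json
import socket

def send_test_case(host: str, port: int, test_case: dict) -> None:
    """以JSON形式通过TCP连接向测试设备发送一条测试用例（字段与协议均为示意）。"""
    payload = json.dumps(test_case, ensure_ascii=False).encode("utf-8")
    with socket.create_connection((host, port)) as conn:
        # 先发送4字节长度头，再发送正文，便于接收端界定一条用例的边界
        conn.sendall(len(payload).to_bytes(4, "big") + payload)

if __name__ == "__main__":
    # 用法示例：地址、端口与字段名均为假设
    send_test_case("192.168.1.10", 9000, {
        "voice_command": "请帮我打开空调",
        "retrieval_index": "air_conditioner",
    })
```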
测试设备1200用于基于测试用例中包含的指令用例(语音指令和人机界面操作指令),对车机2000进行测试。这里的车机2000是本申请中的语音交互系统的一例,也可以称之为车载语音交互系统。
图3示意性地示出了图2的测试设备的结构示意图。下面将参照图3详细地描述本申请实施方式的测试设备。
测试设备1200可以包括语音指令生成装置1201、音频采集装置1202、第一获取装置1203、音频识别装置1204和音频判定装置1205,这些装置用于判别车机响应于测试用例中包含的语音指令的输出音频的内容与语音指令的意图的一致性。
具体地，语音指令生成装置1201用于基于测试管理设备1100发送的测试用例提供语音指令。语音指令生成装置1201例如可以包括用于提供语音指令的扬声器。
音频采集装置1202用于采集车机2000的扬声器响应于语音指令产生的输出音频,得到输出音频的波形数据。这里的波形数据是表示音频的强度随时间的变化的数据。音频采集装置1202例如可以是麦克风、录音机等。另外,音频采集装置1202也可以采集扬声器的原始波形信号。另外,这里的音频采集装置1202还对应于本申请中的音频采集模块。
第一获取装置1203用于获取与语音指令相关联的标准音频,得到标准音频的内容和波形数据。如上所述,标准音频可以是从测试用例中获取的,也可以是依据测试用例中包含的检索索引从测试设备1200的音频数据库中获取的。另外,这里的第一获取装置1203还对应于本申请中的第一获取模块。
在数据库中存储标准音频的可实现方式有多种。例如，在一种可实现方式中，可以使用同一检索索引关联存储与同一语音指令相关联的所有标准音频的波形数据。例如，在另一种可实现方式中，可以以文本的形式在数据库中存储标准音频，当其被调用时，再根据各标准音频的文本来产生标准音频的波形数据。
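下面给出一种以检索索引为键组织标准音频数据库的示意性Python数据结构（其中的索引名、文件路径和文本内容均为假设，仅用于说明“同一检索索引关联多条标准音频”的组织方式）：

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class StandardAudio:
    text: str       # 标准音频的文本内容
    wav_path: str   # 标准音频波形文件的路径（假设已预先录制或合成）

# 同一检索索引下关联存储与同一语音指令相关的全部标准音频（数据为示意）
STANDARD_AUDIO_DB: Dict[str, List[StandardAudio]] = {
    "air_conditioner": [
        StandardAudio("好的，小U已为您打开空调", "std/ac_open_1.wav"),
        StandardAudio("好的，空调已为您打开", "std/ac_open_2.wav"),
        StandardAudio("已开启空调", "std/ac_open_3.wav"),
    ],
}

def lookup_standard_audio(retrieval_index: str) -> List[StandardAudio]:
    """根据测试用例附带的检索索引取出相关联的标准音频列表。"""
    return STANDARD_AUDIO_DB.get(retrieval_index, [])
```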
音频识别装置1204同音频采集装置1202和第一获取装置1203连接,用于获得输出音频的识别结果。本申请实施方式中的音频识别装置1204以及其执行的音频识别方法将在以下详述。
音频判定装置1205与语音指令生成装置1201连接，用于基于输出音频的识别结果，判定输出音频和语音指令的意图是否一致，或者说判定输出音频和语音指令是否相匹配，获得音频判定结果（“相匹配”或“不匹配”），该音频判定结果对应本申请中的第一测试结果。在图3中，为了简单清晰起见，语音指令生成装置1201和音频判定装置1205之间的连接并未用实线连接线表示，而是分别用从这两个装置伸出并以圆点结束的两个线段表示。
另外,在上面的描述中,第一获取装置1203、音频识别装置1204和音频判定装置1205在概念上为三个独立的装置,然而,第一获取装置1203和音频判定装置1205的功能可以集成在音频识别装置1204中,这里的音频识别装置1204对应于本申请中的音频识别装置。
另外,第一获取装置1203、音频识别装置1204和音频判定装置1205可以由电子控制单元实现。如图4所示,本申请实施方式提供了一种电子控制单元ECU,该ECU包括微型计算机(microcomputer)、输入电路、输出电路和模/数(analog-to-digital,A/D)转换器。
输入电路的主要功能是对输入信号(例如来自传感器的信号)进行预处理,输入信号不同,处理方法也不同。具体地,因为输入信号有两类:模拟信号和数字信号,所以输入电路可以包括处理模拟信号的输入电路和处理数字信号的输入电路。
A/D转换器的主要功能是将模拟信号转变为数字信号,模拟信号经过相应输入电路预处理后输入A/D转换器进行处理转换为微型计算机接受的数字信号。
输出电路是微型计算机与执行器之间建立联系的一个装置。它的功能是将微型计算机发出的处理结果转变成控制信号,以驱动执行器工作。输出电路一般采用的是功率晶体管,根据微型计算机的指令通过导通或截止来控制执行元件的电子回路。
微型计算机包括中央处理器(central processing unit,CPU)、存储器和输入/输出(input/output,I/O)接口,CPU通过总线与存储器、I/O接口相连,彼此之间可以通过总线进行信息交换。存储器可以是只读存储器(read-only memory,ROM)或随机存取存储器(random access memory,RAM)等存储器。I/O接口是中央处理单元(central processor unit,CPU)与输入电路、输出电路或A/D转换器之间交换信息的连接电路,具体的,I/O接口可以分为总线接口和通信接口。存储器存储有程序,CPU调用存储器中的程序可以执行图5、图8、图12对应实施例描述的测试方法与音频识别方法。
测试设备还可以包括人机界面操作指令生成装置1214、图像采集装置1206、第二获取装置1207、图像识别装置1208和图像判定装置1209,这些装置结合语音指令生成装置1201一起用于判别车机2000,尤其是车机2000的显示器响应于语音指令和人机界面操作指令的输出图像。这里,显示器响应语音指令输出的输出图像对应于本申请中的第一输出图像,显示器响应人机界面操作指令输出的输出图像对应于本申请中的第二输出图像。
人机界面操作指令生成装置1214用于向车机发送人机界面操作指令。人机界面操作指令是模拟人手对人机交互界面进行的操作指令,且与语音指令相关。例如,当语音指令是语音指令“打开空调”时,人机界面操作指令可以是点击人机交互界面响应语音指令显示的与空调操作相关的按钮的操作指令。人机界面操作指令生成装置1214可以是经由安卓调试桥接口对车机人机交互界面提供输入操作的控制器。
图像采集装置1206用于采集车机2000响应于语音指令和人机界面操作指令产生的输出图像。在本实施方式中，图像采集装置1206具有ADB接口，可以通过ADB接口与车机底层ADB接口连接，以直接从车机获取人机交互界面的图像。作为其他实施方式，图像采集装置1206也可以是摄像头。
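作为示意，下面的Python片段演示如何借助ADB命令向人机交互界面注入一次点击并直接抓取车机界面截图（假设adb已连接车机；坐标与保存路径均为假设）：

```python
import subprocess

def adb_tap(x: int, y: int) -> None:
    """通过ADB向人机交互界面注入一次点击，模拟人机界面操作指令。"""
    subprocess.run(["adb", "shell", "input", "tap", str(x), str(y)], check=True)

def adb_screenshot(save_path: str) -> None:
    """通过ADB直接抓取当前界面截图，避免摄像头采集引入的图像变形等负面因素。"""
    with open(save_path, "wb") as f:
        subprocess.run(["adb", "exec-out", "screencap", "-p"], stdout=f, check=True)

# 用法示例（坐标与路径均为假设）：
# adb_tap(600, 400)
# adb_screenshot("hmi_after_tap.png")
```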
第二获取装置1207用于获取与语音指令和人机界面操作指令相关联的标准图像。如上所述,标准图像可以是从测试用例中获取的,也可以是依据测试用例中包含的检索索引从本地数据库中获取的。这里,与语音指令相关联的标准图像对应于本申请中的第一标准图像,与人机界面操作指令相关联的标准图像对应于本申请中的第二标准图像。
图像识别装置1208与图像采集装置1206和第二获取装置1207连接,用于基于输出图像和标准图像,获得输出图像的识别结果。本申请实施方式中的图像识别装置1208以及其涉及的图像识别方法将在以下详述。
图像判定装置1209与语音指令生成装置1201、人机界面操作指令生成装置1214和图像识别装置1208连接，用于基于输出图像的识别结果，判定输出图像同语音指令的意图或人机界面操作指令的意图是否一致，或者说判定输出图像同语音指令或人机界面操作指令是否相匹配，获得图像判定结果（“相匹配”或“不匹配”），指示输出图像与语音指令相匹配或不匹配的图像判定结果对应本申请中的第二测试结果，指示输出图像与人机界面操作指令相匹配或不匹配的图像判定结果对应本申请中的第三测试结果。在图3中，为了简单清晰起见，语音指令生成装置1201和图像判定装置1209之间的连接并未用实线连接线表示，而是分别用从这两个装置伸出并以圆点结束的两个线段表示；并且人机界面操作指令生成装置1214和图像判定装置1209之间的连接并未用实线连接线表示，而是分别用从这两个装置伸出并以方点结束的两个线段表示。
本申请测试设备还可以包括报文采集装置1210、第三获取装置1211、报文识别装置1212和报文判定装置1213，这些装置结合语音指令生成装置1201一起用于判别车机2000响应于语音指令的上下行报文。
报文采集装置1210用于采集车机2000接收到语音指令或人机界面操作指令后收发的上下行报文。该上下行报文包括上行报文和下行报文。上行报文是车机2000响应于语音指令产生的,下行报文是车辆执行器(未示出)响应于车机2000输出的上行报文产生的。报文采集装置1210的实例有多种,本申请不对此进行限制。这里,与语音指令相关联的报文对应于本申请中的第一报文,与人机界面操作指令相关联的报文对应于本申请中的第二报文。
第三获取装置1211用于获取与语音指令、人机界面操作指令相关联的标准报文。如上所述,标准报文可以是从测试用例中获取的,也可以是依据测试用例中包含的检索索引从本地数据库中获取的。这里,与语音指令相关联的标准报文对应于本申请中的第一标准报文,与人机界面操作指令相关联的标准报文对应于本申请中的第二标准报文。
报文识别装置1212与报文采集装置1210和第三获取装置1211连接,用于基于上下行报文和标准报文,获得上下行报文的识别结果。报文识别装置1212将上下行报文和标准报文进行比对,然后将这样的比对结果(例如相关性)与预设识别条件(例如预设阈值)进行比较,如果报文的比对结果满足预设识别条件,报文识别装置1212输出肯定性报文识别结果,如果报文的比对结果不满足预设识别条件,报文识别装置1212输出否定性报文识别结果。
报文判定装置1213与语音指令生成装置1201和报文识别装置1212连接，用于基于报文识别结果，判定上下行报文同语音指令或人机界面操作指令的意图是否一致，或者说判定上下行报文和语音指令是否相匹配，获得报文判定结果（“相匹配”或“不匹配”），指示上下行报文同语音指令相匹配或不匹配的报文判定结果对应本申请中的第四测试结果，指示上下行报文与人机界面操作指令相匹配或不匹配的报文判定结果对应本申请中的第五测试结果。在图3中，为了简单清晰起见，语音指令生成装置1201和报文判定装置1213之间的连接并未用实线连接线表示，而是分别用从这两个装置伸出并以圆点结束的两个线段表示。
测试设备1200还可以包括测试汇总装置1215。测试汇总装置1215可以分别与音频判定装置1205、图像判定装置1209和报文判定装置1213连接,用于根据来自于音频判定装置1205、图像判定装置1209和报文判定装置1213的各判定结果,形成汇总测试结果(测试通过或者不通过),并将汇总测试结果例如发送给测试管理设备1100或其他设备(未示出)。例如,在输入一条语音指令后,音频判定装置1205、图像判定装置1209和报文判定装置1213的判定结果都是“一致”时,测试设备1200确定针对此条语音的汇总测试结果为“通过”。测试管理设备1100在全部的语音指令的汇总测试结果为通过时或者汇总测试结果为“通过”的结果占比超过阈值时,确定对整个车机的测试结果为“通过”。另外,测试管理设备1100的该功能也可由测试设备 1200实现,例如将该功能集成在测试汇总装置1215中。
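上述汇总逻辑可以用如下示意性的Python草图概括（“全部一致才通过”的规则以及占比阈值均为可调的假设参数，并非本申请限定的实现）：

```python
def summarize_one_command(audio_ok: bool, image_ok: bool, message_ok: bool) -> str:
    """音频、图像、报文三类判定结果全部为“一致”时，该条语音指令的汇总结果才为“通过”。"""
    return "通过" if (audio_ok and image_ok and message_ok) else "不通过"

def summarize_vehicle(per_command_results: list, pass_ratio: float = 1.0) -> str:
    """当“通过”的占比不低于设定阈值时，整个车机的测试结果记为“通过”（阈值为假设参数）。"""
    if not per_command_results:
        return "不通过"
    passed = sum(1 for r in per_command_results if r == "通过")
    return "通过" if passed / len(per_command_results) >= pass_ratio else "不通过"
```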
测试设备1200通过包括上述各个装置,能够分别对车机2000响应于各测试用指令的各反馈信号(例如输出语音、输出图像和上下行报文)进行检测,实现深度较高的“全链路”检测。
图5示意性地示出了本申请实施方式的语音交互系统的测试方法的流程示意图，其包括步骤S101-S115。
在步骤S101中,语音指令生成装置1201基于测试设备1200接收的测试用例,向车机2000提供语音指令。
在步骤S102中,音频采集装置1202采集车机2000响应于语音指令产生的输出音频。
在步骤S103中,第一获取装置1203获取与语音指令相关联的标准音频。
在步骤S104中,音频识别装置1204基于输出音频和标准音频,进行音频识别,获得输出音频的识别结果。此步骤中采用的具体的音频识别方法将在后面进行更详细的描述。
在步骤S105中,音频判定装置1205基于输出音频的识别结果,判定输出音频和语音指令的意图的一致性,获得音频判定结果。
在步骤S106中，人机界面操作指令生成装置1214基于测试设备1200接收的测试用例，经由ADB接口，向车机2000提供与语音指令相关的人机界面操作指令。
在步骤S107中,图像采集装置1206采集车机2000的显示器响应于语音指令和人机界面操作指令产生的输出图像。这里,车机2000的显示器响应于语音指令产生的输出图像对应于本申请中的第一输出图像,车机2000的显示器响应于人机界面操作指令产生的输出图像对应于本申请中的第二输出图像。
在步骤S108中，第二获取装置1207获取与语音指令、人机界面操作指令相关联的标准图像。这里，与语音指令相关联的标准图像对应于本申请中的第一标准图像，与人机界面操作指令相关联的标准图像对应于本申请中的第二标准图像。
在步骤S109中,图像识别装置1208基于输出图像和标准图像,进行图像识别,获得输出图像的识别结果。此步骤中采用的具体的图像识别方法将在后面进行更详细的描述。
在步骤S110中，图像判定装置1209基于输出图像的识别结果，判定输出图像同语音指令、人机界面操作指令的意图的一致性，获得图像判定结果。
在步骤S111中,报文采集装置1210采集车机2000响应于测试用指令产生的上下行报文。其中,上下行报文包括上行报文和下行报文。上行报文是车机2000响应于测试用指令产生的,下行报文是车辆执行器(未示出)响应于车机2000输出的上行报文产生的。采集报文的方式有多种,本申请不对此进行限制。
在步骤S112中,第三获取装置1211获取与测试用指令相关联的标准报文。
在步骤S113中,报文识别装置1212基于上下行报文和标准报文,获得上下行报文的识别结果。
在步骤S114中,报文判定装置1213基于上下行报文的识别结果,判定上下行报文和测试用指令的意图的一致性,获得报文判定结果。具体地,可以先对上下行报文 和标准报文进行比对,然后将这样的比对结果(例如相关性)与预设识别条件(例如预设阈值)进行比较。如果报文的比对结果满足预设识别条件,输出肯定性报文识别结果;如果报文的比对结果不满足预设识别条件,输出否定性报文识别结果。
在步骤S115中,测试汇总装置1215汇总各判定结果,形成并输出汇总测试结果。
可以理解的是,步骤S101至S115并不是以实际发生的顺序排列。在实际中,步骤S102至S105、步骤S106至S110和步骤S111至S114可以以系列为单位调换顺序进行或同时进行。此外,在一些其它测试方法实施方式中,上述一些步骤可以省略,例如可以省略步骤S106至S110和/或步骤S111至S114,或者省略步骤S115。此外,在一些其它测试方法实施方式中,可以省略步骤S106,在这种情况下,步骤S107至S110可被相应地调整为仅基于语音指令进行相关的操作。
类似地,在一些其它测试设备实施方式中,上述一些装置可以省略,例如可以省略执行步骤S106至S110的各装置和/或执行步骤S111至S114的各装置,或执行步骤S115的装置。在一些其它测试设备实施方式中,可以省略人机界面操作指令生成装置1214,在这种情况下,执行步骤S107至S110的各装置可以被相应地调整为仅基于语音指令进行相关的操作。
下面参照图6至图10进一步描述本申请实施方式的音频识别装置和音频识别方法。
图6示意性地示出了本申请实施方式的音频识别装置1204的结构示意图。音频识别装置1204包括音频接收模块401、信号处理模块402、划分模块403、计算模块404、音频识别模块405、轮询模块406和阈值调整模块407。
音频接收模块401用于分别接收输出音频的波形数据和标准音频的波形数据。
信号处理模块402与音频接收模块401连接,用于对获取到的输出音频的波形进行前处理,并对处理后的波形进行归一化。
划分模块403与信号处理模块402连接,用于以相同的时间步长(例如0.5秒),对输出音频的波形数据和标准音频的波形数据进行划分,获得N(N是自然数)对波形数据块。输出音频的波形数据对应于本申请中的第一波形数据,标准音频的波形数据对应于本申请中的第二波形数据。对输出音频的波形数据进行划分得到的N个波形数据块对应于本申请中的第一波形数据分块,对标准音频的波形数据进行划分得到的N个波形数据块对应于本申请中的第二波形数据分块。另外,输出音频还对应于本申请中的待识别音频。划分模块403对应于本申请中的第一划分模块与第二划分模块。
在本实施方式中，对输出音频的波形数据和标准音频的波形数据进行划分得到的波形数据块的数量相同，然而，作为其他实施方式，也可以不同，例如，省略上述信号处理模块402进行的归一化处理，对输出音频的波形数据和标准音频的波形数据按照相同预设时间步长进行划分，如果得到不同数量的波形数据分块，可以舍去一部分波形数据分块。另外，在本实施方式中，对输出音频的波形数据和标准音频的波形数据完全按照相同预设时间步长进行划分，然而，作为其他实施方式，可以适当调整某些波形数据分块的时间步长，例如延长输出音频的首尾处的数据分块的时间步长。
计算模块404与划分模块403连接，用于分别对每对波形数据分块进行频域相关性计算，获得N个频域相关度。这里的频域相关度是本申请中的相关度的一例。此处所谓的相关度表示的是相似程度，不言而喻，除了频域相关度外，还可以采用其他方式的相关度。
另外,计算模块404可以对每对数据分块都计算相关度,也可以仅对一部分数据分块对计算相关度。作为仅对一部分数据分块对计算相关度的情形,例如,可以获取标准音频的内容,进而获取标准音频的各数据分块的内容,根据其内容判断是否计算与其相关度。具体而言,假设标准音频的内容是“|好|的|,|小|U|已|为|您|打|开|空|调|”,那么,可以选择不计算“|小|U|”对应的第二数据分块的相关度。
音频识别模块405分别与计算模块404和划分模块403连接,用于基于N个频域相关性和与N对波形数据块对应的N个预设阈值,获得输出音频的识别结果。
图7示意性地示出了本申请实施方式的音频识别模块405的结构示意图。如图7所示，音频识别模块405包括第一音频识别模块4051、第二音频识别模块4052和第三音频识别模块4053。音频识别模块405对应本申请中的识别模块。
第一音频识别模块4051用于分别比较N个频域相关性和与N对波形数据块对应的N个预设阈值,获得N个比较结果。
第二音频识别模块4052与第一音频识别模块4051连接,用于基于预设的识别条件和N个比较结果,生成指示输出音频与标准音频相同或不同的识别结果,该识别结果对应本申请中的第一识别结果。
第三音频识别模块4053与第二音频识别模块4052连接,用于基于输出音频的识别结果,获取标准音频的内容,输出标准音频的内容作为输出音频的内容。作为其他实施方式,第三音频识别模块4053生成的输出音频的内容也可以与标准音频的内容稍稍不同,例如标准音频的内容是“小U已为您打开空调”,根据此标准音频的内容生成的输出音频的内容是“小Y已为您打开空调”。
轮询模块406分别与音频识别模块405和划分模块403连接,用于基于输出音频和标准音频的识别结果和其他未使用过的标准音频的存在,在存在未比较过的标准音频时,重复划分模块403、计算模块404和音频识别模块405所执行的处理,直至使输出音频和全部的标准音频都进行了比较。
阈值调整模块407分别与音频接收模块401和音频识别模块405连接,用于基于输出音频和标准音频,分别调整N个预设阈值,使N个预设阈值存在不同。例如,可以根据标准音频的波形数据分块的内容来调整与其对应的预设阈值,具体将在后面进行描述。此时,第一获取装置还可以根据标准音频的内容得到标准音频的波形数据分块的内容。
通过上述各个模块,本实施方式能够以较小的开发成本实现对输出音频的内容间接识别,而避免了开发成本高昂的音频内容的直接识别。此外,本实施方式由于基于数据分块进行处理,其可靠性、灵活性得到了提高。
图8示意性地示出了本申请实施方式的音频识别方法。该音频识别方法具体包括步骤S201-S209。
在步骤S201中,音频接收模块401分别接收输出音频的波形数据和标准音频的波形数据以及内容。其中,如图8所示,标准音频可以来自测试设备包括的音频数据库。
例如,当测试用指令是输入语音“请帮我打开空调”时,某被测车机响应于该测试用指令的输出音频可以是“好的,小X已为您打开空调了”,而预先存储于数据库中或包含在测试用例中的标准音频可以包括“好的,小U已为您打开空调”、“好的,空调已为您打开”以及“已开启空调”。
在步骤S202中,信号处理模块402对获取到的输出音频的波形进行前处理,并对处理后的波形进行归一化。
其中,前处理可以包括用于抑制噪声的滤波处理、用于去除杂音的相位纠偏处理等,以去除噪声、空白音频信号等对后续处理的负面影响。归一化处理可以减小后续处理的复杂度,降低后续处理的计算开销。
例如,在实际中,以上述车机反馈语音为例,输出音频可能不仅包括车机产生的“好的,小X已为您打开空调了”的反馈语音信号,还可能包括车辆行驶中产生的振动噪声和车厢内例如闲聊的环境噪声等。在这种情况下,可以对输出音频进行滤波处理,以抑制噪声。
此外,车机在收到测试用指令后需要一段时间才能产生反馈语音信号,由此,输出音频还可能包括在该段时间内收录的空白音频信号。在这种情况下,可以对输出音频进行相位纠偏处理。
图9示意性地示出了本申请实施方式的信号处理模块对输出音频的波形信号进行的前处理,图9上部为输出音频的波形,下部为标准音频的波形,图中横轴为时间,纵轴为幅值。由于输出音频与标准音频包括空白音频,因此通过将输出音频与标准音频的波形整体向左平移(去掉空白部分),从而使得输出音频和标准音频的初相位基本一致,来实现相位纠偏。此外,还可以以输出音频和标准音频之间的较短者为基准,对较长者进行截取(例如去掉较长者的比较短者长的后部,参照图10中竖直粗虚线后面的部分),使得处理后的输出音频和标准音频的时长基本一致。
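下面用Python/numpy给出一个与上述前处理思路对应的示意性实现（静音门限取值、归一化方式均为假设，不代表本申请限定的处理细节）：

```python
import numpy as np

def preprocess(wave: np.ndarray, silence_th: float = 0.02) -> np.ndarray:
    """示意性前处理：幅值归一化，并去掉开头的空白（近似静音）段以实现简单的相位纠偏。"""
    wave = wave.astype(np.float64)
    peak = np.max(np.abs(wave))
    if peak > 0:
        wave = wave / peak                        # 归一化到[-1, 1]
    active = np.nonzero(np.abs(wave) > silence_th)[0]
    start = int(active[0]) if active.size else 0  # 第一个超过门限的采样点
    return wave[start:]                           # 波形整体左移，去掉空白部分

def align_length(a: np.ndarray, b: np.ndarray):
    """以较短者为基准截取较长者，使输出音频与标准音频的时长基本一致。"""
    n = min(len(a), len(b))
    return a[:n], b[:n]
```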
在步骤S203中,划分模块403以相同的方式,对输出音频的波形和标准音频的波形进行划分,获得N对波形数据块。
图10示意性地示出了图9的输出音频的波形和标准音频的波形的划分。在经过图9的相位纠偏处理后，图10上部的输出音频的初相位和图10下部的标准音频的初相位基本一致，并且两者的时长基本一致。以0.5秒时长为单位，对输出音频和标准音频进行划分，得到12对波形数据块。
在本申请中,以相同方式划分意在使得一对波形数据块中的两个波形数据块的时长相等,但不意在限制不同波形对的时长。例如,在本申请的其它一些实施方式中,第1对波形数据块的时长可以不等于第2对波形数据块的时长。此外,可以对波形数据块的单位时长进行调整。例如,在本申请的其它一些实施方式中,用于划分的单位时长在0.2至0.7秒的范围内。
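按固定时间步长切块的操作可以用如下示意性Python函数表示（0.5秒步长只是示例值，末尾不足一块的部分在此草图中被舍去）：

```python
import numpy as np

def split_blocks(wave: np.ndarray, sample_rate: int, block_seconds: float = 0.5):
    """把波形数据按固定时间步长切分为若干波形数据分块。"""
    block_len = int(sample_rate * block_seconds)
    n_blocks = len(wave) // block_len
    return [wave[i * block_len:(i + 1) * block_len] for i in range(n_blocks)]
```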
在步骤S204中，计算模块404分别对每对波形数据块进行频域相关性计算，获得N个频域相关度。
以图10示出的分块方式为例，在步骤S204中，先分别对12对波形数据块中的每个波形数据块进行快速傅里叶变换（fast Fourier transform，FFT），使得总共24个波形数据块从时域函数转换为频域函数。然后再以一对波形数据块为单位，分别对每对频域波形数据块进行相关性计算，获得12个频域相关性。
在本申请的其它一些实施方式中,可以使用其它时域-频域变换的方式,例如其它的傅里叶变换。
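作为频域相关度计算的一个示意性草图（以幅度谱的皮尔逊相关系数折算成0到100的分值，这种具体度量方式仅为假设，并非唯一实现）：

```python
import numpy as np

def freq_correlation(block_a: np.ndarray, block_b: np.ndarray) -> float:
    """对一对等长的波形数据分块做FFT，用幅度谱的相关系数度量频域相关度（0到100）。"""
    spec_a = np.abs(np.fft.rfft(block_a))
    spec_b = np.abs(np.fft.rfft(block_b))
    if np.std(spec_a) == 0 or np.std(spec_b) == 0:
        return 0.0
    r = np.corrcoef(spec_a, spec_b)[0, 1]
    return max(float(r), 0.0) * 100   # 负相关按0处理，折算成百分制

def block_correlations(blocks_a, blocks_b):
    """逐对计算N对波形数据分块的频域相关度。"""
    return [freq_correlation(a, b) for a, b in zip(blocks_a, blocks_b)]
```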
在步骤S205中,阈值调整模块407基于输出音频和标准音频,分别调整N个预设阈值,其中N个预设阈值对应于N个波形对。
以上述输出音频“好的,小X已为您打开空调了”和标准音频“好的,小U已为您打开空调”为例。在上述步骤S202中,可以将输出音频截取为“好的,小X已为您打开空调”。并且在上述步骤S203中,可以将输出音频和标准音频分别划分为12块,形成12对波形数据块:
|好|的|,|小|U|已|为|您|打|开|空|调|,
|好|的|,|小|X|已|为|您|打|开|空|调|。
然后在上述步骤S204中，计算了上述12对波形数据块的频域相关性。在S205中，则根据测试用指令“打开空调”和标准音频“好的，小U已为您打开空调”，对与12个频域相关度对应的12个预设阈值进行调整，将与第9至12个波形对（即|打|开|空|调|）对应的预设阈值设定为较高的值，例如85；将与第4至5个波形对（即|小|U|和|小|X|）对应的预设阈值设定为较低值，例如0；将其它波形对对应的预设阈值设定为相对低的值，例如为60。也就是说，在步骤S205中，可以调高音频的重点识别区域的预设阈值，降低音频的非重点识别区域的预设阈值，甚至可以将可能由于车机不同而产生的差异化区域（例如不同车机的拟人化称呼小U和小X）的预设阈值设定为0。通过这种方式，能够在较大程度上提高音频识别成功率，降低音频的误识别率。
可以理解的是,上述阈值的调整和设定方式仅是示例性的。在本申请的其它一些实施方式中,分块的个数越少,各分块的各预设阈值的平均赋值越高;分块的个数越多,各分块的各预设阈值的平均赋值越低。
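按分块内容设定预设阈值的做法可以用下面的示意性函数表达（85、60、0等数值以及分组方式均取自上述示例，仅为假设）：

```python
def build_thresholds(block_texts,
                     key_blocks=("打", "开", "空", "调"),
                     wildcard_blocks=("小", "U", "X"),
                     high=85, normal=60, wildcard=0):
    """根据各分块对应的文字内容设定预设阈值：重点识别区域取高阈值，
    可能因车机不同而产生差异的区域阈值设为0，其余取一般阈值。"""
    thresholds = []
    for text in block_texts:
        if text in wildcard_blocks:
            thresholds.append(wildcard)
        elif text in key_blocks:
            thresholds.append(high)
        else:
            thresholds.append(normal)
    return thresholds
```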
在步骤S206中,音频识别模块405的第一音频识别模块4051分别比较N个频域相关性和N个预设阈值,获得N个比较结果。
具体地,如果其中一个频域相关性大于与其对应的预设阈值时,则它们的比较结果为肯定性结果(“是”)。反之,即该频域相关性小于或等于与其对应的预设阈值时,则它们的比较结果为否定性结果(“否”)。
在步骤S207中,音频识别模块405的第二音频识别模块4052基于预设的识别条件和N个比较结果,获得输出音频的识别结果,其中识别结果为肯定性结果时,执行步骤S208,识别结果为否定性结果时,执行步骤S209。
例如,预设的识别条件是肯定性结果的数量是所有比较结果数量的80%以上,当N个比较结果满足该预设的识别条件时,该输出音频则被识别为是该标准音频,当N个比较结果不满足该预设的识别条件时,该输出音频则被识别为不是该标准音频。可以理解的是,上述预设的识别条件的比例值仅是示例性的。在本申请的其它一些实施方式中,分块的个数越多,预设的识别条件的比例赋值越低;而分块的个数越少,预设的识别条件的比例赋值越高。
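步骤S206至S207的判别逻辑可以概括为如下示意性Python函数（80%的比例仅为示例，实际可按分块个数调整）：

```python
def recognize(correlations, thresholds, pass_ratio=0.8):
    """逐块比较相关度与对应预设阈值，肯定性比较结果的占比达到设定比例时，
    认为输出音频与该条标准音频一致。"""
    results = [c > t for c, t in zip(correlations, thresholds)]
    if not results:
        return False
    return sum(results) / len(results) >= pass_ratio
```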
在步骤S208中,音频识别模块405的第三音频识别模块4053获取标准音频的内容,将该标准音频的内容作为输出音频的内容并输出。
例如,当数据库中的标准音频以文本形式存储时,可以直接调用该标准音频的文本作为输出音频的内容。当数据库中的标准音频是以波形形式存储时,还需要存储有其内容的文件,在这种情况下,步骤S201获取该标准音频的波形,而步骤S208则获取该标准音频的内容的文件。
在步骤S209中,轮询模块406判断是否还有其它与测试用指令关联的标准音频,如果有,则转至步骤S203,在输出音频与所有标准音频的识别结果均为否定性结果时,输出否定性识别结果。
以上述测试用指令“打开空调”、输出音频“好的，小X已为您打开空调了”和相关联的3个标准音频为例。例如，第1个被划分并与输出音频“好的，小X已为您打开空调了”进行比较的标准音频为“好的，空调已为您打开”，那么在步骤S207中，输出音频将被识别为不是该第1个标准音频“好的，空调已为您打开”。随后在步骤S209中，将以第2个标准音频“已开启空调”作为标准音频，转至步骤S203；重复后续各步骤，然后在步骤S207中，输出音频将再一次被识别为不是该第2个标准音频“已开启空调”。随即在步骤S209中，将以第3个标准音频“好的，小U已为您打开空调”作为标准音频，转至步骤S203；重复后续各步骤；在步骤S207中，输出音频“好的，小X已为您打开空调了”被识别为是该第3个标准音频“好的，小U已为您打开空调”；最后在步骤S208中获取第3个标准音频的内容，输出第3个标准音频的内容作为输出音频的内容。或者，在步骤S207中，输出音频“好的，小X已为您打开空调了”被识别为不是该第3个标准音频“好的，小U已为您打开空调”，在随后的步骤S209中，由于此时已经没有其他未经过识别处理的标准音频，轮询模块406将输出否定性识别结果。
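把前文各示意函数串起来，步骤S203至S209的轮询流程大致如下（candidates的组织方式以及各函数名均沿用前文的假设草图，并非本申请限定的实现）：

```python
def identify_output_audio(output_wave, sample_rate, candidates):
    """轮询与测试用指令关联的全部标准音频，返回首个匹配者的文本内容；全部不匹配时返回None。
    candidates为(标准音频波形, 分块文本列表)的列表；preprocess、align_length、split_blocks、
    block_correlations、build_thresholds、recognize为前文给出的示意函数。"""
    for std_wave, block_texts in candidates:
        a, b = align_length(preprocess(output_wave), preprocess(std_wave))
        blocks_a = split_blocks(a, sample_rate)
        blocks_b = split_blocks(b, sample_rate)
        corr = block_correlations(blocks_a, blocks_b)
        thresholds = build_thresholds(block_texts)[:len(corr)]
        if recognize(corr, thresholds):
            return "".join(block_texts)   # 以标准音频的内容作为输出音频的内容
    return None
```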
可以理解的是,步骤S201至S209并不是以实际发生的顺序排列。例如,在一些其它音频识别方法实施方式中,步骤S205可以发生在步骤S204之前。此外,在一些其它音频识别方法实施方式中,上述一些步骤可以省略,例如可以省略步骤S202、S205、S208、S209或它们的任意组合。此外,在一些其它音频识别方法实施方式中,S207可以规定为直接输出肯定性或否定性的识别结果。类似地,在一些其它音频识别装置实施方式中,上述一些模块可以省略,例如可以省略执行步骤S202、S205、S208或S209的模块,或这些模块的任意组合。
下面参照图11和图12进一步描述本申请实施方式的图像识别装置和图像识别方法。
图11示意性地示出了本申请实施方式的图像识别装置1208的结构示意图。图像识别装置1208包括图像接收模块801、灰度处理模块802、二值化处理模块803、图像匹配识别模块804和图像文字识别模块805。
图像接收模块801用于接收车机2000,尤其是车机的人机交互界面(User Interface,UI)响应于语音指令和/或人机界面操作指令(例如切屏操作)产生的输出图像,以及接收相应地与语音指令和/或人机界面操作指令(例如切屏操作)相关联的标准图像。
灰度处理模块802与图像接收模块801连接,用于对输出图像进行灰度处理。由于图像的灰度处理是较为成熟的技术,本申请对于灰度处理的可实现方式不再详述。
二值化处理模块803与灰度处理模块802连接,用于对输出图像进行二值化处理。 由于图像的二值化处理是较为成熟的技术,本申请对于二值化处理的可实现方式不再详述。
图像匹配识别模块804与二值化处理模块803和图像接收模块801连接,用于将经过处理的输出图像以预设模板为形状进行截取,然后将经过截取的输出图像与标准图像进行匹配识别,输出第一图像识别结果。
图像文字识别模块805与二值化处理模块803连接,用于对经过处理的输出图像进行光学字符识别(Optical Character Recognition,OCR),输出第二图像识别结果。由于图像的光学字符识别是较为成熟的技术,本申请对于光学字符识别的可实现方式不再详述。
图12示意性地示出了涉及本申请实施方式的图像识别方法的各方面的示意图。
其中，在图12中测试设备的右侧，示意性地示出了与本申请实施方式的图像识别方法相关的上述步骤S101、S106至S108。本申请实施方式的测试设备根据测试用例分别向车机发送语音指令和作为人机界面操作指令的切屏操作指令，切屏操作指令是经由ADB接口传输的。车机的人机交互界面响应于语音指令和人机界面操作指令产生输出图像。测试设备分别从车机采集其人机交互界面的输出图像，和从图像数据库中获取与语音指令和人机界面操作指令相关联的标准图像。
在图12中测试设备的左侧,示意性地示出了本申请一个实施方式中涉及的图像识别方法,该图像识别方法包括下述步骤S301至S305。
在步骤S301中,图像接收模块801例如从车机的人机交互界面接收其响应于语音指令和/或人机界面操作指令(例如切屏操作)产生的输出图像,以及例如从车机的图像数据库接收相应地与语音指令和/或人机界面操作指令(例如切屏操作)相关联的标准图像。
在步骤S302中,灰度处理模块802对输出图像进行灰度处理。
在步骤S303中,二值化处理模块803对输出图像进行二值化处理。
在步骤S304中,图像匹配识别模块804将经过处理的输出图像以预设模板为形状进行截取,然后将经过截取的输出图像与标准图像进行匹配识别,输出第一图像识别结果。
在步骤S305中，图像文字识别模块805对经过处理的输出图像进行光学字符识别，输出第二图像识别结果。
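作为图像识别流程（灰度化、二值化、模板匹配与OCR）的一个示意，下面的Python草图基于OpenCV与pytesseract实现（匹配阈值0.8、OCR语言参数等均为假设，且假设标准图像尺寸不大于输出图像）：

```python
import cv2
import pytesseract

def image_check(output_png: str, standard_png: str, match_th: float = 0.8):
    """对输出图像做灰度化与二值化后，用标准图像做模板匹配得到第一图像识别结果，
    再对输出图像做光学字符识别得到第二图像识别结果。"""
    out_bin = binarize(cv2.imread(output_png))
    std_bin = binarize(cv2.imread(standard_png))

    score = float(cv2.matchTemplate(out_bin, std_bin, cv2.TM_CCOEFF_NORMED).max())
    matched = score >= match_th                                   # 第一图像识别结果

    text = pytesseract.image_to_string(out_bin, lang="chi_sim")   # 第二图像识别结果
    return matched, text

def binarize(img):
    """灰度化后用Otsu法做二值化。"""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```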
可以理解的是,上述一些步骤可以省略。例如在一些其它图像识别方法实施方式中,可以省略步骤S302和S303。在另一些其它图像识别方法实施方式中,可以省略步骤S304或S305。
类似地,上述一些模块可以省略。例如在一些其它图像识别装置实施方式中,可以省略执行步骤S302和S303的各模块。在另一些其他图像识别装置实施方式中,可以省略执行步骤S304或执行步骤S305的模块。
图13是本申请实施例提供的一种计算设备1500的结构性示意性图。该计算设备1500包括:处理器1510、存储器1520、通信接口1530、总线1540。
应理解,图13所示的计算设备1500中的通信接口1530可以用于与其他设备之间进行通信。
其中,该处理器1510可以与存储器1520连接。该存储器1520可以用于存储该程序代码和数据。因此,该存储器1520可以是处理器1510内部的存储单元,也可以是与处理器1510独立的外部存储单元,还可以是包括处理器1510内部的存储单元和与处理器1510独立的外部存储单元的部件。
可选的,计算设备1500还可以包括总线1540。其中,存储器1520、通信接口1530可以通过总线1540与处理器1510连接。总线1540可以是外设部件互连标准(Peripheral Component Interconnect,PCI)总线或扩展工业标准结构(Extended Industry Standard Architecture,EISA)总线等。总线1540可以分为地址总线、数据总线、控制总线等。为便于表示,图13中仅用一条线表示,但并不表示仅有一根总线或一种类型的总线。
应理解,在本申请实施例中,该处理器1510可以采用中央处理单元(central processing unit,CPU)。该处理器还可以是其它通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现成可编程门阵列(field programmable gate Array,FPGA)或者其它可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。或者该处理器1510采用一个或多个集成电路,用于执行相关程序,以实现本申请实施例所提供的技术方案。
该存储器1520可以包括只读存储器和随机存取存储器,并向处理器1510提供指令和数据。处理器1510的一部分还可以包括非易失性随机存取存储器。例如,处理器1510还可以存储设备类型的信息。
在计算设备1500运行时，处理器1510执行存储器1520中的计算机执行指令，以执行上述方法的操作步骤。
应理解,根据本申请实施例的计算设备1500可以对应于执行根据本申请各实施例的方法中的相应主体,并且计算设备1500中的各个模块的上述和其它操作和/或功能分别为了实现本实施例各方法的相应流程,为了简洁,在此不再赘述。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显 示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
本申请实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时用于执行上述各音频识别方法、图像识别方法和测试方法中的任一种方法,该方法包括上述各个实施例所描述的方案中的至少之一。
本申请实施例的计算机存储介质,可以采用一个或多个计算机可读的介质的任意组合。计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质。计算机可读存储介质例如可以是,但不限于,电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本文件中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。
计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。
计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括、但不限于无线、电线、光缆、RF等等,或者上述的任意合适的组合。
可以以一种或多种程序设计语言或其组合来编写用于执行本申请操作的计算机程序代码,所述程序设计语言包括面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络,包括局域网(LAN)或广域网(WAN),连接到用户计算机,或者,可以连接到外 部计算机(例如利用因特网服务提供商来通过因特网连接)。
注意,上述仅为本申请的较佳实施例及所运用的技术原理。本领域技术人员会理解,本申请不限于这里所述的特定实施例,对本领域技术人员来说能够进行各种明显的变化、重新调整和替代而不会脱离本申请的保护范围。因此,虽然通过以上实施例对本申请进行了较为详细的说明,但是本申请不仅仅限于以上实施例,在不脱离本申请的构思的情况下,还可以包括更多其他等效实施例,均属于本申请的保护范畴。
例如,在上面的描述中,以相关度大于阈值的波形数据分块的数量是所有比较结果数量的80%以上为条件,将输出音频的内容识别为标准音频的内容,然而,本申请并不限于此,例如,可以对计算出的多个相关度求平均值,在平均值大于平均值阈值时,将输出音频的内容识别为标准音频的内容;或者,对计算出的多个相关度赋予不同的权重值,在此基础上计算平均值。这里的权重值可以根据标准音频的内容进行设定。
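上述加权平均的变形可以用如下示意性函数表示（权重取值与平均值阈值均为假设参数）：

```python
def weighted_match(correlations, weights, mean_threshold=70.0):
    """对各分块相关度按权重求加权平均，平均值超过阈值即认为输出音频与标准音频内容一致。"""
    total = sum(weights)
    if total == 0:
        return False
    mean = sum(c * w for c, w in zip(correlations, weights)) / total
    return mean >= mean_threshold
```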

Claims (32)

  1. 一种测试语音交互系统的测试方法,其特征在于,包括:
    向所述语音交互系统发送语音指令;
    获取所述语音交互系统的扬声器的输出音频的第一波形数据;
    获取标准音频的第二波形数据;
    将所述第一波形数据分为多个第一波形数据分块;
    将所述第二波形数据分为多个第二波形数据分块;
    计算所述第一波形数据分块与所述第二波形数据分块的相关度;
    根据所述相关度生成第一测试结果,所述第一测试结果指示所述输出音频与所述语音指令相匹配或不匹配。
  2. 根据权利要求1所述的测试方法,其特征在于,所述计算所述第一波形数据分块与所述第二波形数据分块的相关度包括:
    分别计算所述多个第一波形数据分块与所述多个第二波形数据分块的多个相关度;
    其中,所述根据所述相关度生成第一测试结果包括:
    将所述多个相关度与多个预设阈值分别进行比较,根据所述比较的结果生成所述第一测试结果。
  3. 根据权利要求2所述的测试方法,其特征在于,所述多个预设阈值的大小被设定为不同。
  4. 根据权利要求2或3所述的测试方法,其特征在于,还包括:
    获取所述标准音频的内容;
    根据所述标准音频的内容获取所述多个第二波形数据分块的内容;
    根据所述第二波形数据分块的内容,设定所述预设阈值。
  5. 根据权利要求1-4中任一项所述的测试方法,其特征在于,还包括:
    获取所述标准音频的内容;
    在所述第一测试结果指示所述输出音频与所述语音指令相匹配时,根据所述标准音频的内容生成所述输出音频的内容。
  6. 根据权利要求1-5中任一项所述的测试方法,其特征在于,还包括:
    获取所述语音交互系统的显示器的第一输出图像;
    获取第一标准图像;
    根据所述第一输出图像和所述第一标准图像生成第二测试结果,所述第二测试结果指示所述第一输出图像和所述语音指令相匹配或不匹配。
  7. 根据权利要求6所述的测试方法,其特征在于,所述获取所述语音交互系统的显示器的第一输出图像包括:通过所述语音交互系统的安卓调试桥接口获取所述第一输出图像。
  8. 根据权利要求1-7中任一项所述的测试方法,其特征在于,还包括:
    在发送所述语音指令之后,向所述语音交互系统发送人机界面操作指令;
    获取所述语音交互系统的显示器的第二输出图像;
    获取第二标准图像;
    根据所述第二输出图像和所述第二标准图像生成第三测试结果,所述第三测试结果指示所述第二输出图像与所述人机界面操作指令相匹配或不匹配。
  9. 根据权利要求8所述的测试方法,其特征在于,所述获取所述语音交互系统的显示器的第二输出图像包括:通过所述语音交互系统的安卓调试桥接口获取所述第二输出图像。
  10. 根据权利要求1-9中任一项所述的测试方法,其特征在于,还包括:
    获取所述语音交互系统收发的第一报文;
    获取第一标准报文;
    根据所述第一报文和所述第一标准报文生成第四测试结果,所述第四测试结果指示所述第一报文和所述语音指令相匹配或不匹配。
  11. 一种音频识别方法,其特征在于,包括:
    获取待识别音频的第一波形数据;
    获取标准音频的第二波形数据;
    将所述第一波形数据分为多个第一波形数据分块;
    将所述第二波形数据分为多个第二波形数据分块;
    计算所述第一波形数据分块与所述第二波形数据分块的相关度;
    根据所述相关度生成第一识别结果,所述第一识别结果指示所述待识别音频与所述标准音频相同或不同。
  12. 根据权利要求11所述的音频识别方法,其特征在于,
    所述计算所述第一波形数据分块与所述第二波形数据分块的相关度包括:分别计算所述多个第一波形数据分块与所述多个第二波形数据分块的多个相关度;
    其中,所述根据所述相关度生成第一识别结果包括:将所述多个相关度与多个预设阈值分别进行比较,根据所述比较的结果生成所述第一识别结果。
  13. 根据权利要求12所述的音频识别方法,其特征在于,所述多个预设阈值的大小被设定为不同。
  14. 根据权利要求12或13所述的音频识别方法,其特征在于,还包括:
    获取所述标准音频的内容;
    根据所述标准音频的内容获取所述多个第二波形数据分块的内容;
    根据所述第二波形数据分块的内容,设定所述预设阈值。
  15. 根据权利要求11-14中任一项所述的音频识别方法,其特征在于,还包括:
    获取所述标准音频的内容;
    在所述第一识别结果指示所述待识别音频与所述标准音频相同时,根据所述标准音频的内容生成所述待识别音频的内容。
  16. 一种用于测试语音交互系统的测试设备,其特征在于,包括:
    语音指令生成装置,用于向所述语音交互系统发送语音指令;
    音频采集装置,用于获取所述语音交互系统的扬声器的输出音频的第一波形数据;
    第一获取装置,用于获取标准音频的第二波形数据;
    第一划分模块,用于将所述第一波形数据分为多个第一波形数据分块;
    第二划分模块,用于将所述第二波形数据分为多个第二波形数据分块;
    计算模块,用于计算所述第一波形数据分块与所述第二波形数据分块的相关度;
    音频判定装置,用于根据所述相关度生成第一测试结果,所述第一测试结果指示所述输出音频与所述语音指令相匹配或不匹配。
  17. 根据权利要求16所述的测试设备,其特征在于,
    所述计算模块执行的所述计算所述第一波形数据分块与所述第二波形数据分块的相关度包括:分别计算所述多个第一波形数据分块与所述多个第二波形数据分块的多个相关度;
    所述音频判定装置执行的所述根据所述相关度生成第一测试结果包括:将所述多个相关度与多个预设阈值分别进行比较,根据所述比较的结果生成所述第一测试结果。
  18. 根据权利要求17所述的测试设备,其特征在于,还包括阈值调整模块,用于将所述多个预设阈值的大小设定为不同。
  19. 根据权利要求18所述的测试设备,其特征在于,
    所述第一获取装置还用于:
    获取所述标准音频的内容;
    根据所述标准音频的内容获取所述多个第二波形数据分块的内容;
    所述阈值调整模块还用于根据所述第二波形数据分块的内容,设定所述预设阈值。
  20. 根据权利要求16-19中任一项所述的测试设备,其特征在于,
    所述第一获取装置还用于获取所述标准音频的内容;
    所述测试设备还包括音频识别模块,所述音频识别模块用于在所述第一测试结果指示所述输出音频与所述语音指令相匹配时,根据所述标准音频的内容生成所述输出音频的内容。
  21. 根据权利要求16-20中任一项所述的测试设备,其特征在于,还包括:
    图像采集装置,用于获取所述语音交互系统的显示器的第一输出图像;
    第二获取装置,用于获取第一标准图像;
    图像判定装置,用于根据所述第一输出图像和所述第一标准图像生成第二测试结果,所述第二测试结果指示所述第一输出图像和所述语音指令相匹配或不匹配。
  22. 根据权利要求21所述的测试设备,其特征在于,所述图像采集装置执行的所述获取所述语音交互系统的显示器的第一输出图像包括:通过所述语音交互系统的安卓调试桥接口获取所述第一输出图像。
  23. 根据权利要求16-22中任一项所述的测试设备,其特征在于,还包括:
    人机界面操作指令生成装置,用于在发送所述语音指令之后,向所述语音交互系统发送人机界面操作指令;
    图像采集装置,用于获取所述语音交互系统的显示器的第二输出图像;
    第二获取装置,用于获取第二标准图像;
    图像判定装置,用于根据所述第二输出图像和所述第二标准图像生成第三测试结果,所述第三测试结果指示所述第二输出图像与所述人机界面操作指令相匹配或不匹配。
  24. 根据权利要求23所述的测试设备,其特征在于,所述图像采集装置执行的所述获取所述语音交互系统的显示器的第二输出图像包括:通过所述人机交互系统的安卓调试桥接口获取所述第二输出图像。
  25. 根据权利要求16-24中任一项所述的测试设备,其特征在于,还包括:
    报文采集装置,用于获取所述语音交互系统收发的第一报文;
    第三获取装置,用于获取第一标准报文;
    报文判定装置,用于根据所述第一报文和所述第一标准报文生成第四测试结果,所述第四测试结果指示所述第一报文和所述语音指令相匹配或不匹配。
  26. 一种音频识别装置,其特征在于,包括:
    音频采集模块,用于获取待识别音频的第一波形数据;
    第一获取模块,用于获取标准音频的第二波形数据;
    第一划分模块,用于将所述第一波形数据分为多个第一波形数据分块;
    第二划分模块,用于将所述第二波形数据分为多个第二波形数据分块;
    计算模块,用于计算所述第一波形数据分块与所述第二波形数据分块的相关度;
    识别模块,用于根据所述相关度生成第一识别结果,所述第一识别结果指示所述待识别音频与所述标准音频相同或不同。
  27. 根据权利要求26所述的音频识别装置,其特征在于,
    所述计算模块执行的所述计算所述第一波形数据分块与所述第二波形数据分块的相关度包括:分别计算所述多个第一波形数据分块与所述多个第二波形数据分块的多个相关度;
    其中,所述识别模块执行的所述根据所述相关度生成第一识别结果包括:将所述多个相关度与多个预设阈值分别进行比较,根据所述比较的结果生成所述第一识别结果。
  28. 根据权利要求27所述的音频识别装置,其特征在于,还包括阈值调整模块,用于将所述多个预设阈值的大小设定为不同。
  29. 根据权利要求28所述的音频识别装置,其特征在于,
    所述第一获取模块还用于:
    获取所述标准音频的内容;
    根据所述标准音频的内容获取所述多个第二波形数据分块的内容;
    所述阈值调整模块还用于根据所述第二波形数据分块的内容,设定所述预设阈值。
  30. 根据权利要求26-29中任一项所述的音频识别装置,其特征在于,
    所述第一获取模块还用于获取所述标准音频的内容;
    所述识别模块还用于在所述第一识别结果指示所述待识别音频与所述标准音频相同时,根据所述标准音频的内容生成所述待识别音频的内容。
  31. 一种计算设备,其特征在于,包括至少一个处理器与至少一个存储器,所述存储器存储有程序指令,所述程序指令当被所述至少一个处理器执行时使得所述至少一个处理器执行权利要求1-15中任一项所述的方法。
  32. 一种计算机可读存储介质,存储有程序指令,其特征在于,所述程序指令当被计算机执行时使得所述计算机执行权利要求1-15中任一项所述的方法。
PCT/CN2022/081530 2021-03-24 2022-03-17 语音交互系统的测试方法、音频识别方法及相关设备 WO2022199461A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110314846.2A CN115132173A (zh) 2021-03-24 2021-03-24 语音交互系统的测试方法、音频识别方法及相关设备
CN202110314846.2 2021-03-24

Publications (1)

Publication Number Publication Date
WO2022199461A1 true WO2022199461A1 (zh) 2022-09-29

Family

ID=83374426

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/081530 WO2022199461A1 (zh) 2021-03-24 2022-03-17 语音交互系统的测试方法、音频识别方法及相关设备

Country Status (2)

Country Link
CN (1) CN115132173A (zh)
WO (1) WO2022199461A1 (zh)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107516534A (zh) * 2017-08-31 2017-12-26 广东小天才科技有限公司 一种语音信息的比对方法、装置及终端设备
CN110880329A (zh) * 2018-09-06 2020-03-13 腾讯科技(深圳)有限公司 一种音频识别方法及设备、存储介质
CN109003602A (zh) * 2018-09-10 2018-12-14 百度在线网络技术(北京)有限公司 语音产品的测试方法、装置、设备及计算机可读介质
CN110838285A (zh) * 2019-11-20 2020-02-25 青岛海尔科技有限公司 终端语音测试的系统、方法及装置
CN110808030A (zh) * 2019-11-22 2020-02-18 珠海格力电器股份有限公司 语音唤醒方法、系统、存储介质及电子设备
CN111933108A (zh) * 2020-09-25 2020-11-13 蘑菇车联信息科技有限公司 一种智能网联终端智能语音交互系统自动化测试方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117009755A (zh) * 2023-10-07 2023-11-07 国仪量子(合肥)技术有限公司 波形数据的处理方法、计算机可读存储介质和电子设备
CN117009755B (zh) * 2023-10-07 2023-12-19 国仪量子(合肥)技术有限公司 波形数据的处理方法、计算机可读存储介质和电子设备

Also Published As

Publication number Publication date
CN115132173A (zh) 2022-09-30

Similar Documents

Publication Publication Date Title
KR102339594B1 (ko) 객체 인식 방법, 컴퓨터 디바이스 및 컴퓨터 판독 가능 저장 매체
CN110415681B (zh) 一种语音识别效果测试方法及系统
US10861480B2 (en) Method and device for generating far-field speech data, computer device and computer readable storage medium
WO2019196205A1 (zh) 外语教学评价信息生成方法以及装置
CN107799126A (zh) 基于有监督机器学习的语音端点检测方法及装置
CN102723080B (zh) 一种语音识别测试系统及方法
CN111933108B (zh) 一种智能网联终端智能语音交互系统自动化测试方法
TWI539440B (zh) 互動式語音識別電子裝置及方法
CN109326305B (zh) 一种批量测试语音识别和文本合成的方法和测试系统
KR20160024858A (ko) 지역성 말투를 구분하는 음성 데이터 인식 방법, 장치 및 서버
WO2019228306A1 (zh) 对齐语音的方法和装置
WO2020253073A1 (zh) 语音端点检测方法、装置、设备及存储介质
WO2022199461A1 (zh) 语音交互系统的测试方法、音频识别方法及相关设备
CN107274895B (zh) 一种语音识别设备及方法
WO2022033109A1 (zh) 语音检测方法、装置和电子设备
KR20180012639A (ko) 음성 인식 방법, 음성 인식 장치, 음성 인식 장치를 포함하는 기기, 음성 인식 방법을 수행하기 위한 프로그램을 저장하는 저장 매체, 및 변환 모델을 생성하는 방법
WO2021051566A1 (zh) 机器合成语音识别方法、装置、电子设备及存储介质
US12039970B1 (en) System and method for source authentication in voice-controlled automation
CN113643704A (zh) 车机语音系统的测试方法、上位机、系统和存储介质
TWI774472B (zh) 異音檢測方法及裝置
CN110503941A (zh) 语言能力评测方法、装置、系统、计算机设备及存储介质
WO2023179229A1 (zh) 用于测试空调的方法及装置、测试系统、存储介质
WO2023193573A1 (zh) 一种音频处理方法、装置、存储介质及电子设备
WO2022166220A1 (zh) 一种语音分析方法及其语音记录装置
WO2021136298A1 (zh) 一种语音处理方法、装置、智能设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22774122; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 22774122; Country of ref document: EP; Kind code of ref document: A1)