CN109979487B - Voice signal detection method and device - Google Patents

Publication number
CN109979487B
Authority
CN
China
Prior art keywords
voice
signal
detected
collection
voice signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910172909.8A
Other languages
Chinese (zh)
Other versions
CN109979487A (en
Inventor
张腾飞
陈建哲
钟思思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Zhilian Beijing Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910172909.8A
Publication of CN109979487A
Application granted
Publication of CN109979487B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The embodiment of the invention provides a voice signal detection method and a voice signal detection device. The method comprises the following steps: sending a playing instruction to a playing device; sending a signal collection instruction to a collection device according to the voice index to be detected; receiving a voice signal to be detected returned by the collection device according to the signal collection instruction, wherein the voice signal to be detected comprises a signal processed by a voice function node in the collection device that is related to the voice index to be detected; and obtaining an analysis result according to the voice signal to be detected. Because the signal collection instruction is sent to the collection device according to the voice index to be detected, the voice signal processed by the voice function node related to that index is obtained, and this signal makes it convenient to determine whether each voice function node in the collection device is working normally.

Description

Voice signal detection method and device
Technical Field
The present invention relates to the field of multimedia technologies, and in particular, to a method and an apparatus for detecting a voice signal.
Background
At present, in the field of automatic driving, methods for detecting voice system performance, voice quality and the like are immature. Many vehicle manufacturers have fixed suppliers for components such as the car machine hardware (e.g., processor, display, etc.), the audio system, the microphone module, the Global Positioning System (GPS), and the vehicle-mounted intelligent box (TBOX). If a problem occurs in the car machine hardware, the microphone module, the audio system, the voice signal path of the car machine system, or the like, voice wake-up and recognition performance may be directly degraded. If such problems are found only when the vehicle leaves the factory, modifying the car machine hardware is very difficult.
Disclosure of Invention
The embodiment of the invention provides a voice signal detection method and a voice signal detection device, which are used for solving one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a method for detecting a speech signal, including:
sending a playing instruction to a playing device;
sending a signal collection instruction to collection equipment according to the voice index to be detected;
receiving a voice signal to be detected returned by the collection equipment according to the signal collection instruction, wherein the voice signal to be detected comprises a signal processed by a voice function node related to the voice index to be detected in the collection equipment;
and obtaining an analysis result according to the voice signal to be detected.
In one embodiment, the playing device and the collecting device are the same device.
In one embodiment, the signal collection instruction instructs the collection device to collect at least one of a voice signal picked up by a microphone, a voice signal output by a DSP, and a voice signal acquired by application layer software.
In an embodiment, the obtaining an analysis result according to the speech signal to be tested includes:
and analyzing the voice signal to be detected by utilizing a voice algorithm, wherein the voice algorithm comprises at least one of spectral analysis, time delay analysis and noise suppression of the voice signal.
In one embodiment, the voice index to be tested includes: at least one of round-trip delay value, node voice quality, near-field voice signal amplitude, system background noise of the equipment to be tested, microphone array frequency consistency, microphone array phase consistency, signal saturation detection, echo reference delay and reference signal frequency consistency.
In a second aspect, an embodiment of the present invention provides a method for detecting a speech signal, including:
receiving a signal collection instruction sent by the control equipment according to the voice index to be detected;
under the condition that the playing equipment plays the voice content, collecting signals processed by the voice function nodes related to the voice index to be tested according to the signal collection instruction to obtain a voice signal to be tested;
and sending the voice signal to be tested to the control equipment.
In one embodiment, the method further comprises:
receiving a playing instruction, wherein the playing instruction comprises voice content needing to be played;
and playing the voice content.
In one embodiment, collecting signals processed by a voice function node related to the voice index to be tested includes:
and collecting at least one of voice signals picked up by a microphone in the collection device, voice signals output by the DSP and voice signals acquired by application layer software.
In one embodiment, the playing device and the collecting device are the same device.
In one embodiment, the voice index to be tested includes: at least one of round-trip delay value, node voice quality, near-field voice signal amplitude, system background noise of the equipment to be tested, microphone array frequency consistency, microphone array phase consistency, signal saturation detection, echo reference delay and reference signal frequency consistency.
In a third aspect, an embodiment of the present invention provides a speech signal detection apparatus, including:
the first sending module is used for sending a playing instruction to the playing equipment;
the second sending module is used for sending a signal collection instruction to the collection equipment according to the voice index to be detected;
the first receiving module is used for receiving a voice signal to be detected returned by the collection equipment according to the signal collection instruction, wherein the voice signal to be detected comprises a signal processed by a voice function node related to the voice index to be detected in the collection equipment;
and the analysis module is used for obtaining an analysis result according to the voice signal to be detected.
In one embodiment, the playing device and the collecting device are the same device.
In one embodiment, the analysis module is further configured to analyze the voice signal to be tested by using a voice algorithm, where the voice algorithm includes at least one of a spectrum analysis, a delay analysis, and a noise suppression of the voice signal.
In a fourth aspect, an embodiment of the present invention provides a speech signal detection apparatus, including:
the second receiving module is used for receiving a signal collection instruction sent by the control equipment according to the voice index to be detected;
the collection module is used for collecting signals processed by the voice function nodes related to the voice indexes to be tested according to the signal collection instructions under the condition that the playing equipment plays voice contents to obtain voice signals to be tested;
and the third sending module is used for sending the voice signal to be detected to the control equipment.
In one embodiment, the apparatus further comprises:
a third receiving module, configured to receive a play instruction, where the play instruction includes a voice content to be played;
and the playback module is used for playing the voice content.
In one embodiment, the collection module is further configured to collect at least one of a voice signal picked up by a microphone in the collection device, a voice signal output by the DSP, and a voice signal acquired by application layer software.
In a fifth aspect, an embodiment of the present invention provides a speech signal detection device, where functions of the device may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions.
In one embodiment, the apparatus includes a processor and a memory. The memory stores a program that enables the apparatus to execute the voice signal detection method, and the processor is configured to execute the program stored in the memory. The apparatus may also include a communication interface for communicating with other devices or a communication network.
In a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer software instructions for a speech signal detection apparatus, which includes a program for executing the speech signal detection method.
One of the above technical solutions has the following advantages or beneficial effects: a signal collection instruction is sent to the collection equipment according to the voice index to be detected, so that the voice signal to be detected, processed by the voice function node related to that index in the collection equipment, is obtained, and this signal makes it convenient to determine whether each voice function node in the collection equipment is normal.
Another of the above technical solutions has the following advantages or beneficial effects: the voice signal quality of the car machine, the system voice path and the like can be detected before the vehicle leaves the factory, which helps to find, quickly locate and solve problems in advance and reduces the cost of voice development.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
Fig. 1 shows a flow chart of a voice signal detection method according to an embodiment of the present invention.
Fig. 2 shows a flow chart of a voice signal detection method according to an embodiment of the present invention.
Fig. 3 shows a flow chart of a voice signal detection method according to an embodiment of the present invention.
Fig. 4 shows a flow chart of a voice signal detection method according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram showing an application example of the voice signal detection method according to the embodiment of the present invention.
Fig. 6 shows a flowchart of an application example of the voice signal detection method according to the embodiment of the present invention.
Fig. 7 shows a block diagram of a voice signal detecting apparatus according to an embodiment of the present invention.
Fig. 8 is a block diagram showing a configuration of a voice signal detecting apparatus according to an embodiment of the present invention.
Fig. 9 shows a block diagram of a voice signal detecting apparatus according to an embodiment of the present invention.
Fig. 10 shows a block diagram of a voice signal detecting apparatus according to an embodiment of the present invention.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
Fig. 1 shows a flow chart of a voice signal detection method according to an embodiment of the present invention. The method may be applied to a control device. In the embodiment of the present invention, the control device may include, but is not limited to, a Personal Computer (PC), a notebook computer, a palmtop computer, a mobile phone, and other devices having a control function. As shown in fig. 1, the method may include:
step S10, sending a playing instruction to a playing device, wherein the playing instruction may include voice content to be played;
step S11, sending a signal collection instruction to a collection device according to the voice index to be detected;
step S12, receiving a voice signal to be detected returned by the collection equipment according to the signal collection instruction, wherein the voice signal to be detected comprises a signal processed by a voice function node related to the voice index to be detected in the collection equipment;
and step S13, obtaining an analysis result according to the voice signal to be detected.
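Steps S10 to S13 can be sketched from the control device's side as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `FakeDevice` class, the `run_detection` function, the node names, the file name and the peak-amplitude analysis are all hypothetical stand-ins for real devices, transports and voice algorithms.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Signals = Dict[str, List[float]]


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a playing and/or collection device."""
    log: List[str] = field(default_factory=list)

    def play(self, content: str) -> None:
        # A real device would start audio playback here.
        self.log.append(f"play:{content}")

    def collect(self, nodes: List[str]) -> Signals:
        # A real device would tap the requested voice function nodes;
        # here every node returns the same short dummy signal.
        self.log.append("collect:" + ",".join(nodes))
        return {node: [0.0, 0.5, -0.25] for node in nodes}


def run_detection(
    player: FakeDevice,
    collector: FakeDevice,
    index: str,
    nodes_for_index: Dict[str, List[str]],
    analyze: Callable[[Signals], Dict[str, float]],
    content: str,
) -> Dict[str, float]:
    player.play(content)                # S10: playing instruction
    nodes = nodes_for_index[index]      # S11: instruction depends on the index
    signals = collector.collect(nodes)  # S12: signal to be detected is returned
    return analyze(signals)             # S13: analysis result


def peak_amplitude(signals: Signals) -> Dict[str, float]:
    """Toy analysis: largest absolute sample per node."""
    return {node: max(abs(s) for s in samples) for node, samples in signals.items()}


device = FakeDevice()  # playing device and collection device are the same here
result = run_detection(
    device, device, "near_field_amplitude",
    {"near_field_amplitude": ["mic", "dsp_out", "app_layer"]},
    peak_amplitude, "test_utterance.wav",
)
print(result)
```

Passing the same object as both `player` and `collector` mirrors the embodiment in which the playing device and the collection device are the same device; two separate objects would model an independent playing device.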
In one example, the collection device may be a car machine, which may also be referred to as a trip computer, an on-board computer, an intelligent central control device, and the like. The car machine can implement multiple voice functions such as voice entertainment control, in-vehicle voice control, intelligent voice prompts and voice navigation. The car machine may internally include software and hardware function nodes such as a microphone module, a speaker, a hardware voice noise elimination module in the form of a DSP (Digital Signal Processor), and a voice wake-up and recognition function module. The control device can send the corresponding signal collection instructions to the collection device to be tested according to the various voice indexes to be tested.
In one example, after the playing device plays the voice content, a microphone of the collection device picks up a voice signal, and the picked-up voice signal is input into the DSP for processing. The voice signal output by the DSP is then processed by the system, for example by frequency conversion, resampling and noise elimination. The application layer software can obtain the system-processed voice signal through an API (Application Programming Interface). The microphone, the DSP, the application layer software and the like are voice function nodes related to the voice indexes to be tested.
In one embodiment, the voice index to be tested includes: at least one of round-trip delay value, node voice quality, near-field voice signal amplitude, system background noise of the equipment to be tested, microphone array frequency consistency, microphone array phase consistency, signal saturation detection, echo reference delay and reference signal frequency consistency.
In one embodiment, the signal collection instruction instructs the collection device to collect at least one of a voice signal picked up by a microphone, a voice signal output by a DSP, and a voice signal acquired by application layer software.
Different voice indexes to be tested may require collecting the signals processed by different voice function nodes.
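One way to picture this index-to-node relationship is a lookup table that the control device consults when it builds a signal collection instruction. The sketch below is only an assumption about how such a table might look: the dictionary keys, the node labels (`mic`, `dsp_out`, `app_layer`) and the instruction layout are hypothetical, and the node choices loosely follow the per-index examples enumerated in this description (the round-trip delay entry is simplified to three nodes).

```python
from typing import Dict, List

# Node names are illustrative labels, not identifiers from the source.
NODES_FOR_INDEX: Dict[str, List[str]] = {
    "round_trip_delay": ["mic", "dsp_out", "app_layer"],  # simplified path
    "node_voice_quality": ["dsp_out"],
    "near_field_amplitude": ["mic", "dsp_out", "app_layer"],
    "noise_floor": ["app_layer"],
    "mic_array_frequency_consistency": ["mic"],
    "mic_array_phase_consistency": ["mic"],
    "saturation": ["mic", "dsp_out"],
    "echo_reference_delay": ["app_layer"],
}


def build_collection_instruction(index: str) -> Dict[str, object]:
    """Return a hypothetical signal collection instruction for one index."""
    if index not in NODES_FOR_INDEX:
        raise ValueError(f"no collection rule configured for index {index!r}")
    return {"type": "collect", "index": index, "nodes": list(NODES_FOR_INDEX[index])}


print(build_collection_instruction("saturation"))
```

Keeping the mapping as data rather than code matches the statement that signal collection instructions may be preconfigured in the control device for each index.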
For example, in a car machine system, the following voice indexes can be used to detect the quality of the car machine's voice signals and the voice path of the car machine system.
(1) Round-trip delay value. The round-trip delay value may include the delay of the voice signal over the uplink and downlink paths. For example, in the car machine system the uplink path runs from reception by the microphone (mic), to DSP processing, to MCU (Microcontroller Unit) processing, and then to the application layer of the car machine's System On Chip (SOC). The downlink path runs from the car machine system layer to the MCU, then to the DSP for echo and noise elimination, and finally to the vehicle-mounted speaker and other devices. For this index, the car machine needs to collect the voice signals at each node of the uplink and downlink paths and send them to the PC. The PC calculates the time at which the voice signal reaches each node of the uplink and downlink paths, and from these times calculates the round-trip delay value.
(2) mic & Lout node voice quality. The mic & Lout node may be the node between the DSP output and the SOC. For this index, the car machine needs to collect the voice signal that the DSP outputs to the SOC and send it to the PC. The PC calculates the mic & Lout node voice quality.
(3) Near-field voice signal amplitude. The near-field voice may be voice near the device to be tested, such as real user speech or speech played by a speaker to simulate user speech. For this index, the car machine needs to collect the signals picked up by the microphone (mic) array, the signals output by the DSP after noise elimination and amplification, and the signals obtained by the application layer of the system, and send them to the PC. The PC can calculate the amplitudes of these signals.
(4) Noise floor level of the car machine system. For this index, the car machine needs to collect the signals acquired by the application layer of the system and send them to the PC. The PC can use these signals to calculate the noise floor level of the car machine system.
(5) mic array frequency consistency. For this index, the car machine can collect the voice signals picked up by multiple microphones and send them to the PC. The PC uses these microphone signals to analyze the frequency consistency of the mic array.
(6) mic array phase consistency. For this index, the car machine can collect the voice signals picked up by multiple microphones and send them to the PC. The PC uses these microphone signals to analyze the phase consistency of the mic array.
(7) mic and Lout signal saturation detection. Saturation detection may include detecting whether the mic and Lout signals exceed a preset maximum signal value, in order to determine whether amplitude clipping has occurred. For this index, the car machine needs to collect the signals picked up by each microphone in the microphone array and the signals that the DSP outputs to the SOC.
(8) Echo reference delay. For this index, the car machine needs to collect the signals acquired by the application layer and send them to the PC. The PC calculates the echo reference delay.
(9) Frequency consistency of the mic-in (microphone input) and AEC (Acoustic Echo Cancellation) reference signals. For this index, the car machine needs to collect the microphone input voice signal and the AEC input voice signal and send them to the PC. The PC calculates the frequency consistency of the microphone input and the AEC reference signal.
(10) Linear AEC effect. For this index, with linear AEC noise reduction enabled in the DSP, the car machine needs to collect the voice signal output by the DSP and send it to the PC. The PC calculates how many decibels (dB) are eliminated by enabling linear AEC noise reduction.
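Two of the indexes above, saturation detection (7) and the linear AEC effect (10), reduce to short calculations on the PC side. The sketch below is illustrative only: the function names and thresholds are hypothetical, and the AEC figure is computed by comparing RMS levels of a recording without AEC and one with AEC, which is one plausible reading of "how many decibels are eliminated" rather than a method stated in the source.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a non-empty sample sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def count_saturated(samples: Sequence[float], max_abs: float) -> int:
    """Number of samples whose magnitude exceeds the preset maximum value."""
    return sum(1 for s in samples if abs(s) > max_abs)


def attenuation_db(before: Sequence[float], after: Sequence[float]) -> float:
    """Level reduction in dB from `before` (AEC off) to `after` (AEC on)."""
    level_before, level_after = rms(before), rms(after)
    if level_before == 0.0:
        raise ValueError("reference recording is silent")
    if level_after == 0.0:
        return math.inf  # everything was removed
    return 20.0 * math.log10(level_before / level_after)


# 16-bit PCM example with an assumed preset maximum of 32000.
clipped = count_saturated([0, 100, 32767, -32768], max_abs=32000)
print("saturated samples:", clipped)
print("AEC attenuation (dB):", attenuation_db([1.0, -1.0], [0.1, -0.1]))
```

A non-zero count from `count_saturated` on either the mic or the Lout signal would flag possible amplitude clipping at that node.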
Different signal collection instructions may be preconfigured in the control device for different voice indexes. In addition, different voice indexes may require a playing device to play specific voice content. The playing device may be the same device as the collection device; for example, the playing function may be implemented by a speaker or the like in the collection device. The playing device may also be a device separate from the collection device. Therefore, the voice content to be played by the collection device and/or the playing device can be configured in advance in the control device according to the different voice indexes.
In some scenarios, the collection device may be required to play specified voice content. The control device may send the signal collection instruction and the playing instruction to the collection device together or separately. For example, when the PC is connected to a car machine, the PC may send the car machine the signal collection instruction and the playing instruction corresponding to a certain voice index. The playing instruction may include the voice content that the car machine needs to play.
In some scenarios, a separate playing device may be required to play the specified voice content. The control device may send a signal collection instruction to the collection device and a playing instruction to the playing device. For example, in a speech recognition scenario, an independent playing device may be controlled to play pre-recorded user speech in order to test the speech recognition function of the car machine. In this case, the PC may send the pre-recorded speech in a playing instruction to a speaker connected to the PC, and the speaker, acting as the playing device, plays the pre-recorded speech.
In some scenarios, it may be desirable for the collection device to play the specified voice content simultaneously with a separate playback device. The control device may send a signal collection instruction and a play instruction to the collection device, and send a play instruction to the play device. The voice contents played by the collecting device and the playing device at the same time can be the same or different.
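The three playback scenarios described above differ only in which device receives a playing instruction. A compact way to express that, assuming hypothetical device labels and string-encoded instructions that do not come from the source, is:

```python
from typing import Dict, List, Optional


def plan_instructions(
    collector_content: Optional[str],
    player_content: Optional[str],
) -> Dict[str, List[str]]:
    """List the instructions the control device sends to each device.

    `collector_content` is what the collection device itself should play,
    `player_content` is what a separate playing device should play;
    None means that device plays nothing.
    """
    plan: Dict[str, List[str]] = {"collection_device": ["collect"], "playing_device": []}
    if collector_content is not None:
        plan["collection_device"].append(f"play:{collector_content}")
    if player_content is not None:
        plan["playing_device"].append(f"play:{player_content}")
    return plan


# Collection device plays music while a separate speaker plays speech.
print(plan_instructions("music.wav", "speech.wav"))
```

The collection device always receives the collection instruction; whether the two contents are the same or different is left to the caller, as in the description.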
In one embodiment, as shown in fig. 2, step S13 includes: step S21, analyzing the voice signal to be detected by using a voice algorithm, wherein the voice algorithm comprises at least one of spectral analysis, time delay analysis and noise suppression of the voice signal.
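The source does not specify how the time delay analysis is performed. One common technique, shown here purely as an assumed example, is to slide the played reference signal along the recorded signal and pick the lag with the highest cross-correlation. The function names and the 16 kHz sample rate are hypothetical.

```python
from typing import Sequence


def estimate_delay(reference: Sequence[float], recorded: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) at which `recorded` best matches `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Cross-correlation of the reference with the recording shifted by `lag`.
        score = sum(
            ref * recorded[i + lag]
            for i, ref in enumerate(reference)
            if i + lag < len(recorded)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def samples_to_ms(lag: int, sample_rate_hz: int) -> float:
    """Convert a lag in samples to milliseconds."""
    return 1000.0 * lag / sample_rate_hz


reference = [0.0, 1.0, 0.0, -1.0, 0.5]
recorded = [0.0, 0.0, 0.0] + reference + [0.0, 0.0]  # reference delayed by 3 samples
lag = estimate_delay(reference, recorded, max_lag=5)
print(lag, samples_to_ms(lag, 16000))
```

Applying the same estimate to the signals collected at successive nodes would give per-node arrival offsets from which a path delay could be derived. This direct search costs time proportional to the reference length times `max_lag`, so an FFT-based correlation would be the usual choice for long recordings.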
In one example, the analysis results may include analysis reports of the various indicators. In addition, the voice signal to be tested from the collecting device can be stored in the control device and used as the corpus for subsequent analysis and reference.
In this embodiment, the control device sends a signal collection instruction to the collection device according to the voice index to be detected, so as to obtain the voice signal to be detected that has been processed by the voice function node related to that index in the collection device. The voice signal to be detected makes it convenient to determine whether each voice function node in the collection device is normal.
Fig. 3 shows a flow chart of a voice signal detection method according to an embodiment of the present invention. The method can be applied to a voice device to be tested, and as shown in fig. 3, the method can include:
step S31, receiving a signal collection instruction sent by the control equipment according to the voice index to be detected;
step S32, under the condition that the playing equipment plays the voice content, collecting the signal processed by the voice function node related to the voice index to be tested according to the signal collection instruction to obtain the voice signal to be tested;
and step S33, sending the voice signal to be tested to the control equipment.
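Seen from the collection device, steps S31 to S33 amount to reading the requested nodes and returning the result. The following sketch assumes a hypothetical instruction layout (`{"nodes": [...]}`), hypothetical per-node reader callables, and a `send` callback standing in for whatever transport links the device to the control device.

```python
from typing import Callable, Dict, List

Signal = List[float]


def handle_collection_instruction(
    instruction: Dict[str, List[str]],
    node_readers: Dict[str, Callable[[], Signal]],
    send: Callable[[Dict[str, Signal]], None],
) -> None:
    requested = instruction["nodes"]                                # S31: instruction received
    collected = {name: node_readers[name]() for name in requested}  # S32: tap the related nodes
    send(collected)                                                 # S33: return to control device


# Dummy readers for three voice function nodes.
readers: Dict[str, Callable[[], Signal]] = {
    "mic": lambda: [0.1, 0.2],
    "dsp_out": lambda: [0.05, 0.1],
    "app_layer": lambda: [0.0, 0.0],
}
outbox: List[Dict[str, Signal]] = []
handle_collection_instruction({"nodes": ["mic", "dsp_out"]}, readers, outbox.append)
print(outbox)
```

Only the nodes named in the instruction are read, which reflects the idea that each voice index determines which function nodes are tapped.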
In this embodiment, the control device sends a signal collection instruction to the collection device according to the voice index to be tested. After receiving the signal collection instruction, the collection device to be tested starts to collect the signals processed by the voice function nodes related to the voice index to be tested, and then sends the collected voice signal to be tested to the control device. The control device may analyze the voice signal to be tested using a voice algorithm including at least one of spectral analysis, time delay analysis, and noise suppression of the voice signal.
Different signal collection instructions may be preconfigured in the control device for different voice indexes. In addition, different voice indexes may require a playing device to play specific voice content. The playing device may be the same device as the collection device; for example, the playing function may be implemented by a speaker or the like in the collection device. The playing device may also be a device separate from the collection device. Therefore, the voice content to be played by the collection device and/or the playing device can be configured in advance in the control device according to the different voice indexes.
In one embodiment, as shown in fig. 4, the method comprises:
step S41, receiving a playing instruction, wherein the playing instruction comprises voice content to be played;
and step S42, playing the voice content.
In one embodiment, the playing device and the collecting device are the same device.
In some scenarios, the collection device may be required to play specified voice content. The control device may send the signal collection instruction and the playing instruction to the collection device together or separately.
In some scenarios, a separate playing device may be required to play specified voice content to assist the control device in testing the collection device. The control device may send a signal collection instruction to the collection device and a playing instruction to the playing device.
In some scenarios, it may be desirable for the collection device to play the specified voice content simultaneously with a separate playback device. The control device may send a signal collection instruction and a play instruction to the collection device, and send a play instruction to the play device. The voice contents played by the collecting device and the playing device at the same time can be the same or different.
In one embodiment, collecting signals processed by a voice function node related to the voice index to be tested includes:
and collecting at least one of voice signals picked up by a microphone in the collection device, voice signals output by the DSP and voice signals acquired by application layer software.
In one embodiment, the voice index to be tested includes: at least one of round-trip delay value, node voice quality, near-field voice signal amplitude, system background noise of the equipment to be tested, microphone array frequency consistency, microphone array phase consistency, signal saturation detection, echo reference delay and reference signal frequency consistency. For specific examples of these indexes, reference may be made to the description related to the above embodiments.
In an application example, referring to fig. 5, a speaker 53 with good uniformity of sound emission direction can be placed in the front row of the vehicle near the position of the driver's head, and the speaker 53 is connected to the PC 52. As shown in fig. 6, the voice signal detection process may include: opening the test software on the PC (S61); and connecting the PC to the car machine 51 through a USB (Universal Serial Bus) and establishing a connection with the car machine system through the ADB (Android Debug Bridge) tool (S62).
Next, FE (Far End) and NE (Near End) are calibrated (S63). For example, the FE includes music played by the speaker inside the car machine 51, and the NE includes a human voice played by the externally connected speaker 53. The volumes of the two sounds can be calibrated so that the volume of the voice signal received at the center point of the car machine's microphone array meets a set condition. For example, if the volume levels of both signals are about 80 dBA, the calibration is successful.
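The calibration condition in S63 can be expressed as a simple check on the two measured levels. The source only says "about 80 dBA", so the ±1 dB tolerance and the function name below are assumptions made for illustration.

```python
def is_calibrated(
    fe_dba: float,
    ne_dba: float,
    target_dba: float = 80.0,
    tolerance_db: float = 1.0,  # assumed; the source does not give a tolerance
) -> bool:
    """True when both far-end and near-end levels are close to the target."""
    return all(abs(level - target_dba) <= tolerance_db for level in (fe_dba, ne_dba))


print(is_calibrated(80.3, 79.6))
print(is_calibrated(80.0, 76.0))
```

In practice the levels would come from a sound level meter placed at the center point of the microphone array, and the playback volumes would be adjusted until the check passes.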
Then, a test is performed (S64). Depending on the index to be measured, the loudspeaker can play a near-field white noise signal, and the car machine can play noise signals such as music. The PC can send the contents to be played to the loudspeaker and the car machine. In addition, the PC can control the application layer software of the car machine, the DSP software integrated in the car machine, and the like through a software protocol so that they record and play back in a loop. The microphone signal and the application layer voice signal collected by the car machine are sent to the PC.
The PC runs speech algorithms such as spectral analysis, time delay analysis, and noise suppression on the voice signals to generate a specific analysis report, which completes the automatic evaluation process (S65). The analysis report may include the various voice indexes described above. In addition, the PC can also store the played corpus and the various voice signals received from the car machine.
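The text does not say how the time delay analysis is carried out. One common technique is to search for the lag that maximizes the cross-correlation between the played reference and the recorded signal; the sketch below uses that technique as an assumption, with hypothetical function names and a brute-force search suited only to short signals.

```python
def estimate_delay(reference, recorded, max_lag):
    """Return the lag in samples that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate reference[n] with recorded[n + lag] over the overlap.
        score = sum(r * recorded[n + lag]
                    for n, r in enumerate(reference)
                    if n + lag < len(recorded))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_ms(lag_samples, sample_rate_hz):
    """Convert a lag in samples to milliseconds."""
    return 1000.0 * lag_samples / sample_rate_hz
```

For example, a lag of 5 samples at a 16 kHz sample rate corresponds to 0.3125 ms. For long recordings an FFT-based correlation would be far faster than this quadratic loop.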
By calculating various voice indexes, the embodiment of the invention can check the voice signal quality of the car machine, the system voice path, and the like. This helps to set complete and objective indexes for each supplier's hardware, clarifies the division of responsibility among the parties to a project, and allows problems to be located and solved quickly.
In addition, the voice signal detection method provided by the embodiment of the invention can detect problems with signal quality, the system path, and the like at the admission stage and notify the OEM (Original Equipment Manufacturer) to make modifications. This exposes many problems before the OEM submits the product to the car manufacturer for acceptance, which reduces the cost of voice development and greatly shortens the DSP joint debugging and integration time.
In this embodiment, the collection device under test receives the signal collection instruction sent by the control device according to the voice index to be tested, collects the voice signal to be tested that has been processed by the voice function node related to that index in the collection device, and then sends the collected voice signal to the control device. The control device can then use the voice signal to be tested to determine whether each voice function node in the collection device is normal. In addition, with the method of the embodiment of the invention, the control device can be used before the vehicle leaves the factory to check the voice signal quality, the system voice path, and the like of the car machine, which helps to find, locate, and solve problems early and reduces the voice development cost.
In addition, the method of the present embodiment may also be applied to other devices with a voice function, and detect whether each voice function node of the device is normal.
Fig. 7 shows a block diagram of a voice signal detecting apparatus according to an embodiment of the present invention. The apparatus may be provided in a control device, as shown in fig. 7, and may include:
a first sending module 70, configured to send a play instruction to a playing device, where the play instruction may include a voice content to be played;
a second sending module 71, configured to send a signal collection instruction to the collection device according to the voice index to be detected;
a first receiving module 72, configured to receive a to-be-detected voice signal returned by the collection device according to the signal collection instruction, where the to-be-detected voice signal includes a signal processed by a voice function node in the collection device, and the voice function node is related to the to-be-detected voice index;
and the analysis module 73 is configured to obtain an analysis result according to the voice signal to be detected.
In one embodiment, the playing device and the collecting device are the same device.
In one embodiment, the signal collection indication is used to instruct the collection device to collect at least one of a voice signal picked up by a microphone, a voice signal output by a DSP, and a voice signal acquired by application layer software.
In one embodiment, the analysis module is further configured to analyze the voice signal to be tested by using a voice algorithm, where the voice algorithm includes at least one of a spectrum analysis, a delay analysis, and a noise suppression of the voice signal.
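As one possible reading of the spectrum analysis mentioned above, the sketch below computes a magnitude spectrum with a naive discrete Fourier transform and reports the strongest frequency. The function names are hypothetical, and a real analysis module would more likely use an FFT library; the quadratic-time version is shown only because it is self-contained.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (fine for short frames)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples)))
            for k in range(n // 2 + 1)]


def dominant_frequency(samples, sample_rate_hz):
    """Frequency in Hz of the strongest bin, ignoring the DC bin."""
    mags = magnitude_spectrum(samples)
    peak_bin = max(range(1, len(mags)), key=mags.__getitem__)
    return peak_bin * sample_rate_hz / len(samples)
```

Comparing such spectra across microphones or against the reference signal is one way the frequency consistency indexes could be evaluated, although the text does not prescribe a specific procedure.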
In one embodiment, the voice index to be tested includes: at least one of round-trip delay value, node voice quality, near-field voice signal amplitude, system background noise of the equipment to be tested, microphone array frequency consistency, microphone array phase consistency, signal saturation detection, echo reference delay and reference signal frequency consistency.
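Signal saturation detection, one of the indexes listed above, can be illustrated by counting samples that reach full scale. The sketch assumes 16-bit PCM samples and an arbitrary pass/fail ratio of 0.1%; both are assumptions for illustration and are not values given in the text.

```python
def saturation_ratio(samples, full_scale=32767, margin=0):
    """Fraction of samples at or beyond full scale (minus an optional margin)."""
    if not samples:
        return 0.0
    limit = full_scale - margin
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)


def is_saturated(samples, max_ratio=0.001, **kwargs):
    """True when more than max_ratio of the samples are clipped (assumed threshold)."""
    return saturation_ratio(samples, **kwargs) > max_ratio
```

A nonzero `margin` can be used to flag signals that come close to clipping without quite reaching full scale.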
Fig. 8 is a block diagram showing a configuration of a voice signal detecting apparatus according to an embodiment of the present invention. The apparatus may be provided in a voice device to be tested, and as shown in fig. 8, the apparatus may include:
the second receiving module 81 is configured to receive a signal collection instruction sent by the control device according to the voice index to be detected;
the collecting module 82 is configured to collect, according to the signal collection instruction, a signal processed by the voice function node related to the voice index to be detected under the condition that the playing device plays the voice content, so as to obtain a voice signal to be detected;
and a third sending module 83, configured to send the voice signal to be detected to the control device. The control equipment can obtain an analysis result according to the voice signal to be detected.
As shown in fig. 9, the apparatus further includes:
a third receiving module 85, configured to receive a playing instruction, where the playing instruction includes a voice content to be played;
and the playback module 84 is used for playing the voice content.
In one embodiment, the collection module is further configured to collect at least one of a voice signal picked up by a microphone in the collection device, a voice signal output by a DSP, and a voice signal acquired by application layer software.
In one embodiment, the voice index to be tested includes: at least one of round-trip delay value, node voice quality, near-field voice signal amplitude, system background noise of the equipment to be tested, microphone array frequency consistency, microphone array phase consistency, signal saturation detection, echo reference delay and reference signal frequency consistency.
In one embodiment, the playing device and the collecting device are the same device.
The functions of each module in each apparatus in the embodiments of the present invention may refer to the corresponding description in the above method, and are not described herein again.
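The division of work between the two apparatuses can be pictured with the in-memory sketch below. It is not the patented implementation: the class and method names are hypothetical, the transport between the devices is replaced by direct method calls, and the analysis function is supplied by the caller.

```python
class CollectionDevice:
    """Device under test: plays content and taps the requested nodes."""

    def __init__(self, taps):
        # taps: node name -> zero-argument callable returning recorded samples
        self.taps = taps
        self.played = []

    def play(self, content):            # playback module
        self.played.append(content)

    def collect(self, instruction):     # collection module
        return {node: self.taps[node]() for node in instruction["nodes"]}


class ControlDevice:
    """Host side: sends instructions, receives signals, runs the analysis."""

    def __init__(self, device, analyze):
        self.device = device
        self.analyze = analyze

    def run_test(self, content, nodes):
        self.device.play(content)                        # play instruction
        signals = self.device.collect({"nodes": nodes})  # collection instruction and reply
        return {node: self.analyze(sig) for node, sig in signals.items()}
```

A usage example, with peak amplitude standing in for a real voice index:

```python
dev = CollectionDevice({"microphone": lambda: [1, -3, 2]})
ctrl = ControlDevice(dev, analyze=lambda sig: max(abs(s) for s in sig))
result = ctrl.run_test("white_noise.wav", ["microphone"])
```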
Fig. 10 shows a structural block diagram of a voice signal detecting apparatus according to an embodiment of the present invention. As shown in fig. 10, the voice signal detecting apparatus includes: a memory 910 and a processor 920, the memory 910 storing a computer program operable on the processor 920. The processor 920 implements the voice signal detection method in the above-described embodiments when executing the computer program. There may be one or more memories 910 and one or more processors 920.
The apparatus further comprises:
and a communication interface 930 for communicating with an external device to perform data interactive transmission.
Memory 910 may include high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
If the memory 910, the processor 920 and the communication interface 930 are implemented independently, they may be connected to each other through a bus and communicate with each other over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 10, but this does not mean that there is only one bus or one type of bus.
Optionally, in an implementation, if the memory 910, the processor 920 and the communication interface 930 are integrated on a chip, the memory 910, the processor 920 and the communication interface 930 may complete communication with each other through an internal interface.
An embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and the computer program is used for implementing the method of any one of the above embodiments when being executed by a processor.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, one skilled in the art can combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict each other.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various changes or substitutions within the technical scope of the present invention, and these should be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (18)

1. A method for detecting a speech signal, comprising:
sending a playing instruction to a playing device;
sending signal collection instructions corresponding to the voice indexes to be detected to collection equipment according to the voice indexes to be detected, wherein the signal collection instructions corresponding to each voice index are configured in advance;
receiving a voice signal to be detected returned by the collecting equipment according to the signal collecting instruction, wherein the voice signal to be detected comprises a signal processed by a voice function node related to the voice index to be detected in the collecting equipment;
and obtaining an analysis result according to the voice signal to be detected so as to determine whether each voice function node in the collection equipment is normal or not by using the voice signal to be detected.
2. The method of claim 1, wherein the playback device and the collection device are the same device.
3. The method of claim 1, wherein the signal collection indication is used to instruct the collection device to collect at least one of a voice signal picked up by a microphone, a voice signal output by a DSP, and a voice signal acquired by application layer software.
4. The method according to claim 1, wherein obtaining an analysis result according to the speech signal to be tested comprises:
and analyzing the voice signal to be detected by utilizing a voice algorithm, wherein the voice algorithm comprises at least one of spectral analysis, time delay analysis and noise suppression of the voice signal.
5. The method according to any one of claims 1 to 4, wherein the speech indicator under test comprises: at least one of round-trip delay value, node voice quality, near-field voice signal amplitude, system background noise of the equipment to be tested, microphone array frequency consistency, microphone array phase consistency, signal saturation detection, echo reference delay and reference signal frequency consistency.
6. A method for detecting a speech signal, comprising:
receiving signal collection instructions which are sent by control equipment according to voice indexes to be detected and correspond to the voice indexes to be detected, wherein the signal collection instructions corresponding to each voice index are configured in advance;
under the condition that the playing equipment plays the voice content, collecting signals processed by the voice function nodes related to the voice index to be tested according to the signal collection instruction to obtain a voice signal to be tested;
and sending the voice signal to be tested to the control equipment, wherein the voice signal to be tested is used for determining whether each voice function node in the collection equipment is normal or not.
7. The method of claim 6, further comprising:
receiving a playing instruction, wherein the playing instruction comprises voice content needing to be played;
and playing the voice content.
8. The method of claim 6, wherein collecting signals processed by the voice function node associated with the voice metric under test comprises:
and collecting at least one of voice signals picked up by a microphone in the collection device, voice signals output by the DSP and voice signals acquired by application layer software.
9. The method of claim 8, wherein the playback device and the collection device are the same device.
10. The method according to any one of claims 6 to 9, wherein the speech indicator under test comprises: at least one of round-trip delay value, node voice quality, near-field voice signal amplitude, system background noise of the equipment to be tested, microphone array frequency consistency, microphone array phase consistency, signal saturation detection, echo reference delay and reference signal frequency consistency.
11. A speech signal detection apparatus, comprising:
the first sending module is used for sending a playing instruction to the playing equipment;
the second sending module is used for sending signal collection instructions corresponding to the voice indexes to be detected to the collection equipment according to the voice indexes to be detected, wherein the signal collection instructions corresponding to each voice index are configured in advance;
the first receiving module is used for receiving a to-be-detected voice signal returned by the collecting equipment according to the signal collecting instruction, wherein the to-be-detected voice signal comprises a signal processed by a voice function node related to the to-be-detected voice index in the collecting equipment;
and the analysis module is used for obtaining an analysis result according to the voice signal to be detected so as to determine whether each voice function node in the collection equipment is normal or not by using the voice signal to be detected.
12. The apparatus of claim 11, wherein the playing device and the collecting device are the same device.
13. The apparatus of claim 11, wherein the analysis module is further configured to analyze the speech signal under test using a speech algorithm, and wherein the speech algorithm comprises at least one of a spectral analysis, a delay analysis, and a noise suppression of the speech signal.
14. A speech signal detection apparatus, comprising:
the second receiving module is used for receiving a signal collection instruction which is sent by the control equipment according to the voice indexes to be detected and corresponds to the voice indexes to be detected, and the signal collection instruction corresponding to each voice index is configured in advance;
the collection module is used for collecting signals processed by the voice function nodes related to the voice indexes to be tested according to the signal collection instructions under the condition that the playing equipment plays voice contents to obtain voice signals to be tested;
and a third sending module, configured to send the voice signal to be tested to the control device, where the voice signal to be tested is used to determine whether each voice function node in the collection device is normal.
15. The apparatus of claim 14, further comprising:
a third receiving module, configured to receive a play instruction, where the play instruction includes a voice content to be played;
and the playback module is used for playing the voice content.
16. The apparatus of claim 15, wherein the collection module is further configured to collect at least one of a voice signal picked up by a microphone in the collection device, a voice signal output by the DSP, and a voice signal obtained by application layer software.
17. A speech signal detection apparatus, characterized by comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-10.
18. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 10.
CN201910172909.8A 2019-03-07 2019-03-07 Voice signal detection method and device Active CN109979487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910172909.8A CN109979487B (en) 2019-03-07 2019-03-07 Voice signal detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910172909.8A CN109979487B (en) 2019-03-07 2019-03-07 Voice signal detection method and device

Publications (2)

Publication Number Publication Date
CN109979487A CN109979487A (en) 2019-07-05
CN109979487B CN109979487B (en) 2021-07-30

Family

ID=67078102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910172909.8A Active CN109979487B (en) 2019-03-07 2019-03-07 Voice signal detection method and device

Country Status (1)

Country Link
CN (1) CN109979487B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390954B (en) * 2019-08-06 2022-05-13 京东方科技集团股份有限公司 Method and device for evaluating quality of voice product
CN112017636B (en) * 2020-08-27 2024-02-23 大众问问(北京)信息科技有限公司 User pronunciation simulation method, system, equipment and storage medium based on vehicle

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102143524A (en) * 2010-08-31 2011-08-03 华为技术有限公司 Method, system and device for detecting voice quality
CN107886951A (en) * 2016-09-29 2018-04-06 百度在线网络技术(北京)有限公司 A kind of speech detection method, device and equipment
CN108389592A (en) * 2018-02-27 2018-08-10 上海讯飞瑞元信息技术有限公司 A kind of voice quality assessment method and device
WO2018192659A1 (en) * 2017-04-20 2018-10-25 Telefonaktiebolaget Lm Ericsson (Publ) Handling of poor audio quality in a terminal device
CN108877806A (en) * 2018-06-29 2018-11-23 中国航空无线电电子研究所 System is verified in the test for testing instruction type speech control system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100623214B1 (en) * 1999-05-25 2006-09-12 내셔널 세미컨덕터 코포레이션 Real-time quality analyzer for voice and audio signals
EP2678861B1 (en) * 2011-02-22 2018-07-11 Speak With Me, Inc. Hybridized client-server speech recognition
CN102368384A (en) * 2011-10-19 2012-03-07 福建联迪商用设备有限公司 Voice module test method and voice module test device
CN103716470B (en) * 2012-09-29 2016-12-07 华为技术有限公司 The method and apparatus of Voice Quality Monitor
CN103077727A (en) * 2013-01-04 2013-05-01 华为技术有限公司 Method and device used for speech quality monitoring and prompting
CN105989853B (en) * 2015-02-28 2020-08-18 科大讯飞股份有限公司 Audio quality evaluation method and system
CN106816158B (en) * 2015-11-30 2020-08-07 华为技术有限公司 Voice quality assessment method, device and equipment
CN109256148B (en) * 2017-07-14 2022-06-03 中国移动通信集团浙江有限公司 Voice quality assessment method and device
CN109147765B (en) * 2018-11-16 2021-09-03 安徽听见科技有限公司 Audio quality comprehensive evaluation method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
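A Parametric Objective Quality Assessment Tool for Speech Signals Degraded by Acoustic Echo; Leonardo O. Nunes, et al.; IEEE Transactions on Audio, Speech, and Language Processing; IEEE; 20120507; Vol. 20 (No. 8); pp. 2181-2190 *
A new method for QoS performance analysis of network real-time audio (网络实时音频QoS性能分析新方法); Dong Xin (董昕), et al.; Telecommunication Engineering (电讯技术); CNKI (中国知网); 20180906; Vol. 58 (No. 9); pp. 1096-1102 *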

Also Published As

Publication number Publication date
CN109979487A (en) 2019-07-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211012

Address after: 100176 Room 101, 1st floor, building 1, yard 7, Ruihe West 2nd Road, economic and Technological Development Zone, Daxing District, Beijing

Patentee after: Apollo Zhilian (Beijing) Technology Co.,Ltd.

Address before: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Patentee before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.