CN110010126B - Speech recognition method, apparatus, device and storage medium - Google Patents

Speech recognition method, apparatus, device and storage medium

Info

Publication number
CN110010126B
Authority
CN
China
Prior art keywords
voice
positions
awakening
signal
signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910180338.2A
Other languages
Chinese (zh)
Other versions
CN110010126A (en)
Inventor
陈建哲
张腾飞
向伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Original Assignee
Baidu International Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baidu International Technology Shenzhen Co ltd
Priority to CN202111266002.1A (CN113990320A)
Priority to CN201910180338.2A (CN110010126B)
Priority to CN202111055499.2A (CN113782019A)
Publication of CN110010126A
Application granted
Publication of CN110010126B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Traffic Control Systems (AREA)
  • Navigation (AREA)

Abstract

Embodiments of the invention provide a speech recognition method, apparatus, device, and storage medium. The speech recognition method may comprise: acquiring multiple wake-up voice signals from a plurality of positions; performing sound source localization on the wake-up voice signals to determine a wake-up voice position; suppressing audio signals from positions other than the wake-up voice position to obtain a signal to be recognized; and performing speech recognition on the signal to be recognized. Because the wake-up voice position is determined first and audio signals from other positions are then suppressed, the speech from the wake-up voice position is preserved, while noise signals from other positions have less influence on speech recognition and interfere less with the wake-up voice position.

Description

Speech recognition method, apparatus, device and storage medium
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a speech recognition method, apparatus, device, and storage medium.
Background
Current in-vehicle speech recognition systems usually allow only a person in one specific position to input speech, and only in a quiet environment. In a vehicle, however, several people often speak at once. For example, one occupant may be on a phone call while another wants to start an operation such as navigation by voice. If the microphone of the in-vehicle head unit picks up the phone conversation, the head unit may produce many misrecognitions.
Disclosure of Invention
Embodiments of the present invention provide a speech recognition method, apparatus, device, and storage medium, so as to solve one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a speech recognition method, including:
acquiring multiple wake-up voice signals from a plurality of positions;
performing sound source localization on the multiple wake-up voice signals to determine a wake-up voice position;
suppressing audio signals from positions other than the wake-up voice position to obtain a signal to be recognized;
and performing speech recognition on the signal to be recognized.
In an embodiment of the present invention, performing sound source localization on the multiple wake-up voice signals to determine the wake-up voice position includes:
performing sound source localization using the signal energy of the multiple wake-up voice signals, and determining the position corresponding to the wake-up voice signal with the largest signal energy as the wake-up voice position.
In one embodiment of the invention, the method further comprises:
adjusting the angle of a microphone array by beamforming so that the microphone array faces the wake-up voice position.
In an embodiment of the present invention, suppressing audio signals from positions other than the wake-up voice position to obtain a signal to be recognized includes:
receiving a first voice signal from the microphone at the wake-up voice position;
receiving second voice signals from the microphones at the other positions;
and cancelling each second voice signal from the first voice signal using a digital signal processor to obtain the signal to be recognized.
In an embodiment of the present invention, suppressing audio signals from positions other than the wake-up voice position to obtain a signal to be recognized includes:
controlling the microphones at the other positions to stop sound reception;
and receiving the signal to be recognized from the microphone at the wake-up voice position.
In a second aspect, an embodiment of the present invention provides a speech recognition apparatus, including:
an acquisition unit, configured to acquire multiple wake-up voice signals from a plurality of positions;
a sound source positioning unit, configured to perform sound source localization on the multiple wake-up voice signals and determine a wake-up voice position;
a suppression unit, configured to suppress audio signals from positions other than the wake-up voice position to obtain a signal to be recognized;
and a recognition unit, configured to perform speech recognition on the signal to be recognized.
In an embodiment of the present invention, the sound source positioning unit is further configured to perform sound source localization using the signal energy of the multiple wake-up voice signals, and to determine the position corresponding to the wake-up voice signal with the largest signal energy as the wake-up voice position.
In one embodiment of the invention, the apparatus further comprises:
a beam forming unit, configured to adjust the angle of the microphone array by beamforming so that the microphone array faces the wake-up voice position.
In one embodiment of the present invention, the suppression unit includes:
a first receiving subunit, configured to receive a first voice signal from the microphone at the wake-up voice position, and to receive second voice signals from the microphones at the other positions;
and a cancelling subunit, configured to cancel each second voice signal from the first voice signal using a digital signal processor to obtain the signal to be recognized.
In one embodiment of the present invention, the suppression unit includes:
a stop control unit, configured to control the microphones at the other positions to stop sound reception;
and a second receiving subunit, configured to receive the signal to be recognized from the microphone at the wake-up voice position.
In a third aspect, an embodiment of the present invention provides a speech recognition device, where functions of the speech recognition device may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more units corresponding to the above functions.
In one embodiment, the structure of the device includes a processor and a memory; the memory stores a program that enables the device to execute the above speech recognition method, and the processor is configured to execute the program stored in the memory. The device may also include a communication interface for communicating with other devices or a communication network.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer software instructions for a speech recognition apparatus, which includes a program for executing the speech recognition method.
One of the above technical solutions has the following advantages or beneficial effects: the wake-up voice position is determined first, and audio signals from other positions can then be suppressed, so that the speech from the wake-up voice position is preserved, noise signals from other positions have less influence on speech recognition, and interference with the wake-up voice position is reduced. This makes it easier to obtain accurate speech recognition results and improves the user experience.
Another of the above technical solutions has the following advantages or beneficial effects: with the speech recognition method provided by the embodiments of the invention, an anti-interference recognition scheme can be added to a vehicle. If a person at some position in the vehicle utters a wake-up word, that position is determined to be the wake-up voice position, and speech from the person at that position can then be recognized. People speaking at other positions do not interfere with the person at the wake-up voice position, so the user experience is better and the head unit's speech recognition is more intelligent and accurate.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
Fig. 1 schematically shows a flow chart of a speech recognition method according to an embodiment of the invention.
Fig. 2 schematically shows a flow chart of a speech recognition method according to another embodiment of the invention.
Fig. 3 schematically shows an application scenario of a speech recognition method according to yet another embodiment of the present invention.
Fig. 4 schematically shows a flow chart of a speech recognition method according to a further embodiment of the invention.
Fig. 5 schematically shows a speech recognition apparatus according to an embodiment of the invention.
Fig. 6 schematically shows a speech recognition apparatus according to another embodiment of the invention.
Fig. 7 schematically shows a speech recognition device according to an embodiment of the present invention.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
Fig. 1 schematically shows a flow chart of a speech recognition method according to an embodiment of the invention. As shown in fig. 1, the method may include:
Step 101, acquiring multiple wake-up voice signals from a plurality of positions.
Step 102, performing sound source localization on the multiple wake-up voice signals to determine the wake-up voice position.
Step 103, suppressing audio signals from positions other than the wake-up voice position to obtain a signal to be recognized.
Step 104, performing speech recognition on the signal to be recognized.
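Steps 101 to 104 can be sketched as a small pipeline. This is only an illustrative outline under stated assumptions, not the patented implementation: the function name `recognize_from_wake_position` and the three callables `localize`, `suppress`, and `recognize` are hypothetical placeholders, and the stand-ins passed at the bottom exist only so the sketch runs.

```python
from typing import Callable, Dict, List

Signal = List[float]


def recognize_from_wake_position(
    wake_signals: Dict[str, Signal],
    localize: Callable[[Dict[str, Signal]], str],
    suppress: Callable[[str, Dict[str, Signal]], Signal],
    recognize: Callable[[Signal], str],
) -> str:
    # Step 101: wake_signals holds one wake-up voice signal per position.
    # Step 102: sound source localization yields the wake-up voice position.
    wake_position = localize(wake_signals)
    # Step 103: suppress audio from every other position.
    signal_to_recognize = suppress(wake_position, wake_signals)
    # Step 104: run speech recognition on what remains.
    return recognize(signal_to_recognize)


# Toy stand-ins (hypothetical): a placeholder rule instead of a real localizer,
# a pass-through instead of real suppression, and a dummy recognizer.
result = recognize_from_wake_position(
    {"driver": [0.9, -0.8, 0.7], "front_passenger": [0.1, -0.1]},
    localize=lambda signals: max(signals, key=lambda pos: len(signals[pos])),
    suppress=lambda position, signals: signals[position],
    recognize=lambda signal: f"recognized {len(signal)} samples",
)
```

Passing the stages in as callables keeps the outline neutral about how each step is realized, which matches the text's statement that step 102 and step 103 each admit several embodiments.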
In one embodiment, a microphone array may include a plurality of microphones installed at a plurality of positions, and a designated space may be divided into a plurality of sound zones according to the microphone positions. For example, four microphones are installed in a vehicle, near the driver's seat, the front passenger seat, the left rear seat, and the right rear seat, respectively. The four microphones divide the vehicle interior into four sound zones: a driver sound zone, a front passenger sound zone, a left rear sound zone, and a right rear sound zone.
Each microphone may also be connected to a corresponding wake-up engine. The microphones can keep picking up sound while the voice function has not been woken up. If the voice signal received by a microphone includes a wake-up word, the wake-up engine connected to that microphone can wake up the voice function of that channel. Voice signals that include the wake-up word are referred to as wake-up voice signals.
In one embodiment, step 102 comprises: performing sound source localization using the signal energy of the multiple wake-up voice signals, and determining the position corresponding to the wake-up voice signal with the largest signal energy as the wake-up voice position.
The distance from a given sound source to each microphone may differ, so the energy each microphone receives from the voice signal emitted by that source may also differ. By comparing the wake-up voice signals received by the microphones, the position of the microphone that received the signal with the largest energy can be determined as the wake-up voice position.
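The energy comparison can be illustrated with a short sketch. It assumes that "signal energy" means the sum of squared samples, which the text does not specify; the position names and sample values are invented for illustration.

```python
from typing import Dict, Sequence


def signal_energy(samples: Sequence[float]) -> float:
    """Sum of squared samples (an assumed definition of signal energy)."""
    return sum(x * x for x in samples)


def locate_wake_position(wake_signals: Dict[str, Sequence[float]]) -> str:
    """Return the position whose wake-up voice signal has the largest energy."""
    if not wake_signals:
        raise ValueError("at least one wake-up voice signal is required")
    return max(wake_signals, key=lambda pos: signal_energy(wake_signals[pos]))


# Hypothetical four-zone example: the speaker sits in the driver's seat,
# so the driver microphone records the strongest copy of the wake-up word.
wake_signals = {
    "driver": [0.8, -0.7, 0.9, -0.6],
    "front_passenger": [0.3, -0.2, 0.3, -0.2],
    "rear_left": [0.1, -0.1, 0.1, -0.1],
    "rear_right": [0.1, -0.1, 0.2, -0.1],
}
wake_position = locate_wake_position(wake_signals)
```

A real system would likely compare energies over the same time window on every channel, and might normalize for microphone gain; neither detail is stated in the text.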
In one embodiment, as shown in fig. 2, the method further comprises:
Step 201, adjusting the angle of the microphone array by beamforming so that the microphone array faces the wake-up voice position. Specifically, after sound source localization, beamforming is used to steer the microphone array so that the voice signal picked up from the direction of the wake-up voice position has the greatest energy and is captured most effectively, while signals from other directions are attenuated. This provides a preliminary suppression of noise signals.
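The text does not say which beamforming technique is used. As one common possibility, the sketch below shows delay-and-sum beamforming with integer sample delays: each channel is advanced by the delay with which sound from the target direction reaches that microphone, and the aligned channels are averaged. The function name, the two-microphone setup, and the delay values are assumptions made for illustration.

```python
from typing import List, Sequence


def delay_and_sum(
    channels: Sequence[Sequence[float]], delays: Sequence[int]
) -> List[float]:
    """Advance each channel by its delay (in samples) and average the result.

    delays[i] is how many samples later the target sound reaches microphone i
    compared with the earliest microphone.
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]


# Hypothetical example: a pulse from the target direction reaches
# microphone 1 one sample after microphone 0.
mic0 = [0.0, 1.0, 0.0, 0.0, 0.0]
mic1 = [0.0, 0.0, 1.0, 0.0, 0.0]

steered = delay_and_sum([mic0, mic1], delays=[0, 1])    # aligned: pulse adds up
unsteered = delay_and_sum([mic0, mic1], delays=[0, 0])  # misaligned: pulse smears
```

With the matching delays the pulse keeps its full amplitude, while with mismatched delays it is halved and spread over two samples. This is the sense in which the target direction is emphasized and other directions are weakened.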
In one embodiment, step 103 may be implemented in several ways, for example:
Example one: audio signals from other positions are cancelled using a digital signal processor (DSP). Specifically: receiving a first voice signal from the microphone at the wake-up voice position; receiving second voice signals from the microphones at the other positions; and using the DSP to cancel each second voice signal from the first voice signal to obtain the signal to be recognized. For example, the DSP may subtract the signals of the microphones at positions other than the wake-up voice position.
In one application scenario, the voice signal received by the driver microphone includes speech that is also received by the front passenger microphone. If the voice signal received by the front passenger microphone is available, it can be cancelled from the voice signal received by the driver microphone. In this way, the influence of sound picked up at the other microphones on the signal received by the driver microphone can be removed more effectively.
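The cancellation in example one can be sketched as follows. The text only says the DSP subtracts the other microphones' signals; the least-squares gain estimate used here is an assumption, chosen because the interfering speech usually arrives at the wake-up microphone at a different level than at its own microphone. A practical DSP would more likely use an adaptive filter that also handles delay and reverberation. All names and sample values are hypothetical.

```python
from typing import List, Sequence


def cancel_reference(
    primary: Sequence[float], reference: Sequence[float]
) -> List[float]:
    """Remove the part of `primary` that is a scaled copy of `reference`."""
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        return list(primary)  # silent reference: nothing to cancel
    # Least-squares estimate of how strongly the reference leaks into primary.
    gain = sum(p * r for p, r in zip(primary, reference)) / ref_energy
    return [p - gain * r for p, r in zip(primary, reference)]


def suppress_other_positions(
    first: Sequence[float], seconds: Sequence[Sequence[float]]
) -> List[float]:
    """Cancel each second voice signal from the first voice signal in turn."""
    cleaned = list(first)
    for second in seconds:
        cleaned = cancel_reference(cleaned, second)
    return cleaned


# Hypothetical example: the driver microphone hears the driver's command plus
# half-strength leakage of the front passenger's phone call.
command = [1.0, -1.0, 1.0, -1.0]
phone_call = [1.0, 1.0, -1.0, -1.0]
driver_mic = [c + 0.5 * p for c, p in zip(command, phone_call)]

cleaned = suppress_other_positions(driver_mic, [phone_call])
```

The example is idealized: the passenger microphone is assumed to contain only the phone call, and the two speech signals are uncorrelated, so the command is recovered exactly. Real recordings overlap, so the result would only approximate the wanted speech.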
Example two: the microphone at the wake-up voice position keeps picking up sound, and the microphones at the other positions are prevented from picking up sound. Specifically: controlling the microphones at the other positions to stop sound reception; and receiving the signal to be recognized from the microphone at the wake-up voice position.
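Example two amounts to gating the microphones. The sketch below models this in software with a hypothetical `MicrophoneGate` class; how a real head unit disables a microphone (driver call, hardware mute, or dropping frames) is not described in the text, so the dictionary of enabled flags is purely an illustrative assumption.

```python
from typing import Dict, List


class MicrophoneGate:
    """Tracks which microphones are allowed to pick up sound."""

    def __init__(self, positions: List[str]) -> None:
        # Before wake-up, every microphone keeps picking up sound.
        self.enabled: Dict[str, bool] = {pos: True for pos in positions}

    def lock_to(self, wake_position: str) -> None:
        """Stop sound reception everywhere except the wake-up voice position."""
        if wake_position not in self.enabled:
            raise KeyError(f"unknown position: {wake_position}")
        for pos in self.enabled:
            self.enabled[pos] = pos == wake_position

    def capture(self, frames: Dict[str, List[float]]) -> Dict[str, List[float]]:
        """Keep only the frames that come from enabled microphones."""
        return {
            pos: frame
            for pos, frame in frames.items()
            if self.enabled.get(pos, False)
        }


gate = MicrophoneGate(["driver", "front_passenger", "rear_left", "rear_right"])
gate.lock_to("driver")
captured = gate.capture({"driver": [0.2, 0.3], "front_passenger": [0.9, 0.8]})
```

Compared with example one, this approach needs no signal processing, but it cannot remove speech from other seats that leaks into the one microphone that stays on.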
According to the method provided by the embodiments of the invention, the wake-up voice position is determined first, and audio signals from other positions can then be suppressed, so that the speech from the wake-up voice position is preserved, noise signals from other positions have less influence on speech recognition, and interference with the wake-up voice position is reduced. This makes it easier to obtain accurate speech recognition results and improves the user experience.
In an application example, consider the speech recognition system of a vehicle. As shown in fig. 3, the vehicle interior has four microphones 301 at four positions (e.g., the driver's seat, the front passenger seat, the left rear seat, and the right rear seat). Each microphone 301 is connected to one wake-up engine 302, for a total of four wake-up engines. The voice signals received by the four microphones are all fed into the same DSP for suppression processing, which provides the anti-interference effect. In addition, the system may include one recognition engine 303 that performs speech recognition on the signal after DSP suppression processing.
As shown in fig. 4, the flow of speech recognition may include:
Step 401, if the speech emitted by a sound source includes a wake-up word, a position is woken up. For example, the microphone near the sound source receives a voice signal that includes the wake-up word, and the wake-up word wakes up the sound zone in which that microphone is located. Because the voice signals have not yet undergone DSP suppression processing, the voice signal also reaches the other three positions, so the sound zones at those positions may be woken up as well. In this case, the four microphones of the head unit record four channels: the driver, front passenger, left rear, and right rear positions all have recordings.
Step 402, during initialization the DSP can use beamforming to steer the microphone array toward the corresponding four positions. The four wake-up voice signals from the four positions are therefore input into the DSP for processing. Through sound source localization, the position with the largest signal energy is taken as the wake-up voice position. The DSP then suppresses the audio signals from the other three positions algorithmically to obtain a clean voice signal from the wake-up voice position.
Step 403, the sound zone of the wake-up voice position can be locked, so that only the signal to be recognized from that sound zone is acquired. During speech recognition, a clean signal to be recognized for that sound zone can likewise be obtained through the DSP.
Step 404, once the sound zone of the wake-up voice position has been determined, the speech recognition function responds only to the signal to be recognized from that sound zone. If people at other positions are chatting, their speech does not interfere. For example, suppose the driver wakes up the system from the driver sound zone and then issues a navigation command while someone in the front passenger seat is on a phone call. Because only the voice signal from the driver position is acquired, and that signal has undergone DSP suppression processing, the driver's navigation command can be recognized correctly.
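The control logic of steps 401 to 404 can be sketched as a small session object: several zones may wake at once, the zone with the largest wake-up energy is locked, and later speech is handled only if it comes from the locked zone. The class name `ZoneSession`, its methods, and the energy values are hypothetical, and the audio processing itself is left out.

```python
from typing import Dict, Optional


class ZoneSession:
    """Locks onto one sound zone after wake-up and ignores the others."""

    def __init__(self) -> None:
        self.locked_zone: Optional[str] = None

    def on_wake(self, wake_energy_by_zone: Dict[str, float]) -> str:
        """Steps 401-403: several zones may wake; lock the most energetic one."""
        self.locked_zone = max(wake_energy_by_zone, key=wake_energy_by_zone.get)
        return self.locked_zone

    def on_utterance(self, zone: str, text: str) -> Optional[str]:
        """Step 404: respond only to speech from the locked zone."""
        if zone != self.locked_zone:
            return None
        return text


session = ZoneSession()
locked = session.on_wake(
    {"driver": 9.1, "front_passenger": 2.4, "rear_left": 0.7, "rear_right": 0.6}
)
ignored = session.on_utterance("front_passenger", "so I told him yesterday")
handled = session.on_utterance("driver", "navigate home")
```

In this toy run the driver zone is locked, the front passenger's phone conversation is ignored, and the driver's navigation command is passed on, mirroring the scenario in step 404.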
With the speech recognition method provided by the embodiments of the invention, an anti-interference recognition scheme can be added to a vehicle. If a person at some position in the vehicle utters a wake-up word, that position is determined to be the wake-up voice position, and speech from the person at that position can then be recognized. People speaking at other positions do not interfere with the person at the wake-up voice position, so the user experience is better and the head unit's speech recognition is more intelligent and accurate.
Fig. 5 schematically shows a speech recognition apparatus according to an embodiment of the invention. As shown in fig. 5, the apparatus may include:
an obtaining unit 501, configured to obtain multiple wake-up voice signals from multiple locations;
a sound source positioning unit 502, configured to perform sound source positioning on the multiple wake-up voice signals, and determine a wake-up voice position;
a suppressing unit 503, configured to suppress audio signals at other positions than the wake-up voice position to obtain a signal to be recognized;
a recognition unit 504, configured to perform speech recognition on the signal to be recognized.
In an embodiment of the present invention, the sound source positioning unit is further configured to perform sound source localization using the signal energy of the multiple wake-up voice signals, and to determine the position corresponding to the wake-up voice signal with the largest signal energy as the wake-up voice position.
In one embodiment of the present invention, as shown in fig. 6, the apparatus further comprises:
a beam forming unit 601, configured to adjust an angle of the microphone array by using beam forming, so that the microphone array faces the wake-up voice position.
In one embodiment of the present invention, the suppressing unit 503 includes:
a first receiving subunit, configured to receive a first voice signal from the microphone at the wake-up voice position, and to receive second voice signals from the microphones at the other positions;
and a cancelling subunit, configured to cancel each second voice signal from the first voice signal using a digital signal processor to obtain the signal to be recognized.
In one embodiment of the present invention, the suppressing unit 503 includes:
a stop control unit, configured to control the microphones at the other positions to stop sound reception;
and a second receiving subunit, configured to receive the signal to be recognized from the microphone at the wake-up voice position.
The functions of each unit in each device in the embodiments of the present invention may refer to the corresponding description in the above method, and are not described herein again.
Fig. 7 schematically shows a speech recognition device according to an embodiment of the present invention. As shown in fig. 7, the speech recognition device includes a memory 910 and a processor 920, the memory 910 storing a computer program that can run on the processor 920. The processor 920 implements the speech recognition method of the above embodiments when executing the computer program. There may be one or more memories 910 and one or more processors 920.
The device further comprises:
a communication interface 930, configured to communicate with external devices for data exchange.
The memory 910 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
If the memory 910, the processor 920 and the communication interface 930 are implemented independently, they may be connected to each other through a bus and communicate with each other over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 7, but this does not mean that there is only one bus or one type of bus.
Optionally, in an implementation, if the memory 910, the processor 920 and the communication interface 930 are integrated on a chip, the memory 910, the processor 920 and the communication interface 930 may complete communication with each other through an internal interface.
An embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and the computer program is used for implementing the method of any one of the above embodiments when being executed by a processor.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as a separate product, it may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
The above description covers only specific embodiments of the present invention, and the scope of protection of the present invention is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the appended claims.

Claims (10)

1. A speech recognition method, comprising:
acquiring multi-channel wake-up voice signals from a plurality of positions;
performing sound source localization on the multi-channel wake-up voice signals to determine a wake-up voice position;
suppressing audio signals at positions other than the wake-up voice position, by stopping sound reception of the audio signals at those other positions, to obtain a signal to be recognized; and
performing speech recognition on the signal to be recognized.
2. The method of claim 1, wherein performing sound source localization on the multi-channel wake-up voice signals to determine the wake-up voice position comprises:
performing sound source localization using the signal energy of the multi-channel wake-up voice signals, and determining the position corresponding to the channel of wake-up voice signal with the maximum signal energy as the wake-up voice position.
3. The method of claim 1, further comprising:
adjusting the angle of a microphone array by means of beamforming so that the microphone array faces the wake-up voice position.
4. The method according to any one of claims 1 to 3, wherein suppressing the audio signals at the positions other than the wake-up voice position, by stopping sound reception of the audio signals at those other positions, to obtain the signal to be recognized comprises:
controlling the microphones at the other positions to stop receiving sound; and
receiving the signal to be recognized from the microphone at the wake-up voice position.
5. A speech recognition apparatus, comprising:
an acquisition unit configured to acquire multi-channel wake-up voice signals from a plurality of positions;
a sound source localization unit configured to perform sound source localization on the multi-channel wake-up voice signals to determine a wake-up voice position;
a suppression unit configured to suppress audio signals at positions other than the wake-up voice position, by stopping sound reception of the audio signals at those other positions, to obtain a signal to be recognized; and
a recognition unit configured to perform speech recognition on the signal to be recognized.
6. The apparatus according to claim 5, wherein the sound source localization unit is further configured to perform sound source localization using the signal energy of the multi-channel wake-up voice signals, and to determine the position corresponding to the channel of wake-up voice signal with the maximum signal energy as the wake-up voice position.
7. The apparatus of claim 5, further comprising:
a beamforming unit configured to adjust the angle of a microphone array by means of beamforming so that the microphone array faces the wake-up voice position.
8. The apparatus according to any one of claims 5 to 7, wherein the suppression unit comprises:
a stop control unit configured to control the microphones at the other positions to stop receiving sound; and
a second receiving subunit configured to receive the signal to be recognized from the microphone at the wake-up voice position.
9. A speech recognition device, comprising:
one or more processors;
a storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 4.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 4.
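
As a rough illustration of the flow recited in claims 1, 2 and 4 above, the following Python sketch selects the position whose wake-up channel carries the most energy and then keeps only the audio picked up at that position. It is a minimal sketch under stated assumptions, not the patented implementation: the function names, the position labels, the use of mean squared amplitude as the energy measure, and the `recognizer` callable are all hypothetical, and stopping sound reception at the other microphones is only simulated here by discarding their channels.

```python
from typing import Callable, Dict, List, Sequence

Channel = Sequence[float]


def signal_energy(samples: Channel) -> float:
    """Mean squared amplitude of one channel (an assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def locate_wake_position(wake_signals: Dict[str, Channel]) -> str:
    """Return the position whose wake-up channel has the largest energy."""
    if not wake_signals:
        raise ValueError("no wake-up voice signals were provided")
    return max(wake_signals, key=lambda pos: signal_energy(wake_signals[pos]))


def signal_to_recognize(audio_by_position: Dict[str, Channel],
                        wake_position: str) -> List[float]:
    """Keep only the audio from the wake-up position.

    Hypothetical stand-in for stopping sound reception at the other
    microphones: their channels are simply never read.
    """
    return list(audio_by_position[wake_position])


def recognize_after_wake(wake_signals: Dict[str, Channel],
                         audio_by_position: Dict[str, Channel],
                         recognizer: Callable[[List[float]], str]) -> str:
    """Localize the speaker, drop the other positions, then run recognition."""
    position = locate_wake_position(wake_signals)
    return recognizer(signal_to_recognize(audio_by_position, position))


# Illustrative data only: three hypothetical in-vehicle positions.
wake = {
    "driver": [0.02, -0.03, 0.01, 0.02],
    "front_passenger": [0.40, -0.55, 0.47, -0.38],
    "rear_left": [0.05, -0.04, 0.06, -0.05],
}
command_audio = {
    "driver": [0.30, 0.31],
    "front_passenger": [0.10, -0.12, 0.11],
    "rear_left": [0.20],
}
wake_position = locate_wake_position(wake)
```

The `recognizer` argument is left as a plain callable so that any speech recognition backend could be plugged in; the sketch makes no assumption about which one is used.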
CN201910180338.2A 2019-03-11 2019-03-11 Speech recognition method, apparatus, device and storage medium Active CN110010126B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202111266002.1A CN113990320A (en) 2019-03-11 2019-03-11 Speech recognition method, apparatus, device and storage medium
CN201910180338.2A CN110010126B (en) 2019-03-11 2019-03-11 Speech recognition method, apparatus, device and storage medium
CN202111055499.2A CN113782019A (en) 2019-03-11 2019-03-11 Speech recognition method, apparatus, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910180338.2A CN110010126B (en) 2019-03-11 2019-03-11 Speech recognition method, apparatus, device and storage medium

Related Child Applications (2)

Application Number Title Priority Date Filing Date
CN202111266002.1A Division CN113990320A (en) 2019-03-11 2019-03-11 Speech recognition method, apparatus, device and storage medium
CN202111055499.2A Division CN113782019A (en) 2019-03-11 2019-03-11 Speech recognition method, apparatus, device and storage medium

Publications (2)

Publication Number Publication Date
CN110010126A CN110010126A (en) 2019-07-12
CN110010126B true CN110010126B (en) 2021-10-08

Family

ID=67166812

Family Applications (3)

Application Number Title Priority Date Filing Date
CN201910180338.2A Active CN110010126B (en) 2019-03-11 2019-03-11 Speech recognition method, apparatus, device and storage medium
CN202111266002.1A Withdrawn CN113990320A (en) 2019-03-11 2019-03-11 Speech recognition method, apparatus, device and storage medium
CN202111055499.2A Pending CN113782019A (en) 2019-03-11 2019-03-11 Speech recognition method, apparatus, device and storage medium


Country Status (1)

Country Link
CN (3) CN110010126B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110364176A (en) * 2019-08-21 2019-10-22 百度在线网络技术(北京)有限公司 Audio signal processing method and device
CN110517677B (en) * 2019-08-27 2022-02-08 腾讯科技(深圳)有限公司 Speech processing system, method, apparatus, speech recognition system, and storage medium
CN110673096B (en) * 2019-09-30 2022-02-01 北京地平线机器人技术研发有限公司 Voice positioning method and device, computer readable storage medium and electronic equipment
CN113066504A (en) * 2019-12-31 2021-07-02 上海汽车集团股份有限公司 Audio transmission method, device and computer storage medium
CN111599357A (en) * 2020-04-07 2020-08-28 宁波吉利汽车研究开发有限公司 In-vehicle multi-tone-area pickup method and device, electronic equipment and storage medium
CN111599366B (en) * 2020-05-19 2024-04-12 科大讯飞股份有限公司 Vehicle-mounted multitone region voice processing method and related device
CN111968642A (en) * 2020-08-27 2020-11-20 北京百度网讯科技有限公司 Voice data processing method and device and intelligent vehicle
CN112002340B (en) * 2020-09-03 2024-08-23 北京海云捷迅科技股份有限公司 Multi-user-based voice acquisition method and device
CN112460757A (en) * 2020-11-13 2021-03-09 芜湖美智空调设备有限公司 Air conditioner, voice control method thereof and storage medium
CN112669837B (en) * 2020-12-15 2022-12-06 北京百度网讯科技有限公司 Awakening method and device of intelligent terminal and electronic equipment
US11682411B2 (en) 2021-08-31 2023-06-20 Spotify Ab Wind noise suppresor
CN114974239A (en) * 2022-05-14 2022-08-30 云知声智能科技股份有限公司 Voice interaction method and device, electronic equipment and storage medium
CN115691490A (en) * 2022-10-09 2023-02-03 蔚来汽车科技(安徽)有限公司 Method for dynamically switching sound zone, voice interaction method, equipment, medium and vehicle

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6965863B1 (en) * 1998-11-12 2005-11-15 Microsoft Corporation Speech recognition user interface
CN105280183A (en) * 2015-09-10 2016-01-27 百度在线网络技术(北京)有限公司 Voice interaction method and system
CN105719644A (en) * 2014-12-04 2016-06-29 中兴通讯股份有限公司 Method and device for adaptively adjusting voice recognition rate
CN105976815A (en) * 2016-04-22 2016-09-28 乐视控股(北京)有限公司 Vehicle voice recognition method and vehicle voice recognition device
CN107180627A (en) * 2017-06-22 2017-09-19 歌尔股份有限公司 The method and apparatus for removing noise
CN108122556A (en) * 2017-08-08 2018-06-05 问众智能信息科技(北京)有限公司 Reduce the method and device that driver's voice wakes up instruction word false triggering
CN108286386A (en) * 2018-01-22 2018-07-17 奇瑞汽车股份有限公司 The method and apparatus of vehicle window control

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9493130B2 (en) * 2011-04-22 2016-11-15 Angel A. Penilla Methods and systems for communicating content to connected vehicle users based detected tone/mood in voice input
CN105103227A (en) * 2013-03-15 2015-11-25 英特尔公司 Mechanism for facilitating dynamic adjustment of audio input/output (I/O) setting devices at conferencing computing devices
EP3379844A4 (en) * 2015-11-17 2018-11-14 Sony Corporation Information processing device, information processing method, and program
CN105957523A (en) * 2016-04-22 2016-09-21 乐视控股(北京)有限公司 Vehicular system control method and device
US10448150B2 (en) * 2016-06-03 2019-10-15 Faraday & Future Inc. Method and apparatus to detect and isolate audio in a vehicle using multiple microphones
CN108073381A (en) * 2016-11-15 2018-05-25 腾讯科技(深圳)有限公司 A kind of object control method, apparatus and terminal device
CN106878281B (en) * 2017-01-11 2020-03-31 上海蔚来汽车有限公司 In-vehicle positioning device and method based on mixed audio and in-vehicle equipment control system
CN106782585B (en) * 2017-01-26 2020-03-20 芋头科技(杭州)有限公司 Pickup method and system based on microphone array
CN108877827B (en) * 2017-05-15 2021-04-20 福州瑞芯微电子股份有限公司 Voice-enhanced interaction method and system, storage medium and electronic equipment
CN107554456A (en) * 2017-08-31 2018-01-09 上海博泰悦臻网络技术服务有限公司 Vehicle-mounted voice control system and its control method
CN107919119A (en) * 2017-11-16 2018-04-17 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and the computer-readable medium of more equipment interaction collaborations
CN108986833A (en) * 2018-08-21 2018-12-11 广州市保伦电子有限公司 Sound pick-up method, system, electronic equipment and storage medium based on microphone array
CN109391528A (en) * 2018-08-31 2019-02-26 百度在线网络技术(北京)有限公司 Awakening method, device, equipment and the storage medium of speech-sound intelligent equipment
CN109192203B (en) * 2018-09-29 2021-08-10 百度在线网络技术(北京)有限公司 Multi-sound-zone voice recognition method, device and storage medium
CN109448718A (en) * 2018-12-11 2019-03-08 广州小鹏汽车科技有限公司 A kind of audio recognition method and system based on multi-microphone array


Also Published As

Publication number Publication date
CN113782019A (en) 2021-12-10
CN110010126A (en) 2019-07-12
CN113990320A (en) 2022-01-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211012

Address after: 100176 Room 101, 1st floor, building 1, yard 7, Ruihe West 2nd Road, economic and Technological Development Zone, Daxing District, Beijing

Patentee after: Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd.

Address before: 518000 301, floor 3, unit D, productivity building, No. 5, Gaoxin middle 2nd Road, Nanshan District, Shenzhen, Guangdong

Patentee before: BAIDU INTERNATIONAL TECHNOLOGY (SHENZHEN) Co.,Ltd.