CN110010126B - Speech recognition method, apparatus, device and storage medium - Google Patents
Speech recognition method, apparatus, device and storage medium
Info
- Publication number
- CN110010126B CN201910180338.2A
- Authority
- CN
- China
- Prior art keywords
- voice
- positions
- awakening
- signal
- signals
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- General Physics & Mathematics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Circuit For Audible Band Transducer (AREA)
- Traffic Control Systems (AREA)
- Navigation (AREA)
Abstract
Embodiments of the invention provide a speech recognition method, apparatus, device and storage medium. The speech recognition method may include: acquiring multiple channels of wake-up voice signals from a plurality of positions; performing sound source localization on the multi-channel wake-up voice signals to determine a wake-up voice position; suppressing audio signals from positions other than the wake-up voice position to obtain a signal to be recognized; and performing speech recognition on the signal to be recognized. Because the wake-up voice position is determined first, audio signals from the other positions can be suppressed, so that the voice at the wake-up voice position remains usable and noise from other positions interferes less with speech recognition.
Description
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a speech recognition method, apparatus, device, and storage medium.
Background
Current in-vehicle speech recognition systems usually only allow a person at a specific position to input speech in a quiet environment. In a vehicle, however, several people are often speaking at the same time. For example, one occupant is on a phone call while another wants to start an operation such as navigation by voice. If the phone conversation is picked up by a microphone of the in-vehicle head unit, the head unit may produce many false recognitions.
Disclosure of Invention
Embodiments of the present invention provide a speech recognition method, apparatus, device, and storage medium, so as to solve one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a speech recognition method, including:
acquiring multiple channels of wake-up voice signals from a plurality of positions;
performing sound source localization on the multi-channel wake-up voice signals to determine a wake-up voice position;
suppressing audio signals from positions other than the wake-up voice position to obtain a signal to be recognized;
and performing speech recognition on the signal to be recognized.
In an embodiment of the present invention, performing sound source localization on the multi-channel wake-up voice signals to determine the wake-up voice position includes:
performing sound source localization using the signal energy of the multi-channel wake-up voice signals, and determining the position corresponding to the wake-up voice signal with the largest signal energy as the wake-up voice position.
In one embodiment of the invention, the method further comprises:
adjusting the angle of a microphone array by beamforming so that the microphone array faces the wake-up voice position.
In an embodiment of the present invention, suppressing audio signals from positions other than the wake-up voice position to obtain a signal to be recognized includes:
receiving a first voice signal from the microphone at the wake-up voice position;
receiving second voice signals from the microphones at the other positions;
and cancelling each second voice signal from the first voice signal using a digital signal processor to obtain the signal to be recognized.
In an embodiment of the present invention, suppressing audio signals from positions other than the wake-up voice position to obtain a signal to be recognized includes:
controlling the microphones at the other positions to stop picking up sound;
and receiving the signal to be recognized from the microphone at the wake-up voice position.
In a second aspect, an embodiment of the present invention provides a speech recognition apparatus, including:
an acquisition unit, configured to acquire multiple channels of wake-up voice signals from a plurality of positions;
a sound source localization unit, configured to perform sound source localization on the multi-channel wake-up voice signals and determine a wake-up voice position;
a suppression unit, configured to suppress audio signals from positions other than the wake-up voice position to obtain a signal to be recognized;
and a recognition unit, configured to perform speech recognition on the signal to be recognized.
In an embodiment of the present invention, the sound source localization unit is further configured to perform sound source localization using the signal energy of the multi-channel wake-up voice signals, and to determine the position corresponding to the wake-up voice signal with the largest signal energy as the wake-up voice position.
In one embodiment of the invention, the apparatus further comprises:
a beamforming unit, configured to adjust the angle of the microphone array by beamforming so that the microphone array faces the wake-up voice position.
In one embodiment of the present invention, the suppression unit includes:
a first receiving subunit, configured to receive a first voice signal from the microphone at the wake-up voice position and to receive second voice signals from the microphones at the other positions;
and a cancellation subunit, configured to cancel each second voice signal from the first voice signal using a digital signal processor to obtain the signal to be recognized.
In one embodiment of the present invention, the suppression unit includes:
a stop control unit, configured to control the microphones at the other positions to stop picking up sound;
and a second receiving subunit, configured to receive the signal to be recognized from the microphone at the wake-up voice position.
In a third aspect, an embodiment of the present invention provides a speech recognition device, where functions of the speech recognition device may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more units corresponding to the above functions.
In one embodiment, the device includes a processor and a memory. The memory stores a program that enables the device to execute the above speech recognition method, and the processor is configured to execute the program stored in the memory. The device may also include a communication interface for communicating with other devices or a communication network.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer software instructions for a speech recognition apparatus, which includes a program for executing the speech recognition method.
One of the above technical solutions has the following advantages or beneficial effects: because the wake-up voice position is determined first, audio signals from the other positions can be suppressed, so that the voice at the wake-up voice position remains usable and noise from other positions interferes less with speech recognition. This makes accurate speech recognition results easier to obtain and improves the user experience.
Another of the above technical solutions has the following advantages or beneficial effects: with the speech recognition method provided by the embodiments of the invention, an anti-interference recognition scheme can be added to a vehicle. If a person at some position in the vehicle utters a wake-up word, that position is determined to be the wake-up voice position, and the words subsequently spoken by the person at that position are recognized. People speaking at other positions do not interfere with the person at the wake-up voice position, so the user experience is better and the speech recognition of the in-vehicle head unit is more intelligent and accurate.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
Fig. 1 schematically shows a flow chart of a speech recognition method according to an embodiment of the invention.
Fig. 2 schematically shows a flow chart of a speech recognition method according to another embodiment of the invention.
Fig. 3 schematically shows a schematic view of an application scenario of a speech recognition method according to yet another embodiment of the present invention.
Fig. 4 schematically shows a flow chart of a speech recognition method according to a further embodiment of the invention.
Fig. 5 schematically shows a schematic view of a speech recognition arrangement according to an embodiment of the invention.
Fig. 6 schematically shows a schematic view of a speech recognition arrangement according to another embodiment of the invention.
Fig. 7 schematically shows a schematic view of a speech recognition device according to an embodiment of the present invention.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
Fig. 1 schematically shows a flow chart of a speech recognition method according to an embodiment of the invention. As shown in fig. 1, the method may include:
Step 101: acquire multiple channels of wake-up voice signals from a plurality of positions.
Step 102: perform sound source localization on the multi-channel wake-up voice signals and determine the wake-up voice position.
Step 103: suppress the audio signals from positions other than the wake-up voice position to obtain a signal to be recognized.
Step 104: perform speech recognition on the signal to be recognized.
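The flow of Fig. 1 can be sketched in a few lines. This is only an illustrative outline, not the patented implementation: the function name `recognize_from_zones`, the zone labels, and the three callables `locate`, `suppress` and `recognize` are hypothetical stand-ins for the localization, suppression and recognition stages, and the wake-up signals are assumed to have been acquired already.

```python
from typing import Callable, Dict, Sequence

Signal = Sequence[float]


def recognize_from_zones(
    wake_signals: Dict[str, Signal],
    locate: Callable[[Dict[str, Signal]], str],
    suppress: Callable[[str], Signal],
    recognize: Callable[[Signal], str],
) -> str:
    """Chain localization, suppression and recognition.

    wake_signals maps a (hypothetical) zone label to the wake-up voice
    signal acquired at that position.
    """
    # Determine the wake-up voice position from the acquired signals.
    wake_zone = locate(wake_signals)
    # Obtain the signal to be recognized with the other positions suppressed.
    to_recognize = suppress(wake_zone)
    # Run speech recognition on the suppressed signal.
    return recognize(to_recognize)
```

Passing the stages in as callables keeps the outline independent of how each stage is realized, for example DSP cancellation versus muting the other microphones.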
In one embodiment, the microphone array may include a plurality of microphones installed at a plurality of positions, and a designated space may be divided into a plurality of sound zones according to the positions of the microphones. For example, four microphones are installed in a vehicle, close to the driver's seat, the front passenger seat, the left rear seat and the right rear seat respectively. The four microphones divide the space inside the vehicle into four sound zones: a driver sound zone, a front passenger sound zone, a left rear sound zone and a right rear sound zone.
Each microphone may also be connected to a corresponding wake-up engine. While the voice function has not been woken up, the microphones keep picking up sound. If the voice signal received by a microphone includes a wake-up word, the wake-up engine connected to that microphone can wake up the voice function of that channel. Voice signals that include the wake-up word are referred to simply as wake-up voice signals.
In one embodiment, step 102 comprises: performing sound source localization using the signal energy of the multi-channel wake-up voice signals, and determining the position corresponding to the wake-up voice signal with the largest signal energy as the wake-up voice position.
The distance from a given sound source to each microphone may differ, so the energy of the voice signal that each microphone receives from that source may also differ. By comparing the wake-up voice signals received by the microphones, the position of the microphone that received the signal with the largest energy can be determined as the wake-up voice position.
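The energy comparison described above can be illustrated with a short sketch. The text does not say how signal energy is measured, so mean squared amplitude is assumed here; the function names and zone labels are hypothetical.

```python
from typing import Dict, Sequence


def signal_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one channel (0.0 for an empty channel)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def locate_wake_zone(wake_signals: Dict[str, Sequence[float]]) -> str:
    """Return the zone whose wake-up voice signal has the largest energy."""
    if not wake_signals:
        raise ValueError("no wake-up voice signals were provided")
    return max(wake_signals, key=lambda zone: signal_energy(wake_signals[zone]))


# Example: the speaker sits closest to the driver microphone.
zones = {
    "driver": [0.8, -0.7, 0.9],
    "front_passenger": [0.3, -0.2, 0.3],
    "rear_left": [0.1, -0.1, 0.1],
    "rear_right": [0.1, 0.0, -0.1],
}
wake_zone = locate_wake_zone(zones)
```

Normalizing by the channel length keeps the comparison fair if the channels happen to contain different numbers of samples.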
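In one embodiment, as shown in fig. 2, the method further comprises: adjusting the angle of the microphone array by beamforming so that the microphone array faces the wake-up voice position.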
In one embodiment, step 103 may include a variety of ways, exemplified by the following:
Example one: cancel the audio signals from the other positions using a Digital Signal Processor (DSP). Specifically: receive a first voice signal from the microphone at the wake-up voice position; receive second voice signals from the microphones at the other positions; and use the DSP to cancel each second voice signal from the first voice signal to obtain the signal to be recognized. For example, the DSP may subtract the signals of the microphones at positions other than the wake-up voice position.
In one application scenario, the voice signal received by the driver microphone contains the voice signal received by the front passenger microphone. If the voice signal received by the front passenger microphone is available, it can be cancelled from the voice signal received by the driver microphone. In this way, the influence of the sound picked up at other microphones on the signal received by the driver microphone can be removed more effectively.
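A minimal sketch of this kind of cancellation follows. It is an assumption-laden simplification rather than the DSP algorithm of the patent, which the text does not specify: each reference channel is assumed to leak into the primary channel as a scaled, time-aligned copy, the scale is estimated by least-squares projection, and the references are removed one after another. Practical systems typically use adaptive filters to handle delay and reverberation. The names `dot` and `cancel_references` are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally long sample sequences."""
    return sum(x * y for x, y in zip(a, b))


def cancel_references(
    primary: Sequence[float], references: Sequence[Sequence[float]]
) -> List[float]:
    """Subtract a scaled copy of each reference channel from the primary one.

    primary    : first voice signal (microphone at the wake-up position)
    references : second voice signals (microphones at the other positions)
    """
    residual = list(primary)
    for ref in references:
        if len(ref) != len(residual):
            raise ValueError("all channels must have the same length")
        ref_energy = dot(ref, ref)
        if ref_energy == 0.0:
            continue  # a silent reference contributes nothing to remove
        # Gain that minimizes the energy of (residual - gain * ref).
        gain = dot(residual, ref) / ref_energy
        residual = [x - gain * r for x, r in zip(residual, ref)]
    return residual


# Example: the driver channel holds the target plus half of the passenger signal.
target = [1.0, -1.0, 1.0, -1.0]
passenger = [1.0, 1.0, 1.0, 1.0]
driver = [t + 0.5 * p for t, p in zip(target, passenger)]
cleaned = cancel_references(driver, [passenger])
```

In this toy case the target is uncorrelated with the reference, so the projection removes the leaked component and leaves the target unchanged.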
Example two: let only the microphone at the wake-up voice position pick up sound, and disable sound pickup by the microphones at the other positions. Specifically: control the microphones at the other positions to stop picking up sound; and receive the signal to be recognized from the microphone at the wake-up voice position.
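The muting approach of example two can be sketched as a small piece of state that records which zones are still allowed to capture audio. The class `ZoneMicrophones`, its methods and the zone labels are hypothetical; how a real head unit disables a microphone is hardware-specific and is not described in the text.

```python
from typing import Dict, Iterable, List, Sequence


class ZoneMicrophones:
    """Tracks which zone microphones are currently allowed to pick up sound."""

    def __init__(self, zones: Iterable[str]) -> None:
        # Before wake-up, every microphone keeps picking up sound.
        self.enabled: Dict[str, bool] = {zone: True for zone in zones}

    def keep_only(self, wake_zone: str) -> None:
        """Stop sound pickup everywhere except at the wake-up voice position."""
        if wake_zone not in self.enabled:
            raise KeyError(f"unknown zone: {wake_zone}")
        for zone in self.enabled:
            self.enabled[zone] = zone == wake_zone

    def capture(self, frames: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
        """Return only the frames from microphones that are still enabled."""
        return {
            zone: list(samples)
            for zone, samples in frames.items()
            if self.enabled.get(zone, False)
        }


mics = ZoneMicrophones(["driver", "front_passenger", "rear_left", "rear_right"])
mics.keep_only("rear_left")
to_recognize = mics.capture({"driver": [0.4, 0.5], "rear_left": [0.2, -0.3]})
```

Compared with cancellation, this variant needs no signal processing, at the cost of still passing through whatever speech from other seats reaches the one active microphone.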
In the method provided by the embodiments of the invention, the wake-up voice position is determined first and the audio signals from the other positions can then be suppressed, so that the voice at the wake-up voice position remains usable and noise from other positions interferes less with speech recognition. This makes accurate speech recognition results easier to obtain and improves the user experience.
In an application example, the speech recognition system of a vehicle is taken as an example. As shown in fig. 3, the vehicle interior has four microphones 301, one at each of four positions (e.g., the driver's seat, the front passenger seat, the left rear seat and the right rear seat). Each microphone 301 is connected to one wake-up engine 302, for a total of four wake-up engines. The voice signals received by the four microphones are all input to the same DSP for suppression processing, which provides the anti-interference effect. In addition, the system may include a single recognition engine 303 that performs speech recognition on the signal output by the DSP after suppression processing.
As shown in fig. 4, the flow of speech recognition may include:
With the speech recognition method provided by the embodiments of the invention, an anti-interference recognition scheme can be added to a vehicle. If a person at some position in the vehicle utters a wake-up word, that position is determined to be the wake-up voice position, and the words subsequently spoken by the person at that position are recognized. People speaking at other positions do not interfere with the person at the wake-up voice position, so the user experience is better and the speech recognition of the in-vehicle head unit is more intelligent and accurate.
Fig. 5 schematically shows a schematic view of a speech recognition arrangement according to an embodiment of the invention. As shown in fig. 5, the apparatus may include:
an obtaining unit 501, configured to obtain multiple wake-up voice signals from multiple locations;
a sound source positioning unit 502, configured to perform sound source positioning on the multiple wake-up voice signals, and determine a wake-up voice position;
a suppressing unit 503, configured to suppress audio signals at other positions than the wake-up voice position to obtain a signal to be recognized;
a recognition unit 504, configured to perform speech recognition on the signal to be recognized.
In an embodiment of the present invention, the sound source positioning unit 502 is further configured to perform sound source localization using the signal energy of the multiple wake-up voice signals, and to determine the position corresponding to the wake-up voice signal with the largest signal energy as the wake-up voice position.
In one embodiment of the present invention, as shown in fig. 6, the apparatus further comprises:
a beam forming unit 601, configured to adjust an angle of the microphone array by using beam forming, so that the microphone array faces the wake-up voice position.
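The text does not specify which beamforming algorithm is used. As an illustration only, the sketch below uses delay-and-sum, a common textbook technique that steers an array electronically rather than by physically turning it: each channel is advanced by the delay with which sound from the target position reaches that microphone, and the aligned channels are averaged. The per-channel delays (in whole samples) are assumed to be known, and the function name `delay_and_sum` is hypothetical.

```python
from typing import List, Sequence


def delay_and_sum(
    channels: Sequence[Sequence[float]], delays: Sequence[int]
) -> List[float]:
    """Advance channel i by delays[i] samples and average the aligned channels.

    delays[i] is the non-negative arrival delay, in samples, of sound from
    the target position at microphone i.
    """
    if len(channels) != len(delays):
        raise ValueError("one delay is required per channel")
    if not channels:
        return []
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    # Number of output samples for which every shifted channel has data.
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    if usable <= 0:
        return []
    count = len(channels)
    return [
        sum(ch[d + t] for ch, d in zip(channels, delays)) / count
        for t in range(usable)
    ]


# Example: the same pulse reaches the second microphone two samples later.
near = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
far = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
steered = delay_and_sum([near, far], [0, 2])     # aligned to the source
unsteered = delay_and_sum([near, far], [0, 0])   # no alignment
```

With the correct delays the pulse adds coherently and keeps its full amplitude, whereas without alignment it is averaged down, which is the sense in which the array "faces" the chosen position.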
In one embodiment of the present invention, the suppressing unit 503 includes:
a first receiving subunit, configured to receive a first voice signal from the microphone at the wake-up voice position and to receive second voice signals from the microphones at the other positions;
and a cancellation subunit, configured to cancel each second voice signal from the first voice signal using a digital signal processor to obtain the signal to be recognized.
In one embodiment of the present invention, the suppressing unit 503 includes:
a stop control unit, configured to control the microphones at the other positions to stop picking up sound;
and a second receiving subunit, configured to receive the signal to be recognized from the microphone at the wake-up voice position.
The functions of each unit in each device in the embodiments of the present invention may refer to the corresponding description in the above method, and are not described herein again.
Fig. 7 schematically shows a schematic view of a speech recognition device according to an embodiment of the present invention. As shown in fig. 7, the speech recognition device includes a memory 910 and a processor 920, the memory 910 storing a computer program that can run on the processor 920. The processor 920 implements the speech recognition method of the above embodiments when executing the computer program. There may be one or more memories 910 and one or more processors 920.
The apparatus further comprises:
and a communication interface 930 for communicating with an external device to perform data interactive transmission.
If the memory 910, the processor 920 and the communication interface 930 are implemented independently, they may be connected to each other through a bus and communicate with each other over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 7, but this does not mean that there is only one bus or one type of bus.
Optionally, in an implementation, if the memory 910, the processor 920 and the communication interface 930 are integrated on a chip, the memory 910, the processor 920 and the communication interface 930 may complete communication with each other through an internal interface.
An embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and the computer program is used for implementing the method of any one of the above embodiments when being executed by a processor.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the various embodiments or examples described in this specification, and features of different embodiments or examples, provided they do not contradict one another.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various changes or substitutions within the technical scope of the present invention, and these should be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (10)
1. A speech recognition method, comprising:
acquiring multiple channels of wake-up voice signals from a plurality of positions;
performing sound source localization on the multi-channel wake-up voice signals to determine a wake-up voice position;
suppressing the audio signals from positions other than the wake-up voice position, by stopping sound pickup at those positions, to obtain a signal to be recognized;
and performing speech recognition on the signal to be recognized.
2. The method of claim 1, wherein performing sound source localization on the multi-channel wake-up voice signals to determine the wake-up voice position comprises:
performing sound source localization using the signal energy of the multi-channel wake-up voice signals, and determining the position corresponding to the wake-up voice signal with the largest signal energy as the wake-up voice position.
3. The method of claim 1, further comprising:
adjusting the angle of a microphone array by beamforming so that the microphone array faces the wake-up voice position.
4. The method according to any one of claims 1 to 3, wherein suppressing the audio signals from positions other than the wake-up voice position, by stopping sound pickup at those positions, to obtain the signal to be recognized comprises:
controlling the microphones at the other positions to stop picking up sound;
and receiving the signal to be recognized from the microphone at the wake-up voice position.
5. A speech recognition apparatus, comprising:
an acquisition unit configured to acquire multi-channel wake-up voice signals from a plurality of positions;
a sound source localization unit configured to perform sound source localization on the multi-channel wake-up voice signals to determine a wake-up voice position;
a suppression unit configured to suppress audio signals at positions other than the wake-up voice position by stopping sound pickup at those other positions, so as to obtain a signal to be recognized; and
a recognition unit configured to perform voice recognition on the signal to be recognized.
6. The apparatus according to claim 5, wherein the sound source localization unit is further configured to perform sound source localization using the signal energy of the multi-channel wake-up voice signals, and to determine the position corresponding to the wake-up voice signal channel with the largest signal energy as the wake-up voice position.
7. The apparatus of claim 5, further comprising:
a beamforming unit configured to adjust the angle of a microphone array by beamforming so that the microphone array faces the wake-up voice position.
8. The apparatus according to any one of claims 5 to 7, wherein the suppression unit comprises:
a stop control unit configured to control the microphones at the other positions to stop picking up sound; and
a second receiving subunit configured to receive the signal to be recognized from the microphone at the wake-up voice position.
9. A speech recognition device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-4.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 4.
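For illustration only, the method of claims 1, 2 and 4 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation: the function names (`signal_energy`, `locate_wake_position`, `pickup_plan`, `signal_to_recognize`), the position labels, and the use of mean squared amplitude as the "signal energy" are all hypothetical choices made for the example. Stopping sound pickup is modelled simply as a per-position on/off flag, and the recognition step itself is left out.

```python
from typing import Dict, Iterable, List, Sequence


def signal_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one channel (assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def locate_wake_position(wake_signals: Dict[str, Sequence[float]]) -> str:
    """Pick the position whose wake-up channel has the largest energy."""
    if not wake_signals:
        raise ValueError("at least one wake-up channel is required")
    return max(wake_signals, key=lambda pos: signal_energy(wake_signals[pos]))


def pickup_plan(positions: Iterable[str], wake_position: str) -> Dict[str, bool]:
    """True means the microphone keeps picking up sound, False means it stops."""
    return {pos: pos == wake_position for pos in positions}


def signal_to_recognize(frames: Dict[str, Sequence[float]],
                        plan: Dict[str, bool]) -> List[float]:
    """Return the audio of the single microphone that is still enabled."""
    kept = [list(frames[pos]) for pos, enabled in plan.items() if enabled]
    if len(kept) != 1:
        raise ValueError("exactly one microphone must remain enabled")
    return kept[0]


# Hypothetical wake-up channels captured at three positions.
wake = {
    "driver": [0.10, -0.20, 0.10],
    "front_passenger": [0.60, -0.70, 0.50],
    "rear_left": [0.05, 0.00, -0.05],
}
position = locate_wake_position(wake)
plan = pickup_plan(wake, position)

# Audio arriving after wake-up; only the enabled microphone contributes.
command = {
    "driver": [0.3, 0.3],
    "front_passenger": [0.4, -0.4],
    "rear_left": [0.0, 0.1],
}
to_recognize = signal_to_recognize(command, plan)
print(position, to_recognize)
```

The returned samples would then be passed to whatever recognizer the device uses. A real system would likely compute energy over a window aligned with the wake-up word and control the microphones through the audio driver rather than through a dictionary of flags.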
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111266002.1A CN113990320A (en) | 2019-03-11 | 2019-03-11 | Speech recognition method, apparatus, device and storage medium |
CN201910180338.2A CN110010126B (en) | 2019-03-11 | 2019-03-11 | Speech recognition method, apparatus, device and storage medium |
CN202111055499.2A CN113782019A (en) | 2019-03-11 | 2019-03-11 | Speech recognition method, apparatus, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910180338.2A CN110010126B (en) | 2019-03-11 | 2019-03-11 | Speech recognition method, apparatus, device and storage medium |
Related Child Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111266002.1A Division CN113990320A (en) | 2019-03-11 | 2019-03-11 | Speech recognition method, apparatus, device and storage medium |
CN202111055499.2A Division CN113782019A (en) | 2019-03-11 | 2019-03-11 | Speech recognition method, apparatus, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110010126A (en) | 2019-07-12 |
CN110010126B (en) | 2021-10-08 |
Family
ID=67166812
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910180338.2A Active CN110010126B (en) | 2019-03-11 | 2019-03-11 | Speech recognition method, apparatus, device and storage medium |
CN202111266002.1A Withdrawn CN113990320A (en) | 2019-03-11 | 2019-03-11 | Speech recognition method, apparatus, device and storage medium |
CN202111055499.2A Pending CN113782019A (en) | 2019-03-11 | 2019-03-11 | Speech recognition method, apparatus, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (3) | CN110010126B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110364176A (en) * | 2019-08-21 | 2019-10-22 | 百度在线网络技术(北京)有限公司 | Audio signal processing method and device |
CN110517677B (en) * | 2019-08-27 | 2022-02-08 | 腾讯科技(深圳)有限公司 | Speech processing system, method, apparatus, speech recognition system, and storage medium |
CN110673096B (en) * | 2019-09-30 | 2022-02-01 | 北京地平线机器人技术研发有限公司 | Voice positioning method and device, computer readable storage medium and electronic equipment |
CN113066504A (en) * | 2019-12-31 | 2021-07-02 | 上海汽车集团股份有限公司 | Audio transmission method, device and computer storage medium |
CN111599357A (en) * | 2020-04-07 | 2020-08-28 | 宁波吉利汽车研究开发有限公司 | In-vehicle multi-tone-area pickup method and device, electronic equipment and storage medium |
CN111599366B (en) * | 2020-05-19 | 2024-04-12 | 科大讯飞股份有限公司 | Vehicle-mounted multitone region voice processing method and related device |
CN111968642A (en) * | 2020-08-27 | 2020-11-20 | 北京百度网讯科技有限公司 | Voice data processing method and device and intelligent vehicle |
CN112002340B (en) * | 2020-09-03 | 2024-08-23 | 北京海云捷迅科技股份有限公司 | Multi-user-based voice acquisition method and device |
CN112460757A (en) * | 2020-11-13 | 2021-03-09 | 芜湖美智空调设备有限公司 | Air conditioner, voice control method thereof and storage medium |
CN112669837B (en) * | 2020-12-15 | 2022-12-06 | 北京百度网讯科技有限公司 | Awakening method and device of intelligent terminal and electronic equipment |
US11682411B2 (en) | 2021-08-31 | 2023-06-20 | Spotify Ab | Wind noise suppresor |
CN114974239A (en) * | 2022-05-14 | 2022-08-30 | 云知声智能科技股份有限公司 | Voice interaction method and device, electronic equipment and storage medium |
CN115691490A (en) * | 2022-10-09 | 2023-02-03 | 蔚来汽车科技(安徽)有限公司 | Method for dynamically switching sound zone, voice interaction method, equipment, medium and vehicle |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6965863B1 (en) * | 1998-11-12 | 2005-11-15 | Microsoft Corporation | Speech recognition user interface |
CN105280183A (en) * | 2015-09-10 | 2016-01-27 | 百度在线网络技术(北京)有限公司 | Voice interaction method and system |
CN105719644A (en) * | 2014-12-04 | 2016-06-29 | 中兴通讯股份有限公司 | Method and device for adaptively adjusting voice recognition rate |
CN105976815A (en) * | 2016-04-22 | 2016-09-28 | 乐视控股(北京)有限公司 | Vehicle voice recognition method and vehicle voice recognition device |
CN107180627A (en) * | 2017-06-22 | 2017-09-19 | 歌尔股份有限公司 | The method and apparatus for removing noise |
CN108122556A (en) * | 2017-08-08 | 2018-06-05 | 问众智能信息科技(北京)有限公司 | Reduce the method and device that driver's voice wakes up instruction word false triggering |
CN108286386A (en) * | 2018-01-22 | 2018-07-17 | 奇瑞汽车股份有限公司 | The method and apparatus of vehicle window control |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9493130B2 (en) * | 2011-04-22 | 2016-11-15 | Angel A. Penilla | Methods and systems for communicating content to connected vehicle users based detected tone/mood in voice input |
CN105103227A (en) * | 2013-03-15 | 2015-11-25 | 英特尔公司 | Mechanism for facilitating dynamic adjustment of audio input/output (I/O) setting devices at conferencing computing devices |
EP3379844A4 (en) * | 2015-11-17 | 2018-11-14 | Sony Corporation | Information processing device, information processing method, and program |
CN105957523A (en) * | 2016-04-22 | 2016-09-21 | 乐视控股(北京)有限公司 | Vehicular system control method and device |
US10448150B2 (en) * | 2016-06-03 | 2019-10-15 | Faraday & Future Inc. | Method and apparatus to detect and isolate audio in a vehicle using multiple microphones |
CN108073381A (en) * | 2016-11-15 | 2018-05-25 | 腾讯科技(深圳)有限公司 | A kind of object control method, apparatus and terminal device |
CN106878281B (en) * | 2017-01-11 | 2020-03-31 | 上海蔚来汽车有限公司 | In-vehicle positioning device and method based on mixed audio and in-vehicle equipment control system |
CN106782585B (en) * | 2017-01-26 | 2020-03-20 | 芋头科技(杭州)有限公司 | Pickup method and system based on microphone array |
CN108877827B (en) * | 2017-05-15 | 2021-04-20 | 福州瑞芯微电子股份有限公司 | Voice-enhanced interaction method and system, storage medium and electronic equipment |
CN107554456A (en) * | 2017-08-31 | 2018-01-09 | 上海博泰悦臻网络技术服务有限公司 | Vehicle-mounted voice control system and its control method |
CN107919119A (en) * | 2017-11-16 | 2018-04-17 | 百度在线网络技术(北京)有限公司 | Method, apparatus, equipment and the computer-readable medium of more equipment interaction collaborations |
CN108986833A (en) * | 2018-08-21 | 2018-12-11 | 广州市保伦电子有限公司 | Sound pick-up method, system, electronic equipment and storage medium based on microphone array |
CN109391528A (en) * | 2018-08-31 | 2019-02-26 | 百度在线网络技术(北京)有限公司 | Awakening method, device, equipment and the storage medium of speech-sound intelligent equipment |
CN109192203B (en) * | 2018-09-29 | 2021-08-10 | 百度在线网络技术(北京)有限公司 | Multi-sound-zone voice recognition method, device and storage medium |
CN109448718A (en) * | 2018-12-11 | 2019-03-08 | 广州小鹏汽车科技有限公司 | A kind of audio recognition method and system based on multi-microphone array |
Also Published As
Publication number | Publication date |
---|---|
CN113782019A (en) | 2021-12-10 |
CN110010126A (en) | 2019-07-12 |
CN113990320A (en) | 2022-01-28 |
Similar Documents
Publication | Title |
---|---|
CN110010126B (en) | Speech recognition method, apparatus, device and storage medium |
JP6914236B2 (en) | Speech recognition methods, devices, devices, computer-readable storage media and programs |
CN109286875B (en) | Method, apparatus, electronic device and storage medium for directional sound pickup |
CN109192203B (en) | Multi-sound-zone voice recognition method, device and storage medium |
CN107577449B (en) | Wake-up voice pickup method, device, equipment and storage medium |
US20140112496A1 (en) | Microphone placement for noise cancellation in vehicles |
US9978355B2 (en) | System and method for acoustic management |
US20190237067A1 (en) | Multi-channel voice recognition for a vehicle environment |
CN109273020B (en) | Audio signal processing method, apparatus, device and storage medium |
CN107910013B (en) | Voice signal output processing method and device |
CN110673096B (en) | Voice positioning method and device, computer readable storage medium and electronic equipment |
US20170352349A1 (en) | Voice processing device |
CN111599357A (en) | In-vehicle multi-tone-area pickup method and device, electronic equipment and storage medium |
US20190037363A1 (en) | Vehicle based acoustic zoning system for smartphones |
US20160127827A1 (en) | Systems and methods for selecting audio filtering schemes |
CN106575499A (en) | System and method of microphone placement for noise attenuation |
WO2016103710A1 (en) | Voice processing device |
WO2016143340A1 (en) | Speech processing device and control device |
CN111599366B (en) | Vehicle-mounted multitone region voice processing method and related device |
CN113270095B (en) | Voice processing method, device, storage medium and electronic equipment |
WO2024078435A1 (en) | Method for dynamically switching speech zones, speech interaction method, device, medium, and vehicle |
CN109215648A (en) | Vehicle-mounted voice identifying system and method |
JP2024026716A (en) | Signal processor and signal processing method |
CN115567810A (en) | Sound pickup system, sound pickup method and vehicle |
CN116863968A (en) | Voice signal processing method, device and vehicle system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20211012; Address after: 100176 Room 101, 1st floor, building 1, yard 7, Ruihe West 2nd Road, Economic and Technological Development Zone, Daxing District, Beijing; Patentee after: Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd.; Address before: 518000 301, floor 3, unit D, productivity building, No. 5, Gaoxin middle 2nd Road, Nanshan District, Shenzhen, Guangdong; Patentee before: BAIDU INTERNATIONAL TECHNOLOGY (SHENZHEN) Co.,Ltd. |