CN108573699A - Voice sharing recognition methods - Google Patents
Voice sharing recognition methods
- Publication number
- CN108573699A
- Authority
- CN
- China
- Prior art keywords
- signal
- speech recognition
- voice
- voice signal
- background service
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02163—Only one microphone
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention discloses a voice sharing recognition method. When multiple speech recognition devices are present in the same environment, each triggered by a voice signal, proximity sensor, button, or remote-control signal, the speech recognition rate can be improved through a background service device or through cooperation among the devices, and one or more speech recognition devices are designated to respond with the service data. The background service device distinguishes the source and position of each voice signal by the device identifier and/or network address of the speech recognition device, preselects a speech recognition device according to the highest recognition match rate, aligns all voice signals on a common onset feature, merges and tunes the signals and fills in missing segments to produce a better voice signal and a new recognition match rate, and finally determines the speech recognition device with the best captured signal from the intensity, sound source position, and features of the voice signals, so that multi-process interaction can continue.
Description
Technical field
The present invention relates to a voice sharing recognition method: a processing mechanism for cases where multiple speech recognition devices detect a voice signal at the same time, covering how the devices cooperate to determine the complete voice signal and how one or more designated devices respond.
Background
A typical speech recognition device can only be triggered by, and respond to, a specific voice signal, and its recognition rate drops sharply when the voice signal comes from more than 5 meters away. To respond to a user's voice at any time in a large environment, multiple speech recognition devices are needed. At present each device recognizes and responds independently, so the devices interfere with one another and the user experience suffers. If the devices could coordinate, recognition could be improved by using the voice signals captured over multiple channels, and the device best matched to the user could be found from parameters such as signal intensity, direction, and distance and made to respond, which would greatly improve the speech recognition service experience.
Summary of the invention
The invention discloses a voice sharing recognition method, characterized in that it applies to speech recognition devices that support voice triggering and send the voice signals they capture to a designated background service device for processing. When multiple speech recognition devices are present in the same environment, the background service device uses the voice sharing recognition method to improve the speech recognition rate, find the relevant service data, and designate one or more speech recognition devices to respond. In the voice sharing recognition method, when the background service device receives data while idle, it starts timing from the moment it receives the voice signal sent by the first speech recognition device and, within a specific time range, waits to receive the voice signals sent by each speech recognition device. The background service device distinguishes the source of each voice signal by the device identifier and/or network address of the speech recognition device. The background service device recognizes each voice signal and sets the speech recognition device corresponding to the preliminary result with the highest recognition match rate as the preliminary response device. If the highest recognition match rate does not reach the set minimum requirement, the background service device aligns all voice signals on a common onset feature, merges and tunes the signals and fills in missing segments to produce a better voice signal, and recognizes it. If the recognition result differs and its recognition match rate is higher, the preliminary result is replaced with the new recognition result, and the speech recognition device with the best captured signal, calculated from the intensity, sound source position, and features of the voice signals, replaces the preliminary response device. The finally determined response device receives the recognition result and continues the subsequent processing flow. When multiple speech recognition devices are on the same network, each has a different intranet address, so even though they present the same external network address to the background service device, the source of each voice signal can still be distinguished by intranet address.
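The selection logic described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the `Capture` record, the onset threshold, the averaging merge, the use of mean signal power as the only "best captured signal" criterion, and the `recognize` callable (returning a text and a match rate) are all hypothetical. The sketch also reads the device replacement as happening only when the merged result replaces the preliminary result, which is one possible reading of the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Capture:
    device_id: str        # device identifier or intranet address (assumed unique)
    samples: List[float]  # mono samples, all captures at the same sample rate
    match_rate: float     # match rate of the preliminary result, in [0, 1]
    text: str             # preliminary recognition result


def onset_index(samples: List[float], threshold: float = 0.1) -> int:
    """Index of the first sample whose magnitude reaches the threshold."""
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            return i
    return 0


def align_and_merge(captures: List[Capture]) -> List[float]:
    """Align captures on their onset, then average whatever samples exist."""
    trimmed = [c.samples[onset_index(c.samples):] for c in captures]
    merged = []
    for i in range(max(len(t) for t in trimmed)):
        # A segment missing from a short capture is filled from the others.
        present = [t[i] for t in trimmed if i < len(t)]
        merged.append(sum(present) / len(present))
    return merged


def mean_power(samples: List[float]) -> float:
    return sum(s * s for s in samples) / max(len(samples), 1)


def choose_responder(
    captures: List[Capture],
    min_match_rate: float,
    recognize: Callable[[List[float]], Tuple[str, float]],
) -> Tuple[str, str]:
    """Return (response device id, recognition result)."""
    best = max(captures, key=lambda c: c.match_rate)
    if best.match_rate >= min_match_rate:
        return best.device_id, best.text
    # Highest match rate is too low: recognize the merged signal instead.
    new_text, new_rate = recognize(align_and_merge(captures))
    if new_text != best.text and new_rate > best.match_rate:
        # Assumption: signal power stands in for intensity, position, features.
        strongest = max(captures, key=lambda c: mean_power(c.samples))
        return strongest.device_id, new_text
    return best.device_id, best.text
```

In a real system the alignment would more likely use cross-correlation than a fixed amplitude threshold; the threshold is used here only to keep the sketch short.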
The voice sharing recognition method is characterized in that voice triggering of the speech recognition device is implemented by capturing voice signals in real time through a single microphone. The speech recognition device captures the voice signal in a low-power mode and monitors its sustained amplitude; when the sustained amplitude exceeds a set value, the device switches to real-time capture and uses a local recognition library to judge whether the voice signal is a trigger signal or a local execution instruction. If it is neither, the device ignores the current voice signal and continues capturing. If it is a trigger signal, the device first emits a response signal, then continues to capture the subsequent voice signals and sends them to the background service device. If it is a local execution instruction, the device first emits a response signal and then executes the local control operation.
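A minimal sketch of the single-microphone wake-up logic follows. The frame-based peak detector, the parameter names `level` and `min_frames`, and the set-based stand-in for the local recognition library are assumptions made for illustration; the patent does not specify how sustained amplitude is measured.

```python
from typing import Iterable, List, Optional, Set


def sustained_trigger_index(
    frames: Iterable[List[float]], level: float, min_frames: int
) -> Optional[int]:
    """Return the index of the frame at which the peak amplitude has stayed
    above `level` for `min_frames` consecutive frames, or None if it never does."""
    run = 0
    for i, frame in enumerate(frames):
        peak = max((abs(s) for s in frame), default=0.0)
        run = run + 1 if peak > level else 0
        if run >= min_frames:
            return i
    return None


def classify(text: str, trigger_words: Set[str], local_commands: Set[str]) -> str:
    """Hypothetical local recognition library: plain set lookups."""
    if text in trigger_words:
        return "trigger"  # respond, then stream later audio to the background service
    if text in local_commands:
        return "local"    # respond, then run the local control operation
    return "ignore"       # keep capturing
```

Requiring several consecutive loud frames, rather than a single loud sample, keeps short clicks and bumps from waking the device.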
The voice sharing recognition method is characterized in that voice triggering of the speech recognition device is implemented by capturing voice signals in real time through a microphone array. The speech recognition device calculates the sound source position from the voice signals of the microphone array, strengthens the microphone signal from the corresponding position, and attenuates the microphone signals from other positions, finally producing a high-quality voice signal together with sound source position information. The speech recognition device uses a local recognition library to judge whether the voice signal is a trigger signal or a local execution instruction. If it is neither, the device ignores the voice signal and continues capturing. If it is a trigger signal, the device first emits a response signal and sends the voice signal and the sound source position information to the background service device, while continuing to capture subsequent voice signals and sending them to the background service device continuously. If it is a local execution instruction, the device first emits a response signal and then executes the local control operation.
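One common way to strengthen the signal from one position and attenuate the others is delay-and-sum beamforming; the patent does not name a technique, so the following Python sketch is a swapped-in illustration. The integer sample delays, the `steering` table of candidate directions, and the function names are all hypothetical.

```python
from typing import Dict, List, Sequence


def delay_and_sum(
    channels: Sequence[Sequence[float]], delays: Sequence[int]
) -> List[float]:
    """Advance channel k by delays[k] samples (non-negative), then average
    across channels. Signals arriving with exactly these delays add in phase."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    n = len(channels)
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / n
        for i in range(length)
    ]


def mean_power(samples: Sequence[float]) -> float:
    return sum(s * s for s in samples) / max(len(samples), 1)


def locate(
    channels: Sequence[Sequence[float]], steering: Dict[str, Sequence[int]]
) -> str:
    """Pick the candidate direction whose steered output has the most power."""
    return max(
        steering,
        key=lambda name: mean_power(delay_and_sum(channels, steering[name])),
    )
```

For example, with two microphones where the second hears the same pulse one sample later, steering with delays `(0, 1)` realigns the pulse and yields more output power than steering with `(1, 0)`, so that direction is reported as the source.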
The voice sharing recognition method is characterized in that voice triggering of the speech recognition device is implemented by capturing signals in real time through a human body sensor. When a person approaches or touches the speech recognition device, the human body sensor is triggered and emits a trigger signal; the speech recognition device first emits a response signal, then continues to capture the subsequent voice signals and sends them to the background service device.
The voice sharing recognition method is characterized in that voice triggering of the speech recognition device is implemented by a button. When the button is pressed, a trigger signal is emitted; the speech recognition device first emits a response signal, then continues to capture the subsequent voice signals and sends them to the background service device.
The voice sharing recognition method is characterized in that voice triggering of the speech recognition device is implemented by a wireless receiver. When the wireless receiver receives a specific wireless signal, a trigger signal is emitted; the speech recognition device first emits a response signal, then continues to capture the subsequent voice signals and sends them to the background service device.
The voice sharing recognition method is characterized in that, after the speech recognition device receives a trigger signal, it performs local recognition and at the same time sends the trigger signal to the background service device. Based on trigger signal quality and/or arrival time, the background service device allows only one or more speech recognition devices to emit a response signal. All triggered speech recognition devices continue to capture the subsequent voice signals and send them to the background service device.
The voice sharing recognition method is characterized in that, after the speech recognition device receives a trigger signal, it performs local recognition and at the same time sends the trigger signal to the background service device. Based on trigger signal quality and/or arrival time, the background service device allows only one or more speech recognition devices to emit a response signal, continue to capture the subsequent voice signals, and send them to the background service device. The other speech recognition devices, which are not allowed to respond, do not continue to capture subsequent voice signals.
The voice sharing recognition method is characterized in that, after the speech recognition device receives a trigger signal, it first performs local recognition and sends the trigger signal to the background service device only after recognition succeeds. Based on trigger signal quality and/or arrival time, the background service device allows only one or more speech recognition devices to emit a response signal, continue to capture the subsequent voice signals, and send them to the background service device. The other speech recognition devices, which are not allowed to respond, do not continue to capture subsequent voice signals.
The voice sharing recognition method is characterized in that, after the speech recognition device has received a trigger signal, passed it to the background service device, and been allowed to emit a response signal, it sends the subsequently captured voice signals to the background service device and presents the returned data in one or more of the following ways: playing sound, controlling lighting, transmitting infrared forwarding data, transmitting wireless data, or executing a local control instruction. For a certain period of time it continues to capture voice signals and send them back to the background service device, forming a multi-process interaction that needs no further trigger signal. If no voice signal is detected within the timeout period, the speech recognition device returns to the waiting-for-trigger state and must wait for a trigger signal before it can enter the interaction flow again.
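The trigger-free follow-up window can be modeled as a small state machine. In this sketch the `Session` class, its method names, and the use of caller-supplied timestamps in seconds are hypothetical; the patent only states that the device falls back to the waiting-for-trigger state when no voice is detected within the timeout.

```python
class Session:
    """Tracks whether a device is in the interaction flow or waiting for a trigger."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.active = False
        self.last_voice = 0.0

    def on_trigger(self, now: float) -> None:
        """A trigger signal (voice, sensor, button, wireless) opens the session."""
        self.active = True
        self.last_voice = now

    def tick(self, now: float) -> None:
        """Close the session if no voice was detected within the timeout."""
        if self.active and now - self.last_voice > self.timeout:
            self.active = False

    def on_voice(self, now: float) -> bool:
        """Return True if this voice signal should go to the background service."""
        self.tick(now)
        if not self.active:
            return False  # waiting for a trigger; ignore the voice
        self.last_voice = now  # each utterance extends the session
        return True
```

Passing the current time in explicitly, instead of reading a clock inside the class, keeps the sketch deterministic and easy to test.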
Detailed description
In a specific embodiment of the voice sharing recognition method of the present invention, the main controller of the speech recognition device uses a programmable high-speed processor with a DSP and a built-in storage unit, with an externally connected infrared connection module that has wireless capability, a trigger button, and an infrared proximity sensor. On first use, the user sets the position and device identifier of the speech recognition device through an application on an external device, and registers and logs in to a background service account, after which the device can be used normally.
When the main controller receives a trigger signal from the button, the infrared proximity sensor, or a voice signal, it emits sound and light response signals, continues to detect voice signals, and sends them to the background service device for processing. It presents the service data it receives by playing sound, controlling lights, transmitting infrared signals, or transmitting wireless signals.
When there are multiple speech recognition devices, they are coordinated by the background service device or coordinate among themselves, so that only the speech recognition device nearest the user, or the one with the best captured signal, responds to the user's voice and gives feedback.
Claims (10)
1. A voice sharing recognition method, characterized in that it applies to speech recognition devices that support voice triggering and send the voice signals they capture to a designated background service device for processing; when multiple speech recognition devices are present in the same environment, the background service device uses the voice sharing recognition method to improve the speech recognition rate, find the relevant service data, and designate one or more speech recognition devices to respond; in the voice sharing recognition method, when the background service device receives data while idle, it starts timing from the moment it receives the voice signal sent by the first speech recognition device and, within a specific time range, waits to receive the voice signals sent by each speech recognition device; the background service device distinguishes the source of each voice signal by the device identifier and/or network address of the speech recognition device; the background service device recognizes each voice signal and sets the speech recognition device corresponding to the preliminary result with the highest recognition match rate as the preliminary response device; if the highest recognition match rate does not reach the set minimum requirement, the background service device aligns all voice signals on a common onset feature, merges and tunes the signals and fills in missing segments to produce a better voice signal, and recognizes it; if the recognition result differs and its recognition match rate is higher, the preliminary result is replaced with the new recognition result, and the speech recognition device with the best captured signal, calculated from the intensity, sound source position, and features of the voice signals, replaces the preliminary response device; the finally determined response device receives the recognition result and continues the subsequent processing flow; when multiple speech recognition devices are on the same network, each has a different intranet address, so even though they present the same external network address to the background service device, the source of each voice signal can still be distinguished by intranet address.
2. The voice sharing recognition method according to claim 1, characterized in that voice triggering of the speech recognition device is implemented by capturing voice signals in real time through a single microphone; the speech recognition device captures the voice signal in a low-power mode and monitors its sustained amplitude; when the sustained amplitude exceeds a set value, the device switches to real-time capture and uses a local recognition library to judge whether the voice signal is a trigger signal or a local execution instruction; if it is neither, the device ignores the current voice signal and continues capturing; if it is a trigger signal, the device first emits a response signal, then continues to capture the subsequent voice signals and sends them to the background service device; if it is a local execution instruction, the device first emits a response signal and then executes the local control operation.
3. The voice sharing recognition method according to claim 1, characterized in that voice triggering of the speech recognition device is implemented by capturing voice signals in real time through a microphone array; the speech recognition device calculates the sound source position from the voice signals of the microphone array, strengthens the microphone signal from the corresponding position, and attenuates the microphone signals from other positions, finally producing a high-quality voice signal together with sound source position information; the speech recognition device uses a local recognition library to judge whether the voice signal is a trigger signal or a local execution instruction; if it is neither, the device ignores the voice signal and continues capturing; if it is a trigger signal, the device first emits a response signal and sends the voice signal and the sound source position information to the background service device, while continuing to capture subsequent voice signals and sending them to the background service device continuously; if it is a local execution instruction, the device first emits a response signal and then executes the local control operation.
4. The voice sharing recognition method according to claim 1, characterized in that voice triggering of the speech recognition device is implemented by capturing signals in real time through a human body sensor; when a person approaches or touches the speech recognition device, the human body sensor is triggered and emits a trigger signal; the speech recognition device first emits a response signal, then continues to capture the subsequent voice signals and sends them to the background service device.
5. The voice sharing recognition method according to claim 1, characterized in that voice triggering of the speech recognition device is implemented by a button; when the button is pressed, a trigger signal is emitted; the speech recognition device first emits a response signal, then continues to capture the subsequent voice signals and sends them to the background service device.
6. The voice sharing recognition method according to claim 1, characterized in that voice triggering of the speech recognition device is implemented by a wireless receiver; when the wireless receiver receives a specific wireless signal, a trigger signal is emitted; the speech recognition device first emits a response signal, then continues to capture the subsequent voice signals and sends them to the background service device.
7. The voice sharing recognition method according to any one of claims 1 to 6, characterized in that, after the speech recognition device receives a trigger signal, it performs local recognition and at the same time sends the trigger signal to the background service device; based on trigger signal quality, the background service device allows only one or more speech recognition devices to emit a response signal; all triggered speech recognition devices can continue to capture the subsequent voice signals and send them to the background service device.
8. The voice sharing recognition method according to claim 7, characterized in that, after the speech recognition device receives a trigger signal, it performs local recognition and at the same time sends the trigger signal to the background service device; based on trigger signal quality and/or arrival time, the background service device allows only one or more speech recognition devices to emit a response signal, continue to capture the subsequent voice signals, and send them to the background service device; the other speech recognition devices, which are not allowed to respond, do not continue to capture subsequent voice signals.
9. The voice sharing recognition method according to claim 8, characterized in that, after the speech recognition device receives a trigger signal, it first performs local recognition and sends the trigger signal to the background service device only after recognition succeeds; based on trigger signal quality and/or arrival time, the background service device allows only one or more speech recognition devices to emit a response signal, continue to capture the subsequent voice signals, and send them to the background service device; the other speech recognition devices, which are not allowed to respond, do not continue to capture subsequent voice signals.
10. The voice sharing recognition method according to claim 9, characterized in that, after the speech recognition device has received a trigger signal, passed it to the background service device, and been allowed to emit a response signal, it sends the subsequently captured voice signals to the background service device and presents the returned data in one or more of the following ways: playing sound, controlling lighting, transmitting infrared forwarding data, transmitting wireless data, or executing a local control instruction; for a certain period of time it continues to capture voice signals and send them back to the background service device, forming a multi-process interaction that needs no further trigger signal; if no voice signal is detected within the timeout period, the speech recognition device returns to the waiting-for-trigger state and must wait for a trigger signal before it can enter the interaction flow again.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710144058.7A CN108573699A (en) | 2017-03-13 | 2017-03-13 | Voice sharing recognition methods |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710144058.7A CN108573699A (en) | 2017-03-13 | 2017-03-13 | Voice sharing recognition methods |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108573699A true CN108573699A (en) | 2018-09-25 |
Family
ID=63577952
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710144058.7A Pending CN108573699A (en) | 2017-03-13 | 2017-03-13 | Voice sharing recognition methods |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108573699A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111370012A (en) * | 2020-05-27 | 2020-07-03 | 北京小米移动软件有限公司 | Bluetooth voice audio acquisition method and system |
CN112820287A (en) * | 2020-12-31 | 2021-05-18 | 乐鑫信息科技(上海)股份有限公司 | Distributed speech processing system and method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018049782A1 (en) | Household appliance control method, device and system, and intelligent air conditioner | |
US11561519B2 (en) | Systems and methods of gestural interaction in a pervasive computing environment | |
US11056108B2 (en) | Interactive method and device | |
CN107450390B (en) | intelligent household appliance control device, control method and control system | |
US9474042B1 (en) | Detecting location within a network | |
US11770649B2 (en) | Systems and methods for automatic speech recognition | |
US10531540B2 (en) | Intelligent lamp holder and usage method applied therein | |
US20180048482A1 (en) | Control system and control processing method and apparatus | |
WO2019112924A1 (en) | Indoor position and vector tracking systems and method | |
EP1217608B1 (en) | Activation of voice-controlled apparatus | |
EP3602241B1 (en) | Method and apparatus for interaction with an intelligent personal assistant | |
CN108231079A (en) | For the method, apparatus, equipment and computer readable storage medium of control electronics | |
CN106970535B (en) | Control method and electronic equipment | |
EP3198995A1 (en) | Smart lighting device, and smart lighting control system and method | |
CN108573699A (en) | Voice sharing recognition methods | |
WO2022017003A1 (en) | Voice transmission control method, voice remote controller, terminal device, and storage medium | |
JP2002311990A5 (en) | ||
CN110383236A (en) | Master device is selected to realize isochronous audio | |
CN113671846B (en) | Intelligent device control method and device, wearable device and storage medium | |
CN107479710A (en) | Intelligent mirror and control method, device, equipment and storage medium thereof | |
CN111090412B (en) | Volume adjusting method and device and audio equipment | |
EP3777485A1 (en) | System and methods for augmenting voice commands using connected lighting systems | |
US7092886B2 (en) | Controlling the order of output of multiple devices | |
CN112105129B (en) | Intelligent lamp, intelligent lighting method and computer readable storage medium | |
US20020082835A1 (en) | Device group discovery method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180925 |