CN112382280A - Voice interaction method and device - Google Patents

Voice interaction method and device

Info

Publication number
CN112382280A
CN112382280A (application CN202011244729.5A)
Authority
CN
China
Prior art keywords
voice
user
voice interaction
module
command
Prior art date
Legal status
Pending
Application number
CN202011244729.5A
Other languages
Chinese (zh)
Inventor
刘洋宇
黄安子
张云翔
饶竹一
李智诚
Current Assignee
Shenzhen Power Supply Bureau Co Ltd
Original Assignee
Shenzhen Power Supply Bureau Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Power Supply Bureau Co Ltd
Priority to CN202011244729.5A
Publication of CN112382280A
Legal status: Pending


Classifications

    • G10L15/16 Speech classification or search using artificial neural networks
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L21/0208 Noise filtering (speech enhancement, e.g. noise reduction or echo cancellation)
    • H04R1/406 Arrangements for obtaining a desired directional characteristic by combining a number of identical microphones
    • H04R3/005 Circuits for combining the signals of two or more microphones
    • G10L2015/223 Execution procedure of a spoken command
    • G10L2021/02082 Noise filtering where the noise is echo or reverberation of the speech

Abstract

The invention discloses a voice interaction method comprising: acquiring a command from a user to wake up a voice interaction device; performing directional sound pickup, far-field noise reduction, and echo cancellation on the user's wake-up command to reduce speech recognition errors; correcting the recognized speech using data acquired before and after the user wakes the device, so as to obtain an information text containing the user's intent; and providing feedback to the user based on that text. The invention also discloses a voice interaction device. Implementing the voice interaction method and device improves the accuracy of speech recognition, adapts to different task scenarios, and thereby improves the user experience.

Description

Voice interaction method and device
Technical Field
The invention relates to the technical field of voice recognition, in particular to a voice interaction method and voice interaction equipment.
Background
In the prior art, speech recognition technology is mature and widely applied in daily life. In a conventional human-computer voice interaction pipeline, recognized text is converted to speech by text-to-speech (TTS) synthesis and sent back to the client for playback. Such a simple interaction loop is adequate for demonstration-level systems, but it struggles in real user task scenarios and seriously degrades the user experience. The existing technical problems are as follows:
1. Speech recognition is inaccurate. With the breakthrough of deep learning, speech recognition has become usable in ordinary environments with a cooperative user. However, recognition is affected by many factors, such as noisy surroundings, speaker distance, dialects and accents, vertical-domain terminology, personalized vocabulary, and scenario-specific expressions, so practical recognition quality is still far from ideal.
2. Semantic understanding is unreliable. Semantic understanding in voice interaction must process the user's spoken, colloquial expression of intent. Human language commonly exhibits context dependence, scenario-specific wording, colloquialisms, reliance on common-sense background, and omitted explanations. Meanwhile, entities in some vertical domains have complex names and many ambiguities, and the continuous switching of scenes, contexts, and interlocutors makes semantic understanding in voice interaction even harder.
Therefore, the intelligence of speech recognition is limited in real user task scenarios, which restricts its application to some extent and results in a poor user experience.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a voice interaction method and a voice interaction device that improve the accuracy of speech recognition, adapt to different task scenarios, and thereby improve the user experience.
To solve the above technical problem, an embodiment of the present invention provides a voice interaction method comprising the following steps: acquiring a command from a user to wake up the voice interaction device; performing directional sound pickup, far-field noise reduction, and echo cancellation on the user's wake-up command to reduce speech recognition errors; correcting the recognized speech using data acquired before and after the user wakes the device, so as to obtain an information text containing the user's intent; and providing feedback to the user based on that text.
The step of performing directional sound pickup, far-field noise reduction, and echo cancellation on the user's wake-up command to reduce speech recognition errors includes: performing semantic error correction by modeling the semantics of the underlying sentence, using a bidirectional recurrent neural network model, a convolutional neural network model, and/or an end-to-end neural network model in combination with sentence-pattern data, so as to avoid system response deviations caused by misunderstanding the user's intent or by content noise.
The directional sound pickup step includes: computing the angle and distance between the sound source and the microphone array using multi-microphone array hardware together with sound-source localization and beamforming, thereby tracking the target source; and forming a beam in the expected source direction and picking up only signals inside the beam, so that source extraction and noise suppression are achieved simultaneously.
The far-field noise reduction and echo cancellation steps include: cancelling the coupling between the loudspeaker and the microphone with an adaptive filter, thereby improving the quality of the picked-up sound; and monitoring effective human speech and filtering out non-speech sounds with an endpoint detection technique.
The step of providing feedback to the user based on the information text includes: applying concatenative speech synthesis and/or waveform-modeling speech synthesis.
The concatenative speech synthesis step includes synthesizing from limited-domain templates with fixed text formats; the waveform-modeling speech synthesis step includes synthesizing the dynamically changing parts of the content.
To solve the above technical problem, the present invention further provides a voice interaction device comprising: a voice input module, a voice recognition module, a main controller, a control module, an external memory, and a voice output module. The voice input module performs directional sound pickup, far-field noise reduction, and echo cancellation on the user's wake-up command to reduce speech recognition errors. The voice input module is connected to the voice recognition module; the voice recognition module is connected by signal lines to the external memory and the voice output module; the external memory is connected to an upper computer; the voice recognition module is connected by a signal line to the main controller; and the main controller is connected by signal lines to the external memory and to the control module.
During operation of the device, data is sent to the main controller through a serial port, and the main controller transfers the data to the external memory for storage.
Implementing the voice interaction method and device has the following beneficial effects: a command from the user to wake up the voice interaction device is acquired; directional sound pickup, far-field noise reduction, and echo cancellation are performed on the wake-up command to reduce speech recognition errors; the recognized speech is corrected using data acquired before and after the user wakes the device, yielding an information text containing the user's intent; and feedback is provided to the user based on that text. The accuracy of speech recognition is improved, the method adapts to different task scenarios, and the user experience is further improved.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a voice interaction method according to an embodiment of the present invention.
Fig. 2 is a block diagram of a voice interaction device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 shows a first embodiment of a voice interaction method according to the present invention.
The embodiment of the invention provides a voice interaction method comprising the following steps: S10, acquiring a command from a user to wake up the voice interaction device; S20, performing directional sound pickup, far-field noise reduction, and echo cancellation on the user's wake-up command to reduce speech recognition errors; S30, correcting the recognized speech using data acquired before and after the user wakes the device, to obtain an information text containing the user's intent; S40, providing feedback to the user based on the information text.
In a specific implementation, step S20 of performing directional sound pickup, far-field noise reduction, and echo cancellation on the user's wake-up command to reduce speech recognition errors includes: performing semantic error correction by modeling the semantics of the underlying sentence, using a bidirectional recurrent neural network model, a convolutional neural network model, and/or an end-to-end neural network model in combination with sentence-pattern data. Its function is to avoid system response deviations caused by misunderstanding the user's intent or by content noise.
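The embodiment ties semantic error correction to neural sentence models; as a deliberately simplified, hypothetical stand-in for that step, the sketch below snaps a noisy transcript onto the nearest entry of a known command lexicon by Levenshtein distance. The lexicon, distance threshold, and example phrases are illustrative assumptions, not part of the patent:

```python
# Simplified, hypothetical stand-in for the neural semantic error correction
# described above: snap a noisy ASR transcript to the closest known command.
# The command lexicon and distance threshold are illustrative assumptions.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (one rolling row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct_command(transcript: str, lexicon: list, max_dist: int = 3):
    """Return the closest lexicon entry, or None if nothing is close enough."""
    best = min(lexicon, key=lambda cmd: edit_distance(transcript, cmd))
    return best if edit_distance(transcript, best) <= max_dist else None

LEXICON = ["turn on the light", "turn off the light", "play music"]
fixed = correct_command("turn on the lite", LEXICON)
```

In a real system this lexicon-matching step would be replaced by the recurrent or end-to-end models the patent names; the point here is only the shape of the correction interface, which accepts a noisy transcript and either repairs it or rejects it.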
Further, the directional sound pickup in step S20 includes: computing the angle and distance between the sound source and the microphone array using multi-microphone array hardware together with sound-source localization and beamforming, thereby tracking the target source; and forming a beam in the expected source direction and picking up only signals inside the beam, so that source extraction and noise suppression are achieved simultaneously.
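Under strong simplifying assumptions (a uniform linear array, far-field plane waves, integer-sample delays), the localization-plus-beamforming step above can be sketched as a delay-and-sum beamformer; the array geometry and signals below are illustrative, not taken from the patent:

```python
# Minimal delay-and-sum beamformer sketch for a uniform linear microphone
# array. Geometry, sample rate, and the toy signal are illustrative assumptions.
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
FS = 16000              # sample rate in Hz, assumed

def steering_delays(n_mics, spacing, angle_deg):
    """Per-microphone steering delays (in samples) that align a far-field
    plane wave arriving from angle_deg (0 = broadside) on a linear array."""
    theta = math.radians(angle_deg)
    return [round(i * spacing * math.sin(theta) / SPEED_OF_SOUND * FS)
            for i in range(n_mics)]

def delay_and_sum(channels, delays):
    """Shift each channel by its steering delay and average: signals from
    the steered direction add coherently, off-axis noise does not."""
    n = len(channels[0])
    out = []
    for t in range(n):
        acc, cnt = 0.0, 0
        for ch, d in zip(channels, delays):
            if 0 <= t - d < n:
                acc += ch[t - d]
                cnt += 1
        out.append(acc / cnt if cnt else 0.0)
    return out

# Broadside example: identical signals at both mics need no delay, so the
# beamformed output reproduces the input.
sig = [0.0, 1.0, 0.0, -1.0, 0.0]
out = delay_and_sum([sig, sig], steering_delays(2, 0.05, 0.0))
```

A production beamformer would estimate the source angle from inter-microphone time differences (the localization half of the step) rather than being handed it, and would use fractional-delay filtering; the averaging structure is the same.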
Further, the far-field noise reduction and echo cancellation in step S20 include: cancelling the coupling between the loudspeaker and the microphone with an adaptive filter, thereby improving the quality of the picked-up sound; and monitoring effective human speech and filtering out non-speech sounds with an endpoint detection technique.
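The adaptive-filter echo cancellation can be sketched with a normalized LMS (NLMS) filter; the tap count, step size, and synthetic echo path below are illustrative assumptions rather than values from the patent:

```python
# NLMS adaptive-filter sketch of acoustic echo cancellation: the filter
# learns the speaker-to-microphone coupling from the far-end reference and
# subtracts the echo estimate. Parameters are illustrative assumptions.
import random

def nlms_echo_cancel(far_end, mic, taps=4, mu=0.5, eps=1e-6):
    """Return the residual signal (microphone minus estimated echo)."""
    w = [0.0] * taps               # adaptive filter weights
    buf = [0.0] * taps             # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))     # echo estimate
        e = d - y                                      # residual after cancellation
        norm = sum(xi * xi for xi in buf) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]  # NLMS update
        residual.append(e)
    return residual

# Synthetic check: the mic hears only an echo (far-end delayed one sample,
# scaled by 0.6); the canceller should drive the residual toward zero.
random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.6 * (far[i - 1] if i > 0 else 0.0) for i in range(len(far))]
res = nlms_echo_cancel(far, mic)
```

A deployed canceller uses hundreds or thousands of taps to cover a real room impulse response, plus double-talk detection so the filter does not adapt while the near-end user is speaking; only the core update loop is shown here.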
The effect of the above processing is twofold. On one hand, the interaction becomes friendlier: a stray sound no longer produces a recognition result, so the system does not respond erratically. On the other hand, the load on the communication network and the speech recognition service is reduced. Endpoint detection relies on the relative energy trend of the audio; when a user speaks hesitantly and discontinuously, a long pause often truncates the audio prematurely, causing the current voice interaction to fail.
For example, the continuity and completeness of human language can be taken into account: the system waits longer when the user has not finished speaking and shorter when the user has. This requires evaluating, at the linguistic level, whether the user's utterance is complete, and combining that evaluation with acoustic-level endpoint detection to decide whether a speech endpoint has been reached.
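The acoustic half of this endpoint decision can be sketched as a frame-energy detector with a hangover counter, one common way to "wait longer" before cutting the audio; the threshold, hangover length, and toy frames are illustrative assumptions, and the linguistic-completeness evaluation the text mentions is not modeled here:

```python
# Frame-energy endpoint detector sketch with a hangover counter: speech is
# cut only after `hangover` consecutive low-energy frames, so short
# hesitations survive. Threshold and frame sizes are illustrative assumptions.

def frame_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in frame) / len(frame)

def detect_endpoint(frames, threshold=0.01, hangover=3):
    """Return the index of the first frame of the final pause once speech
    has been followed by `hangover` low-energy frames, or None if no
    endpoint is found."""
    in_speech = False
    silent = 0
    for i, frame in enumerate(frames):
        if frame_energy(frame) >= threshold:
            in_speech, silent = True, 0
        elif in_speech:
            silent += 1
            if silent >= hangover:
                return i - hangover + 1
    return None

# A one-frame hesitation survives; a three-frame pause ends the utterance.
loud = [0.5] * 160    # a "speech" frame (160 samples = 10 ms at 16 kHz)
quiet = [0.0] * 160   # a "silence" frame
frames = [loud, loud, quiet, loud, quiet, quiet, quiet]
end = detect_endpoint(frames)
```

The adaptive-wait idea from the text would correspond to varying `hangover` at run time: a larger value when a language model judges the utterance incomplete, a smaller one when it looks finished.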
Step S30, correcting the recognized speech using data acquired before and after the user wakes the voice interaction device to obtain an information text containing the user's intent, includes: semantically interpreting the recognized information text, correcting recognition errors, and understanding the user's intent, thereby avoiding system response deviations caused by misunderstanding the user's intent or by content noise.
In step S40, providing feedback to the user based on the information text includes applying concatenative speech synthesis and/or waveform-modeling speech synthesis. In a specific implementation, the concatenative synthesis step synthesizes from limited-domain templates with fixed text formats, while the waveform-modeling synthesis step synthesizes the dynamically changing parts of the content. For example, the dynamically changing parts are synthesized by a waveform-modeling synthesis system; the two systems may be used alone or in combination in the interactive device.
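The split between template (concatenative) synthesis and waveform-modeling synthesis can be sketched as a text-planning step that routes fixed template text to one engine and dynamic slot values to the other; the template names and slot values below are hypothetical, and actual audio synthesis is not modeled:

```python
# Sketch of the hybrid output strategy: fixed-format text goes to the
# concatenative engine, dynamic slot values to the waveform-modeling engine.
# Templates and slot values are hypothetical; synthesis itself is mocked.
import re

TEMPLATES = {
    "power_report": "Current load of {station} is {load} megawatts.",
    "greeting": "Hello, how can I help you?",
}

def render_response(template_id, **slots):
    """Split a template into (engine, text) segments for the two engines."""
    text = TEMPLATES[template_id]
    segments = []
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", text):
        if m.start() > pos:
            segments.append(("concat", text[pos:m.start()]))   # fixed text
        segments.append(("waveform", str(slots[m.group(1)])))  # dynamic slot
        pos = m.end()
    if pos < len(text):
        segments.append(("concat", text[pos:]))
    return segments

plan = render_response("power_report", station="Futian", load=42)
```

A fully fixed template like `"greeting"` produces a single concatenative segment, matching the "used alone or in combination" behavior described above.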
Fig. 2 shows a first embodiment of the voice interaction device according to the present invention.
An embodiment of the present invention provides a voice interaction device comprising: a voice input module 1, a voice recognition module 2, a main controller 3, a control module 4, an external memory 5, and a voice output module 6. The voice input module 1 performs directional sound pickup, far-field noise reduction, and echo cancellation on the user's wake-up command to reduce speech recognition errors. The voice input module 1 is connected to the voice recognition module 2; the voice recognition module 2 is connected by signal lines to the external memory 5 and the voice output module 6; the external memory 5 is connected to an upper computer 7; the voice recognition module 2 is connected by a signal line to the main controller 3; and the main controller 3 is connected by signal lines to the external memory 5 and to the control module 4.
In a specific implementation, the voice recognition module 2 provides directional sound pickup, far-field noise reduction, echo cancellation, and endpoint detection. The directional pickup technique computes the angle and distance between the sound source and the microphone array using multi-microphone array hardware together with sound-source localization and beamforming, thereby tracking the target source; it forms a beam in the expected source direction and picks up only signals inside the beam, achieving source extraction and noise suppression simultaneously.
The echo cancellation technique cancels the coupling between the loudspeaker and the microphone with an adaptive filter, improving the quality of the picked-up sound, while the endpoint detection technique monitors effective human speech and filters out non-speech sounds.
In addition, the voice recognition module 2 in this embodiment embeds bidirectional recurrent, convolutional, and end-to-end neural network models, combined with resources such as corresponding entities and sentence patterns. By improving the semantic modeling of the underlying sentences, it gains better generalization and a certain semantic error-correction capability, avoiding system response deviations caused by misunderstanding the user's intent or by content noise, and helping with far-field recognition, rejection of invalid input, sentence segmentation, context understanding, and so on.
These functions of the voice recognition module 2 make the interaction friendlier: a stray sound will not produce a recognition result and trigger a spurious system response. They also reduce the load on the communication network and the speech recognition service. Endpoint detection relies on the relative energy trend of the audio; when a user speaks hesitantly and discontinuously, a long pause often truncates the audio prematurely, causing the current voice interaction to fail.
Further, the voice recognition module 2 is connected by signal lines to the external memory 5 and the voice output module 6, and adopts an LD3320 processing chip. The voice input module 1 sends the collected speech to the LD3320 chip for recognition; the LD3320 returns the recognition result to the main controller 3, while voice data is fetched from the external memory 5 for output playback. The main controller 3 then analyzes the recognition result and uses the control module 4 to control the terminal devices.
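The recognition-to-control flow just described (recognition result in, device action and voice feedback out) can be mocked as follows; the command codes, device names, and dispatch table are hypothetical, and the LD3320 and serial-line details are abstracted away:

```python
# Mock of the main controller's dispatch step: a recognition result code
# from the speech module selects a terminal-device action and a voice reply.
# Command codes and device names are hypothetical illustrations.

COMMAND_TABLE = {
    0x01: ("light", "on"),
    0x02: ("light", "off"),
    0x03: ("fan", "on"),
}

class MainController:
    """Takes recognition results, drives the control module (device_state)
    and queues voice feedback text for the output module to play back."""

    def __init__(self):
        self.device_state = {}
        self.playback_queue = []

    def on_recognition(self, code):
        """Handle one recognition result code from the speech module."""
        action = COMMAND_TABLE.get(code)
        if action is None:
            self.playback_queue.append("unrecognized")   # reject invalid input
            return False
        device, state = action
        self.device_state[device] = state                # command the terminal device
        self.playback_queue.append(f"{device} {state}")  # voice feedback text
        return True

ctrl = MainController()
ctrl.on_recognition(0x01)   # valid command
ctrl.on_recognition(0xFF)   # unknown code is rejected
```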
In a specific implementation, the voice output module 6 combines a concatenative speech synthesis system with a waveform-modeling speech synthesis system: the concatenative system synthesizes the large number of fixed-format text templates of the limited domain, the waveform-modeling system synthesizes the dynamically changing parts of the content, and the two are used in combination in the interactive system. The voice output module 6 is connected to the upper computer 7 through the external memory 5, so its content can be changed in real time while the device is running: the upper computer 7 sends data to the main controller 3 through a serial port, and the main controller 3 stores the data in the external memory.
Preferably, the main controller 3 adopts an STC10I08XE chip; the main controller 3 is connected by signal lines to the external memory 5 and to the control module 4, and the control module 4 is connected by signal lines to a plurality of terminal devices.
In other embodiments, data is sent to the main controller 3 through a serial port during operation of the device, and the main controller transfers the data to the external memory for storage.
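The patent does not specify the serial data format used between the upper computer, main controller, and external memory; purely as an assumed illustration, the sketch below frames each transfer with a header byte, a length byte (payloads up to 255 bytes), and an additive checksum:

```python
# Hypothetical serial framing for the upper-computer-to-main-controller link:
# HEADER | length | payload | additive checksum. The patent leaves the
# format unspecified; every value here is an assumption for illustration.

HEADER = 0xAA  # assumed start-of-frame marker

def encode_frame(payload):
    """Wrap a payload (up to 255 bytes) into one serial frame."""
    checksum = (HEADER + len(payload) + sum(payload)) & 0xFF
    return bytes([HEADER, len(payload)]) + bytes(payload) + bytes([checksum])

def decode_frame(frame):
    """Validate and unwrap one frame, raising ValueError on corruption."""
    if len(frame) < 3 or frame[0] != HEADER or len(frame) != 3 + frame[1]:
        raise ValueError("bad frame")
    payload = frame[2:-1]
    if frame[-1] != (HEADER + len(payload) + sum(payload)) & 0xFF:
        raise ValueError("bad checksum")
    return payload

msg = encode_frame(b"voice-data")
```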
The invention also discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above voice interaction method.

Claims (8)

1. A voice interaction method, comprising the steps of:
acquiring a command from a user to wake up the voice interaction device;
performing directional sound pickup, far-field noise reduction, and echo cancellation on the user's wake-up command to reduce speech recognition errors;
correcting the recognized speech using data acquired before and after the user wakes the voice interaction device, to obtain an information text containing the user's intent; and
providing feedback to the user based on the information text.
2. The voice interaction method of claim 1, wherein the step of performing directional sound pickup, far-field noise reduction, and echo cancellation on the user's wake-up command to reduce speech recognition errors comprises:
performing semantic error correction by modeling the semantics of the underlying sentence, using a bidirectional recurrent neural network model, a convolutional neural network model, and/or an end-to-end neural network model in combination with sentence-pattern data, so as to avoid system response deviations caused by misunderstanding the user's intent or by content noise.
3. The voice interaction method of claim 1 or 2, wherein the step of directionally picking up the user's wake-up command comprises:
computing the angle and distance between the sound source and the microphone array using multi-microphone array hardware together with sound-source localization and beamforming, thereby tracking the target source; and forming a beam in the expected source direction and picking up only signals inside the beam, so that source extraction and noise suppression are achieved simultaneously.
4. The voice interaction method of claim 1 or 2, wherein the far-field noise reduction and echo cancellation steps comprise: cancelling the coupling between the loudspeaker and the microphone with an adaptive filter, thereby improving the quality of the picked-up sound; and monitoring effective human speech and filtering out non-speech sounds with an endpoint detection technique.
5. The voice interaction method of claim 1 or 2, wherein the step of providing feedback to the user based on the information text comprises:
applying concatenative speech synthesis and/or waveform-modeling speech synthesis.
6. The voice interaction method of claim 5, wherein the concatenative speech synthesis step comprises synthesizing from limited-domain templates with fixed text formats, and the waveform-modeling speech synthesis step comprises synthesizing the dynamically changing parts of the content.
7. A voice interaction device, comprising:
a voice input module, a voice recognition module, a main controller, a control module, an external memory, and a voice output module, wherein:
the voice input module performs directional sound pickup, far-field noise reduction, and echo cancellation on the user's wake-up command to reduce speech recognition errors; and
the voice input module is connected to the voice recognition module; the voice recognition module is connected by signal lines to the external memory and the voice output module; the external memory is connected to an upper computer; the voice recognition module is connected by a signal line to the main controller; and the main controller is connected by signal lines to the external memory and to the control module.
8. The voice interaction device of claim 7, wherein, during operation of the device, data is sent to the main controller through a serial port, and the main controller transfers the data to the external memory for storage.
Application CN202011244729.5A, filed 2020-11-10 (priority date 2020-11-10): Voice interaction method and device, published as CN112382280A (pending)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011244729.5A CN112382280A (en) 2020-11-10 2020-11-10 Voice interaction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011244729.5A CN112382280A (en) 2020-11-10 2020-11-10 Voice interaction method and device

Publications (1)

Publication Number Publication Date
CN112382280A 2021-02-19

Family

ID=74578603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011244729.5A Pending CN112382280A (en) 2020-11-10 2020-11-10 Voice interaction method and device

Country Status (1)

Country Link
CN (1) CN112382280A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114944153A (en) * 2022-07-26 2022-08-26 中诚华隆计算机技术有限公司 Enhanced awakening method and device for terminal of Internet of things and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107030691A (en) * 2017-03-24 2017-08-11 华为技术有限公司 A kind of data processing method and device for nursing robot
US20180286396A1 (en) * 2017-03-29 2018-10-04 Baidu Online Network Technology (Beijing) Co., Ltd. Method and device for processing speech instruction
CN109147762A (en) * 2018-10-19 2019-01-04 广东小天才科技有限公司 A kind of audio recognition method and system
CN109949803A (en) * 2019-02-11 2019-06-28 特斯联(北京)科技有限公司 Building service facility control method and system based on semantic instructions intelligent recognition
CN110099246A (en) * 2019-02-18 2019-08-06 深度好奇(北京)科技有限公司 Monitoring and scheduling method, apparatus, computer equipment and storage medium
CN110910874A (en) * 2019-11-08 2020-03-24 深圳明心科技有限公司 Interactive classroom voice control method, terminal equipment, server and system
EP3671733A1 (en) * 2017-08-17 2020-06-24 Sony Corporation Information processing device, information processing method, and program



Similar Documents

Publication Publication Date Title
US11620983B2 (en) Speech recognition method, device, and computer-readable storage medium
US11502859B2 (en) Method and apparatus for waking up via speech
US10546593B2 (en) Deep learning driven multi-channel filtering for speech enhancement
JP7348288B2 (en) Voice interaction methods, devices, and systems
EP3923273B1 (en) Voice recognition method and device, storage medium, and air conditioner
TWI455112B (en) Speech processing apparatus and electronic device
CN102024457B (en) Information processing apparatus and information processing method
CN107464565B (en) Far-field voice awakening method and device
CN103152546A (en) Echo suppression method for videoconferences based on pattern recognition and delay feedforward control
US10529331B2 (en) Suppressing key phrase detection in generated audio using self-trigger detector
US11222652B2 (en) Learning-based distance estimation
CN110942779A (en) Noise processing method, device and system
CN110660407B (en) Audio processing method and device
CN114944153A (en) Enhanced awakening method and device for terminal of Internet of things and storage medium
CN205754809U (en) A kind of robot self-adapting volume control system
CN114365216A (en) Targeted voice separation for speech recognition by speaker
CN116420188A (en) Speech filtering of other speakers from call and audio messages
CN112382280A (en) Voice interaction method and device
CN110517682B (en) Voice recognition method, device, equipment and storage medium
Jaroslavceva et al. Robot Ego‐Noise Suppression with Labanotation‐Template Subtraction
CN109920433A (en) The voice awakening method of electronic equipment under noisy environment
CN110534084B (en) Intelligent voice control method and system based on FreeWITCH
JP2019139146A (en) Voice recognition system and voice recognition method
CN209515191U (en) A kind of voice enabling apparatus
CN116320144B (en) Audio playing method, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination