WO2015103836A1 - Voice control method and device - Google Patents
Voice control method and device
- Publication number
- WO2015103836A1 (PCT/CN2014/078463)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- keyword
- data
- voice
- speech
- recognition
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Definitions
- The present invention relates to voice technology, and in particular to a voice control method and apparatus.
- The embodiments of the present invention are directed to providing a voice control method and apparatus that allow control commands to be issued by voice, which is convenient for the user and frees the user's hands.
- a voice control method comprising:
- Triggering the sending of a keyword control command, and using the recognized keyword data as the control command to respond to the user operation, thereby implementing voice control.
- Performing voice recognition on the voice data, performing keyword matching according to a predetermined manner, and obtaining the recognized keyword data from the voice data includes:
- Performing the keyword matching according to a predetermined manner based on hidden Markov model (HMM) modeling.
- The acoustic feature parameters extracted from the voice data for voice recognition are Mel-frequency cepstral coefficient (MFCC) feature parameters.
- The recognition result is used as a reference for keyword matching, and the recognized keyword data is obtained.
- The method further includes: after obtaining the recognized keyword data, performing keyword matching optimization processing according to a predetermined manner based on the shortest distance.
- Performing the keyword matching optimization processing according to the predetermined manner based on the shortest distance includes:
- Obtaining the keyword data identified by the keyword matching optimization processing.
- the method further includes:
- The keyword data includes at least one of the following basic control commands: incoming call, outgoing call, answer, and hang up.
- a voice control device comprising:
- A voice acquiring unit configured to acquire voice data after a user operation is triggered;
- A keyword identifying unit configured to perform voice recognition on the voice data, perform keyword matching according to a predetermined manner, and obtain the recognized keyword data from the voice data;
- A voice control unit configured to trigger the sending of a keyword control command, with the recognized keyword data used as the control command to respond to the user operation, thereby implementing voice control.
- The keyword identifying unit is further configured to perform keyword matching according to a predetermined manner based on hidden Markov model (HMM) modeling, wherein the acoustic feature parameters extracted from the voice data for voice recognition are MFCC feature parameters, the recognition result is used as a reference for keyword matching, and the recognized keyword data is obtained.
- The keyword identifying unit is further configured to perform keyword matching optimization processing according to a predetermined manner based on the shortest distance after obtaining the recognized keyword data.
- The keyword identifying unit is further configured, when performing the keyword matching optimization processing based on the shortest distance, to establish a keyword data speech library and extract the acoustic feature parameters of the recognized keyword data.
- The keyword identifying unit is further configured to determine, by comparing the energy information of the keyword data, whether the control command is executed; if execution is complete, the current keyword matching ends and voice recognition is performed on the voice data again.
- The keyword data includes at least one of the following basic control commands: incoming call, outgoing call, answer, and hang up.
- The voice acquiring unit, the keyword identifying unit, and the voice control unit may be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) when performing processing.
- The method of the embodiment of the present invention includes: acquiring voice data after a user operation is triggered; performing voice recognition on the voice data, performing keyword matching according to a predetermined manner, and obtaining the recognized keyword data from the voice data; and triggering the sending of a keyword control command, with the recognized keyword data used as the control command to respond to the user operation, thereby implementing voice control. Because the recognized keyword data can trigger the sending of the keyword control command and thus implement voice control in response to the user operation, the embodiment of the present invention replaces the user's existing manual operation with automatic matching of control commands, which is convenient for the user and frees the user's hands.
- FIG. 2 is a structural diagram of a device according to an embodiment of the present invention.
- FIG. 3 is a flowchart of an application scenario according to an embodiment of the present invention.
- FIG. 4 is a schematic diagram of an example of vector quantization according to an embodiment of the present invention.
- FIGS. 5-7 are flowcharts of the implementation of the basic modules of an apparatus in an application scenario according to an embodiment of the present invention.
- The solution of the embodiment of the present invention applies voice recognition technology to perform keyword recognition and implement voice control. It can be used in various application scenarios, such as a visual communication system, calls between terminal devices, and messaging between terminal devices. By recognizing keywords in the voice data, control commands are matched automatically instead of being issued manually; as an auxiliary means, the embodiment of the present invention enables the user to perform a wider range of personalized control operations.
- the voice control method of the embodiment of the present invention includes:
- Step 101: Acquire voice data after a user operation is triggered.
- Step 102: Perform voice recognition on the voice data, perform keyword matching according to a predetermined manner, and obtain the recognized keyword data from the voice data.
- Step 103: Trigger the sending of a keyword control command, and use the recognized keyword data as the control command to respond to the user operation, thereby implementing voice control.
- The keyword data includes at least one of the following basic control commands: incoming call, outgoing call, answer, and hang up.
- Step 102 performs voice recognition on the voice data and performs keyword matching according to a predetermined manner. If recognized keyword data is obtained from the voice data, step 103 is performed; if nothing matches and no keyword can be recognized, the voice data can be sent as normal data.
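The control flow of steps 101 to 103 can be sketched as follows. This is a minimal illustration only: the recognizer is not modeled (the function takes already recognized text), and the spoken phrases, the command names, and the helpers `match_keyword` and `handle_voice` are assumptions made for the example, not names from the patent.

```python
from typing import Optional, Tuple

# Hypothetical keyword table. The four commands come from the text;
# the spoken phrases and the command names are assumptions.
KEYWORD_COMMANDS = {
    "call in": "INCOMING_CALL",
    "call out": "OUTGOING_CALL",
    "answer": "ANSWER",
    "hang up": "HANG_UP",
}


def match_keyword(recognized_text: str) -> Optional[str]:
    """Step 102 (simplified): map recognized text to a control command, or None."""
    return KEYWORD_COMMANDS.get(recognized_text.strip().lower())


def handle_voice(recognized_text: str) -> Tuple[str, str]:
    """Steps 102-103: issue a control command on a match, else pass the data through."""
    command = match_keyword(recognized_text)
    if command is not None:
        return ("control", command)      # step 103: send the keyword control command
    return ("data", recognized_text)     # no keyword: treat as normal data
```

A caller would branch on the first element of the returned tuple to decide whether to execute a command or forward the voice data unchanged.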
- the voice control device of the embodiment of the present invention includes:
- the voice acquiring unit 11 is configured to acquire voice data after triggering a user operation.
- the keyword identifying unit 12 is configured to perform voice recognition on the voice data, perform keyword matching according to a predetermined manner, and obtain the identified keyword data from the voice data.
- the voice control unit 13 is configured to trigger the transmission of the keyword control command, and the recognized keyword data is used as a control command to respond to the user operation to implement voice control.
- The embodiments of the present invention can be used in various application scenarios, such as a visual communication system, calls between terminal devices, and text messaging between terminal devices.
- The following takes a visual communication application scenario as a specific example.
- In the visual communication application scenario, the embodiment of the present invention includes the following steps:
- Step 201: After the user triggers the visual communication operation, acquire the voice data.
- Step 202: Perform keyword matching recognition on the voice data; if a keyword is matched, perform step 203; otherwise, go to step 204.
- Step 203: In response to the user operation, issue a keyword control command to implement voice control in the visual communication operation.
- Step 204: Send an RTP data packet.
- The voice acquiring unit is embedded in the RTP voice data packet.
- The voice signal is collected and sampled by a voice input device such as a microphone, and then the preprocessing of step 202 is performed.
- The keyword identifying unit performs keyword matching recognition; if keyword data is matched, it responds to the user operation; otherwise, the piece of voice data is packaged and transmitted. That is, as long as the recognition result of the voice data is not a control command conforming to a keyword, the voice data is transmitted directly. If the recognition result is a control command conforming to a keyword, the sending of the keyword control command is triggered, and the control command is used for operational control of the visual communication.
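For the fallback branch (step 204), the following sketch shows one way a piece of voice data could be wrapped in an RTP packet. It builds only the 12-byte fixed header defined in RFC 3550, with no CSRC list or header extension. The function name `build_rtp_packet` and the default payload type 0 are assumptions; the patent does not state which payload type or codec is used.

```python
import struct

RTP_VERSION = 2


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 0, marker: bool = False) -> bytes:
    """Prepend a 12-byte fixed RTP header (no CSRC, no extension) to the payload."""
    first_byte = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    second_byte = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF,                 # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,       # 32-bit timestamp
                         ssrc & 0xFFFFFFFF)            # 32-bit source identifier
    return header + payload
```

The sequence number would be incremented by one per packet and the timestamp advanced by the number of samples in each packet.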
- The keyword matching uses either 1) the first-level predetermined algorithm alone, or 2) a combination of the first-level and second-level predetermined algorithms. Option 2) applies a matching optimization to the result of option 1) to obtain a more accurate keyword, which is issued as the final control command.
- The first-level predetermined algorithm implements keyword recognition based on hidden Markov model (HMM) modeling; the second-level predetermined algorithm implements keyword recognition by shortest-distance matching.
- A two-layer parallel recognition process may be adopted, with the two layers combined. The first layer uses an existing HMM-based modeling method, in which the acoustic feature parameters extracted from the speech data for speech recognition are Mel-frequency cepstral coefficient (MFCC) feature parameters, and the recognized keyword data serves as the first-layer reference. The second layer establishes a predefined keyword data speech library and extracts the acoustic feature parameters (again MFCC feature parameters) of the keyword data recognized by the first layer. Vector quantization (VQ) is then used to find the shortest distance to the representative vectors (also called cells) in the library. This shortest distance is compared with an empirically obtained threshold; if a predetermined criterion is met, the result is taken as the final recognition result, that is, a more accurate keyword that is issued as the final control command.
- The algorithms used by the keyword identifying unit are described as follows:
- The execution of a speech recognition algorithm usually includes: 1) receiving a speech signal; 2) parameter extraction; 3) statistical modeling and analysis; 4) decision logic; 5) recognition output.
- For parameter extraction, the embodiment of the present invention extracts MFCC features.
- The main steps are pre-emphasis, framing, windowing, fast Fourier transform, triangular band-pass filtering, and other existing steps.
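The MFCC steps listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the patent's implementation: the frame length, hop size, filter count, and the 0.97 pre-emphasis coefficient are common textbook choices, a naive DFT stands in for the fast Fourier transform to keep the example dependency-free, and the final log and DCT stages are assumed to be among the "other existing steps".

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular band-pass filters spaced evenly on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(lo, mid):          # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate=8000, frame_len=64, hop=32, n_filters=8, n_ceps=4):
    # 1) Pre-emphasis: y[n] = x[n] - 0.97 * x[n-1]
    emph = [signal[0]] + [signal[i] - 0.97 * signal[i - 1] for i in range(1, len(signal))]
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    # 2) Framing into overlapping frames
    for start in range(0, len(emph) - frame_len + 1, hop):
        frame = emph[start:start + frame_len]
        # 3) Windowing with a Hamming window
        frame = [x * (0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)))
                 for n, x in enumerate(frame)]
        # 4) Power spectrum (naive DFT; a real system would use an FFT)
        power = []
        for k in range(frame_len // 2 + 1):
            s = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(frame))
            power.append(abs(s) ** 2 / frame_len)
        # 5) Triangular mel filters, then log energies
        log_e = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in bank]
        # 6) DCT-II turns log energies into cepstral coefficients
        ceps = [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                    for m, e in enumerate(log_e)) for c in range(n_ceps)]
        features.append(ceps)
    return features
```

Each returned row is the feature vector for one frame; these vectors are what the HMM and vector quantization stages would consume.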
- The hidden Markov model (HMM) is a probability model that gives a parameterized representation of the statistical characteristics of a random process.
- In HMM-based speech recognition, each word has a corresponding HMM, and each observation sequence consists of the speech of one word. A word is recognized by evaluating the models and selecting the HMM most likely to have produced the pronunciation represented by the observation sequence.
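The selection of the most likely word model can be illustrated with the forward algorithm for discrete-observation HMMs. The two toy models below, their probabilities, and the symbol alphabet (0, 1, 2) are invented for the example; a real system would train one model per keyword on quantized MFCC sequences.

```python
def forward_likelihood(obs, start, trans, emit):
    """Forward algorithm: P(obs | model) for a discrete-observation HMM."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][o]
                 for s in range(n_states)]
    return sum(alpha)


def recognize(obs, word_models):
    """Pick the word whose HMM most likely produced the observation sequence."""
    return max(word_models, key=lambda w: forward_likelihood(obs, *word_models[w]))


# Toy two-state, left-to-right models: (start, transition, emission).
WORD_MODELS = {
    "answer": ([1.0, 0.0],
               [[0.6, 0.4], [0.0, 1.0]],
               [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "hang up": ([1.0, 0.0],
                [[0.6, 0.4], [0.0, 1.0]],
                [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}
```

For long observation sequences, the probabilities would normally be computed in the log domain or rescaled at each step to avoid numerical underflow.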
- The second layer, vector quantization, further optimizes the first-layer reference. It is also an existing statistical modeling and analysis method.
- The basic principle of vector quantization is to combine several scalar values into one vector (or to take the feature vector extracted from one frame of speech data) and quantize it as a whole in the multidimensional space, so that the data can be compressed with little loss of information.
- A scalar here can be understood as an extracted parameter.
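A minimal sketch of this principle: each feature vector is replaced by the index of its nearest representative vector (codeword), so that a whole vector is stored as a single integer. The codebook values and the function names are assumptions; building the codebook itself (for example by clustering training vectors) is not shown.

```python
import math


def nearest_codeword(vector, codebook):
    """Return (index, distance) of the codeword closest to vector (Euclidean)."""
    distances = [math.dist(vector, cw) for cw in codebook]
    index = min(range(len(codebook)), key=distances.__getitem__)
    return index, distances[index]


def quantize(frames, codebook):
    """Replace each feature vector by the index of its nearest codeword."""
    return [nearest_codeword(f, codebook)[0] for f in frames]
```

The distance returned by `nearest_codeword` is the quantization error for that vector; summed over frames, it gives a simple measure of how well an utterance fits a given codebook.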
- The recognition result a1 and the control command b1 are each clustered using vector quantization (VQ), giving a vector a corresponding to the recognition result a1 and a vector b corresponding to the control command b1. The degree of similarity between vector a and vector b, that is, the shortest distance, is then compared.
- To do so, vector b is subtracted from vector a to obtain vector c. The closer the amplitude and phase of vector c are to 0 (or 180°), the more similar the recognition result a1 and the control command b1 are.
- The empirical threshold is the degree of similarity between the recognition result a1 and the control command b1, which depends on the resulting vector c.
- Determining this value of vector c requires multiple iterations.
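The comparison can be sketched as below. The sketch uses only the magnitude of c = a - b as the similarity measure and ignores the phase criterion mentioned in the text; the threshold value, the example vectors, and the function names are assumptions.

```python
import math


def difference_vector(a, b):
    """c = a - b, component by component."""
    return [x - y for x, y in zip(a, b)]


def magnitude(c):
    return math.sqrt(sum(x * x for x in c))


def best_command(a, command_vectors, threshold):
    """Return the command nearest to a, or None if even the nearest is too far."""
    name = min(command_vectors,
               key=lambda n: magnitude(difference_vector(a, command_vectors[n])))
    distance = magnitude(difference_vector(a, command_vectors[name]))
    return name if distance <= threshold else None
```

In practice the threshold would be tuned on recorded examples, which matches the text's remark that the value is obtained empirically over multiple iterations.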
- The voice signal first needs to be collected by the voice input device.
- After the voice signal is acquired, it is recognized by the keyword identifying unit.
- After the keyword data is recognized, the energy information of the recognized keyword is calculated and compared with the energy information of the 20 frames before and after it. If the average energy of the recognized keyword is twice the average energy of the 20 frames before and after, it is determined that the keyword data required for control has actually been obtained, and it is used as a control command. Control operations such as incoming call, outgoing call, answer, and hang up can then be performed based on the set keywords.
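The energy check can be sketched as follows. The sketch reads "twice" as "at least twice" and takes 20 frames on each side of the keyword; both readings, along with the function names and the sum-of-squares energy definition, are assumptions.

```python
def frame_energy(frame):
    """Short-time energy of one frame: the sum of squared samples."""
    return sum(x * x for x in frame)


def keyword_energy_ok(frames, start, end, context=20, ratio=2.0):
    """frames[start:end] is the keyword; compare with `context` frames on each side."""
    keyword = [frame_energy(f) for f in frames[start:end]]
    surround = [frame_energy(f)
                for f in frames[max(0, start - context):start] + frames[end:end + context]]
    if not keyword or not surround:
        return False  # nothing to compare against
    keyword_avg = sum(keyword) / len(keyword)
    surround_avg = sum(surround) / len(surround)
    return keyword_avg >= ratio * surround_avg
```

A keyword spoken clearly against a quiet background passes the check, while a match found inside continuous speech of similar loudness is rejected.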
- The device of the embodiment of the present invention adds a voice acquiring unit and a keyword identifying unit to the existing modules. It mainly implements keyword matching recognition through the keyword identifying unit and responds with the recognized keyword as a control command.
- The existing modules include: a proxy module 21, a SIP protocol stack module 22, a component communication function library integration module 23, a signaling control process module 24, a media processing process module 25, and a media video display module 26.
- the proxy module 21 includes an MS message proxy 211 and a friend proxy module 212.
- the SIP protocol stack module 22 is responsible for receiving and transmitting messages with the SIP server.
- the component communication function library integration module 23 includes functions required to provide functions.
- The signaling control process module 24 is configured to process call control commands and manage the audio and video input and output devices. The media processing process module 25 is configured to collect audio and video data, and the media video display module 26 is mainly configured to display the collected video data.
- FIGS. 5-7 are flowcharts of the operations of the signaling control process module 24, the media processing process module 25, and the media video display module 26, respectively. The proxy module 21, the SIP protocol stack module 22, and the component communication function library integration module 23 are mainly used for the transmission and interaction of the voice data.
- The embodiments of the present invention focus on the processing of the voice data, so these modules have little bearing on the embodiments and are not described in detail here.
- the signaling control process module 24 is configured to process call control commands, and the implementation process includes the following steps:
- Step 401 Receive a control command.
- Step 402 According to the voice control command table, invoke the software API corresponding to the operation, so that all manual operations can be replaced.
- Step 403 After calling the software API corresponding to the operation, perform corresponding operations, open various devices, process specific transactions, or shut down various devices.
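Steps 401 to 403 amount to a table lookup followed by a call. The sketch below models the voice control command table as a dictionary from command names to functions. The command names, the two handler functions, and the `performed` log that stands in for real device actions are all hypothetical.

```python
from typing import Callable, Dict, List

performed: List[str] = []  # stand-in for real device actions


def answer_call() -> None:
    performed.append("open microphone and speaker; accept the call")


def hang_up() -> None:
    performed.append("end the call; close microphone and speaker")


# Step 402: the voice control command table maps a command to a software API.
VOICE_COMMAND_TABLE: Dict[str, Callable[[], None]] = {
    "ANSWER": answer_call,
    "HANG_UP": hang_up,
}


def handle_control_command(command: str) -> bool:
    """Steps 401-403: receive a command, look up its API, and perform the operation."""
    api = VOICE_COMMAND_TABLE.get(command)
    if api is None:
        return False  # unknown command: nothing is executed
    api()
    return True
```

Keeping the mapping in a table means a new voice command can be supported by adding one entry, without changing the dispatch logic.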
- the media processing process module 25 is configured to collect audio data and determine audio that is not a control command.
- the implementation process includes the following steps:
- Step 501 Collect audio data and collect video data separately.
- Step 502 Encode the collected audio data and video data separately using open source software.
- Step 503 Multiplex the encoded streams, integrating the audio data and video data to obtain audio and video data.
- Step 504 Transmit the audio and video data over the network.
- The media video display module 26 is configured to display the collected video data: the video data compressed by the media processing process module 25 is decoded and sent for display.
- The audio data is decoded by the third-party open source software FFmpeg (an open source, free, cross-platform audio and video streaming solution that provides a complete solution for recording, converting, and streaming audio and video). After processing, it is sent to the sound card to output the audio accompanying the corresponding video data. The implementation process includes the following steps:
- Step 601 Receive audio and video data that is coded and multiplexed.
- Step 602 Demultiplexing, parsing the audio and video data into audio data and video data.
- Step 603 Obtain an audio data packet and a video data packet respectively.
- Step 604 Decode the audio data packet and the video data packet separately, decode the audio data packet by using open source software, obtain pulse code modulation (PCM) data, and decode the video data packet by using a self-developed chip hardware decoder to obtain a picture.
- Step 605 Send the decoded audio data to the sound card output, and send the decoded video data to the hardware display buffer (buffer) for hardware display.
- A speech recognition keyword information module and a keyword processing module are also included. The two modules are integrated into the basic modules, so that some visual communication operations, such as incoming call, outgoing call, answer, and hang up, can be controlled by voice.
- Voice is recognized by the keyword information module. The keyword processing module then determines, using certain criteria, whether the recognized keyword information needs further processing; if so, it feeds back the voice control operation; otherwise, no control operation is performed.
- voice recognition technology can be accurately applied in visual communication, and operations such as incoming call, outgoing call, answering, and hanging up are controlled by voice.
- The embodiments of the present invention apply to scenarios suitable for voice recognition and input, such as visual communication, mobile phone calls, and mobile phone text messaging, and mainly include a voice acquiring unit and a keyword identifying unit. After voice input, the keyword identifying unit detects the required keyword, and the keyword then triggers a command that controls operations in the visual communication, such as calling out, hanging up, and calling in.
- Two-layer speech recognition control is adopted: one layer is an HMM-based speech recognition method, and the other layer is a shortest-distance matching method. The two layers together yield accurate keyword data, and the energy of the keyword data is compared with that of the 20 frames before and after it to confirm whether the required keyword data has been obtained.
- The integrated module described in the embodiment of the present invention can also be stored in a computer-readable storage medium if it is implemented in the form of a software function module and sold or used as an independent product.
- The technical solution of the embodiments of the present invention may, in essence, be embodied in the form of a software product.
- The computer software product is stored in a storage medium and includes a plurality of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the various embodiments of the present invention.
- The foregoing storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.
- embodiments of the invention are not limited to any specific combination of hardware and software.
- the embodiment of the present invention further provides a computer storage medium, wherein a computer program is stored, and the computer program is used to execute the voice control method of the embodiment of the present invention.
- The method of the embodiment of the present invention includes: acquiring voice data after a user operation is triggered; performing voice recognition on the voice data, performing keyword matching according to a predetermined manner, and obtaining the recognized keyword data from the voice data; and triggering the sending of a keyword control command, with the recognized keyword data used as the control command to respond to the user operation, thereby implementing voice control. Because the recognized keyword data can trigger the sending of the keyword control command and thus implement voice control in response to the user operation, the embodiment of the present invention replaces the user's existing manual operation with automatic matching of control commands, which is convenient for the user and frees the user's hands.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
- Telephone Function (AREA)
Abstract
The invention relates to a voice control method and device. The method comprises: acquiring voice data after a user operation is triggered (101); performing voice recognition on the voice data, performing keyword matching in a predefined manner, and obtaining recognized keyword data from the voice data (102); and triggering the sending of a keyword control command and using the recognized keyword data as the control command to respond to the user operation, so as to implement voice control (103).
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410007018.4A CN104766608A (zh) | 2014-01-07 | 2014-01-07 | Voice control method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015103836A1 (fr) | 2015-07-16 |
Family
ID=53523498
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2014/078463 WO2015103836A1 (fr) | 2014-01-07 | 2014-05-26 | Voice control method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104766608A (fr) |
WO (1) | WO2015103836A1 (fr) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105139858B (zh) * | 2015-07-27 | 2019-07-26 | 联想(北京)有限公司 | Information processing method and electronic device |
CN106488286A (zh) * | 2015-08-28 | 2017-03-08 | 上海欢众信息科技有限公司 | Cloud information collection system |
CN106686445B (zh) * | 2015-11-05 | 2019-06-11 | 北京中广上洋科技股份有限公司 | Method for on-demand jumping in multimedia files |
CN105898496A (zh) * | 2015-11-18 | 2016-08-24 | 乐视网信息技术(北京)股份有限公司 | HLS stream hardware decoding method and device based on Android devices |
CN108242241B (zh) * | 2016-12-23 | 2021-10-26 | 中国农业大学 | Method and device for rapid screening of pure speech |
WO2018170992A1 (fr) | 2017-03-21 | 2018-09-27 | 华为技术有限公司 | Conversation control method and device |
CN110349572B (zh) * | 2017-05-27 | 2021-10-22 | 腾讯科技(深圳)有限公司 | Voice keyword recognition method, device, terminal and server |
CN107249116B (zh) * | 2017-08-09 | 2020-05-05 | 成都全云科技有限公司 | Noise and echo cancellation device based on video conferencing |
CN109003604A (zh) * | 2018-06-20 | 2018-12-14 | 恒玄科技(上海)有限公司 | Speech recognition method and system for low-power standby |
CN110174924B (zh) * | 2018-09-30 | 2021-03-30 | 广东小天才科技有限公司 | Friend-making method based on a wearable device, and wearable device |
CN109887512A (zh) * | 2019-03-15 | 2019-06-14 | 深圳市奥迪信科技有限公司 | Smart hotel guest room control method and system |
CN112086091A (zh) * | 2020-09-18 | 2020-12-15 | 南京孝德智能科技有限公司 | Intelligent elderly care service system and method |
CN112687269B (zh) * | 2020-12-18 | 2022-11-08 | 山东盛帆蓝海电气有限公司 | Automatic voice recognition method and system for a building management robot |
CN113709545A (zh) * | 2021-04-13 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Video processing method and apparatus, computer device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1999053478A1 (fr) * | 1998-04-15 | 1999-10-21 | Microsoft Corporation | Dynamically configurable acoustic model for speech recognition systems |
CN1455388A (zh) * | 2002-09-30 | 2003-11-12 | 中国科学院声学研究所 | Speech recognition system and method for compressing feature vector sets for a speech recognition system |
CN101154379A (zh) * | 2006-09-27 | 2008-04-02 | 夏普株式会社 | Method and apparatus for locating keywords in speech, and speech recognition system |
CN101673112A (zh) * | 2009-09-17 | 2010-03-17 | 李华东 | Smart home voice controller |
EP1588354B1 (fr) * | 2003-01-14 | 2011-08-24 | Motorola Mobility, Inc. | Method and apparatus for speech reconstruction |
CN103366743A (zh) * | 2012-03-30 | 2013-10-23 | 北京千橡网景科技发展有限公司 | Method and device for operating voice commands |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
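CN2136559Y (zh) * | 1992-07-09 | 1993-06-16 | 陈康 | Voice-controlled automatic telephone dialing device |
JP4263614B2 (ja) * | 2001-12-17 | 2009-05-13 | 旭化成ホームズ株式会社 | Remote control device and information terminal device |
CN101516005A (zh) * | 2008-02-23 | 2009-08-26 | 华为技术有限公司 | Speech recognition channel selection system and method, and channel switching device |
CN101345668A (zh) * | 2008-08-22 | 2009-01-14 | 中兴通讯股份有限公司 | Control method and device for monitoring equipment |
CN101923857A (zh) * | 2009-06-17 | 2010-12-22 | 复旦大学 | Scalable speech recognition method for human-computer interaction |
CN102568478B (zh) * | 2012-02-07 | 2015-01-07 | 合一网络技术(北京)有限公司 | Video playback control method and system based on speech recognition |
CN102938811A (zh) * | 2012-10-15 | 2013-02-20 | 华南理工大学 | Home mobile phone call system based on speech recognition |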
-
2014
- 2014-01-07 CN CN201410007018.4A patent/CN104766608A/zh not_active Withdrawn
- 2014-05-26 WO PCT/CN2014/078463 patent/WO2015103836A1/fr active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN104766608A (zh) | 2015-07-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2015103836A1 (fr) | Voice control method and device | |
WO2021082941A1 (fr) | Video silhouette recognition method and apparatus, storage medium and electronic device | |
US11875820B1 (en) | Context driven device arbitration | |
WO2021139425A1 (fr) | Voice activity detection method, apparatus and device, and storage medium | |
WO2020258661A1 (fr) | Speaker separation method and apparatus based on a recurrent neural network and acoustic features | |
US9330667B2 (en) | Method and system for endpoint automatic detection of audio record | |
JP6469252B2 (ja) | Account addition method, terminal, server, and computer storage medium | |
KR100636317B1 (ko) | Distributed speech recognition system and method thereof | |
WO2017114201A1 (fr) | Method and device for performing a setting operation | |
WO2017076222A1 (fr) | Speech recognition method and apparatus | |
US9786284B2 (en) | Dual-band speech encoding and estimating a narrowband speech feature from a wideband speech feature | |
CN110047481B (zh) | Method and apparatus for speech recognition | |
JP2019211749A (ja) | Method, apparatus, computer equipment and program for detecting start and end points of speech | |
CN112102850B (zh) | Emotion recognition processing method, apparatus, medium and electronic device | |
CN104575504A (zh) | Method for personalized TV voice wake-up using voiceprint and speech recognition | |
TW201543467A (zh) | Voice input method, device and system | |
CN105679310A (zh) | Speech recognition method and system | |
US11763819B1 (en) | Audio encryption | |
US20170301341A1 (en) | Methods and systems for identifying keywords in speech signal | |
WO2014173325A1 (fr) | Method and device for gutturophony recognition | |
WO2016183961A1 (fr) | Method, system and device for switching the interface of a smart device, and non-volatile computer storage medium | |
CN112802498B (zh) | Voice detection method and apparatus, computer device and storage medium | |
CN109065036A (zh) | Speech recognition method and apparatus, electronic device and computer-readable storage medium | |
CN111816216A (zh) | Voice activity detection method and apparatus | |
CN112242149A (zh) | Audio data processing method and apparatus, earphone and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14877679; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 14/10/2016) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 14877679; Country of ref document: EP; Kind code of ref document: A1 |