BR112017021673B1

BR112017021673B1 - VOICE CONTROL METHOD, COMPUTER READABLE NON-TRANSITORY MEDIUM AND TERMINAL

Info

Publication number: BR112017021673B1
Application number: BR112017021673-6A
Authority: BR
Inventors: Junyang ZHOU
Original assignee: Honor Device Co., Ltd
Priority date: 2015-04-10
Filing date: 2015-04-10
Publication date: 2023-02-14
Also published as: WO2016161641A1; AU2015390534A1; EP3282445A4; CA2982196C; JP6564058B2; US20210287671A1; EP3282445A1; AU2015390534B2; CN106463112A; AU2021286393B2; US10943584B2; CN106463112B; AU2021286393A1; US11783825B2; CA2982196A1; JP2018517919A; US20180033436A1; AU2019268131A1; BR112017021673A2

Abstract

VOICE RECOGNITION METHOD, VOICE ACTIVATION APPARATUS, VOICE RECOGNITION APPARATUS, AND TERMINAL. Embodiments of the present invention provide a voice recognition method and a terminal. The user needs to transmit only one instruction for the user's requirements to be satisfied, without the help of a touch screen and without entering multiple instructions. The method includes: listening (301), by a voice activation apparatus (101), to voice information in a surrounding environment; when it is determined that the voice information obtained by listening matches a voice activation model, buffering (301), by the voice activation apparatus (101), voice information of a first preset duration obtained by listening, and transmitting a trigger signal to trigger enabling of a voice recognition apparatus (102), where the trigger signal is used to instruct the voice recognition apparatus to read and recognize the voice information buffered by the voice activation apparatus (101) after the voice recognition apparatus is enabled; receiving (302), by the voice recognition apparatus (102), the trigger signal transmitted by the voice activation apparatus; after receiving (...).

Description

TECHNICAL FIELD

[001] The present invention relates to the field of mobile communications technologies, and in particular to a voice recognition method, a voice activation apparatus, a voice recognition apparatus, and a terminal.

BACKGROUND

[002] At present, with the growing popularity of mobile handheld terminals, in particular mobile phones, touchscreen technologies are becoming increasingly mature. Although touchscreen technologies make operation easier for a user, multiple touch steps need to be performed to complete a call interaction, and a call may be missed when the user is driving or when performing a touch is otherwise inconvenient.

[003] Therefore, functions such as making a call or sending an SMS message based on voice recognition technologies have emerged. In addition, as an important new user interaction technology, voice recognition is increasingly widely applied to mobile terminals.

[004] However, current services based on voice recognition technologies, such as making a call or sending an SMS message, can be implemented only in cooperation with touchscreen technologies.

SUMMARY

[005] Embodiments of the present invention provide a voice recognition method, a voice activation apparatus, a voice recognition apparatus, and a terminal. The user needs to transmit only one instruction for the user's requirements to be satisfied. The user does not need the help of a touch screen and does not need to enter multiple instructions.

[006] According to a first aspect, an embodiment of the present invention provides a voice recognition method, and the method includes: listening, by a voice activation apparatus, to voice information in a surrounding environment; and when it is determined that the voice information obtained by listening matches a voice activation model, buffering, by the voice activation apparatus, voice information of a first preset duration obtained by listening, and transmitting a trigger signal to trigger enabling of a voice recognition apparatus, where the trigger signal is used to instruct the voice recognition apparatus to read and recognize the voice information buffered by the voice activation apparatus after the voice recognition apparatus is enabled.
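The activation-side flow of the first aspect can be illustrated with a minimal Python sketch. It is not taken from the patent: the class name, the representation of voice information as a list of word-like frames, the sliding-window wake-phrase comparison, and the `send_trigger` callback are all assumptions made for illustration. The sketch also assumes that the buffered "first preset duration" covers the frames heard immediately after the match.

```python
from typing import Callable, List


class VoiceActivationApparatus:
    """Hypothetical always-on listener: match, then buffer and trigger."""

    def __init__(self, wake_phrase: List[str], first_duration_frames: int,
                 send_trigger: Callable[[], None]) -> None:
        self.wake_phrase = list(wake_phrase)
        self.first_duration_frames = first_duration_frames
        self.send_trigger = send_trigger
        self.recent: List[str] = []   # sliding window used only for matching
        self.buffer: List[str] = []   # voice information buffered after a match
        self.buffering = False

    def on_frame(self, frame: str) -> None:
        """Process one frame of voice information obtained by listening."""
        if self.buffering:
            # Keep only the first preset duration of post-match audio.
            if len(self.buffer) < self.first_duration_frames:
                self.buffer.append(frame)
            return
        self.recent.append(frame)
        self.recent = self.recent[-len(self.wake_phrase):]
        if self.recent == self.wake_phrase:
            # Stand-in for "matches the voice activation model".
            self.buffering = True
            self.send_trigger()
```

In this sketch the trigger is sent once, at the moment of the match, and the recognizer is expected to read `buffer` after it has been enabled.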

[007] With reference to the first aspect, in a first possible implementation manner of the first aspect, determining that the voice information obtained by listening matches a voice activation model includes: when the voice information obtained by listening matches predetermined voice activation information, determining that the voice information obtained by listening matches the voice activation model.

[008] With reference to the first aspect, in a second possible implementation manner of the first aspect, determining that the voice information obtained by listening matches a voice activation model includes: when the voice information obtained by listening matches predetermined voice activation information, extracting a voiceprint feature from a voice signal obtained by listening, determining that the extracted voiceprint feature matches a predetermined voiceprint feature, and determining that the voice information obtained by listening matches the voice activation model.
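The two-stage check in the second implementation manner (activation phrase first, voiceprint second) might look like the following Python sketch. The patent does not say how voiceprint features are represented or compared; the fixed-length feature vectors, the cosine-similarity comparison, and the `threshold` value here are assumptions chosen only to make the example concrete.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matches_activation_model(heard_text: str,
                             heard_voiceprint: Sequence[float],
                             wake_text: str,
                             enrolled_voiceprint: Sequence[float],
                             threshold: float = 0.8) -> bool:
    """Hypothetical two-stage match: phrase check, then voiceprint check."""
    # Stage 1: the heard content must match the predetermined activation information.
    if heard_text.strip().lower() != wake_text.strip().lower():
        return False
    # Stage 2: the extracted voiceprint must match the predetermined voiceprint.
    return cosine_similarity(heard_voiceprint, enrolled_voiceprint) >= threshold
```

Running the voiceprint comparison only after the phrase matches mirrors the order given in the text and avoids extracting features from unrelated ambient speech.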

[009] According to a second aspect, an embodiment of the present invention provides a voice recognition method, and the method includes: receiving, by a voice recognition apparatus, a trigger signal transmitted by a voice activation apparatus, where the trigger signal is used to instruct the voice recognition apparatus to enable itself and recognize first voice information buffered by the voice activation apparatus; after receiving the trigger signal, enabling itself, by the voice recognition apparatus, and listening to second voice information of a second preset duration; and recognizing the first voice information buffered by the voice activation apparatus and the second voice information obtained by listening, to obtain a recognition result.
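The recognition side of the second aspect can be sketched as below. This is an illustrative Python outline, not the patented implementation: the three injected callables (`read_buffered`, `listen`, `recognize`) and the use of lists of word-like frames are hypothetical stand-ins for the shared buffer, the microphone path, and the recognition engine.

```python
from typing import Callable, List


class VoiceRecognitionApparatus:
    """Hypothetical recognizer that is enabled by a trigger signal."""

    def __init__(self,
                 read_buffered: Callable[[], List[str]],
                 listen: Callable[[int], List[str]],
                 recognize: Callable[[List[str]], str],
                 second_duration_frames: int) -> None:
        self.read_buffered = read_buffered
        self.listen = listen
        self.recognize = recognize
        self.second_duration_frames = second_duration_frames
        self.enabled = False

    def on_trigger(self) -> str:
        """Enable, listen for the second preset duration, recognize both parts."""
        self.enabled = True
        second = self.listen(self.second_duration_frames)
        first = self.read_buffered()
        # The first (buffered) and second (freshly heard) voice information
        # are recognized together, in spoken order.
        return self.recognize(first + second)
```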

[010] With reference to the second aspect, in a first possible implementation manner of the second aspect, after the voice recognition apparatus obtains the recognition result, the method further includes: matching, by the voice recognition apparatus, the obtained recognition result against pre-stored voice instruction information; and performing, by the voice recognition apparatus, an operation corresponding to the matched voice instruction information.
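Matching a recognition result against pre-stored voice instruction information and then performing the corresponding operation could be sketched as follows. The prefix-based matching rule, the dictionary of instruction phrases, and the operation callables are assumptions for illustration; the patent does not specify the matching strategy.

```python
from typing import Callable, Dict, Optional


def execute_matching_instruction(
        recognition_result: str,
        instructions: Dict[str, Callable[[str], str]]) -> Optional[str]:
    """Run the operation of the first pre-stored instruction that matches.

    Returns the operation's result, or None if no instruction matches.
    """
    normalized = recognition_result.strip().lower()
    for phrase, operation in instructions.items():
        if normalized.startswith(phrase):
            # Pass the remainder of the utterance as the instruction argument.
            return operation(normalized[len(phrase):].strip())
    return None
```

For example, with a hypothetical table mapping `"call"` to a dialing function, the result `"Call Mom"` would invoke that function with the argument `"mom"`.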

[011] With reference to the second aspect or the first possible implementation manner of the second aspect, in a second possible implementation manner of the second aspect, the method further includes: when it is determined that the trigger signal is not received again within a third preset duration after the trigger signal is received, automatically disabling itself, by the voice recognition apparatus.
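The automatic disabling rule can be modeled with a small timer, shown here as a hedged Python sketch. The class name, the explicit `now` timestamps (used instead of a real clock so the behavior is deterministic), and the choice to disable exactly when the elapsed time reaches the third preset duration are assumptions.

```python
from typing import Optional


class AutoDisableTimer:
    """Hypothetical timer that disables the recognizer when no new trigger arrives."""

    def __init__(self, third_duration_s: float) -> None:
        self.third_duration_s = third_duration_s
        self.last_trigger_at: Optional[float] = None
        self.enabled = False

    def on_trigger(self, now: float) -> None:
        """Enable the recognizer and restart the countdown."""
        self.enabled = True
        self.last_trigger_at = now

    def tick(self, now: float) -> bool:
        """Disable if the third preset duration elapsed without a new trigger."""
        if (self.enabled and self.last_trigger_at is not None
                and now - self.last_trigger_at >= self.third_duration_s):
            self.enabled = False
        return self.enabled
```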

[012] According to a third aspect, an embodiment of the present invention provides a voice recognition method, and the method includes: listening, by a voice activation apparatus, to voice information in a surrounding environment; and when it is determined that the voice information obtained by listening matches a voice activation model, transmitting, by the voice activation apparatus, a trigger signal to trigger enabling of a voice recognition apparatus.

[013] With reference to the third aspect, in a first possible implementation manner of the third aspect, determining that the voice information obtained by listening matches a voice activation model includes: when the voice information obtained by listening matches predetermined voice activation information, determining that the voice information obtained by listening matches the voice activation model.

[014] With reference to the third aspect, in a second possible implementation manner of the third aspect, determining that the voice information obtained by listening matches a voice activation model includes: when the voice information obtained by listening matches predetermined voice activation information, extracting a voiceprint feature from a voice signal obtained by listening, determining that the extracted voiceprint feature matches a predetermined voiceprint feature, and determining that the voice information obtained by listening matches the voice activation model.

[015] According to a fourth aspect, an embodiment of the present invention provides a voice recognition method, and the method includes: receiving, by a voice recognition apparatus, a trigger signal transmitted by a voice activation apparatus; enabling itself, by the voice recognition apparatus, after receiving the trigger signal, and transmitting a voice reminder instruction to a user; and recording, by the voice recognition apparatus, a voice signal entered by the user in accordance with the voice reminder instruction, and performing recognition on the voice signal to obtain a recognition result.

[016] According to a fifth aspect, an embodiment of the present invention further provides a voice activation apparatus, and the apparatus includes: a listening module, configured to listen to voice information in a surrounding environment; a determining module, configured to determine whether the voice information obtained by the listening module by listening matches a voice activation model; a buffering module, configured to: when the determining module determines that the voice information obtained by the listening module by listening matches the voice activation model, buffer voice information of a first preset duration obtained by the listening module by listening; and a transmitting module, configured to transmit a trigger signal to trigger enabling of a voice recognition apparatus, where the trigger signal is used to instruct the voice recognition apparatus to read and recognize the voice information buffered by the voice activation apparatus after the voice recognition apparatus is enabled.

[017] With reference to the fifth aspect, in a first possible implementation manner of the fifth aspect, the determining module is specifically configured to: when it is determined that the voice information obtained by listening matches predetermined voice activation information, determine that the voice information obtained by listening matches the voice activation model.

[018] With reference to the fifth aspect, in a second possible implementation manner of the fifth aspect, the apparatus further includes: an extraction module, configured to: when the determining module determines that the voice information obtained by listening matches predetermined voice activation information, extract a voiceprint feature from a voice signal obtained by listening; where the determining module is further configured to: when it is determined that the voiceprint feature extracted by the extraction module matches a predetermined voiceprint feature, determine that the voice information obtained by listening matches the voice activation model.

[019] According to a sixth aspect, an embodiment of the present invention provides a voice recognition apparatus, which includes: a receiving module, configured to receive a trigger signal transmitted by a voice activation apparatus, where the trigger signal is used to instruct the voice recognition apparatus to enable itself and recognize first voice information buffered by the voice activation apparatus; a listening module, configured to: after the receiving module receives the trigger signal, enable itself and listen to second voice information of a second preset duration; and a recognition module, configured to recognize the first voice information buffered by the voice activation apparatus and the second voice information obtained by the listening module by listening, to obtain a recognition result.

[020] With reference to the sixth aspect, in a first possible implementation manner of the sixth aspect, the apparatus further includes: a matching module, configured to match the recognition result obtained by the recognition module against pre-stored voice instruction information; and an execution module, configured to perform an operation corresponding to the matched voice instruction information.

[021] With reference to the sixth aspect or the first possible implementation manner of the sixth aspect, in a second possible implementation manner of the sixth aspect, the apparatus further includes: a disabling module, configured to: when the trigger signal is not received again within a third preset duration after the trigger signal is received, disable the voice recognition module.

[022] According to a seventh aspect, an embodiment of the present invention provides a voice activation apparatus, which includes: a listening module, configured to listen to voice information in a surrounding environment; a determining module, configured to determine whether the voice information obtained by listening matches a voice activation model; and a transmitting module, configured to: when the determining module determines that the voice information obtained by listening matches the voice activation model, transmit a trigger signal to trigger enabling of a voice recognition apparatus.

[023] With reference to the seventh aspect, in a first possible implementation manner of the seventh aspect, the determining module is specifically configured to: when it is determined that the voice information obtained by listening matches predetermined voice activation information, determine that the voice information obtained by listening matches the voice activation model.

[024] With reference to the seventh aspect, in a second possible implementation manner of the seventh aspect, the apparatus further includes: an extraction module, configured to: when the determining module determines that the voice information obtained by listening matches predetermined voice activation information, extract a voiceprint feature from a voice signal obtained by listening; where the determining module is further configured to: when it is determined that the extracted voiceprint feature matches a predetermined voiceprint feature, determine that the voice information obtained by listening matches the voice activation model.

[025] According to an eighth aspect, an embodiment of the present invention provides a voice recognition apparatus, which includes: a receiving module, configured to receive a trigger signal transmitted by a voice activation apparatus; a transmitting module, configured to: after the receiving module receives the trigger signal, enable itself and transmit a voice reminder instruction to a user; and a processing module, configured to record a voice signal entered by the user in accordance with the voice reminder instruction, and recognize the voice signal to obtain a recognition result.

[026] According to a ninth aspect, an embodiment of the present invention provides a terminal, which includes a voice activation apparatus and a voice recognition apparatus; where the voice activation apparatus is configured to: listen to voice information in a surrounding environment; and when it is determined that the voice information obtained by listening matches a voice activation model, buffer first voice information obtained by listening within a first preset duration, and transmit a trigger signal to trigger enabling of the voice recognition apparatus; and the voice recognition apparatus is configured to: after receiving the trigger signal transmitted by the voice activation apparatus, enable itself and listen to second voice information within a second preset duration, and recognize the first voice information buffered by the voice activation apparatus and the second voice information obtained by listening, to obtain a recognition result.

[027] With reference to the ninth aspect, in a first possible implementation manner of the ninth aspect, the voice activation apparatus is a digital signal processor (DSP).

[028] With reference to the ninth aspect or the first possible implementation of the ninth aspect, in a second possible implementation of the ninth aspect, the voice recognition apparatus is an application processor (AP).

[029] With the solutions provided in the embodiments of the present invention, a user needs to issue only one instruction for the user's requirements to be satisfied. The solutions apply regardless of whether the terminal is in a standby state or a non-standby state. In addition, the voice activation apparatus buffers the first voice information obtained by listening, and the voice recognition apparatus listens for second voice information after being enabled and recognizes both the buffered first voice information and the second voice information. This avoids losing the part of the user's speech that is uttered before the voice recognition apparatus is enabled, which would otherwise be lost if the voice recognition apparatus began obtaining voice information only after being activated.

BRIEF DESCRIPTION OF THE DRAWINGS

[030] FIG. 1 is a schematic structural diagram of a terminal according to an embodiment of the present invention.

[031] FIG. 2 is a flowchart of a voice recognition method according to an embodiment of the present invention.

[032] FIG. 3 is another flowchart of a voice recognition method according to an embodiment of the present invention.

[033] FIG. 4 is yet another flowchart of a voice recognition method according to an embodiment of the present invention.

[034] FIG. 5 is a further flowchart of a voice recognition method according to an embodiment of the present invention.

[035] FIG. 6 is a schematic diagram of a voice activation apparatus according to an embodiment of the present invention.

[036] FIG. 7 is a schematic diagram of a voice recognition apparatus according to an embodiment of the present invention.

[037] FIG. 8 is another schematic diagram of a voice activation apparatus according to an embodiment of the present invention.

[038] FIG. 9 is another schematic diagram of a voice recognition apparatus according to an embodiment of the present invention.

[039] FIG. 10 is a schematic diagram of a voice recognition method according to an embodiment of the present invention.

DESCRIPTION OF EMBODIMENTS

[040] To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. Clearly, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by persons skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

[041] Embodiments of the present invention provide a voice recognition method, a voice activation apparatus, a voice recognition apparatus, and a terminal. A user needs to issue only one instruction for the user's requirements to be satisfied. The user does not need the help of a touchscreen and does not need to enter multiple instructions. The method and the apparatus are based on the same inventive concept. Because the method and the apparatus solve the problem on similar principles, the implementations of the terminal, the apparatus, and the method may refer to one another, and repeated description is omitted.

[042] An embodiment of the present invention provides a terminal. As shown in FIG. 1, the terminal includes a voice activation apparatus 101 and a voice recognition apparatus 102.

[043] The voice activation apparatus 101 may be implemented by using a digital signal processor (DSP). The voice recognition apparatus 102 may be implemented by using an application processor (AP). The voice recognition apparatus 102 may alternatively be implemented by using a central processing unit (CPU).

[044] The voice activation apparatus 101 is configured to: listen for voice information in a surrounding environment; and, when it is determined that the voice information obtained by listening matches a voice activation model, buffer first voice information obtained by listening within a first predefined duration, and transmit a trigger signal to trigger enabling of the voice recognition apparatus.

[045] The voice recognition apparatus 102 is configured to: after receiving the trigger signal transmitted by the voice activation apparatus, enable itself and listen for second voice information within a second predefined duration; and recognize the first voice information buffered by the voice activation apparatus and the second voice information obtained by listening, to obtain a recognition result.

[046] Optionally, when it is determined that no further trigger signal is received within a third predefined duration after a trigger signal is received, the voice recognition apparatus 102 automatically disables itself.
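The automatic disabling described in [046] can be illustrated with a minimal sketch. It assumes that time is passed in explicitly as seconds and that the recognizer counts as enabled only while the most recent trigger is younger than the third predefined duration; the class name `RecognizerPowerState` and its methods are hypothetical and do not come from the patent.

```python
from typing import Optional


class RecognizerPowerState:
    """Hypothetical model of the recognizer's enabled/disabled state."""

    def __init__(self, third_duration_s: float) -> None:
        # Assumed name for the "third predefined duration", in seconds.
        self.third_duration_s = third_duration_s
        self.last_trigger_s: Optional[float] = None

    def on_trigger(self, now_s: float) -> None:
        """Record a trigger signal; each new trigger restarts the window."""
        self.last_trigger_s = now_s

    def is_enabled(self, now_s: float) -> bool:
        """Enabled only while the latest trigger is inside the window."""
        if self.last_trigger_s is None:
            return False
        return (now_s - self.last_trigger_s) < self.third_duration_s


state = RecognizerPowerState(third_duration_s=5.0)
state.on_trigger(now_s=10.0)
still_on = state.is_enabled(now_s=12.0)   # inside the 5 s window
timed_out = state.is_enabled(now_s=15.0)  # window elapsed, no new trigger
```

Passing the clock value in as an argument keeps the sketch deterministic; a real terminal would more likely use a hardware timer or an OS-level timeout.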

[047] With the solutions provided in this embodiment of the present invention, a user needs to issue only one instruction for the user's requirements to be satisfied. The solutions apply regardless of whether a terminal is in a standby state or a non-standby state. In addition, the voice activation apparatus buffers the first voice information obtained by listening, and the voice recognition apparatus listens for second voice information after being enabled and recognizes both the buffered first voice information and the second voice information. This avoids losing the part of the user's speech that is uttered before the voice recognition apparatus is enabled, which would otherwise be lost if the voice recognition apparatus began obtaining voice information only after being activated.

[048] Specifically, in a conventional approach, after the DSP sends a trigger signal and the AP is enabled, the AP opens a recording channel and starts recording; that is, recording starts only after the AP is enabled. In this solution, by contrast, the DSP starts recording and buffering when it receives the activation information, before the AP is enabled. After being enabled, the AP continues recording to obtain voice information, and then recognizes both the voice information read from the DSP buffer and the voice information obtained after the AP was enabled. In a hypothetical scenario, there is a time difference between the DSP being activated and the DSP transmitting an instruction. If recording starts only after the AP is enabled, only voice information after that point can be recorded, and the voice information within this time difference is lost. However, if recording and buffering start when the DSP is activated, the voice information within this time difference can be obtained.
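The hand-off between the DSP buffer and the AP can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: audio frames are represented by plain strings, the buffer is a bounded `collections.deque` whose capacity stands in for the first predefined duration, and the names `ActivationBuffer` and `recognizer_input` are hypothetical.

```python
from collections import deque
from typing import List


class ActivationBuffer:
    """Hypothetical stand-in for the DSP-side buffer of recorded frames."""

    def __init__(self, max_frames: int) -> None:
        # A bounded deque drops the oldest frame once capacity is reached.
        self.frames: deque = deque(maxlen=max_frames)

    def push(self, frame: str) -> None:
        self.frames.append(frame)

    def read_all(self) -> List[str]:
        return list(self.frames)


def recognizer_input(buffered: List[str], live: List[str]) -> List[str]:
    """Join the buffered (first) and live (second) voice information."""
    return buffered + live


# Frames captured by the DSP while the AP is still being enabled.
dsp_buffer = ActivationBuffer(max_frames=8)
for frame in ["hello", "little", "e", "where"]:
    dsp_buffer.push(frame)

# Frames the AP records itself once it is enabled.
ap_frames = ["are", "you"]

full_utterance = recognizer_input(dsp_buffer.read_all(), ap_frames)
ap_only = recognizer_input([], ap_frames)  # what is left without buffering
```

Without the buffer, the recognizer sees only the tail of the utterance; with it, the frames spoken before the AP was enabled are preserved.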

[049] For example, suppose the user starts speaking the wake word at time t0, finishes speaking the wake word at t1, starts speaking the command word at t2, and the AP is enabled at t3. The buffer contains the voice information from t0 to t3. If recording were performed only once the AP is enabled, only voice information after t3 could be recorded, and the voice information from t0 to t3 could not. Therefore, in the solution provided in this embodiment of the present invention, the voice information that follows the wake word can be obtained and loss of voice information is avoided, which improves voice recognition.
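A short worked example makes the timeline concrete. The numeric values below are invented for illustration (the patent gives no figures), and the sketch assumes that the AP is enabled after the command has already begun, that is, t3 > t2; the function names are hypothetical.

```python
from typing import Tuple


def available_span(t0: float, t3: float, t_end: float,
                   dsp_buffers: bool) -> Tuple[float, float]:
    """Return the (start, end) interval of audio the recognizer can use."""
    start = t0 if dsp_buffers else t3
    return (start, t_end)


def lost_command_audio(t2: float, span: Tuple[float, float]) -> float:
    """Seconds of the command (which starts at t2) missing from the span."""
    return max(0.0, span[0] - t2)


# Illustrative timestamps in seconds (assumed values, not from the source).
t0, t1, t2, t3, t_end = 0.0, 0.8, 1.0, 1.5, 3.0

with_buffer = available_span(t0, t3, t_end, dsp_buffers=True)
without_buffer = available_span(t0, t3, t_end, dsp_buffers=False)

lost_with_buffer = lost_command_audio(t2, with_buffer)
lost_without_buffer = lost_command_audio(t2, without_buffer)
```

With these assumed values, the command loses its first half second when recording starts at t3, and nothing when the DSP has buffered from t0.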

[050] Optionally, the following implementations may be used to determine that the voice information obtained by listening matches the voice activation model:

[051] First implementation: when the voice information obtained by listening matches predetermined voice activation information, the voice information obtained by listening matches the voice activation model.

[052] A user may set voice activation information in the voice activation apparatus in advance by following a prompt, for example: "Hello, little E". Alternatively, voice activation information is preset in the terminal at the time of factory delivery. When the voice activation apparatus detects voice information in the surrounding environment, it compares that voice information with the stored voice activation information. If the two are the same, the voice information matches the voice activation model, and a trigger instruction is transmitted to the voice recognition apparatus 102. If the two are different, the voice activation apparatus 101 may discard the currently detected voice information and continue detecting and determining.
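The comparison in [052] can be sketched on text transcripts. This is a simplification under loud assumptions: a real voice activation apparatus would compare acoustic representations rather than text, and treating the wake phrase as a prefix of the utterance (so that a command may follow it in the same breath) is an interpretation drawn from the "Hello, little E, where are you?" example. The function names are hypothetical.

```python
import string
from typing import List


def tokens(text: str) -> List[str]:
    """Lower-case the text, strip ASCII punctuation, split into words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def matches_wake_phrase(heard: str, stored: str) -> bool:
    """True when the heard utterance begins with the stored wake phrase."""
    heard_tokens, stored_tokens = tokens(heard), tokens(stored)
    if not stored_tokens:
        return False
    return heard_tokens[:len(stored_tokens)] == stored_tokens


def handle_utterance(heard: str, stored: str) -> str:
    """Return 'trigger' on a match; otherwise 'discard' and keep listening."""
    return "trigger" if matches_wake_phrase(heard, stored) else "discard"


decision = handle_utterance("Hello, little E, where are you?", "Hello, little E")
```

Comparing whole tokens rather than raw characters avoids accepting an utterance such as "Hello, little Emma" merely because it shares a character prefix with the wake phrase.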

[053] Second implementation: when the voice information obtained by listening matches predetermined voice activation information, a voiceprint feature is extracted from the voice signal obtained by listening; if the extracted voiceprint feature matches a predetermined voiceprint feature, the voice information obtained by listening matches the voice activation model.

[054] The voiceprint feature includes one or more of the following acoustic parameters that reflect a voiceprint: pitch (intonation), a linear prediction coefficient, a spectral envelope parameter, a harmonic energy ratio, a resonance peak frequency and its bandwidth, a cepstrum (also called a power cepstrum), or a Mel-frequency cepstral coefficient (MFCC). This embodiment is not limited to the voiceprint feature parameters listed above.

[055] This embodiment of the present invention may further include a configuration apparatus. A voiceprint feature of a user is extracted in advance and stored in the voice activation apparatus. For example, the user may record voice information in a configuration module by following a prompt; a voiceprint feature is then extracted and the extracted voiceprint feature is stored in the voice activation apparatus.

[056] With the solutions provided in this embodiment of the present invention, a voiceprint feature is added to the activation model, so that noise in the surrounding environment and voice input from other users can be filtered out, and the voice activation apparatus can provide reliable security for the user.
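The two-condition check of the second implementation (wake phrase and voiceprint must both match) can be sketched as below. The patent does not say how voiceprint features are compared; cosine similarity against an enrolled feature vector with a fixed threshold is an assumption chosen for illustration, and the feature values and the 0.9 threshold are invented.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_wake(phrase_matched: bool,
                features: Sequence[float],
                enrolled: Sequence[float],
                threshold: float = 0.9) -> bool:
    """Wake only if the phrase matched AND the voiceprint is close enough."""
    if not phrase_matched:
        return False
    return cosine_similarity(features, enrolled) >= threshold


# Hypothetical feature vectors (for example, averaged MFCCs).
enrolled_user = [1.0, 2.0, 3.0]
same_speaker = [1.1, 2.0, 2.9]
other_speaker = [3.0, -1.0, 0.5]

owner_wakes = should_wake(True, same_speaker, enrolled_user)
stranger_wakes = should_wake(True, other_speaker, enrolled_user)
```

Checking the phrase first keeps the cheaper test in front, so feature comparison runs only for utterances that already contain the wake phrase.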

[057] Optionally, after obtaining the recognition result, the voice recognition apparatus matches the recognition result against pre-stored voice instruction information, and controls execution of the operation corresponding to the matched voice instruction information.

[058] The voice instruction information is pre-stored in the voice recognition apparatus. The voice recognition apparatus holds multiple pieces of voice instruction information.

[059] This embodiment of the present invention may further include an execution module that performs the operation corresponding to the voice instruction information. The voice recognition apparatus may transmit an execution instruction to that execution module. Examples of an execution module include a loudspeaker, a light-emitting apparatus, or the like.

[060] For example, when detecting that voice information in the surrounding environment satisfies the activation model, the voice activation module buffers first voice information of the first predefined duration, such as 2 s, and triggers the voice recognition module to enable itself and listen for second voice information. The voice recognition module then recognizes the buffered first voice information and the second voice information, and fuzzily compares the recognition result with the voice instruction information to determine whether the voice information matches a piece of the voice instruction information. For example, the voice instruction information includes an instruction to play a ringtone or an MP3, such as "Play a ringtone" or "Play an MP3"; an instruction to ask a question, such as "Where are you?"; or an instruction to turn on a camera flash, such as "Turn on the camera flash".
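The loose comparison of a recognition result against stored voice instruction information could be sketched with Python's standard `difflib` module. The patent does not name a matching algorithm, so the use of `difflib.get_close_matches`, the 0.6 cutoff, and the action identifiers in `INSTRUCTIONS` are all assumptions; the sketch also assumes the wake phrase has already been removed from the recognized text.

```python
import difflib
from typing import Optional

# Hypothetical pre-stored voice instruction information mapped to actions.
INSTRUCTIONS = {
    "play a ringtone": "PLAY_RINGTONE",
    "play an mp3": "PLAY_MP3",
    "where are you": "ANNOUNCE_LOCATION",
    "turn on the camera flash": "FLASH_ON",
}


def match_instruction(recognized: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the action of the closest stored instruction, or None."""
    candidates = difflib.get_close_matches(
        recognized.lower(), list(INSTRUCTIONS), n=1, cutoff=cutoff
    )
    if not candidates:
        return None
    return INSTRUCTIONS[candidates[0]]


flash_action = match_instruction("turn on camera flash")
ring_action = match_instruction("Play a ring tone")
```

A similarity cutoff tolerates small recognition errors, such as a dropped article or a split word, while still returning no action for unrelated speech.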

[061] A terminal device such as a mobile phone can be located by using the solution provided in this embodiment of the present invention. At home, a mobile phone is often left in arbitrary places, and it takes time to find it when it is needed. With the solution provided in this embodiment of the present invention, the user can say "Hello, little E, where are you?". The voice activation module in the mobile phone detects the voice information and matches it against the voice activation model (for example, the voice activation information is "Hello, little E"). When the voice information matches the voice activation model, the voice information is stored in a buffer and a trigger signal is transmitted to the voice recognition apparatus. The voice recognition module enables itself and starts listening for voice information, then recognizes the buffered voice information and the voice information obtained by listening to obtain a recognition result (the text result is "Hello, little E, where are you?"), and matches the text result against the voice instruction information. For example, if the operation corresponding to the voice instruction information that matches "Hello, little E, where are you?" is to play an MP3, MP3 music is played to alert the user.

[062] A call can also be made by using the solution provided in this embodiment of the present invention. Regardless of whether the mobile phone is in a standby state or a screen-locked state, the voice activation module of the mobile phone is always enabled, so that voice information spoken by the user, for example "Hello, little E, call little A", can be obtained by listening. The call can then be made directly, and no other operation is required.

[063] An embodiment of the present invention further provides a voice recognition method. As shown in FIG. 2, the method includes:

[064] Step 201: A voice activation apparatus listens for voice information in a surrounding environment.

[065] Step 202: When it is determined that the voice information obtained by listening matches a voice activation model, the voice activation apparatus buffers voice information of a first predefined duration obtained by listening, and transmits a trigger signal to trigger enabling of a voice recognition apparatus, where the trigger signal instructs the voice recognition apparatus, after it is enabled, to read and recognize the voice information buffered by the voice activation apparatus.

[066] With the solutions provided in this embodiment of the present invention, a user needs to issue only one instruction for the user's requirements to be satisfied. The solutions apply regardless of whether a terminal is in a standby state or a non-standby state. In addition, the voice information obtained by listening is buffered; the voice recognition apparatus enables itself, listens for voice information, and then recognizes both the buffered voice information and the voice information it obtained by listening. This avoids the partial loss of voice information that would occur if the voice recognition apparatus began obtaining voice information only after being activated, and improves voice recognition.

[067] Optionally, either of the following modes may be used to determine that the voice information obtained by listening matches the voice activation model.

[068] First implementation mode: when the voice information obtained by listening matches predetermined voice activation information, the voice information obtained by listening is determined to match the voice activation model.

[069] Second implementation mode: when the voice information obtained by listening matches the predetermined voice activation information, a voiceprint feature is extracted from the voice signal obtained by listening; when the extracted voiceprint feature matches a predetermined voiceprint feature, the voice information obtained by listening is determined to match the voice activation model.
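The two implementation modes can be illustrated with a short sketch. The text does not say how phrases or voiceprint features are compared, so the case-insensitive phrase comparison, the vector representation of a voiceprint, the cosine-similarity measure, and the `threshold` value are all assumptions made for illustration; the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed voiceprint form)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matches_mode_1(heard_phrase, wake_phrase):
    """First mode: the heard phrase matches the predetermined activation phrase."""
    return heard_phrase.strip().lower() == wake_phrase.strip().lower()


def matches_mode_2(heard_phrase, heard_voiceprint, wake_phrase,
                   enrolled_voiceprint, threshold=0.9):
    """Second mode: the phrase must match AND the voiceprint must match."""
    if not matches_mode_1(heard_phrase, wake_phrase):
        return False
    return cosine_similarity(heard_voiceprint, enrolled_voiceprint) >= threshold
```

In the second mode the voiceprint check runs only after the phrase check succeeds, mirroring the order given in the text.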

[070] An embodiment of the present invention further provides a voice recognition method. As shown in FIG. 3, the method includes:

[071] Step 301: A voice recognition apparatus receives a trigger signal transmitted by a voice activation apparatus, where the trigger signal instructs the voice recognition apparatus to enable itself and recognize first voice information buffered by the voice activation apparatus.

[072] Step 302: After receiving the trigger signal, the voice recognition apparatus enables itself and listens for second voice information of the second preset duration.

[073] Step 303: The voice recognition apparatus recognizes the first voice information buffered by the voice activation apparatus and the second voice information obtained by listening, to obtain a recognition result.
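Steps 301 to 303 can be sketched as a small class. This is an illustrative outline under stated assumptions: voice information is modeled as a list of frames, the first and second voice information are simply concatenated before recognition, and the callables `read_wakeup_buffer`, `record`, and `recognize` are hypothetical stand-ins for components the text does not detail.

```python
class VoiceRecognizer:
    """Hypothetical recognizer that combines buffered and newly recorded audio."""

    def __init__(self, read_wakeup_buffer, record, recognize):
        self.read_wakeup_buffer = read_wakeup_buffer  # returns first voice information
        self.record = record                          # records for a given duration
        self.recognize = recognize                    # maps frames to a result
        self.enabled = False

    def on_trigger(self, second_duration):
        # Step 301 has happened: the trigger signal was received.
        self.enabled = True                           # Step 302: enable
        second = self.record(second_duration)         # Step 302: listen
        first = self.read_wakeup_buffer()             # buffered before enabling
        return self.recognize(first + second)         # Step 303: recognize both
```

For example, if the buffer holds the frames for "call" and the recognizer then records "mom", the recognizer sees the full utterance "call mom" rather than only its tail.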

[074] With the solutions provided in this embodiment of the present invention, a user needs to issue only one instruction for the user's requirements to be met. The solutions apply regardless of whether a terminal is in a standby state or a non-standby state. In addition, the voice information obtained by listening is buffered; a voice recognition apparatus enables itself, listens for further voice information, and then recognizes both the buffered voice information and the voice information it obtained by listening. This avoids the loss of part of the voice information that occurs when the voice recognition apparatus starts obtaining voice information only after being activated, and voice recognition is improved.

[075] Optionally, after the voice recognition apparatus obtains the recognition result, the method further includes: matching, by the voice recognition apparatus, the obtained recognition result against pre-stored voice instruction information; and controlling, by the voice recognition apparatus, execution of an operation corresponding to the matched voice instruction information.
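A minimal sketch of this matching step follows. It assumes the recognition result is text and that the pre-stored voice instruction information is a table of leading phrases mapped to operations; the prefix-matching rule and the name `dispatch` are assumptions, since the text does not specify how matching is done.

```python
def dispatch(recognition_result, instructions):
    """Match the result against pre-stored instruction phrases and run the operation.

    `instructions` maps an instruction phrase to a callable taking the
    remainder of the utterance. Returns None when nothing matches.
    """
    text = recognition_result.strip().lower()
    for phrase, operation in instructions.items():
        if text.startswith(phrase):
            argument = text[len(phrase):].strip()
            return operation(argument)
    return None


# Hypothetical pre-stored instruction table.
INSTRUCTIONS = {
    "call": lambda who: f"calling {who}",
    "open": lambda app: f"opening {app}",
}
```

With this table, `dispatch("Call Mom", INSTRUCTIONS)` runs the "call" operation with the argument "mom", while an unrecognized utterance returns `None` and no operation is executed.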

[076] Optionally, when it is determined that no further trigger signal is received within the third preset duration after the trigger signal is received, the voice recognition apparatus automatically disables itself.
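The automatic disabling can be sketched as a simple timeout check. This is an assumption-laden illustration: timestamps are passed in explicitly (in practice they might come from a monotonic clock), the third preset duration is in seconds, and the class name `AutoDisable` is hypothetical.

```python
class AutoDisable:
    """Tracks whether the recognizer should still be enabled."""

    def __init__(self, third_duration):
        self.third_duration = third_duration  # seconds, assumed unit
        self.last_trigger = None              # time of the most recent trigger

    def on_trigger(self, now):
        # Each new trigger signal restarts the timeout window.
        self.last_trigger = now

    def is_enabled(self, now):
        if self.last_trigger is None:
            return False
        # Disabled once the window elapses with no new trigger signal.
        return now - self.last_trigger < self.third_duration
```

A caller would poll `is_enabled` and power down the recognizer when it returns `False`.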

[077] An embodiment of the present invention further provides a voice recognition method. As shown in FIG. 4, the method includes:

[078] Step 401: A voice activation apparatus listens for voice information in a surrounding environment.

[079] Step 402: When it is determined that the voice information obtained by listening matches a voice activation model, the voice activation apparatus transmits a trigger signal to enable a voice recognition apparatus.

[080] Optionally, either of the following modes may be used to determine that the voice information obtained by listening matches the voice activation model.

[081] First implementation mode: when the voice information obtained by listening matches predetermined voice activation information, the voice information obtained by listening is determined to match the voice activation model.

[082] Second implementation mode: when the voice information obtained by listening matches the predetermined voice activation information, a voiceprint feature is extracted from the voice signal obtained by listening; when the extracted voiceprint feature matches a predetermined voiceprint feature, the voice information obtained by listening is determined to match the voice activation model.

[083] With the solutions provided in this embodiment of the present invention, a user needs to issue only one instruction for the user's requirements to be met. The solutions apply regardless of whether a terminal is in a standby state or a non-standby state. In addition, the voice information obtained by listening is buffered; a voice recognition apparatus enables itself, listens for further voice information, and then recognizes both the buffered voice information and the voice information it obtained by listening. This avoids the loss of part of the voice information that occurs when the voice recognition apparatus starts obtaining voice information only after being activated, and voice recognition is improved.

[084] An embodiment of the present invention further provides a voice recognition method. As shown in FIG. 5, the method includes:

[085] Step 501: A voice recognition apparatus receives a trigger signal transmitted by a voice activation apparatus.

[086] Step 502: After receiving the trigger signal, the voice recognition apparatus enables itself and transmits a voice reminder instruction to a user.

[087] Step 503: The voice recognition apparatus records a voice signal input by the user according to the voice reminder instruction, and recognizes the voice signal to obtain a recognition result.
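Steps 501 to 503 amount to a prompt-then-record flow, which the following sketch illustrates. The callables `play_prompt`, `record`, and `recognize`, the function name `handle_trigger`, and the prompt wording are all hypothetical; the text does not describe the content of the voice reminder instruction.

```python
def handle_trigger(play_prompt, record, recognize,
                   prompt="Please say your command."):
    """Run once the trigger signal has been received (Step 501)."""
    play_prompt(prompt)             # Step 502: voice reminder to the user
    voice_signal = record()         # Step 503: record the user's reply
    return recognize(voice_signal)  # Step 503: obtain the recognition result
```

Unlike the buffered variant of FIG. 3, this flow records only after the prompt, so nothing spoken before the prompt is recognized.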

[088] With the solutions provided in this embodiment of the present invention, a user needs to issue only one instruction for the user's requirements to be met. The solutions apply regardless of whether a terminal is in a standby state or a non-standby state.

[089] Optionally, after the voice recognition apparatus recognizes the voice signal to obtain the recognition result, the method further includes: matching, by the voice recognition apparatus, the obtained recognition result against the pre-stored voice instruction information; and controlling, by the voice recognition apparatus, execution of an operation corresponding to the matched voice instruction information.

[090] An embodiment of the present invention further provides a voice activation apparatus. As shown in FIG. 6, the apparatus includes: a listening module 601, configured to listen for voice information in a surrounding environment; a determination module 602, configured to determine whether the voice information obtained by the listening module 601 matches a voice activation model; a buffer module 603, configured to: when the determination module 602 determines that the voice information obtained by the listening module 601 matches the voice activation model, buffer the voice information, of the first preset duration, obtained by the listening module 601; and a transmission module 604, configured to transmit a trigger signal to enable a voice recognition apparatus, where the trigger signal instructs the voice recognition apparatus, once it is enabled, to read and recognize the voice information buffered by the voice activation apparatus.

[091] With the solutions provided in this embodiment of the present invention, a user needs to issue only one instruction for the user's requirements to be met. The solutions apply regardless of whether a terminal is in a standby state or a non-standby state. In addition, the voice information obtained by listening is buffered; a voice recognition apparatus enables itself, listens for further voice information, and then recognizes both the buffered voice information and the voice information it obtained by listening. This avoids the loss of part of the voice information that occurs when the voice recognition apparatus starts obtaining voice information only after being activated, and voice recognition is improved.

[092] Optionally, the determination module 602 is specifically configured to: when it is determined that the voice information obtained by listening matches predetermined voice activation information, determine that the voice information obtained by listening matches the voice activation model.

[093] Optionally, the apparatus further includes: an extraction module, configured to: when the determination module 602 determines that the voice information obtained by listening matches the predetermined voice activation information, extract a voiceprint feature from the voice signal obtained by listening; and the determination module 602 is further configured to: when it is determined that the voiceprint feature extracted by the extraction module matches a predetermined voiceprint feature, determine that the voice information obtained by listening matches the voice activation model.

[094] An embodiment of the present invention further provides a voice recognition apparatus. As shown in FIG. 7, the apparatus includes: a receiving module 701, configured to receive a trigger signal transmitted by a voice activation apparatus, where the trigger signal instructs the voice recognition apparatus to enable itself and recognize first voice information buffered by the voice activation apparatus; a listening module 702, configured to: after the receiving module 701 receives the trigger signal, enable itself and listen for second voice information of the second preset duration; and a recognition module 703, configured to recognize the first voice information buffered by the voice activation apparatus and the second voice information obtained by the listening module 702, to obtain a recognition result.

[095] With the solutions provided in this embodiment of the present invention, a user needs to issue only one instruction for the user's requirements to be met. The solutions apply regardless of whether a terminal is in a standby state or a non-standby state.

[096] Optionally, the apparatus further includes: a matching module, configured to match the recognition result obtained by the recognition module 703 against the pre-stored voice instruction information; and an execution module, configured to execute an operation corresponding to the matched voice instruction information.

[097] Optionally, the apparatus further includes: a disabling module, configured to: when no further trigger signal is received within the third preset duration after the receiving module receives the trigger signal, disable the voice recognition module.

[098] With the solutions provided in this embodiment of the present invention, a user needs to issue only one instruction for the user's requirements to be met. The solutions apply regardless of whether a terminal is in a standby state or a non-standby state. In addition, the voice information obtained by listening is buffered; a voice recognition apparatus enables itself, listens for further voice information, and then recognizes both the buffered voice information and the voice information it obtained by listening. This avoids the loss of part of the voice information that occurs when the voice recognition apparatus starts obtaining voice information only after being activated, and voice recognition is improved.

[099] An embodiment of the present invention further provides a voice activation apparatus. As shown in FIG. 8, the apparatus includes: a listening module 801, configured to listen for voice information in a surrounding environment; a determination module 802, configured to determine whether the voice information obtained by listening matches a voice activation model; and a transmission module 803, configured to: when the determination module 802 determines that the voice information obtained by listening matches the voice activation model, transmit a trigger signal to enable a voice recognition apparatus.

[100] Optionally, the determination module 802 is specifically configured to: when it is determined that the voice information obtained by listening matches the predetermined voice activation information, determine that the voice information obtained by listening matches the voice activation model.

[101] Optionally, the apparatus further includes: an extraction module, configured to: when the determination module 802 determines that the voice information obtained by listening matches the predetermined voice activation information, extract a voiceprint feature from the voice signal obtained by listening; where the determination module 802 is specifically configured to: when it is determined that the extracted voiceprint feature matches a predetermined voiceprint feature, determine that the voice information obtained by listening matches the voice activation model.

[102] An embodiment of the present invention provides a voice recognition apparatus. As shown in FIG. 9, the apparatus includes: a receiving module 901, configured to receive a trigger signal transmitted by a voice activation apparatus; a transmission module 902, configured to: after the receiving module 901 receives the trigger signal, enable itself and transmit a voice reminder instruction to a user; and a processing module 903, configured to record a voice signal input by the user according to the voice reminder instruction, and recognize the voice signal to obtain a recognition result.

[103] With the solutions provided in this embodiment of the present invention, a user needs to issue only one instruction for the user's requirements to be met. The solutions apply regardless of whether a terminal is in a standby state or a non-standby state.

[104] In the following, embodiments of the present invention are described specifically with reference to a software implementation process, as shown in FIG. 10.

[105] From a software perspective, a voice recognition module may be divided into a driver layer, an audio hardware abstraction layer (Audio HAL), a framework layer (Framework), a voice recognition engine (VA Service), and application configuration (Configuration).

[106] P1. Report an event. Specifically, the driver layer reports a trigger event to the Framework after receiving the trigger signal from a DSP.

[107] P2. Report the event. Specifically, the Audio HAL reports the preceding trigger event to the VA Service.

[108] P3. Set a parameter. Specifically, the parameter is set so that data is read from a buffer.

[109] P4. Enable the VA Service.

[110] P5. The VA Service transmits an enable-recording instruction to the Framework.

[111] P6. The Framework transmits a read-audio-data instruction to the Audio HAL after receiving the enable-recording instruction.

[112] P7. The Audio HAL enables reading of the Buffer data after receiving the read-audio-data instruction transmitted by the Framework.

[113] P8. The Audio HAL transmits a get-Buffer-data instruction to the driver; the driver forwards the get-Buffer-data instruction to the DSP, and the DSP then transmits the Buffer data to the driver.

[114] P9. The driver reports the received Buffer data to the VA Service.

[115] P10. The VA Service performs recognition processing on the Buffer data and the recorded data.

[116] P11. The VA Service transmits a stop-recording instruction to the Framework.

[117] P12. The Framework transmits a stop-reading-audio-data instruction to the Audio HAL after receiving the stop-recording instruction.

[118] P13. After receiving the stop-reading-audio-data instruction transmitted by the Framework, the Audio HAL disables reading of the Buffer data.

[119] P14. The Audio HAL transmits a stop-getting-Buffer-data instruction to the driver.
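The instruction chain in steps P5 to P14 can be pictured with a small sketch. It is only an illustration under stated assumptions: the class and method names (`VaService`, `Framework`, `AudioHal`, `Driver`, `Dsp`, and their methods) are hypothetical, recognition is mocked as joining words, and the driver's report to the VA Service (P9) is modeled simply as a return value passed back up the call chain.

```python
"""Illustrative sketch of steps P5 to P14; every class and method name is hypothetical."""


class Dsp:
    """Stands in for the DSP that buffered audio while it listened for the wake word."""

    def __init__(self, buffered_audio):
        self.buffered_audio = list(buffered_audio)

    def get_buffer_data(self):
        return list(self.buffered_audio)


class Driver:
    """Relays Buffer-data requests from the Audio HAL to the DSP (P8, P14)."""

    def __init__(self, dsp):
        self.dsp = dsp
        self.fetching = False

    def get_buffer_data(self):
        self.fetching = True
        return self.dsp.get_buffer_data()

    def stop_getting_buffer_data(self):
        self.fetching = False


class AudioHal:
    """Enables or disables reading of Buffer data on request (P7, P13)."""

    def __init__(self, driver):
        self.driver = driver
        self.buffer_read_enabled = False

    def read_audio_data(self):
        self.buffer_read_enabled = True          # P7
        return self.driver.get_buffer_data()     # P8

    def stop_reading_audio_data(self):
        self.buffer_read_enabled = False         # P13
        self.driver.stop_getting_buffer_data()   # P14


class Framework:
    """Turns recording instructions into Audio HAL instructions (P6, P12)."""

    def __init__(self, hal):
        self.hal = hal

    def enable_recording(self):
        return self.hal.read_audio_data()        # P6

    def stop_recording(self):
        self.hal.stop_reading_audio_data()       # P12


class VaService:
    """Voice assistant service that drives the sequence (P5, P10, P11)."""

    def __init__(self, framework):
        self.framework = framework

    def run(self, recorded_audio):
        # P5: ask for recording; the returned value models the report in P9.
        buffer_data = self.framework.enable_recording()
        # P10: recognition is mocked as joining the buffered and recorded words.
        result = " ".join(buffer_data + list(recorded_audio))
        self.framework.stop_recording()          # P11
        return result


dsp = Dsp(["call"])
driver = Driver(dsp)
hal = AudioHal(driver)
service = VaService(Framework(hal))
result = service.run(["Zhang", "San"])
print(result)  # prints "call Zhang San"
```

In this sketch the stop path mirrors the start path layer by layer, so each component only talks to its immediate neighbor, as in the steps above.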

[120] Persons skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of hardware-only embodiments, software-only embodiments, or embodiments that combine software and hardware. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to a disk memory, a CD-ROM, an optical memory, and the like) that contain computer-usable program code.

[121] The present invention is described with reference to the flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present invention. It should be understood that computer program instructions may be used to implement each process and/or each block in the flowcharts and/or block diagrams. These computer program instructions may be provided to a general-purpose computer, a dedicated computer, an embedded processor, or a processor of any other programmable data processing device to produce a machine, so that the instructions executed by the computer or by the processor of any other programmable data processing device produce an apparatus for implementing a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.

[122] These computer program instructions may be stored in a computer-readable memory that can instruct the computer or any other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an artifact that includes an instruction apparatus. The instruction apparatus implements a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.

[123] These computer program instructions may be loaded onto a computer or another programmable data processing device, so that a series of operations and steps are performed on the computer or the other programmable device, thereby producing computer-implemented processing. Therefore, the instructions executed on the computer or the other programmable device provide steps for implementing a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.

[124] Although some embodiments of the present invention have been described, persons skilled in the art can make changes and modifications to those embodiments once they learn the basic concept of the invention. Therefore, the following claims are intended to be construed as covering the embodiments and all changes and modifications that fall within the scope of the present invention.

[125] Obviously, persons skilled in the art can make various modifications and variations to the embodiments of the present invention without departing from the spirit and scope of the embodiments of the present invention. The present invention is intended to cover these modifications and variations provided that they fall within the scope of protection defined by the following claims and their equivalent technologies.

[126] Further embodiments of the present invention are provided below. It should be noted that the numbering used in the following section does not necessarily correspond to the numbering used in the previous sections.

[127] Embodiment 1. A voice recognition method, comprising: listening, by a voice activation apparatus, to voice information in a surrounding environment; and when it is determined that the voice information obtained by listening matches a voice activation model, buffering, by the voice activation apparatus, voice information of a first predefined duration obtained by listening, and transmitting a trigger signal to trigger enabling of a voice recognition apparatus, where the trigger signal is used to instruct the voice recognition apparatus, after the voice recognition apparatus is enabled, to read and recognize the voice information buffered by the voice activation apparatus.

[128] Embodiment 2. The method according to embodiment 1, where determining that the voice information obtained by listening matches a voice activation model comprises: when the voice information obtained by listening matches predetermined voice activation information, determining that the voice information obtained by listening matches the voice activation model.

[129] Embodiment 3. The method according to embodiment 1, where determining that the voice information obtained by listening matches a voice activation model comprises: when the voice information obtained by listening matches predetermined voice activation information, extracting a voiceprint feature from a voice signal obtained by listening, determining that the extracted voiceprint feature matches a predetermined voiceprint feature, and determining that the voice information obtained by listening matches the voice activation model.

[130] Embodiment 4. A voice recognition method, comprising: receiving, by a voice recognition apparatus, a trigger signal transmitted by a voice activation apparatus, where the trigger signal is used to instruct the voice recognition apparatus to enable itself and recognize first voice information buffered by the voice activation apparatus; after receiving the trigger signal, enabling itself, by the voice recognition apparatus, and listening to second voice information of a second predefined duration; and recognizing the first voice information buffered by the voice activation apparatus and the second voice information obtained by listening, to obtain a recognition result.

[131] Embodiment 5. The method according to embodiment 4, where after the voice recognition apparatus obtains the recognition result, the method further comprises: matching, by the voice recognition apparatus, the obtained recognition result against pre-stored voice instruction information; and performing, by the voice recognition apparatus, an operation corresponding to the matched voice instruction information.

[132] Embodiment 6. The method according to embodiment 4 or 5, further comprising: when it is determined that no further trigger signal is received within a third predefined duration after the trigger signal is received, automatically disabling itself, by the voice recognition apparatus.

[133] Embodiment 7. A voice recognition method, comprising: listening, by a voice activation apparatus, to voice information in a surrounding environment; and when it is determined that the voice information obtained by listening matches a voice activation model, transmitting, by the voice activation apparatus, a trigger signal to trigger enabling of a voice recognition apparatus.

[134] Embodiment 8. The method according to embodiment 7, where determining that the voice information obtained by listening matches a voice activation model comprises: when the voice information obtained by listening matches predetermined voice activation information, determining that the voice information obtained by listening matches the voice activation model.

[135] Embodiment 9. The method according to embodiment 7, where determining that the voice information obtained by listening matches a voice activation model comprises: when the voice information obtained by listening matches predetermined voice activation information, extracting a voiceprint feature from a voice signal obtained by listening, determining that the extracted voiceprint feature matches a predetermined voiceprint feature, and determining that the voice information obtained by listening matches the voice activation model.

[136] Embodiment 10. A voice recognition method, comprising: receiving, by a voice recognition apparatus, a trigger signal transmitted by a voice activation apparatus; enabling itself, by the voice recognition apparatus, after receiving the trigger signal, and transmitting a voice reminder instruction to a user; and recording, by the voice recognition apparatus, a voice signal input by the user according to the voice reminder instruction, and performing recognition on the voice signal to obtain a recognition result.

[137] Embodiment 11. A voice activation apparatus, comprising: a listening module, configured to listen to voice information in a surrounding environment; a determining module, configured to determine whether the voice information obtained by the listening module matches a voice activation model; a buffering module, configured to: when the determining module determines that the voice information obtained by the listening module matches the voice activation model, buffer voice information of a first predefined duration obtained by the listening module; and a transmitting module, configured to transmit a trigger signal to trigger enabling of a voice recognition apparatus, where the trigger signal is used to instruct the voice recognition apparatus, after the voice recognition apparatus is enabled, to read and recognize the voice information buffered by the voice activation apparatus.

[138] Embodiment 12. The apparatus according to embodiment 11, where the determining module is specifically configured to: when it is determined that the voice information obtained by listening matches predetermined voice activation information, determine that the voice information obtained by listening matches the voice activation model.

[139] Embodiment 13. The apparatus according to embodiment 11, further comprising: an extraction module, configured to: when the determining module determines that the voice information obtained by listening matches predetermined voice activation information, extract a voiceprint feature from a voice signal obtained by listening; where the determining module is further configured to: when it is determined that the voiceprint feature extracted by the extraction module matches a predetermined voiceprint feature, determine that the voice information obtained by listening matches the voice activation model.

[140] Embodiment 14. A voice recognition apparatus, comprising: a receiving module, configured to receive a trigger signal transmitted by a voice activation apparatus, where the trigger signal is used to instruct the voice recognition apparatus to enable itself and recognize first voice information buffered by the voice activation apparatus; a listening module, configured to: after the receiving module receives the trigger signal, enable itself and listen to second voice information of a second predefined duration; and a recognition module, configured to recognize the first voice information buffered by the voice activation apparatus and the second voice information obtained by the listening module, to obtain a recognition result.

[141] Embodiment 15. The apparatus according to embodiment 14, further comprising: a matching module, configured to match the recognition result obtained by the recognition module against pre-stored voice instruction information; and an execution module, configured to perform an operation corresponding to the matched voice instruction information.

[142] Embodiment 16. The apparatus according to embodiment 14 or 15, further comprising: a disabling module, configured to: when no further trigger signal is received within a third predefined duration after the trigger signal is received, disable the voice recognition module.

[143] Embodiment 17. A voice activation apparatus, comprising: a listening module, configured to listen to voice information in a surrounding environment; a determining module, configured to determine whether the voice information obtained by listening matches a voice activation model; and a transmitting module, configured to: when the determining module determines that the voice information obtained by listening matches the voice activation model, transmit a trigger signal to trigger enabling of a voice recognition apparatus.

[144] Embodiment 18. The apparatus according to embodiment 17, where the determining module is specifically configured to: when it is determined that the voice information obtained by listening matches predetermined voice activation information, determine that the voice information obtained by listening matches the voice activation model.

[145] Embodiment 19. The apparatus according to embodiment 17, further comprising: an extraction module, configured to: when the determining module determines that the voice information obtained by listening matches predetermined voice activation information, extract a voiceprint feature from a voice signal obtained by listening; where the determining module is specifically configured to: when it is determined that the extracted voiceprint feature matches a predetermined voiceprint feature, determine that the voice information obtained by listening matches the voice activation model.

[146] Embodiment 20. A voice recognition apparatus, comprising: a receiving module, configured to receive a trigger signal transmitted by a voice activation apparatus; a transmitting module, configured to: after the receiving module receives the trigger signal, enable itself and transmit a voice reminder instruction to a user; and a processing module, configured to record a voice signal input by the user according to the voice reminder instruction, and recognize the voice signal to obtain a recognition result.

[147] Embodiment 21. A terminal, comprising: a voice activation apparatus and a voice recognition apparatus; where the voice activation apparatus is configured to: listen to voice information in a surrounding environment; and when it is determined that the voice information obtained by listening matches a voice activation model, buffer first voice information obtained by listening within a first predefined duration, and transmit a trigger signal to trigger enabling of the voice recognition apparatus; and the voice recognition apparatus is configured to: after receiving the trigger signal transmitted by the voice activation apparatus, enable itself and listen to second voice information within a second predefined duration, and recognize the first voice information buffered by the voice activation apparatus and the second voice information obtained by listening, to obtain a recognition result.

[148] Embodiment 22. The terminal according to embodiment 21, where the voice activation apparatus is a digital signal processor (DSP).

[149] Embodiment 23. The terminal according to embodiment 21 or 22, where the voice recognition apparatus is an application processor (AP).
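As a rough illustration of how the activation side and the recognition side of the terminal in Embodiment 21 might cooperate, the following sketch combines the two-stage wake check of Embodiment 3, the buffering of Embodiment 1, the matching of Embodiment 5, and the automatic disabling of Embodiment 6. Everything in it is an assumption made for illustration: the names `VoiceActivator`, `VoiceRecognizer`, and `handle_utterance`, the wake word, the voiceprint labels, the instruction table, and the durations are hypothetical; audio is mocked as a list of words; and the buffer is assumed to hold only the words that follow the wake word.

```python
"""Illustrative sketch only; all names, values, and the word-level audio model are hypothetical."""

WAKE_WORD = "hello"                                   # assumed predetermined activation information
ENROLLED_VOICEPRINT = "user-1"                        # assumed predetermined voiceprint feature
INSTRUCTIONS = {"call zhang san": "DIAL_ZHANG_SAN"}   # assumed pre-stored voice instructions


class VoiceActivator:
    """Low-power side: listens, checks the wake model, buffers, then triggers."""

    def __init__(self, first_duration_frames=1):
        self.first_duration_frames = first_duration_frames
        self.buffer = []

    def matches_model(self, word, voiceprint):
        # Two-stage check: wake word first, then the speaker's voiceprint.
        return word == WAKE_WORD and voiceprint == ENROLLED_VOICEPRINT

    def listen(self, words, voiceprint):
        """Return the index where buffering ended if a trigger fires, else None."""
        for i, word in enumerate(words):
            if self.matches_model(word, voiceprint):
                start = i + 1
                end = start + self.first_duration_frames
                self.buffer = list(words[start:end])   # first voice information
                return end
        return None


class VoiceRecognizer:
    """Main side: enabled by the trigger, recognizes buffered plus newly heard speech."""

    def __init__(self, third_duration=5.0):
        self.enabled = False
        self.third_duration = third_duration
        self.last_trigger = None

    def on_trigger(self, first_info, second_info, now):
        self.enabled = True
        self.last_trigger = now
        # Recognition is mocked: the "frames" are already words.
        result = " ".join(list(first_info) + list(second_info))
        # Match the recognition result against the pre-stored instructions.
        return INSTRUCTIONS.get(result)

    def tick(self, now):
        # Disable automatically if no new trigger arrived within the third duration.
        if self.enabled and now - self.last_trigger >= self.third_duration:
            self.enabled = False


def handle_utterance(words, voiceprint, activator, recognizer, now):
    end = activator.listen(words, voiceprint)
    if end is None:
        return None                      # no trigger signal is sent
    second_info = words[end:]            # heard by the recognizer once it is enabled
    return recognizer.on_trigger(activator.buffer, second_info, now)


activator = VoiceActivator(first_duration_frames=1)
recognizer = VoiceRecognizer(third_duration=5.0)
utterance = ["hello", "call", "zhang", "san"]
accepted = handle_utterance(utterance, "user-1", activator, recognizer, now=0.0)
rejected = handle_utterance(utterance, "stranger", activator, VoiceRecognizer(), now=0.0)
recognizer.tick(now=6.0)
print(accepted, rejected, recognizer.enabled)  # prints "DIAL_ZHANG_SAN None False"
```

The point of the split is that the user speaks one continuous instruction: the word spoken while the recognizer is still starting up is not lost, because the activation side buffered it.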

Claims

1. Voice control method, characterized in that it is applied to a terminal comprising a voice activation apparatus (101) and a voice recognition apparatus (102), the method comprising: listening (301), by the voice activation apparatus, to first voice information in a surrounding environment, wherein the first voice information comprises activation information and a first part of a command word, and wherein the activation information is used to enable the voice recognition apparatus; enabling, by the voice activation apparatus, the voice recognition apparatus according to the activation information; listening, by the voice recognition apparatus, to second voice information, wherein the second voice information comprises a second part of the command word; and obtaining, by the voice recognition apparatus, voice instruction information according to the first voice information and the second voice information, wherein the voice instruction information corresponds to the command word, the command word comprising the first part of the command word and the second part of the command word.

2. Method according to claim 1, characterized in that the enabling, by the voice activation apparatus, of the voice recognition apparatus according to the activation information comprises: generating, by the voice activation apparatus, a trigger signal for enabling the voice recognition apparatus in a case where it is determined that the activation information corresponds to a voice activation model.

3. Method according to claim 2, characterized in that determining that the activation information corresponds to a voice activation model comprises: in a case where the activation information corresponds to predetermined activation voice information, determining that the activation information corresponds to the voice activation model.

4. Method according to claim 2, characterized in that determining that the activation information corresponds to a voice activation model comprises: in a case where the activation information corresponds to predetermined activation voice information, extracting a voiceprint feature from the activation information; and, in a case where the extracted voiceprint feature corresponds to a predetermined voiceprint feature, determining that the activation information corresponds to the voice activation model.

5. Method according to claim 4, characterized in that the voiceprint feature includes one or more of the following features: intonation, a linear prediction coefficient, a spectral envelope parameter, a harmonic energy ratio, a resonant peak frequency and its bandwidth, a cepstrum, or a Mel-frequency cepstral coefficient.

6. Method according to claim 1, characterized in that obtaining, by the voice recognition apparatus, voice instruction information according to the first voice information and the second voice information comprises: obtaining, by the voice recognition apparatus, a recognition result according to the first voice information and the second voice information, wherein the recognition result comprises command word information; and obtaining, by the voice recognition apparatus, the voice instruction information corresponding to the recognition result by matching the obtained recognition result against pre-stored voice instruction information.

7. Method according to any one of claims 1 to 6, characterized in that the activation information is heard in a first period by the voice activation apparatus, the first part of the command word is heard in a second period by the voice activation apparatus, and the second voice information is heard in a third period by the voice recognition apparatus.

8. Method according to any one of claims 1 to 6, characterized in that listening, by the voice activation apparatus, to first voice information in a surrounding environment comprises: listening to the first voice information in the surrounding environment in a standby state; or listening to the first voice information in the surrounding environment in a non-standby state; or listening to the first voice information in the surrounding environment in a locked screen state.

9. Method according to claim 2, characterized in that it further comprises: sending, by the voice activation apparatus, the trigger signal to the voice recognition apparatus to enable the voice recognition apparatus.

10. Method according to any one of claims 1 to 6, characterized in that it further comprises: controlling, by the voice recognition apparatus, the execution of an operation corresponding to the voice instruction information.

11. Method according to any one of claims 1 to 6, characterized in that it further comprises: automatically disabling, by the voice recognition apparatus, itself upon determining that voice information used to enable the voice recognition apparatus is not received again within a predefined duration after the voice recognition apparatus is enabled.

12. Method according to any one of claims 1 to 6, characterized in that the voice activation apparatus is a digital signal processor (DSP).

13. Method according to any one of claims 1 to 6, characterized in that the voice recognition apparatus is an application processor (AP).

14. Terminal, comprising: one or more processors; and a memory storing instructions, the terminal characterized in that when the instructions are executed by the one or more processors, they cause the voice control terminal to execute the method defined in any one of claims 1 to 13.

15. Non-transitory computer-readable medium having computer-usable instructions stored therein for execution by a processor, characterized in that the instructions cause the processor to execute the method defined in any one of claims 1 to 13.

16. Terminal, characterized in that it comprises: a voice activation apparatus (101) and a voice recognition apparatus (102); wherein the voice activation apparatus (101) is configured to listen to first voice information in a surrounding environment, wherein the first voice information comprises activation information and a first part of a command word, wherein the activation information is used to enable the voice recognition apparatus; the voice activation apparatus is also configured to enable the voice recognition apparatus according to the activation information; the voice recognition apparatus (102) is configured to listen to second voice information, wherein the second voice information comprises a second part of the command word; the voice recognition apparatus is also configured to obtain voice instruction information according to the first voice information and the second voice information, wherein the voice instruction information corresponds to the command word, the command word comprising the first part of the command word and the second part of the command word.

17. Terminal according to claim 16, characterized in that the voice activation apparatus is configured to determine that the activation information corresponds to a voice activation model in a case where the activation information corresponds to predetermined activation voice information.

18. Terminal according to claim 16, characterized in that the voice activation apparatus is configured to: in a case where the activation information corresponds to predetermined activation voice information, extract a voiceprint feature from the activation information; and, in a case where the extracted voiceprint feature corresponds to a predetermined voiceprint feature, determine that the activation information corresponds to a voice activation model.

19. Terminal according to claim 18, characterized in that the voiceprint feature includes one or more of the following features: intonation, a linear prediction coefficient, a spectral envelope parameter, a harmonic energy ratio, a resonant peak frequency and its bandwidth, a cepstrum, or a Mel-frequency cepstral coefficient.

20. Terminal according to claim 16, characterized in that the voice recognition apparatus is configured to: obtain a recognition result according to the first voice information and the second voice information, wherein the recognition result comprises command word information; and obtain the voice instruction information corresponding to the recognition result by matching the obtained recognition result against pre-stored voice instruction information.

21. Terminal according to any one of claims 16 to 20, characterized in that the activation information is heard in a first period by the voice activation apparatus, the first part of the command word is heard in a second period by the voice activation apparatus, and the second voice information is heard in a third period by the voice recognition apparatus.

22. Terminal according to any one of claims 16 to 20, characterized in that the voice activation apparatus is configured to: listen to the first voice information in the surrounding environment in a standby state; or listen to the first voice information in the surrounding environment in a non-standby state; or listen to the first voice information in the surrounding environment in a locked screen state.

23. Terminal according to any one of claims 16 to 20, characterized in that the voice recognition apparatus is configured to: automatically disable itself upon determining that voice information used to enable the voice recognition apparatus is not received again within a predefined duration after the voice recognition apparatus is enabled.

24. Terminal according to any one of claims 16 to 20, characterized in that it further comprises an execution module; wherein the voice recognition apparatus is further configured to send an execution instruction corresponding to the voice instruction information to the execution module; the execution module is configured to perform an operation corresponding to the execution instruction.

25. Terminal according to any one of claims 16 to 20, characterized in that the voice activation apparatus is a digital signal processor (DSP) and the voice recognition apparatus is an application processor (AP).
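The decision steps recited in claims 3 to 6 and 11 (and their terminal counterparts in claims 17 to 20 and 23) can be sketched in Python as follows. This is an illustrative sketch only, under stated assumptions: the claims do not specify how a voiceprint feature is compared with the predetermined one, so cosine similarity with a threshold is an assumption made here, as are the substring-based instruction lookup, the instruction string format, and every function, class, and parameter name.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity of two feature vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matches_activation_model(phrase: str, features: Sequence[float],
                             expected_phrase: str,
                             enrolled_features: Sequence[float],
                             threshold: float = 0.9) -> bool:
    """Two-step check: activation phrase first, then the voiceprint feature."""
    if phrase != expected_phrase:
        return False
    # Assumed comparison rule: the claims only require that the features "correspond".
    return cosine_similarity(features, enrolled_features) >= threshold


def lookup_instruction(recognition_result: str,
                       table: Dict[str, str]) -> Optional[str]:
    """Match the recognition result against pre-stored instruction information."""
    for command_word, instruction in table.items():
        if command_word in recognition_result:
            return instruction
    return None


class RecognizerPowerState:
    """Auto-disable when no new enabling voice arrives within idle_timeout."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self.enabled_at: Optional[float] = None

    def enable(self, now: float) -> None:
        # Receiving the enabling voice information again restarts the window.
        self.enabled_at = now

    def is_enabled(self, now: float) -> bool:
        if self.enabled_at is not None and now - self.enabled_at >= self.idle_timeout:
            self.enabled_at = None  # the recognizer disables itself
        return self.enabled_at is not None


# Hypothetical pre-stored table mapping command words to instructions.
INSTRUCTIONS = {"call mom": "DIAL_CONTACT:mom"}
state = RecognizerPowerState(idle_timeout=5.0)
state.enable(now=0.0)
```

Passing the current time explicitly, instead of reading a clock inside the class, keeps the timeout logic deterministic and easy to check.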