WO2019051668A1 - Startup control method and startup control system for smart terminal - Google Patents

Startup control method and startup control system for smart terminal

Info

Publication number
WO2019051668A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice recognition
smart terminal
unit
recognition information
template
Prior art date
Application number
PCT/CN2017/101570
Other languages
English (en)
Chinese (zh)
Inventor
王周丹
杨康
夏相声
Original Assignee
深圳传音通讯有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳传音通讯有限公司 filed Critical 深圳传音通讯有限公司
Priority to CN201780096731.2A priority Critical patent/CN111345016A/zh
Priority to PCT/CN2017/101570 priority patent/WO2019051668A1/fr
Publication of WO2019051668A1 publication Critical patent/WO2019051668A1/fr

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/725 Cordless telephones

Definitions

  • The present invention relates to the field of smart terminals, and in particular to a startup control method and a startup control system for a smart terminal.
  • At present, smart terminals are basically powered on or off by the user pressing the power button of the terminal.
  • At power-on, when the power supply module of the smart terminal detects that the power button is pressed, it converts the battery voltage into voltages suitable for each part of the terminal circuitry and supplies them to the corresponding modules. Once the clock circuit receives its supply voltage, it generates an oscillation signal and sends it to the logic circuit. Having obtained both the supply voltage and the clock signal, the CPU executes the power-on procedure and carries out the subsequent power-on operations.
  • At shutdown, the keyboard detection module sends a shutdown request signal to the digital logic section, the CPU cancels the power-on hold signal and executes the shutdown procedure, and the power supply module cuts off the supply.
  • The RF and logic circuits immediately stop working, which completes the shutdown operation.
  • Powering a smart terminal on and off by pressing its power button is cumbersome for users who value efficiency. In addition, because the power button is used frequently, it wears out easily, so that its sensitivity decreases or it fails altogether.
  • The present invention provides a startup control method and a startup control system for a smart terminal.
  • The user presets a voice command in the smart terminal. When the user needs to power the terminal on or off, the user speaks the voice command to the smart terminal, and voice recognition technology matches it against the preset voice command in the smart terminal.
  • If the match succeeds, the smart terminal is powered on or off.
  • The power button of the smart terminal can therefore be omitted, which makes the overall layout of the smart terminal more convenient, compact, and attractive and improves the user experience.
  • An object of the present invention is to provide a startup control method and a startup control system for a smart terminal.
  • the invention discloses a startup control method for an intelligent terminal, comprising the following steps:
  • When the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template, determining whether the voice recognition information includes the specific keyword of the voice recognition template;
  • When the voice recognition information includes the specific keyword of the voice recognition template and the smart terminal is in a powered-on state, controlling the smart terminal to shut down; and/or
  • When the voice recognition information includes the specific keyword of the voice recognition template and the smart terminal is in a shutdown state, controlling the smart terminal to power on.
  • the step of determining whether the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template comprises:
  • Preprocessing the voice recognition information by pre-emphasis digital filtering, framing, windowing, and the like;
  • When the multi-dimensional feature vector matches the multi-dimensional feature vector of the voice recognition template, determining that the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template.
  • the step of determining whether the voice recognition information includes the specific keyword of the voice recognition template comprises:
  • When the text information matches the specific keyword of the voice recognition template, determining that the voice recognition information includes the specific keyword of the voice recognition template.
  • the step of determining whether the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template comprises:
  • the step of determining whether the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template comprises:
  • When the voice recognition information is continuous within the time threshold, determining whether the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template.
  • the invention also discloses a startup control system for an intelligent terminal, comprising a template setting module, a voice acquisition module, a voiceprint matching module, a keyword matching module and a startup control module;
  • the template setting module is configured to establish a voice recognition template containing a specific keyword in the smart terminal
  • the voice acquiring module calls a microphone of the smart terminal to acquire voice recognition information input by the smart terminal;
  • the voiceprint matching module is communicably connected to the template setting module and the voice acquiring module, and determines whether the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template;
  • the keyword matching module is communicably connected to the template setting module, the voice acquiring module, and the voiceprint matching module, and when the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template, determines whether the voice recognition information includes the specific keyword of the voice recognition template;
  • the startup control module is communicatively coupled to the keyword matching module, and when the voice recognition information includes the specific keyword of the voice recognition template and the smart terminal is in a powered-on state, controls the smart terminal to shut down; and/or
  • when the voice recognition information includes the specific keyword of the voice recognition template and the smart terminal is in a shutdown state, controls the smart terminal to power on.
  • the voiceprint matching module includes a preprocessing unit, a multidimensional vector unit, a voiceprint matching unit, and a voiceprint confirming unit;
  • the pre-processing unit preprocesses the voice recognition information by pre-emphasis digital filtering, framing, windowing, and the like;
  • the multi-dimensional vector unit is communicably connected to the pre-processing unit, and extracts from the preprocessed voice recognition information the Mel frequency cepstral coefficient, the linear predictive cepstral coefficient, the first-order difference of the Mel frequency cepstral coefficient, the first-order difference of the linear predictive cepstral coefficient, the energy, the first-order difference of the energy, and the Gammatone filter cepstral coefficient, which together form a multi-dimensional feature vector;
  • the voiceprint matching unit is communicatively coupled to the multi-dimensional vector unit to determine whether the multi-dimensional feature vector matches the multi-dimensional feature vector of the voice recognition template;
  • the voiceprint confirming unit is communicably connected to the voiceprint matching unit, and when the multi-dimensional feature vector matches the multi-dimensional feature vector of the voice recognition template, determines that the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template.
  • the keyword matching module includes a text parsing unit, a keyword matching unit, and a keyword confirming unit;
  • the text parsing unit is communicatively coupled to the multidimensional vector unit to parse the multidimensional feature vector into text information;
  • the keyword matching unit is communicably connected to the text parsing unit to determine whether the text information matches the specific keyword of the voice recognition template;
  • the keyword confirmation unit is communicably connected to the keyword matching unit, and when the text information matches the specific keyword of the voice recognition template, determines that the voice recognition information includes the specific keyword of the voice recognition template.
  • the voiceprint matching module includes a loudness setting unit, a loudness detecting unit, a loudness determining unit, and a loudness confirming unit;
  • the loudness setting unit sets a loudness threshold in the smart terminal
  • the loudness detecting unit acquires voice recognition information input by the smart terminal, and detects a speech sound level of the voice recognition information
  • the loudness determining unit is communicably connected to the loudness setting unit and the loudness detecting unit, and determines whether the speech sound level exceeds the loudness threshold;
  • the loudness confirming unit is communicably connected to the loudness determining unit, and when the speech sound level exceeds the loudness threshold, determines whether the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template.
  • the voiceprint matching module includes a time setting unit, a continuous detecting unit, and a continuous confirming unit;
  • the time setting unit sets a time threshold in the smart terminal
  • the continuous detecting unit is connected to the time setting unit, and acquires voice recognition information input by the smart terminal, and detects whether the voice recognition information is continuous within the time threshold;
  • the continuous confirmation unit is communicably connected to the continuous detecting unit, and when the voice recognition information is continuous within the time threshold, determines whether the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template.
  • The present invention provides a startup control method and a startup control system for a smart terminal.
  • The user presets a voice command in the smart terminal.
  • When the user needs to power the terminal on or off, the user speaks the voice command to the smart terminal, and voice recognition technology matches it against the preset voice command in the smart terminal.
  • If the match succeeds, the smart terminal is powered on or off.
  • The power button of the smart terminal can therefore be omitted, which makes the overall layout of the smart terminal more convenient, compact, and attractive and improves the user experience.
  • FIG. 1 is a schematic flow chart of a startup control method in accordance with a preferred embodiment of the present invention
  • FIG. 2 is a schematic flow chart of determining whether a voiceprint feature matches in the startup control method of FIG. 1;
  • FIG. 3 is a schematic flowchart of determining whether a specific keyword matches in the startup control method of FIG. 2;
  • FIG. 4 is a schematic flow chart of speech sound level detection of the startup control method of FIG. 1;
  • FIG. 5 is a schematic flowchart of voice continuous detection of the startup control method of FIG. 1;
  • FIG. 6 is a block diagram showing the structure of a startup control system in accordance with a preferred embodiment of the present invention.
  • The terms "module" and "unit" used to denote an element serve only to facilitate the description of the present invention and have no specific meaning in themselves. Therefore, "module" and "unit" can be used interchangeably.
  • the startup control method and the startup control system 100 of the present invention can be applied to an intelligent terminal, and the smart terminal can be implemented in various forms.
  • The smart terminal described in the present invention may include, for example, mobile terminals such as a mobile phone, a smart phone, a notebook computer, a PDA (Personal Digital Assistant), a PAD (tablet), a PMP (Portable Multimedia Player), a navigation device, a smart watch, or the like, as well as fixed terminals such as digital TVs, desktop computers, and the like.
  • the present invention will be described assuming that the terminal is a mobile terminal and assuming that the mobile terminal is a smart phone.
  • a method for controlling startup of a smart terminal includes the following steps:
  • S100: Establishing a voice recognition template containing a specific keyword in the smart terminal;
  • S200: Calling a microphone of the smart terminal to acquire voice recognition information input by the user;
  • S400: When the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template, determining whether the voice recognition information includes the specific keyword of the voice recognition template;
  • When the voice recognition information includes the specific keyword of the voice recognition template and the smart terminal is in a shutdown state, controlling the smart terminal to power on.
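  • The decision flow of the method can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `PowerState` and `control_startup` are hypothetical, and the two boolean inputs stand in for the results of the voiceprint check and the keyword check described in the text.

```python
from enum import Enum


class PowerState(Enum):
    ON = "on"
    OFF = "off"


def control_startup(state, voiceprint_ok, keyword_ok):
    """Toggle the power state only when both checks pass, voiceprint first."""
    if not voiceprint_ok:   # first judgment condition: voiceprint feature
        return state
    if not keyword_ok:      # second judgment condition: specific keyword
        return state
    # Powered-on terminal shuts down; shut-down terminal powers on.
    return PowerState.OFF if state is PowerState.ON else PowerState.ON


new_state = control_startup(PowerState.ON, voiceprint_ok=True, keyword_ok=True)
```

  • Checking the voiceprint before the keyword mirrors the order of the first and second judgment conditions, so a stranger speaking the right words leaves the state unchanged.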
  • Step S100 establishing a voice recognition template containing a specific keyword in the smart terminal
  • The startup control method of the invention can either retain or omit the power button of a conventional smart terminal.
  • If the power button of the smart terminal is retained, the user can set the power-on and/or shutdown mode in the system, for example powering on and/or off with the power button, or powering on and/or off by voice;
  • Otherwise, the user powers the smart terminal on and/or off by voice.
  • The user enters a voice command containing a specific keyword to create a voice recognition template containing that keyword. Thereafter, whenever the user needs to power on and/or shut down, the user only needs to speak the corresponding voice command for the corresponding power-on and/or shutdown operation to be performed.
  • The user records a voice command containing a specific keyword with the recording function of the smart terminal. The duration of the voice command and the language, content, and number of words of the specific keyword can be set freely by the user. For example, if the user records the voice command "Please shut down", a voice recognition template containing the specific keyword "Please shut down" is established.
  • Alternatively, the user may import a recording file or a locally stored or downloaded audio file, or a segment clipped from one, into the smart terminal as the voice command used to create the voice recognition template containing a specific keyword. To balance the power consumption of the smart terminal against the efficiency of speech recognition, the voice command should not be too long, and the specific keyword should not contain too much content or too many words.
  • The voiceprint of the voice command input by the user is analyzed, and the user's voiceprint feature is extracted and stored in the voice recognition template as the first judgment condition for voice recognition matching. The content of the voice command is also analyzed, and the specific keyword is extracted and stored in the voice recognition template as the second judgment condition for voice recognition matching. That is, the preset voice recognition template contains two kinds of feature content: the voiceprint feature and the specific keyword.
  • The voice recognition information includes not only text content but also timbre, tone, ambient sound, and so on. Therefore, to extract and convert the voice command input by the user, the voice must be processed as follows.
  • Format conversion: common voice formats such as MP3 are compressed formats and must be converted to an uncompressed pure-waveform file, such as a Windows PCM file (also known as a WAV file), for subsequent processing. Apart from a file header, a WAV file stores the individual sample points of the sound waveform.
  • Pre-emphasis digital filtering: the purpose of pre-emphasis is to make the high-frequency characteristics of the speech signal more prominent. This is usually done by passing the signal through a high-pass digital filter with transfer function $H(z) = 1 - a z^{-1}$, where a is the pre-emphasis coefficient, generally between 0.9 and 1.0 and usually 0.98.
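  • As a minimal sketch of pre-emphasis, the following assumes the standard first-order high-pass filter y[n] = x[n] - a * x[n-1] with the coefficient a = 0.98 mentioned above. The function name `pre_emphasize` is hypothetical and the sample values are made up for illustration.

```python
def pre_emphasize(samples, a=0.98):
    """Apply y[n] = x[n] - a * x[n-1] to a list of samples."""
    if not samples:
        return []
    out = [float(samples[0])]  # n = 0 has no previous sample
    for n in range(1, len(samples)):
        out.append(samples[n] - a * samples[n - 1])
    return out


# A constant (zero-frequency) signal is almost cancelled, while a rapidly
# alternating (high-frequency) signal is almost doubled in amplitude.
low = pre_emphasize([1.0] * 5)
high = pre_emphasize([1.0, -1.0, 1.0, -1.0, 1.0])
```

  • Comparing `low` and `high` shows why the filter is described as high-pass: slow changes are attenuated and fast changes are boosted.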
  • After pre-emphasis digital filtering comes windowing and framing. The speech signal is stationary over short intervals (it can be considered approximately unchanged within 10 ms to 30 ms), so it can be divided into short segments for processing. Framing of the speech signal is accomplished by weighting it with a movable finite-length window. Generally, the number of frames per second is about 33 to 100, depending on the situation.
  • The usual framing method is overlapping segmentation, in which the overlapping portion of the previous frame and the next frame is called the frame shift, and the ratio of the frame shift to the frame length is generally 0 to 0.5.
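  • Overlapping framing with windowing can be sketched as below. The text does not name a window function, so the Hamming window here is an assumption, as are the 8000 Hz sample rate, the 25 ms frame length, and the function name `frame_signal`. The hop between frame starts is set to half the frame length, so the overlap and the hop coincide.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames weighted by a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    # Only full-length frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


sample_rate = 8000                     # Hz, assumed for illustration
frame_len = int(0.025 * sample_rate)   # 25 ms frame
hop = frame_len // 2                   # ratio of shift to frame length: 0.5
frames = frame_signal([1.0] * sample_rate, frame_len, hop)
```

  • With these assumed values, one second of audio yields a frame count inside the 33 to 100 frames per second range given in the text.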
  • each frame waveform is transformed into a multi-dimensional vector.
  • Each vector contains the content information of the speech of the frame.
  • the above process is the acoustic feature extraction. In practical applications, the acoustic characteristics are not limited to MFCC.
  • the steps of converting a unit audio stream into a multi-dimensional vector specifically include:
  • The unit audio stream signal is processed based on the wavelet transform.
  • The wavelet transform is applied through two-channel decomposition of the signal and cascading of such decompositions.
  • Provided that the sampling of the unit audio stream signal satisfies the Shannon sampling theorem, its digital frequency is assumed to lie between 0 and +π.
  • The unit audio stream signal is passed through an ideal low-pass filter H and an ideal high-pass filter G, respectively, so that its spectrum is decomposed into a low-frequency part and a high-frequency part.
  • The low-frequency part can be regarded as the smooth part of the unit audio stream signal, that is, an overview of the signal.
  • The processing can be cascaded over multiple stages: the low-frequency part of the previous stage's decomposition is used as the input to the next stage and is decomposed again by G and H. A decimation by two is performed after each G and H.
  • The fine structure and the abrupt parts of the unit audio stream signal consist mainly of high-frequency components.
  • In this way a multi-dimensional vector of multi-level (n-layer) coefficients representing the audio stream signal is formed, and each layer's coefficients are vector features extracted from the low-frequency part and the high-frequency part.
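  • The cascaded two-channel decomposition can be illustrated with the Haar wavelet, the simplest filter pair. This is a substitution for illustration only: the text speaks of ideal low-pass and high-pass filters and does not name a wavelet. The function names `haar_step` and `wavelet_features` and the input values are hypothetical.

```python
import math


def haar_step(signal):
    """One two-channel split: low-pass H and high-pass G, each decimated by 2."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    low = [(signal[i] + signal[i + 1]) * s for i in pairs]
    high = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return low, high


def wavelet_features(signal, levels):
    """Cascade the split on the low-frequency branch, one list per level."""
    coeffs = []
    low = list(signal)
    for _ in range(levels):
        if len(low) < 2:
            break
        low, high = haar_step(low)
        coeffs.append(high)   # detail (high-frequency) part of this level
    coeffs.append(low)        # remaining smooth (low-frequency) part
    return coeffs


coeffs = wavelet_features([4.0, 4.0, 2.0, 2.0, 6.0, 6.0, 8.0, 8.0], levels=2)
```

  • Only the low-frequency branch is fed to the next stage, which matches the cascade described in the text. The high-frequency coefficients collected at each stage capture the fine structure.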
  • Text conversion: convert the above observation sequence into text information.
  • Phoneme: the pronunciation of a word consists of phonemes. For English, a commonly used phoneme set is the set of 39 phonemes from Carnegie Mellon University. Chinese generally uses all initials and finals as its phoneme set, and Chinese recognition additionally distinguishes tones;
  • Time domain: external voice information is formed in the time domain. Therefore, the playback time domain of the external voice information must be analyzed and recorded as time frames.
  • Frame processing is performed in three small steps: recognizing each unit audio stream as a state; combining states into phonemes; and combining phonemes into words.
  • Several unit audio streams correspond to one state, every three states are combined into one phoneme, and several phonemes are combined into one word. That is to say, as long as the state of the unit audio stream in each frame is known, the result of speech recognition follows.
  • The "acoustic model" stores a large number of parameters, from which the probability that a unit audio stream corresponds to a given state can be obtained.
  • The method of obtaining these parameters is called "training" and requires a huge amount of voice data.
  • HMM (Hidden Markov Model): the first step is to construct a state network;
  • the second step is to find the path that best matches the sound in the state network.
  • The result is thus limited to a preset network: a state network is constructed, and the optimal path in the state network is searched for, such that the probability that the voice corresponds to this path is the greatest.
  • The path search algorithm is a dynamic programming pruning algorithm called the Viterbi algorithm, which finds the globally optimal path. This essentially completes the conversion of the time-framed multi-dimensional vectors into text information.
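  • A compact version of the Viterbi search over a small state network is sketched below. The two states "A" and "B", the observation symbols "x", "y", "z", and all probabilities are made-up toy values, not parameters from any trained acoustic model. No beam pruning is applied, so every state is kept at each step.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely state sequence."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            prob, path = max(
                ((best[p][0] * trans_p[p][s] * emit_p[s][obs], best[p][1])
                 for p in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob, path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


states = ["A", "B"]
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p = {"A": {"x": 0.5, "y": 0.4, "z": 0.1},
          "B": {"x": 0.1, "y": 0.3, "z": 0.6}}
prob, path = viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p)
# path is ['A', 'A', 'B']
```

  • At each frame the search keeps only the best path into every state, which is what makes the result globally optimal without enumerating all paths.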
  • Step S200 Invoking a microphone of the smart terminal to acquire voice recognition information input by the smart terminal;
  • the microphone of the smart terminal is separately powered by the CPU of the smart terminal.
  • When the user speaks voice recognition information to the smart terminal, the microphone of the smart terminal is called to acquire the voice recognition information input by the user.
  • In S200, calling the microphone of the smart terminal to acquire the voice recognition information includes the following steps:
  • S210: Presetting one or more specific applications in the smart terminal;
  • S220: Calling the microphone of the smart terminal to detect voice recognition information input by the user;
  • When the user makes a call with the smart terminal or carries it in a noisy environment, the smart terminal may be powered on and/or off automatically through misrecognition. Therefore, in a preferred embodiment, the user can set one or more specific applications in the system, for example the phone application. While the user is making a call with the phone function of the smart terminal, even if the user speaks the voice command of the voice recognition template, the subsequent shutdown operation is not performed.
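  • The suppression by specific applications can be sketched as a simple gate. The application names and the function `should_act_on_voice` are hypothetical, and how the terminal learns which application is in use is outside this sketch.

```python
# Hypothetical set of user-selected specific applications.
BLOCKED_APPS = {"phone"}


def should_act_on_voice(active_app, blocked_apps=BLOCKED_APPS):
    """Return False while one of the preset specific applications is in use."""
    return active_app not in blocked_apps


during_call = should_act_on_voice("phone")
during_music = should_act_on_voice("music")
```

  • The gate is evaluated before any voiceprint or keyword matching, so a command spoken during a call is ignored rather than matched.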
  • Step S300 determining whether the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template
  • the voice recognition information is first matched with the voiceprint feature of the voice recognition template, and it is determined whether the voice recognition information input by the user satisfies the first determination condition.
  • determining whether the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template includes:
  • S310: Preprocessing the voice recognition information by pre-emphasis digital filtering, framing, windowing, and the like;
  • S320: Extracting, from the preprocessed voice recognition information, the Mel frequency cepstral coefficient, the linear predictive cepstral coefficient, the first-order difference of the Mel frequency cepstral coefficient, the first-order difference of the linear predictive cepstral coefficient, the energy, the first-order difference of the energy, and the Gammatone filter cepstral coefficient, which together form a multi-dimensional feature vector;
  • S330: Determining whether the multi-dimensional feature vector matches the multi-dimensional feature vector of the voice recognition template.
  • The voice recognition information is preprocessed by pre-emphasis digital filtering, framing, windowing, and so on. The voiceprint features MFCC, LPCC, ΔMFCC, ΔLPCC, energy, the first-order difference of energy, and GFCC are then extracted from the preprocessed voice recognition information and together form a multi-dimensional feature vector, where MFCC is the Mel frequency cepstral coefficient, LPCC is the linear predictive cepstral coefficient, ΔMFCC is the first-order difference of the MFCC, ΔLPCC is the first-order difference of the LPCC, and GFCC is the Gammatone filter cepstral coefficient.
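  • How the listed features could be combined into one vector per frame is sketched below. The per-frame MFCC, LPCC, energy, and GFCC values are assumed to have been computed already and are made-up numbers here. The first-order difference is taken as a simple frame-to-frame difference, which is one common choice and an assumption of this sketch. Both function names are hypothetical.

```python
def first_order_difference(frames):
    """Frame-to-frame difference of each coefficient (first frame gets zeros)."""
    deltas = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        deltas.append([c - p for p, c in zip(prev, cur)])
    return deltas


def build_feature_vectors(mfcc, lpcc, energy, gfcc):
    """Concatenate MFCC, LPCC, their deltas, energy, delta energy and GFCC."""
    d_mfcc = first_order_difference(mfcc)
    d_lpcc = first_order_difference(lpcc)
    energy_cols = [[e] for e in energy]
    d_energy = first_order_difference(energy_cols)
    return [m + l + dm + dl + e + de + g
            for m, l, dm, dl, e, de, g
            in zip(mfcc, lpcc, d_mfcc, d_lpcc, energy_cols, d_energy, gfcc)]


vectors = build_feature_vectors(
    mfcc=[[1.0, 2.0], [1.5, 2.5]],
    lpcc=[[0.1], [0.3]],
    energy=[10.0, 12.0],
    gfcc=[[5.0], [6.0]],
)
```

  • Each resulting vector keeps a fixed layout, so vectors from the input and from the template can be compared dimension by dimension.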
  • In step S300, it is determined whether the multi-dimensional feature vector completely matches the multi-dimensional feature vector corresponding to the pre-stored voiceprint feature. If it matches completely, it is determined that the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template.
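  • The text requires a complete match but does not say how two vectors are compared. One possible reading, shown here purely as an assumption, is to treat vectors as matching when their cosine similarity reaches a threshold. The threshold of 0.95 and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def voiceprint_matches(candidate, template, threshold=0.95):
    """Assumed matching rule: same length and similarity at or above threshold."""
    if len(candidate) != len(template):
        return False
    return cosine_similarity(candidate, template) >= threshold


same_speaker = voiceprint_matches([1.0, 2.0, 3.0], [1.0, 2.0, 3.1])
other_speaker = voiceprint_matches([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

  • A tolerance is used because two recordings of the same speaker never produce bit-identical features. A stricter reading of "complete match" would raise the threshold toward 1.0.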
  • determining whether the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template includes:
  • S320' acquiring voice recognition information input by the smart terminal, and detecting a speech sound level of the voice recognition information
  • S340' when the speech sound level exceeds the loudness threshold, determine whether the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template.
  • determining whether the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template includes:
  • S320 acquiring voice recognition information input by the smart terminal, and detecting whether the voice recognition information is continuous within the time threshold;
  • The smart terminal may be powered on and/or off automatically due to misrecognition. Therefore, in a preferred embodiment, the user can set a loudness threshold or a time threshold for voice recognition in the system. The system judges whether the input voice recognition information reaches a certain loudness or is continuous for a certain period of time, and performs the subsequent power-on and/or shutdown operations only when these conditions are met.
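  • The two gating checks can be sketched as follows. Loudness is measured here as the RMS level in dB relative to full scale, and continuity as a run of consecutive non-silent frames. Both measures, the silence level of -40 dB, and all function names are assumptions of this sketch, since the text does not define how loudness or continuity is computed.

```python
import math


def loudness_db(samples):
    """RMS level in dB relative to full scale (samples in -1.0..1.0)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def exceeds_loudness(samples, loudness_threshold_db):
    """Loudness gate: the speech level must exceed the preset threshold."""
    return loudness_db(samples) > loudness_threshold_db


def is_continuous(frames, frame_ms, time_threshold_ms, silence_db=-40.0):
    """Time gate: consecutive non-silent frames must span the time threshold."""
    run = 0
    for frame in frames:
        run = run + 1 if loudness_db(frame) > silence_db else 0
        if run * frame_ms >= time_threshold_ms:
            return True
    return False


loud = [0.5] * 10      # made-up frame of loud speech
quiet = [0.001] * 10   # made-up frame of near silence
```

  • Only input that passes the configured gate goes on to voiceprint matching, which filters out background noise and brief accidental sounds.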
  • Step S400 determining, when the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template, whether the voice recognition information includes the specific keyword of the voice recognition template;
  • After the voice recognition information input by the user is obtained, it is first matched against the voiceprint feature of the voice recognition template to determine whether it satisfies the first judgment condition. When the first judgment condition is met, the voice recognition information is matched against the specific keyword of the voice recognition template to determine whether it satisfies the second judgment condition.
  • In S400, the step of determining, when the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template, whether the voice recognition information includes the specific keyword of the voice recognition template includes:
  • S420 Determine whether the text information matches the specific keyword of the voice recognition template.
  • For the specific process of parsing the text in step S400, refer to the description in step S100; details are not repeated here.
  • The specific keywords in successfully matched voice recognition information are arranged in the keyword order of the voice recognition template and are input continuously within a certain period of time. For example, if the specific keyword of the voice recognition template preset by the user is "Please shut down", the user needs to say "Please shut down" continuously within a certain period of time, such as within 20 seconds. The word order of "Please shut down" cannot be changed, and no other words may be inserted into it.
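  • The ordering and continuity requirement on the keyword can be sketched as a contiguous sequence match with a time limit. The recognized text is assumed to arrive as (word, start time in seconds) pairs, which is a hypothetical format, as is the function name `keyword_spoken`. The 20-second limit is the example value from the text.

```python
def keyword_spoken(timed_words, keyword, max_seconds=20.0):
    """True if the keyword words occur in order, uninterrupted, within the limit."""
    n = len(keyword)
    if n == 0:
        return False
    words = [w.lower() for w, _ in timed_words]
    target = [w.lower() for w in keyword]
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            # Time from the first keyword word to the last one.
            span = timed_words[i + n - 1][1] - timed_words[i][1]
            if span <= max_seconds:
                return True
    return False


keyword = ["Please", "shut", "down"]
ok = keyword_spoken([("please", 0.0), ("shut", 0.4), ("down", 0.8)], keyword)
interrupted = keyword_spoken(
    [("please", 0.0), ("do", 0.3), ("shut", 0.6), ("down", 0.9)], keyword)
too_slow = keyword_spoken(
    [("please", 0.0), ("shut", 15.0), ("down", 30.0)], keyword)
```

  • Comparing whole slices of the word list enforces both the order and the absence of inserted words in a single step.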
  • the present invention further discloses a startup control system 100 for an intelligent terminal, comprising a template setting module 11, a voice acquisition module 12, a voiceprint matching module 13, a keyword matching module 14, and a startup control module 15;
  • the template setting module 11 is configured to establish a voice recognition template containing a specific keyword in the smart terminal;
  • the voice acquiring module 12 calls a microphone of the smart terminal to acquire voice recognition information input by the smart terminal;
  • the voiceprint matching module 13 is communicably connected to the template setting module 11 and the voice acquiring module 12, and determines whether the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template.
  • the keyword matching module 14 is communicatively coupled to the template setting module 11, the voice acquiring module 12, and the voiceprint matching module 13, when the voiceprint feature of the voice recognition information and the voiceprint feature of the voice recognition template When matching, determining whether the voice recognition information includes the specific keyword of the voice recognition template;
  • the startup control module 15 is communicatively coupled to the keyword matching module 14, and when the voice recognition information includes the specific keyword of the voice recognition template and the smart terminal is in a powered-on state, controls the smart terminal to shut down; and/or
  • when the voice recognition information includes the specific keyword of the voice recognition template and the smart terminal is in a shutdown state, controls the smart terminal to power on.
  • the voiceprint matching module 13 includes a preprocessing unit, a multidimensional vector unit, a voiceprint matching unit, and a voiceprint confirming unit;
  • the pre-processing unit performs pre-emphasis digital filtering, framing, windowing, and the like on the voice recognition information;
  • the multi-dimensional vector unit is communicably connected to the pre-processing unit, and extracts from the pre-processed voice recognition information the linear frequency cepstral coefficients, the linear prediction cepstral coefficients, the Mel frequency cepstral coefficients, the first-order difference of the linear prediction cepstral coefficients, the energy, the first-order difference of the energy, and the cepstral filter cepstral coefficients, which together form a multi-dimensional feature vector;
  • the voiceprint matching unit is communicatively coupled to the multi-dimensional vector unit to determine whether the multi-dimensional feature vector matches a multi-dimensional feature vector of the voice recognition template;
  • the voiceprint confirming unit is communicably connected to the voiceprint matching unit, and, when the multi-dimensional feature vector matches the multi-dimensional feature vector of the voice recognition template, determines that the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template.
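A rough sketch of the pre-processing and matching units described above, using only the Python standard library. Several choices here are assumptions not stated in the text: the pre-emphasis coefficient `alpha=0.97`, the Hamming window, and cosine similarity with a `0.9` threshold as the matching rule. Only the energy and first-difference features are computed; the cepstral coefficients named in the claim are omitted for brevity.

```python
import math
from typing import List, Sequence


def pre_emphasize(signal: Sequence[float], alpha: float = 0.97) -> List[float]:
    """Pre-emphasis filter: y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[i] - alpha * signal[i - 1] for i in range(1, len(signal))
    ]


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split the signal into overlapping frames; a trailing partial frame is dropped."""
    return [
        list(signal[s:s + frame_len])
        for s in range(0, len(signal) - frame_len + 1, hop)
    ]


def hamming(frame: Sequence[float]) -> List[float]:
    """Apply a Hamming window to one frame (window choice is an assumption)."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [
        x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]


def frame_energies(frames: Sequence[Sequence[float]]) -> List[float]:
    """Energy of each frame: the sum of squared samples."""
    return [sum(x * x for x in f) for f in frames]


def first_difference(values: Sequence[float]) -> List[float]:
    """First-order difference between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def vectors_match(a: Sequence[float], b: Sequence[float], threshold: float = 0.9) -> bool:
    """Assumed matching rule: similarity at or above a fixed threshold."""
    return cosine_similarity(a, b) >= threshold
```

In practice the input vector and the template vector would each concatenate all of the listed features; the threshold would need tuning against real enrollment data.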
  • the keyword matching module 14 includes a text parsing unit, a keyword matching unit, and a keyword confirming unit;
  • the text parsing unit is communicatively coupled to the multidimensional vector unit and parses the multidimensional feature vector into text information;
  • the keyword matching unit is communicably connected to the text parsing unit to determine whether the text information matches the specific keyword of the speech recognition template;
  • the keyword confirmation unit is communicably connected to the keyword matching unit, and, when the text information matches the specific keyword of the voice recognition template, determines that the voice recognition information includes the specific keyword of the voice recognition template.
  • the voiceprint matching module 13 includes a loudness setting unit, a loudness detecting unit, a loudness determining unit, and a loudness confirming unit;
  • the loudness setting unit sets a loudness threshold in the smart terminal;
  • the loudness detecting unit acquires voice recognition information input to the smart terminal, and detects a speech sound level of the voice recognition information;
  • the loudness determining unit is communicably connected to the loudness setting unit and the loudness detecting unit, and determines whether the speech sound level exceeds the loudness threshold;
  • the loudness confirming unit is communicably connected to the loudness determining unit, and, when the speech sound level exceeds the loudness threshold, determines whether the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template.
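The loudness gate described above can be sketched as a level check that runs before voiceprint matching. The text does not say how the speech sound level is measured; this sketch assumes RMS level in dB relative to full scale (dBFS) over samples normalized to the range -1.0 to 1.0. The names `rms_dbfs` and `loud_enough` and the `-30.0` default threshold are hypothetical.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of normalized samples in dBFS; -inf for silence or no input."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def loud_enough(samples: Sequence[float], threshold_dbfs: float = -30.0) -> bool:
    """Return True only if the input level exceeds the configured threshold,
    in which case voiceprint matching would proceed."""
    return rms_dbfs(samples) > threshold_dbfs
```

Gating on level first lets the terminal skip the more expensive voiceprint comparison for quiet background speech.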
  • the voiceprint matching module 13 includes a time setting unit, a continuous detecting unit, and a continuous confirming unit;
  • the time setting unit sets a time threshold in the smart terminal;
  • the continuous detecting unit is communicably connected to the time setting unit, acquires voice recognition information input to the smart terminal, and detects whether the voice recognition information is continuous within the time threshold;
  • the continuous confirmation unit is communicably connected to the continuous detecting unit, and, when the voice recognition information is continuous within the time threshold, determines whether the voiceprint feature of the voice recognition information matches the voiceprint feature of the voice recognition template.
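One possible reading of the continuity check is sketched below. It assumes a voice activity detector has already labeled each fixed-length audio frame as voiced or unvoiced, and it treats the input as continuous when the utterance fits inside the time threshold and contains no internal pause longer than `max_gap_s`. The function name, the frame representation, and the `max_gap_s` parameter are all assumptions; the text itself only names a time threshold.

```python
from typing import Sequence


def is_continuous(
    voiced: Sequence[bool],
    frame_s: float,
    time_threshold_s: float,
    max_gap_s: float = 0.5,  # assumed longest pause still counted as continuous
) -> bool:
    """Return True if the voiced frames form one utterance that fits within
    `time_threshold_s` and has no internal pause longer than `max_gap_s`."""
    idx = [i for i, v in enumerate(voiced) if v]
    if not idx:
        return False  # no speech detected at all
    first, last = idx[0], idx[-1]
    # The whole utterance must fit inside the time threshold.
    if (last - first + 1) * frame_s > time_threshold_s:
        return False
    # The longest run of unvoiced frames between voiced frames is the longest pause.
    longest_gap = max((b - a - 1 for a, b in zip(idx, idx[1:])), default=0)
    return longest_gap * frame_s <= max_gap_s
```

For example, with 0.1-second frames and a 20-second threshold, ten consecutive voiced frames pass, while two voiced frames separated by a one-second pause do not.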

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to a startup control method and a startup control system for a smart terminal. A user records a voice command in the smart terminal; when the user needs to power the smart terminal on or off, the user inputs the voice command into the smart terminal; using voice recognition technology, the voice command is matched in turn against the voiceprint feature of a voice command and against a specific keyword preset in the smart terminal, and when both the voiceprint feature and the specific keyword are matched successfully, the smart terminal can be powered on or off. With the startup control method and the startup control system, the power button of the smart terminal can be eliminated, so that the smart terminal can be lighter, more convenient, smaller, and more refined and elegant in overall layout, the smart terminal offers more technological content, and the user experience is improved; the user's privacy in the smart terminal can be better protected, and leakage of the user's personal information in the smart terminal is avoided.
PCT/CN2017/101570 2017-09-13 2017-09-13 Startup control method and startup control system for smart terminal WO2019051668A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201780096731.2A CN111345016A (zh) 2017-09-13 2017-09-13 Startup control method and startup control system for smart terminal
PCT/CN2017/101570 WO2019051668A1 (fr) 2017-09-13 2017-09-13 Startup control method and startup control system for smart terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/101570 WO2019051668A1 (fr) 2017-09-13 2017-09-13 Startup control method and startup control system for smart terminal

Publications (1)

Publication Number Publication Date
WO2019051668A1 true WO2019051668A1 (fr) 2019-03-21

Family

ID=65723218

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/101570 WO2019051668A1 (fr) 2017-09-13 2017-09-13 Startup control method and startup control system for smart terminal

Country Status (2)

Country Link
CN (1) CN111345016A (fr)
WO (1) WO2019051668A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102834A (zh) * 2020-09-28 2020-12-18 安徽康居人健康科技有限公司 智能语音控制呼吸制氧一体机
CN116597839A (zh) * 2023-07-17 2023-08-15 山东唐和智能科技有限公司 一种智能语音交互系统及方法
CN117854506A (zh) * 2024-03-07 2024-04-09 鲁东大学 一种机器人语音智能交互系统

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020062211A1 (en) * 2000-10-13 2002-05-23 Li Qi P. Easily tunable auditory-based speech signal feature extraction method and apparatus for use in automatic speech recognition
CN1856820A (zh) * 2003-07-28 2006-11-01 西门子公司 语音识别方法和通信设备
CN101051464A (zh) * 2006-04-06 2007-10-10 株式会社东芝 说话人认证的注册和验证方法及装置
CN101441869A (zh) * 2007-11-21 2009-05-27 联想(北京)有限公司 语音识别终端用户身份的方法及终端
CN102779509A (zh) * 2011-05-11 2012-11-14 联想(北京)有限公司 语音处理设备和语音处理方法
CN103414830A (zh) * 2013-08-28 2013-11-27 上海斐讯数据通信技术有限公司 基于语音实现快速关机的方法及系统
CN104575504A (zh) * 2014-12-24 2015-04-29 上海师范大学 采用声纹和语音识别进行个性化电视语音唤醒的方法
CN107463384A (zh) * 2017-08-18 2017-12-12 湖州靖源信息技术有限公司 一种移动设备及开关机方法

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050081470A (ko) * 2004-02-13 2005-08-19 주식회사 엑스텔테크놀러지 음성인식 가능한 메시지 녹음/재생방법
KR101590332B1 (ko) * 2012-01-09 2016-02-18 삼성전자주식회사 영상장치 및 그 제어방법
CN102710867B (zh) * 2012-06-20 2015-06-17 北京三星通信技术研究有限公司 移动终端的控制装置及方法
CN103024177A (zh) * 2012-12-13 2013-04-03 广东欧珀移动通信有限公司 一种移动终端驾驶模式操作方法及移动终端
CN103077713B (zh) * 2012-12-25 2019-02-01 青岛海信电器股份有限公司 一种语音处理方法及装置
CN104331265A (zh) * 2014-09-30 2015-02-04 北京金山安全软件有限公司 一种语音输入方法、装置及终端
CN106060235A (zh) * 2016-05-05 2016-10-26 广东小天才科技有限公司 一种应用于移动设备的开关机控制方法及装置、移动设备

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102834A (zh) * 2020-09-28 2020-12-18 安徽康居人健康科技有限公司 智能语音控制呼吸制氧一体机
CN112102834B (zh) * 2020-09-28 2024-01-23 安徽双歌健康科技有限公司 智能语音控制呼吸制氧一体机
CN116597839A (zh) * 2023-07-17 2023-08-15 山东唐和智能科技有限公司 一种智能语音交互系统及方法
CN116597839B (zh) * 2023-07-17 2023-09-19 山东唐和智能科技有限公司 一种智能语音交互系统及方法
CN117854506A (zh) * 2024-03-07 2024-04-09 鲁东大学 一种机器人语音智能交互系统
CN117854506B (zh) * 2024-03-07 2024-05-14 鲁东大学 一种机器人语音智能交互系统

Also Published As

Publication number Publication date
CN111345016A (zh) 2020-06-26

Similar Documents

Publication Publication Date Title
CN110310623B (zh) 样本生成方法、模型训练方法、装置、介质及电子设备
US9775113B2 (en) Voice wakeup detecting device with digital microphone and associated method
US10719115B2 (en) Isolated word training and detection using generated phoneme concatenation models of audio inputs
WO2021093449A1 (fr) Procédé et appareil de détection de mot de réveil employant l'intelligence artificielle, dispositif, et support
CN106981290B (zh) 语音控制装置和语音控制方法
EP3032535A1 (fr) Dispositif et procédé de détection d'activation vocale
US20140200890A1 (en) Methods, systems, and circuits for speaker dependent voice recognition with a single lexicon
US11893350B2 (en) Detecting continuing conversations with computing devices
JP7485858B2 (ja) 実世界ノイズを使用した音声個性化および連合訓練
US10229701B2 (en) Server-side ASR adaptation to speaker, device and noise condition via non-ASR audio transmission
CN108346426B (zh) 语音识别装置以及语音识别方法
US11676582B2 (en) Detecting conversations with computing devices
WO2019051668A1 (fr) Startup control method and startup control system for smart terminal
US20120221335A1 (en) Method and apparatus for creating voice tag
US20220238118A1 (en) Apparatus for processing an audio signal for the generation of a multimedia file with speech transcription
KR102217292B1 (ko) 적어도 하나의 의미론적 유닛의 집합을 음성을 이용하여 개선하기 위한 방법, 장치 및 컴퓨터 판독 가능한 기록 매체
Tsai et al. Customized wake-up word with key word spotting using convolutional neural network
Prasangini et al. Sinhala speech to sinhala unicode text conversion for disaster relief facilitation in sri lanka
CN115129923B (zh) 语音搜索方法、设备及存储介质
KR102392992B1 (ko) 음성 인식 기능을 활성화시키는 호출 명령어 설정에 관한 사용자 인터페이싱 장치 및 방법
US20230386458A1 (en) Pre-wakeword speech processing
Chung et al. Speech processing in Java-based PC speech commanding application
KashyapDas et al. Cepstral Analysis of Assamese Vowel Phonemes
CN115410557A (zh) 语音处理方法、装置、电子设备及存储介质
Chao et al. A system for Mandarin short phrase recognition on portable devices

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 17924934; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 17924934; Country of ref document: EP; Kind code of ref document: A1)