CN108735209A - Wake up word binding method, smart machine and storage medium - Google Patents
- Publication number
- CN108735209A (application CN201810407844.6A)
- Authority
- CN
- China
- Prior art keywords
- word
- wake
- speech recognition
- user
- recognition system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- All classifications fall under G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING:
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/08—Speech classification or search
- G10L17/00—Speaker identification or verification
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
- G10L2015/027—Syllables being the recognition units
- G10L2015/088—Word spotting
- G10L2015/223—Execution procedure of a spoken command
Abstract
The invention discloses a wake-up word binding method, comprising: step S1, collecting a voice signal uttered by a user; step S2, extracting the wake-up word information and user information in the voice signal; step S3, binding the user information and the wake-up word information to the user. The invention also proposes a smart machine and a storage medium. The invention requires no recording of a large amount of speech, which reduces operation, makes use convenient and improves the degree of intelligence.
Description
Technical field
The present invention relates to the technical field of voice recognition, and more particularly to a wake-up word binding method, smart machine and storage medium.
Background technology
Speech recognition technology enables a machine, through recognition and understanding, to convert a voice signal into corresponding text or commands; that is, it allows the machine to understand human speech. Also known as automatic speech recognition (Automatic Speech Recognition, ASR), its goal is to convert the lexical content of human speech into computer-readable input, such as key presses, binary codes or character strings. In recent years, speech recognition technology has entered fields such as household appliances, communications, consumer electronics and home services to provide near-field or far-field control of household appliances or electronic products, and wake-up word binding technology is a prerequisite for such near-field or far-field control by the user.
The mainstream technology for wake-up word binding is software-based waking, but software can run only after the system has started. To ensure that the user's voice commands can be received anytime and anywhere, the speech recognition engine must run and listen in the background constantly, so the system can never enter the power-saving sleep state, and power consumption is high. To reduce system power consumption, low-power voice wake-up technology has appeared: a fixed wake-up word is trained by recording a large amount of voice data, so that the system wakes when the wake-up word is recognized in the user's voice command.
However, the inventors found that the above technology has at least the following technical problem: a user-defined wake-up word requires recording a very large amount of voice data, which is cumbersome, inconvenient to use, and poor in degree of intelligence.
Summary of the invention
By providing a wake-up word binding method, the embodiments of the present invention solve the technical problem that existing user-defined wake-up words require recording a very large amount of voice data, are cumbersome and inconvenient to use, and have a poor degree of intelligence.
An embodiment of the present invention provides a wake-up word binding method, comprising the following steps:
Step S1: collecting a voice signal uttered by a user;
Step S2: extracting wake-up word information and user information from the voice signal;
Step S3: binding the user information and the wake-up word information to the user.
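The three steps can be sketched as a minimal in-memory flow. This is an illustrative sketch only: the extractor functions are stand-ins (the patent specifies no implementation), and every name below is an assumption.

```python
# Sketch of steps S1-S3: collect a signal, extract the wake-up word and
# user information, and bind both to the user. The extractors are fakes.

def extract_wake_word(signal: bytes) -> str:
    """Stand-in for the ASR step of S2; pretends the audio is decoded text."""
    return signal.decode("utf-8")

def extract_user_info(signal: bytes) -> str:
    """Stand-in for the voiceprint-extraction step of S2."""
    return f"voiceprint:{sum(signal) % 9973}"

bindings: dict[str, dict] = {}  # user id -> bound wake-up word and user info

def bind_wake_word(user_id: str, signal: bytes) -> dict:
    wake_word = extract_wake_word(signal)                                 # S2
    user_info = extract_user_info(signal)                                 # S2
    bindings[user_id] = {"wake_word": wake_word, "user_info": user_info}  # S3
    return bindings[user_id]

record = bind_wake_word("user-1", "air conditioner".encode())
```

In a real device the two extractors would be the acoustic-model and voiceprint components described later, and the bindings would live in the memory 109 rather than a Python dict.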
Optionally, step S3 comprises:
Step S31: obtaining a wake-up word model registered by the user with the speech recognition system, and binding the user information and the wake-up word to the wake-up word model.
Optionally, where the user information is voiceprint information, step S31 comprises:
Step S311: collecting the wake-up word voice signal input by the user multiple times;
Step S312: obtaining the timing features, tonal features and phoneme features in the wake-up word voice signal input each time;
Step S313: performing acoustic feature processing on the timing features and tonal features obtained each time, and registering the processed timing feature information and tonal feature information as the voiceprint data of the user;
Step S314: sorting and combining the phoneme features obtained each time based on a preset acoustic model to obtain the wake-up word model;
Step S315: saving the voiceprint data, the wake-up word and the wake-up word model in association with one another.
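Steps S311-S315 can be illustrated with toy features. The feature functions below fabricate timing, tonal and phoneme values from text purely to show the shape of the registration flow; none of the values or names come from the patent.

```python
# Sketch of S311-S315: several takes of the wake-up word are collected,
# timing/tonal features are averaged into a voiceprint, and the phoneme
# features are combined into the wake-up word model. Features are faked.
from statistics import mean

def fake_features(take: str) -> dict:
    return {
        "timing": len(take),                      # stand-in timing feature
        "tone": sum(map(ord, take)) % 100,        # stand-in tonal feature
        "phonemes": list(take.replace(" ", "")),  # stand-in phoneme feature
    }

def register(takes: list[str]) -> dict:
    feats = [fake_features(t) for t in takes]          # S311-S312
    voiceprint = {                                     # S313: register averaged
        "timing": mean(f["timing"] for f in feats),    # timing/tonal features
        "tone": mean(f["tone"] for f in feats),
    }
    # S314: combine the per-take phoneme features into one model
    # (here simply the phoneme sequence of the most frequent take).
    model = max(feats, key=feats.count)["phonemes"]
    return {"voiceprint": voiceprint, "wake_word_model": model}  # S315

entry = register(["air conditioner"] * 3)
```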
Optionally, step S2 comprises:
Step S21: when a voice signal is received, judging whether the volume value of the voice signal exceeds a preset volume value;
Step S22: if so, obtaining the wake-up word information in the voice signal based on an acoustic model and syntactic structure, and obtaining the voiceprint information in the voice signal based on voiceprint recognition technology.
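The volume gate of step S21 can be sketched with an RMS measure; the measure and the preset value 0.1 are assumptions for illustration, and the S22 extractors are left as placeholders.

```python
# Sketch of S21-S22: only run wake-word and voiceprint extraction when the
# signal's volume exceeds a preset value.
import math

PRESET_VOLUME = 0.1  # assumed threshold on RMS amplitude

def rms(samples: list[float]) -> float:
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def handle_signal(samples: list[float]):
    if rms(samples) <= PRESET_VOLUME:  # S21: below the preset volume value,
        return None                    #      so the signal is ignored
    # S22: placeholders for acoustic-model decoding and voiceprint extraction
    return {"wake_word": "<decoded wake-up word>", "voiceprint": "<features>"}

quiet = [0.01] * 100
loud = [0.5, -0.5] * 50
```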
Optionally, after step S3, the method further comprises:
Step S4: receiving a wake-up voice signal, and extracting the wake-up word in the wake-up voice signal;
Step S5: when the wake-up word matches a preset wake-up word in the speech recognition system, responding to the wake-up word voice signal and executing the corresponding operation.
Optionally, after step S4, the method further comprises:
Step S6: adjusting the recognition threshold of the preset wake-up word in the speech recognition system;
Step S7: when the wake-up word matches the adjusted preset wake-up word, responding to the wake-up word voice signal and executing the corresponding operation.
Optionally, the user information is voiceprint information, and step S6 comprises:
Step S61: extracting the voiceprint information in the wake-up word voice signal;
Step S62: when no voiceprint data matching the voiceprint information exists in the speech recognition system, raising the wake-up word recognition threshold of the speech recognition system;
Step S63: when voiceprint data matching the voiceprint information exists in the speech recognition system, lowering the wake-up word recognition threshold of the speech recognition system.
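The threshold logic of S61-S63 amounts to making the device harder to wake for unknown speakers and easier for enrolled ones. A sketch, with assumed threshold values and an assumed set-membership test standing in for voiceprint matching:

```python
# Sketch of S62/S63: the wake-word recognition threshold is raised for an
# unmatched voiceprint and lowered for a matched one. The base value and
# step size are assumptions, not from the patent.

BASE_THRESHOLD = 0.5
STEP = 0.2

registered_voiceprints = {"vp-alice", "vp-bob"}  # assumed enrolled voiceprints

def adjusted_threshold(voiceprint: str) -> float:
    if voiceprint in registered_voiceprints:
        return BASE_THRESHOLD - STEP  # S63: known speaker, easier to wake
    return BASE_THRESHOLD + STEP      # S62: unknown speaker, stricter
```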
Optionally, after step S61, the method further comprises:
Step S64: calculating the similarity between the voiceprint information and the voiceprint data registered in the speech recognition system according to a preset voiceprint model;
Step S65: when the similarity is within a preset range, judging that voiceprint data matching the voiceprint information exists in the speech recognition system;
Step S66: when the similarity is outside the preset range, judging that no voiceprint data matching the voiceprint information exists in the speech recognition system.
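Steps S64-S66 leave the similarity measure open ("a preset voiceprint model"); cosine similarity between feature vectors is one common choice and is used below purely as an assumed example, as is the preset range.

```python
# Sketch of S64-S66: compare an incoming voiceprint vector against each
# registered one and declare a match only when the similarity falls
# within a preset range.
import math

MATCH_RANGE = (0.9, 1.0)  # assumed preset range for "same speaker"

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))  # S64

def has_matching_voiceprint(incoming, registered) -> bool:
    lo, hi = MATCH_RANGE
    return any(lo <= cosine(incoming, r) <= hi for r in registered)  # S65/S66

enrolled = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```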
The present invention also proposes a storage medium storing a wake-up word binding program which, when executed by a processor, implements the steps of the wake-up word binding method described above.
The present invention obtains the wake-up word information in the received voice signal and binds the wake-up word to the user. Rather than blindly recording a large amount of speech, the wake-up word is bound to the user information once it has been recorded, so that in the subsequent recognition process the wake-up word can be identified directly against the corresponding user, improving recognition accuracy. Since no large amount of speech needs to be recorded, operation is reduced, use is convenient and the degree of intelligence is improved.
Description of the drawings
In order to explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the invention; for those of ordinary skill in the art, other drawings can be obtained from the structures shown in these drawings without creative effort.
Fig. 1 is a schematic structural diagram of the hardware operating environment involved in the smart machine of the present invention;
Fig. 2 is a schematic flowchart of the first embodiment of the wake-up word binding method of the present invention;
Fig. 3 is a schematic flowchart, in an embodiment of the invention, of obtaining the wake-up word model registered by the user with the speech recognition system and binding the user information and the wake-up word to the wake-up word model;
Fig. 4 is a detailed flowchart of step S20 in an embodiment of the invention;
Fig. 5 is a schematic flowchart of the second embodiment of the wake-up word binding method of the present invention;
Fig. 6 is a schematic flowchart of the third embodiment of the wake-up word binding method of the present invention;
Fig. 7 is a schematic flowchart of adjusting the recognition threshold in an embodiment of the invention;
Fig. 8 is a schematic flowchart of judging the voiceprint information in an embodiment of the invention;
Fig. 9 is a detailed flowchart of step S203 in an embodiment of the invention;
Fig. 10 is a detailed flowchart of step S70 in an embodiment of the invention.
Drawing reference numeral explanation:
Label | Title | Label | Title |
100 | Smart machine | 101 | Radio frequency unit |
102 | WiFi module | 103 | Audio output unit |
104 | A/V input unit | 1041 | Graphics processor |
1042 | Microphone | 105 | Sensor |
106 | Display unit | 1061 | Display interface |
107 | User input unit | 1071 | Operation and control interface |
1072 | Other input devices | 108 | Interface unit |
109 | Memory | 110 | Processor |
111 | Power supply | | |
The implementation, functional features and advantages of the objects of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific implementation mode
It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
In the following description, suffixes such as "module", "component" or "unit" used to denote elements are adopted only to facilitate the description of the invention and have no specific meaning in themselves. Therefore, "module", "component" and "unit" may be used interchangeably.
The smart machine can be implemented in various forms. For example, the smart machine described in the present invention may be implemented as a mobile terminal with a display interface, such as a mobile phone, tablet computer, laptop, palmtop computer, personal digital assistant (Personal Digital Assistant, PDA), portable media player (Portable Media Player, PMP), navigation device, wearable device, smart bracelet, pedometer or smart speaker, or as a fixed terminal with a display interface, such as a digital TV, desktop computer, air conditioner, refrigerator, water heater or vacuum cleaner.
The following description takes a smart machine as an example. Those skilled in the art will understand that, apart from elements used specifically for mobile purposes, the construction according to the embodiments of the present invention can also be applied to smart machines of the fixed type.
Referring to Fig. 1, which is a schematic diagram of the hardware structure of a smart machine for realizing the embodiments of the present invention, the smart machine 100 may include components such as an RF (Radio Frequency) unit 101, a WiFi module 102, an audio output unit 103, an A/V (audio/video) input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110 and a power supply 111. Those skilled in the art will understand that the smart machine structure shown in Fig. 1 does not constitute a limitation on the smart machine: a smart machine may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
The components of the smart machine are described in detail below with reference to Fig. 1:
The radio frequency unit 101 may be used for receiving and sending signals during messaging or a call; specifically, after receiving downlink information from the base station, it delivers the information to the processor 110 for processing, and it sends uplink data to the base station. Generally, the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, etc. In addition, the radio frequency unit 101 can also communicate with the network and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplexing-Long Term Evolution) and TDD-LTE (Time Division Duplexing-Long Term Evolution).
WiFi is a short-range wireless transmission technology. Through the WiFi module 102, the smart machine can help the user send and receive e-mail, browse web pages, access streaming video, etc.; it provides the user with wireless broadband Internet access. Although Fig. 1 shows the WiFi module 102, it will be understood that it is not an essential component of the smart machine and may be omitted as needed without changing the essence of the invention. For example, in this embodiment, the smart machine 100 can establish a synchronization association with an App terminal based on the WiFi module 102.
When the smart machine 100 is in a mode such as a call signal reception mode, a call mode, a recording mode, a speech recognition mode or a broadcast reception mode, the audio output unit 103 can convert audio data received by the radio frequency unit 101 or the WiFi module 102, or stored in the memory 109, into an audio signal and output it as sound. Moreover, the audio output unit 103 can also provide audio output related to a specific function performed by the smart machine 100 (for example, a call signal reception sound or a message reception sound). The audio output unit 103 may include a loudspeaker, a buzzer, etc. In this embodiment, when a prompt to re-enter the voice signal is output, the prompt may be a voice prompt, a buzzer-based vibration prompt, etc.
The A/V input unit 104 is used to receive audio or video signals. The A/V input unit 104 may include a graphics processor (Graphics Processing Unit, GPU) 1041 and a microphone 1042. The graphics processor 1041 processes the image data of still pictures or video obtained by an image capture apparatus (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 106, stored in the memory 109 (or other storage medium), or sent via the radio frequency unit 101 or the WiFi module 102. The microphone 1042 can receive sound (audio data) in operating modes such as a telephone call mode, a recording mode or a speech recognition mode, and can process such sound into audio data. In the telephone call mode, the processed audio (voice) data can be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 101. The microphone 1042 can implement various types of noise cancellation (or suppression) algorithms to eliminate (or suppress) noise or interference generated while sending and receiving audio signals.
The smart machine 100 further includes at least one sensor 105, such as a light sensor, a motion sensor and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display interface 1061 according to the ambient light, and the proximity sensor can turn off the display interface 1061 and/or the backlight when the smart machine 100 is moved to the ear. As a motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications that recognize device posture (such as landscape/portrait switching, related games and magnetometer pose calibration) and for vibration-recognition functions (such as a pedometer or tap detection); other sensors such as a fingerprint sensor, pressure sensor, iris sensor, molecular sensor, gyroscope, barometer, hygrometer, thermometer and infrared sensor may also be configured, and are not described here.
The display unit 106 is used to display information input by the user or information provided to the user. The display unit 106 may include a display interface 1061, which may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED) display, or the like.
The user input unit 107 may be used to receive input numeric or character information and to generate key signal inputs related to the user settings and function control of the smart machine. Specifically, the user input unit 107 may include an operation and control interface 1071 and other input devices 1072. The operation and control interface 1071, also called a touch screen, collects touch operations by the user on or near it (for example, operations by the user on or near the operation and control interface 1071 using a finger, stylus or any other suitable object or accessory) and drives the corresponding connection device according to a preset program. The operation and control interface 1071 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the user's touch position, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 110, and can receive and execute commands sent by the processor 110. In addition, the operation and control interface 1071 may be implemented in various types such as resistive, capacitive, infrared and surface acoustic wave. Besides the operation and control interface 1071, the user input unit 107 may also include other input devices 1072, which may include but are not limited to one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, etc., without specific limitation here.
Further, the operation and control interface 1071 may cover the display interface 1061. When the operation and control interface 1071 detects a touch operation on or near it, it transmits the operation to the processor 110 to determine the type of the touch event, and the processor 110 then provides a corresponding visual output on the display interface 1061 according to the type of the touch event. Although in Fig. 1 the operation and control interface 1071 and the display interface 1061 are two independent components realizing the input and output functions of the smart machine, in some embodiments the operation and control interface 1071 and the display interface 1061 may be integrated to realize the input and output functions of the smart machine, without specific limitation here.
The interface unit 108 serves as an interface through which at least one external device can connect to the smart machine 100. For example, the external device may include a wired or wireless headphone port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, etc. The interface unit 108 may be used to receive input (for example, data information, power, etc.) from an external device and transmit the received input to one or more elements within the smart machine 100, or may be used to transmit data between the smart machine 100 and an external device.
The memory 109 may be used to store software programs and various data. The memory 109 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required by at least one function (such as a speech recognition system), and the data storage area may store data created according to the use of the smart machine (such as voiceprint data, wake-up word models and user information). In addition, the memory 109 may include high-speed random access memory, and may also include nonvolatile memory, such as at least one magnetic disk storage device, flash memory device or other solid-state storage component.
The processor 110 is the control center of the smart machine. It connects all parts of the entire smart machine through various interfaces and lines, and performs the various functions of the smart machine and processes data by running or executing the software programs and/or modules stored in the memory 109 and calling the data stored in the memory 109, thereby monitoring the smart machine as a whole. The processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, etc., and the modem processor mainly handles wireless communication. It will be understood that the modem processor may also not be integrated into the processor 110.
The smart machine 100 may also include a power supply 111 (such as a battery) that supplies power to the components. Preferably, the power supply 111 may be logically connected to the processor 110 through a power management system, so that functions such as charging management, discharging management and power consumption management are realized through the power management system.
Although not shown in Fig. 1, the smart machine 100 may also include a Bluetooth module capable of establishing communication connections with other terminals, etc., which is not described here.
Based on the hardware structure of the above smart machine, the smart machine of the embodiment of the present invention is equipped with a speech recognition system. By obtaining the wake-up word information in the received voice signal, the wake-up word is bound to the user: rather than blindly recording a large amount of speech, the wake-up word is bound to the user information once it has been recorded, so that in the subsequent recognition process the wake-up word can be identified directly against the corresponding user, improving recognition accuracy. Since no large amount of speech needs to be recorded, operation is reduced, use is convenient and the degree of intelligence is improved.
As shown in Fig. 1, the memory 109, as a kind of computer storage medium, may include an operating system and a wake-up word binding program.
In the smart machine 100 shown in Fig. 1, the WiFi module 102 is mainly used to connect to a background server or big-data cloud, to exchange data with the background server or big-data cloud, and to establish communication connections with other terminal devices; the processor 110 may be used to call the wake-up word binding application program stored in the memory 109 and perform the following operations:
Step S1: collecting a voice signal uttered by a user;
Step S2: extracting wake-up word information and user information from the voice signal;
Step S3: binding the user information and the wake-up word information to the user.
Optionally, step S3 comprises:
Step S31: obtaining a wake-up word model registered by the user with the speech recognition system, and binding the user information and the wake-up word to the wake-up word model.
Further, where the user information is voiceprint information, the processor 110 may be used to call the wake-up word binding application program stored in the memory 109 and perform the following operations:
Step S311: collecting the wake-up word voice signal input by the user multiple times;
Step S312: obtaining the timing features, tonal features and phoneme features in the wake-up word voice signal input each time;
Step S313: performing acoustic feature processing on the timing features and tonal features obtained each time, and registering the processed timing feature information and tonal feature information as the voiceprint data of the user;
Step S314: sorting and combining the phoneme features obtained each time based on a preset acoustic model to obtain the wake-up word model;
Step S315: saving the voiceprint data, the wake-up word and the wake-up word model in association with one another.
Further, the processor 110 may be used to call the wake-up word binding application program stored in the memory 109 and perform the following operations:
Step S21: when a voice signal is received, judging whether the volume value of the voice signal exceeds a preset volume value;
Step S22: if so, obtaining the wake-up word information in the voice signal based on an acoustic model and syntactic structure, and obtaining the voiceprint information in the voice signal based on voiceprint recognition technology.
Further, after step S3, the processor 110 may be used to call the wake-up word binding application program stored in the memory 109 and perform the following operations:
Step S4: receiving a wake-up voice signal, and extracting the wake-up word in the wake-up voice signal;
Step S5: when the wake-up word matches a preset wake-up word in the speech recognition system, responding to the wake-up word voice signal and executing the corresponding operation.
Further, after step S4, the processor 110 may be used to call the wake-up word binding application program stored in the memory 109 and perform the following operations:
Step S6: adjusting the recognition threshold of the preset wake-up word in the speech recognition system;
Step S7: when the wake-up word matches the adjusted preset wake-up word, responding to the wake-up word voice signal and executing the corresponding operation.
Further, the user information is voiceprint information, and the processor 110 may be used to call the wake-up word binding application program stored in the memory 109 and perform the following operations:
Step S61: extracting the voiceprint information in the wake-up word voice signal;
Step S62: when no voiceprint data matching the voiceprint information exists in the speech recognition system, raising the wake-up word recognition threshold of the speech recognition system;
Step S63: when voiceprint data matching the voiceprint information exists in the speech recognition system, lowering the wake-up word recognition threshold of the speech recognition system.
Further, after step S61, the processor 110 may be used to call the wake-up word binding application program stored in the memory 109 and perform the following operations:
Step S64: calculating the similarity between the voiceprint information and the voiceprint data registered in the speech recognition system according to a preset voiceprint model;
Step S65: when the similarity is within a preset range, judging that voiceprint data matching the voiceprint information exists in the speech recognition system;
Step S66: when the similarity is outside the preset range, judging that no voiceprint data matching the voiceprint information exists in the speech recognition system.
The present invention further proposes a wake-up word binding method, applied to a speech recognition system to be woken, or to a smart machine equipped with a speech recognition system.
Referring to Fig. 2, which is a schematic flowchart of the first embodiment of the wake-up word binding method of the present invention.
In this embodiment, the wake-up word binding method includes the following steps:
S10: collecting the voice signal uttered by the user.
In this embodiment, when the user wakes the speech recognition system with a user-defined wake-up word for the first time, or fails to wake it, the user's wake-up word needs to be recorded in order to avoid wake-up failures and improve the wake-up rate; the user's customized wake-up word model must be trained so that a corresponding response is made when a user input containing the wake-up word of that wake-up word model is received. The user utters a voice signal, and the voice signal uttered by the user is collected. The voice signal may contain wake-up word information provided in advance, such as "air conditioner", "dehumidifier" or "fan", or "power on", "raise the temperature", "raise the wind speed to level one", etc.
S20: extracting the wake-up word information and the user information from the voice signal.
After the voice signal input by the user is obtained, the wake-up word information and the user information in the voice signal are extracted. The user information may be user identity information, such as user voiceprint data, that can be used to identify the user. The wake-up word and the user information may be extracted by converting the voice signal into text information and then extracting from that text the sentence that serves as the wake-up word and carries the user information.
S30: binding the user information and the wake-up word information to the user.
Specifically, a user-defined wake-up word voice signal is collected; for example, the user may input the voice signal "air conditioner" several times. After the smart device picks up the "air conditioner" voice signal via a microphone or audio sensor, the wake-up word model that the user registers to the speech recognition system is obtained, and the user information and the wake-up word are bound to the wake-up word model.
To facilitate subsequent, more accurate adjustment of the wake-up word recognition threshold according to the recognized voiceprint data, after the registered user's voiceprint data and registered wake-up word model are obtained, the voiceprint data is further associated with the wake-up word model, establishing a relationship between the two.
In this embodiment, the wake-up word information in the received voice signal is obtained and the wake-up word is bound to the user. Rather than blindly recording a large number of voice samples, the wake-up word is bound to the user information once it has been recorded. In subsequent recognition, the user and the wake-up word can then be matched directly, which improves recognition accuracy, avoids recording large amounts of speech, reduces user effort, is convenient to use, and raises the degree of intelligence.
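The binding of step S30 can be sketched as a simple registry keyed by user. The data layout below is an assumption for illustration; a real system would persist trained models and voiceprint data rather than strings, and the class and field names are hypothetical.

```python
class WakeWordRegistry:
    """Minimal sketch of step S30: bind wake-word and user information to a user."""

    def __init__(self):
        # user_id -> {"wake_word": ..., "user_info": ...}
        self._bindings = {}

    def bind(self, user_id, wake_word, user_info):
        """Bind the extracted wake word and user information to this user."""
        self._bindings[user_id] = {"wake_word": wake_word, "user_info": user_info}

    def lookup(self, user_id):
        """Return the binding for a user, or None if the user is unbound."""
        return self._bindings.get(user_id)


registry = WakeWordRegistry()
registry.bind("user-1", "air conditioner", {"voiceprint": "vp-001"})
```

With the binding in place, later recognition can go straight from the identified user to that user's wake word, which is the accuracy benefit the paragraph above describes.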
Further, referring to Fig. 3, in the wake-up word binding method based on the above embodiment, the step of obtaining the wake-up word model that the user registers to the speech recognition system and binding the user information and the wake-up word to the wake-up word model includes:
S100: collecting the wake-up word voice signal input by the user multiple times.
In the present embodiment, the user information is described taking user voiceprint data as an example. To improve the accuracy of wake-up word binding, the method of this embodiment may, in the sampling phase, collect the wake-up word voice signal input by the user multiple times, and then obtain an optimal wake-up word model and voiceprint data from the multiple collected signals.
S200: obtaining the timing feature, tonality feature and phoneme feature in the wake-up word voice signal of each input.
When obtaining the user's voiceprint data and the wake-up word model registered to the speech recognition system from the multiply collected wake-up word voice signals, each wake-up word voice signal input by the same user is first converted into a digital voice signal. The timing feature and tonality feature of the voice signal are then obtained through voiceprint recognition technology, and the phoneme feature of the voice signal is obtained from an acoustic model and syntactic structure, for example by using endpoint detection to locate the start and end positions of the segments of the voice signal (such as phonemes, syllables and morphemes) and excluding the silent segments from the voice signal.
S300: performing acoustic feature processing on the timing feature and tonality feature obtained each time, and registering the processed timing feature information and tonality feature information as the voiceprint data of the user.
After the first input of the wake-up word voice signal is obtained, timing feature 1 and tonality feature 1 are obtained through voiceprint recognition; then timing feature 2 and tonality feature 2 of the second input are obtained. When the difference between them is large, timing feature 1 is optimized using timing feature 2 and tonality feature 1 is optimized using tonality feature 2, and so on, until the difference between the newly obtained timing feature n and tonality feature n and the current timing feature n-1 and tonality feature n-1 falls within a preset range. The current timing feature and tonality feature, after acoustic feature processing, are then registered as the voiceprint data of the user.
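The iterative refinement in step S300 can be sketched as the loop below. The averaging update is an illustrative choice: the patent only requires that successive estimates be optimized against each new sample until they differ by less than a preset amount, and the function name and tolerance are assumptions.

```python
def refine_feature(samples, tolerance=0.05):
    """Fold a stream of scalar feature estimates (one per collected input)
    into a single value to register, stopping once two successive
    estimates differ by no more than `tolerance`."""
    current = samples[0]
    for new in samples[1:]:
        # Optimize the current estimate using the newly sampled feature.
        updated = (current + new) / 2.0
        if abs(updated - current) <= tolerance:
            # Difference within the preset range: register this value.
            return updated
        current = updated
    return current


# Example: four collected timing-feature estimates converging on a value.
timing = refine_feature([1.0, 0.8, 0.78, 0.79])
```

In this sketch the same routine would be applied separately to the timing feature and the tonality feature before both are registered as the user's voiceprint data.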
S400: sorting and combining the phoneme features obtained each time based on a preset acoustic model to obtain the wake-up word model.
Similarly, after the first input of the wake-up word voice signal is obtained, phoneme feature 1 is obtained from the acoustic model and syntactic structure; then phoneme feature 2 of the second input is obtained, and the position of each common phoneme in the permutation is determined. When the first and second inputs differ, phoneme feature 3 in the third input of the wake-up word voice signal is obtained, and so on, until the position of each phoneme in the preset phoneme permutation of the wake-up word model is determined, at which point the wake-up word model is obtained.
S500: saving the voiceprint data, the wake-up word and the wake-up word model in association with one another.
After the user's voiceprint data and registered wake-up word model are obtained, the user information of the user (such as the user account or user number), the voiceprint data and the wake-up word are saved to the speech recognition system in association with the wake-up word model, so that in a subsequent wake-up process the wake-up word model corresponding to the user can be determined from the recognized voiceprint data and used for wake-up word recognition. Associating the voiceprint data with the wake-up word and the wake-up word model makes wake-up recognition through voiceprint data more accurate.
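Steps S400 and S500 together can be sketched as follows: fix each phoneme's position in the wake-word model from the repeated inputs (majority voting across inputs is one hedged way to resolve inputs that differ, not necessarily the patent's exact procedure), then save voiceprint, wake word and model under one associated record. All identifiers here are illustrative.

```python
from collections import Counter


def build_wake_word_model(phoneme_sequences):
    """Per position, keep the phoneme most often observed across the
    repeated wake-word inputs (step S400, sketched as a majority vote)."""
    length = min(len(seq) for seq in phoneme_sequences)
    return [Counter(seq[i] for seq in phoneme_sequences).most_common(1)[0][0]
            for i in range(length)]


def register(store, user_id, voiceprint, wake_word, model):
    """Step S500: save voiceprint data, wake word and model in association."""
    store[user_id] = {"voiceprint": voiceprint, "wake_word": wake_word, "model": model}


store = {}
# Three repeated inputs of the same wake word; the third mishears one phoneme.
model = build_wake_word_model([["k", "ong", "t", "iao"],
                               ["k", "ong", "t", "iao"],
                               ["g", "ong", "t", "iao"]])
register(store, "user-1", "vp-001", "air conditioner", model)
```

The associated record is what later lets the recognized voiceprint select the matching wake-word model directly.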
Further, referring to Fig. 4, in the wake-up word binding method based on the above embodiment, step S20 includes:
S20a: when a voice signal is received, judging whether the volume value of the voice signal exceeds a preset volume value.
In the present embodiment, since a voiceprint is the spectrum of a sound wave carrying verbal information, the voiceprint itself is closely related to amplitude, frequency, fundamental-frequency contour, formant frequency bandwidth and the like. As a sound wave propagates, the farther it travels, the smaller the volume value of the received voice signal; amplitude and volume value are directly related, so the voiceprint is related to the volume value of the received voice signal. In addition, the speech recognition engine of the speech recognition system only recognizes speech whose volume reaches a preset threshold value. Therefore, to improve the accuracy of voiceprint recognition and speech recognition, it is necessary to judge whether the volume value of the received voice signal exceeds a preset volume value, the preset volume value being the minimum volume value of a voice signal required for voiceprint recognition and speech recognition.
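A minimal sketch of the volume gate in step S20a: a frame of audio samples is accepted for voiceprint and speech recognition only if its level clears a preset floor. Using RMS as the volume measure and the particular floor value are assumptions for illustration.

```python
import math


def loud_enough(samples, min_volume=0.1):
    """Return True when the frame's RMS level exceeds the preset floor,
    i.e. the signal is loud enough for voiceprint/speech recognition."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms > min_volume


# Illustrative frames: normalized samples in [-1, 1].
speech_frame = [0.4, -0.5, 0.45, -0.3]
silence_frame = [0.01, -0.02, 0.015, -0.01]
```

Frames that fail the gate are discarded before the extraction in step S20b runs.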
S20b: if so, obtaining the wake-up word information in the voice signal based on the acoustic model and syntactic structure, and obtaining the voiceprint information in the voice signal based on voiceprint recognition technology.
When the volume value of the received voice signal exceeds the preset volume value, the received voice signal is judged to be valid and can be further subjected to voiceprint recognition and acoustic model analysis. For example, the silent segments among the phoneme, syllable and morpheme segments of the voice signal are excluded by endpoint detection; the voiceprint information of the voice signal is then obtained from the syllable features in the voice signal, and the wake-up word information in the voice signal is obtained from the morpheme features and phoneme features in the voice signal together with the acoustic model and syntactic structure.
Further, referring to Fig. 5, the wake-up word binding method based on the above embodiment further includes, after step S30:
S40: receiving a wake-up voice signal and extracting the wake-up word from the wake-up voice signal;
S50: when the wake-up word matches the preset wake-up word in the speech recognition system, responding to the wake-up word voice signal and executing the corresponding operation.
After the user has bound a wake-up word, the wake-up word voice signal is received and a wake-up operation is performed: the wake-up word is extracted from the wake-up voice signal, and when it matches the preset wake-up word in the speech recognition system, the system responds to the wake-up word voice signal and executes the corresponding operation. That is, when the extracted wake-up word matches the preset wake-up word stored in the speech recognition system for the user, the corresponding operation is executed, achieving accurate wake-up.
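Steps S40 and S50 can be sketched as a match-then-respond routine. In a real system the extraction step would operate on audio via the registered wake-word model; splitting text and taking the first token is only a stand-in, and all names here are hypothetical.

```python
def handle_utterance(utterance, preset_wake_word):
    """Step S40: extract a candidate wake word (text-split stand-in);
    step S50: execute the response operation only on a match."""
    words = utterance.split()
    wake_word = words[0] if words else ""  # stand-in extraction
    if wake_word == preset_wake_word:
        return "respond"   # matched the preset wake word bound to the user
    return "ignore"        # no match: the system stays asleep
```

The same shape holds when the comparison is between extracted acoustic features and the stored model rather than between strings.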
Further, in order to wake up more reliably and reduce the error rate, referring to Fig. 6, the method further includes, after step S40:
S60: adjusting the recognition threshold of the preset wake-up word in the speech recognition system;
S70: when the wake-up word matches the adjusted preset wake-up word, responding to the wake-up word voice signal and executing the corresponding operation.
The wake-up word recognition threshold is adjusted rather than fixed; it changes with the user's situation. Specifically, referring to Fig. 7, the adjustment process includes:
S201: extracting the voiceprint information from the wake-up word voice signal.
After the wake-up word information is extracted, the voiceprint information is extracted from the wake-up word voice signal. The main purpose of the embodiments of the present invention is to solve the problem of a low wake-up rate when a user uses a personalized or customized wake-up word to wake up a speech recognition system or a smart device on which a speech recognition system is installed. The core of wake-up word binding technology and speech recognition technology is model training and model recognition, so in order to improve the wake-up rate of speech recognition, the corresponding wake-up word model and voiceprint data must be registered to the speech recognition system in advance, for waking up the speech recognition system when the user subsequently inputs a matching voice signal. To further improve the wake-up rate of the speech recognition system and avoid false wake-ups caused by environmental noise, it may first be judged whether voiceprint data matching the voiceprint information exists in the speech recognition system. When such voiceprint data exists in the speech recognition system, step S202 is executed; when it does not, step S203 is executed.
S202: lowering the wake-up word recognition threshold of the speech recognition system.
When voiceprint data matching the voiceprint information exists in the speech recognition system, the current user of the smart device can be determined to be a registered user from the voiceprint data the user registered in the speech recognition system, ruling out false wake-ups caused by environmental noise or other sounds. The wake-up word recognition threshold for the user corresponding to the voiceprint data is therefore lowered, increasing the probability that the user wakes up the speech recognition system.
S203: raising the wake-up word recognition threshold of the speech recognition system.
When no voiceprint data matching the voiceprint information exists in the speech recognition system, it may be inferred that the voice signal is environmental noise or was uttered by an unregistered user. To avoid false wake-ups caused by environmental noise and at the same time improve the security of the speech recognition system, the wake-up word recognition threshold of the speech recognition system can be raised accordingly, increasing the wake-up difficulty.
Further, referring to Fig. 8, the wake-up word binding method based on the above embodiment further includes, after step S201:
S204: calculating, according to a preset voiceprint model, the similarity between the voiceprint information and the voiceprint data registered in the speech recognition system.
In the present embodiment, when judging whether voiceprint data matching the voiceprint information in the voice signal exists in the speech recognition system, in order to improve the accuracy of voiceprint recognition and thereby the subsequent speech recognition wake-up rate, the similarity between the voiceprint information in the voice signal and the voiceprint data registered in the speech recognition system can be calculated based on a preset voiceprint model. Specifically, syllable-state segmentation may be performed on tone A in the voiceprint information based on the preset voiceprint model, syllable-state segmentation is then performed on tone S in the voiceprint data by the same means, and the overlap of each state syllable of tone A and tone S is compared; this overlap is the similarity. In other embodiments, the similarity can also be calculated by comparing the timing B in the voiceprint information of the voice signal with the timing D in the voiceprint data.
S205: when the similarity is within a preset range, determining that voiceprint data matching the voiceprint information exists in the speech recognition system.
When the overlap of each state syllable of tone A and tone S is within the preset range, it can be determined that voiceprint data matching the voiceprint information exists in the speech recognition system.
S206: when the similarity is outside the preset range, determining that no voiceprint data matching the voiceprint information exists in the speech recognition system.
When the overlap of each state syllable of tone A and tone S is outside the preset range, it is determined that no voiceprint data matching the voiceprint information exists in the speech recognition system.
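The syllable-state comparison of steps S204 to S206 can be sketched as below. Equal-width segmentation into states, the per-state tolerance and the preset range bounds are all assumptions made for this sketch; the patent leaves the exact segmentation to the preset voiceprint model.

```python
def syllable_state_overlap(tone_a, tone_s, states=4, tol=0.1):
    """Segment two pitch contours into `states` equal-width syllable states
    and return the fraction of states whose mean pitch agrees within `tol`.
    This fraction plays the role of the similarity (the 'overlap')."""
    def state_means(tone):
        width = len(tone) // states
        return [sum(tone[i * width:(i + 1) * width]) / width for i in range(states)]

    means_a, means_s = state_means(tone_a), state_means(tone_s)
    hits = sum(1 for a, s in zip(means_a, means_s) if abs(a - s) <= tol)
    return hits / states


def voiceprint_matches(overlap, low=0.75, high=1.0):
    """Step S205 when the overlap lies inside the preset range,
    step S206 when it falls outside."""
    return low <= overlap <= high


# Tone A (input voiceprint) vs. tone S (registered data): last state differs.
overlap = syllable_state_overlap([1, 1, 2, 2, 3, 3, 4, 4],
                                 [1, 1, 2, 2, 3, 3, 9, 9])
```

An overlap inside the range declares a registered speaker; outside it, the raised threshold of step S203 applies.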
Further, referring to Fig. 9, in the wake-up word binding method based on the above embodiment, step S203 includes:
S2031: when no voiceprint data matching the voiceprint information exists in the speech recognition system, obtaining the current user state information and image information.
In the present embodiment, when the overlap of each state syllable of tone A and tone S is outside the preset range, it is determined that no voiceprint data matching the voiceprint information exists in the speech recognition system. At this point the user may have input an unregistered wake-up word, or environmental noise may have been received, so the current user state information and image information need to be obtained in order to judge whether the current user is a registered user and whether the received voice signal is environmental noise.
S2032: when it is detected that the current user is not speaking, is outside the recognition range of the speech recognition system, or is unregistered, raising the wake-up word recognition threshold of the speech recognition system.
When it is judged from the obtained current user state information that the user is not speaking, or that the user is outside the recognition range of the speech recognition system, the received voice signal is judged to be environmental noise. To reduce false wake-ups caused by environmental noise, the wake-up word recognition threshold of the speech recognition system is raised, increasing the wake-up difficulty and reducing the false wake-up rate. When it is judged from the obtained current user image information that the current user is unregistered, the wake-up word recognition threshold of the speech recognition system is likewise raised, increasing the wake-up difficulty and improving the security of speech recognition.
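The decision in steps S2031 and S2032 can be sketched as a check over a few state flags. The flag names and threshold values are hypothetical stand-ins for whatever the real user-state sensors and image pipeline report.

```python
def decide_threshold(state, base=0.5, raised=0.7):
    """Return the wake-word recognition threshold given user-state/image
    information. `state` is assumed to carry three boolean flags:
    `speaking` (user-state), `in_range` (user-state), `registered` (image)."""
    if not state["speaking"] or not state["in_range"]:
        return raised  # likely environmental noise: wake harder
    if not state["registered"]:
        return raised  # unregistered user seen in the image: wake harder
    return base


stranger = {"speaking": True, "in_range": True, "registered": False}
noise = {"speaking": False, "in_range": True, "registered": True}
owner = {"speaking": True, "in_range": True, "registered": True}
```

Only when all three checks pass does the base threshold remain in effect.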
Further, referring to Fig. 10, in the wake-up word binding method based on the above embodiment, step S70 includes:
S71: counting the matching degree between the wake-up word information in the received voice signal and the wake-up word model registered to the speech recognition system.
In the present embodiment, since the wake-up word information in the voice signal is mainly matched against the wake-up word model, the specific matching approach may be the matching degree of the phoneme permutation. For example, when the wake-up word model contains 48 phonemes, the wake-up word information in the received voice signal is counted, that is, the phoneme features in the wake-up word information are counted and compared with the phonemes in the wake-up word model; when a preset number is reached, the phoneme permutations are further compared.
S72: when the matching degree reaches the lowered or raised wake-up word recognition threshold, waking up the speech recognition system or the smart device on which the speech recognition system is installed.
When the number of phonemes in the wake-up word information reaches the preset number and the coincidence rate of the phoneme permutation exceeds a preset threshold, it is judged that the matching degree between the wake-up word information in the voice signal and the wake-up word model reaches the lowered or raised wake-up word recognition threshold. The voice signal can then be responded to, for example by waking up the speech recognition system or the smart device on which it is installed, so as to recognize the subsequent voice control commands or voice interaction commands input by the user and then perform the corresponding control or interaction actions, improving the intelligence of the smart device.
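The matching-degree check of steps S71 and S72 can be sketched as a position-by-position phoneme comparison against the registered model. The coincidence-rate formula and the example phonemes are illustrative assumptions.

```python
def matching_degree(observed, model):
    """Step S71: fraction of model positions whose phoneme coincides with
    the phoneme observed at the same position in the wake-word information."""
    hits = sum(1 for o, m in zip(observed, model) if o == m)
    return hits / len(model)


def should_wake(observed, model, threshold):
    """Step S72: wake when the matching degree reaches the (possibly
    lowered or raised) wake-word recognition threshold."""
    return matching_degree(observed, model) >= threshold


# Registered model and an observation with one misheard phoneme.
model = ["n", "i", "h", "ao"]
observed = ["n", "i", "x", "ao"]
```

With the lowered threshold of step S202 in effect this observation wakes the system; with the raised threshold of step S203 it does not, which is exactly the behavior the adjustment is meant to produce.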
In addition, an embodiment of the present invention further proposes a storage medium storing a wake-up word binding application program which, when executed by a processor, implements the steps of the wake-up word binding method described above.
For the method implemented when the wake-up word binding program is executed, reference may be made to the embodiments of the wake-up word binding method of the present invention, and details are not repeated here.
It should be understood by those skilled in the art that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical memory and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction device, the instruction device realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
It should be noted that, in the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of components or steps not listed in a claim. The word "a" or "an" preceding a component does not exclude the presence of multiple such components. The present invention can be realized by means of hardware including several different components and by means of a properly programmed computer. In a unit claim listing several devices, several of these devices can be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any order; these words may be construed as names.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include these modifications and variations.
Claims (10)
1. A wake-up word binding method, characterized in that the wake-up word binding method includes the following steps:
Step S1: collecting a voice signal uttered by a user;
Step S2: extracting the wake-up word information and the user information from the voice signal;
Step S3: binding the user information and the wake-up word information to the user.
2. The wake-up word binding method according to claim 1, characterized in that step S3 includes:
Step S31: obtaining the wake-up word model that the user registers to a speech recognition system, and binding the user information and the wake-up word to the wake-up word model.
3. The wake-up word binding method according to claim 2, characterized in that the user information is voiceprint information, and step S31 includes:
Step S311: collecting the wake-up word voice signal input by the user multiple times;
Step S312: obtaining the timing feature, tonality feature and phoneme feature in the wake-up word voice signal of each input;
Step S313: performing acoustic feature processing on the timing feature and tonality feature obtained each time, and registering the processed timing feature information and tonality feature information as the voiceprint data of the user;
Step S314: sorting and combining the phoneme features obtained each time based on a preset acoustic model to obtain the wake-up word model;
Step S315: saving the voiceprint data, the wake-up word and the wake-up word model in association with one another.
4. The wake-up word binding method according to claim 3, characterized in that step S2 includes:
Step S21: when a voice signal is received, judging whether the volume value of the voice signal exceeds a preset volume value;
Step S22: if so, obtaining the wake-up word information in the voice signal based on an acoustic model and syntactic structure, and obtaining the voiceprint information in the voice signal based on voiceprint recognition technology.
5. The wake-up word binding method according to claim 3, characterized by further including, after step S3:
Step S4: receiving a wake-up voice signal and extracting the wake-up word from the wake-up voice signal;
Step S5: when the wake-up word matches the preset wake-up word in the speech recognition system, responding to the wake-up word voice signal and executing the corresponding operation.
6. The wake-up word binding method according to claim 5, characterized by further including, after step S4:
Step S6: adjusting the recognition threshold of the preset wake-up word in the speech recognition system;
Step S7: when the wake-up word matches the adjusted preset wake-up word, responding to the wake-up word voice signal and executing the corresponding operation.
7. The wake-up word binding method according to claim 6, characterized in that the user information is voiceprint information, and step S6 includes:
Step S61: extracting the voiceprint information from the wake-up word voice signal;
Step S62: when no voiceprint data matching the voiceprint information exists in the speech recognition system, raising the wake-up word recognition threshold of the speech recognition system;
Step S63: when voiceprint data matching the voiceprint information exists in the speech recognition system, lowering the wake-up word recognition threshold of the speech recognition system.
8. The wake-up word binding method according to claim 7, characterized by further including, after step S61:
Step S64: calculating, according to a preset voiceprint model, the similarity between the voiceprint information and the voiceprint data registered in the speech recognition system;
Step S65: when the similarity is within a preset range, determining that voiceprint data matching the voiceprint information exists in the speech recognition system;
Step S66: when the similarity is outside the preset range, determining that no voiceprint data matching the voiceprint information exists in the speech recognition system.
9. A smart device, characterized in that the smart device is installed with a speech recognition system, and the smart device further includes a memory, a processor, and a wake-up word binding application program stored in the memory and executable on the processor, the speech recognition system being connected to the processor, wherein:
the speech recognition system is used to respond to a voice signal that meets a wake-up condition; and
when the wake-up word binding program is executed by the processor, the steps of the wake-up word binding method according to any one of claims 1 to 8 are realized.
10. A storage medium, characterized in that the storage medium stores a wake-up word binding application program, and when the wake-up word binding application program is executed by a processor, the steps of the wake-up word binding method according to any one of claims 1 to 8 are realized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810407844.6A CN108735209B (en) | 2018-04-28 | 2018-04-28 | Wake-up word binding method, intelligent device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810407844.6A CN108735209B (en) | 2018-04-28 | 2018-04-28 | Wake-up word binding method, intelligent device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108735209A true CN108735209A (en) | 2018-11-02 |
CN108735209B CN108735209B (en) | 2021-01-08 |
Family
ID=63939486
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810407844.6A Active CN108735209B (en) | 2018-04-28 | 2018-04-28 | Wake-up word binding method, intelligent device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108735209B (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109273005A (en) * | 2018-12-11 | 2019-01-25 | 胡应章 | Sound control output device |
CN109887508A (en) * | 2019-01-25 | 2019-06-14 | 广州富港万嘉智能科技有限公司 | A kind of meeting automatic record method, electronic equipment and storage medium based on vocal print |
CN110062309A (en) * | 2019-04-28 | 2019-07-26 | 百度在线网络技术(北京)有限公司 | Method and apparatus for controlling intelligent sound box |
CN110556107A (en) * | 2019-08-23 | 2019-12-10 | 宁波奥克斯电气股份有限公司 | control method and system capable of automatically adjusting voice recognition sensitivity, air conditioner and readable storage medium |
CN110600029A (en) * | 2019-09-17 | 2019-12-20 | 苏州思必驰信息科技有限公司 | User-defined awakening method and device for intelligent voice equipment |
CN111128155A (en) * | 2019-12-05 | 2020-05-08 | 珠海格力电器股份有限公司 | Awakening method, device, equipment and medium for intelligent equipment |
CN111124512A (en) * | 2019-12-10 | 2020-05-08 | 珠海格力电器股份有限公司 | Awakening method, device, equipment and medium for intelligent equipment |
CN111240634A (en) * | 2020-01-08 | 2020-06-05 | 百度在线网络技术(北京)有限公司 | Sound box working mode adjusting method and device |
CN111261171A (en) * | 2020-01-17 | 2020-06-09 | 厦门快商通科技股份有限公司 | Method and system for voiceprint verification of customizable text |
CN112053689A (en) * | 2020-09-11 | 2020-12-08 | 深圳市北科瑞声科技股份有限公司 | Method and system for operating equipment based on eyeball and voice instruction and server |
CN112382288A (en) * | 2020-11-11 | 2021-02-19 | 湖南常德牌水表制造有限公司 | Method and system for debugging equipment by voice, computer equipment and storage medium |
CN112530424A (en) * | 2020-11-23 | 2021-03-19 | 北京小米移动软件有限公司 | Voice processing method and device, electronic equipment and storage medium |
CN112700782A (en) * | 2020-12-25 | 2021-04-23 | 维沃移动通信有限公司 | Voice processing method and electronic equipment |
CN113327593A (en) * | 2021-05-25 | 2021-08-31 | 上海明略人工智能(集团)有限公司 | Apparatus and method for corpus acquisition, electronic device and readable storage medium |
CN113380257A (en) * | 2021-06-08 | 2021-09-10 | 深圳市同行者科技有限公司 | Multi-terminal smart home response method, device, equipment and storage medium |
CN113808585A (en) * | 2021-08-16 | 2021-12-17 | 百度在线网络技术(北京)有限公司 | Earphone awakening method, device, equipment and storage medium |
CN114333828A (en) * | 2022-03-08 | 2022-04-12 | 深圳市华方信息产业有限公司 | Quick voice recognition system for digital product |
CN115132195A (en) * | 2022-05-12 | 2022-09-30 | 腾讯科技(深圳)有限公司 | Voice wake-up method, apparatus, device, storage medium and program product |
CN116030817A (en) * | 2022-07-18 | 2023-04-28 | 荣耀终端有限公司 | Voice wakeup method, equipment and storage medium |
Application Events
- 2018-04-28: Application CN201810407844.6A filed in China; granted as patent CN108735209B (status: Active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1567431A (en) * | 2003-07-10 | 2005-01-19 | 上海优浪信息科技有限公司 | Method and system for identifying status of speaker |
CN105575395A (en) * | 2014-10-14 | 2016-05-11 | 中兴通讯股份有限公司 | Voice wake-up method and apparatus, terminal, and processing method thereof |
CN106338924A (en) * | 2016-09-23 | 2017-01-18 | 广州视源电子科技股份有限公司 | Method and device for automatically adjusting operation parameter threshold of equipment |
CN106358061A (en) * | 2016-11-11 | 2017-01-25 | 四川长虹电器股份有限公司 | Television voice remote control system and television voice remote control method |
CN107945806A (en) * | 2017-11-10 | 2018-04-20 | 北京小米移动软件有限公司 | User identification method and device based on sound characteristic |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109273005A (en) * | 2018-12-11 | 2019-01-25 | 胡应章 | Sound control output device |
CN109887508A (en) * | 2019-01-25 | 2019-06-14 | 广州富港万嘉智能科技有限公司 | Voiceprint-based automatic meeting recording method, electronic device and storage medium
CN110062309A (en) * | 2019-04-28 | 2019-07-26 | 百度在线网络技术(北京)有限公司 | Method and apparatus for controlling intelligent sound box |
CN110556107A (en) * | 2019-08-23 | 2019-12-10 | 宁波奥克斯电气股份有限公司 | Control method and system capable of automatically adjusting voice recognition sensitivity, air conditioner and readable storage medium
CN110600029A (en) * | 2019-09-17 | 2019-12-20 | 苏州思必驰信息科技有限公司 | User-defined awakening method and device for intelligent voice equipment |
CN111128155A (en) * | 2019-12-05 | 2020-05-08 | 珠海格力电器股份有限公司 | Awakening method, device, equipment and medium for intelligent equipment |
CN111124512A (en) * | 2019-12-10 | 2020-05-08 | 珠海格力电器股份有限公司 | Awakening method, device, equipment and medium for intelligent equipment |
CN111124512B (en) * | 2019-12-10 | 2020-12-08 | 珠海格力电器股份有限公司 | Awakening method, device, equipment and medium for intelligent equipment |
CN111240634A (en) * | 2020-01-08 | 2020-06-05 | 百度在线网络技术(北京)有限公司 | Sound box working mode adjusting method and device |
CN111261171A (en) * | 2020-01-17 | 2020-06-09 | 厦门快商通科技股份有限公司 | Method and system for voiceprint verification of customizable text |
CN112053689A (en) * | 2020-09-11 | 2020-12-08 | 深圳市北科瑞声科技股份有限公司 | Method and system for operating equipment based on eyeball and voice instruction and server |
CN112382288A (en) * | 2020-11-11 | 2021-02-19 | 湖南常德牌水表制造有限公司 | Method and system for debugging equipment by voice, computer equipment and storage medium |
CN112382288B (en) * | 2020-11-11 | 2024-04-02 | 湖南常德牌水表制造有限公司 | Method, system, computer device and storage medium for voice debugging device |
CN112530424A (en) * | 2020-11-23 | 2021-03-19 | 北京小米移动软件有限公司 | Voice processing method and device, electronic equipment and storage medium |
CN112700782A (en) * | 2020-12-25 | 2021-04-23 | 维沃移动通信有限公司 | Voice processing method and electronic equipment |
CN113327593A (en) * | 2021-05-25 | 2021-08-31 | 上海明略人工智能(集团)有限公司 | Apparatus and method for corpus acquisition, electronic device and readable storage medium |
CN113380257A (en) * | 2021-06-08 | 2021-09-10 | 深圳市同行者科技有限公司 | Multi-terminal smart home response method, device, equipment and storage medium |
CN113808585A (en) * | 2021-08-16 | 2021-12-17 | 百度在线网络技术(北京)有限公司 | Earphone awakening method, device, equipment and storage medium |
CN114333828A (en) * | 2022-03-08 | 2022-04-12 | 深圳市华方信息产业有限公司 | Quick voice recognition system for digital product |
CN115132195A (en) * | 2022-05-12 | 2022-09-30 | 腾讯科技(深圳)有限公司 | Voice wake-up method, apparatus, device, storage medium and program product |
CN115132195B (en) * | 2022-05-12 | 2024-03-12 | 腾讯科技(深圳)有限公司 | Voice wakeup method, device, equipment, storage medium and program product |
CN116030817A (en) * | 2022-07-18 | 2023-04-28 | 荣耀终端有限公司 | Voice wakeup method, equipment and storage medium |
CN116030817B (en) * | 2022-07-18 | 2023-09-19 | 荣耀终端有限公司 | Voice wakeup method, equipment and storage medium |
CN117153166A (en) * | 2022-07-18 | 2023-12-01 | 荣耀终端有限公司 | Voice wakeup method, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108735209B (en) | 2021-01-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108735209A (en) | Wake up word binding method, smart machine and storage medium | |
EP2821992B1 (en) | Method for updating voiceprint feature model and terminal | |
CN103578474B (en) | Voice control method, device and equipment | |
CN110890093B (en) | Intelligent equipment awakening method and device based on artificial intelligence | |
CN108711430B (en) | Speech recognition method, intelligent device and storage medium | |
CN108320742A (en) | Voice interactive method, smart machine and storage medium | |
CN105810194B (en) | Voice-controlled information acquisition method in standby mode and intelligent terminal | |
CN110570840B (en) | Intelligent device awakening method and device based on artificial intelligence | |
CN109065060B (en) | Voice awakening method and terminal | |
CN109558512A (en) | Audio-based personalized recommendation method, device and mobile terminal | |
CN103095911A (en) | Method and system for finding mobile phone through voice awakening | |
CN111261144A (en) | Voice recognition method, device, terminal and storage medium | |
CN109509473A (en) | Sound control method and terminal device | |
CN108712566A (en) | Voice assistant wake-up method and mobile terminal | |
CN112751648B (en) | Packet loss data recovery method, related device, equipment and storage medium | |
CN110830368B (en) | Instant messaging message sending method and electronic equipment | |
CN108989558A (en) | Method and device for terminal calling | |
CN106341539A (en) | Method, device and mobile terminal for automatic voiceprint forensics of malicious calls | |
CN107798107A (en) | Song recommendation method and mobile device | |
CN111522592A (en) | Intelligent terminal awakening method and device based on artificial intelligence | |
CN111738100A (en) | Mouth shape-based voice recognition method and terminal equipment | |
CN107403623A (en) | Method, terminal, cloud server and readable storage medium for storing recorded content | |
CN110808019A (en) | Song generation method and electronic equipment | |
CN111292727B (en) | Voice recognition method and electronic equipment | |
CN111383658B (en) | Audio signal alignment method and device |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |