CN109065046A - Voice wake-up method, apparatus, electronic device, and computer-readable storage medium - Google Patents
- Publication number
- CN109065046A CN109065046A CN201811006300.5A CN201811006300A CN109065046A CN 109065046 A CN109065046 A CN 109065046A CN 201811006300 A CN201811006300 A CN 201811006300A CN 109065046 A CN109065046 A CN 109065046A
- Authority
- CN
- China
- Prior art keywords
- detection model
- keyword detection
- spectral feature
- threshold value
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Abstract
Embodiments of the invention provide a voice wake-up method, apparatus, electronic device, and computer-readable storage medium, applied to the field of speech recognition. The method comprises: extracting spectral feature information from collected user speech; inputting the spectral feature information into a first keyword detection model to obtain a first confidence corresponding to the spectral feature information; if the first confidence corresponding to the spectral feature information is not less than a first confidence threshold, inputting the spectral feature information and its corresponding first confidence into a second keyword detection model to obtain a detection result, the first confidence threshold being the confidence threshold corresponding to the first keyword detection model; and then determining, based on the detection result, whether to perform a voice wake-up operation. Embodiments of the invention thereby reduce the computational overhead of performing keyword detection on user speech.
Description
Technical field
The present embodiments relate to the field of speech recognition and, in particular, to a voice wake-up method, apparatus, electronic device, and computer-readable storage medium.
Background art
With the development of information technology, speech recognition has advanced as well, and products using speech recognition, such as conversational assistants, intelligent robots, and smartwatches, have become increasingly common. These products all use speech recognition to enhance the user experience and improve the naturalness of human-computer interaction. Within speech recognition, one particularly important technique is keyword detection, which is commonly used for voice wake-up.
In the prior art, voice wake-up is performed by running keyword detection on collected user speech with a predetermined keyword detection model; when a target keyword is present in the collected user speech, voice wake-up is triggered.
However, the inventors found, in the course of making the invention, that when voice wake-up is implemented with an existing predetermined keyword detection model, all of the user's speech must pass through that model to determine whether to perform the voice wake-up operation. Because the existing predetermined keyword detection model is relatively complex, the amount of computation required for keyword detection on user speech is large, resulting in high computational overhead.
Summary of the invention
Embodiments of the invention provide a voice wake-up method, apparatus, electronic device, and computer-readable storage medium, to solve the problem of high computational overhead when performing keyword detection on user speech.
To solve the above problem, embodiments of the invention mainly provide the following technical solutions:
In a first aspect, a voice wake-up method is provided, the method comprising:
extracting spectral feature information from collected user speech;
inputting the spectral feature information into a first keyword detection model to obtain a first confidence corresponding to the spectral feature information;
if the first confidence corresponding to the spectral feature information is not less than a first confidence threshold, inputting the spectral feature information and its corresponding first confidence into a second keyword detection model to obtain a detection result, the first confidence threshold being the confidence threshold corresponding to the first keyword detection model; and
determining, based on the detection result, whether to perform a voice wake-up operation.
In a second aspect, a voice wake-up apparatus is provided, the apparatus comprising:
an extraction module, configured to extract spectral feature information from collected user speech;
a first input module, configured to input the spectral feature information extracted by the extraction module into a first keyword detection model to obtain a first confidence corresponding to the spectral feature information;
a second input module, configured to, when the first confidence corresponding to the spectral feature information is not less than a first confidence threshold, input the spectral feature information extracted by the extraction module and its corresponding first confidence into a second keyword detection model to obtain a detection result, the first confidence threshold being the confidence threshold corresponding to the first keyword detection model; and
a determining module, configured to determine, based on the detection result, whether to perform a voice wake-up operation.
In a third aspect, an electronic device is provided, the electronic device comprising:
at least one processor; and
at least one memory, and a bus connected to the processor; wherein
the processor and the memory communicate with each other via the bus; and
the processor is configured to invoke program instructions in the memory to perform the voice wake-up method of the first aspect.
In a fourth aspect, a non-transient computer-readable storage medium is provided, the non-transient computer-readable storage medium storing computer instructions that cause a computer to perform the voice wake-up method of the first aspect.
The technical solutions provided by embodiments of the invention have at least the following advantages:
Compared with the prior-art approach of implementing voice wake-up with an existing predetermined keyword detection model, embodiments of the invention extract spectral feature information from collected user speech; input the spectral feature information into a first keyword detection model to obtain a corresponding first confidence; if that first confidence is not less than a first confidence threshold (the confidence threshold corresponding to the first keyword detection model), input the spectral feature information and its first confidence into a second keyword detection model to obtain a detection result; and then determine, based on the detection result, whether to perform the voice wake-up operation. That is, for part of the user speech it can be determined, after the first keyword detection model alone, that the voice wake-up operation will not be performed, with no need for keyword detection by the second keyword detection model. Because the structural complexity of the first keyword detection model is much lower than that of the existing predetermined keyword detection model, the computational cost of passing only through the first keyword detection model is far lower than that of the prior-art predetermined keyword detection model, thereby reducing the computational overhead of performing keyword detection on user speech.
Detailed description of the invention
To describe the technical solutions in the embodiments of the invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below.
Fig. 1 is a schematic flow chart of a voice wake-up method according to an embodiment of the invention;
Fig. 2 is a schematic structural diagram of a voice wake-up apparatus according to an embodiment of the invention;
Fig. 3 is a schematic structural diagram of another voice wake-up apparatus according to an embodiment of the invention;
Fig. 4 is a schematic structural diagram of an electronic device for voice wake-up according to an embodiment of the invention.
Specific embodiment
Embodiments of the application are described in detail below, with examples shown in the accompanying drawings, where identical or similar reference numbers throughout denote identical or similar elements, or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended only to explain the application and are not to be construed as limiting the claims.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include plural forms. It should be further understood that the word "comprising" used in the description of this application means that the stated features, integers, steps, operations, elements, and/or components are present, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. The word "and/or" as used herein includes all or any units and all combinations of one or more of the associated listed items.
To make the purposes, technical solutions, and advantages of the application clearer, embodiments of the application are described in further detail below with reference to the accompanying drawings.
In the prior art, voice wake-up requires keyword detection by an existing keyword detection model. To keep the false wake-up rate low, the prior-art keyword detection model must be especially complex in structure, and its computation is likewise especially complex. Since all speech input by the user must pass through this existing keyword detection model, all of that speech must be computed by the model's algorithm, so the amount of computation is large and the computational overhead is high.
The voice wake-up method, apparatus, electronic device, and computer-readable storage medium provided by embodiments of the invention are intended to solve the above technical problems of the prior art.
The technical solutions of the application, and how they solve the above technical problems, are described in detail below with specific embodiments. The specific embodiments below may be combined with each other, and identical or similar concepts or processes may not be repeated in some embodiments. Embodiments of the application are described below with reference to the accompanying drawings.
Embodiment one
An embodiment of the invention provides a voice wake-up method. As shown in Fig. 1, the method comprises:
Step S101: extracting spectral feature information from collected user speech.
In this embodiment of the invention, user speech over a period of time is collected, and spectral feature information is then extracted from the speech collected during that period.
In this embodiment of the invention, the voice wake-up method may run on an electronic device capable of receiving speech sent by the user. When the electronic device is in a working state, it can monitor surrounding sound in real time to receive the user's voice information.
In this embodiment of the invention, the electronic device on which the voice wake-up method runs may receive voice information from the terminal the user employs for voice interaction, via a wired or wireless connection. It should be noted that the above wireless connection may include, but is not limited to, 3G/4G connections, WiFi connections, Bluetooth connections, WiMAX connections, ZigBee connections, UWB (Ultra Wideband) connections, and other wireless connection modes.
In this embodiment of the invention, the manner of extracting spectral feature information from user speech is common knowledge in the art and is not described in detail here.
Step S102: inputting the spectral feature information into a first keyword detection model to obtain a first confidence corresponding to the spectral feature information.
In this embodiment of the invention, the first keyword detection model may be a neural network with a relatively simple structure. The first keyword detection model is used to determine the relationship between the first confidence corresponding to the input spectral feature information and the first confidence threshold.
In this embodiment of the invention, a confidence is used to characterize the probability that the user speech is wake-up speech for waking the electronic device. The first confidence characterizes the probability, as detected by the first keyword detection model, that the user speech is wake-up speech for waking the electronic device.
Step S103: if the first confidence corresponding to the spectral feature information is not less than the first confidence threshold, inputting the spectral feature information and its corresponding first confidence into a second keyword detection model to obtain a detection result.
The first confidence threshold is the confidence threshold corresponding to the first keyword detection model.
In this embodiment of the invention, the second keyword detection model and the first keyword detection model detect the spectral information along different dimensions. The second keyword detection model may also be a neural network, and its structure is more complex than that of the first keyword detection model.
Step S104: determining, based on the detection result, whether to perform a voice wake-up operation.
This embodiment of the invention thus provides a voice wake-up method. Compared with implementing voice wake-up with an existing predetermined keyword detection model in the prior art, this embodiment extracts spectral feature information from collected user speech, inputs it into a first keyword detection model to obtain a corresponding first confidence, and, if that first confidence is not less than a first confidence threshold (the confidence threshold corresponding to the first keyword detection model), inputs the spectral feature information and its first confidence into a second keyword detection model to obtain a detection result, then determines, based on the detection result, whether to perform the voice wake-up operation. That is, for part of the user speech it can be determined, after the first keyword detection model alone, that the voice wake-up operation will not be performed, with no need for keyword detection by the second keyword detection model. Since the structural complexity of the first keyword detection model is much lower than that of the existing predetermined keyword detection model, the computational cost of passing only through the first keyword detection model is far lower than that of the prior-art predetermined keyword detection model, thereby reducing the computational overhead of performing keyword detection on user speech.
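Steps S101 through S104 describe a two-stage cascade in which the cheap model gates the expensive one. A minimal sketch of that control flow, with the models as callables returning a confidence in [0, 1]; the threshold values and function signatures are illustrative assumptions, not from the patent:

```python
def voice_wake_decision(features, first_model, second_model,
                        first_threshold=0.5, second_threshold=0.8):
    """Two-stage cascade (steps S101-S104): the cheap first model gates the
    expensive second model. Thresholds and signatures are illustrative."""
    first_conf = first_model(features)
    if first_conf < first_threshold:
        return False                          # Embodiment six: reject early, skip stage 2
    # Stage 2 receives both the features and the first confidence (step S103).
    second_conf = second_model(features, first_conf)
    return second_conf >= second_threshold    # Embodiment two: final wake decision

# Stub models: stage 1 passes (0.6 >= 0.5), stage 2 rejects (0.7 < 0.8).
wake = voice_wake_decision("feats", lambda f: 0.6, lambda f, c: 0.7)
print(wake)  # False
```

Most non-wake audio should fall below `first_threshold` and never reach the second model, which is where the claimed reduction in computational overhead comes from.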
Embodiment two
An embodiment of the invention provides another possible implementation. On the basis of Embodiment one, the method further includes the steps shown in Embodiment two, wherein:
Step S104 includes step S1041 (not shown in the figures) and step S1042 (not shown in the figures), wherein:
Step S1041: if the detection result indicates that the second confidence corresponding to the spectral feature information is not less than a second confidence threshold, determining that the voice wake-up operation is to be performed.
In this embodiment of the invention, the second confidence characterizes the probability, as detected by the second keyword detection model, that the user speech is wake-up speech for the electronic device.
In this embodiment of the invention, when the detection result indicates that the second confidence corresponding to the spectral feature information is not less than the second confidence threshold, the probability that the user speech is wake-up speech for the electronic device is relatively high, and voice wake-up is therefore performed.
Step S1042: if the detection result indicates that the second confidence corresponding to the spectral feature information is less than the second confidence threshold, determining that the voice wake-up operation is not to be performed.
The second confidence threshold is the confidence threshold corresponding to the second keyword detection model.
In this embodiment of the invention, if the detection result indicates that the second confidence corresponding to the spectral feature information is less than the second confidence threshold, the probability that the user speech is wake-up speech is relatively low, and the voice wake-up operation is therefore not performed.
Embodiment three
Another possible implementation of the embodiments of the invention further includes, on the basis of Embodiment two, the operations shown in Embodiment three, wherein:
Before step S102, the method further includes step SA (not shown in the figures), wherein:
Step SA: training the first keyword detection model and the second keyword detection model.
In this embodiment of the invention, implementing the voice wake-up method requires the first keyword detection model and the second keyword detection model. Before keyword detection is performed with the first and second keyword detection models, they must be trained with a large number of training samples.
In this embodiment of the invention, step SA may specifically include: training the first keyword detection model and the second keyword detection model offline; and/or training the first keyword detection model and the second keyword detection model by online learning.
In step SA, the first keyword detection model and the second keyword detection model may be trained simultaneously; the first keyword detection model may be trained first and the second afterwards; or the second keyword detection model may be trained first and the first afterwards. Embodiments of the invention place no limitation on this.
Example IV
Another possible implementation of the embodiments of the invention further includes, on the basis of Embodiment three, the operations shown in Embodiment four, wherein:
The manner of training the first keyword detection model in step SA comprises step SA1 (not shown in the figures) and step SA2 (not shown in the figures), wherein:
Step SA1: obtaining first sample information.
The first sample information includes: at least one item of first-sample spectral information and, for each item of first-sample spectral information, annotation information indicating whether its corresponding first confidence is not less than the first confidence threshold.
In this embodiment of the invention, the annotation information may include a first identifier and a second identifier, where the first identifier characterizes that the first confidence corresponding to the first-sample spectral information is not less than the first confidence threshold, and the second identifier characterizes that the first confidence corresponding to the first-sample spectral information is less than the first confidence threshold.
For example, the first identifier may be "0" and the second identifier may be "1".
Step SA2: training the first keyword detection model based on the first sample information.
Specifically, the manner of training the second keyword detection model in step SA comprises step SA3 (not shown in the figures) and step SA4 (not shown in the figures), wherein:
Step SA3: obtaining second sample information.
The second sample information includes: at least one second-sample spectral information group and, for each second-sample spectral information group, annotation information indicating whether its corresponding second confidence is not less than the second confidence threshold. Each second-sample spectral information group includes: second-sample spectral information and the first confidence corresponding to that spectral information. Each item of second-sample spectral information is sample spectral information whose first confidence is not less than the first confidence threshold.
In this embodiment of the invention, the annotation information may include a third identifier and a fourth identifier, where the third identifier characterizes that the second confidence corresponding to the second-sample spectral information group is not less than the second confidence threshold, and the fourth identifier characterizes that the second confidence corresponding to the second-sample spectral information group is less than the second confidence threshold.
In this embodiment of the invention, the third identifier is different from the fourth identifier; the third identifier may be the same as the first identifier or the second identifier, and the fourth identifier may be the same as the first identifier or the second identifier.
In this embodiment of the invention, the first-sample spectral information may be the same as or different from the second-sample spectral information; embodiments of the invention place no limitation on this.
In this embodiment of the invention, if the first-sample spectral information is the same as the second-sample spectral information, the first confidence corresponding to the second-sample spectral information is the confidence obtained after that spectral information passes through the first keyword detection model.
Step SA4: training the second keyword detection model based on the second sample information.
In this embodiment of the invention, the first keyword detection model and the second keyword detection model are trained separately with a large number of sample items (including the first sample information and the second sample information), yielding a trained first keyword detection model and a trained second keyword detection model. Because both the first and the second keyword detection models are trained on a large number of samples, the accuracy with which the trained first and second keyword detection models determine whether to perform the voice wake-up operation can be improved, thereby improving the user experience.
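Step SA3 defines second-sample groups as (spectral information, first confidence) pairs that passed the first-stage threshold, each with an annotation identifier. A minimal sketch of assembling such groups, assuming a stub first-stage model; the function name, dict layout, and ground-truth input are illustrative, and the "0"/"1" labels follow the patent's own example identifiers:

```python
def build_second_stage_samples(spectra, first_model, first_threshold, is_wake_word):
    """Assemble second-sample spectral information groups (step SA3): keep only
    spectra whose first confidence is not less than the first threshold, pairing
    each with that confidence and an annotation label. is_wake_word supplies
    ground truth for each spectrum; all names here are illustrative."""
    groups = []
    for spec, truth in zip(spectra, is_wake_word):
        first_conf = first_model(spec)
        if first_conf < first_threshold:
            continue                       # filtered out by stage 1; not a stage-2 sample
        label = "0" if truth else "1"      # identifiers per the patent's "0"/"1" example
        groups.append({"spectrum": spec, "first_confidence": first_conf, "label": label})
    return groups

# Toy spectra scored by a stub first model (score = mean of the vector).
spectra = [[0.9, 0.8], [0.1, 0.2], [0.7, 0.6]]
stub_model = lambda s: sum(s) / len(s)
groups = build_second_stage_samples(spectra, stub_model, 0.5, [True, True, False])
print(len(groups))  # 2 (the middle spectrum is filtered out by stage 1)
```

Training the second model only on samples that already passed stage 1 matches its role as a secondary check: it never needs to see inputs the first model would have rejected anyway.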
Embodiment five
Another possible implementation of the embodiments of the invention further includes, on the basis of Embodiment three or Embodiment four, the operations shown in Embodiment five, wherein:
Before step SA, the method further includes step SB (not shown in the figures), wherein:
Step SB: configuring the first confidence threshold and the second confidence threshold.
The first confidence threshold is less than the second confidence threshold.
In this embodiment of the invention, since the first confidence threshold is the confidence threshold corresponding to the first keyword detection model, the second confidence threshold is the confidence threshold corresponding to the second keyword detection model, and the second keyword detection model performs secondary verification on spectral information that has passed the first keyword detection model, the confidence threshold corresponding to the second keyword detection model must be higher than that corresponding to the first keyword detection model for the secondary verification to take effect.
In this embodiment of the invention, the first confidence threshold and the second confidence threshold may be set by the user, or may be set by the operator of the voice wake-up application; embodiments of the invention place no limitation on this.
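The only hard constraint step SB imposes is that the first threshold be below the second. A small configuration sketch that enforces this invariant at construction time; the class name and the concrete default values are illustrative assumptions, not from the patent:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WakeThresholds:
    """Confidence thresholds from step SB. The patent requires first < second so
    that stage 2 acts as a stricter secondary check; the default numbers below
    are placeholders only."""
    first: float = 0.4
    second: float = 0.8

    def __post_init__(self):
        if not (0.0 <= self.first < self.second <= 1.0):
            raise ValueError("require 0 <= first < second <= 1")

ok = WakeThresholds(0.3, 0.9)   # valid: first below second
```

Validating the ordering once, wherever the user or operator supplies the values, prevents a misconfiguration in which the "secondary verification" is looser than the first gate.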
Embodiment six
Another possible implementation of the embodiments of the invention further includes, on the basis of Embodiment one, the operations shown in Embodiment six, wherein:
After step S102, the method further includes step SC (not shown in the figures), wherein:
Step SC: if the first confidence corresponding to the spectral feature information is less than the first confidence threshold, determining that the voice wake-up operation is not to be performed.
In this embodiment of the invention, if the first confidence obtained from the first keyword detection model for the spectral feature information is less than the first confidence threshold, it is determined that the voice wake-up operation will not be performed, and there is no need to input the spectral feature information and its corresponding first confidence into the second keyword detection model for secondary detection, which reduces the computational load.
Embodiment seven
In another possible implementation of the embodiment of the present invention, the operations shown in Embodiment seven are further included on the basis of Embodiment one, wherein
The method further includes step SD (not shown in the figure), wherein
Step SD: if the first keyword detection model is deployed locally, the second keyword detection model is deployed in the cloud, and it is detected that the terminal device is not currently connected to the cloud, determining whether to execute the voice wake-up operation based on the first confidence corresponding to the spectrum feature information and the first confidence threshold.
In this embodiment of the present invention, the first keyword detection model and the second keyword detection model may both be deployed locally, may both be deployed in the cloud, or the first keyword detection model may be deployed locally while the second keyword detection model is deployed in the cloud; this is not limited in this embodiment of the present invention.
In this embodiment of the present invention, whether the first keyword detection model and the second keyword detection model are deployed locally or in the cloud is determined by the computing capability and the storage space of the terminal device.
In this embodiment of the present invention, when the first keyword detection model is deployed locally, the second keyword detection model is deployed in the cloud, and the terminal device is not currently connected to the cloud device, the terminal device cannot send the spectrum feature information and its corresponding first confidence to the cloud device, and whether to execute the wake-up operation can only be determined from the detection result of the first keyword detection model.
In this embodiment of the present invention, when the detection result of the first keyword detection model indicates that the first confidence of the spectrum feature information is less than the first confidence threshold, it is determined that the voice wake-up operation is not executed; when the detection result indicates that the first confidence of the spectrum feature information is not less than the first confidence threshold, it is determined that the voice wake-up operation is executed.
In this embodiment of the present invention, when the first keyword detection model is deployed locally, the second keyword detection model is deployed in the cloud, and it is detected that the terminal device is not currently connected to the cloud, whether to execute the voice wake-up operation can be determined from the detection result of the first keyword detection model alone. This avoids the situation in which the voice wake-up operation cannot be executed because the cloud is currently unreachable, and thus further improves the user experience.
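The offline fallback described above can be sketched as follows; the function and parameter names are assumptions for illustration, and `second_stage` stands in for a call to the cloud-hosted second keyword detection model.

```python
def decide_wake(first_confidence, first_threshold, cloud_connected,
                second_stage=None):
    """Decide the wake-up when the second model lives in the cloud.

    If the terminal device cannot reach the cloud, the local first-stage
    detection result alone determines whether to execute the voice
    wake-up operation, so wake-up still works offline.
    """
    if first_confidence < first_threshold:
        return False  # rejected locally whether or not the cloud is up
    if not cloud_connected or second_stage is None:
        # Cloud unreachable: trust the first keyword detection model.
        return True
    # Cloud reachable: forward to secondary verification.
    return second_stage(first_confidence)
```

The design choice mirrors the text: connectivity only affects utterances that already passed the first threshold; first-stage rejections are identical online and offline.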
In this embodiment of the present invention, when both the first keyword detection model and the second keyword detection model are deployed in the cloud and the terminal device is not currently connected to the cloud, prompt information is output to notify the user that the terminal device currently cannot connect to the cloud.
Embodiment eight
Fig. 2 is a schematic structural diagram of a voice wake-up apparatus provided in an embodiment of the present invention. As shown in Fig. 2, the voice wake-up apparatus 20 of this embodiment may include: an extraction module 201, a first input module 202, a second input module 203, and a determining module 204, wherein
The extraction module 201 is configured to extract spectrum feature information from collected user speech.
The first input module 202 is configured to input the spectrum feature information extracted by the extraction module 201 into the first keyword detection model to obtain the first confidence corresponding to the spectrum feature information.
The second input module 203 is configured to, when the first confidence corresponding to the spectrum feature information is not less than the first confidence threshold, input the spectrum feature information extracted by the extraction module 201 and the first confidence corresponding to the spectrum feature information into the second keyword detection model to obtain a detection result.
Wherein, the first confidence threshold is the confidence threshold corresponding to the first keyword detection model.
The determining module 204 is configured to determine, based on the detection result, whether to execute the voice wake-up operation.
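As one toy illustration of what the extraction module might compute, the sketch below splits a waveform into overlapping frames and takes naive DFT magnitudes per frame. The frame sizes and the bare DFT are assumptions for illustration only; a practical system would use windowing, FFTs, and mel filterbanks.

```python
import math


def spectrum_features(samples, frame_len=4, hop=2):
    """Toy spectrum feature extraction: per-frame DFT magnitudes."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, hop)]
    features = []
    for frame in frames:
        mags = []
        for k in range(frame_len // 2 + 1):  # non-redundant bins only
            re = sum(x * math.cos(-2 * math.pi * k * n / frame_len)
                     for n, x in enumerate(frame))
            im = sum(x * math.sin(-2 * math.pi * k * n / frame_len)
                     for n, x in enumerate(frame))
            mags.append(math.hypot(re, im))
        features.append(mags)
    return features
```

For a four-sample alternating signal, all energy lands in the middle frequency bin, as expected for a waveform at half the Nyquist rate.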
An embodiment of the present invention provides a voice wake-up apparatus. Compared with realizing voice wake-up through an existing predetermined keyword detection model as in the prior art, the embodiment of the present invention extracts spectrum feature information from collected user speech and inputs it into the first keyword detection model to obtain the first confidence corresponding to the spectrum feature information; if that first confidence is not less than the first confidence threshold, the spectrum feature information and its corresponding first confidence are input into the second keyword detection model to obtain a detection result, the first confidence threshold being the confidence threshold corresponding to the first keyword detection model; whether to execute the voice wake-up operation is then determined based on the detection result. That is, for part of the user speech it can already be determined after the first keyword detection model alone that the voice wake-up operation is not to be executed, without performing keyword detection through the second keyword detection model. Since the structural complexity of the first keyword detection model is much lower than that of the existing predetermined keyword detection model, the computational overhead of the first keyword detection model alone is far smaller than that of the predetermined keyword detection model in the prior art, thereby reducing the computational overhead of performing keyword detection on user speech.
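The two-stage decision flow summarized above can be sketched end to end as follows. The stub models, their `score` signatures, and the threshold values are assumptions for illustration, not the patent's actual models.

```python
class CheapModel:
    """Hypothetical lightweight first-stage detector."""

    def score(self, features):
        # Illustrative confidence: mean feature value.
        return sum(features) / len(features)


class AccurateModel:
    """Hypothetical heavier second-stage detector."""

    def score(self, features, first_confidence):
        # Illustrative refinement that also consumes the first confidence.
        return min(1.0, 0.5 * first_confidence + 0.5 * max(features))


def two_stage_wake(features, first_model, second_model,
                   first_threshold, second_threshold):
    """Cascaded wake-up decision: cheap gate, then secondary verification."""
    first_conf = first_model.score(features)
    if first_conf < first_threshold:
        return False  # early rejection; the second model never runs
    second_conf = second_model.score(features, first_conf)
    return second_conf >= second_threshold  # stricter second threshold
```

Note how the expensive second model only runs on utterances that survive the cheap gate, which is the source of the claimed computational saving.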
The voice wake-up apparatus of this embodiment can perform the voice wake-up method provided in Embodiment one of the present invention; the implementation principles are similar and are not described again here.
Embodiment nine
Fig. 3 is a schematic structural diagram of another voice wake-up apparatus provided in an embodiment of the present invention. As shown in Fig. 3, the voice wake-up apparatus 30 of this embodiment may include: an extraction module 301, a first input module 302, a second input module 303, and a determining module 304, wherein
The extraction module 301 is configured to extract spectrum feature information from collected user speech.
Wherein, the extraction module 301 in Fig. 3 is the same as or similar in function to the extraction module 201 in Fig. 2.
The first input module 302 is configured to input the spectrum feature information extracted by the extraction module 301 into the first keyword detection model to obtain the first confidence corresponding to the spectrum feature information.
Wherein, the first input module 302 in Fig. 3 is the same as or similar in function to the first input module 202 in Fig. 2.
The second input module 303 is configured to, when the first confidence corresponding to the spectrum feature information is not less than the first confidence threshold, input the spectrum feature information extracted by the extraction module 301 and the first confidence corresponding to the spectrum feature information into the second keyword detection model to obtain a detection result.
Wherein, the first confidence threshold is the confidence threshold corresponding to the first keyword detection model.
Wherein, the second input module 303 in Fig. 3 is the same as or similar in function to the second input module 203 in Fig. 2.
The determining module 304 is configured to determine, based on the detection result, whether to execute the voice wake-up operation.
Wherein, the determining module 304 in Fig. 3 is the same as or similar in function to the determining module 204 in Fig. 2.
Specifically, the determining module 304 is configured to determine to execute the voice wake-up operation when the detection result indicates that the second confidence corresponding to the spectrum feature information is not less than the second confidence threshold.
The determining module 304 is further configured to determine not to execute the voice wake-up operation when the detection result indicates that the second confidence corresponding to the spectrum feature information is less than the second confidence threshold.
Wherein, the second confidence threshold is the confidence threshold corresponding to the second keyword detection model.
Further, as shown in Fig. 3, the apparatus 30 may also include a first training module 305 and a second training module 306. The first training module 305 and the second training module 306 may be the same training module or two separate training modules; this is not limited in this embodiment of the present invention. Fig. 3 shows the case in which they are two separate training modules, wherein
The first training module 305 is configured to train the first keyword detection model.
The second training module 306 is configured to train the second keyword detection model.
Specifically, the first training module 305 includes a first acquisition unit 3051 and a first training unit 3052, wherein
The first acquisition unit 3051 is configured to obtain first sample information.
Wherein, the first sample information includes at least one piece of first sample spectrum information and, for each piece of first sample spectrum information, annotation information indicating whether its corresponding first confidence is not less than the first confidence threshold.
The first training unit 3052 is configured to train the first keyword detection model based on the first sample information obtained by the first acquisition unit 3051.
Specifically, the second training module 306 includes a second acquisition unit 3061 and a second training unit 3062, wherein
The second acquisition unit 3061 is configured to obtain second sample information.
Wherein, the second sample information includes at least one second sample spectrum information group and, for each second sample spectrum information group, annotation information indicating whether its corresponding second confidence is not less than the second confidence threshold. Each second sample spectrum information group includes a piece of second sample spectrum information and the first confidence corresponding to that spectrum information, and each piece of second sample spectrum information is sample spectrum information whose first confidence is not less than the first confidence threshold.
In this embodiment of the present invention, when the first training module 305 and the second training module 306 are the same training module, the first acquisition unit 3051 and the second acquisition unit 3061 may be the same acquisition unit or two separate acquisition units; this is not limited in this embodiment of the present invention. Fig. 3 only shows the case in which they are two separate acquisition units.
The second training unit 3062 is configured to train the second keyword detection model based on the second sample information obtained by the second acquisition unit 3061.
In this embodiment of the present invention, when the first training module 305 and the second training module 306 are the same training module, the first training unit 3052 and the second training unit 3062 may be the same training unit or two separate training units; this is not limited in this embodiment of the present invention. Fig. 3 only shows the case in which they are two separate training units.
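The construction of the second sample information described above (keeping only sample spectrum information whose first confidence is not less than the first confidence threshold, paired with that confidence and an annotation) can be sketched as follows; the `score` API and the sample tuple format are assumptions for illustration.

```python
class MeanConfidenceModel:
    """Hypothetical first-stage model: mean feature value as confidence."""

    def score(self, spectrum_info):
        return sum(spectrum_info) / len(spectrum_info)


def build_second_sample_groups(samples, first_model, first_threshold):
    """Build training groups for the second keyword detection model.

    Each input sample is (spectrum_info, is_keyword). Only samples that
    survive the first stage are kept, since the second model only ever
    sees spectrum information that passed the first threshold.
    """
    groups = []
    for spectrum_info, is_keyword in samples:
        first_conf = first_model.score(spectrum_info)
        if first_conf < first_threshold:
            continue  # first-stage rejects never reach the second model
        groups.append({
            "spectrum_info": spectrum_info,
            "first_confidence": first_conf,
            "label": is_keyword,
        })
    return groups
```

Filtering the training set this way matches the serving-time distribution: the second model is trained only on the kind of input it will actually receive.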
Further, as shown in Fig. 3, the apparatus 30 also includes a first configuration module 307 and a second configuration module 308, wherein
The first configuration module 307 is configured to configure the first confidence threshold.
The second configuration module 308 is configured to configure the second confidence threshold.
Wherein, the first confidence threshold is less than the second confidence threshold.
In this embodiment of the present invention, the first configuration module 307 and the second configuration module 308 may be the same configuration module or two separate configuration modules; this is not limited in this embodiment of the present invention. Fig. 3 only shows the case in which they are two separate configuration modules.
In a possible implementation, the determining module 304 is further configured to determine not to execute the voice wake-up operation when the first confidence corresponding to the spectrum feature information is less than the first confidence threshold.
In this embodiment of the present invention, if the first confidence obtained for the spectrum feature information through the first keyword detection model is less than the first confidence threshold, it is determined that the voice wake-up operation is not executed, and there is no need to input the spectrum feature information and its corresponding first confidence into the second keyword detection model for secondary detection, which reduces the computational load.
In a possible implementation, the determining module 304 is further configured to, when the first keyword detection model is deployed locally, the second keyword detection model is deployed in the cloud, and it is detected that the terminal device is not currently connected to the cloud, determine whether to execute the voice wake-up operation based on the first confidence corresponding to the spectrum feature information and the first confidence threshold.
In this embodiment of the present invention, when the first keyword detection model is deployed locally, the second keyword detection model is deployed in the cloud, and it is detected that the terminal device is not currently connected to the cloud, whether to execute the voice wake-up operation can be determined from the detection result of the first keyword detection model alone. This avoids the situation in which the voice wake-up operation cannot be executed because the cloud is currently unreachable, and thus further improves the user experience.
An embodiment of the present invention provides a voice wake-up apparatus. Compared with realizing voice wake-up through an existing predetermined keyword detection model as in the prior art, the embodiment of the present invention extracts spectrum feature information from collected user speech and inputs it into the first keyword detection model to obtain the first confidence corresponding to the spectrum feature information; if that first confidence is not less than the first confidence threshold, the spectrum feature information and its corresponding first confidence are input into the second keyword detection model to obtain a detection result, the first confidence threshold being the confidence threshold corresponding to the first keyword detection model; whether to execute the voice wake-up operation is then determined based on the detection result. That is, for part of the user speech it can already be determined after the first keyword detection model alone that the voice wake-up operation is not to be executed, without performing keyword detection through the second keyword detection model. Since the structural complexity of the first keyword detection model is much lower than that of the existing predetermined keyword detection model, the computational overhead of the first keyword detection model alone is far smaller than that of the predetermined keyword detection model in the prior art, thereby reducing the computational overhead of performing keyword detection on user speech.
The voice wake-up apparatus of this embodiment can perform the voice wake-up methods shown in any one of Embodiments one to seven of the present invention; the implementation principles are similar and are not described again here.
Embodiment ten
An embodiment of the present invention provides an electronic device. As shown in Fig. 4, the electronic device 4000 includes a processor 4001 and a memory 4003, where the processor 4001 is connected with the memory 4003, for example through a bus 4002.
In this embodiment of the present invention, the processor 4001 is configured to realize the functions of the extraction module, the first input module, the second input module, and the determining module shown in Fig. 2 or Fig. 3, and/or the functions of the first training module, the second training module, the first configuration module, and the second configuration module shown in Fig. 3.
The processor 4001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA, another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure. The processor 4001 may also be a combination of computing components, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 4002 may include a path for transmitting information between the above components. The bus 4002 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Fig. 4, but this does not mean that there is only one bus or only one type of bus.
The memory 4003 may be a ROM or another type of static storage device capable of storing static information and instructions, a RAM or another type of dynamic storage device capable of storing information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, and the like), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory 4003 is configured to store application program code for executing the solution of the present application, and execution is controlled by the processor 4001. The processor 4001 is configured to execute the application program code stored in the memory 4003 so as to realize the actions of the voice wake-up apparatus provided in the embodiment shown in Fig. 2 or Fig. 3.
The electronic device in this embodiment of the present invention may be a local terminal or a cloud device; this is not limited herein.
An embodiment of the present invention provides a voice wake-up electronic device. Compared with realizing voice wake-up through an existing predetermined keyword detection model as in the prior art, the embodiment of the present invention extracts spectrum feature information from collected user speech and inputs it into the first keyword detection model to obtain the first confidence corresponding to the spectrum feature information; if that first confidence is not less than the first confidence threshold, the spectrum feature information and its corresponding first confidence are input into the second keyword detection model to obtain a detection result, the first confidence threshold being the confidence threshold corresponding to the first keyword detection model; whether to execute the voice wake-up operation is then determined based on the detection result. That is, for part of the user speech it can already be determined after the first keyword detection model alone that the voice wake-up operation is not to be executed, without performing keyword detection through the second keyword detection model. Since the structural complexity of the first keyword detection model is much lower than that of the existing predetermined keyword detection model, the computational overhead of the first keyword detection model alone is far smaller than that of the predetermined keyword detection model in the prior art, thereby reducing the computational overhead of performing keyword detection on user speech.
The electronic device provided in this embodiment of the present invention is applicable to any one of method Embodiments one to seven above; details are not described again here.
Embodiment eleven
An embodiment of the present invention provides a non-transient computer-readable storage medium. The non-transient computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to execute the voice wake-up method shown in any one of method Embodiments one to seven above.
An embodiment of the present invention provides a non-transient computer-readable storage medium. Compared with realizing voice wake-up through an existing predetermined keyword detection model as in the prior art, the embodiment of the present invention extracts spectrum feature information from collected user speech and inputs it into the first keyword detection model to obtain the first confidence corresponding to the spectrum feature information; if that first confidence is not less than the first confidence threshold, the spectrum feature information and its corresponding first confidence are input into the second keyword detection model to obtain a detection result, the first confidence threshold being the confidence threshold corresponding to the first keyword detection model; whether to execute the voice wake-up operation is then determined based on the detection result. That is, for part of the user speech it can already be determined after the first keyword detection model alone that the voice wake-up operation is not to be executed, without performing keyword detection through the second keyword detection model. Since the structural complexity of the first keyword detection model is much lower than that of the existing predetermined keyword detection model, the computational overhead of the first keyword detection model alone is far smaller than that of the predetermined keyword detection model in the prior art, thereby reducing the computational overhead of performing keyword detection on user speech.
The non-transient computer-readable storage medium provided in this embodiment of the present invention is applicable to any of the above method embodiments; details are not described again here.
It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated otherwise herein, there is no strict ordering restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times, and whose execution order is not necessarily sequential; they may be executed in turn or alternately with at least part of the sub-steps or stages of other steps.
The above are only some embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A voice wake-up method, characterized by comprising:
extracting spectrum feature information from collected user speech;
inputting the spectrum feature information into a first keyword detection model to obtain a first confidence corresponding to the spectrum feature information;
if the first confidence corresponding to the spectrum feature information is not less than a first confidence threshold, inputting the spectrum feature information and the first confidence corresponding to the spectrum feature information into a second keyword detection model to obtain a detection result, wherein the first confidence threshold is the confidence threshold corresponding to the first keyword detection model;
determining, based on the detection result, whether to execute a voice wake-up operation.
2. The method according to claim 1, characterized in that determining, based on the detection result, whether to execute the voice wake-up operation comprises:
if the detection result indicates that a second confidence corresponding to the spectrum feature information is not less than a second confidence threshold, determining to execute the voice wake-up operation;
if the detection result indicates that the second confidence corresponding to the spectrum feature information is less than the second confidence threshold, determining not to execute the voice wake-up operation;
wherein the second confidence threshold is the confidence threshold corresponding to the second keyword detection model.
3. The method according to claim 2, characterized in that, before inputting the spectrum feature information into the first keyword detection model to obtain the first confidence corresponding to the spectrum feature information, the method further comprises:
training the first keyword detection model and the second keyword detection model.
4. The method according to claim 3, characterized in that the manner of training the first keyword detection model comprises:
obtaining first sample information, the first sample information comprising at least one piece of first sample spectrum information and, for each piece of first sample spectrum information, annotation information indicating whether its corresponding first confidence is not less than the first confidence threshold;
training the first keyword detection model based on the first sample information;
and the manner of training the second keyword detection model comprises:
obtaining second sample information, the second sample information comprising at least one second sample spectrum information group and, for each second sample spectrum information group, annotation information indicating whether its corresponding second confidence is not less than the second confidence threshold; each second sample spectrum information group comprising a piece of second sample spectrum information and the first confidence corresponding to that spectrum information; each piece of second sample spectrum information being sample spectrum information whose first confidence is not less than the first confidence threshold;
training the second keyword detection model based on the second sample information.
5. The method according to any one of claims 3-4, characterized in that, before training the first keyword detection model and the second keyword detection model, the method further comprises:
configuring the first confidence threshold and the second confidence threshold, wherein the first confidence threshold is less than the second confidence threshold.
6. The method according to claim 1, characterized in that, after inputting the spectrum feature information into the first keyword detection model to obtain the first confidence corresponding to the spectrum feature information, the method further comprises:
if the first confidence corresponding to the spectrum feature information is less than the first confidence threshold, determining not to execute the voice wake-up operation.
7. The method according to claim 1, characterized in that the method further comprises:
if the first keyword detection model is deployed locally, the second keyword detection model is deployed in the cloud, and it is detected that the terminal device is not currently connected to the cloud, determining whether to execute the voice wake-up operation based on the first confidence corresponding to the spectrum feature information and the first confidence threshold.
8. A voice wake-up apparatus, characterized by comprising:
an extraction module, configured to extract spectrum feature information from collected user speech;
a first input module, configured to input the spectrum feature information extracted by the extraction module into a first keyword detection model to obtain a first confidence corresponding to the spectrum feature information;
a second input module, configured to, when the first confidence corresponding to the spectrum feature information is not less than a first confidence threshold, input the spectrum feature information extracted by the extraction module and the first confidence corresponding to the spectrum feature information into a second keyword detection model to obtain a detection result, wherein the first confidence threshold is the confidence threshold corresponding to the first keyword detection model;
a determining module, configured to determine, based on the detection result, whether to execute a voice wake-up operation.
9. An electronic device, characterized by comprising:
at least one processor; and
at least one memory and a bus connected to the processor; wherein
the processor and the memory communicate with each other through the bus; and
the processor is configured to call program instructions in the memory to perform the voice wake-up method according to any one of claims 1 to 7.
10. A non-transient computer-readable storage medium, characterized in that the non-transient computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to perform the voice wake-up method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811006300.5A CN109065046A (en) | 2018-08-30 | 2018-08-30 | Method, apparatus, electronic equipment and the computer readable storage medium that voice wakes up |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811006300.5A CN109065046A (en) | 2018-08-30 | 2018-08-30 | Method, apparatus, electronic equipment and the computer readable storage medium that voice wakes up |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109065046A true CN109065046A (en) | 2018-12-21 |
Family
ID=64758100
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811006300.5A Pending CN109065046A (en) | 2018-08-30 | 2018-08-30 | Method, apparatus, electronic equipment and the computer readable storage medium that voice wakes up |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109065046A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103065631A (en) * | 2013-01-24 | 2013-04-24 | 华为终端有限公司 | Voice identification method and device |
US20140237277A1 (en) * | 2013-02-20 | 2014-08-21 | Dominic S. Mallinson | Hybrid performance scaling or speech recognition |
CN106448663A (en) * | 2016-10-17 | 2017-02-22 | 海信集团有限公司 | Voice wakeup method and voice interaction device |
CN107767863A (en) * | 2016-08-22 | 2018-03-06 | 科大讯飞股份有限公司 | voice awakening method, system and intelligent terminal |
CN108305617A (en) * | 2018-01-31 | 2018-07-20 | 腾讯科技(深圳)有限公司 | The recognition methods of voice keyword and device |
CN108335696A (en) * | 2018-02-09 | 2018-07-27 | 百度在线网络技术(北京)有限公司 | Voice awakening method and device |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109697981B (en) * | 2019-01-02 | 2021-03-09 | 百度在线网络技术(北京)有限公司 | Voice interaction method, device, equipment and storage medium |
CN109697981A (en) * | 2019-01-02 | 2019-04-30 | 百度在线网络技术(北京)有限公司 | A kind of voice interactive method, device, equipment and storage medium |
CN109979440A (en) * | 2019-03-13 | 2019-07-05 | 广州市网星信息技术有限公司 | Keyword sample determines method, audio recognition method, device, equipment and medium |
CN109979440B (en) * | 2019-03-13 | 2021-05-11 | 广州市网星信息技术有限公司 | Keyword sample determination method, voice recognition method, device, equipment and medium |
CN109979438A (en) * | 2019-04-04 | 2019-07-05 | Oppo广东移动通信有限公司 | Voice awakening method and electronic equipment |
CN110232933A (en) * | 2019-06-03 | 2019-09-13 | Oppo广东移动通信有限公司 | Audio-frequency detection, device, storage medium and electronic equipment |
CN110232933B (en) * | 2019-06-03 | 2022-02-22 | Oppo广东移动通信有限公司 | Audio detection method and device, storage medium and electronic equipment |
CN110602624A (en) * | 2019-08-30 | 2019-12-20 | Oppo广东移动通信有限公司 | Audio testing method and device, storage medium and electronic equipment |
CN110602624B (en) * | 2019-08-30 | 2021-05-25 | Oppo广东移动通信有限公司 | Audio testing method and device, storage medium and electronic equipment |
CN110634468A (en) * | 2019-09-11 | 2019-12-31 | 中国联合网络通信集团有限公司 | Voice wake-up method, device, equipment and computer readable storage medium |
CN110634468B (en) * | 2019-09-11 | 2022-04-15 | 中国联合网络通信集团有限公司 | Voice wake-up method, device, equipment and computer readable storage medium |
CN110570840B (en) * | 2019-09-12 | 2022-07-05 | 腾讯科技(深圳)有限公司 | Intelligent device awakening method and device based on artificial intelligence |
CN110570840A (en) * | 2019-09-12 | 2019-12-13 | 腾讯科技(深圳)有限公司 | Intelligent device awakening method and device based on artificial intelligence |
CN110570861A (en) * | 2019-09-24 | 2019-12-13 | Oppo广东移动通信有限公司 | method and device for voice wake-up, terminal equipment and readable storage medium |
CN110570861B (en) * | 2019-09-24 | 2022-02-25 | Oppo广东移动通信有限公司 | Method and device for voice wake-up, terminal equipment and readable storage medium |
CN110706691A (en) * | 2019-10-12 | 2020-01-17 | 出门问问信息科技有限公司 | Voice verification method and device, electronic equipment and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109065046A (en) | Method, apparatus, electronic equipment and the computer readable storage medium that voice wakes up | |
CN109065027B (en) | Voice distinguishing model training method and device, computer equipment and storage medium | |
CN106328150B (en) | Borborygmus sound detection method, apparatus and system under noisy environment | |
CN103021409B (en) | A voice-activated camera system | |
WO2019169914A1 (en) | Method and device for voice testing | |
CN110457432A (en) | Interview methods of marking, device, equipment and storage medium | |
CN110570873B (en) | Voiceprint wake-up method and device, computer equipment and storage medium | |
CN110246512A (en) | Sound separation method, device and computer readable storage medium | |
CN110428810A (en) | A kind of recognition methods, device and electronic equipment that voice wakes up | |
CN108537702A (en) | Foreign language teaching evaluation information generation method and device | |
CN109119070A (en) | A voice endpoint detection method, device, equipment and storage medium | |
CN109215647A (en) | Voice awakening method, electronic equipment and non-transient computer readable storage medium | |
US20210158802A1 (en) | Voice processing method based on artificial intelligence | |
CN109872713A (en) | A kind of voice awakening method and device | |
CN109326285A (en) | Voice information processing method, device and non-transient computer readable storage medium | |
CN109637536B (en) | Method and device for automatically identifying semantic accuracy | |
US11323835B2 (en) | Method of inspecting sound input/output device | |
CN109612728A (en) | Method for bearing fault diagnosis based on vibration spectrogram and deep convolutional neural networks | |
CN110136726A (en) | A kind of estimation method, device, system and the storage medium of voice gender | |
CN110164474A (en) | Voice wakes up automated testing method and system | |
CN114413409A (en) | Detection method and device for air conditioner fault probability and intelligent air conditioner | |
CN112735466A (en) | Audio detection method and device | |
CN107894837A (en) | Dynamic sentiment analysis model sample processing method and processing device | |
US11380318B2 (en) | Evaluation system, evaluation method, and program | |
CN109634554A (en) | Method and apparatus for output information |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | Application publication date: 20181221
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication |