CN110097876A

CN110097876A - Voice wake-up processing method and awakened device

Info

Publication number: CN110097876A
Application number: CN201810088343.6A
Authority: CN
Inventors: 刘勇
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2018-01-30
Filing date: 2018-01-30
Publication date: 2019-08-06

Abstract

This application provides a voice wake-up processing method and an awakened device. The voice wake-up processing method includes: acquiring voice data; identifying, by means of a wake-up model, whether a wake-up word is present in the voice data; and, when a wake-up word has been identified, uploading to a server, as a wake-up word voice segment, the segment of the voice data that runs from a predetermined number of frames before the start position of the wake-up word to a predetermined number of frames after the end position of the wake-up word, wherein the server updates the wake-up model using the wake-up word voice segment. The technical solution provided by the embodiments of this application can reduce the likelihood of false wake-ups and improve the accuracy of wake-up word recognition.

Description

Voice wake-up processing method and awakened device

Technical field

This application belongs to the field of Internet technology, and in particular relates to a voice wake-up processing method and an awakened device.

Background

With the continuous development of intelligent recognition technology, artificial-intelligence wake-up is used more and more widely. For example, smart speakers, smart televisions, smart cars and the like can all increasingly be woken up by artificial intelligence. The main wake-up approach at present is still the wake-up word. For example, if the wake-up word "Bei Bei" is set for a smart car, the car monitors external sound in real time and wakes up as soon as it recognizes that the voice data "Bei Bei" has been input from outside; that is, the device is woken up by means of the wake-up word.

However, waking up a device by means of a wake-up word is prone to false wake-ups: the user says something other than "Bei Bei", but the recognition system misrecognizes it as "Bei Bei". This causes a false wake-up and seriously affects the user experience.

No effective solution to the above problem has yet been proposed.

Summary of the invention

The purpose of this application is to provide a voice wake-up processing method and an awakened device, so as to reduce the probability of false wake-ups and improve the user experience.

The voice wake-up processing method and awakened device provided by this application are implemented as follows:

A voice wake-up processing method, applied in an awakened device, the method comprising:

acquiring voice data;

identifying, by means of a wake-up model, whether a wake-up word is present in the voice data;

when a wake-up word has been identified, uploading to a server, as a wake-up word voice segment, the segment of the voice data that runs from a predetermined number of frames before the start position of the wake-up word to a predetermined number of frames after the end position of the wake-up word, wherein the server updates the wake-up model using the wake-up word voice segment.

A data processing method, applied in a server, the method comprising:

acquiring wake-up word voice data from an awakened device;

updating the wake-up model of the awakened device according to the wake-up word voice data;

pushing the updated wake-up model to the awakened device.

An awakened device, comprising a processor and a memory for storing processor-executable instructions, wherein the processor implements the steps of the above method when executing the instructions.

A server, comprising a processor and a memory for storing processor-executable instructions, wherein the processor implements the steps of the above method when executing the instructions.

A cloud server, comprising a processor and a memory for storing processor-executable instructions, wherein the processor implements the steps of the above method when executing the instructions.

A computer-readable storage medium having computer instructions stored thereon, wherein the steps of the above method are implemented when the instructions are executed.

An in-vehicle device, comprising a processor and a memory for storing processor-executable instructions, wherein the processor implements the above method when executing the instructions.

A mobile device, comprising a processor and a memory for storing processor-executable instructions, wherein the processor implements the above method when executing the instructions.

A conference device, comprising a processor and a memory for storing processor-executable instructions, wherein the processor implements the above method when executing the instructions.

With the voice wake-up processing method and data processing method provided by this application, the awakened device is controlled so that wake-up word recognition is performed locally. After a wake-up word is recognized, the wake-up word segment can be uploaded to the cloud, and the cloud can update and optimize the wake-up model based on this data, so as to reduce the likelihood of false wake-ups and improve the accuracy of wake-up word recognition.

Brief description of the drawings

To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some of the embodiments recorded in this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic architecture diagram of a data processing system provided by this application;

Fig. 2 is a schematic diagram of the extraction of a wake-up word voice segment provided by this application;

Fig. 3 is a schematic diagram of cloud data storage in the case of multiple devices provided by this application;

Fig. 4 is a schematic flowchart of wake-up word recognition and wake-up model updating provided by this application;

Fig. 5 is a flowchart of the voice wake-up processing method provided by this application;

Fig. 6 is a schematic architecture diagram of a terminal device provided by this application;

Fig. 7 is a structural block diagram of a voice wake-up processing apparatus provided by this application;

Fig. 8 is a structural block diagram of a data processing apparatus provided by this application.

Detailed description

To enable those skilled in the art to better understand the technical solutions in this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in this application without creative effort shall fall within the scope of protection of this application.

The false wake-up problem in existing wake-up word recognition technology can be addressed by improving the accuracy of the recognition model. To this end, in this example the awakened device is controlled so that wake-up word recognition is performed locally; after a wake-up word is recognized, the wake-up word segment can be uploaded to the cloud. Based on this data the cloud can reproduce the device-side result, thereby determining the device-side false wake-up probability, and can update and optimize the wake-up model, so as to reduce the likelihood of false wake-ups and improve the accuracy of wake-up word recognition.

Based on this, a data processing system is provided in this example. As shown in Fig. 1, it may comprise an awakened device 101 and a server end 102.

In one embodiment, the system may further comprise a wake-up word detection module 103, configured to acquire voice data and determine whether the voice data contains a wake-up word. The wake-up word detection module may be arranged in the awakened device 101, or may exist independently of both the server end 102 and the awakened device 101, located between the server end 102 and the awakened device 101.

The wake-up word detection module 1001 may obtain voice data through the microphone (MIC) of the awakened device 101. The voice data transmitted to the wake-up word detection module 1001 may be voice data that has already been denoised; the denoising can effectively improve the accuracy of wake-up word recognition.

A signal processing algorithm and a wake-up word recognition model (also referred to as a wake-up engine) may run on the wake-up word detection module 1001. The acquired voice data may be processed by the signal processing algorithm, for example denoised, converted into text, or have the speech in it recognized. The result is then fed as input data into the wake-up word recognition model, which identifies whether a wake-up word is present in the voice data.

If the wake-up word detection module 103 detects no wake-up word in the voice data, it may return the detection result to the awakened device 101, so that the awakened device 101 continues audio monitoring. If a wake-up word is detected in the voice data, the voice segment data of the wake-up word in the voice data can be obtained. This voice segment data is sent to the server end 102, where it is used to reproduce the voice scene on the awakened device in order to detect whether a false wake-up occurred; the wake-up word recognition model can then be further optimized based on the detection result.

In one embodiment, consider that intercepting only the voice data of the wake-up word itself would yield incomplete information and reduce recognition effectiveness. To solve this problem, as shown in Fig. 2, the detected start point of the wake-up word can be extended forward by a preset number of frames (for example 20 frames) and the detected end point extended backward by a preset number of frames (for example 10 frames). The resulting voice data is more complete, so recognition can be more accurate. A preset duration, for example 5 seconds, may be used instead of a preset number of frames; that is, the detected start point of the wake-up word is extended forward by 5 seconds and the end point backward by 5 seconds.

It should be noted, however, that the frame counts and durations listed above are only exemplary. In an actual implementation other frame counts may be used, such as 1 frame, 3 frames, 9 frames, 12 frames, 35 frames or 40 frames. This application does not specifically limit this, and the value can be selected according to actual needs.
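As a non-limiting illustration, the extension of the detected wake-up word span described above might be sketched as follows. The function name `extract_wake_segment`, its parameters, and the clamping of the extended span to the buffer boundaries are assumptions made for this sketch; they are not specified in this application.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def extract_wake_segment(
    frames: Sequence[Frame],
    start: int,
    end: int,
    pre_frames: int = 20,   # example value from the description
    post_frames: int = 10,  # example value from the description
) -> List[Frame]:
    """Return the frames from `pre_frames` before `start` to `post_frames`
    after `end` (both inclusive), clamped to the buffer boundaries."""
    if not 0 <= start <= end < len(frames):
        raise ValueError("wake-up word span must lie inside the buffer")
    first = max(0, start - pre_frames)
    last = min(len(frames) - 1, end + post_frames)
    return list(frames[first:last + 1])
```

When a preset duration is used instead of a frame count, the same function could be called with the duration converted to frames according to the frame length actually used by the device.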

Take an in-vehicle smart device whose wake-up word is "piggy hurry up" as an example. After the in-vehicle smart device recognizes the wake-up word "piggy hurry up" in the surrounding sound, it can obtain the voice data corresponding to "piggy hurry up" together with the 10 frames before it and the 10 frames after it, forming voice segment data that carries the wake-up word. This voice segment data can then be uploaded to the server side.

It is worth noting that the manner and length of the forward and backward extension listed above are only illustrative; in an actual implementation other manners and lengths may be used, and this application does not limit them. If transmission and storage costs are not a concern, the data need not be intercepted at all, and all of it may be transmitted to the server end 102. Interception effectively saves traffic and improves performance, so that the voice data on the awakened device can be reproduced using less traffic.

In one embodiment, the awakened device 101 may upload to the server end 102 every voice segment in which a wake-up word is recognized, or may upload at intervals, for example once every five occurrences or once every three occurrences, which reduces the burden on the server end 102. It may also upload in one batch the wake-up word voice segments collected over a period of time. That is, the segments may all be uploaded or may be uploaded according to a certain ratio, and the data may be compressed before uploading in order to save traffic.
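As a non-limiting illustration, interval uploading with compression might be sketched as follows. The class name `IntervalUploader`, the parameter `every_n`, and the choice of zlib as the compression scheme are assumptions made for this sketch; this application does not prescribe a particular compression format.

```python
import zlib
from typing import Optional


class IntervalUploader:
    """Prepare a compressed payload only for every n-th detected wake-up word."""

    def __init__(self, every_n: int = 3) -> None:
        if every_n < 1:
            raise ValueError("every_n must be at least 1")
        self.every_n = every_n
        self._count = 0

    def maybe_prepare(self, segment: bytes) -> Optional[bytes]:
        """Return the compressed segment on every n-th call, otherwise None."""
        self._count += 1
        if self._count % self.every_n != 0:
            return None  # skipped: not this detection's turn to be uploaded
        return zlib.compress(segment)
```

Setting `every_n` to 1 corresponds to uploading every recognized segment.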

In one embodiment, the server end 102 may handle multiple devices, that is, perform false wake-up processing for multiple devices. To this end, as shown in Fig. 3, an identifier for identifying the device may be added to the intercepted voice data transmitted to the server end 102, so that the server end 102 can tell which awakened device the voice data comes from. For example, an ID may be set or assigned for each awakened device, and the voice data sent to the server end 102 may carry the ID of the device.

When the server end 102 handles multiple awakened devices at the same time, it may store the received voice data separately by device ID. For example, wake-up word voice data from device 1 is stored in the storage unit corresponding to device 1, and wake-up word voice data from device 2 is stored in the storage unit corresponding to device 2; that is, data is stored per device.
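As a non-limiting illustration, per-device storage keyed by device ID might be sketched as follows. The class `SegmentStore` and its in-memory dictionary are assumptions made for this sketch; an actual server end would more likely use persistent storage.

```python
from collections import defaultdict
from typing import Dict, List


class SegmentStore:
    """Keep uploaded wake-up word segments in one bucket per device ID."""

    def __init__(self) -> None:
        self._by_device: Dict[str, List[bytes]] = defaultdict(list)

    def add(self, device_id: str, segment: bytes) -> None:
        """Store a segment in the bucket of the device it came from."""
        self._by_device[device_id].append(segment)

    def segments_for(self, device_id: str) -> List[bytes]:
        """Return a copy of the segments stored for one device."""
        return list(self._by_device.get(device_id, []))

    def count(self, device_id: str) -> int:
        """Return how many segments are stored for one device."""
        return len(self._by_device.get(device_id, []))
```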

In one embodiment, after the awakened device 101 identifies a wake-up word, it may also transmit the text of the wake-up word to the server end 102. When the server end subsequently checks whether a false wake-up occurred, it can compare its result directly with this wake-up word text to determine whether there was a false wake-up.

In one embodiment, the server end 102 may process the multiple wake-up word voice segments it has acquired, recognizing the wake-up word segments one by one and determining whether each recognition result is consistent with the wake-up word. If the result is consistent, the segment is regarded as a positive sample; if the recognition result is inconsistent with the wake-up word, the segment is regarded as a negative sample.

In an implementation, the positive and negative samples may be identified automatically or labeled manually. This application does not limit this, and the approach can be selected according to actual needs.

Based on the above recognition results, the server end 102 can compute the recognition accuracy of the corresponding device and can further train the wake-up word recognition model for that device. For example, if a device has 100 wake-up word voice segments and the recognition results of 90 of them are consistent with the wake-up word, the recognition accuracy of the device can be regarded as 90%.
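As a non-limiting illustration, the labeling of positive and negative samples and the accuracy calculation described above might be sketched as follows. The `Upload` record and the `recognize` callable, which stands in for the server-side recognition engine, are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Upload:
    audio: bytes    # wake-up word voice segment uploaded by the device
    wake_word: str  # wake-up word text reported by the device


def label_samples(
    uploads: Iterable[Upload],
    recognize: Callable[[bytes], str],
) -> Tuple[List[Upload], List[Upload]]:
    """Split uploads into (positives, negatives) by comparing the server-side
    recognition result with the wake-up word text reported by the device."""
    positives: List[Upload] = []
    negatives: List[Upload] = []
    for upload in uploads:
        if recognize(upload.audio) == upload.wake_word:
            positives.append(upload)
        else:
            negatives.append(upload)
    return positives, negatives


def recognition_accuracy(positives: List[Upload], negatives: List[Upload]) -> float:
    """Fraction of uploaded segments confirmed by the server-side result."""
    total = len(positives) + len(negatives)
    if total == 0:
        raise ValueError("no samples to evaluate")
    return len(positives) / total
```

With 90 confirmed segments out of 100, `recognition_accuracy` returns 0.9, matching the 90% example above.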

Based on the positive and negative samples obtained above, the server 102 can further train the wake-up word recognition model.

In one embodiment, the server end 102 may be a cloud server or a cloud platform; it may also be implemented as a server cluster composed of a group of processing servers, or as a single processing device with relatively high processing capability. Any device that can process the data centrally and has high processing capability can serve as the server end. In an implementation, choosing the cloud as the server end gives comparatively stronger processing capability and makes it easier to establish connections with multiple awakened devices.

In one embodiment, the server end 102 may update the recognition model of the awakened device 101 according to the wake-up word voice segments it has acquired and stored. Specifically, a model update may be triggered by one of the following conditions:

1) the amount of accumulated data reaches a certain quantity;

2) the proportion of negative samples for a given device exceeds a certain threshold;

3) a user actively reports that the wake-up error rate is high.
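As a non-limiting illustration, the three trigger conditions listed above might be checked as follows. The function name and the default thresholds `min_samples` and `max_negative_ratio` are assumptions made for this sketch; this application does not specify concrete threshold values.

```python
def should_update_model(
    sample_count: int,
    negative_count: int,
    user_reported_errors: bool,
    *,
    min_samples: int = 1000,          # hypothetical data-volume threshold
    max_negative_ratio: float = 0.1,  # hypothetical negative-sample threshold
) -> bool:
    """Return True if any one of the three trigger conditions is met."""
    if user_reported_errors:                 # condition 3: user feedback
        return True
    if sample_count >= min_samples:          # condition 1: enough accumulated data
        return True
    if sample_count > 0 and negative_count / sample_count > max_negative_ratio:
        return True                          # condition 2: too many negative samples
    return False
```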

In one embodiment, the server end 102 may train the recognition model with the above positive and negative samples. Specifically, in an implementation, the recognition model of each awakened device may be trained individually, or a single recognition model may be updated and iterated uniformly for all devices.

The awakened device 101 may be a smart speaker, a smart car, a smart television or any other device that needs to be woken up.

A concrete scenario is described in detail below. It should be noted that this specific embodiment is intended only to better illustrate this application and does not unduly limit it.

Take an in-vehicle system as an example. If the wake-up word recognition model of the in-vehicle system recognizes poorly and false wake-ups occur, the user experience is seriously affected. In this example, the awakened device uploads its wake-up segment data. With this data the result on the awakened device can be reproduced entirely offline, and false wake-up analysis can be performed on these results. In addition, once wake-up data (including false wake-ups) has been captured, the wake-up word recognition model on the awakened device can be proactively updated to improve the user experience. As shown in Fig. 4, the process may include the following steps:

S1: the awakened device uploads the wake-up segment data;

S2: the cloud determines whether the uploaded data is a positive sample or a negative sample;

S3: the wake-up model is updated using the positive and negative samples;

S4: after regression testing passes, the new wake-up model is deployed to the awakened device.

Specifically, the process may include the following steps:

S1: the voice data recorded by the MIC of the in-vehicle head unit is input to the head unit, where the signal processing algorithm and the wake-up engine running on the head unit process the input data.

A wake-up word detection module may be provided on the head unit. The module detects whether the voice data contains the wake-up word and returns the detection result to the head unit. If the result is that there is no wake-up word, the data collected by the MIC continues to be monitored. If there is a wake-up word, the voice segment data of the wake-up word, the text of the wake-up word and the device ID information are obtained, and this information is uploaded to the cloud.

When the head unit performs wake-up word detection there is a start point and an end point. However, if the cloud directly used only the segment between these two time points, it usually could not reproduce the head-unit result. For this reason, the wake-up word detection module may ultimately extend the detected start point forward by 20 frames and the end point backward by 10 frames, so that the data uploaded to the cloud can on the one hand be used to reproduce the head-unit result and thereby detect false wake-ups, and on the other hand be used directly for update training of the model;

S2: the cloud stores the wake-up segment data uploaded by the head units, classified by device ID;

S3: after the data has accumulated to a certain quantity, the cloud may run the accumulated voice segment data through a recognition engine. If the recognition result is consistent with the wake-up word, the segment is used as a positive sample; if the recognition result is inconsistent with the wake-up word, it is used as a negative sample.

S4: from the identified positive and negative samples, the wake-up accuracy of the corresponding head unit can be calculated;

S5: the wake-up model can be trained with the identified positive and negative samples.

The update of the recognition model may be triggered in one of the following ways: the amount of data accumulated in the cloud reaches a certain quantity; the proportion of negative samples for a given head unit exceeds a certain threshold; or a user actively reports that the wake-up error rate is high.

S6: regression testing is performed on the updated wake-up model. If the test passes, the model can be pushed directly to the corresponding head unit; if the regression test reveals an anomaly, the model can be pushed after the cause has been identified.

After the updated wake-up model has been pushed to the head unit, the user can use the new model the next time the device is restarted.
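As a non-limiting illustration, the gating of the model push on the regression test in step S6 might be sketched as follows. The functions `passes_regression` and `release_if_passing`, the `detect` callable standing in for the updated wake-up model, and the requirement that every regression case must pass are assumptions made for this sketch.

```python
from typing import Callable, Iterable, Tuple

# A regression case pairs an audio segment with the expected detection result.
Case = Tuple[bytes, bool]


def passes_regression(detect: Callable[[bytes], bool], cases: Iterable[Case]) -> bool:
    """Return True only if the updated model gives the expected result for every case."""
    return all(detect(audio) == expected for audio, expected in cases)


def release_if_passing(
    detect: Callable[[bytes], bool],
    cases: Iterable[Case],
    push: Callable[[], None],
) -> bool:
    """Push the updated model only when the regression test passes."""
    if passes_regression(detect, cases):
        push()
        return True
    return False  # hold the model back until the cause has been identified
```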

It should be noted, however, that the above is an illustrative description using an in-vehicle system as the awakened device. In an actual implementation, the awakened device may also be an in-vehicle smart device, a mobile device, a conference device, a smart speaker and so on. Any device that requires a wake-up word can be implemented with the method provided by this application, and this application does not specifically limit the type or form of the awakened device.

Fig. 5 is a flowchart of one embodiment of the voice wake-up processing method described in this application. Although this application provides method steps or apparatus structures as shown in the following embodiments or drawings, the method or apparatus may, on the basis of routine or non-inventive work, include more or fewer steps or module units. For steps or structures with no logically necessary causal relationship, the execution order of the steps or the module structure of the apparatus is not limited to that described in the embodiments or shown in the drawings. When applied in an actual apparatus or end product, the method or module structure may be executed sequentially or in parallel according to the embodiments or the connections shown in the drawings (for example in a parallel-processor or multi-threaded environment, or even a distributed processing environment). As shown in Fig. 5, the voice wake-up processing method, applied in an awakened device, may include the following steps:

Step 501: acquire voice data;

Step 502: identify, by means of a wake-up model, whether a wake-up word is present in the voice data;

Step 503: when a wake-up word has been identified, upload to a server, as a wake-up word voice segment, the segment of the voice data that runs from a predetermined number of frames before the start position of the wake-up word to a predetermined number of frames after the end position of the wake-up word, wherein the server updates the wake-up model using the wake-up word voice segment.

Uploading the wake-up word voice segment of the voice data to the server may include:

S1: identifying the wake-up word voice segment in the voice data;

S2: obtaining the wake-up word and the device identifier of the awakened device;

S3: uploading the wake-up word voice segment, the wake-up word and the device identifier of the awakened device to the server.
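As a non-limiting illustration, the three items uploaded in step S3 might be bundled into a single message as follows. The JSON layout, the field names `device_id`, `wake_word` and `segment_b64`, and the Base64 encoding of the audio are assumptions made for this sketch; this application does not define a wire format.

```python
import base64
import json


def build_upload_payload(segment: bytes, wake_word: str, device_id: str) -> str:
    """Bundle the voice segment, wake-up word text and device identifier as JSON."""
    return json.dumps(
        {
            "device_id": device_id,  # lets the server store data per device
            "wake_word": wake_word,  # device-side recognition result (text)
            "segment_b64": base64.b64encode(segment).decode("ascii"),
        },
        ensure_ascii=False,
    )


def parse_upload_payload(payload: str):
    """Server-side counterpart: recover (segment, wake_word, device_id)."""
    data = json.loads(payload)
    return base64.b64decode(data["segment_b64"]), data["wake_word"], data["device_id"]
```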

Conditions for triggering a model update may be set. For example, a model update may be triggered when at least one of the following conditions is met:

1) the number of wake-up word voice segments accumulated by the server reaches a preset threshold;

2) the server detects that the proportion of negative samples for an awakened device exceeds a preset sample threshold, where a negative sample is one for which the server-side wake-up word recognition result is inconsistent with the wake-up word recognition result of the awakened device;

3) the server receives model update instruction information from a user.

It should be noted, however, that the conditions for triggering a model update listed above are only exemplary; in an actual implementation, a model update may also be triggered in other ways.

When updating the model, the server may update the wake-up model with the positive and negative samples within a preset time.

To make data collection effective or orderly, the wake-up word voice segments of the awakened device may be uploaded to the server according to a preset ratio or period.

The above server may be a cloud server or another type of server.

The method embodiments provided by the embodiments of this application may be executed in a mobile terminal, a computer terminal, a server or a similar computing device. Taking execution on a computer terminal as an example, Fig. 6 is a hardware block diagram of a computer terminal for a voice wake-up processing method according to an embodiment of the present invention. As shown in Fig. 6, the computer terminal 10 may include one or more processors 102 (only one is shown in the figure; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication functions. Those of ordinary skill in the art will understand that the structure shown in Fig. 6 is only illustrative and does not limit the structure of the above electronic device. For example, the computer terminal 10 may include more or fewer components than shown in Fig. 6, or have a configuration different from that shown in Fig. 6.

The memory 104 may be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the voice wake-up processing method in the embodiments of the present invention. By running the software programs and modules stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the voice wake-up processing method described above. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the computer terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations thereof.

The transmission module 106 is used to receive or send data via a network. A specific example of such a network may include a wireless network provided by the communication provider of the computer terminal 10. In one example, the transmission module 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission module 106 may be a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.

Referring to Fig. 7, in a software implementation, the voice wake-up processing apparatus is applied in a client-side terminal and may include an acquiring unit, a recognition unit and an uploading unit, wherein:

the acquiring unit is configured to acquire voice data;

the recognition unit is configured to identify, by means of a wake-up model, whether a wake-up word is present in the voice data;

the uploading unit is configured to, when a wake-up word has been identified, upload to a server, as a wake-up word voice segment, the segment of the voice data that runs from a predetermined number of frames before the start position of the wake-up word to a predetermined number of frames after the end position of the wake-up word, wherein the server updates the wake-up model using the wake-up word voice segment.

In one embodiment, the uploading unit may specifically identify the wake-up word voice segment in the voice data; obtain the wake-up word and the device identifier of the awakened device; and upload the wake-up word voice segment, the wake-up word and the device identifier of the awakened device to the server.

In one embodiment, the uploading unit may specifically upload the wake-up word voice segments of the awakened device to the server according to a preset ratio or period.
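As a non-limiting illustration, the cooperation of the acquiring unit, recognition unit and uploading unit might be sketched as follows. The class `WakeUpProcessor` and the three callables passed to it are assumptions made for this sketch; in particular, `recognize` is assumed to return the inclusive start and end frame indices of the wake-up word, or `None` when no wake-up word is found.

```python
from typing import Callable, Optional, Sequence, Tuple

Span = Tuple[int, int]  # (start_frame, end_frame), both inclusive


class WakeUpProcessor:
    """Wire together hypothetical acquiring, recognition and uploading units."""

    def __init__(
        self,
        acquire: Callable[[], Sequence[bytes]],
        recognize: Callable[[Sequence[bytes]], Optional[Span]],
        upload: Callable[[Sequence[bytes]], None],
        pre_frames: int = 20,
        post_frames: int = 10,
    ) -> None:
        self.acquire = acquire
        self.recognize = recognize
        self.upload = upload
        self.pre_frames = pre_frames
        self.post_frames = post_frames

    def run_once(self) -> bool:
        """Process one buffer; return True if a wake-up word segment was uploaded."""
        frames = self.acquire()
        span = self.recognize(frames)
        if span is None:
            return False  # no wake-up word: keep listening
        start, end = span
        first = max(0, start - self.pre_frames)
        last = min(len(frames) - 1, end + self.post_frames)
        self.upload(frames[first:last + 1])
        return True
```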

Referring to Fig. 8, in a software implementation, the server-side data processing apparatus may include an acquiring unit, an updating unit and a pushing unit, wherein:

the acquiring unit is configured to acquire wake-up word voice data from an awakened device;

the updating unit is configured to update the wake-up model of the awakened device according to the wake-up word voice data;

the pushing unit is configured to push the updated wake-up model to the awakened device.

In one embodiment, the above data processing apparatus may further include a determining unit, configured to determine the false wake-up ratio of the awakened device according to the wake-up word voice data.

In one embodiment, the updating unit may specifically update the model according to the following steps:

S1: the wake-up word voice data is fed item by item into the recognition model in the server for wake-up word recognition;

S2: the recognition result is compared with the recognition result of the awakened device;

S3: if the results are consistent, the data is used as a positive sample; if they are inconsistent, it is used as a negative sample;

S4: the wake-up model is updated with the positive and negative samples.

In one embodiment, the updating unit may specifically trigger updating of the wake-up model of the awakened device according to the wake-up word voice data when it determines that one of the following (non-exhaustive) conditions is met:

1) the accumulated wake-up word voice data reaches a preset data volume threshold;

2) the proportion of negative samples for the awakened device exceeds a preset sample threshold;

3) instruction information indicating a high wake-up error rate is received.

Although this application provides method steps as described in the embodiments or flowcharts, more or fewer steps may be included on the basis of routine or non-inventive work. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only one. When an actual apparatus or client product executes the method, the steps may be executed sequentially or in parallel according to the embodiments or the methods shown in the drawings (for example in a parallel-processor or multi-threaded environment).

The apparatuses or modules described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. For convenience of description, the above apparatus is described as being divided into various modules by function. When implementing this application, the functions of the modules may be implemented in one or more pieces of software and/or hardware. A module implementing a certain function may also be implemented by a combination of multiple sub-modules or sub-units.

The methods, devices, or modules described herein may be implemented as computer-readable program code, and the controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the devices it contains for implementing various functions may be regarded as structures within the hardware component. Indeed, the devices for implementing various functions may be regarded both as software modules that implement the method and as structures within the hardware component.

Some of the modules in the device described in this application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, classes, and the like that perform particular tasks or implement particular abstract data types. This application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

From the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by software plus the necessary hardware. Based on this understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product, or may be embodied in the course of a data migration. The computer software product may be stored in a storage medium, such as a ROM/RAM, magnetic disk, or optical disc, and includes instructions that cause a computer device (which may be a personal computer, mobile terminal, server, network device, or the like) to execute the methods described in the embodiments of this application or in certain parts of the embodiments.

The embodiments in this specification are described progressively; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. All or part of this application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, mobile communication terminals, multiprocessor systems, microprocessor-based systems, programmable electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.

Although this application has been described through embodiments, those of ordinary skill in the art will appreciate that it admits many variations and changes that do not depart from its spirit, and the appended claims are intended to cover these variations and changes without departing from the spirit of this application.

Claims

1. A voice wake-up processing method, characterized in that it is applied to a woken-up device, the method comprising:

obtaining voice data;

identifying, by a wake-up model, whether a wake-up word is present in the voice data;

when a wake-up word has been identified, uploading to a server, as a wake-up word voice segment, the voice segment of the voice data that extends from a predetermined number of frames before the start position of the wake-up word to a predetermined number of frames after the end position of the wake-up word, wherein the server updates the wake-up model using the wake-up word voice segment.

2. The method according to claim 1, characterized in that uploading the wake-up word voice segment in the voice data to the server comprises:

identifying the wake-up word voice segment from the voice data;

obtaining the wake-up word and the device identifier of the woken-up device;

uploading the wake-up word voice segment, the wake-up word, and the device identifier to the server.

3. The method according to claim 1, characterized in that the server updating the wake-up model using the wake-up word voice segment comprises:

when at least one of the following conditions is met, triggering the server to update the wake-up model using positive and negative samples within a preset time:

the number of wake-up word voice segments accumulated by the server reaches a preset threshold;

the server detects that the proportion of negative samples occurring on the woken-up device exceeds a preset sample threshold, wherein a negative sample is one for which the server-side wake-up word recognition result is inconsistent with the wake-up word recognition result of the woken-up device;

the server receives model update indication information from a user.

4. The method according to claim 1, characterized in that uploading the wake-up word voice segment in the voice data to the server comprises:

uploading the wake-up word voice segments of the woken-up device to the server according to a preset ratio or period.

5. The method according to claim 1, characterized in that the server comprises a cloud server.

6. A woken-up device, comprising a processor and a memory for storing processor-executable instructions, wherein the processor, when executing the instructions, implements the method of any one of claims 1 to 5.

7. A computer-readable storage medium storing computer instructions which, when executed, implement the steps of the method of any one of claims 1 to 5.

8. An in-vehicle device, comprising a processor and a memory for storing processor-executable instructions, wherein the processor, when executing the instructions, implements the method of any one of claims 1 to 5.

9. A mobile device, comprising a processor and a memory for storing processor-executable instructions, wherein the processor, when executing the instructions, implements the method of any one of claims 1 to 5.

10. A conference device, comprising a processor and a memory for storing processor-executable instructions, wherein the processor, when executing the instructions, implements the method of any one of claims 1 to 5.