CN108536668A - Wake-up word assessment method and device, storage medium, and electronic device

Info

Publication number: CN108536668A
Application number: CN201810159653.2A
Authority: CN
Inventors: 吴国兵; 潘嘉; 王海坤
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2018-02-26
Filing date: 2018-02-26
Publication date: 2018-09-14
Anticipated expiration: 2038-02-26
Also published as: CN108536668B

Abstract

The present disclosure provides a wake-up word assessment method and device, a storage medium, and an electronic device. The method includes: obtaining a word to be assessed that is input by a user; extracting an assessment feature of the word to be assessed, the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level and/or the semantic level; and taking the assessment feature of the word to be assessed as input to a pre-built wake-up word assessment model, whose processing determines whether the word to be assessed is suitable as a wake-up word. This scheme helps improve the accuracy of the wake-up word assessment result and, in turn, the wake-up performance of the wake-up word set by the user.

Description

Wake-up word assessment method and device, storage medium, and electronic device

Technical field

The present disclosure relates to the field of speech processing technology and, in particular, to a wake-up word assessment method and device, a storage medium, and an electronic device.

Background

Voice wake-up technology is an important branch of speech processing and has important applications in smart homes, intelligent robots, intelligent in-vehicle devices, smartphones, and the like.

In practice, an intelligent terminal captures the voice data input by the user and performs wake-up word recognition using a pre-built wake-up model. If the voice data is recognized as the wake-up word, the wake-up succeeds; otherwise, the wake-up fails.

To improve the user experience, a personalized wake-up word can be set by the user as needed. To ensure the wake-up performance, the wake-up word must first be assessed when the user sets it, in order to judge whether it is suitable.

Current wake-up word assessment is mainly based on experience or rules. Specifically, the word to be assessed that the user has set is obtained, and it is judged whether the word meets a preset assessment condition; if it does, the word to be assessed is considered suitable as a wake-up word. For example, the preset assessment condition may include: the length of the word exceeds a preset length; and/or the difference between the syllables that the word contains exceeds a preset difference. Here, the length of the word may be expressed as the number of characters the word contains and/or the audio duration of the voice data corresponding to the word. The difference between syllables may be determined by checking whether adjacent syllables are identical, counting the number of adjacent syllable pairs that differ, and comparing that count with the preset difference.

Because the rules are set with a degree of subjectivity, a wake-up word assessment process based on experience or rules yields assessment results of relatively low accuracy, which in turn degrades the wake-up performance of the wake-up word set by the user.

Summary of the invention

The general object of the present disclosure is to provide a wake-up word assessment method and device, a storage medium, and an electronic device that help improve the accuracy of the wake-up word assessment result and, in turn, the wake-up performance of the wake-up word set by the user.

To achieve the above object, the present disclosure provides a wake-up word assessment method, the method including:

obtaining a word to be assessed that is input by a user;

extracting an assessment feature of the word to be assessed, the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level and/or the semantic level;

taking the assessment feature of the word to be assessed as input and, after processing by a pre-built wake-up word assessment model, determining whether the word to be assessed is suitable as a wake-up word.

Optionally, the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level includes a distribution feature of speech units, and extracting the assessment feature of the word to be assessed includes: analyzing the speech units contained in the word to be assessed, and counting at least one of the total number of speech units, the number of distinct speech units, the number of occurrences of each distinct speech unit, the number of specified speech units, and the number of occurrences of each specified speech unit, as the distribution feature of the speech units;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level includes a recognition probability of the word to be assessed, and extracting the assessment feature of the word to be assessed includes: obtaining the recognition probability of each speech unit contained in the word to be assessed; and taking the mean of the recognition probabilities of the speech units as the recognition probability of the word to be assessed, the recognition probability including an accuracy rate and/or a false alarm rate;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level includes a duration of the word to be assessed, and extracting the assessment feature of the word to be assessed includes: obtaining the duration of each speech unit contained in the word to be assessed; and taking the sum of the durations of the speech units as the duration of the word to be assessed;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level includes a tone feature of the word to be assessed, and extracting the assessment feature of the word to be assessed includes: obtaining the tone of each character contained in the word to be assessed and calculating the tone difference between adjacent characters; and performing a mathematical operation on the tone differences between adjacent characters to obtain the tone feature of the word to be assessed;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the semantic level includes a language model score, and extracting the assessment feature of the word to be assessed includes: taking the word to be assessed as input and, after processing by a pre-built language model, outputting the score of the word to be assessed, the score indicating the frequency with which the word to be assessed occurs;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the semantic level includes a part-of-speech feature of the word to be assessed, and extracting the assessment feature of the word to be assessed includes: obtaining the part of speech of each word contained in the word to be assessed; and counting the number of distinct parts of speech and the number of occurrences of each distinct part of speech, as the part-of-speech feature of the word to be assessed;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the semantic level includes a smoothness feature of the word to be assessed, and extracting the assessment feature of the word to be assessed includes: using the words contained in the word to be assessed, calculating a forward semantic smoothness and a reverse semantic smoothness of the word to be assessed; and performing a mathematical operation on the forward semantic smoothness and the reverse semantic smoothness to obtain the smoothness feature of the word to be assessed.

Optionally, when it is determined that the word to be assessed is not suitable as a wake-up word, the method further includes:

extracting a problem feature of the word to be assessed;

determining, according to the problem feature, the problem type of the word to be assessed, the problem type indicating the reason why the word to be assessed is not suitable as a wake-up word.

Optionally, the problem feature includes a language model score, and determining the problem type of the word to be assessed includes: taking the word to be assessed as input and, after processing by a pre-built language model, outputting the score of the word to be assessed, the score indicating the frequency with which the word to be assessed occurs; and when the score of the word to be assessed exceeds a preset score, judging that the problem type of the word to be assessed is "high-frequency vocabulary";

And/or

the problem feature includes the duration of the word to be assessed, and determining the problem type of the word to be assessed includes: obtaining the duration of each speech unit contained in the word to be assessed; taking the sum of the durations of the speech units as the duration of the word to be assessed; and when the duration of the word to be assessed is less than a preset duration, judging that the problem type of the word to be assessed is "duration too short";

And/or

the problem feature includes a schwa feature of the word to be assessed, and determining the problem type of the word to be assessed includes: counting the number of schwa phonemes contained in the word to be assessed; and when the number of schwa phonemes exceeds a preset number, judging that the problem type of the word to be assessed is "too many schwas".

Optionally, when it is determined that the word to be assessed is not suitable as a wake-up word, the method further includes:

obtaining, according to a pre-built knowledge graph of semantically similar words, a replacement word corresponding to the word to be assessed;

extracting an assessment feature of the replacement word, the assessment feature indicating the distinctiveness of the replacement word at the acoustic level and/or the semantic level;

taking the assessment feature of the replacement word as input and, after processing by the wake-up word assessment model, determining whether the replacement word is suitable as a wake-up word;

if the replacement word is suitable as a wake-up word, recommending the replacement word to the user.

The present disclosure provides a wake-up word assessment device, the device including:

a to-be-assessed word acquisition module, configured to obtain a word to be assessed that is input by a user;

an assessment feature extraction module, configured to extract an assessment feature of the word to be assessed, the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level and/or the semantic level;

a wake-up word determination module, configured to take the assessment feature of the word to be assessed as input and, after processing by a pre-built wake-up word assessment model, determine whether the word to be assessed is suitable as a wake-up word.

Optionally, the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level includes a distribution feature of speech units, and the assessment feature extraction module is configured to analyze the speech units contained in the word to be assessed and count at least one of the total number of speech units, the number of distinct speech units, the number of occurrences of each distinct speech unit, the number of specified speech units, and the number of occurrences of each specified speech unit, as the distribution feature of the speech units;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level includes a recognition probability of the word to be assessed, and the assessment feature extraction module is configured to obtain the recognition probability of each speech unit contained in the word to be assessed, and to take the mean of the recognition probabilities of the speech units as the recognition probability of the word to be assessed, the recognition probability including an accuracy rate and/or a false alarm rate;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level includes a duration of the word to be assessed, and the assessment feature extraction module is configured to obtain the duration of each speech unit contained in the word to be assessed, and to take the sum of the durations of the speech units as the duration of the word to be assessed;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level includes a tone feature of the word to be assessed, and the assessment feature extraction module is configured to obtain the tone of each character contained in the word to be assessed, calculate the tone difference between adjacent characters, and perform a mathematical operation on the tone differences between adjacent characters to obtain the tone feature of the word to be assessed;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the semantic level includes a language model score, and the assessment feature extraction module is configured to take the word to be assessed as input and, after processing by a pre-built language model, output the score of the word to be assessed, the score indicating the frequency with which the word to be assessed occurs;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the semantic level includes a part-of-speech feature of the word to be assessed, and the assessment feature extraction module is configured to obtain the part of speech of each word contained in the word to be assessed, and to count the number of distinct parts of speech and the number of occurrences of each distinct part of speech, as the part-of-speech feature of the word to be assessed;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the semantic level includes a smoothness feature of the word to be assessed, and the assessment feature extraction module is configured to use the words contained in the word to be assessed to calculate a forward semantic smoothness and a reverse semantic smoothness of the word to be assessed, and to perform a mathematical operation on the forward semantic smoothness and the reverse semantic smoothness to obtain the smoothness feature of the word to be assessed.

Optionally, the device further includes:

a problem feature extraction module, configured to extract a problem feature of the word to be assessed when it is determined that the word to be assessed is not suitable as a wake-up word;

a problem type determination module, configured to determine, according to the problem feature, the problem type of the word to be assessed, the problem type indicating the reason why the word to be assessed is not suitable as a wake-up word.

Optionally, the problem feature includes a language model score, and the problem type determination module is configured to take the word to be assessed as input and, after processing by a pre-built language model, output the score of the word to be assessed, the score indicating the frequency with which the word to be assessed occurs; and, when the score of the word to be assessed exceeds a preset score, to judge that the problem type of the word to be assessed is "high-frequency vocabulary";

And/or

the problem feature includes the duration of the word to be assessed, and the problem type determination module is configured to obtain the duration of each speech unit contained in the word to be assessed; take the sum of the durations of the speech units as the duration of the word to be assessed; and, when the duration of the word to be assessed is less than a preset duration, judge that the problem type of the word to be assessed is "duration too short";

And/or

the problem feature includes a schwa feature of the word to be assessed, and the problem type determination module is configured to count the number of schwa phonemes contained in the word to be assessed and, when the number of schwa phonemes exceeds a preset number, judge that the problem type of the word to be assessed is "too many schwas".

Optionally, the device further includes:

a replacement word acquisition module, configured to obtain, when it is determined that the word to be assessed is not suitable as a wake-up word, a replacement word corresponding to the word to be assessed according to a pre-built knowledge graph of semantically similar words;

the assessment feature extraction module is further configured to extract an assessment feature of the replacement word, the assessment feature indicating the distinctiveness of the replacement word at the acoustic level and/or the semantic level;

the wake-up word determination module is further configured to take the assessment feature of the replacement word as input and, after processing by the wake-up word assessment model, determine whether the replacement word is suitable as a wake-up word;

a replacement word recommendation module, configured to recommend the replacement word to the user when the replacement word is suitable as a wake-up word.

The present disclosure provides a storage medium storing a plurality of instructions, the instructions being loaded by a processor to execute the steps of the above wake-up word assessment method.

The present disclosure provides an electronic device, the electronic device including:

the above storage medium; and

a processor, configured to execute the instructions in the storage medium.

In the disclosed scheme, wake-up word assessment is performed on the basis of the assessment feature of the word to be assessed. Specifically, the assessment feature objectively reflects the distinctiveness of the word to be assessed at the acoustic level and/or the semantic level. Compared with the prior art, in which wake-up word assessment relies on subjectively set rules, the disclosed scheme helps improve the accuracy of the assessment result and, in turn, the wake-up performance of the wake-up word set by the user.

Other features and advantages of the present disclosure are described in detail in the detailed description that follows.

Brief description of the drawings

The accompanying drawings provide a further understanding of the present disclosure and constitute a part of the specification. Together with the detailed description below, they serve to explain the present disclosure but do not limit it. In the drawings:

Fig. 1 is a flow diagram of the wake-up word assessment method of the disclosed scheme;

Fig. 2 is a schematic diagram of the composition of the wake-up word assessment device of the disclosed scheme;

Fig. 3 is a structural diagram of the electronic device used for wake-up word assessment in the disclosed scheme.

Detailed description

Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to describe and explain the present disclosure and do not limit it.

Fig. 1 shows a flow diagram of the wake-up word assessment method of the present disclosure. The method may include the following steps:

S101: obtain a word to be assessed that is input by a user.

In the disclosed scheme, the user may set, according to their own needs, a word intended to be used as a wake-up word; this is the word to be assessed. The disclosed scheme places no specific limit on the composition of the word to be assessed: it may use a single language or mix several languages. For example, the word to be assessed may be "你好讯飞" ("hello Xunfei"), "hello iflytek", or the mixed-language "hello 讯飞", as set by the user according to need.

As an example, the user may input the word to be assessed by voice, in which case it can be obtained through a microphone. Alternatively, the user may input the word to be assessed as text, in which case it can be obtained through an input device such as a keyboard. The disclosed scheme places no limit on the specific way in which the word to be assessed is obtained.

In practice, the assessment process of the disclosed scheme may be carried out by the smart device that has the voice wake-up function, which then sets the word to be assessed as its own wake-up word according to the assessment result. Alternatively, the assessment process may be carried out by a separate dedicated device, which then configures the word to be assessed onto the corresponding smart device according to the assessment result, for use in waking that smart device. The disclosed scheme places no specific limit on the entity that executes the assessment process.

S102: extract an assessment feature of the word to be assessed, the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level and/or the semantic level.

After the word to be assessed input by the user has been obtained, an assessment feature indicating its distinctiveness at the acoustic level and/or the semantic level can be extracted for processing by the wake-up word assessment model.

As an example, the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level may include at least one of the following: the distribution feature of speech units, the recognition probability of the word to be assessed, the duration of the word to be assessed, and the tone feature of the word to be assessed.

As an example, the assessment feature indicating the distinctiveness of the word to be assessed at the semantic level may include at least one of the following: the language model score, the part-of-speech feature of the word to be assessed, and the smoothness feature of the word to be assessed.

The meaning of each feature and its specific extraction process are introduced later and are not detailed here.

S103: take the assessment feature of the word to be assessed as input and, after processing by a pre-built wake-up word assessment model, determine whether the word to be assessed is suitable as a wake-up word.

After the assessment feature has been extracted from the word to be assessed, it can be processed by the pre-built wake-up word assessment model to determine whether the word to be assessed is suitable as a wake-up word.

As an example, the output of the wake-up word assessment model may consist of two output nodes, representing "the word to be assessed is suitable as a wake-up word" and "the word to be assessed is not suitable as a wake-up word" respectively. Alternatively, the output may consist of a single output node giving an assessment score for the word to be assessed: if the score is below a preset value, the word to be assessed is judged not suitable as a wake-up word; otherwise, it is judged suitable. The disclosed scheme places no specific limit on the output form of the wake-up word assessment model.
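The two output forms described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `is_suitable`, the node order (index 0 meaning "suitable"), and the default preset value of 0.5 are all assumptions made for illustration.

```python
from typing import Sequence


def is_suitable(outputs: Sequence[float], preset_value: float = 0.5) -> bool:
    """Turn the assessment model's raw output into a suitable / not-suitable decision."""
    if len(outputs) == 2:
        # Two output nodes. Assumed order: index 0 = "suitable", index 1 = "not suitable".
        return outputs[0] >= outputs[1]
    if len(outputs) == 1:
        # One output node holding an assessment score: a score below the
        # preset value means "not suitable", anything else means "suitable".
        return outputs[0] >= preset_value
    raise ValueError("expected one or two output values")
```

For instance, `is_suitable([0.8, 0.2])` treats the word as suitable, while `is_suitable([0.3])` rejects it because the score is below the assumed preset value.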

In summary, after obtaining the word to be assessed, the disclosed scheme performs wake-up word assessment according to the distinctiveness of the word at the acoustic level and/or the semantic level. In general, the more distinctive the word to be assessed, the better its wake-up performance when used as a wake-up word. Compared with the prior art, in which wake-up word assessment relies on subjectively set rules, the disclosed scheme is more objective and helps improve the accuracy of the assessment result and, in turn, the wake-up performance of the wake-up word set by the user.

As an example, if the wake-up word assessment model determines that the word currently input by the user is not suitable as a wake-up word, the disclosed scheme also provides the following preferred schemes to improve the success rate with which the user sets a wake-up word.

Preferred scheme one: extract a problem feature of the word to be assessed and, according to the problem feature, determine the problem type of the word to be assessed, that is, analyze the reason why the word to be assessed is not suitable as a wake-up word.

As an example, the problem feature of the word to be assessed may be the language model score. In this case, the word to be assessed is taken as input and, after processing by a pre-built language model, its score is output. The score indicates the frequency with which the word to be assessed occurs; in general, a higher score means a higher frequency. The score of the word to be assessed is then compared with a preset score. When the score exceeds the preset score, the word to be assessed occurs frequently and is likely to come up in everyday conversation, which increases the probability that the smart device is woken by mistake. The problem type of the word to be assessed can therefore be judged to be "high-frequency vocabulary"; that is, the word to be assessed is not suitable as a wake-up word because it is a high-frequency word.

For example, the language model may calculate the score of the word to be assessed as follows.

Word segmentation is performed on the word to be assessed to obtain the word sequence {w₁, w₂, ..., w_k, ..., w_f}, where w_k denotes the k-th word of the word to be assessed. The probability P(w₁, w₂, ..., w_f) that the f words occur in the order of the word sequence is then calculated and taken as the frequency with which the word to be assessed occurs, i.e. the score of the word to be assessed.

In the disclosed scheme, the score of the word to be assessed is preferably represented by the probability P(w₁, w₂, ..., w_f) taken in the direction from w₁ to w_f, which can be expressed by the following formula:

$$P(w_1, w_2, \dots, w_f) = P(w_1) \prod_{k=2}^{f} P(w_k \mid w_{k-1})$$

where P(w_k | w_{k-1}) can be obtained from statistics over a general corpus.
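As a rough illustration of a score built from P(w_k | w_{k-1}) terms estimated on a corpus, the sketch below counts unigrams and bigrams and multiplies the resulting conditional probabilities. Several details are assumptions rather than part of the disclosure: the helper names (`count_ngrams`, `lm_score`, `is_high_frequency`), the use of add-one smoothing so that unseen pairs do not yield zero, and the toy English corpus that stands in for a general corpus. The corpus is assumed to be non-empty and already segmented into words.

```python
from collections import Counter
from typing import Sequence, Tuple


def count_ngrams(corpus: Sequence[Sequence[str]]) -> Tuple[Counter, Counter]:
    """Collect unigram and bigram counts from a pre-segmented corpus."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams


def lm_score(words: Sequence[str], unigrams: Counter, bigrams: Counter) -> float:
    """P(w1) times the product of P(wk | wk-1), with add-one smoothing (an assumption)."""
    if not words:
        raise ValueError("the word sequence must not be empty")
    total = sum(unigrams.values())
    vocab = len(unigrams)
    score = (unigrams[words[0]] + 1) / (total + vocab)
    for prev, cur in zip(words, words[1:]):
        score *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
    return score


def is_high_frequency(score: float, preset_score: float) -> bool:
    """Problem type 'high-frequency vocabulary': the score exceeds the preset score."""
    return score > preset_score


# Toy corpus standing in for a general corpus.
corpus = [["turn", "on", "the", "light"], ["turn", "off", "the", "light"]]
unigrams, bigrams = count_ngrams(corpus)
common = lm_score(["turn", "on"], unigrams, bigrams)
rare = lm_score(["on", "turn"], unigrams, bigrams)
```

With this toy corpus the sequence seen in the data scores higher than the reversed one, so a preset score placed between the two would flag only the former as high-frequency vocabulary. In practice, log-probabilities are usually preferred for long sequences to avoid numerical underflow.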

As an example, the problem feature of the word to be assessed may be the duration of the word to be assessed. In this case, the duration of each speech unit contained in the word to be assessed is obtained, and the sum of the durations of the speech units is taken as the duration of the word to be assessed. The duration of the word to be assessed is then compared with a preset duration. When the duration is less than the preset duration, the word to be assessed is very short, and in practice the smart device may find it hard to capture the word and extract from it the useful information needed to wake the device. The problem type of the word to be assessed can therefore be judged to be "duration too short"; that is, the word to be assessed is not suitable as a wake-up word because its duration is too short.

For example, the duration of the word to be assessed may be calculated as follows.

First, the duration of each speech unit is obtained by statistics. Specifically, for each speech unit, the pronunciation durations of that speech unit by multiple speakers are collected in advance, and the mean of these pronunciation durations is taken as the duration of the speech unit. Then, the speech units contained in the word to be assessed are analyzed, and the sum of their durations is taken as the duration of the word to be assessed. A speech unit may be, for example, a phoneme or a syllable; the disclosed scheme places no specific limit on this.
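The duration calculation described above can be sketched in a few lines of Python. The function names and the sample timings below are hypothetical; only the procedure (mean over speakers per speech unit, then sum over the units of the word, then comparison with a preset duration) comes from the text.

```python
from statistics import mean
from typing import Dict, Sequence


def unit_durations(samples: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Mean pronunciation duration (seconds) per speech unit across speakers."""
    return {unit: mean(values) for unit, values in samples.items()}


def word_duration(units: Sequence[str], durations: Dict[str, float]) -> float:
    """Duration of a word: the sum of the durations of its speech units."""
    return sum(durations[unit] for unit in units)


def is_too_short(duration: float, preset_duration: float) -> bool:
    """Problem type 'duration too short': the duration is below the preset duration."""
    return duration < preset_duration


# Hypothetical per-speaker measurements for two syllables.
samples = {"ding": [0.20, 0.30], "dong": [0.30, 0.30]}
durations = unit_durations(samples)
total = word_duration(["ding", "dong"], durations)  # about 0.55 s
```

Here `is_too_short(total, 1.0)` would flag the two-syllable word as too short under an assumed preset duration of one second.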

As an example, the problem feature of the word to be assessed may be the schwa feature of the word to be assessed. In this case, the number of schwa phonemes contained in the word to be assessed is counted and compared with a preset number. When the number of schwa phonemes exceeds the preset number, the word to be assessed contains many poorly distinguishable schwa phonemes, which may reduce the wake-up success rate. The problem type of the word to be assessed can therefore be judged to be "too many schwas"; that is, the word to be assessed is not suitable as a wake-up word because it contains too many schwas. For example, in the word to be assessed "菩提菩提" (pú tí pú tí, "bodhi bodhi"), the character "菩" (pú) contains the schwa phoneme p and the character "提" (tí) contains the schwa phoneme t.

It should be understood that the preset number in the disclosed scheme may be a fixed value set in advance, or it may be a variable value calculated from the total number of phonemes contained in the word to be assessed and a preset fixed ratio. The disclosed scheme places no specific limit on this.
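The two ways of choosing the preset number (a fixed value, or a value derived from the total phoneme count and a fixed ratio) can be sketched as follows. The function names, the parameter names, and the phoneme set passed in as `schwa_set` are hypothetical; which phonemes count as schwa phonemes is left to the caller.

```python
from typing import Optional, Sequence, Set


def schwa_limit(total_phonemes: int,
                fixed_limit: Optional[int] = None,
                ratio: Optional[float] = None) -> float:
    """Preset number: either a fixed value or total phoneme count times a fixed ratio."""
    if fixed_limit is not None:
        return fixed_limit
    if ratio is not None:
        return total_phonemes * ratio
    raise ValueError("provide either fixed_limit or ratio")


def has_too_many_schwas(phonemes: Sequence[str],
                        schwa_set: Set[str],
                        fixed_limit: Optional[int] = None,
                        ratio: Optional[float] = None) -> bool:
    """Problem type 'too many schwas': the schwa count exceeds the preset number."""
    count = sum(1 for phoneme in phonemes if phoneme in schwa_set)
    return count > schwa_limit(len(phonemes), fixed_limit, ratio)


# Phonemes of "pu ti pu ti", with p and t treated as schwa phonemes as in the example.
phonemes = ["p", "u", "t", "i", "p", "u", "t", "i"]
flagged = has_too_many_schwas(phonemes, {"p", "t"}, fixed_limit=3)
```

With a fixed limit of 3 the four matching phonemes exceed the preset number; with a ratio of 0.5 the limit becomes 4 and the same word is no longer flagged, which shows how the choice of preset number changes the outcome.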

In practice, the word to be assessed may be unsuitable as a wake-up word for a single reason, or for several reasons at once. The disclosed scheme places no specific limit on this.

Preferred scheme two: based on the word to be assessed input by the user, recommend a wake-up word to the user while keeping the meaning the same or similar as far as possible.

Specifically, a replacement word corresponding to the word to be assessed can be obtained according to a pre-built knowledge graph of semantically similar words. Whether the replacement word is suitable as a wake-up word is then judged following the scheme shown in Fig. 1: the assessment feature of the replacement word is extracted, the assessment feature indicating the distinctiveness of the replacement word at the acoustic level and/or the semantic level; the assessment feature of the replacement word is taken as input and, after processing by the wake-up word assessment model, it is determined whether the replacement word is suitable as a wake-up word; if the replacement word is suitable as a wake-up word, it can be recommended to the user.
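The recommendation flow above can be outlined with a short Python sketch. Everything concrete in it is a stand-in: the knowledge graph is reduced to a plain dictionary, and `extract_features` and `is_suitable` are caller-supplied callables representing the feature extraction step and the wake-up word assessment model, neither of which is specified here.

```python
from typing import Callable, Dict, List


def recommend_replacements(
    word: str,
    similar_words: Dict[str, List[str]],
    extract_features: Callable[[str], List[float]],
    is_suitable: Callable[[List[float]], bool],
) -> List[str]:
    """Return the semantically similar words that the assessment step accepts."""
    candidates = similar_words.get(word, [])
    # Each candidate goes through the same extract-then-assess path as the original word.
    return [c for c in candidates if is_suitable(extract_features(c))]


# Toy stand-ins: word length as the only "feature", a length threshold as the "model".
graph = {"robot": ["Little Man robot", "bot"]}
picked = recommend_replacements(
    "robot",
    graph,
    extract_features=lambda w: [float(len(w))],
    is_suitable=lambda features: features[0] >= 6,
)
```

Passing the extractor and the model as callables keeps the sketch independent of any particular feature set or model, so the same loop serves both the original word and its replacements.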

As an example, the problem type of the word to be assessed can also be taken into account when determining a replacement word. For example, if the problem type of the word to be assessed "robot" is "high-frequency vocabulary", it can be recommended to change it to "Little Man robot" as the replacement word, to lower the language model score. If the problem type of the word to be assessed "power on" is "duration too short", it can be recommended to change it to "please power on" as the replacement word, to lengthen the pronunciation. If the problem type of the word to be assessed "bodhi bodhi" is "too many schwas", it can be recommended to change it to "hello bodhi" as the replacement word, to reduce the number of schwas.

In summary, the user can learn why the word to be assessed is not suitable as a wake-up word and modify it accordingly. In addition, to improve the success rate of the user's modification, wake-up words can be recommended for the user to choose from and confirm. This raises the success rate with which the user sets a wake-up word and also improves the user experience.

The assessment features in the scheme of the present disclosure are explained below.

1. Assessment features indicating the distinctiveness of the word to be assessed at the acoustic level

(1) Distribution feature of voice units

As an example, the voice units included in the word to be assessed may be analyzed and the distribution feature of the voice units counted. For example, the distribution feature of the voice units may be embodied as at least one of the following: the total number of voice units, the number of distinct voice units, the number of times each distinct voice unit occurs, the number of specified voice units, and the number of times each specified voice unit occurs. A voice unit may be embodied as a phoneme, a syllable, or the like, which is not specifically limited in the present disclosure.

In general, if the word to be assessed includes too few voice units, for example only one or two, everyday conversation may contain many pronunciations similar to the word to be assessed, so that its pronunciation distinctiveness is low and the smart device is more likely to be falsely triggered. Moreover, if the word to be assessed includes many voice units but all of them are identical, for example "uh uh uh", the pronunciation distinctiveness of such a monotonous word is likewise very low and false triggering easily occurs. In view of this, the present disclosure may extract the total number of voice units, the number of distinct voice units, and the number of times each distinct voice unit occurs as assessment features of the word to be assessed.

Taking the syllable as the voice unit, the word to be assessed "ding-dong ding-dong" can be divided into four voice units: "ding", "dong", "ding", "dong". In this example, the total number of voice units is 4; the number of distinct voice units is 2, namely "ding" and "dong"; the voice unit "ding" occurs 2 times and the voice unit "dong" occurs 2 times.
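As an illustration only, the counting in the "ding-dong ding-dong" example may be sketched as follows. The function name `unit_distribution` and the dictionary layout are hypothetical, and the word is assumed to have already been split into syllables.

```python
from collections import Counter
from typing import Dict, List


def unit_distribution(units: List[str]) -> Dict[str, object]:
    """Count the total, the distinct units, and the occurrences of each unit."""
    counts = Counter(units)
    return {
        "total": len(units),        # total number of voice units
        "distinct": len(counts),    # number of distinct voice units
        "per_unit": dict(counts),   # occurrences of each distinct voice unit
    }


# Syllables of the example word "ding-dong ding-dong".
stats = unit_distribution(["ding", "dong", "ding", "dong"])
```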

Taking the phoneme as the voice unit, and following the bag-of-words idea, given that Chinese or English has 80 phonemes in total, the distribution feature of the voice units may be set as an 80-dimensional vector in which each dimension represents one phoneme and the value of each dimension is the number of times that phoneme occurs in the word to be assessed.
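As an illustration only, the bag-of-words style vector may be sketched as follows. The five-phoneme inventory is a hypothetical stand-in for the 80-phoneme inventory mentioned above, and the function name `phoneme_count_vector` is likewise an assumption.

```python
from typing import List, Sequence

# Hypothetical five-phoneme inventory; the text assumes 80 phonemes.
INVENTORY = ("d", "i", "ng", "o", "a")


def phoneme_count_vector(
    phonemes: Sequence[str], inventory: Sequence[str] = INVENTORY
) -> List[int]:
    """Return one count per inventory phoneme, in inventory order."""
    index = {p: i for i, p in enumerate(inventory)}
    vector = [0] * len(inventory)
    for p in phonemes:
        if p in index:  # phonemes outside the inventory are ignored
            vector[index[p]] += 1
    return vector


# Phonemes of "ding dong" under the toy inventory.
vec = phoneme_count_vector(["d", "i", "ng", "d", "o", "ng"])
```

A fixed inventory order gives every word a vector of the same length, which is what a downstream model of fixed input size would need.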

It should also be noted that, to improve the acoustic distinctiveness of the word to be assessed, the present disclosure may determine some specified voice units in advance; the more specified voice units the word to be assessed includes, the better its acoustic distinctiveness and the more suitable it is as a wake-up word. In view of this, the present disclosure may also extract the number of specified voice units and the number of times each specified voice unit occurs as assessment features of the word to be assessed.

For example, voice units that have a large mouth opening, are loud, are pronounced clearly, and are easily captured may be determined as specified voice units, for example the Chinese compound finals ua, iao, ian, iong, and the English vowels ai, ao. These may be set according to practical application requirements and are not limited in the present disclosure.

(2) Recognition probability of the word to be assessed

In the present disclosure, the recognition probability of the word to be assessed may be embodied as the accuracy of the word to be assessed and/or the false alarm rate of the word to be assessed. In general, the higher the accuracy and the lower the false alarm rate, the better the acoustic distinctiveness and the more suitable the word is as a wake-up word.

As an example, the recognition probability of the word to be assessed may be obtained by offline testing. Taking the accuracy as an example, N positive samples of the word to be assessed may be collected in each of several environments, the number M of correctly recognized samples counted, and the accuracy in each environment calculated as M/N; the mean of the accuracies over the environments is then determined as the accuracy of the word to be assessed. Taking the false alarm rate as an example, the number of times the word to be assessed, used as the wake-up word, causes a false wake-up within a predetermined period may be monitored in each environment, for example 2 false wake-ups within 24 hours in a certain environment; the mean of the false alarm rates over the different environments is then determined as the false alarm rate of the word to be assessed.
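As an illustration only, the offline statistics described above may be sketched as follows. The function names and the sample figures are hypothetical; the false alarm rate is assumed to be expressed as the number of false wake-ups within the same predetermined period in every environment.

```python
from statistics import mean
from typing import List, Tuple


def mean_accuracy(per_env: List[Tuple[int, int]]) -> float:
    """per_env holds one (M, N) pair per environment:
    M correctly recognized samples out of N positive samples."""
    return mean(m / n for m, n in per_env)


def mean_false_alarms(per_env: List[int]) -> float:
    """per_env holds the false wake-up count per environment
    over the same predetermined period."""
    return mean(per_env)


# Two toy environments: 90/100 and 80/100 correct; 2 and 4 false wake-ups.
accuracy = mean_accuracy([(90, 100), (80, 100)])
false_alarm_rate = mean_false_alarms([2, 4])
```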

As another example, the recognition probability of the word to be assessed may be obtained from the voice units it includes. Specifically, the recognition probability of each voice unit included in the word to be assessed may be obtained, and the mean of the recognition probabilities of the voice units taken as the recognition probability of the word to be assessed. The recognition rate and false alarm rate of a voice unit may be obtained by offline statistics as described above and are not detailed again here.

(3) Duration of the word to be assessed

In general, the longer the duration of the word to be assessed, the better its acoustic distinctiveness and the more suitable it is as a wake-up word. For the process of obtaining the duration of the word to be assessed, see the description in the problem type analysis above; it is not detailed again here.

(4) Tone feature of the word to be assessed

As an example, the tone of each character included in the word to be assessed may be obtained and the tone difference between adjacent characters calculated; for example, if the tones of two adjacent characters are the same, the tone difference is 0, otherwise it is 1. A mathematical operation is then performed on the tone differences between adjacent characters to obtain the tone feature of the word to be assessed; for example, the sum or the mean of the tone differences may be determined as the tone feature of the word to be assessed. The specific mathematical operation is not limited in the present disclosure.

For example, a tone classifier built in advance may be used to obtain the tone sequence {b_1, b_2, ..., b_j, ..., b_n} of the word to be assessed, where b_j denotes the tone type of the j-th character of the word to be assessed. Taking Chinese as an example, the tone type of a character may be one of the four common tones, and the identifiers "1", "2", "3", "4" may be used to denote the different tones. The tone types of characters may also be determined for other languages, which is not specifically limited in the present disclosure.

In general, the pronunciation of a word to be assessed with rising and falling tones is more distinctive; that is, the larger the tone feature value of the word to be assessed, the better its acoustic distinctiveness and the more suitable it is as a wake-up word.
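As an illustration only, the tone feature described above may be sketched as follows, using the 0/1 difference between adjacent characters and either the sum or the mean as the mathematical operation. The function names are hypothetical, and the tone sequence is assumed to have already been produced by a tone classifier.

```python
from typing import List


def tone_differences(tones: List[int]) -> List[int]:
    """0 when two adjacent characters share a tone, otherwise 1."""
    return [0 if a == b else 1 for a, b in zip(tones, tones[1:])]


def tone_feature(tones: List[int], use_mean: bool = False) -> float:
    """Sum (default) or mean of the adjacent tone differences."""
    diffs = tone_differences(tones)
    if not diffs:  # a single character has no adjacent pair
        return 0.0
    total = float(sum(diffs))
    return total / len(diffs) if use_mean else total


# Tone sequence {1, 1, 3, 4}: differences are 0, 1, 1.
feature_sum = tone_feature([1, 1, 3, 4])
feature_mean = tone_feature([1, 1, 3, 4], use_mean=True)
```

The mean variant is independent of word length, whereas the sum grows with the number of characters; which one is preferable is left open by the text above.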

2. Assessment features indicating the distinctiveness of the word to be assessed at the semantic level

(1) Language model score

In general, the higher the language model score, the higher the probability of false triggering and the less suitable the word is as a wake-up word. For the process of obtaining the score of the word to be assessed, see the description in the problem type analysis above; it is not detailed again here.

(2) Part-of-speech feature of the word to be assessed

As an example, the part of speech of each word included in the word to be assessed may be obtained, and the number of distinct parts of speech and the number of times each distinct part of speech occurs counted as the part-of-speech feature of the word to be assessed. In general, the richer the parts of speech included in the word to be assessed, the better its semantic distinctiveness and the more suitable it is as a wake-up word.

For example, word segmentation may be performed on the word to be assessed to obtain the part-of-speech sequence {q_1, q_2, ..., q_k, ..., q_f}, where q_k denotes the part of speech of the k-th word of the word to be assessed. As an example, for the following 11 parts of speech: noun, verb, adjective, numeral-classifier compound, pronoun, adverb, preposition, conjunction, auxiliary verb, interjection, and onomatopoeia, the part-of-speech feature of the word to be assessed may be set as an 11-dimensional vector in which each dimension represents one part of speech and the value of each dimension is the number of times that part of speech occurs in the word to be assessed.
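As an illustration only, the 11-dimensional part-of-speech vector may be sketched as follows. The tag spellings, their order, and the function name `pos_feature` are hypothetical, and the part-of-speech sequence is assumed to have already been produced by word segmentation and tagging.

```python
from typing import List

# The 11 parts of speech listed above; the tag spellings are illustrative.
POS_TAGS = (
    "noun", "verb", "adjective", "numeral-classifier", "pronoun", "adverb",
    "preposition", "conjunction", "auxiliary", "interjection", "onomatopoeia",
)


def pos_feature(pos_sequence: List[str]) -> List[int]:
    """Count how often each part of speech occurs, in POS_TAGS order."""
    vector = [0] * len(POS_TAGS)
    for tag in pos_sequence:
        # list.index raises ValueError for a tag outside POS_TAGS.
        vector[POS_TAGS.index(tag)] += 1
    return vector


# A toy sequence: one adjective followed by two nouns.
pos_vec = pos_feature(["adjective", "noun", "noun"])
distinct_pos = sum(1 for count in pos_vec if count > 0)
```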

(3) Smoothness feature of the word to be assessed

As an example, the words included in the word to be assessed may be used to calculate the forward semantic smoothness and the reverse semantic smoothness of the word to be assessed; a mathematical operation is then performed on the forward semantic smoothness and the reverse semantic smoothness to obtain the smoothness feature of the word to be assessed.

For the calculation of semantic smoothness, see the description in the problem type analysis above; it is not detailed again here. The forward semantic smoothness may be embodied as the probability P(w_1, w_2, ..., w_f) of the word to be assessed in the direction from w_1 to w_f, and the reverse semantic smoothness may be embodied as the probability P(w_f, w_{f-1}, ..., w_1) of the word to be assessed in the direction from w_f to w_1.

For example, the mathematical operation on the forward semantic smoothness and the reverse semantic smoothness may be embodied as the absolute value of the difference between the two. In general, the larger the smoothness feature value obtained in this way, the more reasonable the forward order, the easier the word to be assessed is to say, and the more suitable it is as a wake-up word.

For example, the mathematical operation on the forward semantic smoothness and the reverse semantic smoothness may also be embodied as the quotient of the forward semantic smoothness and the reverse semantic smoothness. In general, the larger the smoothness feature value obtained in this way, the more reasonable the forward order, the easier the word to be assessed is to say, and the more suitable it is as a wake-up word.
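As an illustration only, the two operations described above may be sketched as follows. The bigram chain used here to obtain the forward and reverse probabilities is an assumption made for the demo, since the calculation of semantic smoothness is described elsewhere in the disclosure; the function names and the toy probabilities are likewise hypothetical.

```python
from typing import Dict, List, Tuple

Bigram = Dict[Tuple[str, str], float]


def sequence_probability(
    words: List[str], start: Dict[str, float], bigram: Bigram
) -> float:
    """P(w_1) multiplied by each P(w_i | w_{i-1}); unseen events get 0."""
    if not words:
        return 0.0
    prob = start.get(words[0], 0.0)
    for prev, cur in zip(words, words[1:]):
        prob *= bigram.get((prev, cur), 0.0)
    return prob


def smoothness_features(forward: float, reverse: float) -> Tuple[float, float]:
    """Return (absolute difference, quotient) of the two smoothness values."""
    difference = abs(forward - reverse)
    quotient = forward / reverse if reverse > 0 else float("inf")
    return difference, quotient


# Toy probabilities, invented for the demo.
START = {"hello": 0.5, "robot": 0.5}
BIGRAM = {("hello", "robot"): 0.4, ("robot", "hello"): 0.1}

words = ["hello", "robot"]
forward = sequence_probability(words, START, BIGRAM)
reverse = sequence_probability(words[::-1], START, BIGRAM)
difference, quotient = smoothness_features(forward, reverse)
```

The quotient is returned as infinity when the reverse probability is zero; how such a case should be handled in practice is not specified by the text above.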

As an example, a large number of sample wake-up words may be collected and used to train the wake-up word assessment model of the present disclosure. The sample wake-up words may include positive sample wake-up words and negative sample wake-up words; the positive sample wake-up words may be labeled in advance as suitable as wake-up words, and the negative sample wake-up words labeled in advance as not suitable as wake-up words.

During model training, the topology of the wake-up word assessment model may first be determined; for example, it may be a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), a DNN (Deep Neural Network), or the like, which is not specifically limited in the present disclosure. After the assessment features are extracted from the sample wake-up words, the wake-up word assessment model may be trained using the selected topology and the assessment features of the sample wake-up words, until the assessment results output by the model are consistent with the labeled assessment results of the sample wake-up words.
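As an illustration only, the training loop and its stopping condition may be sketched as follows. A single-layer logistic model is used here purely as a lightweight stand-in for the CNN, RNN, or DNN topologies named above; the two-dimensional feature vectors, the labels, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def predict(weights: List[float], bias: float, x: List[float]) -> int:
    """Return 1 (suitable) or 0 (not suitable) for one feature vector."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if z >= 0 else 0


def train(
    samples: List[List[float]],
    labels: List[int],
    lr: float = 0.5,
    max_epochs: int = 10000,
) -> Tuple[List[float], float]:
    """Batch gradient descent on the logistic loss; stops once every
    output matches its label, or after max_epochs."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    n = len(samples)
    for _ in range(max_epochs):
        if all(predict(weights, bias, x) == y for x, y in zip(samples, labels)):
            break
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            z = sum(w * v for w, v in zip(weights, x)) + bias
            err = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid output minus label
            grad_w = [g + err * v for g, v in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy assessment features for positive (1) and negative (0) sample words.
X = [[1.0, 1.0], [0.9, 0.8], [0.1, 0.0], [0.2, 0.1]]
y = [1, 1, 0, 0]
weights, bias = train(X, y)
```

The early exit mirrors the condition stated above, namely training until the model output is consistent with the labels; a practical system would more likely stop on a held-out validation criterion.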

Referring to Fig. 2, a schematic diagram of the composition of the wake-up word assessment device of the present disclosure is shown. The device may include:

a to-be-assessed word acquisition module 201, configured to obtain the word to be assessed input by the user;

an assessment feature extraction module 202, configured to extract the assessment feature of the word to be assessed, the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level and/or the semantic level;

a wake-up word determining module 203, configured to take the assessment feature of the word to be assessed as input and, after processing by the wake-up word assessment model built in advance, determine whether the word to be assessed is suitable as a wake-up word.

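Optionally, the device further includes:

a problem feature extraction module, configured to extract a problem feature of the word to be assessed when it is determined that the word to be assessed is not suitable as a wake-up word;

a problem type determining module, configured to determine, according to the problem feature, the problem type of the word to be assessed, the problem type indicating the reason why the word to be assessed is not suitable as a wake-up word.

And/or

Optionally, the device further includes:

a replaceable word obtaining module, configured to obtain, when it is determined that the word to be assessed is not suitable as a wake-up word, the replaceable word corresponding to the word to be assessed according to a semantic similar-word knowledge graph built in advance;

a replaceable word recommending module, configured to recommend the replaceable word to the user when the replaceable word is suitable as a wake-up word.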

With regard to the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.

Referring to Fig. 3, a schematic structural diagram of an electronic device 300 of the present disclosure for wake-up word assessment is shown. Referring to Fig. 3, the electronic device 300 includes a processing component 301, which further includes one or more processors, and storage resources represented by a storage medium 302 for storing instructions executable by the processing component 301, such as application programs. The application programs stored in the storage medium 302 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 301 is configured to execute the instructions so as to perform the above wake-up word assessment method.

The electronic device 300 may also include a power supply component 303 configured to perform power management of the electronic device 300, a wired or wireless network interface 304 configured to connect the electronic device 300 to a network, and an input/output (I/O) interface 305. The electronic device 300 may operate based on an operating system stored in the storage medium 302, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present disclosure, various simple variations may be made to the technical solution of the present disclosure, and these simple variations all fall within the protection scope of the present disclosure.

It should further be noted that the specific technical features described in the above embodiments may be combined in any suitable manner provided there is no contradiction. To avoid unnecessary repetition, the present disclosure does not separately describe the various possible combinations.

In addition, the different embodiments of the present disclosure may also be combined arbitrarily; as long as such a combination does not depart from the idea of the present disclosure, it should likewise be regarded as content disclosed by the present disclosure.

Claims

1. A wake-up word assessment method, characterized in that the method comprises:

obtaining a word to be assessed input by a user;

extracting an assessment feature of the word to be assessed, the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level and/or the semantic level;

taking the assessment feature of the word to be assessed as input and, after processing by a wake-up word assessment model built in advance, determining whether the word to be assessed is suitable as a wake-up word.

2. The method according to claim 1, characterized in that:

the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level comprises a distribution feature of voice units, and extracting the assessment feature of the word to be assessed comprises: analyzing the voice units included in the word to be assessed, and counting at least one of the total number of voice units, the number of distinct voice units, the number of times each distinct voice unit occurs, the number of specified voice units, and the number of times each specified voice unit occurs, as the distribution feature of the voice units;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level comprises a recognition probability of the word to be assessed, and extracting the assessment feature of the word to be assessed comprises: obtaining the recognition probability of each voice unit included in the word to be assessed; and taking the mean of the recognition probabilities of the voice units as the recognition probability of the word to be assessed, the recognition probability comprising accuracy and/or false alarm rate;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level comprises a duration of the word to be assessed, and extracting the assessment feature of the word to be assessed comprises: obtaining the duration of each voice unit included in the word to be assessed; and taking the sum of the durations of the voice units as the duration of the word to be assessed;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level comprises a tone feature of the word to be assessed, and extracting the assessment feature of the word to be assessed comprises: obtaining the tone of each character included in the word to be assessed, and calculating the tone difference between adjacent characters; and performing a mathematical operation on the tone differences between the adjacent characters to obtain the tone feature of the word to be assessed;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the semantic level comprises a language model score, and extracting the assessment feature of the word to be assessed comprises: taking the word to be assessed as input and, after processing by a language model built in advance, outputting the score of the word to be assessed, the score indicating the frequency with which the word to be assessed occurs;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the semantic level comprises a part-of-speech feature of the word to be assessed, and extracting the assessment feature of the word to be assessed comprises: obtaining the part of speech of each word included in the word to be assessed; and counting the number of distinct parts of speech and the number of times each distinct part of speech occurs, as the part-of-speech feature of the word to be assessed;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the semantic level comprises a smoothness feature of the word to be assessed, and extracting the assessment feature of the word to be assessed comprises: using the words included in the word to be assessed to calculate the forward semantic smoothness and the reverse semantic smoothness of the word to be assessed; and performing a mathematical operation on the forward semantic smoothness and the reverse semantic smoothness to obtain the smoothness feature of the word to be assessed.

3. The method according to claim 1 or 2, characterized in that, when it is determined that the word to be assessed is not suitable as a wake-up word, the method further comprises:

extracting a problem feature of the word to be assessed;

determining, according to the problem feature, the problem type of the word to be assessed, the problem type indicating the reason why the word to be assessed is not suitable as a wake-up word.

4. The method according to claim 3, characterized in that:

the problem feature comprises a language model score, and determining the problem type of the word to be assessed comprises: taking the word to be assessed as input and, after processing by a language model built in advance, outputting the score of the word to be assessed, the score indicating the frequency with which the word to be assessed occurs; and, when the score of the word to be assessed exceeds a preset score, determining that the problem type of the word to be assessed is high-frequency vocabulary;

And/or

the problem feature comprises a duration of the word to be assessed, and determining the problem type of the word to be assessed comprises: obtaining the duration of each voice unit included in the word to be assessed; taking the sum of the durations of the voice units as the duration of the word to be assessed; and, when the duration of the word to be assessed is less than a preset duration, determining that the problem type of the word to be assessed is that the duration is too short;

And/or

the problem feature comprises a schwa feature of the word to be assessed, and determining the problem type of the word to be assessed comprises: counting the number of schwa phonemes included in the word to be assessed; and, when the number of schwa phonemes exceeds a preset number, determining that the problem type of the word to be assessed is too many schwas.

5. The method according to claim 1 or 2, characterized in that, when it is determined that the word to be assessed is not suitable as a wake-up word, the method further comprises:

obtaining, according to a semantic similar-word knowledge graph built in advance, a replaceable word corresponding to the word to be assessed;

extracting the assessment feature of the replaceable word, the assessment feature indicating the distinctiveness of the replaceable word at the acoustic level and/or the semantic level;

taking the assessment feature of the replaceable word as input and, after processing by the wake-up word assessment model, determining whether the replaceable word is suitable as a wake-up word;

if the replaceable word is suitable as a wake-up word, recommending the replaceable word to the user.

6. A wake-up word assessment device, characterized in that the device comprises:

a to-be-assessed word acquisition module, configured to obtain a word to be assessed input by a user;

an assessment feature extraction module, configured to extract an assessment feature of the word to be assessed, the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level and/or the semantic level;

a wake-up word determining module, configured to take the assessment feature of the word to be assessed as input and, after processing by a wake-up word assessment model built in advance, determine whether the word to be assessed is suitable as a wake-up word.

7. The device according to claim 6, characterized in that:

the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level comprises a distribution feature of voice units, and the assessment feature extraction module is configured to analyze the voice units included in the word to be assessed, and count at least one of the total number of voice units, the number of distinct voice units, the number of times each distinct voice unit occurs, the number of specified voice units, and the number of times each specified voice unit occurs, as the distribution feature of the voice units;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level comprises a recognition probability of the word to be assessed, and the assessment feature extraction module is configured to obtain the recognition probability of each voice unit included in the word to be assessed, and take the mean of the recognition probabilities of the voice units as the recognition probability of the word to be assessed, the recognition probability comprising accuracy and/or false alarm rate;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level comprises a duration of the word to be assessed, and the assessment feature extraction module is configured to obtain the duration of each voice unit included in the word to be assessed, and take the sum of the durations of the voice units as the duration of the word to be assessed;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the acoustic level comprises a tone feature of the word to be assessed, and the assessment feature extraction module is configured to obtain the tone of each character included in the word to be assessed, calculate the tone difference between adjacent characters, and perform a mathematical operation on the tone differences between the adjacent characters to obtain the tone feature of the word to be assessed;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the semantic level comprises a language model score, and the assessment feature extraction module is configured to take the word to be assessed as input and, after processing by a language model built in advance, output the score of the word to be assessed, the score indicating the frequency with which the word to be assessed occurs;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the semantic level comprises a part-of-speech feature of the word to be assessed, and the assessment feature extraction module is configured to obtain the part of speech of each word included in the word to be assessed, and count the number of distinct parts of speech and the number of times each distinct part of speech occurs, as the part-of-speech feature of the word to be assessed;

And/or

the assessment feature indicating the distinctiveness of the word to be assessed at the semantic level comprises a smoothness feature of the word to be assessed, and the assessment feature extraction module is configured to use the words included in the word to be assessed to calculate the forward semantic smoothness and the reverse semantic smoothness of the word to be assessed, and perform a mathematical operation on the forward semantic smoothness and the reverse semantic smoothness to obtain the smoothness feature of the word to be assessed.

8. The device according to claim 6 or 7, characterized in that the device further comprises:

a problem feature extraction module, configured to extract a problem feature of the word to be assessed when it is determined that the word to be assessed is not suitable as a wake-up word;

a problem type determining module, configured to determine, according to the problem feature, the problem type of the word to be assessed, the problem type indicating the reason why the word to be assessed is not suitable as a wake-up word.

9. The device according to claim 8, characterized in that:

the problem feature comprises a language model score, and the problem type determining module is configured to take the word to be assessed as input and, after processing by a language model built in advance, output the score of the word to be assessed, the score indicating the frequency with which the word to be assessed occurs; and, when the score of the word to be assessed exceeds a preset score, determine that the problem type of the word to be assessed is high-frequency vocabulary;

And/or

the problem feature comprises a duration of the word to be assessed, and the problem type determining module is configured to obtain the duration of each voice unit included in the word to be assessed; take the sum of the durations of the voice units as the duration of the word to be assessed; and, when the duration of the word to be assessed is less than a preset duration, determine that the problem type of the word to be assessed is that the duration is too short;

And/or

the problem feature comprises a schwa feature of the word to be assessed, and the problem type determining module is configured to count the number of schwa phonemes included in the word to be assessed; and, when the number of schwa phonemes exceeds a preset number, determine that the problem type of the word to be assessed is too many schwas.

10. The device according to claim 6 or 7, characterized in that the device further comprises:

a replaceable word obtaining module, configured to obtain, when it is determined that the word to be assessed is not suitable as a wake-up word, the replaceable word corresponding to the word to be assessed according to a semantic similar-word knowledge graph built in advance;

the assessment feature extraction module being further configured to extract the assessment feature of the replaceable word, the assessment feature indicating the distinctiveness of the replaceable word at the acoustic level and/or the semantic level;

the wake-up word determining module being further configured to take the assessment feature of the replaceable word as input and, after processing by the wake-up word assessment model, determine whether the replaceable word is suitable as a wake-up word;

a replaceable word recommending module, configured to recommend the replaceable word to the user when the replaceable word is suitable as a wake-up word.

11. A storage medium storing a plurality of instructions, characterized in that the instructions are loaded by a processor to perform the steps of the method according to any one of claims 1 to 5.

12. An electronic device, characterized in that the electronic device comprises:

the storage medium according to claim 11; and

a processor, configured to execute the instructions in the storage medium.