CN111696555A - Method and system for confirming awakening words - Google Patents

Info

Publication number
CN111696555A
Authority
CN
China
Prior art keywords
awakening
score
calculating
word
intelligent equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010530753.9A
Other languages
Chinese (zh)
Inventor
冯大航
陈孝良
常乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SoundAI Technology Co Ltd filed Critical Beijing SoundAI Technology Co Ltd
Priority to CN202010530753.9A priority Critical patent/CN111696555A/en
Publication of CN111696555A publication Critical patent/CN111696555A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a method and a system for confirming a wake-up word. The method comprises the following steps: acquiring phoneme features of a wake-up word to be analyzed and judging whether to wake up a smart device; when the judgment result is to wake up the smart device, performing calculation on the phoneme features to obtain an intermediate result; and inputting the intermediate result into a preset confirmation model for processing to obtain a wake-up confirmation result. In this scheme, the phoneme features of the wake-up word to be analyzed are first used to determine whether the smart device can be woken up. If it can, an intermediate result is calculated from the phoneme features and input into the preset confirmation model to obtain the wake-up confirmation result, which reduces false wake-ups while maintaining the wake-up rate of the smart device.

Description

Method and system for confirming awakening words
Technical Field
The invention relates to the technical field of voice recognition, in particular to a method and a system for confirming a wakeup word.
Background
With the development of artificial intelligence, smart devices are used more and more widely. Wake-up plays an important role in smart applications: before a person can interact with a smart device, a wake-up word is generally required to wake the device up, and the interaction follows.
How easily a smart device is woken up is closely tied to the user experience. How to reduce false wake-ups while maintaining the wake-up rate is therefore a problem that needs to be solved.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and a system for confirming a wakeup word, so as to reduce false wakeup while ensuring a wakeup rate.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
A first aspect of the embodiments of the present invention discloses a method for confirming a wake-up word, the method including:
acquiring phoneme features of a wake-up word to be analyzed and judging whether to wake up a smart device;
when the judgment result is to wake up the smart device, performing calculation on the phoneme features to obtain an intermediate result;
and inputting the intermediate result into a preset confirmation model for processing to obtain a wake-up confirmation result.
Preferably, the process of judging whether to wake up the smart device includes:
calculating, by using the phoneme features, a first score of a first wake-up path and a second score of a second wake-up path respectively;
calculating a score difference between the first score and the second score;
if the score difference is smaller than a score threshold, determining to wake up the smart device;
and if the score difference is greater than or equal to the score threshold, determining not to wake up the smart device.
Preferably, performing calculation on the phoneme features to obtain an intermediate result includes:
calculating, by using the phoneme features, a first score of the first wake-up path and a second score of the second wake-up path respectively;
calculating a score difference between the first score and the second score;
and calculating the duration and the average posterior probability of each initial and each final.
Preferably, the process of obtaining the confirmation model includes:
inputting wake-up word sample data and non-wake-up word sample data into a preset neural network model, and training the neural network model until it converges, to obtain the confirmation model.
Preferably, the process of calculating the average posterior probability of each initial and each final includes:
determining the number of frames of each initial and each final;
for each initial and each final, calculating its posterior probability in each of its frames;
and for each initial and each final, averaging its per-frame posterior probabilities to obtain its average posterior probability.
Preferably, inputting the intermediate result into a preset confirmation model for processing to obtain a wake-up confirmation result includes:
inputting the intermediate result into the preset confirmation model to confirm the wake-up word, and judging whether the wake-up word to be analyzed is the wake-up word for waking up the smart device;
if so, determining that the wake-up word to be analyzed is the wake-up word for waking up the smart device;
if not, determining that the wake-up word to be analyzed is not the wake-up word for waking up the smart device.
Preferably, inputting the intermediate result into a preset confirmation model for processing to obtain a wake-up confirmation result includes: inputting the score difference, the duration of each initial and each final, and the average posterior probability of each initial and each final into the preset confirmation model for processing to obtain the wake-up confirmation result.
A second aspect of the embodiments of the present invention discloses a system for confirming a wake-up word, the system including:
a processing unit, configured to acquire phoneme features of a wake-up word to be analyzed and judge whether to wake up a smart device;
a calculating unit, configured to perform calculation on the phoneme features to obtain an intermediate result when the judgment result is to wake up the smart device;
and a wake-up confirmation unit, configured to input the intermediate result into a preset confirmation model for processing to obtain a wake-up confirmation result.
Preferably, the processing unit includes:
a first calculating module, configured to calculate, by using the phoneme features, a first score of a first wake-up path and a second score of a second wake-up path respectively;
a second calculating module, configured to calculate a score difference between the first score and the second score;
and a determining module, configured to determine to wake up the smart device if the score difference is smaller than a score threshold, and to determine not to wake up the smart device if the score difference is greater than or equal to the score threshold.
Preferably, the calculating unit includes:
a first calculating module, configured to calculate, by using the phoneme features, a first score of the first wake-up path and a second score of the second wake-up path respectively;
a second calculating module, configured to calculate a score difference between the first score and the second score;
and a third calculating module, configured to calculate the duration and the average posterior probability of each initial and each final.
The method and the system for confirming a wake-up word provided by the embodiments of the present invention work as follows: phoneme features of a wake-up word to be analyzed are acquired and it is judged whether to wake up a smart device; when the judgment result is to wake up the smart device, calculation is performed on the phoneme features to obtain an intermediate result; and the intermediate result is input into a preset confirmation model for processing to obtain a wake-up confirmation result. In this scheme, the phoneme features of the wake-up word to be analyzed are first used to determine whether the smart device can be woken up. If it can, the intermediate result calculated from the phoneme features is input into the preset confirmation model to obtain the wake-up confirmation result, which reduces false wake-ups while maintaining the wake-up rate of the smart device.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a method for confirming a wake-up word according to an embodiment of the present invention;
Fig. 2 is a flowchart of determining whether to wake up a smart device according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of phoneme features of a wake-up word to be analyzed according to an embodiment of the present invention;
Fig. 4 is a flowchart of calculating an intermediate result according to an embodiment of the present invention;
Fig. 5 is a structural block diagram of a system for confirming a wake-up word according to an embodiment of the present invention;
Fig. 6 is another structural block diagram of a system for confirming a wake-up word according to an embodiment of the present invention;
Fig. 7 is yet another structural block diagram of a system for confirming a wake-up word according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As can be seen from the background, a corresponding wake-up word is required to wake up a smart device, and how easily the device is woken up is closely tied to the user experience. How to reduce false wake-ups while maintaining the wake-up rate is a problem that urgently needs to be solved.
Therefore, the embodiments of the present invention provide a method and a system for confirming a wake-up word. If the first judgment is to wake up the smart device, calculation is performed on the phoneme features of the wake-up word to be analyzed to obtain an intermediate result, and the intermediate result is input into a confirmation model for processing to obtain a wake-up confirmation result, so as to reduce false wake-ups while maintaining the wake-up rate.
Referring to fig. 1, which shows a flowchart of a method for confirming a wake-up word according to an embodiment of the present invention, the method includes the following steps:
Step S101: acquire the phoneme features of the wake-up word to be analyzed.
It should be noted that the phoneme features include at least the initials and finals of the wake-up word to be analyzed, and that the wake-up word to be analyzed is the word the current user utters to wake up the smart device.
It can be understood that the initials and finals of the wake-up word to be analyzed are the initials and finals that make up that word; acquiring them means acquiring the initial and final of each character of the wake-up word to be analyzed.
It should be further noted that a corresponding wake-up word is set for the smart device in advance, that is, the smart device performs the wake-up operation when it receives that wake-up word.
For example, "turn on the speaker" is set as the wake-up word of a smart speaker. When the user says "turn on the speaker" to the smart speaker, the smart speaker is woken up and human-machine interaction can follow; here "turn on the speaker" is the preset wake-up word of the smart speaker.
In the specific implementation of step S101, after the wake-up word to be analyzed is received, feature extraction is performed on it to obtain its phoneme features.
For example, "turn on the speaker" is the wake-up word to be analyzed. After it is received, the initial and the final of each of its four characters (pinyin syllables "kai", "qi", "yin" and "xiang") are extracted. The initials of the four characters are "k", "q", "y" and "x", and the finals are "ai", "i", "in" and "iang". This gives the 8 phonemes of "turn on the speaker"; adding the silence phoneme "sil" gives 9 classes of phoneme features in total.
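The split into initials and finals in this example can be sketched as follows. This is only an illustration, not the patent's feature extractor: the function names and the initials table are assumptions, the syllable spellings are obtained by joining each initial listed above with its final, and "y" is treated as an initial because the example above does so.

```python
# Pinyin initials; two-letter initials come first so "zh", "ch", "sh"
# are matched before "z", "c", "s". (Illustrative table, an assumption.)
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_syllable(syllable):
    """Split one toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # syllable without an initial, such as "ai"


def phoneme_classes(syllables):
    """List the distinct initials and finals of a word, then add silence."""
    classes = []
    for syllable in syllables:
        for unit in split_syllable(syllable):
            if unit and unit not in classes:
                classes.append(unit)
    return classes + ["sil"]


wake_word = ["kai", "qi", "yin", "xiang"]
print([split_syllable(s) for s in wake_word])
print(phoneme_classes(wake_word))  # 8 initials/finals plus "sil"
```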
Step S102: judge whether to wake up the smart device. If the judgment result is to wake up the smart device, step S103 is executed.
In the specific implementation of step S102, a score of the wake-up word to be analyzed is calculated from its phoneme features. If the calculated score meets a preset wake-up condition, the judgment result is to wake up the smart device.
Step S103: perform calculation on the phoneme features to obtain an intermediate result.
It should be noted that waking up the smart device produces a corresponding intermediate result, which includes at least: the wake-up score, the posterior probability of each initial and final, the duration of each initial and final, and so on. In the specific implementation of step S103, calculation is performed on the phoneme features of the wake-up word to be analyzed to obtain the corresponding intermediate result.
It should be noted that, for the same wake-up word, the duration of each initial and final differs with the pronunciation habits of the user, and the durations of the initials and finals are correlated with one another. For example, when the user speaks quickly, every initial and final is short.
Step S104: input the intermediate result into a preset confirmation model for processing to obtain a wake-up confirmation result.
It should be noted that the confirmation model is obtained in advance by training a neural network model on sample data. The specific process is as follows: wake-up word sample data and non-wake-up word sample data are collected in advance; the intermediate result of each sample wake-up word and each sample non-wake-up word is calculated as described above; these intermediate results are input into a preset neural network model; and the neural network model is trained until it converges, giving the confirmation model.
For example, a deep neural network (DNN) model is trained on the wake-up word sample data and the non-wake-up word sample data until convergence, giving the confirmation model.
In the specific implementation of step S104, the intermediate result corresponding to the wake-up word to be analyzed is input into the confirmation model for processing to obtain the wake-up confirmation result, that is, to judge whether the wake-up word to be analyzed is the wake-up word for waking up the smart device. If so, the wake-up word to be analyzed is determined to be the wake-up word for waking up the smart device; if not, it is determined not to be. In other words, the output of the confirmation model confirms whether the wake-up word to be analyzed is the wake-up word for waking up the smart device.
It should be noted that, after the intermediate result of the wake-up word to be analyzed is input into the confirmation model, the model outputs a number between 0 and 1. If the output is greater than or equal to a wake-up word score threshold, the wake-up word to be analyzed is determined to be the wake-up word for waking up the smart device. If the output is smaller than the wake-up word score threshold, the wake-up word to be analyzed is determined not to be the wake-up word for waking up the smart device.
In the embodiment of the present invention, the phoneme features of the wake-up word to be analyzed are used to determine whether the smart device can be woken up. If so, an intermediate result is calculated from the phoneme features and input into the preset confirmation model to determine whether the wake-up word to be analyzed is the wake-up word for waking up the smart device, which reduces false wake-ups while maintaining the wake-up rate of the smart device.
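The two-stage flow of steps S101 to S104 can be sketched as below. This is a minimal sketch under stated assumptions: the three stages are passed in as callables with hypothetical names, and the default confirmation threshold of 0.5 is a placeholder, since the source does not give a value for the wake-up word score threshold.

```python
from typing import Callable, Sequence

Features = Sequence[Sequence[float]]  # per-frame phoneme features (assumed layout)


def confirm_wake_word(
    phoneme_features: Features,
    first_stage: Callable[[Features], bool],
    intermediate: Callable[[Features], Sequence[float]],
    confirmation_model: Callable[[Sequence[float]], float],
    confirm_threshold: float = 0.5,  # placeholder value, not from the source
) -> bool:
    """Return True only if both the first judgment and the confirmation pass."""
    # Step S102: first judgment; stop early when the device should not wake.
    if not first_stage(phoneme_features):
        return False
    # Step S103: compute the intermediate result from the phoneme features.
    values = intermediate(phoneme_features)
    # Step S104: the confirmation model outputs a number between 0 and 1.
    return confirmation_model(values) >= confirm_threshold
```

Because the confirmation model only runs after the first judgment has passed, it can only turn a wake-up into a rejection, which is how the second stage reduces false wake-ups.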
For the process of judging whether to wake up the smart device in step S102 of fig. 1, refer to fig. 2, which shows a flowchart of determining whether to wake up a smart device according to an embodiment of the present invention. The process includes the following steps:
Step S201: calculate, by using the phoneme features, a first score of the first wake-up path and a second score of the second wake-up path respectively.
In the specific implementation of step S201, the Viterbi algorithm is applied to the phoneme features to calculate the first score of the first wake-up path and the second score of the second wake-up path respectively.
To better explain step S201, the schematic diagram of the phoneme features of the wake-up word to be analyzed shown in fig. 3 is used as an illustration. It should be noted that fig. 3 is for illustration only.
As shown in fig. 3, "turn on the speaker" is the wake-up word to be analyzed. Its phoneme features fall into 8 classes, and adding the silence phoneme "sil" gives 9 classes in total. The Viterbi algorithm is used to calculate the first score of the first wake-up path and the second score of the second wake-up path.
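As an illustration of how a Viterbi search can score one path, the sketch below aligns frames to a fixed left-to-right phoneme sequence and returns the best path score. The source does not define the two wake-up paths or the form of the phoneme features, so everything here is an assumption: frames are given as per-phoneme log posteriors, each frame either stays in the current phoneme or advances to the next one, and the function name is hypothetical.

```python
import math


def viterbi_align(log_post, sequence):
    """Best left-to-right alignment of frames to a phoneme sequence.

    log_post: one dict per frame mapping phoneme -> log posterior (assumed input).
    sequence: phonemes in order; the path starts in the first and ends in the last.
    Returns (path score, phoneme label per frame).
    """
    n_frames, n_states = len(log_post), len(sequence)
    if n_frames < n_states:
        raise ValueError("need at least one frame per phoneme")
    neg_inf = -math.inf
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = log_post[0][sequence[0]]
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s]
            advance = score[t - 1][s - 1] if s > 0 else neg_inf
            if advance > stay:
                best, back[t][s] = advance, s - 1
            else:
                best, back[t][s] = stay, s
            if best > neg_inf:  # skip states that cannot be reached yet
                score[t][s] = best + log_post[t][sequence[s]]
    # Trace the best path back from the last phoneme in the last frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return score[-1][-1], [sequence[s] for s in path]


frames = [{"k": 0.9, "ai": 0.1}, {"k": 0.8, "ai": 0.2},
          {"k": 0.3, "ai": 0.7}, {"k": 0.1, "ai": 0.9}]
log_frames = [{p: math.log(v) for p, v in f.items()} for f in frames]
best_score, alignment = viterbi_align(log_frames, ["k", "ai"])
print(alignment)  # ['k', 'k', 'ai', 'ai']
```

The alignment returned here is also the kind of frame-to-phoneme assignment from which per-phoneme frame counts can be read off.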
Step S202: calculate a score difference between the first score and the second score.
In the specific implementation of step S202, the difference between the first score and the second score is calculated to obtain the score difference.
A corresponding score threshold is set in advance. If the score difference is smaller than the score threshold, it is determined to wake up the smart device, that is, the wake-up word to be analyzed wakes up the smart device. If the score difference is greater than or equal to the score threshold, it is determined not to wake up the smart device, that is, the wake-up word to be analyzed does not wake up the smart device.
Step S203: if the score difference is smaller than the score threshold, determine to wake up the smart device.
Step S204: if the score difference is greater than or equal to the score threshold, determine not to wake up the smart device.
In the embodiment of the present invention, the Viterbi algorithm is applied to the phoneme features to calculate the first score of the first wake-up path and the second score of the second wake-up path respectively. The difference between the first score and the second score is then used to determine whether the wake-up word to be analyzed can wake up the smart device, so as to maintain the wake-up rate.
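Steps S202 to S204 reduce to a single comparison, sketched below. The source does not state the sign convention of the score difference or what the two paths represent, so taking the difference as the first score minus the second score is an assumption, as are the function name and the example numbers.

```python
def should_wake(first_score: float, second_score: float, score_threshold: float) -> bool:
    """Steps S202 to S204: wake only if the score difference is below the threshold."""
    # Assumed convention: difference = first path score minus second path score.
    score_difference = first_score - second_score
    return score_difference < score_threshold


print(should_wake(-10.0, -11.5, 3.0))  # True: difference 1.5 is below 3.0
print(should_wake(-10.0, -20.0, 3.0))  # False: difference 10.0 is not below 3.0
```

Note that a difference exactly equal to the threshold does not wake the device, matching step S204.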
For the process of calculating the intermediate result in step S103 of fig. 1, refer to fig. 4, which shows a flowchart of calculating an intermediate result according to an embodiment of the present invention. The process includes the following steps:
Step S401: calculate, by using the phoneme features, a first score of the first wake-up path and a second score of the second wake-up path respectively.
For the specific implementation of step S401, refer to step S201 of fig. 2 described above; details are not repeated here.
Step S402: calculate a score difference between the first score and the second score.
Step S403: calculate the duration and the average posterior probability of each initial and each final.
It should be noted that, after the score difference has been used to determine that the wake-up word to be analyzed wakes up the smart device, the alignment result of the wake-up word to be analyzed is determined, that is, the number of frames of each initial and each final of the wake-up word to be analyzed.
In the specific implementation of step S403, the number of frames of each initial and each final is determined, and for each initial and each final its posterior probability in each of its frames is calculated. The per-frame posterior probabilities of each initial and each final are then averaged to obtain its average posterior probability.
For example, with "turn on the speaker" as the wake-up word to be analyzed, suppose the initial "k" spans 5 frames. The posterior probability of "k" is calculated in each of these frames, and the mean of the 5 values is the average posterior probability of "k". The average posterior probability of every initial and final of the wake-up word to be analyzed is calculated in the same way.
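The calculation in this example can be sketched as follows, assuming the alignment is available as one phoneme label per frame together with the posterior probability of that label in the same frame. The function name, the input layout and the numbers are illustrative assumptions.

```python
from itertools import groupby


def duration_and_average_posterior(alignment, posteriors):
    """Return (phoneme, frame count, average posterior) for each aligned segment.

    alignment: phoneme label of every frame, in time order.
    posteriors: posterior probability of that label in the same frame.
    Consecutive frames with the same label are treated as one segment.
    """
    if len(alignment) != len(posteriors):
        raise ValueError("alignment and posteriors must have the same length")
    stats = []
    start = 0
    for phoneme, run in groupby(alignment):
        n_frames = len(list(run))
        segment = posteriors[start:start + n_frames]
        stats.append((phoneme, n_frames, sum(segment) / n_frames))
        start += n_frames
    return stats


# "k" spans 5 frames, as in the example above; "ai" spans 3 frames.
alignment = ["k"] * 5 + ["ai"] * 3
posteriors = [0.8, 0.9, 0.7, 0.9, 0.7, 0.6, 0.9, 0.9]
for phoneme, n_frames, average in duration_and_average_posterior(alignment, posteriors):
    print(phoneme, n_frames, round(average, 3))
```

Here the duration is expressed as a frame count; converting it to time would require the frame shift, which the source does not specify.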
That is, the intermediate result of the wake-up word to be analyzed includes at least: the score difference, the duration of each initial and final, and the average posterior probability of each initial and final.
For example, with "turn on the speaker" as the wake-up word to be analyzed, the word contains 8 initials and finals, each with its own duration and average posterior probability. The intermediate result of "turn on the speaker" therefore consists of the score difference (1 value), the durations of the 8 initials and finals (8 values) and their average posterior probabilities (8 values), 17 values in total.
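The count of 17 follows from 1 + 8 + 8. A minimal sketch of assembling such a vector is shown below; the ordering of the values, the function name and the sample numbers are assumptions, since the source only states which quantities are included.

```python
def build_intermediate_result(score_difference, durations, average_posteriors):
    """Concatenate the score difference, durations and average posteriors."""
    if len(durations) != len(average_posteriors):
        raise ValueError("one duration and one average posterior per initial/final")
    # Assumed ordering: score difference first, then durations, then posteriors.
    return [float(score_difference), *map(float, durations), *average_posteriors]


durations = [5, 7, 4, 6, 3, 8, 5, 9]                    # frames per initial/final
average_posteriors = [0.8, 0.9, 0.7, 0.85, 0.6, 0.9, 0.75, 0.95]
vector = build_intermediate_result(1.5, durations, average_posteriors)
print(len(vector))  # 17
```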
When the intermediate result is input into the preset confirmation model to confirm the wake-up word, the score difference corresponding to the wake-up word to be analyzed, the duration of each initial and final, and the average posterior probability of each initial and final are input into the confirmation model for processing (wake-up word confirmation) to obtain the wake-up confirmation result, that is, to determine whether the wake-up word to be analyzed is the wake-up word for waking up the smart device.
For example, the 17 values corresponding to the wake-up word to be analyzed (the number is only an example) are input into the confirmation model to confirm the wake-up word and determine whether the wake-up word to be analyzed is the wake-up word for waking up the smart device.
Similarly, when the neural network model is trained, the score difference, the duration of each initial and final, and the average posterior probability of each initial and final are calculated in the above manner for every sample word, whether a sample wake-up word or a sample non-wake-up word. For example, the 17 values of each sample wake-up word and the 17 values of each sample non-wake-up word are calculated and input into the neural network model, and the neural network model is trained until it converges, giving the confirmation model.
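The training step can be illustrated with a very small network. The sketch below is not the patent's model: it assumes one hidden layer, a sigmoid output trained with cross-entropy by per-sample gradient descent, a fixed number of epochs in place of a convergence test, and toy samples with 3 normalized values instead of 17. All names and numbers are hypothetical.

```python
import math
import random


def forward(params, x):
    """One hidden tanh layer followed by a sigmoid output in (0, 1)."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, 1.0 / (1.0 + math.exp(-z))


def mean_loss(params, samples):
    """Average binary cross-entropy over the samples."""
    total = 0.0
    for x, y in samples:
        _, p = forward(params, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(samples)


def train(samples, hidden_size=4, learning_rate=0.5, epochs=500, seed=0):
    """Train with per-sample gradient descent for a fixed number of epochs."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden_size)]
    b1 = [0.0] * hidden_size
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in samples:
            hidden, p = forward((w1, b1, w2, b2), x)
            d_out = p - y  # gradient of the loss w.r.t. the pre-sigmoid output
            for j in range(hidden_size):
                # Hidden-layer gradient uses w2[j] before it is updated.
                d_hidden = d_out * w2[j] * (1.0 - hidden[j] ** 2)
                w2[j] -= learning_rate * d_out * hidden[j]
                for i in range(n_in):
                    w1[j][i] -= learning_rate * d_hidden * x[i]
                b1[j] -= learning_rate * d_hidden
            b2 -= learning_rate * d_out
    return w1, b1, w2, b2


# Toy samples: (normalized score difference, duration, average posterior), label.
samples = [([0.10, 0.5, 0.90], 1), ([0.20, 0.6, 0.80], 1), ([0.15, 0.4, 0.85], 1),
           ([0.80, 0.2, 0.30], 0), ([0.90, 0.9, 0.20], 0), ([0.70, 0.1, 0.40], 0)]
params = train(samples)
print(mean_loss(params, samples))
print(forward(params, [0.10, 0.5, 0.90])[1], forward(params, [0.80, 0.2, 0.30])[1])
```

In practice the inputs would be the intermediate results of real wake-up word and non-wake-up word recordings, and training would stop on a convergence criterion rather than after a fixed number of epochs.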
In the embodiment of the invention, the value difference between the first value and the second value corresponding to the awakening word to be analyzed is calculated, and the time length and the average posterior probability of each initial and final are calculated. And inputting the score difference corresponding to the awakening words to be analyzed, the time length of each initial and final sound and the average posterior probability into a confirmation model to confirm the awakening words, determining whether the awakening words to be analyzed are awakening words for awakening the intelligent equipment, and reducing false awakening while ensuring the awakening rate of the intelligent equipment.
Corresponding to the method for confirming a wakeup word provided in the embodiment of the present invention, referring to fig. 5, an embodiment of the present invention further provides a structural block diagram of a system for confirming a wakeup word, where the system for confirming a wakeup word includes: a processing unit 501, a calculating unit 502 and a wakeup confirmation unit 503;
the processing unit 501 is configured to obtain a phoneme characteristic of the wake-up word to be analyzed and determine whether to wake up the intelligent device.
The calculating unit 502 is configured to calculate the phoneme feature to obtain an intermediate result when the determination result indicates that the intelligent device is awakened.
And a wakeup confirmation unit 503, configured to input the intermediate result into a preset confirmation model for processing, so as to obtain a wakeup confirmation result.
In a specific implementation, the wake-up confirmation unit 503 for obtaining the confirmation model is specifically configured to: inputting the sample data of the awakening word and the sample data of the non-awakening word into a preset neural network model, and training the neural network model until the neural network model converges to obtain a confirmation model.
In a specific implementation, the wakeup confirmation unit 503 is specifically configured to: and inputting the intermediate result into a preset confirmation model to confirm the awakening word, judging whether the awakening word to be analyzed is the awakening word for awakening the intelligent equipment, if so, determining that the awakening word to be analyzed is the awakening word for awakening the intelligent equipment, and if not, determining that the awakening word to be analyzed is not the awakening word for awakening the intelligent equipment.
In this embodiment of the invention, it is first determined whether the smart device can be woken up. If so, calculations are performed on the phoneme features to obtain an intermediate result, and the intermediate result is input into a preset confirmation model to obtain a wake-up confirmation result. This reduces false wake-ups while maintaining the wake-up rate of the smart device.
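The three-unit flow described above can be sketched as a two-stage decision. This is a minimal illustration rather than the patent's implementation: the function name `confirm_wake_word` and the three callables are hypothetical stand-ins for the processing unit 501, the calculating unit 502, and the wake-up confirmation unit 503.

```python
from typing import Callable, Sequence


def confirm_wake_word(
    phoneme_features: Sequence[float],
    first_stage: Callable[[Sequence[float]], bool],
    compute_intermediate: Callable[[Sequence[float]], Sequence[float]],
    confirmation_model: Callable[[Sequence[float]], bool],
) -> bool:
    """Run the first-stage wake-up decision, then the confirmation model.

    first_stage          : stand-in for processing unit 501
    compute_intermediate : stand-in for calculating unit 502
    confirmation_model   : stand-in for wake-up confirmation unit 503
    """
    # Stage 1: decide whether the candidate would wake the device at all.
    if not first_stage(phoneme_features):
        return False
    # Stage 2: only accepted candidates pay for the extra feature computation.
    intermediate = compute_intermediate(phoneme_features)
    # Stage 3: the confirmation model has the final say.
    return bool(confirmation_model(intermediate))
```

Because the intermediate result is computed only after the first stage accepts, rejected candidates cost nothing extra, while accepted ones get a second check that can filter out false wake-ups.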
Referring to Fig. 6 in conjunction with Fig. 5, another structural block diagram of a system for confirming a wake-up word according to an embodiment of the present invention is shown, in which the processing unit 501 includes a first calculating module 5011, a second calculating module 5012, and a determining module 5013.
The first calculating module 5011 is configured to calculate, using the phoneme features, a first score of the first wake-up path and a second score of the second wake-up path.
The second calculating module 5012 is configured to calculate the score difference between the first score and the second score.
The determining module 5013 is configured to determine to wake up the smart device if the score difference is smaller than the score threshold, and to determine not to wake up the smart device if the score difference is greater than or equal to the score threshold.
In this embodiment of the invention, the first score of the first wake-up path and the second score of the second wake-up path are calculated using the Viterbi algorithm and the phoneme features. The difference between the first score and the second score is then used to determine whether the wake-up word to be analyzed can wake up the smart device, so as to maintain the wake-up rate.
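The scoring and threshold comparison above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patent's decoder: each wake-up path is modeled as a left-to-right sequence of phoneme indices, `log_post[t][s]` is assumed to hold the log posterior of phoneme `s` at frame `t`, transition probabilities are ignored, and the score difference is taken as first score minus second score (this passage does not fix the sign convention). The names `viterbi_path_score` and `should_wake` are hypothetical.

```python
NEG_INF = float("-inf")


def viterbi_path_score(log_post, states):
    """Best log-score of aligning all frames to a left-to-right state sequence.

    At each frame the path either stays in the current state or advances to
    the next one; it must start in the first state and end in the last.
    Returns -inf if there are fewer frames than states.
    """
    n_states = len(states)
    best = [NEG_INF] * n_states
    best[0] = log_post[0][states[0]]
    for t in range(1, len(log_post)):
        new = [NEG_INF] * n_states
        for j in range(n_states):
            prev = best[j]  # stay in state j
            if j > 0 and best[j - 1] > prev:
                prev = best[j - 1]  # advance from state j - 1
            if prev > NEG_INF:
                new[j] = prev + log_post[t][states[j]]
        best = new
    return best[-1]


def should_wake(log_post, first_path, second_path, score_threshold):
    """Apply the rule from the text: wake if the score difference is below the threshold."""
    first_score = viterbi_path_score(log_post, first_path)
    second_score = viterbi_path_score(log_post, second_path)
    score_diff = first_score - second_score  # assumed sign convention
    return score_diff < score_threshold, score_diff
```

For example, `should_wake(log_post, [0, 1], [1, 0], 20)` compares the best alignment of the frames to phoneme sequence 0 then 1 against the best alignment to 1 then 0, and returns both the decision and the score difference so that the latter can be reused as a confirmation feature.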
Referring to Fig. 7 in conjunction with Fig. 5, another structural block diagram of a system for confirming a wake-up word according to an embodiment of the present invention is shown, in which the calculating unit 502 includes a first calculating module 5021, a second calculating module 5022, and a third calculating module 5023.
The first calculating module 5021 is configured to calculate, using the phoneme features, a first score of the first wake-up path and a second score of the second wake-up path.
The second calculating module 5022 is configured to calculate the score difference between the first score and the second score.
The third calculating module 5023 is configured to calculate the duration and the average posterior probability of each initial and final.
In a specific implementation, the third calculating module 5023 is specifically configured to: determine the number of frames of each initial and final; for each initial and final, calculate the posterior probability of each of its frames; and, for each initial and final, calculate the mean of the posterior probabilities of its frames to obtain the average posterior probability of that initial or final.
Correspondingly, the wake-up confirmation unit 503 is specifically configured to: input the score difference, the duration of each initial and final, and the average posterior probabilities into a preset confirmation model for processing, so as to obtain a wake-up confirmation result.
In this embodiment of the invention, the score difference between the first score and the second score corresponding to the wake-up word to be analyzed is calculated, together with the duration and the average posterior probability of each initial and final. The score difference, the duration of each initial and final, and the average posterior probabilities are then input into the confirmation model for processing to obtain a wake-up confirmation result, which reduces false wake-ups while maintaining the wake-up rate of the smart device.
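The per-unit statistics and the confirmation-model input described above can be sketched as follows. This is an illustration under assumptions: the frame-level alignment (one initial or final label per frame) and the per-frame posteriors are taken as given, duration is derived from the frame count using an assumed 10 ms frame shift, and the input layout (score difference, then durations, then average posteriors) is one possible ordering that the text does not prescribe. The names `unit_statistics` and `build_confirmation_input` are hypothetical.

```python
from itertools import groupby


def unit_statistics(alignment, frame_posteriors):
    """Return (unit, frame_count, average_posterior) for each initial/final.

    alignment        : one unit label per frame, e.g. ["x", "x", "iao", ...]
    frame_posteriors : posterior of the aligned unit at each frame
    Consecutive frames with the same label are treated as one unit.
    """
    if len(alignment) != len(frame_posteriors):
        raise ValueError("alignment and posteriors must have the same length")
    stats = []
    start = 0
    for unit, run in groupby(alignment):
        n_frames = len(list(run))
        segment = frame_posteriors[start:start + n_frames]
        stats.append((unit, n_frames, sum(segment) / n_frames))
        start += n_frames
    return stats


def build_confirmation_input(score_diff, stats, frame_shift_s=0.01):
    """Concatenate the score difference, per-unit durations, and average posteriors."""
    durations = [n_frames * frame_shift_s for _, n_frames, _ in stats]
    averages = [avg for _, _, avg in stats]
    return [score_diff, *durations, *averages]
```

A fixed wake-up word has a fixed number of initials and finals, so the resulting vector has a constant length and can be fed directly to a model with a fixed input size.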
In summary, the embodiments of the present invention provide a method and a system for confirming a wake-up word. The method includes: obtaining phoneme features of the wake-up word to be analyzed and determining whether to wake up the smart device; when the determination result is that the smart device is to be woken up, performing calculations on the phoneme features to obtain an intermediate result; and inputting the intermediate result into a preset confirmation model for processing to obtain a wake-up confirmation result. In this scheme, the phoneme features of the wake-up word to be analyzed are used to determine whether the smart device can be woken up. If it can, the phoneme features are processed to obtain an intermediate result, and the intermediate result is input into a preset confirmation model to obtain a wake-up confirmation result, which reduces false wake-ups while maintaining the wake-up rate of the smart device.
The embodiments in this specification are described in a progressive manner; for parts that are the same or similar across embodiments, reference may be made between them, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts of the method embodiments may be consulted. The system embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for confirming a wake-up word, the method comprising:
obtaining phoneme features of the wake-up word to be analyzed and determining whether to wake up the smart device;
when the determination result is that the smart device is to be woken up, performing calculations on the phoneme features to obtain an intermediate result;
and inputting the intermediate result into a preset confirmation model for processing to obtain a wake-up confirmation result.
2. The method of claim 1, wherein determining whether to wake up the smart device comprises:
calculating, using the phoneme features, a first score of the first wake-up path and a second score of the second wake-up path;
calculating a score difference between the first score and the second score;
if the score difference is smaller than a score threshold, determining to wake up the smart device;
and if the score difference is greater than or equal to the score threshold, determining not to wake up the smart device.
3. The method of claim 1, wherein performing calculations on the phoneme features to obtain an intermediate result comprises:
calculating, using the phoneme features, a first score of the first wake-up path and a second score of the second wake-up path;
calculating a score difference between the first score and the second score;
and calculating the duration and the average posterior probability of each initial and final.
4. The method of claim 1, wherein obtaining the confirmation model comprises:
inputting sample data of wake-up words and sample data of non-wake-up words into a preset neural network model, and training the neural network model until it converges to obtain the confirmation model.
5. The method of claim 3, wherein calculating the average posterior probability of each initial and final comprises:
determining the number of frames of each initial and final;
for each initial and final, calculating the posterior probability of each of its frames;
and, for each initial and final, calculating the mean of the posterior probabilities of its frames to obtain the average posterior probability of that initial or final.
6. The method according to claim 1, wherein inputting the intermediate result into a preset confirmation model for processing to obtain a wake-up confirmation result comprises:
inputting the intermediate result into the preset confirmation model to confirm the wake-up word, and judging whether the wake-up word to be analyzed is a wake-up word for waking the smart device;
if so, determining that the wake-up word to be analyzed is a wake-up word for waking the smart device;
if not, determining that the wake-up word to be analyzed is not a wake-up word for waking the smart device.
7. The method according to claim 3, wherein inputting the intermediate result into a preset confirmation model for processing to obtain a wake-up confirmation result comprises: inputting the score difference, the duration of each initial and final, and the average posterior probabilities into the preset confirmation model for processing to obtain the wake-up confirmation result.
8. A system for confirming a wake-up word, the system comprising:
a processing unit, configured to obtain phoneme features of the wake-up word to be analyzed and to determine whether to wake up the smart device;
a calculating unit, configured to perform calculations on the phoneme features to obtain an intermediate result when the determination result is that the smart device is to be woken up;
and a wake-up confirmation unit, configured to input the intermediate result into a preset confirmation model for processing to obtain a wake-up confirmation result.
9. The system of claim 8, wherein the processing unit comprises:
a first calculating module, configured to calculate, using the phoneme features, a first score of the first wake-up path and a second score of the second wake-up path;
a second calculating module, configured to calculate the score difference between the first score and the second score;
and a determining module, configured to determine to wake up the smart device if the score difference is smaller than a score threshold, and to determine not to wake up the smart device if the score difference is greater than or equal to the score threshold.
10. The system of claim 8, wherein the calculating unit comprises:
a first calculating module, configured to calculate, using the phoneme features, a first score of the first wake-up path and a second score of the second wake-up path;
a second calculating module, configured to calculate the score difference between the first score and the second score;
and a third calculating module, configured to calculate the duration and the average posterior probability of each initial and final.
CN202010530753.9A 2020-06-11 2020-06-11 Method and system for confirming awakening words Pending CN111696555A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010530753.9A CN111696555A (en) 2020-06-11 2020-06-11 Method and system for confirming awakening words

Publications (1)

Publication Number Publication Date
CN111696555A true CN111696555A (en) 2020-09-22

Family

ID=72480423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010530753.9A Pending CN111696555A (en) 2020-06-11 2020-06-11 Method and system for confirming awakening words

Country Status (1)

Country Link
CN (1) CN111696555A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114333799A (en) * 2022-03-09 2022-04-12 深圳市友杰智新科技有限公司 Detection method and device for phase-to-phase sound misidentification and computer equipment
CN116884399A (en) * 2023-09-06 2023-10-13 深圳市友杰智新科技有限公司 Method, device, equipment and medium for reducing voice misrecognition

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019502947A (en) * 2015-11-30 2019-01-31 ゼットティーイー コーポレイション Voice wakeup implementation method, apparatus and terminal, and computer storage medium
CN110428810A (en) * 2019-08-30 2019-11-08 北京声智科技有限公司 A kind of recognition methods, device and electronic equipment that voice wakes up
CN110473536A (en) * 2019-08-20 2019-11-19 北京声智科技有限公司 A kind of awakening method, device and smart machine
CN110570857A (en) * 2019-09-06 2019-12-13 北京声智科技有限公司 Voice wake-up method and device, electronic equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114333799A (en) * 2022-03-09 2022-04-12 深圳市友杰智新科技有限公司 Detection method and device for phase-to-phase sound misidentification and computer equipment
CN114333799B (en) * 2022-03-09 2022-08-02 深圳市友杰智新科技有限公司 Detection method and device for phase-to-phase sound misidentification and computer equipment
CN116884399A (en) * 2023-09-06 2023-10-13 深圳市友杰智新科技有限公司 Method, device, equipment and medium for reducing voice misrecognition
CN116884399B (en) * 2023-09-06 2023-12-08 深圳市友杰智新科技有限公司 Method, device, equipment and medium for reducing voice misrecognition

Similar Documents

Publication Publication Date Title
CN106782536B (en) Voice awakening method and device
CN110415699B (en) Voice wake-up judgment method and device and electronic equipment
CN109273007B (en) Voice wake-up method and device
CN111880856B (en) Voice wakeup method and device, electronic equipment and storage medium
CN110473536B (en) Awakening method and device and intelligent device
CN111341325A (en) Voiceprint recognition method and device, storage medium and electronic device
CN112634867A (en) Model training method, dialect recognition method, device, server and storage medium
CN111161728B (en) Awakening method, awakening device, awakening equipment and awakening medium of intelligent equipment
CN113096647B (en) Voice model training method and device and electronic equipment
CN110767231A (en) Voice control equipment awakening word identification method and device based on time delay neural network
CN111462756B (en) Voiceprint recognition method and device, electronic equipment and storage medium
CN108536668B (en) Wake-up word evaluation method and device, storage medium and electronic equipment
CN111312222A (en) Awakening and voice recognition model training method and device
CN108595406B (en) User state reminding method and device, electronic equipment and storage medium
CN110544468B (en) Application awakening method and device, storage medium and electronic equipment
CN109767763A (en) It is customized wake up word determination method and for determine it is customized wake up word device
CN111710337A (en) Voice data processing method and device, computer readable medium and electronic equipment
CN111696555A (en) Method and system for confirming awakening words
CN111883121A (en) Awakening method and device and electronic equipment
CN113782009A (en) Voice awakening system based on Savitzky-Golay filter smoothing method
CN115457938A (en) Method, device, storage medium and electronic device for identifying awakening words
CN111145748A (en) Audio recognition confidence determining method, device, equipment and storage medium
CN111862963A (en) Voice wake-up method, device and equipment
CN111554270B (en) Training sample screening method and electronic equipment
CN112289311B (en) Voice wakeup method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination