CN111091839B - Voice awakening method and device, storage medium and intelligent device - Google Patents


Info

Publication number
CN111091839B
CN111091839B (application CN202010198736.XA; published as CN111091839A)
Authority
CN
China
Prior art keywords
matrix
category
voice
probability
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010198736.XA
Other languages
Chinese (zh)
Other versions
CN111091839A (en)
Inventor
徐泓洋
王广新
杨汉丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Youjie Zhixin Technology Co ltd
Original Assignee
Shenzhen Youjie Zhixin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Youjie Zhixin Technology Co., Ltd.
Priority to CN202010198736.XA
Publication of CN111091839A
Application granted
Publication of CN111091839B

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00: Speaker identification or verification techniques
    • G10L 17/22: Interactive procedures; Man-machine interfaces
    • G10L 17/06: Decision making techniques; Pattern matching strategies
    • G10L 17/18: Artificial neural networks; Connectionist approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a voice wake-up method and device, a storage medium, and an intelligent device. The method comprises the following steps: performing encoding calculation on an input voice sequence through an Encoder, and outputting a first matrix with the same row and column number as the voice sequence; performing linear re-expression on the first matrix through a feedforward neural network, and outputting a second matrix with the same row and column number as the first matrix; performing dimension compression on the second matrix through soft-attention to obtain an attention vector; identifying the probabilities of a plurality of categories from the attention vector; and judging whether to execute the wake-up function according to the probability result of the categories. By using structures such as the Encoder, the feedforward neural network, and soft-attention, and by drawing on the internal structure of the Transformer, an attention vector containing both local and global information is finally generated to obtain the category probabilities, and whether to execute the wake-up function is judged according to the probability result. The parameter quantity is small, end-to-end voice wake-up judgment is realized, and the response speed of the method is higher, so the method is well suited to voice wake-up.

Description

Voice awakening method and device, storage medium and intelligent device
Technical Field
The present invention relates to the field of voice wake-up, and in particular, to a voice wake-up method, apparatus, storage medium, and intelligent device.
Background
In existing fields such as translation and speech recognition, the basic network structures currently available for constructing a speech recognition model include CNN, RNN/LSTM, the multi-head attention mechanism, and the like, and each manufacturer selects the network suited to its own application requirements. The Transformer, which is based on the multi-head attention mechanism, achieves a better effect than prediction models based on CNN/LSTM combined with a CTC (Connectionist Temporal Classification) structure, which shows that the multi-head attention mechanism has particular advantages in feature extraction; however, the Transformer structure is more complex and its model is larger, so the Transformer is not suitable for a voice wake-up scene.
Disclosure of Invention
The invention mainly aims to provide a voice awakening method, a voice awakening device, a storage medium and intelligent equipment, and can solve the problem that the existing Transformer structure is not suitable for voice awakening.
The invention provides a voice awakening method, which comprises the following steps:
coding and calculating an input voice sequence through an Encoder, and outputting a first matrix with the same row and column number as the voice sequence;
performing linear re-expression on the first matrix through a feedforward neural network, and outputting a second matrix, wherein the row and column number of the second matrix is the same as the row and column number of the first matrix;
performing dimension compression on the second matrix through soft-attention to obtain an attention vector;
identifying probabilities of a plurality of classes from the attention vector;
and judging whether to execute the awakening function or not according to the probability result of the category.
Further, the step of determining whether to execute the wake-up function according to the probability result of the category includes:
extracting the category with the highest probability as the identified category;
judging whether the identified category is a target category:
if yes, judging whether the probability of the identified category reaches a threshold value;
if so, executing the awakening function corresponding to the category;
if not, the identification result is ignored, and the awakening function is not executed.
Further, the step of performing encoding calculation on the input voice sequence through the Encoder and outputting a first matrix with the same row and column number as the voice sequence includes:
and performing coding calculation on the input voice sequence through N layers of superposed encoders, and outputting a first matrix with the same row number and column number as the voice sequence, wherein N is a positive integer.
Further, the step of identifying probabilities of the plurality of classes based on the attention vector includes:
inputting the attention vector into a fully-connected layer for classification to obtain a plurality of categories;
the probability of belonging to each category is calculated according to the softmax function.
Further, the row and column number of the voice sequence and of the first matrix is 1 × 99 × 40.
Further, the row and column number of the attention vector is 40 × 1.
Further, before the step of performing encoding calculation on the input speech sequence by the Encoder and outputting the first matrix with the same number of rows and columns as the speech sequence, the method includes:
and extracting FBANK features from the recorded original audio frame to obtain a voice sequence, wherein the voice sequence is a matrix with the row and column number of 1 × 99 × 40.
The present application further provides a voice wake-up apparatus, including:
the encoding module is used for encoding and calculating the input voice sequence through the Encoder and outputting a first matrix with the same row and column number as the voice sequence;
the re-expression module is used for performing linear re-expression on the first matrix through a feedforward neural network and outputting a second matrix, wherein the row and column number of the second matrix is the same as that of the first matrix;
the attention vector acquisition module is used for carrying out dimension compression on the second matrix through soft-attention to obtain an attention vector;
the probability acquisition module is used for identifying the probabilities of a plurality of categories according to the attention vector;
and the awakening judgment module is used for judging whether to execute the awakening function according to the probability result of the category.
The present application also proposes a storage medium, which is a computer-readable storage medium, on which a computer program is stored, which when executed implements the above-mentioned voice wake-up method.
The application also provides an intelligent device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor implements the voice wake-up method when executing the computer program.
According to the voice wake-up method, by using structures such as the Encoder, the feedforward neural network, and soft-attention, and by drawing on the internal structure of the Transformer, an Encoder is constructed based on the multi-head attention mechanism; the input data is encoded by the Encoder, calculated by the feedforward neural network, and then dimension-compressed, finally generating an attention vector containing both local and global information to obtain the category probabilities. Whether to execute the wake-up function is judged according to the probability result of the category, and end-to-end voice wake-up judgment is realized, so the response speed of the voice wake-up method is higher and the method is suitable for voice wake-up.
Drawings
FIG. 1 is a schematic diagram of a step structure of an embodiment of a voice wake-up method according to the present invention;
FIG. 2 is a schematic structural diagram of a voice wake-up apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an embodiment of a storage medium according to the present invention;
fig. 4 is a schematic structural diagram of an embodiment of the smart device of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As used herein, the singular forms "a", "an", and "the" include plural referents unless the content clearly dictates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, units, modules, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, units, modules, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Referring to fig. 1, the voice wake-up method of the present invention includes the following steps:
s1, carrying out coding calculation on the input voice sequence through an Encoder, and outputting a first matrix with the same row and column number as the voice sequence;
s2, performing linear re-expression on the first matrix through a feedforward neural network, and outputting a second matrix, wherein the row and column number of the second matrix is the same as the row and column number of the first matrix;
s3, performing dimension compression on the second matrix through soft-attribute to obtain an attention vector;
s4, identifying the probability of a plurality of categories according to the attention vector;
and S5, judging whether to execute the awakening function according to the probability result of the category.
In step S1, in some embodiments, for an Encoder, the input feature sequence (e.g., a 1 × 99 × 40 matrix) is linearly transformed (dense layer) and then evenly divided into n parts, where n is the number of heads. Each head performs a matrix operation (self-attention) to obtain a weighted small matrix (e.g., 1 × 99 × 8); the n small matrices are concatenated back into a large matrix (i.e., a 1 × 99 × 40 matrix); the result of the linear transformation is then added to the large matrix through a residual connection, and the first matrix, still 1 × 99 × 40, is output after normalization. The Encoder may be followed by a next Encoder, or there may be only one Encoder; the Encoder learns the feature information from a local perspective.
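The encoder calculation described above can be sketched as follows. This is an illustrative numpy sketch, not the patent's implementation: the random weights are stand-ins for learned parameters, n = 5 heads follows from the 1 × 99 × 40 input split into 1 × 99 × 8 slices, and the leading batch dimension of 1 is dropped.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encoder_block(x, n_heads=5, seed=0):
    """One Encoder: dense layer, multi-head self-attention over equal slices,
    residual connection with the linear transformation, then normalization.
    Shapes follow the patent's example: x is (99, 40), each head a (99, 8) slice."""
    rng = np.random.default_rng(seed)
    t, d = x.shape                        # t = 99 frames, d = 40 features
    w = rng.standard_normal((d, d)) / np.sqrt(d)
    h = x @ w                             # linear transformation (dense layer)
    head_dim = d // n_heads               # 40 / 5 = 8
    outs = []
    for i in range(n_heads):
        q = h[:, i * head_dim:(i + 1) * head_dim]        # (99, 8) slice per head
        attn = softmax(q @ q.T / np.sqrt(head_dim))      # (99, 99) attention weights
        outs.append(attn @ q)                            # weighted small matrix (99, 8)
    concat = np.concatenate(outs, axis=1)                # spliced back to (99, 40)
    y = concat + h                                       # residual connection
    y = (y - y.mean(-1, keepdims=True)) / (y.std(-1, keepdims=True) + 1e-6)
    return y                                             # first matrix, still (99, 40)

print(encoder_block(np.ones((99, 40))).shape)   # (99, 40)
```

The row and column number is preserved end to end, which is what lets Encoders be stacked.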
In the above step S2, the feedforward neural network is a point-wise (position-wise) feedforward neural network, which applies the same linear re-expression to every position of the first matrix independently.
In the above step S3, soft-attention learns the feature information from a global perspective, and the dimension compression simplifies the data processed in the next step, thereby increasing the processing speed.
In the above step S4, in some embodiments, a plurality of categories are obtained by inputting the attention vector into the fully-connected layer for classification; and calculating the probability of belonging to each category according to a softmax function, wherein the maximum value of the probability of each category is 1, and the minimum value of the probability of each category is 0.
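The classification in step S4 can be sketched as a fully-connected layer followed by softmax. The three categories and random weights here are illustrative assumptions; the output probabilities each lie in [0, 1] as the text states, and sum to 1.

```python
import numpy as np

def classify(attn_vec, n_classes=3, seed=0):
    """Fully-connected layer + softmax: maps the 40-dim attention vector to
    one probability per category."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((attn_vec.shape[0], n_classes))
    logits = attn_vec @ w                 # fully-connected layer
    e = np.exp(logits - logits.max())
    return e / e.sum()                    # softmax probabilities

probs = classify(np.ones(40))
print(round(float(probs.sum()), 6))   # 1.0
```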
In the above step S5, the category with the highest probability may be extracted as the identified category; then judging whether the identified category is a target category: if yes, judging whether the probability of the identified category reaches a threshold value; if so, executing the awakening function corresponding to the category; if not, the identification result is ignored, and the awakening function is not executed.
According to the voice wake-up method, the matrix output by the Encoder is linearly re-expressed through the feedforward neural network, and the matrix output by the feedforward neural network is dimension-compressed through soft-attention to obtain the attention vector, from which the probabilities of a plurality of categories are obtained. By using structures such as the Encoder, the feedforward neural network, and soft-attention, and by drawing on the internal structure of the Transformer, the Encoder is constructed based on the multi-head attention mechanism; the input data is encoded by the Encoder, calculated by the feedforward neural network, and then dimension-compressed by the soft-attention mechanism, finally generating an attention vector containing both local and global information. The category probabilities are obtained from this vector, and whether to execute the wake-up function is judged according to the probability result. The parameter quantity is small, end-to-end voice wake-up judgment is realized, and the response speed of the method is higher. Moreover, on the basis of the local feature learning of the attention inside the Encoder, the global feature learning of soft-attention is added, so the learning capability is stronger, the recognition effect is better, and the method is well suited to voice wake-up.
Further, in some embodiments, the step S5 of determining whether to execute the wake-up function according to the probability result of the category includes:
s51, extracting the category with the highest probability as the identified category;
s52, judging whether the identified category is a target category:
s53, if yes, judging whether the probability of the identified category reaches a threshold value;
s54, if yes, executing the awakening function corresponding to the category;
and S55, if not, ignoring the identification result and not executing the awakening function.
In the above steps S51-S55, the threshold is a predetermined probability reference value, for example, 80%, when the probability of the identified category is greater than or equal to 80%, the probability of the identified category is determined to reach the threshold, and when the probability of the identified category is less than 80%, the probability of the identified category is determined not to reach the threshold, wherein the wake-up functions corresponding to different categories may be different.
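Steps S51 to S55 amount to a simple decision rule, sketched below with illustrative category names (the patent does not name its categories) and the 80% reference value from the text as the default threshold.

```python
def should_wake(probs, target_categories, threshold=0.8):
    """S51-S55: pick the category with the highest probability; wake only if
    it is a target category and its probability reaches the threshold."""
    category = max(probs, key=probs.get)       # S51: most probable category
    if category not in target_categories:      # S52: is it a target category?
        return False                           # S55: ignore the result, do not wake
    return probs[category] >= threshold        # S53/S54: threshold check

print(should_wake({"wake_word": 0.85, "noise": 0.15}, {"wake_word"}))  # True
print(should_wake({"wake_word": 0.60, "noise": 0.40}, {"wake_word"}))  # False
```

Different target categories could map to different wake-up functions by returning the category instead of a boolean.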
Further, the step S1 of performing encoding calculation on the input speech sequence by the Encoder and outputting the first matrix having the same number of rows and columns as the speech sequence includes:
and S11, coding and calculating the input voice sequence through N layers of superposed encoders, and outputting a first matrix with the same row and column number as the voice sequence, wherein N is a positive integer.
In step S11, the first Encoder performs encoding calculation on the input speech sequence, outputs a matrix having the same row and column number as the speech sequence, and then sequentially inputs the matrix output by the previous Encoder for the next Encoder, and outputs the matrix having the same row and column number until the last Encoder outputs the first matrix; wherein, N is preferably 6, which not only ensures higher accuracy of the awakening result, but also can ensure higher processing speed.
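The stacking described in step S11 reduces to a loop in which each layer's output feeds the next, with the row and column number preserved throughout. The sketch below uses a shape-preserving placeholder in place of a full encoder block; N = 6 follows the preferred embodiment.

```python
import numpy as np

def stacked_encoders(x, n_layers=6, layer_fn=None):
    """N stacked Encoders: each layer consumes the previous layer's output
    and keeps the row and column number unchanged, so the last layer emits
    the first matrix.  `layer_fn` stands in for one encoder block."""
    if layer_fn is None:
        layer_fn = np.tanh   # placeholder layer, preserves shape
    for _ in range(n_layers):
        x = layer_fn(x)      # output of one Encoder is the input of the next
    return x

print(stacked_encoders(np.ones((99, 40))).shape)   # (99, 40)
```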
Further, the step S4 of identifying probabilities of the plurality of categories according to the attention vector includes:
s41, inputting the attention vector into the full-connection layer for classification to obtain a plurality of classes;
and S42, calculating the probability of belonging to each category according to the softmax function.
In step S41, only the predetermined categories need to be recognized; because the number of recognizable categories is small, the recognition speed is high, unlike full speech recognition, where the large number of recognizable categories results in a slower operation speed.
In the above step S42, the maximum value of the probability for each category is 1, i.e., 100%, and the minimum value of the probability for each category is 0.
By using structures such as the Encoder, the feedforward neural network, and soft-attention, and by drawing on the internal structure of the Transformer, the Encoder is constructed based on the multi-head attention mechanism. The input data is encoded by the Encoder, calculated by the feedforward neural network, and then dimension-compressed by the soft-attention mechanism, finally generating an attention vector containing both local and global information; the category probabilities are obtained, and whether to execute the wake-up function is judged according to the probability result. The parameter quantity is small, end-to-end voice wake-up judgment is realized, the response speed of the voice wake-up method is higher, and the method is suitable for voice wake-up.
In some embodiments, the categories may simply be "yes" and "no"; the category with the highest probability is selected, and whether to execute the wake-up function is determined directly from the selected category: if the selected category is "yes", the wake-up function is executed, and if the selected category is "no", the wake-up function is not executed.
Further, preferably, the row and column number of the voice sequence and of the first matrix is 1 × 99 × 40, which ensures both a higher processing speed and higher accuracy; in some embodiments, the voice sequence may adopt other row and column numbers, determined according to different requirements.
Further, the row and column number of the attention vector is 40 × 1: the second matrix with the row and column number of 1 × 99 × 40 is compressed along its 99-frame dimension, yielding an attention vector with the row and column number of 40 × 1.
Further, before step S1 of performing encoding calculation on the input speech sequence by the Encoder and outputting the first matrix having the same number of rows and columns as the speech sequence, the method includes:
and S1a, extracting FBANK features from the recorded original audio frame to obtain a voice sequence, wherein the voice sequence is a matrix with the row and column number of 1 x 99 x 40.
In the above step S1a, the FBANK feature has 40 dimensions; the sampling rate of the original audio is 16000 Hz and the audio duration is 1 second; the audio is framed with twenty milliseconds as the window length and ten milliseconds as the step length, so the row and column number of the voice sequence is 1 × 99 × 40. In some embodiments, parameters such as the sampling rate, window length, and step length may be adjusted appropriately according to the application requirements.
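The frame count of 99 follows directly from the stated framing parameters, as this small arithmetic sketch shows:

```python
def num_frames(n_samples=16000, sample_rate=16000, win_ms=20, hop_ms=10):
    """Frame count for the framing described above: 1 second of 16 kHz audio
    with a 20 ms window and a 10 ms step gives 99 frames, which with 40 FBANK
    dimensions yields the 1 x 99 x 40 voice sequence."""
    win = sample_rate * win_ms // 1000   # 320 samples per window
    hop = sample_rate * hop_ms // 1000   # 160 samples per step
    return (n_samples - win) // hop + 1

print(num_frames())   # 99
```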
Referring to fig. 2, the present application further provides a voice wake-up apparatus, including:
the encoding module 1 is used for encoding and calculating an input voice sequence through an Encoder and outputting a first matrix with the same row and column number as the voice sequence;
the re-expression module 2 is used for performing linear re-expression on the first matrix through a feedforward neural network and outputting a second matrix, and the row number and the column number of the second matrix are the same as those of the first matrix;
the attention vector acquisition module 3 is used for carrying out dimension compression on the second matrix through soft-attention to obtain an attention vector;
a probability obtaining module 4, configured to identify probabilities of multiple categories according to the attention vector;
and the awakening judgment module 5 is used for judging whether to execute the awakening function according to the probability result of the category.
In the encoding module 1, in some embodiments, for an Encoder, the input feature sequence (e.g., a 1 × 99 × 40 matrix) is linearly transformed (dense layer) and then evenly divided into n parts, where n is the number of heads. Each head performs a matrix operation (self-attention) to obtain a weighted small matrix (e.g., 1 × 99 × 8); the n small matrices are concatenated back into a large matrix (i.e., a 1 × 99 × 40 matrix); the result of the linear transformation is then added to the large matrix through a residual connection, and the first matrix, still 1 × 99 × 40, is output after normalization. The Encoder may be followed by a next Encoder, or there may be only one Encoder; the Encoder learns the feature information from a local perspective.
In the re-expression module 2, the feedforward neural network is a point-wise (position-wise) feedforward neural network, which applies the same linear re-expression to every position of the first matrix independently.
In the attention vector acquisition module 3, soft-attention learns the feature information from a global perspective, and the dimension compression simplifies the data processed in the next step, thereby increasing the processing speed.
In the probability obtaining module 4, in some embodiments, a plurality of categories are obtained by inputting the attention vector into the fully-connected layer for classification, and the probability of belonging to each category is calculated according to a softmax function, wherein the maximum value of the probability of each category is 1 and the minimum value is 0.
In the wake-up judging module 5, the category with the highest probability can be extracted as the identified category; then judging whether the identified category is a target category: if yes, judging whether the probability of the identified category reaches a threshold value; if so, executing the awakening function corresponding to the category; if not, the identification result is ignored, and the awakening function is not executed.
Referring to fig. 3, a storage medium 100, which is a computer-readable storage medium, is further provided, and a computer program 200 is stored on the storage medium, and when executed, the computer program 200 implements the voice wake-up method in any of the embodiments.
Referring to fig. 4, an embodiment of the present application further provides an intelligent device 300, which includes a memory 400, a processor 500, and a computer program 200 stored on the memory 400 and executable on the processor 500, wherein the processor 500 implements the voice wake-up method in any of the above embodiments when executing the computer program 200.
Those skilled in the art will appreciate that the smart device 300 of the embodiments of the present application is a device referred to above for performing one or more of the methods of the present application. These devices may be specially designed and manufactured for the required purposes, or they may comprise known devices in general-purpose computers. These devices have stored therein computer programs 200 or application programs, which computer programs 200 are selectively activated or reconfigured. Such a computer program 200 may be stored in a device (e.g., computer) readable medium, including, but not limited to, any type of disk including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks, ROMs (Read-Only Memories), RAMs (Random Access Memories), EPROMs (Erasable Programmable Read-Only Memories), EEPROMs (Electrically Erasable Programmable Read-Only Memories), flash memories, magnetic cards, or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a bus. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
According to the voice wake-up method, by using structures such as the Encoder, the feedforward neural network, and soft-attention, and by drawing on the internal structure of the Transformer, an Encoder is constructed based on the multi-head attention mechanism; the input data is encoded by the Encoder, calculated by the feedforward neural network, and then dimension-compressed, finally generating an attention vector containing both local and global information to obtain the category probabilities. Whether to execute the wake-up function is judged according to the probability result of the category, and end-to-end voice wake-up judgment is realized, so the response speed of the voice wake-up method is higher and the method is suitable for voice wake-up.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. A voice wake-up method, comprising the steps of:
coding and calculating an input voice sequence through an Encoder, and outputting a first matrix with the same row and column number as the voice sequence;
performing linear re-expression on the first matrix through a feedforward neural network, and outputting a second matrix, wherein the row and column number of the second matrix is the same as the row and column number of the first matrix;
performing dimension compression on the second matrix through soft-attention to obtain an attention vector;
identifying probabilities of a plurality of classes from the attention vector;
and judging whether to execute the awakening function or not according to the probability result of the category.
2. The voice wake-up method according to claim 1, wherein the step of determining whether to perform the wake-up function according to the probability result of the class comprises:
extracting the category with the highest probability as the identified category;
judging whether the identified category is a target category:
if so, judging whether the probability of the identified category reaches a threshold value;
if so, executing the awakening function corresponding to the category;
if not, the identification result is ignored, and the awakening function is not executed.
3. The voice wake-up method according to claim 1, wherein the step of performing encoding calculation on the input voice sequence by an Encoder and outputting a first matrix with the same number of rows and columns as the voice sequence comprises:
and carrying out coding calculation on the input voice sequence through N layers of superposed encoders, and outputting a first matrix with the same row number and column number as the voice sequence, wherein N is a positive integer.
4. The voice wake-up method according to claim 1, wherein the step of identifying the probabilities of a plurality of categories from the attention vector comprises:
inputting the attention vector into a fully-connected layer for classification to obtain a plurality of categories; and
computing the probability of belonging to each category with the softmax function.
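The classification step of claim 4 is a fully-connected projection followed by softmax; the layer sizes and the bias term here are illustrative assumptions:

```python
import numpy as np

def classify(attention_vec, W_fc, b_fc):
    logits = attention_vec @ W_fc + b_fc   # fully-connected layer
    e = np.exp(logits - logits.max())      # subtract max for numerical stability
    return e / e.sum()                     # softmax: probability per category

rng = np.random.default_rng(2)
attn = rng.standard_normal(40)             # 40 x 1 attention vector (claim 5)
W_fc, b_fc = rng.standard_normal((40, 3)), np.zeros(3)
probs = classify(attn, W_fc, b_fc)         # non-negative, sums to 1
```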
5. The voice wake-up method according to claim 1, wherein the attention vector has 40 rows and 1 column (40 × 1).
6. A voice wake-up apparatus, comprising:
an encoding module, configured to perform an encoding computation on an input speech sequence with an Encoder and output a first matrix having the same number of rows and columns as the speech sequence;
a re-expression module, configured to perform a linear re-expression of the first matrix through a feedforward neural network and output a second matrix having the same number of rows and columns as the first matrix;
an attention vector acquisition module, configured to perform dimension compression on the second matrix through soft-attention to obtain an attention vector;
a probability acquisition module, configured to identify the probabilities of a plurality of categories from the attention vector; and
a wake-up determination module, configured to determine whether to execute the wake-up function according to the probability results of the categories.
7. A storage medium, characterized in that it is a computer-readable storage medium having a computer program stored thereon, the computer program, when executed, implementing the voice wake-up method according to any one of claims 1 to 5.
8. An intelligent device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the voice wake-up method according to any one of claims 1 to 5.
CN202010198736.XA 2020-03-20 2020-03-20 Voice awakening method and device, storage medium and intelligent device Active CN111091839B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010198736.XA CN111091839B (en) 2020-03-20 2020-03-20 Voice awakening method and device, storage medium and intelligent device

Publications (2)

Publication Number Publication Date
CN111091839A CN111091839A (en) 2020-05-01
CN111091839B true CN111091839B (en) 2020-06-26

Family

ID=70400576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010198736.XA Active CN111091839B (en) 2020-03-20 2020-03-20 Voice awakening method and device, storage medium and intelligent device

Country Status (1)

Country Link
CN (1) CN111091839B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669830A (en) * 2020-12-18 2021-04-16 上海容大数字技术有限公司 End-to-end multi-awakening-word recognition system
CN113051897B (en) * 2021-05-25 2021-09-10 中国电子科技集团公司第三十研究所 GPT2 text automatic generation method based on Performer structure
CN113282707B (en) * 2021-05-31 2024-01-26 平安国际智慧城市科技股份有限公司 Data prediction method and device based on transducer model, server and storage medium
CN113642319B (en) * 2021-07-29 2022-11-29 北京百度网讯科技有限公司 Text processing method and device, electronic equipment and storage medium
CN113762251B (en) * 2021-08-17 2024-05-10 慧影医疗科技(北京)股份有限公司 Attention mechanism-based target classification method and system

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN107221326B (en) * 2017-05-16 2021-05-28 百度在线网络技术(北京)有限公司 Voice awakening method and device based on artificial intelligence and computer equipment
US11210475B2 (en) * 2018-07-23 2021-12-28 Google Llc Enhanced attention mechanisms
CN109872713A (en) * 2019-03-05 2019-06-11 深圳市友杰智新科技有限公司 A kind of voice awakening method and device
CN110619034A (en) * 2019-06-27 2019-12-27 中山大学 Text keyword generation method based on Transformer model
CN110534102B (en) * 2019-09-19 2020-10-30 北京声智科技有限公司 Voice wake-up method, device, equipment and medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant