CN112802452A - Junk instruction identification method and device - Google Patents

Junk instruction identification method and device

Info

Publication number: CN112802452A
Application number: CN202011521158.5A
Authority: CN (China)
Prior art keywords: audio, audio information, information, neural network, deep neural network
Other languages: Chinese (zh)
Inventors: 胡晓慧, 孟振南, 雷欣, 李志飞
Current assignees (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list): Volkswagen China Investment Co., Ltd.; Mobvoi Innovation Technology Co., Ltd.
Original assignee: Go Out And Ask (Wuhan) Information Technology Co., Ltd.
Application filed by Go Out And Ask (Wuhan) Information Technology Co., Ltd.; priority date and filing date: 2020-12-21; publication date: 2021-05-14
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A junk instruction identification method and device are disclosed. The method includes: acquiring audio information; converting the audio information into text information; extracting audio features of the audio information to generate an audio feature set; acquiring a feature vector of the text information by using a pre-trained text model; and taking the audio feature set and the feature vector as the input of a deep neural network classifier, and determining whether the audio information is a junk instruction according to the output of the deep neural network.

Description

Junk instruction identification method and device
Technical Field
The application relates to the technical field of natural language processing, in particular to a junk instruction identification method and device.
Background
At present, most smart devices have a speech recognition function, and a smart device can be in one of two speech recognition states: a wake-up state and a wake-up-free state. The main difference between the two is that in the wake-up state the user must first speak a wake-up word to wake the smart device and can only speak a command after the device has been woken up. Speech received by the smart device after it has been woken up can be regarded as a valid instruction, so functional recognition of the instruction content (such as checking the weather or playing music) is performed directly. In the wake-up-free state, by contrast, the smart device can keep up a continuous conversation after being woken up only once, without requiring a wake-up word before every utterance, which provides a better user experience.
However, in the wake-up-free state the smart device must be able to recognize whether received audio is actually an instruction addressed to it, filter out invalid interference, and only then react. How to reliably identify whether received audio is a junk instruction therefore needs to be solved.
Disclosure of Invention
In order to solve the above problem, the present invention provides a junk instruction identification method and apparatus that can reliably identify whether received audio is a junk instruction, thereby improving the accuracy of audio recognition by a smart device in the wake-up-free state and improving the user experience.
In order to achieve the above object, in a first aspect, an embodiment of the present invention provides a junk instruction identification method, where the method includes:
acquiring audio information;
converting the audio information into text information;
extracting audio features of the audio information to generate an audio feature set;
acquiring a feature vector of the text information by using a pre-trained text model;
and taking the audio feature set and the feature vector as the input of a deep neural network classifier, and determining whether the audio information is a junk instruction according to the output of the deep neural network.
Preferably, after the acquiring of the audio information, the method further comprises: if the audio information cannot be converted into text information, determining that the audio information is a junk instruction, and discarding the audio information.
Preferably, after the audio feature set and the feature vector are taken as the input of the deep neural network classifier and whether the audio information is a junk instruction is determined according to the output of the deep neural network, the method further comprises: if the audio information is not a junk instruction, performing natural language understanding on the text information, and executing an action corresponding to the audio information; and if the audio information is a junk instruction, discarding the audio information.
Preferably, the taking of the audio feature set and the feature vector as the input of a deep neural network classifier and the determining of whether the audio information is a junk instruction according to the output of the deep neural network includes: combining the audio feature set and the feature vector into a one-dimensional feature, taking the one-dimensional feature as the input of the deep neural network classifier, and determining whether the audio information is a junk instruction according to the output of the deep neural network.
Preferably, the audio features include: voice audio features, voice text features, and voice duration.
In a second aspect, an embodiment of the present invention provides a junk instruction identification apparatus, where the apparatus includes:
a first acquisition unit configured to acquire audio information;
the conversion unit is used for converting the audio information into text information;
the generating unit is used for extracting the audio features of the audio information to generate an audio feature set;
the second acquisition unit is used for acquiring the feature vector of the text information by using a pre-trained text model;
and the determining unit is used for taking the audio feature set and the feature vector as the input of a deep neural network classifier and determining whether the audio information is a junk instruction according to the output of the deep neural network.
Preferably, the apparatus further comprises: a discarding unit, used for determining that the audio information is a junk instruction and discarding the audio information if the audio information cannot be converted into text information.
Preferably, the apparatus further comprises: an execution unit, used for performing natural language understanding on the text information and executing the action corresponding to the audio information if the audio information is not a junk instruction; and a discarding unit, used for discarding the audio information if the audio information is a junk instruction.
Preferably, the determining unit is specifically used for: combining the audio feature set and the feature vector into a one-dimensional feature, taking the one-dimensional feature as the input of the deep neural network classifier, and determining whether the audio information is a junk instruction according to the output of the deep neural network.
Preferably, the audio features include: voice audio features, voice text features, and voice duration.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is used to execute the junk instruction identification method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the junk instruction identification method according to the first aspect.
With the junk instruction identification method and apparatus provided by the present invention, the audio features of the received audio information are combined with the features of the corresponding text information, both are fed to a deep neural network classifier at the same time, and the classifier is used to determine whether the received audio is a junk instruction. In this way, junk instructions can be identified reliably, so that in the wake-up-free state the smart device can effectively filter out invalid content, accurately recognize user instructions, and provide a better user experience.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a schematic flowchart of a method for identifying a spam instruction according to an exemplary embodiment of the present application;
fig. 2 is a block diagram of a garbage instruction recognition apparatus according to an exemplary embodiment of the present application;
fig. 3 is a block diagram of another garbage instruction identification apparatus according to an exemplary embodiment of the present application;
fig. 4 is a block diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Fig. 1 is a schematic flowchart of a junk instruction identification method according to an embodiment of the present application. The method can be applied to an electronic device. As shown in Fig. 1, the method includes:
Step 101, audio information is acquired.
In one example, the application scenario of the junk instruction identification method is that the electronic device is in the wake-up-free state. In this scenario, the acquired audio information may include background sound or speech, where the speech may be a valid instruction or merely the user's casual conversation.
Step 102, converting the audio information into text information.
Specifically, the audio information may be recognized by an Automatic Speech Recognition (ASR) module in the electronic device, and the audio information may be converted into text information.
It should be understood that not all audio information can be recognized and converted into text information, for example noisy background sound. If the received audio information cannot be converted into text information, the audio information can be regarded as a junk instruction. Based on this, the method may further comprise:
if the audio information cannot be converted into text information, determining that the audio information is a junk instruction, and discarding the audio information.
Step 103, extracting the audio features of the audio information to generate an audio feature set.
The audio features include, but are not limited to: voice audio features, voice text features, and voice duration. The extracted audio features are then accumulated and combined to obtain the audio feature set.
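A sketch of step 103 under stated assumptions: the patent only names the feature categories (voice audio features, voice text features, voice duration), so the concrete statistics below (duration, RMS energy, zero-crossing rate) and the vector layout are illustrative choices, not the patented feature set.

```python
import numpy as np


def extract_audio_features(samples: np.ndarray, sample_rate: int) -> np.ndarray:
    """Step 103: build the audio feature set as one 1-D vector of length m."""
    x = samples.astype(np.float64)
    duration = len(x) / float(sample_rate)                 # voice duration in seconds
    rms = float(np.sqrt(np.mean(x ** 2)))                  # overall signal energy
    zcr = float(np.mean(np.abs(np.diff(np.sign(x)))) / 2)  # zero-crossing rate
    # Accumulate and combine the individual features into the feature set.
    return np.array([duration, rms, zcr], dtype=np.float32)
```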
Step 104, acquiring a feature vector of the text information by using a pre-trained text model.
It should be noted that the process of obtaining the feature vector of the text information can be implemented by using the prior art, and is not described herein again.
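For completeness, a sketch of step 104 under the assumption that the pre-trained text model is a word-embedding table and that the feature vector is obtained by mean pooling over token embeddings; the tokenization, the embedding table, and the pooling choice are all assumptions, since the patent leaves the text model to the prior art.

```python
from typing import Dict, List

import numpy as np


def text_feature_vector(tokens: List[str],
                        embeddings: Dict[str, np.ndarray],
                        dim: int = 128) -> np.ndarray:
    """Step 104: return a 1-D feature vector of length n for the recognized text.

    `tokens` is the segmented text (for Chinese, produced by a word segmenter);
    `embeddings` maps each token to a pre-trained vector of length `dim`.
    """
    vectors = [embeddings[tok] for tok in tokens if tok in embeddings]
    if not vectors:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(vectors, axis=0).astype(np.float32)
```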
Step 105, taking the audio feature set and the feature vector as the input of a deep neural network classifier, and determining whether the audio information is a junk instruction according to the output of the deep neural network.
Specifically, taking the audio feature set and the feature vector as the input of a deep neural network classifier and determining whether the audio information is a junk instruction according to the output of the deep neural network includes the following steps:
combining the audio feature set and the feature vector into a one-dimensional feature, taking the one-dimensional feature as the input of the deep neural network classifier, and determining whether the audio information is a junk instruction according to the output of the deep neural network.
In a specific example, the audio feature set is a one-dimensional feature 1 with a length of m, and the feature vector of the text information is a one-dimensional feature 2 with a length of n. The one-dimensional feature 1 and the one-dimensional feature 2 are spliced into a one-dimensional feature with a length of (m + n), the spliced one-dimensional feature is used as the input of the deep neural network classifier, and whether the audio information is a junk instruction is determined according to the output of the deep neural network.
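The following PyTorch sketch shows the splicing of the length-m and length-n features into a single length-(m + n) feature and a small feed-forward classifier over it; the hidden sizes, layer count, and 0.5 decision threshold are illustrative assumptions, not values given in the patent.

```python
import torch
import torch.nn as nn


class JunkInstructionClassifier(nn.Module):
    """Deep neural network classifier over the spliced (m + n) feature."""

    def __init__(self, m: int, n: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(m + n, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # probability that the audio information is a junk instruction
        )

    def forward(self, audio_features: torch.Tensor, text_features: torch.Tensor) -> torch.Tensor:
        # Splice one-dimensional feature 1 (length m) and one-dimensional
        # feature 2 (length n) into one feature of length (m + n).
        combined = torch.cat([audio_features, text_features], dim=-1)
        return self.net(combined)


def is_junk_instruction(model: JunkInstructionClassifier,
                        audio_features: torch.Tensor,
                        text_features: torch.Tensor,
                        threshold: float = 0.5) -> bool:
    """Decide from the classifier output whether the audio is a junk instruction."""
    with torch.no_grad():
        return bool(model(audio_features, text_features).item() > threshold)
```

A classifier of this shape would be trained beforehand on labeled examples of valid instructions and junk audio; the patent does not detail the training procedure.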
In one example, the method may further comprise:
and if the audio information is not the junk instruction, performing natural language understanding on the text information and executing the action corresponding to the audio information.
If the audio information is a junk instruction, the audio information is discarded.
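Tying the steps together, the sketch below follows the flow of Fig. 1 using the helpers from the previous sketches; `run_nlu`, `execute_action`, and the `tokenizer` argument are hypothetical stand-ins for the device's natural language understanding, action execution, and word segmentation modules.

```python
import torch


def run_nlu(text: str):
    """Hypothetical natural-language-understanding call returning an intent."""
    raise NotImplementedError


def execute_action(intent) -> None:
    """Hypothetical dispatch of the action corresponding to the instruction."""
    raise NotImplementedError


def handle_audio(samples, sample_rate, embeddings, tokenizer, model) -> None:
    """End-to-end flow of Fig. 1: discard junk instructions, otherwise understand and act."""
    text = audio_to_text(samples, sample_rate)                                        # steps 101-102
    if text is None:
        return                                                                        # junk instruction: discard
    audio_feats = torch.from_numpy(extract_audio_features(samples, sample_rate))      # step 103
    text_feats = torch.from_numpy(text_feature_vector(tokenizer(text), embeddings))   # step 104
    if is_junk_instruction(model, audio_feats, text_feats):                           # step 105
        return                                                                        # junk instruction: discard
    execute_action(run_nlu(text))                                                     # valid instruction: understand and act
```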
With the junk instruction identification method provided by the embodiment of the present invention, the audio features of the received audio information are combined with the features of the corresponding text information, both are fed to the deep neural network classifier at the same time, and the classifier is used to determine whether the received audio is a junk instruction. In this way, junk instructions can be identified reliably, so that in the wake-up-free state the smart device can effectively filter out invalid content, accurately recognize user instructions, and provide a better user experience.
An embodiment of the present invention provides a junk instruction identification apparatus, and Fig. 2 is a structural diagram of the apparatus. The apparatus can be applied to an electronic device. As shown in Fig. 2, the junk instruction identification apparatus includes:
a first acquisition unit 201 for acquiring audio information;
a conversion unit 202, configured to convert the audio information into text information;
a generating unit 203, configured to extract audio features of the audio information to generate an audio feature set;
a second obtaining unit 204, configured to obtain a feature vector of the text information by using a pre-trained text model;
the determining unit 205 is configured to take the audio feature set and the feature vector as the input of a deep neural network classifier and determine whether the audio information is a junk instruction according to the output of the deep neural network.
Preferably, as shown in Fig. 3, the apparatus further comprises: a discarding unit 206, configured to determine that the audio information is a junk instruction and discard the audio information if the audio information cannot be converted into text information.
Preferably, as shown in Fig. 3, the apparatus further comprises: an executing unit 207, configured to perform natural language understanding on the text information and execute an action corresponding to the audio information if the audio information is not a junk instruction; and a discarding unit 206, configured to discard the audio information if the audio information is a junk instruction.
Preferably, the determining unit 205 is specifically configured to: combine the audio feature set and the feature vector into a one-dimensional feature, take the one-dimensional feature as the input of the deep neural network classifier, and determine whether the audio information is a junk instruction according to the output of the deep neural network.
Preferably, the audio features include: voice audio features, voice text features, and voice duration.
With the junk instruction identification apparatus provided by the present invention, the audio features of the received audio information are combined with the features of the corresponding text information, both are fed to the deep neural network classifier at the same time, and the classifier is used to determine whether the received audio is a junk instruction. In this way, junk instructions can be identified reliably, so that in the wake-up-free state the smart device can effectively filter out invalid content, accurately recognize user instructions, and provide a better user experience.
Next, an electronic device 11 according to an embodiment of the present application is described with reference to Fig. 4.
As shown in fig. 4, the electronic device 11 includes one or more processors 111 and memory 112.
The processor 111 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 11 to perform desired functions.
Memory 112 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 111 to implement the spam instruction identification methods of the various embodiments of the application described above and/or other desired functionality. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 11 may further include: an input device 113 and an output device 114, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 113 may include, for example, a keyboard, a mouse, and the like.
The output device 114 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 114 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for the sake of simplicity, only some of the components of the electronic device 11 relevant to the present application are shown in fig. 4, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 11 may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the junk instruction identification method according to various embodiments of the present application described in the above-mentioned "exemplary methods" section of this specification.
The computer program product may be written with program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the junk instruction recognition method according to various embodiments of the present application described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, systems referred to in this application are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, configurations, etc. must be made in the manner shown in the block diagrams. These devices, apparatuses, devices, systems may be connected, arranged, configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "as used herein mean, and are used interchangeably with, the word" and/or, "unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (12)

1. A junk instruction recognition method, the method comprising:
acquiring audio information;
converting the audio information into text information;
extracting audio features of the audio information to generate an audio feature set;
acquiring a feature vector of the text information by using a pre-trained text model;
and taking the audio feature set and the feature vector as the input of a deep neural network classifier, and determining whether the audio information is a junk instruction according to the output of the deep neural network.
2. The method of claim 1, wherein after the acquiring of the audio information, the method further comprises:
if the audio information cannot be converted into text information, determining that the audio information is a junk instruction, and discarding the audio information.
3. The method of claim 1, wherein after the taking of the audio feature set and the feature vector as the input of the deep neural network classifier and the determining of whether the audio information is a junk instruction according to the output of the deep neural network, the method further comprises:
if the audio information is not a junk instruction, performing natural language understanding on the text information, and executing an action corresponding to the audio information;
discarding the audio information if the audio information is a junk instruction.
4. The method of claim 1, wherein the taking of the audio feature set and the feature vector as the input of a deep neural network classifier and the determining of whether the audio information is a junk instruction according to the output of the deep neural network comprises:
combining the audio feature set and the feature vector into a one-dimensional feature, taking the one-dimensional feature as the input of the deep neural network classifier, and determining whether the audio information is a junk instruction according to the output of the deep neural network.
5. The method of claim 1, wherein the audio features comprise: voice audio features, voice text features, and voice duration.
6. A junk instruction recognition apparatus, the apparatus comprising:
a first acquisition unit configured to acquire audio information;
the conversion unit is used for converting the audio information into text information;
the generating unit is used for extracting the audio features of the audio information to generate an audio feature set;
the second acquisition unit is used for acquiring the feature vector of the text information by using a pre-trained text model;
and the determining unit is used for taking the audio feature set and the feature vector as the input of a deep neural network classifier and determining whether the audio information is a junk instruction according to the output of the deep neural network.
7. The apparatus of claim 6, further comprising:
and the discarding unit is used for determining that the audio information is a junk instruction and discarding the audio information if the audio information cannot be converted into text information.
8. The apparatus of claim 6, further comprising:
the execution unit is used for performing natural language understanding on the text information and executing the action corresponding to the audio information if the audio information is not a junk instruction;
a discarding unit configured to discard the audio information if the audio information is a junk instruction.
9. The apparatus according to claim 6, wherein the determining unit is specifically configured to:
combine the audio feature set and the feature vector into a one-dimensional feature, take the one-dimensional feature as the input of the deep neural network classifier, and determine whether the audio information is a junk instruction according to the output of the deep neural network.
10. The apparatus of claim 6, wherein the audio features comprise: voice audio features, voice text features, and voice duration.
11. A computer-readable storage medium storing a computer program for executing the junk instruction identification method according to any one of claims 1-5.
12. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is used for reading the executable instructions from the memory and executing the instructions to implement the junk instruction identification method according to any one of claims 1-5.
CN202011521158.5A (filed 2020-12-21, priority date 2020-12-21): Junk instruction identification method and device, status Pending, published as CN112802452A (en)

Priority Applications (1)

Application number: CN202011521158.5A; priority date: 2020-12-21; filing date: 2020-12-21; title: Junk instruction identification method and device; publication: CN112802452A (en)

Publications (1)

Publication number: CN112802452A; publication date: 2021-05-14

Family

ID: 75807103

Family Applications (1)

Application number: CN202011521158.5A; title: Junk instruction identification method and device; status: Pending; publication: CN112802452A (en)

Country Status (1)

Country: CN (1); link: CN112802452A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190062008A (en) * 2017-11-28 2019-06-05 한국전자통신연구원 Deep-Neural network based state determination appratus and method for speech recognition acoustic models
JP2019200408A (en) * 2018-05-18 2019-11-21 バイドゥ オンライン ネットワーク テクノロジー (ベイジン) カンパニー リミテッド Method and device for generating voice synthesis model
CN109326289A (en) * 2018-11-30 2019-02-12 深圳创维数字技术有限公司 Exempt to wake up voice interactive method, device, equipment and storage medium
CN110139146A (en) * 2019-04-03 2019-08-16 深圳康佳电子科技有限公司 Speech recognition anti-interference method, device and storage medium based on Application on Voiceprint Recognition
CN110838289A (en) * 2019-11-14 2020-02-25 腾讯科技(深圳)有限公司 Awakening word detection method, device, equipment and medium based on artificial intelligence
CN111968625A (en) * 2020-08-26 2020-11-20 上海依图网络科技有限公司 Sensitive audio recognition model training method and recognition method fusing text information

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
TA01: Transfer of patent application right
    Effective date of registration: 2021-11-25
    Address after: 210000, 8th floor, building D11, Hongfeng Science and Technology Park, Nanjing Economic and Technological Development Zone, Jiangsu Province
    Applicant after: New Technology Co., Ltd.
    Applicant after: Volkswagen (China) Investment Co., Ltd.
    Address before: 430223, floor 30, building A, block K18, Poly Times, No. 332, Guanshan Avenue, Donghu New Technology Development Zone, Wuhan City, Hubei Province
    Applicant before: Go Out And Ask (Wuhan) Information Technology Co., Ltd.