CN110223429A - Voice access control system - Google Patents

Info

Publication number
CN110223429A
Authority
CN
China
Prior art keywords
voice
voice messaging
network
processor
access control
Prior art date
Legal status
Pending
Application number
CN201910534516.7A
Other languages
Chinese (zh)
Inventor
沈希忠
孙陈影
Current Assignee
Shanghai Institute of Technology
Original Assignee
Shanghai Institute of Technology
Priority date
Filing date
Publication date
Application filed by Shanghai Institute of Technology
Priority to CN201910534516.7A
Publication of CN110223429A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G07: CHECKING-DEVICES
    • G07C: TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C9/00: Individual registration on entry or exit
    • G07C9/30: Individual registration on entry or exit not involving the use of a pass
    • G07C9/32: Individual registration on entry or exit not involving the use of a pass in combination with an identity check
    • G07C9/37: Individual registration on entry or exit not involving the use of a pass in combination with an identity check using biometric data, e.g. fingerprints, iris scans or voice recognition
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G10L17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G10L17/04: Training, enrolment or model building
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G10L17/18: Artificial neural networks; Connectionist approaches
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G10L17/22: Interactive procedures; Man-machine interfaces

Abstract

The present invention provides a voice access control system that uses an anti-spoofing voice Turing test and deep learning to recognize live human speech and control a door. The system comprises a processor loaded with a Turing test module, a generative adversarial network, and a bidirectional GRU neural network; the processor is communicatively connected to an access control drive mechanism and opens or closes the door according to the voice features. The invention is suited to the specific setting in which the speaker is not on site. A Turing test is first applied to the acquired voice; once the voice is judged to be live human speech, it is enhanced by the generative adversarial network, features are extracted from the enhanced speech using parameters such as Mel-frequency cepstral coefficients (MFCC), and speaker recognition is completed by a deep bidirectional gated recurrent unit (GRU) network.

Description

Voice access control system
Technical field
The present invention relates to the field of voice processing technology and, in particular, to a voice access control system.
Background
With the rapid development of electronic information technology, conventional door locks are steadily becoming more advanced and intelligent, and intelligent identification systems that combine biometric recognition with conventional locks are gradually entering daily life. More and more enterprises apply such systems to access management and attendance management.
At present, the voiceprint information carried in speech is used as a biometric feature in access control systems.
However, existing voice access control systems are easily defeated by recorded or edited audio, which poses a security risk.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a voice access control system.
A voice access control system provided according to the present invention comprises a voice acquisition module, a processor, and an access control drive mechanism. The processor is loaded with a Turing test module, a generative adversarial network, and a bidirectional GRU neural network. The voice acquisition module is communicatively connected to the processor, and the processor is communicatively connected to the access control drive mechanism, wherein:
The voice acquisition module is configured to acquire voice information and transmit it to the processor;
The Turing test module is configured to analyze the voice information to determine whether it is live human voice information and, if so, to input the voice information into the generative adversarial network;
The generative adversarial network is configured to enhance the received voice information and obtain enhanced voice information;
The bidirectional GRU neural network is configured to extract features from the enhanced voice information to obtain voice features, and to judge whether the voice features match those of a target person; wherein the bidirectional GRU neural network extracts features from the enhanced voice information by means of MFCC, and the bidirectional GRU neural network is a learning model that has been trained in advance to recognize voice features;
The processor is configured to control the access control drive mechanism to open the door when the voice features match those of the target person.
Optionally, the Turing test module is specifically configured to: upon receiving the voice information, randomly generate a preset number of questions; if the correct answers to the questions are received within a preset time, the Turing test is passed.
Optionally, the generative adversarial network comprises a discriminator and a generator. The generator is configured to enhance the voice information and input the enhanced voice information into the discriminator; the discriminator is configured to judge whether the enhanced voice information output by the generator is real speech.
Optionally, the bidirectional GRU neural network uses Mel-frequency cepstral coefficients to extract 39-dimensional MFCC feature parameters as the voice features.
Optionally, the processor is a TMS320DM8168 development board, on which the Turing test module, the generative adversarial network, and the bidirectional GRU neural network are loaded.
Compared with the prior art, the present invention has the following beneficial effects:
The voice access control system provided by the present invention can determine whether the person being identified is genuine and improves speech recognition accuracy, thereby achieving accurate access control with higher security and a better user experience.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is a schematic diagram of the voice access control system provided by one embodiment of the present invention;
Fig. 2 is a flow diagram of the SEGAN training method;
Fig. 3 is a structural diagram of the deep bidirectional GRU in one embodiment of the present invention;
Fig. 4 is the hardware implementation block diagram in one embodiment of the present invention.
Detailed description of embodiments
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit it in any way. It should be noted that those of ordinary skill in the art can make several changes and improvements without departing from the inventive concept, all of which fall within the protection scope of the present invention.
Fig. 1 is a schematic diagram of the voice access control system provided by one embodiment of the present invention. As shown in Fig. 1, the speaker's voice information is acquired first. When voice information is received, the Turing test is started; if the test judges the input to be live human voice information, the voice information that passed the test is input to the deep learning module. The deep learning module comprises a generative adversarial network and a deep bidirectional GRU network.
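The flow in Fig. 1 can be sketched as a short pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `process_utterance` and the four callables (`passes_turing_test`, `enhance`, `extract_features`, `matches_target`) are hypothetical stand-ins for the Turing test module, the generative adversarial network, the MFCC front end, and the bidirectional GRU classifier.

```python
from typing import Callable, Sequence

Audio = Sequence[float]
Features = Sequence[Sequence[float]]


def process_utterance(
    audio: Audio,
    passes_turing_test: Callable[[Audio], bool],
    enhance: Callable[[Audio], Audio],
    extract_features: Callable[[Audio], Features],
    matches_target: Callable[[Features], bool],
) -> bool:
    """Return True if the door should open, False if it stays closed."""
    # Stage 1: liveness check. Input that fails the test never reaches the networks.
    if not passes_turing_test(audio):
        return False
    # Stage 2: speech enhancement (the GAN generator in the patent).
    enhanced = enhance(audio)
    # Stage 3: feature extraction (39-dimensional MFCC frames in the patent).
    features = extract_features(enhanced)
    # Stage 4: speaker decision (the bidirectional GRU in the patent).
    return matches_target(features)
```

Passing the stages in as callables keeps the ordering of the checks separate from how each stage is implemented, so a stage can be replaced by a stub when testing the control flow.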
In the present embodiment, the Turing test is first used to judge that the speaker being identified is a real, live person rather than a recording or machine-simulated speech. In a Turing test, a tester puts arbitrary questions to the test subject through a special channel. After multiple rounds, if more than 30% of testers cannot determine whether the subject is a human or a machine, the machine has passed the test. In this specific setting, the present invention applies the Turing test method by asking random questions: a recording cannot answer questions posed in real time, and machines have not yet developed to the level of a real human. To date, very few machines can pass a Turing test, so the test can accurately judge whether the speaker being identified is a real person or something else.
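The random-question check can be sketched as a small challenge-response routine. This is an illustrative sketch only: the names `run_challenge`, `question_bank`, and `ask` are hypothetical, the default of three questions and a ten-second limit are placeholder values, and how a spoken reply is turned into text is outside the sketch.

```python
import random
import time
from typing import Callable, Dict, Optional


def run_challenge(
    question_bank: Dict[str, str],
    ask: Callable[[str], Optional[str]],
    num_questions: int = 3,
    time_limit_s: float = 10.0,
    rng: Optional[random.Random] = None,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Pass only if every randomly chosen question is answered correctly in time."""
    rng = rng or random.Random()
    # Draw the preset number of questions at random so replies cannot be pre-recorded.
    questions = rng.sample(sorted(question_bank), num_questions)
    deadline = clock() + time_limit_s
    for question in questions:
        answer = ask(question)  # hypothetical hook: prompt the speaker, return the reply text
        if clock() > deadline:
            return False  # the reply arrived after the preset time
        if answer is None or answer.strip().lower() != question_bank[question].lower():
            return False  # wrong or missing answer
    return True
```

The clock is injected so the time limit can be exercised in tests without waiting; in this sketch the limit covers the whole set of questions rather than each question separately.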
In the present embodiment, the speaker's voice is enhanced using a GAN. A generative adversarial network (GAN) is a deep learning model and one of the most promising recent methods for unsupervised learning on complex distributions. A GAN contains two adversarial models. The generative model (G) takes a noisy picture as input and outputs a picture that looks real, in order to fool the discriminative model. The discriminative model (D) judges whether a given picture is real; its inputs include pictures drawn from the data set and pictures output by the generative network. At the start, neither model is trained. The two models are trained adversarially together: the generative model produces pictures to deceive the discriminative model, and the discriminative model judges whether each picture is real or fake. Both models become progressively stronger until a steady state is reached.
The present invention uses SEGAN (Speech Enhancement GAN), a speech enhancement method based on adversarial networks. Its advantages are: 1) it provides a fast enhancement process, with no causality requirement and no recursive operations of the kind found in RNNs; 2) it works on the raw audio, so no hand-crafted features are extracted and no specific assumptions are made about the raw data; 3) it learns from different speakers and noise types and merges them into the same shared parameters, which keeps the system simple and improves generalization. The input of SEGAN is a noisy speech signal and a latent representation, and the output is the enhanced signal. The generator is designed to be fully convolutional (with no fully connected layers), which reduces the number of trainable parameters and shortens training time. An important feature of the generator network is its end-to-end structure: it processes the raw speech signal directly and avoids extracting acoustic features through an intermediate transformation. During training, the discriminator passes information about what is real and what is fake in the input data to the generator, so that the generator can adjust its output waveform toward the real distribution and eliminate the interfering signal. Speaker recognition is then performed on the speech signal enhanced by SEGAN.
The entire SEGAN network is built from CNNs. The generator G is an encoder-decoder, while the structure of D is an encoder followed by a dimensionality reduction layer that reduces 8 × 1024 parameters to 8. The encoder consists of one-dimensional convolutional layers with stride 2. The input is the noisy speech signal and the output is the enhanced speech signal. Speech enhancement is carried out by the G network, which takes the noisy signal together with the latent variable z as input and outputs the enhanced signal. G is a fully convolutional network similar to an autoencoder. In the encoding stage, the input signal is projected and compressed through a series of strided convolutional layers, each followed by an activation function, with one convolution result produced every N steps. Experiments show that strided convolution works better than pooling in GAN training. Skip connections pass the fine-grained information of the waveform directly to the decoding stage and let gradients flow deeper through the whole structure, which prevents low-level details from being lost when the speech waveform is reconstructed. The decoding stage reverses the encoding stage, replacing the strided convolutions with fractionally strided (transposed) convolutions and using the same activation functions as the encoder.
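The effect of stacking stride-2 convolutions can be checked with a few lines of arithmetic. The sketch below only computes sequence lengths; the kernel width of 31, the 16384-sample input window, and the 11 layers are assumptions taken from the published SEGAN design rather than from this patent, and the function names are hypothetical.

```python
from typing import List


def strided_conv_length(n: int, kernel: int = 31, stride: int = 2) -> int:
    """Output length of a 1-D convolution with symmetric padding of kernel // 2."""
    pad = kernel // 2
    return (n + 2 * pad - kernel) // stride + 1


def encoder_lengths(n: int, num_layers: int) -> List[int]:
    """Sequence lengths after each stride-2 layer, starting with the input length."""
    lengths = [n]
    for _ in range(num_layers):
        lengths.append(strided_conv_length(lengths[-1]))
    return lengths


# Assumed figures from the published SEGAN design: 16384 samples, 11 layers.
print(encoder_lengths(16384, 11))
# prints [16384, 8192, 4096, 2048, 1024, 512, 256, 128, 64, 32, 16, 8]
```

Each layer halves the time axis, so eleven layers leave 8 time steps. With 1024 channels in the last layer (again an assumption from the published design), this is consistent with the 8 × 1024 figure quoted for the input to the dimensionality reduction layer.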
The SEGAN training process has three stages. (1) The discriminator D is given noisy speech together with the corresponding clean speech, the label is set to real, and the parameters of D are trained. (2) The generator G takes noisy speech and generates enhanced speech, which is input to D together with the noisy speech; the label is set to fake, and the parameters of D are updated. (3) D is fixed, step (2) is repeated, and the parameters of G are updated. After these three steps, G is the speech enhancement network. The training process is shown in Fig. 2.
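The three stages can be made concrete by writing down the objective each stage minimizes. The sketch below is an assumption layered on the text: the least-squares form of the losses and the L1 weight of 100 follow the published SEGAN work and are not stated in this patent, and the function names are hypothetical. For stage (3) it follows the usual adversarial convention in which the generator is rewarded when the frozen D scores its output as real; the patent text itself only says that step (2) is repeated with D fixed.

```python
from typing import Sequence


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Stages (1) and (2): push D toward 1 on (clean, noisy) and toward 0 on (enhanced, noisy)."""
    return 0.5 * (d_real - 1.0) ** 2 + 0.5 * d_fake ** 2


def generator_loss(
    d_fake: float,
    enhanced: Sequence[float],
    clean: Sequence[float],
    l1_weight: float = 100.0,  # assumed value from the published SEGAN work
) -> float:
    """Stage (3): with D frozen, push D's score on the enhanced pair toward 1.

    The L1 term (averaged over samples in this sketch) keeps the enhanced
    waveform close to the clean reference.
    """
    l1 = sum(abs(e - c) for e, c in zip(enhanced, clean)) / len(clean)
    return 0.5 * (d_fake - 1.0) ** 2 + l1_weight * l1
```

Here `d_real` and `d_fake` are D's scalar scores for the (clean, noisy) pair and the (enhanced, noisy) pair. A perfect discriminator gives `discriminator_loss(1.0, 0.0) == 0.0`, and a generator that reproduces the clean signal and fully convinces D gives a generator loss of zero.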
Specifically, Fig. 3 is a structural diagram of the deep bidirectional GRU in one embodiment of the present invention. As shown in Fig. 3, a deep bidirectional GRU is used for speech recognition. Mel-frequency cepstral coefficients (Mel Frequency Cepstral Coefficient, MFCC) are used as the speech feature parameters, and 39-dimensional MFCC feature parameters are taken as the input to the deep learning model. The present invention uses a deep bidirectional GRU (BiGRU) for speaker recognition. LSTM (Long Short-Term Memory) was proposed to overcome the inability of RNNs (Recurrent Neural Networks) to handle long-range dependencies well. GRU (Gated Recurrent Unit) is a variant of LSTM: based on an analysis of which parts of the LSTM architecture are really necessary, it merges the forget gate and the input gate into a single update gate. It also merges the cell state and the hidden state, so the final model is simpler and more efficient than a bidirectional LSTM model. A GRU has only two gates, the update gate $z_t$ and the reset gate $r_t$. The update gate controls the balance between the previous hidden state and the new candidate state; under formula (2), the previous state is weighted by $(1 - z_t)$, so a larger $z_t$ carries over less of it. The reset gate controls how much of the previous state information is ignored when the candidate state is formed: the smaller its value, the more is ignored. The forward propagation formulas are defined as follows:
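$$
\begin{aligned}
net_{r,t} &= w_{rh} h_{t-1} + w_{rx} x_t + b_r \\
net_{z,t} &= w_{zh} h_{t-1} + w_{zx} x_t + b_z \\
net_{g,t} &= w_{gh} (r_t * h_{t-1}) + w_{gx} x_t + b_g
\end{aligned}
\tag{1}
$$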
where $net_{r,t}$ is the reset gate network state at time $t$, $w_{rh}$ is the reset gate weight applied to the hidden state of time $t-1$, $h_{t-1}$ is the hidden state at time $t-1$, $w_{rx}$ is the reset gate weight applied to the input at time $t$, $x_t$ is the input at time $t$, $b_r$ is the reset gate bias, $net_{z,t}$ is the update gate network state at time $t$, $w_{zh}$ is the update gate weight applied to the hidden state of time $t-1$, $w_{zx}$ is the update gate weight applied to the input at time $t$, and $b_z$ is the update gate bias. The terms $net_{g,t}$, $w_{gh}$, $w_{gx}$, and $b_g$ are the corresponding quantities for the candidate state.
From the definition of the GRU network structure, it follows that:
$$
\begin{aligned}
r_t &= \mathrm{sigmoid}(net_{r,t}) \\
z_t &= \mathrm{sigmoid}(net_{z,t}) \\
g_t &= \tanh(net_{g,t}) \\
h_t &= (1 - z_t) * h_{t-1} + z_t * g_t
\end{aligned}
\tag{2}
$$
where $r_t$ is the reset gate output at time $t$, $z_t$ is the update gate output at time $t$, $g_t$ is the reset-gate-controlled (candidate) state at time $t$, $h_t$ is the hidden state at time $t$, and $h_{t-1}$ is the hidden state at time $t-1$.
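Formulas (1) and (2) translate directly into a single forward step. The following is a minimal pure-Python sketch for small vectors; the function names and the `params` dictionary layout are hypothetical, and a practical implementation would use an array library rather than nested lists.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def gru_step(x_t: Vector, h_prev: Vector, params: Dict[str, list]) -> Vector:
    """One forward step following formulas (1) and (2)."""

    def net(w_h: str, w_x: str, b: str, h_in: Vector) -> Vector:
        # w_h * h_in + w_x * x_t + b, as in each line of formula (1)
        return [
            a + c + d
            for a, c, d in zip(matvec(params[w_h], h_in), matvec(params[w_x], x_t), params[b])
        ]

    r = [sigmoid(a) for a in net("w_rh", "w_rx", "b_r", h_prev)]   # reset gate
    z = [sigmoid(a) for a in net("w_zh", "w_zx", "b_z", h_prev)]   # update gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]                  # r_t * h_{t-1}
    g = [math.tanh(a) for a in net("w_gh", "w_gx", "b_g", gated)]   # candidate state
    # h_t = (1 - z_t) * h_{t-1} + z_t * g_t
    return [(1.0 - zi) * hi + zi * gi for zi, hi, gi in zip(z, h_prev, g)]
```

With all weights and biases set to zero, both gates evaluate to 0.5 and the candidate state to 0, so the step simply halves the previous hidden state; this makes a convenient sanity check.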
For GRU network layer $l$, at $t = T$ the error on the hidden state is the error propagated back from layer $l+1$. For $t \in [0, T)$ the error consists of two parts: the error propagated back from layer $l+1$ at time $t$, and the error propagated back from time $t+1$. The error is defined as follows: at time $t$ the output of the GRU is $h_t$, and the error at time $t$ is:
[formula missing from the source text]
From formulas (1) and (2), $net_{r,t}$, $net_{z,t}$, $net_{g,t}$, and $h_t$ are all functions of $h_{t-1}$. From the definition of the error and the total derivative formula:
[formula missing from the source text]
where $E$ is the identity matrix, $\delta_{t-1}$ is the back-propagated error at time $t-1$, [symbol missing from the source text] is the back-propagated error of layer $l-1$ at time $t-1$, $\delta_{z,t}$ is the back-propagated error at the update gate output at time $t$, $\delta_{r,t}$ is the back-propagated error at the reset gate output at time $t$, $\delta_{g,t}$ is the back-propagated error of the reset-gate-controlled state at time $t$, and $\delta_t$ is the back-propagated error at time $t$.
The errors $\delta_{z,t}$, $\delta_{r,t}$, $\delta_{g,t}$ of each unit at time $t$ are given by:
[formula missing from the source text]
Applying formula (4) recursively gives the error at each time step.
From the errors $\delta_{z,t}$, $\delta_{r,t}$, $\delta_{g,t}$ at each time step, the weight and bias gradients are computed, starting with $\Delta w_{zh,t}$, $\Delta w_{rh,t}$, and $\Delta w_{gh,t}$:
[formula missing from the source text]
where $w^{+}$ is the weight after the error-driven update, $w$ is the weight at time $t$, $\Delta w$ is the weight gradient at time $t$, $\eta$ is a coefficient, $b^{+}$ is the bias after the error-driven update, $b$ is the bias at time $t$, $\Delta b$ is the bias gradient at time $t$, $\Delta w_{zh,t}$ is the update gate output weight gradient at time $t$, $\Delta w_{rh,t}$ is the reset gate output weight gradient at time $t$, and $\Delta w_{gh,t}$ is the weight gradient of the reset-gate-controlled state at time $t$.
The gradients at all time steps are summed to obtain the total gradient:
[formula missing from the source text]
The input of the GRU is the output of the previous network layer, defined in terms of $f^{l-1}$, the activation function of layer $l-1$. From the definition in formula (1), the network states are functions of this input, so by the total derivative formula:
[formula missing from the source text]
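As a cross-check on the quantities named above, the gate errors and gradients can be derived directly from formulas (1) and (2). The following is a standard single-layer backpropagation-through-time derivation offered as a sketch; it may differ in notation and grouping from the patent's own formulas. The symbol $L$ for the training loss is an assumption (chosen to avoid clashing with the identity matrix $E$), $\delta_t = \partial L / \partial h_t$, $\odot$ is the element-wise product, and vectors are columns.

```latex
\begin{aligned}
\delta_{z,t} &= \delta_t \odot (g_t - h_{t-1}) \odot z_t \odot (1 - z_t) \\
\delta_{g,t} &= \delta_t \odot z_t \odot (1 - g_t^{2}) \\
\delta_{r,t} &= \left(w_{gh}^{\top}\,\delta_{g,t}\right) \odot h_{t-1} \odot r_t \odot (1 - r_t) \\
\left.\frac{\partial L}{\partial h_{t-1}}\right|_{\text{via } t}
  &= \delta_t \odot (1 - z_t) + w_{zh}^{\top}\,\delta_{z,t} + w_{rh}^{\top}\,\delta_{r,t}
     + \left(w_{gh}^{\top}\,\delta_{g,t}\right) \odot r_t \\
\Delta w_{zh,t} &= \delta_{z,t}\, h_{t-1}^{\top}, \qquad
\Delta w_{rh,t} = \delta_{r,t}\, h_{t-1}^{\top}, \qquad
\Delta w_{gh,t} = \delta_{g,t}\,(r_t \odot h_{t-1})^{\top} \\
\Delta b_{z,t} &= \delta_{z,t}, \qquad \Delta b_{r,t} = \delta_{r,t}, \qquad \Delta b_{g,t} = \delta_{g,t} \\
w^{+} &= w - \eta \sum_{t} \Delta w_{t}, \qquad b^{+} = b - \eta \sum_{t} \Delta b_{t}
\end{aligned}
```

Each gate error is the hidden-state error multiplied by the local derivative of formula (2) and the derivative of the gate's activation. The fourth line collects the four paths by which $h_{t-1}$ influences $h_t$; in a stacked network, the error arriving from layer $l+1$ at time $t-1$ is added to it to form $\delta_{t-1}$.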
The above is the computation of the bidirectional GRU. Deep learning has been applied successfully to speech recognition, and a deep bidirectional GRU can perform speaker recognition more efficiently, which is why the present invention uses BiGRUs.
Specifically, the training and discrimination process of the bidirectional GRU neural network is as follows:
(1) Mel-frequency cepstral coefficients (Mel Frequency Cepstral Coefficient, MFCC) are used as feature parameters; 39-dimensional MFCC feature parameters are taken as the input to the deep learning model.
(2) The feature data are processed by a fully connected layer.
(3) In the Bi-GRU, forward propagation and backward propagation are combined to train on the data.
(4) A softmax classifier outputs the classification result.
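Steps (3) and (4) can be sketched as two recurrent passes over the feature frames followed by a softmax decision. This is an illustrative sketch that assumes step (3) refers to the forward-in-time and backward-in-time passes of a bidirectional network; the names `run_bidirectional`, `step_fwd`, and `step_bwd` are hypothetical stand-ins for trained GRU cells, and the projection from the concatenated state to per-speaker scores is omitted.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def run_bidirectional(
    frames: Sequence[Vector],
    step_fwd: Callable[[Vector, Vector], Vector],
    step_bwd: Callable[[Vector, Vector], Vector],
    h0: Vector,
) -> Vector:
    """Run one recurrent pass in each time direction and concatenate the final states."""
    h_f = list(h0)
    for x in frames:            # past -> future
        h_f = step_fwd(x, h_f)
    h_b = list(h0)
    for x in reversed(frames):  # future -> past
        h_b = step_bwd(x, h_b)
    return h_f + h_b


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over per-speaker scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores: Vector) -> int:
    """Index of the most probable enrolled speaker."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)
```

Concatenating the two final states gives the classifier a summary of the utterance read in both directions, which is the usual motivation for a bidirectional recurrent layer.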
Fig. 4 is the hardware implementation block diagram in one embodiment of the present invention. As shown in Fig. 4, voice information is acquired first, and the Turing test is then carried out on a computer. If the test is passed, the analysis and comparison of the voice features are executed on the TMS320DM8168 development board, which is pre-loaded with the program instructions (the program instructions may also be stored in DDR or NVRAM and called at run time by the processing chip of the TMS320DM8168 development board), to obtain a discrimination result. When the result is that the collected voice features match the pre-stored voice features of the target person, the result is transmitted over the bus (PCI) to the access control drive mechanism, which opens the door. If the result is that the collected voice features do not match the pre-stored voice features of the target person, the door remains closed.
In the present embodiment, the algorithm program can be loaded onto the TMS320DM8168 development board; combining software and hardware makes testing more convenient and efficient. The TMS320DM8168 is a high-end floating-point DSP + ARM dual-core development board that is stable, convenient, and reliable, and can perform speaker recognition effectively.
The present invention uses a generative adversarial network to enhance the collected speech and suppress noise, and uses a deep bidirectional GRU for speaker recognition. The approach is adaptive, general, and efficient, which improves the security and reliability of the access control system. Implementing the above method on the TMS320DM8168 makes it faster and more reliable.
Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the particular embodiments described; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the invention. Where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another arbitrarily.

Claims (5)

1. A voice access control system, characterized by comprising: a voice acquisition module, a processor, and an access control drive mechanism, wherein the processor is loaded with a Turing test module, a generative adversarial network, and a bidirectional GRU neural network, the voice acquisition module is communicatively connected to the processor, and the processor is communicatively connected to the access control drive mechanism; wherein:
The voice acquisition module is configured to acquire voice information and transmit it to the processor;
The Turing test module is configured to analyze the voice information to determine whether it is live human voice information and, if so, to input the voice information into the generative adversarial network;
The generative adversarial network is configured to enhance the received voice information and obtain enhanced voice information;
The bidirectional GRU neural network is configured to extract features from the enhanced voice information to obtain voice features, and to judge whether the voice features match those of a target person; wherein the bidirectional GRU neural network extracts features from the enhanced voice information by means of MFCC, and the bidirectional GRU neural network is a learning model that has been trained in advance to recognize voice features;
The processor is configured to control the access control drive mechanism to open the door when the voice features match those of the target person.
2. The voice access control system according to claim 1, characterized in that the Turing test module is specifically configured to: upon receiving the voice information, randomly generate a preset number of questions; if the correct answers to the questions are received within a preset time, the Turing test is passed.
3. The voice access control system according to claim 1, characterized in that the generative adversarial network comprises a discriminator and a generator; the discriminator is configured to judge whether the enhanced voice information output by the generator is real speech; the generator is configured to enhance the voice information and input the enhanced voice information into the discriminator.
4. The voice access control system according to claim 1, characterized in that the bidirectional GRU neural network uses Mel-frequency cepstral coefficients to extract 39-dimensional MFCC feature parameters as the voice features.
5. The voice access control system according to claim 1, characterized in that the processor is a TMS320DM8168 development board, on which the Turing test module, the generative adversarial network, and the bidirectional GRU neural network are loaded.
CN201910534516.7A 2019-06-19 2019-06-19 Voice access control system Pending CN110223429A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910534516.7A CN110223429A (en) 2019-06-19 2019-06-19 Voice access control system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910534516.7A CN110223429A (en) 2019-06-19 2019-06-19 Voice access control system

Publications (1)

Publication Number Publication Date
CN110223429A true CN110223429A (en) 2019-09-10

Family

ID=67814121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910534516.7A Pending CN110223429A (en) 2019-06-19 2019-06-19 Voice access control system

Country Status (1)

Country Link
CN (1) CN110223429A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737634A (en) * 2012-05-29 2012-10-17 百度在线网络技术(北京)有限公司 Authentication method and device based on voice
US20150025889A1 (en) * 2013-02-19 2015-01-22 Max Sound Corporation Biometric audio security
CN107886943A (en) * 2017-11-21 2018-04-06 广州势必可赢网络科技有限公司 A kind of method for recognizing sound-groove and device
CN108346433A (en) * 2017-12-28 2018-07-31 北京搜狗科技发展有限公司 A kind of audio-frequency processing method, device, equipment and readable storage medium storing program for executing
CN108595601A (en) * 2018-04-20 2018-09-28 福州大学 A kind of long text sentiment analysis method incorporating Attention mechanism
CN108806109A (en) * 2018-05-02 2018-11-13 苏州诺登德智能科技有限公司 A kind of express delivery cabinet piece taking control device based on speech recognition
CN109273009A (en) * 2018-08-02 2019-01-25 平安科技(深圳)有限公司 Access control method, device, computer equipment and storage medium
CN109599124A (en) * 2018-11-23 2019-04-09 腾讯科技(深圳)有限公司 A kind of audio data processing method, device and storage medium
CN109631931A (en) * 2018-11-28 2019-04-16 深圳桓轩科技有限公司 A kind of artificial intelligence navigator
CN109785834A (en) * 2019-01-24 2019-05-21 中国—东盟信息港股份有限公司 A kind of voice data sample acquisition system and its method based on identifying code

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
林崇德: "《中国少年儿童百科全书 科学 技术》", 30 April 2017, 浙江教育出版社 *
玉圣龙: "基于分节信息的方言语音系统的研究与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
魏虎林: "基于SIP的SPIT防御方案的设计与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111341304A (en) * 2020-02-28 2020-06-26 广州国音智能科技有限公司 Method, device and equipment for training speech characteristics of speaker based on GAN
CN111640440A (en) * 2020-04-30 2020-09-08 华为技术有限公司 Audio stream decoding method, device, storage medium and equipment
WO2021218240A1 (en) * 2020-04-30 2021-11-04 华为技术有限公司 Audio stream decoding method and apparatus, storage medium, and device
CN111640440B (en) * 2020-04-30 2022-12-30 华为技术有限公司 Audio stream decoding method, device, storage medium and equipment
CN111862413A (en) * 2020-07-28 2020-10-30 公安部第三研究所 Method and system for realizing epidemic situation resistant non-contact multidimensional identity rapid identification
CN112735431A (en) * 2020-12-29 2021-04-30 三星电子(中国)研发中心 Model training method and device and artificial intelligence dialogue recognition method and device
CN112735431B (en) * 2020-12-29 2023-12-22 三星电子(中国)研发中心 Model training method and device and artificial intelligent dialogue recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190910