CN110223429A - Voice access control system - Google Patents
Voice access control system
- Publication number: CN110223429A
- Application number: CN201910534516.7A
- Authority
- CN
- China
- Prior art keywords
- voice
- voice information
- network
- processor
- access control
- Prior art date: 2019-06-19
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G07—CHECKING-DEVICES
- G07C—TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
- G07C9/00—Individual registration on entry or exit
- G07C9/30—Individual registration on entry or exit not involving the use of a pass
- G07C9/32—Individual registration on entry or exit not involving the use of a pass in combination with an identity check
- G07C9/37—Individual registration on entry or exit not involving the use of a pass in combination with an identity check using biometric data, e.g. fingerprints, iris scans or voice recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
Abstract
The present invention provides a voice access control system that uses a voice anti-spoofing Turing test and deep learning to recognize genuine speech and to control a door. The system includes a processor loaded with a Turing test module, a generative adversarial network, and a bidirectional GRU neural network; the processor is communicatively connected with an access-control driving mechanism and opens or closes the door according to the voice features. The invention is suited to the specific scenario in which verification is performed with no human verifier on site: the acquired voice is first subjected to a Turing test; only after it is judged to be genuine speech is it enhanced by the generative adversarial network; features are then extracted from the enhanced voice using parameters such as Mel-frequency cepstral coefficients (MFCC), and speaker recognition is completed by a deep bidirectional gated recurrent unit (GRU) network.
Description
Technical field
The present invention relates to the field of voice processing technology, and in particular to a voice access control system.
Background art
With the rapid development of electronic information technology, conventional door locks keep evolving in a high-tech, intelligent direction, and intelligent recognition systems that combine biometric identification with conventional locks are entering people's lives. More and more enterprises apply such intelligent recognition systems to access control and attendance management.
At present, the voiceprint in speech is incorporated into access control systems as a biometric feature. However, existing voice access control systems are easily cracked by recordings, edited audio and similar attacks, which poses certain security risks.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a voice access control system.
The voice access control system provided according to the present invention comprises: a voice acquisition module, a processor, and an access-control driving mechanism. The processor is loaded with a Turing test module, a generative adversarial network, and a bidirectional GRU neural network; the voice acquisition module is communicatively connected with the processor, and the processor is communicatively connected with the access-control driving mechanism. Wherein:
The voice acquisition module is configured to acquire voice information and transmit the voice information to the processor.
The Turing test module is configured to analyze the voice information to determine whether it is genuine human voice information; if so, the voice information is input into the generative adversarial network.
The generative adversarial network is configured to perform enhancement processing on the received voice information, obtaining enhanced voice information.
The bidirectional GRU neural network is configured to perform feature extraction on the enhanced voice information to obtain voice features, and to judge whether the voice features match the voice features of a target person. The bidirectional GRU neural network performs the feature extraction on the enhanced voice information via MFCC, and is a learning model pre-trained to have voice-feature recognition capability.
The processor is configured to control the access-control driving mechanism to open the door when the voice features match the voice features of the target person.
Optionally, the Turing test module is specifically configured to: upon receiving the voice information, randomly generate a preset number of questions; if the correct answers to the questions are received within a preset time, the Turing test is passed.
Optionally, the generative adversarial network comprises a discriminator and a generator. The discriminator is used to judge whether the enhanced voice information output by the generator is genuine speech; the generator is used to perform the enhancement processing on the voice information and to input the enhanced voice information into the discriminator.
Optionally, the bidirectional GRU neural network extracts 39-dimensional MFCC feature parameters using Mel-frequency cepstral coefficients as the voice features.
Optionally, the processor uses a TMS320DM8168 development board, on which the Turing test module, the generative adversarial network and the bidirectional GRU neural network are loaded.
Compared with the prior art, the present invention has the following beneficial effects:
The voice access control system provided by the invention can discriminate whether the person being identified is genuine, which improves speech recognition accuracy and thereby achieves accurate access control with higher security and a good user experience.
Detailed description of the invention
Other features, objects and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is a schematic diagram of the working principle of the voice access control system provided by an embodiment of the invention;
Fig. 2 is a flow diagram of the SEGAN training method;
Fig. 3 is the structure diagram of the deep bidirectional GRU in an embodiment of the invention;
Fig. 4 is the hardware implementation block diagram in an embodiment of the invention.
Specific embodiment
The present invention is described in detail below in combination with specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any way. It should be pointed out that those of ordinary skill in the art can make several changes and improvements without departing from the inventive concept; these all belong to the protection scope of the present invention.
Fig. 1 is a schematic diagram of the working principle of the voice access control system provided by an embodiment of the invention. As shown in Fig. 1, the speaker's voice information is acquired first; upon receiving the voice information, the Turing test is started. If the Turing test judges the input to be genuine human voice information, the voice information that passed the test is input into the deep learning module. The deep learning module comprises a generative adversarial network and a deep bidirectional GRU network.
In the present embodiment, the Turing test first establishes that the speaker being identified is a real, live person rather than a recording or machine-simulated voice. In a Turing test, a tester poses arbitrary questions to a subject through a special channel; after repeated trials, if more than 30% of testers cannot determine whether the subject is a human or a machine, the machine has passed the test. The present invention applies this idea under specific conditions: by asking randomly generated questions, a recording cannot answer a real-time question, and machine synthesis still falls short of a real human. To date, very few machines can pass a Turing test, so the test can accurately judge whether the identified speaker is a real person or something else. A minimal sketch of such a challenge follows.
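For illustration, a Python sketch of the challenge-response check described above, assuming a hypothetical question bank and an `ask` helper that poses a question and returns the transcribed spoken answer (neither is specified in the patent):

```python
import random
import time

# Hypothetical question bank; the patent only requires a preset number of
# random questions answered correctly within a preset time.
QUESTIONS = [
    ("What is three plus four?", "seven"),
    ("How many legs does a cat have?", "four"),
    ("What color is the sky on a clear day?", "blue"),
]

def turing_test(ask, n_questions=2, timeout_s=5.0):
    """Challenge-response liveness check: every randomly chosen question must
    be answered correctly within the preset time, as the claim describes."""
    for prompt, expected in random.sample(QUESTIONS, n_questions):
        start = time.monotonic()
        answer = ask(prompt).strip().lower()
        if time.monotonic() - start > timeout_s:
            return False              # too slow: likely a replayed recording
        if expected not in answer:
            return False              # wrong answer: challenge failed
    return True                       # all challenges passed -> genuine speech

# Example with a typed stand-in for the speech front end:
# passed = turing_test(lambda q: input(q + " "))
```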
In the present embodiment, the speaker's voice is enhanced using a GAN. A generative adversarial network (GAN, Generative Adversarial Networks) is a deep learning model and one of the most promising methods of recent years for unsupervised learning on complex distributions. A GAN contains two adversarial models: the generative model (G) takes a noisy picture as input and outputs a picture that looks real, in order to confuse the discriminative model; the discriminative model (D) judges whether a given picture is real (its input includes pictures from the dataset and pictures output by the generative network). At the very start, neither model is trained; during adversarial training, the generative model produces pictures to deceive the discriminative model, the discriminative model judges whether each picture is real or fake, and the two models grow stronger together until a stable state is reached. The present invention uses SEGAN (Speech Enhancement GAN), a speech enhancement method based on adversarial networks, whose advantages are: 1) it provides a fast speech enhancement process, with no causality requirement and no recursive operations as in an RNN; 2) it operates on raw audio: no handcrafted features are extracted and no specific assumptions are made about the raw data; 3) it learns from different speakers and noise types and merges them into the same shared parameters, which keeps the system simple and strengthens its generalization ability. The input of SEGAN is the noisy speech signal together with a latent representation, and the output is the enhanced signal. The generator is designed entirely with convolutional layers (no fully connected layers), which reduces the number of training parameters and thus shortens the training time. An important feature of the generator network is its end-to-end structure: it processes the raw speech signal directly, avoiding the extraction of acoustic features through intermediate conversion. During training, the discriminator sends the generator information about what is real and fake in the input data, letting the generator fine-tune its output waveform toward the true distribution and eliminate the interference signal. Speaker recognition is then performed on the SEGAN-enhanced speech signal.
The whole SEGAN network is composed of CNNs and is an encoder-decoder. The structure of D is an encoder topped by a dimensionality-reduction layer that reduces 8 × 1024 parameters to 8. The encoder is built from one-dimensional convolutional layers with stride 2. Its input is the noisy speech signal x̃ and its output is the enhanced speech signal x̂. The enhancement itself is completed by the G network, which takes x̃ and the latent variable z as inputs; G is a fully convolutional network, similar to an autoencoder. During encoding, the input signal is projected and compressed through a series of strided convolutional layers with activation functions, yielding one convolution result every N steps. Experiments show that strided convolution works better than pooling in GAN training. Skip connections pass the fine-grained waveform information directly to the decoding stage, and their gradients can flow through the whole network at greater depth; this prevents low-level details from being lost when the speech waveform is reconstructed. Decoding is the reverse of encoding, with the convolutions replaced by fractionally strided (transposed) convolutions that use the same activation functions as the encoding stage.
The training process of SEGAN has three stages: (1) the discriminator D takes noisy speech and the corresponding clean speech as input, labeled as real, and D's parameters are trained; (2) the generator G takes noisy speech as input and generates enhanced speech, which is fed into D together with the noisy speech, labeled as fake, and D's parameters are updated; (3) D is held fixed and step (2) is repeated to update G's parameters. After these three steps, G is the speech-enhancement network. Specifically, the training process is shown in Fig. 2, and a minimal sketch of the three stages follows.
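For illustration only, a PyTorch sketch of the three training stages above; the `G` and `D` modules, the least-squares GAN losses and the latent size are assumptions, since the patent names only the stages themselves:

```python
import torch

def train_step(G, D, opt_g, opt_d, noisy, clean, z_dim=1024):
    """One SEGAN-style adversarial step over a batch of (noisy, clean) pairs.
    LSGAN-style losses are assumed; the patent only describes the stages."""
    z = torch.randn(noisy.size(0), z_dim)          # latent input to G

    # Stage 1: D on (noisy, clean), labeled real; train D's parameters.
    opt_d.zero_grad()
    loss_real = ((D(noisy, clean) - 1.0) ** 2).mean()
    loss_real.backward()

    # Stage 2: D on (noisy, G(noisy, z)), labeled fake; update D's parameters.
    fake = G(noisy, z).detach()                    # detach: only D learns here
    loss_fake = (D(noisy, fake) ** 2).mean()
    loss_fake.backward()
    opt_d.step()

    # Stage 3: hold D fixed (only G's optimizer steps) and update G so that
    # its enhanced output is judged real.
    opt_g.zero_grad()
    enhanced = G(noisy, z)
    loss_g = ((D(noisy, enhanced) - 1.0) ** 2).mean()
    loss_g.backward()
    opt_g.step()
    return loss_real.item() + loss_fake.item(), loss_g.item()
```

After convergence, `G` alone serves as the speech-enhancement network, matching the description above.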
Specifically, Fig. 3 is the structure diagram of the deep bidirectional GRU in an embodiment of the invention. As shown in Fig. 3, speaker recognition is carried out with a deep bidirectional GRU. The speech feature parameters are Mel-frequency cepstral coefficients (Mel Frequency Cepstral Coefficients, MFCC), and 39-dimensional MFCC feature vectors are taken as the deep learning input. The present invention performs speaker recognition with deep bidirectional GRUs (BiGRUs). LSTM (Long Short-Term Memory) was proposed to overcome the inability of the RNN (Recurrent Neural Network) to handle long-range dependencies well. The GRU (Gated Recurrent Unit) is a variant of the LSTM: by analyzing which parts of the LSTM architecture are really needed, it merges the forget gate and input gate into a single update gate, and likewise merges the cell state and hidden state, so the final model is simpler and more efficient than the LSTM model. A GRU has only two gates, the update gate z_t and the reset gate r_t. The update gate controls how much state information from the previous moment is brought into the current state; the larger its value, the more is brought in. The reset gate controls how much of the previous moment's state information is ignored; the smaller its value, the more is ignored. The forward propagation formulas are defined as follows:
$$
\begin{aligned}
\mathrm{net}_{r,t} &= W_{rh}\,h_{t-1} + W_{rx}\,x_t + b_r \\
\mathrm{net}_{z,t} &= W_{zh}\,h_{t-1} + W_{zx}\,x_t + b_z \\
\mathrm{net}_{g,t} &= W_{gh}\,(r_t * h_{t-1}) + W_{gx}\,x_t + b_g
\end{aligned}
\tag{1}
$$

where $\mathrm{net}_{r,t}$, $\mathrm{net}_{z,t}$ and $\mathrm{net}_{g,t}$ are the reset-gate, update-gate and candidate-state pre-activations at time $t$; $W_{rh}$, $W_{zh}$, $W_{gh}$ are the gate weights applied to the previous hidden state $h_{t-1}$; $W_{rx}$, $W_{zx}$, $W_{gx}$ are the gate weights applied to the current input $x_t$; and $b_r$, $b_z$, $b_g$ are the gate biases.

From the definition of the GRU network structure:

$$
\begin{aligned}
r_t &= \operatorname{sigmoid}(\mathrm{net}_{r,t}) \\
z_t &= \operatorname{sigmoid}(\mathrm{net}_{z,t}) \\
g_t &= \tanh(\mathrm{net}_{g,t}) \\
h_t &= (1 - z_t) * h_{t-1} + z_t * g_t
\end{aligned}
\tag{2}
$$

where $r_t$ is the reset-gate output state at time $t$, $z_t$ the update-gate output state at time $t$, $g_t$ the gate-controlled candidate state at time $t$, $h_t$ the hidden state at time $t$, and $h_{t-1}$ the hidden state at time $t-1$.
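For concreteness, a NumPy sketch of one GRU forward step implementing formulas (1) and (2); the weight shapes (hidden-by-hidden for the $W_{\cdot h}$ matrices, hidden-by-input for the $W_{\cdot x}$ matrices) follow the usual convention and are not fixed by the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_rh, W_rx, b_r, W_zh, W_zx, b_z, W_gh, W_gx, b_g):
    """One GRU forward step as defined by formulas (1) and (2)."""
    net_r = W_rh @ h_prev + W_rx @ x_t + b_r           # reset-gate pre-activation
    net_z = W_zh @ h_prev + W_zx @ x_t + b_z           # update-gate pre-activation
    r_t = sigmoid(net_r)
    z_t = sigmoid(net_z)
    net_g = W_gh @ (r_t * h_prev) + W_gx @ x_t + b_g   # candidate pre-activation
    g_t = np.tanh(net_g)
    h_t = (1.0 - z_t) * h_prev + z_t * g_t             # formula (2): new hidden state
    return h_t
```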
For a GRU network layer $l$: when $t = T$, $\delta_t^l$ is the error fed back from layer $l+1$; when $t \in [0, T)$, $\delta_t^l$ is composed of two parts: the error propagated back from layer $l+1$ at time $t$, and the feedback error $\delta_{t+1}$ from time $t+1$. The error is defined as follows: at time $t$ the output of the GRU is $h_t$, and the error at time $t$ is $\delta_t = \partial E / \partial h_t$.
From formulas (1) and (2), $\mathrm{net}_{r,t}$, $\mathrm{net}_{z,t}$, $\mathrm{net}_{g,t}$ and $h_t$ are all functions of $h_{t-1}$; applying the error definition and the total-derivative formula yields the backward error recursion, where $E$ denotes the identity matrix, $\delta_{t-1}$ the backward error at time $t-1$, $\delta_{t-1}^{l-1}$ the backward error of layer $l-1$ at time $t-1$, $\delta_{z,t}$ the backward error at the update-gate output at time $t$, $\delta_{r,t}$ the backward error at the reset-gate output at time $t$, $\delta_{g,t}$ the backward error of the gate-controlled candidate state at time $t$, and $\delta_t$ the backward error at time $t$.
The per-gate error terms $\delta_{z,t}$, $\delta_{r,t}$, $\delta_{g,t}$ at each time step follow from these definitions; substituting them cyclically into formula (4) gives the error at every time step.
From the errors $\delta_{z,t}$, $\delta_{r,t}$, $\delta_{g,t}$ at each moment, the weight and bias gradients are computed, starting with $\Delta w_{zh,t}$, $\Delta w_{rh,t}$ and $\Delta w_{gh,t}$. Here $w^{+}$ denotes the updated weight and $w$ the weight at time $t$; $\Delta w$ is the weight gradient at time $t$; $\eta$ is the learning-rate coefficient; $b^{+}$ denotes the updated bias and $b$ the bias at time $t$; $\Delta b$ is the bias gradient at time $t$; $\Delta w_{zh,t}$ is the update-gate weight gradient, $\Delta w_{rh,t}$ the reset-gate weight gradient, and $\Delta w_{gh,t}$ the candidate-state weight gradient at time $t$.
Adding the gradients of all time steps together gives the total gradient. The input $x_t^l$ of the GRU is the output of the previous network layer, defined as $x_t^l = f^{l-1}(\mathrm{net}_t^{l-1})$, where $f^{l-1}$ is the activation function of layer $l-1$. From the definition in formula (1), $\mathrm{net}_{r,t}$, $\mathrm{net}_{z,t}$ and $\mathrm{net}_{g,t}$ are functions of $x_t$, so the gradient with respect to the layer input is obtained from the total-derivative formula.
This is the computation process of the bidirectional GRU. Deep learning performs well in speech recognition, and a deep bidirectional GRU realizes speaker recognition more efficiently, which is why the present invention uses BiGRUs.
Specifically, the training and discrimination process of the bidirectional GRU neural network is (see the sketch after this list):
(1) use Mel-frequency cepstral coefficients (Mel Frequency Cepstral Coefficients, MFCC) as the feature parameters, taking 39-dimensional MFCC vectors as the deep learning input;
(2) process the feature data through a fully connected layer;
(3) in the Bi-GRU, combine forward propagation and backpropagation to train on the data;
(4) classify with a softmax classifier and output the discrimination result.
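As an illustrative sketch of steps (1)-(4) using librosa and PyTorch; the split of the 39 MFCC dimensions into 13 static + 13 delta + 13 delta-delta coefficients, the layer sizes, and the number of enrolled speakers are assumptions that the patent does not fix:

```python
import librosa
import numpy as np
import torch
import torch.nn as nn

def mfcc_39(wav_path, sr=16000):
    """Step (1): 39-dim MFCC = 13 static + 13 delta + 13 delta-delta (assumed split)."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    feats = np.vstack([mfcc,
                       librosa.feature.delta(mfcc),
                       librosa.feature.delta(mfcc, order=2)])
    return torch.tensor(feats.T, dtype=torch.float32)     # (frames, 39)

class SpeakerBiGRU(nn.Module):
    """Steps (2)-(4): fully connected layer -> deep BiGRU -> softmax classifier."""
    def __init__(self, n_speakers, hidden=128, layers=2):
        super().__init__()
        self.fc_in = nn.Linear(39, hidden)                          # step (2)
        self.bigru = nn.GRU(hidden, hidden, num_layers=layers,
                            bidirectional=True, batch_first=True)   # step (3)
        self.fc_out = nn.Linear(2 * hidden, n_speakers)             # step (4)

    def forward(self, x):                    # x: (batch, frames, 39)
        h, _ = self.bigru(self.fc_in(x))
        logits = self.fc_out(h.mean(dim=1))  # average over time, then classify
        return logits.log_softmax(dim=-1)
```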
Fig. 4 is the hardware implementation block diagram in an embodiment of the invention. As shown in Fig. 4, the voice information is obtained first, and the Turing test is then performed on a computer. If the test passes, the analysis and comparison of the voice features is executed on a TMS320DM8168 development board pre-loaded with the program instructions (the program instructions can also be stored in DDR or NVRAM and called at run time by the processing chip of the TMS320DM8168 development board), producing a discrimination result. If the result is that the acquired voice features are consistent with the pre-stored voice features of the target person, it is transferred over a bus (PCI) to the access-control driving mechanism, which drives the door open. If the acquired voice features do not match the pre-stored voice features of the target person, the door remains closed, as sketched below.
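The end-to-end gate logic can be summarized as follows; `turing_test` and `SpeakerBiGRU` are the sketches introduced earlier, while `record`, `enhance` (the trained SEGAN generator) and `extract_mfcc_39` are hypothetical helpers, and the acceptance threshold is an assumed parameter:

```python
import torch

def access_decision(ask, record, enhance, model, target_id, threshold=0.9):
    """Pipeline of Figs. 1 and 4 (sketch): Turing test -> GAN enhancement ->
    MFCC + BiGRU speaker check -> gate command. Returns True to open the door."""
    if not turing_test(ask):
        return False                                 # replay / synthetic speech rejected
    waveform = record(seconds=3)                     # hypothetical capture helper
    feats = extract_mfcc_39(enhance(waveform))       # hypothetical feature helper
    with torch.no_grad():
        probs = model(feats.unsqueeze(0)).exp().squeeze(0)  # model emits log-softmax
    return bool(probs[target_id] >= threshold)       # open only on a confident match
```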
In the present embodiment, the program and algorithm can be loaded onto the TMS320DM8168 development board; combining software and hardware makes testing more convenient and efficient. The TMS320DM8168 is a high-end floating-point DSP + ARM dual-core development board that is stable, convenient and reliable, and can perform speaker recognition effectively.
The present invention enhances the acquired voice with a generative adversarial network to weaken noise, and performs speaker recognition with a deep bidirectional GRU, which is highly adaptive, broadly applicable and efficient, thereby improving the security and reliability of the access control system. Implementing the above method on the TMS320DM8168 makes it faster and more reliable.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to the particular embodiments described; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the invention. In the absence of conflict, the features in the embodiments of this application may be combined with one another arbitrarily.
Claims (5)
1. A voice access control system, characterized by comprising: a voice acquisition module, a processor, and an access-control driving mechanism, the processor being loaded with a Turing test module, a generative adversarial network, and a bidirectional GRU neural network, the voice acquisition module being communicatively connected with the processor, and the processor being communicatively connected with the access-control driving mechanism; wherein:
the voice acquisition module is configured to acquire voice information and transmit the voice information to the processor;
the Turing test module is configured to analyze the voice information to determine whether the voice information is genuine human voice information and, if so, to input the voice information into the generative adversarial network;
the generative adversarial network is configured to perform enhancement processing on the received voice information to obtain enhanced voice information;
the bidirectional GRU neural network is configured to perform feature extraction on the enhanced voice information to obtain voice features, and to judge whether the voice features match the voice features of a target person, wherein the bidirectional GRU neural network performs the feature extraction on the enhanced voice information via MFCC and is a learning model pre-trained to have voice-feature recognition capability;
the processor is configured to control the access-control driving mechanism to open the door when the voice features match the voice features of the target person.
2. The voice access control system according to claim 1, characterized in that the Turing test module is specifically configured to: upon receiving the voice information, randomly generate a preset number of questions; if the correct answers to the questions are received within a preset time, the Turing test is passed.
3. The voice access control system according to claim 1, characterized in that the generative adversarial network comprises a discriminator and a generator; the discriminator is used to judge whether the enhanced voice information output by the generator is genuine speech; the generator is used to perform the enhancement processing on the voice information and to input the enhanced voice information into the discriminator.
4. The voice access control system according to claim 1, characterized in that the bidirectional GRU neural network extracts 39-dimensional MFCC feature parameters using Mel-frequency cepstral coefficients as the voice features.
5. The voice access control system according to claim 1, characterized in that the processor uses a TMS320DM8168 development board, on which the Turing test module, the generative adversarial network and the bidirectional GRU neural network are loaded.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910534516.7A (CN110223429A) | 2019-06-19 | 2019-06-19 | Voice access control system
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910534516.7A (CN110223429A) | 2019-06-19 | 2019-06-19 | Voice access control system
Publications (1)
Publication Number | Publication Date |
---|---|
CN110223429A true CN110223429A (en) | 2019-09-10 |
Family
ID=67814121
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910534516.7A (CN110223429A, pending) | Voice access control system | 2019-06-19 | 2019-06-19
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110223429A (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102737634A (en) * | 2012-05-29 | 2012-10-17 | 百度在线网络技术(北京)有限公司 | Authentication method and device based on voice |
US20150025889A1 (en) * | 2013-02-19 | 2015-01-22 | Max Sound Corporation | Biometric audio security |
CN107886943A (en) * | 2017-11-21 | 2018-04-06 | 广州势必可赢网络科技有限公司 | Voiceprint recognition method and device |
CN108346433A (en) * | 2017-12-28 | 2018-07-31 | 北京搜狗科技发展有限公司 | A kind of audio-frequency processing method, device, equipment and readable storage medium storing program for executing |
CN108595601A (en) * | 2018-04-20 | 2018-09-28 | 福州大学 | A kind of long text sentiment analysis method incorporating Attention mechanism |
CN108806109A (en) * | 2018-05-02 | 2018-11-13 | 苏州诺登德智能科技有限公司 | A kind of express delivery cabinet piece taking control device based on speech recognition |
CN109273009A (en) * | 2018-08-02 | 2019-01-25 | 平安科技(深圳)有限公司 | Access control method, device, computer equipment and storage medium |
CN109599124A (en) * | 2018-11-23 | 2019-04-09 | 腾讯科技(深圳)有限公司 | A kind of audio data processing method, device and storage medium |
CN109631931A (en) * | 2018-11-28 | 2019-04-16 | 深圳桓轩科技有限公司 | A kind of artificial intelligence navigator |
CN109785834A (en) * | 2019-01-24 | 2019-05-21 | 中国—东盟信息港股份有限公司 | A kind of voice data sample acquisition system and its method based on identifying code |
Non-Patent Citations (3)
Title |
---|
Lin Chongde: "China Children's Encyclopedia: Science and Technology", Zhejiang Education Publishing House, 30 April 2017 *
Yu Shenglong: "Research and Implementation of a Dialect Speech System Based on Segmentation Information", China Masters' Theses Full-text Database, Information Science and Technology *
Wei Hulin: "Design and Implementation of a SIP-based SPIT Defense Scheme", China Masters' Theses Full-text Database, Information Science and Technology *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111341304A (en) * | 2020-02-28 | 2020-06-26 | 广州国音智能科技有限公司 | Method, device and equipment for training speech characteristics of speaker based on GAN |
CN111640440A (en) * | 2020-04-30 | 2020-09-08 | 华为技术有限公司 | Audio stream decoding method, device, storage medium and equipment |
WO2021218240A1 (en) * | 2020-04-30 | 2021-11-04 | 华为技术有限公司 | Audio stream decoding method and apparatus, storage medium, and device |
CN111640440B (en) * | 2020-04-30 | 2022-12-30 | 华为技术有限公司 | Audio stream decoding method, device, storage medium and equipment |
CN111862413A (en) * | 2020-07-28 | 2020-10-30 | 公安部第三研究所 | Method and system for realizing epidemic situation resistant non-contact multidimensional identity rapid identification |
CN112735431A (en) * | 2020-12-29 | 2021-04-30 | 三星电子(中国)研发中心 | Model training method and device and artificial intelligence dialogue recognition method and device |
CN112735431B (en) * | 2020-12-29 | 2023-12-22 | 三星电子(中国)研发中心 | Model training method and device and artificial intelligent dialogue recognition method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110223429A (en) | Voice access control system | |
Ravanelli et al. | Multi-task self-supervised learning for robust speech recognition | |
Chen et al. | A Multi-Scale Fusion Framework for Bimodal Speech Emotion Recognition. | |
WO2019210796A1 (en) | Speech recognition method and apparatus, storage medium, and electronic device | |
TW201935464A (en) | Method and device for voiceprint recognition based on memorability bottleneck features | |
Basu et al. | Towards measuring human interactions in conversational settings | |
CN113822192A (en) | Method, device and medium for identifying emotion of escort personnel based on Transformer multi-modal feature fusion | |
Wang et al. | Recognition of audio depression based on convolutional neural network and generative antagonism network model | |
Ghriss et al. | Sentiment-aware automatic speech recognition pre-training for enhanced speech emotion recognition | |
Nunes et al. | Am-mobilenet1d: A portable model for speaker recognition | |
CN112507311A (en) | High-security identity verification method based on multi-mode feature fusion | |
Elshaer et al. | Transfer learning from sound representations for anger detection in speech | |
Sun et al. | A novel convolutional neural network voiceprint recognition method based on improved pooling method and dropout idea | |
Mohammed et al. | Advantages and disadvantages of automatic speaker recognition systems | |
Singh | A text independent speaker identification system using ANN, RNN, and CNN classification technique | |
Chang et al. | STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition | |
Li et al. | Rethinking Voice-Face Correlation: A Geometry View | |
Mitra et al. | Investigating Salient Representations and Label Variance in Dimensional Speech Emotion Analysis | |
Gade et al. | A comprehensive study on automatic speaker recognition by using deep learning techniques | |
Yusuf et al. | A novel multi-window spectrogram augmentation approach for speech emotion recognition using deep learning | |
Yusuf et al. | RMWSaug: robust multi-window spectrogram augmentation approach for deep learning based speech emotion recognition | |
CN108074585A (en) | A kind of voice method for detecting abnormality based on sound source characteristics | |
Shofiyah et al. | Voice recognition system for home security keys with mel-frequency cepstral coefficient method and backpropagation artificial neural network | |
Li et al. | A novel trojan attack against co-learning based asr dnn system | |
An et al. | Combining deep neural network with SVM to identify used in IOT |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190910