CN110728991B - Improved recording equipment identification algorithm - Google Patents

Improved recording equipment identification algorithm

Info

Publication number
CN110728991B
CN110728991B (application CN201910841092.9A)
Authority
CN
China
Prior art keywords
model
layer
frame
identification algorithm
recording device
Prior art date
Legal status
Active
Application number
CN201910841092.9A
Other languages
Chinese (zh)
Other versions
CN110728991A (en)
Inventor
包永强
梁瑞宇
王青云
冯月芹
唐闺臣
朱悦
Current Assignee
Nanjing Institute of Technology
Original Assignee
Nanjing Institute of Technology
Priority date
Filing date
Publication date
Application filed by Nanjing Institute of Technology
Priority to CN201910841092.9A
Publication of CN110728991A
Application granted
Publication of CN110728991B

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/02 - Feature extraction for speech recognition; selection of recognition unit
    • G10L15/06 - Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/20 - Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress-induced speech
    • G10L25/12 - Speech or voice analysis techniques characterised by the type of extracted parameters, the parameters being prediction coefficients
    • G10L25/18 - Speech or voice analysis techniques characterised by the type of extracted parameters, the parameters being spectral information of each sub-band
    • G10L25/24 - Speech or voice analysis techniques characterised by the type of extracted parameters, the parameters being the cepstrum
    • G10L25/30 - Speech or voice analysis techniques characterised by the analysis technique, using neural networks
    • G10L25/45 - Speech or voice analysis techniques characterised by the type of analysis window

Abstract

The invention discloses an improved recording device recognition algorithm. A first model is constructed from a bidirectional gated recurrent neural network layer, a unidirectional gated recurrent neural network layer, and an attention layer; a second model is constructed from convolutional layers, a skip connection, and a global average pooling layer. The audio signal to be examined is framed and preprocessed; its multi-dimensional frame-level features serve as the input of the first model, and its Mel-spectrum features serve as the input of the second model. The output features of the two models are concatenated and fused, and a recognition result is obtained by classification. The recognition algorithm of the invention preserves the temporal characteristics of the audio signal and, through the attention mechanism, the skip-connection structure, and the hidden-unit concatenation method, finally obtains high-quality device-related feature parameters, improving both the recognition accuracy for recording devices and the robustness of the model.

Description

Improved recording equipment identification algorithm
Technical Field
The invention relates to the technical field of recording equipment, in particular to an improved recording equipment identification algorithm.
Background
Sound is the most natural means of human communication. With the increasing maturity of audio technology, audio has spread widely through many aspects of social life. Recording equipment from different manufacturers generally uses different digital signal processing methods and circuits, and these differences leave device-specific traces in the recorded audio signal. Therefore, a recording device can be identified, to some extent, by analyzing the audio it produced. In judicial cases, parties often claim that a piece of audio evidence was recorded with a particular device, so determining which device actually produced a recording is an urgent problem for forensic authorities.
With the development of machine learning and deep learning techniques, researchers have proposed a variety of effective recognition models. In 2007, Christian Kraetzer et al. identified microphone devices using mixed time-domain and frequency-domain features; experiments with a naive Bayes classifier and similar methods achieved a recognition rate of 75.99%. In 2009, Robert Buchholz used naive Bayes, logistic regression, and a support vector machine as classifiers to classify microphones, with the Fourier coefficients of the audio as the feature input. In 2011, the effectiveness of pitch frequency, formant frequencies, and MFCCs for recording device identification was verified. In 2012, Cemal Hanilçi extracted Mel-Frequency Cepstral Coefficients (MFCC) from audio as features and used a support vector machine as the classifier to identify 14 different telephone devices, reaching a recognition rate of 96.42%. In 2014, Vandana Pandey found that the power spectral density of audio can distinguish microphone devices to some extent. In the same year, Ling Zou et al. demonstrated that recording devices can be effectively distinguished using MFCC and power-normalized cepstral coefficients (PNCC).
From the present state of research, relatively few studies target recording device identification specifically. First, recording device feature databases are scarce: with the arrival of the 4G era, the brands and models of mobile phones on the market keep increasing, while existing databases are not updated in time. Second, regarding feature extraction, recording device recognition generally borrows features designed for speech recognition rather than features designed specifically for recording devices. Finally, regarding recognition models, existing recording device identification models are models that perform well in speech or speaker recognition; their parameter settings and designs have not been specially adapted to the characteristics of recording devices.
Disclosure of Invention
The purpose of the invention is as follows: to overcome the defects of the prior art, the invention provides an improved recording device identification algorithm that addresses the low recognition rate and poor generalization of existing methods and can effectively identify the mobile phones and computer devices most widely used on the current market.
The technical scheme is as follows: to achieve this purpose, the invention adopts the following technical scheme:
an improved sound recording device identification algorithm comprising the steps of:
step S1, framing and preprocessing the audio signal to be detected;
s2, constructing a first model, wherein the first model comprises a bidirectional gate recurrent neural network layer, a unidirectional gate recurrent neural network layer and an attention layer which are sequentially arranged, and multi-dimensional frame-level features of the signals in the S1 are extracted as input of the first model;
s3, constructing a second model, wherein the second model comprises a first convolutional layer, a second convolutional layer, a third convolutional layer, a jump connection layer, a fourth convolutional layer and a global average pooling layer which are sequentially arranged, and Mel frequency spectrum characteristics of the signals in the step S1 are extracted to serve as input of the second model;
and S4, splicing and fusing the output characteristics of the first model and the second model, classifying and obtaining a recognition result.
Preferably, in step S2, 72-dimensional frame-level features are extracted, and after processing by the first model a 1000-dimensional feature vector is output.
Preferably, in step S3, the output result of the first convolution layer and the output result of the third convolution layer are superimposed to be the final output of the third convolution layer.
Preferably, in step S1, the audio signal is framed, the frame length is 1024, the frame shift is 25%, and Hanning window processing is performed on the signal to extract the multi-dimensional frame-level features.
Preferably, in step S1, the audio signal is framed, the frame length is 1024, and the frame shift is 25%; an FFT is calculated for each frame of data, with 2048 FFT points; and a log-Mel spectrogram is then computed using a Mel filter bank with 80 sub-band filters.
Preferably, in step S2, the multi-dimensional frame-level features include short-time zero-crossing rate, root-mean-square energy, fundamental frequency, spectral centroid, spectral spread, spectral entropy, spectral flux, formant frequency, first-order difference Mel cepstral coefficients, second-order difference Mel cepstral coefficients, linear prediction coefficients, and Bark-frequency cepstral coefficients.
Preferably, in step S2, the output s of the attention layer is expressed as the expectation under the class probability distribution P(v|x,q):

s = Σ_{i=1}^{n} P(v = i | x, q) · x_i

where the input sequence is x = (x_1, x_2, …, x_n) and the corresponding query is q.
Advantageous effects: the improved recording device recognition algorithm of the invention has the following advantages:
1) frame-level features of the signal are introduced into the recording device identification algorithm, preserving the temporal characteristics of the audio signal;
2) an attention mechanism is added to weight and sum the high-level features by importance, finally obtaining high-quality device-related feature parameters and improving the robustness of the model;
3) the standard convolutional neural network model is improved by adding a skip-connection structure, further improving model performance;
4) the final model fusion is realized by a hidden-unit concatenation method, which improves the recognition accuracy of recording device identification and the robustness of the model, giving the invention good application prospects.
Drawings
FIG. 1 is a schematic diagram of a model structure of an improved recognition algorithm of a recording device according to the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
As shown in FIG. 1, the improved recording device recognition algorithm of the present invention comprises the following steps. Step (1): extract 72-dimensional frame-level feature parameters from each audio clip as the input of the first model. Because an audio signal is quasi-stationary over short intervals but non-stationary over long ones, the signal is first framed; the frame length in the invention is 1024. To smooth the transition between consecutive frames, adjacent frames must overlap, with an overlap ratio of 25%. Because framing causes spectral leakage, each frame is Hanning-windowed.
Finally, the features are extracted. For each frame signal, 72-dimensional features are computed: short-time zero-crossing rate, root-mean-square energy, fundamental frequency, spectral centroid, spectral spread, spectral entropy, spectral flux, formant frequencies, first-order difference Mel cepstral coefficients, second-order difference Mel cepstral coefficients, linear prediction coefficients, and Bark-frequency cepstral coefficients; the specific parameters are given in Table 1, and a minimal extraction sketch follows the table. The per-frame features are then stacked in order, so that each frame carries 72 speech features and the frame sequence preserves the timing information of the original audio signal. The resulting feature matrix has dimension (number of frames × 72); because the number of frames varies with the audio length, this resolves the contradiction between fixed-dimensional features and variable speech length.
TABLE 1
[Table 1 is reproduced as images in the original publication; it lists the specific parameters of the 72 frame-level feature dimensions enumerated above.]
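The following is a minimal sketch of this frame-level extraction for a subset of the listed features, assuming librosa as the signal-processing library (librosa's spectral features apply a Hann window internally, matching the windowing step above). The hop length is an assumption: the description states a 25% overlap while the preferred embodiments state a 25% frame shift, so the 256-sample hop below (a 25% shift of the 1024-sample frame) may need adjusting.

```python
import numpy as np
import librosa

def frame_level_features(path, frame_length=1024, hop_length=256):
    """Stack per-frame features as (num_frames, dims); a subset of the 72 dims."""
    y, sr = librosa.load(path, sr=None)  # keep the native sample rate
    # Short-time zero-crossing rate and RMS energy, framed as described above.
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=frame_length,
                                             hop_length=hop_length)
    rms = librosa.feature.rms(y=y, frame_length=frame_length,
                              hop_length=hop_length)
    # Spectral centroid, plus spectral bandwidth standing in for spectral spread.
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr, n_fft=frame_length,
                                                 hop_length=hop_length)
    spread = librosa.feature.spectral_bandwidth(y=y, sr=sr, n_fft=frame_length,
                                                hop_length=hop_length)
    # First- and second-order difference MFCCs.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=frame_length,
                                hop_length=hop_length)
    d1 = librosa.feature.delta(mfcc)
    d2 = librosa.feature.delta(mfcc, order=2)
    feats = np.vstack([zcr, rms, centroid, spread, d1, d2])
    return feats.T  # (num_frames, dims); frame order preserves timing information
```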
Step (2): construct the first model. The first model is built from one bidirectional gated recurrent unit (GRU) layer, one unidirectional GRU layer, and one attention layer. A recurrent neural network handles time-series signals well, while an attention mechanism can autonomously learn the characteristics of a time-series signal; combining the two effectively mines the characteristic parameters of the signal. The input of the model is the 72-dimensional frame-level feature sequence.
The principle of the attention mechanism is to emulate the human visual attention mechanism. Suppose the input sequence is x = (x_1, x_2, …, x_n) and the corresponding query is q. The standard attention mechanism uses a function f(x_i, q) to compute an alignment score a_i between q and x_i. The alignment scores of q over all of x are written a = (a_1, a_2, …, a_n). Finally, a soft-max function maps a to a class probability distribution P(v|x,q), in which P(v = i|x,q) represents the probability of selecting x_i according to q, as in the following equation:

P(v = i | x, q) = exp(a_i) / Σ_{j=1}^{n} exp(a_j)        (1)

Equation (2) expresses the attention output s as the expectation under the class probability distribution P(v|x,q):

s = Σ_{i=1}^{n} P(v = i | x, q) · x_i        (2)
the attention mechanism can endow different importance to the local part of the same sample, automatically learn the characteristics of a time sequence signal and improve the robustness of the model. And outputting a 1000-dimensional characteristic vector by the model, overlapping the output of the model II, and finally classifying.
Step (3): extract a Mel spectrum from each audio clip as the input of the second model. First, the audio signal is framed, with a frame length of 1024 and a frame shift of 25%; second, an FFT is calculated for each frame of data, with 2048 FFT points; third, a log-Mel spectrogram is calculated using a Mel filter bank with 80 sub-band filters.
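A sketch of this log-Mel input, again assuming librosa; the 1024-sample frame, 2048-point FFT, and 80 Mel sub-bands follow the text, while the hop length carries the same caveat as the earlier framing sketch.

```python
import librosa

def log_mel(path, frame_length=1024, hop_length=256, n_fft=2048, n_mels=80):
    """Log-Mel spectrogram of shape (80, num_frames) for the second model."""
    y, sr = librosa.load(path, sr=None)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         win_length=frame_length,
                                         hop_length=hop_length, n_mels=n_mels)
    return librosa.power_to_db(mel)  # logarithmic Mel spectrogram
```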
Step (4): construct the second model. The input of the second model is the Mel spectrum obtained in step (3). The first three layers of the second model are convolutional layers to which a skip connection is added: the output of the first convolutional layer is superimposed on the output of the third convolutional layer to form the final feature of the third layer. These are followed by one further convolutional layer and a global average pooling layer.
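A PyTorch sketch of the second model under stated assumptions: kernel sizes, channel widths, and activations are not specified in the text, so the values below are illustrative. The skip connection adds the first convolution's output to the third convolution's output as described; same-size padding keeps the two shapes compatible.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModelTwo(nn.Module):
    def __init__(self, channels=64, out_dim=1000):
        super().__init__()
        self.conv1 = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(channels, out_dim, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)   # global average pooling

    def forward(self, x):                     # x: (batch, 1, 80, frames)
        h1 = F.relu(self.conv1(x))
        h2 = F.relu(self.conv2(h1))
        h3 = F.relu(self.conv3(h2)) + h1      # skip connection: conv1 + conv3
        h4 = F.relu(self.conv4(h3))
        return self.pool(h4).flatten(1)       # (batch, 1000) high-level feature
```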
Step (5): the first model, comprising one bidirectional GRU layer, one unidirectional GRU layer, and an attention layer, finally extracts a 1000-dimensional high-level feature; the second model, whose first three convolutional layers carry a skip connection and which ends with a further convolutional layer and global average pooling, likewise extracts a 1000-dimensional high-level feature. The output features of the two models are concatenated and fused, and the fused vector is classified.
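A sketch of this fusion step: the two 1000-dimensional high-level features are concatenated (the hidden-unit splicing of step S4) and classified by a linear layer. The number of device classes is an assumption.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, model_one, model_two, num_devices=10):
        super().__init__()
        self.m1, self.m2 = model_one, model_two
        self.fc = nn.Linear(2000, num_devices)  # 1000 + 1000 fused dimensions

    def forward(self, frame_feats, mel_spec):
        fused = torch.cat([self.m1(frame_feats), self.m2(mel_spec)], dim=1)
        return self.fc(fused)                   # per-device class logits

# Example shapes: frame_feats (batch, frames, 72), mel_spec (batch, 1, 80, frames)
# logits = FusionClassifier(ModelOne(), ModelTwo())(frame_feats, mel_spec)
```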
TABLE 2 Comparison of recognition rates of different models

Model | Support vector machine | Recurrent neural network | Standard convolutional neural network | Model fusion
Average recognition rate | 81% | 82.3% | 81.5% | 87.5%
In conclusion, the improved recording device identification algorithm of the invention reaches an accuracy of 87.5%. Its characteristics are: 1) the model-fusion structure improves the robustness of the system; 2) frame-level feature extraction effectively mines the recording device information in the audio; 3) the attention mechanism assigns different importance to different parts of the same sample and automatically learns the characteristics of the time-series signal; 4) the skip-connection operation preserves lower-level features. In practical application, the algorithm can therefore effectively distinguish different recording devices, such as the mobile phones and computers most widely used on the current market, from a detected audio signal. The invention overcomes the low recognition rate of traditional recording device recognition models, improves recognition accuracy and model robustness, and has good application prospects.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (7)

1. An improved sound recording device identification algorithm comprising the steps of:
step S1, framing and preprocessing the audio signal to be detected;
s2, constructing a first model, wherein the first model comprises a bidirectional gate recurrent neural network layer, a unidirectional gate recurrent neural network layer and an attention layer which are sequentially arranged, and multi-dimensional frame-level features of the signals in the S1 are extracted as input of the first model;
s3, constructing a second model, wherein the second model comprises a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer and a global average pooling layer which are sequentially arranged, the first three layers of the second model are convolutional layers, jump connection is added, the output result of the first convolutional layer and the output result of the third convolutional layer are superposed to form the final characteristic of the third layer, and the Mel frequency spectrum characteristic of the signal in the S1 is extracted as the input of the second model;
and S4, splicing and fusing the output characteristics of the first model and the second model, classifying and obtaining a recognition result.
2. An improved sound recording device identification algorithm as claimed in claim 1, wherein: in step S2, 72-dimensional frame-level features are extracted, and after processing by the first model a 1000-dimensional feature vector is output.
3. An improved sound recording device identification algorithm as claimed in claim 1 wherein: in step S3, the output result of the first convolution layer and the output result of the third convolution layer are superimposed to be the final output of the third convolution layer.
4. An improved sound recording device identification algorithm as claimed in claim 1 wherein: in step S1, the audio signal is framed, the frame length is 1024, the frame shift is 25%, and Hanning window processing is performed on the signal to extract multi-dimensional frame-level features.
5. An improved sound recording device identification algorithm as claimed in claim 1, wherein: in step S1, the audio signal is framed, with a frame length of 1024 and a frame shift of 25%; an FFT is calculated for each frame of data, with 2048 FFT points; and a log-Mel spectrogram is then computed using a Mel filter bank with 80 sub-band filters.
6. An improved sound recording device identification algorithm as claimed in claim 1, wherein: in step S2, the multi-dimensional frame-level features include short-time zero-crossing rate, root-mean-square energy, fundamental frequency, spectral centroid, spectral spread, spectral entropy, spectral flux, formant frequency, first-order difference Mel cepstral coefficients, second-order difference Mel cepstral coefficients, linear prediction coefficients, and Bark-frequency cepstral coefficients.
7. An improved sound recording device identification algorithm as claimed in claim 1, wherein: in step S2, the output s of the attention layer is expressed as the expectation under the class probability distribution P(v|x,q):

s = Σ_{i=1}^{n} P(v = i | x, q) · x_i

wherein the input sequence is x = (x_1, x_2, …, x_n) and the corresponding query is q;

P denotes the probability distribution function, i.e. P(v = i|x,q) is the probability of selecting x_i according to q;

q and x_i are related by

a_i = f(x_i, q)

i.e. following the standard attention mechanism principle, a function f(x_i, q) is used to compute the alignment score a_i between q and x_i.
CN201910841092.9A 2019-09-06 2019-09-06 Improved recording equipment identification algorithm Active CN110728991B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910841092.9A CN110728991B (en) 2019-09-06 2019-09-06 Improved recording equipment identification algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910841092.9A CN110728991B (en) 2019-09-06 2019-09-06 Improved recording equipment identification algorithm

Publications (2)

Publication Number Publication Date
CN110728991A CN110728991A (en) 2020-01-24
CN110728991B (en) 2022-03-01

Family

ID=69217918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910841092.9A Active CN110728991B (en) 2019-09-06 2019-09-06 Improved recording equipment identification algorithm

Country Status (1)

Country Link
CN (1) CN110728991B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111508519B (en) * 2020-04-03 2022-04-26 北京达佳互联信息技术有限公司 Method and device for enhancing voice of audio signal
CN112466298B (en) * 2020-11-24 2023-08-11 杭州网易智企科技有限公司 Voice detection method, device, electronic equipment and storage medium
CN113220934B (en) * 2021-06-01 2023-06-23 平安科技(深圳)有限公司 Singer recognition model training and singer recognition method and device and related equipment
CN113793602B (en) * 2021-08-24 2022-05-10 北京数美时代科技有限公司 Audio recognition method and system for juveniles

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102394062A (en) * 2011-10-26 2012-03-28 华南理工大学 Method and system for automatically identifying voice recording equipment source
CN105023581A (en) * 2015-07-24 2015-11-04 南京工程学院 Audio tampering detection device based on time-frequency domain joint features
CN105513610A (en) * 2015-11-23 2016-04-20 南京工程学院 Voice analysis method and device
CN106952643A (en) * 2017-02-24 2017-07-14 华南理工大学 Recording device clustering method based on Gaussian mean supervector and spectral clustering
CN107507626A (en) * 2017-07-07 2017-12-22 宁波大学 Mobile phone source identification method based on speech spectrum fusion features
CN108831443A (en) * 2018-06-25 2018-11-16 华中师范大学 Mobile recording device source identification method based on stacked autoencoder networks
CN109285538A (en) * 2018-09-19 2019-01-29 宁波大学 Mobile phone source identification method in additive noise environments based on the constant-Q transform domain
CN109923559A (en) * 2016-11-04 2019-06-21 易享信息技术有限公司 Quasi-recurrent neural networks

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3822863B1 (en) * 2016-09-06 2022-11-02 DeepMind Technologies Limited Generating audio using neural networks
CN106887225B (en) * 2017-03-21 2020-04-07 百度在线网络技术(北京)有限公司 Acoustic feature extraction method and device based on convolutional neural network and terminal equipment
CN107481715B (en) * 2017-09-29 2020-12-08 百度在线网络技术(北京)有限公司 Method and apparatus for generating information

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102394062A (en) * 2011-10-26 2012-03-28 华南理工大学 Method and system for automatically identifying voice recording equipment source
CN105023581A (en) * 2015-07-24 2015-11-04 南京工程学院 Audio tampering detection device based on time-frequency domain joint features
CN105513610A (en) * 2015-11-23 2016-04-20 南京工程学院 Voice analysis method and device
CN109923559A (en) * 2016-11-04 2019-06-21 易享信息技术有限公司 Quasi-recurrent neural networks
CN109952580A (en) * 2016-11-04 2019-06-28 易享信息技术有限公司 Encoder-decoder model based on quasi-recurrent neural networks
CN106952643A (en) * 2017-02-24 2017-07-14 华南理工大学 Recording device clustering method based on Gaussian mean supervector and spectral clustering
CN107507626A (en) * 2017-07-07 2017-12-22 宁波大学 Mobile phone source identification method based on speech spectrum fusion features
CN108831443A (en) * 2018-06-25 2018-11-16 华中师范大学 Mobile recording device source identification method based on stacked autoencoder networks
CN109285538A (en) * 2018-09-19 2019-01-29 宁波大学 Mobile phone source identification method in additive noise environments based on the constant-Q transform domain

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Research on recording device discrimination based on CNN; Gao Chonghong et al.; Informatization Research; 2016-04-20 (No. 02); full text *
Device source identification based on linear prediction Mel-frequency cepstral coefficients; Qin Tianyun et al.; Data Communication; 2018-08-28 (No. 04); full text *
Research progress on recording device identification in audio forensics; Bao Yongqiang et al.; Journal of Data Acquisition and Processing; 2018-09-15 (No. 05); full text *

Also Published As

Publication number Publication date
CN110728991A (en) 2020-01-24

Similar Documents

Publication Publication Date Title
CN110728991B (en) Improved recording equipment identification algorithm
WO2021208287A1 (en) Voice activity detection method and apparatus for emotion recognition, electronic device, and storage medium
CN108305616B (en) Audio scene recognition method and device based on long-time and short-time feature extraction
US9818431B2 (en) Multi-speaker speech separation
US20160189730A1 (en) Speech separation method and system
Alamdari et al. Improving deep speech denoising by noisy2noisy signal mapping
CN105788592A (en) Audio classification method and apparatus thereof
CN103377651B (en) Automatic speech synthesis device and method
CN108986798B (en) Voice data processing method, device and equipment
CN103985381A (en) Audio indexing method based on parameter-fusion optimized decision
CN109378014A (en) Mobile device source identification method and system based on convolutional neural networks
CN107358947A (en) Speaker re-recognition method and system
CN107274892A (en) Speaker recognition method and device
Salekin et al. Distant emotion recognition
CN109300470A (en) Audio mixing separation method and audio mixing separator
Yan et al. Audio deepfake detection system with neural stitching for add 2022
CN113571095A (en) Speech emotion recognition method and system based on nested deep neural network
Jannu et al. Shuffle attention u-Net for speech enhancement in time domain
CN114970695B (en) Speaker segmentation clustering method based on non-parametric Bayesian model
Jin et al. Speech separation and emotion recognition for multi-speaker scenarios
Cui et al. Research on Audio Recognition Based on the Deep Neural Network in Music Teaching
Uhle et al. Speech enhancement of movie sound
WO2021217750A1 (en) Method and system for eliminating channel difference in voice interaction, electronic device, and medium
Tailor et al. Deep learning approach for spoken digit recognition in Gujarati language
Zhou et al. Environmental sound classification of western black-crowned gibbon habitat based on spectral subtraction and VGG16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant