CN110797031A - Voice change detection method, system, mobile terminal and storage medium - Google Patents


Info

Publication number
CN110797031A
CN110797031A (application CN201910888401.8A)
Authority
CN
China
Prior art keywords
voice
features
detected
cqt
cqcc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910888401.8A
Other languages
Chinese (zh)
Inventor
陈文敏
肖龙源
李稀敏
蔡振华
刘晓葳
王静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd filed Critical Xiamen Kuaishangtong Technology Co Ltd
Priority to CN201910888401.8A
Publication of CN110797031A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 — Speaker identification or verification techniques
    • G10L17/02 — Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04 — Training, enrolment or model building
    • G10L17/18 — Artificial neural networks; Connectionist approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention relates to the technical field of automatic speaker verification and provides a voice inflexion detection method, system, mobile terminal and storage medium. The method comprises the following steps: acquiring sample voice data and performing feature extraction on it to obtain cqt voice features; optimizing the cqt voice features to obtain cqcc voice features, and inputting the cqcc voice features into a preset convolutional neural network for model training to obtain a voice detection model; and acquiring the voice to be detected, inputting it into the voice detection model for voice analysis, and performing inflexion judgment on the voice to be detected according to the analysis result. The method requires no manual feature selection: model training with a convolutional neural network improves the accuracy of subsequent inflexion detection on the voice to be detected, and extraction and optimization based on cqt features improve the resolution of the voice detection model.

Description

Voice change detection method, system, mobile terminal and storage medium
Technical Field
The invention belongs to the technical field of automatic speaker verification, and particularly relates to a voice inflection detection method, a system, a mobile terminal and a storage medium.
Background
Automatic Speaker Verification (ASV) technology has matured into a low-cost, reliable method of identity verification and identification. However, like all biometric modalities, it can be attacked with fraudulent speech such as replayed speech, inflexion (voice-changed) speech and synthetic speech. The intent behind such speech is to impersonate other enrollees and breach the verification system in order to perform illegal operations; the inflexion detection step for the voice to be detected is therefore particularly important when ASV technology is used.
Existing voice inflexion detection methods require manual selection of sound-wave features and judge inflexion by sound-wave matching: based on the manually selected features, the voice to be detected is matched against preset sound waves to obtain an inflexion judgment. This manually selected, wave-matching approach makes voice inflexion detection inefficient and inaccurate.
Disclosure of Invention
The embodiment of the invention aims to provide a voice inflexion detection method, system, mobile terminal and storage medium, so as to solve the problems of low detection efficiency and poor detection accuracy caused by the sound-wave matching approach used for inflexion judgment in existing voice inflexion detection.
The embodiment of the invention is realized in such a way that a voice inflection detection method comprises the following steps:
acquiring sample voice data, and performing feature extraction on the sample voice data to obtain cqt voice features, wherein the sample voice data comprises positive sample data and negative sample data;
optimizing the cqt voice features to obtain cqcc voice features, and inputting the cqcc voice features to a preset convolutional neural network for model training to obtain a voice detection model;
and acquiring the voice to be detected, inputting the voice to be detected into the voice detection model for voice analysis, and performing inflexion judgment on the voice to be detected according to the analysis result of the voice detection model.
Further, the step of optimizing the cqt voice features comprises:
converting the cqt voice features into a voice power spectrum, and taking the logarithm of the voice power spectrum;
and resampling the log power spectrum, and applying a discrete cosine transform after resampling, to obtain the cqcc voice features.
Further, the step of performing inflexion determination on the speech to be detected according to the analysis result of the speech detection model includes:
acquiring a probability value output by a softmax layer in the voice detection model;
when the probability value is judged to be larger than the probability threshold value, judging the voice to be detected to be inflexion voice;
and when the probability value is not larger than the probability threshold value, judging that the voice to be detected is non-inflexion voice.
Further, the step of inputting the cqcc speech features into a preset convolutional neural network for model training includes:
controlling the preset convolutional neural network to adopt a cross entropy loss function and updating network parameters by adopting an Adam algorithm;
and carrying out iteration for a preset number of times according to the cqcc voice characteristics to obtain the voice detection model.
Further, after the step of updating the network parameter by using the Adam algorithm, the method further includes:
and adding random inactivation operation into the preset convolutional neural network.
Furthermore, the preset convolutional neural network comprises three convolutional layers and two full-connection layers.
Another objective of an embodiment of the present invention is to provide a system for detecting a voice inflection, where the system includes:
the characteristic extraction module is used for acquiring sample voice data and extracting characteristics of the sample voice data to obtain cqt voice characteristics, wherein the sample voice data comprises positive sample data and negative sample data;
the model training module is used for optimizing the cqt voice features to obtain cqcc voice features, and inputting the cqcc voice features into a preset convolutional neural network for model training to obtain a voice detection model;
and the voice detection module is used for acquiring the voice to be detected, inputting the voice to be detected into the voice detection model for voice analysis, and carrying out voice change judgment on the voice to be detected according to the analysis result of the voice detection model.
Further, the model training module is further configured to:
converting the cqt voice features into a voice power spectrum, and taking the logarithm of the voice power spectrum;
and resampling the log power spectrum, and applying a discrete cosine transform after resampling, to obtain the cqcc voice features.
Another object of an embodiment of the present invention is to provide a mobile terminal, including a storage device and a processor, where the storage device is used to store a computer program, and the processor runs the computer program to make the mobile terminal execute the above-mentioned voice inflection detection method.
Another object of an embodiment of the present invention is to provide a storage medium, which stores a computer program used in the mobile terminal, wherein the computer program, when executed by a processor, implements the steps of the voice inflection detection method.
According to the embodiment of the invention, no manual feature selection is needed: model training with a convolutional neural network effectively improves the accuracy of subsequent inflexion detection on the voice to be detected, and extraction and optimization based on cqt features improve the resolution of the voice detection model, so that the model better distinguishes inflexion voice from normal voice while also reducing the amount of computation, improving the efficiency of voice inflexion detection.
Drawings
Fig. 1 is a flowchart of a voice inflection detection method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a method for detecting voice inflection provided in a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a system for detecting a voice inflection according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a mobile terminal according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
The existing voice inflexion detection method suffers from low detection efficiency and poor accuracy because it relies on sound-wave matching with manually selected features. The present invention therefore performs model training with a convolutional neural network to improve the accuracy of subsequent inflexion detection on the voice to be detected, and improves the resolution of the voice detection model through extraction and optimization based on cqt features, so that the model better distinguishes inflexion voice from normal voice, reduces the amount of computation, and improves the efficiency of voice inflexion detection.
Example one
Please refer to fig. 1, which is a flowchart illustrating a voice inflection detection method according to a first embodiment of the present invention, including the steps of:
step S10, obtaining sample voice data, and performing feature extraction on the sample voice data to obtain cqt voice features;
the sample voice data includes positive sample data and negative sample data. Specifically, the positive sample data mainly consists of recordings of real human voices, and the negative sample data mainly consists of pitch-changed data, replayed-recording data, synthetic audio data, and the like;
preferably, the pitch-changed data can be collected through mainstream voice-changing apps, or the voice in an audio clip can be converted into the voice of a specific person through a conversion algorithm; the replayed-recording data can be acquired with recording devices; and the synthetic audio data can be generated through a speech-synthesis interface;
step S20, optimizing the cqt voice features to obtain cqcc voice features, and inputting the cqcc voice features to a preset convolutional neural network for model training to obtain a voice detection model;
specifically, in this step the cqt voice features are optimized into cqcc (Constant-Q Cepstral Coefficient) voice features, which improves the discriminative power of the features and thus the accuracy with which the subsequent model distinguishes inflexion from non-inflexion voice;
step S30, acquiring a voice to be detected, inputting the voice to be detected into the voice detection model for voice analysis, and performing inflexion judgment on the voice to be detected according to the analysis result of the voice detection model;
the voice to be detected can be acquired with a sound pick-up device; in this step, the target cqcc features of the voice to be detected are extracted and input into the network of the voice detection model for analysis, yielding the voice analysis result;
This embodiment requires no manual feature selection: model training with a convolutional neural network effectively improves the accuracy of subsequent inflexion detection on the voice to be detected, and extraction and optimization based on cqt features improve the resolution of the voice detection model, so that the model better distinguishes inflexion voice from normal voice while also reducing the amount of computation, improving the efficiency of voice inflexion detection.
Example two
Please refer to fig. 2, which is a flowchart illustrating a voice inflection detection method according to a second embodiment of the present invention, including the steps of:
step S11, obtaining sample voice data, and performing feature extraction on the sample voice data to obtain cqt voice features;
preferably, because the energy of the human voice concentrates at low frequencies, and the cqt offers higher resolution at low frequencies and lower resolution at high frequencies, the cqt-based feature extraction in this step lets the subsequently trained model better distinguish inflexion voice from normal voice while reducing the amount of computation;
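The low-frequency resolution property described above comes from the geometrically spaced bin center frequencies of the constant-Q transform. A minimal numpy sketch, illustrative only; the f_min and bin-count values below are assumptions, not values specified by this patent:

```python
import numpy as np

def cqt_center_frequencies(f_min, n_bins, bins_per_octave=12):
    """Constant-Q bin centers: f_k = f_min * 2**(k / bins_per_octave).

    Unlike the uniformly spaced bins of an ordinary STFT, geometric
    spacing packs many more bins into the low frequencies where the
    energy of the human voice concentrates.
    """
    k = np.arange(n_bins)
    return f_min * 2.0 ** (k / bins_per_octave)

# Hypothetical parameters: 7 octaves above ~32.7 Hz, 12 bins per octave.
freqs = cqt_center_frequencies(f_min=32.7, n_bins=84, bins_per_octave=12)

# Adjacent-bin spacing grows with frequency, i.e. resolution is
# highest at the low end and lowest at the high end.
low_spacing = freqs[1] - freqs[0]
high_spacing = freqs[-1] - freqs[-2]
```

In practice an off-the-shelf implementation (for example librosa's `cqt`) would compute the full transform; the sketch only shows why low frequencies get finer resolution.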
step S21, converting the cqt voice features into a voice power spectrum, and taking the logarithm of the voice power spectrum;
step S31, resampling the log power spectrum, and applying a discrete cosine transform after resampling, to obtain the cqcc voice features;
wherein the resampling is designed to bring the logarithmic values onto a uniform scale, and the discrete cosine transform is designed so that the speech information is concentrated in the low-frequency part;
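Steps S21 and S31 (power spectrum, logarithm, uniform resampling, DCT) can be sketched as follows. This is an illustrative numpy reconstruction in which the frame length, resampling size and coefficient count are assumed values, not ones given by the patent:

```python
import numpy as np

def cqcc_from_cqt(cqt_frame, n_resample=128, n_coeffs=20):
    """Turn one (complex) CQT frame into cqcc-style coefficients."""
    # Power spectrum of the CQT coefficients.
    power = np.abs(cqt_frame) ** 2
    # Logarithm, with a small floor to avoid log(0).
    log_power = np.log(power + 1e-10)
    # Uniform resampling: the geometrically spaced CQT bins are
    # re-interpolated onto a uniform scale so the DCT applies cleanly.
    src = np.linspace(0.0, 1.0, len(log_power))
    dst = np.linspace(0.0, 1.0, n_resample)
    resampled = np.interp(dst, src, log_power)
    # DCT-II, keeping the first n_coeffs coefficients; the transform
    # concentrates the information in the low-order terms.
    n = np.arange(n_resample)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1)
                   / (2 * n_resample))
    return basis @ resampled

# Hypothetical 84-bin CQT frame with random content, for demonstration.
rng = np.random.default_rng(0)
frame = rng.standard_normal(84) + 1j * rng.standard_normal(84)
cqcc = cqcc_from_cqt(frame)
```

A production implementation would use a library DCT (for example `scipy.fft.dct`) and a proper uniform-resampling scheme; the explicit DCT-II basis above just keeps the sketch self-contained.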
step S41, configuring the preset convolutional neural network to use a cross-entropy loss function, updating the network parameters with the Adam algorithm, and adding a random inactivation (dropout) operation to the preset convolutional neural network;
adding the dropout operation to the preset convolutional neural network effectively prevents overfitting during model construction and improves the stability of model building;
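Step S41 combines three standard ingredients: a cross-entropy loss, Adam parameter updates, and dropout (random inactivation). The sketch below shows each in plain numpy; it is illustrative only, and the hyperparameter values are the usual defaults rather than values stated in the patent:

```python
import numpy as np

def cross_entropy(probs, label):
    """Cross-entropy loss for a predicted distribution and a true class index."""
    return -np.log(probs[label] + 1e-12)

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: first/second moment estimates plus bias correction."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def dropout(x, rate, rng):
    """Inverted dropout: zero each activation with probability `rate`
    and rescale the survivors; applied at training time only."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

loss = cross_entropy(np.array([0.7, 0.3]), 0)   # network gives 0.7 to the true class

w = np.zeros(4)
m = np.zeros(4)
v = np.zeros(4)
g = np.array([0.5, -0.5, 1.0, 0.0])             # gradient of the loss w.r.t. w
w, m, v = adam_step(w, g, m, v, t=1)            # first step moves each weight by ~lr

h = dropout(np.ones(8), rate=0.5, rng=np.random.default_rng(0))
```

In a real training loop these pieces would come from a deep-learning framework (e.g. an Adam optimizer and dropout layers); the point here is only what each operation does to the numbers.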
step S51, performing iteration for preset times according to the cqcc voice characteristics to obtain the voice detection model;
the preset number of iterations can be set according to the user's requirements, for example 500, 1000 or 2000 iterations;
step S61, acquiring a voice to be detected, inputting the voice to be detected into the voice detection model for voice analysis, and performing inflexion judgment on the voice to be detected according to the analysis result of the voice detection model;
when the voice analysis result is voice information, the detection result is played directly to the user;
when the voice analysis result is text, numerical or image information, it is sent to the target device corresponding to a preset display address, so that the detection result is displayed on that device and the user can conveniently check the inflexion detection result of the voice to be detected;
specifically, in this step, performing inflexion judgment on the voice to be detected according to the analysis result of the voice detection model includes:
acquiring the probability value output by the softmax layer of the voice detection model;
when the probability value is greater than a probability threshold, judging the voice to be detected to be inflexion voice;
and when the probability value is not greater than the probability threshold, judging the voice to be detected to be non-inflexion voice;
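The decision rule above amounts to thresholding one softmax output. A minimal numpy sketch, in which the class ordering, threshold and logits are assumed for illustration:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def is_inflexion_voice(logits, threshold=0.5, inflexion_class=1):
    """Judge the voice as inflexion voice when the softmax probability
    of the (assumed) inflexion class exceeds the threshold."""
    return softmax(logits)[inflexion_class] > threshold

# Hypothetical two-class network output: (non-inflexion, inflexion).
decision = is_inflexion_voice(np.array([0.2, 2.3]))
```

The threshold of 0.5 is only a placeholder; in practice it would be tuned on held-out data to trade off false acceptances against false rejections.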
This embodiment requires no manual feature selection: model training with a convolutional neural network effectively improves the accuracy of subsequent inflexion detection on the voice to be detected, and extraction and optimization based on cqt features improve the resolution of the voice detection model, so that the model better distinguishes inflexion voice from normal voice while also reducing the amount of computation, improving the efficiency of voice inflexion detection.
EXAMPLE III
Please refer to fig. 3, which is a schematic structural diagram of a voice inflexion detection system 100 according to a third embodiment of the present invention, comprising: a feature extraction module 10, a model training module 11 and a voice detection module 12, wherein:
the feature extraction module 10 is configured to obtain sample voice data, and perform feature extraction on the sample voice data to obtain cqt voice features, where the sample voice data includes positive sample data and negative sample data.
And the model training module 11 is configured to perform optimization processing on the cqt speech features to obtain cqcc speech features, and input the cqcc speech features to a preset convolutional neural network for model training to obtain a speech detection model, where the preset convolutional neural network includes three convolutional layers and two fully-connected layers.
Wherein the model training module 11 is further configured to: controlling the preset convolutional neural network to adopt a cross entropy loss function and updating network parameters by adopting an Adam algorithm; and carrying out iteration for a preset number of times according to the cqcc voice characteristics to obtain the voice detection model.
Furthermore, the model training module 11 is further configured to add a random inactivation (dropout) operation to the preset convolutional neural network.
Further, the model training module 11 is further configured to: convert the cqt voice features into a voice power spectrum and take the logarithm of the voice power spectrum; and resample the log power spectrum and apply a discrete cosine transform after resampling to obtain the cqcc voice features.
The voice detection module 12 is configured to acquire a voice to be detected, input the voice to be detected to the voice detection model for voice analysis, and perform inflexion judgment on the voice to be detected according to an analysis result of the voice detection model.
Wherein the voice detection module 12 is further configured to: acquiring a probability value output by a softmax layer in the voice detection model; when the probability value is judged to be larger than the probability threshold value, judging the voice to be detected to be inflexion voice; and when the probability value is not larger than the probability threshold value, judging that the voice to be detected is non-inflexion voice.
This embodiment requires no manual feature selection: model training with a convolutional neural network effectively improves the accuracy of subsequent inflexion detection on the voice to be detected, and extraction and optimization based on cqt features improve the resolution of the voice detection model, so that the model better distinguishes inflexion voice from normal voice while also reducing the amount of computation, improving the efficiency of voice inflexion detection.
Example four
Referring to fig. 4, a mobile terminal 101 according to a fourth embodiment of the present invention includes a storage device and a processor, where the storage device is used to store a computer program, and the processor runs the computer program to make the mobile terminal 101 execute the above-mentioned voice change detection method.
The present embodiment also provides a storage medium storing the computer program used in the above-mentioned mobile terminal 101, which, when executed, implements the following steps:
acquiring sample voice data, and performing feature extraction on the sample voice data to obtain cqt voice features, wherein the sample voice data comprises positive sample data and negative sample data;
optimizing the cqt voice features to obtain cqcc voice features, and inputting the cqcc voice features to a preset convolutional neural network for model training to obtain a voice detection model;
and acquiring the voice to be detected, inputting the voice to be detected into the voice detection model for voice analysis, and performing inflexion judgment on the voice to be detected according to the analysis result of the voice detection model. The storage medium may be, for example, a ROM/RAM, a magnetic disk or an optical disk.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is used as an example, in practical applications, the above-mentioned function distribution may be performed by different functional units or modules according to needs, that is, the internal structure of the storage device is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit, and the integrated unit may be implemented in a form of hardware, or may be implemented in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application.
Those skilled in the art will appreciate that the configuration shown in fig. 3 is not intended to limit the present invention and may include more or fewer components than those shown, or some components in combination, or a different arrangement of components, and that the voice inflection detection method of fig. 1-2 may be implemented using more or fewer components than those shown in fig. 3, or some components in combination, or a different arrangement of components. The units, modules, etc. referred to herein are a series of computer programs that can be executed by a processor (not shown) in the voice inflection detection system and that can perform a specific function, and all of them can be stored in a storage device (not shown) of the voice inflection detection system.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A method for detecting a voice inflection, the method comprising:
acquiring sample voice data, and performing feature extraction on the sample voice data to obtain cqt voice features, wherein the sample voice data comprises positive sample data and negative sample data;
optimizing the cqt voice features to obtain cqcc voice features, and inputting the cqcc voice features to a preset convolutional neural network for model training to obtain a voice detection model;
and acquiring the voice to be detected, inputting the voice to be detected into the voice detection model for voice analysis, and performing inflexion judgment on the voice to be detected according to the analysis result of the voice detection model.
2. The method of detecting inflections in speech of claim 1 wherein said step of optimizing said cqt speech feature comprises:
converting the cqt voice features into a voice power spectrum, and taking the logarithm of the voice power spectrum;
and resampling the log power spectrum, and applying a discrete cosine transform after resampling, to obtain the cqcc voice features.
3. The method according to claim 1, wherein the step of performing inflection decision on the speech to be detected according to the analysis result of the speech detection model comprises:
acquiring a probability value output by a softmax layer in the voice detection model;
when the probability value is judged to be larger than the probability threshold value, judging the voice to be detected to be inflexion voice;
and when the probability value is not larger than the probability threshold value, judging that the voice to be detected is non-inflexion voice.
4. The method of detecting inflexion in speech of claim 1, where the step of inputting the cqcc speech features into a predetermined convolutional neural network for model training comprises:
controlling the preset convolutional neural network to adopt a cross entropy loss function and updating network parameters by adopting an Adam algorithm;
and carrying out iteration for a preset number of times according to the cqcc voice characteristics to obtain the voice detection model.
5. The method of detecting inflections in speech according to claim 4, wherein after the step of employing the Adam algorithm for updating the network parameters, the method further comprises:
and adding random inactivation operation into the preset convolutional neural network.
6. The method of detecting inflections in speech of claim 1, wherein the predetermined convolutional neural network comprises three convolutional layers and two fully-connected layers.
7. A system for detecting a voice inflection, the system comprising:
the characteristic extraction module is used for acquiring sample voice data and extracting characteristics of the sample voice data to obtain cqt voice characteristics, wherein the sample voice data comprises positive sample data and negative sample data;
the model training module is used for optimizing the cqt voice features to obtain cqcc voice features, and inputting the cqcc voice features into a preset convolutional neural network for model training to obtain a voice detection model;
and the voice detection module is used for acquiring the voice to be detected, inputting the voice to be detected into the voice detection model for voice analysis, and carrying out voice change judgment on the voice to be detected according to the analysis result of the voice detection model.
8. The voice inflection detection system of claim 7 wherein the model training module is further configured to:
converting the cqt voice features into a voice power spectrum, and taking the logarithm of the voice power spectrum;
and resampling the log power spectrum, and applying a discrete cosine transform after resampling, to obtain the cqcc voice features.
9. A mobile terminal, characterized by comprising a storage device for storing a computer program and a processor for executing the computer program to cause the mobile terminal to perform the voice inflection detection method according to any one of claims 1 to 6.
10. A storage medium, characterized in that it stores a computer program for use in the mobile terminal according to claim 9, which computer program, when executed by a processor, implements the steps of the voice inflexion detection method according to any of claims 1-6.
CN201910888401.8A (priority and filing date 2019-09-19): Voice change detection method, system, mobile terminal and storage medium. Pending; published as CN110797031A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910888401.8A CN110797031A (en) 2019-09-19 2019-09-19 Voice change detection method, system, mobile terminal and storage medium


Publications (1)

Publication Number Publication Date
CN110797031A true CN110797031A (en) 2020-02-14

Family

ID=69438586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910888401.8A Pending CN110797031A (en) 2019-09-19 2019-09-19 Voice change detection method, system, mobile terminal and storage medium

Country Status (1)

Country Link
CN (1) CN110797031A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739546A (en) * 2020-07-24 2020-10-02 深圳市声扬科技有限公司 Sound-changing voice reduction method and device, computer equipment and storage medium
CN111798828A (en) * 2020-05-29 2020-10-20 厦门快商通科技股份有限公司 Synthetic audio detection method, system, mobile terminal and storage medium
CN111951811A (en) * 2020-07-15 2020-11-17 珠海市杰理科技股份有限公司 Bluetooth headset control method and device, Bluetooth headset and preset information importing method
CN113257284A (en) * 2021-06-09 2021-08-13 北京世纪好未来教育科技有限公司 Voice activity detection model training method, voice activity detection method and related device
CN113436646A (en) * 2021-06-10 2021-09-24 杭州电子科技大学 Camouflage voice detection method adopting combined features and random forest
CN113646833A (en) * 2021-07-14 2021-11-12 东莞理工学院 Voice confrontation sample detection method, device, equipment and computer readable storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108198574A (en) * 2017-12-29 2018-06-22 科大讯飞股份有限公司 Change of voice detection method and device
US20180254046A1 (en) * 2017-03-03 2018-09-06 Pindrop Security, Inc. Method and apparatus for detecting spoofing conditions
CN109243487A (en) * 2018-11-30 2019-01-18 宁波大学 A kind of voice playback detection method normalizing normal Q cepstrum feature
CN109300479A (en) * 2018-10-31 2019-02-01 桂林电子科技大学 A kind of method for recognizing sound-groove of voice playback, device and storage medium
CN109473105A (en) * 2018-10-26 2019-03-15 平安科技(深圳)有限公司 The voice print verification method, apparatus unrelated with text and computer equipment
CN109754812A (en) * 2019-01-30 2019-05-14 华南理工大学 A kind of voiceprint authentication method of the anti-recording attack detecting based on convolutional neural networks
CN109872720A (en) * 2019-01-29 2019-06-11 广东技术师范学院 It is a kind of that speech detection algorithms being rerecorded to different scenes robust based on convolutional neural networks
CN110232927A (en) * 2019-06-13 2019-09-13 苏州思必驰信息科技有限公司 Speaker verification's anti-spoofing method and apparatus

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180254046A1 (en) * 2017-03-03 2018-09-06 Pindrop Security, Inc. Method and apparatus for detecting spoofing conditions
CN108198574A (en) * 2017-12-29 2018-06-22 科大讯飞股份有限公司 Change of voice detection method and device
CN109473105A (en) * 2018-10-26 2019-03-15 平安科技(深圳)有限公司 The voice print verification method, apparatus unrelated with text and computer equipment
CN109300479A (en) * 2018-10-31 2019-02-01 桂林电子科技大学 A kind of method for recognizing sound-groove of voice playback, device and storage medium
CN109243487A (en) * 2018-11-30 2019-01-18 宁波大学 A kind of voice playback detection method normalizing normal Q cepstrum feature
CN109872720A (en) * 2019-01-29 2019-06-11 广东技术师范学院 It is a kind of that speech detection algorithms being rerecorded to different scenes robust based on convolutional neural networks
CN109754812A (en) * 2019-01-30 2019-05-14 华南理工大学 A kind of voiceprint authentication method of the anti-recording attack detecting based on convolutional neural networks
CN110232927A (en) * 2019-06-13 2019-09-13 苏州思必驰信息科技有限公司 Speaker verification's anti-spoofing method and apparatus

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
许庆勇: "《基于深度学习理论的纹身图像识别与检测研究》", 31 December 2018, 华中科技大学出版社 *
辛阳等: "《大数据技术原理与实践》", 31 January 2018, 北京邮电大学出版社 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798828A (en) * 2020-05-29 2020-10-20 厦门快商通科技股份有限公司 Synthetic audio detection method, system, mobile terminal and storage medium
CN111798828B (en) * 2020-05-29 2023-02-14 厦门快商通科技股份有限公司 Synthetic audio detection method, system, mobile terminal and storage medium
CN111951811A (en) * 2020-07-15 2020-11-17 珠海市杰理科技股份有限公司 Bluetooth headset control method and device, Bluetooth headset and preset information importing method
CN111739546A (en) * 2020-07-24 2020-10-02 深圳市声扬科技有限公司 Sound-changing voice reduction method and device, computer equipment and storage medium
CN113257284A (en) * 2021-06-09 2021-08-13 北京世纪好未来教育科技有限公司 Voice activity detection model training method, voice activity detection method and related device
CN113436646A (en) * 2021-06-10 2021-09-24 杭州电子科技大学 Camouflage voice detection method adopting combined features and random forest
CN113436646B (en) * 2021-06-10 2022-09-23 杭州电子科技大学 Camouflage voice detection method adopting combined features and random forest
CN113646833A (en) * 2021-07-14 2021-11-12 东莞理工学院 Voice confrontation sample detection method, device, equipment and computer readable storage medium
WO2023283823A1 (en) * 2021-07-14 2023-01-19 东莞理工学院 Speech adversarial sample testing method and apparatus, device, and computer-readable storage medium

Similar Documents

Publication Publication Date Title
CN110797031A (en) Voice change detection method, system, mobile terminal and storage medium
JP6393730B2 (en) Voice identification method and apparatus
US10178228B2 (en) Method and apparatus for classifying telephone dialing test audio based on artificial intelligence
CN110223680A (en) Method of speech processing, recognition methods and its device, system, electronic equipment
CN111161752A (en) Echo cancellation method and device
CN110600048B (en) Audio verification method and device, storage medium and electronic equipment
CN110880329A (en) Audio identification method and equipment and storage medium
CN111862951B (en) Voice endpoint detection method and device, storage medium and electronic equipment
CN110428835B (en) Voice equipment adjusting method and device, storage medium and voice equipment
CN110211599A (en) Using awakening method, device, storage medium and electronic equipment
US11282514B2 (en) Method and apparatus for recognizing voice
CN109065043A (en) A kind of order word recognition method and computer storage medium
CN111540342A (en) Energy threshold adjusting method, device, equipment and medium
CN109545226B (en) Voice recognition method, device and computer readable storage medium
CN108847251B (en) Voice duplicate removal method, device, server and storage medium
CN113886821A (en) Malicious process identification method and device based on twin network, electronic equipment and storage medium
JP6843701B2 (en) Parameter prediction device and parameter prediction method for acoustic signal processing
CN109377982A (en) A kind of efficient voice acquisition methods
CN110070891B (en) Song identification method and device and storage medium
CN117076941A (en) Optical cable bird damage monitoring method, system, electronic equipment and readable storage medium
CN107993666B (en) Speech recognition method, speech recognition device, computer equipment and readable storage medium
CN116386669A (en) Machine running acoustic state monitoring method and system based on block automatic encoder
CN113793623B (en) Sound effect setting method, device, equipment and computer readable storage medium
CN113312619B (en) Malicious process detection method and device based on small sample learning, electronic equipment and storage medium
CN114420136A (en) Method and device for training voiceprint recognition model and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200214