CN109065075A - Speech processing method, apparatus, system and computer-readable storage medium - Google Patents

Speech processing method, apparatus, system and computer-readable storage medium

Info

Publication number
CN109065075A
CN109065075A (application CN201811124680.2A)
Authority
CN
China
Prior art keywords
voice
mfcc
characteristic parameters
framing
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811124680.2A
Other languages
Chinese (zh)
Inventor
郑棉洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Speakin Network Technology Co Ltd
Original Assignee
Guangzhou Speakin Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Speakin Network Technology Co Ltd filed Critical Guangzhou Speakin Network Technology Co Ltd
Priority to CN201811124680.2A
Publication of CN109065075A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a speech processing method, apparatus, device and computer-readable storage medium. The method includes: pre-processing a to-be-processed voice signal to obtain a plurality of framed audio signals; performing feature extraction on each framed audio signal to obtain MFCC acoustic feature parameters in one-to-one correspondence with the framed audio signals; and processing each MFCC acoustic feature parameter with a pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the classification information including voice and non-voice. The application can automatically perform non-voice classification on the to-be-processed voice signal, which not only reduces the workload of staff but also greatly improves working efficiency and classification accuracy.

Description

Speech processing method, apparatus, system and computer-readable storage medium
Technical field
Embodiments of the present invention relate to the field of voice processing technology, and in particular to a speech processing method, apparatus, system and computer-readable storage medium.
Background technique
Voice identification is commonly used in policing and security work: the recorded speech of a perpetrator and of a suspect are each converted by a sonagraph (voiceprint instrument) into striped or curved spectrograms (i.e. voiceprints), and speech features reflected in the spectrograms, such as frequency, intensity and timing, are compared so as to determine whether the suspect is the person who spoke at the time of the crime.
In order to ensure the accuracy of voice identification, the voice signal needs to be processed before identification so that the non-voice portions of the signal are identified, thereby reducing their influence on the identification.
In the prior art, classification and identification of the non-voice in a voice signal is performed by manual calibration, for example by observing the spectrogram of the voice signal or by listening to it. The workload is large, errors are easy to make, and working efficiency is low.
In view of this, how to provide a speech processing method, apparatus, system and computer-readable storage medium that solve the above technical problems has become a problem to be solved by those skilled in the art.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a speech processing method, apparatus, system and computer-readable storage medium that can automatically perform non-voice classification on a to-be-processed voice signal, which not only reduces the workload of staff but also greatly improves working efficiency and classification accuracy.
To solve the above technical problem, an embodiment of the present invention provides a speech processing method, comprising:
pre-processing a to-be-processed voice signal to obtain a plurality of framed audio signals;
performing feature extraction on each framed audio signal to obtain Mel-frequency cepstral coefficient (MFCC) acoustic feature parameters in one-to-one correspondence with the framed audio signals;
processing each MFCC acoustic feature parameter with a pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the classification information including voice and non-voice.
Optionally, the process of pre-processing the to-be-processed voice signal to obtain a plurality of framed audio signals is:
performing pre-emphasis, framing and Hamming windowing on the to-be-processed voice signal to obtain the plurality of framed audio signals.
Optionally, after obtaining the classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the method further includes:
determining, according to each piece of classification information, the framed audio signals whose classification is non-voice.
Optionally, the non-voice includes paging tones, ring-back tones (CRBT) and customer-service voice prompts.
Correspondingly, an embodiment of the present invention provides a speech processing apparatus, comprising:
a pre-processing module for pre-processing a to-be-processed voice signal to obtain a plurality of framed audio signals;
an extraction module for performing feature extraction on each framed audio signal to obtain MFCC acoustic feature parameters in one-to-one correspondence with the framed audio signals;
a classification module for processing each MFCC acoustic feature parameter with a pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the classification information including voice and non-voice.
Optionally, the pre-processing module is specifically configured to perform pre-emphasis, framing and Hamming windowing on the to-be-processed voice signal to obtain the plurality of framed audio signals.
Optionally, the apparatus further includes:
a screening module for determining, according to each piece of classification information, the framed audio signals whose classification is non-voice.
An embodiment of the present invention also provides a speech processing device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the speech processing method described above when executing the computer program.
An embodiment of the present invention also provides a computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the steps of the speech processing method described above.
Embodiments of the present invention provide a speech processing method, apparatus, device and computer-readable storage medium, including: pre-processing a to-be-processed voice signal to obtain a plurality of framed audio signals; performing feature extraction on each framed audio signal to obtain MFCC acoustic feature parameters in one-to-one correspondence with the framed audio signals; and processing each MFCC acoustic feature parameter with a pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the classification information including voice and non-voice.
It can be seen that, by pre-processing the to-be-processed voice signal, extracting MFCC acoustic feature parameters from each of the resulting framed audio signals, and processing each MFCC acoustic feature parameter with the pre-established DNN classification model, classification information corresponding to each framed audio signal is obtained, so that non-voice classification of the to-be-processed voice signal is achieved. The application can automatically perform non-voice classification on the to-be-processed voice signal, which not only reduces the workload of staff but also greatly improves working efficiency and classification accuracy.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the prior art and in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flow diagram of a speech processing method provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of a speech processing apparatus provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of a speech processing device provided by an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention provide a speech processing method, apparatus, system and computer-readable storage medium that can automatically perform non-voice classification on a to-be-processed voice signal during use, which not only reduces the workload of staff but also greatly improves working efficiency and classification accuracy.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Please refer to Fig. 1, which is a flow diagram of a speech processing method provided by an embodiment of the present invention.
The method comprises:
S110: pre-processing a to-be-processed voice signal to obtain a plurality of framed audio signals.
It should be noted that the present application can be applied when voice identification is to be performed on call recordings, which requires non-voice classification of the call audio. The to-be-processed voice signal can be obtained from a pre-collected recording file, which may be captured by any recording terminal. Specifically, the file type can be determined from the header of the recording file, and information such as the byte length, sample rate and channel count of the file is then read, so as to obtain the headerless PCM recording signal, which serves as the to-be-processed voice signal. For example, police and judicial departments may obtain a recording file of a suspect, derive the corresponding recording signal from that file as the to-be-processed voice signal, and then perform non-voice identification on it with the method provided below.
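As an illustration of this file-reading step, the sketch below loads a recording and strips the container header to obtain raw PCM samples. It assumes a WAV container with 16-bit samples and uses Python's standard wave module; the function name and these assumptions are illustrative and are not taken from the patent.

```python
# Illustrative sketch: read a WAV recording and return headerless PCM samples.
# Assumes 16-bit PCM in a WAV container; not part of the patent text.
import wave

import numpy as np


def load_pcm(path):
    """Return normalised PCM samples and the sample rate of a WAV recording file."""
    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())    # the wave module skips the header
    samples = np.frombuffer(raw, dtype=np.int16).astype(np.float32)
    if n_channels > 1:                           # keep a single channel for analysis
        samples = samples.reshape(-1, n_channels)[:, 0]
    return samples / 32768.0, sample_rate        # scale to [-1, 1)
```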
The pre-processing of the to-be-processed voice signal in this application may specifically be:
performing pre-emphasis, framing and Hamming windowing on the to-be-processed voice signal to obtain the plurality of framed audio signals.
Specifically, pre-emphasis is first applied to the to-be-processed voice signal to obtain a pre-emphasized first voice signal; the first voice signal is then divided into frames to obtain a plurality of second voice signals; and a Hamming window is applied to each second voice signal, thereby obtaining the plurality of framed audio signals.
Pre-emphasis is a signal-processing technique that compensates the high-frequency components of the input signal at the transmitting end. As the signal rate increases, the signal is heavily attenuated during transmission; in order to obtain a reasonably good waveform at the receiving end, the attenuated signal must be compensated. The idea of pre-emphasis is to boost the high-frequency components of the signal at the start of the transmission line, so as to compensate for their excessive attenuation during transmission. Since pre-emphasis does not affect the noise, it effectively improves the output signal-to-noise ratio.
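The following is a minimal sketch of this pre-processing step (pre-emphasis, framing and Hamming windowing) in Python with NumPy. The pre-emphasis coefficient of 0.97 and the 25 ms frame / 10 ms hop sizes are common defaults, not values specified in the patent.

```python
# Minimal sketch of pre-emphasis, framing and Hamming windowing.
# Coefficient and frame/hop sizes are typical defaults, not patent values.
# Assumes the signal is at least one frame long.
import numpy as np


def preprocess(signal, sample_rate, frame_ms=25, hop_ms=10, alpha=0.97):
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1] boosts high-frequency components.
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop_len)

    window = np.hamming(frame_len)
    frames = np.stack([
        emphasized[i * hop_len: i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])
    return frames    # shape: (n_frames, frame_len), one row per framed audio signal
```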
S120: performing feature extraction on each framed audio signal to obtain MFCC acoustic feature parameters in one-to-one correspondence with the framed audio signals.
Specifically, because MFCC (Mel-frequency cepstral coefficient) acoustic feature parameters can characterize the entire voice signal, this application extracts for each framed audio signal the MFCC acoustic feature parameters in one-to-one correspondence with it, which reduces the amount of data in subsequent processing and thereby improves processing efficiency.
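One possible implementation of this feature-extraction step is shown below using librosa, which performs the framing and windowing internally and is therefore applied to the pre-emphasized signal with the same frame and hop sizes as above. The patent does not name a library or a coefficient count; librosa and the choice of 13 MFCCs per frame are assumptions for illustration.

```python
# Possible MFCC extraction with librosa: one 13-dimensional vector per frame.
# Library choice and coefficient count are assumptions, not stated in the patent.
import librosa
import numpy as np


def extract_mfcc(emphasized, sample_rate, frame_ms=25, hop_ms=10, n_mfcc=13):
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    mfcc = librosa.feature.mfcc(
        y=emphasized.astype(np.float32),
        sr=sample_rate,
        n_mfcc=n_mfcc,
        n_fft=frame_len,
        hop_length=hop_len,
        window="hamming",
        center=False,
    )
    return mfcc.T    # shape: (n_frames, n_mfcc), one vector per framed audio signal
```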
S130: processing each MFCC acoustic feature parameter with the pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the classification information including voice and non-voice.
It should be noted that, in practical applications, the DNN classification model is established in advance from a large amount of historical data; specifically, it can be built from the historical data using existing DNN model-building methods. Then, when non-voice identification is performed on the to-be-processed voice signal, the MFCC acoustic feature parameters extracted above, which are in one-to-one correspondence with the framed audio signals, are input into the DNN classification model. The model processes each MFCC acoustic feature parameter and outputs the corresponding classification information, so that classification information corresponding to each framed audio signal is obtained; from this classification information, staff can tell whether each framed audio signal is voice or non-voice.
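By way of illustration only, the sketch below defines a small frame-level DNN classifier in PyTorch and uses it to assign a voice/non-voice label to each MFCC feature vector. The patent does not disclose the network architecture, layer sizes or training procedure, so every detail here is an assumption.

```python
# Hypothetical frame-level DNN classifier (PyTorch); architecture is assumed,
# not disclosed in the patent. Label 0 = voice, label 1 = non-voice.
import torch
import torch.nn as nn


class FrameClassifier(nn.Module):
    def __init__(self, n_mfcc=13, hidden=64, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_mfcc, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)


def classify_frames(model, mfcc_features):
    """Return one class label per frame from an (n_frames, n_mfcc) feature matrix."""
    model.eval()
    with torch.no_grad():
        logits = model(torch.from_numpy(mfcc_features).float())
    return logits.argmax(dim=1).numpy()    # one label per MFCC feature vector
```

In practice such a model would be trained beforehand on labelled historical frames (e.g. with a cross-entropy loss) and only applied at this step; the predicted label for each frame is its classification information.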
Of course, in order to further improve working efficiency, after the classification information in one-to-one correspondence with the MFCC acoustic feature parameters is obtained, the method may further include: determining, according to each piece of classification information, the framed audio signals whose classification is non-voice, so that staff can directly determine the time intervals in which the non-voice framed audio signals lie and then calibrate the framed audio signals whose classification is voice, for use in subsequent voice identification.
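A possible way to turn the per-frame labels into the non-voice time intervals mentioned above is sketched below; the frame and hop durations mirror the earlier sketches and are likewise assumptions rather than values taken from the patent.

```python
# Convert per-frame labels into (start, end) time intervals of non-voice.
# hop_ms / frame_ms match the earlier sketches and are assumed values.
import numpy as np


def nonvoice_intervals(labels, hop_ms=10, frame_ms=25, nonvoice_label=1):
    intervals = []
    start = None
    for i, label in enumerate(np.append(labels, -1)):   # sentinel flushes the last run
        if label == nonvoice_label and start is None:
            start = i
        elif label != nonvoice_label and start is not None:
            begin_s = start * hop_ms / 1000.0
            end_s = ((i - 1) * hop_ms + frame_ms) / 1000.0
            intervals.append((begin_s, end_s))
            start = None
    return intervals    # list of (start_seconds, end_seconds) non-voice spans
```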
The non-voice in this application may include paging tones, ring-back tones (CRBT) and customer-service voice prompts; the specific process of establishing the DNN classification model is a mature prior-art technique and is not described in detail here.
It can be seen that, by pre-processing the to-be-processed voice signal, extracting MFCC acoustic feature parameters from each of the resulting framed audio signals, and processing each MFCC acoustic feature parameter with the pre-established DNN classification model, classification information corresponding to each framed audio signal is obtained, so that non-voice classification of the to-be-processed voice signal is achieved. The application can automatically perform non-voice classification on the to-be-processed voice signal, which not only reduces the workload of staff but also greatly improves working efficiency and classification accuracy.
On the basis of the above embodiments, an embodiment of the present invention correspondingly provides a speech processing apparatus; please refer to Fig. 2. The apparatus includes:
a pre-processing module 21 for pre-processing a to-be-processed voice signal to obtain a plurality of framed audio signals;
an extraction module 22 for performing feature extraction on each framed audio signal to obtain MFCC acoustic feature parameters in one-to-one correspondence with the framed audio signals;
a classification module 23 for processing each MFCC acoustic feature parameter with a pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the classification information including voice and non-voice.
On the basis of the above embodiment, the pre-processing module 21 may be specifically configured to perform pre-emphasis, framing and Hamming windowing on the to-be-processed voice signal to obtain the plurality of framed audio signals.
On the basis of any of the above embodiments, the apparatus further includes:
a screening module for determining, according to each piece of classification information, the framed audio signals whose classification is non-voice.
It should be noted that the speech processing apparatus provided in this embodiment has the same beneficial effects as the speech processing method provided in the above embodiments; for the specific description of the speech processing method involved in this embodiment, please refer to the above embodiments, which is not repeated here.
On the basis of the above embodiments, an embodiment of the present invention also provides a speech processing device; please refer to Fig. 3. The device includes:
a memory 31 for storing a computer program;
a processor 32 for implementing the steps of the above speech processing method when executing the computer program.
For example, the processor 32 in this embodiment is configured to: pre-process a to-be-processed voice signal to obtain a plurality of framed audio signals; perform feature extraction on each framed audio signal to obtain MFCC acoustic feature parameters in one-to-one correspondence with the framed audio signals; and process each MFCC acoustic feature parameter with a pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the classification information including voice and non-voice.
On the basis of the above embodiments, an embodiment of the present invention also provides a computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the steps of the above speech processing method.
The computer-readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief; for related details, refer to the description of the method.
It should also be noted that, in this specification, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A speech processing method, characterized by comprising:
pre-processing a to-be-processed voice signal to obtain a plurality of framed audio signals;
performing feature extraction on each framed audio signal to obtain Mel-frequency cepstral coefficient (MFCC) acoustic feature parameters in one-to-one correspondence with the framed audio signals;
processing each MFCC acoustic feature parameter with a pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the classification information including voice and non-voice.
2. The speech processing method according to claim 1, characterized in that the process of pre-processing the to-be-processed voice signal to obtain a plurality of framed audio signals is:
performing pre-emphasis, framing and Hamming windowing on the to-be-processed voice signal to obtain the plurality of framed audio signals.
3. The speech processing method according to claim 1 or 2, characterized in that, after obtaining the classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the method further comprises:
determining, according to each piece of classification information, the framed audio signals whose classification is non-voice.
4. The speech processing method according to claim 3, characterized in that the non-voice includes paging tones, ring-back tones (CRBT) and customer-service voice prompts.
5. A speech processing apparatus, characterized by comprising:
a pre-processing module for pre-processing a to-be-processed voice signal to obtain a plurality of framed audio signals;
an extraction module for performing feature extraction on each framed audio signal to obtain MFCC acoustic feature parameters in one-to-one correspondence with the framed audio signals;
a classification module for processing each MFCC acoustic feature parameter with a pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the classification information including voice and non-voice.
6. The speech processing apparatus according to claim 5, characterized in that the pre-processing module is specifically configured to perform pre-emphasis, framing and Hamming windowing on the to-be-processed voice signal to obtain the plurality of framed audio signals.
7. The speech processing apparatus according to claim 5 or 6, characterized by further comprising:
a screening module for determining, according to each piece of classification information, the framed audio signals whose classification is non-voice.
8. A speech processing device, characterized by comprising:
a memory for storing a computer program;
a processor for implementing the steps of the speech processing method according to any one of claims 1 to 4 when executing the computer program.
9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the speech processing method according to any one of claims 1 to 4 are implemented.
CN201811124680.2A 2018-09-26 2018-09-26 Speech processing method, apparatus, system and computer-readable storage medium Pending CN109065075A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811124680.2A CN109065075A (en) 2018-09-26 2018-09-26 Speech processing method, apparatus, system and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811124680.2A CN109065075A (en) 2018-09-26 2018-09-26 Speech processing method, apparatus, system and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN109065075A true CN109065075A (en) 2018-12-21

Family

ID=64765930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811124680.2A Pending CN109065075A (en) Speech processing method, apparatus, system and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN109065075A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030233233A1 (en) * 2002-06-13 2003-12-18 Industrial Technology Research Institute Speech recognition involving a neural network
CN105632501A (en) * 2015-12-30 2016-06-01 中国科学院自动化研究所 Deep-learning-technology-based automatic accent classification method and apparatus
CN105761720A (en) * 2016-04-19 2016-07-13 北京地平线机器人技术研发有限公司 Interaction system based on voice attribute classification, and method thereof
CN105788592A (en) * 2016-04-28 2016-07-20 乐视控股(北京)有限公司 Audio classification method and apparatus thereof
CN106328121A (en) * 2016-08-30 2017-01-11 南京理工大学 Chinese traditional musical instrument classification method based on depth confidence network
CN106710599A (en) * 2016-12-02 2017-05-24 深圳撒哈拉数据科技有限公司 Particular sound source detection method and particular sound source detection system based on deep neural network
CN108257614A (en) * 2016-12-29 2018-07-06 北京酷我科技有限公司 The method and its system of audio data mark
CN107452371A (en) * 2017-05-27 2017-12-08 北京字节跳动网络技术有限公司 A kind of construction method and device of Classification of Speech model
CN108182949A (en) * 2017-12-11 2018-06-19 华南理工大学 A kind of highway anomalous audio event category method based on depth conversion feature

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
周学广 et al. (Zhou Xueguang et al.), 《信息内容安全》 [Information Content Security], 武汉大学出版社 (Wuhan University Press), 30 November 2012 *
宋知用 (Song Zhiyong), 《MATLAB语音信号分析与合成 第2版》 [MATLAB Speech Signal Analysis and Synthesis, 2nd Edition], 北京航空航天大学出版社 (Beihang University Press), 31 January 2018 *
韩志艳 (Han Zhiyan), 《语音识别及语音可视化技术研究》 [Research on Speech Recognition and Speech Visualization Technology], 31 January 2017 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109949824A (en) * 2019-01-24 2019-06-28 江南大学 City sound event classification method based on N-DenseNet and higher-dimension mfcc feature
CN110049395A (en) * 2019-04-25 2019-07-23 维沃移动通信有限公司 Headset control method and ear speaker device
CN110049395B (en) * 2019-04-25 2020-06-05 维沃移动通信有限公司 Earphone control method and earphone device
CN110390952A (en) * 2019-06-21 2019-10-29 江南大学 City sound event classification method based on bicharacteristic 2-DenseNet parallel connection
CN110390952B (en) * 2019-06-21 2021-10-22 江南大学 City sound event classification method based on dual-feature 2-DenseNet parallel connection
CN110473566A (en) * 2019-07-25 2019-11-19 深圳壹账通智能科技有限公司 Audio separation method, device, electronic equipment and computer readable storage medium
CN111696580A (en) * 2020-04-22 2020-09-22 广州多益网络股份有限公司 Voice detection method and device, electronic equipment and storage medium
CN112087726A (en) * 2020-09-11 2020-12-15 携程旅游网络技术(上海)有限公司 Method and system for identifying polyphonic ringtone, electronic equipment and storage medium
CN112397073A (en) * 2020-11-04 2021-02-23 北京三快在线科技有限公司 Audio data processing method and device
CN112397073B (en) * 2020-11-04 2023-11-21 北京三快在线科技有限公司 Audio data processing method and device
CN112382310A (en) * 2020-11-12 2021-02-19 北京猿力未来科技有限公司 Human voice audio recording method and device
WO2022100692A1 (en) * 2020-11-12 2022-05-19 北京猿力未来科技有限公司 Human voice audio recording method and apparatus
CN112562656A (en) * 2020-12-16 2021-03-26 咪咕文化科技有限公司 Signal classification method, device, equipment and storage medium
CN113314123A (en) * 2021-04-12 2021-08-27 科大讯飞股份有限公司 Voice processing method, electronic equipment and storage device
CN113314123B (en) * 2021-04-12 2024-05-31 中国科学技术大学 Voice processing method, electronic equipment and storage device
CN113744758A (en) * 2021-09-16 2021-12-03 江南大学 Sound event detection method based on 2-DenseGRUNet model
CN113744758B (en) * 2021-09-16 2023-12-01 江南大学 Sound event detection method based on 2-DenseGRUNet model
CN115065912A (en) * 2022-06-22 2022-09-16 广州市迪声音响有限公司 Feedback inhibition device for screening sound box energy based on voiceprint screen technology

Similar Documents

Publication Publication Date Title
CN109065075A (en) Speech processing method, apparatus, system and computer-readable storage medium
WO2019227580A1 (en) Voice recognition method, apparatus, computer device, and storage medium
CN108900725B (en) Voiceprint recognition method and device, terminal equipment and storage medium
CN108922538B (en) Conference information recording method, conference information recording device, computer equipment and storage medium
CN110428810B (en) Voice wake-up recognition method and device and electronic equipment
US8731936B2 (en) Energy-efficient unobtrusive identification of a speaker
WO2019037382A1 (en) Emotion recognition-based voice quality inspection method and device, equipment and storage medium
US8396704B2 (en) Producing time uniform feature vectors
US8005676B2 (en) Speech analysis using statistical learning
CN105679310A (en) Method and system for speech recognition
CN109256150A (en) Speech emotion recognition system and method based on machine learning
CN108010513B (en) Voice processing method and device
CN111145763A (en) GRU-based voice recognition method and system in audio
CN108597505A (en) Audio recognition method, device and terminal device
US20210118464A1 (en) Method and apparatus for emotion recognition from speech
CN110798578A (en) Incoming call transaction management method and device and related equipment
CN109215634A (en) A kind of method and its system of more word voice control on-off systems
WO2019210556A1 (en) Call reservation method, agent leaving processing method and apparatus, device, and medium
CN109994129A (en) Speech processing system, method and apparatus
CN110556114B (en) Speaker identification method and device based on attention mechanism
CN108053834A (en) audio data processing method, device, terminal and system
JP2021078012A (en) Answering machine determination device, method and program
CN117116251A (en) Repayment probability assessment method and device based on collection-accelerating record
CN115273855A (en) Call volume adjusting method and related equipment
CN108735234A (en) A kind of device monitoring health status using voice messaging

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20181221)