CN109065075A - Speech processing method, apparatus, system and computer-readable storage medium - Google Patents
- Publication number: CN109065075A
- Application number: CN201811124680.2A
- Authority: CN (China)
- Legal status: Pending (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G10L25/84: Detection of presence or absence of voice signals for discriminating voice from noise
- G10L25/24: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
- G10L25/30: Speech or voice analysis techniques characterised by the analysis technique, using neural networks
Abstract
The invention discloses a speech processing method, apparatus, device and computer-readable storage medium. The method comprises: preprocessing a speech signal to be processed to obtain multiple audio frames; performing feature extraction on each audio frame to obtain MFCC acoustic feature parameters in one-to-one correspondence with the frames; and classifying each set of MFCC acoustic feature parameters with a pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC parameters, the classification information being either speech or non-speech. The application can automatically classify the non-speech content of the speech signal to be processed, which not only reduces the workload of operators but also greatly improves working efficiency and classification accuracy.
Description
Technical field
Embodiments of the present invention relate to the field of speech processing technology, and in particular to a speech processing method, apparatus, system and computer-readable storage medium.
Background art
Voice identification (forensic speaker identification) is commonly used in police and security work: a recording of the perpetrator speaking and a recording of the suspect speaking are each converted by a sonagraph (voiceprint instrument) into a spectrogram (i.e. a voiceprint), and the speech features reflected in the spectrograms, such as pitch, intensity and timing, are compared to determine whether the suspect is the person who spoke at the time of the crime.

To ensure the accuracy of voice identification, the speech signal must be processed before identification: the non-speech content in the signal is identified so as to reduce its influence on the voice identification.

In the prior art, non-speech content in a speech signal is classified and identified by manual calibration, for example by inspecting the spectrogram of the speech signal or by listening to it. The workload is large, the process is error-prone, and working efficiency is low.

In view of this, providing a speech processing method, apparatus, system and computer-readable storage medium that solve the above technical problems has become a problem to be solved by those skilled in the art.
Summary of the invention
The purpose of embodiments of the present invention is to provide a speech processing method, apparatus, system and computer-readable storage medium that, in use, automatically classify the non-speech content of a speech signal to be processed, thereby not only reducing the workload of operators but also greatly improving working efficiency and classification accuracy.
To solve the above technical problems, an embodiment of the invention provides a speech processing method, comprising:

preprocessing a speech signal to be processed to obtain multiple audio frames;

performing feature extraction on each audio frame to obtain Mel-frequency cepstral coefficient (MFCC) acoustic feature parameters in one-to-one correspondence with the frames;

classifying each set of MFCC acoustic feature parameters with a pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the classification information being either speech or non-speech.
Optionally, preprocessing the speech signal to be processed to obtain multiple audio frames comprises: applying pre-emphasis, framing and Hamming windowing to the speech signal to be processed to obtain the multiple audio frames.

Optionally, after obtaining the classification information corresponding to each set of MFCC acoustic feature parameters, the method further comprises: determining, from the classification information, each audio frame whose class is non-speech.

Optionally, the non-speech class includes paging tones, ring-back tones and customer-service prompts.
Correspondingly, an embodiment of the invention provides a speech processing apparatus, comprising:

a preprocessing module, configured to preprocess a speech signal to be processed to obtain multiple audio frames;

an extraction module, configured to perform feature extraction on each audio frame to obtain MFCC acoustic feature parameters in one-to-one correspondence with the frames;

a classification module, configured to classify each set of MFCC acoustic feature parameters with a pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the classification information being either speech or non-speech.

Optionally, the preprocessing module is specifically configured to apply pre-emphasis, framing and Hamming windowing to the speech signal to be processed to obtain the multiple audio frames.

Optionally, the apparatus further comprises: a screening module, configured to determine, from the classification information, each audio frame whose class is non-speech.
An embodiment of the invention also provides a speech processing device, comprising: a memory for storing a computer program; and a processor that implements the steps of the speech processing method described above when executing the computer program.

An embodiment of the invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the speech processing method described above.
Embodiments of the invention thus provide a speech processing method, apparatus, device and computer-readable storage medium, comprising: preprocessing a speech signal to be processed to obtain multiple audio frames; performing feature extraction on each frame to obtain MFCC acoustic feature parameters in one-to-one correspondence with the frames; and classifying each set of MFCC parameters with a pre-established DNN classification model to obtain per-frame classification information, either speech or non-speech.

It can be seen that, by preprocessing the speech signal, extracting MFCC acoustic feature parameters from each of the resulting audio frames, and classifying each set of MFCC parameters with the pre-established DNN classification model, the application obtains classification information corresponding to every audio frame and thereby classifies the non-speech content of the speech signal to be processed. The application performs this non-speech classification automatically, which not only reduces the workload of operators but also greatly improves working efficiency and classification accuracy.
Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the prior art and the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow diagram of a speech processing method provided by an embodiment of the present invention;

Fig. 2 is a structural diagram of a speech processing apparatus provided by an embodiment of the present invention;

Fig. 3 is a structural diagram of a speech processing device provided by an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the invention provide a speech processing method, apparatus, system and computer-readable storage medium that, in use, automatically classify the non-speech content of a speech signal to be processed, not only reducing the workload of operators but also greatly improving working efficiency and classification accuracy.

To make the objects, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described below completely and clearly with reference to the accompanying drawings. The described embodiments are only a part of the embodiments of the invention, not all of them; all other embodiments obtained by those of ordinary skill in the art without creative effort, based on the embodiments of the invention, fall within the protection scope of the invention.
Please refer to Fig. 1, a flow diagram of a speech processing method provided by an embodiment of the present invention. The method comprises:

S110: preprocessing a speech signal to be processed to obtain multiple audio frames.

It should be noted that the application is applicable where voice identification must be performed on call recordings, which first requires non-speech classification of the recording. The speech signal to be processed can be obtained from a pre-collected recording file captured by any recording terminal: the file type is determined from the file header, and the byte length, sample rate, channel count and other header information are read, yielding the headerless PCM recording signal, which serves as the speech signal to be processed. For example, police and judicial departments can obtain a suspect's recording file, extract the corresponding PCM recording signal from it to obtain the speech signal to be processed, and then apply the method provided below to identify its non-speech content.
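The header-reading step described above can be sketched with Python's standard `wave` module (assuming the recording file is a WAV container; the patent does not fix a file format, and `read_pcm` is a name introduced here for illustration):

```python
import wave

def read_pcm(path):
    """Read a WAV recording file, returning the headerless PCM payload
    plus the sample rate and channel count taken from the file header."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()              # sample rate from the header
        channels = wf.getnchannels()          # channel count from the header
        pcm = wf.readframes(wf.getnframes())  # raw PCM bytes, header removed
    return pcm, rate, channels
```

The returned PCM bytes are what the embodiment treats as the speech signal to be processed.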
The preprocessing of the speech signal to be processed in the application may specifically be: applying pre-emphasis, framing and Hamming windowing to the signal to obtain multiple audio frames.

Specifically, pre-emphasis is first applied to the speech signal to be processed, yielding a first speech signal; the first speech signal is then split into frames, yielding multiple second speech signals; and a Hamming window is then applied to each second speech signal, producing the multiple audio frames.

Pre-emphasis is a signal processing technique that compensates the high-frequency components of the input signal at the transmitting end. As the signal rate increases, the signal is heavily attenuated during transmission; to obtain a good waveform at the receiving end, the attenuated signal must be compensated. The idea of pre-emphasis is to boost the high-frequency components of the signal at the start of the transmission line, compensating their excessive attenuation in transit; since pre-emphasis does not amplify the noise, it can effectively improve the output signal-to-noise ratio.
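Under common parameter choices (a pre-emphasis coefficient of 0.97, 25 ms frames with a 10 ms hop; none of these values are specified by the patent), the three preprocessing steps can be sketched as:

```python
import numpy as np

def preprocess(signal, rate, alpha=0.97, frame_ms=25, hop_ms=10):
    """Pre-emphasis, framing and Hamming windowing of a 1-D speech signal.
    Returns an array of shape (n_frames, frame_len)."""
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1] boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    frame_len = int(rate * frame_ms / 1000)
    hop = int(rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    # Slice into overlapping frames, then taper each with a Hamming window.
    frames = np.stack([emphasized[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * np.hamming(frame_len)
```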
S120: performing feature extraction on each audio frame to obtain MFCC acoustic feature parameters in one-to-one correspondence with the frames.

Specifically, because MFCC (Mel-frequency cepstral coefficient) acoustic feature parameters can characterize the entire speech signal, the application extracts, for each audio frame, the MFCC acoustic feature parameters in one-to-one correspondence with it; this reduces the data volume of subsequent processing and thereby improves processing efficiency.
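A minimal, self-contained sketch of the MFCC extraction step (26 mel filters, 13 cepstral coefficients and a 512-point FFT are conventional defaults assumed here, not values from the patent):

```python
import numpy as np

def _dct2_ortho(x):
    """Orthonormal DCT-II along the last axis, the transform conventionally
    applied to log mel energies in the final MFCC step."""
    n = x.shape[-1]
    k = np.arange(n)
    basis = np.cos(np.pi * np.outer(k, k + 0.5) / n)  # basis[i, j] = cos(pi*i*(j+0.5)/n)
    y = x @ basis.T * np.sqrt(2.0 / n)
    y[..., 0] /= np.sqrt(2.0)
    return y

def mfcc(frames, rate, n_filters=26, n_ceps=13, n_fft=512):
    """Per-frame MFCCs: power spectrum -> mel filter bank -> log -> DCT."""
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filter bank spanning 0 Hz to the Nyquist frequency.
    mel_max = 2595.0 * np.log10(1.0 + (rate / 2.0) / 700.0)
    hz_pts = 700.0 * (10.0 ** (np.linspace(0.0, mel_max, n_filters + 2) / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_pts / rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:ctr] = (np.arange(lo, ctr) - lo) / max(ctr - lo, 1)
        fbank[m - 1, ctr:hi] = (hi - np.arange(ctr, hi)) / max(hi - ctr, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)  # log mel filter-bank energies
    return _dct2_ortho(log_energy)[:, :n_ceps]    # keep the first n_ceps coefficients
```

Applied to the windowed frames from S110, this yields one 13-dimensional feature vector per frame, in one-to-one correspondence as the embodiment requires.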
S130: classifying each set of MFCC acoustic feature parameters with the pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC parameters, the classification information being either speech or non-speech.

It should be noted that, in practical applications, the DNN classification model is established in advance from a large amount of historical data, for example according to an existing DNN modeling method. When identifying the non-speech content of the speech signal to be processed, the MFCC acoustic feature parameters extracted above in one-to-one correspondence with the audio frames are input to the DNN classification model, which outputs the classification information corresponding to each set of MFCC acoustic feature parameters, and therefore to each audio frame. From the classification information corresponding to each audio frame, an operator can tell whether that frame is speech or non-speech.
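The patent does not fix a DNN architecture, deferring to existing modeling methods; purely as an illustration, a per-frame classifier over 13-dimensional MFCC vectors might look like the following (the layer sizes, ReLU activation and two-class softmax output are all assumptions, and the random weights stand in for weights learned from labeled historical data):

```python
import numpy as np

class FrameClassifier:
    """Toy two-layer feed-forward network mapping one MFCC vector per frame
    to a speech / non-speech label. Weights are random here; in the patent
    they would be learned in advance from labeled historical recordings."""

    def __init__(self, n_in=13, n_hidden=64, n_out=2, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((n_in, n_hidden)) * 0.1
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.standard_normal((n_hidden, n_out)) * 0.1
        self.b2 = np.zeros(n_out)

    def predict(self, feats):
        """feats: (n_frames, n_in) MFCC matrix -> (n_frames,) labels."""
        h = np.maximum(feats @ self.w1 + self.b1, 0)   # ReLU hidden layer
        logits = h @ self.w2 + self.b2
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs = e / e.sum(axis=1, keepdims=True)       # softmax over 2 classes
        return probs.argmax(axis=1)                    # 0 = speech, 1 = non-speech
```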
Of course, to further improve working efficiency, after obtaining the classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the method in the application may further comprise: determining, from the classification information, each audio frame whose class is non-speech. The operator can then directly determine, from the non-speech audio frames, the time intervals occupied by non-speech, and calibrate the audio frames whose class is speech for use in subsequent voice identification.

The non-speech class in the application may include paging tones, ring-back tones and customer-service prompts; the specific procedure for establishing the DNN classification model is mature prior art and is not described further here.
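Converting the per-frame speech/non-speech labels into the time intervals mentioned above can be sketched as follows (the 10 ms hop and 25 ms frame length are assumed values, and `nonspeech_intervals` is a helper introduced here for illustration):

```python
def nonspeech_intervals(labels, hop_s=0.010, frame_s=0.025):
    """Merge runs of consecutive frames labeled non-speech (label == 1)
    into (start, end) time intervals in seconds."""
    intervals, start = [], None
    for i, lab in enumerate(labels):
        if lab == 1 and start is None:
            start = i * hop_s                                   # run opens
        elif lab != 1 and start is not None:
            intervals.append((start, (i - 1) * hop_s + frame_s))  # run closes
            start = None
    if start is not None:                                       # run reaches the end
        intervals.append((start, (len(labels) - 1) * hop_s + frame_s))
    return intervals
```

For example, the label sequence [0, 1, 1, 0, 1] yields two non-speech intervals, one covering frames 1 and 2 and one covering the final frame.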
It can thus be seen that, by preprocessing the speech signal to be processed, extracting MFCC acoustic feature parameters from each of the resulting audio frames, and classifying each set of MFCC parameters with the pre-established DNN classification model, the application obtains classification information corresponding to every audio frame and thereby classifies the non-speech content of the speech signal. The application performs this non-speech classification automatically, which not only reduces the workload of operators but also greatly improves working efficiency and classification accuracy.
On the basis of the above embodiments, an embodiment of the invention provides a corresponding speech processing apparatus; please refer to Fig. 2. The apparatus comprises:

a preprocessing module 21, configured to preprocess a speech signal to be processed to obtain multiple audio frames;

an extraction module 22, configured to perform feature extraction on each audio frame to obtain MFCC acoustic feature parameters in one-to-one correspondence with the frames;

a classification module 23, configured to classify each set of MFCC acoustic feature parameters with the pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC parameters, the classification information being either speech or non-speech.

On the basis of the previous embodiment, the preprocessing module 21 may specifically be configured to apply pre-emphasis, framing and Hamming windowing to the speech signal to be processed to obtain the multiple audio frames.

On the basis of any of the above embodiments, the apparatus further comprises: a screening module, configured to determine, from the classification information, each audio frame whose class is non-speech.

It should be noted that the speech processing apparatus provided in this embodiment has the same beneficial effects as the speech processing method provided in the above embodiments; for the specifics of the method involved in this embodiment, please refer to the above embodiments, which are not repeated here.
On the basis of the above embodiments, an embodiment of the invention also provides a speech processing device; please refer to Fig. 3. The device comprises:

a memory 31, for storing a computer program;

a processor 32, which implements the steps of the above speech processing method when executing the computer program.

For example, the processor 32 in this embodiment preprocesses a speech signal to be processed to obtain multiple audio frames; performs feature extraction on each audio frame to obtain MFCC acoustic feature parameters in one-to-one correspondence with the frames; and classifies each set of MFCC parameters with the pre-established DNN classification model to obtain classification information, either speech or non-speech, in one-to-one correspondence with the MFCC parameters.

On the basis of the above embodiments, an embodiment of the invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above speech processing method.

The computer-readable storage medium may include various media capable of storing program code, such as a USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk or optical disk.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be referred to each other. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively simple; for relevant details, refer to the description of the method.

It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between such entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes it.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be realized in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (9)

1. A speech processing method, characterized by comprising:
preprocessing a speech signal to be processed to obtain multiple audio frames;
performing feature extraction on each audio frame to obtain Mel-frequency cepstral coefficient (MFCC) acoustic feature parameters in one-to-one correspondence with the frames;
classifying each set of MFCC acoustic feature parameters with a pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the classification information being either speech or non-speech.

2. The speech processing method according to claim 1, characterized in that preprocessing the speech signal to be processed to obtain multiple audio frames comprises: applying pre-emphasis, framing and Hamming windowing to the speech signal to be processed to obtain the multiple audio frames.

3. The speech processing method according to claim 1 or 2, characterized by further comprising, after obtaining the classification information corresponding to each set of MFCC acoustic feature parameters: determining, from the classification information, each audio frame whose class is non-speech.

4. The speech processing method according to claim 3, characterized in that the non-speech class includes paging tones, ring-back tones and customer-service prompts.
5. A speech processing apparatus, characterized by comprising:
a preprocessing module, configured to preprocess a speech signal to be processed to obtain multiple audio frames;
an extraction module, configured to perform feature extraction on each audio frame to obtain MFCC acoustic feature parameters in one-to-one correspondence with the frames;
a classification module, configured to classify each set of MFCC acoustic feature parameters with a pre-established DNN classification model to obtain classification information in one-to-one correspondence with the MFCC acoustic feature parameters, the classification information being either speech or non-speech.

6. The speech processing apparatus according to claim 5, characterized in that the preprocessing module is specifically configured to apply pre-emphasis, framing and Hamming windowing to the speech signal to be processed to obtain the multiple audio frames.

7. The speech processing apparatus according to claim 5 or 6, characterized by further comprising: a screening module, configured to determine, from the classification information, each audio frame whose class is non-speech.

8. A speech processing device, characterized by comprising: a memory, for storing a computer program; and a processor that implements the steps of the speech processing method according to any one of claims 1 to 4 when executing the computer program.

9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the speech processing method according to any one of claims 1 to 4.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811124680.2A | 2018-09-26 | 2018-09-26 | Speech processing method, apparatus, system and computer-readable storage medium |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN109065075A | 2018-12-21 |

Family ID: 64765930
Cited By (12)

| Publication | Priority date | Publication date | Title |
|---|---|---|---|
| CN109949824A | 2019-01-24 | 2019-06-28 | Urban sound event classification method based on N-DenseNet and high-dimensional MFCC features |
| CN110049395A | 2019-04-25 | 2019-07-23 | Headset control method and earphone device |
| CN110390952A | 2019-06-21 | 2019-10-29 | Urban sound event classification method based on dual-feature parallel 2-DenseNet |
| CN110473566A | 2019-07-25 | 2019-11-19 | Audio separation method, apparatus, electronic device and computer-readable storage medium |
| CN111696580A | 2020-04-22 | 2020-09-22 | Voice detection method and apparatus, electronic device and storage medium |
| CN112087726A | 2020-09-11 | 2020-12-15 | Method and system for identifying ring-back tones, electronic device and storage medium |
| CN112397073A | 2020-11-04 | 2021-02-23 | Audio data processing method and apparatus |
| CN112382310A | 2020-11-12 | 2021-02-19 | Human voice audio recording method and apparatus |
| CN112562656A | 2020-12-16 | 2021-03-26 | Signal classification method, apparatus, device and storage medium |
| CN113314123A | 2021-04-12 | 2021-08-27 | Speech processing method, electronic device and storage apparatus |
| CN113744758A | 2021-09-16 | 2021-12-03 | Sound event detection method based on a 2-DenseGRUNet model |
| CN115065912A | 2022-06-22 | 2022-09-16 | Feedback suppression device for loudspeaker systems based on voiceprint screening technology |
Patent Citations (9)

| Publication | Priority date | Publication date | Title |
|---|---|---|---|
| US2003/0233233A1 | 2002-06-13 | 2003-12-18 | Speech recognition involving a neural network |
| CN105632501A | 2015-12-30 | 2016-06-01 | Deep-learning-based automatic accent classification method and apparatus |
| CN105761720A | 2016-04-19 | 2016-07-13 | Interaction system and method based on voice attribute classification |
| CN105788592A | 2016-04-28 | 2016-07-20 | Audio classification method and apparatus |
| CN106328121A | 2016-08-30 | 2017-01-11 | Traditional Chinese instrument classification method based on deep belief networks |
| CN106710599A | 2016-12-02 | 2017-05-24 | Specific sound source detection method and system based on deep neural networks |
| CN107452371A | 2017-05-27 | 2017-12-08 | Method and apparatus for constructing a speech classification model |
| CN108182949A | 2017-12-11 | 2018-06-19 | Highway anomalous audio event classification method based on deep transformation features |
| CN108257614A | 2016-12-29 | 2018-07-06 | Audio data annotation method and system |
Application Events

- 2018-09-26: Chinese application CN201811124680.2A filed; published as CN109065075A; legal status: Pending
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030233233A1 (en) * | 2002-06-13 | 2003-12-18 | Industrial Technology Research Institute | Speech recognition involving a neural network |
CN105632501A (en) * | 2015-12-30 | 2016-06-01 | 中国科学院自动化研究所 | Deep-learning-technology-based automatic accent classification method and apparatus |
CN105761720A (en) * | 2016-04-19 | 2016-07-13 | 北京地平线机器人技术研发有限公司 | Interaction system based on voice attribute classification, and method thereof |
CN105788592A (en) * | 2016-04-28 | 2016-07-20 | 乐视控股(北京)有限公司 | Audio classification method and apparatus thereof |
CN106328121A (en) * | 2016-08-30 | 2017-01-11 | 南京理工大学 | Traditional Chinese musical instrument classification method based on deep belief network |
CN106710599A (en) * | 2016-12-02 | 2017-05-24 | 深圳撒哈拉数据科技有限公司 | Specific sound source detection method and system based on deep neural network |
CN108257614A (en) * | 2016-12-29 | 2018-07-06 | 北京酷我科技有限公司 | Audio data annotation method and system |
CN107452371A (en) * | 2017-05-27 | 2017-12-08 | 北京字节跳动网络技术有限公司 | Method and device for constructing a speech classification model |
CN108182949A (en) * | 2017-12-11 | 2018-06-19 | 华南理工大学 | Highway abnormal audio event classification method based on deep transform features |
Non-Patent Citations (3)
Title |
---|
周学广 et al.: "Information Content Security" (《信息内容安全》), Wuhan University Press, 30 November 2012 *
宋知用: "MATLAB Speech Signal Analysis and Synthesis, 2nd ed." (《MATLAB语音信号分析与合成 第2版》), Beihang University Press, 31 January 2018 *
韩志艳: "Research on Speech Recognition and Speech Visualization Technology" (《语音识别及语音可视化技术研究》), 31 January 2017 *
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109949824A (en) * | 2019-01-24 | 2019-06-28 | 江南大学 | Urban sound event classification method based on N-DenseNet and high-dimensional MFCC features |
CN110049395A (en) * | 2019-04-25 | 2019-07-23 | 维沃移动通信有限公司 | Headset control method and ear speaker device |
CN110049395B (en) * | 2019-04-25 | 2020-06-05 | 维沃移动通信有限公司 | Earphone control method and earphone device |
CN110390952A (en) * | 2019-06-21 | 2019-10-29 | 江南大学 | Urban sound event classification method based on dual-feature 2-DenseNet parallel connection |
CN110390952B (en) * | 2019-06-21 | 2021-10-22 | 江南大学 | Urban sound event classification method based on dual-feature 2-DenseNet parallel connection |
CN110473566A (en) * | 2019-07-25 | 2019-11-19 | 深圳壹账通智能科技有限公司 | Audio separation method, device, electronic equipment and computer readable storage medium |
CN111696580A (en) * | 2020-04-22 | 2020-09-22 | 广州多益网络股份有限公司 | Voice detection method and device, electronic equipment and storage medium |
CN112087726A (en) * | 2020-09-11 | 2020-12-15 | 携程旅游网络技术(上海)有限公司 | Method and system for identifying polyphonic ringtone, electronic equipment and storage medium |
CN112397073A (en) * | 2020-11-04 | 2021-02-23 | 北京三快在线科技有限公司 | Audio data processing method and device |
CN112397073B (en) * | 2020-11-04 | 2023-11-21 | 北京三快在线科技有限公司 | Audio data processing method and device |
CN112382310A (en) * | 2020-11-12 | 2021-02-19 | 北京猿力未来科技有限公司 | Human voice audio recording method and device |
WO2022100692A1 (en) * | 2020-11-12 | 2022-05-19 | 北京猿力未来科技有限公司 | Human voice audio recording method and apparatus |
CN112562656A (en) * | 2020-12-16 | 2021-03-26 | 咪咕文化科技有限公司 | Signal classification method, device, equipment and storage medium |
CN113314123A (en) * | 2021-04-12 | 2021-08-27 | 科大讯飞股份有限公司 | Voice processing method, electronic equipment and storage device |
CN113314123B (en) * | 2021-04-12 | 2024-05-31 | 中国科学技术大学 | Voice processing method, electronic equipment and storage device |
CN113744758A (en) * | 2021-09-16 | 2021-12-03 | 江南大学 | Sound event detection method based on 2-DenseGRUNet model |
CN113744758B (en) * | 2021-09-16 | 2023-12-01 | 江南大学 | Sound event detection method based on 2-DenseGRUNet model |
CN115065912A (en) * | 2022-06-22 | 2022-09-16 | 广州市迪声音响有限公司 | Feedback suppression device for speaker energy based on voiceprint screening technology |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109065075A (en) | Speech processing method, device, system and computer-readable storage medium | |
WO2019227580A1 (en) | Voice recognition method, apparatus, computer device, and storage medium | |
CN108900725B (en) | Voiceprint recognition method and device, terminal equipment and storage medium | |
CN108922538B (en) | Conference information recording method, conference information recording device, computer equipment and storage medium | |
CN110428810B (en) | Voice wake-up recognition method and device and electronic equipment | |
US8731936B2 (en) | Energy-efficient unobtrusive identification of a speaker | |
WO2019037382A1 (en) | Emotion recognition-based voice quality inspection method and device, equipment and storage medium | |
US8396704B2 (en) | Producing time uniform feature vectors | |
US8005676B2 (en) | Speech analysis using statistical learning | |
CN105679310A (en) | Method and system for speech recognition | |
CN109256150A (en) | Speech emotion recognition system and method based on machine learning | |
CN108010513B (en) | Voice processing method and device | |
CN111145763A (en) | GRU-based voice recognition method and system in audio | |
CN108597505A (en) | Audio recognition method, device and terminal device | |
US20210118464A1 (en) | Method and apparatus for emotion recognition from speech | |
CN110798578A (en) | Incoming call transaction management method and device and related equipment | |
CN109215634A (en) | Multi-word voice control switch method and system |
WO2019210556A1 (en) | Call reservation method, agent leaving processing method and apparatus, device, and medium | |
CN109994129A (en) | Speech processing system, method and apparatus | |
CN110556114B (en) | Speaker identification method and device based on attention mechanism | |
CN108053834A (en) | audio data processing method, device, terminal and system | |
JP2021078012A (en) | Answering machine determination device, method and program | |
CN117116251A (en) | Repayment probability assessment method and device based on collection-accelerating record | |
CN115273855A (en) | Call volume adjusting method and related equipment | |
CN108735234A (en) | Device for monitoring health status using voice information |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181221 |