CN107808659A - Intelligent sound signal type recognition system device - Google Patents
Intelligent sound signal type recognition system device
- Publication number
- CN107808659A (application CN201711253194.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/285—Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
Abstract
An intelligent sound signal type recognition system device comprises a housing 10 provided with a cavity. Arranged within the housing 10 are a voice acquisition module 1, a voice recognition module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a RAM card 32, a loudspeaker 35, and a power supply 9. The voice acquisition module 1 comprises a microphone 11, a wireless microphone 12, and a fixed recorder 13; the voice recognition module 2 comprises a voice input unit 20, a voice preprocessing unit 21, a voice feature extraction unit 22, and a feature matching and classification unit 23. Voice signals are collected by the voice acquisition module 1, the collected signals are processed by the voice recognition module 2, data signals are stored in the memory 33, and the interactive operating process and the output of results are visualized on the display screen 8, making speech signal recognition more convenient.
Description
Technical field
The invention discloses an intelligent sound signal type recognition system device belonging to the technical field of smart electronic products, and specifically an intelligent voice signal pattern recognition system device that integrates a voice acquisition module, a voice recognition module, a control system, and a loudspeaker.
Background technology
In daily life we are surrounded by sound signals of many kinds: the speech of people in conversation, the sound of music playing, vehicle horns, the noise of running machinery, and so on. Sound signals fill almost the entire living environment, and it is often desirable to determine accurately which object in a group of sound sources produced a given sound. For common sounds, people can usually tell which object produced them; but when several objects sound at once, especially several similar objects, or when the environment is noisy, it is difficult to tell which sound came from which object. For example, in a recording of a multi-person argument with many speakers, it is hard to tell on playback which utterance belongs to which speaker. A device capable of identifying voices is therefore often needed.
Before the present invention, some voice recognition products existed on the market, such as voice input software, but these mostly recognize the words or letters in speech, or perform simple one-to-one matching of a single voice. Others allow a user to complete simple tasks, such as placing a call or searching, by speaking to a mobile phone, which recognizes the semantics of the speech. None of these, however, can distinguish voice characteristics, i.e. accurately identify which person or object produced similar speech or identical words, and they are therefore inconvenient for flexible use.
The content of the invention
To overcome the above technical disadvantages, the object of the present invention is to provide an intelligent sound signal type recognition system device that can conveniently record voice signals, extract their characteristic parameters, and perform intelligent pattern recognition, classification, and extraction on unknown voice signals against the stored signals.
To achieve the above object, the present invention adopts the following technical scheme. The device includes a housing 10 provided with a cavity. Arranged within the housing 10 are a voice acquisition module 1, a voice recognition module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a RAM card 32, a loudspeaker 35, and a power supply 9. The voice acquisition module 1 comprises a microphone 11, a wireless microphone 12, and a fixed recorder 13; the voice recognition module 2 comprises a voice input unit 20, a voice preprocessing unit 21, a voice feature extraction unit 22, and a feature matching and classification unit 23. Voice signals are collected by the voice acquisition module 1, the collected signals are processed by the voice recognition module 2, data signals are stored in the memory 33, and the interactive operating process and the output of results are visualized on the display screen 8. The loudspeaker 35 is arranged to give voice prompts for the operating steps and to announce recognition results; the network module 31 is arranged to connect the device to an internet cloud platform; the central processing unit 3 is arranged for program control and data computation of the whole system; the wireless signal transceiver 4 is arranged to receive and transmit the radio signals produced by the wireless microphone 12, a smartphone, and the network module 31 and to connect the device wirelessly to the internet; the RAM card 32 is arranged to read previously recorded external voice data into the device's database.
In the present invention, the voice input unit 20 provides two modes: a "voice enrollment mode" and a "voice test mode". Voice can be input through any of the microphone 11, the wireless microphone 12, and the fixed recorder 13 provided by the voice acquisition module 1, or through a smartphone. In "voice enrollment mode" the voice input unit 20 can enroll only one person or one object at a time; the enrolled voice is an audio signal of 5 to 30 seconds. The invention uses a multi-mode enrollment strategy: the enrolled voice may include a combination of normal speech, singing, and high-, middle-, and low-pitched voice. The display screen 8 shows the speech waveform and a progress bar in real time. After enrollment the data must be labeled; labeling is done manually. For example, if the voice of Zhang San has been collected, the note "voice of Zhang San" is entered in the dialog box shown on the display screen 8 and saved, and the enrolled voice is stored in the memory 33. In "voice test mode" the invention collects test voice through one or more of the microphone 11, the wireless microphone 12, the fixed recorder 13, and a smartphone together; test voice collection is in real time, with no limitation on the number of persons, objects, or duration.
In the present invention, the voice input unit 20 is connected with the voice acquisition module 1; the microphone 11 is connected to the voice acquisition module 1 by an audio cable, and the wireless microphone 12 is connected to the voice acquisition module 1 by radio signal.
In the present invention, a smartphone may also be used for voice signal input by pairing the phone with the voice acquisition module 1; pairing may be made via Bluetooth, infrared, WiFi, or scanning a QR code. This realizes voice enrollment with the phone acting as a wireless microphone, which is more convenient when capturing the voices of larger groups.
In the present invention, the voice preprocessing unit 21 converts the voice signal collected by the voice acquisition module 1 into an electric signal, i.e. converts the analog signal into a digital signal, and then performs conventional signal processing, including ambient background noise elimination, signal framing, filtering, pre-emphasis, windowing, and endpoint detection.
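The preprocessing chain described above (pre-emphasis, framing, windowing, endpoint detection) can be sketched as follows. The sampling rate, frame sizes, pre-emphasis coefficient, and the crude energy threshold are illustrative assumptions, not values from the patent.

```python
import numpy as np

def preprocess(signal, fs=16000, frame_ms=25, hop_ms=10, alpha=0.97):
    """Sketch of the conventional front end: pre-emphasis, framing,
    Hamming windowing, and a crude energy-based endpoint check."""
    # Pre-emphasis boosts high frequencies: y[t] = x[t] - alpha * x[t-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    frame_len = int(fs * frame_ms / 1000)   # e.g. 400 samples per frame
    hop = int(fs * hop_ms / 1000)           # e.g. 160-sample frame shift
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)

    window = np.hamming(frame_len)
    frames = np.stack([emphasized[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])

    # Endpoint detection (simplified): keep frames above an energy threshold.
    energy = (frames ** 2).sum(axis=1)
    voiced = energy > 0.1 * energy.mean()
    return frames[voiced]

# Example: a synthetic tone padded with silence at both ends.
fs = 16000
t = np.arange(fs) / fs
sig = np.concatenate([np.zeros(4000), np.sin(2 * np.pi * 220 * t), np.zeros(4000)])
frames = preprocess(sig, fs)
print(frames.shape)  # (number of voiced frames, 400)
```

In practice the noise elimination and filtering steps would use a proper estimate of the background spectrum rather than a fixed energy threshold; this sketch only shows the shape of the pipeline.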
In the present invention, the voice feature extraction unit 22 is arranged to extract from the original voice signal the main characteristic parameters reflecting the essence of the voice and to form a feature vector x_i = (x_i1, x_i2, …, x_ij, …, x_in)^T, where x_ij denotes the j-th speech feature value of the i-th object or person. The feature extraction method preferably uses Mel-frequency cepstral coefficients (MFCC); acoustic features may also be obtained by the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform method, and the like. The feature vectors obtained after extraction are automatically saved by the system into a pattern class database, all voice features of one object or person corresponding to one pattern class. After the voices of N persons or objects have been enrolled, N pattern classes are obtained; if each pattern class has n characteristic parameters, an n-dimensional feature space is formed, and the labeled feature signal set can be written as D = {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_N, y_N)}, where x_i ∈ χ = R^n denotes the speech feature signal of the i-th enrolled object or person and y_i ∈ Y = {1, 2, …, N} denotes the i-th person or object, N being the number of enrolled persons or objects. The labeled voice feature data form the pattern class database and are stored in the memory 33 of the invention.
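The MFCC extraction preferred above can be sketched as a power spectrum passed through a triangular mel filterbank, log-compressed, and decorrelated by a DCT. The filter and coefficient counts (26 and 13) are common defaults assumed here, and averaging the per-frame coefficients into a single vector per utterance is a simplification of how the feature vector x_i would actually be formed.

```python
import numpy as np

def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced equally on the mel scale."""
    mels = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc_features(frames, fs=16000, n_filters=26, n_ceps=13):
    """Per-frame MFCCs averaged into one fixed-length feature vector x_i."""
    n_fft = frames.shape[1]
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    fb = mel_filterbank(n_filters, n_fft, fs)
    logmel = np.log(power @ fb.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies (the cepstrum).
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    ceps = logmel @ basis.T
    return ceps.mean(axis=0)  # one n_ceps-dimensional vector per utterance

frames = np.random.randn(100, 400)  # stand-in for preprocessed frames
x_i = mfcc_features(frames)
print(x_i.shape)  # (13,)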
In the present invention, the feature matching and classification unit 23 is arranged as an intelligent multi-class classifier whose learning algorithm uses an improved neural network classification algorithm. The enrolled and labeled voice feature signal set serves as training data; the network model learns from the training data to obtain classification rules, completing the training of the classifier. The trained classifier is then used to classify and identify unknown test voice signals intelligently. After features are extracted from a test signal, the invention automatically performs feature matching: the characteristic parameters of the extracted test voice signal are matched in real time against the enrolled and labeled sample voice characteristic parameters in the memory 33, the similarity between the test voice signal and every enrolled sample voice signal is computed, and the test voice signal is assigned to the pattern class of the sample signal with the highest similarity. Finally the invention outputs the recognition result, an announcement such as "this is the voice of XXX". For example, if the invention has stored the voice feature signal of Zhang San, then when Zhang San speaks or sings to the device, it automatically computes that the test voice characteristic parameters are most similar to the enrolled and labeled voice signal of Zhang San and, through recognition, automatically outputs "this is the voice of Zhang San".
In the present invention, the multi-class classifier uses a multi-layer artificial neural network structure: one end of the network is defined as the input layer, the other end as the output layer, and the part between them as the hidden layers. The input layer receives external input signals and passes them to all neurons of the hidden layer; after the hidden layer computes, the result is passed to the output layer, which, after receiving the hidden-layer signals and computing, outputs the classification result, i.e. the recognition result. The preferred number of hidden layers is 1 to 200.
In the present invention, training of the improved artificial neural network classification algorithm comprises steps 1 to 7.
Step 1: network initialization. As voices are enrolled, the algorithm database is continually updated; when the voice signals of N objects have been enrolled, N pattern classes are formed and the sample space (X, Y) is obtained, the i-th sample being (X_i, Y_i), where X_i denotes the set of feature vectors extracted for the i-th object and Y_i denotes the label of the i-th object. From the system input-output sequence (X, Y), the number of input-layer nodes n, hidden-layer nodes l, and output-layer nodes m is determined: n is the number of feature values produced by feature extraction of the input signal, m is the number of stored speech pattern classes, and the reference value of l is l = √(n + m) + a, where a ranges over 0 to 10 and is determined automatically by the model. The connection weights ω_ij between the input-layer and hidden-layer neurons and ω_jk between the hidden-layer and output-layer neurons are initialized, the hidden-layer thresholds a and the output-layer thresholds b are initialized, and the learning rate η and the neuron excitation function are given.
Step 2: compute the hidden-layer output. From the input vector X, the connection weights ω_ij between the input-layer and hidden-layer neurons, and the hidden-layer thresholds a, compute the hidden-layer output H. The output of the j-th hidden node is H_j = f(Σ_{i=1..n} ω_ij·x_i − a_j), j = 1, 2, …, l, where l is the number of hidden nodes and f is the hidden-layer excitation function. Many excitation functions exist; the invention preferably uses f(x) = (1 + e^(−x))^(−1).
Step 3: compute the output-layer output. From the hidden-layer output H, the connection weights ω_jk between the hidden-layer and output-layer neurons, and the output-layer thresholds b, compute the output O. The output of the k-th output node is O_k = Σ_{j=1..l} H_j·ω_jk − b_k, k = 1, 2, …, m, where m is the number of output nodes, b_k is the threshold of the k-th output node, and H_j is the output value of the j-th hidden node.
Step 4: compute the prediction error. From the network output O and the desired output Y (the true values), compute the overall network prediction error e from the node errors e_k = Y_k − O_k, where e_k is the error at the k-th output node, k = 1, 2, …, m.
Step 5: update the weights. From the overall network prediction error e, update the connection weights ω_jk and ω_ij: ω_jk⁺ = ω_jk + η·H_j·E_k, j = 1, 2, …, l, k = 1, 2, …, m, where η is the learning rate and E_k denotes the sensitivity of the overall network error to output-layer node k; and ω_ij⁺ = ω_ij + η·H_j(1 − H_j)·x_i·Σ_{k=1..m} ω_jk·E_k, i = 1, 2, …, n, j = 1, 2, …, l.
Step 6: update the thresholds. From the overall network prediction error e, update the hidden-layer thresholds a and the output-layer thresholds b: a_j⁺ = a_j + η·H_j(1 − H_j)·Σ_{k=1..m} ω_jk·E_k, j = 1, 2, …, l; b_k⁺ = b_k + η·E_k, k = 1, 2, …, m.
Step 7: judge whether the algorithm iteration has converged; if not, return to step 2. The preferred minimum error at which iteration terminates is 0.001.
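Steps 1 to 7 amount to a standard one-hidden-layer back-propagation network with a linear output layer. A minimal sketch follows; the threshold-update signs here follow the usual gradient-descent derivation for the H_j = f(Σω_ij·x_i − a_j) convention, and the choice a = 4 for the hidden-node formula, the learning rate, and the toy data are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    # Preferred excitation function f(x) = (1 + e^-x)^-1 (step 2).
    return 1.0 / (1.0 + np.exp(-x))

def train_bp(X, Y, eta=0.05, tol=1e-3, max_iter=3000):
    """Steps 1-7: initialize, forward pass, error, weight and
    threshold updates, repeated until the overall error converges."""
    n, m = X.shape[1], Y.shape[1]
    l = int(np.sqrt(n + m)) + 4              # l = sqrt(n+m) + a, a in 0..10 (step 1)
    w_ij = rng.normal(0, 0.5, (n, l))        # input -> hidden weights
    w_jk = rng.normal(0, 0.5, (l, m))        # hidden -> output weights
    a = np.zeros(l)                          # hidden-layer thresholds
    b = np.zeros(m)                          # output-layer thresholds
    for _ in range(max_iter):
        e_total = 0.0
        for x, y in zip(X, Y):
            H = sigmoid(x @ w_ij - a)        # step 2: hidden output H_j
            O = H @ w_jk - b                 # step 3: output O_k
            E = y - O                        # step 4: node errors e_k
            e_total += 0.5 * (E ** 2).sum()
            delta_h = H * (1 - H) * (w_jk @ E)
            w_jk += eta * np.outer(H, E)     # step 5: weight updates
            w_ij += eta * np.outer(x, delta_h)
            a -= eta * delta_h               # step 6: threshold updates
            b -= eta * E                     # (gradient-descent signs)
        if e_total < tol:                    # step 7: convergence test
            break
    return w_ij, w_jk, a, b

def predict(X, w_ij, w_jk, a, b):
    H = sigmoid(X @ w_ij - a)
    return (H @ w_jk - b).argmax(axis=1)

# Toy demo: three well-separated "speakers", 3-dim feature vectors.
centers = 2.0 * np.eye(3)
X = np.vstack([c + 0.1 * rng.normal(size=(10, 3)) for c in centers])
Y = np.repeat(np.eye(3), 10, axis=0)         # one-hot labels y_i
params = train_bp(X, Y)
acc = (predict(X, *params) == np.repeat(np.arange(3), 10)).mean()
print(acc)
```

Stochastic per-sample updates are used here because the patent's steps update after each input; a batch formulation would accumulate the gradients over all samples before updating.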
In the present invention, the voice acquisition module 1 has a built-in voice acquisition card for collecting and processing the collected voice signals.
In the present invention, the fixed recorder 13 uses a windproof microphone.
In the present invention, the display screen 8 uses a touch screen or a backlit LED display.
In the present invention, several fixed recorders 13 may be provided on the outer shell of the device to strengthen voice recording.
The invention stores enrolled and labeled voice signals long-term; any voice signal stored in the voice pattern class database can be retrieved at any time for comparison and identification against an unknown test voice.
To use the invention, first turn on the power switch 5; the system then runs automatically, the display screen 8 lights up and shows the operating interface, and the user may choose between the two functions, "voice enrollment mode" and "voice test mode".
(1) When voice enrollment is selected, the central processing unit 3 switches the voice input unit 20 into "voice enrollment mode", and the display screen 8 and the loudspeaker 35 give a prompt such as "now in voice enrollment mode, please speak". Voice may be input through any of the microphone 11, the wireless microphone 12, and the fixed recorder 13 provided by the voice acquisition module 1, or through a smartphone. To ensure that the invention can accurately identify and quantify the voice features of the identified object, only one person or object may be enrolled at a time in the "voice enrollment mode" stage. Because the sound signal data produced by the same person when speaking and when singing deviate somewhat in their features, the invention uses a multi-mode enrollment strategy to improve recognition accuracy: the enrolled voice may include a combination of normal speech, singing, high-, middle-, and low-pitched voice, and other states. The recording length is 5 to 30 seconds; the display 8 shows the real-time waveform and a progress bar, and an unsatisfactory recording can be deleted and re-enrolled. After enrollment the data must be labeled manually, e.g. if the voice of Zhang San has been collected, the note "voice of Zhang San" is entered in the dialog box shown on the display screen 8 and saved; the enrolled voice is stored in the memory 33 of the invention.
(2) After voice enrollment, the control system of the invention automatically sends the labeled voice signal into the voice preprocessing unit 21, which converts the voice signal collected by the voice acquisition module 1 into an electric signal, converts the analog signal into a digital signal, and then performs conventional signal processing, including ambient background noise elimination, signal framing, filtering, pre-emphasis, windowing, and endpoint detection.
(3) The preprocessed voice signal is automatically sent into the voice feature extraction unit 22, which extracts the characteristic parameters reflecting the essence of the voice from the preprocessed voice signal and obtains the feature vector x_i. The feature extraction method preferably uses MFCC; the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform method, and the like may also be used to obtain acoustic features. The feature vectors obtained after extraction are automatically saved into the pattern class database, all voice features of one person corresponding to one pattern class. After the voices of N persons have been enrolled, N pattern classes are obtained, each with n characteristic parameters, yielding a database in which each person corresponds to a voice signal pattern class. All data are stored in the memory 33 of the invention; this completes the voice enrollment mode.
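The pattern class database built in step (3), i.e. the labeled set D = {(x_i, y_i)}, can be sketched as a label-to-feature-vectors mapping. The labels and feature values below are toy assumptions, not data from the patent.

```python
import numpy as np

# Pattern class database: label y_i -> list of n-dim feature vectors x_i.
pattern_db: dict[str, list] = {}

def enroll(label: str, feature_vec):
    """Add one enrolled utterance's feature vector to class `label`."""
    pattern_db.setdefault(label, []).append(np.asarray(feature_vec, dtype=float))

# Enroll N = 2 speakers with n = 3 features each (toy values).
enroll("Zhang San", [0.9, 0.1, 0.2])
enroll("Zhang San", [0.8, 0.2, 0.1])   # multi-mode enrollment: several takes
enroll("Li Si",     [0.1, 0.9, 0.3])

# The labeled set D = {(x_i, y_i)} used as classifier training data.
D = [(x, y) for y, xs in pattern_db.items() for x in xs]
print(len(pattern_db), len(D))  # 2 classes, 3 labeled samples
```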
(4) After enrollment, voice testing can be carried out by simply selecting "voice test mode" in the operating interface of the display screen 8. The central processing unit 3 switches the voice input unit 20 into "voice test mode", and the display screen 8 and the loudspeaker 35 give a prompt such as "voice test in progress…". No further operation is required: the invention collects test voice through one or more of the microphone 11, the wireless microphone 12, the fixed recorder 13, and a smartphone together, in real time, with no limitation on duration or number of speakers.
(5) The system device automatically preprocesses the voice data collected in "voice test mode" and extracts its features: the collected test signal is converted into an electric signal, conventionally filtered and de-noised, windowed, and endpoint-detected, and then passed to signal feature extraction.
(6) After features are extracted from the test signal, the invention automatically performs feature matching: the characteristic parameters of the extracted test voice signal are matched in real time against the enrolled, labeled sample voice characteristic parameters in the memory 33, the similarity between the test voice signal and the original voice signals of all enrollments is computed, and the test voice signal is assigned to the pattern class with the highest similarity. Finally the invention outputs an announcement such as "this is the voice of XXX". For example, if the invention has stored the voice feature signal of Zhang San, then when Zhang San speaks or sings to the device it automatically outputs, through recognition, "this is the voice of Zhang San".
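The patent does not fix the similarity measure used in step (6); cosine similarity between feature vectors is one common choice, sketched here against a hypothetical two-speaker pattern database.

```python
import numpy as np

def most_similar(test_vec, pattern_db):
    """Assign a test feature vector to the enrolled pattern class
    with the highest cosine similarity, as in step (6)."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    scores = {label: cos(test_vec, ref) for label, ref in pattern_db.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical database of enrolled per-speaker feature vectors.
db = {
    "Zhang San": np.array([1.0, 0.2, 0.1]),
    "Li Si":     np.array([0.1, 1.0, 0.3]),
}
label, score = most_similar(np.array([0.9, 0.25, 0.1]), db)
print(f"this is the voice of {label}")  # -> this is the voice of Zhang San
```

A deployed system would also need a rejection threshold on the best score so that an unknown speaker is reported as unidentified rather than forced into the nearest class, which matches the patent's handling of unrecognized voices.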
When the invention is tested in public, several objects may speak simultaneously during the same period, so the collected voice signal is a broadband aliased signal. To prevent errors when extracting features from such a signal, the strategy adopted by the invention is to use an intelligent algorithm to first match and identify the voice characteristic parameters of each person speaking alone, identify and store them, and then automatically screen and separate the voice signals of the simultaneous speech. Finally it outputs the recognition result with a prompt such as "this is the combined voice of Zhang San, Li Si, Wang Wu…", and indicates when some voice failed to be identified. The power-off key 6 is pressed to shut the system down.
The invention further provides that the system device can output to the user a list of the recognition results in a multi-speaker environment, including how many persons or objects were speaking at the scene under the test conditions, and can screen a recording in which several people speak at once, isolate what each person said, and filter out the other voices and the ambient sound.
When a voice feature for which the present invention has stored no sample appears in the test speech signal, the present invention automatically records the unknown voice feature signal and asks the user whether to label and store the voice signal of that object.
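The matching-and-rejection logic described above can be sketched as a nearest-match search with an unknown-voice threshold. This is a minimal illustration only: the patent does not name a similarity measure, so cosine similarity, the example labels, and the 0.8 threshold are all assumptions.

```python
import numpy as np

def identify(test_vec, sample_db, threshold=0.8):
    """Match a test feature vector against labeled sample vectors.

    sample_db maps a label ("Zhang San", ...) to a stored feature
    vector.  Returns the best-matching label, or None when the voice
    is unknown and should be recorded for optional labeling.
    """
    best_label, best_sim = None, -1.0
    for label, sample_vec in sample_db.items():
        # cosine similarity between test and sample feature vectors
        sim = np.dot(test_vec, sample_vec) / (
            np.linalg.norm(test_vec) * np.linalg.norm(sample_vec))
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim < threshold:
        return None  # unknown voice: store the feature, prompt the user
    return best_label

db = {"Zhang San": np.array([1.0, 0.2, 0.1]),
      "Li Si": np.array([0.1, 1.0, 0.3])}
print(identify(np.array([0.9, 0.25, 0.1]), db))   # close to Zhang San's sample
print(identify(np.array([-1.0, 0.0, 1.0]), db))   # matches no stored sample
```

A real implementation would compare sequences of frame-level feature vectors rather than single vectors, but the accept/reject structure is the same.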
Brief description of the drawings
Fig. 1 is a structural schematic diagram of the present invention.
Fig. 2 is the system framework diagram of the present invention.
Fig. 3 is a schematic diagram of the multi-layer artificial neural network of the present invention.
Fig. 4 is a flow chart of the improved neural network classification algorithm for voice signals of the present invention.
Embodiment
Fig. 1 shows one embodiment of the present invention; the present embodiment is described with reference to Figs. 1 to 4. It comprises a frame 10 provided with a cavity. Arranged in the frame 10 are a voice acquisition module 1, a voice recognition module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a RAM card 32, a loudspeaker 35 and a power supply 9. The voice acquisition module 1 comprises a microphone 11, a wireless intercom 12 and a fixed recorder 13; the voice recognition module 2 comprises a voice input unit 20, a voice pretreatment unit 21, a voice feature extraction unit 22, and a feature matching identification and classification unit 23. Voice signals are gathered by the voice acquisition module 1, the collected signals are handled by the voice recognition module 2, and data signals are preserved by the memory 33. The operating process of human-machine interaction and the visualization of results are shown on the display screen 8; the loudspeaker 35 is arranged to give voice prompts for operating steps and to report recognition results; the network module 31 is arranged to connect the present invention with an internet cloud platform; the central processing unit 3 is arranged to perform program control and data operations for the whole system device; the wireless signal transceiver 4 is arranged to receive and transmit the radio signals produced by the wireless intercom 12, a smartphone and the network module 31 and to connect the present invention wirelessly with the internet; and the RAM card 32 is arranged to read previously recorded external voice data into the database of the present invention.
In the present embodiment, the voice input unit 20 is arranged to provide two modes, a "voice enrollment mode" and a "voice test mode". Voice can be input through any one of the microphone 11, wireless intercom 12 and fixed recorder 13 provided by the voice acquisition module 1, or through a smartphone. In "voice enrollment mode", the voice input unit 20 is arranged so that voice can be enrolled for only one person or one object at a time, the enrolled voice being one audio segment of 5 to 30 seconds. The present invention uses a multi-mode voice enrollment strategy, in which the enrolled voice may contain a multi-mode combination of normal speech, singing, or high/medium/low pitch. The display screen 8 shows the speech waveform and a progress bar in real time. After a voice is enrolled, the data must be labeled; the labeling method is manual marking. For example, if Zhang San's voice has been collected, "Zhang San's voice" is entered as a remark in the dialog box shown on the display screen 8 and saved; the enrolled voice is preserved in the memory 33. In "voice test mode", the present invention collects test voice through one or more of the microphone 11, wireless intercom 12, fixed recorder 13 and a smartphone together; the test voice collection process is real-time, with no limitation on the number of people, objects, or time.
In the present embodiment, the voice input unit 20 is arranged to be connected with the voice acquisition module 1; the microphone 11 is connected to the voice acquisition module 1 through an audio cable, and the wireless intercom 12 is connected with the voice acquisition module 1 by radio signal.
In the present embodiment, the voice acquisition module 1 can also use a smartphone for voice signal input: the phone is paired with the voice acquisition module 1 of the present invention, the pairing ways including Bluetooth, infrared, WIFI and scanning a QR code. This realizes voice enrollment with the phone used as a wireless microphone, which is more convenient for multi-person voice input.
In the present embodiment, the voice pretreatment unit 21 changes the voice signal collected by the voice acquisition module 1 into an electric signal, i.e. converts the analog signal into a digital signal, and then carries out conventional signal processing, including ambient background noise elimination, signal framing, filtering, pre-emphasis, windowing and endpoint detection.
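The conventional pretreatment chain of unit 21 can be sketched as follows: pre-emphasis, framing, Hamming windowing, and a simple short-time-energy endpoint detector. The frame length of 256 samples, frame shift of 128, pre-emphasis coefficient 0.97 and the 10% energy threshold are illustrative assumptions, not values given by the patent.

```python
import numpy as np

def preprocess(signal, frame_len=256, frame_shift=128, alpha=0.97):
    """Pre-emphasize, frame and window a digitized voice signal."""
    # pre-emphasis: y[t] = x[t] - alpha * x[t-1] boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # split into overlapping frames
    n_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    frames = np.stack([emphasized[i * frame_shift:i * frame_shift + frame_len]
                       for i in range(n_frames)])
    # apply a Hamming window to each frame
    return frames * np.hamming(frame_len)

def endpoint_detect(frames, ratio=0.1):
    """Keep frames whose short-time energy exceeds a fraction of the peak."""
    energy = (frames ** 2).sum(axis=1)
    return frames[energy > ratio * energy.max()]

rng = np.random.default_rng(0)
sig = np.concatenate([0.01 * rng.standard_normal(512),              # leading silence
                      np.sin(2 * np.pi * 0.05 * np.arange(1024))])  # "speech"
frames = preprocess(sig)
voiced = endpoint_detect(frames)
print(frames.shape, voiced.shape)
```

Background noise elimination and filtering would normally precede this chain; they are omitted here to keep the sketch short.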
In the present embodiment, the voice feature extraction unit 22 is arranged to extract from the original voice signal the main characteristic parameters reflecting the essence of the voice, forming a feature vector x_i, x_i = (x_i1, x_i2, …, x_ij, …, x_in)^T, where x_ij represents the j-th speech characteristic value of the i-th object or person. The characteristic parameter extraction method preferably uses the Mel-frequency cepstral coefficient method (MFCC); acoustic features can also be obtained with the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform method, and so on. The feature vectors obtained after extraction are automatically saved by the system into the pattern class database, all sound features of one object or person corresponding to one pattern class. After the voices of N people or objects have been enrolled, N pattern classes are obtained; if each pattern class has n characteristic parameters, an n-dimensional feature space is formed, i.e. the labeled feature signal set can be written as D = {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_N, y_N)}, where x_i ∈ χ = R^n, x_i representing the voice feature signal of the i-th enrolled object or person, y_i ∈ Y = {1, 2, …, N}, y_i representing the i-th person or object, and N representing the number of enrolled persons or objects. The labeled voice feature data form the pattern class database and are stored in the memory 33 of the present invention.
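The preferred MFCC extraction can be sketched per windowed frame as power spectrum → mel filterbank → log → discrete cosine transform, with the result collected into the labeled set D. The 26-filter bank, 512-point FFT, 8 kHz sample rate and 12 coefficients are conventional choices assumed for illustration; the patent does not fix these values.

```python
import numpy as np

def mel_filterbank(n_filters=26, n_fft=512, sr=8000):
    """Triangular filters spaced evenly on the mel scale."""
    mel = lambda f: 2595 * np.log10(1 + f / 700.0)
    imel = lambda m: 700 * (10 ** (m / 2595.0) - 1)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mfcc(frame, n_coef=12, sr=8000):
    """Mel-frequency cepstral coefficients for one windowed frame."""
    power = np.abs(np.fft.rfft(frame, 512)) ** 2                 # power spectrum
    logmel = np.log(mel_filterbank(sr=sr) @ power + 1e-10)       # log mel energies
    n = len(logmel)
    # DCT-II of the log filterbank energies gives the cepstral coefficients
    dct = np.cos(np.pi / n * (np.arange(n)[:, None] + 0.5) * np.arange(n_coef))
    return logmel @ dct

# labeled feature set D = {(x_i, y_i)}: one feature vector per enrolled voice
enrolled = {1: np.sin(0.1 * np.arange(256)), 2: np.sin(0.3 * np.arange(256))}
D = [(mfcc(frame), label) for label, frame in enrolled.items()]
print(len(D), D[0][0].shape)
```

In practice each enrolled voice yields a sequence of such vectors (one per frame) rather than a single vector, but the pattern-class bookkeeping is the same.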
In the present embodiment, the feature matching identification and classification unit 23 is arranged to use an intelligent multi-class classifier, the learning algorithm of the classifier being an improved neural network classification algorithm. The enrolled and labeled voice feature signal set is used as training data; the network model learns from the training data, obtains the classification rules and completes the training of the classifier. The trained classifier then performs intelligent classification and identification of unknown test speech signals. After features are extracted from the test signal, the present invention automatically performs feature matching: the characteristic parameters of the extracted test speech signal are matched in real time against the labeled sample voice characteristic parameters enrolled in the memory 33 of the present invention, the similarity between the test speech signal and every enrolled sample speech signal is calculated, and the test speech signal is then assigned to the sample signal pattern class with the highest similarity. Finally the present invention outputs the recognition result, a report such as "this is XXX's voice". For example, if the present invention has stored Zhang San's voice feature signal, then when Zhang San speaks or sings to the device, the present invention automatically calculates that the test speech characteristic parameters are most similar to the enrolled and labeled voice signal of Zhang San and, after recognition, outputs "this is Zhang San's voice".
In the present embodiment, the multi-class classifier uses a multi-layer artificial neural network structure, characterized in that one end of the network is defined as the input layer, the other end as the output layer, and the part between the input layer and the output layer as the hidden layers. The input layer receives the external input signal and passes it to all neurons of the hidden layer; after the hidden layer has computed, the result is passed to the output layer, which, after receiving the signal from the hidden layer and computing, outputs the classification result, i.e. the recognition result. The preferred number of hidden layers of the present invention is 1 to 200.
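The input → hidden → output computation described above can be illustrated with a minimal forward-pass sketch using one hidden layer, the sigmoid excitation f(x) = (1 + e^(-x))^(-1) preferred below, and a linear output layer; the layer sizes are arbitrary illustrative values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))          # f(x) = (1 + e^-x)^-1

def forward(x, w_ih, a, w_ho, b):
    """One forward pass: input layer -> hidden layer -> output layer.

    w_ih[i, j]: weight from input node i to hidden node j,
    a[j]: hidden threshold, w_ho[j, k]: hidden-to-output weight,
    b[k]: output threshold.
    """
    H = sigmoid(x @ w_ih - a)                 # hidden layer output H_j
    O = H @ w_ho - b                          # output layer output O_k
    return H, O

rng = np.random.default_rng(1)
n, l, m = 12, 8, 3                            # input, hidden, output node counts
H, O = forward(rng.standard_normal(n),
               rng.standard_normal((n, l)), rng.standard_normal(l),
               rng.standard_normal((l, m)), rng.standard_normal(m))
print(H.shape, O.shape)                       # recognized class: O.argmax()
```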
In the present embodiment, the process of training the improved artificial neural network classification algorithm is as follows:
Step 1: Network initialization. The algorithm database is continually updated according to the number of enrolled voice signals; when the voice signals of N objects have been enrolled, N pattern classes are formed and the sample space (X, Y) is obtained, the i-th sample group being (X_i, Y_i), where X_i represents the feature vector set extracted for the i-th object and Y_i represents the label of the i-th object. According to the system input-output sequence (X, Y), the number of input layer nodes n, hidden layer nodes l and output layer nodes m is determined, where n is determined by the number of characteristic values obtained in input signal feature extraction, m is determined by the number of stored speech pattern classes, and the reference value of l is l = √(n + m) + a, where a ranges from 0 to 10 and is determined automatically by the model. The connection weights ω_ij between the input layer and hidden layer neurons and the connection weights ω_jk between the hidden layer and output layer neurons are initialized, the hidden layer thresholds a and output layer thresholds b are initialized, and the learning rate η and the neuron excitation function are given.
Step 2: Calculate the hidden layer output. According to the input vector X, the connection weights ω_ij between the input layer and hidden layer neurons, and the hidden layer thresholds a, the hidden layer output H is calculated. The output of the j-th hidden layer node is H_j = f(Σ_{i=1..n} ω_ij·x_i − a_j), j = 1, 2, …, l, where l is the number of hidden layer nodes and f is the hidden layer excitation function; this function has many forms, and the present invention preferably uses f(x) = (1 + e^(−x))^(−1).
Step 3: Calculate the output layer output. According to the hidden layer output H, the connection weights ω_jk between the hidden layer and output layer neurons, and the output layer thresholds b, the output layer output O is calculated. The output of the k-th output layer node is O_k = Σ_{j=1..l} H_j·ω_jk − b_k, k = 1, 2, …, m, where m is the number of output layer nodes, b_k is the threshold of the k-th output layer node and H_j is the output value of the j-th hidden layer node.
Step 4: Calculate the prediction error. From the network output O and the expected output Y (the true value), the total network prediction error e = Σ_{k=1..m} e_k is calculated, where e_k = ½(Y_k − O_k)² is the error produced by the k-th output layer node.
Step 5: Update the weights. The network connection weights ω_jk and ω_ij are updated according to the total prediction error e: ω_jk⁺ = ω_jk + η·H_j·E_k, where j = 1, 2, …, l, k = 1, 2, …, m, η is the learning rate, and E_k = Y_k − O_k represents the sensitivity of the total network error to output layer node k; ω_ij⁺ = ω_ij + η·H_j·(1 − H_j)·x_i·Σ_{k=1..m} ω_jk·E_k, where i = 1, 2, …, n, j = 1, 2, …, l.
Step 6: Update the thresholds. The hidden layer thresholds a and output layer thresholds b are updated according to the total prediction error e: a_j⁺ = a_j + η·H_j·(1 − H_j)·Σ_{k=1..m} ω_jk·E_k, j = 1, 2, …, l; b_k⁺ = b_k + η·E_k, k = 1, 2, …, m.
Step 7: Judge whether the algorithm iteration has converged; if not, return to Step 2. The preferred minimum error of the present invention for terminating the iteration is 0.001.
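Steps 1 to 7 can be sketched end to end as a small training loop. The toy two-class data, the layer size, and the learning rate are illustrative assumptions; note also that the threshold updates below use the gradient-descent sign, whereas Step 6 writes them with a plus sign.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, Y, l=8, eta=0.1, eps=1e-3, max_iter=5000):
    """BP training per Steps 1-7: sigmoid hidden layer, linear output layer."""
    n, m = X.shape[1], Y.shape[1]
    rng = np.random.default_rng(0)
    w_ih = rng.uniform(-0.5, 0.5, (n, l)); a = np.zeros(l)   # Step 1: initialize
    w_ho = rng.uniform(-0.5, 0.5, (l, m)); b = np.zeros(m)
    for _ in range(max_iter):
        e = 0.0
        for x, y in zip(X, Y):
            H = sigmoid(x @ w_ih - a)         # Step 2: hidden layer output
            O = H @ w_ho - b                  # Step 3: output layer output
            E = y - O                         # Step 5 sensitivity E_k = Y_k - O_k
            e += 0.5 * np.sum(E ** 2)         # Step 4: prediction error
            grad_h = H * (1 - H) * (w_ho @ E)
            w_ho += eta * np.outer(H, E)      # Step 5: update weights
            w_ih += eta * np.outer(x, grad_h)
            a -= eta * grad_h                 # Step 6: thresholds (descent sign)
            b -= eta * E
        if e < eps:                           # Step 7: convergence check
            break
    return (w_ih, a, w_ho, b), e

# two toy "pattern classes" with one-hot labels y_i
X = np.array([[0.0, 0.0], [0.0, 0.2], [1.0, 1.0], [0.9, 1.0]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], float)
params, err = train(X, Y)
print(round(err, 4))
```

After training, a test feature vector is classified by running the forward pass and taking the output node with the largest response.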
In the present embodiment, the voice acquisition module 1 has a built-in voice acquisition card for collecting and processing the gathered voice signals.
In the present embodiment, the fixed recorder 13 uses a wind-proof microphone.
In the present embodiment, the display screen 8 uses a touch screen or an LED display with backlight.
In the present embodiment, multiple fixed recorders 13 can be provided, arranged on the shell of the present invention, to increase the voice recording strength.
The present invention has a long-term storage function for enrolled and labeled voice signals; every voice signal stored in the voice pattern class database of the present invention can be retrieved at any time for contrast and identification against unknown test speech.
The process for using the present invention is as follows. First the power switch 5 is turned on; the system then runs automatically, the display screen 8 lights up and shows the operation interface, and the user can choose between the two functions "voice enrollment mode" and "voice test mode".
(1) When voice enrollment is selected, the central processing unit 3 controls the voice input unit 20 to enter "voice enrollment mode"; the display screen 8 and the loudspeaker 35 simultaneously give a prompt such as "now in voice enrollment mode, please speak". The user can input voice through any one of the microphone 11, wireless intercom 12 and fixed recorder 13 provided by the voice acquisition module 1, or through a smartphone. To ensure that the present invention can accurately identify and quantify the voice features of the identified object, voice can be enrolled for only one person or one object at a time in the "voice enrollment mode" stage. Because the sound signal data produced by the same person when speaking and when singing show a certain feature deviation, the present invention uses a multi-mode voice enrollment strategy to improve the accuracy of voice signal identification: the enrolled voice may contain a multi-mode combination of sounds under normal speech, singing, high/medium/low pitch and other states. The recording time is 5 to 30 seconds; the display screen 8 shows the real-time voice waveform and a progress bar; if the recorded voice is unsatisfactory it can be deleted and enrolled again. After a voice is enrolled, the data must be labeled; the labeling method is manual marking. For example, if Zhang San's voice has been collected, "Zhang San's voice" is entered as a remark in the dialog box shown on the display screen 8 and saved; the enrolled voice is stored in the memory 33 of the present invention.
(2) After voice signal enrollment, the control system of the present invention automatically sends the labeled voice signal into the voice pretreatment unit 21, which changes the voice signal collected by the voice acquisition module 1 into an electric signal, converts the analog signal into a digital signal, and then carries out conventional signal processing, including ambient background noise elimination, signal framing, filtering, pre-emphasis, windowing and endpoint detection.
(3) The control system of the present invention automatically sends the pretreated voice signal into the voice feature extraction unit 22, which extracts from the pretreated voice signal the characteristic parameters reflecting the essence of the voice and obtains the feature vector x_i. The characteristic parameter extraction method preferably uses the Mel-frequency cepstral coefficient method (MFCC); acoustic features can also be obtained with the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform method, and so on. The feature vectors obtained after feature extraction are automatically saved into the pattern class database, all sound features of one person corresponding to one pattern class. After the voices of N people have been enrolled, N pattern classes are obtained; if each pattern class has n characteristic parameters, a database in which each person corresponds to a voice signal pattern class is thereby obtained. All data are stored in the memory 33 of the present invention. This completes the voice signal enrollment mode.
(4) After voice enrollment, voice testing can be carried out. To carry out a voice test, it is only necessary to select "voice test mode" in the operation interface of the display screen 8; the central processing unit 3 controls the voice input unit 20 to enter "voice test mode", and the display screen 8 and the loudspeaker 35 simultaneously give a prompt such as "voice test in progress ...". The user need take no further action; the present invention collects test voice through one or more of the microphone 11, wireless intercom 12, fixed recorder 13 and a smartphone in the voice acquisition module 1. The test voice collection process is real-time, with no limitation on time or on the number of people.
(5) For the speech data collected in "voice test mode", the system device of the present invention automatically carries out pretreatment and feature extraction of the test speech signal: the collected voice test signal is converted into an electric signal, conventional filtering is carried out, and after noise removal, windowing and endpoint detection, signal feature extraction is performed.
(6) After features are extracted from the test signal, the present invention automatically performs feature matching: the characteristic parameters of the extracted test speech signal are matched in real time against the labeled sample voice characteristic parameters enrolled in the memory 33 of the present invention, the similarity between the test speech signal and every enrolled sample speech signal is calculated, and the test speech signal is assigned to the pattern class with the highest similarity. Finally the present invention outputs a report such as "this is XXX's voice". For example, if the present invention has stored Zhang San's voice feature signal, then when Zhang San speaks or sings to the device, the present invention automatically outputs "this is Zhang San's voice" after recognition.
When the present invention is tested in a public place, multiple objects may speak simultaneously in the test environment during the same period, i.e. the collected voice signal is a broadband aliased signal. To prevent errors when extracting features from such a signal, the strategy adopted by the present invention is: using an intelligent algorithm, first match and identify the voice characteristic parameters of each person speaking alone and store them; the system then automatically screens and separates the voice signal of the simultaneous speech, finally outputs the recognition result with a prompt such as "this is the combined voice of Zhang San, Li Si, Wang Wu ...", and indicates that certain voices failed to be identified. The power-off key 6 is pressed to shut down the system.
In the present embodiment, the system device can also output to the user a list of recognition results in a multi-speaker environment, including how many people or objects were speaking at the scene in the test environment, and can screen and play the recording of simultaneous speech, separate out what each person said, and filter out other people's voices and background sound.
When a voice feature for which the present invention has stored no sample appears in the test speech signal, the present invention automatically records the unknown voice feature signal and asks the user whether to label and store the voice signal of that object.
In the technical field of intelligent sound signal type recognition system devices: every device comprising a frame 10 provided with a cavity, in which are arranged a voice acquisition module 1, a voice recognition module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a RAM card 32, a loudspeaker 35 and a power supply 9; the voice acquisition module 1 comprising a microphone 11, a wireless intercom 12 and a fixed recorder 13; the voice recognition module 2 comprising a voice input unit 20, a voice pretreatment unit 21, a voice feature extraction unit 22 and a feature matching identification and classification unit 23; voice signals gathered by the voice acquisition module 1, the collected signals handled by the voice recognition module 2, data signals preserved by the memory 33, the operating process of human-machine interaction and the visualization of results shown on the display screen 8; the loudspeaker 35 arranged to give voice prompts for operating steps and report recognition results; the network module 31 arranged to connect the present invention with an internet cloud platform; the central processing unit 3 arranged to perform program control and data operations for the whole system device; the wireless signal transceiver 4 arranged to receive and transmit the radio signals produced by the wireless intercom 12, a smartphone and the network module 31 and to connect the present invention wirelessly with the internet; and the RAM card 32 arranged to read recorded external voice data into the database of the present invention — all such technical contents are within the protection scope of the present invention. It should be noted that the scope of the present invention is not limited by appearance: the shape of the frame 10 of the present invention can be square, cylindrical, polygonal prismatic, or other shapes such as those resembling a Chinese cabbage, a watermelon or a stone, and every device whose shape differs but whose essential technical contents are identical to the technical contents of the present invention is also within the protection scope of the present invention. Likewise, obvious minor routine improvements or minor combinations made by those skilled in the art on the basis of the present invention are also within the scope of the present invention, as long as their technical contents fall within the technical contents described herein.
Claims (8)
- 1. An intelligent sound signal type recognition system device, characterized in that it comprises a frame (10) provided with a cavity, in which are arranged a voice acquisition module (1), a voice recognition module (2), a central processing unit (3), a wireless signal transceiver (4), a display screen (8), a memory (33), a network module (31), a RAM card (32), a loudspeaker (35) and a power supply (9); the voice acquisition module (1) comprises a microphone (11), a wireless intercom (12) and a fixed recorder (13); the voice recognition module (2) comprises a voice input unit (20), a voice pretreatment unit (21), a voice feature extraction unit (22) and a feature matching identification and classification unit (23); voice signals are gathered by the voice acquisition module (1), the collected signals are handled by the voice recognition module (2), and data signals are preserved by the memory (33); the operating process of human-machine interaction and the visualization of results are shown on the display screen (8); the loudspeaker (35) is arranged to give voice prompts for operating steps and to report recognition results; the network module (31) is arranged to connect the present invention with an internet cloud platform; the central processing unit (3) is arranged to perform program control and data operations for the whole system device; the wireless signal transceiver (4) is arranged to receive and transmit the radio signals produced by the wireless intercom (12), a smartphone and the network module (31) and to connect the present invention wirelessly with the internet; and the RAM card (32) is arranged to read recorded external voice data into the database of the present invention.
- 2. The intelligent sound signal type recognition system device according to claim 1, characterized in that the voice input unit (20) is arranged to provide two modes, a "voice enrollment mode" and a "voice test mode"; voice can be input through any one of the microphone (11), wireless intercom (12) and fixed recorder (13) provided by the voice acquisition module (1), or through a smartphone; in "voice enrollment mode", the voice input unit (20) is arranged so that voice can be enrolled for only one person or one object at a time, the enrolled voice being one audio segment of 5 to 30 seconds; the present invention uses a multi-mode voice enrollment strategy, in which the enrolled voice may contain a multi-mode combination of normal speech, singing, or high/medium/low pitch; the display screen (8) shows the speech waveform and a progress bar in real time; after a voice is enrolled the data must be labeled, the labeling method being manual marking, with a remark such as "XXX's voice" entered in the dialog box shown on the display screen (8) and saved, the enrolled voice being stored in the memory (33); in "voice test mode", test voice is collected through one or more of the microphone (11), wireless intercom (12), fixed recorder (13) and a smartphone together, the test voice collection process being real-time, with no limitation on the number of people, objects, or time; the smartphone is arranged to pair wirelessly with the voice acquisition module (1), the pairing ways including Bluetooth, infrared, WIFI and scanning a QR code, realizing voice enrollment with the phone used as a wireless microphone for multi-person voice input.
- 3. The intelligent sound signal type recognition system device according to claim 1, characterized in that the voice pretreatment unit (21) changes the voice signal collected by the voice acquisition module (1) into an electric signal, i.e. converts the analog signal into a digital signal, and then carries out conventional signal processing, including ambient background noise elimination, signal framing, filtering, pre-emphasis, windowing and endpoint detection.
- 4. The intelligent sound signal type recognition system device according to claim 1, characterized in that the voice feature extraction unit (22) is arranged to extract from the original voice signal the main characteristic parameters reflecting the essence of the voice, forming a feature vector x_i, x_i = (x_i1, x_i2, …, x_ij, …, x_in)^T, where x_ij represents the j-th speech characteristic value of the i-th object or person; the characteristic parameter extraction method uses the Mel-frequency cepstral coefficient method (MFCC), and acoustic features can also be obtained with the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform method, and so on; the feature vectors obtained after extraction are automatically saved into the pattern class database, all sound features of one object or person corresponding to one pattern class; after the voices of N people or objects have been enrolled, N pattern classes are obtained, and if each pattern class has n characteristic parameters an n-dimensional feature space is formed, i.e. the labeled feature signal set can be written as D = {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_N, y_N)}, where x_i ∈ χ = R^n, x_i representing the voice feature signal of the i-th enrolled object or person, y_i ∈ Y = {1, 2, …, N}, y_i representing the i-th person or object, and N representing the number of enrolled persons or objects; the labeled voice feature data form the pattern class database and are stored in the memory (33) of the present invention.
- 5. The intelligent sound signal type recognition system device according to claim 1, characterized in that the feature matching identification and classification unit (23) is arranged to use an intelligent multi-class classifier, the learning algorithm of the classifier being an improved neural network classification algorithm; the enrolled and labeled voice feature signal set is used as training data, the network model learns from the training data, obtains the classification rules and completes the training of the classifier; the trained classifier then performs intelligent classification and identification of unknown test speech signals; after features are extracted from the test signal, the present invention automatically performs feature matching, matching the characteristic parameters of the extracted test speech signal in real time against the labeled sample voice characteristic parameters enrolled in the memory (33) of the present invention, calculating the similarity between the test speech signal and every enrolled sample speech signal, then assigning the test speech signal to the sample signal pattern class with the highest similarity, and finally outputting the recognition result, a report such as "this is XXX's voice".
- 6. The intelligent sound signal type recognition system device according to claim 1, characterized in that the multi-class classifier uses a multi-layer artificial neural network structure, in which one end of the network is defined as the input layer, the other end as the output layer, and the part between the input layer and the output layer as the hidden layers; the input layer receives the external input signal and passes it to all neurons of the hidden layer; after the hidden layer has computed, the result is passed to the output layer, which, after receiving the signal from the hidden layer and computing, outputs the classification result, i.e. the recognition result; the preferred number of hidden layers of the present invention is 1 to 200.
- 7. The intelligent sound signal type recognition system device according to claim 1, characterized in that: the improved ANN training proceeds as follows:

Step 1: Network initialization. The algorithm database is updated continuously as voice signals are recorded. Once the voice signals of N objects have been recorded, N pattern classes are formed, giving a sample space (X, Y) whose i-th sample is (X_i, Y_i), where X_i is the set of feature vectors extracted for the i-th object and Y_i is the label of the i-th object. From the system input-output sequence (X, Y), the number of input-layer nodes n, hidden-layer nodes l, and output-layer nodes m are determined: n is the number of feature values produced by input-signal feature extraction, m is the number of stored speech pattern classes, and l is taken by the reference formula l = sqrt(n + m) + a, where a ranges over 0 to 10 and is determined automatically by the model. The connection weights ω_ij between input-layer and hidden-layer neurons and ω_jk between hidden-layer and output-layer neurons are initialized, as are the hidden-layer thresholds a and the output-layer thresholds b; a learning rate η and a neuron excitation function are given.

Step 2: Compute the hidden-layer output. From the input vector X, the input-to-hidden connection weights ω_ij, and the hidden-layer thresholds a, the hidden-layer output H is computed. The output of the j-th hidden node is H_j = f(Σ_{i=1..n} ω_ij·x_i − a_j), j = 1, 2, …, l, where l is the number of hidden nodes and f is the hidden-layer excitation function. Many excitation functions exist; the present invention preferably uses f(x) = (1 + e^(−x))^(−1).

Step 3: Compute the output-layer output. From the hidden-layer output H, the hidden-to-output connection weights ω_jk, and the output-layer thresholds b, the output O is computed. The output of the k-th output node is O_k = Σ_{j=1..l} H_j·ω_jk − b_k, k = 1, 2, …, m, where m is the number of output nodes, b_k is the threshold of the k-th output node, and H_j is the output value of the j-th hidden node.

Step 4: Compute the prediction error. From the network output O and the desired output Y (the true values), the total prediction error e = Σ_{k=1..m} e_k is computed, where e_k = ½(Y_k − O_k)² is the error contributed by the k-th output node.

Step 5: Update the weights. According to the total prediction error e, the connection weights are updated as ω_jk ← ω_jk + η·H_j·E_k, j = 1, 2, …, l, k = 1, 2, …, m, where η is the learning rate and E_k = Y_k − O_k is the sensitivity of the network's total error to output node k; and ω_ij ← ω_ij + η·H_j·(1 − H_j)·x_i·Σ_{k=1..m} ω_jk·E_k, i = 1, 2, …, n, j = 1, 2, …, l.

Step 6: Update the thresholds. According to the total prediction error e, the hidden-layer thresholds are updated as a_j ← a_j + η·H_j·(1 − H_j)·Σ_{k=1..m} ω_jk·E_k, j = 1, 2, …, l, and the output-layer thresholds as b_k ← b_k + η·E_k, k = 1, 2, …, m.

Step 7: Test whether the algorithm iteration has converged; if not, return to Step 2. Iteration preferably terminates when the error falls below a minimum of 0.001.
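Steps 2 through 6 of the training loop above can be sketched in NumPy. This is a minimal single-sample illustration under the claim's conventions (sigmoid hidden layer, linear output minus threshold); note that the claim prints "+" for both threshold updates, whereas a strict gradient-descent derivation under the "minus threshold" convention gives the signs used below, which are what make the sketch converge.

```python
import numpy as np

def bp_train_step(x, y, W_ih, a, W_ho, b, eta=0.1):
    """One iteration of the BP update in claim 7 (steps 2-6), in place.
    Symbols follow the claim: W_ih ~ omega_ij, W_ho ~ omega_jk."""
    f = lambda z: 1.0 / (1.0 + np.exp(-z))
    H = f(x @ W_ih - a)            # step 2: hidden output H_j
    O = H @ W_ho - b               # step 3: network output O_k
    E = y - O                      # step 4: E_k = Y_k - O_k
    # back-propagated hidden sensitivity: H_j(1-H_j) * sum_k w_jk E_k
    g = H * (1 - H) * (W_ho @ E)
    # step 5: weight updates
    W_ho += eta * np.outer(H, E)   # w_jk += eta * H_j * E_k
    W_ih += eta * np.outer(x, g)
    # step 6: threshold updates (gradient-descent signs, see lead-in)
    a -= eta * g
    b -= eta * E
    return 0.5 * float(np.sum(E ** 2))  # total prediction error e
```

Step 7's convergence check is then a loop that repeats until the returned error drops below the claim's preferred minimum of 0.001.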
- 8. The intelligent sound signal type recognition system device according to any one of claims 1 to 7, characterized in that the basic operating flow is set as follows:

1) The power switch (5) is turned on and the system runs automatically; the display screen (8) lights up and shows the operating interface, from which the user can select either of two functions, "voice typing mode" and "voice testing mode". When voice typing is selected, the central processing unit (3) controls the voice input unit (20) to enter "voice typing mode", while the display screen (8) and the loudspeaker (35) give a prompt such as "Now in voice typing mode, please speak". The user may input voice through any one of the microphone (11), wireless interphone (12), fixed recorder (13), or smartphone provided by the voice acquisition module (1). To ensure that the invention can accurately identify and quantify the voice features of the enrolled object, only one person or object may be recorded at a time in the "voice typing mode" stage. Because the sound signal of the same person exhibits a certain feature deviation between speaking and singing, the invention adopts a multi-mode voice typing strategy to improve recognition accuracy: the recorded voice may combine normal speech, singing, and high-, middle-, and low-pitch and other states. The recording lasts 5 to 30 seconds; the display (8) shows the real-time waveform and a progress bar, and an unsatisfactory recording can be deleted and re-recorded. After typing, the voice must be labelled; labelling is done manually, e.g. if the voice of Zhang San has been collected, the remark "voice of Zhang San" is entered in the dialog box shown on the display screen (8) and saved. The recorded voice is stored in the memory (33) of the invention.

2) After voice signal typing, the control system of the invention automatically sends the labelled voice signal to the voice pre-processing unit (21). The voice pre-processing unit (21) converts the voice signal collected by the voice acquisition module (1) into an electric signal, converts the analog signal into a digital signal, and then performs conventional signal processing, including ambient background-noise elimination, signal framing, filtering, pre-emphasis, windowing, and endpoint detection.

3) The control system automatically sends the pre-processed voice signal to the signal feature extraction unit (22), which extracts from it the characteristic parameters reflecting the essence of the voice, yielding a feature vector x_i. Characteristic parameter extraction preferably uses Mel-frequency cepstral coefficients (MFCC); the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform method, etc. may also be used to obtain acoustic features. The feature vectors obtained are automatically saved to the pattern class library. All the sound features of one person correspond to one pattern class, so after the voices of N people have been recorded, N pattern classes are obtained, each with n characteristic parameters, giving a database in which each person corresponds to a voice-signal pattern class. All data are stored in the memory (33) of the invention; with that, voice typing mode is complete.

4) After voice typing, voice testing can be carried out simply by selecting "voice testing mode" on the operating interface of the display screen (8). The central processing unit (3) controls the voice input unit (20) to enter "voice testing mode", and the display screen (8) and the loudspeaker (35) give a prompt such as "Voice testing in progress…". No further operation is required: the invention collects test voice in real time through one or more of the microphone (11), wireless interphone (12), fixed recorder (13), and smartphone in the voice acquisition module (1), without any limit on duration or number of takes.

5) The system automatically pre-processes the voice data collected in "voice testing mode" and extracts its features: the collected test voice signal is converted into an electric signal and subjected to conventional filtering, noise removal, windowing, and endpoint detection before signal feature extraction.

6) After feature extraction from the test signal, the invention performs feature matching automatically: the characteristic parameters of the extracted test voice signal are matched in real time against the labelled sample voice parameters stored in the memory (33), the similarity between the test signal and every enrolled original voice signal is calculated, and the test signal is assigned to the pattern class with the highest similarity. The invention then reports the result outwardly, e.g. "This is XXX's voice". For example, if the voice feature signal of Zhang San has been stored and Zhang San speaks or sings to the device, the invention automatically outputs "This is the voice of Zhang San" after recognition.

7) The system device can also output a recognition-result list for multi-speaker environments, including how many people or objects are speaking at the scene under the test conditions, and can screen a recording of several people speaking simultaneously, separate out what each individual said, and filter out the other voices and ambient sound. When a test voice signal contains sample voice features not stored by the invention, the unknown voice signal features are recorded automatically, and the user is reminded to decide whether to label and store the voice signal of that object.
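The conventional pre-processing front end named in steps 2) and 5) can be sketched as follows. This is an illustration, not the patented implementation; the frame length, hop, and pre-emphasis coefficient are assumed values (25 ms frames, 10 ms hop at 16 kHz), and noise elimination and endpoint detection are omitted.

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasis, framing, and windowing of a digitized voice signal."""
    # Pre-emphasis: s'[t] = s[t] - alpha * s[t-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Split into overlapping frames
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # Apply a Hamming window to each frame before feature extraction
    return frames * np.hamming(frame_len)
```

Each windowed frame would then be passed to the feature extraction unit, e.g. for MFCC computation.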
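The feature-matching step in 6) can be illustrated with a toy nearest-pattern-class search. The claim only requires that "a similarity" be computed; cosine similarity is an assumed choice here, and the dictionary-of-labels database layout is hypothetical.

```python
import numpy as np

def best_match(test_vec, pattern_db):
    """Return the label of the pattern class most similar to test_vec.
    pattern_db maps a label to the list of feature vectors enrolled
    for that person; the best per-class score is kept."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    scores = {label: max(cos(test_vec, v) for v in vecs)
              for label, vecs in pattern_db.items()}
    return max(scores, key=scores.get)
```

In the claim's example, a test vector closest to the enrolled features of Zhang San would yield the label used in the report "This is the voice of Zhang San".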
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711253194.6A CN107808659A (en) | 2017-12-02 | 2017-12-02 | Intelligent sound signal type recognition system device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711253194.6A CN107808659A (en) | 2017-12-02 | 2017-12-02 | Intelligent sound signal type recognition system device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107808659A true CN107808659A (en) | 2018-03-16 |
Family
ID=61589300
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711253194.6A Pending CN107808659A (en) | 2017-12-02 | 2017-12-02 | Intelligent sound signal type recognition system device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107808659A (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108520752A (en) * | 2018-04-25 | 2018-09-11 | 西北工业大学 | A kind of method for recognizing sound-groove and device |
CN108564954A (en) * | 2018-03-19 | 2018-09-21 | 平安科技(深圳)有限公司 | Deep neural network model, electronic device, auth method and storage medium |
CN108597521A (en) * | 2018-05-04 | 2018-09-28 | 徐涌 | Audio role divides interactive system, method, terminal and the medium with identification word |
CN108877823A (en) * | 2018-07-27 | 2018-11-23 | 三星电子(中国)研发中心 | Sound enhancement method and device |
CN109448726A (en) * | 2019-01-14 | 2019-03-08 | 李庆湧 | A kind of method of adjustment and system of voice control accuracy rate |
CN109611703A (en) * | 2018-10-19 | 2019-04-12 | 宁波市鄞州利帆灯饰有限公司 | A kind of LED light being easily installed |
CN109714491A (en) * | 2019-02-26 | 2019-05-03 | 上海凯岸信息科技有限公司 | Intelligent sound outgoing call detection system based on voice mail |
CN109785855A (en) * | 2019-01-31 | 2019-05-21 | 秒针信息技术有限公司 | Method of speech processing and device, storage medium, processor |
CN109801619A (en) * | 2019-02-13 | 2019-05-24 | 安徽大尺度网络传媒有限公司 | A kind of across language voice identification method for transformation of intelligence |
CN109859763A (en) * | 2019-02-13 | 2019-06-07 | 安徽大尺度网络传媒有限公司 | A kind of intelligent sound signal type recognition system |
CN109936814A (en) * | 2019-01-16 | 2019-06-25 | 深圳市北斗智能科技有限公司 | A kind of intercommunication terminal, speech talkback coordinated dispatching method and its system |
CN110033785A (en) * | 2019-03-27 | 2019-07-19 | 深圳市中电数通智慧安全科技股份有限公司 | A kind of calling for help recognition methods, device, readable storage medium storing program for executing and terminal device |
CN110060717A (en) * | 2019-01-02 | 2019-07-26 | 孙剑 | A kind of law enforcement equipment laws for criterion speech French play system |
CN110289016A (en) * | 2019-06-20 | 2019-09-27 | 深圳追一科技有限公司 | A kind of voice quality detecting method, device and electronic equipment based on actual conversation |
CN111314451A (en) * | 2020-02-07 | 2020-06-19 | 普强时代(珠海横琴)信息技术有限公司 | Language processing system based on cloud computing application |
CN111475206A (en) * | 2019-01-04 | 2020-07-31 | 优奈柯恩(北京)科技有限公司 | Method and apparatus for waking up wearable device |
CN111603191A (en) * | 2020-05-29 | 2020-09-01 | 上海联影医疗科技有限公司 | Voice noise reduction method and device in medical scanning and computer equipment |
CN111674360A (en) * | 2019-01-31 | 2020-09-18 | 青岛科技大学 | Method for establishing distinguishing sample model in vehicle tracking system based on block chain |
CN111989742A (en) * | 2018-04-13 | 2020-11-24 | 三菱电机株式会社 | Speech recognition system and method for using speech recognition system |
CN113572492A (en) * | 2021-06-23 | 2021-10-29 | 力声通信股份有限公司 | Novel communication equipment prevents falling digital intercom |
Citations (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11265197A (en) * | 1997-12-13 | 1999-09-28 | Hyundai Electronics Ind Co Ltd | Voice recognizing method utilizing variable input neural network |
US6026358A (en) * | 1994-12-22 | 2000-02-15 | Justsystem Corporation | Neural network, a method of learning of a neural network and phoneme recognition apparatus utilizing a neural network |
CN1662956A (en) * | 2002-06-19 | 2005-08-31 | 皇家飞利浦电子股份有限公司 | Mega speaker identification (ID) system and corresponding methods therefor |
CN1941080A (en) * | 2005-09-26 | 2007-04-04 | 吴田平 | Soundwave discriminating unlocking module and unlocking method for interactive device at gate of building |
CN101419799A (en) * | 2008-11-25 | 2009-04-29 | 浙江大学 | Speaker identification method based mixed t model |
US20100057453A1 (en) * | 2006-11-16 | 2010-03-04 | International Business Machines Corporation | Voice activity detection system and method |
CN103236260A (en) * | 2013-03-29 | 2013-08-07 | 京东方科技集团股份有限公司 | Voice recognition system |
CN103456301A (en) * | 2012-05-28 | 2013-12-18 | 中兴通讯股份有限公司 | Ambient sound based scene recognition method and device and mobile terminal |
CN103619021A (en) * | 2013-12-10 | 2014-03-05 | 天津工业大学 | Neural network-based intrusion detection algorithm for wireless sensor network |
JP2014048534A (en) * | 2012-08-31 | 2014-03-17 | Sogo Keibi Hosho Co Ltd | Speaker recognition device, speaker recognition method, and speaker recognition program |
US20140195236A1 (en) * | 2013-01-10 | 2014-07-10 | Sensory, Incorporated | Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination |
CN104008751A (en) * | 2014-06-18 | 2014-08-27 | 周婷婷 | Speaker recognition method based on BP neural network |
US20160260428A1 (en) * | 2013-11-27 | 2016-09-08 | National Institute Of Information And Communications Technology | Statistical acoustic model adaptation method, acoustic model learning method suitable for statistical acoustic model adaptation, storage medium storing parameters for building deep neural network, and computer program for adapting statistical acoustic model |
CN106128465A (en) * | 2016-06-23 | 2016-11-16 | 成都启英泰伦科技有限公司 | A kind of Voiceprint Recognition System and method |
CN106227038A (en) * | 2016-07-29 | 2016-12-14 | 中国人民解放军信息工程大学 | Grain drying tower intelligent control method based on neutral net and fuzzy control |
CN106779053A (en) * | 2016-12-15 | 2017-05-31 | 福州瑞芯微电子股份有限公司 | The knowledge point of a kind of allowed for influencing factors and neutral net is known the real situation method |
CN106782603A (en) * | 2016-12-22 | 2017-05-31 | 上海语知义信息技术有限公司 | Intelligent sound evaluating method and system |
CN106875943A (en) * | 2017-01-22 | 2017-06-20 | 上海云信留客信息科技有限公司 | A kind of speech recognition system for big data analysis |
US20170178666A1 (en) * | 2015-12-21 | 2017-06-22 | Microsoft Technology Licensing, Llc | Multi-speaker speech separation |
US20170270919A1 (en) * | 2016-03-21 | 2017-09-21 | Amazon Technologies, Inc. | Anchored speech detection and speech recognition |
CN112541533A (en) * | 2020-12-07 | 2021-03-23 | 阜阳师范大学 | Modified vehicle identification method based on neural network and feature fusion |
-
2017
- 2017-12-02 CN CN201711253194.6A patent/CN107808659A/en active Pending
Patent Citations (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6026358A (en) * | 1994-12-22 | 2000-02-15 | Justsystem Corporation | Neural network, a method of learning of a neural network and phoneme recognition apparatus utilizing a neural network |
JPH11265197A (en) * | 1997-12-13 | 1999-09-28 | Hyundai Electronics Ind Co Ltd | Voice recognizing method utilizing variable input neural network |
CN1662956A (en) * | 2002-06-19 | 2005-08-31 | 皇家飞利浦电子股份有限公司 | Mega speaker identification (ID) system and corresponding methods therefor |
CN1941080A (en) * | 2005-09-26 | 2007-04-04 | 吴田平 | Soundwave discriminating unlocking module and unlocking method for interactive device at gate of building |
US20100057453A1 (en) * | 2006-11-16 | 2010-03-04 | International Business Machines Corporation | Voice activity detection system and method |
CN101419799A (en) * | 2008-11-25 | 2009-04-29 | 浙江大学 | Speaker identification method based mixed t model |
CN103456301A (en) * | 2012-05-28 | 2013-12-18 | 中兴通讯股份有限公司 | Ambient sound based scene recognition method and device and mobile terminal |
JP2014048534A (en) * | 2012-08-31 | 2014-03-17 | Sogo Keibi Hosho Co Ltd | Speaker recognition device, speaker recognition method, and speaker recognition program |
US20140195236A1 (en) * | 2013-01-10 | 2014-07-10 | Sensory, Incorporated | Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination |
CN103236260A (en) * | 2013-03-29 | 2013-08-07 | 京东方科技集团股份有限公司 | Voice recognition system |
US20160260428A1 (en) * | 2013-11-27 | 2016-09-08 | National Institute Of Information And Communications Technology | Statistical acoustic model adaptation method, acoustic model learning method suitable for statistical acoustic model adaptation, storage medium storing parameters for building deep neural network, and computer program for adapting statistical acoustic model |
CN103619021A (en) * | 2013-12-10 | 2014-03-05 | 天津工业大学 | Neural network-based intrusion detection algorithm for wireless sensor network |
CN104008751A (en) * | 2014-06-18 | 2014-08-27 | 周婷婷 | Speaker recognition method based on BP neural network |
US20170178666A1 (en) * | 2015-12-21 | 2017-06-22 | Microsoft Technology Licensing, Llc | Multi-speaker speech separation |
US20170270919A1 (en) * | 2016-03-21 | 2017-09-21 | Amazon Technologies, Inc. | Anchored speech detection and speech recognition |
CN106128465A (en) * | 2016-06-23 | 2016-11-16 | 成都启英泰伦科技有限公司 | A kind of Voiceprint Recognition System and method |
CN106227038A (en) * | 2016-07-29 | 2016-12-14 | 中国人民解放军信息工程大学 | Grain drying tower intelligent control method based on neutral net and fuzzy control |
CN106779053A (en) * | 2016-12-15 | 2017-05-31 | 福州瑞芯微电子股份有限公司 | The knowledge point of a kind of allowed for influencing factors and neutral net is known the real situation method |
CN106782603A (en) * | 2016-12-22 | 2017-05-31 | 上海语知义信息技术有限公司 | Intelligent sound evaluating method and system |
CN106875943A (en) * | 2017-01-22 | 2017-06-20 | 上海云信留客信息科技有限公司 | A kind of speech recognition system for big data analysis |
CN112541533A (en) * | 2020-12-07 | 2021-03-23 | 阜阳师范大学 | Modified vehicle identification method based on neural network and feature fusion |
Non-Patent Citations (4)
Title |
---|
Liu Yongjun et al.: "Research on an Intelligent Grain Control System Based on a Neural Network Algorithm", Computer & Digital Engineering, vol. 44, no. 07, pages 1271-1276 *
Zeng Xiangyang et al.: "Fundamentals of Acoustic Signal Processing", vol. 1, 30 September 2015, Northwestern Polytechnical University Press, pages 160-163 *
Wang Xiaochuan et al.: "Analysis of 43 Cases of MATLAB Neural Networks", vol. 1, 31 August 2013, Beihang University Press, pages 8-10 *
Zhao Li: "Speech Signal Processing", vol. 1, 31 March 2003, China Machine Press, pages 141-145 *
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108564954A (en) * | 2018-03-19 | 2018-09-21 | 平安科技(深圳)有限公司 | Deep neural network model, electronic device, auth method and storage medium |
CN108564954B (en) * | 2018-03-19 | 2020-01-10 | 平安科技(深圳)有限公司 | Deep neural network model, electronic device, identity verification method, and storage medium |
CN111989742A (en) * | 2018-04-13 | 2020-11-24 | 三菱电机株式会社 | Speech recognition system and method for using speech recognition system |
CN108520752B (en) * | 2018-04-25 | 2021-03-12 | 西北工业大学 | Voiceprint recognition method and device |
CN108520752A (en) * | 2018-04-25 | 2018-09-11 | 西北工业大学 | A kind of method for recognizing sound-groove and device |
CN108597521A (en) * | 2018-05-04 | 2018-09-28 | 徐涌 | Audio role divides interactive system, method, terminal and the medium with identification word |
CN108877823A (en) * | 2018-07-27 | 2018-11-23 | 三星电子(中国)研发中心 | Sound enhancement method and device |
CN109611703A (en) * | 2018-10-19 | 2019-04-12 | 宁波市鄞州利帆灯饰有限公司 | A kind of LED light being easily installed |
CN110060717A (en) * | 2019-01-02 | 2019-07-26 | 孙剑 | A kind of law enforcement equipment laws for criterion speech French play system |
CN111475206B (en) * | 2019-01-04 | 2023-04-11 | 优奈柯恩(北京)科技有限公司 | Method and apparatus for waking up wearable device |
CN111475206A (en) * | 2019-01-04 | 2020-07-31 | 优奈柯恩(北京)科技有限公司 | Method and apparatus for waking up wearable device |
CN109448726A (en) * | 2019-01-14 | 2019-03-08 | 李庆湧 | A kind of method of adjustment and system of voice control accuracy rate |
CN109936814A (en) * | 2019-01-16 | 2019-06-25 | 深圳市北斗智能科技有限公司 | A kind of intercommunication terminal, speech talkback coordinated dispatching method and its system |
CN111674360A (en) * | 2019-01-31 | 2020-09-18 | 青岛科技大学 | Method for establishing distinguishing sample model in vehicle tracking system based on block chain |
CN109785855A (en) * | 2019-01-31 | 2019-05-21 | 秒针信息技术有限公司 | Method of speech processing and device, storage medium, processor |
CN109785855B (en) * | 2019-01-31 | 2022-01-28 | 秒针信息技术有限公司 | Voice processing method and device, storage medium and processor |
CN109859763A (en) * | 2019-02-13 | 2019-06-07 | 安徽大尺度网络传媒有限公司 | A kind of intelligent sound signal type recognition system |
CN109801619A (en) * | 2019-02-13 | 2019-05-24 | 安徽大尺度网络传媒有限公司 | A kind of across language voice identification method for transformation of intelligence |
CN109714491A (en) * | 2019-02-26 | 2019-05-03 | 上海凯岸信息科技有限公司 | Intelligent sound outgoing call detection system based on voice mail |
CN110033785A (en) * | 2019-03-27 | 2019-07-19 | 深圳市中电数通智慧安全科技股份有限公司 | A kind of calling for help recognition methods, device, readable storage medium storing program for executing and terminal device |
CN110289016A (en) * | 2019-06-20 | 2019-09-27 | 深圳追一科技有限公司 | A kind of voice quality detecting method, device and electronic equipment based on actual conversation |
CN111314451A (en) * | 2020-02-07 | 2020-06-19 | 普强时代(珠海横琴)信息技术有限公司 | Language processing system based on cloud computing application |
CN111603191A (en) * | 2020-05-29 | 2020-09-01 | 上海联影医疗科技有限公司 | Voice noise reduction method and device in medical scanning and computer equipment |
CN111603191B (en) * | 2020-05-29 | 2023-10-20 | 上海联影医疗科技股份有限公司 | Speech noise reduction method and device in medical scanning and computer equipment |
CN113572492A (en) * | 2021-06-23 | 2021-10-29 | 力声通信股份有限公司 | Novel communication equipment prevents falling digital intercom |
CN113572492B (en) * | 2021-06-23 | 2022-08-16 | 力声通信股份有限公司 | Communication equipment prevents falling digital intercom |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107808659A (en) | Intelligent sound signal type recognition system device | |
CN109559736B (en) | Automatic dubbing method for movie actors based on confrontation network | |
CN105374356B (en) | Audio recognition method, speech assessment method, speech recognition system and speech assessment system | |
CN107221320A (en) | Train method, device, equipment and the computer-readable storage medium of acoustic feature extraction model | |
CN107680582A (en) | Acoustic training model method, audio recognition method, device, equipment and medium | |
CN107610707A (en) | A kind of method for recognizing sound-groove and device | |
CN106504768B (en) | Phone testing audio frequency classification method and device based on artificial intelligence | |
CN108305615A (en) | A kind of object identifying method and its equipment, storage medium, terminal | |
CN110838286A (en) | Model training method, language identification method, device and equipment | |
CN108281137A (en) | A kind of universal phonetic under whole tone element frame wakes up recognition methods and system | |
CN108364662B (en) | Voice emotion recognition method and system based on paired identification tasks | |
CN107767869A (en) | Method and apparatus for providing voice service | |
CN110299135A (en) | Intelligent sound signal mode automatic recognition system device | |
CN110428843A (en) | A kind of voice gender identification deep learning method | |
CN110610709A (en) | Identity distinguishing method based on voiceprint recognition | |
CN105679313A (en) | Audio recognition alarm system and method | |
CN104903954A (en) | Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination | |
CN112259104B (en) | Training device for voiceprint recognition model | |
CN110047506B (en) | Key audio detection method based on convolutional neural network and multi-core learning SVM | |
CN109271533A (en) | A kind of multimedia document retrieval method | |
CN108876951A (en) | A kind of teaching Work attendance method based on voice recognition | |
CN109473119A (en) | A kind of acoustic target event-monitoring method | |
CN108806694A (en) | A kind of teaching Work attendance method based on voice recognition | |
CN107507625A (en) | Sound source distance determines method and device | |
CN103811000A (en) | Voice recognition system and voice recognition method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |