CN107808659A - Intelligent sound signal type recognition system device - Google Patents


Info

Publication number
CN107808659A
CN107808659A (application CN201711253194.6A)
Authority
CN
China
Prior art keywords
voice
signal
typing
sound
present
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711253194.6A
Other languages
Chinese (zh)
Inventor
宫文峰
张泽辉
刘志勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201711253194.6A priority Critical patent/CN107808659A/en
Publication of CN107808659A publication Critical patent/CN107808659A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/285Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An intelligent sound signal type recognition system device comprises a frame 10 provided with a cavity. Mounted within the frame 10 are a voice acquisition module 1, a sound identification module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a RAM card 32, a loudspeaker 35 and a power supply 9. The voice acquisition module 1 comprises a microphone 11, a wireless intercom 12 and a fixed sound recorder 13; the sound identification module 2 comprises a voice input unit 20, a voice pre-processing unit 21, a speech feature extraction unit 22, and a feature matching and classification unit 23. Voice signals are collected by the voice acquisition module 1, the collected signals are processed by the sound identification module 2, data signals are saved by the memory 33, and the human-machine interaction process and the output of results are displayed on the display screen 8. The device therefore makes speech-signal recognition more convenient.

Description

Intelligent sound signal type recognition system device
Technical field
The invention discloses an intelligent sound signal type recognition system device, belonging to the technical field of smart electronic products; specifically, it is an intelligent voice-signal pattern recognition system device integrating a voice acquisition module, a sound identification module, a control system and a loudspeaker.
Background technology
In daily life there are all kinds of sound signals: the speech exchanged between people, the sounds made by running machinery, vehicle horns, music being played, and so on. Sound signals fill almost the whole living environment, and it is sometimes desirable to know accurately which object in a group of sound signals produced a given sound. For common sounds, people can often tell which object produced them; but when several objects sound at once, especially several similar objects sounding simultaneously, or when the environment is noisy, it is difficult to distinguish which sound was produced by which object. For example, in a recording of several people arguing, when many people speak it is difficult to tell by playback alone which words were said by which speaker. People therefore often need a device capable of identifying voices.
Before the present invention, some voice-recognition products already existed on the market, such as voice-input software. Most of them, however, recognize only the words or letters in speech, or perform simple one-to-one matching of a single voice. Some products can complete simple tasks, such as placing a call or searching, after a mobile phone recognizes the semantics of a spoken command. None of them can discriminate between voice characteristics: they cannot identify which person or object uttered similar speech or the same words. They are therefore inconvenient for flexible use.
Summary of the invention
To overcome the above technical shortcomings, the object of the present invention is to provide an intelligent sound signal type recognition system device that can conveniently record voice signals, extract their characteristic parameters, and perform intelligent pattern recognition, classification and extraction on unknown voices by comparison with existing signals.
To achieve the above object, the present invention adopts the following technical scheme. The device includes a frame 10 provided with a cavity. Mounted within the frame 10 are a voice acquisition module 1, a sound identification module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a RAM card 32, a loudspeaker 35 and a power supply 9. The voice acquisition module 1 comprises a microphone 11, a wireless intercom 12 and a fixed sound recorder 13; the sound identification module 2 comprises a voice input unit 20, a voice pre-processing unit 21, a speech feature extraction unit 22, and a feature matching and classification unit 23. Voice signals are collected by the voice acquisition module 1; the collected signals are processed by the sound identification module 2; data signals are saved by the memory 33; the human-machine interaction process and the output of results are displayed on the display screen 8. The loudspeaker 35 is arranged to give voice prompts for the operating steps and to announce recognition results; the network module 31 is arranged to connect the device to an internet cloud platform; the central processing unit 3 is arranged to perform program control and data computation for the whole system; the wireless signal transceiver 4 is arranged to receive and transmit the radio signals produced by the wireless intercom 12, a smartphone and the network module 31 and to connect the device wirelessly to the internet; the RAM card 32 is arranged to read recorded external voice data into the database of the device.
In the present invention, the voice input unit 20 provides two modes, a "voice enrollment mode" and a "voice test mode". Voice can be input through any one of the microphone 11, the wireless intercom 12, the fixed sound recorder 13 or a smartphone provided via the voice acquisition module 1. In "voice enrollment mode" the voice input unit 20 can record only one person or one object at a time, the recorded voice being an audio segment of 5 to 30 seconds. The invention uses a multi-mode enrollment strategy: the recorded voice may combine normal speech, singing, and high-, middle- or low-pitched voice. The display screen 8 shows the speech waveform and a progress bar in real time. After recording, the data must be labeled, and labeling is manual: for example, if Zhang San's voice has been collected, the remark "voice of Zhang San" is entered in the dialog box shown on the display screen 8 and saved, and the recorded voice is stored in the memory 33. In "voice test mode" the device collects test voice through one or more of the microphone 11, wireless intercom 12, fixed sound recorder 13 and smartphone together; test-voice collection runs in real time, with no limit on the number of people, objects or duration.
In the present invention, the voice input unit 20 is connected to the voice acquisition module 1; the microphone 11 is connected to the voice acquisition module 1 through an audio cable, and the wireless intercom 12 is connected to the voice acquisition module 1 by radio signal.
In the present invention, the voice acquisition module 1 can also take voice-signal input from a smartphone: the phone is paired with the voice acquisition module 1 via Bluetooth, infrared, WIFI or by scanning a QR code, after which voice enrollment can be performed. The phone is thus used as a wireless microphone, which is more convenient when many people record voices.
In the present invention, the voice pre-processing unit 21 converts the voice signal collected by the voice acquisition module 1 into an electrical signal, i.e. converts the analog signal into a digital signal, and then performs conventional signal processing, including ambient background-noise elimination, signal framing, filtering, pre-emphasis, windowing and endpoint detection.
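The conventional pre-processing chain described above can be sketched as follows — an illustrative, non-limiting NumPy example covering pre-emphasis, framing and windowing (noise elimination and endpoint detection are omitted, and the frame parameters are invented for the demonstration):

```python
import numpy as np

def preprocess(signal, fs, alpha=0.97, frame_ms=25, hop_ms=10):
    """Pre-emphasis, framing and Hamming windowing of a speech signal."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    frame_len = int(fs * frame_ms / 1000)   # samples per frame
    hop = int(fs * hop_ms / 1000)           # samples between frame starts
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # Hamming window reduces spectral leakage at the frame edges
    return frames * np.hamming(frame_len)

fs = 16000
t = np.arange(fs) / fs                      # 1 second of audio
sig = np.sin(2 * np.pi * 440 * t)           # toy test tone
frames = preprocess(sig, fs)
print(frames.shape)                         # (98, 400): 98 frames of 400 samples
```

With 25 ms frames and a 10 ms hop at 16 kHz, one second of audio yields 98 overlapping windowed frames, ready for per-frame feature extraction.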
In the present invention, the speech feature extraction unit 22 extracts from the original voice signal the main characteristic parameters reflecting the essence of the voice, forming a feature vector x_i = (x_i1, x_i2, …, x_ij, …, x_in)^T, where x_ij denotes the j-th speech feature value of the i-th object or person. The preferred feature-extraction method is the Mel-frequency cepstral coefficient (MFCC) method; acoustic features may also be obtained by the spectral-envelope method, LPC interpolation, LPC root finding, the Hilbert-transform method, and so on. The feature vectors obtained after extraction are automatically saved in the pattern-class database, all sound features of one object or person corresponding to one pattern class. After the voices of N people or objects have been enrolled, N pattern classes are obtained; if each pattern class has n characteristic parameters, an n-dimensional feature space is formed, and the labeled feature-signal set can be written as D = {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_N, y_N)}, where x_i ∈ X = R^n is the speech feature signal of the i-th enrolled object or person and y_i ∈ Y = {1, 2, …, N} denotes the i-th person or object. The labeled voice feature data form the pattern-class database and are stored in the memory 33 of the invention.
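As an illustrative sketch of cepstrum-type feature extraction (the invention prefers MFCC; this non-limiting NumPy example computes the simpler real cepstrum of each windowed frame instead, as a stand-in, and averages the per-frame coefficients into one feature vector; all sizes are invented for the demonstration):

```python
import numpy as np

def real_cepstrum(frame, n_coeffs=13):
    """Real cepstrum of one windowed frame: inverse DFT of log |spectrum|.
    The first few coefficients summarize the spectral envelope and can
    serve as a per-frame feature vector."""
    spectrum = np.abs(np.fft.rfft(frame))
    log_mag = np.log(spectrum + 1e-10)      # small offset avoids log(0)
    cep = np.fft.irfft(log_mag)
    return cep[:n_coeffs]

# Build one enrollment feature vector x_i by averaging per-frame cepstra
rng = np.random.default_rng(0)
frames = rng.standard_normal((98, 400))     # stand-in for windowed frames
features = np.mean([real_cepstrum(f) for f in frames], axis=0)
print(features.shape)                       # (13,)
```

Each enrolled person or object would contribute one such vector (or a set of them) to the pattern-class database described above.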
In the present invention, the feature matching and classification unit 23 uses an intelligent multi-class classifier whose learning algorithm is an improved neural-network classification algorithm. The enrolled and labeled voice-feature signal set is used as training data; the network model learns from the training data to obtain classification rules, completing the training of the classifier. The trained classifier is then used to intelligently classify and identify unknown test voice signals. After features are extracted from a test signal, the invention automatically performs feature matching: the characteristic parameters of the test voice signal are matched in real time against the enrolled and labeled sample voice characteristic parameters in the memory 33, the similarity between the test signal and every enrolled sample signal is computed, and the test signal is assigned to the pattern class of the sample signal with the highest similarity. Finally the invention outputs the recognition result, a report such as "this is XXX's voice". For example, if the invention has stored Zhang San's voice feature signal, then when Zhang San speaks or sings into the device it automatically determines that the test voice parameters are most similar to Zhang San's enrolled and labeled voice signal and, through recognition, automatically outputs "this is the voice of Zhang San".
In the present invention, the multi-class classifier uses a multi-layer artificial neural network structure: one end of the network is defined as the input layer, the other end as the output layer, and the part between them as the hidden layers. The input layer receives external input signals and passes them to all neurons of the hidden layer; after computing, the hidden layer passes its results to the output layer, which, after receiving the hidden-layer signals and computing, outputs the classification result, i.e. the recognition result. The preferred number of hidden layers in the invention is 1 to 200.
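The forward computation through such a network can be sketched as follows (an illustrative, non-limiting NumPy sketch with one hidden layer; the layer sizes and weights are invented for the demonstration):

```python
import numpy as np

def sigmoid(x):
    # Excitation function f(x) = (1 + e^-x)^-1, as preferred by the invention
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w_ih, a, w_ho, b):
    """One forward pass: input layer -> hidden layer -> output layer.
    w_ih: (n, l) input-to-hidden weights, a: (l,) hidden thresholds,
    w_ho: (l, m) hidden-to-output weights, b: (m,) output thresholds."""
    H = sigmoid(x @ w_ih - a)        # hidden-layer outputs H_j
    O = H @ w_ho - b                 # output-layer outputs O_k (linear)
    return H, O

rng = np.random.default_rng(1)
n, l, m = 13, 8, 3                   # feature dim, hidden nodes, classes
x = rng.standard_normal(n)
H, O = forward(x, rng.standard_normal((n, l)), np.zeros(l),
               rng.standard_normal((l, m)), np.zeros(m))
print(H.shape, O.shape)              # (8,) (3,)
```

The index of the largest output O_k is taken as the recognized pattern class.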
In the present invention, the training process of the improved artificial-neural-network classification algorithm comprises steps 1 to 7.
Step 1: Network initialization. The algorithm database is continuously updated as voice signals are enrolled. When the voice signals of N objects have been enrolled, N pattern classes are formed, giving a sample space (X, Y) whose i-th sample is (X_i, Y_i), where X_i is the set of feature vectors extracted from the i-th object and Y_i is the label of the i-th object. From the system's input-output sequence (X, Y), the number of input-layer nodes n, hidden-layer nodes l and output-layer nodes m are determined: n is given by the number of feature values produced by feature extraction, m by the number of stored speech pattern classes, and l by the empirical reference formula l = √(n + m) + a, where a ranges over 0–10 and is determined automatically by the model. The connection weights ω_ij between input-layer and hidden-layer neurons and ω_jk between hidden-layer and output-layer neurons are initialized, the hidden-layer thresholds a and output-layer thresholds b are initialized, and the learning rate η and the neuron excitation function are given.
Step 2: Compute the hidden-layer output. From the input vector X, the input-to-hidden connection weights ω_ij and the hidden-layer thresholds a_j, the hidden-layer output H is computed. The output of the j-th hidden node is H_j = f(Σ_{i=1}^{n} ω_ij·x_i − a_j), j = 1, 2, …, l, where l is the number of hidden nodes and f is the hidden-layer excitation function; many excitation functions exist, and the invention preferably uses f(x) = (1 + e^(−x))^(−1).
Step 3: Compute the output-layer output. From the hidden-layer output H, the hidden-to-output connection weights ω_jk and the output-layer thresholds b_k, the output O is computed. The output of the k-th output node is O_k = Σ_{j=1}^{l} H_j·ω_jk − b_k, k = 1, 2, …, m, where m is the number of output nodes, b_k is the threshold of the k-th output node and H_j is the output value of the j-th hidden node.
Step 4: Compute the prediction error. From the network prediction O and the expected output Y (the true value), the error of the k-th output node is e_k = Y_k − O_k, and the overall network prediction error is e = (1/2)·Σ_{k=1}^{m} e_k².
Step 5: Update the weights. The network connection weights ω_jk and ω_ij are updated from the overall prediction error: ω_jk ← ω_jk + η·H_j·E_k, j = 1, 2, …, l, k = 1, 2, …, m, where η is the learning rate and E_k denotes the sensitivity of the overall network error to output node k; and ω_ij ← ω_ij + η·H_j·(1 − H_j)·x_i·Σ_{k=1}^{m} ω_jk·E_k, i = 1, 2, …, n, j = 1, 2, …, l.
Step 6: Update the thresholds. The hidden-layer thresholds a_j and output-layer thresholds b_k are updated from the overall prediction error: a_j ← a_j + η·H_j·(1 − H_j)·Σ_{k=1}^{m} ω_jk·E_k, j = 1, 2, …, l; b_k ← b_k + η·E_k, k = 1, 2, …, m.
Step 7: Judge whether the algorithm iteration has converged; if not, return to step 2. The preferred minimum error of the invention is 0.001, at which the iteration terminates.
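The training loop of steps 1 to 7 can be sketched as follows — a minimal, non-limiting single-hidden-layer illustration in NumPy, not the patented implementation. The threshold updates here follow plain gradient descent on the squared error, so their signs differ from the formulas as stated above, and the toy training data are invented for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bp(X, Y, l=8, eta=0.1, epochs=2000, tol=1e-3):
    """Steps 1-7 for one hidden layer and a linear output layer."""
    n, m = X.shape[1], Y.shape[1]
    w_ih = rng.uniform(-0.5, 0.5, (n, l)); a = np.zeros(l)   # step 1: init
    w_ho = rng.uniform(-0.5, 0.5, (l, m)); b = np.zeros(m)
    for _ in range(epochs):
        err = 0.0
        for x, y in zip(X, Y):
            H = sigmoid(x @ w_ih - a)        # step 2: hidden output
            O = H @ w_ho - b                 # step 3: network output
            E = y - O                        # step 4: e_k = Y_k - O_k
            err += 0.5 * np.sum(E ** 2)
            g = H * (1 - H) * (w_ho @ E)     # backpropagated hidden term
            w_ho += eta * np.outer(H, E)     # step 5: weight updates
            w_ih += eta * np.outer(x, g)
            a -= eta * g                     # step 6: thresholds (subtracted
            b -= eta * E                     #   in steps 2-3, hence the sign)
        if err < tol:                        # step 7: convergence check
            break
    return w_ih, a, w_ho, b

# Toy two-class problem: class is the sign of the first feature
X = rng.standard_normal((40, 4))
Y = np.stack([(X[:, 0] > 0).astype(float), (X[:, 0] <= 0).astype(float)], axis=1)
w_ih, a, w_ho, b = train_bp(X, Y)
pred = (sigmoid(X @ w_ih - a) @ w_ho - b).argmax(axis=1)
acc = np.mean(pred == Y.argmax(axis=1))
print(acc)
```

On this separable toy set the loop fits the training data essentially perfectly; in the invention the inputs would be the enrolled voice feature vectors and the outputs the N pattern classes.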
In the present invention, the voice acquisition module 1 has a built-in voice acquisition card for collecting and processing the collected voice signals.
In the present invention, the fixed sound recorder 13 uses a wind-proof microphone.
In the present invention, the display screen 8 uses a backlit touch screen or LED display.
In the present invention, multiple fixed sound recorders 13 may be provided, arranged on the housing of the device, to strengthen voice recording.
The present invention stores enrolled and labeled voice signals long-term; every voice signal stored in the voice pattern-class database of the invention can be retrieved at any time for comparison and recognition against an unknown test voice.
To use the invention, the power switch 5 is first turned on; the system then runs automatically, and the display screen 8 lights up and shows the operating interface, where the user can select between the two functions, "voice enrollment mode" and "voice test mode".
(1) When voice enrollment is selected, the central processing unit 3 switches the voice input unit 20 into "voice enrollment mode", and the display screen 8 and loudspeaker 35 give a prompt such as "now in voice enrollment mode, please speak". Voice can be input through any one of the microphone 11, the wireless intercom 12, the fixed sound recorder 13 or a smartphone provided via the voice acquisition module 1. To ensure that the invention can accurately identify and quantify the voice features of the identified object, only one person or one object can be enrolled at a time in the "voice enrollment mode" stage. Because the sound-signal data produced by the same person when speaking and when singing deviate somewhat in their features, the invention uses a multi-mode enrollment strategy to improve recognition accuracy: the enrolled voice may combine normal speech, singing, high-, middle- or low-pitched voice and other states. The recording time is 5 to 30 seconds, and the display screen 8 shows the real-time waveform and a progress bar; an unsatisfactory recording can be deleted and re-enrolled. After recording, the data must be labeled manually: for example, if Zhang San's voice has been collected, the remark "voice of Zhang San" is entered in the dialog box shown on the display screen 8 and saved, and the recorded voice is stored in the memory 33 of the invention.
(2) After a voice signal is enrolled, the control system of the invention automatically sends the labeled voice signal to the voice pre-processing unit 21, which converts the voice signal collected by the voice acquisition module 1 into an electrical signal, converts the analog signal into a digital signal, and then performs conventional signal processing, including ambient background-noise elimination, signal framing, filtering, pre-emphasis, windowing and endpoint detection.
(3) The control system of the invention then automatically sends the pre-processed voice signal to the signal feature extraction unit 22, which extracts from it the characteristic parameters reflecting the essence of the voice, obtaining the feature vector x_i. The preferred extraction method is the Mel-frequency cepstral coefficient (MFCC) method; the spectral-envelope method, LPC interpolation, LPC root finding, the Hilbert-transform method, and so on may also be used. The extracted feature vectors are automatically saved in the pattern-class database, all sound features of one person corresponding to one pattern class. After the voices of N people have been enrolled, N pattern classes are obtained, each with n characteristic parameters, giving a database in which each person corresponds to one voice-signal pattern class; all data are stored in the memory 33 of the invention. This completes the voice enrollment mode.
(4) After voice enrollment, voice testing can be carried out. To test, it is only necessary to select "voice test mode" in the operating interface of the display screen 8; the central processing unit 3 switches the voice input unit 20 into "voice test mode", and the display screen 8 and loudspeaker 35 give a prompt such as "voice testing in progress…". No further operation is needed: the invention collects test voice in real time through one or more of the microphone 11, wireless intercom 12, fixed sound recorder 13 and smartphone together, without any limit on duration or number of people.
(5) The system automatically pre-processes the voice data collected under "voice test mode" and extracts its features: the collected test signal is converted into an electrical signal, and signal feature extraction is performed after conventional filtering, noise removal, windowing and endpoint detection.
(6) After features are extracted from the test signal, the invention automatically performs feature matching: the characteristic parameters of the test voice signal are matched in real time against the enrolled and labeled sample voice characteristic parameters in the memory 33, the similarity between the test signal and every enrolled original voice signal is computed, and the test signal is assigned to the pattern class with the highest similarity. Finally the invention outputs a report such as "this is XXX's voice". For example, if the invention has stored Zhang San's voice feature signal, then when Zhang San speaks or sings into the device it automatically outputs, through recognition, "this is the voice of Zhang San".
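This feature-matching step can be sketched as follows (an illustrative, non-limiting Python example; the names, feature vectors and the use of cosine similarity are invented for the demonstration, since a particular similarity measure is not fixed above):

```python
import numpy as np

def cosine_similarity(u, v):
    # Similarity in [-1, 1]; 1 means identically oriented feature vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def identify(test_vec, database):
    """Match a test feature vector against every enrolled sample and
    return the label of the most similar pattern class."""
    best = max(database, key=lambda name: cosine_similarity(test_vec, database[name]))
    return best, cosine_similarity(test_vec, database[best])

# Hypothetical enrolled pattern classes (vectors are purely illustrative)
database = {
    "Zhang San": np.array([0.9, 0.1, 0.3]),
    "Li Si":     np.array([0.2, 0.8, 0.5]),
}
test = np.array([0.85, 0.15, 0.28])          # close to Zhang San's features
name, score = identify(test, database)
print(f"this is the voice of {name}")        # prints "this is the voice of Zhang San"
```

The highest-similarity label drives the spoken and displayed report, mirroring the "this is XXX's voice" output described above.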
When the invention is tested in a public place, several objects may be speaking simultaneously in the test environment, so the collected voice signal is a broadband aliased signal. To prevent errors when extracting features from such signals, the strategy adopted by the invention is to use an intelligent algorithm: first match and identify the voice characteristic parameters of each person speaking alone and store them; then automatically screen and separate the voice signals of the simultaneous speech; and finally output the recognition result with a report such as "this is the combined voice of Zhang San, Li Si, Wang Wu…", prompting when some voice fails to be identified. The power-off key 6 is pressed to shut down the system.
The invention further provides that the system can output to the user a list of the recognition results in a multi-speaker environment, including how many people or objects were speaking at the scene under the test conditions, and can screen a recording of several people speaking simultaneously to separate out what each person said, filtering out the other voices and the ambient sound.
When a voice signal whose sample features the invention has not stored appears in the test voice, the invention automatically records the unknown voice-signal features and asks the user whether to label and store that object's voice signal.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of the present invention.
Fig. 2 is the system framework figure of the present invention.
Fig. 3 is the multi-layer artificial neural network schematic diagram of the present invention.
Fig. 4 is the flow chart of the improved neural-network classification algorithm for voice signals of the present invention.
Embodiment
Fig. 1 shows one embodiment of the present invention, described here with reference to Figs. 1 to 4. The device includes a frame 10 provided with a cavity. Mounted within the frame 10 are a voice acquisition module 1, a sound identification module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a RAM card 32, a loudspeaker 35 and a power supply 9. The voice acquisition module 1 comprises a microphone 11, a wireless intercom 12 and a fixed sound recorder 13; the sound identification module 2 comprises a voice input unit 20, a voice pre-processing unit 21, a speech feature extraction unit 22, and a feature matching and classification unit 23. Voice signals are collected by the voice acquisition module 1, the collected signals are processed by the sound identification module 2, and data signals are saved by the memory 33; the human-machine interaction process and the output of results are displayed on the display screen 8. The loudspeaker 35 is arranged to give voice prompts for the operating steps and to announce recognition results; the network module 31 is arranged to connect the device to an internet cloud platform; the central processing unit 3 is arranged to perform program control and data computation for the whole system; the wireless signal transceiver 4 is arranged to receive and transmit the radio signals produced by the wireless intercom 12, a smartphone and the network module 31 and to connect the device wirelessly to the internet; the RAM card 32 is arranged to read recorded external voice data into the database of the device.
In the present embodiment, the voice input unit 20 provides two modes, a "voice recording mode" and a "voice testing mode". Voice may be input through any one of the microphone 11, the wireless intercom 12, the fixed sound recorder 13 provided by the voice acquisition module 1, or a smart phone. In the "voice recording mode", the voice input unit 20 records only one person or one object at a time; the recorded voice is an audio segment of 5 to 30 seconds. The present invention adopts a multimodal voice recording strategy: the recorded voice may contain normal speech, singing, or a multimodal combination of high-, middle- and low-pitched voice. The display screen 8 shows the speech waveform and a progress bar in real time. After recording, the data must be labeled; labeling is done manually. For example, if the voice of Zhang San has been collected, the remark "voice of Zhang San" is entered in the dialog box shown on the display screen 8 and saved; the recorded voice is stored in the memory 33. In the "voice testing mode", the present invention collects test voice through one or more of the microphone 11, the wireless intercom 12, the fixed sound recorder 13 and a smart phone in the voice acquisition module 1 together; test voice is collected in real time, with no limitation on the number of people, objects or time.
In the present embodiment, the voice input unit 20 is connected with the voice acquisition module 1; the microphone 11 is connected to the voice acquisition module 1 through an audio cable, and the wireless intercom 12 is connected to the voice acquisition module 1 by radio signals.
In the present embodiment, the voice acquisition module 1 may also accept voice signal input from a smart phone. The phone is paired with the voice acquisition module 1 of the present invention via Bluetooth, infrared, WIFI or by scanning a QR code, so that voice recording is achieved with the phone acting as a wireless microphone, which is more convenient for multi-person voice input.
In the present embodiment, the voice pre-processing unit 21 converts the voice signal collected by the voice acquisition module 1 into an electrical signal, i.e. converts the analog signal into a digital signal, and then performs conventional signal processing, including ambient background noise elimination, signal framing, filtering, pre-emphasis, windowing and endpoint detection.
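The conventional pre-processing chain described above (pre-emphasis, framing, windowing) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 16 kHz sampling rate, 25 ms frame length (400 samples), 10 ms hop (160 samples) and pre-emphasis coefficient 0.97 are assumed values, and noise elimination and endpoint detection are omitted.

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasis, framing and Hamming windowing of a 1-D speech signal."""
    # Pre-emphasis: y[t] = x[t] - alpha * x[t-1], boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Split into overlapping frames
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # Apply a Hamming window to each frame to reduce spectral leakage
    return frames * np.hamming(frame_len)

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)          # one second at 16 kHz (stand-in audio)
frames = preprocess(x)
print(frames.shape)                     # (98, 400)
```

Each windowed frame would then be passed on to the feature extraction stage.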
In the present embodiment, the voice signal feature extraction unit 22 extracts the main characteristic parameters reflecting the essence of the voice from the original voice signal, forming a feature vector x_i = (x_{i1}, x_{i2}, …, x_{ij}, …, x_{in})^T, where x_{ij} denotes the j-th speech feature value of the i-th object or person. The feature extraction method preferably uses the Mel-frequency cepstral coefficient method (MFCC); acoustic features may also be obtained by the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform method, etc. The feature vectors obtained after extraction are automatically saved into the pattern class database; all sound features of one object or person correspond to one pattern class. After the voices of N people or objects have been recorded, N pattern classes are obtained; if each pattern class has n characteristic parameters, an n-dimensional feature space is formed, and the labeled feature signal set can be written as D = {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_N, y_N)}, where x_i ∈ χ = R^n denotes the speech feature signal of the i-th recorded object or person and y_i ∈ Y = {1, 2, …, N} is its label, N being the number of recorded people or objects. The labeled voice feature data form the pattern class database and are stored in the memory 33 of the present invention.
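As an illustration of the preferred MFCC method, a simplified, self-contained pipeline (windowed frames → power spectrum → mel filterbank → log → DCT-II) is sketched below. The constants (16 kHz sampling rate, 26 mel filters, 13 coefficients) are assumptions, not values given in the patent, and a production system would typically rely on an audio library instead of this hand-rolled version.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale."""
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = inv(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mfcc(frames, sr=16000, n_filters=26, n_coef=13):
    """Windowed frames -> power spectrum -> mel energies -> log -> DCT-II."""
    n_fft = frames.shape[1]
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    feats = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coef), 2 * n + 1) / (2 * n_filters))
    return feats @ dct.T

rng = np.random.default_rng(1)
frames = rng.standard_normal((98, 400)) * np.hamming(400)   # stand-in frames
print(mfcc(frames).shape)    # (98, 13): one 13-dim feature vector per frame
```

Averaging or pooling the per-frame coefficients would yield the single feature vector x_i stored in the pattern class database.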
In the present embodiment, the feature matching identification and classification unit 23 uses an intelligent multi-class classifier whose learning algorithm is the improved neural network classification algorithm. The recorded and labeled voice feature signal set serves as training data; the network model learns from the training data to obtain the classification rules, completing the training of the classifier. The trained classifier then performs intelligent classification and identification on unknown test voice signals. After features are extracted from a test signal, the present invention automatically performs feature matching: the characteristic parameters of the extracted test voice signal are matched in real time against the recorded and labeled sample voice characteristic parameters in the memory 33 of the present invention, the similarity between the test voice signal and every recorded sample voice signal is calculated, and the test voice signal is assigned to the pattern class of the sample signal with the highest similarity. Finally, the present invention outputs the recognition result, a report such as "this is the voice of XXX". For example, if the voice feature signal of Zhang San has been stored and Zhang San speaks or sings to the present invention, the present invention automatically determines that the test voice characteristic parameters are most similar to the recorded and labeled voice signal of Zhang San and, through recognition, automatically outputs "this is the voice of Zhang San".
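The matching step can be illustrated as follows. The patent does not specify the similarity measure, so cosine similarity is assumed here, and the three enrolled feature vectors and names are made-up toy data.

```python
import numpy as np

def identify(test_vec, samples, labels):
    """Return the enrolled label most similar to test_vec, plus its score.

    samples: (N, n) matrix, one enrolled feature vector per person/object.
    Cosine similarity stands in for the unspecified similarity measure.
    """
    a = samples / np.linalg.norm(samples, axis=1, keepdims=True)
    b = test_vec / np.linalg.norm(test_vec)
    sims = a @ b
    return labels[int(np.argmax(sims))], float(np.max(sims))

enrolled = np.array([[1.0, 0.0, 0.2],      # toy prototype for "Zhang San"
                     [0.1, 1.0, 0.0],      # toy prototype for "Li Si"
                     [0.0, 0.2, 1.0]])     # toy prototype for "Wang Wu"
names = ["Zhang San", "Li Si", "Wang Wu"]
who, score = identify(np.array([0.9, 0.1, 0.3]), enrolled, names)
print(f"this is the voice of {who}")       # this is the voice of Zhang San
```

The test vector is closest to the first enrolled sample, so the report names Zhang San.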
In the present embodiment, the multi-class classifier uses a multi-layer artificial neural network structure. One end of the network is defined as the input layer, the other end as the output layer, and the part between the input layer and the output layer as the hidden layer. The input layer receives external input signals and passes them to all neurons of the hidden layer; after the hidden layer computes, the result is passed to the output layer, which, after receiving the signals from the hidden layer and computing, outputs the classification result, i.e. the recognition result. The preferred number of hidden layers of the present invention is 1 to 200.
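A single forward pass through such a network, with one hidden layer and the preferred excitation f(x) = (1 + e^−x)^−1, might be sketched as below; the node counts and random weights are illustrative assumptions only.

```python
import numpy as np

sig = lambda z: 1.0 / (1.0 + np.exp(-z))   # preferred excitation f(x) = (1+e^-x)^-1

n, l, m = 13, 8, 3           # input, hidden, output node counts (illustrative)
rng = np.random.default_rng(2)
W1, a = rng.normal(size=(n, l)), np.zeros(l)   # input->hidden weights, thresholds
W2, b = rng.normal(size=(l, m)), np.zeros(m)   # hidden->output weights, thresholds

x = rng.normal(size=n)       # one feature vector (e.g. 13 MFCC values)
H = sig(x @ W1 - a)          # hidden output H_j = f(sum_i w_ij x_i - a_j)
O = H @ W2 - b               # output layer O_k = sum_j H_j w_jk - b_k
print(int(np.argmax(O)))     # index of the winning pattern class
```

The index of the largest output node is taken as the recognized pattern class.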
In the present embodiment, the training process of the improved artificial neural network classification algorithm is as follows:
Step 1: Network initialization. As voice signals are recorded, the algorithm database is continuously updated; when the voice signals of N objects have been recorded, N pattern classes are formed and the sample space (X, Y) is obtained, where the i-th sample group is (X_i, Y_i), X_i being the set of feature vectors extracted from the i-th object and Y_i the label of the i-th object. According to the input-output sequence (X, Y) of the system, the number of input layer nodes n, hidden layer nodes l and output layer nodes m is determined: n is given by the number of feature values obtained in feature extraction, m by the number of stored speech pattern classes, and a reference value for l is l = √(n + m) + a, where a ranges from 0 to 10 and is determined automatically by the model. The connection weights ω_ij between the input layer and hidden layer neurons and ω_jk between the hidden layer and output layer neurons are initialized, the hidden layer thresholds a and output layer thresholds b are initialized, and the learning rate η and the neuron excitation function are given.
Step 2: Compute the hidden layer output. From the input vector X, the connection weights ω_ij between the input and hidden layer neurons, and the hidden layer thresholds a, the hidden layer output H is computed. The output of the j-th hidden node is H_j = f(Σ_{i=1}^{n} ω_ij·x_i − a_j), j = 1, 2, …, l, where l is the number of hidden layer nodes and f is the hidden layer excitation function, for which many choices exist; the present invention preferably uses f(x) = (1 + e^{−x})^{−1}.
Step 3: Compute the output layer output. From the hidden layer output H, the connection weights ω_jk between the hidden and output layer neurons, and the output layer thresholds b, the output O is computed. The output of the k-th output node is O_k = Σ_{j=1}^{l} H_j·ω_jk − b_k, k = 1, 2, …, m, where m is the number of output layer nodes, b_k is the threshold of the k-th output node, and H_j is the output of the j-th hidden node.
Step 4: Compute the prediction error. From the network output O and the desired output Y (the true values), the total network prediction error e = Σ_{k=1}^{m} e_k is computed, where e_k = ½(Y_k − O_k)² is the error produced by the k-th output node.
Step 5: Update the weights. The network connection weights ω_jk and ω_ij are updated according to the total prediction error e: ω_jk⁺ = ω_jk + η·H_j·E_k, j = 1, 2, …, l, k = 1, 2, …, m, where η is the learning rate and E_k = Y_k − O_k is the sensitivity of the total network error to output node k; ω_ij⁺ = ω_ij + η·H_j·(1 − H_j)·x_i·Σ_{k=1}^{m} ω_jk·E_k, i = 1, 2, …, n, j = 1, 2, …, l.
Step 6: Update the thresholds. The hidden layer thresholds a and the output layer thresholds b are updated according to the total prediction error e: a_j⁺ = a_j + η·H_j·(1 − H_j)·Σ_{k=1}^{m} ω_jk·E_k, j = 1, 2, …, l; b_k⁺ = b_k + η·E_k, k = 1, 2, …, m.
Step 7: Judge whether the algorithm iteration has converged; if not, return to Step 2. The present invention preferably terminates the iteration when the minimum error reaches 0.001.
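The steps above can be sketched as a small training loop. This is an illustration under stated assumptions, not the patent's code: the thresholds are folded in as additive biases, so every update follows the gradient of e = ½Σ(Y_k − O_k)² with the stated sigmoid hidden layer and linear output layer, and the three enrolled "speakers" with 4-dimensional feature prototypes are toy data.

```python
import numpy as np

def train_bp(X, Y, l=8, eta=0.1, epochs=5000, tol=1e-3, seed=0):
    """One-hidden-layer BP classifier following Steps 1-7 (biases for thresholds)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape[1], Y.shape[1]
    W1 = rng.normal(0, 0.5, (n, l)); b1 = np.zeros(l)    # Step 1: initialization
    W2 = rng.normal(0, 0.5, (l, m)); b2 = np.zeros(m)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        err = 0.0
        for x, y in zip(X, Y):                 # stochastic (per-sample) updates
            H = sig(x @ W1 + b1)               # Step 2: hidden layer output
            O = H @ W2 + b2                    # Step 3: output layer output
            E = y - O                          # Step 4: per-node error E_k
            err += 0.5 * np.sum(E ** 2)
            gH = (W2 @ E) * H * (1 - H)        # back-propagated sensitivity
            W2 += eta * np.outer(H, E); b2 += eta * E    # Steps 5-6: weights,
            W1 += eta * np.outer(x, gH); b1 += eta * gH  # thresholds/biases
        if err < tol:                          # Step 7: convergence test
            break
    return lambda x: int(np.argmax(sig(x @ W1 + b1) @ W2 + b2))

# Toy enrollment: 3 "speakers", one 4-dim feature prototype each, one-hot labels
X = np.array([[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0]], dtype=float)
Y = np.eye(3)
predict = train_bp(X, Y)
print([predict(x) for x in X])   # [0, 1, 2]
```

After training, an unknown test feature vector is classified by the output node with the largest activation, matching the feature-matching behavior described above.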
In the present embodiment, the voice acquisition module 1 has a built-in voice acquisition card for collecting and processing the gathered voice signals.
In the present embodiment, the fixed sound recorder 13 uses a wind-proof microphone.
In the present embodiment, the display screen 8 uses a touch screen or a backlit LED display.
In the present embodiment, a plurality of fixed sound recorders 13 may be provided and arranged on the housing of the present invention to strengthen voice pickup.
The present invention stores recorded and labeled voice signals for the long term; every voice signal stored in the voice pattern class database of the present invention can be retrieved at any time for comparison and identification against unknown test voice.
The process for using the present invention is as follows: first the power switch 5 is turned on, then the system runs automatically, the display screen 8 lights up and shows the operation interface, and the user can select between the two functions "voice recording mode" and "voice testing mode".
(1) When voice recording is selected, the central processing unit 3 controls the voice input unit 20 to enter the "voice recording mode", and the display screen 8 and the loudspeaker 35 simultaneously give a prompt such as "now in voice recording mode, please speak". Voice may be input through any one of the microphone 11, the wireless intercom 12, the fixed sound recorder 13 provided by the voice acquisition module 1, or a smart phone. To ensure that the present invention can accurately identify and quantify the voice features of the identified object, voice recording in the "voice recording mode" stage is performed for only one person or one object at a time. Because the sound signal data of the same person deviates between speaking and singing, the present invention uses a multimodal voice recording strategy to improve the accuracy of voice signal recognition: the recorded voice may contain multimodal combined sound under normal speech, singing, high/middle/low pitch and other states. The recording time is 5 to 30 seconds, and the display screen 8 shows the real-time voice waveform and a progress bar; an unsatisfactory recording can be deleted and recorded again. After recording, the data must be labeled manually; for example, if the voice of Zhang San has been collected, the remark "voice of Zhang San" is entered in the dialog box shown on the display screen 8 and saved, and the recorded voice is stored in the memory 33 of the present invention.
(2) After the voice signal is recorded, the control system of the present invention automatically sends the labeled voice signal to the voice pre-processing unit 21, which converts the voice signal collected by the voice acquisition module 1 into an electrical signal, i.e. converts the analog signal into a digital signal, and then performs conventional signal processing, including ambient background noise elimination, signal framing, filtering, pre-emphasis, windowing and endpoint detection.
(3) The control system of the present invention automatically sends the pre-processed voice signal to the voice signal feature extraction unit 22, which extracts the characteristic parameters reflecting the essence of the voice from the pre-processed voice signal to obtain the feature vector x_i. The feature extraction method preferably uses the Mel-frequency cepstral coefficient method (MFCC); the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform method, etc. may also be used. The extracted feature vectors are automatically saved to the pattern class database; all sound features of one person correspond to one pattern class. After the voices of N people are recorded, N pattern classes are obtained, each with n characteristic parameters, so that a database of per-person voice signal pattern classes is obtained. All data are stored in the memory 33 of the present invention; at this point, the voice recording mode is complete.
(4) After voice recording, voice testing may be carried out. To test, it is only necessary to select "voice testing mode" on the operation interface of the display screen 8; the central processing unit 3 controls the voice input unit 20 to enter the "voice testing mode", and the display screen 8 and the loudspeaker 35 simultaneously give a prompt such as "voice testing in progress…". The user need not do anything further: the present invention collects test voice through one or more of the microphone 11, the wireless intercom 12, the fixed sound recorder 13 and a smart phone in the voice acquisition module 1. Test voice collection is performed in real time, without any limitation on time or the number of people.
(5) For voice data collected in the "voice testing mode", the system of the present invention automatically pre-processes the test voice signal and extracts its features: the collected test signal is converted into an electrical signal; conventional filtering, noise removal, windowing and endpoint detection are carried out; and signal feature extraction is then performed.
(6) After features are extracted from the test signal, the present invention automatically performs feature matching: the characteristic parameters of the extracted test voice signal are matched in real time against the labeled sample voice characteristic parameters recorded in the memory 33 of the present invention, the similarity between the test voice signal and every recorded original voice signal is calculated, and the test voice signal is assigned to the pattern class with which it has the highest similarity. Finally the present invention outputs a report such as "this is the voice of XXX"; for example, if the voice feature signal of Zhang San has been stored and Zhang San speaks or sings to the present invention, the present invention automatically outputs "this is the voice of Zhang San" after recognition.
When the present invention is tested in public, several objects may speak simultaneously during the same period in the test environment, so the collected voice signal is a broadband aliased signal. To prevent errors when extracting features from such a signal, the strategy of the present invention is to use an intelligent algorithm: the voice characteristic parameters of each person speaking alone are first matched, identified and stored, and the system then automatically screens and separates the voice signal of the joint speech. Finally the recognition result is output with a report such as "this is the joint voice of Zhang San, Li Si and Wang Wu…", together with a prompt that some voice failed to be identified. The power-off key 6 is pressed to shut down the system.
In the present embodiment, the system device can also output to the user a list of recognition results in a multi-person conversation environment, including how many people or objects were speaking at the scene under the test environment; it can screen a recording of several people speaking simultaneously, identify and separate what each person said, and filter out the other voices and the ambient sound.
When a voice signal feature not stored by the present invention appears in the test voice signal, the present invention automatically records the unknown voice signal feature and asks the user whether to label and store the voice signal of that object.
In the technical field of intelligent sound signal type recognition system devices, all technical content that comprises a frame 10 provided with a cavity, in which are arranged a voice acquisition module 1, a sound identification module 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a memory 33, a network module 31, a RAM card 32, a loudspeaker 35 and a power supply 9, the voice acquisition module 1 comprising a microphone 11, a wireless intercom 12 and a fixed sound recorder 13, the sound identification module 2 comprising a voice input unit 20, a voice pre-processing unit 21, a voice signal feature extraction unit 22 and a feature matching identification and classification unit 23, wherein voice signals are gathered by the voice acquisition module 1, the collected signals are processed by the sound identification module 2, data signals are stored in the memory 33, the operating procedure of the human-machine interaction and the output of the results are visualized on the display screen 8, the loudspeaker 35 gives voice prompts for the operating steps and announces recognition results, the network module 31 connects the present invention to an internet cloud platform, the central processing unit 3 performs program control and data computation for the whole system device, the wireless signal transceiver 4 receives and transmits radio signals generated by the wireless intercom 12, a smart phone or the network module 31 and connects the present invention wirelessly to the internet, and the RAM card 32 reads recorded external voice data into the database of the present invention, falls within the scope of protection of the present invention. It should be noted that the scope of the present invention is not limited to the appearance described: the shape of the frame 10 of the invention may be
square, cylindrical, polygonal-prismatic, or similar to a Chinese cabbage, a watermelon, a stone or other forms, and any device whose shape differs but whose technical content is essentially identical to that of the present invention also falls within the scope of protection of the present invention. Likewise, obvious minor routine improvements or minor combinations made by those skilled in the art on the basis of the present invention, as long as their technical content is included within the technical content described herein, also fall within the scope of the present invention.

Claims (8)

  1. A kind of intelligent sound signal type recognition system device, characterized in that: it comprises a frame (10), the frame (10) being provided with a cavity; arranged within the frame (10) are a voice acquisition module (1), a sound identification module (2), a central processing unit (3), a wireless signal transceiver (4), a display screen (8), a memory (33), a network module (31), a RAM card (32), a loudspeaker (35) and a power supply (9); the voice acquisition module (1) comprises a microphone (11), a wireless intercom (12) and a fixed sound recorder (13); the sound identification module (2) comprises a voice input unit (20), a voice pre-processing unit (21), a voice signal feature extraction unit (22) and a feature matching identification and classification unit (23); voice signals are gathered by the voice acquisition module (1), the collected signals are processed by the sound identification module (2), and data signals are stored in the memory (33); the operating procedure of the human-machine interaction and the output of the results are visualized on the display screen (8); the loudspeaker (35) gives voice prompts for the operating steps and announces recognition results; the network module (31) connects the present invention to an internet cloud platform; the central processing unit (3) performs program control and data computation for the whole system device; the wireless signal transceiver (4) receives and transmits radio signals generated by the wireless intercom (12), a smart phone or the network module (31) and connects the present invention wirelessly to the internet; and the RAM card (32) reads recorded external voice data into the database of the present invention.
  2. The intelligent sound signal type recognition system device according to claim 1, characterized in that: the voice input unit (20) provides two modes, a "voice recording mode" and a "voice testing mode"; voice may be input through any one of the microphone (11), the wireless intercom (12), the fixed sound recorder (13) provided by the voice acquisition module (1), or a smart phone; in the "voice recording mode", the voice input unit (20) records only one person or one object at a time, the recorded voice being an audio segment of 5 to 30 seconds; the present invention uses a multimodal voice recording strategy, the recorded voice possibly containing normal speech, singing, or a multimodal combination of high-, middle- and low-pitched voice; the display (8) shows the speech waveform and a progress bar in real time; after recording, the data are labeled manually, the remark "voice of XXX" being entered in the dialog box shown on the display screen (8) and saved, and the recorded voice being stored in the memory (33); in the "voice testing mode", test voice is collected together through one or more of the microphone (11), the wireless intercom (12), the fixed sound recorder (13) and a smart phone in the voice acquisition module (1), the test voice being collected in real time without limitation on the number of people, objects or time; the smart phone is paired wirelessly with the voice acquisition module (1) via Bluetooth, infrared, WIFI or by scanning a QR code, so that voice recording is achieved with the phone acting as a wireless microphone for multi-person voice input.
  3. The intelligent sound signal type recognition system device according to claim 1, characterized in that: the voice pre-processing unit (21) converts the voice signal collected by the voice acquisition module (1) into an electrical signal, i.e. converts the analog signal into a digital signal, and then performs conventional signal processing, including ambient background noise elimination, signal framing, filtering, pre-emphasis, windowing and endpoint detection.
  4. The intelligent sound signal type recognition system device according to claim 1, characterized in that: the voice signal feature extraction unit (22) extracts the main characteristic parameters reflecting the essence of the voice from the original voice signal, forming a feature vector x_i = (x_{i1}, x_{i2}, …, x_{ij}, …, x_{in})^T, where x_{ij} denotes the j-th speech feature value of the i-th object or person; the feature extraction method uses the Mel-frequency cepstral coefficient method (MFCC), and acoustic features may also be obtained by the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform method, etc.; the feature vectors obtained after extraction are automatically saved into the pattern class database, all sound features of one object or person corresponding to one pattern class; after the voices of N people or objects have been recorded, N pattern classes are obtained; if each pattern class has n characteristic parameters, an n-dimensional feature space is formed, and the labeled feature signal set can be written as D = {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_N, y_N)}, where x_i ∈ χ = R^n denotes the speech feature signal of the i-th recorded object or person and y_i ∈ Y = {1, 2, …, N} is its label, N being the number of recorded people or objects; the labeled voice feature data form the pattern class database and are stored in the memory (33) of the present invention.
  5. The intelligent sound signal type recognition system device according to claim 1, characterized in that: the feature matching identification and classification unit (23) uses an intelligent multi-class classifier whose learning algorithm is the improved neural network classification algorithm; the recorded and labeled voice feature signal set serves as training data, the network model learns from the training data to obtain the classification rules, and the training of the classifier is completed; the trained classifier then performs intelligent classification and identification on unknown test voice signals; after features are extracted from a test signal, the present invention automatically performs feature matching, matching the characteristic parameters of the extracted test voice signal in real time against the recorded and labeled sample voice characteristic parameters in the memory (33) of the present invention, calculating the similarity between the test voice signal and every recorded sample voice signal, and assigning the test voice signal to the pattern class of the sample signal with the highest similarity; finally, the present invention outputs the recognition result, a report such as "this is the voice of XXX".
  6. The intelligent sound signal type recognition system device according to claim 1, characterized in that: the multi-class classifier uses a multi-layer artificial neural network structure, wherein one end of the network is defined as the input layer, the other end as the output layer, and the part between the input layer and the output layer as the hidden layer; the input layer receives external input signals and passes them to all neurons of the hidden layer; after the hidden layer computes, the result is passed to the output layer, which, after receiving the signals from the hidden layer and computing, outputs the classification result, i.e. the recognition result; the preferred number of hidden layers of the present invention is 1 to 200.
  7. The intelligent sound signal type recognition system device according to claim 1, characterized in that: the steps of the improved artificial neural network training are as follows:
    Step 1: Network initialization; as voice signals are recorded, the algorithm database is continuously updated; when the voice signals of N objects have been recorded, N pattern classes are formed and the sample space (X, Y) is obtained, where the i-th sample group is (X_i, Y_i), X_i being the set of feature vectors extracted from the i-th object and Y_i the label of the i-th object; according to the input-output sequence (X, Y) of the system, the number of input layer nodes n, hidden layer nodes l and output layer nodes m is determined, n being given by the number of feature values obtained in feature extraction, m by the number of stored speech pattern classes, and a reference value for l being l = √(n + m) + a, where a ranges from 0 to 10 and is determined automatically by the model; the connection weights ω_ij between the input layer and hidden layer neurons and ω_jk between the hidden layer and output layer neurons are initialized, the hidden layer thresholds a and output layer thresholds b are initialized, and the learning rate η and the neuron excitation function are given;
    Step 2: Compute the hidden layer output; from the input vector X, the connection weights ω_ij between the input and hidden layer neurons, and the hidden layer thresholds a, the hidden layer output H is computed; the output of the j-th hidden node is H_j = f(Σ_{i=1}^{n} ω_ij·x_i − a_j), j = 1, 2, …, l, where l is the number of hidden layer nodes and f is the hidden layer excitation function, for which many choices exist, the present invention preferably using f(x) = (1 + e^{−x})^{−1};
    Step 3: Compute the output layer output; from the hidden layer output H, the connection weights ω_jk between the hidden and output layer neurons, and the output layer thresholds b, the output O is computed; the output of the k-th output node is O_k = Σ_{j=1}^{l} H_j·ω_jk − b_k, k = 1, 2, …, m, where m is the number of output layer nodes, b_k is the threshold of the k-th output node, and H_j is the output of the j-th hidden node;
    Step 4: Compute the prediction error; from the network output O and the desired output Y (the true values), the total network prediction error e = Σ_{k=1}^{m} e_k is computed, where e_k = ½(Y_k − O_k)² is the error produced by the k-th output node;
    Step 5: Update the weights; the network connection weights ω_jk and ω_ij are updated according to the total prediction error e: ω_jk⁺ = ω_jk + η·H_j·E_k, j = 1, 2, …, l, k = 1, 2, …, m, where η is the learning rate and E_k = Y_k − O_k is the sensitivity of the total network error to output node k; ω_ij⁺ = ω_ij + η·H_j·(1 − H_j)·x_i·Σ_{k=1}^{m} ω_jk·E_k, i = 1, 2, …, n, j = 1, 2, …, l;
    Step 6: Update the thresholds; the hidden layer thresholds a and the output layer thresholds b are updated according to the total prediction error e: a_j⁺ = a_j + η·H_j·(1 − H_j)·Σ_{k=1}^{m} ω_jk·E_k, j = 1, 2, …, l; b_k⁺ = b_k + η·E_k, k = 1, 2, …, m;
    Step 7: Judge whether the algorithm iteration has converged; if not, return to Step 2; the present invention preferably terminates the iteration when the minimum error reaches 0.001.
  8. The intelligent sound signal type recognition system device according to any one of claims 1 to 7, characterized in that: the basic operating flow of the system is set as follows:
    1) The power switch (5) is turned on, then the system runs automatically, the display screen (8) lights up and shows the operation interface, and the user can select between the two functions "voice recording mode" and "voice testing mode"; when voice recording is selected, the central processing unit (3) controls the voice input unit (20) to enter the "voice recording mode", and the display screen (8) and the loudspeaker (35) simultaneously give a prompt such as "now in voice recording mode, please speak"; voice may be input through any one of the microphone (11), the wireless intercom (12), the fixed sound recorder (13) provided by the voice acquisition module (1), or a smart phone; to ensure that the present invention can accurately identify and quantify the voice features of the identified object, voice recording in the "voice recording mode" stage is performed for only one person or one object at a time; because the sound signal data of the same person deviates between speaking and singing, the present invention uses a multimodal voice recording strategy to improve the accuracy of voice signal recognition, i.e. the recorded voice may contain multimodal combined sound under normal speech, singing, high/medium/low pitch and other states; the recording time is 5 to 30 seconds; the display (8) shows the real-time voice waveform and a progress bar, and an unsatisfactory recording can be deleted and recorded again; after recording, the data are labeled manually, e.g. if the voice of Zhang San has been collected, the remark "voice of Zhang San" is entered in the dialog box shown on the display screen (8) and saved, and the recorded voice is stored in the memory (33) of the present invention;
    2) After voice entry, the control system of the present invention automatically sends the labeled voice signal to the voice pre-processing unit (21). The voice pre-processing unit (21) converts the voice signal collected by the voice acquisition module (1) into an electrical signal, converts the analog signal into a digital signal, and then performs conventional signal processing, including ambient background noise elimination, signal framing, filtering, pre-emphasis, windowing, and endpoint detection;
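The pre-processing chain of step 2) can be illustrated with a minimal numpy sketch covering pre-emphasis, framing, Hamming windowing, and a simple energy-based endpoint detector. The frame length, hop, pre-emphasis coefficient, and energy threshold are common defaults assumed here, not values from the patent, and noise elimination and filtering are omitted.

```python
import numpy as np

def preprocess(x, sr=16000, alpha=0.97, frame_ms=25, hop_ms=10, energy_ratio=0.02):
    """Pre-emphasis, framing, Hamming windowing, and energy-based
    endpoint detection on a digitised voice signal (illustrative defaults)."""
    x = np.asarray(x, dtype=float)
    # pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1]
    y = np.append(x[0], x[1:] - alpha * x[:-1])
    # split into overlapping frames
    flen, hop = sr * frame_ms // 1000, sr * hop_ms // 1000
    n = 1 + (len(y) - flen) // hop
    frames = np.stack([y[i * hop : i * hop + flen] for i in range(n)])
    # windowing reduces spectral leakage at frame edges
    frames *= np.hamming(flen)
    # endpoint detection: keep frames whose energy exceeds a fraction of the peak
    energy = (frames ** 2).sum(axis=1)
    voiced = energy > energy_ratio * energy.max()
    return frames[voiced]
```

Feeding a signal with silent lead-in and tail returns only the voiced frames, which is what the downstream feature extractor consumes.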
    3) The control system of the present invention automatically sends the pre-processed voice signal to the signal feature extraction unit (22), which extracts characteristic parameters reflecting the essence of the voice from the pre-processed signal to obtain a feature vector xi. Feature extraction uses the Mel-frequency cepstral coefficient (MFCC) method; the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform method, and the like may also be used to obtain acoustic features. The extracted feature vectors are automatically saved into the pattern-class library, all the sound features of one person corresponding to one pattern class, so that after N people's voices are entered, N pattern classes are obtained; if each pattern class has n characteristic parameters, a database is obtained in which each person corresponds to one voice-signal pattern class. All data are stored in the memory (33) of the present invention, at which point the voice-signal entry mode is complete;
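Step 3) names MFCC as the feature extraction method. A self-contained numpy sketch of a basic MFCC pipeline (power spectrum, triangular mel filterbank, log compression, DCT-II) follows; the FFT length, filterbank size, and coefficient count are conventional defaults, not values taken from the patent.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    """Minimal MFCC sketch: frame, window, power spectrum, mel filterbank,
    log, then DCT-II to decorrelate into cepstral coefficients."""
    frame_len, hop = int(0.025 * sr), int(0.010 * sr)
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # triangular filterbank equally spaced on the mel scale
    def hz2mel(f): return 2595 * np.log10(1 + f / 700)
    def mel2hz(m): return 700 * (10 ** (m / 2595) - 1)
    mels = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mels) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    feat = np.log(power @ fbank.T + 1e-10)
    # DCT-II keeps the first n_ceps decorrelated coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return feat @ dct.T  # shape: (n_frames, n_ceps)
```

Each row of the result is one frame's feature vector xi; stacking the rows of an enrollment recording gives the pattern class stored for that speaker.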
    4) After voice entry, voice testing may be carried out. To run a voice test, the user only needs to select "voice test mode" on the operation interface of the display screen (8); the central processing unit (3) controls the voice input unit (20) to enter "voice test mode", and the display screen (8) and loudspeaker (35) give a prompt such as "Voice test in progress...". The user need not perform any further operation: the present invention collects the test voice through one or more of the input tools in the voice acquisition module (1), namely the microphone (11), wireless interphone (12), fixed recorder (13), and smartphone. The test voice is collected in real time, with no limit on duration or on the number of collections;
    5) For voice data collected in "voice test mode", the present system device automatically pre-processes the test voice signal and extracts its features: the collected test signal is converted into an electrical signal, and signal feature extraction is carried out after conventional filtering, noise removal, windowing, and endpoint detection;
    6) After features are extracted from the test signal, the present invention automatically performs feature matching: the feature parameters extracted from the test voice signal are matched in real time against the labeled sample voice feature parameters entered in the memory (33) of the present invention, the similarity between the test voice signal and every entered original voice signal is calculated, and the test voice signal is assigned to the pattern class with which its similarity is highest. Finally the present invention outputs a report such as "This is XXX's voice"; for example, if the present invention has stored Zhang San's voice feature signal, then when Zhang San speaks or sings to the present invention, it automatically outputs "This is Zhang San's voice" after recognition;
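The matching of step 6) assigns the test signal to the pattern class of highest similarity. The sketch below uses cosine similarity between mean feature vectors as an illustrative stand-in, since the patent does not specify the similarity measure; the rejection threshold and the unknown-speaker fallback (which connects to step 7)) are likewise assumptions.

```python
import numpy as np

def identify(test_feats, pattern_classes, threshold=0.6):
    """Assign a test utterance to the most similar enrolled pattern class.

    test_feats: (n_frames, n_ceps) features of the test utterance.
    pattern_classes: dict label -> (n_frames_i, n_ceps) enrolled features.
    Returns (label, similarity); label is None when no class passes the
    threshold, i.e. the unknown-speaker case of step 7).
    """
    q = test_feats.mean(axis=0)  # summarise the utterance as its mean vector
    best_label, best_sim = None, -1.0
    for label, feats in pattern_classes.items():
        p = feats.mean(axis=0)
        sim = float(np.dot(q, p) / (np.linalg.norm(q) * np.linalg.norm(p) + 1e-10))
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim < threshold:
        return None, best_sim  # unknown voice: record it for optional labeling
    return best_label, best_sim
```

With two enrolled classes, a test utterance drawn near one class's features is assigned its label, while a dissimilar utterance falls below the threshold and is reported as unknown.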
    7) The system device can also output to the user a list of recognition results in a multi-speaker environment, including how many people or objects are speaking at the scene under the test environment, and can screen a recording of several people speaking at once, separate out and play back what each individual says while filtering out the other voices and the ambient sound. When the test voice signal contains a voice feature for which the present invention has stored no sample, the present invention automatically records the unknown voice feature and reminds the user to decide whether to label and store that object's voice signal.
CN201711253194.6A 2017-12-02 2017-12-02 Intelligent sound signal type recognition system device Pending CN107808659A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711253194.6A CN107808659A (en) 2017-12-02 2017-12-02 Intelligent sound signal type recognition system device


Publications (1)

Publication Number Publication Date
CN107808659A true CN107808659A (en) 2018-03-16

Family

ID=61589300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711253194.6A Pending CN107808659A (en) 2017-12-02 2017-12-02 Intelligent sound signal type recognition system device

Country Status (1)

Country Link
CN (1) CN107808659A (en)


Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11265197A (en) * 1997-12-13 1999-09-28 Hyundai Electronics Ind Co Ltd Voice recognizing method utilizing variable input neural network
US6026358A (en) * 1994-12-22 2000-02-15 Justsystem Corporation Neural network, a method of learning of a neural network and phoneme recognition apparatus utilizing a neural network
CN1662956A (en) * 2002-06-19 2005-08-31 皇家飞利浦电子股份有限公司 Mega speaker identification (ID) system and corresponding methods therefor
CN1941080A (en) * 2005-09-26 2007-04-04 吴田平 Soundwave discriminating unlocking module and unlocking method for interactive device at gate of building
CN101419799A (en) * 2008-11-25 2009-04-29 浙江大学 Speaker identification method based mixed t model
US20100057453A1 (en) * 2006-11-16 2010-03-04 International Business Machines Corporation Voice activity detection system and method
CN103236260A (en) * 2013-03-29 2013-08-07 京东方科技集团股份有限公司 Voice recognition system
CN103456301A (en) * 2012-05-28 2013-12-18 中兴通讯股份有限公司 Ambient sound based scene recognition method and device and mobile terminal
CN103619021A (en) * 2013-12-10 2014-03-05 天津工业大学 Neural network-based intrusion detection algorithm for wireless sensor network
JP2014048534A (en) * 2012-08-31 2014-03-17 Sogo Keibi Hosho Co Ltd Speaker recognition device, speaker recognition method, and speaker recognition program
US20140195236A1 (en) * 2013-01-10 2014-07-10 Sensory, Incorporated Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination
CN104008751A (en) * 2014-06-18 2014-08-27 周婷婷 Speaker recognition method based on BP neural network
US20160260428A1 (en) * 2013-11-27 2016-09-08 National Institute Of Information And Communications Technology Statistical acoustic model adaptation method, acoustic model learning method suitable for statistical acoustic model adaptation, storage medium storing parameters for building deep neural network, and computer program for adapting statistical acoustic model
CN106128465A (en) * 2016-06-23 2016-11-16 成都启英泰伦科技有限公司 A kind of Voiceprint Recognition System and method
CN106227038A (en) * 2016-07-29 2016-12-14 中国人民解放军信息工程大学 Grain drying tower intelligent control method based on neutral net and fuzzy control
CN106779053A (en) * 2016-12-15 2017-05-31 福州瑞芯微电子股份有限公司 The knowledge point of a kind of allowed for influencing factors and neutral net is known the real situation method
CN106782603A (en) * 2016-12-22 2017-05-31 上海语知义信息技术有限公司 Intelligent sound evaluating method and system
CN106875943A (en) * 2017-01-22 2017-06-20 上海云信留客信息科技有限公司 A kind of speech recognition system for big data analysis
US20170178666A1 (en) * 2015-12-21 2017-06-22 Microsoft Technology Licensing, Llc Multi-speaker speech separation
US20170270919A1 (en) * 2016-03-21 2017-09-21 Amazon Technologies, Inc. Anchored speech detection and speech recognition
CN112541533A (en) * 2020-12-07 2021-03-23 阜阳师范大学 Modified vehicle identification method based on neural network and feature fusion


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Liu Yongjun et al.: "Research on an Intelligent Grain Control System Based on a Neural Network Algorithm", Computer & Digital Engineering, vol. 44, no. 07, pp. 1271-1276 *
Zeng Xiangyang et al.: "Fundamentals of Acoustic Signal Processing", vol. 1, 30 September 2015, Northwestern Polytechnical University Press, pp. 160-163 *
Wang Xiaochuan et al.: "MATLAB Neural Networks: 43 Case Studies", vol. 1, 31 August 2013, Beihang University Press, pp. 8-10 *
Zhao Li: "Speech Signal Processing", vol. 1, 31 March 2003, China Machine Press, pp. 141-145 *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564954A (en) * 2018-03-19 2018-09-21 平安科技(深圳)有限公司 Deep neural network model, electronic device, auth method and storage medium
CN108564954B (en) * 2018-03-19 2020-01-10 平安科技(深圳)有限公司 Deep neural network model, electronic device, identity verification method, and storage medium
CN111989742A (en) * 2018-04-13 2020-11-24 三菱电机株式会社 Speech recognition system and method for using speech recognition system
CN108520752B (en) * 2018-04-25 2021-03-12 西北工业大学 Voiceprint recognition method and device
CN108520752A (en) * 2018-04-25 2018-09-11 西北工业大学 A kind of method for recognizing sound-groove and device
CN108597521A (en) * 2018-05-04 2018-09-28 徐涌 Audio role divides interactive system, method, terminal and the medium with identification word
CN108877823A (en) * 2018-07-27 2018-11-23 三星电子(中国)研发中心 Sound enhancement method and device
CN109611703A (en) * 2018-10-19 2019-04-12 宁波市鄞州利帆灯饰有限公司 A kind of LED light being easily installed
CN110060717A (en) * 2019-01-02 2019-07-26 孙剑 A kind of law enforcement equipment laws for criterion speech French play system
CN111475206B (en) * 2019-01-04 2023-04-11 优奈柯恩(北京)科技有限公司 Method and apparatus for waking up wearable device
CN111475206A (en) * 2019-01-04 2020-07-31 优奈柯恩(北京)科技有限公司 Method and apparatus for waking up wearable device
CN109448726A (en) * 2019-01-14 2019-03-08 李庆湧 A kind of method of adjustment and system of voice control accuracy rate
CN109936814A (en) * 2019-01-16 2019-06-25 深圳市北斗智能科技有限公司 A kind of intercommunication terminal, speech talkback coordinated dispatching method and its system
CN111674360A (en) * 2019-01-31 2020-09-18 青岛科技大学 Method for establishing distinguishing sample model in vehicle tracking system based on block chain
CN109785855A (en) * 2019-01-31 2019-05-21 秒针信息技术有限公司 Method of speech processing and device, storage medium, processor
CN109785855B (en) * 2019-01-31 2022-01-28 秒针信息技术有限公司 Voice processing method and device, storage medium and processor
CN109859763A (en) * 2019-02-13 2019-06-07 安徽大尺度网络传媒有限公司 A kind of intelligent sound signal type recognition system
CN109801619A (en) * 2019-02-13 2019-05-24 安徽大尺度网络传媒有限公司 A kind of across language voice identification method for transformation of intelligence
CN109714491A (en) * 2019-02-26 2019-05-03 上海凯岸信息科技有限公司 Intelligent sound outgoing call detection system based on voice mail
CN110033785A (en) * 2019-03-27 2019-07-19 深圳市中电数通智慧安全科技股份有限公司 A kind of calling for help recognition methods, device, readable storage medium storing program for executing and terminal device
CN110289016A (en) * 2019-06-20 2019-09-27 深圳追一科技有限公司 A kind of voice quality detecting method, device and electronic equipment based on actual conversation
CN111314451A (en) * 2020-02-07 2020-06-19 普强时代(珠海横琴)信息技术有限公司 Language processing system based on cloud computing application
CN111603191A (en) * 2020-05-29 2020-09-01 上海联影医疗科技有限公司 Voice noise reduction method and device in medical scanning and computer equipment
CN111603191B (en) * 2020-05-29 2023-10-20 上海联影医疗科技股份有限公司 Speech noise reduction method and device in medical scanning and computer equipment
CN113572492A (en) * 2021-06-23 2021-10-29 力声通信股份有限公司 Novel communication equipment prevents falling digital intercom
CN113572492B (en) * 2021-06-23 2022-08-16 力声通信股份有限公司 Communication equipment prevents falling digital intercom

Similar Documents

Publication Publication Date Title
CN107808659A (en) Intelligent sound signal type recognition system device
CN109559736B (en) Automatic dubbing method for movie actors based on confrontation network
CN105374356B (en) Audio recognition method, speech assessment method, speech recognition system and speech assessment system
CN107221320A (en) Train method, device, equipment and the computer-readable storage medium of acoustic feature extraction model
CN107680582A (en) Acoustic training model method, audio recognition method, device, equipment and medium
CN107610707A (en) A kind of method for recognizing sound-groove and device
CN106504768B (en) Phone testing audio frequency classification method and device based on artificial intelligence
CN108305615A (en) A kind of object identifying method and its equipment, storage medium, terminal
CN110838286A (en) Model training method, language identification method, device and equipment
CN108281137A (en) A kind of universal phonetic under whole tone element frame wakes up recognition methods and system
CN108364662B (en) Voice emotion recognition method and system based on paired identification tasks
CN107767869A (en) Method and apparatus for providing voice service
CN110299135A (en) Intelligent sound signal mode automatic recognition system device
CN110428843A (en) A kind of voice gender identification deep learning method
CN110610709A (en) Identity distinguishing method based on voiceprint recognition
CN105679313A (en) Audio recognition alarm system and method
CN104903954A (en) Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination
CN112259104B (en) Training device for voiceprint recognition model
CN110047506B (en) Key audio detection method based on convolutional neural network and multi-core learning SVM
CN109271533A (en) A kind of multimedia document retrieval method
CN108876951A (en) A kind of teaching Work attendance method based on voice recognition
CN109473119A (en) A kind of acoustic target event-monitoring method
CN108806694A (en) A kind of teaching Work attendance method based on voice recognition
CN107507625A (en) Sound source distance determines method and device
CN103811000A (en) Voice recognition system and voice recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination