CN1457021A - Information identifying processing method based on nervous network - Google Patents

Publication number
CN1457021A
Authority
CN
China
Prior art keywords
neuron
output
input
class
transmission
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 03137640
Other languages
Chinese (zh)
Other versions
CN1202494C (en)
Inventor
王慧东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN 03137640
Publication of CN1457021A
Application granted
Publication of CN1202494C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Image Analysis (AREA)

Abstract

The system consists of input neurons, intermediate excitatory neurons, inhibitory neurons, output neurons, and the links between them. Information is converted into a level table and fed to the input neurons. On one path, the signal is passed downward through excitatory neurons to one or more further excitatory neurons until it reaches an output neuron, forming a link channel from input to output. On the other path, the signal is passed downward through inhibitory neurons to one or more further inhibitory neurons, which deliver an inhibition signal to excitatory neurons or output neurons and block erroneous transmission in the excitatory links, so that a correct output result is obtained.

Description

Information recognition processing method based on a neural network
I. Technical field
The present invention relates to an information recognition processing method in the field of information processing, and specifically to one that uses neural network technology. The invention also relates to the application of this method to speech recognition and image recognition.
II. Background art
Neural networks, as a kind of information processor, are currently a widely studied research field. Their key property is that they use spatially distributed elements to transform input signals nonlinearly into output signals and process large amounts of information in parallel within a densely interconnected structure, which makes them powerful and fault-tolerant processing devices. A neural network can also be programmed by training on examples, rather than by a prescribed algorithm as in conventional processors, and this training can be carried out with or without supervision. Therefore, in applications that require real-time processing of complex information, such as speech recognition and image recognition, artificial neural network architectures show great advantages over conventional digital computer architectures: they offer both strong information processing capability and high processing speed.
Neural networks of many structures have been developed, such as adaptive resonance theory (ART), backpropagation (BP) networks, counterpropagation networks (CPN), Hopfield networks, the Neocognitron, and self-organizing maps (SOM), and they have been applied to the judgment and recognition of complex information. Each of these structures has its own strengths, and each also has its own shortcomings.
Speech recognition is a hot topic in current computer research. It originated with the Audrey speech recognition system of the early 1950s, which could recognize ten English digits. With the rapid development of computer technology, speech recognition has continued to advance, and techniques such as dynamic programming (DP), linear prediction (LP) analysis, and dynamic time warping (DTW), along with theories such as vector quantization (VQ) and hidden Markov models (HMM), have appeared in succession.
Speech recognition based on dynamic time warping yields small systems with fast recognition and is very efficient for small vocabularies, but its recognition rate and efficiency are low for large vocabularies.
Statistical speech recognition based on hidden Markov models is probably the most successful approach at present and is used as the system core of much commercial software. This model achieves a high recognition rate for large vocabularies, which can exceed 95%. However, such systems are large and computationally complex. In particular, the method requires a fairly large speech database, achieves a high recognition rate only for the particular voice of a particular speaker, and needs a long adaptation period. Poor adaptability to unspecified speakers and poor robustness remain unsolved problems.
All of these factors have greatly restricted the development and application of speech recognition. In theory, only speech recognition based on neural networks, working together with a neural-network-based artificial intelligence system, could truly comprehend human language, reach a 100% recognition rate, and thus realize speech recognition in the true sense.
III. Summary of the invention
The object of the present invention is to establish a new information recognition processing method based on a neural network.
Another object of the invention is to provide an application of this neural network information recognition processing method to speech recognition.
A further object of the invention is to provide an application of this method to image recognition.
The neural network system of the invention consists of input neurons, several layers of intermediate excitatory and inhibitory neurons, output neurons, and the links between them, and is stored in memory. The recognition method works as follows. Information collected by an information acquisition device is converted into a level table and fed to the input neurons of the network. Each input neuron treats a high-level input signal as a neuron transmission signal. On one path, the input neuron passes the signal downward to one or more excitatory neurons; the signal travels through several levels of excitatory neurons until it reaches an output neuron, forming a link channel from the input neuron through the intermediate neurons to the output neuron. On the other path, the input neuron also passes the signal downward to one or more inhibitory neurons; after travelling through several levels of inhibitory neurons, the inhibition signal is delivered to excitatory neurons or output neurons, where it blocks erroneous transmission in the excitatory links. The result is a correct output corresponding to the input information.
Every neuron in the neural network system has the following structure: neuron type + transfer counter + downward connection area + processing flag.
There are four neuron types: input interface, intermediate excitatory, intermediate inhibitory, and output interface. An input interface neuron is defined as a single-input, multiple-output neuron. It accepts only two-state (binary) input and passes the input signal downward to one or more excitatory or inhibitory neurons. An intermediate neuron is a multiple-input, multiple-output neuron. It acts as a bridge between input and output neurons and is responsible for transmitting, blocking, and delaying information. An output interface neuron is defined as a special multiple-input, single-output neuron. Each one represents a particular information output, that is, the result that the network associates with the input information after processing.
The transfer counter of each neuron is preset to a definite value. The downward connection area records the address of every neuron to which this neuron connects downward. The processing flag tells the system to scan and process the neuron: when a neuron enters the transmitting state, its processing flag is set to true and it waits for the system to process it.
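The four-part neuron record described above can be sketched as a small data structure. This is only an illustration: the names `NeuronType`, `Neuron`, `transfer_counter`, `downward`, and `pending` are hypothetical, and representing a downward address as an integer index is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class NeuronType(Enum):
    INPUT = 1        # input interface: single input, multiple outputs
    EXCITATORY = 2   # intermediate excitatory
    INHIBITORY = 3   # intermediate inhibitory
    OUTPUT = 4       # output interface: multiple inputs, single output


@dataclass
class Neuron:
    kind: NeuronType
    transfer_counter: int                               # preset when the network is built
    downward: List[int] = field(default_factory=list)   # addresses of downward neurons (assumed to be indices)
    pending: bool = False                               # processing flag: True while in the transmitting state


# Example: an excitatory neuron with a preset counter of -1 and two downward links.
n = Neuron(NeuronType.EXCITATORY, transfer_counter=-1, downward=[5, 6])
```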
Link channels between neurons are formed and broken during transmission, as upstream neurons change the values of downstream transfer counters. When a neuron's transfer counter changes from less than zero to greater than zero, the neuron enters the transmitting state and passes an excitation or inhibition signal, according to its own type, to its downward neurons: if a downward neuron is of the same type, its transfer counter is incremented by one; otherwise it is decremented by one. When the transfer counter changes from greater than zero to less than zero, the neuron also enters the transmitting state, but the effect is reversed: a downward neuron of the same type has its transfer counter decremented by one, and any other has it incremented by one. Otherwise the neuron is in the normal state. The system scans and processes every neuron in the transmitting state; a processed neuron returns from the transmitting state to the normal state, and the system does not process neurons in the normal state. This repeats until the transfer counter of some output neuron changes from less than zero to greater than zero, that is, until that output neuron enters the transmitting state, at which point the output result is obtained.
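The counter rule in this paragraph can be written as two small functions. This is a sketch, not the patent's implementation: the function names are hypothetical, and the way a counter value of exactly zero is treated (a neuron counts as "on" only while its counter is strictly positive) is an assumption, since the text does not say which side zero belongs to.

```python
def crossing(before: int, after: int) -> int:
    """Classify a counter change: +1 rising through zero, -1 falling, 0 no crossing.

    Assumption: zero is grouped with the negative side.
    """
    if before <= 0 < after:
        return 1
    if before > 0 >= after:
        return -1
    return 0


def downstream_delta(direction: int, same_type: bool) -> int:
    """Change applied to the transfer counter of one downward neuron.

    Rising crossing:  same type +1, different type -1.
    Falling crossing: same type -1, different type +1.
    No crossing:      the neuron stays in the normal state and sends nothing.
    """
    if direction == 0:
        return 0
    return direction if same_type else -direction


# Example: a counter moves from -1 to 1, and the downward neuron is of a different type.
delta = downstream_delta(crossing(-1, 1), same_type=False)
```

Keeping the crossing test separate from the delta makes it easy to change the zero convention without touching the propagation rule.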
For the neurons that make up the network, the number of intermediate excitatory and inhibitory layers is determined by the network size when the network is built. Layers need not be connected consecutively: information can be passed across positions and across layers. The transmission direction of each individual neuron is fixed when the network is built; it can be forward, or backward as feedback. Information can also be passed between neurons of different types.
The neural network system scans all neurons that need processing using a skip scan: every other position, every other row, and every other layer. With this scan order, every neuron except those on the edges tends to propagate in any direction at essentially the same rate, so that input signals are transmitted with delay and arrive at the output neurons synchronously. For example, for a 4 × 4 × 4 neural network with neurons numbered as follows:
Layer 1        Layer 2        Layer 3        Layer 4
 1  2  3  4    17 18 19 20    33 34 35 36    49 50 51 52
 5  6  7  8    21 22 23 24    37 38 39 40    53 54 55 56
 9 10 11 12    25 26 27 28    41 42 43 44    57 58 59 60
13 14 15 16    29 30 31 32    45 46 47 48    61 62 63 64
The scan order of the system is then: 1 3 4 2 9 11 12 10 13 15 16 14 5 7 8 6 33 35 36 34 41 43 44 42 45 47 48 46 37 39 40 38 49 51 52 50 57 59 60 58 61 63 64 62 53 55 56 54 17 19 20 18 25 27 28 26 29 31 32 30 21 23 24 22 1 ... and so on, repeatedly.
With this scan order, signals from different points in time that arrive at the input neurons are aligned: signals received earlier are delayed through more transmission units, while signals received later pass through fewer, so that all signals reach the output neurons at essentially the same time and the delays are synchronized.
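The 4 × 4 × 4 scan order listed above follows one pattern at every level: positions, rows, and layers are each visited in the order 1, 3, 4, 2. The sketch below reproduces that listing. The generalization to other sizes (odd-numbered entries ascending, then even-numbered entries descending) is an assumption that happens to match the 4 × 4 × 4 example; the function names are hypothetical.

```python
def skip_order(n):
    """Visit every other index going up, then the skipped ones coming back.

    For n = 4 this yields [0, 2, 3, 1], i.e. 1, 3, 4, 2 in 1-based numbering.
    """
    idx = list(range(n))
    return idx[0::2] + idx[1::2][::-1]


def scan_order(n=4):
    """1-based neuron numbers of an n x n x n network in skip-scan order."""
    order = skip_order(n)
    return [layer * n * n + row * n + col + 1
            for layer in order
            for row in order
            for col in order]


sequence = scan_order(4)
print(" ".join(str(s) for s in sequence))
```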
Because the neural network is a parallel processing system made up of many neurons, each neuron acts as an independent unit, and there is no required order of computation between neurons. As long as data exchange is synchronized, the network can be partitioned arbitrarily without affecting its overall operation. Therefore, if the network to be designed is very large, several computers can be combined into a parallel network that processes the neural network simultaneously. This removes the computational bottleneck and gives neural networks of any scale a suitable hardware environment.
In a parallel network of several computers, one computer can be chosen as the master and the others as slaves, all running at the same time to carry out the recognition processing. The purpose is to provide a runtime environment suited to searching, modifying, adding to, and deleting from large, complex databases, and in particular to parallel processing. Such a parallel network can provide a simulated parallel processing environment on single-CPU microcomputers such as PCs and commercial microcomputers. Several single-chip microcomputers can also be combined into a very large computer system, or several processing units can be integrated into one integrated circuit to make a dedicated processor, so that computing capacity can be expanded as required.
The neural network information recognition processing method described above is well suited to speech recognition. The specific steps are:
A. Divide the sound spectrum into unequal segments according to an auditory model of speech, and compute a center frequency for each segment. Using this center frequency as the natural frequency, build an ideal spring oscillator model for each segment, and sort all spring oscillators by response frequency. The stiffness coefficient K of each spring oscillator is a variable parameter. When the output frequency lies within the boundaries between the center frequencies of the adjacent spring oscillators, the oscillator automatically adjusts K so that its natural frequency equals the output frequency, keeping the resonance amplitude at its maximum. When the output frequency does not lie between the adjacent center frequencies, the natural frequency is set to its initial value.
B. Input the speech using audio pulse coding (pulse code modulation): the sound wave is quantized into an audio data stream, which the system reads.
C. When the natural frequency of one of the spring oscillator models matches a frequency present in the quantized input audio data stream, that oscillator resonates strongly. Record the resonance displacement of each spring oscillator to form a dynamic spectrum level table of the input speech.
D. Feed the spectrum level table to the input neurons of the neural network system. Each input neuron treats a high-level input signal as a neuron transmission signal. On one path, the input neuron passes the signal downward to one or more excitatory neurons; the signal travels through several levels of excitatory neurons until it reaches an output neuron, forming a link channel from the input neuron through the intermediate neurons to the output neuron. On the other path, the input neuron also passes the signal downward to one or more inhibitory neurons; after travelling through several levels of inhibitory neurons, the inhibition signal is delivered to excitatory neurons or output neurons, where it blocks erroneous transmission in the excitatory links. The result is a queue of pronunciation probabilities corresponding to the input speech.
E. Combine and screen the probability queue of each pronunciation against the probability queues of the preceding and following pronunciations to form the output sentence.
F. Apply voice error-correction recognition to possible errors in the sentence output by the front end to obtain a fully correct sentence.
The detailed procedure for voice error-correction recognition is:
a. Set up a circular output mapping queue in memory that holds part of the text and pronunciation of the sentence output by the front end, synchronized with the output shown in the user interface.
b. When the mapping queue receives the pronunciation of a keyword defined by the voice error-correction system, check whether the keyword is immediately preceded by a phrase, and whether the pronunciation of the next character received after it matches the pronunciation of one of the characters in that phrase.
c. If the condition is met, search from the phrase toward the earlier text, according to the specific error-correction grammar, to locate the character with that same pronunciation, and replace the homophone found with the character from the phrase.
d. If the condition is not met, end voice error-correction recognition and continue with speech recognition.
Because neural networks have strong resistance to interference and good recognition accuracy, the method can also be applied to the recognition of still images and video. Image recognition is the process in which a digital image is captured by an image input device and then processed by a trained neural network to obtain a corresponding output result.
The digital image can be obtained from a digital still camera, a digital video camera, a digital probe, or an analog acquisition device with analog-to-digital conversion. Because the input window of the neural network accepts only binary (two-valued) input, every digital image must first be converted to binary form. To convert an image, it is first decomposed into individual pixels, and each pixel is decomposed into several single colors, each with its own gray level. An input neuron is created for each gray level of each color, so the number of input neurons per pixel is the number of gray levels × the number of color components. In this way the image to be recognized is converted into a binary image.
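The pixel-to-neuron mapping can be illustrated with a short sketch. It assumes a one-hot scheme, in which exactly one input neuron per color component is active for the component's gray level; the text fixes only the neuron count (gray levels × color components), so the exact coding and the name `encode_pixel` are assumptions.

```python
def encode_pixel(channels, levels):
    """Encode one pixel as binary inputs, one group of `levels` bits per color component.

    channels: gray-level index of each color component, each in 0 .. levels-1
    levels:   number of gray levels per component
    Returns a flat list of 0/1 values of length len(channels) * levels.
    """
    bits = []
    for value in channels:
        if not 0 <= value < levels:
            raise ValueError("gray level out of range")
        bits.extend(1 if i == value else 0 for i in range(levels))
    return bits


# Example: a 3-component pixel with 4 gray levels per component needs 3 * 4 = 12 input neurons.
inputs = encode_pixel((0, 2, 1), levels=4)
```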
The converted binary image is fed directly to the input neurons of the neural network. Each input neuron treats a high-level input signal as a neuron transmission signal. On one path, the input neuron passes the signal downward to one or more excitatory neurons; the signal travels through several levels of excitatory neurons until it reaches an output neuron, forming a link channel from the input neuron through the intermediate neurons to the output neuron. On the other path, the input neuron also passes the signal downward to one or more inhibitory neurons; after travelling through several levels of inhibitory neurons, the inhibition signal is delivered to excitatory neurons or output neurons, where it blocks erroneous transmission in the excitatory links. Through the interaction of the excitatory and inhibitory networks, the system finally produces an output result corresponding to input information it was trained on, which can be converted into a control quantity for another system.
Video recognition treats a series of pictures along the time dimension as a single whole. Because the neural network of the invention has a delay processing capability, the number of intermediate layers can be designed in advance according to the duration of the target to be recognized. An input image, after being delayed, acts on the output neurons together with the images input after it, and a corresponding result is output. The invention can therefore recognize moving images. This delay mechanism can also handle multidimensional data, for example three-dimensional moving images.
The method of the invention can also be applied in many other fields, such as identity recognition (fingerprints, faces, voiceprints, retinas, and so on), picture retrieval, character recognition, target tracking, target locking, and intelligent control.
The neural network information recognition processing method of the invention has the following advantages:
1. Fast recognition processing.
The neural network of the invention processes information by transmission, so the processing speed is equivalent to the transmission speed, which in theory can approach the limiting speed of signal processing.
2. Wide applicability.
The neural network system of the invention has standard input and output neuron interface specifications and can in principle be used in any field of information processing. Networks of different scales can be built according to user needs, and the output can serve either as a control source for another system or directly as information for the user.
3. Good expandability.
In practical use, if the complexity of the information to be processed increases, the network can be enlarged on the basis of the existing network and retrained by expansion training to generate a new processing network, with a greatly reduced training effort.
4. Flexible networking.
Neurons in the network of the invention can be connected in many ways to realize various functions, such as reflection, feedback, oscillation, interruption, activation, inhibition, screening, association, reasoning, and insight. The hardware platform can likewise take many forms: a single machine can be used; single-chip microcomputers, personal computers, and commercial machines of different types can be organized together as a computing and storage platform; or a dedicated processing chip can be made.
5. Better fault tolerance.
Because inhibitory neurons are introduced, erroneous outputs of the system can be controlled more readily. After the initial training, the finished network resists interference better, so information processing is more accurate.
6. Relatively simple training.
Because the network of the invention has good expandability, only new content needs to be trained, and the newly generated system remains compatible with what was learned before, so the training effort is greatly reduced.
7. Ability to process dynamic information.
Traditional neural network models have difficulty processing dynamic information, whereas the invention can process it in real time. In effect it has a "temporal memory" function and can recognize information that extends over a period of time.
The invention is particularly advantageous for speech recognition.
The recognition rates quoted for typical speech recognition methods, between 95% and 98%, are obtained from dedicated tests on a specific speaker. Tests on unspecified speakers do not reach that level: for most people the recognition rate is only between 60% and 90%, so adaptability to unspecified speakers is poor. The neural network speech recognition method of the invention addresses this problem. After general-purpose training, it gives a unique output for pronunciations in different dialects and with different intonation. The system needs no further adaptation training and the user does not need to retrain it, so both adaptability to unspecified speakers and the recognition rate are greatly improved.
The speech recognition method of the invention adapts automatically to unfamiliar environments and can be used directly without user training, which is an advantage over other speech recognition methods.
Noise has always been a major obstacle for speech recognition; if handled badly, it degrades the robustness of the system. The invention reduces noise directly at sound acquisition: the system uses a dynamic network adjustment technique to automatically generate an anti-noise signal that counteracts the input noise. No extra equipment is needed and there are no restrictions on where it can be used, yet noise is reduced effectively, the signal-to-noise ratio is improved, and recognition results are very good. The fault tolerance of the neural network is also used to further absorb noise and reduce its influence on the system output.
The system dynamically adjusts its operating parameters so that every part always runs in its optimum condition, which gives the best output and improves the recognition rate.
IV. Embodiment
This embodiment describes in detail the application of the neural network information recognition processing method to speech recognition. The recognition process is as follows:
1. Speech input
Audio pulse coding (pulse code modulation) is used to quantize the speech sound wave directly into an audio data stream that is fed to the system.
Higher precision and a higher sampling frequency make the sampled signal more faithful when it is reconstructed, but the amount of code generated by sampling multiplies accordingly, and the computing resources needed to process it multiply even more. Typical speech recognition methods use a sampling frequency of 10 kHz or 16 kHz. We found that a low sampling frequency and low sampling precision lose much useful information in the speech signal and reduce recognition accuracy. To extract the effective components of the sound more completely, the invention adopts for the first time a high sampling frequency of 22 kHz, well above that of other recognition methods, which helps ensure the precision and accuracy of the subsequent recognition.
2. Audio data buffering
The time each subsystem inside the neural network spends on processing varies with factors such as content and state. At one moment the system may be unable to process the incoming data in time, while at another it may be relatively idle. The continuously generated audio data stream is therefore first stored in a buffer queue and then read out under the control of the clock synchronization unit.
The buffer is a circular queue with a write pointer and a read pointer, and the whole ring works on a write-first, read-later basis. If reading is too slow and the write pointer catches up with the read pointer after going once around the ring, more memory is requested to enlarge the buffer. If writing is slower and the read pointer follows closely behind the write pointer, the system checks whether a large amount of memory ahead of the write pointer is idle and, if so, releases part of it so that the operating system has more memory available.
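A growable circular buffer of this kind can be sketched as follows. The class and method names are hypothetical, doubling the capacity on overflow is an assumed growth policy, and the release of idle memory described in the text is deliberately left out to keep the example short.

```python
class RingBuffer:
    """Circular queue with a read pointer and a write pointer that grows when full."""

    def __init__(self, capacity=4):
        self._data = [None] * capacity
        self._read = 0    # index of the next sample to read
        self._write = 0   # index of the next free slot
        self._count = 0   # number of unread samples

    def write(self, sample):
        if self._count == len(self._data):
            self._grow()  # the write pointer has caught up with the read pointer
        self._data[self._write] = sample
        self._write = (self._write + 1) % len(self._data)
        self._count += 1

    def read(self):
        """Return the oldest unread sample, or None if the buffer is empty."""
        if self._count == 0:
            return None
        sample = self._data[self._read]
        self._read = (self._read + 1) % len(self._data)
        self._count -= 1
        return sample

    def _grow(self):
        # Copy unread samples out in order, then double the capacity (assumed policy).
        old = len(self._data)
        ordered = [self._data[(self._read + i) % old] for i in range(self._count)]
        self._data = ordered + [None] * old
        self._read = 0
        self._write = self._count


buf = RingBuffer(capacity=4)
for s in range(10):
    buf.write(s)
first = buf.read()
```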
3. Filter array
Typical current speech recognition methods apply the fast Fourier transform (FFT) directly to analyze the digital signal in the time and frequency domains, and then pass the analysis results to the recognition model. The advantage is accurate and detailed analysis. The drawback is that the time and frequency domains must be processed frame by frame: the computation is complex and heavy, processing is relatively slow, demands on the system and hardware are high, and resistance to interference is poor. The invention instead uses a filter array to convert the quantized data stream into a dynamic spectrum level table. The filtering system essentially simulates the biophysical process of the cochlea of the human ear. By taking the auditory characteristics of the human ear into account and using a dynamic spring oscillator model algorithm, the filter array is simpler to compute, simpler in structure, and faster, and it connects easily to the neural network, which ensures that the system extracts the effective components of speech accurately and efficiently. The detailed process is:
A. Divide the frequency spectrum into unequal segments according to an auditory model of speech, compute a center frequency for each segment, and build an ideal spring oscillator model with the center frequency as its natural frequency.
B. Sort all spring oscillators by response frequency, and use the quantized values of the audio data stream as the external force driving the oscillators. When the natural frequency of a spring oscillator matches a frequency present in the input audio data stream, the oscillator resonates strongly.
C. Record the displacements of all spring oscillators to generate a spectrum level table, which completes the spectrum output.
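Steps A to C can be illustrated with a small simulation in which each audio sample acts as the driving force on a bank of spring oscillators and the peak displacement of each is recorded. This is a sketch under stated assumptions: unit mass, a damping ratio of 0.02, and semi-implicit Euler integration are all choices made here for the example and do not come from the text; the function name is hypothetical.

```python
import math


def oscillator_bank_peaks(samples, sample_rate, center_freqs, damping_ratio=0.02):
    """Drive one unit-mass damped spring oscillator per center frequency with
    the sample stream and return the peak absolute displacement of each."""
    dt = 1.0 / sample_rate
    peaks = []
    for f0 in center_freqs:
        omega = 2.0 * math.pi * f0        # natural angular frequency
        k = omega * omega                 # stiffness K for unit mass (K = m * omega^2)
        c = 2.0 * damping_ratio * omega   # damping coefficient (assumed, not in the text)
        x = v = peak = 0.0
        for force in samples:
            a = force - k * x - c * v     # Newton's second law with m = 1
            v += a * dt                   # semi-implicit Euler: update velocity first,
            x += v * dt                   # then position with the new velocity
            peak = max(peak, abs(x))
        peaks.append(peak)
    return peaks


# Example: a 440 Hz tone sampled at 22050 Hz for 0.2 s, fed to three oscillators.
sample_rate = 22050
tone = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate) for n in range(4410)]
peaks = oscillator_bank_peaks(tone, sample_rate, [220.0, 440.0, 880.0])
```

In this example the oscillator tuned to 440 Hz shows by far the largest displacement, which is the resonance effect that step B relies on.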
The filter array can use different segmentation precision for different needs. The finer the segmentation, the more spring oscillators there are and the more accurate the output, but the amount of computation also increases accordingly.
So that the system responds well even with relatively few spring oscillators, the invention makes the stiffness coefficient K of each spring oscillator a variable parameter. When the output frequency lies within the boundaries set by the adjacent oscillators' center frequencies, K is adjusted automatically so that the oscillator's natural frequency equals the output frequency and the resonance amplitude is at its maximum. When the output frequency lies outside those boundaries, K is adjusted so that the natural frequency returns to its initial value.
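For an ideal spring oscillator of mass m, the natural frequency is f = sqrt(K / m) / (2π), so tuning the oscillator to a frequency f means setting K = m (2πf)². The sketch below applies that relation to the adjustment rule. The function names, the unit mass, and the inclusive treatment of the band boundaries are assumptions made for illustration.

```python
import math


def stiffness_for(freq_hz, mass=1.0):
    """Stiffness K that gives an ideal spring oscillator the natural frequency freq_hz."""
    return mass * (2.0 * math.pi * freq_hz) ** 2


def natural_frequency(k, mass=1.0):
    """Natural frequency in Hz of an ideal spring oscillator with stiffness k."""
    return math.sqrt(k / mass) / (2.0 * math.pi)


def adjust_stiffness(center_hz, lower_hz, upper_hz, observed_hz, mass=1.0):
    """Track observed_hz while it lies inside [lower_hz, upper_hz];
    otherwise fall back to the oscillator's initial center frequency."""
    inside = lower_hz <= observed_hz <= upper_hz
    target = observed_hz if inside else center_hz
    return stiffness_for(target, mass)


# Example: an oscillator centered at 440 Hz with band boundaries at 400 Hz and 480 Hz.
k_tracking = adjust_stiffness(440.0, 400.0, 480.0, observed_hz=450.0)
k_reset = adjust_stiffness(440.0, 400.0, 480.0, observed_hz=600.0)
```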
Because the output amplitudes of the spring oscillators differ considerably and there is a good deal of interference, the spectrum level table must be generated dynamically, according to the output of the neural processing system, so that the output of every spring oscillator falls within one limited, unified range.
The range of the level table is defined as follows. When the interface neuron corresponding to the minimum value of the level table is always in the excited state, the minimum amplitude value of that spring oscillator is raised; when it has been in the inhibited state for a long time, the minimum amplitude value is lowered. When the interface neuron corresponding to the maximum value of the level table is in the excited state, the oscillator's amplitude is compared with the maximum amplitude and the larger of the two becomes the new maximum amplitude; this value decays slowly over time. The interval between the minimum and maximum amplitude values is then divided exponentially into n parts, each corresponding to one level value.
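The final step, dividing the amplitude range exponentially into n parts, can be sketched as a logarithmic quantizer. The reading of "exponential division" as equal ratios between successive boundaries is an assumption, as are the function name, the clamping of out-of-range amplitudes, and the requirement that the minimum amplitude be positive.

```python
import math


def level_index(amplitude, a_min, a_max, n):
    """Map an amplitude to one of n levels (0 .. n-1) whose boundaries are
    geometrically spaced between a_min and a_max (requires 0 < a_min < a_max)."""
    if amplitude <= a_min:
        return 0
    if amplitude >= a_max:
        return n - 1
    fraction = math.log(amplitude / a_min) / math.log(a_max / a_min)
    return min(n - 1, int(fraction * n))


# Example: ten levels between amplitudes 1 and 1024, so each level spans a factor of two.
levels = [level_index(a, 1.0, 1024.0, 10) for a in (0.5, 3.0, 50.0, 500.0, 5000.0)]
```

Adapting the minimum and maximum amplitudes from neuron feedback, as the paragraph describes, would sit outside this function and simply change the `a_min` and `a_max` arguments over time.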
Input interface as neural network, filter array has also possessed the function of self-control, can raise or reduce the response sensitivity of a certain frequency band according to feedback signal automatically, make system abandon complicated noise reduction system, improve the discrimination under the rugged surroundings from neural network.
4. Neural network recognition processing system
The neural network processing system is based on an artificial neural network. Its computation is carried out through information transmission and blocking operations between neurons, and each group of link channels represents a corresponding information-processing procedure.
The signal delivered by the filter array enters the input neurons. After being passed downward through several levels of intermediate neurons, it establishes a link channel from input neuron to output neuron, and a result corresponding to the input signal is finally produced at some output neuron. The system scans an intermediate neuron only while that neuron is in the transmit state; while the neuron is in the normal state, the existing links between it and its downstream neurons are kept unchanged. This processing is repeated until a correct link channel is established.
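The scan of transmit-state neurons can be sketched as below. This is a hedged illustration, not the patented implementation: it borrows the transfer-counter rule stated in claim 2 (same-type links add to the downstream counter when the sender's counter rises above zero, different-type links subtract), and it assumes an initial counter value of 0 and that an inhibitory input neuron exists to carry the inhibition path. `Neuron`, `scan`, `run` and `build_demo` are hypothetical names.

```python
from dataclasses import dataclass, field

EXCITE, INHIBIT = "excite", "inhibit"


@dataclass
class Neuron:
    name: str
    kind: str = EXCITE
    counter: int = 0            # transfer counter; initial value is an assumption
    positive: bool = False      # sign the counter had when this neuron was last processed
    targets: list = field(default_factory=list)

    def pending(self):
        """+1 if the counter rose above zero, -1 if it fell below zero, 0 if normal."""
        if self.counter > 0 and not self.positive:
            return 1
        if self.counter < 0 and self.positive:
            return -1
        return 0


def scan(neurons):
    """One scan cycle: process every neuron that is currently in the transmit state."""
    transmitting = [(n, n.pending()) for n in neurons if n.pending()]
    for n, direction in transmitting:
        n.positive = direction > 0          # processed neurons return to the normal state
        for t in n.targets:
            same_type = t.kind == n.kind
            t.counter += direction if same_type else -direction
    return bool(transmitting)


def run(neurons, inputs, outputs, max_cycles=100):
    for n in inputs:
        n.counter = 1                       # a high-level input is treated as a transmit signal
    for _ in range(max_cycles):
        if not scan(neurons):
            break
    return [o.name for o in outputs if o.counter > 0]


def build_demo():
    """Tiny network: x excites output o, y (inhibitory) blocks it."""
    x, y, o = Neuron("x"), Neuron("y", kind=INHIBIT), Neuron("o")
    x.targets, y.targets = [o], [o]
    return x, y, o


x, y, o = build_demo()
only_x = run([x, y, o], inputs=[x], outputs=[o])        # excitation reaches the output
x, y, o = build_demo()
x_and_y = run([x, y, o], inputs=[x, y], outputs=[o])    # inhibition blocks the link
```

Only neurons whose counter has crossed zero are touched in a cycle, so neurons in the normal state cost nothing and their existing links are left as they are.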
The neural network system is a complex parallel information-processing system. Because it is a parallel neural network computer in which ordinary computers make up several neurons, the processing time of each subsystem inside the network can vary with the computation content, state and other factors, yet the network as a whole must act in concert. A clock synchronization unit is therefore provided in the neural network processing units that make up the neurons. One scan cycle is complete when the last processing unit has finished its work in a full scan by the clock synchronization unit, and the unit also controls the read pointer so that it stays synchronized with the data in the buffer.
5. Voice vocabulary discrimination system
After the input voice signal has been processed by the neural network recognition processing system, the result is a queue of pronunciation probabilities corresponding to the input speech. For example, after neural network processing one pronunciation may yield the correct pinyin together with the same syllable in other tones, or similar pinyin. This probability queue therefore has to be fed into a grammar and vocabulary analysis system. Using this conventional approach, the probability queue of each pronunciation is screened in combination with those of the preceding and following pronunciations according to grammar and morphology, and the correct output is finally selected.
The output of the grammar and lexical analysis satisfies most recognition needs, but some phrases remain whose structure, part of speech and pronunciation are all identical and only the meaning differs (for example the three Chinese words for "they", 他们, 她们 and 它们). These must be resolved in the semantic network system. The semantic network is built by collecting the words the system cannot distinguish, recording the differences in usage between them and describing their contexts of use, so that a correct output can finally be obtained.
While the semantic network is still incomplete, the user has to correct errors. The traditional method is to enter the correct words with an external device such as a keyboard, which is very inconvenient. The voice vocabulary discrimination system of the present invention supports error correction by voice. Its principle is that the system keeps a history record: when the pronunciation it hears repeats the pronunciation of a word in the previous output, it automatically checks whether the pronunciations that follow are correction content. If they are, the wrong words are corrected and morphological and grammatical analysis is carried out again on the pronunciations after the correction. For example, the voice sequence of "everyone all agrees" (or "everyone entirely disagrees") is: dà jiā quán bu tóng yì. When the semantic network cannot determine the accurate output for this speech, the system outputs a result chosen by frequency or at random. The user can then read in the following pinyin sequence: dà jiā quán bù tóng yì bù shì dē bù, or the following pinyin sequence: quán bù dē bù shì bu shì dē bù, and a fully correct sentence is obtained. By means of this voice error correction method the system can reach a 100% correct recognition rate, so that the ideal of solving the input problem through speech recognition alone can be realized.
The specific voice error correction recognition steps are as follows:
1. An output circular mapping queue is set up in memory to hold a limited number of characters and their pronunciations. The characters determined by the front end are output to the user interface and, at the same time, each character and its pronunciation are recorded in the mapping queue.
2. When the mapping queue receives the pronunciation of a keyword determined by the voice error correction system, such as the pronunciation of the word " ", the voice error correction system is started and judges whether what immediately precedes the keyword is a phrase. If so, it waits to see whether the next pronunciation received is the pronunciation of a character in that phrase. If the condition is satisfied, the system follows the specific error-correction grammar: starting from the phrase, it searches back through the earlier output to locate the character with the same pronunciation, replaces the character found with the character from the phrase, and deletes the characters used for the correction.
3. If the condition is not satisfied, voice error correction recognition ends and speech recognition continues.
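The three steps above can be sketched as follows. This is a minimal illustration under several assumptions that are not in the patent: pronunciations are stored as toneless pinyin, the keyword pronunciation is "de", phrases are looked up in a small hypothetical lexicon, and the circular mapping queue is a `collections.deque` with a fixed `maxlen`. `OutputHistory` and its methods are hypothetical names.

```python
from collections import deque

KEYWORD_PRON = "de"  # assumed pronunciation of the marker word that triggers correction


class OutputHistory:
    """Circular mapping queue of (character, pronunciation) pairs with voice correction."""

    def __init__(self, phrases, maxlen=64):
        self.items = deque(maxlen=maxlen)
        self.phrases = set(phrases)
        self.max_phrase = max(map(len, self.phrases), default=0)

    def push(self, char, pron):
        self.items.append((char, pron))
        self._try_correct()

    def text(self):
        return "".join(c for c, _ in self.items)

    def _try_correct(self):
        items = list(self.items)
        # Expected tail: <phrase> <keyword> <target pronunciation>.
        if len(items) < 4 or items[-2][1] != KEYWORD_PRON:
            return
        target_pron = items[-1][1]
        for k in range(min(self.max_phrase, len(items) - 2), 1, -1):
            phrase_items = items[-2 - k:-2]
            if "".join(c for c, _ in phrase_items) not in self.phrases:
                continue
            replacement = next((c for c, p in phrase_items if p == target_pron), None)
            if replacement is None:
                return  # condition not satisfied: keep recognizing normally
            body = items[:-2 - k]
            # Search back through the earlier output for the same pronunciation.
            for i in range(len(body) - 1, -1, -1):
                if body[i][1] == target_pron:
                    body[i] = (replacement, target_pron)
                    # Drop the characters that were only spoken for the correction.
                    self.items = deque(body, maxlen=self.items.maxlen)
                    return
            return


history = OutputHistory(phrases={"不是"})
for ch, pron in [("大", "da"), ("家", "jia"), ("全", "quan"),
                 ("部", "bu"), ("同", "tong"), ("意", "yi")]:
    history.push(ch, pron)
before = history.text()
for ch, pron in [("不", "bu"), ("是", "shi"), ("的", "de"), ("不", "bu")]:
    history.push(ch, pron)
after = history.text()
```

A bounded queue keeps the backward search cheap and limits how far back a spoken correction can reach, which matches the "limited number of characters" in step 1.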

Claims (10)

1. An information recognition processing method based on a neural network, characterized in that: the neural network system is formed jointly by input neurons stored in memory, several intermediate layers of excitatory-class neurons and inhibitory-class neurons, output neurons, and the links between them; after the information collected by an information acquisition device has been converted into a decibel meter, it is input to the input neurons of the neural network system; each input neuron treats a high-level input signal as a neuron transmit signal; on the one hand, it passes the signal downward through excitatory-class neurons to one or more excitatory-class neurons and, through transmission across several levels of excitatory-class neurons, on to an output neuron, forming a link channel from the input neuron through the intermediate neurons to the output neuron; on the other hand, the input neuron also passes the signal downward through inhibitory-class neurons to one or more inhibitory-class neurons and, likewise through transmission across several levels of inhibitory-class neurons, delivers an inhibition signal to an excitatory-class neuron or an output neuron, blocking erroneous transmission in the excitatory-class neural links, thereby obtaining a correct output result corresponding to the input information.
2. The information recognition processing method according to claim 1, characterized in that the forming and blocking of the neural link channels are realized as follows: each neuron has a transfer counter set to a determined value, and changes to the value of this counter are determined by the upstream neurons; when the transfer counter changes from less than zero to greater than zero, the neuron enters the transmit state and waits for the system's scan processing; when it is scanned, the neuron passes an excitation or inhibition signal, according to its own attribute, to its downstream neurons: if it is of the same type as the downstream neuron, the downstream neuron's transfer counter is incremented by one, otherwise it is decremented by one; when the transfer counter changes from greater than zero to less than zero, the neuron also enters the transmit state, except that when the system processes it, if it is of the same type as the downstream neuron, the downstream neuron's transfer counter is decremented by one, otherwise it is incremented by one; when the transfer counter is zero, the neuron is in the normal state; the system scans all neurons that are in the transmit state, each processed neuron changes from the transmit state to the normal state, and the system does not process neurons in the normal state; this is repeated until the transfer counter of some output neuron reached by the transmission changes from less than zero to greater than zero, that is, until this output neuron enters the transmit state, whereupon the result is output.
3. The information recognition processing method according to claim 1, characterized in that the number of layers of intermediate excitatory-class and inhibitory-class neurons is determined by the size of the network when it is set up; there is no mutually continuous relationship between layers, so information can be transmitted across positions and across layers; the direction of information transmission of an individual neuron is determined when it is set up and can be forward, or reverse as feedback; and information can also be transmitted between neurons of different types.
4. The information recognition processing method according to claim 1, characterized in that the system scans all neurons requiring scan processing by a scanning method that skips positions, rows and layers at intervals.
5. The information recognition processing method according to claim 1, characterized in that a plurality of arithmetic units can form a parallel network and perform recognition processing on the input information simultaneously.
6. The information recognition processing method according to claim 5, characterized in that the arithmetic unit can be a PC, a commercial microcomputer or a single-chip microcomputer.
7. Application of the neural network information recognition processing method of claim 1 in speech recognition, characterized by comprising the following steps:
a. the sound spectrum is divided into asymmetric segments according to a speech hearing model, a center frequency is calculated for each segment, an ideal-state spring oscillator model is built with this center frequency as its natural frequency, and all spring oscillators are sorted in order of response frequency;
b. speech is input using audio pulse coding, and the sound wave signal is quantized into an audio data stream that is read by the system;
c. when the quantized value of the input audio data stream being read matches the natural frequency of one of the spring oscillator models, that spring oscillator produces a strong resonance effect; the resonance displacement of each spring oscillator is recorded to form the dynamic spectrum decibel meter of the input speech;
d. the spectrum decibel meter is input to the input neurons of the neural network system; each input neuron treats a high-level input signal as a neuron transmit signal; on the one hand, it passes the signal downward through excitatory-class neurons to one or more excitatory-class neurons and, through transmission across several levels of excitatory-class neurons, on to an output neuron, forming a link channel from the input neuron through the intermediate neurons to the output neuron; on the other hand, the input neuron also passes the signal downward through inhibitory-class neurons to one or more inhibitory-class neurons and, likewise through transmission across several levels of inhibitory-class neurons, delivers an inhibition signal to an excitatory-class neuron or an output neuron, blocking erroneous transmission in the excitatory-class neural links, thereby obtaining a queue of pronunciation probabilities corresponding to the input speech;
e. the probability queue of each pronunciation is screened in combination with the probability queues of the preceding and following pronunciations to form the sentence output;
f. voice error correction recognition is applied to possible errors in the sentence output by the front end to obtain a fully correct sentence output.
8. Application of the neural network information recognition processing method of claim 7 in speech recognition, characterized in that the voice error correction recognition is:
a. an output circular mapping queue is set up in memory to hold part of the characters and pronunciations of the sentence output by the front end, synchronized with the output of the user interface;
b. when the mapping queue receives the pronunciation of a keyword determined by the voice error correction system, it is judged whether what immediately precedes the keyword is a phrase, and whether the pronunciation of the next character received after it is the pronunciation of one of the characters in this phrase;
c. if the condition is satisfied, then according to the specific error-correction grammar, starting from the phrase, the earlier output is searched to locate the character with the same pronunciation, and the character found is replaced with the character from the phrase;
d. if the condition is not satisfied, voice error correction recognition ends and speech recognition continues.
9. Application of the neural network information recognition processing method of claim 7 in speech recognition, characterized in that the stiffness coefficient K of the spring oscillator is a variable parameter: when the output frequency lies within the boundaries of the center frequencies of the adjacent spring oscillators, the spring oscillator automatically adjusts its K value so that its natural frequency equals the output frequency and the resonance amplitude is kept at its maximum; when the output frequency does not lie between the center frequencies of the adjacent spring oscillators, the natural frequency equals its initial value.
10. Application of the neural network information recognition processing method of claim 1 in image recognition, characterized by comprising the following steps:
a. the image is first decomposed into individual pixels, and each pixel is further decomposed into several single colors, each of which has different gray levels; corresponding input neurons are created according to the gray levels, so that the number of input neurons for one pixel is the number of gray levels × the number of colors, and the image to be recognized is converted into a binary image;
b. the converted binary image is input directly to the input neurons of the neural network; each input neuron treats a high-level input signal as a neuron transmit signal; on the one hand, it passes the signal downward through excitatory-class neurons to one or more excitatory-class neurons and, through transmission across several levels of excitatory-class neurons, on to an output neuron, forming a link channel from the input neuron through the intermediate neurons to the output neuron; on the other hand, the input neuron also passes the signal downward through inhibitory-class neurons to one or more inhibitory-class neurons and, likewise through transmission across several levels of inhibitory-class neurons, delivers an inhibition signal to an excitatory-class neuron or an output neuron, blocking erroneous transmission in the excitatory-class neural links, thereby obtaining an image output corresponding to the input image.
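The pixel-to-input-neuron mapping in step a of claim 10 can be sketched as below. The claim only states that one pixel needs (number of gray levels × number of colors) input neurons; the one-hot layout, in which exactly one neuron per color is set high, is an assumption made for illustration, as are the function names and the 8-bit value range.

```python
def encode_pixel(pixel, levels=4, max_value=255):
    """Return levels * len(pixel) binary inputs, one high input per color channel."""
    bits = []
    for value in pixel:
        # Quantize the channel value into one of `levels` gray levels.
        level = min(value * levels // (max_value + 1), levels - 1)
        bits.extend(1 if i == level else 0 for i in range(levels))
    return bits


def encode_image(image, levels=4, max_value=255):
    """Flatten a 2-D list of pixels (tuples of channel values) into one binary vector."""
    return [bit
            for row in image
            for pixel in row
            for bit in encode_pixel(pixel, levels, max_value)]


pixel_bits = encode_pixel((0, 128, 255), levels=4)
image_bits = encode_image([[(0, 0, 0), (255, 255, 255)]], levels=4)
```

Each binary input can then be treated as the high-level or low-level signal of one input neuron, which is what step b feeds into the network.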
CN 03137640 2003-06-09 2003-06-09 Information identifying processing method based on nervous network Expired - Fee Related CN1202494C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 03137640 CN1202494C (en) 2003-06-09 2003-06-09 Information identifying processing method based on nervous network

Publications (2)

Publication Number Publication Date
CN1457021A true CN1457021A (en) 2003-11-19
CN1202494C CN1202494C (en) 2005-05-18

Family

ID=29411819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 03137640 Expired - Fee Related CN1202494C (en) 2003-06-09 2003-06-09 Information identifying processing method based on nervous network

Country Status (1)

Country Link
CN (1) CN1202494C (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8255353B2 (en) 2006-05-16 2012-08-28 Zhan Zhang Method for constructing an intelligent system processing uncertain causal relationship information
CN105488565A (en) * 2015-11-17 2016-04-13 中国科学院计算技术研究所 Calculation apparatus and method for accelerator chip accelerating deep neural network algorithm
CN105654955A (en) * 2016-03-18 2016-06-08 华为技术有限公司 Voice recognition method and device
CN105654955B (en) * 2016-03-18 2019-11-12 华为技术有限公司 Audio recognition method and device
CN107578097A (en) * 2017-09-25 2018-01-12 胡明建 A kind of design method of more threshold values polygamma function feedback artificial neurons
CN111103568A (en) * 2019-12-10 2020-05-05 北京声智科技有限公司 Sound source positioning method, device, medium and equipment

Also Published As

Publication number Publication date
CN1202494C (en) 2005-05-18

Similar Documents

Publication Publication Date Title
CN112818892B (en) Multi-modal depression detection method and system based on time convolution neural network
WO2020248376A1 (en) Emotion detection method and apparatus, electronic device, and storage medium
Chen et al. A Multi-Scale Fusion Framework for Bimodal Speech Emotion Recognition.
CN1236423C (en) Background learning of speaker voices
CN1296886C (en) Speech recognition system and method
CN1139911C (en) Dynamically configurable acoustic model for speech recognition systems
US20060206333A1 (en) Speaker-dependent dialog adaptation
CN107103903A (en) Acoustic training model method, device and storage medium based on artificial intelligence
CN111916054B (en) Lip-based voice generation method, device and system and storage medium
CN112863529B (en) Speaker voice conversion method based on countermeasure learning and related equipment
CN111329494A (en) Depression detection method based on voice keyword retrieval and voice emotion recognition
CN115641543A (en) Multi-modal depression emotion recognition method and device
CN112507311A (en) High-security identity verification method based on multi-mode feature fusion
KR101542294B1 (en) Image recognition system based on cascaded over-complete dictionaries
Liu et al. Speech emotion recognition based on transfer learning from the FaceNet framework
KR20200088263A (en) Method and system of text to multiple speech
Illium et al. Surgical mask detection with convolutional neural networks and data augmentations on spectrograms
CN1202494C (en) Information identifying processing method based on nervous network
Pal et al. Synthetic speech detection using meta-learning with prototypical loss
KR20190135853A (en) Method and system of text to multiple speech
Liu et al. MSDWild: Multi-modal Speaker Diarization Dataset in the Wild.
CN116758451A (en) Audio-visual emotion recognition method and system based on multi-scale and global cross attention
Vlasenko et al. Fusion of acoustic and linguistic information using supervised autoencoder for improved emotion recognition
Zhang et al. Learning singing from speech
KR102429365B1 (en) System and method for analyzing emotion of speech

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee