WO2006000103A1 - Spiking neural network and use thereof - Google Patents

Spiking neural network and use thereof (Réseau neural impulsionnel et utilisation de celui-ci)

Info

Publication number
WO2006000103A1
WO2006000103A1 (Application PCT/CA2005/001018)
Authority
WO
WIPO (PCT)
Prior art keywords
neurons
recited
layer
neuron
layers
Prior art date
Application number
PCT/CA2005/001018
Other languages
English (en)
Inventor
Jean Rouat
Ramin Pichevar
Original Assignee
Universite De Sherbrooke
Priority date
Filing date
Publication date
Application filed by Universite De Sherbrooke filed Critical Universite De Sherbrooke
Publication of WO2006000103A1 publication Critical patent/WO2006000103A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V10/7515 Shifting the patterns to accommodate for positional errors
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks

Definitions

  • the present invention relates to neural networks. More specifically, the present invention is concerned with a spiking neural network and its use in pattern recognition and in monophonic source separation.
  • Pattern recognition is an aspect of the field of artificial intelligence aiming at providing perceptions to "intelligent" systems, such as robots, programmable controllers, speech recognition systems, artificial vision systems, etc.
  • In pattern recognition, comparison criteria, similarities between shapes, and distances must be computed in order to answer questions such as: "Are these objects similar?"; "Has the system already identified this form?"; "Is this pattern different enough from the other patterns already identified by the system?"; "Is this form to be remembered?"; etc.
  • In a nutshell, pattern recognition systems must use performance and comparison criteria usually assimilated to distances.
  • "Distance" should be construed here as a probability, an error, or a score: a value that can be assimilated to a distance. This type of criterion is widely used, for example: in any rule-based expert system; in statistical Markovian systems; in second-generation (formal) neural network systems; etc.
  • comparing N signals would require two steps: 1. compute a distance on each pair of signals; and 2. find similar signals by sorting and comparing distances.
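  • Below is a minimal, illustrative sketch of this classical two-step procedure (not part of the invention), assuming equal-length signals, a Euclidean distance, and a fixed similarity threshold:

```python
import numpy as np

def group_by_distance(signals, threshold):
    """Classical two-step comparison of N signals."""
    n = len(signals)
    dist = np.zeros((n, n))
    # Step 1: compute a distance on each pair of signals (Euclidean here).
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = np.linalg.norm(signals[i] - signals[j])
    # Step 2: find similar signals by comparing distances to the threshold.
    groups, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        group = [i] + [j for j in range(i + 1, n)
                       if j not in assigned and dist[i, j] < threshold]
        assigned.update(group)
        groups.append(group)
    return groups
```

  • This O(N²) distance computation is precisely what the spike-based representations described below aim to avoid.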
  • any distance between objects can be represented by more or less similar spike timings between neurons, or by a single spike issued by a neuron, resulting from a specific input sequence of spikes. The latter process is called "spike order coding" and is characterized by the existence of couples of excitatory/inhibitory neurons providing recognition of incoming spike sequences from other neurons, after the spike has been generated by the neuron.
  • Synchronization coding occurs when two groups of neurons appear spontaneously because of the plasticity of the neurons' interconnections. Two neurons having similar inputs thus see their mutual synaptic connections grow, causing their outputs to be synchronous; otherwise, when the neurons' inputs are not similar, their mutual synaptic connections decrease, causing them to be desynchronized. In fact, the inputs of two neurons spiking simultaneously are relatively correlated.
  • separation of mixed signals is an important problem with many applications in the context of audio processing. It can be used, for example, to assist a robot in segregating multiple speakers, to ease the automatic transcription of video via the audio tracks, to separate musical instruments before automatic transcription, to clean the signal before performing speech recognition, etc.
  • the ideal instrumental setup is based on the use of an array of microphones during recording to obtain many audio channels. In that situation, very good separation can be obtained between the noise and the signal of interest (see [29], [33], and [50]), and experiments with great improvements have been reported in speech recognition [4], [64]. Further applications have been ported to mobile robots [66], [71], [72] and have also been developed to track multiple speakers [58].
  • the source separation process implies segregation and/or fusion (integration), usually based on correlation, statistical estimation, binding, etc. of features extracted by the analysis module.
  • Monophonic source separation systems can be seen as comprising two main stages: i) signal analysis, to yield a representation suitable for the second stage; and ii) clustering with segregation.
  • bottom-up processing corresponds to primitive processing
  • top-down processing means schema-based processing [15].
  • the auditory cues proposed by Bregman [15] for simple tones are not applicable directly to complex sounds. More sophisticated cues based on different auditory maps are thus desirable.
  • Ellis [13] uses sinusoidal tracks created by the interpolation of the spectral peaks of the output of a cochlear filter bank, while Mellinger's model [41] uses partials.
  • a partial is formed if an activity on the onset maps (the beginning of an energy burst) coincides with an energy local minimum of the spectral maps.
  • Cooke [9] introduced the harmony strands, which is the counterpart of Mellinger's cues in speech.
  • the integration and segregation of streams is done using Gestalt and Bregman's heuristics.
  • Berthommier and Meyer use Amplitude Modulation maps (see [4], [42], [49] and [63]).
  • Gaillard [32] uses a more conventional approach by using the first zero crossing for the detection of pitch and harmonic structures in the frequency-time map. Brown proposes an algorithm [17] based on the mutual exclusivity Gestalt principle.
  • Hu and Wang use a pitch tracking technique [26].
  • Wang and Brown [69] use correlograms in combination with bio-inspired neural networks.
  • Grossberg [37] proposes a neural architecture that implements Bregman's rules for simple sounds. Sameti [58] uses HMMs (Hidden Markov Models), while Roweis [57] and Reyes-Gomez [53] use Factorial HMMs. Jang and Lee [53] use a technique based on the Maximum A Posteriori (MAP) criterion. Another probability-based CASA system is proposed by Cooke [22].
  • Irino and Patterson [30] propose an auditory representation that is synchronous to the glottis and preserves fine temporal information, which makes the synchronous segregation of speech possible.
  • Harding and Meyer [22] use a model of multi-resolution with parallel high-resolution and low-resolution representations of the auditory signal. They propose an implementation for speech recognition.
  • Nix [45] performs a binaural statistical estimation of two speech sources by an approach that integrates temporal and frequency-specific features of speech. It tracks magnitude spectra and direction on a frame-by-frame basis.
  • a well-known and remarkable characteristic of human perception is that the recognition of stimuli is quasi-instantaneous, even though the information propagation speed in living neurons is slow [26], [60], [61]. This implies that neural responses are conditioned by previous events and states of the neural sub-network [71]. Understanding the underlying mechanisms of perception, in combination with that of the peripheral auditory system [11], [17], [23], [73], allows designing an analysis module.
  • novelty detection facilitates autonomy: for example, it can allow robots to detect whether stimuli are new or have already been seen. When associated with conditioning, novelty detection can create autonomy of the system [15], [24].
  • Sequence classification is particularly interesting for speech. Recently Panchev and Wermter [46] have shown that synaptic plasticity can be used to perform recognition of sequences. Perrinet [78] and Thorpe [61] discuss the importance of sparse coding and rank order coding for classification of sequences.
  • Neuron assemblies (groups) of spiking neurons can be used to implement segregation and fusion (integration) of objects in an auditory image representation.
  • correlations (or distances) between signals are implemented with delay lines, products and summations.
  • comparison between signals can be made with spiking neurons without implementation of delay lines. This is achieved by presenting images to spiking neurons with dynamic synapses. Then, a spontaneous organization appears in the network with sets of neurons firing in synchrony. Neurons with the same firing phase belong to the same auditory objects.
  • Milner [43] and Malsburg [67], [68], [69] propose temporal correlation to perform binding; they observed that synchrony is a crucial feature for binding neurons associated with similar characteristics.
  • Pattern recognition robust to noise, symmetry, homothety (size change with angle preservation), etc. has long been a challenging problem in artificial intelligence.
  • Many solutions or partial solutions to this problem have been proposed using expert systems or neural networks.
  • Normalization: in this approach, the analyzed object is normalized to a standard position and size by an internal transformation. Advantages of this approach include: i) the coordinate information (the "where" information) is retrievable at any stage of the processing; and ii) there is a minimum loss of information.
  • the disadvantage of this approach is that the network must find the object in the scene and then normalize it, a task that is not as obvious as it may appear [35], [51].
  • DLM: Dynamic Link Matching
  • blobs may or may not correspond to a segmented region of the visual scene, since their size is fixed over the whole simulation period and is chosen by some parameters in the dynamics of the network [35].
  • the appearance of blobs in the network has been linked by the developers of the architecture to the attention process present in the brain.
  • the dynamics of the neurons used in the original DLM network are not the well-known spiking neuron dynamics.
  • the behavior of neurons in the DLM is based on rate coding (average neuron activity over time) and can be shown to be equivalent to an enhanced dynamic Kohonen map in its Fast Dynamic Link Matching (FDLM) variant [35].
  • FDLM: Fast Dynamic Link Matching
  • the above prior-art systems are supervised/non-autonomous, or include two operating modes: learning and recognition.
  • An object of the present invention is therefore to provide an improved method for monophonic sound separation.
  • Another object of the invention is to provide an improved method for image processing and/or recognition.
  • Another object of the invention is to provide an improved method for pattern recognition.
  • an Oscillatory Dynamic Link Matching algorithm which uses spiking neurons and is based on phase coding.
  • a two-layer neural network is also provided which is capable of doing motion analysis without requiring either the computing of optical flow or additional signal processing between its layers.
  • the proposed neural network can solve the correspondence problem, and at the same time, perform the segmentation of the scene, which is in accordance with the Gestalt theory of perception [21].
  • the proposed neural network based system is very useful in pattern recognition in multiple-object scenes.
  • the proposed network does normalization, segmentation, and pattern recognition at the same time. It is also self-organized.
  • a neural network system comprising: first and second layers of spiking neurons; each neuron from the first layer being configured for first internal connections to other neurons from the first layer or for external connections to neurons from the second layer to receive first extra-layer stimuli therefrom, and for receiving first external stimuli; each neuron from the second layer being configured for second internal connections to other neurons from the second layer or for the external connections to neurons from the first layer to receive second extra-layer stimuli therefrom, and for receiving second external stimuli; and at least one network activity controller connected to at least some of the neurons from each of the first and second layers for regulating the activity of the first and second layers of spiking neurons; whereby, in operation, upon receiving the first and second external stimuli, the first and second internal connections are promoted, and synchronous spiking of neurons from the first and second layers is promoted by the external connections when some of the first external stimuli are similar to some of the second external stimuli.
  • auditory-based features are integrated with an unconventional pattern recognition system, based on a network of spiking neurons with dynamical and multiplicative synapses.
  • the analysis is dynamical and extracts multiple features (and maps), while the neural network does not require any training and is autonomous.
  • a system for monophonic source separation comprising: a vocoder for receiving a sound mixture including at least one monophonic sound source; an auditory image generator coupled to the vocoder for receiving the sound mixture therefrom and for generating an auditory image representation of the sound mixture; a neural network as recited in claim 1, coupled to the auditory image generator for receiving the auditory image representation and for generating a mask in response to the auditory image representation; and a multiplier coupled to both the vocoder and the neural network for receiving the mask from the neural network and for multiplying the mask with the at least one monophonic sound mixture from the vocoder, resulting in the identification of the at least one monophonic source by muting sounds from the sound mixture not belonging to the at least one monophonic source.
  • a method for establishing correspondence between first and second images, each of the first and second images including pixels, the method comprising: providing a neural network including first and second layers of neurons; applying pixels from the first image to respective neurons of the first layer of neurons and pixels from the second image to respective neurons of the second layer of neurons; interconnecting each neuron from the first layer to each neuron of the second layer; performing a dynamic matching between the first and second layers, yielding a temporal correlation between the first and second layers; and using the temporal correlation between the first and second layers for establishing correspondence between the first and second images.
  • does not need explicit rules to create a separation mask: for example, mapping the rules developed by Bregman [6] for simple sounds onto the real world is difficult, so, as long as the aforementioned rules are not derived and well-documented [22], expert systems are difficult to use; does not require a time-consuming training phase prior to the separation phase, contrary to approaches based on statistics such as HMMs [58], Factorial HMMs [53], or MAP [53], which usually do; and is autonomous, as it does not use hierarchical classification.
  • a method for establishing correspondence between first and second sets of data, comprising: providing a neural network including first and second layers of neurons; providing first and second image representations, including pixels, of respectively the first and second sets of data; applying the first and second image representations respectively to the first and second layers; interconnecting each neuron from the first layer to each neuron of the second layer; performing a dynamic matching between the first and second layers, yielding a temporal correlation between the first and second layers; and using the temporal correlation between the first and second layers for establishing correspondence between the first and second sets of data.
  • the building blocks of the proposed architecture are spiking neurons. These neurons are different from conventional neurons used in engineering problems in the way they convey information and perform computations. Each spiking neuron fires "spikes". Thus, the information is transmitted through either spike rate (or frequency of discharge) or the spike timing of each neuron and the relative spike timing between different neurons.
  • the present invention concerns more specifically temporal correlation (phase synchrony between neurons).
  • synchrony among a cluster of neurons means similarity of their external inputs.
  • desynchrony between clusters of neurons means that the underlying data belong to different sources (either audio or visual).
  • the information of the proposed neural network architecture is coded in the neurons synchronization or in the relative timing between spikes from different neurons.
  • the update of the synaptic weights is automatic, so that synapses are dynamic.
  • Characteristics of the method and system according to the present invention include: no learning or recognition phase; information coded in the synchronization or in the relative timing between spikes from different neurons; automatic update of the synaptic weights; dynamic synapses; separation of sound sources in a mixture of audio sources; invariant pattern processing or recognition; no need to develop a specific neural network for each new application, confirming the adaptability of the approach; no distance computation (as in classical methods), because the information needed to perform classification does not include distances; generation of either auditory maps or an adequate visual pattern, depending on the nature of the target application of the system; automatic audio channel selection, in the context of audio processing; and internal connections between neurons acting as competitive or cooperative relations.
  • Figures 1A-1C, which are labeled "Prior Art", are graphs illustrating respectively the behavior of a single third-generation neuron and of a couple of neurons under different stimuli;
  • Figure 2 is a block diagram of a system for monophonic source separation according to an illustrative embodiment of the present invention, including a spiking neural network according to a first illustrative embodiment of the present invention
  • Figure 3 is a schematic view of the neural network from Figure 2;
  • Figure 4 is a flow chart illustrating a method for monophonic source separation according to an illustrative embodiment of the present invention
  • Figure 5 is a spectrogram illustrating the mixture of the utterance "Why were you all weary?" with a trill telephone noise
  • Figure 6A is a spectrogram illustrating a synthesized version of the utterance from Figure 5, "Why were you all weary?", after the separation using the method from Figure 4;
  • Figure 6B which is labeled "Prior Art", is an image illustrating a synthesized version of the utterance from Figure 5 as obtained using a system from the prior art;
  • Figure 7 is a spectrogram illustrating a synthesized version of a trill phone after the separation using the method from Figure 4;
  • Figure 8 is a spectrogram illustrating the utterance "I willingly marry Marilyn” with a 1 kHz pure tone
  • Figures 9A-9B are spectrograms illustrating the separation results using respectively the method from Figure 4 and the approach proposed by Wang and Brown; Figure 9B being labeled "Prior Art";
  • Figure 10 is a spectrogram illustrating the mixture of the utterance "I willingly marry Marilyn" with a siren;
  • Figure 11 is a spectrogram illustrating the separated siren obtained using the method from Figure 4;
  • Figure 12 is a spectrogram of the utterance from Figure 10 separated
  • Figure 13 is a block diagram illustrating a pattern recognition system according to an illustrative embodiment of the present invention.
  • Figure 14 is a schematic view illustrating an example of image to be processed by the system from Figure 13;
  • Figure 15 is a schematic view illustrating a neural network according to a second illustrative embodiment of the present invention.
  • Figure 16 is a flowchart of a method for establishing correspondence between first and second images according to an illustrative embodiment of the present invention
  • Figure 17 is a schematic view illustrating an affine transform T for a four-corner object
  • Figures 18A and 18B are images illustrating the activity of respectively the first and second layers of the neural network using the system from Figure 13 when two bars are presented to the neural network;
  • Figures 19A and 19B are graphs showing respectively the activity of one of the neurons associated with the vertical bar from Figure 18B in the first layer of the neural network from Figure 15 after the segmentation steps from Figure 16, and the activity associated with the background in the same layer;
  • Figures 20A and 20B are graphs showing respectively the activity of one of the neurons associated with the horizontal bar from Figure 18A in the first layer of the neural network from Figure 15 after the dynamic matching step from Figure 16, and the activity of one of the neurons associated with the vertical bar from Figure 18B in the second layer of the neural network from Figure 15 after the dynamic matching step from Figure 16;
  • Figure 21 is a graph illustrating the evolution of the thresholded activity of the network from Figure 15 through time in the segmentation phase from Figure 16 considering the images from Figures 18A-18B; each vertical rod representing a synchronized ensemble of neurons and the vertical axis representing the number of neurons in that synchronized region;
  • Figure 22 is a graph illustrating the evolution of the thresholded activity through time of the network from Figure 15, in the dynamic matching phase from Figure 16, considering the images from Figures 18A-18B;
  • Figure 23 is a graph illustrating the synchronization index of a one-object scene when the segmentation steps from Figure 16 are bypassed, the synchronization taking 85 oscillations;
  • Figure 24 is a graph illustrating the synchronization index of a one-object scene when the segmentation steps from Figure 16 precede the matching phase, the synchronization taking 155 oscillations;
  • Figure 25 is an image illustrating the synchronization phase from the method from Figure 16, binary masks being generated by assigning binary values to different oscillation phases.
  • a system 10 for monophonic source separation according to an illustrative embodiment of the present invention will now be described with reference to Figure 2.
  • the system 10 includes a spiking neural network 12 according to a first illustrative embodiment of the present invention.
  • the system 10 allows separating a plurality of different monophonic sources blended in a sound mixture 14 provided as an input to the system 10.
  • the system 10 is in the form of a bottom-up CASA system which intends to separate different sound sources. System 10 allows separating two, three, or more sound sources.
  • the left branch in Figure 2 provides analysis/synthesis of the sound source in many sub-bands or channels. This separation is achieved by a double vocoder, in the form of FIR Gammatone filter banks 24, following the psychoacoustic cochlear frequency distribution.
  • the system 10 further comprises an auditory image generator 15, including a CAM (Cochleotopic/AMtopic Map) generator 16 and a CSM (Cochleotopic/Spectrotopic Map) generator 18; the two-layered spiking neural network 12, for receiving and processing a map outputted by the auditory image generator 15 and for providing a binary mask 22 based on the neural synchrony 20 in the output of the neural network; means 26 for multiplying the binary mask 22 with the output of the FIR Gammatone synthesis filter bank 24; and an integrator 28 to sum up the channels.
  • An FIR implementation of the well-known Gammatone filter bank is used as the analysis/synthesis filter bank.
  • the use of the Gammatone filter bank allows obtaining the properties of the audition as observed in the psychoacoustics field.
  • the resulting number of channels is 256, with center frequencies from 100 Hz to 3600 Hz uniformly spaced on an ERB scale (Equivalent Rectangular Bandwidth scale, a psychoacoustic critical-band scale), and with a sampling rate of 8 kHz.
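  • As an illustration of this channel layout, the sketch below computes center frequencies uniformly spaced on the ERB-rate scale using the common Glasberg-Moore approximation (an assumption; the patent does not specify which ERB formula is used):

```python
import numpy as np

def erb_space(f_low=100.0, f_high=3600.0, n_channels=256):
    """Center frequencies uniformly spaced on the ERB-rate scale."""
    erb_rate = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)       # Hz -> ERB number
    inv_erb_rate = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437  # ERB number -> Hz
    return inv_erb_rate(np.linspace(erb_rate(f_low), erb_rate(f_high), n_channels))

center_freqs = erb_space()  # 256 values from 100 Hz to 3600 Hz
```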
  • the actual time-varying filtering is done by the mask 22.
  • once this mask 22 is obtained by grouping synchronous oscillators of the neural net, the output of the synthesis filter bank 24 is multiplied with it.
  • auditory channels belonging to interfering sound sources are muted, while channels belonging to the sound source of interest remain unaffected. This is, in some way, equivalent to labeling cochlear channels for each time frame.
  • a value of one is associated to the targeted signal and a value of 0 to the interfering signal, yielding a binary mask.
  • a non- binary continuous mask can also be used.
  • before the signals of the masked auditory channels are added to form the synthesized signal, they are passed through the synthesis filters, whose impulse responses are time-reversed versions of the impulse responses of the corresponding analysis filters.
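  • A minimal sketch of this masking-and-synthesis step is given below; the function names are illustrative, and the mask is assumed constant over the processed block (in the system it is updated per time frame):

```python
import numpy as np
from scipy.signal import lfilter

def masked_synthesis(channels, analysis_irs, mask):
    """Mute interfering channels, filter the kept ones with the time-reversed
    analysis impulse responses (the synthesis filters), and sum them.
    channels: (n_channels, n_samples); analysis_irs: list of FIR impulse
    responses; mask: binary vector, one entry per channel."""
    out = np.zeros(channels.shape[1])
    for x, h, keep in zip(channels, analysis_irs, mask):
        if keep:  # mask == 1: channel attributed to the source of interest
            out += lfilter(h[::-1], [1.0], x)  # FIR filtering with reversed h
    return out
```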
  • the vocoder may take other forms including Linear Predictive Coding Vocoder, Fourier transform and any other transformation allowing analysis/synthesis.
  • the number of channels, the sampling rate and the center frequency values may differ without departing from the spirit and nature of the present invention.
  • the system 10 comprises an auditory image generator 15 to simultaneously generate two different image representations of the signals provided by the filter bank 24: a first representation, in the form of the well-known amplitude modulation map, which will be referred to herein as the Cochleotopic/AMtopic (CAM) map, closely related to modulation spectrograms as defined in [12] and [42]; and a second representation, in the form of the well-known Cochleotopic/Spectrotopic (CSM) map, which encodes the averaged spectral energies of the cochlear filter bank output.
  • CAM: Cochleotopic/AMtopic map
  • CSM: Cochleotopic/Spectrotopic map
  • the CAM map is an amplitude modulation representation of cochlear envelopes, while the CSM is a cochlear energy representation (an energy-frequency distribution following the Gammatone filter bank structure). CAM and CSM generators can be added to suit each chosen application.
  • the generator 15 is configured so that, in operation, depending on the nature of the intruding sound (speech, music, noise, etc.) one of the maps is selected.
  • the generator 15 is programmed with the following CAM/CSM generation algorithm: 1. down-sample to 8000 Hz; 2. filter the sound source using a 256-dimensional bark-scaled cochlear filter bank ranging from 200 Hz to 3.6 kHz; 3. for the CAM, extract the envelope (AM demodulation) for channels 30-256 and use the raw outputs for the low-frequency channels (1-29) [56]; for the CSM, nothing is done in this step; 4. compute the STFT (Short-Time Fourier Transform) using a Hamming window, which is well known in the art (alternatively, non-overlapping adjacent windows with 4 ms or 32 ms lengths, for example, can also be used); 5.
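  • A hedged sketch of CAM generation is shown below; the Hilbert envelope is one common choice for the AM demodulation of step 3 (the patent does not mandate it), and the returned per-channel magnitude spectra stand in for the cochleotopic/AM-topic representation:

```python
import numpy as np
from scipy.signal import hilbert, stft

def cam_map(cochlear_out, fs=8000, win_ms=32):
    """cochlear_out: (256, n_samples) array of cochlear filter outputs."""
    env = np.abs(hilbert(cochlear_out[29:], axis=1))  # channels 30-256: envelopes
    low = cochlear_out[:29]                           # channels 1-29: raw outputs
    sig = np.vstack([low, env])
    nper = int(fs * win_ms / 1000)                    # e.g. 32 ms windows
    _, _, Z = stft(sig, fs=fs, window='hamming', nperseg=nper)
    return np.abs(Z)                                  # per-channel short-time spectra
```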
  • while the generator 15 as illustrated generates only CAM or CSM maps, other maps can alternatively or additionally be generated, such as a map facilitating the separation of speakers whose glottises have similar frequencies.
  • a neural network 12 according to a first illustrative embodiment of the present invention will now be described in more detail with reference to Figure 3.
  • the neural network 12 comprises first and second layers 30 and 32 of oscillatory spiking neurons 36 and 38, respectively, and first and second global controllers 34 (only one shown).
  • the second layer 32 is one-dimensional.
  • the number of neurons 36-38 shown in Figure 3 does not correspond to the real number of neurons, which is of course greater.
  • the dynamics of the oscillatory neurons 36-38 are governed by a modified version of the Van der Pol relaxation oscillator (called the Wang-Terman oscillator [69]): there is an active phase, when the neuron spikes, and a relaxation phase, when the neuron is silent. More details on oscillatory neurons are provided in [70].
  • the neural network 12 is configured so that each neuron 36 from the first layer 30 allows for internal connections 37 to other neurons 36 from the first layer 30 or for external connections 39 to neurons 38 from the second layer 32 to receive extra-layer stimuli therefrom and for receiving external stimuli from external signals.
  • Each neuron 38 from the second layer 32 likewise allows for internal connections 41 to other neurons 38 from the second layer 32 or for the external connections 39 to neurons 36 from the first layer 30 to receive extra-layer stimuli therefrom, and for receiving second external stimuli from the second external signals.
  • the neurons 36-38 of the first and second layers 30-32 are connected to a global controller 34, which allows for synchronization and desynchronization of different regions of the network 12.
  • the global controller 34 can be substituted by any network activity regulator allowing the activity of the network to be regulated.
  • the global controller 34 acts as a local inhibitor.
  • the first layer 30 performs a segmentation of the auditory map.
  • the dynamics of the neurons follow the state-space equations of the Wang-Terman oscillator below, where $x_{i,j}$ is the membrane potential (output) of the neuron and $y_{i,j}$ is the state for channel activation or inactivation: $\frac{dx_{i,j}}{dt} = 3x_{i,j} - x_{i,j}^{3} + 2 - y_{i,j} + \rho + p^{ext}_{i,j} + S_{i,j}$ (1) and $\frac{dy_{i,j}}{dt} = \epsilon\left[\gamma\left(1 + \tanh(x_{i,j}/\beta)\right) - y_{i,j}\right]$ (2), where $\rho$ denotes the amplitude of a Gaussian noise, $p^{ext}_{i,j}$ the external input to the neuron, and $S_{i,j}$ the coupling from connected neurons.
  • the Euler integration method is used to solve Equations 1 and 2.
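  • The sketch below shows one Euler step of Equations 1 and 2 as reconstructed above; the parameter values are illustrative (the patent's actual constants are not recoverable from this text):

```python
import numpy as np

def wang_terman_euler_step(x, y, p_ext, S, dt=0.005,
                           eps=0.02, gamma=6.0, beta=0.1, rho=0.02):
    """One Euler step for a grid of oscillators.
    x: membrane potentials, y: activation states, p_ext: external inputs,
    S: coupling from connected neurons; eps, gamma, beta are oscillator
    constants and rho scales the Gaussian noise."""
    noise = rho * np.random.randn(*np.shape(x))
    dx = 3.0 * x - x ** 3 + 2.0 - y + p_ext + S + noise   # Eq. (1)
    dy = eps * (gamma * (1.0 + np.tanh(x / beta)) - y)    # Eq. (2)
    return x + dt * dx, y + dt * dy
```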
  • the first layer is a partially connected network of relaxation oscillators [69]. Each neuron is connected to its four neighbors.
  • the CAM 16 (or the CSM 18) is applied to the input of the neurons 36. It has been found that the geometric interpretation of pitch (ray distance criterion) is less clear for the first 29 channels. For this reason, long-range connections 40 (only one shown) have also been established from clear (high frequency) zones to confusion (low frequency) zones. These connections exist only across the cochlear channel number axis of the CAM. This architecture can help the network 12 to better extract harmonic patterns.
  • the weight between neuron (i,j) and neuron (k,m) of the first layer is computed via the following formula:
  • p(i,j) and p(k,m) are respectively the external inputs to neuron(i,j) and to neuron(k,m) ∈ N(i,j).
  • Card{N(i,j)} is a normalization factor and is equal to the cardinal number (number of elements) of the set N(i,j) containing the neighbors connected to neuron(i,j); it can be equal to 4, 3 or 2, depending on the location of the neuron on the map (center, corner, etc.).
  • the external input values are normalized.
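  • Since the weight formula itself is lost in this text, the sketch below only captures the stated behavior: a coupling that decreases as the external inputs differ, normalized by Card{N(i,j)}; the functional form and the constants are hypothetical:

```python
def first_layer_weight(p_ij, p_km, card_n, w_max=1.0, lam=1.0):
    """Hypothetical neighbor coupling for the first layer: strong when the
    normalized external inputs p(i,j) and p(k,m) are similar, shared among
    the Card{N(i,j)} connected neighbors."""
    return w_max / (card_n * (1.0 + lam * abs(p_ij - p_km)))
```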
  • it is equal to 1 if the global activity of the network is greater than a predefined threshold and is zero otherwise; the two other symbols are constants.
  • L_{i,j}(t) is the long-range coupling, defined as follows:
  • the first layer 30 is designed to handle presentations of auditory maps to process continuous sliding and overlapping windows on the signal.
  • Second layer temporal correlation and multiplicative synapses
  • the second layer 32 performs temporal correlation between neurons, each of which represents a cochlear channel of the analysis/synthesis filter bank. For each presented auditory map, the second layer 32 establishes binding between neurons whose input is dominated by the same source. The external connections establish multiplicative synapses with the first layer 30.
  • the second layer is an array of 256 neurons (one for each channel) similar to those described by Equations 1 and 2. Each neuron 38 receives the weighted product of the outputs of the first layer neurons along the frequency axis of the CAM/CSM.
  • the operator Π is defined as:
  • ⟨·⟩ is the averaging-over-a-time-window operator (the duration of the window is on the order of the discharge period).
  • the multiplication is done only for non-zero inputs (outputs of the first layer 30 in which a spike is present) [19], [47]. It is to be noted that this behavior has been observed in the integration of ITD (Interaural Time Difference) and ILD (Inter Level Difference) information in the barn owl's auditory system [19], and in the monkey's posterior parietal lobe neurons, which show receptive fields that can be explained by a multiplication of retinal and eye or head position signals [1].
  • ITD: Interaural Time Difference
  • ILD: Inter Level Difference
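  • A sketch of this multiplicative input is given below, under the assumptions that the first-layer outputs have already been averaged over the time window and that the product runs only over active (spiking) entries; the exact weighted-product formula is not recoverable from this text:

```python
import numpy as np

def multiplicative_input(avg_outputs, weights):
    """avg_outputs: time-averaged first-layer outputs along the frequency
    axis for one cochlear channel; weights: the corresponding synapses.
    Returns the weighted product restricted to non-zero (spiking) entries."""
    active = avg_outputs > 0.0
    if not np.any(active):
        return 0.0  # no spikes: no multiplicative drive
    return float(np.prod(weights[active] * avg_outputs[active]))
```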
  • the synaptic weights inside the second layer 32 are adjusted through the following rule:
  • the parameter in this rule is chosen to be equal to 2.
  • the "binding" of these features is done via this second layer 32.
  • the second layer 32 is an array of fully connected neurons 38 along with a global controller defined as in Equations 5 and 6.
  • the global controller desynchronizes the synchronized neurons 38 for the first and second sources by emitting inhibitory activities whenever there is activity (spikes) in the network [69].
  • the selection strategy at the output of the second layer 32 is based on temporal correlation: neurons belonging to the same source synchronize (same spiking phase), while neurons belonging to the other source desynchronize (different spiking phase).
  • a method 100 for monophonic source separation according to an illustrative embodiment of the present invention is illustrated in Figure 4.
  • the plasticity of the synaptic connections is a dynamic process which provides on-demand self-modification of the neural network topology.
  • the present invention uses this neural network auto-organization feature by the synchronization/desynchronization of the cells, forming or dismantling groups of neurons.
  • the neural network 12 is independent from the signal representation used, but this representation is related to the chosen application.
  • For example, when two speakers are talking simultaneously with similar glottis fundamental frequencies, which makes sound source separation more difficult, the instantaneous frequency is suitable in order to detect each glottis opening. In that case, we know that a listener would have great difficulty separating the two voices.
  • Some examples [14], [15], [16] demonstrate that listeners confuse the speakers when the transmission channel affects the transmitted voice. Even if the input representation presented to the network is different, the approach stands, because the network is not affected by this change.
  • a binary mask 22 is generated from the output of the neural network 12 associating zeros and ones to different channels in order to preserve or remove each sound source.
  • the mask 22 allows attributing each of the channels to a respective source.
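  • A minimal sketch of this channel-attribution step follows, grouping channels whose neurons fire with the same phase; the use of last spike times and a tolerance is an illustrative stand-in for the actual phase comparison:

```python
import numpy as np

def masks_from_phases(last_spike_times, tol=1e-3):
    """Build one binary mask per source: channels whose neurons last fired
    within tol of each other are attributed to the same source."""
    n = len(last_spike_times)
    masks, unassigned = [], set(range(n))
    while unassigned:
        ref = min(unassigned)
        group = {c for c in unassigned
                 if abs(last_spike_times[c] - last_spike_times[ref]) < tol}
        mask = np.zeros(n)
        mask[sorted(group)] = 1.0  # channels of one synchronized ensemble
        masks.append(mask)
        unassigned -= group
    return masks
```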
  • the energy is normalized.
  • although the system 10 has been described with reference to two-source mixtures, the present method and system for monophonic source separation can also be used for more than two sources. In that case, for each time frame n, the labeling of individual channels is equivalent to the use of multiple masks (one for each source).
  • Cooke's database [10] is used for evaluation purposes.
  • the following noises have been tested: 1 kHz tone, FM siren, white noise, trill telephone noise, and human speech.
  • the aforementioned noises have been added to the target utterance.
  • Each mixture is applied to the neural system 10 and the sound sources are extracted.
  • the LSD (Log Spectral Distortion) [64], [65] is used as the performance criterion, as defined below:
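  • The defining display is lost in this text; the standard definition of the LSD over M frames and K frequency bins, which is presumably what is intended here, is:

$$\mathrm{LSD} = \frac{1}{M} \sum_{m=1}^{M} \sqrt{\frac{1}{K} \sum_{k=1}^{K} \left( 10 \log_{10} \frac{|S(m,k)|^{2}}{|\hat{S}(m,k)|^{2}} \right)^{2}}$$

where $S(m,k)$ and $\hat{S}(m,k)$ are the short-time spectra of the reference and separated signals.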
  • Table 1 gives the LSD performances.
  • the method 100 outperforms the other two systems for the tone-plus-utterance case, performs better than the system proposed by Wang and Brown [69] in all cases, gives performance similar to that of the system from Hu and Wang [27] for the siren, but performs worse than the system from Hu and Wang [27] for the white noise and the telephone ring. For the double-vowel case, tests are not available for the other two approaches [27], [69].
  • SNR-like criteria such as the SNR, Segmental SNR, PEL (Percentage of Energy Loss), and PNR (Percentage of Noise Residue) are used in the literature (see for example [26], [27], [28], [35], [48], [55], and [69]) and can be used as performance scores. In what follows, spectrograms for different sounds and different approaches are given for visual comparison purposes.
  • the log spectral distortion is given for three different methods: the method according to the present invention as described hereinabove; W-B (the method proposed by Wang and Brown [69]); and H-W (the method proposed by Hu and Wang [27]).
  • the intrusion noises are as follows: 1 kHz pure tone, FM siren, telephone ring, white noise, the male intrusion (/di/) for the French /di//da/ mixture, and the female intrusion (/da/) for the French /di//da/ mixture. Except for the last two tests, the intrusion is mixed with a sentence taken from Cooke's database.
  • Figure 5 shows the mixture of the utterance "Why were you all weary?" with the telephone trill noise (from Cooke's database).
  • the trill telephone noise (ring) is wideband, interrupted, and structured.
  • Figures 6A-6B show the separated utterance spectrograms obtained using respectively the method 100 and the one proposed by Wang and Brown [69]. As can be seen, the method 100 yields better results in the higher frequencies.
  • Figure 7 shows the extracted telephone trill.
  • Figure 10 shows the mixture of the utterance "I willingly marry Marilyn" with a siren.
  • the siren is a locally narrowband, continuous, structured signal.
  • Figure 11 shows the separated siren obtained using the method 100.
  • Figure 12 shows the spectrogram of the separated utterance.
  • although criteria like the PEL and PNR, the SNR, the Segmental SNR, the LSD, etc. are used in the literature as performance criteria, they do not always reflect exactly the real perceptive performance of a given sound separation system.
  • the SNR, the PEL and the PNR ignore high-frequency information.
  • the LSD does not take into account some temporal aspects, such as phase distortion: the LSD result for two techniques with different phases would be the same, and the LSD will therefore not detect phase distortions in the separation results.
  • The method and system for monophonic source separation have many applications, including multimedia file indexation and authentication: multimedia files on the Internet or other media must be indexed before someone can launch queries on their content ("Who is the singer?", "Which song does he sing?", etc.). The method and system of the present invention can separate multiple sources to ease indexation. Also, since the present invention is suitable for comparison purposes, musical file authentication can either be integrated into a file indexation system or simply used as a stand-alone application; for example, in peer-to-peer file sharing systems, files may be renamed by users who want to share them illegally.
  • Other applications include: combining an auditory mask generated from camera images with the filter bank to create audio stimuli from the visual scene; an intelligent helmet to be used, for example, in high-risk industrial areas where noise levels are high; and scene analysis for visually impaired persons, where sounds can be used to help blind people analyze visual scenes.
  • different colors, textures, and objects are associated with different sound characteristics (frequency, duration, etc.). If there are many objects in the scene, the analysis of the corresponding sound mixture will become difficult for the subject.
  • a method and system for monophonic source separation can be used to separate different visual scene objects by separating their equivalent auditory objects.
  • the system 50 includes a spiking neural network 52 according to a second illustrative embodiment of the present invention.
  • the system 50 does not include preprocessors upstream from the neural network 52, since the image to compare 54 and the reference image 56 are both provided to the neural network 52 as external inputs. More specifically, each pixel of each image 54-56 is associated with respective neurons of a respective layer 62-64 of the neural network 52, as will be described hereinbelow in more detail.
  • the synaptic weights are provided by the grey scales or colors associated with the neurons.
  • the images 54-56 can be for example post-processed images from an external camera (not shown).
  • the source of the images 54-56 may of course vary without departing from the spirit and nature of the present invention.
  • An example of an image to be inputted to a layer is illustrated in Figure 14.
  • the neural network 52 will now be described in more detail with reference to Figure 15. Since the spiking neural network 52 is similar to the neural network 12, for concision purposes only the differences will be described herein in more detail.
  • the neural network 52 includes first and second layers 62 and 64 of spiking neurons 66 and 68.
  • a neighborhood of 4 is chosen in each layer 62 or 64 for the connections.
  • Each neuron 66 in the first layer 62 is connected to all neurons 68 in the second layer 64 and vice-versa.
  • the number of neurons 66-68 shown in Figure 15 does not correspond to the real number of neurons, which is greater.
  • the neighborhood can be set to other numbers depending on the applications.
  • a network activity regulator in the form of a global controller 70, is connected to all neurons 66-68 in the first and second layers 62 and 64 as in [7].
  • the global controller 70 has bidirectional connections to all neurons 66-68 in the two layers 62-64.
  • segmentation is done in the two layers 62 and 64 independently (with no extra-layer connections), while dynamic matching is done with both intra-layer and extra-layer couplings.
  • the intra- layer and extra-layer connections are defined as follows:
  • Card{N^int(i,j)} is a normalization factor and is equal to the number of neighbors connected to neuron(i,j); it can be equal to 4, 3 or 2, depending on the location of the neuron on the map (center, corner, etc.) and on the number of active connections.
  • Card{N^ext(i,j)} is the cardinal number for the extra-layer connections 76 and is equal to the number of neurons in the second layer 64 with an active connection to neuron(i,j) 66 in the first layer 62.
  • the normalization in Equation 14 allows correspondence between similar pictures of different sizes. If the aim is to match only objects of exactly the same size, the normalization factor is set to a constant for all neurons 66-68. The reason is that, with normalization, even if the size of the pattern in the second layer 64 were double that of the same object in the first layer 62, the total influence on neuron(i,j) would be the same as if the pattern were of the same size.
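  • The extra-layer weight formula (Equation 14) is not recoverable from this text; the sketch below only reproduces the stated behavior, i.e., a coupling that grows with pixel similarity and is divided by the number of active extra-layer connections so that the total influence is size-invariant (set card_ext to a constant to match same-size objects only); the functional form is hypothetical:

```python
def extra_layer_weight(h_ij, h_kl, card_ext, w_max=1.0, lam=1.0):
    """Hypothetical extra-layer coupling between neuron (i,j) of one layer
    and neuron (k,l) of the other, based on their pixel values h."""
    return w_max / (card_ext * (1.0 + lam * abs(h_ij - h_kl)))
```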
  • the network 52 can have two different behavioral modes: segmentation and matching.
  • In the segmentation stage, there is no connection between the two layers 62 and 64.
  • the two layers 62-64 act independently (except for the influence of the global controller 70) and segment the two images 54 and 56 applied to the two layers 62 and 64, respectively.
  • the global controller 70 forces the segments on the two layers 62-64 to have different phases.
  • the two images are segmented but no two segments have the same phase (see Figures 19A-19B).
  • the results from segmentation are used to create binary masks 58 and 60 that select one object in each layer 62 and 64 in multi-object scenes.
  • a snapshot, obtained at a specific time t like the one shown in Figure 25, is used to create the binary mask m(i,j) for one of the objects, as follows:
  • x_sync is the synchronized value that corresponds to either the cross or the rectangle in Figure 25 at time t_sync.
  • at a different time t_j, the mask m(i,j) is different and corresponds to a different object than the one at time t_i.
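  • A one-line sketch of this mask extraction, assuming a tolerance around the synchronized value:

```python
import numpy as np

def snapshot_mask(x, x_sync, tol=1e-2):
    """m(i,j) = 1 where neuron (i,j) fires at the synchronized value x_sync
    (one object at time t_sync), 0 elsewhere."""
    return (np.abs(x - x_sync) < tol).astype(np.uint8)
```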
  • $G(t) = \alpha H(z - \theta)$ (17)
  • per Equation 18, the driving term of z is equal to 1 if the global activity of the network is greater than a predefined threshold and is zero otherwise.
  • the inputs to the layers are defined by:
  • Extra-layer connections 76 (Equation 2) are then established. If there are similar objects in the two layers 62-64, these extra-layer connections 76 will help them synchronize; in other words, the two segments are bound together through these extra-layer connections 76 [123]. In order to detect synchronization, double-thresholding can be used [2]. This stage may be seen as a folded oscillatory texture segmentation device, like the one proposed in [70].
  • the coupling strength S_{i,j} for each layer in the matching phase is defined as follows:
  • x^ext denotes action potentials from the external connections and x^int denotes action potentials from said first and second internal connections.
  • Figure 16 summarizes the general steps of a method 200 for establishing correspondence between first and second images according to an illustrative embodiment of the present invention. It is to be noted that steps 208-210 are not necessarily sequential.
  • p' = Ap + t, where A is a 2×2 non-singular matrix, p ∈ R² is a point in the plane, p' is its affine transform, and t is the translation vector.
  • Affine transformation is a combination of several simple mappings such as rotation, scaling, translation, and shearing.
  • the similarity transformation is a special case of the affine transformation: it preserves length ratios and angles, while the affine transformation in general does not.
  • Equation 22 is equivalent to the following (neglecting the effect of intra-layer connections, since $N^{ext} \gg N^{int}$):
  • $N^{ext} \propto A_{\Delta}(abc) + A_{\Delta}(abd)$ (23), where $A_{\Delta}(\cdot)$ denotes the area of the triangle formed by the corresponding points.
  • the extra-layer connections 76 are thus independent of the affine transform that maps the model to the scene (first and second layer objects), and this result can be extended to more than 4 points.
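  • A short check of this invariance claim, using the reconstruction of Equation 23 above: under $p' = Ap + t$, every triangle area transforms as $A_{\Delta}(a'b'c') = |\det A| \, A_{\Delta}(abc)$, so the drive $N^{ext}$ of Equation 23 is scaled by the single common factor $|\det A|$; after the normalization of Equation 14, this common factor cancels, leaving the matching unaffected by the affine transform.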
  • the original DLM is a rate coding approximation of the ODLM (Oscillatory Dynamic Link Matching) according to the present invention.
  • Aonishi et al. [3] have shown that a canonical form of rate-coding dynamic equations solves the matching problem in the mathematical sense.
  • the dynamics of a neuron in one of the layers of the original Dynamic Link Matcher proposed in [35] is as follows:
  • Equation 28 then becomes:
  • Equation 29 can be further simplified to:
  • in Equation 31, the averaged output ⟨x^out⟩ of an integrate-and-fire neuron is related to the averaged-over-time inputs ⟨x^in⟩ of the neuron by a continuous function (sigmoidal, etc.).
  • considering that in Equation 31 one needs ⟨x^out⟩ as a function of ⟨x^in⟩, and that Equation 32 is a set of linear equations in ⟨x^in⟩, Equation 35 follows.
  • in Equation 35, k(.) is a 2-D rectangular window (in the original DLM, k(.) was chosen to be the well-known Mexican hat).
  • the DLM is an averaged-over-time approximation of the ODLM according to the present invention.
  • the network 52 can be used to solve the correspondence problem. For example, consider that, in a factory chain, someone wants to check the existence of a component on an electronic circuit board (see for example Figure 14): all this person has to do is put an image of the component on the first layer and check for synchronization between the layers. Ideally, any change in the angle or the location of the camera, or even in the zoom factor, should not influence the result.
  • One of the signal processing counterparts of the method and system of the present invention is morphological processing. Other partial solutions, such as the Fourier transform, could be used to perform matching robust to translation.
  • a method and system according to the present invention does not require training or configuration according to the stimulus applied.
  • the network 52 is autonomous and flexible with respect to previously unseen stimuli. This is in contrast with associative-memory-based architectures, in which a stimulus must be applied and saved into memory before retrieval (as in [66] for example). It does not require any pre-configured architecture adapted to the stimulus, as in the hierarchical coding paradigm [52].
  • Figures 18A-18B show activity snapshots (instantaneous values of x(i,j)) in the two layers 62-64 after the segmentation step 206. Neurons with the same gray scale in Figures 18A-18B have similar phases. On the other hand, different segments on different layers are desynchronized (Figures 19A-19B and 20A-20B).
  • the segmentation step 206 can be bypassed and the network 52 can function directly in the matching mode, which speeds up the pattern recognition process.
  • Figures 23-24 illustrate the behavior of a 13x5 network when only one object is present in each layer 62-64, showing that the synchronization time for the matching-only network is shorter. It is to be noted that the matching-only approach is inefficient when there are multiple objects in the scene; in that case, the segmentation-plus-matching approach should be used.
  • the network 52 is capable of establishing correspondence between images and is robust to translation, rotation, noise and homothetic transforms.
  • although the method 200 has been described as a means to segment images, it can also be used to solve the correspondence problem, as a whole system, using a two-layered oscillatory neural network.
  • Applications of the system 50 include: electronic circuit assembly, where the system 50 can be used to verify whether or not all the electronic components are present (and in good condition) on a PCB (Printed Circuit Board); facial recognition, where the technique can be applied by comparing a given face to a database of faces, in customs houses for example; fault detection in a production chain, where the invention can be used to find manufacturing errors in an assembly chain; and teledetection, where it can be used to find objects or changes in satellite images and to assist in the automatic generation of maps.
  • many types of neurons can be implemented, including integrate-and-fire neurons, relaxation oscillators, or chaotic neurons, even though in the examples detailed hereinabove only relaxation oscillators are used.
  • the neural networks 12 and 52 have been implemented with a SIMULINK™ spiking neural network simulation library, in Java, and in C++ (three different simulators can be used).
  • a neural network according to the present invention can be implemented on other platforms.
  • The neural network architecture according to the present invention is well suited to the control of event-driven and adaptive event-driven processes and to the control of robots, for example. For instance, it can be used to control sensorimotor parts of robots through bio-inspired (spiking neural) networks, or for the fly-by-wire design of aircraft.
  • a plurality of interconnected spiking neural networks can manage and control sensors, peripherals, vision, etc.
  • Frisina, R. D., Smith, R. L., Chamberlain, S. C., 1985. Differential encoding of rapid changes in sound amplitude by second-order auditory neurons. Experimental Brain Research 60, pp. 417-422.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to audio and image processing systems using a bio-inspired neural network. The first system separates a specific sound from a mixture of audio sources. The second system performs processing and recognition of a visual pattern in a way that is robust to affine transforms and noise. The neural network system comprises first and second layers of spiking neurons, individually configured for first and second internal connections, respectively, to other neurons of the same layer, or for external connections to neurons of the other layer, in order to receive extra-layer stimuli therefrom and to receive external stimuli from external signals; and global controllers connected to all the neurons so as to allow inhibiting them. In operation, upon reception of the stimuli of the first and second external signals, the internal connections are stimulated, and synchronous spiking of neurons of the first and second layers is promoted by the external connections when some of the stimuli of the first external signals are similar to some stimuli of the second external signals. The neural network does not need to be tuned when the nature of the signals changes. Moreover, the neural network according to the invention is autonomous, and there is no training or recognition phase.
PCT/CA2005/001018 2004-06-29 2005-06-29 Reseau neural impulsionnel et utilisation de celui-ci WO2006000103A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CA2,472,864 2004-06-29
CA2472864 2004-06-29

Publications (1)

Publication Number Publication Date
WO2006000103A1 true WO2006000103A1 (fr) 2006-01-05

Family

ID=35781549

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2005/001018 WO2006000103A1 (fr) 2004-06-29 2005-06-29 Reseau neural impulsionnel et utilisation de celui-ci

Country Status (1)

Country Link
WO (1) WO2006000103A1 (fr)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6866841B2 (en) 2001-08-09 2005-03-15 Epatentmanager.Com Non-endocrine disrupting cytoprotective UV radiation resistant substance
EP1964036A1 (fr) * 2005-12-23 2008-09-03 Université de Sherbrooke Spatio-temporal pattern recognition using a spiking neural network and processing thereof on a portable and/or distributed computer
US8515885B2 (en) 2010-10-29 2013-08-20 International Business Machines Corporation Neuromorphic and synaptronic spiking neural network with synaptic weights learned using simulation
US8626495B2 (en) 2009-08-26 2014-01-07 Oticon A/S Method of correcting errors in binary masks
CN103886395A (zh) * 2014-04-08 2014-06-25 河海大学 Reservoir optimal operation scheduling method based on a neural network model
US20150235125A1 (en) * 2014-02-14 2015-08-20 Qualcomm Incorporated Auditory source separation in a spiking neural network
CN105229675A (zh) * 2013-05-21 2016-01-06 高通股份有限公司 Efficient hardware implementation of spiking networks
US9317540B2 (en) 2011-06-06 2016-04-19 Socpra Sciences Et Genie S.E.C. Method, system and aggregation engine for providing structural representations of physical entities
CN106683663A (zh) * 2015-11-06 2017-05-17 三星电子株式会社 Neural network training apparatus and method, and speech recognition apparatus and method
CN106991999A (zh) * 2017-03-29 2017-07-28 北京小米移动软件有限公司 Speech recognition method and apparatus
CN110291540A (zh) * 2017-02-10 2019-09-27 谷歌有限责任公司 Batch renormalization layers
US20200272884A1 (en) * 2017-12-15 2020-08-27 Intel Corporation Context-based search using spike waves in spiking neural networks
CN112036232A (zh) * 2020-07-10 2020-12-04 中科院成都信息技术股份有限公司 Image table structure recognition method, system, terminal and storage medium
CN112541578A (zh) * 2020-12-23 2021-03-23 中国人民解放军总医院 Retinal neural network model
CN112805717A (zh) * 2018-09-21 2021-05-14 族谱网运营公司 Ventral-dorsal neural networks: object detection via selective attention
CN112858468A (zh) * 2021-01-18 2021-05-28 金陵科技学院 Quantitative rail crack estimation method based on an echo state network with multiple fused features
CN113426109A (zh) * 2021-06-24 2021-09-24 杭州悠潭科技有限公司 Method for behaviour cloning in board and card games based on factorization machines
US11164068B1 (en) 2020-11-13 2021-11-02 International Business Machines Corporation Feature recognition with oscillating neural network
CN113609912A (zh) * 2021-07-08 2021-11-05 西华大学 Power transmission network fault diagnosis method based on multi-source information fusion
CN117314972A (zh) * 2023-11-21 2023-12-29 安徽大学 Target tracking method based on a spiking neural network with multi-class attention mechanisms

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004088457A2 (fr) * 2003-03-25 2004-10-14 Sedna Patent Services, Llc System for generating audience analytics

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JAHNKE ET AL: "Simulation of Spiking Neural Networks on Different Hardware Platforms.", INSTITUT FÜR MIKROELEKTRONIK., Retrieved from the Internet <URL:http://mikro.ee.tu-berlin.de/ifm/spinn/pdf/icann97a.pdf> *
PICHEVAR ET AL: "Double-vowel Segregation through Temporal Correlation: A Bio-Inspired Neural Network Paradigm.", NONLINEAR SIGNAL PROCESSING WORKSHOP., 20 May 2003 (2003-05-20), Retrieved from the Internet <URL:http://www.nolisp2005.org/cost/doc/nolisp03/006.pdf> *
ROUAT ET AL: "A bio-inspired sound source separation technique in combination with an enhanced FIR Gammatone Analysis/Synthesis Filterbank.", EUROPEAN SIGNAL PROCESSING CONFERENCE., September 2004 (2004-09-01), Retrieved from the Internet <URL:http://www.igi.tugraz.at/lehre/CI/links/pichevar_eusipco_2004.pdf> *

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6866841B2 (en) 2001-08-09 2005-03-15 Epatentmanager.Com Non-endocrine disrupting cytoprotective UV radiation resistant substance
EP1964036A1 (fr) * 2005-12-23 2008-09-03 Université de Sherbrooke Spatio-temporal pattern recognition using a spiking neural network and processing thereof on a portable and/or distributed computer
EP1964036A4 (fr) * 2005-12-23 2010-01-13 Univ Sherbrooke Spatio-temporal pattern recognition using a spiking neural network and processing thereof on a portable and/or distributed computer
US8346692B2 (en) 2005-12-23 2013-01-01 Societe De Commercialisation Des Produits De La Recherche Appliquee-Socpra-Sciences Et Genie S.E.C. Spatio-temporal pattern recognition using a spiking neural network and processing thereof on a portable and/or distributed computer
US8626495B2 (en) 2009-08-26 2014-01-07 Oticon A/S Method of correcting errors in binary masks
US8515885B2 (en) 2010-10-29 2013-08-20 International Business Machines Corporation Neuromorphic and synaptronic spiking neural network with synaptic weights learned using simulation
US8812415B2 (en) 2010-10-29 2014-08-19 International Business Machines Corporation Neuromorphic and synaptronic spiking neural network crossbar circuits with synaptic weights learned using a one-to-one correspondence with a simulation
US9317540B2 (en) 2011-06-06 2016-04-19 Socpra Sciences Et Genie S.E.C. Method, system and aggregation engine for providing structural representations of physical entities
CN105229675B (zh) * 2013-05-21 2018-02-06 高通股份有限公司 Efficient hardware implementation of spiking networks
CN105229675A (zh) * 2013-05-21 2016-01-06 高通股份有限公司 Efficient hardware implementation of spiking networks
US9269045B2 (en) 2014-02-14 2016-02-23 Qualcomm Incorporated Auditory source separation in a spiking neural network
US20150235125A1 (en) * 2014-02-14 2015-08-20 Qualcomm Incorporated Auditory source separation in a spiking neural network
CN103886395A (zh) * 2014-04-08 2014-06-25 河海大学 Reservoir optimal operation scheduling method based on a neural network model
CN106683663A (zh) * 2015-11-06 2017-05-17 三星电子株式会社 Neural network training apparatus and method, and speech recognition apparatus and method
CN110291540A (zh) * 2017-02-10 2019-09-27 谷歌有限责任公司 Batch renormalization layers
US11887004B2 (en) 2017-02-10 2024-01-30 Google Llc Batch renormalization layers
CN106991999A (zh) * 2017-03-29 2017-07-28 北京小米移动软件有限公司 Speech recognition method and apparatus
US11636318B2 (en) * 2017-12-15 2023-04-25 Intel Corporation Context-based search using spike waves in spiking neural networks
US20200272884A1 (en) * 2017-12-15 2020-08-27 Intel Corporation Context-based search using spike waves in spiking neural networks
CN112805717A (zh) * 2018-09-21 2021-05-14 族谱网运营公司 Ventral-dorsal neural networks: object detection via selective attention
CN112036232A (zh) * 2020-07-10 2020-12-04 中科院成都信息技术股份有限公司 Image table structure recognition method, system, terminal and storage medium
CN112036232B (zh) * 2020-07-10 2023-07-18 中科院成都信息技术股份有限公司 Image table structure recognition method, system, terminal and storage medium
US11164068B1 (en) 2020-11-13 2021-11-02 International Business Machines Corporation Feature recognition with oscillating neural network
CN112541578A (zh) * 2020-12-23 2021-03-23 中国人民解放军总医院 Retinal neural network model
CN112858468B (zh) * 2021-01-18 2023-08-15 金陵科技学院 Quantitative rail crack estimation method based on an echo state network with multiple fused features
CN112858468A (zh) * 2021-01-18 2021-05-28 金陵科技学院 Quantitative rail crack estimation method based on an echo state network with multiple fused features
CN113426109A (zh) * 2021-06-24 2021-09-24 杭州悠潭科技有限公司 Method for behaviour cloning in board and card games based on factorization machines
CN113426109B (zh) * 2021-06-24 2023-09-26 深圳市优智创芯科技有限公司 Method for behaviour cloning in board and card games based on factorization machines
CN113609912A (zh) * 2021-07-08 2021-11-05 西华大学 Power transmission network fault diagnosis method based on multi-source information fusion
CN117314972A (zh) * 2023-11-21 2023-12-29 安徽大学 Target tracking method based on a spiking neural network with multi-class attention mechanisms
CN117314972B (zh) * 2023-11-21 2024-02-13 安徽大学 Target tracking method based on a spiking neural network with multi-class attention mechanisms

Similar Documents

Publication Publication Date Title
WO2006000103A1 (fr) Reseau neural impulsionnel et utilisation de celui-ci
CA2642041C (fr) Spatio-temporal pattern recognition using a spiking neural network and processing thereof on a portable and/or distributed computer
CN113035227B (zh) Multi-modal speech separation method and system
Dávila-Chacón et al. Enhanced robot speech recognition using biomimetic binaural sound source localization
RU2193797C2 (ru) Associative memory device (variants) and pattern recognition method (variants)
Barros et al. Learning auditory neural representations for emotion recognition
AU655235B2 (en) Signal processing arrangements
Watrous Phoneme discrimination using connectionist networks
Sagi et al. A biologically motivated solution to the cocktail party problem
Watrous Speaker normalization and adaptation using second-order connectionist networks
Movellan et al. Robust sensor fusion: Analysis and application to audio visual speech recognition
Song et al. Research on scattering transform of urban sound events detection based on self-attention mechanism
Adeel Conscious multisensory integration: introducing a universal contextual field in biological and deep artificial neural networks
Makhlouf et al. Evolutionary structure of hidden Markov models for audio-visual Arabic speech recognition
Pichevar et al. Monophonic sound source separation with an unsupervised network of spiking neurones
Elhilali et al. A biologically-inspired approach to the cocktail party problem
Gombos Acoustic recognition with deep learning; experimenting with data augmentation and neural networks
Kasabov et al. Audio-and Visual Information Processing in the Brain and Its Modelling with Evolving SNN
Betancourt et al. Portable expert system to voice and speech recognition using an open source computer hardware
CN115132221A (zh) Voice separation method, electronic device and readable storage medium
Rouat et al. Source separation with one ear: Proposition for an anthropomorphic approach
Chelali et al. Audiovisual Speaker Identification Based on Lip and Speech Modalities.
Movellan et al. Bayesian robustification for audio visual fusion
Su A bio-inspired smart perception system based on human’s cognitive auditory skills
Kamm et al. Comparing performance of spectral distance measures and neural network methods for vowel recognition

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 05761674

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 05761674

Country of ref document: EP

Kind code of ref document: A1