WO2021037125A1 - Object recognition method and device - Google Patents

Object recognition method and device

Info

Publication number
WO2021037125A1
Authority
WO
WIPO (PCT)
Prior art keywords
aer
feature maps
data
feature
event
Prior art date
Application number
PCT/CN2020/111650
Other languages
English (en)
French (fr)
Inventor
潘纲
刘千惠
蒋磊
程捷
阮海博
邢东
唐华锦
马德
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to EP20858856.6A (published as EP4016385A4)
Publication of WO2021037125A1
Priority to US17/680,668 (published as US20220180619A1)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Definitions

  • This application relates to the technical field of image processing, and in particular to an object recognition method and device.
  • AER: address event representation.
  • An AER event includes the location of the pixel that produced it (i.e., its address information), the time at which the event occurred (i.e., its timestamp), and an indication of whether the light intensity increased or decreased.
  • An AER sensor records only events in which the change in light intensity exceeds a threshold; events in which the change is below the threshold are not recorded, which greatly reduces the redundancy of the visual information.
  • In the related art, the spatial features of the AER data are extracted first: the AER events in the AER data are fed into a spatial feature extraction algorithm, whose output is a 128*128 spatial feature map (i.e., the spatial features). The spatial feature map is then input to a classification algorithm to identify the objects in the AER data.
  • this application provides an object recognition method and device.
  • this application provides a method for object recognition.
  • In the process of identifying an object, the AER data of the object to be identified can be obtained.
  • The AER data includes a plurality of AER events of the object to be identified.
  • Each AER event includes the timestamp of the AER event (that is, the time at which the event occurred) and address information (that is, the location of the pixel that generated the AER event).
  • Multiple feature maps of the AER data can then be extracted, where each feature map contains part of the spatial information and part of the temporal information of the object to be identified, and the partial spatial information and partial temporal information are obtained based on the timestamp and address information of each AER event.
  • According to the multiple feature maps of the AER data, the object to be recognized is recognized.
  • Because the object recognition method provided in this application uses both the time information and the address information of the AER events when extracting the feature maps of the AER data, the extracted feature maps contain the temporal and spatial information of the original AER data. The feature maps therefore represent the original data more comprehensively, so the recognition result is more accurate when the object to be recognized is recognized.
  • In a possible implementation, extracting multiple feature maps of the AER data includes: using multiple filters to process the address information of the multiple AER events to obtain multiple first feature maps; and attenuating the feature values in the multiple first feature maps according to the timestamps of the multiple AER events to obtain the multiple feature maps of the AER data.
  • The filters can be Gabor filters or Difference of Gaussians (DoG) filters.
  • When the filters are Gabor filters, the multiple filters are filters with different combinations of direction and scale, where the scale is the size of the convolution kernel and the direction is the direction of the Gabor kernel function. "Different directions and scales" means that at least one of the scale and the direction differs between any two filters.
  • For any filter, the address information of the multiple AER events can be processed with that filter (the processing can be convolution processing) to obtain one first feature map. In this way, multiple first feature maps can be obtained, and the sizes of the multiple first feature maps are the same (all are n*m). Then the feature values of each first feature map are attenuated according to the timestamps of the AER events to obtain the multiple feature maps of the AER data.
  • Specifically, the convolution kernel of each Gabor filter can be convolved with the spatial information of the AER events to obtain the first feature map corresponding to that Gabor filter. Since there are multiple Gabor filters, multiple first feature maps can be obtained.
  • In a possible implementation, identifying the object to be identified according to the multiple feature maps of the AER data includes: encoding the multiple feature maps of the AER data to obtain multiple pulse sequences, and using a spiking neural network to process the multiple pulse sequences to identify the object to be identified.
  • Each pulse sequence includes multiple pulses, and each pulse carries part of the temporal information and part of the spatial information of the object to be identified.
  • The multiple pulses belonging to the same pulse sequence are obtained from the feature values at the same position in the feature maps corresponding to different filters with the same set direction.
  • multiple feature maps of the AER data can be encoded to obtain multiple pulse sequences.
  • the feature value in each feature map is encoded into the trigger time of the pulse.
  • When encoding the trigger times of the pulses (or at another time), the feature values at the same position in the feature maps corresponding to the different filters of a set direction are combined to form a group of feature values (the set direction being any one of the filter directions).
  • The trigger times of the pulses corresponding to one group of feature values form one pulse sequence.
  • In this way, each position in the feature map corresponds to one pulse sequence per direction, so multiple pulse sequences can be obtained, and each pulse sequence includes multiple pulses.
  • Since the feature values in the feature maps carry part of the temporal information and part of the spatial information of the object to be identified, the pulses also carry part of the temporal information and part of the spatial information of the object to be identified. The pulse sequences are then input to the spiking neural network to identify the objects included in the AER data.
  • a target encoding function may be used to encode multiple feature maps of the AER data to obtain multiple pulse sequences, and the target encoding function is an inverse linear function or an inverse logarithmic function.
  • The target encoding function causes smaller feature values in the feature map to trigger pulses later or not at all, and causes larger feature values in the feature map to trigger pulses earlier.
  • The present application further provides an object recognition device. The device includes one or more modules, and the one or more modules are used to implement the object recognition method provided by the first aspect or by a possible implementation of the first aspect.
  • the present application provides a computing device for object recognition.
  • The computing device includes a processor and a communication interface, and the processor is connected to the communication interface. The communication interface is used to receive the AER data of the object to be identified, and the processor is used to execute the object recognition method described in the first aspect.
  • The present application also provides a computer-readable storage medium that stores instructions. When the instructions in the computer-readable storage medium are executed on a computing device, the computing device is caused to execute the object recognition method provided by the first aspect or by a possible implementation of the first aspect.
  • The present application also provides a computer program product containing instructions that, when run on a computing device, cause the computing device to execute the object recognition method provided by the first aspect or by a possible implementation of the first aspect.
  • Fig. 1 is a structural block diagram of a computing device provided by an exemplary embodiment of the present application
  • Fig. 2 is a schematic structural diagram of an object recognition method provided by an exemplary embodiment of the present application.
  • FIG. 3 is a schematic diagram of the architecture of extracting feature maps provided by an exemplary embodiment of the present application.
  • FIG. 4 is a schematic diagram of an architecture for encoding a feature map provided by an exemplary embodiment of the present application
  • Fig. 5 is a schematic diagram of extracting a feature map provided by an exemplary embodiment of the present application.
  • FIG. 6 is a schematic flowchart of an object recognition method provided by an exemplary embodiment of the present application.
  • Fig. 7 is an implementation architecture diagram of an object recognition method provided by an exemplary embodiment of the present application.
  • Fig. 8 is a schematic diagram of extracting a feature map provided by an exemplary embodiment of the present application.
  • Fig. 9 is a schematic structural diagram of an object recognition apparatus provided by an exemplary embodiment of the present application.
  • the AER sensor is a neuromorphic device that mimics the mechanism of the human retina.
  • the AER sensor includes multiple pixels, each of which monitors the change in light intensity in a specific area. When the change exceeds the threshold, the AER event corresponding to the pixel is recorded, and when the change does not exceed the threshold, the AER event corresponding to the pixel is not recorded.
  • Each AER event includes the location information (ie address information), time of occurrence (ie timestamp) and polarity of the pixel where the AER event occurred.
  • The polarity characterizes whether the change in light perceived by the pixel is from dark to bright (which can be represented by the value 1) or from bright to dark (which can be represented by the value -1).
  • the final output of the AER sensor is the AER event from each pixel.
  • The AER sensor has the advantages of asynchrony, high temporal resolution, and a sparse representation of the scene, which gives it great advantages in both data transmission speed and data redundancy. It should be noted that the asynchrony of the scene means that each pixel collects AER events independently.
  • The AER data includes a stream of AER events from each pixel. Any AER event in the AER event stream from a pixel includes the address information of the pixel where the AER event occurred, the timestamp of its occurrence, and its polarity.
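  • As an illustration only (not part of the patent text), the AER event stream described above could be represented in code roughly as follows; the type and field names are assumptions introduced here for the sketch:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AEREvent:
    x: int            # pixel column (address information)
    y: int            # pixel row (address information)
    timestamp: float  # time at which the event occurred, e.g. in milliseconds
    polarity: int     # +1 = light intensity increased, -1 = decreased

# AER data of an object to be identified: the events from all pixels,
# usually kept in order of increasing timestamp.
AERData = List[AEREvent]

example_data: AERData = [
    AEREvent(x=3, y=3, timestamp=100.0, polarity=+1),
    AEREvent(x=4, y=4, timestamp=200.0, polarity=-1),
]
```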
  • A Gabor filter is a linear filter used for texture analysis. Gabor filters can be used to extract features of images and videos, and are widely used in computer vision applications. Specifically, a Gabor filter allows only the texture corresponding to its own frequency to pass, while suppressing the energy of other textures.
  • the Gabor filter can be represented by the scale s and the direction ⁇ . The combination of different scale s and direction ⁇ corresponds to different convolution kernels, so the combination of different scale s and direction ⁇ corresponds to different filters.
  • Studies have shown that simple cells in the visual cortex of the mammalian brain can be modeled by Gabor filters, and each Gabor filter simulates neuronal cells with a certain scale of receptive fields.
  • The receptive field is the region of the stimulus to which a neuron responds.
  • Spiking Neural Network is often referred to as the third-generation artificial neural network.
  • The neurons of a spiking network can simulate the voltage changes and the transmission process of biological nerve cells.
  • The information transmission between neurons takes the form of pulses (spikes), and the pulses contain temporal and spatial information.
  • Spiking neural networks can be used to identify and classify inputs.
  • Spike-timing dependent plasticity (STDP) is an update rule, observed in the brain, for the weights of connections between neurons: the closer in time the firing of two neurons, the stronger the connection between them becomes.
  • the STDP algorithm is an unsupervised learning algorithm.
  • Unsupervised learning algorithms dominate the learning of humans and animals. People can discover the inner structure of the world through observation instead of being told the name of every objective thing.
  • Unsupervised learning algorithms are mainly designed for training on unlabeled data sets, and apply unsupervised learning rules to adaptively adjust the connection weights or structure of the neural network. That is to say, without the supervision of a "teacher" signal, the neural network must find the regularities (such as statistical characteristics, correlations, or categories) in the input data and realize classification or decision-making through its output.
  • the supervised learning algorithm uses a set of samples of known categories to adjust the parameters of the classifier to achieve the required performance. It is also called supervised training. That is, supervised learning is a machine learning task that infers a function from the labeled training data.
  • In an object recognition scenario, the objects in image data usually need to be recognized.
  • the object can be a pedestrian, a vehicle, etc., or an action process.
  • In the related art, the spatial features of the AER data are extracted from the data output by the AER sensor, and the object to be recognized in the AER data is then identified based on those spatial features. Since only spatial features are considered, the extracted features are relatively limited, which may lead to inaccurate recognition results when recognizing AER data. Therefore, a relatively accurate identification method needs to be provided.
  • the AER sensor in the embodiment of the present application can be applied to any image shooting scene that mainly records changed content, such as a driving recorder, a monitoring device, and the like.
  • the object recognition method provided in the embodiments of the present application may be executed by an object recognition device, and the object recognition device may be a hardware device, such as a server, a terminal computing device, and the like.
  • the object recognition device may also be a software device, specifically a software system running on a hardware computing device.
  • the embodiment of the present application does not limit the location where the object recognition device is deployed. Exemplarily, the device for object recognition may be deployed on a server.
  • the device for object recognition can also be a device composed of multiple parts logically.
  • the device for object recognition can include an acquisition module, an extraction module, and a recognition module.
  • The components of the object recognition device can be deployed in different environments. The parts of the device can run in three environments, namely a cloud computing device system, an edge computing device system, and a terminal computing device, respectively, or can run in any two of these three environments.
  • the cloud computing device system, the edge computing device system, and the terminal computing device are connected by a communication path and can communicate with each other.
  • FIG. 1 exemplarily provides a possible architecture diagram of the computing device 100 of the present application.
  • the computing device 100 may include a processor 101, a memory 102, a communication interface 103, and a bus 104.
  • the number of processors 101 may be one or more, and FIG. 1 only illustrates one of the processors 101.
  • the processor 101 may be a central processing unit (CPU). If the computing device 100 has multiple processors 101, the types of the multiple processors 101 may be different or may be the same.
  • multiple processors of the computing device 100 may also be integrated into a multi-core processor.
  • The processor 101 may be used to execute the steps of the object recognition method. In practical applications, the processor 101 may be a very-large-scale integrated circuit. An operating system and other software programs are installed in the processor 101, so that the processor 101 can access devices such as the memory 102. It is understandable that, in the embodiments of the present application, the processor 101 is introduced as a central processing unit (CPU); in practical applications, it may also be another application-specific integrated circuit (ASIC).
  • the memory 102 stores computer instructions and data, and the memory 102 can store computer instructions and data required to implement the object recognition method provided in the present application.
  • the memory 102 stores instructions used to implement the execution steps of the acquiring module in the method for object recognition provided in the present application.
  • the memory 102 stores instructions for performing steps of the extraction module in the object recognition method provided in this application.
  • the memory 102 stores instructions for performing steps of the recognition module in the method for object recognition provided in the present application.
  • The memory 102 may be any one or any combination of the following storage media: non-volatile memory (such as read-only memory (ROM), solid-state drive (SSD), hard disk drive (HDD), or optical disc) and volatile memory.
  • the communication interface 103 may be any one or any combination of the following devices: a network interface (such as an Ethernet interface), a wireless network card, and other devices with a network access function.
  • the communication interface 103 is used for data communication between the computing device 100 and other computing devices 100 or terminals.
  • the AER data of the object to be identified can be obtained from the AER sensor through the communication interface 103.
  • FIG. 1 shows the bus 104 with a thick line.
  • the bus 104 can connect the processor 101 with the memory 102 and the communication interface 103. In this way, through the bus 104, the processor 101 can access the memory 102, and can also use the communication interface 103 to interact with other computing devices 100 or terminals.
  • The computing device 100 executes the computer instructions in the memory 102 to implement the object recognition method provided in this application.
  • the computing device 100 is caused to perform the steps performed by the acquisition module in the method of object recognition.
  • the computing device 100 is caused to perform the steps performed by the extraction module in the method of object recognition.
  • the computing device 100 is caused to perform the steps performed by the recognition module in the method of object recognition.
  • the embodiment of the present application provides an object recognition method.
  • the implementation diagram of the method is shown in FIG. 2.
  • the object recognition device obtains the AER data collected by the AER sensor from the AER sensor.
  • the object recognition device performs encoding processing on the AER data, and the encoding processing may include extracting multiple feature maps of the AER data and encoding multiple feature maps of the AER data.
  • the object recognition device inputs the encoded content into the recognition model for recognition.
  • the process of extracting the feature map can be as shown in Figure 3.
  • Specifically, the object recognition device processes the AER data through convolution and attenuates the spatial information over time, so that each feature value in each feature map is affected by the timestamps of the AER events.
  • the process of encoding the feature map can be as shown in Figure 4.
  • the object recognition device performs time encoding on the feature value in the feature map through an encoding function to generate the trigger time of the pulse.
  • At the same time, spatial coding can also be performed. Specifically, the feature values of different scales at the same position and in a set direction in the multiple feature maps of the AER data are grouped together (the set direction will be described later) to obtain multiple groups of feature values. Then the trigger time of the pulse corresponding to each feature value in a group is determined, giving the trigger times of a group of pulses for each group of feature values.
  • The trigger times of each group of pulses form one pulse sequence; that is, each pulse sequence includes multiple pulses.
  • Here, the feature maps with different scales in a set direction are the feature maps obtained by using different filters in that set direction. For example, if there are 4 filters with a direction θ of 45 degrees (4 being the number of scales), the filters with a direction of 45 degrees can produce 4 feature maps. The same position refers to the same position in multiple feature maps, for example, the (4, 4) position in each of the 4 feature maps.
  • the embodiment of the present application uses a filter when extracting the feature map.
  • The filter can be any filter capable of extracting features, such as a Gabor filter or a DoG filter.
  • The function expression of the Gabor filter can be:
  • G(Δx, Δy, s, θ) = exp(-(X² + γ²·Y²) / (2σ²)) · cos(2π·X / λ), where X = Δx·cosθ + Δy·sinθ and Y = -Δx·sinθ + Δy·cosθ    (1)
  • (Δx, Δy) is the spatial offset between (x, y) and the position (e_x, e_y) of the pixel to which the AER event belongs, where (x, y) is the position in the feature map to which an element of the convolution kernel of the Gabor filter corresponds.
  • (e_x, e_y) can also be regarded as the position of the AER event in the feature map.
  • γ is the spatial aspect ratio and determines the shape of the Gabor filter; when its value is 1, the shape is circular.
  • λ is the wavelength, which directly affects the filtering scale of the Gabor filter (that is, the scale mentioned later).
  • σ is the bandwidth, i.e., the variance of the Gabor filter; σ and λ are determined by the scale s.
  • The scale s represents the size of the convolution kernel of the Gabor filter. For example, when the scale s is 3, the convolution kernel is a 3*3 convolution kernel; when the scale s is 5, the convolution kernel is a 5*5 convolution kernel.
  • θ represents the direction of the kernel function of the Gabor filter.
  • For each combination of scale s and direction θ, the corresponding convolution kernel can be calculated through the function expression of the Gabor filter.
  • For example, assume the scale s is 3, the direction θ is 45 degrees, and the position of the pixel to which the AER event belongs is (3, 3) (that is, the position of the AER event in the feature map is (3, 3)). For the kernel element whose corresponding feature map position (x, y) is (2, 2), the offsets are Δx = -1 and Δy = -1; substituting Δx and Δy into equation (1) gives G(-1, -1), which is the value at the (1, 1) position of the convolution kernel. In this way, the convolution kernel for each combination of scale s and direction θ can be determined.
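  • As a minimal sketch of the kernel computation described above, assuming that equation (1) is the standard real-valued Gabor function and using illustrative σ and λ values, a convolution kernel for one combination of scale s and direction θ could be computed as follows; all names are assumptions introduced here:

```python
import numpy as np

def gabor_kernel(s: int, theta_deg: float, sigma: float, lam: float,
                 gamma: float = 0.3) -> np.ndarray:
    """Return an s*s convolution kernel for direction theta_deg (degrees)."""
    theta = np.radians(theta_deg)
    half = s // 2
    kernel = np.zeros((s, s))
    for i in range(s):        # kernel row
        for j in range(s):    # kernel column
            dx, dy = j - half, i - half   # offsets (delta x, delta y) from the centre
            X = dx * np.cos(theta) + dy * np.sin(theta)
            Y = -dx * np.sin(theta) + dy * np.cos(theta)
            kernel[i, j] = (np.exp(-(X**2 + gamma**2 * Y**2) / (2 * sigma**2))
                            * np.cos(2 * np.pi * X / lam))
    return kernel

# e.g. a 3*3 kernel for scale 3, direction 45 degrees (sigma/lambda values are
# the ones assumed for scale 3 in the parameter list further below)
k = gabor_kernel(3, 45.0, sigma=1.2, lam=1.5)
```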
  • After the convolution kernels are determined, they are used to extract the feature maps.
  • the number of feature values included in the feature map is the same as the number of pixels of the AER sensor, and the number of feature values in each row of the feature map is the same as the number of pixels in each row of the AER sensor, and there is a one-to-one correspondence.
  • the value of each feature value in the initial feature map can be zero. For example, if the number of pixels of the AER sensor is 5*5, then the number of feature values contained in the feature map is 5*5.
  • The convolution kernel is superimposed on the receptive field, in the feature map, of the location at which the AER event occurred.
  • For example, assume the position of the AER event in the feature map is (m, n). The value e at the centre of the convolution kernel is superimposed onto the feature value at position (m, n) in the feature map; the kernel value a is superimposed onto the feature value at position (m-1, n-1); the kernel value b is superimposed onto the feature value at position (m, n-1); the kernel value c is superimposed onto the feature value at position (m+1, n-1); and so on, until the whole convolution kernel has been overlaid onto the feature map, yielding the feature map to which the AER event has been added.
  • For example, assume the convolution kernel is 3*3 and the initial feature map is 5*5. When an AER event occurs at position (3, 3), the object recognition device superimposes the convolution kernel centred on position (3, 3) of the feature map; when another AER event occurs at position (2, 3), the object recognition device superimposes the convolution kernel centred on position (2, 3) of the feature map.
  • In this way, the feature map of the AER data can be obtained.
  • the above description only takes a set of filters of scale s and direction ⁇ as an example, and each set of filters of scale s and direction ⁇ can obtain a feature map.
  • In some cases, the convolution kernel of the Gabor filter cannot be completely contained within the feature map, and only the kernel values that fall within the feature map are superimposed. For example, if the convolution kernel of the Gabor filter is a 3*3 kernel and the AER event occurs at the top-left corner of the feature map, the values in the first row and the first column of the kernel fall outside the feature map; only the remaining values are superimposed onto the feature map.
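  • The event-driven superposition described above, including the clipping of kernel values that fall outside the feature map, could look roughly like the following sketch (a stand-in kernel is used in place of a real Gabor kernel; names are illustrative):

```python
import numpy as np

def superimpose_event(feature_map: np.ndarray, kernel: np.ndarray,
                      ex: int, ey: int) -> None:
    """Add `kernel` to `feature_map`, centred on the event position (ex, ey)."""
    half = kernel.shape[0] // 2
    h, w = feature_map.shape
    for ki in range(kernel.shape[0]):
        for kj in range(kernel.shape[1]):
            y, x = ey + (ki - half), ex + (kj - half)
            if 0 <= y < h and 0 <= x < w:   # values outside the map are clipped
                feature_map[y, x] += kernel[ki, kj]

# Example: a 5*5 initial feature map (all zeros) and one event at (3, 3)
# (0-indexed here), using a stand-in kernel instead of a real Gabor kernel.
feature_map = np.zeros((5, 5))
stand_in_kernel = 0.1 * np.ones((3, 3))
superimpose_event(feature_map, stand_in_kernel, ex=3, ey=3)
```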
  • The above process of determining the convolution kernel for each scale s and direction θ may be performed by the object recognition device during object recognition, or it may be performed on another device, in which case the object recognition device obtains the convolution kernels from the other device when needed; this is not limited in the embodiments of the present application.
  • In the embodiments of the present application, the description takes as an example the case where horizontally to the right is the positive direction of the X-axis and vertically downward is the positive direction of the Y-axis.
  • Step 601 The object recognition device obtains AER data from the AER sensor.
  • the AER data includes the AER event from each pixel. Since each pixel is used to detect the object to be identified, it can be considered that the AER data includes multiple AER events of the object to be identified.
  • Each AER event includes the address information, time stamp, and polarity of the pixel where the AER event occurred.
  • the AER sensor can detect the change in the light intensity of each pixel, and when the change exceeds the threshold, the AER event corresponding to the pixel is recorded, and when the change does not exceed the threshold, the AER event corresponding to the pixel is not recorded.
  • Each AER event includes the address information, the timestamp, and the polarity of the pixel where the AER event occurred. The polarity is used to characterize whether the change in light perceived by the pixel is from dark to bright (which can be represented by the value 1) or from bright to dark (which can be represented by the value -1). In this way, the AER data includes multiple AER events.
  • In one manner, when the object recognition device receives a processing request for AER data, it can send an AER data acquisition request to the AER sensor to which the AER data belongs. After receiving the acquisition request sent by the object recognition device, the AER sensor can send the AER data to the object recognition device. In this way, the object recognition device can obtain the AER data from the AER sensor.
  • the AER sensor is configured with an upload cycle of AER data. At each upload cycle, the AER sensor sends the AER data collected from the last upload to this upload to the object recognition device.
  • the object recognition device can receive the AER data sent by the AER sensor. In this way, the object recognition device can also obtain AER data from the AER sensor.
  • In another manner, each time the AER sensor collects AER data, it sends the collected AER data to the object recognition device. In this way, the object recognition device can also obtain AER data from the AER sensor.
  • In this application, AER data for a period of time (for example, 1 minute) is acquired, and the object to be identified in the AER data for that period is identified.
  • the object to be recognized refers to an object whose category or action is not determined in the AER data.
  • Step 602 The device for object recognition extracts multiple feature maps of the AER data.
  • each feature map contains part of the spatial information and part of the time information of the object to be identified, and the part of the spatial information and part of the time information are obtained according to the timestamp and address information of each AER event.
  • AER data corresponds to multiple feature maps, and each feature map contains part of the spatial information and part of the time information of the object to be identified.
  • the spatial information is used to indicate the spatial characteristics of the object to be recognized, and the time information is used to indicate the temporal characteristics of the object to be recognized.
  • When the object recognition device extracts part of the spatial information of the object to be recognized, a convolution operation can be used for the extraction. When the object recognition device extracts the temporal information, it can extract the spatial information in a way that decays with time, so that the spatial information is affected by the timestamps of the AER events.
  • Specifically, the object recognition device can use multiple filters (no two of the multiple filters are identical; when the filters are Gabor filters, the multiple filters can be the filters corresponding to multiple combinations of scale and direction) to process the address information of the multiple AER events in the AER data, obtaining multiple first feature maps of the same size.
  • Then, the object recognition device can attenuate the feature values in each first feature map according to the timestamps of the multiple AER events to obtain the feature map corresponding to that first feature map. The size of the first feature map is the same as the size of the corresponding feature map, where size refers to the number of feature values. The difference between the first feature map and the feature map is that the feature values have been attenuated: the first feature map includes only part of the spatial information of the object to be identified, while the feature map includes part of the spatial information and part of the temporal information of the object to be identified. Since there are multiple first feature maps, multiple feature maps of the AER data can be obtained. The technical details of step 602 will be described in detail later.
  • Step 603 The device for object recognition recognizes the object to be recognized according to the multiple feature maps of the AER data.
  • the object recognition device can use a recognition model (such as SNN, etc.) to recognize the object to be recognized.
  • the structure diagram of the process of performing the processing from step 602 to step 603 is shown in FIG. 7.
  • the structure diagram includes the S1 layer, C1 layer, coding layer and recognition layer.
  • The S1 layer can output 16 feature maps based on the AER data, and the feature maps all have the same size, which can be equal to the number of pixels of the AER sensor.
  • the feature maps stacked in the horizontal direction are the feature maps of the same direction ⁇ and different scales (that is, the feature maps processed by different filters in the same direction ⁇ ).
  • In the vertical direction of the S1 layer, from top to bottom, are the feature maps for directions of 0 degrees, 45 degrees, 90 degrees, and 135 degrees.
  • the C1 layer only performs dimensionality reduction processing on the feature maps output by the S1 layer, so the number of feature maps has not changed.
  • the coding layer is used to perform temporal coding and spatial coding processing on the feature map output by the C1 layer to obtain a pulse sequence.
  • the recognition layer is used to recognize the object to be recognized in the AER data based on the pulse sequence.
  • the S1 layer is used to implement step 602, and the C1 layer, the coding layer, and the recognition layer are used to implement step 603.
  • When implementing the above step 602, the S1 layer is used.
  • The object recognition device performs a convolution calculation between each AER event in the AER data and the filter (in this application, a Gabor filter is taken as an example for description). The specific process is as follows:
  • Each time the object recognition device obtains the address information of an AER event, the convolution kernel corresponding to the Gabor filter is superimposed, in the feature map, on the receptive field of the pixel to which the AER event belongs (this processing may be referred to as convolution processing), thereby updating the feature map.
  • After convolution processing has been performed on the multiple AER events, the first feature map corresponding to the Gabor filter is obtained.
  • Then, the object recognition device can use the timestamps of the multiple AER events to attenuate the feature values in the first feature map corresponding to the Gabor filter, and thereby extract a feature map that contains part of the spatial information and part of the temporal information of the object to be recognized. That is, the influence of earlier AER events on the feature values of the feature map at the current moment is attenuated, so that the temporal information of the AER data is extracted effectively.
  • The specific processing is to determine, for any position in the first feature map, the AER events whose receptive fields cover that position, and then use the timestamps of those AER events to attenuate the feature value at that position, so that an AER event farther from the current moment has a smaller influence on the feature value of the feature map at the current moment, and an AER event closer to the current moment has a larger influence.
  • In this way, the object recognition device can perform convolution processing on the address information of the multiple AER events with the Gabor filter of each combination of scale s and direction θ to obtain multiple first feature maps, and the feature values of the multiple first feature maps are respectively attenuated to obtain the multiple feature maps of the AER data.
  • the number of feature values in the first feature map may be the same as the number of pixels in the AER sensor, and the number of feature values in each row of the first feature map is the same as the number of pixels in each row of the AER sensor.
  • A formula can also be used to directly combine the spatial information and the temporal information of the object to be identified, as shown below:
  • r(x, y, t, s, θ) = Σ_{e ∈ E(t)} G(x - e_x, y - e_y, s, θ) · exp(-(t - e_t) / τ_leak)    (2)
  • r(x, y, t, s, θ) refers to the feature value, at position (x, y) and time t, of the feature map corresponding to the Gabor filter whose scale is s and whose direction is θ.
  • Δx = x - e_x and Δy = y - e_y.
  • E(t) represents the set of all AER events before time t (including time t).
  • e represents a specific AER event, and e_t denotes its timestamp.
  • (e_x, e_y) represents the position of the pixel to which the AER event e belongs, and can also be referred to as the position of the AER event e in the feature map.
  • (x - e_x) is the offset, in the X direction, between x of the feature map position (x, y) and e_x of the AER event e; (y - e_y) is the offset, in the Y direction, between y and e_y of the AER event e.
  • τ_leak is a preset parameter, which is a constant.
  • That is, the feature value r(x, y, t, s, θ) at position (x, y) is the sum, over all AER events at position (x, y), of the feature value contributed by each event multiplied by the attenuation function.
  • Here, all AER events at position (x, y) refer to the AER events whose receptive fields cover position (x, y). These can include two types: AER events whose own pixel position is (x, y), and AER events whose pixel position is not (x, y) but whose receptive field covers the (x, y) position.
  • For example, assume the convolution kernel is a 3*3 kernel whose first row is a11, a12, a13 from left to right, whose second row is a21, a22, a23, and whose third row is a31, a32, a33. Assume the position (x, y) in the feature map is (3, 3), and that before time t (including time t) there are two AER events whose position (e_x, e_y) is (3, 3): AER event 1 at 100 ms and AER event 2 at 200 ms.
  • For AER event 1 and AER event 2, the kernel value corresponding to position (3, 3) is the centre value a22, so the object recognition device multiplies a22 by the attenuation value corresponding to AER event 1 to obtain the first value, and multiplies a22 by the attenuation value corresponding to AER event 2 to obtain the second value. Other AER events whose receptive fields cover position (3, 3) are handled in the same way (the third value is obtained in this way from a further AER event 3); for example, for AER event 4, whose receptive field covers position (3, 3) with the kernel value a11, a11 is multiplied by the attenuation value of AER event 4 to obtain the fourth value. The object recognition device then adds the first value, the second value, the third value, and the fourth value to obtain the fifth value.
  • The fifth value is the feature value at position (3, 3) in the feature map at time t.
  • The contribution of each AER event to r(x, y, t, s, θ) is obtained by multiplying two parts: one part reflects the spatial information, namely G(Δx, Δy, s, θ), and the other part reflects the temporal information, namely the attenuation term exp(-(t - e_t) / τ_leak). Because the attenuation term reduces the effect of an earlier AER event on the feature value at the current moment, the feature map includes both spatial information and temporal information.
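  • A sketch of formula (2), under the assumption that the attenuation function is the exponential decay exp(-(t - e_t)/τ_leak) described above, could compute one feature map from a list of AER events as follows; the events are assumed to carry x, y, and timestamp fields as in the earlier sketch, and tau_leak is an illustrative constant:

```python
import numpy as np

def feature_map_at(t: float, events, kernel: np.ndarray,
                   height: int, width: int, tau_leak: float = 50.0) -> np.ndarray:
    """Event-driven feature map with exponential temporal attenuation."""
    half = kernel.shape[0] // 2
    r = np.zeros((height, width))
    for e in events:
        if e.timestamp > t:          # only events up to and including time t
            continue
        decay = np.exp(-(t - e.timestamp) / tau_leak)   # temporal part
        for ki in range(kernel.shape[0]):
            for kj in range(kernel.shape[1]):
                y, x = e.y + (ki - half), e.x + (kj - half)
                if 0 <= y < height and 0 <= x < width:
                    # spatial part (Gabor value) multiplied by the temporal part
                    r[y, x] += kernel[ki, kj] * decay
    return r
```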
  • The above describes how, for one Gabor filter with scale s and direction θ, the feature map corresponding to that Gabor filter is determined. For each of the other Gabor filters, the object recognition device can likewise determine the corresponding feature map using the above formula (2).
  • For example, assume the feature map output in step 602 is 6*6 and the value at each position of the initial feature map is 0. An AER event is generated at 100 ms at position (4, 4), and the convolution kernel of the Gabor filter with a scale s of 3 and a direction θ of 45 degrees is superimposed on the receptive field of the feature map centred at position (4, 4).
  • At a later moment, the feature values in the feature map have attenuated to a certain degree compared with their values at 100 ms.
  • Specifically, the feature value corresponding to each pixel in the feature map decreases or increases toward the resting potential. The resting potential is generally 0: a feature value greater than 0 decreases toward the resting potential 0 (for example, changing from 1 to 0.5), and a feature value less than 0 increases toward the resting potential 0 (for example, changing from -1 to -0.5).
  • It should be noted that t in the above formula (2) refers to time within the AER data, for example, the 5th second of the AER data, the 10th second of the AER data, and so on.
  • The aforementioned attenuation is exponential attenuation. Of course, other attenuation methods can also be used, as long as the spatial information is attenuated over time; this is not limited in the embodiments of the present application.
  • 16 combinations of scale s and direction θ can be used, and their values are shown in Table 1: the scale s takes the values 3, 5, 7, and 9, and the direction θ takes the values 0 degrees, 45 degrees, 90 degrees, and 135 degrees, giving 16 combinations in total.
  • Each combination of scale s and direction θ corresponds to one feature map, so the S1 layer in the embodiment of the present application can output 16 feature maps.
  • The value of γ may be 0.3.
  • When the scale s is 3, σ is 1.2 and λ is 1.5; when the scale s is 5, σ is 2.0 and λ is 2.5; when the scale s is 7, σ is 2.8 and λ is 3.5; when the scale s is 9, σ is 3.6 and λ is 4.6.
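  • A sketch of building the bank of 16 filters from these parameter values (reusing the hypothetical gabor_kernel() helper from the earlier sketch) could look as follows:

```python
# assumed per-scale (sigma, lambda) values and the four directions listed above
SCALE_PARAMS = {3: (1.2, 1.5), 5: (2.0, 2.5), 7: (2.8, 3.5), 9: (3.6, 4.6)}
DIRECTIONS = [0.0, 45.0, 90.0, 135.0]

filter_bank = {}
for s, (sigma, lam) in SCALE_PARAMS.items():
    for theta in DIRECTIONS:
        # one kernel per (scale, direction) combination -> 16 kernels in total
        filter_bank[(s, theta)] = gabor_kernel(s, theta, sigma=sigma, lam=lam)
```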
  • the C1 layer is used when implementing the above step 603.
  • the device for object recognition divides each feature map output by the S1 layer into adjacent 2*2 regions. For each feature map, the object recognition device selects the maximum value in each 2*2 area of the feature map to obtain a new feature map corresponding to the feature map. It can be seen that the C1 layer will only change the dimension of the feature maps, but not the number of feature maps. For example, there are 16 feature maps output by the S1 layer, each with a size of 128*128, and there are 16 new feature maps, each with a size of 64*64.
  • the processing of the C1 layer can be referred to as a pooling operation.
  • the processing of the C1 layer may not be performed.
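  • A sketch of the C1 pooling operation described above (maximum over adjacent 2*2 regions, e.g., 128*128 to 64*64) could be; the function name is illustrative:

```python
import numpy as np

def c1_pool(feature_map: np.ndarray) -> np.ndarray:
    """Max-pool over adjacent 2*2 regions (e.g. 128*128 -> 64*64)."""
    h, w = feature_map.shape
    # group the map into (h/2, 2, w/2, 2) blocks and take the max of each 2*2 block
    blocks = feature_map[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

pooled = c1_pool(np.random.rand(128, 128))   # -> shape (64, 64)
```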
  • the encoding layer is used when implementing the above step 603.
  • the encoding layer is used to encode multiple feature maps of the AER data.
  • the device for object recognition encodes the feature maps of the AER data into a pulse sequence.
  • the object recognition device performs temporal coding and spatial coding.
  • Temporal coding encodes each feature value in a feature map according to the target encoding function to obtain the trigger time of the pulse corresponding to that feature value.
  • The target encoding function can be an inverse linear function or an inverse logarithmic function.
  • Spatial coding is used to compose the pulse sequences from the trigger times of the pulses.
  • For example, the inverse logarithmic function can be u - v·ln(r) (described later), and the inverse linear function can be k·r + b (where k is a value less than 0).
  • Because the inverse logarithmic function or the inverse linear function changes the distribution of the feature values in the feature map, the feature values can express more information in the subsequent recognition layer, so the accuracy of recognition can be improved.
  • Temporal coding: for any feature map A, the embodiment of the present application takes the target encoding function being an inverse logarithmic function as an example to illustrate temporal coding.
  • The function expression can be:
  • t = C(r) = u - v·ln(r)    (3)
  • r is the feature value at any position, and t is the trigger time of the pulse corresponding to the feature value r.
  • u and v are normalization factors, which are used to ensure that the pulses corresponding to all feature values in a feature map are fired within a predetermined time window tw, for example, tw = 120 ms.
  • C() represents the target encoding function.
  • u and v can be determined as follows:
  • u = v·ln(r_max),  v = tw / ln(r_max / r_min)    (4)
  • so that the largest feature value r_max is encoded to trigger time 0 and the threshold value r_min is encoded to trigger time tw.
  • r_max is the largest feature value in the feature map A, and r_min is a predefined minimum threshold. It should be noted that r_max and r_min may differ between feature maps, so when performing temporal coding on different feature maps, r_max and r_min need to be re-determined.
  • the device for object recognition encodes the feature value in each feature map as the trigger time of a pulse, which is equivalent to the trigger time of a pulse corresponding to each feature value in each feature map. For example, if there are 16 feature maps, each feature map has 64*64 feature values, there will be a total of 16*64*64 pulse trigger moments.
  • In order to save processing resources, the object recognition device can delete, before temporal coding, the feature values in the feature map that are smaller than the target threshold (the target threshold being r_min).
  • In the subsequent spatial coding, the trigger times of pulses whose feature values are smaller than the target threshold then do not need to be considered, which also saves processing resources.
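  • A sketch of the inverse-logarithmic temporal coding of equation (3), assuming the normalization of equation (4) and dropping feature values below r_min as described above, could be:

```python
import numpy as np

def temporal_encode(feature_map: np.ndarray, r_min: float = 1e-3,
                    tw: float = 120.0) -> np.ndarray:
    """Return pulse trigger times; NaN marks positions that emit no pulse."""
    r_max = feature_map.max()
    times = np.full(feature_map.shape, np.nan)
    if r_max <= r_min:
        return times                      # nothing above the threshold fires
    v = tw / np.log(r_max / r_min)        # assumed normalization, equation (4)
    u = v * np.log(r_max)
    fires = feature_map >= r_min          # values below r_min emit no pulse
    # larger feature values -> earlier trigger times: t = u - v * ln(r)
    times[fires] = u - v * np.log(feature_map[fires])
    return times
```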
  • the object recognition device can fuse certain feature values to more effectively use neurons and form a compact representation to reduce the amount of calculation in the subsequent recognition layer.
  • Specifically, in the feature maps output by the C1 layer, the feature values at the same position across all scales s in a set direction form one group of feature values (the set direction refers to a fixed direction among the multiple directions θ; in this application, the set direction may be any one of 0 degrees, 45 degrees, 90 degrees, or 135 degrees).
  • For example, 16 feature maps are output by the C1 layer. The 16 feature maps can be divided into feature maps of 4 directions, with 4 feature maps (i.e., 4 scales) in each direction.
  • If the values at position (2, 2) in the 4 feature maps of the 0-degree direction are 3, 4, 5, and 5 respectively, then the group of feature values at the same position (2, 2) for the set direction of 0 degrees is (3, 4, 5, 5).
  • In other words, the object recognition device groups the feature values at the same position (for example, (2, 2)) in the feature maps whose set direction is 0 degrees, obtaining a group of 4 feature values for the set direction of 0 degrees and the position (2, 2). Then the trigger times of the pulses corresponding to each group of feature values are obtained, so that the trigger times of the pulses corresponding to each group form one pulse sequence. Since there are multiple groups of feature values, multiple pulse sequences can be obtained.
  • The coding layer may include a plurality of coding neurons, and each coding neuron is responsible for converting the feature values of the multiple scales s at one position and one set direction.
  • The number of coding neurons can be N*P*M (N and P may or may not be equal), where N*P is the size of the feature map (the feature map output by the C1 layer) and M is the number of directions θ.
  • The pulse sequence can be expressed as:
  • T(x, y, θ) = { t_spike = C(r) | r_x = x, r_y = y, r_θ = θ, r_s ∈ S }    (5)
  • where r_x and r_y indicate the position of the feature value r in the feature map, r_s represents the scale of the feature value r, r_θ represents the direction of the feature value r, and S represents the set of scales s.
  • Formula (5) denotes the set of spike trigger times t_spike generated by the pulses whose position is (x, y) and whose direction is θ.
  • In this way, the pulse trigger times of all the scales in each of the 4 directions (i.e., for the Gabor filters in the 4 directions) can be obtained.
  • For example, assume the size of each feature map is 64*64, the direction θ has four values (0 degrees, 45 degrees, 90 degrees, and 135 degrees), and the scale s also has four values (3, 5, 7, 9), so there are 16 feature maps.
  • The pulse sequence for each position and direction includes the pulse trigger times corresponding to the 4 scales, so there are a total of 64*64*4 pulse sequences.
  • The trigger times of the pulses in a pulse sequence are obtained from the feature values in the feature maps. Since each feature value reflects part of the spatial information and part of the temporal information of the object to be identified, the pulses in the pulse sequence also carry part of the spatial information and part of the temporal information of the object to be identified.
  • Because pulses of different scales are fused in the coding layer, the number of parameters in the subsequent recognition layer can be reduced while maintaining accuracy, which is very suitable for resource-constrained neuromorphic devices.
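  • A sketch of the spatial coding step described above, which groups the trigger times of all scales at each position and direction into one pulse sequence, could be as follows; the `encoded` input is assumed to map each (scale, direction) pair to a 2-D array of trigger times such as the output of the temporal_encode() sketch:

```python
import numpy as np

def spatial_encode(encoded: dict, scales, directions, height: int, width: int):
    """Return {(x, y, direction): trigger times of the pulses of all scales}."""
    sequences = {}
    for theta in directions:
        for y in range(height):
            for x in range(width):
                times = [encoded[(s, theta)][y, x] for s in scales]
                # keep only the scales that actually fire (non-NaN trigger time)
                sequences[(x, y, theta)] = [t for t in times if not np.isnan(t)]
    return sequences
```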
  • the recognition layer is also used when implementing the above step 603, and the recognition layer is used to receive the pulse sequence output by the coding layer to recognize the object to be recognized in the AER data.
  • The recognition layer can be implemented by an SNN, which consists of a single fully connected network layer.
  • the number of neurons included in the SNN (that is, the recognition neuron mentioned later) is equal to N*P*M (N and P can be equal or not), where N*P is the feature map (the feature output by the C1 layer) Figure) size, M is the number of directions ⁇ .
  • After obtaining the output of the coding layer, the object recognition device can input each pulse sequence into each recognition neuron of the recognition layer. Through the recognition processing of the recognition layer, the objects in the AER data can be identified.
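  • The description does not specify the internal dynamics of a recognition neuron beyond the dynamic threshold and mutual inhibition mentioned in the training process below; purely as an illustration, a simplified leaky integrate-and-fire neuron receiving pulses could be sketched as:

```python
import math

class RecognitionNeuron:
    """Illustrative leaky integrate-and-fire recognition neuron (an assumption)."""

    def __init__(self, n_inputs: int, threshold: float = 1.0, tau: float = 20.0):
        self.w = [0.5] * n_inputs   # synaptic weights (trained, e.g., by STDP)
        self.threshold = threshold  # dynamic threshold
        self.tau = tau              # membrane leak time constant
        self.v = 0.0                # membrane potential
        self.last_t = 0.0

    def receive(self, t: float, input_index: int) -> bool:
        """Integrate a pulse arriving at time t on synapse input_index."""
        self.v *= math.exp(-(t - self.last_t) / self.tau)  # leak since last pulse
        self.last_t = t
        self.v += self.w[input_index]
        if self.v >= self.threshold:
            self.v = 0.0            # reset after firing
            return True             # the neuron emits an output pulse
        return False
```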
  • a method for training SNN is also provided, and the processing can be:
  • the method of training SNN may include supervised learning algorithm and unsupervised learning algorithm.
  • the supervised learning algorithm can be a Multi-Spike Prop algorithm, etc.
  • The unsupervised learning algorithm may be an STDP algorithm or the like. This application uses the STDP algorithm as an example for SNN training. According to the relative timing relationship between the pulses emitted by presynaptic neurons and postsynaptic neurons, the learning rules of the STDP algorithm can be used to adjust the synaptic weights in an unsupervised manner.
  • the training process can be as follows:
  • Step a: Obtain a sample set, where the sample set includes AER data, and the AER data includes multiple AER events.
  • Step b: The multiple AER events are subjected to the aforementioned feature extraction to obtain feature maps.
  • Step c: The feature maps are encoded by the coding neurons (in the manner described above) to obtain pulse sequences.
  • Step d: The pulse sequences are input to the recognition layer, stimulating the recognition neurons to emit pulses.
  • the STDP algorithm adjusts the synapse weight according to the time interval between the pulses of the coding neuron and the recognition neuron. If the coding neuron pulse precedes the recognition neuron pulse, the weight is increased, otherwise, the weight is decreased.
  • The recognition neurons adopt a dynamic threshold; that is, if a recognition neuron fires pulses too easily and too often, its threshold is increased. The recognition neurons are interconnected and mutually inhibit one another.
  • Step e: After steps b to d have been performed a target number of times (the target number can be 5-10), the training ends. The learning rate is set to zero, and the threshold of each recognition neuron and the weight of each synapse are taken from the last execution of step d. Each recognition neuron is assigned a category according to the sample category in the sample set to which it responds most strongly (this is the only step in which labels are used).
  • During prediction, the category with the highest firing rate among the responses of the recognition neurons that have been assigned categories can be selected as the prediction result.
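  • A sketch of the STDP weight adjustment used in step d (increase the weight when the coding neuron's pulse precedes the recognition neuron's pulse, otherwise decrease it) could be as follows; the exponential dependence on the time difference and all constants are assumptions for illustration:

```python
import math

def stdp_update(w: float, t_pre: float, t_post: float,
                a_plus: float = 0.01, a_minus: float = 0.012,
                tau: float = 20.0, w_max: float = 1.0) -> float:
    """Update one synaptic weight for a pre/post spike pair."""
    dt = t_post - t_pre
    if dt >= 0:                       # coding (pre) pulse precedes recognition (post) pulse
        w += a_plus * math.exp(-dt / tau)     # potentiate
    else:                             # recognition pulse came first
        w -= a_minus * math.exp(dt / tau)     # depress
    return min(max(w, 0.0), w_max)    # keep the weight within [0, w_max]

w = stdp_update(0.5, t_pre=10.0, t_post=15.0)   # pre before post: weight increases
```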
  • the object recognition device can obtain the AER data of the object to be recognized.
  • the AER data includes multiple AER events of the object to be recognized.
  • Each AER event includes the timestamp and the address information of the AER event. Multiple feature maps of the AER data are then extracted.
  • Each feature map contains part of the spatial information and part of the time information of the object to be identified. Part of the spatial information and part of the time information are obtained based on the timestamp and address information of each AER event.
  • the object to be recognized is recognized. In this way, since the time information and spatial information of the object to be recognized in the AER data are included in the extracted feature map, the feature map can be made to more comprehensively represent the original data, and the recognition result can be more accurate when performing recognition.
  • In addition, the pulse coding method used in the embodiments of this application not only allows more information to be expressed in the subsequent recognition model, improving the accuracy of recognition, but also, because pulses of different scales are merged, reduces the number of recognition neurons while maintaining accuracy, which in turn saves computing resources.
  • Fig. 9 is a structural diagram of an object recognition device provided by an embodiment of the present application.
  • the device can be implemented as part or all of the device through software, hardware or a combination of the two.
  • the apparatus provided in the embodiment of the present application can implement the process described in FIG. 6 of the embodiment of the present application.
  • the apparatus includes: an acquisition module 910, an extraction module 920, and an identification module 930, wherein:
  • The acquiring module 910 is configured to acquire the address event representation (AER) data of the object to be identified, where the AER data includes multiple AER events of the object to be identified and each AER event includes the timestamp and address information of the AER event. It can specifically be used to implement the acquiring function of step 601;
  • The extraction module 920 is configured to extract multiple feature maps of the AER data, where each feature map contains part of the spatial information and part of the temporal information of the object to be identified, and the partial spatial information and partial temporal information are obtained according to the timestamp and address information of each AER event. It can specifically be used to implement the extraction function of step 602 and the implicit steps included in step 602;
  • the recognition module 930 is configured to recognize the object to be recognized according to the multiple feature maps of the AER data, and may be specifically used to implement the recognition function of step 603 and the implicit steps included in step 603.
  • the extraction module 920 is configured to:
  • Multiple filters are used to process the address information of the multiple AER events to obtain multiple first feature maps, and the feature values in the multiple first feature maps are attenuated according to the timestamps of the multiple AER events to obtain the multiple feature maps of the AER data.
  • the extraction module 920 is configured to:
  • a convolution kernel of multiple Gabor filters is used to perform convolution processing on the spatial information of the multiple AER events to obtain multiple first feature maps.
  • the identification module 930 is configured to:
  • The multiple feature maps of the AER data are encoded to obtain multiple pulse sequences, where each pulse sequence includes multiple pulses, each pulse carries part of the temporal information and part of the spatial information of the object to be identified, and the multiple pulses belonging to the same pulse sequence are obtained from the feature values at the same position in the feature maps corresponding to different filters with the same set direction;
  • a pulse neural network is used to process the multiple pulse sequences to identify the object to be identified.
  • the identification module 930 is configured to:
  • a target encoding function is used to encode multiple feature maps of the AER data to obtain multiple pulse sequences, and the target encoding function is an inverse linear function or an inverse logarithmic function.
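  • As an illustration of the encoding just described, the sketch below converts feature maps into spike times with an inverse-logarithmic function and merges all scales of one orientation at one position into a single spike train. The time window `tw`, the minimum threshold `r_min`, and the array layout `feature_maps[scale, orientation, y, x]` are assumptions of this sketch, not values fixed by this application.

```python
import numpy as np

def encode_feature_maps(feature_maps, tw=120.0, r_min=1e-3):
    """Encode feature maps of shape (n_scales, n_orientations, H, W) into spike
    times with t = u - v*ln(r); values below r_min never produce a spike."""
    n_scales, n_orient, h, w = feature_maps.shape
    spike_times = np.full(feature_maps.shape, np.inf)        # inf = no spike
    for s in range(n_scales):
        for o in range(n_orient):
            fmap = feature_maps[s, o]
            r_max = fmap.max()
            if r_max <= r_min:
                continue
            v = tw / np.log(r_max / r_min)                    # assumed normalisation
            u = v * np.log(r_max)                             # largest value spikes at t = 0
            mask = fmap >= r_min
            spike_times[s, o][mask] = u - v * np.log(fmap[mask])
    trains = {}                                               # (x, y, orientation) -> spike train
    for o in range(n_orient):
        for y in range(h):
            for x in range(w):
                t = spike_times[:, o, y, x]
                t = np.sort(t[np.isfinite(t)])                # merge all scales into one train
                if t.size:
                    trains[(x, y, o)] = t
    return trains
```

  • Each returned spike train would then be fed to the recognition layer, one train per position and set direction.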
  • With the apparatus described above, the AER data of the object to be recognized can be obtained. The AER data includes multiple AER events of the object to be recognized, and each AER event includes the timestamp and address information of the AER event. Multiple feature maps of the AER data are then extracted, where each feature map contains part of the spatial information and part of the temporal information of the object to be recognized, and this partial spatial and temporal information is obtained from the timestamp and address information of each AER event. Finally, the object to be recognized is recognized according to the multiple feature maps of the AER data.
  • In this way, because both the temporal information and the spatial information of the object to be recognized in the AER data are contained in the extracted feature maps, the feature maps represent the original data more comprehensively, and the recognition result is more accurate.
  • Moreover, the pulse coding scheme used in the embodiments of this application not only expresses more information in the subsequent recognition model, improving the accuracy of recognition, but also, because pulses of different scales are merged, reduces the number of recognition neurons while maintaining accuracy, thereby saving computing resources.
  • When the object recognition apparatus provided in the foregoing embodiment performs object recognition, the division into the foregoing functional modules is merely used as an example for description.
  • In practical applications, the foregoing functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus is divided into different functional modules to complete all or some of the functions described above.
  • the object recognition apparatus provided in the foregoing embodiment and the object recognition method embodiment belong to the same concept. For the specific implementation process, please refer to the method embodiment, which will not be repeated here.
  • a computing device for object recognition includes a processor and a memory.
  • the memory is configured to store one or more instructions.
  • The processor executes the one or more instructions to implement the object recognition method provided above.
  • a computer-readable storage medium is also provided, and the computer-readable storage medium stores instructions.
  • When the instructions in the computer-readable storage medium are executed on a computing device, the computing device is caused to execute the object recognition method provided above.
  • A computer program product containing instructions is also provided, which, when run on a computing device, causes the computing device to execute the object recognition method provided above, or causes the computing device to implement the functions of the object recognition apparatus provided above.
  • the computer program product includes one or more computer instructions, and when the computer program instructions are loaded and executed on a server or a terminal, the processes or functions described in the embodiments of the present application are generated in whole or in part.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared, radio, or microwave) manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a server or a terminal, or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (such as a floppy disk, a hard disk, a magnetic tape, etc.), an optical medium (such as a digital video disk (Digital Video Disk, DVD), etc.), or a semiconductor medium (such as a solid-state hard disk, etc.).
  • the device embodiments described above are merely illustrative.
  • the division of the modules is only a logical function division, and there may be other division methods in actual implementation.
  • multiple modules or components can be combined or integrated into another system, or some features can be omitted or not implemented.
  • the connections between the modules discussed in the above embodiments may be electrical, mechanical or other forms.
  • the modules described as separate components may or may not be physically separate.
  • the component displayed as a module may be a physical module or may not be a physical module.
  • Each functional module in the embodiments of this application may exist independently, or may be integrated into one processing module.

Abstract

A method for object recognition, belonging to the technical field of image processing. The method includes: when recognizing an object in AER data, obtaining the AER data of the object to be recognized, where the AER data includes multiple AER events of the object to be recognized, and each AER event includes a timestamp and address information of the AER event; extracting multiple feature maps of the AER data, where each feature map contains part of the spatial information and part of the temporal information of the object to be recognized, and the partial spatial information and partial temporal information are obtained according to the timestamp and address information of each AER event; and recognizing the object to be recognized according to the multiple feature maps of the AER data. The method can improve the accuracy of object recognition results.

Description

对象识别方法及装置 技术领域
本申请涉及图像处理的技术领域,特别涉及一种对象识别方法及装置。
背景技术
传统的视觉传感器以“帧扫描”为图像采集方式,随着视觉系统实际应用对于速度等性能要求的提升,传统的视觉传感器遇到了数据过大、帧频受限等发展瓶颈。基于仿生视觉感知模型的地址事件表示(address event representation,AER)传感器以其速度高、延迟小、冗余低的优势,成为当前机器视觉系统领域的研究热点。AER传感器中的每个像素分别监测特定区域的光强的相对变化,如果变化超过预定义的阈值,则记录该像素对应的AER事件,AER事件包括该像素的位置(即地址信息)、发出该事件的时间(即时间戳)以及光强的值增大或减小的指示信息。与传统的视觉传感器记录光强的值不同,AER传感器仅记录光强的变化值超过阈值的事件,光强的变化值小于阈值的事件并不会记录,因此可以极大的降低视觉信息的冗余。
相关技术中,假设AER传感器的像素为128*128,在对AER数据中的对象进行识别时,首先提取AER数据的空间特征,在空间特征提取算法中输入AER数据中的AER事件,输出则会是一个128*128的空间特征图(即空间特征)。然后将空间特征图,输入到分类算法,对AER数据中的对象进行识别。
由于相关技术中仅考虑了空间特征,提取的特征比较单一,对AER数据进行识别时,有可能导致识别结果不准确。
发明内容
为了解决对象的识别结果不准确的问题,本申请提供了一种对象识别方法及装置。
第一方面,本申请提供了一种对象识别的方法。在该方法中,在识别对象的过程中,可以获取待识别对象的AER数据,该AER数据包括待识别对象的多个AER事件,每个AER事件包括产生AER事件的时间戳(即发生该AER事件的时间)和地址信息(即产生该AER事件的像素的位置信息)。然后可以提取AER数据的多个特征图,其中,每个特征图包含待识别对象的部分空间信息和部分时间信息,部分空间信息和部分时间信息是根据每个AER事件的时间戳和地址信息获得的。根据AER数据的多个特征图,对待识别对象进行识别。
由于本申请提供的对象识别方法中,在提取AER数据的特征图时提取了AER事件的时间信息和地址信息,所以原始的AER数据的时间信息和空间信息都包含在提取的特征图中,使得特征图能够更全面的代表原始数据,进而对待识别对象进行识别时,可以使识别结果更准确。
在一种可能的实现方式中,所述提取所述AER数据的多个特征图,包括:采用多个滤波器对所述多个AER事件的地址信息进行处理,得到多个第一特征图;按照所述多个AER事件的时间戳对所述多个第一特征图中的特征值进行衰减处理,得到所述AER数据的多个特征图。
本申请所示的方案,在提取AER数据的多个特征图时,可以获取多个滤波器,滤波器可以是Gabor滤波器或者高斯函数的差分(Difference of Guassian,DOG)滤波器。在滤波器为Gabor滤波器时,多个滤波器为不同方向和尺度的组合的滤波器,尺度为卷积核的大小,方向为Gabor滤波器核函数的方向。其中,不同方向和尺度指尺度和方向中至少有一个不相同
对于任一滤波器,可以使用该滤波器对多个AER事件的地址信息进行处理(该处理可以是卷积处理),得到一个第一特征图,这样,由于有多个滤波器,所以可以得到多个第一特征图(多个第一特征图的大小相同,均为n*m)。然后对这多个第一特征图中的每个第一特征图的特征值,按照AER事件的时间戳进行衰减处理,得到AER数据的多个特征图。
在一种可能的实现方式中,所述采用多个滤波器对所述多个AER事件的地址信息进行处理,得到多个第一特征图,包括:采用多个Gabor滤波器的卷积核对所述多个AER事件的空间信息进行卷积处理,得到多个第一特征图。
本申请所示的方案,在提取第一特征图时,可以采用每个Gabor滤波器的卷积核与AER事件的空间信息进行卷积处理,得到每个Gabor滤波器的卷积核对应的第一特征图。这样,由于有多个Gabor滤波器,所以可以得到多个第一特征图。
在一种可能的实现方式中,所述根据所述AER数据的多个特征图,对所述待识别对象进行识别,包括:对所述AER数据的多个特征图进行编码以获得多个脉冲序列,并采用脉冲神经网络对所述多个脉冲序列进行处理,以对所述待识别对象进行识别。其中,每个脉冲序列包括多个脉冲,每个脉冲携带有所述待识别对象的部分时间信息和部分空间信息,属于同一个脉冲序列中的多个脉冲是根据具有相同设定方向上的不同滤波器对应的特征图中,具有相同位置的特征值获得的。
本申请所示的方案,在对待识别对象进行识别时,可以对AER数据的多个特征图进行编码以获得多个脉冲序列。具体可以是将每个特征图中的特征值编码成脉冲的触发时刻。在编码脉冲的触发时刻时,或者在其他时间,将多个特征图中,设定方向上的不同滤波器对应的特征图中的相同位置的特征值组成一组特征值(设定方向为滤波器的任一方向)。然后将该组特征值对应的脉冲的触发时刻,组成一个脉冲序列,这样,特征图中每个位置的不同方向都对应有一个脉冲序列,可以得到多个脉冲序列,每个脉冲序列包括多个脉冲。由于特征图中的特征值携带了待识别对象的部分时间信息和部分空间信息,所以脉冲也携带了待识别对象的部分时间信息和部分空间信息。然后将脉冲序列输入至脉冲神经网络,对AER数据中包括的对象进行识别。
这样,由于编码成脉冲序列,脉冲序列中每个脉冲表达了时间信息和空间信息,所以在识别时,可以使识别结果更准确。而且由于将不同尺度的脉冲进行融合,可以在保持准确率的基础上减少了脉冲神经网络中识别神经元的数目,进而节省了计算资源。
在一种可能的实现方式,可以采用目标编码函数对AER数据的多个特征图进行编码,以获得多个脉冲序列,目标编码函数为反向的线性函数或者反向的对数函数。目标编码函数用于控制特征图中较小的特征值更迟触发脉冲或者不触发脉冲,并用于控制特征图中较大的特征值更早的触发脉冲。
这样,由于通过目标编码函数改变了特征图中特征值的分布,所以可以在后续识别时表达更多的信息,进而可以提高识别的准确率。
第二方面,本申请提供了一种对象识别的装置,该装置包括一个或多个模块,该一个或多个模块用于实现上述第一方面或第一方面的可能的实现方式所提供的对象识别的方法。
第三方面,本申请提供了一种对象识别的计算设备,该计算设备包括处理器和通信接口,处理器与通信接口连接,
通信接口,用于接收待识别对象的AER数据,处理器用于执行上述第一方面所述的对象识别的方法。
第四方面,本申请提供了一种计算机可读存储介质,计算机可读存储介质存储有指令,当计算机可读存储介质中的指令在计算设备上被执行时,使得计算设备执行上述第一方面或第一方面的可能的实现方式所提供的对象识别的方法。
第五方面,本申请提供了一种包含指令的计算机程序产品,当其在计算设备上运行时,使得计算设备执行上述第一方面或第一方面的可能的实现方式所提供的对象识别的方法。
附图说明
图1是本申请一个示例性实施例提供的计算设备的结构框图;
图2是本申请一个示例性实施例提供的对象识别的方法的架构示意图;
图3是本申请一个示例性实施例提供的提取特征图的架构示意图;
图4是本申请一个示例性实施例提供的对特征图进行编码的架构示意图;
图5是本申请一个示例性实施例提供的提取特征图的示意图;
图6是本申请一个示例性实施例提供的对象识别的方法的流程示意图;
图7是本申请一个示例性实施例提供的对象识别的方法的实现架构图;
图8是本申请一个示例性实施例提供的提取特征图的示意图;
图9是本申请一个示例性实施例提供的对象识别的装置的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
为了便于对本申请实施例的理解,下面首先介绍所涉及到名词的概念:
AER传感器,是模仿人类视网膜机制的神经形态装置。AER传感器包括多个像素,每个像素分别监测特定区域的光强的变化。在变化超过阈值时,记录该像素对应的AER事件,而变化未超过阈值时,不记录该像素对应的AER事件。每个AER事件都包括发生该AER事件的像素的位置信息(即地址信息)、发生的时间(即时间戳)以及极性,极性用于表征该像素感知到光的变化是从暗到亮(可以使用数值1表示),还是从亮到暗(可以使用数值-1表示)。可见,AER传感器最终输出的是来自每个像素的AER事件,与传统相机相比,AER传感器具有场景的异步、高时间分辨率和稀疏表示的优点,在数据传输速度和数据冗余度上都有很大的优势。需要说明的是,上述场景的异步指每个像素是单独采集AER事件。
AER数据,包括来自每个像素的AER事件流。来自每个像素的AER事件流中的任一 AER事件包括发生该AER事件的像素的地址信息、发生的时间戳以及极性等。
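For illustration only, an AER event stream as defined above could be represented with the following data structure; the field names and types are assumptions of this sketch, not a format defined by this application or by any particular sensor.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AEREvent:
    x: int            # column of the pixel that fired (address information)
    y: int            # row of the pixel that fired (address information)
    timestamp: float  # time at which the event occurred
    polarity: int     # +1: dark-to-bright change, -1: bright-to-dark change

# AER data is simply the stream of events collected over a period of time.
AERData = List[AEREvent]
```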
Gabor滤波器,是一种用于纹理分析的线性滤波器。Gabor滤波器可以用于提取图像、视频的特征,广泛应用在计算机视觉应用中。具体是仅允许与其频率相对应的纹理顺利通过,而使其他纹理的能量受到抑制。Gabor滤波器可以由尺度s、方向θ的来表示,不同尺度s和方向θ的组合对应的卷积核不相同,那么不同尺度s和方向θ的组合对应不同的滤波器。通过研究表明,哺乳动物大脑视觉皮层中的简单细胞可以通过Gabor滤波器建模,每个Gabor滤波器模拟具有一定尺度的感受野的神经元细胞。感受野是一个神经元的反应或支配的刺激区域。
脉冲神经网络(Spiking Neural Network,SNN),经常被誉为第三代人工神经网络。脉冲网络的神经元可以模拟生物神经细胞电压变化及传递过程,神经元之间信息传递采用脉冲的形式,脉冲包含时间和空间信息。脉冲神经网络可以用于对输入进行识别、分类。
脉冲时间依赖的可塑性算法(Spike-timing Dependent Plasticity,STDP算法),是在大脑中发现的神经元之间连接权重的更新规则,目标是若两个神经元的发放在时间上离的越近,他们之间的绑定关系就越亲密。STDP算法是一种无监督的学习算法。
无监督学习算法,其在人类和动物的学习中占据主导地位,人们通过观察能够发现世界的内在结构,而不是被告知每一个客观事物的名称。无监督学习算法的设计主要是针对无标签数据集的训练,要求应用无监督学习规则对神经网络中的连接权值或者结构进行自适应的调整。也就是说,在没有“教师”信号的监督下,神经网络必须自己从输入的数据中发现规律性(如统计特征、相关性或者类别等),并通过输出实现分类或者决策。
监督学习算法,利用一组已知类别的样本调整分类器的参数,使其达到所要求性能的过程,也称为监督训练。即监督学习是从标记的训练数据来推断一个功能的机器学习任务。
在对本申请实施例提供的对象识别的方法介绍之前,对本申请实施例所适用的应用场景和系统架构进行介绍。
在使用AER传感器获取到图像数据后,通常会识别图像数据中的对象,对象可以是行人、车辆等物体,还可以是动作过程等。相关技术中是在AER传感器输出的数据中提取AER数据的空间特征,然后基于空间特征对AER数据中的待识别对象进行识别。由于仅考虑了空间特征,提取的特征比较单一,对AER数据进行识别时,有可能导致识别结果不准确。所以需要提供一种相对比较准确的识别方法。本申请实施例中的AER传感器可以应用于行车记录仪、监控设备等任何主要记录变化的内容的图像拍摄场景中。
本申请实施例提供的对象识别的方法可以由对象识别的装置执行,对象识别的装置可以是一个硬件装置,例如:服务器、终端计算设备等。对象识别的装置也可以是一个软件装置,具体为运行在硬件计算设备上的一套软件系统。本申请实施例中并不限定对象识别的装置所部署的位置。示例性的,对象识别的装置可以部署在服务器上。
对象识别的装置在逻辑上也可以是由多个部分构成的装置,如对象识别的装置可以包括获取模块、提取模块和识别模块等,对象识别的装置中的各个组成部分可以分别部署在不同的系统或服务器中。装置的各部分可以分别运行在云计算设备系统、边缘计算设备系统或终端计算设备这三个环境中,也可以运行在这三个环境中的任意两个环境中。云计算设备系统、边缘计算设备系统和终端计算设备之间由通信通路连接,可以相互进行通信。
在对象识别的装置为计算设备时,图1示例性的提供了本申请的计算设备100的一种可能的架构图。
计算设备100可以包括处理器101、存储器102、通信接口103和总线104。在计算设备100中,处理器101的数量可以是一个或多个,图1仅示意了其中一个处理器101。可选的,处理器101可以是中央处理器(Central Processing Unit,CPU)。若计算设备100具有多个处理器101,多个处理器101的类型可以不同,或者可以相同。可选的,计算设备100的多个处理器还可以集成为多核处理器。处理器101可以用于执行对象识别的方法的步骤。实际应用中,处理器101可以是一块超大规模的集成电路。在处理器101中安装有操作系统和其他软件程序,从而处理器101能够实现对存储器102等器件的访问。可以理解的是,在本发明实施例中,处理器101是以中央处理器(Central Processing unit,CPU)进行介绍,实际应用中,还可以是其他特定集成电路(Application Specific Integrated Circuit,ASIC)。
存储器102存储计算机指令和数据,存储器102可以存储实现本申请提供的对象识别的方法所需的计算机指令和数据。例如,存储器102存储用于实现本申请提供的对象识别的方法中获取模块执行步骤的指令。再例如,存储器102存储用于本申请提供的对象识别的方法中提取模块执行步骤的指令。再例如,在存储器102存储用于本申请提供的对象识别的方法中识别模块执行步骤的指令。存储器102可以是以下存储介质的任一种或任一种组合:非易失性存储器(如只读存储器(Read-Only Memory,ROM)、固态硬盘(Solid State Disk,SSD)、硬盘(Hard Disk Drive,HDD)、光盘等)、易失性存储器。
通信接口103可以是以下器件的任一种或任一种组合:网络接口(如以太网接口)、无线网卡等具有网络接入功能的器件。通信接口103用于计算设备100与其他计算设备100或者终端进行数据通信。在本申请中,可以通过通信接口103从AER传感器获取待识别对象的AER数据。
图1用一条粗线表示总线104。总线104可以将处理器101与存储器102、通信接口103连接。这样,通过总线104,处理器101可以访问存储器102,还可以利用通信接口103与其它计算设备100或者终端进行数据交互。
在本申请中,计算设备100执行存储器102中的计算机指令,使用计算设备100实现本申请提供的对象识别的方法。例如,使得计算设备100执行对象识别的方法中由获取模块执行的步骤。再例如,使得计算设备100执行对象识别的方法中由提取模块执行的步骤。再例如,使得计算设备100执行对象识别的方法中识别模块执行步骤的指令。
在进行实施前,首先介绍本申请实施例的整体框架。本申请实施例提供了一种对象识别的方法,该方法的实现图如图2所示,对象识别的装置从AER传感器,获取AER传感器采集的AER数据。对象识别的装置对AER数据进行编码处理,编码处理可以包括提取AER数据的多个特征图和对AER数据的多个特征图进行编码处理。对象识别的装置将编码处理后的内容输入至识别模型中进行识别。
在图2中,提取特征图的处理过程可以如图3所示,对象识别的装置将AER数据通过卷积计算和空间信息随时间衰减,使得每个特征图中每个特征值受到AER事件的时间戳的影响。
在图2中,编码特征图的处理过程可以如图4所示,对象识别的装置对特征图中的特征 值经过编码函数进行时间编码后,生成脉冲的触发时刻。对象识别的装置进行时间编码时,也可以同时进行空间编码,具体是将AER数据的多个特征图中相同位置且设定方向的不同尺度的特征值组成一组(设定方向在后面进行描述),得到多组特征值。然后确定每组特征值对应的脉冲的触发时刻,得到每组特征值对应的一组脉冲的触发时刻。每组脉冲的触发时刻即为一个脉冲序列,即每个脉冲序列包括多个脉冲。
需要说明的是,上述设定方向的不同尺度的特征图为使用设定方向的不同滤波器得到的特征图。例如,方向θ为45度的滤波器有4个(4代表尺度的数目),设定方向为45度的滤波器得到的特征图为4个。相同位置指多个特征图中的同一位置,如有4个特征图,每个特征图均为(4,4)位置。
另外,本申请实施例在提取特征图时使用到滤波器,滤波器可以是任意一种能提取特征的滤波器,如可以是Gabor滤波器或者DOG滤波器等。在进行实施前,为了更好的理解本申请实施例,首先介绍Gabor滤波器,Gabor滤波器的函数表达式可以为:
G(Δx,Δy,s,θ)=exp(−(X²+γ²Y²)/(2σ²))·cos(2πX/λ)    (1)
其中,在式(1)中,X=Δxcosθ+Δysinθ;Y=-Δxsinθ+Δycosθ。(Δx,Δy)是(x,y)和AER事件所属的像素的位置(e x,e y)的空间偏移位置,(x,y)为Gabor滤波器对应的卷积核中元素对应到特征图中的位置。实际上也可以认为(e x,e y)是AER事件在特征图中的位置。
γ是空间纵横比,决定Gabor滤波器的形状,取值为1时,形状为圆形。
λ为波长,直接影响Gabor滤波器的滤波尺度(即后续提到的尺度)。σ是表示带宽,是Gabor滤波器的方差,λ和σ是由尺度s决定。
尺度s表示Gabor滤波器的卷积核的大小,如尺度s为3,卷积核为3*3的卷积核,尺度s为5,卷积核为5*5的卷积核。
θ表示Gabor滤波器的核函数的方向。
在尺度s和方向θ固定的情况下,通过Gabor滤波器的函数表达式,即可计算出在这组尺度s和方向θ下的卷积核。例如,尺度s为3,方向为45度,AER事件所属的像素的位置为(3,3)(即特征图中的位置为(3,3)),想要计算卷积核的(1,1)位置的数值,则x=2,y=2,Δx=-1,Δy=-1。将Δx和Δy代入式(1)即可得到G(-1,-1),即为卷积核的(1,1)位置的数值。通过这种方式可以确定出每组尺度s和方向θ下的卷积核。
在确定出卷积后,会使用卷积核提取特征图。特征图中包含特征值的数目与AER传感器的像素的数目相同,且特征图中每一行特征值的数目与AER传感器中每一行像素的数目相同,且一一对应。初始特征图中各特征值取值均可以为零。例如,AER传感器的像素数目为5*5,那么特征图包含的特征值的数目为5*5。在尺度s和方向θ固定的情况下,在每当一个AER事件进行卷积处理时,都是将该组尺度s和方向θ对应的卷积核覆盖到特征图该AER事件所属的位置的感受野中。具体是:假设卷积为
[a b c]
[d e f]
[g h i]
AER事件在特征图中所属的位置为(m,n),将卷积核中心位置的数值e与特征图中(m,n)位置的特征值进行叠加。然后将a与特征图中(m-1,n-1)位置的特征值进行叠加,将b与特征图中(m,n-1)位置的特征 值进行叠加,将c与特征图中(m+1,n-1)位置的特征值进行叠加,以此类推,即可将卷积核覆盖到特征图中,这样,就得到了添加该AER事件的特征图。如图5所示,假设卷积核为3*3,即
Figure PCTCN2020111650-appb-000003
初始特征图为5*5,即
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
在像素位置为(3,3)、时间为100ns处输入一个AER事件,在像素位置为(2,3)、时间为200ns处输入一个AER事件。对象识别的装置在特征图(3,3)的位置叠加卷积核。然后对象识别的装置在特征图(2,3)的位置叠加卷积核。
这样,将AER数据中每个AER事件都叠加到特征图中,可以得到AER数据的特征图。上述是仅以一组尺度s和方向θ的滤波器为例进行说明,每组尺度s和方向θ的滤波器均可以得到一个特征图。
需要说明的是,对于特征图中边界位置处,Gabor滤波器的卷积核并不能完全的覆盖到特征图中,可以仅叠加卷积核中能覆盖的数值。例如,Gabor滤波器的卷积核为3*3的卷积核,特征图中的(1,1)位置处,卷积核中第一列和第一行的数值不能覆盖到特征图中,而除第一列和第一行之外的数值均可以覆盖,所以在特征图仅叠加除第一列和第一行之外的数值。
还需要说明的是,上述确定每种尺度s和方向θ下的卷积核的过程可以在对象识别时由对象识别的装置确定。也可以是其他设备上确定,对象识别的装置在使用时,从其他设备上获取,本申请实施例不做限定。另外本申请实施例是以水平向右为X轴的正方向,并且以竖直向下为Y轴的正方向为例进行说明。
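A minimal Python/NumPy sketch of generating the convolution kernel for one scale s and orientation θ is given below. It assumes the standard Gabor form reconstructed as expression (1) above and the σ/λ values listed later in this embodiment; the function name and the absence of any kernel normalisation are choices of this sketch.

```python
import numpy as np

# (sigma, lambda) for each scale s, following the values given in this embodiment.
GABOR_PARAMS = {3: (1.2, 1.5), 5: (2.0, 2.5), 7: (2.8, 3.5), 9: (3.6, 4.6)}
GAMMA = 0.3  # spatial aspect ratio

def gabor_kernel(s, theta):
    """Build the s x s convolution kernel of a Gabor filter with orientation
    theta (in radians), following expression (1)."""
    sigma, lam = GABOR_PARAMS[s]
    half = s // 2
    dy, dx = np.mgrid[-half:half + 1, -half:half + 1]  # offsets from the kernel centre
    x_rot = dx * np.cos(theta) + dy * np.sin(theta)
    y_rot = -dx * np.sin(theta) + dy * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + GAMMA ** 2 * y_rot ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * x_rot / lam)

# The 16 kernels used by the S1 layer: 4 scales x 4 orientations.
kernels = {(s, th): gabor_kernel(s, np.deg2rad(th))
           for s in (3, 5, 7, 9) for th in (0, 45, 90, 135)}
```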
下面将结合图6对本申请实施例提供的一种对象识别的方法进行说明,且以执行主体为对象识别的装置为例进行说明。方法流程可以如下:
步骤601,对象识别的装置从AER传感器获取AER数据。AER数据包括来自每个像素的AER事件,由于每个像素均是用于检测待识别对象,所以可以认为AER数据包括待识别对象的多个AER事件。每个AER事件包括发生该AER事件的像素的地址信息、时间戳以及极性。
本实施例中,AER传感器可以检测每个像素的光强的变化,在变化超过阈值时,记录该像素对应的AER事件,而变化未超过阈值时,不记录该像素对应的AER事件。每个AER事件都包括发生该AER事件的像素的地址信息、时间戳以及极性,极性用于表征该像素感知到光的变化是从暗到亮(可以使用数值1表示),还是从亮到暗(可以使用数值-1表示)。这样AER数据中包括多个AER事件。
对象识别的装置接收到AER数据的处理请求时,可以向AER数据所属的AER传感器发送AER数据的获取请求。AER传感器接收到对象识别的装置发送的AER数据的获取请求后,可以向对象识别的装置发送AER数据。这样对象识别的装置可以从AER传感器获取到AER数据。
另外,AER传感器中配置有AER数据的上传周期,每到上传周期,AER传感器向对象识别的装置发送上次上传至此次上传这段时间采集到的AER数据。对象识别的装置可以接收 AER传感器发送的AER数据。这样对象识别的装置也可以从AER传感器获取到AER数据。
另外,AER传感器每当采集到AER数据时,向对象识别的装置发送采集到的AER数据。这样,对象识别的装置也可以从AER传感器获取到AER数据。
需要说明的是,本申请中获取的是一段时间的AER数据,对这一段时间的AER数据中的待识别对象进行识别,如一段时间为1分钟等。待识别对象指AER数据中未确定类别或者动作的物体。
步骤602,对象识别的装置提取AER数据的多个特征图。
其中,每个特征图包含待识别对象的部分空间信息和部分时间信息,部分空间信息和部分时间信息是根据每个AER事件的时间戳和地址信息获得的。AER数据对应有多个特征图,每个特征图包含待识别对象的部分空间信息和部分时间信息。空间信息用于指示待识别对象的空间特征,时间信息用于指示待识别对象的时间特征。
对象识别的装置在提取待识别对象的部分空间信息时,可以使用卷积操作进行提取。对象识别的装置在提取时间信息时,可以使用空间信息随时间衰减的方式进行提取,使得空间信息受到AER事件的时间戳的影响。
具体处理时,对象识别的装置可以使用多个滤波器(这多个滤波器两两均不相同,在滤波器是Gabor滤波器时,多个滤波器可以是多组尺度和方向下的滤波器)对AER数据中的多个AER事件的地址信息进行处理,得到多个第一特征图,多个第一特征图的大小相同。对于任一第一特征图,对象识别的装置可以按照多个AER事件的时间戳对该第一特征图中的特征值进行衰减处理,得到该第一特征图对应的特征图(第一特征图与特征图的大小相同,大小指包含特征值的数目,第一特征图与特征图的区别仅是特征值进行了衰减),第一特征图仅包括了待识别对象的部分空间信息,特征图中包括待识别对象的部分空间信息和部分时间信息。由于有多个第一特征图,所以可以得到AER数据的多个特征图。步骤602的技术细节在后文进行具体描述。
步骤603,对象识别的装置根据AER数据的多个特征图,对待识别对象进行识别。对象识别装置可以使用识别模型(如SNN等)对待识别对象进行识别。步骤603的技术细节在后文进行具体描述。
本申请实施例中,在执行上述步骤602至步骤603的处理的过程的结构图如图7所示。结构图中包括S1层、C1层、编码层和识别层,假设本申请实施例中有16组尺度s和方向θ的滤波器,那么S1层可以基于AER数据输出16个特征图,且每个特征图的大小相同,均可以等于AER传感器的像素的数目。在图7中水平方向叠在一起的特征图是同一方向θ,不同尺度s的特征图(即同一方向θ的不同滤波器处理后的特征图),S1层竖直方向从上到下为方向θ为0度、45度、90度和135度的特征图。C1层仅是对S1层输出的特征图进行降维处理,所以特征图的数目未发生改变。编码层用于对C1层输出的特征图进行时间编码和空间编码处理,得到脉冲序列。识别层用于基于脉冲序列对AER数据中的待识别对象进行识别处理。
S1层用于实现步骤602,C1层、编码层和识别层用于实现步骤603。以下分别具体说明每一层的处理:
在实现上述步骤602时使用S1层,在S1层中,对象识别的装置将AER数据中的每个AER事件与滤波器进行卷积计算(本申请实施例以滤波器为Gabor滤波器为例进行说明), 具体过程为:
对于任一组尺度s和方向θ的Gabor滤波器(可以简称为任一Gabor滤波器),对象识别的装置每次获取到一个AER事件的地址信息,可以将该Gabor滤波器对应的卷积核覆盖到特征图中该AER事件所属的像素对应的感受野中(此处理可以称为是卷积处理),从而更新特征图。将多个AER事件进行卷积处理,可以得到该Gabor滤波器对应的第一特征图。此外为了有效的提取AER数据的时间信息,对象识别的装置可以使用多个AER事件的时间戳,衰减该Gabor滤波器对应的第一特征图中的特征值,提取到包含待识别对象的部分空间信息和部分时间信息的特征图,也即衰减较早的AER事件对当前时刻的特征图中特征值的影响,从而有效的提取AER数据的时间信息。具体处理是,对于第一特征图中任一位置,确定感受野覆盖在该位置上的AER事件,然后使用这些AER事件的时间戳衰减该位置的特征值,使距离当前时刻时间越长的AER事件对当前时刻的特征图中特征值的影响越小,使距离当前时刻时间越短的AER事件对当前时刻的特征图中特征值的影响越大。
对象识别的装置可以对多个AER事件的地址信息进,使用每组尺度s和方向θ的Gabor滤波器进行卷积处理,那么会得到多个第一特征图,对多个第一特征图中的特征值分别进行衰减,得到AER数据的多个特征图。
需要说明的是,第一特征图中特征值的数目可以与AER传感器中像素的数目相同,且第一特征图中每一行的特征值的数目与AER传感器中每一行像素的数目相同。
具体的,也可以是使用一个公式直接将待识别对象的空间信息和时间信息相结合,如下所示:
r(x,y,t,s,θ)=∑_{e∈E(t), x∈X(e_x), y∈Y(e_y)} G(Δx,Δy,s,θ)·exp(−(t−e_t)/T_leak)    (2)
其中,在式(2)中,r(x,y,t,s,θ)指尺度为s,方向为θ的Gabor滤波器对应的特征图在t时刻位置(x,y)上的特征值。Δx=x-e x,Δy=y-e y,E(t)表示在t时刻之前(包括t时刻),所有AER事件组成的AER事件集合。e表示一个特定的AER事件。(e x,e y)表示AER事件e所属像素的位置,也可以称为是AER事件e在特征图中所属的位置。(x-e x)为特征图中位置(x,y)中x与AER事件e的e x在X方向上的偏移,(y-e y)为特征图中位置(x,y)中y与AER事件e的e y在Y方向上的偏移。
在式(2)中,G(Δx,Δy,s,θ)的表达式如式(1)所示,表示在尺度为s,方向为θ的Gabor滤波器时,AER事件e在特征图中位置(x,y)处的取值。该取值为卷积核的中心位置与位置(e x,e y)重合时,卷积核对应覆盖在特征图中位置(x,y)处的取值。
在式(2)中，exp(−(t−e_t)/T_leak)为衰减函数，表示衰减程度，与AER事件的时间戳有关系。e_t表示AER事件的时间戳，t为当前时刻。t−e_t越小，说明时间戳为e_t的AER事件距离当前时刻越短，exp(−(t−e_t)/T_leak)的取值越大；反之t−e_t越大，说明时间戳为e_t的AER事件距离当前时刻越长，exp(−(t−e_t)/T_leak)的取值越小。T_leak为预设的参数，是一个常数。
在式(2)中，x∈X(e_x)，y∈Y(e_y)。若x∈X(e_x)且y∈Y(e_y)，则说明位置(x,y)在位置为(e_x,e_y)的AER事件的感受野中。若x∉X(e_x)且y∈Y(e_y)，则说明位置(x,y)不在位置为(e_x,e_y)的AER事件的感受野中。若x∈X(e_x)且y∉Y(e_y)，则说明位置(x,y)不在位置为(e_x,e_y)的AER事件的感受野中。
具体(x,y)位置的特征值r(x,y,t,s,θ)是(x,y)位置上的所有AER事件对应的特征值与衰减函数的乘积之和。此处位置(x,y)上的所有AER事件指感受野覆盖位置(x,y)的AER事件。具体可以包括两类,一类为AER事件所属的像素的位置也为(x,y)的AER事件,另一类为AER事件所属的像素的位置不是(x,y)位置,但是感受野覆盖(x,y)位置。
例如,卷积核为3*3的卷积核,第一行从左向右为a11、a12、a13;第二行从左向右为a21、a22、a23;第三行从左向右为a31、a32、a33。特征图中位置(x,y)为(3,3),t时刻之前(包括t时刻)位置(e x,e y)为(3,3)的AER事件有两个,100ms处AER事件1和200ms处AER事件2。那么首先对象识别的装置将卷积核的a22乘以AER事件1对应的
exp(−(t−100ms)/T_leak)
得到第一数值,并将卷积核的a22乘以AER事件2对应的
exp(−(t−200ms)/T_leak)
得到第二数值。位置(3,3)处,t时刻之前(包括t时刻)位置(e x,e y)不是(3,3),但是感受野覆盖位置(3,3)的AER事件有两个,150ms、位置为(2,2)的AER事件3和210ms、位置为(4,4)的AER事件4。对象识别的装置确定AER事件3在位置(3,3)处时,对应的卷积核的数值为a33,将a33与
exp(−(t−150ms)/T_leak)
相乘得到第三数值。并且对象识别的装置确定AER事件4在位置(3,3)处时,对应的卷积核的数值为a11,将a11与
exp(−(t−210ms)/T_leak)
相乘得到第四数值。然后对象识别的装置将第一数值、第二数值、第三数值、第四数值相加,得到第五数值,第五数值即为特征图中位置(3,3)在t时刻的特征值。此处仅是为了描述清楚式(2)的计算过程,在计算机处理时,也可以按照其他方式进行计算。
另外,从式(2)中可以得出r(x,y,t,s,θ)由两部分相乘得到,一部分用于反映空间信息,即G(Δx,Δy,s,θ),另一部分用于反映时间信息,即
exp(−(t−e_t)/T_leak)。
由于使用
exp(−(t−e_t)/T_leak)
衰减较早的AER事件对当前时刻的特征值的影响,特征图中既包括空间信息,也包括时间信息。
上述是针对一组尺度s和方向θ下的Gabor滤波器,确定该组尺度s和方向θ的Gabor滤波器对应的特征图,对于任意一组尺度s和方向θ的Gabor滤波器对应的特征图,对象识别的装置均可以使用上述式(2)确定。
例如,如图8所示,假设AER传感器包括6*6个像素,尺度s为3,方向θ为45度时,步骤602输出的特征图为6*6,在没有任何AER事件输入时,特征图中每个位置的值均为0,在100ms位置(4,4)产生AER事件,将尺度s为3,方向θ为45度的Gabor滤波器的卷积核覆盖到特征图的(4,4)位置的AER事件的感受野上。而随着时间的推移到达200ms时,特征图中的特征值相比于100ms有一定的衰减。
综上可见,随着时间的推移,特征图中的每个像素对应的特征值都朝着静止电位减少,或增加。需要说明的是,静止电位一般为0,朝着静止电位减少的情况是:大于0的特征值都会朝着静止电位0减少,如从1变为0.5。朝着静止电位增加的情况是:小于0的特征值都会朝着静止电位0增加,如从-1变为-0.5。
需要说明的是,上述式(2)中的t指的是AER数据中时间。例如,AER数据的第5秒,AER数据的第10秒等。另外,上述衰减方式是按照指数的方式进行衰减,同样,也可以使 用其他方式衰减,只要能达到空间信息随时间的推移进行衰减即可,本申请实施例不作限定。本申请实施例中可以使用16组尺度s和方向θ,他们的取值如表一所示。
表一
尺度s 3 5 7 9
方向θ 0度 45度 90度 135度
在表一中,由于尺度有4个数值,方向θ有4个数值,所以尺度s和方向θ的组合为16种,即有16个Gabor滤波器。这样,每组尺度s和方向θ都对应一个特征图,所以本申请实施例中S1层可以输出16个特征图。
另外,在本申请实施例中,γ取值可以为0.3。在尺度s取值为3时,σ取值为1.2,λ取值为1.5;在尺度s取值为5时,σ取值为2.0,λ取值为2.5;在尺度s取值为9时,σ取值为3.6,λ取值为4.6;在尺度s取值为7时,σ取值为2.8,λ取值为3.5。
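A minimal sketch of the S1 computation in expression (2) is given below: every AER event overlays the Gabor kernel on its receptive field, weighted by the exponential decay of the event age. The value of t_leak, the event object layout (x, y, timestamp attributes, as in the earlier data-structure sketch), and the border clipping are assumptions of this sketch.

```python
import numpy as np

def s1_feature_map(events, kernel, height, width, t_now, t_leak=50.0):
    """Direct evaluation of expression (2) for one Gabor filter: every AER event
    overlays the kernel on its receptive field, weighted by exp(-(t_now - e_t)/t_leak)."""
    fmap = np.zeros((height, width))
    half = kernel.shape[0] // 2
    for ev in events:                              # ev has x, y and timestamp attributes
        decay = np.exp(-(t_now - ev.timestamp) / t_leak)
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                y, x = ev.y + dy, ev.x + dx
                if 0 <= y < height and 0 <= x < width:   # clip the kernel at the borders
                    fmap[y, x] += kernel[dy + half, dx + half] * decay
    return fmap
```

Combined with the gabor_kernel sketch above, calling this function for each of the 16 (s, θ) pairs would produce the 16 feature maps output by the S1 layer.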
在实现上述步骤603时使用C1层,在C1层中,对象识别的装置将S1层输出的每个特征图分为相邻2*2的区域。对于每个特征图,对象识别的装置选取该特征图中每个2*2的区域中的最大值,得到该特征图对应的新特征图。可见,C1层仅会改变特征图的维度,而不会改变特征图的数目。例如,S1层输出的特征图为16个,每个的大小为128*128,得到新特征图为16个,每个的大小为64*64。C1层的处理可以称为是池化操作。
这样,通过C1层的处理,可以降低特征图的维度,进而可以降低后续编码层和识别层的处理量。另外,若S1层输出的特征图的维度比较小,且对象识别的装置的处理能力比较强,可以不进行C1层的处理。
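The C1 step is plain 2×2 non-overlapping max pooling; a short NumPy version is sketched below (it simply crops odd-sized maps, which is an assumption of this sketch).

```python
import numpy as np

def c1_max_pool(fmap):
    """2x2 non-overlapping max pooling: (H, W) -> (H // 2, W // 2)."""
    h, w = fmap.shape
    cropped = fmap[:h - h % 2, :w - w % 2]         # drop an odd last row/column if present
    return cropped.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```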
在实现上述步骤603时使用编码层,编码层用于对AER数据的多个特征图进行编码处理,具体处理时对象识别的装置将AER数据的特征图,编码成脉冲序列。对于任意一组尺度s和方向θ下的Gabor滤波器得到的特征图,在被编码时,对象识别的装置进行时间编码和空间编码。时间编码可以用于将特征图中的各特征值按照目标编码函数进行时序编码处理,得到特征图各特征值的脉冲的触发时刻,目标编码函数可以是反向的线性函数或者反向的对数函数。空间编码用于将脉冲的触发时刻组成脉冲序列。
需要说明的是,上述进行时间编码时,特征图中特征值较大的特征被认为更容易产生脉冲,对应最小延迟时间,会先触发脉冲,而特征图中特征值较小的特征会比较晚触发脉冲甚至不触发脉冲。所以基于该原则,目标编码函数为反向的对数函数或者反向的线性函数。反向的对数函数可以是u-vln(r)(在后文会进行描述)。反向的线性函数可以是kr+b(k为小于0的数值)。而且由于反向的对数函数或者反向的线性函数改变了特征图中特征值的分布,使得特征值能够在后续的识别层中表达更多的信息,所以可以提高识别的准确率。
以下分别描述时间编码和空间编码:
时间编码:对于任一特征图A,本申请实施例以目标编码函数为反向的对数函数为例说明时间编码,函数表达式可以为:
t=C(r)=u-vln(r)    (3)
其中,在式(3)中,r为任一位置的特征值,t为特征值为r的脉冲的触发时刻。u和v是归一化因子,用于确保一个特征图中所有特征值对应的脉冲在预定的时间窗口tw中激发,例如,tw为120ms。C()表示目标编码函数。
u和v可以使用如下方式确定:
u=v·ln(r_max)，v=t_w/ln(r_max/r_min)    (4)
其中,对于特征图A,r max为该特征图A中最大的特征值,r min为预先定义的最小阈值。此处需要说明的是,每个特征图中r max和r min有可能不相同,所以在对不同的特征图进行时间编码时,需要重新确定r max和r min
通过时间编码,对象识别的装置将每个特征图中的特征值,编码为脉冲的触发时刻,相当于每个特征图中各个特征值分别对应一个脉冲的触发时刻。例如,有16个特征图,每个特征图为64*64个特征值,一共会有16*64*64个脉冲的触发时刻。
另外,由于特征图中较小的特征值有可能不会触发脉冲,所以对象识别的装置在进行时间编码之前,可以将特征图中特征值小于目标阈值(目标阈值为r min)的特征值删除,以节约处理资源。相应的,在后续进行空间编码时,也不需要对小于目标阈值的特征值的脉冲的触发时刻进行统计,所以也可以节约处理资源。
空间编码,对象识别的装置在进行空间编码时,对象识别的装置可以融合某些特征值以更有效的利用神经元,形成紧凑的表示,以减少后续识别层的计算量。具体可以是将C1层输出的特征图中,具有相同位置,且设定方向的所有尺度s的特征值组成一组特征值(设定方向指多个方向θ中的一个固定方向,在本申请实施例中,设定方向可以为0度、45度、90度或135度中的任一个)。例如,在C1层输出16个特征图,16个特征图可以分为4个方向的特征图,每个方向的特征图为4个特征图(即4个尺度的特征图),在设定方向为0度的4个特征图中位置(m,n)处的值分别为3、4、5、5,设定方向为0度相同位置(2,2)处的一组特征值为(3,4,5,5)。
然后对象识别的装置将设定方向为0度的特征图中相同位置(如相同位置均为(2,2))的特征值组成一组,即得到设定方向为0度位置为(2,2)的4个特征值组成的一组特征值。然后获取每组特征值对应的脉冲的触发时刻,这样,每组特征值对应的脉冲的触发时刻组成一个脉冲序列。由于有多组特征值,所以可以得到多个脉冲序列。
在本申请实施例中,编码层可以包括多个编码神经元,每个编码神经元负责同一位置设定方向的多个尺度s的特征图的转换。编码神经元的数目可以为N*P*M(N和P可以相等,也可以不相等),其中N*P为特征图(C1层输出的特征图)的尺寸,M为方向θ的数目。
经过上述分析,将空间编码和时间编码结合,上述式(3)可以表示为:
t_spike=C(r|x,y,s,θ)=u−v·ln(r)    (5)
其中,r∈{r|r x=x,r y=y,r s∈S,r θ=θ},r s表示特征值r的尺度,S表示尺度s的集合,r θ表示特征值r的方向,r x和r y表示特征值r在特征图中的位置。式(5)的函数表明位置为(x,y)、方向为θ所有尺度的脉冲的触发时刻生成的t spike集合。
这样,对于特征图的每个位置,均可以得到4个方向所有尺度(即4个方向的Gabor滤波器)的脉冲的触发时刻。例如,特征图的大小是64*64,方向有四个数值(0度、45度、90度和135度),尺度s也有四个数值(3、5、7、9),特征图有16个。对于特征图中的每个位置,有四个方向分别对应的脉冲序列,每个方向对应的脉冲序列中包括4个尺度对应的脉冲的触发时刻,那么一共有64*64*4个脉冲序列。
这样,脉冲序列中的脉冲的触发时刻是基于特征图中的特征值得到,由于每个特征值反映待识别对象的部分空间信息和部分时间信息,所以脉冲序列中的脉冲也携带有待识别对象 的部分空间信息和部分时间信息。
需要说明的是,由于在编码层将不同尺度的脉冲进行融合,所以可以在保持准确率的情况下,也可以减少后续识别层中参数的数目,非常适用于资源受限制的神经拟态设备中。
在实现上述步骤603时还使用识别层,识别层用于接收编码层输出的脉冲序列,对AER数据中的待识别对象进行识别。识别层可以通过SNN实现,SNN是一层全连接的网络结构组成。SNN中包括的神经元(即后续提到的识别神经元)的数目等于N*P*M(N和P可以相等,也可以不相等),其中N*P为特征图(C1层输出的特征图)的尺寸,M为方向θ的数目。
对象识别的装置在得到编码层的输出后,可以将每个脉冲序列输入到识别层中的每个识别神经元中。通过识别层的识别处理,可以得到AER数据中的对象。
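For illustration, a recognition neuron could be simulated with a simple leaky integrate-and-fire model as sketched below; the membrane time constant, time step, and reset behaviour are assumptions of this sketch, and the dynamic threshold update and lateral inhibition described elsewhere in this embodiment are omitted for brevity.

```python
import numpy as np

def lif_response(spike_trains, weights, threshold, t_end=120.0, dt=1.0,
                 tau_m=20.0, v_reset=0.0):
    """Count the output spikes of one recognition neuron driven by input spike
    trains, using a leaky integrate-and-fire model (an assumed neuron model).

    spike_trains: list of 1-D arrays, one spike train per input synapse
    weights:      iterable of synaptic weights, one per input synapse
    """
    v, n_spikes = 0.0, 0
    for t in np.arange(0.0, t_end, dt):
        v *= np.exp(-dt / tau_m)                              # membrane leak
        for train, w in zip(spike_trains, weights):           # integrate incoming spikes
            v += w * np.count_nonzero((train >= t) & (train < t + dt))
        if v >= threshold:                                    # fire and reset
            n_spikes += 1
            v = v_reset
    return n_spikes
```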
另外,本申请实施例中,还提供了训练SNN的方法,处理可以为:
训练SNN的方法可以包括监督学习算法和无监督学习算法。监督学习算法可以为Multi—Spike Prop算法等。非监督学习算法可以为STDP算法等。本申请以STDP算法为例进行SNN的训练,根据突触前神经元和突触后神经元发放的脉冲序列的相对时序关系,应用STDP算法学习规则可以对突触权值进行无监督式的调整。训练过程可以如下:
步骤a,获取样本集,样本集中包括AER数据,AER数据中包括多个AER事件。
步骤b,将多个AER事件经过前述特征提取,得到特征图。
步骤c,将特征图经过编码神经元编码后,得到脉冲序列(按照前述方式进行)。
步骤d,将脉冲序列输入至识别层,刺激识别神经元发放脉冲。STDP算法根据编码神经元和识别神经元脉冲发放的时间间隔调整突触权重,如果编码神经元脉冲先于识别神经元脉冲,则增大权重,反之则减小权重。同时识别神经元采用动态阈值,即如果该识别神经元经常容易触发脉冲,则增加其阈值。识别神经元之间相互连接,起到互相抑制的作用。
步骤e,执行步骤b至步骤d目标次数(目标次数可以为5-10次)后,结束训练。将学习率设置为零,确定步骤d最后一次执行时每个识别神经元的阈值和每个突触的权重。根据识别神经元对样本集中样本类别的最高响应,为每个识别神经元分配一个类别(这是使用标签的唯一步骤)。
后续在使用过程中,可以通过被分配好类别的每个识别神经元的响应,选择具有最高激发速率的类别,来作为预测结果。
本申请实施例中,对象识别的装置可以获取待识别对象的AER数据,AER数据包括待识别对象的多个AER事件,每个AER事件包括产生AER事件的时间戳和地址信息,然后提取AER数据的多个特征图,每个特征图包含待识别对象的部分空间信息和部分时间信息,部分空间信息和部分时间信息是根据每个AER事件的时间戳和地址信息获得的。最后根据AER数据的多个特征图,对待识别对象进行识别处理。这样,由于AER数据中待识别对象的时间信息和空间信息都包含在提取的特征图中,所以可以使特征图能够更全面的代表原始数据,进而在进行识别时,可以使识别结果更准确。
而且本申请实施例中使用了脉冲编码方式,不仅可以在后续识别模型中表达更多的信息,提高识别的准确率,而且由于将不同尺度的脉冲进行融合,可以在保持准确率的基础上减少了识别神经元的数目,进而节省了计算资源。
图9是本申请实施例提供的对象识别的装置的结构图。该装置可以通过软件、硬件或者 两者的结合实现成为装置中的部分或者全部。本申请实施例提供的装置可以实现本申请实施例图6所述的流程,该装置包括:获取模块910、提取模块920和识别模块930,其中:
获取模块910,用于获取待识别对象的地址事件表示AER数据,所述AER数据包括所述待识别对象的多个AER事件,每个AER事件包括产生所述AER事件的时间戳和地址信息,具体可以用于执行实现步骤601的获取功能;
提取模块920,用于提取所述AER数据的多个特征图,其中,每个特征图包含所述待识别对象的部分空间信息和部分时间信息,所述部分空间信息和部分时间信息是根据所述每个AER事件的时间戳和地址信息获得的,具体可以用于执行实现步骤602的提取功能,以及步骤602包含的隐含步骤;
识别模块930,用于根据所述AER数据的多个特征图,对所述待识别对象进行识别,具体可以用于执行实现步骤603的识别功能,以及步骤603包含的隐含步骤。
在一种可能的实施方式中,所述提取模块920,用于:
采用多个滤波器对所述多个AER事件的地址信息进行处理,得到多个第一特征图;
按照所述多个AER事件的时间戳对所述多个第一特征图中的特征值进行衰减处理,得到所述AER数据的多个特征图。
在一种可能的实施方式中,所述提取模块920,用于:
采用多个Gabor滤波器的卷积核对所述多个AER事件的空间信息进行卷积处理,得到多个第一特征图。
在一种可能的实施方式中,所述识别模块930,用于:
对所述AER数据的多个特征图进行编码以获得多个脉冲序列,其中,每个脉冲序列包括多个脉冲,每个脉冲携带有所述待识别对象的部分时间信息和部分空间信息,属于同一个脉冲序列中的多个脉冲是根据具有相同设定方向上的不同滤波器对应的特征图中,具有相同位置的特征值获得的;
采用脉冲神经网络对所述多个脉冲序列进行处理,以对所述待识别对象进行识别。
在一种可能的实施方式中,所述识别模块930,用于:
采用目标编码函数对所述AER数据的多个特征图进行编码,以获得多个脉冲序列,所述目标编码函数为反向的线性函数或者反向的对数函数。
本申请实施例中,对象识别的装置可以获取待识别对象的AER数据,AER数据包括待识别对象的多个AER事件,每个AER事件包括产生AER事件的时间戳和地址信息,然后提取AER数据的多个特征图,每个特征图包含待识别对象的部分空间信息和部分时间信息,部分空间信息和部分时间信息是根据每个AER事件的时间戳和地址信息获得的。最后根据AER数据的多个特征图,对待识别对象进行识别处理。这样,由于AER数据中待识别对象的时间信息和空间信息都包含在提取的特征图中,所以可以使特征图能够更全面的代表原始数据,进而在进行识别时,可以使识别结果更准确。
而且本申请实施例中使用了脉冲编码方式,不仅可以在后续识别模型中表达更多的信息,提高识别的准确率,而且由于将不同尺度的脉冲进行融合,可以在保持准确率的基础上减少了识别神经元的数目,进而节省了计算资源。
需要说明的是:上述实施例提供的对象识别的装置在对象识别时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成, 即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的对象识别的装置与对象识别的方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
本实施例中,还提供了一种对象识别的计算设备,该计算设备包括处理器和存储器,所述存储器用于存储一个或多个指令,所述处理器通过执行所述一个或多个指令来实现上述所提供的对象识别的方法置。
本实施例中,还提供了一种计算机可读存储介质,计算机可读存储介质存储有指令,当计算机可读存储介质中的指令在计算设备上被执行时,使得计算设备执行上述所提供的对象识别的方法。
本实施例中,还提供了一种包含指令的计算机程序产品,当其在计算设备上运行时,使得计算设备执行上述所提供的对象识别的方法,或者使得所述计算设备实现上述提供的对象识别的装置的功能。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现,当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令,在服务器或终端上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴光缆、光纤、数字用户线)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是服务器或终端能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(如软盘、硬盘和磁带等),也可以是光介质(如数字视盘(Digital Video Disk,DVD)等),或者半导体介质(如固态硬盘等)。
可以理解的是,以上所描述的装置实施例仅仅是示意性的,例如,所述模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。例如,多个模块或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,上述实施例所讨论的模块相互之间的连接可以是电性、机械或其他形式。所述作为分离部件说明的模块可以是物理上分开的,也可以不是物理上分开的。作为模块显示的部件可以是物理模块或者也可以不是物理模块。另外,在申请实施例各个实施例中的各功能模块可以独立存在,也可以集成在一个处理模块中。
需要说明的是,本申请所提供的实施例仅仅是示意性的。所属领域的技术人员可以清楚的了解到,为了描述的方便和简洁,在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。在本发明实施例、权利要求以及附图中揭示的特征可以独立存在也可以组合存在。在本发明实施例中以硬件形式描述的特征可以通过软件来执行,反之亦然。在此不做限定。

Claims (12)

  1. 一种对象识别的方法,其特征在于,所述方法包括:
    获取待识别对象的地址事件表示AER数据,所述AER数据包括所述待识别对象的多个AER事件,每个AER事件包括产生所述AER事件的时间戳和地址信息;
    提取所述AER数据的多个特征图,其中,每个特征图包含所述待识别对象的部分空间信息和部分时间信息,所述部分空间信息和部分时间信息是根据所述每个AER事件的时间戳和地址信息获得的;
    根据所述AER数据的多个特征图,对所述待识别对象进行识别。
  2. 根据权利要求1所述的方法,其特征在于,所述提取所述AER数据的多个特征图,包括:
    采用多个滤波器对所述多个AER事件的地址信息进行处理,得到多个第一特征图;
    按照所述多个AER事件的时间戳对所述多个第一特征图中的特征值进行衰减处理,得到所述AER数据的多个特征图。
  3. 根据权利要求2所述的方法,其特征在于,所述采用多个滤波器对所述多个AER事件的地址信息进行处理,得到多个第一特征图,包括:
    采用多个Gabor滤波器的卷积核对所述多个AER事件的空间信息进行卷积处理,得到所述多个第一特征图。
  4. 根据权利要求2或3所述的方法,其特征在于,所述根据所述AER数据的多个特征图,对所述待识别对象进行识别,包括:
    对所述AER数据的多个特征图进行编码以获得多个脉冲序列,其中,每个脉冲序列包括多个脉冲,每个脉冲携带有所述待识别对象的部分时间信息和部分空间信息,属于同一个脉冲序列中的多个脉冲是根据具有相同设定方向上的不同滤波器对应的特征图中,具有相同位置的特征值获得的;
    采用脉冲神经网络对所述多个脉冲序列进行处理,以对所述待识别对象进行识别。
  5. 根据权利要求4所述的方法,其特征在于,所述对所述AER数据的多个特征图进行编码以获得多个脉冲序列,包括:
    采用目标编码函数对所述AER数据的多个特征图进行编码,以获得多个脉冲序列,所述目标编码函数为反向的线性函数或者反向的对数函数。
  6. 一种对象识别的装置,其特征在于,所述装置包括:
    获取模块,用于获取待识别对象的地址事件表示AER数据,所述AER数据包括所述待识别对象的多个AER事件,每个AER事件包括产生所述AER事件的时间戳和地址信息;
    提取模块,用于提取所述AER数据的多个特征图,其中,每个特征图包含所述待识别对象的部分空间信息和部分时间信息,所述部分空间信息和部分时间信息是根据所述每个AER 事件的时间戳和地址信息获得的;
    识别模块,用于根据所述AER数据的多个特征图,对所述待识别对象进行识别。
  7. 根据权利要求6所述的装置,其特征在于,所述提取模块,用于:
    采用多个滤波器对所述多个AER事件的地址信息进行处理,得到多个第一特征图;
    按照所述多个AER事件的时间戳对所述多个第一特征图中的特征值进行衰减处理,得到所述AER数据的多个特征图。
  8. 根据权利要求7所述的装置,其特征在于,所述提取模块,用于:
    采用多个Gabor滤波器的卷积核对所述多个AER事件的空间信息进行卷积处理,得到多个第一特征图。
  9. 根据权利要求7或8所述的装置,其特征在于,所述识别模块,用于:
    对所述AER数据的多个特征图进行编码以获得多个脉冲序列,其中,每个脉冲序列包括多个脉冲,每个脉冲携带有所述待识别对象的部分时间信息和部分空间信息,属于同一个脉冲序列的多个脉冲是根据具有相同设定方向上的不同滤波器对应的特征图中,具有相同位置的特征值获得的;
    采用脉冲神经网络对所述多个脉冲序列进行处理,以对所述待识别对象进行识别。
  10. 根据权利要求9所述的装置,其特征在于,所述识别模块,用于:
    采用目标编码函数对所述AER数据的多个特征图进行编码,以获得多个脉冲序列,所述目标编码函数为反向的线性函数或者反向的对数函数。
  11. 一种对象识别的计算设备,其特征在于,所述计算设备包括:
    通信接口,用于接收待识别对象的地址事件表示AER数据;
    处理器,与所述通信接口连接并用于执行如权利要求1-5中任一权利要求所述的方法。
  12. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有指令,当所述计算机可读存储介质中的指令被计算设备执行时,使得所述计算设备执行所述权利要求1-5中任一权利要求所述的方法。
PCT/CN2020/111650 2019-08-30 2020-08-27 对象识别方法及装置 WO2021037125A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20858856.6A EP4016385A4 (en) 2019-08-30 2020-08-27 OBJECT IDENTIFICATION METHOD AND APPARATUS
US17/680,668 US20220180619A1 (en) 2019-08-30 2022-02-25 Object recognition method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910818551.1 2019-08-30
CN201910818551.1A CN112446387A (zh) 2019-08-30 2019-08-30 对象识别方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/680,668 Continuation US20220180619A1 (en) 2019-08-30 2022-02-25 Object recognition method and apparatus

Publications (1)

Publication Number Publication Date
WO2021037125A1 true WO2021037125A1 (zh) 2021-03-04

Family

ID=74685584

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111650 WO2021037125A1 (zh) 2019-08-30 2020-08-27 对象识别方法及装置

Country Status (4)

Country Link
US (1) US20220180619A1 (zh)
EP (1) EP4016385A4 (zh)
CN (1) CN112446387A (zh)
WO (1) WO2021037125A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114611686A (zh) * 2022-05-12 2022-06-10 之江实验室 基于可编程神经拟态核的突触延时实现系统及方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115909329B (zh) * 2023-01-10 2023-05-26 深圳前海量子云码科技有限公司 一种微观目标识别方法、装置、电子设备和存储介质


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105469039A (zh) * 2015-11-19 2016-04-06 天津大学 基于aer图像传感器的目标识别系统
CN105721772A (zh) * 2016-01-20 2016-06-29 天津师范大学 一种异步时域视觉信息成像方法
CN106446937A (zh) * 2016-09-08 2017-02-22 天津大学 用于aer图像传感器的多层卷积识别系统
CN106407990A (zh) * 2016-09-10 2017-02-15 天津大学 基于事件驱动的仿生目标识别系统
US20180357504A1 (en) * 2017-06-13 2018-12-13 Samsung Electronics Co., Ltd. Event-based image feature extraction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4016385A4 *


Also Published As

Publication number Publication date
US20220180619A1 (en) 2022-06-09
EP4016385A4 (en) 2022-10-26
EP4016385A1 (en) 2022-06-22
CN112446387A (zh) 2021-03-05


Legal Events

Code Title Description
121  Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20858856; Country of ref document: EP; Kind code of ref document: A1)
NENP  Non-entry into the national phase (Ref country code: DE)
ENP  Entry into the national phase (Ref document number: 2020858856; Country of ref document: EP; Effective date: 20220315)