CN106250981A - Spiking neural network minimizing memory access and in-network bandwidth consumption - Google Patents

Spiking neural network minimizing memory access and in-network bandwidth consumption

Info

Publication number
CN106250981A
Authority
CN
China
Prior art keywords
layer
neural networks
spiking neural network
frustum
pulse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610373734.3A
Other languages
Chinese (zh)
Other versions
CN106250981B (en)
Inventor
John Brothers
Joohoon Lee
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 15/062,365 (US10387770B2)
Application filed by Samsung Electronics Co Ltd
Publication of CN106250981A
Application granted
Publication of CN106250981B
Active legal status
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/08 Learning methods

Abstract

A spiking neural network that reduces memory access and in-network bandwidth consumption. A spiking neural network having a plurality of layers may be implemented using a first partitioning that divides the layers into a plurality of frustums, wherein each frustum includes one tile of each partitioned layer of the spiking neural network. A first tile of a first layer of the spiking neural network may be read. Using a processor, a first tile of a second layer of the spiking neural network may be generated using the first tile of the first layer while storing intermediate data within an internal memory of the processor. The first tile of the first layer and the first tile of the second layer belong to the same frustum.

Description

Spiking neural network minimizing memory access and in-network bandwidth consumption
This application claims the benefit of U.S. Provisional Patent Application No. 62/173,742 filed on June 10, 2015, U.S. Patent Application No. 15/062,365 filed on March 7, 2016, and Korean Patent Application No. 10-2016-0048957 filed on April 21, 2016, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to spiking neural networks. More particularly, the present disclosure relates to reducing memory access and in-network bandwidth consumption during execution of a spiking neural network.
Background
Neural networks may be used in a variety of different applications. For example, neural networks may be used to perform tasks including, but not limited to, speech recognition, visual object recognition and/or localization, image reconstruction, anomaly detection, pattern recognition, and other applications that may require such capabilities. A spiking neural network is a particular class of neural network well suited to processing data having temporal and/or sequential aspects. For example, spiking neural networks may be used to process audio, such as speech, and/or video.
In a spiking neural network, "spikes" are transmitted between neurons. This property of spiking neural networks may be exploited to implement power-optimized neural networks. These power-optimized neural networks may be implemented in a variety of different devices and/or systems including, but not limited to, consumer electronic devices, servers, cloud server applications, and the like.
Spiking convolutional neural networks (SCNNs) and spiking recurrent neural networks (SRNNs) are particular subclasses of spiking neural networks. SCNNs and SRNNs use the network topologies and optimized training infrastructure of CNNs and RNNs, respectively, but are converted into spiking form for execution. The input provided to a spiking neural network includes spike trains. More particularly, each input neuron receives a series of spikes as input. As such, information is conveyed across the spiking neural network as spikes propagating from one layer to the next. The final output is typically an output spike train. The output spikes may be counted or otherwise converted into the desired result, which may be a classification of the received input or some other determination.
Summary of the invention
One embodiment may include a method of implementing a spiking neural network having a plurality of layers using a first partitioning that divides the layers into a plurality of frustums, wherein each frustum includes one tile of each partitioned layer of the spiking neural network. The method includes reading a first tile of a first layer of the spiking neural network and, using a processor, generating a first tile of a second layer of the spiking neural network using the first tile of the first layer while storing intermediate data within an internal memory of the processor. The first tile of the first layer and the first tile of the second layer belong to the same frustum.
Another embodiment may include an apparatus for implementing a spiking neural network having a plurality of layers using a first partitioning that divides the layers into a plurality of frustums, wherein each frustum includes one tile of each partitioned layer of the spiking neural network. The apparatus includes an internal memory configured to store intermediate data and a first compute unit coupled to the internal memory and configured to initiate executable operations. The executable operations include reading a first tile of a first layer of the spiking neural network and generating a first tile of a second layer of the spiking neural network using the first tile of the first layer while storing intermediate data within the internal memory. The first tile of the first layer and the first tile of the second layer belong to the same frustum.
Another embodiment may include a computer program product including a computer-readable storage medium having program code stored thereon for implementing a spiking neural network having a plurality of layers using a first partitioning that divides the layers into a plurality of frustums, wherein each frustum includes one tile of each partitioned layer of the spiking neural network. The program code is executable by a processor to perform operations including reading a first tile of a first layer of the spiking neural network and generating a first tile of a second layer of the spiking neural network using the first tile of the first layer while storing intermediate data within an internal memory of the processor. The first tile of the first layer and the first tile of the second layer belong to the same frustum.
This Summary is provided merely to introduce certain concepts and not to identify any key or essential features of the claimed subject matter. Many other features and embodiments of the invention will be apparent from the accompanying drawings and from the following detailed description.
Brief description of the drawings
The accompanying drawings show one or more embodiments; however, the accompanying drawings should not be taken to limit the invention to only the embodiments shown. Various aspects and advantages will become apparent upon review of the following detailed description and upon reference to the accompanying drawings.
Fig. 1 is a block diagram illustrating exemplary processing of multiple layers of a spiking neural network.
Fig. 2 is a block diagram illustrating an exemplary partitioning of a neural network layer having overlapping tiles.
Fig. 3 is a block diagram illustrating processing performed by an exemplary neural network engine.
Fig. 4 is a flow chart illustrating an exemplary method of executing a spiking neural network.
Fig. 5 is a flow chart illustrating an exemplary method of determining frustums of a neural network.
Fig. 6 is an exemplary partitioning of a layer of a spiking neural network.
Fig. 7 is a flow chart illustrating another exemplary method of executing a spiking neural network.
Fig. 8 is a block diagram illustrating an example of a data processing system for implementing partition processing for a neural network.
Detailed description of the invention
While the disclosure concludes with claims defining novel features, it is believed that the various features described herein will be better understood from a consideration of the description in conjunction with the drawings. The processes, machines, manufactures, and any variations thereof described within this disclosure are provided for purposes of illustration. Any specific structural and functional details described are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the features described in virtually any appropriately detailed structure. Further, the terms and phrases used within this disclosure are not intended to be limiting, but rather to provide an understandable description of the features described.
This disclosure relates to spiking neural networks (SNNs). More particularly, this disclosure relates to reducing memory access and in-network bandwidth consumption during execution of an SNN. SNNs may be used to implement power-optimized networks that can be used in a wide variety of systems, from mobile consumer devices to servers capable of executing cloud server applications.
An SNN utilizes particular neuron models that suppress piecemeal activity of the SNN, thereby allowing the SNN to consume less power than other types of neural networks. The activity of the neurons of an SNN may be suppressed by using a thresholding model. Unless a feature is sufficiently strong in a given region, the response of the SNN is zero (0) and no spike is produced. The power consumed by that portion of the neural network becomes negligible.
Many different types of SNNs store a post-synaptic potential (PSP) at each neuron. The PSP values of the neurons in an SNN represent the state of the SNN at a given time. Since the incoming data is either a 0 (indicating no spike) or a 1 (indicating a spike), the computational cost of updating the PSP of a neuron can be quite small. Applying the weights of the inputs to determine the PSP value of a neuron involves conditional add operations rather than multiply operations. Multiplications are eliminated. Even so, some SNNs store a substantial amount of data, including the PSP values. Storing this data may require a large amount of memory to execute the SNN and generates a large amount of read and write data traffic to save the state of the SNN and restore the state of the SNN after each time interval is processed.
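For illustration only, the conditional-add update described above can be sketched as follows, assuming binary (0/1) inputs; the function and variable names are hypothetical and are not taken from the patent:

```python
# Minimal sketch of a conditional-add PSP update with binary (0/1) inputs.
def update_psp(psp, spikes, weights):
    """Accumulate the weighted input spikes into a neuron's PSP value.

    Each input is 0 (no spike) or 1 (spike), so each weight is either
    added or skipped; no multiplications are required.
    """
    for spike, weight in zip(spikes, weights):
        if spike:  # conditional add: only spiking inputs contribute
            psp += weight
    return psp
```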
In accordance with the inventive arrangements described herein, an SNN may be executed by a neural network engine (NN engine) in a manner that reduces or eliminates the data traffic for storing and restoring SNN state. Because the data traffic relating to storing and retrieving PSP values from external memory is reduced or eliminated, storage of SNN state in external memory may also be reduced or, in some cases, eliminated. In another aspect, the SNN may be executed so that spikes can be generated while reducing and/or eliminating the addition of zeros when computing the weighted inputs to each neuron.
As described herein, the reduction in data traffic translates into reduced power consumption by the NN engine when executing the SNN. The performance of the NN engine may also increase, e.g., the NN engine may operate or execute faster. Whether the improvement is in power consumption or in execution time, systems with limited power and/or thermal budgets are able to implement SNNs as described herein. In larger server-based computing environments, power consumption remains a major cost and a limit that may prevent applications utilizing neural networks from scaling up. The inventive arrangements described within this disclosure facilitate the use and scaling of SNNs implemented in server-based computing environments, including, but not limited to, cloud computing environments.
Fig. 1 is a block diagram illustrating exemplary processing of multiple layers of an SNN 100. In the example of Fig. 1, SNN 100 includes layers 102, 104, and 106. Layer 102 is a first layer. Layer 104 is a second layer. Layer 106 is a third layer. In the example of Fig. 1, data flows from layer 102 to layer 104 and then to layer 106. As an example, layers 102, 104, and 106 may be implemented as convolutional layers of a feature-extraction neural network.
As defined herein, a "layer" of a neural network includes one or more feature maps. As pictured, each of layers 102, 104, and 106 may include one or more feature maps. In the example of Fig. 1, layer 102 includes feature map 102-1. Layer 104 includes four feature maps 104-1, 104-2, 104-3, and 104-4. Layer 106 includes six feature maps 106-1, 106-2, 106-3, 106-4, 106-5, and 106-6. It should be appreciated that the number of feature maps shown for each of layers 102, 104, and 106 is for purposes of illustration only. The inventive arrangements described within this disclosure are not intended to be limited by the particular number of feature maps in any layer of SNN 100 and/or the particular number of layers in SNN 100.
In one arrangement, an NN engine may execute SNN 100 of Fig. 1. In operation, the NN engine generates intermediate data traffic by reading and writing the feature maps of layers 102, 104, and 106. The NN engine may perform multiple read operations of the feature maps in executing SNN 100. For purposes of illustration, since each of layers 102, 104, and 106 may include 10, 100, or more feature maps, a large amount of intermediate data traffic may be generated within and/or between layers during execution of SNN 100.
In one aspect, each of the feature maps may be implemented as a 2D image map of PSP values representing the strength of a learned feature at different (x, y) positions. To generate the feature maps of layer N+1 of SNN 100, the NN engine reads multiple feature maps of layer N. As an illustrative example, if layer N has 10 feature maps and layer N+1 has 20 feature maps, the NN engine may read each feature map of layer N 20 times. As such, the NN engine may perform a total of 200 feature map reads (10x20) from layer N alone.
In accordance with one aspect of the inventive arrangements described herein, the NN engine may reorder the computations in implementing SNN 100. Reordering the computations allows the NN engine to generate intermediate values during execution of SNN 100 that are consumed soon after being produced. By using the intermediate values soon after production, the amount of intermediate data that must be stored at any one time in memory external to the NN engine (e.g., random access memory (RAM)) may be limited. For example, the amount of intermediate data that needs to be stored during execution of SNN 100 may be small enough to fit within the internal memory of a processor of the NN engine. The internal memory may be an on-chip memory or cache of the processor of the NN engine. As such, the intermediate data need not be stored in an external RAM or other memory that requires more time and/or energy to read and write.
In one exemplary embodiment, the NN engine may use the same group of conditional adder units to produce the intermediate data of a layer and to consume the intermediate data as input for the next layer of the neural network. As noted, the intermediate data is consumed by the same adder units soon after being produced at the adder units. Accordingly, little intermediate data, if any, may need to be sent any distance within the NN engine, e.g., to an external RAM. This configuration may help reduce the data traffic over long interconnects within the NN engine, thereby further reducing the power consumption of the NN engine.
In another exemplary embodiment, the NN engine may be configured to reorder the computational workload to reduce and/or eliminate intermediate results and to localize the intermediate results by interleaving the generation of one or more or all of the convolutional layers of the neural network. In a conventional implementation, an NN engine may execute all of layer 102, then all of layer 104, then all of layer 106, and so forth. In accordance with the inventive arrangements described herein, the NN engine may instead execute a portion of layer 102, then a portion of layer 104, then a portion of layer 106, and so on. For example, the NN engine may execute tile 108-1 of layer 102, then tile 110-1 of layer 104, then tile 112-1 of layer 106, and so forth. The NN engine may then execute tile 108-2, followed by tile 110-2, followed by tile 112-2, and so on. In another arrangement, the input data may be batched over two or more time intervals.
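The difference in execution order can be sketched as follows; the loop structure and the execute_tile placeholder are illustrative assumptions rather than the claimed implementation:

```python
# Sketch contrasting conventional layer-by-layer execution with the
# interleaved, tile-at-a-time order described above.
def execute_tile(layer, tile):
    # Placeholder for the per-tile work (reading input spikes, applying
    # weights, posting output spike events).
    print(f"executing layer {layer}, tile {tile}")

def run_layer_by_layer(num_layers, num_tiles):
    for layer in range(num_layers):      # finish an entire layer first; all
        for tile in range(num_tiles):    # of its intermediate data must be
            execute_tile(layer, tile)    # held somewhere (e.g., external RAM)

def run_interleaved(num_layers, num_tiles):
    for tile in range(num_tiles):        # walk one frustum at a time; each
        for layer in range(num_layers):  # tile is produced and then consumed
            execute_tile(layer, tile)    # while still in internal memory
```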
For purposes of illustration, SNN 100 of Fig. 1 may be visualized as a pyramid of layers. As noted, execution may start at the base of the pyramid with layer 102 having tiles 108, continue up through layer 104 having tiles 110, and proceed to layer 106 having tiles 112. Moving up through SNN 100, each next higher layer may shrink in x-y size, although the number of feature maps of each next higher layer typically increases. For example, the x-y size of layer 104 may be smaller than the x-y size of layer 102, while layer 104 has more feature maps than layer 102. In other cases, the number of feature maps in the next higher layer of the SNN may remain the same.
In accordance with another embodiment, the 3D volume of SNN 100 may be conceptually diced, or partitioned, into multiple rectangular frustums. Each rectangular frustum may have a rectangular intersection with each layer of SNN 100, which defines a tile. In the example of Fig. 1, SNN 100 is partitioned into four frustums, referred to as frustums 1, 2, 3, and 4. The intersection of a frustum with each of layers 102, 104, and 106 of SNN 100 defines a rectangular tile. Accordingly, each tile of a given layer includes a portion of each feature map of that layer. For example, tile 110-1 includes the upper-left portion of each of feature maps 104-1, 104-2, 104-3, and 104-4. For purposes of discussion, the suffix of the reference number of a tile of each layer indicates the particular frustum to which the tile belongs. For example, frustum 1 may include tile 108-1 of layer 102, tile 110-1 of layer 104, and tile 112-1 of layer 106. Frustum 2 may include tile 108-2 of layer 102, tile 110-2 of layer 104, tile 112-2 of layer 106, and so on.
In general, the processing within each frustum may be performed independently of each other frustum. In one embodiment, a small amount of data may be shared between adjacent frustums. For a tile of a given layer of SNN 100, the processor of the NN engine may store the portions of the feature maps being consumed and produced in an internal memory that is on the same chip as the processor. The portions of the feature maps that the processor produces for a tile may be used to generate the output for the corresponding tile of the next layer. For example, the processor may consume the portions of the feature maps stored in the internal memory (e.g., tile 108-1 of layer 102) to produce the corresponding tile 110-1 of layer 104. Tile 110-1 of layer 104 may also be stored in the internal memory. As defined within this disclosure, the term "corresponding tile" means the tile in the same frustum of an adjacent layer of the neural network as a reference or subject tile. The processor may then utilize tile 110-1 of layer 104 in the internal memory to produce tile 112-1 of layer 106. Tile 112-1 may also be stored in the internal memory. In one aspect, the total storage needed in the internal memory to process a frustum is the maximum footprint (e.g., memory usage) of the corresponding tiles of the frustum for two adjacent layers of SNN 100. For example, the data corresponding to tile 112-1 may overwrite the data of tile 108-1. It should be appreciated that the x and y dimensions of the tiles (e.g., of the frustums) may be reduced as needed to ensure that the intermediate results fit within the available internal memory.
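A minimal sketch of this two-buffer footprint is shown below, under the assumption that only the corresponding tiles of two adjacent layers are resident at once; all names are hypothetical:

```python
# Sketch of processing one frustum with two ping-pong tile buffers so that
# each newly produced tile overwrites the tile of the layer before last
# (e.g., tile 112-1 overwriting tile 108-1).
def process_frustum(input_tile, num_layers, produce_tile):
    buf = [input_tile, None]  # two tile buffers held in internal memory
    for layer in range(num_layers - 1):
        src = buf[layer % 2]
        buf[(layer + 1) % 2] = produce_tile(layer, src)  # overwrite old tile
    return buf[(num_layers - 1) % 2]  # tile of the topmost layer processed
```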
For each frustum of the neural network, the NN engine may generate the portions of the feature maps defined by a tile of layer N+1 from the portions of the feature maps defined by the corresponding tile of layer N. In one embodiment, the NN engine may perform the necessary processing in any of a variety of different orders while keeping the needed data in the internal memory. For example, the NN engine may read each input spike of the portions of the feature maps defined by the corresponding tile of layer N and apply weights to the input spikes to generate the portion of each output feature map of the tile. The NN engine may add the weighted spikes together to produce a spike (or no spike) for the portions of the feature maps defined by the tile of layer N+1. After the NN engine generates the portions of the feature maps of layer N+1, the region of the internal memory allocated to the feature maps of layer N may be freed and used to store the feature maps of layer N+2. The NN engine may continue overwriting intermediate data with the newly generated intermediate data of successive layers as described.
Although the frustums may be processed independently, a small fraction of the intermediate data may be shared along adjacent tile boundaries within the same layer of the neural network. As pictured, the spikes generated by tile 108-1 of layer 102 propagate to tile 110-1 of layer 104. In this example, a small fraction of the spikes generated by tile 108-1 of layer 102 may need to be shared with the other tiles 108 of layer 102. In an embodiment in which each frustum is associated with an event queue, the spike events of adjacent tiles may be posted to the event queues of the two or more affected target tiles. In another embodiment, the spike events of adjacent tiles may be posted to a single event queue and read multiple times. Posting the spike events of adjacent tiles to more than one event queue, or reading the spike events of adjacent tiles multiple times, effectively shares data between tiles. In yet another embodiment, the data sharing between adjacent tiles may be eliminated by defining the tiles to overlap one another at the tile boundaries. In that case, the NN engine may determine the spike events of a tile, including the border region of the tile, more than once. Accordingly, with overlapping tiles, no data need be shared between two adjacent tiles.
By partitioning the neural network into frustums that can be processed independently of one another, the NN engine may process the frustums in parallel using multiple compute units. For example, one compute unit of the NN engine may execute tiles 108-1, 110-1, and 112-1; another compute unit of the NN engine may execute tiles 108-2, 110-2, and 112-2; another compute unit of the NN engine may execute tiles 108-3, 110-3, and 112-3; and another compute unit of the NN engine may execute tiles 108-4, 110-4, and 112-4. As pictured, some data may be used or shared between tiles of immediately adjacent frustums within the same layer.
Fig. 2 is a block diagram illustrating an exemplary partitioning of a neural network layer having overlapping tiles. More particularly, Fig. 2 illustrates layer 102 of SNN 100. As pictured, tiles 108-1, 108-2, 108-3, and 108-4 are defined to overlap one another. Overlap regions 205 are shaded. Overlap regions 205 are also shown in isolation without tiles 108-1, 108-2, 108-3, and 108-4.
Fig. 3 is a block diagram illustrating processing performed by an exemplary NN engine 300. As pictured, NN engine 300 may include a processor 305 and an external memory 315. Processor 305 may include one or more compute units 308. In the case where processor 305 includes multiple compute units 308, compute units 308 may be configured to operate in parallel or concurrently with one another. Further, compute units 308 may operate independently of one another. In one example, each compute unit 308 may be implemented as a core that is capable of executing instructions.
Processor 305 may include an internal memory 310. Internal memory 310 may be an on-chip memory. For example, internal memory 310 may be a cache memory of processor 305. Internal memory 310 may be implemented as a simple buffer, a level 1 cache memory, a level 2 cache memory, or the like, of processor 305. As pictured, compute units 308 may be coupled to internal memory 310. In an arrangement where processor 305 includes multiple compute units 308, each compute unit 308 may have a dedicated internal memory 310. Internal memory 310, or each internal memory as the case may be, may store PSP values 322-1 (e.g., feature maps and/or portions thereof), optionally store weights 320-1 and instructions 325, and include one or more event queues 330.
As pictured, processor 305 may be coupled to external memory 315. In one example, external memory 315 may be implemented as one or more additional levels of cache memory of processor 305. External memory 315, however, may not be located on the same chip as processor 305. In another example, external memory 315 may be implemented as a RAM, e.g., a DRAM, an SRAM, or the like. In another aspect, processor 305 may be coupled to external memory 315 through a memory controller (not shown). In general, intermediate data, e.g., PSP values 322-1, weights 320-1, and spike events, may be stored in internal memory 310. The spike events may be stored in event queues 330. As defined herein, the term "spike event" means the output of a neuron, which is either a 1 (indicating a spike) or a 0 (indicating no spike).
In the example of Fig. 3, weights 320-2 of the neurons of the neural network may be stored in external memory 315, while weights 320-1 may be stored in internal memory 310. In one exemplary implementation, weights 320-1 are those weights needed to process a tile of a layer of SNN 100 to generate the corresponding tile of the next layer of SNN 100. Referring to Fig. 1, for example, weights 320-1 are the weights needed to process tile 108-1 of layer 102 to generate tile 110-1 of layer 104. Weights 320-2 are the other weights of SNN 100 not currently in use. In another exemplary embodiment, processor 305 may compress weights 320-1 for storage in internal memory 310.
Processor 305 may be implemented as one or more hardware circuits. In one aspect, processor 305 may be configured to carry out instructions such as instructions 325. Instructions 325 may be contained in program code. As discussed, processor 305 may be implemented as an integrated circuit. Exemplary implementations of processor 305 may include, but are not limited to, a central processing unit (CPU), a multi-core CPU, an array processor, a vector processor, a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic array (PLA), an application-specific integrated circuit (ASIC), programmable logic circuitry, a controller, or the like. NN engine 300 may be implemented using any of the various processors described in combination with external memory 315.
NN engine 300 may operate on tiles of SNN 100 rather than operating on individual neurons. Because the receptive fields of the convolution filters overlap, a single input spike acts upon multiple adjacent target neurons. As an illustrative example, consider a 5x5 receptive field of target neurons. In that case, an input spike acts upon 25 target neurons. As a result of the spikes passed between the layers of the neural network, the convolution of the SCNN degenerates from an NxM matrix multiply to an NxM conditional add. In the NxM conditional add, the particular ones of weights 320-1 that correspond to inputs with spikes are added together.
In one exemplary embodiment, NN engine 300 may reduce and/or eliminate unnecessary add-zero operations from the accumulation step. For example, processor 305 may iterate over the input spikes rather than over the output neurons. Processor 305 may update each affected target neuron. Processor 305 may perform the updates in parallel. In effect, processor 305 "scatters" the inputs rather than "gathering" the inputs.
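A sketch of this scatter approach is shown below for an assumed 5x5 kernel; the boundary handling and all names are illustrative choices rather than details taken from the patent:

```python
# Sketch of scattering input spikes: iterate over the spikes and update
# every target neuron whose receptive field covers each spike, instead of
# gathering all (mostly zero) inputs per output neuron.
def scatter_spikes(spike_coords, psp, kernel):
    """spike_coords: (x, y) positions that spiked in the input tile.
    psp: 2D list of target-neuron PSP values.
    kernel: 2D weight list, e.g. 5x5, so one spike touches 25 targets."""
    kh, kw = len(kernel), len(kernel[0])
    for sx, sy in spike_coords:
        for dy in range(kh):
            for dx in range(kw):
                tx, ty = sx + dx - kw // 2, sy + dy - kh // 2
                if 0 <= ty < len(psp) and 0 <= tx < len(psp[0]):
                    psp[ty][tx] += kernel[dy][dx]  # conditional add only
    return psp
```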
As pictured, NN engine 300 may store input spikes in a plurality (e.g., an array) of event queues 330. In one arrangement, each event queue 330 may be reserved for use by a particular frustum of the neural network. For example, NN engine 300 may include one event queue for each frustum of the neural network. As such, given a set of "T" time intervals, each event queue may store the spike events targeting a given width (W) by height (H) by depth (D) region of the neural network, where T is an integer value of one or more.
In the example of Fig. 3, internal memory 310 may be used to implement event queues 330, which are labeled as event queues 330-1, 330-2, 330-3, and 330-4. The suffix of the reference number of each event queue indicates the particular frustum that may use the event queue. As such, event queue 330-1 may be used for frustum 1. Event queue 330-2 may be used for frustum 2, and so forth. In one exemplary embodiment, NN engine 300 may maintain a one-to-one relationship between frustums and event queues. Maintaining this relationship causes the spike events of the tiles in the same frustum of different layers to be stored in the same event queue.
For purposes of illustration, event queues 330-1, 330-2, 330-3, and 330-4 are shown implemented concurrently in internal memory 310. It should be appreciated that only the event queue of the particular frustum being processed needs to be implemented in internal memory 310. For example, if NN engine 300 is processing a tile of frustum 1, NN engine 300 need only implement event queue 330-1 in internal memory 310. As NN engine 300 processes other frustums, other event queues may be implemented for storing spike events.
As an illustrative example, event queue 330-1 may store the spike events generated from processing tile 108-1. The spike events are transient and, as such, may be overwritten within event queue 330-1 once consumed by neurons using weights 320-1, or once the spike events have been used to update the PSP values 322-1 of the neurons. PSP values 322-1 and 322-2, which form the feature maps, represent the persistent state of SNN 100. Processor 305 updates PSP values 322-1 at each time interval. Accordingly, the PSP values 322-1 representing the portions of the feature maps of a tile are not deleted and/or moved from internal memory 310 until a selected number of time intervals have been processed to generate the corresponding tile of the next layer. For example, PSP values 322-1 in internal memory 310 may represent the PSP values of T time intervals of tile 108-1 and the PSP values of the T time intervals of tile 110-1 generated from tile 108-1. Responsive to generating the T time intervals of tile 110-1, the PSP values of tile 108-1 may be stored in, or moved to, external memory 315 as PSP values 322-2. The PSP values may subsequently be recalled from external memory 315 and stored in internal memory 310 to process further time intervals of tile 108-1 and generate further time intervals of tile 110-1.
In one embodiment, the spike events for a given tile of a layer may be represented by entries in event queue 330. A queue entry may include an (x, y) coordinate within the tile followed by a mask specifying 1s and/or 0s. For example, the (x, y) coordinate represents the positional offset of a first portion of the tile. The mask following the coordinate may include multiple bits indicating which particular nodes of the tile have spikes and which do not. In one example, a value of one (1) in the mask may indicate a spike, while a value of zero (0) may indicate no spike. In this embodiment, an (x, y) coordinate need not be saved for each neuron, thereby providing compact and/or compressed storage. Depending upon the sparsity of the spikes, other event queue storage formats may be used to efficiently specify which neurons have spikes and the time intervals of the spikes.
As pictured, event queue 330-1 may include multiple entries organized as rows. Each row may represent the spikes for a region of a given tile. For example, (x1, y1) represents the position of a first region of the tile as an offset from a datum of the tile. The mask (e.g., MASK 1) may include multiple bits indicating which particular neurons of the tile have spikes and which do not. For example, if the region is a 5x5 block, MASK 1 may have 25 bits, with the states being "spike" and "no spike". The regions of the feature maps generated using tile 108-1 may be stored in event queue 330-1 as (x1, y1) followed by MASK 1, as (x2, y2) followed by MASK 2, and so forth.
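A sketch of encoding and decoding such queue entries for an assumed 5x5 region follows; apart from the (x, y)-plus-mask layout described above, the details are assumptions:

```python
# Sketch of the (x, y) offset plus bitmask queue-entry format for a
# 5x5 region (25 mask bits, 1 = spike, 0 = no spike).
REGION_W = REGION_H = 5

def encode_entry(x, y, region):
    """region: 5x5 nested list of 0/1 spike values -> (x, y, mask) entry."""
    mask = 0
    for row in range(REGION_H):
        for col in range(REGION_W):
            if region[row][col]:
                mask |= 1 << (row * REGION_W + col)
    return (x, y, mask)

def decode_entry(entry):
    """Yield the (x, y) coordinates of the spiking neurons in the entry."""
    x, y, mask = entry
    for bit in range(REGION_W * REGION_H):
        if mask & (1 << bit):
            yield (x + bit % REGION_W, y + bit // REGION_W)
```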
In another embodiment, processor 305 may be configured to further compress the spike event data to reduce storage and/or to simplify the processing of compressed entries. In one example, processor 305 may skip the processing of regions and/or subregions having no spikes. In another example, regions or subregions corresponding to all zeros (e.g., no spikes) as specified in a queue entry may be filtered out to reduce unnecessary computational cost. For example, NN engine 300 may include logic that prevents all-zero regions from being stored in the event queues 330 within internal memory 310.
It should be appreciated that in the case where a region or subregion has all zeros (e.g., no spikes), NN engine 300 may track the number of time intervals for which the given region has no spikes and update PSP values 322-1 to indicate the decay that occurs based upon the number of time intervals that have elapsed with no spikes in the region. In this manner, PSP values 322-1 need not be updated at each time interval for those regions having no spikes over one or more time intervals.
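Such a deferred update can be sketched as follows, assuming a simple exponential decay model; the actual decay function is not specified by the text above:

```python
# Sketch of lazy PSP decay for a spike-free region: rather than decaying
# the region every interval, apply the accumulated decay once when the
# region next receives a spike.
def catch_up_decay(psp_region, intervals_without_spikes, decay=0.9):
    factor = decay ** intervals_without_spikes  # one catch-up computation
    return [[p * factor for p in row] for row in psp_region]
```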
Fig. 4 is a flow chart illustrating an exemplary method 400 of executing an SNN. Method 400 may be performed by the NN engine described with reference to Fig. 3. In block 405, the NN engine may partition the neural network into a plurality of frustums. The frustums may be rectangular. The NN engine may partition the neural network into rectangular frustums projecting from the higher layers of the neural network to the lower layers.
In one embodiment, the NN engine may use a predetermined partitioning of the SNN. For example, a predetermined partitioning of the SNN may be stored and read and/or determined when the NN engine executes the SNN.
In another embodiment, the NN engine may partition the neural network responsive to execution of the neural network. For example, the NN engine may partition the SNN according to the size of the internal memory of the processor executing the neural network. The NN engine may adjust the size of the frustums according to the amount of internal memory of the processor available for implementing the event queues, storing the weights, storing the instructions, and storing the PSP values for processing the corresponding tiles of two adjacent layers of the SNN, as described.
In block 410, the NN engine may create and/or maintain an event queue for each frustum of the neural network. As spikes are generated for a tile of a layer, the spike events are posted to the event queue for the corresponding tile of the next layer. These become the input spikes to that layer in a future time interval. It should be appreciated that the spike events of two different layers and/or time intervals may be logically distinguished and/or separated within the same event queue. The NN engine may aggregate the spikes within one time interval according to the tiles of the layers of the neural network. The aggregation permitted by using the event queues allows the NN engine to represent a collection of spike events more compactly, since multiple spike/no-spike values may be stored in a mask having a single (x, y) offset, rather than storing an (x, y) offset for each bit (spike/no-spike value) of the mask.
In block 415, for each time interval in the set of T time intervals, the NN engine may process each input queue entry for all of the spike events to generate the appropriate target event queue for the corresponding tile of the next layer in the neural network. For example, the NN engine may read the next queue entry for the frustum being processed. The NN engine may compute which target neurons are affected according to the width and height of the receptive field and the bits set in the mask of the queue entry. As discussed, the bits of the mask in a queue entry describe which neurons of the previous layer have generated spikes.
In an illustrative example, the NN engine may process the spikes in groups of 4. It should be appreciated that the selection of 4 is for purposes of illustration only and is not intended as a limitation of the inventive arrangements described herein. For example, the NN engine may process groups having fewer or more than 4 spikes. In one example, the NN engine may process partial groups where the number of spikes is not an integer multiple of 4. Continuing with this example, for each 4-input accumulator, the NN engine may route, from the internal memory, the respective weights for the 4 input spikes acting upon the corresponding output neuron. In one arrangement, the NN engine may utilize a lookup table to produce the correct weight according to the relative (x, y) offset of the spike provided to each accumulator as input.
When a given accumulator corresponding to a given target neuron has accumulated the spike contributions from all of the input feature maps, the NN engine may compute the PSP value of the neuron. If the new PSP value exceeds a threshold, the NN engine generates a new spike. Accordingly, the NN engine may add a corresponding queue entry to the outgoing event queue. The updated PSP value is used in the computations for subsequent time intervals. As such, a PSP value must be stored for each neuron. The PSP values of all of the neurons represent the state of the neural network at any given time interval. The PSP values may constitute a large amount of data requiring memory for storage and generating read and write data traffic to save the neural network state and restore the neural network state after each time interval.
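The accumulate, threshold, and spike step can be sketched as follows; the reset-on-fire behavior is an assumed neuron model rather than a detail taken from the patent:

```python
# Sketch of the threshold test applied after an accumulator finishes:
# integrate the accumulated input, fire if the PSP crosses the threshold.
def fire_if_ready(psp, accumulated_input, threshold=1.0):
    psp += accumulated_input
    if psp > threshold:
        return 0.0, 1  # reset the PSP and emit a spike event (a 1)
    return psp, 0      # keep integrating; no spike this interval
```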
Because an SCNN is a feed-forward neural network, the NN engine may process a tile of a first layer over some number T of time intervals (where T is an integer value of 1 or more), building up multiple time intervals for the corresponding tile of the next layer of the neural network. In one arrangement, the spikes of multiple time intervals may be batched together for one layer of the neural network to produce events from that layer for the multiple time intervals. The state of the neurons (e.g., the PSP values) may be saved to external memory and recalled from external memory between batches of time intervals. Saving and recalling the neuron state can be a source of data traffic, particularly for larger neural networks. Further, by batching the time intervals into groups of T time intervals, the data traffic generated by the weights and the weight decompression time are reduced. This reduction occurs because the same weights are reused for the entire batch of T time intervals and are kept in internal memory, while new weights are read from external memory sometime before processing the tiles of the next layer.
Because the NN engine batches T time intervals together for each layer, the data traffic for saving and recalling the neuron state may be reduced by a factor of 1/T. With batching, the NN engine may keep the PSP values in internal memory and suppress a portion of the data traffic, e.g., some or all of the data traffic associated with storing and recalling the neuron state. In one embodiment, all of the time intervals required to generate the neural network result may be batched together for each layer before moving on to any subsequent or higher layer. In that case, the data traffic relating to storing the neuron state may be eliminated. Further, external memory may be excluded from the NN engine.
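A back-of-the-envelope sketch of the 1/T reduction, using made-up sizes for illustration:

```python
# State is saved and restored once per batch instead of once per interval.
def state_traffic_bytes(state_bytes, intervals, batch=1):
    round_trips = intervals // batch  # one save + restore per batch
    return 2 * state_bytes * round_trips

print(state_traffic_bytes(1_000_000, 64))           # T=1: 128000000 bytes
print(state_traffic_bytes(1_000_000, 64, batch=8))  # T=8: 16000000 bytes
```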
The NN engine may also distribute work among multiple processing cores, achieving good scaling with minimal or reduced cross traffic for the exchange of spikes between adjacent tiles. As discussed, the NN engine may maintain event queues for propagating spikes from one layer of the neural network to the next. The spikes require storage. The processing that generates the spikes may be managed so that each queue entry can be stored in the internal memory of the compute unit and/or processor of the NN engine, e.g., in on-chip memory. Maintaining the queue entries in internal memory may be achieved by processing each layer in tiles having known maximum storage requirements in terms of the number of events. Accordingly, the NN engine need not generate all of the spike events of an entire layer before processing of the next layer begins.
In block 420, the NN engine may batch process the spike event data obtained for the set of T time intervals for the corresponding tile of the next layer.
Although not illustrated in Fig. 4, it should be appreciated that the NN engine may repeat the operations described with reference to blocks 415 and 420 moving up the stack of layers within a frustum of the neural network. Further, for a given batch of time intervals, the NN engine may scan one or more successive tiles of a layer to generate a plurality "N" of spike event sets, where "N" is the number of cores in the NN engine. For example, if the NN engine generates "S" spike events per tile and includes "N" cores, the NN engine may generate N x S spike events for a given layer before moving to the next higher layer of the neural network.
Fig. 5 is a flow chart illustrating an exemplary method 500 of determining the frustums of a neural network. Method 500 may be performed to partition a neural network into frustums, thereby defining the sizes of the tiles in each respective layer of the neural network. In one example, method 500 may be implemented by a data processing system (system) such as a computer. The system may be configured to operate on a description of the neural network that is to be partitioned into frustums. For example, the system may be configured (e.g., programmed) to perform the operations described with reference to Fig. 5. In one embodiment, method 500 may be performed as an offline process in that method 500 may be performed prior to execution of the neural network. The determined partitioning may be stored as part of the neural network for later execution.
In block 505, the system may select a group of consecutive layers of the neural network to be processed together while keeping the intermediate data in the internal memory of the processor of the NN engine. As discussed, keeping the intermediate data in the internal memory of the processor of the NN engine reduces the off-chip data traffic generated in executing the neural network.
In block 510, the system may subtract, from the known internal memory size, the storage needed to store the compressed weights and the event queue for each layer. The amount of memory needed to store the compressed weights for each layer of the neural network is known prior to partitioning from the training process that has been performed. The sizes of the event queues may be set or determined using heuristics that allow the spike event data of the neural network to fit within the queues. The remaining memory space is available for storing the PSP values of each layer.
In block 515, the system may determine the width and height of the tiles based upon the number of feature maps in the layers of the group plus the corresponding storage requirement of the next layer of the group (layer N+1). The storage needed for layer N+1 is the product of the scaled width, the scaled height, and the number of feature maps of layer N+1. The width and height are scaled from layer N.
For the group of layers selected in block 505, the system may determine a width and height for the tiles of a given layer such that, after subtracting the compressed weight and event queue storage from the total available storage of the internal memory, the PSP values of the tiles (e.g., the corresponding tiles) of two adjacent layers fit within the remaining storage. Because the tile resolution is scaled at each layer, the sizing ensures that the scaling does not exceed the available storage.
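This sizing rule can be sketched as follows; the inter-layer scale factor, the bytes per PSP value, and the search strategy are illustrative assumptions:

```python
# Sketch of blocks 510-515: find the largest square tile whose
# two-adjacent-layer PSP footprint fits in the internal memory that
# remains after the compressed weights and event queues are subtracted.
def max_tile_side(internal_bytes, weight_bytes, queue_bytes,
                  maps_n, maps_n1, scale=0.5, bytes_per_psp=2):
    budget = internal_bytes - weight_bytes - queue_bytes
    side = 1
    while True:
        n_cost = side * side * maps_n * bytes_per_psp
        n1_cost = int(side * scale) ** 2 * maps_n1 * bytes_per_psp
        if n_cost + n1_cost > budget:
            return side - 1  # last size that still fit
        side += 1
```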
Fig. 5 is presented for purposes of illustration only and, as such, is not intended as a limitation of the inventive arrangements disclosed herein. Fig. 5 illustrates an exemplary process for partitioning a neural network into frustums based upon the size of the internal memory of the NN engine. In one arrangement, Fig. 5 may be performed for different groups of consecutive layers in an SNN to determine more than one partitioning of the SNN. In this manner, a first partitioning may be used to execute one portion of the SNN having a plurality of consecutive layers, while a second and different partitioning may be used to execute a second portion (or a third or further portion) of the SNN having consecutive layers.
Fig. 6 is an exemplary partitioning of a layer 602 of an SNN. In the example of Fig. 6, layer 602 is partitioned into 16 tiles, shown as tiles 604-634. In the example shown in Fig. 6, the NN engine may include multiple different compute units or cores. For example, the NN engine may distribute work among compute units A, B, C, and D to achieve good scaling with minimal, or at least reduced, cross traffic for the exchange of spikes between adjacent tiles.
In one example, the NN engine may first execute region 640 of layer 602. Region 640 includes tiles 604, 606, 608, and 610. The NN engine may generate all of the spike events of the tiles of region 640 for a batch of T time intervals as input to the next layer of the neural network. Compute unit A may execute tile 604. Compute unit B may execute tile 606. Compute unit C may execute tile 608. Compute unit D may execute tile 610.
Second, the NN engine may execute region 642 of layer 602. Region 642 includes tiles 612, 614, 616, and 618. The NN engine may generate all of the spike events of the tiles of region 642 for the batch of T time intervals as input to the next layer of the neural network. Compute unit A may execute tile 612. Compute unit B may execute tile 614. Compute unit C may execute tile 616. Compute unit D may execute tile 618.
The NN engine then proceeds to the next layer of the neural network and executes the corresponding tiles in the next layer. The NN engine may continue upward through each layer of the neural network until a selected or designated layer is reached. Responsive to reaching the selected layer, the NN engine may return to layer 602 to continue processing the remaining tiles.
For example, after returning to layer 602 as described, the NN engine may first execute region 644 of layer 602. Region 644 includes tiles 620, 622, 624, and 626. The NN engine may generate all of the spike events of the tiles of region 644 for a batch of T time intervals as input to the next layer of the neural network. Compute unit A may execute tile 620. Compute unit B may execute tile 622. Compute unit C may execute tile 624. Compute unit D may execute tile 626.
Second, the NN engine may execute region 646 of layer 602. Region 646 includes tiles 628, 630, 632, and 634. The NN engine may generate all of the spike events of the tiles of region 646 for the batch of T time intervals as input to the next layer of the neural network. Compute unit A may execute tile 628. Compute unit B may execute tile 630. Compute unit C may execute tile 632. Compute unit D may execute tile 634. The NN engine may continue upward through each layer of the neural network, as discussed, processing the corresponding tiles until the selected layer is reached.
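The schedule described above can be sketched as follows; the helper and its printed trace are hypothetical, and at higher layers the tile numbers stand in for the corresponding tiles of those layers:

```python
# Sketch of the Fig. 6 schedule: four compute units step through the
# regions of a layer, one tile per unit, before climbing to the next layer.
REGIONS = {640: [604, 606, 608, 610], 642: [612, 614, 616, 618],
           644: [620, 622, 624, 626], 646: [628, 630, 632, 634]}
UNITS = ["A", "B", "C", "D"]

def schedule(region_ids, layers):
    for layer in layers:  # climb layer by layer to the selected layer
        for region in region_ids:
            for unit, tile in zip(UNITS, REGIONS[region]):
                print(f"unit {unit}: layer {layer}, tile {tile}")

schedule([640, 642], [602, "next"])  # first pass: regions 640 and 642
schedule([644, 646], [602, "next"])  # second pass after returning to 602
```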
Fig. 7 is a flow chart illustrating another exemplary method 700 of executing an SNN. Method 700 may be performed by the NN engine described with reference to Fig. 3. Method 700 may begin in a state where the NN engine has partitioned the neural network into frustums and has established queues for the frustums, as described herein.
In block 705, the NN engine may select a layer of the neural network as the current layer. For example, the NN engine may select layer N as the current layer. In block 710, the NN engine may determine whether the current layer is designated as a stopping point for the execution method illustrated in Fig. 7. If so, in one aspect, method 700 may end. In another aspect, as described with reference to Fig. 6, the method may restart at the first layer of the neural network. For example, in block 710, the NN engine may determine that a particular layer of the neural network has been reached. In response, the NN engine may begin processing from the initial layer if tiles of the initial layer remain to be processed (e.g., if at least one frustum of the neural network remains to be processed), or may end. If the designated layer has not been reached, method 700 may continue to block 715.
In block 715, the NN engine may select a tile of the current layer as the current tile. In block 720, the NN engine may read a queue entry from the event queue associated with the current tile. The queue entry may be for a first time interval that is to be processed.
In block 725, the NN engine may use the weights and the PSP values to generate queue entries for the corresponding tile in the next layer of the neural network. For example, if the current tile being processed is tile 108-1 of layer 102 of Fig. 1, the corresponding tile of the next layer is tile 110-1 of layer 104. The queue entries generated for the next layer of the neural network may be stored in the same event queue. As discussed, the NN engine may designate the generated queue entries as belonging to the next layer rather than to the current layer. The NN engine may also group the queue entries according to time interval.
As discussed, when a given accumulator corresponding to a given target neuron has accumulated the spike contributions from all of the input feature maps, the NN engine may compute the PSP value of the neuron. If the new PSP value exceeds the threshold, the NN engine generates a new spike. Accordingly, the NN engine may add a corresponding queue entry to the outgoing event queue. The updated PSP value of the neuron is used in the computations for subsequent time intervals.
In some cases, the spikes within a neural network may be sparse and clustered. For a given region (e.g., queue entry) and time interval, there may be only a few spikes for different adjacent neurons. For each target neuron, many of the inputs to the neuron may be zero, although different weights would be added. According to one exemplary embodiment, rather than executing a loop with a set number of iterations to process all of the inputs to a given neuron, where a zero takes the same amount of time to process as a one, the NN engine may collect the inputs by processing only the non-zero inputs to the neuron. By effectively skipping the zero inputs to a neuron, the runtime performance and the power consumption of the NN engine may be improved.
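Skipping the zero inputs can be sketched as follows, iterating only over the set bits of a spike mask; the names are illustrative:

```python
# Sketch of zero-skipping: visit only the set bits of a spike mask rather
# than looping over every input position.
def nonzero_weight_sum(mask, weights):
    total = 0.0
    while mask:
        low = mask & -mask                      # isolate lowest set bit
        total += weights[low.bit_length() - 1]  # weight for that input
        mask ^= low                             # clear the bit and continue
    return total
```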
In block 730, the NN engine may determine whether another time interval remains to be processed for the current tile. If so, method 700 may loop back to block 720 to process the entry for the next time interval of the current tile. If not, method 700 may continue to block 735.
In block 735, the NN engine may determine whether to execute another tile of the current layer of the neural network. For example, depending upon the partitioning (e.g., the number of tiles in the layer, the number of compute units of the NN engine, and/or the size of the internal memory), the NN engine may execute another tile of the current layer. The NN engine may be configured to process the spike events of one or more tiles and generate the output spike events for T time intervals before sliding over to another tile of the current layer.
In one aspect, the NN engine may execute each tile of the current layer for one or more time intervals. In another aspect, the NN engine may execute only a subset of the tiles of the current layer, e.g., one or more, but fewer than all, of the tiles. It should be appreciated, however, that the method of Fig. 7 may be performed by a first compute unit of the NN engine while one or more other compute units of the NN engine concurrently perform the method of Fig. 7. The compute units may also operate in a synchronized manner so that the adjacent edges of the tiles processed by the concurrently operating compute units can be shared. Alternatively, the tiles may be defined in an overlapping manner to avoid data sharing between the compute units.
In any case, if the NN engine determines that another tile of the current layer is to be executed, method 700 may loop back to block 715 to select the next tile for execution. If the NN engine determines that another tile of the current layer is not to be processed, method 700 may loop back to block 705 to continue processing.
In one arrangement, method 700 may be performed to process a first frustum through a first plurality of consecutive layers of the neural network. Method 700 may be iterated to process each other frustum through the first plurality of consecutive layers. Method 700 may then be performed to process a first frustum of a second plurality of consecutive layers having a partitioning different from that of the first plurality of consecutive layers. Method 700 may be repeated to process the remaining frustums through the second plurality of consecutive layers having the different partitioning.
Fig. 8 is a block diagram illustrating an example of a data processing system (system) 800 for implementing the partitioned processing described herein with reference to Fig. 5. As pictured, system 800 includes at least one processor 805, e.g., a central processing unit (CPU), coupled to memory elements 810 through a system bus 815 or other suitable circuitry. System 800 stores computer-readable instructions (also referred to as "program code") within memory elements 810. Memory elements 810 may be considered an example of computer-readable storage media. Processor 805 executes the program code accessed from memory elements 810 via system bus 815.

Memory elements 810 can include one or more physical memory devices such as, for example, a local memory 820 and one or more bulk storage devices 825. Local memory 820 refers to random access memory (RAM) or other non-persistent memory device(s) generally used during actual execution of the program code. Bulk storage device 825 can be implemented as a hard disk drive (HDD), a solid-state drive (SSD), or another persistent data storage device. System 800 can also include one or more cache memories (not shown) that provide temporary storage of at least some program code in order to reduce the number of times program code must be retrieved from bulk storage device 825 during execution.

Input/output (I/O) devices such as a keyboard 830, a display device 835, a pointing device 840, and one or more network adapters 845 can be coupled to system 800. The I/O devices can be coupled to system 800 either directly or through intervening I/O controllers. In some cases, one or more of the I/O devices can be combined, as in the case where a touchscreen is used as display device 835. In that case, display device 835 can also implement keyboard 830 and pointing device 840. Network adapter 845 can be used to couple system 800 to other systems, computer systems, remote printers, and/or remote storage devices through intervening private or public networks. Modems, cable modems, Ethernet cards, wireless transceivers, and/or radios are examples of the different types of network adapter 845 that can be used with system 800. Depending upon the particular implementation of system 800, the specific type of network adapter, or network adapters as the case may be, will vary.
As pictured in Fig. 8, memory elements 810 can store an operating system 850 and one or more applications 855. Application 855, for example, when executed, can be a neural network utility that partitions a neural network. In one aspect, operating system 850 and application 855, being implemented in the form of executable program code, are executed by system 800 and, in particular, by processor 805. As such, operating system 850 and application 855 can be considered integrated parts of system 800. Operating system 850, application 855, and any data items used, generated, and/or operated upon by system 800 are functional data structures that impart functionality when employed by system 800.

In one aspect, system 800 can be a computer or other device that is suitable for storing and/or executing program code. System 800 can represent any of a variety of computer systems and/or devices that include a processor and memory and that are capable of performing the operations described within this disclosure. In some cases, the particular computer system and/or device can include fewer components or more components than described. System 800 can be implemented as a single system as shown, or as a plurality of networked or interconnected systems, each having an architecture the same as, or similar to, that of system 800.

In one example, system 800 can receive a neural network as an input. While executing operating system 850 and application 855, system 800 can partition the neural network and store the partitioned neural network in memory or another computer-readable storage medium for later execution.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. Notwithstanding, several definitions that apply throughout this document are now presented.

As defined herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

As defined herein, the term "another" means at least a second or more.

As defined herein, the terms "at least one", "one or more", and "and/or" are open-ended expressions that are both conjunctive and disjunctive in operation unless explicitly stated otherwise. For example, each of the expressions "at least one of A, B and C", "at least one of A, B, or C", "one or more of A, B, and C", "one or more of A, B, or C", and "A, B, and/or C" means A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B and C together.

As defined herein, the term "automatically" means without user intervention.

As used herein, the term "cloud computing" may refer to a computing model facilitating convenient, on-demand network access to a shared pool of configurable computing resources such as networks, servers, storage, applications, and/or services. These computing resources may be rapidly provisioned and released with minimal management effort or service provider interaction. Cloud computing promotes availability and may be characterized by on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. Cloud computing generally supports service models such as Software as a Service (SaaS), Platform as a Service (PaaS), and/or Infrastructure as a Service (IaaS). Cloud computing may also support deployment models such as private cloud, community cloud, public cloud, and/or hybrid cloud. Further information about cloud computing may be obtained from the U.S. National Institute of Standards and Technology (NIST) and, more particularly, the Information Technology Laboratory of NIST.
As defined herein, the term "computer-readable storage medium" means a storage medium that contains or stores program code for use by or in connection with an instruction execution system, apparatus, or device. As defined herein, a "computer-readable storage medium" is not a transitory, propagating signal per se. A computer-readable storage medium may be, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. Memory elements, as described herein, are examples of computer-readable storage media. A non-exhaustive list of more specific examples of a computer-readable storage medium may include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, and the like.

As defined herein, the term "coupled" means connected, whether directly without any intervening elements or indirectly with one or more intervening elements, unless otherwise indicated. Two elements may be coupled mechanically, coupled electrically, or communicatively linked through a communication channel, pathway, network, or system.

As defined herein, the term "executable operation" or "operation" is a task performed by a data processing system or a processor within a data processing system unless the context indicates otherwise. Examples of executable operations include, but are not limited to, "processing", "computing", "calculating", "determining", "displaying", "comparing", and the like. In this regard, operations refer to actions and/or processes of the data processing system, e.g., a computer system or similar electronic computing device, that manipulate and transform data represented as physical (electronic) quantities within the computer system's registers and/or memory into other data similarly represented as physical quantities within the computer system's memory and/or registers or other such information storage, transmission, or display devices.

As defined herein, the terms "includes" and "including" specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

As defined herein, the term "if" means "when" or "upon" or "in response to" or "responsive to", depending upon the context. Thus, the phrase "if it is determined" or "if [a stated condition or event] is detected" may be construed to mean "upon determining" or "in response to determining" or "upon detecting [the stated condition or event]" or "in response to detecting [the stated condition or event]", depending upon the context.
As defined herein, the terms "one embodiment", "an embodiment", or similar language mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment described within this disclosure. Thus, appearances of the phrases "in one embodiment", "in an embodiment", and similar language throughout this disclosure may, but do not necessarily, all refer to the same embodiment.

As defined herein, the term "output" means storing in physical memory elements, e.g., devices, writing to a display or other peripheral output device, sending or transmitting to another system, exporting, or the like.

As defined herein, the term "plurality" means two or more than two.

As defined herein, the term "responsive to" means responding or reacting readily to an action or event. Thus, if a second action is performed "responsive to" a first action, there is a causal relationship between an occurrence of the first action and an occurrence of the second action. The term "responsive to" indicates the causal relationship.

As defined herein, the term "user" means a human being.

The terms "first", "second", etc. may be used herein to describe various elements. These elements should not be limited by these terms, as these terms are only used to distinguish one element from another unless stated otherwise or unless the context clearly indicates otherwise.
A computer program product may include a computer-readable storage medium (or media) having computer-readable program instructions thereon for causing a processor to carry out aspects of the present invention. Within this disclosure, the term "program code" is used interchangeably with the term "computer-readable program instructions". Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network, e.g., the Internet, a LAN, a WAN, and/or a wireless network. The network may include copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge devices including edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.

Computer-readable program instructions for carrying out operations of the inventive arrangements described herein may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language and/or a procedural programming language. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a LAN or a WAN, or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some cases, electronic circuitry including, for example, programmable logic circuitry, an FPGA, or a PLA may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, in order to perform aspects of the inventive arrangements described herein.
Certain aspects of the inventive arrangements are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions, e.g., program code.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable storage medium having instructions stored therein comprises an article of manufacture including instructions that implement aspects of the operations specified in the flowchart and/or block diagram block or blocks.

The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operations to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions that execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and possible implementations of operations of systems, methods, and computer program products according to various aspects of the inventive arrangements. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of operations, which comprises one or more executable instructions for implementing the specified operations. In some alternative implementations, the operations noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or that carry out combinations of special-purpose hardware and computer instructions.
For purposes of simplicity and clarity of illustration, elements shown in the figures are not necessarily drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numbers are repeated among the figures to indicate corresponding, analogous, or like features.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements that may be found in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed.
One embodiment can include a method of implementing a spiking neural network having a plurality of layers partitioned into a plurality of frustums using a first partitioning, wherein each frustum includes a tile of each partitioned layer of the spiking neural network. The method can include reading a first tile of a first layer of the spiking neural network and, using a processor, generating a first tile of a second layer of the spiking neural network using the first tile of the first layer while storing intermediate data within an internal memory of the processor. The first tile of the first layer and the first tile of the second layer belong to a same frustum.

In one aspect, the first tile of the second layer is generated for a plurality of time intervals before another tile is generated.

In another aspect, a size of the frustums is determined according to a size of the internal memory.
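As a numeric illustration of tying frustum size to internal memory size, consider the sketch below. The byte counts, the square tile shape, and the sizing rule are assumptions for the sketch, not values given by this disclosure.

    #include <cstddef>
    #include <cstdio>

    // Hypothetical sizing rule: grow a square tile until the per-frustum
    // intermediate data (post-synaptic potential values for every layer of
    // the frustum plus an event queue) would no longer fit in the compute
    // unit's internal memory.
    int choose_tile_side(std::size_t internal_mem_bytes, std::size_t bytes_per_psp,
                         int layers, std::size_t event_queue_bytes) {
        int side = 1;
        for (;;) {
            std::size_t need = static_cast<std::size_t>(side + 1) * (side + 1)
                             * bytes_per_psp * layers + event_queue_bytes;
            if (need > internal_mem_bytes) return side;
            ++side;
        }
    }

    int main() {
        // E.g., 256 KB of internal memory, 4-byte PSP values, a 4-layer
        // frustum, and 16 KB reserved for the event queue (all assumed).
        std::printf("tile side = %d\n",
                    choose_tile_side(256 * 1024, 4, 4, 16 * 1024));
        return 0;
    }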
In yet another aspect, the intermediate data include post-synaptic potential values and spike events. Further, the spike events can be stored within an event queue of the internal memory as a plurality of masks, with each mask including an x-coordinate and a y-coordinate.
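One plausible encoding of such masks is sketched below. The field widths, the run-of-neurons bit vector, and the fixed queue capacity are assumptions; the disclosure states only that each mask carries an x-coordinate and a y-coordinate.

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // Hypothetical spike-event mask: one entry covers a short run of neurons
    // starting at (x, y); each set bit marks one spiking neuron.
    struct SpikeMask {
        std::uint16_t x;     // x-coordinate of the first neuron covered
        std::uint16_t y;     // y-coordinate (row) of the covered neurons
        std::uint32_t bits;  // bit i set => neuron (x + i, y) spiked
    };

    constexpr std::size_t MAX_EVENTS = 1024;  // assumed queue capacity

    // The event queue in internal memory is then a simple bounded buffer.
    struct EventQueue {
        SpikeMask   entries[MAX_EVENTS];
        std::size_t count = 0;
        bool push(SpikeMask m) {
            if (count == MAX_EVENTS) return false;  // queue full
            entries[count++] = m;
            return true;
        }
    };

    int main() {
        EventQueue q;
        q.push({10, 3, 0x5u});  // neurons (10, 3) and (12, 3) spiked
        std::printf("queued %zu mask(s)\n", q.count);
        return 0;
    }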
The generating of the first tile of the second layer can be performed using a first compute unit of the processor. The method can include generating, using a second compute unit of the processor concurrently with the first compute unit, a second tile of the second layer using a second tile of the first layer of the spiking neural network.
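A software analogue of the two compute units is sketched below, with threads standing in for hardware compute units; this substitution is an assumption of the sketch, and the per-tile computation is elided.

    #include <cstdio>
    #include <thread>

    // Illustrative only: two "compute units" concurrently generate different
    // tiles of the second layer from different tiles of the first layer.
    void produce_second_layer_tile(int tile) {
        std::printf("compute unit: layer-1 tile %d -> layer-2 tile %d\n", tile, tile);
    }

    int main() {
        std::thread unit0(produce_second_layer_tile, 0);  // first tile
        std::thread unit1(produce_second_layer_tile, 1);  // second tile, concurrent
        unit0.join();
        unit1.join();
        return 0;
    }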
The method can include generating at least one tile of another layer of the spiking neural network using a second partitioning different from the first partitioning.
Another embodiment can include an apparatus for implementing a spiking neural network having a plurality of layers partitioned into a plurality of frustums using a first partitioning, wherein each frustum includes a tile of each partitioned layer of the spiking neural network. The apparatus includes an internal memory configured to store intermediate data and a first compute unit coupled to the internal memory and configured to initiate executable operations. The executable operations include reading a first tile of a first layer of the spiking neural network and generating a first tile of a second layer of the spiking neural network using the first tile of the first layer while storing the intermediate data within the internal memory. The first tile of the first layer and the first tile of the second layer belong to a same frustum.

In one aspect, the first tile of the second layer is generated for a plurality of time intervals before another tile is generated.

In another aspect, a size of the frustums is determined according to a size of the internal memory.

In yet another aspect, the intermediate data include post-synaptic potential values and spike events. Further, the spike events can be stored within an event queue of the internal memory as a plurality of masks, with each mask including an x-coordinate and a y-coordinate.

The apparatus can include a second compute unit configured to initiate executable operations including generating, concurrently with the first compute unit, a second tile of the second layer using a second tile of the first layer of the spiking neural network.

The first compute unit can be configured to initiate executable operations further including generating at least one tile of another layer of the spiking neural network using a second partitioning different from the first partitioning.
Another embodiment can include a computer program product including a computer-readable storage medium having program code stored thereon for implementing a spiking neural network having a plurality of layers partitioned into a plurality of frustums using a first partitioning, wherein each frustum includes a tile of each partitioned layer of the spiking neural network. The program code is executable by a processor to perform operations including reading a first tile of a first layer of the spiking neural network and generating a first tile of a second layer of the spiking neural network using the first tile of the first layer while storing intermediate data within an internal memory of the processor. The first tile of the first layer and the first tile of the second layer belong to a same frustum.

In one aspect, the first tile of the second layer is generated for a plurality of time intervals before another tile is generated.

In another aspect, a size of the frustums is determined according to a size of the internal memory.

In yet another aspect, the intermediate data include post-synaptic potential values and spike events. Further, the spike events can be stored as a plurality of masks within an event queue of the internal memory, with each mask including an x-coordinate and a y-coordinate.

The program code is executable by the processor to perform operations further including generating at least one tile of another layer of the spiking neural network using a second partitioning different from the first partitioning.
The description of the inventive arrangements provided herein is for purposes of illustration and is not intended to be exhaustive or limited to the form and examples disclosed. The terminology used herein was chosen to explain the principles of the inventive arrangements, the practical application or technical improvement over technologies found in the marketplace, and/or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. Modifications and variations may be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described inventive arrangements. Accordingly, reference should be made to the following claims, rather than to the foregoing disclosure, as indicating the scope of such features and embodiments.

Claims (14)

1. A method of implementing a spiking neural network, wherein the spiking neural network has a plurality of layers partitioned into a plurality of frustums using a first partitioning, each frustum including a tile of each partitioned layer of the spiking neural network, the method comprising:

reading a first tile of a first layer of the spiking neural network; and

generating, using a processor, a first tile of a second layer of the spiking neural network using the first tile of the first layer of the spiking neural network while storing intermediate data within an internal memory of the processor, wherein the first tile of the first layer of the spiking neural network and the first tile of the second layer of the spiking neural network belong to a same frustum.

2. The method of claim 1, wherein the first tile of the second layer of the spiking neural network is generated for a plurality of time intervals before another tile is generated.

3. The method of claim 1, wherein a size of the frustums is determined according to a size of the internal memory.

4. The method of claim 1, wherein the intermediate data comprise post-synaptic potential values and spike events.

5. The method of claim 4, wherein the spike events are stored within an event queue of the internal memory as a plurality of masks, wherein each mask includes an x-coordinate and a y-coordinate.

6. The method of claim 1, wherein the generating of the first tile of the second layer of the spiking neural network is performed using a first compute unit of the processor, the method further comprising:

generating, using a second compute unit of the processor concurrently with the first compute unit, a second tile of the second layer of the spiking neural network using a second tile of the first layer of the spiking neural network.

7. The method of claim 1, further comprising: generating at least one tile of another layer of the spiking neural network using a second partitioning different from the first partitioning.

8. An apparatus for implementing a spiking neural network, wherein the spiking neural network has a plurality of layers partitioned into a plurality of frustums using a first partitioning, each frustum including a tile of each partitioned layer of the spiking neural network, the apparatus comprising:

an internal memory configured to store intermediate data; and

a first compute unit coupled to the internal memory and configured to initiate executable operations including:

reading a first tile of a first layer of the spiking neural network; and

generating a first tile of a second layer of the spiking neural network using the first tile of the first layer of the spiking neural network while storing the intermediate data within the internal memory, wherein the first tile of the first layer of the spiking neural network and the first tile of the second layer of the spiking neural network belong to a same frustum.

9. The apparatus of claim 8, wherein the first tile of the second layer of the spiking neural network is generated for a plurality of time intervals before another tile is generated.

10. The apparatus of claim 8, wherein a size of the frustums is determined according to a size of the internal memory.

11. The apparatus of claim 8, wherein the intermediate data comprise post-synaptic potential values and spike events.

12. The apparatus of claim 11, wherein the spike events are stored within an event queue of the internal memory as a plurality of masks, wherein each mask includes an x-coordinate and a y-coordinate.

13. The apparatus of claim 8, further comprising:

a second compute unit configured to initiate executable operations including:

generating, concurrently with the first compute unit, a second tile of the second layer of the spiking neural network using a second tile of the first layer of the spiking neural network.

14. The apparatus of claim 8, wherein the first compute unit is configured to initiate executable operations further including:

generating at least one tile of another layer of the spiking neural network using a second partitioning different from the first partitioning.
CN201610373734.3A 2015-06-10 2016-05-31 Spiking neural network with reduced memory access and bandwidth consumption within the network Active CN106250981B (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201562173742P 2015-06-10 2015-06-10
US62/173,742 2015-06-10
US15/062,365 US10387770B2 (en) 2015-06-10 2016-03-07 Spiking neural network with reduced memory access and reduced in-network bandwidth consumption
US15/062,365 2016-03-07
KR10-2016-0048957 2016-04-21
KR1020160048957A KR20160145482A (en) 2015-06-10 2016-04-21 Method and apparatus of implementing spiking neural network

Publications (2)

Publication Number Publication Date
CN106250981A true CN106250981A (en) 2016-12-21
CN106250981B CN106250981B (en) 2022-04-01

Family

ID=57612930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610373734.3A Active CN106250981B (en) 2015-06-10 2016-05-31 Spiking neural network with reduced memory access and bandwidth consumption within the network

Country Status (1)

Country Link
CN (1) CN106250981B (en)



Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101546430A (en) * 2009-04-30 2009-09-30 上海大学 Edge extracting method based on simplified pulse coupled neural network
US20130073501A1 (en) * 2011-09-21 2013-03-21 Qualcomm Incorporated Method and apparatus for structural delay plasticity in spiking neural networks
CN104145281A (en) * 2012-02-03 2014-11-12 安秉益 Neural network computing apparatus and system, and method therefor
CN102682297A (en) * 2012-05-07 2012-09-19 中北大学 Pulse coupled neural network (PCNN) face image segmentation method simulating the receptive-field properties of visual cells
US20140074761A1 (en) * 2012-05-30 2014-03-13 Qualcomm Incorporated Dynamical event neuron and synapse models for learning spiking neural networks
CN104335232A (en) * 2012-05-30 2015-02-04 高通股份有限公司 Continuous time spiking neural network event-based simulation
CN104685516A (en) * 2012-08-17 2015-06-03 高通技术公司 Apparatus and methods for spiking neuron network learning
CN103279958A (en) * 2013-05-31 2013-09-04 电子科技大学 Image segmentation method based on Spiking neural network
US20150100529A1 (en) * 2013-10-08 2015-04-09 Qualcomm Incorporated Compiling network descriptions to multiple platforms
CN103824291A (en) * 2014-02-24 2014-05-28 哈尔滨工程大学 Automatic image segmentation method using pulse-coupled neural network parameters evolved by a continuous quantum goose-swarm algorithm
CN104599262A (en) * 2014-12-18 2015-05-06 浙江工业大学 Multichannel pulse coupling neural network based color image segmentation technology

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
AMMAR MOHEMMED et al.: "Training Spiking Neural Networks to Associate Spatio-temporal Input-Output Spike Patterns", Neurocomputing *
YONGQIANG CAO et al.: "Spiking Deep Convolutional Neural Networks for Energy-Efficient Object Recognition", International Journal of Computer Vision *
CUI Wenbo et al.: "Coding Method for Image Segmentation with Pulse-Coupled Neural Networks", Computer Engineering *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110494869B (en) * 2017-04-04 2023-08-04 海露科技有限公司 Neural network processor including separate control and data structures
CN110494869A (en) * 2017-04-04 2019-11-22 海露科技有限公司 Neural network processor including separate control and data structures
CN108694441A (en) * 2017-04-07 2018-10-23 上海寒武纪信息科技有限公司 Network processor and network operation method
CN110520856B (en) * 2017-04-17 2023-03-31 微软技术许可有限责任公司 Processing non-contiguous memory as contiguous memory to improve performance of neural networks
CN110520909A (en) * 2017-04-17 2019-11-29 微软技术许可有限责任公司 Neural network processor using compression and decompression of activation data to reduce memory bandwidth utilization
CN110520856A (en) * 2017-04-17 2019-11-29 微软技术许可有限责任公司 Processing non-contiguous memory as contiguous memory to improve performance of a neural network
US11528033B2 (en) 2017-04-17 2022-12-13 Microsoft Technology Licensing, Llc Neural network processor using compression and decompression of activation data to reduce memory bandwidth utilization
CN111052149A (en) * 2017-08-08 2020-04-21 三星电子株式会社 Method and apparatus for determining memory requirements in a network
CN111052149B (en) * 2017-08-08 2024-04-19 三星电子株式会社 Method and apparatus for determining memory requirements in a network
WO2019095333A1 (en) * 2017-11-17 2019-05-23 华为技术有限公司 Data processing method and device
CN110321064A (en) * 2018-03-30 2019-10-11 北京深鉴智能科技有限公司 Computing platform implementation method and system for neural network
WO2020026158A3 (en) * 2018-08-01 2021-10-07 南京天数智芯科技有限公司 Convolutional neural network accelerator-based data reuse method
CN110955380A (en) * 2018-09-21 2020-04-03 中科寒武纪科技股份有限公司 Access data generation method, storage medium, computer device and apparatus
CN111626430A (en) * 2019-04-18 2020-09-04 中科寒武纪科技股份有限公司 Data processing method and related product
US11762690B2 (en) 2019-04-18 2023-09-19 Cambricon Technologies Corporation Limited Data processing method and related products
CN111626430B (en) * 2019-04-18 2023-09-26 中科寒武纪科技股份有限公司 Data processing method and related product
CN110555523B (en) * 2019-07-23 2022-03-29 中建三局智能技术有限公司 Short-range tracking method and system based on spiking neural network
CN110555523A (en) * 2019-07-23 2019-12-10 中建三局智能技术有限公司 Short-range tracking method and system based on spiking neural network
CN112446478A (en) * 2019-09-05 2021-03-05 美光科技公司 Intelligent optimization of cache operations in a data storage device

Also Published As

Publication number Publication date
CN106250981B (en) 2022-04-01


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant