CN116306855B - Data processing method and device based on memory and calculation integrated system - Google Patents

Data processing method and device based on memory and calculation integrated system

Info

Publication number
CN116306855B
CN116306855B (granted publication of application CN202310555078.9A; earlier publication CN116306855A)
Authority
CN
China
Prior art keywords
target, unit, stage, target unit, feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310555078.9A
Other languages
Chinese (zh)
Other versions
CN116306855A (en)
Inventor
吕波
程稳
刘懿
陈�光
曾令仿
李勇
徐鸿博
吴运翔
王鹏程
张金鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202310555078.9A priority Critical patent/CN116306855B/en
Publication of CN116306855A publication Critical patent/CN116306855A/en
Application granted granted Critical
Publication of CN116306855B publication Critical patent/CN116306855B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00: Computer-aided design [CAD]
    • G06F 30/20: Design optimisation, verification or simulation
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

The specification discloses a data processing method and device based on a compute-in-memory system. According to a target model, the method determines the target number of target units and the control vector corresponding to each target unit, and selects that number of target units from the data processing units of the compute-in-memory system. It then determines, from candidate operations of various types and according to each target unit's control vector, the target operation corresponding to each target unit, inputs each target unit's input data into that unit, and processes the input with the target operations to obtain the output data of the target model. Because the target operation executed by a target unit is determined by its control vector, models of different architectures can be supported simply by changing the control vectors, without redesigning the circuit structure, which broadens the scenarios for model inference on the compute-in-memory circuit and improves efficiency.

Description

Data processing method and device based on memory and calculation integrated system
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular to a data processing method and apparatus based on a compute-in-memory system.
Background
The Computing-in-Memory (CIM) architecture integrates storage and computation, avoiding frequent data movement between processing units and memory units, and has attracted much attention as a new architecture. Because they process digital information using analog operations, existing compute-in-memory systems are typically composed of analog components, which may include a crossbar array (Crossbar), and digital components, which may include controllers, buffers, analog-to-digital converters (ADC), digital-to-analog converters (DAC), and so on. When a neural network is deployed on a compute-in-memory system, the data input or output between each layer or each operation core inevitably requires corresponding digital-to-analog and analog-to-digital conversion, so the power consumption and area cost of the compute-in-memory platform are high.
At present, fully analog, parallelized processing of the neural network inference process can be realized based on a purely analog operation framework built on memristor crossbar arrays.
However, such a fully parallel in-memory computing circuit modeling scheme must be regularized and structured, which makes heterogeneous network architecture design difficult. More importantly, the parallel analog circuit is custom designed for a specific neural network architecture, and different neural network architectures are not identical, so a designed parallel analog in-memory computing circuit cannot be compatible with different neural networks, which limits the application scenarios of fully parallel, analog compute-in-memory neural circuits.
Based on the above, the present specification provides a data processing method based on a memory integrated system.
Disclosure of Invention
The present disclosure provides a data processing method and apparatus based on a memory integrated system, so as to partially solve the above-mentioned problems in the prior art.
The technical scheme adopted in the specification is as follows:
the present specification provides a data processing method based on a compute-in-memory system, where the compute-in-memory system includes a plurality of data processing units, and the data processing units correspond to a plurality of types of candidate operations; the method includes:
according to the obtained target model, determining the target number of target units contained in the target model and control vectors corresponding to the target units;
selecting the target number of target units from the data processing units of the integrated memory system;
for each target unit, determining a target operation corresponding to the target unit from candidate operations of various types according to a control vector corresponding to the target unit;
when input data of the target model are received, the input data of each target unit are determined according to the input data of the target model, the input data of each target unit are respectively input into each target unit, and the output data of the target model are obtained through target operations respectively corresponding to each target unit.
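The control-vector mechanism described in the steps above can be illustrated with a minimal, hypothetical sketch. It assumes one-hot control-vector rows and the candidate-operation ordering used in the example later in this description (hole/dilated convolution, separable convolution, average pooling, max pooling); all function and operation names are illustrative, not taken from the patent's circuits.

```python
# Hypothetical sketch: decoding a target unit's control vector into the
# target operations it should execute. Each row of the control vector
# corresponds to one input-to-output connection of the unit; the position
# of the "1" selects the candidate operation for that connection.

CANDIDATE_OPS = ["dilated_conv", "separable_conv", "avg_pool", "max_pool"]

def select_target_ops(control_vector):
    """Return, per connection, the name of the selected candidate
    operation, or None when the row is all zeros (connection disabled)."""
    ops = []
    for row in control_vector:
        if 1 in row:
            ops.append(CANDIDATE_OPS[row.index(1)])
        else:
            ops.append(None)
    return ops

# A unit whose single connection applies a separable convolution:
print(select_target_ops([[0, 1, 0, 0]]))  # ['separable_conv']
```

Changing the architecture a unit realizes then amounts to supplying a different control vector, with no change to the decoding logic, which is the reconfigurability the description claims.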
Optionally, before determining the target operation corresponding to the target unit from the candidate operations of each type according to the control vector corresponding to the target unit, the method further includes:
for each target unit, acquiring the number of intermediate feature maps contained in the target unit and the stage number corresponding to each intermediate feature map, where the stage number indicates the ordering of that intermediate feature map among the intermediate feature maps of the target unit;
for the intermediate feature map of each stage in the target unit, determining a preamble node corresponding to the intermediate feature map of the stage according to the stage number of the intermediate feature map of the stage, wherein the preamble node is used for obtaining the intermediate feature map of the stage through candidate operation in the target unit, and the preamble node at least comprises input data of the target unit;
according to the control vector corresponding to the target unit, determining the target operation corresponding to the target unit from the candidate operations of each type, wherein the method specifically comprises the following steps:
searching vector values corresponding to all the preamble nodes of the intermediate feature diagram of the stage in the control vector corresponding to the target unit;
and selecting target operations respectively executed on the precursor nodes corresponding to the intermediate feature graphs of the stage from various types of candidate operations according to vector values corresponding to the precursor nodes of the intermediate feature graphs of the stage, and taking the target operations as target operations corresponding to the target units.
Optionally, determining the preamble node corresponding to the intermediate feature map of the stage according to the stage number of the intermediate feature map of the stage specifically includes:
determining a first appointed sequence of the intermediate feature graphs of each stage in the target unit according to the stage numbers of the intermediate feature graphs of each stage in the target unit;
when the intermediate feature map of the stage is determined to be positioned at the first position of the first appointed sequence according to the stage number of the intermediate feature map of the stage, determining that the preamble node corresponding to the intermediate feature map of the stage comprises the input data of the target unit to which the intermediate feature map of the stage belongs;
when the intermediate feature map of the stage is determined to be positioned at the non-first position of the first designated sequence according to the stage number of the intermediate feature map of the stage, determining that the preamble node corresponding to the intermediate feature map of the stage comprises input data of a target unit to which the intermediate feature map of the stage belongs, and arranging the stage numbers of the intermediate feature maps of other stages before the intermediate feature map of the stage in the first designated sequence.
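The predecessor rule above can be sketched as follows; this is a hypothetical illustration in which node ids are assigned by position (unit inputs first, then stages in the first designated sequence), an indexing convention the patent does not fix.

```python
# Hypothetical sketch of the preamble (predecessor) node rule: the unit's
# input data are always predecessors, and a non-first stage additionally
# takes every earlier stage's intermediate feature map as a predecessor.

def predecessors(stage_index, num_inputs=1):
    """Return predecessor node ids for the stage at 0-based position
    `stage_index` in the first designated sequence. Nodes 0..num_inputs-1
    are the unit's input data; node num_inputs + k is the intermediate
    feature map of stage k."""
    preds = list(range(num_inputs))                        # unit inputs
    preds += [num_inputs + k for k in range(stage_index)]  # earlier stages
    return preds

print(predecessors(0))  # first stage: only the unit input -> [0]
print(predecessors(2))  # third stage: input plus stages 0 and 1 -> [0, 1, 2]
```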
Optionally, input data of each target unit is respectively input to each target unit, and output data of the target model is obtained through target operations respectively corresponding to each target unit, which specifically includes:
Inputting input data of each target unit into the target unit, and respectively executing target operation on each precursor node corresponding to the intermediate feature map of each stage contained in the target unit to obtain the intermediate feature map of each stage of the target unit;
stacking the intermediate feature graphs of each stage of the target unit to obtain an output feature graph of the target unit;
and determining output data of the target model according to the output characteristic diagrams of the target units.
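The stacking step above can be shown with a small sketch. It assumes "stacking" means concatenation along the channel dimension, which the surrounding text suggests but does not state explicitly; feature maps are represented as plain nested lists.

```python
# Hypothetical sketch: stacking the per-stage intermediate feature maps
# to form the target unit's output feature map. Each feature map is a
# list of channels; each channel is a 2D (H x W) list.

def stack_feature_maps(stage_maps):
    """Concatenate the stages' channels into one output feature map."""
    out = []
    for fmap in stage_maps:
        out.extend(fmap)
    return out

m1 = [[[1, 2], [3, 4]]]  # stage 1: one 2x2 channel
m2 = [[[5, 6], [7, 8]]]  # stage 2: one 2x2 channel
out = stack_feature_maps([m1, m2])
print(len(out))  # 2 channels, spatial size unchanged
```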
Optionally, the data processing unit further includes an adaptive average pooling unit and a classification unit, where the adaptive average pooling unit is configured to reduce a size of the feature map to a specified size, and the classification unit is configured to obtain a classification result based on the feature map input to the classification unit;
determining output data of the target model according to the output feature diagrams of the target units, wherein the method specifically comprises the following steps:
selecting a designated unit from the target units, and inputting the output feature map of the designated unit into the adaptive average pooling unit to obtain an output feature map of the specified size, where the output feature map of the specified size is smaller than the output feature map of the designated unit;
and inputting the output feature map of the specified size into the classification unit to obtain the classification result output by the classification unit as the output data of the target model.
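The adaptive average pooling step can be sketched as below. This assumes the common "evenly split windows" behaviour of adaptive average pooling; the exact split used by the patent's circuit is not specified, so treat this as an illustration of the size-reduction contract only.

```python
# Hypothetical sketch: reduce one H x W channel to out_h x out_w by
# averaging evenly split windows (adaptive average pooling).

def adaptive_avg_pool(channel, out_h, out_w):
    h, w = len(channel), len(channel[0])
    pooled = []
    for i in range(out_h):
        r0, r1 = i * h // out_h, (i + 1) * h // out_h
        row = []
        for j in range(out_w):
            c0, c1 = j * w // out_w, (j + 1) * w // out_w
            vals = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(vals) / len(vals))
        pooled.append(row)
    return pooled

feat = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(adaptive_avg_pool(feat, 1, 1))  # global average -> [[8.5]]
```

The pooled, fixed-size map is what the classification unit then consumes, which is why the pooling unit is needed regardless of the spatial size the designated unit produces.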
Optionally, determining the input data of each target unit according to the input data of the target model specifically includes:
according to the target model, arranging the target units in a second designated sequence;
taking the input data of the target model as the input data of the target unit located at the first position in the second designated sequence;
and, following the second designated sequence, taking the output feature map of the unit immediately preceding the target unit and the output feature map of the unit two positions before the target unit as the input data of the target unit.
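The inter-unit wiring above can be sketched as follows. The fallback for the second unit (which has only one predecessor) is an assumption not fixed by the text; here it reuses the model input.

```python
# Hypothetical sketch: which feature maps feed unit k under the second
# designated sequence. `outputs` maps unit index -> output feature map.

def unit_inputs(k, model_input, outputs):
    """First unit takes the model input; later units take the outputs of
    units k-1 and k-2 (falling back to the model input when k-2 does not
    exist -- an illustrative assumption)."""
    if k == 0:
        return (model_input,)
    prev1 = outputs[k - 1]
    prev2 = outputs[k - 2] if k >= 2 else model_input
    return (prev2, prev1)

outs = {0: "F0", 1: "F1", 2: "F2"}
print(unit_inputs(0, "X", outs))  # ('X',)
print(unit_inputs(3, "X", outs))  # ('F1', 'F2')
```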
Optionally, the target unit includes a reduction unit for reducing a size of an input feature map of the reduction unit and increasing a number of channels of the input feature map of the reduction unit;
before inputting the input of each target unit to each target unit, the method further comprises:
when the unit immediately preceding the target unit is a reduction unit, preprocessing the output feature map, contained in the input of the target unit, of the unit two positions before the target unit to obtain a preprocessed input feature map; the size of the preprocessed input feature map is smaller than that of the output feature map of the unit two positions before the target unit, and its number of channels is larger than that of the output feature map of the unit two positions before the target unit;
Inputting the input of each target unit to each target unit respectively, specifically including:
and inputting the preprocessed input characteristic diagram into the target unit.
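The shape contract of this preprocessing can be sketched as below. The concrete circuit (for example, a strided convolution) is not fixed by the text above; this illustration simply halves the spatial size and doubles the channel count so the preprocessed map matches what a reduction unit would have produced.

```python
# Hypothetical sketch: preprocess a feature map whose producing unit was
# followed by a reduction unit. Spatial size shrinks by `stride`;
# channel count doubles (duplication stands in for the real circuit).

def preprocess_for_reduction(fmap, stride=2):
    down = [[row[::stride] for row in ch[::stride]] for ch in fmap]
    return down + down  # channels doubled, spatial size reduced

fmap = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]
out = preprocess_for_reduction(fmap)
print(len(out), len(out[0]), len(out[0][0]))  # 2 channels, each 2 x 2
```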
Optionally, the candidate operations include at least one of a hole (dilated) convolution operation, a separable convolution operation, an average pooling operation, and a max pooling operation.
The present specification provides a data processing apparatus based on a unified system for computing, the unified system for computing comprising a plurality of data processing units, the data processing units corresponding to a plurality of types of candidate operations, the apparatus comprising:
the control vector determining module is used for determining the target number of target units contained in the target model and the control vector corresponding to each target unit according to the obtained target model;
a target unit selection module, configured to select the target number of target units from the data processing units of the integrated storage and calculation system;
the target operation determining module is used for determining target operation corresponding to each target unit from candidate operations of various types according to the control vector corresponding to the target unit;
and the output data determining module is used for determining the input data of each target unit according to the input data of the target model when receiving the input data of the target model, respectively inputting the input data of each target unit into each target unit, and obtaining the output data of the target model through the corresponding target operation of each target unit.
The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above data processing method based on a compute-in-memory system.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above-mentioned data processing method based on a memory-integrated system when executing the program.
The above-mentioned at least one technical scheme that this specification adopted can reach following beneficial effect:
in the data processing method based on the integrated system of calculation provided in the present specification, the target number of target units and the control vector corresponding to each target unit are determined according to the target model, the target number of target units are selected from each data processing unit of the integrated system of calculation, and then the target operation corresponding to each target unit is determined from each type of candidate operation according to the control vector corresponding to each target unit, so that the input of each target unit is input into each target unit, and the data processing is performed on the input of each target unit by adopting the target operation, thereby obtaining the output data of the target model. Therefore, the mode of determining the target operation executed by the target unit based on the control vector corresponding to the target unit can be compatible with models of different architectures only by changing the control vector, and redesign of a circuit structure is not needed, so that the scene of model reasoning based on the integrated circuit is expanded, and the efficiency is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate and explain the exemplary embodiments of the present specification and their description, are not intended to limit the specification unduly. In the drawings:
FIG. 1 is a flow chart of a method for processing data in a unified memory system according to the present disclosure;
FIG. 2 is a schematic diagram of a control vector in the present specification;
FIG. 3 is a flow chart of a method for processing data in a unified memory system according to the present disclosure;
FIG. 4 is a schematic diagram of a control vector in the present specification;
FIG. 5 is a schematic diagram of a target unit according to the present disclosure;
FIG. 6 is a schematic diagram of a batch normalization circuit of the present disclosure;
FIG. 7 is a schematic diagram of a conventional convolution analog computation circuit of the present disclosure;
FIG. 8 is a schematic diagram of a separate convolution analog calculation circuit according to the present disclosure;
FIG. 9 is a schematic diagram of a data processing apparatus of a memory system provided in the present specification;
fig. 10 is a schematic view of the electronic device corresponding to fig. 1 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
In addition, all the actions for acquiring signals, information or data in the present specification are performed under the condition of conforming to the corresponding data protection rule policy of the place and obtaining the authorization given by the corresponding device owner.
In recent years, against the industry background that semiconductor process development is approaching physical limits and Moore's law is slowing, data storage technology faces the dilemma that memory access efficiency and memory density cannot both be achieved. Under the von Neumann architecture, limited data throughput between processing units and memory devices stalls single-thread chip performance while causing high power consumption for memory operations, which has become a key bottleneck for chip performance and resource consumption, namely the "memory wall" and "power wall" problems. The Computing-in-Memory (CIM) architecture integrates storage and computation, avoids frequent data movement between processing units and memory units, and is attracting a great deal of attention as a new architecture; it is considered an effective solution for deploying large-scale network models on resource- and energy-consumption-sensitive devices. A compute-in-memory system is typically composed of analog and digital components: the crossbar array is an analog component, and most other components are digital, such as controllers, buffers, ADCs, and DACs. When a neural network is deployed on a compute-in-memory system, frequent digital-to-analog and analog-to-digital conversion operations occur between each layer or computing core, and the energy and area consumption is concentrated in the ADC, DAC, and buffer overhead.
At present, full simulation and parallelization processing of the reasoning process of the neural network can be realized based on a pure simulation operation framework of the memristor cross array, and a plurality of output characteristic diagrams are generated in a single processing period so as to improve the throughput rate of the system.
However, such a fully parallel in-memory computing circuit modeling scheme must be regularized and structured, which makes heterogeneous network architecture design difficult. More importantly, although the synaptic weights can be changed by reprogramming, current parallel analog circuits are custom designed for a specific network architecture, and the analog circuits adopted by different network architectures are incompatible, which limits the application scenarios of data processing schemes that execute neural network architectures on a compute-in-memory system.
For example, in the field of face recognition, images may be identified using a pre-trained face recognition model. In practice, however, even models within the face recognition field may have different structures. For instance, a face recognition model supporting liveness detection may add, before the image feature extraction sub-model, a neural network layer for detecting whether the face in an image is a real face or a photo of a face, whereas a face recognition model that does not support liveness detection needs no such layer before the image feature extraction sub-model.
Based on the above, the present disclosure provides a data processing method based on a memory integrated system, which uses each data processing unit included in the memory integrated system as a limitation, and based on parallel simulation in-memory calculation rules, realizes a data processing scheme compatible with different specific model architectures, so as to realize the reconfigurable and high throughput processing effect of the neural network model.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a data processing method based on a memory integrated system provided in the present specification.
S100: and determining the target quantity of target units contained in the target model and control vectors corresponding to the target units according to the obtained target model.
The execution subject of the data processing method based on the compute-in-memory system provided in this specification may be an electronic device, such as a server or a chip, deployed with the compute-in-memory system. The compute-in-memory system includes a plurality of predefined data processing units, and each data processing unit can take on the data processing task of executing several types of candidate operations on the data input to it; that is, each data processing unit corresponds to a plurality of types of candidate operations. The data processing units of the compute-in-memory system may include multiple types of units, such as a conventional unit, a reduction unit, a classification unit, an adaptive average pooling unit, and a preprocessing unit. Because the number and types of candidate operations that different data processing units can perform differ, different data processing units can perform different data processing tasks.
For each neural network model with a different model structure, when the neural network model performs a data processing task based on the integrated system, one or more data processing units required for performing the task of the neural network model may be determined as target units of the neural network model from each data processing unit included in the integrated system based on the model structure of the neural network model. Thus, when the target model is acquired, the network structure and model parameters of each neural network layer contained in the target model can be acquired, the target number of target units required for executing the data processing task of the target model is determined according to the acquired network structure of the neural network layer of the target model, and the parameters of each target unit adopted by the target model when executing the data processing task based on the integrated memory system are determined according to the acquired model parameters of the target model. Generally, a target model for executing a data processing task based on a memory integrated system is trained in advance, that is, model parameters of the target model are iteratively optimized, and accuracy meeting requirements can be achieved. The training process of the target model may be performed based on a memory integrated system, or may be performed by other electronic devices for model training, which is not limited in this specification.
For example, for an image classification model, the image classification model may be split into an image feature extraction sub-model for extracting image features from an input image and a classifier for performing image classification based on the image features, such that when the image classification model performs an image classification task based on a memory system, one or more data processing units that complete the image feature extraction and one or more data processing units that perform image classification based on the image features may be determined, thereby completing the image classification task of the image classification model based on each determined data processing unit.
Further, the data processing tasks performed by different target units are different from each other for each target unit included in the target model, in that the number and types of operations that can be performed by each target unit are different from each other.
In order to solve the problems of the heterogeneous model that the architecture design difficulty is high and the network architecture among different models is not compatible, in the present specification, a control vector corresponding to a target unit is adopted to flexibly select the type and the number of operations executed in the target unit, and only the control vector corresponding to the target unit needs to be preset, so that the candidate operations executed by the target unit can be flexibly adjusted without redesigning a simulation calculation circuit, thereby determining the structure of the target unit and the data processing task which can be realized.
The control vector may be a multidimensional numerical vector or numerical matrix, and the control vector (matrix) may be constructed by taking a connection relation from input to output of the target unit as a row and taking various types of candidate operations corresponding to the target unit as columns. The control vector may be a vector represented by a unique hot code, or a vector represented by decimal, or a vector represented by any existing coding method, which is not limited in this specification.
In addition, the control vector needs to be determined in advance according to the operation to be performed by the target unit included in the target model. Each dimension of the vector value included in the control vector is used for representing that the input data to the output data corresponding to the row where the vector value is located is performed by the operation corresponding to the column where the vector value is located. The number and type of operations that the target unit needs to perform, and the number of intermediate data that the target unit exists, may be determined according to a specific application scenario, which is not limited in this specification.
For example, if a separable convolution operation contained in the target unit obtains the output Y1 from the input X1, and the candidate operations are ordered as hole convolution, separable convolution, average pooling, and max pooling, then the control vector may be (0, 1, 0, 0), where the vector value "1" indicates the operation to be performed to obtain the output Y1 from the input X1, and a vector value "0" indicates an operation that is not performed to obtain the output Y1 from the input X1.
As another example, if a hole convolution operation contained in the target unit obtains intermediate data A2 from the input X2, and an average pooling operation contained in the target unit then obtains the output Y2 from the intermediate data A2, the control vector of the target unit may be as shown in Table 2 below, and the control vector (matrix) corresponding to the target unit may also be written as follows:
each candidate operation may be considered an atomic operation, i.e., each candidate operation may be combined from multiple steps of operations, such as atomic operations of a conventional convolution may be composed of an operator activation function (ReLu), a conventional convolution, and batch normalization. Wherein the candidate operation refers to a data processing operation performed on input data input into a data processing unit, and the candidate operation comprises at least one of a hole convolution operation, a split convolution operation, an average pooling operation and a maximum pooling operation. In addition, the same type of candidate operation may correspond to different operating parameters, such as step size, core size, etc., of the existing neural network, which is not limited in this specification. Wherein different types of candidate operations are considered different candidate operations, and the same type of candidate operations corresponding to different operating parameters may also be considered different candidate operations, e.g., a hole convolution operation and a split convolution are two different candidate operations, and a 3x3 split convolution operation and a 5x5 split convolution operation are two different candidate operations. As for the operation parameters corresponding to the target operation adopted by the target unit included in the target model, the operation parameters may be determined based on the obtained model parameters of the target model.
It can be understood that the above atomic operation refers to an operation consisting of multiple steps. An atomic operation either performs all of its steps or none of them; it cannot perform only a subset of the steps it contains.
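The all-or-none composition of an atomic operation can be sketched as below; the ReLU → convolution → batch-normalization ordering follows the example above, while the stub implementations of the convolution and normalization steps are placeholders, not the circuit-level operations:

```python
# Illustrative atomic operation composed of ReLU -> convolution -> batch norm.
def relu(x):
    return [max(0.0, v) for v in x]

def conv_stub(x):
    # Placeholder for the convolution step (here: a fixed scaling).
    return [v * 2.0 for v in x]

def batch_norm_stub(x):
    # Placeholder for batch normalization (here: mean-centering only).
    m = sum(x) / len(x)
    return [v - m for v in x]

def atomic_op(x):
    # The steps run as one indivisible unit: all of them execute in order.
    for step in (relu, conv_stub, batch_norm_stub):
        x = step(x)
    return x

print(atomic_op([-1.0, 1.0]))  # [-1.0, 1.0]
```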
In addition, in one or more embodiments of the present disclosure, when a data processing unit in the integrated storage-and-computation system executes a data processing task, a plurality of operations may be performed sequentially on input data to obtain output data. In this process, the data flowing between different operations is intermediate data. The intermediate data is generally a numerical vector, such as an intermediate feature vector or an intermediate feature map, and the representation form of the intermediate data may be determined according to the specific application scenario, which is not limited in this disclosure.
In the embodiment of the present disclosure, each candidate operation executed by a data processing unit may be mapped into an analog computing circuit. That is, the analog computing circuits are designed based on the candidate operations executed by each data processing unit, and the analog computing circuits corresponding to the data processing units are integrated to obtain the integrated storage-and-computation system. Thus, in the integrated system, each candidate operation is completed by the analog computing circuit corresponding to that candidate operation, and the data processing task corresponding to each data processing unit is completed by the analog computing circuits corresponding to the candidate operations contained in the data processing unit. The specific design of the analog computing circuit corresponding to a candidate operation may be predetermined according to the specific application scenario, which is not limited in this specification.
S102: and selecting the target number of target units from the data processing units of the integrated storage and calculation system.
Specifically, the target model may include a plurality of target units, each of which executes a subtask, and the data processing task corresponding to the target model is completed based on the subtasks executed by the target units.
In this step, based on the target number determined above, the target number of target units are selected from the data processing units as the target units that complete the data processing task corresponding to the target model. The types of the target units included in the target model may be the same or different, which is not limited in this specification.
S104: and determining the target operation corresponding to each target unit from the candidate operations of each type according to the control vector corresponding to the target unit.
Specifically, based on the description of S100, the control vector corresponding to a target unit may indicate the number and types of operations that need to be performed when data processing is performed by the target unit, as well as the input data and output data of each operation. Therefore, based on the vector values contained in the control vector corresponding to the target unit, it is only necessary to determine which operations must be adopted so that the target unit, when processing certain input data, obtains the corresponding output data.
The "operation to be adopted" is the target operation corresponding to the target unit determined from the candidate operations of each type according to the control vector corresponding to the target unit. The number of target operations corresponding to the target unit may be one or more. Alternatively, the first specified vector value representation may be preset to employ a corresponding candidate operation, and the second specified vector value representation may not employ a corresponding candidate operation, e.g., the first specified vector value is "1" and the second specified vector value is "0". The candidate operation corresponding to the first specified vector value is the target operation.
For example, as shown in FIG. 2, the table is the control vector (matrix) of the target unit. The input data of the conventional unit is X1, and the intermediate data includes A1 and A2. For intermediate data A1, looking up the table, the operation pointed to by the column with the value "1" in row "A1-X1" can be determined to be the hole convolution operation; therefore, intermediate data A1 can be obtained from input data X1 through the hole convolution operation. For intermediate data A2, looking up the table, the operation pointed to by the column with the value "1" in row "A2-X1" is the separable convolution operation; therefore, intermediate data A2 can be obtained from input data X1 through the separable convolution operation. In addition, row "A2-A1" is also found in the table, which indicates that intermediate data A2 may also be obtained from intermediate data A1 by a candidate operation; the operation pointed to by the column with the value "1" in row "A2-A1" is the average pooling operation, so intermediate data A2 can also be obtained from intermediate data A1 through the average pooling operation.
In addition, there are two cases for the target unit. First, the input data of the target unit may directly yield the output data of the target unit through a target operation. Second, in some application scenarios, the target unit may further contain intermediate data; that is, a target operation is first applied to the input data of the target unit to obtain intermediate data, and a target operation is then applied to the intermediate data to obtain the output data of the target unit. Whether the two cases coexist in the same target model, or only one of them exists among the target units contained in the same target model, depends on the specific application scenario, which is not limited in this specification.
S106: when input data of the target model are received, the input data of each target unit are determined according to the input data of the target model, the input data of each target unit are respectively input into each target unit, and the output data of the target model are obtained through target operations respectively corresponding to each target unit.
After traversing the target units based on step S104 to obtain the target operation corresponding to each target unit, since the target model is equivalently formed by stacking the target units, input-output relationships exist among the target units; that is, the input data of a target unit may be the input data of the target model or the output data of other target units, depending on the specific model structure of the target model, which is not limited in this specification.
According to different application scenarios and data processing tasks performed by the target models, the different target models may correspond to different input data, for example, for the face recognition field, the input data of the face recognition model may be a face image; for the natural language processing field, the input data of the natural language processing model may be text data; for the recommendation field, input data of the recommendation model may be commodity information. The specific type and form of the input data can be determined according to the specific application scenario, and the specification is not limited to this.
Optionally, determining a ranking of each target unit according to a model structure of the target model, wherein for a target unit located at the first position in the ranking, input data of the target unit is input data of the target model; for a target unit that is not first in the ranking, the input data of the target unit is the output data of other target units that are ranked before the target unit. The number of the other target units and whether the other target units are adjacent to the target unit in the order of each target unit are not limited in this specification.
Further, for each target unit, performing a target operation corresponding to the target unit on input data input to the target unit to obtain output data of the target unit, and determining output data of the target model according to the output data of each target unit.
Optionally, determining the output data of the target model according to the output data of each target unit specifically includes:
determining the sequence of each target unit, and taking the output data of the target unit positioned at the tail end in the sequence as the output data of the target model.
Optionally, determining the output data of the target model according to the output data of each target unit specifically includes:
and selecting a plurality of candidate units from the target units, and determining output data of the target model according to the output data of each candidate unit, wherein each candidate unit at least comprises a target unit positioned at the tail end in the sorting of each target unit. Specifically, in some application scenarios, the output data of the target model may be made up of output data of a plurality of target units included in the target model, for example, in a recommended scenario, the target model may include a target unit a for determining a purchase probability of a commodity in an electronic market and a target unit B for determining a purchase probability of a commodity in an online shopping scene. As to how to select the candidate unit from each target unit included in the target model in practical application, it needs to be determined according to the model structure of the target model and a specific application scenario, which is not limited in this specification.
In the data processing method based on the integrated storage-and-computation system provided by this specification, the target number and the control vector corresponding to each target unit are determined according to the target model, the target number of target units are selected from the data processing units of the integrated storage-and-computation system, and the target operation corresponding to each target unit is then determined from the candidate operations of each type according to the control vector corresponding to that target unit, so that the input data of each target unit is input into that target unit and processed by its target operations to obtain the output data of the target model.
Therefore, the approach of determining the target operations executed by a target unit based on the control vector corresponding to the target unit can be made compatible with models of different architectures merely by changing the control vector, without redesigning the circuit structure, which expands the scenarios of model reasoning based on the integrated storage-and-computation circuit and improves efficiency.
In one or more embodiments of the present description, the candidate operations performed by a data processing unit in the integrated storage-and-computation system may include various types of convolution operations, based on which feature maps may be extracted from the input data of the data processing unit, and various types of pooling operations, based on which the size of a feature map may be reduced. Moreover, multiple convolution operations and/or multiple pooling operations may be performed in a single data processing unit, and the types of the individual convolution or pooling operations may be the same or different. In that case, multiple intermediate feature maps exist in a single data processing unit; i.e., the target unit mentioned in the foregoing step S104 contains intermediate data, where the intermediate data are intermediate feature maps.
Thus, before step S104 in fig. 1, the number of intermediate feature maps existing in each target unit corresponding to the target model may be determined, and a scheme for obtaining each intermediate feature map may be obtained, so that in an alternative scheme in step S104, a target operation corresponding to a target unit can be determined based on the target unit including a plurality of intermediate feature maps. As shown in fig. 3, the specific steps are as follows:
s200: and aiming at each target unit, acquiring the number of the intermediate feature graphs contained in the target unit and the stage numbers corresponding to the intermediate feature graphs, wherein the stage numbers are used for indicating the ordering of the intermediate feature graphs in the intermediate feature graphs.
Specifically, the number of intermediate feature graphs included in each target unit may be preset based on a priori experience of model reasoning, and the number of intermediate feature graphs included in different target units may be the same or different, which is not limited in this specification.
When the target unit contains a plurality of intermediate feature maps, the output data of the target unit is the output feature map obtained by stacking the intermediate feature maps.
In addition, the stage numbers corresponding to the intermediate feature maps may be acquired, and the ordering of the intermediate feature maps of the stages may be determined based on these stage numbers. For example, if the target unit contains two intermediate feature maps whose stage numbers are "0" and "1" respectively, the intermediate feature map with stage number "0" is ordered before the intermediate feature map with stage number "1". The stage number of an intermediate feature map may be used to determine the preamble nodes corresponding to the intermediate feature map of each stage.
S202: and determining a preamble node corresponding to the intermediate feature map of each stage in the target unit according to the stage number of the intermediate feature map of the stage, wherein the preamble node is used for obtaining the intermediate feature map of the stage through candidate operation in the target unit, and the preamble node at least comprises input data of the target unit.
In the embodiment of the present disclosure, the intermediate feature map of one stage generally refers to a feature map, and the number of channels of the intermediate feature map of each stage included in the same target unit is generally the same.
Further, since the object to be achieved by the scheme shown in fig. 3 is to determine the target operation corresponding to the target unit, the target operation is determined based on the control vector, and the control vector is a vector describing through which operation the input data and the output data are connected. Therefore, in this step, it is necessary to determine the preamble node to which the intermediate feature map of each stage corresponds, which is input data from which the intermediate feature map can be obtained by a candidate operation, but it is necessary to determine whether or not the intermediate feature map is actually obtained based on the preamble node based on a specific vector value in the control vector.
For example, still taking the foregoing Table 2 as an example, assume that A2 and Y2 are respectively the intermediate feature maps of the two stages contained in the target unit, and that X2 is the input data of the target unit. Then X2 is the preamble node of A2, and both X2 and A2 are preamble nodes of Y2. However, since the control vector indicates that A2 is obtained from X2 by the hole convolution operation, and that Y2 is obtained from intermediate feature map A2 by the average pooling operation, for Y2, no operation is performed on input data X2 to obtain Y2.
Further, in one or more embodiments of the present disclosure, the preamble node required for obtaining the intermediate feature map of each stage may further include other intermediate feature maps of each stage in addition to the input data of the target unit, so as to enrich the features of the data included in the intermediate feature maps of each stage, so as to improve accuracy of model reasoning.
Specifically, a first specified order of the intermediate feature maps of the stages in the target unit is determined according to the stage numbers of the intermediate feature maps of the stages in the target unit.
When the intermediate feature map of the stage is determined to be located at the first position of the first designated sequence according to the stage number of the intermediate feature map of the stage, the preamble node corresponding to the intermediate feature map of the stage is determined to comprise the input data of the target unit to which the intermediate feature map of the stage belongs.
When the intermediate feature map of the stage is determined to be positioned at the non-first position of the first designated sequence according to the stage number of the intermediate feature map of the stage, determining that the preamble node corresponding to the intermediate feature map of the stage comprises input data of a target unit to which the intermediate feature map of the stage belongs, and arranging the stage numbers of the intermediate feature maps of other stages before the intermediate feature map of the stage in the first designated sequence.
In practical applications, the intermediate feature map of a stage may correspond to one or more preamble nodes, and for each intermediate feature map of the stage, the intermediate feature maps of the stages whose stage numbers are arranged before the stage numbers of the intermediate feature map of the stage in the first specified order, and input data of a target unit to which the intermediate feature map of the stage belongs, are used as the preamble nodes of the intermediate feature map of the stage. For the intermediate feature map with no stage number arranged in front, the preceding node of the intermediate feature map is input data of the target unit, so that in general, the preceding node of the intermediate feature map includes at least input data of the target unit to which the intermediate feature map belongs.
Optionally, after determining the preamble nodes of the intermediate feature map of a stage, when a preamble node is the input data of the target unit, the node number of that preamble node is set to a designated number, such as node number "0". When a preamble node is an intermediate feature map, its node number may be determined according to the stage number of the corresponding intermediate feature map; in general, the node numbers of the intermediate feature maps, when used as preamble nodes, follow the node number assigned to the input data of the target unit, such as node numbers "1", "2", and so on.
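The numbering rule above can be sketched as follows, under the assumption that node number 0 denotes the target unit's input data and the intermediate feature maps of stages 0, 1, ... carry node numbers 1, 2, ... when they act as preamble nodes:

```python
def preamble_nodes(stage_number):
    """Node numbers of all preamble nodes for the given stage.

    The input data (node 0) is always a preamble node; the intermediate
    feature map of every earlier stage is also a preamble node.
    """
    return list(range(0, stage_number + 1))

print(preamble_nodes(0))  # first stage: only the input data -> [0]
print(preamble_nodes(2))  # third stage: input plus stages 0 and 1 -> [0, 1, 2]
```

The first-position case of the first designated order corresponds to `preamble_nodes(0)`, whose only preamble node is the unit's input data.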
S204: and searching vector values corresponding to all the preamble nodes of the intermediate feature diagram of the stage in the control vector corresponding to the target unit.
The control vector (matrix) of the target unit may be constructed by taking the correspondences between the stage numbers of the intermediate feature maps and the node numbers of their preamble nodes as rows, and the candidate operations corresponding to the target unit as columns. The vector value in a given row and column then indicates whether the candidate operation corresponding to that column is applied to the preamble node of the row in which the vector value is located, so as to obtain the intermediate feature map corresponding to that row.
Alternatively, the first specified vector value representation may be preset to employ a corresponding candidate operation, and the second specified vector value representation may not employ a corresponding candidate operation, e.g., the first specified vector value is "1" and the second specified vector value is "0".
For example, as shown in FIG. 4, the table is the control vector (matrix) of a conventional unit whose input data include X3 and X4 and whose intermediate feature maps include A3, A4 and A5. Each row of the control vector is the correspondence between an intermediate feature map and one of its preamble nodes; e.g., "A3-X3" characterizes the correspondence between intermediate feature map A3 and its preamble node X3. From the vector values of each row, it can be determined through which candidate operation the preamble node yields the intermediate feature map. For example, as can be seen by looking up the table in FIG. 4, the column with the value "1" in row "A3-X3" corresponds to the hole convolution operation, so intermediate feature map A3 can be obtained from input data X3 by the hole convolution operation. Accordingly, in the view of the conventional-unit architecture shown in the upper part of FIG. 4, the edge pointing from X3 to A3 is marked with "hole convolution operation"; that is, for intermediate feature map A3, one of its target operations is the hole convolution operation applied to X3.
In addition, in practical applications, some preamble node of an intermediate feature map may perform no operation at all toward obtaining that intermediate feature map. For example, the row "A4-X4" in the table of FIG. 4 characterizes the candidate operations applied to input data X4 to obtain intermediate feature map A4, but as can be seen from the table of FIG. 4, no column in row "A4-X4" takes the value "1"; i.e., for intermediate feature map A4, no candidate operation is applied to input data X4.
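The row lookup of S204, including the all-zero case just described, can be sketched as below; the operation names and their ordering are illustrative assumptions:

```python
# Candidate operations, assumed ordered as in the earlier example.
CANDIDATE_OPS = ["hole_conv", "sep_conv", "avg_pool", "max_pool"]

def selected_ops(row_vector):
    """Candidate operations flagged by the first specified vector value "1".

    An all-zero row means the preamble node is not actually used to obtain
    the intermediate feature map (as for the "A4-X4" row described above).
    """
    return [op for op, v in zip(CANDIDATE_OPS, row_vector) if v == 1]

print(selected_ops([1, 0, 0, 0]))  # a hole-convolution row -> ['hole_conv']
print(selected_ops([0, 0, 0, 0]))  # an unused preamble node -> []
```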
S206: and selecting target operations respectively executed on the precursor nodes corresponding to the intermediate feature graphs of the stage from various types of candidate operations according to vector values corresponding to the precursor nodes of the intermediate feature graphs of the stage, and taking the target operations as target operations corresponding to the target units.
Based on the scheme shown in fig. 3, when intermediate feature maps of multiple stages exist in each target unit, the vector values corresponding to the preamble nodes of the intermediate feature map of each stage are looked up in the control vector corresponding to the target unit, and the preamble nodes required for obtaining the intermediate feature map of the stage and the target operations performed on those preamble nodes are determined. Therefore, as long as the control vector corresponding to the target unit is changed, the network architecture corresponding to the target unit can be changed without redesigning an analog computing circuit containing intermediate feature maps of multiple stages. Moreover, when the integrated storage-and-computation system provided by this specification executes the reasoning process of the target model, the intermediate feature maps do not need to be cached, so that the data processing of each target unit contained in the target model can be completed in a single processing period, which significantly improves the system throughput.
In one or more embodiments of the present disclosure, on the basis of the scheme shown in fig. 3, since each target unit may obtain a multi-stage intermediate feature map, step S106 of fig. 1 may be implemented as follows:
firstly, input the input data of each target unit into that target unit, and execute the target operations on the respective preamble nodes corresponding to the intermediate feature map of each stage contained in the target unit to obtain the intermediate feature map of each stage;
secondly, stack the intermediate feature maps of the stages of the target unit to obtain the output feature map of the target unit;
then, determine the output data of the target model according to the output feature maps of the target units.
In general, since the input data of a target unit may be the output data of other target units arranged before it, in this step the output data of the target model may be determined based on the output feature map of the target unit arranged at the end of the ordering. The scheme for determining the output data of the target model based on that output feature map may be to input it into the classification unit for classification, to directly use the output feature map of the last target unit as the output data of the target model, or to apply other conventional feature map processing schemes, which is not limited in this specification.
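The stage-by-stage execution and stacking described above can be sketched as follows; the 1-D feature maps, the modeling of "stacking" as channel-wise concatenation, and the per-stage callables are all illustrative assumptions:

```python
def run_unit(input_map, stage_ops):
    """Execute a target unit with staged intermediate feature maps.

    stage_ops: one callable per stage; each stage sees a list of its
    preamble nodes (the unit input plus all earlier stage outputs).
    """
    nodes = [input_map]            # node 0 is the unit's input data
    for op in stage_ops:
        nodes.append(op(nodes))    # each stage consumes its preamble nodes
    # "Stacking" is modeled here as concatenating the stage outputs.
    stacked = []
    for stage_out in nodes[1:]:
        stacked.extend(stage_out)
    return stacked

out = run_unit([1.0, 2.0], [
    lambda ns: [v + 1 for v in ns[0]],                  # stage 0 from input
    lambda ns: [a + b for a, b in zip(ns[0], ns[1])],   # stage 1 from input + stage 0
])
print(out)  # [2.0, 3.0, 3.0, 5.0]
```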
Further, the data processing units contained in the integrated storage-and-computation system include a conventional unit, a reduction unit, an adaptive average pooling unit, and a classification unit, wherein the adaptive average pooling unit is used to reduce the size of a feature map to a specified size, and the classification unit is used to obtain a classification result based on the feature map input into it.
Based on this, determining the output data of the target model according to the output feature map of each target unit may be further implemented according to the following implementation steps:
first, select a designated unit from the target units, and input the output feature map of the designated unit into the adaptive average pooling unit to obtain an output feature map of the specified size, where the size of the specified-size output feature map is smaller than that of the output feature map of the designated unit.
In this step, the number of the designated units selected from the target units may be one or more, and the scheme of selecting the designated units may be determined based on a preset rule or according to the size of the output feature map of the target unit, which is not limited in this specification.
The output feature map of the designated unit is used as the input of the adaptive average pooling unit, which outputs an output feature map of the specified size. The specified size may be a manually preset size or may be determined according to the size of the feature map that the classification unit can accept, which is not limited in this specification.
Then, the specified-size output feature map corresponding to the designated unit is input into the classification unit, and the classification result output by the classification unit is obtained as the output data of the target model.
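The adaptive-average-pooling-then-classify tail can be sketched as below; the 1-D feature map, the segment-based pooling formula, and the argmax classifier are simplifying assumptions for illustration only:

```python
def adaptive_avg_pool(feature_map, out_size):
    """Reduce a 1-D feature map to out_size by averaging equal segments."""
    n = len(feature_map)
    pooled = []
    for i in range(out_size):
        start, end = i * n // out_size, (i + 1) * n // out_size
        segment = feature_map[start:end]
        pooled.append(sum(segment) / len(segment))
    return pooled

def classify(feature_map):
    """Toy classification unit: index of the largest activation."""
    return max(range(len(feature_map)), key=lambda i: feature_map[i])

fm = [1.0, 3.0, 2.0, 2.0, 8.0, 0.0]
pooled = adaptive_avg_pool(fm, 3)
print(pooled, classify(pooled))  # [2.0, 2.0, 4.0] 2
```

Whatever the output size of the designated unit, the pooled result always has the specified size, which is what lets a fixed classification unit follow variable target units.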
In the integrated storage-and-computation system, target units of different types and numbers may be selected based on the structure of the target model. In general, the target units undertake the task of extracting features from the input data of the target model, while the adaptive average pooling unit is used to reduce the size of the feature map and the classification unit is used to obtain classification results from the feature map. Different model structures may therefore correspond to selecting different target units in the integrated system, but in general an adaptive average pooling unit and a classification unit are appended after the selected target units in order to obtain the output data of the target model. Thus, in one or more embodiments of the present specification, the target units used for extracting features may be treated as variable (optional) units, while the adaptive average pooling unit and the classification unit may be treated as fixed units. That is, the integrated storage-and-computation system involved in this specification can provide, for models of different structures, a model implementation whose architecture is changeable within certain limits: while being compatible with different model structures, it limits the scale of the architecture control so that the architecture control cannot expand without bound, which improves the efficiency of model reasoning and implementation.
In one or more embodiments of the present description, the input of each target unit may be not only the output of the previous unit but also the output of the unit before the previous one. Based on this, in either of the schemes shown in fig. 1 or 3, the input data of each target unit may be determined according to the position of that target unit in the arrangement order of the target units contained in the target model.
The first step: and arranging the target units according to a second designated sequence according to the target model.
And a second step of: taking the input data of the target model as the input data of the target unit positioned at the first position in the designated sorting;
and a third step of: and taking the output characteristic diagram of the last unit of the target unit and the output characteristic diagrams of the last two units of the target unit as input data of the target unit according to the second designated sequence.
Fig. 5 is a schematic diagram of the internal architecture of a target unit, where the inputs of the target unit are the output feature map k-1 of the previous target unit and the output feature map k-2 of the target unit before that. The target unit contains three intermediate feature maps numbered "0", "1", and "2", and the output feature map of the target unit can be obtained by stacking these three intermediate feature maps.
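The k-1 / k-2 input wiring described above can be sketched as follows; treating the model input as the stand-in for missing predecessors at the head of the sequence is an assumption made for illustration:

```python
def unit_inputs(outputs, k, model_input):
    """Inputs of unit k: the output feature maps of units k-1 and k-2.

    outputs: output feature maps of units 0 .. k-1, in the second
    designated sequence; model_input substitutes for absent predecessors.
    """
    prev1 = outputs[k - 1] if k >= 1 else model_input
    prev2 = outputs[k - 2] if k >= 2 else model_input
    return prev1, prev2

outs = [["f0"], ["f1"], ["f2"]]
print(unit_inputs(outs, 0, ["x"]))  # first unit: (['x'], ['x'])
print(unit_inputs(outs, 3, ["x"]))  # unit 3: (['f2'], ['f1'])
```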
Further, the target units include a reduction unit, which is used to reduce the size of its input feature map and increase the number of channels of its input feature map.
In particular, in one or more embodiments of the present description, the target unit may include a conventional unit and a reduction unit. Wherein the conventional unit does not change the size of the feature map nor the channels of the feature map, i.e. the size of the input feature map and the size of the output feature map of the conventional unit are the same, and the number of channels of the input feature map and the number of channels of the output feature map of the conventional unit are also the same. The reduction unit is used for reducing the size of the feature map and increasing the channel number of the feature map, so that the size of the input feature map of the reduction unit is larger than that of the output feature map, and the channel number of the input feature map of the reduction unit is smaller than that of the output feature map.
In practical applications, the purpose of setting the reduction unit among the target units is to reduce the size of the feature map and increase the number of its channels: reducing the size of the feature map reduces the amount of computation, while increasing the number of channels as the size is reduced compensates for the downsampling, so as to reduce the loss of important features as much as possible.
Furthermore, in order to ensure that the output feature map of the previous target unit can be input into the next target unit, the adaptation of the feature map size and the adaptation of the channel between the target units are required.
When the previous unit of the target unit is a reduction unit, the output feature maps of the two preceding units of the target unit contained in the input of the target unit are preprocessed to obtain a preprocessed input feature map.
The size of the preprocessed input feature map is smaller than that of the output feature maps of the two preceding units of the target unit, and the number of channels of the preprocessed input feature map is larger than that of the output feature maps of the two preceding units of the target unit.
And then, inputting the preprocessed input characteristic diagram into the target unit.
In the embodiments of the present disclosure, when mapping a candidate operation into an analog computing circuit to execute a corresponding data processing task, for each sub-operation (atomic operation) included in the candidate operation, an analog computing circuit may be built for that sub-operation based on the data processing task it completes.
For the activation function operator (ReLU) included in the candidate operations, the function ReLU(x) = max(0, x) may be implemented by a diode. The type and specific parameters of the diode can be determined according to the specific application scene, which is not limited in this specification.
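The diode behavior being exploited can be sketched in one line (an ideal diode passes positive voltages and blocks negative ones, which is exactly the ReLU function; real diode non-idealities are ignored here):

```python
def relu(x):
    """ReLU(x) = max(0, x): the rectifying behavior of an ideal diode."""
    return max(0.0, x)

print([relu(v) for v in (-2.0, 0.0, 3.5)])   # [0.0, 0.0, 3.5]
```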
The batch normalization operations included in the candidate operations may be implemented based on a batch normalization circuit as shown in FIG. 6. In this specification, the batch normalization circuit consists of one operational amplifier, three resistors (R1, R2, R3), and one memristor (Rm). The layer-wise scaling factor k and offset value b of the activation values are calculated from the normalized mean and variance determined by the target model in the training stage; R3 is a fixed resistor, while R2 and the memristor resistance are dynamically adjusted according to the layer-wise scaling factor k and offset value b. The input voltage Va is adjusted according to the offset value b. The parameters of the elements in a specific batch normalization circuit can be determined according to the target model and the specific application scene, which is not limited in this specification.
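The scaling factor k and offset b the circuit realizes can be folded out of the trained statistics in the standard way (the `gamma`/`beta` affine parameters and `eps` default are assumptions; the patent only states that k and b are derived from the training-stage mean and variance):

```python
def fold_batchnorm(mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Fold trained batch-norm statistics into a per-layer scaling factor k
    and offset b, so the circuit only has to realize y = k*x + b
    (k sets the adjustable gain, b the bias applied to the input voltage)."""
    k = gamma / (var + eps) ** 0.5
    b = beta - k * mean
    return k, b

k, b = fold_batchnorm(mean=2.0, var=4.0, eps=0.0)
print(k, b)            # 0.5 -1.0
x = 6.0
print(k * x + b)       # 2.0, equal to (x - mean) / sqrt(var)
```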
In addition, in the neuron model one neuron can simultaneously receive signals from multiple pre-synaptic neurons, which produce corresponding stimulation on the neuron after weighting. Because of the excitatory and inhibitory effects between neurons, the synaptic weights mapped onto the neural network model can be positive or negative, whereas memristor values can only be positive. Therefore, one weight value can be stored using two memristor values: the weight matrix W in the neural network is divided into two matrices W+ and W-, where the absolute values of the positive entries of W are stored in W+ and the absolute values of the negative entries of W are stored in W-; W = W+ - W- is then obtained by a subtractor. Based on this, in one or more embodiments of the present description, a mapping scheme of the analog computation circuit for a conventional convolution is shown in FIG. 7, where x0~x3 correspond to the input feature map of the conventional convolution operation; each element contained in x0~x3 is flattened and then mapped into x0.0~x3.5 in the analog computing circuit. The weight matrix W is divided into W+ and W-, distributed over two crossbar arrays, i.e., W0.0~W2.3. y0~y2 correspond to the output feature map of the conventional convolution operation and are mapped in the analog computation circuit as y0.0+~y2.1+.
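The differential weight split can be sketched directly (a minimal numpy illustration of the W = W+ - W- decomposition; the example matrix is hypothetical):

```python
import numpy as np

def split_weights(W):
    """Split a signed weight matrix into two non-negative matrices, since
    memristor conductances cannot be negative; the subtractor then
    recovers W = W_plus - W_minus."""
    W_plus = np.maximum(W, 0.0)      # absolute values of the positive entries
    W_minus = np.maximum(-W, 0.0)    # absolute values of the negative entries
    return W_plus, W_minus

W = np.array([[0.3, -0.7],
              [-0.2, 0.5]])
Wp, Wm = split_weights(W)
print(Wp - Wm)                       # recovers W exactly
```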
FIG. 8 shows a mapping scheme of the analog computation circuit for a split convolution, where x0~x3 correspond to the input feature map of the split convolution operation; each element contained in x0~x3 is flattened and then mapped into x0.0~x3.5 in the analog computing circuit. y0~y2 correspond to the output feature map of the split convolution operation and are mapped in the analog computation circuit as y0.0+~y2.1+. The split convolution consists of a ReLU, a depth-wise convolution, a point-wise convolution, and batch normalization. The convolution kernels of the depth-wise convolution are all single-channel; each corresponds to one channel of the feature map and performs the convolution operation on it, so the number of feature-map channels is preserved. For the point-wise convolution, FIG. 8 shows an ordinary 1x1 convolution that keeps the spatial size of the feature map, with the number of output feature-map channels equal to the number of 1x1 convolution kernels. In the analog computation circuit of the split convolution operation in FIG. 8, in order to show the arrangement of parallel analog devices, the convolution kernel elements (analog computation units) are marked with different numbers (0 to 3).
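A naive reference sketch of the depth-wise plus point-wise structure (the loop implementation, kernel values, and shapes are illustrative; they are not the circuit mapping itself, only the arithmetic it realizes):

```python
import numpy as np

def depthwise_pointwise(x, dw_kernels, pw_weights):
    """Split (separable) convolution: each single-channel depth-wise kernel
    convolves its own input channel (channel count preserved), then a 1x1
    point-wise convolution mixes channels; output channels = pw rows."""
    C, H, W = x.shape
    k = dw_kernels.shape[-1]
    dw = np.zeros((C, H - k + 1, W - k + 1))   # valid convolution output
    for c in range(C):                          # one kernel per channel
        for i in range(dw.shape[1]):
            for j in range(dw.shape[2]):
                dw[c, i, j] = np.sum(x[c, i:i+k, j:j+k] * dw_kernels[c])
    # point-wise 1x1: a linear mix across channels at every spatial position
    return np.einsum('oc,chw->ohw', pw_weights, dw)

x = np.random.rand(3, 8, 8)                     # 3-channel input, hypothetical
out = depthwise_pointwise(x, np.ones((3, 3, 3)), np.ones((5, 3)))
print(out.shape)                                # (5, 6, 6)
```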
Fig. 9 is a schematic diagram of a data processing device based on a memory and calculation integrated system provided in the present specification, specifically including:
the control vector determining module 300 is configured to determine, according to the obtained target model, a target number of target units included in the target model and a control vector corresponding to each target unit;
a target unit selection module 302, configured to select the target number of target units from the data processing units of the integrated storage and calculation system;
a target operation determining module 304, configured to determine, for each target unit, a target operation corresponding to the target unit from candidate operations of each type according to a control vector corresponding to the target unit;
and the output data determining module 306 is configured to determine input data of each target unit according to the input data of the target model when receiving the input data of the target model, input the input data of each target unit to each target unit, and obtain the output data of the target model through the target operations respectively corresponding to each target unit.
Optionally, the apparatus further comprises:
the preamble node determining module 308 is specifically configured to obtain, for each target unit, a number of intermediate feature graphs included in the target unit, and a stage number corresponding to each intermediate feature graph, where the stage number is used to indicate a ranking of the intermediate feature graphs in each intermediate feature graph; for the intermediate feature map of each stage in the target unit, determining a preamble node corresponding to the intermediate feature map of the stage according to the stage number of the intermediate feature map of the stage, wherein the preamble node is used for obtaining the intermediate feature map of the stage through candidate operation in the target unit, and the preamble node at least comprises input data of the target unit;
Optionally, the target operation determining module 304 is specifically configured to search, in a control vector corresponding to the target unit, a vector value corresponding to each preamble node of the intermediate feature map in the stage; and selecting target operations respectively executed on the precursor nodes corresponding to the intermediate feature graphs of the stage from various types of candidate operations according to vector values corresponding to the precursor nodes of the intermediate feature graphs of the stage, and taking the target operations as target operations corresponding to the target units.
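The lookup this module performs can be sketched as follows (the candidate-operation names, the row-per-edge layout, and the argmax selection rule are all assumptions for illustration; the patent only specifies that each vector value ties an edge to a candidate operation):

```python
import numpy as np

CANDIDATE_OPS = ['hole_conv', 'split_conv', 'avg_pool', 'max_pool']

def decode_control_vector(ctrl):
    """Each row of the control matrix corresponds to one (preamble node ->
    intermediate map) edge; the column holding the row's largest value
    names the target operation selected for that edge."""
    return [CANDIDATE_OPS[int(np.argmax(row))] for row in ctrl]

ctrl = np.array([[0.1, 0.7, 0.1, 0.1],    # edge 0 -> split convolution
                 [0.6, 0.2, 0.1, 0.1]])   # edge 1 -> hole convolution
print(decode_control_vector(ctrl))        # ['split_conv', 'hole_conv']
```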
Optionally, the preamble node determining module 308 is specifically configured to determine a first specified order of the intermediate feature map of each stage in the target unit according to the stage number of the intermediate feature map of each stage in the target unit; when the intermediate feature map of the stage is determined to be located at the first position of the first specified order according to its stage number, determine that the preamble node corresponding to the intermediate feature map of the stage includes the input data of the target unit to which the intermediate feature map of the stage belongs; when the intermediate feature map of the stage is determined to be located at a non-first position of the first specified order according to its stage number, determine that the preamble nodes corresponding to the intermediate feature map of the stage include the input data of the target unit to which the intermediate feature map of the stage belongs, together with the intermediate feature maps of the other stages whose stage numbers are arranged before the intermediate feature map of the stage in the first specified order.
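This rule can be sketched as follows (the node-numbering convention — nodes 0 and 1 for the unit's two inputs, node s+2 for the stage-s intermediate map — is an assumption for illustration):

```python
def preamble_nodes(stage_index):
    """First-stage map: predecessors are only the unit's inputs.
    Later stages: additionally every earlier intermediate map."""
    inputs = [0, 1]                              # the unit's input data
    earlier_stages = [s + 2 for s in range(stage_index)]
    return inputs + earlier_stages

print(preamble_nodes(0))   # [0, 1]            first stage: inputs only
print(preamble_nodes(2))   # [0, 1, 2, 3]      inputs plus stages 0 and 1
```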
Optionally, the output data determining module 306 is specifically configured to, for each target unit, input data of the target unit into the target unit, and execute a target operation on each preamble node corresponding to each stage of the intermediate feature map included in the target unit, so as to obtain each stage of the intermediate feature map of the target unit; stacking the intermediate feature graphs of each stage of the target unit to obtain an output feature graph of the target unit; and determining output data of the target model according to the output characteristic diagrams of the target units.
Optionally, the data processing unit further includes an adaptive average pooling unit and a classification unit, where the adaptive average pooling unit is configured to reduce a size of the feature map to a specified size, and the classification unit is configured to obtain a classification result based on the feature map input to the classification unit;
optionally, the output data determining module 306 is specifically configured to select a specified unit from the target units, and input the output feature map of the specified unit to the adaptive average pooling unit to obtain an output feature map of the specified size, where the size of the output feature map of the specified size is smaller than that of the output feature map of the specified unit; and input the output feature map of the specified size into the classification unit, and obtain the classification result output by the classification unit as the output data of the target model.
Optionally, the output data determining module 306 is specifically configured to arrange the target units in a second specified order according to the target model; take the input data of the target model as the input data of the target unit located at the first position in the second specified order; and take the output feature maps of the previous unit and of the unit before it as the input data of the target unit according to the second specified order.
Optionally, the target unit includes a reduction unit for reducing a size of an input feature map of the reduction unit and increasing a number of channels of the input feature map of the reduction unit;
optionally, the apparatus further comprises:
the preprocessing module 310 is specifically configured to, when the previous unit of the target unit is a reduction unit, preprocess the output feature maps of the two preceding units of the target unit contained in the input of the target unit, so as to obtain a preprocessed input feature map, where the size of the preprocessed input feature map is smaller than that of the output feature maps of the two preceding units of the target unit, and the number of channels of the preprocessed input feature map is larger than that of the output feature maps of the two preceding units of the target unit;
Optionally, the output data determining module 306 is specifically configured to input the preprocessed input feature map into the target unit.
Optionally, the candidate operation includes at least one of a hole convolution operation, a split convolution operation, an average pooling operation, a maximum pooling operation.
The present specification also provides a computer-readable storage medium storing a computer program operable to execute the above-described data processing method based on a memory integrated system shown in fig. 1.
The present specification also provides a schematic structural diagram of the electronic device shown in fig. 10. At the hardware level, as shown in fig. 10, the electronic device includes a processor, an internal bus, a network interface, a memory, and a nonvolatile storage, and may include hardware required by other services. The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to realize the data processing method based on the integrated memory system shown in fig. 1. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.
In the 1990s, an improvement to a technology could clearly be distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements of method flows today can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD) (e.g., a field programmable gate array (FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD, without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely slightly programming the method flow into an integrated circuit using several of the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller; examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to implement the same functionality by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (10)

1. A data processing method based on a storage and calculation integrated system, wherein the storage and calculation integrated system comprises a plurality of data processing units, each candidate operation executed by the data processing units in the storage and calculation integrated system is respectively mapped into an analog computing circuit, the storage and calculation integrated system is integrated from the analog computing circuits corresponding to the data processing units, the data processing units further comprise an adaptive average pooling unit and a classification unit, the data processing units correspond to a plurality of types of candidate operations, and the types of the candidate operations at least comprise one of: a hole convolution operation, a split convolution operation, an average pooling operation, and a maximum pooling operation, the method comprising:
according to the obtained target model, determining, from the data processing units contained in the storage and calculation integrated system, one or more data processing units required for executing the task of the target model as the target units of the target model; determining the target number of target units contained in the target model and the control vector corresponding to each target unit, wherein each vector value contained in the control vector is used to indicate that the operation from input data to output data corresponding to the row of the vector value is performed by the candidate operation corresponding to the column of the vector value;
Selecting the target number of target units from the data processing units of the integrated memory system;
for each target unit, determining a target operation corresponding to the target unit from candidate operations of various types according to a control vector corresponding to the target unit;
when input data of the target model are received, the input data of each target unit are determined according to the input data of the target model, the input data of each target unit are respectively input into each target unit, and the output data of the target model are obtained through target operations respectively corresponding to each target unit.
2. The method of claim 1, wherein prior to determining the target operation corresponding to the target unit from among the candidate operations of each type based on the control vector corresponding to the target unit, the method further comprises:
for each target unit, acquiring the number of intermediate feature graphs contained in the target unit and the stage numbers corresponding to the intermediate feature graphs respectively, wherein the stage numbers are used for indicating the ordering of the intermediate feature graphs in the intermediate feature graphs;
for the intermediate feature map of each stage in the target unit, determining a preamble node corresponding to the intermediate feature map of the stage according to the stage number of the intermediate feature map of the stage, wherein the preamble node is used for obtaining the intermediate feature map of the stage through candidate operation in the target unit, and the preamble node at least comprises input data of the target unit;
According to the control vector corresponding to the target unit, determining the target operation corresponding to the target unit from the candidate operations of each type, wherein the method specifically comprises the following steps:
searching vector values corresponding to all the preamble nodes of the intermediate feature diagram of the stage in the control vector corresponding to the target unit;
and selecting target operations respectively executed on the precursor nodes corresponding to the intermediate feature graphs of the stage from various types of candidate operations according to vector values corresponding to the precursor nodes of the intermediate feature graphs of the stage, and taking the target operations as target operations corresponding to the target units.
3. The method according to claim 2, wherein determining the preamble node corresponding to the intermediate feature map of the stage according to the stage number of the intermediate feature map of the stage specifically comprises:
determining a first appointed sequence of the intermediate feature graphs of each stage in the target unit according to the stage numbers of the intermediate feature graphs of each stage in the target unit;
when the intermediate feature map of the stage is determined to be positioned at the first position of the first appointed sequence according to the stage number of the intermediate feature map of the stage, determining that the preamble node corresponding to the intermediate feature map of the stage comprises the input data of the target unit to which the intermediate feature map of the stage belongs;
When the intermediate feature map of the stage is determined to be positioned at the non-first position of the first designated sequence according to the stage number of the intermediate feature map of the stage, determining that the preamble node corresponding to the intermediate feature map of the stage comprises input data of a target unit to which the intermediate feature map of the stage belongs, and arranging the stage numbers of the intermediate feature maps of other stages before the intermediate feature map of the stage in the first designated sequence.
4. The method according to claim 2, wherein the input data of each target unit is input to each target unit, and the output data of the target model is obtained through the target operation corresponding to each target unit, specifically including:
inputting input data of each target unit into the target unit, and respectively executing target operation on each precursor node corresponding to the intermediate feature map of each stage contained in the target unit to obtain the intermediate feature map of each stage of the target unit;
stacking the intermediate feature graphs of each stage of the target unit to obtain an output feature graph of the target unit;
and determining output data of the target model according to the output characteristic diagrams of the target units.
5. The method of claim 4, wherein the adaptive averaging pooling unit is configured to reduce a size of the feature map to a specified size, and the classification unit is configured to obtain a classification result based on the feature map input to the classification unit;
determining output data of the target model according to the output feature diagrams of the target units, wherein the method specifically comprises the following steps:
selecting a designated unit from the target units, and inputting an output characteristic diagram of the designated unit into the adaptive average pooling unit to obtain an output characteristic diagram of the designated size; the size of the output characteristic diagram with the specified size is smaller than that of the output characteristic diagram of the specified unit;
and taking the output characteristic diagram with the specified size as input, inputting the input characteristic diagram into the classification unit, and obtaining a classification result output by the classification unit as output data of the target model.
6. The method according to any one of claims 1 to 5, wherein determining the input data of each target unit according to the input data of the target model specifically comprises:
arranging the target units in a second designated sequence according to the target model;
taking the input data of the target model as the input data of the target unit at the first position in the second designated sequence;
and, for each subsequent target unit, taking the output feature map of the unit immediately preceding the target unit and the output feature map of the unit two positions before the target unit in the second designated sequence as the input data of the target unit.
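The two-predecessor wiring of claim 6, where each unit consumes the outputs of the previous unit and the unit before it, can be sketched as below. How the first two units fill their missing predecessor slots is an assumption (here, with the model input):

```python
def run_units(model_input, units):
    """Chain target units in the second designated sequence.

    units[i] is a callable taking (prev_prev, prev) -> output map.
    The first unit receives the model input; missing predecessor slots
    for the first two units also fall back to the model input (assumed).
    """
    outputs = []
    for i, unit in enumerate(units):
        prev = outputs[i - 1] if i >= 1 else model_input
        prev_prev = outputs[i - 2] if i >= 2 else model_input
        outputs.append(unit(prev_prev, prev))
    return outputs[-1]
```

With three units that each add their two inputs and a model input of 1, the chain produces 2, 3, then 5.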
7. The method according to claim 6, wherein the target units include a reduction unit, the reduction unit being configured to reduce the size of its input feature map and increase the number of channels of its input feature map;
before inputting the input data of each target unit into each target unit, the method further comprises:
when the unit immediately preceding a target unit is a reduction unit, preprocessing the output feature map of the unit two positions before the target unit, contained in the input data of the target unit, to obtain a preprocessed input feature map; the size of the preprocessed input feature map being smaller than, and its number of channels being greater than, those of the output feature map of the unit two positions before the target unit;
inputting the input data of each target unit into each target unit respectively then specifically comprises:
inputting the preprocessed input feature map into the target unit.
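The preprocessing of claim 7 is analogous to the "factorized reduce" used in cell-based architecture search to re-align a skip input after a reduction cell. A NumPy sketch under that assumption (the claim only requires a smaller size and more channels, not this exact scheme):

```python
import numpy as np

def preprocess_for_reduction(x):
    """Shrink the skip input from two units back so that it matches the
    reduced output of the immediately preceding reduction unit: halve H
    and W by strided sampling, double the channels by concatenating an
    even-offset and an odd-offset strided view.  x has shape (C, H, W).
    """
    a = x[:, ::2, ::2]      # even-offset strided view, (C, H/2, W/2)
    b = x[:, 1::2, 1::2]    # odd-offset strided view,  (C, H/2, W/2)
    return np.concatenate([a, b], axis=0)   # channels doubled: (2C, H/2, W/2)
```

A (3, 8, 8) input becomes (6, 4, 4): spatial size halved, channel count doubled, as the claim requires.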
8. A data processing apparatus based on a storage and calculation integrated system, wherein the storage and calculation integrated system comprises a plurality of data processing units, each candidate operation executable by the data processing units is mapped to an analog computing circuit, and the storage and calculation integrated system is integrated from the analog computing circuits corresponding to the data processing units; the system further comprises an adaptive average pooling unit and a classification unit; each data processing unit corresponds to a plurality of types of candidate operations, the types of the candidate operations comprising at least one of: a dilated convolution operation, a separable convolution operation, an average pooling operation, and a max pooling operation; the apparatus comprising:
a control vector determining module, configured to determine, according to an acquired target model, one or more data processing units required for executing the task of the target model from the data processing units contained in the storage and calculation integrated system, as target units of the target model; and to determine the target number of target units contained in the target model and the control vector corresponding to each target unit, wherein each vector value contained in a control vector indicates whether the operation from input data to output data corresponding to the row of that value is executed by the candidate operation corresponding to the column of that value;
a target unit selection module, configured to select the target number of target units from the data processing units of the storage and calculation integrated system;
a target operation determining module, configured to determine, according to the control vector corresponding to each target unit, the target operation corresponding to that target unit from the candidate operations of the plurality of types;
and an output data determining module, configured to, upon receiving input data of the target model, determine the input data of each target unit according to the input data of the target model, input the input data of each target unit into each target unit respectively, and obtain the output data of the target model through the target operation corresponding to each target unit.
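One way the control vector could select the target operation for each input-to-output pair is to read each row as one-hot over the candidate-operation columns. This is an illustrative assumption; the operation names below follow common NAS terminology, not text fixed by the claim:

```python
import numpy as np

# Assumed candidate-operation column order (hypothetical).
CANDIDATE_OPS = ["dilated_conv", "separable_conv", "avg_pool", "max_pool"]

def select_target_ops(control_matrix):
    """control_matrix[row][col] marks whether the input-to-output operation
    of `row` is executed by the candidate operation of `col`.  Rows are
    assumed one-hot; returns the chosen operation name per row."""
    return [CANDIDATE_OPS[int(np.argmax(row))] for row in control_matrix]
```

For example, a matrix with rows `[0, 1, 0, 0]` and `[0, 0, 0, 1]` selects a separable convolution for the first pair and a max pooling for the second.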
9. A computer-readable storage medium, wherein the storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when executing the program.
CN202310555078.9A 2023-05-17 2023-05-17 Data processing method and device based on memory and calculation integrated system Active CN116306855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310555078.9A CN116306855B (en) 2023-05-17 2023-05-17 Data processing method and device based on memory and calculation integrated system


Publications (2)

Publication Number Publication Date
CN116306855A CN116306855A (en) 2023-06-23
CN116306855B (en) 2023-09-01

Family

ID=86794540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310555078.9A Active CN116306855B (en) 2023-05-17 2023-05-17 Data processing method and device based on memory and calculation integrated system

Country Status (1)

Country Link
CN (1) CN116306855B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647777A (en) * 2018-05-08 2018-10-12 济南浪潮高新科技投资发展有限公司 Data mapping system and method for realizing parallel convolution computation
CN111222626A (en) * 2019-11-07 2020-06-02 合肥恒烁半导体有限公司 Data segmentation operation method of neural network based on NOR Flash module
CN111667052A (en) * 2020-05-27 2020-09-15 上海赛昉科技有限公司 Standard and nonstandard volume consistency transformation method for special neural network accelerator
CN112149816A (en) * 2020-11-25 2020-12-29 之江实验室 Heterogeneous memory-computation fusion system and method supporting deep neural network reasoning acceleration
CN113438171A (en) * 2021-05-08 2021-09-24 清华大学 Multi-chip connection method of low-power-consumption storage and calculation integrated system
CN113487006A (en) * 2021-07-09 2021-10-08 上海新氦类脑智能科技有限公司 Portable artificial intelligence auxiliary computing equipment
CN113517009A (en) * 2021-06-10 2021-10-19 上海新氦类脑智能科技有限公司 Storage and calculation integrated intelligent chip, control method and controller
CN114298296A (en) * 2021-12-30 2022-04-08 清华大学 Convolution neural network processing method and device based on storage and calculation integrated array
WO2023045114A1 (en) * 2021-09-22 2023-03-30 清华大学 Storage and computation integrated chip and data processing method
CN115964181A (en) * 2023-03-10 2023-04-14 之江实验室 Data processing method and device, storage medium and electronic equipment
CN115981870A (en) * 2023-03-10 2023-04-18 之江实验室 Data processing method and device, storage medium and electronic equipment
CN116048800A (en) * 2023-01-10 2023-05-02 之江实验室 Data processing method and device, storage medium and electronic equipment
CN116107728A (en) * 2023-04-06 2023-05-12 之江实验室 Task execution method and device, storage medium and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11182159B2 (en) * 2020-02-26 2021-11-23 Google Llc Vector reductions using shared scratchpad memory


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on application performance optimization methods for big data processing; Hua Xingcheng; China Doctoral Dissertations Full-text Database, Information Science and Technology; Vol. 2019, No. 6; I138-14 *


Similar Documents

Publication Publication Date Title
Chen et al. A survey of accelerator architectures for deep neural networks
US20210157992A1 (en) Information processing method and terminal device
CN110490309B (en) Operator fusion method for neural network and related product thereof
US10872290B2 (en) Neural network processor with direct memory access and hardware acceleration circuits
CN109478144B (en) Data processing device and method
EP3891662A1 (en) Automated generation of machine learning models
KR20210032266A (en) Electronic device and Method for controlling the electronic device thereof
US11593628B2 (en) Dynamic variable bit width neural processor
CN115981870B (en) Data processing method and device, storage medium and electronic equipment
US20200151572A1 (en) Using Multiple Functional Blocks for Training Neural Networks
CN117223009A (en) Performance scaling of data stream deep neural network hardware accelerators
CN116663618A (en) Operator optimization method and device, storage medium and electronic equipment
JP2024008989A (en) System and method for retrieving image
CN110874627A (en) Data processing method, data processing apparatus, and computer readable medium
Mukhopadhyay et al. Systematic realization of a fully connected deep and convolutional neural network architecture on a field programmable gate array
CN115204355A (en) Neural processing unit capable of reusing data and method thereof
WO2020005599A1 (en) Trend prediction based on neural network
CN116402113B (en) Task execution method and device, storage medium and electronic equipment
US11748100B2 (en) Processing in memory methods for convolutional operations
CN116306855B (en) Data processing method and device based on memory and calculation integrated system
CN116150563B (en) Service execution method and device, storage medium and electronic equipment
CN117113174A (en) Model training method and device, storage medium and electronic equipment
US10997497B2 (en) Calculation device for and calculation method of performing convolution
KR102372869B1 (en) Matrix operator and matrix operation method for artificial neural network
CN116415103B (en) Data processing method, device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant