WO2020042739A1 - Data preprocessing method and apparatus, computer device, and storage medium

Data preprocessing method and apparatus, computer device, and storage medium

Info

Publication number
WO2020042739A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
memory
input data
data
storage capacity
Prior art date
Application number
PCT/CN2019/093144
Other languages
English (en)
French (fr)
Inventor
刘少礼 (Liu Shaoli)
孟小甫 (Meng Xiaofu)
Original Assignee
中科寒武纪科技股份有限公司 (Cambricon Technologies Corporation Limited)
Priority date
Filing date
Publication date
Priority claimed from CN201810987293.5A (patent CN110865950B)
Priority claimed from CN201810987343.XA (patent CN110865792B)
Application filed by 中科寒武纪科技股份有限公司
Priority to US16/622,503 (US11966583B2)
Priority to KR1020197036813 (KR102519467B1)
Priority to JP2019568721 (JP6867518B2)
Priority to EP19217269.0A (EP3757896B1)
Priority to EP19812653.4A (EP3640810A4)
Priority to US16/718,874 (US11243895B2)
Publication of WO2020042739A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • G06F3/0611Improving I/O performance in relation to response time
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0811Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0804Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0888Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668Details of memory controller
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0619Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • G06F3/0631Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10Providing a specific technical effect
    • G06F2212/1016Performance improvement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10Providing a specific technical effect
    • G06F2212/1041Resource optimization
    • G06F2212/1044Space efficiency improvement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/45Caching of specific data in cache memory
    • G06F2212/454Vector or matrix data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/60Details of cache memory
    • G06F2212/604Details relating to cache allocation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/16Memory access
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present disclosure relates to the field of computer technology, and in particular, to a data preprocessing method, device, computer device, and storage medium.
  • a multi-level memory architecture is currently adopted, that is, an architecture using a cache memory, a main memory, and an external memory.
  • the access speed of the cache, main memory, and external memory decreases in order, and the storage capacity increases in order.
  • Because the I/O bandwidth of computer equipment often cannot meet the needs of large data volumes, data must be read frequently between the cache memory and the main memory, and/or between the main memory and the external memory, while the processor performs machine learning operations.
  • Specifically, the processor first reads the input data from the external memory; after the operation completes, it stores the operation result in the external memory and then reads the input data required for the next operation from the external memory. Because of the I/O bandwidth limitation, a single operation thus involves at least two I/O read/write accesses, and frequent I/O accesses take a long time, resulting in low processing efficiency of the processor.
  • a data preprocessing method includes the following steps:
  • Determining target input data corresponding to a target operation according to the target operation and the available storage capacity of a first memory, where the target input data is part or all of the input data corresponding to the target operation;
  • When the target output data of the target operation is input data of another operation after the target operation, storing the target output data of the target operation in the first memory, where the first memory is disposed near the processor.
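  • As a rough illustration of these steps, the following Python sketch shows one way the preprocessing decision could be organized; every class, attribute, and function name here is an assumption made for illustration, not the patent's implementation.

```python
def preprocess(op, first_memory, second_memory):
    # Available capacity = total capacity - occupied capacity.
    available = first_memory.total_capacity - first_memory.used_capacity

    # Target input data: all of the operation's input if it fits in
    # the first memory, otherwise only the part that fits.
    if op.input_size <= available:
        target_input = op.all_input
    else:
        target_input = op.all_input[:available]

    # Size the target output data from the operation and its input.
    output_size = op.output_size(target_input)

    # If a later operation consumes this output, keep it in the first
    # memory (near the processor) to avoid an extra I/O round trip;
    # otherwise it can be written back to the second memory.
    target = first_memory if op.output_feeds_later_op else second_memory
    target.reserve(output_size)
    return target_input
```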
  • a data pre-processing device includes:
  • An acquisition module configured to acquire the available storage capacity of a first memory and a target operation;
  • An input determination module configured to determine target input data corresponding to the target operation according to the target operation and the available storage capacity of the first memory;
  • An output determination module configured to determine target output data of the target operation according to the target operation and the target input data; and
  • A storage allocation module configured to store the target output data of the target operation in the first memory when the target output data of the target operation is input data of another operation after the target operation,
  • wherein the first memory is disposed near the processor.
  • a computer device includes a first memory, a second memory, and a processor.
  • The first memory is disposed close to the processor, and the first memory and the second memory can read and write data with each other; the second memory stores a computer program, and when the processor executes the computer program, the steps of the foregoing method are implemented.
  • a computer-readable storage medium stores a computer program thereon, and when the computer program is executed by a processor, the steps of the foregoing method are implemented.
  • According to the above data preprocessing method, when the target output data of the target operation is input data of a subsequent operation, the target output data can be stored in the first memory near the processor.
  • In this way, I/O read operations during the computation are reduced, thereby increasing the speed and efficiency of the processor.
  • a data preprocessing method includes the following steps:
  • When the target output data of the target operation is input data of another operation after the target operation, the target output data is correspondingly stored on the main memory.
  • The step of determining target input data corresponding to the target operation according to the available storage capacity of the main memory, the available storage capacity of the slave memory, and the target operation includes:
  • the target input data corresponding to the target operation is determined according to the available storage capacity of the first memory and the target operation.
  • Optionally, the target operation includes more than one operation, and each operation corresponds to sub-target input data; the step of determining the target input data corresponding to the target operation according to the target operation and the available storage capacity of the first memory further includes:
  • Determining the number of operations that can be fused according to the available storage capacity of the first memory and the fusion attribute of each operation, obtaining a fusion-number threshold, and using a combination of a selected number of the fusable operations as the target operation, the selected number being less than or equal to the fusion-number threshold;
  • Optionally, the operation to be processed is a neural network operation including multiple operation layers, and each operation layer represents one operation; the method further includes the following steps:
  • The fusion attribute of each operation is determined according to the connection relationship of the operation layers of the neural network operation.
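  • One plausible reading of the fusion attribute, sketched below: an operation layer is marked fusable when its output feeds at most one later layer, so fusing it never duplicates data for a second consumer. The single-consumer rule and all names are illustrative assumptions; the patent states only that the attribute comes from the layer connection relationship.

```python
def fusion_attributes(layers, edges):
    """layers: layer names; edges: (producer, consumer) pairs."""
    consumer_count = {name: 0 for name in layers}
    for producer, _consumer in edges:
        consumer_count[producer] += 1
    # Assumed rule: fusable when the output has at most one consumer.
    return {name: consumer_count[name] <= 1 for name in layers}

attrs = fusion_attributes(
    ["conv", "pool", "relu"],
    [("conv", "pool"), ("pool", "relu")],
)
# attrs == {'conv': True, 'pool': True, 'relu': True}
```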
  • Optionally, all the input data corresponding to the target operation includes a plurality of input data blocks, each target input data includes one or more of the input data blocks, and the number of target input data corresponding to the target operation is more than one.
  • Optionally, the target operation includes more than one sub-target operation, and each sub-target operation corresponds to one piece of target input data; the method further includes the following steps:
  • The number of sub-target operations is determined according to the remaining storage capacity of the first memory and the storage capacity required by the sub-target operations other than the current sub-target operation.
  • the target input data includes first target input data and second target input data; the method further includes the following steps:
  • the first target input data corresponding to the main memory and the second target input data corresponding to each of the slave memories are determined according to a preset operation allocation rule.
  • the method further includes the following steps:
  • a storage address of each of the second target input data on the slave memory is determined according to an available storage capacity of each of the slave memories and a data capacity of the corresponding second target input data.
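  • A simple bump-allocator sketch of how each second target input data could be assigned a storage address on its slave memory. The allocator, its field names, and the one-block-per-slave pairing are assumptions; the patent requires only that the address follow from the slave memory's available capacity and the block's data capacity.

```python
def assign_slave_addresses(slave_memories, second_target_inputs):
    # Pair each second target input block with one slave memory and
    # allocate it at the memory's next free address.
    addresses = []
    for mem, block in zip(slave_memories, second_target_inputs):
        if block["size"] > mem["available"]:
            raise MemoryError("second target input data does not fit")
        addresses.append((mem["id"], mem["next_free"]))
        mem["next_free"] += block["size"]   # advance the free pointer
        mem["available"] -= block["size"]
    return addresses

mems = [{"id": 0, "next_free": 0, "available": 1024},
        {"id": 1, "next_free": 0, "available": 1024}]
assign_slave_addresses(mems, [{"size": 256}, {"size": 512}])
# -> [(0, 0), (1, 0)]
```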
  • Optionally, the target output data includes first target output data and second target output data; the step of determining the target output data corresponding to the target operation according to the target operation and the target input data further includes:
  • a storage address of each of the second target output data on the main memory is determined.
  • the method further includes the following steps:
  • When the second target output data is needed by another target operation performed on the slave processing circuit, the second target output data is stored in the slave memory corresponding to that slave processing circuit.
  • the method further includes the following steps:
  • When the target output data of the target operation is input data of another operation after the target operation, the target output data is correspondingly stored in the main memory and the second memory.
  • a data pre-processing device includes:
  • An acquisition module for acquiring the available storage capacity of the main memory, the available storage capacity of the slave memory, and the target operation
  • An input determination module configured to determine target input data corresponding to the target operation according to the available storage capacity of the main memory, the available storage capacity of the slave memory, and the target operation;
  • An output determination module configured to determine target output data corresponding to the target operation according to the target operation and the target input data
  • The storage allocation module is configured to store the target output data on the main memory when the target output data of the target operation is input data of another operation subsequent to the target operation.
  • Optionally, the data preprocessing device further includes a storage capacity determination module configured to compare the available storage capacity of the main memory with the available storage capacity of each of the slave memories, and to take the minimum available storage capacity as the available storage capacity of the first memory;
  • the input determination module is specifically configured to determine target input data corresponding to the target operation based on the available storage capacity of the first memory and the target operation.
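  • The capacity rule above reduces to taking a minimum, as in this small sketch (the function name is an illustrative assumption):

```python
def effective_first_memory_capacity(main_available, slave_availables):
    # Planning uses the smallest available capacity among the main
    # memory and all slave memories as the first memory's capacity.
    return min([main_available, *slave_availables])

effective_first_memory_capacity(256, [128, 192, 64])  # -> 64
```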
  • the target operation includes more than one operation, and each operation corresponds to sub-target input data.
  • the input determination module further includes:
  • A fusion determining unit configured to determine the number of operations that can be fused according to the available storage capacity of the first memory and the fusion attribute of each operation in the pending operation, and to obtain a fusion-number threshold;
  • An input determining unit configured to use a combination of a selected number of the fusable operations as the target operation, where the selected number is less than or equal to the fusion-number threshold; and
  • The sub-target input data corresponding to the fused operations is used as the target input data corresponding to the target operation.
  • Optionally, the operation to be processed is a neural network operation including a plurality of operation layers, and each operation layer represents one operation; the fusion determining unit is further configured to determine the fusion attribute of each operation according to the connection relationship between the operation layers of the neural network operation.
  • Optionally, the target operation includes more than one sub-target operation, and each sub-target operation corresponds to one piece of target input data; all the input data corresponding to the target operation includes multiple input data blocks, each target input data includes one or more of the input data blocks, and the number of target input data corresponding to the target operation is more than one. The input determination module is further configured to:
  • Determine the number of sub-target operations according to the remaining storage capacity of the first memory and the storage capacity required by the sub-target operations other than the current sub-target operation.
  • the target input data includes first target input data and second target input data
  • the input determination module is further configured to determine the first target input data corresponding to the main memory and the second target input data corresponding to each of the slave memories according to a preset operation allocation rule;
  • the storage allocation module is further configured to determine a storage address of the first target input data on the main memory according to an available storage capacity of the main memory and a data capacity of the first target input data;
  • and to determine a storage address of each second target input data on the corresponding slave memory according to the available storage capacity of that slave memory and the data capacity of the corresponding second target input data.
  • the target output data includes first target output data and second target output data; the output determination module is further configured to:
  • a storage address of each of the second target output data on the main memory is determined.
  • The storage allocation module is further configured to store the second target output data in the slave memory corresponding to the slave processing circuit when the second target output data is needed by another target operation performed on that slave processing circuit.
  • a computer device including:
  • a processor including a controller unit and an operation unit, wherein the controller unit is connected to the operation unit, and the operation unit includes a master processing circuit and a plurality of slave processing circuits;
  • a plurality of first memories including a main memory and a plurality of slave memories
  • the main memory is disposed close to the main processing circuit;
  • the plurality of slave memories are provided corresponding to the plurality of slave processing circuits, and each slave memory is disposed close to its corresponding slave processing circuit;
  • a second memory, where the first memories and the second memory can read and write data with each other;
  • the first memory or the second memory stores a computer program
  • the processor implements the steps of the method in the embodiment of the present disclosure when the processor executes the computer program.
  • a computer-readable storage medium has stored thereon a computer program that, when executed by a processor, implements steps of a method in an embodiment of the present disclosure.
  • According to the above data preprocessing method, apparatus, computer device, and storage medium, when the target output data of the target operation is input data of a subsequent operation, the target output data can be stored on the main memory.
  • the data preprocessing method can also reduce the data interaction between the main memory and the slave memory, further reduce the occupation time of the I / O read operation during the calculation process, and improve the speed and efficiency of the processor.
  • FIG. 1 is a schematic structural diagram of a computer device in an embodiment
  • FIG. 2 is a schematic structural diagram of a processor of a computer device according to an embodiment
  • FIG. 3 is a schematic structural diagram of a processor of a computer device according to an embodiment
  • FIG. 4 is a schematic structural diagram of a processor of a computer device according to an embodiment
  • FIG. 5 is a schematic flowchart of a data preprocessing method according to an embodiment
  • FIG. 6 is a schematic flowchart of the step of determining target input data in FIG. 5 according to an embodiment;
  • FIG. 7 is a schematic flowchart of determining the number of target operations in the data preprocessing method shown in FIG. 5 according to an embodiment;
  • FIG. 8 is a schematic diagram of a pending operation in an embodiment
  • FIG. 9 is a schematic diagram of a pending operation in another embodiment
  • FIG. 10 is a schematic flowchart of a data preprocessing method in another embodiment
  • FIG. 11 is a schematic flowchart of an embodiment of a step of determining target input data in FIG. 10;
  • FIG. 12 is a structural block diagram of a data pre-processing apparatus according to an embodiment
  • FIG. 13 is a structural block diagram of a data pre-processing apparatus according to an embodiment
  • FIG. 14 is a structural block diagram of a data pre-processing apparatus according to another embodiment.
  • the term “if” can be construed as “when” or “once” or “in response to a determination” or “in response to a detection” depending on the context.
  • Similarly, the phrase “if determined” or “if [the described condition or event] is detected” can be interpreted, depending on the context, to mean “once determined”, “in response to determining”, “once [the described condition or event] is detected”, or “in response to detecting [the described condition or event]”.
  • a computer device may include a processor 100, a first memory 200, and a second memory 300.
  • The first memory 200 may be located near the processor 100, and the processor 100 may exchange data directly with the first memory 200; that is, the processor 100 may read input data directly from the first memory 200 and write the output data obtained by operating on that input data back into the first memory 200.
  • the first memory 200 may directly perform data exchange with the second memory 300, that is, the first memory 200 may read data from the second memory 300, and may also write data to the second memory. Further, the access speed of the first memory 200 is greater than the access speed of the second memory 300, and the storage capacity of the first memory 200 is less than the storage capacity of the second memory 300.
  • the computer device may be a mobile terminal such as a mobile phone or a tablet computer, or a terminal such as a desktop computer, a board card, or a cloud server.
  • the computer device may also be a computer system formed by a cloud server and a terminal such as a mobile phone or a computer.
  • the computer device can be applied to a robot, a printer, a scanner, a driving recorder, a navigator, a camera, a video camera, a projector, a watch, a mobile storage, a wearable device, a vehicle, a home appliance, and / or a medical device.
  • The transportation means may include airplanes, ships, and/or vehicles; the household appliances may include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; the medical equipment may include nuclear magnetic resonance instruments, ultrasound machines, and/or electrocardiographs.
  • the first memory 200 may be an internal memory
  • the second memory 300 may be an external memory, such as a hard disk.
  • For example, the first memory 200 may be a RAM (random-access memory), and the second memory 300 may be a DDR memory (double data rate synchronous dynamic random-access memory), and so on.
  • Alternatively, the first memory 200 may be integrated with the processor 100; that is, the first memory 200 is an on-chip memory, such as a cache, and the second memory 300 may be an off-chip memory, such as internal memory (for example, RAM).
  • the second memory 300 may be used to store data and computer programs required by the computer device to perform specific operations.
  • The data may be machine learning data, such as neural network data. Because the storage capacity of the first memory 200 is small, when the processor 100 needs to perform a specific operation, the data required to complete that operation may be written from the second memory 300 into the first memory 200; the processor 100 may then read the input data required for the operation from the first memory 200, perform the operation, and write the operation result into the first memory 200.
  • the processor 100 may include a controller unit 110 and an operation unit 120.
  • The controller unit 110 is connected to the operation unit 120, and the operation unit 120 may include a main processing circuit 121 and a plurality of slave processing circuits 122.
  • the master processing circuit 121 and the slave processing circuits 122 form a master-slave structure.
  • the number of the first memories 200 may be multiple, and the plurality of first memories 200 may form a storage system with a master-slave structure.
  • the plurality of first memories 200 may include a main memory and a plurality of slave memories, wherein the main memory may be disposed near the main processing circuit, and the slave memory may be disposed near the slave processing circuit.
  • the main memory may be an on-chip memory of a main processing circuit
  • the slave memory may be an on-chip memory of a slave processing circuit.
  • the storage capacity of the master memory is smaller than the storage capacity of each slave memory.
  • Optionally, each slave processing circuit may be provided with more than one slave memory, which is not specifically limited here.
  • the controller unit 110 is configured to obtain data and calculate instructions.
  • the data may specifically include machine learning data.
  • the machine learning data may be neural network data.
  • the controller unit 110 is further configured to parse a calculation instruction obtained by the controller unit 110 to obtain an operation instruction, and send a plurality of operation instructions and data to the main processing circuit.
  • The main processing circuit 121 is configured to perform preprocessing on the data and to transmit data and operation instructions between the main processing circuit 121 and the plurality of slave processing circuits 122.
  • the multiple slave processing circuits 122 are configured to perform intermediate operations in parallel according to the data transmitted from the master processing circuit 121 and the operation instructions to obtain a plurality of intermediate results, and transmit the multiple intermediate results to the master processing circuit 121.
  • The master processing circuit 121 is further configured to perform subsequent processing on the multiple intermediate results to obtain the calculation result of the calculation instruction.
  • Optionally, a first memory is integrated on the main processing circuit 121 and on each slave processing circuit 122; that is, the plurality of first memories may be on-chip memories of the main processing circuit and the slave processing circuits, and the second memory may be an off-chip memory of the processor.
  • The controller unit 110 may include an instruction cache unit 111, an instruction processing unit 112, and a storage queue unit 114. The instruction cache unit 111 is configured to store calculation instructions associated with the machine learning data; the instruction processing unit 112 is configured to parse the calculation instructions to obtain a plurality of operation instructions; and the storage queue unit 114 is configured to store an instruction queue that includes a plurality of operation instructions or calculation instructions to be executed in queue order.
  • The controller unit 110 may further include a dependency processing unit 113 configured to determine, when there are multiple operation instructions, whether a first operation instruction is associated with a zeroth operation instruction that precedes it. If the first operation instruction is associated with the zeroth operation instruction, the first operation instruction is buffered in the instruction storage unit, and after the zeroth operation instruction has been executed, the first operation instruction is extracted from the instruction storage unit and transmitted to the operation unit. Specifically, the dependency processing unit 113 extracts a first storage address interval of the data (for example, a matrix) required by the first operation instruction according to the first operation instruction, and extracts a zeroth storage address interval of the matrix required by the zeroth operation instruction according to the zeroth operation instruction.
  • If the first storage address interval and the zeroth storage address interval have an overlapping area, the first operation instruction and the zeroth operation instruction are determined to be associated; if the first storage address interval and the zeroth storage address interval have no overlapping area, they are determined to be unassociated.
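  • The association test is an address-interval overlap check, as in this sketch (half-open [start, end) intervals are an assumed convention):

```python
def instructions_associated(first_interval, zeroth_interval):
    # True when the two storage address intervals overlap, meaning the
    # first instruction must be buffered until the zeroth finishes.
    s1, e1 = first_interval
    s0, e0 = zeroth_interval
    return s1 < e0 and s0 < e1

instructions_associated((0, 64), (32, 96))   # -> True, overlap
instructions_associated((0, 64), (64, 128))  # -> False, no overlap
```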
  • In one embodiment, the operation unit 120 may further include a branch processing circuit 123, where the main processing circuit 121 is connected to the branch processing circuit 123 and the branch processing circuit 123 is connected to the plurality of slave processing circuits 122;
  • the branch processing circuit 123 is configured to forward data or instructions transmitted between the master processing circuit 121 and the slave processing circuits 122.
  • In this embodiment, the main processing circuit 121 is specifically configured to divide an input neuron into a plurality of data blocks, and to send at least one of the data blocks, the weights, and at least one of the plurality of operation instructions to the branch processing circuit 123.
  • The branch processing circuit 123 forwards the data blocks, weights, and operation instructions between the main processing circuit 121 and the multiple slave processing circuits 122; the multiple slave processing circuits 122 perform operations on the received data blocks and weights according to the operation instructions to obtain intermediate results, and transmit the intermediate results to the branch processing circuit 123.
  • the main processing circuit 121 is further configured to perform subsequent processing on the intermediate result sent by the branch processing circuit to obtain the result of the calculation instruction.
  • the result of the calculation instruction is sent to the controller unit.
  • a first memory is also integrated on each branch processing circuit 123.
  • the operation unit 120 may include a master processing circuit 121 and a plurality of slave processing circuits 122.
  • The plurality of slave processing circuits are distributed in an array: each slave processing circuit is connected to the adjacent slave processing circuits, and the master processing circuit is connected to k of the slave processing circuits, the k slave processing circuits being the n slave processing circuits in the first row, the n slave processing circuits in row m, and the m slave processing circuits in the first column.
  • The k slave processing circuits shown in FIG. 1C include only the n slave processing circuits in the first row, the n slave processing circuits in row m, and the m slave processing circuits in the first column; that is, the k slave processing circuits are the slave processing circuits, among the multiple slave processing circuits, that are directly connected to the main processing circuit.
  • The k slave processing circuits are used to forward data and instructions between the master processing circuit and the remaining slave processing circuits.
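  • A sketch of which positions in the m x n array are directly connected: the first row, row m, and the first column. Whether the shared corner circuits are counted once or twice in k is not specified, so the set below (counting once) is an assumption.

```python
def directly_connected(m, n):
    # Array positions of the slave circuits wired to the main circuit.
    first_row = {(0, j) for j in range(n)}
    last_row = {(m - 1, j) for j in range(n)}
    first_col = {(i, 0) for i in range(m)}
    return first_row | last_row | first_col

k = len(directly_connected(4, 5))  # n + n + m positions, with the
                                   # shared corners counted once
```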
  • The processor provided in the present disclosure arranges the operation unit in a master-slave structure. For the calculation instruction of a forward operation, it can split the data according to that calculation instruction, so that multiple slave processing circuits can perform the computation-heavy part in parallel, thereby increasing the operation speed, saving operation time, and in turn reducing power consumption.
  • the above machine learning calculation may specifically include an artificial neural network operation
  • the above input data may specifically include input neuron data and weight data.
  • The above calculation result may specifically be the result of an artificial neural network operation, that is, output neuron data.
  • For the operation in the neural network, it can be the operation of one layer of the neural network.
  • The implementation process is as follows. In the forward operation, after the artificial neural network of the previous layer has finished executing, the operation instruction of the next layer takes the output neurons calculated by the operation unit as the input neurons of the next layer (or performs some operation on those output neurons before using them as the input neurons of the next layer), and the weights are likewise replaced by the weights of the next layer. In the reverse operation, when the reverse operation of the artificial neural network of the previous layer has finished, the operation instruction of the next layer takes the input-neuron gradients calculated by the operation unit as the output-neuron gradients of the next layer (or performs some operation on those input-neuron gradients before using them as the output-neuron gradients of the next layer), and the weights are replaced by the weights of the next layer.
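  • The chaining just described, sketched with an assumed layer API (compute, compute_grad, and the weights field are illustrative):

```python
def forward(layers, input_neurons):
    # Each layer's output neurons become the next layer's input
    # neurons, and the weights switch to the next layer's weights.
    x = input_neurons
    for layer in layers:
        x = layer.compute(x, layer.weights)
    return x

def backward(layers, output_grad):
    # The input-neuron gradient computed for one layer serves as the
    # output-neuron gradient of the next (lower) layer.
    g = output_grad
    for layer in reversed(layers):
        g = layer.compute_grad(g, layer.weights)
    return g
```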
  • Optionally, the above machine learning calculations may also include support vector machine operations, k-nearest neighbor (k-NN) operations, k-means operations, principal component analysis operations, and so on.
  • the artificial neural network operation is taken as an example to explain the specific scheme of machine learning calculation.
  • It should be noted that the input neurons and output neurons of the multi-layer operation do not refer to the neurons in the input layer and the output layer of the entire neural network. Rather, for any two adjacent layers in the network, the neurons in the lower layer of the forward operation are the input neurons, and the neurons in the upper layer of the forward operation are the output neurons.
  • Taking two adjacent layers K and K+1 as an example, the Kth layer is called the input layer, and its neurons are the input neurons; the (K+1)th layer is called the output layer, and its neurons are the output neurons. That is, except for the top layer, each layer can serve as an input layer, and the next layer is the corresponding output layer.
  • The second memory is used to store a computer program. When the processor executes the computer program, the data preprocessing method in the embodiments of the present disclosure can be implemented, thereby obtaining the storage-space allocation rules for the various data involved in executing the pending operation.
  • Specifically, the above computer equipment may execute the following data preprocessing method to preprocess the pending operation (such as a neural network) and obtain the allocation rules for storing the input data, output data, and intermediate calculation results of the pending operation on the first memory. In this way, when the processor executes the pending operation, the data involved in the pending operation (input data, output data, intermediate calculation results, and so on) can be stored on the first memory according to the storage-space allocation rules.
  • The storage-space allocation rule may include the storage address of the input data, the storage address of the output data, the storage address of intermediate calculation results, the update rule for the data stored in each storage space, and the like, during execution of the pending operation. For details, refer to the description below.
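  • A plausible concrete shape for such an allocation rule; every field name and value below is invented for illustration:

```python
allocation_rule = {
    "input":        {"address": 0x0000, "size": 4096},
    "output":       {"address": 0x1000, "size": 2048},
    "intermediate": {"address": 0x1800, "size": 1024},
    # Update rule: when each block's storage space may be reused.
    "update": "free the output block after all its consumers finish",
}
```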
  • the data preprocessing method may include the following steps:
  • the processor may obtain the total storage capacity of the first memory according to the configuration information of the first memory (such as information about the model of the first memory). Further, the processor may obtain the available storage capacity of the first memory according to the total storage capacity of the first memory and the occupied storage capacity on the first memory.
  • the processor may obtain a pending operation and determine a target operation based on the pending operation and the available storage capacity of the first memory.
  • The operation to be processed may include one or more operations, and may be an operation such as a neural network operation.
  • The operations included in the pending operation may be addition, subtraction, multiplication, division, convolution, pooling, activation (for example, ReLU), and so on, which are not limited here.
  • The target operation may be a combination of one or more operations in the operation to be processed.
  • S200 Determine target input data corresponding to the target operation according to the target operation and the available storage capacity of the first memory; wherein the target input data is a part or all of all input data corresponding to the target operation.
  • Specifically, the processor may determine, according to the target operation, all the input data required to complete the target operation and the data capacity of that input data (that is, the size of the storage space it occupies). Further, the processor may determine the target input data corresponding to the target operation and its data capacity according to the available storage capacity of the first memory and the data capacity of all the input data of the target operation, where the data capacity of the target input data is less than or equal to the available storage capacity of the first memory.
  • the target input data is a part or all of all input data corresponding to the target operation, that is, the data capacity of the target input data is less than or equal to the data capacity of all input data corresponding to the target operation.
  • In this way, storage space remains on the first memory for data such as the target output data and intermediate calculation results of the target operation, and that storage space can be reused to store such data.
  • Specifically, the processor may obtain the target output data of the target operation and information such as its data capacity according to the target operation and its target input data; that is, the processor can obtain the storage space required by the target output data of the target operation.
  • If the target output data of the target operation is input data of a subsequent operation, the target output data may be stored on the first memory to reduce the number of times the target output data is read, so that the speed and efficiency of the processor can be improved.
  • In the conventional approach, after completing the target operation, the processor moves the target output data from the first memory to the second memory, thereby releasing the storage space the target output data occupies on the first memory. If an operation after the target operation needs to continue using the target output data, the processor must move the target output data from the second memory back to the first memory. This approach requires additional I/O read and write operations, which easily leads to excessive computing time and low processor efficiency and speed.
  • the data preprocessing method of the embodiment of the present disclosure can reduce the occupation time of the I / O read operation during the operation by reducing the number of times the target output data is read, thereby improving the processor's Speed and efficiency.
  • Optionally, the processor may obtain a target operation OP1, all of whose input data is the input data X (which includes the sub-input data X11, X21, X12, and X22, where the sub-input data X11 and X12 may constitute the input data X1, the sub-input data X21 and X22 may constitute the input data X2, and the input data X1 and X2 may be vector or matrix data, etc.).
  • The processor may use the sub-input data X11 and X21 as the target input data of the target operation OP1 according to the target operation OP1 and the available storage capacity of the first memory. Further, the processor may determine the target output data Y1 and its data capacity according to the target operation OP1 and the target input data X11 and X21.
  • Further, the processor can determine, according to a preset operation rule, whether the target output data Y1 needs to be used by another operation after the target operation OP1. If the target output data Y1 is input data of the operation OP2 after the target operation OP1, the target output data Y1 is temporarily stored in the first memory. In this way, when the operation OP2 becomes the next target operation, before executing OP2 the processor only needs to move the input data Y3 required by OP2 from the second memory to the first memory according to a preset rule; it does not need to carry the target output data Y1 into the first memory again.
  • the target output data Y1 is input data of the operation OP2 after the target operation OP1, and at the same time, the target output data Y1 is input data of the operation OP3.
  • In this case, the target output data Y1 may be stored in the first memory until the operations OP2 and OP3 are completed, after which the target output data Y1 may be deleted from the first memory to release the storage space it occupies there.
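  • One way to realize "keep Y1 until OP2 and OP3 have consumed it" is reference counting, sketched below; the mechanism is an assumption, since the patent states only the retention behavior.

```python
class CachedBlock:
    # A block kept in the first memory until every consumer operation
    # (here OP2 and OP3) has used it.
    def __init__(self, name, consumers):
        self.name = name
        self.remaining = len(consumers)   # e.g. {"OP2", "OP3"} -> 2

    def consume(self, first_memory):
        self.remaining -= 1
        if self.remaining == 0:
            first_memory.free(self.name)  # release Y1's storage space
```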
  • Compared with the conventional approach, the data preprocessing method of this embodiment removes the step of moving the target output data Y1 from the first memory to the second memory after the operation OP1 finishes, and the step of moving Y1 from the second memory back to the first memory when the operation OP2 is performed. By reducing the number of times the target output data is read, it reduces the time occupied by I/O read operations during the computation and improves the speed and efficiency of the processor.
  • the above-mentioned operation to be processed may be a neural network operation including multiple operation layers.
  • the above-mentioned operation operations OP1 and OP2 may be operation layers in a neural network operation.
  • the above-mentioned input data X may include input neuron data and weight data, etc., which may include input data X1 and X2.
  • the above-mentioned input data X1 and X2 may belong to different operation layers, respectively.
  • Specifically, the processor may use the sub-input data X11 and X21 as the target input data of the target operation layer OP1 according to the target operation layer OP1 and the available storage capacity of the first memory.
  • the processor may determine the target output data Y1 and the data capacity of the target output data Y1 according to the target operation layer OP1 and the target input data X11 and X21.
  • The target output data Y1 is output data of the operation layer OP1, and this output data may include the output neuron data and weights of the operation layer OP1.
  • Optionally, the operation to be processed is a neural network operation, which may include a convolution layer, a pooling layer, and an activation layer.
  • The execution order of these operation layers is: convolution operation, then pooling operation, then activation operation. That is, the output data of the convolution operation is the input data of the pooling operation, and the output data of the pooling operation is the input data of the activation operation.
  • the input data of each operation layer may include data such as input neuron data and weight values corresponding to the operation layer.
  • The processor can obtain the target input data corresponding to the pooling operation according to the available storage capacity of the first memory and the target operation, namely the data in the C1-C2 interval (the data in the C1-C2 interval represents the output data of the convolution operation, which may include the output neuron data and weights corresponding to the convolution operation, etc.).
  • The target output data corresponding to the target input data C1-C2 is the data in the B1-B2 interval (where the target output data in the B1-B2 interval may include the output neuron data and weights corresponding to the pooling operation, etc.).
  • Because the target output data B1-B2 of the pooling operation is input data of the activation operation, the target output data B1-B2 can be stored on the first memory. In this way, after the pooling operation completes, there is no need to move the target output data B1-B2 from the first memory to the second memory to release storage space on the first memory, nor to carry the target output data B1-B2 from the second memory back to the first memory again before the activation operation is performed.
  • In the conventional approach, the target output data B1-B2 is first transferred from the first memory to the second memory to release the storage space of the first memory. Because the input data of the activation operation depends on the output data of the pooling operation, before the activation operation is performed the processor transfers the data block B1-B2 corresponding to the pooling operation from the second memory back to the first memory. Under limited I/O bandwidth, such frequent data read operations affect the processing efficiency of the processor.
  • Compared with the conventional approach, the data preprocessing method of this embodiment reduces the number of times the target output data is read (that is, it reduces the load and store operations on the target output data), which reduces the time occupied by I/O read operations during the computation and improves the speed and efficiency of the processor.
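  • A runnable toy version of this example, where the kernels are trivial stand-ins for the real convolution, pooling, and activation layers and the memory API is assumed:

```python
import numpy as np

# Placeholder kernels standing in for the real operation layers.
conv = lambda x: x * 2.0
pool = lambda x: x[::2]
relu = lambda x: np.maximum(x, 0.0)

class Mem(dict):
    def store(self, key, val): self[key] = val
    def load(self, key): return self[key]

def run_pipeline(x, first_mem, second_mem):
    first_mem.store("C1-C2", conv(x))   # conv output, input of pooling
    first_mem.store("B1-B2", pool(first_mem.load("C1-C2")))
    # B1-B2 stays on-chip: the activation consumes it next, so there is
    # no round trip through the second memory between the two layers.
    y = relu(first_mem.load("B1-B2"))
    second_mem.store("result", y)       # final result may leave the chip
    return y

run_pipeline(np.arange(-4.0, 4.0), Mem(), Mem())
```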
  • the above method further includes the following steps:
  • The target output data of the target operation is stored in the first memory, or in both the first memory and the second memory. Specifically, if the target output data of the target operation is input data of another operation after the target operation, the target output data may be stored on the first memory to reduce repeated loading of the target output data (that is, to reduce load operations on the target output data). At the same time, the target output data can also be copied from the first memory to the second memory, thereby ensuring the consistency of the data on the first memory and the second memory.
  • whether the target output data corresponding to the target operation needs to be synchronously stored on the second memory may be determined according to specific calculation requirements.
  • If synchronization is not required, the target output data may be stored on the first memory only, thereby reducing both load and store operations on the target output data. If the target output data needs to be synchronously stored on the second memory, it can be stored on the first memory and the second memory synchronously. By reducing load operations on the target output data, data read operations are prevented from occupying excessive I/O bandwidth and affecting the processing speed of the processor.
  • If the target output data Y1 needs to be used by another operation after the target operation OP1, for example if the target output data Y1 is input data of the operation OP2 after the target operation OP1, then the target output data Y1 is temporarily stored in the first memory. In this way, when the operation OP2 becomes the next target operation, before executing OP2 the processor only needs to move the input data Y3 required by OP2 from the second memory to the first memory; it does not need to carry the target output data Y1 again. Further, the processor may also copy the target output data Y1 from the first memory to the second memory, so that the data on the first memory and the second memory remain consistent.
  • In this way, the data preprocessing method of this embodiment removes the step of moving the target output data Y1 from the first memory to the second memory after the operation OP1 completes; by reducing the number of times the target output data is read, it reduces the time taken by I/O read operations during the computation and increases the speed and efficiency of the processor.
  • Likewise, because the target output data B1-B2 of the pooling operation is input data of the activation operation, the target output data B1-B2 can be stored in the first memory and, at the same time, synchronized to the second memory; the target output data B1-B2 then need not be transferred from the second memory back to the first memory before the activation operation is performed.
  • copying the target output data B1-B2 from the first memory to the second memory can ensure the consistency of the data on the first memory and the second memory.
  • The data preprocessing method of this embodiment thus removes the step of moving the target output data B1-B2 from the second memory back to the first memory. By reducing the number of times the target output data is read, it reduces the time taken by I/O read operations during the computation, which improves the speed and efficiency of the processor.
  • Further, when all the input data of a target operation does not fit in the first memory, the processor can split all the input data involved in each target operation. That is, according to the available storage capacity of the first memory, all the input data (including input neuron data, weights, and so on) involved in each target operation can be split into multiple input data blocks, and the target operation is performed on each input data block separately to obtain the calculation result of the target operation.
  • The output data corresponding to the target operation can then be obtained by fusing the calculation results corresponding to the individual input data blocks.
  • In this case, each input data block is one piece of the above target input data, and the output data corresponding to each input data block is the above target output data.
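  • The split-compute-fuse pattern, sketched for an element-wise target operation; concatenation as the fusion step is an assumption that holds only for element-wise operations, and all names are illustrative.

```python
import numpy as np

def run_blockwise(all_input, block_capacity, target_op):
    # Split the full input into blocks that fit the first memory.
    blocks = [all_input[i:i + block_capacity]
              for i in range(0, len(all_input), block_capacity)]
    partial = [target_op(b) for b in blocks]  # one block per pass
    return np.concatenate(partial)            # fuse the block results

out = run_blockwise(np.arange(10.0), block_capacity=4,
                    target_op=lambda b: b + 1.0)
# out == array([ 1., 2., ..., 10.])
```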
  • the above step S200 specifically includes:
  • the processor determines an input data block corresponding to the target operation according to the available storage capacity of the first memory and the data capacity of the input data required for the target operation, and uses the input data block as the target input data corresponding to the target operation. Specifically, if the data capacity of all input data required for the target operation is greater than the available storage capacity of the first memory, the processor may determine the input data block corresponding to the target operation according to the available storage capacity of the first memory; the input data block is a part of the entire input data of the target operation. If the data capacity of all input data required for the target operation is less than or equal to the available storage capacity of the first memory, the entire input data of the target operation can be used as one input data block; that is, all the input data of the target operation is used as its target input data.
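The capacity check just described can be illustrated with a short sketch. The following Python fragment is only an illustrative sketch under assumed names (the function and its parameters are hypothetical); it is not the disclosed implementation:

```python
def determine_target_input_data(total_input_size, available_capacity, block_size):
    """Split all input data of a target operation into input data blocks.

    If the whole input fits in the first memory's available capacity, it is
    used as a single input data block; otherwise it is split into blocks that
    each fit. Sizes are abstract capacity units.
    """
    if total_input_size <= available_capacity:
        # All input data of the target operation serves as its target input data.
        return [total_input_size]
    blocks = []
    remaining = total_input_size
    while remaining > 0:
        # Each block must fit within the available storage capacity.
        blocks.append(min(block_size, available_capacity, remaining))
        remaining -= blocks[-1]
    return blocks
```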
  • the processor may obtain a current target operation OP1, and all input data of the target operation OP1 is all input data X (which includes input data X1 and X2).
  • the processor may use the sub-input data X11 and the sub-input data X21 of the input data as target input data of the target operation OP1 according to the target operation OP1 and the available storage capacity of the first memory, where the sum of the data capacities of the sub-input data X11 and the sub-input data X21 is less than the available storage capacity of the first memory.
  • all the input data corresponding to the target operation may also be loaded onto the first memory.
  • the processor may use the data in the C1-C2 interval (the data in the C1-C2 interval represents output data of the convolution operation) as an input data block according to the available storage capacity of the first memory and the target operation, and use the input data block as the target input data corresponding to the pooling operation.
  • the processor may use the data in the B1-B2 range as an input data block of the active operation according to the available storage capacity of the first memory, and use the input data block as the The target input data to activate the operation.
  • when all input data involved in each target operation is split into multiple input data blocks, since the data capacity of each input data block is less than the storage capacity of the first memory, the target operation can fuse multiple operation operations to be processed by the processor, so as to maximize the use of the storage space of the first memory and improve the operation efficiency.
  • the above-mentioned target operation includes one or more operations, that is, the target operation is a combination of more than one operation.
  • each operation included in the target operation is a different operation, and is used to implement different operations.
  • the processor may determine the sub-target input data corresponding to each operation according to the available storage capacity of the first memory, and determine the target input data corresponding to the target operation according to the sub-target input data corresponding to each operation.
  • the step of determining the input data block corresponding to the target operation in step S200 further includes the following steps:
  • S210: Determine the number of operation operations that can be fused according to the available storage capacity of the first memory and the fusion attributes of each operation operation, and obtain a fusion number threshold.
  • the fusion attribute of each operation may include a data dependency relationship between input data and / or output data involved in each operation.
  • a combination of a selected number of the operation operations capable of fusion is used as a target operation operation, where the selected number is less than or equal to the fusion number threshold. Optionally, the selected number is equal to the fusion number threshold; that is, the plurality of operation operations capable of fusion determined according to the storage capacity of the first memory is equivalent to one target operation operation.
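As an illustrative sketch of this capacity-bounded fusion count (the function name and the tuple layout are assumptions made for the example, not part of the disclosure):

```python
def fusion_number_threshold(pending_ops, available_capacity):
    """Count how many consecutive fusable operations fit in the first memory.

    `pending_ops` is a list of (input_size, output_size, fusable) tuples in
    execution order; in practice the fusion attribute would be derived from
    the connection relationship between operation layers.
    """
    used = 0
    count = 0
    for input_size, output_size, fusable in pending_ops:
        if not fusable or used + input_size + output_size > available_capacity:
            break
        used += input_size + output_size
        count += 1
    return count
```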
  • the operation to be processed may include operation operations OP1 and OP2.
  • if the operation operations OP1 and OP2 can be executed by the processor together, and the available storage capacity of the first memory can accommodate the target input data and target output data of both, the number of operation operations that can be fused into the target operation operation can be considered to be two.
  • the arithmetic operations OP1 and OP2 are regarded as a target arithmetic operation.
  • the sub-target input data X11, X21, and Y3 corresponding to the operation operations OP1 and OP2 are used as target input data of the target operation operation.
  • if the operations OP1 and OP2 can be merged, but the available storage capacity of the first memory can only accommodate the target input data and target output data of the operation OP1, and cannot also accommodate the target input data and target output data of the operation OP2, then the number of operation operations that can be fused into the target operation operation is one, and at this time the operation OP1 alone can be used as a target operation operation.
  • the sub-target input data X11 and X21 corresponding to the operation OP1 are used as the target input data of the target operation.
  • the number of operation operations included in the target operation operation may be more than two.
  • the operation operations included in the target operation operation may be OP1, OP2, ..., and OPn (where n is greater than 2, and n is a positive integer).
  • the sum of the data capacities of the target input data and target output data corresponding to OP1, OP2, and OPn is less than or equal to the available storage capacity of the first memory.
  • the operation to be processed may be an operation such as a neural network, and the neural network operation may include multiple operation layers, and each operation layer may represent one operation.
  • when the processor needs to perform an operation such as a neural network operation, each operation layer of the neural network can be treated as one operation operation, and the fusion properties of the operation operations can be determined according to the connection relationship between the operation layers of the neural network.
  • the connection relationship between the various operation layers of the neural network determines which operation layers are fused and the number of operation layers that can be fused, and uses the combination of more than one operation layer that can be fused as a target operation. In this way, by fusing multiple computing layers in the depth direction of the neural network as a target computing operation, the number of computations and data reads can be reduced, and the processing efficiency of the processor can be further improved.
  • the processor may determine the fusion number threshold according to the available storage capacity of the first memory and the target input data capacity of each operation. Specifically, if the available storage capacity of the first memory can accommodate the target input data C1-C2 of the pooling operation and the target input data B1-B2 of the activation operation, the fusion number threshold may be determined to be two, and the pooling operation and the activation operation are equivalent to one target operation. At this time, the target input data of the target operation may be the data in the C1-C2 interval. In other embodiments, the target operation may be a fusion of a convolution operation, a pooling operation, and an activation operation.
  • the target arithmetic operation may continue to merge more arithmetic operations according to the available storage capacity of the first memory.
  • the neural network may include N operation layers, and the processor may determine a fusion number threshold of n (where n is greater than or equal to 1 and n is less than or equal to N) according to the available storage capacity of the first memory, and may use the n operation layers as one target operation. This is used for illustration only, and is not intended to be a specific limitation.
  • an intermediate calculation result during the execution of the target operation may be stored on the first memory.
  • the above method further includes the following steps:
  • the processor may temporarily store the intermediate calculation result output by the current operation operation in the first memory. Specifically, the processor may allocate a storage address on the first memory for the intermediate result output by the current operation operation according to the data capacity of that intermediate result.
  • the storage space occupied by the intermediate result output by the current operation can be reallocated; that is, the memory address occupied by the intermediate result of the current operation can be assigned to other data.
  • the processor may temporarily store the intermediate result Y1 output by the current operation operation in the first memory. In this way, the number of readings of the intermediate calculation result Y1 is reduced, so that the processing efficiency and speed of the processor can be improved. If the operation OP2 does not need to continue to use the intermediate calculation result, and other target operation operations after the target operation do not need to reuse the intermediate calculation result Y1, the storage space occupied by the intermediate calculation result Y1 can be released, so as to allocate the storage address occupied by the intermediate calculation result Y1 to other data, such as storing target output data of other target operation operations after the current target operation operation in the storage space occupied by the intermediate calculation result, so as to realize the reuse of storage space on the first memory.
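A minimal bookkeeping sketch of this lifetime-based reuse, assuming simple dictionaries and invented names; it is only an illustration of the rule described above:

```python
# Illustrative bookkeeping for intermediate results on the first memory.
allocations = {}        # name -> (storage_address, size)
pending_consumers = {}  # name -> operations that still need this result

def store_intermediate(name, address, size, consumers):
    allocations[name] = (address, size)
    pending_consumers[name] = set(consumers)

def consume_intermediate(name, op):
    """Mark `op` as done with `name`; release the space once unused."""
    pending_consumers[name].discard(op)
    if not pending_consumers[name]:
        # No later operation reuses this intermediate result, so its storage
        # address can be reassigned to other data (reuse of storage space).
        del allocations[name]
        del pending_consumers[name]
```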
  • the target input data of the pooling operation is data in the C1-C2 interval
  • the target output data corresponding to the target input data is data in the B1-B2 interval
  • the target output data B1-B2 is the target input data for activating the arithmetic operation
  • the processor may temporarily store the intermediate calculation results B1-B2 on the first memory. In this way, the reading times of the intermediate calculation results B1-B2 are reduced, so that the processing efficiency and speed of the processor can be improved. If it is not necessary to use the target output data B1-B2 for the activation operation, the storage space occupied by the target output data B1-B2 may be allocated to other data, so as to realize the multiplexing of the storage space on the first memory.
  • each target input data of the target operation is only used to complete a part of the operation of the target operation.
  • the number of target input data corresponding to the target operation may be more than one, and each target input data is a part of the entire input data, that is, each The target input data contains one or more input data blocks of all input data. That is, more than one target input data can be loaded onto the first memory at the same time.
  • the target operation can be split into multiple sub-target operation operations.
  • each sub-target operation operation can implement the same operation. Specifically, as shown in FIG. 7, the above method further includes the following steps:
  • S500: Determine the target storage capacity required for each sub-target operation according to the data capacity of the target input data and the data capacity of the target output data corresponding to each of the sub-target operations; the target storage capacities required for the sub-target operations can be equal or different.
  • S510 Determine the remaining storage capacity of the first memory according to the available storage capacity of the first memory and the target storage capacity required for the current sub-target operation.
  • S520 Determine the number of the sub-target operation according to the remaining storage capacity of the first memory and the target storage capacity required for each sub-target operation.
  • according to the remaining storage capacity of the first memory and the target storage capacity of the sub-target operations other than the current sub-target operation, it is possible to determine how many sub-target operations can be accommodated on the first memory. After that, the total number of sub-target operations can be determined from the current sub-target operation and the number of other sub-target operations.
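Steps S500 to S520 can be sketched as follows; this is an illustrative fragment with assumed names, not the disclosed implementation:

```python
def count_sub_target_operations(current_capacity, other_capacities, available_capacity):
    """Steps S500-S520: how many sub-target operations fit at the same time."""
    remaining = available_capacity - current_capacity
    if remaining < 0:
        return 0  # even the current sub-target operation does not fit
    count = 1     # the current sub-target operation itself
    for capacity in other_capacities:
        if capacity > remaining:
            break
        remaining -= capacity
        count += 1
    return count
```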
  • the processor may simultaneously process target input data corresponding to the one or more sub-target operation operations. In this way, by processing multiple pieces of target input data simultaneously, the processing speed and efficiency of the processor can be further improved.
  • the target operation (the left operation in the figure) may include the operation operations OP1 and OP2, and the processor may determine the data capacity of the target input data X11, X21, and Y3 and the data capacity of the target output data Y1 and Z1 of the current sub-target operation, and determine the target storage capacity required for the current sub-target operation based on the sum of the data capacities of its target input data and target output data. If the target storage capacity of the current sub-target operation is less than the available storage capacity of the first memory, the remaining storage capacity of the first memory may be calculated: the remaining storage capacity of the first memory is equal to the available storage capacity of the first memory minus the target storage capacity of the current sub-target operation. After that, the processor may determine the number of sub-target operations according to the remaining storage capacity of the first memory.
  • if the remaining storage capacity of the first memory can also accommodate the target input data X12, X22, and Y4 of another sub-target operation, the intermediate calculation result Y2 output by the operation OP1, and the target output data Z2 output by the operation OP2, the number of sub-target operations can be determined to be two, and the sub-input data X12, X22, and Y4 can be used as target input data of one of the sub-target operations.
  • the processor can process multiple target input data in parallel, which can further improve the processing speed and efficiency of the processor.
  • if the remaining storage capacity of the first memory, after accommodating the intermediate calculation result Y2 output by the operation OP1 and the target output data output by the operation OP2, can also accommodate the output data Y of the operation OP3, the operations OP1, OP2, and OP3 can also be fused to obtain the calculation result Y by performing one operation.
  • the operation to be processed is a neural network operation
  • the operation to be processed may include a convolution layer, a pooling layer, and an activation layer.
  • the execution order of the above operation layers is: convolution operation — pooling operation — activation operation. If the target operation is an activation operation, the processor may obtain the target input data of the current sub-target operation based on the storage capacity of the first memory, and the target input data of the current sub-target operation may be the data in the B1-B2 interval on the pooling layer.
  • the target output data of the current sub-target operation is A1.
  • the processor may further determine the number of sub-target operations according to the remaining storage capacity of the first memory. For example, the processor may determine, according to the remaining storage capacity of the first memory, that the remaining storage capacity can satisfy the calculation amount of the activation operation from A1 to A2, and then determine that the number of sub-target operations is two, and may use the data in the range of the target input data B2 to B3 as the target input data corresponding to one sub-target operation of the activation operation.
  • if there is an intersection between the target input data of the sub-target operations, the intersection can be temporarily stored in the first memory to avoid multiple read operations of that partial data, which can improve the processing efficiency and speed of the processor.
  • the number of target input data corresponding to the pooling operation can be two: one target input data is C1-C2 and the other target input data is C3-C4.
  • the target output data corresponding to the target input data C1-C2 is B1-B2, and the target output data corresponding to the target input data C3-C4 is B2-B3. It can be seen from the drawings that the data in the interval of input data C3-C2 is part of the target input data C1-C2, and also part of the target input data C3-C4, that is, the intersection of the two target input data C3-C2.
  • the input data C3-C2 can still be stored in the first memory to avoid excessive data read operations, which can improve the processing efficiency and speed of the processor.
  • the above method further includes the following steps:
  • the target output data is stored on the first memory to reduce the number of times the target output data is read.
  • if another operation within a preset range after the target operation needs to use the target output data, the target output data may be stored on the first memory to reduce the number of times the target output data is read. If the interval between the other operation after the target operation and the target operation exceeds the preset range, in order to prevent the target output data of the target operation from occupying the storage space of the first memory for a long time, the target output data may be transferred from the first memory to the second memory.
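This placement rule can be expressed as a tiny sketch; `reuse_distance` and `preset_range` are illustrative stand-ins for the disclosure's "interval" and "preset range":

```python
def place_target_output(reuse_distance, preset_range):
    """Decide where target output data should reside.

    `reuse_distance` is the interval (in operations) between the target
    operation and the next operation consuming its output.
    """
    if reuse_distance <= preset_range:
        return "first_memory"   # keep close to the processor for quick reuse
    return "second_memory"      # do not occupy the first memory for a long time
```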
  • the above method further includes the following steps:
  • the processor may determine the storage address of the target input data on the first memory according to the data capacity of the target input data of the target operation; and determine the target output data on the first memory according to the data capacity of the target output data of the target operation. Store address.
  • the processor may allocate a storage space matching the data capacity of the target input data on the first memory according to the data capacity of the target input data of the target operation, and allocate a storage address of the storage space to the Target input data. In this way, during the actual operation, the target input data can be loaded into a specified storage space on the first memory.
  • the processor may allocate a storage space matching the data capacity of the target output data on the first memory according to the data capacity of the target output data of the target operation, and allocate a storage address of the storage space to the target output data. In this way, during the actual operation, the target output data can be stored in a specified storage space on the first memory.
  • the above method further includes the following steps:
  • the processor may allocate part or all of the storage address of the target input data to the target output data of the target operation. In this way, the space utilization of the first memory can be improved by multiplexing the same storage space multiple times.
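The reuse of an input's storage address for the output can be sketched as follows; the function and parameter names are assumptions made for this example:

```python
def assign_output_address(input_address, input_size, output_size, input_still_needed):
    """Reuse the target input data's storage address for the target output data.

    Returns a storage address when reuse is allowed, or None when a fresh
    storage space must be allocated instead.
    """
    if not input_still_needed and output_size <= input_size:
        return input_address  # multiplex the same storage space
    return None
```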
  • the processor may record the storage address of the target input data, the storage address of the target output data, the storage address of the intermediate calculation result, and the update rule of each storage space on the first memory, and obtain, from the storage addresses corresponding to each of these pieces of data, a storage allocation rule corresponding to the pending operation.
  • the processor may obtain the storage allocation rule corresponding to the pending operation, and determine the read and write operations of various data and the storage location during the operation according to the storage allocation rule.
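One possible shape for such a recorded rule is sketched below; the operation names, addresses, and field names are entirely made up for illustration and are not part of the disclosure:

```python
# Illustrative shape of a recorded storage allocation rule.
storage_allocation_rule = {
    "OP1": {
        "inputs":  {"X11": 0x0000, "X21": 0x0400},
        "outputs": {"Y1": 0x0800},
        "release_after": ["X11", "X21"],  # space reusable once OP1 finishes
    },
    # ... one entry per target operation of the pending operation
}

def replay(rule, execute):
    """Replay the rule at run time, entry by entry in recorded order."""
    for op_name, entry in rule.items():
        execute(op_name, entry["inputs"], entry["outputs"])
```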
  • the foregoing data preprocessing method may also be applied to the computer device shown in FIG. 2 to FIG. 4.
  • the plurality of first memories may include a main memory and a slave memory, wherein the main memory is disposed close to the main processing circuit, and further, the main memory may be an on-chip memory of the main processing circuit.
  • the slave memory is disposed close to the slave processing circuit. Further, the slave memory may also be an on-chip memory of the slave processing circuit.
  • part of the target input data corresponding to the target operation needs to be loaded into the main memory and executed by the main processing circuit.
  • the other part of the target input data corresponding to the target operation needs to be loaded into more than one slave memory, and each slave memory corresponds to a slave processing circuit.
  • the processor may obtain the total storage capacity of the main memory according to the configuration information of the main memory (such as the model of the main memory). Further, the processor may obtain the available storage capacity of the main memory according to the total storage capacity of the main memory and the storage capacity already occupied on the main memory. Similarly, the processor may obtain the total storage capacity of the slave memory according to the configuration information of the slave memory, and obtain the available storage capacity of the slave memory according to the total storage capacity of the slave memory and the storage capacity already occupied on the slave memory.
  • the main processing circuit of the processor can obtain the available storage capacity of the main memory, and each of the slave processing circuits can obtain the available storage capacity of the corresponding slave memory and transfer the available storage capacity of the corresponding slave memory to the main processing circuit.
  • the controller unit of the processor can obtain the operation to be processed and send data such as the analysis result of the operation to the main processing circuit.
  • the main processing circuit may determine the target operation based on the pending operation, the available storage capacity of the main memory and the available storage capacity of the slave memory.
  • the operation included in the pending operation may be an addition operation, a subtraction operation, a multiplication operation, a division operation, a convolution operation, a pooling operation, an activation operation (for example, Relu), and so on.
  • the target operation may be a combination of one or more operation operations in the operation to be processed.
  • S700 Determine target input data corresponding to the target operation according to the available storage capacity of the main memory, the available storage capacity of the slave memory, and the target operation; wherein the target input data is the target Part or all of all input data corresponding to an arithmetic operation.
  • the main processing circuit of the processor may determine all the input data required to complete the target operation and the data capacity of that input data (that is, the amount of storage space required for the entire input data) according to the target operation. Further, the main processing circuit may determine the target input data corresponding to the target operation and its data capacity according to the available storage capacity of the main memory, the available storage capacity of each slave memory, and the data capacity of all input data of the target operation.
  • the main processing circuit of the processor may obtain information such as the target output data of the target operation and the data capacity of the target output data according to the target operation and the target input data of the target operation; that is, the main processing circuit of the processor can obtain the storage space required by the target output data of the target operation.
  • if the target output data of the target operation is input data of another operation after the target operation, the target output data is correspondingly stored on the main memory.
  • the main processing circuit may allocate target input data corresponding to the target operation to the main memory and the slave memory according to a preset operation allocation rule, so that the main processing circuit and the slave processing circuit can cooperatively execute the target operation.
  • the slave processing circuit may process its target input data from the memory to obtain an intermediate calculation result.
  • the slave processing circuit may also transfer the intermediate calculation result to the master processing circuit.
  • the main processing circuit can process the target input data on its main memory and combine the intermediate calculation results transmitted from the processing circuits to obtain the target output data of the target operation. If the target output data corresponding to the target operation is input data of other subsequent operation operations, the target output data may be stored in the main memory, thereby reducing the number of data reads and increasing the operation speed of the processor.
  • step S700 may further include:
  • S710 Compare the available storage capacity of the main memory with the available storage capacity of each of the slave memories, and use the smallest available storage capacity as the available storage capacity of the first memory;
  • the main processing circuit may determine the target input data corresponding to the target operation according to the available storage capacity of the first memory and the target operation.
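A one-line sketch of step S710; the function name is illustrative:

```python
def first_memory_capacity(main_capacity, slave_capacities):
    """Step S710: the smallest available capacity bounds the target input data."""
    return min(main_capacity, *slave_capacities)
```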
  • the main processing circuit may split the target input data according to a preset operation allocation rule, allocate the target input data into a plurality of data blocks, and determine a processing circuit corresponding to each data block.
  • the data block processed by the main processing circuit in the target input data may be recorded as the first target input data.
  • the data block processed by the processing circuit in the target input data may be recorded as the second target input data.
  • the data capacity of the second target input data corresponding to each slave processing circuit may be different, and specifically determined by the operation allocation rule.
  • the method further includes the following steps:
  • the first target input data corresponding to the main memory and the second target input data corresponding to each of the slave memories are determined.
  • the master processing circuit may determine which target input data of the target operation is processed by the master processing circuit and which target input data of the target operation is processed by each slave processing circuit according to a preset operation allocation rule.
  • for example, the current target operation is a pooling operation. If the operation for the B1-B2 interval on the pooling layer needs to be completed, the target input data required for the target operation is the data in the C1-C2 interval.
  • the main processing circuit may use the input data C1-C3 as the second target input data according to a preset operation allocation rule, and store the second target input data C1-C3 on the slave memory.
  • the input data C3-C2 is used as the first target input data, and the first target input data C3-C2 is stored in the main memory.
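A minimal sketch of this split, assuming interval names and a "main"/"slave" allocation rule as invented inputs:

```python
def split_target_input(intervals, allocation_rule):
    """Split target input data between the main memory and the slave memories.

    `intervals` maps interval names (e.g. "C1-C3", "C3-C2") to data capacities
    and `allocation_rule` maps each name to "main" or "slave".
    """
    first_target, second_target = {}, {}
    for name, size in intervals.items():
        if allocation_rule.get(name) == "main":
            first_target[name] = size   # handled by the main processing circuit
        else:
            second_target[name] = size  # handled by a slave processing circuit
    return first_target, second_target
```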
  • the above method may further include the following steps:
  • the processor may further determine a storage address of the first target input data on the main memory according to an available storage capacity of the main memory and a data capacity of the first target input data. Specifically, the main processing circuit may determine a storage address of the first target input data on the main memory according to an available storage capacity of the main memory and a data capacity of the first target input data. Further, the main processing circuit may further determine the first target output data corresponding to the first target input data and its data capacity according to the data capacity of the first target input data and the target operation operation, and determine the first target output The storage address of the data on the main memory.
  • the processor may further determine a storage address of the second target input data on the slave memory according to the available storage capacity of the slave memory and the data capacity of the second target input data. Specifically, the main processing circuit determines a storage address of each second target input data on its corresponding slave memory according to the available storage capacity of each slave processing circuit and the data capacity of its corresponding second target input data. Further, the main processing circuit may determine the second target output data corresponding to each second target input data and its data capacity according to the data capacity and target operation of each second target input data, and determine each second target output The storage address of the data in its corresponding slave memory.
  • each slave processing circuit may transmit the second target output data obtained by calculation to the main processing circuit, and the main processing circuit may further determine a storage address of each second target output data on the main memory.
  • if another target operation executed on the slave processing circuit needs to use the second target output data, the second target output data may be temporarily stored in the slave memory corresponding to the slave processing circuit. In this way, data read operations between the main memory and the slave memory can be reduced, and the operation speed of the processor can be further improved.
  • the above-mentioned target operation includes more than one operation, that is, the target operation is a combination of more than one operation.
  • each operation included in the target operation is a different operation, and is used to implement different operations.
  • the main processing circuit of the processor may determine the sub-target input data corresponding to each operation according to the available storage capacity of the first memory, and determine the target input corresponding to the target operation according to the sub-target input data corresponding to each operation. data.
  • the determination process of the target input data is consistent with steps S210 to S230 in the above method. For details, refer to the description above, and details are not described herein again.
  • the one or more operation can be divided into a first target operation and a second target operation.
  • the main processing circuit may allocate a first target operation to the main processing circuit and a second target operation to the slave processing circuit according to a preset operation allocation rule.
  • the main processing circuit may store input data required for the first target operation operation on the main memory, and separately store input data required for each second target operation operation on the corresponding slave memory.
  • for example, the pooling operation and the activation operation may be equivalent to one target operation.
  • the target input data of the target operation may be data in the C1-C2 interval.
  • the master processing circuit may assign the active computing operation as the first target computing operation to the master processing circuit itself and the pooled computing operation as the second target computing operation to the slave processing circuit according to a preset computing rule.
  • the input data C1-C2 required for the pooling operation can be loaded into the slave memory, and the input data B1-B2 required for the activation operation is loaded into the main memory. Because there is a dependency relationship between the pooling operation and the activation operation, after the pooling operation is completed, the input data B1-B2 required for the activation operation can be loaded from the slave memory into the main memory.
  • each target input data of the target operation is only used to complete a part of the operation of the target operation.
  • the number of target input data corresponding to the target operation may be more than one, and each target input data is a part of the entire input data, that is, each The target input data contains one or more input data blocks of all input data. That is, more than one target input data can be loaded onto the first memory at the same time.
  • the target operation can be split into multiple sub-target operation operations.
  • each sub-target operation operation can implement the same operation.
  • the main processing circuit can determine the number of target operation operations according to the available storage capacity of the first memory and the target storage capacity required for each target operation operation, so that the target input data of more than one target operation operation can be loaded at the same time.
  • the process of determining the number of target operation operations is consistent with steps S500 to S520 in the above method. For details, refer to the description above, and details are not described herein again.
  • FIGS. 5-7 and 10-11 are sequentially displayed according to the directions of the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated in this document, the execution of these steps is not strictly limited, and these steps can be performed in other orders. Moreover, at least some of the steps in FIGS. 5-7 and 10-11 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily performed at the same time, but may be performed at different times. The execution order of these sub-steps or stages is not necessarily performed sequentially, but may be performed in turn or alternately with at least a part of other steps or sub-steps or stages of other steps.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
  • an embodiment of the present disclosure provides a data preprocessing device, which may include an obtaining module 410, an input determination module 420, an output determination module 430, and a storage allocation module 440.
  • the obtaining module 410 is configured to obtain an available storage capacity of the first memory and a target operation; the input determination module 420 is configured to determine target input data corresponding to the target operation according to the target operation and the available storage capacity of the first memory; the output determination module 430 is configured to determine target output data of the target operation according to the target operation and the target input data; and the storage allocation module 440 is configured to, when the target output data of the target operation is input data of another operation after the target operation, store the target output data of the target operation on the first memory, where the first memory is disposed near the processor.
  • the target operation includes more than one operation, and each operation corresponds to sub-target input data.
  • the input determination module 420 further includes a fusion determination unit 421 and an input determination unit 422.
  • the fusion determination unit 421 is configured to determine the number of operation operations that can be fused according to the available storage capacity of the first memory and the fusion attribute of each operation operation in the pending operation, and obtain a threshold for the number of fusion operations.
  • the input determining unit 422 is configured to use, as the target operation, a combination of a selected number of the fusion-capable operation operations; the selected number is less than or equal to the fusion number threshold;
  • the sub-target input data corresponding to each calculation operation is used as the target input data corresponding to the target operation.
  • the operation to be processed is a neural network operation including multiple operation layers, each operation layer representing one of the operation operations; the fusion determination unit 421 is further configured to determine the fusion attribute of each operation operation according to the connection relationship between the operation layers of the neural network operation.
  • the storage allocation module 440 is further configured to, when an intermediate calculation result output by a current operation operation in the target operation operation is used as input data of another operation operation in the target operation operation, or when the intermediate calculation result output by the current operation operation needs to be used as input data of other target operation operations, store the intermediate calculation result output by the current operation operation in the first memory, or store it in the first memory and the second memory.
  • the target operation includes more than one sub-target operation, and each of the sub-target operations corresponds to one of the target input data; wherein all input data corresponding to the target operation includes multiple input data blocks, the number of target input data corresponding to the target operation is more than one, and each target input data includes more than one input data block;
  • the input determination module 420 is further configured to: determine the target storage capacity required for each of the sub-target operations according to the data capacity of the target input data and the target output data of each sub-target operation; determine the remaining storage capacity of the first memory according to the available storage capacity of the first memory and the target storage capacity required for the current sub-target operation; and determine the number of sub-target operations based on the remaining storage capacity of the first memory and the target storage capacity required by the sub-target operations other than the current sub-target operation.
  • the storage allocation module 440 is further configured to: when there is an intersection of the target input data of more than one sub-target operation, the intersection of the target input data of the more than one sub-target operation Stored on the first memory.
  • the storage allocation module 440 is further configured to determine a storage address of the target input data on the first memory according to a data capacity of the target input data of the target operation; according to the target operation, The data capacity of the target output data to determine the storage address of the target output data on the first memory; if other operation operations after the target operation operation do not need to use the target input data of the target operation operation , After completing the target operation, a part or all of the storage address of the target input data corresponding to the target operation is allocated to the target output data of the target operation.
  • the obtaining module 410 is configured to obtain the available storage capacity of the main memory, the available storage capacity of the slave memory, and the target operation; the input determination module 420 is configured to determine the target input data corresponding to the target operation according to the available storage capacity of the main memory, the available storage capacity of the slave memory, and the target operation; the output determination module 430 is configured to determine the target output data of the target operation according to the target operation and the target input data; and the storage allocation module 440 is configured to, when the target output data of the target operation is input data of another operation after the target operation, store the target output data correspondingly on the main memory.
  • the data preprocessing device further includes a storage capacity determination module 450, configured to compare the available storage capacity of the main memory with the available storage capacity of each of the slave memories, and to use the smallest available storage capacity as the available storage capacity of the first memory; the input determination module 420 is specifically configured to determine the target input data corresponding to the target operation based on the available storage capacity of the first memory and the target operation.
  • the target operation includes more than one operation, and each operation corresponds to sub-target input data;
  • the input determination module 420 further includes a fusion determination unit 421 and an input determination unit 422.
  • the fusion determination unit 421 is configured to determine the number of operation operations that can be fused according to the available storage capacity of the first memory and the fusion attribute of each operation operation in the pending operation, and obtain a threshold for the number of fusion operations.
  • the input determining unit 422 is configured to use a selected number of combinations of the fusion-capable operation operations as the target operation operation, where the selected number is less than or equal to the fusion number threshold;
  • the sub-target input data corresponding to the fused operation is used as the target input data corresponding to the target operation.
  • the operation to be processed is a neural network operation including multiple operation layers, each operation layer representing one of the operation operations; the fusion determination unit 421 is further configured to determine the fusion attribute of each operation operation according to the connection relationship between the operation layers of the neural network operation.
  • the target operation includes more than one sub-target operation, and each of the sub-target operations corresponds to one of the target input data; wherein all input data corresponding to the target operation includes multiple input data blocks, the number of target input data corresponding to the target operation is more than one, and each of the target input data includes more than one input data block.
  • the input determining module is further configured to determine a target storage capacity required for each of the sub-target computing operations according to a data capacity of a target input data and a target output data of each of the sub-target computing operations; Determining the remaining storage capacity of the first memory according to the available storage capacity of the first memory and the target storage capacity required for the current sub-target operation; according to the remaining storage capacity of the first memory and the current sub-target operation The target storage capacity required for other sub-target operation operations determines the number of the sub-target operation operations.
  • the target input data includes first target input data and second target input data; the input determination module 420 is further configured to determine a first target input corresponding to the main memory according to a preset operation allocation rule. Data and second target input data corresponding to each of the slave memories; the storage allocation module 440 is further configured to determine the first based on the available storage capacity of the main memory and the data capacity of the first target input data A storage address of the target input data in the main memory; and determining each of the second target input data in the second memory according to the available storage capacity of each of the slave memories and the data capacity of the corresponding second target input data The storage address on the slave memory.
  • the target output data includes first target output data and second target output data; the output determining module 430 is further configured to determine the first target output data according to the target operation operation and the first target input data.
  • the storage allocation module 440 is further configured to, when other target operations executed on the slave processing circuit need to use the second target output data, store the second target output data on the slave memory corresponding to the slave processing circuit.
  • the storage allocation module 440 is further configured to, when the target output data of the target operation is input data of other operations after the target operation, store the target output data correspondingly on the main memory and the second memory.
  • the embodiment of the present disclosure further provides a computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements the steps of the method according to any one of the above embodiments. Specifically, when the computer program is executed by a processor, the following steps are implemented:
  • obtaining the available storage capacity of the first memory and the target operation; determining target input data corresponding to the target operation according to the target operation and the available storage capacity of the first memory, wherein the target input data is a part or all of all input data corresponding to the target operation; and determining the target output data of the target operation according to the target operation and the target input data;
  • if the target output data of the target operation is input data of another operation after the target operation, storing the target output data of the target operation on the first memory, wherein the first memory is disposed near the processor.
  • the processor may include a master-slave structure formed by a master processing circuit and a slave processing circuit. At this time, when the processor executes the computer program, the following steps are specifically implemented:
  • obtaining the available storage capacity of the main memory, the available storage capacity of the slave memory, and the target operation; determining target input data corresponding to the target operation according to the available storage capacity of the main memory, the available storage capacity of the slave memory, and the target operation, wherein the target input data is a part or all of all input data corresponding to the target operation; and determining the target output data corresponding to the target operation according to the target operation and the target input data;
  • if the target output data of the target operation is input data of another operation after the target operation, storing the target output data correspondingly on the main memory.


Abstract

A data preprocessing method and device, computer equipment, and storage medium. By storing the target output data corresponding to a target operation on a first memory (200) disposed close to the processor (100), and thereby reducing the number of times the target output data is read, the time occupied by I/O read operations during the operation can be reduced, so that the speed and efficiency of the processor can be improved.

Description

Data preprocessing method and device, computer equipment, and storage medium
Related Applications
This disclosure claims priority to the Chinese patent application No. 2018109872935, filed on August 28, 2018 and entitled "Data preprocessing method and device, computer equipment, and storage medium", the entire contents of which are incorporated herein by reference.
This disclosure claims priority to the Chinese patent application No. 201810987343X, filed on August 28, 2018 and entitled "Data preprocessing method and device, computer equipment, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a data preprocessing method and device, computer equipment, and a storage medium.
Background
With the explosive growth of data volume, artificial intelligence algorithms such as machine learning are being applied more and more widely. Machines learn by analyzing large amounts of data; therefore, big-data operations such as machine learning place sharply increasing demands on the memory access volume of memories.
To meet demands such as memory access volume, a multi-level memory architecture is currently adopted, that is, an architecture using a cache, a main memory, and an external memory. The access speeds of the cache, the main memory, and the external memory decrease in turn, while their storage capacities increase in turn. However, because the I/O bandwidth of computer equipment often cannot meet the demands of very large data volumes, data read operations between the cache and the main memory and/or between the main memory and the external memory are needed frequently while the processor performs machine learning operations. For example, during an operation, the processor first needs to read input data from the external memory; after the operation is completed, the processor needs to store the operation result in the external memory, and then continue to read the input data required for the next operation from the external memory. Due to the limitation of the I/O bandwidth, at least two I/O read/write operations are involved in one operation, and frequent I/O read/write operations take a long time, resulting in low processing efficiency of the processor.
Summary
Based on this, it is necessary, in view of the above technical problems, to provide a data preprocessing method and device, computer equipment, and a storage medium that can reduce the number of I/O read/write operations during an operation and improve the processing efficiency of the processor.
A data preprocessing method, the method including the following steps:
obtaining an available storage capacity of a first memory and a target operation;
determining target input data corresponding to the target operation according to the target operation and the available storage capacity of the first memory, wherein the target input data is a part or all of all input data corresponding to the target operation;
determining target output data of the target operation according to the target operation and the target input data;
if the target output data of the target operation is input data of another operation after the target operation, storing the target output data of the target operation on the first memory, wherein the first memory is disposed close to the processor.
A data preprocessing device, the device including:
an obtaining module configured to obtain an available storage capacity of a first memory and a target operation;
an input determination module configured to determine target input data corresponding to the target operation according to the target operation and the available storage capacity of the first memory;
an output determination module configured to determine target output data of the target operation according to the target operation and the target input data;
a storage allocation module configured to, when the target output data of the target operation is input data of another operation after the target operation, store the target output data of the target operation on the first memory, wherein the first memory is disposed close to the processor.
A computer device including a first memory, a second memory, and a processor, the first memory being disposed close to the processor, the first memory and the second memory being capable of reading and writing data; the second memory stores a computer program, and the processor implements the steps of the above method when executing the computer program.
A computer-readable storage medium on which a computer program is stored, the computer program implementing the steps of the above method when executed by a processor.
With the above data preprocessing method and device, computer equipment, and storage medium, when the target output data of the target operation is input data of other operations after it, the target output data corresponding to the target operation can be stored on the first memory close to the processor. By reducing the number of times the target output data is read, the time occupied by I/O read operations during the operation can be reduced, so that the speed and efficiency of the processor can be improved.
A data preprocessing method, the method including the following steps:
obtaining an available storage capacity of a main memory, an available storage capacity of a slave memory, and a target operation;
determining target input data corresponding to the target operation according to the available storage capacity of the main memory, the available storage capacity of the slave memory, and the target operation;
determining target output data corresponding to the target operation according to the target operation and the target input data;
if the target output data of the target operation is input data of another operation after the target operation, storing the target output data correspondingly on the main memory.
In one embodiment, the step of determining the target input data corresponding to the target operation according to the available storage capacity of the main memory, the available storage capacity of the slave memory, and the target operation further includes:
comparing the available storage capacity of the main memory with the available storage capacity of each slave memory, and using the smallest available storage capacity as the available storage capacity of a first memory;
determining the target input data corresponding to the target operation according to the available storage capacity of the first memory and the target operation.
In one embodiment, the target operation includes more than one operation, each operation corresponding to sub-target input data; the step of determining the target input data corresponding to the target operation according to the target operation and the available storage capacity of the first memory further includes:
determining the number of operations that can be fused according to the available storage capacity of the first memory and the fusion attribute of each operation in the pending operation, to obtain a fusion number threshold;
using a combination of a selected number of the operations that can be fused as the target operation, the selected number being less than or equal to the fusion number threshold;
using the sub-target input data corresponding to each of the selected number of operations that can be fused as the target input data corresponding to the target operation.
In one embodiment, the pending operation is a neural network operation including a plurality of operation layers, each operation layer representing one operation; the method further includes the following step:
determining the fusion attribute of each operation according to the connection relationship between the operation layers of the neural network operation.
In one embodiment, the input data corresponding to the target operation includes a plurality of input data blocks, each target input data includes more than one input data block, and the number of target input data corresponding to the target operation is more than one.
In one embodiment, the target operation includes more than one sub-target operation, each sub-target operation corresponding to one target input data; the method further includes the following steps:
determining the target storage capacity required for each sub-target operation according to the data capacity of the target input data and the data capacity of the target output data of each sub-target operation;
determining the remaining storage capacity of the first memory according to the available storage capacity of the first memory and the target storage capacity required for the current sub-target operation;
determining the number of sub-target operations according to the remaining storage capacity of the first memory and the target storage capacity required for the sub-target operations other than the current sub-target operation.
In one embodiment, the target input data includes first target input data and second target input data; the method further includes the following step:
determining the first target input data corresponding to the main memory and the second target input data corresponding to each slave memory according to a preset operation allocation rule.
In one embodiment, the method further includes the following steps:
determining a storage address of the first target input data on the main memory according to the available storage capacity of the main memory and the data capacity of the first target input data;
determining a storage address of each second target input data on the slave memory according to the available storage capacity of each slave memory and the data capacity of the corresponding second target input data.
In one embodiment, the target output data includes first target output data and second target output data; the step of determining the target output data corresponding to the target operation according to the target operation and the target input data further includes:
determining the first target output data and a storage address of the first target output data on the main memory according to the target operation and the first target input data;
determining each second target output data and a storage address of each second target output data on the corresponding slave memory according to the target operation and each second target input data;
determining a storage address of each second target output data on the main memory according to each second target output data.
In one embodiment, the method further includes the following step:
if another target operation executed on the slave processing circuit needs to use the second target output data, storing the second target output data on the slave memory corresponding to the slave processing circuit.
In one embodiment, the method further includes the following step:
if the target output data of the target operation is input data of another operation after the target operation, storing the target output data correspondingly on the main memory and the second memory.
A data preprocessing device, the device including:
an obtaining module configured to obtain an available storage capacity of a main memory, an available storage capacity of a slave memory, and a target operation;
an input determination module configured to determine target input data corresponding to the target operation according to the available storage capacity of the main memory, the available storage capacity of the slave memory, and the target operation;
an output determination module configured to determine target output data corresponding to the target operation according to the target operation and the target input data;
a storage allocation module configured to, when the target output data of the target operation is input data of another operation after the target operation, store the target output data correspondingly on the main memory.
In one embodiment, the data preprocessing device further includes a storage capacity determination module configured to compare the available storage capacity of the main memory with the available storage capacity of each slave memory, and to use the smallest available storage capacity as the available storage capacity of a first memory;
the input determination module is specifically configured to determine the target input data corresponding to the target operation according to the available storage capacity of the first memory and the target operation.
In one embodiment, the target operation includes more than one operation, each operation corresponding to sub-target input data; the input determination module further includes:
a fusion determination unit configured to determine the number of operations that can be fused according to the available storage capacity of the first memory and the fusion attribute of each operation in the pending operation, to obtain a fusion number threshold;
an input determination unit configured to use a combination of a selected number of the operations that can be fused as the target operation, the selected number being less than or equal to the fusion number threshold, and to use the sub-target input data corresponding to each of the selected number of operations that can be fused as the target input data corresponding to the target operation.
In one embodiment, the pending operation is a neural network operation including a plurality of operation layers, each operation layer representing one operation; the fusion determination unit is further configured to determine the fusion attribute of each operation according to the connection relationship between the operation layers of the neural network operation.
In one embodiment, the target operation includes more than one sub-target operation, each sub-target operation corresponding to one target input data; wherein all input data corresponding to the target operation includes a plurality of input data blocks, each target input data includes more than one input data block, and the number of target input data corresponding to the target operation is more than one; the input determination module is further configured to:
determine the target storage capacity required for each sub-target operation according to the data capacity of the target input data and the data capacity of the target output data of each sub-target operation;
determine the remaining storage capacity of the first memory according to the available storage capacity of the first memory and the target storage capacity required for the current sub-target operation;
determine the number of sub-target operations according to the remaining storage capacity of the first memory and the target storage capacity required for the sub-target operations other than the current sub-target operation.
In one embodiment, the target input data includes first target input data and second target input data;
the input determination module is further configured to determine the first target input data corresponding to the main memory and the second target input data corresponding to each slave memory according to a preset operation allocation rule;
the storage allocation module is further configured to determine a storage address of the first target input data on the main memory according to the available storage capacity of the main memory and the data capacity of the first target input data, and to determine a storage address of each second target input data on the slave memory according to the available storage capacity of each slave memory and the data capacity of the corresponding second target input data.
In one embodiment, the target output data includes first target output data and second target output data; the output determination module is further configured to:
determine the first target output data and a storage address of the first target output data on the main memory according to the target operation and the first target input data;
determine each second target output data and a storage address of each second target output data on the corresponding slave memory according to the target operation and each second target input data;
determine a storage address of each second target output data on the main memory according to each second target output data.
In one embodiment, the storage allocation module is further configured to, when another target operation executed on the slave processing circuit needs to use the second target output data, store the second target output data on the slave memory corresponding to the slave processing circuit.
A computer device including:
a processor including a controller unit and an operation unit, wherein the controller unit is connected to the operation unit, and the operation unit includes one main processing circuit and a plurality of slave processing circuits;
a plurality of first memories, the plurality of first memories including a main memory and a plurality of slave memories, wherein the main memory is disposed close to the main processing circuit, the plurality of slave memories are disposed corresponding to the plurality of slave processing circuits, and each slave memory is disposed close to its corresponding slave processing circuit; and
a second memory, the first memories and the second memory being capable of reading and writing data;
wherein the first memory or the second memory stores a computer program, and the processor implements the steps of the method in the embodiments of the present disclosure when executing the computer program.
A computer-readable storage medium on which a computer program is stored, the computer program implementing the steps of the method in the embodiments of the present disclosure when executed by a processor.
With the above data preprocessing method and device, computer equipment, and storage medium, when the target output data of the target operation is input data of other operations after it, the target output data corresponding to the target operation can be stored on the main memory. By reducing the data interaction between the main memory and the second memory and reducing the number of times the target output data is read, the time occupied by I/O read operations during the operation can be reduced, so that the speed and efficiency of the processor can be improved. Further, the data preprocessing method can also reduce the data interaction between the main memory and the slave memories, further reducing the time occupied by I/O read operations during the operation and improving the speed and efficiency of the processor.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
FIG. 1 is a schematic structural diagram of a computer device according to an embodiment;
FIG. 2 is a schematic structural diagram of a processor of a computer device according to an embodiment;
FIG. 3 is a schematic structural diagram of a processor of a computer device according to an embodiment;
FIG. 4 is a schematic structural diagram of a processor of a computer device according to an embodiment;
FIG. 5 is a schematic flowchart of a data preprocessing method according to an embodiment;
FIG. 6 is a schematic flowchart of an embodiment of the step of determining target input data in FIG. 5;
FIG. 7 is a schematic flowchart of an embodiment of determining the number of target operations in the data preprocessing method shown in FIG. 5;
FIG. 8 is a schematic diagram of a pending operation according to an embodiment;
FIG. 9 is a schematic diagram of a pending operation according to another embodiment;
FIG. 10 is a schematic flowchart of a data preprocessing method according to another embodiment;
FIG. 11 is a schematic flowchart of an embodiment of the step of determining target input data in FIG. 10;
FIG. 12 is a structural block diagram of a data preprocessing device according to an embodiment;
FIG. 13 is a structural block diagram of a data preprocessing device according to an embodiment;
FIG. 14 is a structural block diagram of a data preprocessing device according to another embodiment.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present disclosure clearer, the present disclosure is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present disclosure and are not intended to limit it.
It should be understood that the terms "first", "second", "third", "fourth", and the like in the claims, specification, and drawings of the present disclosure are used to distinguish different objects rather than to describe a specific order. The terms "include" and "comprise" used in the specification and claims of the present disclosure indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terms used in this specification of the present disclosure are only for the purpose of describing specific embodiments and are not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise. It should be further understood that the term "and/or" used in the specification and claims of the present disclosure refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
如图1所示,本披露实施例的计算机设备可以包括处理器100、第一存储器200和第二存储器300。其中,第一存储器200可以设置在处理器100的附近,处理器100可以直接与第一存储器200进行数据交换,即处理器100可以直接从第一存储器200中读取输入数据,并将根据上述输入数据获得的输出数据写入该第一存储器200。该第一存储器200可以直接与该第二存储器300进行数据交换,即该第一存储器200可以从第二存储器300读取数据,也可以向该第二存储写入数据。进一步地,该第一存储器200的存取速度大于第二存储器300的存取速度,该第一存储器200的存储容量小于第二存储器300的存储容量。
可选地,该计算机设备可以是手机或平板电脑等移动终端,或台式电脑、板卡或云端服务器等终端。当然,该计算机设备还可以是云端服务器和手机或电脑等终端形成的计算机系统。该计算机设备可以应用于机器人、打印机、扫描仪、行车记录仪、导航仪、相机、摄像机、投影仪、手表、移动存储、可穿戴设备、交通工具、家用电器、和/或医疗设备。其中,交通工具可以包括飞机、轮船和/或车辆;家用电器可以包括电视、空调、微波炉、冰箱、电饭煲、加湿器、洗衣机、电灯、燃气灶、油烟机;医疗设备可以包括核磁共振仪、B超仪和/或心电图仪等等。
可选地,该第一存储器200可以是内存储器,该第二存储器300可以是外存储器,如硬盘等等。例如,该第一存储器200可以为RAM(Random-Access Memory,随机存取存储器)等,第二存储器300可以为DDR(Double Data Rate,双倍速率同步动态随机存储器)等等。可选地,该第一存储器200可以与该处理器100集成为一体,即该第一存储器200为片上存储器,如高速缓冲存储器(Cache),该第二存储器300可以是内存储器等片外存储器,如RAM等等。
可选地,第二存储器300可以用于存储计算机设备执行特定运算所需的数据及计算机程序等等。进一步地,该数据可以是机器学习数据,如神经网络数据等等。由于第一存储器200的存储容量较小,因此,当处理器100需要执行特定运算时,可以将第二存储器300中存储的完成该特定运算所需的数据写入第一存储器200,处理器100可以从第一存储器200读取该特定运算所需的输入数据进行运算,并将运算结果写入第一存储器200。
在一个实施例中,如图2所示,该处理器100可以包括控制器单元110和运算单元120,其中,控制器单元110与运算单元120连接,该运算单元120可以包括一个主处理电路121和多个从处理电路122,该主处理电路121和从处理电路122形成主从结构。相应地,该第一存储器200的数量可以为多个,多个第一存储器200可以形成主从结构的存储体系。例如,多个第一存储器200可以包括一个主存储器和多个从存储器,其中,该主存储器可以靠近主处理电路设置,该从存储器可以靠近从处理电路设置。可选地,该主存储器可以是主处理电路的片上存储器,该从存储器可以是从处理电路的片上存储器。进一步地,该主存储器的存储容量小于各个从存储器的存储容量。更进一步地,每个从处理电路可以对应设置一个以上的从存储器,此处不做具体限定。
可选地,上述的控制器单元110用于获取数据以及计算指令。该数据具体可以包括机器学习数据,可选地,该机器学习数据可以为神经网络数据。控制器单元110还用于解析其获取的计算指令得到运算指令,并将多个运算指令以及数据发送给主处理电路。主处理电路121用于对数据执行前序处理,以及转发该主处理电路121与多个从处理电路122之间传输的数据和运算指令。多个从处理电路122用于依据从主处理电路121传输的数据以及运算指令并行执行中间运算得到多个中间结果,并将多个中间结果传输给主处理电路121;主处理电路121还用于对多个中间结果执行后续处理得到计算指令的计算结果。该主处理电路121和每个从处理电路122上均集成有第一存储器,即多个第一存储器可以是该主处理电路和从处理电路的片上存储器,第二存储器可以是该处理器的片外存储器。
可选地,该控制器单元110可以包括指令缓存单元111、指令处理单元112和存储队列单元114;指令缓存单元111用于存储机器学习数据关联的计算指令;指令处理单元112用于对计算指令解析得到多个运算指令;存储队列单元114用于存储指令队列,该指令队列包括:按该队列的前后顺序待执行的多个运算指令或计算指令。可选地,该控制器单元110还可以包括依赖关系处理单元113,用于在具有多个运算指令时,确定第一运算指令与第一运算指令之前的第零运算指令是否存在关联关系,如第一运算指令与第零运算指令存在关联关系,则将第一运算指令缓存在指令存储单元内,在第零运算指令执行完毕后,从指令存储单元提取第一运算指令传输至运算单元。具体地,依赖关系处理单元113依据第一运算指令提取第一运算指令中所需数据(例如矩阵)的第一存储地址区间,依据第零运算指令提取第零运算指令中所需矩阵的第零存储地址区间,如第一存储地址区间与第零存储地址区间具有重叠的区域,则确定第一运算指令与第零运算指令具有关联关系,如第一存储地址区间与第零存储地址区间不具有重叠的区域,则确定第一运算指令与第零运算指令不具有关联关系。
在一个实施例中,如图3所示,运算单元120还可以包括分支处理电路123,其中,主处理电路121与分支处理电路123连接,分支处理电路123与多个从处理电路122连接;分支处理电路123用于执行转发主处理电路121与从处理电路122之间的数据或指令。在此实施例中,主处理电路121具体用于将一个输入神经元分配成多个数据块,将多个数据块中的至少一个数据块、权值以及多个运算指令中的至少一个运算指令发送给分支处理电路;分支处理电路123用于转发主处理电路121与多个从处理电路122之间的数据块、权值以及运算指令;多个从处理电路122用于依据该运算指令对接收到的数据块以及权值执行运算得到中间结果,并将中间结果传输给分支处理电路123;主处理电路121还用于将分支处理电路发送的中间结果进行后续处理得到该计算指令的结果,将该计算指令的结果发送给所述控制器单元。可选地,每个分支处理电路123上也集成有第一存储器。
在另一种可选实施例中,如图4所示,运算单元120可以包括一个主处理电路121和多个从处理电路122。其中,多个从处理电路呈阵列分布;每个从处理电路与相邻的其他从处理电路连接,主处理电路连接多个从处理电路中的k个从处理电路,k个从处理电路为:第1行的n个从处理电路、第m行的n个从处理电路以及第1列的m个从处理电路,需要说明的是,如图4所示的k个从处理电路仅包括第1行的n个从处理电路、第m行的n个从处理电路以及第1列的m个从处理电路,即该k个从处理电路为多个从处理电路中直接与主处理电路连接的从处理电路。该k个从处理电路用于转发主处理电路以及多个从处理电路之间的数据以及指令。
本披露提供的处理器将运算单元设置成一主多从结构,对于正向运算的计算指令,其可以依据正向运算的计算指令将数据进行拆分,这样通过多个从处理电路即能够对计算量较大的部分进行并行运算,从而提高运算速度,节省运算时间,进而降低功耗。
可选地,上述机器学习计算具体可以包括:人工神经网络运算,上述输入数据具体可以包括:输入神经元数据和权值数据。上述计算结果具体可以为:人工神经网络运算的结果即输出神经元数据。
对于神经网络中的运算可以为神经网络中的一层的运算,对于多层神经网络,其实现过程是,在正向运算中,当上一层人工神经网络执行完成之后,下一层的运算指令会将运算单元中计算出的输出神经元作为下一层的输入神经元进行运算(或者是对该输出神经元进行某些操作再作为下一层的输入神经元),同时,将权值也替换为下一层的权值;在反向运算中,当上一层人工神经网络的反向运算执行完成后,下一层运算指令会将运算单元中计算出的输入神经元梯度作为下一层的输出神经元梯度进行运算(或者是对该输入神经元梯度进行某些操作再作为下一层的输出神经元梯度),同时将权值替换为下一层的权值。
上述机器学习计算还可以包括支持向量机运算,k-近邻(k-nn)运算,k-均值(k-means)运算,主成分分析运算等等。为了描述的方便,下面以人工神经网络运算为例来说明机器学习计算的具体方案。
对于人工神经网络运算,如果该人工神经网络运算具有多层运算,多层运算的输入神经元和输出神经元并非是指整个神经网络的输入层中神经元和输出层中神经元,而是对于网络中任意相邻的两层,处于网络正向运算下层中的神经元即为输入神经元,处于网络正向运算上层中的神经元即为输出神经元。以卷积神经网络为例,设一个卷积神经网络有L层,K=1,2,...,L-1,对于第K层和第K+1层来说,我们将第K层称为输入层,其中的神经元为所述输入神经元,第K+1层称为输出层,其中的神经元为所述输出神经元。即除最顶层外,每一层都可以作为输入层,其下一层为对应的输出层。
在一个实施例中,该第二存储器用于存储计算机程序,该处理器执行上述计算机程序时,能够实现本披露实施例中的数据预处理方法,从而获得该待处理运算执行过程中各种数据的存储空间分配规则。具体地,上述的计算机设备可以用于执行下述的数据预处理方法,对待处理运算(如神经网络运算等)进行预处理,获得该待处理运算的输入数据、输出数据及中间计算结果等数据在第一存储器上的存储空间分配规则。这样,当处理器执行该待处理运算时,该待处理运算所涉及的数据(输入数据、输出数据及中间计算结果等)可以按照上述的存储空间分配规则存储在第一存储器上。这样,通过对运算过程中的存储资源进行预分配,不仅能够合理地利用第一存储器的存储空间,还可以提高处理器的运算速度及准确性。其中,该存储空间分配规则可以包括待处理运算执行过程中输入数据的存储地址、输出数据的存储地址、中间计算结果的存储地址以及各个存储空间内存放数据的更新规则等等。具体可参见下文中的描述。
本披露实施例中,为减少运算过程中的数据读写操作(即减少I/O操作次数),提供了一种数据预处理方法,该数据预处理方法可以应用于上述的计算机设备中。具体地,如图5所示,该数据预处理方法可以包括如下步骤:
S100、获取第一存储器的可用存储容量及目标运算操作;
具体地,处理器可以根据该第一存储器的配置信息(如该第一存储器的型号等信息),获得该第一存储器的总存储容量。进一步地,处理器可以根据该第一存储器的总存储容量及该第一存储器上的已经占用的存储容量,获得该第一存储器的可用存储容量。
本披露实施例中,处理器可以获取待处理运算,并根据该待处理运算及第一存储器的可用存储容量确定目标运算操作。其中,该待处理运算可以包括一个或多个运算操作,该待处理运算可以是神经网络等运算。例如,该待处理运算包含的运算操作可以是加法操作、减法操作、乘法操作、除法操作、卷积操作、池化操作(Pooling)及激活操作(例如,Relu)等等,此处不做具体限定。该目标运算操作可以是待处理运算中一个或多个运算操作的组合。
S200、根据目标运算操作及第一存储器的可用存储容量,确定目标运算操作对应的目标输入数据;其中,目标输入数据为目标运算操作对应的全部输入数据的一部分或全部。
具体地,处理器可以根据该目标运算操作,确定完成该目标运算操作所需的全部输入数据及该全部输入数据的数据容量(即该全部输入数据需占用的存储空间大小)。进一步地,处理器可以根据第一存储器的可用存储容量和该目标运算操作的全部输入数据的数据容量,确定该目标运算操作对应的目标输入数据及其数据容量,该目标输入数据的数据容量小于或等于第一存储器的可用存储容量。其中,该目标输入数据为该目标运算操作对应的全部输入数据的一部分或全部,即该目标输入数据的数据容量小于或等于该目标运算操作对应的全部输入数据的数据容量。当目标输入数据的数据容量小于该目标运算操作的全部输入数据的数据容量时,通过仅加载该目标运算操作的全部输入数据的一部分到第一存储器上,可以在第一存储器上预留一定的存储空间,以供存储该目标运算操作的目标输出数据及中间计算结果等数据。当目标输入数据的数据容量等于该目标运算操作的全部输入数据的数据容量时,可以通过存储空间的复用,实现存储该目标运算操作的目标输出数据及中间计算结果等数据。
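为便于理解上述步骤S200,下面给出一段示意性的Python草图(仅用于说明思路;其中的函数名、数据块表示方式及容量取值均为本文举例所假设,并非本披露的实际实现):按第一存储器的可用存储容量贪心选取可一次加载的输入数据块,选不下的部分留待后续批次加载。

```python
# 示意性草图:根据第一存储器的可用存储容量,从全部输入数据中选取目标输入数据。
# 假设:每个输入数据块以(名称, 数据容量)表示,容量单位为字节;贪心选取策略仅为示例。
def determine_target_input(all_input_blocks, available_capacity):
    target_input, used = [], 0
    for name, size in all_input_blocks:
        if used + size > available_capacity:
            break  # 剩余可用空间不足,其余数据块留待后续批次加载
        target_input.append(name)
        used += size
    return target_input, used

# 用法示例:对应图8中将子输入数据X11、X21选作目标运算操作OP1的目标输入数据的情形
blocks = [("X11", 3 << 20), ("X21", 3 << 20), ("X12", 3 << 20), ("X22", 3 << 20)]
selected, used = determine_target_input(blocks, 8 << 20)
# selected == ["X11", "X21"],余下的可用空间可预留给目标输出数据及中间计算结果
```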
S300、根据目标运算操作和目标输入数据,确定目标运算操作的目标输出数据。
具体地,由于待处理运算的计算量是静态可分析的,因此,处理器可以根据该目标运算操作的目标输入数据及该目标运算操作,获得该目标运算操作的目标输出数据及该目标输出数据的数据容量等信息,即处理器可以获得该目标运算操作的目标输出数据所需占用的存储空间。
S400、若目标运算操作的目标输出数据为该目标运算操作之后的其他运算操作的输入数据时,则将该目标输出数据存储在第一存储器上,以减少目标输出数据的读取次数。
具体地,若该目标运算操作的目标输出数据为该目标运算操作之后的其他运算操作的输入数据时,即在该目标运算操作之后仍需继续使用该目标输出数据时,则可以将该目标输出数据存储在所述第一存储器上,以减少目标输出数据的读取次数,从而可以提高处理器的速度及效率。
传统技术中,当处理器执行该目标运算操作获得上述目标输出数据后,处理器会将该目标输出数据从第一存储器搬运至第二存储器,从而释放该目标输出数据在第一存储器上占用的存储空间。若该目标运算操作之后的运算操作需要继续使用该目标输出数据时,处理器需要再次将该目标输出数据从第二存储器搬运至第一存储器上,这样的方法需要多次执行目标输出数据的I/O读取操作,容易导致运算时间过长,处理器的效率及速度较低。而本披露实施例的数据预处理方法,相对于传统技术而言,通过减少目标输出数据的读取次数,可以减少运算过程中的I/O读取操作的占用时间,从而可以提高处理器的速度及效率。
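承接步骤S400的判断逻辑,下面的Python草图示意性地给出依据数据依赖关系决定目标输出数据存放位置的做法(其中consumers列表为假设的静态分析结果,函数名亦为本文举例所设,并非本披露的实际接口):

```python
# 示意性草图:依据静态分析得到的数据依赖关系,决定目标输出数据的存放位置。
def plan_output_storage(output_name, consumers):
    """consumers:依赖该输出数据的后续运算操作列表(由对待处理运算的静态分析得到)。"""
    if consumers:
        # 该输出数据还会被后续运算操作使用:驻留第一存储器,省去一次store与一次load
        return "first_memory"
    # 无后续使用:搬运至第二存储器,释放第一存储器上的存储空间
    return "second_memory"

# 图8的例子:目标输出数据Y1被运算操作OP2、OP3使用,故驻留第一存储器
assert plan_output_storage("Y1", ["OP2", "OP3"]) == "first_memory"
```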
例如,如图8所示,处理器可以获取目标运算操作OP1,该目标运算操作OP1的全部输入数据为输入数据X(其包括子输入数据X11、X21、X12和X22,其中,子输入数据X11和X12可以组成输入数据X1,子输入数据X21和X22可以组成输入数据X2,该输入数据X1和X2可以是向量或矩阵数据等)。处理器可以根据该目标运算操作OP1和第一存储器的可用存储容量,将子输入数据X11和X21作为该目标运算操作OP1的目标输入数据。进一步地,该处理器可以根据目标运算操作OP1和目标输入数据X11和X21,确定出目标输出数据Y1及该目标输出数据Y1的数据容量。
更进一步地,处理器可以根据预设的运算规则判断该目标输出数据Y1是否需要被目标运算操作OP1之后的其他运算操作使用,若该目标输出数据Y1需要被目标运算操作OP1之后的其他运算操作使用,如该目标输出数据Y1为目标运算操作OP1之后运算操作OP2的输入数据,则将该目标输出数据Y1暂存于第一存储器上。这样,当运算操作OP2为下一个目标运算操作时,则处理器在执行下一运算操作OP2之前,仅需要根据预设规则将该运算操作OP2所需的输入数据Y3从第二存储器搬运至第一存储器上,无需再执行该目标输出数据Y1的搬运步骤。再进一步地,若该目标输出数据Y1既为目标运算操作OP1之后运算操作OP2的输入数据,同时又为运算操作OP3的输入数据,此时,可以将该目标输出数据Y1存储在第一存储器上,直至完成运算操作OP2和OP3之后,可以将该目标输出数据Y1从第一存储器中删除,以释放目标输出数据Y1在第一存储器上占用的存储空间。
本披露实施例的数据预处理方法,减少了运算操作OP1计算结束后将目标输出数据Y1从第一存储器搬运至第二存储器的过程,以及在进行运算操作OP2时再将目标输出数据Y1从第二存储器搬运至第一存储器的过程,从而通过减少目标输出数据的读取次数,可以减少运算过程中的I/O读取操作的占用时间,从而可以提高处理器的速度及效率。
可选地,上述的待处理运算可以是包括多个运算层的神经网络运算,如图8所示,上述的运算操作OP1和OP2可以是神经网络运算中的运算层。上述的输入数据X可以包括输入神经元数据和权值数据等,其可以包括输入数据X1和X2。可选地,上述的输入数据X1和X2可以分别属于不同的运算层。进一步地,处理器可以根据该目标运算层OP1和第一存储器的可用存储容量,将子输入数据X11和X21作为该目标运算层OP1的目标输入数据。更进一步地,该处理器可以根据目标运算层OP1和目标输入数据X11和X21,确定出目标输出数据Y1及该目标输出数据Y1的数据容量,该目标输出数据Y1即为运算层OP1的输出数据的一部分,该输出数据可以包括运算层OP1的输出神经元数据及权值等。
再如,如图9所示,该待处理运算为神经网络运算等运算,该待处理运算可以包括卷积层、池化层及激活层,上述各个运算层执行顺序依次为卷积运算操作—池化运算操作—激活运算操作。即卷积运算操作的输出数据为池化运算操作的输入数据,池化运算操作的输出数据为激活运算操作的输入数据。各个运算层的输入数据可以包括该运算层对应的输入神经元数据及权值等数据。
若当前目标运算操作为池化运算操作,处理器可以根据第一存储器的可用存储容量及目标运算操作,获得该池化运算操作对应的目标输入数据为C1—C2区间内的数据(C1—C2区间内的数据表示卷积运算操作的输出数据,其可以包括卷积运算操作所对应的输出神经元数据及权值等等)。该目标输入数据C1—C2对应的目标输出数据为B1—B2区间内的数据(其中,B1-B2区间内的目标输出数据可以包括池化运算操作对应的输出神经元数据及权值等等)。进一步地,由于该池化运算操作的目标输出数据B1—B2为激活运算操作的输入数据,因此,可以将该池化运算操作的目标输出数据B1—B2存储于第一存储器上。这样,在完成池化运算操作之后,无需将目标输出数据B1—B2从第一存储器搬运至第二存储器上,释放第一存储器上的存储空间。并且,在执行激活运算操作之前,无需再次将该目标输出数据B1—B2从第二存储器搬运至第一存储器上。
而传统技术中,当处理器运算获得目标输出数据B1-B2之后,会首先将该目标输出数据B1-B2从第一存储器上搬运至第二存储器上,以释放第一存储器的存储空间。由于该激活运算操作的输入数据依赖于该池化运算操作的输出数据,因此在处理器需执行激活运算操作之前,会再次将该池化运算操作对应的目标输出数据B1-B2这一数据块从第二存储器搬运至第一存储器上。在I/O带宽有限的情况下,这种频繁的数据读取操作将影响处理器的处理效率。因而,本披露实施例的数据预处理方法,相较于现有技术而言,通过减少目标输出数据的读取次数(即减少了目标输出数据的load和store的操作),可以减少运算过程中的I/O读取操作的占用时间,从而可以提高处理器的速度及效率。
在一个实施例中,上述方法还包括如下步骤:
若目标运算操作的目标输出数据为目标运算操作之后的其他运算操作的输入数据时(也就是说该目标运算操作的目标输出数据为该待处理运算的中间结果数据),则将该目标运算操作的目标输出数据存储在第一存储器,或者第一存储器和第二存储器上。具体地,若目标运算操作的目标输出数据为目标运算操作之后的其他运算操作的输入数据时,则可以将目标输出数据存储在第一存储器上,以减少该目标输出数据的重复加载操作(即减少了目标输出数据的load的操作)。同时,还可以将该目标输出数据从第一存储器上复制到第二存储器上,从而保证第一存储器和第二存储器上数据的一致性。可选地,是否需要将该目标运算操作对应的目标输出数据同步存储至第二存储器上,可以根据具体的运算需求确定。
当无需将该目标输出数据同步存储至第二存储器上时,可以仅仅将该目标输出数据存储在第一存储器上,从而同时减少目标输出数据的load和store的操作。若需要将该目标输出数据同步存储至第二存储器上时,则可以将目标输出数据同步存储在第一存储器和第二存储器上,通过减少该目标输出数据的load操作,以避免数据读取操作过多地占用I/O带宽,影响处理器的处理速度。
如图8所示,若该目标输出数据Y1需要被目标运算操作OP1之后的其他运算操作使用,如该目标输出数据Y1为目标运算操作OP1之后运算操作OP2的输入数据,则将该目标输出数据Y1暂存于第一存储器上。这样,当运算操作OP2为下一个目标运算操作时,则处理器在执行下一运算操作OP2之前,仅需要根据预设规则将该运算操作OP2所需的输入数据Y3从第二存储器搬运至第一存储器上,无需再执行该目标输出数据Y1的搬运步骤。进一步地,处理器还可以将目标输出数据Y1从第一存储器复制到第二存储器中,这样使得第一存储器和第二存储器上的数据具有一致性。这样,本披露实施例的数据预处理方法,减少了运算操作OP1计算结束后将目标输出数据Y1从第一存储器搬运至第二存储器的过程,从而通过减少目标输出数据的读取次数,可以减少运算过程中的I/O读取操作的占用时间,从而可以提高处理器的速度及效率。
如图9所示,由于该池化运算操作的目标输出数据B1—B2为激活运算操作的输入数据,因此,可以将该池化运算操作的目标输出数据B1—B2同时存储于第一存储器和第二存储器上。这样,在执行激活运算操作之前,无需再次将该目标输出数据B1—B2从第二存储器搬运至第一存储器上。同时,在完成池化运算操作之后,将目标输出数据B1—B2从第一存储器复制至第二存储器上,可以保证第一存储器和第二存储器上数据的一致性。本披露实施例的数据预处理方法,相较于现有技术,减少了将目标输出数据B1—B2从第二存储器再次搬运至第一存储器上的过程,通过减少目标输出数据的读取次数,可以减少运算过程中的I/O读取操作的占用时间,从而可以提高处理器的速度及效率。
在一个实施例中,由于待处理运算的每个目标运算操作所需的全部输入数据的数据容量均较大,因此,处理器可以将各个目标运算操作所涉及的全部输入数据进行拆分,即可以根据第一存储器的可用存储容量,将该各个目标运算操作所涉及的全部输入数据(包括输入神经元数据及权值等等)拆分为多个输入数据块,分别针对每个输入数据块执行该目标运算操作,以获得该目标运算操作的计算结果。最后,可以通过对各个输入数据块对应的计算结果进行融合,获得该目标运算操作对应的输出数据。其中,该输入数据块即为上述的目标输入数据,各个输入数据块对应的输出数据即为上述的目标输出数据。可选地,上述步骤S200具体包括:
处理器根据第一存储器的可用存储容量和该目标运算操作所需的输入数据的数据容量,确定该目标运算操作对应的输入数据块,并将该输入数据块作为该目标运算操作对应的目标输入数据。具体地,若该目标运算操作所需的全部输入数据的数据容量大于该第一存储器的可用存储容量时,处理器可以根据该第一存储器的可用存储容量确定该目标运算操作对应的输入数据块,该输入数据块为该目标运算操作的全部输入数据的一部分。若该目标运算操作所需的全部输入数据的数据容量小于或等于第一存储器的可用存储容量,则可以将该目标运算操作的全部输入数据作为一个输入数据块,即将该目标运算操作的全部输入数据作为其目标输入数据。
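上述拆分过程可以用如下示意性的Python草图概括(假设数据按一维地址连续排布,并以(起始偏移, 容量)描述一个输入数据块;这些表示方式均为本文举例所设):

```python
# 示意性草图:当全部输入数据超出第一存储器的可用存储容量时,将其拆分为若干输入数据块。
def split_into_blocks(total_size, available_capacity):
    if total_size <= available_capacity:
        return [(0, total_size)]  # 全部输入数据可整体作为一个输入数据块
    blocks, offset = [], 0
    while offset < total_size:
        size = min(available_capacity, total_size - offset)
        blocks.append((offset, size))
        offset += size
    return blocks

# 例如:全部输入数据12MB、可用容量8MB时,拆分为8MB与4MB两个输入数据块
# split_into_blocks(12 << 20, 8 << 20) == [(0, 8 << 20), (8 << 20, 4 << 20)]
```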
例如,如图8所示,处理器可以获取当前目标运算操作OP1,该目标运算操作OP1的全部输入数据为输入数据X(其包括输入数据X1及X2)。处理器可以根据该目标运算操作OP1和第一存储器的可用存储容量,将子输入数据X11和子输入数据X21作为该目标运算操作OP1的目标输入数据,其中,该子输入数据X11和子输入数据X21的数据容量之和小于第一存储器的可用存储容量。当然,在其他实施例中,若该目标运算操作对应的全部输入数据X的数据容量小于第一存储器的可用存储容量时,还可以将该目标运算操作对应的全部输入数据全部加载到第一存储器上。
再如,如图9所示,若当前目标运算操作为池化运算操作,处理器可以根据第一存储器的可用存储容量及目标运算操作,将C1—C2区间内的数据(C1—C2区间内的数据表示卷积运算操作的输出数据)作为一个输入数据块,并将该输入数据块作为该池化运算操作对应的目标输入数据。若当前目标运算操作为激活运算操作,则处理器可以根据第一存储器的可用存储容量,将B1—B2区间内的数据作为该激活运算操作的一个输入数据块,并将该输入数据块作为该激活运算操作的目标输入数据。
在一个实施例中,当将各个目标运算操作所涉及的全部输入数据拆分为多个输入数据块时,由于各个输入数据块的数据容量小于第一存储器的存储容量,因此,该目标运算操作能够融合待处理运算的多个运算操作,以最大限度地利用第一存储器的存储空间,并提高运算的效率。可选地,上述的目标运算操作包括一个以上的运算操作,即该目标运算操作为一个以上的运算操作的组合。一般地,该目标运算操作中包含的各个运算操作为不同的运算操作,用于实现不同的运算。此时,处理器可以根据第一存储器的可用存储容量,确定各个运算操作对应的子目标输入数据,并根据各个运算操作对应的子目标输入数据确定该目标运算操作对应的目标输入数据。具体地,如图6所示,上述步骤S200中确定该目标运算操作对应的输入数据块的步骤,还包括如下步骤:
S210、根据第一存储器的可用存储容量及各个运算操作的融合属性,确定能够融合的运算操作的数量,获得融合数量阈值。其中,各个运算操作的融合属性可以包括各个运算操作所涉及输入数据和/或输出数据之间的数据依赖关系等等。
应当清楚的是,若一个或多个运算操作能够一起被处理器执行时,则认为该一个或多个运算操作能够被融合,其融合度较高。若一个或多个运算操作不能一起被处理器执行时,则认为该一个或多个运算操作不能被融合,其融合度较低。各个运算操作之间的融合度可由预设的运算规则确定,此处不做具体限定。
S220、将选定数量的能够融合的一个以上的运算操作的组合作为一个目标运算操作,其中,该选定数量小于或等于融合数量阈值。
例如,该选定数量等于融合数量阈值,即将根据第一存储器的存储容量确定的能够进行融合的多个运算操作,等效为一个目标运算操作。
S230、将该选定数量的各个运算操作对应的子目标输入数据作为该目标运算操作对应的目标输入数据。
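步骤S210~S230可以用如下示意性的Python草图概括(其中ops的字段组织与can_fuse接口均为本文举例所假设,容量记账方式也做了简化,并非本披露的实际实现):

```python
# 示意性草图:沿待处理运算的深度方向,贪心确定能够融合的运算操作数量(融合数量阈值)。
# ops:按执行顺序排列的运算操作,每项给出其子目标输入/输出数据的数据容量;
# can_fuse(prev, op):依据融合属性(如数据依赖关系)判断相邻运算操作能否融合的假设接口。
def fusion_threshold(ops, available_capacity, can_fuse):
    footprint, count, prev = 0, 0, None
    for op in ops:
        if prev is not None and not can_fuse(prev, op):
            break  # 融合属性不满足,停止继续融合
        need = op["input_size"] + op["output_size"]  # 简化记账:未计入中间结果的空间复用
        if footprint + need > available_capacity:
            break  # 第一存储器的可用存储容量不足以再容纳该运算操作
        footprint += need
        count += 1
        prev = op
    return count  # 即融合数量阈值;实际选定数量应小于或等于该阈值
```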
例如,如图8所示,待处理运算可以包括运算操作OP1和OP2,若根据两个运算操作的融合属性,该运算操作OP1和OP2能够一起被处理器执行,且第一存储器的可用存储容量能够容纳运算操作OP1的目标输入数据和目标输出数据,以及运算操作OP2的目标输入数据和目标输出数据时,则可以认为该目标运算操作能够融合的运算操作的数量为2个,此时可以将该运算操作OP1和OP2作为一个目标运算操作。同时将该运算操作OP1和OP2对应的子目标输入数据X11、X21及Y3作为该目标运算操作的目标输入数据。
若运算操作OP1和OP2能够融合,但第一存储器的可用存储容量只能容纳运算操作OP1的目标输入数据和目标输出数据,无法完全容纳运算操作OP2的目标输入数据和目标输出数据时,则可以认为该目标运算操作能够融合的运算操作的数量为1个,此时可以将该运算操作OP1作为一个目标运算操作。同时将该运算操作OP1对应的子目标输入数据X11、X21作为该目标运算操作的目标输入数据。
当然,在其他实施例中,该目标运算操作包含的运算操作的数量还可以是两个以上。例如,在该待处理运算的深度方向上,若该运算操作OP2之后还有其他可以融合的运算操作,且上述能够融合的运算操作对应的目标输入数据和目标输出数据的数据容量,能够满足该第一存储器的可用存储容量时,则该目标运算操作包含的运算操作可以是OP1、OP2直至OPn(其中,n大于2,n为正整数)。其中,OP1、OP2及OPn对应的目标输入数据及目标输出数据的数据容量之和小于或等于该第一存储器的可用存储容量。
进一步地,该待处理运算可以是神经网络等运算,该神经网络运算可以包括多个运算层,每个运算层可以表示一个运算操作。例如,处理器需要对神经网络等进行运算,神经网络的每个运算层均可以作为一个运算操作,根据该神经网络的各个运算层的连接关系,可以确定各个运算操作的融合属性,即可以根据神经网络的各个运算层之间的连接关系,确定哪些运算层进行融合及能够融合的运算层的数量,并将该能够融合的一个以上的运算层的组合作为一个目标运算操作。这样,通过在神经网络的深度方向上融合多个运算层作为一个目标运算操作,可以减少运算的次数及数据的读取次数,进一步提高处理器的处理效率。
例如,如图9所示,根据该神经网络的各个运算层的连接关系,可以确定在该神经网络的深度方向上,卷积运算操作、池化运算操作及激活运算操作能够进行融合。此时,处理器可以根据第一存储器的可用存储容量,以及各个运算操作的目标输入数据容量等确定融合数量阈值。具体地,若该第一存储器的可用存储容量能够容纳池化运算操作的目标输入数据C1-C2,以及激活运算操作的目标输入数据B1-B2,则可以确定该融合数量阈值为2个,并将该池化运算操作和激活运算操作等效为一个目标运算操作。此时,该目标运算操作的目标输入数据可以是C1-C2区间内的数据。在其他实施例中,该目标运算操作还可以是卷积运算操作、池化运算操作及激活运算操作三者的融合。
或者,该激活运算操作之后还需执行其他运算操作时,该目标运算操作还可以根据第一存储器的可用存储容量,继续融合更多的运算操作。例如,该神经网络可以包括N个运算层,处理器可以根据该第一存储器的可用存储容量确定融合数量阈值为n个(其中,n大于或等于1,且n小于或等于N),并可以将n个运算层作为一个目标运算操作。此处仅用于举例说明,并不作为具体限定。
更进一步地,当该目标运算操作包含多个运算操作时,还可以将该目标运算操作执行过程中的中间计算结果存储在第一存储器上。具体地,上述方法还包括如下步骤:
若该目标运算操作中当前运算操作输出的中间计算结果需作为其之后的其他运算操作的输入数据,或当前运算操作输出的中间计算结果需作为其他目标运算操作的输入数据时,处理器可以将该当前运算操作输出的中间计算结果暂存于第一存储器上。具体地,处理器可以根据该当前运算操作输出的中间结果的数据容量,在第一存储器上为该当前运算操作输出的中间结果分配一段存储地址。
若该当前运算操作之后的其他运算操作或其他目标运算操作均不需使用该当前运算操作输出的中间计算结果,则可以将该当前运算操作输出的中间结果所占用的存储空间进行重新分配,即可以将该当前运算操作的中间结果所占用的存储地址分配给其他数据。
例如,如图8所示,该当前运算操作OP1输出的中间计算结果Y1为下一运算操作OP2的输入数据时,则处理器可以将该当前运算操作输出的中间结果Y1暂存于第一存储器上。这样,减少了中间计算结果Y1的读取次数,从而可以提高处理器的处理效率及速度。若该运算操作OP2不需继续使用该中间计算结果,且该目标运算操作之后的其他目标运算操作均不需要复用该中间计算结果Y1时,则可以释放该中间计算结果Y1所占用的存储空间,将该中间计算结果Y1所占用的存储地址分配给其他数据,如将当前目标运算操作之后的其他目标运算操作的目标输出数据存放在该中间计算结果所占用的存储空间中,以实现第一存储器上存储空间的复用。
再如,如图9所示,池化运算操作的目标输入数据为C1-C2区间内的数据,该目标输入数据对应的目标输出数据为B1-B2区间内的数据,且该目标输出数据B1-B2为激活运算操作的目标输入数据,则处理器可以将该中间计算结果B1-B2暂存于第一存储器上。这样,减少了中间计算结果B1-B2的读取次数,从而可以提高处理器的处理效率及速度。若激活运算操作无需使用该目标输出数据B1-B2时,则可以将目标输出数据B1-B2所占用的存储空间分配给其他数据,以实现第一存储器上存储空间的复用。
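上述中间计算结果的暂存与回收逻辑,可以用如下示意性的Python草图概括(remaining_ops的字段组织与allocator.free接口均为本文举例所假设):

```python
# 示意性草图:中间计算结果的生命周期管理——后续不再使用时,回收其占用的存储空间。
def release_if_dead(result_name, remaining_ops, allocator):
    """remaining_ops:尚未执行的运算操作列表,每项的"inputs"给出其输入数据名。"""
    still_needed = any(result_name in op["inputs"] for op in remaining_ops)
    if not still_needed:
        allocator.free(result_name)  # 释放的存储地址可重新分配给其他数据,实现空间复用
    return still_needed
```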
在一个实施例中,当目标运算操作的目标输入数据仅为该目标运算操作对应的全部输入数据的一部分时,该目标运算操作的每个目标输入数据仅用于完成该目标运算操作的一部分运算。为提高该目标运算操作的处理速度及充分利用第一存储器的存储空间,该目标运算操作对应的目标输入数据的数量可以为一个以上,每个目标输入数据为全部输入数据的一部分,即每个目标输入数据包含全部输入数据的一个以上的输入数据块。也就是说,可以同时将一个以上的目标输入数据加载到第一存储器上。进一步地,根据该目标输入数据的数量,可以将该目标运算操作拆分为多个子目标运算操作,可选地,各个子目标运算操作能够实现相同的运算。具体地,如图7所示,上述方法还包括如下步骤:
S500、分别根据各个所述子目标运算操作对应的目标输入数据的数据容量及目标输出数据的数据容量,确定各个子目标运算操作所需的目标存储容量;其中,各个子目标运算操作所需的目标存储容量可以相等,也可以不等。
S510、根据第一存储器的可用存储容量及当前子目标运算操作所需的目标存储容量,确定第一存储器的剩余存储容量;
S520、根据第一存储器的剩余存储容量以及各个子目标运算操作所需的目标存储容量,确定所述子目标运算操作的数量。
可选地,可以根据该第一存储器的剩余存储容量以及该当前子目标运算操作之外的其他子目标运算操作的目标存储容量,确定该第一存储器上还能够容纳多少个子目标运算操作。之后,根据该当前子目标运算操作及其之外的其他子目标运算操作的数量,可以确定该子目标运算操作的总数量。
具体地,在当前子目标运算操作的目标输入数据的数据容量和目标输出数据的数据容量之和小于该第一存储器的可用存储容量时,可以根据该第一存储器的剩余存储容量判断是否能够执行一个以上的子目标运算操作。若是,则处理器可以同时处理该一个以上的子目标运算操作对应的目标输入数据。这样,通过同时处理多段目标输入数据,能够进一步提高该处理器的处理速度及效率。
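步骤S500~S520可以用如下示意性的Python草图概括(sub_ops的字段组织为本文举例所假设):

```python
# 示意性草图:估算可同时驻留第一存储器的子目标运算操作的数量(对应步骤S500~S520)。
def count_sub_operations(sub_ops, available_capacity):
    if not sub_ops:
        return 0
    # S500:各子目标运算操作所需的目标存储容量 = 目标输入数据容量 + 目标输出数据容量
    demands = [op["input_size"] + op["output_size"] for op in sub_ops]
    remaining = available_capacity - demands[0]  # S510:扣除当前子目标运算操作后的剩余容量
    if remaining < 0:
        return 0  # 可用容量连当前子目标运算操作都无法容纳
    count = 1
    for need in demands[1:]:  # S520:依次尝试容纳其他子目标运算操作
        if need > remaining:
            break
        remaining -= need
        count += 1
    return count
```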
如图8所示,目标运算操作(图中的左侧的运算操作)可以包括运算操作OP1和OP2,处理器可以确定该目标运算操作的当前子目标运算操作的目标输入数据X11、X21及Y3的数据容量,并确定该当前子目标运算操作的目标输出数据Y1和Z1的数据容量,并根据该当前子目标运算操作的目标输入数据及目标输出数据的数据容量之和,确定当前子目标运算操作所需的目标存储容量。若该当前子目标运算操作的目标存储容量小于该第一存储器的可用存储容量时,则可以计算获得该第一存储器的剩余存储容量。该第一存储器的剩余存储容量等于第一存储器的可用存储容量减去该当前子目标运算操作的目标存储容量。之后,处理器可以根据该第一存储器的剩余存储容量确定子目标运算操作的数量。
具体地,若该第一存储器的剩余存储容量还能够容纳另一个子目标运算操作的目标输入数据X12、X22及Y4,运算操作OP1输出的中间计算结果Y2,以及运算操作OP2输出的目标输出数据Z2时,则可以确定该子目标运算操作的数量为两个,并可以将该子输入数据X12、X22及Y4作为其中一个子目标运算操作的目标输入数据。这样,通过在该待处理运算的横向方向上,同时加载同一目标运算操作的多段目标输入数据,使得处理器可以并行处理多个目标输入数据,能够进一步提高该处理器的处理速度及效率。
进一步地,若该第一存储器的剩余存储容量不仅能够容纳另一个子目标运算操作的目标输入数据X12、X22及Y4、运算操作OP1输出的中间计算结果Y2以及运算操作OP2输出的目标输出数据Z2,而且还能够容纳运算操作OP3的输出数据Y时,则还可以将运算操作OP1、OP2及OP3进行融合,以通过执行一次运算获得计算结果Y。
再如,如图9所示,该待处理运算为神经网络等运算,该待处理运算可以包括卷积层、池化层及激活层,上述各个运算层执行顺序依次为卷积运算操作—池化运算操作—激活运算操作。若该目标运算操作为激活运算操作,则处理器可以根据第一存储器的存储容量获取当前子目标运算操作的目标输入数据,该当前子目标运算操作的目标输入数据可以是池化层上B1—B2区间内的输入数据。该当前子目标运算操作的目标输出数据为A1。若该当前子目标运算操作的目标输入数据B1—B2的数据容量及其对应的目标输出数据的数据容量之和,小于该第一存储器的存储容量,即该当前子目标运算操作所需的目标存储容量小于第一存储器的存储容量时,则处理器可以进一步根据该第一存储器的剩余存储容量确定该子目标运算操作的数量。例如,处理器可以根据该第一存储器的剩余存储容量,确定第一存储器的剩余存储容量能够满足激活运算A1—A2这个区间的运算量,则确定子目标运算操作的数量为两个,并可以将目标输入数据B2—B3区间内的数据作为该激活运算操作的另一个子目标运算操作对应的目标输入数据。
进一步地,若一个以上的子目标运算操作的目标输入数据相交时,则确定当前子目标运算操作的目标输入数据与其他子目标运算操作的目标输入数据之间的交集,将该交集暂存于第一存储器上。即该当前子目标运算操作的目标输入数据的部分或全部还需作为其他运算操作的目标输入数据时,则可以将该交集暂存于第一存储器上,以避免该部分数据的多次读取操作,从而可以提高处理器的处理效率及速度。
例如,如图9所示,若目标运算操作为池化运算操作,且该目标运算操作的子目标运算操作的数量为两个,相应地,该池化运算操作对应的目标输入数据的数量可以为两个,其中一个目标输入数据为C1-C2,另一个目标输入数据为C3-C4。该目标输入数据C1-C2对应的目标输出数据为B1-B2,该目标输入数据C3-C4对应的目标输出数据为B2-B3。结合附图可以看出,输入数据C3-C2区间的数据即是目标输入数据C1-C2的一部分,同时也是目标输入数据C3-C4的一部分,即两个目标输入数据存在交集C3-C2。此时,为减少数据的读取次数,在完成该目标输入数据C1-C2对应的池化运算操作后,可以将输入数据C3-C2仍然存储在第一存储器上,以避免该部分数据的多次读取操作,从而可以提高处理器的处理效率及速度。
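上述交集的判断可以用如下示意性的Python草图概括(以一维区间近似表示目标输入数据,区间端点为本文假设的地址偏移):

```python
# 示意性草图:计算两段目标输入数据(以区间表示)的交集;
# 存在交集时,该部分数据可保留在第一存储器上复用,避免重复加载。
def interval_intersection(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None

# 对应图9:两段目标输入数据(C1, C2)与(C3, C4)满足C1 < C3 < C2 < C4时,
# 交集为(C3, C2),完成前一次池化运算后将这部分数据保留在第一存储器上即可
```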
在一个实施例中,上述方法还包括如下步骤:
若目标运算操作之后的运算操作与目标运算操作之间的运算间隔在预设范围内时,则将目标输出数据存储在第一存储器上,以减小目标输出数据的读取次数。
具体地,若目标运算操作之后的其他运算操作与该目标运算操作之间的运算间隔在预设范围内,如目标运算操作与其之后的其他运算操作之间的运算间隔为3~5个运算操作时,则可以将该目标输出数据存储在第一存储器上,以减小目标输出数据的读取次数。若目标运算操作之后的其他运算操作与该目标运算操作之间的间隔超出预设范围时,则为避免该目标运算操作的目标输出数据长时间占用第一存储器的存储空间,可以将该目标输出数据从第一存储器搬运到第二存储器上。
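该判断可以用如下示意性的Python草图概括(max_gap的取值仅为示例,对应文中"3~5个运算操作"量级的预设范围,并非本披露限定的参数):

```python
# 示意性草图:依据运算间隔决定目标输出数据是否驻留第一存储器。
# gap:目标运算操作与下一次使用该目标输出数据的运算操作之间相隔的运算操作数。
def keep_in_first_memory(gap, max_gap=5):
    return gap <= max_gap  # 超出预设范围时返回False,表示应将数据搬运至第二存储器
```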
在一个实施例中,上述方法还包括如下步骤:
处理器可以根据目标运算操作的目标输入数据的数据容量,确定目标输入数据在第一存储器上的存储地址;根据目标运算操作的目标输出数据的数据容量,确定目标输出数据在第一存储器上的存储地址。
具体地,处理器可以根据目标运算操作的目标输入数据的数据容量,在第一存储器上为该目标输入数据分配一个与之数据容量匹配的存储空间,并将该存储空间的存储地址分配给该目标输入数据。这样,在实际运算过程中,可以将目标输入数据加载到第一存储器上的指定存储空间上。同理,处理器可以根据目标运算操作的目标输出数据的数据容量,在第一存储器上为该目标输出数据分配一个与之数据容量匹配的存储空间,并将该存储空间的存储地址分配给该目标输出数据。这样,在实际运算过程中,可以将该目标输出数据存储在第一存储器上的指定存储空间上。
在一个实施例中,上述方法还包括如下步骤:
若目标运算操作的目标输入数据无需继续使用时,处理器可以将目标输入数据的存储地址的部分或全部分配给目标运算操作的目标输出数据。这样,可以通过对同一块存储空间的多次复用,提高该第一存储器的空间利用率。
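上述存储地址的分配与复用可以用如下示意性的Python草图概括(线性分配器仅为说明存储空间预分配思路而假设,并非本披露的实际地址分配算法;其free接口与前文生命周期草图中假设的释放接口一致):

```python
# 示意性草图:按数据容量在第一存储器上划分存储地址,并支持地址的释放与复用。
class FirstMemoryAllocator:
    def __init__(self, base, capacity):
        self.base, self.capacity, self.offset = base, capacity, 0
        self.addr = {}  # 数据名 -> (起始地址, 数据容量)

    def alloc(self, name, size):
        assert self.offset + size <= self.capacity, "可用存储容量不足"
        self.addr[name] = (self.base + self.offset, size)
        self.offset += size
        return self.addr[name]

    def free(self, name):
        self.addr.pop(name, None)  # 简化处理:仅注销登记项,不做空洞合并

    def reuse(self, input_name, output_name):
        # 目标输入数据无需继续使用时,将其存储地址的部分或全部分配给目标输出数据
        self.addr[output_name] = self.addr.pop(input_name)
        return self.addr[output_name]
```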
可选地,处理器可以记录上述各个目标运算操作的目标输入数据的存储地址、目标输出数据的存储地址、中间计算结果的存储地址以及第一存储器上各个存储空间的更新规则等等,根据上述的数据对应的存储地址获得该待处理运算对应的存储分配规则。当处理器需要执行该待处理运算时,处理器可以获取该待处理运算对应的存储分配规则,并根据该存储分配规则确定运算过程中各种数据的读写操作及存储位置等等。
在一个实施例中,上述的数据预处理方法还可以应用于图2至图4所示的计算机设备中。此时,根据预设运算分配规则,该目标运算操作的一部分需要由主处理电路执行,该目标运算操作的另一部分需要由从处理电路执行。相应地,多个第一存储器可以包括主存储器和从存储器,其中,主存储器靠近主处理电路设置,进一步地,该主存储器还可以是该主处理电路的片上存储器。该从存储器靠近从处理电路设置,进一步地,该从存储器还可以是该从处理电路的片上存储器。此时,该目标运算操作对应的目标输入数据的一部分需要加载到该主存储器上由主处理电路执行,该目标运算操作对应的目标输入数据的另一部分需要加载到一个以上的从存储器上,由各个从存储器对应的从处理电路执行。
具体地,如图10所示,当图2至图4所示的计算机设备执行上述的数据预处理方法时,其包括如下步骤:
S600、获取主存储器的可用存储容量、从存储器的可用存储容量以及目标运算操作;
具体地,处理器可以根据该主存储器的配置信息(如该主存储器的型号等信息),获得该主存储器的总存储容量。进一步地,处理器可以根据该主存储器的总存储容量及该主存储器上已经占用的存储容量,获得该主存储器的可用存储容量。同理,处理器可以根据该从存储器的配置信息,获得从存储器的总存储容量,并根据该从存储器的总存储容量及该从存储器上已经占用的存储容量,获得该从存储器的可用存储容量。可选地,该处理器的主处理电路可以获得该主存储器的可用存储容量,各个从处理电路可以获得对应从存储器的可用存储容量,并将其对应的从存储器的可用存储容量传送至主处理电路。
同时,处理器的控制器单元可以获取待处理运算,并将该待处理运算的解析结果等数据发送至主处理电路。该主处理电路可以根据该待处理运算、主存储器的可用存储容量及从存储器的可用存储容量确定目标运算操作。可选地,该待处理运算包含的运算操作可以是加法操作、减法操作、乘法操作、除法操作、卷积操作、池化操作(Pooling)及激活操作(例如,Relu)等等,此处不做具体限定。该目标运算操作可以是待处理运算中一个或多个运算操作的组合。
S700、根据所述主存储器的可用存储容量、所述从存储器的可用存储容量以及所述目标运算操作,确定所述目标运算操作对应的目标输入数据;其中,所述目标输入数据为所述目标运算操作对应的全部输入数据的一部分或全部。
具体地,处理器的主处理电路可以根据该目标运算操作,确定完成该目标运算操作所需的全部输入数据及该全部输入数据的数据容量(即该全部输入数据需占用的存储空间大小)。进一步地,主处理电路可以根据主存储器的可用存储容量、各个从存储器的可用存储容量以及该目标运算操作的全部输入数据的数据容量,确定该目标运算操作对应的目标输入数据及其数据容量。
S800、根据所述目标运算操作和所述目标输入数据,确定所述目标运算操作对应的目标输出数据。
具体地,由于待处理运算的计算量是静态可分析的,因此,处理器的主处理电路可以根据该目标运算操作的目标输入数据及该目标运算操作,获得该目标运算操作的目标输出数据及该目标输出数据的数据容量等信息,即处理器的主处理电路可以获得该目标运算操作的目标输出数据所需占用的存储空间。
S900、若所述目标运算操作的目标输出数据为所述目标运算操作之后的其他运算操作的输入数据时,则将所述目标输出数据对应存储在所述主存储器上。
具体地,主处理电路可以根据预设的运算分配规则,将该目标运算操作对应的目标输入数据分配至主存储器及从存储器,以便主处理电路和从处理电路能够协同执行该目标运算操作。在该目标运算操作的执行过程中,从处理电路可以对其从存储器上的目标输入数据进行处理,得到中间计算结果。从处理电路还可以将该中间计算结果传送至主处理电路。主处理电路可以对其主存储器上的目标输入数据进行处理,并结合各个从处理电路传送的中间计算结果,获得该目标运算操作的目标输出数据。若该目标运算操作对应的目标输出数据为其后的其他运算操作的输入数据时,则可以将该目标输出数据存储在主存储器上,从而减少数据的读取次数,提高处理器的运算速度。
在一个实施例中,如图11所示,上述步骤S700还可以包括:
S710、将所述主存储器的可用存储容量和各个所述从存储器的可用存储容量进行比较,将最小的可用存储容量作为第一存储器的可用存储容量;
S720、根据该第一存储器的可用存储容量及目标运算操作确定目标运算操作对应的目标输入数据。
具体地,由于该目标运算操作需主处理电路和从处理电路协同完成,因此,应当保证主存储器和从存储器均能够容纳该目标运算操作的目标输入数据,即该目标输入数据的数据容量小于该主存储器的可用存储容量,且该目标输入数据的数据容量小于该从存储器的可用存储容量。因此,可以将主存储器的可用存储容量与各个从存储器的可用存储容量进行比较,并将主存储器及各个从存储器中最小的可用存储容量作为该处理器的第一存储器的可用存储容量。之后,主处理电路可以根据该第一存储器的可用存储容量及目标运算操作,确定该目标运算操作对应的目标输入数据。
在一个实施例中,主处理电路可以根据预设的运算分配规则对该目标输入数据进行拆分,将该目标输入数据分配为多个数据块,并确定各个数据块对应的处理电路。其中,该目标输入数据中由主处理电路进行处理的数据块可以记为第一目标输入数据。该目标输入数据中由从处理电路进行处理的数据块可以记为第二目标输入数据。进一步地,每个从处理电路对应的第二目标输入数据的数据容量可以不等,具体由运算分配规则确定。具体地,所述方法还包括如下步骤:
根据预设的运算分配规则,确定所述主存储器对应的第一目标输入数据,以及各个所述从存储器对应的第二目标输入数据。具体地,主处理电路可以根据预设的运算分配规则,确定该目标运算操作的哪些目标输入数据由主处理电路进行处理,该目标运算操作的哪些目标输入数据由各个从处理电路进行处理。
例如,如图9所示,该当前目标运算操作为池化运算操作,如需要完成池化层上B1-B2区间的运算,此时,该目标运算操作所需的目标输入数据为C1-C2。主处理电路可以根据预设的运算分配规则,将输入数据C1-C3作为第二目标输入数据,并将该第二目标输入数据C1-C3存储至从存储器上。将该输入数据C3-C2作为第一目标输入数据,并将该第一目标输入数据C3-C2存储至主存储器中。
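步骤S710与上述第一/第二目标输入数据的拆分,可以用如下示意性的Python草图概括(assign分配规则接口为本文举例所假设):

```python
# 示意性草图:取主存储器与各从存储器可用存储容量中的最小值作为第一存储器的可用存储容量,
# 并按预设的运算分配规则把目标输入数据拆分为第一目标输入数据与第二目标输入数据。
def first_memory_capacity(master_capacity, slave_capacities):
    return min(master_capacity, *slave_capacities)  # 对应步骤S710

def split_target_input(target_input_blocks, assign):
    """assign(block):假设的分配规则接口,返回"master"或某个从处理电路的编号。"""
    first_target = []   # 由主处理电路处理,存入主存储器
    second_target = {}  # 从处理电路编号 -> 其从存储器上的第二目标输入数据块
    for block in target_input_blocks:
        where = assign(block)
        if where == "master":
            first_target.append(block)
        else:
            second_target.setdefault(where, []).append(block)
    return first_target, second_target

# 图9的例子:assign将C1-C3划给从处理电路、C3-C2划给主处理电路,
# 即第二目标输入数据为C1-C3,第一目标输入数据为C3-C2
```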
进一步地,上述方法还可以包括如下步骤:
处理器还可以根据主存储器的可用存储容量和所述第一目标输入数据的数据容量,确定第一目标输入数据在主存储器上的存储地址。具体地,主处理电路可以根据主存储器的可用存储容量和第一目标输入数据的数据容量,确定该第一目标输入数据在主存储器上的存储地址。进一步地,该主处理电路还可以根据该第一目标输入数据的数据容量及目标运算操作,确定该第一目标输入数据对应的第一目标输出数据及其数据容量,并确定该第一目标输出数据在主存储器上的存储地址。
处理器还可以根据从存储器的可用存储容量和所述第二目标输入数据的数据容量,确定第二目标输入数据在从存储器上的存储地址。具体地,主处理电路可以根据各个从存储器的可用存储容量及其对应的第二目标输入数据的数据容量,确定各个第二目标输入数据在其对应的从存储器上的存储地址。进一步地,该主处理电路还可以根据各个第二目标输入数据的数据容量及目标运算操作,确定各个第二目标输入数据对应的第二目标输出数据及其数据容量,并确定各个第二目标输出数据在其对应的从存储器上的存储地址。
更进一步地,各个从处理电路可以将计算获得的第二目标输出数据传送至主处理电路,主处理电路还可以进一步确定该各个第二目标输出数据在主存储器上的存储地址。
在一个实施例中,若该从处理电路上执行的其他运算操作需要继续使用其对应的第二目标输出数据时,则可以将该第二目标输出数据暂存于该从处理电路对应的从存储器上。这样,可以减少主存储器和从存储器之间的数据读取操作,进一步提高该处理器的运算速度。
在一个实施例中,上述的目标运算操作包括一个以上的运算操作,即该目标运算操作为一个以上的运算操作的组合。一般地,该目标运算操作中包含的各个运算操作为不同的运算操作,用于实现不同的运算。此时,处理器的主处理电路可以根据第一存储器的可用存储容量,确定各个运算操作对应的子目标输入数据,并根据各个运算操作对应的子目标输入数据确定该目标运算操作对应的目标输入数据。具体地,该目标输入数据的确定过程与上述方法中的步骤S210~S230一致,具体可参见上文中的描述,此处不再赘述。
进一步地,当该目标运算操作包括一个以上的运算操作时,该一个以上的运算操作可以分为第一目标运算操作和第二目标运算操作。主处理电路可以根据预设的运算分配规则,将该目标运算操作中的第一目标运算操作分配给主处理电路,将该目标运算操作中的第二目标运算操作分配给从处理电路。相应地,主处理电路可以将第一目标运算操作所需的输入数据存储至主存储器上,分别将该各个第二目标运算操作所需的输入数据存储至对应的从存储器上。
例如,如图9所示,若该第一存储器的可用存储容量能够容纳池化运算操作的目标输入数据C1-C2,以及激活运算操作的目标输入数据B1-B2,则可以将该池化运算操作和激活运算操作等效为一个目标运算操作。此时,该目标运算操作的目标输入数据可以是C1-C2区间内的数据。进一步地,主处理电路可以根据预设的运算规则,将激活运算操作作为第一目标运算操作,分配至主处理电路本身,将池化运算操作作为第二目标运算操作分配至从处理电路。相应地,可以将池化运算操作所需的输入数据C1-C2加载到从存储器上,将该激活运算操作所需的输入数据B1-B2加载到主存储器上。由于该池化运算操作与激活运算操作之间存在依赖关系,因此,可以在完成该池化运算操作之后,再将激活运算操作所需的输入数据B1-B2由从存储器加载到主存储器上。
在一个实施例中,当目标运算操作的目标输入数据仅为该目标运算操作对应的全部输入数据的一部分时,该目标运算操作的每个目标输入数据仅用于完成该目标运算操作的一部分运算。为提高该目标运算操作的处理速度及充分利用第一存储器的存储空间,该目标运算操作对应的目标输入数据的数量可以为一个以上,每个目标输入数据为全部输入数据的一部分,即每个目标输入数据包含全部输入数据的一个以上的输入数据块。也就是说,可以同时将一个以上的目标输入数据加载到第一存储器上。进一步地,根据该目标输入数据的数量,可以将该目标运算操作拆分为多个子目标运算操作,可选地,各个子目标运算操作能够实现相同的运算。
该主处理电路可以根据该第一存储器的可用存储容量以及各个子目标运算操作所需的目标存储容量的大小,确定子目标运算操作的数量,从而可以同时将一个以上的子目标运算操作的目标输入数据加载到第一存储器上。具体地,该子目标运算操作的数量的确定过程与上述方法中的步骤S500~步骤S520一致,具体可参见上文中的描述,此处不再赘述。
应该理解的是,虽然图5-7的流程图以及图10-11中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,图5-7以及图10-11中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一非易失性计算机可读取存储介质中,该计算机程序在执行时,可包括如上述各方法的实施例的流程。其中,本披露所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
在一个实施例中,如图12所示,本披露实施例提供了一种数据预处理装置,其可以包括获取模块410、输入确定模块420、输出确定模块430以及存储分配模块440。其中,
获取模块410用于获取第一存储器的可用存储容量及目标运算操作;输入确定模块420用于根据所述目标运算操作及所述第一存储器的可用存储容量,确定所述目标运算操作对应的目标输入数据;输出确定模块430用于根据所述目标运算操作和所述目标输入数据,确定所述目标运算操作的目标输出数据;存储分配模块440用于在所述目标运算操作的目标输出数据为所述目标运算操作之后的其他运算操作的输入数据时,则将所述目标运算操作的目标输出数据存储在所述第一存储器上,其中,所述第一存储器靠近处理器设置。
可选地,所述目标运算操作包含一个以上的运算操作,每个所述运算操作对应有子目标输入数据。如图13所示,所述输入确定模块420还包括融合确定单元421和输入确定单元422。其中,融合确定单元421用于根据所述第一存储器的可用存储容量及所述待处理运算中各个运算操作的融合属性,确定能够融合的运算操作的数量,获得融合数量阈值。输入确定单元422用于将选定数量的所述能够融合的运算操作的组合作为所述目标运算操作,所述选定数量小于或等于所述融合数量阈值;将所述选定数量的能够融合的各个运算操作对应的子目标输入数据作为所述目标运算操作对应的目标输入数据。
可选地,所述待处理运算为包含多个运算层的神经网络运算,每个所述运算层表示一个所述运算操作;所述融合确定单元421还用于根据所述神经网络运算的各个运算层的连接关系,确定各个所述运算操作的融合属性。
可选地,所述存储分配模块440还用于在所述目标运算操作中当前运算操作输出的中间计算结果需作为所述目标运算操作中其他运算操作的输入数据,或所述当前运算操作输出的中间计算结果需作为其他目标运算操作的输入数据时,则将所述当前运算操作输出的中间计算结果存储于第一存储器上,或将所述当前运算操作输出的中间计算结果存储于第一存储器和第二存储器上。
可选地,所述目标运算操作包括一个以上的子目标运算操作,每个所述子目标运算操作对应一个所述目标输入数据;其中,所述目标运算操作对应的全部输入数据包括多个输入数据块,所述目标运算操作对应的目标输入数据的数量为一个以上,每个所述目标输入数据包含一个以上的所述输入数据块;输入确定模块420还用于分别根据各个所述子目标运算操作的目标输入数据的数据容量及目标输出数据的数据容量,确定各个所述子目标运算操作所需的目标存储容量;根据所述第一存储器的可用存储容量以及当前子目标运算操作所需的目标存储容量,确定所述第一存储器的剩余存储容量;根据所述第一存储器的剩余存储容量,以及所述当前子目标运算操作之外的其他子目标运算操作所需的目标存储容量,确定所述子目标运算操作的数量。
可选地,所述存储分配模块440还用于在一个以上的所述子目标运算操作的目标输入数据存在交集时,则将所述一个以上的子目标运算操作的目标输入数据之间的交集存储在所述第一存储器上。
可选地,所述存储分配模块440还用于根据所述目标运算操作的目标输入数据的数据容量,确定所述目标输入数据在所述第一存储器上的存储地址;根据所述目标运算操作的目标输出数据的数据容量,确定所述目标输出数据在所述第一存储器上的存储地址;若所述目标运算操作之后的其他运算操作均不需使用所述目标运算操作的目标输入数据时,则在完成所述目标运算操作之后,将所述目标运算操作对应的目标输入数据的存储地址的一部分或全部,分配给所述目标运算操作的目标输出数据。
在另一个实施例中,如图12所示,获取模块410用于获取主存储器的可用存储容量、从存储器的可用存储容量以及目标运算操作;输入确定模块420用于根据所述主存储器的可用存储容量、所述从存储器的可用存储容量以及所述目标运算操作,确定所述目标运算操作对应的目标输入数据;输出确定模块430用于根据所述目标运算操作和所述目标输入数据,确定所述目标运算操作对应的目标输出数据;存储分配模块440用于在所述目标运算操作的目标输出数据为所述目标运算操作之后的其他运算操作的输入数据时,则将所述目标输出数据对应存储在所述主存储器上。
可选地,如图14所示,所述数据预处理装置还包括存储容量确定模块450,用于将所述主存储器的可用存储容量和各个所述从存储器的可用存储容量进行比较,将最小的可用存储容量作为第一存储器的可用存储容量;输入确定模块420具体用于根据所述第一存储器的可用存储容量及目标运算操作,确定目标运算操作对应的目标输入数据。
可选地,所述目标运算操作包含一个以上的运算操作,每个所述运算操作对应有子目标输入数据;所述输入确定模块420还包括融合确定单元421和输入确定单元422。其中,融合确定单元421用于根据所述第一存储器的可用存储容量及所述待处理运算中各个运算操作的融合属性,确定能够融合的运算操作的数量,获得融合数量阈值。输入确定单元422用于将选定数量的所述能够融合的运算操作的组合作为所述目标运算操作,所述选定数量小于或等于所述融合数量阈值;将所述选定数量的各个能够融合的运算操作对应的子目标输入数据作为所述目标运算操作对应的目标输入数据。
可选地,所述待处理运算为包含多个运算层的神经网络运算,每个所述运算层表示一个所述运算操作;所述融合确定单元421还用于根据所述神经网络运算的各个运算层的连接关系,确定各个所述运算操作的融合属性。
可选地,所述目标运算操作包括一个以上的子目标运算操作,每个所述子目标运算操作对应一个所述目标输入数据;其中,所述目标运算操作对应的全部输入数据包括多个输入数据块,所述目标运算操作对应的目标输入数据的数量为一个以上,每个所述目标输入数据包含一个以上的所述输入数据块。所述输入确定模块还用于分别根据各个所述子目标运算操作的目标输入数据的数据容量及目标输出数据的数据容量,确定各个所述子目标运算操作所需的目标存储容量;根据所述第一存储器的可用存储容量以及当前子目标运算操作所需的目标存储容量,确定所述第一存储器的剩余存储容量;根据所述第一存储器的剩余存储容量,以及所述当前子目标运算操作之外的其他子目标运算操作所需的目标存储容量,确定所述子目标运算操作的数量。
可选地,所述目标输入数据包括第一目标输入数据和第二目标输入数据;所述输入确定模块420还用于根据预设的运算分配规则,确定所述主存储器对应的第一目标输入数据以及各个所述从存储器对应的第二目标输入数据;所述存储分配模块440还用于根据所述主存储器的可用存储容量以及所述第一目标输入数据的数据容量,确定所述第一目标输入数据在所述主存储器上的存储地址;分别根据各个所述从存储器的可用存储容量以及对应的所述第二目标输入数据的数据容量,确定各个所述第二目标输入数据在所述从存储器上的存储地址。
可选地,所述目标输出数据包括第一目标输出数据和第二目标输出数据;所述输出确定模块430还用于根据所述目标运算操作及所述第一目标输入数据,确定所述第一目标输出数据及所述第一目标输出数据在所述主存储器上的存储地址;根据所述目标运算操作及各个所述第二目标输入数据,确定各个所述第二目标输出数据及各个所述第二目标输出数据在对应的所述从存储器上的存储地址;根据各个所述第二目标输出数据,确定各个所述第二目标输出数据在所述主存储器上的存储地址。
可选地,所述存储分配模块440还用于在所述从处理电路上执行的其他目标运算操作需要使用所述第二目标输出数据时,则将所述第二目标输出数据存储在所述从处理电路对应的从存储器上。进一步地,所述存储分配模块440还用于在所述目标运算操作的目标输出数据为所述目标运算操作之后的其他运算操作的输入数据时,则将所述目标输出数据对应存储在所述主存储器和所述第二存储器上。
应当清楚的是,该装置的工作原理与上述方法中各个步骤的执行过程一致,具体可参见上文中的描述,此处不再赘述。
在一个实施例中,本披露实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现上述实施例中任一项所述的方法的步骤。具体地,该计算机程序被处理器执行时,实现如下步骤:
获取第一存储器的可用存储容量及目标运算操作;
根据目标运算操作及第一存储器的可用存储容量,确定所述目标运算操作对应的目标输入数据;其中,所述目标输入数据为所述目标运算操作对应的全部输入数据的一部分或全部;
根据所述目标运算操作和所述目标输入数据,确定所述目标运算操作的目标输出数据;
若所述目标运算操作的目标输出数据为所述目标运算操作之后的其他运算操作的输入数据时,则将所述目标运算操作的目标输出数据存储在所述第一存储器上,其中,所述第一存储器靠近处理器设置。
进一步地,该处理器可以是由主处理电路和从处理电路形成的主从结构,此时,该处理器在执行上述计算机程序时,具体实现以下步骤:
获取主存储器的可用存储容量、从存储器的可用存储容量以及目标运算操作;
根据所述主存储器的可用存储容量、所述从存储器的可用存储容量以及所述目标运算操作,确定所述目标运算操作对应的目标输入数据;其中,所述目标输入数据为所述目标运算操作对应的全部输入数据的一部分或全部;
根据所述目标运算操作和所述目标输入数据,确定所述目标运算操作对应的目标输出数据;
若所述目标运算操作的目标输出数据为所述目标运算操作之后的其他运算操作的输入数据时,则将所述目标输出数据对应存储在所述主存储器上。
应当清楚的是,该处理器执行计算机程序时的过程,与上述方法中各个步骤的执行过程一致,具体可参见上文中的描述,此处不再赘述。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本披露的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对发明专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本披露构思的前提下,还可以做出若干变形和改进,这些都属于本披露的保护范围。因此,本披露专利的保护范围应以所附权利要求为准。

Claims (20)

  1. 一种数据预处理方法,其特征在于,所述方法包括如下步骤:
    获取第一存储器的可用存储容量及目标运算操作;
    根据所述目标运算操作及所述第一存储器的可用存储容量,确定所述目标运算操作对应的目标输入数据;
    根据所述目标运算操作和所述目标输入数据,确定所述目标运算操作的目标输出数据;
    若所述目标运算操作的目标输出数据为所述目标运算操作之后的其他运算操作的输入数据时,则将所述目标运算操作的目标输出数据存储在所述第一存储器上,其中,所述第一存储器靠近处理器设置。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括如下步骤:
    若所述目标运算操作的目标输出数据为所述目标运算操作之后的其他运算操作的输入数据时,则将所述目标运算操作的目标输出数据存储在所述第一存储器上和第二存储器上;
    其中,所述第二存储器远离所述处理器设置,所述第一存储器的存储容量小于所述第二存储器的存储容量。
  3. 根据权利要求1所述的方法,其特征在于,所述目标运算操作包含一个以上的运算操作,每个所述运算操作对应有子目标输入数据;所述的根据目标运算操作及第一存储器的可用存储容量,确定所述目标运算操作对应的目标输入数据的步骤,还包括:
    根据所述第一存储器的可用存储容量及待处理运算中各个运算操作的融合属性,确定能够融合的运算操作的数量,获得融合数量阈值;
    将选定数量的所述能够融合的运算操作的组合作为所述目标运算操作,所述选定数量小于或等于所述融合数量阈值;
    将所述选定数量的各个能够融合的运算操作对应的子目标输入数据,作为所述目标运算操作对应的目标输入数据。
  4. 根据权利要求3所述的方法,其特征在于,所述待处理运算为包含多个运算层的神经网络运算,每个所述运算层表示一个所述运算操作;所述方法还包括如下步骤:
    根据所述神经网络运算的各个运算层的连接关系,确定各个所述运算操作的融合属性。
  5. 根据权利要求3所述的方法,其特征在于,所述方法还包括如下步骤:
    若所述目标运算操作中当前运算操作输出的中间计算结果需作为所述目标运算操作中其他运算操作的输入数据,或所述当前运算操作输出的中间计算结果需作为其他目标运算操作的输入数据时,则将所述当前运算操作输出的中间计算结果存储于第一存储器上,或将所述当前运算操作输出的中间计算结果存储于第一存储器和第二存储器上。
  6. 根据权利要求1-5任一项所述的方法,其特征在于,所述目标运算操作对应的输入数据包括多个输入数据块,每个所述目标输入数据包含一个以上的所述输入数据块,所述目标运算操作对应的目标输入数据的数量为一个以上。
  7. 根据权利要求6所述的方法,其特征在于,所述目标运算操作包括一个以上的子目标运算操作,每个所述子目标运算操作对应一个所述目标输入数据;所述方法还包括如下步骤:
    分别根据各个所述子目标运算操作的目标输入数据的数据容量及目标输出数据的数据容量,确定各个所述子目标运算操作所需的目标存储容量;
    根据所述第一存储器的可用存储容量以及当前子目标运算操作所需的目标存储容量,确定所述第一存储器的剩余存储容量;
    根据所述第一存储器的剩余存储容量,以及所述当前子目标运算操作之外的其他子目标运算操作所需的目标存储容量,确定所述子目标运算操作的数量。
  8. 根据权利要求7所述的方法,其特征在于,所述方法还包括如下步骤:
    若一个以上的所述子目标运算操作的目标输入数据存在交集时,则将所述一个以上的子目标运算操作的目标输入数据之间的交集存储在所述第一存储器上。
  9. 根据权利要求1-5任一项所述的方法,其特征在于,所述方法还包括如下步骤:
    若所述目标运算操作之后的其他运算操作与所述目标运算操作之间的运算间隔在预设范围内时,则将所述目标输出数据存储在所述第一存储器上。
  10. 根据权利要求1-5任一项所述的方法,其特征在于,所述方法还包括如下步骤:
    根据所述目标运算操作的目标输入数据的数据容量,确定所述目标输入数据在所述第一存储器上的存储地址;
    根据所述目标运算操作的目标输出数据的数据容量,确定所述目标输出数据在所述第一存储器上的存储地址。
  11. 根据权利要求10所述的方法,其特征在于,所述方法还包括如下步骤:
    若所述目标运算操作之后的其他运算操作均不需使用所述目标运算操作的目标输入数据时,则在完成所述目标运算操作之后,将所述目标运算操作对应的目标输入数据的存储地址的一部分或全部,分配给所述目标运算操作的目标输出数据。
  12. 一种数据预处理装置,其特征在于,所述装置包括:
    获取模块,用于获取第一存储器的可用存储容量及目标运算操作;
    输入确定模块,用于根据所述目标运算操作及所述第一存储器的可用存储容量,确定所述目标运算操作对应的目标输入数据;
    输出确定模块,用于根据所述目标运算操作和所述目标输入数据,确定所述目标运算操作的目标输出数据;
    存储分配模块,用于在所述目标运算操作的目标输出数据为所述目标运算操作之后的其他运算操作的输入数据时,则将所述目标运算操作的目标输出数据存储在所述第一存储器上,其中,所述第一存储器靠近处理器设置。
  13. 根据权利要求12所述的数据预处理装置,其特征在于,所述目标运算操作包含一个以上的运算操作,每个所述运算操作对应有子目标输入数据;所述输入确定模块还包括:
    融合确定单元,用于根据所述第一存储器的可用存储容量及所述待处理运算中各个运算操作的融合属性,确定能够融合的运算操作的数量,获得融合数量阈值;
    输入确定单元,用于将选定数量的所述能够融合的运算操作的组合作为所述目标运算操作,所述选定数量小于或等于所述融合数量阈值;将所述选定数量的各个能够融合的运算操作对应的子目标输入数据作为所述目标运算操作对应的目标输入数据。
  14. 根据权利要求13所述的数据预处理装置,其特征在于,所述待处理运算为包含多个运算层的神经网络运算,每个所述运算层表示一个所述运算操作;所述融合确定单元还用于根据所述神经网络运算的各个运算层的连接关系,确定各个所述运算操作的融合属性。
  15. 根据权利要求13所述的数据预处理装置,其特征在于,所述存储分配模块还用于在所述目标运算操作中当前运算操作输出的中间计算结果需作为所述目标运算操作中其他运算操作的输入数据,或所述当前运算操作输出的中间计算结果需作为其他目标运算操作的输入数据时,则将所述当前运算操作输出的中间计算结果存储于第一存储器上,或将所述当前运算操作输出的中间计算结果存储于第一存储器和第二存储器上。
  16. 根据权利要求12-15任一项所述的数据预处理装置,其特征在于,所述目标运算操作包括一个以上的子目标运算操作,每个所述子目标运算操作对应一个所述目标输入数据;其中,所述目标运算操作对应的全部输入数据包括多个输入数据块,每个所述目标输入数据包含一个以上的所述输入数据块,所述目标运算操作对应的目标输入数据的数量为一个以上;所述输入确定模块还用于:
    分别根据各个所述子目标运算操作的目标输入数据的数据容量及目标输出数据的数据容量,确定各个所述子目标运算操作所需的目标存储容量;
    根据所述第一存储器的可用存储容量以及当前子目标运算操作所需的目标存储容量,确定所述第一存储器的剩余存储容量;
    根据所述第一存储器的剩余存储容量,以及所述当前子目标运算操作之外的其他子目标运算操作所需的目标存储容量,确定所述子目标运算操作的数量。
  17. 根据权利要求16所述的数据预处理装置,其特征在于,所述存储分配模块还用于在一个以上的所述子目标运算操作的目标输入数据存在交集时,则将所述一个以上的子目标运算操作的目标输入数据之间的交集存储在所述第一存储器上。
  18. 根据权利要求12-15任一项所述的数据预处理装置,其特征在于,所述存储分配模块还用于:
    根据所述目标运算操作的目标输入数据的数据容量,确定所述目标输入数据在所述第一存储器上的存储地址;
    根据所述目标运算操作的目标输出数据的数据容量,确定所述目标输出数据在所述第一存储器上的存储地址;
    若所述目标运算操作之后的其他运算操作均不需使用所述目标运算操作的目标输入数据时,则在完成所述目标运算操作之后,将所述目标运算操作对应的目标输入数据的存储地址的一部分或全部,分配给所述目标运算操作的目标输出数据。
  19. 一种计算机设备,包括第一存储器、第二存储器和处理器,所述第一存储器靠近所述处理器设置,所述第一存储器和所述第二存储器能够进行数据读写;所述第一存储器或第二存储器存储有计算机程序,其特征在于,所述处理器执行所述计算机程序时实现权利要求1至11中任一项所述方法的步骤。
  20. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现权利要求1至11中任一项所述的方法的步骤。
PCT/CN2019/093144 2018-08-28 2019-06-27 数据预处理方法、装置、计算机设备和存储介质 WO2020042739A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US16/622,503 US11966583B2 (en) 2018-08-28 2019-06-27 Data pre-processing method and device, and related computer device and storage medium
KR1020197036813A KR102519467B1 (ko) 2018-08-28 2019-06-27 데이터 전처리 방법, 장치, 컴퓨터 설비 및 저장 매체
JP2019568721A JP6867518B2 (ja) 2018-08-28 2019-06-27 データ前処理方法、装置、コンピュータ機器及び記憶媒体
EP19217269.0A EP3757896B1 (en) 2018-08-28 2019-06-27 Method and device for pre-processing data in a neural network
EP19812653.4A EP3640810A4 (en) 2018-08-28 2019-06-27 DATA PRE-PROCESSING PROCESS AND APPARATUS, COMPUTER DEVICE AND STORAGE MEDIA
US16/718,874 US11243895B2 (en) 2018-08-28 2019-12-18 Data pre-processing method and device, and related computer device and storage medium

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201810987293.5A CN110865950B (zh) 2018-08-28 2018-08-28 数据预处理方法、装置、计算机设备和存储介质
CN201810987343.XA CN110865792B (zh) 2018-08-28 2018-08-28 数据预处理方法、装置、计算机设备和存储介质
CN201810987343.X 2018-08-28
CN201810987293.5 2018-08-28

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US16/622,503 A-371-Of-International US11966583B2 (en) 2018-08-28 2019-06-27 Data pre-processing method and device, and related computer device and storage medium
US16/718,874 Continuation US11243895B2 (en) 2018-08-28 2019-12-18 Data pre-processing method and device, and related computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2020042739A1 true WO2020042739A1 (zh) 2020-03-05



Also Published As

Publication number Publication date
KR20210044669A (ko) 2021-04-23
US11243895B2 (en) 2022-02-08
EP3757896A1 (en) 2020-12-30
EP3757896A8 (en) 2022-09-14
US20200125508A1 (en) 2020-04-23
JP2020533659A (ja) 2020-11-19
US11966583B2 (en) 2024-04-23
JP6867518B2 (ja) 2021-04-28
EP3757896B1 (en) 2023-01-11
EP3640810A1 (en) 2020-04-22
US20210334007A1 (en) 2021-10-28
EP3640810A4 (en) 2021-05-05
KR102519467B1 (ko) 2023-04-06

