CN109308280A - Data processing method and relevant device - Google Patents
Data processing method and relevant device
- Publication number
- CN109308280A (application CN201710617841.0A)
- Authority
- CN
- China
- Prior art keywords
- memory address
- target memory
- cpu
- computing unit
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/167—Interprocessor communication using a common memory, e.g. mailbox
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0893—Caches characterised by their organisation or structure
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Embodiments of the present invention disclose a data processing method, an acceleration computing unit, a central processing unit (CPU), and a heterogeneous system, for improving service processing performance. The data processing method of the embodiments is applied to an acceleration computing unit that includes an acceleration engine and a cache management module. The method comprises: the acceleration engine obtains a service acceleration processing request sent by the CPU; the acceleration engine processes the service acceleration processing request to obtain a processing result; the acceleration engine applies to the cache management module for a memory address and obtains a target memory address; the acceleration engine writes the processing result into the memory space pointed to by the target memory address; and the acceleration engine sends the target memory address to the CPU. Because the acceleration engine obtains the memory address by applying to the cache management module, and the address is pre-stored on the acceleration computing unit, the acceleration engine can obtain a memory address quickly, thereby improving service processing performance.
Description
Technical field
Embodiments of the present invention relate to the field of data processing, and in particular to a data processing method, an acceleration computing unit, a central processing unit, and a heterogeneous system.
Background art
In a heterogeneous system, a central processing unit (CPU) is usually responsible for controlling the processing flow, while specific processing (such as compression and decompression, or encryption and decryption) is performed by a dedicated acceleration computing unit. The acceleration computing unit may be, for example, a field-programmable gate array (FPGA), a graphics processing unit (GPU), a digital signal processor (DSP), or an application-specific integrated circuit (ASIC). Specifically, the CPU sends a message to the acceleration computing unit, where the message includes a service acceleration processing request and a memory address. The acceleration computing unit then processes the service acceleration processing request to obtain a processing result, writes the processing result into the memory space pointed to by the memory address, and sends the memory address back to the CPU.
In general, the length of the processing result obtained by the acceleration computing unit when handling a service acceleration processing request is uncertain. If the memory space obtained by the acceleration computing unit is not large enough to store the processing result, the acceleration computing unit needs to apply to the CPU for another memory address. The detailed process is as follows: the CPU sends a service acceleration processing request and a first memory address to the acceleration computing unit, so that the acceleration engine of the acceleration computing unit processes the request, obtains a processing result, and writes the result into the memory space pointed to by the first memory address. If that memory space is full but the request has not yet been fully processed, the acceleration engine triggers an interrupt to notify the CPU that the result memory space is insufficient. The interrupt processing module of the CPU handles the interrupt reported by the acceleration computing unit, applies to the memory module of the CPU for a second memory address, and notifies the acceleration computing unit of the second memory address and its length. The acceleration engine continues processing and writes the subsequent result into the memory space pointed to by the second memory address. If that space is also filled before the request is fully processed, the acceleration computing unit continues to apply to the CPU for further memory addresses.
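The prior-art flow above can be sketched as follows. This is a minimal illustrative model, not an implementation from the patent: the class names, the fixed chunk size, and the synchronous `handle_insufficient_memory_interrupt` call standing in for the interrupt round-trip are all assumptions.

```python
# Illustrative model of the prior-art flow: the accelerator must go back
# to the CPU (via an interrupt) every time a result buffer fills up.

class Cpu:
    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.next_free = 0x1000
        self.interrupts_handled = 0

    def allocate_address(self):
        # The CPU memory module hands out a fresh buffer address.
        addr = self.next_free
        self.next_free += self.chunk_size
        return addr

    def handle_insufficient_memory_interrupt(self):
        # Interrupt path: each round-trip stalls the accelerator.
        self.interrupts_handled += 1
        return self.allocate_address()

def accelerate(cpu, result_len):
    """Process a request whose result needs `result_len` bytes of buffer."""
    addresses = [cpu.allocate_address()]   # first address sent with the request
    remaining = result_len - cpu.chunk_size
    while remaining > 0:                   # buffer full, request not done:
        addresses.append(cpu.handle_insufficient_memory_interrupt())
        remaining -= cpu.chunk_size
    return addresses

cpu = Cpu(chunk_size=4096)
addrs = accelerate(cpu, result_len=10000)  # needs 3 buffers of 4 KiB
print(len(addrs), cpu.interrupts_handled)  # 3 buffers, 2 interrupt round-trips
```

Every pass through the loop models one interrupt, one CPU-side allocation, and one reply, which is exactly the waiting time the patent sets out to remove.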
This way of applying for memory addresses brings the following problem: when processing a service acceleration processing request, if the memory space of the obtained memory address is insufficient, the acceleration computing unit must apply to the CPU for a new address by way of an interrupt, and then wait for the CPU to allocate new memory space and send back the new memory address. During this waiting period, the acceleration computing unit cannot continue processing the source data, so processing performance is low.
Summary of the invention
Embodiments of the present invention provide a data processing method, an acceleration computing unit, a central processing unit, and a heterogeneous system, for improving service processing performance.
A first aspect of the embodiments of the present invention provides a data processing method applied to an acceleration computing unit. The acceleration computing unit includes an acceleration engine and a cache management module, where the cache management module manages memory addresses. The method includes: the acceleration engine obtains a service acceleration processing request sent by the central processing unit (CPU). The request includes information related to the source data, for example the source data address and the source data length, and instructs the acceleration engine to process the source data. The acceleration engine then processes the service acceleration processing request to obtain a processing result. To cache the processing result, the acceleration engine applies to the cache management module for a memory address, and the cache management module allocates a target memory address to the acceleration engine. The acceleration computing unit pre-stores at least one memory address, and the target memory address is one of the memory addresses pre-stored on the acceleration computing unit. The acceleration engine writes the processing result into the memory space pointed to by the target memory address, and then sends the target memory address to the CPU, so that the CPU can obtain the processing result through the target memory address.
In this way, when the acceleration engine obtains a processing result, it can obtain a target memory address pre-stored on the acceleration computing unit by applying to the cache management module, write the result into the memory space pointed to by that address, and send the address to the CPU, which then reads the result through it. Because the acceleration engine and the cache management module are located on the same acceleration computing unit, and the memory address applied for is pre-stored on the unit, the acceleration engine can obtain a memory address quickly. This reduces the time the acceleration engine spends waiting for a memory address in which to store the processing result of the service acceleration processing request, and thus improves service processing performance.
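The scheme of the first aspect can be sketched as a local, on-accelerator address pool. All names and the queue data structure below are illustrative assumptions; the patent does not prescribe a specific data structure.

```python
from collections import deque

class CacheManagementModule:
    """On-accelerator manager for memory addresses pre-stored by the CPU."""
    def __init__(self):
        self._pool = deque()

    def store_address(self, addr):
        # Called when the CPU provisions an address in advance.
        self._pool.append(addr)

    def allocate(self):
        # Called by the acceleration engine; no CPU round-trip needed.
        if not self._pool:
            raise RuntimeError("no pre-stored memory address available")
        return self._pool.popleft()

class AccelerationEngine:
    def __init__(self, cache_mgmt):
        self.cache_mgmt = cache_mgmt
        self.memory = {}          # stands in for device memory spaces

    def handle_request(self, source_data):
        result = source_data.upper()          # placeholder "acceleration"
        target = self.cache_mgmt.allocate()   # local, fast allocation
        self.memory[target] = result          # write result to that space
        return target                         # address is sent to the CPU

mgmt = CacheManagementModule()
mgmt.store_address(0x2000)                    # CPU pre-stores an address
engine = AccelerationEngine(mgmt)
addr = engine.handle_request("payload")
print(hex(addr), engine.memory[addr])         # CPU reads result via the address
```

The key point the sketch shows is that `allocate` never leaves the acceleration computing unit, whereas the prior art would have crossed to the CPU here.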
With reference to the first aspect of the embodiments of the present invention, in a first implementation of the first aspect, the steps in which the acceleration engine applies to the cache management module for a memory address, obtains the target memory address, and writes the processing result into the memory space pointed to by the target memory address comprise:

The acceleration engine processes the service acceleration processing request, and to cache the resulting output it applies to the cache management module for a memory address and obtains a first memory address, where the first memory address is one of the memory addresses pre-stored on the acceleration computing unit. The acceleration engine writes a first processing result into the memory space pointed to by the first memory address; the first processing result is part of the overall processing result, which also includes a second processing result. When the memory space pointed to by the first memory address is full and the service acceleration processing request has not yet been fully processed, the acceleration engine applies to the cache management module again and obtains a second memory address, which is also one of the memory addresses pre-stored on the acceleration computing unit. The acceleration engine then writes the second processing result into the memory space pointed to by the second memory address.

Because the acceleration engine needs to send the CPU every memory address whose memory space caches part of the processing result, the acceleration engine sending the target memory address to the CPU comprises: the acceleration engine sends the first memory address and the second memory address to the CPU.

In this way, whenever the acceleration engine needs to cache the processing result of a service acceleration processing request, it can apply to the cache management module for a memory address stored on the acceleration computing unit. It may apply when it begins processing the request, or again when a previously obtained memory space is full and the request is not yet fully processed. In either case, there is no need to apply to the CPU for a memory address.
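The overflow behavior of this first implementation can be sketched as follows; the chunk size and the list standing in for the pre-stored address pool are illustrative assumptions.

```python
CHUNK = 4  # assumed size of the memory space each address points to

def write_result(pool, memory, result):
    """Write `result` across as many pre-stored addresses as needed,
    returning every address used so they can all be sent to the CPU."""
    used = []
    for offset in range(0, len(result), CHUNK):
        addr = pool.pop(0)              # apply to the cache management module
        memory[addr] = result[offset:offset + CHUNK]
        used.append(addr)
    return used

pool = [0x1000, 0x2000, 0x3000]
memory = {}
# 7-byte result: the first memory space fills up, a second is
# allocated locally to hold the remainder.
used = write_result(pool, memory, b"ABCDEFG")
print([hex(a) for a in used])           # both addresses go to the CPU
print(memory[0x1000], memory[0x2000])
```

Note that both addresses come from the local pool; the second allocation happens without any interrupt to the CPU.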
With reference to the first aspect of the embodiments of the present invention, in a second implementation of the first aspect, before the acceleration engine obtains the service acceleration processing request sent by the CPU, the method further includes: the cache management module obtains the target memory address sent by the CPU, and then stores the target memory address on the acceleration computing unit. In this way, memory addresses can be stored on the acceleration computing unit in advance for the acceleration engine to use.
With reference to the first aspect of the embodiments of the present invention or either of the first and second implementations of the first aspect, in a third implementation of the first aspect, after the acceleration engine sends the target memory address to the CPU, the method further includes: the cache management module obtains the target memory address sent back by the CPU once the processing result in the memory space pointed to by the target memory address has been read by the CPU. That is, after the CPU obtains the target memory address and reads the processing result cached in the memory space it points to, the CPU sends the target memory address back to the cache management module of the acceleration computing unit, which stores it on the acceleration computing unit for subsequent use.

In this way, memory addresses can be used repeatedly, ensuring that the acceleration computing unit always has usable memory addresses. This improves system performance and avoids repeatedly carving new memory out of the memory space for the acceleration computing unit, reducing system overhead.
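The recycling described above forms a simple round-trip: an address leaves the on-accelerator pool when a result is written, and returns to the pool after the CPU has read the result. A self-contained sketch (names and data structures are illustrative):

```python
from collections import deque

pool = deque([0x1000])        # addresses pre-stored on the acceleration unit
memory = {}

# Accelerator side: allocate locally, write the result.
addr = pool.popleft()
memory[addr] = "result-1"

# CPU side: read the result through the address, then send the address back.
assert memory[addr] == "result-1"
pool.append(addr)             # cache management module stores it again

# The same address now serves the next request with no new allocation.
addr2 = pool.popleft()
memory[addr2] = "result-2"
print(hex(addr2))             # the recycled 0x1000 is reused
```

A single pre-provisioned address can thus service an unbounded stream of requests, which is what keeps the CPU's memory module out of the steady-state path.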
With reference to the first aspect of the embodiments of the present invention or either of the first and second implementations of the first aspect, in a fourth implementation of the first aspect, the memory addresses pre-stored on the acceleration computing unit are compressed according to alignment bits. Accordingly, the acceleration engine applying to the cache management module for a memory address and obtaining the target memory address comprises: the acceleration engine sends a memory address request to the cache management module, where the request asks the cache management module for a memory address. Triggered by the request, the cache management module determines the target memory address from the memory addresses pre-stored on the acceleration computing unit and decompresses it according to the alignment bits, obtaining the decompressed target memory address. The acceleration engine then obtains the decompressed target memory address sent by the cache management module.

By compressing the memory addresses, the space they occupy can be reduced, so that more memory addresses can be stored in a given amount of storage on the acceleration computing unit, which helps increase system capacity.
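One natural reading of "compressed according to alignment bits" is that addresses aligned to a 2^k-byte boundary always have k low-order zero bits, which need not be stored. The sketch below uses a right shift for compression and a left shift for decompression; this specific encoding is an assumption, since the patent does not define one.

```python
ALIGN_BITS = 12  # assume addresses are aligned to 4 KiB (2**12) boundaries

def compress(addr):
    """Drop the low-order zero bits implied by the alignment."""
    assert addr % (1 << ALIGN_BITS) == 0, "address must be aligned"
    return addr >> ALIGN_BITS

def decompress(compressed):
    """Restore the full address from its compressed form."""
    return compressed << ALIGN_BITS

addr = 0x0004_3000                    # a 4 KiB-aligned address
c = compress(addr)
print(hex(c), hex(decompress(c)))     # compressed value round-trips exactly

# A 32-bit aligned address fits in 32 - 12 = 20 bits once compressed,
# so more addresses fit in a given amount of on-unit storage.
print(c < (1 << 20))
```

Under this encoding a pool of 32-bit, 4 KiB-aligned addresses shrinks by 12 bits per entry, matching the capacity benefit the text describes.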
With reference to the first aspect of the embodiments of the present invention or either of the first and second implementations of the first aspect, in a fifth implementation of the first aspect, each memory address pre-stored on the acceleration computing unit has a preset check value. The check value may be computed by the cache management module before the memory address is stored on the acceleration computing unit, and is then stored on the acceleration computing unit in correspondence with the memory address. Accordingly, the acceleration engine applying to the cache management module for a memory address and obtaining the target memory address comprises: the acceleration engine sends a memory address request to the cache management module, where the request asks the cache management module for a memory address. Triggered by the request, the cache management module determines the target memory address from the memory addresses pre-stored on the acceleration computing unit and computes its check value. When the computed check value matches the preset check value of the target memory address, the target memory address is correct, that is, it has not been corrupted during storage, and the cache management module sends the target memory address to the acceleration engine, which thereby obtains it.

By presetting a check value, the cache management module verifies the target memory address before allocating it to the acceleration engine: it computes the check value and matches it against the preset one to ensure that the target memory address is correct. The acceleration engine therefore only uses correct memory addresses, improving the reliability of system operation.
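The patent does not name a checksum algorithm, so the sketch below uses a simple XOR-fold over the address bytes purely as a stand-in; any check value computed the same way at store time and at allocation time would serve the same purpose.

```python
def check_value(addr):
    """Illustrative check value: XOR-fold of the address bytes.
    (The patent does not specify the algorithm; this is a stand-in.)"""
    v = 0
    while addr:
        v ^= addr & 0xFF
        addr >>= 8
    return v

# Store time: the cache management module records address + check value.
stored = [(0x0004_3000, check_value(0x0004_3000))]

# Allocation time: recompute and match before handing the address out.
addr, preset = stored[0]
assert check_value(addr) == preset       # address survived storage intact

corrupted = addr | 0x1                   # a single flipped bit...
print(check_value(corrupted) == preset)  # ...no longer matches: False
```

If the recomputed value mismatches, the cache management module can discard the corrupted entry rather than hand the acceleration engine a bad pointer.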
With reference to the first aspect of the embodiments of the present invention or either of the first and second implementations of the first aspect, in a sixth implementation of the first aspect, the target memory address is configured with a memory status word. The value of the memory status word is either "occupied by the acceleration computing unit" or "occupied by the CPU", and the value is set by the acceleration computing unit.

Here, "occupied by the acceleration computing unit" indicates that the target memory address is stored on the acceleration computing unit and is being used by it. "Occupied by the CPU" indicates that the processing result cached in the memory space pointed to by the target memory address can be read by the CPU, and that the target memory address is being used by the CPU.

With the memory status word in place, memory addresses can be managed through it. Specifically, after the acceleration engine sends the target memory address to the CPU, the method further includes: the cache management module obtains the target memory address sent by the CPU when the value of the memory status word of the target memory address has remained "occupied by the CPU" for longer than a preset time. That is, when the system, for example through a detection operation performed by the CPU, detects that the status word has remained "occupied by the CPU" for longer than the preset time, the CPU sends the target memory address back to the cache management module, which stores it on the acceleration computing unit again.

Normally, the CPU reads the cached processing result after obtaining the target memory address. However, problems such as failures may cause the target memory address to be lost, so that it is no longer used by either the acceleration computing unit or the CPU. If it is detected that the memory status word has remained "occupied by the CPU" for longer than the preset time, the target memory address is considered lost; the CPU then reclaims it and sends it to the cache management module of the acceleration computing unit, which can reuse it. This makes full use of memory resources and improves system reliability.
With reference to the sixth implementation of the first aspect of the embodiments of the present invention, in a seventh implementation of the first aspect, the target memory address is further configured with a verification status word, which corresponds to a state synchronization time. The state synchronization time indicates when the value of the verification status word was last synchronized with the value of the memory status word. The value of the verification status word of the target memory address is obtained by synchronizing it with the value of the memory status word when a synchronization condition is met, the synchronization condition being that the value of the verification status word of the target memory address differs from the value of its memory status word. For example, when the CPU detects that the two values differ, the CPU sets the verification status word of the target memory address to the value of the memory status word and updates the state synchronization time.

The verification status word and the state synchronization time obtained in this way can be used to detect the state of the memory status word. Concretely, the condition "the memory status word has remained occupied by the CPU for longer than the preset time" means: the value of the memory status word and the value of the verification status word of the target memory address are both "occupied by the CPU", and the difference between the current time and the state synchronization time of the target memory address is greater than the preset time, where the current time is the time at which both values are detected to be "occupied by the CPU".

By synchronizing the memory status word into the verification status word whenever their values differ, the verification status word reflects the value of the memory status word, and the state synchronization time records when that value was last observed to change. Because the verification status word is set by the CPU, the CPU can detect the memory status word by examining the verification status word and the state synchronization time, which improves execution efficiency.
A second aspect of the embodiments of the present invention provides a data processing method applied to a CPU that includes a service processing module and a memory module. The method includes:

Because the memory module on the CPU manages memory addresses, the service processing module applies to the memory module for a memory address and obtains the target memory address. The service processing module then sends the target memory address to the cache management module of the acceleration computing unit, so that the cache management module stores it on the acceleration computing unit. In this way, memory addresses are stored on the acceleration computing unit. After the service processing module obtains a service acceleration processing request, it sends the request to the acceleration engine of the acceleration computing unit, so that the acceleration engine processes the request to obtain a processing result, applies to the cache management module for a memory address, obtains the target memory address, and writes the processing result into the memory space pointed to by the target memory address. That is, the CPU first sends memory addresses to the acceleration computing unit for storage; when the acceleration computing unit later processes a service acceleration processing request and needs to cache the result, its acceleration engine can obtain a pre-stored target memory address directly from the acceleration computing unit, without applying to the CPU. The acceleration engine can thus obtain a memory address quickly to store the processing result, reducing the waiting time and improving service processing performance.
With reference to the second aspect of the embodiments of the present invention, in a first implementation of the second aspect, after the service processing module sends the service acceleration processing request to the acceleration engine of the acceleration computing unit, the method further includes: the service processing module obtains the target memory address sent by the acceleration engine; because the memory space pointed to by the target memory address caches the processing result, the service processing module reads the processing result from that memory space. The service processing module then sends the target memory address to the cache management module of the acceleration computing unit, so that the cache management module stores it on the acceleration computing unit again.

In this way, memory addresses can be used repeatedly, ensuring that the acceleration computing unit always has usable memory addresses. This improves system performance and avoids repeatedly carving new memory out of the memory space for the acceleration computing unit, reducing system overhead.
With reference to the second aspect of the embodiments of the present invention, in a second implementation of the second aspect, the CPU further includes a verification module, and the target memory address is configured with a memory status word. The value of the memory status word is either "occupied by the acceleration computing unit" or "occupied by the CPU", and the value is set by the acceleration computing unit.

Here, "occupied by the acceleration computing unit" indicates that the target memory address is stored on the acceleration computing unit and is being used by it. "Occupied by the CPU" indicates that the processing result cached in the memory space pointed to by the target memory address can be read by the CPU, and that the target memory address is being used by the CPU.

With the memory status word in place, memory addresses can be managed through it. Specifically, after the service processing module sends the service acceleration processing request to the acceleration engine of the acceleration computing unit, the method further includes: the verification module detects whether the value of the memory status word of the target memory address has remained "occupied by the CPU" for longer than a preset time. If so, the service processing module sends the target memory address to the cache management module, so that the cache management module stores it on the acceleration computing unit.

Normally, the CPU reads the cached processing result after obtaining the target memory address. However, problems such as failures may cause the target memory address to be lost, so that it is no longer used by either the acceleration computing unit or the CPU. If the verification module detects that the memory status word has remained "occupied by the CPU" for longer than the preset time, the target memory address is considered lost; the CPU then reclaims it and sends it to the cache management module of the acceleration computing unit, which can reuse it. This makes full use of memory resources and improves system reliability.
In conjunction with the second implementation of the second aspect of the embodiment of the present invention, in a third implementation of the second aspect of the embodiment of the present invention, the target memory address is further configured with a check status word. The check status word corresponds to a state synchronization time, and the state synchronization time indicates the time at which the value of the check status word was last synchronized to the value of the memory state word.

The step in which the verification module detects whether the value of the memory state word of the target memory address has remained "occupied by the CPU" for longer than a preset time includes:

at every preset time interval, the verification module judges whether the value of the check status word of the target memory address is identical to the value of the memory state word of the target memory address;

if the value of the check status word of the target memory address differs from the value of the memory state word of the target memory address, the verification module synchronizes the value of the check status word of the target memory address to the value of the memory state word of the target memory address, and updates the state synchronization time of the target memory address;

if the value of the check status word of the target memory address is identical to the value of the memory state word of the target memory address, then, when the verification module detects that the value of the check status word of the target memory address is "occupied by the CPU", that the difference between the state synchronization time of the target memory address and the current time is greater than the preset time, and that the value of the memory state word of the target memory address has not changed within the preset time interval, the service processing module performs the step of sending the target memory address to the cache management module. Here, the current time is the time at which the verification module detects that the value of the check status word of the target memory address is "occupied by the CPU".
In this way, the value of the memory state word is synchronized to the check status word whenever the two values differ, so that the value of the check status word reflects the value of the memory state word; temporal information about the value of the memory state word can then be obtained by examining the state synchronization time. Because the check status word and the state synchronization time are both set by the CPU, the CPU can detect the memory state word indirectly by examining the check status word and the state synchronization time. This facilitates the CPU's use of the state synchronization time and the check status word, and thereby improves execution efficiency.
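The periodic check described above can be sketched as follows. This is a minimal illustrative model, not part of the claimed method: the record layout, the field names, and the integer encoding of "idle" versus "occupied by the CPU" are all assumptions.

```c
#include <assert.h>

#define STATE_IDLE    0   /* assumed encoding: block not occupied */
#define STATE_CPU_OCC 1   /* assumed encoding: occupied by the CPU */

/* Hypothetical per-address record; field names are illustrative. */
typedef struct {
    int  mem_state;   /* memory state word */
    int  check_state; /* check status word, maintained by the CPU */
    long sync_time;   /* state synchronization time */
} addr_record;

/* One pass of the verification module, run at every preset time interval.
 * Returns 1 when the address has stayed CPU-occupied longer than `preset`
 * and the target memory address should be re-sent to the cache management
 * module. */
int verify_poll(addr_record *r, long now, long preset) {
    if (r->check_state != r->mem_state) {
        /* values differ: synchronize and update the synchronization time */
        r->check_state = r->mem_state;
        r->sync_time   = now;
        return 0;
    }
    /* values identical: act only when CPU-occupied for longer than preset */
    return r->check_state == STATE_CPU_OCC && now - r->sync_time > preset;
}
```

Note that a differing value always restarts the timer, so only an address whose memory state word has been stably "occupied by the CPU" for the whole preset time triggers the re-send.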
Another aspect of the present application provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the methods described in the above aspects.

Another aspect of the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the methods described in the above aspects.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:

The acceleration computing unit includes an accelerating engine and a cache management module, where the cache management module manages memory addresses on the acceleration computing unit. The accelerating engine obtains a service acceleration processing request sent by the CPU and processes it to obtain a processing result. The accelerating engine applies to the cache management module for a memory address and obtains a target memory address, where the acceleration computing unit pre-stores at least one memory address and the target memory address belongs to the memory addresses pre-stored on the acceleration computing unit. The accelerating engine then writes the processing result into the memory space pointed to by the target memory address, and sends the target memory address to the CPU.
In this way, a cache management module for managing memory addresses is provided on the acceleration computing unit. The accelerating engine of the acceleration computing unit processes the service acceleration processing request sent by the CPU to obtain a processing result. To store that result, the accelerating engine applies to the cache management module for a memory address and obtains a target memory address pre-stored on the acceleration computing unit, writes the processing result into the memory space pointed to by the target memory address, and sends the target memory address to the CPU, after which the CPU can read the processing result through the target memory address. Because the accelerating engine obtains memory addresses by applying to the cache management module, and the accelerating engine and the cache management module are located on the same acceleration computing unit, the memory address the accelerating engine applies for is one pre-stored on the acceleration computing unit; the accelerating engine can therefore obtain a memory address quickly to store the processing result. Compared with a scheme in which the accelerating engine applies to the CPU for memory addresses, the scheme of the embodiment of the present invention does not wait for the CPU to respond to an address request, which shortens the time the accelerating engine waits for a memory address; the accelerating engine can thus obtain a memory address quickly to store the processing result, improving service processing performance.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the hardware system structure of a heterogeneous system according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the steps of an existing data processing method;
Fig. 3 is a schematic diagram of the relationship among the memory addresses stored on the acceleration computing unit, the acceleration computing unit, and the CPU according to an embodiment of the present invention;
Fig. 4 is a method flowchart of a data processing method according to an embodiment of the present invention;
Fig. 5 is a logic system block diagram of the data processing method of an embodiment of the present invention;
Fig. 6 is a method flowchart of the embodiment shown in Fig. 5;
Fig. 7 is a schematic diagram of the division of the physical memory space in the embodiment shown in Fig. 5;
Fig. 8 is a schematic diagram of the memory addresses in the embodiment shown in Fig. 5;
Fig. 9 is a schematic diagram of a specific implementation process involved in the embodiment shown in Fig. 6;
Fig. 10 is a schematic diagram of a concrete structure of an acceleration computing unit according to an embodiment of the present invention;
Fig. 11 is a flowchart of a data processing method according to an embodiment of the present invention;
Fig. 12 is a schematic diagram of the memory space management involved in the embodiment shown in Fig. 11;
Fig. 13 is a schematic diagram of the memory address compression involved in the embodiment shown in Fig. 11;
Fig. 14 is another schematic diagram of the memory address compression involved in the embodiment shown in Fig. 11;
Fig. 15 is a diagram of the memory address mapping relationship involved in the embodiment shown in Fig. 11;
Fig. 16 is a diagram of the relationship between the memory state word and the check status word involved in the embodiment shown in Fig. 11;
Fig. 17 is a structural schematic diagram of an acceleration computing unit according to an embodiment of the present invention;
Fig. 18 is a structural schematic diagram of a central processing unit according to an embodiment of the present invention.
Detailed Description of Embodiments
Embodiments of the present invention provide a data processing method, an acceleration computing unit, a central processing unit, and a heterogeneous system, for improving service processing performance.
Fig. 1 is a schematic diagram of the hardware system structure of a heterogeneous system according to an embodiment of the present invention. Referring to Fig. 1, the heterogeneous system includes a CPU and an acceleration computing unit. Memory is external to the CPU, and the CPU is connected to the acceleration computing unit through a PCIe interface. The CPU is responsible for controlling the processing flow, while specific processing, such as compression, decompression, encryption, and decryption, is performed by the acceleration computing unit.
The scenario in which the heterogeneous system performs service processing is as follows: the CPU includes a service processing module, which sends, via a message, a service acceleration processing request for the service to be processed to the acceleration computing unit for acceleration processing. After the acceleration computing unit completes the processing of the service acceleration processing request, it writes the obtained processing result into the memory space pointed to by a memory address, and sends information such as the memory address and the processing result length to the CPU through a message buffer descriptor (Buffer Descriptor, BD). The service processing module of the CPU receives the message BD from a receiving queue, thereby obtaining the processing result for further processing.
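The BD exchange described above can be modeled with a minimal sketch. The field names, field widths, and queue depth below are assumptions for illustration only; the patent does not fix a BD layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Illustrative buffer descriptor (BD); field widths are assumptions. */
typedef struct {
    uint64_t mem_addr;    /* memory address pointing at the processing result */
    uint32_t result_len;  /* processing-result length in bytes */
} msg_bd;

#define QDEPTH 8

/* Minimal receiving queue the CPU-side service processing module could poll. */
typedef struct {
    msg_bd ring[QDEPTH];
    size_t head, tail;
} bd_queue;

/* Acceleration-unit side: enqueue a completed BD. Returns -1 when full. */
int bd_push(bd_queue *q, msg_bd bd) {
    if (q->tail - q->head == QDEPTH) return -1;
    q->ring[q->tail % QDEPTH] = bd;
    q->tail++;
    return 0;
}

/* CPU side: dequeue the next BD. Returns -1 when the queue is empty. */
int bd_pop(bd_queue *q, msg_bd *out) {
    if (q->head == q->tail) return -1;
    *out = q->ring[q->head % QDEPTH];
    q->head++;
    return 0;
}
```

The real exchange crosses the PCIe link rather than a shared in-process ring, but the producer/consumer shape is the same: the acceleration unit publishes `{address, length}` pairs and the CPU consumes them from the receiving queue.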
It can be understood that the acceleration computing unit may be an FPGA, a GPU, a DSP, an ASIC, or the like; the embodiments of the present invention place no specific limitation on this.
The existing acceleration processing procedure of the heterogeneous system is as follows: the CPU sends a message to the acceleration computing unit, where the message includes the source data address and source data length of the data to be processed, the destination memory address and destination memory address length for storing the processing result, and other processing parameters that indicate how the acceleration computing unit is to process the source data. After obtaining this information, the acceleration computing unit reads the source data from memory according to the source data address and source data length, obtains the processing type to be performed according to the processing parameters, performs the corresponding processing on the source data to obtain the processing result, and writes the processing result into the memory pointed to by the destination memory address.
In general, the length of the computed processing result is uncertain. For example, when the acceleration computing unit is an FPGA and the service it performs is decompression, the length of the decompressed result is uncertain: it depends on the compression algorithm and data format. The size of the memory space in which the FPGA is to store the processing result is therefore difficult to determine in advance. For example, a text file with much repeated content may be 50 MB in size but only 10 MB after compression; when this compressed file is decompressed, the acceleration computing unit cannot determine the decompressed length in advance, so the CPU can only apply for a destination memory space based on an empirical value, for example 30 MB, and send the address of that destination memory space to the FPGA. During decompression the FPGA continuously writes the decompression result into the destination memory space, and if it finds the 30 MB space insufficient, the FPGA must apply to the CPU for more memory space to save the decompression result. In another scenario, a 12 MB picture file is 10 MB after compression. The CPU applies for a 30 MB memory space based on the empirical value and sends its address to the FPGA, but after decompression the FPGA uses only 12 MB of the space; evidently, 18 MB of memory space is wasted.
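The over-allocation in the 12 MB example above can be quantified with a trivial helper; `wasted_bytes` is an illustrative name, not anything named in the patent.

```c
#include <assert.h>

/* Waste from an empirically sized destination memory space: the example
 * above allocates 30 MB but actually uses only 12 MB after decompression. */
long long wasted_bytes(long long allocated, long long used) {
    return allocated > used ? allocated - used : 0;
}
```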
When the acceleration computing unit is processing the source data and finds that the memory space of the obtained memory address is insufficient to cache the processing result, the existing processing mode is: upon finding the memory space in use insufficient, the acceleration computing unit notifies the CPU by way of an interrupt; the CPU then applies for memory again and supplements that memory to the acceleration computing unit for its use. This is explained below taking Fig. 2 as an example.
As shown in Fig. 2, Fig. 2 is a schematic diagram of the steps of an existing data processing method, in which the acceleration computing unit is exemplified by an FPGA. The data processing method includes:
Step 201: the service processing module executed on the CPU applies to the memory module for memory and obtains a first memory address.
Step 202: the service processing module constructs a message BD to be sent to the FPGA, and sends the message BD to a sending queue. The message BD includes the source data address and source data length of the data to be processed, the first memory address and first memory address length for storing the processing result, and other data processing parameters.
Step 203: the FPGA reads the message BD from the sending queue and delivers it to the accelerating engine for processing. The accelerating engine parses the message BD to obtain the source data address, the source data length, and the first memory address; it then reads the source data according to the source data address and source data length, processes the source data to obtain a processing result, and writes the processing result into the memory space pointed to by the first memory address. If the memory space pointed to by the first memory address has been fully written but the source data has not been fully processed, the method jumps to step 204; otherwise it jumps to step 207.
Step 204: the accelerating engine triggers an interrupt to notify the CPU that the memory space of the memory address is insufficient.

Step 205: the interrupt processing module of the CPU processes the interrupt reported by the FPGA and applies to the memory module for a second memory address.

Step 206: the interrupt processing module of the CPU notifies the FPGA of the second memory address and the second memory address length. After the FPGA obtains the second memory address and the second memory address length, the accelerating engine of the FPGA continues processing the source data and writes the resulting processing result into the memory space pointed to by the second memory address. If that memory space is fully written and the source data has still not been fully processed, the method jumps to step 204; otherwise it jumps to step 207.
Step 207: the FPGA has finished processing the source data; it constructs a message BD and writes it to the receiving queue. The message BD includes a result address list and the processing result length, where the result address list includes the memory addresses whose memory spaces have been written with the processing result, such as the first memory address and the second memory address described above.

Step 208: the service processing module of the CPU reads the message BD from the receiving queue, parses it to obtain the result address list and the processing result length, and then further processes the processing result according to the result address list and the processing result length.
Such a processing mode has the following effects:
1) When the FPGA runs low on memory, it must wait for the CPU to apply for further memory space and to send a new memory address and memory address length; during this period the FPGA cannot continue processing the source data, so the data processing performance of the FPGA is low.
2) The FPGA interacts frequently with the CPU.
3) To reduce the interaction between the CPU and the FPGA, the memory space of the memory addresses issued by the CPU is usually made large, which causes considerable waste of memory space.
In summary, when the memory space of the obtained memory address is insufficient to store the processing result, the acceleration computing unit must wait for the CPU to supplement memory space; during this period the acceleration computing unit cannot continue processing data, so its data processing performance is low. This problem deserves particular attention: solving it would improve the data processing performance of the acceleration computing unit, save the user's time, and make the heterogeneous system run better. To this end, embodiments of the present invention provide a data processing method, an acceleration computing unit, a central processing unit, and a heterogeneous system to solve the above technical problem.
The acceleration computing unit involved in the embodiments of the present invention includes an accelerating engine and a cache management module. The accelerating engine performs the specific processing of the service acceleration processing request. The cache management module is a newly added module that manages the cache list used by the acceleration computing unit; the memory addresses stored in the cache list are provided by the CPU, i.e., the CPU notifies the acceleration computing unit of memory addresses so that the acceleration computing unit flushes them into the cache list. When the acceleration computing unit needs memory, it can take a memory address directly from the cache list. In this way, the acceleration computing unit does not need to notify the CPU by way of an interrupt and passively wait for the CPU's response, which improves system processing performance. The cache management module can be implemented by programming the acceleration computing unit.
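The cache-list mechanism above can be sketched as follows. The structure, names, and depth are illustrative assumptions; in the embodiments the list lives in the acceleration unit's on-chip RAM and the refresh arrives over PCIe, which this in-process model does not capture.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define CACHE_DEPTH 16   /* illustrative depth; the patent fixes no size */

/* Sketch of the on-chip cache list of pre-stored memory addresses. */
typedef struct {
    uint64_t addr[CACHE_DEPTH];   /* memory addresses provided by the CPU */
    size_t   count;
} cache_list;

/* CPU side: flush (refresh) a newly applied address into the cache list. */
int cache_refresh(cache_list *c, uint64_t a) {
    if (c->count == CACHE_DEPTH)
        return -1;                /* list full */
    c->addr[c->count++] = a;
    return 0;
}

/* Accelerating-engine side: take an address directly, no CPU round trip. */
int cache_take(cache_list *c, uint64_t *out) {
    if (c->count == 0)
        return -1;                /* empty: CPU must refresh more addresses */
    *out = c->addr[--c->count];
    return 0;
}
```

The key property is visible in `cache_take`: as long as the CPU keeps the list topped up, the engine obtains an address locally instead of raising an interrupt and waiting for a response.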
The relationship among the memory addresses stored on the acceleration computing unit, the acceleration computing unit, and the CPU can be seen in the schematic diagram of Fig. 3. In Fig. 3, the acceleration computing unit is exemplified by an FPGA: the on-chip random access memory (RAM) of the FPGA stores the memory addresses, while the spaces the memory addresses point to are RAM space on the CPU side. It can be understood that the memory addresses stored on the acceleration computing unit may also be cached in a form other than a cache list.
In the following, the data processing method, acceleration computing unit, central processing unit, and heterogeneous system provided by the embodiments of the present invention are described in detail; the embodiments below may refer to the content of the embodiments shown in Figs. 1 to 3 above. First, the data processing method provided by the embodiments of the present invention is summarized.
Fig. 4 is a method flowchart of a data processing method according to an embodiment of the present invention. The method is applied to an acceleration computing unit that includes an accelerating engine and a cache management module, where the cache management module manages memory addresses. For specific usage scenarios, see the detailed descriptions of the embodiments shown in Fig. 1 and Fig. 3 and the other embodiments above. Referring to Fig. 4, the data processing method includes:
Step 401: the accelerating engine obtains the service acceleration processing request sent by the CPU.

After the service processing module on the CPU obtains a service acceleration processing request, it sends the request to the accelerating engine of the acceleration computing unit, so that the accelerating engine receives and processes the request.

The service acceleration processing request includes information related to the source data to be processed, for example the source data address, or the source data address together with information such as the source data length and processing parameters.

The acceleration computing unit may specifically be an FPGA, a GPU, a DSP, an ASIC, or the like.

The accelerating engine may obtain the service acceleration processing request from the CPU by way of a message BD: the service processing module of the CPU constructs a message BD that includes the service acceleration processing request, and the accelerating engine obtains the request through the message BD.
Step 402: the accelerating engine processes the service acceleration processing request to obtain a processing result.

After obtaining the service acceleration processing request, the accelerating engine processes it to obtain a processing result. The CPU is responsible for controlling the processing flow, while the acceleration computing unit performs the specific processing.

Specifically, the accelerating engine of the acceleration computing unit obtains a service acceleration processing request that includes the source data address, the source data length, and processing parameters. The accelerating engine reads the source data from memory according to the source data address and source data length, and then performs on the source data the acceleration computation specified by the processing parameters.
Step 403: the accelerating engine applies to the cache management module for a memory address and obtains a target memory address.

The acceleration computing unit pre-stores at least one memory address, and the target memory address belongs to the memory addresses pre-stored on the acceleration computing unit.

The acceleration computing unit includes the accelerating engine and the cache management module, where the cache management module manages memory addresses. When the accelerating engine needs a memory address to store the processing result obtained in step 402, it applies to the cache management module for one. Because the acceleration computing unit pre-stores at least one memory address, the cache management module, upon receiving the accelerating engine's application, allocates to the accelerating engine a target memory address from among the pre-stored memory addresses.

The application in step 403 may specifically be: the accelerating engine sends a memory address request to the cache management module, the request being used to request a memory address from the cache management module. After receiving the memory address request, the cache management module determines a target memory address from the memory addresses pre-stored on the acceleration computing unit and sends the target memory address to the accelerating engine.

It can be understood that the target memory address may be one or more memory addresses, and the accelerating engine may perform step 403 once or several times; the present invention places no specific limitation on this.
The acceleration computing unit pre-stores at least one memory address; the way the acceleration computing unit obtains a memory address for caching can be realized by the following steps, illustrated here with the target memory address.

Step A1: the service processing module of the CPU applies to the memory module of the CPU for a memory address and obtains the target memory address.

The CPU includes a service processing module and a memory module, where the memory module manages the physical memory space used by the acceleration computing unit. The service processing module can apply to the memory module for memory space for the acceleration computing unit. Specifically, in order to allocate a memory address to the acceleration computing unit, the service processing module applies to the memory module for a memory address, i.e., the service processing module sends a memory address request to the memory module, triggering the memory module to determine a target memory address from the pre-stored memory addresses and return the target memory address to the service processing module.
Step A2: the service processing module sends the target memory address to the cache management module of the acceleration computing unit.

The service processing module executed on the CPU notifies the acceleration computing unit to refresh the available memory addresses, so that the cache management module stores the target memory address on the acceleration computing unit. Specifically, the CPU constructs a message BD that includes the memory address, and sends the message BD to the cache management module of the acceleration computing unit.

Step A3: the cache management module of the acceleration computing unit refreshes the available memory addresses into the cache list.

After the cache management module receives the target memory address sent by the service processing module, it stores the target memory address on the acceleration computing unit. Specifically, the cache management module reads the memory address from the message BD and caches it in the cache list of the acceleration computing unit, thereby refreshing the available memory addresses into the cache list.
It can be understood that step 403 may be performed simultaneously with step 402, or after step 402: the accelerating engine may apply to the cache management module for the target memory address while the service acceleration processing request is still being processed and only part of the processing result has been obtained; alternatively, it may apply for a memory address only after the service acceleration processing request has been fully processed and the entire processing result has been obtained.

It can be understood that there are many practical situations that trigger the acceleration computing unit to perform step 403; whenever the accelerating engine needs memory space to store a processing result, step 403 is performed. Specifically, the accelerating engine may apply to the cache management module for the target memory address when the acceleration computing unit receives a service acceleration processing request and processes it to obtain a processing result. Alternatively, the acceleration computing unit writes the processing result into the memory space pointed to by a memory address; if that memory space becomes full while the service acceleration processing request has not been fully processed, the acceleration computing unit must apply to the cache management module for a memory address in order to store the remaining processing result.

In some embodiments of the present invention, step 401 may be that the accelerating engine obtains the service acceleration processing request and a memory address sent by the CPU; when the processing result obtained by the accelerating engine fills the memory space pointed to by the memory address sent by the CPU, and the service acceleration processing request has not been fully processed, the accelerating engine performs step 403.
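The fill-and-reapply behavior described above can be sketched as a loop. Everything here is illustrative: `get_block` stands in for the on-chip application to the cache management module, the tiny block size exists only so the refill path is exercised, and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK_SIZE 4        /* illustrative tiny block size */
#define NBLK     4

static uint8_t pool[NBLK][BLK_SIZE];   /* stand-in for pre-stored memory spaces */
static size_t  next_blk;

/* Stand-in for applying to the cache management module (step 403). */
static uint8_t *get_block(void) {
    return next_blk < NBLK ? pool[next_blk++] : NULL;
}

/* Write `len` result bytes block by block, applying for a new block each
 * time the current one fills, and recording the used blocks in the result
 * address list. Returns the number of blocks consumed. */
size_t write_result(const uint8_t *res, size_t len,
                    uint8_t **result_list, size_t max_blocks) {
    size_t used = 0, off = 0;
    while (off < len && used < max_blocks) {
        uint8_t *blk = get_block();        /* apply for a target address */
        if (blk == NULL)
            break;                         /* no pre-stored address left */
        size_t n = len - off < BLK_SIZE ? len - off : BLK_SIZE;
        memcpy(blk, res + off, n);         /* write into the pointed-to space */
        result_list[used++] = blk;
        off += n;
    }
    return used;
}
```

Because `get_block` is local, no interrupt or CPU round trip occurs between blocks; this is the waiting time the method removes relative to the prior art of steps 204 to 206.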
Step 404: the accelerating engine writes the processing result into the memory space pointed to by the target memory address.

The accelerating engine processes the service acceleration processing request and obtains all or part of the processing result; it applies to the cache management module for a memory address and obtains the target memory address, and then writes the obtained whole or partial processing result into the memory space pointed to by the target memory address.
Step 405: the accelerating engine sends the target memory address to the CPU.

After the accelerating engine has finished processing the service acceleration processing request and has finished writing the processing result to the target memory address, it sends the target memory address to the CPU, so that the CPU can read the processing result cached in the memory space pointed to by the target memory address and carry out the next step of processing.

Specifically, after writing the processing result into the memory space pointed to by the target memory address, the acceleration computing unit constructs, according to the target memory address, a message BD indicating that processing is complete and notifies the CPU; the CPU then receives the message BD from the receiving queue, parses the BD to obtain the target memory address, and obtains the processing result according to the target memory address.

It can be understood that the accelerating engine may send the CPU the target memory address alone, or the target memory address together with other information such as the processing result length, so that the CPU obtains the processing result according to the target memory address and the processing result length; the processing result length can be obtained by the accelerating engine while processing the service acceleration processing request. When the accelerating engine sends only the target memory address, the CPU can obtain the processing result length by reading the information at the target memory address, and thereby obtain the processing result.
In conclusion the data processing method of the embodiment of the present invention, because being arranged in for managing on accelerating computing unit
The caching management module for depositing address accelerates the accelerating engine of computing unit to accelerate at processing request the business that CPU is sent
Reason, to obtain processing result.Accelerating engine needs to store the processing result, for this purpose, accelerating engine can be to caching management module Shen
Please memory address, obtain being pre-stored in and accelerate the target memory address on computing unit that the target then is written in the processing result
On the memory headroom that memory address is directed toward, after accelerating engine sends target memory address to CPU, CPU can be by the target
It deposits address and reads the processing result.Because accelerating engine passes through the acquisition of memory address real to caching management module application
Existing, the accelerating engine and caching management module are located on same acceleration computing unit, the memory address that accelerating engine application is arrived
For be pre-stored in accelerate computing unit on memory address, in this way, accelerating engine can quick obtaining arrive memory address, with store handle
The processing result that business accelerates processing request to obtain.Compared with accelerating engine is to the scheme of CPU application memory address, the present invention is real
Response of the scheme of example without waiting for CPU to memory address is applied, the waiting time that accelerating engine obtains memory address is reduced, from
And can quick obtaining memory address, with use the memory address store processing result, realize the raising of service process performance.
The data processing method provided by the embodiments of the present invention is described in detail below.
Referring to Fig. 5 and Fig. 6, Fig. 5 is a logic system block diagram of the data processing method of an embodiment of the present invention, and Fig. 6 is a method flowchart of the embodiment shown in Fig. 5. With reference to the content of the embodiments shown in Fig. 1 and Fig. 4 and the other embodiments above, the data processing method of this embodiment of the present invention is applied to the heterogeneous system shown in Fig. 1. The heterogeneous system includes a CPU and an acceleration computing unit; the CPU includes a service processing module and a memory module, and the acceleration computing unit includes an accelerating engine and a cache management module, where the cache management module manages memory addresses. The acceleration computing unit may include one or more accelerating engines, and different accelerating engines can simultaneously process different service acceleration processing requests.

To describe the method of this embodiment of the present invention more intuitively, the acceleration computing unit is exemplified below by an FPGA; it will be understood that the acceleration computing unit of the embodiments of the present invention may also be another type of acceleration device.

Referring to Fig. 5 and Fig. 6, and with reference to the content of the above embodiments, the data processing method of this embodiment of the present invention includes:
Step 601: the service processing module applies to the memory module for a memory address and obtains the target memory address.

On the CPU, the service processing module applies to the memory module for the target memory address. The memory module manages the physical memory space used by the FPGA; this physical memory space may be reserved at the startup stage of the operating system (Operating System, OS), or obtained at run time by the memory module through an interface application to the OS. Specifically, the memory module can manage, via a memory address, the memory space pointed to by that memory address.

To allocate a memory address to the FPGA, the service processing module on the CPU applies to the memory module for a memory address. For example, the service processing module calls the memory module through a code interface to apply for a memory address, the memory module obtains a target memory address from the pre-stored memory addresses and sends it to the service processing module, and the service processing module thereby obtains the target memory address it applied for.
In some embodiments of the present invention, when the memory module is initialized, the physical memory space is divided into blocks of a fixed size according to service requirements.
For example, the division of the physical memory space is shown in Figure 7. Suppose that at system initialization the memory module applies to the OS for 100MB of memory space, with start address 0x6400000 and end address 0xC800000; this space is then divided into 50 blocks of 2MB each. Each time the service processing module applies for a memory address, the memory module returns the address of one of the free blocks.
Optionally, in the embodiment that adds a memory address state indication field, the memory module sets the memory state word of each block to idle at initialization. When the service processing module applies to the memory module for a memory address, the memory module returns the start address of a memory block whose memory state word is idle. See the detailed description below for the embodiment that adds the memory address state indication field.
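The fixed-size block division described above can be sketched as follows. This is a minimal software analogue, not code from the patent; the helper names are assumptions:

```python
# Sketch of the fixed-size block division: a 100MB physical region
# starting at 0x6400000 is split into 2MB blocks, and an application
# for a memory address returns the start address of one free block.
BLOCK_SIZE = 2 * 1024 * 1024          # 2MB per block

def divide_pool(start, end, block_size=BLOCK_SIZE):
    """Return the start addresses of all blocks in [start, end)."""
    return list(range(start, end, block_size))

free_blocks = divide_pool(0x6400000, 0xC800000)

def apply_memory_address():
    """Memory module: hand out the address of one free block, if any."""
    return free_blocks.pop(0) if free_blocks else None
```

With the figures from the example above, `divide_pool` yields exactly 50 block start addresses.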
Step 602: The service processing module sends the target memory address to the cache management module of the FPGA.
After the service processing module obtains the target memory address, it sends the target memory address to the cache management module of the FPGA, so that the cache management module obtains it. The target memory address is a memory address usable by the FPGA.
Specifically, the service processing module constructs a notification message according to the target memory address; the notification message includes at least the applied target memory address. The service processing module then sends the notification message BD to the cache management module of the FPGA, notifying the FPGA to refresh its available memory addresses.
In some embodiments of the present invention, the notification message further includes a configuration parameter, which indicates the way the memory address is flushed into the cache list. The cache list is the specific storage form of memory addresses on the FPGA.
It can be understood that there are many specific ways for the CPU to notify the FPGA to refresh its available memory addresses. Two examples are as follows:
Mode one: The CPU constructs a message BD according to the memory address and sends the message BD to a memory release queue, where the message BD includes information such as the message type, the memory address and the configuration parameter. The cache management module obtains the message BD by reading the memory release queue, and parses the message BD to obtain the memory address.
Mode two: The CPU directly configures a register of the FPGA, and caches information such as the address and the configuration parameter in the register.
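The notification message BD of mode one can be sketched as a simple record. The field names below are assumptions for illustration only; the patent states the BD carries the message type, memory address and configuration parameter but does not fix a layout:

```python
# Hypothetical layout of the notification message BD from mode one:
# the CPU packs the message type, the applied memory address and the
# configuration parameter, and the cache management module parses it
# after reading the memory release queue.
from dataclasses import dataclass

@dataclass
class NotificationBD:
    msg_type: int        # e.g. 0 = refresh available memory address
    mem_addr: int        # target memory address handed to the FPGA
    config_param: int    # how to flush the address into the cache list

bd = NotificationBD(msg_type=0, mem_addr=0x6400000, config_param=0)
```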
Step 603: The cache management module stores the target memory address on the FPGA.
The cache management module is used to manage memory addresses. After the cache management module obtains the target memory address sent by the CPU, it stores the target memory address on the FPGA. In this way, memory addresses can be pre-stored on the FPGA.
Corresponding to the different ways in which the CPU sends the target memory address, there are also several ways for the cache management module to obtain it. For example, corresponding to mode one and mode two above, the specific ways in which the cache management module receives the target memory address can be as follows:
Mode one: The FPGA reads the memory release queue, parses the message BD, obtains information such as the memory address and the configuration parameter from the message BD, and puts the memory address into the cache list.
Mode two: The FPGA obtains the memory address and the configuration parameter through the register, and puts the memory address into the cache list.
In the embodiment of the present invention, memory addresses are stored on the FPGA in the form of a list. In the embodiment where the CPU sends a configuration parameter to the cache management module, because the configuration parameter indicates the way the memory address is put into the cache list, the cache management module can cache the obtained target memory address according to the configuration parameter. For example, the target memory address may be put at the front of the cache list, similar to last in, first out (LIFO), so that when the FPGA needs a memory address it uses this address first. Alternatively, the configuration parameter may indicate that the target memory address is put at the end of the cache list, so that all memory blocks are used in turn.
It can be understood that, in embodiments of the present invention, the caching of memory addresses on the FPGA is not limited to being implemented with a cache list.
There are also several specific ways of caching addresses on the FPGA, as follows:
1) Caching compressed memory addresses.
In some embodiments of the present invention, memory addresses are compressed before being cached on the FPGA, to reduce the space they occupy. For example, as described above, when the memory addresses pre-stored on the FPGA are compressed according to the alignment bits, the cache management module can compress each memory address according to its alignment.
For example, in the memory address diagram of Figure 7, since the memory addresses are 2MB-aligned, the low 20 bits are all 0, so these low bits can be dropped to compress the memory address, and the compressed memory address is then cached. As shown in Figure 8, the memory addresses written into the cache list are 0x64, 0x66, 0x68 and 0x6a, which respectively represent memory blocks whose start addresses are 0x6400000, 0x6600000, 0x6800000 and 0x6a00000.
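The alignment-based compression just described amounts to a pair of shifts. This is a minimal sketch following the Figure 7/8 example, where the 20 all-zero low bits are dropped before an address enters the cache list and restored on the way out:

```python
# Compression by alignment: 2MB-aligned block starts have their low
# 20 bits all zero, so only the high bits are stored in the cache list.
ALIGN_BITS = 20

def compress(addr):
    """Drop the all-zero low bits before writing to the cache list."""
    return addr >> ALIGN_BITS

def decompress(compressed):
    """Restore the dropped low bits when allocating to the engine."""
    return compressed << ALIGN_BITS
```

For instance, 0x6400000 compresses to 0x64 and 0x6a decompresses back to 0x6a00000, matching Figure 8.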
2) Configuring a check value for the memory address.
In some embodiments of the present invention, a check value can be set for each memory address on the FPGA, that is, the memory addresses pre-stored in the acceleration computing unit are preset with check values. If an error occurs in a cached memory address, the FPGA can detect the error according to the check value, which ensures that the memory address used by the accelerating engine is a correct memory address and guarantees the normal operation of the embodiment of the present invention.
The check value may be a parity value.
A specific operation can be: after the cache management module obtains the target memory address, it computes a parity value from the specific value of the target memory address, and when storing the target memory address on the acceleration computing unit, appends the parity value to the target memory address, thereby presetting a check value for the target memory address. Later, when the cache management module takes out the target memory address for use, it computes the parity value of the address again and compares the newly computed parity value with the cached parity value, thereby verifying the target memory address.
For example, in the memory address diagram of Figure 7, the memory space configured for the FPGA is 0x6400000 to 0xC800000, in which only 28 bits are significant; bit30 and bit31 can therefore be chosen to store the check value, while bit0 to bit27 store the memory address.
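The bit layout of the example above can be sketched as follows. The patent only requires a parity value in the spare high bits; storing a single even-parity bit in bit 30 is an assumption made here for illustration:

```python
# Check-value layout sketch: bits 0..27 hold the memory address,
# and an even-parity bit over those 28 bits is stored in bit 30.
ADDR_MASK = (1 << 28) - 1
PARITY_BIT = 30

def with_check_value(addr):
    """Store the address together with its parity check value."""
    parity = bin(addr & ADDR_MASK).count("1") & 1
    return (addr & ADDR_MASK) | (parity << PARITY_BIT)

def verify(entry):
    """Recompute the parity and compare it with the stored check bit."""
    addr = entry & ADDR_MASK
    stored = (entry >> PARITY_BIT) & 1
    return stored == (bin(addr).count("1") & 1)
```

A single flipped address bit in the cache RAM changes the recomputed parity, so `verify` fails and the corrupted address is not handed to the accelerating engine.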
It can be understood that, in a specific implementation, either of the above two ways can be used alone, or both can be used together.
Optionally, in the embodiment that adds a memory address state indication field, when the FPGA puts a memory address into the cache list, that is, when the memory address is stored on the FPGA, the memory state word of that memory address is set to FPGA-occupied, for the checking module of the CPU to use in verification.
The value of the memory state word corresponds to the current state of the memory block, and the state word can also be configured with information such as a timestamp. The current state of a memory block includes the following three states, i.e. the value of the memory state word has the following three possibilities:
1) Idle: the memory block is in its initial state, and its memory address has not yet been flushed into the cache list of the FPGA.
2) FPGA-occupied: the memory address of the block has been flushed into the cache list of the FPGA, and the FPGA can use this memory block.
3) CPU-occupied: the memory block can be used by the CPU, and the CPU can read the data cached in the memory block.
See the detailed description below for the embodiment that adds the memory address state indication field.
Through the execution of steps 601 to 603 above, the FPGA has obtained the memory address and cached it in the cache list on the FPGA, so that memory addresses are pre-stored, preparing the FPGA to use memory addresses for data storage.
It can be understood that the description of the above steps takes the target memory address as an example; the method of the embodiment of the present invention can also repeat the above steps for multiple different memory addresses, so that multiple memory addresses are cached on the FPGA.
The following describes how the FPGA uses these pre-stored memory addresses, so that the accelerating engine can obtain memory addresses quickly and the service processing efficiency of the heterogeneous system is improved.
Step 604: The service processing module sends a service acceleration processing request to the accelerating engine of the FPGA.
After the service processing module obtains a service acceleration processing request, it sends the request to the accelerating engine of the FPGA, so that the accelerating engine receives the service acceleration processing request sent by the CPU. The FPGA then handles the request; the specific process is described in the subsequent steps.
The service processing request includes the service to be processed. For example, the service acceleration processing request may include information such as the source data address, the source data length and the processing parameters of the service to be processed.
Specifically, the service processing module of the CPU constructs a message BD to be sent to the FPGA, whose content includes information such as the source data address, source data length and processing parameters of the service to be processed. It then puts the message BD into a transmit queue, so that the accelerating engine of the FPGA reads the message BD from the transmit queue and, after parsing it, obtains the source data address, source data length, processing parameters and other information.
Step 605: The accelerating engine processes the service acceleration processing request to obtain a processing result.
After the accelerating engine receives the service acceleration processing request, it processes the request to obtain a processing result.
Specifically, when the service processing request includes information such as the source data address, source data length and processing parameters, the accelerating engine obtains the source data to be processed from memory according to the source data address and source data length, and then performs on the source data the acceleration computation specified by the processing parameters.
For a specific implementation of step 605, see Fig. 9, which is a schematic diagram of the specific implementation process involved in the embodiment shown in Fig. 6. As shown in Fig. 9, step 901 is the specific implementation of step 605: in step 901, the acceleration processing submodule processes the source data of the service processing request, and after the processing result is obtained, writes it into the on-chip RAM space of the FPGA. The on-chip RAM serves as a cache and is reused repeatedly; see the description of step 905 for details.
Step 606: The accelerating engine applies to the cache management module for a memory address and obtains the target memory address.
Through the execution of the above steps, at least one memory address is pre-stored in the cache list of the FPGA, and the target memory address is one of the memory addresses pre-stored on the acceleration computing unit.
To store the processing result, the accelerating engine needs to apply to the cache management module for a memory address. A specific application process may be: the accelerating engine sends a memory address request to the cache management module; after the cache management module receives the memory address request, it determines the target memory address from the pre-stored memory addresses of the cache list and then allocates the target memory address to the accelerating engine.
It can be understood that the number of target memory addresses can be one or more.
For example, as shown in Fig. 9, steps 902 to 904 are the specific implementation of step 606. The accelerating engine includes an acceleration processing submodule, a result write-back submodule and on-chip RAM.
In step 902, the acceleration processing submodule sends a message to the result write-back submodule; the message triggers the result write-back submodule to work, and can be triggered by an electrical signal.
In step 903, when the accelerating engine needs memory space to store the processing result, the result write-back submodule sends a message to the cache management module. This message is the memory address request, used to apply to the cache management module for a memory address. Specifically, the result write-back submodule judges whether it needs to apply to the cache management module for memory, and if so, sends the memory address request message; this message can also be triggered by an electrical signal. The specific judgment is: when the result write-back submodule holds no memory address, or when the memory space pointed to by the memory address it has obtained is full, the result write-back submodule sends the memory address request message.
In step 904, the result write-back submodule obtains the target memory address sent by the cache management module.
After the cache management module receives the memory address request, it takes the first cached memory address, i.e. the target memory address, out of a first in, first out (FIFO) queue and sends it to the result write-back submodule. In an embodiment of the present invention, after taking out the target memory address, the cache management module moves the head pointer of the FIFO queue, so that the next time it receives a memory address request from the result write-back submodule it returns the next memory address. The cache management module is implemented by programming the FPGA, and the FIFO queue is the cache list.
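The FIFO behavior of the cache list described in step 904 can be sketched as follows. This is a software analogue of the on-chip list with assumed names, not an implementation from the patent:

```python
# Minimal sketch of the cache list as a FIFO queue: the cache
# management module appends addresses flushed by the CPU (step 603)
# and pops the oldest one when the result write-back submodule
# requests an address (step 606).
from collections import deque

class CacheList:
    def __init__(self):
        self.fifo = deque()

    def flush(self, addr):
        """Store an address sent by the CPU onto the FPGA."""
        self.fifo.append(addr)

    def allocate(self):
        """Hand the oldest cached address to the accelerating engine."""
        return self.fifo.popleft() if self.fifo else None
```

Popping from the left plays the role of moving the head pointer, so consecutive requests from the result write-back submodule receive successive addresses.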
In some embodiments of the present invention, corresponding to the different caching modes of memory addresses on the FPGA, there are also several ways for the cache management module to return the target memory address to the accelerating engine. For example, corresponding to compressed caching of memory addresses and to configuring check values, step 606 has the following two specific implementation processes.
1) The memory addresses pre-stored on the FPGA are compressed according to the alignment bits.
Step B1: The accelerating engine sends a memory address request to the cache management module.
The memory address request is used to request a memory address from the cache management module.
For a specific implementation of step B1, refer to the detailed description of step 903 of Fig. 9.
Step B2: The cache management module decompresses the target memory address according to the alignment bits, obtaining the decompressed target memory address.
After receiving the memory address request, the cache management module determines the target memory address from the cache list. Because the memory addresses pre-stored in the cache list are stored in compressed form, the cache management module needs to decompress the memory address. In this embodiment of the present invention, the cache management module decompresses the target memory address according to the alignment bits, obtaining the decompressed target memory address.
For the specific way addresses are cached on the FPGA, refer to the content on caching compressed memory addresses above.
For example, as shown in Fig. 7, the memory addresses obtained by the FPGA are 2MB-aligned; in the manner shown in Fig. 8, the all-zero low bits of each memory address are dropped to compress it, and the FPGA stores the compressed memory address. After receiving the memory address request, the cache management module determines the compressed target memory address 0x64, and then restores the address value 0x64 to 0x6400000, obtaining the decompressed target memory address.
Step B3: The accelerating engine obtains the decompressed target memory address sent by the cache management module.
After the decompression is completed, the cache management module sends the decompressed target memory address to the accelerating engine for its use.
For a specific implementation of step B3, refer to the detailed description of step 904 of Fig. 9.
By compressing memory addresses so that the low address bits are not stored in the cache list, the number of bits per address can be reduced and the required size of the cache list shrinks, which is equivalent to increasing system capacity. Memory addresses put into the cache list are usually aligned; for example, with 64-byte alignment the lowest 6 address bits are 0, so the address values stored in the cache list can omit these 6 bits. When the cache management module allocates a memory address to the accelerating engine, it restores the dropped low bits. Compressing addresses by removing low bits reduces the bits needed to cache an address and the size of the required cache list, which helps increase system capacity for the same cache list size.
2) The memory addresses pre-stored on the acceleration computing unit are preset with check values.
Step C1: The accelerating engine sends a memory address request to the cache management module.
The memory address request is used to request a memory address from the cache management module.
For a specific implementation of step C1, refer to the detailed description of step 903 of Fig. 9.
Step C2: The cache management module computes the check value of the target memory address.
After receiving the memory address request, the cache management module takes the target memory address out of the cache list. Because the memory addresses pre-stored in the cache list are configured with check values, the cache management module verifies the target memory address against the preset check value, performing an address legitimacy check. The checking process is: the cache management module computes the check value of the target memory address to obtain a computed check value, then matches the computed check value against the pre-stored check value; if they are identical, the check passes, otherwise it fails. Only when the check passes does the cache management module return the target memory address to the accelerating engine.
Step C3: When the computed check value matches the preset check value of the target memory address, the cache management module sends the target memory address to the accelerating engine, so that the accelerating engine obtains the target memory address.
If the check value computed in step C2 matches the pre-stored check value of the target memory address, i.e. they are identical, the target memory address passes the check. A memory address that passes the check did not suffer an error during storage, so when the cache management module determines from the cache list that the check passes, the memory address is a correct memory address. The cache management module can then return the target memory address to the accelerating engine for its use.
For example, when the check value is a parity value, the cache management module computes the parity value of the target memory address when caching it into the cache list on the FPGA, and stores the target memory address together with the parity value in the cache list; for example, the parity value is appended at the end of the target memory address and the two are cached together. When the cache management module takes the target memory address out of the cache list, it computes the parity value of the address again and compares the newly computed parity value with the parity value cached in the cache list; if the two parity values are identical, the cache management module sends the target memory address to the accelerating engine.
By presetting check values for the memory addresses pre-stored on the acceleration computing unit, the correctness of the memory addresses used can be ensured, which improves the reliability of the heterogeneous system.
Because memory addresses are cached in a section of RAM space inside the FPGA (for example, the cache list is a section of RAM inside the FPGA), this RAM may fail, causing a memory address read from the cache list to be incorrect. When a memory address is stored into the cache list, its check value is computed and stored in the idle bits of the memory address. When the cache management module allocates a memory address to the accelerating engine, it can check whether the computed check value of the address is identical to the bit value of the check bits. Through this check value, it can be ensured that the memory address the accelerating engine applies for is correct, which increases the reliability of the heterogeneous system.
Step 607: The accelerating engine writes the processing result into the memory space pointed to by the target memory address.
After obtaining the target memory address, the accelerating engine can write the processing result obtained by processing the service acceleration processing request into the memory space pointed to by the target memory address.
For example, referring to Fig. 9, in step 905 the result write-back submodule writes the processing result back through PCIE into the memory space pointed to by the target memory address applied for in step 904. The condition for the result write-back submodule to write back to the memory space is: the processing result reaches a certain amount, for example 512 bytes; or the processing of the service acceleration processing request of step 901 is complete.
It can be understood that the accelerating engine can apply to the cache management module for memory addresses one or more times.
For example, the accelerating engine processes the service acceleration processing request to obtain a processing result and needs memory space to store it, so it applies to the cache management module and obtains one memory address. The accelerating engine then writes the processing result into the memory space pointed to by this address. If that memory space can hold the entire processing result, the accelerating engine does not need to apply to the cache management module again. If the memory space is full but the service acceleration processing request has not yet been fully processed, the accelerating engine applies to the cache management module for another memory address and continues writing the processing result into the memory space pointed to by the other address.
For example, in some embodiments of the present invention, steps 606 and 607 are implemented as follows:
Step D1: The accelerating engine applies to the cache management module for a memory address and obtains a first memory address.
The first memory address is one of the memory addresses pre-stored on the FPGA.
The accelerating engine processes the service acceleration processing request and obtains a processing result; at this point it needs memory space to store the result. To this end, the accelerating engine applies to the cache management module for a memory address, i.e. executes step 606.
Step D2: The accelerating engine writes a first processing result into the memory space pointed to by the first memory address.
The first processing result belongs to the processing result of step 605. While processing the service acceleration processing request, the accelerating engine may obtain part of the processing result and need to store that part in the memory space pointed to by a memory address; the first processing result is such a partial processing result.
Step D3: When the memory space pointed to by the first memory address is full and the service acceleration processing request has not been fully processed, the accelerating engine applies to the cache management module for a memory address and obtains a second memory address.
The second memory address is one of the memory addresses pre-stored on the FPGA.
If the memory space pointed to by the first memory address is full and the accelerating engine has not finished processing the service acceleration processing request, the accelerating engine still needs memory space to store the subsequent processing results. It therefore applies to the cache management module again and obtains the second memory address.
Step D4: The accelerating engine writes a second processing result into the memory space pointed to by the second memory address.
The second processing result belongs to the processing result of step 605. The accelerating engine continues processing the service acceleration processing request and continues to obtain processing results; the continued processing result is the second processing result, which the accelerating engine writes into the memory space pointed to by the second memory address.
If the memory space of the second memory address is full and the service acceleration processing request is still not fully processed, the accelerating engine repeats steps D3 and D4 until the service acceleration processing request has been executed.
If the memory spaces pointed to by the first and second memory addresses can hold the whole processing result of the service acceleration processing request, then in step 608 the accelerating engine sends the first memory address and the second memory address to the CPU.
Step 608: The accelerating engine sends the target memory address to the CPU.
After the accelerating engine of the FPGA finishes processing the service acceleration processing request sent by the CPU and writes the resulting processing result into the memory space pointed to by the target memory address, the accelerating engine sends the target memory address to the CPU, so that the service processing module of the CPU obtains the processing result according to the target memory address.
For example, the accelerating engine constructs a message BD that includes information such as the target memory address and the processing result length, where the target memory address can be recorded in the form of a result address list. The accelerating engine then sends the message BD to the CPU.
It can be understood that the target memory address is the memory address of the memory space caching the processing result, and the number of such addresses can be one or more. For example, in the example of step D4 above, the target memory address includes the first memory address and the second memory address, so the accelerating engine sends the first memory address and the second memory address to the service processing module of the CPU.
It can be understood that in some embodiments the accelerating engine can send the target memory address to the CPU alone, while in other embodiments the accelerating engine can send the target memory address together with other information, such as the processing result length.
Optionally, when executing step 608, or before or after executing it, the FPGA can set the memory state word of the target memory address to CPU-occupied. Specifically, after the accelerating engine of the FPGA writes the processing result into the memory space pointed to by the target memory address, the accelerating engine sets the memory state word of the target memory address to CPU-occupied, for the checking module of the CPU to use in verification.
For example, in the specific structure of the FPGA shown in Fig. 9, the FPGA includes a result write-back submodule, which can also be used to set memory state words. When the result write-back submodule finishes writing the memory block corresponding to a memory address, or when it receives notice that the acceleration request processing is complete, it sets the memory state word of the target memory address to CPU-occupied; specifically, it can set the memory state words of the target memory addresses in the result address list to CPU-occupied, where the result address list is formed from the target memory addresses into which the processing result was written. See the detailed description below for the embodiment that adds the memory address state indication field.
Step 609: The service processing module reads the processing result from the memory space pointed to by the target memory address.
After the service processing module obtains the target memory address sent by the accelerating engine, it reads the processing result in the memory space pointed to by the target memory address.
For example, the service processing module running on the CPU reads a receive queue, obtains the message BD, parses it to obtain the result address list and the processing result length, and then reads the processing result from the memory space pointed to by the target memory addresses in the result address list according to the processing result length.
In this way, the service processing module obtains the processing result corresponding to the service acceleration processing request of step 604.
Step 610: Service Processing Module sends target memory address to caching management module.
After the processing result of the memory headroom caching of target memory address is obtained by Service Processing Module, FPGA can be again sharp
With the target memory address, for this purpose, Service Processing Module sends target memory address to caching management module, so that cache management
Target memory address is stored on FPGA by module.To realize the recycling of the Service Processing Module on CPU target memory address,
And FPGA is notified to refresh available memory address.
The specific implementation of step 610 can refer to the detailed description of step 602.
Step 611: The cache management module stores the target memory address on the FPGA.
After the CPU has read the processing result from the memory space pointed to by the target memory address, the cache management module obtains the target memory address sent by the CPU and stores it on the FPGA, where it can be used by the FPGA again.
For the specific implementation of step 611, refer to the detailed description of step 603.
Through steps 610 and 611, once the CPU has taken the processing result produced by the FPGA out of the memory space, the memory address of that memory space can be reused: in step 610 the CPU notifies the FPGA to refresh its cache list, so that the target memory address re-enters the cache list. With this refresh mechanism a memory address can be used repeatedly, ensuring that the FPGA always has available memory addresses and thereby improving the processing performance of the heterogeneous system.
It is understood that in some embodiments of the present invention steps 610 and 611 may be omitted; for example, after the service processing module has read the processing result from the memory space pointed to by the target memory address, the memory address may instead be released back to the memory module.
In some embodiments of the present invention, in addition to sending a service acceleration processing request to the acceleration engine of the FPGA, the service processing module of the CPU may also send a memory address to the acceleration engine; that memory address is obtained as described in step 601. In this case the acceleration engine processes the service acceleration processing request, obtains a processing result, and writes the processing result into the memory space pointed to by the memory address sent by the CPU. If that memory space becomes full while the service acceleration processing request is not yet fully processed, the acceleration engine performs step 606 to apply to the cache management module for a target memory address, and then continues writing the processing result of the service acceleration processing request into the target memory address, i.e., performs step 607. After the acceleration engine finishes processing the service acceleration processing request, since the processing result has been written into both the memory space pointed to by the address sent by the CPU and the memory space pointed to by the target memory address, the acceleration engine sends both the CPU-supplied memory address and the target memory address to the service processing module, so that the CPU can retrieve the processing result from these addresses.
The embodiment mentioned above that adds a state indication field for memory addresses is now described as follows.
In some embodiments of the present invention, each memory address is further configured with a state indication field. Through this field, abnormal memory addresses can be recovered, improving the reliability of the method of the embodiments of the present invention.
Specifically, when the CPU takes the processing result produced by the FPGA out of the memory space, an application exception may for some reason prevent the CPU from notifying the FPGA to refresh the cache list; that is, steps 610 and 611 are not performed, and the memory address of that memory space is lost. The state indication field of a memory address contains information such as the most recent state of the memory address and a timestamp. Using the information in the state indication field, a verification module on the CPU can flush memory addresses lost through CPU processing exceptions back to the cache list, improving system reliability.
The use of the state indication field is illustrated below.
In some embodiments of the present invention, the state indication field includes a memory status word. The following description uses the target memory address as an example.
The target memory address is configured with a memory status word whose value may be acceleration-computing-unit-occupied or CPU-occupied. Acceleration-computing-unit-occupied indicates that the target memory address has been stored on the acceleration computing unit and is in use by the acceleration computing unit. CPU-occupied indicates that the processing result cached in the memory space pointed to by the target memory address can be read by the CPU, i.e., the target memory address is in use by the CPU.
In some embodiments of the present invention the acceleration computing unit is an FPGA, in which case acceleration-computing-unit-occupied is specifically FPGA-occupied.
It is understood that in some embodiments of the present invention the value of the memory status word may also include an idle state. The idle state of the memory status word indicates that the memory space of the target memory address is in its initial state and the target memory address has not yet been stored on the acceleration computing unit, e.g., has not yet been flushed into the cache list of the FPGA.
The value of the memory status word is set by the acceleration computing unit, for example by a result write-back submodule of the acceleration computing unit.
The operations by which the acceleration computing unit sets the memory status word are described above.
For example, in step 603, when the cache management module stores the target memory address on the acceleration computing unit, it sets the memory status word of the target memory address to acceleration-computing-unit-occupied. In step 608, or around step 608, after the acceleration engine of the FPGA writes the obtained processing result into the memory space pointed to by the target memory address, the cache management module sets the memory status word of the target memory address to CPU-occupied, so that its value switches from acceleration-computing-unit-occupied to CPU-occupied. And in step 611, when the cache management module stores the target memory address on the acceleration computing unit, it sets the memory status word of the target memory address back to acceleration-computing-unit-occupied, so that its value switches from CPU-occupied to acceleration-computing-unit-occupied.
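The alternation of the memory status word described above amounts to a small state machine. The following is an illustrative sketch only; the names (`MemStatus`, `flush_to_fpga`, `result_written`) are hypothetical and not from the patent.

```python
from enum import Enum

class MemStatus(Enum):
    IDLE = 0           # initial state: address not yet flushed into the cache list
    FPGA_OCCUPIED = 1  # address stored on the acceleration computing unit (steps 603/611)
    CPU_OCCUPIED = 2   # result written; address readable by the CPU (step 608)

# status_words[addr] tracks the memory status word of the block at addr
status_words = {}

def flush_to_fpga(addr: int) -> None:
    """Cache management module stores the address on the FPGA (steps 603 and 611)."""
    status_words[addr] = MemStatus.FPGA_OCCUPIED

def result_written(addr: int) -> None:
    """Acceleration engine wrote the processing result (around step 608)."""
    status_words[addr] = MemStatus.CPU_OCCUPIED

addr = 0x200000000
status_words[addr] = MemStatus.IDLE
flush_to_fpga(addr)    # step 603: FPGA-occupied
result_written(addr)   # step 608: CPU-occupied
flush_to_fpga(addr)    # step 611: back to FPGA-occupied
```

On a healthy address the value keeps alternating between the two occupied states; it is precisely this alternation that the verification module below relies on.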
The CPU further includes a verification module, which verifies and recovers memory addresses.
The specific process of verifying and recovering a memory address is as follows.
After step 608, the method of the embodiment of the present invention further includes:
Step E1: The verification module detects whether the value of the memory status word of the target memory address has remained CPU-occupied for longer than a preset time; if so, step E2 is performed.
If the target memory address has not been lost, then by the setting procedure for the memory status word described above, the target memory address is used alternately by the acceleration computing unit and the CPU, and the value of its memory status word alternates between acceleration-computing-unit-occupied and CPU-occupied. Therefore, if the value of the memory status word has remained CPU-occupied for longer than the preset time, the target memory address is likely a lost memory address: when the CPU took the processing result produced by the FPGA out of the memory space, the target memory address of that memory space was lost by the CPU and no longer participates in the usage cycle between the CPU and the acceleration computing unit. Such a target memory address needs to be recovered. If the value of the memory status word has not remained CPU-occupied for longer than the preset time, the target memory address is not regarded as lost and need not be recovered.
The preset time may be determined empirically or from experimental statistics and configured in advance by the user.
Step E2: The service processing module sends the target memory address to the cache management module.
If the value of the memory status word of the target memory address has remained CPU-occupied for longer than the preset time, the service processing module of the CPU recovers the target memory address: it sends the target memory address to the cache management module so that the FPGA can reuse it.
Step E3: The cache management module stores the target memory address on the acceleration computing unit.
Because the value of the memory status word of the target memory address has remained CPU-occupied for longer than the preset time, the cache management module obtains the target memory address sent by the CPU and stores it on the acceleration computing unit, so that the target memory address is reused. This improves the reliability of the system and avoids the drop in memory utilization that memory loss would cause.
There are various specific ways for the verification module to detect whether the value of the memory status word of the target memory address has remained CPU-occupied for longer than the preset time, for example:
Example one:
When the cache management module sets the value of the memory status word of the target memory address, it also records a setting time for the memory status word; the setting time records when the value of the memory status word was set.
Step E1 then specifically includes: when the verification module detects that the value of the memory status word of the target memory address is CPU-occupied, it judges whether the difference between the setting time of the memory status word and the current time exceeds the preset time. If it does, the value of the memory status word of the target memory address has remained CPU-occupied for longer than the preset time, and step E2 is performed.
The verification module may perform this detection once every preset interval. The current time is the time at which the verification module detects that the value of the memory status word of the target memory address is CPU-occupied.
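Example one can be sketched in a few lines. This is a minimal illustration under assumed names and an assumed (hypothetical) preset time; it only shows the setting-time comparison, not the periodic scheduling.

```python
import time

PRESET_TIME = 3.0  # seconds; in practice chosen empirically (hypothetical value)

class StatusWord:
    def __init__(self):
        self.value = "FPGA"  # "FPGA" (acceleration-computing-unit-occupied) or "CPU"
        self.setting_time = time.monotonic()

    def set(self, value: str) -> None:
        self.value = value
        self.setting_time = time.monotonic()  # record when the value was set

def needs_recovery(word: StatusWord, now: float) -> bool:
    """Step E1, example one: has the word stayed CPU-occupied past the preset time?"""
    return word.value == "CPU" and (now - word.setting_time) > PRESET_TIME

word = StatusWord()
word.set("CPU")
# Checked shortly after the switch: not timed out, no recovery needed.
fresh = needs_recovery(word, word.setting_time + 1.0)
# Checked long after the switch: the address is considered lost (go to step E2).
stale = needs_recovery(word, word.setting_time + 10.0)
```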
Example two:
In some embodiments of the present invention, the state indication field includes a memory status word, a check status word, and a state synchronization time.
The target memory address is thus also configured with a check status word.
The check status word corresponds to the state synchronization time; the state synchronization time indicates when the value of the check status word was last synchronized to the value of the memory status word.
The value of the check status word of the target memory address is obtained by synchronizing it, under a synchronization condition, to the value of the memory status word of the target memory address; the synchronization condition is that the value of the check status word differs from the value of the memory status word.
"The value of the memory status word of the target memory address has remained CPU-occupied for longer than the preset time" then specifically means: the value of the memory status word and the value of the check status word of the target memory address are both CPU-occupied, and the difference between the state synchronization time of the target memory address and the current time exceeds the preset time, where the current time is the time at which both values are detected as CPU-occupied.
In a specific implementation, the detection in step E1 of whether the value of the memory status word of the target memory address has remained CPU-occupied for longer than the preset time specifically includes:
Every preset interval, the verification module judges whether the value of the check status word of the target memory address is identical to the value of the memory status word of the target memory address.
If the two values differ, the verification module synchronizes the value of the check status word to the value of the memory status word and updates the state synchronization time of the target memory address.
If the two values are identical, then when the verification module detects that the value of the check status word is CPU-occupied, the difference between the state synchronization time and the current time exceeds the preset time, and the value of the memory status word has not changed within the preset interval, this indicates that the value of the memory status word has remained CPU-occupied for longer than the preset time, so the service processing module performs the step of sending the target memory address to the cache management module.
Here the current time is the time at which the verification module detects that the value of the check status word of the target memory address is CPU-occupied; because at that moment the check status word and the memory status word have identical values, it is also the time at which both values are detected as CPU-occupied. The difference between the state synchronization time and the current time exceeding the preset time indicates that the state synchronization time has timed out.
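The synchronize-then-time-out procedure of example two can be sketched as follows. The names and the timeout value are hypothetical, and the sketch deliberately omits the timestamp/sequence-number refinement used to detect changes within an interval.

```python
PRESET_TIME = 180.0  # hypothetical timeout, in seconds

class AddressState:
    def __init__(self, now: float):
        self.mem_word = "FPGA"    # memory status word, set on the FPGA side
        self.check_word = "FPGA"  # check status word, maintained by the CPU
        self.sync_time = now      # state synchronization time

def periodic_check(state: AddressState, now: float) -> bool:
    """One pass of the verification module, run every preset interval.
    Returns True when the address should be recovered (i.e., step E2)."""
    if state.check_word != state.mem_word:
        # Synchronization condition met: copy the memory status word into
        # the check status word and update the state synchronization time.
        state.check_word = state.mem_word
        state.sync_time = now
        return False
    # Values identical: recover when the check word is CPU-occupied and the
    # state synchronization time has timed out.
    return state.check_word == "CPU" and (now - state.sync_time) > PRESET_TIME

state = AddressState(0.0)
state.mem_word = "CPU"                 # FPGA wrote the result (step 608)
first = periodic_check(state, 10.0)    # words differ: synchronize, no recovery
second = periodic_check(state, 400.0)  # still CPU-occupied past timeout: recover
```

Because the CPU only ever compares values it wrote itself (check word and synchronization time), it never has to interpret the FPGA's clock, which is the efficiency point made below.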
There are various ways to detect whether the value of the memory status word of the target memory address has changed within the preset interval. For example, when the FPGA sets the value of the memory status word of the target memory address, it also sets a timestamp for the memory status word. The timestamp holds a time value that increases monotonically with the internal clock of the FPGA; that is, the timestamp records when the value of the memory status word was set.
Because the verification module only checks the memory status word periodically, at the preset time (e.g., every 3 seconds), the state indication may change several times within those 3 seconds, for example from CPU-occupied to FPGA-occupied and back to CPU-occupied. Using the timestamp, the verification module can determine whether the CPU-occupied state indication has in fact changed in between.
Alternatively, when the cache management module sets the value of the memory status word of the target memory address, it also maintains a state sequence number for the memory status word: each time the memory status word of the memory address changes, the cache management module increments the sequence number by one; the sequence number may wrap around cyclically. Using the state sequence number, the verification module can likewise determine whether the CPU-occupied state indication has changed.
To allow judgments based on the timestamp or the state sequence number, the check status word of the embodiment of the present invention is also configured with a check timestamp or a check sequence number. When the value of the check status word is synchronized to the value of the memory status word, the check timestamp is synchronized to the timestamp of the memory status word, or the check sequence number is synchronized to the state sequence number of the memory status word. Subsequently, by comparing the check timestamp with the timestamp of the memory status word, or the check sequence number with the state sequence number, it can be determined whether the value of the memory status word of the target memory address changed within the preset interval: if the check timestamp equals the timestamp of the memory status word, or the check sequence number equals the state sequence number, the memory status word has not changed; otherwise it has changed.
In example two, the check status word and the state synchronization time are used to judge whether the CPU-occupied state has timed out. Because the check status word and the state synchronization time are set by the CPU, the CPU can use them directly, which improves its processing efficiency. If, instead, the CPU judged the CPU-occupied timeout of the memory status word directly from the timestamp set by the FPGA, the CPU would need to know the FPGA's clock-stamping method and convert it into CPU time units (e.g., seconds), which works against processing efficiency. Moreover, to save logic resources the timestamp is only 30 bits wide. Because the timeout is relatively long, for example 10 minutes, the timestamp may wrap around within those 10 minutes, whereas the check interval is only, say, 3 seconds; judging the CPU-occupied timeout with the check status word and the state synchronization time ensures the timestamp cannot wrap within those 3 seconds.
Example three:
When the cache management module sets the value of the memory status word of the target memory address to CPU-occupied, it sets a countdown-start flag for the memory status word; the countdown-start flag triggers a countdown of the preset duration. When the cache management module sets the value of the memory status word of the target memory address to acceleration-computing-unit-occupied, it sets a countdown-cancel flag for the memory status word; the countdown-cancel flag cancels the countdown of the preset duration.
Step E1 then specifically includes: when the verification module detects the countdown-start flag of the memory status word of the target memory address, it starts a countdown of the preset duration. If the countdown reaches zero, the value of the memory status word of the target memory address has remained CPU-occupied for longer than the preset time, and step E2 is performed. If, before the countdown reaches zero, the verification module detects the countdown-cancel flag of the memory status word of the target memory address, the countdown of the preset duration is cancelled.
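Example three reduces to arming and cancelling a timer. A tick-based sketch under hypothetical names, where one tick stands for one check interval:

```python
PRESET_TICKS = 3  # countdown length in check intervals (hypothetical)

class Countdown:
    """Example three: a countdown armed when the word becomes CPU-occupied
    and cancelled when the address returns to the acceleration computing unit."""
    def __init__(self):
        self.remaining = None  # None: no countdown running

    def start_flag(self) -> None:
        # set when the memory status word becomes CPU-occupied
        self.remaining = PRESET_TICKS

    def cancel_flag(self) -> None:
        # set when the memory status word becomes FPGA-occupied again
        self.remaining = None

    def tick(self) -> bool:
        """Advance one interval; True means the countdown hit zero (step E2)."""
        if self.remaining is None:
            return False
        self.remaining -= 1
        return self.remaining <= 0

cd = Countdown()
cd.start_flag()
expired_early = cd.tick()  # one tick in: not expired yet
cd.cancel_flag()           # the address went back to the FPGA in time
cancelled = cd.tick()      # countdown no longer running, never expires
```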
The statements that the target memory address is configured with a memory status word and also configured with a check status word mean that the target memory address has a correspondence with the memory status word and a correspondence with the check status word, so that the memory status word and the check status word of the target memory address can be identified.
For example, the correspondence between a memory address and its memory status word can be realized by start address plus offset. Suppose the memory block size is 4 MB, there are 256 blocks in total, and the start address is 0x10000. The memory status words likewise occupy 256 entries, each memory status word being 4 B. The first memory status word then holds the state of the first memory block, the second status word holds the state of the second memory block, and so on. Given a memory block address A, (A - 0x10000) / 4MB gives the block index, which identifies the memory status word to refresh. The check status word is handled in the same way.
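The start-address-plus-offset correspondence above is a one-line computation. A sketch using the numbers from the example (the function name is hypothetical):

```python
START_ADDR = 0x10000          # start address from the example above
BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB per memory block
NUM_BLOCKS = 256

def status_word_index(block_addr: int) -> int:
    """Index of the memory status word (or check status word) for a block address."""
    index = (block_addr - START_ADDR) // BLOCK_SIZE
    assert 0 <= index < NUM_BLOCKS, "address outside the managed range"
    return index

first = status_word_index(0x10000)                # first block -> status word 0
second = status_word_index(0x10000 + BLOCK_SIZE)  # second block -> status word 1
```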
It is understood that in some embodiments of the present invention, before step 601, the data processing method of the embodiment of the present invention may further include the following steps:
Step F1: Configure the memory address range usable by the FPGA, the maximum number of memory blocks, and the size and alignment of each memory block.
Step F2: Configure whether the FPGA enables address compression.
Step F3: Configure whether the FPGA enables address checking.
Step F4: Configure whether the FPGA enables the memory state indication update function; if it is enabled, the start position of the memory status words must also be configured. The memory state indication update function refers to recovering memory addresses using information such as the memory status words.
Step F5: Configure the FPGA's default memory address refresh policy, for example whether the cache management module places a memory address obtained from the CPU at the head or the tail of the cache list.
It is understood that any one or more of steps F1-F5 may be performed.
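Steps F1-F5 amount to a small configuration record handed to the FPGA driver. A sketch with hypothetical field names, populated with the values used in the scenario example later in the text:

```python
from dataclasses import dataclass

@dataclass
class FpgaMemConfig:
    # F1: address range, block count, block size and alignment
    addr_range: tuple = (0x200000000, 0x240000000)
    max_blocks: int = 256
    block_size: int = 4 * 1024 * 1024
    alignment: int = 4 * 1024 * 1024
    # F2 / F3: optional address compression and address checking
    enable_compression: bool = True
    enable_check: bool = True
    # F4: memory state indication update; requires the status word base address
    enable_state_update: bool = True
    status_word_base: int = 0x400000000
    # F5: default refresh policy for the cache list
    refresh_policy: str = "FIFO"

cfg = FpgaMemConfig()
# F4's dependency: a status word base address only matters when the
# state indication update function is enabled.
valid = (not cfg.enable_state_update) or cfg.status_word_base is not None
```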
In conclusion FPGA includes accelerating engine and caching management module, caching management module is used for managing internal memory address,
On FPGA, accelerating engine obtains the business that CPU is sent and accelerates processing request, and then, accelerating engine asks business acceleration processing
It asks and is handled, to obtain processing result.Accelerating engine is to caching management module application memory address, with obtaining target memory
Location, wherein accelerate computing unit to prestore at least one memory address, target memory address, which belongs to, to be accelerated to prestore on computing unit
Memory address.To which FPGA is during processing business accelerates processing request, without by interrupting to CPU application memory
Location, but memory address is directly obtained from FPGA, improve process performance.Then, mesh is written in processing result by accelerating engine
It marks memory headroom and accelerating engine that memory address is directed toward and sends target memory address to CPU.
In this way, setting is used for the caching management module of managing internal memory address on accelerating computing unit, accelerate computing unit
Accelerating engine business that CPU is sent accelerate processing request to handle, to obtain processing result.Accelerating engine needs to store
The processing result, for this purpose, accelerating engine can obtain being pre-stored on acceleration computing unit to caching management module application memory address
Target memory address, then, by the processing result be written the target memory address direction memory headroom on, accelerating engine to
After CPU sends target memory address, CPU can read the processing result by the target memory address.Because of accelerating engine
To the acquisition of memory address by realizing to caching management module application, the accelerating engine and caching management module are located at same
Accelerate computing unit on, accelerating engine application to memory address be pre-stored in accelerate computing unit on memory address, FPGA
Accelerating engine when need, directly from caching management module application memory address, interrupt so not having to send to CPU application
Memory.In this way, accelerating engine can quick obtaining to memory address, to store the processing knot that processing business accelerates processing request to obtain
Fruit.Compared with accelerating engine is to the scheme of CPU application memory address, the scheme of the embodiment of the present invention is without waiting for CPU to memory
The response of address.Since caching management module is to accelerate computing unit internal module, it is that computing unit programming is accelerated to realize, accelerates
The accelerating engine of computing unit only needs several hardware beats from cache module application memory address, compares from CPU application tens
The accelerating engine of several hundred microseconds, the embodiment of the present invention can be ignored from the time of cache module application memory address.To,
The method of the embodiment of the present invention reduces the waiting time that accelerating engine obtains memory address, so as to quick obtaining memory
Location realizes the raising of service process performance to use the memory address to store processing result.
For a more intuitive understanding of the data processing method of the embodiments shown in Fig. 5 and Fig. 6, a concrete scenario example of the data processing method is given below, as shown in Figure 10. In this scenario, the acceleration computing unit is an FPGA, which may specifically be a compression card. The FPGA includes two acceleration engines.
With reference to Figure 10 and the embodiments above, and referring to Figure 11, the data processing method of the embodiment of the present invention includes:
Step 1101: initial configuration stage.
At initialization, the memory module of the CPU applies to the OS for a contiguous 1 GB physical address space, for example reserved through huge pages. The start address of the physical memory space is 0x200000000 and its size is 1 GB; the entire 1 GB space is divided into 256 memory blocks of 4 MB each.
In addition, the memory module applies for space for 256 memory status words, one memory status word per memory block. The start physical address of the memory status word space is 0x400000000 and its size is 1 KB. In the initial configuration stage, the memory module sets the value of all 256 memory status words to idle.
As shown in Figure 12, at initialization the driver of the FPGA configures the memory address range usable by the FPGA as 0x200000000~0x240000000, with a memory block size of 4 MB and block addresses aligned to 4 MB, and configures the cache list refresh algorithm as first in, first out (FIFO). It configures the FPGA to enable the address compression function and the address check function, enables the memory state indication update function, and configures the start position of the memory status words as 0x400000000. The cache management module initializes the cache list as empty, with a size of 1 KB, able to store 256 memory addresses.
Step 1102: The service processing module executed on the CPU applies to the memory module for memory. The memory module returns to the service processing module memory addresses whose memory status word is idle, such as 0x200000000, 0x200400000, 0x200800000, and so on.
Step 1103: The service processing module sends memory addresses to the FPGA.
Specifically, the service processing module executed on the CPU constructs a message BD containing a memory address; this message BD notifies the FPGA to flush the available memory address into its cache list. The message format includes a message type and a memory address. As shown in Table 1 below, the message type is 1 and the memory address is 0x200000000.
Table 1
1 | 0x200000000 |
Step 1103 is repeated; the CPU sends refresh messages for all 256 memory addresses to the FPGA.
Step 1104: The cache management module stores the memory address onto the FPGA. After the cache management module of the FPGA receives a refresh message from the CPU, it checks the legitimacy of the message. After the validity check passes, it parses the memory address out of the message BD. Then, according to the configured management method, it compresses the memory address, e.g., 0x200000000 is compressed to 0x2000, and calculates the check value of the memory address. The check value is calculated as follows:
bit30 = parity of the compressed address value: if the number of 1 bits in the memory address is odd, the parity value is 1, otherwise 0.
bit31 = bit30 inverted.
For example, memory address 0x200000000 compresses to 0x2000; the number of 1 bits in 0x2000 is odd, so its parity value is 1. Bit 30 of the stored value is therefore 1, and bit 31 stores its inverse, 0. After address compression and the addition of the address check, memory address 0x200000000 is stored in the cache list as the value 0x40002000, as shown in Figure 13.
As another example, memory address 0x200400000 compresses to 0x2004; the number of 1 bits in 0x2004 is even, so the parity value is 0, bit 30 stores 0, and bit 31 stores its inverse, 1. After address compression and the addition of the address check, memory address 0x200400000 is stored in the cache list as the value 0x80002004, as shown in Figure 14.
The conversion of the other memory addresses is similar to the examples above.
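The compression and check-bit scheme of step 1104 can be reproduced in a few lines. The right shift by 20 bits is an assumption inferred from the worked examples (0x200000000 → 0x2000, 0x200400000 → 0x2004); the parity rule and the stored values follow the text.

```python
BIT30 = 1 << 30
BIT31 = 1 << 31
SHIFT = 20  # inferred from the examples: 0x200000000 >> 20 == 0x2000

def pack(addr: int) -> int:
    """Compress an address and add the bit30/bit31 check bits (step 1104)."""
    compressed = addr >> SHIFT
    parity = bin(compressed).count("1") & 1  # 1 if the number of 1 bits is odd
    # bit30 stores the parity; bit31 stores the inverted parity
    return compressed | (parity << 30) | ((parity ^ 1) << 31)

def unpack(value: int) -> int:
    """Verify and strip the check bits, then restore the full address (step 1105)."""
    compressed = value & (BIT30 - 1)
    parity = bin(compressed).count("1") & 1
    assert ((value >> 30) & 1) == parity and ((value >> 31) & 1) == (parity ^ 1), \
        "check bits do not match: corrupted cache list entry"
    return compressed << SHIFT

packed = pack(0x200000000)  # -> 0x40002000, as in Figure 13
restored = unpack(packed)   # -> 0x200000000
```

Packing a 35-bit physical address into a 32-bit cache list entry is what lets the 256-entry cache list fit in 1 KB, while the two redundant bits catch single-entry corruption on the way back out.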
The address value obtained by converting the original memory address (e.g., 0x40002000 for 0x200000000) is placed into the cache list according to the refresh policy, and the memory status word of the original memory address is set to FPGA-occupied. The structure of the memory status word is shown in Table 2. The idle state exists only in the initialization phase. The memory status word is updated by the FPGA module; when updating the state indication, the FPGA writes the current internal clock value of the FPGA into bits 29~0.
Table 2:
Step 1104 is repeated until the refresh messages of step 1103 have all been processed. After processing is complete, the mapping relation shown in Figure 15 is obtained.
Step 1105: accelerating engine 1 processes a decompression request and applies for a memory address.
Accelerating engine 1 receives the decompression request sent by the CPU and applies to the cache management module for memory in which to store the decompression result. The cache management module takes a free entry from the cache list, removes the check bits at bit 30 and bit 31, and restores the low-order alignment bits, reducing the stored value 0x40002000 back to the address 0x200000000, which it returns to accelerating engine 1.
Step 1106: accelerating engine 2 processes a decompression request and applies for a memory address.
Accelerating engine 2 receives the decompression request sent by the CPU and applies to the cache management module for memory in which to store the decompression result. Processing as in step 1105, the cache management module returns memory address 0x200400000 to accelerating engine 2.
Step 1107: accelerating engine 1 applies for a memory address again.
During decompression, accelerating engine 1 finds that the 4 MB space at memory address 0x200000000 has been written full while decompression is not yet complete, so it again applies to the cache management module for memory to store the decompression result; the cache management module returns the memory address 0x200800000 to accelerating engine 1.
Step 1108: accelerating engine 2 sends a memory address to the CPU.
Accelerating engine 2 finishes decompressing and constructs a message BD to notify the CPU; the message contains the address list storing the decompression result, 0x200400000, and the decompressed length, 3.2 MB. The FPGA then sets the memory state word corresponding to memory address 0x200400000 to the value 0x400000004, namely CPU occupied.
Step 1109: the CPU obtains the memory address sent by accelerating engine 2.
The CPU receives the decompression-complete message of step 1108, parses the BD, and obtains the memory address caching the decompression result together with the result length. It calls the mmap function to map the physical address 0x200400000 to a virtual address that software can access, and proceeds with the next stage of processing.
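Step 1109's mapping can be sketched as below. The page-alignment arithmetic is standard for mmap, whose offsets must be multiples of the page size; the commented-out /dev/mem mapping is an assumption about how a physical address might be mapped on Linux (it needs root privileges), so only the arithmetic is exercised live.

```python
import mmap  # used only by the commented-out privileged mapping below

PAGE_SIZE = 4096

def page_align(phys_addr: int, length: int):
    """Return (aligned_offset, aligned_length, delta) for an mmap call."""
    delta = phys_addr % PAGE_SIZE
    aligned_offset = phys_addr - delta
    aligned_length = -(-(length + delta) // PAGE_SIZE) * PAGE_SIZE  # round up
    return aligned_offset, aligned_length, delta

# Map the 3.2 MB decompression result at physical address 0x200400000:
offset, length, delta = page_align(0x200400000, int(3.2 * 1024 * 1024))
print(hex(offset), hex(length), delta)  # 0x200400000 0x334000 0

# with open("/dev/mem", "rb") as f:                    # needs privileges
#     buf = mmap.mmap(f.fileno(), length,
#                     prot=mmap.PROT_READ, offset=offset)
#     data = buf[delta:delta + int(3.2 * 1024 * 1024)]
```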
Step 1110: the CPU notifies the FPGA to refresh the cache list.
The CPU has read the decompressed data of step 1109, so memory address 0x200400000 can be reused; the CPU constructs a refresh message notifying the FPGA to refresh the cache list.
The processing of this refresh message is the same as in step 1103.
Step 1111: the cache management module stores the memory address on the FPGA.
The FPGA receives the refresh message of step 1110 from the CPU, and the cache management module stores the memory address on the FPGA. The processing is the same as in step 1104.
Step 1112: accelerating engine 1 sends memory addresses to the CPU.
Accelerating engine 1 finishes decompressing and constructs a message BD to notify the CPU; the message BD contains the address list storing the decompression result, 0x200000000 and 0x200800000, and the decompressed length, 7 MB. The FPGA then sets the memory state words corresponding to memory addresses 0x200000000 and 0x200800000 to the values 0x400000000 and 0x400000008 respectively, namely CPU occupied.
Step 1113: the CPU obtains the memory addresses sent by accelerating engine 1.
The CPU receives the decompression-complete message BD of step 1112, parses it, and obtains the decompression result addresses and length. It calls the mmap function to map the physical addresses 0x200000000 and 0x200800000, a total space of 8 MB, to a continuous virtual address range that software can access, and proceeds with the next stage of processing.
Step 1114: the CPU notifies the FPGA to refresh the cache list.
The CPU has read the decompressed data of step 1113, so memory addresses 0x200000000 and 0x200800000 can be reused; the CPU constructs a refresh message for each of the two addresses, notifying the FPGA to refresh the cache list.
The processing of these refresh messages is the same as in step 1103.
Step 1115: the cache management module stores the memory addresses on the FPGA.
The FPGA receives the refresh messages from the CPU; the processing is the same as in step 1104.
When reading data, the service processing module executing on the CPU may exit abnormally for some reason, so that the memory addresses of the FPGA's memory blocks are never flushed back to the cache list by a notification. A memory checking module is therefore provided, executed by the CPU, which can check and reclaim memory blocks according to their memory state words. The check-and-recover method is as follows.
As shown in Figure 16, when the memory checking module initializes, it allocates a space of the same size as the memory state words to hold check status words, synchronizes the memory state words into the check status words, and records the state synchronization time corresponding to each check status word.
The check module traverses all memory state words at a preset interval (for example, every 3 seconds) and compares each with its check status word.
If a memory state word and its check status word differ, the check status word is set to the value of the memory state word, and the state synchronization time of that check status word is updated.
If a memory state word and its check status word are the same, the state indication bits of the check status word are examined. If they indicate FPGA occupied, no processing is needed. If they indicate CPU occupied, the difference between the check status word's state synchronization time and the current time is compared with a preset timeout (for example, 10 minutes); if the difference exceeds the timeout, the memory address corresponding to that status word is reclaimed by flushing it back to the cache list with a refresh message.
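The check-and-recover traversal just described can be sketched as follows. The data structures, field names, and the recycle callback are illustrative assumptions, with times in seconds; the scenario at the bottom mirrors the worked example later in the text, where a lost address was last synchronized at time 1000003.

```python
CHECK_INTERVAL = 3   # preset check interval from the example (seconds)
TIMEOUT = 10 * 60    # preset timeout after which a CPU-held address is leaked

def check_once(mem_words, chk_words, sync_times, now, recycle):
    """One traversal of the check module over all memory state words."""
    for addr, mem_word in mem_words.items():
        if mem_word != chk_words.get(addr):
            # Differ: synchronize the check status word and restamp the time.
            chk_words[addr] = dict(mem_word)
            sync_times[addr] = now
        elif mem_word["state"] == "CPU" and now - sync_times[addr] > TIMEOUT:
            # Same, CPU occupied, and stale past the timeout: reclaim it.
            recycle(addr)
        # Same and FPGA occupied: nothing to do.

# Second address lost at time 1000003; first address in normal use.
mem = {1: {"state": "CPU", "clock": 1100}, 2: {"state": "CPU", "clock": 106}}
chk = {1: {"state": "CPU", "clock": 1100}, 2: {"state": "CPU", "clock": 106}}
sync = {1: 1000606, 2: 1000003}
leaked = []
check_once(mem, chk, sync, now=1000606, recycle=leaked.append)
print(leaked)  # [2]: only the second (lost) address is reclaimed
```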
A concrete example of the memory address state indication fields follows.
Assume the preset check interval is 3 seconds and the timeout is 10 minutes. There are two memory state words, with the values shown in Table 3, where 100 and 98 are the timestamps at which the state indication bits were last updated.
Table 3:
|                   | First memory address | Second memory address |
| Memory state word | CPU occupied + 100   | FPGA occupied + 98    |
The check module performs the 1st check. Because the check status words still hold their initialization values, they differ from the memory state words, so the check module updates each check status word to the value of the corresponding memory state word, as shown in Table 4. Note that the state synchronization time is CPU time; it is distinct from the timestamp inside the memory state word, which is written by the FPGA.
Table 4:
The check module performs the 2nd check and reads the memory state words shown in Table 5.
Table 5:
|                   | First memory address | Second memory address |
| Memory state word | CPU occupied + 105   | CPU occupied + 106    |
At this point the check status words and the memory state words again differ, so the check module updates the check status words to the values of the memory state words, as shown in Table 6.
Table 6:
During the subsequent execution, the service processing module of the CPU exits abnormally: after the acceleration computing unit sends the second memory address to the service processing module and the service processing module reads the processing result from the memory space the second memory address points to, the second memory address is never released back to the FPGA. The FPGA performs no operation on the second memory address, so the value of its memory state word is never updated again.
The check module performs the 3rd check and reads the memory state words shown in Table 7.
Table 7:
|                   | First memory address | Second memory address |
| Memory state word | FPGA occupied + 108  | CPU occupied + 106    |
Because the check status word and the memory state word of the first memory address differ, the check status word of the first memory address is updated to the value of its memory state word, and its state synchronization time is updated. The check status word of the second memory address, and its state synchronization time, remain unchanged. The result is shown in Table 8.
Table 8:
|                            | First memory address | Second memory address |
| Check status word          | FPGA occupied + 108  | CPU occupied + 106    |
| State synchronization time | 1000006              | 1000003               |
At this point the check status word of the second memory address is unchanged and its state indication bits read CPU occupied. The last update time of that check status word (its state synchronization time) is 1000003 seconds, which differs from the current time by only 3 seconds. The timeout has not been reached, so the check module takes no action.
The check module performs the 4th through 201st checks. Because the second memory address is in a lost state, the FPGA performs no operation on it, so the value of its memory state word is never updated during this period. Across the 4th through 201st checks, the first memory address is in normal use, so its memory state word keeps changing and its check status word changes along with it; the second memory address, being lost, keeps a memory state word of CPU occupied throughout. Since in this embodiment the check interval is 3 seconds and the preset timeout is 10 minutes, the difference between the second memory address's state synchronization time and the current time remains below the timeout throughout the 4th through 201st checks. That is, the time for which the second memory address's memory state word has continuously read CPU occupied is still less than the timeout, so the CPU does not yet reclaim the second memory address. Throughout the 4th through 201st checks, then, the second memory address's memory state word reads CPU occupied, its check status word likewise remains CPU occupied, and the check status word's state synchronization time stays at 1000003 seconds.
The check module performs the 202nd check and reads the memory state words shown in Table 9.
Table 9:
|                   | First memory address | Second memory address |
| Memory state word | CPU occupied + 1100  | CPU occupied + 106    |
Because the check status word and the memory state word of the first memory address differ, the check status word of the first memory address is updated to the value of its memory state word, and its state synchronization time is updated. The check status word of the second memory address, and its state synchronization time, remain unchanged. The execution result is shown in Table 10.
Table 10:
|                            | First memory address | Second memory address |
| Check status word          | CPU occupied + 1100  | CPU occupied + 106    |
| State synchronization time | 1000606              | 1000003               |
At this point the check status word of the second memory address is still unchanged and its state indication bits read CPU occupied. The last update time of its check status word is 1000003 seconds, a difference of 603 seconds from the current time, which exceeds the configured 10-minute timeout, so the CPU reclaims the second memory address. After the second memory address is reclaimed, the FPGA updates its memory state word, and the memory state word of the second memory address returns to normal.
The check module performs the 203rd check and reads the memory state words shown in Table 11.
Table 11:
At this point, for both memory addresses, the check status words and the memory state words differ, so the check status words are updated to the values of the memory state words; the result is shown in Table 12.
Table 12:
In conclusion increasing caching management module inside FPGA, the caching management module is for managing in cache list
Memory address, these memory address be CPU be supplied to the memory address that FPGA is used.To which FPGA is in processing decompression request etc.
During business accelerates processing request, interrupt without sending to CPU application memory address, but directly from caching management module
Apply for memory address, directly memory address is obtained from cache list, to improve process performance.
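The interplay summarized above can be sketched as a small model of the cache management module: the CPU-side refresh pushes addresses into the cache list, and an accelerating engine pops a free address without ever interrupting the CPU. Class and method names are illustrative, not from the patent, and address compression is omitted for brevity.

```python
from collections import deque

class CacheManager:
    """Assumed FPGA-side cache management module: holds memory addresses that
    the CPU has handed over, so accelerating engines can obtain one locally."""

    def __init__(self):
        self._free = deque()   # the cache list of free memory addresses
        self.state = {}        # simplified memory state word per address

    def refresh(self, addr: int) -> None:
        """Handle a CPU refresh message: store the address on the FPGA."""
        self._free.append(addr)
        self.state[addr] = "FPGA"   # occupied by FPGA

    def apply(self) -> int:
        """Serve an engine's address request without interrupting the CPU."""
        if not self._free:
            raise RuntimeError("cache list empty: CPU must refresh first")
        return self._free.popleft()

    def handed_to_cpu(self, addr: int) -> None:
        """Mark the address as holding a result the CPU will read."""
        self.state[addr] = "CPU"

mgr = CacheManager()
for a in (0x200000000, 0x200400000, 0x200800000):
    mgr.refresh(a)
print(hex(mgr.apply()))  # 0x200000000, as in step 1105
print(hex(mgr.apply()))  # 0x200400000, as in step 1106
```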
Figure 17 is a structural diagram of an acceleration computing unit provided by an embodiment of the present invention. This acceleration computing unit can execute the methods attributed to the acceleration computing unit in the embodiments above.
As shown in Figure 17, the acceleration computing unit 1700 includes an accelerating engine 1701 and a cache management module 1702, the cache management module 1702 being used to manage memory addresses;
the accelerating engine 1701 is configured to obtain a service acceleration processing request sent by a central processing unit (CPU);
the accelerating engine 1701 is further configured to process the service acceleration processing request to obtain a processing result;
the accelerating engine 1701 is further configured to apply to the cache management module 1702 for a memory address and obtain a target memory address, the acceleration computing unit prestoring at least one memory address and the target memory address belonging to the memory addresses prestored on the acceleration computing unit;
the accelerating engine 1701 is further configured to write the processing result into the memory space that the target memory address points to;
the accelerating engine 1701 is further configured to send the target memory address to the CPU.
Optionally,
the accelerating engine 1701 is further configured to apply to the cache management module for a memory address and obtain a first memory address, the first memory address being one of the memory addresses prestored on the acceleration computing unit;
the accelerating engine 1701 is further configured to write a first processing result into the memory space the first memory address points to, the first processing result belonging to the processing result;
the accelerating engine 1701 is further configured to, when the memory space the first memory address points to has been fully written and the service acceleration processing request is not yet fully processed, apply to the cache management module for a memory address and obtain a second memory address, the second memory address being one of the memory addresses prestored on the acceleration computing unit;
the accelerating engine 1701 is further configured to write a second processing result into the memory space the second memory address points to, the second processing result belonging to the processing result;
the accelerating engine 1701 is further configured to send the first memory address and the second memory address to the CPU.
Optionally,
before the accelerating engine 1701 obtains the service acceleration processing request sent by the CPU, the cache management module 1702 is configured to obtain the target memory address sent by the CPU;
the cache management module 1702 is further configured to store the target memory address on the acceleration computing unit.
Optionally,
after the accelerating engine 1701 sends the target memory address to the CPU, the cache management module 1702 is further configured to obtain the target memory address sent back by the CPU, the processing result in the memory space the target memory address points to having been read by the CPU;
the cache management module 1702 is further configured to store the target memory address on the acceleration computing unit again.
Optionally,
the memory addresses prestored on the acceleration computing unit are memory addresses compressed according to their alignment bits;
the accelerating engine 1701 is further configured to send a memory address request to the cache management module 1702, the memory address request being used to request a memory address from the cache management module;
the cache management module 1702 is further configured to decompress the target memory address according to the alignment bits to obtain the decompressed target memory address;
the accelerating engine 1701 is further configured to obtain the decompressed target memory address sent by the cache management module.
Optionally,
the memory addresses prestored on the acceleration computing unit carry preset check values;
the accelerating engine 1701 is further configured to send a memory address request to the cache management module, the memory address request being used to request a memory address from the cache management module;
the cache management module 1702 is further configured to calculate the check value of the target memory address;
the cache management module 1702 is further configured to, when the calculated check value matches the preset check value of the target memory address, send the target memory address to the accelerating engine, so that the accelerating engine obtains the target memory address.
Optionally,
the target memory address is configured with a memory state word whose value includes "acceleration computing unit occupied" and "CPU occupied", the value of the memory state word being set by the acceleration computing unit;
"acceleration computing unit occupied" indicates that the target memory address has been stored on the acceleration computing unit and is in use by the acceleration computing unit;
"CPU occupied" indicates that the processing result cached in the memory space the target memory address points to can be read by the CPU, the target memory address being in use by the CPU;
after the accelerating engine 1701 sends the target memory address to the CPU, the cache management module 1702 is further configured to obtain the target memory address sent back by the CPU when the value of its memory state word has continuously read "CPU occupied" for longer than a preset time;
the cache management module 1702 is further configured to store the target memory address on the acceleration computing unit.
Optionally,
the target memory address is further configured with a check status word;
the check status word corresponds to a state synchronization time, which indicates the time at which the value of the check status word was synchronized to the value of the memory state word;
the value of the check status word of the target memory address is obtained by synchronizing the memory state word of the target memory address under a synchronization condition, the synchronization condition being that the value of the check status word of the target memory address and the value of the memory state word of the target memory address are not identical;
"the value of the memory state word of the target memory address has continuously read CPU occupied for longer than a preset time" specifically means: the value of the memory state word of the target memory address and the value of the check status word of the target memory address both read "CPU occupied", and the difference between the state synchronization time of the target memory address and the current time exceeds the preset time, the current time being the time at which both values are detected to read "CPU occupied".
In conclusion the acceleration computing unit 1700 includes accelerating engine 1701 and caching management module 1702, caching pipe
It manages module 1702 and is used for managing internal memory address;Accelerating engine 1701 obtains the business acceleration processing that central processor CPU is sent and asks
It asks;Then, accelerating engine 1701 accelerates processing request to handle business, to obtain processing result.In order to store the processing
As a result, accelerating engine 1701 applies for memory address to caching management module 1702, target memory address is obtained, wherein accelerometer
It calculates unit and prestores at least one memory address, target memory address belongs to the memory address for accelerating to prestore on computing unit.So
Afterwards, the memory headroom and accelerating engine 1701 that processing result write-in target memory address is directed toward by accelerating engine 1701 are to CPU
Send target memory address.In this way, setting is used for the caching management module of managing internal memory address on accelerating computing unit, accelerate
The accelerating engine of computing unit accelerates processing request to handle the business that CPU is sent, to obtain processing result.Accelerating engine
Need to store the processing result, for this purpose, accelerating engine can obtain being pre-stored in accelerometer to caching management module application memory address
The target memory address on unit is calculated, then, which is written on the memory headroom of target memory address direction, is added
After fast engine sends target memory address to CPU, CPU can read the processing result by the target memory address.Because
Accelerating engine to the acquisition of memory address by being realized to caching management module application, the accelerating engine and caching management module position
In on same acceleration computing unit, accelerating engine application to memory address be with being pre-stored in the memory accelerated on computing unit
Location, in this way, accelerating engine can quick obtaining to memory address, to store the processing knot that processing business accelerates processing request to obtain
Fruit.Compared with accelerating engine is to the scheme of CPU application memory address, the scheme of the embodiment of the present invention is without waiting for CPU to memory
The response of address reduces the waiting time that accelerating engine obtains memory address, so as to quick obtaining memory address, to use
The memory address stores processing result, realizes the raising of service process performance.
Figure 18 is a structural diagram of a central processing unit provided by an embodiment of the present invention. This central processing unit can execute the methods attributed to the central processing unit in the embodiments above.
Referring to Figure 18, the central processing unit 1800 of this embodiment includes a service processing module 1801 and a memory module 1802;
the service processing module 1801 is configured to apply to the memory module 1802 for a memory address and obtain a target memory address;
the service processing module 1801 is further configured to send the target memory address to the cache management module of the acceleration computing unit, so that the cache management module stores the target memory address on the acceleration computing unit;
the service processing module 1801 is further configured to obtain a service acceleration processing request;
the service processing module 1801 is further configured to send the service acceleration processing request to the accelerating engine of the acceleration computing unit, so that the accelerating engine processes it to obtain a processing result; the accelerating engine applies to the cache management module for a memory address, and after obtaining the target memory address, writes the processing result into the memory space the target memory address points to.
Optionally,
after the service processing module 1801 sends the service acceleration processing request to the accelerating engine of the acceleration computing unit, the service processing module 1801 is further configured to obtain the target memory address sent by the accelerating engine;
the service processing module 1801 is further configured to read the processing result from the memory space the target memory address points to;
the service processing module 1801 is further configured to send the target memory address to the cache management module, so that the cache management module stores the target memory address on the acceleration computing unit.
Optionally,
the CPU further includes a check module 1803;
the target memory address is configured with a memory state word whose value includes "acceleration computing unit occupied" and "CPU occupied", the value of the memory state word being set by the acceleration computing unit;
"acceleration computing unit occupied" indicates that the target memory address has been stored on the acceleration computing unit and is in use by the acceleration computing unit;
"CPU occupied" indicates that the processing result cached in the memory space the target memory address points to can be read by the CPU, the target memory address being in use by the CPU;
after the service processing module sends the service acceleration processing request to the accelerating engine of the acceleration computing unit, the check module 1803 is configured to detect whether the value of the memory state word of the target memory address has continuously read "CPU occupied" for longer than a preset time;
the service processing module 1801 is further configured to, if the value of the memory state word of the target memory address has continuously read "CPU occupied" for longer than the preset time, send the target memory address to the cache management module, so that the cache management module stores the target memory address on the acceleration computing unit.
Optionally,
the target memory address is further configured with a check status word;
the check status word corresponds to a state synchronization time, which indicates the time at which the value of the check status word was synchronized to the value of the memory state word;
the check module 1803 is further configured to judge, at a preset interval, whether the value of the check status word of the target memory address and the value of the memory state word of the target memory address are identical;
the check module 1803 is further configured to, if the value of the check status word of the target memory address and the value of the memory state word of the target memory address are not identical, synchronize the value of the check status word of the target memory address to the value of the memory state word of the target memory address and update the state synchronization time of the target memory address;
the service processing module 1801 is further configured to, if the two values are identical, execute the step of sending the target memory address to the cache management module when the check module detects that the value of the check status word of the target memory address reads "CPU occupied", the difference between the state synchronization time of the target memory address and the current time exceeds the preset time, and the value of the memory state word of the target memory address has not changed within the preset interval; the current time is the time at which the check module detects that the value of the check status word of the target memory address reads "CPU occupied".
In conclusion central processing unit 1800 includes Service Processing Module 1801 and memory modules 1802.Business processing mould
Block 1801 applies for memory address to memory modules 1802, obtains target memory address, and then, Service Processing Module 1801 is to acceleration
The caching management module of computing unit sends target memory address, adds so that target memory address is stored in by caching management module
On fast computing unit.Service Processing Module 1801 obtains after business accelerates processing request, and Service Processing Module 1801 is to accelerometer
The accelerating engine for calculating unit sends business and accelerates processing request, so that accelerating engine accelerates processing request to handle business,
To obtain processing result, accelerating engine is to caching management module application memory address, after obtaining target memory address, accelerating engine
The memory headroom that processing result write-in target memory address is directed toward.
Because the memory modules on CPU are used for managing internal memory address, so that Service Processing Module is to memory modules application memory
Address obtains target memory address.Then, Service Processing Module is into the caching management module transmission target for accelerating computing unit
Address is deposited, is accelerated on computing unit so that target memory address is stored in by caching management module.It is calculated in this way, realizing acceleration
Stored memory address on unit.Service Processing Module obtains business and accelerates after handling request, and Service Processing Module is calculated to acceleration
The accelerating engine of unit sends business and accelerates processing request, so that accelerating engine accelerates processing request to handle business, with
Processing result is obtained, accelerating engine is to caching management module application memory address, and after obtaining target memory address, accelerating engine will
The memory headroom that target memory address is directed toward is written in processing result.That is CPU first sends memory address to acceleration computing unit, for
Accelerate computing unit storage, in this way, when accelerating calculating cell processing business that processing request is accelerated to obtain processing result, in order to cache
The processing result accelerates the accelerating engine of computing unit directly can obtain the target memory prestored from acceleration computing unit
Address, without to CPU apply acquisition memory address, accelerating engine can quick obtaining to memory address, added with storing processing business
The processing result that speed processing request obtains, thus, reduce the waiting time that accelerating engine obtains memory address, it can quick obtaining
Memory address realizes the raising of service process performance to use the memory address to store processing result.
An embodiment of the present invention further provides a heterogeneous system, whose structure is shown in Figure 1. The heterogeneous system includes an acceleration computing unit and a central processing unit.
The acceleration computing unit is the acceleration computing unit of the embodiment shown in Figure 17; for details, see the exemplary embodiments above, which are not repeated here.
The central processing unit is the central processing unit of the embodiment shown in Figure 18; for details, see the exemplary embodiments above, which are not repeated here.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be realized wholly or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state disk (SSD)), or the like.
It is apparent to those skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, devices, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely exemplary. The division into units is only a division by logical function; in actual implementation there may be other ways of division: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Claims (25)
1. A data processing method, characterized in that:
the method is applied to an acceleration computing unit, the acceleration computing unit includes an accelerating engine and a cache management module, and the cache management module is configured to manage memory addresses;
the method includes:
the accelerating engine obtains a service acceleration processing request sent by a central processing unit (CPU);
the accelerating engine processes the service acceleration processing request to obtain a processing result;
the accelerating engine applies to the cache management module for a memory address and obtains a target memory address, wherein the acceleration computing unit prestores at least one memory address, and the target memory address belongs to the memory addresses prestored on the acceleration computing unit;
the accelerating engine writes the processing result into the memory space pointed to by the target memory address;
the accelerating engine sends the target memory address to the CPU.
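The flow of claim 1 can be sketched as follows; this is a minimal illustration, not the implementation, and all names (`CacheManager`, `AcceleratingEngine`, `apply_address`, and so on) are hypothetical, since the claim does not prescribe an API.

```python
class CacheManager:
    """Manages the memory addresses prestored on the acceleration unit."""
    def __init__(self, prestored_addresses):
        self.free = list(prestored_addresses)  # addresses handed over by the CPU

    def apply_address(self):
        # Hand out one prestored address as the target memory address.
        return self.free.pop(0)

    def store_address(self, addr):
        # The CPU returns a consumed address so it can be reused.
        self.free.append(addr)


class AcceleratingEngine:
    def __init__(self, cache_manager, memory):
        self.cache = cache_manager
        self.memory = memory  # address -> memory space (a writable slot)

    def handle_request(self, request):
        result = request.upper()             # stand-in for acceleration processing
        target = self.cache.apply_address()  # apply for a target memory address
        self.memory[target] = result         # write result into the pointed-to space
        return target                        # "send" the address back to the CPU
```

Note that the engine returns only the address, not the result itself: the CPU reads the result from the memory space the address points to, which is the point of the scheme.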
2. The method according to claim 1, characterized in that:
the accelerating engine applying to the cache management module for a memory address to obtain the target memory address, and the accelerating engine writing the processing result into the memory space pointed to by the target memory address, include:
the accelerating engine applies to the cache management module for a memory address and obtains a first memory address, the first memory address being one of the memory addresses prestored on the acceleration computing unit;
the accelerating engine writes a first processing result into the memory space pointed to by the first memory address, the first processing result belonging to the processing result;
when the memory space pointed to by the first memory address is fully written and the service acceleration processing request has not been fully processed, the accelerating engine applies to the cache management module for a memory address and obtains a second memory address, the second memory address being one of the memory addresses prestored on the acceleration computing unit;
the accelerating engine writes a second processing result into the memory space pointed to by the second memory address, the second processing result belonging to the processing result;
the accelerating engine sending the target memory address to the CPU includes:
the accelerating engine sends the first memory address and the second memory address to the CPU.
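Claim 2 describes chaining memory spaces when a result outgrows one space. A minimal sketch, assuming fixed-size spaces and a hypothetical `apply_address` callable (the space size and all names are illustrative):

```python
SPACE_SIZE = 4  # capacity of one memory space; illustrative only

def write_result(result, apply_address, spaces):
    """Write `result` across memory spaces, applying for a new address
    whenever the current space is fully written, and return every
    address used (all of them are then sent to the CPU)."""
    used = []
    for offset in range(0, len(result), SPACE_SIZE):
        addr = apply_address()  # first, second, ... memory address
        spaces[addr] = result[offset:offset + SPACE_SIZE]
        used.append(addr)
    return used
```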
3. The method according to claim 1, characterized in that:
before the accelerating engine obtains the service acceleration processing request sent by the CPU, the method further includes:
the cache management module obtains the target memory address sent by the CPU;
the cache management module stores the target memory address on the acceleration computing unit.
4. The method according to any one of claims 1 to 3, characterized in that:
after the accelerating engine sends the target memory address to the CPU, the method further includes:
the cache management module obtains the target memory address sent by the CPU, wherein the processing result in the memory space pointed to by the target memory address has been read by the CPU;
the cache management module stores the target memory address on the acceleration computing unit.
5. The method according to any one of claims 1 to 3, characterized in that:
the memory addresses prestored on the acceleration computing unit are memory addresses compressed according to their alignment bits;
the accelerating engine applying to the cache management module for a memory address to obtain the target memory address includes:
the accelerating engine sends a memory address request to the cache management module, the memory address request being used to request a memory address from the cache management module;
the cache management module decompresses the target memory address according to the alignment bits to obtain a decompressed target memory address;
the accelerating engine obtains the decompressed target memory address sent by the cache management module.
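The compression of claim 5 rests on the observation that an aligned address has known-zero low bits, so those bits need not be stored. A sketch assuming 64-byte alignment (the alignment width is an illustrative choice, not fixed by the claim):

```python
ALIGN_BITS = 6  # assume 64-byte-aligned addresses: the low 6 bits are always zero

def compress(addr):
    # Drop the always-zero alignment bits before prestoring the address.
    assert addr % (1 << ALIGN_BITS) == 0, "address must be aligned"
    return addr >> ALIGN_BITS

def decompress(stored):
    # Restore the alignment bits when handing the address to the engine.
    return stored << ALIGN_BITS
```

The payoff is that each prestored entry is `ALIGN_BITS` bits narrower, which matters when the acceleration unit caches many addresses in scarce on-chip storage.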
6. The method according to any one of claims 1 to 3, characterized in that:
the memory addresses prestored on the acceleration computing unit are preset with check values;
the accelerating engine applying to the cache management module for a memory address to obtain the target memory address includes:
the accelerating engine sends a memory address request to the cache management module, the memory address request being used to request a memory address from the cache management module;
the cache management module calculates the check value of the target memory address;
when the calculated check value matches the preset check value of the target memory address, the cache management module sends the target memory address to the accelerating engine, so that the accelerating engine obtains the target memory address.
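Claim 6 does not fix a particular checksum algorithm; the sketch below uses a byte-wise XOR purely as an illustrative check value, with hypothetical function names:

```python
def check_value(addr):
    """Illustrative check value: XOR of the address bytes."""
    v = 0
    while addr:
        v ^= addr & 0xFF
        addr >>= 8
    return v

def hand_out(addr, preset_check):
    # Only send the address to the engine if the recomputed check
    # matches the check value preset for the prestored address.
    if check_value(addr) != preset_check:
        raise ValueError("corrupted memory address")
    return addr
```

The check guards against a prestored address being corrupted while cached on the acceleration unit: a mismatch stops the engine from writing results through a bad pointer.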
7. The method according to any one of claims 1 to 3, characterized in that:
the target memory address is configured with a memory status word, the value of the memory status word includes "occupied by the acceleration computing unit" and "occupied by the CPU", and the value of the memory status word is set by the acceleration computing unit;
"occupied by the acceleration computing unit" of the target memory address indicates that the target memory address has been stored on the acceleration computing unit and the target memory address is used by the acceleration computing unit;
"occupied by the CPU" of the target memory address indicates that the processing result cached in the memory space pointed to by the target memory address can be read by the CPU and the target memory address is used by the CPU;
after the accelerating engine sends the target memory address to the CPU, the method further includes:
the cache management module obtains the target memory address sent by the CPU, wherein the time for which the value of the memory status word of the target memory address has continuously been "occupied by the CPU" is greater than a preset time;
the cache management module stores the target memory address on the acceleration computing unit.
8. The method according to claim 7, characterized in that:
the target memory address is further configured with a check status word;
the check status word corresponds to a state synchronization time, the state synchronization time indicating the time at which the value of the check status word was synchronized to the value of the memory status word;
the value of the check status word of the target memory address is obtained, under a synchronization condition, by synchronizing it with the value of the memory status word of the target memory address, the synchronization condition being that the value of the check status word of the target memory address and the value of the memory status word of the target memory address are not identical;
that the time for which the value of the memory status word of the target memory address has continuously been "occupied by the CPU" is greater than the preset time specifically means: the value of the memory status word of the target memory address and the value of the check status word of the target memory address are both "occupied by the CPU", and the difference between the state synchronization time of the target memory address and the current time is greater than the preset time, wherein the current time is the time at which the value of the memory status word of the target memory address and the value of the check status word of the target memory address are both detected to be "occupied by the CPU".
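The timeout test of claims 7 and 8 can be sketched as follows. The status values, the threshold, and the class and field names are illustrative assumptions; the claims fix the logic (resynchronize the check word on mismatch and restart the timer, reclaim only after a stable match has aged past the preset time), not a data layout.

```python
import time

CPU_OCCUPIED, ACCEL_OCCUPIED = "cpu", "accel"
PRESET_TIME = 5.0  # reclaim threshold in seconds; illustrative

class AddressState:
    def __init__(self, status):
        self.status = status          # memory status word, set by the acceleration unit
        self.check = status           # check status word, kept by the checker
        self.sync_time = time.time()  # when check was last synchronized to status

    def cpu_held_too_long(self, now):
        """True when both words read 'occupied by the CPU' and the status
        has been stable for more than PRESET_TIME before `now`."""
        if self.check != self.status:
            # Values differ: synchronize the check word and restart the timer.
            self.check = self.status
            self.sync_time = now
            return False
        return self.status == CPU_OCCUPIED and now - self.sync_time > PRESET_TIME
```

Using a second (check) word plus a synchronization time means a single stale reading of the status word cannot trigger reclamation; the address is taken back only after the "occupied by the CPU" state has been observed unchanged across the whole window.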
9. A data processing method, characterized in that:
the method is applied to a CPU, the CPU including a service processing module and a memory module;
the method includes:
the service processing module applies to the memory module for a memory address and obtains a target memory address;
the service processing module sends the target memory address to a cache management module of an acceleration computing unit, so that the cache management module stores the target memory address on the acceleration computing unit;
the service processing module obtains a service acceleration processing request;
the service processing module sends the service acceleration processing request to an accelerating engine of the acceleration computing unit, so that the accelerating engine processes the service acceleration processing request to obtain a processing result, the accelerating engine applies to the cache management module for a memory address, and after obtaining the target memory address, the accelerating engine writes the processing result into the memory space pointed to by the target memory address.
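The CPU-side sequence of claim 9 can be sketched end to end in a few lines; plain lists stand in for the memory module's free pool and the cache management module's prestore, and every name here is a hypothetical placeholder:

```python
def cpu_flow(free_addresses, accel_prestore, process, spaces, request):
    """CPU allocates an address, prestores it on the acceleration unit,
    the engine writes the result through it, and the CPU reads it back."""
    target = free_addresses.pop(0)    # apply to the memory module for an address
    accel_prestore.append(target)     # send it to the cache management module
    # --- on the acceleration unit ---
    addr = accel_prestore.pop(0)      # engine applies for the prestored address
    spaces[addr] = process(request)   # engine writes the processing result
    # --- back on the CPU ---
    return spaces[addr]               # CPU reads the result via the address
```

The design point is that the CPU pre-provisions addresses before any request is issued, so the engine never has to ask the CPU for memory on the critical path.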
10. The method according to claim 9, characterized in that:
after the service processing module sends the service acceleration processing request to the accelerating engine of the acceleration computing unit, the method further includes:
the service processing module obtains the target memory address sent by the accelerating engine;
the service processing module reads the processing result from the memory space pointed to by the target memory address;
the service processing module sends the target memory address to the cache management module, so that the cache management module stores the target memory address on the acceleration computing unit.
11. The method according to claim 9, characterized in that:
the CPU further includes a check module;
the target memory address is configured with a memory status word, the value of the memory status word includes "occupied by the acceleration computing unit" and "occupied by the CPU", and the value of the memory status word is set by the acceleration computing unit;
"occupied by the acceleration computing unit" of the target memory address indicates that the target memory address has been stored on the acceleration computing unit and the target memory address is used by the acceleration computing unit;
"occupied by the CPU" of the target memory address indicates that the processing result cached in the memory space pointed to by the target memory address can be read by the CPU and the target memory address is used by the CPU;
after the service processing module sends the service acceleration processing request to the accelerating engine of the acceleration computing unit, the method further includes:
the check module detects whether the time for which the value of the memory status word of the target memory address has continuously been "occupied by the CPU" is greater than a preset time;
if the time for which the value of the memory status word of the target memory address has continuously been "occupied by the CPU" is greater than the preset time, the service processing module sends the target memory address to the cache management module, so that the cache management module stores the target memory address on the acceleration computing unit.
12. The method according to claim 11, characterized in that:
the target memory address is further configured with a check status word;
the check status word corresponds to a state synchronization time, the state synchronization time indicating the time at which the value of the check status word was synchronized to the value of the memory status word;
the check module detecting whether the time for which the value of the memory status word of the target memory address has continuously been "occupied by the CPU" is greater than the preset time includes:
the check module judges, at every preset time interval, whether the value of the check status word of the target memory address and the value of the memory status word of the target memory address are identical;
if the value of the check status word of the target memory address and the value of the memory status word of the target memory address are not identical, the check module synchronizes the value of the check status word of the target memory address to the value of the memory status word of the target memory address, and updates the state synchronization time of the target memory address;
if the value of the check status word of the target memory address and the value of the memory status word of the target memory address are identical, then when the check module detects that the value of the check status word of the target memory address is "occupied by the CPU", that the difference between the state synchronization time of the target memory address and the current time is greater than the preset time, and that the value of the memory status word of the target memory address has not changed within the preset time interval, the service processing module performs the step of sending the target memory address to the cache management module, wherein the current time is the time at which the check module detects that the value of the check status word of the target memory address is "occupied by the CPU".
13. An acceleration computing unit, characterized in that:
the acceleration computing unit includes an accelerating engine and a cache management module, the cache management module being configured to manage memory addresses;
the accelerating engine is configured to obtain a service acceleration processing request sent by a central processing unit (CPU);
the accelerating engine is further configured to process the service acceleration processing request to obtain a processing result;
the accelerating engine is further configured to apply to the cache management module for a memory address and obtain a target memory address, wherein the acceleration computing unit prestores at least one memory address, and the target memory address belongs to the memory addresses prestored on the acceleration computing unit;
the accelerating engine is further configured to write the processing result into the memory space pointed to by the target memory address;
the accelerating engine is further configured to send the target memory address to the CPU.
14. The acceleration computing unit according to claim 13, characterized in that:
the accelerating engine is further configured to apply to the cache management module for a memory address and obtain a first memory address, the first memory address being one of the memory addresses prestored on the acceleration computing unit;
the accelerating engine is further configured to write a first processing result into the memory space pointed to by the first memory address, the first processing result belonging to the processing result;
the accelerating engine is further configured to, when the memory space pointed to by the first memory address is fully written and the service acceleration processing request has not been fully processed, apply to the cache management module for a memory address and obtain a second memory address, the second memory address being one of the memory addresses prestored on the acceleration computing unit;
the accelerating engine is further configured to write a second processing result into the memory space pointed to by the second memory address, the second processing result belonging to the processing result;
the accelerating engine is further configured to send the first memory address and the second memory address to the CPU.
15. The acceleration computing unit according to claim 13, characterized in that:
the cache management module is configured to, before the accelerating engine obtains the service acceleration processing request sent by the CPU, obtain the target memory address sent by the CPU;
the cache management module is further configured to store the target memory address on the acceleration computing unit.
16. The acceleration computing unit according to any one of claims 13 to 15, characterized in that:
the cache management module is further configured to, after the accelerating engine sends the target memory address to the CPU, obtain the target memory address sent by the CPU, wherein the processing result in the memory space pointed to by the target memory address has been read by the CPU;
the cache management module is further configured to store the target memory address on the acceleration computing unit.
17. The acceleration computing unit according to any one of claims 13 to 15, characterized in that:
the memory addresses prestored on the acceleration computing unit are memory addresses compressed according to their alignment bits;
the accelerating engine is further configured to send a memory address request to the cache management module, the memory address request being used to request a memory address from the cache management module;
the cache management module is further configured to decompress the target memory address according to the alignment bits to obtain a decompressed target memory address;
the accelerating engine is further configured to obtain the decompressed target memory address sent by the cache management module.
18. The acceleration computing unit according to any one of claims 13 to 15, characterized in that:
the memory addresses prestored on the acceleration computing unit are preset with check values;
the accelerating engine is further configured to send a memory address request to the cache management module, the memory address request being used to request a memory address from the cache management module;
the cache management module is further configured to calculate the check value of the target memory address;
the cache management module is further configured to, when the calculated check value matches the preset check value of the target memory address, send the target memory address to the accelerating engine, so that the accelerating engine obtains the target memory address.
19. The acceleration computing unit according to any one of claims 13 to 15, characterized in that:
the target memory address is configured with a memory status word, the value of the memory status word includes "occupied by the acceleration computing unit" and "occupied by the CPU", and the value of the memory status word is set by the acceleration computing unit;
"occupied by the acceleration computing unit" of the target memory address indicates that the target memory address has been stored on the acceleration computing unit and the target memory address is used by the acceleration computing unit;
"occupied by the CPU" of the target memory address indicates that the processing result cached in the memory space pointed to by the target memory address can be read by the CPU and the target memory address is used by the CPU;
the cache management module is further configured to, after the accelerating engine sends the target memory address to the CPU, obtain the target memory address sent by the CPU, wherein the time for which the value of the memory status word of the target memory address has continuously been "occupied by the CPU" is greater than a preset time;
the cache management module is further configured to store the target memory address on the acceleration computing unit.
20. The acceleration computing unit according to claim 19, characterized in that:
the target memory address is further configured with a check status word;
the check status word corresponds to a state synchronization time, the state synchronization time indicating the time at which the value of the check status word was synchronized to the value of the memory status word;
the value of the check status word of the target memory address is obtained, under a synchronization condition, by synchronizing it with the value of the memory status word of the target memory address, the synchronization condition being that the value of the check status word of the target memory address and the value of the memory status word of the target memory address are not identical;
that the time for which the value of the memory status word of the target memory address has continuously been "occupied by the CPU" is greater than the preset time specifically means: the value of the memory status word of the target memory address and the value of the check status word of the target memory address are both "occupied by the CPU", and the difference between the state synchronization time of the target memory address and the current time is greater than the preset time, wherein the current time is the time at which the value of the memory status word of the target memory address and the value of the check status word of the target memory address are both detected to be "occupied by the CPU".
21. A central processing unit, characterized in that:
the central processing unit includes a service processing module and a memory module;
the service processing module is configured to apply to the memory module for a memory address and obtain a target memory address;
the service processing module is further configured to send the target memory address to a cache management module of an acceleration computing unit, so that the cache management module stores the target memory address on the acceleration computing unit;
the service processing module is further configured to obtain a service acceleration processing request;
the service processing module is further configured to send the service acceleration processing request to an accelerating engine of the acceleration computing unit, so that the accelerating engine processes the service acceleration processing request to obtain a processing result, the accelerating engine applies to the cache management module for a memory address, and after obtaining the target memory address, the accelerating engine writes the processing result into the memory space pointed to by the target memory address.
22. The central processing unit according to claim 21, characterized in that, after the service processing module sends the service acceleration processing request to the accelerating engine of the acceleration computing unit:
the service processing module is further configured to obtain the target memory address sent by the accelerating engine;
the service processing module is further configured to read the processing result from the memory space pointed to by the target memory address;
the service processing module is further configured to send the target memory address to the cache management module, so that the cache management module stores the target memory address on the acceleration computing unit.
23. The central processing unit according to claim 21, characterized in that:
the CPU further includes a check module;
the target memory address is configured with a memory status word, the value of the memory status word includes "occupied by the acceleration computing unit" and "occupied by the CPU", and the value of the memory status word is set by the acceleration computing unit;
"occupied by the acceleration computing unit" of the target memory address indicates that the target memory address has been stored on the acceleration computing unit and the target memory address is used by the acceleration computing unit;
"occupied by the CPU" of the target memory address indicates that the processing result cached in the memory space pointed to by the target memory address can be read by the CPU and the target memory address is used by the CPU;
after the service processing module sends the service acceleration processing request to the accelerating engine of the acceleration computing unit:
the check module is configured to detect whether the time for which the value of the memory status word of the target memory address has continuously been "occupied by the CPU" is greater than a preset time;
the service processing module is further configured to, if the time for which the value of the memory status word of the target memory address has continuously been "occupied by the CPU" is greater than the preset time, send the target memory address to the cache management module, so that the cache management module stores the target memory address on the acceleration computing unit.
24. The central processing unit according to claim 23, characterized in that:
the target memory address is further configured with a check status word;
the check status word corresponds to a state synchronization time, the state synchronization time indicating the time at which the value of the check status word was synchronized to the value of the memory status word;
the check module is further configured to judge, at every preset time interval, whether the value of the check status word of the target memory address and the value of the memory status word of the target memory address are identical;
the check module is further configured to, if the value of the check status word of the target memory address and the value of the memory status word of the target memory address are not identical, synchronize the value of the check status word of the target memory address to the value of the memory status word of the target memory address, and update the state synchronization time of the target memory address;
the service processing module is further configured to, if the value of the check status word of the target memory address and the value of the memory status word of the target memory address are identical, perform the step of sending the target memory address to the cache management module when the check module detects that the value of the check status word of the target memory address is "occupied by the CPU", that the difference between the state synchronization time of the target memory address and the current time is greater than the preset time, and that the value of the memory status word of the target memory address has not changed within the preset time interval, wherein the current time is the time at which the check module detects that the value of the check status word of the target memory address is "occupied by the CPU".
25. A heterogeneous system, characterized in that:
the heterogeneous system includes an acceleration computing unit and a central processing unit;
the acceleration computing unit is the acceleration computing unit according to any one of claims 13 to 20;
the central processing unit is the central processing unit according to any one of claims 21 to 24.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710617841.0A CN109308280B (en) | 2017-07-26 | 2017-07-26 | Data processing method and related equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109308280A true CN109308280A (en) | 2019-02-05 |
CN109308280B CN109308280B (en) | 2021-05-18 |
Family
ID=65202809
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710617841.0A Active CN109308280B (en) | 2017-07-26 | 2017-07-26 | Data processing method and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109308280B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110515727A (en) * | 2019-08-16 | 2019-11-29 | 苏州浪潮智能科技有限公司 | A kind of the memory headroom operating method and relevant apparatus of FPGA |
CN111046072A (en) * | 2019-11-29 | 2020-04-21 | 浪潮(北京)电子信息产业有限公司 | Data query method, system, heterogeneous computing acceleration platform and storage medium |
CN111309482A (en) * | 2020-02-20 | 2020-06-19 | 浙江亿邦通信科技有限公司 | Ore machine controller task distribution system, device and storable medium thereof |
CN111367839A (en) * | 2020-02-21 | 2020-07-03 | 苏州浪潮智能科技有限公司 | Data synchronization method between host terminal and FPGA accelerator |
CN111708715A (en) * | 2020-06-17 | 2020-09-25 | Oppo广东移动通信有限公司 | Memory allocation method, memory allocation device and terminal equipment |
CN111813713A (en) * | 2020-09-08 | 2020-10-23 | 苏州浪潮智能科技有限公司 | Data acceleration operation processing method and device and computer readable storage medium |
CN111930510A (en) * | 2020-08-20 | 2020-11-13 | 北京达佳互联信息技术有限公司 | Electronic device and data processing method |
WO2021213209A1 (en) * | 2020-04-22 | 2021-10-28 | 华为技术有限公司 | Data processing method and apparatus, and heterogeneous system |
TWI805302B (en) * | 2021-09-29 | 2023-06-11 | 慧榮科技股份有限公司 | Method and computer program product and apparatus for programming data into flash memory |
WO2023123849A1 (en) * | 2021-12-28 | 2023-07-06 | 苏州浪潮智能科技有限公司 | Method for accelerated computation of data and related apparatus |
US11860775B2 (en) | 2021-09-29 | 2024-01-02 | Silicon Motion, Inc. | Method and apparatus for programming data into flash memory incorporating with dedicated acceleration hardware |
US11972150B2 (en) | 2021-09-29 | 2024-04-30 | Silicon Motion, Inc. | Method and non-transitory computer-readable storage medium and apparatus for programming data into flash memory through dedicated acceleration hardware |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5787468A (en) * | 1996-06-11 | 1998-07-28 | Data General Corporation | Computer system with a cache coherent non-uniform memory access architecture using a fast tag cache to accelerate memory references |
CN101539853A (en) * | 2008-03-21 | 2009-09-23 | Fujitsu Ltd. | Information processing unit, program, and instruction sequence generation method |
CN101682621A (en) * | 2007-03-12 | 2010-03-24 | Citrix Systems, Inc. | Systems and methods for cache operations |
US20110161619A1 (en) * | 2009-12-29 | 2011-06-30 | Advanced Micro Devices, Inc. | Systems and methods implementing non-shared page tables for sharing memory resources managed by a main operating system with accelerator devices |
CN102349051A (en) * | 2008-12-04 | 2012-02-08 | Analog Devices, Inc. | Methods and apparatus for performing jump operations in a digital processor |
CN102346661A (en) * | 2010-07-30 | 2012-02-08 | International Business Machines Corp. | Method and system for state maintenance of request queue of hardware accelerator |
CN102567241A (en) * | 2010-12-27 | 2012-07-11 | Beijing Guorui Zhongshu Technology Co., Ltd. | Memory controller and memory access control method |
CN104252416A (en) * | 2013-06-28 | 2014-12-31 | Huawei Technologies Co., Ltd. | Accelerator and data processing method |
CN104853213A (en) * | 2015-05-05 | 2015-08-19 | Fuzhou Rockchip Electronics Co., Ltd. | Method and system for improving cache processing efficiency of video decoder |
CN105027091A (en) * | 2013-03-13 | 2015-11-04 | Empire Technology Development LLC | Methods, devices and systems for physical-to-logical mapping in solid state drives |
CN105068817A (en) * | 2015-08-26 | 2015-11-18 | Huawei Technologies Co., Ltd. | Method for writing data in storage device and storage device |
US20160239410A1 (en) * | 2015-02-17 | 2016-08-18 | International Business Machines Corporation | Accelerating multiversion concurrency control using hardware transactional memory |
CN106126481A (en) * | 2016-06-29 | 2016-11-16 | Huawei Technologies Co., Ltd. | Computing engine and electronic device |
CN106933510A (en) * | 2017-02-27 | 2017-07-07 | Huazhong University of Science and Technology | Memory controller |
- 2017-07-26: CN application CN201710617841.0A granted as patent CN109308280B (status: Active)
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5787468A (en) * | 1996-06-11 | 1998-07-28 | Data General Corporation | Computer system with a cache coherent non-uniform memory access architecture using a fast tag cache to accelerate memory references |
CN101682621A (en) * | 2007-03-12 | 2010-03-24 | Citrix Systems, Inc. | Systems and methods for cache operations |
CN101539853A (en) * | 2008-03-21 | 2009-09-23 | Fujitsu Ltd. | Information processing unit, program, and instruction sequence generation method |
CN102349051A (en) * | 2008-12-04 | 2012-02-08 | Analog Devices, Inc. | Methods and apparatus for performing jump operations in a digital processor |
US20110161619A1 (en) * | 2009-12-29 | 2011-06-30 | Advanced Micro Devices, Inc. | Systems and methods implementing non-shared page tables for sharing memory resources managed by a main operating system with accelerator devices |
CN102346661A (en) * | 2010-07-30 | 2012-02-08 | International Business Machines Corp. | Method and system for state maintenance of request queue of hardware accelerator |
CN102567241A (en) * | 2010-12-27 | 2012-07-11 | Beijing Guorui Zhongshu Technology Co., Ltd. | Memory controller and memory access control method |
CN105027091A (en) * | 2013-03-13 | 2015-11-04 | Empire Technology Development LLC | Methods, devices and systems for physical-to-logical mapping in solid state drives |
CN104252416A (en) * | 2013-06-28 | 2014-12-31 | Huawei Technologies Co., Ltd. | Accelerator and data processing method |
US20160239410A1 (en) * | 2015-02-17 | 2016-08-18 | International Business Machines Corporation | Accelerating multiversion concurrency control using hardware transactional memory |
CN104853213A (en) * | 2015-05-05 | 2015-08-19 | Fuzhou Rockchip Electronics Co., Ltd. | Method and system for improving cache processing efficiency of video decoder |
CN105068817A (en) * | 2015-08-26 | 2015-11-18 | Huawei Technologies Co., Ltd. | Method for writing data in storage device and storage device |
CN106126481A (en) * | 2016-06-29 | 2016-11-16 | Huawei Technologies Co., Ltd. | Computing engine and electronic device |
CN106933510A (en) * | 2017-02-27 | 2017-07-07 | Huazhong University of Science and Technology | Memory controller |
Non-Patent Citations (1)
Title |
---|
Shi Wei: "Research on Storage Optimization and Applications Based on Flash Memory Characteristics", China Doctoral Dissertations Full-text Database, Information Science and Technology Series * |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110515727A (en) * | 2019-08-16 | 2019-11-29 | Inspur Suzhou Intelligent Technology Co., Ltd. | Memory space operation method and related apparatus for an FPGA |
CN111046072A (en) * | 2019-11-29 | 2020-04-21 | Inspur (Beijing) Electronic Information Industry Co., Ltd. | Data query method, system, heterogeneous computing acceleration platform, and storage medium |
CN111309482A (en) * | 2020-02-20 | 2020-06-19 | Zhejiang Ebang Communication Technology Co., Ltd. | Mining machine controller task distribution system and device, and storage medium |
CN111309482B (en) * | 2020-02-20 | 2023-08-15 | Zhejiang Ebang Communication Technology Co., Ltd. | Hash algorithm-based blockchain task allocation system and device, and storage medium |
CN111367839A (en) * | 2020-02-21 | 2020-07-03 | Inspur Suzhou Intelligent Technology Co., Ltd. | Data synchronization method between host side and FPGA accelerator |
US11762790B2 (en) | 2020-02-21 | 2023-09-19 | Inspur Suzhou Intelligent Technology Co., Ltd. | Method for data synchronization between host side and FPGA accelerator |
WO2021213209A1 (en) * | 2020-04-22 | 2021-10-28 | Huawei Technologies Co., Ltd. | Data processing method and apparatus, and heterogeneous system |
CN111708715B (en) * | 2020-06-17 | 2023-08-15 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Memory allocation method, memory allocation device, and terminal device |
CN111708715A (en) * | 2020-06-17 | 2020-09-25 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Memory allocation method, memory allocation device, and terminal device |
CN111930510A (en) * | 2020-08-20 | 2020-11-13 | Beijing Dajia Internet Information Technology Co., Ltd. | Electronic device and data processing method |
CN111930510B (en) * | 2020-08-20 | 2024-05-07 | Beijing Dajia Internet Information Technology Co., Ltd. | Electronic device and data processing method |
CN111813713B (en) * | 2020-09-08 | 2021-02-12 | Inspur Suzhou Intelligent Technology Co., Ltd. | Data acceleration operation processing method and device, and computer-readable storage medium |
CN111813713A (en) * | 2020-09-08 | 2020-10-23 | Inspur Suzhou Intelligent Technology Co., Ltd. | Data acceleration operation processing method and device, and computer-readable storage medium |
TWI805302B (en) * | 2021-09-29 | 2023-06-11 | Silicon Motion, Inc. | Method and computer program product and apparatus for programming data into flash memory |
US11860775B2 (en) | 2021-09-29 | 2024-01-02 | Silicon Motion, Inc. | Method and apparatus for programming data into flash memory incorporating with dedicated acceleration hardware |
US11966604B2 (en) | 2021-09-29 | 2024-04-23 | Silicon Motion, Inc. | Method and apparatus for programming data arranged to undergo specific stages into flash memory based on virtual carriers |
US11972150B2 (en) | 2021-09-29 | 2024-04-30 | Silicon Motion, Inc. | Method and non-transitory computer-readable storage medium and apparatus for programming data into flash memory through dedicated acceleration hardware |
WO2023123849A1 (en) * | 2021-12-28 | 2023-07-06 | Inspur Suzhou Intelligent Technology Co., Ltd. | Method for accelerated computation of data and related apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN109308280B (en) | 2021-05-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109308280A (en) | Data processing method and related device | |
CN105808151B (en) | Data access method for a solid-state drive storage device, and solid-state drive storage device | |
CN110377226B (en) | Compression method and device based on the bluestore storage engine, and storage medium | |
CN105518611B (en) | Remote direct data access method, device, and system | |
KR102219845B1 (en) | Method and apparatus for compressing addresses | |
US9584332B2 (en) | Message processing method and device | |
CN105446893A (en) | Data storage method and device | |
CN108874688B (en) | Message data caching method and device | |
CN106657356A (en) | Data writing method and device for cloud storage system, and cloud storage system | |
EP3051408A1 (en) | Data operating method and device | |
CN109710185A (en) | Data processing method and device | |
CN110333956A (en) | Message storage method, device, medium and electronic equipment in message queue | |
CN109597653A (en) | Method for command interaction between BIOS and BMC, BIOS, and BMC | |
CN109902059A (en) | Data transmission method between CPU and GPU | |
CN112148498A (en) | Data synchronization method, device, server and storage medium | |
CN109933303B (en) | Multi-user high-speed pseudo-random sequence generator circuit and working method thereof | |
CN106843748A (en) | Method and system for increasing the speed of writing data to a removable storage device | |
CN107102889A (en) | Virtual machine resource adjustment method and device | |
CN106302625B (en) | Data updating method, device, and related system | |
CN111181874A (en) | Message processing method, device, and storage medium | |
CN105550089B (en) | Digital-circuit-based FC network frame header data error injection method | |
CN109597577A (en) | Method, system, and related apparatus for processing NVMe protocol read/write commands | |
CN110134340A (en) | Metadata update method, apparatus, device, and storage medium | |
CN116701248B (en) | Page table management method, unit, SOC, electronic device and readable storage medium | |
CN109933435A (en) | Control method, device and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | ||
Effective date of registration: 2020-04-15
Address after: Huawei headquarters office building, Bantian, Longgang District, Shenzhen, Guangdong 518129
Applicant after: HUAWEI TECHNOLOGIES Co., Ltd.
Address before: Room 301, 3rd floor, Building A, No. 301 Foreshore Road, Binjiang District, Hangzhou, Zhejiang 310052
Applicant before: Huawei Technologies Co., Ltd.
GR01 | Patent grant | ||
GR01 | Patent grant |